Integration and Synthesis for Automated Performance Tuning: the SYNAPT Project
Nicholas Chaimov, Boyana Norris, and Allen D. Malony
Department of Computer and Information Science, University of Oregon, Eugene, OR

Abstract. To address the growing needs of performance analysis and optimization of complex applications, we present our vision for a new unified architecture, SYNAPT, that supports not just data acquisition and simple analysis but also the preservation of performance knowledge: models and the metadata required to make performance data analysis approaches reproducible, extensible, and composable. In addition to a standard representation of performance analysis results, the SYNAPT architecture will include new functionality that enables users and tools to use and grow the comprehensive performance knowledge base at the heart of the SYNAPT performance expert system, which can support both manual analysis and modeling tasks and interface with external tools such as autotuning compilers.

1 Introduction

High-performance applications and architectures are rapidly growing in scale and complexity, presenting a similarly expanding set of challenges to tools that provide performance measurement, analysis, and optimization capabilities. While many tools enable the collection of performance data at different granularities and through both non-intrusive and intrusive approaches, their analysis capabilities lag behind in functionality, scalability, usability, and interoperability. Consider, for example, a performance analysis that generates results summarizing properties of interest from the raw performance data. At present, most tools simply present the results in a graphical interface or, at best, store them in a custom format that no other tool can read. This hampers the ability to share or even just reproduce analysis results.
Furthermore, other independently developed tools, such as autotuners, cannot easily use any of the derived performance metrics to help better define the potentially huge space of optimization options. To address these and other shortcomings of existing state-of-the-art performance modeling, analysis, and optimization environments, we introduce our vision for SYNAPT (Synthesized tools for Automated Performance Tuning), a unified architecture for performance knowledge preservation, integration, and evolution that enables users and tools to store and share the results of program and code analysis, architecture and systems characterization, performance experiments, modeling activities, and tuning parameters and transformations.
2 Related Work

The concept of integrated measurement and analysis can be found in several parallel performance tools projects [9, 16, 21]. However, these environments generally are not extensible enough to accept additional information from other sources, or do not provide interfaces usable from autotuning frameworks. In contrast, several projects have integrated data mining and machine learning techniques with the selection of compiler optimizations based on static properties of the code, hardware counters sampled during runtime, or both, using a variety of classifiers [5, 8, 18, 19, 22].

A major project combining machine learning for selecting compiler optimizations with a centralized database is the Collective Tuning project [7]. That project uses a modified version of GCC in which functions of interest are cloned at compile time, creating an optimized and an unoptimized version. When the original function is invoked at runtime, one of the clones is selected for execution according to a probability distribution, which is initially uniform. Runtime sampling determines whether the proportion of time spent in the variants differs from the probability distribution. If the proportion of time spent in a variant is significantly lower than the proportion expected if the variants performed equally, it can be inferred that the variant offers better performance than its competitor. The result of the competition is then communicated to a central server, which updates three probability distributions: one for the program, another that aggregates data for programs with similar reactions to optimizations, and a third that aggregates data across all programs.

Cavazos proposed an intelligent compiler [4] which features a centralized knowledge base.
Such a compiler would use machine learning techniques to generate performance models for predicting likely-good optimizations based upon static analysis and empirical performance measurement of both the code being compiled and similar programs previously compiled, as well as microbenchmarks of the target system.

3 Proposed Methodology: A Knowledge Database

The idea for our proposed system comes from our experience with the TAU Performance System® [21], and in particular the included performance data management system, TAUdb [13], which stores profiles from performance experiments along with metadata describing the execution environment and application-specific metadata. Data stored in TAUdb is accessible from ParaProf [3], TAU's parallel profile analyzer, and PerfExplorer [12, 14], TAU's performance data mining framework. PerfExplorer provides a library of internal analysis routines and a means to incorporate statistics and data mining packages, such as R and Weka. An API to the analysis library and a Python scripting engine are available so that analysis pipelines can be specified programmatically. TAUdb has previously been used to store annotated performance profiles of code variants generated during autotuning runs and to provide information on past runs in order to select good starting points for hill-climbing search algorithms in autotuning [6, 26].
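The benefit of seeding a hill-climbing search from past runs can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: the two-parameter configuration (unroll factor, tile size), the parameter ranges, and the `toy_cost` function that stands in for compiling and timing a code variant. It does not use the TAUdb API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical configuration: (unroll_factor, tile_size).
Config = Tuple[int, int]


def neighbors(cfg: Config) -> List[Config]:
    """Neighboring configurations: double or halve one parameter."""
    u, t = cfg
    candidates = [(u * 2, t), (u // 2, t), (u, t * 2), (u, t // 2)]
    # Assumed legal ranges for this toy search space.
    return [(a, b) for a, b in candidates if 1 <= a <= 16 and 8 <= b <= 256]


def hill_climb(start: Config,
               cost: Callable[[Config], float]) -> Tuple[Config, float, int]:
    """Steepest-descent hill climbing; returns (best, best_cost, evaluations)."""
    current, current_cost, evals = start, cost(start), 1
    while True:
        scored = []
        for n in neighbors(current):
            scored.append((cost(n), n))
            evals += 1
        best_cost, best = min(scored)
        if best_cost >= current_cost:
            # No neighbor improves on the current point: local optimum.
            return current, current_cost, evals
        current, current_cost = best, best_cost


def seed_from_history(history: Dict[Config, float], default: Config) -> Config:
    """Pick the best configuration seen in past runs, if any were stored."""
    return min(history, key=history.get) if history else default


def toy_cost(cfg: Config) -> float:
    """Stand-in for an empirical measurement (an assumption, not real data)."""
    u, t = cfg
    return abs(u - 8) + abs(t - 64) / 8.0


if __name__ == "__main__":
    default = (1, 8)
    past_runs = {(4, 64): toy_cost((4, 64)), (1, 8): toy_cost((1, 8))}
    print("from default:", hill_climb(default, toy_cost))
    print("from history:", hill_climb(seed_from_history(past_runs, default), toy_cost))
```

In this toy landscape both searches reach the same configuration, but the seeded search needs fewer evaluations, which is the effect that stored past runs are meant to provide.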
Fig. 1. The proposed system features a centralized knowledge store.

However, the existing TAUdb system is limited in that it only stores performance profiles and associated metadata. The remainder of this section describes our plans for a unified knowledge database capable of storing data from and providing data to a much broader set of tools. Figure 1 shows the central role this knowledge database plays in the SYNAPT system. The goal of the knowledge database will be to store more comprehensive data about the environment in which tuning is taking place: what code is being tuned, what data are inputs to the code, what runtime systems are being used (e.g., threading libraries, network stack), and what architectures are being used (e.g., CPUs, GPUs, other accelerator cards). This is already possible to some extent with the existing TAUdb, but we envision significantly more detailed knowledge about algorithms and libraries. This will require new ways to represent this information.

3.1 Functional Descriptions

Our knowledge database should annotate code with a description of the algorithm the code implements, which would allow the system to identify relevant models and alternate implementations of the algorithm. While users could manually annotate their functions, it would be better to train classifiers to identify algorithms based on features from static or dynamic analysis, as has been done recently with decision trees [24, 25]. The knowledge database would also contain data on commonly used libraries, tagging functions in those libraries in the same way according to the algorithms they implement. The system could
then propose optimized library versions of code the user had previously developed manually. When input code already uses a library known to the system, it could provide guidance about best practices for the use of that library. We have promising preliminary results from constructing and using such functional taxonomies for numerical libraries in the Lighthouse project [15]. For example, we have successfully integrated all of LAPACK [1] and portions of PETSc [2] and SLEPc [11], and we will continue to expand the taxonomy with more numerical packages. The taxonomy allows users to quickly find the best method available for a given problem through different types of search interfaces, from guided question-and-answer interfaces for beginners to keyword-based, domain-specific search. In addition to functional descriptions, we are integrating performance models (generated through various machine learning approaches), which can enable users to discover solution methods that fit both functional and performance requirements. However, because of the lack of a common, portable, compact representation of performance analysis results, we are unable to effectively use some of the available model generation approaches (e.g., one cannot even store a DynaTree model [23] generated in R; it is only available within a single R session).

3.2 Performance Analysis Results

The knowledge database will store performance models, whether generated from empirical performance measurements, from simulation, or from analytical models developed by users. The system could make use of these models for performance prediction in order to carry out guided search for autotuning, and could automatically generate experiments to determine whether a given model's predictions are accurate for a given code, dataset, and environment. The knowledge database will interface with analysis tools, such as visualization, data mining, and machine learning packages.
For example, autotuning runs produce large amounts of data, so such tools are very useful in understanding the performance consequences of the various transformations that were tried. A standardized model for storing the results of such analyses in the database will be developed, so that the results can be persisted across trials and shared with other users. While the languages and approaches used for model generation can vary (e.g., Java or Jython in PerfExplorer can be used to analyze performance data stored in TAUdb through interfaces to Weka and other data mining packages), the resulting performance metrics or models must be stored in a common format. Such a format does not yet exist and must be defined in order to make analyses reproducible, reusable, extensible, and composable. Analysis results and models will be stored with provenance metadata describing the data that was analyzed and the experiments that generated the data.

Search Capabilities

The knowledge database will support advanced search capabilities beyond the low-level database queries available today in TAUdb and other performance databases. Because it will also store performance models, it will be possible to use the knowledge database to search for similar types of
computations without knowing a priori what they are. New tools for such an expert system will rely on the database to discover potentially similar performance data across applications, something that cannot be done automatically with today's state-of-the-art performance tools. Any discovered similarities can in turn be used to enrich the knowledge database with such associations, which can be used to expand the set of known computational patterns and common performance characteristics (or problems).

3.3 Integration with Autotuning Systems

Performance tuning tools today, including Orio, typically operate in a standalone fashion: given a code instance and a set of input parameters, they apply transformations and explore the space of possible transformations by using a number of optimization strategies (e.g., genetic algorithms, simulated annealing, Nelder-Mead-based optimization methods). Because purely empirical autotuning relies on compiling and executing optimized code variants in order to evaluate the objective function (e.g., time), the autotuning process can be costly in terms of time and computational resources. The knowledge database will provide input to, and collect metadata from, autotuning systems. It will be used to propose parameters for code transformations in source-to-source autotuning systems such as Orio [10], and will capture parameters and generated code variants as metadata alongside the original code, runtime system, and architectural data described above. If the system carries out algorithm recognition as described above, it can carry out transformations that involve replacing the implementation of the algorithm, as guided by models and data from past autotuning runs. The knowledge database could also be accessed by runtime-adaptive systems, which could deposit performance data while a program is running and retrieve data used to select an algorithm or other relevant parameters.
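As a rough illustration of the deposit-and-retrieve interaction described above, the following sketch models a knowledge store as an in-memory collection of tuning records with provenance metadata and a JSON round trip. The class names, field names, and JSON layout are assumptions made for this example; the paper does not define a schema, and a common storage format is stated to be future work.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class TuningRecord:
    """One autotuning trial; all field names are hypothetical."""
    kernel: str                      # what code was tuned
    architecture: str                # where it ran
    parameters: Dict[str, int]       # transformation parameters tried
    time_seconds: float              # measured objective
    provenance: Dict[str, str] = field(default_factory=dict)


class KnowledgeStore:
    """Minimal in-memory stand-in for a knowledge database."""

    def __init__(self) -> None:
        self._records: List[TuningRecord] = []

    def deposit(self, record: TuningRecord) -> None:
        """Called by an autotuner or a runtime-adaptive system."""
        self._records.append(record)

    def best_for(self, kernel: str, architecture: str) -> Optional[TuningRecord]:
        """Fastest known record for this kernel on this architecture, if any."""
        matches = [r for r in self._records
                   if r.kernel == kernel and r.architecture == architecture]
        return min(matches, key=lambda r: r.time_seconds) if matches else None

    def to_json(self) -> str:
        """Serialize all records so they can be shared between tools."""
        return json.dumps([asdict(r) for r in self._records], sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "KnowledgeStore":
        store = cls()
        for item in json.loads(text):
            store.deposit(TuningRecord(**item))
        return store


if __name__ == "__main__":
    store = KnowledgeStore()
    store.deposit(TuningRecord("matvec", "gpu-a", {"unroll": 2}, 0.9,
                               {"tool": "example-autotuner"}))
    store.deposit(TuningRecord("matvec", "gpu-a", {"unroll": 4}, 0.5,
                               {"tool": "example-autotuner"}))
    print(store.best_for("matvec", "gpu-a"))
```

Keeping provenance alongside each record is what would let another tool judge whether a stored result applies to its own code, dataset, and environment.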
We have previously used machine learning techniques for selecting good starting points for search in autotuning problems and for runtime selection from a library of specialized, autotuned code variants [6]. We used annotated performance data from autotuning runs across different CUDA targets to learn decision trees. Using a decision tree learned from a set of such devices to predict code transformation parameters for another device not previously tested significantly reduced the number of evaluations and overall time spent in search on that device, while still producing the same output code as a search from a default configuration. By autotuning matrix kernels across matrices of different sizes, we produced a library of size-specialized kernels. This system is shown in Figure 2. A specialized kernel could be selected by evaluating the decision tree at runtime when a wrapper kernel was invoked. We have recently added OpenCL code generation support to Orio, allowing a much wider range of possible target architectures, making the problem of selecting good initial parameters more difficult; a system combining profile data with architectural models and code characteristics would help with this.
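The runtime selection step can be sketched as follows: a wrapper computes a feature of its input, evaluates a small decision tree, and dispatches to the chosen variant. This is a pure-Python illustration under stated assumptions, not the CUDA implementation from [6]; the tree (a single split on matrix size with a threshold of 4), the variant names, and both kernels are hypothetical stand-ins for a learned tree and autotuned code variants.

```python
from typing import Callable, Dict, List, Union

Matrix = List[List[float]]
# A tree is either a leaf (variant name) or an internal node (dict).
Tree = Union[str, Dict[str, object]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Straightforward triple loop; stands in for a small-size variant."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_blocked(a: Matrix, b: Matrix, block: int = 2) -> Matrix:
    """Blocked loop nest; stands in for a large-size variant."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


VARIANTS: Dict[str, Callable[[Matrix, Matrix], Matrix]] = {
    "naive": matmul_naive,
    "blocked": matmul_blocked,
}

# Hypothetical tree: in practice it would be learned from autotuning data.
TREE: Tree = {"feature": "n", "threshold": 4, "left": "naive", "right": "blocked"}


def select_variant(tree: Tree, features: Dict[str, float]) -> str:
    """Walk the tree: go left when the feature value is <= the threshold."""
    while isinstance(tree, dict):
        value = features[tree["feature"]]
        tree = tree["left"] if value <= tree["threshold"] else tree["right"]
    return tree


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Wrapper kernel: evaluate the tree at call time, then dispatch."""
    name = select_variant(TREE, {"n": float(len(a))})
    return VARIANTS[name](a, b)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(matmul(identity, [[2.0, 3.0], [4.0, 5.0]]))
```

Evaluating the tree costs only a few comparisons per call, so the selection overhead stays small relative to the kernel itself.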
[Figure 2: diagram connecting the Orio code generator (OrCUDA, OpenCL), TAU measurement through the CUDA CUPTI callback library, TAUdb storage of profiles and metadata, and autotuning analysis (machine learning, optimization search, specialization).]

Fig. 2. Architecture of the existing system integrating TAU and Orio. The system stores annotated performance profiles in a database, which can be used to select starting points for autotuning experiments.

In addition to enabling the better selection of tuning parameters, the knowledge database and associated analysis infrastructure can support autotuning by providing models that can be used to guide the transformations themselves. We envision an approach that builds on the successful experience of autotuning compilers such as MilepostGCC [17, 20], which considers a set of features corresponding to program entities (e.g., functions, instructions and operands, variables, types, constants, basic blocks, and loops) and the relations among them (e.g., call graph, control-flow graph, loop hierarchy, control dependence graph, and data dependence graph). Unlike general compiler approaches, however, Orio can also capture some higher-level, possibly domain-specific information (e.g., stencil-based computations, linear algebra operations), which can be used to produce more accurate models of numerical kernels or algorithms. Given such models, autotuners such as Orio would no longer require the user to explicitly specify the set of transformations to perform but would instead select them based on the models created by analyzing the results from previous autotuning experiments in the knowledge database.

4 Conclusion

High-performance parallel computing is evolving towards systems of greater complexity, increasingly challenging our abilities to create HPC software productively
and effectively. Current parallel performance tools are necessary for understanding performance characteristics and inefficiencies, but they are insufficient by themselves to address complex optimization. On the other hand, autotuning frameworks are eager to obtain richer performance information wherever available and support within the tool environments for automated experimentation, data and results management, and sophisticated analysis. Our SYNAPT concept identifies the importance of knowledge integration as the key factor for tool synthesis, both with respect to information sharing and tool interoperation. The general idea is to shift the focus of attention to what tools are producing as knowledge and in what forms so that other tools can use the collective knowledge base for enabling greater functionality. We have taken initial steps towards the SYNAPT architecture with the Orio autotuning framework and TAU. Our early experiences of multi-target autotuning for accelerators support the benefits of improved automation, data mining, and association of tuning context with results, to aid in learning features and correlations. However, it is apparent from this work that the real challenge for SYNAPT research will be in formulation of techniques and methods for knowledge representation, management, and sharing.

References

1. LAPACK - Linear Algebra PACKage.
2. Balay, S., Brown, J., Buschelman, K., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes, L. C., Smith, B. F., and Zhang, H. PETSc Web page.
3. Bell, R., Malony, A., and Shende, S. A portable, extensible, and scalable tool for parallel performance profile analysis. In European Conference on Parallel Processing (EuroPar 2003) (Sept. 2003), vol. LNCS 2790.
4. Cavazos, J. Intelligent compilers. In Cluster Computing, 2008 IEEE International Conference on (Sept. 2008).
5. Cavazos, J., Fursin, G., Agakov, F., Bonilla, E., O'Boyle, M. F. P., and Temam, O. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC, USA, 2007), CGO '07, IEEE Computer Society.
6. Chaimov, N., Biersdorff, S., and Malony, A. D. Tools for machine-learning-based empirical autotuning and specialization. International Journal of High Performance Computing Applications 27, 4 (2013).
7. Fursin, G., and Temam, O. Collective optimization: A practical collaborative approach. ACM Trans. Archit. Code Optim. 7, 4 (Dec. 2010), 20:1 20:
8. Ganapathi, A., Datta, K., Fox, A., and Patterson, D. A case for machine learning to optimize multicore performance. In Proceedings of the First USENIX Conference on Hot Topics in Parallelism (Berkeley, CA, USA, 2009), HotPar '09, USENIX Association.
9. Geimer, M., Wolf, F., Wylie, B. J. N., and Mohr, B. Scalable parallel trace-based performance analysis. In Proceedings of the 13th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface (Bonn, Germany, 2006).
10. Hartono, A., Norris, B., and Sadayappan, P. Annotation-based empirical performance tuning using Orio. In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium (Rome, Italy, 2009).
11. Hernandez, V., Roman, J. E., and Vidal, V. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans. Math. Software 31, 3 (2005).
12. Huck, K., and Malony, A. PerfExplorer: A performance data mining framework for large-scale parallel computing. In Supercomputing Conference (SC 2005) (Nov. 2005), ACM.
13. Huck, K., Malony, A., Bell, R., and Morris, A. Design and implementation of a parallel performance data management framework. In International Conference on Parallel Processing (ICPP 2005) (Aug. 2005), IEEE Computer Society.
14. Huck, K., Malony, A., Shende, S., and Morris, A. Knowledge support and automation for performance analysis with PerfExplorer 2.0. The Journal of Scientific Programming 16, 2-3 (2008) (special issue on Large-Scale Programming Tools and Environments).
15. Lighthouse project.
16. Mellor-Crummey, J. HPCToolkit: Multi-platform tools for profile-based performance analysis. In 5th International Workshop on Automatic Performance Analysis (APART) (Nov. 2003).
17. MILEPOST GCC: Collaborative development website.
18. Monsifrot, A., Bodin, F., and Quiniou, R. A machine learning approach to automatic production of compiler heuristics. In Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications (London, UK, UK, 2002), AIMSA '02, Springer-Verlag.
19. Murthy, G., Ravishankar, M., Baskaran, M., and Sadayappan, P. Optimal loop unrolling for GPGPU programs. In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on (2010).
20. Namolaru, M., Cohen, A., Fursin, G., Zaks, A., and Freund, A. Practical aggregation of semantical program properties for machine learning based optimization. In Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (New York, NY, USA, 2010), CASES '10, ACM.
21. Shende, S., and Malony, A. TAU: The TAU parallel performance system. International Journal of High Performance Computing Applications 20, 2 (2006).
22. Stephenson, M., and Amarasinghe, S. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC, USA, 2005), CGO '05, IEEE Computer Society.
23. Taddy, M., Gramacy, R., and Polson, N. Dynamic trees for learning and design. Journal of the American Statistical Association 106, 493 (2011).
24. Taherkhani, A. Using decision tree classifiers in source code analysis to recognize algorithms: An experiment with sorting algorithms. The Computer Journal (2011).
25. Taherkhani, A. Automatic Algorithm Recognition Based on Programming Schemas and Beacons: A Supervised Machine Learning Classification Approach. PhD thesis, Aalto University, Esbo, Finland.
26. Yifan, Z. Extensions of an empirical automated tuning framework. Master's thesis, University of Maryland, 2013.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Performance Tools for Parallel Java Environments
Performance Tools for Parallel Java Environments Sameer Shende and Allen D. Malony Department of Computer and Information Science, University of Oregon {sameer,malony}@cs.uoregon.edu http://www.cs.uoregon.edu/research/paracomp/tau
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), [email protected], Velalar College of Engineering and Technology,
Bachelor Degree in Informatics Engineering Master courses
Bachelor Degree in Informatics Engineering Master courses Donostia School of Informatics The University of the Basque Country, UPV/EHU For more information: Universidad del País Vasco / Euskal Herriko
Industrial Adoption of Automatically Extracted GUI Models for Testing
Industrial Adoption of Automatically Extracted GUI Models for Testing Pekka Aho 1,2 [email protected], Matias Suarez 3 [email protected], Teemu Kanstrén 1,4 [email protected], and Atif M. Memon
Application Performance Analysis Tools and Techniques
Mitglied der Helmholtz-Gemeinschaft Application Performance Analysis Tools and Techniques 2012-06-27 Christian Rössel Jülich Supercomputing Centre [email protected] EU-US HPC Summer School Dublin
Scalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
Introduction to Arvados. A Curoverse White Paper
Introduction to Arvados A Curoverse White Paper Contents Arvados in a Nutshell... 4 Why Teams Choose Arvados... 4 The Technical Architecture... 6 System Capabilities... 7 Commitment to Open Source... 12
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
A Framework of User-Driven Data Analytics in the Cloud for Course Management
A Framework of User-Driven Data Analytics in the Cloud for Course Management Jie ZHANG 1, William Chandra TJHI 2, Bu Sung LEE 1, Kee Khoon LEE 2, Julita VASSILEVA 3 & Chee Kit LOOI 4 1 School of Computer
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
Tool Support for Inspecting the Code Quality of HPC Applications
Tool Support for Inspecting the Code Quality of HPC Applications Thomas Panas Dan Quinlan Richard Vuduc Center for Applied Scientific Computing Lawrence Livermore National Laboratory P.O. Box 808, L-550
Praseeda Manoj Department of Computer Science Muscat College, Sultanate of Oman
International Journal of Electronics and Computer Science Engineering 290 Available Online at www.ijecse.org ISSN- 2277-1956 Analysis of Grid Based Distributed Data Mining System for Service Oriented Frameworks
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology [email protected] Abstract This paper is
PAPI - PERFORMANCE API. ANDRÉ PEREIRA [email protected]
1 PAPI - PERFORMANCE API ANDRÉ PEREIRA [email protected] 2 Motivation Application and functions execution time is easy to measure time gprof valgrind (callgrind) It is enough to identify bottlenecks,
Course Syllabus For Operations Management. Management Information Systems
For Operations Management and Management Information Systems Department School Year First Year First Year First Year Second year Second year Second year Third year Third year Third year Third year Third
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
City Data Pipeline. A System for Making Open Data Useful for Cities. [email protected]
City Data Pipeline A System for Making Open Data Useful for Cities Stefan Bischof 1,2, Axel Polleres 1, and Simon Sperl 1 1 Siemens AG Österreich, Siemensstraße 90, 1211 Vienna, Austria {bischof.stefan,axel.polleres,simon.sperl}@siemens.com
Zhenping Liu *, Yao Liang * Virginia Polytechnic Institute and State University. Xu Liang ** University of California, Berkeley
P1.1 AN INTEGRATED DATA MANAGEMENT, RETRIEVAL AND VISUALIZATION SYSTEM FOR EARTH SCIENCE DATASETS Zhenping Liu *, Yao Liang * Virginia Polytechnic Institute and State University Xu Liang ** University
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
An Open MPI-based Cloud Computing Service Architecture
An Open MPI-based Cloud Computing Service Architecture WEI-MIN JENG and HSIEH-CHE TSAI Department of Computer Science Information Management Soochow University Taipei, Taiwan {wjeng, 00356001}@csim.scu.edu.tw
Data Warehousing and OLAP Technology for Knowledge Discovery
542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
MAQAO Performance Analysis and Optimization Tool
MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
The University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
A Conceptual Approach to Data Visualization for User Interface Design of Smart Grid Operation Tools
A Conceptual Approach to Data Visualization for User Interface Design of Smart Grid Operation Tools Dong-Joo Kang and Sunju Park Yonsei University [email protected], [email protected] Abstract
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated
Mobile Cloud Computing for Data-Intensive Applications
Mobile Cloud Computing for Data-Intensive Applications Senior Thesis Final Report Vincent Teo, [email protected] Advisor: Professor Priya Narasimhan, [email protected] Abstract The computational and storage
2. Distributed Handwriting Recognition. Abstract. 1. Introduction
XPEN: An XML Based Format for Distributed Online Handwriting Recognition A.P.Lenaghan, R.R.Malyan, School of Computing and Information Systems, Kingston University, UK {a.lenaghan,r.malyan}@kingston.ac.uk
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
Recent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS
PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, [email protected]; Third C.
A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications
A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications Kai Li, Allen D. Malony, Robert Bell, and Sameer Shende University of Oregon, Eugene, OR 97403 USA, {likai,malony,bertie,sameer}@cs.uoregon.edu
Li Sheng. [email protected]. Nowadays, with the booming development of network-based computing, more and more
36326584 Li Sheng Virtual Machine Technology for Cloud Computing Li Sheng [email protected] Abstract: Nowadays, with the booming development of network-based computing, more and more Internet service vendors
Search and Data Mining: Techniques. Introduction Anna Yarygina Boris Novikov
Search and Data Mining: Techniques Introduction Anna Yarygina Boris Novikov Data Analytics: Conference Sections Fundamentals for data analytics Mechanisms and features Big Data Huge data Target analytics
An Artificial Immune Model for Network Intrusion Detection
An Artificial Immune Model for Network Intrusion Detection Jungwon Kim and Peter Bentley Department of Computer Science, University Collge London Gower Street, London, WC1E 6BT, U. K. Phone: +44-171-380-7329,
European Data Infrastructure - EUDAT Data Services & Tools
European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
Optimization of Image Search from Photo Sharing Websites Using Personal Data
Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search
Model-Driven Cloud Data Storage
Model-Driven Cloud Data Storage Juan Castrejón 1, Genoveva Vargas-Solar 1, Christine Collet 1, and Rafael Lozano 2 1 Université de Grenoble, LIG-LAFMIA, 681 rue de la Passerelle, Saint Martin d Hères,
