Fast Multipole Method for particle interactions: an open source parallel library component
|
|
- Jonah Burke
- 8 years ago
- Views:
Transcription
1 Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, United Kingdom 2 Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL U.S.A. Abstract. The fast multipole method is used in many scientific computing applications such as astrophysics, fluid dynamics, electrostatics and others. It is capable of greatly accelerating calculations involving pair-wise interactions, but one impediment to a more widespread use is the algorithmic complexity and programming effort required to implement this method. We are developing an open source, parallel implementation of the fast multipole method, to be made available as a component of the PETSc library. In this process, we also contribute to the understanding of how the accuracy of the multipole approximation depends on the parameter choices available to the user. Moreover, the proposed parallelization strategy provides optimizations for automatic data decomposition and load balancing. 1INTRODUCTION The advantages of the fast multipole method (FMM) for accelerating pair-wise interactions or N-body problems are well-known. In theory, one can reduce an O(N 2 ) calculation to O(N),whichhasahugeimpactinsimulationsusingparticlemethods. Considering this impact, it is perhaps surprising that the adoption of the FMM algorithm has not been more widespread. There are two main reasons for its seemingly slow adoption; first, the scaling of the FMM can really only be achieved for simulations involving very large numbers of particles, say, larger than Soonly those researchers interested in solving large problems will see an advantage with the method. More importantly, perhaps, is the fact that the FMM requires considerable extra programming effort, when compared with other algorithms like particle-mesh methods, or treecodes providing O(N log N) complexity. One could argue that a similar concern has been experienced in relation to most advanced algorithms. For example, when faced with a problem resulting in a large system of algebraic equations to be solved, most scientists would be hard pressed to have to program a modern iterative solution method, such as a generalized minimum residual (GMRES) method. Their choice, in the face of programming from scratch, will most likely be direct Gaussian elimination, or if attempting an iterative method, D. Tromeur-Dervout (eds.), Parallel Computational Fluid Dynamics 2008, Lecture Notes in Computational Science and Engineering 74, DOI: / , c Springer-Verlag Berlin Heidelberg 2010
2 286 F. A. Cruz, M. G. Knepley, and L. A. Barba the simplest to implement but slow to converge Jacobi method. Fortunately, there is no need to make this choice, as we nowadays have available a wealth of libraries for solving linear systems with a variety of advanced methods. What s more, there are available parallel implementations of many librarires, for solving large problems in a distributed computational resource. One of these tools is the PETSc library for large-scale scientific computing [1]. This library has been under development for more than 15 years and offers distributed arrays, and parallel vector and matrix operations, as well as a complete suite of solvers, and much more. We propose that a parallel implementation of the FMM, provided as a library component in PETSc, is a welcome contribution to the diverse scientific applications that will benefit from the acceleration of this algorithm. In this paper, we present an ongoing project which is developing such a library component, offering an open source, parallel FMM implementation, which furthermore will be supported and maintained via the PETSc project. In the development of this software component, we have also investigated the features of the FMM approximation, to offer a deeper understanding of how the accuracy depends on the parameters. Moreover, the parallelization strategy involves an optimization approach for data decomposition among processors andloadbalancing,thatshouldmakethis averyusefullibraryforcomputationalscientists. This paper is presented as follows, in section 2 wepresentanoverviewofthe original FMM [2], in section 3 we present our proposed parallelization strategy, and in 4 scaling results of the parallel implementation are presented. 2CHARACTERIZATIONOFTHEMULTIPOLE APPROXIMATION 2.1 Overview of the algorithm The FMM is a fast summation method that accelerates the multiple evaluation of functions of the form: f (y j )= N i=1 c i K(y j,x i ) (1) where the function f ( ) is evaluated at a set of {y j } locations. In a single evaluation of (1) thesumoverasetofsources{(c i,x i )} is carried out, where each source is characterized by its weight c i and position x i.therelationbetweentheevaluation and source points is given by the the kernel K, ingeneral,thekernelfunctionis required to decay monotonically. In order to accelerate the computations, the FMM uses the idea that the influence of a cluster of sources can be approximated by an agglomerated quantity, when such influence is evaluated far enough away from the cluster itself. The method works by dividing the computational domain into a near-domain and a far-domain: Near domain: contains all the particles that are near the evaluation point, and is usually a minor fraction of all the N particles. The influence of the near-domain is
3 Fast Multipole Method for particle interactions 287 computed by directly evaluating the pair-wise particle interactions with (1). The computational cost of directly evaluating the near domain is not dominant as the near-domain remains small. Far domain: contains all the particles that are far away from the evaluation point, and ideally contains most of the N particles of the domain. The evaluation of the far domain will be sped-up by evaluating the approximated influence of clusters of particles rather than computing the interaction with every particle of the system. The approximation of the influence of a cluster is represented as Multipole Expansions (MEs) and as Local Expansions (LEs); these two different representations of the cluster are the key ideas behind thefmm.themesandlesaretaylorseries (or other) that converge in different subdomains of space. The center of the series for an ME is the center of the cluster of source particles, and it only converges outside the cluster of particles. In the case of an LE, the series is centered near an evaluation point and converges locally. The first step of the FMM is to hierarchically subdivide space in order to form the clusters of particles; this is accomplished byusingatreestructure,illustratedin Figure 1,torepresenteachsubdivision.Inaone-dimensionalexample:level0isthe whole domain, which is split in two halves at level 1, and so on up to level l. The spatial decomposition for higher dimensions follows the same idea but changing the numberof subdivisions.in two dimensions,each domainis divided in four, to obtain aquadtree,whileinthreedimensions,domainsaresplitin8toobtainanoct-tree. Independently of the dimension of the problem, we can always make a flat drawing of the tree as in Fig. 1, with the only difference between dimensions being the number of branches coming out of each node of the flat tree. Level 0 Level 1 Level 1 Level 2 Level 2 Level 3 Level 4 Fig. 1. Sketch of a one-dimensional domain (right), divided hierarchically using a binary tree (left), to illustrate the meaning of levels in a tree and the idea of a final leaf holding a set of particles at the deepest level.
4 288 F. A. Cruz, M. G. Knepley, and L. A. Barba Upward Sweep To sibling: ME to LE Downward Sweep Translate ME to grand-parent Translate ME to parent Level 2 To child: LE to LE Create Multipole Expansion, ME Local Expansion, LE Fig. 2. Illustration of the upward sweep and the downward sweep of the tree. The multipole expansions (ME) are created at the deepest level, then translated upwards to the center of the parent cells. The MEs are then translated to a local expansion (LE) for the siblings at all levels deeper than level 2, and then translated downward to children cells. Finally, the LEs are created at the deepest levels. After the space decomposition, the FMM builds the MEs for each node of the tree in a recursive manner. The MEs are built first at the deepest level, level l, andthen translated to the center of the parent cell, recursively creating the ME of the whole tree. This is referred to as the upward sweep of the tree. Then, in the downward sweep the MEs are first translated into LEs for all the boxes in the interaction list. At each level, the interaction list corresponds to the cells of the same level that are in the far field for a given cell, this is defined by simple relations between nodes of the tree structure. Finally, the LEs of upper levels are added up to obtain the complete far domain influence for each box at the leaf level of the tree. The result is added to the local influence, calculated directly with Equation (1).These ideas are better visualized with an illustration, as provided in Figure 2.Formoredetailsofthe algorithm, we cite the original reference [2]. 3PARALLELIZATIONSTRATEGY In the implementation of the FMM for parallel systems, the contribution of this work lies on automatically performing load balance, while minimizing the communications between processes. To automatically perform this task, first the FMM is decomposed into more basic algorithmic elements, and then we automatically optimize the distribution of these elements across the available processes, using as criteria the load balance and minimizing communications. In our parallel strategy, we use sub-trees of the FMM as the basic algorithmic element to distribute across processes. In order to partition work among processes, we cut the tree at a certain level k, dividingitintoaroot tree and 2 dk local trees,
5 Fast Multipole Method for particle interactions 289 Data decomposition Level k Sub-tree Sub-tree Fig. 3. Sketch to illustrate the parallelization strategy. The image depicts the FMM algorithm as a tree, this tree is then cut at a chosen level k,andalltheresultingsub-treesaredistributed among the available processes. as seen in Figure 3. Byassigningmultiplesub-treestoanygivenprocess,wecan achieve both load balance and minimimal communication. A basic requirement of our approach is that when decomposing the FMM, we produce more sub-trees than available processes. To optimally distribute the sub-trees among the processes, we change the problem of distributing the sub-trees into a graph partitioning problem, that we later partition in as many parts as processes available. First, we assemble a graph whose vertices are the sub-trees, with edges (i, j) indicating that a cell c in sub-tree j is in the interaction list of cell c in sub-tree i, asseeninfigure4. Thenweightsare assigned to each vertex i, indicatingtheamountofcomputationalworkperformed by the sub-tree i, andtoeachedge(i, j) indicating the communication cost between sub-trees i and j. One of the advantages of using the sub-trees as a basic algorithmic elements, is that due to its well defined structure, we can rapidly estimate the amount of work and communication performed by the sub-tree. In order to produce these estimates, the only information that we require is the depth of the sub-trees and neighbor sub-tree information. By using the work and communication estimates, we have a weighted graph representation of the FMM. The graph is now partitioned, for instance using ParMetis [3], andthetree informationcan be distributedto therelevantprocesses. Advantages of this approach, over a space-filling curve partitioning for example, include its simplicity, the reuse of the serial tree data structure, and reuse of existing partitioning tools. Moreover, purely local data and data communicated from other processes are handled using the same parallel structure, known as Sieve [4]. The Sieve framework provides support for parallel, scalable, unstructures data structures, and the operations defined over them. In our parallel implementation, the use of
6 290 F. A. Cruz, M. G. Knepley, and L. A. Barba v i e ij v j Fig. 4. Sketch that illustrates the graph representation of the sub-trees. The FMM sub-trees are represented as vertex in the graph, and the communication between sub-trees is represented by the edges between corresponding vertices. Each vertex has an associated weight v i,that represents an estimate of the total work performed by the sub-tree. Each edge has an associated weight e ij,thatrepresentsanestimateoftheworkbetweensub-treesv i and v j.byfindingan optimal partitioning of the graph into as many section as processes available, we achieve load balance while minimizing communications. In the figure the graph has been partitioned in three sections, the boundaries between sections are represented by the dashed lines, and all the sub-trees that belongs to the same section are assigned to the same process. Sieve reduces the parallelism to a single operation for neighbor and interaction list exchange. 4RESULTS In this section we present results of the parallel implementation on a multicore architecture, a Mac Pro machine, equipped with two quad-core processors and 8GB in RAM. The implementation is being currently integrated into the open source PETSc library. In Figure 5, we present experimental results of the timings obtained from an optimized version of PETSc library using its built-in profiling tools. The experiments were run on a multicore setup with up to four cores working in parallel, while varying the number of particles in the system. We performed experiments from three thousand particles up to system with more than two million particles. All the experiments were run for a fixed set of parameters of the FMM algorithm, in all cases we used parameters that ensure us to achieve high accuracy an performance (FMM expansion terms p = 17, and FMM tree level l = 8). In the experimental results, we were able to evaluate a problem with up to two million particles in under 20 seconds. To put this in context, if we consider a problem
7 Fast Multipole Method for particle interactions cpu 2 cpu 3 cpu 4 cpu Time [sec] Number of particles Fig. 5. Log-log plot of the total execution time of the FMM parallel algorithm for fixed parameters (expansion terms p = 17, and FMM tree level l = 8) vs. problem size (number of particles in the system). In the figure, each line represents the total amount of time used by the parallel FMM to compute the interaction in the system when using 1, 2, 3, and 4 processors. with one million particles (N = 10 6 ), the direct computation of the N 2 operations would be proportional to 1 Tera floating point operations (10 12 operations), in order to compute all the particles interactions of the system. Instead, the FMM algorithm accelerates the computations and achieve an approximate result with high accuracy in only 10 Giga floating point operations, meaning 100 times less floating point operations. 5CONCLUSIONSANDFUTUREWORK In this paper we have presented the first open source project of a parallel Fast Multipole Method library. As second contribution in this work, we also introduced a new parallelization strategy for the Fast Multipole Method algorithm that automatically achieves load balance and minimization of the communication between processes. The current implementation iscapableofsolvingn-bodyproblemsformillionsof unknowns in a multicore machine with good results. We are currently tuning the library and running experiments on a cluster setup. With this paper we expect to contribute to widespread the use of fast algorithms by the scientific community, and the open source characteristic of the FMM library
8 292 F. A. Cruz, M. G. Knepley, and L. A. Barba is the first step towards this goal. The open source project is currently being incorporated into the PETSc library [1] and a first version of the software is going to be released by October [1] S. Balay, K. Buschelman, W. D. Gropp, D. Kaushik, M. Knepley, L. Curfman- McInnes, B. F. Smith, and H. Zhang. PETSc User s Manual. Technical Report ANL-95/11 - Revision 2.1.5, Argonne National Laboratory, [2] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys.,73(2): ,1987. [3] George Karypis and Vipin Kumar. A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. Journal of Parallel and Distributed Computing,48:71 85,1998. [4] Matthew G. Knepley and Dmitry A. Karpeev. Mesh algorithms for pde with sieve i: Mesh distribution. Scientific Programming,2008.toappear.
PetFMM A dynamically load-balancing parallel fast multipole library
PetFMM A dynamically load-balancing parallel fast multipole library Felipe A. Cruz a Matthew G. Knepley b L. A. Barba c, a Department of Mathematics, University of Bristol, England BS8 1TW arxiv:0905.2637v1
More informationHPC Deployment of OpenFOAM in an Industrial Setting
HPC Deployment of OpenFOAM in an Industrial Setting Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd, United Kingdom PRACE Seminar: Industrial Usage of HPC Stockholm, Sweden, 28-29 March 2011 HPC Deployment
More informationMesh Generation and Load Balancing
Mesh Generation and Load Balancing Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee April 04, 2012 CS 594 04/04/2012 Slide 1 / 19 Outline Motivation Reliable
More informationFRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite
More informationPROVABLY GOOD PARTITIONING AND LOAD BALANCING ALGORITHMS FOR PARALLEL ADAPTIVE N-BODY SIMULATION
SIAM J. SCI. COMPUT. c 1998 Society for Industrial and Applied Mathematics Vol. 19, No. 2, pp. 635 656, March 1998 019 PROVABLY GOOD PARTITIONING AND LOAD BALANCING ALGORITHMS FOR PARALLEL ADAPTIVE N-BODY
More informationLoad Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity
Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity Jaswinder Pal Singh, Chris Holt, Takashi Totsuka, Anoop Gupta and John L. Hennessy Computer
More informationDistributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationCharacterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies
Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems November 3-6, 1999 in Cambridge Massachusetts, USA Characterizing the Performance of Dynamic Distribution
More informationYousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008
A tutorial on: Iterative methods for Sparse Matrix Problems Yousef Saad University of Minnesota Computer Science and Engineering CRM Montreal - April 30, 2008 Outline Part 1 Sparse matrices and sparsity
More informationParallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
More informationHPC enabling of OpenFOAM R for CFD applications
HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,
More informationPartitioning and Divide and Conquer Strategies
and Divide and Conquer Strategies Lecture 4 and Strategies Strategies Data partitioning aka domain decomposition Functional decomposition Lecture 4 and Strategies Quiz 4.1 For nuclear reactor simulation,
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
More informationPerformance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions
More informationPoisson Equation Solver Parallelisation for Particle-in-Cell Model
WDS'14 Proceedings of Contributed Papers Physics, 233 237, 214. ISBN 978-8-7378-276-4 MATFYZPRESS Poisson Equation Solver Parallelisation for Particle-in-Cell Model A. Podolník, 1,2 M. Komm, 1 R. Dejarnac,
More informationABSTRACT FOR THE 1ST INTERNATIONAL WORKSHOP ON HIGH ORDER CFD METHODS
1 ABSTRACT FOR THE 1ST INTERNATIONAL WORKSHOP ON HIGH ORDER CFD METHODS Sreenivas Varadan a, Kentaro Hara b, Eric Johnsen a, Bram Van Leer b a. Department of Mechanical Engineering, University of Michigan,
More informationA Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906
More informationAN APPROACH FOR SECURE CLOUD COMPUTING FOR FEM SIMULATION
AN APPROACH FOR SECURE CLOUD COMPUTING FOR FEM SIMULATION Jörg Frochte *, Christof Kaufmann, Patrick Bouillon Dept. of Electrical Engineering and Computer Science Bochum University of Applied Science 42579
More informationCode Generation Tools for PDEs. Matthew Knepley PETSc Developer Mathematics and Computer Science Division Argonne National Laboratory
Code Generation Tools for PDEs Matthew Knepley PETSc Developer Mathematics and Computer Science Division Argonne National Laboratory Talk Objectives Introduce Code Generation Tools - Installation - Use
More informationHigh Performance Computing for Operation Research
High Performance Computing for Operation Research IEF - Paris Sud University claude.tadonki@u-psud.fr INRIA-Alchemy seminar, Thursday March 17 Research topics Fundamental Aspects of Algorithms and Complexity
More informationCluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
More informationDesigning and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
More informationBranch-and-Price Approach to the Vehicle Routing Problem with Time Windows
TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents
More informationInteractive comment on A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc by W. He et al.
Geosci. Model Dev. Discuss., 8, C1166 C1176, 2015 www.geosci-model-dev-discuss.net/8/c1166/2015/ Author(s) 2015. This work is distributed under the Creative Commons Attribute 3.0 License. Geoscientific
More informationNumerical Algorithms Group. Embedded Analytics. A cure for the common code. www.nag.com. Results Matter. Trust NAG.
Embedded Analytics A cure for the common code www.nag.com Results Matter. Trust NAG. Executive Summary How much information is there in your data? How much is hidden from you, because you don t have access
More informationP013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE
1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. Bois-Préau. 92852 Rueil Malmaison Cedex. France
More informationThe enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images
The enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images Borusyak A.V. Research Institute of Applied Mathematics and Cybernetics Lobachevsky Nizhni Novgorod
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
More information?kt. An Unconventional Method for Load Balancing. w = C ( t m a z - ti) = p(tmaz - 0i=l. 1 Introduction. R. Alan McCoy,*
ENL-62052 An Unconventional Method for Load Balancing Yuefan Deng,* R. Alan McCoy,* Robert B. Marr,t Ronald F. Peierlst Abstract A new method of load balancing is introduced based on the idea of dynamically
More informationParallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation
Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation Ying Peng, Bin Gong, Hui Liu, and Yanxin Zhang School of Computer Science and Technology, Shandong University,
More informationDynamic Load Balancing of the Adaptive Fast Multipole Method in Heterogeneous Systems
2013 IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum Dynamic Load Balancing of the Adaptive Fast Multipole Method in Heterogeneous Systems Robert E. Overman,
More informationNumerical Calculation of Laminar Flame Propagation with Parallelism Assignment ZERO, CS 267, UC Berkeley, Spring 2015
Numerical Calculation of Laminar Flame Propagation with Parallelism Assignment ZERO, CS 267, UC Berkeley, Spring 2015 Xian Shi 1 bio I am a second-year Ph.D. student from Combustion Analysis/Modeling Lab,
More informationAn analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
More informationLoad Balancing Strategies for Parallel SAMR Algorithms
Proposal for a Summer Undergraduate Research Fellowship 2005 Computer science / Applied and Computational Mathematics Load Balancing Strategies for Parallel SAMR Algorithms Randolf Rotta Institut für Informatik,
More informationCluster Computing at HRI
Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing
More informationDATA MINING METHODS WITH TREES
DATA MINING METHODS WITH TREES Marta Žambochová 1. Introduction The contemporary world is characterized by the explosion of an enormous volume of data deposited into databases. Sharp competition contributes
More informationMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,
More informationVISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills
VISUALIZING HIERARCHICAL DATA Graham Wills SPSS Inc., http://willsfamily.org/gwills SYNONYMS Hierarchical Graph Layout, Visualizing Trees, Tree Drawing, Information Visualization on Hierarchies; Hierarchical
More informationA Theory of the Spatial Computational Domain
A Theory of the Spatial Computational Domain Shaowen Wang 1 and Marc P. Armstrong 2 1 Academic Technologies Research Services and Department of Geography, The University of Iowa Iowa City, IA 52242 Tel:
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationParallel Algorithm for Dense Matrix Multiplication
Parallel Algorithm for Dense Matrix Multiplication CSE633 Parallel Algorithms Fall 2012 Ortega, Patricia Outline Problem definition Assumptions Implementation Test Results Future work Conclusions Problem
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationPower-Aware High-Performance Scientific Computing
Power-Aware High-Performance Scientific Computing Padma Raghavan Scalable Computing Laboratory Department of Computer Science Engineering The Pennsylvania State University http://www.cse.psu.edu/~raghavan
More informationPERFORMANCE ANALYSIS AND OPTIMIZATION OF LARGE-SCALE SCIENTIFIC APPLICATIONS JINGJIN WU
PERFORMANCE ANALYSIS AND OPTIMIZATION OF LARGE-SCALE SCIENTIFIC APPLICATIONS BY JINGJIN WU Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
More informationA Load Balancing Tool for Structured Multi-Block Grid CFD Applications
A Load Balancing Tool for Structured Multi-Block Grid CFD Applications K. P. Apponsah and D. W. Zingg University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, M3H 5T6, Canada Email:
More informationNumerical Analysis. Professor Donna Calhoun. Fall 2013 Math 465/565. Office : MG241A Office Hours : Wednesday 10:00-12:00 and 1:00-3:00
Numerical Analysis Professor Donna Calhoun Office : MG241A Office Hours : Wednesday 10:00-12:00 and 1:00-3:00 Fall 2013 Math 465/565 http://math.boisestate.edu/~calhoun/teaching/math565_fall2013 What is
More informationBig Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
More informationDesign and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,
More informationHome Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit
Data Structures Page 1 of 24 A.1. Arrays (Vectors) n-element vector start address + ielementsize 0 +1 +2 +3 +4... +n-1 start address continuous memory block static, if size is known at compile time dynamic,
More informationBSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,
More informationThe Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationMesh Partitioning and Load Balancing
and Load Balancing Contents: Introduction / Motivation Goals of Load Balancing Structures Tools Slide Flow Chart of a Parallel (Dynamic) Application Partitioning of the initial mesh Computation Iteration
More informationLarge-Scale Reservoir Simulation and Big Data Visualization
Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)
More informationAccelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism
Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Jianqiang Dong, Fei Wang and Bo Yuan Intelligent Computing Lab, Division of Informatics Graduate School at Shenzhen,
More informationHPC Infrastructure Development in Bulgaria
HPC Infrastructure Development in Bulgaria Svetozar Margenov margenov@parallel.bas.bg Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev Str. Bl. 25-A,
More informationHash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization
Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03018-1 Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization Michael Griebel and Gerhard Zumbusch
More informationSIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs
More informationIntroduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
More informationMachine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
More informationNEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES
NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES Silvija Vlah Kristina Soric Visnja Vojvodic Rosenzweig Department of Mathematics
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationIterative Solvers for Linear Systems
9th SimLab Course on Parallel Numerical Simulation, 4.10 8.10.2010 Iterative Solvers for Linear Systems Bernhard Gatzhammer Chair of Scientific Computing in Computer Science Technische Universität München
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationPortable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications Rupak Biswas MRJ Technology Solutions NASA Ames Research Center Moffett Field, CA 9435, USA rbiswas@nas.nasa.gov
More informationPresto/Blockus: Towards Scalable R Data Analysis
/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIA-UIUC-AL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew
More informationMixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms
Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State
More informationParallel Simplification of Large Meshes on PC Clusters
Parallel Simplification of Large Meshes on PC Clusters Hua Xiong, Xiaohong Jiang, Yaping Zhang, Jiaoying Shi State Key Lab of CAD&CG, College of Computer Science Zhejiang University Hangzhou, China April
More informationBest practices for efficient HPC performance with large models
Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,
More informationNew Challenges in Dynamic Load Balancing
New Challenges in Dynamic Load Balancing Karen D. Devine 1, Erik G. Boman, Robert T. Heaphy, Bruce A. Hendrickson Discrete Algorithms and Mathematics Department, Sandia National Laboratories, Albuquerque,
More informationAN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS
AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS by M. Storti, L. Dalcín, R. Paz Centro Internacional de Métodos Numéricos en Ingeniería - CIMEC INTEC, (CONICET-UNL), Santa Fe, Argentina
More informationHigh Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
More informationLoad balancing. David Bindel. 12 Nov 2015
Load balancing David Bindel 12 Nov 2015 Inefficiencies in parallel code Poor single processor performance Typically in the memory system Saw this in matrix multiply assignment Overhead for parallelism
More informationNETZCOPE - a tool to analyze and display complex R&D collaboration networks
The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.
More informationMultiphase Flow - Appendices
Discovery Laboratory Multiphase Flow - Appendices 1. Creating a Mesh 1.1. What is a geometry? The geometry used in a CFD simulation defines the problem domain and boundaries; it is the area (2D) or volume
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationLarge Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization Hank Childs + D. Pugmire, D. Camp, C. Garth, G.
Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization Hank Childs + D. Pugmire, D. Camp, C. Garth, G. Weber, S. Ahern, & K. Joy Lawrence Berkeley National Laboratory
More informationScientific Computing Programming with Parallel Objects
Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore
More informationData Mining with Hadoop at TACC
Data Mining with Hadoop at TACC Weijia Xu Data Mining & Statistics Data Mining & Statistics Group Main activities Research and Development Developing new data mining and analysis solutions for practical
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
More informationGeneral Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
More informationTHE NAS KERNEL BENCHMARK PROGRAM
THE NAS KERNEL BENCHMARK PROGRAM David H. Bailey and John T. Barton Numerical Aerodynamic Simulations Systems Division NASA Ames Research Center June 13, 1986 SUMMARY A benchmark test program that measures
More informationSpatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets-
Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.603-608 (2011) ARTICLE Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets- Hiroko Nakamura MIYAMURA 1,*, Sachiko
More informationOn-Demand Supercomputing Multiplies the Possibilities
Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Image courtesy of Wolfram Research, Inc. On-Demand Supercomputing Multiplies the Possibilities Microsoft Windows Compute Cluster Server
More informationTopological Properties
Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any
More informationMethodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph
More informationA Refinement-tree Based Partitioning Method for Dynamic Load Balancing with Adaptively Refined Grids
A Refinement-tree Based Partitioning Method for Dynamic Load Balancing with Adaptively Refined Grids William F. Mitchell Mathematical and Computational Sciences Division National nstitute of Standards
More informationPRODUCT INFORMATION. Insight+ Uses and Features
PRODUCT INFORMATION Insight+ Traditionally, CAE NVH data and results have been presented as plots, graphs and numbers. But, noise and vibration must be experienced to fully comprehend its effects on vehicle
More informationHigh Performance Matrix Inversion with Several GPUs
High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República
More informationOverlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer
More informationCCTech TM. ICEM-CFD & FLUENT Software Training. Course Brochure. Simulation is The Future
. CCTech TM Simulation is The Future ICEM-CFD & FLUENT Software Training Course Brochure About. CCTech Established in 2006 by alumni of IIT Bombay. Our motive is to establish a knowledge centric organization
More informationThe Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data
The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data Lawrence Livermore National Laboratory? Hank Childs (LBNL) and Charles Doutriaux (LLNL) September
More informationScalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
More informationDistributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
More informationJunghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea
Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITION-BASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun
More information14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)
Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,
More informationOpenFOAM Optimization Tools
OpenFOAM Optimization Tools Henrik Rusche and Aleks Jemcov h.rusche@wikki-gmbh.de and a.jemcov@wikki.co.uk Wikki, Germany and United Kingdom OpenFOAM Optimization Tools p. 1 Agenda Objective Review optimisation
More informationMapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
More informationDistributed Particle Simulation Method on Adaptive Collaborative System
Distributed Particle Simulation Method on Adaptive Collaborative System Yudong Sun, Zhengyu Liang, and Cho-Li Wang Department of Computer Science and Information Systems The University of Hong Kong, Pokfulam
More information