A New Scalable Parallel Method for Molecular Dynamics Based on Cell-Block Data Structure
Cao Xiaolin, Mo Zeyao
High Performance Computing Center, State Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, P. O. Box 8009, 100088, Beijing, P. R. China
{xiaolincao,

Abstract. A scalable parallel algorithm for large-scale three-dimensional simulations with severely non-uniform particle distributions is presented. Based on a cell-block data structure, the algorithm uses a Hilbert space-filling curve to convert the three-dimensional domain decomposition, which distributes the load across processors, into a one-dimensional load-balancing problem, to which the measurement-based multilevel averaging weights (MAW) method can be applied successfully. Unlike inverse space-filling partitioning (ISP), MAW redistributes blocks by monitoring the change of the total load in each processor. Numerical experiments show that MAW renders better-balanced loads than ISP for large-scale multi-medium MD simulations in high-temperature and high-pressure physics. Excellent scalability was demonstrated, with a speedup larger than 200 on 240 processors of one MPP. The largest run, with 1.1 × 10^9 particles on 500 processors, took 80 seconds per time step.

1 Introduction

Molecular dynamics (MD) simulation is an important tool for studying the properties of condensed matter and their dynamic interactions, which can be difficult to obtain by other means. To make reasonable comparisons with experiments, it is often necessary to simulate features on the micron scale. Realistic MD simulations of this size require at least 10^8 to 10^9 particles, preferably more. A non-uniform spatial distribution of various kinds of particles produces a highly irregular computational load. Moreover, as the simulation evolves, the movement of particles changes the load distribution across the processors used.
These factors make high scalability difficult to achieve, so a good load-balancing scheme is necessary.

Research supported by Chinese NSF ( ), the Chinese 863 Program (2002AA04570) and CAEP Funds.
Mo [1] presented a robust iterative 1-D DLB algorithm (MAW) suitable for 2-D parallel link-cell MD simulation. Hayashi [2] generalized the cellular automaton diffusion scheme to 3-D simulation by introducing the concept of permanent cells to minimize inter-processor communication overhead. Comparing ISP with ORB and ORB-MM, Pilkington [3] showed that, of the three strategies, only Inverse Space-filling Partitioning (ISP) renders highly balanced workloads without elaborate bookkeeping for N-body problems on a uniform mesh. NAMD [4] relies on a measurement-based DLB scheme to achieve high scalability for biomolecular systems. However, these DLB schemes are not suitable for our application. Building on this research, we present a measurement-based multilevel averaging weight DLB scheme based on the Hilbert space-filling curve (HSFC) [5] for our large-scale multi-medium MD simulation in high-temperature and high-pressure physics, whose computational load is unpredictable and non-uniform in both position and time. We also construct a new cell-block data structure for the MD simulation that supports the DLB scheme in moving data: after data are moved between processors, the data structures must be rebuilt and the inter-processor communication patterns updated. The DLB scheme and the cell-block data structure, along with some auxiliary functions, are integrated into a new parallel MD algorithm aimed at using large parallel machines in a scalable manner. The new parallel MD algorithm is described in Section 2, and the results of a performance evaluation are discussed in Section 3. Compared with ISP, our DLB scheme achieves better load balance by monitoring the change of the total load in each processor instead of the change of the workload in each block.
Two numerical experiments show that this new parallel MD algorithm achieves high scalability for large-scale multi-medium MD simulation in high-temperature and high-pressure physics. Finally, Section 4 gives some conclusions.

2 Parallelization strategy

Let P be the number of processors. The traditional link-cell domain decomposition method (DDM) partitions space into P sub-spaces (one per processor). It is highly scalable when particle densities are uniform. However, it has some disadvantages: (1) it is hard to use if the number of processors cannot be factored into three roughly equal factors; (2) a non-uniform distribution of particles can result in load imbalance; (3) its data structure is not suitable for most DLB methods. To solve these problems, our method first partitions space into Q (Q >> P) fixed-size blocks and creates a cell-block data structure. Second, the Hilbert space-filling curve (HSFC) imposes a linear order (the HSFC index) on the blocks in the high-dimensional space, which is the foundation of our DDM and DLB scheme. Third, it constructs a cell-block DDM based on the HSFC and maps multiple blocks to each processor. Finally, a measurement-based multilevel averaging weight DLB scheme based on the HSFC redistributes blocks to achieve better load balance.
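The ordering and mapping steps can be illustrated with a short sketch. It assumes a square 2-D grid of blocks whose side is a power of two (as in the 8 × 8 example of Section 2.2) and uses the classic bit-manipulation Hilbert index rather than the Zoltan HSFC implementation [6] used by the authors; positions along the curve are 0-based here, whereas the paper numbers blocks from 1. The function names `hilbert_index`, `hsfc_order` and `bisect_segments` are hypothetical. Blocks are sorted by Hilbert index, and the resulting 1-D load sequence is cut by recursive bisection into P contiguous segments of nearly equal load.

```python
from typing import List, Sequence, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of block (x, y) along the Hilbert curve on an n x n grid (n a power of two).

    Classic 2-D Hilbert index, used here as a stand-in for Zoltan's HSFC.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # quadrant number, in curve order
        # Rotate/reflect so the sub-curve in this quadrant has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hsfc_order(n: int) -> List[Tuple[int, int]]:
    """All blocks of an n x n block grid, sorted by Hilbert index (the 1-D HSFC order)."""
    blocks = [(x, y) for x in range(n) for y in range(n)]
    return sorted(blocks, key=lambda b: hilbert_index(n, b[0], b[1]))


def bisect_segments(loads: Sequence[float], lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Cut loads[lo:hi] into `parts` contiguous half-open ranges of nearly equal load
    by recursive bisection (requires hi - lo >= parts)."""
    if parts == 1:
        return [(lo, hi)]
    half = parts // 2
    target = sum(loads[lo:hi]) * half / parts  # load the left side should receive
    acc = sum(loads[lo:lo + half])             # load of loads[lo:cut] for cut = lo + half
    best_cut, best_err = lo + half, abs(acc - target)
    # Each side must keep at least as many blocks as processors it will be split among.
    for cut in range(lo + half + 1, hi - (parts - half) + 1):
        acc += loads[cut - 1]
        if abs(acc - target) < best_err:
            best_cut, best_err = cut, abs(acc - target)
    return (bisect_segments(loads, lo, best_cut, half)
            + bisect_segments(loads, best_cut, hi, parts - half))


if __name__ == "__main__":
    order = hsfc_order(8)                  # 64 blocks, as in the 8 x 8 example
    loads = [1.0] * len(order)             # placeholder loads, e.g. particle counts per block
    for p, (lo, hi) in enumerate(bisect_segments(loads, 0, len(loads), 4)):
        print(f"processor {p}: HSFC positions {lo}..{hi - 1}")
```

Because consecutive positions on the Hilbert curve are always face-adjacent blocks, each contiguous segment stays spatially compact even though its shape is irregular, which is what keeps the unstructured communication manageable.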
2.1 Cell-block data structure

Our method first partitions the physical space into Q blocks and then subdivides each block into smaller volumes, called cells, with side length R_L = r_c + δ, where r_c is the cut-off radius and δ is a small positive number. In Fig. 1, each block includes 4 × 4 computing cells (white). A layer of empty cells (shaded) is padded around each block. We call these extra cells auxiliary cells; they temporarily store the indices of particles moving to neighboring blocks. Because a link-cell data structure can result in irregular memory access, we adopt compact memory management in the cell-block data structure: particles in the same cell are always stored contiguously in a list. For each cell we therefore keep a head pointer giving the starting memory address of its particles and an integer giving the number of particles in the cell.

Fig. 1. Cell-block structure with computing cells (white) and auxiliary cells (shaded); the number in each cell is the cell index.

2.2 Cell-block DDM based on HSFC

Given the non-uniform model shown in Fig. 2(a), the simulation space is divided into 8 × 8 blocks, so the total number of blocks Q is 64. The load of each block is first estimated by counting the particles in it. We then use the Zoltan method [6] to generate the HSFC, which is valid for spaces of any shape. As shown in Fig. 2(b), the HSFC visits every block of the 2-D space, and we number all blocks from 1 to 64 along the HSFC. The mapping of the hyperspace to the line is done only once and is therefore a pre-processing step. Moreover, we construct a fast transform table to manage the mapping information. Finally, we apply 1-D recursive bisection to cut the HSFC into 4
logically contiguous segments containing almost equal loads, which correspond to physically irregular partitions, as shown in Fig. 2(b). The loads of the 4 sub-domains are 511 (dotted: 1-18), 508 (up-diagonal lines: 19-33), 525 (white: 34-49) and 543 (down-diagonal lines: 50-64), respectively. Each segment is a collection of blocks that is assigned to the corresponding processor. Thus we implicitly partition the hyperspace by simply and efficiently partitioning a 1-D line, which transforms the hyperspace load-balancing problem into a 1-D load-balancing problem. This achieves better load balance but produces irregular partitions of the hyperspace, which require unstructured communication; bookkeeping information is therefore required for each block. In real large-scale simulations, the number of blocks is generally far smaller than the number of cells, which reduces the overhead of managing communication and the memory for bookkeeping information to O(Q). Moreover, the cost of 1-D recursive bisection partitioning is O(Q). We therefore use the block instead of the cell as the unit of allocation and migration in order to reduce these overheads.

Fig. 2. HSFC-based domain decomposition. (a) Load of the 8 × 8 blocks; the number in each block is its load. (b) Partitioning result; the number in each block is its HSFC index.

2.3 Measurement-based MAW DLB based on HSFC

In our MD application, the movement of particles causes load fluctuations, which become increasingly critical as time evolves and may cause severe load imbalance. The loads therefore often need to be rebalanced very quickly. Pilkington and Baden [3] adopt the ISP method to resolve such imbalance in N-body problems, but it exhibits several disadvantages in our large-scale multi-medium MD simulation in high-temperature and high-pressure physics. The main difficulty is determining the computational load of a single block.
ISP evaluates the load of each block with a simple time-dependent function, such as the number of particles in the block and possibly the number of particles in neighboring blocks. This is not suitable for our MD simulation, because
the load of each block depends on the number, the geometric distribution and the type of its particles. Moreover, processor speed and cache effects also affect this load. A new DLB scheme therefore needs to be designed for this MD simulation. Once the hyperspace has been mapped to the line by the HSFC, 1-D MAW [7] can be used to solve the load-balancing problem arising from our MD application. We therefore present a measurement-based MAW DLB scheme based on the HSFC and the cell-block data structure. It redistributes the blocks, sorted by HSFC index, by monitoring the change of the total load in each processor instead of the change of the workload in each block.

DLB is performed as follows. First, the simulation runs for a small number of steps, typically lasting a few minutes, during which the actual computing time of each processor is measured. After a particular simulation step, the main processor collects the load data from every processor and computes a new block distribution by calling the MAW method. Since its partitioning time is linear in the number of blocks, i.e., O(Q), the partitioning overhead on a single processor remains modest for large problems as long as Q is scaled accordingly. A table describing the complete partitioning, requiring O(P) storage, may be broadcast to all processors, where P is the number of processors. Each processor keeps track of the change in its load. Then, assuming that the particles do not move quickly, only the blocks near the boundaries of the contiguous HSFC segments need to be considered for exchange with neighboring processors to balance the load. If the processors are ordered along one dimension, a processor typically communicates with only its two neighbors on the line, so adjusting the load by migrating blocks is inexpensive in communication. After blocks have migrated between processors, the data structures must be rebuilt and the inter-processor communication patterns updated.
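The measure-and-repartition cycle can be sketched as follows. This is not the MAW method of [7], whose multilevel averaging is not reproduced here; it is a simpler stand-in that assumes a processor's measured time is spread evenly over the blocks it owns. New segments come from one linear prefix-sum sweep over the HSFC-ordered blocks, and a block migrates only when its owner changes; when loads drift slowly, these are boundary blocks moving between neighbors on the line. The names `estimate_block_weights`, `repartition` and `migrations` are hypothetical.

```python
from typing import List, Sequence, Tuple


def estimate_block_weights(owner: Sequence[int], proc_time: Sequence[float]) -> List[float]:
    """Assumption: spread each processor's measured time evenly over the blocks it owns."""
    counts = [0] * len(proc_time)
    for p in owner:
        counts[p] += 1
    return [proc_time[p] / counts[p] for p in owner]


def repartition(weights: Sequence[float], nprocs: int) -> List[int]:
    """One O(Q) sweep over HSFC-ordered blocks: each block goes to the processor whose
    equal share of the total load contains the block's midpoint. Owners are
    non-decreasing, so each processor's blocks form a contiguous HSFC segment."""
    total = sum(weights)
    owner, acc = [], 0.0
    for w in weights:
        owner.append(min(int((acc + 0.5 * w) / total * nprocs), nprocs - 1))
        acc += w
    return owner


def migrations(old: Sequence[int], new: Sequence[int]) -> List[Tuple[int, int, int]]:
    """(block, from_proc, to_proc) for every block whose owner changed."""
    return [(b, o, n) for b, (o, n) in enumerate(zip(old, new)) if o != n]


if __name__ == "__main__":
    owner = [0, 0, 0, 0, 1, 1, 1, 1]       # current HSFC segments of 2 processors
    proc_time = [6.0, 2.0]                  # hypothetical measured seconds; processor 0 is overloaded
    new_owner = repartition(estimate_block_weights(owner, proc_time), 2)
    print(new_owner, migrations(owner, new_owner))
```

In the usage example, processor 0 hands one boundary block to its line neighbor, moving the estimated loads from 6.0 : 2.0 to 4.5 : 3.5; repeating the measure-and-repartition step at later intervals lets the segments settle as the measured times are refreshed.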
2.4 Force calculation schemes

The calculation of the force on each particle is the most expensive step in our MD simulation, so it must be computed efficiently and in a manner that can be readily parallelized and load-balanced. We enable parallelization within our MD algorithm by dividing the force computation into two classes of compute functions: self-block and pair-block. The self-block function calculates pair interactions between particles within a particular block; the pair-block function calculates pair interactions between particles residing in neighboring blocks. If a neighboring block resides on another processor, the pair-block function is triggered once that block's data have been received. To manage these calculations, we create two index tables that exploit Newton's third law: one lists the interacting cell pairs within a single block, and the other lists the interacting cell pairs in neighboring blocks, sorted by the neighbor relationship between blocks. These tables, together with the cell head pointers and the per-cell particle counts, improve execution efficiency by avoiding some jump instructions, and they are well suited to instruction-level parallelism in advanced computer architectures.
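A 2-D sketch of the self-block computation ties together the compact cell storage of Section 2.1 and a cell-pair index table. Particles are counting-sorted by cell, so each cell is described by a head offset and a particle count; the table lists each neighboring cell pair once (a "half shell" of offsets), so each particle pair is visited once and Newton's third law supplies the opposite force. Assumptions: 2-D geometry, cell side equal to r_c (the paper adds a margin δ), a hypothetical soft repulsive force, and hypothetical names throughout. Cell pairs whose offset leaves the block would go into the separate pair-block table.

```python
import math
from typing import List, Sequence, Tuple

# Half of the 8 neighbor offsets in 2-D; with the cell itself, every unordered pair
# of neighboring cells is listed exactly once.
HALF_SHELL = [(1, 0), (1, 1), (0, 1), (-1, 1)]


def build_cells(pos: Sequence[Tuple[float, float]], side: float, nx: int, ny: int):
    """Counting-sort particles by cell. Returns (order, head, count): the particles of
    cell c are order[head[c] : head[c] + count[c]], stored contiguously."""
    cell_of = [min(int(x // side), nx - 1) * ny + min(int(y // side), ny - 1) for x, y in pos]
    count = [0] * (nx * ny)
    for c in cell_of:
        count[c] += 1
    head = [0] * (nx * ny)
    for c in range(1, nx * ny):
        head[c] = head[c - 1] + count[c - 1]
    order, fill = [0] * len(pos), head[:]
    for i, c in enumerate(cell_of):
        order[fill[c]] = i
        fill[c] += 1
    return order, head, count


def self_block_pairs(nx: int, ny: int) -> List[Tuple[int, int]]:
    """Index table of interacting cell pairs inside one block of nx x ny cells."""
    pairs = []
    for ix in range(nx):
        for iy in range(ny):
            pairs.append((ix * ny + iy, ix * ny + iy))
            for dx, dy in HALF_SHELL:
                jx, jy = ix + dx, iy + dy
                if 0 <= jx < nx and 0 <= jy < ny:  # outside the block: pair-block table instead
                    pairs.append((ix * ny + iy, jx * ny + jy))
    return pairs


def self_block_forces(pos, rc: float, nx: int, ny: int) -> List[List[float]]:
    """Forces inside one block, using a hypothetical soft repulsion |F| = rc - r for r < rc."""
    order, head, count = build_cells(pos, rc, nx, ny)  # cell side = rc in this sketch
    force = [[0.0, 0.0] for _ in pos]
    for c1, c2 in self_block_pairs(nx, ny):
        for a in range(head[c1], head[c1] + count[c1]):
            start = a + 1 if c1 == c2 else head[c2]  # same cell: visit each pair once
            for b in range(start, head[c2] + count[c2]):
                i, j = order[a], order[b]
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                r = math.hypot(dx, dy)
                if 0.0 < r < rc:
                    s = (rc - r) / r
                    force[i][0] += s * dx  # force on i ...
                    force[i][1] += s * dy
                    force[j][0] -= s * dx  # ... and the equal, opposite force on j
                    force[j][1] -= s * dy
    return force


if __name__ == "__main__":
    import random
    pts = [(random.uniform(0.0, 4.0), random.uniform(0.0, 4.0)) for _ in range(50)]
    f = self_block_forces(pts, 1.0, 4, 4)
    # Newton's third law: the total internal force vanishes up to rounding.
    print(sum(v[0] for v in f), sum(v[1] for v in f))
```

Because the particles of each cell are contiguous in `order`, the inner loops run over plain index ranges driven by the table, with no per-particle linked-list traversal.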
3 Parallel performance and scalability

We have implemented an MD code based on the algorithm described above on a distributed-memory MPP using MPI. It is suited to our large-scale multi-medium MD simulation in high-temperature and high-pressure physics; for simplicity, we call this code PMD2D/3D. In this section, we examine the parallel efficiency and scalability of our algorithm. All runs were performed on an MPP with hundreds of processors, and all units are dimensionless. We define the following variables: N = number of particles; Q = number of blocks; P = number of processors used; PE = parallel efficiency; LBE = load-balancing efficiency.

3.1 Parallel performance

The first model has N = 1,560,000, Q = 2000 and P = 1-64. Each block includes cells in the x, y and z directions. The MD simulation lasted 1,000 time steps. We compare three parallel strategies: (1) regular geometric partitioning (RGP); (2) ISP; (3) our method. RGP, which is often used by classical link-cell MD, does not adjust the load balance, so its PE decreases quickly as P increases. ISP and our DLB method render much better-balanced loads than RGP because they adjust the load distribution as the simulation proceeds. Our DLB method is superior to ISP, improving LBE by 10%. The main reason is that the load of a single block changes quickly and is very difficult to evaluate accurately, whereas our DLB relies on actual measurements of the time spent by each processor and thus achieves a much more efficient load distribution, as shown in Fig. 3(b). Its PE therefore decreases slowly as P increases; at P = 64, PE ≈ 60%. Part of the efficiency loss is unavoidable communication overhead, because the communication-to-computation ratio grows as P increases; the rest is idle time and partitioning time. We believe that improvements to our DLB method will allow us to decrease the idle time further.

Fig. 3. Parallel efficiency (a) and load-balancing efficiency at P = 64 (b) of PMD2D/3D for N = 1,560,000, using RGP (line 1), ISP (line 2) and our method (line 3).
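The two metrics used above can be computed from measured times as follows. The paper does not state its formulas, so this sketch assumes the common definitions PE = T_1 / (P · T_P) and LBE = mean processor busy time / maximum processor busy time; the function names and timings are hypothetical.

```python
from typing import Sequence


def parallel_efficiency(t_serial: float, t_parallel: float, nprocs: int) -> float:
    """Assumed definition: PE = T_1 / (P * T_P), i.e. speedup divided by processor count."""
    return t_serial / (nprocs * t_parallel)


def load_balance_efficiency(proc_times: Sequence[float]) -> float:
    """Assumed definition: LBE = mean / max of per-processor busy times; 1.0 is perfect balance."""
    return (sum(proc_times) / len(proc_times)) / max(proc_times)


if __name__ == "__main__":
    print(parallel_efficiency(640.0, 16.0, 64))      # 0.625
    print(load_balance_efficiency([3.0, 4.0, 5.0]))  # 0.8
```

With LBE defined this way, a run whose slowest processor takes 25% longer than the average has LBE = 0.8: every step waits for the slowest processor, so the others sit idle for part of it.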
3.2 Scalability

To make reasonable comparisons with experimental data, it is often necessary to simulate features of micron-scale systems with at least hundreds of millions of particles, preferably more. We therefore increase the system size and the number of processors simultaneously, such that N/P = constant. Table 1 gives the corresponding parallel efficiency of the 2-D simulation with N/P = 1,600,000 and of the 3-D simulation with N/P = ,00,000, where t_step is the time per integration step in seconds. PMD2D/3D performs very well for all processor counts from 2 to 240: for sufficiently large simulations, both the 2-D and the 3-D code achieve good scalability, with a speedup of over 200 on 240 processors. On 240 processors, one step takes about 10 seconds for 380,000,000 particles in 2-D and about 37 seconds for 276,000,000 particles in 3-D. These results show that our algorithm is very effective in modeling hundreds of millions of particles in both 2-D and 3-D.

Table 1. Parallel efficiency with N/P = constant
P | 2-D t_step (s) | 2-D E (%) | 3-D t_step (s) | 3-D E (%)

Comparable performance

In the last few years, many research groups [8-10] have reported record MD simulations. For comparison, we list their results together with our current record in Table 2. However, we have not yet run experiments comparing our MD code with other programs on identical molecules, with identical potential parameters and on identical machines. Table 2 shows that our MD code can simulate the same order of magnitude of particles in the same order of magnitude of time as the reported world records.

Table 2. Comparable performance in 3-D MD simulation
Groups | Machine | P | N | t_step (s)
Lomdahl [8] | CM ,000,
Plimpton [9] | Paragon ,000,
Stadler [9] | T3E 52,23,857,
Roth [10] | T3E | 512 | 5,180,116,000 |
Our team | One MPP | 500 | 1,100,000,000 | 80
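For the fixed-N/P runs of Section 3.2, the time per step should ideally stay constant as P grows. A common way to turn measured t_step values into an efficiency and a scaled speedup is sketched below; the definitions E(P) = t_step(reference) / t_step(P) and S(P) = P · E(P), with a single-processor reference run at the same N/P, are assumptions, and the function names are hypothetical. Under them, a speedup above 200 on 240 processors corresponds to a weak-scaling efficiency above 200/240 ≈ 83%.

```python
def weak_scaling_efficiency(t_step_ref: float, t_step: float) -> float:
    """Assumed definition: E(P) = t_step(reference) / t_step(P), with N/P held constant."""
    return t_step_ref / t_step


def scaled_speedup(nprocs: int, efficiency: float) -> float:
    """Assumed definition: scaled (Gustafson-style) speedup S(P) = P * E(P)."""
    return nprocs * efficiency


if __name__ == "__main__":
    print(200 / 240)  # efficiency implied by a speedup of 200 on 240 processors
    print(scaled_speedup(240, weak_scaling_efficiency(10.0, 12.0)))  # hypothetical t_step values
```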
4 Conclusion

We have described the design of a new scalable parallel MD algorithm for large-scale multi-medium MD simulation in high-temperature and high-pressure physics. It uses a cell-block DDM based on the HSFC to attain practical scalability, and a measurement-based MAW DLB scheme based on the HSFC to attain high parallel efficiency even when simulating non-uniform MD systems. In our real application, our DLB scheme is superior to ISP. Excellent scalability was achieved, with a speedup above 200 on 240 processors of one MPP in both 2-D and 3-D. Although the parallel performance of our algorithm is quite good, there is still room for improvement. We believe that improvements to our DLB scheme, combined with overlapping computation and communication, will allow us to decrease the idle time further. Moreover, parallel I/O and corresponding parallel visualization will be developed to help physicists analyze the results and gain deeper insight into the motion of particles.

References
Mo Zeyao, Zhang Jinglin: Dynamic Load Balancing for Short-Range Parallel Molecular Dynamics Simulations. Intern. J. Computer Math. 79 (2002)
Hayashi R., Horiguchi S.: Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation. Proc. of IPDPS, Cancun, Mexico (2000)
Pilkington R., Baden B.: Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves. IEEE Trans. on Parallel and Distributed Systems 7 (1996)
Kale L., Skeel R., Bhandarkar M.: NAMD2: Greater Scalability for Parallel Molecular Dynamics. J. Computational Physics 151 (1999)
Sagan H.: Space-Filling Curves. Springer, New York (1994)
Mo Zeyao, Zhang Baolin: Multilayer Averaged Weight Method for Dynamic Load Imbalance Problems. Intern. J. Computer Math. 76 (2001)
Plimpton S.: Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Computational Physics 117 (1995)
Stadler J., Mikulla R., Trebin H. R.: IMD: A Software Package for Molecular Dynamics Studies on Parallel Computers. Intern. J. Modern Physics C 8 (1997)
Roth J., Gahler F., Trebin H.: A Molecular Dynamics Run with 5,180,116,000 Particles. Intern. J. Modern Physics C 11 (2000)
CENTER FOR WEATHER FORECAST AND CLIMATE STUDIES A Promising Approach to Dynamic Load Balancing of Weather Forecast Models Jairo Panetta Eduardo Rocha Rodigues Philippe O. A. Navaux Celso L. Mendes Laxmikant
More informationIntroduction to DISC and Hadoop
Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and
More informationText Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
More informationLOAD BALANCING FOR MULTIPLE PARALLEL JOBS
European Congress on Computational Methods in Applied Sciences and Engineering ECCOMAS 2000 Barcelona, 11-14 September 2000 ECCOMAS LOAD BALANCING FOR MULTIPLE PARALLEL JOBS A. Ecer, Y. P. Chien, H.U Akay
More informationPartitioning and Dynamic Load Balancing for Petascale Applications
Partitioning and Dynamic Load Balancing for Petascale Applications Karen Devine, Sandia National Laboratories Erik Boman, Sandia National Laboratories Umit Çatalyürek, Ohio State University Lee Ann Riesen,
More informationFRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite
More informationLoad Balancing Techniques
Load Balancing Techniques 1 Lecture Outline Following Topics will be discussed Static Load Balancing Dynamic Load Balancing Mapping for load balancing Minimizing Interaction 2 1 Load Balancing Techniques
More informationAn Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems
An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems Ardhendu Mandal and Subhas Chandra Pal Department of Computer Science and Application, University
More informationClient/Server Computing Distributed Processing, Client/Server, and Clusters
Client/Server Computing Distributed Processing, Client/Server, and Clusters Chapter 13 Client machines are generally single-user PCs or workstations that provide a highly userfriendly interface to the
More informationLoad Balancing in Structured Peer to Peer Systems
Load Balancing in Structured Peer to Peer Systems DR.K.P.KALIYAMURTHIE 1, D.PARAMESWARI 2 Professor and Head, Dept. of IT, Bharath University, Chennai-600 073 1 Asst. Prof. (SG), Dept. of Computer Applications,
More informationLoad Balancing in Structured Peer to Peer Systems
Load Balancing in Structured Peer to Peer Systems Dr.K.P.Kaliyamurthie 1, D.Parameswari 2 1.Professor and Head, Dept. of IT, Bharath University, Chennai-600 073. 2.Asst. Prof.(SG), Dept. of Computer Applications,
More informationA Comparison of Load Balancing Algorithms for AMR in Uintah
1 A Comparison of Load Balancing Algorithms for AMR in Uintah Qingyu Meng, Justin Luitjens, Martin Berzins UUSCI-2008-006 Scientific Computing and Imaging Institute University of Utah Salt Lake City, UT
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
More informationThe Key Technology Research of Virtual Laboratory based On Cloud Computing Ling Zhang
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Key Technology Research of Virtual Laboratory based On Cloud Computing Ling Zhang Nanjing Communications
More informationOpenMosix Presented by Dr. Moshe Bar and MAASK [01]
OpenMosix Presented by Dr. Moshe Bar and MAASK [01] openmosix is a kernel extension for single-system image clustering. openmosix [24] is a tool for a Unix-like kernel, such as Linux, consisting of adaptive
More informationDistributed Particle Simulation Method on Adaptive Collaborative System
Distributed Particle Simulation Method on Adaptive Collaborative System Yudong Sun, Zhengyu Liang, and Cho-Li Wang Department of Computer Science and Information Systems The University of Hong Kong, Pokfulam
More informationParallel Hierarchical Visualization of Large Time-Varying 3D Vector Fields
Parallel Hierarchical Visualization of Large Time-Varying 3D Vector Fields Hongfeng Yu Chaoli Wang Kwan-Liu Ma Department of Computer Science University of California at Davis ABSTRACT We present the design
More informationPerformance Metrics for Parallel Programs. 8 March 2010
Performance Metrics for Parallel Programs 8 March 2010 Content measuring time towards analytic modeling, execution time, overhead, speedup, efficiency, cost, granularity, scalability, roadblocks, asymptotic
More informationBasin simulation for complex geological settings
Énergies renouvelables Production éco-responsable Transports innovants Procédés éco-efficients Ressources durables Basin simulation for complex geological settings Towards a realistic modeling P. Havé*,
More informationFPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
More informationLAMMPS Developer Guide 23 Aug 2011
LAMMPS Developer Guide 23 Aug 2011 This document is a developer guide to the LAMMPS molecular dynamics package, whose WWW site is at lammps.sandia.gov. It describes the internal structure and algorithms
More informationA Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906
More informationParallel Scalable Algorithms- Performance Parameters
www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for
More informationHash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization
Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03018-1 Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization Michael Griebel and Gerhard Zumbusch
More informationA Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture
A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture Yangsuk Kee Department of Computer Engineering Seoul National University Seoul, 151-742, Korea Soonhoi
More informationDistributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
More informationExplicit Spatial Scattering for Load Balancing in Conservatively Synchronized Parallel Discrete-Event Simulations
Explicit Spatial ing for Load Balancing in Conservatively Synchronized Parallel Discrete-Event Simulations Sunil Thulasidasan Shiva Prasad Kasiviswanathan Stephan Eidenbenz Phillip Romero Los Alamos National
More informationA Distributed Render Farm System for Animation Production
A Distributed Render Farm System for Animation Production Jiali Yao, Zhigeng Pan *, Hongxin Zhang State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058, China {yaojiali, zgpan, zhx}@cad.zju.edu.cn
More informationHPC enabling of OpenFOAM R for CFD applications
HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,
More informationLoad Balancing Support for Grid-enabled Applications
John von Neumann Institute for Computing Load Balancing Support for Grid-enabled Applications S. Rips published in Parallel Computing: Current & Future Issues of High-End Computing, Proceedings of the
More informationParallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
More informationHighly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4
Center for Information Services and High Performance Computing (ZIH) Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4 PARA 2010, June 9, Reykjavík, Iceland Matthias
More informationSparse Matrix Decomposition with Optimal Load Balancing
Sparse Matrix Decomposition with Optimal Load Balancing Ali Pınar and Cevdet Aykanat Computer Engineering Department, Bilkent University TR06533 Bilkent, Ankara, Turkey apinar/aykanat @cs.bilkent.edu.tr
More informationHectiling: An Integration of Fine and Coarse Grained Load Balancing Strategies 1
Copyright 1998 IEEE. Published in the Proceedings of HPDC 7 98, 28 31 July 1998 at Chicago, Illinois. Personal use of this material is permitted. However, permission to reprint/republish this material
More informationPARALLELS CLOUD STORAGE
PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...
More informationHigh Performance Matrix Inversion with Several GPUs
High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República
More informationLoad Balancing Strategies for Parallel SAMR Algorithms
Proposal for a Summer Undergraduate Research Fellowship 2005 Computer science / Applied and Computational Mathematics Load Balancing Strategies for Parallel SAMR Algorithms Randolf Rotta Institut für Informatik,
More informationReliable Systolic Computing through Redundancy
Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/
More informationA Load Balancing Schema for Agent-based SPMD Applications
A Load Balancing Schema for Agent-based SPMD Applications Claudio Márquez, Eduardo César, and Joan Sorribes Computer Architecture and Operating Systems Department (CAOS), Universitat Autònoma de Barcelona,
More informationLayer Load Balancing and Flexibility
Periodic Hierarchical Load Balancing for Large Supercomputers Gengbin Zheng, Abhinav Bhatelé, Esteban Meneses and Laxmikant V. Kalé Department of Computer Science University of Illinois at Urbana-Champaign,
More informationSCALABILITY OF CONTEXTUAL GENERALIZATION PROCESSING USING PARTITIONING AND PARALLELIZATION. Marc-Olivier Briat, Jean-Luc Monnot, Edith M.
SCALABILITY OF CONTEXTUAL GENERALIZATION PROCESSING USING PARTITIONING AND PARALLELIZATION Abstract Marc-Olivier Briat, Jean-Luc Monnot, Edith M. Punt Esri, Redlands, California, USA mbriat@esri.com, jmonnot@esri.com,
More informationHigh Performance Computing. Course Notes 2007-2008. HPC Fundamentals
High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationParallel Visualization for GIS Applications
Parallel Visualization for GIS Applications Alexandre Sorokine, Jamison Daniel, Cheng Liu Oak Ridge National Laboratory, Geographic Information Science & Technology, PO Box 2008 MS 6017, Oak Ridge National
More informationUsing Predictive Adaptive Parallelism to Address Portability and Irregularity
Using Predictive Adaptive Parallelism to Address Portability and Irregularity avid L. Wangerin and Isaac. Scherson {dwangeri,isaac}@uci.edu School of Computer Science University of California, Irvine Irvine,
More informationLossless Data Compression Standard Applications and the MapReduce Web Computing Framework
Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern
More informationHeat Diffusion Based Dynamic Load Balancing for Distributed Virtual Environments
Heat Diffusion Based Dynamic Load Balancing for Distributed Virtual Environments Yunhua Deng Rynson W.H. Lau Department of Computer Science, City University of Hong Kong, Hong Kong Abstract Distributed
More informationSupporting Mobility In Publish-Subscribe Networks
A Selective Neighbor Caching Approach for Supporting Mobility in Publish/Subscribe Networks Vasilios A. Siris, Xenofon Vasilakos, and George C. Polyzos Mobile Multimedia Laboratory Department of Informatics
More informationFast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
More informationDistributed Memory Machines. Sanjay Goil and Sanjay Ranka. School of CIS ond NPAC. sgoil,ranka@top.cis.syr.edu
Dynamic Load Balancing for Raytraced Volume Rendering on Distributed Memory Machines Sanjay Goil and Sanjay Ranka School of CIS ond NPAC Syracuse University, Syracuse, NY, 13244-4100 sgoil,ranka@top.cis.syr.edu
More information