A New Scalable Parallel Method for Molecular Dynamics Based on Cell-Block Data Structure

Cao Xiaolin, Mo Zeyao

High Performance Computing Center, State Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, P. O. Box 8009, Beijing 100088, P. R. China
{xiaolincao,

Research supported by Chinese NSF( ), Chinese 863 program (2002aa04570) and CAEP Funds.

Abstract. A scalable parallel algorithm is presented for large-scale three-dimensional simulations with seriously non-uniform particle distributions. Based on a cell-block data structure, the algorithm uses a Hilbert space-filling curve to convert the three-dimensional domain decomposition for load distribution across processors into a one-dimensional load balancing problem, to which the measurement-based multilevel averaging weights (MAW) method can be applied successfully. In contrast to inverse space-filling partitioning (ISP), MAW redistributes blocks by monitoring the change of the total load on each processor. Numerical experiments show that MAW is superior to ISP in balancing the load for large-scale multi-medium MD simulation in high temperature and high pressure physics. Excellent scalability was demonstrated, with a speedup larger than 200 on 240 processors of one MPP. The largest run, with 1.1 × 10^9 particles on 500 processors, took 80 seconds per time step.

1 Introduction

Molecular dynamics (MD) simulation is an important tool for studying properties of condensed matter, and their dynamic interactions, that can be difficult to obtain by other means. To make a reasonable comparison with experiment, it is often necessary to simulate features on a micron scale. Realistic MD simulations of this size require at least 10^8 to 10^9 particles, preferably more. A non-uniform distribution of various kinds of particles in space produces a highly irregular computational load. Moreover, as the simulation evolves, the movement of particles changes the load distribution across the processors used. These factors make it difficult to achieve high scalability, so a good load balancing scheme is necessary.

Mo [1] has presented a robust iterative 1-D dynamic load balancing (DLB) algorithm (i.e. MAW) suitable for 2-D parallel link-cell MD simulation. Hayashi [2] generalized the cellular automaton diffusion scheme to 3-D simulation by introducing the concept of permanent cells to minimize inter-processor communication overheads. Comparing against ORB and ORB-MM, Pilkington [3] showed that, of the three strategies, only Inverse Space-filling Partitioning (ISP) is able to render highly balanced workloads without incurring elaborate bookkeeping on a uniform mesh of an N-body problem. NAMD [4] relies on a measurement-based DLB scheme to achieve high scalability for biomolecular systems. However, these DLB schemes are not suitable for our real application.

Building on the research described above, a measurement-based multilevel averaging weight DLB scheme based on the Hilbert space-filling curve (HSFC) [5] is presented for our large-scale multi-medium MD simulation in high temperature and high pressure physics, whose computational load is unpredictable and non-uniform in position and time. A new cell-block data structure for describing the MD simulation was then constructed to help the DLB scheme move data. After data are moved between processors, the data structures must be rebuilt and the inter-processor communication patterns need to be updated. The DLB scheme and the cell-block data structure, along with some auxiliary functions, were integrated into a new parallel MD algorithm aimed at utilizing large parallel machines in a scalable manner.

The new parallel MD algorithm is described in section 2. The results of some performance evaluations are discussed in section 3. Compared with ISP, our DLB scheme achieves a better load balance by monitoring the change of the total load on each processor instead of the change of the workload in each block. Two numerical experiments show that this new parallel MD algorithm can achieve high scalability for large-scale multi-medium MD simulation in high temperature and high pressure physics. Finally, we give some conclusions in section 4.

2 Parallelization strategy

Given P processors, the traditional link-cell domain decomposition method (DDM) partitions space into P sub-spaces (one per processor). It is highly scalable when particle densities are uniform. However, it has some disadvantages: (1) it is hard to use if the number of processors cannot be factored into three roughly equal factors; (2) a non-uniform distribution of particles can result in load imbalance; (3) its data structure is not suitable for most DLB methods. To solve these problems, our method first partitions space into Q (Q >> P) fixed-size blocks and creates the cell-block data structure. Second, a Hilbert space-filling curve (HSFC) imposes a linear order (i.e. the HSFC index) on the blocks in the high-dimensional space, which is the foundation of our DDM and DLB scheme. Third, the method constructs a cell-block DDM based on the HSFC and maps multiple blocks to each processor. Finally, a measurement-based multilevel averaging weight DLB scheme based on the HSFC is used to obtain a better load balance by redistributing blocks.
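
The ordering and mapping steps above can be illustrated with a small 2-D sketch. It is not the authors' code: the function names hilbert_index and bisect_curve, the 8 × 8 grid and the synthetic block loads are assumptions made for illustration, and the curve is generated with the textbook bit-manipulation construction rather than the Zoltan library the paper uses.

```python
from itertools import accumulate


def hilbert_index(n, x, y):
    """Hilbert-curve index of block (x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def bisect_curve(loads, parts):
    """Cut per-block loads (listed in curve order) into `parts` contiguous
    segments of roughly equal load by recursive bisection.
    Returns half-open (start, end) index ranges; needs len(loads) >= parts."""
    prefix = [0] + list(accumulate(loads))

    def split(lo, hi, p):
        if p == 1:
            return [(lo, hi)]
        left = p // 2
        target = prefix[lo] + (prefix[hi] - prefix[lo]) * left / p
        # Keep at least one block for every processor on either side of the cut.
        candidates = range(lo + left, hi - (p - left) + 1)
        cut = min(candidates, key=lambda c: abs(prefix[c] - target))
        return split(lo, cut, left) + split(cut, hi, p - left)

    return split(0, len(loads), parts)


# Hypothetical example: 8 x 8 blocks with a dense corner, mapped to 4 processors.
n = 8
blocks = sorted(((x, y) for x in range(n) for y in range(n)),
                key=lambda b: hilbert_index(n, b[0], b[1]))
loads = [11 if (x < 2 and y < 2) else 1 for (x, y) in blocks]
segments = bisect_curve(loads, 4)
for rank, (start, end) in enumerate(segments):
    print(rank, (start, end), sum(loads[start:end]))
```

Because consecutive blocks on the curve are spatial neighbours, each contiguous index range corresponds to a connected, if irregular, region of space.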

2.1 Cell-Block data structure

Our method first partitions physical space into Q blocks, and then subdivides each block into smaller volumes named cells with side length R_L = r_c + δ, where r_c is the cut-off radius and δ is a small positive number. In Fig. 1, each block includes 4 × 4 computing cells (white). A layer of empty cells (shaded) is padded around each block. We call these extra cells auxiliary cells; they temporarily store the indices of particles moving to neighboring blocks. Because the link-cell data structure can result in irregular memory access, we adopt compact memory management in the cell-block data structure. Particles in the same cell are always stored sequentially in a list. We therefore use a cell head pointer giving the initial memory address of the particles in the cell, and an integer giving the number of particles in the cell.

Fig. 1. Cell-block structure including computing cells (white) and auxiliary cells (shaded), where the number in each cell is the cell index.

2.2 Cell-block DDM based on HSFC

Given a non-uniform model as shown in Fig. 2(a), the simulation space is divided into 8 × 8 blocks, so the total number of blocks Q is 64. The load of each block is first estimated by counting the number of particles in the block. We then adopt the Zoltan method [6] to generate the HSFC, which is valid for a space of any shape. As shown in Fig. 2(b), the HSFC visits every block of the 2-D space, and we number all blocks from 1 to 64 along the HSFC. The mapping of the hyperspace to the line is done only once, and is therefore a pre-processing step. Moreover, we construct a fast transform table to manage the mapping information. Finally, we apply 1-D recursive bisection to cut the HSFC into 4 logically contiguous segments containing almost equal loads, which correspond to physically irregular partitions, as shown in Fig. 2(b). The loads of the 4 sub-domains are 5 (dotted: blocks 1-8), 508 (up diagonal lines: blocks 9-33), 525 (white: blocks 34-49) and 543 (down diagonal lines: blocks 50-64), respectively. Each segment is a collection of blocks that can be assigned to the corresponding processor.

We can thus implicitly partition the hyperspace simply and effectively by partitioning a 1-D line, which transforms the hyperspace load balance problem into a 1-D load balance problem. This achieves a better load balance but produces irregular hyperspace partitions, which require unstructured communication to be managed. Bookkeeping information is therefore required for each block. In real large-scale simulations, the number of blocks is generally far smaller than the number of cells, which reduces the overhead of managing communication and the memory for bookkeeping information to O(Q). Moreover, the cost of 1-D recursive bisection partitioning is O(Q). We therefore use the block, rather than the cell, as the unit of allocation and migration in order to reduce these overheads.

Fig. 2. HSFC-based domain decomposition. (a) Load of the 8 × 8 blocks; the number in each block is the load of that block. (b) Partitioning results; the number in each block is the HSFC index of that block.

2.3 Measurement-based MAW DLB based on HSFC

In our MD application, the movement of particles causes load fluctuation, and this phenomenon becomes more and more critical as time evolves. This may cause severe load imbalance, so it is often necessary to rebalance the loads very quickly. Pilkington [3] adopts the ISP method to solve this imbalance in the N-body problem. ISP exhibits several disadvantages in our large-scale multi-medium MD simulation in high temperature and high pressure physics. The main difficulty is determining the computational load of a single block.
ISP evaluates the load of each block by a simple time-dependent function, such as the number of particles in the block and perhaps the number of particles in neighboring blocks. This is not suitable for our MD simulation, because the load of each block depends on the number of particles, the geometric distribution of the particles and the types of particles. Moreover, processor speed and cache effects also affect this load. A new DLB scheme therefore needs to be designed for this MD simulation.

Once the hyperspace has been mapped to the line by the HSFC, 1-D MAW [7] can be used to solve the load balance problem arising from our MD application. We therefore present a measurement-based MAW DLB scheme based on the HSFC and the cell-block data structure. It redistributes blocks, sorted by HSFC index, by monitoring the change of the total load on a single processor instead of the change of the workload in a single block.

The simulation performs DLB by the following procedure. First, the simulation runs for a small number of steps, typically lasting a few minutes, during which the actual computing time of each local processor is measured. After a particular simulation step, the main processor collects the load data from each processor and computes a new block distribution by calling the MAW method. Since the partitioning time is linear in the number of blocks Q, i.e. O(Q), the partitioning overhead on a single processor remains modest for large problems as long as Q is scaled accordingly. A table describing the complete partitioning, with O(P) storage, may be broadcast to all processors, where P is the number of processors, so each processor can track the change in its load. Then, assuming that the particles do not move quickly, only the blocks near the boundaries of the contiguous HSFC segments need to be considered for exchange with the neighboring processor to balance the load. If the processors are ordered along one dimension, a processor typically communicates with only its two neighbors on the line, so adjusting the load by migrating blocks requires little communication. After the migrating blocks are moved between processors, the data structures must be rebuilt and the inter-processor communication patterns need to be updated.
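
The rebalancing step described above can be sketched as follows. This is a deliberately simplified stand-in and not the MAW method of [7]: it assumes that the measured time of a processor is spread evenly over the blocks it currently owns, and then moves the segment boundaries on the curve so that each processor receives close to the mean time. The function name rebalance and the boundary-table layout are hypothetical.

```python
from itertools import accumulate


def rebalance(boundaries, times):
    """Return new segment boundaries on the HSFC from measured processor times.

    boundaries: P + 1 curve indices; processor p owns blocks
                boundaries[p] .. boundaries[p + 1] - 1 (at least one block each).
    times:      measured compute time of each of the P processors.
    Assumption (not from the paper): the time of a processor is divided
    evenly among its blocks to obtain a per-block cost estimate.
    """
    nproc = len(times)
    cost = []
    for p in range(nproc):
        nblocks = boundaries[p + 1] - boundaries[p]
        cost.extend([times[p] / nblocks] * nblocks)
    prefix = [0.0] + list(accumulate(cost))
    nblocks_total = len(cost)
    target = prefix[-1] / nproc          # ideal time per processor

    new = [0]
    for p in range(1, nproc):
        lo = new[-1] + 1                     # previous processor keeps >= 1 block
        hi = nblocks_total - (nproc - p)     # remaining processors keep >= 1 block
        cut = min(range(lo, hi + 1), key=lambda c: abs(prefix[c] - p * target))
        new.append(cut)
    new.append(nblocks_total)
    return new


# Hypothetical measurement: processor 0 took twice as long as processor 1.
print(rebalance([0, 4, 8], [8.0, 4.0]))
```

In this example the boundary shifts by a single block, so only the block next to the segment boundary migrates to the neighboring processor; the boundary list itself is the O(P) partition table that would be broadcast.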

2.4 Force calculation schemes

The calculation of the force on each particle is the most expensive step in our MD simulation, so it must be carried out efficiently and in a manner that can be readily parallelized and load-balanced. We have enabled parallelization within our MD algorithm by dividing the force computation into two classes of compute function: self-block and pair-block. The self-block function calculates pair interactions between particles within a particular block. The pair-block function calculates pair interactions between pairs of particles residing in neighboring blocks. If a neighbor block lies on another processor, the system triggers the pair-block function once the data of that neighbor block have been received. To manage these calculations, we create two index tables that exploit Newton's 3rd law. One lists the cell pairs within a single block; the other lists the cell pairs in neighboring blocks, sorted by the neighbor relationship between blocks. These tables, along with the cell head pointer and the number of particles in each cell, improve execution efficiency by avoiding some jump instructions, and are much better suited to instruction-level parallelism in advanced computer architectures.
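
A minimal 2-D sketch of the compact cell storage and the cell-pair table for a single block is given below. It is an illustration under stated assumptions rather than the PMD2D/3D implementation: the names build_cells, pair_table and count_pairs are hypothetical, the force kernel is replaced by a counter of particle pairs within the cut-off, and the auxiliary cell layer and the pair-block case are omitted.

```python
import random


def build_cells(pos, ncx, ncy, cell_len):
    """Store particles contiguously by cell.
    Returns (sorted positions, cell_start, cell_count); cell_start plays the
    role of the cell head pointer and cell_count the particle count."""
    def cell_of(p):
        cx = min(int(p[0] // cell_len), ncx - 1)
        cy = min(int(p[1] // cell_len), ncy - 1)
        return cy * ncx + cx

    sorted_pos = sorted(pos, key=cell_of)
    cell_count = [0] * (ncx * ncy)
    for p in sorted_pos:
        cell_count[cell_of(p)] += 1
    cell_start = [0] * (ncx * ncy)
    for c in range(1, ncx * ncy):
        cell_start[c] = cell_start[c - 1] + cell_count[c - 1]
    return sorted_pos, cell_start, cell_count


def pair_table(ncx, ncy):
    """List each pair of neighboring cells once (half stencil), so every
    interaction is evaluated once and applied to both particles."""
    half = [(1, 0), (-1, 1), (0, 1), (1, 1)]
    table = []
    for cy in range(ncy):
        for cx in range(ncx):
            for dx, dy in half:
                nx, ny = cx + dx, cy + dy
                if 0 <= nx < ncx and 0 <= ny < ncy:
                    table.append((cy * ncx + cx, ny * ncx + nx))
    return table


def count_pairs(sorted_pos, cell_start, cell_count, table, rc):
    """Count particle pairs closer than rc (stand-in for the force kernel)."""
    rc2 = rc * rc

    def close(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 < rc2

    n = 0
    for c in range(len(cell_start)):          # pairs inside one cell
        s, e = cell_start[c], cell_start[c] + cell_count[c]
        for i in range(s, e):
            for j in range(i + 1, e):
                n += close(sorted_pos[i], sorted_pos[j])
    for a, b in table:                        # pairs across neighboring cells
        for i in range(cell_start[a], cell_start[a] + cell_count[a]):
            for j in range(cell_start[b], cell_start[b] + cell_count[b]):
                n += close(sorted_pos[i], sorted_pos[j])
    return n


# Hypothetical block of 4 x 4 cells with r_c = 1.0 and delta = 0.25.
NCX, NCY, RC, CELL_LEN = 4, 4, 1.0, 1.25
random.seed(0)
pos = [(random.random() * NCX * CELL_LEN, random.random() * NCY * CELL_LEN)
       for _ in range(200)]
sorted_pos, cell_start, cell_count = build_cells(pos, NCX, NCY, CELL_LEN)
print(count_pairs(sorted_pos, cell_start, cell_count, pair_table(NCX, NCY), RC))
```

Because the cell side is at least the cut-off radius, every interacting pair lies either in one cell or in two adjacent cells, so the two loops together visit each candidate pair exactly once.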

3 Parallel performance and scalability

We have implemented an MD code based on the algorithms described above on a distributed-memory MPP with MPI. It is suitable for our large-scale multi-medium MD simulation in high temperature and high pressure physics. For simplicity, we call this code PMD2D/3D. In this section, we examine the parallel efficiency and scalability of our algorithm. The runs were performed on an MPP with hundreds of processors. All units are given in dimensionless form. We define the following variables: N = number of particles; Q = number of blocks; P = number of processors used; PE = parallel efficiency; LBE = load balancing efficiency.

3.1 Parallel performance

The first model is: N=,560,000, Q=2000, P=1~64. Each block includes cells in x, y, z direction. The MD simulation lasted ,000 time-steps. We adopt three parallel strategies: (1) regular geometric partitioning (RGP); (2) ISP; (3) our method. RGP is often used by classical link-cell MD and does not adjust the load balance, so the PE of RGP decreases quickly as P increases. ISP and our DLB method are much better than RGP at balancing the loads because they adjust the load distribution in time. Our DLB method is superior to ISP and improves LBE by 0%. The main reason is that the load of a single block changes quickly and is very difficult to evaluate accurately. By comparison, our DLB relies on actual measurement of the time spent by each processor to achieve a much more efficient load distribution, as shown in Fig. 3(b). Its PE therefore decreases slowly as P increases; at P=64, PE 60%. Part of the efficiency loss is inevitable and is due to communication overhead, because the ratio of communication to computation grows as P increases. The rest of the loss is idle time and partitioning time. We believe that improvements to our DLB method will allow us to decrease the idle time further.

Fig. 3. Parallel efficiency PE versus P (a) and load balance efficiency LBE versus time step at P=64 (b) for N=,560,000, for PMD2D/3D using RGP (line 1), ISP (line 2) and our method (line 3), respectively.
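
The paper does not spell out how PE and LBE are computed. A common convention, assumed here purely for illustration, takes LBE as the mean per-processor time divided by the maximum per-processor time, and PE as the single-processor time divided by P times the P-processor time. The function names and sample timings below are hypothetical.

```python
def load_balance_efficiency(times):
    """Mean over max of per-processor compute times (1.0 = perfectly balanced).
    Assumed definition; the paper does not state its formula."""
    return sum(times) / (len(times) * max(times))


def parallel_efficiency(t_serial, t_parallel, nproc):
    """Speedup divided by processor count (assumed textbook definition)."""
    return t_serial / (nproc * t_parallel)


# Hypothetical timings, in seconds per step.
print(load_balance_efficiency([2.0, 1.0, 1.0]))   # one slow processor
print(parallel_efficiency(120.0, 2.0, 100))
```

Under these definitions an idle processor lowers LBE directly, while PE also absorbs communication and partitioning time.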

3.2 Scalability

To make a reasonable comparison with experimental data, it is often necessary to simulate features on a micro-scale system with at least hundreds of millions of particles, preferably more. We therefore increase the system size and the number of processors simultaneously, such that N/P = const. Table 1 gives the corresponding parallel efficiency of the 2-D simulation with N/P=1,600,000 and of the 3-D simulation with N/P=1,100,000, where t_step is the time per integration step in seconds. PMD2D/3D performs very well for all numbers of processors ranging from 2 to 240. For sufficiently large simulations, both the 2-D and 3-D codes scale well even on large numbers of processors, with a speedup of over 200 on 240 processors. On 240 processors, it takes about 0 second per step to simulate 380,000,000 particles in 2-D and about 37 seconds per step to simulate 276,000,000 particles in 3-D. These results show that our algorithm is very effective in modeling hundreds of millions of particles in both 2-D and 3-D.

Table 1. Parallel efficiency with N/P = constant
P    2-D t_step (s)    2-D E (%)    3-D t_step (s)    3-D E (%)

3.3 Comparable performances

In the last few years, many research groups [8-10] have reported their records for MD simulation. For comparison, we list these results together with our current record in Table 2. However, we have not yet run experiments comparing the performance of our MD code with other programs for identical molecules, identical potential parameters and identical machines. Table 2 shows that our MD code can simulate a number of particles of the same order of magnitude, at a time cost of the same order of magnitude, as the reported world records.

Table 2. Comparable performance in 3-D MD simulation
Groups          Machine    P    N    t_step (s)
Lohmdahl [8]    CM ,000,
Plimpton [9]    Paragon ,000,
Stadler [9]     T3E 52,23,857,
Roth [10]       T3E 52 5,80,6,
Our team        One MPP    500    1,100,000,000    80

4 Conclusion

We have described the design of a new scalable parallel MD algorithm for large-scale multi-medium MD simulation in high temperature and high pressure physics. It uses a cell-block DDM based on the HSFC to attain practical scalability, and a measurement-based MAW DLB scheme based on the HSFC to attain high parallel efficiency even when simulating non-uniform MD systems. Our DLB scheme is superior to ISP in the real application. Excellent scalability was obtained, with a speedup of above 200 on 240 processors of one MPP in both 2-D and 3-D. Although the parallel performance of our algorithm is quite good, it still leaves room for improvement. We believe that improvements to our DLB scheme, combined with overlapping computation and communication, will allow us to decrease the idle time further. Moreover, parallel I/O and corresponding parallel visualization will be developed to help physicists analyze results and gain deeper insight into the motion of particles.

References

Mo Zeyao, Zhang Jinglin: Dynamic Load Balancing for Short-Range Parallel Molecular Dynamics Simulations. Intern. J. Computer Math. 79 (2002)
Hayashi R., Horiguchi S.: Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation. Proc. of IPDPS, Cancun, Mexico (2000)
Pilkington R., Baden B.: Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves. IEEE Trans. on Parallel and Distributed Systems 7 (1996)
Kale L., Skeel R., Bhandarkar M.: NAMD2: Greater Scalability for Parallel Molecular Dynamics. J. Computational Physics 151 (1999)
Sagan H.: Space-Filling Curves. Springer, New York (1994)
Mo Zeyao, Zhang Baolin: Multilayer Averaged Weight Method for Dynamic Load Imbalance Problems. Intern. J. Computer Math. 76 (2001)
Plimpton S.: Fast Parallel Algorithms for Short-range Molecular Dynamics. J. of Computational Physics 117 (1995)
Stadler J., Mikulla R., Trebin H. R.: IMD: A Software Package for Molecular Dynamics Studies on Parallel Computers. Intern. J. Modern Physics 8 (1997)
Roth J., Gahler F., Trebin H.: A Molecular Dynamics Run with 5,180,116,000 Particles. Int. J. Modern Physics C 11 (2000)

?kt. An Unconventional Method for Load Balancing. w = C ( t m a z - ti) = p(tmaz - 0i=l. 1 Introduction. R. Alan McCoy,*

?kt. An Unconventional Method for Load Balancing. w = C ( t m a z - ti) = p(tmaz - 0i=l. 1 Introduction. R. Alan McCoy,* ENL-62052 An Unconventional Method for Load Balancing Yuefan Deng,* R. Alan McCoy,* Robert B. Marr,t Ronald F. Peierlst Abstract A new method of load balancing is introduced based on the idea of dynamically

More information

Partitioning and Divide and Conquer Strategies

Partitioning and Divide and Conquer Strategies and Divide and Conquer Strategies Lecture 4 and Strategies Strategies Data partitioning aka domain decomposition Functional decomposition Lecture 4 and Strategies Quiz 4.1 For nuclear reactor simulation,

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies

Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems November 3-6, 1999 in Cambridge Massachusetts, USA Characterizing the Performance of Dynamic Distribution

More information

Mesh Generation and Load Balancing

Mesh Generation and Load Balancing Mesh Generation and Load Balancing Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee April 04, 2012 CS 594 04/04/2012 Slide 1 / 19 Outline Motivation Reliable

More information

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed

More information

Fast Multipole Method for particle interactions: an open source parallel library component

Fast Multipole Method for particle interactions: an open source parallel library component Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,

More information

ACHIEVING SCALABLE PARALLEL MOLECULAR DYNAMICS USING DYNAMIC SPATIAL DOMAIN DECOMPOSITION TECHNIQUES

ACHIEVING SCALABLE PARALLEL MOLECULAR DYNAMICS USING DYNAMIC SPATIAL DOMAIN DECOMPOSITION TECHNIQUES ACHIEVING SCALABLE PARALLEL MOLECULAR DYNAMICS USING DYNAMIC SPATIAL DOMAIN DECOMPOSITION TECHNIQUES LARS NYLAND, JAN PRINS, RU HUAI YUN, JAN HERMANS, HYE-CHUNG KUM, AND LEI WANG ABSTRACT. To achieve scalable

More information

A Review of Customized Dynamic Load Balancing for a Network of Workstations

A Review of Customized Dynamic Load Balancing for a Network of Workstations A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester

More information

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions

More information

Junghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea

Junghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITION-BASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO. A Performance Model and Load Balancer for a Parallel Monte-Carlo Cellular Microphysiology Simulator

UNIVERSITY OF CALIFORNIA, SAN DIEGO. A Performance Model and Load Balancer for a Parallel Monte-Carlo Cellular Microphysiology Simulator UNIVERSITY OF CALIFORNIA, SAN DIEGO A Performance Model and Load Balancer for a Parallel Monte-Carlo Cellular Microphysiology Simulator A thesis submitted in partial satisfaction of the requirements for

More information

Load Balancing Of Parallel Monte Carlo Transport Calculations

Load Balancing Of Parallel Monte Carlo Transport Calculations Load Balancing Of Parallel Monte Carlo Transport Calculations R.J. Procassini, M. J. O Brien and J.M. Taylor Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 9551 The performance of

More information

Cellular Computing on a Linux Cluster

Cellular Computing on a Linux Cluster Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results

More information

NAMD2- Greater Scalability for Parallel Molecular Dynamics. Presented by Abel Licon

NAMD2- Greater Scalability for Parallel Molecular Dynamics. Presented by Abel Licon NAMD2- Greater Scalability for Parallel Molecular Dynamics Laxmikant Kale, Robert Steel, Milind Bhandarkar,, Robert Bunner, Attila Gursoy,, Neal Krawetz,, James Phillips, Aritomo Shinozaki, Krishnan Varadarajan,,

More information

Optimizing Load Balance Using Parallel Migratable Objects

Optimizing Load Balance Using Parallel Migratable Objects Optimizing Load Balance Using Parallel Migratable Objects Laxmikant V. Kalé, Eric Bohm Parallel Programming Laboratory University of Illinois Urbana-Champaign 2012/9/25 Laxmikant V. Kalé, Eric Bohm (UIUC)

More information

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to

More information

Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications Rupak Biswas MRJ Technology Solutions NASA Ames Research Center Moffett Field, CA 9435, USA rbiswas@nas.nasa.gov

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

Uintah Framework. Justin Luitjens, Qingyu Meng, John Schmidt, Martin Berzins, Todd Harman, Chuch Wight, Steven Parker, et al

Uintah Framework. Justin Luitjens, Qingyu Meng, John Schmidt, Martin Berzins, Todd Harman, Chuch Wight, Steven Parker, et al Uintah Framework Justin Luitjens, Qingyu Meng, John Schmidt, Martin Berzins, Todd Harman, Chuch Wight, Steven Parker, et al Uintah Parallel Computing Framework Uintah - far-sighted design by Steve Parker

More information

walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation

walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation SIAM Parallel Processing for Scientific Computing 2012 February 16, 2012 Florian Schornbaum,

More information

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk. Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated

More information

FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling

FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling Center for Information Services and High Performance Computing (ZIH) FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling Symposium on HPC and Data-Intensive Applications in Earth

More information

Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity

Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity Jaswinder Pal Singh, Chris Holt, Takashi Totsuka, Anoop Gupta and John L. Hennessy Computer

More information

A Refinement-tree Based Partitioning Method for Dynamic Load Balancing with Adaptively Refined Grids

A Refinement-tree Based Partitioning Method for Dynamic Load Balancing with Adaptively Refined Grids A Refinement-tree Based Partitioning Method for Dynamic Load Balancing with Adaptively Refined Grids William F. Mitchell Mathematical and Computational Sciences Division National nstitute of Standards

More information

A Novel Switch Mechanism for Load Balancing in Public Cloud

A Novel Switch Mechanism for Load Balancing in Public Cloud International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) A Novel Switch Mechanism for Load Balancing in Public Cloud Kalathoti Rambabu 1, M. Chandra Sekhar 2 1 M. Tech (CSE), MVR College

More information

Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations

Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien and Janine Taylor Lawrence Livermore National Laboratory Joint Russian-American Five-Laboratory

More information

Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors

Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 18, 1037-1048 (2002) Short Paper Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors PANGFENG

More information
