FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling
|
|
|
- Penelope McCormick
- 10 years ago
- Views:
Transcription
1 Center for Information Services and High Performance Computing (ZIH) FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling Symposium on HPC and Data-Intensive Applications in Earth Sciences 13 Nov 2014, Trieste, Italy Matthias Lieber ([email protected]) Center for Information Services and High Performance Computing (ZIH) Technische Universität Dresden, Germany
2 "Climate models now include more cloud and aerosol processes, and their interactions, than at the time of the AR4, but there remains low confidence in the representation and quantification of these processes in models." IPCC, 2013: Summary for Policymakers. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change.
3 Motivation: Spectral Bin Cloud Microphysics Schemes Spectral bin microphysics radius Lynn et al., Mon. Weather Rev., 133:59-71, 2005 Grützun et al., Atmos. Res., 90(2-4): , 2008 mixing ratio mixing ratio Widely used bulk models Khain et al., J. Atmos. Sci., 67(2): , 2010 Sato et al., J. Atmos. Sci., 69: , 2012 radius Bin discretization of cloud particle size distribution Allows more detailed modeling of interaction between aerosols, clouds, and precipitation Planche et al., Quart. J. Roy. Meteor. Soc. Vol. 140, No. 683, 2014 Fan et al., Atmos. Chem. Phys., 14:81-101, 2014 Computationally too expensive for forecast Only used for process studies up to now 3
4 Motivation: Tropical Cyclone Forecast with SBM? Real-time forecast requires ~ CPU cores 1000 grid cells Horizontal grid: 1000 x 1000 Model systems must be tuned for efficient usage of large machines 1000 grid cells 4
5 Outline Bottleneck Analysis Concept of Load-balanced Coupling FD4's Features Benchmarks Conclusion 5
6 FD4 Motivation: COSMO-SPECS Performance COSMO-SPECS: Atmospheric model COSMO extended with highly detailed cloud microphysics model SPECS Small 3D case with 64x64x48 grid t = 10 min t = 30 min Growing cumulus cloud ideal scalability Lieber et al., Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4, PARA 2010,
7 Analysis: Common Parallelization Scheme Partition Communication 3D domain partitioned into rectangular boxes 2D decomposition (horizontal dimensions) Regular communication with 4 direct neighbors required (periodic boundary conditions) Based on MPI (Message Passing Interface) 7
8 Analysis: Load Imbalance due to Microphysics P0 P1 P2 P3 Solution: Apply dynamic load balancing SPECS computing time varies strongly depending on the range of the particle size distribution and presence of frozen particles Leads to load imbalances between partitions 8
9 Analysis: Increasing Communication Volume Surface-to-volume-ratio of partitions grows with number of partitions, in theory (best case): 2D decomposition: A2D(P) = 4 G2/3 P1/2 ~ P1/2 3D decomposition: A3D(P) = 6 G2/3 P1/3 ~ P1/3 Solution: Apply 3D decomposition 9
10 Outline Bottleneck Analysis Concept of Load-balanced Coupling FD4's Features Benchmarks Conclusion 10
11 Concept of Load-Balanced Coupling Atmospheric Model & Spectral Bin Microphysics 2D Decomposition Static Partitioning Lieber et al., Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4, PARA 2010,
12 Concept of Load-Balanced Coupling Atmospheric Model & Spectral Bin Microphysics Spectral Bin Microphysics High Scalability P Model Coupling 2D Decomposition Static Partitioning Block-based 3D Decomposition Dynamic Load Balancing Optimized Data Structures Lieber et al., Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4, PARA 2010,
13 Concept of Load-Balanced Coupling High Scalability P Model Coupling Implemented as independent framework FD4 FD4: Four-Dimensional Distributed Dynamic Data structures Block-based 3D Decomposition Dynamic Load Balancing Optimized Data Structures Lieber et al., Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4, PARA 2010,
14 Outline Bottleneck Analysis Concept of Load-balanced Coupling FD4's Features Benchmarks Conclusion 14
15 FD4: Dynamic Load Balancing 3D block decomposition of rectangular grid Space-filling curve (SFC) partitioning to assign blocks to ranks SFC reduces 3D partitioning problem to 1D High locality of SFC leads to moderate comm. costs Developed a highly scalable, hierarchical method for high-quality 1D partitioning of the SFC-indexed blocks 15
16 FD4: Model Coupling Data exchange between FD4 based model and an external model E.g. weather or CFD model Transfer in both directions FD4 computes partition overlaps after each repartitioning of FD4 grid Highly scalable algorithm No grid transformation / interpolation External model must provide data matching the FD4 grid Sequential coupling only Both models run alternately on same set of MPI ranks 16
17 FD4: 4th Dimension Extra, non-spatial dimension of grid variables, e.g. Size resolving models Array of gas phase tracers FD4 is optimized for a large 4th dimension COSMO-SPECS requires 2 x 11 x 66 ~ 1500 values 17
18 FD4: Adaptive Block Mode Grid allocation adapts to spatial structure of simulated problem Save memory in case data and computations are required for a subset only For multiphase problems like drops, clouds, flame fronts FD4 ensures existence of all blocks required for correct stencil operations 18
19 Outline Bottleneck Analysis Concept of Load-balanced Coupling FD4's Features Benchmarks Conclusion 19
20 Benchmarks: COSMO-SPECS Performance Comparison Almost 3 times faster at 1024 CPU cores Load balancing & coupling scale well, but can we reach > processes? COSMO-SPECS COSMO-SPECS+FD4 20
21 Benchmarks: Scalability on Blue Gene/Q Lieber, Nagel, Mix, Scalability Tuning of the Load Balancing and Coupling Framework FD4, NIC Symposium 2014, pp Grid size: 1024 x 1024 x 48 grid cells, > 3M blocks 256k: 30 min forecast in <5min (w/o init and I/O) Runs on Blue Gene/Q with up to MPI ranks 14 x speed-up from 16k to 256k COSMO-SPECS+FD4 21
22 Benchmarks: Load Balancing & Coupling Scalability Lieber, Nagel, Mix, Scalability Tuning of the Load Balancing and Coupling Framework FD4, NIC Symposium 2014, pp Grid size: 1024 x 1024 x 48 grid cells, > 3M blocks Load balancing scales comparatively very well Coupling scales nearly perfect Dynamic Load Balancing Runtime % Coupling Runtime % 22
23 Benchmarks: 1D Partitioning Comparison on Blue Gene/Q Lieber, Nagel, Scalable High-Quality 1D Partitioning, HPCS 2014, pp , 2014 ExactBS: exact method, but slow and serial H2: fast heuristic, but may result in poor load balance HIER*: hierarchical algorithm implemented in FD4, achieves nearly optimal load balance ExactBS: 2668 ms QBS: 692 ms H2seq: 363 ms H2par: 40.5 ms HIER*G=64 : 8.55 ms HIER*P/G=256 : 3.77 ms 23
24 Heuristic H2 in Action (COSMO-SPECS+FD4) 24
25 HIER* in Action (COSMO-SPECS+FD4) 25
26 Outline Bottleneck Analysis Concept of Load-balanced Coupling FD4's Features Benchmarks Conclusion 26
27 Conclusions FD4 provides for simulation models Parallelization of numerical grid Communication between neighbor partitions Dynamic load balancing Model coupling High scalability Initially developed for atmospheric modeling, but generally applicable FD4 is available as open source software Fortran 95, MPI-2, NetCDF Tested on many different HPC systems FD4 website: Lieber, Nagel, Scalable High-Quality 1D Partitioning, HPCS 2014, pp , 2014 Lieber, Nagel, Mix, Scalability Tuning of the Load Balancing and Coupling Framework FD4, NIC Symposium 2014 Lieber et al., Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMOSPECS+FD4, PARA 2010, 2012 Lieber et al., FD4: A Framework for Highly Scalable Load Balancing and Coupling of Multiphase Models, ICNAAM
28 Thank you very much for your attention! Acknowledgments Verena Grützun, Ralf Wolke, Oswald Knoth, Martin Simmel, René Widera, Matthias Jurenz, Matthias Müller, Wolfgang E. Nagel picongpu.hzdr.de Funding 28
29 Backup Slides 29
30 Framework FD4: Optimized Data Structure A large number of small blocks are good for performance: Size-resolved approach / ~1000 variables per grid cell: Only small blocks do not exceed processor cache Load balancing: #blocks > #partitions to enable fine-grained balancing Additional memory costs for a boundary of ghost cells Too high for small blocks! Add ghost blocks at the partition borders only 30
31 From SFC Partitioning to 1D Partitioning Space-filling curve (SFC) partitioning widely used nd space is mapped to 1D by SFC Mapping is fast and has high locality Migration typically between neighbor ranks 1D partitioning is core problem of SFC partitioning Hilbert SFC Pilkington, Baden, Dynamic partitioning of non-uniform structured workloads with spacefilling curves, IEEE T. Parall. Distr., vol. 7, no. 3, pp , Decomposes task chain into consecutive parts Two classes of existing 1D partitioning algorithms: Heuristics: fast, parallel, no optimal solution Pinar, Aykanat, Fast optimal load balancing algorithms for 1D partitioning, J. Parallel Distr. Com., vol. 64, no. 8, pp , Exact methods: slow, serial, but optimal 31
32 Sequential vs. Concurrent Model Coupling Both models run alternately on same set of MPI ranks Allows tight coupling (data dependencies) Avoids load imbalances between models Data exchange Model A Model B Ranks Data exchange Data exchange Data exchange Model B Data exchange Ranks Model A t Data exchange Concurrent Coupling Sequential Coupling t MPI ranks are split into groups Loose coupling, codes may be separate Scales to higher total number of ranks 32
33 Scalable High-Quality 1D Partitioning: Algorithm HIER* Large scale applications require a fully parallel method, i.e. without gathering all task weights Run parallel H2 to create G < P coarse partitions: Orig part. P0 P1 P2 P3 P4 P5 P6 H2 nearly optimal if wmax << WN / P: Miguet, Pierson, Heuristics for 1D rectilinear partitioning as a low cost and high quality answer to dynamic load balancing, LNCS, vol. 1225, 1997, pp P7 Parallel Heuristic H2 Coarse part. P0 / P1 P2 / P3 P4 / P5 P6 / P7 Run G independent instances of exact QBS* (q=1.0) to create final partitions within each group: Final part. QBS* QBS* QBS* QBS* P0 P2 P4 P6 P1 P3 P5 P7 Parameter G allows trade-off between scalability (high G heuristic dominates) and load balance (small G exact method dominates) 33
34 FD4: Implementation Implemented in Fortran 95 MPI-based parallelization Open Source Software MPI initialization call MPI_Init(err) call MPI_Comm_rank(MPI_COMM_WORLD, rank, err) call MPI_Comm_size(MPI_COMM_WORLD, nproc, err)! create the domain and allocate memory call fd4_domain_create(domain, nb, size, & vartab, ng, peri, MPI_COMM_WORLD, err) call fd4_util_allocate_all_blocks(domain, err)! initialize ghost communication call fd4_ghostcomm_create(ghostcomm, domain, & 4, vars, steps, err)! loop over time steps do timestep=1,nsteps! exchange ghosts call fd4_ghostcomm_exch(ghostcomm, err)! loop over local blocks call fd4_iter_init(domain, iter) do while(associated(iter%cur))! do some computations call compute_block(iter) call fd4_iter_next(iter) end do! dynamic load balancing call fd4_balance_readjust(domain, err) end do 34
35 Benchmarks: COSMO-SPECS Performance Comparison COSMO-SPECS+FD4 COSMO-SPECS 35
36 Scalable High-Quality 1D Partitioning: Load Balance Cloud simulation, tasks CLOUD Simulation System: JUQUEEN, IBM Blue Gene/Q HIER*, G=64 achieves 99.2% of the optimal load balance at processes 36
37 HIER* seen in Vampir (one Group of 256 out of 64Ki) 37
38 ExactBS in Action (COSMO-SPECS+FD4) 38
39 COSMO-SPECS+FD4: Comparison of Methods Lieber, Nagel, Mix, Scalability Tuning of the Load Balancing and Coupling Framework FD4, NIC Symposium 2014, pp
40 Scalable Coupling: Meta Data Subdomains Handshaking Identifying partition overlaps between the coupled models turned out to be the main scalability bottleneck Solved with spatially indexed data structure for coupling meta data in FD4 Time for locating overlap candidates does not depend on number of ranks Lieber, Nagel, Mix, Scalability Tuning of the Load Balancing and Coupling Framework FD4, NIC Symposium 2014, pp
Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4
Center for Information Services and High Performance Computing (ZIH) Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4 PARA 2010, June 9, Reykjavík, Iceland Matthias
Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies
Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems November 3-6, 1999 in Cambridge Massachusetts, USA Characterizing the Performance of Dynamic Distribution
ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008
ParFUM: A Parallel Framework for Unstructured Meshes Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 What is ParFUM? A framework for writing parallel finite element
Distributed communication-aware load balancing with TreeMatch in Charm++
Distributed communication-aware load balancing with TreeMatch in Charm++ The 9th Scheduling for Large Scale Systems Workshop, Lyon, France Emmanuel Jeannot Guillaume Mercier Francois Tessier In collaboration
walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation
walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation SIAM Parallel Processing for Scientific Computing 2012 February 16, 2012 Florian Schornbaum,
HPC Deployment of OpenFOAM in an Industrial Setting
HPC Deployment of OpenFOAM in an Industrial Setting Hrvoje Jasak [email protected] Wikki Ltd, United Kingdom PRACE Seminar: Industrial Usage of HPC Stockholm, Sweden, 28-29 March 2011 HPC Deployment
Scientific Computing Programming with Parallel Objects
Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore
Parameterization of Cumulus Convective Cloud Systems in Mesoscale Forecast Models
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Parameterization of Cumulus Convective Cloud Systems in Mesoscale Forecast Models Yefim L. Kogan Cooperative Institute
18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two
age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,
A Novel Switch Mechanism for Load Balancing in Public Cloud
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) A Novel Switch Mechanism for Load Balancing in Public Cloud Kalathoti Rambabu 1, M. Chandra Sekhar 2 1 M. Tech (CSE), MVR College
Multi-GPU Load Balancing for Simulation and Rendering
Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks
LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015. Hermann Härtig
LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015 Hermann Härtig ISSUES starting points independent Unix processes and block synchronous execution who does it load migration mechanism
Load Balancing on a Non-dedicated Heterogeneous Network of Workstations
Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC Outline Dynamic Load Balancing framework in Charm++ Measurement Based Load Balancing Examples: Hybrid Load Balancers Topology-aware
Cellular Computing on a Linux Cluster
Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
Fast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
Performance Engineering of the Community Atmosphere Model
Performance Engineering of the Community Atmosphere Model Patrick H. Worley Oak Ridge National Laboratory Arthur A. Mirin Lawrence Livermore National Laboratory 11th Annual CCSM Workshop June 20-22, 2006
ABSTRACT FOR THE 1ST INTERNATIONAL WORKSHOP ON HIGH ORDER CFD METHODS
1 ABSTRACT FOR THE 1ST INTERNATIONAL WORKSHOP ON HIGH ORDER CFD METHODS Sreenivas Varadan a, Kentaro Hara b, Eric Johnsen a, Bram Van Leer b a. Department of Mechanical Engineering, University of Michigan,
Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 18, 1037-1048 (2002) Short Paper Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors PANGFENG
Mesh Partitioning and Load Balancing
and Load Balancing Contents: Introduction / Motivation Goals of Load Balancing Structures Tools Slide Flow Chart of a Parallel (Dynamic) Application Partitioning of the initial mesh Computation Iteration
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
Performance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
MPI Hands-On List of the exercises
MPI Hands-On List of the exercises 1 MPI Hands-On Exercise 1: MPI Environment.... 2 2 MPI Hands-On Exercise 2: Ping-pong...3 3 MPI Hands-On Exercise 3: Collective communications and reductions... 5 4 MPI
Turbulent mixing in clouds latent heat and cloud microphysics effects
Turbulent mixing in clouds latent heat and cloud microphysics effects Szymon P. Malinowski1*, Mirosław Andrejczuk2, Wojciech W. Grabowski3, Piotr Korczyk4, Tomasz A. Kowalewski4 and Piotr K. Smolarkiewicz3
Software and Performance Engineering of the Community Atmosphere Model
Software and Performance Engineering of the Community Atmosphere Model Patrick H. Worley Oak Ridge National Laboratory Arthur A. Mirin Lawrence Livermore National Laboratory Science Team Meeting Climate
A Review of Customized Dynamic Load Balancing for a Network of Workstations
A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,
How To Balance In Cloud Computing
A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi [email protected] Yedhu Sastri Dept. of IT, RSET,
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden [email protected],
Recommendations for Performance Benchmarking
Recommendations for Performance Benchmarking Shikhar Puri Abstract Performance benchmarking of applications is increasingly becoming essential before deployment. This paper covers recommendations and best
HPC Programming Framework Research Team
HPC Programming Framework Research Team 1. Team Members Naoya Maruyama (Team Leader) Motohiko Matsuda (Research Scientist) Soichiro Suzuki (Technical Staff) Mohamed Wahib (Postdoctoral Researcher) Shinichiro
Mesh Generation and Load Balancing
Mesh Generation and Load Balancing Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee April 04, 2012 CS 594 04/04/2012 Slide 1 / 19 Outline Motivation Reliable
Overlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm [email protected] [email protected] Department of Computer Science Department of Electrical and Computer
Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite
High Performance Computing. Course Notes 2007-2008. HPC Fundamentals
High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
System Models for Distributed and Cloud Computing
System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems
Uintah Framework. Justin Luitjens, Qingyu Meng, John Schmidt, Martin Berzins, Todd Harman, Chuch Wight, Steven Parker, et al
Uintah Framework Justin Luitjens, Qingyu Meng, John Schmidt, Martin Berzins, Todd Harman, Chuch Wight, Steven Parker, et al Uintah Parallel Computing Framework Uintah - far-sighted design by Steve Parker
Control 2004, University of Bath, UK, September 2004
Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
Climate-Weather Modeling Studies Using a Prototype Global Cloud-System Resolving Model
ANL/ALCF/ESP-13/1 Climate-Weather Modeling Studies Using a Prototype Global Cloud-System Resolving Model ALCF-2 Early Science Program Technical Report Argonne Leadership Computing Facility About Argonne
A Multi-layered Domain-specific Language for Stencil Computations
A Multi-layered Domain-specific Language for Stencil Computations Christian Schmitt, Frank Hannig, Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Workshop ExaStencils 2014,
Pros and Cons of HPC Cloud Computing
CloudStat 211 Pros and Cons of HPC Cloud Computing Nils gentschen Felde Motivation - Idea HPC Cluster HPC Cloud Cluster Management benefits of virtual HPC Dynamical sizing / partitioning Loadbalancing
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
Hank Childs, University of Oregon
Exascale Analysis & Visualization: Get Ready For a Whole New World Sept. 16, 2015 Hank Childs, University of Oregon Before I forget VisIt: visualization and analysis for very big data DOE Workshop for
Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),
MOSIX: High performance Linux farm
MOSIX: High performance Linux farm Paolo Mastroserio [[email protected]] Francesco Maria Taurino [[email protected]] Gennaro Tortone [[email protected]] Napoli Index overview on Linux farm farm
LOAD BALANCING FOR MULTIPLE PARALLEL JOBS
European Congress on Computational Methods in Applied Sciences and Engineering ECCOMAS 2000 Barcelona, 11-14 September 2000 ECCOMAS LOAD BALANCING FOR MULTIPLE PARALLEL JOBS A. Ecer, Y. P. Chien, H.U Akay
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
LAMMPS Developer Guide 23 Aug 2011
LAMMPS Developer Guide 23 Aug 2011 This document is a developer guide to the LAMMPS molecular dynamics package, whose WWW site is at lammps.sandia.gov. It describes the internal structure and algorithms
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of
A Load Balancing Tool for Structured Multi-Block Grid CFD Applications
A Load Balancing Tool for Structured Multi-Block Grid CFD Applications K. P. Apponsah and D. W. Zingg University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, M3H 5T6, Canada Email:
Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)
UC BERKELEY Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II) Anthony D. Joseph LASER Summer School September 2013 My Talks at LASER 2013 1. AMP Lab introduction 2. The Datacenter
P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE
1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. Bois-Préau. 92852 Rueil Malmaison Cedex. France
An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems
An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems Ardhendu Mandal and Subhas Chandra Pal Department of Computer Science and Application, University
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Qingyu Meng, Alan Humphrey, Martin Berzins Thanks to: John Schmidt and J. Davison de St. Germain, SCI Institute Justin Luitjens
High Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui
Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed
Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin. http://www.dell.com/clustering
Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin Reza Rooholamini, Ph.D. Director Enterprise Solutions Dell Computer Corp. [email protected] http://www.dell.com/clustering
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
Load balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
Scalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
Large-Scale Reservoir Simulation and Big Data Visualization
Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu [email protected] High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University
HPC enabling of OpenFOAM R for CFD applications
HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,
Mariana Vertenstein. NCAR Earth System Laboratory CESM Software Engineering Group (CSEG) NCAR is sponsored by the National Science Foundation
Leveraging the New CESM1 CPL7 Architecture - Current and Future Challenges Mariana Vertenstein NCAR Earth System Laboratory CESM Software Engineering Group (CSEG) NCAR is sponsored by the National Science
Operating System for the K computer
Operating System for the K computer Jun Moroo Masahiko Yamada Takeharu Kato For the K computer to achieve the world s highest performance, Fujitsu has worked on the following three performance improvements
Recommended hardware system configurations for ANSYS users
Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102
GCMs with Implicit and Explicit cloudrain processes for simulation of extreme precipitation frequency
GCMs with Implicit and Explicit cloudrain processes for simulation of extreme precipitation frequency In Sik Kang Seoul National University Young Min Yang (UH) and Wei Kuo Tao (GSFC) Content 1. Conventional
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
Multilevel Load Balancing in NUMA Computers
FACULDADE DE INFORMÁTICA PUCRS - Brazil http://www.pucrs.br/inf/pos/ Multilevel Load Balancing in NUMA Computers M. Corrêa, R. Chanin, A. Sales, R. Scheer, A. Zorzo Technical Report Series Number 049 July,
9/26/2011. What is Virtualization? What are the different types of virtualization.
CSE 501 Monday, September 26, 2011 Kevin Cleary [email protected] What is Virtualization? What are the different types of virtualization. Practical Uses Popular virtualization products Demo Question,
