Mesh Partitioning and Load Balancing



Similar documents
Guide to Partitioning Unstructured Meshes for Parallel Computing

Mesh Generation and Load Balancing

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Partitioning and Dynamic Load Balancing for Petascale Applications

Scientific Computing Programming with Parallel Objects

Hypergraph-based Dynamic Load Balancing for Adaptive Scientific Computations

Fast Multipole Method for particle interactions: an open source parallel library component

Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

Optimizing Load Balance Using Parallel Migratable Objects

New Challenges in Dynamic Load Balancing

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008

Mining Association Rules on Grid Platforms

FD4: A Framework for Highly Scalable Dynamic Load Balancing and Model Coupling

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Mesh Partitioning for Parallel Computational Fluid Dynamics Applications on a Grid

Parallel Simplification of Large Meshes on PC Clusters

A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS

Load Balancing in Structured Peer to Peer Systems

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Load Balancing in Structured Peer to Peer Systems

Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations

Lecture 12: Partitioning and Load Balancing

LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS Hermann Härtig

Dynamic Load Balancing for Cluster Computing Jaswinder Pal Singh, Technische Universität München. singhj@in.tum.de

Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC

Tutorial: Partitioning, Load Balancing and the Zoltan Toolkit

Distributed communication-aware load balancing with TreeMatch in Charm++

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load Balancing Strategies for Multi-Block Overset Grid Applications NAS

Load Balancing Techniques

Dynamic Load Balancing of SAMR Applications on Distributed Systems y

Load Balancing. Load Balancing 1 / 24

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG

Load Balancing Strategies for Parallel SAMR Algorithms

It s Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU

Parallelization: Binary Tree Traversal

Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming)

HPC Deployment of OpenFOAM in an Industrial Setting

Load Balancing and Communication Optimization for Parallel Adaptive Finite Element Methods

Partitioning and Load Balancing for Emerging Parallel Applications and Architectures

Load Balancing Of Parallel Monte Carlo Transport Calculations

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

Evaluating partitioning of big graphs

Introduction to DISC and Hadoop

Load-Balancing Spatially Located Computations using Rectangular Partitions

Molecular simulation of fluid dynamics on the nanoscale

Designing Dashboards and Scorecards for End-User Needs. Jim Hadley

1. Relational database accesses data in a sequential form. (Figures 7.1, 7.2)

walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation

Topological Properties

Big Data Graph Algorithms


GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Charm++, what s that?!

Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4

DYNAMIC GRAPH ANALYSIS FOR LOAD BALANCING APPLICATIONS

Load Balancing in Downlink LTE Self-Optimizing Networks

Load Balancing on Massively Parallel Networks: A Cellular Automaton Approach

Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization

Hadoop WordCount Explained! IT332 Distributed Systems

PPD: Scheduling and Load Balancing 2

A Novel Switch Mechanism for Load Balancing in Public Cloud

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

Scheduling Shop Scheduling. Tim Nieberg

areprovidedtoviewprograminformationgatheredbythecompilerandrelateittoinformation

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Load Balancing on a Grid Using Data Characteristics

Transcription:

and Load Balancing Contents: Introduction / Motivation Goals of Load Balancing Structures Tools Slide Flow Chart of a Parallel (Dynamic) Application Partitioning of the initial mesh Computation Iteration step Data exchange + Meshrefinement and/or - movement Imbalance Loadredistribution Efficiency loss Slide 2 32. and Load Balancing 32. 32-

Structure of a partitioned mesh Halo-Cells Global mesh Partitioned mesh Slide 3 Application Example from HPS-ICE Project Example: Model DBAG, 360 angle of crankshaft, 48.342 mesh cells Slide 4 32. and Load Balancing 32. 32-2

Application Example from HPS-ICE Project At 540 angle of crankshaft: 273.738 mesh cells Slide 5 Load Imbalance Between 360 and 540 : Increase of meshsize by a factor of 2 Without Loadbalancing: 2 Processors are sharing the additional load Significant loss of efficiency Slide 6 32. and Load Balancing 32. 32-3

Goals of Loadbalancing. Load Balance 2. Minimal communication costs - minimize the number of boundary cells, created by the mesh partitioner - minimize the number of processors, one has to communicate with 3. Minimal migration costs Slide 7 Steps of a Loadbalancing Procedure 2 steps of load distribution:. Decision: - how many and - which cells should be moved - where Load balancing algorithms ( Tools) 2. Migration: move the mesh cells to the selected processors has to be done by the application program Slide 8 32. and Load Balancing 32. 32-4

How Do Load Balancer Operate? How can I handle the problem, using [known] methods/algorithms? Mapping of the problem to known structures: Graphs Nodes of the graph computation costs Edges of the graph communication costs Slide 9 Mapping Mesh Õ Graph Mesh Graph Slide 0 32. and Load Balancing 32. 32-5

Graph Further information represented in a graph: Computation costs Weighting of nodes Communication costs Weighting of edges Example: 2 2 3 3 Slide Goals of Graph Algorithms. Equal number of nodes within each partition (weighted: equal sum of node weights within each partition) Partition Partition 2 2 2 3 3 load balance: 00 % (weight of cut edges: 4) Slide 2 32. and Load Balancing 32. 32-6

Goals of Graph Algorithms 2. Minimal number of cut edges (weighted: minimal sum of cut edges weight) Partition 2 Partition 2 2 3 3 load balance: 00 % weight of cut edges: 3 Slide 3 Goals of Graph Algorithms 2a. minimal processor degree (node degree: number of neighbours) Max. processor degree: 3 Max. processor degree: 2 Slide 4 32. and Load Balancing 32. 32-7

Goals of Graph Algorithms Task for non weighted graphs: find a decomposition of the graph, with preferably the same number of nodes within each partition with a minimal number of cut edges between the partitions NP-complete problem (it cannot be solved in reasonable time in case of bigger problems) use of heuristics, which give a solution near the optimal one Slide 5 Tools Jostle: (Chris Walshaw, University of Greenwich) initial graph partitioning dynamic load distribution serial and parallel http://www.gre.ac.uk/jostle HLRS: http://www.hlrs.de/organization/par/ services/tools/loadbalancer/jostle.html Metis: (George Karypis, University of Minnesota) multilevel partitioning algorithms initial partitioning serial (parmetis: parallel version) http://www.cs.umn.edu/~karypis/metis/metis.html HLRS: http://www.hlrs.de/organization/par/ services/tools/loadbalancer/metis.html Slide 6 32. and Load Balancing 32. 32-8

Jostle operates with an already partitioned and distributed mesh in the initial case: at first partitioning of the mesh using simple methods operates in general parallel no need of gathering the data on one processor partitioning effort is spread to the processors migrates boundary layers to the neighbouring processors minimization of cell migration Slide 7 Jostle available as stand-alone program (only serial version) and as procedure call (C und FORTRAN) different graph formats possible controlling the behaviour is possible by using parameters (e. g. load imbalance) Call (C): void pjostle(int *nnodes, int *offset, int *core, int *halo, int *index, int *degree, int *node_wt, int *partition, int *local_nedges, int *edges, int *edge_wt, int *network, int *processor_wt, int *output_level, int *dimension, double *coords); Slide 8 32. and Load Balancing 32. 32-9

Jostle (serial) Balance: 00 % Cut edges: 0.509 Run time (SGI R4600): 58,9 sec 8 Partitions, 273.738 Cells Slide 9 Jostle (parallel) Balance: 00 % Cut edges: 0.638 Run time (SGI R4600):532,2 sec (in case of running 8 MPI-Processes) 8 Partitions, 273.738 Cells Slide 20 32. and Load Balancing 32. 32-0

Metis serial graph partitioning available as standalone program and as procedure call (C und FORTRAN) fixed data format consecutive cell numbering required Call(C): METIS_PartGraphRecursive(int *n, idxtype *xadj, idxtype *adjncy, idxtype *vwgt, idxtype *adjwgt, int *wgtflag, int *numflag, int *nparts, int *options, int *edgecut, idxtype *part); Slide 2 ParMetis static & dynamic partitioning consecutive cell numbering required no settings possible Call for graph repartitioning (C): ParMETIS_RepartLDiffusion(idxtype *vtxdist, idxtype *xadj, idxtype *adjncy, idxtype *vwgt, idxtype *adjwgt, int *wgtflag, int *numflag, int *options, int *edgecut, idxtype *part, MPI_Comm *comm); Slide 22 32. and Load Balancing 32. 32-

Metis Balance: 00 % Cut edges: 0.007 Run time (SGI R4600): 34,8 sec 8 Partitions, 273.738 Cells Slide 23 Tools: Comparison Tool Jostle Ser. Par. Cut edges Run time (sec) + - serial & parallel arbitrary cell numbering various user settings 0.509 0.638 58,9 532,2 Metis 0.007 34,8 fast good results worse results serial Slide 24 32. and Load Balancing 32. 32-2

Summary Strategy:. At first: initial partitioning: a. choose appropriate serial partitioner (Jostle, Metis) b. distribute the cells to the processors 2. In case of an existing load imbalance: dynamic load balancing: a. cost-benefit-calculation b. in case of benefit: when migration tool available: parallel LB Jostle else: gather cell info on a particular processor proceed like in case of initial partitioning Slide 25 Overview Initial Partitioning Chaco Metis Jostle Submesh Submesh Submesh Processor Computation Load imbalance Processor 2 Computation Load imbalance Processor n Computation Load imbalance Cost - Benefit - Calculation Cost - Benefit - Calculation Cost - Benefit - Calculation >> >> >> >>.... >> >> Migration tool Migration tool Migration tool No No No Yes Yes Yes Collect cell information Jostle Cell exchange Collect cell information Jostle Cell exchange Collect cell information Jostle Cell exchange Central processor Slide 26 32. and Load Balancing 32. 32-3