Mesh Generation and Load Balancing


Stan Tomov
Innovative Computing Laboratory, Computer Science Department, The University of Tennessee
April 04, 2012
CS 594, 04/04/2012
Slide 1 / 19

Outline
- Motivation: reliable & efficient PDE simulations for high-end computing systems
- Background: PDE simulation concept; approximation is over a mesh
- Error Analysis: simulation error is related to local mesh size
- Adaptive Mesh Generation: support parallel refinement/derefinement and element migration
- Load Balancing: scalability of the computation on modern architectures
- Data structures
  - Algorithmically motivated: multigrid, domain decomposition, etc.
  - For performance optimization: architecture-aware computing
- Numerical Example
- Conclusions
Slide 2 / 19

Motivation
- PDE simulations have errors stemming from the numerical approximation (related to the mesh, ...)
- The need for PDE simulations for high-end computing systems that are
  - Reliable: error to be less than a desired tolerance, and
  - Efficient: do not do overkill computation
Slide 3 / 19

Background
- In general: error from the discretization is proportional to the mesh size
- A problem: localized physical phenomena deteriorate the approximation properties of classical PDE approximations
  - For example: flows near wells, faults, moving fronts, etc.
- How can we find a good mesh, i.e. one yielding a small and reliable error and an efficient computation?
Slide 4 / 19

Background
- Solution: (1) determine (automatically) the regions of singular behaviour, and (2) refine them in a balanced manner
- Example: efficiency of locally adapted vs uniform approximation of ... on an L-shaped domain
Slide 5 / 19

Background
Computational framework of the adaptive methods:
1. Solve PDE
2. Evaluate the approximation error
3. Is the error acceptable?
   - yes: done
   - no: improve the approximation (h/p refinement) and go back to step 1
i.e. a process of continuous feedback from the computation to find a reliable and efficient numerical PDE approximation
Slide 6 / 19
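The feedback loop above can be sketched on a toy problem. This is only an illustration under stated assumptions, not the method used in the lecture: the "solve" step is replaced by piecewise-linear interpolation of sqrt(x) on [0, 1] (a function with singular behaviour at x = 0), the error indicator is the interpolation error at the element midpoint, and the names adapt, indicator and uniform_elements are hypothetical.

```python
import math

def f(x):
    # Model function (an assumption): unbounded derivative at x = 0.
    return math.sqrt(x)

def indicator(a, b):
    # Local error indicator: error of the linear interpolant at the midpoint.
    mid = 0.5 * (a + b)
    return abs(f(mid) - 0.5 * (f(a) + f(b)))

def adapt(tol, max_iter=50):
    """Solve / estimate / refine loop on a 1D mesh given by its sorted nodes."""
    nodes = [0.0, 1.0]
    for _ in range(max_iter):
        # "Solve": the approximation is the interpolant on the current nodes.
        # "Estimate": one indicator per element.
        errs = [indicator(a, b) for a, b in zip(nodes, nodes[1:])]
        if max(errs) <= tol:
            break  # error acceptable: done
        # "Refine": bisect every element whose indicator exceeds the tolerance.
        new_nodes = [nodes[0]]
        for (a, b), e in zip(zip(nodes, nodes[1:]), errs):
            if e > tol:
                new_nodes.append(0.5 * (a + b))
            new_nodes.append(b)
        nodes = new_nodes
    return nodes

def uniform_elements(tol):
    """Smallest power-of-two uniform mesh meeting the same tolerance."""
    n = 1
    while max(indicator(i / n, (i + 1) / n) for i in range(n)) > tol:
        n *= 2
    return n

if __name__ == "__main__":
    tol = 1e-3
    print("adaptive elements:", len(adapt(tol)) - 1)
    print("uniform elements: ", uniform_elements(tol))
```

On this model function the adapted mesh needs far fewer elements than the uniform one, because refinement concentrates near the singular point.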

Error Analysis
The numerical solution of PDE (e.g. FEM)
- Boundary value problem: $Au = f$, subject to boundary conditions
- Get a weak formulation: $(Au, \varphi) = (f, \varphi)$, i.e. multiply by a test function $\varphi$ and integrate over the domain:
  $a(u, \varphi) = \langle f, \varphi \rangle$ for all $\varphi \in S$
- Galerkin (FEM) problem: find $u_h \in S_h \subset S$ s.t.
  $a(u_h, \varphi_h) = \langle f, \varphi_h \rangle$ for all $\varphi_h \in S_h$
- The error: $e = u - u_h$
- The Error problem:
  $a(e, \varphi) = a(u - u_h, \varphi) = \langle A(u - u_h), \varphi \rangle = \langle f - A u_h, \varphi \rangle = \langle R_h, \varphi \rangle$ for all $\varphi \in S$
- Various error estimators: depend on how we solve the Error problem
Slide 7 / 19
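As a concrete instance of the residual $R_h = f - A u_h$, the sketch below solves the model problem $-u'' = 1$ on (0, 1) with $u(0) = u(1) = 0$ using piecewise-linear elements, then evaluates a simple element-residual indicator. The model problem, the indicator $\eta_K = h_K \|R_h\|_{L^2(K)}$ (interior residual only, no jump terms) and all function names are assumptions chosen for illustration; this is one of many possible estimators. Inside each element $u_h'' = 0$, so the interior residual is just $f$.

```python
import math

def solve_poisson_1d(nodes, f):
    """Linear FEM for -u'' = f with u = 0 at both end nodes (needs >= 3 nodes)."""
    n = len(nodes) - 2                      # number of interior unknowns
    h = [b - a for a, b in zip(nodes, nodes[1:])]
    # Stiffness matrix a(phi_i, phi_j) = integral of phi_i' phi_j' (tridiagonal).
    diag = [1.0 / h[i] + 1.0 / h[i + 1] for i in range(n)]
    off = [-1.0 / h[i + 1] for i in range(n - 1)]
    # Load vector <f, phi_i> by nodal quadrature (exact for constant f).
    rhs = [f(nodes[i + 1]) * 0.5 * (h[i] + h[i + 1]) for i in range(n)]
    # Thomas algorithm for the symmetric tridiagonal system.
    for i in range(1, n):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]

def residual_indicators(nodes, f):
    # eta_K = h_K * ||f||_{L2(K)}, with the norm taken by the midpoint rule.
    return [(b - a) * math.sqrt(b - a) * abs(f(0.5 * (a + b)))
            for a, b in zip(nodes, nodes[1:])]

def energy_error_sq(nodes, uh):
    # Exact squared energy error for f = 1, where u'(x) = 1/2 - x.
    total = 0.0
    for a, b, ua, ub in zip(nodes, nodes[1:], uh, uh[1:]):
        c = 0.5 - (ub - ua) / (b - a)       # e'(x) = c - x on this element
        total += ((c - a) ** 3 - (c - b) ** 3) / 3.0
    return total

if __name__ == "__main__":
    mesh = [0.0, 0.1, 0.3, 0.6, 1.0]
    one = lambda x: 1.0
    uh = solve_poisson_1d(mesh, one)
    eta = residual_indicators(mesh, one)
    print("indicators:", eta)
    print("estimated / true (squared):",
          sum(e * e for e in eta) / energy_error_sq(mesh, uh))
```

For this model problem the indicators track the true energy error up to a constant factor, and the largest indicator sits on the largest element, which is the one a refinement step would mark.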

Adaptive Mesh Generation
good mesh ~ good approximation
- Huge area of research and software development
  - See Steven Owen's (Sandia) survey: http://www.andrew.cmu.edu/user/sowen/mesh.html
- Which one to choose?
  - Classification: structured/unstructured, element type, support for adaptivity or not, sequential/parallel, etc.
  - Algorithm requirements: conforming/non-conforming, problem size, etc.
  - For HPC on 100s of 1000s of processors: parallel adaptive
  - Software design: framework (application is embedded) or toolkits (CCA interface compliant)
- Important algorithmic issues to consider: low bookkeeping and storage overhead, easy data transfer between meshes, load balancing
Slide 8 / 19

Adaptive Mesh Generation
Mesh generation techniques
- Regenerate the mesh locally or globally
  - appealing only for steady-state problems
  - produce meshes with particular properties (Delaunay/Voronoi, etc.)
- Hierarchical refinement (common method of choice in AMR; used in ParaGrid *)
  - keep hierarchy of meshes
  - good for both steady & transient problems
  - algorithms to maintain mesh quality
- Various hybrid methods
  - h-refinement with various local node movements (r-refinement/mesh smoothing)
  - various patch-grid refinement strategies
  - techniques for coupling various grids, etc.
Slide 9 / 19

Adaptive Mesh Generation
Hierarchical mesh generation
- Element subdivision (e.g. tetrahedral edge bisection)
- Hierarchy is usually stored in a tree (e.g. quad/octrees in 2/3D)
  - Facilitates coarsening
  - Natural creation of multilevel data structures for multilevel solvers
- Research on various formats/tricks to reduce storage overhead
  - Exploring the deterministic nature of refinement (relation of parent-child elements)
Slide 10 / 19
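A minimal sketch of a refinement hierarchy stored in a tree, using 1D interval bisection in place of tetrahedral edge bisection (an assumption made to keep the example short). The class name Element and its methods are hypothetical. The current mesh is the set of leaves, coarsening drops the children of a node, and cutting the tree at a fixed depth gives the coarser meshes a multilevel solver would use.

```python
class Element:
    """One interval [a, b] in a bisection hierarchy."""

    def __init__(self, a, b, parent=None):
        self.a, self.b = a, b
        self.parent = parent
        self.children = []

    def is_leaf(self):
        return not self.children

    def refine(self):
        # Bisect a leaf; refining a non-leaf is a no-op.
        if self.is_leaf():
            mid = 0.5 * (self.a + self.b)
            self.children = [Element(self.a, mid, self),
                             Element(mid, self.b, self)]
        return self.children

    def coarsen(self):
        # Only a node whose children are all leaves may be coarsened.
        if self.children and all(c.is_leaf() for c in self.children):
            self.children = []
            return True
        return False

    def leaves(self):
        # The leaves, left to right, form the current (finest) mesh.
        if self.is_leaf():
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def mesh_at_level(self, level):
        # Mesh obtained by cutting the hierarchy at the given depth.
        if level == 0 or self.is_leaf():
            return [self]
        return [e for c in self.children for e in c.mesh_at_level(level - 1)]

if __name__ == "__main__":
    root = Element(0.0, 1.0)
    left, right = root.refine()
    left.refine()
    print([(e.a, e.b) for e in root.leaves()])
    print([(e.a, e.b) for e in root.mesh_at_level(1)])
```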

Load Balancing
- The need for load balance throughout the adaptive solution process
- Minimize idle time + interprocessor communication ~ scalability
- Partitioning for load balance and minimal interface is NP-complete, but there are many heuristics
- See the survey: Schloegel K., Karypis G. and Kumar V., "Graph Partitioning for High Performance Scientific Simulations", in CRPC Parallel Computing Handbook, Dongarra J., Foster I., Fox G., Kennedy K. and White A. (eds.), Morgan Kaufmann, 2000.
Slide 11 / 19
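The two objectives named above can be measured directly on a partitioned graph. The sketch below is illustrative only: the helper names (grid_graph, edge_cut, imbalance) are hypothetical, vertices have unit weight by default, and the number of cut edges is used as a stand-in for interprocessor communication.

```python
def grid_graph(nx, ny):
    """Adjacency lists of an nx-by-ny grid graph; vertices are (i, j) tuples."""
    adj = {(i, j): [] for i in range(nx) for j in range(ny)}
    for (i, j) in adj:
        for di, dj in ((1, 0), (0, 1)):
            nb = (i + di, j + dj)
            if nb in adj:
                adj[(i, j)].append(nb)
                adj[nb].append((i, j))
    return adj

def edge_cut(adj, part):
    # Number of edges whose endpoints lie in different parts
    # (each undirected edge is counted once).
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def imbalance(part, nparts, weight=None):
    # Maximum part load divided by the average part load (1.0 = perfect).
    loads = [0.0] * nparts
    for v, p in part.items():
        loads[p] += weight[v] if weight else 1.0
    return max(loads) / (sum(loads) / nparts)

if __name__ == "__main__":
    g = grid_graph(4, 4)
    halves = {(i, j): 0 if i < 2 else 1 for (i, j) in g}
    checker = {(i, j): (i + j) % 2 for (i, j) in g}
    for name, part in (("halves", halves), ("checkerboard", checker)):
        print(name, "imbalance:", imbalance(part, 2), "edge cut:", edge_cut(g, part))
```

Both partitions are perfectly balanced, yet the checkerboard cuts every edge of the grid while the two halves cut only the edges crossing the middle, so balance alone says little about communication.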

History of partitioning algorithms
(Vipin Kumar: http://www.ima.umn.edu/talks/workshops/2-9-13.2000/kumar/kumar.pdf)
- Space filling curve
- Cartesian nested dissection
- Multilevel partitioning (available in Metis, Jostle, Party)
Slide 12 / 19
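Of the approaches listed, space-filling-curve partitioning is the simplest to sketch: order the cells along the curve, then cut the ordering into contiguous chunks of equal size. The example below assumes a Morton (Z-order) curve on integer cell coordinates and unit cell weights; morton_key and sfc_partition are hypothetical names.

```python
def morton_key(i, j, bits=16):
    """Interleave the bits of (i, j) to get the cell's position on a Z-order curve."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b)
        key |= ((j >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(cells, nparts):
    """Assign cells to parts by cutting the curve ordering into equal chunks."""
    order = sorted(cells, key=lambda c: morton_key(*c))
    n = len(order)
    return {c: rank * nparts // n for rank, c in enumerate(order)}

if __name__ == "__main__":
    cells = [(i, j) for i in range(4) for j in range(4)]
    part = sfc_partition(cells, 4)
    # Print the part number of each cell, row by row (j is the row index).
    for j in range(4):
        print(" ".join(str(part[(i, j)]) for i in range(4)))
```

On a 4 x 4 grid with four parts, each part comes out as one 2 x 2 quadrant: the curve keeps nearby cells close in the ordering, which is why cutting it gives reasonably compact subdomains at very low cost.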

Dynamic Load Balancing
Issues to consider:
- Load balance (what about DD with conditioned subdomain matrices?)
- Minimize edge cut
- Minimize data redistribution cost (most expensive)
- Rebuild internal and shared data structures
- What about balance for multilevel data structures?
Two main techniques (ParMETIS supports both):
- Diffusive (diffuse load among neighbors)
- Global (global repartition + smart remapping to minimize redistribution cost)
Which one to choose? See for example: R. Biswas, S. Das, D. Harvey, L. Oliker, "Parallel Dynamic Load Balancing Strategies for Adaptive Irregular Applications"
Slide 13 / 19
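The diffusive technique can be illustrated with a first-order diffusion iteration: in every step each processor exchanges with each neighbor a fraction alpha of their load difference. This is a generic textbook scheme shown under assumptions (a fixed processor graph, divisible load, a hand-picked alpha); it is not a description of what ParMETIS does internally, and diffuse is a hypothetical name.

```python
def diffuse(load, neighbors, alpha, steps):
    """Run `steps` sweeps of first-order diffusive load balancing.

    load[i]      -- current load of processor i
    neighbors[i] -- list of processors adjacent to i (symmetric graph)
    alpha        -- fraction of each pairwise load difference moved per step
                    (should be below 1 / max degree for stability)
    """
    load = list(load)
    for _ in range(steps):
        # Net inflow of each processor: sum of alpha * (neighbor load - own load).
        flow = [sum(alpha * (load[j] - load[i]) for j in neighbors[i])
                for i in range(len(load))]
        load = [l + f for l, f in zip(load, flow)]
    return load

if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]   # 4 processors in a ring
    start = [100.0, 0.0, 0.0, 0.0]            # all work on processor 0
    for steps in (0, 1, 2, 10):
        print(steps, diffuse(start, ring, 0.25, steps))
```

The total load is conserved at every step, and only neighbor-to-neighbor transfers occur, which keeps the redistribution local. The price is that a strongly unbalanced state can take many sweeps to even out, which is one reason a global repartition is sometimes preferred.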

Data Structures
- Adaptive methods run at a fraction of the peak performance of cache-based machines
  - This is due to irregular memory access patterns, caused by their dynamic nature and the unstructured sparse matrices produced
  - Can be improved, but efficient parallel programming is difficult for this class of problems
- A lot of current work in the field is on data structures
  - Improve memory access patterns for better cache reuse (to be discussed further in Lecture #3)
Slide 14 / 19

Data Structures
In particular: for adaptive mesh generation
- Need distributed tree (quadtree/octree) for efficient derefinement
- Contiguous storage space + hash table access (performance)
- Use the deterministic nature of the refinement/coarsening (minimize data storage)
Slide 15 / 19
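One way to use the deterministic nature of refinement is to give every quadtree cell a key from which its parent and children can be computed, so no pointers need to be stored and cells are found by hash lookup. The sketch below assumes keys of the form (level, i, j) and uses a Python dict as the hash table; a dict stands in for the contiguous storage mentioned above, and the names children, parent and HashQuadtree are hypothetical.

```python
def children(key):
    # The four children of cell (level, i, j) follow from index arithmetic.
    level, i, j = key
    return [(level + 1, 2 * i + di, 2 * j + dj) for dj in (0, 1) for di in (0, 1)]

def parent(key):
    level, i, j = key
    return (level - 1, i // 2, j // 2)

class HashQuadtree:
    """Leaf cells of a quadtree stored in a hash table: key -> cell data."""

    def __init__(self, root_value=0.0):
        self.cells = {(0, 0, 0): root_value}

    def refine(self, key):
        # Replace a leaf by its four children; each child inherits the data.
        value = self.cells.pop(key)
        for child in children(key):
            self.cells[child] = value

    def coarsen(self, key):
        # Merge four sibling leaves back into their parent (data averaged).
        kids = children(key)
        if not all(k in self.cells for k in kids):
            return False
        values = [self.cells.pop(k) for k in kids]
        self.cells[key] = sum(values) / len(values)
        return True

if __name__ == "__main__":
    tree = HashQuadtree()
    tree.refine((0, 0, 0))
    tree.refine((1, 1, 0))
    print(sorted(tree.cells))
    print(parent((2, 3, 1)))
```

Only the leaves are stored here; interior cells are implied by the keys, which is the storage saving the deterministic parent-child relation makes possible.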

Data Structures
- Algorithmically motivated
  - Multigrid
  - Domain decomposition
- For performance of parallel matrix-vector product
  - Pre-compute & block inter-processor communication patterns
  - Index ordering (SAW, space filling curves, Cuthill-McKee, etc.) and structures for sparse matrix storage
    - for register blocking and cache blocking, see the Sparsity & BeBOP projects at Berkeley
    - techniques: similar to blocking for dense matrices
    - architecture dependent (need processor-specific tuning)
Slide 16 / 19
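As an illustration of index ordering, the sketch below implements a basic Cuthill-McKee ordering (breadth-first search from a low-degree vertex, visiting neighbors by increasing degree) and measures the matrix bandwidth before and after. It is a plain textbook variant written for this example; the function names and the tiny test graph are assumptions.

```python
from collections import deque

def cuthill_mckee(adj):
    """Return a vertex ordering; adj[v] lists the neighbors of vertex v."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * n
    order = []
    # Start each connected component from its lowest-degree unvisited vertex.
    for start in sorted(range(n), key=lambda v: (degree[v], v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbors in order of increasing degree.
            for w in sorted(adj[v], key=lambda w: (degree[w], w)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order

def bandwidth(adj, order):
    # Largest distance |pos(u) - pos(v)| over all edges under the ordering.
    pos = {v: k for k, v in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u in range(len(adj)) for v in adj[u])

if __name__ == "__main__":
    # A path 0 - 3 - 1 - 4 - 2 whose vertices are numbered badly.
    adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
    natural = list(range(len(adj)))
    order = cuthill_mckee(adj)
    print("natural bandwidth:  ", bandwidth(adj, natural))
    print("reordered bandwidth:", bandwidth(adj, order), "with order", order)
```

A smaller bandwidth means the entries of the vector touched by one matrix row sit closer together in memory, which is the cache-reuse effect the ordering is after.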

When does register/cache blocking work?
(see R. Nishtala et al., 2006; R. Vuduc, 2003; Berkeley optimization group)
- Discontinuous Galerkin FEM is of great current interest
  - Mesh and nonzero matrix structures for approximation of order 1 and 3 (pictures from D. Darmofal, 2004; MIT; aerospace applications)
  - Naturally occurring dense blocks: open possibilities for various register and multiple-level cache blocking techniques
- Tuning: machine dependent
  - Performance can be surprising (Vuduc, 2003)
  - Need for automatic machine-specific search
- Register blocking for a sparse 3-diagonal matrix consisting of 8x8 dense blocks (on Intel Itanium 2)
  - explored by storing them as 8x8 blocks
  - what about r x c block storage?
  - shown is the speedup relative to unblocked 1 x 1 code
Slide 17 / 19
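The r x c block storage mentioned above can be sketched as a block compressed sparse row (BCSR) matrix-vector product. The layout below (row_ptr, col_idx, and row-major dense blocks) is a common convention assumed for this example, and bcsr_matvec is a hypothetical name. Pure Python shows only the data structure and loop nest; the speedups discussed on the slide come from compiled kernels in which the inner r x c loops are unrolled and kept in registers.

```python
def bcsr_matvec(r, c, row_ptr, col_idx, blocks, x):
    """y = A x for a matrix stored in r-by-c block CSR format.

    row_ptr -- block row b owns blocks row_ptr[b] .. row_ptr[b + 1] - 1
    col_idx -- block column index of each stored block
    blocks  -- each block is a list of r * c values in row-major order
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * r)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            block = blocks[k]
            # Dense r x c kernel: one index lookup serves r * c multiply-adds.
            for ii in range(r):
                acc = 0.0
                for jj in range(c):
                    acc += block[ii * c + jj] * x[bj * c + jj]
                y[bi * r + ii] += acc
    return y

if __name__ == "__main__":
    # 4 x 4 matrix made of 2 x 2 blocks:
    #   [1 2 | 1 0]
    #   [3 4 | 0 1]
    #   [0 0 | 5 6]
    #   [0 0 | 7 8]
    row_ptr = [0, 2, 3]
    col_idx = [0, 1, 1]
    blocks = [[1, 2, 3, 4], [1, 0, 0, 1], [5, 6, 7, 8]]
    print(bcsr_matvec(2, 2, row_ptr, col_idx, blocks, [1.0, 1.0, 1.0, 1.0]))
```

Compared with 1 x 1 storage, one column index is stored per block instead of per nonzero, at the cost of storing explicit zeros when the block shape does not match the matrix structure. That trade-off is why the best r x c has to be searched for per matrix and per machine.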

Numerical Example
An example of contaminant flow in porous media: http://www.cs.utk.edu/~tomov/cflow/
(figure annotations: (0,0,0); (1000,500,500); 30 mg/l)
Slide 18 / 19

Conclusions
- Adaptive methods: a computational methodology for reliable and efficient numerical solution of PDE problems
- Multidisciplinary field: CS, math, engineering
  - Need multidisciplinary effort for their successful development
- Overview of the CS aspects
- The goal: develop the methodology and build it into an intelligent, adaptable, reconfigurable system for current and next-generation supercomputers
Slide 19 / 19