walberla: A software framework for CFD applications on 300.000 Compute Cores


walberla: A software framework for CFD applications on 300.000 Compute Cores (294.912)
J. Götz (LSS Erlangen, jan.goetz@cs.fau.de), K. Iglberger, S. Donath, C. Feichtinger, U. Rüde
Lehrstuhl für Informatik 10 (Systemsimulation), www10.informatik.uni-erlangen.de
Multiscale Fluid Dynamics with the Lattice Boltzmann Method, 15 February 2011, Lorentz Center, Leiden

Overview
Motivation: Why another CFD Package? Why Parallel Programming?
The Software walberla: A Software Framework for CFD
Fluid-Structure Interaction with Moving Rigid Objects
Rigid Body Dynamics for Granular Media
Free Surface Flow Simulations
GPU Computing
Conclusions

Motivation

Why do we need another CFD program?
Over the last years, many PhD students at our chair wrote nice programs for different CFD applications, but:
Programming and testing basic functionality takes a lot of time
Parallelizing takes even more time
When a PhD student leaves the chair, the program is usually not used any more, since nobody knows how to use it

Why Parallel Programming?

Why Parallel Programming? (2)
The latest standard processors are multicore processors
The free lunch is over
To exploit multicore performance, parallel algorithms are essential
CPUs will have 2, 4, 8, 16, ..., 128, ..., ??? cores
We want to simulate problems that are not possible on standard computers

The Software

walberla
Created for desktop PCs and supercomputers
Supporting multi-core PCs and GPUs
Modular software concept
Supports various applications:
Blood flow in aneurysms
Moving particles and agglomerates
Free surfaces to simulate foams, fuel cells, and many more
Charged colloids

Fluid-Structure Interaction with Moving Rigid Objects

Why Simulate Fluid-Structure Interaction?
The transport of solid particles is crucial for:
Understanding physical phenomena
Industrial processes
But:
A fully resolved simulation of the obstacles is computationally expensive
Up to now, only a moderate number of obstacles can be simulated

Fluid-Structure Interaction

Rigid Body Dynamics
Newton's laws of motion, including rotations
Contact detection in each time step
Collisions modelled by:
coefficient of restitution: forces in normal direction
friction laws: forces in tangential direction
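To make the contact model above concrete, here is a minimal, hedged sketch of an impulse-based response for a single detected contact: a coefficient of restitution e controls the impulse in the normal direction, and Coulomb friction mu caps the tangential part. This is illustrative physics only, not the actual walberla rigid body interface; rotation and angular momentum are omitted for brevity.

    #include <algorithm>
    #include <cmath>

    struct Vec3 { double x, y, z; };
    inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    inline Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
    inline double dot(Vec3 a, Vec3 b)      { return a.x * b.x + a.y * b.y + a.z * b.z; }
    inline double norm(Vec3 a)             { return std::sqrt(dot(a, a)); }

    struct Body { double mass; Vec3 velocity; };

    // Resolve one detected contact between bodies a and b.
    // n  : unit contact normal pointing from a to b
    // e  : coefficient of restitution (normal direction)
    // mu : Coulomb friction coefficient (tangential direction)
    void resolveContact(Body& a, Body& b, const Vec3& n, double e, double mu)
    {
        Vec3 vRel = b.velocity - a.velocity;     // relative velocity at the contact
        double vn = dot(vRel, n);
        if (vn >= 0.0) return;                   // bodies are already separating

        double mEff = 1.0 / (1.0 / a.mass + 1.0 / b.mass);

        // Normal impulse from the restitution model.
        double jn = -(1.0 + e) * vn * mEff;

        // Tangential (friction) impulse, limited by the Coulomb cone mu * jn.
        Vec3 vt      = vRel - vn * n;
        double vtMag = norm(vt);
        Vec3 jt{0.0, 0.0, 0.0};
        if (vtMag > 1e-12)
            jt = (-std::min(mEff * vtMag, mu * jn) / vtMag) * vt;

        Vec3 j = jn * n + jt;                    // total impulse applied to b
        a.velocity = a.velocity - (1.0 / a.mass) * j;
        b.velocity = b.velocity + (1.0 / b.mass) * j;
    }

After the impulse, the relative normal velocity is -e times its pre-impact value, which is exactly what the coefficient of restitution prescribes.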

Hourglass Simulation
1 250 000 spherical particles, 256 CPUs, 300 300 time steps
Runtime: 48 h (including data output)

Mapping Moving Obstacles into the LBM Fluid Grid: An Example

Mapping Moving Obstacles into the LBM Fluid Grid: An Example (2)
Cells with a state change from particle to fluid
Cells with a state change from fluid to particle
PDFs acting as a force: momentum calculation
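A hedged sketch of the "PDFs acting as a force" step: in a momentum-exchange treatment, every PDF that is bounced back along a link pointing into a moving obstacle transfers momentum to that obstacle; summing these contributions over all boundary links gives the hydrodynamic force on the rigid body. The helper below is illustrative only (the names, the container types, and the simplified Ladd-style moving-wall term are assumptions, not the walberla interface); lattice units with dx = dt = 1 are assumed.

    #include <cstddef>
    #include <vector>

    struct Vec3 { double x = 0.0, y = 0.0, z = 0.0; };

    // Force contribution of one fluid cell adjacent to a moving obstacle.
    // f                : the Q PDFs f_i of the fluid cell
    // c, w             : discrete velocities c_i and lattice weights w_i
    // linkHitsObstacle : true for directions i whose neighbour cell is an obstacle cell
    // wallVelocity     : velocity of the obstacle surface at this cell
    // rho              : local fluid density
    Vec3 momentumExchange(const std::vector<double>& f,
                          const std::vector<Vec3>&   c,
                          const std::vector<double>& w,
                          const std::vector<bool>&   linkHitsObstacle,
                          const Vec3& wallVelocity, double rho)
    {
        const double invCs2 = 3.0;   // 1 / c_s^2 for standard LBM lattices
        Vec3 force;
        for (std::size_t i = 0; i < f.size(); ++i) {
            if (!linkHitsObstacle[i]) continue;
            const double cu = c[i].x * wallVelocity.x
                            + c[i].y * wallVelocity.y
                            + c[i].z * wallVelocity.z;
            // PDF after bounce-back at a moving wall (Ladd-style correction)
            const double fBounced = f[i] - 2.0 * w[i] * rho * invCs2 * cu;
            // Momentum transferred to the obstacle along this link per time step
            const double dp = f[i] + fBounced;
            force.x += dp * c[i].x;
            force.y += dp * c[i].y;
            force.z += dp * c[i].z;
        }
        return force;   // accumulate over all adjacent fluid cells, then apply to the rigid body
    }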

The Algorithm
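The slide carries only the heading, so as a reading aid here is a minimal pseudocode sketch of one coupled time step as it can be pieced together from the surrounding slides (mapping the obstacles, LBM step, momentum calculation, rigid body dynamics, MPI communication). All function names are illustrative placeholders, not the actual walberla interface, and the exact ordering is an assumption.

    // One coupled LBM / rigid-body time step, pieced together from the slides.
    // All names are illustrative stubs, not the walberla interface.
    void mapMovingObstaclesToGrid()  { /* flag cells fluid/particle, init cells freed by particles */ }
    void streamAndCollide()          { /* standard LBM step on all fluid cells                     */ }
    void computeHydrodynamicForces() { /* momentum exchange: PDFs acting as forces                 */ }
    void communicateGhostLayers()    { /* MPI exchange of PDFs and forces between processes        */ }
    void rigidBodyDynamicsStep()     { /* contact detection, collision response, time integration
                                          of Newton's equations including rotation                 */ }

    void coupledTimeStep()
    {
        mapMovingObstaclesToGrid();
        streamAndCollide();
        computeHydrodynamicForces();
        communicateGhostLayers();
        rigidBodyDynamicsStep();
    }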

Virtual Fluidized Bed
512 processors
Simulation domain size: 180x198x360 LBM cells
900 capsules and 1008 spheres = 1908 objects
Number of time steps: 252,000
Run time: 7 h 12 min

Simulation of a Segregation Process
Segregation simulation of 242,200 objects
Density values of 0.8 kg/dm³ and 1.2 kg/dm³ are used for the objects in water

Weak Scaling
[Plot: weak-scaling efficiency (0.5 to 1.0) versus number of cores (100 to 300,000) on Jugene, the Blue Gene/P at the Jülich Supercomputing Centre, for 40x40x40 and 80x80x80 lattice cells per core]
Scaling from 64 to 294 912 cores
Densely packed particles
150 994 944 000 lattice cells
264 331 905 rigid spherical objects
Largest simulation to date: 8 trillion (10¹²) variables per time step (LBM alone), 50 TByte
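For readers unfamiliar with the chart: weak scaling keeps the work per core fixed (here 40x40x40 or 80x80x80 lattice cells per core) and grows the total problem with the core count. Assuming the usual definition (the slides do not spell it out), the plotted efficiency is

    E(P) = \frac{T(P_{\mathrm{ref}})}{T(P)}, \qquad P_{\mathrm{ref}} = 64,

where T(P) is the runtime on P cores; E(P) staying close to 1 up to 294 912 cores means the cost per core barely grows with machine size.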

Free Surface Flow Simulation for foams, fuel cells, food processing, etc.

Free Surface Flow Simulation
Example applications:
Engineering: metal foam simulations
Food processing
Fuel cells
Based on LBM:
Free surfaces
Surface tension and wetting model
Parallelization with MPI

The Interface between Liquid and Gas
Volume-of-Fluid-like approach
Flag field: compute only in fluid cells
Special free surface conditions in interface cells
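A minimal sketch of the flag-field idea described above, assuming a simple flat 3D array of cell states: each lattice cell is gas, fluid, interface, or obstacle; the LBM kernel computes only in fluid and interface cells; and the special free-surface conditions apply in interface cells only. The types and names are illustrative assumptions, not the walberla data structures.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    enum class CellState : std::uint8_t { Gas, Fluid, Interface, Obstacle };

    // Flat 3D flag field storing one state per lattice cell.
    struct FlagField {
        int nx, ny, nz;
        std::vector<CellState> state;               // size nx * ny * nz
        FlagField(int x, int y, int z)
            : nx(x), ny(y), nz(z),
              state(static_cast<std::size_t>(x) * y * z, CellState::Gas) {}
        CellState& at(int x, int y, int z) {
            return state[(static_cast<std::size_t>(z) * ny + y) * nx + x];
        }
    };

    // The LBM kernel skips gas and obstacle cells entirely.
    inline bool needsLbmUpdate(CellState s)
    {
        return s == CellState::Fluid || s == CellState::Interface;
    }

    // The special free-surface (Volume-of-Fluid-like) boundary treatment
    // applies only in interface cells, where liquid and gas meet.
    inline bool needsFreeSurfaceTreatment(CellState s)
    {
        return s == CellState::Interface;
    }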

LBM on Clusters with GPUs

Motivation
Why should we use GPUs for LBM simulations?
GPUs currently offer a very high peak performance
Basic LBM performs well on GPUs
Programming GPUs has become simpler than it was years ago
Why should we use heterogeneous simulations?
CPUs are available anyway on GPU nodes
Do not waste these resources

                                 NVidia Fermi   Intel Xeon Node
GFLOPS (single precision)                1030               200
GFLOPS (double precision)                 515               100
Peak memory bandwidth (GB/s)              148                61

Conclusions

Conclusions
Desktop PCs and notebooks will get 2, 4, 8, 16, ..., 128, ... cores
Parallel programming is necessary for multicore usage
walberla supports:
Multicore systems
Supercomputers
Accelerators (GPU, Cell)
A variety of different applications

Future Work
OK, the framework is working fine for many applications, but:
Test cases for validation of particulate flows and free surfaces
Any suggestions for moving particles (especially ensembles) are welcome ;-)
Grid refinement + load balancing
How to deal with massive parallelization:
Node crashes
Postprocessing
Restart mechanisms
How to maintain the software?

Thank you for your attention!