walberla: A Software Framework for CFD Applications on 300,000 (294,912) Compute Cores
J. Götz (LSS Erlangen, jan.goetz@cs.fau.de), K. Iglberger, S. Donath, C. Feichtinger, U. Rüde
Lehrstuhl für Informatik 10 (Systemsimulation), www10.informatik.uni-erlangen.de
Multiscale Fluid Dynamics with the Lattice Boltzmann Method, 15 February 2011, Lorentz Center, Leiden
Overview
  Motivation: Why another CFD package? Why parallel programming?
  The software: walberla, a software framework for CFD
  Fluid-structure interaction with moving rigid objects
  Rigid body dynamics for granular media
  Free surface flow simulations
  GPU computing
  Conclusions
Motivation
Why do we need another CFD program?
  In recent years, many PhD students at our chair wrote nice programs for different CFD applications, but:
  Programming and testing basic functionality takes a lot of time
  Parallelizing takes even more time
  When a PhD student leaves the chair, the program usually falls out of use, since nobody knows how to use it any more
Why Parallel Programming?
Why Parallel Programming? (2)
  Today's standard processors are multicore processors
  The free lunch is over
  To exploit multicore performance, parallel algorithms are essential
  CPUs will have 2, 4, 8, 16, ..., 128, ..., ??? cores
  We want to simulate problems that are not feasible on standard computers
The Software
walberla
  Created for desktop PCs and supercomputers
  Supports multi-core PCs and GPUs
  Modular software concept (see the sketch below)
  Supports various applications:
    Blood flow in aneurysms
    Moving particles and agglomerates
    Free surfaces to simulate foams, fuel cells, and much more
    Charged colloids
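As an illustration of such a modular concept, here is a minimal sketch in which independent "sweeps" over the grid are registered with a time loop and executed one after the other; the class and function names are hypothetical, not the actual walberla API.

```cpp
// Illustrative sketch of a modular sweep concept (not the actual walberla API).
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Block { /* lattice data of one subdomain would live here */ };

// A sweep is any callable that updates one block for one time step.
using Sweep = std::function<void(Block&)>;

class Timeloop {
public:
    void add(std::string name, Sweep sweep) {
        sweeps_.emplace_back(std::move(name), std::move(sweep));
    }
    void run(std::vector<Block>& blocks, int timeSteps) {
        for (int t = 0; t < timeSteps; ++t)
            for (auto& entry : sweeps_)        // e.g. LBM stream/collide, boundaries, ...
                for (auto& block : blocks)
                    entry.second(block);
    }
private:
    std::vector<std::pair<std::string, Sweep>> sweeps_;
};

int main() {
    std::vector<Block> blocks(4);              // subdomains owned by this process
    Timeloop loop;
    loop.add("LBM stream/collide", [](Block&) { /* kernel goes here */ });
    loop.add("Boundary handling",  [](Block&) { /* kernel goes here */ });
    loop.run(blocks, 10);
}
```

In this style, adding a new application feature (e.g. a coupling or boundary step) means registering one more sweep instead of modifying the core time loop.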
Fluid-Structure Interaction with Moving Rigid Objects
Why Simulate Fluid-Structure Interaction?
  Transport of solid particles is crucial for:
    Understanding physical phenomena
    Industrial processes
  But:
    Fully resolved simulation of the obstacles is computationally expensive
    Up to now, only a moderate number of obstacles could be simulated
Fluid-Structure Interaction
Rigid Body Dynamics
  Newton's laws of motion, including rotations
  Contact detection in each time step
  Collisions modelled by:
    Coefficient of restitution: forces in normal direction (see the sketch below)
    Friction laws: forces in tangential direction
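As an illustration of the normal-direction collision response, a minimal two-sphere sketch: the impulse is chosen so that the normal relative velocity after the collision is -e times the one before, where e is the coefficient of restitution (friction and rotation omitted; all names are illustrative, not walberla code).

```cpp
// Minimal sketch: normal impulse for two colliding spheres with restitution e.
#include <array>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

// v1, v2: velocities; m1, m2: masses; n: unit contact normal from body 1 to body 2;
// e: coefficient of restitution (0 = perfectly plastic, 1 = perfectly elastic).
void resolveNormalCollision(Vec3& v1, Vec3& v2, double m1, double m2,
                            const Vec3& n, double e) {
    const Vec3 dv = {v2[0] - v1[0], v2[1] - v1[1], v2[2] - v1[2]};
    const double vrel = dot(dv, n);            // normal relative velocity
    if (vrel >= 0.0) return;                   // bodies already separating
    const double j = -(1.0 + e) * vrel / (1.0/m1 + 1.0/m2);   // impulse magnitude
    for (int i = 0; i < 3; ++i) {
        v1[i] -= j * n[i] / m1;                // impulse acts in opposite directions
        v2[i] += j * n[i] / m2;
    }
}
```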
Hourglass Simulation
  1,250,000 spherical particles, 256 CPUs, 300,300 time steps
  Runtime: 48 h (including data output)
Mapping Moving Obstacles into the LBM Fluid Grid: An Example
Mapping Moving Obstacles into the LBM Fluid Grid: An Example (2)
  Cells whose state changes from particle to fluid
  Cells whose state changes from fluid to particle
  PDFs acting as force: momentum calculation (see the sketch below)
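The momentum calculation can be sketched with the momentum-exchange idea: every lattice link that points from a fluid cell into an obstacle cell contributes the exchanged PDFs, weighted by the link direction, to the force on the body. The snippet below is a simplified illustration (the wall-velocity terms needed for moving obstacles are omitted, and all names are placeholders, not walberla code).

```cpp
// Simplified momentum-exchange sketch: accumulate the momentum the fluid
// transfers to an obstacle across all lattice links ending inside the body.
#include <array>
#include <cstddef>

struct Lattice {
    // Placeholders standing in for the real flag field and PDF storage.
    bool   isObstacle(int, int, int) const { return false; }
    double f(std::size_t, int, int, int) const { return 0.0; }
};

template <std::size_t Q>
void addBoundaryForce(const Lattice& lat,
                      const std::array<std::array<int, 3>, Q>& c,  // discrete velocities c_i
                      const std::array<std::size_t, Q>& inv,       // index of opposite direction
                      int x, int y, int z,
                      std::array<double, 3>& force)
{
    for (std::size_t i = 1; i < Q; ++i) {
        const int nx = x + c[i][0], ny = y + c[i][1], nz = z + c[i][2];
        if (!lat.isObstacle(nx, ny, nz))
            continue;                          // this link does not cross the body surface
        // Momentum exchanged along the link: outgoing PDF plus its bounced-back partner.
        const double exchange = lat.f(i, x, y, z) + lat.f(inv[i], x, y, z);
        for (int d = 0; d < 3; ++d)
            force[d] += exchange * c[i][d];
    }
}
```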
The Algorithm
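The coupling algorithm was presented as a diagram; the following is only a generic sketch of one coupled time step in the usual ordering (map objects, LBM update, force computation, rigid body step), with purely illustrative function names rather than walberla's interface.

```cpp
// One coupled time step, sketched with illustrative function names.
void mapObjectsToGrid();           // flag lattice cells covered by rigid bodies as obstacles
void treatCellStateChanges();      // re-initialize PDFs in cells that changed particle <-> fluid
void lbmStreamCollide();           // LBM update in all fluid cells
void computeHydrodynamicForces();  // momentum exchange: forces and torques on the bodies
void rigidBodyTimeStep();          // contact detection, collision response, body motion
void communicateGhostLayers();     // MPI exchange of subdomain boundary data

void coupledTimeStep() {
    mapObjectsToGrid();
    treatCellStateChanges();
    lbmStreamCollide();
    computeHydrodynamicForces();
    rigidBodyTimeStep();
    communicateGhostLayers();
}
```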
Virtual Fluidized Bed
  512 processors
  Simulation domain size: 180 x 198 x 360 LBM cells
  900 capsules and 1008 spheres = 1908 objects
  Number of time steps: 252,000
  Runtime: 7 h 12 min
Simulation of a Segregation Process
  Segregation simulation of 242,200 objects
  Density values of 0.8 kg/dm³ and 1.2 kg/dm³ are used for the objects in water
Weak Scaling
  [Plot: weak scaling efficiency (0.5 to 1.0) over 100 to 300,000 cores on Jugene, Blue Gene/P, Jülich Supercomputer Center, for 40x40x40 and 80x80x80 lattice cells per core]
  Scaling from 64 to 294,912 cores
  Densely packed particles
  150,994,944,000 lattice cells
  264,331,905 rigid spherical objects
  Largest simulation to date: 8 trillion (10^12) variables per time step (LBM alone), 50 TByte
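The lattice cell count follows directly from the per-core load; a quick check, assuming the 80x80x80 cells-per-core configuration of the largest run:

```latex
% Cell count of the largest run, assuming 80^3 lattice cells per core:
\[
  294{,}912 \;\text{cores} \times 80^3 \,\tfrac{\text{cells}}{\text{core}}
  = 294{,}912 \times 512{,}000
  = 150{,}994{,}944{,}000 \;\text{cells}
\]
```

With two double-precision D3Q19 PDF buffers per cell, this alone amounts to roughly 46 TByte, consistent with the approximately 50 TByte quoted above (an estimate, not a figure from the talk).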
Free Surface Flow Simulation for foams, fuel cells, food processing, etc.
Free Surface Flow Simulation
  Example applications:
    Engineering: metal foam simulations
    Food processing
    Fuel cells
  Based on LBM:
    Free surfaces
    Surface tension and wetting model
    Parallelization with MPI
The Interface Between Liquid and Gas
  Volume-of-Fluid-like approach
  Flag field: compute only in fluid cells (see the sketch below)
  Special free surface conditions in interface cells
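A minimal sketch of the flag-field idea, assuming one state value per cell plus a fill level for interface cells, as is common in VOF-like free-surface LBM codes; the names are illustrative, not walberla's actual data structures.

```cpp
// One state per lattice cell; interface cells additionally carry a fill level in [0, 1].
#include <cstddef>
#include <cstdint>
#include <vector>

enum class CellState : std::uint8_t { Gas, Interface, Liquid, Obstacle };

struct FreeSurfaceField {
    int nx, ny, nz;
    std::vector<CellState> state;
    std::vector<double>    fill;   // fill level, only meaningful in interface cells

    FreeSurfaceField(int x, int y, int z)
        : nx(x), ny(y), nz(z),
          state(static_cast<std::size_t>(x) * y * z, CellState::Gas),
          fill(static_cast<std::size_t>(x) * y * z, 0.0) {}

    std::size_t idx(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * ny + y) * nx + x;
    }

    // The LBM kernel runs only in liquid and interface cells; interface cells
    // additionally receive the free-surface boundary treatment.
    bool needsLbmUpdate(int x, int y, int z) const {
        const CellState s = state[idx(x, y, z)];
        return s == CellState::Liquid || s == CellState::Interface;
    }
};
```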
LBM on Clusters with GPUs
Motivation
  Why should we use GPUs for LBM simulations?
    GPUs currently offer a very high peak performance
    Basic LBM performs well on GPUs
    Programming GPUs has become simpler than it was years ago
  Why should we use heterogeneous simulations?
    CPUs are available anyway on GPU nodes
    Do not waste these resources

                           NVidia Fermi    Intel Xeon node
  GFLOPS (single prec.)    1030            200
  GFLOPS (double prec.)    515             100
  Peak memory bandwidth    148 GB/s        61 GB/s
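Since the LBM is strongly memory-bandwidth bound, the peak bandwidths above translate into rough upper bounds on the achievable cell update rate; a back-of-the-envelope estimate, assuming a double-precision D3Q19 model with 19 PDFs read and 19 written per cell update:

```latex
% Bandwidth-limited upper bounds for double-precision D3Q19
% (19 PDFs read + 19 PDFs written per cell update):
\[
  2 \times 19 \times 8\,\text{B} = 304\,\text{B per cell update}
\]
\[
  \text{Fermi: } \frac{148\,\text{GB/s}}{304\,\text{B}} \approx 487\,\text{MLUP/s}
  \qquad
  \text{Xeon node: } \frac{61\,\text{GB/s}}{304\,\text{B}} \approx 200\,\text{MLUP/s}
\]
```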
Conclusions
Conclusions
  Desktop PCs and notebooks will get 2, 4, 8, 16, ..., 128, ... cores
  Parallel programming is necessary to exploit multicore hardware
  walberla supports:
    Multicore systems
    Supercomputers
    Accelerators (GPU, Cell)
    A variety of different applications
Future Work
  OK, the framework is working fine for many applications, but:
    Test cases for the validation of particulate flows and free surfaces are needed
    Any suggestions for moving particles (especially ensembles) are welcome ;-)
    Grid refinement + load balancing
    How to deal with massive parallelization:
      Node crashes
      Postprocessing
      Restart mechanisms
    How to maintain the software?
Thank you for your attention!