Generating Virtual Worlds with Supercomputer Simulations
U. Rüde (LSS Erlangen, ruede@cs.fau.de), joint work with many collaborators and students
Lehrstuhl für Informatik 10 (Systemsimulation), Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de
December 19, 2007
Overview
Computers as tools for scientists: what is Computational Science?
Examples of simulation for science and engineering:
  Flow simulations
  Biomedical applications
Conclusions
Motivation
How much is a PetaFlops?
10^6 = 1 MegaFlops: Intel 486, 33 MHz PC (~1989)
10^9 = 1 GigaFlops: Intel Pentium III, 1 GHz (~2000)
If every person on earth does one operation every 6 seconds, all humans together have 1 GigaFlops performance (less than a current laptop from Aldi).
10^12 = 1 TeraFlops: HLRB-I, 1344 processors (~2000); HLRB-I: 2 TFlops, HLRB-II: 63 TFlops
10^15 = 1 PetaFlops: >250,000 processor cores? (~2008?)
If every person on earth runs a 486 PC, all of us together have an aggregate performance of 6 PetaFlops.
The Two, Now Three, Principles of Science
Theory: mathematical models, differential equations, Newton.
Experiments: observation and prototypes, the empirical sciences.
Computational Science: simulation, optimization, (quantitative) virtual reality.
SIAM's Definition of CSE (http://www.siam.org/cse/report.htm)
"CSE is a broad multidisciplinary area that encompasses applications in science/engineering, applied mathematics, numerical analysis, and computer science. Computer models and computer simulations have become an important part of the research repertoire, supplementing (and in some cases replacing) experimentation. Going from application area to computational results requires domain expertise, mathematical modeling, numerical analysis, algorithm development, software implementation, program execution, analysis, validation and visualization of results. CSE involves all of this."
SIAM's Definition of CSE (2): especially, what it is NOT!
"CSE makes use of the techniques of applied mathematics and computer science for the development of problem-solving methodologies and robust tools which will be the building blocks for solutions to scientific and engineering problems of ever-increasing complexity. It differs from mathematics or computer science in that analysis and methodologies are directed specifically at the solution of problem classes from science and engineering, and will generally require a detailed knowledge or substantial collaboration from those disciplines. The computing and mathematical techniques used may be more domain specific, and the computer science and mathematics skills needed will be broader. CSE is more than a scientist or engineer using a canned code to generate and visualize results (skipping all of the intermediate steps)."
Fluid Flow Simulation: Metal Foams, Nano Technology, Fancy Physics
In collaboration with:
Lehrstuhl Werkstoffkunde und Technologie der Metalle, Erlangen (R.F. Singer, C. Körner)
Lehrstuhl für Bauinformatik, TU München (E. Rank)
Institut für Computeranwendungen im Bauingenieurwesen, TU Braunschweig (M. Krafczyk)
Lehrstuhl für Feststoff- und Grenzflächenverfahrenstechnik, Erlangen (W. Peukert, H.-J. Schmid)
Computer Graphics Lab, ETH Zürich (M. Pauly)
First Test: Breaking Dam [video]
Falling Drop [video]
Falling Meteor [video]
The Interface between Liquid and Gas
Compute only the fluid.
Special free surface conditions on the interface.
Why so compute intensive?
Millions to billions of cells (1000 x 1000 x 1000).
Thousands to millions of time steps.
Hundreds of operations in each cell and time step.
The curse of dimensionality!
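A quick worked estimate (illustrative numbers): 10^9 cells x 10^5 time steps x 10^2 operations per cell and step gives about 10^16 operations, i.e. roughly three hours even at a sustained TeraFlops rate.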
Visualization
Ray tracing with refraction, reflection, and caustics.
About 15 min per frame = 1 day for 4 seconds of animation.
About the same compute time as the flow simulation itself.
Process Simulation of Foam Production
Poorly understood: coalescence, collapse, drying, solidification, etc.
Simulation as a tool to understand and control the process.
Rising Bubbles [video]
Simultaneously Rising Bubbles [video]
Experimental Verification [video]
Simulation and experiment: Diplom thesis N. Thürey.
Foaming Simulation [video]
Fancy Physics [video]
Moving Nano Particles in a Liquid [video]
K. Iglberger (Master thesis), C. Feichtinger (Diplom thesis).
Bio-medical and Bio-chemical Simulation
Blood flow in an aneurysm, HIV protease, bio-electrical fields.
Pulsating Blood Flow in an Aneurysm [video]
Patient data set. Master thesis Jan Götz.
Collaboration with Neuroradiology (Prof. Dörfler, Dr. Richter), image processing, and fluid mechanics (Prof. Durst) for the simulation.
Bio-Electromagnetic Fields: Source Localisation
Collaboration with: Chr. Johnson (Univ. of Utah), C. Popa (Ovidius Univ. Constanta), Bart Vanrumste (Univ. of Canterbury, New Zealand), G. Greiner, F. Fahlbusch (Erlangen), C. Wolters (Münster).
Erlangen neurosurgeons at work: view through the operation microscope.
Simulation, or Better Do Experiments?
Source localisation by open-brain measurements, versus operation planning with a virtual head model (Chr. Johnson, Utah).
Molecular Dynamics Simulation of HIV Protease [video]
H. Sticht (Institut für Biochemie).
International Master (and PhD) Programme Computational Engineering
What is this about?
It is not computer science, it is not mathematics, and it is not a conventional engineering field: it is an interdisciplinary combination of all three, the foundation of future science.
Master program in Erlangen; Honours option (Elite Program) jointly with TU Munich.
Information: http://www9.informatik.uni-erlangen.de/ce/
Acknowledgements
Collaborators in Erlangen: WTM, LSE, LSTM, LGDV, RRZE, Neurozentrum, Radiologie, etc. Especially for foams: C. Körner (WTM).
International: Utah, Technion, Constanta, Ghent, Boulder, München, Zürich, ...
Dissertation projects: U. Fabricius (AMG methods and SW engineering for parallelization), C. Freundl (parallel expression templates for PDE solvers), K. Iglberger (rigid body dynamics), J. Götz (LBM, blood flow), T. Gradl (parallel multigrid), ... and 8 more.
25 Diplom/Master theses; Studien/Bachelor theses.
Especially for performance analysis and optimization of the LBM: J. Wilke, K. Iglberger, S. Donath, B. Gmeiner, ... and 23 more.
KONWIHR, DFG, NATO, BMBF, Elitenetzwerk Bayern.
Bavarian Graduate School in Computational Engineering (with TUM, since 2004).
Special international PhD program "Identifikation, Optimierung und Steuerung für technische Anwendungen" (with Bayreuth and Würzburg), since Jan. 2006.
Thank you for your interest! Questions?
Part II - a Towards Scalable FE Software Scalable Algorithms: Multigrid
What is Multigrid?
Has nothing to do with grid computing.
A general multi-scale methodology (actually, it is the original one), with many different applications; developed in the 1970s - ...
Useful e.g. for solving elliptic PDEs: large sparse systems of equations, solved iteratively with a convergence rate independent of the problem size; asymptotically optimal complexity -> algorithmic scalability!
Can solve e.g. a 2D Poisson problem in ~30 operations per grid point.
Efficient parallelization, if one knows how to do it.
The best (maybe the only?) basis for fully scalable FE solvers.
Multigrid: V-Cycle
Goal: solve A_h u_h = f_h using a hierarchy of grids.
1. Relax (smoothing)
2. Compute the residual
3. Restrict the residual to the coarse grid
4. Solve the coarse-grid problem by recursion
5. Interpolate the coarse-grid correction
6. Correct the fine-grid approximation and relax again
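To make the cycle concrete, here is a minimal self-contained Python sketch for the 1D Poisson problem -u'' = f with Dirichlet boundaries. It only illustrates the steps above and is not the production solver; the weighted Jacobi smoother, the grid size, and all parameter values are choices of this sketch.

import numpy as np

def relax(u, f, h, sweeps=2, omega=2.0/3.0):
    # weighted Jacobi smoothing for -u'' = f
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * (u[:-2] + u[2:] + h*h*f[1:-1] - 2.0*u[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0*u[1:-1] - u[:-2] - u[2:]) / (h*h)
    return r

def restrict(r):
    # full weighting, fine grid (2^k + 1 nodes) -> coarse grid (2^(k-1) + 1)
    rc = np.zeros((len(r) + 1) // 2)
    rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
    return rc

def prolong(uc, n_fine):
    # linear interpolation, coarse -> fine
    u = np.zeros(n_fine)
    u[::2] = uc
    u[1::2] = 0.5 * (uc[:-1] + uc[1:])
    return u

def v_cycle(u, f, h):
    if len(u) <= 3:                       # coarsest grid: one interior node, exact solve
        u[1] = 0.5 * (h*h*f[1] + u[0] + u[2])
        return u
    u = relax(u, f, h)                    # pre-smoothing
    rc = restrict(residual(u, f, h))      # restrict the residual
    ec = v_cycle(np.zeros_like(rc), rc, 2.0*h)  # coarse-grid correction by recursion
    u += prolong(ec, len(u))              # interpolate and correct
    return relax(u, f, h)                 # post-smoothing

n = 2**7 + 1
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi**2 * np.sin(np.pi * x)          # exact solution: u(x) = sin(pi x)
u = np.zeros(n)
for cycle in range(10):
    u = v_cycle(u, f, h)
    print(cycle, np.abs(residual(u, f, h)).max())

The printed residual drops by roughly an order of magnitude per cycle, independent of n: this is the algorithmic scalability claimed above.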
Part II - b Towards Scalable FE Software Scalable Architecture: Hierarchical Hybrid Grids
Hierarchical Hybrid Grids (HHG)
Unstructured input grid: resolves the geometry of the problem domain.
Patch-wise regular refinement: generates nested grid hierarchies, naturally suitable for geometric multigrid algorithms.
New: modify the storage formats and the operations on the grid to exploit the regular substructures.
Does an unstructured grid with 100 000 000 000 elements make sense? HHG - ultimate parallel FE performance! (See the growth estimate below.)
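A back-of-the-envelope sketch of why a modest unstructured input grid reaches such element counts under patch-wise regular refinement. Each refinement step splits every tetrahedron into 8 children; the input size 49,152 is taken from the scalability table in Part II-c.

input_tets = 49152                 # coarse unstructured input grid (cf. Part II-c)
for level in range(8):
    # regular refinement: every tetrahedron splits into 8 children
    print(level, input_tets * 8**level)
# level 7 gives 49152 * 8**7 = 103,079,215,104 elements, i.e. ~10^11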
HHG Refinement Example: Input Grid
HHG Refinement Example: Refinement Level One
HHG Refinement Example: Refinement Level Two
HHG Refinement Example: Structured Interior
HHG Refinement Example: Edge Interior
Common HHG Misconceptions
HHG is not just another block-structured grid: HHG is more flexible, since it accepts unstructured, hybrid input grids.
HHG is not just another unstructured geometric multigrid package: HHG achieves far better performance, because treating the regular regions as unstructured would forfeit that performance.
Parallel HHG - Framework Design Goals
To realize good parallel scalability:
Minimize latency by reducing the number of messages that must be sent.
Optimize for high-bandwidth interconnects: large messages.
Avoid local copying into MPI buffers.
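A minimal mpi4py sketch of a ghost-layer exchange that follows these goals: one large message per neighboring rank, posted non-blocking, sent directly from contiguous NumPy buffers so that MPI needs no intermediate packing copies. The ranks, buffer sizes, and 1D neighbor pattern are illustrative assumptions of this sketch, not the HHG implementation.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# illustrative 1D neighbor pattern; HHG derives neighbors from the input grid
neighbors = [r for r in (rank - 1, rank + 1) if 0 <= r < comm.Get_size()]

# one contiguous buffer per neighbor: a single large message instead of
# many small ones (latency), sent and received in place (no extra copies)
send_buf = {r: np.full(1 << 16, float(rank)) for r in neighbors}
recv_buf = {r: np.empty(1 << 16) for r in neighbors}

reqs = [comm.Isend(send_buf[r], dest=r) for r in neighbors]
reqs += [comm.Irecv(recv_buf[r], source=r) for r in neighbors]
MPI.Request.Waitall(reqs)   # computation on the interior could overlap here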
HHG for Parallelization: Use regular HHG patches for partitioning the domain
HHG Parallel Update Algorithm

for each vertex do
    apply operation to vertex
end for
update vertex primary dependencies

for each edge do
    copy from vertex interior
    apply operation to edge
    copy to vertex halo
end for
update edge primary dependencies

for each element do
    copy from edge/vertex interiors
    apply operation to element
    copy to edge/vertex halos
end for
update secondary dependencies
Part II - c Towards Scalable FE Software Performance Results
Single Processor HHG Performance on Itanium for Relaxation of a Tetrahedral Finite Element Mesh
HHG: Parallel Scalability

#Procs | #DOFs (x 10^6) | #Els (x 10^6) | #Input Els | GFLOP/s   | Time [s]
    64 |          2,144 |        12,884 |      6,144 |   100/75  |   68
   128 |          4,288 |        25,769 |     12,288 |   200/147 |   69
   256 |          8,577 |        51,539 |     24,576 |   409/270 |   76
   512 |         17,167 |       103,079 |     49,152 |   762/545 |   75
  1024 |         17,167 |       103,079 |     49,152 | 1,456/964 |   43

Parallel scalability of a Poisson problem discretized by tetrahedral finite elements, on SGI Altix (Itanium-2, 1.6 GHz).
B. Bergen, F. Hülsemann, U. Rüde: Is 1.7 x 10^10 unknowns the largest finite element system that can be solved today? SuperComputing, Nov. 2005. See also: ISC Award 2006 for Application Scalability.
Part III - a Free Surface Flow Simulation The Lattice Boltzmann Method
Free Surface Flow: Breaking Dam [video]
The Lattice-Boltzmann Method (2)
Weakly compressible approximation of the Navier-Stokes equations.
Easy implementation.
Applicable for small Mach numbers (< 0.1).
Easy to adapt, e.g. for: complicated or time-varying geometries, free surfaces, additional physical and chemical effects.
The Lattice-Boltzmann Method (3)
Real-valued representation of particles; discrete velocities and positions.
The algorithm proceeds in two steps: stream and collide.
Fluid Cell Treatment
The algorithm proceeds in two steps:
Stream: advect fluid elements (copy DFs to the neighbors).
Collide: compute collisions of fluid molecules.
The Collide Step
Accounts for the collisions of particles during their movement.
Weigh the equilibrium velocities against the velocities from streaming, depending on the fluid viscosity.
Stream/Collide: LBM in Equations
Stream and collide combined (standard BGK form):
f_i(x + e_i dt, t + dt) = f_i(x, t) - (1/tau) [ f_i(x, t) - f_i^eq(rho, u) ]
Equilibrium DF (lattice units, c_s^2 = 1/3):
f_i^eq(rho, u) = w_i rho [ 1 + 3 (e_i . u) + (9/2) (e_i . u)^2 - (3/2) u^2 ]
with density rho = sum_i f_i and momentum rho u = sum_i e_i f_i.
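For concreteness, a minimal D2Q9 stream-and-collide step in Python: the standard LBGK scheme in lattice units with periodic boundaries. This is a sketch of the two steps above, not the group's 3D free-surface solver; the grid size and tau are arbitrary choices.

import numpy as np

# D2Q9 lattice velocities e_i and weights w_i
e = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
tau = 0.8                                 # relaxation time, nu = (tau - 1/2)/3

def equilibrium(rho, ux, uy):
    eu = e[:, 0, None, None]*ux + e[:, 1, None, None]*uy
    u2 = ux*ux + uy*uy
    return w[:, None, None] * rho * (1 + 3*eu + 4.5*eu**2 - 1.5*u2)

def stream_collide(f):
    # stream: shift each distribution along its lattice velocity (periodic)
    for i in range(9):
        f[i] = np.roll(np.roll(f[i], e[i, 0], axis=0), e[i, 1], axis=1)
    # macroscopic moments
    rho = f.sum(axis=0)
    ux = (f * e[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * e[:, 1, None, None]).sum(axis=0) / rho
    # collide: BGK relaxation toward the local equilibrium
    f += (equilibrium(rho, ux, uy) - f) / tau
    return f

nx = ny = 64
f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))
for _ in range(100):
    f = stream_collide(f)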
Stability & Turbulence Modelling
Smagorinsky subgrid model: similar to the approach used in Navier-Stokes solvers; model subgrid-scale vortices by locally changing the viscosity.
Implementation for the LBM: the Reynolds stress tensor is computed for each cell; only the collision operator changes.
Approx. 20% slowdown, but a significant gain due to the decreased resolution requirements.
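A sketch of that local viscosity change in the same Python setting, using the commonly cited closed form (Hou et al. style) in lattice units with c_s^2 = 1/3. The Smagorinsky constant C and the function names are assumptions of this sketch.

import numpy as np

def effective_tau(f, feq, e, tau0, C=0.1):
    """Locally increased relaxation time tau_eff per cell."""
    fneq = f - feq
    # non-equilibrium momentum flux Pi_ab = sum_i e_ia e_ib (f_i - f_i^eq)
    Pi = np.einsum('ia,ib,i...->ab...', e, e, fneq)
    Q = np.sqrt(np.einsum('ab...,ab...->...', Pi, Pi))   # stress magnitude |Pi|
    rho = f.sum(axis=0)
    # closed-form solution of nu_eff = nu_0 + C^2 |S| for tau_eff
    return 0.5 * (tau0 + np.sqrt(tau0**2 + 18.0 * C**2 * Q / rho))

Only the collision step changes: f is relaxed with tau_eff instead of the constant tau, which is consistent with the quoted ~20% slowdown.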
Falling Drop with Turbulence Model [video]
Falling Drop with Turbulence Model (slower) [video]
Part III - b Free Surface Flow Simulation Volume of Fluids
Free Surfaces with LBM
Metal foams: huge gas volumes.
Only simulate and track the fluid motion.
Compute boundary conditions at the free surface.
Three cell types: empty/gas, fluid, interface.
Boundary Conditions
Problem: missing distribution functions at interface cells after streaming (on the gas side of the gas/liquid interface)!
Reconstruction such that the macroscopic boundary conditions are satisfied.
Körner et al.: Lattice Boltzmann Model for Free Surface Flow, Journal of Computational Physics.
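A sketch of that reconstruction, in the form given by Körner et al. (the function names and arguments are this sketch's assumptions): distributions that should have streamed in from the gas side are rebuilt from equilibria at the gas pressure rho_A and the local interface velocity u, so the gas acts on the liquid only through its pressure.

def reconstruct_missing_df(feq, f, i, ibar, rho_A, u):
    """DF arriving from the gas along direction ibar (the opposite of i).

    feq(i, rho, u): equilibrium distribution function
    f: post-streaming DFs at the interface cell
    """
    # f_ibar(x, t + dt) = f_i^eq(rho_A, u) + f_ibar^eq(rho_A, u) - f_i(x, t)
    return feq(i, rho_A, u) + feq(ibar, rho_A, u) - f[i]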
Free Surface Simulations
Algorithmic overview:
Before the stream step, compute the mass exchange across cell boundaries for interface cells.
Calculate bubble volumes and pressures.
Compute the surface curvature for surface tension.
Change the topology when interface cells become full or empty; keep the layer of interface cells closed.
Free Surface Cell Conversions
Emptied interface cell -> gas.
Filled interface cell -> fluid.
Guarantee a closed layer of interface cells.
Redistribute mass in the neighborhood.
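A compact sketch of this bookkeeping (simplified; the fill level eps, the thresholds, and the Cell type are assumptions of this sketch, and the redistribution step is only indicated in comments):

from dataclasses import dataclass

@dataclass
class Cell:
    kind: str     # 'gas' | 'interface' | 'fluid'
    mass: float   # fluid mass currently in the cell
    rho: float    # density, i.e. the mass of a completely filled cell

def mass_exchange(f_here, f_nbr, i, ibar, eps_here, eps_nbr):
    # mass moved along direction i, weighted by the two fill levels:
    # dm_i = (f_ibar(x + e_i) - f_i(x)) * (eps(x) + eps(x + e_i)) / 2
    return (f_nbr[ibar] - f_here[i]) * 0.5 * (eps_here + eps_nbr)

def convert(cell: Cell):
    # conversions as on the slide; excess or missing mass must afterwards
    # be redistributed to neighboring interface cells, and new interface
    # cells created so that the layer around the gas stays closed
    if cell.kind == 'interface' and cell.mass >= cell.rho:
        cell.kind = 'fluid'      # filled interface cell -> fluid
    elif cell.kind == 'interface' and cell.mass <= 0.0:
        cell.kind = 'gas'        # emptied interface cell -> gas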
Curvature Calculation (Version I)
Alternative approaches:
Integrate the normals over the surface (weighted triangles).
Level set methods (track the surface as an implicit function).
Surface Tension (Version 2)
Marching-cubes surface triangulation.
Compute a curvature for each triangle from the first variation of area:
kappa = (1/2) dA/dV
where dA is the change of the triangle areas when the surface is displaced along the triangle normals, and dV is the volume swept in the process.
Associate with each LBM cell the average curvature of its triangles.
Complicated, but beats level sets for our applications (mass conservation).
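The kappa = (1/2) dA/dV idea can be checked numerically: displace a triangulated sphere of radius R along its (radial) normals and compare with the known mean curvature 1/R. This is an illustration of the formula, not the solver's per-triangle computation; it uses SciPy's convex hull for the triangulation.

import numpy as np
from scipy.spatial import ConvexHull

def total_area(V, T):
    a = V[T[:, 1]] - V[T[:, 0]]
    b = V[T[:, 2]] - V[T[:, 0]]
    return 0.5 * np.linalg.norm(np.cross(a, b), axis=1).sum()

R = 2.0
P = np.random.default_rng(0).normal(size=(2000, 3))
V = R * P / np.linalg.norm(P, axis=1, keepdims=True)   # points on the sphere
T = ConvexHull(V).simplices                            # surface triangulation

h = 1e-4
A0 = total_area(V, T)
A1 = total_area(V * (1.0 + h / R), T)   # sphere normals are radial: offset by h
dV = A0 * h                             # volume swept by the displaced surface
print('kappa =', 0.5 * (A1 - A0) / dV, ' expected 1/R =', 1.0 / R)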
Part III - c Free Surface Flow Simulation Application: Metal Foam
Towards Simulating Metal Foams
Bubble growth, coalescence, collapse, drainage, rheology, etc. are still poorly understood.
Simulation as a tool to better understand, control, and optimize the process.
Rising Bubbles [video]
More Rising Bubbles [video]
Simulation Verification by Experiment [video]
Simulation and experiment: Diplom thesis N. Thürey.
Foaming Simulation 1 [video]
Numerical Experiment: Single Rising Bubble
Part III - d Free Surface Flow Simulation Parallel Performance
Parallelization of the Standard LBM Code: Scalability on the SR 8000-F1
Largest simulation: 1.08 x 10^9 cells, 370 GByte of memory.
Noticeable communication cost because of the large data volume (64 MByte).
Efficiency ~75%.
Dissertation T. Pohl (2006).
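A rough plausibility check (assuming a D3Q19 model in double precision with two lattice copies): 1.08 x 10^9 cells x 19 DFs x 8 bytes x 2 gives about 330 GByte, consistent with the quoted 370 GByte once cell flags and auxiliary data are included.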
Parallelization of the Free Surface LBM Code
Standard LBM: 1 sweep through the grid; 1 row of ghost nodes.
Free surface LBM: 5 sweeps through the grid (cell type changes, closed boundary for bubbles, initialization of modified cells, mass balance correction); 4 rows of ghost nodes.
Performance on the SR 8000
Standard LBM code vs. free surface LBM code: performance is lousy on a single node!
Conditionals per cell update: 2.9 for the standard LBM vs. 51 for the free surface LBM.
Pentium 4: almost no degradation (~10%).
SR 8000: enormous degradation (pseudo-vectorization relies on predictable jumps).
Parallel Performance on the LSS Cluster (Fujitsu-Siemens)
Part III - e Free Surface Flow Simulation Visualization and Animation
Adaptive Grids: Performance [video]
Speed-up: factor 2-4 for larger resolutions.
Insignificant overhead for small resolutions.
Example: Coupled Simulations [video]
Physically Based Animation
Special effects, e.g. for computer-generated movies.
Realistic appearance is necessary, but only where it is absolutely necessary -> control the fluid (or other) simulations.
Examples of fluid simulations in movies: Harry Potter 4 (ship scene), Ice Age 2 (throughout), Poseidon.
Simulations with Fluid Control [video]
Part IV Outlook
Acknowledgements
Collaborators in Erlangen: WTM, LSE, LSTM, LGDV, RRZE, Neurozentrum, Radiologie, etc. Especially for foams: C. Körner (WTM).
International: Utah, Technion, Constanta, Ghent, Boulder, München, Zürich, ...
Dissertation projects: U. Fabricius (AMG methods and SW engineering for parallelization), C. Freundl (parallel expression templates for PDE solvers), J. Härtlein (expression templates for FE applications), N. Thürey (LBM, free surfaces), T. Pohl (parallel LBM), ... and 6 more.
19 Diplom/Master theses; Studien/Bachelor theses.
Especially for performance analysis and optimization of the LBM: J. Wilke, K. Iglberger, S. Donath, ... and 23 more.
KONWIHR, DFG, NATO, BMBF, Elitenetzwerk Bayern.
Bavarian Graduate School in Computational Engineering (with TUM, since 2004).
Special international PhD program "Identifikation, Optimierung und Steuerung für technische Anwendungen" (with Bayreuth and Würzburg), since Jan. 2006.
Talk is Over. Please wake up!