Efficient Shallow Water Simulations on GPUs SIAM Conference on Mathematical & Computational Issues in the Geosciences Long Beach, California, USA, 2011-03-23 André R. Brodtkorb, Ph.D., Andre.Brodtkorb@sintef.no SINTEF ICT, Department of Applied Mathematics, Norway 3
Talk Outline
Minisymposium Introduction and Motivation
The Shallow Water Equations
Graphics Processing Units
Shallow Water Simulations on GPUs
Implementing and adapting numerical schemes
Accuracy of GPUs
Verification and validation
Performance
Summary 4
Acknowledgements Martin L. Sætra Knut-Andreas Lie Trond R. Hagen Jostein R. Natvig Mustafa Altinakar Yan Ding Jaswant Singh 5
The Shallow Water Equations First described by de Saint-Venant (1797-1886). Gravity-induced fluid motion with a 2D free surface; the governing flow is horizontal. Conservation of mass and momentum. Not only for water: simplification of atmospheric flow, avalanches... Water image from http://freephoto.com / Ian Britton 6
Target Application Areas
Tsunamis: 2011 Japan (5321+), 2004 Indian Ocean (230 000)
Floods: 2010 Pakistan (2000+), 1931 China floods (2 500 000+)
Storm surges: 2005 Hurricane Katrina (1836), 1530 Netherlands (100 000+)
Dam breaks: 1975 Banqiao Dam (230 000+), 1959 Malpasset (423)
Images from wikipedia.org, www.ecolo.org 7
Why GPUs? Proposition: a GPU is faster than a CPU, so we can get higher quality results in the same timeframe. In preparation for events: evaluate more scenarios, create inundation maps, create Emergency Action Plans. In response to ongoing events: simulate possible scenarios in real-time, and determine who to evacuate based on simulation, not guesswork. Inundation map from Los Angeles County Tsunami Inundation Maps, http://www.conservation.ca.gov/cgs/geologic_hazards/tsunami/inundation_maps/losangeles/pages/losangeles.aspx 8
Do we need more speed? Many existing dam-break inundation maps are based on 1D simulations: valleys are approximated using 1D cross sections, results depend heavily on the individual engineer's skill, and the underlying assumptions only hold for valleys. Many dams and levees even lack emergency action plans! In the US: dams without emergency action plans, and 114,000 miles of levee systems. Simulation using GPUs enables high-quality 2D simulations. See also M. Altinakar, P. Rhodes, Faster-than-Real-Time Operational Flood Simulation using GPGPU Programming 9
2011 Japan Tsunami Tsunami warnings must be issued in minutes: huge computational domains, rapid wave propagation, and uncertainties w.r.t. the cause of the tsunami. Warnings must be accurate; false warnings are dangerous! GPUs can be used to increase the quality of warnings. Images from US Navy (top), NASA (left), NOAA (right) 10
The Graphics Processing Unit (GPU)
                     CPU     GPU
Cores                4       16
Float ops / clock    64      1024
Frequency (MHz)      3400    1544
GigaFLOPS            217     1580
Memory (GiB)         32+     3
(Figures: performance and memory bandwidth comparison) 11
GPU Programming: From Abuse to Industrial Use
~2000: Graphics APIs (DirectX)
~2005: Various abstractions (BrookGPU, AMD Brook+, AMD CTM / CAL)
~2010: Dedicated C-based languages (NVIDIA CUDA, OpenCL, DirectCompute) 12
Shallow Water Simulations on GPUs 13
The Shallow Water Equations (SWE) Equation terms: vector of conserved variables, flux functions, bed slope source term, bed friction source term (written out below). Numerical simulation of the SWE: a hyperbolic partial differential equation, which enables explicit schemes. Solutions form discontinuities / shocks, requiring high accuracy in smooth parts without oscillations near discontinuities. Solutions include dry areas, and negative water depths ruin simulations. Requirements on accuracy: order of spatial/temporal discretization, floating point rounding errors 14
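For reference, a standard conservation form of the two-dimensional shallow water equations consistent with the terms named above; the Manning friction law shown here is one common choice and an assumption for illustration, not necessarily the exact formulation used in the talk:

\[
\frac{\partial}{\partial t}\begin{bmatrix} h \\ hu \\ hv \end{bmatrix}
+ \frac{\partial}{\partial x}\begin{bmatrix} hu \\ hu^2 + \tfrac{1}{2}gh^2 \\ huv \end{bmatrix}
+ \frac{\partial}{\partial y}\begin{bmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2}gh^2 \end{bmatrix}
= \underbrace{\begin{bmatrix} 0 \\ -gh B_x \\ -gh B_y \end{bmatrix}}_{\text{bed slope}}
+ \underbrace{\begin{bmatrix} 0 \\ -g n^2 u\sqrt{u^2+v^2}\,/\,h^{1/3} \\ -g n^2 v\sqrt{u^2+v^2}\,/\,h^{1/3} \end{bmatrix}}_{\text{bed friction}},
\]

where \(h\) is the water depth, \((u,v)\) the depth-averaged velocities, \(B\) the bottom topography, \(g\) the gravitational acceleration, and \(n\) Manning's roughness coefficient.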
The Finite Volume Scheme of Choice* Scheme of choice: A. Kurganov and G. Petrova, A Second-Order Well-Balanced Positivity Preserving Central-Upwind Scheme for the Saint-Venant System Communications in Mathematical Sciences, 5 (2007), 133-160 Second order accurate fluxes Total Variation Diminishing Well-balanced (captures lake-at-rest) Good (but not perfect) match with GPU execution model * With all possible disclaimers 15
Kurganov-Petrova Spatial Discretization
From continuous to discrete variables: vector of conserved variables, flux functions, bed slope source term, bed friction source term.
Per-cell steps: slope reconstruction, evaluation at integration points, flux calculation, dry-states fix (see the sketch below). 16
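A sketch of the resulting semi-discrete form, assuming the standard central-upwind flux of Kurganov-Petrova-type schemes with minmod-limited piecewise-linear reconstruction (the notation is ours, not taken from the slides):

\[
\frac{dQ_{ij}}{dt} = -\frac{F_{i+1/2,j} - F_{i-1/2,j}}{\Delta x}
                     -\frac{G_{i,j+1/2} - G_{i,j-1/2}}{\Delta y}
                     + H_B(Q_{ij}, \nabla B) + H_f(Q_{ij}),
\]
\[
F_{i+1/2,j} = \frac{a^+ F(Q^-_{i+1/2,j}) - a^- F(Q^+_{i+1/2,j})}{a^+ - a^-}
            + \frac{a^+ a^-}{a^+ - a^-}\left(Q^+_{i+1/2,j} - Q^-_{i+1/2,j}\right),
\]

where \(Q^\pm\) are the reconstructed point values on either side of the cell interface, and \(a^\pm\) are estimates of the largest and smallest local wave speeds obtained from the eigenvalues \(u \pm \sqrt{gh}\). The dry-states fix adjusts the reconstruction so that negative water depths cannot occur at the integration points.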
Temporal Discretization 17
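As a hedged sketch, assuming the second-order strong-stability-preserving Runge-Kutta (Heun) stepping commonly paired with this spatial scheme, and consistent with the halfstep/evolve steps of the simulation cycle below:

\[
Q^{*} = Q^{n} + \Delta t\, R(Q^{n}), \qquad
Q^{n+1} = \tfrac{1}{2} Q^{n} + \tfrac{1}{2}\left(Q^{*} + \Delta t\, R(Q^{*})\right),
\]

where \(R(Q)\) is the right-hand side of the semi-discrete form above, and \(\Delta t\) is limited by a CFL condition on the fastest wave speeds.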
Putting it Together: A Simulation Cycle
1. Calculate fluxes
2. Calculate Dt
3. Halfstep
4. Calculate fluxes
5. Evolve in time
6. Apply boundary conditions
(see the host-side sketch below) 18
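A minimal host-side sketch of one such cycle in CUDA C++. The kernel names, signatures, and launch configuration are hypothetical illustrations, not the actual implementation:

```
#include <cuda_runtime.h>

// Hypothetical kernels and helpers; signatures are illustrative only.
__global__ void fluxKernel(const float* Q, const float* B, float* R,
                           int nx, int ny, float dx, float dy);
__global__ void rkStepKernel(const float* Qn, const float* Qin,
                             const float* R, float* Qout, float dt, int stage);
__global__ void boundaryKernel(float* Q, int nx, int ny);
float minDtReduction(const float* Q, int nx, int ny, float dx, float dy);

// One simulation cycle: two Runge-Kutta stages, each needing a flux
// evaluation, followed by a ghost-cell update for the boundary conditions.
void simulationCycle(float* Q, float* Qstar, float* R, const float* B,
                     int nx, int ny, float dx, float dy, float cfl)
{
    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);

    fluxKernel<<<grid, block>>>(Q, B, R, nx, ny, dx, dy);      // 1. calculate fluxes
    float dt = cfl * minDtReduction(Q, nx, ny, dx, dy);        // 2. calculate Dt
    rkStepKernel<<<grid, block>>>(Q, Q, R, Qstar, dt, 0);      // 3. halfstep
    fluxKernel<<<grid, block>>>(Qstar, B, R, nx, ny, dx, dy);  // 4. calculate fluxes
    rkStepKernel<<<grid, block>>>(Q, Qstar, R, Q, dt, 1);      // 5. evolve in time
    boundaryKernel<<<grid, block>>>(Q, nx, ny);                // 6. boundary conditions
}
```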
Mapping to the GPU
Flux kernel: 87% of runtime, nine-point stencil operation
Time-step size: ~1% of runtime, simple parallel reduction
Time integration: 12% of runtime, solve the time ODE for each cell
Boundary conditions: ~1% of runtime, fill in ghost cell values
We want a minimum number of kernels (GPU programs), and each kernel should be massively parallel; four kernels is the best we can do whilst still obeying the dependencies (a reduction sketch follows below). 19
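A minimal sketch of the time-step computation, assuming a standard shared-memory parallel reduction over the per-cell limits Δx/(|u|+√(gh)) and Δy/(|v|+√(gh)); the kernel name, data layout, and exact CFL factor are assumptions for illustration:

```
#include <cuda_runtime.h>
#include <cfloat>
#include <math.h>

// Per-block reduction of the smallest admissible time step. h, hu, hv are
// cell-averaged conserved variables; dtBlock receives one partial minimum
// per block, which the host reduces afterwards. Launch with a
// power-of-two block size.
__global__ void minDtKernel(const float* h, const float* hu, const float* hv,
                            float* dtBlock, int n, float dx, float dy, float g)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    float dt = FLT_MAX;
    if (i < n && h[i] > 0.0f) {
        float u = hu[i] / h[i];
        float v = hv[i] / h[i];
        float c = sqrtf(g * h[i]);
        dt = fminf(dx / (fabsf(u) + c), dy / (fabsf(v) + c));
    }
    sdata[tid] = dt;
    __syncthreads();

    // Tree reduction in shared memory
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] = fminf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }
    if (tid == 0) dtBlock[blockIdx.x] = sdata[0];
}
// The host multiplies the final minimum by a CFL number (e.g. 0.25 for this
// type of scheme) before it is used in the Runge-Kutta steps.
```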
Domain Decomposition Traditional CUDA block decomposition: each streaming multiprocessor of the GPU computes on a small 2D patch, and neighboring patches use overlapping cells to exchange information. Global ghost cells are used for boundary conditions; global ghost cells (and ghost cell expansion) are also used for multi-GPU simulations. Many different optimization parameters: shared memory, thread occupancy, warp size, etc. (a stencil-tile sketch follows below) 20
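A hedged sketch of how such a patch can be staged in shared memory for the nine-point flux stencil, assuming a 16x16 block with a two-cell halo on each side; block size, array layout, and variable names are illustrative only:

```
#include <cuda_runtime.h>

#define BLOCK 16
#define HALO  2  // two overlap cells per side for the second-order stencil

// Loads a (BLOCK + 2*HALO)^2 tile of the water surface elevation into shared
// memory; reconstruction and flux computation on the tile interior would follow.
__global__ void fluxStencilSkeleton(const float* w, float* out,
                                    int nx, int ny, int pitch)
{
    __shared__ float tile[BLOCK + 2 * HALO][BLOCK + 2 * HALO];

    // Global index of the cell this thread is responsible for.
    int gx = blockIdx.x * BLOCK + threadIdx.x;
    int gy = blockIdx.y * BLOCK + threadIdx.y;

    // Cooperative load of the tile including its halo (here clamped at the
    // domain edge; a real implementation reads global ghost cells instead).
    for (int j = threadIdx.y; j < BLOCK + 2 * HALO; j += BLOCK) {
        for (int i = threadIdx.x; i < BLOCK + 2 * HALO; i += BLOCK) {
            int x = min(max(blockIdx.x * BLOCK + i - HALO, 0), nx - 1);
            int y = min(max(blockIdx.y * BLOCK + j - HALO, 0), ny - 1);
            tile[j][i] = w[y * pitch + x];
        }
    }
    __syncthreads();

    if (gx < nx && gy < ny) {
        // Slope reconstruction and flux evaluation would read
        // tile[threadIdx.y + HALO +/- 2][threadIdx.x + HALO +/- 2];
        // here we only write the center value as a placeholder.
        out[gy * pitch + gx] = tile[threadIdx.y + HALO][threadIdx.x + HALO];
    }
}
```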
Accuracy: Single Versus Double Precision What is the relative error in mass conservation for single and double precision, and what is the discrepancy between the two? Three different test cases: low water depth (wet only), high water depth (wet only), synthetic terrain with dam break (wet-dry). Conclusions: we have a loss in conservation on the order of machine epsilon, single precision gives a larger error than double, and errors related to the wet-dry front are more than an order of magnitude larger. For our application areas, single precision is sufficient (a sketch of the mass check follows below). 21
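A minimal sketch of how such a mass-conservation check can be computed, assuming the total water volume is accumulated in double precision and compared with the initial volume; the metric and names are our illustration, not necessarily the exact test used in the talk:

```
#include <cstddef>

// Signed relative mass-conservation error (V(t) - V(0)) / V(0), where the
// volume V is the sum over all cells of h * dx * dy. Accumulating in double
// keeps the measurement itself from being polluted by single-precision
// round-off in the simulated water depths.
double relativeMassError(const float* h, size_t nCells,
                         double dx, double dy, double initialVolume)
{
    double volume = 0.0;
    for (size_t i = 0; i < nCells; ++i) {
        volume += static_cast<double>(h[i]) * dx * dy;
    }
    return (volume - initialVolume) / initialVolume;
}
```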
Verification: Parabolic basin Single precision is sufficient, but do we solve the equations correctly? Test against the analytical 2D parabolic basin case (Thacker), in which a planar water surface oscillates. 100 x 100 cells, horizontal scale: 8 km, vertical scale: 3.3 m. Simulation and analytical solution match well, but, as with most schemes, there are growing errors along the wet-dry interface 22
Validation: Barrage du Malpasset We model the equations correctly, but can we model real events? South-east France near Fréjus: Barrage du Malpasset, a double curvature dam, 66.5 m high, 220 m crest length, 55 million m³ reservoir. It burst at 21:13 on December 2nd, 1959, and the flood wave reached the Mediterranean in 30 minutes (speeds up to 70 km/h); 423 casualties, $68 million in damages. We validate against experimental data from a 1:400 scale model: 482 000 cells (1099 x 439), 15 meter resolution. Our results match the experimental data very well; discrepancies at gauges 14 and 9 are present in most (all?) published results. Image from Google Earth, mes-ballades.com 23
Video http://www.youtube.com/watch?v=fbzbr-fjrwy 24
Bonus: Multi-GPU Performance Single node with four GPUs. Near-perfect weak and strong scaling on two generations of hardware (S1070, C2050). Domains of up to 350 million cells 25
Summary Simulation of the shallow water equations is important Devastating forces: tsunamis, dam breaks, floods, storm surges The problem maps well to GPUs Single precision is not an issue Verification and validation addressed Not a toy model any more GPUs enable more accurate results Evaluate more scenarios Simulate with higher resolution Or do both! 26
Thank you for your attention Contact: André R. Brodtkorb Email: Andre.Brodtkorb@sintef.no Homepage: http://babrodtk.at.ifi.uio.no/ Youtube: http://youtube.com/babrodtk SINTEF homepage: http://www.sintef.no/heterocomp 27
References
A. R. Brodtkorb, Scientific Computing on Heterogeneous Architectures, Ph.D. thesis, University of Oslo, ISSN 1501-7710, No. 1031, 2010.
A. R. Brodtkorb, T. R. Hagen, K.-A. Lie and J. R. Natvig, Simulation and Visualization of the Saint-Venant System using GPUs, Computing and Visualization in Science, special issue on Hot Topics in Computational Engineering, 13(7), 2011, pp. 341-353, DOI: 10.1007/s00791-010-0149-x.
M. L. Sætra and A. R. Brodtkorb, Shallow Water Simulations on Multiple GPUs, Proceedings of the Para 2010 Conference, Lecture Notes in Computer Science, Springer, 2010.
A. R. Brodtkorb, M. L. Sætra, and M. Altinakar, Efficient Shallow Water Simulations on GPUs: Implementation, Visualization, Verification, and Validation, in review, 2010.
Preprints and links to papers available at http://babrodtk.at.ifi.uio.no/ 28