PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms




Transcription:

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent, Department of Aeronautics, Imperial College London, 25th March 2014

Overview Motivation, Flux Reconstruction, Many-Core Hardware, PyFR, Results, Summary

Motivation Current industry standard CFD tools have limited capabilities

Motivation Technology is decades old and designed for solving steady flow problems (using RANS approach)


Motivation Need to expand the industrial CFD envelope

Motivation It's not just me who thinks this [1]. [1] Murray Cross, Airbus, Technology Product Leader - Future Simulations (2012)

Motivation Objective of our research is to advance industrial CFD capabilities from their current RANS plateau

Motivation We aim to develop the de facto industry standard technology for affordable (and hence industrially relevant) high-fidelity scale-resolving simulations of unsteady flow phenomena within the vicinity of complex geometric configurations

Motivation Achieved by intelligently leveraging benefits of (and synergies between) high-order Flux Reconstruction (FR) methods for unstructured grids and massively-parallel Many-Core Hardware platforms

Motivation Flux Reconstruction + Many-Core Hardware

Flux Reconstruction The Flux Reconstruction (FR) approach to high-order methods was first proposed by Huynh in 2007 [2]. High-order accurate in space. Works on unstructured grids. [2] H. T. Huynh. A flux reconstruction approach to high-order schemes including discontinuous Galerkin methods. AIAA Paper 2007-4079, 2007.

Flux Reconstruction What does high-order mean? Order N accuracy implies an error of order h^N. High-order typically refers to N > 2.
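As a rough illustration of that statement: if the error behaves like C h^N, halving the mesh spacing h cuts the error by a factor of 2^N, so 4 for a 2nd-order scheme and 16 for a 4th-order scheme. The sketch below assumes a hypothetical constant C = 1 and an arbitrary starting spacing; neither comes from the talk.

```python
def error_estimate(h, order, constant=1.0):
    """Model error E = C * h**N for a scheme of order N (C is a hypothetical constant)."""
    return constant * h ** order


def reduction_on_halving(order):
    """Factor by which the model error drops when the spacing h is halved."""
    h = 0.1  # arbitrary starting spacing, for illustration only
    return error_estimate(h, order) / error_estimate(h / 2, order)


for order in (2, 4):
    factor = reduction_on_halving(order)
    print(f"order {order}: error falls by about {factor:.0f}x when h is halved")
```

In this idealised model the constant C cancels, so the reduction factor depends only on the order N.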

Flux Reconstruction Why is high-order useful? [Animation: 2nd Order (25,600 DOF) against 4th Order (25,600 DOF), frames at t = 0, 1, 2, 3, 4, 5, 20, 40, 60 and 180]

Flux Reconstruction What does unstructured mean? [Animation: Unstructured 4th Order against Structured 4th Order, frames at t = 0, 1, 2, 3, 4 and 5]

Flux Reconstruction Why is unstructured useful?


Flux Reconstruction Also: Unifying (can cast various known + new schemes within FR framework); Significant element locality; Abundance of structured compute

Many-Core Hardware Multi-core CPUs now ubiquitous (with 10s of cores per device)

Many-Core Hardware Many-core accelerators a hot topic (100s to 1000s of cores per device): Nvidia K40, Intel Phi, AMD S10000

Many-Core Hardware Rapid evolution of hardware; Need to exploit massive parallelism; Temporal and spatial data locality is important; Limited memory per device; Exposed to complex memory hierarchy

Many-Core Hardware So, several challenges...

Many-Core Hardware But significant compute available if it can be harnessed

PyFR PyFR = Flux Reconstruction + Many-Core Hardware

PyFR Features (v0.1.0 - current stable release)
Governing Equations: Compressible Euler; Compressible Navier-Stokes
Spatial Discretisation: Arbitrary order FR on mixed unstructured grids (tris, quads, hexes)
Temporal Discretisation: Range of explicit Runge-Kutta schemes
Platforms: CPU clusters (C-OpenMP-MPI); Nvidia GPU clusters (CUDA-MPI)
Precision: Single; Double
Input: Gmsh
Output: Paraview
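The feature list mentions explicit Runge-Kutta time stepping. As a minimal sketch of what one such scheme looks like, here is the classical four-stage RK4 method applied to a scalar ODE. The function names and the test problem are hypothetical, and this is not PyFR's implementation.

```python
import math


def rk4_step(rhs, t, u, dt):
    """Advance du/dt = rhs(t, u) by one classical RK4 step."""
    k1 = rhs(t, u)
    k2 = rhs(t + dt / 2, u + dt / 2 * k1)
    k3 = rhs(t + dt / 2, u + dt / 2 * k2)
    k4 = rhs(t + dt, u + dt * k3)
    return u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(rhs, u0, t_end, n_steps):
    """Take n_steps equal RK4 steps from t = 0 to t = t_end."""
    dt = t_end / n_steps
    t, u = 0.0, u0
    for _ in range(n_steps):
        u = rk4_step(rhs, t, u, dt)
        t += dt
    return u


# Hypothetical test problem: du/dt = -u with u(0) = 1, exact solution exp(-t).
approx = integrate(lambda t, u: -u, 1.0, 1.0, 20)
print(abs(approx - math.exp(-1.0)))
```

Each step needs only evaluations of the right-hand side, which is why explicit schemes of this kind map naturally onto repeated kernel calls.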

PyFR Python Outer Layer (Hardware Independent): Setup; Distributed memory parallelism; Outer for loop and calls to Hardware Specific Kernels

PyFR Need to generate the Hardware Specific Kernels

PyFR Templated Kernels (Hardware Independent): Point-wise non-linear kernels (flux function, Riemann solver); Matrix multiply kernels

PyFR Passed through Mako Derived Templating Engine: Generates hardware specific kernel code; Deals with device level parallelism

PyFR Generates Hardware Specific Kernels: C/OpenMP Hardware Specific Kernels; CUDA Hardware Specific Kernels; OpenCL Hardware Specific Kernels

PyFR These can now be called
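The layering described in these slides can be sketched as follows: one hardware-independent kernel body is wrapped by a per-backend template that supplies the device-level parallelism. This illustration uses Python's standard-library string.Template as a stand-in for the Mako-derived engine. The kernel body, wrapper templates, and function names are all hypothetical and do not reproduce PyFR's actual templates.

```python
from string import Template

# One hardware-independent description of a point-wise kernel body.
POINTWISE_BODY = "out[i] = a * x[i] + y[i];"

# Hypothetical per-backend wrappers supplying the device-level parallelism.
BACKEND_WRAPPERS = {
    "openmp": Template(
        "void ${name}(int n, ${dtype} a, const ${dtype} *x,\n"
        "             const ${dtype} *y, ${dtype} *out)\n"
        "{\n"
        "    #pragma omp parallel for\n"
        "    for (int i = 0; i < n; i++)\n"
        "        ${body}\n"
        "}\n"
    ),
    "cuda": Template(
        "__global__ void ${name}(int n, ${dtype} a, const ${dtype} *x,\n"
        "                        const ${dtype} *y, ${dtype} *out)\n"
        "{\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n)\n"
        "        ${body}\n"
        "}\n"
    ),
}


def generate_kernel(backend, name, dtype):
    """Render backend-specific source text from the shared kernel body."""
    return BACKEND_WRAPPERS[backend].substitute(
        name=name, dtype=dtype, body=POINTWISE_BODY
    )


print(generate_kernel("openmp", "axpy", "double"))
print(generate_kernel("cuda", "axpy", "float"))
```

The same body appears in both outputs; only the loop structure and the precision differ, which is the sense in which the templated kernels stay hardware independent.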

PyFR ~5,000 lines of code

PyFR The speed-critical kernel is sgemm/dgemm. In the first instance we can drop in vendor BLAS (cuBLAS etc.). But can we do better than this?
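One way to see why matrix multiplication dominates: when the same small operator matrix is applied to the solution values of every element, the per-element matrix-vector products can be batched into a single matrix-matrix product, which is the operation sgemm/dgemm perform. The pure-Python sketch below uses made-up sizes and values and only illustrates the batching idea; it is not PyFR's kernel code.

```python
def matvec(op, vec):
    """Apply operator matrix `op` (a list of rows) to one element's values."""
    return [sum(o * v for o, v in zip(row, vec)) for row in op]


def matmul(op, mat):
    """Dense product op @ mat, with `mat` stored as a list of rows."""
    cols = list(zip(*mat))
    return [[sum(o * c for o, c in zip(row, col)) for col in cols] for row in op]


# Hypothetical operator mapping 3 solution points to 2 output points.
op = [[0.5, 0.5, 0.0],
      [0.0, 0.5, 0.5]]

# Solution values for 4 elements: one column per element.
sol = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0]]

# Element by element: many small matrix-vector products.
per_element = [matvec(op, list(col)) for col in zip(*sol)]

# All elements at once: a single matrix-matrix product (the GEMM shape).
batched = matmul(op, sol)

# Transposing `batched` recovers one result vector per element.
assert [list(col) for col in zip(*batched)] == per_element
print(batched)
```

With many elements the second operand becomes a very wide matrix while the operator stays small, a shape that general-purpose BLAS routines may not be tuned for.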

Results Cylinder at Ma = 0.2, Re = 3900. Unstructured curved element hexahedral mesh. 5th order accurate in space.

Results T106c turbine blade Ma = 0.65 Re = 80,000

Results Unstructured curved element hexahedral mesh

Results 4th order accurate in space

Results Good agreement with experimental data [Plot: Isentropic Mach Number against Chord]

Results c.f. RANS result from leading commercial software [Plot: Isentropic Mach Number against Chord]

Results Flow over a deployed spoiler, Ma = 0.3, Re = 10,000. Unstructured hexahedral mesh. 4th order accurate in space. Double precision. ~500,000 RK4 time-steps. ~100 hours on 4 x K20c GPUs in a desktop workstation. 1x10 [exponent and quantity not recoverable]

Results 3D Navier-Stokes weak scaling over M2090s (Emerald) [Plot: PyFR against Ideal; axis ticks run from 0 to 112 and from 0 to 1.2]

Results 3D Navier-Stokes strong scaling over M2090s (Emerald) [Plot: PyFR against Ideal; both axes run from 0 to 32]
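For reference, the two kinds of scaling measure different things. Strong scaling fixes the total problem size and is typically reported as a speedup, with the device count as the ideal value. Weak scaling fixes the work per device and is typically reported as a runtime ratio, with 1 as the ideal value. The helper names and timings below are hypothetical placeholders, not measurements from the talk.

```python
def strong_scaling_speedup(t_base, t_parallel):
    """Speedup for a fixed total problem size."""
    return t_base / t_parallel


def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal (linear) speedup achieved on n_devices."""
    return speedup / n_devices


def weak_scaling_efficiency(t_base, t_parallel):
    """Runtime ratio when the work per device is held fixed (1.0 is ideal)."""
    return t_base / t_parallel


# Hypothetical wall-clock times in seconds, for illustration only.
speedup = strong_scaling_speedup(t_base=320.0, t_parallel=12.5)
print(speedup, parallel_efficiency(speedup, n_devices=32))
print(weak_scaling_efficiency(t_base=100.0, t_parallel=104.0))
```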

Results 1.3 billion DOF on 184 x M2090 GPUs (Emerald)

Summary Objective of research is to advance industrial CFD capabilities from their current RANS plateau

Summary We aim to develop the de facto industry standard technology for affordable (and hence industrially relevant) high-fidelity scale-resolving simulations of unsteady flow phenomena within the vicinity of complex geometric configurations

Summary Achieved by intelligently leveraging benefits of (and synergies between) High-Order Flux Reconstruction (FR) methods for unstructured grids and massively-parallel Many-Core Hardware platforms

Summary Web: www.pyfr.org Twitter: @PyFR_Solver

PhD Students Freddie Witherden (Imperial College), Antony Farrington (Imperial College), George Ntemos (Imperial College), Harry Davis (Imperial College)

Emerald The authors would like to acknowledge that the work presented here made use of the EMERALD HPC facility provided by the Centre for Innovation

Funding Early Career Fellowship; Platform Grant; UK Turbulence Consortium; 4 x PhD Studentships; Hardware Donations

Questions Web: www.imperial.ac.uk/aeronautics/research/vincentlab Twitter: @Vincent_Lab