GPGPU accelerated Computational Fluid Dynamics



Similar documents
GPGPU acceleration in OpenFOAM

Accelerating CFD using OpenFOAM with GPUs

AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

Introduction to GPGPU. Tiziano Diamanti

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Case Study on Productivity and Performance of GPGPUs

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

Introduction to GPU Programming Languages

Computer Graphics Hardware An Overview

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

Evaluation of CUDA Fortran for the CFD code Strukti

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

HPC enabling of OpenFOAM R for CFD applications

Parallel Programming Survey

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

High Performance Computing in CST STUDIO SUITE

Turbomachinery CFD on many-core platforms experiences and strategies

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

GPU Parallel Computing Architecture and CUDA Programming Model

Introduction to GPU hardware and to CUDA

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Parallel Computing. Introduction

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Large-Scale Reservoir Simulation and Big Data Visualization

YALES2 porting on the Xeon- Phi Early results

ST810 Advanced Computing

OpenFOAM Workshop. Yağmur Gülkanat Res.Assist.

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

QCD as a Video Game?

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

GPUs for Scientific Computing

GPU Acceleration of the SENSEI CFD Code Suite

HPC with Multicore and GPUs

Retargeting PLAPACK to Clusters with Hardware Accelerators

Parallel Computing with MATLAB

RWTH GPU Cluster. Sandra Wienke November Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

GPU Architecture. Michael Doggett ATI

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Chapter 2 Parallel Architecture, Software And Performance

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

The Methodology of Application Development for Hybrid Architectures

ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING

Introduction to GPU Computing

GeoImaging Accelerator Pansharp Test Results

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

GPU for Scientific Computing. -Ali Saleh

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

This offering is not approved or endorsed by OpenCFD Limited, the producer of the OpenFOAM software and owner of the OPENFOAM and OpenCFD trade marks.

Next Generation GPU Architecture Code-named Fermi

SUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS UPDATE

An Introduction to Parallel Computing/ Programming

Writing Applications for the GPU Using the RapidMind Development Platform

Le langage OCaml et la programmation des GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

CFD Implementation with In-Socket FPGA Accelerators

OpenFOAM: Computational Fluid Dynamics. Gauss Siedel iteration : (L + D) * x new = b - U * x old

GPU Renderfarm with Integrated Asset Management & Production System (AMPS)

High Performance Matrix Inversion with Several GPUs

HPC Wales Skills Academy Course Catalogue 2015

SGI HPC Systems Help Fuel Manufacturing Rebirth

Real-time Visual Tracker by Stream Processing

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

OpenFOAM at FM LTH. Erdzan Hodzic. FM Seminars: 16-march Division of Fluid Mechanics, Department of Energy Sciences, Lund University

Graphic Processing Units: a possible answer to High Performance Computing?

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

Hardware Acceleration for CST MICROWAVE STUDIO

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Overview of HPC Resources at Vanderbilt

HPC Deployment of OpenFOAM in an Industrial Setting

CUDA programming on NVIDIA GPUs

Parallel Algorithm Engineering

Multicore Parallel Computing with OpenMP

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

~ Greetings from WSU CAPPLab ~

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

High Performance GPGPU Computer for Embedded Systems

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

Transcription:

t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute of Technology Thorsten Grahs Institute of Scientific Computing/move-csc 2nd October 2013 Slide 1/22

Overview 1 HPC on GPGPUs 2 GPU-plugins f. OpenFOAM 3 Comparison CPU-GPU Slide 2/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

HPC on GPGPUs Framework General Purpose Graphical Processing Units High Performance Computing on free programmable graphic devices (Unified shader approach, XBox 2005) 2006 first Tesla Series from NVIDIA Programming model (CUDA - Compute Unified Device Architecture) Since 2009 Open CL (Open Computing Language) Slide 3/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Why GPGPUs are perfect for HPC? CPU vs. GPU Chip-Design CPU is optimized for serial tasks (single thread) GPU is optimized for massive parallel data handling (multiple thread) GPU does not care if pixel data has to be handled (tessellation, transformation, rendering)...... or scientific calculation has to be performed. Slide 4/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Single instruction multiple threads Massive parallel data throughput High demand on computational power (real time rendering) Programing model inspired by vector computers (SIMD) Goal: Work off as many threads in parallel as possible Through-put orientated approach Accomplished by: Many Arithmetic Logical Units High clock rate of the data bus Highly suitable for massive parallel computing Slide 5/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

NVIDIA Quadro 6000 NVIDIA Quadro 6000 Affordable computing cluster in your workstation CUDA cores: 448 GFlops (SP): 1030.4 GFlops (DP): 515.2 Frame buffer: 6 GB GDDR5 Memory bandwidth: 144 GB/s Cost: 3.500 e( 4,800 $) Slide 6/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

GPGPU accelerated CFD Use GPGPU acceleration in HPC demanding areas Computational Fluid Dynamics Focus in this talk OpenFOAM Open source code allows Field Operation And Manipulation (FOAM) Slide 7/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Properties OpenFOAM Open Source tool box Proven for industrial use Access to source code Programmable by top-level-code (highly object orientated) Widespread use in academia and industry Slide 8/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Advantage Full user control e.g. numerical schemes & matrix solvers Discretization Use one of over 50 numerical schemes (or program our own) Matrix solver Use different solvers and preconditioned (GAMG, CG, BiCG, BiCGstab,... ) (or your own) Slide 9/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013 divschemes { div(rho*phi,u) Gauss linearupwind div(phi,k) Gauss linearupwind div(phi,omega) Gauss linearupwind div((mueff*dev(t(grad(u))))) Gauss l } p { solver PCG; preconditioner DIC; tolerance 1e-6; reltol 0; };

...means for GPU use... just hook in your stuff via an solver interface our use one of this libraries Open Source ofgpu v1.0 Linear solver library (Symscape) CUFFlink (CUda For Foam Link) Dan Combest speed-it classic (Vratis free version) commercial (not tested) Culises (FluidDyna) Speed-IT 2.3 (Vratis) Slide 10/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Proceeding Compile the plugin Check in the library (controldict) functions { cudagpu { type cudagpu; functionobjectlibs ( " gpu " ); cudadevice 0; } } Declare solver (fvsolutions) p { solver PCGgpu; preconditioner smoothed_aggregation; tolerance 1e-06; reltol 0.01; } Slide 11/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

... so far Very nice (easy use) but where s the beef??? What is the acceleration one can get? With how many CPUS/GPGPUS (Hardware) Does this pay off? Slide 12/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Validation Coming to numbers one mostly see something like this: Test case lid-driven cavity with very small residuals i.e (1e-15) in order to keep the matrix solver busy. Slide 13/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

but who needs a lid-driven cavity... Speed-up for lid-driven cavity (icofoam solver) Hardware 512000 1 Mill cells remark CPU only 1.00 1.00 ofgpu1.0 3.03 3.35 Single Precision cufflink fpe fpe OF 1.6ext. SpeedITclassic 2.74 3.24 SP (tubs) 1.32 1.54 Student project TUBS 8 CPUs 2.56 2.05 How does these plugins perform on real test cases? Slide 14/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

KRISO container ship example Test Cases (1.8 M cells) a) Solver: simplefoam (steady state) relative Tolerance reltol = 0.1 absolute Tolerance atol=e-7/e-8 b) Solver: simplefoam (steady state) relative Tolerance reltol = 0.0 absolute Tolerance atol=e-7/e-8 c) Solver: interfoam (transient) relative Tolerance reltol = 0.1 absolute Tolerance atol=1e-8 Slide 15/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Speedup KCS example Speed-up for KCS-test case Hardware a) b) c) remark CPU only 1.00 1.00 1.00 ofgpu1.0 3.35 7.95 1.85 SinglePrecision cufflink 1.21 3.84 1.07 OF 1.6ext. speeditclassic fpe fpe 0.66 CG for pressure SP (tubs) 1.26 3.52 1.03 8 CPUs only 3.75 4.00 2.88 8 CPUs + cufflink 1.98 2.05 1.21 Slide 16/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Commercial library Culises B. Landmann, Accelerating the Numerical Simulation of Heavy-Vehicle Aerodynamics Using GPUs with Culises, ISC 2013, Leipzig, June 2013 Slide 17/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Ahmdals Law...? July 2013 Our recent findings indicate that the SpeedIT alone cannot accelerate OpenFOAM (and probably other CFD codes) to the satisfactory extent. If you follow our recent reports you will see SpeedIT is attractive for desktop computers but performs worse when compared to server class CPUs, such as Intel Xeon. The reason for such mild acceleration is the Amdahl s Law which states that the acceleration is bounded by the percentage of the code that cannot be parallalized. Slide 18/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

What happened GPUs are made for number crunching Sometimes there is too few to crunch Inner solver iterations for KCS: Velocity 3 pressure 3 (k, ω) 1 To few inner solver iterations in this example Time to copy matrix from CPU to GPU is dominant Most validation test cases are tuned to very small tolerances This gains the impressive speed-up Realistic test cases needs usually less iterations & more relaxed tolerance settings Slide 19/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Overhead influence (A. Monakov, V. Platono: Accelerating OpenFOAM with a Parallel GPU, 8th OpenFOAM Workshop 2013) Slide 20/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Consequences For real speedup I see two possible ways: Bring the whole algorithm (i.e. Simple/PISO) to the GPGPU Not only the matrix Student research Project at the Institute Scientific Computing TU Braunschweig Problems: Object-oriented manner of OpenFOAM A whole bunch of basic classes has to be brought to the GPU Program your own solver Example: ELBE-code (Christian Janßen, TU Hamburg-Harburg) LBM-Solver used in a common project Simulation of the water-wading of a car Slide 21/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013

Comparison CPU-FV vs. LBM-GPU CPU-Finite Volume code Commercial solver running on 80 cores, Two-phase flow Hardware: 80 Core cluster Simulation time: 2 weeks GPU-LBM code ELBE code running on a workstation Two-phase flow Hardware: 1 NVIDIA Quadro 6000 GPGPUs Simulation time : 8 hours Speed up 40 Slide 22/22 Thorsten Grahs GPU plugins for OpenFOAM 2nd October 2013