Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA nsakharnykh@nvidia.com



Similar documents
Using GPU to Compute Options and Derivatives

Effect of Angular Velocity of Inner Cylinder on Laminar Flow through Eccentric Annular Cross Section Pipe

Effect of flow field on open channel flow properties using numerical investigation and experimental comparison

CFD Platform for Turbo-machinery Simulation

Modeling Roughness Effects in Open Channel Flows D.T. Souders and C.W. Hirt Flow Science, Inc.

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Turbomachinery CFD on many-core platforms experiences and strategies

8. Forced Convection Heat Transfer

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

ultra fast SOM using CUDA

Pricing of cross-currency interest rate derivatives on Graphics Processing Units

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Accelerating CFD using OpenFOAM with GPUs

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures

Parallel Computing with MATLAB

Acceleration of a CFD Code with a GPU

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Chapter 10 LOW PRANDTL NUMBER THERMAL-HYDRAULICS*

DIFFERENTIAL FORMULATION OF THE BASIC LAWS

CUDA programming on NVIDIA GPUs

High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem

GPUs for Scientific Computing

Optimizing Application Performance with CUDA Profiling Tools

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

GPU Acceleration of the SENSEI CFD Code Suite

OpenCL Programming for the CUDA Architecture. Version 2.3

A Fast Double Precision CFD Code using CUDA

Monte-Carlo Option Pricing. Victor Podlozhnyuk

Spectrum Balancing for DSL with Restrictions on Maximum Transmit PSD

GPU Hardware Performance. Fall 2015

Planning a Managed Environment

CIVE2400 Fluid Mechanics. Section 1: Fluid Flow in Pipes

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Heavy Parallelization of Alternating Direction Schemes in Multi-Factor Option Valuation Models. Cris Doloc, Ph.D.

Resource Pricing and Provisioning Strategies in Cloud Systems: A Stackelberg Game Approach

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

GPGPU accelerated Computational Fluid Dynamics

Real-time Visual Tracker by Stream Processing

TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Steady Flow: Laminar and Turbulent in an S-Bend

Introduction to GPU hardware and to CUDA

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

10.4 Solving Equations in Quadratic Form, Equations Reducible to Quadratics

Regular Specifications of Resource Requirements for Embedded Control Software

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Icepak High-Performance Computing at Rockwell Automation: Benefits and Benchmarks

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

CRM Customer Relationship Management. Customer Relationship Management

Research on Pricing Policy of E-business Supply Chain Based on Bertrand and Stackelberg Game

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

4.What is the appropriate dimensionless parameter to use in comparing flow types? YOUR ANSWER: The Reynolds Number, Re.

THE CFD SIMULATION OF THE FLOW AROUND THE AIRCRAFT USING OPENFOAM AND ANSA

ANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING

Effect of Aspect Ratio on Laminar Natural Convection in Partially Heated Enclosure

On the urbanization of poverty

Poisson Equation Solver Parallelisation for Particle-in-Cell Model

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Graphic Processing Units: a possible answer to High Performance Computing?

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Introduction to GPU Programming Languages

Sample Pages. Edgar Dietrich, Alfred Schulze. Measurement Process Qualification

Evolutionary Path Planning for Robot Assisted Part Handling in Sheet Metal Bending

High Performance Matrix Inversion with Several GPUs

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

WHITE PAPER. Filter Bandwidth Definition of the WaveShaper S-series Programmable Optical Processor

Optimal Trust Network Analysis with Subjective Logic

CRM Customer Relationship Management. Customer Relationship Management

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

HPC with Multicore and GPUs

High Performance Computing in CST STUDIO SUITE

CUDA for Real Time Multigrid Finite Element Simulation of

Chapter Consider an economy described by the following equations: Y = 5,000 G = 1,000

Black-Scholes option pricing. Victor Podlozhnyuk

Case Study on Productivity and Performance of GPGPUs

Parallel 3D Image Segmentation of Large Data Sets on a GPU Cluster

(Toward) Radiative transfer on AMR with GPUs. Dominique Aubert Université de Strasbourg Austin, TX,

Basic Equations, Boundary Conditions and Dimensionless Parameters

Model of a flow in intersecting microchannels. Denis Semyonov

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

~ Greetings from WSU CAPPLab ~

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

GPU Parallel Computing Architecture and CUDA Programming Model

5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

OpenACC 2.0 and the PGI Accelerator Compilers

TWO-DIMENSIONAL FINITE ELEMENT ANALYSIS OF FORCED CONVECTION FLOW AND HEAT TRANSFER IN A LAMINAR CHANNEL FLOW

Several tips on how to choose a suitable computer

A priori error analysis of stabilized mixed finite element method for reaction-diffusion optimal control problems

Iterative Solvers for Linear Systems

GPGPU Parallel Merge Sort Algorithm

Spring 2011 Prof. Hyesoon Kim

Binary search tree with SIMD bandwidth optimization using SSE

Transcription:

Tridiagonal Solvers on the GPU and Applications to Flid Simlation Nikolai Sakharnykh, NVIDIA nsakharnykh@nvidia.com

Agenda Introdction and Problem Statement Governing Eqations ADI Nmerical Method GPU Implementation and Optimizations Reslts and Ftre Work

Introdction Trblence simlation Direct Nmerical Simlation all scales of trblence epensive Large-Eddy Simlation Reynolds-Averaged Navier-Stokes Research at Compter Science department of Moscow State University Paskonov V.M., Berezin S.B.

Problem Statement Viscid incompressible flid in 3D domain Initial and bondary conditions Eler coordinates: velocity and temperatre

Definitions Density const Velocity (, v, w) Temperatre T Pressre p State eqation p RT RT R gas constant for air

Governing Eqations Continity eqation div 0 Navier-Stokes eqations dimensionless form t T Re Re Reynolds nmber

Reynolds nmber Similarity parameter the ratio of inertia forces to viscosity forces 3D channel: Re V ' L' ' V ' L' - mean velocity - length of pipe ' - dynamic viscosity High Re trblent flow Low Re laminar flow

Governing Eqations Energy eqation dimensionless form T t T T Pr Re T Re Pr Prandtl nmber heat capacity ratio dissipative fnction

Nmerical Method Alternating Direction Implicit (ADI) t y z X Y Z t t y t z

ADI Heat Condction 3 fractional steps X, Y, Z Implicit finite-difference scheme /3,, /3,, /3,,,, /3,, t n k j i n k j i n k j i n k j i n k j i t n k j n n k j i n k j n k j n n k j i n k j t q q q q,,,,,, /3,, /3,, /3,, 0 0 t q

ADI Navier-Stokes Eqation for X velocity need iterations for non-linear PDEs Re z y T z w y v t Re T t Re y y v t Re z z w t X Y Z

ADI Time Step (n-) time step (n) time step (n+) time step Splitting by X Splitting by Y Splitting by Z Updating non-linear parameters Global iterations

ADI Fractional Time Step Linear PDEs Previos layer N time layer : -velocity v: y-velocity w: z-velocity Sweep N + time layer Net layer T: temperatre Solves many tridiagonal systems independently

ADI Fractional Time Step Non-Linear PDEs Previos layer N time layer : -velocity N + ½ time layer Update Local iterations v: y-velocity w: z-velocity Sweep N + time layer Net layer T: temperatre Solves many tridiagonal systems independently

Main Stages of the Algorithm Solve a lot of independent tridiagonal systems Comptationally intensive Easy to parallelize Sbtasks: Evalate dissipation term Update non-linear parameters

Tridiagonal Solvers Overview Simplified Gass elimination Also known as Thomas algorithm, Sweep The fastest serial approach Cyclic Redction methods Attend Yao Zhang s talk Fast Tridiagonal Solvers afterwards!

Sweep algorithm Memory reqirements one additional array of size N Forward elimination step Backward sbstittion step Compleity: O(N)

GPU Implementation All data arrays are stored on GPU Several 3D time-layers overall GB for 999 grid in DP Main kernels Sweep Dissipative fnction evalation Non-linear pdate

Sweep on the GPU One thread solves one system N^ systems on each fractional step Splitting by X Splitting by Y Splitting by Z Each thread operates with D slice in corresponding direction

Sweep performance time steps/sec 9 8 7 NVIDIA Tesla C060 6 5 4 3 float doble 0 Sweep X Sweep Y Sweep Z X splitting is mch slower than Y/Z ones

Sweep going into details Memory bond need to optimize access to the memory Sweep X Sweep Y Sweep Z ncoalesced coalesced coalesced

Sweep optimization Soltion for X-splitting Reorder data arrays and rn Y-splitting Need few additional 3D matri transposes.0.8.5.6.4 time steps/sec..0.7 original 0.8 optimized 0.6 0.4 0. 0.0 float doble

Code analysis GPU version is based on the CPU code // bondary conditions switch (dir) { case X: case X_as_Y: bc_0( ); break; case Y: bc_y0( ); break; case Z: bc_z0( ); break; } a[] = - c / c; _net[base_id] = f_i / c; // forward trace of sweep int id = base_id; int id_prev; for (int k = ; k < n; k++) { id_prev = id; id += p.stride; doble c = v_temp[id]; c = p.m_c3 * c - p.h; c = p.m_c; c3 = - p.m_c3 * c - p.h; } doble q = (c3 * a[k] + c); doble t = / q; a[k+] = - c * t; _net[id] = (f[id] - c3 * _net[id_prev]) * t;

Performance Comparison Test data Grid size of 8/9 8 non-linear iterations ( inner 4 oter) Hardware NVIDIA Tesla C060 Intel Core Qad (4 threads) Intel Core i7 Nehalem (8 threads)

Performance 8 - float time steps/sec 30 7 4 0 7 8 NVIDIA Tesla C060 5 9 6 7 9 Intel Core i7 Nehalem.93GHz (4 cores) Intel Core Qad.4GHz (4 cores) 3 0 Dissipation Sweep NonLinear Total

Performance 8 - doble time steps/sec 4 3 0 4 0 9 8 7 6 5 4 3 4 5 NVIDIA Tesla C060 Intel Core i7 Nehalem.93GHz (4 cores) Intel Core Qad.4GHz (4 cores) 0 Dissipation Sweep NonLinear Total

Performance 9 - float time steps/sec 9 8 8 8 7 6 5 4 3 8 NVIDIA Tesla C060 Intel Core i7 Nehalem.93GHz (4 cores) Intel Core Qad.4GHz (4 cores) 0 Dissipation Sweep NonLinear Total

Performance 9 - doble time steps/sec 6 5 4 3 5 NVIDIA Tesla C060 3 Intel Core i7 Nehalem.93GHz (4 cores) Intel Core Qad.4GHz (4 cores) 4 5 0 Dissipation Sweep NonLinear Total

GPU performance SP/DP time steps/sec 9 8 7 6 5 4 3 float doble 0 dissipation sweep nonlinear total In doble precision GPU is only slower than in single precision

Visal reslts Bondary conditions no-slip free = Constant flow at start:, v w 0 No-slip on sides: v w 0 Free at far end: v w 0

Visal Reslts X-slice v w T = 0,9 t = 6 Re = 000

Ftre Work Effective mlti-gpu sage Distribted memory systems Performance improvements High resoltion grids, high Reynolds nmbers

Conclsion High performance and efficiency of GPUs in comple 3D flid simlation CUDA is an easy-to-se tool for GPU compte programming GPU enables new possibilities for researching

Qestions? Thank yo! Keywords: ADI, Tridiagonal Solvers, DNS, Trblence E-mail: nsakharnykh@nvidia.com

References Paskonov V.M., Berezin S.B., Korkhova E.S. (007) A dynamic visalization system for mltiprocessor compters with common memory and its application for nmerical modeling of the trblent flows of viscos flids, Moscow University Comptational Mathematics and Cybernetics ADI method - Doglas Jr., Jim (96), "Alternating direction methods for three space variables", Nmerische Mathematik 4: 4 63

Dissipative Fnction z v y w z w z w z v z y w z v y v y w y v y w z v y w v z y z y