Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA


1 Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA

2 Agenda. Introduction and Problem Statement; Governing Equations; ADI Numerical Method; GPU Implementation and Optimizations; Results and Future Work.

3 Introduction. Turbulence simulation: Direct Numerical Simulation (all scales of turbulence, expensive); Large-Eddy Simulation; Reynolds-Averaged Navier-Stokes. Research at the Computer Science department of Moscow State University: Paskonov V.M., Berezin S.B.

4 Problem Statement. Viscous incompressible fluid in a 3D domain. Initial and boundary conditions. Euler coordinates: velocity and temperature.

5 Definitions. Density: $\rho = \mathrm{const}$. Velocity: $\mathbf{u} = (u, v, w)$. Temperature: $T$. Pressure: $p$. State equation: $p = \rho R T$, where $R$ is the gas constant for air.

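6 Governing Equations. Continuity equation: $\operatorname{div} \mathbf{u} = 0$. Navier-Stokes equations, dimensionless form: $\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla T + \frac{1}{Re} \Delta \mathbf{u}$, where $Re$ is the Reynolds number.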

7 Reynolds number. Similarity parameter: the ratio of inertia forces to viscosity forces. 3D channel: $Re = \frac{\rho V' L'}{\mu'}$, where $V'$ is the mean velocity, $L'$ is the length of the pipe, and $\mu'$ is the dynamic viscosity. High Re: turbulent flow. Low Re: laminar flow.
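
As a small illustration of the similarity parameter on this slide, the sketch below evaluates the ratio of inertia to viscous forces from dimensional inputs. The function name `reynoldsNumber` and its parameter names are hypothetical and chosen for this example only; the formula assumes the usual definition with density, mean velocity, characteristic length, and dynamic viscosity.

```cpp
#include <cassert>

// Re = rho * V * L / mu, with all inputs in one consistent unit system.
// (Hypothetical helper; not part of the code shown in the slides.)
double reynoldsNumber(double rho, double meanVelocity,
                      double length, double dynamicViscosity) {
    assert(dynamicViscosity > 0.0);
    return rho * meanVelocity * length / dynamicViscosity;
}
```

Doubling the mean velocity or the channel length doubles Re, which pushes the flow toward the turbulent regime.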

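8 Governing Equations. Energy equation, dimensionless form: $\frac{\partial T}{\partial t} + (\mathbf{u} \cdot \nabla) T = \frac{\gamma}{Pr \, Re} \Delta T + \frac{\gamma - 1}{Re} \Phi$, where $Pr$ is the Prandtl number, $\gamma$ is the heat capacity ratio, and $\Phi$ is the dissipative function.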

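9 Numerical Method. Alternating Direction Implicit (ADI). The equation $\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}$ is split into three one-dimensional problems. X: $\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$. Y: $\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial y^2}$. Z: $\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial z^2}$.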

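10 ADI Heat Conduction. 3 fractional steps: X, Y, Z. Implicit finite-difference scheme for the X step: $\frac{u^{n+1/3}_{i,j,k} - u^{n}_{i,j,k}}{\Delta t} = \frac{u^{n+1/3}_{i+1,j,k} - 2 u^{n+1/3}_{i,j,k} + u^{n+1/3}_{i-1,j,k}}{\Delta x^2}$. With $q = \Delta t / \Delta x^2$ this is a tridiagonal system along each grid line: $-q \, u^{n+1/3}_{i-1,j,k} + (1 + 2q) \, u^{n+1/3}_{i,j,k} - q \, u^{n+1/3}_{i+1,j,k} = u^{n}_{i,j,k}$.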
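
To make the implicit step concrete, the sketch below assembles the tridiagonal system for a single grid line in one fractional step of the heat-conduction scheme, with `q` standing for the ratio of the time step to the squared grid spacing. The names `TriSystem` and `buildHeatSystem`, and the choice to hold the two end points fixed (a Dirichlet boundary), are assumptions made for this example and are not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficients of a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
struct TriSystem {
    std::vector<double> a, b, c, d;
};

// Assemble one implicit heat-conduction step along one grid line.
// u_old holds the values at the previous time layer; q = dt / dx^2.
// Assumption: both end points keep their old values (Dirichlet boundary),
// so their rows are the identity.
TriSystem buildHeatSystem(const std::vector<double>& u_old, double q) {
    const std::size_t n = u_old.size();
    assert(n >= 2);
    TriSystem s{std::vector<double>(n, 0.0), std::vector<double>(n, 1.0),
                std::vector<double>(n, 0.0), u_old};
    for (std::size_t i = 1; i + 1 < n; ++i) {
        s.a[i] = -q;             // coefficient of the left neighbour
        s.b[i] = 1.0 + 2.0 * q;  // diagonal
        s.c[i] = -q;             // coefficient of the right neighbour
    }
    return s;
}
```

Because the diagonal entry exceeds the sum of the magnitudes of the off-diagonal entries for any positive `q`, the system is diagonally dominant and can be solved without pivoting.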

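11 ADI Navier-Stokes. Equation for the X velocity (need iterations for non-linear PDEs): $\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} + v \frac{\partial u}{\partial y} + w \frac{\partial u}{\partial z} = -\frac{\partial T}{\partial x} + \frac{1}{Re} \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right)$. Splitting: X: $\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = -\frac{\partial T}{\partial x} + \frac{1}{Re} \frac{\partial^2 u}{\partial x^2}$. Y: $\frac{\partial u}{\partial t} + v \frac{\partial u}{\partial y} = \frac{1}{Re} \frac{\partial^2 u}{\partial y^2}$. Z: $\frac{\partial u}{\partial t} + w \frac{\partial u}{\partial z} = \frac{1}{Re} \frac{\partial^2 u}{\partial z^2}$.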

12 ADI Time Step. Sequence: (n-1) time step, (n) time step, (n+1) time step. Within one time step: splitting by X, splitting by Y, splitting by Z, then updating non-linear parameters; these stages are repeated as global iterations.

13 ADI Fractional Time Step, Linear PDEs. Previous layer (N time layer): u: x-velocity, v: y-velocity, w: z-velocity, T: temperature. Sweep. Next layer: N + 1 time layer. Solves many tridiagonal systems independently.

14 ADI Fractional Time Step, Non-Linear PDEs. Previous layer (N time layer): u: x-velocity, v: y-velocity, w: z-velocity, T: temperature. Local iterations: Sweep to the N + ½ time layer, then Update. Next layer: N + 1 time layer. Solves many tridiagonal systems independently.

15 Main Stages of the Algorithm. Solve a lot of independent tridiagonal systems: computationally intensive, easy to parallelize. Subtasks: evaluate dissipation term; update non-linear parameters.

16 Tridiagonal Solvers Overview. Simplified Gauss elimination: also known as the Thomas algorithm or Sweep; the fastest serial approach. Cyclic Reduction methods: attend Yao Zhang's talk "Fast Tridiagonal Solvers" afterwards!

17 Sweep algorithm. Memory requirements: one additional array of size N. Forward elimination step. Backward substitution step. Complexity: O(N).
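
The two steps named on this slide can be sketched as a serial routine. This is a minimal version of the Thomas algorithm under stated assumptions, not the code from the talk: the name `sweepSolve` and the four-array interface are hypothetical, and no pivoting is done, so the system is assumed to be well conditioned (for example, diagonally dominant).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.
// a[0] and c[n-1] are ignored. No pivoting is performed.
std::vector<double> sweepSolve(const std::vector<double>& a,
                               const std::vector<double>& b,
                               const std::vector<double>& c,
                               const std::vector<double>& d) {
    const std::size_t n = d.size();
    assert(n > 0 && a.size() == n && b.size() == n && c.size() == n);
    std::vector<double> cp(n);  // the one additional array of size N
    std::vector<double> x(n);   // modified right-hand side, then the solution

    // Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0];
    x[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double t = 1.0 / (b[i] - a[i] * cp[i - 1]);
        cp[i] = c[i] * t;
        x[i] = (d[i] - a[i] * x[i - 1]) * t;
    }

    // Backward substitution: walk from the last unknown to the first.
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] -= cp[i] * x[i + 1];
    }
    return x;
}
```

Each unknown is visited once in each pass, which gives the O(N) cost, and the only scratch storage is `cp`.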

18 GPU Implementation. All data arrays are stored on GPU. Several 3D time-layers: overall GB for 192 x 192 x 192 grid in DP. Main kernels: Sweep; dissipative function evaluation; non-linear update.

19 Sweep on the GPU. One thread solves one system. N^2 systems on each fractional step: splitting by X, splitting by Y, splitting by Z. Each thread operates with a 1D slice in the corresponding direction.
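
One way to picture the "one thread solves one system" mapping is as a function from a thread number to the grid line that thread walks. The sketch below is plain C++ rather than a CUDA kernel, and it assumes a flat array layout with x as the fastest-varying index, which the slides do not state; `Dir`, `Line`, and `lineForThread` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

enum class Dir { X, Y, Z };

// Start of one system in the flat array, the distance between its
// consecutive unknowns, and the number of unknowns.
struct Line {
    std::size_t base_idx;
    std::size_t stride;
    std::size_t length;
};

// Map thread number t to the line it solves.
// Assumed layout: idx = x + nx * (y + ny * z).
Line lineForThread(Dir dir, std::size_t t,
                   std::size_t nx, std::size_t ny, std::size_t nz) {
    switch (dir) {
    case Dir::X:  // threads enumerate (y, z); the line runs along x
        assert(t < ny * nz);
        return {nx * t, 1, nx};
    case Dir::Y: {  // threads enumerate (x, z); the line runs along y
        assert(t < nx * nz);
        const std::size_t x = t % nx, z = t / nx;
        return {x + nx * ny * z, nx, ny};
    }
    default:  // Dir::Z: threads enumerate (x, y); the line runs along z
        assert(t < nx * ny);
        return {t, nx * ny, nz};
    }
}
```

In a kernel, `t` would typically be derived from the block and thread indices, and the sweep would advance through the line by adding `stride` at every step.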

20 Sweep performance. [Bar chart: time steps/sec on NVIDIA Tesla C1060, float and double, for Sweep X, Sweep Y, Sweep Z.] X splitting is much slower than Y/Z ones.

21 Sweep: going into details. Memory bound: need to optimize access to the memory. Sweep X: uncoalesced. Sweep Y: coalesced. Sweep Z: coalesced.
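
The difference between the three sweeps can be seen by asking how far apart in memory two neighbouring threads are at the same step of their sweeps. The sketch below is a simplified model, assuming a flat layout with x as the fastest-varying index and neighbouring threads taking neighbouring lines; real coalescing rules depend on the GPU generation. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Distance, in elements, between the addresses touched by two neighbouring
// threads at the same sweep step. Assumed layout: idx = x + nx * (y + ny * z).
std::size_t neighbourThreadGap(char dir, std::size_t nx) {
    assert(dir == 'X' || dir == 'Y' || dir == 'Z');
    // X-splitting: neighbouring threads differ in y, so they sit nx elements apart.
    // Y- and Z-splitting: neighbouring threads differ in x, so they are adjacent.
    return dir == 'X' ? nx : 1;
}

// In this simplified model, loads can be combined into few memory
// transactions only when neighbouring threads read neighbouring elements.
bool isCoalesced(char dir, std::size_t nx) {
    return neighbourThreadGap(dir, nx) == 1;
}
```

Under this model, the X sweep makes every thread of a warp touch a different memory segment at each step, which is consistent with it being the slow case for a memory-bound kernel.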

22 Sweep optimization. Solution for X-splitting: reorder data arrays and run Y-splitting. Needs a few additional 3D matrix transposes. [Bar chart: time steps/sec, original vs. optimized, float and double.]
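
The reordering idea can be illustrated with a transpose that swaps the x and y axes, so that lines which ran along x in the original array run along y in the reordered one. This is a serial C++ sketch under the assumption of a flat layout with x as the fastest-varying index; `transposeXY` is a hypothetical name, and a GPU version would normally stage tiles through shared memory so that both reads and writes stay coalesced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Swap the x and y axes of a 3D array stored as idx = x + nx * (y + ny * z).
// The result has dimensions (ny, nx, nz): former x-lines become strided
// lines, so the Y-splitting sweep can be applied to them.
std::vector<double> transposeXY(const std::vector<double>& in,
                                std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(in.size() == nx * ny * nz);
    std::vector<double> out(in.size());
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x)
                out[y + ny * (x + nx * z)] = in[x + nx * (y + ny * z)];
    return out;
}
```

Applying the transpose a second time with the swapped dimensions restores the original ordering, which is how results would be brought back after the sweep.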

23 Code analysis. GPU version is based on the CPU code:

    // boundary conditions
    switch (dir)
    {
    case X:
    case X_as_Y: bc_x0( ); break;
    case Y: bc_y0( ); break;
    case Z: bc_z0( ); break;
    }
    a[1] = - c1 / c2;
    u_next[base_idx] = f_i / c2;

    // forward trace of sweep
    int idx = base_idx;
    int idx_prev;
    for (int k = 1; k < n; k++)
    {
        idx_prev = idx;
        idx += p.stride;              // step to the next unknown of this system

        double c = v_temp[idx];
        c1 = p.m_c13 * c - p.h;       // super-diagonal coefficient
        c2 = p.m_c2;                  // diagonal coefficient
        c3 = - p.m_c13 * c - p.h;     // sub-diagonal coefficient

        double q = (c3 * a[k] + c2);
        double t = 1 / q;
        a[k+1] = - c1 * t;
        u_next[idx] = (f[idx] - c3 * u_next[idx_prev]) * t;
    }

24 Performance Comparison. Test data: grid sizes of 128^3 and 192^3; 8 non-linear iterations (2 inner x 4 outer). Hardware: NVIDIA Tesla C1060; Intel Core 2 Quad (4 threads); Intel Core i7 Nehalem (8 threads).

25 Performance, 128^3 grid, float. [Bar chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93 GHz (4 cores), Intel Core 2 Quad 2.4 GHz (4 cores).]

26 Performance, 128^3 grid, double. [Bar chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93 GHz (4 cores), Intel Core 2 Quad 2.4 GHz (4 cores).]

27 Performance, 192^3 grid, float. [Bar chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93 GHz (4 cores), Intel Core 2 Quad 2.4 GHz (4 cores).]

28 Performance, 192^3 grid, double. [Bar chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93 GHz (4 cores), Intel Core 2 Quad 2.4 GHz (4 cores).]

29 GPU performance, SP/DP. [Bar chart: time steps/sec, float vs. double, for dissipation, sweep, nonlinear, total.] In double precision GPU is only slower than in single precision

30 Visual results. Boundary conditions: no-slip and free. Constant flow at start: u = const, v = w = 0. No-slip on sides: u = v = w = 0. Free at far end: zero-gradient condition on u, v, w.

31 Visual Results. X-slice: u, v, w, T; x = 0.9; t = 6; Re = 000

32 Future Work. Effective multi-GPU usage. Distributed memory systems. Performance improvements. High resolution grids, high Reynolds numbers.

33 Conclusion. High performance and efficiency of GPUs in complex 3D fluid simulation. CUDA is an easy-to-use tool for GPU compute programming. GPU enables new possibilities for researching.

34 Questions? Thank you! Keywords: ADI, Tridiagonal Solvers, DNS, Turbulence

35 References. Paskonov V.M., Berezin S.B., Korukhova E.S. (2007), "A dynamic visualization system for multiprocessor computers with common memory and its application for numerical modeling of the turbulent flows of viscous fluids", Moscow University Computational Mathematics and Cybernetics. ADI method: Douglas, Jim Jr. (1962), "Alternating direction methods for three space variables", Numerische Mathematik 4: 41-63.

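36 Dissipative Function. $\Phi = 2 \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial y} \right)^2 + \left( \frac{\partial w}{\partial z} \right)^2 \right] + \left( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right)^2 + \left( \frac{\partial w}{\partial y} + \frac{\partial v}{\partial z} \right)^2 + \left( \frac{\partial u}{\partial z} + \frac{\partial w}{\partial x} \right)^2$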
