Ridgeway Kite Innovative Technology for Reservoir Engineers A Massively Parallel Architecture for Reservoir Simulation



1 Innovative Technology for Reservoir Engineers A Massively Parallel Architecture for Reservoir Simulation Garf Bowen 16th Dec 2013

2 Summary Introduce RKS Reservoir HPC goals Simple example, results Full problem, results and challenges

3 RKS Start-up (April 2013) Long history in Reservoir Sister company, NITEC Massively Parallel Code Coupled surface network

4 Reservoir Finite Volume Unstructured (features) Implicit: R = 0, with R assembled from M and F

5 Driving from London to Manchester Check the Ferrari or the traffic jam? Lot of code that all needs to go fast Challenge is often not to go slow Can't just focus on hot spots

6 HPC goals not to go slow Portability CPU/GPU/Phi (+clusters) Want to be future proof (massive) is an opportunity Developer efficiency Same result on any platform

7 Shuffle Calculate Pattern Scatter I/O from node zero Shuffle Calculate one-to-one Gather output are embarrassingly parallel No indirect addressing Ability separately

8 Example calculate flows One flow two cells Different flow same cell One cell involved in flows copies slots More flows than cells

9 one code kernel many (independent) calls Simplicity Returns? Split to run MPI distributed on the CPU Underlying system - XPL Takes care of running Different modes Different architectures Code looks serial again

10 Maps & MPI
Src   Dest  Slot
i_1   j_1   0
i_2   j_2   1
i_3   j_3   0
i_4   j_4   1
Maps are defined in serial space
Not recommended
test.exe cpu
test.exe gpu
mpirun -np 16 test.exe
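The Shuffle-Calculate idea and the Src/Dest/Slot map can be sketched in plain C++ as below. This is a minimal illustration under stated assumptions, not the XPL implementation: the `MapEntry` struct, the three function names and the transmissibility-times-difference flow formula are all hypothetical. The first shuffle copies cell values into per-flow slots through the map, the calculate step then works one-to-one on the slots with no indirect addressing, and a second shuffle accumulates the flows back onto the cells.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical map entry: copy cell `src` into slot `slot` (0 or 1) of flow `dest`.
struct MapEntry {
    std::size_t src;
    std::size_t dest;
    int slot;
};

// Shuffle: gather cell values into per-flow slots. All indirection lives here.
std::vector<double> shuffle_to_flows(const std::vector<double>& cell,
                                     const std::vector<MapEntry>& map,
                                     std::size_t nflows) {
    std::vector<double> slots(2 * nflows, 0.0);
    for (const MapEntry& m : map) slots[2 * m.dest + m.slot] = cell[m.src];
    return slots;
}

// Calculate: one independent evaluation per flow, direct addressing only.
// The formula (trans * difference of the two slots) is an assumption.
std::vector<double> calculate_flows(const std::vector<double>& slots,
                                    const std::vector<double>& trans) {
    std::vector<double> flow(trans.size());
    for (std::size_t f = 0; f < trans.size(); ++f)
        flow[f] = trans[f] * (slots[2 * f] - slots[2 * f + 1]);
    return flow;
}

// Shuffle back: each flow leaves its slot-0 cell and enters its slot-1 cell.
std::vector<double> shuffle_to_cells(const std::vector<double>& flow,
                                     const std::vector<MapEntry>& map,
                                     std::size_t ncells) {
    std::vector<double> net(ncells, 0.0);
    for (const MapEntry& m : map)
        net[m.src] += (m.slot == 0 ? -flow[m.dest] : flow[m.dest]);
    return net;
}
```

With three cells in a row and two flows, the middle cell appears in two map entries, so it is copied twice. This is why there are more slots than cells.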

11 Simple Example
$x_i = A_i^{-1} r_i \quad \forall i$
A - n*n small dense matrix
~millions of i's
LU factorization (partial pivoting)

template<typename KP>
struct Testinv {
  __host__ __device__ Testinv(Args* inargs, int index, int N) {
    int ia = 0;
    mat<double,KP> a(inargs, ia++, index);  // A_i
    vec<double,KP> r(inargs, ia++, index);  // r_i
    vec<double,KP> x(inargs, ia++, index);  // x_i (output)
    mat<double,KP> w(inargs, ia++, index);  // work copy of A_i
    w = a;
    w.inv();
    x.zero();
    w.mult(r, x);
  }
};

case rks::testkernels::test_inv:
  calc(inargs, gpu<Testinv<KP> >, cpu<Testinv<KP> >, omp<Testinv<KP> >, phi<Testinv<KP> >);
  break;

12 Layout Array-of-structures (CPU friendly) [Diagram: storage offsets 0, n, 2n, ..., 8n and 1, n+1, 2n+1, ..., 8n+1] Structure-of-arrays (GPU friendly) Templated policy <KP> switch MPI jobs using both CPU & GPU Future proof? Prevents cheating no double* pt
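A templated layout policy of the kind mentioned on this slide can be sketched as follows. The names `AoS`, `SoA`, `ItemView` and `fill_item` are hypothetical and are not the `mat`/`vec` classes or the `KP` policy from the talk. The point of the sketch is that the kernel body only ever uses `operator()`, so switching layout is a change of template argument and the kernel never touches a raw `double*`.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures: the `len` entries of item i are contiguous.
struct AoS {
    static std::size_t offset(std::size_t i, std::size_t k,
                              std::size_t /*nitems*/, std::size_t len) {
        return i * len + k;
    }
};

// Structure-of-arrays: entry k of every item is contiguous (stride nitems).
struct SoA {
    static std::size_t offset(std::size_t i, std::size_t k,
                              std::size_t nitems, std::size_t /*len*/) {
        return k * nitems + i;
    }
};

// Hypothetical view of one item in a batch; kernels only see operator().
template <typename Layout>
class ItemView {
public:
    ItemView(std::vector<double>& data, std::size_t i,
             std::size_t nitems, std::size_t len)
        : data_(data), i_(i), nitems_(nitems), len_(len) {}

    double& operator()(std::size_t k) {
        return data_[Layout::offset(i_, k, nitems_, len_)];
    }

private:
    std::vector<double>& data_;
    std::size_t i_, nitems_, len_;
};

// One kernel body, written once, usable with either layout.
template <typename Layout>
void fill_item(std::vector<double>& data, std::size_t i,
               std::size_t nitems, std::size_t len) {
    ItemView<Layout> v(data, i, nitems, len);
    for (std::size_t k = 0; k < len; ++k) v(k) = 10.0 * i + k;
}
```

For 3*3 matrices (9 entries each) and n items, the `SoA` policy in this sketch places item 1 at offsets 1, n+1, 2n+1, ..., 8n+1, while `AoS` places it at offsets 9 to 17.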

13 Performance [Plot: Scaling by matrix size, 1e6 (10 times); log time (secs); series CPU, GPU] [Plot: Log n Scaling; log time (secs) against log dense matrix size; series CPU, GPU; fitted lines y = 2.35x and y = 2.23x] [Plot: Scaling for the 3*3 case (10 times); log time (secs) against log number of matrices; series CPU, GPU]

14 Effect of layout [Plots: "GPU: Effect of layout" and "CPU: Effect of layout"; log time (secs) against log dense matrix size; series s-of-a, a-of-s]

15 Now add complexity [Chart: Comparison between cpu and gpu, by component: well, jac, mass, flow, flow_norm, lin, ling, lins, orth-it, norm, precon, pressure]

16 Linear Solver Strategy Linear Solver Important Mechanism Challenge in parallel environments Like getting the same results If we can implement a solver in XPL, then we get this for free but we're only a small company And don't really want to be linear solver experts Home grown May not be competitive Using Nvidia's AmgX Lose the same algorithm Performing

17 Linear Solver Home Grown Massively helpful for development Challenged on difficult problems AmgX Many (pre-coded) Single GPU working well MPI is a challenge Implementation has to fit around it Some solvers missing

18 Summary & Conclusions Shuffle-Calculate pattern Works for us, so far Portable Allowing us to exploit the GPU Full system Commercial offering next year

19 Acknowledgements Co-authors: Bachar Zineddin & Tommy Miller The authors would like to acknowledge that the work presented here made use of the IRIDIS*/EMERALD* HPC facility provided by the Centre for Innovation. Nvidia for AmgX beta access

21 Backup#1 LU code example
// Main elimination loop
for (int j=0; j<m_xdim; j++) {
  // Sum
  for (int i=0; i<j; i++) {
    double sum = (*this)(i,j);
    for (int k=0; k<i; k++) {
      sum = sum - (*this)(i,k)*(*this)(k,j);
    }
    (*this)(i,j) = sum;
  }
  // Max
  aamax = 0.0;
  for (int i=j; i<m_xdim; i++) {
    double sum = (*this)(i,j);
    for (int k=0; k<j; k++) {
      sum = sum - (*this)(i,k)*(*this)(k,j);
    }
    (*this)(i,j) = sum;
    if ( std::fabs(vv[i]*sum)>=aamax ) {
      imax = i;
      aamax = std::fabs(vv[i]*sum);
    }
  }
  // Swap
  if (j!=imax) {
    for (int k=0; k<m_xdim; k++) {
      double dum = (*this)(imax,k);
      (*this)(imax,k) = (*this)(j,k);
      (*this)(j,k) = dum;
    }
    vv[imax] = vv[j];
  }
  // Store
  piv[j] = imax;
  if ( (*this)(j,j)==0.0 ) (*this)(j,j) = 1e-20;
  // Set
  if (j!=m_xdim-1) {
    double dum = 1.0/(*this)(j,j);
    for (int i=j+1; i<m_xdim; i++) {
      (*this)(i,j) = (*this)(i,j)*dum;
    }
  }
}
// End lu step
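For readers who want to run the factorization on its own, the following is a self-contained sketch of the same style of LU step (implicit row scaling, partial pivoting), together with the forward and back substitution needed to get x = A^{-1} r. It stores the matrix row-major in a `std::vector` instead of using the talk's `mat` class. The function names `lu_factor` and `lu_solve` and the way the scaling vector `vv` is built are assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// In-place LU factorization of a row-major n*n matrix with partial pivoting.
// piv[j] records which row was swapped with row j at step j.
void lu_factor(std::vector<double>& a, std::size_t n,
               std::vector<std::size_t>& piv) {
    std::vector<double> vv(n);  // implicit scaling: 1 / (largest entry in row)
    for (std::size_t i = 0; i < n; ++i) {
        double big = 0.0;
        for (std::size_t k = 0; k < n; ++k)
            big = std::max(big, std::fabs(a[i * n + k]));
        vv[i] = (big > 0.0) ? 1.0 / big : 1.0;
    }
    piv.assign(n, 0);
    for (std::size_t j = 0; j < n; ++j) {
        // Entries above the diagonal in column j.
        for (std::size_t i = 0; i < j; ++i) {
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < i; ++k) sum -= a[i * n + k] * a[k * n + j];
            a[i * n + j] = sum;
        }
        // Entries on and below the diagonal, tracking the best scaled pivot.
        double aamax = 0.0;
        std::size_t imax = j;
        for (std::size_t i = j; i < n; ++i) {
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) sum -= a[i * n + k] * a[k * n + j];
            a[i * n + j] = sum;
            const double scaled = vv[i] * std::fabs(sum);
            if (scaled >= aamax) { aamax = scaled; imax = i; }
        }
        // Row interchange.
        if (j != imax) {
            for (std::size_t k = 0; k < n; ++k)
                std::swap(a[imax * n + k], a[j * n + k]);
            vv[imax] = vv[j];
        }
        piv[j] = imax;
        if (a[j * n + j] == 0.0) a[j * n + j] = 1e-20;  // avoid division by zero
        const double inv = 1.0 / a[j * n + j];
        for (std::size_t i = j + 1; i < n; ++i) a[i * n + j] *= inv;
    }
}

// Solve A x = b using the factors from lu_factor; b is overwritten with x.
void lu_solve(const std::vector<double>& a, std::size_t n,
              const std::vector<std::size_t>& piv, std::vector<double>& b) {
    for (std::size_t i = 0; i < n; ++i) {  // forward substitution with row swaps
        const std::size_t ip = piv[i];
        double sum = b[ip];
        b[ip] = b[i];
        for (std::size_t k = 0; k < i; ++k) sum -= a[i * n + k] * b[k];
        b[i] = sum;
    }
    for (std::size_t i = n; i-- > 0;) {    // back substitution
        double sum = b[i];
        for (std::size_t k = i + 1; k < n; ++k) sum -= a[i * n + k] * b[k];
        b[i] = sum / a[i * n + i];
    }
}
```

Calling `lu_factor` on the row-major matrix {0, 2, 1, 1} and then `lu_solve` with right-hand side {2, 3} exercises the pivot swap and gives x = (2, 1).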

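22 Backup#2 Home Grown Solver
$$\begin{bmatrix} A_{ww} & A_{wb} \\ A_{bw} & A_{bb} \end{bmatrix} \begin{bmatrix} x_w \\ x_b \end{bmatrix} = \begin{bmatrix} R_w \\ R_b \end{bmatrix}$$
$$\begin{bmatrix} A_{ww} & 0 \\ A_{bw} & \tilde{A}_{bb} \end{bmatrix} \begin{bmatrix} I & A_{ww}^{-1} A_{wb} \\ 0 & I \end{bmatrix} \begin{bmatrix} x_w \\ x_b \end{bmatrix} = \begin{bmatrix} R_w \\ R_b \end{bmatrix}$$
$$\tilde{A}_{bb} = A_{bb} - A_{bw} A_{ww}^{-1} A_{wb}$$
Note: $(1 - x)^{-1} = 1 + x + x^2 + \dots$
With: $x = A_{bw} A_{ww}^{-1} A_{wb} A_{bb}^{-1}$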
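The series in the "Note" connects to the block elimination through one extra step, written out below. This is a sketch rather than the authors' stated method: the tilde notation for the modified block and the truncation after the linear term are assumptions, and the series only converges when the spectral radius of x is below one.

```latex
\begin{aligned}
\tilde{A}_{bb} &= A_{bb} - A_{bw} A_{ww}^{-1} A_{wb}
                = (I - x)\,A_{bb},
  \qquad x = A_{bw} A_{ww}^{-1} A_{wb} A_{bb}^{-1}, \\
\tilde{A}_{bb}^{-1} &= A_{bb}^{-1} (I - x)^{-1}
                = A_{bb}^{-1} \left( I + x + x^{2} + \dots \right)
                \approx A_{bb}^{-1} \left( I + x \right).
\end{aligned}
```

Under this reading, the inverse of the modified block is approximated using only solves with $A_{ww}$ and $A_{bb}$, with no need to form $\tilde{A}_{bb}$ explicitly.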
