Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin.

Transcription

1 Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin
Reza Rooholamini, Ph.D.
Director, Enterprise Solutions
Dell Computer Corp.

2 Product Maturity Life Cycle in the Open Systems Market
[Figure: chart with axis labels Cost/Complexity and Simplicity/Volume/Choice and stages Proprietary, Standardization, Fully Standardized. Products shown: Heterogeneous SANs, RISC systems, Grids, Project based SANs, 8P servers, HPC Clusters, Network Attached Storage, 4P servers, Direct Attached Storage, 1/2P servers, Appliance Servers, Workstation, Desktops]
Enterprise Solutions

3 Our Vision
Customers define our success: begin with the customer, end with the customer.
Provide the best price/performance solutions to our customers in HPC.
Promote standardization to provide choice, lower cost of ownership, and simplicity in HPC solutions.
Evangelize new HPC technologies and selectively adopt the relevant ones for productization.
Derive the requirements for products by focusing on applications.
Provide a total solution: hardware, software, and services.
Partner with best of class in HPC.

4 Building Block Approach
Benchmark: Parallel Benchmarks (NAS, HINT, Linpack) and Parallel Applications
Middleware: MPI/Pro, MPICH, MVICH, PVM
OS: Linux, Windows
Protocol: TCP, VIA, GM, Elan
Interconnect: Fast Ethernet, Gigabit Ethernet, Myrinet, Quadrics, Infiniband
Platform: Dell PowerEdge Servers (IA32 & IA64)

5 Dell and UT Austin
Dell is sponsoring research in reservoir simulation at the Department of Petroleum and Geosystems Engineering.
Dr. Kamy Sepehrnoori is collaborating with Dell's HPCC team on performance studies, paper publications, and parallel simulator development.
Dell's HPCC team includes graduates from Dr. Sepehrnoori's group who specialized in Petroleum Engineering.
Dell has participated in the Reservoir Simulation JIP (Joint Industry Project) in the past and is planning to attend the upcoming meeting.
Dr. Sepehrnoori has access to the Dell HPC lab for running large simulations and is provided with hardware for development, testing, and performance studies of his program.

6 A Performance Study of Parallel Reservoir Simulation on HPC Clusters
Baris Guler, Tau Leng, Victor Mashayekhi, Reza Rooholamini (Dell Computer Corporation)
Kamy Sepehrnoori (Center for Petroleum and Geosystems Engineering, The University of Texas at Austin)

7 Outline
Background
Software/Hardware Description
Compositional Reservoir Simulation on HPCs
Results
Summary
Future Work

8 Reservoir Simulation Application
Reservoir Forecasting
Reservoir Performance Optimization
Sensitivity Analysis
History Matching
Risk Assessment through Stochastic Simulation
Assessment of Uncertainty in Forecasting
Value of Information Studies
Reservoir Management

9 Reservoir Simulation Steps
Data Input/Model Initialization
Do Time Step Computation
Solution of Non-Linear Partial Differential Equation
Discretization
Linearization and Newtonian Iteration
Solution using Direct or Iterative Solvers
Test for Convergence of Solution
Data Output/Graphics
Time-Step Increment
End of Simulation Study
Results Processing/Interpretation
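
The loop on this slide (time step, linearization with Newton iteration, linear solve, convergence test, time-step increment) can be sketched on a toy problem. The Python below is a minimal illustration, not GPAS or VIP code: it assumes a single unknown governed by du/dt = -u**2, discretized with backward Euler, so the "linear solve" collapses to one division. The names newton_solve and simulate are hypothetical.

```python
def newton_solve(residual, jacobian, x0, tol=1e-10, max_iter=20):
    """Newton iteration for one scalar unknown: repeat x -= R(x) / R'(x)."""
    x = x0
    for _ in range(max_iter):
        r = residual(x)
        if abs(r) < tol:          # test for convergence of the solution
            return x
        x -= r / jacobian(x)      # linearized system solved directly (scalar case)
    raise RuntimeError("Newton iteration did not converge")


def simulate(u0, dt, n_steps):
    """Backward-Euler time stepping for the toy model du/dt = -u**2."""
    u = u0                        # model initialization
    history = [u]
    for _ in range(n_steps):      # do time step computation
        u_old = u
        # Discretized nonlinear equation: R(v) = v - u_old + dt * v**2 = 0
        residual = lambda v: v - u_old + dt * v * v
        jacobian = lambda v: 1.0 + 2.0 * dt * v
        u = newton_solve(residual, jacobian, u_old)
        history.append(u)         # stand-in for data output
    return history


history = simulate(u0=1.0, dt=0.1, n_steps=10)
print(history[-1])                # close to the exact value 1 / (1 + t) at t = 1
```

In a real simulator the scalar division becomes a direct or iterative sparse linear solve over the unknowns of every grid cell, which is where most of the parallel work lies.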

10 Reservoir Simulation Hardware
HPCs
MPPs
PCs/Workstations
RISC Workstations
Supercomputers
Mainframes

11 Benefits of Parallel Processing
Turn-around time
Large-scale simulations
Cost

12 Parallel Processing
Massively Parallel Computers
High Performance Computing Clusters

13 Benefits of Clusters
Scalability
High Performance Computing
Low Cost
Availability

14 Computational Mode
Distributed processing
Parallel processing

15 Distributed Processing
[Diagram labels: User, Input Generator, D1, D2, D3, ..., Dn, Batch Queuing System to Simulation Program, P1, P2, P3, ..., Pm (n >> m), Database, Post Processing]
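
The distributed mode in this diagram, many independent inputs queued onto a smaller set of workers (n >> m), can be sketched with Python's standard library. This is only an illustration under assumptions: run_case is a hypothetical stand-in for one complete simulation run, its formula is invented for the example, and a thread pool plays the role of the batch queuing system.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(permeability_md):
    """Hypothetical stand-in for one independent simulation run."""
    recovery = permeability_md / (permeability_md + 100.0)  # invented toy response
    return {"permeability_md": permeability_md, "recovery": recovery}


def run_batch(cases, n_workers):
    """Queue every case onto n_workers workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_case, cases))


cases = [10.0 * k for k in range(1, 41)]   # n = 40 generated inputs
results = run_batch(cases, n_workers=4)    # m = 4 workers
print(len(results), results[0])
```

Because the cases never exchange data, this mode needs no fast interconnect, unlike the parallel mode in which one simulation is split across processors.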

16 Cluster Simulation System
[Diagram; recoverable labels: User, Project Advisor, Input, Output, Input Data, Data Generator, Cluster Scheduler, FS 1, FS 2, ..., FS m, DS 1, DS 2, ..., DS n, Archiver, Post-Processor]

18 CPGE Parallel Processing
[Figure labels: RESERVOIR; FD on CPU-1; FD & DD on CPU-1 through CPU-6]

19 Domain Decomposition
Fundamental strategy for grid-based parallel simulation.
Ghost layers: creation, communication.
Example: 10 x 15 grid, 6 processors.
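
The 10 x 15 example can be sketched in a few lines of Python. The slide does not say how the six processors are arranged, so the 2 x 3 layout, the single ghost layer, and the function names below are assumptions made for illustration.

```python
def split_range(n, parts):
    """Split range(n) into contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decompose(nx, ny, px, py, ghost=1):
    """Return, per processor, the owned index block and the block widened by ghost cells."""
    subdomains = []
    for j, (y0, y1) in enumerate(split_range(ny, py)):
        for i, (x0, x1) in enumerate(split_range(nx, px)):
            subdomains.append({
                "rank": j * px + i,
                "owned": ((x0, x1), (y0, y1)),
                # Ghost cells mirror neighbours' data; none exist past the grid edge.
                "with_ghosts": ((max(x0 - ghost, 0), min(x1 + ghost, nx)),
                                (max(y0 - ghost, 0), min(y1 + ghost, ny))),
            })
    return subdomains


subdomains = decompose(nx=10, ny=15, px=2, py=3)   # assumed 2 x 3 processor layout
for s in subdomains:
    print(s["rank"], s["owned"], s["with_ghosts"])
```

Each processor updates only its owned cells; the ghost cells are refreshed from neighbouring processors by communication, typically once per iteration.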

20 Performance Issues in Parallel Processing Software Design Algorithm Parallelization Programming practice Load Balancing

21 Performance Issues in Parallel Processing Hardware Configuration CPU Cache Memory subsystem Front Side Bus I/O bandwidth Interconnect

22 Hardware - Interconnect
[Table: Speed (MBps) and Latency (ms) by interconnect type: Fast Ethernet, Gigabit Ethernet, Giganet, Myrinet, Infiniband 4x, Quadrics, Dolphin; the values are not preserved]
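
Why both columns matter can be seen with a common first-order cost model, time = latency + size / bandwidth. The model and every number below are assumptions for illustration; they are not the values from the slide's table, and "slow" and "fast" do not stand for any particular product.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate of the time to deliver one message."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical interconnects (illustrative numbers only).
slow = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 10e6}
fast = {"latency_s": 10e-6, "bandwidth_bytes_per_s": 200e6}

for n_bytes in (64, 1_000_000):
    t_slow = message_time(n_bytes, **slow)
    t_fast = message_time(n_bytes, **fast)
    print(f"{n_bytes:>9} bytes: slow {t_slow * 1e6:10.1f} us, fast {t_fast * 1e6:10.1f} us")
```

Small messages, such as thin ghost-layer exchanges, are dominated by latency, while large transfers are dominated by bandwidth.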

23 CPGE-1 (Ararat)
12 Nodes / 16 Processors
1.0 GHz Intel Pentium III Xeon processors
256 MB of memory
Diskless configuration
100 Mbps switched Fast Ethernet and Giganet interconnects

24 TACC-1 (Tejas)
32 Nodes / 64 Processors
1.0 GHz Intel Pentium III processors
1 GB of memory/processor
225 MBps Myrinet-2000 interconnect

25 Parallel Reservoir Simulators
Chevron-Texaco
Conoco-Phillips
Exxon-Mobil
IFP and Beicip-Franlab
Landmark Graphics Corporation
Schlumberger-Geoquest
Saudi Aramco
UT CPGE, UT CSM
Note: 93 clusters in Top500 supercomputer sites, 23 in the oil and gas sector.

26 Compositional Reservoir Simulation on HPCs

27 Project Objectives
Develop a general purpose adaptive simulator (GPAS) capable of:
modeling of complex physical processes including EOS compositional, chemical, black-oil, and thermal
high resolution studies on supercomputers and high-performance clusters

28 HPC Initiatives
Evaluate and compare performance of different cluster systems.
Test and analyze performance of different parallel simulators.
Identify areas of improvement in parallel algorithm design and cluster setup for optimal parallel reservoir simulation.

29 Summary of Clusters
Cluster | CPU Type | CPU Speed (MHz) | CPUs | Memory per CPU | Interconnect
CPGE-1 (Fuji) | Pentium II | 300 | 16x1=16 | 384MB | Fast Ethernet
CPGE-1 (Rocky) | Pentium II Xeon | 400 | 8x2=16 | 256MB | Fast Ethernet
CPGE-1 (Ararat) | Pentium III Xeon | 1000 | 8x1+4x2=16 | 256MB | Fast Ethernet
DELL-1 (PE 1550) | Pentium III | 1000 | 16x2=32 | 512MB | Myrinet, Gigabit, Fast Ethernet
DELL-2 (PE 2650) | Intel Xeon DP | 2400 | 64x2=128 | 1GB | Myrinet, Gigabit, Fast Ethernet
TACC-1 (Tejas) | Pentium III | 1000 | 32x2=64 | 512MB | Myrinet
TACC-2 (Longhorn) | Power | | 4x16=64 | 2GB | IBM SP Switch2

30 Parallel Simulators Tested GPAS VIP (2003r4)

31 CPGE Simulator (GPAS)
EOS Compositional
Peng-Robinson EOS
Fully Implicit
PETSc Linear Solvers
Parallel (IPARS Framework)

32 Performance Results

33 Base Benchmark Problem
Compositional model
3-component Peng-Robinson EOS
Dry-gas cycling process
Reservoir size: 800 x x 160 ft, homogeneous
2 wells: 1 injector, 1 producer
Grids: 16 x 224 x 8 (28,672 cells)
Unknowns: 229,
days of gas injection
One dimensional domain decomposition
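
The grid figures on this slide can be checked with a few lines of arithmetic. The Python sketch below reproduces the 28,672-cell count and shows how a one-dimensional decomposition would divide the grid into slabs. Two things are assumptions, not statements from the slides: that the split runs along the 224-cell direction, and that the unknowns-per-cell ratio matches the one implied by the Modified Benchmark Problem (1.57 million unknowns over 197,120 cells, about 8 per cell). Under that ratio the base problem would have 229,376 unknowns, which is consistent with the truncated "229," above. The function names are hypothetical.

```python
def grid_cells(nx, ny, nz):
    """Total number of cells in a structured nx x ny x nz grid."""
    return nx * ny * nz


def slab_sizes(n_planes, n_procs):
    """Planes per processor for a 1-D decomposition, sizes differing by at most 1."""
    base, extra = divmod(n_planes, n_procs)
    return [base + (1 if p < extra else 0) for p in range(n_procs)]


base_cells = grid_cells(16, 224, 8)         # 28,672, as stated on the slide
modified_cells = grid_cells(77, 256, 10)    # 197,120, as stated for the modified problem

# Assumption: same unknowns-per-cell ratio in both problems (about 7.96, rounded to 8).
unknowns_per_cell = round(1.57e6 / modified_cells)
base_unknowns = base_cells * unknowns_per_cell

# Assumption: the 1-D split runs along the 224-cell direction.
slabs = slab_sizes(224, 16)
cells_per_slab = [n * 16 * 8 for n in slabs]

print(base_cells, unknowns_per_cell, base_unknowns)
print(slabs[0], cells_per_slab[0])
```

With 16 processors each slab holds 14 planes, so only two of a slab's plane faces touch a neighbouring processor regardless of the processor count.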

34 Single-Processor Execution Times (GPAS), Base Benchmark Problem
[Figure: execution time [sec] on one processor for Fuji with Pentium II 300 MHz, Rocky with Pentium II Xeon 400 MHz, PowerEdge 1550 with Pentium III 1.0 GHz, Ararat with Pentium III Xeon 1.0 GHz, TACC-Tejas with Pentium III 1.0 GHz, and Dell-PE2650 with Intel Xeon DP 2.4 GHz]

35 Multi-Processor Execution Times (GPAS), Base Benchmark Problem
[Figure: execution time (seconds) versus number of processors for Fuji, Rocky, Ararat, PE 1550, PE 2650, Tejas, and Longhorn]

36 Multi-Processor Speedups (GPAS), Base Benchmark Problem
[Figure: speedup versus number of processors for Fuji (FE), Rocky (FE), Ararat (FE), PE 1550 (FE), PE 2650 (FE), Tejas (My), and Longhorn (*), with the ideal line]
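
Speedup curves like these follow from execution times through the standard definitions: speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p, with the ideal line at S(p) = p. The Python sketch below applies them to hypothetical timings; the numbers are invented for illustration and are not measurements from this study.

```python
def speedup(times_by_procs):
    """S(p) = T(1) / T(p) for each processor count p."""
    t1 = times_by_procs[1]
    return {p: t1 / t for p, t in sorted(times_by_procs.items())}


def efficiency(times_by_procs):
    """E(p) = S(p) / p; 1.0 corresponds to the ideal line."""
    return {p: s / p for p, s in speedup(times_by_procs).items()}


# Hypothetical execution times in seconds, keyed by number of processors.
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}

for p, s in speedup(timings).items():
    print(f"{p:2d} processors: speedup {s:5.2f}, efficiency {s / p:5.2f}")
```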

37 Comparison of MPI-Interconnects (GPAS), Base Benchmark Problem
[Figure: speedup versus number of processors on DELL PE 2650 (single processor/node) for MPICH-GIGABIT, MPICH-GM - MYRINET, MPI/PRO-GIGABIT, and MPICH-FE, with the ideal line]

38 Constant Problem Size per Processor (GPAS)
[Figure: execution time [sec] for Fuji, Rocky, Ararat, and Tejas; x-axis "Grid Dimensions, Number of CPUs" from 1 to 32 CPUs, with surviving tick labels "38400, 2CPUs" and "76800, 4CPUs"]
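
This slide describes a constant-work-per-processor (weak scaling) test: the grid grows with the processor count, so ideal behaviour is a flat execution time. The two surviving tick labels imply 19,200 cells per CPU. The Python sketch below checks that ratio and computes a weak-scaling efficiency T(1) / T(p); the timings are hypothetical, since the plotted values are not preserved, and the function names are illustrative.

```python
def cells_per_cpu(total_cells, n_cpus):
    """Work assigned to each processor when the grid is divided evenly."""
    return total_cells / n_cpus


def weak_scaling_efficiency(times_by_procs):
    """T(1) / T(p) with fixed work per processor; 1.0 means no slowdown."""
    t1 = times_by_procs[1]
    return {p: t1 / t for p, t in sorted(times_by_procs.items())}


# Tick labels that survived on the slide: total cells keyed by CPU count.
ticks = {2: 38400, 4: 76800}
per_cpu = {p: cells_per_cpu(c, p) for p, c in ticks.items()}

# Hypothetical execution times in seconds (not from the study).
hypothetical_times = {1: 100.0, 2: 110.0, 4: 125.0}

print(per_cpu)
print(weak_scaling_efficiency(hypothetical_times))
```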

39 Modified Benchmark Problem
Compositional model
3-component Peng-Robinson EOS
Dry-gas cycling process
Reservoir size: 7.3 x 24.2 x.1 miles
Grids: 77 x 256 x 10 (197,120 cells)
Unknowns: 1.57 million
Anisotropic, layered permeability with Kv/Kh =
wells, 54 injectors, 24 producers, staggered line drive
Injectors and producers are completed fully
100 days of gas injection
One dimensional domain decomposition

40 Multi-Processor Execution Times (GPAS), Modified Benchmark Problem
[Figure: execution time (seconds) versus number of processors on DELL PE 2650 for GBit-SINGLE, My-SINGLE, FE-SINGLE, and My-DUAL]

41 Multi-Processor Speedups (GPAS), Modified Benchmark Problem
[Figure: speedup versus number of processors on DELL PE 2650 for GIGABIT-SINGLE, MYRINET-SINGLE, FAST ETH-SINGLE, and MYRINET-DUAL, with the ideal line]

42 Commercial Parallel Simulator

43 Remarks
Our goal was to run the simulators in parallel mode and evaluate their performance for typical cases.
Our goal was to analyze the different issues involved in using the simulators in parallel, and approaches to improved performance and design.
We did not:
tune the simulators for optimum performance
compare or match material balance errors of the simulator runs

44 Benchmark Problem for VIP
Compositional model
Modified SPE3 comparison project
9-component Peng-Robinson EOS
Gas condensate with gas cycling process
Reservoir size: 10 miles x 4 miles x 160 ft
Grids: 180 x 72 x 4 (51,840 cells)
1 million unknowns
Flow barriers present (using transmissibility modifiers)
20 wells: 10 injectors, 10 producers
10 years of cycling followed by 5 years of production

45 Multi-Processors Performance VIP

46 Multi-Processor Execution Times (VIP), Modified SPE3 Comparison Problem
[Figure: elapsed time (sec) versus number of processors for Fuji and Rocky]

47 Multi-Processor Speedups (VIP), Modified SPE3 Comparison Problem
[Figure: speedup versus number of processors for Fuji and Rocky, with the ideal line]

48 Constant Problem Size per Processor (VIP), Modified SPE3 Comparison Problem
[Figure: execution time [sec] for Fuji and Rocky; x-axis "Number of Cells, Number of CPUs" from 1 to 16 CPUs, with only the tick label "51480, 2CPUs" surviving]

49 Million Cell Commercial Benchmark Problem for VIP
IMPES scheme
7-component Peng-Robinson EOS
Grid: 100 x 100 x 100 (1 million cells)
16 million unknowns
Stochastically characterized data field
11 wells
49-year run

50 Performance Speedups - VIP, Million Gridblock Problem
[Figure: speedup versus number of processors on DELL PE for VIP (*), with the ideal line]

51 Summary
Tested GPAS and analyzed performance on new hardware.
Benchmarked performance of new clusters.
Compared performance of different interconnects and MPI libraries.
Tested commercial reservoir simulator VIP in parallel mode.

52 Acknowledgements
US Department of Energy
Reservoir Simulation Joint Industry Project Members
Dell Computer Corporation