Application of Grid-Enabled Technologies for Solving Optimization Problems in Data Driven Reservoir Systems


1 Application of Grid-Enabled Technologies for Solving Optimization Problems in Data Driven Reservoir Systems M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz, M.F. Wheeler

2 ITR Collaborators
- University of Chicago, CS: Stevens, Papka
- University of Maryland, CS: Sussman
- Ohio State, CS: Saltz, Kurc, Catalyurek
- Rutgers, ECE: Parashar
- MIT, Engineering: Haines
- UT Austin: Wheeler, Dawson, Peszynska, Klie, Bangerth (Computational and Applied Math); Sen, Stoffa, Seifoullaev (UTIG); Torres-Verdin (CPGE)

3 The Instrumented Oil Field
- Detect and track changes in data during production
- Invert data for reservoir properties
- Detect and track reservoir changes
- Assimilate data & reservoir properties into the evolving reservoir model
- Use simulation and optimization to guide future production, future data acquisition strategy

4 Assumptions: Production of oil and gas will take advantage of permanently installed geophysical sensors and down hole instrumentation that will monitor the reservoir's state as fluids are extracted. Knowledge of the reservoir's state during production will result in better engineering decisions to modify production techniques that optimize goals while maintaining safe operating conditions in environmentally complex and difficult areas.

5 Data Driven Model Optimization
- Dynamic Decision System: management decision; optimize economic revenue, environmental hazard, based on the present subsurface knowledge and numerical model
- Dynamic Data-Driven Assimilation: subsurface characterization; acquire remote sensing data; data assimilation; improve knowledge of subsurface to reduce uncertainty; improve numerical model; update knowledge of model
- Experimental design (START): plan optimal data acquisition
- Supporting layers: Processing Middleware, Autonomic Grid Middleware, Grid Data Management

6 DDDSF Requires Multi-petabyte Virtual Data Archive
Ohio Supercomputing Center Mass Storage Testbed
[Diagram: MetaData Servers; SAN Volume Controller (4 servers); Cisco Directors 9509; FAStT900 Core Storage Pool (35/50 TB) with SAN.FS; FAStT600 Turbo Scratch / Archive Storage Pool (310/420 TB); LinTel boxes (PvFS / Active Disk Archive); Backup Storage 3584 Tape (1 L32, 2 D32); links labeled 384 MB/s, 772 MB/s, 890 MB/s and 10 GB/s throughput]
- IBM's Storage Tank technology combined with TFN connections will allow large data sets to be seamlessly moved throughout the state with increased redundancy and seamless delivery.
- 50 TB of performance storage: home directories, project storage space, and long-term frequently accessed files.
- 420 TB of performance/capacity storage: Active Disk Cache for compute jobs that require directly connected storage, parallel file systems, and scratch space; large temporary holding area.
- 128 TB tape library: backups and long-term "offline" storage; 4 drives, max drive data rate is 35 MB/s.

7 A new generation of IPARS: Optimizing oil production on the Grid
[Diagram labels: static data; dynamic data; clients; visualization; data management/assimilation; steering; monitoring; objective function; collaboration]

8 Optimization with a Known Oil Reservoir Model
- f: objective function
- α: control variables in feasibility set A
- c: model data
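The slide's formula did not survive the transcription. Using only the symbols defined on this slide, one plausible way to write the problem is shown below. This is an assumption rather than the authors' exact statement; in particular, maximization is chosen only because slide 23 speaks of maximizing economic value.

```latex
\alpha^{*} \;=\; \arg\max_{\alpha \in A} \; f(\alpha;\, c)
```

Here the model data c is treated as known and fixed, so the search runs over the control variables α alone.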

9 Interplay between Data Acquisition, Data Assimilation and Optimization
- Model c as stochastic
- E: expectation for PDF of c
- A posteriori PDF computed to describe current subsurface knowledge
- Optimization seeks best production strategy
- Control variables α parameterize production and data acquisition strategy
- Good choice of α optimizes production and improves model certainty
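A minimal sketch of optimizing an expectation over uncertain model data c, assuming a sample-average (Monte Carlo) estimate of E. The objective `revenue`, the Gaussian distribution for c, and the grid of candidate controls are all hypothetical stand-ins for illustration; none of them comes from the slides.

```python
import random


def revenue(alpha, c):
    """Hypothetical objective f(alpha; c): best when the control matches c."""
    return -(alpha - c) ** 2


def expected_objective(alpha, samples):
    """Sample-average estimate of E[f(alpha; c)] over draws of c."""
    return sum(revenue(alpha, c) for c in samples) / len(samples)


def best_control(candidates, samples):
    """Pick the candidate control with the highest estimated expectation."""
    return max(candidates, key=lambda a: expected_objective(a, samples))


# Assumed a posteriori PDF of c: Gaussian with mean 2.0 and std 0.5.
rng = random.Random(0)
samples = [rng.gauss(2.0, 0.5) for _ in range(1000)]

# Assumed feasibility set A: a coarse grid on [0, 4].
candidates = [i * 0.1 for i in range(41)]
alpha_star = best_control(candidates, samples)
```

For this quadratic stand-in, the selected control is the grid point nearest the sample mean of c. A sharper a posteriori PDF (less spread in the samples) would raise the attainable expected value, which is the link between data assimilation and optimization that the slide describes.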

10 Parallel/Grid Computing Tools
- The Multiblock Adaptive Computational Engine (MACE) for solving heterogeneous domain applications
  - Adaptive grid blocks
  - Automatic and transparent scheduling, load balancing
  - Distributed Shared Objects: distributed dynamic arrays
- DataCutter/STORM: Middleware for On-Demand Data Product Generation for Large Archival Scientific Datasets in a Grid Environment
  - Exploration and analysis of scientific datasets in distributed and heterogeneous environments
  - Represents components of a data-intensive application as a set of filters
  - Data virtualization for heterogeneous collections of data formats, storage systems
- Discover: Grid Computational Collaboratory enabling seamless and secure access to and interactions between users, applications, services, data and resources
  - P2P Grid Middleware: services, autonomic composition, secure access
  - Collaborative Portals

11 Scalability of IPARS and geomechanical coupling
[Plot: parallel efficiency versus number of processors]
- Domain: [dimension lost] by [dimension lost] by 1059 feet
- 513 by 513 by 45 mesh points
- 282 nodes of dual-processor Dell PowerEdge [model and clock speed lost] GHz computers, interconnected by a Myrinet 2000 with a point-to-point bandwidth of 2 Gb/sec
- Each node has 2 GB of memory

12 Data Middleware Services
- Filter-stream based distributed execution middleware (DataCutter, STORM)
- Grid based data virtualization, data management, query, on demand data product generation (STORM, Active ProxyG, Mako)
- Distributed metadata management (Mobius Global Model Exchange)
- Track metadata associated with workflows, input image datasets, checkpointed intermediate results

13 Processing Remotely-Sensed Data
NOAA Tiros-N w/ AVHRR sensor
AVHRR Level 1 Data
- As the TIROS-N satellite orbits, the Advanced Very High Resolution Radiometer (AVHRR) sensor scans perpendicular to the satellite's track.
- At regular intervals along a scan line, measurements are gathered to form an instantaneous field of view (IFOV).
- Scan lines are aggregated into Level 1 data sets.
- A single file of Global Area Coverage (GAC) data represents: ~one full earth orbit; ~110 minutes; ~40 megabytes; ~15,000 scan lines.
- One scan line is 409 IFOVs.
Data Middleware Services and Very Large Scale Distributed Data Applications
- Satellite Data Processing
- Digital Pathology
- Managing Oilfields, Contaminant Transport
- DCE-MRI Analysis
- Derivation of macroscopic materials properties from MD simulations

14 DataCutter
- Flow control between components
- Schedulers place filters on grid processors (scheduler API)
- Parallel stream based communication
- Data aggregation implemented as a component
- Filters placed near data sources
- NPACkage, NMI
[Diagram: Combined Data/Task Parallelism; copies of filters R, E, Ra and M placed on hosts across Cluster 1, Cluster 2 and Cluster 3]
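The filter-stream idea on this slide can be illustrated with a small single-process sketch. This is not the DataCutter API: the filter names, the chunk format (plain lists of numbers), and the use of Python generators as streams are assumptions made only to show how a read filter, a filtering step, and an aggregation component chain together.

```python
def read_filter(chunks):
    """Source filter: emits data chunks onto its output stream."""
    for chunk in chunks:
        yield chunk


def extract_filter(stream, threshold):
    """Intermediate filter: keeps only values above the threshold,
    shrinking the data before it moves further downstream."""
    for chunk in stream:
        yield [value for value in chunk if value > threshold]


def aggregate_filter(stream):
    """Sink filter: aggregation implemented as its own component.
    Returns the mean of all surviving values, or None if there are none."""
    total, count = 0.0, 0
    for chunk in stream:
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else None


# Hypothetical chunks, e.g. one per storage node.
chunks = [[0.2, 0.9, 0.8], [0.1, 0.75], [0.95]]
result = aggregate_filter(extract_filter(read_filter(chunks), threshold=0.7))
```

In a distributed setting the point of the design is placement: running the selective `extract_filter` step near the data source means only the reduced stream crosses the network.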

15 Automatic Data Virtualization
- Scientific and engineering applications require interactive exploration and analysis of datasets.
- Application developers generally prefer storing data in files.
- Support high level queries on multi-dimensional distributed datasets
- Many possible data abstractions, query interfaces
  - Grid virtualized object relational database or XML database
  - Grid virtualized objects with user defined methods invoked to access and process data
  - A virtual relational table view
[Diagram: Data Service; Data Virtualization; large distributed scientific datasets]

16 Our Approach
- Automatic data virtualization
  - Friendly front-end: support a basic SQL Select query with a virtual relational table view or a virtual XML database view
  - A lightweight layer on top of datasets
- STORM runtime middleware
  - STORM carries out query execution, query planning
- Compiler front end customizes runtime support
  - Automatic customization and configuration of runtime query support middleware

17 STORM Query Planning

18 STORM Query Execution

19 Compiler Customization support for Select query
Query template:
SELECT < Data Elements >
FROM < Dataset Name >
WHERE < Expression > AND Filter( < Data Element > );
Example query:
SELECT *
FROM IPARS
WHERE REL in (0,6,26,27) AND TIME>1000 AND TIME<1100 AND SOIL>0.7 AND SPEED(OILVX, OILVY, OILVZ)<30.0;
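To make the semantics of the example query concrete, here is a small sketch that applies the same WHERE clause to in-memory rows. It is not STORM code. The row layout (one dictionary per grid element and time step) is hypothetical, and SPEED is assumed to be the Euclidean magnitude of the oil velocity vector, which the slide does not state.

```python
import math


def speed(vx, vy, vz):
    """Assumed meaning of the user-defined SPEED filter: vector magnitude."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)


def select_rows(rows):
    """Yield rows that satisfy the example query's WHERE clause."""
    for row in rows:
        if (row["REL"] in (0, 6, 26, 27)
                and 1000 < row["TIME"] < 1100
                and row["SOIL"] > 0.7
                and speed(row["OILVX"], row["OILVY"], row["OILVZ"]) < 30.0):
            yield row


# Hypothetical rows of the virtual relational table.
rows = [
    # Satisfies every predicate (speed is about 17.3).
    {"REL": 6, "TIME": 1050, "SOIL": 0.8, "OILVX": 10.0, "OILVY": 10.0, "OILVZ": 10.0},
    # Rejected: speed is about 42.4.
    {"REL": 6, "TIME": 1050, "SOIL": 0.8, "OILVX": 30.0, "OILVY": 30.0, "OILVZ": 0.0},
    # Rejected: realization 3 is not in the REL list.
    {"REL": 3, "TIME": 1050, "SOIL": 0.9, "OILVX": 1.0, "OILVY": 1.0, "OILVZ": 1.0},
    # Rejected: TIME is outside (1000, 1100).
    {"REL": 0, "TIME": 1200, "SOIL": 0.9, "OILVX": 1.0, "OILVY": 1.0, "OILVZ": 1.0},
]
matches = list(select_rows(rows))
```

The range predicates on REL and TIME can be resolved from metadata about which files hold which realizations and time steps, whereas the SPEED filter has to be evaluated on the data values themselves.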

20 Analysis of Oil Reservoir Simulation Data: Prototype Implementation
- Evaluate geologic uncertainty and production strategies simultaneously
  - Multiple realizations of multiple geostatistical models
  - Multiple production strategies (number, location of wells)
- Dataset size = ~5 TB
  - 500 simulations, selected from several geostatistics models and well patterns
  - Each simulation is ~10 GB: 2,000 time steps, 65K grid elements, 8 scalars + 3 vectors = 17 variables
- Stored at
  - SDSC: HPSS and 30 TB Storage Area Network System
  - UMD: 9 TB disks on 50 nodes: PIII-650, 128 MB, Switched Ethernet
  - OSU: 7.2 TB disks on 24 nodes: PIII-900, 512 MB, Switched Ethernet
- Data analysis
  - Economic model assessment
  - Bypassed oil regions
  - Representative realization selection for more simulations
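The sizes quoted on this slide can be cross-checked with a few lines of arithmetic. Two inputs are assumptions that the slide does not state: "65K" is read as 65,000 grid elements, and each stored value is taken to be a 4-byte single-precision number.

```python
# Figures taken from the slide.
n_scalars = 8
n_vectors = 3
time_steps = 2_000
n_simulations = 500
quoted_gb_per_simulation = 10

# Assumptions, not from the slide.
grid_elements = 65_000      # "65K" read as 65,000
bytes_per_value = 4         # single precision

# Each vector contributes three components.
variables = n_scalars + 3 * n_vectors

# Raw payload of one simulation under the assumptions above.
bytes_per_simulation = time_steps * grid_elements * variables * bytes_per_value
gb_per_simulation = bytes_per_simulation / 1e9

# Total archive size from the quoted per-simulation figure.
total_tb = n_simulations * quoted_gb_per_simulation / 1000
```

Under these assumptions one simulation holds about 8.8 GB of raw values, which is consistent with the quoted ~10 GB once file overheads are allowed for, and 500 simulations at ~10 GB each give the quoted ~5 TB.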

21 Seismic Data Analysis
STORM: On Demand Processing of 1.5 TB Seismic Dataset
[Diagram labels: survey #; line #; Sp (or CDP) # & source position; traces; array #; receiver group # & receiver group position; component #]

22 DISCOVER: A Grid Computational Collaboratory enabling seamless and secure access to and interactions between users, applications, services, data and resources
- P2P Grid Middleware (PAWN, DISCOVER-COG): peer services (discovery, routing, message publication, notification, event), context-aware access control, P2P deductive engines.
- Autonomic and Interactive Components (DIOS, AUTOMATE): components encapsulate sensors, actuators, policies and rules. A distributed control network connects sensors, actuators and interaction agents. The P2P deductive shell, control network, rules and policies enable autonomic composition, configuration, interaction, protection, optimization and adaptation.
- Collaborative Portals: pervasive (secure) access, monitoring, interaction and control.
[Diagram: users (scientist, laptop computer, PDA) reach DISCOVER portals, applications and services, and resources (CPUs, storage, instruments, data archives, sensors, non-traditional data sources) through discovery points on the P2P Grid Middleware]

23 Autonomic Oil Well Placement (UT-CSM, UT-IG)
- Optimization services: VFSA (Very Fast Simulated Annealing), SPSA (Simultaneous Perturbation Stochastic Approximation)
- IPARS delivers fast-forward model (guess -> objective function value), post-processing
- Formulate a parameter space: well position and pressure (y,z,P)
- Formulate an objective function: maximize economic value Eval(y,z,P)(T)

24 Autonomic Oil Reservoir Optimization using Decentralized Services

25 Components of the AORO Application
- IPARS: Integrated Parallel Accurate Reservoir Simulator
  - Parallel reservoir simulation framework
- IPARS Factory
  - Configures instances of IPARS simulations
  - Deploys them on resources on the Grid
  - Manages their execution
- VFSA/SPSA Optimization Services
  - Optimize the placement of wells and the inputs (pressure, temperature) to IPARS simulations
- Economic Modeling Service
  - Uses IPARS simulation outputs and current market parameters (oil prices, costs, etc.) to compute estimated revenues for a particular reservoir configuration
- DISCOVER Computational Collaboratory
  - Interaction & Collaboration
  - Distributed Interactive Object Substrate (DIOS)
  - Collaborative Portals
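As an illustration of what an economic modeling step might compute, the sketch below turns a production schedule and market parameters into a discounted revenue estimate. The function name, its parameters, and the discounting formula are all hypothetical; the slides do not describe the actual economic model.

```python
def estimated_revenue(oil_rates, oil_price, operating_cost, discount_rate):
    """Discounted net revenue for a hypothetical production schedule.

    oil_rates[t] is the oil produced in period t (e.g. barrels), as a
    simulator might report it; operating_cost is charged once per period.
    """
    total = 0.0
    for period, barrels in enumerate(oil_rates):
        net = barrels * oil_price - operating_cost
        total += net / (1.0 + discount_rate) ** period
    return total


# Two periods of 100 barrels at a price of 50 and a cost of 1000 per period.
undiscounted = estimated_revenue([100, 100], 50.0, 1000.0, 0.0)
discounted = estimated_revenue([100, 100], 50.0, 1000.0, 1.0)
```

An optimizer such as VFSA or SPSA would treat a function like this, evaluated on simulator output, as the objective to maximize.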

26 Autonomic Oil Well Placement (VFSA)
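A minimal sketch of Very Fast Simulated Annealing, assuming the commonly used form with a temperature schedule T_k = T0 * exp(-decay * k^(1/D)) and a temperature-dependent proposal distribution. It is a simplification: a single temperature drives both proposal and acceptance, and the objective is a toy quadratic rather than an IPARS run. The parameter names and the test function are hypothetical.

```python
import math
import random


def vfsa_minimize(f, lower, upper, t0=1.0, decay=1.0, n_iter=2000, seed=0):
    """Minimize f over the box [lower, upper] with a simplified VFSA."""
    rng = random.Random(seed)
    dim = len(lower)
    x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    fx = f(x)
    best_x, best_f = list(x), fx
    for k in range(1, n_iter + 1):
        # Fast cooling schedule, floored to avoid division by zero.
        temp = max(t0 * math.exp(-decay * k ** (1.0 / dim)), 1e-12)
        candidate = []
        for xi, lo, hi in zip(x, lower, upper):
            while True:
                u = rng.random()
                # Proposal step in [-1, 1]; small steps dominate as temp falls.
                y = math.copysign(1.0, u - 0.5) * temp * (
                    (1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)
                x_new = xi + y * (hi - lo)
                if lo <= x_new <= hi:  # redraw until inside the bounds
                    break
            candidate.append(x_new)
        fc = f(candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def toy_cost(p):
    """Hypothetical cost with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best_x, best_f = vfsa_minimize(toy_cost, [-10.0, -10.0], [10.0, 10.0])
```

To maximize an economic value instead, the same routine can be applied to the negated objective.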

27 Autonomic Oil Well Placement (SPSA) Permeability field showing the positioning of current wells. The symbols * and + indicate injection and producer wells, respectively. Search space response surface: Expected revenue - f(p) for all possible well locations p. White marks indicate optimal well locations found by SPSA for 7 different starting points of the algorithm.
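A minimal sketch of SPSA used for maximization, assuming the standard gain sequences a_k = a / (k + 1 + A)^0.602 and c_k = c / (k + 1)^0.101 with random ±1 perturbations. The revenue surface is a hypothetical smooth function standing in for f(p); the gain constants are illustrative, not the values used by the authors.

```python
import random


def spsa_maximize(f, x0, n_iter=500, a=0.5, c=0.1, big_a=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Maximize f from starting point x0 with two evaluations per iteration."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(n_iter):
        a_k = a / (k + 1 + big_a) ** alpha   # step-size gain
        c_k = c / (k + 1) ** gamma           # perturbation size
        # Perturb every coordinate at once with random signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + c_k * d for xi, d in zip(x, delta)]
        x_minus = [xi - c_k * d for xi, d in zip(x, delta)]
        diff = f(x_plus) - f(x_minus)
        # Gradient estimate per coordinate, then an ascent step.
        x = [xi + a_k * diff / (2.0 * c_k * d) for xi, d in zip(x, delta)]
    return x


def toy_revenue(p):
    """Hypothetical expected-revenue surface with its peak at (3, -1)."""
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)


p_opt = spsa_maximize(toy_revenue, [0.0, 0.0])
```

Each iteration costs two objective evaluations regardless of the number of parameters, which matters when every evaluation is a full reservoir simulation. Restarting from several starting points, as the slide reports doing with 7 of them, is a common way to probe for multiple local optima.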

28 The Future
- Scaling up
  - High resolution IPARS simulations
  - Multi-petabyte distributed archives of model data
  - Exploitation of OSC and TeraGrid resources (large TeraGrid allocation approved)
  - Large scale demonstration of Discover/STORM/DataCutter integration
- Experimental testbeds
  - EPA/INEEL collaboration: live sensor data from Superfund site
  - NSF Center for Subsurface Sensing and Imaging Systems
  - Data from industrial affiliates
- New numerical methods
  - Next generation accurate, multi-scale coupled chemical, fluid, geomechanical and geophysical simulator
  - Large scale global optimization module to drive decision making