Tackling the big data challenges in E&P
Dr Duncan Irving, EMEA Oil and Gas Practice Lead
What if you could
  perform all E&P analytical activities through a web browser?
  work collaboratively on a single instance of the data?
  guide the science rather than drive the PC?
[Figure labels: Seismic Volumes; Reservoir Models; Well Sensors]
What if you could also
  guarantee data custodianship based on standards?
  know who knew what, and when?
What is inhibiting this?
Goals: perform all E&P analytical activities through a web browser; work collaboratively on a single instance of the data; guide the science rather than drive the PC; guarantee standards-based data custodianship; know who knew what, and when.
Inhibitors: browser capabilities; network bandwidth; data access; data volumes; compute loads; poor data model/structures; 70% of time managing data; analytical compartmentalisation; application-centric view; file/transfer formats rule; no community ownership; no granularity; temporally-enabled data governance.
What is out there to help?
Set against the same goals and inhibitors, the technologies on offer are: RDBMS; High Performance XXXX; Big Data Solution; appliances; Data Warehouse.
Why can't I make sense of it all?
  I am old.
  I am an enterprise architect, not a web programmer!
  I am a web programmer, not an enterprise architect!
  I see red and twitch involuntarily whenever I hear "Big Data".
  I have all of these things and I'm still confused.
Even with all these pieces, no one has yet managed to bring all necessary data to bear on business questions in a useful timeframe.
[Pieces shown: RDBMS; High Performance XXXX; Big Data Solution; appliances; Data Warehouse]
Am I building it properly? (A lesson from history)
[Figure: log-scale chart, 10^3 to 10^15, over 1980 to 2010. Series: Subsurface volumes (bytes); Disk Capacity (bytes); Interconnect Speed (bps); Network Speed (bps); Data Transfer (bps); Transistors on a CPU.]
Am I building it properly?
[Figure: Subsurface volumes (bytes) against Data Transfer (bps).]
Questioning power is not keeping up with the potential value of the answer.
Application-centric approach???
The quest to integrate decision making led to: application service provision; collaborative visualisation; virtualized workstations; proprietary file formats.
No long-term stewardship of: file formats; decision milestones; data dependencies; information management.
Analytical Compartments v Data Flow
[Diagram. Data flow labels: Data; Information; Knowledge; Insight; Decision; Action. Analytical compartments: Data Retention; Knowledge Discovery; Decision Support. Data sources: Wells; Borehole Sensors; Subsurface Models; Seismic.]
E&P Data Usage Modes
[Figure: bar chart of Subsurface Data Volumes (Petabytes), axis 0 to 50, by company: Shell, BP, Aramco, Qatar O&G, Statoil, Total, Repsol, Gazprom, Maersk, Centrica, ENI, GDF-Suez, British Gas, NPD, DONG, KOC, Lukoil, OMV, RWE, Gupco. Legend: Could be "Operationalized"; Could be used in "Knowledge Discovery"; Data of low business value. Compartment labels: Data Retention; Knowledge Discovery; Decision Support.]
Caveat: These numbers are half-baked estimates. Error is ~50%.
The E&P analytical landscape
[Figure: Data Volume / yr (Gigabytes, Terabytes, Petabytes) against Data Latency (hrs, days, months, years). Regions plotted: Exploration; Production; Refining; Distribution; Logistics; Core and Borehole.]
The E&P analytical landscape
Many other industries are curing their big data problems: analytical integration; massive data volumes; mixed workloads; query concurrency.
[Figure regions: Exploration; Production; Refining; Distribution; Logistics]
Approximate volumes in Terabytes:
  Derived and Duplicated Seismic   5000
  Exploration Seismic              5000
  Other                             660
  Production Seismic               2000
  Geological Interpretation         100
  Reservoir Models                 1000
  Production Sensor                 500
  Asset and Logistics               500
  Trading                            50
  Retail and Marketing               10
  ERP                               100
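To make the balance of these volumes concrete, the short sketch below totals the slide's approximate figures and works out the share that is seismic. The figures are copied from the slide; the dictionary name and the helper function are illustrative only, and the result inherits the roughness of the estimates.

```python
# Approximate data volumes in terabytes, copied from the slide.
VOLUMES_TB = {
    "Derived and Duplicated Seismic": 5000,
    "Exploration Seismic": 5000,
    "Other": 660,
    "Production Seismic": 2000,
    "Geological Interpretation": 100,
    "Reservoir Models": 1000,
    "Production Sensor": 500,
    "Asset and Logistics": 500,
    "Trading": 50,
    "Retail and Marketing": 10,
    "ERP": 100,
}


def volume_matching(volumes, keyword):
    """Sum the volumes of every category whose name contains `keyword`."""
    return sum(tb for name, tb in volumes.items() if keyword in name)


total_tb = sum(VOLUMES_TB.values())
seismic_tb = volume_matching(VOLUMES_TB, "Seismic")
seismic_share = 100 * seismic_tb / total_tb

if __name__ == "__main__":
    print(f"Total: {total_tb} TB")
    print(f"Seismic: {seismic_tb} TB ({seismic_share:.1f}% of total)")
```

On these numbers seismic data accounts for roughly four fifths of the total, which is why the exploration and production end of the landscape dominates the data volume axis.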
Integrated Operations Analytics
Wellfield Data Warehousing in a supermajor
How to do a rapid comparison between new data and everything ever seen, and make rapid decisions on it.
[Compartment labels: Data Retention; Knowledge Discovery; Decision Support]
Current approach:
  Carry out a $100M+ seismic survey every three years to re-image the producing reservoir.
  Spend 2-3 months reprocessing the data and reincorporating it into the workflow.
  Hope geoscientists and engineers can control reservoir flows at the weekly scale based on imaging from the year+ timeframe.
Ideal approach (shortened timeframe):
  Install a seafloor seismic imaging array and stimulate it with in-reservoir tectonic events and supply-vessel based airguns.
  Spend 2-3 days reprocessing the data and reincorporating it into the workflow.
  Allow geoscientists and engineers to respond to HSE and production issues and to see how the reservoir is evolving in a right-time manner!
Best of Breed Big Data Architecture
Analytical compartments: Data Retention; Knowledge Discovery; Decision Support.
Workloads (fed via ETL): Seismic Imaging; Seismic Modelling; Reservoir Characterisation; Reservoir Monitoring; Reservoir Modelling; Operations.
MR/Hadoop (~5 concurrent users): Fast Loading; Data Assimilation; Online Archival.
SQL-MR (~25 concurrent users): Decision Support; Data Discovery; Knowledge Generation; Pattern Detection.
Data Warehouse (~100+ concurrent users): Decision Support; Data Dependencies; Predictive Analytics.
MapReduce/Hadoop
Is not a database: HDFS; no schema, indexes, optimizer; no high availability, security; not high performance.
Not a data warehouse: no integrated data, no history; severe data skew.
Not mature technology: early open source; a few ISV tools integrating; single points of failure.
Not a cloud technology, per se.
Uses:
  Seismic processing: trace sorting; 1D/2D filtering and transformation.
  Online seismic archiving.
  Repurposing WITSML and other sensor feeds.
It is a great place to get to know your data.
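Trace sorting is a natural fit for the map, shuffle and reduce pattern, and the idea can be sketched in plain Python without any Hadoop dependency. This is an illustration of the pattern only, not Hadoop code: the trace fields `cmp` (common midpoint) and `offset`, and all function names, are assumptions made for the example.

```python
from collections import defaultdict


def map_trace(trace):
    """Map step: emit (gather key, trace) so traces sharing a CMP meet up."""
    return trace["cmp"], trace


def shuffle(pairs):
    """Shuffle step: group the mapped values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_gather(traces):
    """Reduce step: order the traces of one gather by source-receiver offset."""
    return sorted(traces, key=lambda t: t["offset"])


def sort_traces(traces):
    """Run map, shuffle and reduce in-process; return {cmp: sorted traces}."""
    groups = shuffle(map_trace(t) for t in traces)
    return {key: reduce_gather(groups[key]) for key in sorted(groups)}


if __name__ == "__main__":
    # Hypothetical trace headers; real traces would also carry sample data.
    survey = [
        {"cmp": 2, "offset": 300},
        {"cmp": 1, "offset": 200},
        {"cmp": 2, "offset": 100},
        {"cmp": 1, "offset": 100},
    ]
    for cmp_id, gather in sort_traces(survey).items():
        print(cmp_id, [t["offset"] for t in gather])
```

In a real cluster the shuffle is done by the framework and each reduce call runs on a different node, so one very large gather key can overload a single reducer. That is one way the data skew mentioned on the slide can arise.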
SQL-MapReduce
Is a database: MPP database; schema, indexes, optimizer; security.
Is high performance.
Not a data warehouse: no integrated data, but it can talk to a DW via SQL.
Not mature technology: a few ISV tools integrating; single points of failure.
Can enable AaaS.
Uses:
  Pattern matching: feature extraction; spot the difference.
  Statistical investigations: clustering; likelihoods; associations.
  Fail-fast hypothesis reduction: seismic modelling; reservoir modelling frameworks.
How do I describe a pattern in SQL-MR?
Simple partitioning: in-trace (vertical) analytics and adjacent-trace analytics.
Broader pattern matching: how do I state spatial relationships and mesoscale textures? Easy with SQL-MR!
How do I find everything that looks like a channel in my 10 Tb 3D (inc. pre-stack) image volume?
[Figure: 2D profile through a 3D volume showing a cross-section across a filled channel.]
The Power of SQL-MR: Java MR functions perform the standard textural descriptions (reflectivity, variance, etc.) and SQL asks the questions "above", "inside", "below".
1. Capping boundary: strong, broadly convex reflection (npath)
2. Basal boundary: very strong, grossly concave reflection with local minima (npath)
3. Overburden facies: laterally moderately continuous with high amplitude stratigraphy (npath)
4. Incised facies: laterally continuous (low variance) and high amplitude stratigraphy (npath)
5. Internal facies: moderate variance, poor lateral continuity and steeply-dipping beds (statistical description)
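The "above, inside, below" questions amount to matching an ordered sequence of labelled samples down each trace. The sketch below is a plain Python stand-in for that idea, not SQL-MR or nPath code. It assumes an earlier step has already classified every sample into one of a few facies labels. The label names, the single-letter encoding and the simplified pattern (overburden, then cap, then channel fill, then base) are all assumptions made for illustration.

```python
import re

# Hypothetical single-letter codes for per-sample facies labels.
SYMBOLS = {
    "overburden": "O",
    "cap": "C",
    "incised": "I",
    "base": "B",
    "other": "x",
}

# Reading a trace top to bottom: overburden above, a capping boundary,
# one or more samples of channel fill inside, a basal boundary below.
CHANNEL = re.compile(r"O+CI+B")


def find_channel(labels):
    """Return (start, end) sample indices of the first match, else None."""
    encoded = "".join(SYMBOLS[label] for label in labels)
    match = CHANNEL.search(encoded)
    return (match.start(), match.end()) if match else None


def channel_traces(volume):
    """Map trace id to match span for every trace that shows the pattern."""
    hits = {}
    for trace_id, labels in volume.items():
        span = find_channel(labels)
        if span is not None:
            hits[trace_id] = span
    return hits


if __name__ == "__main__":
    volume = {
        "trace-001": ["other", "overburden", "overburden", "cap",
                      "incised", "incised", "base", "other"],
        "trace-002": ["overburden", "base", "cap"],
    }
    print(channel_traces(volume))
```

Because each trace is matched independently, the work partitions cleanly by trace, which is the "simple partitioning" case above. Adjacent-trace and lateral continuity checks would need a second pass over neighbouring matches.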
The Active Data Warehouse in E&P
Operationalised DSS. Master store for all data: data is stored in a manner that allows it to be useful.
Features: fully integrated; highly relational; fully decomposed; ACID compliant; enterprise grade.
Current problems: lack of integration; performance barriers; poor scaling to large data volumes and higher complexity; cannot provide answers to big strategic questions.
Remove scaling and complexity barriers.
Establish workflows to integrated operations.
Integrate subsurface insights with asset data and maintenance actions.
Link production activities reporting via real time analytics to other areas of the business.
Integrated Single Instance + Logical Data Models = Datamining while Drilling
[Diagram labels: Sensor Data; Decision Control Systems; Asset Data; Asset Management; Logging; LWD/LWP; Permanent Seismic Monitoring; Fracturation Models and regimes; Hydrocarbon Accounting]
Enabling the E&P cloud
Workflows: Seismic Archiving; Seismic Imaging; Reservoir Characterisation; Reservoir Modelling; Reservoir Monitoring; Asset Management; Integrated Operations; Hydrocarbon Accounting.
End-to-end workflow management:
  guarantee data custodianship based on standards
  know who knew what, and when
  data dependency management
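One way to read "know who knew what, and when" is as an append-only record of decisions that can be queried as of any past moment. The sketch below illustrates that reading with an in-memory log. It is not taken from the presentation: the class names, the fields and the example item name are hypothetical, and a production system would keep this history in the warehouse itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    item: str              # what the entry is about, e.g. a horizon pick
    value: str             # what was believed about it
    author: str            # who recorded it
    recorded_at: datetime  # when it became known


class AuditLog:
    """Append-only log: entries are never updated or deleted."""

    def __init__(self):
        self._records = []

    def record(self, item, value, author, recorded_at):
        self._records.append(Record(item, value, author, recorded_at))

    def known_as_of(self, item, when):
        """Return the latest record for `item` made at or before `when`."""
        candidates = [
            r for r in self._records
            if r.item == item and r.recorded_at <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.recorded_at)


if __name__ == "__main__":
    log = AuditLog()
    log.record("top-reservoir", "2150 m", "geoscientist A", datetime(2011, 3, 1))
    log.record("top-reservoir", "2172 m", "geoscientist B", datetime(2011, 9, 15))
    print(log.known_as_of("top-reservoir", datetime(2011, 6, 1)))
```

Keeping superseded entries rather than overwriting them is what allows a later review to reconstruct the state of knowledge behind a past decision.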