Big Data Complexities for Scientific Computing in the Oil and Gas Industry




1 Big Data Complexities for Scientific Computing in the Oil and Gas Industry
nosql, SQL, and mo SQL
http://www.limitpoint.com/images/publications/bigdatainoilandgas.pdf
David M. Butler, President, Limit Point Systems, Inc.

2 Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

3 The oil and gas business
[Figure adapted from [Krebbers]]
- Upstream is exploration and production ("E&P") (upper left)
- Downstream is transportation, refining, and marketing (lower right)

4 Major Acquired Upstream Data Types
- Time lapse raw seismic
- Time lapse prestack seismic image
- Time lapse poststack seismic image
- Well logs
- Production monitoring
- dozens of other data types

5 Time lapse raw seismic data
- each sensor gives amplitude as a function of time
- ~10K sensors, moving towards ~1M
- ~10K shots
- ~5K samples/shot
- ~4-12 bytes/sample
- time lapse: repeat ~2/year, ~10 years
- ~10 TB/project * ~100 projects/year/major company ~ 1PB/year/major
[Figure from [KrisEnergy]]
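
The volume estimate on this slide can be checked with a few lines of arithmetic. The sketch below multiplies the slide's order-of-magnitude factors. It assumes (the slide does not say so explicitly) that "~5K samples/shot" means samples recorded per sensor per shot; the function and variable names are illustrative only.

```python
# Back-of-envelope check of the raw seismic volumes quoted on the slide.
# Assumption: "~5K samples/shot" is read as samples per sensor per shot.

TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def survey_bytes(sensors, shots, samples_per_trace, bytes_per_sample):
    """Size of one survey: one trace per (sensor, shot) pair."""
    return sensors * shots * samples_per_trace * bytes_per_sample


# Slide figures: ~10K sensors, ~10K shots, ~5K samples, ~4-12 bytes/sample.
low = survey_bytes(10_000, 10_000, 5_000, 4)
high = survey_bytes(10_000, 10_000, 5_000, 12)

# The slide rounds a project to ~10 TB and assumes ~100 projects/year/major.
yearly = 10 * TB * 100

print(f"one survey: {low / TB:.0f} to {high / TB:.0f} TB")
print(f"per major per year: {yearly / PB:.0f} PB")
```

Under these assumptions a single survey comes to a few terabytes, the same order of magnitude as the slide's ~10 TB/project, and 100 such projects give the quoted ~1 PB/year.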

6 Time lapse prestack seismic image data
- clean up seismic data
  - remove noise
  - remove artifacts
  - other signal processing operations
- migrate data
  - focus signal energy
  - convert time to position
- up to 5D array of data
  - reflectivity as a function of
    - 3D position
    - source-sensor 2D offset
- ~same size as raw seismic

7 Poststack seismic image data
- stack of prestack data
  - aggregate over 1 or more array indices
  - reduces size ~100x
- 2D or 3D image
  - reflectivity as function of position
  - similar to medical ultrasound image
- interpret to produce model of subsurface
[Figure from [epmag 1]]

8 Well logs
- lower sensor package into well
- measure various properties as a function of depth
  - ~10k samples
  - ~1k components
    - simple numbers
    - bore hole images
    - others
- typically done once before production starts
- ~100MB/well * ~1K wells/year/major ~ 100GB/year/major
[Figure from [decogeo]]

9 Production monitoring
- Classical methods at well head
  - flow volumes
  - gas/oil/water composition
  - temperature
  - pressure
- Distributed sensing methods
  - fiber optic cables in well
  - acoustic sensing
  - temperature sensing
  - ~1000 equivalent discrete sensors
  - ~1k samples/sec
  - continuous monitoring
  - ~10-100GB/day/well
  - function of time and position along well path
- ~1K wells (growing rapidly)
- ~1PB/year/major
[Figures from [epmag 2] and [slb 1]]

10 Major interpreted/modeled data types
- Geological structure model
- Velocity model
- Basin model
- Reservoir models
  - geological
  - quantitative
  - engineering
- Geomechanical model
- dozens of other data types

11 Geological structure model
- geologist interprets seismic image
  - identifies surfaces defining rock strata and faults
  - very complex networks of intersecting surfaces
- iterative process
  - seismic image depends on acoustic velocity
  - acoustic velocity depends on rock type
  - rock type interpreted from seismic image and well data
- ~1GB/structure
- ~1K structures/year/major
- ~1TB/year/major

12 Velocity model
- velocity of sound as a function of position in volume corresponding to geological structure
- scalar, vector, or tensor models
- used to produce seismic images
- accurate velocity model key to good seismic image
- ~1-10GB/model
- ~1K models/year/major
- ~1TB/year/major
[Figures from [geosoft] and [pdgm 1]]

13 Basin model
- dynamic model of entire sedimentary basin
  - rock movement
  - fluid movement
- study history of hydrocarbon deposits
  - generation
  - expulsion
  - migration to reservoir
  - entrapment
- useful in predicting whether structure contains oil or gas
- ~100GB/model * ~100/year/major ~ 10TB/year/major
[Figure from [outernode]]

14 Reservoir models
- static models
  - prior to production
  - estimate volume and other properties
- dynamic models
  - fluid flow
  - fluid composition
  - function of position and time
  - used to guide drilling & production
  - keep wells producing
- ~100GB/project
- many fields, many versions/year/major
- ~100 TB/year/major
[Figure from [dgi]]

15 Geomechanical model
- simulation of mechanical stresses and strains
  - whole subsurface
  - specific reservoirs
- stress, strain, deformation as function of position and time
- used to anticipate mechanical changes around bore hole and in reservoir
- ~1-10GB/model
- ~100 models/year/major
- ~100GB/year/major
[Figure from [slb 3]]

16 Summary of Upstream Data Types (Order of magnitude estimates)

Variety                 Volume (/object)   Velocity (/year/major)
Raw seismic             ~1TB               ~1PB
Prestack seismic        ~1TB               ~1PB
Poststack seismic       ~10GB              ~10TB
Well logs               ~100MB/well        ~100GB
Production monitoring   ~10GB              ~1PB
Geological structure    ~1GB               ~1TB
Velocity model          ~1GB               ~1TB
Basin model             ~100GB             ~10TB
Reservoir models        ~100GB             ~100TB
Geomechanical model     ~1GB               ~100GB

- dozens of other data types, all important
- variety rather than volume or velocity is dominant feature
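
As a rough cross-check of the table, the annual figures can be added up. This is only a sketch: the numbers are the table's own order-of-magnitude estimates converted to decimal gigabytes, and the dictionary name is illustrative.

```python
# Annual volume per major company, in decimal GB, copied from the table.
GB_PER_YEAR = {
    "Raw seismic": 1_000_000,            # ~1PB
    "Prestack seismic": 1_000_000,       # ~1PB
    "Poststack seismic": 10_000,         # ~10TB
    "Well logs": 100,                    # ~100GB
    "Production monitoring": 1_000_000,  # ~1PB
    "Geological structure": 1_000,       # ~1TB
    "Velocity model": 1_000,             # ~1TB
    "Basin model": 10_000,               # ~10TB
    "Reservoir models": 100_000,         # ~100TB
    "Geomechanical model": 100,          # ~100GB
}

total_gb = sum(GB_PER_YEAR.values())
petabyte_scale = {name for name, gb in GB_PER_YEAR.items() if gb >= 1_000_000}

print(f"{len(GB_PER_YEAR)} listed data types, about {total_gb / 1e6:.1f} PB/year in total")
print("petabyte-scale types:", sorted(petabyte_scale))
```

The total is a few petabytes per year, spread over ten listed types (plus the "dozens of other data types" the slide mentions), each with its own structure.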

17 Upstream Data Flow (partial)
[Figure from [cda]]
- complex interoperation between data types

18 Shared Earth Model concept
- integrated data base for evolving models of subsurface
  - all data types
  - multiple scales
    - structure
    - reservoir
    - basin
  - multiple interpretations and versions per object
  - uncertainty quantification for everything
  - provenance for everything
  - constantly evolving
- holy grail of Exploration and Production ("E&P") data integration
- in practice: still mostly vendor proprietary islands of integration

19 Shared Earth Model conceptually similar to conventional enterprise data warehouse
- analysis and report oriented rather than transaction oriented
- integrates data from many different applications
- Extract-Transform-Load ("ETL") processes a critical component
- conventional warehouse and ETL
  - relational data model provides conceptual framework
- Shared Earth Model for E&P data
  - relational data model has not proven particularly useful
  - why not?
  - most data is physicist's field data

20 Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

21 Field Theory for Data Scientists
- physicist's field not same as database admin's field
- field describes some physical property as function of position and/or time in some physical object
  - position in a physical object
  - physical property
  - physical property as a function of position
- use a simple example to introduce these ideas

22 A simple example
[Figure: branched well, with labels derrick floor, upper well, well junction, lower well, bore 1, bore 2]

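23 position in a physical object
- position represented by coordinate vector $r = \begin{pmatrix} x(p) \\ y(p) \end{pmatrix} \in \mathbb{R}^2$
[Figure: point p in the x-y plane with coordinates x(p) and y(p)]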

24 Physical property
- physical property types specified by mathematical physics
- family of types jointly referred to as multilinear algebra
  - scalar types: single number, $F$
  - vector types: column of numbers, $F = \begin{pmatrix} F_0 \\ F_1 \end{pmatrix}$
  - tensor types: matrix of numbers, $F = \begin{pmatrix} F_{00} & F_{01} \\ F_{10} & F_{11} \end{pmatrix}$
- each has important algebraic properties
- a few dozen standard types, many more app specific types

25 Physical property as a function of position
- function (map) from physical space to property space
- associates a value of F with each p in the object
- infinite number of points
- infinite number of property values
- how do we represent this on the computer?
[Figure: point p at (x(p), y(p)) in the x-y plane ($\mathbb{R}^2$), with components $F_{00}$ and $F_{11}$ of $F(r)$ labeled]

26 How do we represent a field on the computer?
- numerous methods
  - small industry busy creating new methods
  - makes interoperation and integration difficult
- some common features
  - decompose physical object into simple pieces
  - approximate by simple function on each piece

27 Decompose physical object into simple pieces
- mathematicians call each piece a cell
- decomposition is a cell complex
- more commonly called a mesh
[Figure: mesh of the branched well, with labels df, j, v1, v3, v4, v5, v6 and s0 through s5]

28 Approximate by simple function on each cell
- for each cell c:
  - store a data tuple
  - specify an evaluation method
- evaluation method: $F(p) = \mathrm{eval}_{c(p)}(p, \text{data tuple})$
- data tuple may or may not correspond to value of field at some point
  - depends on evaluation method
- data for entire field is an array of tuples
- example: linear interpolation
  - $\mathrm{value}(p) = u F_1 + (1 - u) F_0$, with $u = u(p)$
[Figure: linear interpolation along a cell from $v_0$ to $v_1$, with labels $F$, $F_0$, value(p), $F_1$, p, u(p)]
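
The per-cell evaluation described on this slide can be sketched for the simplest case, a one-dimensional mesh with linear interpolation. This is a minimal illustration, not the SheafSystem API: the class name `LinearField1D`, the choice of a sorted vertex list as the mesh, and the use of `(F_0, F_1)` as each cell's data tuple are all assumptions made for the example.

```python
from bisect import bisect_right


class LinearField1D:
    """Piecewise-linear scalar field on a 1D mesh (illustrative only).

    vertices: sorted coordinates; cell c spans vertices[c]..vertices[c + 1].
    data:     one tuple (F0, F1) per cell, the values at the cell's two ends.
    """

    def __init__(self, vertices, data):
        if len(data) != len(vertices) - 1:
            raise ValueError("need exactly one data tuple per cell")
        self.vertices = list(vertices)
        self.data = list(data)

    def cell_of(self, p):
        """Find c(p), the index of the cell containing point p."""
        if not self.vertices[0] <= p <= self.vertices[-1]:
            raise ValueError("point lies outside the mesh")
        c = bisect_right(self.vertices, p) - 1
        return min(c, len(self.data) - 1)  # last vertex belongs to last cell

    def __call__(self, p):
        """Evaluate F(p) = eval_c(p)(p, data tuple) by linear interpolation."""
        c = self.cell_of(p)
        v0, v1 = self.vertices[c], self.vertices[c + 1]
        u = (p - v0) / (v1 - v0)  # local coordinate: 0 at v0, 1 at v1
        f0, f1 = self.data[c]
        return u * f1 + (1 - u) * f0


field = LinearField1D([0.0, 1.0, 3.0], [(10.0, 20.0), (20.0, 0.0)])
print(field(0.5), field(2.0))
```

Here the data tuple happens to hold field values at the cell's end points; with a different evaluation method (for example a cell average) the same array of tuples would mean something else, which is the slide's point.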

29 Data for entire field is an array of tuples
- scalar: cell 0, cell 1, cell 2, ..., cell n-1 hold $F_0, F_1, F_2, \ldots, F_{n-1}$
- vector: cell 0, cell 1, cell 2, ..., cell n-1 hold $(F_{0,0}, F_{1,0}), (F_{0,1}, F_{1,1}), (F_{0,2}, F_{1,2}), \ldots, (F_{0,n-1}, F_{1,n-1})$
- tensor: cell 0, cell 1, ..., cell n-1 hold $(F_{00,0}, F_{01,0}, F_{10,0}, F_{11,0}), (F_{00,1}, F_{01,1}, F_{10,1}, F_{11,1}), \ldots, (F_{00,n-1}, F_{01,n-1}, F_{10,n-1}, F_{11,n-1})$
- tuple components typically real (float or double) but may be of any type
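
One way to hold such an array of tuples is a single flat buffer with a fixed number of components per cell. The sketch below is an assumption about layout chosen for illustration (cell-major order, doubles only); the class name `TupleArray` and its methods are hypothetical.

```python
from array import array


class TupleArray:
    """Field data as one flat array of doubles, one fixed-size tuple per cell."""

    def __init__(self, components_per_cell, cell_count):
        self.k = components_per_cell
        # Cell-major layout: the tuple for cell c occupies values[c*k : (c+1)*k].
        self.values = array("d", [0.0]) * (components_per_cell * cell_count)

    def set_tuple(self, cell, components):
        if len(components) != self.k:
            raise ValueError("wrong number of components")
        start = cell * self.k
        self.values[start:start + self.k] = array("d", components)

    def get_tuple(self, cell):
        start = cell * self.k
        return tuple(self.values[start:start + self.k])


# A 2x2 tensor field on three cells: four components (F00, F01, F10, F11) each.
tensor = TupleArray(components_per_cell=4, cell_count=3)
tensor.set_tuple(1, (1.0, 2.0, 3.0, 4.0))
print(tensor.get_tuple(1))
```

A scalar field is the same structure with one component per cell, and a 2D vector field uses two.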

30 How do we want to use field data?
- operations specified by mathematical physics
- five main categories
  - topological operations
    - compose and decompose
  - geometric operations
    - change the shape
  - functional operations
    - set and get the value at a point
    - move field from one mesh to another
  - algebraic operations
    - add, subtract, multiply, divide, diagonalize, ...
  - calculus operations
    - differentiate and integrate
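
Two of these categories are easy to sketch for a piecewise-linear scalar field on a 1D mesh: an algebraic operation (adding two fields) and a calculus operation (integrating one). The representation, a vertex list plus one `(F0, F1)` tuple per cell, and the function names are assumptions made for this example only.

```python
def add_fields(a, b):
    """Algebraic operation: sum of two fields defined on the same mesh.

    Each field is a list of per-cell tuples (F0, F1); for linear
    interpolation the sum is obtained by adding the tuples cell by cell.
    """
    if len(a) != len(b):
        raise ValueError("fields must live on the same mesh")
    return [(a0 + b0, a1 + b1) for (a0, a1), (b0, b1) in zip(a, b)]


def integrate(vertices, data):
    """Calculus operation: exact integral of a piecewise-linear field."""
    total = 0.0
    for c, (f0, f1) in enumerate(data):
        length = vertices[c + 1] - vertices[c]
        total += 0.5 * (f0 + f1) * length  # area of a trapezoid
    return total


vertices = [0.0, 1.0, 3.0]
f = [(10.0, 20.0), (20.0, 0.0)]
g = [(1.0, 1.0), (1.0, 1.0)]  # the constant field 1

print(integrate(vertices, f), integrate(vertices, add_fields(f, g)))
```

Both functions work on the whole field at once, which is the level at which a user wants to express these operations.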

31 Why isn't the relational model useful for field data?
- doesn't fit the way we want to store field data
  - relational schema can't directly capture field entity
    - captures data tuple entity instead of entire field entity
    - field entity has to be reconstructed by queries
  - normalization forces introduction of surrogate keys
  - may require recursive queries
- doesn't fit the way we want to use field data
  - table operations are too low level
  - aren't useful for high level field operations
- no pay-off to using relational model
  - most field data is stored in app-specific, proprietary flat files
- so what data model is useful for field data?
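
The storage mismatch can be seen in a small sketch using SQLite from the Python standard library. The schema is hypothetical and chosen only for illustration: `field_id` and `cell_id` are surrogate keys, each row holds a single component of a single cell's tuple, and the field as a whole exists only as the result of a query plus client-side regrouping.

```python
import sqlite3


def build_database():
    """Create an in-memory database with a normalized (hypothetical) schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE field (
            field_id INTEGER PRIMARY KEY,   -- surrogate key
            name     TEXT NOT NULL
        );
        CREATE TABLE cell_value (
            field_id  INTEGER NOT NULL REFERENCES field(field_id),
            cell_id   INTEGER NOT NULL,     -- surrogate key for the cell
            component INTEGER NOT NULL,
            value     REAL NOT NULL,
            PRIMARY KEY (field_id, cell_id, component)
        );
        """
    )
    return db


def store_field(db, name, tuples):
    """Scatter a field's per-cell tuples into one row per component."""
    field_id = db.execute("INSERT INTO field (name) VALUES (?)", (name,)).lastrowid
    rows = [
        (field_id, cell, comp, value)
        for cell, tup in enumerate(tuples)
        for comp, value in enumerate(tup)
    ]
    db.executemany("INSERT INTO cell_value VALUES (?, ?, ?, ?)", rows)
    return field_id


def load_field(db, field_id):
    """Reconstruct the field entity: query the rows, then regroup them."""
    rows = db.execute(
        "SELECT cell_id, value FROM cell_value "
        "WHERE field_id = ? ORDER BY cell_id, component",
        (field_id,),
    )
    cells = {}
    for cell_id, value in rows:
        cells.setdefault(cell_id, []).append(value)
    return [tuple(cells[c]) for c in sorted(cells)]


db = build_database()
pressure_id = store_field(db, "pressure", [(10.0, 20.0), (20.0, 0.0)])
print(load_field(db, pressure_id))
```

Nothing in the schema says that these rows form a field on a mesh, and operations such as interpolation or integration still have to be written outside the database.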

32 Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

33 The data model paradigm
- Data model [Codd] specifies
  - class of mathematical objects
  - operations on those objects
  - constraints valid instances must satisfy
- Languages, libraries, tools based on data model
- Applications developed on top of tools
- Numerous benefits

34 Benefits of data model paradigm
- Increases level of abstraction for application development
- Increases capability of applications
- Facilitates interoperation and integration
- Increases productivity of programmers
- But

35 But
- Benefits only accrue if model captures application structure
- The more structure captured the bigger the benefit
- Important to capture as much structure as possible

36 Spectrum of mathematical structure captured by various data models
- most nosql models capture less structure than relational
  - the "no" in nosql should perhaps be "less"
- scientific apps have way more mathematical structure
  - relational model isn't nearly structured enough
- scientific apps don't need "no" Structured Query Language
  - need a (much) more Structured Query Language
  - mo SQL

37 Data model/mo SQL requirements
- must capture common math structure of scientific data
  - scalars, vectors, tensors
  - topology and geometry
  - fields
  - algebra and calculus operations
- must describe how math entities are represented/stored
  - decomposition into primitive types and operations
  - decomposition for parallelism
- must maintain rigorous connection between high level semantics and low level implementation
- need a new data model

38 Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

39 Sheaf data model
- objects are discrete sheaves over finite distributive lattices
  - math details: http://www.limitpoint.com/images/publications/the%20sheaf%20data%20model.pdf
- finite distributive lattice
  - part space
  - all distinct composite parts formed from set of basic parts
- discrete sheaf
  - describes association of attributes with parts
  - algebraic description of decomposition of abstract data types into tuples of primitive attributes

40 Visualizing a finite distributive lattice
- directed acyclic graph
  - Hasse diagram
- two kinds of nodes
  - composite parts
  - basic parts
- links represent "covers"
  - covers := immediately includes
  - A covers B if and only if
    - A includes B
    - there is no C such that A includes C includes B
  - draw graph so that if A covers B, B is lower on page
[Figure: example with composite part A, basic part B, basic part C, and links labeled "covers"]
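
The "covers" definition translates directly into code. In the sketch below each part is modeled as the set of basic parts it contains and "includes" is taken to be strict set inclusion; that modeling choice, the function name `covers`, and the three-part example are assumptions for illustration, not taken from the sheaf data model papers.

```python
def covers(parts):
    """Return the links of the Hasse diagram as (upper, lower) name pairs.

    parts maps a part name to the frozenset of basic parts it contains.
    A covers B iff A strictly includes B and no part C lies strictly between.
    """
    links = set()
    for a_name, a in parts.items():
        for b_name, b in parts.items():
            if b < a and not any(b < c < a for c in parts.values()):
                links.add((a_name, b_name))
    return links


# Composite part A made of basic parts B and C, as in the slide's example.
parts = {
    "A": frozenset({"B", "C"}),
    "B": frozenset({"B"}),
    "C": frozenset({"C"}),
}
print(sorted(covers(parts)))

# Adding a larger composite D shows that only immediate inclusions are links:
# D includes B, but A lies in between, so D does not cover B.
parts["D"] = frozenset({"B", "C", "E"})
parts["E"] = frozenset({"E"})
print(sorted(covers(parts)))
```

Drawing each link with the lower part further down the page gives the Hasse diagram described on the slide.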

41 Example: branched well
[Figure: "Well parts" drawing (derrick floor, upper well, well junction, lower well, bore 1, bore 2) beside its Hasse diagram (nodes well, upper well, lower well, bore 1, bore 2, df, junction)]
- basic parts are independent objects
- composite parts are precisely the sum of their basic parts

42 Sheaf table metaphor
- data base is a set of tables
  - each table represents a type
  - each row an instance
  - each column an attribute
- rows carry client-defined lattice order
- col lattice is row lattice of some other table
  - schema are first class objects
- unified algebraic framework for all common scientific data types

43 Unified framework for scientific data types
- tabular types
  - contains relational model as limiting case
  - row lattice is a boolean lattice
- physical property types
  - scalars, vectors, tensors
- object-oriented types with multiple inheritance
  - col lattice is subobject inclusion hierarchy
- spatial types (meshes)
  - any decomposition of space
  - row lattice represents spatial inclusion
- field types
  - any property, any mesh, any evaluation method
  - col lattice = tensor(mesh row lattice, property col lattice)
- rigorous connection between abstract math types and numeric reps
  - from high level specification to tuples of primitives

44 Open Source Implementation
- SheafSystem Community Edition
  - C++ libraries with Java, Python, and C# bindings
- Field API
  - field types
  - pushers
  - refiners
- Geometry API
  - coordinate sections (invertible sections)
  - point locators
  - spatial types
- Fiber Bundle Data Model API
  - physical property types
  - tensors
  - groups
  - Jacobians
- Sheaf Data Model API
  - section types
  - sheaf storage agent
  - HDF5
- www.sheafsystem.org or github

45 Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

46 Query language for sheaf data model
- work in progress with Prof Magne Haveraaen, Bergen Language Design Laboratory, University of Bergen
- started with initial guess at operators
  - extension of relational operators
- experience with implementation
- formalizing and refining definitions
- goal is mo SQL

47 Acknowledgements
- Mark Verschuren, Shell, provided many useful comments and other input for this presentation
- Original research and development funded by subcontracts B347785, B515090, and B560973 of prime contract W-7405-ENG-48 with the Department of Energy National Nuclear Security Administration (DOE/NNSA)
- Ongoing development has been funded by Shell GameChanger and Shell TaCIT
http://www.limitpoint.com/images/publications/bigdatainoilandgas.pdf

48 END

49 References 1
[Krebbers] Big Data & Analytics: Exploiting it, Johan Krebbers, VP Architecture, Shell. http://cdn.osisoft.com/corp/en/media/presentations/2013/UsersConference2013/PDF/UC2013_Shell_Krebbers_GlobalITArchitecture_1.pdf
[KrisEnergy] http://www.krisenergy.com/company/aboutoil-and-gas/exploration/
[epmag 1] http://www.epmag.com/exploration-geology-Geophysics/Three-D-Seismic-Advances-Improve-Exploration-Success_90469
[decogeo] http://www.decogeo.com/upload/image/log1_bigl.jpg

50 References 2
[epmag 2] http://www.epmag.com/item/das-enablessimultaneous-multiwell-vsp_121593
[slb 1] http://www.slb.com/resources/case_studies/completions/~/media/images/completions/intelligent/wellwatcher_neon_tp_01tn.jpg
[slb 2] System of subsurface faults and horizons in the Gullfaks oil field in the Norwegian sector of the North Sea. Data set courtesy of Schlumberger Limited.
[geosoft] http://blogs.geosoft.com/exploringwithdata/2012/08/3dmodelling-with-velocity-volumes-in-gm-sys.html
[pdgm 1] http://www.pdgm.com/getmedia/c72b49d9-571b-4fe8-ae3f-bfd00f862b0d/skua-salt-2010.jpg.aspx?width=1024&height=650&ext=.jpg

51 References 3
[slb 3] http://www.software.slb.com/publishingimages/totalstress.jpg
[dgi] http://www.dgi.com/images/cvslideshow/fullsize/coviz4d_slideshow_003.jpg
[outernode] http://outernode.pir.sa.gov.au/ data/assets/image/0020/119009/Curnamona_3D.jpg
[cda] http://www.oilandgasuk.co.uk/cmsfiles/custom/html/report-14.png
[Codd] E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387. DOI=10.1145/362384.362685 http://doi.acm.org/10.1145/362384.362685