Big Data Complexities for Scientific Computing in the Oil and Gas Industry


Slide 1: Big Data Complexities for Scientific Computing in the Oil and Gas Industry
noSQL, SQL, and mo' SQL
David M. Butler, President, Limit Point Systems, Inc.

Slide 2: Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

Slide 3: The oil and gas business
- Upstream is exploration and production ("E&P") (upper left)
- Downstream is transportation, refining, and marketing (lower right)
- Adapted from [Krebbers]

Slide 4: Major acquired upstream data types
- Time-lapse raw seismic
- Time-lapse prestack seismic image
- Time-lapse poststack seismic image
- Well logs
- Production monitoring
- dozens of other data types

Slide 5: Time-lapse raw seismic data
- each sensor gives amplitude as a function of time
- ~10K sensors, moving towards ~1M
- ~10K shots
- ~5K samples/shot
- ~4-12 bytes/sample
- time lapse: repeat ~2/year for ~10 years
- ~10 TB/project * ~100 projects/year/major company
- ~1PB/year/major
- from [KrisEnergy]

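The per-survey arithmetic behind these round numbers can be sketched as follows. This is only an order-of-magnitude check: reading "~5K samples/shot" as samples per sensor per shot is an assumption, and the constant names are illustrative.

```python
# Order-of-magnitude estimate of raw seismic volume from the slide's round numbers.
SENSORS = 10_000            # ~10K sensors
SHOTS = 10_000              # ~10K shots
SAMPLES_PER_TRACE = 5_000   # ~5K samples/shot, read as per sensor per shot (assumption)
BYTES_PER_SAMPLE = (4, 12)  # ~4-12 bytes/sample

TB = 10**12
PB = 10**15

def survey_bytes(bytes_per_sample):
    """Bytes for one survey: one trace per (sensor, shot) pair."""
    return SENSORS * SHOTS * SAMPLES_PER_TRACE * bytes_per_sample

low_tb, high_tb = (survey_bytes(b) / TB for b in BYTES_PER_SAMPLE)

# The slide rounds a project to ~10 TB and assumes ~100 projects/year per major.
per_year_pb = 10 * TB * 100 / PB

print(f"one survey: {low_tb:.0f}-{high_tb:.0f} TB")
print(f"per major per year: ~{per_year_pb:.0f} PB")
```

A single survey comes out at a few TB, the same order of magnitude as the slide's ~10 TB/project.
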
Slide 6: Time-lapse prestack seismic image data
- clean up seismic data
  - remove noise
  - remove artifacts
  - other signal processing operations
- migrate data
  - focus signal energy
  - convert time to position
- up to 5D array of data
  - reflectivity as a function of 3D position and 2D source-sensor offset
- ~same size as raw seismic

Slide 7: Poststack seismic image data
- stack of prestack data
  - aggregate over 1 or more array indices
  - reduces size ~100x
- 2D or 3D image
  - reflectivity as function of position
  - similar to medical ultrasound image
- interpret to produce model of subsurface
- [epmag 1]

Slide 8: Well logs
- lower sensor package into well
- measure various properties as a function of depth
  - ~10k samples
  - ~1k components
    - simple numbers
    - bore hole images
    - others
- typically done once, before production starts
- ~100MB/well * ~1K wells/year/major
- ~100GB/year/major
- [decogeo]

Slide 9: Production monitoring
- Classical methods at well head
  - flow volumes
  - gas/oil/water composition
  - temperature
  - pressure
- Distributed sensing methods
  - fiber optic cables in well
  - acoustic sensing
  - temperature sensing
  - ~1000 equivalent discrete sensors
  - ~1k samples/sec
  - continuous monitoring
  - ~10-100GB/day/well
  - function of time and position along well path
- ~1K wells (growing rapidly)
- ~1PB/year/major
- [epmag 2] [slb 1]

Slide 10: Major interpreted/modeled data types
- Geological structure model
- Velocity model
- Basin model
- Reservoir models
  - geological
  - quantitative
  - engineering
- Geomechanical model
- dozens of other data types

Slide 11: Geological structure model
- geologist interprets seismic image
  - identifies surfaces defining rock strata and faults
  - very complex networks of intersecting surfaces
- iterative process
  - seismic image depends on acoustic velocity
  - acoustic velocity depends on rock type
  - rock type interpreted from seismic image and well data
- ~1GB/structure
- ~1K structures/year/major
- ~1TB/year/major

Slide 12: Velocity model
- velocity of sound as a function of position
  - in volume corresponding to geological structure
  - scalar, vector, or tensor models
- used to produce seismic images
  - accurate velocity model key to good seismic image
- ~1-10GB/model
- ~1K models/year/major
- ~1TB/year/major
- [geosoft] [pdgm 1]

Slide 13: Basin model
- dynamic model of entire sedimentary basin
  - rock movement
  - fluid movement
- study history of hydrocarbon deposits
  - generation
  - expulsion
  - migration to reservoir
  - entrapment
- useful in predicting whether structure contains oil or gas
- ~100GB/model * ~100/year/major
- ~10TB/year/major
- [outernode]

Slide 14: Reservoir models
- static models
  - prior to production
  - estimate volume and other properties
- dynamic models
  - fluid flow
  - fluid composition
  - function of position and time
- used to guide drilling & production
  - keep wells producing
- ~100GB/project
- many fields, many versions/year/major
- ~100TB/year/major
- [dgi]

Slide 15: Geomechanical model
- simulation of mechanical stresses and strains
  - whole subsurface
  - specific reservoirs
- stress, strain, deformation as function of position and time
- used to anticipate mechanical changes around bore hole and in reservoir
- ~1-10GB/model
- ~100 models/year/major
- ~100GB/year/major
- [slb 3]

Slide 16: Summary of Upstream Data Types (order-of-magnitude estimates)

Variety                 Volume (/object)   Velocity (/year/major)
Raw seismic             ~1TB               ~1PB
Prestack seismic        ~1TB               ~1PB
Poststack seismic       ~10GB              ~10TB
Well logs               ~100MB/well        ~100GB
Production monitoring   ~10GB              ~1PB
Geological structure    ~1GB               ~1TB
Velocity model          ~1GB               ~1TB
Basin model             ~100GB             ~10TB
Reservoir models        ~100GB             ~100TB
Geomechanical model     ~1GB               ~100GB

- dozens of other data types, all important
- variety, rather than volume or velocity, is the dominant feature

Slide 17: Upstream data flow (partial)
- complex interoperation between data types
- [cda]

Slide 18: Shared Earth Model concept
- integrated data base for evolving models of subsurface
  - all data types
  - multiple scales: structure, reservoir, basin
  - multiple interpretations and versions per object
  - uncertainty quantification for everything
  - provenance for everything
  - constantly evolving
- holy grail of Exploration and Production ("E&P") data integration
- in practice: still mostly vendor-proprietary islands of integration

Slide 19: Shared Earth Model
- conceptually similar to conventional enterprise data warehouse
  - analysis and report oriented rather than transaction oriented
  - integrates data from many different applications
  - Extract-Transfer-Load ("ETL") processes a critical component
- conventional warehouse and ETL
  - relational data model provides conceptual framework
- Shared Earth Model for E&P data
  - relational data model has not proven particularly useful
  - why not?
  - most data is physicist's field data

Slide 21: Field Theory for Data Scientists
- physicist's field is not the same as database admin's field
- field describes some physical property as function of position and/or time in some physical object
  - position in a physical object
  - physical property
  - physical property as a function of position
- use a simple example to introduce these ideas

Slide 22: A simple example
- Branched well
[Figure: branched well with labels derrick floor, upper well, well junction, lower well, bore 1, bore 2]

Slide 23: Position in a physical object
- position represented by coordinate vector $r = (x(p),\, y(p))^T \in \mathbb{R}^2$
[Figure: point p in the x-y plane with coordinates x(p) and y(p)]

Slide 24: Physical property
- physical property types specified by mathematical physics
  - family of types jointly referred to as multilinear algebra
- scalar types: single number, $F$
- vector types: column of numbers, $F = (F_0,\, F_1)^T$
- tensor types: matrix of numbers, $F = \begin{pmatrix} F_{00} & F_{01} \\ F_{10} & F_{11} \end{pmatrix}$
- each has important algebraic properties
- a few dozen standard types, many more app-specific types

Slide 25: Physical property as a function of position
- function (map) from physical space to property space
- associates a value of F with each p in the object
- infinite number of points
- infinite number of property values
- how do we represent this on the computer?
[Figure: field F(r) over the x-y plane in R^2, with components F_00 and F_11 drawn along x and y at a point p with coordinates x(p), y(p)]

Slide 26: How do we represent a field on the computer?
- numerous methods
  - small industry busy creating new methods
  - makes interoperation and integration difficult
- some common features
  - decompose physical object into simple pieces
  - approximate by simple function on each piece

Slide 27: Decompose physical object into simple pieces
- mathematicians call each piece a cell
- decomposition is a cell complex
- more commonly called a mesh
[Figure: the branched well decomposed into cells labeled df, v1, j, v3 to v6 and s0 to s5]

Slide 28: Approximate by simple function on each cell
- for each cell c:
  - store a data tuple
  - specify an evaluation method
- evaluation method
  - $F(p) = \mathrm{eval}_{c(p)}(p,\ \text{data tuple})$
- data tuple may or may not correspond to value of field at some point
  - depends on evaluation method
- data for entire field is an array of tuples
- example: linear interpolation
  - $\mathrm{value}(p) = u\,F_1 + (1 - u)\,F_0$
[Figure: linear interpolation on one cell between vertices v_0 and v_1, with values F_0 and F_1, local coordinate u(p), and interpolated value(p)]

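A minimal sketch of the linear-interpolation evaluation method on a 1D mesh, assuming values are stored at sorted vertex positions so that each cell's data tuple is the pair of values at its two end vertices. The function name `eval_linear` is illustrative and not taken from any library.

```python
from bisect import bisect_right

def eval_linear(vertices, values, p):
    """Evaluate a piecewise-linear scalar field at position p.

    vertices: sorted vertex positions v_0 < v_1 < ... (the mesh)
    values:   field value stored at each vertex
    """
    if not vertices[0] <= p <= vertices[-1]:
        raise ValueError("p lies outside the mesh")
    # Locate the cell [v_i, v_{i+1}] that contains p.
    i = min(bisect_right(vertices, p) - 1, len(vertices) - 2)
    v0, v1 = vertices[i], vertices[i + 1]
    f0, f1 = values[i], values[i + 1]
    u = (p - v0) / (v1 - v0)      # local coordinate u(p) in [0, 1]
    return u * f1 + (1 - u) * f0  # value(p) = u*F_1 + (1-u)*F_0

mesh = [0.0, 1.0, 3.0]
field = [10.0, 20.0, 0.0]
print(eval_linear(mesh, field, 0.5))  # halfway through the first cell
```

Other evaluation methods (for example, a constant value per cell) would keep the same array-of-tuples storage and change only the function applied to it.
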
Slide 29: Data for entire field is an array of tuples

type     cell 0                                      cell 1                                      ...   cell n-1
scalar   F_0                                         F_1                                         ...   F_{n-1}
vector   (F_{0,0}, F_{1,0})                          (F_{0,1}, F_{1,1})                          ...   (F_{0,n-1}, F_{1,n-1})
tensor   (F_{00,0}, F_{01,0}, F_{10,0}, F_{11,0})    (F_{00,1}, F_{01,1}, F_{10,1}, F_{11,1})    ...   (F_{00,n-1}, F_{01,n-1}, F_{10,n-1}, F_{11,n-1})

- tuple components typically real (float or double) but may be of any type

Slide 30: How do we want to use field data?
- operations specified by mathematical physics
- five main categories
  - topological operations
    - compose and decompose
  - geometric operations
    - change the shape
  - functional operations
    - set and get the value at a point
    - move field from one mesh to another
  - algebraic operations
    - add, subtract, multiply, divide, diagonalize, ...
  - calculus operations
    - differentiate and integrate

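Two of these categories can be illustrated on a 1D piecewise-linear scalar field stored as one value per vertex. This is a sketch under that storage assumption, with hypothetical function names; it is not the API of any particular field library.

```python
def add_fields(values_a, values_b):
    """Algebraic operation: pointwise sum of two fields on the same mesh."""
    if len(values_a) != len(values_b):
        raise ValueError("fields must share a mesh")
    return [a + b for a, b in zip(values_a, values_b)]

def integrate(vertices, values):
    """Calculus operation: integral of a piecewise-linear field.

    The trapezoid rule is exact on each cell because the field is linear there.
    """
    return sum(
        0.5 * (values[i] + values[i + 1]) * (vertices[i + 1] - vertices[i])
        for i in range(len(vertices) - 1)
    )

mesh = [0.0, 1.0, 3.0]
f = [10.0, 20.0, 0.0]
g = [1.0, 1.0, 1.0]

# Integration is linear, so integrating the sum equals summing the integrals.
print(integrate(mesh, add_fields(f, g)), integrate(mesh, f) + integrate(mesh, g))
```

Both operations take whole fields as arguments, which is the level of abstraction the slide asks for.
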
Slide 31: Why isn't the relational model useful for field data?
- doesn't fit the way we want to store field data
  - relational schema can't directly capture field entity
  - captures data tuple entity instead of entire field entity
  - field entity has to be reconstructed by queries
  - normalization forces introduction of surrogate keys
  - may require recursive queries
- doesn't fit the way we want to use field data
  - table operations are too low level
  - aren't useful for high level field operations
- no payoff to using relational model
  - most field data is stored in app-specific, proprietary flat files
- so what data model is useful for field data?

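The storage mismatch can be seen in a small sketch using Python's built-in sqlite3 module. The table layout and the names `field_tuple`, `field_id`, `cell_id`, and `component` are hypothetical, chosen only to show a normalized layout with surrogate keys; the field itself has to be reassembled by a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE field_tuple ("
    " field_id INTEGER, cell_id INTEGER, component INTEGER, value REAL,"
    " PRIMARY KEY (field_id, cell_id, component))"
)

# One vector field (surrogate key 1) with two components on three cells.
rows = [
    (1, 0, 0, 0.1), (1, 0, 1, 0.2),
    (1, 1, 0, 1.1), (1, 1, 1, 1.2),
    (1, 2, 0, 2.1), (1, 2, 1, 2.2),
]
conn.executemany("INSERT INTO field_tuple VALUES (?, ?, ?, ?)", rows)

def load_field(conn, field_id):
    """Rebuild the array of per-cell tuples from individual rows."""
    field = {}
    cursor = conn.execute(
        "SELECT cell_id, value FROM field_tuple"
        " WHERE field_id = ? ORDER BY cell_id, component",
        (field_id,),
    )
    for cell_id, value in cursor:
        field.setdefault(cell_id, []).append(value)
    return [tuple(field[c]) for c in sorted(field)]

print(load_field(conn, 1))
```

The schema knows about rows of numbers; that the rows together form a vector field on a mesh lives only in the application code.
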
Slide 33: The data model paradigm
- Data model [Codd] specifies
  - class of mathematical objects
  - operations on those objects
  - constraints valid instances must satisfy
- Languages, libraries, tools based on data model
- Applications developed on top of tools
- Numerous benefits

Slide 34: Benefits of data model paradigm
- Increases level of abstraction for application development
- Increases capability of applications
- Facilitates interoperation and integration
- Increases productivity of programmers
- But ...

Slide 35: But ...
- Benefits only accrue if model captures application structure
- The more structure captured, the bigger the benefit
- Important to capture as much structure as possible

Slide 36: Spectrum of mathematical structure captured by various data models
- most noSQL models capture less structure than relational
  - the "no" in noSQL should perhaps be "less"
- scientific apps have way more mathematical structure
  - relational model isn't nearly structured enough
- scientific apps don't need no Structured Query Language
  - need a (much) more Structured Query Language
  - mo' SQL

Slide 37: Data model/mo' SQL requirements
- must capture common math structure of scientific data
  - scalars, vectors, tensors
  - topology and geometry
  - fields
  - algebra and calculus operations
- must describe how math entities are represented/stored
  - decomposition into primitive types and operations
  - decomposition for parallelism
- must maintain rigorous connection between high level semantics and low level implementation
- need a new data model

Slide 39: Sheaf data model
- objects are discrete sheaves over finite distributive lattices
- math details:
  - finite distributive lattice
    - part space
    - all distinct composite parts formed from set of basic parts
  - discrete sheaf
    - describes association of attributes with parts
    - algebraic description of decomposition of abstract data types into tuples of primitive attributes

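One way to picture the part space is to represent each part as the set of basic parts it contains, with union as join and intersection as meet. The sketch below assumes independent basic parts with hypothetical names "x", "y", and "z"; it illustrates the idea and is not the SheafSystem representation.

```python
from itertools import combinations

def close(generators):
    """Smallest family containing the generators and closed under union and intersection."""
    parts = set(generators)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(parts), 2):
            for c in (a | b, a & b):
                if c not in parts:
                    parts.add(c)
                    changed = True
    return parts

def is_distributive(parts):
    """Check a meet (b join c) == (a meet b) join (a meet c) for all triples."""
    return all(
        a & (b | c) == (a & b) | (a & c)
        for a in parts for b in parts for c in parts
    )

basic = [frozenset({name}) for name in ("x", "y", "z")]
parts = close(basic)
print(len(parts), is_distributive(parts))
```

The distributivity check passes by construction, since set union and intersection always distribute over each other. Three independent basic parts generate eight parts in total, counting the empty part.
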
Slide 40: Visualizing a finite distributive lattice
- directed acyclic graph
  - Hasse diagram
- two kinds of nodes
  - composite parts
  - basic parts
- links represent covers
  - covers := immediately includes
  - A covers B if and only if A includes B and there is no C such that A includes C and C includes B
- draw graph so that if A covers B, B is lower on page
[Figure: example Hasse diagram in which composite part A covers basic part B and basic part C]

Slide 41: Example: branched well
- basic parts are independent objects
- composite parts are precisely the sum of their basic parts
[Figure: the branched well (derrick floor, upper well, well junction, lower well, bore 1, bore 2) beside the Hasse diagram of its parts, with nodes well, upper well, lower well, bore 1, bore 2, df, junction]

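The covers relation can be computed directly from inclusion. The decomposition used below is an assumed reading of the branched well (basic parts "upper well", "bore 1", and "bore 2"; composites "lower well" and "well"), not a transcription of the slide's diagram, which also shows df and junction nodes.

```python
# Each part is described by the set of basic parts it is the sum of.
parts = {
    "well": frozenset({"upper well", "bore 1", "bore 2"}),
    "upper well": frozenset({"upper well"}),
    "lower well": frozenset({"bore 1", "bore 2"}),
    "bore 1": frozenset({"bore 1"}),
    "bore 2": frozenset({"bore 2"}),
}

def covers(parts):
    """Return pairs (A, B) where A strictly includes B with no part strictly in between."""
    links = []
    for a, sa in parts.items():
        for b, sb in parts.items():
            if sb < sa and not any(sb < sc < sa for sc in parts.values()):
                links.append((a, b))
    return sorted(links)

for a, b in covers(parts):
    print(f"{a} covers {b}")
```

The well includes bore 1 but does not cover it, because the lower well sits in between; only the covers links are drawn in the Hasse diagram.
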
Slide 42: Sheaf table metaphor
- data base is a set of tables
  - each table represents a type
  - each row an instance
  - each column an attribute
- rows carry client-defined lattice order
- col lattice is row lattice of some other table
  - schema are first class objects
- unified algebraic framework for all common scientific data types

Slide 43: Unified framework for scientific data types
- tabular types
  - contains relational model as limiting case
  - row lattice is a boolean lattice
- physical property types
  - scalars, vectors, tensors
- object-oriented types with multiple inheritance
  - col lattice is subobject inclusion hierarchy
- spatial types (meshes)
  - any decomposition of space
  - row lattice represents spatial inclusion
- field types
  - any property, any mesh, any evaluation method
  - col lattice = tensor(mesh row lattice, property col lattice)
- rigorous connection between abstract math types and numeric reps
  - from high level specification to tuples of primitives

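At the level of basic parts, the field-type rule can be pictured as a Cartesian product: each (mesh cell, property component) pair becomes one primitive attribute of the field. The sketch below covers only that basic-part level, not the full lattice product, and the cell and component names are hypothetical.

```python
from itertools import product

mesh_cells = ["s0", "s1", "s2"]  # hypothetical basic parts of the mesh
components = ["F_0", "F_1"]      # components of a 2D vector property

# Each (cell, component) pair is one primitive attribute of the field.
field_schema = list(product(mesh_cells, components))

def pack(per_cell_tuples):
    """Flatten per-cell data tuples into one row keyed by the field schema."""
    flat = [x for tup in per_cell_tuples for x in tup]
    if len(flat) != len(field_schema):
        raise ValueError("data does not match the schema")
    return dict(zip(field_schema, flat))

row = pack([(0.1, 0.2), (1.1, 1.2), (2.1, 2.2)])
print(row[("s1", "F_1")])
```

In this picture a whole field is a single row whose columns are generated from two other schemas, which is how a high level field type ends up connected to a tuple of primitives.
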
Slide 44: Open Source Implementation
- SheafSystem Community Edition
- C++ libraries with Java, Python, and C# bindings
- Field API
  - field types
  - pushers
  - refiners
- Geometry API
  - coordinate sections (invertible sections)
  - point locators
  - spatial types
- Fiber Bundle Data Model API
  - physical property types
  - tensors
  - groups
  - Jacobians
- Sheaf Data Model API
  - section types
  - sheaf
  - storage agent
- HDF5
- or github

Slide 46: Query language for sheaf data model
- work in progress with Prof. Magne Haveraaen
  - Bergen Language Design Laboratory, University of Bergen
- started with initial guess at operators
  - extension of relational operators
- experience with implementation
- formalizing and refining definitions
- goal is mo' SQL

Slide 47: Acknowledgements
- Mark Verschuren, Shell, provided many useful comments and other input for this presentation
- Original research and development funded by subcontracts B347785, B515090, and B of prime contract W ENG48 with the Department of Energy National Nuclear Security Administration (DOE/NNSA)
- Ongoing development has been funded by Shell GameChanger and Shell TaCIT

Slide 48: END

Slide 49: References 1
[Krebbers] Big Data & Analytics: Exploiting it, Johan Krebbers, VP Architecture, Shell. UsersConference2013/PDF/UC2013_Shell_Krebbers_GlobalIT Architecture_1.pdf
[KrisEnergy]
[epmag 1] Geophysics/ThreeDSeismicAdvancesImproveExploration Success_90469
[decogeo]

Slide 50: References 2
[epmag 2]
[slb 1] a/images/completions/intelligent/wellwatcher_neon_tp_01tn.jpg
[slb 2] System of subsurface faults and horizons in the Gullfaks oil field in the Norwegian sector of the North Sea. Data set courtesy of Schlumberger Limited.
[geosoft]
[pdgm 1] ae3fbfd00f862b0d/skuasalt jpg.aspx?width=1024&height=650&ext=.jpg

Slide 51: References 3
[slb 3]
[dgi] how_003.jpg
[outernode] data/assets/image/0020/ /Curnamona_3D.jpg
[cda] 14.png
[Codd] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), DOI= /