Information systems architecture for the Oil and Gas industry


IBM Sales and Distribution
Thought Leadership White Paper
Chemicals & Petroleum

Overview

Oil and gas companies continue to address the information challenges associated with digital oilfield implementations by focusing on optimizing operations, productivity, efficiency and safety. Managing the realities of lower commodity prices, capital scarcity, ongoing productivity deficiencies and community scrutiny requires a deft balance, one which innovative technology solutions can help achieve. Driving value from information has long been seen as central to the mission of optimized production, improved maintenance and reliability, better supply chains and increased safety. But with technologies changing as quickly as they do, how should organizations best chart a path through information, big data, analytics and business intelligence? This paper lays out an architectural response to this question, providing a descriptive primer on information systems architecture for the oil and gas industry, aimed at business users, IT architects and solution designers.

People, process and technology

This paper addresses technology and systems architecture, as well as the critical change elements needed for people to embrace new or changed technologies, and for the business to integrate these technologies into its standard processes. For a consistent operating model to succeed, change needs to be appropriately designed and implemented. The change elements include structures such as project offices and centers of competency, and the detailed change work required to document, standardize, refactor and optimize business processes. It is perhaps even more critical to plan and execute based on the required business sponsorship, stakeholder engagement and benefits realization needed to deliver technology-enabled transformations.

Figure 1. High-level Operating Model: people, process and technology at the core, surrounded by benefits, stakeholders and sponsors

Information systems architectural approach

To begin planning the technologies required for information and analytics for an oil and gas company, we must first understand the types of systems and data that we are trying to manage and analyze. There are many methods of depicting the high-level systems architecture of oil and gas companies. One of the most widely accepted is the ISA-95 standard, an international standard for developing an automated interface between enterprise and control systems. For example, Figure 2 shows the architectural levels (from Level 0 at the bottom, which represents the production process itself, to Level 4 at the top, representing business planning and logistics) for the upstream oil and gas industry. Systems across the levels store and process data in varying execution windows.

Figure 2. Conceptual Architecture for Upstream Oil and Gas. Level 4: business planning and logistics (ERP), including production scheduling, materials scheduling, delivery and shipping, and inventory. Level 3: operations management, including health, safety and environment; supply chain; exploration; performance and remote monitoring; and fleet, rail and shipping. Level 2: monitoring, supervising and automated controls, including facility monitoring, digital control, integrated operations and process control. Level 1: sensing and manipulation, including production execution, historians, sensors, actuators and networks. Level 0: the production process itself (drill, manage water, pipeline, plant, port, ship, customer). Execution windows range from sub-second at the lower levels to months, years and decades at the top; analytic approaches range from real-time and big data (Hadoop) on unstructured and semi-structured data to multidimensional OLAP warehouses and marts on structured data.

A sensor at Level 1 may produce data every second or more frequently, while an enterprise planning system may work on a monthly, yearly or even 30-year basis. Data at the higher levels is also more likely to be structured, such as accounts, assets and schedules, whereas data at the lower levels is more likely to be unstructured, such as video, images or text. These variations in time intervals and structure mean that different analytical approaches are necessary at different levels. Traditional OLAP, data warehousing and data marts are more applicable to the higher levels. Some of the newer big data and real-time capabilities offer additional value at the lower levels.

The 4 Vs of data

Velocity, volume, variety and veracity are critical characteristics of data. The conceptual architecture in Figure 2 also introduces the concept of execution windows (shown in the Execution Windows arrow), or data velocity. The velocity with which data must be processed, ranging from real-time or near-real-time to monthly or yearly, is one aspect to consider when choosing how best to store data. Velocity also influences the decision of which analytical approach (shown in the Analytic Approaches arrow) is most appropriate for gaining insight into that data. The volume of data raises another key consideration, ranging from small requirements for master data to petabyte-scale requirements and beyond for video, geospatial, seismic and historic data.

Variety is also important, as some storage and analytics approaches are best at processing specific types of data, whereas others provide capabilities to integrate and correlate various forms of data. Data varieties may include text, images and many other types of inputs. Figure 2 illustrates how data types may be managed and analyzed with different analytic approaches. Veracity refers to the accuracy, validity or quality of the data. Data veracity considerations include how much the data has been or needs to be cleansed, that is, made valid, accurate, complete, consistent and uniform. This matters because a data error in a real-time, transient data feed may often be safely ignored, whereas a data error in a permanent, enterprise master data record could be costly to remediate. Figure 3 depicts these four key aspects of data.

Figure 3. The 4 Vs of Data: Velocity (data in motion), Volume (data at scale), Variety (data in many forms) and Veracity (data uncertainty)

Information systems

Velocity, volume, variety and veracity determine which types of data storage and analytics processes are most appropriate for the data types found at the various levels of the oil and gas industry architecture. For example, enterprise resource planning (ERP) data is typically transactional and is best represented by a relational, normalized data structure. ERP data may need to be processed in real time, as with transactions, or over longer execution windows, as with plans spanning up to 30 years. ERP data volumes may be large but not extreme, on the order of gigabytes or more, and there are a variety of data types to store: a different data type is stored in each table. Transactional consistency is important, and data must conform to the defined relational schema. For example, a maintenance schedule cannot be entered for a piece of equipment that does not exist. This type of data is therefore best stored in a relational database system, which may or may not use in-memory features for fast read/write capabilities. To perform analytics on this data, it is typically necessary to extract it to a multi-dimensional OLAP data warehouse in order to optimize read access and report on aggregations across groups of data.
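To make the transactional-consistency point concrete, the following is a minimal sketch, not taken from the paper, using an in-memory relational database; the table and column names are illustrative assumptions. It shows how the relational schema itself rejects a maintenance schedule for a piece of equipment that does not exist.

    # Minimal sketch: relational consistency enforced by a foreign-key constraint.
    # Table and column names are illustrative only.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

    conn.execute("""
        CREATE TABLE equipment (
            equipment_id TEXT PRIMARY KEY,
            description  TEXT
        )""")
    conn.execute("""
        CREATE TABLE maintenance_schedule (
            schedule_id  INTEGER PRIMARY KEY,
            equipment_id TEXT NOT NULL REFERENCES equipment(equipment_id),
            due_date     TEXT NOT NULL
        )""")

    conn.execute("INSERT INTO equipment VALUES ('ESP-001', 'Electric submersible pump')")

    # Valid: the referenced equipment exists, so the schedule row is accepted.
    conn.execute("INSERT INTO maintenance_schedule (equipment_id, due_date) "
                 "VALUES ('ESP-001', '2015-06-01')")

    # Invalid: no such equipment, so the relational engine rejects the row.
    try:
        conn.execute("INSERT INTO maintenance_schedule (equipment_id, due_date) "
                     "VALUES ('ESP-999', '2015-06-01')")
    except sqlite3.IntegrityError as err:
        print("Rejected by schema:", err)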

Operational data, on the other hand, is typically non-transactional and transient, and may be semi-structured or totally unstructured. Such data is likely to need processing in real time or right time, depending on the underlying operational equipment. Operational data volumes can be extremely large (petabytes and beyond), particularly for seismic, image and video data, and in cases where years of historic data need to be stored to perform predictive analytics. Variety is limited, however: there aren't hundreds of tables, although it might be necessary to integrate different types of data, for example real-time data from an equipment sensor with video data from a nearby camera. Managing veracity can initially be less complex. Data cleansing is often limited to ensuring the data matches corporate master entities and removing noise on the curve. This type of data is often best stored in historians and other flat files or file formats specially designed for the data types in question. To analyze the data, however, particularly large amounts of historical data, big data storage facilities such as NoSQL databases and Hadoop are often the best approach. These ideas are summarized in Table 1.

Table 1. Information Systems Considerations for Data Types

Level | Data type | Structure | Volume | Velocity | Variety | Veracity | Storage | Analytics
Level 4 | Master data | Relational | Megabytes | Daily | Assets, corporate structure | ACID required; highest data cleansing requirements | Master data platform | OLAP, enterprise data warehouse
Level 4 | Enterprise transactional | Relational | Gigabytes | Real-time to yearly | Assets, budgets, schedules, accounts, HSSE | ACID required; data cleansed to map to relational schema | Relational database, in-memory database | OLAP, enterprise data warehouse
Level 4 | Email | Unstructured text | Terabytes | Real-time | MIME | Addressee validation | Email system | NoSQL
Level 3 | Documents | Unstructured text | Terabytes | Daily | Documents, spreadsheets, presentations, plans | Minimal requirements | Enterprise document system | NoSQL, Hadoop
Level 3 | Engineering data | Relational, ontologies, CAD, 3D | Terabytes | Daily | ISO 15926, ontologies, CAD, 3D | Data cleansed to map to master data | Engineering data warehouse | Any
Level 2 | Operational semi-structured | Semi-structured | Terabytes | Real-time | Time series, well logs | Data cleansed to map to master data | Historians | Hadoop
Level 1 | Operational unstructured | Unstructured | Petabytes | Real-time to yearly | Logs, videos, web logs, geospatial, seismic, text, images | Data cleansed to map to master data | Flat files, specialist file formats | Hadoop
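As an illustration of the cleansing described for operational data, the short sketch below is not from the paper and uses invented tag names, readings and an arbitrary smoothing window. It shows the two steps in miniature: matching readings to corporate master entities and removing noise on the curve.

    # Hedged sketch of operational data cleansing: (1) match readings to master
    # entities, (2) remove noise with a simple rolling median. Values are invented.
    from statistics import median

    master_assets = {"ESP-001", "ESP-002"}        # corporate master equipment list
    raw_readings = [
        ("ESP-001", 71.2), ("ESP-001", 70.9), ("ESP-001", 250.0),   # 250.0 is a spike
        ("ESP-001", 71.4), ("UNKNOWN-TAG", 68.0), ("ESP-001", 71.1),
    ]

    # Step 1: keep only readings whose tag resolves to a corporate master entity.
    matched = [value for tag, value in raw_readings if tag in master_assets]

    # Step 2: de-noise with a rolling median over a window of three readings.
    smoothed = [median(matched[max(0, i - 1): i + 2]) for i in range(len(matched))]

    print("matched :", matched)
    print("smoothed:", smoothed)    # the 250.0 spike is suppressed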

Information repositories

The considerations for different types of storage mechanisms and analytical approaches for the various data types mean that a number of information repository capabilities are required across the enterprise. Table 2 provides a summary of the most important capabilities and their strengths and weaknesses. It also shows the underlying data storage techniques, and commonly used products are listed in the Examples column.

Table 2. Information Systems Considerations for Repositories

Repository | Structure | Strengths | Weaknesses | Volume | Velocity | Storage | Examples
Master data platform | Relational | Master data | Anything else | Gigabytes | Real-time | On disk | SAP NetWeaver MDM, IBM InfoSphere MDM
Relational database | Relational, columnar | Transactions | Reports and analytics | Terabytes | Real-time | On disk | IBM DB2, MS SQL Server
In-memory database | Columnar | Fast read/write for structured data | Large volumes | Terabytes | Real-time | In memory | IBM DB2 BLU, SAP HANA
Email system | Proprietary | Email | Anything else | Terabytes | Real-time | On disk, archive to tape | IBM Lotus Notes, MS Exchange
Enterprise document system | Proprietary | Documents | Anything else | Terabytes | Real-time | On disk | IBM FileNet, EMC Documentum, MS SharePoint
Engineering data warehouse | Relational, ontologies | Engineering data and documents | Anything else | Terabytes | Batch or real-time | On disk | Bentley, Intergraph, Aveva
Historians | Key-value pairs | Time series | Anything else | Terabytes | Real-time | On disk | OSIsoft PI, Honeywell PHD
OLAP, enterprise data warehouse | MOLAP cubes, columnar DBs, DW appliances | Large volumes, fast retrieval, analytics | Transactions, semi-structured and unstructured data | Petabytes | Batch | On disk, in memory, on appliance | IBM Cognos, SAP HANA, IBM Netezza
NoSQL | JSON | Key-based retrieval, write performance | Reporting | Petabytes | Real-time | On disk | IBM Cloudant
Hadoop | MapReduce | Inexpensive storage of semi- and unstructured data | Fast retrieval | Petabytes | Batch | MPP file system | IBM BigInsights
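The sketch below is purely illustrative and not part of the paper: it encodes a rough routing rule in the spirit of Table 2, with invented category labels and thresholds, to show how the structure, velocity and volume of a data set might point to a candidate repository.

    # Illustrative routing rule in the spirit of Table 2; categories are assumptions.
    def suggest_repository(structure: str, velocity: str, volume: str) -> str:
        """Return a candidate repository type for a data set."""
        if structure == "master":
            return "Master data platform"
        if structure == "relational":
            # Transactional, structured data; in-memory columnar when fast read/write matters.
            return "In-memory database" if velocity == "real-time" else "Relational database"
        if structure == "time-series":
            return "Historian"
        if structure == "document":
            return "Enterprise document system"
        # Semi-structured and unstructured data: route on volume and velocity.
        if volume == "petabytes" and velocity == "batch":
            return "Hadoop"
        return "NoSQL"

    print(suggest_repository("relational", "real-time", "terabytes"))   # In-memory database
    print(suggest_repository("time-series", "real-time", "terabytes"))  # Historian
    print(suggest_repository("unstructured", "batch", "petabytes"))     # Hadoop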

Big data and analytics

Information technology (IT) organizations have traditionally achieved strong implementations of ERP, document and other systems at the higher levels of the conceptual architecture. Operations technology (OT) departments, on the other hand, have focused on the process control domain at the lower levels of the architecture. The current trend of IT and OT convergence means that these parts of the business come together to drive additional value from integrating and analyzing operational data. As we have discussed, traditional relational databases and data warehouses are less appropriate for storing and analyzing operational data, so a new architecture and approach is needed for big data and analytics.

This new architecture for oil and gas introduces additional capabilities and concepts. Real-time analytics is a key new capability: analytics are run against each instance of data as it happens, rather than extracting batches of data to be analyzed later in a data warehouse. Applying this real-time analytics paradigm leads to new and enhanced capabilities to address critical questions: What is happening currently? Why is it happening? What could happen next? What should we do? Perhaps most importantly, real-time analytics help answer the question: What did I learn and what would be best to do next time?

To address these questions, the new architecture enables new capabilities:

- Discovery and exploration, often in conjunction with real-time visualization and alerts, which are used to understand the current state of operations
- Reporting and analysis, often root cause analysis, which is used to understand the behavioral characteristics of operations
- Predictive analytics, which provides the capability to understand future possibilities
- Decision management, which helps support the actions taken to move forward
- Cognitive computing, best exemplified by IBM Watson technologies, which helps users plan future best outcomes based on learned operational behavior

Figure 4 highlights real-time processing and analytics as a key function.

Figure 4. Conceptual Architecture for Big Data and Analytics: new and enhanced applications draw on four data zones (an operational data zone; a landing, exploration and archive data zone; a deep analytics data zone; and an EDW and data mart zone), each aligned to a question: What is happening? (discovery and exploration); Why did it happen? (reporting and analysis); What could happen? (predictive analytics and modeling); What action should I take? (decision management); and What did I learn, what's best? (cognitive). Real-time processing and analytics, information integration and governance, systems security and storage underpin the whole, deployed on premise, in the cloud or as a service.
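As a minimal illustration of the real-time paradigm highlighted in Figure 4, the sketch below, with invented tag names, threshold and window size, analyzes each reading as it arrives instead of batching data into a warehouse for later analysis.

    # Minimal per-event analytics sketch: each reading is analysed on arrival.
    from collections import deque

    WINDOW = 5          # number of recent readings kept per sensor (assumed)
    THRESHOLD = 80.0    # alert when the rolling average exceeds this value (assumed)

    recent = {}         # sensor tag -> deque of recent values

    def on_reading(tag: str, value: float) -> None:
        """Called once per incoming reading; raises an alert in real time if needed."""
        window = recent.setdefault(tag, deque(maxlen=WINDOW))
        window.append(value)
        rolling_avg = sum(window) / len(window)
        if rolling_avg > THRESHOLD:
            print(f"ALERT {tag}: rolling average {rolling_avg:.1f} exceeds {THRESHOLD}")

    # Simulated feed; in practice events would arrive from a streaming platform.
    for value in [75, 78, 81, 84, 88, 91]:
        on_reading("PUMP-17/motor_temp", value)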

Logical architecture

The associated logical architecture for big data and analytics is depicted in Figure 5. Here, streaming computing technology is introduced to provide real-time or near-real-time analytics capabilities. Data integration is highlighted, and the needs for data governance, event processing, security and business continuity are also addressed.

Figure 5. Logical Architecture for Big Data and Analytics: data from sources at Levels 1 to 4 flows through data acquisition and integration into a landing, archive and exploration zone, an integrated warehouse, and deep analytics and modeling, with streaming computing handling real-time data, to deliver shared operational information and actionable insight: discovery and exploration (What is happening?), analysis and reporting (Why did it happen?), predictive analytics and modeling (What could happen?), decision management (What action should I take?) and planning and forecasting (What did I learn? What's best?), all under common governance, security and business continuity management.

An example of this architecture is shown in Figure 6. In this simple example, asset information, such as equipment data, is captured from asset management and maintenance, repair and operations (MRO) inventory systems and integrated with historian data from sensors in the field to provide analytics and optimized planning and scheduling. Data adaptors are used to capture the data and transport it to an integration bus. From there, it can be loaded into a staging area (Hadoop in this instance) and the enterprise data warehouse. Predictive analytics can be performed on both of these sources, depending on the level of information required. From historic data, asset performance is reported on, events are analyzed using causal temporal analysis, asset failures are predicted, and optimized maintenance plans and schedules are produced. The same capabilities can also be used in real-time mode, processing time series data from historians in a streaming computing engine and invoking the predictive analytics capabilities to provide real-time predictions of impending failures and other scenarios.
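The following is a hedged sketch of that real-time scoring path: historian events are pushed through a streaming step that invokes a previously trained predictive model. The logistic scorer and its coefficients are invented stand-ins for a model produced by an offline predictive analytics run; asset and field names are assumptions.

    # Sketch: real-time scoring of historian events with a pre-trained model.
    import math

    # Coefficients assumed to come from an offline training run on historic data.
    INTERCEPT = -9.0
    COEF = {"vibration_mm_s": 0.9, "motor_temp_c": 0.05}

    def failure_probability(event: dict) -> float:
        """Score one historian event for the probability of impending failure."""
        z = INTERCEPT + sum(COEF[k] * event[k] for k in COEF)
        return 1.0 / (1.0 + math.exp(-z))

    def on_event(event: dict) -> None:
        p = failure_probability(event)
        if p > 0.7:
            print(f"{event['asset']}: predicted failure risk {p:.0%} - raise work order")

    # Simulated stream of historian events.
    for event in [
        {"asset": "ESP-001", "vibration_mm_s": 2.1, "motor_temp_c": 78},
        {"asset": "ESP-001", "vibration_mm_s": 7.8, "motor_temp_c": 96},
    ]:
        on_event(event)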

Figure 6. Architecture Example: inventory management, asset management, asset historians and asset sensors (Levels 1 to 4) feed data adaptors and an integration bus; time series processing, a Hadoop staging area and an OLAP enterprise warehouse supply streaming computing, predictive analytics, and reporting and analysis, delivering real-time analytics, asset performance reporting, failure analysis, predictive asset maintenance, and maintenance planning and scheduling.

Hadoop and HANA

Hadoop and SAP HANA solutions work side by side to provide an overall enterprise big data solution. Many IBM customers use Hadoop alongside the IBM Netezza data warehouse appliance in a similar pattern. HANA is best used to store high-value, frequently used, structured data, the type of data that comes from ERP and other corporate systems. Hadoop, on the other hand, lends itself to the world of unstructured and semi-structured data found in operational systems, including time series, web logs, geospatial, seismic, text, image and video data. Holding these volumes of data in memory, as HANA does, would have little value; Hadoop instead uses large clusters of commodity servers to provide a cost-effective solution to the problem of analyzing and gaining insights from big data.

The analytics performed on data in HANA and Hadoop are also different and complementary. With its in-memory, columnar structures, HANA is well suited to high-performance OLAP analytics, aggregating and summarizing data; an example would be forecasting production output by month, well or joint venture partner. Hadoop's MPP file system and MapReduce programming model are better suited to large-scale batch processing, and to the predictive analytics that can result from finding correlations and patterns in the data; an example would be analyzing years of sensor data from electric submersible pumps to detect anomaly patterns and predict future failures.
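To ground the electric-submersible-pump example, the sketch below expresses the MapReduce pattern in plain Python: a map step flags anomalous vibration readings and a reduce step counts them per pump. The records and the anomaly rule are invented; in production the same pair of functions would run across a Hadoop cluster over a much larger, HDFS-resident data set.

    # Pure-Python illustration of the MapReduce pattern (map, shuffle/sort, reduce).
    from itertools import groupby
    from operator import itemgetter

    records = [
        ("ESP-001", 2.1), ("ESP-001", 8.4), ("ESP-002", 1.9),
        ("ESP-002", 9.1), ("ESP-002", 8.8), ("ESP-001", 2.3),
    ]

    def map_phase(record):
        """Emit (pump, 1) for each anomalous vibration reading."""
        pump, vibration = record
        if vibration > 7.0:              # simple anomaly rule for the sketch
            yield (pump, 1)

    def reduce_phase(pump, counts):
        """Sum the anomaly counts for one pump."""
        return (pump, sum(counts))

    mapped = [kv for record in records for kv in map_phase(record)]
    mapped.sort(key=itemgetter(0))       # the shuffle/sort step: group by key
    results = [reduce_phase(pump, (count for _, count in group))
               for pump, group in groupby(mapped, key=itemgetter(0))]

    print(results)                       # [('ESP-001', 1), ('ESP-002', 2)]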

The IBM Hadoop solution

Apache Hadoop is an open source project. There are, however, distributions of Hadoop provided by a number of software vendors, including IBM. The IBM InfoSphere BigInsights Hadoop solution offers a number of distinctive advantages over using open source Hadoop. These include:

- Cost savings: A study by the International Technology Group of high-impact applications in six companies showed cost savings of 28 percent over three years for the use of the InfoSphere BigInsights analytics platform over open source Hadoop. Cost calculations included licensing, support and personnel.
- Stability and support: Hadoop and the open source software stack it relies on are currently defined in a set of at least 25 Apache sub-projects and incubators, which are supported only by the open source community. BigInsights offers an IBM-supported solution to ensure the stability of IBM clients' analytics applications.
- Predictive analytics support: InfoSphere BigInsights provides interoperability with IBM SPSS software, one of the industry's leading predictive analytics solutions.
- Streaming computing: BigInsights contains a limited-use license for the IBM InfoSphere Streams advanced analytics platform, used for real-time, big data analytics. SPSS predictive models can be embedded into the InfoSphere Streams platform to provide real-time predictive analytic scoring on data streamed from meters, sensors, gauges and more.
- BigSheets, BigSQL and Big R: BigSheets provides a spreadsheet-like user interface for big data exploration and visualization. IBM BigSQL provides an SQL programming interface to Hadoop, reducing the need for scarce, expensive MapReduce programmers. IBM InfoSphere BigInsights Big R provides a library of functions for integrating the popular statistical computing language R with BigInsights.
- IBM Accelerator for Machine Data: This tool provides libraries to speed the development of analytic solutions for data from smart meters, factory floor machines and sensors in vehicles, among others.
- Other considerations: Interoperability, resiliency and manageability are also challenges with open source Hadoop. These are addressed by the InfoSphere BigInsights platform.

Logical architecture capabilities

Finally, Figure 7 elaborates the logical architecture, detailing the capabilities required across the oil and gas enterprise for a successful big data and analytics rollout.

Figure 7. Logical Architecture Capabilities for Big Data and Analytics: data sources from across the architecture levels (corporate, third-party, transactional application, operations, machine and sensor, image and video, enterprise content, social and Internet data) feed data acquisition and application access; shared operational information (master and reference data, data integration, real-time analytical processing, content and activity hubs, metadata catalog) and analytical sources (data warehouses and marts, ODS, Hadoop landing and archive zones, in-memory accelerators, analytical appliances, cubes, exploration sandboxes, indexing, real-time insights) are exposed through a semantic layer and information access and delivery services; actionable insight is delivered through decision management (rules, planning and forecasting, real-time decision management), reporting, analysis and content analytics (budgeting, reporting, collaboration, scorecards and dashboards, query and analysis, storytelling, alerting and monitoring), discovery and exploration (search) and predictive analytics (simulation, mining, text analytics, optimization, correlations); the whole is underpinned by corporate, IT and information governance, security, privacy and business continuity management, running on private cloud, public cloud, appliance and custom hardware platforms.

Information systems architecture implementation

No company will begin the journey of implementing its information systems architecture from a blank slate, so the architecture diagrams presented here are best viewed as desired outcomes or end-state artifacts. Depending on the current maturity of the company, a tailored implementation roadmap for the relevant capabilities can be put in place. Questions such as "Can we integrate ERP data with our sensor data?", "Can we do analytics on data as it occurs?" and "Do we have optimally performing, cost-effective repositories for all of our data?" will help to clarify the current level of maturity. A traditional maturity model, shown in Figure 8, can also be used to illustrate the current state of maturity and to perform the gap analysis against the desired state in order to plan the transformation roadmap.

Figure 8. Information Systems Architecture Maturity Model (value increases with maturity): 1 Instrumented: implement field instrumentation for surveillance of critical points in the production system; 2 Integrated: integrate operations and enterprise data for a cross-functional view; 3 Intelligent: intelligent operations that monitor critical performance factors and enable rapid response; 4 Predictive: predictive operations that enable proactive management of the production system; 5 Optimized: optimize resource recovery, modeling and implementing systemic changes to enhance profitability and realize full value.

IBM offers a unique value proposition

IBM presents a unique value proposition to oil and gas clients to help them use leading industry information systems technologies to improve productivity, profitability and safety. IBM does this through:

- Consulting services and systems integration to help plan and deliver on the people, process and technology elements of any information system or big data and analytics implementation
- A robust, enterprise-class relational database system, such as IBM DB2 software, which offers relational, columnar and in-memory data storage and retrieval
- A comprehensive set of technologies for big data repositories, including a Hadoop distribution with InfoSphere BigInsights, the Netezza data warehouse appliance, and IBM Cloudant, a NoSQL cloud service
- A comprehensive set of technologies for big data analytics, including real-time streaming capability with the InfoSphere Streams platform, predictive analytics with SPSS software, decision and optimization software with the IBM ILOG Optimization Decision Manager, and the Watson family of cognitive technologies
- Software that provides governance, lifecycle management, archiving and security for all levels of enterprise data
- Specialized infrastructure, optimized for the performance and scalability of big data and analytics workflows, including those associated with Hadoop (both open source and BigInsights), the Netezza data warehouse appliance, and SAP HANA
- Cloud solutions that provide the flexibility to scale costs with usage, shift expenses from capital to operational and quickly realize business benefits; IBM cloud solutions support public, private and hybrid approaches and consumption models from software as a service (SaaS) to platform as a service (PaaS) and infrastructure as a service (IaaS)
- Mobile solutions that provide information systems engagement and analytical insights for workers in the field
- Systems support and services, from customized, cloud-optimized IaaS with on-premises support to managed services on installations hosted by IBM
- IBM Business Partners, a broad range of independent software vendors, open-source developers and systems integrators that provide the deep ecosystem required to deliver enterprise-scale big data and analytics

Glossary of relevant terms

ACID: Set of properties that guarantee transactions are processed reliably, with atomicity, consistency, isolation and durability
Columnar: Database system that stores data as columns rather than as rows, as the relational schema does
DW appliance: Integrated compute, storage, memory and software pre-optimized for data warehousing on a single node
Hadoop: Open-source framework for storage and processing of large-scale data on clusters of commodity hardware, using a massively parallel file system such as the Hadoop Distributed File System
ISA-95: International standard for developing an automated interface between enterprise and control systems
ISO 15926: Standard for data integration and exchange for process industries
JSON: JavaScript Object Notation, an open standard used to transmit key-value pairs
Key-value pair: A pair tuple: <attribute name, value>
MapReduce: Programming model used to store and search data in the Hadoop framework
MIME: Internet standard for email messages, headers and attachments
MPP file system: A distributed file system that supports massively parallel processing (MPP)
NoSQL: "Not only SQL" database storage for semi-structured and unstructured, non-relational data
OLAP: Online analytical processing
Ontology: Hierarchy of concepts within a domain, often represented in web ontology language (OWL)
Relational: Data structured according to the relational calculus, such as relationally normalized data
Semi-structured data: Data that can be structured according to a schema other than relational; examples are XML and time series
Structured data: Data that can be structured according to a relational schema
Time series: Series of key-value pairs for a given function over time: <Timestamp, Function Value>
Unstructured data: Data that is schema-less
Well logs: Depth-dependent time series for geological formations in a well or borehole

For more information

To learn more about information systems architecture for the oil and gas industry, please contact your IBM representative or IBM Business Partner, or visit the following website: ibm.com/industries/chemicalspetroleum/?lnk=msois-cape-usen

About the author

Ross Collins is Chief Technical Officer, Natural Resources Sector, IBM Australia, and Chief Technical Officer, Chemicals & Petroleum, IBM.

Copyright IBM Corporation 2014

IBM Corporation, Sales and Distribution Group, Route 100, Somers, NY

Produced in the United States of America, November 2014

IBM, the IBM logo, ibm.com, Netezza, InfoSphere, DB2, Lotus Notes, Cognos, Cloudant, BigInsights, Watson, SPSS, and ILOG are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Microsoft, Windows and Windows NT are trademarks of Microsoft Corporation in the United States, other countries, or both. Other product, company or service names may be trademarks or service marks of others.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary. It is the user's responsibility to evaluate and verify the operation of any other products or programs with IBM products and programs.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Please Recycle

CHE03009-USEN-00