Big Data Storage Challenges for the Industrial Internet of Things Shyam V Nath Diwakar Kasibhotla SDC September, 2014
Agenda Introduction to IoT and Industrial Internet Industrial & Sensor Data Big Data Storage Challenges Ingestion / Storage Retrieval / Consumption Use Cases Wrap up 2
About Shyam Principal Architect Analytics Board of Director (SIGs), 30K+ member User Group (IOUG) Started the IoT/ Industrial Internet Meetup in East Bay in June 2014, started other BI/Analytics related user groups Worked in IBM, Deloitte, Oracle and Halliburton, prior to GE Under grad from IIT (India), MS (Computer Science) and MBA (FAU) Regular speaker in large events like Oracle Openworld, Collaborate, BIWA Summit on IoT, Business Analytics and Data Warehousing / Engineered Systems related topics 3
About Diwakar Principal Architect GE Aviation Worked at Oracle, EMC, Pivotal prior to GE Regular speaker in large events like Exa SIG and Oracle Openworld Expertise: Data Integration, Database Appliances, Big Data, IOT 4
The Hype Cycle Gartner July 2013 The 2013 Hype Cycle features Internet of Things, machine-to-machine communication services, mesh networks: sensor and activity streams. 5 http://www.gartner.com/newsroom/id/2575515
What are the Things? 6
Big Data and IoT ERP/CRM 7
Different Views of Aircraft - as collection of sensors 8 http://www.flightglobal.com/cutaways/civil/phenom-300/
Data from Jet Engine General Electric Company, 2013. All Rights Reserved. 9
Making Sense of the Sensors EGT = Exhaust Gas Temperature The temperature of the exhaust gases as they enter the tail pipe, after passing through the turbine A good indicator of the health of engine (just like human body temperature) Recording and interpreting the EGT can help to detect several jet engine problems. 10
Industrial Internet: Big Data Analytics Delivering sharper insights to users Ingest massive volumes of with parallelization Bring analytics to and vice versa Elastically execute on large-scale requirements Innovative analytics models Various sources Enterprise (operational and business) Data, Industrial Data & External Data 11
Wind Farms Explained Via Visuals! 12 1
13 1
14 General Electric Company, 2013. All Rights Reserved.
Cloud for efficiency and agility Going mobile: anytime/ anywhere Access End-to-end Security Predictive insights from Big Data Cloud based Integrated Asset Management Industrial Internet computing requirements 021010308 013161090 040109010 104078050 Consistent and meaningful User experience Transition to Brilliant machines 15
Apply Batch or Real-Time Analytics to the Machine-Generated Data 16 General Electric Company, 2013. All Rights Reserved.
Industrial Big Data fast and vast 50B Machines will be connected on the internet by 2020 2X Industrial growth within next 10 years Sensor Historian CRM, ERP, etc. Geo-location Content (images, videos, manuals, etc.) Machine Logs Social network 35GB Data per day from each Smart Meter 50X Data growth in healthcare (2012 2020) 1TB Data per flight In practice only 3% of potentially useful is tagged and even less is analyzed* 9MM Data points per hour for each locomotive 500GB Data per blade by gas turbines *Source: IDC *Source: IDC 17
Today s approaches are not prepared for onslaught of Industrial Big Data Too slow Too expensive Too rigid 80% of an analytics project typically involves gathering and then preparing the for analysis* *Source: IDC 18
Yesterday s warehouse architecture What is it telling me? How is it doing? How does it look? 1 2 All over the place Data across multiple locations Limited types Mostly structured and semi-structured types Data scientist Field operations Business analyst O NE STATIC DATA MO DEL 3 Snapshot Limited to narrow snapshots and time CRM, ERP, etc. Logs Social network Geo-location TRADITIO NAL DATA WAREHO USE 19
New approach Industrial Data Lake architecture How long will it last without failures or maintenance? Is my asset performing optimally? How to configure for best operational results? Is my asset ready when there is market opportunity? 1 2 3 One place Access to all in one place to quickly respond to the speed of business change Any Handing of all types including documents, images machine, sensor All Access to real-time and historical and not limited to snapshot of Sensor Data scientist Field operations Business analyst Content (images, videos, manuals, etc.) FL EX IBL E DATA MODEL S INDUSTRIAL DATA LAKE Machine Historian CRM, ERP, etc. Logs, click streams Social network Geolocation Rapid access to all for analytics 20
Industrial Data Lake Sensor Content (images, videos, manuals, etc.) Business analyst Data scientist Field operations Historian Machine Industrial Data Lake CRM, ERP, etc. Logs Fast ingestion, storage and compute High performance analysis Optimized for mission-critical workloads Geo-location Social network Data governance and federation 21
Aviation and Big Data GE expects the collection to grow to 10 million flights and 1,500 terabytes of full flight operational by 2015. 22
Pivotal Architecture HAWQ Advanced Database Services Pivotal HD Enterprise Resource Management & Workflow Yarn Zookeeper HBase Xtension Framework ANSI SQL + Analytics Catalog Services Dynamic Pipelining Hadoop Virtualization (HVE) HDFS Query Optimizer Pig, Hive, Mahout Map Reduce Deploy, Configure, Monitor, Manage Command Center Sqoop Data Loader Flume Apache Pivotal HD Enterprise HAWQ 23
Q&A Shyam Nath Shyam.Nath@GE.com Diwakar Kasibhotla Diwakar.kasibhotla@GE.com Thank You!