Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights
Big Data, Advanced Analytics: WHAT, WHY, HOW WHAT is Big Data? WHY: The Value of Big Data Technologies in Threat Identification, Situational Awareness, and Geospatially Enabled Applications HOW: Integrating Big Data Analytics with Spatial, Semantic and Traditional Business Intelligence Technologies HOW: Engineered Systems Simplify the Creation and Configuration, and lower the Cost of Big Data Environments 2 Copyright 2012, Oracle and/or its affiliates. All rights
Global Digital Data Growth Growing leaps and bounds by 40+% YoY! 2009 =.8 Zetabytes =.08 ZB Structured Data =.72 ZB Unstructured Data 2020 = 35 Zetabytes LEGEND Structured Data Unstructured Data = 3.5 ZB Structured Data = 31.5 ZB Unstructured Data* (1 Zetabyte = 1 Trillion Gigabytes) Chart conservatively assumes a constant 9:1 ratio of unstructured data vs. structured data (based upon IDC s estimate that 90% of all digital data is unstructured). Chart does not reflect IDC s projection that unstructured data is currently growing twice as fast as structured data at the rate of 63.7% vs. 32.3% CAGR. Source: IDC Digital Universe Study, A Digital Universe Decade Are Your Ready?, 2010 3 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data - Diverse types of data Sensor data Clickstream data (weblogs) Social network data and logs Imagery Video surveillance feeds 4 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data - Diverse Types Big Data: Decisions based on all your data Social Data Semantic Graphs Connect the Dots Documents Machine-Generated Data Geospatial Where is it Happening? In What Context? Video 5 Copyright 2012, Oracle and/or its affiliates. All rights
Definition: What is Big Data Anyway? The 4 Vs of Big Data? Volume It s about 100s TeraBytes and Petabytes... BUT could be smaller volumes size is in the eye of beholder Variety Highlights the new types of data from the internet of things that can be stored and analysed Velocity Relates to streams of high frequency data arriving e.g. From a sensor, network, Imagery, UAV, or high volume event Value At volume low value data can highlight high value patterns, trends and insights of significant commercial value 6 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data in Public Sector Examples of Different Program Areas Use Cases 1. Fraud Prevention 2. Maintenance & Utilities 3. Constituent Sentiment 4. Threat Identification 5. Economic Analysis 6. Healthcare 7. Regulatory Compliance, Licensing & Enforcement 8. Open Government 9. Tax Collections 7 Copyright 2012, Oracle and/or its affiliates. All rights 7
Big Data Poses Big Challenge For Military Intelligence The amount of data generated is reaching epic proportions Number of sensors deployed on the ground or aloft on unmanned aerial vehicles (UAVs) and satellites continues to soar, as does the ability to keep this asset in place for longer periods Volume of data being gathered for intelligence, surveillance and reconnaissance (ISR) is already huge. In Afghanistan, ISR acquisition systems gather more than 53TB of data every day. There s a gap between our growing ability to collect data and our limited ability to process that data. We are collecting 1,500% more data than we did 5 years ago. At the same time, our head count has barely risen. Gen. Robert Kehler, commander of the U.S. Strategic Command, late 2011 Throughout the defense industry, there s a concerted effort to deal with socalled big data. There is a recognition that it is simpler to resolve the problems with technology than to boost the staff of skilled analysts. Source: Terry Coslow, Defense Systems, 29 March 2012 8 Copyright 2012, Oracle and/or its affiliates. All rights
What Can Federal Agencies Do With Big Data? Example: Weapons of Mass Destruction Tracking Combining Information on: Goods Materials People Money Transportation To give analysts awareness never before possible 9 Copyright 2012, Oracle and/or its affiliates. All rights
Secur ed Example: DHS Risk Analysis External Data Sources Transactional & Operational Systems Contents Repository Databases Web resources Blogs, Mails, news Real-time Data Streams Search, Presentation, Report, Visualization, Query Automatic Responses and Publishing SMS Console Alerts EV Grid Management Enterprise Data Management Infrastructure GeoSpatial Documents Historical Records POIs Demographics Customer Data Call Records Workflow Initiation Real-time Dashboards 10 Copyright 2011, Oracle and/or its affiliates. All rights
Maintenance and Utilities Structured, but massive data sets Condition Based Maintenance (CBM+) relies on sensors to collect information about the vehicle condition The condition data is combined with other data such as operational use and cost data Fleet-wide data sets need to be analyzed for trends that lead to increased costs, but analysts do not always know what they are looking for Big Data can help direct the analysts to the areas of most bang-for-the-buck Utilities can use these to spot trends in the massive time series data from millions of smart meter readings and lower costs or prevent blackouts 11 Copyright 2011, Oracle and/or its affiliates. All rights 11
Non-Defense & Security Uses Use Cases National Institutes of Health Disease outbreak detection & tracking USDA Food Stamp Fraud detection/tracking Housing and Urban Development Development and use of affordable housing 12 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data Now Lets Go into HOW Given you know what it is Why You want to use it HOW can you identify all the technologies you need HOW to get all those technologies in an Engineered Solution for you 13 Copyright 2012, Oracle and/or its affiliates. All rights
Hadoop, MapReduce and Related Technologies Hadoop, HDFS, Hive, MapReduce, HBase Mahout, Sqoop, Zookeeper, Flume, Oozie, Pig, Whirr NoSQL, Key Value Stores What are these? When and why are they important? For what Use Cases? What Problems are YOU trying to solve? And Oracle really supports all of these Open Source Technologies? 14 Copyright 2012, Oracle and/or its affiliates. All rights
Hadoop Architecture Management/Monitoring Distributed file system MapReduce Map/Reduce programming paradigm Highly scalable data processing Hadoop Distributed File System (HDFS) 15 Copyright 2012, Oracle and/or its affiliates. All rights
In-Database Analytics Oracle Integrated Solution Stack for Big Data HDFS Hadoop (MapReduce) Oracle NoSQL Database Oracle Big Data Connectors Data Warehouse Analytic Applications Enterprise Applications Oracle Data Integrator ACQUIRE ORGANIZE ANALYZE DECIDE 16 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data Processing & Query Characteristics Batch-Oriented Process data to use Real-Time Deliver a service Bulk storage Fast access to specific record Write once, read all Read, write, delete update 17 Copyright 2011, Oracle and/or its affiliates. All rights
Big Data Storage Options: Batch vs. Real-Time Hadoop Distributed File System (HDFS) File System Parallel scanning No inherent structure High volume writes Batch Oriented NoSQL Database (Oracle ) Database Indexed storage Simple data structure High volume random reads and writes Real-Time 18 Copyright 2011, Oracle and/or its affiliates. All rights
RDBMS & NoSQL Used Alongside Each Other RDBMS High value, high density, complex data NoSQL architectures Low value, low density, simple data Complex data relationships Schema-centric Designed to scale up & out Lots of general purpose features/functionality High overhead ($ per operation) Very simple relationships Schema-free, unstructured or semistructured data Distributed storage and processing Stripped down, special purpose data store Lower overhead ($ per operation) 19 Copyright 2012, Oracle and/or its affiliates. All rights
Oracle NoSQL Database A distributed, scalable key-value database Simple Data Model Key-value pair with major+sub-key paradigm Read/insert/update/delete operations Scalability Dynamic data partitioning and distribution Optimized data access via intelligent driver High availability One or more replicas Disaster recovery through location of replicas Resilient to partition master failures No single point of failure Transparent load balancing Reads from master or replicas Driver is network topology & latency aware Application NoSQLDB Driver Storage Nodes Data Center A Application NoSQLDB Driver Storage Nodes Data Center B 20 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data Apparent Disconnect Two Separate Developer Worlds Unstructured Data Hadoop, HDFS, Hive, MapReduce, HBase Mahout, Sqoop, Zookeeper, Flume, Oozie, Pig, Whirr Distributed File Systems Transaction (Key-Value Stores) Acquire Big Data Environments Organize MapReduce Solutions Traditional Environments Analyze NoSQL Flexible Specialized Developer Centric Structured Data DBMS (OLTP) Extract Transform and Load (ETL) DBMS (DW) Advanced Analytics SQL Trusted Secure Administered Acquire Organize Analyze 21 Copyright 2012, Oracle and/or its affiliates. All rights
Oracle Data Integration for Big Data Improving Productivity and Efficiency for Big Data Transforms Via MapReduce(HIVE) Oracle Data Integrator Oracle Loader for Hadoop Activates Loads Benefits Consistent tooling across BI/DW, SOA, Integration and Big Data Reduce complexities of processing Hadoop through graphical tooling Improves productivity when processing Big Data (Structured + Unstructured) Oracle Big Data Appliance Hadoop, HDFS, Hive, MapReduce, HBase Mahout, Sqoop, Zookeeper, Flume,Oozie, Pig, Whirr Oracle Exadata 22 Copyright 2012, Oracle and/or its affiliates. All rights
Hadoop, HDFS, Hive, MapReduce, HBase Mahout, Sqoop, Zookeeper, Flume, Oozie, Pig, Whirr Unstructured Data Structured Data Oracle s Approach To Big Data Provide all Technologies, Engineered Together Distributed File Systems Oracle Big Data Appliance Transaction (Key-Value Stores) Acquire DBMS (OLTP) Acquire Big Data Environments Organize MapReduce Solutions Traditional Environments Extract Oracle Transform Exadata and Load Database (ETL) Machine Organize DBMS (DW) Analyze Advanced Analytics Analyze Oracle Exalytics In-Memory Machine NoSQL Flexible Specialized Developer Centric SQL Trusted Secure Administered 23 Copyright 2012, Oracle and/or its affiliates. All rights
Cloud Computing Oracle Engineered Systems 24 Copyright 2011, Oracle and/or its affiliates. All rights Insert Information Protection Policy Classification from Slide 8
Big Data and Cloud Computing Next Generation Enterprise Data Platform CIO Big Data Private Cloud Drive Business Value Deliver Top Line Growth Drive Business Efficiency and Agility Deliver Bottom Line Savings Business Value Secure 25 Copyright 2011, Oracle and/or its affiliates. All rights
Unstructured / Structured Metadata? How are we solving this? Leverage Hadoop to undertake alignment of structured and unstructured content to defined domain vocabularies Reconcile bottom-up differences in heterogeneous data types and domain semantics, with top-down business analysis and workflows definitions Leverage combined Big Data and RDBMS computing platforms to: Deliver massive pre-processing requirements Provide rich BI analysis, search and discovery 26 Copyright 2011, Oracle and/or its affiliates. All rights
SEMANTICS: RDF Graph Metadata Addresses metadata alignment challenge Apps RDF metadata (data & rules) Instance Data RDF introduces an enterprise metadata framework. The metadata graph associates underlying data, process and domain models based on their semantics. Linking of resources (via metadata registries) enables interoperability between apps that exchange data. 27 Copyright 2012, Oracle and/or its affiliates. All rights
Analysis with R R workspace console Oracle Database 11g statistics engine OBIEE, Web Services Function push-down data transformation & statistics Transparently leverage Hadoop for High Performance Analytics to Oracle Big Data Appliance (Oracle R Connector for Hadoop) 28 Copyright 2012, Oracle and/or its affiliates. All rights
Predictive Analytics with R, BI, Data Mining R workspace console Oracle statistics engine Function push-down data transformation & statistics OBIEE, Web Services No changes to the user experience Scale to large data sets Embed in operational systems Development Production Consumption 29 Copyright 2012, Oracle and/or its affiliates. All rights
Oracle In-Database Analytics Platform Oracle R Enterprise Oracle Data Mining Spatial Analytics Text and Search RDF Graph Analytics SQL Analytics Parallel Processing Engine XML Relational OLAP Spatial Data Layer RDF Media 30 Copyright 2012, Oracle and/or its affiliates. All rights
In-Database Analytics Oracle Complete Big Data Platform Big Data Exadata Exalytics Appliance Exadata Exalytics Big Data Appliance Cloudera s Distribution including Apache Hadoop Oracle NoSQL Database Open Source R Applications Oracle Big Data Connectors InfiniBand Oracle Data Integrator Oracle Advanced Analytics Data Warehouse Oracle Database InfiniBand Analytic Applications Alerts, Dashboards, MD- Analysis, Reports, Query Web Services BI Abstraction ACQUIRE ORGANIZE ANALYZE DECIDE 31 Copyright 2012, Oracle and/or its affiliates. All rights
Why Build A Hadoop Appliance? Time to Build Optimizations Maintenance 32 Copyright 2011, Oracle and/or its affiliates. All rights
Required Skills for MapReduce Development Java Hadoop Framework Parallel Algorithms & Manage Large Clusters of Machines 33 Copyright 2011, Oracle and/or its affiliates. All rights
Batch-Oriented Processing on Hadoop INPUT 1 OUTPUT 1 REDUCE REDUCE REDUCE REDUCE REDUCE SHUFFLE /SORT REDUCE REDUCE SHUFFLE /SORT INPUT 2 SHUFFLE /SORT REDUCE REDUCE SHUFFLE /SORT REDUCE SHUFFLE /SORT REDUCE REDUCE REDUCE OUTPUT 2 34 Copyright 2011, Oracle and/or its affiliates. All rights
Oracle Big Data Appliance Optimized and Complete All you need to store and integrate big data Integrated with Oracle Exadata Analyze all your data Easy to Deploy Risk Free, Quick Installation and Setup Single Vendor Support Full Oracle support for all hardware and software 35 Copyright 2012, Oracle and/or its affiliates. All rights
Oracle Exadata Database Machine Data Warehousing, Transaction Processing, Consolidation Fastest Data Warehouse & OLTP Best Cost/Performance Data Warehouse & OLTP Optimized Hardware (per rack) Processor: up to128 Intel Cores and 2 TB DRAM Network: 880 GB/Sec Throughput Storage: 5 TB Flash and up to 336 TB Disk Software Breakthroughs Exadata Smart Storage Grid Smart Flash Cache Hybrid Columnar Compression Parallel Scale-Out Database and Storage Scales from ¼ Rack to 8 Full Racks 36 Copyright 2012, Oracle and/or its affiliates. All rights
Oracle Exalytics In-Memory Machine First engineered system for analytics Visual Analysis without limits Smarter analytic applications 37 Copyright 2011, Oracle and/or its affiliates. All rights
Cloud Computing Oracle Engineered Systems 38 Copyright 2011, Oracle and/or its affiliates. All rights Insert Information Protection Policy Classification from Slide 8