An Integrated Analytics & Big Data Infrastructure
September 21, 2012
Robert Stackowiak, Vice President, Data Systems Architecture
Oracle Enterprise Solutions Group

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Common Data Warehousing Areas of Focus
Typical Structured Data Analysis Today
AUTOMOTIVE: Mfg cost analysis; Cost of service / warranty analysis
COMMUNICATIONS: Logistics optimization; Network analysis
CONSUMER PACKAGED GOODS: Supplier & channels analysis; Consumer trends
FINANCIAL SERVICES: Risk & portfolio analysis; Customer analysis
EDUCATION & RESEARCH: Cost of facilities & staff analysis; Academic & alumni profile
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Customer & distributor analysis; Mfg cost analysis
LIFE SCIENCES: Clinical trials; Cost of research & production
MEDIA / ENTERTAINMENT: Viewer & channels analysis; Venue optimization
ON-LINE SERVICES: Customer analysis & site statistics; Cross-sell / up-sell
HEALTH CARE: Cost of care; Quality of care; Staffing analysis
OIL & GAS: Drilling exploration costs & logistics optimization
RETAIL: Market basket analysis; Supply chain optimization; Real estate optimization
TRAVEL & TRANSPORTATION: Equipment & crew logistics & routing optimization; Customer analysis
UTILITIES: Equipment logistics optimization; Customer analysis; Grid cost of delivery
LAW ENFORCEMENT & DEFENSE: Logistics optimization; Crime statistics analysis
Challenged by: Growing Data Volume & More Complex Analysis Needs

Sources for the Data are Growing
383+ Million Twitter accounts (100m+ tweeting)
835+ Million Facebook subscribers
1.2+ Billion Mobile Web users
Sensors everywhere

Structured Data & Big Data Structured data from applications. Semi-structured Big Data from social media and logs, sensors, feeds, etc.

Big Data Fills Out the Complete Picture
AUTOMOTIVE: Auto sensors reporting location, problems
COMMUNICATIONS: Location-based advertising
CONSUMER PACKAGED GOODS: Sentiment analysis of what's hot, problems
FINANCIAL SERVICES: Risk & portfolio analysis
EDUCATION & RESEARCH: Experiment sensor analysis
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
LIFE SCIENCES: Clinical trials; Genomics
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Web-site optimization
HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
OIL & GAS: Drilling exploration sensor analysis
RETAIL: Consumer sentiment; Optimized sales & marketing
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
UTILITIES: Smart Meter analysis
LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis
Challenged by: Data Volume, Velocity, Variety in finding Value

Typical Stages in Analytics
Choosing the Right Solutions for Right Data Needs
Discover and Explore (growing investment here)
Query and Analyze
Dashboard and Report
Model and Plan
Predict (growing investment here)

Challenges & Strategies
Challenge: Fragmented Solutions
Strategy: Specialized but integrated data stores and tools
Challenge: Difficulty of Self-Service BI
Strategy: Flexible, guided, automated, easy-to-use tools, data discovery
Challenge: Data Not Current
Strategy: Solutions for Just-in-Time well-understood data
Challenge: Time to ROI / Development Time
Strategy: Horizontal and industry pre-built solutions, appliance-like solutions & Cloud solutions
Challenge: Rapidly Growing Diverse Data & User Communities
Strategy: Enterprise class solutions serving 1000s of users optimized for diverse workloads and providing petabytes of data
Challenge: Deployment Manageability, Security & Expense
Strategy: Pre-integrated solutions that are centrally managed with advanced security / governance; Consolidation where possible to reduce platform footprint space & power

An Information Architecture that includes Big Data Source Data Layer Processes Enterprise Data Warehouse Staging Data Layer Information Access Performance Management COTS/ERP External Social/Text Sensors Streaming Strongly Typed Data Data Quality Weakly Typed Data Knowledge Discovery Layer Foundation Layer Data Mining Sandbox Enterprise Data with full history Performance Layer Embedded Data Marts Rapid Dev Sandbox BI Abstraction & Query Federation Alerts, Dashboards, Reporting Services Information Discovery Advanced Analysis & Data Science Security and Metadata Data Integration
Oracle Analytics Software Components Unstructured Data / Sparse Data of Value Structured Data / Highly dense data Cloudera Hadoop Oracle NoSQL DB Oracle Transactional Database & Applications Acquire Oracle Data Integrator / Connectors Organize Oracle Data Warehouse & Embedded Analytics Endeca Information Discovery Oracle BI Foundation Suite Analyze & Decide
& Engineered Systems Unstructured Data / Sparse Data of Value Structured Data / Highly dense data Cloudera Hadoop Big Data Appliance Oracle NoSQL DB Exadata Oracle Transactional Platforms Database & Applications Acquire Oracle Data Integrator / Connectors Organize Oracle Data Warehouse & Embedded Analytics Endeca Information Discovery Exalytics In-Memory Machine Oracle BI Foundation Suite Analyze & Decide
Oracle's Analytics Platforms Oracle Big Data Appliance Oracle Exadata Oracle Exalytics InfiniBand InfiniBand Stream Acquire Organize Analyze & Visualize Expedited time to value Easier to manage and upgrade Lower cost of ownership Reduced change management risk One-stop support Extreme performance

Big Data Appliance
Big Data for the Enterprise
Foundation Software:
Oracle Linux
Oracle Java VM
Cloudera Apache Hadoop Distribution
Cloudera Manager
Oracle NoSQL Database Community Edition
Application Software:
Oracle NoSQL Database Enterprise Edition
Oracle Big Data Connectors (new):
Oracle Loader for Hadoop
Oracle Direct Connector for HDFS
Oracle Data Integrator Application Adapter for Hadoop
Oracle R Connector for Hadoop
Hardware:
18 Sun X4270 M2 Servers
48 GB memory per node = 864 GB memory
12 Intel cores per node = 216 cores
36 TB storage per node = 648 TB storage
40 Gb/sec InfiniBand
10 Gb/sec Ethernet

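The rack totals on this slide follow directly from the per-node figures. A minimal sketch that reproduces the arithmetic, using only the numbers quoted on the slide (the function and constant names are illustrative, not part of any Oracle tooling):

```python
# Per-node figures as listed on the slide for the 18-server rack.
NODES = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12
STORAGE_TB_PER_NODE = 36


def rack_totals(nodes=NODES):
    """Return aggregate memory (GB), cores, and raw storage (TB) for the rack."""
    return {
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "storage_tb": nodes * STORAGE_TB_PER_NODE,
    }


totals = rack_totals()
print(totals)  # {'memory_gb': 864, 'cores': 216, 'storage_tb': 648}
```

The storage figure is raw capacity; usable capacity after HDFS replication would be lower and is not stated on the slide.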
Input Oracle Loader for Hadoop Partition and transform into Oracle ready format Load Query.. Table Input Oracle Loader for Hadoop
Oracle Data Integrator & OLH
Oracle R Connector for Hadoop
R package that provides an interface between the local R environment, Oracle Database, and Hadoop
Using simple R functions, copy data between R memory, the local file system, Oracle Database, and HDFS
Schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations

Oracle Database 11g Data Warehousing The Leading Database for Data Warehousing Key Data Warehousing Capabilities Flexible Model Deployment Embedded Analytics Advanced Analytics (R & Data Mining) OLAP Single Point of Management Secure 24X7 Availability Optimal Storage Management Scaled to petabytes & large business analyst communities
Exadata Hardware Architecture
Database Grid: 8 x dual-processor x64 database servers, OR 2 x eight-processor x64 database servers
InfiniBand Network: 3 x 36-port 40 Gb/s switches; unified server & storage network
Intelligent Storage Grid: 14 high-performance low-cost storage servers; 100 TB High Performance disks, or 504 TB High Capacity disks; 22.4 TB PCI Flash; Intelligent Storage Server Software

Exadata Storage Server Software Innovations
Intelligent storage: Scale-out InfiniBand storage; Smart Scan query offload
Hybrid Columnar Compression: 6-10x compression for warehouses; 10-15x compression for archives
Smart PCI Flash Cache: Accelerates random I/O up to 30x; Triples data scan rate
Data remains compressed for scans and in Flash
Benefits cascade from the primary DB to copies: standby, test, dev, backup

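To make the quoted compression ranges concrete, here is a small sketch of the footprint arithmetic. The 6-10x and 10-15x ratios come from the slide; the 100 TB starting size is a hypothetical chosen only for illustration, and actual ratios depend on the data.

```python
# Compression ratio ranges quoted on the slide (low, high).
WAREHOUSE_RATIOS = (6, 10)
ARCHIVE_RATIOS = (10, 15)


def compressed_range_tb(raw_tb, ratios):
    """Return (smallest, largest) compressed size in TB for a ratio range."""
    low_ratio, high_ratio = ratios
    # The highest ratio gives the smallest footprint, and vice versa.
    return raw_tb / high_ratio, raw_tb / low_ratio


RAW_TB = 100  # hypothetical uncompressed data size, not from the slides

warehouse = compressed_range_tb(RAW_TB, WAREHOUSE_RATIOS)
archive = compressed_range_tb(RAW_TB, ARCHIVE_RATIOS)
print(f"Warehouse: {warehouse[0]:.1f} to {warehouse[1]:.1f} TB")
print(f"Archive:   {archive[0]:.1f} to {archive[1]:.1f} TB")
```

Because the slide says the benefits cascade to standby, test, dev, and backup copies, the same reduction would apply to each copy that keeps the data compressed.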
Roles for Data Warehouse & Middle Tier BI
Data Warehouse:
Optimized storage for enterprise data volumes
Exceeds performance & availability SLAs
Persistent & secure version of the truth
Flexible schema
IT timeframes for solutions
Middle-Tier BI:
Optimized for information delivery
Quality of data visualization is key
Discovery, scenario modeling, scorecards
Dimensional-style self-guided analysis
Easy to add new sources of data

Oracle Exalytics & BI Foundation Suite Platform TimesTen for Exalytics Adaptive In-Memory Tools 1 TB RAM 40 Processing Cores High Speed Networking Essbase In-Memory Oracle Business Intelligence Foundation Suite In-Memory Analytics Exalytics Hardware
End-user Experience with Exalytics Speed of Thought Interactive Analysis Highly Interactive Analysis Free Form Data Exploration View Auto Suggestions Contextual Actions
TimesTen In-Memory Database for Exalytics
Better Analytics Support
OLAP Grouping Operators: CUBE, ROLLUP, GROUPING SETS
WITH Clause
Analytic Functions: RANK, DENSE_RANK, SUM, AVG, ORDER BY NULLS FIRST / LAST
Time functions: TIMESTAMPADD, TIMESTAMPDIFF
Columnar Compression

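For readers unfamiliar with the operators listed on this slide, the following plain-Python sketch emulates what GROUP BY ROLLUP and the RANK / DENSE_RANK analytic functions compute. It is not TimesTen code and does not call any database; the sales rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
SALES = [
    ("East", "A", 100),
    ("East", "B", 50),
    ("West", "A", 70),
    ("West", "B", 50),
]


def rollup(rows):
    """Emulate GROUP BY ROLLUP(region, product) with SUM(amount).

    None marks a rolled-up (subtotal) column, as NULL does in SQL.
    """
    totals = defaultdict(int)
    for region, product, amount in rows:
        totals[(region, product)] += amount  # detail level
        totals[(region, None)] += amount     # subtotal per region
        totals[(None, None)] += amount       # grand total
    return dict(totals)


def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples ordered by value descending."""
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        if value != previous:
            rank = position  # RANK leaves gaps after ties
            dense += 1       # DENSE_RANK does not
            previous = value
        result.append((value, rank, dense))
    return result


print(rollup(SALES)[("East", None)])          # 150
print(rank_and_dense_rank([100, 70, 70, 50]))
```

In the ranking example the two tied values share rank 2; the next value gets RANK 4 but DENSE_RANK 3, which is the difference between the two functions.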
In-Memory Analytics: Intelligent Cache Full Navigation Into Result Cache Automatically Cache Past Results Treat Cache as Logical Table Source Re-write queries on the cache Applicable to any size DW Tools BI Server result cache In-Memory store Install to create cache directories in memory 1TB RAM In-Memory Cache Data sources
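The idea of automatically caching past results can be illustrated with a toy in-memory cache. This is a sketch of the general technique only, under the assumption that results are keyed by normalized query text; it does not reflect how the BI Server result cache is implemented, and every name in it is hypothetical.

```python
class ResultCache:
    """Toy result cache keyed by normalized query text (hypothetical design)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that hits the slow data source
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(sql):
        # Collapse whitespace and case so trivially different texts share a key.
        # Simplification: this would also lowercase string literals.
        return " ".join(sql.lower().split())

    def execute(self, sql):
        key = self._normalize(sql)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = self._run_query(sql)
        self._results[key] = result
        return result


calls = []


def slow_source(sql):
    """Stand-in for a data warehouse query; records each real execution."""
    calls.append(sql)
    return [("total", 42)]


cache = ResultCache(slow_source)
first = cache.execute("SELECT SUM(x) FROM t")
second = cache.execute("select  sum(x)  from t")  # served from the cache
print(len(calls), cache.hits, cache.misses)  # 1 1 1
```

A production cache would also need invalidation when the underlying data changes and the ability to answer a query from a broader cached result, which the slide describes as rewriting queries on the cache.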
In-Memory Analytics: Adaptive Data Mart Hot data mart in memory Automatically identify Slow data sources, facts, grains Workload distribution Optimal data mart for overall performance Applicable to any size DW Clustering to expand memory Tools Summary advisor BI Server aggr. persist. to create hot data mart in TimesTen for Exalytics Incremental refresh: double buffering 1TB RAM Hot data In Memory Data sources
In-Memory Analytics: Essbase Cubes Specific Subject Areas in Memory Specify subject areas for cube spinoff High performance scenario modeling (read+write) Query acceleration Manual configuration 1TB RAM In-Memory Cube Data sources
Matured vs. New Data Analysis Processes
Matured: Acquire, Organize, Analyze, Decide
New: Acquire, Analyze, Organize, Decide

Oracle Endeca Information Discovery Helps organizations quickly explore ALL relevant data Unified Querying Endeca Information Discovery Interactive Exploration Endeca Server App Composition Faceted Data Model Integration Enrichment Combines structured & unstructured data from disparate systems Automatically organizes information for search, discovery & analysis
Oracle Exalytics & Endeca Platform Deep Search Contextual Navigation Endeca Server In-Memory 1 TB RAM 40 Processing Cores High Speed Networking Visual Analysis Oracle Endeca Information Discovery In-Memory Data Discovery Exalytics Hardware
In-Memory Data Discovery: Endeca Server Unstructured & Structured Data in Memory Specify data sources for loading into Endeca Server High performance data discovery for all types of data in the Endeca Server Manual configuration 1TB RAM In-memory Unstructured & Structured data Data sources
Making Sense of Diverse Data Sources Data Warehouse Shopping Cart Site Sensors Website Logs & Data NoSQL DB
Determine Value of Data of All Types Knowledge Discovery Engine Structured Data Warehouse Unstructured Semi-structured High Volume Distributed File System Sensors Website Logs & Data NoSQL DB
Valuable Data Found: Now Store it Securely Persistent Data Store for All Data of Value Knowledge Discovery Engine Data Warehouse Discoveries High Volume Distributed File System MapReduce code separates valued data, which is then sent via specialized adapters to the Data Warehouse Sensors Website Logs & Data NoSQL DB
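The step in which MapReduce code separates valued data can be sketched in plain Python. This is an in-process illustration of the map and reduce phases, not Hadoop code; the log format and the rule that only purchase events are of value are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical web log lines: "user_id action item_id".
LOG_LINES = [
    "u1 view sku-1",
    "u1 purchase sku-1",
    "u2 view sku-2",
    "bad line",
    "u2 purchase sku-1",
    "u3 purchase sku-3",
]


def map_phase(line):
    """Emit (item_id, 1) only for well-formed purchase events."""
    parts = line.split()
    if len(parts) == 3 and parts[1] == "purchase":
        yield parts[2], 1


def reduce_phase(pairs):
    """Sum the counts per key, as a reducer would after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


pairs = (pair for line in LOG_LINES for pair in map_phase(line))
purchases_per_item = reduce_phase(pairs)
print(purchases_per_item)  # {'sku-1': 2, 'sku-3': 1}
```

The small aggregated result, rather than the raw log, is what would then be loaded into the data warehouse through an adapter.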
Deploy Widely Available Reports & Analytics Persistent Data Store for All Data of Value + In-DB Analytics Knowledge Discovery Engine Data Warehouse BI Tools and Dashboards Enterprise-class for reporting & analysis High Volume Distributed File System MapReduce code separates valued data, which is then sent via specialized adapters to the Data Warehouse Sensors Website Logs & Data NoSQL DB
Feed the Recommendation Engine Persistent Data Store for All Data of Value + In-DB Analytics Knowledge Discovery Engine Data Warehouse BI Tools and Dashboards Update Website Recommendations High Volume Distributed File System MapReduce code separates valued data, which is then sent via specialized adapters to the Data Warehouse Real-Time Analytics and Recommendations Sensors Website Logs & Data NoSQL DB
Make Well-Tuned Real-Time Recommendations Persistent Data Store for All Data of Value + In-DB Analytics Knowledge Discovery Engine Data Warehouse BI Tools and Dashboards High Volume Distributed File System MapReduce code separates valued data, which is then sent via specialized adapters to the Data Warehouse Location & User Profile Real-Time Analytics and Recommendations Recommend Sensors Website Logs & Data NoSQL DB
Only Oracle Offers this Entire Solution Fast, Intuitive Data Discovery Endeca Information Discovery on Exalytics Unstructured Data Analysis Structured Data Analysis Reliable, Available, Secure Source of Truth Oracle Database DW on Exadata Advanced Analytics Analyst Friendly Reporting Query and Analysis Tools Oracle BI Foundation Suite on Exalytics Real-time Recommendations Cloudera HDFS on Big Data Appliance Oracle ERP & CRM Solutions on Exadata Oracle Real-Time Decisions Unstructured Data Analysis Sensors Website Logs & Data Oracle NoSQL DB
Oracle Big Data Architecture Capabilities Data Acquire Organize Analyze Decide Structured Master & Reference Transactions Machine Generated Social Media Text, Image Video, Audio DBMS (OLTP) Files NoSQL HDFS ETL/ELT ChangeDC Real-Time Unstructured Semistructured Message- Based Hadoop (MapReduce) ODS Warehouse Streaming (CEP Engine) In-Database Analytics Reporting & Dashboards Alerting & Recommendations EPM, BI, Social Applications Text Analytics and Search Advanced Analytics Interactive Discovery Management Security, Governance Specialized Hardware Big Data Cluster High Speed Network RDBMS Cluster In Memory Analytics

Oracle Delivers Value from ALL Data
Best for Business, Best for IT
Better Insights, Decisions, Actions:
From measurement to analysis, forecasting & optimization
Insights across time, functions and roles
Persistent version of the truth for ALL DATA
Most Complete, Open, Integrated:
From Discovery to Dashboards to Analytics to Data Management
Standards based & blending of Open Source components
Optimized integrated Engineered Systems & software
World Class Analytics Infrastructure:
Best of Breed capabilities at each layer of the stack
Uniquely enables complete analysis of ALL DATA
Enterprise Architecture: scalable, reliable, manageable & secure