Oracle Data Warehousing and Big Data: Strategy and Roadmap Jean-Pierre Dijcks Data Warehousing Product Management 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Oracle: Built for Data Warehousing
- OLAP, ETL, Data Mining
- Optimized for strategic warehousing: ad hoc queries over detail data
- Optimized for operational warehousing: near-instantaneous tactical, short-running queries
- Optimized for real-world data loading: round-the-clock loading with concurrent querying
- Optimized for advanced analytics: in-database analytics
- Optimized for large data sets: compression and partitioning
Data Warehouse Reference Architecture
Workload Management for Data Warehousing
- Define workload policies for mixed workloads
- Set priorities and allocate resources
- Set thresholds and throttles
- Monitor the workload
- Adjust the policies over time
- Cycle: define workload plans, execute workloads, monitor workloads, adjust workload plans
- Managed from Enterprise Manager, using Database Resource Manager
Exadata Changes the Equation
A major leap in performance, capacity and value
- Perform database queries 10x faster
- Consume 10x less storage
- Consolidate platforms, databases, power, cooling, administration
- Run existing applications unchanged
- Apply existing personnel, skills, Oracle licenses
- Eliminate systems integration trial-and-error
"Exadata is the fastest growing product in Oracle's history" (Larry Ellison, Oracle CEO)
What does Extreme Performance mean for your business?
- Massive data volumes
- More granular data: daily data instead of weekly; store data instead of account
- More history: 5 years instead of 1 year
- New data sources: consumer-level data
- Entirely new analytics: queries that were never possible now run in minutes
- Near-real-time data loading
Turkcell: 10x Compression, 10x Speedup
- 250 TB warehouse compresses to 25 TB
- Before: 10 storage racks (Hitachi USP-V, 5 racks; EMC DMX-4, 5 racks), 1 large SMP server, 250 TB raw data
- After: 1 Exadata rack, 25 TB compressed data (10:1 advantage)
- 50,000 reports run 10x faster each month (average 27 min to 3 min)
- 1.5 billion records (2-3 TB raw) loaded daily (data doubles yearly)
- Redundancy/HA built in
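The ratios quoted on the Turkcell slide can be checked with a few lines of arithmetic. This is only a sketch using the figures stated on the slide; the variable names are illustrative and not taken from the presentation.

```python
# Figures quoted on the Turkcell slide (names are illustrative).
raw_tb = 250                 # warehouse size before compression, in TB
compressed_tb = 25           # size after compression on one Exadata rack, in TB
avg_report_min_before = 27   # average report runtime before, in minutes
avg_report_min_after = 3     # average report runtime after, in minutes

compression_ratio = raw_tb / compressed_tb
report_speedup = avg_report_min_before / avg_report_min_after

print(f"compression ratio: {compression_ratio:.0f}:1")
print(f"average report speedup: {report_speedup:.0f}x")
```

The quoted averages work out to a 9x speedup, which the slide appears to round to "10x"; the 10:1 compression figure follows directly from 250 TB and 25 TB.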
In-Database Analytics
Bring Algorithms to the Data, Not Data to the Algorithms
- Analytic computations done in the database: dimensional analysis (OLAP), statistical analysis (Statistics), data mining (Data Mining)
- Scalability, security, backup & recovery, simplicity
Oracle OLAP
- In-database OLAP: embedded in the Oracle database instance
- Cubes store data in Oracle storage
- Cubes work seamlessly with security and high availability features
- Cubes can be queried using SQL or MDX
- Consolidate analytics in the database
- Quickly build analytically rich applications
- Summary management: fast query and fast, incremental updates
Oracle Industry Data Models
- Reference Data Model
- Aggregate Data Model: relational (STAR) for BI, OLAP for analytical
- Derived Data Model: data mining, complex reports, query
- Base Data Model (3NF): atomic level of transaction data
- Combine deep industry knowledge with data warehousing expertise
- Help jump-start design and implementation of data warehouses
- Optimized for Oracle Database 11g and Oracle Exadata
Extreme Performance Data Warehousing
- Integrated technology stack: BI applications, BI tools, ELT tools, data models, database, smart storage
- Single source of truth
- Extreme performance
- Lower cost of ownership
- Deeper insight
Tapping into Diverse Data Sets
- Information architectures today: decisions based on database data (transactions)
- Big Data: decisions based on all your data (video and images, social data, documents, machine-generated data)
Big Data Is About
- Tapping into diverse data sets
- Finding and monetizing hidden relationships
- Driving data-based business decisions
Big Data: Challenge to Opportunity
Harness Big Data to Increase Business Value
- Challenges today: high variety, high volume, high complexity, low latency
- Big Data platform tomorrow: deep analytics, high agility, massive scalability, real time
[Figure: business value of Big Data plotted against time]
Big Data: Infrastructure Requirements
- Acquire: low, predictable latency; high transaction count; flexible data structures
- Organize: high throughput; in-place preparation; all data sources/structures
- Analyze: deep analytics; agile development; massive scalability; real-time results
Divided Solution Spectrum
[Figure: data variety plotted against information density, across the Acquire, Organize and Analyze phases]
- Unstructured, low-density, schema-less data: distributed file systems, transaction (key-value) stores, MapReduce solutions. NoSQL: flexible, specialized, developer centric
- Schema-based, high-density data: DBMS (OLTP), ETL, DBMS (DW), advanced analytics. SQL: trusted, secure, administered
Hadoop to Oracle: Bridging the Gap
[Figure: data variety plotted against information density, across the Acquire, Organize and Analyze phases]
- Low density: HDFS; Cassandra, Voldemort, HBase, MongoDB; Hadoop (Cloudera, MapR, Hortonworks)
- Bridge: Oracle Loader for Hadoop
- High density: RDBMS (OLTP), ETL, RDBMS (DW), advanced analytics
Oracle Integrated Solution Stack
[Figure: data variety plotted against information density, across the Acquire, Organize and Analyze phases, with new products and capabilities highlighted]
- Low density: HDFS, Oracle NoSQL DB, Hadoop
- Bridge: Oracle Loader for Hadoop (OLH)
- High density: Oracle Database (OLTP), Oracle Data Integrator, Oracle Database (DW)
- In-DB analytics: R, mining, text, graph, spatial
- Oracle BI EE
Oracle Engineered Solutions
[Figure: data variety plotted against information density, across the Acquire, Organize and Analyze phases]
- Low density: Big Data Appliance
- High density: Exadata, Exalytics
Big Data Appliance: Hardware
- 18 Sun X4270 M2 servers
  - 48 GB memory per node; 864 GB memory total
  - 2 CPUs (6-core Intel) per node; 216 cores total
  - 12 x 2 TB HDD capacity per node; 432 TB raw disk total
- 3 InfiniBand switches
- 40 Gb/sec InfiniBand: 100 total ports (for internal backplane and interconnection to Exadata)
- 10 Gb/sec Ethernet: 16 total ports (for connection to datacenter)
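The rack totals on the hardware slide follow from the per-node figures, as the short sketch below confirms. The last line goes beyond the slide: it assumes data is stored in HDFS with its default replication factor of 3 and ignores operating system and temporary space, so treat that number as a rough illustration rather than a published capacity.

```python
# Per-node figures from the hardware slide (names are illustrative).
nodes = 18
memory_gb_per_node = 48
cpus_per_node = 2
cores_per_cpu = 6
disks_per_node = 12
tb_per_disk = 2

total_memory_gb = nodes * memory_gb_per_node
total_cores = nodes * cpus_per_node * cores_per_cpu
total_raw_tb = nodes * disks_per_node * tb_per_disk

# Assumption (not from the slide): HDFS default replication factor of 3.
hdfs_replication = 3
usable_tb_at_3x = total_raw_tb / hdfs_replication

print(total_memory_gb, "GB memory total")
print(total_cores, "cores total")
print(total_raw_tb, "TB raw disk total")
print(usable_tb_at_3x, "TB of unique data under 3x replication (assumption)")
```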
Big Data Appliance: Software
Big Data for the Enterprise
- Foundation software: Oracle Linux; Oracle Java VM; open-source Apache Hadoop distribution; open-source R distribution
- Application software: Oracle NoSQL Database Enterprise Edition (new); Oracle Loader for Hadoop (new); Oracle Data Integrator Application Adapter for Hadoop (new)
Big Data Appliance
Big Data for the Enterprise
- Optimized and complete: everything you need to store and integrate your lower information density data
- Integrated with Oracle Exadata: analyze all your data
- Easy to deploy: risk-free, quick installation and setup
- Single-vendor support: full Oracle support for the entire system and software set
Oracle's Big Data solution
[Figure: Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics connected by InfiniBand, spanning the Stream, Acquire, Organize, and Analyze & Visualize phases]
Inside the Big Data Appliance: Applications
Oracle NoSQL Database
A distributed, scalable key-value database
- Simple programming and operational model
  - Simple major + sub key and value data structure
  - ACID transactions
  - Configurable consistency & durability
- Scalable throughput, bounded latency
- Commercial-grade software and support
  - General-purpose
  - Reliable: based on proven Berkeley DB JE HA
  - Easy to install and configure
- Easy management
  - Web-based console, API accessible
  - Manages and monitors: topology, load, performance, events, alerts
[Figure: in Data Center A and in Data Center B, an application uses the NoSQLDB driver to reach the storage nodes]
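To make the "major + sub key and value" structure concrete, here is a minimal in-memory sketch. It is not the Oracle NoSQL Database API: the class `TinyKVStore`, its method names, and the MD5-based partitioning are all hypothetical. The idea it illustrates is that the major key alone selects a partition, so every sub-key record under one major key lives together and can be fetched in a single lookup.

```python
import hashlib
from collections import defaultdict


class TinyKVStore:
    """Hypothetical toy store: partition chosen by hashing the major key only."""

    def __init__(self, num_partitions=4):
        # Each partition maps major key -> {sub key -> value}.
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]

    def partition_for(self, major):
        # Hash only the major key components, never the sub key.
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def put(self, major, sub, value):
        self.partitions[self.partition_for(major)][major][sub] = value

    def get(self, major, sub):
        return self.partitions[self.partition_for(major)].get(major, {}).get(sub)

    def multi_get(self, major):
        # All records under one major key come from a single partition.
        return dict(self.partitions[self.partition_for(major)].get(major, {}))


store = TinyKVStore()
store.put(("users", "sking"), ("profile",), {"name": "S. King"})
store.put(("users", "sking"), ("orders", "1"), {"item": "Octopus", "qty": 3})

print(store.get(("users", "sking"), ("profile",)))
print(sorted(store.multi_get(("users", "sking"))))
```

Keeping related records under one major key is the design choice that makes grouped reads cheap in this sketch; how the real product groups and replicates records is described in its own documentation.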
Oracle Loader for Hadoop
[Figure: MapReduce job flow from INPUT 1 and INPUT 2 through MAP, SHUFFLE/SORT and REDUCE stages]
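The MAP, SHUFFLE/SORT and REDUCE stages named on this slide follow the generic MapReduce pattern, which can be sketched in a few lines of single-process Python. This is not Oracle Loader for Hadoop or the Hadoop API; the function names and the word-count job are illustrative assumptions chosen only to show how the three stages hand data to each other.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """MAP: apply the mapper to every input record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_sort(pairs):
    """SHUFFLE/SORT: group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped, reducer):
    """REDUCE: collapse each key's list of values into one output record."""
    return [reducer(key, values) for key, values in grouped]


def word_mapper(line):
    # Illustrative mapper: emit (word, 1) for each word in the line.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    # Illustrative reducer: total the counts for one word.
    return key, sum(values)


lines = ["big data", "Big Data Appliance"]
result = reduce_phase(shuffle_sort(map_phase(lines, word_mapper)), sum_reducer)
print(result)
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; the sketch keeps only the data flow.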
Oracle Data Integrator
- Easily integrate data from any source
- New functionality: construct Hadoop jobs to transform and load data into Oracle; leverage Hive and/or Oracle Loader for Hadoop
Analytics
Exadata: A Platform for Analytics
- Text analytics, graph analytics, statistics, spatial analytics, data mining
- Integrate into applications
In-Database Text Search
- Proven enterprise-grade search engine
- Fully integrated with the database
- Extreme performance with Exadata
- Search over content external to the database
- Capabilities:
  - Full-text search
  - Customizable relevancy, automatic clustering & supervised classification
  - Semantic search
  - Standards-based & user-defined document models
  - Comprehensive language and document format support
=> Efficient evaluation of structured and unstructured filters
=> Search within document structures
=> Customizable scoring and relevancy
Find relevant text and comments:
    SELECT SCORE(1), doc_url
    FROM news_articles
    WHERE pub_date > DATE '2011-01-01'
    AND CONTAINS(doc, 'Big and Data within Title', 1) > 0;
In-Database Graph Analytics
- Only leading commercial database with native RDF graph capabilities
- Scales to repositories with billions of triples
- Designed for parallel operations
- Choice of SQL query, SPARQL query, 3rd-party & open source tools
- Native OWL inferencing support
- W3C standards-based technologies
- Ontology-assisted query of relational data
=> Broad user community and all BI tools can leverage Data Mining
=> Parallelism dramatically and transparently improves performance
Uncover social relationships:
    SELECT c_id, relationship
    FROM Customers
    WHERE SEM_RELATED(friends, rdfs:subclassof, current_customer, Social_ontology) = 1
    AND SEM_DISTANCE() <= 2;
In-Database Spatial Analytics
- Industry-leading spatial database server
- Fully integrated in RDBMS
- Rich feature set: vector and raster data sets, spatial analytics, SQL interface, Java API, much more
- Access to external data
- Leverage any tool to access spatial analysis
=> OBI EE and MapViewer deliver spatial data to any user
=> Spatial analysis co-located with the data for exceptional performance
Analyze regional differences (competitors within 2 miles of a bank site):
    SELECT c.holding_company, c.location
    FROM competitor c, bank b
    WHERE b.site_id = 1604
    AND SDO_WITHIN_DISTANCE(c.location, b.location, 'distance=2 unit=mile') = 'TRUE'
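To show what a within-distance filter such as the SDO_WITHIN_DISTANCE predicate on this slide is asking for, here is a plain Python sketch using the haversine great-circle formula. It is not how Oracle Spatial evaluates the operator (which works on indexed geometries inside the database); the function names, the sample coordinates, and the mean Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_distance(sites, origin, miles):
    """Return names of sites no farther than `miles` from `origin`."""
    origin_lat, origin_lon = origin
    return [
        name
        for name, (lat, lon) in sites.items()
        if haversine_miles(lat, lon, origin_lat, origin_lon) <= miles
    ]


# Hypothetical coordinates: one competitor well under a mile away, one several miles away.
bank = (37.7749, -122.4194)
competitors = {
    "near_branch": (37.7849, -122.4194),
    "far_branch": (37.8749, -122.4194),
}
print(within_distance(competitors, bank, 2))
```

Running the filter next to the data, as the slide argues, avoids shipping every candidate location to the client before the distance test is applied.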
In-Database Data Mining
- 12 embedded, cutting-edge algorithms and 50+ native statistical functions
- Natively parallel
- Offloading to Exadata Storage Server to improve model scoring performance 10x
- Fully integrated with SQL
- Supports model design using non-Oracle tools, while delivering Oracle performance
=> Broad user community and all BI tools can leverage Data Mining
=> Parallelism dramatically and transparently improves performance
=> Off-loading model scoring transparently improves performance
Derive sentiment and loyalty:
    SELECT cust_id
    FROM customers
    WHERE region = 'US'
    AND PREDICTION_PROBABILITY(churnmod, 'Y' USING *) > 0.8;
In-Database XML
- Industry-leading native XML database
- Fully integrated in RDBMS
- Rich feature set: native XML storage and indexing; multiple APIs (SQL, Java, SOAP); full support for XQuery and SQL/XML; Java API; much more
- Fully parallel XML operations
- XQuery operations on XML and relational data; SQL operations on XML content
- XML/SQL interoperability enables SQL-based access to XML content and XML operations on relational data
Directly analyze purchases:
    let $user := "sking"
    for $doc in fn:collection("oradb:/oe/purchaseorder")
    where $doc/purchaseorder[user = $user]
    order by $doc/purchaseorder/reference
    return $doc/purchaseorder/reference
Sample document:
    <PurchaseOrder DateCreated="2011501531">
      <LineItems>
        <LineItem ItemNumber="1">
          <Part Description="Octopus">31398750123</Part>
          <Quantity>3.0</Quantity>
        </LineItem>
        ...
        <LineItem ItemNumber="5">
          <Part Description="King Ralph">18713810168</Part>
          <Quantity>7.0</Quantity>
        </LineItem>
      </LineItems>
    </PurchaseOrder>
In-Database Statistics: Oracle R Enterprise
- R: open source statistical programming language and environment
- Oracle R Enterprise: embedded component of the DBMS server
- Save money on SA$!
- Fully leverages the newest R algorithms and models contributed to the Comprehensive R Archive Network (CRAN)
- Allows R to run on very large data sets resident in Oracle tables or external tables
- Complements Oracle's in-database Data Mining
=> Readily integrate R models into production systems and BI dashboards
=> Use and manipulate database data without knowing SQL
Statistical patterns:
    require(hdrcde)
    # Arrival delays for flights into SFO, selected in the database
    x <- ONTIME_S[ONTIME_S$DEST == "SFO", ]$ARRDELAY
    # Pull the filtered values into the local R session
    x <- ore.pull(x[x > -100 & x < 200 & !is.na(x)])
    hdr.den(x, main = "Density for Arrival Delay at SFO", xlab = "Arrival Delay (minutes)")
    rug(x, ticksize = 0.01)
In-Database Statistics and Advanced Analytics with R
Deliver enterprise-level advanced analytics based on the R environment
1. Oracle's Distribution of Open Source R
   - Enterprise support for open-source R
   - Enhanced performance with Intel MKL libraries for x86 hardware
2. Oracle R Enterprise
   - Eliminates R's memory constraint by enabling R to work directly and transparently on database-resident data
   - Transparently leverages Oracle's in-database analytics via the R language
   - Enables integration of R scripts into enterprise production applications and OBIEE dashboards
   - Leverages latest R algorithms and CRAN packages
Oracle R Architecture
[Figure: R workspace console; function pushdown of data transformation & statistics to the Oracle statistics engine; OBIEE, Web Services]
- No changes to the user experience
- Scale to large data sets
- Embed in operational systems
Oracle In-Database Advanced Analytics
Comprehensive Advanced Analytics Platform
- Oracle R Enterprise
  - Popular open source statistical programming language & environment
  - Integrated with database for scalability
  - Wide range of statistical and advanced analytical functions
  - R embedded in enterprise applications & Oracle BI Foundation Suite
  - Extensive graphics
- Oracle Data Mining
  - Automated knowledge discovery inside the database
  - 12 in-database data mining algorithms
  - Text mining
  - Predictive analytics applications development environment
  - Exadata "scoring" of Oracle Data Mining models
- Statistics, advanced analytics, data & text mining, predictive analytics
In-Database Analytics with Best of Breed Partners: Exadata Integration with SAS
- Popular Base SAS PROCs for Oracle in-database processing: FREQ, MEANS, RANK, REPORT, SORT, SUMMARY, TABULATE
- Scalable, parallel execution of SAS functions against Oracle data
- Ongoing joint engineering
Oracle Parallel Analytics Foundation
[Figure: layered architecture]
- Analytics: open source analytics (Hadoop, Oracle R), spatial analytics, Oracle MapReduce, data mining, text analytics and search, SQL analytics
- External data: weblogs, XML, media, social data, text
- Katana DB parallel engine: in-memory parallel processing
- Data layer: XML, relational, OLAP, spatial, RDF, media
Oracle's Big Data solution
[Figure: Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics connected by InfiniBand, spanning the Stream, Acquire, Organize, and Analyze & Visualize phases]
To Learn More
SESSION | DAY | TIME | ROOM
In-Database Analytics: Statistics and Advanced Analytics with R | Monday | 5:00-6:00 | Moscone S. RM 303
Managing Big Data Using Hadoop with Oracle Exadata | Monday | 5:00-6:00 | Moscone S. RM 308
In-Memory Data Warehousing: Bringing Real-Time to Big Data | Tuesday | 10:15-11:15 | Moscone S. RM 303
Customer Experience: Using Hadoop with Oracle Database 11g | Tuesday | 1:15-2:15 | Marriott Marquis Golden Gate B
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.