How To Manage Big Data In A Microsoft Cloud (Hadoop)


Oracle Database 12c and the Future of Data Warehousing in the Era of Big Data
George Lumpkin (Data Warehousing) and Neil Mendelson (Big Data & Advanced Analytics)
Vice Presidents, Server Technologies
September 29, 2014

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Big Data Opportunity
Typical use cases in today's world of fast exploration of big data, across Financial Services, Utilities, Manufacturing, Retail and Telcos: portfolio analysis, fraud, sessionization, call quality tracking, stock market, money laundering, network analysis, quality assessment, supply planning, buying patterns, returns fraud, SIM card fraud.

Extending Data Management
Big Data = Hadoop + NoSQL + Relational
- Hadoop: Change the Business. Disrupt competitors, disintermediate supply chains, leverage new paradigms, exploit new analyses.
- NoSQL: Scale the Business. Meet mobile challenges, accelerate developer agility, scale out economically, serve data faster.
- Relational: Run the Business. Integrate existing systems, support mission-critical tasks, protect existing expenditures, ensure skills relevance.
Oracle Confidential - Internal/Restricted/Highly Restricted

But fundamental architectures remain: the Oracle Information Management Reference Architecture
Between Data Ingestion and Information Interpretation, the architecture has three layers:
- Raw Data Reservoir: an immutable raw data reservoir; raw data at rest is not interpreted.
- Foundation Data Layer: immutable modelled data in a business-process-neutral form, abstracted from business process changes.
- Access & Performance Layer: past, current and future interpretation of enterprise data, structured to support agile access & navigation.

New Features for Data Warehousing and Big Data in Oracle Database 12c Release 1 (12.1.0.2)
- Oracle Database In-Memory: SIMD vector processing, column store, storage indexes, in-memory aggregation
- New SQL capabilities: attribute clustering, zone maps for Exadata, JSON SQL functions, approximate count distinct
- Big Data SQL
Public

Oracle In-Memory Columnar Technology
Pure in-memory columnar storage (example table: SALES):
- Pure in-memory column format
- Not persistent, and no logging
- 2x to 20x compression
- Enabled at table or partition level
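The slide does not say which compression schemes produce the 2x to 20x figure. As a conceptual sketch only, the Python below pivots a few hypothetical SALES rows into one list per column and run-length encodes the REGION column, illustrating one simple reason column-major data tends to compress well: repeated values of a single column end up next to each other.

```python
from itertools import groupby

# Hypothetical SALES rows (order_id, region, amount); illustrative data only.
ROWS = [
    (1, "CA", 10.0), (2, "CA", 12.5), (3, "CA", 7.0),
    (4, "NY", 3.0), (5, "NY", 8.0), (6, "TX", 4.5),
]
COLUMNS = ("order_id", "region", "amount")

def to_columns(rows, names):
    """Pivot row-major tuples into one list per column (column-major layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

columns = to_columns(ROWS, COLUMNS)
region_rle = rle_encode(columns["region"])
print(region_rle)  # [('CA', 3), ('NY', 2), ('TX', 1)]
```

Six REGION values shrink to three pairs here; the longer the runs, the better the ratio, which is one reason ordering data (as attribute clustering does) also helps compression.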

Scans Billions of Rows per Second per CPU Core
Example: find all sales in the region of CA. Multiple REGION values are loaded from memory into a CPU vector register and all of them are compared against CA in one cycle: > 100x faster.
- Each CPU core scans local in-memory columns.
- Scans use super-fast Single Instruction, Multiple Data (SIMD) vector instructions, originally designed for graphics & science.
- Billions of rows/sec scan rate per CPU core.
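Python exposes no SIMD intrinsics, so as an analogy only, this sketch uses SWAR ("SIMD within a register"): eight one-byte codes are packed into one 64-bit integer and compared against the target code with a fixed handful of whole-word operations, much as a vector register compares several REGION values at once. Dictionary-encoding REGION into byte codes is an assumption made for the sketch; the slide does not describe Oracle's encoding, and none of these function names come from Oracle.

```python
# SWAR sketch: 8 one-byte lanes inside one 64-bit word.
LANES = 8
ONES = 0x0101010101010101   # 0x01 in every lane
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every lane
HIGH = 0x8080808080808080   # high bit of every lane

def pack(codes):
    """Pack up to 8 one-byte codes into one integer; lane i holds codes[i]."""
    word = 0
    for i, code in enumerate(codes):
        word |= (code & 0xFF) << (8 * i)
    return word

def matching_lanes(word, code, n_lanes):
    """Compare every lane of `word` with `code` using a few word-wide operations."""
    x = word ^ (code * ONES)          # lanes equal to `code` become 0x00
    t = ((x & LOW7) + LOW7) | x       # lane high bit = 1 iff lane != 0 (no carry between lanes)
    zero = ~t & HIGH                  # lane high bit = 1 iff lane == 0
    return [i for i in range(n_lanes) if (zero >> (8 * i + 7)) & 1]

def scan_equal(column, value):
    """Row positions where column == value, comparing 8 rows per word."""
    dictionary = {v: i for i, v in enumerate(sorted(set(column)))}  # hypothetical encoding
    assert len(dictionary) <= 256, "codes must fit in one byte"
    if value not in dictionary:
        return []
    codes = [dictionary[v] for v in column]
    target = dictionary[value]
    hits = []
    for start in range(0, len(codes), LANES):
        chunk = codes[start:start + LANES]
        # n_lanes = len(chunk) ignores the zero padding of a short final chunk
        hits.extend(start + lane for lane in matching_lanes(pack(chunk), target, len(chunk)))
    return hits

region = ["CA", "NY", "CA", "TX", "CA", "WA", "OR", "CA", "NV", "CA"]
print(scan_equal(region, "CA"))  # [0, 2, 4, 7, 9]
```

Real SIMD registers hold far more lanes (and the CPU does the lane logic in hardware), but the shape is the same: one comparison instruction covers many column values.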

In-Memory Column Store Storage Index
- Data is stored in In-Memory Compression Units (IMCUs).
- A storage index records the min/max values of each column in each unit.
- Storage indexes allow IMCU pruning.
Storage index on SALES.ORDER_DATE:
IMCU   Min          Max
1      1992-01-01   1996-01-01
2      2004-01-01   2007-01-01
3      2009-06-01   2013-07-01
4      1999-01-01   2014-03-01
select * from SALES where ORDER_DATE between DATE '2013-01-01' and DATE '2014-01-01';
IMCUs 1 and 2 end before 2013-01-01, so they are pruned; only IMCUs 3 and 4 are scanned.
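A minimal Python sketch of the pruning rule: an IMCU is kept only when its [min, max] range for ORDER_DATE overlaps the BETWEEN range. The min/max values are the ones on the slide; the IMCU class and function name are hypothetical stand-ins for Oracle's internal structures.

```python
from datetime import date
from typing import NamedTuple

class IMCU(NamedTuple):
    """Hypothetical stand-in for one IMCU's storage-index entry on ORDER_DATE."""
    unit_id: int
    min_date: date
    max_date: date

# Min/max values taken from the slide's storage-index diagram.
STORAGE_INDEX = [
    IMCU(1, date(1992, 1, 1), date(1996, 1, 1)),
    IMCU(2, date(2004, 1, 1), date(2007, 1, 1)),
    IMCU(3, date(2009, 6, 1), date(2013, 7, 1)),
    IMCU(4, date(1999, 1, 1), date(2014, 3, 1)),
]

def units_to_scan(index, low, high):
    """Keep units whose [min, max] overlaps [low, high]; all others are skipped unread."""
    return [u.unit_id for u in index if u.max_date >= low and u.min_date <= high]

print(units_to_scan(STORAGE_INDEX, date(2013, 1, 1), date(2014, 1, 1)))  # [3, 4]
```

Note that IMCU 4 must still be scanned even though most of its range predates 2013: min/max only proves absence, never presence.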

In-Memory Aggregation
A new optimized algorithm for star query processing.
Processing steps:
- Transform joins into a scan of the fact table
- Fast in-memory scan with array lookups
- Optimize aggregation using in-memory arrays
Also:
- Optimized in-memory data structures
- Late joins to dimension data
- Minimizes data movement in the execution plan
Example: report sales by quarter and region. The Time dimension supplies quarters, the Customers dimension supplies regions, and Sales is aggregated into an in-memory report outline of quarters by regions.
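A conceptual Python sketch of the processing steps listed above, on a hypothetical star schema: dimension keys are first mapped to dense group numbers, the fact table is scanned once while amounts accumulate in a quarters-by-regions array, and labels are attached only at the end (the late join). The dictionaries stand in for Oracle's in-memory structures; all names and data are invented for illustration.

```python
# Hypothetical star schema (illustrative values only).
TIME_DIM = {101: "Q1", 102: "Q1", 103: "Q2", 104: "Q2"}   # time_id -> quarter
CUSTOMER_DIM = {1: "West", 2: "East", 3: "West"}           # cust_id -> region
SALES_FACT = [(101, 1, 100.0), (102, 2, 50.0), (103, 3, 25.0),
              (104, 1, 75.0), (103, 2, 10.0)]              # (time_id, cust_id, amount)

def key_vector(dimension):
    """Map each dimension key to a dense group number; also return the group labels."""
    labels = sorted(set(dimension.values()))
    slot = {label: i for i, label in enumerate(labels)}
    return {key: slot[label] for key, label in dimension.items()}, labels

def sales_by_quarter_and_region(fact, time_dim, cust_dim):
    time_kv, quarters = key_vector(time_dim)
    cust_kv, regions = key_vector(cust_dim)
    # Accumulate into a dense quarters x regions array instead of joining rows first.
    totals = [[0.0] * len(regions) for _ in quarters]
    for time_id, cust_id, amount in fact:      # one pass over the fact table
        q, r = time_kv.get(time_id), cust_kv.get(cust_id)
        if q is not None and r is not None:    # key lookups stand in for the joins
            totals[q][r] += amount
    # Late join: turn array positions back into dimension labels only at the end.
    return {(quarters[q], regions[r]): totals[q][r]
            for q in range(len(quarters)) for r in range(len(regions))}

print(sales_by_quarter_and_region(SALES_FACT, TIME_DIM, CUSTOMER_DIM))
```

The design point is that no joined row is ever materialized: the fact scan touches only small lookup structures and a fixed-size array.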

Zone Maps and Attribute Clustering
- Attribute clustering orders data so that column values are stored together on disk.
- Zone maps store the min/max of specified columns per zone and are used to filter unneeded data during query execution.
Combined benefits:
- Improved query performance and concurrency: reduced physical data access, and significant IO reduction for highly selective operations
- Optimized space utilization: less need for indexes, and improved compression ratios through data clustering
- Full application transparency: any application will benefit

Attribute Clustering: Concept and Benefits
- Orders data so that it is in close proximity based on selected column values (the "attributes").
- Attributes can be from a single table or from multiple tables, e.g. from fact and dimension tables.
- Data can be clustered during MOVE PARTITION.
Benefits:
- Significant IO pruning when used with zone maps
- Reduced block IO for table lookups in index range scans
- Improved performance for queries that sort and aggregate pre-ordered data
- Improved compression ratios, because ordered data is likely to compress better than unordered data

Zone Maps
A zone map is a persisted storage index:
- Stores the minimum and maximum of specified columns.
- Analogous to a coarse index structure, but much more compact than an index.
- Zone maps filter out what you don't need; indexes find what you do need.
Significant performance benefits with complete application transparency:
- IO reduction for table scans with predicates on the table itself, or even on a joined table using join zone maps (a.k.a. "hierarchical zone maps")
- Partition pruning for every column of a partitioned table, not only the partition key columns
Benefits are most significant with ordered data, i.e. in combination with attribute clustering or with data that is naturally ordered.
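A minimal Python sketch of why zone maps pay off mainly on ordered data. It assumes a hypothetical single-letter column and zones of only four rows (real zones are far larger): the same 64 values are stored once in random order and once sorted, as attribute clustering would store them, and the sketch counts how many zones the predicate = 'C' cannot rule out.

```python
import random

def build_zone_map(values, zone_rows):
    """Record (min, max) of the column for each run of `zone_rows` consecutive rows."""
    return [(min(values[i:i + zone_rows]), max(values[i:i + zone_rows]))
            for i in range(0, len(values), zone_rows)]

def zones_to_read(zone_map, wanted):
    """Zones whose [min, max] could contain `wanted`; every other zone is skipped."""
    return [z for z, (lo, hi) in enumerate(zone_map) if lo <= wanted <= hi]

rng = random.Random(7)
unordered = [rng.choice("ABCDEFGH") for _ in range(64)]   # hypothetical column values
clustered = sorted(unordered)      # same values, physically ordered (stand-in for clustering)

for label, column in (("unordered", unordered), ("clustered", clustered)):
    zmap = build_zone_map(column, zone_rows=4)
    print(label, len(zones_to_read(zmap, "C")), "of", len(zmap), "zones read")
```

In random order nearly every zone spans 'C', so the zone map prunes little; once the data is sorted, only the few zones around the run of 'C' values survive.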

Attribute Clustering With Zone Maps: Example
CLUSTERING BY INTERLEAVED ORDER (category, country)
Zone map benefits are most significant with ordered data. With INTERLEAVED ORDER, zone maps prune for a predicate on either column or on both:
SELECT ... FROM table WHERE category = 'BOYS';
SELECT ... FROM table WHERE country = 'US';
SELECT ... FROM table WHERE category = 'BOYS' AND country = 'US';
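To see why interleaved ordering lets zone maps prune on either column, this Python sketch compares a linear sort on (category, country) with a Z-order (Morton) sort of a hypothetical 16 x 16 table. Using Z-order as the stand-in for Oracle's interleaved ordering is an assumption; the zone size, data, and function names are invented for illustration.

```python
from itertools import product

def morton(x, y, bits=4):
    """Z-order key: interleave the bits of x (even positions) and y (odd positions)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def zone_map(rows, zone_rows):
    """Per zone: ((min, max) of column 0, (min, max) of column 1)."""
    zones = []
    for i in range(0, len(rows), zone_rows):
        chunk = rows[i:i + zone_rows]
        zones.append(tuple((min(col), max(col)) for col in zip(*chunk)))
    return zones

def zones_read(zmap, column, value):
    """Number of zones a predicate `column == value` cannot rule out."""
    return sum(1 for zone in zmap if zone[column][0] <= value <= zone[column][1])

# Hypothetical table: one row per (category_id, country_id) pair, 16 x 16 rows.
rows = list(product(range(16), range(16)))
linear = zone_map(sorted(rows), zone_rows=16)                                 # like LINEAR ORDER
interleaved = zone_map(sorted(rows, key=lambda r: morton(*r)), zone_rows=16)  # like INTERLEAVED ORDER

for label, zmap in (("linear", linear), ("interleaved", interleaved)):
    print(label, zones_read(zmap, 0, 3), zones_read(zmap, 1, 5))
# linear 1 16
# interleaved 4 4
```

Linear order is ideal for the leading column but useless for the second one; interleaved order gives up a little on the first to make both predicates prunable, which is the trade-off the slide's three queries illustrate.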

Zone Maps with Attribute Clustering: Star Schema Benchmark
Overall, a 2.6x elapsed time improvement over the baseline, comparing runs with and without zone maps and attribute clustering.
[Chart: Query Elapsed Time Improvements; improvement factor (x, axis 0.0 to 9.0) for each query step 1 to 13]

New Performance Features Multiply the Benefits
- 100 TB of user data
- 10 TB of user data with 10x compression
- 2 TB of user data with partition pruning
- 2 TB of user data: 1 TB on disk, 1 TB in-memory
- 100 GB of user data with storage indexes and zone maps
- 30 GB of user data with Smart Scan
Result: sub-second scan, no indexes.
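The volumes on this slide multiply rather than add. This short Python calculation assumes decimal units (1 TB = 1000 GB) and leaves out the "1 TB on disk, 1 TB in-memory" step, which places data rather than shrinking it. It prints each step's reduction factor and the cumulative one, about 3,333x from 100 TB down to 30 GB.

```python
# Data volumes from the slide, in GB (assuming 1 TB = 1000 GB).
STEPS = [
    ("user data", 100_000),
    ("10x compression", 10_000),
    ("partition pruning", 2_000),
    ("storage indexes and zone maps", 100),
    ("Smart Scan", 30),
]

def reductions(steps):
    """Return (step, factor vs. previous step, cumulative factor vs. the start)."""
    start = steps[0][1]
    out = []
    for (_, prev), (name, size) in zip(steps, steps[1:]):
        out.append((name, prev / size, start / size))
    return out

for name, step, total in reductions(STEPS):
    print(f"{name:32} {step:6.1f}x  cumulative {total:8.1f}x")
```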

Evolution of Analytical SQL
Across releases 8i, 9i, 10g, 11g and 12c, Oracle added:
- Introduction of window functions
- Enhanced window functions (percentile, etc.)
- Rollup, grouping sets, cube
- Statistical functions
- SQL model clause
- Partition outer join
- In-database data mining
- SQL pivot
- Recursive WITH
- ListAgg, Nth value window
- Pattern matching
- Top N clause
- Approx count distinct
- JSON support

Why SQL?
1. Enhanced productivity: using SQL, users simply describe the results they want; they do not have to describe how to get those results. SQL skills and tools are widely available.
2. Increased performance: the SQL engine, not the user, determines how to optimize each query, and mature SQL engines have a broad arsenal of performance techniques.
3. Adaptability: SQL has proven extensible to new data types and analytics.

Approximate Count Distinct
Not every query requires a completely accurate result, e.g. "How many distinct individuals visited our website last week?"
- New SQL function for approximate results for COUNT DISTINCT aggregates: APPROX_COUNT_DISTINCT()
- Approximate results can be significantly faster and use fewer resources than exact calculations:
  - 5x to 50x++ faster (depending upon the number of distinct values and the complexity of the SQL)
  - Accuracy > 97% (with 95% confidence)
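The slide does not name the algorithm behind APPROX_COUNT_DISTINCT. As an illustration of how a small, fixed amount of memory can estimate a distinct count, here is a compact HyperLogLog sketch in Python; treat it as a representative technique, not as Oracle's implementation. With p = 12 it keeps 4,096 small registers, for a standard error of about 1.6%.

```python
import hashlib
import math

def _hash64(value):
    """Deterministic 64-bit hash of the value's string form."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

class HyperLogLog:
    def __init__(self, p=12):
        self.p = p                     # 2**p registers; standard error ~ 1.04 / sqrt(2**p)
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128

    def add(self, value):
        h = _hash64(value)
        bucket = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[bucket] = max(self.registers[bucket], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return round(self.m * math.log(self.m / zeros))   # linear counting for small sets
        return round(raw)

# Hypothetical web log: 50,000 visits by 10,000 distinct visitors.
hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"visitor-{i % 10_000}")
print(hll.count())   # close to 10,000
```

Memory stays at 4,096 registers however many rows are scanned, and two sketches can be merged by taking the register-wise maximum, which is what makes this family of algorithms cheap compared with an exact COUNT(DISTINCT).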

Full Power of SQL over JSON Documents
Sample customers document:
{
  "firstname": "John",
  "lastname": "Smith",
  "custid": 55241,
  "age": 25,
  "address": {
    "streetaddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalcode": "10021",
    "isbusiness": false
  },
  "phonenumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
select J.CUSTOMER_DOC.address.postalcode, count(*)
from JSON_CUSTOMERS J
group by J.CUSTOMER_DOC.address.postalcode;

select J.CUSTOMER_DOC.address.postalcode, sum(S.sales_revenue)
from JSON_CUSTOMERS J, SALES S
where J.CUSTOMER_DOC.custid = S.custid
group by J.CUSTOMER_DOC.address.postalcode;
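For readers without an Oracle instance, this Python sketch mirrors what the two queries on the slide compute: the number of customers per postal code, and total sales revenue per postal code via a join on custid. The trimmed documents and SALES rows are hypothetical, the postal code is read from the nested address object as in the sample document, and nothing here describes how Oracle evaluates JSON internally.

```python
import json
from collections import Counter, defaultdict

# Hypothetical JSON_CUSTOMERS documents (trimmed) and SALES rows (custid, sales_revenue).
CUSTOMER_DOCS = [
    '{"custid": 55241, "address": {"postalcode": "10021"}}',
    '{"custid": 55242, "address": {"postalcode": "10021"}}',
    '{"custid": 55243, "address": {"postalcode": "94105"}}',
]
SALES = [(55241, 120.0), (55243, 80.0), (55241, 30.0)]

def customers_per_postal_code(docs):
    """First query: count customer documents per postal code."""
    return Counter(json.loads(d)["address"]["postalcode"] for d in docs)

def revenue_per_postal_code(docs, sales):
    """Second query: join SALES to customers on custid, sum revenue per postal code."""
    postal_of = {}
    for d in docs:
        doc = json.loads(d)
        postal_of[doc["custid"]] = doc["address"]["postalcode"]
    totals = defaultdict(float)
    for custid, revenue in sales:
        if custid in postal_of:        # inner join: unmatched sales rows are dropped
            totals[postal_of[custid]] += revenue
    return dict(totals)

print(customers_per_postal_code(CUSTOMER_DOCS))
print(revenue_per_postal_code(CUSTOMER_DOCS, SALES))
```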

Future of Data Warehousing in the Age of Big Data

Barriers to Big Data Adoption
- Complexity
  - Skills: lack of tools and training to exploit Big Data
  - IT Operations: ability to administer and manage Big Data
- Integration
  - Adding Big Data to the existing architecture is complex
  - Too much effort required in data preparation
- Security
  - No clear route to governance or enforcement

Big Data Management: Hadoop + NoSQL + Relational
The Power of Oracle SQL
- Wide variety of Big Data types:
  - Structured data: numeric, string, date, …
  - Unstructured data: LOBs, Text, XML, JSON, Spatial, Graph, Multimedia
- Rich SQL analytic functions: ranking, windowing, LAG/LEAD, aggregate, statistical, linear regression, correlations, cross tabs, hypothesis testing, distribution fitting, …

What Gives Exadata Extreme Performance?
Exadata applies Smart Scan close to the data: a query on data in the RDBMS, expressed in Oracle SQL, runs on Exadata, and the Oracle Exadata Storage Servers scan the data where it is stored.

Oracle Big Data SQL
Exadata & Big Data SQL apply Smart Scan close to all data: query data in the RDBMS and in Hadoop with Oracle SQL.
- Fast: massive parallelism
- Filtered locally
- Minimized data movement
Diagram: Exadata with its Oracle Exadata Storage Servers, alongside a Big Data Appliance in which every HDFS Data Node runs a Big Data SQL (BDS) Server.
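The "filtered locally, minimized data movement" idea can be sketched as predicate and projection pushdown: each data node applies the filter and keeps only the requested columns before anything is sent back. The node names, rows, and size() proxy below are hypothetical, and the loop runs the node scans one after another where Big Data SQL would run them in parallel on the BDS servers.

```python
# Hypothetical HDFS data nodes, each holding its own block of rows.
NODES = {
    "node1": [{"id": 1, "state": "CA", "amount": 10, "notes": "x" * 50},
              {"id": 2, "state": "NY", "amount": 20, "notes": "y" * 50}],
    "node2": [{"id": 3, "state": "CA", "amount": 30, "notes": "z" * 50},
              {"id": 4, "state": "TX", "amount": 40, "notes": "w" * 50}],
}

def scan_on_node(rows, predicate, columns):
    """Runs next to the data: filter rows, then keep only the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

def query(nodes, predicate, columns):
    """Coordinator: ship the predicate to every node and gather the small results."""
    results = []
    for rows in nodes.values():       # sequential here; parallel in a real cluster
        results.extend(scan_on_node(rows, predicate, columns))
    return results

def size(rows):
    """Crude proxy for bytes moved: total length of each row's repr."""
    return sum(len(repr(r)) for r in rows)

result = query(NODES, lambda r: r["state"] == "CA", ["id", "amount"])
all_rows = [r for rows in NODES.values() for r in rows]
print(result, size(result), "vs", size(all_rows))
```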

Oracle Big Data SQL: A New Hadoop Processing Engine
- Processing layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
- Resource management: YARN, cgroups
- Storage layer: filesystem (HDFS); NoSQL databases (Oracle NoSQL DB, HBase)

Apply Advanced Security on Hadoop & NoSQL
The same security policies apply to Hadoop & Relational. JSON data stays unconverted in Hadoop and customer data stays in Oracle, while Redaction, Virtual Private Database and fine-grain access control are applied through SQL: Hadoop contributes a redacted data subset, and Oracle Database 12c quickly returns a small data subset.
BEGIN
  DBMS_REDACT.ADD_POLICY(
    object_schema => 'txadp_hive_01',
    object_name   => 'customer_address_ext',
    column_name   => 'ca_street_name',
    policy_name   => 'customer_address_redaction',
    function_type => DBMS_REDACT.RANDOM,
    expression    => 'SYS_CONTEXT(''SYS_SESSION_ROLES'', ''REDACTION_TESTER'')=''TRUE''');
END;
/
This policy returns random values for ca_street_name whenever the querying session has the REDACTION_TESTER role.
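A rough Python model of what the policy above does at query time: the stored value is never changed, and random replacement characters are returned only when the session holds the REDACTION_TESTER role. The helper names and the character-by-character randomization are assumptions meant to approximate DBMS_REDACT.RANDOM for character data, not a description of Oracle's exact output.

```python
import random
import string

def redact_random(value, rng):
    """Hypothetical approximation: random letters for letters, random digits for digits."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)            # keep spaces and punctuation as-is
    return "".join(out)

def read_column(values, session_roles, rng=None):
    """Apply the policy when the column is read; stored values are never modified."""
    rng = rng or random.Random()
    if "REDACTION_TESTER" in session_roles:     # mirrors the policy's SYS_CONTEXT expression
        return [redact_random(v, rng) for v in values]
    return list(values)

print(read_column(["21 2nd Street"], {"REDACTION_TESTER"}, random.Random(0)))
print(read_column(["21 2nd Street"], {"ANALYST"}))
```

Because the decision is made per session at read time, the same external table serves both redacted and unredacted readers with no copy of the data.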

Govern and Secure Your Data With Oracle: Hadoop, NoSQL & Relational
BDA capabilities:
- Authentication through Kerberos
- Authorization through Apache Sentry
- Auditing through Oracle Audit Vault
- Encryption for data at rest
- Network encryption
Big Data SQL adds Advanced Security on Hadoop & NoSQL:
- Redaction
- Virtual Private Database
- Fine-grain access control
Oracle Confidential - Internal

"When eating an elephant, take one bite at a time." (General Creighton Abrams)

Experiment
Big Data Appliance X4-2 6-Node Starter Rack:
- 2 x 8-core Intel Xeon E5 processors per node
- 384 GB memory, expandable to 3 TB (64 GB per node, expandable to 512 GB per node)
- 288 TB disk space (48 TB per node)
- Integrated software: Oracle Linux, Oracle Java VM, Oracle Big Data SQL*, Oracle Big Data Connectors*, Cloudera Distribution of Apache Hadoop (EDH Edition), Cloudera Manager, Oracle R Distribution, Oracle NoSQL Database
40% cost savings; 33% faster time to value.
* Licensed separately

Data Lifecycle Management & Query Offload
More data online and available at a lower cost.
- A rolling 13 months of active data stays on Exadata, with the hottest data in DRAM and warm data in PCI flash.
- Months 14 to n are offloaded as deep data to Hadoop on the Big Data Appliance (BDA) and queried through Oracle Big Data SQL.
Process to move a partition to BDA:
1. Copy the older partition to BDA.
2. Update views.
3. Drop the older partition from Exadata.
Offloaded data can be accessed via Oracle & Hadoop; no application changes are required.
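The rolling 13-month rule reduces to simple month arithmetic. This hypothetical helper only selects the monthly partitions that fall outside the window; the copy, view update and drop steps are left out. Representing partitions as (year, month) tuples and counting the current month as the first of the 13 are assumptions.

```python
def months_back(year, month, n):
    """Return the (year, month) that lies n months before the given month."""
    index = year * 12 + (month - 1) - n
    return index // 12, index % 12 + 1

def partitions_to_offload(partitions, current, window=13):
    """Hypothetical helper: monthly partitions older than the rolling window."""
    oldest_kept = months_back(*current, window - 1)
    return [p for p in partitions if p < oldest_kept]   # tuples compare year, then month

# Monthly partitions from January 2013 through September 2014.
partitions = [(y, m) for y in (2013, 2014) for m in range(1, 13) if (y, m) <= (2014, 9)]
print(partitions_to_offload(partitions, current=(2014, 9)))
```

With September 2014 as the current month, the window keeps September 2013 onward, so January through August 2013 become the candidates to move to BDA.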

Offload ETL from Data Warehouse
- Offload long-running ETL jobs to Hadoop
- New sources via Hadoop
- Leave existing ETL in place
Diagram: sources land in staging files; MapReduce (MR) jobs produce detail and temporary data that is fast-loaded into the data warehouse, alongside the existing paths (files, SQL, Oracle GoldenGate, Oracle Data Integrator, ETL tool).
Oracle Company Confidential

Statistical & Predictive Analytics
Bring the analytics to the data.
- Hadoop / Big Data Appliance: Oracle R Distribution (1), Oracle R Advanced Analytics for Hadoop (2), SAS High Performance Analytics
- Oracle Database / Exadata: Oracle Advanced Analytics Option, SAS High Performance Analytics
(1) Included with BDA
(2) Included with Oracle Big Data Connectors

Oracle Big Data Discovery + Advanced Analytics
Changing the game for agile business innovation on Big Data:
- Profile: easily add data and see it automatically and continuously cataloged, enriched and related.
- Find: use familiar guided search across massive amounts of diverse data.
- Understand: know what's important from diagnostic analysis of millions of data characteristics.
- Transform: powerful tools to quickly clean up and wrangle dirty data so it's ready to go.
- Discover: uncover valuable new insights.
- Predict: use new insights to define and refine predictive models.
- Collaborate: publish, share and evolve as you learn more.

Cloud Platform: Big Data Analytics
Big Data Service, integrated with DBaaS:
- SQL on Hadoop
- Hadoop 2.0 cluster
- NoSQL Service for key-value data
- Persistent data reservoir in Storage Service
- Single-tenant or multitenant IaaS offerings for performance/QoS: commodity with NAS, Big Data Appliance
Big Data Discovery, the visual face of Big Data:
- Business user and data scientist collaboration
- Self-service data discovery and exploration to separate signal from noise
Fully managed infrastructure by Oracle Cloud operations, with Hadoop scalability and cost economies.

Summary

Oracle Data Warehousing in the Era of Big Data: Innovating and Preserving Customer Investments
- Leverage 12c innovations: real-time analytics with Oracle Database In-Memory, performance of Exadata, power of SQL
- Extend your data warehouse with Big Data: Oracle SQL across Oracle, Hadoop & NoSQL; fast, massively parallel and interactive data access; reduced data movement throughout the enterprise; securing access to Big Data analytics
- Deploy on choice of private and public Clouds
