Big Data SQL and Query Franchising



Similar documents
Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle

Oracle Big Data SQL Architectural Deep Dive

Seamless Access from Oracle Database to Your Big Data

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Konference Data a znalosti 2015

How To Manage Big Data In A Microsoft Cloud (Hadoop)

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

Safe Harbor Statement

Oracle Big Data Strategy Simplified Infrastrcuture

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Oracle Big Data Fundamentals Ed 1 NEW

Oracle Database 12c Plug In. Switch On. Get SMART.

Integrate Master Data with Big Data using Oracle Table Access for Hadoop

Constructing a Data Lake: Hadoop and Oracle Database United!

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Apache Sentry. Prasad Mujumdar

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

HareDB HBase Client Web Version USER MANUAL HAREDB TEAM

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

Hadoop Ecosystem B Y R A H I M A.

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Architecting for the Internet of Things & Big Data

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Using RDBMS, NoSQL or Hadoop?

The Future of Data Management

WHAT S NEW IN SAS 9.4

Big Data Course Highlights

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Data Domain Profiling and Data Masking for Hadoop

Hadoop & Spark Using Amazon EMR

How To Create A Data Visualization With Apache Spark And Zeppelin

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Cloudera Manager Training: Hands-On Exercises

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Complete Java Classes Hadoop Syllabus Contact No:

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig

Customized Report- Big Data

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Secure Your Hadoop Cluster With Apache Sentry (Incubating) Xuefu Zhang Software Engineer, Cloudera April 07, 2014

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Cloudera Certified Developer for Apache Hadoop

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Cloudera Backup and Disaster Recovery

Real World Big Data Architecture - Splunk, Hadoop, RDBMS

Oracle BI Roadmap & Visual Analyzer Ljiljana Perica, Oracle Business Solution Leader Ljiljana.perica@oracle.com

Qsoft Inc

Data processing goes big

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Big Data Introduction

Hadoop Big Data for Processing Data and Performing Workload

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Big Data Integration: A Buyer's Guide

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

More Data in Less Time

Big Data Too Big To Ignore

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Bringing Intergalactic Data Speak (a.k.a.: SQL) to Hadoop Martin Willcox Director Big Data Centre of Excellence (Teradata

Deploying Hadoop with Manager

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

The Future of Data Management with Hadoop and the Enterprise Data Hub

Apache HBase. Crazy dances on the elephant back

A very short Intro to Hadoop

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

Launch into the Oracle Business Intelligence Cloud Service (BICS) Presented by: Stephen Goldsmith Date: May 15, 2015

Implement Hadoop jobs to extract business value from large and varied data sets

Unified Query for Big Data Management Systems

The Hadoop Eco System Shanghai Data Science Meetup

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Oracle Big Data Essentials

Deploying an Operational Data Store Designed for Big Data

Professional Hadoop Solutions

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Native Connectivity to Big Data Sources in MSTR 10

#TalendSandbox for Big Data

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Information Builders Mission & Value Proposition

AtScale Intelligence Platform

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Open Source Technologies on Microsoft Azure

Cloudera Enterprise Data Hub in Telecom:

Big Data Analytics Nokia

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Transcription:

Big Data SQL and Query Franchising An Architecture for Query Beyond Hadoop Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement The following is intended to outline our general product direcnon. It is intended for informanon purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or funcnonality, and should not be relied upon in making purchasing decisions. The development, release, and Nming of any features or funcnonality described for Oracle s products remains at the sole discrenon of Oracle. Oracle ConfidenNal Internal/Restricted/Highly Restricted 3

Agenda 1 2 3 4 Fixing the fashion problem Why Unified Query MaYers Query Franchising: an architecture for unified query Customer 360 Live! Oracle ConfidenNal Internal/Restricted/Highly Restricted 4

Oracle has a fashion problem Every industry analyst I ve ever talked with.

We thought everything went with black.

Risk Removal, Not Empire Building Make the Big Data ecosystem easy to consume Simplified operanons Databricks cernfied Spark environment Cloudera s Enterprise Data Hub Encourage and support the use of new technologies We build on Hadoop! You should build on Hadoop! De- risk new innovanons with bridges to the business

@ Oracle Global Support Services Cloudera s DistribuNon captures HW failures Integrate with customer telemetry, configuranons, service history, diagnosncs Predict failures, save money, and serve customer beeer AnFcipate Detect Predict Automate Delight

Oracle Big Data Discovery: Built on Find Explore Discover Transform Copyright 2014, Oracle and/or its affiliates. All rights reserved. Oracle ConfidenNal Internal 9

Oracle Data Enrichment Cloud Service: Uses Scalable Machine Learning Pipeline Natural Language Processing Knowledge Graph- driven Repair and Blending InteracNve RecommendaFons and Visual Profiling to Transform Noisy Signal Into Refined InformaNon Sets for AnalyNcs Oracle ConfidenNal 10

Data AnalyNcs Challenge (2012) Separate data access interfaces 11

Tables on NoSQL SQL on Hadoop 12

Data AnalyNcs Challenge: 2013 No comprehensive SQL interface across Oracle, Hadoop and NoSQL 13

Oracle Big Data Management System Unified SQL access to all enterprise data NoSQL 14

What Does Unified Query Mean for You? Before Data Science AUer PhD Anyone???

What Does Unified Query Mean for You? Before ApplicaFon Development AUer

How Does Unified Query Work? Unify Metadata Catalog data sources Translate queries into plans Distribute ExecuNon Distribute the plan Do work Return answer

Unifying Metadata

Why Unify Metadata? CREATE TABLE customers! SELECT name FROM customers! sales Catalog customers Query CREATE across TABLE sources sales! à Integrate new metadata SELECT customers.name, sales.amount! SALES CUSTOMERS

Metadata: InputFormats and SerDes Consumer Scan and row creanon needs to be able to work on any data format Create ROWS & COLUMNS SCAN Data Node disk Data defininons and column deserializanons are needed to provide a table RecordReader => Scans data (keys and values) InputFormat => Defines parallelism SerDe => Makes columns Metastore => Maps DDL to Java access classes 20

SQL- on- Hadoop Engines Share Metadata, not MapReduce Hive Metastore Oracle Big Data SQL SparkSQL Hive Impala Hive Metastore Table DefiniNons: movieapp_log_json Tweets avro_log Metastore maps DDL to Java access classes Oracle ConfidenNal Internal/Restricted/Highly Restricted 21

Extend Oracle External Tables CREATE TABLE movielog (! click VARCHAR2(4000))! ORGANIZATION EXTERNAL (! TYPE ORACLE_HIVE! DEFAULT DIRECTORY DEFAULT_DIR! ACCESS PARAMETERS! (! com.oracle.bigdata.tablename logs! com.oracle.bigdata.cluster mycluster! ))! REJECT LIMIT UNLIMITED;! New types of external tables ORACLE_HIVE (inherit metadata) ORACLE_HDFS (specify metadata) Access parameters for Big Data Hadoop cluster Remote Hive database/table DBMS_HADOOP Package for automanc import 22

Enhance Oracle External Tables CREATE TABLE ORDER (! cust_num VARCHAR2(10),!!order_num VARCHAR2(20),!!order_total NUMBER(8,2))! ORGANIZATION EXTERNAL (! TYPE ORACLE_HIVE! DEFAULT DIRECTORY DEFAULT_DIR! )! PARALLEL 20! REJECT LIMIT UNLIMITED;! Transparent schema- for- read Use fast C- based readers when possible Use nanve Hadoop classes otherwise Engineered to understand parallelism Map external units of parallelism to Oracle Architected for extensibility StorageHandler capability enables future support for other data sources Examples: MongoDB, HBase, Oracle NoSQL DB 23

That just makes a good client.

Distribute ExecuNon

But how?

Language- level FederaNon Fails Been there, done that. Hadoop Part with sites as! (! select s.custid as cust_id, listagg(s.site, ',')! within group (order by s.custid) site_list! from shortcodes s! group by custid! ),! select c.first_name, c.last_name, c.age, c.state_province, s.site_list! from customers c, sites s;!! Database Part

Language- level FederaNon Fails Been there, done that. select s.custid as cust_id,! listagg(s.site, ',')! within group (order by! s.custid) site_list! from shortcodes s! group by custid!! Operators exist in both places? Is their behavior consistent? How do you negonate resources?

We have to do beeer

Query Franchising dispatch of query processing to self- similar compute agents on disparate systems without loss of opera7onal fidelity

What does that mean?

Query Franchising: Uniform Behavior, Disparate LocaNon 2 1 1 Top- level plan created HolisNc plan plan for all work Distribute to franchises based by locanon 2 2 Franchisees carry out local work Franchises secure and unlize resources All franchises speak the internal language 2 3 Global operanons opnmized Adapts to local varianon Nothing lost in translanon

What Can Big Data Learn from Exadata? Query FederaFon for Oracle Database SELECT name, SUM(purchase)! FROM customers! GROUP BY name;! 1 Oracle SQL query issued Plan constructed Query executed Oracle Exadata Storage Server 2 Smart Scan Works on Storage Filter out unneeded rows Project only queried columns Score data models Bloom filters to speed up joins Oracle Exadata Storage Server CUSTOMERS

Big Data SQL Server: A New Hadoop Processing Engine MapReduce and Hive Processing Layer Spark Impala Search Big Data SQL Resource Management (YARN, cgroups) Storage Layer Filesystem (HDFS) NoSQL Databases (Oracle NoSQL DB, Hbase) Oracle ConfidenNal Internal/Restricted/Highly Restricted 34

Smart Scan for Hadoop: OpNmizing Performance Big Data SQL Server Smart Scan External Table Services Data Node Oracle on top Apply filter predicates Project columns Parse semi- structured data Hadoop on the boeom Work close to the data Schema- on- read with Hadoop classes TransformaNon into Oracle data stream Disk Oracle ConfidenNal Internal/Restricted/Highly Restricted 35

Big Data SQL Query ExecuNon How do we query Hadoop? HDFS NameNode Hive Metastore HDFS Data Node BDS Server B B B HDFS Data Node BDS Server 1 2 3 1 2 3 Query compilanon determines: Data locanons Data structure Parallelism Fast reads using Big Data SQL Server Schema- for- read using Hadoop classes Smart Scan selects only relevant data Process filtered result Move relevant data to database Join with database tables Apply database security policies

Big Data SQL Dataflow Big Data SQL Server Smart Scan External Table Services SerDe RecordReader Data Node Disks 10110010 10110010 10110010 3 2 1 1 2 3 Read data from HDFS Data Node Direct- path reads C- based readers when possible Use nanve Hadoop classes otherwise Translate bytes to Oracle Apply Smart Scan to Oracle bytes Apply filters Project Columns Parse JSON/XML Score models

But How Does Security Work? DBMS_REDACT.ADD_POLICY(! object_schema => 'MCLICK',! SELECT object_name * FROM => my_bigdata_table! 'TWEET_V',! WHERE column_name SALES_REP_ID => 'USERNAME',! = SYS_CONTEXT('USERENV','SESSION_USER'); policy_name => 'tweet_redaction',!! function_type => DBMS_REDACT.PARTIAL,! function_parameters => 'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25',! expression => '1=1'! );! B B B *** Filter on SESSION_USER 1 2 3 Database security for query access Virtual Private Databases RedacNon Audit Vault and Database Firewall Hadoop security for Hadoop jobs Kerberos AuthenNcaNon Apache Sentry (RBAC) Audit Vault System- specific encrypnon Database tablespace encrypnon BDA On- disk EncrypNon

Customer 360, Live!

Unified Query Means Less Lock- in Unified Query means More innovanon Less risk Use the right tools for your job Hadoop, NoSQL, and whatever s next We ll query all of it Explore and adopt new technologies Focus on creanng value using your data We ll build bridges back to the business