Oracle Big Data SQL Konference Data a znalosti 2015




Oracle Big Data SQL Konference Data a znalosti 2015 Jakub ILLNER Information Management Architect XLOB Enterprise Cloud Architects 23 July 2015, version 2

Agenda
1. Is SQL Dead?
2. Introducing Oracle Big Data SQL
3. How Oracle Big Data SQL Works
4. Demonstration
5. Questions and Answers

Is SQL Dead? With all the Big Data and NoSQL technologies, why bother with SQL?

Is SQL Dead? 4

Peaks and Valleys in Spark vs. SQL

Finding peaks and valleys in stock market data takes roughly 132 lines of Scala/Spark but only 17 lines of SQL (7x less code). The Scala excerpt on the slide (package com.hadooparchitecturebook.spark.peaksandvalleys, object SparkPeaksAndValleysExecution, with classes DataKey and PivotPoint) walks each partition while keeping the last two records per unique id, and emits a PivotPoint whenever the middle value is lower than both neighbours (a valley) or higher than both (a peak). The same logic in SQL:

SELECT PRIMARY_KEY, POSITION, EVENT_VALUE,
       CASE
         WHEN LEAD_EVENT_VALUE IS NULL OR LAG_EVENT_VALUE IS NULL THEN 'EDGE'
         WHEN EVENT_VALUE < LEAD_EVENT_VALUE AND EVENT_VALUE < LAG_EVENT_VALUE THEN 'VALLEY'
         WHEN EVENT_VALUE > LEAD_EVENT_VALUE AND EVENT_VALUE > LAG_EVENT_VALUE THEN 'PEAK'
         ELSE 'SLOPE'
       END AS POINT_TYPE
FROM (
  SELECT PRIMARY_KEY, POSITION, EVENT_VALUE,
         LEAD(EVENT_VALUE, 1, NULL) OVER (PARTITION BY PRIMARY_KEY ORDER BY POSITION) AS LEAD_EVENT_VALUE,
         LAG(EVENT_VALUE, 1, NULL) OVER (PARTITION BY PRIMARY_KEY ORDER BY POSITION) AS LAG_EVENT_VALUE
  FROM PEAK_AND_VALLEY_TABLE
)

Example taken from Hadoop Application Architectures by Mark Grover, Ted Malaska, Jonathan Seidman & Gwen Shapira (O'Reilly, July 2015).

Power of SQL
- Declarative
- Abstracted from storage
- Concise
- Powerful
- Simple to learn
- Rich analytical functions
- Fast
- Secure
- Standardized
- Widely used & known

SQL on Hadoop
- Hive: the first SQL engine on Hadoop; uses MapReduce for execution; contains a metastore (HCatalog); new Hive-on-Spark project.
- Impala: developed by Cloudera; fast, in-memory execution; introduced the Parquet format; compatible with Hive.
- SparkSQL: Spark module for accessing structured data; fast, in-memory execution; compatible with Hive.
- Presto: developed by Facebook; fast, low-latency execution; compatible with Hive; connectivity to other sources.

What if you need to query Hadoop & RDBMS data? Pure SQL-on-Hadoop engines can access data in Hadoop only (HDFS, Hive, Parquet, ORC, HBase, etc.). The performance of BI tools such as Cognos, Oracle BIEE, SAS, or Tableau is limited for large and complex federated queries. A possible solution is to use the SQL interface and Hadoop integration available with the DBMS platform of choice.

Which SQL-on-Hadoop approach will you use most? In the Gartner poll, the share of respondents choosing their DBMS provider's SQL-on-Hadoop grew from 9% in 2014 to 19% in 2015. http://blogs.gartner.com/merv-adrian/2015/02/22/which-sql-on-hadoop-poll-still-says-whatever-but-dbms-providers-gain/

Introducing Oracle Big Data SQL One SQL to query ALL the data 10

What if you could
- make all data easily available to all your Oracle Database applications
- while supporting the full breadth of the Oracle SQL query language
- with all the security of Oracle Database 12c
- without moving data between your Hadoop cluster and the RDBMS
- and deliver fast query performance
- while leveraging your existing skills
- and still utilize the latest Hadoop innovations

One SQL to query ALL the data (one SQL instead of CQL, N1QL, UnQL, HiveQL, and NoSQL APIs)

Rich Analytical Functions with Oracle SQL
- Ranking functions: RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE
- Window aggregate functions (moving and cumulative): AVG, SUM, MIN, MAX, COUNT, VARIANCE, STDDEV, FIRST_VALUE, LAST_VALUE
- LAG/LEAD functions: direct inter-row reference using offsets
- Reporting aggregate functions: SUM, AVG, MIN, MAX, VARIANCE, STDDEV, COUNT, RATIO_TO_REPORT
- Statistical aggregates: correlation, the linear regression family, covariance
- Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
- Descriptive statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, and top/bottom 5 values
- Correlations: Pearson's correlation coefficients, plus Spearman's and Kendall's (both nonparametric)
- Cross tabs: enhanced with % statistics: chi-squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
- Hypothesis testing: Student t-test, F-test, binomial test, Wilcoxon signed-rank test, chi-square, Mann-Whitney test, Kolmogorov-Smirnov test, one-way ANOVA
- Distribution fitting: Kolmogorov-Smirnov test, Anderson-Darling test, chi-squared test; normal, uniform, Weibull, and exponential distributions
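As a small, hands-on illustration of the LEAD/LAG style of analysis listed above, the peak/valley classification from the earlier Spark-vs-SQL slide can be run with any SQL engine that supports window functions. This is a sketch, not Big Data SQL itself: it uses SQLite via Python's stdlib (window functions require SQLite 3.25 or newer), and the table and column names are hypothetical.

```python
import sqlite3

# Build a tiny synthetic price series; names mirror the slide's example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticker (primary_key TEXT, position INTEGER, event_value INTEGER)"
)
values = [10, 12, 11, 9, 13, 13]
conn.executemany(
    "INSERT INTO ticker VALUES (?, ?, ?)",
    [("AAA", pos, v) for pos, v in enumerate(values, start=1)],
)

# Classify each point by comparing it with its neighbours via LEAD/LAG.
query = """
SELECT position, event_value,
       CASE
         WHEN lead_v IS NULL OR lag_v IS NULL THEN 'EDGE'
         WHEN event_value < lead_v AND event_value < lag_v THEN 'VALLEY'
         WHEN event_value > lead_v AND event_value > lag_v THEN 'PEAK'
         ELSE 'SLOPE'
       END AS point_type
FROM (
  SELECT primary_key, position, event_value,
         LEAD(event_value) OVER w AS lead_v,
         LAG(event_value)  OVER w AS lag_v
  FROM ticker
  WINDOW w AS (PARTITION BY primary_key ORDER BY position)
)
ORDER BY position
"""
result = {pos: kind for pos, _, kind in conn.execute(query)}
# position 2 (value 12) is a PEAK; position 4 (value 9) is a VALLEY
```

The declarative form carries over unchanged between engines, which is exactly the portability argument the slide makes for SQL.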

Pattern Matching with Oracle SQL

Finding patterns in stock market data, e.g. a double bottom (W shape), takes 250+ lines of Java UDF code but only 12 lines of Oracle SQL with MATCH_RECOGNIZE (roughly 20x less code). The Java excerpt on the slide classifies each quote against its neighbours as start (S), top (T), bottom (B), end (E), or flat (F) using helper comparisons (eq, gt, lt on string values), buffers lines in a FIFO, and emits matches from a Pig DataBag. The same pattern in SQL:

SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
  PARTITION BY name
  ORDER BY time
  MEASURES FIRST(x.time) AS first_x,
           LAST(z.time) AS last_z
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)),
         W AS (price < PREV(price)),
         Z AS (price > PREV(price) AND z.time - FIRST(x.time) <= 7)
)

Security: Virtual Private Database with Oracle SQL

SELECT * FROM my_bigdata_table
WHERE SALES_REP_ID = SYS_CONTEXT('USERENV', 'SESSION_USER');

Oracle Virtual Private Database (VPD) enables you to create security policies that control database access at the row and column level. Oracle VPD adds a dynamic WHERE clause to any SQL statement issued against the table, view, or synonym to which a VPD security policy was applied. Because the security policies are attached directly to the database objects (tables, views) and are applied automatically whenever a user accesses data, there is no way to bypass security. With Big Data SQL, Oracle Virtual Private Database is available for Hadoop data: the filter on SESSION_USER is evaluated by Big Data SQL on the Hadoop cluster.
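The dynamic-WHERE-clause mechanism described above can be sketched in a few lines. This is a conceptual illustration only, with hypothetical names; real VPD policies are PL/SQL functions registered in the database (via the DBMS_RLS package), not application code.

```python
# A policy function returns a predicate for the current session; the
# database appends it transparently to every statement against the
# protected object, so the caller cannot bypass it.
def vpd_policy(session_user: str) -> str:
    """Return the row-level predicate for the current session user."""
    return f"SALES_REP_ID = '{session_user}'"

def apply_policy(sql: str, session_user: str) -> str:
    """Append the policy predicate, mimicking VPD's dynamic WHERE clause."""
    joiner = " AND " if " WHERE " in sql.upper() else " WHERE "
    return sql + joiner + vpd_policy(session_user)

rewritten = apply_policy("SELECT * FROM my_bigdata_table", "JSMITH")
# rewritten: "SELECT * FROM my_bigdata_table WHERE SALES_REP_ID = 'JSMITH'"
```

The point of the slide is that with Big Data SQL this rewritten predicate is pushed down and evaluated on the Hadoop side, not after the data has reached the database.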

Security: Data Redaction with Oracle SQL

DBMS_REDACT.ADD_POLICY(
  object_schema       => 'MCLICK',
  object_name         => 'TWEET_V',
  column_name         => 'USERNAME',
  policy_name         => 'tweet_redaction',
  function_type       => DBMS_REDACT.PARTIAL,
  function_parameters => 'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25',
  expression          => '1=1'
);

Oracle Data Redaction enables you to create security policies that control what data is visible in sensitive columns containing personal or security information. Oracle Data Redaction dynamically applies a redaction function to columns; the function transforms, obfuscates, or hides the sensitive information from unauthorized users. Since the policy is applied automatically by Oracle Database, there is no way to bypass security and obtain the unredacted data. With Big Data SQL, Oracle Data Redaction is available for Hadoop data.

How Oracle Big Data SQL Works Marriage of Hadoop and Oracle Database Query Processing 17

Hadoop data accessible through Oracle External Tables

CREATE TABLE movielog (click VARCHAR2(4000))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY Dir1
  ACCESS PARAMETERS (
    com.oracle.bigdata.tablename logs
    com.oracle.bigdata.cluster mycluster
  )
) REJECT LIMIT UNLIMITED

- New ORACLE_HIVE and ORACLE_HDFS access drivers with a new set of properties that identify a Hadoop cluster, data source, column mapping, error handling, overflow handling, and logging
- New table metadata passed from the Oracle DDL to the Hadoop readers at query execution
- Architected for extensibility: the StorageHandler capability enables future support for many other data sources, for example MongoDB, HBase, and Oracle NoSQL DB

How Oracle executes a query with Big Data SQL
1. Query compilation determines data locations, data structure, and parallelism (consulting the HDFS NameNode and the Hive Metastore).
2. Fast reads using the Big Data SQL Server on the HDFS Data Nodes: schema-for-read using Hadoop classes; Smart Scan selects only the relevant data.
3. Process the filtered result: move the relevant data to the database, join it with database tables, and apply database security policies.

Big Data SQL: A New Hadoop Processing Engine
- Processing layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
- Resource management: YARN, cgroups
- Storage layer: filesystem (HDFS) and NoSQL databases (Oracle NoSQL DB, HBase)

Big Data SQL Uses the Hive Metastore, not MapReduce. The Hive Metastore is the common semantic repository (schemas, Java classes) for most SQL-on-Hadoop tools, including Oracle Big Data SQL, SparkSQL, Hive, and Impala. The metastore maps DDL to the Java access classes.

How Data is Stored in Hadoop: Example of a 1TB JSON File

{"custid":1185972,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custid":1354924,"movieid":1948,"genreid":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custid":1083711,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custid":1234182,"movieid":11547,"genreid":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custid":1010220,"movieid":11547,"genreid":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custid":1143971,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custid":1253676,"movieid":null,"genreid":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custid":1351777,"movieid":608,"genreid":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custid":1143971,"movieid":null,"genreid":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custid":1363545,"movieid":27205,"genreid":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custid":1067283,"movieid":1124,"genreid":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custid":1126174,"movieid":16309,"genreid":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custid":1234182,"movieid":11547,"genreid":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
{"custid":1346299,"movieid":424,"genreid":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}

The records are stored line by line across HDFS blocks (B1, B2, B3, ...). 1 block = 256 MB, so the example file = 4096 blocks, giving 4096 InputSplits, i.e. a potential scan parallelism of 4096.
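The block arithmetic on this slide can be checked directly; a one-liner, using the slide's numbers:

```python
# A 1 TB file stored in 256 MB HDFS blocks splits into 4096 blocks,
# i.e. up to 4096 InputSplits of potential scan parallelism.
file_size = 1 * 1024**4      # 1 TB in bytes
block_size = 256 * 1024**2   # 256 MB in bytes
splits = file_size // block_size
# splits == 4096
```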

How MapReduce and Hive Read Data (Hive Storage Handler)
- RecordReader: scans the data (keys and values)
- InputFormat: defines the parallelism
- SerDe: makes the columns
- Metastore: maps DDL to the Java access classes
The scan and row creation need to work on any data format; data definitions and column deserializations are needed to present the scanned Data Node disks to the consumer as a table of rows and columns.

Big Data SQL Server Dataflow
1. Read data from the HDFS Data Node disks: direct-path reads, using C-based readers when possible and the native Hadoop classes (RecordReader, SerDe) otherwise.
2. Translate the bytes to Oracle format (External Table Services).
3. Apply Smart Scan to the Oracle bytes: apply filters, project columns, parse JSON/XML, score models.

Operations Pushed Down to Hadoop

Pushed down to the Big Data SQL cell (Smart Scan; only the relevant data are emitted in the Oracle data stream):
- Hadoop scans (InputFormat, SerDe)
- JSON parsing
- WHERE clause evaluation
- Storage index evaluation
- Column projection
- Bloom filters for faster joins
- Scoring of Data Mining models

Handled by Oracle Database:
- Query compilation & optimization
- Joins
- Aggregations
- Ordering of results
- PL/SQL evaluation
- Table functions
- Security policies
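The "Bloom filters for faster joins" item deserves a word of explanation: the database can build a compact filter from the join keys on one side of a join and ship it to the cells, which then drop rows whose key definitely cannot join before any data crosses the wire. The sketch below is purely illustrative (not Oracle's implementation, and the key values are made up):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hashes set bits in one big integer."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitset stored in a Python big integer

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits >> pos & 1 for pos in self._positions(key))

# Build the filter from the dimension-side join keys...
bf = BloomFilter()
for msisdn in ("420111222333", "420444555666"):  # hypothetical keys
    bf.add(msisdn)

# ...and use it on the fact side to prune rows before shipping them.
keep_row = bf.might_contain("420111222333")  # True: row is shipped
```

False positives are possible (a "possibly present" row may still fail the real join), but false negatives are not, which is what makes the filter safe to evaluate early on the Hadoop side.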

Oracle Big Data SQL Storage Index

Example: find all ratings from movies with a MOVIE_ID of 1109. HDFS Block1 (256MB) holds the MOVIE_ID values 1001, 1010, 1045, 1109, 1043, 1001, 1045, 1609, 1043, so the index records min 1001 and max 1609 for B1. HDFS Block2 (256MB) holds 11455, 1909, 12430, 13010, 10450, 1909, 2043, so the index records min 1909 and max 13010 for B2. Only Block1 can contain the value 1109, so Block2 is never read.

- The storage index provides query speed-up through transparent IO elimination of HDFS blocks.
- Columns in SQL are mapped to fields in the HDFS file via the external table definition.
- The min/max value of each field is recorded for each HDFS block in an in-memory storage index.
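A hedged sketch of the storage-index idea, using the MOVIE_ID values from the slide: record min/max per HDFS block, then skip any block whose range cannot contain the predicate value.

```python
# Per-block field values, as laid out on the slide.
blocks = {
    "B1": [1001, 1010, 1045, 1109, 1043, 1001, 1045, 1609, 1043],
    "B2": [11455, 1909, 12430, 13010, 10450, 1909, 2043],
}

# The in-memory storage index: (min, max) per 256 MB HDFS block.
index = {name: (min(vals), max(vals)) for name, vals in blocks.items()}

def blocks_to_scan(target: int):
    """Return only the blocks whose [min, max] range can contain target."""
    return [name for name, (lo, hi) in index.items() if lo <= target <= hi]

scanned = blocks_to_scan(1109)  # only B1 has min <= 1109 <= max
```

Any block whose range excludes the target is eliminated without a single byte of it being read, which is where the transparent IO savings come from.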

Oracle Parallel Query and Hadoop
1. Determine Hadoop parallelism: determine schema-for-read, determine the InputSplits, and arrange the splits for best performance (using the HDFS NameNode and the Hive Metastore).
2. Map to Oracle parallelism: map the splits to granules and assign batches of granules to PX servers.
3. Route the work: send granule requests asynchronously to the cells and reap the results.

Oracle and Hadoop Parallelism

On the Hadoop side, parallelism is defined by the Hadoop InputSplits (fan-out from the PX servers to the Hadoop hosts): granule requests are sent asynchronously and utilize as many cores as provided by cgroups (first-come-first-served). On the database side, parallelism is defined by the Degree of Parallelism (DOP), set dynamically, per statement, or at table level; the DOP can be throttled by the database if the maximum DOP is exceeded or the table is too small.

SELECT /*+ PARALLEL(EVE,8) */
       CUST.NAME, CUST.MSISDN, EVE.MONTH, EVE.EVENT_TYPE,
       COUNT(*) AS EVENT_COUNT,
       SUM(EVE.DURATION) AS DURATION
FROM D_CUSTOMERS CUST, F_NETWORK_EVENTS EVE
WHERE CUST.MSISDN = EVE.MSISDN
GROUP BY CUST.NAME, CUST.MSISDN, EVE.MONTH, EVE.EVENT_TYPE
ORDER BY 1, 2, 3, 4

Big Data SQL Prerequisites
- Oracle Database 12c on Linux, running on Oracle Exadata
- Oracle Big Data Appliance with CDH
- InfiniBand interconnect between Oracle Exadata and Oracle Big Data Appliance

Demonstration Big Data SQL in Action 30

(Slides 31-35: demonstration screenshots.)

Questions and Answers (provided we still have some time)

Knowledge Check: True or False? Hive is leveraged by Big Data SQL as a query execution engine, allowing Big Data SQL queries to automatically execute faster as the Hive execution engine improves (e.g. Spark replaces MapReduce).
False. Big Data SQL leverages only the Hive Metastore (HCatalog) and the corresponding classes (InputFormat, RecordReader, SerDe); it does not use Hive for execution.

Knowledge Check: True or False? Oracle Big Data SQL sends all the data from Hadoop to Oracle Database, where the query is processed: Oracle Database does the column selection and applies the WHERE, GROUP BY, and ORDER BY clauses, etc.
False. Big Data SQL Smart Scan performs the low-level processing of the query: Smart Scan does the column projection, applies the WHERE conditions and Bloom filters, processes JSON, etc.

Knowledge Check: True or False? Oracle's ambition with Big Data SQL is to supersede all the SQL-on-Hadoop engines like Hive, SparkSQL, Impala, Drill, or Presto.
False. Big Data SQL is for companies with significant Oracle assets (e.g. an Oracle data warehouse) that want to access and process both Hadoop and Oracle data from a single SQL environment.

Thank You!

Safe Harbor Statement: The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
