Oracle Big Data SQL Konference Data a znalosti 2015
1 Oracle Big Data SQL
Konference Data a znalosti 2015
Jakub ILLNER, Information Management Architect, XLOB Enterprise Cloud Architects
23 July 2015, version 2
2 Agenda
- Is SQL Dead?
- Introducing Oracle Big Data SQL
- How Oracle Big Data SQL Works
- Demonstration
- Questions and Answers
3 Is SQL Dead? With all the Big Data and NoSQL technologies, why bother with SQL?
4 Is SQL Dead?
5 Peaks and Valleys in Spark vs. SQL
Finding Peaks and Valleys in Stock Market Data
[Chart: Ticker; time axis 10:00 to 10:25]
132 Lines of Scala/Spark (excerpts shown on the slide):
    package com.hadooparchitecturebook.spark.peaksandvalleys

    import org.apache.hadoop.io.{Text, LongWritable}
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.rdd.ShuffledRDD
    import org.apache.spark.{Partitioner, SparkContext, SparkConf}
    import scala.collection.mutable

    /**
     * Created by ted.malaska on 12/7/14.
     */
    object SparkPeaksAndValleysExecution {
      def main(args: Array[String]): Unit = {
        if (args.length == 0) {

    //Part 5
    // Keeping context
    var lastUniqueId = "foobar"
    var lastRecord: (DataKey, Int) = null
    var lastLastRecord: (DataKey, Int) = null
    var position = 0
    it.foreach( r => {
      position = position + 1
      if (!lastUniqueId.equals(r._1.uniqueId)) {
        lastRecord = null
        lastLastRecord = null
      }
      //Part 6 : Finding those peaks and valleys
      if (lastRecord != null && lastLastRecord != null) {
        if (lastRecord._2 < r._2 && lastRecord._2 < lastLastRecord._2) {
          results.+=(new PivotPoint(r._1.uniqueId, position, lastRecord._1.eventTime, lastRecord._2, false))
        } else if (lastRecord._2 > r._2 && lastRecord._2 > lastLastRecord._2) {
          results.+=(new PivotPoint(r._1.uniqueId, position, lastRecord._1.eventTime, lastRecord._2, true))
        }
      }
      lastUniqueId = r._1.uniqueId
      lastLastRecord = lastRecord
      lastRecord = r
    })
    results.iterator
    })
    //Part 7 : pretty everything up
    pivotPointRdd.map(r => {
      val pivotType = if (r.isPeak) "peak" else "valley"
      r.uniqueId + "," + r.position + "," + r.eventTime + "," + r.eventValue + "," + pivotType
    }).saveAsTextFile(outputPath)

    class DataKey(val uniqueId: String, val eventTime: Long) extends Serializable with Comparable[DataKey] {
      override def compareTo(other: DataKey): Int = {
        val compare1 = uniqueId.compareTo(other.uniqueId)
        if (compare1 == 0) {
          eventTime.compareTo(other.eventTime)
        } else {
          compare1
        }
      }
    }

    class PivotPoint(val uniqueId: String, val position: Int, val eventTime: Long, val eventValue: Int, val isPeak: Boolean) extends Serializable {
    }
17 Lines of SQL:
    SELECT PRIMARY_KEY, POSITION, EVENT_VALUE,
      CASE
        WHEN LEAD_EVENT_VALUE IS NULL OR LAG_EVENT_VALUE IS NULL THEN 'EDGE'
        WHEN EVENT_VALUE < LEAD_EVENT_VALUE AND EVENT_VALUE < LAG_EVENT_VALUE THEN 'VALLEY'
        WHEN EVENT_VALUE > LEAD_EVENT_VALUE AND EVENT_VALUE > LAG_EVENT_VALUE THEN 'PEAK'
        ELSE 'SLOPE'
      END AS POINT_TYPE
    FROM (
      SELECT PRIMARY_KEY, POSITION, EVENT_VALUE,
        LEAD(EVENT_VALUE,1,null) OVER (PARTITION BY PRIMARY_KEY ORDER BY POSITION) AS LEAD_EVENT_VALUE,
        LAG(EVENT_VALUE,1,null) OVER (PARTITION BY PRIMARY_KEY ORDER BY POSITION) AS LAG_EVENT_VALUE
      FROM PEAK_AND_VALLEY_TABLE
    )
7x less code
Example taken from Hadoop Application Architectures by Mark Grover, Ted Malaska, Jonathan Seidman & Gwen Shapira (O'Reilly, July 2015)
Copyright 2014, Oracle and/or its affiliates. All rights reserved.
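The LEAD/LAG classification on slide 5 can also be read as a plain loop over each ordered partition. The following is a minimal Python sketch of that logic, not the book's or Oracle's implementation; the function name `classify_points` and the sample rows are hypothetical, and rows are assumed to be `(primary_key, position, event_value)` tuples with non-null values.

```python
from itertools import groupby
from operator import itemgetter


def classify_points(rows):
    """Label rows as EDGE, VALLEY, PEAK or SLOPE, mirroring the SQL CASE expression."""
    labelled = []
    # PARTITION BY primary_key ORDER BY position
    ordered = sorted(rows, key=itemgetter(0, 1))
    for key, group in groupby(ordered, key=itemgetter(0)):
        series = list(group)
        for i, (_, position, value) in enumerate(series):
            lag = series[i - 1][2] if i > 0 else None                 # LAG(EVENT_VALUE, 1)
            lead = series[i + 1][2] if i + 1 < len(series) else None  # LEAD(EVENT_VALUE, 1)
            if lag is None or lead is None:
                point_type = "EDGE"
            elif value < lead and value < lag:
                point_type = "VALLEY"
            elif value > lead and value > lag:
                point_type = "PEAK"
            else:
                point_type = "SLOPE"
            labelled.append((key, position, value, point_type))
    return labelled


if __name__ == "__main__":
    # Hypothetical ticker values for illustration only.
    sample = [("ORCL", 1, 10), ("ORCL", 2, 12), ("ORCL", 3, 9), ("ORCL", 4, 11), ("ORCL", 5, 13)]
    for row in classify_points(sample):
        print(row)
```

The sketch shows why the SQL version is shorter: the window functions replace the manual bookkeeping of the previous and next record.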
6 Power of SQL
- Declarative
- Abstracted (from storage)
- Concise
- Powerful
- Simple to learn
- Rich analytical functions
- Fast
- Secure
- Standardized
- Widely used & known
7 SQL on Hadoop
Hive
- First SQL engine on Hadoop
- Uses MapReduce for execution
- Contains metastore (HCatalog)
- New Hive-on-Spark project
Impala
- Developed by Cloudera
- Fast, in-memory execution
- Introduced Parquet format
- Compatible with Hive
SparkSQL
- Spark module for accessing structured data
- Fast, in-memory execution
- Compatible with Hive
Presto
- Developed by Facebook
- Fast, low latency execution
- Compatible with Hive
- Connectivity to other sources
8 What if you need to query Hadoop & RDBMS data?
- Pure SQL-on-Hadoop engines can access data in Hadoop only (HDFS, Hive, Parquet, ORC, HBase etc.)
- Performance of BI tools like Cognos, Oracle BIEE, SAS, Tableau for large and complex federated queries is limited
- A possible solution is to use the SQL interface & Hadoop integration available with the DBMS platform of choice
9 Which SQL-on-Hadoop approach will you use most? From 9% in 2014 to 19% in
10 Introducing Oracle Big Data SQL: One SQL to query ALL the data
11 What if you could
- Make all data easily available to all your Oracle Database applications
- While supporting the full breadth of Oracle SQL query language
- With all the security of Oracle Database 12c
- Without moving data between your Hadoop cluster and the RDBMS
- And deliver fast query performance
- While leveraging your existing skills
- And still utilize the latest Hadoop innovations
12 One SQL to query ALL the data
[Diagram labels: CQL, N1QL, UnQL, SQL, HiveQL, NoSQL]
13 Rich Analytical Functions with Oracle SQL
- Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
- Window Aggregate functions (moving and cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value
- LAG/LEAD functions: direct inter-row reference using offsets
- Reporting Aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report
- Statistical Aggregates: correlation, linear regression family, covariance
- Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs. Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
- Descriptive Statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
- Correlations: Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
- Cross Tabs: enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
- Hypothesis Testing: Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
- Distribution Fitting: Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
14 Pattern Matching With Oracle SQL
Finding Patterns in Stock Market Data - Double Bottom (W)
[Chart: Ticker; time axis 10:00 to 10:25]
250+ Lines of Java UDF (excerpts shown on the slide):
    next = linenext.getQuantity();
    if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) { state = "S"; return state; }
    if (gt(q, prev) && gt(q, next)) { state = "T"; return state; }
    if (lt(q, prev) && lt(q, next)) { state = "B"; return state; }
    if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) { state = "E"; return state; }
    if (q.isEmpty() || eq(q, prev)) { state = "F"; return state; }
    return state;
    }

    private boolean eq(String a, String b) {
      if (a.isEmpty() || b.isEmpty()) { return false; }
      return a.equals(b);
    }
    private boolean gt(String a, String b) {
      if (a.isEmpty() || b.isEmpty()) { return false; }
      return Double.parseDouble(a) > Double.parseDouble(b);
    }
    private boolean lt(String a, String b) {
      if (a.isEmpty() || b.isEmpty()) { return false; }
      return Double.parseDouble(a) < Double.parseDouble(b);
    }
    public String getState() { return this.state; }

    BagFactory bagfactory =
    public Tuple exec(Tuple input) throws IOException {
      long c = 0;
      String line = "";
      String pbkey = "";
      V0Line nextline;
      V0Line thisline;
      V0Line processline;
      V0Line evalline = null;
      V0Line prevline;
      boolean nomorevalues = false;
      String matchlist = "";
      ArrayList<V0Line> linefifo = new ArrayList<V0Line>();
      boolean finished = false;
      DataBag output = bagfactory.newDefaultBag();
      if (input == null) { return null; }
      if (input.size() == 0) { return null; }
      Object o = input.get(0);
      if (o == null) { return null; }
      //Object o = input.get(0);
      if (!(o instanceof DataBag)) { int errcode = 2114;
12 Lines of Oracle SQL:
    SELECT first_x, last_z
    FROM ticker MATCH_RECOGNIZE (
      PARTITION BY name ORDER BY time
      MEASURES FIRST(x.time) AS first_x,
               LAST(z.time) AS last_z
      ONE ROW PER MATCH
      PATTERN (X+ Y+ W+ Z+)
      DEFINE X AS (price < PREV(price)),
             Y AS (price > PREV(price)),
             W AS (price < PREV(price)),
             Z AS (price > PREV(price) AND z.time - FIRST(x.time) <= 7 ))
20x less code
15 Security: Virtual Private Database with Oracle SQL
    SELECT * FROM my_bigdata_table
    WHERE SALES_REP_ID = SYS_CONTEXT('USERENV','SESSION_USER');
[Diagram labels: Big Data SQL on Hadoop Cluster, Filter on SESSION_USER, Oracle Database 12c]
- Oracle Virtual Private Database (VPD) enables you to create security policies to control database access at the row and column level.
- Oracle VPD adds a dynamic WHERE clause to a SQL statement that is issued against the table, view, or synonym to which an Oracle Virtual Private Database security policy was applied.
- Because you attach security policies directly to the database objects (tables, views), and the policies are automatically applied whenever a user accesses data, there is no way to bypass security.
- With Big Data SQL the Oracle Virtual Private Database is available for Hadoop data.
16 Security: Data Redaction with Oracle SQL
    DBMS_REDACT.ADD_POLICY(
      object_schema => 'MCLICK',
      object_name => 'TWEET_V',
      column_name => 'USERNAME',
      policy_name => 'tweet_redaction',
      function_type => DBMS_REDACT.PARTIAL,
      function_parameters => 'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25',
      expression => '1=1' );
[Diagram labels: Big Data SQL on Hadoop Cluster, ***, Oracle Database 12c]
- Oracle Data Redaction enables you to create security policies to control what data is visible for sensitive columns with personal or security information.
- Oracle Data Redaction dynamically applies a redaction function to columns. The function transforms, obfuscates or hides the sensitive information for unauthorized users.
- Since the policy is applied automatically by Oracle Database, there is no way to bypass security and get the un-redacted data.
- With Big Data SQL, Oracle Data Redaction is available for Hadoop data.
17 How Oracle Big Data SQL Works: Marriage of Hadoop and Oracle Database Query Processing
18 Hadoop data accessible through Oracle External Tables
    CREATE TABLE movielog (click VARCHAR2(4000))
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_HIVE
      DEFAULT DIRECTORY Dir1
      ACCESS PARAMETERS (
        com.oracle.bigdata.tablename logs
        com.oracle.bigdata.cluster mycluster)
    )
    REJECT LIMIT UNLIMITED
New set of properties
- ORACLE_HIVE and ORACLE_HDFS access drivers
- Identify a Hadoop cluster, data source, column mapping, error handling, overflow handling, logging
New table metadata passed from Oracle DDL to Hadoop readers at query execution
Architected for extensibility
- StorageHandler capability enables future support for many other data sources
- Examples: MongoDB, HBase, Oracle NoSQL DB
19 How Oracle executes a query with Big Data SQL
[Diagram labels: HDFS NameNode, Hive Metastore, HDFS Data Node with BDSQL (twice), Big Data SQL on Hadoop Cluster, Oracle Database 12c]
Query compilation determines:
- Data locations
- Data structure
- Parallelism
Fast reads using Big Data SQL Server
- Schema-for-read using Hadoop classes
- Smart Scan selects only relevant data
Process filtered result
- Move relevant data to database
- Join with database tables
- Apply database security policies
20 Big Data SQL: A New Hadoop Processing Engine
- Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
- Resource Management (YARN, cgroups)
- Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)
21 Big Data SQL Uses Hive Metastore, not MapReduce
[Diagram labels: Oracle Big Data SQL, SparkSQL, Hive, Impala, Hive Metastore]
- Common semantic repository (schemas, Java classes) for most SQL-on-Hadoop tools
- Metastore maps DDL to Java access classes
22 How Data is Stored in Hadoop
Example: 1TB JSON File
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:07","recommended":null,"activity":8}
    {"custid": ,"movieid":1948,"genreid":9,"time":" :00:00:22","recommended":"n","activity":7}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:26","recommended":null,"activity":9}
Block B1
    {"custid": ,"movieid":11547,"genreid":44,"time":" :00:00:32","recommended":"y","activity":7}
    {"custid": ,"movieid":11547,"genreid":44,"time":" :00:00:42","recommended":"y","activity":6}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:43","recommended":null,"activity":8}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:50","recommended":null,"activity":9}
    {"custid": ,"movieid":608,"genreid":6,"time":" :00:01:03","recommended":"n","activity":7}
Block B2
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:01:07","recommended":null,"activity":9}
    {"custid": ,"movieid":27205,"genreid":9,"time":" :00:01:18","recommended":"y","activity":7}
    {"custid": ,"movieid":1124,"genreid":9,"time":" :00:01:26","recommended":"y","activity":7}
    {"custid": ,"movieid":16309,"genreid":9,"time":" :00:01:35","recommended":"n","activity":7}
    {"custid": ,"movieid":11547,"genreid":44,"time":" :00:01:39","recommended":"y","activity":7}
Block B3
    {"custid": ,"movieid":424,"genreid":1,"time":" :00:05:02","recommended":"y","activity":4}
- 1 block = 256 MB
- Example File = 4096 blocks
- InputSplits = 4096
- Potential scan parallelism
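The block count on slide 22 follows from simple division. The sketch below reproduces the arithmetic in Python; it assumes binary units (1 TB = 1024^4 bytes, 1 MB = 1024^2 bytes) and one InputSplit per HDFS block, which the slide implies but does not state. The function name `count_blocks` is hypothetical.

```python
def count_blocks(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed for a file, rounding the last partial block up."""
    return -(-file_size_bytes // block_size_bytes)  # integer ceiling division


if __name__ == "__main__":
    one_tb = 1024 ** 4            # assumed binary terabyte
    block = 256 * 1024 ** 2       # 256 MB block size from the slide
    blocks = count_blocks(one_tb, block)
    # With one InputSplit per block (assumption), this is also the potential scan parallelism.
    print(blocks)
```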
23 How MapReduce and Hive Read Data
[Diagram labels: Hive Storage Handler, Consumer, Create ROWS & COLUMNS, SCAN, Data Node disk]
- Scan and row creation need to be able to work on any data format
- Data definitions and column deserializations are needed to provide a table
- RecordReader => Scans data (keys and values)
- InputFormat => Defines parallelism
- SerDe => Makes columns
- Metastore => Maps DDL to Java access classes
24 Big Data SQL Server Dataflow
[Diagram labels: Big Data SQL Server, Smart Scan, External Table Services, SerDe, RecordReader, Data Node Disks]
Read data from HDFS Data Node
- Direct-path reads
- C-based readers when possible
- Use native Hadoop classes otherwise
Translate bytes to Oracle
Apply Smart Scan to Oracle bytes
- Apply filters
- Project columns
- Parse JSON/XML
- Score models
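The Smart Scan steps named on slide 24 (parse JSON, apply filters, project columns) can be pictured with a small generator. This is a conceptual Python sketch only, not how Big Data SQL Server is implemented; the function name `smart_scan`, its parameters, and the sample records (including the custid values) are hypothetical.

```python
import json


def smart_scan(json_lines, predicate, columns):
    """Parse each JSON line, keep rows that pass the filter, and emit only the requested columns."""
    for line in json_lines:
        record = json.loads(line)                               # parse JSON
        if predicate(record):                                   # apply filter (WHERE clause)
            yield tuple(record.get(name) for name in columns)   # project columns


if __name__ == "__main__":
    # Hypothetical records shaped like the movie log sample; custid values are made up.
    sample = [
        '{"custid": 1, "movieid": 1948, "genreid": 9, "recommended": "n", "activity": 7}',
        '{"custid": 2, "movieid": null, "genreid": null, "recommended": null, "activity": 8}',
        '{"custid": 3, "movieid": 11547, "genreid": 44, "recommended": "y", "activity": 7}',
    ]
    for row in smart_scan(sample, lambda r: r["recommended"] == "y", ["custid", "movieid"]):
        print(row)
```

Only the projected columns of matching rows leave the generator, which is the point of the slide: less data travels from the Hadoop node to the database.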
25 Operations Pushed Down to Hadoop
[Diagram labels: Big Data SQL on Hadoop Cluster, Oracle Database 12c, Request, JSON, Oracle Data Stream]
Pushed down to Big Data SQL Cell
- Hadoop scans (InputFormat, SerDe)
- JSON parsing
- WHERE clause evaluation
- Storage index evaluation
- Column projection
- Bloom filters for faster joins
- Score Data Mining models
Smart Scan: only relevant data are emitted
Handled by Oracle Database
- Query Compilation & Optimization
- Joins
- Aggregations
- Ordering of results
- PL/SQL evaluation
- Table functions
- Security policies
26 Oracle Big Data SQL Storage Index
Example: Find all ratings from movies with a MOVIE_ID of 1109
[Diagram: HDFS file with Field 1, Field 2, Field 3, ..., Field n, stored as HDFS Block1 (256MB) and HDFS Block2 (256MB)]
Index
- B1 Movie_ID Min: 1001 Max: 1609
- B2 Movie_ID Min: 1909 Max:
Notes
- Storage index provides query speed-up through transparent I/O elimination of HDFS blocks
- Columns in SQL are mapped to fields in the HDFS file via External Table Definitions
- Min / max value is recorded for each HDFS block in an in-memory storage index
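The min/max idea on slide 26 can be sketched in a few lines of Python. This is an illustration of block elimination under stated assumptions, not Oracle's storage index implementation; the function names and the block contents are hypothetical, and the maximum used for block B2 is invented because the slide does not give it.

```python
def build_storage_index(blocks, column):
    """Record (min, max) of a column for every block that has at least one non-null value."""
    index = {}
    for block_id, rows in blocks.items():
        values = [row[column] for row in rows if row[column] is not None]
        if values:
            index[block_id] = (min(values), max(values))
    return index


def blocks_to_scan(index, block_ids, value):
    """Return blocks that may contain the value; blocks without an index entry are always scanned."""
    return [
        block_id
        for block_id in block_ids
        if block_id not in index or index[block_id][0] <= value <= index[block_id][1]
    ]


if __name__ == "__main__":
    # Hypothetical block contents; B1 matches the slide's min/max, B2's maximum is made up.
    blocks = {
        "B1": [{"movie_id": 1001}, {"movie_id": 1109}, {"movie_id": 1609}],
        "B2": [{"movie_id": 1909}, {"movie_id": 2500}],
    }
    index = build_storage_index(blocks, "movie_id")
    print(blocks_to_scan(index, list(blocks), 1109))
```

For MOVIE_ID 1109 only B1 needs to be read, because 1109 lies outside the recorded range of B2.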
27 Oracle Parallel Query and Hadoop
[Diagram labels: HDFS NameNode, Hive Metastore, InputSplits, PX, Big Data SQL on Hadoop Cluster, Oracle Database 12c]
1 Determine Hadoop Parallelism
- Determine schema-for-read
- Determine InputSplits
- Arrange splits for best performance
2 Map to Oracle Parallelism
- Map splits to granules
- Assign batches of granules to PX Servers
3 PX Servers Route Work
- Send granule requests async to cells
- Reap results
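Step 2 on slide 27 (map splits to granules, assign batches of granules to PX Servers) can be pictured as follows. The slide does not say how many splits form a granule or how granules are distributed, so the fixed granule size and the round-robin assignment in this Python sketch are assumptions, and all function names are hypothetical.

```python
def map_splits_to_granules(splits, splits_per_granule):
    """Group consecutive InputSplits into granules of a fixed size (assumed policy)."""
    return [
        splits[start:start + splits_per_granule]
        for start in range(0, len(splits), splits_per_granule)
    ]


def assign_granules(granules, dop):
    """Distribute granules over PX servers 1..dop in round-robin order (assumed policy)."""
    assignment = {server: [] for server in range(1, dop + 1)}
    for position, granule in enumerate(granules):
        assignment[position % dop + 1].append(granule)
    return assignment


if __name__ == "__main__":
    splits = [f"split-{n}" for n in range(16)]   # hypothetical InputSplit names
    granules = map_splits_to_granules(splits, splits_per_granule=2)
    for server, batch in assign_granules(granules, dop=4).items():
        print(server, batch)
```

In this sketch the number of granules is fixed by the Hadoop side (InputSplits), while the number of workers is fixed by the database side (DOP), which matches the division of responsibilities the slide describes.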
28 Oracle and Hadoop Parallelism
Big Data SQL on Hadoop Cluster
[Diagram: Host 1 to Host 4, receiving 16 asynchronous Granule Requests]
- Parallelism defined by Hadoop InputSplits (fan-out from PX to Hadoop)
- Utilizes as many cores as provided by cgroups (first-come-first-served)
Oracle Database 12c
[Diagram: PX Server #1 to PX Server #8]
- Parallelism defined by Degree of Parallelism (DOP): dynamic, statement, table level
- DOP can be throttled by the database if maximum DOP is exceeded or the table is too small
    SELECT /*+PARALLEL(EVE,8)*/ CUST.NAME, CUST.MSISDN, EVE.MONTH, EVE.EVENT_TYPE,
           COUNT(*) AS EVENT_COUNT, SUM(EVE.DURATION) AS DURATION
    FROM D_CUSTOMERS CUST, F_NETWORK_EVENTS EVE
    WHERE CUST.MSISDN = EVE.MSISDN
    GROUP BY CUST.NAME, CUST.MSISDN, EVE.MONTH, EVE.EVENT_TYPE
    ORDER BY 1,2,3,4
29 Big Data SQL Prerequisites
- Oracle 12c on Linux
- Oracle Exadata
- Oracle Big Data Appliance
- Infiniband interconnection between Oracle Exadata and Oracle Big Data Appliance
[Diagram labels: Oracle Big Data Appliance with CDH, Infiniband, Oracle 12c on Oracle Exadata]
30 Demonstration: Big Data SQL in Action
31-35 [Demonstration slides: no text recovered]
36 Questions and Answers (provided we still have some time)
37 Knowledge Check: True or False
Statement: Hive is leveraged by Big Data SQL as a query execution engine, allowing BDS queries to automatically execute faster as the Hive execution engine improves (e.g. Spark replaces MapReduce).
Answer: False. Big Data SQL leverages only the Hive Metastore (HCatalog) and the corresponding classes (InputFormat, RecordReader, SerDe), but it does not use Hive for execution.
38 Knowledge Check: True or False
Statement: Oracle Big Data SQL sends all the data from Hadoop to Oracle Database, where the query is processed. Oracle Database does column selection and applies WHERE, GROUP BY and ORDER BY clauses etc.
Answer: False. Big Data SQL Smart Scan performs the low-level processing of the query. Smart Scan does the column projection, applies the WHERE condition & Bloom filters, processes JSON etc.
39 Knowledge Check: True or False
Statement: Oracle's ambition with Big Data SQL is to supersede all the SQL-on-Hadoop engines like Hive, SparkSQL, Impala, Drill or Presto.
Answer: False. Big Data SQL is for companies with significant Oracle assets (e.g. Oracle Data Warehouse) that want to access and process both Hadoop and Oracle data from a single SQL environment.
40 Thank You!
41 Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Oracle Big Data SQL Architectural Deep Dive
Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Safe Harbor Statement The following is intended to outline our general product direction. It is intended
SQL - the best analysis language for Big Data!
SQL - the best analysis language for Big Data! NoCOUG Winter Conference 2014 Hermann Bär, [email protected] Data Warehousing Product Management, Oracle 1 The On-Going Evolution of SQL Introduction
Seamless Access from Oracle Database to Your Big Data
Seamless Access from Oracle Database to Your Big Data Brian Macdonald Big Data and Analytics Specialist Oracle Enterprise Architect September 24, 2015 Agenda Hadoop and SQL access methods What is Oracle
Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle
Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is
Big Data SQL and Query Franchising
Big Data SQL and Query Franchising An Architecture for Query Beyond Hadoop Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor
Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models
Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models Charlie Berger Sr. Director Product Management, Data Mining Technologies Oracle Corporation [email protected]
The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG
The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG Presentation #730 Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com Presentation
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Statistical Analysis of Gene Expression Data With Oracle & R (- data mining)
Statistical Analysis of Gene Expression Data With Oracle & R (- data mining) Patrick E. Hoffman Sc.D. Senior Principal Analytical Consultant [email protected] Agenda (Oracle & R Analysis) Tools Loading
Constructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
OLSUG Workshop Oracle Data Mining
OLSUG Workshop Oracle Data Mining Charlie Berger Sr. Director of Product Mgmt, Life Sciences and Data Mining Oracle Corporation [email protected] Dr. Lutz Hamel Asst. Professor, Computer Science
Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich
Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich Agenda Introduction Old Times Exadata Big Data Oracle In-Memory Headquarters Conclusions 2 sumit
Oracle Data Mining In-Database Data Mining Made Easy!
Oracle Data Mining In-Database Data Mining Made Easy! Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics Oracle Corporation [email protected] www.twitter.com/charliedatamine
Safe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
Blazing BI: the Analytic Options to the Oracle Database. ODTUG Kscope 2013
Blazing BI: the Analytic Options to the Oracle Database ODTUG Kscope 2013 Dan Vlamis Tim Vlamis Vlamis Software Solutions 816-781-2880 http://www.vlamis.com Copyright 2013, Vlamis Software Solutions, Inc.
How To Manage Big Data In A Microsoft Cloud (Hadoop)
Oracle Database 12c and the Future of Data Warehousing in the Era of Big Data George Lumpkin Data Warehousing Neil Mendelson Big Data & Advanced AnalyEcs Vice Presidents Server Technologies September 29,
BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig
BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an
1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.
1 Copyright 2011, Oracle and/or its affiliates. FPO In-Database Analytics: Predictive Analytics, Data Mining, Exadata & Business Intelligence Charlie Berger Sr. Director Product Management, Data Mining
Oracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
Using distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
Integrate Master Data with Big Data using Oracle Table Access for Hadoop
Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler
Oracle's In-Database Statistical Functions
Oracle 11g DB Data Warehousing Oracle's In-Database Statistical Functions OLAP Statistics Data Mining Charlie Berger Sr. Director Product Management, Data Mining Technologies
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Data Domain Profiling and Data Masking for Hadoop
Data Domain Profiling and Data Masking for Hadoop 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or
Cloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
Using RDBMS, NoSQL or Hadoop?
Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest
Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE
Spring,2015 Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Contents: Briefly About Big Data Management What is hive? Hive Architecture Working
Semantic and Data Mining Technologies. Simon See, Ph.D.,
Semantic and Data Mining Technologies Simon See, Ph.D., Introduction to Semantic Web and Business Use Cases 2 Lots of Scientific Resources NAR 2009 over 1170 databases Reuse, Recycling, Repurposing Paul
The Hadoop Eco System Shanghai Data Science Meetup
The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related
How to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
Impala: A Modern, Open-Source SQL Engine for Hadoop. Marcel Kornacker Cloudera, Inc.
Impala: A Modern, Open-Source SQL Engine for Hadoop Marcel Kornacker Cloudera, Inc. Agenda Goals; user view of Impala Impala performance Impala internals Comparing Impala to other systems Impala Overview:
Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Big Data Analytics with Oracle Advanced Analytics In-Database Option
Big Data Analytics with Oracle Advanced Analytics In-Database Option Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/charliedatamine
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
Hadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Qsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
Real World Hadoop Use Cases
Real World Hadoop Use Cases JFokus 2013, Stockholm Eva Andreasson, Cloudera Inc. Lars Sjödin, King.com 1 2012 Cloudera, Inc. Agenda Recap of Big Data and Hadoop Analyzing Twitter feeds with Hadoop Real
Integrating Apache Spark with an Enterprise Data Warehouse
Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software
Internals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
Predictive Analytics for Better Business Intelligence
Oracle 11g DB Data Warehousing ETL OLAP Statistics Predictive Analytics for Better Business Intelligence Data Mining Charlie Berger Sr. Director Product Management, Data Mining Technologies
Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05
Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970
Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig
Introduction to Pig Agenda What is Pig? Key Features of Pig The Anatomy of Pig Pig on Hadoop Pig Philosophy Pig Latin Overview Pig Latin Statements Pig Latin: Identifiers Pig Latin: Comments Data Types
Moving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com [email protected] Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
Big Data Too Big To Ignore
Big Data Too Big To Ignore Geert! Big Data Consultant and Manager! Currently finishing a 3 rd Big Data project! IBM & Cloudera Certified! IBM & Microsoft Big Data Partner 2 Agenda! Defining Big Data! Introduction
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database
1 Best Practices for Extreme Performance with Data Warehousing on Oracle Database Rekha Balwada Principal Product Manager Agenda Parallel Execution Workload Management on Data Warehouse
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
COSC 6397 Big Data Analytics. 2nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf, .doc, .tex or .txt
Complete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12
Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language
Connecting Hadoop with Oracle Database
Connecting Hadoop with Oracle Database Sharon Stephen Senior Curriculum Developer Server Technologies Curriculum The following is intended to outline our general product direction.
Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah
Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents About the Authors About the Technical Reviewer Acknowledgments Introduction Chapter 1: Motivation for Big
MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
Executive Summary; Introduction; Defining Big Data; The Importance of Big Data; Building a Big Data Platform...
Executive Summary; Introduction; Defining Big Data; The Importance of Big Data; Building a Big Data Platform; Infrastructure Requirements; Solution Spectrum; Oracle's Big Data
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
HADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
MySQL and Hadoop. Percona Live 2014 Chris Schneider
MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for
Big Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22nd October 2013 10:00 Session B - DB2 LUW
Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22nd October 2013 10:00 Session B - DB2 LUW Agenda Big Data The Technical Challenges Architecture of Hadoop
Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.
Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Architecting for the Internet of Things & Big Data
Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon [email protected] [email protected] XLDB
Apache Sentry. Prasad Mujumdar [email protected] [email protected]
Apache Sentry Prasad Mujumdar [email protected] [email protected] Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark-project.org. University of California, Berkeley UC BERKELEY
Spark in Action Fast Big Data Analytics using Scala Matei Zaharia University of California, Berkeley www.spark-project.org UC BERKELEY My Background Grad student in the AMP Lab at UC Berkeley, 50-person
Parquet. Columnar storage for the people
Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li [email protected] Software engineer, Cloudera Impala Outline Context from various
Next-Gen Big Data Analytics using the Spark stack
Next-Gen Big Data Analytics using the Spark stack Jason Dai Chief Architect of Big Data Technologies Software and Services Group, Intel Agenda Overview Apache Spark stack Next-gen big data analytics Our
Oracle Data Integrator for Big Data. Alex Kotopoulis Senior Principal Product Manager
Oracle Data Integrator for Big Data Alex Kotopoulis Senior Principal Product Manager Hands on Lab - Oracle Data Integrator for Big Data Abstract: This lab will highlight to Developers, DBAs and Architects
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
extreme Datamining with Oracle R Enterprise
extreme Datamining mit Oracle R Enterprise Oliver Bracht Managing Director eoda Matthias Fuchs Senior Consultant ISE Information Systems Engineering GmbH extreme Datamining with Oracle R Enterprise About
How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division Paradigms in Big Data Structured (relational) data Very
Welcome! Copyright 2014 Oracle and/or its affiliates. All rights reserved.
Welcome! WHO? OGh board member with BI / WA experience. Determining the association's activities. Participating in the organizing committee of one or more events. Facilitating the SIGs. Editing OGh-Visie. Maintaining
Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Hadoop Ecosystem Overview of this Lecture Module Background Google MapReduce The Hadoop Ecosystem Core components: Hadoop
So What's the Big Deal?
So What's the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Inge Os Sales Consulting Manager Oracle Norway
Inge Os Sales Consulting Manager Oracle Norway Agenda Oracle Fusion Middleware Oracle Database 11GR2 Oracle Database Machine Oracle & Sun Agenda Oracle Fusion Middleware Oracle Database 11GR2 Oracle Database
Trafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @
The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what's hadoop...
Hadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
Best Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
Spark Lab 10. Prepared by George Nikolaides 4/19/2015
Spark Lab 10 Prepared by George Nikolaides 4/19/2015 Introduction to Apache Spark Another cluster computing framework Developed in the AMPLab at UC Berkeley Started in 2009 Open-sourced in 2010
Can the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
