Oracle Big Data SQL Konference Data a znalosti 2015

1 Oracle Big Data SQL
Konference Data a znalosti 2015
Jakub ILLNER, Information Management Architect, XLOB Enterprise Cloud Architects
23 July 2015, version 2

2 Agenda
Is SQL Dead?
Introducing Oracle Big Data SQL
How Oracle Big Data SQL Works
Demonstration
Questions and Answers

3 Is SQL Dead?
With all the Big Data and NoSQL technologies, why bother with SQL?

4 Is SQL Dead?

5 Peaks and Valleys in Spark vs. SQL
Finding Peaks and Valleys in Stock Market Data
(Chart: Ticker, time axis 10:00 to 10:25)

132 Lines of Scala/Spark (excerpts shown on the slide):

    package com.hadooparchitecturebook.spark.peaksandvalleys

    import org.apache.hadoop.io.{Text, LongWritable}
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.rdd.ShuffledRDD
    import org.apache.spark.{Partitioner, SparkContext, SparkConf}
    import scala.collection.mutable

    /**
     * Created by ted.malaska on 12/7/14.
     */
    object SparkPeaksAndValleysExecution {
      def main(args: Array[String]): Unit = {
        if (args.length == 0) {
        ...

        //Part 5 : Keeping context
        var lastUniqueId = "foobar"
        var lastRecord: (DataKey, Int) = null
        var lastLastRecord: (DataKey, Int) = null
        var position = 0

        it.foreach( r => {
          position = position + 1
          if (!lastUniqueId.equals(r._1.uniqueId)) {
            lastRecord = null
            lastLastRecord = null
          }
          //Part 6 : Finding those peaks and valleys
          if (lastRecord != null && lastLastRecord != null) {
            if (lastRecord._2 < r._2 && lastRecord._2 < lastLastRecord._2) {
              results.+=(new PivotPoint(r._1.uniqueId, position, lastRecord._1.eventTime, lastRecord._2, false))
            } else if (lastRecord._2 > r._2 && lastRecord._2 > lastLastRecord._2) {
              results.+=(new PivotPoint(r._1.uniqueId, position, lastRecord._1.eventTime, lastRecord._2, true))
            }
          }
          lastUniqueId = r._1.uniqueId
          lastLastRecord = lastRecord
          lastRecord = r
        })
        results.iterator
        })

        //Part 7 : pretty everything up
        pivotPointRDD.map(r => {
          val pivotType = if (r.isPeak) "peak" else "valley"
          r.uniqueId + "," + r.position + "," + r.eventTime + "," + r.eventValue + "," + pivotType
        }).saveAsTextFile(outputPath)

    class DataKey(val uniqueId: String, val eventTime: Long)
        extends Serializable with Comparable[DataKey] {
      override def compareTo(other: DataKey): Int = {
        val compare1 = uniqueId.compareTo(other.uniqueId)
        if (compare1 == 0) {
          eventTime.compareTo(other.eventTime)
        } else {
          compare1
        }
      }
    }

    class PivotPoint(val uniqueId: String, val position: Int, val eventTime: Long,
                     val eventValue: Int, val isPeak: Boolean) extends Serializable {}

17 Lines of SQL:

    SELECT PRIMARY_KEY, POSITION, EVENT_VALUE,
      CASE
        WHEN LEAD_EVENT_VALUE IS NULL OR LAG_EVENT_VALUE IS NULL THEN 'EDGE'
        WHEN EVENT_VALUE < LEAD_EVENT_VALUE AND EVENT_VALUE < LAG_EVENT_VALUE THEN 'VALLEY'
        WHEN EVENT_VALUE > LEAD_EVENT_VALUE AND EVENT_VALUE > LAG_EVENT_VALUE THEN 'PEAK'
        ELSE 'SLOPE'
      END AS POINT_TYPE
    FROM (
      SELECT PRIMARY_KEY, POSITION, EVENT_VALUE,
        LEAD(EVENT_VALUE,1,null) OVER (PARTITION BY PRIMARY_KEY ORDER BY POSITION) AS LEAD_EVENT_VALUE,
        LAG(EVENT_VALUE,1,null) OVER (PARTITION BY PRIMARY_KEY ORDER BY POSITION) AS LAG_EVENT_VALUE
      FROM PEAK_AND_VALLEY_TABLE
    )

7x less code
Example taken from Hadoop Application Architectures by Mark Grover, Ted Malaska, Jonathan Seidman & Gwen Shapira (O'Reilly, July 2015)
Copyright 2014, Oracle and/or its affiliates. All rights reserved.
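To make the LAG/LEAD logic of the SQL version concrete, here is a minimal Python sketch of the same classification. It is an illustration only, not part of the original deck: the function name classify_points and the tuple layout (primary_key, position, event_value) are assumptions, and NULL event values are not handled.

```python
from itertools import groupby
from operator import itemgetter


def classify_points(rows):
    """Label each row EDGE, VALLEY, PEAK or SLOPE.

    rows: iterable of (primary_key, position, event_value) tuples
    (a hypothetical layout mirroring PEAK_AND_VALLEY_TABLE).
    """
    labelled = []
    # PARTITION BY primary_key ORDER BY position
    ordered = sorted(rows, key=itemgetter(0, 1))
    for key, group in groupby(ordered, key=itemgetter(0)):
        series = list(group)
        for i, (_, position, value) in enumerate(series):
            # LAG / LEAD with offset 1 and a NULL (None) default
            lag = series[i - 1][2] if i > 0 else None
            lead = series[i + 1][2] if i + 1 < len(series) else None
            if lag is None or lead is None:
                point_type = "EDGE"
            elif value < lead and value < lag:
                point_type = "VALLEY"
            elif value > lead and value > lag:
                point_type = "PEAK"
            else:
                point_type = "SLOPE"
            labelled.append((key, position, value, point_type))
    return labelled
```

The sort plus groupby pair plays the role of the window clause; everything else is the CASE expression written out.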

6 Power of SQL
Declarative
Abstracted (from storage)
Concise
Powerful
Simple to learn
Rich analytical functions
Fast
Secure
Standardized
Widely used & known

7 SQL on Hadoop
Hive: first SQL engine on Hadoop; uses MapReduce for execution; contains metastore (HCatalog); new Hive-on-Spark project.
Impala: developed by Cloudera; fast, in-memory execution; introduced Parquet format; compatible with Hive.
SparkSQL: Spark module for accessing structured data; fast, in-memory execution; compatible with Hive.
Presto: developed by Facebook; fast, low latency execution; compatible with Hive; connectivity to other sources.

8 What if you need to query Hadoop & RDBMS data?
Pure SQL-on-Hadoop engines can access data in Hadoop only (HDFS, Hive, Parquet, ORC, HBase etc.).
Performance of BI tools like Cognos, Oracle BIEE, SAS and Tableau is limited for large and complex federated queries.
A possible solution is to use the SQL interface & Hadoop integration available with the DBMS platform of choice.

9 Which SQL-on-Hadoop approach will you use most? From 9% in 2014 to 19% in

10 Introducing Oracle Big Data SQL
One SQL to query ALL the data

11 What if you could
Make all data easily available to all your Oracle Database applications
While supporting the full breadth of Oracle SQL query language
With all the security of Oracle Database 12c
Without moving data between your Hadoop cluster and the RDBMS
And deliver fast query performance
While leveraging your existing skills
And still utilize the latest Hadoop innovations

12 One SQL to query ALL the data
(Diagram labels: CQL, N1QL, UnQL, SQL, HiveQL, NoSQL)

13 Rich Analytical Functions with Oracle SQL
Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
Window Aggregate functions (moving and cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value
LAG/LEAD functions: direct inter-row reference using offsets
Reporting Aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report
Statistical Aggregates: correlation, linear regression family, covariance
Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs. Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions.
Descriptive Statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
Correlations: Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
Cross Tabs: enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
Hypothesis Testing: Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
Distribution Fitting: Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential

14 Pattern Matching With Oracle SQL
Finding Patterns in Stock Market Data - Double Bottom (W)
(Chart: Ticker, time axis 10:00 to 10:25)

250+ Lines of Java UDF (fragments shown on the slide):

    next = linenext.getquantity();
    if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) {
      state = "S";
      return state;
    }
    if (gt(q, prev) && gt(q, next)) {
      state = "T";
      return state;
    }
    if (lt(q, prev) && lt(q, next)) {
      state = "B";
      return state;
    }
    if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) {
      state = "E";
      return state;
    }
    if (q.isEmpty() || eq(q, prev)) {
      state = "F";
      return state;
    }
    return state;
    }

    private boolean eq(String a, String b) {
      if (a.isEmpty() || b.isEmpty()) {
        return false;
      }
      return a.equals(b);
    }

    private boolean gt(String a, String b) {
      if (a.isEmpty() || b.isEmpty()) {
        return false;
      }
      return Double.parseDouble(a) > Double.parseDouble(b);
    }

    private boolean lt(String a, String b) {
      if (a.isEmpty() || b.isEmpty()) {
        return false;
      }
      return Double.parseDouble(a) < Double.parseDouble(b);
    }

    public String getstate() {
      return this.state;
    }

    BagFactory bagfactory = ...

    public Tuple exec(Tuple input) throws IOException {
      long c = 0;
      String line = "";
      String pbkey = "";
      V0Line nextline;
      V0Line thisline;
      V0Line processline;
      V0Line evalline = null;
      V0Line prevline;
      boolean nomorevalues = false;
      String matchlist = "";
      ArrayList<V0Line> linefifo = new ArrayList<V0Line>();
      boolean finished = false;

      DataBag output = bagfactory.newDefaultBag();
      if (input == null) {
        return null;
      }
      if (input.size() == 0) {
        return null;
      }
      Object o = input.get(0);
      if (o == null) {
        return null;
      }
      //Object o = input.get(0);
      if (!(o instanceof DataBag)) {
        int errcode = 2114;
        ...

12 Lines of Oracle SQL:

    SELECT first_x, last_z
    FROM ticker MATCH_RECOGNIZE (
      PARTITION BY name ORDER BY time
      MEASURES FIRST(x.time) AS first_x,
               LAST(z.time) AS last_z
      ONE ROW PER MATCH
      PATTERN (X+ Y+ W+ Z+)
      DEFINE X AS (price < PREV(price)),
             Y AS (price > PREV(price)),
             W AS (price < PREV(price)),
             Z AS (price > PREV(price) AND z.time - FIRST(x.time) <= 7)
    )

20x less code
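The idea behind PATTERN (X+ Y+ W+ Z+) can be sketched in Python by encoding each price move as a letter and matching a regular expression over the resulting string. This is a simplified illustration under stated assumptions: the function name find_double_bottoms and the (time, price) tuple layout are hypothetical, the input covers a single ticker already ordered by time, and the time-window condition on Z (z.time - FIRST(x.time) <= 7) is deliberately left out.

```python
import re


def find_double_bottoms(ticks):
    """Return (first_x_time, last_z_time) for each down-up-down-up run.

    ticks: list of (time, price) tuples for one ticker, ordered by time.
    """
    # Encode each row relative to its predecessor, like PREV(price):
    # D = price went down, U = price went up, F = flat.
    moves = []
    for (_, prev_price), (_, price) in zip(ticks, ticks[1:]):
        if price < prev_price:
            moves.append("D")
        elif price > prev_price:
            moves.append("U")
        else:
            moves.append("F")
    encoded = "".join(moves)

    matches = []
    # D+U+D+U+ corresponds to X+ Y+ W+ Z+; finditer yields
    # non-overlapping matches, one result row per match.
    for m in re.finditer(r"D+U+D+U+", encoded):
        # moves[i] describes ticks[i + 1], hence the index shift.
        first_x_time = ticks[m.start() + 1][0]
        last_z_time = ticks[m.end()][0]
        matches.append((first_x_time, last_z_time))
    return matches
```

Even this reduced version needs explicit index bookkeeping that MATCH_RECOGNIZE expresses declaratively through MEASURES and DEFINE.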

15 Security: Virtual Private Database with Oracle SQL

    SELECT * FROM my_bigdata_table
    WHERE SALES_REP_ID = SYS_CONTEXT('USERENV','SESSION_USER');

(Diagram: Big Data SQL on Hadoop Cluster, filter on SESSION_USER, Oracle Database 12c)
Oracle Virtual Private Database (VPD) enables you to create security policies to control database access at the row and column level.
Oracle VPD adds a dynamic WHERE clause to a SQL statement that is issued against the table, view, or synonym to which an Oracle Virtual Private Database security policy was applied.
Because you attach security policies directly to the database objects (tables, views), and the policies are automatically applied whenever a user accesses data, there is no way to bypass security.
With Big Data SQL, Oracle Virtual Private Database is available for Hadoop data.

16 Security: Data Redaction with Oracle SQL

    DBMS_REDACT.ADD_POLICY(
      object_schema => 'MCLICK',
      object_name => 'TWEET_V',
      column_name => 'USERNAME',
      policy_name => 'tweet_redaction',
      function_type => DBMS_REDACT.PARTIAL,
      function_parameters => 'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25',
      expression => '1=1'
    );

(Diagram: Big Data SQL on Hadoop Cluster, redacted values shown as ***, Oracle Database 12c)
Oracle Data Redaction enables you to create security policies to control what data is visible for sensitive columns with personal or security information.
Oracle Data Redaction dynamically applies a redaction function to columns. The function transforms, obfuscates or hides the sensitive information for unauthorized users.
Since the policy is applied automatically by Oracle Database, there is no way to bypass security and get the un-redacted data.
With Big Data SQL, Oracle Data Redaction is available for Hadoop data.

17 How Oracle Big Data SQL Works
Marriage of Hadoop and Oracle Database Query Processing

18 Hadoop data accessible through Oracle External Tables

    CREATE TABLE movielog (click VARCHAR2(4000))
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_HIVE
      DEFAULT DIRECTORY Dir1
      ACCESS PARAMETERS (
        com.oracle.bigdata.tablename=logs
        com.oracle.bigdata.cluster=mycluster)
    )
    REJECT LIMIT UNLIMITED

New set of properties: ORACLE_HIVE and ORACLE_HDFS access drivers. They identify a Hadoop cluster, data source, column mapping, error handling, overflow handling and logging.
New table metadata is passed from Oracle DDL to Hadoop readers at query execution.
Architected for extensibility: StorageHandler capability enables future support for many other data sources. Examples: MongoDB, HBase, Oracle NoSQL DB.

19 How Oracle executes a query with Big Data SQL
1 Query compilation determines: data locations, data structure, parallelism.
2 Fast reads using Big Data SQL Server: schema-for-read using Hadoop classes; Smart Scan selects only relevant data.
3 Process filtered result: move relevant data to database; join with database tables; apply database security policies.
(Diagram: HDFS NameNode and Hive Metastore; HDFS Data Nodes with BDSQL; Big Data SQL on Hadoop Cluster; Oracle Database 12c)

20 Big Data SQL: A New Hadoop Processing Engine
Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
Resource Management (YARN, cgroups)
Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)

21 Big Data SQL Uses Hive Metastore, not MapReduce
(Diagram: Oracle Big Data SQL, SparkSQL, Hive and Impala all use the Hive Metastore)
Hive Metastore: common semantic repository (schemas, Java classes) for most SQL-on-Hadoop tools.
The Metastore maps DDL to Java access classes.

22 How Data is Stored in Hadoop
Example: 1TB JSON File. The slide groups the records into HDFS blocks labelled Block B1, Block B2 and Block B3.

    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:07","recommended":null,"activity":8}
    {"custid": ,"movieid":1948,"genreid":9,"time":" :00:00:22","recommended":"n","activity":7}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:26","recommended":null,"activity":9}
    {"custid": ,"movieid":11547,"genreid":44,"time":" :00:00:32","recommended":"y","activity":7}
    {"custid": ,"movieid":11547,"genreid":44,"time":" :00:00:42","recommended":"y","activity":6}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:43","recommended":null,"activity":8}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:50","recommended":null,"activity":9}
    {"custid": ,"movieid":608,"genreid":6,"time":" :00:01:03","recommended":"n","activity":7}
    {"custid": ,"movieid":null,"genreid":null,"time":" :00:01:07","recommended":null,"activity":9}
    {"custid": ,"movieid":27205,"genreid":9,"time":" :00:01:18","recommended":"y","activity":7}
    {"custid": ,"movieid":1124,"genreid":9,"time":" :00:01:26","recommended":"y","activity":7}
    {"custid": ,"movieid":16309,"genreid":9,"time":" :00:01:35","recommended":"n","activity":7}
    {"custid": ,"movieid":11547,"genreid":44,"time":" :00:01:39","recommended":"y","activity":7}
    {"custid": ,"movieid":424,"genreid":1,"time":" :00:05:02","recommended":"y","activity":4}

1 block = 256 MB
Example File = 4096 blocks
InputSplits = 4096
Potential scan parallelism
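The block count on this slide follows from simple arithmetic, sketched below in Python. The helper name hdfs_block_count is hypothetical, binary units (1 TB = 1024^4 bytes, 1 MB = 1024^2 bytes) are assumed, and so is one InputSplit per block, as the slide implies.

```python
def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed for a file (ceiling division)."""
    return -(-file_size_bytes // block_size_bytes)


ONE_TB = 1024 ** 4             # assumed binary terabyte
BLOCK_SIZE = 256 * 1024 ** 2   # 256 MB block size from the slide

# With one InputSplit per block, this is also the upper bound
# on scan parallelism for the file.
blocks = hdfs_block_count(ONE_TB, BLOCK_SIZE)
```

For the 1 TB example, blocks evaluates to 4096, which matches the figure on the slide.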

23 How MapReduce and Hive Read Data
(Diagram: Data Node disk, SCAN, Create ROWS & COLUMNS, Hive Storage Handler, Consumer)
Scan and row creation need to be able to work on any data format.
Data definitions and column deserializations are needed to provide a table.
RecordReader => Scans data (keys and values)
InputFormat => Defines parallelism
SerDe => Makes columns
Metastore => Maps DDL to Java access classes

24 Big Data SQL Server Dataflow
1 Read data from HDFS Data Node: direct-path reads; C-based readers when possible; use native Hadoop classes otherwise.
2 Translate bytes to Oracle.
3 Apply Smart Scan to Oracle bytes: apply filters; project columns; parse JSON/XML; score models.
(Diagram: Disks, Data Node, RecordReader, SerDe, External Table Services, Big Data SQL Smart Scan)

25 Operations Pushed Down to Hadoop
Pushed down to Big Data SQL Cell: Hadoop scans (InputFormat, SerDe); JSON parsing; WHERE clause evaluation; storage index evaluation; column projection; Bloom filters for faster joins; scoring of Data Mining models.
Handled by Oracle Database: query compilation & optimization; joins; aggregations; ordering of results; PL/SQL evaluation; table functions; security policies.
(Diagram: request from Oracle Database 12c to Big Data SQL on Hadoop Cluster; JSON is turned into an Oracle data stream by Smart Scan; only relevant data are emitted)
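A rough Python sketch of three of the pushed-down steps (JSON parsing, WHERE clause evaluation, column projection) follows. It only illustrates the principle that filtering happens next to the data; the function smart_scan and its parameters are hypothetical and do not reflect the actual Big Data SQL implementation.

```python
import json


def smart_scan(lines, predicate, columns):
    """Yield projected tuples for the JSON records that pass the filter.

    lines: iterable of JSON strings (one record per line)
    predicate: function(record dict) -> bool, standing in for WHERE
    columns: field names to keep, standing in for the SELECT list
    """
    for line in lines:
        record = json.loads(line)              # JSON parsing
        if predicate(record):                  # WHERE clause evaluation
            # Column projection: only the requested fields leave the node.
            yield tuple(record.get(name) for name in columns)
```

Only the tuples produced by the generator would travel to the database, which is the point of the "only relevant data are emitted" note on the slide.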

26 Oracle Big Data SQL Storage Index
(Diagram: HDFS file with Field 1, Field 2, Field 3, ..., Field n; HDFS Block1 (256MB); HDFS Block2 (256MB))
Example: Find all ratings from movies with a MOVIE_ID of 1109
Index: B1 Movie_ID Min: 1001 Max: 1609; B2 Movie_ID Min: 1909 Max:
Storage index provides query speed-up through transparent IO elimination of HDFS blocks.
Columns in SQL are mapped to fields in the HDFS file via External Table Definitions.
Min / max value is recorded for each HDFS block in an in-memory storage index.
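The block-skipping idea can be sketched in a few lines of Python. This is a hypothetical model, not the product's data structure: the function blocks_to_scan and the dictionary layout are assumptions, and any maximum used for block B2 in an example would have to be invented because the slide text does not show it.

```python
def blocks_to_scan(storage_index, column, value):
    """Return the block ids that may contain rows with column == value.

    storage_index: dict mapping block id -> {column: (min, max)}.
    A block with no entry for the column cannot be ruled out,
    so it is always scanned.
    """
    selected = []
    for block_id, stats in storage_index.items():
        if column not in stats:
            selected.append(block_id)
            continue
        low, high = stats[column]
        if low <= value <= high:
            selected.append(block_id)
    return selected
```

With B1 covering MOVIE_ID 1001 to 1609 and B2 starting at 1909, a lookup for MOVIE_ID 1109 would select only B1, so the IO for B2 is skipped.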

27 Oracle Parallel Query and Hadoop
1 Determine Hadoop Parallelism: determine schema-for-read; determine InputSplits; arrange splits for best performance.
2 Map to Oracle Parallelism: map splits to granules; assign batches of granules to PX Servers.
3 PX Servers Route Work: send granule requests async to cells; reap results.
(Diagram: HDFS NameNode and Hive Metastore provide InputSplits to PX; Big Data SQL on Hadoop Cluster; Oracle Database 12c)

28 Oracle and Hadoop Parallelism
Big Data SQL on Hadoop Cluster (Host 1 to Host 4, receiving asynchronous granule requests):
Parallelism defined by Hadoop InputSplits (fan-out from PX to Hadoop).
Utilizes as many cores as provided by cgroups (first come, first served).
Oracle Database 12c (PX Server #1 to PX Server #8):
Parallelism defined by Degree of Parallelism (DOP): dynamic, statement, table level.
DOP can be throttled by the database if the maximum DOP is exceeded or the table is too small.

    SELECT /*+PARALLEL(EVE,8)*/
      CUST.NAME, CUST.MSISDN, EVE.MONTH, EVE.EVENT_TYPE,
      COUNT(*) AS EVENT_COUNT,
      SUM(EVE.DURATION) AS DURATION
    FROM D_CUSTOMERS CUST, F_NETWORK_EVENTS EVE
    WHERE CUST.MSISDN = EVE.MSISDN
    GROUP BY CUST.NAME, CUST.MSISDN, EVE.MONTH, EVE.EVENT_TYPE
    ORDER BY 1,2,3,4
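The mapping from InputSplits to granules to PX servers can be pictured with the following Python sketch. It assumes a simple round-robin hand-out of fixed-size batches, which is an illustrative simplification and not a description of how Oracle actually schedules granules; the function assign_granules and its parameters are hypothetical.

```python
def assign_granules(num_splits, dop, batch_size):
    """Distribute granules (one per InputSplit) over PX servers.

    num_splits: number of InputSplits, each treated as one granule
    dop: degree of parallelism, i.e. number of PX servers
    batch_size: granules handed to a PX server at a time
    Returns a dict mapping PX server number (1-based) to granule ids.
    """
    assignment = {px: [] for px in range(1, dop + 1)}
    granules = list(range(num_splits))
    # Cut the granule list into consecutive batches.
    batches = [granules[i:i + batch_size]
               for i in range(0, num_splits, batch_size)]
    # Assumed policy: hand out batches round-robin.
    for i, batch in enumerate(batches):
        assignment[i % dop + 1].extend(batch)
    return assignment
```

For example, 16 granules with a DOP of 8 and batches of 2 give every PX server exactly one batch, so each server would issue two asynchronous granule requests to the Hadoop side.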

29 Big Data SQL Prerequisites
Oracle 12c on Linux
Oracle Exadata
Oracle Big Data Appliance
Infiniband interconnection between Oracle Exadata and Oracle Big Data Appliance
(Diagram: Oracle Big Data Appliance with CDH, connected over Infiniband to Oracle 12c on Oracle Exadata)

30 Demonstration
Big Data SQL in Action

36 Questions and Answers
Provided we still have some time

37 Knowledge Check
True or False: Hive is leveraged by Big Data SQL as a query execution engine, allowing BDS queries to automatically execute faster as the Hive execution engine improves (e.g. Spark replaces MapReduce).
False. Big Data SQL leverages only the Hive Metastore (HCatalog) and the corresponding classes (InputFormat, RecordReader, SerDe), but it does not use Hive for execution.

38 Knowledge Check
True or False: Oracle Big Data SQL sends all the data from Hadoop to Oracle Database, where the query is processed. Oracle Database does column selection and applies the WHERE, GROUP BY and ORDER BY clauses etc.
False. Big Data SQL Smart Scan performs low-level processing of the query. Smart Scan does the column projection, applies the WHERE condition & Bloom filters, processes JSON etc.

39 Knowledge Check
True or False: Oracle's ambition with Big Data SQL is to supersede all the SQL-on-Hadoop engines like Hive, SparkSQL, Impala, Drill or Presto.
False. Big Data SQL is for companies with significant Oracle assets (e.g. Oracle Data Warehouse) that want to access and process both Hadoop and Oracle data from a single SQL environment.

40 Thank You!

41 Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

42 42

43

Oracle Big Data SQL Architectural Deep Dive

Oracle Big Data SQL Architectural Deep Dive Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Safe Harbor Statement The following is intended to outline our general product direction. It is intended

More information

SQL - the best analysis language for Big Data!

SQL - the best analysis language for Big Data! SQL - the best analysis language for Big Data! NoCOUG Winter Conference 2014 Hermann Bär, [email protected] Data Warehousing Product Management, Oracle 1 The On-Going Evolution of SQL Introduction

More information

Seamless Access from Oracle Database to Your Big Data

Seamless Access from Oracle Database to Your Big Data Seamless Access from Oracle Database to Your Big Data Brian Macdonald Big Data and Analytics Specialist Oracle Enterprise Architect September 24, 2015 Agenda Hadoop and SQL access methods What is Oracle

More information

Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle

Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is

More information

Big Data SQL and Query Franchising

Big Data SQL and Query Franchising Big Data SQL and Query Franchising An Architecture for Query Beyond Hadoop Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor

More information

Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models

Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models Charlie Berger Sr. Director Product Management, Data Mining Technologies Oracle Corporation [email protected]

More information

The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG

The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG Presentation #730 Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com Presentation

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Statistical Analysis of Gene Expression Data With Oracle & R (- data mining)

Statistical Analysis of Gene Expression Data With Oracle & R (- data mining) Statistical Analysis of Gene Expression Data With Oracle & R (- data mining) Patrick E. Hoffman Sc.D. Senior Principal Analytical Consultant [email protected] Agenda (Oracle & R Analysis) Tools Loading

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

OLSUG Workshop Oracle Data Mining

OLSUG Workshop Oracle Data Mining OLSUG Workshop Oracle Data Mining Charlie Berger Sr. Director of Product Mgmt, Life Sciences and Data Mining Oracle Corporation [email protected] Dr. Lutz Hamel Asst. Professor, Computer Science

More information

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich Agenda Introduction Old Times Exadata Big Data Oracle In-Memory Headquarters Conclusions 2 sumit

More information

Oracle Data Mining In-Database Data Mining Made Easy!

Oracle Data Mining In-Database Data Mining Made Easy! Oracle Data Mining In-Database Data Mining Made Easy! Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics Oracle Corporation [email protected] www.twitter.com/charliedatamine

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Blazing BI: the Analytic Options to the Oracle Database. ODTUG Kscope 2013

Blazing BI: the Analytic Options to the Oracle Database. ODTUG Kscope 2013 Blazing BI: the Analytic Options to the Oracle Database ODTUG Kscope 2013 Dan Vlamis Tim Vlamis Vlamis Software Solutions 816-781-2880 http://www.vlamis.com Copyright 2013, Vlamis Software Solutions, Inc.

More information

How To Manage Big Data In A Microsoft Cloud (Hadoop)

How To Manage Big Data In A Microsoft Cloud (Hadoop) Oracle Database 12c and the Future of Data Warehousing in the Era of Big Data George Lumpkin Data Warehousing Neil Mendelson Big Data & Advanced AnalyEcs Vice Presidents Server Technologies September 29,

More information

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an

More information

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. 1 Copyright 2011, Oracle and/or its affiliates. FPO In-Database Analytics: Predictive Analytics, Data Mining, Exadata & Business Intelligence Charlie Berger Sr. Director Product Management, Data Mining

More information

Oracle Database 12c Plug In. Switch On. Get SMART.

Oracle Database 12c Plug In. Switch On. Get SMART. Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

Integrate Master Data with Big Data using Oracle Table Access for Hadoop

Integrate Master Data with Big Data using Oracle Table Access for Hadoop Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler

More information

Oracle's In-Database Statistical Functions

Oracle's In-Database Statistical Functions Oracle 11g DB Data Warehousing Oracle's In-Database Statistical Functions OLAP Statistics Data Mining Charlie Berger Sr. Director Product Management, Data Mining Technologies

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Data Domain Profiling and Data Masking for Hadoop

Data Domain Profiling and Data Masking for Hadoop Data Domain Profiling and Data Masking for Hadoop 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or

More information

Cloudera Certified Developer for Apache Hadoop

Cloudera Certified Developer for Apache Hadoop Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number

More information

Using RDBMS, NoSQL or Hadoop?

Using RDBMS, NoSQL or Hadoop? Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest

More information

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Spring,2015 Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Contents: Briefly About Big Data Management What is hive? Hive Architecture Working

More information

Semantic and Data Mining Technologies. Simon See, Ph.D.,

Semantic and Data Mining Technologies. Simon See, Ph.D., Semantic and Data Mining Technologies Simon See, Ph.D., Introduction to Semantic Web and Business Use Cases 2 Lots of Scientific Resources NAR 2009 over 1170 databases Reuse, Recycling, Repurposing Paul

More information

The Hadoop Eco System Shanghai Data Science Meetup

The Hadoop Eco System Shanghai Data Science Meetup The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related

More information

How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A

More information

Impala: A Modern, Open-Source SQL Engine for Hadoop. Marcel Kornacker Cloudera, Inc.

Impala: A Modern, Open-Source SQL Engine for Hadoop. Marcel Kornacker Cloudera, Inc. Impala: A Modern, Open-Source SQL Engine for Hadoop Marcel Kornacker Cloudera, Inc. Agenda Goals; user view of Impala Impala performance Impala internals Comparing Impala to other systems Impala Overview:

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Big Data Analytics with Oracle Advanced Analytics In-Database Option

Big Data Analytics with Oracle Advanced Analytics In-Database Option Big Data Analytics with Oracle Advanced Analytics In-Database Option Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/charliedatamine

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016 Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible

More information

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

Qsoft Inc www.qsoft-inc.com

Big Data & Hadoop. Course Topics: Week 1: Introduction to Big Data, Hadoop Architecture and HDFS. Week 2: Setting up Hadoop Cluster. Week 3: MapReduce Part 1. Week 4:

Real World Hadoop Use Cases

JFokus 2013, Stockholm. Eva Andreasson, Cloudera Inc. Lars Sjödin, King.com. 2012 Cloudera, Inc. Agenda: Recap of Big Data and Hadoop. Analyzing Twitter feeds with Hadoop. Real

Integrating Apache Spark with an Enterprise Data Warehouse

Dr. Michael Wurst, IBM Corporation, Architect Spark/R/Python Database Integration, In-Database Analytics. Dr. Toni Bollinger, IBM Corporation, Senior Software

Internals of Hadoop Application Framework and Distributed File System

International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015. Saminath.V, Sangeetha.M.S. Abstract: Hadoop

Predictive Analytics for Better Business Intelligence

Oracle 11g DB: Data Warehousing, ETL, OLAP, Statistics, Data Mining. Charlie Berger, Sr. Director Product Management, Data Mining Technologies

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05

UDBL, Tore Risch, Uppsala University, Sweden. Evolution of DBMS technology: Distributed databases SQL 1960 1970

Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig

Agenda: What is Pig? Key Features of Pig. The Anatomy of Pig. Pig on Hadoop. Pig Philosophy. Pig Latin Overview. Pig Latin Statements. Pig Latin: Identifiers. Pig Latin: Comments. Data Types

Moving From Hadoop to Spark

Sujee Maniyam, Founder / Principal @ www.elephantscale.com [email protected] Bay Area ACM meetup (2015-02-23). HI, Featured in Hadoop Weekly #109. About Me: Sujee

Big Data Too Big To Ignore

Geert. Big Data Consultant and Manager. Currently finishing a 3rd Big Data project. IBM & Cloudera Certified. IBM & Microsoft Big Data Partner. Agenda: Defining Big Data. Introduction

Advanced Big Data Analytics with R and Hadoop

REVOLUTION ANALYTICS WHITE PAPER. 'Big Data' Analytics as a Competitive Advantage: Big Analytics delivers competitive advantage in two ways compared to the traditional

Best Practices for Extreme Performance with Data Warehousing on Oracle Database

Rekha Balwada, Principal Product Manager. Agenda: Parallel Execution. Workload Management on Data Warehouse

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

What is Apache Spark? Fast & general engine for big data processing. Generalizes MapReduce model to support more types of processing

COSC 6397 Big Data Analytics. 2nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015

2nd Homework Rules: Each student should deliver: Source code (.java files), Documentation (.pdf, .doc, .tex or .txt

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

1) Introduction to BigData & Hadoop: What is Big Data? Why are all industries talking about Big Data? What are the issues in Big Data? Storage: What are the challenges for storing big data? Processing: What

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12

What is a NoSQL Database? 1. A key/value store: Basic index manager, no complete query language

Connecting Hadoop with Oracle Database

Sharon Stephen, Senior Curriculum Developer, Server Technologies Curriculum. The following is intended to outline our general product direction.

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Contents: About the Authors. About the Technical Reviewer. Acknowledgments. Introduction. Chapter 1: Motivation for Big

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda: Design rationale. Implementation. Installation

Executive Summary. Introduction. Defining Big Data. The Importance of Big Data. Building a Big Data Platform.

Infrastructure Requirements. Solution Spectrum. Oracle's Big Data

Native Connectivity to Big Data Sources in MSTR 10

Bring All Relevant Data to Decision Makers. Support for More Big Data Sources. Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

AGENDA: Introduction to Big Data. Introduction to Hadoop. HDFS file system. Map/Reduce framework. Hadoop utilities. Summary. BIG DATA FACTS: In what timeframe

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

HADOOP. Revised 10/19/2015

Table of Contents: Hortonworks HDP Developer: Java. Hortonworks HDP Developer: Apache Pig and Hive. Hortonworks HDP Developer: Windows...

MySQL and Hadoop. Percona Live 2014 Chris Schneider

About Me: Chris Schneider, Database Architect @ Groupon. Spent the last 10 years building MySQL architecture for multiple companies. Worked with Hadoop for

Big Data Course Highlights

The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Agenda: Introduce Hadoop projects to prepare you for your group work. Intimate detail will be provided in future

Session: Big Data: get familiar with Hadoop to use your unstructured data. Udo Brede, Dell Software. 22nd October 2013, 10:00, Session B - DB2 LUW

Agenda: Big Data. The Technical Challenges. Architecture of Hadoop

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

The goal of this paper is to give an insight into Netezza architecture and implementation experience to strategize Oracle BI EE

How To Scale Out Of A Nosql Database

Firebird meets NoSQL (Apache HBase) Case Study. Firebird Conference 2011, Luxembourg, 25.11.2011 26.11.2011. Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI

Architecting for the Internet of Things & Big Data

Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data. September 29, 2014. Safe Harbor Statement: The following is intended to

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon [email protected] [email protected] XLDB

Apache Sentry. Prasad Mujumdar [email protected] [email protected]

Agenda: Various aspects of data security. Apache Sentry for authorization. Key concepts of Apache Sentry. Sentry features. Sentry architecture

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

About me: Head of Unit Web and BI Technologies, IT Directorate of Istat. Project manager and technical coordinator of Web

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark-project.org. University of California, Berkeley UC BERKELEY

My Background: Grad student in the AMP Lab at UC Berkeley. 50-person

Parquet. Columnar storage for the people

Julien Le Dem @J_, Processing tools lead, analytics infrastructure at Twitter. Nong Li [email protected], Software engineer, Cloudera Impala. Outline: Context from various

Next-Gen Big Data Analytics using the Spark stack

Jason Dai, Chief Architect of Big Data Technologies, Software and Services Group, Intel. Agenda: Overview. Apache Spark stack. Next-gen big data analytics. Our

Oracle Data Integrator for Big Data. Alex Kotopoulis Senior Principal Product Manager

Hands on Lab - Oracle Data Integrator for Big Data. Abstract: This lab will highlight to Developers, DBAs and Architects

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Who am I? Distributed systems engineer. Principal Architect in the Big Data Platform

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Outline: Overview of Hadoop. Adding GIS capabilities to Hadoop. Integrating Hadoop with ArcGIS. Apache Hadoop: What is Hadoop? Hadoop is a scalable

extreme Datamining with Oracle R Enterprise

Oliver Bracht, Managing Director, eoda. Matthias Fuchs, Senior Consultant, ISE Information Systems Engineering GmbH. About

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

Big Data Visualization using Apache Spark and Zeppelin. Prajod Vettiyattil, Software Architect, Wipro. Agenda: Big Data and Ecosystem tools. Apache Spark. Apache Zeppelin. Data Visualization. Combining Spark

I/O Considerations in Big Data Analytics

Library of Congress, 26 September 2011. Marshall Presser, Federal Field CTO, EMC, Data Computing Division. Paradigms in Big Data: Structured (relational) data. Very

Welcome! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

WHO? OGh board member with BI / WA experience. Determining the activities of the association. Participating in the organizing committee of one or more events. Facilitating the SIGs. Editing OGh-Visie. Maintaining

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Hadoop Ecosystem. Overview of this Lecture Module: Background. Google MapReduce. The Hadoop Ecosystem. Core components: Hadoop

So What's the Big Deal?

Presentation Agenda: Introduction. What is Big Data? So What is the Big Deal? Big Data Technologies. Identifying Big Data Opportunities. Conducting a Big Data Proof of Concept. Big Data

Inge Os Sales Consulting Manager Oracle Norway

Agenda: Oracle Fusion Middleware. Oracle Database 11GR2. Oracle Database Machine. Oracle & Sun

Trafodion Operational SQL-on-Hadoop

SophiaConf 2015. Pierre Baudelle, HP EMEA TSC. July 6th, 2015. Hadoop workload profiles: Operational. Interactive. Non-interactive. Batch. Real-time analytics. Operational SQL

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @

whoami: Alex Holmes. Software engineer. Working on distributed systems for many years. Hadoop since 2008. @grep_alex grepalex.com. what's hadoop...

Hadoop & Spark Using Amazon EMR

Michael Hanisch, AWS Solutions Architecture. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda: Why did we build Amazon EMR? What is Amazon EMR?

Best Practices for Hadoop Data Analysis with Tableau

September 2013. 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Syed Rasheed, Solution Manager, Red Hat Corp. Kenny Peeples, Technical Manager, Red Hat Corp. Kimberly Palko, Product Manager, Red Hat Corp.

Spark Lab 10. Prepared by George Nikolaides 4/19/2015

Introduction to Apache Spark: Another cluster computing framework. Developed in the AMPLab at UC Berkeley. Started in 2009. Open-sourced in 2010

Can the Elephants Handle the NoSQL Onslaught?

Avrilia Floratou, Nikhil Teletia, David J. DeWitt, Jignesh M. Patel, Donghui Zhang. University of Wisconsin-Madison, Microsoft Jim Gray Systems Lab. Presented
