Oracle Big Data SQL Architectural Deep Dive
|
|
- Beryl Madison McCarthy
- 8 years ago
- Views:
Transcription
1
2 Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle
3 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. Oracle Confidential Internal/Restricted/Highly Restricted 3
4 Agenda The Data Analytics Challenge Why Unified Query Matters SQL on Hadoop and More: Unifying Metadata Query Franchising: Smart Scan for Hadoop Oracle Confidential Internal/Restricted/Highly Restricted 4
5 Data Analytics Challenge Separate silos of information to analyze 5
6 Data Analytics Challenge Separate data access interfaces 6
7 SQL on Hadoop is Obvious Stinger Oracle Confidential Internal/Restricted/Highly Restricted 7
8 Data Analytics Challenge No comprehensive SQL interface across Oracle, Hadoop and NoSQL 8
9 Oracle Big Data Management System Rich, comprehensive SQL access to all enterprise data NoSQL 9
10 What Does Unified Query Mean for You? Before Data Science After PhD Anyone???
11 What Does Unified Query Mean for You? Before Application Development After
12 Use Rich Oracle SQL Dialect Over All Data Snapshot of Oracle SQL Analytic Functions Ranking functions rank, dense_rank, cume_dist, percent_rank, ntile Window Aggregate functions (moving and cumulative) Avg, sum, min, max, count, variance, stddev, first_value, last_value LAG/LEAD functions Direct inter-row reference using offsets Reporting Aggregate functions Sum, avg, min, max, variance, stddev, count, ratio_to_report Statistical Aggregates Correlation, linear regression family, covariance Linear regression Fitting of an ordinary-least-squares regression line to a set of number pairs. Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions Descriptive Statistics DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values Correlations Pearson s correlation coefficients, Spearman's and Kendall's (both nonparametric). Cross Tabs Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa Hypothesis Testing Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA Distribution Fitting Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
13 } next = linenext.getquantity(); if (!q.isempty() && (prev.isempty() (eq(q, prev) && gt(q, next)))) { state = "S"; return state; } if (gt(q, prev) && gt(q, next)) { state = "T"; return state; Pattern } Matching With Oracle SQL Snapshot of Oracle SQL Analytic Functions if (lt(q, prev) && lt(q, next)) { state = "B"; return state; } if (!q.isempty() && (next.isempty() (gt(q, prev) && eq(q, next)))) { state = "E"; return state; Simplified, } sophisticated, standards based syntax if (q.isempty() eq(q, prev)) { state = "F"; return state; } Finding Patterns in Stock Market Data - Double Bottom (W) Ticker 10:00 10:05 10:10 10:15 10:20 10:25 } return state; private boolean eq(string a, String b) { if (a.isempty() b.isempty()) { return false; } return a.equals(b); } private boolean gt(string a, String b) { if (a.isempty() b.isempty()) { return false; } return Double.parseDouble(a) > Double.parseDouble(b); } private boolean lt(string a, String b) { if (a.isempty() b.isempty()) { return false; } return Double.parseDouble(a) < Double.parseDouble(b); } public String getstate() { return this.state; } } BagFactory bagfactory = public Tuple exec(tuple input) throws IOException { SELECT first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY name ORDER BY time MEASURES FIRST(x.time) AS first_x, LAST(z.time) AS last_z ONE ROW PER MATCH PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price) AND z.time - FIRST(x.time) <= 7 )) long c = 0; String line = ""; String pbkey = ""; V0Line nextline; V0Line thisline; V0Line processline; V0Line evalline = null; V0Line prevline; boolean nomorevalues = false; String matchlist = ""; ArrayList<V0Line> linefifo = new ArrayList<V0Line>(); boolean finished = false; 250+ Lines of Java UDF 12 Lines of SQL 13 DataBag output = bagfactory.newdefaultbag(); if (input == null) { return null; } if (input.size() == 0) { return null; } Object o = input.get(0); if (o == null) { return null; } Copyright 2014, Oracle and/or its affiliates. All rights reserved. //Object o = input.get(0); if (!(o instanceof DataBag)) { int errcode = 2114; 20x less code
14 Oracle Big Data SQL A New Architecture Powerful, high-performance SQL on Hadoop Full Oracle SQL capabilities on Hadoop SQL query processing local to Hadoop nodes Simple data integration of Hadoop and Oracle Database Single SQL point-of-entry to access all data Scalable joins between Hadoop and RDBMS data Optimized hardware Balanced Configurations No bottlenecks Oracle Confidential Internal/Restricted/Highly Restricted 14
15 100% Want to know what this really means.
16 SQL on Hadoop and More: Unifying Metadata
17 Why Unify Metadata? CREATE TABLE customers SELECT name FROM customers sales Catalog customers Query CREATE across TABLE sources sales Integrate new metadata SELECT No changes customers.name, for users sales.amount and applications Seamlessly handle schema-on-read Exploit remote data distribution Holistically optimize queries SALES CUSTOMERS
18 How Data is Stored in Hadoop Example: 1TB File {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:07","recommended":null,"activity":8} {"custid": ,"movieid":1948,"genreid":9,"time":" :00:00:22","recommended":"n","activity":7} {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:26","recommended":null,"activity":9} Block B1 {"custid": ,"movieid":11547,"genreid":44,"time":" :00:00:32","recommended":"y","activity":7} {"custid": ,"movieid":11547,"genreid":44,"time":" :00:00:42","recommended":"y","activity":6} {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:43","recommended":null,"activity":8} {"custid": ,"movieid":null,"genreid":null,"time":" :00:00:50","recommended":null,"activity":9} {"custid": ,"movieid":608,"genreid":6,"time":" :00:01:03","recommended":"n","activity":7} Block B2 {"custid": ,"movieid":null,"genreid":null,"time":" :00:01:07","recommended":null,"activity":9} {"custid": ,"movieid":27205,"genreid":9,"time":" :00:01:18","recommended":"y","activity":7} {"custid": ,"movieid":1124,"genreid":9,"time":" :00:01:26","recommended":"y","activity":7} {"custid": ,"movieid":16309,"genreid":9,"time":" :00:01:35","recommended":"n","activity":7} {"custid": ,"movieid":11547,"genreid":44,"time":" :00:01:39","recommended":"y","activity":7}} Block B3 {"custid": ,"movieid":424,"genreid":1,"time":" :00:05:02","recommended":"y","activity":4} 1 block = 256 MB Example File = 4096 blocks InputSplits = 4096 Potential scan parallelism Oracle Confidential Internal/Restricted/Highly Restricted 18
19 How MapReduce and Hive Read Data Consumer Scan and row creation needs to be able to work on any data format Create ROWS & COLUMNS SCAN Data Node disk Data definitions and column deserializations are needed to provide a table RecordReader => Scans data (keys and values) InputFormat => Defines parallelism SerDe => Makes columns Metastore => Maps DDL to Java access classes 19
20 SQL-on-Hadoop Engines Share Metadata, not MapReduce Hive Metastore Oracle Big Data SQL SparkSQL Hive Impala Hive Metastore Table Definitions: movieapp_log_json Tweets avro_log Metastore maps DDL to Java access classes Oracle Confidential Internal/Restricted/Highly Restricted 20
21 Extend Oracle External Tables CREATE TABLE movielog ( click VARCHAR2(4000)) ORGANIZATION EXTERNAL ( TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR ACCESS PARAMETERS ( com.oracle.bigdata.tablename logs com.oracle.bigdata.cluster mycluster )) REJECT LIMIT UNLIMITED; New types of external tables ORACLE_HIVE (inherit metadata) ORACLE_HDFS (specify metadata) Access parameters for Big Data Hadoop cluster Remote Hive database/table DBMS_HADOOP Package for automatic import 21
22 Enhance Oracle External Tables CREATE TABLE ORDER ( cust_num VARCHAR2(10), order_num VARCHAR2(20), order_total NUMBER(8,2)) ORGANIZATION EXTERNAL ( TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR ) PARALLEL 20 REJECT LIMIT UNLIMITED; Transparent schema-for-read Use fast C-based readers when possible Use native Hadoop classes otherwise Engineered to understand parallelism Map external units of parallelism to Oracle Architected for extensibility StorageHandler capability enables future support for other data sources Examples: MongoDB, HBase, Oracle NoSQL DB 22
23 Query Franchising: Smart Scan for Hadoop
24 What Can Big Data Learn from Exadata? Intelligent Storage Maximizes Performance SELECT name, SUM(purchase) FROM customers GROUP BY name; 1 Oracle SQL query issued Plan constructed Query executed Oracle Exadata Storage Server 2 Smart Scan Works on Storage Filter out unneeded rows Project only queried columns Score data models Bloom filters to speed up joins Oracle Exadata Storage Server CUSTOMERS
25 Query Franchising dispatch of query processing to self-similar compute agents on disparate systems without loss of operational fidelity
26 Big Data SQL Server: A New Hadoop Processing Engine MapReduce and Hive Processing Layer Spark Impala Search Big Data SQL Resource Management (YARN, cgroups) Storage Layer Filesystem (HDFS) NoSQL Databases (Oracle NoSQL DB, Hbase) Oracle Confidential Internal/Restricted/Highly Restricted 26
27 Smart Scan for Hadoop: Optimizing Performance Big Data SQL Server Smart Scan External Table Services Data Node Oracle on top Apply filter predicates Project columns Parse semi-structured data Hadoop on the bottom Work close to the data Schema-on-read with Hadoop classes Transformation into Oracle data stream Disk Oracle Confidential Internal/Restricted/Highly Restricted 27
28 Big Data SQL Query Execution How do we query Hadoop? HDFS NameNode 1 Hive Metastore HDFS Data Node BDS Server B B B HDFS Data Node BDS Server Query compilation determines: Data locations Data structure Parallelism Fast reads using Big Data SQL Server Schema-for-read using Hadoop classes Smart Scan selects only relevant data Process filtered result Move relevant data to database Join with database tables Apply database security policies
29 Mapping Hadoop to Oracle Parallel Query and Hadoop HDFS NameNode Hive Metastore B B B 1 InputSplits 2 PX Determine Hadoop Parallelism Determine schema-for-read Determine InputSplits Arrange splits for best performance Map to Oracle Parallelism Map splits to granules Assign granules to PX Servers PX Servers Route Work Offload work to Big Data SQL Servers Aggregate Join Apply PL/SQL
30 Big Data SQL Server Dataflow Big Data SQL Server Smart Scan External Table Services Read data from HDFS Data Node Direct-path reads C-based readers when possible Use native Hadoop classes otherwise Translate bytes to Oracle SerDe RecordReader Data Node Disks 1 3 Apply Smart Scan to Oracle bytes Apply filters Project Columns Parse JSON/XML Score models
31 But How Does Security Work? DBMS_REDACT.ADD_POLICY( object_schema => 'MCLICK', SELECT object_name * FROM => 'TWEET_V', my_bigdata_table WHERE column_name SALES_REP_ID => 'USERNAME', = SYS_CONTEXT('USERENV','SESSION_USER'); policy_name => 'tweet_redaction', function_type => DBMS_REDACT.PARTIAL, function_parameters => 'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25', expression => '1=1' ); B B B *** Filter on SESSION_USER Database security for query access Virtual Private Databases Redaction Audit Vault and Database Firewall Hadoop security for Hadoop jobs Kerberos Authentication Apache Sentry (RBAC) Audit Vault System-specific encryption Database tablespace encryption BDA On-disk Encryption
32 SQL, Everywhere Futures
33 More Lessons from Exadata Move less data Go faster 1 Storage Indexes Skip reads on irrelevant data Big Hadoop Blocks ~ Big Speed Up 2 Caching Cache frequently accessed columns HDFS Caching
34 Oracle Big Data Management System Rich, comprehensive SQL access to all enterprise data NoSQL 34
35 Oracle Big Data Management System Unite Information Lifecycles NoSQL Shared REST APIs App-embedded schema NoSQL Shared schema Oracle Automatic ILM Roll off cold partitions to Hadoop Promote hot data to Oracle 35
36 Oracle Big Data Management System Unify All Query NoSQL 36
37
38
39 Big Data SQL SELECT w.sess_id,w.cust_id,c.name FROM web_logs w, customers c WHERE w.source_country = Brazil AND c.customer_id = w.cust_id WEB_LOGS Big Data SQL B B B Hadoop Cluster Relevant SQL runs on BDA nodes 10 s of Gigabytes of Data Only columns and rows needed to answer query are returned Oracle Database CUSTOMERS 39
Big Data: Are you ready?
Big Data: Are you ready? Oracle Big Data SQL George Bourmas Enterprise Architect EMEA XLOB Enterprise Architects September 13, 2014 Oracle Confidential Internal/Restricted/Highly Restricted Thoughts Things
More informationOracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle
Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is
More informationBig Data Management System Solution Overview
Big Data Management System Solution Overview Pascal GUY Pre Sales Architect Business Unit Systems Oracle France Copyright 2014 Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The
More informationBig Data SQL and Query Franchising
Big Data SQL and Query Franchising An Architecture for Query Beyond Hadoop Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor
More informationOracle Big Data SQL Konference Data a znalosti 2015
Oracle Big Data SQL Konference Data a znalosti 2015 Jakub ILLNER Information Management Architect XLOB Enterprise Cloud Architects 23 July 2015, version 2 Agenda 1 2 3 4 5 Is SQL Dead? Introducing Oracle
More informationSeamless Access from Oracle Database to Your Big Data
Seamless Access from Oracle Database to Your Big Data Brian Macdonald Big Data and Analytics Specialist Oracle Enterprise Architect September 24, 2015 Agenda Hadoop and SQL access methods What is Oracle
More informationSQL - the best analysis language for Big Data!
SQL - the best analysis language for Big Data! NoCOUG Winter Conference 2014 Hermann Bär, hermann.baer@oracle.com Data Warehousing Product Management, Oracle 1 The On-Going Evolution of SQL Introduction
More informationThe Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG
The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG Presentation #730 Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com Presentation
More informationExadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models
Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models Charlie Berger Sr. Director Product Management, Data Mining Technologies Oracle Corporation charlie.berger@oracle.com
More informationOracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich
Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich Agenda Introduction Old Times Exadata Big Data Oracle In-Memory Headquarters Conclusions 2 sumit
More informationStatistical Analysis of Gene Expression Data With Oracle & R (- data mining)
Statistical Analysis of Gene Expression Data With Oracle & R (- data mining) Patrick E. Hoffman Sc.D. Senior Principal Analytical Consultant pat.hoffman@oracle.com Agenda (Oracle & R Analysis) Tools Loading
More informationSun / Oracle Life Science Platform From Deluge to Discovery. 2011 Oracle Corporation
Sun / Oracle Life Science Platform From Deluge to Discovery SGI and Sun 1996 2011 Graph Algorithims Social Media We re a very tiny circle in the middle of this big universe. So it s more likely interesting
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationOLSUG Workshop Oracle Data Mining
OLSUG Workshop Oracle Data Mining Charlie Berger Sr. Director of Product Mgmt, Life Sciences and Data Mining Oracle Corporation charlie.berger@oracle.com Dr. Lutz Hamel Asst. Professor, Computer Science
More informationOracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
More informationHow To Manage Big Data In A Microsoft Cloud (Hadoop)
Oracle Database 12c and the Future of Data Warehousing in the Era of Big Data George Lumpkin Data Warehousing Neil Mendelson Big Data & Advanced AnalyEcs Vice Presidents Server Technologies September 29,
More informationIntegrate Master Data with Big Data using Oracle Table Access for Hadoop
Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationOracle Data Mining In-Database Data Mining Made Easy!
Oracle Data Mining In-Database Data Mining Made Easy! Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics Oracle Corporation charlie.berger@oracle.com www.twitter.com/charliedatamine
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationBlazing BI: the Analytic Options to the Oracle Database. ODTUG Kscope 2013
Blazing BI: the Analytic Options to the Oracle Database ODTUG Kscope 2013 Dan Vlamis Tim Vlamis Vlamis Software Solutions 816-781-2880 http://www.vlamis.com Copyright 2013, Vlamis Software Solutions, Inc.
More information1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.
1 Copyright 2011, Oracle and/or its affiliates. FPO In-Database Analytics: Predictive Analytics, Data Mining, Exadata & Business Intelligence Charlie Berger Sr. Director Product Management, Data Mining
More informationAnalyzing Big Data. Heartland OUG Spring Conference 2014
Analyzing Big Data Heartland OUG Spring Conference 2014 Dan Vlamis Vlamis Software Solutions 816-781-2880 http://www.vlamis.com Copyright 2014, Vlamis Software Solutions, Inc. Copyright 2014, Vlamis Software
More informationCloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
More informationIntegrating Apache Spark with an Enterprise Data Warehouse
Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software
More informationSQL - The Goto Language for Big Data Analy&cs!
SQL - The Goto Language for Big Data Analy&cs! Analy&cal SQL SQL Made Great Hermann Bär, hermann.baer@oracle.com Product Management Data Warehousing 1 Who Am I Not?.. But Who SHOULD be Here.. Keith Laker
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationData Domain Profiling and Data Masking for Hadoop
Data Domain Profiling and Data Masking for Hadoop 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or
More informationSemantic and Data Mining Technologies. Simon See, Ph.D.,
Semantic and Data Mining Technologies Simon See, Ph.D., Introduction to Semantic Web and Business Use Cases 2 Lots of Scientific Resources NAR 2009 over 1170 databases Reuse, Recycling, Repurposing Paul
More informationOracle's In-Database Statistical Functions
Oracle 11g DB Data Warehousing Oracle's In-Database Statistical Functions OLAP Statistics Data Mining Charlie Berger Sr. Director Product Management, Data Mining Technologies
More informationComplete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationUnified Query for Big Data Management Systems
Unified Query for Big Data Management Systems Integrating Big Data Systems with Enterprise Data Warehouses O R A C L E W H I T E P A P E R J A N U A R Y 2 0 1 5 Table of Contents Introduction 1 The Challenge
More informationBIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig
BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an
More informationApache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com
Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture
More informationWelkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.
Welkom! WIE? Bestuurslid OGh met BI / WA ervaring Bepalen activiteiten van de vereniging Deelname in organisatie commite van 1 of meerdere events Faciliteren van de SIG s Redactie van OGh-Visie Onderhouden
More informationBig Data Analytics with Oracle Advanced Analytics In-Database Option
Big Data Analytics with Oracle Advanced Analytics In-Database Option Charlie Berger Sr. Director Product Management, Data Mining and Advanced Analytics charlie.berger@oracle.com www.twitter.com/charliedatamine
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationThe Hadoop Eco System Shanghai Data Science Meetup
The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related
More informationArchitecting for the Internet of Things & Big Data
Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to
More informationHADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
More informationOracle Big Data Fundamentals Ed 1 NEW
Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big
More informationReal World Hadoop Use Cases
Real World Hadoop Use Cases JFokus 2013, Stockholm Eva Andreasson, Cloudera Inc. Lars Sjödin, King.com 1 2012 Cloudera, Inc. Agenda Recap of Big Data and Hadoop Analyzing Twitter feeds with Hadoop Real
More informationA very short Intro to Hadoop
4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationTE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
More informationI/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
More informationPredictive Analytics for Better Business Intelligence
Oracle 11g DB Data Warehousing ETL OLAP Statistics Predictive Analytics for Better Business Intelligence Data Mining Charlie Berger Sr. Director Product Management, Data Mining Technologies
More informationWhere is Hadoop Going Next?
Where is Hadoop Going Next? Owen O Malley owen@hortonworks.com @owen_omalley November 2014 Page 1 Who am I? Worked at Yahoo Seach Webmap in a Week Dreadnaught to Juggernaut to Hadoop MapReduce Security
More informationDBMS / Business Intelligence, SQL Server
DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.
More informationSpring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE
Spring,2015 Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE Contents: Briefly About Big Data Management What is hive? Hive Architecture Working
More informationDatenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
More informationOracle Big Data Strategy Simplified Infrastrcuture
Big Data Oracle Big Data Strategy Simplified Infrastrcuture Selim Burduroğlu Global Innovation Evangelist & Architect Education & Research Industry Business Unit Oracle Confidential Internal/Restricted/Highly
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationBig Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationOracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya
Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationBig Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
More informationInternals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
More informationProgramming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
More informationCertified Big Data and Apache Hadoop Developer VS-1221
Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification
More informationUsing distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
More informationSector vs. Hadoop. A Brief Comparison Between the Two Systems
Sector vs. Hadoop A Brief Comparison Between the Two Systems Background Sector is a relatively new system that is broadly comparable to Hadoop, and people want to know what are the differences. Is Sector
More informationBig Data Introduction
Big Data Introduction Ralf Lange Global ISV & OEM Sales 1 Copyright 2012, Oracle and/or its affiliates. All rights Conventional infrastructure 2 Copyright 2012, Oracle and/or its affiliates. All rights
More informationBig Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig
Introduction to Pig Agenda What is Pig? Key Features of Pig The Anatomy of Pig Pig on Hadoop Pig Philosophy Pig Latin Overview Pig Latin Statements Pig Latin: Identifiers Pig Latin: Comments Data Types
More informationBig Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
More informationUsing RDBMS, NoSQL or Hadoop?
Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More informationWhat Next for DBAs in the Big Data Era
What Next for DBAs in the Big Data Era February 21 st, 2015 Copyright 2013. Apps Associates LLC. 1 Satyendra Kumar Pasalapudi Associate Practice Director IMS @ Apps Associates Co Founder & President of
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationCloudera Backup and Disaster Recovery
Cloudera Backup and Disaster Recovery Important Note: Cloudera Manager 4 and CDH 4 have reached End of Maintenance (EOM) on August 9, 2015. Cloudera will not support or provide patches for any of the Cloudera
More information<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database
1 Best Practices for Extreme Performance with Data Warehousing on Oracle Database Rekha Balwada Principal Product Manager Agenda Parallel Execution Workload Management on Data Warehouse
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationXiaoming Gao Hui Li Thilina Gunarathne
Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal
More informationAtScale Intelligence Platform
AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE
More informationBig Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
More informationHadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions
More informationOracle Big Data Discovery Unlock Potential in Big Data Reservoir
Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Gokula Mishra Premjith Balakrishnan Business Analytics Product Group September 29, 2014 Copyright 2014, Oracle and/or its affiliates. All
More informationHow To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
More informationHow To Use Cloudera Manager Backup And Disaster Recovery (Brd) On A Microsoft Hadoop 5.5.5 (Clouderma) On An Ubuntu 5.2.5 Or 5.3.5
Cloudera Manager Backup and Disaster Recovery Important Notice (c) 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationUsing MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com
Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationSession# - AaS 2.1 Title SQL On Big Data - Technology, Architecture and Roadmap
Session# - AaS 2.1 Title SQL On Big Data - Technology, Architecture and Roadmap Sumit Pal Independent Big Data and Data Science Consultant, Boston 1 Data Center World Certified Vendor Neutral Each presenter
More informationApache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah
Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated
More informationGAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
More informationSentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
More informationMySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationOracle Database 11g Comparison Chart
Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix
More informationBig Data and Scripting map/reduce in Hadoop
Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationDeep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco
Deep Quick-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors Mark Rittman, CTO, Rittman Mead Oracle Openworld 2014, San Francisco About the Speaker Mark Rittman, Co-Founder of Rittman Mead
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More information