Connecting Hadoop with Oracle Database




Connecting Hadoop with Oracle Database
Sharon Stephen, Senior Curriculum Developer, Server Technologies Curriculum

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Agenda
- Introduction to Oracle Big Data
- Phases of processing Oracle Big Data
- Overview of the Oracle Big Data Connectors
- Methods of connecting Hadoop to an Oracle Database
- Summary

What Is Big Data?
Big data is voluminous, largely unstructured data from many different sources:
- Social networks
- Banking and financial services
- E-commerce services
- Medical records
- Weblogs
- Web-centric services
- Internet search indexes
- Scientific searches
- Document searches

Big Data: Big Goals
- Discover intelligent data
- Understand e-commerce behavior
- Derive sentiment
- Support interactions

Why Oracle For Big Data?

Oracle Big Data: Four-Phased Solution
1. Acquire  2. Organize  3. Analyze  4. Decide

Oracle's Big Data Solution
[Diagram: Oracle products mapped to the solution phases]
- Decide: Oracle Real-Time Decisions, Endeca Information Discovery, Oracle BI Foundation Suite
- Stream: Oracle Event Processing, Apache Flume, Oracle GoldenGate
- Acquire: Cloudera Hadoop, Oracle NoSQL Database, Oracle R Distribution
- Organize: Oracle Big Data Connectors, Oracle Data Integrator
- Analyze: Oracle Database, Oracle Advanced Analytics, Oracle Spatial & Graph

Apache Hadoop
Apache Hadoop is a framework for executing applications on large clusters built of commodity hardware. It operates by batch processing. Hadoop consists of two components:
- Hadoop Distributed File System (HDFS)
- MapReduce

Oracle Big Data Connectors
- Facilitate access to data stored in a Hadoop cluster.
- Can be licensed for use on either Oracle Big Data Appliance or a Hadoop cluster running on commodity hardware.
[Diagram: the connectors in the Organize phase, between Cloudera Hadoop/Oracle NoSQL Database and Oracle Database]

Usage Scenarios
- Bulk loading of large volumes of data. Example: historical data; daily uploads of data gathered during the day
- Loading at regular frequency. Example: 24/7 monitoring of log feeds
- Loading at irregular frequency. Example: monitoring of sensor feeds
- Accessing data files in place on HDFS

Installation Details
You can download the connectors from either of the following locations:
- Oracle Technology Network: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html
- Oracle Software Delivery Cloud: https://edelivery.oracle.com/

Oracle Big Data Connectors (licensed together)
- Oracle Loader for Hadoop
- Oracle SQL Connector for HDFS
- Oracle Data Integrator Application Adapter for Hadoop
- Oracle R Connector for Hadoop
- Oracle XQuery for Hadoop

Oracle Loader for Hadoop
- Efficient, high-performance loader for fast movement of data from any Hadoop cluster into a table in Oracle Database
- Allows you to use Hadoop MapReduce processing
- Partitions the data and transforms it into an Oracle-ready format
- Can be operated in two modes: online mode and offline mode

Data Samples
Examples include JSON files, sensor data, machine logs, Apache weblogs, and Twitter feeds. Sample JSON activity records:

{"custid":1046915,"movieid":null,"genreid":null,"time":"2012-07-01:00:33:18","recommended":null,"activity":9}
{"custid":1144051,"movieid":768,"genreid":9,"time":"2012-07-01:00:33:39","recommended":"n","activity":6}
{"custid":1264225,"movieid":null,"genreid":null,"time":"2012-07-01:00:34:01","recommended":null,"activity":8}
{"custid":1085645,"movieid":null,"genreid":null,"time":"2012-07-01:00:34:18","recommended":null,"activity":8}
{"custid":1098368,"movieid":null,"genreid":null,"time":"2012-07-01:00:34:28","recommended":null,"activity":8}
{"custid":1363545,"movieid":27205,"genreid":9,"time":"2012-07-01:00:35:09","recommended":"y","activity":11,"price":3.99}
{"custid":1156900,"movieid":20352,"genreid":14,"time":"2012-07-01:00:35:12","recommended":"n","activity":7}
{"custid":1336404,"movieid":null,"genreid":null,"time":"2012-07-01:00:35:27","recommended":null,"activity":9}
{"custid":1022288,"movieid":null,"genreid":null,"time":"2012-07-01:00:35:38","recommended":null,"activity":8}
{"custid":1129727,"movieid":1105903,"genreid":11,"time":"2012-07-01:00:36:08","recommended":"n","activity":1,"rating":3}
{"custid":1305981,"movieid":null,"genreid":null,"time":"2012-07-01:00:36:27","recommended":null,"activity":8}

Online and Offline Modes
- Partition, sort, and convert rows into Oracle data types on Hadoop
- In online mode, connect to the database from the reducer nodes and load into database partitions in parallel
[Diagram: Oracle Loader for Hadoop as a MapReduce job: map tasks, shuffle/sort, reduce tasks]

Choosing Output Mode in OLH

Output mode                        Use case characteristics
---------------------------------  --------------------------------------------
Online load with JDBC              Simplest; can load into nonpartitioned tables
Online load with Direct Path       Fast online load for partitioned tables
Offline load with Data Pump files  Fastest load method, via external tables
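As an illustration of how the output mode is selected, a minimal OLH job configuration might look like the following sketch. The property names follow the pattern documented for Oracle Loader for Hadoop, but the table name, connection URL, and user shown here are hypothetical placeholders; check the connector documentation for the exact names in your release.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Target table in Oracle Database (hypothetical name) -->
  <property>
    <name>oracle.hadoop.loader.loaderMap.targetTable</name>
    <value>MOVIE_SESSIONS</value>
  </property>
  <!-- The output format class selects the output mode:
       JDBCOutputFormat     = online load with JDBC
       OCIOutputFormat      = online load with Direct Path
       DataPumpOutputFormat = offline load to Data Pump files -->
  <property>
    <name>mapreduce.outputformat.class</name>
    <value>oracle.hadoop.loader.lib.output.OCIOutputFormat</value>
  </property>
  <!-- Database connection, used by the online modes (placeholder URL) -->
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//dbhost:1521/orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>MOVIEDEMO</value>
  </property>
</configuration>
```

Swapping the output format class is the only change needed to move between the three modes listed above.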

OLH: Advantages
OLH offloads database server processing to Hadoop by:
- Converting the input data to the final database format
- Computing the table partition for a row
- Sorting rows by primary key within a table partition
- Generating binary Data Pump files
- Balancing partition groups across reducers

Oracle SQL Connector for HDFS
Oracle SQL Connector for HDFS (OSCH) facilitates read access from HDFS to Oracle Database using external tables. It uses the ORACLE_LOADER access driver. It enables you to:
- Access big data without loading it into the database
- Access data stored in HDFS files
- Access CSV (comma-separated values) files and Data Pump files generated by Oracle Loader for Hadoop
- Load data extracted and transformed by Oracle Data Integrator

Oracle SQL Connector for HDFS
[Diagram: a SQL query against an external table in Oracle Database reading HDFS files directly through the OSCH HDFS client, over InfiniBand]
- Direct access from Oracle Database: SQL access to HDFS through an external table view
- Data can be queried in place or imported

OSCH: Three Simple Steps
1. Create an external table.
2. Run the OSCH utility to publish HDFS content to the external table.
3. Access and load into the database using SQL.

> hadoop jar \
    $ODCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.hdfs.exttab.ExternalTable \
    -conf MyConf.xml \
    -publish
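A sketch of what a MyConf.xml for the -publish step might contain is shown below. The property names follow the connector's documented oracle.hadoop.exttab.* pattern (exact names vary between releases, and the older package shown in the command used the oracle.hadoop.hdfs.exttab prefix); the table name, HDFS path, directory object, and connection details are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- External table to publish HDFS content to (placeholder name) -->
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>SALES_HDFS_EXT</value>
  </property>
  <!-- HDFS files to expose through the external table (placeholder path) -->
  <property>
    <name>oracle.hadoop.exttab.dataPaths</name>
    <value>/user/oracle/sales/part-*</value>
  </property>
  <!-- Database directory object that will hold the generated location files -->
  <property>
    <name>oracle.hadoop.exttab.defaultDirectory</name>
    <value>SALES_EXT_DIR</value>
  </property>
  <!-- Connection used while publishing (placeholder URL and user) -->
  <property>
    <name>oracle.hadoop.exttab.connection.url</name>
    <value>jdbc:oracle:thin:@//dbhost:1521/orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.connection.user</name>
    <value>SCOTT</value>
  </property>
</configuration>
```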

OSCH: Features
- Access and analyze data in place on HDFS via external tables
- Query and join data on HDFS with database-resident data
- Load into the database using SQL (if required)
- Automatic load balancing to maximize performance
- DML operations and indexes are not supported on external tables
- Data files can be text files or Oracle Data Pump files
- Parallelism is controlled by the external table definition
- Data files are grouped to distribute the load evenly across PQ slaves
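To make the external-table mechanism concrete, the following is a rough sketch of the kind of ORACLE_LOADER external table OSCH generates. The table and column names, directory object, and location file are hypothetical, and the PREPROCESSOR clause (which runs the connector's hdfs_stream script to stream HDFS file contents) is simplified from the documented form:

```sql
CREATE TABLE sales_hdfs_ext (
  cust_id   NUMBER,
  movie_id  NUMBER,
  activity  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY sales_ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR osch_bin_path:'hdfs_stream'  -- streams HDFS file contents
    FIELDS TERMINATED BY ','
  )
  LOCATION ('osch-location-file-1')  -- location files written by -publish
)
PARALLEL 2
REJECT LIMIT UNLIMITED;

-- Query in place, or load into a database table if required:
-- INSERT /*+ APPEND */ INTO sales SELECT * FROM sales_hdfs_ext;
```

The PARALLEL clause on the external table is what controls the degree of parallelism mentioned in the feature list above.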

Choosing Connectors
Oracle Loader for Hadoop:
- Load into Oracle Database.
- Input data can be delimited text, data from Hive tables (Oracle-supported InputFormats), or any other format, for example a binary format (by creating your own InputFormat).
Oracle SQL Connector for HDFS:
- Access directly from Hive or HDFS, or load into Oracle Database.
- Input data can be delimited text or Oracle Data Pump files only.

ODI Application Adapter for Hadoop
- A Big Data Connector that allows data integration developers to easily integrate and transform data within Hadoop using Oracle Data Integrator
- Has preconfigured ODI knowledge modules
[Diagram: Oracle Data Integrator invoking Oracle Loader for Hadoop]

Oracle R Connector for Hadoop
Oracle R Connector for Hadoop (ORCH) is an R package that provides an interface between the local R environment, Oracle Database, and Cloudera CDH. Using simple R functions, you can:
- Sample data in HDFS
- Copy data between Oracle Database and HDFS
- Schedule R programs to execute as MapReduce jobs
- Return the results to Oracle Database or to your laptop

Word Count: Example Without ORCH

Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String s = value.toString();
    for (String word : s.split("\\W+")) {
      if (word.length() > 0) {
        output.collect(new Text(word), new IntWritable(1));
      }
    }
  }
}

Reducer:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int wordCount = 0;
    while (values.hasNext()) {
      IntWritable value = values.next();
      wordCount += value.get();
    }
    output.collect(key, new IntWritable(wordCount));
  }
}

Word Count: Example with ORCH

input <- hdfs.put(corpus)
wordcount <- function(input, output = NULL, pattern = " ") {
  res <- hadoop.exec(dfs.id = input,
    mapper = function(k, v) {
      lapply(strsplit(x = v, split = pattern)[[1]],
             function(w) orch.keyval(w, 1)[[1]])
    },
    reducer = function(k, vv) {
      orch.keyval(k, sum(unlist(vv)))
    },
    config = new("mapred.config",
      job.name      = "wordcount",
      map.output    = data.frame(key = 0,  val = ''),
      reduce.output = data.frame(key = '', val = 0))
  )
  res
}

NEW: Oracle XQuery for Hadoop
- OXH is a transformation engine for big data
- XQuery language executed on the MapReduce framework
- An XQuery script is compiled into a MapReduce execution plan and run on the worker nodes

Example query:

for $ln in text:collection()
let $f := tokenize($ln)
where $f[1] = 'x'
return text:put($f[2])

[Diagram: the OXH engine compiling XQuery into MapReduce jobs over HDFS, with Oracle Loader for Hadoop moving results into Oracle Database]

Performance
- Loads at 15 TB/hour
- 25 times faster than third-party products
- Reduced database CPU usage in comparison

Connectors: Engineered to Leverage All Data
- ODI Application Adapter for Hadoop
- Oracle Loader for Hadoop
- Oracle Direct Connector for HDFS
- Oracle R Connector for Hadoop
- Oracle XQuery for Hadoop

Summary
- Oracle Big Data Connectors are products for high-speed loading from Hadoop to Oracle Database
- Cover a range of use cases and several input sources
- Flexible, easy to use, developed and supported by Oracle
- The fastest load option loads at 15 TB/hour

Further References
http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
http://www.oracle.com/us/products/database/exadata/overview/index.html
http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html
https://blogs.oracle.com/bigdataconnectors/

Questions?
Mail your queries to: sharon.stephen@oracle.com