OTN Developer Day: Oracle Big Data
Hands-On Lab Manual
Oracle Big Data Connectors: Introduction to Oracle R Connector for Hadoop

ORACLE R CONNECTOR FOR HADOOP 2.0 HANDS-ON LAB
Introduction to Oracle R Connector for Hadoop
Contents
Introduction to Oracle R Connector for Hadoop
Exercise 1: Work with data in HDFS and Oracle Database
Exercise 2: Execute a simple MapReduce job using ORCH
Exercise 3: Count words in movie plot summaries
Introduction to Oracle R Connector for Hadoop

Oracle R Connector for Hadoop (ORCH), a component of the Big Data Connectors option, provides transparent access to Hadoop and HDFS-resident data. Hadoop is a high-performance distributed computational system, and the Hadoop Distributed File System (HDFS) is a distributed, high-availability file storage mechanism. With ORCH, R users are not forced to learn a new language to work with Hadoop and HDFS; they continue to work in R. In addition, they can leverage open source R packages in their mapper and reducer functions when working on HDFS-resident data. ORCH allows Hadoop jobs to be executed locally at the client for testing purposes; then, by changing one setting, the exact same code can be executed on the Hadoop cluster without involving administrators or requiring knowledge of Hadoop internals, the Hadoop call-level interface, or IT infrastructure.

ORCH and Oracle R Enterprise (ORE) can interact in a variety of ways. If ORE is installed on the R client alongside ORCH, ORCH can copy ore.frames (data tables) to HDFS, ORE can preprocess data that is fed to MapReduce jobs, and ORE can post-process the results of MapReduce jobs once data is moved from HDFS to Oracle Database. If ORE is installed on the Big Data Appliance task nodes, mapper and reducer functions can include calls to ORE functions. If ORCH is installed on the Oracle Database server, R scripts run through embedded R execution can invoke ORCH functionality, operationalizing ORCH scripts via SQL-based applications or those leveraging DBMS_SCHEDULER.

To run the commands in this document on the virtual machine (VM), point Firefox at RStudio and log in using the oracle user's credentials. From the RStudio File menu, select File > Open File and navigate to /home/oracle/movie/moviework/advancedanalytics. Select the R script file _ORCH_Hands-on_Lab.R; the lab's commands will open and be available to run in RStudio.
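The exercises below submit jobs through hadoop.run and hadoop.exec, which apply a mapper to each input record, group the emitted key/value pairs by key, and call the reducer once per distinct key; in dry run mode this whole cycle runs serially on the client. A minimal sketch of that map-shuffle-reduce contract, in Python for illustration only (the function and variable names here are invented and are not part of the ORCH API):

```python
from collections import defaultdict

def local_map_reduce(records, mapper, reducer):
    """Serial stand-in for a MapReduce run: map, shuffle (group by key), reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # a mapper may emit several pairs
            grouped[key].append(value)
    # one reducer call per distinct key, as on the cluster
    return {key: reducer(key, values) for key, values in grouped.items()}

# Count records per genre, mirroring the shape of Exercise 2's job.
rows = [{"genre_id": 1}, {"genre_id": 2}, {"genre_id": 1}]
counts = local_map_reduce(rows,
                          mapper=lambda row: [(row["genre_id"], 1)],
                          reducer=lambda key, values: len(values))
print(counts)  # {1: 2, 2: 1}
```

The same mapper and reducer bodies run unchanged whether the "runner" is this serial loop or a real cluster, which is exactly the property ORCH's dry run mode exploits.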
Exercise 1: Work with data in HDFS and Oracle Database

Loading the ORCH library provides access to basic functions for manipulating HDFS. After navigating to a specified directory, we'll again access database data in the form of the MOVIE_FACT and MOVIE_GENRE tables, and connect to Oracle Database from ORCH. Although you're connected to the database through ORE, transferring data between Oracle Database and HDFS requires an ORCH connection. You'll then copy data from the database to HDFS for later use with a MapReduce job. Run these commands from the /home/oracle/movie/moviework/advancedanalytics Linux directory.

1. If you are in R, first exit from R using CTRL-D CTRL-D. This in effect invokes q() without saving the workspace. Change directory and start R:

cd /home/oracle/movie/moviework/advancedanalytics
R

2. If you are not already connected by default, load the Oracle R Enterprise (ORE) library and connect to the Oracle database, then list the contents of the database to test the connection. Notice that if a table contains columns with unsupported data types, a warning message is returned. If you are already connected, you can simply invoke ore.ls().

library(ORE)
ore.connect("moviedemo", "orcl", "localhost", "welcome1", all=TRUE)
ore.ls()
3. Load the Oracle R Connector for Hadoop (ORCH) library, get the current working directory, and list the directory contents in the Hadoop Distributed File System (HDFS). Change directory in HDFS and view the contents there:

library(ORCH)
hdfs.pwd()
hdfs.ls()
hdfs.cd("/user/oracle/moviework/advancedanalytics/data")
hdfs.ls()

4. Using ORE, view the names of the database tables MOVIE_FACT and MOVIE_GENRE, look at the first few rows of each table, and get the table dimensions:

ore.sync("moviedemo", "MOVIE_FACT")
MF <- MOVIE_FACT
names(MF)
head(MF, 3)
dim(MF)
names(MOVIE_GENRE)
head(MOVIE_GENRE, 3)
dim(MOVIE_GENRE)

5. Since we will use the table MOVIE_GENRE later in our Hadoop recommendation jobs, copy a subset of MOVIE_GENRE from the database to HDFS and validate that it exists. This requires using orch.connect to establish the connection to the database from ORCH.

MG_SUBSET <- MOVIE_GENRE[1:10000,]
hdfs.rm('movie_genre_subset')
orch.connect(host="localhost", user="moviedemo", sid="orcl", passwd="welcome1", secure=F)
mg.dfs <- hdfs.push(MG_SUBSET, dfs.name='movie_genre_subset', split.by="GENRE_ID")
hdfs.exists('movie_genre_subset')
hdfs.describe('movie_genre_subset')
hdfs.size('movie_genre_subset')

Exercise 2: Execute a simple MapReduce job using ORCH

In this exercise, you will execute a Hadoop job that counts the number of movies in each genre. You will first run the script in dry run mode, executing serially on the local machine. Then, you will run on the cluster in the VM. Finally, you will compare the results using ORE.

1. Use hdfs.attach() to attach the movie_genre_subset HDFS file to the working session:
mg.dfs <- hdfs.attach("/user/oracle/moviework/advancedanalytics/data/movie_genre_subset")
mg.dfs
hdfs.describe(mg.dfs)

2. Specify dry run mode, then execute the MapReduce job that partitions the data by genre and counts the number of movies in each genre. Note that you will receive debug output while in dry run mode.

orch.dryrun(TRUE)
res.dryrun <- NULL
res.dryrun <- hadoop.run(
  mg.dfs,
  mapper = function(key, val) {
    orch.keyval(val$GENRE_ID, 1)
  },
  reducer = function(key, vals) {
    count <- length(vals)
    orch.keyval(key, count)
  },
  config = new("mapred.config",
    map.output = data.frame(key=0, val=0),
    reduce.output = data.frame(GENRE_ID=0, COUNT=0))
)

3. Retrieve the result of the Hadoop job, which is stored as an HDFS file. Since this is dry run mode, not all data may be used, so only a subset of results may be returned.

hdfs.get(res.dryrun)

4. Execute on the cluster by setting orch.dryrun to FALSE, rerun the same MapReduce job, and view the result. This will take longer to execute, since it starts actual Hadoop jobs on the cluster.

orch.dryrun(FALSE)
res.cluster <- NULL
res.cluster <- hadoop.run(
  mg.dfs,
  mapper = function(key, val) {
    orch.keyval(val$GENRE_ID, 1)
  },
  reducer = function(key, vals) {
    count <- length(vals)
    orch.keyval(key, count)
  },
  config = new("mapred.config",
    map.output = data.frame(key=0, val=0),
    reduce.output = data.frame(GENRE_ID=0, COUNT=0))
)
hdfs.get(res.cluster)

5. Perform the same analysis using ORE:
res.table <- table(MG_SUBSET$GENRE_ID)
res.table

Exercise 3: Count words in movie plot summaries

In this exercise, you will execute a Hadoop job that counts how many times each word in the MOVIE plot summaries occurs. You will first create the HDFS file containing the data extracted from Oracle Database using ORE. Then, you will run the MapReduce job on the cluster in the VM. Finally, you will view the results using ORE; since we'll want the results sorted by most frequent words, another MapReduce job will be needed.

1. If starting a fresh R session, execute the first block. Otherwise, continue: find all the movies with plot summaries and convert them from ore.factor to ore.character. Remove various unneeded punctuation from the text, create a database table from the result, and create the input corpus for the MapReduce job:

library(ORCH)
orch.connect(host="localhost", user="moviedemo", sid="orcl", passwd="welcome1", secure=F)
hdfs.cd("/user/oracle/moviework/advancedanalytics/data")

ore.drop(table="corpus_table")
corpus <- as.character(MOVIE[!is.na(MOVIE$PLOT_SUMMARY), "PLOT_SUMMARY"])
class(corpus)
corpus <- gsub("([/\\\":,#.@-])", " ", corpus)
head(corpus, 2)
corpus <- data.frame(text=corpus)
ore.create(corpus, table="corpus_table")
hdfs.rm("plot_summary_corpus")
input <- hdfs.put(corpus_table, dfs.name="plot_summary_corpus")

2. Try the following example to see how R parses text using strsplit. Notice the extra space between "my" and "text"; it gets converted to an empty-string output. You will account for that in the next step.

txt <- "This is my  text"
strsplit(txt, " ")
mylist <- list(A = 5, B = 10, C = 25)
sum(unlist(mylist))

3. Execute the MapReduce job that performs the word count:
res <- hadoop.exec(dfs.id = input,
  mapper = function(k, v) {
    x <- strsplit(v[[1]], " ")[[1]]
    x <- x[x != '']
    out <- NULL
    for(i in 1:length(x))
      out <- c(out, orch.keyval(x[i], 1))
    out
  },
  reducer = function(k, vv) {
    orch.keyval(k, sum(unlist(vv)))
  },
  config = new("mapred.config",
    job.name = "wordcount",
    map.output = data.frame(key='', val=0),
    reduce.output = data.frame(key='', val=0))
)

4. View the path of the result HDFS file, then get the contents of the result. Notice that the results are unordered.

res
hdfs.get(res)

5. To sort the results, use the following MapReduce job. Notice that we can specify explicit stopwords, i.e., words to be excluded from the result, and that we also eliminate words of 3 letters or fewer. Then view the sorted results, as well as a sample of 10 rows from the HDFS file. Which words are the most popular in the plot summaries?
stopwords <- c("from","they","that","with","their","when","into","what")
sorted.res <- hadoop.exec(dfs.id = res,
  mapper = function(k, v) {
    if(!(k %in% stopwords) & nchar(k) > 3) {
      cnt <- sprintf("%05d", as.numeric(v[[1]]))
      orch.keyval(cnt, k)
    }
  },
  reducer = function(k, vv) {
    orch.keyvals(k, vv)
  },
  export = orch.export(stopwords),
  config = new("mapred.config",
    job.name = "sort.words",
    reduce.tasks = 1,
    map.output = data.frame(key='', val=''),
    reduce.output = data.frame(key='', val=''))
)
hdfs.get(sorted.res)
hdfs.sample(sorted.res, 10)
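The sorting job relies on a classic MapReduce trick: a reducer receives its keys in sorted order, so emitting the count as the key (and the word as the value) into a single reducer (reduce.tasks = 1) yields output ordered by frequency. The sprintf("%05d", ...) zero-padding matters because keys are compared as strings, not numbers. A small Python illustration of why (for illustration only, not part of the lab code):

```python
# Counts compared as plain strings sort lexicographically: "9" comes after "10".
counts = [9, 10, 2, 100]
print(sorted(str(c) for c in counts))      # ['10', '100', '2', '9'] -- wrong order

# Zero-padding to a fixed width makes lexicographic order match numeric order.
print(sorted("%05d" % c for c in counts))  # ['00002', '00009', '00010', '00100']
```

Without the padding, a word occurring 9 times would sort after one occurring 100 times.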
More informationConfiguring Secure Socket Layer (SSL) for use with BPM 7.5.x
Configuring Secure Socket Layer (SSL) for use with BPM 7.5.x Configuring Secure Socket Layer (SSL) communication for a standalone environment... 2 Import the Process Server WAS root SSL certificate into
More informationHigh Performance Computing with Hadoop WV HPC Summer Institute 2014
High Performance Computing with Hadoop WV HPC Summer Institute 2014 E. James Harner Director of Data Science Department of Statistics West Virginia University June 18, 2014 Outline Introduction Hadoop
More informationIBM Software Hadoop Fundamentals
Hadoop Fundamentals Unit 2: Hadoop Architecture Copyright IBM Corporation, 2014 US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
More informationMapReduce. Tushar B. Kute, http://tusharkute.com
MapReduce Tushar B. Kute, http://tusharkute.com What is MapReduce? MapReduce is a framework using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity
More informationManaged File Transfer with Universal File Mover
Managed File Transfer with Universal File Mover Roger Lacroix roger.lacroix@capitalware.com http://www.capitalware.com Universal File Mover Overview Universal File Mover (UFM) allows the user to combine
More informationZihang Yin Introduction R is commonly used as an open share statistical software platform that enables analysts to do complex statistical analysis with limited computing knowledge. Frequently these analytical
More informationData Migration from Magento 1 to Magento 2 Including ParadoxLabs Authorize.Net CIM Plugin Last Updated Jan 4, 2016
Data Migration from Magento 1 to Magento 2 Including ParadoxLabs Authorize.Net CIM Plugin Last Updated Jan 4, 2016 This guide was contributed by a community developer for your benefit. Background Magento
More informationRunning Hadoop on Windows CCNP Server
Running Hadoop at Stirling Kevin Swingler Summary The Hadoopserver in CS @ Stirling A quick intoduction to Unix commands Getting files in and out Compliing your Java Submit a HadoopJob Monitor your jobs
More informationUploads from client PC's to mercury are not enabled for security reasons.
Page 1 Oracle via SSH (on line database classes only) The CS2 Oracle Database (Oracle 9i) is located on a Windows 2000 server named mercury. Students enrolled in on line database classes may access this
More informationPackage HadoopStreaming
Package HadoopStreaming February 19, 2015 Type Package Title Utilities for using R scripts in Hadoop streaming Version 0.2 Date 2009-09-28 Author David S. Rosenberg Maintainer
More informationCSCI6900 Assignment 2: Naïve Bayes on Hadoop
DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF GEORGIA CSCI6900 Assignment 2: Naïve Bayes on Hadoop DUE: Friday, September 18 by 11:59:59pm Out September 4, 2015 1 IMPORTANT NOTES You are expected to use
More informationICE Trade Vault. Public User & Technology Guide June 6, 2014
ICE Trade Vault Public User & Technology Guide June 6, 2014 This material may not be reproduced or redistributed in whole or in part without the express, prior written consent of IntercontinentalExchange,
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationThe Hadoop Eco System Shanghai Data Science Meetup
The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related
More informationSetting Up ALERE with Client/Server Data
Setting Up ALERE with Client/Server Data TIW Technology, Inc. November 2014 ALERE is a registered trademark of TIW Technology, Inc. The following are registered trademarks or trademarks: FoxPro, SQL Server,
More informationCloudera Backup and Disaster Recovery
Cloudera Backup and Disaster Recovery Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans
More informationCOSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
More informationSpectrum Technology Platform. Version 9.0. Spectrum Spatial Administration Guide
Spectrum Technology Platform Version 9.0 Spectrum Spatial Administration Guide Contents Chapter 1: Introduction...7 Welcome and Overview...8 Chapter 2: Configuring Your System...9 Changing the Default
More informationCloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
More informationCactoScale Guide User Guide. Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB)
CactoScale Guide User Guide Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB) Version History Version Date Change Author 0.1 12/10/2014 Initial version Athanasios Tsitsipas(UULM)
More informationActian Analytics Platform Express Hadoop SQL Edition 2.0
Actian Analytics Platform Express Hadoop SQL Edition 2.0 Tutorial AH-2-TU-05 This Documentation is for the end user's informational purposes only and may be subject to change or withdrawal by Actian Corporation
More informationComplete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationThe Hadoop Implementation. Thomas Zimmermann Philipp Berger
Link Analysis goes MapReduce The Hadoop Implementation Thomas Zimmermann Philipp Berger Flashback 2 Overview 3 1. Pre- / Postprocessing 2. Our Jobs 3. Evaluation Overview 4 1. Pre- / Postprocessing 2.
More informationApache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.
EDUREKA Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.0 Cluster edureka! 11/12/2013 A guide to Install and Configure
More informationPerceptive Intelligent Capture Solution Configration Manager
Perceptive Intelligent Capture Solution Configration Manager Installation and Setup Guide Version: 1.0.x Written by: Product Knowledge, R&D Date: February 2016 2015 Lexmark International Technology, S.A.
More informationMarkLogic Server. MarkLogic Connector for Hadoop Developer s Guide. MarkLogic 8 February, 2015
MarkLogic Connector for Hadoop Developer s Guide 1 MarkLogic 8 February, 2015 Last Revised: 8.0-3, June, 2015 Copyright 2015 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents
More informationUsing Keil software with Linux via VirtualBox
Using Keil software with Linux via VirtualBox Introduction The Keil UVision software used to develop programs for ARM based microprocessor systems is designed to run on Microsoft Windows operating systems.
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationcloud-kepler Documentation
cloud-kepler Documentation Release 1.2 Scott Fleming, Andrea Zonca, Jack Flowers, Peter McCullough, El July 31, 2014 Contents 1 System configuration 3 1.1 Python and Virtualenv setup.......................................
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationTo reduce or not to reduce, that is the question
To reduce or not to reduce, that is the question 1 Running jobs on the Hadoop cluster For part 1 of assignment 8, you should have gotten the word counting example from class compiling. To start with, let
More informationArchitecting the Future of Big Data
Hive ODBC Driver User Guide Revised: July 22, 2013 2012-2013 Hortonworks Inc. All Rights Reserved. Parts of this Program and Documentation include proprietary software and content that is copyrighted and
More informationNovell ZENworks Asset Management 7.5
Novell ZENworks Asset Management 7.5 w w w. n o v e l l. c o m October 2006 USING THE WEB CONSOLE Table Of Contents Getting Started with ZENworks Asset Management Web Console... 1 How to Get Started...
More informationBest Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
More informationCloudera Backup and Disaster Recovery
Cloudera Backup and Disaster Recovery Important Note: Cloudera Manager 4 and CDH 4 have reached End of Maintenance (EOM) on August 9, 2015. Cloudera will not support or provide patches for any of the Cloudera
More informationForensic Clusters: Advanced Processing with Open Source Software. Jon Stewart Geoff Black
Forensic Clusters: Advanced Processing with Open Source Software Jon Stewart Geoff Black Who We Are Mac Lightbox Guidance alum Mr. EnScript C++ & Java Developer Fortune 100 Financial NCIS (DDK/ManTech)
More informationSAS 9.3 Foundation for Microsoft Windows
Software License Renewal Instructions SAS 9.3 Foundation for Microsoft Windows Note: In this document, references to Microsoft Windows or Windows include Microsoft Windows for x64. SAS software is licensed
More informationCloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
More informationBig Data, beating the Skills Gap Using R with Hadoop
Big Data, beating the Skills Gap Using R with Hadoop Using R with Hadoop There are a number of R packages available that can interact with Hadoop, including: hive - Not to be confused with Apache Hive,
More informationHow To Use Query Console
Query Console User Guide 1 MarkLogic 8 February, 2015 Last Revised: 8.0-1, February, 2015 Copyright 2015 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents Query Console User
More informationFigure 1. Accessing via External Tables with in-database MapReduce
1 The simplest way to access external files or external data on a file system from within an Oracle database is through an external table. See here for an introduction to External tables. External tables
More informationDatabase migration using Wizard, Studio and Commander. Based on migration from Oracle to PostgreSQL (Greenplum)
Step by step guide. Database migration using Wizard, Studio and Commander. Based on migration from Oracle to PostgreSQL (Greenplum) Version 1.0 Copyright 1999-2012 Ispirer Systems Ltd. Ispirer and SQLWays
More information000-420. IBM InfoSphere MDM Server v9.0. Version: Demo. Page <<1/11>>
000-420 IBM InfoSphere MDM Server v9.0 Version: Demo Page 1. As part of a maintenance team for an InfoSphere MDM Server implementation, you are investigating the "EndDate must be after StartDate"
More informationDMX-h ETL Use Case Accelerator. Word Count
DMX-h ETL Use Case Accelerator Word Count Syncsort Incorporated, 2015 All rights reserved. This document contains proprietary and confidential material, and is only for use by licensees of DMExpress. This
More informationConfiguring a Custom Load Evaluator Use the XenApp1 virtual machine, logged on as the XenApp\administrator user for this task.
Lab 8 User name: Administrator Password: Password1 Contents Exercise 8-1: Assigning a Custom Load Evaluator... 1 Scenario... 1 Configuring a Custom Load Evaluator... 1 Assigning a Load Evaluator to a Server...
More informationOracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics charlie.berger@oracle.com www.twitter.com/charliedatamine
More informationManaging Linux Servers with System Center 2012 R2
Managing Linux Servers with System Center 2012 R2 System Center 2012 R2 Hands-on lab In this lab, you will use System Center 2012 R2 Operations Manager and System Center 2012 R2 Configuration Manager to
More informationODBC Client Driver Help. 2015 Kepware, Inc.
2015 Kepware, Inc. 2 Table of Contents Table of Contents 2 4 Overview 4 External Dependencies 4 Driver Setup 5 Data Source Settings 5 Data Source Setup 6 Data Source Access Methods 13 Fixed Table 14 Table
More information