
OTN Developer Day: Oracle Big Data
Hands-On Lab Manual
Oracle Big Data Connectors: Introduction to Oracle R Connector for Hadoop

ORACLE R CONNECTOR FOR HADOOP 2.0 HANDS-ON LAB
Introduction to Oracle R Connector for Hadoop

Contents

Introduction to Oracle R Connector for Hadoop
    Exercise 1 - Work with data in HDFS and Oracle Database
    Exercise 2 - Execute a simple MapReduce job using ORCH
    Exercise 3 - Count words in movie plot summaries
Solution for Introduction to Oracle R Connector for Hadoop
    Exercise 1 - Work with data in HDFS and Oracle Database
    Exercise 2 - Execute a simple MapReduce job using ORCH
    Exercise 3 - Count words in movie plot summaries

Introduction to Oracle R Connector for Hadoop

Oracle R Connector for Hadoop (ORCH), a component of the Big Data Connectors option, provides transparent access to Hadoop and HDFS-resident data. Hadoop is a high-performance distributed computational system, and the Hadoop Distributed File System (HDFS) is a distributed, high-availability file storage mechanism. With ORCH, R users are not forced to learn a new language to work with Hadoop and HDFS; they continue to work in R. In addition, they can leverage open source R packages as part of their mapper and reducer functions when working on HDFS-resident data. ORCH allows Hadoop jobs to be executed locally at the client for testing purposes; then, by changing one setting, the exact same code can be executed on the Hadoop cluster without requiring the involvement of administrators, or knowledge of Hadoop internals, the Hadoop call-level interface, or IT infrastructure.

ORCH and Oracle R Enterprise (ORE) can interact in a variety of ways. If ORE is installed on the R client with ORCH, ORCH can copy ore.frames (data tables) to HDFS, ORE can preprocess data that is fed to MapReduce jobs, and ORE can post-process results of MapReduce jobs once data is moved from HDFS to Oracle Database. If ORE is installed on the Big Data Appliance task nodes, mapper and reducer functions can include function calls to ORE. If ORCH is installed on the Oracle Database server, R scripts run through embedded R execution can invoke ORCH functionality, operationalizing ORCH scripts via SQL-based applications or those leveraging DBMS_SCHEDULER.

To run the commands in this document on the virtual machine (VM), point Firefox to RStudio and log in using the oracle user's credentials. From the RStudio File menu, select File > Open File and navigate to /home/oracle/movie/moviework/advancedanalytics. Select the R script file _ORCH_Hands-on_Lab.R; the hands-on lab script's commands will then be open and available to run in RStudio.
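As a side note before starting the exercises, the embedded R execution scenario mentioned above could look roughly like the following. This is a hedged sketch only, not part of the lab steps: it assumes ORCH is installed on the database server, and it reuses the movie_genre_subset HDFS file that Exercise 1 creates; the function body is illustrative.

    # Sketch: calling ORCH from ORE embedded R execution (runs on the database server).
    # Assumes ORCH is installed there and the HDFS file already exists.
    ore.doEval(function() {
        library(ORCH)
        dfs <- hdfs.attach("/user/oracle/moviework/advancedanalytics/data/movie_genre_subset")
        hdfs.get(dfs)    # return the HDFS contents to the caller as a data frame
    })

A call like this could also be invoked from SQL through ORE's SQL interface (e.g., rqEval), which is how ORCH scripts can be embedded in SQL-based applications.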

Exercise 1 - Work with data in HDFS and Oracle Database

Loading the ORCH library provides access to some basic functions for manipulating HDFS. After navigating to a specified directory, we'll again access database data in the form of the MOVIE_FACT and MOVIE_GENRE tables, and connect to Oracle Database from ORCH. Although you're connected to the database through ORE, transferring data between Oracle Database and HDFS requires an ORCH connection. Then, you'll copy data from the database to HDFS for later use with a MapReduce job. Run these commands from the /home/oracle/movie/moviework/advancedanalytics Linux directory.

1. If you are in R, first exit from R using CTRL-D CTRL-D. This will in effect invoke q() without saving the workspace. Change directory and start R:

    cd /home/oracle/movie/moviework/advancedanalytics
    R

2. If you are not already connected by default, load the Oracle R Enterprise (ORE) library and connect to the Oracle database, then list the contents of the database to test the connection. Notice that if a table contains columns with unsupported data types, a warning message is returned. If you are already connected, you can just invoke ore.ls().

    library(ORE)
    ore.connect("moviedemo", "orcl", "localhost", "welcome1", all=TRUE)
    ore.ls()

3. Load the Oracle R Connector for Hadoop (ORCH) library, get the current working directory, and list the directory contents in the Hadoop Distributed File System (HDFS). Change directory in HDFS and view the contents there:

    library(ORCH)
    hdfs.pwd()
    hdfs.ls()
    hdfs.cd("/user/oracle/moviework/advancedanalytics/data")
    hdfs.ls()

4. Using ORE, view the names of the database tables MOVIE_FACT and MOVIE_GENRE, look at the first few rows of each table, and get the table dimensions:

    ore.sync("MOVIEDEMO", "MOVIE_FACT")
    MF <- MOVIE_FACT
    names(MF)
    head(MF, 3)
    dim(MF)
    names(MOVIE_GENRE)
    head(MOVIE_GENRE, 3)
    dim(MOVIE_GENRE)

5. Since we will use the table MOVIE_GENRE later in our Hadoop recommendation jobs, copy a subset of MOVIE_GENRE from the database to HDFS and validate that it exists. This requires using orch.connect to establish the connection to the database from ORCH.

    MG_SUBSET <- MOVIE_GENRE[1:10000,]
    hdfs.rm('movie_genre_subset')
    orch.connect(host="localhost", user="moviedemo", sid="orcl", passwd="welcome1", secure=FALSE)
    mg.dfs <- hdfs.push(MG_SUBSET, dfs.name='movie_genre_subset', split.by="GENRE_ID")
    hdfs.exists('movie_genre_subset')
    hdfs.describe('movie_genre_subset')
    hdfs.size('movie_genre_subset')
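If you want to sanity-check what landed in HDFS, one option (not required by the lab) is to pull the file back to the client through the handle returned by hdfs.push; mg.check below is just an illustrative name, and this assumes the subset is small enough to hold in client memory.

    # Optional check: pull the pushed file back to the client and inspect it
    mg.check <- hdfs.get(mg.dfs)
    head(mg.check)
    dim(mg.check)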

Exercise 2 - Execute a simple MapReduce job using ORCH

In this exercise, you will execute a Hadoop job that counts the number of movies in each genre. You will first run the script in dry run mode, executing serially on the local machine. Then, you will run it on the cluster in the VM. Finally, you will compare the results using ORE.

1. Use hdfs.attach() to attach the movie_genre_subset HDFS file to the working session:

    mg.dfs <- hdfs.attach("/user/oracle/moviework/advancedanalytics/data/movie_genre_subset")
    mg.dfs
    hdfs.describe(mg.dfs)

2. Specify to run in dry run mode and then execute the MapReduce job that partitions the data based on GENRE_ID, and counts up the number of movies in each genre. Note that you will receive debug output while in dry run mode.

    orch.dryrun(TRUE)
    res.dryrun <- NULL
    res.dryrun <- hadoop.run(
        mg.dfs,
        mapper = function(key, val) {
            orch.keyval(val$GENRE_ID, 1)
        },
        reducer = function(key, vals) {
            count <- length(vals)
            orch.keyval(key, count)
        },
        config = new("mapred.config",
            map.output = data.frame(key=0, val=0),
            reduce.output = data.frame(GENRE_ID=0, COUNT=0))
    )

3. Retrieve the result of the Hadoop job, which is stored as an HDFS file. Note that since this is dry run mode, not all data may be used, so only a subset of results may be returned.

    hdfs.get(res.dryrun)

4. Specify to execute using the cluster by setting orch.dryrun to FALSE, rerun the same MapReduce job, and view the result. Note that this will take longer to execute since it starts actual Hadoop jobs on the cluster.

    orch.dryrun(FALSE)
    res.cluster <- NULL
    res.cluster <- hadoop.run(
        mg.dfs,
        mapper = function(key, val) {
            orch.keyval(val$GENRE_ID, 1)
        },
        reducer = function(key, vals) {
            count <- length(vals)
            orch.keyval(key, count)
        },
        config = new("mapred.config",
            map.output = data.frame(key=0, val=0),
            reduce.output = data.frame(GENRE_ID=0, COUNT=0))
    )
    hdfs.get(res.cluster)

5. Perform the same analysis using ORE:

    res.table <- table(MG_SUBSET$GENRE_ID)
    res.table

Exercise 3 - Count words in movie plot summaries

In this exercise, you will execute a Hadoop job that counts how many times each word in the MOVIE plot summaries occurs. You will first create the HDFS file containing the data extracted from Oracle Database using ORE. Then, you will run the MapReduce job on the cluster in the VM. Finally, you will view the results using ORE; but since we'll want the results sorted by most frequent words, another MapReduce job will be needed.

1. If starting a fresh R session, execute the first block. Otherwise, continue to find all the movies with plot summaries and convert them from an ore.factor to ore.character. Remove various unneeded punctuation from the text, create a database table from the result, and create the input corpus for the MapReduce job:

    library(ORCH)
    orch.connect(host="localhost", user="moviedemo", sid="orcl", passwd="welcome1", secure=FALSE)
    hdfs.cd("/user/oracle/moviework/advancedanalytics/data")

    ore.drop(table="corpus_table")
    corpus <- as.character(MOVIE[!is.na(MOVIE$PLOT_SUMMARY), "PLOT_SUMMARY"])
    class(corpus)
    corpus <- gsub("([/\\\":,#.@-])", " ", corpus)
    head(corpus, 2)
    corpus <- data.frame(text=corpus)
    ore.create(corpus, table="corpus_table")
    hdfs.rm("plot_summary_corpus")
    input <- hdfs.put(corpus_table, dfs.name="plot_summary_corpus")

2. Try the following example to see how R parses text using strsplit. Notice the extra space between "my" and "text"; it becomes an empty-string element in the output. You will account for that in the next step.

    txt <- "This is my  text"
    strsplit(txt, " ")
    mylist <- list(a = 5, B = 10, C = 25)
    sum(unlist(mylist))
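To see the empty-string issue, and the fix used in the next step, in isolation, here is a small base R illustration (the variable name x is arbitrary):

    x <- strsplit("This is my  text", " ")[[1]]
    x              # contains an empty string "" where the extra space was
    x[x != ""]     # drop empty strings, exactly as the mapper does in the next step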

3. Execute the MapReduce job that performs the word count:

    res <- hadoop.exec(dfs.id = input,
        mapper = function(k, v) {
            x <- strsplit(v[[1]], " ")[[1]]
            x <- x[x != '']
            out <- NULL
            for (i in 1:length(x))
                out <- c(out, orch.keyval(x[i], 1))
            out
        },
        reducer = function(k, vv) {
            orch.keyval(k, sum(unlist(vv)))
        },
        config = new("mapred.config",
            job.name = "wordcount",
            map.output = data.frame(key='', val=0),
            reduce.output = data.frame(key='', val=0))
    )

4. View the path of the result HDFS file, then get the contents of the result. Notice that the results are unordered.

    res
    hdfs.get(res)
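For a quick client-side look before sorting at scale in the next step, you could also rank the retrieved result locally; this assumes the returned data frame has the key and val columns declared in reduce.output above and is small enough to pull to the client.

    # Quick local look at the most frequent words (client-side, small results only)
    wc <- hdfs.get(res)
    head(wc[order(-wc$val), ], 10)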

5. To sort the results, we can use the following MapReduce job. Notice that we can specify explicit stopwords, i.e., words to be excluded from the result, and that we also eliminate words of 3 letters or fewer. Then view the sorted results, as well as a sample of 10 rows from the HDFS file. Which words are the most popular in the plot summaries?

    stopwords <- c("from","they","that","with","their","when","into","what")
    sorted.res <- hadoop.exec(dfs.id = res,
        mapper = function(k, v) {
            if (!(k %in% stopwords) & nchar(k) > 3) {
                cnt <- sprintf("%05d", as.numeric(v[[1]]))
                orch.keyval(cnt, k)
            }
        },
        reducer = function(k, vv) {
            orch.keyvals(k, vv)
        },
        export = orch.export(stopwords),
        config = new("mapred.config",
            job.name = "sort.words",
            reduce.tasks = 1,
            map.output = data.frame(key='', val=''),
            reduce.output = data.frame(key='', val=''))
    )
    hdfs.get(sorted.res)
    hdfs.sample(sorted.res, 10)
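Because the mapper emits zero-padded counts as keys and a single reducer is used, the retrieved results come back in ascending order of frequency, so the most popular words are at the end. A quick way to see them (assuming the key and val column names from reduce.output):

    sorted <- hdfs.get(sorted.res)
    tail(sorted, 10)    # the 10 most frequent words and their zero-padded counts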

Solution for Introduction to Oracle R Connector for Hadoop

Oracle R Connector for Hadoop (ORCH), a component of the Big Data Connectors option, provides transparent access to Hadoop and HDFS-resident data. Hadoop is a high-performance distributed computational system, and the Hadoop Distributed File System (HDFS) is a distributed, high-availability file storage mechanism. With ORCH, R users are not forced to learn a new language to work with Hadoop and HDFS; they continue to work in R. In addition, they can leverage open source R packages as part of their mapper and reducer functions when working on HDFS-resident data. ORCH allows Hadoop jobs to be executed locally at the client for testing purposes; then, by changing one setting, the exact same code can be executed on the Hadoop cluster without requiring the involvement of administrators, or knowledge of Hadoop internals, the Hadoop call-level interface, or IT infrastructure.

ORCH and Oracle R Enterprise (ORE) can interact in a variety of ways. If ORE is installed on the R client with ORCH, ORCH can copy ore.frames (data tables) to HDFS, ORE can preprocess data that is fed to MapReduce jobs, and ORE can post-process results of MapReduce jobs once data is moved from HDFS to Oracle Database. If ORE is installed on the Big Data Appliance task nodes, mapper and reducer functions can include function calls to ORE. If ORCH is installed on the Oracle Database server, R scripts run through embedded R execution can invoke ORCH functionality, operationalizing ORCH scripts via SQL-based applications or those leveraging DBMS_SCHEDULER.

To run the commands in this document on the virtual machine (VM), point Firefox to RStudio and log in using the oracle user's credentials. From the RStudio File menu, select File > Open File and navigate to /home/oracle/movie/moviework/advancedanalytics. Select the R script file _ORCH_Hands-on_Lab.R; the hands-on lab script's commands will then be open and available to run in RStudio.

Exercise 1 - Work with data in HDFS and Oracle Database

Loading the ORCH library provides access to some basic functions for manipulating HDFS. After navigating to a specified directory, we'll again access database data in the form of the MOVIE_FACT and MOVIE_GENRE tables, and connect to Oracle Database from ORCH. Although you're connected to the database through ORE, transferring data between Oracle Database and HDFS requires an ORCH connection. Then, you'll copy data from the database to HDFS for later use with a MapReduce job. Run these commands from the /home/oracle/movie/moviework/advancedanalytics Linux directory.

1. If you are in R, first exit from R using CTRL-D CTRL-D. This will in effect invoke q() without saving the workspace. Change directory and start R:

    cd /home/oracle/movie/moviework/advancedanalytics
    R

2. If you are not already connected by default, load the Oracle R Enterprise (ORE) library and connect to the Oracle database, then list the contents of the database to test the connection. Notice that if a table contains columns with unsupported data types, a warning message is returned. If you are already connected, you can just invoke ore.ls().

    library(ORE)
    ore.connect("moviedemo", "orcl", "localhost", "welcome1", all=TRUE)
    ore.ls()

3. Load the Oracle R Connector for Hadoop (ORCH) library, get the current working directory, and list the directory contents in the Hadoop Distributed File System (HDFS). Change directory in HDFS and view the contents there:

    library(ORCH)
    hdfs.pwd()
    hdfs.ls()
    hdfs.cd("/user/oracle/moviework/advancedanalytics/data")
    hdfs.ls()

4. Using ORE, view the names of the database tables MOVIE_FACT and MOVIE_GENRE, look at the first few rows of each table, and get the table dimensions:

    ore.sync("MOVIEDEMO", "MOVIE_FACT")
    MF <- MOVIE_FACT
    names(MF)
    head(MF, 3)
    dim(MF)
    names(MOVIE_GENRE)
    head(MOVIE_GENRE, 3)
    dim(MOVIE_GENRE)

5. Since we will use the table MOVIE_GENRE later in our Hadoop recommendation jobs, copy a subset of MOVIE_GENRE from the database to HDFS and validate that it exists. This requires using orch.connect to establish the connection to the database from ORCH.

    MG_SUBSET <- MOVIE_GENRE[1:10000,]
    hdfs.rm('movie_genre_subset')
    orch.connect(host="localhost", user="moviedemo", sid="orcl", passwd="welcome1", secure=FALSE)
    mg.dfs <- hdfs.push(MG_SUBSET, dfs.name='movie_genre_subset', split.by="GENRE_ID")
    hdfs.exists('movie_genre_subset')
    hdfs.describe('movie_genre_subset')
    hdfs.size('movie_genre_subset')

Exercise 2 - Execute a simple MapReduce job using ORCH

In this exercise, you will execute a Hadoop job that counts the number of movies in each genre. You will first run the script in dry run mode, executing serially on the local machine. Then, you will run it on the cluster in the VM. Finally, you will compare the results using ORE.

1. Use hdfs.attach() to attach the movie_genre_subset HDFS file to the working session:

    mg.dfs <- hdfs.attach("/user/oracle/moviework/advancedanalytics/data/movie_genre_subset")
    mg.dfs
    hdfs.describe(mg.dfs)

2. Specify to run in dry run mode and then execute the MapReduce job that partitions the data based on GENRE_ID, and counts up the number of movies in each genre. Note that you will receive debug output while in dry run mode.

    orch.dryrun(TRUE)
    res.dryrun <- NULL
    res.dryrun <- hadoop.run(
        mg.dfs,
        mapper = function(key, val) {
            orch.keyval(val$GENRE_ID, 1)
        },
        reducer = function(key, vals) {
            count <- length(vals)
            orch.keyval(key, count)
        },
        config = new("mapred.config",
            map.output = data.frame(key=0, val=0),
            reduce.output = data.frame(GENRE_ID=0, COUNT=0))
    )

3. Retrieve the result of the Hadoop job, which is stored as an HDFS file. Note that since this is dry run mode, not all data may be used, so only a subset of results may be returned.

    hdfs.get(res.dryrun)

4. Specify to execute using the cluster by setting orch.dryrun to FALSE, rerun the same MapReduce job, and view the result. Note that this will take longer to execute since it starts actual Hadoop jobs on the cluster.

    orch.dryrun(FALSE)
    res.cluster <- NULL
    res.cluster <- hadoop.run(
        mg.dfs,
        mapper = function(key, val) {
            orch.keyval(val$GENRE_ID, 1)
        },
        reducer = function(key, vals) {
            count <- length(vals)
            orch.keyval(key, count)
        },
        config = new("mapred.config",
            map.output = data.frame(key=0, val=0),
            reduce.output = data.frame(GENRE_ID=0, COUNT=0))
    )
    hdfs.get(res.cluster)

5. Perform the same analysis using ORE:

    res.table <- table(MG_SUBSET$GENRE_ID)
    res.table

Exercise 3 - Count words in movie plot summaries

In this exercise, you will execute a Hadoop job that counts how many times each word in the MOVIE plot summaries occurs. You will first create the HDFS file containing the data extracted from Oracle Database using ORE. Then, you will run the MapReduce job on the cluster in the VM. Finally, you will view the results using ORE; but since we'll want the results sorted by most frequent words, another MapReduce job will be needed.

1. If starting a fresh R session, execute the first block. Otherwise, continue to find all the movies with plot summaries and convert them from an ore.factor to ore.character. Remove various unneeded punctuation from the text, create a database table from the result, and create the input corpus for the MapReduce job:

    library(ORCH)
    orch.connect(host="localhost", user="moviedemo", sid="orcl", passwd="welcome1", secure=FALSE)
    hdfs.cd("/user/oracle/moviework/advancedanalytics/data")

    corpus <- as.character(MOVIE[!is.na(MOVIE$PLOT_SUMMARY), "PLOT_SUMMARY"])
    class(corpus)
    corpus <- gsub("([/\\\":,#.@-])", " ", corpus)
    head(corpus, 2)
    corpus <- data.frame(text=corpus)
    ore.create(corpus, table="corpus_table")
    hdfs.rm("plot_summary_corpus")
    input <- hdfs.put(corpus_table, dfs.name="plot_summary_corpus")

2. Try the following example to see how R parses text using strsplit. Notice the extra space between "my" and "text"; it becomes an empty-string element in the output. You will account for that in the next step.

    txt <- "This is my  text"
    strsplit(txt, " ")
    mylist <- list(a = 5, B = 10, C = 25)
    sum(unlist(mylist))

3. Execute the MapReduce job that performs the word count:

    res <- hadoop.exec(dfs.id = input,
        mapper = function(k, v) {
            x <- strsplit(v[[1]], " ")[[1]]
            x <- x[x != '']
            out <- NULL
            for (i in 1:length(x))
                out <- c(out, orch.keyval(x[i], 1))
            out
        },
        reducer = function(k, vv) {
            orch.keyval(k, sum(unlist(vv)))
        },
        config = new("mapred.config",
            job.name = "wordcount",
            map.output = data.frame(key='', val=0),
            reduce.output = data.frame(key='', val=0))
    )

4. View the path of the result HDFS file, then get the contents of the result. Notice that the results are unordered.

    res
    hdfs.get(res)

5. To sort the results, we can use the following MapReduce job. Notice that we can specify explicit stopwords, i.e., words to be excluded from the result, and that we also eliminate words of 3 letters or fewer. Then view the sorted results, as well as a sample of 10 rows from the HDFS file. Which words are the most popular in the plot summaries?

    stopwords <- c("from","they","that","with","their","when","into","what")
    sorted.res <- hadoop.exec(dfs.id = res,
        mapper = function(k, v) {
            if (!(k %in% stopwords) & nchar(k) > 3) {
                cnt <- sprintf("%05d", as.numeric(v[[1]]))
                orch.keyval(cnt, k)
            }
        },
        reducer = function(k, vv) {
            orch.keyvals(k, vv)
        },
        export = orch.export(stopwords),
        config = new("mapred.config",
            job.name = "sort.words",
            reduce.tasks = 1,
            map.output = data.frame(key='', val=''),
            reduce.output = data.frame(key='', val=''))
    )
    hdfs.get(sorted.res)
    hdfs.sample(sorted.res, 10)
