coreservlets.com Hadoop Course



Similar documents
ITG Software Engineering

Installation Guide Setting Up and Testing Hadoop on Mac By Ryan Tabora, Think Big Analytics

Map Reduce & Hadoop Recommended Text:

CS 455 Spring Word Count Example

How to Install and Configure EBF15328 for MapR or with MapReduce v1

YARN and how MapReduce works in Hadoop By Alex Holmes

Single Node Setup. Table of contents

How To Write A Mapreduce Program On An Ipad Or Ipad (For Free)

Centrify Server Suite For MapR 4.1 Hadoop With Multiple Clusters in Active Directory

Hadoop Streaming coreservlets.com and Dima May coreservlets.com and Dima May

Setting up Hadoop with MongoDB on Windows 7 64-bit

Virtual Machine (VM) For Hadoop Training

Vaidya Guide. Table of contents

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS)

HADOOP MOCK TEST HADOOP MOCK TEST II

Hadoop Streaming. Table of contents

Fair Scheduler. Table of contents

Rumen. Table of contents

How to Run Spark Application

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

The objective of this lab is to learn how to set up an environment for running distributed Hadoop applications.

Hands-on Exercises with Big Data

MarkLogic Server. MarkLogic Connector for Hadoop Developer s Guide. MarkLogic 8 February, 2015

Extreme Computing. Hadoop. Stratis Viglas. School of Informatics University of Edinburgh Stratis Viglas Extreme Computing 1

USING HDFS ON DISCOVERY CLUSTER TWO EXAMPLES - test1 and test2

Single Node Hadoop Cluster Setup

H2O on Hadoop. September 30,

CDH 5 Quick Start Guide

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

HOD Scheduler. Table of contents

Qsoft Inc

ZeroTurnaround License Server User Manual 1.4.0

Hadoop Setup. 1 Cluster

Hadoop. History and Introduction. Explained By Vaibhav Agarwal

Using The Hortonworks Virtual Sandbox

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Hadoop Training Hands On Exercise

HSearch Installation

Hadoop Hands-On Exercises

InfoSphere Master Data Management operational server v11.x OSGi best practices and troubleshooting guide

How To Install Hadoop From Apa Hadoop To (Hadoop)

Hadoop Tutorial Group 7 - Tools For Big Data Indian Institute of Technology Bombay

Map Reduce Workflows

Productionizing a 24/7 Spark Streaming Service on YARN

CS455 - Lab 10. Thilina Buddhika. April 6, 2015

Complete Java Classes Hadoop Syllabus Contact No:

Important Notice. (c) Cloudera, Inc. All rights reserved.

HDFS - Java API coreservlets.com and Dima May coreservlets.com and Dima May

GraySort and MinuteSort at Yahoo on Hadoop 0.23

Getting to know Apache Hadoop

ITG Software Engineering

Hadoop Setup Walkthrough

Setup Hadoop On Ubuntu Linux. ---Multi-Node Cluster

Tes$ng Hadoop Applica$ons. Tom Wheeler

Hadoop (pseudo-distributed) installation and configuration

Understanding Hadoop Performance on Lustre

Hadoop 2.6 Configuration and More Examples

HiBench Introduction. Carson Wang Software & Services Group

map/reduce connected components

Big Data Training - Hackveda

Word Count Code using MR2 Classes and API

10605 BigML Assignment 4(a): Naive Bayes using Hadoop Streaming

Обработка больших данных: Map Reduce (Python) + Hadoop (Streaming) Максим Щербаков ВолгГТУ 8/10/2014

Install Hadoop on Ubuntu and run as standalone

CS 378 Big Data Programming. Lecture 5 Summariza9on Pa:erns

ORACLE GOLDENGATE BIG DATA ADAPTER FOR FLUME

Hadoop Lab Notes. Nicola Tonellotto November 15, 2010

IDS 561 Big data analytics Assignment 1

Hadoop Configuration and First Examples

An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce.

6. How MapReduce Works. Jari-Pekka Voutilainen

Finding the Needle in a Big Data Haystack. Wolfgang Hoschek (@whoschek) JAX 2014

docs.hortonworks.com

TIBCO ActiveMatrix BusinessWorks Plug-in for Big Data User's Guide

File S1: Supplementary Information of CloudDOE

NetFlow Analytics for Splunk

Installing Hadoop. You need a *nix system (Linux, Mac OS X, ) with a working installation of Java 1.7, either OpenJDK or the Oracle JDK. See, e.g.

Recommended Literature for this Lecture

Informatica Corporation Proactive Monitoring for PowerCenter Operations Version 3.0 Release Notes May 2014

Welcome to Business Internet Banking

RHadoop Installation Guide for Red Hat Enterprise Linux

COURSE CONTENT Big Data and Hadoop Training

SDK Code Examples Version 2.4.2

Hadoop Tutorial. General Instructions

t] open source Hadoop Beginner's Guide ij$ data avalanche Garry Turkington Learn how to crunch big data to extract meaning from

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Hadoop Forensics. Presented at SecTor. October, Kevvie Fowler, GCFA Gold, CISSP, MCTS, MCDBA, MCSD, MCSE

Peers Techno log ies Pv t. L td. HADOOP

Set JAVA PATH in Linux Environment. Edit.bashrc and add below 2 lines $vi.bashrc export JAVA_HOME=/usr/lib/jvm/java-7-oracle/

From Relational to Hadoop Part 2: Sqoop, Hive and Oozie. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

Click Stream Data Analysis Using Hadoop

Developing Eclipse Plug-ins* Learning Objectives. Any Eclipse product is composed of plug-ins

Istanbul Şehir University Big Data Camp 14. Hadoop Map Reduce. Aslan Bakirov Kevser Nur Çoğalmış

HIPAA Compliance Use Case

Transcription:

Hadoop training: http://courses.coreservlets.com coreservlets.com Hadoop Course Running MapReduce Jobs In this exercise, you will have a chance to practice running MapReduce jobs. You will exercise various options on passing properties as well as configuring both client s and tasks CLASSPATH. If time allows, extra credit section explores utilizing mapred tool to get job s status as well as to kill currently running job(s). Extra credit also offers practice in implementing JUnit test for a MapReduce job. Approx. Time: 60 minutes Perform 1. Run Tool implementation mapred.runningjobs.expectproperty which expects property training.prop to be set, and if the property is not set it emits java.lang.illegalargumentexception: $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectproperty Exception in thread "main" java.lang.illegalargumentexception: Expected property [training.prop] to be provided at mapred.runningjobs.expectproperty.run(expectproperty.java:14) at org.apache.hadoop.util.toolrunner.run(toolrunner.java:70) at org.apache.hadoop.util.runjar.main(runjar.java:208) 2. Set the property training.prop by adding extra parameters to the command line 3. Set the property training.prop by providing external configuration file. You will need to create a brand new configuration file and then specify your file via command line. 4. Run Tool implementation mapred.runningjobs.expectclassonclient which expects class common.propprinter to on CLASSPATH on the client s classpath; this means that the class is used within Tool s implementation; you will get java.lang.classnotfoundexception: $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectclassonclient Exception in thread "main" java.lang.noclassdeffounderror: common/propprinter at mapred.runningjobs.expectclass.run(expectclass.java:14) at org.apache.hadoop.util.runjar.main(runjar.java:208) Caused by: java.lang.classnotfoundexception: common.propprinter at java.net.urlclassloader$1.run(urlclassloader.java:202) at java.security.accesscontroller.doprivileged(native Method) at java.net.urlclassloader.findclass(urlclassloader.java:190) at java.lang.classloader.loadclass(classloader.java:306) at java.lang.classloader.loadclass(classloader.java:247) 9 more 5. The required class, common.propprinter, can be found in $PLAY_AREA/HadoopSample.jar; add the required jar to the CLASSPATH to run the tool without an exception: yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectclassonclient

6. Run Tool implementation mapred.runningjobs.expectclassontask which expects class common.propprinter to be on the Mapper Task s CLASSPATH; you will get java.lang.classnotfoundexception: $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectclassontask /training/data/hamlet.txt /training/playarea/expectclassontask 2012-09-17 14:05:38,562 INFO mapreduce.job (Job.java:monitorAndPrintJob(1275)) - Running job: job_1347904219060_0005 2012-09-17 14:05:56,215 INFO mapreduce.job (Job.java:printTaskEvents(1391)) - Task Id : attempt_1347904219060_0005_m_000000_2, Status : FAILED Error: java.lang.classnotfoundexception: common.propprinter at java.net.urlclassloader$1.run(urlclassloader.java:202) at java.security.accesscontroller.doprivileged(native Method) at java.net.urlclassloader.findclass(urlclassloader.java:190) at java.lang.classloader.loadclass(classloader.java:306) at sun.misc.launcher$appclassloader.loadclass(launcher.java:301) at java.lang.classloader.loadclass(classloader.java:247) at mapred.runningjobs.expectclassontask$expectclassontaskmapper.map(expectclassontask.java:40) at mapred.runningjobs.expectclassontask$expectclassontaskmapper.map(expectclassontask.java:36) Job Counters Failed map tasks=4 7. The required class, common.propprinter, can be found in $PLAY_AREA/HadoopSample.jar; add the required jar to the Mapper Task s CLASSPATH to run the job without an exception. Perform Extra Credit 1. Execute mapred.runningjobs.neverendingjob whose Mapper implementation is an infinite loop that logs a message then sleeps for 2 seconds. This job will never end by itself. $ yarn jar $PLAY_AREA/Exercises.jar \ mapred.runningjobs.neverendingjob \ /training/data/hamlet.txt \ /training/playarea/neverendingjob 2012-09-17 17:13:07,411 INFO mapreduce.job (Job.java:monitorAndPrintJob(1275)) - Running job: job_1347904219060_0010 2012-09-17 17:13:13,832 INFO mapreduce.job (Job.java:monitorAndPrintJob(1296)) - Job job_1347904219060_0010 running in uber mode : false 2012-09-17 17:13:13,834 INFO mapreduce.job (Job.java:monitorAndPrintJob(1303)) - map 0% reduce 0% Locate a Map task via YARN Management UI by going to http://localhost:8088/cluster and 1. navigating to this specific YARN application 2. to the ApplicationMaster 3. select your job 4. Select a Map task

5. View Map s logs 6. Select syslog Hadoop training: http://courses.coreservlets.com you should see something like this in the log: 2012-09-17 17:19:54,887 INFO [main] mapred.runningjobs.neverendingjob: Ha ha! I will never end! Enjoy this UUID: 88a2abd4-10ac-49b7-9a93-2cdac07636b2 2012-09-17 17:19:56,887 INFO [main] mapred.runningjobs.neverendingjob: Ha ha! I will never end! Enjoy this UUID: 36fb875e-009d-466a-a377-da63d683c9b7 2012-09-17 17:19:58,888 INFO [main] mapred.runningjobs.neverendingjob: Ha ha! I will never end! Enjoy this UUID: 8ff725f0-b16c-40aa-94ac-d9599e6cb35a 2. Use command line to display status of NeverEndingJob 3. Kill NeverEndingJob 4. mapred.runningjobs.counttokens class is a MapReduce job that calculates the total number of tokens in the provided file. Your task is to implement a JUnit test. Run the MapReduce job locally within unit tests. A JUnit class was set up for you in the Exercises project mapred.runningjobs.counttokenstests; The unit test sets up a file with nine tokens therefore the job should produce the output of "count 9"

Solution 1. N/A 2. $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectproperty - Dtraining.prop=hi Property [training.prop] is set to [hi] 3. Follow these steps: $ vi conf.xml <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>training.prop</name> <value>fileprop</value> </property> </configuration> $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectproperty -conf conf.xml 4. N/A 5. Follow these steps: 6. N/A Property [training.prop] is set to [fileprop] a. First Add HaddopSample.jar to the client s CLASSPATH by editing hadoop-env.sh $ vi $HADOOP_CONF_DIR/hadoop-env.sh export HADOOP_CLASSPATH=$HBASE_HOME/*:$HBASE_HOME/conf:$HADOOP_CLASSPATH:$PLAY_AR EA/HadoopSamples.jar b. verify that the jar is on the client s classpath $ yarn classpath grep HadoopSamples c. run the Tool again: $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectclassonclient Class [class common.propprinter] was on CLASSPATH d. Clean up by removing HadoopSample.jar from the CLASSPATH $ vi $HADOOP_CONF_DIR/hadoop-env.sh export HADOOP_CLASSPATH=$HBASE_HOME/*:$HBASE_HOME/conf:$HADOOP_CLASSPATH 7. $ yarn jar $PLAY_AREA/Exercises.jar mapred.runningjobs.expectclassontask \ Extra Credit Solution 1. N/A -libjars $PLAY_AREA/HadoopSamples.jar \ /training/data/hamlet.txt /training/playarea/expectclassontask 2. $ mapred job -status job_1347904219060_0010 3. $ mapred job -kill job_1347904219060_0010 Killed job job_1347904219060_0010

Hadoop training: http://courses.coreservlets.com 4. The code can be found in the Solution s project: /src/test/java/mapred.runningjobs.counttokenstests.java