Word Count Code using MR2 Classes and API




A guide to understand the execution and flow of Word Count (edureka!)

WRITE YOUR FIRST MRv2 PROGRAM AND EXECUTE IT ON YARN IN HADOOP 2.0

As discussed in the previous blog, Hadoop 2.0, MRv2 and YARN are here to make Hadoop enterprise ready and take it to the next level. If you haven't already, create a single-node Hadoop 2.0 cluster to start your Hadoop 2.0 journey. This blog helps you write and execute your first MRv2 program.

Jars Used as Reference Libraries

hadoop-mapreduce-client-core-2.2.0.jar
hadoop-common-2.2.0.jar

These are the jars present in the Hadoop-2.2.0 distribution. For your convenience, the jars are zipped along with this document.

Packages Imported

    // Core Java libraries; these remain the same for MR1 and MR2
    import java.io.IOException;
    import java.util.*;

    // These packages are present in hadoop-common-2.2.0.jar
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;

    // These packages are present in hadoop-mapreduce-client-core-2.2.0.jar
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.*;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

© 2014 Brain4ce Education Solutions Pvt. Ltd.

Main Class

WordCountNew is the Main Class (or Driver Class), within which we have our Mapper class, our Reducer class and the main method:

    public class WordCountNew {

Mapper Class

Map is the Mapper class, which inherits the superclass Mapper. The Mapper class takes 4 type arguments, i.e. Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>:

        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

            // Defining a local variable one of type IntWritable, holding the constant 1
            private final static IntWritable one = new IntWritable(1);
            // Defining a local variable word of type Text
            private Text word = new Text();

We override the map method, which is defined in the parent (Mapper) class. It takes 3 arguments as inputs: map(KEYIN key, VALUEIN value, Context context).

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {

                // Converting the record (a single line) to String and storing it in the String variable line
                String line = value.toString();
                // StringTokenizer breaks the record (line) into words
                StringTokenizer tokenizer = new StringTokenizer(line);
                // Loop to get each token (word) one by one from the StringTokenizer
                while (tokenizer.hasMoreTokens()) {
                    // Saving the token (word) in the variable word
                    word.set(tokenizer.nextToken());
                    // Writing the output as (word, one): the key is the token and the value is 1
                    context.write(word, one);
                }
            }
        }

In the map method, we receive a record (a single line) and store it in the String variable line. Using StringTokenizer, we break the line into individual words, called tokens, with whitespace as the delimiter. If the line was "Hello There", StringTokenizer yields two tokens, "Hello" and "There". Finally, using the context object, we emit the Mapper output. So, as per our example, the output from the Mapper will be (Hello, 1) and (There, 1), and so on. The output from the Mapper is taken as input by the Reducer.

Reducer Class

Reduce is the Reducer class, which inherits the superclass Reducer. The Reducer class takes 4 type arguments, i.e. Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>:

        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
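The tokenizing loop of the map method can be tried outside Hadoop with plain Java. The sketch below is illustrative only: the class name MapSketch and the helper mapLine are hypothetical names, not part of the original program, and a list of tab-separated strings stands in for context.write(word, one).

```java
import java.util.*;

public class MapSketch {
    // Mirrors the mapper's loop: tokenize one line on whitespace and
    // emit a (word, 1) pair per token, here as a tab-separated string.
    static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Stands in for context.write(word, one)
            emitted.add(tokenizer.nextToken() + "\t1");
        }
        return emitted;
    }

    public static void main(String[] args) {
        // "Hello There" yields two tokens, so two (word, 1) pairs are emitted
        System.out.println(mapLine("Hello There"));
    }
}
```

The same idea scales to any number of lines: Hadoop simply calls map once per input record.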

We override the reduce method, which is defined in the parent (Reducer) class. It takes 3 arguments as inputs: reduce(KEYIN key, Iterable<VALUEIN> values, Context context).

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Defining a local variable sum of type int
                int sum = 0;
                // Loop to iterate over the values present in the Iterable
                for (IntWritable val : values) {
                    // Adding the value to the variable sum on every iteration
                    sum = sum + val.get();
                }
                // Finally writing the key and the value of sum (the number of times
                // the word occurred in the input file) to the output
                context.write(key, new IntWritable(sum));
            }
        }

In the reduce method, we receive a key (a word) and a list of values as input, for example: Hello <1,1,1,1>. To find the number of occurrences of the word "Hello" in the input file, we simply sum all the values in the list. Hence we iterate over the values one by one, adding each to the variable sum. Finally, we write the output, i.e. the key (word) and the value (sum), using the context object. As per the example above, the output will be: Hello 4.
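The summing logic of the reduce method can likewise be tried with plain Java collections. This is a minimal sketch, assuming plain int values in place of IntWritable; the class name ReduceSketch and the helper reduceKey are hypothetical names, not part of the original program.

```java
import java.util.*;

public class ReduceSketch {
    // Mirrors the reducer: sum the list of counts that arrived for one key.
    static int reduceKey(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum = sum + val;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The key "Hello" arrives with the value list <1,1,1,1>, so the count is 4
        System.out.println("Hello\t" + reduceKey(Arrays.asList(1, 1, 1, 1)));
    }
}
```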

The main method is the entry point of the application; it is the method called as soon as the jar is executed.

        public static void main(String[] args) throws Exception {
            // Creating an object of the Configuration class, which loads the configuration parameters
            Configuration conf = new Configuration();
            // Creating the Job object, passing the conf object and the job name as arguments.
            // The Job class allows the user to configure the job, submit it and control its execution.
            Job job = new Job(conf, "wordcount");
            // Setting the jar by finding where the given class came from
            job.setJarByClass(WordCountNew.class);
            // Setting the key class for the job output data
            job.setOutputKeyClass(Text.class);
            // Setting the value class for the job output data
            job.setOutputValueClass(IntWritable.class);
            // Setting the mapper for the job
            job.setMapperClass(Map.class);
            // Setting the reducer for the job
            job.setReducerClass(Reduce.class);
            // Setting the input format for the job
            job.setInputFormatClass(TextInputFormat.class);
            // Setting the output format for the job
            job.setOutputFormatClass(TextOutputFormat.class);
            // Adding a path that will act as input for the MR job; args[0] means it will
            // use the first argument written on the terminal as the input path
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // Setting the path to a directory where the MR job will dump its output; args[1] means
            // it will use the second argument written on the terminal as the output path
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Submitting the job to the cluster and waiting for its completion
            job.waitForCompletion(true);
        }
    }

*********************** NOW EXPORT THE JAR AND RUN THE CODE ***********************
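The walkthrough ends at exporting the jar. As a sketch of the run step, assuming the exported jar is named wordcount.jar and the main class is WordCountNew (all file names and HDFS paths below are placeholders to adapt to your setup), the job could be launched from the terminal like this:

```shell
# Copy a local input file into HDFS (paths are placeholders)
hdfs dfs -mkdir -p /user/hadoop/wordcount/input
hdfs dfs -put input.txt /user/hadoop/wordcount/input
# Run the job: args[0] is the input path, args[1] is the output path,
# which must not exist yet or the job will fail
hadoop jar wordcount.jar WordCountNew /user/hadoop/wordcount/input /user/hadoop/wordcount/output
# Inspect the result written by the reducer
hdfs dfs -cat /user/hadoop/wordcount/output/part-r-00000
```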