MR (MapReduce Programming Language)




Siyang Dai (sd2694), Zhi Zhang (zz2219), Shuai Yuan (sy2420), Zeyang Yu (zy2156), Jinxiong Tan (jt2649)

Objective of MR

MapReduce is a software framework introduced by Google for running distributed computations over huge datasets on large numbers of computing nodes. Programs written against this framework, however, contain a great deal of repetitive, routine code for configuration and exception handling, which is an unnecessary burden and distraction for developers. Our proposed MR programming language gives programmers a concise yet powerful notation that simplifies the development process: the programmer writes only the Map and Reduce functions and specifies the configuration values, and our compiler takes care of the rest.

The basic framework of MapReduce:
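The framework's dataflow can be summarized by the standard MapReduce type signatures from Google's original formulation:

```
map:     (k1, v1)         -> list of (k2, v2)   // applied to each input record
shuffle: groups all intermediate values sharing the same key k2
reduce:  (k2, list of v2) -> list of (v2)       // applied to each group
```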

Basic Types

byte, boolean, int, long, float, double, text

Statements

- Configuration statement (starts with a #)
- Definition
- Assignment
- Emit statement
- If
- While
- For
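As an illustration of how these statement forms combine, here is a hypothetical MR snippet (syntax assumed from the word-count example below, not a definitive program): it uses configuration statements, definitions, assignment, a for loop, an if statement, and emit.

```
// Hypothetical MR sketch: count only non-empty words.
#JobName = filtered word count
#Input = $[1]
#Output = $[2]

def mapfunction( <long, text>, <text, int> ): Map
    words = value.split()
    for word in words:
        if word != "":
            emit(word, 1)

def reducefunction( <text, int>, <text, int> ): Reduce
    count = value.size()
    emit(key, count)
```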

Word Count Example

In MR:

#JobName = word count
#Input = $[1]
#Output = $[2]
#Mappers = $[3]
#Reducers = $[4]
#Combiners = $[5]

// The variables key and value are implicitly available inside the Map and
// Reduce functions, much like the "this" pointer in object-oriented languages.
def mapfunction( <long, text>, <text, int> ): Map
    words = value.split()
    for word in words:
        emit(word, 1)

def reducefunction( <text, int>, <text, int> ): Reduce, Combine
    word = key
    count = value.size()
    emit(word, count)

But in Java:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * This is an example Hadoop Map/Reduce application.
 * It reads the text input files, breaks each line into words
 * and counts them. The output is a locally sorted list of words and the
 * count of how often they occurred.
 *
 * To run: bin/hadoop jar build/hadoop-examples.jar wordcount
 *            [-m <i>maps</i>] [-r <i>reduces</i>] <i>in-dir</i> <i>out-dir</i>
 */
public class WordCount extends Configured implements Tool {

  /**
   * Counts the words in each line.
   * For each line of input, break the line into words and emit them as
   * (<b>word</b>, <b>1</b>).
   */
  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  /**
   * The main driver for the word count map/reduce program.
   * Invoke this method to submit the map/reduce job.
   * @throws IOException When there are communication problems with the
   *                     job tracker.
   */
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " + args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
                         other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.setInputPaths(conf, other_args.get(0));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}