How To Write A Mapreduce Program On An Ipad Or Ipad (For Free)

Similar documents
map/reduce connected components

Introduc)on to the MapReduce Paradigm and Apache Hadoop. Sriram Krishnan

How To Write A Mapreduce Program In Java.Io (Orchestra)

Introduction to MapReduce and Hadoop

Hadoop WordCount Explained! IT332 Distributed Systems

Istanbul Şehir University Big Data Camp 14. Hadoop Map Reduce. Aslan Bakirov Kevser Nur Çoğalmış

Word count example Abdalrahman Alsaedi

Hadoop Framework. technology basics for data scientists. Spring Jordi Torres, UPC - BSC

MapReduce. Tushar B. Kute,

Step 4: Configure a new Hadoop server This perspective will add a new snap-in to your bottom pane (along with Problems and Tasks), like so:

Data Science in the Wild

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Hadoop Installation MapReduce Examples Jake Karnes

How To Write A Map Reduce In Hadoop Hadooper (Ahemos)

Basic Hadoop Programming Skills

Hadoop: Understanding the Big Data Processing Method

Big Data Management and NoSQL Databases

Case-Based Reasoning Implementation on Hadoop and MapReduce Frameworks Done By: Soufiane Berouel Supervised By: Dr Lily Liang

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

USING HDFS ON DISCOVERY CLUSTER TWO EXAMPLES - test1 and test2

Hadoop Lab Notes. Nicola Tonellotto November 15, 2010

Research Laboratory. Java Web Crawler & Hadoop MapReduce Anri Morchiladze && Bachana Dolidze Supervisor Nodar Momtselidze

Hadoop. Dawid Weiss. Institute of Computing Science Poznań University of Technology

By Hrudaya nath K Cloud Computing

Creating.NET-based Mappers and Reducers for Hadoop with JNBridgePro

Hadoop & Pig. Dr. Karina Hauser Senior Lecturer Management & Entrepreneurship

Copy the.jar file into the plugins/ subfolder of your Eclipse installation. (e.g., C:\Program Files\Eclipse\plugins)

Sriram Krishnan, Ph.D.

MapReduce (in the cloud)

CS54100: Database Systems

研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊. Version 0.1

A. Aiken & K. Olukotun PA3

HPCHadoop: MapReduce on Cray X-series

Hadoop Configuration and First Examples

CS2510 Computer Operating Systems Hadoop Examples Guide

Introduction to Cloud Computing

Introduction To Hadoop

Single Node Hadoop Cluster Setup

Extreme Computing. Hadoop MapReduce in more detail.

Map Reduce & Hadoop Recommended Text:

6. How MapReduce Works. Jari-Pekka Voutilainen

Hadoop, Hive & Spark Tutorial

Hadoop Tutorial Group 7 - Tools For Big Data Indian Institute of Technology Bombay

Big Data Too Big To Ignore

How To Write A Map In Java (Java) On A Microsoft Powerbook 2.5 (Ahem) On An Ipa (Aeso) Or Ipa 2.4 (Aseo) On Your Computer Or Your Computer

Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research

MR-(Mapreduce Programming Language)

Building a distributed search system with Apache Hadoop and Lucene. Mirko Calvaresi

Getting to know Apache Hadoop

Programming Hadoop Map-Reduce Programming, Tuning & Debugging. Arun C Murthy Yahoo! CCDI acm@yahoo-inc.com ApacheCon US 2008

Using BAC Hadoop Cluster

Extreme computing lab exercises Session one

Hadoop MapReduce Tutorial - Reduce Comp variability in Data Stamps

Hadoop (pseudo-distributed) installation and configuration

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Hadoop Training Hands On Exercise

CSE-E5430 Scalable Cloud Computing. Lecture 4

Big Data 2012 Hadoop Tutorial

Download and install Download virtual machine Import virtual machine in Virtualbox

HADOOP MOCK TEST HADOOP MOCK TEST II

TP1: Getting Started with Hadoop

Discover Hadoop. MapReduce Flexible framework for data analysis. Bonus. Network & Security. Special. Step by Step: Configuring.

MarkLogic Server. MarkLogic Connector for Hadoop Developer s Guide. MarkLogic 8 February, 2015

Hadoop Tutorial. General Instructions

Apache Hadoop new way for the company to store and analyze big data

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Hadoop and Eclipse. Eclipse Hawaii User s Group May 26th, Seth Ladd

Big Data and Apache Hadoop s MapReduce

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

2.1 Hadoop a. Hadoop Installation & Configuration

PassTest. Bessere Qualität, bessere Dienstleistungen!

IDS 561 Big data analytics Assignment 1

Running Hadoop on Windows CCNP Server

Advanced Data Management Technologies

RHadoop and MapR. Accessing Enterprise- Grade Hadoop from R. Version 2.0 (14.March.2014)

NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September National Institute of Standards and Technology (NIST)

Hadoop Basics with InfoSphere BigInsights

MapReduce Tutorial. Table of contents

Report Vertiefung, Spring 2013 Constant Interval Extraction using Hadoop

Data Science Analytics & Research Centre

CS 378 Big Data Programming. Lecture 2 Map- Reduce

Hadoop/MapReduce. Object-oriented framework presentation CSCI 5448 Casey McTaggart

t] open source Hadoop Beginner's Guide ij$ data avalanche Garry Turkington Learn how to crunch big data to extract meaning from

Hadoop Hands-On Exercises

cloud-kepler Documentation

University of Maryland. Tuesday, February 2, 2010

Introduction to Hadoop

CS 378 Big Data Programming

Introduc)on to Map- Reduce. Vincent Leroy

Assignment 1: MapReduce with Hadoop

Hadoop Streaming. Table of contents

School of Parallel Programming & Parallel Architecture for HPC ICTP October, Hadoop for HPC. Instructor: Ekpe Okorafor

Transcription:

Course NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce Martin Svoboda Faculty of Mathematics and Physics, Charles University in Prague

MapReduce: Overview MapReduce Programming model for processing of large data sets with a parallel and distributed algorithm on a cluster Map function Processes input data in order to emit a set of intermediate key/value pairs (k 1, v 1 ) list(k 2, v 2 ) Reduce function Merges (all) intermediate values associated with the same intermediate key (k 2, list(v 2 )) (k 2, possibly smaller list(v 2 )) NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 2

MapReduce: Example Word frequency map and reduce functions map(string key, String value): // key: document name // value: document contents for each word w in value: Emit(w, "1"); reduce(string key, Iterator values): // key: a word // values: a list of counts int result = 0; for each v in values: result += ParseInt(v); Emit(key, AsString(result)); NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 3

MapReduce: Example Word frequency data flow NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 4

Hadoop: Overview Apache Hadoop Open-source software framework allowing to execute applications on large clusters of commodity hardware Modules Hadoop Common Hadoop Distributed File System (HDFS) Distributed, scalable, and portable file-system Hadoop YARN Job scheduling and cluster resource management Hadoop MapReduce Implementation of the MapReduce programming model NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 5

Apache Hadoop Tutorial HDFS MapReduce

Hadoop Client SSH and SFTP access acheron.ms.mff.cuni.cz:20104 Login: SIS login, lower case (e.g. mylogin) Password: SIS login, upper/lower case (e.g. MyLoGiN) Tools PuTTY http://www.chiark.greenend.org.uk/~sgtatham/putty/ WinSCP (Windows Secure Copy) http://winscp.net/ NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 7

Hadoop Client Open PuTTY connection Change your password passwd Open WinSCP connection Explore directories /home/mylogin/ Personal directory with private content /home/ndbi040/ Shared directory with all the tutorial files NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 8

First Steps Get familiar with basic Hadoop commands hadoop Help for all Hadoop commands hadoop fs Hadoop file system commands hadoop jar MapReduce jobs launching NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 9

Word Count Example Create your local working directory mkdir WordCount Make a copy of the sample java source file cd WordCount cp /home/ndbi040/wordcount.java. Compile it mkdir classes javac classpath /usr/lib/hadoop/hadoop-common- 2.4.1.jar:/usr/lib/hadoop-mapreduce/hadoopmapreduce-client-core-2.4.1.jar -d classes/ WordCount.java jar -cvf WordCount.jar -C classes/. NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 10

Word Count Example Create your HDFS working directory hadoop fs -mkdir /tmp/mylogin Prepare sample input data hadoop fs -mkdir /tmp/mylogin/input1 hadoop fs -copyfromlocal /home/ndbi040/file01 /tmp/mylogin/input1/ hadoop fs -copyfromlocal /home/ndbi040/file02 /tmp/mylogin/input1/ Explore the file system (using a web browser) http://acheron.ms.mff.cuni.cz:20106 (HDFS name node) NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 11

Word Count Example Run the prepared sample job hadoop jar WordCount.jar org.myorg.wordcount /tmp/mylogin/input1 /tmp/mylogin/output1 Follow the execution progress (using a web browser) http://acheron.ms.mff.cuni.cz:20105 (MapReduce JobTracker) NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 12

Word Count Example Look at the job result hadoop fs -copytolocal /tmp/mylogin/output1/part-00000./result.txt Clean the output directory At least in case you want to run the job once again hadoop fs -rmr /tmp/mylogin/output1/ NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 13

Bigger Example Shakespeare hadoop fs -mkdir /tmp/mylogin/input2 hadoop fs -copyfromlocal /home/ndbi040/shake.txt /tmp/mylogin/input2/shake.txt hadoop jar WordCount.jar org.myorg.wordcount /tmp/mylogin/input2 /tmp/mylogin/output2 NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 14

Useful Commands List identifiers of all jobs mapred job list all Kill a particular job mapred job -kill <job-id> Print status counters of a job mapred job -status <job-id> NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 15

NetBeans Project Launch NetBeans IDE Create a new project Select Java application as project type Add Hadoop libraries Make local copies of libraries from /home/ndbi040/ hadoop-common-2.4.1.jar hadoop-mapreduce-client-core-2.4.1.jar Follow Add JAR/Folder and add both these libraries Build project to create jar distribution NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 16

MapReduce Source Files Map and reduce functions class Map extends MapReduceBase implements Mapper { } class Reduce extends MapReduceBase implements Reducer { } Key/values pairs output.collect(key, value); WritableComparable for keys Writable for values Available classes: Text, IntWritable, NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 17

MapReduce Source Files Job configuration Job name conf.setjobname("wordcount"); Map, reduce and combine functions conf.setmapperclass(map.class); conf.setcombinerclass(reduce.class); conf.setreducerclass(reduce.class); Types of keys and values conf.setoutputkeyclass(text.class); conf.setoutputvalueclass(intwritable.class); Input and output format conf.setinputformat(textinputformat.class); conf.setoutputformat(textoutputformat.class); Number of reducers ;() conf.setnumreducetasks NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 18

MapReduce Source Files Input and output files FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); Job execution JobClient.runJob(conf); Blocks and waits until the job is finished JobClient.submitJob(conf); Invokes the job execution but returns immediately Poll for status to make running decisions Or provide a URI to be invoked when the job finishes conf.setjobendnotificationuri() NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 19

Local Cluster Network RDP (Remote Desktop Protocol) access vdr.ms.mff.cuni.cz Login and password as for the client Tools Remote Desktop Connection Cluster Client: 10.20.86.182:22 HDFS name node: http://10.20.86.180:50070/ MapReduce JobTracker: http://10.20.86.177:8088/ NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 20

References MapReduce tutorial http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/ hadoop-mapreduce-client-core/mapreducetutorial.html Commands manual https://hadoop.apache.org/docs/current/hadoop-project-dist/ hadoop-common/commandsmanual.html File system shell commands https://hadoop.apache.org/docs/current/hadoop-project-dist/ hadoop-common/filesystemshell.html JavaDoc documentation https://hadoop.apache.org/docs/current/api/ NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 21

Assignment 1 MapReduce

Assignment 1 Select and implement any computation problem that can be solved using MapReduce Store all the related information into your working subdirectory in /home/ndbi040/ Sample input + output data Java source code + compiled jar Readme file: topic and how-to-run description Send an e-mail as a confirmation svoboda@ksi.mff.cuni.cz NDBI040: Big Data Management and NoSQL Databases Practice 01: MapReduce 20. 10. 2015 23