Running Hadoop on Windows CCNP Server




Running Hadoop at Stirling
Kevin Swingler

Summary
- The Hadoop server in CS @ Stirling
- A quick introduction to Unix commands
- Getting files in and out
- Compiling your Java
- Submitting a Hadoop job
- Monitoring your jobs
- Seeing the results
- Debugging

hadoop0

We will be interacting with Hadoop via the command line. The first thing to do is connect to the Hadoop server via SSH. In Windows, we do this with PuTTY (there are other programs that do the same).

Where am I?

The command window is a Unix shell on the machine called hadoop0. Typing pwd to find out the current directory gives /home/username. This is your file space.

Some Unix Commands

    ls                 List contents of directory
    cat filename       Show contents of file
    more filename      Show contents page by page
    tail filename      Show end of file
    cd /dir/dir        Change directory
    cp file tofile     Copy a file
    mkdir dirname      Make a directory
    grep term file     Search for term in file

History

Use the up and down arrow keys to select previously typed commands. Type history to see your recent commands. Re-run a recent command with !num, where num is the number of the command in the history list.

Directing output

You can direct the output of a command into a file with >, for example ls > files.txt. Note that > overwrites an existing file; >> appends to it instead.
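The same redirection can be requested from a Java program. The sketch below assumes a Unix system with ls on the PATH and uses the standard ProcessBuilder.redirectOutput(File) method; the class RedirectDemo and its method lsInto are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class for illustration.
class RedirectDemo {
    // Equivalent of the shell command `ls <dir> > <outFile>`:
    // the child's standard output goes to a file instead of the console.
    static int lsInto(String dir, File outFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("ls", dir);
        pb.redirectOutput(outFile);                        // like `>`: existing contents are discarded
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // error messages still appear in the console
        return pb.start().waitFor();                       // exit status of ls (0 on success)
    }

    public static void main(String[] args) throws Exception {
        int status = lsInto(".", new File("files.txt"));
        System.out.println("ls exited with " + status + "; listing saved in files.txt");
    }
}
```

ProcessBuilder.Redirect.appendTo(file) is the counterpart of the shell's >> operator.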

Piping Output

You can use the output of one process as the input for the next with |. For example, ls | grep txt lists any file whose name contains the substring txt.

Get Files Into HDFS

The HDFS file store is different from the local file store on hadoop0, where you log in. Copy data files (not program code; see later) from the local store to HDFS:

    hdfs dfs -copyFromLocal datafile /home/username/targetdir

Check it is there with:

    hdfs dfs -ls /home/username/targetdir
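A pipe can also be set up from Java: ProcessBuilder.startPipeline (Java 9 or later) starts several processes and connects each one's standard output to the next one's standard input, which is what | does in the shell. This is a sketch assuming a Unix system with ls and grep available; PipeDemo and lsGrep are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration.
class PipeDemo {
    // Equivalent of the shell command `ls <dir> | grep <term>`; returns grep's output lines.
    static List<String> lsGrep(String dir, String term) throws IOException, InterruptedException {
        List<ProcessBuilder> stages = List.of(
                new ProcessBuilder("ls", dir),     // first stage: list the directory
                new ProcessBuilder("grep", term)); // second stage: reads ls's output as its input
        List<Process> procs = ProcessBuilder.startPipeline(stages);
        Process last = procs.get(procs.size() - 1);

        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(last.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        for (Process p : procs) {
            p.waitFor(); // grep exits with status 1 when nothing matches; statuses are ignored here
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String name : lsGrep(args.length > 0 ? args[0] : ".", "txt")) {
            System.out.println(name);
        }
    }
}
```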

Compile Your Java

With your Java files in the local store (not HDFS) on hadoop0, compile them with:

    hadoop com.sun.tools.javac.Main myprog.java

Then make a jar file with:

    jar cf myprog.jar myprog*.class

Type ls to verify that the jar file has been created.

Submit the Job

You submit the job to Hadoop by typing:

    hadoop jar myprog.jar myprog /home/un/data /home/un/res

where (in this example) the files you wish to process are in /home/un/data and you would like the results to go to /home/un/res (un is your username). The argument after the jar file, myprog, names the class that contains main; it is needed because jar cf does not record a main class in the jar.

The res folder must NOT already exist. Hadoop will create it for you, but if it is already there, the job will fail.
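The compile and jar steps can be approximated in plain Java with the JDK's javax.tools compiler API and java.util.jar. This is a sketch under assumptions: it must run on a JDK (not a JRE), the classes are in the default package, and it passes no classpath options, so it suits sources that need nothing beyond the JDK; a real mapper would also need the Hadoop jars supplied (for example with a -cp argument). CompileAndJar, compile and jarClasses are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper class for illustration.
class CompileAndJar {
    // Like `javac myprog.java`: without -d, class files are written next to the source file.
    static boolean compile(Path source) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE without a compiler
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run this on a JDK");
        }
        return javac.run(null, null, null, source.toString()) == 0; // 0 means success
    }

    // Like `jar cf myprog.jar myprog*.class`: packs every matching class file, including
    // nested classes such as myprog$TMapper.class, into the jar. Returns the packed names.
    static List<String> jarClasses(Path dir, String prefix, Path jarFile) throws IOException {
        List<Path> classes;
        try (Stream<Path> s = Files.list(dir)) {
            classes = s.filter(p -> {
                        String n = p.getFileName().toString();
                        return n.startsWith(prefix) && n.endsWith(".class");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
        Manifest manifest = new Manifest();
        // Only the manifest version is set; like `jar cf`, no Main-Class is recorded.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(jarFile), manifest)) {
            for (Path c : classes) {
                jar.putNextEntry(new JarEntry(c.getFileName().toString()));
                Files.copy(c, jar);
                jar.closeEntry();
            }
        }
        return classes.stream().map(p -> p.getFileName().toString()).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path src = Path.of("myprog.java").toAbsolutePath();
        if (compile(src)) {
            System.out.println(jarClasses(src.getParent(), "myprog", src.getParent().resolve("myprog.jar")));
        }
    }
}
```

Because the jar has no Main-Class entry, the class name still has to be given on the hadoop jar command line.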

Follow Job Progress

Messages will appear in the console window.

Web Server Interface

Point a browser at vm000:8088/cluster (the YARN ResourceManager web interface).
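The ResourceManager behind the port-8088 web interface also serves a REST API; its Cluster Applications endpoint (/ws/v1/cluster/apps, with an optional states filter) returns the job list as JSON. This sketch assumes Java 11 or later (java.net.http.HttpClient), that vm000 is reachable from where it runs, and that the endpoint is not protected by authentication; ClusterApps, appsUri and fetch are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class for illustration.
class ClusterApps {
    // Builds the ResourceManager REST URL listing applications, optionally filtered by state.
    static URI appsUri(String host, int port, String state) {
        String base = "http://" + host + ":" + port + "/ws/v1/cluster/apps";
        return URI.create(state == null ? base : base + "?states=" + state);
    }

    // Fetches the JSON application list; needs network access to the ResourceManager.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        return HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(appsUri("vm000", 8088, "RUNNING")));
    }
}
```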

Debugging

System.out.print output does not go to the console; it goes to a log, so it is not the most convenient way to debug. Simple debugging is best done on a modified version of the map and reduce functions running in your IDE (Eclipse in our case). You just need to add two .jar files to the project's build path:

    hadoop-common-2.0.0-alpha.jar
    hadoop-mapreduce-client-core-2.0.2-alpha.jar

Debugging in Eclipse
- This method does NOT use Hadoop or run MapReduce.
- It lets you use Hadoop classes such as Text and IntWritable (Writable types) and get syntax checking.
- You can test the logic of the map and reduce functions independently, but not the whole MapReduce process.
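To see how separately tested map and reduce logic fit together, the word-count data flow can be imitated in a single Java process: map each line to (word, 1) pairs, group the values by key (standing in for the shuffle and sort), then reduce each group. This is a teaching sketch only, not Hadoop: there are no input splits, no HDFS, no Context and no parallelism. LocalWordCount is a hypothetical name, and Java 9 or later is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, in-memory imitation of the MapReduce word-count data flow (not Hadoop).
class LocalWordCount {
    // "map": emit (word, 1) for every whitespace-separated token, as TMapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(Map.entry(itr.nextToken(), 1));
        }
        return out;
    }

    // "reduce": add up all the counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Maps every line, groups values by key (TreeMap keeps keys sorted), then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        run(List.of("hello World", "hello Hadoop"))
                .forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

For the two example lines it prints Hadoop 1, World 1 and hello 2, in that order, because String ordering places upper-case letters before lower-case ones.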

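Example

Implement the map function, then test that it correctly processes one example line of an input file.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical enclosing class; the slide shows only its two members.
    public class WordCountDebug {

        public static class TMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Split the line on whitespace and handle each token as (word, 1)
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    // context.write(word, one);   // what the real job would emit
                    System.out.printf("%s : %s%n", word.toString(), one.toString()); // debug print instead
                }
            }
        }

        public static void main(String[] args) throws IOException, InterruptedException {
            TMapper tm = new TMapper();
            Text line = new Text();
            line.set("hello World");   // one example input line
            tm.map(0, line, null);     // a null Context is safe only because context.write is commented out
        }
    }

Note
- We commented out the context.write line and replaced it with printf.
- No file reading takes place in this example; we pass an example line in the value parameter when we call map. We could, of course, read from a local file.
- We pass null instead of a Context when we call map.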
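Following the note about reading from a local file, the sketch below feeds each line of a local text file through the same tokenizing logic. To keep it runnable without the Hadoop jars, that logic is copied into a plain method (LocalMapCheck.mapLine, a hypothetical name) that returns "word : 1" strings instead of calling context.write; in Eclipse you could instead call tm.map(lineNumber, new Text(line), null) inside the loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration; no Hadoop classes are used.
class LocalMapCheck {
    // Same tokenizing as TMapper.map, but returning "word : 1" strings instead of writing to a Context.
    static List<String> mapLine(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken() + " : 1");
        }
        return out;
    }

    // Reads a local (not HDFS) text file and passes each line through mapLine.
    static List<String> mapFile(Path file) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                out.addAll(mapLine(line));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String pair : mapFile(Paths.get(args[0]))) {
            System.out.println(pair);
        }
    }
}
```

Blank lines produce no output, just as an empty value would produce no context.write calls in the real mapper.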