Running Hadoop at Stirling
Kevin Swingler

Summary
- The Hadoop server in CS @ Stirling
- A quick introduction to Unix commands
- Getting files in and out
- Compiling your Java
- Submitting a Hadoop job
- Monitoring your jobs
- Seeing the results
- Debugging
Hadoop0
- We will be interacting with Hadoop via the command line.
- The first thing to do is connect to the Hadoop server via SSH.
- In Windows, we do this with PuTTY (there are other programs that do the same).
[Screenshot: the PuTTY configuration window]
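If you are connecting from a Mac or Linux terminal rather than PuTTY, a plain ssh command does the same job. A minimal sketch; the exact hostname is an assumption, so use whatever server address you were given:

    ssh username@hadoop0    # replace username with your own login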
Where am I?
- The command window is a Unix shell on the machine called hadoop0.
- Typing pwd to find out the current directory gives /home/username.
- This is your file space.

Some Unix Commands
    ls              List the contents of the current directory
    cat filename    Show the contents of a file
    more filename   Show the contents page by page
    tail filename   Show the end of a file
    cd /dir/dir     Change directory
    cp file tofile  Copy a file
    mkdir dir       Make a directory
    grep term file  Search for term in a file
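A short example session showing these commands in use (the file name notes.txt is made up for illustration):

    pwd                      # prints /home/username
    ls                       # list the files in the current directory
    mkdir data               # make a new directory called data
    cp notes.txt data        # copy a file into it
    cd data                  # move into the new directory
    cat notes.txt            # print the whole file
    more notes.txt           # page through it
    tail notes.txt           # show just the last few lines
    grep hadoop notes.txt    # show only the lines containing "hadoop"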
History
- Use the up and down arrow keys to select previously typed commands.
- Type history to see your recent commands.
- Re-run a recent command with !num, where num is the number of the command in the history list.

Directing Output
- You can direct the output of a command into a file with >
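For example (the history number 42 is made up; use whatever number history actually shows you):

    history             # lists your recent commands, each with a number
    !42                 # re-runs command number 42 from that list
    ls > files.txt      # the directory listing goes into files.txt instead of the screen
    cat files.txt       # confirm the output landed in the file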
Piping Output
- You can put the output of one process in as the input for the next with |
    ls | grep txt
- This lists any file whose name contains the substring txt.

Get Files Into HDFS
- The HDFS file store is different from the local file store on hadoop0, where you log in.
- Copy data files (not program code - see later) from the local store to HDFS:
    hdfs dfs -copyFromLocal datafile /home/username/targetdir
- Check it is there with:
    hdfs dfs -ls /home/username/targetdir
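Getting files back out of HDFS works the same way in reverse. A sketch using the same paths as above; -cat and -copyToLocal are standard hdfs dfs subcommands, and copy.txt is a made-up name for the local copy:

    hdfs dfs -cat /home/username/targetdir/datafile                    # print an HDFS file to the screen
    hdfs dfs -copyToLocal /home/username/targetdir/datafile copy.txt   # fetch it back to the local store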
Compile Your Java
- With your Java files in the local store (not HDFS) on hadoop0, compile them with:
    hadoop com.sun.tools.javac.Main myprog.java
- Then make a jar file with:
    jar cf myprog.jar myprog*.class
- Type ls to verify the jar file has been created.

Submit the Job
- You submit the job to Hadoop by typing:
    hadoop jar myprog.jar myprog /home/un/data /home/un/res
- Here myprog is the class containing your main method, the files you wish to process are in /home/un/data, and you would like the results to go to /home/un/res (un is your username).
- The /res folder must NOT already exist. Hadoop will create it for you, but if it is already there, the job will fail.
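The slides never show the class that main lives in. For completeness, here is a minimal driver sketch in the style of the standard Hadoop 2.x WordCount example; the class name myprog matches the jar above, and TMapper/TReducer are the mapper from the debugging example later plus a hypothetical matching reducer:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class myprog {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "myprog");
            job.setJarByClass(myprog.class);
            job.setMapperClass(TMapper.class);    // the mapper from the debugging example
            job.setReducerClass(TReducer.class);  // hypothetical matching reducer
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /home/un/data
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /home/un/res (must not exist)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }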
Follow Job Progress
- Messages will appear in the console window.

Web Server Interface
- Point a browser at vm000:8088/cluster
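You can also check progress from the shell. These commands are not in the slides, but they are standard Hadoop 2.x clients that should be available on hadoop0:

    yarn application -list    # running applications and their progress
    mapred job -list          # the same view from the MapReduce client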
Debugging
- System.out.print output does not go to the console, it goes to a log, so it is not the most convenient way to debug.
- Simple debugging is best done on a modified version of the map and reduce functions running in your IDE (Eclipse in our case).
- You just need to add two .jar files to the project path:
    hadoop-common-2.0.0-alpha.jar
    hadoop-mapreduce-client-core-2.0.2-alpha.jar

Debugging in Eclipse
- This method does NOT use Hadoop or run MapReduce.
- It allows you to use classes such as Writable and do syntax checking.
- You can test the logic of the map and reduce functions independently, but not the whole MapReduce process.
Example

    // Needed imports: java.io.IOException, java.util.StringTokenizer,
    // org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text,
    // org.apache.hadoop.mapreduce.Mapper
    public static class TMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                // context.write(word, one);
                System.out.printf("%s : %s\n", word.toString(), one.toString());
            }
        }
    }

Implement the map function and then test that it processes an example file line:

    public static void main(String[] args)
            throws IOException, InterruptedException {
        TMapper tm = new TMapper();
        Text word = new Text();
        word.set("hello World");
        tm.map(0, word, null);
    }

Note
- We commented out the context.write line and replaced it with printf.
- No file reading takes place in this example; we pass an example row in the value parameter when we call map.
- We could, of course, read from a local file.
- We send null instead of a Context when we call map.
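The reduce side can be tested the same way. TReducer below is a hypothetical reducer written in the same style, with context.write again swapped for printf and the values list built by hand instead of being grouped by Hadoop:

    // Needed imports: java.util.Arrays, org.apache.hadoop.mapreduce.Reducer,
    // plus IOException, IntWritable and Text as before
    public static class TReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();   // add up the counts for this key
            }
            // context.write(key, new IntWritable(sum));
            System.out.printf("%s : %d\n", key.toString(), sum);
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException {
        TReducer tr = new TReducer();
        // Hand-build the key and values that Hadoop would normally group for us
        Text key = new Text("hello");
        tr.reduce(key, Arrays.asList(new IntWritable(1), new IntWritable(1)), null);
    }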