Contents

Download and install Java JDK
Download the Hadoop tar ball
Update $HOME/.bashrc
Configuration of Hadoop in Pseudo Distributed Mode
Format the newly created cluster to create a DFS
Start the hadoop daemons
Running a Map Reduce Job

Hadoop Installation Tutorial (Hadoop 1.x)

Download and install Java JDK

Download the Hadoop tar ball

Hadoop 1 tar ball download location:
https://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

1. Download Hadoop 1.x from the Apache Hadoop web site. For this demo I will be using hadoop-1.2.1.tar.gz.

2. Unpack the downloaded hadoop tar file. You will see a folder with the name hadoop-1.2.1:

   $ tar -zxf hadoop-1.2.1.tar.gz
3. Create a soft link pointing to the directory created in the step above:

   $ ln -s hadoop-1.2.1 hadoop
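Steps 2 and 3 above can be sketched as one small helper that unpacks any versioned tar ball and keeps a stable symlink to it. `install_tarball` is a hypothetical name, not part of the Hadoop distribution:

```shell
#!/bin/sh
# install_tarball FILE LINK: unpack FILE into the current directory and
# point LINK at the top-level directory it contains (steps 2 and 3 above).
# install_tarball is a hypothetical helper, not a Hadoop command.
install_tarball() {
  top=$(tar -tzf "$1" | head -n 1 | cut -d/ -f1)  # top-level dir inside the tar
  tar -zxf "$1"
  ln -sfn "$top" "$2"                             # -n replaces an existing link
}
# usage: install_tarball hadoop-1.2.1.tar.gz hadoop
```

Using a symlink like this means a later upgrade only needs the link repointed, while $HADOOP_HOME stays the same.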
Update $HOME/.bashrc

Add the following entries to the .bashrc file of the user that will run Hadoop:

hadoop@ubuntu:~$ vi ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Save the file and re-login, or issue the following command to reload the environment variables:

$ source ~/.bashrc

Your Hadoop install is done. Execute the following command to verify it:

$ hadoop

Configuration of Hadoop in Pseudo Distributed Mode

We will now simulate a small cluster that runs all the Hadoop daemons on a single machine.

Step 1

Edit the following configuration files:
i. /home/hadoop/hadoop/conf/hadoop-env.sh
ii. /home/hadoop/hadoop/conf/core-site.xml
iii. /home/hadoop/hadoop/conf/hdfs-site.xml
iv. /home/hadoop/hadoop/conf/mapred-site.xml

hadoop-env.sh

The only environment variable that must be configured for Hadoop in this tutorial is JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

core-site.xml

Add the following property, which holds the location of the Namenode:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

hdfs-site.xml

Add the following properties (these are the Hadoop 1.x property names):
- dfs.replication specifies the number of times each HDFS block should be replicated
- dfs.name.dir specifies the location of the namenode meta-data
- dfs.data.dir specifies the location of the datanode blocks

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop_store/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop_store/dfs/datanode</value>
  </property>
</configuration>

mapred-site.xml

Add the following property, which holds the location of the Jobtracker:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>

Step 2

Once the property files are updated, create the directory paths mentioned in the namenode and datanode properties in the hdfs-site.xml file:

$ mkdir -p /home/hadoop/hadoop_store/dfs/namenode
$ mkdir -p /home/hadoop/hadoop_store/dfs/datanode

Make sure the permissions for the datanode directory are set to 755:

/home/hadoop/hadoop_store/dfs$ chmod 755 datanode

Format the newly created cluster to create a DFS

Format the Namenode:

$ hadoop namenode -format

A message will indicate that the storage directory has been successfully formatted.
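The directory-creation and permission steps of Step 2 above can be sketched as a small reusable function. `make_dfs_dirs` is a hypothetical helper, not a Hadoop command:

```shell
#!/bin/sh
# make_dfs_dirs BASE: create BASE/namenode and BASE/datanode and set the
# datanode directory to 755, as Step 2 above requires.
# make_dfs_dirs is a hypothetical helper, not part of the Hadoop distribution.
make_dfs_dirs() {
  mkdir -p "$1/namenode" "$1/datanode"
  chmod 755 "$1/datanode"
  ls -ld "$1/datanode"   # print the permissions so they can be eyeballed
}
# usage: make_dfs_dirs /home/hadoop/hadoop_store/dfs
```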
Start the hadoop daemons

Go to the hadoop home install location and make use of the hadoop-daemon.sh script to start the daemons. (A TaskTracker is also started here, since a Map Reduce job cannot run without one.)

$ hadoop-daemon.sh start namenode
$ hadoop-daemon.sh start datanode
$ hadoop-daemon.sh start jobtracker
$ hadoop-daemon.sh start tasktracker

Check the process status:

$ jps

You will see the NameNode, DataNode, JobTracker and TaskTracker processes running.

Access the WebUI of the NameNode on port 50070:

http://localhost:50070
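Besides the WebUI, the jps listing can be checked mechanically. The sketch below reads jps-style output and flags any missing daemon; `check_daemons` is a hypothetical helper, and TaskTracker is included because a Map Reduce job needs one:

```shell
#!/bin/sh
# check_daemons: read `jps`-style output on stdin and report whether each
# daemon a pseudo-distributed Hadoop 1.x setup needs is present.
# check_daemons is a hypothetical helper, not a Hadoop command.
check_daemons() {
  listing=$(cat)
  for d in NameNode DataNode JobTracker TaskTracker; do
    case "$listing" in
      *"$d"*) echo "$d: running" ;;
      *)      echo "$d: NOT running" ;;
    esac
  done
}
# usage: jps | check_daemons
```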
Running a Map Reduce Job

Step 1

Create a directory /input on HDFS:

$ hadoop fs -mkdir /input

Verify that the directory is created:

$ hadoop fs -ls /

Step 2

Upload a text file which will be used for the Map Reduce program:
$ hadoop fs -put test.txt /input

Verify the file is uploaded:

$ hadoop fs -ls /input/

Step 3

Now you are ready to run a Map Reduce job. We will use the hadoop examples jar provided in this tar ball:

$ hadoop jar /home/hadoop/hadoop/hadoop-examples-1.2.1.jar wordcount /input/test.txt /output

The above command runs a map-reduce program on the file test.txt and sends the output to the directory /output on HDFS. The objective here is to count the number of times each word occurs in the file test.txt.

Step 4

Once the job is complete you can verify the contents of /output:

$ hadoop fs -ls /output
$ hadoop fs -cat /output/part-r-00000

A snapshot of the commands executed to run a Map Reduce program.
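The counting that the wordcount example performs can be previewed locally with standard Unix tools before submitting the job. This is only an approximation of the job's output (it splits on whitespace and runs entirely outside Hadoop); `local_wordcount` is a hypothetical helper:

```shell
#!/bin/sh
# local_wordcount FILE: approximate the wordcount example's output locally.
# Splits FILE on whitespace and prints "word<TAB>count", sorted by word,
# which mirrors the layout of part-r-00000. Not a Hadoop command.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" |     # one word per line
    sort | uniq -c |                  # count identical words
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
# usage: local_wordcount test.txt
```

Comparing this local result against `hadoop fs -cat /output/part-r-00000` is a quick sanity check that the job processed the whole file.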