Hadoop 2.2.0 MultiNode Cluster Setup
Sunil Raiyani, Jayam Modi
June 7, 2014
Outline
1 Pre-Requisites
2 Network Settings
3 Configuration Files
4 Starting Daemons
5 Map Reduce Task
6 References
Pre-Requisites
Set up a Hadoop single-node cluster on the master and on each slave as described in [2].
Network Settings
To run a multi-node cluster, ensure that the master and all the slaves are on the same network. Identify the IP address of each system, then add entries to the /etc/hosts file as follows:
10.129.46.120 master
10.129.46.111 slave01
These entries must be made on every system in the cluster.
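Since the same entries have to land on every machine, the additions can be scripted so they are safe to re-run. This is a convenience sketch, not part of the Hadoop setup: HOSTS_FILE defaults to a local file here so the script can be tried without root, and should be pointed at /etc/hosts (run with sudo) on the real master and slaves.

```shell
# Sketch: idempotently add cluster host entries. Point HOSTS_FILE at
# /etc/hosts (and run as root) on the real systems; the local default
# lets you try the script safely first.
HOSTS_FILE="${HOSTS_FILE:-hosts.local}"
touch "$HOSTS_FILE"

add_host_entry() {
  # Append the entry only if an identical line is not already present.
  grep -qxF "$1" "$HOSTS_FILE" || echo "$1" >> "$HOSTS_FILE"
}

add_host_entry "10.129.46.120 master"
add_host_entry "10.129.46.111 slave01"
add_host_entry "10.129.46.120 master"   # duplicate call: no second line added
```

Because the append is guarded by a grep, running the script twice leaves /etc/hosts unchanged.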
SSH Login for Slaves
Add the public key of the master to all slaves using the command:
ssh-copy-id -i $HOME/.ssh/id_dsa.pub hduser@slave01
Now ssh to the master and to each slave to verify that passwordless SSH has been set up properly:
ssh master
ssh slave01
Configuration Files
Add the following lines between the <configuration> and </configuration> tags of the files in the $HADOOP_HOME/etc/hadoop folder on both master and slaves [1]:
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
Now add the names of all slaves to the $HADOOP_HOME/etc/hadoop/slaves file:
nano $HADOOP_HOME/etc/hadoop/slaves
slave01
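Writing the slaves file can also be done non-interactively instead of with nano. A minimal sketch: SLAVES_FILE defaults to a local file here so it can be tried outside a Hadoop install; point it at $HADOOP_HOME/etc/hadoop/slaves on the master.

```shell
# Sketch: generate the slaves file, one hostname per line. The names
# must match the entries added to /etc/hosts earlier.
SLAVES_FILE="${SLAVES_FILE:-slaves.local}"
printf '%s\n' slave01 > "$SLAVES_FILE"
# With more slaves, list them all:
# printf '%s\n' slave01 slave02 slave03 > "$SLAVES_FILE"
```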
Starting Daemons
Format the NameNode if you want to erase all data on the Hadoop file system, using the command:
hdfs namenode -format
Run the following two scripts on the master node to start the HDFS and YARN daemons:
start-dfs.sh
start-yarn.sh
To check whether the daemons have started properly, run jps on the master and on each slave:
Master:
hduser@master:/usr/local/hadoop$ jps
9412 SecondaryNameNode
9784 NameNode
19056 Jps
10173 ResourceManager
Slave:
hduser@slave01:/usr/local/hadoop$ jps
18762 DataNode
18865 NodeManager
20223 Jps
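Eyeballing the jps listing does not scale past a few nodes. A small helper (hypothetical, not part of Hadoop) can flag missing daemons: pass it the jps output and the daemon names expected on that node (NameNode, SecondaryNameNode, ResourceManager on the master; DataNode, NodeManager on a slave).

```shell
# Sketch: report any expected daemon missing from a jps listing.
check_daemons() {
  local jps_out="$1"; shift
  local missing=0
  for daemon in "$@"; do
    # -w avoids matching "NameNode" inside "SecondaryNameNode".
    echo "$jps_out" | grep -qw "$daemon" || { echo "MISSING: $daemon"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all expected daemons running"
}

# On a live master (guarded so the sketch is a no-op without jps):
if command -v jps >/dev/null 2>&1; then
  check_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
fi
```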
Map Reduce Task
Run the following commands on the master system to run a sample wordcount program on the cluster:
sudo mkdir /in
sudo nano /in/file
Type in some text and save the file.
hdfs dfs -copyFromLocal /in/file /file
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /file /out
Note: the /out directory must not already exist on the HDFS system, or the job will fail with an error.
The output of the MapReduce task will be saved in the /out directory on the distributed file system. Use the following command to view the result:
hdfs dfs -text /out/part-r-00000
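What the wordcount job computes can be reproduced with plain shell tools, which also shows the shape of part-r-00000: one word and its count per line, tab-separated. A cluster-free sketch (the sample input text is made up):

```shell
# Sketch: compute word counts the way the wordcount example does,
# producing "word<TAB>count" lines like those in /out/part-r-00000.
result=$(printf 'hello hadoop\nhello world\n' |
  tr -s ' ' '\n' |          # split into one word per line
  sort | uniq -c |          # count identical adjacent words
  awk '{print $2 "\t" $1}') # reorder to word<TAB>count
echo "$result"
```

The split/sort/count pipeline mirrors the map (emit each word), shuffle (group identical words), and reduce (sum per word) phases of the real job.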
References
[1] http://solaimurugan.blogspot.in/2013/11/setup-multi-nodehadoop-20-cluster.html (accessed June 7, 2014)
[2] Hadoop Installation Manual: http://www.it.iitb.ac.in/frg/brainstorming/sites/default/files/P1_saatvik14_Week_2_HadoopHiveInstallation_1_2014_05_17.pdf