Hadoop Multi-node Cluster Installation on CentOS 6.6
Created: 01-12-2015 Author: Hyun Kim Last Updated: 01-12-2015 Version Number: 0.1
Contact info: hyunk@loganbright.com Krish@loganbright.com
Hadoop Multi-node Cluster Installation Guide with CentOS 6

In this tutorial we are using CentOS 6.6 to install a multi-node Hadoop cluster. We need at least two nodes: one will be the master node and the other will be a slave node. I'm only using two nodes in this tutorial to keep the guide as simple as possible. We will install the namenode and jobtracker on the master node, and the datanode, tasktracker, and secondarynamenode on the slave node. I'm using lbb01.example.com as the hostname for my master node and lbb02.example.com for my slave node. Simple enough? Let's get started.

Static IP Configuration

We want our servers to keep working even after an accidental restart, so we will configure a static IP on each server. Use the command below to open the Ethernet configuration. Your connection might be eth0 instead of em1.

$ nano /etc/sysconfig/network-scripts/ifcfg-em1

Change BOOTPROTO=static and add your IPADDR and NETMASK. You can check your IP and netmask with the ifconfig command. As an example:

IPADDR=192.168.23.234
NETMASK=255.255.255.0
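Putting those settings together, a static configuration for em1 might look like the sketch below. The address values are the examples from above; substitute the ones from your own ifconfig output.

```
DEVICE=em1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.23.234
NETMASK=255.255.255.0
```

ONBOOT=yes makes sure the interface comes up automatically after a reboot, which is the whole point of this step.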
Configure the Default Gateway

$ nano /etc/sysconfig/network

Now we configure the network file. This may sound complicated, but we are simply adding HOSTNAME and GATEWAY entries. If GATEWAY or HOSTNAME already exists, simply edit it. I'm using lbb01.example.com as my hostname.

Add your GATEWAY=XXX.XXX.XXX.X

Restart the network:

$ /etc/init.d/network restart

Configure DNS

$ nano /etc/resolv.conf

Add your primary and alternative nameservers. For example:

nameserver xxx.xxx.xxx.x
nameserver xxx.xxx.xxx.x

Finally, run yum update to bring everything up to date.
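As a sketch, the two files on the master node could look like this. The gateway and nameserver addresses below are placeholders; use the values for your own network.

```
# /etc/sysconfig/network  (hostname from this guide, gateway is an example)
NETWORKING=yes
HOSTNAME=lbb01.example.com
GATEWAY=192.168.23.1

# /etc/resolv.conf  (replace with your own nameservers)
nameserver 192.168.23.1
nameserver 8.8.8.8
```

On the slave node the same files apply, with HOSTNAME=lbb02.example.com instead.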
Download the JDK

We need the JDK to install Hadoop. I'm installing jdk-7u25 in this tutorial. It is available from the Oracle Java SE 7 archive:

https://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u25-oth-jpr
Download Hadoop

We are installing hadoop-0.20.0 in this tutorial. hadoop-0.20.0 download link:

https://archive.apache.org/dist/hadoop/core/hadoop-0.20.0/
I saved the file under the root folder. Ping localhost to confirm the machine can reach itself.

Repeat everything we've done so far on the slave node as well, but change its hostname to lbb02.example.com, NOT lbb01.example.com. Each node has a different IPADDR (IP address), so use the ifconfig command to adjust all the settings accordingly.

Edit /etc/hosts on each node

On each node, edit the hosts file:

$ nano /etc/hosts

Add:

XXX.XXX.XXX.XXX (IP address of your master node) lbb01.example.com (hostname of your master node)
XXX.XXX.XXX.XXX (IP address of your slave node) lbb02.example.com (hostname of your slave node)

Try to ping each host to see whether they can communicate with each other. You should be able to ping each host by hostname now. On each node:

$ ping lbb01.example.com
$ ping lbb02.example.com

nslookup

$ nslookup lbb01.example.com
$ nslookup lbb02.example.com

If these commands output a server, address, and name on each node, we have successfully configured the network settings.

Install Hadoop

As you can see, I'm logged in as the root user. However, I'm not going to extract Hadoop as root. I will move the Hadoop file to /home/lbbd/, since that is where I can write files as the user lbbd. Your user/account name will be different. Be aware.

Giving lbbd permission

Although the Hadoop file is extracted under /home/lbbd/, we need to give lbbd permission to work with this folder. To do this, use the command below.
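For example, with the two hostnames used in this guide and the placeholder addresses from earlier, /etc/hosts on both nodes could look like:

```
127.0.0.1        localhost
192.168.23.234   lbb01.example.com   lbb01
192.168.23.235   lbb02.example.com   lbb02
```

The slave address here is an assumption for illustration; use each node's real IP. Keeping the file identical on both nodes avoids one node resolving a hostname differently than the other.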
$ chown -R lbbd:lbbd /home/lbbd/hadoop-0.20.0

Link hadoop-0.20.0 to hadoop

$ ln -s hadoop-0.20.0 hadoop

Why the link? So that whenever we need to edit something in the hadoop-0.20.0 folder, we don't have to type -0.20.0 anymore; we can simply go there with $ cd /home/lbbd/hadoop. It's convenient.

Install the JDK

I saved the jdk-7u25 file in /root/hadoop_packages. You don't have to do this; go to whatever folder you saved your JDK file in. Use the command below to install it:

$ rpm -ivh hadoop_packages/jdk-7u25-linux-x64.rpm

Edit hadoop-env.sh
$ nano /home/lbbd/hadoop/conf/hadoop-env.sh

We need to edit hadoop-env.sh so that the Hadoop scripts know where we extracted the JDK and Hadoop. So I added the two lines below:

export JAVA_HOME=/usr/java/jdk1.7.0_25/
export HADOOP_HOME=/home/lbbd/hadoop

Edit core-site.xml

$ nano /home/lbbd/hadoop/conf/core-site.xml

Edit the file by adding:

<property>
<name>fs.default.name</name>
<value>hdfs://(your hostname):9000</value>
</property>
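Note that the property has to sit inside the file's <configuration> element (the same holds for hdfs-site.xml and mapred-site.xml below). Assuming the master hostname used in this guide, the complete core-site.xml would look like:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://lbb01.example.com:9000</value>
  </property>
</configuration>
```

fs.default.name tells every daemon and client where the namenode listens, which is why it points at the master node.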
Edit hdfs-site.xml

$ nano /home/lbbd/hadoop/conf/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/var/datastore</value>
<final>true</final>
</property>
Don't forget to give your account permission to /var/datastore; the namenode cannot run without it. Log in as root and create the folder shown above:

$ mkdir /var/datastore

Then give the user permission to access the folder:

$ chown -R lbbd:lbbd /var/datastore

Use the command below to see whether the permissions have been updated:

$ ls -l /var/

Edit mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>(your hostname):9001</value>
</property>
Edit .bash_profile

$ nano ~/.bash_profile
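The guide doesn't show which lines to add. A minimal sketch, assuming the JDK and Hadoop locations used earlier in this guide (adjust the paths if yours differ):

```shell
# Append to ~/.bash_profile (paths are the examples from this guide)
export JAVA_HOME=/usr/java/jdk1.7.0_25
export HADOOP_HOME=/home/lbbd/hadoop
# Put the java and hadoop binaries on the PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
```

Reload the file with $ source ~/.bash_profile so the changes take effect in the current shell.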
Run the commands below to see whether everything is installed and wired up correctly on the system:

$ java
$ hadoop
$ jps
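The same check can be scripted so that a missing binary prints an explicit warning instead of just erroring out. This is only a convenience sketch; hadoop and jps will be found only after .bash_profile has been updated and reloaded.

```shell
# Report where each required binary resolves from, or warn if missing
for cmd in java hadoop jps; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found at $(command -v "$cmd")"
  else
    echo "$cmd: NOT found on PATH"
  fi
done
```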
Format the Namenode

$ hadoop namenode -format
Start the namenode

$ hadoop-daemon.sh start namenode
$ jps

Start the jobtracker

$ hadoop-daemon.sh start jobtracker
$ jps

Repeat all of the steps above on your slave node as well. However, when you edit the hdfs-site.xml file, use the properties below:

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/data</value>
<final>true</final>
</property>

Then create the data folder as root:

$ mkdir /home/data

and give your user account permission to this folder, as we did with the /var/datastore folder.
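Putting the two slave-side properties together inside the required <configuration> wrapper, the slave's complete hdfs-site.xml would be:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/data</value>
    <final>true</final>
  </property>
</configuration>
```

dfs.name.dir is only needed on the master (it is namenode storage), while dfs.data.dir is the datanode's block storage, which is why the slave file differs from the master's.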