Setup Hadoop on Ubuntu Linux: Multi-Node Cluster

Dina Floyd

We have installed the JDK and Hadoop for you.
The JAVA_HOME is /usr/lib/jvm/java/jdk1.6.0_22
The Hadoop home is /home/user/hadoop

1. Network

Edit /etc/hosts on every node. Suppose, for example, that you have the following nodes (remember to replace the hostnames with those of your own machines, e.g. ubuntu01-01, ubuntu01-02):

master:
    IP: <master IP>    hostname: ubuntu01-01
slaves:
    IP: 192.168.0.2    hostname: ubuntu01-02
    IP: 192.168.0.3    hostname: ubuntu01-03
    IP: 192.168.0.4    hostname: ubuntu01-04
    IP: 192.168.0.5    hostname: ubuntu01-05

Then add the following lines to /etc/hosts on every node:

    # /etc/hosts (for master AND slave)
    <master IP>    ubuntu01-01
    192.168.0.2    ubuntu01-02
    192.168.0.3    ubuntu01-03
    192.168.0.4    ubuntu01-04
    192.168.0.5    ubuntu01-05

2. Configure

For the master:

Edit conf/masters as follows:

    ubuntu01-01

Edit conf/slaves as follows:

    ubuntu01-02
    ubuntu01-03
    ubuntu01-04
    ubuntu01-05
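A hostname that appears in conf/slaves but not in /etc/hosts is a common reason for a node failing to join the cluster. The following is a minimal sketch of a check for that, assuming a POSIX shell; the function name missing_hosts is hypothetical and not part of Hadoop.

```shell
#!/bin/sh
# missing_hosts HOSTS_FILE NODE_LIST_FILE
# Print every hostname from NODE_LIST_FILE (e.g. conf/slaves) that has no
# entry in HOSTS_FILE (e.g. /etc/hosts). Prints nothing if all are present.
# Hypothetical helper for illustration; not a Hadoop command.
missing_hosts() {
    hosts_file=$1
    node_list=$2
    while IFS= read -r node || [ -n "$node" ]; do
        # Skip blank lines and comment lines in the node list.
        case $node in ''|'#'*) continue ;; esac
        # Strip comments from the hosts file, then look for the name in any
        # column after the IP address.
        if ! sed 's/#.*//' "$hosts_file" |
             awk -v h="$node" '{ for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                               END { exit !found }'; then
            printf '%s\n' "$node"
        fi
    done < "$node_list"
}

# Example (run from the Hadoop home on any node):
#   missing_hosts /etc/hosts conf/slaves
```

Empty output means every slave listed in conf/slaves has a matching /etc/hosts entry on that node.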

For every node, do the following:

1). Configure JAVA_HOME

    $ cd hadoop
    $ gedit conf/hadoop-env.sh

And change:

    # The java implementation to use. Required.
    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to:

    # The java implementation to use. Required.
    export JAVA_HOME=/usr/lib/jvm/java/jdk1.6.0_22

Save & exit.

2). Create some directories in the Hadoop home:

    $ cd hadoop
    $ mkdir tmp
    $ mkdir hdfs
    $ mkdir hdfs/name
    $ mkdir hdfs/data

3). Configuration setup

Under conf/, edit the following files. Note that "/path/to/your/hadoop" should be replaced with something like "/home/user/hadoop".

conf/core-site.xml:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://ubuntu01-01:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
      </property>
    </configuration>
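Because the same core-site.xml is needed on every node, it can be convenient to generate it rather than type it by hand. The sketch below writes the file from the master hostname, wrapping each name/value pair in a <property> element as Hadoop configuration files require. The function name write_core_site and its arguments are assumptions made for illustration; the values are the ones used in this guide.

```shell
#!/bin/sh
# write_core_site CONF_DIR MASTER_HOST [PORT]
# Write CONF_DIR/core-site.xml pointing fs.default.name at the master.
# Hypothetical helper for illustration; not part of Hadoop.
write_core_site() {
    conf_dir=$1
    master_host=$2
    port=${3:-9000}   # 9000 is the port used in this guide
    # The here-document delimiter is unquoted so the shell expands
    # ${master_host} and ${port}. The backslash keeps ${user.name} literal,
    # because that variable is expanded later by Hadoop, not by the shell.
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${master_host}:${port}</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-\${user.name}</value>
  </property>
</configuration>
EOF
}

# Example (run from the Hadoop home):
#   write_core_site conf ubuntu01-01
```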


conf/hdfs-site.xml:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/home/${user.name}/hadoop/hdfs/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/home/${user.name}/hadoop/hdfs/data</value>
      </property>
      <property>
        <name>fs.checkpoint.dir</name>
        <value>/home/${user.name}/hdfs/namesecondary</value>
      </property>
    </configuration>

conf/mapred-site.xml:

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>ubuntu01-01:9001</value>
      </property>
    </configuration>

4). Configure passphraseless ssh

    $ ssh localhost

You will be asked for a password to log in over ssh.

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ exit

Configuration done. Try:

    $ ssh localhost

You should now log in without a password.
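Running the cat ... >> authorized_keys step more than once appends the same key again each time. The following sketch shows one way to make that step safe to repeat, and it also tightens the file permissions, since sshd in its default strict mode may ignore an authorized_keys file that other users can write to. The function name authorize_key is hypothetical.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Append the public key to the authorized_keys file only if that exact line
# is not already there, then restrict permissions.
# Hypothetical helper for illustration; not an OpenSSH command.
authorize_key() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir" || return 1
    touch "$auth_file" || return 1
    # -F fixed string, -x whole-line match, -q quiet, -f read pattern from file.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file" || return 1
    fi
    chmod 700 "$auth_dir" && chmod 600 "$auth_file"
}

# Example:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```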


3. SSH Access

The master must be able to log in to all slaves without a passphrase. Run on the master:

    user@ubuntu01-01:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@ubuntu01-02
    user@ubuntu01-01:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@ubuntu01-03
    user@ubuntu01-01:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@ubuntu01-04
    user@ubuntu01-01:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@ubuntu01-05

You will need the corresponding slave's password to run the above commands.

Try:

    user@ubuntu01-01:~$ ssh ubuntu01-02
    user@ubuntu01-01:~$ ssh ubuntu01-03
    user@ubuntu01-01:~$ ssh ubuntu01-04
    user@ubuntu01-01:~$ ssh ubuntu01-05

You should now log in without a password.

4. First run

You should format the HDFS (Hadoop Distributed File System). Run the following command on the master:

    $ bin/hadoop namenode -format

5. Start Cluster

1). Start HDFS Daemons

Run the following command on the master:

    $ bin/start-dfs.sh

Use the following command on every node to check the status of the daemons:

    $ jps

Run jps on the master; you should see something like this:

    7803 NameNode
    8354 SecondaryNameNode
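Reading jps output by eye on several nodes gets tedious. As a rough sketch, the check can be scripted: the function below reads jps output on standard input and prints the name of each expected daemon that is not running. The name missing_daemons is hypothetical; the daemon names passed to it are the ones shown in this guide.

```shell
#!/bin/sh
# missing_daemons NAME...
# Read `jps` output on stdin and print each expected daemon NAME that does
# not appear in it. Prints nothing when all expected daemons are running.
# Hypothetical helper for illustration; not part of the JDK or Hadoop.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <class name>". Compare the second column exactly,
        # so that "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
             awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example (on the master, after bin/start-dfs.sh):
#   jps | missing_daemons NameNode SecondaryNameNode
```

Empty output means every daemon named on the command line was found.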


Run jps on the slaves; you should see a DataNode process listed.

2). Start MapReduce Daemons

Run the following command on the master:

    $ bin/start-mapred.sh

Use the following command on every node to check the status of the daemons:

    $ jps

Run jps on the master; you should see something like this:

    7803 NameNode
    8547 TaskTracker
    8422 JobTracker
    8354 SecondaryNameNode

Run jps on the slaves; you should see something like this:

    8547 TaskTracker

6. Hadoop Web Interfaces

There are some web interfaces that let you see what is going on in the running Hadoop cluster:

    web UI for MapReduce job tracker(s)
    web UI for task tracker(s)
    web UI for HDFS name node(s)

7. Run a MapReduce Job: WordCount

Create a directory named "input" in HDFS:

    $ bin/hadoop dfs -mkdir input


Copy some text files into input:

    $ bin/hadoop dfs -put conf/* input

Run WordCount (replace <version> with the version in the name of the examples jar shipped in your Hadoop home):

    $ bin/hadoop jar hadoop-<version>-examples.jar wordcount input output

Display the output:

    $ bin/hadoop dfs -cat output/*

8. Stop Cluster

Close the MapReduce daemons. Run on the master:

    $ bin/stop-mapred.sh

Close the HDFS daemons. Run on the master:

    $ bin/stop-dfs.sh
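To sanity-check the WordCount output from section 7, it can help to compute the same counts locally on one of the input files and compare. The pipeline below is a small sketch of that idea, assuming the example job splits its input on whitespace and prints one word, a tab, and its count per line; the function name local_wordcount is hypothetical.

```shell
#!/bin/sh
# local_wordcount
# Read text on stdin and print "word<TAB>count" for each distinct
# whitespace-separated word, sorted by byte order.
# Hypothetical helper for illustration; it runs locally and does not use Hadoop.
local_wordcount() {
    tr -s '[:space:]' '\n' |     # put one word on each line
        grep -v '^$' |           # drop the empty line left by leading whitespace
        LC_ALL=C sort |          # group identical words together
        uniq -c |                # prefix each distinct word with its count
        awk '{ print $2 "\t" $1 }'
}

# Example (compare against the HDFS output for the same file):
#   local_wordcount < conf/core-site.xml
```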

