Hadoop Lab - Setting a 3 node Cluster. http://hadoop.apache.org/releases.html. Java - http://wiki.apache.org/hadoop/hadoopjavaversions



Similar documents
Hadoop Multi-node Cluster Installation on Centos6.6

HADOOP - MULTI NODE CLUSTER

How To Install Hadoop From Apa Hadoop To (Hadoop)

Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.

Hadoop Installation. Sandeep Prasad

Installing Hadoop. You need a *nix system (Linux, Mac OS X, ) with a working installation of Java 1.7, either OpenJDK or the Oracle JDK. See, e.g.

Installing Hadoop. Hortonworks Hadoop. April 29, Mogulla, Deepak Reddy VERSION 1.0

研 發 專 案 原 始 程 式 碼 安 裝 及 操 作 手 冊. Version 0.1

HSearch Installation

Hadoop (pseudo-distributed) installation and configuration

Installation and Configuration Documentation

Running Kmeans Mapreduce code on Amazon AWS

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters

Set JAVA PATH in Linux Environment. Edit.bashrc and add below 2 lines $vi.bashrc export JAVA_HOME=/usr/lib/jvm/java-7-oracle/

HADOOP. Installation and Deployment of a Single Node on a Linux System. Presented by: Liv Nguekap And Garrett Poppe

Setup Hadoop On Ubuntu Linux. ---Multi-Node Cluster

Single Node Hadoop Cluster Setup

Tableau Spark SQL Setup Instructions

Data Analytics. CloudSuite1.0 Benchmark Suite Copyright (c) 2011, Parallel Systems Architecture Lab, EPFL. All rights reserved.

How To Use Hadoop

This handout describes how to start Hadoop in distributed mode, not the pseudo distributed mode which Hadoop comes preconfigured in as on download.

Deploying Apache Hadoop with Colfax and Mellanox VPI Solutions

Hadoop Distributed File System and Map Reduce Processing on Multi-Node Cluster

HADOOP CLUSTER SETUP GUIDE:

CactoScale Guide User Guide. Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB)

1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation

HADOOP MOCK TEST HADOOP MOCK TEST II

A Study of Data Management Technology for Handling Big Data

The objective of this lab is to learn how to set up an environment for running distributed Hadoop applications.

SUSE Manager in the Public Cloud. SUSE Manager Server in the Public Cloud

Integration Of Virtualization With Hadoop Tools

Syncplicity On-Premise Storage Connector

HDFS Installation and Shell

Hadoop Setup Walkthrough

Installation Guide Setting Up and Testing Hadoop on Mac By Ryan Tabora, Think Big Analytics

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

How to install Apache Hadoop in Ubuntu (Multi node setup)

Hadoop MultiNode Cluster Setup

Hadoop Training Hands On Exercise

The Maui High Performance Computing Center Department of Defense Supercomputing Resource Center (MHPCC DSRC) Hadoop Implementation on Riptide - -

MapReduce, Hadoop and Amazon AWS

Pivotal HD Enterprise 1.0 Stack and Tool Reference Guide. Rev: A03

Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters

How to install Apache Hadoop in Ubuntu (Multi node/cluster setup)

Oracle 11gR2 two node step by step installation guide on Linux using Virtual Box 4.1.4

TP1: Getting Started with Hadoop

Signiant Agent installation

Local Caching Servers (LCS): User Manual

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

HOW TO BUILD A VMWARE APPLIANCE: A CASE STUDY

Pivotal HD Enterprise

Deploying MongoDB and Hadoop to Amazon Web Services

Installing QuickBooks Enterprise Solutions Database Manager On Different Linux Servers

McAfee Firewall for Linux 8.0.0

Single Node Setup. Table of contents

Using BAC Hadoop Cluster

E6893 Big Data Analytics: Demo Session for HW I. Ruichi Yu, Shuguan Yang, Jen-Chieh Huang Meng-Yi Hsu, Weizhen Wang, Lin Haung.

Platfora Installation Guide

Using VirtualBox ACHOTL1 Virtual Machines

Implementing a Weblogic Architecture with High Availability

Penetration Testing Lab. Reconnaissance and Mapping Using Samurai-2.0

Hadoop Data Warehouse Manual

Release Notes for McAfee(R) VirusScan(R) Enterprise for Linux Version Copyright (C) 2014 McAfee, Inc. All Rights Reserved.

Alinto Mail Server Pro

Perforce Helix Threat Detection OVA Deployment Guide

stub (Private Switch) Solaris 11 Operating Environment In the Solaris 11 Operating Environment, four zones are created namely:

OnCommand Performance Manager 1.1

Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box

Spectrum Scale HDFS Transparency Guide

SCHOOL OF SCIENCE & ENGINEERING. Installation and configuration system/tool for Hadoop

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Hadoop 2.6 Configuration and More Examples

HADOOP MOCK TEST HADOOP MOCK TEST

Hadoop Installation Guide

VERSION 9.02 INSTALLATION GUIDE.

Hadoop Setup. 1 Cluster

SYMANTEC BACKUPEXEC2010 WITH StorTrends

Hadoop Architecture. Part 1

2.1 Hadoop a. Hadoop Installation & Configuration

Semantic based Web Application Firewall (SWAF - V 1.6)

Distributed Filesystems

ODP REGIONAL NODE DEPLOYMENT QUICK GUIDE FOR TRAININGS

Revolution R Enterprise 7 Hadoop Configuration Guide

CDH 5 Quick Start Guide

About this Tutorial. Audience. Prerequisites. Copyright & Disclaimer

Introduction to HDFS. Prasanth Kothuri, CERN

CURSO: ADMINISTRADOR PARA APACHE HADOOP

HDFS Cluster Installation Automation for TupleWare

MapReduce. Tushar B. Kute,

Introduction to HDFS. Prasanth Kothuri, CERN

LOCKSS on LINUX. Installation Manual and the OpenBSD Transition 02/17/2011

Apache Hadoop new way for the company to store and analyze big data

Penetration Testing LAB Setup Guide

Prepared for: How to Become Cloud Backup Provider

Cassandra Installation over Ubuntu 1. Installing VMware player:

Hadoop Basics with InfoSphere BigInsights

How to Install Multicraft on a VPS or Dedicated Server (Ubuntu bit)

Hadoop and Hive. Introduction,Installation and Usage. Saatvik Shah. Data Analytics for Educational Data. May 23, 2014

Transcription:

Hadoop Lab - Setting a 3 node Cluster Packages Hadoop Packages can be downloaded from: http://hadoop.apache.org/releases.html Java - http://wiki.apache.org/hadoop/hadoopjavaversions Note: I have tested the Lab environment with OpenJdk and it works fine. But, if you need to run Pig scripts or spawn many jobs, then use an Oracle_sun-jdk. My Packages jdk-7u25-linux-x64.rpm ( Can be downloaded from Oracle site) hadoop-20.205.tar.gz ( Use this for Hadoop 1.0 setup and for Hadoop 2.0, version greater then 0.23) For simplicity of Labs and get started with things, use Hadoop 1.0 - i.e hadoop-20.205.tar.gz

Base Machine Model Name: MacBook Pro Processor Name: Processor Speed: Intel Core i7 2.7 GHz Number of Processors: 1 Total Number of Cores: 4 L2 Cache (per Core): 256 KB L3 Cache: 6 MB Memory: 16 GB As you can see that the base machine has really good configurations, you need at-least a 4 GB machine for 2 nodes Virtualization of Hadoop. LAB Setup I am using CentOS 5.4, but you can use any Linux Flavor. Virtualization: Can use VMware or VirtualBox. Oracle VirtualBox - Can be downloaded free from the Oracle site. Download and install on Base Machine. Create a Linux Virtual Machine in the Vbox, allocate minimum 2 GB RAM, 4 GB HDD. Do a Minimal install of OS. Once a Virtual machine is created, run and test it. Then clone it - Do a full clone, using the clone option in the Vbox.

Can do the number of clone, as per the number of data nodes required and the Base system configuration you have. Assign a unique hostname and IP to each machine. Make sure both the hosts are in the same subnet and can ping each other. Can create extra virtual hard drives and mount them under /data. We will use them for hadoop storage in LAB. Namenode: Hostname: nn1.cluster1.com Domain: cluster1.com IP: 192.168.1.70

Datanode1: Hostname: dn1.cluster1.com Domain: cluster1.com IP: 192.168.1.71

Datanode2: Hostname: dn2.cluster1.com Domain: cluster1.com IP: 192.168.1.72 For the hosts to communicate using host names - Please configure DNS on any one of the nodes. My Namenode is acting as a DNS server. On All host: /etc/resolv.conf nameserver 192.168.1.70

Test, that you can resolve DNS from all hosts. $ nslookup nn1.cluster1.com If it returns the IP, we are all set. Hadoop Installation 1. Log-in to namenode (nn1) as root - Download the Hadoop and Java Packages. Best is to download it on the base Machine by using wget or scp to all the nodes. 2. Create a user hadoop - "useradd hadoop" and set a password for the user using "echo hadoop passwd --stdin hadoop" (Remember we are doing this as "root" or use "sudo") 3. Install Java using "rpm -ivh --aid jdk-7u25-linux-x64.rpm" or use yum/apt-get as per your distro. 4. Extract Hadoop using "tar -zxvf hadoop-20.205.tar.gz /home/hadoop/". 5. chown hadoop:hadoop -R /home/hadoop/hadoop 6. sudo - hadoop (Change as hadoop user)

7. Execute command "pwd" - You must be in /home/hadoop 8. cd hadoop/conf 9. Do "ls -l" and take a look at the config files there. This is the Hadoop configuration directory. 10. We need to edit 3 important files - hadoop-env.sh, hdfssite.xml, core-site.xml, mappred-site-xml The files will contain: 1. hadoop-env.sh This is the file, which sets the Hadoop environment, like JAVA PATH, memory requirement etc. Add a line in the file depending upon the JAVA installation PATH - export JAVA_HOME=/usr/bin/java/ or - export JAVA_HOME=/usr/java/jdk1.7.0_25/ Hadoop Path - export HADOOP_HOME=/home/hadoop/hadoop

2. core-site.sh <configuration> <property> <name>fs.default.name</name> <value>hdfs://nn1.cluster1.com:9000</value> </property> </configuration> 3. hdfs-site.xml <configuration> <property> <name>dfs.replication</name> <value>2</value> </property> <property>

<name>dfs.name.dir</name> <value>/var/datastore</value> Namenode <------ this section only on <final>true</final> </property> <property> <name>dfs.data.dir</name> <value>/home/data</value> only on Datanode <------ this section <final>true</final> </property> </configuration> Note: /home/data - Can be a separate HDD or a partition for storing data on the Data node. 4. mapred-site.xml

<configuration> <property> <name>mapred.job.tracker</name> <value>nn1.cluster1.com:9001</value> </property> </configuration> 5. Another important file to setup is the.bash_profile In you home directory i.e /home/hadoop 1. vi.bash_profile # Add the below lines to the file, do not remove the existing lines # User specific environment and startup programs PATH=$PATH:$HOME/bin PATH=$PATH:/usr/java/jdk1.7.0_25/bin

export JAVA_HOME=/usr/java/jdk1.7.0_25/ export HADOOP_HOME=/home/hadoop/hadoop export PATH=$PATH:$HADOOP_HOME/bin export HADOOP_HOME_WARN_SUPPRESS="TRUE" 2. Either logout and login back or just run../bash_profile Starting Hadoop As user hadoop, on Namenode (nn1) 1. Execute "hadoop namenode -format" - If this executes, then all your PATH and environment is fine. This will format the namenode. Do not worry, format here does not mean the same as file system formatting. All you data will be safe. 2. $ hadoop-daemon.sh start namenode 3. $ jps Now, you should see Namenode running in the process listed. You can check namenode also from the browser: http://nn1.cluster1.com:50070 4. $ hadoop-daemon.sh start jobtracker Check Job tracker URI : http://nn1.cluster1.com:50030

5. $ jps Now, you will see jobtracker running as well. Setting up Datanode. As user Hadoop on datanodes Do all the steps as above before the Hadoop startup. $ No need to format a "datanode". We will see the details on why? $ hadoop-daemon.sh start datanode $ hadoop-daemon.sh start tasktracker $ jps What daemons you see? Check http://nn1.cluster1.com:50070 to see the difference in the storage presentation then before. Play around with the daemons, to see how they are related. 5. Some hadoop commands $ hadoop fs -ls / $ hadoop fsck /