HSearch Installation


To configure HSearch you need to install Hadoop, HBase, ZooKeeper, HSearch and Tomcat.

1. Add each machine's IP address to /etc/hosts so that all the servers can be reached by name (a sketch follows after this list).

2. Allow these servers to communicate through the machine firewall:

$ iptables -A INPUT -s master -j ACCEPT
$ iptables -A INPUT -s slave -j ACCEPT

On EC2, also open the required ports for the instances; in the original guide's screenshot, all ports are opened for the corresponding master and slave IPs.

3. Set up SSH so that the two machines can communicate with each other.
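A minimal /etc/hosts layout for step 1 might look like the following, on both machines (the IP addresses here are placeholders; substitute your own):

# /etc/hosts (IPs are hypothetical)
10.0.0.10   master
10.0.0.11   slave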

In the above step an SSH key pair with an empty passphrase is generated using the following commands:

$ ssh-keygen -t rsa
$ cd ~/.ssh
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys

Do the above steps on both the master and the slave. For the master to communicate with the slave, the master's public key must be added to the slave's authorized_keys:

$ cat id_rsa.pub

Copy the contents of the file and append it to the authorized_keys file on the slave machine. The two machines can now communicate with each other using the shared public RSA key.

From the master machine:

$ ssh slave
$ exit

From the slave machine:

$ ssh master
$ exit
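If your system ships the ssh-copy-id helper, the manual copy-and-append of the public key can be done in one command instead (this assumes password logins to the slave are still enabled):

$ ssh-copy-id slave

This appends the master's ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the slave and sets the file permissions.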

4. Set up Java:

$ mkdir /usr/java
$ cd /usr/java
$ wget http://download.oracle.com/otn-pub/java/jdk/6u39-b04/jdk-6u39-linux-x64.bin?authparam=1360821815_696261924ee18abf65bf0d3b2106256a
$ mv jdk-6u39-linux-x64.bin\?authparam\=1360821815_696261924ee18abf65bf0d3b2106256a jdk-6u39-linux-x64.bin
$ ./jdk-6u39-linux-x64.bin
$ rm -rf jdk-6u39-linux-x64.bin

Test that Java runs:

$ cd /usr/java/jdk1.6.0_39/bin
$ ./java -version

Once Java is set up, it is a good time to install Hadoop.

5. Set up Hadoop:

$ cd /mnt
$ wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.5.0.tar.gz
$ gzip -d hadoop-2.0.0-cdh4.5.0.tar.gz
$ tar -xf hadoop-2.0.0-cdh4.5.0.tar
$ mv hadoop-2.0.0-cdh4.5.0 hadoop
$ cd hadoop/etc/hadoop
$ echo "" > excludes
$ mkdir -p /mnt/data/namenode /mnt/data/namenode/dfsname /mnt/data/namenode/dfsnameedit /mnt/data/datanode /mnt/logs

Export variables at the end of the Hadoop environment file (hadoop-env.sh):

$ echo "export JAVA_HOME=/usr/java/jdk1.6.0_39" >> hadoop-env.sh
$ echo "export HADOOP_LOG_DIR=/mnt/logs" >> hadoop-env.sh
$ echo "master" > masters; cat masters
$ echo "slave" > slaves; cat slaves

Change the following line of the log4j.properties file:

hadoop.log.dir=/mnt/logs

Edit the core-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    URI's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The URI's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>16384</value>
    <description>Read/write buffer size used in SequenceFiles, in bytes.
    It should be a multiple of the hardware page size (4 KiB, i.e. 4096,
    on Intel processors). On a single-machine cluster this is kept as
    small as possible (16384 = 16 KB) for continuous streaming; a
    typical value for a 250 to 2000 node cluster is 32768-131072.</description>
  </property>
  <property>
    <name>io.seqfile.compress.blocksize</name>
    <value>4096</value>
    <description>The minimum block size for compression in
    block-compressed SequenceFiles. We compress a minimum 4 KB block,
    as this allows us to read less.</description>
  </property>
</configuration>

Edit the hdfs-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.home</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>metadata.dir</name>
    <value>/mnt/data/namenode</value>
    <description>Where the NameNode metadata should be stored.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/data/datanode</value>
    <description>The location where the DataNode stores its data; a
    comma-separated list such as /data/1,/data/2,/data/3 may be
    given.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>33554432</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>40</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/mnt/hadoop/etc/hadoop/excludes</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>${metadata.dir}/dfsname</value>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>${metadata.dir}/dfsnameedit</value>
  </property>
</configuration>

Edit mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value></value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
</configuration>

Edit yarn-site.xml:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:54311</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/mnt/data/1/mapred/local/</value>
    <final>true</final>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>

After editing all the configuration files, set the PATH for Hadoop and Java:

$ cd ~
$ vi .bash_aliases

Add the following lines to it:

export JAVA_HOME=/usr/java/jdk1.6.0_39
export HADOOP_PREFIX=/mnt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin

Now exit the terminal and log in again for the changes to take effect.

Perform the same activities on the slave machine, excluding the creation of the masters and slaves files in the /mnt/hadoop/etc/hadoop directory, i.e. skip the following commands:

$ echo "master" > masters; cat masters
$ echo "slave" > slaves; cat slaves

These are not needed on the slave machine, since it will only run the DataNode. Hadoop is now set up and good to go.
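Rather than repeating every step by hand on the slave, one option is to copy the finished trees over SSH (a sketch; it assumes Hadoop and Java live at the same paths on both machines):

$ rsync -av --exclude 'etc/hadoop/masters' --exclude 'etc/hadoop/slaves' /mnt/hadoop/ slave:/mnt/hadoop/
$ rsync -av /usr/java/ slave:/usr/java/

After logging back in, the configuration can be sanity-checked from either machine:

$ hdfs getconf -confKey fs.defaultFS    (should print hdfs://master:54310)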

Now we need to format the NameNode, and then we can start Hadoop. (If formatting more than once, also delete the DataNode directories, since the DataNodes' stored namespaceID will otherwise conflict with the newly formatted NameNode.)

$ hdfs namenode -format

If the format succeeds you will see output similar to the screenshot in the original guide.

Start Hadoop:

$ start-dfs.sh

To check that the NameNode and DataNode started successfully on the master and the slave, use the following command on each machine (the original shows screenshots of the output):

$ jps
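On a healthy cluster with this layout, the expected processes are roughly the following (process IDs are placeholders, not from the original):

Master:
$ jps
2481 NameNode
2675 SecondaryNameNode
2892 Jps

Slave:
$ jps
2134 DataNode
2310 Jps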

Test Hadoop:

$ cat > /tmp/a.txt    (type some text, then press Ctrl-D to save)
$ hdfs dfs -copyFromLocal /tmp/a.txt /
$ hdfs dfs -ls /
$ hdfs dfs -cat /a.txt
$ hdfs dfs -copyToLocal /a.txt /tmp
$ hdfs dfs -rm /a.txt
$ hdfs dfs -ls /

Stop Hadoop:

$ stop-dfs.sh

6. Set up ZooKeeper:

$ cd /mnt
$ wget http://archive.cloudera.com/cdh4/cdh/4/zookeeper-3.4.5-cdh4.5.0.tar.gz
$ mv zookeeper-3.4.5-cdh4.5.0.tar.gz zookeeper-3.4.5.tar.gz
$ gzip -d zookeeper-3.4.5.tar.gz
$ tar -xf zookeeper-3.4.5.tar
$ mv zookeeper-3.4.5.tar zookeeper-3.4.5
$ mv zookeeper-3.4.5 zookeeper
$ cd zookeeper
$ rm -rf docs ivy.xml ivysettings.xml src contrib dist-maven; ls
$ cd conf
$ cp zoo_sample.cfg zoo.cfg
$ vi zoo.cfg
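The guide does not reproduce the zoo.cfg edits; a minimal single-ZooKeeper configuration consistent with this layout might look like the following (the dataDir path and the master hostname are assumptions, not from the original):

# zoo.cfg - minimal sketch
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/mnt/data/zookeeper
clientPort=2181
server.1=master:2888:3888

Note that with HBASE_MANAGES_ZK=true (set below), HBase starts and manages its own ZooKeeper, so this standalone configuration only matters if you run ZooKeeper yourself.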

7. Set up HBase:

$ cd /mnt
$ wget http://archive.cloudera.com/cdh4/cdh/4/hbase-0.94.6-cdh4.5.0.tar.gz
$ mv hbase-0.94.6-cdh4.5.0.tar.gz hbase-0.94.6.tar.gz
$ gzip -d hbase-0.94.6.tar.gz
$ tar -xf hbase-0.94.6.tar
$ mv hbase-0.94.6 hbase
$ mv hbase-0.94.6.tar hbase
$ cd hbase
$ rm -rf docs hbase-*-tests.jar pom.xml src
$ cd /mnt/hbase/conf
$ echo "slave" > regionservers; cat regionservers

Edit the log4j.properties file:

hbase.log.dir=/mnt/logs

Export variables at the end of the HBase environment file (hbase-env.sh):

$ echo "export JAVA_HOME=/usr/java/jdk1.6.0_39" >> hbase-env.sh
$ echo "export HBASE_CLASSPATH=/mnt/hbase/conf" >> hbase-env.sh
$ echo "export HBASE_MANAGES_ZK=true" >> hbase-env.sh
$ echo "export HBASE_HEAPSIZE=2048" >> hbase-env.sh
$ echo "export HBASE_OPTS=\"-server -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:+AggressiveHeap -XX:+HeapDumpOnOutOfMemoryError\"" >> hbase-env.sh
$ echo "export HBASE_LOG_DIR=/mnt/logs" >> hbase-env.sh
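The guide does not show an hbase-site.xml for this cluster; a minimal sketch consistent with the hostnames and the HDFS port used above (all values here are assumptions, not from the original) would be:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- HBase data lives in HDFS under /hbase (assumed path) -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:54310/hbase</value>
  </property>
  <!-- Run in distributed mode across the two machines -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum host, matching the master above -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
  </property>
</configuration>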

8. Set up Tomcat:

$ cd /mnt
$ wget http://apache.mesi.com.ar/tomcat/tomcat-7/v7.0.47/bin/apache-tomcat-7.0.47.tar.gz
$ gzip -d apache-tomcat-7.0.47.tar.gz
$ tar -xf apache-tomcat-7.0.47.tar
$ mv apache-tomcat-7.0.47 tomcat

9. Set up HSearch:

1. Download the war file from Amazon S3:
   For CDH 4.5 (tested): http://hsearch.war.s3.amazonaws.com/hsearch.war.cdh4.5.tar.gz
   For HDP 1.3 (tested): http://hsearch.war.s3.amazonaws.com/hsearch.war.hdp1.3.tar.gz
2. Unzip and extract the war file.
3. Deploy the war file to your server and start the server.
4. Copy the hsearch and lucene jar files to the hadoop and hbase lib folders.
5. Restart Hadoop and HBase.
6. Open http://yourserverurl:port/hsearch/
7. Fill in the setup page and create a new project.
8. Import data from an HDFS file location, then index and search.

HSearch query syntax, e.g.:

((sex:male AND subject:{usb101,usb102}) OR (dayrange:[-10 : 2]))

The following WHERE query types are possible:

Exact Match = sex:male
Not Match = sex:!male
In Match = organs:{Adrenal, glands, Liver}
Range Match = reading:[-10 : 20]

Note: if a value in an In query contains a comma, it should be passed within double quotes.
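For instance, combining the rules above (the field names and values here are hypothetical), a query that excludes males, matches a comma-containing organ name in quotes, and restricts a reading range might look like:

(sex:!male AND organs:{"Adrenal, left",Liver} AND reading:[0 : 20])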