CactoScale Guide User Guide
Athanasios Tsitsipas (UULM), Zafeirios Papazachos (QUB), Sakil Barbhuiya (QUB)





Version History

Version  Date        Change                                           Author
0.1      12/10/2014  Initial version                                  Athanasios Tsitsipas (UULM)
0.2      14/01/2015  Added description and install notes              Zafeirios Papazachos (QUB), Sakil Barbhuiya (QUB)
0.3      23/10/2015  Changed tool versions, added new install notes   Athanasios Tsitsipas (UULM)

TABLE OF CONTENTS

1. PURPOSE
2. OVERVIEW
3. PREREQUISITES
4. INSTALLATION OF MONITORING FRAMEWORK
   A) INSTALLING MONITORING CLUSTER
   B) STEP-BY-STEP - ADD NEW NODE TO EXISTING CLUSTER
   C) FORWARD REQUESTS TO A PORT THROUGH THE MONITORING-GATEWAY VM TO A NODE
   D) START / STOP THE MONITORING CLUSTER
   E) CREATE REQUIRED HBASE TABLES
   F) INSTALL AND START CHUKWA COLLECTOR
   G) START CHUKWA AGENT ON A PHYSICAL MACHINE
5. START THE RUNTIME MODEL UPDATER
6. STEP-BY-STEP OFFLINE ANALYSIS GUIDELINE
   A) CREATE HBASE SCHEMA TABLES
   B) IMPORTING STRACE DATA TO HBASE
   C) ANALYSING STRACE DATA STORED IN HBASE
   D) CSV GENERATION FROM THE ANALYSIS RESULTS
   E) TROUBLESHOOTING

1. PURPOSE

This document is a complete guide to using CactoScale: how to install the monitoring framework from scratch, and how to start the Runtime Model Updater (D4.3 Parallel Trace Analysis). Finally, it presents instructions for executing the Pig analysis scripts, and their results, on existing trace data from system calls of chemical computations performed with Molpro 1. The traces were provided by the University of Ulm.

2. OVERVIEW

The tools that CactoScale utilizes are described in depth in (D4.1 Data Collection Framework). CactoScale features extensible monitoring capabilities that allow tracking a variety of resources, such as embedded sensors, external instrumentation devices, hardware counters, error log files, workload traces, network, processor core, memory, storage, and application logs. Additionally, it provides data filtering and correlation analysis tools, designed to run on the vast volumes of data generated by potentially thousands of servers. These capabilities in turn enable CACTOS to address the challenges of managing resources of increased complexity and heterogeneity in cloud infrastructures.

3. PREREQUISITES

The required versions of the tools utilized by CactoScale for the current guide are:
i. Hadoop version: 2.6.0
ii. Zookeeper version: 3.4.6
iii. HBase version: 1.1.1
iv. Pig version: 0.12.1
v. Chukwa version: 0.5.0

In addition to these technologies, a running CDO server 2 is needed.

4. INSTALLATION OF MONITORING FRAMEWORK

The following instructions are based on four virtual machines:
- monitoring-gateway: used for accessing the monitoring cluster; the only publicly accessible VM!
- monitoring01: HDFS namenode, HDFS datanode, HBase master
- monitoring02: HDFS secondarynamenode, HDFS datanode, HBase regionserver
- monitoring03: HDFS datanode, HBase regionserver

All VMs have key-based ssh access to each other. The above cluster setup 3 is also maintained in the OpenStack testbed of the University of Ulm.
1 https://www.molpro.net/
2 http://www.cactosfp7.eu/2015/04/03/cactos-blog-setting-secure-cdo-server/
3 http://www.cactosfp7.eu/2015/08/26/openstack-physical-testbed-part-3-staying-up-to-datesoftware-requirements/
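Since the installation steps below rely on hostname resolution between the nodes, /etc/hosts on every VM might look like the following sketch. The private IP addresses are placeholders chosen for illustration (192.168.0.3 matches the hbase.rootdir example used later), not values taken from the actual testbed:

```
192.168.0.2   monitoring-gateway
192.168.0.3   monitoring01
192.168.0.4   monitoring02
192.168.0.5   monitoring03
```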

a) INSTALLING MONITORING CLUSTER

For the multi-node setup, make sure to set up key-based ssh authentication between all nodes first. Also, set up /etc/hosts correctly on all nodes. On a fresh CentOS 7 VM, perform the following steps:

yum install svn
mkdir cactoscale
cd cactoscale
svn checkout https://svn.fzi.de/svn/cactos/code/scale/trunk/cactoscale_monitoring_framework/ .
export SVNCHECKOUT=./cactoscale

PREPARE THE SETUP ON FIRST NODE

# install required packages first
yum install epel-release wget vim java-openjdk

# download hadoop and hbase binaries
cd ~
wget http://mirror.softaculous.com/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar xfzv hadoop-2.6.0.tar.gz
wget http://ftp-stud.hs-esslingen.de/pub/mirrors/ftp.apache.org/dist/hbase/1.1.1/hbase-1.1.1-bin.tar.gz
tar xfzv hbase-1.1.1-bin.tar.gz

# copy helper scripts
cp $SVNCHECKOUT/hCluster/bin/* .
chmod +x ./*.sh

CONFIGURE THE SETUP

# place the config files from this repo
cp $SVNCHECKOUT/hCluster/conf/hadoop/* ~/hadoop-2.6.0/etc/hadoop/
cp $SVNCHECKOUT/hCluster/conf/* ~/hbase-1.1.1/conf/

Change the following configuration files as needed:
- hadoop: core-site.xml, line 46, property "fs.default.name", value "hdfs://monitoring01:8020"
- hadoop: dfs-hosts, line 1, add hostname of namenode(s)
- hadoop: hdfs-site.xml, line 330, property "dfs.https.address", value "monitoring01:50470"
- hadoop: slaves, add hostnames for data nodes
- hbase: hbase-site.xml, line 25, property "hbase.rootdir", value "hdfs://192.168.0.3:8020/hbase"
- hbase: hbase-site.xml, line 37, property "hbase.zookeeper.quorum", value "list of all hbase nodes"
- hbase: regionservers, add hostnames for region servers
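As an illustration of the first edit in the list above, the relevant property block in core-site.xml would look roughly like the sketch below; the hostname follows the cluster naming used in this guide and should be adjusted to your setup:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://monitoring01:8020</value>
</property>
```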

# format the hdfs root dir
cd ~/hadoop-2.6.0
bin/hdfs namenode -format

b) STEP-BY-STEP - ADD NEW NODE TO EXISTING CLUSTER

PREPARE THE NODE

Log in to the new node. Set up key-based ssh login and /etc/hosts.

yum install epel-release wget vim java-openjdk

ADD NEW NODE TO CONFIGURATION

Log in to the first node. Edit the settings:
- hadoop: slaves, add hostnames for data nodes
- hbase: hbase-site.xml, line 37, property "hbase.zookeeper.quorum", value "list of all hbase nodes"
- hbase: regionservers, add hostnames for region servers

COPY SETUP FROM FIRST NODE TO NEW NODE

Log in to the first node and use the helper script:

# run the distribute script
$SVNCHECKOUT/hCluster/bin/distribute.sh <hostname_of_new_node>

Binaries and configuration are now copied and extracted.

c) FORWARD REQUESTS TO A PORT THROUGH THE MONITORING-GATEWAY VM TO A NODE

Make sure iptables is installed; if not, run:

yum install iptables-services

Execute the following rules in a terminal:

sysctl net.ipv4.ip_forward=1
iptables -t nat -A PREROUTING -p tcp --dport <port> -j DNAT --to-destination <monitoring01 ip>:<port>
iptables -t nat -A POSTROUTING -j MASQUERADE
iptables -I FORWARD -p tcp --dport <port> -j ACCEPT

d) START / STOP THE MONITORING CLUSTER

Log in to the master node (monitoring-gateway) via ssh.

$SVNCHECKOUT/hCluster/bin/startHStuff.sh   # starts the local master services AND the slave services via ssh
$SVNCHECKOUT/hCluster/bin/stopHStuff.sh    # stops the local master services AND the slave services via ssh
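The forwarding rules in section 4 c) are easy to mistype. As a convenience sketch, a small helper can print the exact rule set for a given port and destination IP; the port 8080 and the IP used below are illustrative placeholders. The commands are only printed here, so they can be reviewed before being applied as root on the monitoring-gateway VM:

```shell
#!/usr/bin/env bash
# Print the port-forwarding rule set from section 4 c) for a given
# port and destination IP. Nothing is applied; the commands are
# echoed so they can be reviewed first.
make_forward_rules() {
  local port="$1" dest_ip="$2"
  echo "sysctl net.ipv4.ip_forward=1"
  echo "iptables -t nat -A PREROUTING -p tcp --dport ${port} -j DNAT --to-destination ${dest_ip}:${port}"
  echo "iptables -t nat -A POSTROUTING -j MASQUERADE"
  echo "iptables -I FORWARD -p tcp --dport ${port} -j ACCEPT"
}

make_forward_rules 8080 192.168.0.3
```

To apply the rules, the printed commands can be piped to a root shell, e.g. `make_forward_rules 8080 192.168.0.3 | sudo sh`.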

e) CREATE REQUIRED HBASE TABLES

Log on to any virtual machine of the monitoring cluster and execute the command below to create the HBase tables that CactoScale requires to store its information:

hbase shell $SVNCHECKOUT/chukwa/chukwa-cactos/etc/chukwa/cactos-hbase.schema

f) INSTALL AND START CHUKWA COLLECTOR

On the desired machine of the monitoring cluster, e.g. monitoring01, execute the following command:

$SVNCHECKOUT/chukwa/chukwa-collector.sh start

Make sure port 8080, where the Chukwa collector runs, is accessible.

g) START CHUKWA AGENT ON A PHYSICAL MACHINE

On a node that needs monitoring, start a Chukwa agent by executing the following:

yum install svn
mkdir cactoscale
cd cactoscale
svn checkout https://svn.fzi.de/svn/cactos/code/scale/trunk/cactoscale_monitoring_framework/ .
export SVNCHECKOUT=./cactoscale
$SVNCHECKOUT/chukwa/chukwa-agent.sh start

Prior to the last command, the collector IP must be set in the file located at:
$SVNCHECKOUT/chukwa/chukwa-cactos/etc/chukwa/collectors

5. START THE RUNTIME MODEL UPDATER

The Runtime Model Updater is started by executing the commands below on the desired node:

wget https://sdqweb.ipd.kit.edu/eclipse/cactos/cactoscale/runtimemodelupdater/binary_nightly/RuntimeModelUpdater.gtk.linux.x86_64.zip
unzip -q RuntimeModelUpdater.gtk.linux.x86_64.zip
svn checkout https://svn.fzi.de/svn/cactos/code/integration/trunk/eu.cactosfp7.configuration/

At this point, both the product folder and the configuration folder have been obtained. Before starting the Runtime Model Updater, the information in the files cactoscale_model_updater.cfg and integration_cdosession.cfg in the folder eu.cactosfp7.configuration must be filled in according to the naming of the variables. After a successful configuration, execute the following commands to start the Runtime Model Updater:

cd RuntimeModelUpdater.gtk.linux.x86_64
screen -dmS modelupdater bash -c "./RuntimeModelUpdater"

6. STEP-BY-STEP OFFLINE ANALYSIS GUIDELINE

The strace output data collected from Molpro was obtained for different system scenarios. The available dataset traces are fully described in (D4.2 Preliminary Offline Trace Analysis), chapter III. For this guideline, the

configuration for the two strace log files is separated by the storage type of the machines (HDD, SSD); the files are named strace.out_01 and strace.out_02 respectively.

a) CREATE HBASE SCHEMA TABLES

To create the tables, use the schemas provided in the 1_Hbase_schema folder. The execution is as follows:

Usage: sudo -u hbase hbase shell <localsrc>
Example: sudo -u hbase hbase shell /tmp/1_hbase_schema/ulm_strace_import.schema

The above also applies to the file ulm_strace_analysis_result.schema. The raw strace output files need to be imported into HBase tables (ulm_strace_import.schema script), and the results of the analysis scripts must be stored in different HBase tables (ulm_strace_analysis_result.schema).

b) IMPORTING STRACE DATA TO HBASE

The instructions for this step are the following:

i. The two files (strace.out_01, strace.out_02) first have to be uploaded to HDFS. To upload them, execute the command:

Usage: sudo -u hdfs hadoop fs -put <localsrc> <HDFS_dest_Path>
Example: sudo -u hdfs hadoop fs -put /tmp/strace.out_01 /tmp/

Execute the same command for the strace.out_02 log file. Once both files are uploaded to HDFS, the following instructions can be carried out. More information about the traces can be found in (D4.2 Preliminary Offline Trace Analysis), chapter IV.

ii. Edit the pig script storenewstracedata.pig (in the 2_Import_logs folder) by configuring the path of the myudf.jar provided in the same folder.
iii. Run the pig script storenewstracedata.pig with the following command line (change the versions according to your installation of the tools):

Usage: <exec file of pig> -Dpig.additional.jars=<path to hbase jar>:<path to zookeeper jar>:<path to pig jar>:<path to hbase-env.jar of chukwa> <localsrc of script>
Example: pig -Dpig.additional.jars=/usr/lib/hbase/hbase-1.1.1.jar:/usr/lib/zookeeper/zookeeper-3.4.6.jar:/var/pig/pig-0.12.1/pig-0.12.1.jar:/home/chukwa/hbase-env.jar /tmp/2_import_logs/storenewstracedata.pig

NOTE: Run the pig script storenewstracedata.pig twice, first for strace.out_01 and then for strace.out_02 (simply search for the LOAD command in the pig script and change the file name).

c) ANALYSING STRACE DATA STORED IN HBASE

With the data imported, the analysis scripts are executed on it to obtain meaningful results (more information in (D4.2 Preliminary Offline Trace Analysis), section V.1). Run the following commands (change the versions according to your installation of the tools) to execute the analytic pig scripts stracedataanalytic_perjob.pig, stracedataanalytic_timeseries.pig and stracedataanalytic_variance.pig from the 3_Analysis folder, each separately:

Usage: <exec file of pig> -Dpig.additional.jars=<path to hbase jar>:<path to zookeeper jar>:<path to pig jar>:<path to hbase-env.jar of chukwa> <localsrc of script>
Example: pig -Dpig.additional.jars=/usr/lib/hbase/hbase-1.1.1.jar:/usr/lib/zookeeper/zookeeper-3.4.6.jar:/var/pig/pig-0.12.1/pig-0.12.1.jar:/home/chukwa/hbase-env.jar /tmp/3_analysis/stracedataanalytic_perjob.pig

NOTE: Run each analytic script twice, first for strace.out_01 and then for strace.out_02 (search for the FILTER command in the pig script and change the file name; also search for the STORE command and change the HBase table name extension between _01 and _02).

ATTENTION: Every analytic pig script has a parameter named $START. It sets minkeyval, so that only rows with rowkeys greater than minkeyval are returned. Because we want all rows of the HBase table rather than a specific range, this parameter can be ignored by deleting the code part -gt $START.

d) CSV GENERATION FROM THE ANALYSIS RESULTS

The pig scripts in the 4_CSV_Generation folder create CSV files from the analytic results stored in HBase; R scripts are finally used to produce the result graphs from these CSV files (more information in (D4.2 Preliminary Offline Trace Analysis), section V.2).
Open the pig scripts in the 4_CSV_Generation folder and change the HBase table name extensions between _01 and _02 so that each script runs twice (once for strace_output_01 and once for strace_output_02). Also change, in each script, the location where the CSV file is saved and the name of the CSV file. Execute the following result scripts separately: meanvalueresultcsv.pig, perjobresultcsv.pig, sizedistributionresultcsv.pig, standarddeviationresultcsv.pig, timeseriesresultcsv.pig. Run each script as follows:

Usage: <exec file of pig> -Dpig.additional.jars=<path to hbase jar>:<path to zookeeper jar>:<path to pig jar>:<path to hbase-env.jar of chukwa> <localsrc of script>
Example: pig -Dpig.additional.jars=/usr/lib/hbase/hbase-1.1.1.jar:/usr/lib/zookeeper/zookeeper-3.4.6.jar:/var/pig/pig-0.12.1/pig-0.12.1.jar:/home/chukwa/hbase-env.jar /tmp/4_csv_generation/meanvalueresultcsv.pig
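Every pig invocation in this guideline shares the same long -Dpig.additional.jars argument. As a convenience sketch, the argument can be assembled once in the shell and reused; the jar paths below mirror the examples in the text and must be adjusted to your installation:

```shell
#!/usr/bin/env bash
# Join the additional jar paths with ':' into one variable and reuse
# it for every pig invocation. Paths are the example paths from this
# guide; adjust them to your installation.
JARS=(
  /usr/lib/hbase/hbase-1.1.1.jar
  /usr/lib/zookeeper/zookeeper-3.4.6.jar
  /var/pig/pig-0.12.1/pig-0.12.1.jar
  /home/chukwa/hbase-env.jar
)
ADDITIONAL_JARS=$(IFS=:; echo "${JARS[*]}")   # join array elements with ':'
echo "$ADDITIONAL_JARS"
# e.g.: pig -Dpig.additional.jars="$ADDITIONAL_JARS" /tmp/3_analysis/stracedataanalytic_perjob.pig
```

This avoids retyping the classpath for each of the import, analysis and CSV-generation scripts.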

e) TROUBLESHOOTING

A user might face several issues during the execution of the scripts, either environmental or caused by incorrect invocation. Possible issues and their solutions are given below. The provided scripts themselves have been tested and produce the expected results.

1. Problem: Error: JAVA_HOME is not set.
   Solution: Export the environment variable, e.g. export JAVA_HOME=/etc/alternatives/jre

2. Problem: Running an analysis script produces one of the error logs below:
   [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArrayComparable
   or
   [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableOutputFormat
   Solution: Even though the HBase jar is declared as a parameter of the pig invocation, the system needs the environment variable HBASE_HOME, e.g. export HBASE_HOME=/usr/lib/hbase

3. Problem: pig: command not found
   Solution: Either give the actual path of the pig executable when running the scripts, or declare an environment variable, e.g. export PIG_CLASSPATH=/var/pig/pig-0.12.1/bin:$PIG_CLASSPATH

4. Problem: Running an analysis script produces the warning below:
   WARN snappy.LoadSnappy: Snappy native library not loaded
   Solution: This message appears if the shared library (.so) for snappy is not located in the hadoop native library path. If the libraries are installed in the correct location, the message should not appear. Try e.g. ln -sf /usr/lib64/libsnappy.so /usr/lib/hadoop/lib/native/Linux-amd64-64/
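Troubleshooting items 1-3 all come down to missing environment variables. A consolidated setup sketch, suitable for appending to ~/.bashrc, is shown below; all paths are the example paths from this guide and must be adjusted to your installation. Note that JAVA_HOME and HBASE_HOME each hold a single directory, so no ':' suffix is appended to them:

```shell
#!/usr/bin/env bash
# Consolidated environment setup covering troubleshooting items 1-3.
# All paths are the example paths from this guide; adjust as needed.
export JAVA_HOME=/etc/alternatives/jre                         # item 1
export HBASE_HOME=/usr/lib/hbase                               # item 2
export PIG_CLASSPATH=/var/pig/pig-0.12.1/bin:$PIG_CLASSPATH    # item 3
export PATH=/var/pig/pig-0.12.1/bin:$PATH                      # so 'pig' resolves on the command line
echo "$JAVA_HOME $HBASE_HOME"
```

After sourcing this (or opening a new shell), the pig scripts can be run without the per-command workarounds above.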