Vaidya Guide




Table of contents

1 Purpose
2 Prerequisites
3 Overview
4 Terminology
5 How to Execute the Hadoop Vaidya Tool
6 How to Write and Execute your own Tests

1 Purpose

This document describes the user-facing facets of Hadoop Vaidya, a performance diagnostic tool for MapReduce jobs. It explains how to execute a default set of rules against your MapReduce job counters and how to write and execute new rules to detect specific performance problems. A few sample test rules are provided with the tool, with the aim of growing the rules database over time. You are welcome to contribute new rules for everyone's benefit; to do so, follow the How to Contribute procedure on the Apache Hadoop website.

2 Prerequisites

Ensure that Hadoop is installed and configured. Make sure the HADOOP_HOME environment variable is set, and that Java is installed and configured as part of the Hadoop installation.

3 Overview

Hadoop Vaidya (Vaidya in Sanskrit means "one who knows", or "a physician") is a rule-based performance diagnostic tool for MapReduce jobs. It performs a post-execution analysis of a MapReduce job by parsing and collecting execution statistics from the job history and job configuration files. It runs a set of predefined tests/rules against these execution statistics to diagnose various performance problems. Each test rule detects a specific performance problem with the MapReduce job and provides targeted advice to the user. The tool generates an XML report based on the evaluation results of the individual test rules.

4 Terminology

This section describes the main concepts and terminology used by Hadoop Vaidya.

PostExPerformanceDiagnoser: This class extends the base Diagnoser class and acts as the driver for post-execution performance analysis of MapReduce jobs. It detects performance inefficiencies by executing a set of performance diagnosis rules against the job execution statistics.

Job Statistics: This includes the job configuration information (job.xml) and the various counters logged by a MapReduce job as part of its job history log file.
The counters are parsed and collected into the Job Statistics data structures, which contain job-level aggregate counters as well as a set of counters for each Map and Reduce task.

Diagnostic Test/Rule: This is the program logic that detects an inefficiency of a MapReduce job based on the job statistics. The description of a test is specified as an XML element (DiagnosticTest) in a test description file, e.g. the default test description file,

$HADOOP_HOME/contrib/vaidya/conf/postex_diagnosis_tests.xml. The actual logic is coded as a Java class and referenced in the DiagnosticTest XML element. The following fields make up the DiagnosticTest XML element in a diagnostic test description file.

DiagnosticTest{Title}: Specifies a short name/description of the test.

DiagnosticTest{ClassName}: Specifies the fully qualified name of the class that implements the test logic.

DiagnosticTest{Description}: Specifies a full description of the test rule.

DiagnosticTest{Importance}: Specifies a declarative value for the overall importance of the test rule. (Values: High, Medium, Low)

DiagnosticTest{SuccessThreshold}: A threshold value specified by the test case writer: if the impact level of the test case is lower than this threshold, the test is declared PASSED (or NEGATIVE). The impact level is calculated and returned by the individual test's evaluate function, and indicates the degree of the problem the job has with respect to the condition being evaluated.

DiagnosticTest{Prescription}: Targeted advice, written by the test case author, for the user to follow when the test does not pass.

DiagnosticTest{InputElement}: An optional test-specific input supplied to the individual test case class. This is typically test configuration information, so that the test writer need not change the Java code of the test case but can instead configure it through these input values.

The following elements make up the performance analysis report generated by the tool in XML format.

PostExPerformanceDiagnosticReport: The document (root) element of the XML report generated by the tool.
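For reference, a single DiagnosticTest entry combining the fields described above might look roughly like the following sketch. This is not copied from the shipped file: the title, class name, threshold, and input element are invented for illustration, and the exact layout in your release's postex_diagnosis_tests.xml (e.g. use of CDATA sections) may differ.

```xml
<DiagnosticTest>
  <Title>Excessive map-side spills (hypothetical)</Title>
  <ClassName>com.example.vaidya.ExcessiveSpillTest</ClassName>
  <Description>Flags jobs whose map tasks spill each record more than once.</Description>
  <Importance>High</Importance>
  <SuccessThreshold>0.3</SuccessThreshold>
  <Prescription>Increase the map-side sort buffer to reduce spilled records.</Prescription>
  <InputElement>
    <MaxSpillsPerRecord>1.0</MaxSpillsPerRecord>
  </InputElement>
</DiagnosticTest>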
TestReportElement: An XML element of the test report document, one for each individual test specified in the test description file.

TestReportElement{TestTitle}: Included from DiagnosticTest{Title}.

TestReportElement{TestDescription}: Included from DiagnosticTest{Description}.

TestReportElement{TestImportance}: Included from DiagnosticTest{Importance}.

TestReportElement{TestSeverity}: The product of the test impact level and the test importance; it indicates the overall severity of the test.

TestReportElement{ReferenceDetails}: Test-specific runtime information provided by the test case to support the test result and severity. Typically the test writer should print the test impact level in this section.

TestReportElement{TestResults}: The boolean outcome of the test, based on the SuccessThreshold specified by the test writer in the DiagnosticTest description. PASSED (NEGATIVE) indicates no problem; FAILED (POSITIVE) indicates a potential problem with the job for the given test case.

TestReportElement{TestPrescription}: Included from DiagnosticTest{Prescription}, unless the test case writer overrides it in the test case class through the getPrescription() method.

5 How to Execute the Hadoop Vaidya Tool

The script to execute Hadoop Vaidya is in the $HADOOP_HOME/contrib/vaidya/bin/ directory. It comes with a default set of rules defined in the file $HADOOP_HOME/contrib/vaidya/conf/postex_diagnosis_tests.xml.

Make sure the HADOOP_HOME environment variable is set, and that Java is installed and configured. Execute the Hadoop Vaidya script with -help (or without any arguments) to get the command line help, e.g.

=>sh $HADOOP_HOME/contrib/vaidya/bin/vaidya.sh -help

You need to supply the job's configuration file (-jobconf job_conf.xml), the job history log file (-joblog job_history_log_file), and optionally a test description file (-testconf postex_diagnostic_tests.xml). If no test description file is specified, the default one is picked up from the Hadoop Vaidya jar ($HADOOP_HOME/contrib/vaidya/hadoop-{version}-vaidya.jar). This default test description file is also available at the following location, so that users can make a local copy, modify it, and add new test rules: $HADOOP_HOME/contrib/vaidya/conf/postex_diagnostic_tests.xml

Use the -report report_file option to store the XML report into the specified report_file.

6 How to Write and Execute your own Tests

Writing and executing your own test rules is not very hard. You can take a look at the Hadoop Vaidya source code for the existing set of tests, available in the Hadoop SVN repository; the default set of tests is under the "postexdiagnosis/tests/" folder.
To write a test class for your new test case, extend the org.apache.hadoop.vaidya.DiagnosticTest class and override the following three methods from the base class: evaluate(), getPrescription(), and getReferenceDetails().
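The override pattern can be sketched as follows. This is illustrative only, not the real Hadoop API: the actual base class and Job Statistics object ship in the Vaidya jar, so to keep the sketch self-contained it defines simplified stand-ins inline, with hypothetical statistics fields and an invented spill-counting rule. Only the three method names follow this guide.

```java
// Stand-in for the Job Statistics object handed to a test (fields are hypothetical).
class JobStatistics {
    long mapOutputRecords;
    long spilledRecords;
    JobStatistics(long mapOutputRecords, long spilledRecords) {
        this.mapOutputRecords = mapOutputRecords;
        this.spilledRecords = spilledRecords;
    }
}

// Stand-in for the Vaidya base class; the real one is
// org.apache.hadoop.vaidya.DiagnosticTest in the Vaidya jar.
abstract class DiagnosticTest {
    public abstract double evaluate(JobStatistics job); // impact level, 0.0 to 1.0
    public abstract String getPrescription();           // advice when the test fails
    public abstract String getReferenceDetails();       // runtime detail for the report
}

// A hypothetical rule: flag jobs whose map side spills each record more than once.
class ExcessiveSpillTest extends DiagnosticTest {
    private double impact;

    @Override
    public double evaluate(JobStatistics job) {
        if (job.mapOutputRecords == 0) {
            impact = 0.0;
        } else {
            // More than one spill per record suggests the map-side sort buffer is too small.
            double spillsPerRecord = (double) job.spilledRecords / job.mapOutputRecords;
            impact = Math.min(1.0, Math.max(0.0, spillsPerRecord - 1.0));
        }
        return impact;
    }

    @Override
    public String getPrescription() {
        return "Increase the map-side sort buffer to reduce spilled records.";
    }

    @Override
    public String getReferenceDetails() {
        return "Impact level: " + impact;
    }
}
```

The evaluate() method returns the impact level that is compared against the SuccessThreshold described in the Terminology section; getPrescription() and getReferenceDetails() supply the TestPrescription and ReferenceDetails elements of the generated report.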

Make a local copy of the $HADOOP_HOME/contrib/vaidya/conf/postex_diagnostic_tests.xml file, or create a new test description XML file, and add the test description element for your new test case to it. Compile your new test class (or classes), archive them into a jar file, and add it to the CLASSPATH, e.g. export CLASSPATH=$CLASSPATH:newtests.jar. Then execute the Hadoop Vaidya script with the job configuration, the job history log, and a reference to the newly created test description file via the -testconf option:

=>sh $HADOOP_HOME/contrib/vaidya/bin/vaidya.sh -joblog job_history_log_file -jobconf job.xml -testconf new_test_description_file -report report.xml

Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.