Ankush Cluster Manager - Hadoop2 Technology User Guide




Ankush User Manual 1.5

Ankush User's Guide for Hadoop2, Version 1.5

This manual, and the accompanying software and other documentation, is protected by U.S. and international copyright laws, and may be used only in accordance with the accompanying license agreement. Features of the software, and of other products and services of Impetus Technologies, may be covered by one or more of the following patents: U.S. Patent Nos. Other patents pending. All rights reserved.

All other company, brand and product names are registered trademarks or trademarks of their respective holders. Impetus Technologies disclaims any responsibility for specifying which marks are owned by which companies or which organizations.

USA, Los Gatos
Impetus Technologies, Inc.
720 University Avenue, Suite 130
Los Gatos, CA 95032, USA
Ph: 408.252.7111, 408.213.3310
Fax: 408.252.7114

2014 Impetus Technologies, Inc., All rights reserved.

If you have any comments or suggestions regarding this document, please send them via e-mail to support@ankush.com.

AnkushUG1.5/01

Table of Contents

List of Figures
List of Tables
1 Chapter-1 Introduction
    1.1 Objective
    1.2 Scope
    1.3 Target Audience
2 Chapter-2 Ankush - Hadoop2 Capabilities
3 Chapter-3 Hadoop2 Ecosystem Support (with versions)
4 Chapter-3 Creating Hadoop2 Cluster
    4.1 Common Inputs & Actions
    4.2 Hadoop2 Specific Inputs & Actions
5 Chapter-4 Managing Hadoop2 Cluster
    5.1 Cluster Details
    5.2 Node List
    5.3 Node Details
    5.4 Management Actions
    5.5 Submitting Application
    5.6 Application Monitoring
    5.7 Configuration
        5.7.1 Cluster
        5.7.2 Hadoop Ecosystem
        5.7.3 Parameters
6 Chapter-5 Monitoring Hadoop2 Cluster
    6.1 Technology Specific Details
        6.1.1 Node Summary
        6.1.2 HDFS Summary
        6.1.3 Jobs Summary
        6.1.4 Hadoop Details
        6.1.5 Hadoop Ecosystem Details
    6.2 Node Utilization Graph
7 Chapter-6 Tiles Summary
8 Glossary
    8.1 List of terms
9 Revision History

Impetus Confidential

List of Figures

Figure 1: Blank Hadoop2 Cluster creation page
Figure 2: Filled Hadoop2 cluster creation page
Figure 3: Configuring Hadoop2 Cluster
Figure 4: HBase Configuration
Figure 5: Hadoop2: Hive Configuration
Figure 6: Hadoop2: Hive - Configurable Parameters
Figure 7: Hadoop2 Cluster Details
Figure 8: Hadoop2 Node List
Figure 9: Hadoop2 Node Details
Figure 10: Management Actions
Figure 11: Submit Application
Figure 12: Application Monitoring
Figure 13: Application Details
Figure 14: Manage Configurations
Figure 15: Cluster Configurations
Figure 16: Hadoop Ecosystem
Figure 17: Hadoop2 Advanced settings
Figure 18: Hadoop - Parameters
Figure 19: Cluster Monitoring - Generic features
Figure 20: Nodes Summary
Figure 21: HDFS Usage
Figure 22: Jobs Summary
Figure 23: Hadoop Details
Figure 24: Hadoop Ecosystem
Figure 25: Hadoop Ecosystem (Zookeeper) Details
Figure 26: Node Utilization Graph

List of Tables

Table 1: Hadoop2 Ecosystem Support
Table 2: Hadoop2 Advanced Settings - Configurable parameters
Table 3: Hadoop2 Ecosystem Components
Table 4: Hadoop2: HBase - Configurable Parameters
Table 5: Hadoop2: Hive - Configurable Parameters
Table 6: Hadoop2: Zookeeper - Configurable Parameters
Table 7: Tiles Summary
Table 8: Revision History

Welcome to Ankush

Welcome to Ankush, Impetus's Big Data cluster management and auto-provisioning product that creates and manages clusters of different technologies. Ankush provides visual, graphical, and email notifications regarding the health of a cluster that allow cluster administrators to take informed actions.

Using This Guide

This guide describes how to use Ankush for big data cluster management for Hadoop2 technology. It provides step-by-step information and instructions (as required) to create, monitor and manage clusters with Ankush.

PART I Introduction
Provides an introduction to the Hadoop2 capabilities of Ankush and lists all the supported features.

PART II Creating Hadoop2 Cluster
Describes how to create a new Hadoop2 cluster, with all the relevant screenshots and details related to the configurable parameters.

PART III Managing & Monitoring Hadoop2 Cluster
Describes the various options and configurable parameters available to the user that can be applied to a running Hadoop2 cluster. Provides the options through which a Hadoop2 cluster can be managed and configured. It also describes various options to monitor the health of a running Hadoop2 cluster. This includes viewing utilization reports, heat maps etc.

Glossary
Provides definitions and descriptions of difficult technical terms and jargon.

Ankush Documentation Set

In addition to this user's guide, Ankush comes with the following printed documentation.

- Ankush Installation Guide: explains how to install Ankush.
- Ankush - Getting Started Guide: for getting started with Ankush and understanding the UI elements, hierarchical views, user management and navigational possibilities.
- Ankush User's Guide for Cassandra: explains creating, managing and monitoring a Cassandra technology cluster using Ankush.
- Ankush User's Guide for Kafka: explains creating, managing and monitoring a Kafka technology cluster using Ankush.
- Ankush User's Guide for Storm: explains creating, managing and monitoring a Storm technology cluster using Ankush.
- Ankush User's Guide for Elastic Search: explains creating, managing and monitoring an Elastic Search technology cluster using Ankush.
- Ankush User's Guide for Oracle NoSQL: explains creating, managing and monitoring an Oracle NoSQL technology cluster using Ankush.

Part I Introduction

Typographical Conventions

This guide uses the following typographical conventions:

Convention            Meaning
1, 2, 3               Sequence of steps in a procedure
Bullets               Used to indicate features; also indicate options and steps
Courier New           Used to indicate code
Bold                  Indicates important notes, method or function names
[ ]                   Encloses optional arguments
Figure 1, Figure 2    Indicates figure numbers in sequential order
Table 1, Table 2      Indicates table numbers in sequential order
1.1, 1.2              Indicates sub-topics

1 Chapter-1 Introduction

Ankush Cluster Manager, or Ankush, is a Big Data cluster management and auto-provisioning web application that creates and manages clusters of different technologies. Ankush provides visual, graphical, and email notifications regarding the health of a cluster that allow cluster administrators to take informed actions.

1.1 Objective

The objective of this document is to help users of Ankush explore and understand the features and functionality offered by the product. It also provides step-by-step guidance on the operating steps, information flow (with screenshots), navigation methods etc.

1.2 Scope

The scope of this document is to describe in detail all the features and functionality that Ankush offers for Hadoop2 technology. This includes a description of each feature along with product screenshots.

Prerequisites: The reader is expected to have a basic understanding of Ankush, its UI elements, hierarchical views, user management, navigational possibilities, common cluster creation inputs and actions, common cluster management and monitoring etc., as elaborated in Ankush - Getting Started Guide.

1.3 Target Audience

The target audience of this document is cluster implementers and cluster managers.

2 Chapter-2 Ankush - Hadoop2 Capabilities

Ankush brings forth a wide-ranging set of features, as follows:

- Hadoop2 Big Data Technology
    - Supported Vendors: Apache (2.2.0)
- Management
    - Services
    - Application Submission
    - Application Schedulers
    - Parameters
    - Configuring Alerts/Warnings threshold
- Monitoring
    - Applications
    - Graphically represented Cluster Level Utilization Metrics
    - Graphically represented Node Level Utilization Metrics
    - Heat Maps
    - Logs
    - Audit Trail
    - Events

3 Chapter-3 Hadoop2 Ecosystem Support (with versions)

The supported Hadoop2 Ecosystem components are as follows:

Component    Vendors & Versions Supported
Flume        Apache (1.4.0, 1.2.0)
HBase        Apache (0.98.0 - Hadoop2)
Hive         Apache (0.11.0)
Mahout       -
Oozie        -
Pig          Apache (0.11.1)
Solr         Apache (4.0.0-BETA, 3.6.1)
Sqoop        Apache (1.4.4. hadoop-2.0.4-alpha)
Zookeeper    Apache (3.4.5, 3.3.5)

Table 1: Hadoop2 Ecosystem Support

Part II Creating Hadoop2 Cluster

4 Chapter-3 Creating Hadoop2 Cluster

4.1 Common Inputs & Actions

Several inputs are common to the creation of any cluster, as detailed in the Getting Started Guide.

- General Details: Includes the cluster name and its description.
- Java: Covers installing a fresh Java or using an existing Java installation.
- Nodes: Includes authenticating nodes, retrieving them for further operations, and inspecting whether they satisfy the basic installation prerequisites for cluster deployment, such as Firewall, Require TTY, Sudo user etc.
    - Node Authentication
    - Node Retrieval
    - Node Inspection
- Configuration: Includes configuring the cluster with various generic parameters. The technology-specific parameters are covered in the next section.
- Deploying Cluster: Describes how to deploy a cluster.
- Tracking Cluster Deployment Progress: Describes how to track the progress of a cluster deployment.

Please refer to Cluster Creation - Getting Started Guide for more information.

A screenshot of the default blank Hadoop2 cluster creation page is shown below:

Figure 1: Blank Hadoop2 Cluster creation page

A screenshot of a filled Hadoop2 cluster creation page is shown below:

Figure 2: Filled Hadoop2 cluster creation page

4.2 Hadoop2 Specific Inputs & Actions

Use the vendor, version and source inputs to configure the vendor, version and bundle source for the Hadoop2 release that needs to be deployed.

Note: In many of the path-related configuration parameters, the default value that appears contains $user. This placeholder is automatically replaced with the user name provided in the user name input of node authentication.

Other than these, there are a few more Hadoop2 technology specific inputs, as follows.

Figure 3: Configuring Hadoop2 Cluster
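The $user substitution described in the note above can be pictured with a minimal sketch. The path templates and the helper name `expand_paths` below are hypothetical and chosen only for illustration; the actual defaults are the ones shown in the Ankush UI, and how Ankush performs the substitution internally is not documented here.

```python
from string import Template

# Hypothetical path templates containing the $user placeholder.
# The real default paths are the ones displayed on the cluster creation page.
DEFAULT_PATHS = {
    "NameNode Path": "/home/$user/hadoopdirs/name",
    "DataNode Path": "/home/$user/hadoopdirs/data",
}


def expand_paths(templates, user):
    """Replace the $user placeholder in every template with the given user name."""
    return {
        field: Template(path).substitute(user=user)
        for field, path in templates.items()
    }


# "hadoop" stands in for the user name entered during node authentication.
resolved = expand_paths(DEFAULT_PATHS, "hadoop")
for field, path in resolved.items():
    print(f"{field}: {path}")
```

The point of the sketch is only that the placeholder is resolved once, from the node authentication user name, so every path-type field ends up under the same user's home directory.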

Note: The default value is either the one suggested by the technology-specific component or the one suggested by Ankush.

1. S3 Support Access Key: ID of the account to access S3.
2. S3 Support Secret Key: Authentication key of the account to access S3.
3. S3n Support Access Key: ID of the account to access S3n.
4. S3n Support Secret Key: Authentication key of the account to access S3n.

The parameters that can be configured to match the user's deployment environment are as follows:

Sl. No.  Field                      Default value / path
1        NameNode Path              /home//hes/hadoopdirs/name
2        DataNode Path              /home//hes/hadoopdirs/data
3        Mapred Temp Path           /home//hes/hadoopdirs/mrtmp
4        Hadoop Temp Path           /home//hes/hadoopdirs/hadooptmp
5        Web Application Proxy      Enabled
6        Web App Proxy Node         List of retrieved nodes
7        Resource Manager Node      List of retrieved nodes
8        Job History Server         Disabled, List of retrieved nodes
9        High Availability          Enabled
10       Nameservice ID             Value of Cluster Name
11       StandBy NameNode           List of retrieved nodes
12       NameNode ID 1              nn1
13       NameNode ID 2              nn2
14       Journal Nodes              List of retrieved nodes
15       Journal Nodes Dir          /home//hes/hadoopdirs/jndata
16       Automatic Failover         Enabled

Table 2: Hadoop2 Advanced Settings - Configurable parameters

High Availability Configuration: Hadoop2 can be deployed with high availability support by enabling the High Availability option in the Hadoop configuration. It is enabled by default. When it is enabled, the Secondary NameNode column/input is disabled on the node list. To deploy a Hadoop2 cluster with a SecondaryNameNode, the user needs to disable high availability. The Zookeeper component must be selected for a high availability configuration.

During Hadoop2 cluster setup, the user can also set up nine ecosystem components along with Hadoop. Those components are:

1. Flume
2. HBase
3. Hive
4. Mahout
5. Oozie
6. Pig
7. Solr
8. Sqoop
9. Zookeeper

Table 3: Hadoop2 Ecosystem Components

Of these components, HBase, Hive and Zookeeper take considerably more inputs than the others. The inputs are mainly related to nodes and advanced configuration settings. Click > against a selected component to configure it further.

HBase configuration details are as follows:
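To relate the High Availability inputs of Table 2 (Nameservice ID, NameNode IDs, Journal Nodes, Journal Nodes Dir, Automatic Failover) to Hadoop's own settings, the following sketch builds the standard HDFS HA property names from such inputs. Whether Ankush writes exactly these properties is an assumption; the host names, the example directory, and the ports 8020 (NameNode RPC) and 8485 (JournalNode) are illustrative values, and `ha_properties` is a hypothetical helper.

```python
def ha_properties(nameservice, namenodes, journal_nodes, journal_dir,
                  automatic_failover=True):
    """Build HDFS HA properties (hdfs-site.xml style) as a plain dict.

    namenodes maps NameNode IDs (for example nn1, nn2) to host names.
    journal_nodes is a list of JournalNode host names.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.journalnode.edits.dir": journal_dir,
        "dfs.ha.automatic-failover.enabled": str(automatic_failover).lower(),
    }
    # One RPC address per NameNode ID; 8020 is an assumed port.
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    # Shared edits are stored on the JournalNode quorum; 8485 is an assumed port.
    quorum = ";".join(f"{host}:8485" for host in journal_nodes)
    props["dfs.namenode.shared.edits.dir"] = f"qjournal://{quorum}/{nameservice}"
    return props


# Hypothetical cluster: nameservice named after the cluster, two NameNodes.
example = ha_properties(
    nameservice="mycluster",
    namenodes={"nn1": "node1.example.com", "nn2": "node2.example.com"},
    journal_nodes=["node1.example.com", "node2.example.com", "node3.example.com"],
    journal_dir="/data/jndata",
)
for key in sorted(example):
    print(f"{key} = {example[key]}")
```

Automatic failover in HDFS relies on a ZooKeeper quorum, which is consistent with the requirement above that the Zookeeper component be selected for a high availability configuration.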

Figure 4: HBase Configuration

The parameters on the Hadoop2 HBase page that can be configured to match the user's deployment environment are as follows:

Sl. No.  Field                   Default value / path
1        Region Servers          List of retrieved nodes
2        File Size               10737418240 (bytes)
3        Compaction Threshold    3
4        Cache Size              0.25 (%)
5        Caching                 1
6        Timeout                 180000 (milliseconds)
7        Multiplier              2
8        Major Compaction        86400000 (milliseconds)
9        Max Size                10485760 (bytes)
10       Flush Size              134217728 (bytes)
11       Handler Count           10

Table 4: Hadoop2: HBase - Configurable Parameters

Hive configuration details are as follows:

Figure 5: Hadoop2: Hive Configuration

The parameters on the Hadoop2 Hive page that can be configured to match the user's deployment environment are as follows:

Sl. No.  Field                     Default value / path
1        Hive Server               List of retrieved nodes
2        Connection Driver Name    org.apache.derby.jdbc.EmbeddedDriver
3        Connection URL            jdbc:derby:;databaseName=metastore_db;create=true
4        Connection User Name      APP
5        Connection Password       Mine

Table 5: Hadoop2: Hive - Configurable Parameters

Zookeeper configuration details are as follows:

Figure 6: Hadoop2: Hive - Configurable Parameters

1. Zookeeper Nodes: Nodes for Zookeeper.

The parameters on the Hadoop2 Zookeeper page that can be configured to match the user's deployment environment are as follows:

Sl. No.  Field          Default value / path
1        Tick Time      2000 (milliseconds)
2        Client Port    2182
3        Data Dir       /home//hes/zookeeper/zk_data_dir/
4        Sync Limit     2 (ticks)
5        Init Limit     5 (ticks)

Table 6: Hadoop2: Zookeeper - Configurable Parameters

Configuring Roles: From the retrieved node list, configure the nodes that need to be used as NameNode, SecondaryNameNode and DataNodes. Selecting a SecondaryNameNode is optional.

Note:
1. For each ecosystem component that needs to be installed, configure its vendor, version, bundle path and installation path if any custom changes are required. Otherwise, the default values and paths mentioned will be used.
2. Flume, Mahout, Oozie, Pig, Solr and Sqoop (if selected) are deployed only on the node that is configured as NameNode.
3. The HBase master is always deployed on the node configured as NameNode.

To deploy the cluster in the environment, click the Deploy button in the top right corner of the screen.
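The numeric defaults in Tables 4 and 6 are given as raw bytes, milliseconds and tick counts, which are easy to misread. The sketch below is only a convenience for sanity-checking such values; the helper names are hypothetical. It relies on one fact about ZooKeeper itself: Sync Limit and Init Limit are counted in ticks, so the effective timeout is the limit multiplied by Tick Time.

```python
def bytes_to_human(n):
    """Convert a byte count to a binary-unit string such as '10 GiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:g} {unit}"
        n /= 1024


def ms_to_hours(ms):
    """Convert milliseconds to hours."""
    return ms / (1000 * 60 * 60)


def zk_limit_ms(tick_time_ms, limit_ticks):
    """Effective ZooKeeper timeout: number of ticks times the tick length."""
    return tick_time_ms * limit_ticks


# HBase defaults from Table 4.
print("File Size:", bytes_to_human(10737418240))
print("Max Size:", bytes_to_human(10485760))
print("Flush Size:", bytes_to_human(134217728))
print("Major Compaction (hours):", ms_to_hours(86400000))

# ZooKeeper defaults from Table 6.
print("Sync Limit (ms):", zk_limit_ms(2000, 2))
print("Init Limit (ms):", zk_limit_ms(2000, 5))
```

With the defaults above, a follower gets 10 seconds to connect and sync initially and 4 seconds to stay in sync afterwards; raising Tick Time scales both limits.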

Part III Managing & Monitoring Hadoop2 Cluster

5 Chapter-4 Managing Hadoop2 Cluster

This section lists the features through which a cluster is managed in Ankush. The management functions include:

- Adding nodes to a cluster: The Add Nodes link in the Actions dropdown enables you to add nodes.
- Configuring Cluster: Provides a preview-only view of the configuration with which the cluster was launched. The contents shown may vary depending on the configuration values used for cluster deployment and on the environment of the cluster.
- Configuring Alert levels: Alert configuration allows the user to configure alert and warning limits for a cluster. The user can set alert and warning thresholds for CPU and memory utilization; these apply to all the nodes of the cluster on which they are configured.
- View Node Details: Click Nodes > to view the details of the nodes. This opens a list of nodes that are members of this cluster. The user can perform node-related operations such as viewing further details, deleting nodes etc.
- Deleting nodes: This function enables the user to delete nodes.
- Deleting cluster: This function enables the user to delete a cluster.

Please refer to Cluster Management section - Getting Started Guide for more information.

5.1 Cluster Details

The Cluster Details page for a Hadoop2 cluster covers the Hadoop2-specific items in addition to the generic items covered in Ankush - Getting Started Guide. A screenshot of the Hadoop2 Cluster Details page is shown below:

Figure 7: Hadoop2 Cluster Details

5.2 Node List

A screenshot of the Hadoop2 node list page is shown below.

Figure 8: Hadoop2 Node List

5.3 Node Details

A screenshot of the Hadoop2 node details page is shown below:

Figure 9: Hadoop2 Node Details

Note: For an on-premises Hadoop cluster deployment, the user can configure the exact nodes where particular components need to be installed. Furthermore, the user can also assign various roles specific to each technology (Hadoop, Hive, HBase & Zookeeper).

5.4 Management Actions

This opens the Actions menu, which provides various cluster management options as shown below:

Figure 10: Management Actions

Management Actions provides quick links and action options for functions such as viewing logs, viewing events, viewing audit trails etc. These are the same links that appear in the Management Links section - Getting Started Guide. The Add Nodes and Delete Cluster functionality is explained in the Management Actions section - Getting Started Guide. The Manage Configurations action enables the user to manage configurations, and is the same as the Configuration link in the Management Links section. Refer to the Submitting Application section for more information regarding the Submit Jobs option.

5.5 Submitting Application

To submit an application to a Hadoop2 cluster, click Submit Application in the Actions menu. This opens the following screen, where the user needs to provide the name of the execution class in the application input and upload the application *.jar file by clicking in the Jar path input. Depending on the number of arguments required, the user can add arguments by clicking the add control against Application Argument as many times as needed, and then provide values for those arguments. Click the Submit button to submit the application. To remove an argument, click the remove control against that particular argument.
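The three inputs of the Submit Application screen (execution class, jar file, arguments) correspond to what one would otherwise pass to the `hadoop jar` command by hand. The sketch below only assembles such a command line for comparison; it is not a description of how Ankush submits applications internally, and the jar name, class name and arguments are hypothetical.

```python
import shlex


def build_submit_command(jar_path, main_class, args=()):
    """Assemble a 'hadoop jar' command from the same inputs the form asks for."""
    if not jar_path.endswith(".jar"):
        raise ValueError("application file must be a .jar")
    return ["hadoop", "jar", jar_path, main_class, *args]


# Hypothetical application with two arguments (input and output locations).
cmd = build_submit_command(
    "wordcount.jar",
    "org.example.WordCount",
    ["/input", "/output dir"],
)

# shlex.join quotes arguments that contain spaces, so the line is copy-paste safe.
print(shlex.join(cmd))
```

Keeping the arguments as a list, one entry per form field, avoids quoting mistakes when a value contains spaces.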

Figure 11: Submit Application

5.6 Application Monitoring

This opens the view for application monitoring:

Figure 12: Application Monitoring

This feature lets the user view and manage applications running on the Hadoop cluster. Filters based on application status can be applied to list only the applications in a required state.

Knowing more details about a job

To see more details about an application, click > against the application. It provides application-related details as shown below.

Figure 13: Application Details

5.7 Configuration

This opens the configuration management page, which is also accessible through the Configuration management link, as shown below:

Figure 14: Manage Configurations

5.7.1 Cluster

This provides a preview-only view of the configuration with which the cluster was launched. The contents shown may vary depending on the configuration values used for cluster deployment and on the environment of the cluster.

Figure 15: Cluster Configurations

5.7.2 Hadoop Ecosystem

This provides a read-only view of the setup configuration of the various installed ecosystem components.

Figure 16: Hadoop Ecosystem

Figure 17: Hadoop2 Advanced settings

5.7.3 Parameters

This functionality lets the user modify the configuration of components. The user can add new parameters or modify the values of existing parameters.

Figure 18: Hadoop - Parameters

For more information about adding, updating or deleting parameters, refer to the Parameters section - Getting Started Guide.

6 Chapter-5 Monitoring Hadoop2 Cluster

An important aspect of cluster management is monitoring the cluster's health, progress status etc. Ankush provides a graphical representation and reporting system covering many aspects of the cluster. The following generic features are covered across technologies:

- Tiles: At the very top, various tiles are shown, which can be classified into categories such as alerts, warnings and information. Some of the tiles are clickable and open the corresponding page. To view the Hadoop2 technology specific tiles, refer to the Tiles Summary section.
- Trend utilization of CPU, Memory, Network, Load and Packet.
- Heat Map: This section provides a CPU / Memory utilization heat map of the cluster nodes. Each block corresponds to a node in the cluster. The color of each block changes independently according to the utilization on the corresponding node. By default, CPU utilization is shown. Click the required metric, CPU or Memory, to view its utilization heat map. Rest the mouse pointer on a block to identify the IP of the node it corresponds to. Clicking the block of a particular node opens the node details page for that node.
- EcoSystem: This section provides details about the various ecosystem components installed on the cluster. Individual component details can be viewed by clicking > against each component. This normally includes information related to Nodes, Configuration, Events, Logs and Audit Trails. The reporting system also provides additional information about other components, such as Job Monitoring for a Hadoop2 cluster.
- Events: Summarizes the events that occurred on the cluster.
- Logs: Allows the user to view or download the logs of a cluster.
- Audit Trail: Helps to track changes in the configuration.

Please refer to Cluster Monitoring Section - Getting Started Guide for more information.
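The alert and warning thresholds and the per-node coloring of the heat map follow the same idea: each node's utilization is compared against two limits. The sketch below illustrates that classification only. The threshold values (70 and 90 percent), the level names, and the node IPs are assumptions made for the example; the actual limits are whatever is configured under Configuring Alert levels, and the guide does not state the heat map's color bands.

```python
def classify(utilization, warning=70.0, alert=90.0):
    """Classify a utilization percentage against warning and alert thresholds."""
    if not 0 <= utilization <= 100:
        raise ValueError("utilization must be a percentage between 0 and 100")
    if warning > alert:
        raise ValueError("warning threshold must not exceed alert threshold")
    if utilization >= alert:
        return "alert"
    if utilization >= warning:
        return "warning"
    return "normal"


def heat_map(cpu_by_node, warning=70.0, alert=90.0):
    """Map each node IP to its level, one entry per heat map block."""
    return {ip: classify(cpu, warning, alert) for ip, cpu in cpu_by_node.items()}


# Hypothetical CPU utilization readings keyed by node IP.
readings = {"10.0.0.1": 35.0, "10.0.0.2": 78.5, "10.0.0.3": 96.0}
levels = heat_map(readings)
for ip, level in levels.items():
    print(ip, level)
```

Because the thresholds are applied per node, one overloaded node changes only its own block, which matches the description of blocks changing color independently.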

Figure 19: Cluster Monitoring - Generic features

Note: Each technology has a Cluster Details page that displays a Node List section, through which the user can navigate further to the Node Details page. Refer to the individual sections for more details.

6.1 Technology Specific Details

The technology specific details section covers a summary of nodes, HDFS and jobs, along with details of the installed Hadoop and ecosystem components. Refer to the individual sections below for more information.

6.1.1 Node Summary

The Nodes Summary section provides a summary of nodes that includes information about the ResourceManager, Active NameNode, StandBy NameNode, live DataNodes and active NodeManagers.

Figure 20: Nodes Summary

6.1.2 HDFS Summary

The HDFS Usage section provides a summary of HDFS usage that includes DFS used, DFS remaining, non-DFS used, configured capacity, and the timestamp when the NameNode was started.

Figure 21: HDFS Usage

6.1.3 Jobs Summary

The Jobs Summary section provides a summary of jobs that includes running applications, completed applications, available memory, total memory, and the timestamp when the ResourceManager was started.
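The HDFS and Jobs summaries above report raw quantities, and a few derived values are often more useful when reading them. The sketch below is a small calculator under stated assumptions: non-DFS used is taken as configured capacity minus DFS used minus DFS remaining (the convention of Hadoop releases of this generation; newer releases may account for reserved space differently), and used memory is taken as total minus available. The function names and sample numbers are hypothetical.

```python
def hdfs_summary(configured_capacity, dfs_used, dfs_remaining):
    """Derive non-DFS usage and percentages from the HDFS usage figures."""
    if configured_capacity <= 0:
        raise ValueError("configured capacity must be positive")
    non_dfs_used = configured_capacity - dfs_used - dfs_remaining
    return {
        "non_dfs_used": non_dfs_used,
        "dfs_used_pct": 100.0 * dfs_used / configured_capacity,
        "dfs_remaining_pct": 100.0 * dfs_remaining / configured_capacity,
    }


def memory_used_pct(total_memory, available_memory):
    """Percentage of cluster memory currently in use by applications."""
    if total_memory <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * (total_memory - available_memory) / total_memory


# Hypothetical figures in GB for HDFS and MB for YARN memory.
usage = hdfs_summary(configured_capacity=1000, dfs_used=250, dfs_remaining=600)
print(usage)
print("memory used %:", memory_used_pct(total_memory=8192, available_memory=6144))
```

A large non-DFS share usually means other files on the DataNode disks are consuming space that HDFS cannot use.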

Figure 22: Jobs Summary

6.1.4 Hadoop Details

The Hadoop Details section provides the Hadoop-related configuration details, such as vendor, version and DFS replication, with which the cluster was deployed.

Figure 23: Hadoop Details

6.1.5 Hadoop Ecosystem Details

The Hadoop Ecosystem section lists all the Hadoop ecosystem components that are part of the given cluster.

Figure 24: Hadoop Ecosystem

Figure 25: Hadoop Ecosystem (Zookeeper) Details

6.2 Node Utilization Graph

Node utilization graphs can be accessed through the UTILIZATION GRAPHS > link on the Node Details page. A screenshot of the Hadoop2 node utilization graph page is shown below. The user can view the cumulative trend for the last hour, day, week, month or year, and the utilization graphs are drawn accordingly.

Figure 26: Node Utilization Graph

Part IV Tiles Summary

7 Chapter-6 Tiles Summary

The following table summarizes the Hadoop2 technology specific tiles shown on various pages:

Page             Tile Details                                Type / Comment
Cluster Details  Deployed Ecosystems Count                   Unconditional, always visible
Cluster Details  Data Nodes Count                            Unconditional, always visible
Cluster Details  NameNode Down, Could not get dfs data       Conditional, visible if NameNode Down
Cluster Details  Job Tracker Down, Could not get Job data    Conditional, visible if Job Tracker Down
Node List        NameNode                                    Unconditional, always visible
Node List        DataNodes                                   Unconditional, always visible
Node List        Secondary NameNode                          Unconditional, always visible

Note: Common tiles like the Node Role tile are listed in the Getting Started Guide.

Table 7: Tiles Summary
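The difference between unconditional and conditional tiles in Table 7 can be expressed as a small rule set. The sketch below encodes the table rows as data and returns the tiles that would be visible for a given page and service state. It is an illustration of the table only; the data structure and the function `visible_tiles` are hypothetical and do not reflect Ankush's implementation.

```python
# Each entry: (page, tile, condition). A condition of None means always visible;
# otherwise the tile is shown only when the named service is down.
TILES = [
    ("Cluster Details", "Deployed Ecosystems Count", None),
    ("Cluster Details", "Data Nodes Count", None),
    ("Cluster Details", "NameNode Down, Could not get dfs data", "NameNode"),
    ("Cluster Details", "Job Tracker Down, Could not get Job data", "Job Tracker"),
    ("Node List", "NameNode", None),
    ("Node List", "DataNodes", None),
    ("Node List", "Secondary NameNode", None),
]


def visible_tiles(page, services_down=()):
    """Return the tiles shown on a page, given the set of services that are down."""
    down = set(services_down)
    return [
        tile
        for tile_page, tile, condition in TILES
        if tile_page == page and (condition is None or condition in down)
    ]


print(visible_tiles("Cluster Details"))
print(visible_tiles("Cluster Details", services_down=["NameNode"]))
```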

Part V Glossary

8 Glossary

This section defines the terms and technical jargon used in this manual, to help readers understand the content better.

8.1 List of terms

- Cluster: A collection of nodes.
- ~: Implies the user's home directory. For the root user, the home directory is /root/; for any other user, the home directory is normally /home/username/.
- Cluster Details: Page that provides basic information and links for working with clusters.
- Node List: Page that provides the list of nodes that are part of a cluster.
- Node Details: Page that provides node-specific details (node role, services) and options to perform node-level operations.
- Cluster Level Graphs: Page that provides cluster-level graphs.
- Node Level Graphs: Page that provides node-level graphs.

9 Revision History

This section describes the revision history of the document by date.

Sl. No.  Date         Document version  Remarks
1        14-Apr-2014  v1.5              Updated document according to latest 1.5 release features and functionalities. Also formatted and structured in a new refined template and font scheme.
2        30-Apr-2014  v1.5              Refined and sent the guide for team review.

Table 8: Revision History