HP SiteScope. Hadoop Cluster Monitoring Solution Template Best Practices. For the Windows, Solaris, and Linux operating systems




HP SiteScope
For the Windows, Solaris, and Linux operating systems
Software Version: 11.23

Hadoop Cluster Monitoring Solution Template Best Practices

Document Release Date: December 2013
Software Release Date: December 2013

Legal Notices

Warranty

The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Restricted Rights Legend

Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Copyright Notice

Copyright 2005-2013 Hewlett-Packard Development Company, L.P.

Trademark Notices

Adobe and Acrobat are trademarks of Adobe Systems Incorporated.
Intel, Pentium, and Intel Xeon are trademarks of Intel Corporation in the U.S. and other countries.
iPod is a trademark of Apple Computer, Inc.
Java is a registered trademark of Oracle and/or its affiliates.
Microsoft, Windows, Windows NT, and Windows XP are U.S. registered trademarks of Microsoft Corporation.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
UNIX is a registered trademark of The Open Group.

Documentation Updates

The title page of this document contains the following identifying information:

- Software Version number, which indicates the software version.
- Document Release Date, which changes each time the document is updated.
- Software Release Date, which indicates the release date of this version of the software.

To check for recent updates or to verify that you are using the most recent edition of a document, go to: http://h20230.www2.hp.com/selfsolve/manuals

This site requires that you register for an HP Passport and sign in. To register for an HP Passport ID, go to: http://h20229.www2.hp.com/passport-registration.html

Or click the New users - please register link on the HP Passport login page.

You will also receive updated or new editions if you subscribe to the appropriate product support service. Contact your HP sales representative for details.

Support

Visit the HP Software Support Online web site at: http://www.hp.com/go/hpsoftwaresupport

This web site provides contact information and details about the products, services, and support that HP Software offers.

HP Software online support provides customer self-solve capabilities. It provides a fast and efficient way to access interactive technical support tools needed to manage your business. As a valued support customer, you can benefit by using the support web site to:

- Search for knowledge documents of interest
- Submit and track support cases and enhancement requests
- Download software patches
- Manage support contracts
- Look up HP support contacts
- Review information about available services
- Enter into discussions with other software customers
- Research and register for software training

Most of the support areas require that you register as an HP Passport user and sign in. Many also require a support contract. To register for an HP Passport ID, go to: http://h20229.www2.hp.com/passport-registration.html

To find more information about access levels, go to: http://h20230.www2.hp.com/new_access_levels.jsp

HP Software Solutions Now accesses the HPSW Solution and Integration Portal Web site. This site enables you to explore HP Product Solutions to meet your business needs, includes a full list of Integrations between HP Products, as well as a listing of ITIL Processes. The URL for this Web site is http://h20230.www2.hp.com/sc/solutions/index.jsp

HP SiteScope (11.23) Page 2 of 12

Contents

Chapter 1: Overview
Chapter 2: Monitors and Metrics Common to HDFS and MapReduce Templates
Chapter 3: JMX Metrics on HDFS Master Host
Chapter 4: JMX Metrics on MapReduce Host
We appreciate your feedback!

Chapter 1: Overview

The Hadoop Cluster Monitoring Solution Template enables you to monitor and troubleshoot the Hadoop Distributed File System (HDFS) and Hadoop MapReduce master nodes of the Hadoop cluster infrastructure.

The solution template deploys a set of monitors against a Hadoop cluster host. These monitors are designed to manage large, fast-growing volumes of data and provide fast query performance when used for data warehouses and other query-intensive applications.

Perhaps the greatest value in the SiteScope Hadoop Cluster Monitoring Solution Template is the built-in list of default metrics and thresholds. These default metrics are based not only on HP research and our customers' experiences, but also on industry expert recommendations.

Benefits of the Hadoop Cluster Monitoring Solution Template include:

- Reduces the need for Hadoop performance domain expertise
- Reduces the time to configure and deploy Hadoop monitors
- Helps identify both real-time performance bottlenecks and longer term trends
- Adds only negligible overhead to production systems

The Hadoop Cluster Monitoring Solution Template deploys the following monitors, which target the performance and health aspects of the Hadoop cluster infrastructure:

- Hadoop Monitor
- UNIX Resources Monitor
- Memory Monitor
- Multi Log Monitor

Note: Each Hadoop cluster has a different size, workload, and job complexity. This means that you need to adjust the default threshold values to suit your environment in order to get the most benefit from the Hadoop solution templates.

Chapter 2: Monitors and Metrics Common to HDFS and MapReduce Templates

The following table lists the monitors and metrics that are common to both the HDFS and MapReduce solution templates.

UNIX Resources Monitor

Metric: Default Threshold
/^FileSystems\\.*\\Available$/: (none listed)
/^FileSystems\\.*\\Use\%$/: error if >= 95, warning if >= 90
/^FileSystems\\.*\\Used$/: (none listed)
/^Processor\\.*\\User low$/: (none listed)
/^Processor\\.*\\User$/: (none listed)
/^Processor\\.*\\System$/: (none listed)
/^Processor\\.*\\Idle$/: error if <= 10, warning if <= 20

Memory Monitor

Metric: Default Threshold
virtual/swap used %: error if == 'n/a' or if > 90, warning if > 80, good if always(default)

Multi Log Monitor

Counters:
File name match: /.*hadoop.*-namenode.*.log/
Content match: /.*ERROR.*/,/.*exception.*/,/.*warning.*/
Match value labels: errors, exceptions, warnings

Thresholds:
error if:
  exceptions > 0
  errors > 0
warning if:
  warnings > 0
  notprocessedfilesbyfilelimit > 0
good if:
  always(default)
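The Multi Log Monitor settings above can be illustrated with a short Python sketch. It is not SiteScope code: the function names `count_matches` and `status` are hypothetical, the patterns are applied case-sensitively (whether SiteScope does the same is not stated here), and the `notprocessedfilesbyfilelimit` counter is left out because its source is internal to the monitor.

```python
import re

# Patterns copied from the Multi Log Monitor settings above.
FILE_NAME_MATCH = re.compile(r".*hadoop.*-namenode.*.log")
CONTENT_MATCHES = {
    "errors": re.compile(r".*ERROR.*"),
    "exceptions": re.compile(r".*exception.*"),
    "warnings": re.compile(r".*warning.*"),
}


def count_matches(lines):
    """Count, per match value label, how many lines match its pattern."""
    counts = {label: 0 for label in CONTENT_MATCHES}
    for line in lines:
        for label, pattern in CONTENT_MATCHES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def status(counts):
    """Apply the default thresholds: error first, then warning, else good."""
    if counts["exceptions"] > 0 or counts["errors"] > 0:
        return "error"
    if counts["warnings"] > 0:
        return "warning"
    return "good"


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "2013-12-01 10:00:00 INFO namenode started",
        "2013-12-01 10:00:01 ERROR failed to read block",
        "2013-12-01 10:00:02 WARN low disk: warning issued",
    ]
    counts = count_matches(sample)
    print(counts, status(counts))
```

Because the error conditions are checked first, a log that contains both errors and warnings is reported as an error.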

Chapter 3: JMX Metrics on HDFS Master Host

The following table lists the JMX metrics that are enabled by default on the HDFS master node host.

JMX metrics on %%hdfs_master_host%%

Metric: Description
Hadoop/NameNode/FSNamesystems/BlockCapacity: Default configured data block size in bytes
Hadoop/NameNode/FSNamesystems/CorruptBlocks: Number of corrupted data blocks on HDFS
Hadoop/NameNode/FSNamesystems/MissingBlocks: Number of missing blocks on HDFS
Hadoop/NameNode/jvm/gcCount: Total number of collections that have occurred
Hadoop/NameNode/jvm/gcTimeMillis: Approximate accumulated collection elapsed time in milliseconds
Hadoop/NameNode/jvm/memHeapCommittedM: Heap memory committed in MB
Hadoop/NameNode/jvm/memHeapUsedM: Heap memory used in MB
Hadoop/NameNode/jvm/memNonHeapCommittedM: Non-heap memory committed in MB
Hadoop/NameNode/jvm/memNonHeapUsedM: Non-heap memory used in MB
Hadoop/NameNode/NameNodeInfo/Free: HDFS free space in bytes
Hadoop/NameNode/NameNodeInfo/PercentRemaining: Percentage of remaining HDFS space
Hadoop/NameNode/NameNodeInfo/NodesData/Dead nodes count: Number of dead data nodes in cluster
Hadoop/NameNode/NameNodeInfo/NodesData/Decom nodes count: Number of decommissioned data nodes in cluster
Hadoop/NameNode/NameNodeInfo/NodesData/Live nodes count: Number of alive data nodes in cluster
Hadoop/NameNode/NameNodeInfo/NodesData/NameDir statuses failed count: Number of name storage directories in failed status
Hadoop/NameNode/NameNodeInfo/NodesData/NameDir statuses active count: Number of name storage directories in active status

Calculated Metrics

Counter: Description
live datanodes percentage: Ratio of alive nodes to total number of nodes in cluster.
dead datanodes percentage: Ratio of dead nodes to total number of nodes in cluster.
decomm datanodes percentage: Ratio of decommissioned nodes to total number of nodes in cluster.
failed Namedirs percentage: Ratio of storage directories in failed status to the total number of storage directories.

Thresholds

Error if:
  Hadoop/NameNode/NameNodeInfo/PercentUsed > 90
  live datanodes percentage <= 66
  failed Namedirs percentage >= 25
Warning if:
  dead datanodes percentage >= 25
  Hadoop/NameNode/FSNamesystems/MissingBlocks > 0
  Hadoop/NameNode/FSNamesystems/CorruptBlocks > 0
Good if:
  always(default)
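As a rough illustration of how the calculated metrics and default thresholds above combine, the following Python sketch derives the percentages from the raw counts and returns a status. The function names are hypothetical, and two points are assumptions rather than documented behavior: the total node count is taken as live + dead + decommissioned, and the total number of name directories as failed + active.

```python
def percentage(part, total):
    """Return part/total as a percentage, or 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


def hdfs_status(live, dead, decom, failed_dirs, active_dirs,
                percent_used, missing_blocks, corrupt_blocks):
    """Evaluate the default HDFS thresholds; error takes precedence."""
    # Assumption: total nodes = live + dead + decommissioned.
    total_nodes = live + dead + decom
    # Assumption: total name directories = failed + active.
    total_dirs = failed_dirs + active_dirs

    live_pct = percentage(live, total_nodes)
    dead_pct = percentage(dead, total_nodes)
    failed_dirs_pct = percentage(failed_dirs, total_dirs)

    if percent_used > 90 or live_pct <= 66 or failed_dirs_pct >= 25:
        return "error"
    if dead_pct >= 25 or missing_blocks > 0 or corrupt_blocks > 0:
        return "warning"
    return "good"


if __name__ == "__main__":
    # Hypothetical cluster: 7 live and 3 dead data nodes, healthy storage.
    print(hdfs_status(live=7, dead=3, decom=0, failed_dirs=0, active_dirs=2,
                      percent_used=50.0, missing_blocks=0, corrupt_blocks=0))
```

With 7 of 10 nodes alive the live percentage (70) stays above the error limit of 66, but the dead percentage (30) crosses the warning limit of 25, which shows why the two limits are worth tuning together for small clusters.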

Chapter 4: JMX Metrics on MapReduce Host

The following table lists the JMX metrics that are enabled by default on the MapReduce master node host.

JMX metrics on %%mapreduce_master_host%%

Metric: Description
JobTrackerDetails/Alive nodes count: Number of alive task trackers in cluster.
JobTrackerDetails/Blacklisted nodes count: Number of blacklisted task trackers in cluster.
JobTrackerDetails/Graylisted nodes count: Number of graylisted task trackers in cluster.
JobTrackerDetails/Total jobs count: Total number of jobs submitted.
JobTrackerDetails/Total map slots: Sum of all slots for map operations configured on all task trackers.
JobTrackerDetails/Total reduce slots: Sum of all slots for reduce operations configured on all task trackers.
JobTrackerDetails/Total nodes count: Total number of task trackers in cluster regardless of their state.
JobTrackerDetails/Used map slots: Number of currently used map slots.
JobTrackerDetails/Used reduce slots: Number of currently used reduce slots.
Queues Data/Queues in running state: Number of queues in running state.
Queues Data/Queues in stopped state: Number of queues in stopped state.
Queues Data/Queues in undefined state: Number of queues in undefined state.
ThreadCount: Number of threads created by map/reduce jobs.
Hadoop/JobTracker/jvm/gcCount: Total number of collections that have occurred.
Hadoop/JobTracker/jvm/gcTimeMillis: Approximate accumulated collection elapsed time in milliseconds.
Hadoop/JobTracker/jvm/memHeapCommittedM: Heap memory committed in MB.
Hadoop/JobTracker/jvm/memHeapUsedM: Heap memory used in MB.
Hadoop/JobTracker/jvm/threadsBlocked: Number of blocked threads.
jobs_completed/: Number of completed jobs per configured queue.
jobs_failed/: Number of failed jobs per configured queue.
jobs_killed/: Number of killed jobs per configured queue.
jobs_preparing/: Number of jobs preparing per configured queue.
jobs_running/: Number of running jobs per configured queue.
jobs_submitted/: Number of submitted jobs per configured queue.
maps_completed/: Number of completed map operations per configured queue.
maps_failed/: Number of failed map operations per configured queue.
maps_killed/: Number of killed map operations per configured queue.
waiting_maps/: Number of waiting map operations per configured queue.
maps_launched/: Number of launched map operations per configured queue.
reduces_completed/: Number of completed reduce operations per configured queue.
reduces_failed/: Number of failed reduce operations per configured queue.
waiting_reduces/: Number of waiting reduce operations per configured queue.
reduces_killed/: Number of killed reduce operations per configured queue.
Hadoop/JobTracker/Queues/custom_queue/reduces_launched: Number of launched reduce operations per configured queue.
reserved_map_slots/: Number of reserved slots for map operations per configured queue.
reserved_reduce_slots/: Number of reserved slots for reduce operations per configured queue.

Calculated Metrics

Counter: Description
Alive tasktrackers percentage: Ratio of alive nodes to total nodes in cluster
Dead tasktrackers percentage: Ratio of blacklisted nodes to total nodes in cluster
Graylisted tasktrackers percentage: Ratio of graylisted nodes to total nodes in cluster

Thresholds

Error if:
  alive tasktracker percentage <= 66
Warning if:
  alive tasktracker percentage <= 75
  JobTrackerDetails/Blacklisted nodes count > 0
  JobTrackerDetails/Graylisted nodes count > 0
Good if:
  always(default)
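The MapReduce thresholds above can be sketched the same way. This is an illustrative Python example, not SiteScope code: `tasktracker_status` is a hypothetical name, and the alive percentage is assumed to be the Alive nodes count divided by the Total nodes count, expressed on a 0 to 100 scale.

```python
def tasktracker_status(alive, total, blacklisted, graylisted):
    """Evaluate the default MapReduce thresholds; error takes precedence."""
    # Assumption: alive percentage = alive / total * 100 (0.0 if total is 0).
    alive_pct = 100.0 * alive / total if total else 0.0

    if alive_pct <= 66:
        return "error"
    if alive_pct <= 75 or blacklisted > 0 or graylisted > 0:
        return "warning"
    return "good"


if __name__ == "__main__":
    # Hypothetical cluster: 20 task trackers, all alive, one blacklisted.
    print(tasktracker_status(alive=20, total=20, blacklisted=1, graylisted=0))
```

Note that any blacklisted or graylisted task tracker raises a warning regardless of cluster size, so on large clusters the count-based warning conditions may be worth replacing with percentage-based ones.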

We appreciate your feedback!

If you have comments about this document, you can contact the documentation team by email. If an email client is configured on this system, click the link above and an email window opens with the following information in the subject line:

Feedback on Hadoop Cluster Monitoring Solution Template Best Practices (SiteScope 11.23)

Just add your feedback to the email and click send.

If no email client is available, copy the information above to a new message in a web mail client, and send your feedback to SW-doc@hp.com.