Cloudera Manager Health Checks




Cloudera, Inc.
220 Portage Avenue
Palo Alto, CA 94306
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Important Notice

© 2010-2013 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service names or slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Version: 4.6
Date: August 8, 2013

Contents

ACTIVITY MONITOR ACTIVITY MONITOR PIPELINE
ACTIVITY MONITOR ACTIVITY TREE PIPELINE
ACTIVITY MONITOR FILE DESCRIPTOR
ACTIVITY MONITOR HOST HEALTH
ACTIVITY MONITOR LOG DIRECTORY FREE SPACE
ACTIVITY MONITOR CLOUDERA MANAGER AGENT HEALTH
ACTIVITY MONITOR UNEXPECTED EXITS
ACTIVITY MONITOR WEB METRIC COLLECTION
FLUME AGENT FILE DESCRIPTOR
FLUME AGENT HOST HEALTH
FLUME AGENT LOG DIRECTORY FREE SPACE
FLUME AGENT CLOUDERA MANAGER AGENT HEALTH
FLUME AGENT UNEXPECTED EXITS
ALERT PUBLISHER FILE DESCRIPTOR
ALERT PUBLISHER HOST HEALTH
ALERT PUBLISHER LOG DIRECTORY FREE SPACE
ALERT PUBLISHER CLOUDERA MANAGER AGENT HEALTH
ALERT PUBLISHER UNEXPECTED EXITS
DATANODE BLOCK COUNT
DATANODE CONNECTIVITY
DATANODE FILE DESCRIPTOR
DATANODE FREE SPACE REMAINING
DATANODE GARBAGE COLLECTION DURATION
DATANODE HIGH AVAILABILITY CONNECTIVITY
DATANODE HOST HEALTH
DATANODE LOG DIRECTORY FREE SPACE
DATANODE CLOUDERA MANAGER AGENT HEALTH
DATANODE UNEXPECTED EXITS
DATANODE VOLUME FAILURES
DATANODE WEB METRIC COLLECTION
EVENT SERVER EVENT STORE SIZE
EVENT SERVER FILE DESCRIPTOR
EVENT SERVER HOST HEALTH
EVENT SERVER INDEX DIRECTORY FREE SPACE
EVENT SERVER LOG DIRECTORY FREE SPACE
EVENT SERVER CLOUDERA MANAGER AGENT HEALTH
EVENT SERVER UNEXPECTED EXITS
EVENT SERVER WEB METRIC COLLECTION
EVENT SERVER WRITE PIPELINE
FAILOVER CONTROLLER FILE DESCRIPTOR
FAILOVER CONTROLLER HOST HEALTH
FAILOVER CONTROLLER LOG DIRECTORY FREE SPACE
FAILOVER CONTROLLER CLOUDERA MANAGER AGENT HEALTH
FAILOVER CONTROLLER UNEXPECTED EXITS
FLUME AGENTS HEALTH
HBASE BACKUP MASTERS HEALTH
HBASE MASTER HEALTH
HBASE REGIONSERVERS HEALTH
HBASE REST SERVER FILE DESCRIPTOR
HBASE REST SERVER HOST HEALTH
HBASE REST SERVER LOG DIRECTORY FREE SPACE
HBASE REST SERVER CLOUDERA MANAGER AGENT HEALTH
HBASE REST SERVER UNEXPECTED EXITS
HBASE THRIFT SERVER FILE DESCRIPTOR
HBASE THRIFT SERVER HOST HEALTH
HBASE THRIFT SERVER LOG DIRECTORY FREE SPACE
HBASE THRIFT SERVER CLOUDERA MANAGER AGENT HEALTH
HBASE THRIFT SERVER UNEXPECTED EXITS
HDFS BLOCKS WITH CORRUPT REPLICAS
HDFS CANARY HEALTH
HDFS CORRUPT REPLICAS
HDFS DATANODES HEALTH
HDFS FREE SPACE REMAINING
HDFS HIGH AVAILABILITY NAMENODE HEALTH
HDFS MISSING BLOCKS
HDFS NAMENODE HEALTH
HDFS STANDBY NAMENODES HEALTH
HDFS UNDER REPLICATED BLOCKS
HOST AGENT LOG DIRECTORY FREE SPACE
HOST AGENT PARCEL DIRECTORY FREE SPACE
HOST AGENT PROCESS DIRECTORY FREE SPACE
HOST CLOCK OFFSET
HOST DNS RESOLUTION
HOST DNS RESOLUTION DURATION
HOST MEMORY SWAPPING
HOST NETWORK FRAME ERRORS
HOST NETWORK INTERFACES SLOW MODE
HOST CLOUDERA MANAGER AGENT HEALTH
HOST MONITOR FILE DESCRIPTOR
HOST MONITOR HOST HEALTH
HOST MONITOR HOST PIPELINE
HOST MONITOR LOG DIRECTORY FREE SPACE
HOST MONITOR CLOUDERA MANAGER AGENT HEALTH
HOST MONITOR UNEXPECTED EXITS
HOST MONITOR WEB METRIC COLLECTION
HTTPFS FILE DESCRIPTOR
HTTPFS HOST HEALTH
HTTPFS LOG DIRECTORY FREE SPACE
HTTPFS CLOUDERA MANAGER AGENT HEALTH
HTTPFS UNEXPECTED EXITS
IMPALA ASSIGNMENT LOCALITY
IMPALA DAEMONS HEALTH
IMPALA STATESTORE HEALTH
IMPALAD CONNECTIVITY
IMPALAD FILE DESCRIPTOR
IMPALAD HOST HEALTH
IMPALAD LOG DIRECTORY FREE SPACE
IMPALAD MEMORY RESIDENT SET SIZE HEALTH
IMPALAD CLOUDERA MANAGER AGENT HEALTH
IMPALAD UNEXPECTED EXITS
IMPALAD WEB METRIC COLLECTION
JOBTRACKER FILE DESCRIPTOR
JOBTRACKER GARBAGE COLLECTION DURATION
JOBTRACKER HOST HEALTH
JOBTRACKER LOG DIRECTORY FREE SPACE
JOBTRACKER CLOUDERA MANAGER AGENT HEALTH
JOBTRACKER UNEXPECTED EXITS
JOBTRACKER WEB METRIC COLLECTION
JOURNALNODE EDITS DIRECTORY FREE SPACE
JOURNALNODE FILE DESCRIPTOR
JOURNALNODE GARBAGE COLLECTION DURATION
JOURNALNODE HOST HEALTH
JOURNALNODE LOG DIRECTORY FREE SPACE
JOURNALNODE CLOUDERA MANAGER AGENT HEALTH
JOURNALNODE SYNC STATUS
JOURNALNODE UNEXPECTED EXITS
JOURNALNODE WEB METRIC COLLECTION
MAPREDUCE HIGH AVAILABILITY JOBTRACKER HEALTH
MAPREDUCE JOB FAILURE RATIO
MAPREDUCE JOBTRACKER HEALTH
MAPREDUCE MAPS LOCALITY
MAPREDUCE MAP BACKLOG
MAPREDUCE REDUCE BACKLOG
MAPREDUCE STANDBY JOBTRACKERS HEALTH
MAPREDUCE TASKTRACKERS HEALTH
MASTER CANARY HEALTH
MASTER FILE DESCRIPTOR
MASTER GARBAGE COLLECTION DURATION
MASTER HOST HEALTH
MASTER LOG DIRECTORY FREE SPACE
MASTER CLOUDERA MANAGER AGENT HEALTH
MASTER UNEXPECTED EXITS
MASTER WEB METRIC COLLECTION
MANAGEMENT ACTIVITY MONITOR HEALTH
MANAGEMENT ALERT PUBLISHER HEALTH
MANAGEMENT EVENT SERVER HEALTH
MANAGEMENT HOST MONITOR HEALTH
MANAGEMENT NAVIGATOR HEALTH
MANAGEMENT REPORTS MANAGER HEALTH
MANAGEMENT SERVICE MONITOR HEALTH
NAMENODE CHECKPOINT AGE
NAMENODE DATA DIRECTORIES FREE SPACE
NAMENODE DIRECTORY FAILURES
NAMENODE FILE DESCRIPTOR
NAMENODE GARBAGE COLLECTION DURATION
NAMENODE HIGH AVAILABILITY CHECKPOINT AGE
NAMENODE HOST HEALTH
NAMENODE JOURNALNODE SYNC STATUS
NAMENODE LOG DIRECTORY FREE SPACE
NAMENODE RPC LATENCY
NAMENODE SAFE MODE
NAMENODE CLOUDERA MANAGER AGENT HEALTH
NAMENODE UNEXPECTED EXITS
NAMENODE UPGRADE STATUS
NAMENODE WEB METRIC COLLECTION
NAVIGATOR FILE DESCRIPTOR
NAVIGATOR HOST HEALTH
NAVIGATOR LOG DIRECTORY FREE SPACE
NAVIGATOR CLOUDERA MANAGER AGENT HEALTH
NAVIGATOR UNEXPECTED EXITS
REGIONSERVER COMPACTION QUEUE
REGIONSERVER FILE DESCRIPTOR
REGIONSERVER FLUSH QUEUE
REGIONSERVER GARBAGE COLLECTION DURATION
REGIONSERVER HOST HEALTH
REGIONSERVER LOG DIRECTORY FREE SPACE
REGIONSERVER MASTER CONNECTIVITY
REGIONSERVER MEMSTORE SIZE
REGIONSERVER READ LATENCY
REGIONSERVER CLOUDERA MANAGER AGENT HEALTH
REGIONSERVER STORE FILE IDX SIZE
REGIONSERVER SYNC LATENCY
REGIONSERVER UNEXPECTED EXITS
REGIONSERVER WEB METRIC COLLECTION
REPORTS MANAGER FILE DESCRIPTOR
REPORTS MANAGER HOST HEALTH
REPORTS MANAGER LOG DIRECTORY FREE SPACE
REPORTS MANAGER CLOUDERA MANAGER AGENT HEALTH
REPORTS MANAGER SCRATCH DIRECTORY FREE SPACE
REPORTS MANAGER UNEXPECTED EXITS
SECONDARY NAMENODE CHECKPOINT DIRECTORIES FREE SPACE
SECONDARY NAMENODE FILE DESCRIPTOR
SECONDARY NAMENODE GARBAGE COLLECTION DURATION
SECONDARY NAMENODE HOST HEALTH
SECONDARY NAMENODE LOG DIRECTORY FREE SPACE
SECONDARY NAMENODE CLOUDERA MANAGER AGENT HEALTH
SECONDARY NAMENODE UNEXPECTED EXITS
SECONDARY NAMENODE WEB METRIC COLLECTION
ZOOKEEPER SERVER CONNECTION COUNT
ZOOKEEPER SERVER DATA DIRECTORY FREE SPACE
ZOOKEEPER SERVER DATA LOG DIRECTORY FREE SPACE
ZOOKEEPER SERVER FILE DESCRIPTOR
ZOOKEEPER SERVER GARBAGE COLLECTION DURATION
ZOOKEEPER SERVER HOST HEALTH
ZOOKEEPER SERVER LOG DIRECTORY FREE SPACE
ZOOKEEPER SERVER MAX LATENCY
ZOOKEEPER SERVER OUTSTANDING REQUESTS
ZOOKEEPER SERVER QUORUM MEMBERSHIP
ZOOKEEPER SERVER CLOUDERA MANAGER AGENT HEALTH
ZOOKEEPER SERVER UNEXPECTED EXITS
SERVICE MONITOR FILE DESCRIPTOR
SERVICE MONITOR HOST HEALTH
SERVICE MONITOR LOG DIRECTORY FREE SPACE
SERVICE MONITOR ROLE PIPELINE
SERVICE MONITOR CLOUDERA MANAGER AGENT HEALTH
SERVICE MONITOR UNEXPECTED EXITS
SERVICE MONITOR WEB METRIC COLLECTION
STATESTORE FILE DESCRIPTOR
STATESTORE HOST HEALTH
STATESTORE LOG DIRECTORY FREE SPACE
STATESTORE MEMORY RESIDENT SET SIZE HEALTH
STATESTORE CLOUDERA MANAGER AGENT HEALTH
STATESTORE UNEXPECTED EXITS
STATESTORE WEB METRIC COLLECTION
TASKTRACKER BLACKLISTED
TASKTRACKER CONNECTIVITY
TASKTRACKER FILE DESCRIPTOR
TASKTRACKER GARBAGE COLLECTION DURATION
TASKTRACKER HOST HEALTH
TASKTRACKER LOG DIRECTORY FREE SPACE
TASKTRACKER CLOUDERA MANAGER AGENT HEALTH
TASKTRACKER UNEXPECTED EXITS
TASKTRACKER WEB METRIC COLLECTION
ZOOKEEPER CANARY HEALTH
ZOOKEEPER CURRENT ZXID
ZOOKEEPER SERVERS HEALTH

Activity Monitor Activity Monitor Pipeline

Details: This Activity Monitor health check verifies that no messages are being dropped by the activity monitor stage of the Activity Monitor pipeline. A failure of this health check indicates a problem with the Activity Monitor. This may indicate a configuration problem or a bug in the Activity Monitor. This test can be configured using the Activity Monitor Activity Monitor Pipeline Monitoring Time Period monitoring setting.

Short Name: Activity Monitor Pipeline

Setting: Activity Monitor Activity Monitor Pipeline Monitoring
Configuration name: activitymonitor_activity_monitor_pipeline_
Default: critical:any, warning:never
Description: The health check for monitoring the Activity Monitor activity monitor pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period.

Setting: Activity Monitor Activity Monitor Pipeline Monitoring Time Period
Configuration name: activitymonitor_activity_monitor_pipeline_window
Default: 5
Unit: MINUTES
Description: The time period over which the Activity Monitor activity monitor pipeline will be monitored for dropped messages.

Activity Monitor Activity Tree Pipeline

Details: This Activity Monitor health check verifies that no messages are being dropped by the activity tree stage of the Activity Monitor pipeline. A failure of this health check indicates a problem with the Activity Monitor. This may indicate a configuration problem or a bug in the Activity Monitor. This test can be configured using the Activity Monitor Activity Tree Pipeline Monitoring Time Period monitoring setting.

Short Name: Activity Tree Pipeline

Setting: Activity Monitor Activity Tree Pipeline Monitoring
Configuration name: activitymonitor_activity_tree_pipeline_
Default: critical:any, warning:never
Description: The health check for monitoring the Activity Monitor activity tree pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period.

Setting: Activity Monitor Activity Tree Pipeline Monitoring Time Period
Configuration name: activitymonitor_activity_tree_pipeline_window
Default: 5
Unit: MINUTES
Description: The time period over which the Activity Monitor activity tree pipeline will be monitored for dropped messages.

Activity Monitor File Descriptor

Details: This Activity Monitor health check verifies that the number of file descriptors used does not rise above some percentage of the Activity Monitor file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Activity Monitor monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Configuration name: activitymonitor_fd_
Default: critical:70.000000, warning:50.000000
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
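The default values above, such as critical:70.000000, warning:50.000000 and critical:any, warning:never, can be read as a pair of upper bounds. The following Python sketch shows one plausible way to interpret such a string and map a measured value to a health state. It is an illustration only, not Cloudera Manager's implementation: the function names are hypothetical, the "Concerning" state for a tripped warning bound is an assumption, and so is the choice to treat a value equal to the bound as tripping it.

```python
def parse_thresholds(spec):
    """Parse a string such as 'critical:70.000000, warning:50.000000'.

    The keywords 'any' and 'never' are kept as strings; numbers become floats.
    """
    result = {}
    for part in spec.split(","):
        name, _, value = part.strip().partition(":")
        value = value.strip()
        result[name.strip()] = value if value in ("any", "never") else float(value)
    return result


def trips(value, threshold):
    """Return True if the measured value trips the given upper bound."""
    if threshold == "never":
        return False          # this level is disabled
    if threshold == "any":
        return value > 0      # any occurrence at all trips the bound
    return value >= threshold  # assumption: reaching the bound counts


def evaluate(value, spec):
    """Map a measured value to a health state for 'higher is worse' checks."""
    thresholds = parse_thresholds(spec)
    if trips(value, thresholds.get("critical", "never")):
        return "Bad"
    if trips(value, thresholds.get("warning", "never")):
        return "Concerning"   # assumed name for the warning state
    return "Good"


# Example: 55% of the file descriptor limit is in use.
print(evaluate(55.0, "critical:70.000000, warning:50.000000"))
```

With the file descriptor defaults, usage below 50 percent evaluates to "Good" in this sketch, usage from 50 up to 70 percent to the warning state, and 70 percent or more to "Bad".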

Activity Monitor Host Health

Details: This Activity Monitor health check factors in the health of the host upon which the Activity Monitor is running. A failure of this check means that the host running the Activity Monitor is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Activity Monitor Host Health Check Activity Monitor monitoring setting.

Short Name: Host Health

Setting: Activity Monitor Host Health Check
Configuration name: activitymonitor_host_health_enabled
Description: When computing the overall Activity Monitor health, consider the host's health.

Activity Monitor Log Directory Free Space

Details: This Activity Monitor health check verifies that the filesystem containing the log directory of this Activity Monitor has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Activity Monitor monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Configuration name: log_directory_free_space_absolute_
Default: critical:5368709120.000000, warning:10737418240.000000
Unit: BYTES
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.

Setting: Log Directory Free Space Monitoring Percentage
Configuration name: log_directory_free_space_percentage_
Default: critical:never, warning:never
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.

Activity Monitor Cloudera Manager Agent Health

Details: This Activity Monitor health check verifies that the Cloudera Manager Agent on the Activity Monitor host is heartbeating correctly and that the process associated with the Activity Monitor role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Activity Monitor process, a lack of connectivity to the Cloudera Manager Agent on the Activity Monitor host, or a problem with the Cloudera Manager Agent.

This check can fail either because the Activity Monitor has crashed or because the Activity Monitor will not start or stop in a timely fashion. Check the Activity Monitor logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Activity Monitor host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Activity Monitor host, or look in the Cloudera Manager Agent logs on the Activity Monitor host for more details.

This test can be enabled or disabled using the Activity Monitor Process Health Check Activity Monitor monitoring setting.

Short Name: Process Status

Setting: Activity Monitor Process Health Check
Configuration name: activitymonitor_scm_health_enabled
Description: Enables the health check that the Activity Monitor's process state is consistent with the role configuration.
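When the Cloudera Manager Agent status has to be checked on several hosts, the /etc/init.d/cloudera-scm-agent status command mentioned above can be wrapped in a small script. The Python sketch below is one possible wrapper, not part of Cloudera Manager: the function name agent_status and the 30 second timeout are hypothetical, and the script must run on the host being checked.

```python
import subprocess

# Command quoted in the text above; it must exist on the host being checked.
AGENT_STATUS_CMD = ["/etc/init.d/cloudera-scm-agent", "status"]


def agent_status(cmd=AGENT_STATUS_CMD, timeout=30):
    """Run the status command and return (exit_code, output).

    Returns (None, message) if the command is not installed on this host.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # fold stderr into the captured output
            universal_newlines=True,    # decode output as text
            timeout=timeout,
        )
    except FileNotFoundError:
        return None, "command not found: " + cmd[0]
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    code, output = agent_status()
    print("exit code:", code)
    print(output)
```

Init scripts conventionally exit with 0 when the service is running and a non-zero code otherwise. Whether this particular script follows that convention is an assumption here, so read the printed output as well as the exit code.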

Activity Monitor Unexpected Exits

Details: This Activity Monitor health check verifies that the Activity Monitor has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches the critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Activity Monitor monitoring settings.

Short Name: Unexpected Exits

Setting: Unexpected Exits Monitoring Period
Configuration name: unexpected_exits_window
Default: 5
Unit: MINUTES
Description: The period to review when computing unexpected exits.

Setting: Unexpected Exits
Configuration name: unexpected_exits_
Default: critical:any, warning:never
Description: The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.

Activity Monitor Web Metric Collection

Details: This Activity Monitor health check verifies that the web server of the Activity Monitor is responding quickly to requests by the Cloudera Manager Agent, and that the Cloudera Manager Agent can collect metrics from the web server. A failure of this health check may indicate a problem with the web server of the Activity Monitor, a misconfiguration of the Activity Monitor, or a problem with the Cloudera Manager Agent. Consult the Cloudera Manager Agent logs and the logs of the Activity Monitor for more detail.

If the test's failure message indicates a communication problem, the Cloudera Manager Agent's HTTP requests to the Activity Monitor's web server are failing or timing out. These requests are completely local to the Activity Monitor's host, and so should never fail under normal conditions.

If the test's failure message indicates an unexpected response, then the Activity Monitor's web server responded to the Cloudera Manager Agent's request, but the Cloudera Manager Agent could not interpret the response for some reason.

This test can be configured using the Web Metric Collection Activity Monitor monitoring setting.

Short Name: Web Server Status

Setting: Web Metric Collection
Configuration name: activitymonitor_web_metric_collection_enabled
Description: Enables the health check that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.

Flume Agent File Descriptor

Details: This Agent health check verifies that the number of file descriptors used does not rise above some percentage of the Agent file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Agent monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Configuration name: flume_agent_fd_
Default: critical:70.000000, warning:50.000000
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.

Flume Agent Host Health

Details: This Agent health check factors in the health of the host upon which the Agent is running. A failure of this check means that the host running the Agent is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Flume Agent Host Health Check Agent monitoring setting.

Short Name: Host Health

Setting: Flume Agent Host Health Check
Configuration name: flume_agent_host_health_enabled
Description: When computing the overall Flume Agent health, consider the host's health.

Flume Agent Log Directory Free Space

Details: This Agent health check verifies that the filesystem containing the log directory of this Agent has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Agent monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Configuration name: log_directory_free_space_absolute_
Default: critical:5368709120.000000, warning:10737418240.000000
Unit: BYTES
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.

Setting: Log Directory Free Space Monitoring Percentage
Configuration name: log_directory_free_space_percentage_
Default: critical:never, warning:never
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
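The interaction between the absolute and percentage free space settings can be sketched in Python as follows. The defaults of 5368709120 and 10737418240 bytes correspond to 5 GiB and 10 GiB. This is a minimal illustration under stated assumptions rather than Cloudera Manager's implementation: the function names are hypothetical, a disabled level ("never") is represented by leaving the key out, the "Concerning" state name is assumed, and free space strictly below a bound is assumed to trip it.

```python
import shutil

GIB = 1024 ** 3

# Defaults quoted in the text: critical at 5 GiB, warning at 10 GiB.
ABSOLUTE_DEFAULT = {"critical": 5 * GIB, "warning": 10 * GIB}


def free_space_health(free_bytes, total_bytes, absolute=None, percentage=None):
    """Evaluate free space against lower bounds.

    If absolute thresholds are configured they take precedence and the
    percentage thresholds are ignored, as the setting description states.
    """
    if absolute is not None:
        value, limits = free_bytes, absolute
    elif percentage is not None:
        value, limits = 100.0 * free_bytes / total_bytes, percentage
    else:
        return "Good"  # nothing configured, nothing to trip
    # A missing key means the level is disabled ('never').
    if "critical" in limits and value < limits["critical"]:
        return "Bad"
    if "warning" in limits and value < limits["warning"]:
        return "Concerning"  # assumed name for the warning state
    return "Good"


def check_log_directory(path, absolute=ABSOLUTE_DEFAULT, percentage=None):
    """Check the filesystem that contains the given log directory."""
    usage = shutil.disk_usage(path)
    return free_space_health(usage.free, usage.total, absolute, percentage)


if __name__ == "__main__":
    print(check_log_directory("."))
```

For example, with the absolute defaults a filesystem with 7 GiB free falls between the two bounds and evaluates to the warning state, whatever percentage thresholds are passed alongside.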

Flume Agent Cloudera Manager Agent Health

Details: This Agent health check verifies that the Cloudera Manager Agent on the Agent host is heartbeating correctly and that the process associated with the Agent role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Agent process, a lack of connectivity to the Cloudera Manager Agent on the Agent host, or a problem with the Cloudera Manager Agent.

This check can fail either because the Agent has crashed or because the Agent will not start or stop in a timely fashion. Check the Agent logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Agent host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Agent host, or look in the Cloudera Manager Agent logs on the Agent host for more details.

This test can be enabled or disabled using the Flume Agent Process Health Check Agent monitoring setting.

Short Name: Process Status

Setting: Flume Agent Process Health Check
Configuration name: flume_agent_scm_health_enabled
Description: Enables the health check that the Flume Agent's process state is consistent with the role configuration.

Flume Agent Unexpected Exits

Details: This Agent health check verifies that the Agent has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches the critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Agent monitoring settings.

Short Name: Unexpected Exits

Setting: Unexpected Exits Monitoring Period
Configuration name: unexpected_exits_window
Default: 5
Unit: MINUTES
Description: The period to review when computing unexpected exits.

Setting: Unexpected Exits
Configuration name: unexpected_exits_
Default: critical:any, warning:never
Description: The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.

Alert Publisher File Descriptor

Details: This Alert Publisher health check verifies that the number of file descriptors used does not rise above some percentage of the Alert Publisher file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Alert Publisher monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Configuration name: alertpublisher_fd_
Default: critical:70.000000, warning:50.000000
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.

Alert Publisher Host Health

Details: This Alert Publisher health check factors in the health of the host upon which the Alert Publisher is running. A failure of this check means that the host running the Alert Publisher is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Alert Publisher Host Health Check Alert Publisher monitoring setting.

Short Name: Host Health

Setting: Alert Publisher Host Health Check
Configuration name: alertpublisher_host_health_enabled
Description: When computing the overall Alert Publisher health, consider the host's health.

Alert Publisher Log Directory Free Space

Details: This Alert Publisher health check verifies that the filesystem containing the log directory of this Alert Publisher has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Alert Publisher monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Configuration name: log_directory_free_space_absolute_
Default: critical:5368709120.000000, warning:10737418240.000000
Unit: BYTES
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.

Setting: Log Directory Free Space Monitoring Percentage
Configuration name: log_directory_free_space_percentage_
Default: critical:never, warning:never
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
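The Unexpected Exits checks in this guide combine a monitoring period (5 minutes by default) with a critical threshold. The Python sketch below illustrates that windowed count. It is a simplified model, not Cloudera Manager's implementation: the function name is hypothetical, exit times are assumed to be available as datetime values, and the critical threshold of 1 follows the worked example in the check descriptions.

```python
from datetime import datetime, timedelta


def unexpected_exit_health(exit_times, now, window=timedelta(minutes=5), critical=1):
    """Return 'Bad' if at least `critical` exits fall inside the window.

    exit_times: datetimes at which the role exited unexpectedly.
    now:        the time at which the check is evaluated.
    window:     how far back to look (the monitoring period).
    """
    cutoff = now - window
    recent = sum(1 for t in exit_times if cutoff <= t <= now)
    return "Bad" if recent >= critical else "Good"


if __name__ == "__main__":
    now = datetime(2013, 8, 8, 12, 0, 0)
    exits = [now - timedelta(minutes=2)]
    print(unexpected_exit_health(exits, now))
```

An exit that happened ten minutes before the evaluation time falls outside the default window, so in this model it no longer affects the result.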

Alert Publisher Cloudera Manager Agent Health

Details: This Alert Publisher health check checks that the Cloudera Manager Agent on the Alert Publisher host is heartbeating correctly and that the process associated with the Alert Publisher role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Alert Publisher process, a lack of connectivity to the Cloudera Manager Agent on the Alert Publisher host, or a problem with the Cloudera Manager Agent. This check can fail either because the Alert Publisher has crashed or because the Alert Publisher will not start or stop in a timely fashion. Check the Alert Publisher logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Alert Publisher host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Alert Publisher host, or look in the Cloudera Manager Agent logs on the Alert Publisher host for more details. This test can be enabled or disabled using the Alert Publisher Process Health Check Alert Publisher monitoring setting.

Short Name: Process Status

Property Name: Alert Publisher Process Health Check
Description: Enables the health check that the Alert Publisher's process state is consistent with the role configuration.
Template Name: alertpublisher_scm_health_enabled

Alert Publisher Unexpected Exits

Details: This Alert Publisher health check checks that the Alert Publisher has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches or exceeds a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been 1 or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Alert Publisher monitoring settings.

Short Name: Unexpected Exits

Property Name: Unexpected Exits Monitoring Period
Description: The period to review when computing unexpected exits.
Template Name: unexpected_exits_window
Default Value: 5 MINUTES

Property Name: Unexpected Exits
Description: The health check thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
Template Name: unexpected_exits_
Default Value: critical:any, warning:never

DataNode Block Count

Details: This is a DataNode health check that checks whether the DataNode has too many blocks. A failure of this health check indicates that there may be performance problems with the DataNode. See the DataNode system for more information. This test can be enabled or disabled using the DataNode Block Count DataNode monitoring setting.

Short Name: Block Count

Property Name: DataNode Block Count
Description: The health check thresholds for the number of blocks on a DataNode.
Template Name: datanode_block_count_
Default Value: critical:never, warning:200000.00000

DataNode Connectivity

Details: This is a DataNode health check that checks that the NameNode considers the DataNode alive. A failure of this health check may indicate that the DataNode is having trouble communicating with the NameNode. Look in the DataNode logs for more details. This test can be enabled or disabled using the DataNode Connectivity Health Check DataNode monitoring setting. The DataNode Connectivity Tolerance at Startup DataNode monitoring setting and the Health Check Startup Tolerance NameNode monitoring setting can be used to control the check's tolerance windows around DataNode and NameNode restarts respectively.

Short Name: NameNode Connectivity

Property Name: DataNode Connectivity Health Check
Description: Enables the health check that verifies the DataNode is connected to the NameNode.
Template Name: datanode_connectivity_health_enabled

Property Name: DataNode Connectivity Tolerance at Startup
Description: The amount of time to wait for the DataNode to fully start up and connect to the NameNode before enforcing the connectivity check.
Template Name: datanode_connectivity_tolerance
Default Value: 180 SECONDS

Property Name: Health Check Startup Tolerance
Description: The amount of time allowed after this role is started that failures of health checks that rely on communication with this role will be tolerated.
Template Name: namenode_startup_tolerance
Default Value: 5 MINUTES

DataNode File Descriptor

Details: This DataNode health check checks that the number of file descriptors used does not rise above some percentage of the DataNode file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring DataNode monitoring setting.

Short Name: File Descriptors

Property Name: File Descriptor Monitoring
Description: The health check thresholds for the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template Name: datanode_fd_
Default Value: critical:70.000000, warning:50.000000
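As a rough illustration of how the File Descriptor Monitoring thresholds might be applied, the sketch below converts a descriptor count into a percentage of the limit and compares it with the default critical (70) and warning (50) values. The function name, the strict greater-than comparison, and the "Concerning" label for the warning state are assumptions for the example, not documented behavior.

```python
def fd_health(open_fds: int, fd_limit: int,
              critical_pct: float = 70.0,
              warning_pct: float = 50.0) -> str:
    """Hypothetical file descriptor check: higher usage is worse."""
    used_pct = 100.0 * open_fds / fd_limit
    if used_pct > critical_pct:
        return "Bad"
    if used_pct > warning_pct:
        return "Concerning"  # assumed label for the warning state
    return "Good"


# A role with a limit of 1024 descriptors and 600 in use is at about 58.6%.
print(fd_health(600, 1024))
```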

DataNode Free Space Remaining

Details: This is a DataNode health check that checks that the amount of free space available for HDFS block data on the DataNode does not fall below some percentage of total configured capacity of the DataNode. A failure of this health check may indicate a capacity planning problem. Try adding more disk capacity and additional data directories to the DataNode, or add additional DataNodes and take steps to rebalance your HDFS cluster. This test can be configured using the DataNode Free Space Monitoring DataNode monitoring setting.

Short Name: Free Space

Property Name: DataNode Free Space Monitoring
Description: The health check thresholds for free space in a DataNode. Specified as a percentage of the capacity on the DataNode.
Template Name: datanode_free_space_
Default Value: critical:10.000000, warning:20.000000

DataNode Garbage Collection Duration

Details: This DataNode health check checks that the DataNode is not spending too much time performing Java garbage collection. It checks that no more than some percentage of recent time is spent performing Java garbage collection. A failure of this health check may indicate a capacity planning problem or misconfiguration of the DataNode. This test can be configured using the DataNode Garbage Collection Duration and DataNode Garbage Collection Duration Monitoring Period DataNode monitoring settings.

Short Name: GC Duration

Property Name: DataNode Garbage Collection Duration Monitoring Period
Description: The period to review when computing the moving average of garbage collection time.
Template Name: datanode_gc_duration_window
Default Value: 5 MINUTES

Property Name: DataNode Garbage Collection Duration
Description: The health check thresholds for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time. See DataNode Garbage Collection Duration Monitoring Period.
Template Name: datanode_gc_duration_
Default Value: critical:60.000000, warning:30.000000

DataNode High Availability Connectivity

Details: This is a DataNode health check that checks that all running NameNodes in the HDFS service consider the DataNode alive. A failure of this health check may indicate that the DataNode is having trouble communicating with some or all NameNodes in the service. Look in the DataNode logs for more details. This test can be enabled or disabled using the DataNode Connectivity Health Check DataNode monitoring setting. The DataNode Connectivity Tolerance at Startup DataNode monitoring setting and the Health Check Startup Tolerance NameNode monitoring setting can be used to control the check's tolerance windows around DataNode and NameNode restarts respectively.

Short Name: NameNode Connectivity

Property Name: DataNode Connectivity Health Check
Description: Enables the health check that verifies the DataNode is connected to the NameNode.
Template Name: datanode_connectivity_health_enabled

Property Name: DataNode Connectivity Tolerance at Startup
Description: The amount of time to wait for the DataNode to fully start up and connect to the NameNode before enforcing the connectivity check.
Template Name: datanode_connectivity_tolerance
Default Value: 180 SECONDS

Property Name: Health Check Startup Tolerance
Description: The amount of time allowed after this role is started that failures of health checks that rely on communication with this role will be tolerated.
Template Name: namenode_startup_tolerance
Default Value: 5 MINUTES

DataNode Host Health

Details: This DataNode health check factors in the health of the host upon which the DataNode is running. A failure of this check means that the host running the DataNode is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the DataNode Host Health Check DataNode monitoring setting.

Short Name: Host Health

Property Name: DataNode Host Health Check
Description: When computing the overall DataNode health, consider the host's health.
Template Name: datanode_host_health_enabled

DataNode Log Directory Free Space

Details: This DataNode health check checks that the filesystem containing the log directory of this DataNode has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage DataNode monitoring settings.

Short Name: Log Directory Free Space

Property Name: Log Directory Free Space Monitoring Absolute
Description: The health check thresholds for monitoring of free space on the filesystem that contains this role's log directory.
Template Name: log_directory_free_space_absolute_
Default Value: critical:5368709120.000000, warning:10737418240.000000 BYTES

Property Name: Log Directory Free Space Monitoring Percentage
Description: The health check thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
Template Name: log_directory_free_space_percentage_
Default Value: critical:never, warning:never

DataNode Cloudera Manager Agent Health

Details: This DataNode health check checks that the Cloudera Manager Agent on the DataNode host is heartbeating correctly and that the process associated with the DataNode role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the DataNode process, a lack of connectivity to the Cloudera Manager Agent on the DataNode host, or a problem with the Cloudera Manager Agent. This check can fail either because the DataNode has crashed or because the DataNode will not start or stop in a timely fashion. Check the DataNode logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the DataNode host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the DataNode host, or look in the Cloudera Manager Agent logs on the DataNode host for more details. This test can be enabled or disabled using the DataNode Process Health Check DataNode monitoring setting.

Short Name: Process Status

Property Name: DataNode Process Health Check
Description: Enables the health check that the DataNode's process state is consistent with the role configuration.
Template Name: datanode_scm_health_enabled

DataNode Unexpected Exits

Details: This DataNode health check checks that the DataNode has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches or exceeds a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been 1 or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period DataNode monitoring settings.

Short Name: Unexpected Exits

Property Name: Unexpected Exits Monitoring Period
Description: The period to review when computing unexpected exits.
Template Name: unexpected_exits_window
Default Value: 5 MINUTES

Property Name: Unexpected Exits
Description: The health check thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
Template Name: unexpected_exits_
Default Value: critical:any, warning:never

DataNode Volume Failures

Details: This is a DataNode health check that checks whether the DataNode has reported any failed volumes. A failure of this health check indicates that there is a problem with one or more volumes on the DataNode. See the DataNode system for more information. This test can be configured using the DataNode Volume Failures DataNode monitoring setting.

Short Name: Data Directory Status

Property Name: DataNode Volume Failures
Description: The health check thresholds for failed volumes in a DataNode.
Template Name: datanode_volume_failures_
Default Value: critical:any, warning:never
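The Unexpected Exits and Volume Failures checks both default to "critical:any, warning:never". One plausible reading of such a value, combined with the monitoring period, is sketched below. The helper names are hypothetical, and the interpretation of "any" as "one or more" and "never" as "disabled" is an assumption; the numeric case follows the example in the text, where a critical threshold of 1 yields "Bad" for 1 or more exits.

```python
from datetime import datetime, timedelta


def parse_thresholds(spec: str) -> dict:
    """Split a value such as 'critical:any, warning:never' into a dict."""
    result = {}
    for part in spec.split(","):
        level, _, value = part.strip().partition(":")
        result[level] = value
    return result


def reached(count: int, threshold: str) -> bool:
    """Assumed semantics: 'never' disables, 'any' means one or more."""
    if threshold == "never":
        return False
    if threshold == "any":
        return count > 0
    return count >= float(threshold)


def unexpected_exits_health(exit_times, now,
                            window=timedelta(minutes=5),
                            spec="critical:any, warning:never") -> str:
    # Count only the exits that fall inside the monitoring period.
    recent = sum(1 for t in exit_times if now - window <= t <= now)
    thresholds = parse_thresholds(spec)
    if reached(recent, thresholds["critical"]):
        return "Bad"
    if reached(recent, thresholds["warning"]):
        return "Concerning"  # assumed label for the warning state
    return "Good"
```

Under these assumptions, an exit that happened ten minutes ago no longer affects the result with the default 5 MINUTES window.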

DataNode Web Metric Collection

Details: This DataNode health check checks that the web server of the DataNode is responding quickly to requests by the Cloudera Manager Agent, and that the Cloudera Manager Agent can collect metrics from the web server. A failure of this health check may indicate a problem with the web server of the DataNode, a misconfiguration of the DataNode or a problem with the Cloudera Manager Agent. Consult the Cloudera Manager Agent logs and the logs of the DataNode for more detail. If the test's failure message indicates a communication problem, this means that the Cloudera Manager Agent's HTTP requests to the DataNode's web server are failing or timing out. These requests are completely local to the DataNode's host, and so should never fail under normal conditions. If the test's failure message indicates an unexpected response, then the DataNode's web server responded to the Cloudera Manager Agent's request, but the Cloudera Manager Agent could not interpret the response for some reason. This test can be configured using the Web Metric Collection DataNode monitoring setting.

Short Name: Web Server Status

Property Name: Web Metric Collection
Description: Enables the health check that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.
Template Name: datanode_web_metric_collection_enabled

Event Server Event Store Size

Details: This is an Event Server health check that checks that the event store size has not grown too far above the configured event store capacity. A failure of this health check indicates that the Event Server is having a problem performing cleanup. This may indicate a configuration problem or bug in the Event Server. This test can be configured using the Event Store Capacity Monitoring Event Server monitoring setting.

Short Name: Event Store Size

Property Name: Event Store Capacity Monitoring
Description: The health check thresholds on the number of events in the event store. Specified as a percentage of the maximum number of events in Event Server store.
Template Name: eventserver_capacity_
Default Value: critical:130.000000, warning:115.000000

Event Server File Descriptor

Details: This Event Server health check checks that the number of file descriptors used does not rise above some percentage of the Event Server file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Event Server monitoring setting.

Short Name: File Descriptors

Property Name: File Descriptor Monitoring
Description: The health check thresholds for the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template Name: eventserver_fd_
Default Value: critical:70.000000, warning:50.000000

Event Server Host Health

Details: This Event Server health check factors in the health of the host upon which the Event Server is running. A failure of this check means that the host running the Event Server is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Event Server Host Health Check Event Server monitoring setting.

Short Name: Host Health

Property Name: Event Server Host Health Check
Description: When computing the overall Event Server health, consider the host's health.
Template Name: eventserver_host_health_enabled
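Unlike the file descriptor thresholds, the Event Store Capacity Monitoring defaults (critical 130, warning 115) sit above 100 percent, so the Event Store Size check only fails once the store has overshot its configured maximum. The sketch below illustrates that arithmetic; the function name, the max_events parameter, the strict comparison, and the "Concerning" label are assumptions for the example.

```python
def event_store_health(event_count: int, max_events: int,
                       critical_pct: float = 130.0,
                       warning_pct: float = 115.0) -> str:
    """Hypothetical event store size check against a configured maximum."""
    pct_of_capacity = 100.0 * event_count / max_events
    if pct_of_capacity > critical_pct:
        return "Bad"
    if pct_of_capacity > warning_pct:
        return "Concerning"  # assumed label for the warning state
    return "Good"


# A store configured for 1,000,000 events that holds 1,200,000 is at 120%.
print(event_store_health(1_200_000, 1_000_000))
```

A store sitting at exactly its configured maximum (100 percent) stays healthy under these defaults, which leaves headroom for cleanup to lag behind ingestion.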

Event Server Index Directory Free Space

Details: This is an Event Server health check that checks that the filesystem containing the index directory of this Event Server has sufficient free space. This test can be configured using the Index Directory Free Space Monitoring Absolute and Index Directory Free Space Monitoring Percentage Event Server monitoring settings.

Short Name: Index Directory Free Space

Property Name: Index Directory Free Space Monitoring Absolute
Description: The health check thresholds for monitoring of free space on the filesystem that contains the index directory.
Template Name: eventserver_index_directory_free_space_absolute_
Default Value: critical:5368709120.000000, warning:10737418240.000000 BYTES

Property Name: Index Directory Free Space Monitoring Percentage
Description: The health check thresholds for monitoring of free space on the filesystem that contains the index directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if an Index Directory Free Space Monitoring Absolute setting is configured.
Template Name: eventserver_index_directory_free_space_percentage_
Default Value: critical:never, warning:never

Event Server Log Directory Free Space

Details: This Event Server health check checks that the filesystem containing the log directory of this Event Server has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Event Server monitoring settings.

Short Name: Log Directory Free Space

Property Name: Log Directory Free Space Monitoring Absolute
Description: The health check thresholds for monitoring of free space on the filesystem that contains this role's log directory.
Template Name: log_directory_free_space_absolute_
Default Value: critical:5368709120.000000, warning:10737418240.000000 BYTES

Property Name: Log Directory Free Space Monitoring Percentage
Description: The health check thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
Template Name: log_directory_free_space_percentage_
Default Value: critical:never, warning:never

Event Server Cloudera Manager Agent Health

Details: This Event Server health check checks that the Cloudera Manager Agent on the Event Server host is heartbeating correctly and that the process associated with the Event Server role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Event Server process, a lack of connectivity to the Cloudera Manager Agent on the Event Server host, or a problem with the Cloudera Manager Agent. This check can fail either because the Event Server has crashed or because the Event Server will not start or stop in a timely fashion. Check the Event Server logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Event Server host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Event Server host, or look in the Cloudera Manager Agent logs on the Event Server host for more details. This test can be enabled or disabled using the Event Server Process Health Check Event Server monitoring setting.

Short Name: Process Status

Property Name: Event Server Process Health Check
Description: Enables the health check that the Event Server's process state is consistent with the role configuration.
Template Name: eventserver_scm_health_enabled

Event Server Unexpected Exits

Details: This Event Server health check checks that the Event Server has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches or exceeds a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been 1 or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Event Server monitoring settings.

Short Name: Unexpected Exits

Property Name: Unexpected Exits Monitoring Period
Description: The period to review when computing unexpected exits.
Template Name: unexpected_exits_window
Default Value: 5 MINUTES

Property Name: Unexpected Exits
Description: The health check thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
Template Name: unexpected_exits_
Default Value: critical:any, warning:never

Event Server Web Metric Collection

Details: This Event Server health check checks that the web server of the Event Server is responding quickly to requests by the Cloudera Manager Agent, and that the Cloudera Manager Agent can collect metrics from the web server. A failure of this health check may indicate a problem with the web server of the Event Server, a misconfiguration of the Event Server or a problem with the Cloudera Manager Agent. Consult the Cloudera Manager Agent logs and the logs of the Event Server for more detail. If the test's failure message indicates a communication problem, this means that the Cloudera Manager Agent's HTTP requests to the Event Server's web server are failing or timing out. These requests are completely local to the Event Server's host, and so should never fail under normal conditions. If the test's failure message indicates an unexpected response, then the Event Server's web server responded to the Cloudera Manager Agent's request, but the Cloudera Manager Agent could not interpret the response for some reason. This test can be configured using the Web Metric Collection Event Server monitoring setting.

Short Name: Web Server Status

Property Name: Web Metric Collection
Description: Enables the health check that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.
Template Name: eventserver_web_metric_collection_enabled

Event Server Write Pipeline

Details: This Event Server health check checks that no messages are being dropped by the writer stage of the Event Server pipeline. A failure of this health check indicates a problem with the Event Server. This may indicate a configuration problem or a bug in the Event Server. This test can be configured using the Event Server Write Pipeline Monitoring Time Period monitoring setting.

Short Name: Write Pipeline

Property Name: Event Server Write Pipeline Monitoring
Description: The health check thresholds for monitoring the Event Server write pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period.
Template Name: eventserver_write_pipeline_
Default Value: critical:any, warning:never

Property Name: Event Server Write Pipeline Monitoring Time Period
Description: The time period over which the Event Server write pipeline will be monitored for dropped messages.
Template Name: eventserver_write_pipeline_window
Default Value: 5 MINUTES

Failover Controller File Descriptor

Details: This Failover Controller health check checks that the number of file descriptors used does not rise above some percentage of the Failover Controller file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Failover Controller monitoring setting.

Short Name: File Descriptors

Property Name: File Descriptor Monitoring
Description: The health check thresholds for the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template Name: failovercontroller_fd_
Default Value: critical:70.000000, warning:50.000000

Failover Controller Host Health

Details: This Failover Controller health check factors in the health of the host upon which the Failover Controller is running. A failure of this check means that the host running the Failover Controller is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the FailoverController Host Health Check Failover Controller monitoring setting.

Short Name: Host Health

Property Name: FailoverController Host Health Check
Description: When computing the overall FailoverController health, consider the host's health.
Template Name: failovercontroller_host_health_enabled

Failover Controller Log Directory Free Space

Details: This Failover Controller health check checks that the filesystem containing the log directory of this Failover Controller has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Failover Controller monitoring settings.

Short Name: Log Directory Free Space

Property Name: Log Directory Free Space Monitoring Absolute
Template Name: log_directory_free_space_absolute_
Default Value: critical:5368709120.000000, warning:10737418240.000000 BYTES
Description: The health check thresholds for monitoring of free space on the filesystem that contains this role's log