Cloudera Manager Health Checks


Cloudera, Inc. 220 Portage Avenue, Palo Alto, CA

Important Notice

Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service names or slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Version: 4.6
Date: August 8, 2013

Contents

ACTIVITY MONITOR ACTIVITY MONITOR PIPELINE
ACTIVITY MONITOR ACTIVITY TREE PIPELINE
ACTIVITY MONITOR FILE DESCRIPTOR
ACTIVITY MONITOR HOST HEALTH
ACTIVITY MONITOR LOG DIRECTORY FREE SPACE
ACTIVITY MONITOR CLOUDERA MANAGER AGENT HEALTH
ACTIVITY MONITOR UNEXPECTED EXITS
ACTIVITY MONITOR WEB METRIC COLLECTION
FLUME AGENT FILE DESCRIPTOR
FLUME AGENT HOST HEALTH
FLUME AGENT LOG DIRECTORY FREE SPACE
FLUME AGENT CLOUDERA MANAGER AGENT HEALTH
FLUME AGENT UNEXPECTED EXITS
ALERT PUBLISHER FILE DESCRIPTOR
ALERT PUBLISHER HOST HEALTH
ALERT PUBLISHER LOG DIRECTORY FREE SPACE
ALERT PUBLISHER CLOUDERA MANAGER AGENT HEALTH
ALERT PUBLISHER UNEXPECTED EXITS
DATANODE BLOCK COUNT
DATANODE CONNECTIVITY
DATANODE FILE DESCRIPTOR
DATANODE FREE SPACE REMAINING
DATANODE GARBAGE COLLECTION DURATION
DATANODE HIGH AVAILABILITY CONNECTIVITY
DATANODE HOST HEALTH
DATANODE LOG DIRECTORY FREE SPACE
DATANODE CLOUDERA MANAGER AGENT HEALTH
DATANODE UNEXPECTED EXITS
DATANODE VOLUME FAILURES
DATANODE WEB METRIC COLLECTION
EVENT SERVER EVENT STORE SIZE
EVENT SERVER FILE DESCRIPTOR
EVENT SERVER HOST HEALTH
EVENT SERVER INDEX DIRECTORY FREE SPACE
EVENT SERVER LOG DIRECTORY FREE SPACE
EVENT SERVER CLOUDERA MANAGER AGENT HEALTH
EVENT SERVER UNEXPECTED EXITS
EVENT SERVER WEB METRIC COLLECTION
EVENT SERVER WRITE PIPELINE
FAILOVER CONTROLLER FILE DESCRIPTOR
FAILOVER CONTROLLER HOST HEALTH
FAILOVER CONTROLLER LOG DIRECTORY FREE SPACE
FAILOVER CONTROLLER CLOUDERA MANAGER AGENT HEALTH
FAILOVER CONTROLLER UNEXPECTED EXITS
FLUME AGENTS HEALTH
HBASE BACKUP MASTERS HEALTH
HBASE MASTER HEALTH
HBASE REGIONSERVERS HEALTH
HBASE REST SERVER FILE DESCRIPTOR
HBASE REST SERVER HOST HEALTH
HBASE REST SERVER LOG DIRECTORY FREE SPACE
HBASE REST SERVER CLOUDERA MANAGER AGENT HEALTH
HBASE REST SERVER UNEXPECTED EXITS
HBASE THRIFT SERVER FILE DESCRIPTOR
HBASE THRIFT SERVER HOST HEALTH
HBASE THRIFT SERVER LOG DIRECTORY FREE SPACE
HBASE THRIFT SERVER CLOUDERA MANAGER AGENT HEALTH
HBASE THRIFT SERVER UNEXPECTED EXITS
HDFS BLOCKS WITH CORRUPT REPLICAS
HDFS CANARY HEALTH
HDFS CORRUPT REPLICAS
HDFS DATANODES HEALTH
HDFS FREE SPACE REMAINING
HDFS HIGH AVAILABILITY NAMENODE HEALTH
HDFS MISSING BLOCKS
HDFS NAMENODE HEALTH
HDFS STANDBY NAMENODES HEALTH
HDFS UNDER REPLICATED BLOCKS
HOST AGENT LOG DIRECTORY FREE SPACE
HOST AGENT PARCEL DIRECTORY FREE SPACE
HOST AGENT PROCESS DIRECTORY FREE SPACE
HOST CLOCK OFFSET
HOST DNS RESOLUTION
HOST DNS RESOLUTION DURATION
HOST MEMORY SWAPPING
HOST NETWORK FRAME ERRORS
HOST NETWORK INTERFACES SLOW MODE
HOST CLOUDERA MANAGER AGENT HEALTH
HOST MONITOR FILE DESCRIPTOR
HOST MONITOR HOST HEALTH
HOST MONITOR HOST PIPELINE
HOST MONITOR LOG DIRECTORY FREE SPACE
HOST MONITOR CLOUDERA MANAGER AGENT HEALTH
HOST MONITOR UNEXPECTED EXITS
HOST MONITOR WEB METRIC COLLECTION
HTTPFS FILE DESCRIPTOR
HTTPFS HOST HEALTH
HTTPFS LOG DIRECTORY FREE SPACE
HTTPFS CLOUDERA MANAGER AGENT HEALTH
HTTPFS UNEXPECTED EXITS
IMPALA ASSIGNMENT LOCALITY
IMPALA DAEMONS HEALTH
IMPALA STATESTORE HEALTH
IMPALAD CONNECTIVITY
IMPALAD FILE DESCRIPTOR
IMPALAD HOST HEALTH
IMPALAD LOG DIRECTORY FREE SPACE
IMPALAD MEMORY RESIDENT SET SIZE HEALTH
IMPALAD CLOUDERA MANAGER AGENT HEALTH
IMPALAD UNEXPECTED EXITS
IMPALAD WEB METRIC COLLECTION
JOBTRACKER FILE DESCRIPTOR
JOBTRACKER GARBAGE COLLECTION DURATION
JOBTRACKER HOST HEALTH
JOBTRACKER LOG DIRECTORY FREE SPACE
JOBTRACKER CLOUDERA MANAGER AGENT HEALTH
JOBTRACKER UNEXPECTED EXITS
JOBTRACKER WEB METRIC COLLECTION
JOURNALNODE EDITS DIRECTORY FREE SPACE
JOURNALNODE FILE DESCRIPTOR
JOURNALNODE GARBAGE COLLECTION DURATION
JOURNALNODE HOST HEALTH
JOURNALNODE LOG DIRECTORY FREE SPACE
JOURNALNODE CLOUDERA MANAGER AGENT HEALTH
JOURNALNODE SYNC STATUS
JOURNALNODE UNEXPECTED EXITS
JOURNALNODE WEB METRIC COLLECTION
MAPREDUCE HIGH AVAILABILITY JOBTRACKER HEALTH
MAPREDUCE JOB FAILURE RATIO
MAPREDUCE JOBTRACKER HEALTH
MAPREDUCE MAPS LOCALITY
MAPREDUCE MAP BACKLOG
MAPREDUCE REDUCE BACKLOG
MAPREDUCE STANDBY JOBTRACKERS HEALTH
MAPREDUCE TASKTRACKERS HEALTH
MASTER CANARY HEALTH
MASTER FILE DESCRIPTOR
MASTER GARBAGE COLLECTION DURATION
MASTER HOST HEALTH
MASTER LOG DIRECTORY FREE SPACE
MASTER CLOUDERA MANAGER AGENT HEALTH
MASTER UNEXPECTED EXITS
MASTER WEB METRIC COLLECTION
MANAGEMENT ACTIVITY MONITOR HEALTH
MANAGEMENT ALERT PUBLISHER HEALTH
MANAGEMENT EVENT SERVER HEALTH
MANAGEMENT HOST MONITOR HEALTH
MANAGEMENT NAVIGATOR HEALTH
MANAGEMENT REPORTS MANAGER HEALTH
MANAGEMENT SERVICE MONITOR HEALTH
NAMENODE CHECKPOINT AGE
NAMENODE DATA DIRECTORIES FREE SPACE
NAMENODE DIRECTORY FAILURES
NAMENODE FILE DESCRIPTOR
NAMENODE GARBAGE COLLECTION DURATION
NAMENODE HIGH AVAILABILITY CHECKPOINT AGE
NAMENODE HOST HEALTH
NAMENODE JOURNALNODE SYNC STATUS
NAMENODE LOG DIRECTORY FREE SPACE
NAMENODE RPC LATENCY
NAMENODE SAFE MODE
NAMENODE CLOUDERA MANAGER AGENT HEALTH
NAMENODE UNEXPECTED EXITS
NAMENODE UPGRADE STATUS
NAMENODE WEB METRIC COLLECTION
NAVIGATOR FILE DESCRIPTOR
NAVIGATOR HOST HEALTH
NAVIGATOR LOG DIRECTORY FREE SPACE
NAVIGATOR CLOUDERA MANAGER AGENT HEALTH
NAVIGATOR UNEXPECTED EXITS
REGIONSERVER COMPACTION QUEUE
REGIONSERVER FILE DESCRIPTOR
REGIONSERVER FLUSH QUEUE
REGIONSERVER GARBAGE COLLECTION DURATION
REGIONSERVER HOST HEALTH
REGIONSERVER LOG DIRECTORY FREE SPACE
REGIONSERVER MASTER CONNECTIVITY
REGIONSERVER MEMSTORE SIZE
REGIONSERVER READ LATENCY
REGIONSERVER CLOUDERA MANAGER AGENT HEALTH
REGIONSERVER STORE FILE IDX SIZE
REGIONSERVER SYNC LATENCY
REGIONSERVER UNEXPECTED EXITS
REGIONSERVER WEB METRIC COLLECTION
REPORTS MANAGER FILE DESCRIPTOR
REPORTS MANAGER HOST HEALTH
REPORTS MANAGER LOG DIRECTORY FREE SPACE
REPORTS MANAGER CLOUDERA MANAGER AGENT HEALTH
REPORTS MANAGER SCRATCH DIRECTORY FREE SPACE
REPORTS MANAGER UNEXPECTED EXITS
SECONDARY NAMENODE CHECKPOINT DIRECTORIES FREE SPACE
SECONDARY NAMENODE FILE DESCRIPTOR
SECONDARY NAMENODE GARBAGE COLLECTION DURATION
SECONDARY NAMENODE HOST HEALTH
SECONDARY NAMENODE LOG DIRECTORY FREE SPACE
SECONDARY NAMENODE CLOUDERA MANAGER AGENT HEALTH
SECONDARY NAMENODE UNEXPECTED EXITS
SECONDARY NAMENODE WEB METRIC COLLECTION
ZOOKEEPER SERVER CONNECTION COUNT
ZOOKEEPER SERVER DATA DIRECTORY FREE SPACE
ZOOKEEPER SERVER DATA LOG DIRECTORY FREE SPACE
ZOOKEEPER SERVER FILE DESCRIPTOR
ZOOKEEPER SERVER GARBAGE COLLECTION DURATION
ZOOKEEPER SERVER HOST HEALTH
ZOOKEEPER SERVER LOG DIRECTORY FREE SPACE
ZOOKEEPER SERVER MAX LATENCY
ZOOKEEPER SERVER OUTSTANDING REQUESTS
ZOOKEEPER SERVER QUORUM MEMBERSHIP
ZOOKEEPER SERVER CLOUDERA MANAGER AGENT HEALTH
ZOOKEEPER SERVER UNEXPECTED EXITS
SERVICE MONITOR FILE DESCRIPTOR
SERVICE MONITOR HOST HEALTH
SERVICE MONITOR LOG DIRECTORY FREE SPACE
SERVICE MONITOR ROLE PIPELINE
SERVICE MONITOR CLOUDERA MANAGER AGENT HEALTH
SERVICE MONITOR UNEXPECTED EXITS
SERVICE MONITOR WEB METRIC COLLECTION
STATESTORE FILE DESCRIPTOR
STATESTORE HOST HEALTH
STATESTORE LOG DIRECTORY FREE SPACE
STATESTORE MEMORY RESIDENT SET SIZE HEALTH
STATESTORE CLOUDERA MANAGER AGENT HEALTH
STATESTORE UNEXPECTED EXITS
STATESTORE WEB METRIC COLLECTION
TASKTRACKER BLACKLISTED
TASKTRACKER CONNECTIVITY
TASKTRACKER FILE DESCRIPTOR
TASKTRACKER GARBAGE COLLECTION DURATION
TASKTRACKER HOST HEALTH
TASKTRACKER LOG DIRECTORY FREE SPACE
TASKTRACKER CLOUDERA MANAGER AGENT HEALTH
TASKTRACKER UNEXPECTED EXITS
TASKTRACKER WEB METRIC COLLECTION
ZOOKEEPER CANARY HEALTH
ZOOKEEPER CURRENT ZXID
ZOOKEEPER SERVERS HEALTH

Cloudera Manager 4.6 Health Checks

Activity Monitor Activity Monitor Pipeline

Details: This Activity Monitor health check checks that no messages are being dropped by the activity monitor stage of the Activity Monitor pipeline. A failure of this health check indicates a problem with the Activity Monitor. This may indicate a configuration problem or a bug in the Activity Monitor. This test can be configured using the Activity Monitor Activity Monitor Pipeline Monitoring Time Period monitoring setting.

Short Name: Activity Monitor Pipeline

Setting: Activity Monitor Activity Monitor Pipeline Monitoring
Description: The health check for monitoring the Activity Monitor activity monitor pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period.
Template name: activitymonitor_activity_monitor_pipeline_
Default: critical:any, warning:never

Setting: Activity Monitor Activity Monitor Pipeline Monitoring Time Period
Description: The time period over which the Activity Monitor activity monitor pipeline will be monitored for dropped messages.
Template name: activitymonitor_activity_monitor_pipeline_window
Default: 5
Unit: MINUTES

Activity Monitor Activity Tree Pipeline

Details: This Activity Monitor health check checks that no messages are being dropped by the activity tree stage of the Activity Monitor pipeline. A failure of this health check indicates a problem with the Activity Monitor. This may indicate a configuration problem or a bug in the Activity Monitor. This test can be configured using the Activity Monitor Activity Tree Pipeline Monitoring Time Period monitoring setting.

Short Name: Activity Tree Pipeline

Setting: Activity Monitor Activity Tree Pipeline Monitoring
Description: The health check for monitoring the Activity Monitor activity tree pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period.
Template name: activitymonitor_activity_tree_pipeline_
Default: critical:any, warning:never

Setting: Activity Monitor Activity Tree Pipeline Monitoring Time Period
Description: The time period over which the Activity Monitor activity tree pipeline will be monitored for dropped messages.
Template name: activitymonitor_activity_tree_pipeline_window
Default: 5
Unit: MINUTES

Activity Monitor File Descriptor

Details: This Activity Monitor health check checks that the number of file descriptors used does not rise above some percentage of the Activity Monitor file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Activity Monitor monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template name: activitymonitor_fd_
Default: critical: , warning:
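The threshold notation used by the settings above (for example critical:any, warning:never) can be read as a pair of limits checked in order. The following is a minimal sketch of that reading, not Cloudera Manager's implementation. It assumes that "any" means any nonzero value, that "never" disables a level, and that a value equal to a numeric threshold already trips it. The function names, the "Concerning" label for the warning level, and the numeric thresholds 70 and 50 are hypothetical placeholders, since the actual default values are not given here.

```python
def evaluate(value, critical, warning):
    """Classify a metric where higher values are worse.

    Thresholds may be a number, "any" (assumed: any nonzero value trips it),
    or "never" (the level is disabled).
    """
    def tripped(threshold):
        if threshold == "never":
            return False
        if threshold == "any":
            return value > 0
        return value >= float(threshold)

    if tripped(critical):
        return "Bad"
    if tripped(warning):
        return "Concerning"  # hypothetical label for the warning level
    return "Good"


def fd_usage_percent(used, limit):
    """File descriptors in use, as a percentage of the process limit."""
    return 100.0 * used / limit


# Dropped-message pipeline checks with critical:any, warning:never
print(evaluate(0, "any", "never"))
print(evaluate(3, "any", "never"))

# File descriptor check with placeholder thresholds (70 critical, 50 warning)
print(evaluate(fd_usage_percent(600, 1024), 70, 50))
```

With critical:any, a single dropped message in the monitoring window is enough to fail the check, which matches the description of the pipeline checks as verifying that no messages are dropped.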

Activity Monitor Host Health

Details: This Activity Monitor health check factors in the health of the host upon which the Activity Monitor is running. A failure of this check means that the host running the Activity Monitor is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Activity Monitor Host Health Check Activity Monitor monitoring setting.

Short Name: Host Health

Setting: Activity Monitor Host Health Check
Description: When computing the overall Activity Monitor health, consider the host's health.
Template name: activitymonitor_host_health_enabled

Activity Monitor Log Directory Free Space

Details: This Activity Monitor health check checks that the filesystem containing the log directory of this Activity Monitor has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Activity Monitor monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.
Template name: log_directory_free_space_absolute_
Default: critical: , warning:
Unit: BYTES

Setting: Log Directory Free Space Monitoring Percentage
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
Template name: log_directory_free_space_percentage_
Default: critical:never, warning:never

Activity Monitor Cloudera Manager Agent Health

Details: This Activity Monitor health check checks that the Cloudera Manager Agent on the Activity Monitor host is heartbeating correctly and that the process associated with the Activity Monitor role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Activity Monitor process, a lack of connectivity to the Cloudera Manager Agent on the Activity Monitor host, or a problem with the Cloudera Manager Agent. This check can fail either because the Activity Monitor has crashed or because the Activity Monitor will not start or stop in a timely fashion. Check the Activity Monitor logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Activity Monitor host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Activity Monitor host, or look in the Cloudera Manager Agent logs on the Activity Monitor host for more details. This test can be enabled or disabled using the Activity Monitor Process Health Check Activity Monitor monitoring setting.

Short Name: Process Status

Setting: Activity Monitor Process Health Check
Description: Enables the health check that the Activity Monitor's process state is consistent with the role configuration.
Template name: activitymonitor_scm_health_enabled
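When the agent status command mentioned above has to be run on several hosts, it can help to wrap it in a small script. The sketch below only shells out to the /etc/init.d/cloudera-scm-agent status command named in the text and returns whatever it prints. The function name check_agent_status and the timeout value are illustrative assumptions, and the script does not try to interpret the exit code.

```python
import subprocess

# Path taken from the text above; override it if the agent is installed elsewhere.
AGENT_INIT_SCRIPT = "/etc/init.d/cloudera-scm-agent"


def check_agent_status(script=AGENT_INIT_SCRIPT, timeout=15):
    """Run '<script> status' and return (exit_code, combined_output).

    Returns (None, message) if the command cannot be run or times out.
    """
    try:
        result = subprocess.run(
            [script, "status"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except OSError as exc:  # script missing, not executable, etc.
        return None, f"could not run {script}: {exc}"
    except subprocess.TimeoutExpired:
        return None, f"{script} status timed out after {timeout}s"
    return result.returncode, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    code, output = check_agent_status()
    print(f"exit code: {code}")
    print(output)
```

Run it on the host that the failing role is assigned to. If the command cannot be run at all, the Cloudera Manager Agent logs on that host are the next place to look.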

Activity Monitor Unexpected Exits

Details: This Activity Monitor health check checks that the Activity Monitor has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Activity Monitor monitoring settings.

Short Name: Unexpected Exits

Setting: Unexpected Exits Monitoring Period
Description: The period to review when computing unexpected exits.
Template name: unexpected_exits_window
Default: 5
Unit: MINUTES

Setting: Unexpected Exits
Description: The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
Template name: unexpected_exits_
Default: critical:any, warning:never

Activity Monitor Web Metric Collection

Details: This Activity Monitor health check checks that the web server of the Activity Monitor is responding quickly to requests by the Cloudera Manager Agent, and that the Cloudera Manager Agent can collect metrics from the web server. A failure of this health check may indicate a problem with the web server of the Activity Monitor, a misconfiguration of the Activity Monitor, or a problem with the Cloudera Manager Agent. Consult the Cloudera Manager Agent logs and the logs of the Activity Monitor for more detail.

If the test's failure message indicates a communication problem, the Cloudera Manager Agent's HTTP requests to the Activity Monitor's web server are failing or timing out. These requests are completely local to the Activity Monitor's host, and so should never fail under normal conditions.

If the test's failure message indicates an unexpected response, then the Activity Monitor's web server responded to the Cloudera Manager Agent's request, but the Cloudera Manager Agent could not interpret the response for some reason. This test can be configured using the Web Metric Collection Activity Monitor monitoring setting.

Short Name: Web Server Status

Setting: Web Metric Collection
Description: Enables the health check that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.
Template name: activitymonitor_web_metric_collection_enabled

Flume Agent File Descriptor

Details: This Agent health check checks that the number of file descriptors used does not rise above some percentage of the Agent file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Agent monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template name: flume_agent_fd_
Default: critical: , warning:

Flume Agent Host Health

Details: This Agent health check factors in the health of the host upon which the Agent is running. A failure of this check means that the host running the Agent is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Flume Agent Host Health Check Agent monitoring setting.

Short Name: Host Health

Setting: Flume Agent Host Health Check
Description: When computing the overall Flume Agent health, consider the host's health.
Template name: flume_agent_host_health_enabled

Flume Agent Log Directory Free Space

Details: This Agent health check checks that the filesystem containing the log directory of this Agent has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Agent monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.
Template name: log_directory_free_space_absolute_
Default: critical: , warning:
Unit: BYTES

Setting: Log Directory Free Space Monitoring Percentage
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
Template name: log_directory_free_space_percentage_
Default: critical:never, warning:never
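The relationship between the two free space settings above (the percentage setting is ignored once an absolute setting is configured) can be sketched as follows. This is an illustration under stated assumptions rather than Cloudera Manager's code: the function names, the "Concerning" label for the warning level, the use of None to stand for "never", and every threshold value in the usage lines are hypothetical.

```python
import shutil


def classify_free_space(free_bytes, total_bytes, absolute=None, percentage=None):
    """Classify free space; lower values are worse.

    absolute   -- (critical_bytes, warning_bytes) or None if not configured
    percentage -- (critical_pct, warning_pct) or None if not configured
    A threshold of None inside a pair stands for "never".
    The absolute pair takes precedence when both are configured.
    """
    if absolute is not None:
        measured, (critical, warning) = free_bytes, absolute
    elif percentage is not None:
        measured, (critical, warning) = 100.0 * free_bytes / total_bytes, percentage
    else:
        return "Good"

    if critical is not None and measured < critical:
        return "Bad"
    if warning is not None and measured < warning:
        return "Concerning"  # hypothetical label for the warning level
    return "Good"


def log_directory_health(path, absolute=None, percentage=None):
    """Apply classify_free_space to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.free, usage.total, absolute, percentage)


GIB = 1024 ** 3
# Placeholder thresholds: critical below 5 GiB free, warning below 10 GiB free.
print(classify_free_space(8 * GIB, 100 * GIB, absolute=(5 * GIB, 10 * GIB)))
# The percentage pair is ignored here because an absolute pair is configured.
print(classify_free_space(3 * GIB, 100 * GIB,
                          absolute=(1 * GIB, 2 * GIB), percentage=(50, 60)))
```

Keeping the classification separate from the disk lookup makes the precedence rule easy to test without touching a real filesystem.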

Flume Agent Cloudera Manager Agent Health

Details: This Agent health check checks that the Cloudera Manager Agent on the Agent host is heartbeating correctly and that the process associated with the Agent role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Agent process, a lack of connectivity to the Cloudera Manager Agent on the Agent host, or a problem with the Cloudera Manager Agent. This check can fail either because the Agent has crashed or because the Agent will not start or stop in a timely fashion. Check the Agent logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Agent host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Agent host, or look in the Cloudera Manager Agent logs on the Agent host for more details. This test can be enabled or disabled using the Flume Agent Process Health Check Agent monitoring setting.

Short Name: Process Status

Setting: Flume Agent Process Health Check
Description: Enables the health check that the Flume Agent's process state is consistent with the role configuration.
Template name: flume_agent_scm_health_enabled

Flume Agent Unexpected Exits

Details: This Agent health check checks that the Agent has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Agent monitoring settings.

Short Name: Unexpected Exits

Setting: Unexpected Exits Monitoring Period
Description: The period to review when computing unexpected exits.
Template name: unexpected_exits_window
Default: 5
Unit: MINUTES

Setting: Unexpected Exits
Description: The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
Template name: unexpected_exits_
Default: critical:any, warning:never

Alert Publisher File Descriptor

Details: This Alert Publisher health check checks that the number of file descriptors used does not rise above some percentage of the Alert Publisher file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Alert Publisher monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template name: alertpublisher_fd_
Default: critical: , warning:

Alert Publisher Host Health

Details: This Alert Publisher health check factors in the health of the host upon which the Alert Publisher is running. A failure of this check means that the host running the Alert Publisher is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Alert Publisher Host Health Check Alert Publisher monitoring setting.

Short Name: Host Health

Setting: Alert Publisher Host Health Check
Description: When computing the overall Alert Publisher health, consider the host's health.
Template name: alertpublisher_host_health_enabled

Alert Publisher Log Directory Free Space

Details: This Alert Publisher health check checks that the filesystem containing the log directory of this Alert Publisher has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Alert Publisher monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.
Template name: log_directory_free_space_absolute_
Default: critical: , warning:
Unit: BYTES

Setting: Log Directory Free Space Monitoring Percentage
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
Template name: log_directory_free_space_percentage_
Default: critical:never, warning:never
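The Unexpected Exits checks in this document combine a monitoring period with a critical threshold. A rough sketch of that combination, using the worked example from the text (a critical threshold of 1 and a 5 minute window), might look like this. The function name and the list-of-timestamps input are assumptions made for illustration. How Cloudera Manager actually records process exits is not described here.

```python
from datetime import datetime, timedelta


def unexpected_exits_health(exit_times, now, window=timedelta(minutes=5), critical=1):
    """Return "Bad" if at least `critical` unexpected exits fall inside the window.

    exit_times -- datetimes at which the role exited unexpectedly (assumed input)
    now        -- the time at which the check is evaluated
    """
    window_start = now - window
    recent = [t for t in exit_times if window_start <= t <= now]
    return "Bad" if len(recent) >= critical else "Good"


now = datetime(2013, 8, 8, 12, 0, 0)
exits = [now - timedelta(minutes=12), now - timedelta(minutes=2)]

print(unexpected_exits_health(exits, now))       # one exit inside the last 5 minutes
print(unexpected_exits_health(exits[:1], now))   # only the 12-minute-old exit
```

An exit older than the monitoring period no longer counts, so under this reading the check recovers on its own once the window has passed without a new unexpected exit.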

Alert Publisher Cloudera Manager Agent Health

Details: This Alert Publisher health check checks that the Cloudera Manager Agent on the Alert Publisher host is heartbeating correctly and that the process associated with the Alert Publisher role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Alert Publisher process, a lack of connectivity to the Cloudera Manager Agent on the Alert Publisher host, or a problem with the Cloudera Manager Agent. This check can fail either because the Alert Publisher has crashed or because the Alert Publisher will not start or stop in a timely fashion. Check the Alert Publisher logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Alert Publisher host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Alert Publisher host, or look in the Cloudera Manager Agent logs on the Alert Publisher host for more details. This test can be enabled or disabled using the Alert Publisher Process Health Check Alert Publisher monitoring setting.

Short Name: Process Status

Setting: Alert Publisher Process Health Check
Description: Enables the health check that the Alert Publisher's process state is consistent with the role configuration.
Template name: alertpublisher_scm_health_enabled

Alert Publisher Unexpected Exits

Details: This Alert Publisher health check checks that the Alert Publisher has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits reaches a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Alert Publisher monitoring settings.

Short Name: Unexpected Exits

Setting: Unexpected Exits Monitoring Period
Description: The period to review when computing unexpected exits.
Template name: unexpected_exits_window
Default: 5
Unit: MINUTES

Setting: Unexpected Exits
Description: The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
Template name: unexpected_exits_
Default: critical:any, warning:never

DataNode Block Count

Details: This is a DataNode health check that checks for whether the DataNode has too many blocks. A failure of this health check indicates that there may be performance problems with the DataNode. See the DataNode system for more information. This test can be enabled or disabled using the DataNode Block Count DataNode monitoring setting.

Short Name: Block Count

Setting: DataNode Block Count
Description: The health check of the number of blocks on a DataNode.
Template name: datanode_block_count_
Default: critical:never, warning:

DataNode Connectivity

Details: This is a DataNode health check that checks that the NameNode considers the DataNode alive. A failure of this health check may indicate that the DataNode is having trouble communicating with the NameNode. Look in the DataNode logs for more details. This test can be enabled or disabled using the DataNode Connectivity Health Check DataNode monitoring setting. The DataNode Connectivity Tolerance at Startup DataNode monitoring setting and the Health Check Startup Tolerance NameNode monitoring setting can be used to control the check's tolerance windows around DataNode and NameNode restarts respectively.

Short Name: NameNode Connectivity

Setting: DataNode Connectivity Health Check
Description: Enables the health check that verifies the DataNode is connected to the NameNode.
Template name: datanode_connectivity_health_enabled

Setting: DataNode Connectivity Tolerance at Startup
Description: The amount of time to wait for the DataNode to fully start up and connect to the NameNode before enforcing the connectivity check.
Template name: datanode_connectivity_tolerance
Default: 180
Unit: SECONDS

Setting: Health Check Startup Tolerance
Description: The amount of time allowed after this role is started that failures of health checks that rely on communication with this role will be tolerated.
Template name: namenode_startup_tolerance
Default: 5
Unit: MINUTES

DataNode File Descriptor

Details: This DataNode health check checks that the number of file descriptors used does not rise above some percentage of the DataNode file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring DataNode monitoring setting.

Short Name: File Descriptors

Setting: File Descriptor Monitoring
Description: The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
Template name: datanode_fd_
Default: critical: , warning:
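For the DataNode Connectivity check described earlier in this section, the two startup tolerance settings define windows after a DataNode or NameNode restart during which connectivity failures are tolerated. One way to picture this, assuming both windows simply have to elapse before the check is enforced, is sketched below. The function names, the uptime inputs, and the choice to return None while the check is not enforced are illustrative assumptions. The 180 second and 5 minute defaults come from the settings listed above.

```python
DATANODE_CONNECTIVITY_TOLERANCE_S = 180   # datanode_connectivity_tolerance default
NAMENODE_STARTUP_TOLERANCE_S = 5 * 60     # namenode_startup_tolerance default


def connectivity_check_enforced(datanode_uptime_s, namenode_uptime_s,
                                datanode_tolerance_s=DATANODE_CONNECTIVITY_TOLERANCE_S,
                                namenode_tolerance_s=NAMENODE_STARTUP_TOLERANCE_S):
    """True once both roles have been up longer than their tolerance windows."""
    return (datanode_uptime_s >= datanode_tolerance_s
            and namenode_uptime_s >= namenode_tolerance_s)


def connectivity_health(namenode_considers_alive, datanode_uptime_s, namenode_uptime_s):
    """Return "Good", "Bad", or None while failures are still being tolerated."""
    if namenode_considers_alive:
        return "Good"
    if not connectivity_check_enforced(datanode_uptime_s, namenode_uptime_s):
        return None  # inside a startup tolerance window; failure not reported yet
    return "Bad"


# DataNode restarted 60 s ago and is not yet registered: still tolerated.
print(connectivity_health(False, datanode_uptime_s=60, namenode_uptime_s=3600))
# Ten minutes after the restart the same condition is reported as a failure.
print(connectivity_health(False, datanode_uptime_s=600, namenode_uptime_s=3600))
```

Raising either tolerance gives slow-starting roles more time before alerts fire, at the cost of a longer delay before a genuine connectivity problem is reported after a restart.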

DataNode Free Space Remaining

Details: This is a DataNode health check that checks that the amount of free space available for HDFS block data on the DataNode does not fall below some percentage of total configured capacity of the DataNode. A failure of this health check may indicate a capacity planning problem. Try adding more disk capacity and additional data directories to the DataNode, or add additional DataNodes and take steps to rebalance your HDFS cluster. This test can be configured using the DataNode Free Space Monitoring DataNode monitoring setting.

Short Name: Free Space

Setting: DataNode Free Space Monitoring
Description: The health check of free space in a DataNode. Specified as a percentage of the capacity on the DataNode.
Template name: datanode_free_space_
Default: critical: , warning:

DataNode Garbage Collection Duration

Details: This DataNode health check checks that the DataNode is not spending too much time performing Java garbage collection. It checks that no more than some percentage of recent time is spent performing Java garbage collection. A failure of this health check may indicate a capacity planning problem or misconfiguration of the DataNode. This test can be configured using the DataNode Garbage Collection Duration and DataNode Garbage Collection Duration Monitoring Period DataNode monitoring settings.

Short Name: GC Duration

Setting: DataNode Garbage Collection Duration Monitoring Period
Description: The period to review when computing the moving average of garbage collection time.
Template name: datanode_gc_duration_window
Default: 5
Unit: MINUTES

Setting: DataNode Garbage Collection Duration
Description: The health check for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time. See DataNode Garbage Collection Duration Monitoring Period.
Template name: datanode_gc_duration_
Default: critical: , warning:

DataNode High Availability Connectivity

Details: This is a DataNode health check that checks that all running NameNodes in the HDFS service consider the DataNode alive. A failure of this health check may indicate that the DataNode is having trouble communicating with some or all NameNodes in the service. Look in the DataNode logs for more details. This test can be enabled or disabled using the DataNode Connectivity Health Check DataNode monitoring setting. The DataNode Connectivity Tolerance at Startup DataNode monitoring setting and the Health Check Startup Tolerance NameNode monitoring setting can be used to control the check's tolerance windows around DataNode and NameNode restarts respectively.

Short Name: NameNode Connectivity

Setting: DataNode Connectivity Health Check
Description: Enables the health check that verifies the DataNode is connected to the NameNode.
Template name: datanode_connectivity_health_enabled

Setting: DataNode Connectivity Tolerance at Startup
Description: The amount of time to wait for the DataNode to fully start up and connect to the NameNode before enforcing the connectivity check.
Template name: datanode_connectivity_tolerance
Default: 180
Unit: SECONDS

Setting: Health Check Startup Tolerance
Description: The amount of time allowed after this role is started that failures of health checks that rely on communication with this role will be tolerated.
Template name: namenode_startup_tolerance
Default: 5
Unit: MINUTES

DataNode Host Health

Details: This DataNode health check factors in the health of the host upon which the DataNode is running. A failure of this check means that the host running the DataNode is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the DataNode Host Health Check DataNode monitoring setting.

Short Name: Host Health

Setting: DataNode Host Health Check
Description: When computing the overall DataNode health, consider the host's health.
Template name: datanode_host_health_enabled

DataNode Log Directory Free Space

Details: This DataNode health check checks that the filesystem containing the log directory of this DataNode has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage DataNode monitoring settings.

Short Name: Log Directory Free Space

Setting: Log Directory Free Space Monitoring Absolute
Description: The health check for monitoring of free space on the filesystem that contains this role's log directory.
Template name: log_directory_free_space_absolute_
Default: critical: , warning:
Unit: BYTES

Log Directory Free Space Monitoring Percentage
The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
log_directory_free_space_percentage_
critical:never, warning:never

DataNode Cloudera Manager Agent Health

Details: This DataNode health check verifies that the Cloudera Manager Agent on the DataNode host is heartbeating correctly and that the process associated with the DataNode role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the DataNode process, a lack of connectivity to the Cloudera Manager Agent on the DataNode host, or a problem with the Cloudera Manager Agent. This check can fail either because the DataNode has crashed or because the DataNode will not start or stop in a timely fashion. Check the DataNode logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the DataNode host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the DataNode host, or look in the Cloudera Manager Agent logs on the DataNode host for more details. This test can be enabled or disabled using the DataNode Process Health Check DataNode monitoring setting.

Short Name: Process Status

DataNode Process Health Check
Enables the health check that the DataNode's process state is consistent with the role configuration.
datanode_scm_health_enabled
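The precedence between the two Log Directory Free Space settings described above can be sketched as follows. This is a minimal illustration, not Cloudera Manager's implementation: the function names, the Thresholds type, the "Concerning" label for the warning state, the strict less-than comparison, and any sample values are assumptions. Only the rule that the percentage setting is ignored when an absolute setting is configured comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Hypothetical container for a threshold pair; None stands in for 'never'."""
    warning: Optional[float] = None
    critical: Optional[float] = None

    def configured(self) -> bool:
        return self.warning is not None or self.critical is not None


def classify_low(value: float, t: Thresholds) -> str:
    """Free space is a 'lower is worse' metric: less free space means worse health."""
    if t.critical is not None and value < t.critical:
        return "Bad"
    if t.warning is not None and value < t.warning:
        return "Concerning"  # assumed name for the warning state
    return "Good"


def log_dir_health(free_bytes: float, capacity_bytes: float,
                   absolute: Thresholds, percentage: Thresholds) -> str:
    # The absolute setting wins; the percentage setting is consulted
    # only when no absolute threshold is configured.
    if absolute.configured():
        return classify_low(free_bytes, absolute)
    return classify_low(100.0 * free_bytes / capacity_bytes, percentage)
```

In practice the inputs could come from shutil.disk_usage(path), whose free and total fields give the two byte counts for the filesystem that holds the log directory.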

DataNode Unexpected Exits

Details: This DataNode health check verifies that the DataNode has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits goes above a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period DataNode monitoring settings.

Short Name: Unexpected Exits

Unexpected Exits Monitoring Period
The period to review when computing unexpected exits.
unexpected_exits_window
5 MINUTES

Unexpected Exits
The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
unexpected_exits_
critical:any, warning:never

DataNode Volume Failures

Details: This DataNode health check verifies whether the DataNode has reported any failed volumes. A failure of this health check indicates that there is a problem with one or more volumes on the DataNode. See the DataNode system for more information. This test can be configured using the DataNode Volume Failures DataNode monitoring setting.

Short Name: Data Directory Status

DataNode Volume Failures
The health check of failed volumes in a DataNode.
datanode_volume_failures_
critical:any, warning:never
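The windowed counting behind the Unexpected Exits check described above can be illustrated with a short sketch. The function name and signature are hypothetical. The comparison follows the worked example in the text (a threshold of 1 yields "Bad" at one or more exits), and treating critical:any as a threshold of 1 is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable


def unexpected_exits_health(exit_times: Iterable[datetime],
                            now: datetime,
                            window: timedelta = timedelta(minutes=5),
                            critical: int = 1) -> str:
    """Count exits inside the monitoring period and compare to the threshold."""
    # Only exits that fall inside [now - window, now] are "recent".
    recent = sum(1 for t in exit_times if now - window <= t <= now)
    # Per the worked example: with critical=1, one or more recent exits is "Bad".
    return "Bad" if recent >= critical else "Good"
```

An exit that happened ten minutes ago falls outside the default 5 MINUTES window and no longer affects the result.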

DataNode Web Metric Collection

Details: This DataNode health check verifies that the web server of the DataNode is responding quickly to requests by the Cloudera Manager Agent, and that the Cloudera Manager Agent can collect metrics from the web server. A failure of this health check may indicate a problem with the web server of the DataNode, a misconfiguration of the DataNode, or a problem with the Cloudera Manager Agent. Consult the Cloudera Manager Agent logs and the logs of the DataNode for more detail. If the test's failure message indicates a communication problem, this means that the Cloudera Manager Agent's HTTP requests to the DataNode's web server are failing or timing out. These requests are completely local to the DataNode's host, and so should never fail under normal conditions. If the test's failure message indicates an unexpected response, then the DataNode's web server responded to the Cloudera Manager Agent's request, but the Cloudera Manager Agent could not interpret the response for some reason. This test can be configured using the Web Metric Collection DataNode monitoring setting.

Short Name: Web Server Status

Web Metric Collection
Enables the health check that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.
datanode_web_metric_collection_enabled

Event Server Event Store Size

Details: This Event Server health check verifies that the event store size has not grown too far above the configured event store capacity. A failure of this health check indicates that the Event Server is having a problem performing cleanup. This may indicate a configuration problem or bug in the Event Server. This test can be configured using the Event Store Capacity Monitoring Event Server monitoring setting.

Short Name: Event Store Size

Event Store Capacity Monitoring
The health check on the number of events in the event store. Specified as a percentage of the maximum number of events in the Event Server store.
eventserver_capacity_
critical: , warning:

Event Server File Descriptor

Details: This Event Server health check verifies that the number of file descriptors used does not rise above some percentage of the Event Server file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Event Server monitoring setting.

Short Name: File Descriptors

File Descriptor Monitoring
The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
eventserver_fd_
critical: , warning:

Event Server Host Health

Details: This Event Server health check factors in the health of the host upon which the Event Server is running. A failure of this check means that the host running the Event Server is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the Event Server Host Health Check Event Server monitoring setting.

Short Name: Host Health

Event Server Host Health Check
When computing the overall Event Server health, consider the host's health.
eventserver_host_health_enabled
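The File Descriptor check described above compares descriptor usage against a percentage of the limit. The sketch below shows that arithmetic under stated assumptions: the function names and the "Concerning" label are hypothetical, the threshold percentages are caller-supplied because this document does not state the defaults, and current_process_fds is a Linux-only helper that inspects the calling process rather than an Event Server.

```python
import os


def fd_usage_percent(used: int, limit: int) -> float:
    """Express descriptor usage as a percentage of the descriptor limit."""
    if limit <= 0:
        raise ValueError("limit must be positive (an unlimited limit has no percentage)")
    return 100.0 * used / limit


def fd_health(used: int, limit: int, warning_pct: float, critical_pct: float) -> str:
    """'Higher is worse': health degrades once usage rises above a threshold."""
    pct = fd_usage_percent(used, limit)
    if pct > critical_pct:
        return "Bad"
    if pct > warning_pct:
        return "Concerning"  # assumed name for the warning state
    return "Good"


def current_process_fds():
    """Linux-only illustration: open descriptors and soft limit of this process."""
    import resource  # Unix-only standard library module
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    used = len(os.listdir("/proc/self/fd"))
    return used, soft
```

For example, 800 descriptors in use against a limit of 1024 is about 78 percent, which would exceed a hypothetical critical threshold of 70.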

Event Server Index Directory Free Space

Details: This Event Server health check verifies that the filesystem containing the index directory of this Event Server has sufficient free space. This test can be configured using the Index Directory Free Space Monitoring Absolute and Index Directory Free Space Monitoring Percentage Event Server monitoring settings.

Short Name: Index Directory Free Space

Index Directory Free Space Monitoring Absolute
The health check for monitoring of free space on the filesystem that contains the index directory.
eventserver_index_directory_free_space_absolute_
critical: , warning: BYTES

Index Directory Free Space Monitoring Percentage
The health check for monitoring of free space on the filesystem that contains the index directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if an Index Directory Free Space Monitoring Absolute setting is configured.
eventserver_index_directory_free_space_percentage_
critical:never, warning:never

Event Server Log Directory Free Space

Details: This Event Server health check verifies that the filesystem containing the log directory of this Event Server has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Event Server monitoring settings.

Short Name: Log Directory Free Space

Log Directory Free Space Monitoring Absolute
The health check for monitoring of free space on the filesystem that contains this role's log directory.
log_directory_free_space_absolute_
critical: , warning: BYTES

Log Directory Free Space Monitoring Percentage
The health check for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute setting is configured.
log_directory_free_space_percentage_
critical:never, warning:never

Event Server Cloudera Manager Agent Health

Details: This Event Server health check verifies that the Cloudera Manager Agent on the Event Server host is heartbeating correctly and that the process associated with the Event Server role is in the state expected by Cloudera Manager. A failure of this health check may indicate a problem with the Event Server process, a lack of connectivity to the Cloudera Manager Agent on the Event Server host, or a problem with the Cloudera Manager Agent. This check can fail either because the Event Server has crashed or because the Event Server will not start or stop in a timely fashion. Check the Event Server logs for more details. If the check fails because of problems communicating with the Cloudera Manager Agent on the Event Server host, check the status of the Cloudera Manager Agent by running /etc/init.d/cloudera-scm-agent status on the Event Server host, or look in the Cloudera Manager Agent logs on the Event Server host for more details. This test can be enabled or disabled using the Event Server Process Health Check Event Server monitoring setting.

Short Name: Process Status

Event Server Process Health Check
Enables the health check that the Event Server's process state is consistent with the role configuration.
eventserver_scm_health_enabled
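The agent status command mentioned above can be wrapped in a small script when several hosts need checking. This is a sketch under assumptions: the function name and the injectable run parameter are hypothetical, and interpreting exit code 0 as "running" relies on the common init-script convention rather than on anything this document states about the cloudera-scm-agent script.

```python
import subprocess

# Command taken from the text above; it must be run on the role's host.
AGENT_STATUS_CMD = ["/etc/init.d/cloudera-scm-agent", "status"]


def agent_is_running(run=subprocess.run) -> bool:
    """Return True if the status command exits with code 0.

    Assumption: exit code 0 means the agent is running, as is
    conventional for init scripts. `run` is injectable for testing.
    """
    try:
        result = run(AGENT_STATUS_CMD, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # Script missing, not executable, or hung: treat as not running.
        return False
    return result.returncode == 0
```

A False result only narrows the problem down; the Cloudera Manager Agent logs on the host remain the place to look for the cause.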

Event Server Unexpected Exits

Details: This Event Server health check verifies that the Event Server has not recently exited unexpectedly. The check returns "Bad" health if the number of unexpected exits goes above a critical threshold. For example, if this check is configured with a critical threshold of 1, this check would return "Good" health if there have been no unexpected exits recently. If there have been one or more unexpected exits recently, this check would return "Bad" health. This test can be configured using the Unexpected Exits and Unexpected Exits Monitoring Period Event Server monitoring settings.

Short Name: Unexpected Exits

Unexpected Exits Monitoring Period
The period to review when computing unexpected exits.
unexpected_exits_window
5 MINUTES

Unexpected Exits
The health check for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.
unexpected_exits_
critical:any, warning:never

Event Server Web Metric Collection

Details: This Event Server health check verifies that the web server of the Event Server is responding quickly to requests by the Cloudera Manager Agent, and that the Cloudera Manager Agent can collect metrics from the web server. A failure of this health check may indicate a problem with the web server of the Event Server, a misconfiguration of the Event Server, or a problem with the Cloudera Manager Agent. Consult the Cloudera Manager Agent logs and the logs of the Event Server for more detail. If the test's failure message indicates a communication problem, this means that the Cloudera Manager Agent's HTTP requests to the Event Server's web server are failing or timing out. These requests are completely local to the Event Server's host, and so should never fail under normal conditions. If the test's failure message indicates an unexpected response, then the Event Server's web server responded to the Cloudera Manager Agent's request, but the Cloudera Manager Agent could not interpret the response for some reason. This test can be configured using the Web Metric Collection Event Server monitoring setting.

Short Name: Web Server Status
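The two failure modes described above, a communication problem versus an unexpected response, can be told apart with a probe along these lines. It is an illustrative sketch, not the agent's actual logic: the function name, the injectable fetch parameter, and the assumption that the metrics payload is JSON are all hypothetical, and the URL passed in is a placeholder the caller must supply.

```python
import json
import urllib.request


def probe_metrics(url: str, timeout: float = 5.0, fetch=None) -> str:
    """Classify a metrics request as ok, communication problem, or unexpected response."""
    if fetch is None:
        def fetch(u, t):
            with urllib.request.urlopen(u, timeout=t) as resp:
                return resp.read()
    try:
        body = fetch(url, timeout)
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all
        # OSError subclasses: the request failed or timed out.
        return "communication problem"
    try:
        json.loads(body)  # assumption: the payload is expected to be JSON
    except ValueError:
        # The server answered, but the response could not be interpreted.
        return "unexpected response"
    return "ok"
```

Because the real requests are local to the role's host, a "communication problem" result usually points at the web server or the role process itself rather than at the network.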

Web Metric Collection
Enables the health check that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.
eventserver_web_metric_collection_enabled

Event Server Write Pipeline

Details: This Event Server health check verifies that no messages are being dropped by the writer stage of the Event Server pipeline. A failure of this health check indicates a problem with the Event Server. This may indicate a configuration problem or a bug in the Event Server. This test can be configured using the Event Server Write Pipeline Monitoring Time Period monitoring setting.

Short Name: Write Pipeline

Event Server Write Pipeline Monitoring
The health check for monitoring the Event Server write pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period.
eventserver_write_pipeline_
critical:any, warning:never

Event Server Write Pipeline Monitoring Time Period
The time period over which the Event Server write pipeline will be monitored for dropped messages.
eventserver_write_pipeline_window
5 MINUTES

Failover Controller File Descriptor

Details: This Failover Controller health check verifies that the number of file descriptors used does not rise above some percentage of the Failover Controller file descriptor limit. A failure of this health check may indicate a bug in either Hadoop or Cloudera Manager. Contact Cloudera support. This test can be configured using the File Descriptor Monitoring Failover Controller monitoring setting.

Short Name: File Descriptors

File Descriptor Monitoring
The health check of the number of file descriptors used. Specified as a percentage of file descriptor limit.
failovercontroller_fd_
critical: , warning:

Failover Controller Host Health

Details: This Failover Controller health check factors in the health of the host upon which the Failover Controller is running. A failure of this check means that the host running the Failover Controller is experiencing some problem. See that host's status page for more details. This test can be enabled or disabled using the FailoverController Host Health Check Failover Controller monitoring setting.

Short Name: Host Health

FailoverController Host Health Check
When computing the overall FailoverController health, consider the host's health.
failovercontroller_host_health_enabled

Failover Controller Log Directory Free Space

Details: This Failover Controller health check verifies that the filesystem containing the log directory of this Failover Controller has sufficient free space. This test can be configured using the Log Directory Free Space Monitoring Absolute and Log Directory Free Space Monitoring Percentage Failover Controller monitoring settings.

Short Name: Log Directory Free Space

Log Directory Free Space Monitoring Absolute
The health check for monitoring of free space on the filesystem that contains this role's log directory.
log_directory_free_space_absolute_
critical: , warning: BYTES


More information

HP SiteScope. Hadoop Cluster Monitoring Solution Template Best Practices. For the Windows, Solaris, and Linux operating systems

HP SiteScope. Hadoop Cluster Monitoring Solution Template Best Practices. For the Windows, Solaris, and Linux operating systems HP SiteScope For the Windows, Solaris, and Linux operating systems Software Version: 11.23 Hadoop Cluster Monitoring Solution Template Best Practices Document Release Date: December 2013 Software Release

More information

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data

More information

Hadoop. History and Introduction. Explained By Vaibhav Agarwal

Hadoop. History and Introduction. Explained By Vaibhav Agarwal Hadoop History and Introduction Explained By Vaibhav Agarwal Agenda Architecture HDFS Data Flow Map Reduce Data Flow Hadoop Versions History Hadoop version 2 Hadoop Architecture HADOOP (HDFS) Data Flow

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Introduction to Hyper-V High- Availability with Failover Clustering

Introduction to Hyper-V High- Availability with Failover Clustering Introduction to Hyper-V High- Availability with Failover Clustering Lab Guide This lab is for anyone who wants to learn about Windows Server 2012 R2 Failover Clustering, focusing on configuration for Hyper-V

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

HDFS Reliability. Tom White, Cloudera, 12 January 2008

HDFS Reliability. Tom White, Cloudera, 12 January 2008 HDFS Reliability Tom White, Cloudera, 12 January 2008 The Hadoop Distributed Filesystem (HDFS) is a distributed storage system for reliably storing petabytes of data on clusters of commodity hardware.

More information

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

Introduction to HDFS. Prasanth Kothuri, CERN

Introduction to HDFS. Prasanth Kothuri, CERN Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. Hadoop

More information

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014 Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability

More information

Cloudera ODBC Driver for Apache Hive Version 2.5.16

Cloudera ODBC Driver for Apache Hive Version 2.5.16 Cloudera ODBC Driver for Apache Hive Version 2.5.16 Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

7 Deadly Hadoop Misconfigurations. Kathleen Ting February 2013

7 Deadly Hadoop Misconfigurations. Kathleen Ting February 2013 7 Deadly Hadoop Misconfigurations Kathleen Ting February 2013 Who Am I? Kathleen Ting Apache Sqoop Committer, PMC Member Customer Operations Engineering Mgr, Cloudera @kate_ting, kathleen@apache.org 2

More information

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013 Hadoop 101 Lars George NoSQL- Ma4ers, Cologne April 26, 2013 1 What s Ahead? Overview of Apache Hadoop (and related tools) What it is Why it s relevant How it works No prior experience needed Feel free

More information

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

Hadoop Distributed File System. Jordan Prosch, Matt Kipps Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

Memory-to-memory session replication

Memory-to-memory session replication Memory-to-memory session replication IBM WebSphere Application Server V7 This presentation will cover memory-to-memory session replication in WebSphere Application Server V7. WASv7_MemorytoMemoryReplication.ppt

More information

HDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc.

HDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. HDFS Under the Hood Sanjay Radia Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. 1 Outline Overview of Hadoop, an open source project Design of HDFS On going work 2 Hadoop Hadoop provides a framework

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

HADOOP MOCK TEST HADOOP MOCK TEST

HADOOP MOCK TEST HADOOP MOCK TEST http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at

More information

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Understanding Big Data and Big Data Analytics Getting familiar with Hadoop Technology Hadoop release and upgrades

More information

Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网. Information Management. Information Management IBM CDL Lab

Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网. Information Management. Information Management IBM CDL Lab IBM CDL Lab Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网 Information Management 2012 IBM Corporation Agenda Hadoop 技 术 Hadoop 概 述 Hadoop 1.x Hadoop 2.x Hadoop 生 态

More information

Pivotal HD Enterprise

Pivotal HD Enterprise PRODUCT DOCUMENTATION Pivotal HD Enterprise Version 1.1 Stack and Tool Reference Guide Rev: A01 2013 GoPivotal, Inc. Table of Contents 1 Pivotal HD 1.1 Stack - RPM Package 11 1.1 Overview 11 1.2 Accessing

More information

Cloudera Navigator Installation and User Guide

Cloudera Navigator Installation and User Guide Cloudera Navigator Installation and User Guide Important Notice (c) 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or

More information

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop

More information

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems Processing of massive data: MapReduce 2. Hadoop 1 MapReduce Implementations Google were the first that applied MapReduce for big data analysis Their idea was introduced in their seminal paper MapReduce:

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Operations and Big Data: Hadoop, Hive and Scribe. Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011

Operations and Big Data: Hadoop, Hive and Scribe. Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011 Operations and Big Data: Hadoop, Hive and Scribe Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011 Agenda 1 Operations: Challenges and Opportunities 2 Big Data Overview 3 Operations with Big Data 4 Big Data

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

Introduction to HDFS. Prasanth Kothuri, CERN

Introduction to HDFS. Prasanth Kothuri, CERN Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. HDFS

More information

Comparing Scalable NOSQL Databases

Comparing Scalable NOSQL Databases Comparing Scalable NOSQL Databases Functionalities and Measurements Dory Thibault UCL Contact : thibault.dory@student.uclouvain.be Sponsor : Euranova Website : nosqlbenchmarking.com February 15, 2011 Clarications

More information

Hadoop Distributed File System. Dhruba Borthakur June, 2007

Hadoop Distributed File System. Dhruba Borthakur June, 2007 Hadoop Distributed File System Dhruba Borthakur June, 2007 Goals of HDFS Very Large Distributed File System 10K nodes, 100 million files, 10 PB Assumes Commodity Hardware Files are replicated to handle

More information

Polycom CMA System Upgrade Guide

Polycom CMA System Upgrade Guide Polycom CMA System Upgrade Guide 5.0 May 2010 3725-77606-001C Trademark Information Polycom, the Polycom Triangles logo, and the names and marks associated with Polycom s products are trademarks and/or

More information

Important Notice. (c) 2010-2015 Cloudera, Inc. All rights reserved.

Important Notice. (c) 2010-2015 Cloudera, Inc. All rights reserved. Cloudera Security Important Notice (c) 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document

More information

Taming Operations in the Apache Hadoop Ecosystem. Jon Hsieh, jon@cloudera.com Kate Ting, kate@cloudera.com USENIX LISA 14 Nov 14, 2014

Taming Operations in the Apache Hadoop Ecosystem. Jon Hsieh, jon@cloudera.com Kate Ting, kate@cloudera.com USENIX LISA 14 Nov 14, 2014 Taming Operations in the Apache Hadoop Ecosystem Jon Hsieh, jon@cloudera.com Kate Ting, kate@cloudera.com USENIX LISA 14 Nov 14, 2014 $ whoami Jon Hsieh, Cloudera Software engineer HBase Tech Lead Apache

More information

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security

More information

Pipeliner CRM Phaenomena Guide Sales Pipeline Management. 2015 Pipelinersales Inc. www.pipelinersales.com

Pipeliner CRM Phaenomena Guide Sales Pipeline Management. 2015 Pipelinersales Inc. www.pipelinersales.com Sales Pipeline Management 2015 Pipelinersales Inc. www.pipelinersales.com Sales Pipeline Management Learn how to manage sales opportunities with Pipeliner Sales CRM Application. CONTENT 1. Configuring

More information

HDFS Architecture Guide

HDFS Architecture Guide by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

COSC 6397 Big Data Analytics. Distributed File Systems (II) Edgar Gabriel Spring 2014. HDFS Basics

COSC 6397 Big Data Analytics. Distributed File Systems (II) Edgar Gabriel Spring 2014. HDFS Basics COSC 6397 Big Data Analytics Distributed File Systems (II) Edgar Gabriel Spring 2014 HDFS Basics An open-source implementation of Google File System Assume that node failure rate is high Assumes a small

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com Hortonworks Data Platform: Hadoop High Availability Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform, powered by Apache Hadoop, is a massively

More information

Veritas Cluster Server Application Note: Disaster Recovery for Microsoft SharePoint Server

Veritas Cluster Server Application Note: Disaster Recovery for Microsoft SharePoint Server Veritas Cluster Server Application Note: Disaster Recovery for Microsoft SharePoint Server Windows Server 2003, Windows Server 2008 5.1 Veritas Cluster Server Application Note: Disaster Recovery for Microsoft

More information

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22 nd October 2013 10:00 Sesión B - DB2 LUW 1 Agenda Big Data The Technical Challenges Architecture of Hadoop

More information