Cloudera Navigator Installation and User Guide
Important Notice

(c) 2010-2013 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Cloudera, Inc.
220 Portage Avenue
Palo Alto, CA 94306
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Release Information
Version: 1.0.0
Date: June 4, 2013
Table of Contents

Cloudera Navigator Installation and User Guide
Introducing Cloudera Navigator
  Cloudera Navigator Architecture
  Service Versions and Audited Operations
    HDFS
    HBase
    Hive
    Hue
Installing Cloudera Navigator
  Requirements
  New Installation
  Upgrading an Existing Cloudera Manager Installation
Configuring Navigator Server
Configuring Service Auditing
  Service Auditing Properties
  Configuring Service Auditing Properties
Audit Event Logs
  Viewing Audit Event Logs
  Filtering Events
    Adding Filters
    Selecting a Time Range
    Removing Filters
    Modifying Filters
  Downloading Audit Event Logs
Downloading HDFS Directory Access Permission Reports
Cloudera Navigator Installation and User Guide

This guide explains how to install, configure, and use Cloudera Navigator.

- Introducing Cloudera Navigator
- Installing Cloudera Navigator
- Configuring Navigator Server
- Configuring Service Auditing
- Audit Event Logs
- Downloading HDFS Directory Access Permission Reports
Introducing Cloudera Navigator

Cloudera Navigator is the first fully integrated data management tool for the Hadoop platform. Cloudera Navigator 1.0 provides data governance capabilities such as verifying access privileges and auditing access to all data stored in Hadoop. These capabilities are critical for enterprise customers in highly regulated industries with stringent compliance requirements.

Cloudera Navigator tracks access permissions and actual accesses to all data objects in Hive, HBase, and HDFS to help answer questions such as:

- Who has access to which data objects?
- Which data objects were accessed by a user?
- When was a data object accessed, and by whom?
- What data assets were accessed using a service?
- Which device was used to access the data?

In the current release, Cloudera Navigator supports tracking access to:

- HDFS data accessed through HDFS, Hive, and HBase operations
- Hive metadata

Cloudera Navigator allows administrators to configure, collect, and view audit events, to understand who accessed what data and how. The information in an audit event includes:

- Timestamp - The date and time of the access.
- Operation - The operation performed on the object. For example, list an HDFS directory, create a Hive table, or put an HBase object.
- Object accessed - The object that was accessed. For example, a Hive table, an HDFS file or directory, or an HBase table.
- User - The principal that accessed the object. Typically, this is a username. Where appropriate, this is annotated with the authentication mechanism.
- IP address - The address of the machine that accessed the object.
- Service - The service instance through which the data was accessed. For example, a Hive service instance.

Cloudera Navigator also allows administrators to generate reports that list the HDFS access permissions granted to groups.

Cloudera Navigator Architecture

The architecture of Cloudera Navigator is illustrated below.
Cloudera Navigator is implemented as an add-on to Cloudera Manager 4.5; all Cloudera Navigator functions (installation, configuration, and audit log review) are accessed through the Cloudera Manager Admin Console.
When Cloudera Navigator is installed, plug-ins that enable collection of audit events are added to each audited service. When data is accessed through a service for which auditing is enabled in Cloudera Navigator, audit events are generated and sent to the Navigator Server, which stores the events securely and durably in a database.

Service Versions and Audited Operations

This section describes the service versions and audited operations supported by Cloudera Navigator.

HDFS

Minimum supported CDH version: 4.0

The captured operations are:

- Operations that access or modify a file's or directory's data or metadata
- Operations denied due to lack of privileges

HBase

Minimum supported CDH version: 4.0

The captured operations are:

- Operations that require a privilege (except balance, balance switch, and append)
- Operations denied due to lack of privileges

Note:
- In CDH versions earlier than 4.2, for grant and revoke operations, the operation in log events is "ADMIN".
- In simple authentication mode, if the HBase Secure RPC Engine property is "false" (the default), the username in log events is "UNKNOWN". To see a meaningful username:
  1. Click the HBase service.
  2. Select Configuration > View and Edit > Service-wide > Security.
  3. Set the HBase Secure RPC Engine property to "true".
  4. Save the change and restart the service.

Hive

Minimum supported CDH version: 4.2

The captured operations are:

- Operations (except grant, revoke, and metadata access only) sent to HiveServer2

Note:
- Operations denied due to lack of privileges are not captured.
- Access via the Hive CLI is not supported.
- In simple authentication mode, the username in log events is the username passed in the HiveServer2 connect command. If you do not pass a username in the connect command, the username in log events is "anonymous".
Hue

Minimum supported CDH version: 4.2

The captured operations are:

- Operations (except grant, revoke, and metadata access only) sent to Beeswax Server

Note: You do not directly configure the Hue service for auditing. Instead, when you configure the Hive service for auditing, operations sent to the Hive service through Beeswax appear in the Hue service audit log.
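The minimum versions listed in this section can be captured in a small lookup table, which is convenient when checking a cluster inventory before enabling auditing. The following is a minimal Python sketch, not part of Cloudera Manager or Cloudera Navigator: the names MIN_CDH_VERSION, parse_version, and supports_auditing are hypothetical, and only the version numbers come from this guide.

```python
# Minimum CDH version per audited service, as listed in this section.
MIN_CDH_VERSION = {
    "hdfs": (4, 0),
    "hbase": (4, 0),
    "hive": (4, 2),
    "hue": (4, 2),
}


def parse_version(text):
    """Turn a dotted version string such as '4.2.1' into a tuple of ints."""
    parts = tuple(int(part) for part in text.split("."))
    # Pad short versions such as '4' to (4, 0) so tuple comparison is fair.
    return parts + (0,) * (2 - len(parts))


def supports_auditing(service, cdh_version):
    """Return True if the service meets the minimum CDH version for auditing.

    Hypothetical helper for illustration only.
    """
    minimum = MIN_CDH_VERSION[service.lower()]
    return parse_version(cdh_version) >= minimum


if __name__ == "__main__":
    print(supports_auditing("Hive", "4.1"))  # False
    print(supports_auditing("HDFS", "4.0"))  # True
```

Versions are compared as integer tuples rather than strings so that, for example, 4.10 sorts after 4.2.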
Installing Cloudera Navigator

You can install Cloudera Navigator while installing Cloudera Manager for the first time or while upgrading an existing Cloudera Manager installation. When you install Cloudera Navigator, you choose the database in which to store audit events. You can choose either an embedded PostgreSQL database or an external database. For information on setting up a standalone database, see Installing and Configuring Databases in Cloudera Manager Installation Guide.

Requirements

For information on the requirements for installing Cloudera Navigator, see Requirements for Cloudera Manager in Cloudera Manager Installation Guide.

New Installation

1. Install Cloudera Manager following the instructions in the Cloudera Manager Installation Guide.
2. In the Add Cloudera Management Services area of the Choose the CDH4 services screen, check the Include Cloudera Navigator checkbox.

Upgrading an Existing Cloudera Manager Installation

If you are installing Cloudera Navigator while upgrading Cloudera Manager to a new version:

1. Upgrade Cloudera Manager following the instructions in the Cloudera Manager Installation Guide.
2. Click the management service (for example, MGMT-1) in the Cloudera Management Services table.
3. Click the Instances tab.
4. Click the Add button.
5. Choose a host and select the Navigator Server radio button.
6. Click Accept to acknowledge that you must ensure you have sufficient licenses.
7. Choose a database option and click Test Connection to verify the availability of the database.
8. Click Continue.
9. Click Accept to acknowledge that no configuration changes must be performed.
10. Check the checkbox next to the navigator role on the Role Instances page.
11. Select Actions for Selected > Start.
12. Click Start in the confirmation pop-up.
13. Click Close on the Start Command Details pop-up.
14. Restart all audited services for auditing to go into effect.
Configuring Navigator Server

To configure Navigator Server, do one of the following:

- In the Cloudera Management Services table:
  1. Click the Navigator Server role.
  2. Select Configuration > View and Edit.
  3. Expand the navigator category and optionally choose a subcategory.
  4. Configure the server and click Save Changes.
- Select Services > Cloudera Management Service:
  1. Select Configuration > View and Edit.
  2. Expand the Navigator Server category and optionally choose a subcategory.
  3. Configure the server and click Save Changes.

For detailed information on service configuration, see Modifying Service Configurations in Managing Clusters with Cloudera Manager and Configuring Monitoring Settings in Cloudera Manager Monitoring and Diagnostics Guide.
Configuring Service Auditing

You can configure services to:

- Enable and disable auditing
- Exclude and include auditing of files and directories, users, and tables
- Coalesce auditing events based on operation attributes (time, operation name), user attributes (username), and object attributes (path, table name, and so on)
- Specify what action to take when the audit event queue is full

Service Auditing Properties

Each service that supports auditing configuration has the following properties:

- Enable collection - A flag to enable collection of audit events.
- Event filter - A set of rules that capture properties of auditable events and actions to be performed when an event matches those properties.
- Event tracker - A set of rules for tracking and coalescing events.
- Queue policy - The action to take when the audit event queue is full. When a queue is full and the queue policy of the service is Shutdown, before shutting down the service, N audits will be discarded, where N is the size of the Cloudera Navigator Server queue.

The Event Filter and Event Tracker rules for filtering and coalescing events are expressed as JSON objects. For information on the structure of the objects, see the description on the configuration screen.

Configuring Service Auditing Properties

1. Click an HDFS, HBase, or Hive service.
2. Select Configuration > View and Edit.
3. Click the Cloudera Navigator category. The Service-Wide properties display.
4. Edit the properties and click Save Changes.
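To make the idea of an Event Filter concrete, the following Python sketch evaluates audit events against a rule set expressed as JSON. This guide does not specify the JSON structure (it refers to the description on the configuration screen), so the key names used here (defaultAction, rules, action, fields, name, match), the event field names, and the decide function are all assumptions chosen for illustration. Check the configuration screen for the structure your version actually accepts.

```python
import json
import re

# Hypothetical filter layout. The key names are assumptions for illustration;
# the authoritative structure is described on the configuration screen.
EVENT_FILTER_JSON = """
{
  "defaultAction": "accept",
  "rules": [
    {"action": "discard",
     "fields": [{"name": "username", "match": "(?:hdfs|hbase)"}]},
    {"action": "discard",
     "fields": [{"name": "src", "match": "/tmp/.*"}]}
  ]
}
"""


def decide(event, event_filter):
    """Return the action of the first rule whose fields all match the event.

    Each field's "match" value is treated as a regular expression that must
    match the whole event value. Falls back to the filter's default action.
    """
    for rule in event_filter["rules"]:
        if all(
            re.fullmatch(field["match"], str(event.get(field["name"], "")))
            for field in rule["fields"]
        ):
            return rule["action"]
    return event_filter["defaultAction"]


if __name__ == "__main__":
    event_filter = json.loads(EVENT_FILTER_JSON)
    print(decide({"username": "hdfs", "src": "/user/hive"}, event_filter))
    print(decide({"username": "cloudera", "src": "/tmp/scratch"}, event_filter))
    print(decide({"username": "cloudera", "src": "/user/hive"}, event_filter))
```

In this sketch the first matching rule wins, so rule order matters; whether Cloudera Navigator applies rules in the same way is not stated in this guide.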
Audit Event Logs

In Cloudera Manager, audit event logs display service and role life cycle events recorded by Cloudera Manager management services and service access events recorded by Cloudera Navigator. For information on the former, see Viewing and Filtering Audit Events in Cloudera Manager Monitoring and Diagnostics Guide.

Viewing Audit Event Logs

You can view audit event logs for all services or for a specific service.

To view the audit event log for all services:

1. Click Audits in the banner.

To view the audit event log for a service:

1. Click an HDFS, HBase, Hive, or Hue service.
2. Click the Audits tab.

Note: When you mouse over a Hive or Hue service event, a pop-up displays the query that generated the event.

Events that represent denied access are labeled Denied and have a pink background.

Filtering Events

You filter events by adding filters or selecting a time range.

Adding Filters

Do one of the following:

- Click the icon that displays next to a property when you hover over one of the event entries. A filter containing the property and its value is added to the list of filters at the left and Cloudera Manager redisplays all events that match the filter.
- Click Add Filter to the left of the log. A filter control is added to the list of filters.
  1. Choose an event property in the property drop-down list.
  2. Choose an operator in the operator drop-down list.
  3. Type an event property value in the value text field. If you use the LIKE operator, specify combinations of literal strings and '%' in the value field. For example, the value 'THE%S' matches THEMOVIES and THEUSERS.
  4. Do one of the following:
     - Click Search. A filter containing the property, operator, and value is added to the list of filters at the left and the audit log redisplays all events that match the filter.
     - Click Add Another. A filter containing the property and its value is added to the list of filters at the left, the audit log redisplays all events that match the filter, and another filter control is added to the list of filters.

Selecting a Time Range

Do one of the following:

- Click a duration link on the right of the audit log.
- Specify a time range using the Time Range Selector. For information on the Time Range Selector, see Time Line in Cloudera Manager Monitoring and Diagnostics Guide.

The audit log redisplays all events that match the time range.

Removing Filters

Click the icon at the right of the filter. The filter is removed and the audit log redisplays all audit events that match the remaining filters. If there are no filters, the audit log displays all events.

Modifying Filters

1. Click the filter. The filter expands into separate property, operator, and value fields.
2. Modify the value of one or more fields.
3. Click Search. A filter containing the property, operator, and value is added to the list of filters at the left and the audit log redisplays all events that match the filter.

Downloading Audit Event Logs

1. Specify desired filters and time range.
2. Click the Download CSV button to the left of the audit log. A file with the following fields is downloaded: service, username, command, ipaddress, resource, allowed, timestamp.

The structure of the resource field depends on the type of the service as follows:

- HDFS - A file path.
- Hive and Hue - <database>:<tablename>
- HBase - <table> <family>:<qualifier>

Here is an example of an HDFS service audit log:

service,username,command,ipaddress,resource,allowed,timestamp
hdfs1,cloudera,setpermission,10.20.187.242,/user/hive,false,"2013-02-09t00:59:34.430z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/user/cloudera,true,"2013-02-09t00:59:22.667z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/,true,"2013-02-09t00:59:22.658z"

In this example, access was denied for the first event, and therefore the "allowed" property has the value "false".
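A downloaded audit log can be post-processed with any CSV reader. The following is a minimal Python sketch that reads the sample rows above and lists the denied events. The helper names read_events, denied_events, and like_match are hypothetical. The like_match function only approximates the LIKE operator described under Adding Filters by treating '%' as "any run of characters"; case sensitivity and any other wildcard behavior of the real operator are assumptions.

```python
import csv
import io
import re

# The sample HDFS audit log from this section.
SAMPLE = '''service,username,command,ipaddress,resource,allowed,timestamp
hdfs1,cloudera,setpermission,10.20.187.242,/user/hive,false,"2013-02-09t00:59:34.430z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/user/cloudera,true,"2013-02-09t00:59:22.667z"
hdfs1,cloudera,getfileinfo,10.20.187.242,/,true,"2013-02-09t00:59:22.658z"
'''


def read_events(text):
    """Parse audit log CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def denied_events(events):
    """Keep only events whose 'allowed' column is 'false'."""
    return [e for e in events if e["allowed"].strip().lower() == "false"]


def like_match(pattern, value):
    """Approximate LIKE: '%' matches any run of characters, the rest is literal.

    Assumption: matching is case-sensitive and must cover the whole value.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    events = read_events(SAMPLE)
    for event in denied_events(events):
        print(event["username"], event["command"], event["resource"])
    # Offline equivalent of a LIKE filter on the resource column.
    print([e["resource"] for e in events if like_match("/user/%", e["resource"])])
```

Reading the file with csv.DictReader keeps the code independent of column order, which helps if the set of downloaded fields changes between releases.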
Downloading HDFS Directory Access Permission Reports

For each HDFS service, you can download a report that details the HDFS directories a group has permission to access.

To download a directory access permission report:

1. In Cloudera Manager, click Reports.
2. In the Directory Access by Group row, click CSV or XLS. The Download User Access Report pop-up displays.
   a. In the pop-up, type a group and directory.
   b. Click Download.

A report of the selected type is generated. For each directory contained in the specified directory that the specified group has access to, the report lists the path, owner, permissions, and size.