Cloudera Manager Training: Hands-On Exercises




201408 Cloudera Manager Training: Hands-On Exercises

General Notes
In-Class Preparation: Accessing Your Cluster
Self-Study Preparation: Creating Your Cluster
Hands-On Exercise: Working with Cloudera Manager
Hands-On Exercise: Enabling High Availability and Adding Services
Hands-On Exercise: Monitoring and Using Hue, Impala, and Hive
Hands-On Exercise: Host Templating and the Cloudera Manager API
Hands-On Exercise: Parcels and Rolling Restarts
Hands-On Exercise: Working With Users

General Notes

The Cloudera Manager training course uses Amazon Web Services (AWS) and EC2 instances to create a cluster in the cloud. There will be a total of four EC2 instances. The first virtual machine will run the Cloudera Manager services. Using AWS credentials, Cloudera Manager will provision the other three EC2 instances.

If you are in an in-class environment for this course, your instructor will give you the necessary credentials and DNS name for the first EC2 instance.

If you are taking this course online, you will need to follow the AWS installation instructions to create your cluster.

In-Class Preparation: Accessing Your Cluster

In this preparatory exercise you will verify that you can reach the host running Cloudera Manager.

Accessing Your Cluster: Cloud Training Environment

If you are in a Cloudera class and have been told by your instructor to perform this section, please do so. Otherwise, please skip to the first Hands-On Exercise.

Your instructor will give you the information you need to access the cluster. This information will include the DNS name or IP address of the computer running Cloudera Manager, as well as the username and password to access Cloudera Manager.

Open your browser and go to the server running Cloudera Manager using port 7180. The DNS name will vary. For example, if the DNS name is:

    ec2-72-44-45-204.compute-1.amazonaws.com

Then the URL to type in the browser is:

    ec2-72-44-45-204.compute-1.amazonaws.com:7180

The browser will show the login page. Do not log in at this time. If there is an error, double-check the DNS name or IP address and try again. If you are unable to access the login page, please ask your instructor for help.

This is the end of the setup activity for the cloud training environment.
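The address rule above (DNS name plus port 7180) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the http scheme is an assumption based on the URLs shown later in these exercises.

```python
import socket


def cloudera_manager_url(host: str, port: int = 7180, scheme: str = "http") -> str:
    """Build the browser URL for the Cloudera Manager login page.

    The scheme defaults to http, which is an assumption for this sketch.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must be a DNS name or IP address")
    return f"{scheme}://{host}:{port}"


def can_connect(host: str, port: int = 7180, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


print(cloudera_manager_url("ec2-72-44-45-204.compute-1.amazonaws.com"))
```

If `can_connect` returns False for your host, the DNS name or IP address is the first thing to double-check, as described above.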

Self-Study Preparation: Creating Your Cluster

In this preparatory exercise you will create your cluster.

Creating Your Cluster: Cloud Training Environment

If you are in a Cloudera class, please skip to the first Hands-On Exercise.

You will need to create and install your cluster using Cloudera Manager. The documentation contains complete step-by-step instructions. You can find the documentation at:

    http://tiny.cloudera.com/install

The easiest method is Installation Path A using Amazon Web Services. You will need at least four hosts. One host will run the Cloudera Manager services. The rest will run the Hadoop services.

The version of Cloudera Manager should be 5.0.2 and the version of CDH should be 5.0.0.

When installing the cluster, you should install only the HDFS, YARN, and ZooKeeper services. When prompted for the edition of Cloudera Manager to install, choose the trial for the Data Hub Edition.

This is the end of the setup activity for the self-study training environment.

Hands-On Exercise: Working with Cloudera Manager

In this exercise you will start working with Cloudera Manager. This will take you through many of the day-to-day operations you will perform on your cluster.

Step 1: Logging In

Before you can start working with Cloudera Manager, you must use your browser to connect to the host running Cloudera Manager.

1. Open your browser and go to the public DNS name of the host running Cloudera Manager using port 7180. The DNS name is assigned by AWS and will vary. For example, if the DNS name is:

    ec2-72-44-45-204.compute-1.amazonaws.com

   Then the URL to type in the browser is:

    ec2-72-44-45-204.compute-1.amazonaws.com:7180

2. The username is admin; the password is admin.
3. Select Remember Me and then click Login.

Step 2: Viewing the Home Page

The home page in Cloudera Manager gives an overview of the health of your cluster.

1. At the home page, look at the Status section.
2. Verify that all of the services and Cloudera Management Services are in good health.

3. Click on the context menu for the cluster and service to familiarize yourself with the commands. The context menu has the following icon: [icon image]
4. In the Charts section, look at the charts for the cluster.
5. Move your mouse over the chart to get the absolute value of the point in time.
6. Click on the point in time to expand the chart details.
7. Press the left and right arrows to get the next and previous values in the chart.
8. Click on the X in the popup to close the chart details.
9. To the right of the charts are links to change the amount of time shown in the chart. Change the time to one hour, then two hours, and observe how the charts update.

Step 3: Searching in Cloudera Manager

Cloudera Manager includes a way to search through all settings and services to quickly find what you are looking for.

1. In the top right, click on the search box.
2. Type in HDFS and the context menu will display the relevant search items.
3. In the service section, click on HDFS-1.
4. Notice that the search brought you to the HDFS overview page.
5. This time, use your keyboard and press the / key to access the search.
6. Type in YARN and the context menu will come up with the relevant search items.
7. In the service section, click on YARN-1.

8. Notice that the search brought you to the YARN overview page.

Step 4: Using the Timeline and Status

Using the timeline, you can look at a specific period of time. Moving the timeline will update all of the data on the page, such as the charts and status.

1. In the timeline, move the start marker (leftmost) back an hour.
2. Move the end marker (rightmost) back an hour.
3. Notice that the service's data is updated with the statistics for the selected time period.
4. In the Health History section, click on the Show link for the various events.
5. Notice that the timeline will automatically be moved back to the time of the event and that the charts update as well.
6. To return to the present time, click on the Now button to the right of the timeline.
7. Notice that the timeline has moved back to the present time.

This is the end of the exercise.

Hands-On Exercise: Enabling High Availability and Adding Services

In this exercise you will enable high availability (HA) for HDFS. You will also install and configure some new services.

Step 1: Enabling HDFS High Availability (HA)

To remove the single point of failure in HDFS, we will enable HA in the cluster. This replaces the SecondaryNameNode with a Standby NameNode.

1. From the Clusters tab, select the HDFS service.
2. From Actions, select Enable High Availability.
3. In the second row, click on the Standby NameNode column.
4. In all three rows, click on the JournalNode column.
5. Click on Continue.
6. Click on Continue to accept the Nameservice Name.
7. The new JournalNodes need to be configured with the directory where their state information will be stored. Change only the JournalNode Edits Directory setting for all of the JournalNodes to:

    /data0/dfs/jn

8. Click on Continue. Cloudera Manager will start enabling high availability. If the step Formatting the name directories of the current NameNode fails, don't worry; it is expected to fail, since the directory is already formatted.
9. Once the process is complete, click on Finish.
10. Click on OK. We will perform the steps this message describes later in this exercise.

Step 2: Verifying Automatic Failover

When the Active NameNode fails, we want HDFS to automatically fail over to the Standby NameNode.

1. Click on the Instances tab.
2. Find the column Automatic Failover.
3. Verify that the value is Yes.

Step 3: Performing a Manual Failover

To verify the correct setup, we will manually fail over the HDFS service.

1. Click on Manual Failover.
2. Once the failover is completed, click on Close.

Step 4: Viewing the NameNode Web UI

Most services in Hadoop have a Web UI that gives some information about the service. Cloudera Manager takes many of the statistics and other information shown in these Web UIs and displays them in its UI. Each service provides these links as a convenience.

1. Click on the Web UI button and choose one of the nodes.
2. Look through the information presented by the service's Web UI. Once you are done, close the tab or window.

Step 5: Adding Services

In this step, we will be adding some Hadoop Ecosystem services. Hive and Impala are services that use a SQL-like language to process data. Hive is an abstraction on top of MapReduce. Impala runs its own role instances. Oozie is a workflow manager for Hadoop. It allows automation of entire Hadoop workflows with MapReduce, Hive, and other ecosystem projects. Hue is a browser-based environment for graphically interacting with a Hadoop cluster.

To install some services, a certain number of prerequisite services must be installed first. We want to install the Hue web interface and all prerequisite services.

1. Go to the Cloudera Manager main page.
2. Click on the context menu for the cluster and click on Add a Service. The context menu has the following icon: [icon image]
3. Click on the radio button next to Hue.
4. Click on Continue. An error will appear, showing that Hue requires services like Hive to be installed before you can install Hue.
5. Click on Close.
6. Click on the radio button next to Hive.
7. Click on Continue.
8. Click on Continue.
9. Click on Continue.
10. Click on Test Connection.
11. Click on Continue.
12. Click on Continue. The Hive service will start installing.
13. Once the installation has finished, click on Continue.

14. Click on Finish.
15. Use the same steps to install the Oozie service and accept all defaults.
16. Use the same steps to install the Impala service and accept all defaults.
17. Use the same steps to install the Hue service. When prompted to select the dependencies, choose the row with the Impala service defined. Otherwise, accept all defaults.
18. Click on the context menu for the Hue service and click on Start.
19. Once the service is started, click on Close.

Step 6: Adding Another DataNode

The HDFS service is now showing bad health. This is because there are only two DataNodes running on the cluster. You must add another DataNode to replicate the HDFS blocks three times, which is the Hadoop default.

1. Click on the HDFS service.
2. Click on the Instances tab.
3. Notice the validation warning saying that there are only two DataNodes running and that the suggested number is three.
4. Click on Add.
5. In the DataNode section, click on Select hosts.
6. Click on All hosts.
7. Click on Continue.
8. Click on Finish.
9. Click on the check box for the newly added DataNode. It will be the only one that has a status of Stopped.
10. Click on Actions for Selected and then click on Start.

11. Click on Start.
12. Once the new instance is started, click on Close. The HDFS service will begin to replicate blocks to the new DataNode. During this time, the HDFS service will still say that it is in Bad Health. Once all the blocks have three-fold replication, the service will change to Good Health.

Step 7: Configuring HA for Hue

Earlier in this exercise, we enabled high availability. We need to make some configuration changes to allow Hue to work with HA.

1. Click on Clusters then HDFS.
2. Click on the Instances tab.
3. Click on Add.
4. Click in the row with the fewest Added Roles. This will add the HttpFS instance to that host.
5. Click on OK.
6. Click on Continue.
7. Click on the check box for the newly added HttpFS instance. It will be the only one that has a status of Stopped.
8. Click on Actions for Selected and then click on Start.
9. Click on Start.
10. Once the new instance is started, click on Close.
11. Click on Clusters then Hue.
12. Click on the Configuration tab and then View and Edit.
13. In the row for HDFS Web Interface Role, choose httpfs.

14. Click on Save Changes.
15. Go to the Cloudera Manager main page.
16. The cluster needs to be restarted to pick up the configuration changes. Click on the context menu for the cluster and click on Restart.
17. Click on Restart.
18. Click on Close.

This is the end of the exercise.
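The bad-health state in Step 6 follows from simple arithmetic. HDFS stores at most one replica of a given block on each DataNode, so a cluster with two DataNodes cannot satisfy the default replication factor of three. The sketch below only illustrates that reasoning; the function names are hypothetical and it does not query a real cluster.

```python
def max_replicas_per_block(datanodes: int, replication_factor: int = 3) -> int:
    """Each DataNode holds at most one replica of a given block."""
    if datanodes < 0 or replication_factor < 1:
        raise ValueError("datanodes must be >= 0 and replication_factor >= 1")
    return min(datanodes, replication_factor)


def missing_replicas_per_block(datanodes: int, replication_factor: int = 3) -> int:
    """Number of replicas each block is short of the target."""
    return replication_factor - max_replicas_per_block(datanodes, replication_factor)


# Two DataNodes: every block is one replica short of the default target.
print(missing_replicas_per_block(2))
# Three DataNodes: blocks can reach three-fold replication.
print(missing_replicas_per_block(3))
```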

Hands-On Exercise: Monitoring and Using Hue, Impala, and Hive

In this exercise you will use Hue to run Impala and Hive queries and monitor their services.

Step 1: Setting Up Hue

1. Click on Clusters then Hue.
2. Click on Hue Web UI. This will open a new tab or window for the Hue interface.
3. Log in with the username training and password training. This creates the default superuser for Hue. If you need to log in to Hue again, you will need to use this username and password.
4. Click on Next.
5. Click on All to install all application examples.
6. Once the application examples are installed, click on Next.
7. Click on Next.
8. Click on Hue Home.

Step 2: Querying Impala

You can run Impala queries from within Hue.

1. Click on Query Editors, then Impala.
2. In the query box, type:

    invalidate metadata;

3. Click on Execute.

4. In the query box, type:

    SELECT * FROM sample_07 WHERE total_emp > 6003930;

5. Click on Execute.
6. In the query box, type:

    SELECT AVG(salary), SUM(total_emp) FROM sample_07;

7. Click on Execute.

Step 3: Monitoring Impala Queries

Cloudera Manager monitors the queries and health of the Impala service. It gives information about each query that you run.

1. Go back to the tab or window for Cloudera Manager.
2. Click on Clusters then Impala.
3. Click on the Queries tab.
4. In the list of queries, find the last Impala query that you ran. Look at the data that is tracked by Cloudera Manager.
5. Click on the Details button for that row.
6. This page shows Impala's execution plan and further information about the query, as well as the query itself. Using this information, you can debug slow queries.
7. Click on Clusters then Impala.
8. Click on the Best Practices tab.
9. This page shows whether the best practices for Impala are being followed. A chart shows each best practice. Read through the descriptions of each chart.

Step 4: Creating an Impala Trigger

You can use Cloudera Manager to trigger events when a certain threshold is passed. This can change the status of a service.

1. Click on the Charts Library tab.
2. Find the Impala Queries chart. This chart shows the number of queries per second that Impala served.
3. Mouse over the chart and click on the context menu for it.
4. Click on Create Trigger.
5. We want to create a trigger that will change the status if the Impala service is being used too much, indicating that we need to expand the cluster with new nodes.
6. Give the trigger the name Impala Usage.
7. Change the Stream Threshold to 50.
8. Click on Create Trigger. This trigger will now change the Impala service's health to Concerning whenever the query rate crosses the threshold of 50 queries per second.

Step 5: Running Hive Queries

You can run Hive queries from within Hue.

1. Go back to the tab or window for Hue.
2. Click on Query Editors then Hive.
3. In the query box, type:

    SELECT * FROM sample_08
    WHERE description LIKE "%engineer%"
    ORDER BY salary DESC;

4. Click on Execute.
5. In the query box, type:

    SELECT * FROM sample_08
    WHERE description NOT LIKE "%engineer%"
    ORDER BY salary DESC;

6. Click on Execute.
7. In the query box, type:

    SELECT isengineer, AVG(salary) AS avgsalary
    FROM (
        SELECT INSTR(description, "engineer") != 0 AS isengineer, salary
        FROM sample_08
    ) engineersubselect
    GROUP BY isengineer;

8. Click on Execute.

Step 6: Hive and MapReduce Monitoring

Cloudera Manager monitors Hive queries and MapReduce jobs through the YARN service.

1. Go back to the tab or window for Cloudera Manager.
2. Click on Clusters then YARN.

3. Review the charts for the YARN service showing the Hive query activity.

This is the end of the exercise.
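To see what the last Hive query in this exercise computes, the same grouping can be sketched in plain Python. The rows below are invented for illustration and are not the contents of sample_08; only the description and salary columns used by the query are modelled.

```python
from collections import defaultdict

# Hypothetical rows standing in for sample_08 (not the real data).
rows = [
    {"description": "Civil engineers", "salary": 80000},
    {"description": "Software engineers", "salary": 90000},
    {"description": "Cashiers", "salary": 20000},
    {"description": "Chefs", "salary": 40000},
]


def average_salary_by_engineer_flag(rows):
    """Mirror of: SELECT isengineer, AVG(salary) ... GROUP BY isengineer."""
    totals = defaultdict(lambda: [0, 0])  # flag -> [salary sum, row count]
    for row in rows:
        # INSTR(description, "engineer") != 0 is a case-sensitive substring test.
        is_engineer = "engineer" in row["description"]
        totals[is_engineer][0] += row["salary"]
        totals[is_engineer][1] += 1
    return {flag: total / count for flag, (total, count) in totals.items()}


print(average_salary_by_engineer_flag(rows))
```

The subselect in the Hive query plays the role of the `is_engineer` line here: it derives a boolean per row, and the outer query then averages salary within each of the two groups.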

Hands-On Exercise: Host Templating and the Cloudera Manager API

In this exercise, we will create a host template for new hosts. We will use the Cloudera Manager API to get status about the cluster.

Step 1: Creating a Template

As the number of hosts in the cluster grows, we want a simple way to configure the new hosts. Using Host Templating you can quickly add new hosts that have certain role instances and configurations.

1. Go to the Hosts tab.
2. Click on the Templates tab.
3. Click on Click here to create a new template.
4. Give the template the name Worker Host.
5. Under HDFS, check DataNode and leave the configuration group as it is.
6. Under YARN, check NodeManager and leave the configuration group as it is.
7. Click on Create.

The next time a worker node is added, you can use the Worker Host template to quickly set up the host to run a DataNode and NodeManager.

Step 2: Cloudera Manager API

Cloudera Manager has a built-in RESTful API to get and set information about the cluster and its status. This API can be used with a browser, curl, or any programming language that supports HTTP verbs.

1. Note the base URL of the Cloudera Manager server. For example, the base URL could be:

    http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180

2. Open a new browser tab and type in the base URL followed by:

    /api/v6/tools/echo?message=hello%20world

   For example, the full URL would look like:

    http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180/api/v6/tools/echo?message=hello%20world

3. After hitting Enter, the browser will display JSON that echoes back the message in the URL. This uses Cloudera Manager's built-in echo to verify connectivity and functionality.
4. Change the browser URL to the base URL followed by:

    /api/v6/clusters

   For example, the full URL would look like:

    http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180/api/v6/clusters

5. After hitting Enter, you will see some basic information about the cluster. The API will return the name of the cluster and version information. The cluster name is needed for other API calls.
6. Go back to the Cloudera Manager tab.
7. Click on Support then API Documentation.

8. This will bring up a page containing all of the documentation about Cloudera Manager's API. Familiarize yourself with the various calls.

This is the end of the exercise.
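The two browser calls in Step 2 can also be made from a script. The following is a minimal sketch using only the Python standard library. The helper names `api_url` and `api_get` are hypothetical, and the sketch assumes the server accepts HTTP Basic authentication with your Cloudera Manager username and password; check the API Documentation page for the details of your version.

```python
import base64
import json
import urllib.parse
import urllib.request

API_VERSION = "v6"  # the API version used in this exercise


def api_url(base_url: str, path: str, **params: str) -> str:
    """Join the base URL, API version, resource path, and query parameters."""
    url = f"{base_url.rstrip('/')}/api/{API_VERSION}/{path.lstrip('/')}"
    if params:
        # quote_via=quote encodes spaces as %20, as in the exercise URL.
        url += "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return url


def api_get(base_url: str, path: str, username: str, password: str, **params: str):
    """Send a GET request and return the decoded JSON body.

    Assumes HTTP Basic authentication (an assumption of this sketch).
    """
    request = urllib.request.Request(api_url(base_url, path, **params))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


base = "http://ec2-54-191-61-124.us-west-2.compute.amazonaws.com:7180"
print(api_url(base, "tools/echo", message="hello world"))
print(api_url(base, "clusters"))
```

Against a running server, `api_get(base, "tools/echo", "admin", "admin", message="hello world")` would perform the echo call from step 2, and `api_get(base, "clusters", "admin", "admin")` the cluster listing from step 4. Substitute your own base URL and credentials.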

Hands-On Exercise: Parcels and Rolling Restarts

In this exercise, you will update the version of CDH and deploy the update with rolling restarts to prevent downtime.

Step 1: Downloading the Parcels

The cluster is using CDH 5.0.0, and newer minor versions have been released. Before we can download the new parcels, we must update the URL from which parcels are downloaded.

1. Click on Administration then Settings.
2. Find the setting that says Remote Parcel Repository URLs.
3. Change the setting to say:

    http://archive.cloudera.com/cdh5/parcels/5.0.1/

4. Click on the Save Changes button on the top right.
5. Click on the New Parcels button.
6. Find the parcel for CDH 5.0.1 and click on Download. The parcel will start downloading in the background. This may take a few minutes to finish. Feel free to explore Cloudera Manager during this time.

Step 2: Distributing and Activating the Parcels

1. Once the download is finished, click on Distribute. This will distribute the parcel to all hosts in the cluster.
2. Click on Activate.

3. Click on Rolling Restart.
4. Under both Rolling and Basic, check all of the services to restart. Not all services support rolling restarts. Services that do not support rolling restarts are listed as Basic.
5. Under Roles to include, click on All Roles.
6. Click on Confirm. Cloudera Manager will stop certain services, perform a rolling restart of others, and then start all services. Once the process is done, the newer version of CDH will be active.

This is the end of the exercise.
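The idea behind a rolling restart is that only a small batch of hosts is down at any moment while the rest keep serving. The sketch below illustrates that general pattern only; it is not how Cloudera Manager implements it, and the function names, the `restart` callback, and the host names are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence


def rolling_batches(hosts: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield hosts in consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


def rolling_restart(hosts: Sequence[str], restart: Callable[[str], None],
                    batch_size: int = 1) -> None:
    """Restart one batch at a time so the other hosts stay available."""
    for batch in rolling_batches(hosts, batch_size):
        for host in batch:
            restart(host)
        # A real system would wait here until the batch is healthy again
        # before moving on to the next one.


restarted = []
rolling_restart(["worker-1", "worker-2", "worker-3"], restarted.append, batch_size=2)
print(restarted)
```

A smaller batch size keeps more capacity online during the restart at the cost of a longer overall restart time.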

Hands-On Exercise: Working With Users

In this exercise, we will create new users and see how their permissions work.

Admin User

Your current user is an admin user. Cloudera Manager creates this user by default. Other users can be created with fewer permissions.

Step 1: Change the Admin Password

For security purposes, you should change the default password for the admin user.

1. Click on Administration then Users.
2. Find the row for the admin user and click on the Change Password button.
3. Change the password to newpassword and click on Update.

Step 2: Adding Users

1. Click on Add User.
2. Create a read-only user called readonly.
3. Create a limited administrator user called limited.

Step 3: Logging In As Different Users

1. On the top right, click on admin then Logout.
2. At the prompt, log in as the readonly user.
3. Click on the context menu for the cluster. Notice that there are no buttons to add services or stop the cluster.

4. Go to the HDFS service.
5. Click on the Configuration tab then View. Notice that the user can view all configurations, but cannot make any changes.
6. On the top right, click on readonly then Logout.
7. At the prompt, log in as the limited user.
8. Click on the context menu for the cluster. Notice that there are no buttons to add services or stop the cluster.
9. Go to the HDFS service.
10. Click on the Configuration tab then View. Notice that the user can view all configurations, but cannot make any changes.
11. Click on the Hosts tab.
12. Click on Actions for Selected. Notice that this user can decommission hosts.

This is the end of the exercise.