Cloudera Administrator Training for Apache Hadoop




Duration: 4 Days
Course Code: GK3901

Overview:
In this hands-on course, you will be introduced to the basics of Hadoop, the Hadoop Distributed File System (HDFS), Hive, and other ecosystem projects. You will cover core administration skills, such as cluster deployment, job management, and ongoing Hadoop maintenance and monitoring, as you gain the expertise to support your environments in day-to-day activities. This course covers concepts addressed on the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam and includes a CCAH exam voucher that you will receive at the end of class.

Target Audience:
System administrators looking to understand all of the steps necessary to operate and manage Apache Hadoop clusters

Objectives:
HDFS
Configure the FairScheduler to provide service-level agreements for multiple users of a cluster
Optimal hardware configurations for Hadoop clusters
Maintain and monitor your cluster
Network considerations to take into account when building out your cluster
Load data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop
Configure Hadoop options for best cluster performance
System administration issues with other Hadoop projects such as Hive

Prerequisites:
Basic level of Linux system administration experience
Prior knowledge of Apache Hadoop is not required

Testing and Certification:
This course is part of the following programs or tracks:
CCAH: Cloudera Certified Administrator for Apache Hadoop (CDH3)

Follow-on Courses:
Cloudera Training for Apache
Cloudera Training for Apache Hive and

Content:

Hadoop and HDFS
  Why Hadoop?
  HDFS
  Hive and Other Ecosystem Projects

Planning Your Hadoop Cluster
  General Planning Considerations
  Choosing the Right Hardware
  Choosing the Right Software

Deploying Your Cluster
  Installing Hadoop
  Using SCM Express for Easy Installation
  Typical Configuration Parameters
  Configuring Rack Awareness
  Using Configuration Management Tools

Managing and Scheduling Jobs
  Starting and Stopping Jobs
  FIFO Scheduler
  Fair Scheduler

Cluster Maintenance
  Checking HDFS with Fsck
  Copying Data with Distcp
  Rebalancing Cluster Nodes
  Adding and Removing Cluster Nodes
  Backup and Restore
  Upgrading and Migrating
  NameNode Metadata

Cluster Monitoring, Troubleshooting, and Optimizing
  Hadoop Log Files
  Using the NameNode and JobTracker Web UIs
  Interpreting Job Logs
  Monitoring with Ganglia
  General Optimization Tips
  Benchmarking Your Cluster

Populating HDFS from External Sources
  Using Flume
  Using Sqoop

Installing and Managing Other Hadoop Projects
  Hive

Labs
  Install a Pseudo-Distributed Cluster
  Install a Hadoop Cluster
  Manage Jobs
  Use the FairScheduler
  Break the Cluster
  Verify the Cluster's Self-Healing Features
  Back Up and Restoring
  Configure the Hive Shared

Further Information:
For more information, or to book your course, please call us on:
Head Office 01189 123456 / Northern Office 0113 242 5931
info@globalknowledge.co.uk
www.globalknowledge.co.uk
Global Knowledge, Mulberry Business Park, Fishponds Road, Wokingham, Berkshire RG41 2GY, UK