COURSE CONTENT: Big Data and Hadoop Training



1. Meet Hadoop
Data!; Data Storage and Analysis; Comparison with Other Systems; RDBMS; Grid Computing; Volunteer Computing; A Brief History of Hadoop; Apache Hadoop and the Hadoop Ecosystem; Hadoop Releases

2. MapReduce
A Weather Dataset; Data Format; Analyzing the Data with Unix Tools; Analyzing the Data with Hadoop; Map and Reduce; Java MapReduce; Scaling Out; Data Flow; Combiner Functions; Running a Distributed MapReduce Job; Hadoop Streaming; Compiling and Running

3. The Hadoop Distributed Filesystem
The Design of HDFS; HDFS Concepts; Blocks; Namenodes and Datanodes; HDFS Federation; HDFS High-Availability; The Command-Line Interface; Basic Filesystem Operations; Hadoop Filesystems; Interfaces; The Java Interface; Reading Data from a Hadoop URL; Reading Data Using the FileSystem API; Writing Data; Directories; Querying the Filesystem; Deleting Data; Data Flow; Anatomy of a File Read; Anatomy of a File Write; Coherency Model; Parallel Copying with distcp; Keeping an HDFS Cluster Balanced; Hadoop Archives

4. Hadoop I/O
Data Integrity; Data Integrity in HDFS; LocalFileSystem; ChecksumFileSystem; Compression; Codecs; Compression and Input Splits; Using Compression in MapReduce; Serialization; The Writable Interface; Writable Classes; File-Based Data Structures; SequenceFile; MapFile

5. Developing a MapReduce Application
The Configuration API; Combining Resources; Variable Expansion; Configuring the Development Environment; Managing Configuration; GenericOptionsParser, Tool, and ToolRunner; Writing a Unit Test; Mapper; Reducer; Running Locally on Test Data; Running a Job in a Local Job Runner; Testing the Driver; Running on a Cluster; Packaging; Launching a Job; The MapReduce Web UI; Retrieving the Results; Debugging a Job; Hadoop Logs; Tuning a Job; Profiling Tasks; MapReduce Workflows; Decomposing a Problem into MapReduce Jobs; JobControl

6. How MapReduce Works
Anatomy of a MapReduce Job Run; Classic MapReduce (MapReduce 1); Failures; Failures in Classic MapReduce; Failures in YARN; Job Scheduling; The Capacity Scheduler; Shuffle and Sort; The Map Side; The Reduce Side; Configuration Tuning; Task Execution; The Task Execution Environment; Speculative Execution; Output Committers; Task JVM Reuse; Skipping Bad Records

7. MapReduce Types and Formats
MapReduce Types; The Default MapReduce Job; Input Formats; Input Splits and Records; Text Input; Binary Input; Multiple Inputs; Database Input (and Output); Output Formats; Text Output; Binary Output; Multiple Outputs; Lazy Output; Database Output

8. MapReduce Features
Counters; Built-in Counters; User-Defined Java Counters; User-Defined Streaming Counters; Sorting; Preparation; Partial Sort; Total Sort; Secondary Sort; Joins; Map-Side Joins; Reduce-Side Joins; Side Data Distribution; Using the Job Configuration; Distributed Cache; MapReduce Library Classes

9. Setting Up a Hadoop Cluster
Cluster Specification; Network Topology; Cluster Setup and Installation; Installing Java; Creating a Hadoop User; Installing Hadoop; Testing the Installation; SSH Configuration; Hadoop Configuration; Configuration Management; Environment Settings; Important Hadoop Daemon Properties; Hadoop Daemon Addresses and Ports; Other Hadoop Properties; User Account Creation; YARN Configuration; Important YARN Daemon Properties; YARN Daemon Addresses and Ports; Security; Kerberos and Hadoop; Delegation Tokens; Other Security Enhancements; Benchmarking a Hadoop Cluster; Hadoop Benchmarks; User Jobs; Hadoop in the Cloud; Hadoop on Amazon EC2

10. Administering Hadoop
HDFS; Persistent Data Structures; Safe Mode; Audit Logging; Tools; Monitoring; Logging; Metrics; Java Management Extensions; Routine Administration Procedures; Commissioning and Decommissioning Nodes; Upgrades

11. Pig
Installing and Running Pig; Execution Types; Running Pig Programs; Grunt; Pig Latin Editors; An Example; Generating Examples; Comparison with Databases; Pig Latin; Structure; Statements; Expressions; Types; Schemas; Functions; Macros; User-Defined Functions; A Filter UDF; An Eval UDF; A Load UDF; Data Processing Operators; Loading and Storing Data; Filtering Data; Grouping and Joining Data; Sorting Data; Combining and Splitting Data; Pig in Practice; Parallelism; Parameter Substitution

12. Hive
Installing Hive; The Hive Shell; An Example; Running Hive; Configuring Hive; Hive Services; Comparison with Traditional Databases; Schema on Read Versus Schema on Write; Updates, Transactions, and Indexes; HiveQL; Data Types; Operators and Functions; Tables; Managed Tables and External Tables; Partitions and Buckets; Storage Formats; Importing Data; Altering Tables; Dropping Tables; Querying Data; Sorting and Aggregating; MapReduce Scripts; Joins; Subqueries; Views; User-Defined Functions; Writing a UDF; Writing a UDAF

13. HBase
HBasics; Backdrop; Concepts; Whirlwind Tour of the Data Model; Implementation; Installation; Test Drive; Clients; Java; Avro, REST, and Thrift; Schemas; Loading Data; Web Queries; HBase Versus RDBMS; Successful Service; HBase

14. ZooKeeper
Installing and Running ZooKeeper; Group Membership in ZooKeeper; Creating the Group; Joining a Group; Listing Members in a Group; Deleting a Group; The ZooKeeper Service; Data Model; Operations; Implementation; Consistency; Sessions; States

15. Sqoop
Getting Sqoop; A Sample Import; Generated Code; Additional Serialization Systems; Database Imports: A Deeper Look; Controlling the Import; Imports and Consistency; Direct-mode Imports; Working with Imported Data; Imported Data and Hive; Importing Large Objects; Performing an Expo