Big Data Course Highlights



The Big Data course starts with the Linux basics needed to get started with Big Data, then progresses from Hadoop/Big Data fundamentals (such as a Word Count program) to advanced concepts (such as writing triggers and stored procedures for HBase). The main focus of the course is on how to use the Big Data tools, but it also covers how to install and configure some of the Big Data frameworks on on-premises and cloud-based infrastructure.

Because of the hype, Hadoop is in the news all the time. But many frameworks support Hadoop (such as Pig and Hive), and many others are alternatives to Hadoop (such as Twitter Storm and LinkedIn Samza) that address the limitations of the MapReduce model. Some of these frameworks are also discussed during the course to give a big picture of what Big Data is about.

Time is also spent on NoSQL databases, starting with why to use NoSQL instead of an RDBMS and moving on to advanced concepts such as importing data in bulk from an RDBMS into a NoSQL database. Different NoSQL databases are compared, and HBase is discussed in much more detail.

Each participant receives a VM (Virtual Machine) with the Big Data frameworks (Hadoop etc.) installed and configured on CentOS, along with data sets and code to process them. The VM makes the Big Data learning curve less steep.

The training will help participants get through the Cloudera Certified Developer for Apache Hadoop (CCDH) certification with minimal effort.
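
To give a feel for the Word Count program mentioned above, here is a minimal sketch in plain Python. It is not the Hadoop API: the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical, and the shuffle step that Hadoop performs across the cluster is simulated in memory on one machine.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    # Sort by key to mimic the sorted order in which reducers receive keys.
    return dict(reduce_counts(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data is big", "hadoop processes big data"]
    for word, count in word_count(sample).items():
        print(word, count)
```

The point of the sketch is the division of labour: the map and reduce functions only ever see one line or one key at a time, which is what lets a framework such as Hadoop run many copies of them in parallel.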

Pre-Requisites:

- Knowledge of Java is a definite plus for getting started with Big Data, but it is not mandatory. Hadoop provides Streaming, which allows MapReduce programs to be written in non-Java languages such as Perl and Python, and there are also higher-level abstractions such as Hive and Pig, which provide SQL-like and procedural interfaces.
- Knowledge of Linux is likewise a definite plus, but the course covers just enough of the Linux basics to get started with the different Big Data frameworks.
- A laptop/desktop with a minimum of 3 GB RAM, 10 GB of free hard disk space and a decent processor. These specifications are enough to run the Big Data VM and the frameworks smoothly.

Who should plan on joining this program

Audience: This course is designed for:

- Any developer with skills in other technologies who is interested in getting into the emerging Big Data field.
- Any data analyst who would like to enhance/transfer their existing knowledge to the Big Data space.
- Any architect who would like to design applications in conjunction with Big Data, or Big Data applications themselves.
- Anyone involved in Software Quality Assurance (Testing). Knowledge of Hadoop will help them to test the application better and will also help them to move into the development cycle.

Topics covered in the training

Understanding Big Data
- Understanding Big Data
  - 3V (Volume-Variety-Velocity) characteristics
  - Structured and Unstructured Data
  - Application and use cases of Big Data
- Limitations of traditional large-scale systems
- How a distributed way of computing is superior (cost and scale)
- Opportunities and challenges with Big Data

HDFS (The Hadoop Distributed File System)
- HDFS Overview and Architecture
  - Deployment Architecture
  - Name Node, Data Node and Checkpoint Node (aka Secondary Name Node)
  - Safe mode
  - Configuration files
  - HDFS Data Flows (Read vs Write)
- How HDFS addresses fault tolerance?
  - CRC Check Sum
  - Data replication
  - Rack awareness and Block placement policy
- Small files problem
- HDFS Interfaces
  - Web Interface
  - Command Line Interface
    - File System
    - Administrative
- Advanced HDFS features
  - Load Balancer
  - DistCP
  - HDFS High Availability
  - Hadoop Cache

MapReduce - 1
- MapReduce Overview
  - Functional Programming paradigms
  - How to think in a MapReduce way?
- MapReduce Architecture
  - Legacy MR vs Next Generation MapReduce (YARN)
  - Slots vs Containers
  - Schedulers
- Shuffling, Sorting
- Hadoop Data types
- Input and Output formats
- Input Splits
- Partitioning (Hash Partitioner vs Custom Partitioner)
- Counters
- Configuration files
- Distributed Cache

MapReduce - 2
- Developing, debugging and deploying MR programs
  - Standalone mode (Eclipse)
  - Pseudo distributed mode (as in the Big Data VM)
  - Fully Distributed Mode
- MR API
  - Old and new MR API
  - Java Client API
  - Overview of MRUnit
- Hadoop Data types and custom writables/writable comparables
- Different input and output formats
- Saving Binary Data using Sequence Files and Avro Files
- Optimizing techniques
  - Speculative execution
  - Combiners
  - Compression
- MR algorithms
  - Sorting (Max Temperature)
  - Different ways of joining data
  - Inverted Index
  - Word co-occurrence

Pig
- Introduction to Pig
- Why Pig not MapReduce
- Pig Components
- Pig Execution Modes
- Pig Shell - Grunt
- Pig Latin, Writing Pig Latin scripts
- Pig Data Types
- Storage Types
- Diagnosing Pig commands
- Macros
- UDF and External Scripts

Hive
- Introduction and Architecture
- Different modes of executing Hive queries
- Metastore implementations
- HiveQL (DDL & DML operations)
- External vs Internal Tables
- Views
- Partitions & Buckets
- UDF
- Comparison of Pig and Hive

Flume
- Overview of Flume
- Where is Flume used - import/export unstructured data
- Flume Architecture
- Using Flume to load data into HDFS

Sqoop
- Overview of Sqoop
- Where is Sqoop used - import/export structured data
- Using Sqoop to import data from RDBMS into HDFS
- Using Sqoop to import data from RDBMS into HBase
- Using Sqoop to export data from HDFS into RDBMS
- Sqoop connectors

Impala
- Overview of Impala
- Architecture of Impala

NoSQL Databases
- Introduction to NoSQL databases
- Types of NoSQL databases and their features
- Brewer's CAP Theorem
- Advantage of NoSQL vs. traditional RDBMS
- ACID vs BASE
- Different types of NoSQL databases
  - Key value
  - Columnar
  - Document
  - Graph

HBase
- Introduction to HBase
- Why use HBase
- HBase Architecture - read and write paths
- HBase vs. RDBMS
- Installation and Configuration
- Schema design in HBase - column families, hot spotting
- Accessing data with HBase Shell
- Accessing data with HBase API - Reading, Adding, Updating data from the shell, Java API
- HBase Coprocessors (Endpoints, Observers)

POC
- Click stream analysis
- Analyzing the Twitter data with Hive
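
As an illustration of the hot spotting topic listed under HBase schema design, here is a small sketch of row key salting, one common way to spread sequential keys (such as timestamps) across regions. It is plain Python and does not talk to HBase; the bucket count, the key layout and the name salted_row_key are assumptions made for the example, not part of the course material.

```python
import hashlib

# Hypothetical number of salt buckets, e.g. roughly one per region server.
NUM_BUCKETS = 8


def salted_row_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix a row key with a bucket number derived from a hash of the key.

    The same key always maps to the same bucket, so a point lookup can
    recompute the prefix, while consecutive keys land in different buckets.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}"


if __name__ == "__main__":
    # Sequential keys would otherwise all be written to the same region.
    for i in range(5):
        print(salted_row_key(f"event-{i:06d}"))
```

The trade-off is that a range scan over the original key order now has to be issued once per bucket and merged by the client.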