Certified Big Data and Apache Hadoop Developer VS-1221



Similar documents
Qsoft Inc

Workshop on Hadoop with Big Data

ITG Software Engineering

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

Complete Java Classes Hadoop Syllabus Contact No:

CURSO: ADMINISTRADOR PARA APACHE HADOOP

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

BIG DATA HADOOP TRAINING

Big Data Course Highlights

Peers Techno log ies Pv t. L td. HADOOP

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

ITG Software Engineering

HADOOP MOCK TEST HADOOP MOCK TEST I

Hadoop Job Oriented Training Agenda

BIG DATA - HADOOP PROFESSIONAL amron

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

<Insert Picture Here> Big Data

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Certified Apache CouchDB Professional VS-1045

Hadoop Ecosystem B Y R A H I M A.

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

COURSE CONTENT Big Data and Hadoop Training

HADOOP. Revised 10/19/2015

Hadoop IST 734 SS CHUNG

BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION

Hadoop implementation of MapReduce computational model. Ján Vaňo

Deploying Hadoop with Manager

Hadoop: The Definitive Guide

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

THE HADOOP DISTRIBUTED FILE SYSTEM

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October :00 Sesión B - DB2 LUW

Hadoop Architecture. Part 1

Apache Hadoop: Past, Present, and Future

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Big Data With Hadoop

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview

Data Analyst Program- 0 to 100

Large scale processing using Hadoop. Ján Vaňo

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

The Greenplum Analytics Workbench

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

Oracle Big Data Fundamentals Ed 1 NEW

The Future of Data Management

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

t] open source Hadoop Beginner's Guide ij$ data avalanche Garry Turkington Learn how to crunch big data to extract meaning from

MySQL and Hadoop. Percona Live 2014 Chris Schneider

Big Data Too Big To Ignore

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Design and Evolution of the Apache Hadoop File System(HDFS)

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Hadoop: Embracing future hardware

Hadoop Scalability at Facebook. Dmytro Molkov YaC, Moscow, September 19, 2011

Cloudera Administrator Training for Apache Hadoop

Apache HBase. Crazy dances on the elephant back

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

TRAINING PROGRAM ON BIGDATA/HADOOP

Open source Google-style large scale data analysis with Hadoop

Hadoop Development & BI- 0 to 100

Mr. Apichon Witayangkurn Department of Civil Engineering The University of Tokyo

BIG DATA TRENDS AND TECHNOLOGIES

Fundamentals Curriculum HAWQ

Chase Wu New Jersey Ins0tute of Technology

Implement Hadoop jobs to extract business value from large and varied data sets

Microsoft SQL Server 2012 with Hadoop

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

The Future of Data Management with Hadoop and the Enterprise Data Hub

Internals of Hadoop Application Framework and Distributed File System

Comprehensive Analytics on the Hortonworks Data Platform

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Constructing a Data Lake: Hadoop and Oracle Database United!

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

Case Study : 3 different hadoop cluster deployments

APACHE HADOOP JERRIN JOSEPH CSU ID#

Apache Hadoop. Alexandru Costan

How to Hadoop Without the Worry: Protecting Big Data at Scale

A very short Intro to Hadoop

Apache Hadoop FileSystem and its Usage in Facebook

White Paper: What You Need To Know About Hadoop

Transcription:

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification assesses the knowledge and skills required to become a successful Hadoop Developer, Administrator, Data Scientist Professional etc in the field of Big Data. The certification tests the candidates on various areas in Big Data and Apache Hadoop. Please note that completing the Video based course by Digital Vidya is mandatory to appear in this certification exam. Why should one take this certification? This Course is intended for professionals and graduates wanting to excel in their chosen areas. It is also well suited for those who are already working and would like to take certification for further career progression. Earning Vskills Big Data and Apache Hadoop Developer Certification can help candidate differentiate in today's competitive job market, broaden their employment opportunities by displaying their advanced skills, and result in higher earning potential. Who will benefit from taking this certification? The course is designed for professionals aspiring to make a career in Big Data and Hadoop Framework. Students, Software Professionals, Analytics Professionals, ETL developers, Project Managers, Architects, and Testing Professionals are the key beneficiaries of this course. Other professionals who are looking forward to acquire a solid foundation on Big Data Industry can also opt for this course. This not only improves their skill set but also makes their CV stronger and existing employees looking for a better role can prove their employers the value of their skills through this certification. Test Details Duration: 60 minutes No. of questions: 50 Maximum marks: 50, Passing marks: 35 (70%) There is no negative marking in this module. Fee Structure Rs. 4,999/- (Includes all taxes)

Companies that hire Vskills Big Data and Apache Hadoop Developer With 1.8 trillion gigabytes of structured and unstructured data in the world, and the volume doubling every two years, the need for big data analysis and business intelligence has never been greater. It adds up to an incredible need for Hadoop professionals who understand how to develop, process and manage half of world's data on Hadoop. Build game-changing Big Data Applications on Hadoop and future-proof your career.

Table of Contents Module 1: Introduction to Big Data and Hadoop 1. Today s Market 2. Current Situation 3. Introduction to Big Data 4. Sources of Big Data 5. Technical & Business Drivers 6. Big Data Use Cases Banking, Healthcare, Agriculture 7. Traditional DBMS & their Limitations 8. Introduction to Hadoop 9. Hadoop Usage 10. Real-Time Use Cases Retail, Farming Module 2: Getting started with Hadoop 1. Hadoop History 2. Hadoop v/s RDBMS 3. Hadoop Architecture 4. Hadoop Ecosystem components 5. Hadoop Storage - HDFS 6. Hadoop Processor - MapReduce 6. Hadoop Server Roles: NameNode, Secondary NameNode, DataNode 7. Anatomy of File Write and Read Module 3: Hadoop Distributed File System 1. HDFS Architecture 2. HDFS internals and use cases 3. HDFS Daemons 4. Files and blocks 5. NameNode memory concerns 6. Secondary NameNode 7. HDFS access options Module 4: MapReduce 1. Use cases of MapReduce 2. MapReduce Architecture 3. Understand the concept of Mappers, Reducers 4. Anatomy of MapReduce Program 5. MapReduce Components Mapper Class, Reducer Class, Driver code 6. Splits and Blocks 7. Understand Combiner and Partitioner 8. Write your own Partitioner 9. Joins - Map Side, Distributed, Distributed Cache, Reduce Side Join 10. Counters 11. Map Reduce API & Data Types

Module 5: Pig 1. Introduction to Apache Pig 2. Pig Data Types 3. Operators in Pig 4. Pig program structure and execution process 5. Joins & filtering using Pig 6. Group & co-group 7. Schema merging and redefining functions 8. Pig functions Module 6: Hive 1. Understanding Hive 2. Hive Architecture & Components 3. Using Hive command line interface 4. Data types and file formats 5. Hive DDL & DML operations 6. Hive vs. RDBMS Module 7: HBase 1. What is HBase 2. HBase architecture 3. HBase in Hadoop Ecosystem 4. HBase vs. HDFS 5. HBase Data model 6. Physical Model in HBase 7. Components of HBase 8. Managing large data sets with HBase 9. Using HBase in Hadoop applications Module 8: Sqoop 1. Introducing Sqoop 2. The principles of Sqoop Design 3. Connectors and Drivers 4. Importing Data with Sqoop 5. Exporting Data with Sqoop Module 9: ZooKeeper 1. Overview of Zookeeper 2. How ZooKeeper Works 3. The ZooKeeper CLI 4. Reading and Writing Data 5. Sequential and Ephemeral znodes 6. Watches 7. Versioning and ACLs 8. Zookeeper use cases

Module 10: Flume 1. Flume Overview 2. Channels 3. Sinks and Sink Processors 4. Sources and Channel Selectors 5. Interceptors, ETL, and Routing 6. Monitoring Flume Module 11: Oozie 1. Introduction to Oozie 2. Oozie Simple/Complex Flow 3. Oozie Components 4. Oozie Service/ Scheduler 5. Use Cases Time and Data triggers 6. Running/Debugging a Coordinator Job 7. Bundle Module 12: Yarn 1. History of Yarn 2. Core Components 3. YARN Administration 4. Capacity Scheduler 5. YARN Distributed-shell Module 13: Troubleshooting, Administering and Optimizing Hadoop 1. Planning a Hadoop Cluster 2. Identity, Authentication and Authorization 3. Resource Management 4. Cluster Maintenance 5. Troubleshooting 6. Monitoring 7. Backup and Recovery Module 14: Real-Time Projects 1. Twitter Data Analysis 2. Stack Exchange Ranking and Percentile data-set 3. Loan Dataset 4. Data-sets by Government 5. Machine Learning Dataset like Badges datasets 6. NYC Data Set 7. Weather Dataset

Sample Questions 1. For a MapReduce job, on a cluster running MapReduce v1 (MRv1), what s the relationship between tasks and a task templates? A. Once the write stream closes on the DataNode, the DataNode immediately initiates a black report to the NameNode. B. The change is written to the NameNode disk. C. The metadata in the RAM on the NameNode is flushed to disk. D. The metadata in RAM on the NameNode is flushed disk. E. The metadata in RAM on the NameNode is updated. F. The change is written to the edits file. 2. How does HDFS Federation help HDFS Scale horizontally? A. HDFS Federation improves the resiliency of HDFS in the face of network issues by removing the NameNode as a single-point-of-failure. B. HDFS Federation allows the Standby NameNode to automatically resume the services of an active NameNode. C. HDFS Federation provides cross-data center (non-local) support for HDFS, allowing a cluster administrator to split the Block Storage outside the local cluster. D. HDFS Federation reduces the load on any single NameNode by using the multiple, independent NameNode to manage individual pars of the filesystem namespace 3. What is the recommended disk configuration c for slave nodes in your Hadoop cluster with 6 x 2 TB hard drives? A. RAID 10 B. JBOD C. RAID 5 D. RAID 1+0 4. Your developers request that you enable them to use Hive on your Hadoop cluster. What do install and/or configure? A. Install the Hive interpreter on the client machines only, and configure a shared remote Hive Metastore. B. Install the Hive Interpreter on the client machines and all the slave nodes, and configure a shared remote Hive Metastore. C. Install the Hive interpreter on the master node running the JobTracker, and configure a shared remote Hive Metastore. D. Install the Hive interpreter on the client machines and all nodes on the cluster

5. Which command does Hadoop offer to discover missing or corrupt HDFS data? A. The map-only checksum utility, B. Fsck C. Du D. Dskchk E. Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block. Answers: 1 (A), ( 2 (D), ( 3 (B),( 4 (A), 5 (B)(