PassTest. Better quality, better services!


Q&A

Exam: CCD-410
Title: Cloudera Certified Developer for Apache Hadoop (CCDH)
Version: DEMO

1. When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
B. As soon as a mapper has emitted at least one record.
C. Not until all mappers have finished processing all records.
D. It depends on the InputFormat used for the job.
Answer: C
Explanation: In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as they are available, but the programmer-defined reduce method is called only after all the mappers have finished.
Note: The reduce phase has three steps: shuffle, sort, and reduce. Shuffle is where the reducer collects the data from each mapper. This can happen while the mappers are still generating data, since it is only a data transfer. Sort and reduce, on the other hand, can only start once all the mappers are done.
Why is starting the reducers early a good thing? Because it spreads the data transfer from the mappers to the reducers out over time, which helps if your network is the bottleneck.
Why is starting the reducers early a bad thing? Because they hog reduce slots while doing nothing but copying data, so another job that starts later and would actually use those reduce slots cannot get them.
You can control when the reducers start by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 waits for all the mappers to finish before starting the reducers; a value of 0.0 starts the reducers right away; a value of 0.5 starts the reducers when half of the mappers are complete. You can also set mapred.reduce.slowstart.completed.maps on a job-by-job basis. Typically, keep mapred.reduce.slowstart.completed.maps above 0.9 if the system ever has multiple jobs running at once, so a job does not tie up reducers that are doing nothing but copying data.
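The slow-start knob discussed above is set like any other Hadoop property; a minimal mapred-site.xml fragment (the 0.90 value here is just the multi-job guidance from the explanation, not a required default) might look like:

```xml
<!-- mapred-site.xml: delay reducer start until 90% of map tasks are done -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.90</value>
</property>
```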
If you only ever have one job running at a time, a value of 0.1 would probably be appropriate.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "When are the reducers started in a MapReduce job?"

2. Which of the following describes how a client reads a file from HDFS?
A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
Answer: A
Explanation: Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "How does the client communicate with HDFS?"
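The read path in answer A can be sketched as a toy model in Python. The class and method names below are illustrative only; the real protocol lives in the HDFS client and DataNode implementations. The point of the sketch is the division of labor: the NameNode serves only metadata, and the data bytes flow directly from DataNodes to the client.

```python
# Toy model of the HDFS read path (question 2, answer A).
# All names here are hypothetical; this is not the real HDFS API.

class NameNode:
    """Holds only metadata: which DataNode stores each block of a file."""
    def __init__(self, block_map):
        # filename -> list of (block_id, datanode_name), in file order
        self.block_map = block_map

    def get_block_locations(self, filename):
        return self.block_map[filename]

class DataNode:
    """Holds the actual block bytes."""
    def __init__(self, blocks):
        self.blocks = blocks  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

def read_file(namenode, datanodes, filename):
    # Step 1: ask the NameNode where the blocks live (metadata only).
    locations = namenode.get_block_locations(filename)
    # Step 2: read each block directly from its DataNode; the NameNode
    # never touches the data itself.
    return b"".join(datanodes[dn].read_block(bid) for bid, dn in locations)
```

Note that option D fails precisely because it routes the data through the NameNode, which would make the metadata server a bandwidth bottleneck.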

3. You are developing a combiner that takes Text keys and IntWritable values as input, and emits Text keys and IntWritable values. Which interface should your class implement?
A. Combiner <Text, IntWritable, Text, IntWritable>
B. Mapper <Text, IntWritable, Text, IntWritable>
C. Reducer <Text, Text, IntWritable, IntWritable>
D. Reducer <Text, IntWritable, Text, IntWritable>
E. Combiner <Text, Text, IntWritable, IntWritable>
Answer: D

4. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. mapred
Answer: D
Explanation: Hadoop Streaming is a utility that comes with the Hadoop distribution. It allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
Reference: http://hadoop.apache.org/common/docs/r0.20.1/streaming.html (Hadoop Streaming, second sentence)

5. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
Answer: A
Explanation: A Reducer has three primary phases:
1. Shuffle. The Reducer copies the sorted output from each Mapper over HTTP across the network.
2. Sort. The framework merge-sorts the Reducer inputs by key (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. outputs are merged while they are being fetched.
Secondary sort: to achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator.
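Question 4's Streaming contract is worth making concrete: the mapper and reducer are ordinary executables that read lines from stdin and write tab-separated key/value lines to stdout. The word-count sketch below expresses that logic as plain Python functions (the function names are illustrative, not part of any Streaming API):

```python
#!/usr/bin/env python3
# Word count in the Hadoop Streaming style. Streaming only requires that
# each program read lines from stdin and emit tab-separated key/value
# lines on stdout; the logic is written as plain functions here so it is
# easy to see and test outside the cluster.
from itertools import groupby

def mapper(lines):
    """For each input line, emit one (word, 1) pair per token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum the counts for each key. The framework hands the reducer its
    input already sorted by key, so grouping consecutive pairs suffices."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

# Under Hadoop Streaming these would run as two separate scripts, e.g.:
#   hadoop jar hadoop-streaming.jar \
#       -input in/ -output out/ \
#       -mapper ./map.py -reducer ./reduce.py
# with the shuffle/sort between them performed by the framework.
```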
The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
3. Reduce. In this phase the reduce(Object, Iterable, Context) method is called for each <key, (collection of values)> pair in the sorted inputs. The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object). The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce, class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
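Answer A's contract can be checked with a small simulation. The sketch below is a toy model, not the Hadoop implementation: it merges the output of several mappers, presents keys to the "reducer" in sorted order, and leaves each key's value list in arrival order, which is exactly why values for a given key are not sorted.

```python
# Toy model of the shuffle-and-sort contract from question 5: keys are
# presented to the reducer in sorted order, but the values for a given
# key keep whatever order they arrived in.
from collections import defaultdict

def shuffle_and_sort(mapper_outputs):
    groups = defaultdict(list)
    for partition in mapper_outputs:   # one list of (key, value) per mapper
        for key, value in partition:
            groups[key].append(value)  # arrival order preserved, not sorted
    for key in sorted(groups):         # keys handed over in sorted order
        yield key, groups[key]
```

If a job needs sorted values, that is what the secondary-sort pattern above (composite key plus grouping comparator) provides; the framework itself makes no promise about value order.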