
PASS4TEST - IT Certification Guaranteed, The Easy Way!
http://www.pass4test.com
We offer free update service for one year.

Exam: CCD-410
Title: Cloudera Certified Developer for Apache Hadoop (CCDH)
Vendor: Cloudera
Version: DEMO

NO.1 You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?
A. Ingest the server web logs into HDFS using Flume.
B. Write a MapReduce job, with the web servers for mappers, and the Hadoop cluster nodes for reducers.
C. Import all users' clicks from your OLTP databases into Hadoop, using Sqoop.
D. Channel these clickstreams into Hadoop using Hadoop Streaming.
E. Sample the weblogs from the web servers, copying them into Hadoop using curl.
Answer: A

NO.2 To process input key-value pairs, your mapper needs to load a 512 MB data file into memory. What is the best way to accomplish this?
A. Serialize the data file, insert it into the JobConf object, and read the data into memory in the configure method of the mapper.
B. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.
C. Place the data file in the DataCache and read the data into memory in the configure method of the mapper.
D. Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.
Answer: D
(Hadoop has no DataCache class; the DistributedCache is the standard distribution mechanism, and reading the file in the configure method loads it once per task rather than once per record. A mapper sketch illustrating this pattern appears after NO.3 below.)

NO.3 On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster, and alerts the JobTracker that it has an open map task slot. What determines how the JobTracker assigns each map task to a TaskTracker?
A. The amount of RAM installed on the TaskTracker node.
B. The amount of free disk space on the TaskTracker node.
C. The number and speed of CPU cores on the TaskTracker node.
D. The average system load on the TaskTracker node over the past fifteen (15) minutes.
E. The location of the InputSplit to be processed in relation to the location of the node.
Answer: E
The TaskTrackers send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if not, it looks for an empty slot on a machine in the same rack.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How JobTracker schedules a task?

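Returning to NO.2: below is a minimal sketch, using the old mapred API, of a mapper that loads a cached lookup file once in configure() and consults it on every map() call. The class name, the file path /cache/lookup.dat, and the tab-separated file format are hypothetical, chosen only for illustration.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Hypothetical mapper: enriches each input line with a value from a
// large lookup file that the driver placed in the DistributedCache.
public class LookupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    // configure() runs once per task, before any map() calls, so the
    // file is read into memory exactly once.
    @Override
    public void configure(JobConf job) {
        try {
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t", 2); // assumed tab-separated key/value format
                lookup.put(parts[0], parts[1]);
            }
            in.close();
        } catch (IOException e) {
            throw new RuntimeException("Could not load cached lookup file", e);
        }
    }

    public void map(LongWritable offset, Text value,
            OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        String match = lookup.get(value.toString());
        if (match != null) {
            out.collect(value, new Text(match));
        }
    }
}

// In the driver, before submitting the job:
// DistributedCache.addCacheFile(new java.net.URI("/cache/lookup.dat"), conf);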
NO.4 You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. Determine the difference between setting the number of reducers to one and setting the number of reducers to zero.
A. There is no difference in output between the two settings.
B. With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances of matching patterns are stored in a single file on HDFS.
C. With zero reducers, all instances of matching patterns are gathered together in one file on HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
D. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are gathered together in one file on HDFS.
Answer: D
* It is legal to set the number of reduce tasks to zero if no reduction is desired. In this case the outputs of the map tasks go directly to the FileSystem, into the output path set by setOutputPath(Path). The framework does not sort the map outputs before writing them out to the FileSystem.
* Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks; rather, the outputs of the mapper tasks will be the final output of the job.
Note: In the reduce phase, the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs. The output of the reduce task is typically written to the FileSystem via OutputCollector.collect(WritableComparable, Writable). Applications can use the Reporter to report progress, set application-level status messages and update Counters, or just indicate that they are alive. The output of the Reducer is not sorted.
(A driver sketch illustrating the zero-reducer setting appears below.)

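To make NO.4 concrete, here is a minimal driver sketch using the old mapred API. The GrepDriver and MatchMapper class names and the regex pattern are hypothetical. With setNumReduceTasks(0), each map task writes its unsorted output directly to the output path, yielding one file per map task; changing the argument to 1 funnels all matches through a single reducer into a single file.

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class GrepDriver {

    // Hypothetical mapper: emits <matched text, filename:offset> pairs.
    public static class MatchMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private final Pattern pattern = Pattern.compile("ERROR \\w+"); // assumed pattern
        private String fileName;

        @Override
        public void configure(JobConf job) {
            // MRv1 exposes the current split's input file via this property.
            fileName = job.get("map.input.file", "unknown");
        }

        public void map(LongWritable offset, Text line,
                OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
            Matcher m = pattern.matcher(line.toString());
            while (m.find()) {
                out.collect(new Text(m.group()), new Text(fileName + ":" + offset));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(GrepDriver.class);
        conf.setJobName("regex-grep");
        conf.setInputFormat(TextInputFormat.class);
        conf.setMapperClass(MatchMapper.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Zero reducers: unsorted map outputs go straight to HDFS, one file per map task.
        // Setting this to 1 instead would gather all matches into one file.
        conf.setNumReduceTasks(0);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}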
NO.5 For each intermediate key, each reducer task can emit:
A. As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B. As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
C. As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
D. One final key-value pair per value associated with the key; no restrictions on the type.
E. One final key-value pair per key; no restrictions on the type.
Answer: C
Reference: Hadoop Map-Reduce Tutorial; Yahoo! Hadoop Tutorial, Module 4: MapReduce

NO.6 You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?
A. Partitioner
B. OutputFormat
C. WritableComparable
D. Writable
E. InputFormat
F. Combiner
Answer: F
Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally on individual mapper outputs. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. (A code sketch of this reuse appears after NO.7 below.)
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?

NO.7 In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?
A. The values are in sorted order.
B. The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.
C. The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.
D. Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.
Answer: B
Note:
* Input to the Reducer is the sorted output of the mappers.
* The framework calls the application's reduce function once for each unique key in the sorted order.
* Example: For the given sample input, the first map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
The second map emits:
< Hello, 1>
< Hadoop, 1>
< Goodbye, 1>
< Hadoop, 1>

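Returning to NO.6: below is a minimal sketch of a count-summing reducer reused as a combiner, in the old mapred API; the SumReducer class name is hypothetical. Because addition is commutative and associative, registering the same class as the combiner lets each map task pre-aggregate its own output locally, shrinking the data shuffled to the reducers without changing the result.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Safe to reuse as a combiner: summing partial sums gives the same
// result as summing all the values at once.
public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        out.collect(key, new IntWritable(sum));
    }
}

// In the driver:
// conf.setReducerClass(SumReducer.class);
// conf.setCombinerClass(SumReducer.class); // local pre-aggregation before the shuffle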
NO.8 Table metadata in Hive is:
A. Stored as metadata on the NameNode.
B. Stored along with the data in HDFS.
C. Stored in the Metastore.
D. Stored in ZooKeeper.
Answer: C
By default, Hive uses an embedded Derby database to store metadata information. The Metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc. The Metastore is an application that runs on an RDBMS and uses an open source ORM layer called DataNucleus to convert object representations into a relational schema and vice versa. This approach was chosen, as opposed to storing the information in HDFS, because the Metastore needs to be very low latency. The DataNucleus layer allows many different RDBMS technologies to be plugged in.
Note:
* By default, Hive stores metadata in an embedded Apache Derby database; other client/server databases, such as MySQL, can optionally be used.
* Features of Hive include: metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query execution.
Reference: Store Hive Metadata into RDBMS