COURSE: DEVELOPER FOR APACHE HADOOP

Arline Merritt
7 years ago
SAMPLE TEST FOR THE CERTIFICATION EXAM

Question: 1
When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
B. As soon as a mapper has emitted at least one record.
C. Not until all mappers have finished processing all records.
D. It depends on the InputFormat used for the job.
Answer: C
Explanation: In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as they are available, but the programmer-defined reduce method is called only after all the mappers have finished.

Question: 2
You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper's map method?
A. Intermediate data is streamed across the network from the Mapper to the Reducer and is never written to disk.
B. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
C. Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper.
D. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer.
E. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.
Answer: C

Question: 3
You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
A. Combiner <Text, IntWritable, Text, IntWritable>
B. Mapper <Text, IntWritable, Text, IntWritable>
C. Reducer <Text, Text, IntWritable, IntWritable>
D. Reducer <Text, IntWritable, Text, IntWritable>
E. Combiner <Text, Text, IntWritable, IntWritable>
Answer: D

Question: 4
Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. Mapred
Answer: D
Explanation: Hadoop Streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
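As an illustration of the streaming contract, here is a minimal word-count sketch in Python. It assumes the usual streaming convention that a mapper and a reducer exchange tab-separated "key<TAB>value" lines and that the reducer sees its input sorted by key. The function names `mapper` and `reducer` are hypothetical; in a real job each would be a script reading standard input and printing to standard output, passed to the streaming utility through its `-mapper` and `-reducer` options.

```python
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input lines must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local stand-in for the framework: map, sort by key, then reduce.
intermediate = sorted(mapper(["the fox", "the dog"]))
for line in reducer(intermediate):
    print(line)
```

The `sorted` call stands in for the shuffle-and-sort phase that the framework performs between the two scripts.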

Question: 5
Assuming default settings, which best describes the order of data provided to a reducer's reduce method?
A. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
B. Both the keys and values passed to a reducer always appear in sorted order.
C. Neither keys nor values are in any predictable order.
D. The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.
Answer: D
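The ordering described in answer D can be sketched with a small pure-Python simulation. It is not Hadoop code: the `shuffle` function and the mapper labels `m1` and `m2` are hypothetical, and the sketch simply assumes that values reach the reducer in whatever order the map outputs happen to arrive.

```python
from collections import defaultdict


def shuffle(pairs_in_arrival_order):
    """Group values by key; keys come out sorted, values stay in arrival order."""
    grouped = defaultdict(list)
    for key, value in pairs_in_arrival_order:
        grouped[key].append(value)
    return sorted(grouped.items())


# The same intermediate pairs, arriving from the mappers in two different orders.
run_a = [("fox", "m1"), ("dog", "m1"), ("fox", "m2"), ("dog", "m2")]
run_b = [("dog", "m2"), ("fox", "m2"), ("dog", "m1"), ("fox", "m1")]

result_a = shuffle(run_a)
result_b = shuffle(run_b)

print([key for key, _ in result_a])  # ['dog', 'fox']
print([key for key, _ in result_b])  # ['dog', 'fox']
print(result_a[0][1], result_b[0][1])  # ['m1', 'm2'] ['m2', 'm1']
```

Both runs present the keys in the same sorted order, while the value lists for the same key differ between runs.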

Question: 6
You have the following key-value pairs as output from your Map task:
(the, 1) (fox, 1) (faster, 1) (than, 1) (the, 1) (dog, 1)
How many keys will be passed to the Reducer's reduce method?
A. Six
B. Five
C. Four
D. Two
E. One
F. Three
Answer: B
Explanation: The two (the, 1) pairs share the same key, so they are grouped and passed to the reduce method as a single key with two values. That leaves five distinct keys: the, fox, faster, than, and dog.
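The count can be checked with a few lines of plain Python that mimic the grouping step. This is only a sketch: `reduce_calls` is a hypothetical helper, not part of any Hadoop API, and it returns one entry per call the framework would make to reduce.

```python
from itertools import groupby

map_output = [("the", 1), ("fox", 1), ("faster", 1), ("than", 1), ("the", 1), ("dog", 1)]


def reduce_calls(pairs):
    """Return one (key, values) entry per distinct key, in sorted key order."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


calls = reduce_calls(map_output)
print(len(calls))  # 5
print(calls[-1])   # ('the', [1, 1])
```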

Question: 7
You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache, and read it in your Mapper before any records are processed. Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.
A. combine
B. map
C. init
D. setup
Answer: D
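The lifecycle behind answer D can be sketched in plain Python. This is a simulation, not the Hadoop Mapper API: the class `JoinMapper`, the driver `run_task`, and the comma-separated record layout are all assumptions made for illustration, and an in-memory `io.StringIO` stands in for the cached text file.

```python
import io


class JoinMapper:
    """Toy mapper: setup() runs once per task, map() runs once per record."""

    def __init__(self, cache_file):
        self.cache_file = cache_file  # stand-in for the file from the cache
        self.lookup = {}
        self.setup_calls = 0

    def setup(self):
        # Load the lookup table before any record is processed.
        self.setup_calls += 1
        for line in self.cache_file:
            key, value = line.rstrip("\n").split(",", 1)
            self.lookup[key] = value

    def map(self, record):
        # Join each input record against the in-memory table.
        user_id, amount = record.split(",", 1)
        return (self.lookup.get(user_id, "UNKNOWN"), amount)


def run_task(mapper, records):
    """Hypothetical driver mimicking the order in which the framework calls the methods."""
    mapper.setup()
    return [mapper.map(record) for record in records]


mapper = JoinMapper(io.StringIO("u1,Alice\nu2,Bob\n"))
result = run_task(mapper, ["u1,30", "u2,12", "u3,7"])
print(result)  # [('Alice', '30'), ('Bob', '12'), ('UNKNOWN', '7')]
```

Loading the table in `setup` rather than in `map` means the file is read once per task instead of once per record.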

Question: 8
You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?
A. Partitioner
B. OutputFormat
C. WritableComparable
D. Writable
E. InputFormat
F. Combiner
Answer: F
Explanation: Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally, on each individual mapper's output, which reduces the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative.
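The effect of a combiner can be sketched with a small pure-Python simulation. The helpers `combine` and `reduce_all` and the sample data are hypothetical; the sketch assumes a sum, which is commutative and associative, so the same logic serves as both combiner and reducer.

```python
from collections import Counter


def combine(mapper_output):
    """Locally sum the values per key for one mapper's output."""
    totals = Counter()
    for key, value in mapper_output:
        totals[key] += value
    return sorted(totals.items())


def reduce_all(per_mapper_outputs):
    """Sum the values per key across every mapper's output."""
    totals = Counter()
    for output in per_mapper_outputs:
        for key, value in output:
            totals[key] += value
    return dict(totals)


mapper_outputs = [
    [("the", 1), ("fox", 1), ("the", 1), ("the", 1)],
    [("dog", 1), ("the", 1), ("dog", 1)],
]
combined = [combine(output) for output in mapper_outputs]

records_without = sum(len(output) for output in mapper_outputs)
records_with = sum(len(output) for output in combined)
print(records_without, records_with)  # 7 4
print(reduce_all(mapper_outputs) == reduce_all(combined))  # True
```

Fewer records cross the network, and the final totals are unchanged.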

Question: 9
The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or a number of machines are performing poorly and starts more copies of a map or reduce task. All the tasks run simultaneously, and the results of the tasks that finish first are used. This is called:
A. Combine
B. IdentityMapper
C. IdentityReducer
D. Default Partitioner
E. Speculative Execution
Answer: E

Explanation: Speculative execution: One problem with the Hadoop system is that, by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example, if one node has a slow disk controller, it may be reading its input at only 10% of the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than on all the other nodes. Because tasks are forced to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform.
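The benefit can be illustrated with a deterministic toy model in Python. It is not how the Hadoop scheduler decides when to launch backups: the function `job_finish_time`, the task names, and all durations are made-up assumptions, and the model only captures the rule that a task is done as soon as any of its attempts finishes.

```python
def job_finish_time(original, backups=None):
    """Return when the job ends, given per-task attempt timings in seconds.

    original: task name -> duration of the first attempt (started at t=0).
    backups:  task name -> (start time, duration) of a redundant attempt.
    """
    backups = backups or {}
    finish = {}
    for task, duration in original.items():
        candidates = [duration]
        if task in backups:
            start, backup_duration = backups[task]
            candidates.append(start + backup_duration)
        # The attempt that finishes first wins; the other one is discarded.
        finish[task] = min(candidates)
    # The job ends when its slowest task has a finished attempt.
    return max(finish.values())


# Three tasks take 10 s; the fourth runs on a node that is ten times slower.
original = {"map-0": 10, "map-1": 10, "map-2": 10, "map-3": 100}
# A backup of the straggler starts at t=10 on an idle node of normal speed.
backups = {"map-3": (10, 10)}

print(job_finish_time(original))           # 100
print(job_finish_time(original, backups))  # 20
```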

Question: 10
Which of the following techniques is used to disable the reduce task?
A. The Hadoop administrator has to set the number of reducer slots to zero on all slave nodes. This will disable the reduce step.
B. It is impossible to disable the reduce step, since it is a critical part of the Map-Reduce abstraction.
C. A developer can always set the number of reducers to zero. That will completely disable the reduce step.
D. While you cannot completely disable reducers, you can set the output to one. There needs to be at least one reduce step in the Map-Reduce abstraction.
Answer: C

Contact
administracion@formacionhadoop.com
www.
Twitter: Twitter.com/formacionhadoop
Facebook: Facebook.com/formacionhadoop
LinkedIn: linkedin.com/company/formación-hadoop
