Test-King.CCA-500.68Q.A. Cloudera CCA-500 Cloudera Certified Administrator for Apache Hadoop (CCAH)


Test-King.CCA-500.68Q.A
Number: CCA-500
Passing Score: 800
Time Limit: 120 min
File Version:

Cloudera CCA-500
Cloudera Certified Administrator for Apache Hadoop (CCAH)

JUST PASSED a few minutes ago with an outstanding mark of 95%. Still valid. Hurry up, guys: study and pass this one. I have corrected many of the question answers. If there are any more, update this VCE and re-upload. Nicely written questions with many corrections inside. It should be really helpful for everyone who wants to pass the exam without any ambiguity. Awesome work, and thanks for making this easier for us.

Exam A

QUESTION 1
What does CDH packaging do on install to facilitate Kerberos security setup?
A. Automatically configures permissions for log files at $MAPRED_LOG_DIR/userlogs
B. Creates users for hdfs and mapreduce to facilitate role assignment
C. Creates directories for temp, hdfs, and mapreduce with the correct permissions
D. Creates a set of pre-configured Kerberos keytab files and their permissions
E. Creates and configures your KDC with default cluster values
Answer is to the point.

QUESTION 2
Which is the default scheduler in YARN?
A. YARN doesn't configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml
B. Capacity Scheduler
C. Fair Scheduler
D. FIFO Scheduler
Reference: site/capacityscheduler.html

QUESTION 3
Which YARN process runs as "container 0" of a submitted job and is responsible for resource requests?
A. ApplicationManager
B. JobTracker
C. ApplicationMaster
D. JobHistoryServer
E. ResourceManager
F. NodeManager
Correct Answer: C
Nice.

QUESTION 4
Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?
A. Complexity Fair Scheduler (CFS)
B. Capacity Scheduler
C. Fair Scheduler
D. FIFO Scheduler
Correct Answer: C
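As context for Questions 2 and 4: the active scheduler is selected by a single ResourceManager property. A minimal sketch, assuming the stock Apache FairScheduler class name, of how it might be set in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>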

QUESTION 5
Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?
A. SampleJar.jar is sent to the ApplicationMaster, which allocates a container for SampleJar.jar
B. SampleJar.jar is placed in a temporary directory in HDFS
C. SampleJar.jar is sent directly to the ResourceManager
D. SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster
Finest answer.

QUESTION 6
You are working on a project where you need to chain together MapReduce and Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?
A. Oozie
B. ZooKeeper
C. HBase
D. Sqoop
E. HUE
Answer is upgraded.

QUESTION 7
Which process instantiates user code, and executes map and reduce tasks on a cluster running MapReduce v2 (MRv2) on YARN?
A. NodeManager
B. ApplicationMaster
C. TaskTracker
D. JobTracker
E. NameNode
F. DataNode
G. ResourceManager
Answer is nice.
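For Question 5, a submission of that shape might look as follows on a client machine; SampleJar.jar, MyClass, and the paths are the question's hypothetical names rather than real artifacts:

# Submit the job: the client stages the JAR in HDFS and contacts the ResourceManager
hadoop jar SampleJar.jar MyClass /user/alice/input /user/alice/output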

QUESTION 8
Cluster Summary: 45 files and directories, 12 blocks = 57 total. Heap size is MB / 193.38 MB (7%).
Refer to the above screenshot. You configure a Hadoop cluster with seven DataNodes and one of your monitoring UIs displays the details shown in the exhibit. What does this tell you?
A. The DataNode JVM on one host is not active
B. Because your under-replicated blocks count matches the Live Nodes, one node is dead, and your DFS Used % equals 0%, you can't be certain that your cluster has all the data you've written to it.
C. Your cluster has lost all HDFS data which had blocks stored on the dead DataNode
D. The HDFS cluster is in safe mode
Finest.

QUESTION 9
Which two features does Kerberos security add to a Hadoop cluster? (Choose two)
A. User authentication on all remote procedure calls (RPCs)
B. Encryption for data during transfer between the Mappers and Reducers
C. Encryption for data on disk ("at rest")
D. Authentication for user access to the cluster against a central server
E. Root access to the cluster for users hdfs and mapred but non-root access for clients
Correct Answer: AD
Okay.

QUESTION 10
Assuming a cluster running HDFS and MapReduce version 2 (MRv2) on YARN with all settings at their default, what do you need to do when adding a new slave node to the cluster?
A. Nothing, other than ensuring that the DNS (or /etc/hosts files on all machines) contains an entry for the new node.
B. Restart the NameNode and ResourceManager daemons and resubmit any running jobs.
C. Add a new entry to /etc/nodes on the NameNode host.
D. Restart the NameNode of dfs.number.of.nodes in hdfs-site.xml
Reference: cluster.3b_how_do_i_start_services_on_just_one_node.3f

QUESTION 11
Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring progress?
A. NodeManager
B. ApplicationMaster
C. ApplicationManager
D. ResourceManager
Reference: (see ResourceManager)

QUESTION 12
You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from a faster network fabric?
A. When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
B. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
C. When your workload consists of processor-intensive tasks
D. When your workload generates a large amount of intermediate data, on the order of the input data itself
All right.
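For Question 10, once DNS or /etc/hosts resolves the new host, bringing a default-configured slave online typically amounts to starting its daemons there. A sketch assuming CDH-style init scripts; exact service names vary by distribution and version:

# On the new slave node: start the HDFS and YARN worker daemons
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-nodemanager start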

QUESTION 13
Your cluster is running MapReduce version 2 (MRv2) on YARN. Your ResourceManager is configured to use the FairScheduler. Now you want to configure your scheduler such that a new user on the cluster can submit jobs into their own queue at application submission. Which configuration should you set?
A. You can specify a new queue name when the user submits a job, and a new queue can be created dynamically if the property yarn.scheduler.fair.allow-undeclared-pools = true
B. yarn.scheduler.fair.user-as-default-queue = false and yarn.scheduler.fair.allow-undeclared-pools = true
C. You can specify a new queue name when the user submits a job, and a new queue can be created dynamically if yarn.scheduler.fair.user-as-default-queue = false
D. You can specify a new queue name per application in the allocations.xml file and have new jobs automatically assigned to the application queue
Answer is confirmed.

QUESTION 14
A slave node in your cluster has 4 hard drives installed (4 x 2 TB). The DataNode is configured to store HDFS blocks on all disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?
A. 25 GB on each hard drive may not be used to store HDFS blocks
B. 100 GB on each hard drive may not be used to store HDFS blocks
C. All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node
D. A maximum of 100 GB on each hard drive may be used to store HDFS blocks
Correct Answer: C
Acceptable answer.

QUESTION 15
What two things must you do if you are running a Hadoop cluster with a single NameNode and six DataNodes, and you want to change a configuration parameter so that it affects all six DataNodes? (Choose two)
A. You must modify the configuration files on the NameNode only. DataNodes read their configuration from the master nodes
B. You must modify the configuration files on each of the DataNode machines
C. You don't need to restart any daemon, as they will pick up changes automatically
D. You must restart the NameNode daemon to apply the changes to the cluster
E. You must restart all six DataNode daemons to apply the changes to the cluster
Correct Answer: BD
Typically one machine in the cluster is designated as the NameNode and another machine as the JobTracker, exclusively. These are the masters. The rest of the machines in the cluster act as both DataNode and TaskTracker. These are the slaves.
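For Question 13, the two FairScheduler properties the options reference might be combined as follows in yarn-site.xml; a sketch of the intent (each user's jobs land in a per-user queue created on demand), not a drop-in configuration:

<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>true</value>
</property>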

QUESTION 16
You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node. What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
A. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin -refreshNodes on the NameNode
B. Restart the NameNode
C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin -refreshNodes on the NameNode
D. Nothing; the worker node will automatically join the cluster when the NameNode daemon is started
Answer is valid.

QUESTION 17
You use the hadoop fs -put command to add a file "sales.txt" to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of the file in this situation?
A. The file will remain under-replicated until the administrator brings that node back online
B. The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file's replication factor doesn't fall below)
C. The file will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster's replication values are restored
D. The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
Correct Answer: D
The NameNode marks all blocks stored on the dead DataNode as under-replicated and orchestrates their re-replication. If the NameNode has marked a DataNode as dead, it will not include that DataNode in the list of machines it returns to clients as containing blocks they have requested. The NameNode will begin the process of re-replication, contacting a DataNode which contains a copy of each block which was on the now-dead DataNode and telling it to re-replicate that block to another DataNode.

QUESTION 18
Given: You want to clean up this list by removing jobs where the State is KILLED. What command do you enter?
A. yarn application -refreshJobHistory
B. yarn application -kill application_ _0109
C. yarn rmadmin -refreshQueues
D. yarn rmadmin -kill application_ _0109
Reference: hadoop/content/common_mrv2_commands.html
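For Question 16, the include-file workflow that options A and C describe might look like this; the config path and worker hostname are illustrative, and hdfs dfsadmin is the newer spelling of hadoop dfsadmin:

# On the NameNode: add the worker to the include file, then re-read the node lists
echo "worker05.example.com" >> /etc/hadoop/conf/dfs.hosts
hdfs dfsadmin -refreshNodes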

QUESTION 19
Assume you have a file named foo.txt in your local directory. You issue the following three commands:
hadoop fs -mkdir input
hadoop fs -put foo.txt input/foo.txt
hadoop fs -put foo.txt input
What happens when you issue the third command?
A. The write succeeds, overwriting foo.txt in HDFS with no warning
B. The file is uploaded and stored as a plain file named input
C. You get a warning that foo.txt is being overwritten
D. You get an error message telling you that foo.txt already exists, and asking you if you would like to overwrite it.
E. You get an error message telling you that foo.txt already exists. The file is not written to HDFS
F. You get an error message telling you that input is not a directory
G. The write silently fails
Correct Answer: CE
Valid.

QUESTION 20
You are configuring a server running HDFS and MapReduce version 2 (MRv2) on YARN running Linux. How must you format the underlying file system of each DataNode?
A. They must be formatted as HDFS
B. They must be formatted as either ext3 or ext4
C. They may be formatted in any Linux file system
D. They must not be formatted; HDFS will format the file system automatically
Superb answer.

QUESTION 21
You are migrating a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN. You want to maintain your MRv1 TaskTracker slot capacities when you migrate. What should you do?
A. Configure yarn.applicationmaster.resource.memory-mb and yarn.applicationmaster.resource.cpu-vcores so that ApplicationMaster container allocations match the capacity you require.
B. You don't need to configure or balance these properties in YARN, as YARN dynamically balances resource management capabilities on your cluster
C. Configure mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in yarn-site.xml to match your cluster's capacity set by the yarn-scheduler.minimum-allocation
D. Configure yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores to match the capacity you require under YARN for each NodeManager
Correct Answer: D
Answer is genuine.

QUESTION 22
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?
A. We cannot say; the number of Mappers is determined by the ResourceManager
B. We cannot say; the number of Mappers is determined by the developer
C. 30
D. 3
E. 10
F. We cannot say; the number of Mappers is determined by the ApplicationMaster
Correct Answer: E
Real answer.

QUESTION 23
You're upgrading a Hadoop cluster from HDFS and MapReduce version 1 (MRv1) to one running HDFS and MapReduce version 2 (MRv2) on YARN. You want to set and enforce a block size of 128 MB for all new files written to the cluster after the upgrade. What should you do?
A. You cannot enforce this, since client code can always override this value
B. Set dfs.block.size to 128M on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final
C. Set dfs.block.size to 128M on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode
D. Set dfs.block.size to on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final
E. Set dfs.block.size to on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode
Correct Answer: C
Answer is well explained.
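For Question 23, enforcing a block size with the final flag might be expressed like this in hdfs-site.xml on the client and worker nodes; 134217728 is simply 128 MB in bytes:

<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <final>true</final>
</property>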

QUESTION 24
Your cluster has the following characteristics:
- A rack-aware topology is configured and on
- Replication is set to 3
- Cluster block size is set to 64 MB
Which describes the file read process when a client application connects into the cluster and requests a 50 MB file?
A. The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads, as part of Hadoop's speculative execution framework.
B. The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.
C. The client queries the NameNode for the locations of the block, and reads from a random location in the list it receives to eliminate network I/O loads by balancing which nodes it retrieves data from at any given time.
D. The client queries the NameNode, which retrieves the block from the nearest DataNode to the client, then passes that block back to the client.
Great answer.

QUESTION 25
Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functional cluster?
A. Yes. The daemon will receive data from the NameNode to run Map tasks
B. Yes. The daemon will get data from another (non-local) DataNode to run Map tasks
C. Yes. The daemon will receive Map tasks only
D. Yes. The daemon will receive Reducer tasks only
Answer is upgraded.

QUESTION 26
You have a 20-node Hadoop cluster, with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?
A. Add another master node to increase the number of nodes running the JournalNode, which increases the number of machines available to HA to create a quorum
B. Set an HDFS replication factor that provides data redundancy, protecting against node failure
C. Run a Secondary NameNode on a different master from the NameNode in order to provide automatic recovery from a NameNode failure.
D. Run the ResourceManager on a different master from the NameNode in order to load-share HDFS metadata processing
E. Configure the cluster's disk drives with an appropriate fault-tolerant RAID level
Correct Answer: D

QUESTION 27
You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
A. HDFS is almost full
B. The NameNode goes down
C. A DataNode is disconnected from the cluster
D. Map or reduce tasks that are stuck in an infinite loop
E. MapReduce jobs are causing excessive memory swaps

QUESTION 28
You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum Storage. What is the purpose of ZooKeeper in such a configuration?
A. It only keeps track of which NameNode is Active at any given time
B. It monitors an NFS mount point and reports if the mount point disappears
C. It both keeps track of which NameNode is Active at any given time, and manages the Edits file, which is a log of changes to the HDFS filesystem
D. It only manages the Edits file, which is a log of changes to the HDFS filesystem
E. Clients connect to ZooKeeper to determine which NameNode is Active
Reference: docs/cdh4/latest/PDF/CDH4-High-Availability-Guide.pdf (page 15)

QUESTION 29
Choose three reasons why you should run the HDFS balancer periodically. (Choose three)
A. To ensure that there is capacity in HDFS for additional data
B. To ensure that all blocks in the cluster are 128 MB in size
C. To help HDFS deliver consistent performance under heavy loads
D. To ensure that there is consistent disk utilization across the DataNodes
E. To improve data locality for MapReduce
Correct Answer: CDE
Reference: balancerperiodically-why-choose-3
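For Question 29, a periodic balancer run might be invoked as follows; -threshold (the allowed percentage deviation from average disk utilization) is optional and defaults to 10:

# Move blocks until every DataNode is within 5% of the cluster's average utilization
hdfs balancer -threshold 5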

QUESTION 30
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02?
A. nn02 is fenced, and nn01 becomes the active NameNode
B. nn01 is fenced, and nn02 becomes the active NameNode
C. nn01 becomes the standby NameNode and nn02 becomes the active NameNode
D. nn02 becomes the standby NameNode and nn01 becomes the active NameNode
failover: initiate a failover between two NameNodes. This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one of the methods succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.

QUESTION 31
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?
A. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine
B. Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine
C. Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster
D. Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine
E. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
Correct Answer: D
Answer is factual.

QUESTION 32
You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westusers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?
A. hadoop fs -getmerge -R westusers.txt
B. hadoop fs -getmerge westusers westusers.txt
C. hadoop fs -cp westusers/* westusers.txt
D. hadoop fs -get westusers westusers.txt
Sensible answer.
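For Question 32, getmerge concatenates every file under an HDFS directory into one local file; a sketch using the question's names:

# Merge all output part files in HDFS directory westusers into a single local file
hadoop fs -getmerge westusers westusers.txt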

QUESTION 33
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
A. fstime
B. VERSION
C. fsimage_N (where N reflects transactions up to transaction ID N)
D. edits_N-M (where N-M are transactions between transaction ID N and transaction ID M)
Correct Answer: C

QUESTION 34
You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?
A. Run hdfs fs -du / and locate the DFS Remaining value
B. Run hdfs dfsadmin -report and locate the DFS Remaining value
C. Run hdfs dfs / and subtract NDFS Used from Configured Capacity
D. Connect to the NameNode's web UI and locate the DFS Remaining value
Real answer.

QUESTION 35
You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?
A. MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of "tasks" into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.
B. In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resource they need by executing -D mapreduce.reduce.memory-mb=2048
C. In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.
D. Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify two reduce tasks.
E. In YARN, resource allocation is a function of virtual cores specified by the ApplicationMaster making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -p yarn.nodemanager.cpu-vcores=2
Correct Answer: D
Good.
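For Question 35, passing the reducer count on the command line might look like this, assuming the driver parses generic options via ToolRunner; the examples JAR name and paths are illustrative:

# Request two reduce tasks for this run
hadoop jar hadoop-mapreduce-examples.jar wordcount -D mapreduce.job.reduces=2 in out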

QUESTION 36
Your Hadoop cluster contains nodes in three racks. You have not configured the dfs.hosts property in the NameNode's configuration file. What results?
A. The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with the command dfsadmin -refreshNodes
B. No new nodes can be added to the cluster until you specify them in the dfs.hosts file
C. Any machine running the DataNode daemon can immediately join the cluster
D. Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster
Correct Answer: C
Proved.

QUESTION 37
You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive garbage collection of the JVM. How do you increase the JVM heap size property to 3 GB to optimize performance?
A. yarn.application.child.java.opts=-Xsx3072m
B. yarn.application.child.java.opts=-Xmx3072m
C. mapreduce.map.java.opts=-Xms3072m
D. mapreduce.map.java.opts=-Xmx3072m
Correct Answer: C

QUESTION 38
You have a cluster running with the FIFO Scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which you expect to run for a couple of minutes only. You submit both jobs with the same priority. Which two best describe how the FIFO Scheduler arbitrates the cluster resources for the jobs and their tasks? (Choose two)
A. Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time
B. Tasks are scheduled in the order of their job submission
C. The order of execution of jobs may vary
D. Given jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B
E. The FIFO Scheduler will give, on average, an equal share of the cluster resources over the job lifecycle
F. The FIFO Scheduler will pass an exception back to the client when Job B is submitted, since all slots on the cluster are in use
Correct Answer: BD
Fine.
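For Question 37, the map-task heap is raised through the per-task Java options in mapred-site.xml; a sketch (the -Xmx value should stay below mapreduce.map.memory.mb, the container's physical memory limit):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>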

QUESTION 39
A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass. She runs the command:
hadoop jar j.jar DriverClass /data/input /data/output
The error message returned includes the line:
PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/input
What is the cause of the error?
A. The user is not authorized to run the job on the cluster
B. The output directory already exists
C. The name of the driver has been spelled incorrectly on the command line
D. The directory name is misspelled in HDFS
E. The Hadoop configuration files on the client do not point to the cluster
Approved answer.

QUESTION 40
Your company stores user profile records in an OLTP database. You want to join these records with web server logs you have already ingested into the Hadoop file system. What is the best way to obtain and ingest these user records?
A. Ingest with Hadoop Streaming
B. Ingest using Hive's LOAD DATA command
C. Ingest with sqoop import
D. Ingest with Pig's LOAD command
E. Ingest using the HDFS put command
Correct Answer: C
Modified answer.
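For Question 40, a minimal Sqoop import of one table might look like this; the JDBC URL, credentials, table, and target directory are placeholders, not values from the question:

# Import the user_profiles table from the OLTP database into HDFS
sqoop import \
  --connect jdbc:mysql://db.example.com/crm \
  --username loader -P \
  --table user_profiles \
  --target-dir /data/user_profiles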

QUESTION 41
Which two are features of Hadoop's rack topology? (Choose two)
A. Configuration of rack awareness is accomplished using a configuration file. You cannot use a rack topology script.
B. Hadoop gives preference to intra-rack data transfer in order to conserve bandwidth
C. Rack location is considered in the HDFS block placement policy
D. HDFS is rack aware but the MapReduce daemons are not
E. Even for small clusters on a single rack, configuring rack awareness will improve performance
Correct Answer: BC
Updated.

QUESTION 42
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?
A. Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode
B. Cached in the YARN container running the task, then copied into HDFS on job completion
C. In HDFS, in the directory of the user who generated the job
D. On the local disk of the slave node running the task
Correct Answer: D
Answer is correct.

QUESTION 43
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
A. fstime
B. VERSION
C. fsimage_N (where N reflects transactions up to transaction ID N)
D. edits_N-M (where N-M are transactions between transaction ID N and transaction ID M)
Correct Answer: C

QUESTION 44
Which YARN process runs as "container 0" of a submitted job and is responsible for resource requests?
A. ApplicationManager
B. JobTracker
C. ApplicationMaster
D. JobHistoryServer
E. ResourceManager
F. NodeManager
Correct Answer: C
Definite.

QUESTION 45
A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass. She runs the command:
hadoop jar j.jar DriverClass /data/input /data/output
The error message returned includes the line:
PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/input
What is the cause of the error?
A. The user is not authorized to run the job on the cluster
B. The output directory already exists
C. The name of the driver has been spelled incorrectly on the command line
D. The directory name is misspelled in HDFS
E. The Hadoop configuration files on the client do not point to the cluster
Finest answer.

QUESTION 46
You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from a faster network fabric?
A. When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
B. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
C. When your workload consists of processor-intensive tasks
D. When your workload generates a large amount of intermediate data, on the order of the input data itself
Answer is nice.

QUESTION 47
Table schemas in Hive are:
A. Stored as metadata on the NameNode
B. Stored along with the data in HDFS
C. Stored in the Metastore
D. Stored in ZooKeeper
Reference: on-hdfslocation-path-with-out-connecting-to-m

QUESTION 48
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?
A. Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode
B. Cached in the YARN container running the task, then copied into HDFS on job completion
C. In HDFS, in the directory of the user who generated the job
D. On the local disk of the slave node running the task
Correct Answer: D
Answer is a 100% fit.

QUESTION 49
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit a job A, so that only job A is running on the cluster. A while later, you submit job B. Now jobs A and B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs? (Choose two)
A. When job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.
B. When job B gets submitted, job A has to finish first before job B can get scheduled.
C. When job A gets submitted, it doesn't consume all the task slots.
D. When job A gets submitted, it consumes all the task slots.
True.

QUESTION 50
Each node in your Hadoop cluster, running YARN, has 64 GB memory and 24 cores. Your yarn-site.xml has the following configuration:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>32768</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>
You want YARN to launch no more than 16 containers per node. What should you do?
A. Modify yarn-site.xml with the following property:
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
B. Modify yarn-site.xml with the following property:
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4096</value>
C. Modify yarn-site.xml with the following property:
<name>yarn.nodemanager.resource.cpu-vcores</name>
D. No action is needed: YARN's dynamic resource allocation automatically optimizes the node memory and cores
Answer is definite.

QUESTION 51
You want your nodes to swap Hadoop daemon data from RAM to disk only when absolutely necessary. What should you do?
A. Delete the /dev/vmswap file on the node
B. Delete the /etc/swap file on the node
C. Set the ram.swap parameter to 0 in core-site.xml
D. Set vm.swappiness to 0 on the node
E. Delete the /swapfile file on the node
Correct Answer: D
Absolutely perfect answer.

QUESTION 52
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster's master nodes? (Choose two)
A. HMaster
B. ResourceManager
C. TaskManager
D. JobTracker
E. NameNode
F. DataNode
Correct Answer: BE
All right.
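For Question 50, the per-node container ceiling is the NodeManager's memory divided by the scheduler's minimum allocation: 32768 MB / 2048 MB = 16 containers. A sketch of that property for yarn-site.xml:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>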

QUESTION 53
You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 1000 MB. How would you tune your io.sort.mb value to achieve maximum memory-to-disk I/O ratio?
A. For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O
B. Increase the io.sort.mb to 1 GB
C. Decrease the io.sort.mb value to 0
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
Correct Answer: D
Most obvious answer.

QUESTION 54
You are running a Hadoop cluster with a NameNode on host mynamenode, a Secondary NameNode on host mysecondarynamenode and several DataNodes. Which best describes how you determine when the last checkpoint happened?
A. Execute hdfs namenode -report on the command line and look at the Last Checkpoint information
B. Execute hdfs dfsadmin -saveNamespace on the command line, which returns to you the last checkpoint value in the fstime file
C. Connect to the web UI of the Secondary NameNode and look at the "Last Checkpoint" information
D. Connect to the web UI of the NameNode and look at the "Last Checkpoint" information
Correct Answer: C

QUESTION 55
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?
A. The directory will appear to be empty until the entire file write is completed on the cluster
B. They will see the file with a ._COPYING_ extension on its name. If they view the file, they will see the contents of the file up to the last completed block (as each 64 MB block is written, that block becomes available)
C. They will see the file with a ._COPYING_ extension on its name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
D. They will see the file with its original name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
Assessed answer.

QUESTION 56
Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?
A. Complexity Fair Scheduler (CFS)
B. Capacity Scheduler
C. Fair Scheduler
D. FIFO Scheduler
Correct Answer: C

QUESTION 57
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
A. Sample the web server logs from the web servers and copy them into HDFS using curl
B. Ingest the server web logs into HDFS using Flume
C. Channel these clickstreams into Hadoop using Hadoop Streaming
D. Import all user clicks from your OLTP databases into Hadoop using Sqoop
E. Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
Correct Answer: B
Apache Flume is a service for streaming logs into Hadoop. Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and is robust and fault tolerant with tunable reliability mechanisms for failover and recovery.
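For Question 57, a Flume pipeline of the kind option B describes might be sketched like this; the agent, source, and path names are invented for illustration:

# flume.conf: tail a web server access log and land events in HDFS
agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfsout
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem
agent1.channels.mem.type = memory
agent1.sinks.hdfsout.type = hdfs
agent1.sinks.hdfsout.hdfs.path = /flume/weblogs
agent1.sinks.hdfsout.channel = mem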

QUESTION 58
Which three basic configuration parameters must you set to migrate your cluster from MapReduce 1 (MRv1) to MapReduce v2 (MRv2)? (Choose three)
A. Configure the NodeManager to enable MapReduce services on YARN by setting the following property in yarn-site.xml:
<name>yarn.nodemanager.hostname</name>
<value>your_nodemanager_shuffle</value>
B. Configure the NodeManager hostname and enable node services on YARN by setting the following property in yarn-site.xml:
<name>yarn.nodemanager.hostname</name>
<value>your_nodemanager_hostname</value>
C. Configure a default scheduler to run on YARN by setting the following property in mapred-site.xml:
<name>mapreduce.jobtracker.taskscheduler</name>
<value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
D. Configure the number of map tasks per job on YARN by setting the following property in mapred-site.xml:
<name>mapreduce.job.maps</name>
<value>2</value>
E. Configure the ResourceManager hostname and enable node services on YARN by setting the following property in yarn-site.xml:
<name>yarn.resourcemanager.hostname</name>
<value>your_resourcemanager_hostname</value>
F. Configure MapReduce as a framework running on YARN by setting the following property in mapred-site.xml:
<name>mapreduce.framework.name</name>
<value>yarn</value>
Correct Answer: BEF
Options are properly given.

QUESTION 59
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming
Which data serialization system gives you the flexibility to do this?
A. CSV
B. XML
C. HTML
D. Avro
E. SequenceFiles
F. JSON
Correct Answer: E
Sequence files are block-compressed and provide direct serialization and deserialization of several arbitrary data types (not just text). Sequence files can be generated as the output of other MapReduce tasks and are an efficient intermediate representation for data that is passing from one MapReduce job to another.

QUESTION 60
Identify two features/issues that YARN is designed to address: (Choose two)
A. Standardize on a single MapReduce API
B. Single point of failure in the NameNode
C. Reduce complexity of the MapReduce APIs
D. Resource pressure on the JobTracker
E. Ability to run frameworks other than MapReduce, such as MPI
F. HDFS latency
Correct Answer: DE
Reference: (YARN, first para)
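For Question 58's option F, the core of the MRv1-to-MRv2 switch might look like this; the mapred-site.xml property is paired here with the yarn-site.xml shuffle service NodeManagers need, as a sketch rather than a complete migration config:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- and in yarn-site.xml -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>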

QUESTION 61
Which YARN daemon or service monitors a Container's per-application resource usage (e.g., memory, CPU)?
A. ApplicationMaster
B. NodeManager
C. ApplicationManagerService
D. ResourceManager
Answer is accurate.

QUESTION 62
During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the intermediate data of each map task?
A. The Mapper stores the intermediate data on the node running the job's ApplicationMaster so that it is available to the YARN ShuffleService before the data is presented to the Reducer
B. The Mapper stores the intermediate data in HDFS on the node where the map task ran, in the HDFS /usercache/$(user)/apache/application_$(appid) directory for the user who ran the job
C. The Mapper transfers the intermediate data immediately to the Reducers as it is generated by the map task
D. YARN holds the intermediate data in the NodeManager's memory (a container) until it is transferred to the Reducer
E. The Mapper stores the intermediate data on the underlying filesystem of the local disk in the directories yarn.nodemanager.local-dirs
Correct Answer: E
Answer is actual.

QUESTION 63
You suspect that your NameNode is incorrectly configured and is swapping memory to disk. Which Linux commands help you to identify whether swapping is occurring? (Select all that apply)
A. free
B. df
C. memcat
D. top
E. jps
F. vmstat
G. swapinfo
Correct Answer: ADF
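For Question 63, a quick check on the NameNode host might combine the listed tools; vmstat's si and so columns report pages swapped in and out per second:

# Show swap usage, then sample swap-in/swap-out rates every 5 seconds
free -m
vmstat 5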

QUESTION 64
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?
A. The directory will appear to be empty until the entire file write is completed on the cluster
B. They will see the file with a ._COPYING_ extension on its name. If they view the file, they will see the contents of the file up to the last completed block (as each 64 MB block is written, that block becomes available)
C. They will see the file with a ._COPYING_ extension on its name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
D. They will see the file with its original name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
Rightful.

QUESTION 65
Which command does Hadoop offer to discover missing or corrupt HDFS data?
A. hdfs fs -du
B. hdfs fsck
C. dskchk
D. The map-only checksum
E. Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block

QUESTION 66
You are running a Hadoop cluster with a NameNode on host mynamenode, a Secondary NameNode on host mysecondarynamenode and several DataNodes. Which best describes how you determine when the last checkpoint happened?
A. Execute hdfs namenode -report on the command line and look at the Last Checkpoint information
B. Execute hdfs dfsadmin -saveNamespace on the command line, which returns to you the last checkpoint value in the fstime file
C. Connect to the web UI of the Secondary NameNode and look at the "Last Checkpoint" information
D. Connect to the web UI of the NameNode and look at the "Last Checkpoint" information
Correct Answer: C
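For Question 65, a namespace-wide health scan might be run as follows; -list-corruptfileblocks is an optional flag for a more targeted report:

# Report missing, corrupt, and under-replicated blocks across the filesystem
hdfs fsck /
hdfs fsck / -list-corruptfileblocks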

QUESTION 67
Your cluster's mapred-site.xml includes the following parameters:
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
And your cluster's yarn-site.xml includes the following parameters:
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?
A. 4 GB
B. GB
C. 8.9 GB
D. 8.2 GB
E. GB
Correct Answer: D
To get the maximum amount of virtual memory allocated for each map task, you multiply mapreduce.map.memory.mb by yarn.nodemanager.vmem-pmem-ratio. The result is 8601.6 MB, so the nearest answer is 8.2, since 8.9 GB is more than 8601.6 MB.

QUESTION 68
Assuming you're not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a "split-brain" scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?
A. Two active NameNodes and two Standby NameNodes
B. One active NameNode and one Standby NameNode
C. Two active NameNodes and one Standby NameNode
D. Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy
Complete answer.
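For Question 67, the ceiling is a direct multiplication of container memory by the virtual-to-physical ratio; the arithmetic, worked out as a check:

# physical container memory x vmem-pmem ratio = virtual memory ceiling
# map task:    4096 MB x 2.1 = 8601.6 MB  (about 8.4 GB)
# reduce task: 8192 MB x 2.1 = 17203.2 MB (about 16.8 GB)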


How To Use Cloudera Manager Backup And Disaster Recovery (Brd) On A Microsoft Hadoop 5.5.5 (Clouderma) On An Ubuntu 5.2.5 Or 5.3.5 Cloudera Manager Backup and Disaster Recovery Important Notice (c) 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or

More information

HADOOP PERFORMANCE TUNING

HADOOP PERFORMANCE TUNING PERFORMANCE TUNING Abstract This paper explains tuning of Hadoop configuration parameters which directly affects Map-Reduce job performance under various conditions, to achieve maximum performance. The

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers

More information

HDFS Installation and Shell

HDFS Installation and Shell 2012 coreservlets.com and Dima May HDFS Installation and Shell Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized Hadoop training courses

More information

Getting to know Apache Hadoop

Getting to know Apache Hadoop Getting to know Apache Hadoop Oana Denisa Balalau Télécom ParisTech October 13, 2015 1 / 32 Table of Contents 1 Apache Hadoop 2 The Hadoop Distributed File System(HDFS) 3 Application management in the

More information

Spectrum Scale HDFS Transparency Guide

Spectrum Scale HDFS Transparency Guide Spectrum Scale Guide Spectrum Scale BDA 2016-1-5 Contents 1. Overview... 3 2. Supported Spectrum Scale storage mode... 4 2.1. Local Storage mode... 4 2.2. Shared Storage Mode... 4 3. Hadoop cluster planning...

More information

Cloudera Administrator Training for Apache Hadoop

Cloudera Administrator Training for Apache Hadoop Cloudera Administrator Training for Apache Hadoop Duration: 4 Days Course Code: GK3901 Overview: In this hands-on course, you will be introduced to the basics of Hadoop, Hadoop Distributed File System

More information

H2O on Hadoop. September 30, 2014. www.0xdata.com

H2O on Hadoop. September 30, 2014. www.0xdata.com H2O on Hadoop September 30, 2014 www.0xdata.com H2O on Hadoop Introduction H2O is the open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms

More information

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian From Relational to Hadoop Part 1: Introduction to Hadoop Gwen Shapira, Cloudera and Danil Zburivsky, Pythian Tutorial Logistics 2 Got VM? 3 Grab a USB USB contains: Cloudera QuickStart VM Slides Exercises

More information

Introduction to HDFS. Prasanth Kothuri, CERN

Introduction to HDFS. Prasanth Kothuri, CERN Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. Hadoop

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Pivotal HD Enterprise

Pivotal HD Enterprise PRODUCT DOCUMENTATION Pivotal HD Enterprise Version 1.1 Stack and Tool Reference Guide Rev: A01 2013 GoPivotal, Inc. Table of Contents 1 Pivotal HD 1.1 Stack - RPM Package 11 1.1 Overview 11 1.2 Accessing

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System The Hadoop Distributed File System, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, Yahoo, 2010 Agenda Topic 1: Introduction Topic 2: Architecture

More information

Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.

Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2. EDUREKA Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.0 Cluster edureka! 11/12/2013 A guide to Install and Configure

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

Hadoop Distributed Filesystem. Spring 2015, X. Zhang Fordham Univ.

Hadoop Distributed Filesystem. Spring 2015, X. Zhang Fordham Univ. Hadoop Distributed Filesystem Spring 2015, X. Zhang Fordham Univ. MapReduce Programming Model Split Shuffle Input: a set of [key,value] pairs intermediate [key,value] pairs [k1,v11,v12, ] [k2,v21,v22,

More information

Important Notice. (c) 2010-2013 Cloudera, Inc. All rights reserved.

Important Notice. (c) 2010-2013 Cloudera, Inc. All rights reserved. Hue 2 User Guide Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document

More information

6. How MapReduce Works. Jari-Pekka Voutilainen

6. How MapReduce Works. Jari-Pekka Voutilainen 6. How MapReduce Works Jari-Pekka Voutilainen MapReduce Implementations Apache Hadoop has 2 implementations of MapReduce: Classic MapReduce (MapReduce 1) YARN (MapReduce 2) Classic MapReduce The Client

More information

HDFS Architecture Guide

HDFS Architecture Guide by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5

More information

Hadoop@LaTech ATLAS Tier 3

Hadoop@LaTech ATLAS Tier 3 Cerberus Hadoop Hadoop@LaTech ATLAS Tier 3 David Palma DOSAR Louisiana Tech University January 23, 2013 Cerberus Hadoop Outline 1 Introduction Cerberus Hadoop 2 Features Issues Conclusions 3 Cerberus Hadoop

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

HDFS. Hadoop Distributed File System

HDFS. Hadoop Distributed File System HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files

More information

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability

More information

Big Data Technology Core Hadoop: HDFS-YARN Internals

Big Data Technology Core Hadoop: HDFS-YARN Internals Big Data Technology Core Hadoop: HDFS-YARN Internals Eshcar Hillel Yahoo! Ronny Lempel Outbrain *Based on slides by Edward Bortnikov & Ronny Lempel Roadmap Previous class Map-Reduce Motivation This class

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

7 Deadly Hadoop Misconfigurations. Kathleen Ting February 2013

7 Deadly Hadoop Misconfigurations. Kathleen Ting February 2013 7 Deadly Hadoop Misconfigurations Kathleen Ting February 2013 Who Am I? Kathleen Ting Apache Sqoop Committer, PMC Member Customer Operations Engineering Mgr, Cloudera @kate_ting, kathleen@apache.org 2

More information

Building and Administering Hadoop Clusters. 21 April 2011 Jordan Boyd-Graber

Building and Administering Hadoop Clusters. 21 April 2011 Jordan Boyd-Graber Building and Administering Hadoop Clusters 21 April 2011 Jordan Boyd-Graber Administrivia Homework 5 graded Homework 6 due soon Keep working on projects! Final next week (will take better of midterm of

More information

Hadoop Basics with InfoSphere BigInsights

Hadoop Basics with InfoSphere BigInsights An IBM Proof of Technology Hadoop Basics with InfoSphere BigInsights Unit 4: Hadoop Administration An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2013 US Government Users Restricted

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Cloudera Backup and Disaster Recovery

Cloudera Backup and Disaster Recovery Cloudera Backup and Disaster Recovery Important Note: Cloudera Manager 4 and CDH 4 have reached End of Maintenance (EOM) on August 9, 2015. Cloudera will not support or provide patches for any of the Cloudera

More information

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray ware 2 Agenda The Hadoop Journey Why Virtualize Hadoop? Elasticity and Scalability Performance Tests Storage Reference

More information

HDFS: Hadoop Distributed File System

HDFS: Hadoop Distributed File System Istanbul Şehir University Big Data Camp 14 HDFS: Hadoop Distributed File System Aslan Bakirov Kevser Nur Çoğalmış Agenda Distributed File System HDFS Concepts HDFS Interfaces HDFS Full Picture Read Operation

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Introduction to HDFS. Prasanth Kothuri, CERN

Introduction to HDFS. Prasanth Kothuri, CERN Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. HDFS

More information

Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER

Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER Table of Contents Apache Hadoop Deployment on VMware vsphere Using vsphere Big Data Extensions.... 3 Local

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters CONNECT - Lab Guide Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters Hardware, software and configuration steps needed to deploy Apache Hadoop 2.4.1 with the Emulex family

More information

Apache Hadoop YARN: The Nextgeneration Distributed Operating. System. Zhijie Shen & Jian He @ Hortonworks

Apache Hadoop YARN: The Nextgeneration Distributed Operating. System. Zhijie Shen & Jian He @ Hortonworks Apache Hadoop YARN: The Nextgeneration Distributed Operating System Zhijie Shen & Jian He @ Hortonworks About Us Software Engineer @ Hortonworks, Inc. Hadoop Committer @ The Apache Foundation We re doing

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013 Hadoop 101 Lars George NoSQL- Ma4ers, Cologne April 26, 2013 1 What s Ahead? Overview of Apache Hadoop (and related tools) What it is Why it s relevant How it works No prior experience needed Feel free

More information