1 Hadoop Distributed File System. T-111.5550 Seminar On Multimedia, 2009-11-11, Eero Kurkela

2 Agenda
- Introduction
- Flesh and bones of HDFS
- Architecture
- Accessing data
- Data replication strategy
- Fault tolerance
- When to choose HDFS?
- HDFS in action
- Future of HDFS
- Alternative approaches
- References / literature

3 Apache Hadoop HDFS
- Apache Hadoop
  - Project that "develops open-source software for reliable, scalable, distributed computing" [1]
- HDFS (Hadoop Distributed File System)
  - Subproject of Hadoop
  - Target: reliable and rapid computation on large data sets, emphasis on high throughput
  - Primary storage system for Hadoop applications
  - Designed especially for sending and receiving data sets for MapReduce operations
  - Also serves as a (limited) general-purpose DFS
[1][3][10]

4 Flesh and bones of HDFS
- User-level file system
  - Written in Java
  - Typically runs on a GNU/Linux operating system
- Can be deployed on commodity hardware
  - This is actually a key assumption in the design
- Inter-node and client communication protocols work on top of TCP/IP
- API/shell/browser access
[5]

5 Architecture 1/4: Overview
- Based on GFS, master/slave
[5][10]

6 Architecture 2/4: Namespace and files
- Common hierarchical namespace structure
  - Directories that contain directories and files
- WORM (write-once-read-many) access model
  - Files and directories can be created, deleted, moved, renamed, opened, and closed, but NOT modified
  - Simplifies replication
- Files are split into blocks
  - Default size 64 MB
[5][10]
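
The block splitting above comes down to simple arithmetic. The following is a minimal sketch, assuming the 64 MB default from the slide and that only the last block may be shorter than the block size; `split_into_blocks` is a hypothetical helper for illustration, not part of any Hadoop API.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size, as stated on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the lengths (in bytes) of the blocks a file would be split into.

    Hypothetical helper: every block is full-sized except possibly the last.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the trailing partial block
    return sizes
```

Under these assumptions, a 200 MB file yields three full 64 MB blocks plus one 8 MB block, and an empty file yields no blocks.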

7 Architecture 3/4: NameNode
- NameNode = Master
  - Provides instructions to DataNodes
  - Point of access for clients
- One NameNode per cluster
  - Typically a dedicated machine
  - Achilles' heel
- Namespace and metadata management
  - Keeps metadata in RAM (a scalability bottleneck)
  - Decides how blocks are placed on DataNodes
[5][10]

8 Architecture 4/4: DataNode
- DataNode = Slave
  - Serves block creation/deletion/replication requests from the NameNode
  - Serves read/write requests from clients
- Typically several DataNodes, dedicated machines
- Stores blocks as files in the local file system, knows nothing about HDFS files
- Provides Blockreports to the NameNode
- Blocks are always transferred directly between DataNodes and clients
[5]
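
To illustrate how Blockreports could let a NameNode learn where blocks live, here is a toy in-memory model. It is a sketch only: `ToyNameNode`, its methods, and the string IDs are hypothetical names chosen for this example, and the assumption that a report lists every block a DataNode currently holds is a simplification.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model: maps block IDs to the DataNodes that reported holding them."""

    def __init__(self):
        self.block_locations = defaultdict(set)

    def receive_blockreport(self, datanode_id, block_ids):
        # Assumption: a report is a full listing, so forget what this
        # DataNode reported earlier before recording the new listing.
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate(self, block_id):
        # A client asks for locations here, then reads the block
        # directly from one of the returned DataNodes.
        return sorted(self.block_locations.get(block_id, ()))
```

The NameNode only hands out locations in this model; the block data itself never passes through it, which matches the last bullet of the slide.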

9 Accessing data
- FileSystem Java API + wrapper for C
- Command-line interface: FS Shell
  - Practical for scripts
  - Commands resemble Unix utilities, e.g.
      bin/hadoop dfs -mkdir /tempdir
      bin/hadoop dfs -cat /tempdir/tempfile.txt
  - dfsadmin for administrative tasks, e.g.
      bin/hadoop dfsadmin -refreshNodes
- Web browser based interface for browsing the namespace
[5]

10 Data replication strategy 1/2: Overview
- Basis for fault tolerance
- Replica placement strongly affects performance
  - NameNode responsible for deployment
- Number of replicas and block size can be configured separately for each file
- Concept of rack-awareness
  - NameNode determines which DataNodes belong to the same racks
  - Idea is to minimize network traffic between racks
[5]

11 Data replication strategy 2/2: Default strategy
- Replication factor = 3
  - One replica on a node in the local rack
  - One replica on another node in the same rack
  - One replica in another rack
  - Balanced for write performance and fault tolerance
- Replication pipelining
  - A DataNode forwards data to another DataNode according to a list generated by the NameNode
[5]
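
The default strategy and the pipelining step can be sketched as follows. This is an illustration under stated assumptions, not HDFS's actual placement code: `place_replicas` and `pipeline_hops` are hypothetical names, the rack map is supplied by hand, and nodes are picked deterministically (first candidate) where a real system would also weigh load and free space.

```python
def place_replicas(writer, racks):
    """Pick three DataNodes following the slide's default strategy.

    racks: dict mapping rack ID -> list of node IDs (hypothetical layout).
    writer: node ID in the local rack that receives the first replica.
    """
    local_rack = next((r for r, nodes in racks.items() if writer in nodes), None)
    if local_rack is None:
        raise ValueError("writer is not in any known rack")
    same_rack_peers = [n for n in racks[local_rack] if n != writer]
    remote_nodes = [n for r, nodes in sorted(racks.items())
                    if r != local_rack for n in nodes]
    if not same_rack_peers or not remote_nodes:
        raise ValueError("need a second local node and at least one remote node")
    # local node, another node in the same rack, one node in another rack
    return [writer, same_rack_peers[0], remote_nodes[0]]


def pipeline_hops(pipeline, source="client"):
    """List the (sender, receiver) hops when each node forwards to the next."""
    return list(zip([source] + pipeline[:-1], pipeline))
```

With racks `{"rack1": ["a", "b"], "rack2": ["c", "d"]}` and writer `"a"`, the sketch returns the list `["a", "b", "c"]`, and the pipeline hops are client to a, a to b, and b to c, so only the last hop crosses racks.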

12 Fault tolerance
- Failure at DataNode
  - Heartbeat missing: stop I/O, re-replicate
- Network failure
- Data integrity failure
  - Checksum
- Failure at NameNode
  - Backing up data highly recommended
  - No built-in method for automatic recovery available
[5]
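
The DataNode and data-integrity cases above can be sketched in a few lines. Everything here is an assumption made for illustration: the 30-second timeout, the function names, and the use of CRC32 from Python's `zlib` as the checksum are not taken from the slides.

```python
import zlib

HEARTBEAT_TIMEOUT = 30.0  # seconds; hypothetical value chosen for this sketch


def dead_datanodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return DataNodes whose last heartbeat is older than the timeout."""
    return sorted(dn for dn, seen in last_heartbeat.items() if now - seen > timeout)


def under_replicated(block_locations, dead, target=3):
    """Map each block to the number of replicas it is missing once dead nodes are excluded."""
    missing = {}
    for block_id, holders in block_locations.items():
        live = set(holders) - set(dead)
        if len(live) < target:
            missing[block_id] = target - len(live)
    return missing


def is_corrupt(data, stored_checksum):
    """Compare a freshly computed CRC32 with the checksum stored at write time."""
    return zlib.crc32(data) != stored_checksum
```

In this model, a NameNode would stop routing I/O to the nodes returned by `dead_datanodes` and schedule new copies for the blocks returned by `under_replicated`; a reader that finds `is_corrupt` true would fetch the block from another replica.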

13 When to choose HDFS? 1/2: Applications & data
- Think of HDFS as a data set system instead of a file system
  - Ideal for batch processing, not interactive tasks
  - Intended for streaming a lot of data, though seeking to an arbitrary point is also supported
  - Throughput optimized at the cost of latency
- Typically millions of files, average file size 1 GB ... 1 TB
- E.g. web crawlers, GIS data management, archival, statistical analysis, and naturally Hadoop apps
- WORM access model must be acceptable
[5][10]

14 When to choose HDFS? 2/2: Points regarding environment
- Works wherever Java works
  - Highly portable, good support for Java applications
- Supports mechanisms for bringing computation physically closer to the data
  - Saves bandwidth compared to moving data
- Designed for thousands of nodes, several of which are always broken
[5]
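
One way to picture "bringing computation closer to the data" is a scheduler that prefers a worker already holding a replica of the block. The sketch below is a hypothetical illustration of that idea only; `pick_worker`, the locality labels, and the rack map are assumptions and do not describe how Hadoop's own scheduler is implemented.

```python
def pick_worker(replica_nodes, idle_workers, rack_of):
    """Choose an idle worker for a task that reads one block.

    Preference order (assumed for this sketch): a node that stores a
    replica, then a node in the same rack as a replica, then any node.
    """
    replica_nodes = set(replica_nodes)
    for worker in idle_workers:
        if worker in replica_nodes:
            return worker, "node-local"   # no network transfer needed
    replica_racks = {rack_of[n] for n in replica_nodes}
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker, "rack-local"   # transfer stays inside one rack
    if idle_workers:
        return idle_workers[0], "remote"  # data must cross racks
    return None, "none"
```

The bandwidth saving mentioned on the slide corresponds to the first two cases, where the block is read without crossing racks.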

15 HDFS in action
- Yahoo: The Yahoo! Search Webmap
  - cores and 5 PB of storage capacity
  - Produces data for all Yahoo! web search queries
  - HDFS caused a 34% drop in processing time [7]
- Adobe [2]
  - 30 nodes in clusters of 5-14 nodes (dev+prod)
  - Social services, data storage, internal use
- AOL
  - 50-node cluster with 37 TB of HDFS capacity
  - Behavioral analysis, targeting, statistics generation

16 HDFS in action
- Facebook [2]
  - 600-node cluster with 2 PB of storage capacity
  - Logs, reporting, analysis, machine learning
  - FUSE implementation over HDFS
- Iterend
  - Blog search engine
  - 10-node HDFS cluster
- Spadac
  - Storing and processing geospatial imagery and vector data

17 Future of HDFS 1/2: Confirmed plans
- Moving on from WORM: support for appending data to files
- Improvements in namespace maintenance (invisible to clients)
- Access via WebDAV protocol
  - Extends HTTP for file management & modification
- Support for snapshots for returning to a functional state in case of file system corruption
- Tuning the replica placement policy
[5]

18 Future of HDFS 2/2: Possible improvements
- User quotas
- Hard and soft links
- Data rebalancing mechanisms
  - Move blocks to other nodes if disk space on a certain DataNode drops too low
  - Create additional replicas if demand for a certain file rises significantly
- Automatic recovery from NameNode failure
[5]

19 Alternative approaches
- DFSs generally come in two flavors
  1) Designed for running Internet services
     - Often developed by companies like Google and Amazon
     - GoogleFS, Amazon S3, HDFS
  2) Designed for high-performance computing
     - Parallel file systems
     - IBM GPFS, Sun Lustre FS
- PVFS (Parallel Virtual File System)
  - Open-source, user-level file system like HDFS, has some high-level design similarities
  - In use at Argonne National Lab, Ohio Supercomputer Center, ...
[8][9]

20 HDFS vs. PVFS 1/2: Design [9]

21 HDFS vs. PVFS 2/2: Performance [9]
- Executed in the Hadoop Internet services stack; note that PVFS is sending writes to three servers

22 References / Literature
[1] What is Hadoop?, The Apache Software Foundation, referenced on , available at hadoop.apache.org/
[2] PoweredBy, The Apache Software Foundation, referenced on , available at
[4] HDFS User Guide, The Apache Software Foundation, referenced on , available at
[5] HDFS Architecture, The Apache Software Foundation, referenced on , available at
[7] Yahoo! Launches World's Largest Hadoop Production Application, Eric Baldeschwieler (Senior Director, Grid Computing, Yahoo! Inc.), referenced on , available at
[8] The Google File System; Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung; 2003; available at
[9] Data-intensive File systems for Internet services: A rose by any other name...; Wittawat Tantisiriroj, Swapnil Patil, Garth Gibson; Carnegie Mellon University / Parallel Data Laboratory; 10/2008; available at
[10] MapReduce and HDFS; Cloudera, Inc.; referenced on , slides and video available at
