Hadoop Distributed File System Jordan Prosch, Matt Kipps
Outline - Background - Architecture - Comments & Suggestions
Background
What is HDFS? Part of Apache Hadoop - distributed storage
What is Hadoop? Hadoop was created by Doug Cutting and Mike Cafarella in 2005 Originally designed to support Nutch as a web page indexer Based on Google File System and Google MapReduce Distributed data processing framework Designed to be portable - implemented in Java Actively developed by Apache Software Foundation Open source Made up of HDFS, MapReduce, YARN, Common
Hadoop Stack
Usage Yahoo Webmap analytics - every Yahoo web search query Facebook Analytics warehouse Distributed database storage Server backups Twitter Internal data analytics
What is HDFS? Part of Apache Hadoop - distributed storage Stream-oriented data storage Batch, not interactive Supports huge file sizes and up to thousands of servers Provides reliable and operable storage Designed for distributed data processing Works with Hadoop MapReduce Servers provide both computation and storage resources
Architecture
HDFS Architecture Master/slave One cluster has one NameNode and multiple DataNodes Files are split into equal-sized data blocks (typically 64MB or 128MB) Blocks are replicated across the cluster (typically 3 replicas) Simplified data consistency model Write-once-read-many Improves throughput/performance Reads are straightforward (read from closest replica) TCP/IP, special protocols between HDFS components (RPC-like)
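The block and replica figures above can be turned into a small worked example. This is a minimal sketch, not HDFS code: the function names `split_into_blocks` and `raw_storage_bytes` are hypothetical, and it assumes every block except possibly the last is exactly one block size long.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return (offset, length) pairs covering a file of file_size bytes.

    Assumption: all blocks are block_size long except the final one,
    which holds whatever bytes remain.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def raw_storage_bytes(file_size, replication=3):
    """Total bytes stored across the cluster when every block has `replication` copies."""
    return file_size * replication


# A 300 MB file with 128 MB blocks: two full blocks plus one partial block.
example_blocks = split_into_blocks(300 * MB, 128 * MB)
```

With 3 replicas, the same 300 MB file occupies 900 MB of raw cluster storage, which is the cost of the fault tolerance that replication buys.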
Master/Slave NameNode Acts as master server Manages file system metadata, maps blocks to files Responsible for file namespace operations (open, close, rename...) DataNode No system knowledge Store file blocks Responsible for reading, writing Responsible for block creation, deletion, and replication (when NameNode says so) Send block report and heartbeat periodically to NameNode
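The division of labour above can be illustrated with a toy model of the NameNode's metadata. This is a sketch under assumptions, not the real implementation: the class `NameNodeMetadata` and its method names are hypothetical, and a block report is modelled simply as the list of block IDs a DataNode says it holds.

```python
class NameNodeMetadata:
    """Toy model: file -> blocks mapping plus block -> DataNode locations."""

    def __init__(self):
        self.file_to_blocks = {}   # path -> ordered list of block ids
        self.block_locations = {}  # block id -> set of DataNode names

    def add_file(self, path, block_ids):
        # Namespace operation: record which blocks make up the file.
        self.file_to_blocks[path] = list(block_ids)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def rename(self, old_path, new_path):
        # Renaming touches only metadata; no block data moves.
        self.file_to_blocks[new_path] = self.file_to_blocks.pop(old_path)

    def process_block_report(self, datanode, reported_blocks):
        # A DataNode has no system knowledge; it only reports what it stores.
        reported = set(reported_blocks)
        for block_id, holders in self.block_locations.items():
            if block_id in reported:
                holders.add(datanode)
            else:
                holders.discard(datanode)

    def locate(self, path):
        # Answer a client's question: where are the blocks of this file?
        return [(b, sorted(self.block_locations[b]))
                for b in self.file_to_blocks[path]]
```

The point of the sketch is that the NameNode never stores file contents; it only learns block locations from the reports the DataNodes send.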
Architecture Diagram
Writing Writes are cached locally on the client's own disk in a temp file. Each time a block's worth of data accumulates in the cache, the client notifies the NameNode. The NameNode responds with a list of DataNodes. The client flushes the block from the cache to the first DataNode, and the block is then replicated in pipeline fashion across the DataNodes in the list.
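The write path described on this slide can be simulated in memory. This is a minimal sketch, not the HDFS client: the classes `DataNode`, `NameNode`, and `Client` are hypothetical, the client's temp file is modelled as a `bytearray`, and the NameNode picks targets round-robin as a placeholder for a real placement policy (it assumes the replication factor does not exceed the number of DataNodes).

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def receive(self, block_id, data, downstream):
        # Store the block, then forward it to the next DataNode in the pipeline.
        self.blocks[block_id] = data
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])


class NameNode:
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.next_block_id = 0
        self.files = {}  # path -> list of block ids

    def allocate_block(self, path):
        # Placeholder policy (assumption): consecutive DataNodes, round-robin.
        block_id = self.next_block_id
        self.next_block_id += 1
        start = block_id % len(self.datanodes)
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self.files.setdefault(path, []).append(block_id)
        return block_id, targets


class Client:
    def __init__(self, namenode, block_size):
        self.namenode = namenode
        self.block_size = block_size
        self.buffer = bytearray()  # stands in for the local temp file

    def write(self, path, data):
        self.buffer.extend(data)
        # Each time a full block has accumulated, ship it out.
        while len(self.buffer) >= self.block_size:
            self._flush(path, self.block_size)

    def close(self, path):
        # Flush whatever is left as a final, possibly short, block.
        if self.buffer:
            self._flush(path, len(self.buffer))

    def _flush(self, path, size):
        chunk = bytes(self.buffer[:size])
        del self.buffer[:size]
        block_id, targets = self.namenode.allocate_block(path)
        targets[0].receive(block_id, chunk, targets[1:])
```

Note that the client sends each block only once, to the first DataNode; the remaining copies travel between DataNodes, which keeps the client's outbound traffic independent of the replication factor.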
Configuration Typical cluster configuration: 1 machine for active NameNode (master server) 1 machine for standby NameNode (in case active NameNode fails) remaining machines used for DataNodes (1 machine per DataNode) Possible to have more than one active NameNode But having only one simplifies the cluster architecture Also possible to have more than one DataNode on a machine Almost never done
Fault Tolerance Hardware failure is the norm rather than the exception It is the primary objective of HDFS to store data reliably in the presence of failures DataNode failures NameNode failures Network failures Data integrity
Fault Tolerance Mechanisms DataNode failure Solution: replicate files across multiple DataNodes so the data is hard to lose Network failure Causes DataNodes to lose connection to NameNode Solution: mark lost DataNodes as dead and replicate any lost block replicas to other DataNodes Data Integrity Data may arrive corrupted Solution: implement checksum checking. When a checksum fails, client can request the data from another replica. NameNode failure Solution: switch to other NameNode server (if any), otherwise manually restart
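The data integrity mechanism above can be sketched as a read that verifies a checksum and falls back to another replica. This is an illustration only: `zlib.crc32` stands in for whatever checksum HDFS actually computes, and the function names `checksum` and `read_block` are hypothetical.

```python
import zlib


def checksum(data):
    """CRC32 of the block contents (a stand-in for the real HDFS checksum)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def read_block(replicas, expected_checksum):
    """Try each replica in order and return the first one that verifies.

    replicas: list of (datanode_name, block_bytes) pairs.
    Raises IOError if every replica is corrupted.
    """
    for datanode_name, data in replicas:
        if checksum(data) == expected_checksum:
            return datanode_name, data
        # Checksum mismatch: this copy is corrupt, so try the next replica.
    raise IOError("all replicas failed checksum verification")
```

For example, if the copy on the first DataNode has a flipped byte, the read transparently succeeds from the second DataNode; only when every replica is bad does the client see an error.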
Replication NameNode regulates all block replication decisions How it works: DataNodes send periodic heartbeats to NameNode A missing heartbeat indicates a dead DataNode NameNode initiates replication for lost blocks (using a block from another DataNode) until the replication factor is reached Move new replicas to other DataNodes Try not to put multiple replicas of a block on the same DataNode
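The heartbeat and re-replication steps above can be sketched as two small functions. This is a simplified model under stated assumptions: the names `find_dead_datanodes` and `plan_rereplication` are hypothetical, timestamps are plain numbers in seconds, and the timeout value is illustrative rather than a Hadoop default.

```python
def find_dead_datanodes(last_heartbeat, now, timeout):
    """Return the DataNodes whose last heartbeat is older than `timeout`."""
    return {dn for dn, seen in last_heartbeat.items() if now - seen > timeout}


def plan_rereplication(block_locations, dead, live_nodes, replication=3):
    """Plan copies that restore the replication factor after DataNodes die.

    block_locations: block id -> set of DataNode names holding a replica.
    Returns a list of (block_id, source_datanode, target_datanode) copies.
    """
    plan = []
    for block_id in sorted(block_locations):
        holders = block_locations[block_id] - dead
        if not holders:
            continue  # no surviving replica, so nothing to copy from
        missing = max(replication - len(holders), 0)
        # Never place two replicas of the same block on one DataNode.
        candidates = [dn for dn in sorted(live_nodes) if dn not in holders]
        source = sorted(holders)[0]
        for target in candidates[:missing]:
            plan.append((block_id, source, target))
    return plan
```

A block that has lost every replica is skipped rather than planned, since copying needs at least one surviving source.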
Block Replica Placement A properly configured Hadoop cluster has rack awareness Rack switches at each rack Block replicas are typically placed: 2 replicas on the local rack (1 each on separate DataNodes) 1 replica on a different rack Placement policy motivations: Minimize inter-rack write traffic Rack failures are far less common than DataNode failures Still maintain some benefit of distributed reads
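The placement rule on this slide (2 replicas on the local rack, 1 on a different rack) can be sketched as follows. This follows the slide's description only; placement rules vary between Hadoop versions, and the function name `place_replicas` and the dictionary-based topology are assumptions made for illustration.

```python
def place_replicas(writer_node, topology):
    """Choose 3 DataNodes for a block written from `writer_node`.

    topology: rack name -> list of DataNode names on that rack.
    Returns [local node, second node on the local rack, node on another rack].
    """
    local_rack = next(
        (rack for rack, nodes in topology.items() if writer_node in nodes),
        None,
    )
    if local_rack is None:
        raise ValueError("writer_node is not part of the topology")
    local_peers = [n for n in topology[local_rack] if n != writer_node]
    remote_nodes = [n for rack in sorted(topology) if rack != local_rack
                    for n in topology[rack]]
    if not local_peers or not remote_nodes:
        raise ValueError("need a second local DataNode and a remote rack")
    # Two copies stay behind the local rack switch; only one crosses racks.
    return [writer_node, local_peers[0], remote_nodes[0]]
```

Only one of the three copies crosses the inter-rack link, which is the write-traffic saving the slide refers to, while the remote copy still survives the loss of the whole local rack.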
Additional Features HDFS cluster monitoring Operator monitoring tools for cluster DataNode health Cluster rebalancing Tools available to assist in rebalancing blocks across the cluster
Comments & Suggestions
Key Contributions Fault tolerant Runnable on commodity hardware Provides streaming-data access GB- and TB-sized files PB-sized clusters Distributed, scalable, portable Gives control over the number of replicas 3 is a common replication factor
Drawbacks No access permissions No user quotas NameNode is a single-point-of-failure Write-once-read-many access model Files are immutable. If a file needs to be edited in the middle, the only solution is to make another one (!) with the edits made (which will quickly fill up disk space)
Improvements Add more fault-tolerance to NameNode Allow files to be rewritten or appended Implement automatic periodic data block balancing across the cluster DataNodes
Documentation Hadoop architecture inconsistently described Current status of file writing Append? Read-while-writing? Resources are scattered Architecture descriptions in Hadoop resources are brief Open source project Different commercial modifications to Hadoop Many different branches, incomplete features
Resources Apache Hadoop Project Homepage. Apache Software Foundation. Online: http://hadoop.apache.org/ Apache Hadoop. Wikipedia. Online: http://en.wikipedia.org/wiki/apache_hadoop File Appends in HDFS. Cloudera. Online: http://blog.cloudera.com/blog/2009/07/fileappends-in-hdfs/ Hadoop Distributed File System. Hortonworks. Online: http://hortonworks.com/hadoop/hdfs/ HDFS Architecture. Apache Software Foundation. 7 Oct. 2013. Online: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoophdfs/hdfsdesign.html How Yahoo Spawned Hadoop, the Future of Big Data. Wired. 10 Oct. 2011. Online: http://www.wired.com/2011/10/how-yahoo-spawned-hadoop/all/ The Hadoop Distributed File System. The Architecture of Open Source Applications. Online: http://aosabook.org/en/hdfs.html
Questions?