Distributed File Systems

Alemnew Sheferaw Asrese
University of Trento - Italy
December 12, 2012

Acknowledgement: Mauro Fruet

Alemnew S. Asrese (UniTN), Distributed File Systems, 2012/12/12
Outline

1. Distributed File Systems
   - Introduction
   - Most Important DFS
2. The Google File System (GFS)
   - Introduction
   - Design Overview
   - System Interactions
   - Master Operation
   - Fault Tolerance and Diagnosis
3. The Hadoop Distributed File System (HDFS)
   - Introduction
   - Architecture
   - File I/O Operations and Replica Management
   - Writing and Reading Data
   - Replica Placement
4. Conclusions
5. Bibliography
Distributed File Systems: Introduction

Definition of DFS
- A file system that allows access to files from multiple hosts via a computer network

Features
- Files and storage resources are shared across multiple machines
- Clients have no direct access to the underlying data storage
- Clients interact with servers using a protocol
- Access to the file system can be restricted
DFS Goals
- Access transparency
- Location transparency
- Concurrent file updates
- File replication
- Hardware and software heterogeneity
- Fault tolerance
- Consistency
- Security
- Efficiency
Most Important DFS

NFS (Network File System), developed by Sun Microsystems in 1984
- Small number of clients, very small number of servers
- Any machine can be a client and/or a server
- Stateless file server, with few exceptions (e.g. file locking)
- High performance

AFS (Andrew File System), developed at Carnegie Mellon University
- Supports sharing on a large scale (up to 10,000 users)
- Client machines have disks and cache files for long periods
- Most files are small
- Reads are more common than writes
- Most files are read/written by a single user
Need for Large Streams of Data

Standard DFS are not adequate for managing large streams of data:
- Google: 24 PB/day
- Facebook: 130 TB/day of logs, 500+ TB/day of new data, 300 M photos uploaded per day, 5,040 TB scanned in Hive per day
- LHC-CERN: 25 PB/year

Such workloads need different assumptions and goals. Proposed solutions:
- The Google File System (GFS)
- The Hadoop Distributed File System (HDFS), by the Apache Software Foundation
The Google File System (GFS) [1]
- Proprietary DFS; only the key ideas are publicly available
- Developed and implemented by Google, described in a 2003 paper
- Shares many goals with standard DFS (performance, scalability, reliability, availability, ...)
- Designed to manage data-intensive applications
- Provides fault tolerance and high performance
GFS - Motivations
- Component failures are the norm: a storage cluster is built from hundreds or thousands of inexpensive commodity servers
- Constant monitoring, error detection, fault tolerance, and automatic recovery are needed
- Huge files (multi-GB) containing many application objects
- New data is appended rather than overwriting existing data
- Co-designing the applications and the file system API increases flexibility
Design Overview: Assumptions
- The system is composed of inexpensive commodity components that often fail
- A modest number of huge files (multi-GB)
- Large streaming reads and small random reads
- Sequential writes that append to files rather than overwrite
- Files are seldom modified again
- Well-defined semantics for concurrent appends to the same file by multiple clients
- High sustained bandwidth is more important than low latency
GFS Interface
- Familiar file system interface, but does not implement a standard API such as POSIX
- Supports the common operations: create, delete, open, close, read, and write files
- Two additional operations: snapshot and record append
Architecture

Figure: GFS Architecture
Architecture
- The master holds the file system metadata
- The master controls:
  - Chunk lease management
  - Garbage collection of orphaned chunks
  - Chunk migration
- Periodic HeartBeat messages are exchanged between the master and chunkservers to check each chunkserver's state
Single Master
- Enables sophisticated chunk placement and replication decisions
- Must minimize its involvement in reads and writes
- Clients never read or write file data through the master
- A client asks the master which chunkservers it should contact
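A minimal sketch (not Google's code) of the read path described above: the client turns a byte offset into a chunk index, asks the master for that chunk's location, then reads directly from a chunkserver. Names (`Master`, `client_read`, the table layout) are illustrative assumptions.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

def locate(offset):
    """Chunk index and offset within that chunk for a file byte offset."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

class Master:
    """Metadata only: (filename, chunk index) -> (chunk handle, replica locations)."""
    def __init__(self, chunk_table):
        self.chunk_table = chunk_table

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]

def client_read(master, filename, offset):
    chunk_index, chunk_offset = locate(offset)
    handle, replicas = master.lookup(filename, chunk_index)
    # The client now contacts a replica directly for the data;
    # file data never flows through the master.
    return handle, replicas[0], chunk_offset
```

The master answers one small metadata query per chunk; all bulk data moves between client and chunkservers, which is what keeps the single master from becoming a data-path bottleneck.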
Single Master (cont.)

Problems
- Bandwidth bottleneck
- A single point of failure

Opportunities
- A central place to control replication, garbage collection, and many other activities

Solution
- Around 2009, a distributed master was proposed and implemented to replace the single-master scheme; it reportedly relies on BigTable to cope with the growing file count. Details are not publicly available.
ChunkServer
- Large chunk size: 64 MB, much larger than typical file system block sizes

Pros
- Reduces client-master interaction
- Reduces network overhead by keeping a persistent TCP connection to the chunkserver
- Reduces the size of the metadata stored on the master

Cons
- Internal fragmentation (solution: lazy space allocation)
- Large chunks can become hot spots (solutions: higher replication factor, staggering application start times, client-to-client communication)
Metadata

All metadata is kept in the master's main memory:
- Namespace
- File-to-chunk mapping
- Locations of chunk replicas
- Access control information

Persistence
- Namespace and file-to-chunk mapping are stored persistently: mutations are logged to an operation log
- There is no persistent record of replica locations

Operation Log
- Contains critical metadata changes
- Defines the order of concurrent operations
- Replicated on multiple remote machines
- Used to recover the master

Checkpointing, when the log grows beyond a certain size:
- The master switches to a new log file
- A checkpoint is created containing all mutations before the switch
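A toy model of the recovery path implied above: the master rebuilds its in-memory namespace by loading the last checkpoint and replaying the operation log in logged order. The record format (`create` / `append_chunk` / `delete` tuples) is invented for illustration.

```python
def recover(checkpoint, log):
    """checkpoint: {filename: [chunk handles]}; log: list of (op, filename, arg)."""
    namespace = dict(checkpoint)          # start from the persisted snapshot
    for op, fname, arg in log:            # replay mutations in logged order
        if op == "create":
            namespace[fname] = []
        elif op == "append_chunk":
            namespace[fname].append(arg)
        elif op == "delete":
            namespace.pop(fname, None)
    return namespace
```

Because the log defines a total order of metadata mutations, replaying it on any checkpoint yields the same final state, which is why the log must be flushed to multiple remote machines before a mutation is acknowledged.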
Consistency Model
- File namespace mutations are atomic: namespace locking guarantees atomicity and correctness
- Mutations are applied in the same order on all replicas
- Chunk version numbers detect stale replicas (those that missed mutations while their chunkserver was down)
- Stale replicas are garbage collected
- Checksums detect data corruption
- A file region is consistent if all clients always see the same data, regardless of which replicas they read from
System Interactions: Leases and Mutation Order
- The master uses leases to maintain a consistent mutation order across replicas
- The master grants a chunk lease to one replica, called the primary
- The primary chooses a serial order for all mutations to the chunk
- Global order: first by lease grant order (defined by the master), then by the serial numbers assigned by the primary within a lease
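A sketch of the ordering rule above, with invented class and function names: the lease-holding primary assigns serial numbers to concurrent mutations, and every replica applies them in serial-number order, so all replicas converge to the same state even if the mutations arrive out of order.

```python
class Primary:
    """The replica holding the chunk lease; it alone picks the mutation order."""
    def __init__(self):
        self.next_serial = 1

    def assign_order(self, mutations):
        """Assign consecutive serial numbers to a batch of concurrent mutations."""
        ordered = [(self.next_serial + i, m) for i, m in enumerate(mutations)]
        self.next_serial += len(mutations)
        return ordered

def apply_mutations(replica_state, ordered_mutations):
    """Replicas may receive mutations in any order but apply by serial number."""
    for _, data in sorted(ordered_mutations):
        replica_state.append(data)
    return replica_state
```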
Control Flow of a Write

Figure: Write Control and Data Flow
Data Flow
- The flow of data is decoupled from the flow of control

Goals:
1. Fully use each machine's outgoing bandwidth: data is pushed along a chain of chunkservers
2. Avoid network bottlenecks and high-latency links: each machine forwards the data to the closest machine
3. Minimize latency: once a chunkserver receives data, it starts forwarding immediately (pipelining)
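The pipelining argument above can be put as arithmetic (the B/T + R*L estimate is from the GFS paper): with immediate forwarding, pushing B bytes through R chunkservers costs roughly one full transfer plus a small per-hop delay, instead of R full transfers under store-and-forward.

```python
def pipelined_time(b, r, throughput, hop_latency):
    """Ideal elapsed time B/T + R*L when each hop forwards as bytes arrive."""
    return b / throughput + r * hop_latency

def store_and_forward_time(b, r, throughput, hop_latency):
    """Each hop waits for the whole payload before forwarding it on."""
    return r * (b / throughput + hop_latency)
```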
Atomic Record Appends

Traditional write
- The client specifies the offset
- Concurrent writes to the same region are not serializable

Record append
- The client specifies only the data
- GFS appends it to the file at least once atomically, at an offset of GFS's choosing, and returns that offset to the client
- Used by distributed applications in which many clients on different machines append to the same file concurrently
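A sketch of the at-least-once append semantics: the server, not the client, picks the offset, and a record that would cross a chunk boundary is rejected so the client retries on the next chunk (which is what can leave padding or duplicates for readers to tolerate). The chunk size is shrunk here for illustration.

```python
CHUNK_SIZE = 16  # real GFS chunks are 64 MB

def record_append(chunk, record):
    """Append `record` at a server-chosen offset; return that offset, or
    None when the record does not fit (client retries on the next chunk)."""
    if len(chunk) + len(record) > CHUNK_SIZE:
        return None  # this chunk is padded/sealed; retry elsewhere
    offset = len(chunk)
    chunk.extend(record)
    return offset
```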
Snapshot

Used to:
1. Quickly create copies of huge data sets
2. Easily checkpoint the current state

When the master receives a snapshot request, it:
1. Revokes outstanding leases on the chunks of the files it is about to snapshot
2. Logs the operation to disk
3. Duplicates the corresponding metadata

Copy-on-write: the new snapshot file initially points to the same chunk C as the source file; a new chunk C' is created on the first write after the snapshot
Master Operation: Namespace Management and Locking
- No per-directory data structure that lists all files in the directory
- No aliases for the same file (no hard or symbolic links)
- The namespace maps full pathnames to metadata
- Each master operation acquires a set of locks before it runs: read locks on the internal nodes (path prefixes) and a read or write lock on the leaf
- Locking allows concurrent mutations in the same directory
- A read lock on a directory prevents its deletion, renaming, or snapshot
- Write locks on file names serialize attempts to create a file with the same name twice
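A sketch of the locking scheme just described: for an operation on /d1/d2/leaf, the master takes read locks on /d1 and /d1/d2 and a read or write lock on the full pathname. The function name is ours.

```python
def locks_for(path, write_leaf=True):
    """Lock set for a master operation on `path`: read locks on every
    ancestor prefix, read/write lock on the leaf pathname."""
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]
    locks = [(p, "read") for p in prefixes[:-1]]
    locks.append((prefixes[-1], "write" if write_leaf else "read"))
    return locks
```

Two concurrent file creations in the same directory only take read locks on the directory prefix, so they proceed in parallel; they would conflict only on an identical leaf name.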
Replica Placement
- Hundreds of chunkservers spread across many racks, accessed from hundreds of clients

Goals:
- Maximize data reliability and availability
- Maximize network bandwidth utilization
- Replicas are therefore spread across machines and across racks
Creation, Re-replication and Rebalancing

Creation
- Place new replicas on chunkservers with below-average disk usage
- Limit the number of recent creations on each chunkserver

Re-replication
- Triggered when the number of available replicas falls below a user-specified goal
- Prioritization: based on how far a chunk is from its replication goal, live files before recently deleted files, ...

Rebalancing
- Performed periodically, for better disk utilization and load balancing
- The distribution of replicas is analyzed and replicas are moved accordingly
Garbage Collection
- File deletion is logged by the master
- The file is renamed to a hidden name with a deletion timestamp instead of reclaiming its resources immediately
- The master regularly scans and removes hidden files older than 3 days (configurable)
- Until then, a hidden file can still be read and undeleted
- When a hidden file is removed, its in-memory metadata is erased
- Orphaned chunks are identified and removed during a regular scan of the chunk namespace
- Provides safety against accidental, irreversible deletion
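A toy version of the lazy deletion above: rename to a hidden name carrying the deletion time, then reclaim on a later scan. The `.deleted.` naming convention is invented; only the 3-day default matches the text.

```python
GRACE_SECONDS = 3 * 24 * 3600  # 3 days, configurable

def hide(namespace, fname, now):
    """Deletion step: rename to a hidden name stamped with the deletion time."""
    namespace[f".deleted.{fname}.{int(now)}"] = namespace.pop(fname)

def scan(namespace, now):
    """Regular scan: reclaim hidden files past the grace period."""
    for name in list(namespace):
        if name.startswith(".deleted."):
            deleted_at = int(name.rsplit(".", 1)[1])
            if now - deleted_at > GRACE_SECONDS:
                del namespace[name]  # metadata gone; chunks become orphans
```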
Stale Replica Detection
- Need to distinguish between up-to-date and stale replicas
- Chunk version number:
  - Increased whenever the master grants a new lease on the chunk
  - Not increased on a replica that is unavailable at that time
- Stale replicas are deleted by the master during regular garbage collection
Fault Tolerance and Diagnosis: High Availability

Fast recovery
- The master and chunkservers restore their state and start in seconds, no matter how they terminated

Chunk replication
- Each chunk is replicated on multiple chunkservers on different racks

Master replication
- The master state is replicated on multiple machines for reliability
- When the master fails: it can restart almost instantly, or a new master process is started elsewhere
- A shadow (not mirror) master provides read-only access to the file system while the primary master is down
Data Integrity
- Checksumming detects corruption of stored data
- Integrity is verified by each chunkserver independently
- Corruption is therefore not propagated to other chunkservers
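A sketch of independent per-block verification; CRC32 over 64 KB blocks is our illustrative choice of checksum, not necessarily the exact scheme GFS uses.

```python
import zlib

BLOCK = 64 * 1024  # checksum granularity

def checksums(data):
    """One CRC32 per 64 KB block of the chunk."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def verify(data, stored):
    """Each chunkserver checks its own copy before serving it to a reader,
    so a corrupt replica is never propagated to clients or other servers."""
    return checksums(data) == stored
```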
Diagnostic Tools
- Extensive and detailed diagnostic logging
- Diagnostic logs record many significant events (chunkservers going up and down, RPC requests and replies, ...)
- Logs can be deleted without affecting the system

Reference: "GFS: Evolution on Fast-forward", a discussion between Kirk McKusick and Sean Quinlan; ACM Queue, 2009
What is Hadoop?
- An Apache project
- Yahoo! developed and contributed about 80% of the core of Hadoop (HDFS and MapReduce)
- Components:
  - HBase - originally by Powerset, then Microsoft
  - Hive - Facebook
  - Pig, ZooKeeper, Chukwa - Yahoo!
  - Avro - originally Yahoo!, co-developed with Cloudera
- Uses the MapReduce distributed computation framework
The Hadoop Distributed File System (HDFS) [2]
- Sub-project of the Apache Hadoop project
- Designed to store very large data sets, stored separately:
  - Metadata, on a dedicated server called the NameNode
  - Application data, on DataNodes
- Creates multiple replicas of data blocks and distributes them across nodes
- Highly fault-tolerant; designed to run on commodity hardware
- Suitable for applications with large data sets
Architecture

Figure: HDFS Architecture
NameNode
- The namespace is a hierarchy of files and directories
- Files and directories are represented on the NameNode by inodes
- Maintains the namespace tree and the mapping of file blocks to DataNodes
- Handles authorization and authentication
- A multithreaded system that processes requests simultaneously from multiple clients
- Keeps the entire namespace in RAM, allowing fast access to the metadata
DataNodes
- Serve read and write requests from the file system's clients
- Perform block creation, deletion, and replication upon instruction from the NameNode
- Perform a handshake with the NameNode during startup:
  - Verifies the namespace ID and the software version of the DataNode
  - Nodes with a different namespace ID cannot join the cluster
  - Incompatible versions may cause data corruption or loss
- Send heartbeats to the NameNode to confirm that the DataNode is operating and the block replicas it hosts are available
HDFS Client

Figure: HDFS client, NameNode, and DataNode interaction when creating a file
Image and Journal

Image
- The file system metadata that describes the organization of application data as directories and files
- A persistent record of the image written to disk is called a checkpoint

Journal
- A write-ahead commit log for changes to the file system that must be persistent
- During startup, the NameNode initializes the namespace image from the checkpoint and then replays changes from the journal
- A new checkpoint and an empty journal are written back to the storage directories before the NameNode starts serving clients
CheckpointNode
- Periodically combines the existing checkpoint and journal to create a new checkpoint and an empty journal
- Usually runs on a different host than the NameNode
- Downloads the current checkpoint and journal files from the NameNode, merges them locally, and returns the new checkpoint to the NameNode
- Multiple CheckpointNodes may be used simultaneously
BackupNode
- Creates periodic checkpoints, like the CheckpointNode, but applies edits to its own copy of the namespace
  - Without downloading checkpoint and journal files from the active NameNode
- In addition, it maintains an in-memory, up-to-date image of the file system namespace, always synchronized with the state of the NameNode
- Can be viewed as a read-only NameNode
File I/O Operations and Replica Management: File Read and Write, Block Placement

File read and write
- HDFS implements a single-writer, multiple-reader model

Block placement
- Nodes are spread across multiple racks
- Replicas are placed as follows:
  - The first on the node where the writer is located
  - The second and the third on two different nodes in a different rack
  - The others on random nodes, subject to two constraints:
    1. No DataNode holds more than one replica of any block
    2. No rack holds more than two replicas of the same block
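A sketch of the default placement for a replication factor of 3, under the two constraints above. The `racks` mapping and all node names are invented for illustration.

```python
import random

def place_three(writer_node, racks, rng=random):
    """racks: {rack name: [DataNodes]}. Returns three replica locations."""
    node_rack = {n: r for r, nodes in racks.items() for n in nodes}
    first = writer_node                                 # writer's own node
    remote = [r for r in racks if r != node_rack[writer_node]]
    rack2 = rng.choice(remote)                          # one remote rack
    second, third = rng.sample(racks[rack2], 2)         # two distinct nodes there
    return [first, second, third]
```

Putting the second and third replicas in a single remote rack trades a little reliability (two racks instead of three) for much less cross-rack write traffic.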
Replication Management
- The NameNode detects when a block has become under- or over-replicated
- When a block is over-replicated, the NameNode chooses a replica to remove
- When a block is under-replicated, it is put in the replication priority queue:
  - A block with only one replica has the highest priority
  - A block with a number of replicas greater than two thirds of its replication factor has the lowest priority
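The priority rules above, expressed as a comparable key (lower value = more urgent). Only the two extremes come from the text; the middle tier and the numeric values are our assumption.

```python
def replication_priority(live, factor):
    """Priority of an under-replicated block with `live` remaining replicas."""
    if live == 1:
        return 0                       # highest: a single copy left
    if live > 2 * factor / 3:
        return 2                       # lowest: above two thirds of target
    return 1                           # everything in between
```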
Inter-Cluster Data Copy
- A tool called DistCp is used for large inter/intra-cluster parallel copying
- Implemented as a MapReduce job: each map task copies a portion of the source data into the destination file system
Writing Data in an HDFS Cluster

Figure: writing data in an HDFS cluster
Reading Data in an HDFS Cluster

Figure: reading data in an HDFS cluster
Types of Faults and Their Detection

Figure: types of faults and their detection
Handling Reading and Writing Failures

Figure: handling reading and writing failures
Handling DataNode Failures

Figure: handling DataNode failures
Replica Placement Strategy

Figure: replica placement strategy
Cluster Rebalancing
- A balanced cluster has no under-utilized or over-utilized DataNodes
- A common cause of imbalance: the addition of new DataNodes

Requirements
- No block loss
- No change in the number of replicas
- No reduction in the number of racks a block resides in
- No saturation of the network

Operation
- Rebalancing is issued by an administrator
- Rebalancing decisions are made by a Rebalancing Server
- Goal of each iteration: reduce the imbalance of over- and under-utilized DataNodes
- Source and destination selection is based on priorities
Balancer
- A tool that balances disk space usage on an HDFS cluster
- Iteratively moves replicas from DataNodes with higher utilization to DataNodes with lower utilization

Reference: http://hadoop.apache.org/docs/hdfs
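A sketch of one balancing iteration: compare each DataNode's utilization with the cluster average and pick move candidates outside a threshold band. The function name and the 10% default threshold are our assumptions.

```python
def balance_candidates(used, capacity, threshold=0.10):
    """used/capacity: {DataNode: bytes}. Returns (sources, targets)."""
    avg = sum(used.values()) / sum(capacity.values())   # cluster utilization
    util = {n: used[n] / capacity[n] for n in used}
    sources = sorted(n for n in util if util[n] > avg + threshold)
    targets = sorted(n for n in util if util[n] < avg - threshold)
    return sources, targets   # replicas then move from sources to targets
```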
Conclusions

GFS
- Proprietary file system; all publicly known details are in the 2003 paper

HDFS
- Open-source project, inspired by GFS
- Details are available in the 2010 paper and on the Hadoop website
- Used by many big companies
Bibliography

[1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, SOSP '03, pages 29-43, New York, NY, USA, 2003. ACM.

[2] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop Distributed File System. In Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST '10, pages 1-10, Washington, DC, USA, 2010. IEEE Computer Society.