Availability and Data Durability in HDFS
3 Jun 2011
nfractals, Advanced Technology Research Institute / Moonsoo Lee (moon@nfractals.com)
Company Profile and Business
Who are we?
- Since 2009
- Consulting and solutions for cloud computing
- Media delivery service
Topics
- We discuss HDFS's availability and data durability
- Not covered: performance and scalability
Overview of HDFS
HDFS is a distributed storage system for reliably storing petabytes of data on clusters of commodity hardware.
- Name node: manages metadata and issues replication commands
- Secondary name node (a.k.a. checkpoint node) and backup node: checkpoint and back up the name node's metadata
- Data node: stores data
- Reliability is backed by multiple replicas
Challenge
Is it reliable enough? Our case:
- Cloud storage for a CDN service
- 1 PB of data
- 99.999% availability (about 5 minutes of downtime per year)
- Files are not reproducible; the stored copy is the only copy
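The downtime budget implied by an availability target is simple arithmetic. A minimal sketch, assuming a 365-day year (the function name is illustrative and not part of the original slides):

```python
def downtime_budget_minutes(availability: float, days_per_year: float = 365.0) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    minutes_per_year = days_per_year * 24 * 60
    return (1.0 - availability) * minutes_per_year


# "Five nines" leaves a budget of a little over five minutes per year.
print(round(downtime_budget_minutes(0.99999), 2))  # prints 5.26
```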
Data durability
$$\text{Data durability} = (1 - P_{\text{blockloss}})\,(1 - P_{\text{metadataloss}})\,(1 - P_{\text{humanerror}})\,(1 - P_{\text{softwareerror}})$$
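The durability product can be evaluated directly once the four loss probabilities are estimated. The sketch below is only an illustration: the function name and the sample probabilities are made up, and the product form treats the four loss causes as independent, as the formula does.

```python
from math import prod


def data_durability(p_blockloss: float, p_metadataloss: float,
                    p_humanerror: float, p_softwareerror: float) -> float:
    """Probability that data survives all four loss causes (assumed independent)."""
    probabilities = (p_blockloss, p_metadataloss, p_humanerror, p_softwareerror)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return prod(1.0 - p for p in probabilities)


# Hypothetical inputs: the largest term dominates the result,
# so improving the other terms alone gains little.
print(data_durability(1e-6, 1e-4, 1e-3, 1e-5))
```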
P_metadataloss
P_metadataloss is reduced by:
- Multiple local disks
- RAID
- Remote NFS
- A separate secondary node or backup node
- Backing up multiple copies of different ages
P_humanerror
P_humanerror is reduced by:
- Enabling the Trash facility
- Permissions
P_softwareerror
P_softwareerror is reduced by:
- Using a common HDFS configuration
- Moving files instead of deleting them programmatically
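P_blockloss
- r = number of replicas
- f = number of datanodes that fail concurrently
- n = number of datanodes
- b = number of blocks
- p = probability of failure of a single machine (written P_single_failure in the formulas)

$$P_{\text{single\_failure}} = \frac{E[\text{downtime}]}{E[\text{downtime}] + E[\text{uptime}]} = \frac{\text{MTTR}}{\text{MTTR} + \text{MTBF}}$$
$$\text{MTTR} = \frac{b/n}{3n} = \frac{b}{3n^{2}}$$
$$P_{\text{single\_failure}} = \frac{b/3n^{2}}{\text{MTBF} + b/3n^{2}}$$
$$P_{\text{failure}} = \binom{n}{f}\, P_{\text{single\_failure}}^{\,f}\, (1 - P_{\text{single\_failure}})^{\,n-f}$$
$$P_{\text{single\_blockloss}} = \frac{\binom{n}{f}\binom{f}{r}}{\binom{n}{f}\binom{n}{r}} = \frac{\binom{f}{r}}{\binom{n}{r}}$$
$$P_{\text{no\_blockloss}} = (1 - P_{\text{single\_blockloss}})^{\,b}$$
$$P_{\text{blockloss}} = 1 - \sum_{f=0}^{n} P_{\text{failure}} \cdot P_{\text{no\_blockloss}}$$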
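The block-loss model above can be evaluated numerically. The following is a sketch under stated assumptions, not a definitive implementation: the function names are illustrative, MTBF is assumed to be expressed in the same (unstated) time unit as the MTTR expression b/(3n^2), and the sample inputs are arbitrary. Instead of computing 1 minus the sum, the code sums P_failure * (1 - P_no_blockloss); the two forms are equal because the binomial terms sum to 1, and the second avoids subtracting two nearly equal numbers.

```python
from math import comb, expm1, log1p


def p_single_failure(b: int, n: int, mtbf: float) -> float:
    """Per-node unavailability MTTR / (MTTR + MTBF), with MTTR = (b/n) / (3n)."""
    mttr = (b / n) / (3 * n)  # assumed to share its time unit with mtbf
    return mttr / (mttr + mtbf)


def p_failure(n: int, f: int, p: float) -> float:
    """Binomial probability that exactly f of n datanodes are down."""
    return comb(n, f) * p**f * (1 - p) ** (n - f)


def p_single_blockloss(n: int, f: int, r: int) -> float:
    """C(f, r) / C(n, r): all r replicas of one block sit on the f failed nodes."""
    return comb(f, r) / comb(n, r)  # comb(f, r) is 0 when f < r


def p_any_blockloss(n: int, f: int, r: int, b: int) -> float:
    """1 - P_no_blockloss = 1 - (1 - P_single_blockloss)^b, computed stably."""
    x = p_single_blockloss(n, f, r)
    if x >= 1.0:
        return 1.0 if b > 0 else 0.0
    return -expm1(b * log1p(-x))


def p_blockloss(n: int, r: int, b: int, mtbf: float) -> float:
    """Sum over f of P_failure * (1 - P_no_blockloss)."""
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    p = p_single_failure(b, n, mtbf)
    # Fine for modest n; very large n would need log-space binomial terms.
    return sum(p_failure(n, f, p) * p_any_blockloss(n, f, r, b)
               for f in range(n + 1))


# Arbitrary illustrative inputs: compare replica counts and block counts.
for replicas in (2, 3):
    for blocks in (100_000, 1_000_000):
        print(replicas, blocks, p_blockloss(n=100, r=replicas, b=blocks, mtbf=10_000.0))
```

In this model, adding a replica lowers the loss probability, while adding blocks raises it twice over: through a longer MTTR and through the exponent b.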
P_blockloss
To reduce P_blockloss:
- Minimize the number of blocks
- Enable dfs.datanode.failed.volumes.tolerated
- Provide local redundancy for the datanode OS volume
Availability
$$\text{Availability} = (1 - P_{\text{failure\_NN}})\,(1 - P_{\text{blockloss}})\,(1 - P_{\text{networkerror}})\,(1 - P_{\text{maintenance}})$$
where
$$P_{\text{failure\_NN}} = \frac{\text{MTTR}}{\text{MTBF} + \text{MTTR}}$$
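To see how strongly the name node's MTTR drives availability, the formula can be evaluated for different recovery times. This is a rough sketch: the function names are illustrative, and the MTBF and MTTR values (in hours) are hypothetical, not measurements.

```python
def nn_unavailability(mttr: float, mtbf: float) -> float:
    """P_failure_NN = MTTR / (MTBF + MTTR); both arguments in the same time unit."""
    if mttr < 0 or mtbf <= 0:
        raise ValueError("need mttr >= 0 and mtbf > 0")
    return mttr / (mtbf + mttr)


def availability(p_failure_nn: float, p_blockloss: float,
                 p_networkerror: float, p_maintenance: float) -> float:
    """Product of the four 'no outage' probabilities (treated as independent)."""
    return ((1.0 - p_failure_nn) * (1.0 - p_blockloss)
            * (1.0 - p_networkerror) * (1.0 - p_maintenance))


# Hypothetical name node: MTBF of 1000 hours, other terms set to zero.
slow_recovery = availability(nn_unavailability(4.0, 1000.0), 0.0, 0.0, 0.0)
fast_recovery = availability(nn_unavailability(5.0 / 60.0, 1000.0), 0.0, 0.0, 0.0)
print(slow_recovery, fast_recovery)
```

With the other terms fixed, shortening MTTR is the only way to raise the name-node factor without changing the hardware's MTBF.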
Improve Availability
To improve availability:
- Active-standby HA cluster
- Fault-tolerant name node
- Standby HDFS cluster
- Virtualization layer for multiple HDFS clusters
Active-standby HA cluster
- DRBD + Linux-HA (http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/)
- Reduces MTTR
- Even the reduced MTTR is still large: from a few minutes to several hours
Fault-tolerant name node
- Removes MTTR, reduces P_maintenance
- NN availability (https://issues.apache.org/jira/browse/hdfs-1064)
- Avatar node (https://issues.apache.org/jira/browse/hdfs-976) - not yet available
- EMC Greenplum
- Google GFS
Standby HDFS cluster
- The standby cluster is kept synchronized with the master cluster
- Reduces MTTR, P_maintenance, P_blockloss, P_networkerror
- Google App Engine (GFS): 5 hours down, 2 July 2009
  http://groups.google.com/group/google-appengine/msg/ba95ded980c8c179?pli=1
Virtualization layer for multiple HDFS clusters
The virtualization layer provides:
- Support for different HDFS versions
- A virtual namespace on top of multiple HDFS clusters
- Redundancy across HDFS clusters
Effect:
- Removes MTTR and P_maintenance
- Reduces P_blockloss and P_networkerror
Summary
To get greater availability and data durability:
- Keep the number of blocks as small as possible
- Keep metadata backups of multiple ages in multiple locations
- Fast recovery is very important for both availability and durability
- Use more replicas for important data
For Your Market Leading Questions
nfractals, www.nfractals.com
Moonsoo Lee (moon@nfractals.com)