5 HDFS - Hadoop Distributed File System

5.1 Definition and Remarks

HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware. (Definition of Konstantin Shvachko)

Some notions:

Very large files: files of hundreds of megabytes, gigabytes, terabytes or maybe petabytes.

Streaming data access: the idea behind HDFS is that the most efficient data processing pattern is write-once, read-many-times. The time to read the whole dataset is more important than the latency in reading the first record.

Commodity hardware: Hadoop is designed to run on clusters of commodity hardware, so the chance of node failure across the cluster is high (at least for large clusters). HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failure.

5.2 HDFS Concepts

5.2.1 Blocks

An HDFS block is 64 MB by default, which is much larger than a disk block (typically 512 bytes). HDFS blocks are large to minimize the cost of seeks: by making a block large enough, the time to transfer the data from the disk can be significantly longer than the time to seek to the start of the block.

Benefits of having a block abstraction for a distributed file system:

A file can be larger than any single disk in the network, so its blocks can be stored on different disks.

Making the unit of abstraction a block rather than a file simplifies the storage subsystem (for example, it is easy to calculate how many blocks fit on a given disk).

File metadata such as permission information does not need to be stored with the blocks.

To list the blocks that make up each file in the filesystem:

% hadoop fsck / -files -blocks

(after starting dfs by start-dfs.sh; here we run in pseudo-distributed mode)

The result would be like

Connecting to namenode via http://localhost:50070
FSCK started by riduan91 (auth:simple) from /127.0.0.1 for path / at Mon Jan 19 10:02:38 CET 2015
/ <dir>
/user <dir>
/user/riduan91 <dir>
/user/riduan91/input <dir>
/user/riduan91/input/capacity-scheduler.xml 4436 bytes, 1 block(s): OK
0. BP-934661546-129.104.73.150-1421604583204:blk_1073741825_1001 len=4436 repl=1
/user/riduan91/input/configuration.xsl 1335 bytes, 1 block(s): OK
0. BP-934661546-129.104.73.150-1421604583204:blk_1073741826_1002 len=1335 repl=1
[...]
/user/riduan91/input/yarn-site.xml 886 bytes, 1 block(s): OK
0. BP-934661546-129.104.73.150-1421604583204:blk_1073741853_1029 len=886 repl=1
Status: HEALTHY
 Total size: 77373 B
 Total dirs: 4
 Total files: 29
 Total symlinks: 0
 Total blocks (validated): 29 (avg. block size 2668 B)
 Minimally replicated blocks: 29 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Jan 19 10:02:38 CET 2015 in 23 milliseconds

5.2.2 Namenodes and Datanodes

There are two types of nodes operating in a master-worker pattern: a namenode and a number of datanodes, also called master and workers.

The namenode manages the filesystem namespace; it maintains the filesystem tree and the metadata for all the files and directories. The namenode knows the datanodes on which all the blocks for a file are located, but it does not store block locations persistently. A client accesses the filesystem by communicating with both the namenode and the datanodes.

Datanodes store and retrieve blocks when they are told to (by clients or by the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing.

The namenode is extremely important, so it is essential to make it resilient to failure. There are two mechanisms for this:

Back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple file systems; usually, it writes to the local disk as well as a remote NFS mount.

Run a secondary namenode, which periodically merges the namespace image with the edit log to prevent the edit log from becoming too large.

Figure 3: HDFS architecture
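Since the namenode is the only component that knows which datanodes hold the blocks of a given file, a client can ask for this block-to-datanode mapping through the FileSystem API. The following is a minimal sketch, not taken from the code of this report: the class name BlockLocationsSketch is an assumption, and the file is whatever HDFS path is passed as the first argument.

import java.net.URI;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print, for each block of a file, the datanodes that hold a replica.
public class BlockLocationsSketch {
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // an HDFS path, e.g. a file under /user/riduan91
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        FileStatus status = fs.getFileStatus(new Path(uri));
        // Block locations of the whole file, as reported by the namenode
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
    }
}

In the pseudo-distributed setup used above, each block should report a single host, since there is only one datanode.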
5.2.3 Namespace and Block Storage

The namespace consists of directories, files and blocks, and supports all the namespace-related file system operations such as create, delete, modify and list files and directories.

The Block Storage Service has two parts:

Block Management, which is done in the namenode.

Storage, which is provided by the datanodes by storing blocks on the local file system and allowing read/write access.

Figure 4: Namespace and Block Storage

5.2.4 HDFS Federation

This part is copied from the Hadoop website.

In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated, that is, the Namenodes are independent and don't require coordination with each other. The datanodes are used as common storage for blocks by all the Namenodes. Each datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports and handle commands from the Namenodes.

Figure 5: HDFS Federation architecture

5.3 The Command-Line Interface

Some basic steps:

Format the file system
% hdfs namenode -format
Start the NameNode and DataNode daemons
% start-dfs.sh

Make the HDFS directories required to execute MapReduce jobs
% hdfs dfs -mkdir /user
% hdfs dfs -mkdir /user/riduan91

Copy input files to the DFS
% hadoop fs -mkdir input1
% hdfs dfs -put weather.txt input1/weather.txt

Run the example
% export HADOOP_CLASSPATH=bin
% hadoop MaxTemperature input1/weather.txt output1

Copy the output back to the local file system
% hdfs dfs -get output1 output1

View the output
% hdfs dfs -cat output1/*

Remove a file
% hdfs dfs -rm input1/weather.txt

Remove the directories
% hdfs dfs -rm output1/*
% hdfs dfs -rmdir input1 output1

Stop the daemons
% stop-dfs.sh

5.4 The Java Interface

Although we focus mainly on the HDFS implementation, DistributedFileSystem, in general we should try to write our code against the FileSystem abstract class, to retain portability across filesystems.
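As a rough illustration of writing against the FileSystem abstract class, the sketch below performs the same kind of operations as the shell commands of section 5.3 (creating a directory and copying a local file into HDFS). It is only a sketch: the class name MkdirAndPut and the paths are assumptions, not part of the original report.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: the programmatic counterpart of "hdfs dfs -mkdir" and "hdfs dfs -put",
// written only against the FileSystem abstraction.
public class MkdirAndPut {
    public static void main(String[] args) throws Exception {
        String dst = args[0]; // e.g. hdfs:///user/riduan91/input1 (example path)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);

        Path dir = new Path(dst);
        if (!fs.exists(dir)) {
            fs.mkdirs(dir); // counterpart of "hdfs dfs -mkdir"
        }
        // counterpart of "hdfs dfs -put weather.txt input1/weather.txt"
        fs.copyFromLocalFile(new Path("weather.txt"), new Path(dir, "weather.txt"));
    }
}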
5.4.1 Reading Data from a Hadoop URL

The program in Java:

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {

    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

In the command line:

% cd [...]
% export HADOOP_CLASSPATH=bin
% hadoop URLCat hdfs:///user/riduan91/input1/weather.txt

We should see the content of the file, for example

1941+22349_
1941+26219_
1942+27339_
1942+28129_
1943-00169_
1943+01179_
1944+24009_
1944+25009_
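The URL approach above requires installing a stream handler factory, which can only be done once per JVM. As a sketch of an alternative (the class name FileSystemCat is an assumption), the same file can be read by opening it through FileSystem; the returned FSDataInputStream additionally supports seek(), so the stream can be rewound and read again.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch: read an HDFS file through FileSystem.open() instead of URL.openStream().
public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0); // rewind to the start of the file
            IOUtils.copyBytes(in, System.out, 4096, false); // print the content a second time
        } finally {
            IOUtils.closeStream(in);
        }
    }
}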
5.4.2 Writing Data

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });

        IOUtils.copyBytes(in, out, 4096, true);
    }
}

Run in the command line:

% hadoop FileCopyWithProgress weather.txt hdfs:///user/riduan91/weather1.txt

5.4.3 Querying the System

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ListStatus {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        Path[] paths = new Path[args.length];
        for (int i = 0; i < paths.length; i++) {
            paths[i] = new Path(args[i]);
        }

        FileStatus[] status = fs.listStatus(paths);
        Path[] listedPaths = FileUtil.stat2Paths(status);
        for (Path p : listedPaths) {
            System.out.println(p);
        }
    }
}
Run in the command line:

% hadoop ListStatus hdfs:///user/riduan91

5.4.4 Deleting Data

We can use the delete() method on FileSystem directly to permanently remove files or directories.
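A minimal sketch of such a call is given below; the class name DeletePath is an assumption for the example. The second argument of delete() asks for a recursive delete, which is required when the path is a non-empty directory.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: permanently remove a file or directory from HDFS.
public class DeletePath {
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs:///user/riduan91/output1 (example path)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // delete(path, recursive): recursive must be true to remove a non-empty directory
        boolean deleted = fs.delete(new Path(uri), true);
        System.out.println("Deleted: " + deleted);
    }
}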