A Comparison of Fault-Tolerant Cloud Storage File Systems

Steven Verkuil
University of Twente
P.O. Box 217, 7500 AE Enschede
The Netherlands

ABSTRACT
There are many cloud storage file systems that guarantee fault-tolerance, and fault-tolerance is implemented in several different ways. This paper aims to find the benefits and drawbacks of existing fault-tolerant file systems by defining criteria on which fault-tolerant file systems can be graded. Several distributed file systems are compared to discover how the criteria are satisfied. The research concludes with an overview of how the different file systems, each powered by their own distinct architecture, perform in an environment that is prone to errors. It is shown that not all file systems perform equally well regarding the criteria, underlining the need to evaluate fault-tolerance behavior when choosing a file system.

Keywords
Fault-tolerant, distributed, cloud, storage, comparison, file system, HDFS, GlusterFS, XtreemFS.

1. INTRODUCTION
Fault-tolerant data storage is becoming more relevant as businesses and individuals move their data to the cloud. Many cloud providers exist these days and provide out-of-the-box solutions for basic storage needs; examples of such cloud storage providers are Dropbox, box.net and Google Drive. However, not all situations allow for third-party hosting of data, for example when storing privacy-sensitive data that must by law be stored in a certain country, or when a third-party distributed network does not allow fast enough access times for data-intensive applications. Therefore, research into setting up robust cloud storage networks is relevant for many businesses and cloud operators.

Fault-tolerance is an important aspect of cloud storage because it concerns the robustness of the data that is stored [3]. Data is distributed over multiple machines which are prone to network failures. If a server containing data becomes unavailable in a cloud environment, it must be prevented that dependent services also become unavailable. This is the main reason that robustness in cloud storage file systems is actively researched. Although ample research has been done on cloud storage file systems [3, 4, 9], no detailed study comparing the fault-tolerance mechanisms used in cloud storage exists at the moment of writing.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 19th Twente Student Conference on IT, June 24th, 2013, Enschede, The Netherlands. Copyright 2013, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

This paper aims to find out what fault-tolerant cloud storage file systems currently exist and to compare how they implement fault-tolerance. It describes a methodology to compare different fault-tolerant file systems, with the aim of facilitating the selection of file systems by cloud storage operators. This paper answers the following research question: What are the benefits and drawbacks of existing fault-tolerant cloud storage file systems? The main research question is answered by evaluating related sub-questions. These questions are:
1. Which fault-tolerant architectures for cloud storage file systems are currently available?
2. Which criteria can be used to compare fault-tolerant cloud storage file systems?
3. How do existing fault-tolerant cloud storage file systems satisfy these criteria?

The first research question is answered by performing a literature study on existing fault-tolerant file system architectures. Because of the vast number of available file systems, and due to the limited amount of time, this research focuses on three commonly used open-source fault-tolerant file systems. The following file systems are chosen to be included in the research:

- Apache Hadoop Distributed File System (HDFS), release 1.1.2
- GlusterFS
- XtreemFS, release 1.4

The above three file systems were chosen after a literature search for distributed file systems and selected for their popularity in usage. An extensive analysis of the documentation of these file systems and of relevant literature is carried out to answer question one. Several additional papers focusing on different fault-tolerant architectures are studied to understand the inner workings of the fault-tolerance mechanisms.

The literature analysis creates a basis for question two, in which we elaborate on the term "fault-tolerance". This paper focuses on the failure of one or multiple nodes in a distributed network at any given time. The possibility that the actual data is corrupted during storage transactions is not covered in this document due to time and resource constraints. Criteria that can be used for the comparison of different fault-tolerant cloud file systems are defined. These criteria are related, but not limited, to aspects such as (1) the ability to recover from the concurrent loss of multiple machines, (2) the ability to handle interruptions during a read or write operation on a file, or (3) the time it takes for a node to be synchronized after recovery from a network failure.

The third research question is answered by conducting a comparison on the defined criteria. How the criteria are satisfied by the file systems is derived by querying available documentation and referencing other research. Answering the third research question also involves a basic experiment in which some of the criteria are evaluated in order to compare the three file systems. The experiment setup involves four virtualized Linux machines which together run a distributed file system installation. A networking error is then simulated on one of the machines and it is investigated what impact this has on the files that are stored. The experiment is thus concerned with the difference between the files that are stored (block 1 in Figure 1) and with how long it takes for the files to become accessible again after a crash has occurred (block 3 in Figure 1).

Figure 1. Schematic test setup

Many research studies regarding fault-tolerance have already been done. Oriani et al. [12] have proposed a way to improve fault-tolerance for the Hadoop file system. Wang et al. [15] researched removing single points of failure in an effort to improve fault-tolerance for Hadoop-based file systems. General research on the fault-tolerance behavior of HDFS was done by Evans [4]. Hupfeld et al. [9] have researched object-based file systems in grids with regard to the XtreemFS architecture and how they benefit from fault-tolerance. Pardi et al. [13] researched GlusterFS scalability for large data processing. Chan et al. [3] proposed a coding-based fault-tolerant network storage system which is able to recover files after storage interruption. However, no detailed research has been done on comparing the fault-tolerance mechanisms used in cloud storage systems. This research contributes to obtaining an understanding of criteria that can be used by cloud providers to compare different fault-tolerant architectures. Furthermore, a comparison is provided of how the different file systems satisfy these criteria.

This paper is organized as follows. In Section 2 the different architectures for fault-tolerance are explored and explained for each of the three file systems, answering question 1. In Section 3 the criteria for comparison are described, answering question 2. A qualitative comparison between the different architectures is also done in that section, partially answering question 3. In Section 4 the setup for the experiments is discussed, which is needed to fully answer question 3 in Section 5. Section 6 contains the conclusions of the research and future work.

2. FAULT-TOLERANT ARCHITECTURES

2.1 HDFS
The Hadoop Distributed File System (HDFS) is a file system built upon the open-source Apache Hadoop framework [2]. The Apache Hadoop framework, together with MapReduce (a computational paradigm from Google), provides a basis for applications which need to process large amounts of data in a distributed manner. HDFS is capable of handling multiple petabytes of data [14] and is used by companies that require a large storage infrastructure, such as Facebook and Yahoo. It is important to understand the basic architecture of HDFS before investigating its fault-tolerance capabilities. A typical HDFS cluster contains two important types of nodes: the NameNode and the DataNodes [2].
A cluster contains one NameNode and one or more DataNode machines. The NameNode is a master server responsible for managing the file system namespace and regulating access to data by clients. When a client uploads a file, it is split into one or more blocks which are then stored on a set of DataNodes. These DataNodes are also responsible for servicing read and write requests from the file system's clients [2]. Figure 2 shows a schematic overview of the HDFS architecture.

Figure 2. HDFS architecture overview, based on [2]

The DataNodes shown in Figure 2 are responsible for replication, which is the primary fault-tolerance mechanism of HDFS. Each file can have a certain replication factor, indicating how many copies of all blocks associated with the file should be stored in the HDFS cluster.
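As a brief illustration, the default replication factor is typically configured in hdfs-site.xml and can also be changed per file from the command line; this is a minimal sketch and the file path used is hypothetical:

    # hdfs-site.xml (excerpt): default number of replicas kept for each block
    #   <property>
    #     <name>dfs.replication</name>
    #     <value>3</value>
    #   </property>

    # change the replication factor of an already stored file to 3;
    # -w waits until the requested factor has actually been reached
    hadoop fs -setrep -w 3 /data/example.bin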

Once a block is written to a DataNode it is instantly replicated to another node, until the replication factor is satisfied. For each 4 KB of data the DataNode receives, it writes the same 4 KB of data to another DataNode. The order of DataNodes is determined by the NameNode when setting up the initial file write transaction for the client. The process of passing on 4 KB chunks repeats until the replication factor is met. The data is effectively pipelined from one node to the next, a process called replication pipelining [2].

With this basic understanding of HDFS, we can also identify the first shortcoming of this architecture with regard to network fault-tolerance: the NameNode is a single node in the network for which there is no duplicate. Once the NameNode goes offline, no transactions can be made by any of the clients, effectively losing track of all stored data. This single point of failure is a drawback of the HDFS implementation [4]. However, it should be mentioned that some work has been done on replicating the NameNode to overcome this single point of failure [15].

2.2 GlusterFS
GlusterFS is an open-source distributed file system designed to handle enormous amounts of data storage [13]. GlusterFS is maintained by the multinational software company Red Hat and is actively deployed in cloud services. The GlusterFS architecture aggregates several disk and memory resources into a single global namespace with one common mount point on a Linux machine. Thousands of applications and clients can then connect to the GlusterFS file system via this mount point and interact with the stored data [6]. GlusterFS is built upon the FUSE project [5], which allows software to create virtual file systems and integrates with Linux kernels 2.4.x and 2.6.x. GlusterFS is essentially a layer on top of existing file systems such as ext4 and can scale to several petabytes of storage available under a single mount point [8].

In GlusterFS the elemental storage units are called bricks. Bricks store data via translators on lower-level file systems, and a server can host one or more bricks. A Trusted Storage Pool (TSP) can be created to combine multiple storage servers into a distributed volume. GlusterFS can work with three different types of volumes, which are discussed briefly below.

Distributed Volumes spread files randomly across the bricks in the volume. There can be multiple servers running one or more bricks; however, each file is only stored once, and therefore server failure can result in serious loss of data.

Replicated Volumes create copies of files across multiple bricks in the volume. They can be configured such that a file copy is always placed on a different server, and not on a different brick on the same server, in order to better protect against data loss.

Striped Volumes stripe file data across multiple bricks. A file is split up into segments and each segment is stored on a brick. This allows for a significant speedup in high-concurrency environments.

Several variations on the types discussed above are also possible, for example Distributed Striped Volumes or Striped Replicated Volumes. However, since this research is only concerned with fault-tolerance, the focus will be on Replicated Volumes for the rest of this research. Replicated Volumes are the most fault-tolerant type of storage structure that GlusterFS supports [8].
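As an illustration of this setup (the host names and brick paths below are hypothetical; the concrete command used in the experiments is listed in Appendix A.2), a three-way Replicated Volume could be created and mounted roughly as follows:

    # create a volume with three replicas, one brick on each of three servers
    gluster volume create gv0 replica 3 \
        server1:/export/brick1 server2:/export/brick1 server3:/export/brick1
    gluster volume start gv0

    # mount the volume on a client machine through the FUSE-based GlusterFS client
    mkdir -p /mnt/gv0
    mount -t glusterfs server1:/gv0 /mnt/gv0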
Figure 3 illustrates a typical setup using a Replicated Volume in GlusterFS, in which a client writes three files that are each stored on the bricks of two servers.

Figure 3. GlusterFS Replicated Volume, based on [8]

When a volume is created in GlusterFS, a so-called replica count is given as a parameter during setup. This replica count specifies how many copies of each file should be stored inside the Replicated Volume. The replica count should match the number of bricks inside the volume, so assigning one brick per server in a volume ensures maximum protection against server failure, since files are distributed amongst many different machines [8].

2.3 XtreemFS
The XtreemFS file system is an open-source object-based file system for wide-area infrastructures [9]. XtreemFS uses replication to implement fault-tolerance by maintaining replicas of files on physically different servers [1]. The XtreemFS architecture prescribes three types of servers which should all be present to form a working installation [1]. The first type is the Directory Service (DIR). The DIR is the central registry for all services in XtreemFS and contains the configuration settings and a list of peers. It is used by the Metadata and Replica Catalog (MRC) to discover storage servers. The MRC, the second type, contains the directory tree and file metadata, and is also responsible for authentication and authorization of file access. The third type is the Object Storage Device (OSD). The OSD stores the actual objects of files and interfaces with Linux/Windows clients for read and write operations [1]. Figure 4 illustrates the XtreemFS layout.

Figure 4. XtreemFS architecture, based on [1]

XtreemFS allows all three types of servers to be replicated [1]. Whereas OSD replication is the obvious way to implement fault-tolerance for stored files, MRC and DIR replication can also be used to significantly increase the file system's reliability. For example, the MRC server is to some degree similar to the NameNode in the HDFS file system (see Section 2.1): when the NameNode fails, all metadata is lost and HDFS cannot serve files any longer.

However, when replicas of the MRC server are maintained analogously, other MRC servers can take over in case of failure, avoiding file system disruptions.

The implementation of replication is similar for OSD, MRC and DIR servers. When an XtreemFS client connects, a lease is granted and a primary replica is identified for the transaction. This primary replica accepts all updates from the XtreemFS client and applies them to all other replica instances in the same order as received. When the primary node fails, for example due to a networking error, the lease will eventually expire and one of the other replica servers can become primary [1]. Once the lease is reassigned, the transaction is restored up to the point where the previous server left off.

Upon connection of a new user machine, the XtreemFS client software mounts one of the MRC volumes in a local directory. This is done via the FUSE user-level driver, which was briefly introduced in Section 2.2. The freshly mounted MRC volume is used to find the location of objects on the OSDs and their associated metadata, such as file size, file type, modification date and ownership information. After acquiring the location of the file that the user wishes to read or write, the XtreemFS client sets up a parallel read/write connection to the OSDs that correspond to the file being processed. XtreemFS uses striping for parallel reading and writing of the objects that are part of a file. This increases data throughput without compromising the available data operations [10].

XtreemFS allows three different policies to be used when storing files onto the file system. The first policy does not allow replicas to be made and simply stores each file only once. The second policy, WaR1, stands for "write all, read one": the number of replicas is configured by the replication count during setup, and before writing a file it is checked whether the required number of servers is online to guarantee the replication count. If this is not the case, the operation fails and the write is cancelled. The final policy, WqRq (Write Quorum, Read Quorum), applies majority voting to writing and reading a file: the majority of the servers has to be available when writing a file to the file system. This is the most fault-tolerant strategy [1] and is used for the rest of this research.
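The commands below, taken from the experiment configuration in Appendix A.3, sketch how a volume is created, mounted and switched to the WqRq policy with three replicas; the host name and volume name are placeholders:

    # create an XtreemFS volume on the MRC and mount it via the DIR
    # (DIR and MRC run on the same master node in the test setup)
    mkfs.xtreemfs master-node-ip/volume
    mkdir -p /xtreemfs
    mount.xtreemfs master-node-ip/volume /xtreemfs

    # enable quorum-based replication (WqRq) with three replicas per file
    xtfsutil --set-drp --replication-policy WqRq --replication-factor 3 /xtreemfs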
2.4 Architectural differences
As becomes clear from the previous sections, one of the most distinct differences between the architectures of the three selected file systems lies in how metadata is stored. The metadata is separated from the actual data storage in the HDFS and XtreemFS architectures, whilst it is kept together with the data in the GlusterFS file system. The metadata server implementations also differ: the XtreemFS architecture allows the metadata server to be replicated, whilst the HDFS architecture does not allow metadata replication, leaving a single point of failure in the file system. In all file systems the actual fault-tolerance behaviour is obtained by replicating the bytes of data being stored. When a single storage machine fails, other machines still contain a replica of the stored data and can serve that copy.

3. COMPARING FAULT-TOLERANT ARCHITECTURES
Several different approaches to fault-tolerance are deployed in the file systems Apache Hadoop (HDFS), GlusterFS and XtreemFS. In order to compare these three file systems, a set of criteria has to be defined. These criteria are used for a qualitative comparison in Section 3.2 and a quantitative comparison in Section 5.

3.1 Criteria
This research is concerned with the fault-tolerance mechanisms of distributed file systems regarding networking issues. Six criteria related to networking issues have been identified which can be used to compare fault-tolerant architectures. The first three criteria are related to node network failure, the subsequent two criteria are related to network interruptions during a transaction, and the last criterion is related to the file system's ability to recover from a network outage. These criteria were identified by evaluating the different phases of a network transaction in which the file system could be interrupted: a fault-tolerant file system should be able to handle networking-related errors before a transaction, during a transaction and after a transaction.

Some terminology is used to define the criteria. A node refers to a Linux machine running some (specific) file system software component. A storage node is a node which is dedicated to actually storing the bytes of a file. The term network relates to the network connection of a single node, which is used by that node to communicate in a distributed way with the other nodes that are part of the file system. The following six criteria are identified for this research:

1. Random node network failure. The file system's ability to handle the network loss of a single, randomly chosen node without compromising file availability.
2. Storage node network failure. The file system's ability to cope with the loss of a single storage node without losing the capability to restore the original file from other, still available, nodes.
3. Multiple storage node network failure. The file system's ability to successfully respond to client file requests whilst coping with the concurrent loss of two or more storage nodes.
4. Write interruption. The file system's ability to continue writing the file served by the client when the primary storage node becomes unavailable during the transaction.
5. Read interruption. The file system's ability to continue serving a read request when the primary storage node that serves the request becomes unavailable during the transaction.
6. Re-replication time. The file system's ability to create a new replica of a file on an available storage node when a (primary) node containing the file disconnects due to a networking failure.

The first three criteria are concerned with the network failure of specifically or randomly chosen nodes. A truly fault-tolerant file system should be able to handle the loss of one or more nodes without compromising system stability [4]. Criteria four and five are related to user interaction: applications that rely on files stored on a distributed cloud file system can be interrupted if a read or write operation on a specific file fails due to a node networking error. Interruption of applications can be prevented if a file system is able to handle node networking failures in real time. The last criterion is concerned with the data redundancy factor. Each file should have multiple copies to guarantee maximal availability and resistance against networking errors.

If a node designated to contain a copy of a file recovers from a network crash, the file system should update it with all files that need to be replicated. Maximizing the number of file copies in the system at any given point in time increases storage robustness [14].

Most of the six identified criteria regarding network fault-tolerance allow for a literature-study approach to compare how the different file systems behave in the different scenarios. File system architectures enforce a certain behavior upon the events that occur, and this behavior is thoroughly documented and tested by the designers of the file systems and by the users that have deployed them in their own computer systems. In Section 3.2, criteria 1-5 form the basis for a literature-based comparison between the three file systems. In Section 4 an experiment is described to discover how the file systems satisfy the sixth criterion.

3.2 Qualitative comparison
In this section we perform a qualitative comparison based on the first five criteria described in the previous section. For each of the criteria, numbered 1-5, we describe the behavior of each of the three file systems, and each file system is graded bad, fair, good or N/A. A bad grade means that the file system does not satisfy the criterion at all. A fair grade is given if the file system satisfies the criterion in most situations. A good grade means that the criterion is satisfied even in the worst case, in which a large part of the file system fails to operate due to network failures. The N/A grade is used when the criterion cannot be applied.

3.2.1 Random node network failure
As we recall from Section 2.1, the HDFS architecture consists of two types of nodes, the NameNode and the DataNodes, and a random node network failure means that either of these node types can fail. If the NameNode fails, the entire file system is compromised and will be unable to serve files [15]. This single point of failure is a weakness in the HDFS file system, and therefore HDFS is unable to guarantee correct functionality after a random node network failure.

Mikami et al. [11] explain in their research that GlusterFS does not store separate metadata and does not have a metadata server. All servers that host bricks are able to locate any piece of data without looking it up in an index or querying other servers. Therefore the failure of a randomly chosen node does not influence file availability if Replicated Volumes are used.

XtreemFS allows all three types of servers (DIR, MRC and OSD) to be replicated for redundancy [1]. Assuming that this functionality is exploited, the system allows for full fault-tolerance, and at any given point in time a random node can fail without dependent applications even noticing.

Concerning random node network failure, HDFS is rated bad because the failure of a single NameNode renders the entire file system useless. GlusterFS and XtreemFS are both considered good because they allow a random node to disconnect without disrupting the file system.

3.2.2 Storage node network failure
HDFS has DataNodes that are responsible for storing blocks of data. According to Oriani et al. [12], the HDFS architecture can handle the failure of a single DataNode without compromising file availability, due to the fact that HDFS replicates blocks to at least three different DataNodes.
GlusterFS distributes data over mirrors using synchronous writes [7]. A single storage server can fail without consequences for the rest of the file system; stored data remains available as long as at least one node containing the file is online.

Hupfeld et al. [9] show that the XtreemFS architecture is capable of replicating stored files across multiple OSD instances. If a single OSD fails, the data is still retrievable from a duplicate stored on a different OSD, as long as the majority of the servers remains online [1].

All three architectures allow data to be stored redundantly and are therefore resistant against the failure of a single storage node. All three file systems are consequently classified as good.

3.2.3 Multiple storage node network failure
HDFS is capable of handling multiple DataNode failures simultaneously. File availability after the failure of several DataNodes depends on the replication factor that is set when the file is first written [2]. Once a DataNode becomes unavailable, the NameNode is triggered to register a new copy of its blocks on a still available DataNode in an attempt to honor the replication count set by the user [14].

GlusterFS maintains a parallel connection to all storage servers in order to quickly read and write data. In a replicated environment each file is stored n times, where n is configured in the GlusterFS setup. As long as at least one storage server containing the file is online, the file can be retrieved without the dependent application being interrupted [8].

The availability of files in XtreemFS depends on the replication policy chosen during setup. The WqRq policy, described in Section 2.3, is the most fault-tolerant policy; file availability using the WqRq policy is only guaranteed when the majority of the replicas is available [1].

HDFS is classified as good because multiple storage nodes can disconnect, and the failure of a storage node triggers the re-replication process instantly. GlusterFS allows a large number of nodes to disconnect before dependent applications fail; assuming a replication count of three or more copies, GlusterFS is also considered good. XtreemFS is considered fair because it depends on the majority of the storage nodes being online to ensure data availability.

3.2.4 Write interruption
Riahi et al. [14] showed that HDFS is able to handle interruptions during the write operation of a file to the file system. If the primary DataNode to which the file is being written becomes unavailable during the write operation, another DataNode is selected and a new attempt to write the blocks of the file is made. An upper bound on the number of retries can be set in the HDFS configuration file, and the system will retry the write operation until the transfer has completed successfully.

The GlusterFS client maintains parallel connections to all GlusterFS servers. Data being written to a Replicated Volume is written to all associated servers at the same time. If a server disconnects during the write operation, all other parallel connections stay alive, so the file is still written to all other servers that are online. Applications depending on GlusterFS are not interrupted in any way [7].

XtreemFS utilizes the so-called hot-backup paradigm, which assigns a new server to the job if the primary server fails [1]. This behavior, combined with parallel writing to multiple OSDs, ensures that the primary server can fail without the transaction being invalidated.

All three of the evaluated file systems provide a mechanism, either at the server side or at the client side, which allows for real-time recovery of write operations, and they are therefore graded good.

3.2.5 Read interruption
HDFS is capable of handling DataNode failures during read operations. Riahi et al. [14] showed that when a disruption occurs on the primary DataNode during a read operation, the file system is able to automatically select a new DataNode that serves a copy of the requested data block.

GlusterFS is able to handle read interruption automatically and without disruption to the application served by the file system. GlusterFS utilizes parallel connections and dynamically requests data from another connected server if the I/O operation is disrupted [7].

The XtreemFS hot-backup functionality ensures that a read operation performed by an application is handled with a minimal amount of disruption when the primary storage node fails. XtreemFS maintains a lease for the transaction which expires when a server goes offline. The lease is then assigned to an OSD containing a copy of the file and the read operation is continued with minimal delay [1].

Each of the file systems provides a way to continue serving read requests for a specific file without interrupting the client that is reading the file. Therefore all of the tested file systems are considered good at handling read interruptions due to networking failures.

In the subsections above, the criteria regarding fault-tolerance were evaluated against the HDFS, GlusterFS and XtreemFS architectures. Table 2 in Section 6 summarizes the qualitative comparison results for criteria 1-5 given in Section 3.1. Keep in mind that only fault-tolerance regarding networking issues is evaluated in this comparison; different results might be obtained when testing against other issues, for example partial data corruption caused by disk errors.

4. EXPERIMENTS
An experiment has been performed to compare HDFS 1.1.2, GlusterFS and XtreemFS 1.4 on the remaining criterion, re-replication time. For this experiment, four virtual machines running Debian Squeeze are set up for each file system. Each virtual machine has one dedicated 2.53 GHz CPU core and 1 GB of RAM assigned to it. The virtual machines are connected locally via a virtual 1 Gbit network connection. One of the four virtual machines runs the file system client and other related services; the other three servers are part of the storage infrastructure. The virtualization software used to run and configure the virtual machines is Oracle VM VirtualBox 4.2, and the host machine is a 64-bit Windows 7 installation.

In order to test the re-replication time criterion, different sets of files containing randomly generated contents are written to the file system and thus distributed over the different storage machines. For the purpose of this experiment, 12 different sets of files are used, listed in Figure 5 (a sketch of how such sets can be generated follows the list). The sets vary in both file size and file count in order to identify how these parameters influence the re-replication time in comparison with the other file systems. Possible synchronization time differences are highlighted when reviewing the test results in later sections of this paper.

1. Set of 10 files, 1 Megabyte each
2. Set of 20 files, 1 Megabyte each
3. Set of 50 files, 1 Megabyte each
4. Set of 70 files, 1 Megabyte each
5. Set of 90 files, 1 Megabyte each
6. Set of 100 files, 1 Megabyte each
7. Set of 10 files, 5 Megabytes each
8. Set of 20 files, 5 Megabytes each
9. Set of 50 files, 5 Megabytes each
10. Set of 70 files, 5 Megabytes each
11. Set of 90 files, 5 Megabytes each
12. Set of 100 files, 5 Megabytes each

Figure 5. List of different file sets used
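As an illustration, a file set of this kind can be generated with a small shell loop; the output directory and file names below are hypothetical:

    # generate set 1: 10 files of 1 MB each, filled with random data
    mkdir -p /tmp/set01
    for i in $(seq 1 10); do
        dd if=/dev/urandom of=/tmp/set01/file_$i.bin bs=1M count=1
    done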
Each test run starts by storing half of the files while all three storage nodes are online. Then one storage machine is disconnected from the network, such that the number of reachable replicas falls below the configured minimum of three servers. After this single server is disconnected, the other half of the files is written to the file system. The server is then put back online and it is measured how long it takes for the file system to place the newly written files on the reconnected node, such that the minimum of three copies of each file is met. Measurements are done by regularly polling log files or status commands to see how long it takes for the synchronization to complete. The time is measured in seconds and gives a good indication of how fast a file system is able to bring a reconnected node up to date.

The following grading for the re-replication time criterion is used. A good grade is given when the file system is able to synchronize the files within an average time of ten minutes. The ten-minute bound is chosen because it allows for a few minutes of network recovery and message propagation, and gives room for possible timeouts used inside the file system. It is assumed that ten minutes is an acceptable amount of time when operating in a real-life environment with hundreds or even thousands of servers, in which a single node is never the critical factor for file availability. A fair grade is given when synchronization takes longer but still completes within 30 minutes. A bad grade is given when no synchronization has happened after 30 minutes of idle waiting time; it is then assumed that the file system is unable to provide the reconnected node with the files that were written during the network outage. The N/A grade is given when the criterion cannot be applied to the file system.

Each file system has many relevant configuration parameters and differs in installation procedure. For this research the default configuration is used and, where needed, adjusted for fault-tolerance purposes. A global overview of how the experiment is performed for each file system is provided in Section 4.1. Detailed installation procedures and relevant parameters for each file system are given in Appendix A. The replication count is set to three copies on all file systems to allow for a fair comparison.

4.1 File system setup

HDFS setup
The HDFS file system is installed and configured on all four nodes as described in Appendix A.1. For each test run, the node running the file system client uploads half of the file set and then disconnects a single DataNode slave; the disconnection is simulated by dropping the node's network traffic, as sketched below and detailed in Appendix A.1.
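A minimal sketch of this disconnect/reconnect mechanism, based on the commands listed in Appendix A.1:

    # on the selected slave node: simulate a network failure by dropping all traffic
    iptables -A INPUT -j DROP && iptables -A OUTPUT -j DROP

    # later, restore connectivity by flushing the firewall rules again
    iptables -F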

Once the second half of the files has been uploaded, the slave node is reconnected and the log file of the reconnected DataNode is polled every second. The time difference between the moment of reconnection and the moment of sync completion can be derived from the log file and is stored for further analysis once the count of received blocks matches the total number of blocks for the current run.

GlusterFS configuration
GlusterFS is installed and configured as described in Appendix A.2. The node running the GlusterFS client software uploads half of the files and then blocks the connection to a chosen storage brick. The connection is restored after the second half of the files has been stored on the file system. The GlusterFS heal info command, which reports information about the number of replicas, is then polled every second, and the time it takes for the replication process to complete is stored. Note that polling the heal information does not itself trigger the replication process [8].

XtreemFS configuration
The precise XtreemFS configuration parameters are given in Appendix A.3. For each test run, the master node running the client, MRC and DIR services starts by uploading half of the file set and then, analogously to HDFS and GlusterFS, disconnects a single slave node. The second half of the files is then written to the file system and the disconnected node is reconnected. The master node polls the log file on the slave node every second and stores the timestamp at which the newly written files are synchronized.

5. RESULTS AND ANALYSIS
This section provides the results of the experiment described in Section 4. The experiment defined 12 sets of test data to be used in test runs. Each test run is executed ten times in order to obtain an average, meaning that a total of 120 measurements is done for each file system. The average time in seconds it takes for a file system to finish replicating the files to a previously disconnected node after it comes back online is given in Table 1. Each row represents a set of files as previously listed in Figure 5. The XtreemFS architecture does not lend itself to testing the re-replication time criterion, for reasons explained later in this section, and did not produce any time measurements.

Table 1. HDFS and GlusterFS test run averages
            HDFS       GlusterFS
Set 1          sec     487 sec
Set 2          sec     505 sec
Set 3          sec     504 sec
Set 4          sec     490 sec
Set 5          sec     510 sec
Set 6          sec     502 sec
Set 7          sec     504 sec
Set 8          sec     503 sec
Set 9          sec     498 sec
Set 10         sec     504 sec
Set 11         sec     513 sec
Set 12         sec     488 sec

Overall it can be seen that the re-replication times are rather high, in the order of hundreds of seconds. One might expect the tested file systems to show timings in the order of seconds, given the relatively small number of files used in the experiment setup and the capability of the file systems to handle terabytes of data. It is however important to understand that this experiment measures the total time before the files are synchronized to the reconnected node; the largest part of the recorded timings is simply waiting time before the actual synchronization starts.

Figure 6 illustrates the average re-replication times for files of 1 MB and 5 MB when the number of files is increased; the 95% confidence intervals are also shown.

Figure 6. HDFS and GlusterFS test runs

For HDFS, the experiment does not show a linear relationship between increasing the file size or file count and the delay before the synchronization is completed. The HDFS architecture allows for a possible explanation for the absence of such a linear relationship.
Each DataNode which holds a copy of a file is able to synchronize with the DataNode that encountered the network disconnection, and each DataNode might have a different timeout value for checking file integrity via so-called heartbeat messages [2]. It could also be that HDFS synchronizes large file sets more efficiently than smaller sets, although this statement cannot be backed by literature or documentation. Further research is needed to investigate the observed behavior of the HDFS file system in more detail. Because the HDFS file system is capable of replicating files after a networking failure within a reasonable amount of time, it is graded good regarding the re-replication time criterion.

The GlusterFS experiment results show an almost constant synchronization time for Replicated Volumes after they recover from a networking error. An explanation for this behavior is that GlusterFS uses a constant timeout [8] to check for file integrity, and that internal transfer rates are high enough not to cause a significant time difference between the performed test runs. Because GlusterFS is able to synchronize data within an acceptable amount of time, it is graded good regarding the re-replication time criterion.
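As a side note, heal-related timeouts of this kind are exposed as volume options in recent GlusterFS releases and could in principle be inspected or tuned as sketched below; the option name cluster.heal-timeout is an assumption and may differ per version:

    # list heal-related options currently in effect for volume gv0
    gluster volume get gv0 all | grep -i heal

    # assumed option: interval in seconds between self-heal daemon crawls
    gluster volume set gv0 cluster.heal-timeout 600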

XtreemFS allows different policies to be used for file replication. As explained in Section 2.3, the Write Quorum Read Quorum (WqRq) policy was used for this experiment. All of the policies specify a minimum number of servers that has to be online for a file-write operation to succeed; for the WqRq policy, the majority of the servers has to be online before the file system allows files to be stored. When a single server out of the three available servers is taken offline, the majority, namely two servers, is still online. XtreemFS accepts the loss of a single server and writes the files to only two servers. No effort is made to synchronize the files to another node when it comes back online, since the policy only guarantees that the files are stored on a majority of the servers, which is already the case. This behavior earns XtreemFS an N/A grade for the re-replication time criterion.

6. CONCLUSIONS AND FUTURE WORK
Three architecturally different file systems have been compared on network-related fault-tolerance in this research. HDFS, GlusterFS and XtreemFS each proved to be capable of handling networking errors to some degree. A total of six criteria, each associated with a phase in a network transaction, have been identified and used for the comparison.

HDFS is overall very capable of delivering fault-tolerance for file storage. However, the architecture of HDFS has a single point of failure, the NameNode, and because of this it is uncertain what will happen if a random node fails. If the NameNode is put out of service, all files become unavailable and the file system halts until the NameNode is back online.

GlusterFS is a robust file system and the only file system to receive a good grade on all six criteria. Fault-tolerant data storage is one of the key aspects of the GlusterFS file system, and it proved to be reliable for resilient data storage. The absence of a metadata server allows a random node to fail without disturbing applications which depend on the contents served by GlusterFS.

The XtreemFS file system architecture is not quite as resilient as the two other tested file systems. The XtreemFS architecture does not allow replication to occur automatically once a node is reconnected after a failure: XtreemFS establishes whether there are enough storage nodes online at the time the file-write operation is performed, but it is not concerned with replicating data to match the desired replication count when other storage servers become available again after reconnection.

Table 2 summarizes the grades given for each file system based on the selected criteria. A bad grade is given when the file system does not satisfy the criterion at all. A fair grade is given if the file system satisfies the criterion in most situations. A good grade is only given when the criterion is satisfied in the worst possible scenario. An N/A grade is given if the criterion cannot be applied.

Table 2. Comparison of fault-tolerant behavior
                                            HDFS    GlusterFS   XtreemFS
1. Random node network failure              Bad     Good        Good
2. Storage node network failure             Good    Good        Good
3. Multiple storage node network failure    Good    Good        Fair
4. Write interruption                       Good    Good        Good
5. Read interruption                        Good    Good        Good
6. Re-replication time                      Good    Good        N/A

For future work, other types of fault-tolerance, such as data corruption, could be evaluated and used as a basis for file system comparison. The role of the file system clients could also be investigated, evaluating how the different file systems handle an error originating from the client machine. The experiments performed in this research were also limited by time and resource constraints; running the experiments with larger sets of data could provide more insight into the cause of the fluctuations in the synchronization times measured when evaluating criterion 6. Further research could also be directed at finding out why HDFS shows such large time differences compared to GlusterFS on the re-replication time metric.

7. REFERENCES
[1] J. Stender, B. Kolbeck, M. Berlin, M. Noack, F. Langner, F. Hupfeld, and J. Gonzales, The XtreemFS Installation and User Guide.
[2] D. Borthakur, "HDFS Architecture Guide," visited on 01-05-2013.
[3] M. C. Chan, J. R. Jiang, and S. T. Huang, "Fault-tolerant and secure networked storage," 7th International Conference on Digital Information Management (ICDIM).
[4] J. Evans, Fault Tolerance in Hadoop for Work Migration.
[5] FUSE, "Filesystem in Userspace," visited on 01-05-2013.
[6] Gluster, "About GlusterFS," visited on 04-05-2013.
[7] Gluster, "Introduction to Gluster," visited on 04-04-2013.
[8] Gluster, Gluster File System Administration Guide, gluster.org.
[9] F. Hupfeld, T. Cortes, B. Kolbeck et al., "The XtreemFS architecture - A case for object-based file systems in Grids," Concurrency and Computation: Practice and Experience, vol. 20, no. 17.
[10] J. Stender, B. Kolbeck, F. Hupfeld, E. Cesario, M. Hess, J. Malo et al., "Striping without Sacrifices: Maintaining POSIX Semantics in a Parallel File System," LASCO'08, First USENIX Workshop on Large-Scale Computing, no. 6.
[11] S. Mikami, K. Ohta, and O. Tatebe, "Using the Gfarm file system as a POSIX compatible storage platform for Hadoop MapReduce applications."
[12] A. Oriani and I. C. Garcia, "From backup to hot standby: High availability for HDFS."
[13] S. Pardi, A. Fella, F. Bianchi et al., "Testing and evaluating storage technology to build a distributed Tier1 for SuperB in Italy," Journal of Physics: Conference Series, vol. 396, part 4.
[14] H. Riahi, G. Donvito, L. Fanò et al., "Using Hadoop file system and MapReduce in a small/medium grid site," Journal of Physics: Conference Series, vol. 396, part 4.
[15] F. Wang, J. Qiu, J. Yang et al., "Hadoop high availability through metadata replication."

APPENDIX

A. FILE SYSTEM CONFIGURATIONS

A.1 HDFS configuration
For the Hadoop Distributed File System setup, HDFS was installed by downloading the stable release from the official repository and installing it with the `dpkg -i` Linux command on all four virtual machines. HDFS comes with a predefined set of scripts, readily executable by the root system user. The main configuration script delivered with the installation files is `hadoop-setup-conf.sh`. It was executed on all four servers, designating the first server for both the namenode and jobtracker services (master node). The other three servers (slave nodes) are listed by their IP address in the conf/slaves file of the Hadoop installation. A passphraseless SSH setup is configured using ssh-keygen, allowing the master node to connect to all slaves via SSH without entering a password. The master node is then started using the predefined `start-all.sh` command, which also automatically starts the DataNode daemon on all slaves via a secure shell connection.

For each test run, the master node uploads half of the file set and then disconnects a single slave node by executing `iptables -A INPUT -j DROP && iptables -A OUTPUT -j DROP` on the selected slave node. The connection is restored via the `iptables -F` command after the second half of the file set has been uploaded. The log file of the disconnected DataNode is then polled every second and the occurrences of the 'Received block' line are counted. The time difference between the moment of reconnection and the moment of sync completion is stored when the count matches the total number of blocks for the current run.

A.2 GlusterFS configuration
The GlusterFS setup required manually downloading the glusterfs-common, glusterfs-client and glusterfs-server amd64 .deb packages from the main GlusterFS repository. One virtual machine was designated to run the client software by installing the common and client packages via the `dpkg -i` command. The other three virtual machines were designated as GlusterFS slaves by installing the common, client and server packages. Installation of these packages automatically triggered the glusterd service to start on all servers. A Replicated Volume was created on the main node by executing the following command:

`gluster volume create gv0 replica 3 s1:/export/sdb1/brick s2:/export/sdb1/brick s3:/export/sdb1/brick`

In this command s1-s3 represent the IP addresses of the GlusterFS slaves, and /export/sdb1/brick is an XFS-formatted virtual drive, mounted on each GlusterFS slave node, that is used to store the actual files. The resulting GlusterFS volume gv0 was then mounted on the node running the client software by executing `mkdir /gfs && mount -t glusterfs localhost:/gv0 /gfs`. The contents of the Replicated Volume are then available on the client node at the /gfs mount location.

For each test run, the client node uploads half of the file set and then disconnects a single slave node by executing `iptables -A INPUT -s slave -j DROP && iptables -A OUTPUT -s slave -j DROP` on the client node itself, substituting "slave" with the IP address of the slave node that is about to be disconnected. The connection is restored by executing the `iptables -F` command after the second half of the file set has been uploaded. The GlusterFS heal information, available via the `gluster volume heal gv0 info` command, is then polled every second, and it is measured how long it takes for the heal information to show that all files have been replicated exactly three times.
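A measurement loop of the kind described above could look roughly as follows; the completion check (grepping for a non-zero "Number of entries" count in the heal output) is an assumption, since the exact output format of the heal command differs between GlusterFS versions:

    # record the moment the slave node is reconnected
    start=$(date +%s)

    # poll the self-heal status once per second until no entries remain to be healed
    while gluster volume heal gv0 info | grep -q "Number of entries: [1-9]"; do
        sleep 1
    done

    # elapsed time in seconds until re-replication completed
    echo "re-replication took $(( $(date +%s) - start )) seconds"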
A.3 XtreemFS configuration
XtreemFS 1.4 was installed via the `apt-get install` command (packages xtreemfs-client and xtreemfs-server) after adding the official Debian repository, which can be found on the xtreemfs.org website. Of the four available servers in the test setup, the first server was considered the master node and both the xtreemfs-client and xtreemfs-server packages were installed on it; on the remaining three servers only the xtreemfs-server package was set up. Installation of the XtreemFS packages delivered three types of services to the Linux installation, namely the xtreemfs-dir, xtreemfs-mrc and xtreemfs-osd services. The Directory service and Metadata service were started on the master node; the Storage service was started on the three slave nodes. For each of the slave nodes, the dir_service.host directive in osdconfig.properties was changed to match the IP address of the master node so that a connection could be made. Finally, the file system is mounted on the master node at /xtreemfs by executing the following command:

`mkfs.xtreemfs master-node-ip/volume && mkdir /xtreemfs && mount.xtreemfs master-node-ip/volume /xtreemfs`

The desired replication count is set to three replicas via the xtfsutil tool, which resides in the xtreemfs-tools package:

`xtfsutil --set-drp --replication-policy WqRq --replication-factor 3 /xtreemfs`

For each test run, the master node starts by uploading half of the file set and then disconnects a single slave node via SSH. Disconnecting is done by blocking all incoming and outgoing traffic via iptables, similar to the method described for HDFS. The second half of the files is then written to the file system and the disconnected node is reconnected. The master node polls /var/logs/xtreemfs/osd.log on the slave node every second and stores the timestamp at which the newly written files are synchronized.


More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department

More information

marlabs driving digital agility WHITEPAPER Big Data and Hadoop

marlabs driving digital agility WHITEPAPER Big Data and Hadoop marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil

More information

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 [email protected], 2 [email protected],

More information

INUVIKA TECHNICAL GUIDE

INUVIKA TECHNICAL GUIDE --------------------------------------------------------------------------------------------------- INUVIKA TECHNICAL GUIDE FILE SERVER HIGH AVAILABILITY OVD Enterprise External Document Version 1.0 Published

More information

Google File System. Web and scalability

Google File System. Web and scalability Google File System Web and scalability The web: - How big is the Web right now? No one knows. - Number of pages that are crawled: o 100,000 pages in 1994 o 8 million pages in 2005 - Crawlable pages might

More information

The Google File System

The Google File System The Google File System By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Presented at SOSP 2003) Introduction Google search engine. Applications process lots of data. Need good file system. Solution:

More information

Introduction to Gluster. Versions 3.0.x

Introduction to Gluster. Versions 3.0.x Introduction to Gluster Versions 3.0.x Table of Contents Table of Contents... 2 Overview... 3 Gluster File System... 3 Gluster Storage Platform... 3 No metadata with the Elastic Hash Algorithm... 4 A Gluster

More information

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc [email protected]

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc [email protected] What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

<Insert Picture Here> Big Data

<Insert Picture Here> Big Data Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan [email protected] Abstract Every day, we create 2.5 quintillion

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011 BookKeeper Flavio Junqueira Yahoo! Research, Barcelona Hadoop in China 2011 What s BookKeeper? Shared storage for writing fast sequences of byte arrays Data is replicated Writes are striped Many processes

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Highly Available Hadoop Name Node Architecture-Using Replicas of Name Node with Time Synchronization among Replicas

Highly Available Hadoop Name Node Architecture-Using Replicas of Name Node with Time Synchronization among Replicas IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 3, Ver. II (May-Jun. 2014), PP 58-62 Highly Available Hadoop Name Node Architecture-Using Replicas

More information

XtreemFS - a distributed and replicated cloud file system

XtreemFS - a distributed and replicated cloud file system XtreemFS - a distributed and replicated cloud file system Michael Berlin Zuse Institute Berlin DESY Computing Seminar, 16.05.2011 Who we are Zuse Institute Berlin operates the HLRN supercomputer (#63+64)

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Fault Tolerance Techniques in Big Data Tools: A Survey

Fault Tolerance Techniques in Big Data Tools: A Survey on 21 st & 22 nd April 2014, Organized by Fault Tolerance Techniques in Big Data Tools: A Survey Manjula Dyavanur 1, Kavita Kori 2 Asst. Professor, Dept. of CSE, SKSVMACET, Laxmeshwar-582116, India 1,2

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

Hadoop Distributed File System. Dhruba Borthakur June, 2007

Hadoop Distributed File System. Dhruba Borthakur June, 2007 Hadoop Distributed File System Dhruba Borthakur June, 2007 Goals of HDFS Very Large Distributed File System 10K nodes, 100 million files, 10 PB Assumes Commodity Hardware Files are replicated to handle

More information

HDFS Architecture Guide

HDFS Architecture Guide by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] June 3 rd, 2008

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

Optimize the execution of local physics analysis workflows using Hadoop

Optimize the execution of local physics analysis workflows using Hadoop Optimize the execution of local physics analysis workflows using Hadoop INFN CCR - GARR Workshop 14-17 May Napoli Hassen Riahi Giacinto Donvito Livio Fano Massimiliano Fasi Andrea Valentini INFN-PERUGIA

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud Aditya Jadhav, Mahesh Kukreja E-mail: [email protected] & [email protected] Abstract : In the information industry,

More information

Survey on Scheduling Algorithm in MapReduce Framework

Survey on Scheduling Algorithm in MapReduce Framework Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Efficient Data Replication Scheme based on Hadoop Distributed File System

Efficient Data Replication Scheme based on Hadoop Distributed File System , pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,

More information

High Availability Solutions for the MariaDB and MySQL Database

High Availability Solutions for the MariaDB and MySQL Database High Availability Solutions for the MariaDB and MySQL Database 1 Introduction This paper introduces recommendations and some of the solutions used to create an availability or high availability environment

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

www.basho.com Technical Overview Simple, Scalable, Object Storage Software

www.basho.com Technical Overview Simple, Scalable, Object Storage Software www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...

More information

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability

More information

Fault Tolerance in Hadoop for Work Migration

Fault Tolerance in Hadoop for Work Migration 1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous

More information

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline

More information

Suresh Lakavath csir urdip Pune, India [email protected].

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. A Big Data Hadoop Architecture for Online Analysis. Suresh Lakavath csir urdip Pune, India [email protected]. Ramlal Naik L Acme Tele Power LTD Haryana, India [email protected]. Abstract Big Data

More information

Michael Thomas, Dorian Kcira California Institute of Technology. CMS Offline & Computing Week

Michael Thomas, Dorian Kcira California Institute of Technology. CMS Offline & Computing Week Michael Thomas, Dorian Kcira California Institute of Technology CMS Offline & Computing Week San Diego, April 20-24 th 2009 Map-Reduce plus the HDFS filesystem implemented in java Map-Reduce is a highly

More information

Real-time Protection for Hyper-V

Real-time Protection for Hyper-V 1-888-674-9495 www.doubletake.com Real-time Protection for Hyper-V Real-Time Protection for Hyper-V Computer virtualization has come a long way in a very short time, triggered primarily by the rapid rate

More information

Distributed Filesystems

Distributed Filesystems Distributed Filesystems Amir H. Payberah Swedish Institute of Computer Science [email protected] April 8, 2014 Amir H. Payberah (SICS) Distributed Filesystems April 8, 2014 1 / 32 What is Filesystem? Controls

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization ovirt and Gluster hyper-converged! HA solution for maximum resource utilization 31 st of Jan 2016 Martin Sivák Senior Software Engineer Red Hat Czech FOSDEM, Jan 2016 1 Agenda (Storage) architecture of

More information

Sujee Maniyam, ElephantScale

Sujee Maniyam, ElephantScale Hadoop PRESENTATION 2 : New TITLE and GOES Noteworthy HERE Sujee Maniyam, ElephantScale SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member

More information

HADOOP MOCK TEST HADOOP MOCK TEST I

HADOOP MOCK TEST HADOOP MOCK TEST I http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at

More information

TP1: Getting Started with Hadoop

TP1: Getting Started with Hadoop TP1: Getting Started with Hadoop Alexandru Costan MapReduce has emerged as a leading programming model for data-intensive computing. It was originally proposed by Google to simplify development of web

More information

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

Hadoop Distributed File System. Jordan Prosch, Matt Kipps Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?

More information

Hadoop Scheduler w i t h Deadline Constraint

Hadoop Scheduler w i t h Deadline Constraint Hadoop Scheduler w i t h Deadline Constraint Geetha J 1, N UdayBhaskar 2, P ChennaReddy 3,Neha Sniha 4 1,4 Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore,

More information

Hadoop Parallel Data Processing

Hadoop Parallel Data Processing MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction

More information

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM Sugandha Agarwal 1, Pragya Jain 2 1,2 Department of Computer Science & Engineering ASET, Amity University, Noida,

More information

Hadoop Basics with InfoSphere BigInsights

Hadoop Basics with InfoSphere BigInsights An IBM Proof of Technology Hadoop Basics with InfoSphere BigInsights Unit 4: Hadoop Administration An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2013 US Government Users Restricted

More information

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization ovirt and Gluster hyper-converged! HA solution for maximum resource utilization 21 st of Aug 2015 Martin Sivák Senior Software Engineer Red Hat Czech KVM Forum Seattle, Aug 2015 1 Agenda (Storage) architecture

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

HDFS Under the Hood. Sanjay Radia. [email protected] Grid Computing, Hadoop Yahoo Inc.

HDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. HDFS Under the Hood Sanjay Radia [email protected] Grid Computing, Hadoop Yahoo Inc. 1 Outline Overview of Hadoop, an open source project Design of HDFS On going work 2 Hadoop Hadoop provides a framework

More information

Mirror File System for Cloud Computing

Mirror File System for Cloud Computing Mirror File System for Cloud Computing Twin Peaks Software Abstract The idea of the Mirror File System (MFS) is simple. When a user creates or updates a file, MFS creates or updates it in real time on

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

Deploying Hadoop with Manager

Deploying Hadoop with Manager Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer [email protected] Alejandro Bonilla / Sales Engineer [email protected] 2 Hadoop Core Components 3 Typical Hadoop Distribution

More information

HDFS Space Consolidation

HDFS Space Consolidation HDFS Space Consolidation Aastha Mehta*,1,2, Deepti Banka*,1,2, Kartheek Muthyala*,1,2, Priya Sehgal 1, Ajay Bakre 1 *Student Authors 1 Advanced Technology Group, NetApp Inc., Bangalore, India 2 Birla Institute

More information

GraySort on Apache Spark by Databricks

GraySort on Apache Spark by Databricks GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner

More information

High Availability Guide for Distributed Systems

High Availability Guide for Distributed Systems Tivoli IBM Tivoli Monitoring Version 6.2.2 Fix Pack 2 (Revised May 2010) High Availability Guide for Distributed Systems SC23-9768-01 Tivoli IBM Tivoli Monitoring Version 6.2.2 Fix Pack 2 (Revised May

More information