Scalable Storage for Data-Intensive Computing
Abhishek Verma, Shivaram Venkataraman, Matthew Caesar, and Roy H. Campbell

Abstract Cloud computing applications require a scalable, elastic and fault-tolerant storage system. We survey how storage systems have evolved from traditional distributed filesystems and peer-to-peer storage systems, and how these ideas have been synthesized in current cloud computing storage systems. Then, we describe how metadata management can be improved for a file system built to support large-scale data-intensive applications. We implement Ring File System (RFS), which uses a single-hop Distributed Hash Table to manage file metadata and a traditional client-server model for managing the actual data. Our solution does not have a single point of failure, since the metadata is replicated. The number of files that can be stored and the throughput of metadata operations scale linearly with the number of servers. We compare our solution against two open source implementations of the Google File System (GFS), HDFS and KFS, and show that it performs better in terms of fault tolerance, scalability and throughput. We envision that similar ideas from peer-to-peer systems can be applied to other large-scale cloud computing storage systems.

1 Introduction

Persistent storage is a fundamental abstraction in computing. It consists of a named set of data items that come into existence through explicit creation and persist through temporary failures of the system until they are explicitly deleted. Sharing of data in distributed systems has become pervasive as these systems have grown in scale, both in the number of machines and in the amount of data stored.

{verma7, venkata4, caesar, rhc}@illinois.edu, Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL
The phenomenal growth of web services in the past decade has resulted in many Internet companies needing to perform large-scale data analysis, such as indexing the contents of billions of websites or analyzing terabytes of traffic logs to mine usage patterns. A study into the economics of distributed computing [1], published in 2008, revealed that the cost of transferring data across the network is relatively high. Hence, moving computation near the data is a more efficient computing model, and several large-scale, data-intensive application frameworks [2, 3] exemplify this model. The growing size of the datacenter also means that hardware failures occur more frequently, making such data analysis much harder. A recent presentation about a typical Google datacenter reported that up to 5% of disk drives fail each year and that every server restarts at least twice a year due to software or hardware issues [4]. With the size of digital data doubling every 18 months [5], it is also essential that applications are designed to scale and meet the growing demands.

To deal with these challenges, there has been a lot of work on building large-scale distributed file systems. Distributed data storage has been identified as one of the challenges in cloud computing [6]. An efficient distributed file system needs to:

1. provide large bandwidth for data access from multiple concurrent jobs,
2. operate reliably amidst hardware failures, and
3. be able to scale to many millions or billions of files and thousands of machines.

The Google File System (GFS) [7] was proposed to meet the above requirements and has since been cloned in open source projects such as the Hadoop Distributed File System (HDFS) and the Kosmos File System (KFS), which are used by companies such as Yahoo, Facebook, Amazon and Baidu. The GFS architecture was picked for its simplicity and works well for hundreds of terabytes with a few million files [8]. One of the direct implications of storing all the metadata in memory is that the size of the metadata is limited by the memory available. A typical GFS master is capable of handling a few thousand operations per second [8], but when massively parallel applications like a MapReduce [2] job with many thousands of mappers need to open a number of files, the GFS master becomes overloaded. Though the probability of a single server failing in a datacenter is low and the GFS master is continuously monitored, it still remains a single point of failure for the system.

With storage requirements growing to petabytes, there is a need for distributing the metadata storage to more than one server. Having multiple servers to handle failure would increase the overall reliability of the system and reduce the downtime visible to clients. As datacenters grow to accommodate many thousands of machines in one location, distributing the metadata operations among multiple servers would be necessary to increase the throughput. Handling metadata operations efficiently is an important aspect of the file system, as they constitute up to half of file system workloads [9]. While the I/O bandwidth available for a distributed file system can be increased by adding more data storage servers, scaling metadata management involves dealing with consistency issues across replicated servers.
Peer-to-peer storage systems [10], studied previously, provide decentralized control and tolerance to failures in untrusted, Internet-scale environments. Grid computing has been suggested as a potential environment for peer-to-peer ideas [11]. Similarly, we believe that large-scale cloud computing applications could benefit by adopting peer-to-peer system designs.

In this work, we address the above-mentioned limitations and present the design of Ring File System (RFS), a distributed file system for large-scale data-intensive applications. In RFS, the metadata is distributed among multiple replicas connected using a Distributed Hash Table (DHT). This design provides better fault tolerance and scalability while ensuring a high throughput for metadata operations from multiple clients. The major contributions of our work include:

1. A metadata storage architecture that provides fault tolerance, improved throughput and increased scalability for the file system.
2. Studying the impact of the proposed design through analysis and simulations.
3. Implementing and deploying RFS on a 16-node cluster and comparing it with HDFS and KFS.

The rest of this chapter is organized as follows: We first provide a background on two popular traditional distributed filesystems, NFS and AFS, in Section 2. Then, we discuss how peer-to-peer system ideas have been used to design distributed storage systems in Section 3. Section 4 discusses how current cloud storage systems are designed based on an amalgamation of ideas from traditional and P2P-based file systems. We describe the design of our system, RFS, in Section 5 and analyze its implications in Section 6. We then demonstrate the scalability and fault tolerance of our design through simulations, followed by implementation results, in Section 7. Section 8 summarizes the metadata management techniques in existing distributed file systems and their limitations. We discuss possible future work and conclude in Section 9.

2 Traditional Distributed Filesystems

In this section, we examine two traditional distributed filesystems, NFS and AFS, and focus on their design goals and consistency mechanisms. Traditional distributed filesystems are typically geared towards providing sharing capabilities among multiple (human) users under a common administrative domain.
2.1 NFS

The NFS [12] protocol has been an industry standard since its introduction by Sun Microsystems in the 1980s. It allows remote clients to mount file systems over the network and interact with those file systems as if they were mounted locally. Although the first implementation of NFS was in a Unix environment, NFS is now implemented within several different OS environments. The file manipulation primitives supported by NFS are: read, write, create a file or directory, and remove a file or directory.

The NFS protocol is designed to be machine, operating system, network architecture, and transport protocol independent. This independence is achieved through the use of Remote Procedure Call (RPC) primitives built on top of an External Data Representation (XDR). NFS uses the Virtual File System (VFS) layer to handle local and remote files. VFS provides a standard file system interface and allows NFS to hide the difference between accessing local and remote file systems.

NFS is a stateless protocol, i.e., the server does not maintain the state of files: there are no open or close primitives in NFS. Hence, each client request contains all the necessary information for the server to complete the request, and the server responds fully to every client's request without being aware of the conditions under which the client is making the request. Only the client knows the state of a file for which an operation is requested. If the server and/or the client maintained state, the failure of a client, server or the network would be difficult to recover from. NFS uses the mount protocol to access remote files, which establishes a local name for remote files. Thus, users access remote files using local names, while the OS takes care of the mapping to remote names.

Most NFS implementations provide session semantics for performance reasons: no changes are visible to other processes until the file is closed. Using local caches greatly improves performance at the cost of consistency and reliability. Different implementations use different caching policies. Sun's implementation allows cached data to be stale for up to 30 seconds. Applications can use locks in order to ensure consistency. Unlike earlier versions, the NFS version 4 protocol supports traditional file access while integrating support for file locking and the mount protocol. In addition, support for strong security (and its negotiation), compound operations, client caching, and internationalization has been added. The client checks cache validity when the file is opened, and modified data is written back to the server when the file is closed.

Parallel NFS (pNFS) is part of the NFS v4.1 standard and allows clients to access storage devices directly and in parallel. The pNFS architecture eliminates the scalability and performance issues associated with NFS servers in earlier deployments. This is achieved by the separation of data and metadata, and by moving the metadata server out of the data path.
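To make the statelessness described above concrete, the following minimal sketch models a server whose requests are entirely self-contained: each call carries a file handle, an offset and a length, and no per-client open/close state survives between calls. The class and handle scheme are our own illustration of the idea, not the actual NFS protocol machinery.

```python
import hashlib

class StatelessFileServer:
    """Toy model of a stateless (NFS-like) file service.

    Every request is self-contained: it names the file handle, offset and
    length, so the server keeps no per-client session state between calls.
    """

    def __init__(self, export_root: dict):
        # export_root maps an opaque file handle to the file's contents.
        self.files = export_root

    def read(self, file_handle: str, offset: int, length: int) -> bytes:
        # The server can answer using only the arguments of this request.
        data = self.files[file_handle]
        return data[offset:offset + length]

    def write(self, file_handle: str, offset: int, payload: bytes) -> int:
        data = bytearray(self.files.get(file_handle, b""))
        data[offset:offset + len(payload)] = payload
        self.files[file_handle] = bytes(data)
        return len(payload)

def make_handle(path: str) -> str:
    # Handles are opaque to clients; hashing the path is one simple choice.
    return hashlib.sha1(path.encode()).hexdigest()

# Usage: each call stands alone -- there is no open() or close().
server = StatelessFileServer({make_handle("/export/a.txt"): b"hello world"})
print(server.read(make_handle("/export/a.txt"), 0, 5))  # b'hello'
```

Because any single request can be replayed against a freshly restarted server, crash recovery on either side reduces to simply retrying the request.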
2.2 AFS

The Andrew File System (AFS) was developed as part of the Andrew project at Carnegie Mellon University. AFS is designed to allow users with workstations to share data easily. The design of AFS consists of two components: a set of centralized file servers and a communication network, called Vice, and a client process named Venus that runs on every workstation. The distributed file system is mounted as a single tree in every workstation, and Venus communicates with Vice to open files and manage the local cache.

The two main design goals of AFS are scalability and security. Scalability is achieved by caching relevant information in the clients to support a large number of clients per server. In the first version of AFS, clients cached the pathname prefix information and directed requests to the appropriate server. Additionally, the file cache used in AFS-1 was pessimistic and verified whether the cache was up to date every time a file was opened. AFS-2 was designed to improve the performance and overcome some of the administrative difficulties found in AFS-1. The cache coherence protocol in AFS-2 assumes that the cache is valid unless the client is notified by a callback. AFS-2 also introduced the notion of data volumes to eliminate the static mapping from files to servers. A volume consists of a partial subtree, and many volumes are contained in a single disk partition. Furthermore, using volumes helps the design of other features like read-only snapshots, backups and per-user disk quotas. AFS-2 was used for around four years at CMU, and experiments showed that its performance was better than that of NFS [13].

The third version of AFS was motivated by the need to support multiple administrative domains. Such a design could support a federation of cells but present users with a single unified namespace. AFS was also commercialized during this time, and the development of AFS-3 was continued at Transarc Corporation. The currently available implementation of AFS is a community-supported distribution named OpenAFS. AFS has also played an important role in shaping the design of other distributed file systems. Coda, a highly available distributed file system also developed at CMU, is a descendant of AFS-2. The design of NFSv4, published in 2003, was also heavily influenced by AFS.

3 P2P-based Storage Systems

The need for sharing files over the Internet led to the birth of peer-to-peer systems like Napster and Gnutella. In this section, we describe the design of two storage systems based on Distributed Hash Tables (DHTs).
3.1 OceanStore

OceanStore [10] is a global persistent data store that aims to provide a consistent, highly available storage utility. There are two differentiating design goals:

1. the ability to be constructed from untrusted infrastructure, and
2. aggressive, promiscuous caching.

OceanStore assumes that the infrastructure is fundamentally untrusted: any server may crash without warning, leak information to third parties or be compromised. OceanStore caches data promiscuously, anywhere and anytime, in order to provide faster access and robustness to network partitions. Although aggressive caching complicates data coherence and location, it provides greater flexibility to optimize locality and trades off consistency for availability. It also helps to reduce network congestion by localizing access traffic. Promiscuous caching requires redundancy and cryptographic techniques to ensure the integrity and authenticity of the data. OceanStore employs a Byzantine-fault-tolerant commit protocol to provide strong consistency across replicas. The OceanStore API also allows applications to weaken their consistency restrictions in exchange for higher performance and availability.

A version-based archival storage system provides durability. OceanStore stores each version of a data object in a permanent, read-only form, which is encoded with an erasure code and spread over hundreds or thousands of servers. A small subset of the encoded fragments is sufficient to reconstruct the archived object; only a global-scale disaster could disable enough machines to destroy the archived object. The OceanStore introspection layer adapts the system to improve performance and fault tolerance. Internal event monitors collect and analyze information such as usage patterns, network activity, and resource availability. OceanStore can then adapt to regional outages and denial-of-service attacks, proactively migrate data towards areas of use and maintain sufficiently high levels of data redundancy.

OceanStore objects are identified by a globally unique identifier (GUID), which is the secure hash (e.g., SHA-1 [14]) of the owner's key and a human-readable name. This scheme allows servers to verify an object's owner efficiently and facilitates access checks and resource accounting. OceanStore uses Tapestry [15] to store and locate objects. Tapestry is a scalable overlay network, built on TCP/IP, that frees the OceanStore implementation from worrying about the location of resources. Each message sent through Tapestry is addressed with a GUID rather than an IP address; Tapestry routes the message to a physical host containing a resource with that GUID. Further, Tapestry is locality aware: if there are several resources with the same GUID, it locates (with high probability) one that is among the closest to the message source.
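The naming scheme can be illustrated with a short sketch: a self-certifying GUID is derived by hashing the owner's public key together with a human-readable name. This is only an illustration of the idea under our own encoding choices; the exact serialization OceanStore uses is not reproduced here.

```python
import hashlib

def object_guid(owner_public_key: bytes, name: str) -> str:
    """Derive a self-certifying object identifier from the owner's key and a
    human-readable name (illustrative sketch, using SHA-1 as in [14])."""
    digest = hashlib.sha1()
    digest.update(owner_public_key)
    digest.update(name.encode("utf-8"))
    return digest.hexdigest()

# Anyone holding the owner's public key can recompute the GUID and thereby
# check which principal an object belongs to, without a central authority.
guid = object_guid(b"-----BEGIN PUBLIC KEY----- ...", "photos/2010/beach.jpg")
print(guid)
```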
3.2 PAST

PAST is a large-scale, decentralized, persistent, peer-to-peer storage system that aims to provide high availability and scalability. PAST is composed of nodes connected to the Internet, and an overlay routing network among the nodes is constructed using Pastry [16]. PAST supports three operations:

1. Insert: Stores a given file at k different locations in the PAST network. k represents the number of replicas created and can be chosen by the user. The file identifier is generated using a SHA-1 hash of the file name, the owner's public key and a random salt to ensure that identifiers are unique. The k nodes whose identifiers are closest to the fileId are selected to store the file.
2. Lookup: Retrieves a copy of the file from the nearest available replica.
3. Reclaim: Reclaims the storage of the k copies of the file but does not guarantee that the file is no longer available.

Since node identifiers and file identifiers are uniformly distributed in their domains, the number of files stored by each node is roughly balanced. However, due to variation in the size of the inserted files and the capacity of each PAST node, there can be storage imbalances in the system. Two schemes are proposed to handle such imbalances. In the first scheme, called replica diversion, if one of the k closest nodes to the given fileId does not have enough space to store the file, a node from the leaf set is chosen and a pointer is maintained in the original node. To handle failures, this pointer is also replicated. If no suitable node is found for replica diversion, the entire request is reverted and the client is forced to choose a different fileId by using a different random salt. This scheme is called file diversion and is a costly operation. Simulations show that file diversion is required for only up to 4% of the requests when the leaf set size is 32.

The replication of a file in PAST to k different locations is meant to ensure high availability. However, some popular files could require more than k replicas to minimize latency and improve performance. PAST uses any unused space in the nodes to cache files which are frequently accessed. When a file is routed through a node during an insert or lookup operation, it is cached if its size is less than a fraction of the available free space. The cached copies of a file are maintained in addition to the k replicas and are evicted when space is required for a new file.
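The insert operation described above can be sketched as follows: a fileId is derived from the file name, the owner's public key and a random salt, and the k live nodes whose identifiers are numerically closest to it on the ring hold the replicas. The ring size and the node-selection helper below are simplifications of ours, not PAST's actual routines.

```python
import hashlib
import os

RING_BITS = 128
RING_SIZE = 2 ** RING_BITS

def file_id(file_name: str, owner_public_key: bytes, salt: bytes) -> int:
    # SHA-1 over the name, the owner's key and a random salt, as in Insert.
    digest = hashlib.sha1(file_name.encode() + owner_public_key + salt)
    return int(digest.hexdigest(), 16) % RING_SIZE

def ring_distance(a: int, b: int) -> int:
    # Distance on the circular identifier space (shorter way around).
    d = abs(a - b)
    return min(d, RING_SIZE - d)

def replica_nodes(fid: int, node_ids: list, k: int) -> list:
    # The k nodes whose identifiers are closest to the fileId store replicas.
    return sorted(node_ids, key=lambda nid: ring_distance(nid, fid))[:k]

# Usage: insert one file with k = 3 replicas on a toy set of node identifiers.
nodes = [file_id(f"node-{i}", b"", b"") for i in range(8)]
fid = file_id("report.pdf", b"owner-public-key", os.urandom(20))
print(replica_nodes(fid, nodes, k=3))
```

Because both node identifiers and fileIds are uniformly distributed, this closest-k rule is what spreads files roughly evenly across nodes.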
4 Cloud Storage Systems

Motivated by the need to process the terabytes of data generated by their systems, cloud storage systems need to provide high performance at large scale. In this section, we examine two cloud computing storage systems.

4.1 Google File System

Around 2000, Google designed and implemented the Google File System (GFS) [7] to provide support for large, distributed, data-intensive applications. One of the requirements was to run on cheap commodity hardware and deliver good aggregate throughput to a large number of clients. GFS is designed for storing a modest number (millions) of huge files. It is optimized for large streaming reads and writes. Though small random reads and writes are supported, performing them efficiently is a non-goal.

The GFS architecture comprises a single GFS master server, which stores the metadata of the file system, and multiple slaves known as chunkservers, which store the data. The GFS master stores the metadata, which consists of information such as file names, sizes, the directory structure and block locations, in memory. The chunkservers periodically send heartbeat messages to the master to report their state and get instructions. Files are divided into chunks (usually 64 MB in size) and the GFS master manages the placement and data layout among the various chunkservers. A large chunk size reduces the overhead of the client interacting with the master to find out a chunk's location. For reliability, each chunk is replicated on multiple (by default three) chunkservers.

Having a single master simplifies the design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. Its involvement in reads and writes is minimized so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. Clients cache this information (for a limited time) and communicate directly with the chunkservers for subsequent operations.

GFS supports a relaxed consistency model that is simple and efficient to implement at scale. File namespace mutations at the master are guaranteed to be atomic. Record append causes data to be appended atomically at least once, even in the presence of concurrent mutations. Since clients cache chunk locations, they may read stale data. However, this window is limited by the cache entry's timeout and the next open of the file. Data corruption (like bit rot) is detected through checksumming by the chunkservers.
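The read path just described can be made concrete with a small sketch: the client turns a byte offset into a chunk index, asks the master for that chunk's locations only on a cache miss or after the entry expires, and then talks to a chunkserver directly. The class and method names below are our own illustration and not GFS's actual client API.

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in GFS

class ChunkLocationCache:
    """Client-side cache of chunk locations, refreshed from the master only
    on a miss or after the entry's timeout (illustrative sketch)."""

    def __init__(self, master, ttl_seconds=60.0):
        self.master = master       # assumed to expose lookup(path, chunk_index)
        self.ttl = ttl_seconds
        self.entries = {}          # (path, chunk_index) -> (locations, expiry)

    def chunk_index(self, offset: int) -> int:
        # The byte offset within the file determines which chunk holds it.
        return offset // CHUNK_SIZE

    def locations(self, path: str, offset: int):
        key = (path, self.chunk_index(offset))
        cached = self.entries.get(key)
        if cached and cached[1] > time.time():
            return cached[0]       # cache hit: no interaction with the master
        locations = self.master.lookup(path, key[1])   # single metadata RPC
        self.entries[key] = (locations, time.time() + self.ttl)
        return locations
```

The timeout on each cache entry is exactly the window during which a client may act on stale chunk locations, which is why the consistency model above bounds staleness by the cache timeout and the next open of the file.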
4.2 Dynamo

Dynamo [17] is a distributed key-value store used at Amazon; it is primarily built to replace relational databases with a key-value store offering eventual consistency. Dynamo uses a synthesis of well-known techniques to achieve the goals of scalability and availability. It is designed with churn taken into account: storage nodes can be added or removed without requiring any manual partitioning or redistribution. Data is partitioned and replicated using consistent hashing, and consistency is provided using versioning. Vector clocks with reconciliation during reads are used to provide high availability for writes. Consistency among replicas during failures is maintained by a quorum-like replica synchronization protocol. Dynamo uses a gossip-based distributed failure detection and membership protocol.

The basic consistent hashing algorithm assigns a random position to each node on the ring. This can lead to non-uniform data and load distribution and is oblivious to heterogeneity. Hence, Dynamo uses "virtual nodes": a virtual node looks like a single node in the system, but each physical node can be responsible for more than one virtual node. When a new node is added to the system, it is assigned multiple positions in the ring.

Dynamo provides eventual consistency, which allows updates to be propagated to all replicas asynchronously. A write request can return to the client before the update has been applied at all the replicas, which can result in scenarios where a subsequent read operation returns stale data. In the absence of failures, there is a bound on the update propagation times. However, under certain failures (like network partitions), updates may not propagate to all replicas for a long time. Dynamo shows that an eventually consistent storage system can be a building block for highly available applications.

5 RFS Design

In this section, we present the design of Ring File System (RFS), a distributed file system for large-scale data-intensive applications. In RFS, the metadata is distributed among multiple replicas connected using a Distributed Hash Table (DHT). This design provides better fault tolerance and scalability while ensuring a high throughput for metadata operations from multiple clients.

Our architecture consists of three types of nodes: metaservers, chunkservers and clients, as shown in Figure 1. The metaservers store the metadata of the file system, whereas the chunkservers store the actual contents of the files. Every metaserver has information about the locations of all the other metaservers in the file system. Thus, the metaservers are organized in a single-hop Distributed Hash Table (DHT). Each metaserver has an identifier which is obtained by hashing its MAC address. Chunkservers are grouped into multiple cells and each cell communicates with a single metaserver. This grouping can be performed in two ways. The chunkserver can compute a hash of its MAC address and connect to the metaserver that is its successor in the DHT. This makes the system more self-adaptive, since the file system is symmetric with respect to each metaserver. The alternative is to configure each chunkserver to connect to a particular metaserver alone. This gives more control over the mapping of chunkservers to metaservers and can be useful in configuring geographically distributed cells, each having its own metaserver.

The clients distribute the metadata for files and directories over the DHT by computing a hash of the parent path present in the file operation, as sketched below. Using the parent path implies that the metadata for all the files in a given directory is present at the same metaserver.
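The sketch below illustrates this placement rule under our own naming: the parent path is hashed onto the same identifier space as the metaservers, and the request goes to the first metaserver whose identifier is equal to or follows that hash on the ring (its successor). It is only an illustration of the lookup; the actual RFS implementation is built on KFS and its internal interfaces are not reproduced here.

```python
import hashlib
from bisect import bisect_left

def ring_id(value: str, bits: int = 32) -> int:
    # Map a string (MAC address or parent path) onto the DHT identifier space.
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** bits)

class MetaserverRing:
    """Single-hop DHT view that every client and metaserver keeps:
    the full, sorted list of metaserver identifiers."""

    def __init__(self, metaserver_macs):
        self.ring = sorted((ring_id(mac), mac) for mac in metaserver_macs)

    def successor(self, key: int) -> str:
        # First metaserver whose identifier >= key, wrapping around the ring.
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, key) % len(self.ring)
        return self.ring[index][1]

    def metaserver_for(self, file_path: str) -> str:
        # Hash the parent path, so all entries of a directory map to one server.
        parent = file_path.rsplit("/", 1)[0] or "/"
        return self.successor(ring_id(parent))

# Usage: both files land on the metaserver responsible for /dir1/dir2.
ring = MetaserverRing(["aa:00:01", "aa:00:02", "aa:00:03", "aa:00:04"])
print(ring.metaserver_for("/dir1/dir2/file_a"))
print(ring.metaserver_for("/dir1/dir2/file_b"))
```

Because the DHT is single hop, the client resolves the responsible metaserver locally and needs no overlay routing before issuing its metadata request.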
Co-locating a directory's metadata on one metaserver makes listing the contents of a directory efficient, an operation commonly used by MapReduce [2] and other cloud computing applications.

Fig. 1 Architecture of RingFS

5.1 Normal Operation

We demonstrate the steps involved in the creation of a file when there are no failures in the system. The sequence of operations shown in Figure 1 is:

1. The client wishes to create a file named /dir1/dir2/filename. It computes a hash of the parent path, /dir1/dir2, to determine that it has to contact metaserver M0 for this file operation.
2. The client issues a create request to this metaserver, which adds a record to its metatable and allocates space for the file in cell 0.
3. Before returning the response to the client, M0 sends a replication request to r of its successors, M1, M2, ..., Mr, in the DHT to perform the same operation on their replica metatables.
4. All of the successor metaservers send replies to M0. Synchronous replication is necessary to ensure consistency in the event of failures of metaservers.
5. M0 sends back the response to the client.
6. The client then contacts the chunkserver and sends the actual file contents.
7. The chunkserver stores the file contents.

Thus, in all, r metadata Remote Procedure Calls (RPCs) are needed for a write operation. If multiple clients try to create a file or write to the same file, consistency is ensured by the fact that these mutating operations are serialized at the primary metaserver for that file.

The read operation is performed similarly, by using the hash of the parent path to determine the metaserver to contact. This metaserver directly replies with the metadata information of the file and the location of the chunks. The client then communicates directly with the chunkservers to read the contents of the file. Thus, read operations need a single metadata RPC.
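A compressed, illustrative view of steps 2-5 on the primary metaserver is given below. The RPC layer, metatable structure and error handling are stand-ins of our own, intended only to show that the reply to the client is withheld until all r successor replicas have acknowledged the mutation.

```python
class MetadataReplicationError(Exception):
    """Raised when a mutation cannot be applied on all r successor replicas."""

class PrimaryMetaserver:
    def __init__(self, metatable: dict, successors, r: int):
        self.metatable = metatable      # path -> file metadata record
        self.successors = successors    # replica stubs exposing apply(op) -> bool
        self.r = r                      # metadata replication factor

    def create(self, path: str, record: dict) -> dict:
        # Step 2: apply the mutation locally and allocate space in the cell.
        self.metatable[path] = record
        # Steps 3-4: replicate synchronously to r successors before replying.
        op = {"type": "create", "path": path, "record": record}
        acked = sum(1 for replica in self.successors[: self.r]
                    if replica.apply(op))
        if acked < self.r:
            raise MetadataReplicationError(f"only {acked}/{self.r} replicas acked")
        # Step 5: only now is the client told that the create succeeded.
        return {"status": "ok", "primary_copy": record}
```

Serializing all mutations for a given directory at its primary metaserver, and replying only after the replicas acknowledge, is what lets any successor take over with an up-to-date metatable when the primary fails.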
5.2 Failure and Recovery

Fig. 2 Tolerating metaserver failures

Let us now consider the case where metaserver M0 has failed. The chunkservers in cell 0 detect the failure through heartbeat messages and connect to the next server, M1, in the DHT. When a client wishes to create a file, its connection is now handled by M1 in place of M0. We replicate the metadata to r successive servers for this request. M1 also allocates space for the file in cell 0 and manages the layout and replication for the chunkservers in cell 0. Once M0 recovers, it sends a request to its neighboring metaservers M1, M2, ..., Mr to obtain the latest version of the metadata. On receipt of this request, M1 sends the metadata which belongs to M0 and also closes its connection with the chunkservers in cell 0. The chunkservers then reconnect to M0, which takes over the layout management for this cell and verifies the file chunks based on the latest metadata version obtained. Also, Mr lazily deletes the (r+1)th copy of the metadata. Thus, our design guarantees strict consistency through the use of synchronous replication. In order to withstand the failure of a machine or a rack in a datacenter, a suitable number of replicas can be chosen using techniques from Carbonite [18].

If instead Mk, one of the r successors of M0, fails in step 3, then M0 retries the replication request a fixed number of times. In the meantime, the underlying DHT stabilization protocol updates the routing table and Mk+1 handles the requests directed to the namespace previously serviced by Mk. If M0 is unable to replicate the metadata to r successors, then it sends an error message back to the client.
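The rejoin path in the paragraph above can be summarized as the following sketch; the message names and helper methods are hypothetical, chosen only to show the order of the steps (fetch the newest metadata from the successors, take the cell back, and let the surplus copy be dropped lazily).

```python
def rejoin(recovered, successors, cell_chunkservers, r: int):
    """Sketch of the recovery protocol for a metaserver that comes back up.

    recovered         -- the metaserver object that just restarted
    successors        -- its r successors on the ring (M1 ... Mr)
    cell_chunkservers -- chunkservers of the cell it is responsible for
    """
    # 1. Ask the successors for the newest version of this server's metadata.
    latest = max((s.metadata_owned_by(recovered.node_id) for s in successors),
                 key=lambda snapshot: snapshot["version"])
    recovered.metatable = latest["entries"]

    # 2. The acting primary (M1) releases the cell; the chunkservers reconnect.
    successors[0].release_cell(recovered.node_id)
    for chunkserver in cell_chunkservers:
        chunkserver.connect(recovered)

    # 3. Verify stored chunks against the recovered metadata version, and let
    #    Mr drop the now-surplus (r+1)th metadata copy lazily in the background.
    recovered.verify_chunks(latest["version"])
    successors[-1].schedule_lazy_delete(recovered.node_id)
```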
6 Analysis

In this section, we present a mathematical analysis comparing the designs of GFS and RFS with respect to scalability and throughput, followed by a failure analysis. The notation used is:

m : number of metaservers
r : number of times the metadata is replicated
R, W : baseline read and write throughputs

Metric                                      GFS   RFS
Metaserver failures that can be tolerated   0     r - 1
RPCs required for a read                    1     1
RPCs required for a write                   1     r
Metadata throughput for reads               R     R · m
Metadata throughput for writes              W     W · m/r

Table 1 Analytical comparison of GFS and RFS.

6.1 Design Analysis

Let the total number of machines in the system be n. In GFS, there is exactly one metaserver and the remaining n - 1 machines are chunkservers that store the actual data. Since there is only one metaserver, the metadata is not replicated and the file system cannot survive the crash of the metaserver. In RFS, we have m metaservers that distribute the metadata, replicating it r times. RFS can thus survive the crash of r - 1 metaservers. Although a single Remote Procedure Call (RPC) is enough for a lookup using a hash of the path, r RPCs are needed for the creation of a file, since the metadata has to be replicated to r other servers. Since m metaservers can handle the read operations for different files, the read metadata throughput is m times that of GFS. Similarly, the write metadata throughput is m/r times that of GFS, since it is distributed over m metaservers but replicated r times. This analysis is summarized in Table 1.

6.2 Failure Analysis

Failures are assumed to be independent. This assumption is reasonable because we have only tens of metaservers and they are distributed across racks and potentially different clusters. We ignore the failure of chunkservers in this analysis since it has the same effect on both designs and simplifies our analysis.
Let f = 1/MTBF be the probability that the metaserver fails in a given time, and let R_g be the time required to recover it. The file system is unavailable for a fraction R_g · f of the time. If GFS is deployed with a hot-standby master replica, GFS is unavailable for a fraction R_g · f^2 of the time, when both of them fail. For example, if the master server fails once a month and it takes 6 hours for it to recover, then the file system availability with a single master is 99.18% and increases to 99.99% with a hot standby.

Let m be the number of metaservers in our system, r be the number of times the metadata is replicated, f be the probability that a given server fails in a given time t, and R_r be the time required to recover it. Since the recovery time of a metaserver is proportional to the amount of metadata stored on it and the metadata is replicated r times, R_r will be roughly equal to r · R_g / m. The probability that any r consecutive metaservers in the ring go down is m · f^r · (1 - f)^(m-r). If we have m = 10 metaservers, r = 3 copies of the metadata and the MTBF is 30 days, then this probability is 0.47%. However, a portion of our file system is unavailable if and only if all the replicated metaservers go down within the recovery time of each other. This happens with a probability of F_r = m · f · (f · R_r / t)^(r-1) · (1 - f)^(m-r), assuming that the failures are equally distributed over time. The file system is unavailable for a fraction F_r · R_r of the time. Continuing with the example and substituting appropriate values, we find that the recovery time would be 1.8 hours, further improving availability over the single-master design.
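As a rough sanity check on the figures quoted above, the short script below redoes the arithmetic under our own interpretation that f is a per-hour failure rate (1/MTBF), recovery times are in hours, and unavailability is approximated as recovery time multiplied by f; the variable names are ours, not the chapter's.

```python
# Back-of-the-envelope availability check for the example in Section 6.2.
MTBF_HOURS = 30 * 24          # master fails about once a month
f = 1.0 / MTBF_HOURS          # failure rate per hour
R_g = 6.0                     # GFS master recovery time (hours)

single_master_avail = 1 - R_g * f          # ~99.2%, matching the quoted 99.18%
hot_standby_avail   = 1 - R_g * f ** 2     # ~99.99%

m, r = 10, 3                   # RFS example: 10 metaservers, 3 metadata copies
R_r = r * R_g / m              # each metaserver holds r/m of the metadata
print(f"single master availability: {single_master_avail:.4%}")
print(f"hot standby availability:   {hot_standby_avail:.4%}")
print(f"RFS metaserver recovery:    {R_r:.1f} hours")   # 1.8 hours
```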
7 Experiments

In this section, we present experimental results obtained from our prototype implementation of RFS. Our implementation is based on KFS, with modified data structures for metadata management and the ability for metaservers to recover from failures by communicating with their replicas. To study the behavior on large networks of nodes, we also implemented a simulation environment. All experiments were performed on sixteen 8-core HP DL160 machines (Intel Xeon 2.66 GHz CPUs) with 16 GB of main memory, running CentOS 5.4. The MapReduce implementation used was Hadoop, executed using Sun's Java SDK. We compare our results against the Hadoop Distributed File System (HDFS) that accompanied the Hadoop release and the Kosmos File System (KFS) 0.4. For the HDFS and KFS experiments, a single server is configured as the metaserver and the other 15 nodes as chunkservers. RFS is configured with 3 metaservers and 5 chunkservers connecting to each of them. We replicate the metadata three times in our experiments.

Fig. 3 CDF of the number of successful lookups for different failure probabilities

7.1 Simulation

The fault tolerance of a design is difficult to measure without a large-scale deployment. Hence, we chose to model the failures that occur in datacenters using a discrete iterative simulation. Each metaserver is assumed to have a constant and independent failure probability. The results show that RFS has better fault tolerance than the single-master (GFS) design. In the case of GFS, if the metaserver fails, the whole file system is unavailable and the number of successful lookups is 0 until it recovers after some time. In RFS, we configure 10 metaservers and each fails independently. The metadata is replicated on the two successor metaservers. A part of the file system is unavailable only when three successive metaservers fail. Figure 3 shows a plot of the CDF of the number of successful lookups for GFS and RFS for different probabilities of failure. As the failure probability increases, the number of successful lookups decreases. Less than 10% of the lookups fail in RFS in all the cases.

7.2 Fault Tolerance

The second experiment demonstrates the fault tolerance of our implementation. A client sends 150 metadata operations per second and the number of successful operations is plotted over time for HDFS, KFS and RFS in Figure 4. HDFS achieves a steady-state throughput, but when the metaserver is killed, the complete file system becomes unavailable. Around t = 110s, the metaserver is restarted; it recovers from its checkpointed state and replays the log of operations that could not be checkpointed.
The spike during the recovery happens because the metaserver buffers the requests while it is recovering and batches them together. A similar trend is observed in the case of KFS, in which we kill the metaserver at t = 70s and restart it at t = 140s.

Fig. 4 Fault tolerance of HDFS, KFS and RFS

For testing the fault tolerance of RFS, we kill one of the three metaservers at t = 20s, and this does not lead to any decline in the throughput of successful operations. At t = 30s, we kill another metaserver, leaving just one metaserver and leading to a drop in the throughput. At t = 60s, we restart the failed metaserver and the throughput stabilizes to its steady state.

7.3 Throughput

The third experiment demonstrates the metadata throughput performance. A multithreaded client is configured to spawn new threads and perform read and write metadata operations at the appropriate frequency to achieve a target number of queries per second. We then measure how many operations complete successfully each second and use this to compute the server's capacity. Figure 5 shows the load graph comparison for HDFS, KFS and RFS. The throughput of RFS is roughly twice that of HDFS and KFS; although the experiment was conducted with 3 metaservers, the speedup is somewhat lower due to the replication overhead. Also, the performance of HDFS and KFS is quite similar, and RFS with no metadata replication has the same performance as KFS.
Fig. 5 Comparison of throughput under different load conditions

7.4 MapReduce Performance

Fig. 6 MapReduce application - Wordcount

We ran a simple MapReduce application that counts the number of words in a Wikipedia dataset and varied the input dataset size from 2 GB to 16 GB. We measured the time taken for the job to complete on all three file systems, and a plot of the results is shown in Figure 6. We observed that for smaller datasets the overhead of replicating the metadata increased the time taken to run the job, but on larger datasets the running times were almost the same for KFS and RFS.
8 Comparison with Related Work

Metadata management has been implemented in file systems such as NFS and AFS [13] by statically partitioning the directory hierarchy across different servers. This requires an administrator to assign directory subtrees to each server, but it enables clients to easily determine which server has the metadata for a given file name. Techniques of hashing a file name or the parent directory name to locate a server have been previously discussed in file systems such as Vesta [19] and Lustre [20]. Ceph [21], a petabyte-scale file system, uses a dynamic metadata distribution scheme where subtrees are migrated when the load on a server increases. Hashing schemes have been found to be inefficient when trying to satisfy POSIX directory access semantics, as this would involve contacting more than one server. However, studies have shown that most cloud computing applications do not require strict POSIX semantics [7], and with efficient caching of metadata on the clients, the performance overhead can be overcome.

8.1 Peer-to-Peer File Systems

File systems such as PAST [22] and CFS [23] have been built on top of DHTs like Pastry [16] and Chord [24], but they concentrate on storage management in a peer-to-peer system with immutable files. Ivy [25] is a read/write peer-to-peer file system which uses logging and DHash. A more exhaustive survey of peer-to-peer storage techniques for distributed file systems can be found in [26]. Our work differs from these existing file systems in two aspects: (1) Consistency of metadata is crucial in a distributed file system deployed in a datacenter, and our design provides stricter consistency guarantees than these systems through synchronous replication. (2) Existing peer-to-peer file systems place blocks randomly, although some can exploit locality. Our system can implement more sophisticated placement policies (e.g., placing blocks on the closest and least loaded server), since the group of servers which store the metadata have global information about the file system.

8.2 Distributed Key-value Stores

Recently, there have been some efforts to deploy peer-to-peer-like systems as distributed key-value stores in datacenters. Cassandra [27] is a distributed key-value store that has been widely used and provides clients with a simple data model and eventual consistency guarantees.
Cassandra provides a fully distributed design like Dynamo with a column-family-based data model like Bigtable [28]. Key-value stores are often useful for low-latency access to small objects that can tolerate eventual consistency guarantees. RingFS, on the other hand, addresses the problems associated with storing the metadata for large files in a hierarchical file system and provides stronger consistency guarantees.

9 Conclusion

Today's cloud computing storage systems need to be scalable, elastic and fault tolerant. We surveyed how storage systems have evolved from traditional distributed filesystems (NFS and AFS) and peer-to-peer storage systems (OceanStore and PAST), and how these ideas have been synthesized in current cloud computing storage systems (GFS and Dynamo). We presented and evaluated RFS, a scalable, fault-tolerant and high-throughput file system that is well suited for large-scale data-intensive applications. RFS can tolerate the failure of multiple metaservers and can handle a large number of files. The number of files that can be stored in RFS and the throughput of metadata operations scale linearly with the number of servers. RFS performs better than HDFS and KFS in terms of fault tolerance, scalability and throughput.

Peer-to-peer systems are decentralized and self-organizing. Thus, they are attractive for datacenters built with commodity components with high failure rates, especially as the size of datacenters increases. We have shown how a single-hop Distributed Hash Table, an idea from peer-to-peer systems, can be used to manage metadata and combined with the traditional client-server model for managing the actual data. We envision that more ideas from peer-to-peer systems research can be applied to building systems that scale to large datacenters with hundreds of thousands of machines distributed across multiple sites.

References

1. J. Gray, "Distributed computing economics," Queue, vol. 6, no. 3, 2008.
2. J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, 2008.
3. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: distributed data-parallel programs from sequential building blocks," in EuroSys '07: Proc. of the 2nd ACM SIGOPS European Conference on Computer Systems, New York, NY, USA, 2007.
4. J. Dean, "Large-Scale Distributed Systems at Google: Current Systems and Future Directions," keynote talk, 2009.
5. J. Gantz and D. Reinsel, "As the economy contracts, the Digital Universe expands," IDC Multimedia White Paper, 2009.
6. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," EECS Department, University of California, Berkeley, Tech. Rep., 2009.
7. S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," SIGOPS Oper. Syst. Rev., vol. 37, no. 5, 2003.
8. M. K. McKusick and S. Quinlan, "GFS: Evolution on Fast-forward," Queue, vol. 7, no. 7, 2009.
9. D. Roselli, J. Lorch, and T. Anderson, "A comparison of file system workloads," in Proceedings of the USENIX Annual Technical Conference, USENIX Association, 2000.
10. J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, "OceanStore: an architecture for global-scale persistent storage," in Proc. of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000.
11. J. Ledlie, J. Shneidman, M. Seltzer, and J. Huth, "Scooped, again," Lecture Notes in Computer Science, 2003.
12. R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, "Design and implementation of the Sun network filesystem," in Proceedings of the Summer 1985 USENIX Conference, 1985.
13. J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, and M. West, "Scale and performance in a distributed file system," ACM Transactions on Computer Systems (TOCS), vol. 6, no. 1, 1988.
14. D. Eastlake and P. Jones, "US secure hash algorithm 1 (SHA1)," RFC 3174, September 2001.
15. B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, "Tapestry: a resilient global-scale overlay for service deployment," IEEE Journal on Selected Areas in Communications, January 2004.
16. A. Rowstron and P. Druschel, "Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems," in ACM Middleware, November 2001.
17. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," ACM SIGOPS Operating Systems Review, vol. 41, no. 6, 2007.
18. B. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. Kaashoek, J. Kubiatowicz, and R. Morris, "Efficient replica maintenance for distributed storage systems," in Proc. of NSDI, 2006.
19. P. Corbett and D. Feitelson, "The Vesta parallel file system," ACM Transactions on Computer Systems (TOCS), vol. 14, no. 3, 1996.
20. P. Schwan, "Lustre: building a file system for 1000-node clusters," in Proceedings of the 2003 Linux Symposium, 2003.
21. S. Weil, S. Brandt, E. Miller, D. Long, and C. Maltzahn, "Ceph: a scalable, high-performance distributed file system," in Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), 2006.
22. P. Druschel and A. Rowstron, "PAST: a large-scale, persistent peer-to-peer storage utility," in Proc. HotOS VIII, 2001.
23. F. Dabek, M. Kaashoek, D. Karger, R. Morris, and I. Stoica, "Wide-area cooperative storage with CFS," ACM SIGOPS Operating Systems Review, vol. 35, no. 5, 2001.
24. I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, "Chord: a scalable peer-to-peer lookup service for Internet applications," in ACM SIGCOMM, August 2001.
25. A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen, "Ivy: a read/write peer-to-peer file system," in OSDI, December 2002.
26. R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, and R. Campbell, "A survey of peer-to-peer storage techniques for distributed file systems," in ITCC, 2005.
27. A. Lakshman and P. Malik, "Cassandra: structured storage system on a P2P network," in Proc. of the 28th ACM Symposium on Principles of Distributed Computing, 2009.
28. F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber, "Bigtable: a distributed storage system for structured data," in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI '06), 2006.

Acknowledgements This work was funded in part by an NSF IIS grant and in part by an NSF CCF grant. The views expressed are those of the authors only.
Distributed Metadata Management Scheme in HDFS
International Journal of Scientific and Research Publications, Volume 3, Issue 5, May 2013 1 Distributed Metadata Management Scheme in HDFS Mrudula Varade *, Vimla Jethani ** * Department of Computer Engineering,
Lifetime Management of Cache Memory using Hadoop Snehal Deshmukh 1 Computer, PGMCOE, Wagholi, Pune, India
Volume 3, Issue 1, January 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com ISSN:
Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc [email protected]
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc [email protected] What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM Julia Myint 1 and Thinn Thu Naing 2 1 University of Computer Studies, Yangon, Myanmar [email protected] 2 University of Computer
Big data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
Network File System (NFS) Pradipta De [email protected]
Network File System (NFS) Pradipta De [email protected] Today s Topic Network File System Type of Distributed file system NFS protocol NFS cache consistency issue CSE506: Ext Filesystem 2 NFS
Tornado: A Capability-Aware Peer-to-Peer Storage Network
Tornado: A Capability-Aware Peer-to-Peer Storage Network Hung-Chang Hsiao [email protected] Chung-Ta King* [email protected] Department of Computer Science National Tsing Hua University Hsinchu,
Peer-to-Peer and Grid Computing. Chapter 4: Peer-to-Peer Storage
Peer-to-Peer and Grid Computing Chapter 4: Peer-to-Peer Storage Chapter Outline Using DHTs to build more complex systems How DHT can help? What problems DHTs solve? What problems are left unsolved? P2P
BlobSeer: Towards efficient data storage management on large-scale, distributed systems
: Towards efficient data storage management on large-scale, distributed systems Bogdan Nicolae University of Rennes 1, France KerData Team, INRIA Rennes Bretagne-Atlantique PhD Advisors: Gabriel Antoniu
Apache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction
Network Attached Storage. Jinfeng Yang Oct/19/2015
Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability
Reduction of Data at Namenode in HDFS using harballing Technique
Reduction of Data at Namenode in HDFS using harballing Technique Vaibhav Gopal Korat, Kumar Swamy Pamu [email protected] [email protected] Abstract HDFS stands for the Hadoop Distributed File System.
Distributed file system in cloud based on load rebalancing algorithm
Distributed file system in cloud based on load rebalancing algorithm B.Mamatha(M.Tech) Computer Science & Engineering [email protected] K Sandeep(M.Tech) Assistant Professor PRRM Engineering College
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
Apache Hadoop FileSystem and its Usage in Facebook
Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System [email protected] Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs
Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
The Hadoop Distributed File System
The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS
Diagram 1: Islands of storage across a digital broadcast workflow
XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,
Big Table A Distributed Storage System For Data
Big Table A Distributed Storage System For Data OSDI 2006 Fay Chang, Jeffrey Dean, Sanjay Ghemawat et.al. Presented by Rahul Malviya Why BigTable? Lots of (semi-)structured data at Google - - URLs: Contents,
Jeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
CSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing
Big Data Processing in the Cloud. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center
Big Data Processing in the Cloud Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Data is ONLY as useful as the decisions it enables 2 Data is ONLY as useful as the decisions it enables
Designing a Cloud Storage System
Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system s archival capacity (its ability to persistently store large volumes
New Structured P2P Network with Dynamic Load Balancing Scheme
New Structured P2P Network with Dynamic Load Balancing Scheme Atushi TAKEDA, Takuma OIDE and Akiko TAKAHASHI Department of Information Science, Tohoku Gakuin University Department of Information Engineering,
Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities
Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities Kavitha Ranganathan, Adriana Iamnitchi, Ian Foster Department of Computer Science, The University
Research on Job Scheduling Algorithm in Hadoop
Journal of Computational Information Systems 7: 6 () 5769-5775 Available at http://www.jofcis.com Research on Job Scheduling Algorithm in Hadoop Yang XIA, Lei WANG, Qiang ZHAO, Gongxuan ZHANG School of
Joining Cassandra. Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece [email protected].
Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece [email protected] Joining Cassandra Binjiang Tao Computer Science Department University of Crete Heraklion,
ZooKeeper. Table of contents
by Table of contents 1 ZooKeeper: A Distributed Coordination Service for Distributed Applications... 2 1.1 Design Goals...2 1.2 Data model and the hierarchical namespace...3 1.3 Nodes and ephemeral nodes...
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
THE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
HDFS Space Consolidation
HDFS Space Consolidation Aastha Mehta*,1,2, Deepti Banka*,1,2, Kartheek Muthyala*,1,2, Priya Sehgal 1, Ajay Bakre 1 *Student Authors 1 Advanced Technology Group, NetApp Inc., Bangalore, India 2 Birla Institute
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE
TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE Deploy a modern hyperscale storage platform on commodity infrastructure ABSTRACT This document provides a detailed overview of the EMC
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 8, August 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Load Balancing in Structured P2P Systems
1 Load Balancing in Structured P2P Systems Ananth Rao Karthik Lakshminarayanan Sonesh Surana Richard Karp Ion Stoica ananthar, karthik, sonesh, karp, istoica @cs.berkeley.edu Abstract Most P2P systems
A PROXIMITY-AWARE INTEREST-CLUSTERED P2P FILE SHARING SYSTEM
A PROXIMITY-AWARE INTEREST-CLUSTERED P2P FILE SHARING SYSTEM Dr.S. DHANALAKSHMI 1, R. ANUPRIYA 2 1 Prof & Head, 2 Research Scholar Computer Science and Applications, Vivekanandha College of Arts and Sciences
RADOS: A Scalable, Reliable Storage Service for Petabyte- scale Storage Clusters
RADOS: A Scalable, Reliable Storage Service for Petabyte- scale Storage Clusters Sage Weil, Andrew Leung, Scott Brandt, Carlos Maltzahn {sage,aleung,scott,carlosm}@cs.ucsc.edu University of California,
Dynamo: Amazon s Highly Available Key-value Store
Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and
What is Analytic Infrastructure and Why Should You Care?
What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group [email protected] ABSTRACT We define analytic infrastructure to be the services,
Peer-to-Peer Replication
Peer-to-Peer Replication Matthieu Weber September 13, 2002 Contents 1 Introduction 1 2 Database Replication 2 2.1 Synchronous Replication..................... 2 2.2 Asynchronous Replication....................
Hadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
Performance Evaluation of the Illinois Cloud Computing Testbed
Performance Evaluation of the Illinois Cloud Computing Testbed Ahmed Khurshid, Abdullah Al-Nayeem, and Indranil Gupta Department of Computer Science University of Illinois at Urbana-Champaign Abstract.
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
