An Overview of Codes Tailor-made for Networked Distributed Data Storage
Anwitaman Datta, Frédérique Oggier
Nanyang Technological University, Singapore

Abstract: The continuously increasing amount of digital data generated by today's society calls for better storage solutions. This survey looks at a new generation of coding techniques designed specifically for the needs of distributed networked storage systems, aiming at the best compromise among storage space efficiency, fault tolerance, and maintenance overheads. Four families of codes tailor-made for distributed settings, namely pyramid, hierarchical, regenerating and self-repairing codes, are presented at a high level, emphasizing the main ideas behind each of these codes and discussing their pros and cons, before concluding with a quantitative comparison among them. This survey deliberately excludes technical details of the codes, nor does it provide an exhaustive summary of the numerous related works. Instead, it gives an overview of the major code families in a manner accessible to a broad audience, presenting the big picture of advances in coding techniques for distributed storage solutions.

Keywords: coding techniques, distributed networked storage systems, hierarchical codes, pyramid codes, regenerating codes, self-repairing codes.

I. INTRODUCTION

Similarly to the role granaries played in the march of the agricultural civilizations of yore, massive data stores form a cornerstone of today's knowledge-based societies, apart from being a key enabler of the cloud paradigm. The volume of data to be stored is huge, and increasing rapidly. As of June 2011, a study sponsored by the information storage company EMC estimates that the world's data is more than doubling every two years, growing faster than Moore's Law, and reaching 1.8 zettabytes of data to be stored in 2011. This involves various kinds of digital data continuously being generated by individuals as well as business and government organizations, who all need scalable solutions to store such data reliably and securely over time. Quoting the EMC study, the next decade will see enterprises managing 50 times more data, and handling 75 times more files.

A very basic yet critical concern in storage systems is ensuring that the data is not irrevocably lost due to device failures. Redundancy is key to ensure fault tolerance, which can be achieved using replication, i.e., storing multiple copies of the same data. For instance, a common practice is to keep at least three copies of the data (called three-way replication), the rationale being that if, or rather when, one copy of the data is lost, the second copy is used to regenerate the first one, and hopefully not both of the two other copies are lost before the repair is completed. There is of course a price to pay: redundancy naturally reduces the efficiency, or alternatively put, increases the overheads of the storage infrastructure. Imagine if one could come up with a way to store the same amount of data, reliably, but using only half the needed storage space. That would imply a reduction of the storage infrastructure by half. The cost of such infrastructure should be estimated not only in terms of the raw cents-per-gigabyte metric, but also in terms of the physical space needed: the corresponding cost of real estate, as well as the construction, operational and maintenance costs of the data center.
The numbers can in fact be staggering, as we move towards a world witnessing data centers which equal medium-sized cities in terms of power consumption. A US Environmental Protection Agency report of 2007 indicates that the US used 61 billion kilowatt-hours of power for data centers and servers in 2006. That is 1.5 percent of all US electricity use, and it cost the companies that paid those bills more than $4.5 billion. Such observations have led to a wide-scale realization of the energy and environmental implications of large-scale data centers, and consequent efforts to build carbon-neutral data centers.

There are different ways one could imagine reducing the storage space, starting from the physical media, which has witnessed a continuous shrinking of physical space per unit of data. This article questions storage space at a different layer: that of the trade-off between fault tolerance and the level of efficiency in storage space utilization in large-scale distributed data storage systems.

The tension between more fault tolerance and less storage overhead is well felt through the evolution of RAID (redundant array of inexpensive/independent disks). RAID is a technology that enables storage virtualization by distributing data across multiple disk drives that act as a single logical unit, and it has now been used for over two decades. The first RAID system, RAID 0, offered good performance and additional storage by distributing data across multiple drives, so that the disk heads can read more data in a single move, a technique referred to as striping. Its main drawback was that it had no fault tolerance: a single disk failure could cause the loss of the entire stored data. This was remedied in RAID 1, where replication (mirroring) was introduced, but at the cost of storage overhead. The subsequent RAID 2, 3, 4 and 5 exchanged replication against coding, by opting for parity bits, possibly the simplest coding technique, also known as Hamming parity codes: a parity bit is just the sum of all bits across one drive. This allows the stored data to be retrieved even in the event of a single disk failure.
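The parity idea is simple enough to demonstrate in a few lines. The following is a minimal sketch with made-up drive contents, where bytewise XOR stands in for the per-drive parity just described:

```python
# Minimal sketch of single-parity recovery (RAID 4/5 style).
# The drive contents and layout here are hypothetical.
from functools import reduce

def parity(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data_drives = [b"\x01\x02", b"\x0a\x0b", b"\xf0\x0f"]  # three data drives
parity_drive = parity(data_drives)                      # one parity drive

# Drive 1 fails; rebuild it from the survivors plus the parity drive.
survivors = [data_drives[0], data_drives[2], parity_drive]
rebuilt = parity(survivors)
assert rebuilt == data_drives[1]
```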
While the original RAID was designed with stand-alone computer systems in mind, dRAID (the d stands for distributed) has since been widely adopted in networked storage systems, including in storage area networks. The need for more storage capacity has however pushed for more capacity per disk, more disks per system, and consequently the use of cheaper (less reliable) disks, resulting in the demand for more fault tolerance, and thus in a new quest for better coding techniques for (d)RAID, ideally as simple as parity codes, but at the same time capable of tolerating more failures. A notable example of codes proposed to suit the peculiarities of (d)RAID systems is the family of WEAVER codes [3], which can deal with up to 4 faults.

Finding codes offering the best trade-off between the amount of redundancy and fault tolerance is no new question to a coding theorist. In communication theory, data is sent over a noisy channel, and coding is used to append redundancy to the transmitted signal, to help the receiver recover the intended message even when it is altered by noise, possibly erasures. The length of the data, or information symbols, is usually denoted by k, and n − k symbols of redundancy are computed to obtain a codeword of length n. The amount of information k that a code can carry and its correction capability are linked, and this dependency is characterized by the so-called Singleton bound. The best EC(n, k) codes, meeting the bound, are called maximum distance separable (MDS) codes. The celebrated Reed-Solomon codes [9] are an instance of such codes. MDS erasure codes, which are capable of recovering erased information symbols, can handle up to n − k erasures (see Figure 1).

Fig. 1. Coding for erasure channels: a message of k symbols is encoded into a codeword of length n, before being sent over an erasure channel. As long as no more than n − k symbols are lost (erased), the receiver can decode the message.

Now armed with MDS codes, it seems at first clear how to upgrade RAID: replace the Hamming code by a more powerful MDS code, thus immediately guaranteeing the best possible fault tolerance at the lowest cost. Things are however not that straightforward: because erasure codes have been designed and optimized to deal with lossy communication channels, their sole objective is to recover a transmitted message at the receiver. There is no notion of reliability over time, placement of the data, or read/write operations to optimize. Furthermore, the complexity of the mathematical operations involved, for example in Reed-Solomon codes, necessitates dedicated hardware, making these codes not suitable for RAID systems. There is one example of a storage application where MDS codes have been successfully used though, providing orders of magnitude more efficiency/reliability: that of the CD/DVD. When there is a scratch on a CD, the information stored in the scratched portions is lost (erasures), but depending on the degree of such losses, the disc can still be read, thanks to the incorporation of Reed-Solomon codes. That is the exception that proves the rule.

II. NETWORKED DISTRIBUTED STORAGE SYSTEMS

MDS codes have also been considered in the context of networked distributed storage settings. If some storage nodes fail, or become inaccessible, e.g., due to communication faults or power outage, the data should remain available.
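To make the erasure-coding picture concrete, the following toy sketch implements the textbook polynomial-evaluation construction behind Reed-Solomon codes. The field size and parameters are illustrative choices, not from the survey; real systems typically work over GF(2^8):

```python
# Toy MDS erasure code over GF(P): the k message symbols are the
# coefficients of a polynomial, encoded as its evaluations at n points.
# Any k surviving evaluations recover the message (Lagrange interpolation).
import random

P = 257  # a small prime field

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def encode(msg, n):
    return [(x, sum(m * pow(x, i, P) for i, m in enumerate(msg)) % P)
            for x in range(1, n + 1)]

def decode(points, k):
    """Recover the k message symbols from any k surviving (x, y) pairs."""
    xs, ys = zip(*points[:k])
    coeffs = [0] * k
    for j in range(k):
        num, denom = [1], 1          # Lagrange basis polynomial for x_j
        for m in range(k):
            if m != j:
                num = poly_mul(num, [(-xs[m]) % P, 1])
                denom = denom * (xs[j] - xs[m]) % P
        scale = ys[j] * pow(denom, P - 2, P) % P
        coeffs = [(c + scale * u) % P for c, u in zip(coeffs, num)]
    return coeffs

msg = [42, 7, 13]                     # k = 3 information symbols
stored = encode(msg, 9)               # n = 9 encoded symbols, one per node
survivors = random.sample(stored, 3)  # up to n - k = 6 erasures tolerated
assert decode(survivors, 3) == msg
```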
By considering a node failure as an erasure, the data can be encoded with an erasure code, which then provides fault tolerance, in that the stored data can be retrieved even in the presence of faulty nodes, as long as the number of failures does not exceed the capability of the code. Such fault tolerance is achieved using a much smaller overhead than three-way replication. Alternatively put, for the same storage overhead, a much higher level of fault tolerance can be achieved. For instance, if the probability of an individual storage device failing is 0.1, then with the use of a (9, 3) MDS erasure code, which tolerates any six failures, the probability of losing the data is roughly 3 × 10⁻⁶, but if a three-way replication scheme is used instead, the probability of losing the data is as high as 10⁻³.

There is however a fundamental difference between a CD or a communication channel on the one hand, and a networked storage system on the other hand. Unlike in the CD, where a scratch cannot be removed, or in a communication channel where the receiver just has to deal with what it receives, in a networked storage system the failed storage nodes can be replaced with new ones, and in fact this maintenance process is essential for the durability and availability of the data in the long run. Specifically, the lost redundancy needs to be replenished by recreating new redundancy at other existing or new storage devices in the system. The ability to replace failed storage devices and regenerate lost redundancy enables the deployment of a reliable storage system comprising many storage nodes, and thus scaling to extremely large volumes of data; it also makes regeneration of lost data imperative. As already mentioned above, MDS erasure codes were not designed to address reliability over time; in particular, they were not made to address maintenance/repair issues. When a data block encoded by an MDS erasure code is lost and needs to be recreated, one first needs data equivalent in amount to the whole object in one place (either by storing a full copy of the data, or else by downloading an adequate number of encoded blocks), even in order to recreate a single encoded block, as illustrated in Figure 2.
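The two probabilities above follow from a simple binomial computation, assuming independent failures with probability 0.1 as in the example:

```python
# Probability of data loss for a (9,3) MDS code vs. three-way replication,
# assuming independent node failures with probability p = 0.1.
from math import comb

p = 0.1

# (9,3) MDS: the object survives as long as at least k = 3 of the
# n = 9 blocks survive, i.e. it is lost only if 7 or more nodes fail.
n, k = 9, 3
loss_mds = sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n - k + 1, n + 1))

# Three-way replication: the object is lost only if all 3 copies fail.
loss_rep = p**3

print(f"MDS (9,3): {loss_mds:.1e}")   # ~3.0e-06
print(f"3-way rep: {loss_rep:.1e}")   # 1.0e-03
```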
(a) Data retrieval: as long as k nodes are alive, the object can be retrieved. (b) Node repair: one node has to recover the object, re-encode it, and then distribute the lost blocks to the new nodes.

Fig. 2. Erasure coding for distributed networked storage: the object to be stored is cut into k fragments, then encoded into n fragments, which are given to different storage nodes. Reconstruction of the data is shown on the left, while repair of node failures is illustrated on the right.

The particularly good behavior exhibited by MDS erasure codes with respect to the storage overhead vs. fault-tolerance trade-off has remained appealing over time, encouraged by the CD success story, and on-going efforts have been witnessed to retain their benefits while mitigating their drawbacks, such as the lack of a repair mechanism, and complexity. Companies such as CleverSafe and Wuala claim to use Reed-Solomon-like codes. Within the academic circle, erasure codes have been intensively studied in peer-to-peer storage systems related research, and have regained interest within the coding community, which has started to take on the challenges offered by storage systems. These challenges can tentatively be summarized as follows: design erasure codes which carry out repair efficiently, in terms of (i) bandwidth, which is a very scarce resource in data-center networks, (ii) computation and energy costs, but also (iii) repair time, since delay in the repair process may leave the system vulnerable to further faults, leading to irretrievable loss of data. This is besides spelling out many of the system issues, such as placement of the data, number of disk accesses, management of the meta-information to coordinate the network, etc.

With respect to the repair process, there are arguably two extreme points possible in the design space of codes tailor-made for distributed networked storage:

(i) Minimize the absolute amount of data transfer needed to recreate the lost data from one node. Regenerating codes [1] form a new family of codes achieving the minimum possible repair bandwidth (per repair) for a given amount of storage per node, where the optimal storage-bandwidth trade-off is determined using a network coding inspired analysis, assuming that each newcomer node contacts at least k live nodes for each repair. Regenerating codes, like MDS erasure codes, allow data retrievability from any arbitrary set of k nodes. Collaborative regenerating codes [10], [5] are a generalization allowing to deal with multiple faults. Actual explicit constructions of regenerating codes are currently sparse (see [11] for an example of non-collaborative codes at the two extremes of the trade-off curve).

(ii) Minimize the number of nodes to be contacted for repairing one node failure. Recently proposed self-repairing codes [7] achieve this optimum, allowing a repair while contacting only two nodes, as long as fewer than half of the nodes have failed. This is also true for multiple faults, assuming a proper repair scheduling. This comes at the price of sacrificing the MDS property, though a static resilience analysis of self-repairing codes has shown that object retrieval is little impaired by the loss of the MDS property. It is worth noting that other codes, such as WEAVER codes [3], or some instances of pyramid codes [4] which will be discussed below, have also traded the MDS property for better repair.
As will be emphasized again later on, being able to repair by contacting fewer than k nodes and the MDS property are mutually exclusive, and code and system designers need to choose one or the other. There are several other codes which fall somewhere in between these extremes, and have been tailor-made with networked distributed storage applications in mind. Prominent among these are hierarchical and pyramid codes, which we summarize first, before taking a closer look at regenerating and self-repairing codes.

III. PYRAMID AND HIERARCHICAL CODES

In order to alleviate the drawbacks of erasure codes with respect to storage applications, iterated erasure code constructions have been proposed (pyramid codes [4] and, independently, hierarchical codes [2]), where subgroups of the code itself are essentially basic erasure codes. We note here that the specific instances discussed in [4] and [2] are slightly different; however, since the underlying principle is analogous, we present them as such.
Figure 3 illustrates the basic ideas, which we summarize next. Consider first a data object of size s·k', partitioned into s subgroups, each comprising k' of the unencoded blocks. An EC(n', k') code is used to generate n' − k' local redundancy blocks within each such subgroup. A further r global redundancy blocks are generated using all the s·k' unencoded blocks and some coding scheme. Thus a code group is formed, which effectively is a code mapping s·k' unencoded blocks into s·n' + r encoded blocks. Note that the local redundancy blocks can be used to repair any losses within a subgroup, thus requiring access to a number of blocks (much) smaller than the equivalent of the whole object, while the global redundancy blocks provide further protection if there are too many failures within a single subgroup. If the data object is instead of size g·s·k', then the basic code group working on objects of size s·k' as described above can be repeated, to build another level of the hierarchy, and further global redundant blocks can be created using all the original g·s·k' blocks. This process can in fact be iterated for a multi-hierarchical or pyramid code, as the names suggest. At each level of such a hierarchy, there is some local redundancy which can repair lost blocks without accessing blocks outside the subgroup; if there are too many errors within a subgroup, then the global redundancy at that level is used, and if even that is not adequate to carry out the repairs, one needs to move further up the pyramid, and so on.

Fig. 3. Pyramid codes.

With respect to the two design criteria that we chose to investigate, namely repair bandwidth and the number of nodes contacted for repair, it is clear that an advantage of both pyramid and hierarchical codes is the reduction in the number of nodes that need to be contacted (and consequently, a reduction in bandwidth usage) to recreate a lost encoded block, though no optimality can be claimed, neither with respect to the nodes contacted nor to the bandwidth used. The (possibly) biggest disadvantage is the lack of symmetry in the structure, meaning that some encoded blocks are more or less important than others, which makes it difficult if not impossible to do a thorough resilience analysis (affecting the understanding of the code's effectiveness). Furthermore, more complicated algorithms (for reconstruction and repair) and system design are thus required to use these codes in practice. Note that contrarily to hierarchical codes, which were intended to improve repair in distributed networked storage systems, pyramid codes were meant to improve read performance, by trading storage space efficiency for access efficiency (in particular, some instances of pyramid codes are not MDS).
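To make the subgroup structure concrete, here is a toy sketch of a single-level pyramid-style layout. The layout is hypothetical and uses XOR parities throughout for simplicity; actual pyramid codes [4] use stronger codes for the global redundancy. The point being illustrated is only the access pattern: a loss inside a subgroup is repaired from that subgroup alone.

```python
# Toy single-level pyramid-style layout: s = 2 subgroups of k' = 3 data
# blocks, one XOR local parity per subgroup, one XOR global parity.
from functools import reduce

def xor(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

group_a = [b"\x01", b"\x02", b"\x03"]
group_b = [b"\x04", b"\x05", b"\x06"]
local_a = xor(group_a)              # repairs one loss inside group A
local_b = xor(group_b)
global_p = xor(group_a + group_b)   # extra protection across groups

# Lose one block of group A: local repair touches only 3 blocks
# (the 2 surviving data blocks plus the local parity), not all 6.
rebuilt = xor([group_a[0], group_a[2], local_a])
assert rebuilt == group_a[1]
```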
Coinciding with these works, researchers from the network coding community started to study the fundamental limits and trade-offs of bandwidth usage for the regeneration of a lost encoded block vis-à-vis the storage overhead, subject to the MDS constraint. This initiative has culminated in a new family of codes, broadly known as regenerating codes, which we discuss next.

IV. REGENERATING CODES

The repair of lost redundancy in a storage system can be abstracted as an information flow graph [1], as follows. Initially, the data object to be stored is processed and divided into smaller encoded blocks stored across n storage nodes, such that obtaining any k of these n encoded blocks suffices to recreate the original data (this is essentially the MDS property). When some storage nodes fail, the corresponding stored data is lost. New nodes, at which some new information will then need to be regenerated, contact the existing live nodes to do so. Over time, many of the original nodes may become unavailable, and these existing live nodes may themselves be nodes which joined subsequently and carried out regeneration themselves. Thus, the regeneration process essentially needs to ensure two things: (i) independently of the sequence of failures, the regeneration is done in a manner which allows further future regenerations, while (ii) always guaranteeing that any k live nodes possess enough information among them to permit reconstruction of the original data object.

Consider that each storage node stores an amount α of data. Consider also that when a node fails, the new node contacts d ≥ k live nodes and downloads an amount β of data from each contacted node. We will call d the fan-in for regeneration. Each storage node X_i can be modeled as two logical nodes X_in^i and X_out^i, connected by a directed edge X_in^i → X_out^i with weight α representing the storage capacity of the node. The regeneration process is modeled with directed edges of weight β from the X_out nodes of contacted live nodes to the X_in node of the new node at which the lost data is being regenerated. The original placement of the data over n nodes can similarly be modeled as directed edges of weight α from the source (data owner) to the original storage nodes. Finally, the data reconstruction problem can be abstracted with edges of weight α representing the destination (data accessor) downloading data from arbitrary k live storage nodes. Note that the edges from the source and to the destination could also be modeled with larger capacities (e.g., ∞), but this does not matter, since the maximum meaningful information transferred over such links corresponds to the storage capacity.
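This abstraction is easy to experiment with numerically. The sketch below, a toy scenario with hypothetical parameters that assumes the networkx library is available, builds the flow graph for a single repair at the minimum-storage operating point (derived in the analysis discussed next) and checks that the max-flow to a data collector reaching the newcomer plus k − 1 original nodes equals the object size:

```python
# Information flow graph for one repair, following [1]: each storage node
# X_i becomes X_in -> X_out with capacity alpha; a newcomer downloads beta
# from each of d live nodes; a data collector reads any k nodes.
import networkx as nx

B, n, k, d = 12.0, 6, 3, 4           # hypothetical object size and parameters
alpha = B / k                        # minimum-storage point: alpha = B/k
beta = B / (k * (d - k + 1))         # matching repair download per contacted node

G = nx.DiGraph()
for i in range(n):                   # original nodes, filled by the source S
    G.add_edge("S", f"in{i}", capacity=alpha)
    G.add_edge(f"in{i}", f"out{i}", capacity=alpha)

# Node 0 fails; the newcomer Y contacts d live nodes and stores alpha.
for i in range(1, d + 1):
    G.add_edge(f"out{i}", "Yin", capacity=beta)
G.add_edge("Yin", "Yout", capacity=alpha)

# Data collector T contacts the newcomer and k-1 original live nodes.
G.add_edge("Yout", "T", capacity=alpha)
for i in range(1, k):
    G.add_edge(f"out{i}", "T", capacity=alpha)

flow, _ = nx.maximum_flow(G, "S", "T")
assert abs(flow - B) < 1e-9          # enough information flow to rebuild
```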
Figure 4(a) depicts an abstraction of the information flow graph, where each storage node is modeled as two virtual nodes, namely X_in, which collects an amount β of information from arbitrary d live nodes in the system, and X_out, which stores a maximum amount α of information and is accessed by any data collector contacting the storage node. Then, the maximum information that can flow from the source to the destination is determined by the max-flow over a min-cut of this graph. For the original data to be reconstructible at the destination, this flow needs to be at least as large as the size of the original object.

(a) Regeneration process for one failed node using network coding principles: a max-flow min-cut analysis yields the feasible values for the storage capacity α and the amount of information β obtained from each live node contacted, in terms of the number of nodes contacted d and the code parameters n, k, where d ≥ k. (b) Based on the max-flow min-cut analysis, one can determine the trade-off curve between the amount of storage space α used per node in the system and the amount of bandwidth γ needed to regenerate a lost node. If multiple repairs (t) are carried out simultaneously, and the t new nodes at which lost redundancy is being created collaborate among themselves, then better trade-offs can be realized, as can be observed from the plot.

Fig. 4. The underlying network coding theory inspiring regenerating codes.

A code which enables the information flow to be actually equal to the object size is called a regenerating code. Now, given a regenerating code, the natural question to be addressed is: what are the minimal storage capacity α and bandwidth β needed for repair, as a function of the object size and the parameters k, n, and d? This can be formulated as an optimization problem: minimize the total download bandwidth dβ, subject to the constraint that the information flow equals the object size. The optimal solution is a piecewise linear function, which describes a trade-off between the storage capacity α and the bandwidth β, as shown in Figure 4(b), and has two distinguished boundary points: the minimum storage regenerating (MSR) point (where α is equal to the object size divided by k), and the minimum bandwidth regenerating (MBR) point.
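The min-cut analysis of [1] yields a closed form that is easy to evaluate. The sketch below restates the standard single-repair bounds and extremal points from the regenerating-codes literature, with hypothetical parameters, purely for illustration:

```python
# Storage-bandwidth trade-off for regenerating codes (single repair),
# following the min-cut bound of [1]: an (alpha, beta) pair is feasible
# iff sum_{i=0}^{k-1} min(alpha, (d - i) * beta) >= B.
from fractions import Fraction

def feasible(alpha, beta, B, k, d):
    return sum(min(alpha, (d - i) * beta) for i in range(k)) >= B

def msr_point(B, k, d):
    """Minimum storage: alpha = B/k, with the smallest matching beta."""
    return Fraction(B, k), Fraction(B, k * (d - k + 1))

def mbr_point(B, k, d):
    """Minimum repair bandwidth: alpha equals the total download d*beta."""
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, beta

B, k, d = 12, 3, 4                        # hypothetical parameters
for name, (alpha, beta) in [("MSR", msr_point(B, k, d)),
                            ("MBR", mbr_point(B, k, d))]:
    assert feasible(alpha, beta, B, k, d)
    print(f"{name}: alpha = {float(alpha):.2f}, "
          f"repair bandwidth = {float(d * beta):.2f}")
```

Running this prints a storage of 4.00 with repair bandwidth 8.00 at MSR, against 5.33 and 5.33 at MBR, illustrating the trade-off: MBR repairs more cheaply but stores more per node.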
Note that the trade-off analysis only determines what can best be achieved, but does not in itself provide any specific code achieving it, though the fact that any k nodes must provide enough redundancy for object reconstruction already implies that a regenerating code includes an underlying MDS code. Several codes have since been proposed, most of which operate either at the MSR or the MBR point of the trade-off curve. We do not provide an exhaustive list, but instead summarize the two main classes of regenerating codes, each illustrated by an example.

The max-flow min-cut argument determines the amount of data that needs to be transferred in order to regenerate data at a new node, where the regenerated data provides redundancy equivalent to what was provided by the lost data. There is however no constraint or need to regenerate precisely the same (bit-wise) data as was lost (see Figure 5(a)). This observation has been leveraged to propose what is known as functional regeneration, where the regenerated data is in fact not the same as that lost, but nevertheless provides equivalent redundancy. In contrast, if the regenerated data is bit-wise identical to what was lost, it is called exact regeneration. Exact regeneration is illustrated in Figure 5(b).

The original analysis was based on the assumption that individual repairs are done independently, or in other words, that regenerating codes are designed to repair one failure at a time. The analysis has since been generalized [5], [10] to show that in the case of multiple faults, the new nodes carrying out regenerations can furthermore collaborate among themselves to further reduce the overall bandwidth needed per regeneration (Figure 4(b)). Instances of codes for this setting are even sparser than for classical regenerating codes, and up to now, the only code construction proposed incurs a repair bandwidth cost equal to that of erasure codes under lazy repair [10], but distributes the repair load. Optimal code constructions which meet the trade-off curve remain an open issue.

Regenerating codes promise the optimal usage of bandwidth per repair (subject to the MDS property of the codes). Some of the proposed codes apply network coding on top of traditional erasure codes and benefit from the mature decoding algorithms for erasure codes. Furthermore, the notion of functional repair provides greater flexibility in code design choices. However, the information flow analysis itself does not suggest any code construction, and the codes proposed so far are rather restrictive; for example, some works only deal with a single fault, and need to contact all the other n − 1 nodes.
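Functional repair can be simulated directly with random linear coding. The toy sketch below mirrors the parameters of Figure 5(a) (k = 2, n = 4, an object of 4 pieces, two linear combinations stored per node); the field GF(101) and the random-coefficient strategy are illustrative assumptions, not a specific published construction:

```python
# Toy functional repair via random linear coding, in the spirit of
# Figure 5(a). The newcomer stores data DIFFERENT from what was lost,
# yet any 2 of the 4 nodes still suffice to retrieve the object.
import random
from itertools import combinations

P = 101

def rank_mod_p(rows):
    """Rank over GF(P) by Gauss-Jordan elimination."""
    m = [r[:] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], P - 2, P)
        m[rank] = [x * inv % P for x in m[rank]]
        for r in range(len(m)):
            if r != rank:
                m[r] = [(a - m[r][c] * b) % P for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def any_two_decode(nodes):
    """Any 2 of the 4 nodes must jointly span the 4-dimensional object."""
    return all(rank_mod_p(list(a) + list(b)) == 4
               for a, b in combinations(nodes, 2))

vec = lambda: [random.randrange(P) for _ in range(4)]
nodes = [(vec(), vec()) for _ in range(4)]
while not any_two_decode(nodes):                 # draw a valid initial code
    nodes = [(vec(), vec()) for _ in range(4)]

# Node 4 fails: the newcomer downloads ONE random mix of the two vectors
# held at each of the d = 3 survivors, then stores two random mixes of
# the three downloaded vectors.
while True:
    dl = []
    for a, b in nodes[:3]:
        c1, c2 = random.randrange(P), random.randrange(P)
        dl.append([(c1 * x + c2 * y) % P for x, y in zip(a, b)])
    def mix():
        cs = [random.randrange(P) for _ in dl]
        return [sum(c * v[i] for c, v in zip(cs, dl)) % P for i in range(4)]
    nodes[3] = (mix(), mix())
    if any_two_decode(nodes):
        break          # equivalent redundancy restored, though new data
```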
(a) An example of functional repair for k = 2 and n = 4, adapted from [1]: an object is cut into 4 pieces o_1, ..., o_4, and two linear combinations of them are stored at each node. When the 4th node fails, a new node downloads a linear combination of the two pieces at each node (the number on each edge describes the factor that multiplies the encoded fragment), from which it computes two new pieces of data, different from those lost, but such that any k = 2 of the 4 nodes permit object retrieval. (b) An example of exact repair from [11]: an object o is encoded by taking its inner product with 10 vectors v_1, ..., v_10, to obtain o^T v_i, i = 1, ..., 10, as encoded fragments. They are distributed to the 5 nodes N_1, ..., N_5 as shown. Say node N_2 fails. A newcomer can regenerate by contacting every remaining node (shown by the edges) and downloading one encoded piece from each of them, namely o^T v_1 from N_1, o^T v_5 from N_3, o^T v_6 from N_4 and o^T v_7 from N_5.

Fig. 5. Regenerating codes: functional versus exact repair.

The value of the fan-in d for regeneration has practical implications. A lower fan-in is good, because there are then more alternative choices that can be made, particularly avoiding bottlenecks, and possibly accomplishing repairs in parallel. Furthermore, the algorithmic and system design complexity of using network coding is much larger than that of even traditional erasure codes, apart from the added computational overheads.

V. SELF-REPAIRING CODES

We mentioned that regenerating codes implicitly include an MDS code, and assume that the number of contacted live nodes d is at least k. We would like to emphasize that relaxing the constraint d ≥ k to allow d < k necessarily implies the loss of the MDS property. Indeed, if a node can be repaired from d < k other nodes, this means that there exists a linear dependency (only linear codes are considered here) among fewer than k fragments, so that if the data accessor were to contact k nodes including those d + 1 linearly dependent ones, he could not recover the encoded object.

(a) An example of self-repairing codes from [8]: the object o is split into four pieces, and XOR-ed combinations of these pieces are generated. The specific combinations (represented using basis vectors) to be used are determined based on projective geometric principles. Two such pieces are stored at each node, over a group of five nodes, so that contacting any two nodes is adequate to reconstruct the original object. Furthermore, systematic pieces are available in the system, which can be downloaded and simply appended together to reconstruct the original data. If a node fails, say N_5, then the lost data can be reconstructed by contacting a subset of the live nodes. Two different strategies, with different fan-ins d = 2 and d = 3, and correspondingly different total bandwidth usage, are shown to demonstrate some of the flexibility of the regeneration process. (b) An example of self-repairing codes from [7]: the object o has length 12, and encoding is done by taking linear combinations of the 12 pieces as shown, which are then stored at 7 nodes. If the 7th node fails, it can be reconstructed in 3 different ways, by contacting either N_1, N_5, or N_2, N_6, or N_3, N_4. If both the 6th and 7th nodes fail, each of them can still be reconstructed in two different ways, so that, for example, a first newcomer can contact first N_1 and then N_5 to repair N_7, while a second newcomer can in parallel contact first N_3 and then N_1 to repair N_6.

Fig. 6. Self-repairing codes.
In contrast to regenerating codes, self-repairing codes [7] work in a regime where d ≪ k; in particular, d can most of the time take the minimal value d = 2. We note that d = 1 is possible only when using a replication/repetition code. As just explained, self-repairing codes are not MDS codes, given that they achieve regeneration using d < k. To evaluate the loss of the MDS property, a static resilience analysis is provided, which for distributed systems refers to the probability that an object, once stored in the system, will continue to stay available without any further maintenance, even when some individual member nodes of the distributed system become unavailable. The static resilience analysis reveals that there is little difference with MDS codes, especially when the node availability is large (this is further discussed in the next section). Self-repairing codes however mimic the MDS property in the repair domain instead: they guarantee that repair (of single or multiple faults) can be performed by contacting a fixed number of live nodes, this number depending only on the number of faults, and not on which specific encoded fragment is missing. Note that this is in contrast to pyramid and hierarchical codes. Thus, one may say that self-repairing codes try to combine the good properties of both pyramid codes and regenerating codes. Two instances of self-repairing codes are known to date [7], [8]. That repair can be done by contacting d = 2 nodes has the advantage of allowing fast repair, which can be further sped up thanks to parallel repair of multiple faults: since each newcomer may monopolize up to 2 live nodes, depending on resource constraints, there is room for several pairs of live nodes to upload repair data at the same time (see Figure 6(b) for an example). Notice that the optimal storage-bandwidth trade-off of regenerating codes does not apply here, since the constraint d ≥ k is relaxed, thus better trade-off points in terms of total bandwidth usage can also be achieved.
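The self-repair property is easy to see on a toy instance. The sketch below uses the HSRC(7, 3) parameters (n = 2^k − 1 with k = 3) in the spirit of the homomorphic construction of [7]: every encoded fragment is the XOR of a subset of the object pieces, and every lost fragment equals the XOR of two other stored fragments, in several disjoint ways. It is a minimal illustration, not the full code:

```python
# Toy self-repairing code with the HSRC(7,3) structure: index each encoded
# fragment by a nonzero vector v in F_2^3; fragment_v is the XOR of the
# object pieces selected by the bits of v. Then fragment_u XOR fragment_w
# equals fragment_{u XOR w}, so any lost fragment is repairable from
# d = 2 live nodes, in three different ways.
pieces = [b"\x11", b"\x22", b"\x44"]           # k = 3 object pieces (toy data)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

fragments = {}
for v in range(1, 8):                          # the n = 2^3 - 1 = 7 nodes
    acc = bytes(len(pieces[0]))
    for i in range(3):
        if v >> i & 1:
            acc = xor(acc, pieces[i])
    fragments[v] = acc

# Repair node v from any pair (u, w) with u XOR w = v.
v = 7                                          # say node 7 is lost
pairs = [(u, u ^ v) for u in range(1, 8) if u < (u ^ v) and u != v]
for u, w in pairs:                             # three disjoint repair pairs
    assert xor(fragments[u], fragments[w]) == fragments[v]
print(f"node {v} repairable from pairs: {pairs}")
```

Note also that some triples of fragments are linearly dependent (for instance the fragments indexed 1, 2 and 3 above), which is precisely why the code is not MDS, as discussed in the text.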
VI. QUANTITATIVE COMPARISON

Self-repairing, pyramid and hierarchical codes sacrifice the MDS property in order to achieve regeneration using d < k live nodes. Due to the inherent asymmetry of the roles of different encoded pieces in hierarchical and pyramid codes, there is no rigorous analysis to quantify this sacrifice. The trade-offs for homomorphic self-repairing codes have been explicitly determined, and are presented in Figures 7(a) and 7(b). For MDS codes, the possibility of reconstructing the original object from the data stored in any x random storage nodes is binary: it is always possible to reconstruct the data if any k or more storage nodes are used. In contrast, for self-repairing codes this is probabilistic, as can be observed from the corresponding values of ρ_x shown in Figure 7(a). A natural question to ask then is: if multiple nodes fail and no repairs are carried out, what is the probability that the data still remains available in the system, i.e., what is the static resilience of the system? Specifically, if only a fraction p_node of the n storage nodes is actually available, and the failures have happened uniformly at random, what is the probability that the original data is still available?

Fig. 7. Comparison among traditional erasure codes, regenerating codes and self-repairing codes: (a) the conditional probability ρ_x of reconstructing the object using the encoded data in x random storage nodes, for HSRC(31,5) versus EC(31,5); (b) the static resilience of homomorphic self-repairing codes, comparing SRC(31,5), EC(31,5), SRC(63,5) and EC(63,5) as a function of the node availability p_node; (c) the average repair traffic per lost block, normalized by B/k (B being the size of the stored object), for various choices of x_th, comparing sequential and parallel SRC repair (γ_seq, γ_prl), lazy erasure-code repair (γ_eclazy), and regenerating codes at the MSR point (γ_MSRGC with d = k + 1 and d = k + 2), with n = 31.
Figure 7(b) compares the static resilience of self-repairing codes with that of MDS codes (which include traditional erasure codes as well as regenerating codes), and we observe that while there is some deterioration of the resilience, it is rather marginal. Finally, in a real system one would not allow failures to accumulate indefinitely; instead, a regeneration process has to be carried out. If this regeneration is triggered when precisely x_th out of the n storage nodes are still available, then the total data transfer needed to regenerate data for each of the n − x_th failed nodes is depicted in Figure 7(c). For self-repairing codes, the repairs can be done in sequence or in parallel, denoted with the suffixes seq and prl, and this is compared with traditional erasure codes (denoted eclazy) when the repairs are done in sequence (for parallel repairs using erasure codes, the traffic is k = 8, and is not shown in the figure), as well as with RGC codes at the MSR point (corresponding to an equivalent α = B/k, where B is the object size) for a few choices of d. The bandwidth needed has been normalized by the size of one encoded fragment, i.e., B/k. We notice that up to a certain point, self-repairing codes have the least (and a constant, namely 2) bandwidth need for regeneration, and this even when the regenerations are carried out in parallel. For a larger number of faults, the absolute bandwidth usage of traditional erasure codes and regenerating codes is lower than that of self-repairing codes. However, given that erasure codes need to contact k nodes and regenerating codes need to contact d ≥ k nodes, some preliminary empirical studies have shown the regeneration process for such codes to be slow [6], which can in turn make the system vulnerable. In contrast, self-repairing codes, because of an extremely small fan-in d = 2, can support fast and parallel repairs [7] while dealing with a much larger number of simultaneous faults. A more thorough comparative evaluation of how these various codes perform under realistic circumstances, for instance taking into account the topology connecting the storage nodes, placement strategies, multiple objects, etc., is an open question that the research community is currently engaged in investigating.
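A rough feel for the regimes in Figure 7(c) can be obtained with back-of-the-envelope formulas. The sketch below uses one simple accounting assumption for the lazy erasure-code case (download k fragments once, then rebuild every missing block); the survey's plotted curves may count costs somewhat differently, so treat these numbers as illustrative only:

```python
# Back-of-the-envelope repair traffic per lost block, normalized by one
# fragment (B/k), for the regimes discussed around Figure 7(c).
n, k = 31, 5

src = 2.0                               # self-repairing codes: d = 2, constant
rgc_msr = lambda d: d / (d - k + 1)     # MSR regenerating codes: d*beta/(B/k)
ec_lazy = lambda faults: k / faults     # k fragments amortized over the batch

for faults in (1, 2, 5, 10):
    print(f"faults={faults:2d}  SRC={src:.1f}  "
          f"RGC(d=k+1)={rgc_msr(k + 1):.2f}  "
          f"RGC(d=k+2)={rgc_msr(k + 2):.2f}  eclazy={ec_lazy(faults):.2f}")
```

Under this accounting, self-repairing codes are the cheapest for small batches of faults, while lazy amortized repair eventually wins when many faults are repaired at once, consistent with the qualitative discussion above.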
VII. CONCLUDING REMARKS

There has been a long-running effort to apply codes to storage systems, including the use of traditional erasure codes, as well as other codes such as low density parity check (LDPC) codes coming from communication theory, rateless (digital fountain) codes originally designed for content-distribution-centric applications, or locally decodable codes emerging from the theoretical computer science community, to cite a few. The long-believed mantra has been "the storage device is the erasure channel". Such a simplification ignores various nuances of distributed networked storage systems, particularly with respect to the maintenance process and long-term reliability. This realization has led to a renewed interest in designing codes tailor-made for distributed storage. This article surveys the major families of such novel codes, but does so at a very high level, to make the innovations accessible to a broad audience, while also serving as a one-stop starting point for anyone interested in delving deeper. To conclude, it should be noted that other techniques for data reliability and efficient storage maintenance, for instance snapshots and deduplication, are also being heavily investigated in the storage technology space. How these aspects will be affected if, or arguably when, coding techniques become mainstream for large-scale data storage is a fascinating open question for the storage technology community in general; conversely, coding theorists also need to take these additional nuances into account when designing storage-centric codes.

REFERENCES

[1] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright and K. Ramchandran, "Network Coding for Distributed Storage Systems," IEEE Transactions on Information Theory, vol. 56, no. 9, Sept. 2010.
[2] A. Duminuco and E. Biersack, "Hierarchical Codes: How to Make Erasure Codes Attractive for Peer-to-Peer Storage Systems," Eighth International Conference on Peer-to-Peer Computing (P2P'08), 2008.
[3] J. L. Hafner, "WEAVER Codes: Highly Fault Tolerant Erasure Codes for Storage Systems," 4th USENIX Conference on File and Storage Technologies (FAST'05), 2005.
[4] C. Huang, M. Chen and J. Li, "Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems," Sixth IEEE International Symposium on Network Computing and Applications (NCA 2007), 2007.
[5] A.-M. Kermarrec, N. Le Scouarnec and G. Straub, "Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes," 2011 International Symposium on Network Coding (NetCod 2011).
[6] L. Pamies-Juarez and E. Biersack, "Cost Analysis of Redundancy Schemes for Distributed Storage Systems," preprint.
[7] F. Oggier and A. Datta, "Self-repairing Homomorphic Codes for Distributed Storage Systems," IEEE INFOCOM 2011.
[8] F. Oggier and A. Datta, "Self-Repairing Codes for Distributed Storage: A Projective Geometric Construction," IEEE Information Theory Workshop (ITW) 2011.
[9] I. S. Reed and G. Solomon, "Polynomial Codes over Certain Finite Fields," Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, 1960.
[10] K. W. Shum, "Cooperative Regenerating Codes for Distributed Storage Systems," IEEE ICC 2011, available on arXiv.
[11] K. V. Rashmi, N. B. Shah, P. Vijay Kumar and K. Ramchandran, "Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage," Allerton Conference on Communication, Control, and Computing, 2009.