Network Coding for Distributed Storage


Network Coding for Distributed Storage. Alex Dimakis, USC

Overview. Motivation: data centers; mobile distributed storage for D2D. Specific storage problems: the fundamental tradeoff between repair communication and storage; systematic repair (open problem); distributed storage allocations (open problem).

Motivation: Data centers. Warehouse-sized computing and storage facilities, with cost in the hundreds of millions. Large-scale distributed storage: thousands of servers and petabytes of disk space. Internet data centers are the next computing platform: web search, indexing, Gmail, Facebook, video storage, ...

Massive distributed data storage. Numerous disk failures per day, so redundancy must be introduced into the stored information. Replication or erasure coding? Coding can give orders of magnitude more reliability, but the problems of creating and maintaining an encoded data representation have to be addressed.

Distributed caching in mobiles. Infrastructure is slow to deploy and upgrade. Delivery with opportunistic contacts [7DS, Haggle, ...]. Extends coverage and capacity using free D2D bandwidth. Scales as the network gets dense [Grossglauser/Tse02]. 5/5/10

Distributed caching in mobiles. The video you want to watch is very likely to be downloaded by people nearby in the next day. Storage in phones is increasing more than anything else. Cache the popular content and use D2D to share.

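MDS erasure codes. A file or data object is split into k=2 blocks, A and B. With n=3 nodes storing A, B, A+B we get a (3,2) MDS code (single parity), used in RAID 5. With n=4 nodes storing A, B, A+B, A+2B we get a (4,2) MDS code, which tolerates any 2 failures and is used in RAID 6.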
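
The "any 2 suffice" property of the (4,2) code can be checked directly. The following is a minimal sketch, not the talk's construction: it assumes the arithmetic is done modulo the prime 257 (the slides do not name a field), and the helper names `encode` and `decode` are hypothetical.

```python
P = 257  # assumed prime modulus; the slides do not specify the field

# Coefficient rows for the four stored symbols: A, B, A+B, A+2B
ROWS = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a, b):
    """Return the four symbols stored on nodes 0..3."""
    return [(r0 * a + r1 * b) % P for r0, r1 in ROWS]


def decode(i, yi, j, yj):
    """Recover (a, b) from the symbols held by any two distinct nodes i, j."""
    (p, q), (r, s) = ROWS[i], ROWS[j]
    det = (p * s - q * r) % P          # nonzero for every pair of rows
    det_inv = pow(det, -1, P)          # modular inverse (Python 3.8+)
    a = ((s * yi - q * yj) * det_inv) % P
    b = ((p * yj - r * yi) * det_inv) % P
    return a, b


stored = encode(42, 200)
recovered = [
    decode(i, stored[i], j, stored[j])
    for i in range(4) for j in range(i + 1, 4)
]
```

Every one of the six node pairs yields the original `(42, 200)`, which is what lets the code tolerate any 2 failures.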

Erasure codes are reliable. Replication stores A, A, B, B; the (4,2) MDS erasure code stores A, B, A+B, A+2B (any 2 suffice to recover). Erasure coding introduces redundancy in an optimal way. Very useful in practice, e.g. Reed-Solomon codes and Fountain codes (LT and Raptor). Current storage architectures still use replication (Gmail makes 21 copies(!)). Can we improve storage efficiency? Replication: Pr[failure]=0.43. MDS erasure code: Pr[failure]=0.31.
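
The transcript does not say which node failure probability produces these two figures. As a hedged sketch: if each of the four nodes fails independently with probability 0.5 (an assumption, not stated on the slide), the computation below gives 0.4375 for replication and 0.3125 for the (4,2) MDS code, in line with the quoted 0.43 and 0.31.

```python
from math import comb


def replication_failure(p):
    """Two copies of A and two of B: data is lost if both copies of
    either block fail."""
    block_survives = 1 - p ** 2
    return 1 - block_survives ** 2


def mds_failure(p, n=4, k=2):
    """(n, k) MDS code: data is lost if fewer than k nodes survive."""
    return sum(
        comb(n, s) * (1 - p) ** s * p ** (n - s)
        for s in range(k)
    )


p_fail = 0.5  # assumed per-node failure probability
rep = replication_failure(p_fail)
mds = mds_failure(p_fail)
```

Both schemes use four nodes and the same total storage, so the gap comes only from how the redundancy is arranged.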

New open problems. Network traffic. Issues: communication, update complexity, repair communication?

Code Repair: Problem statement. (Figure: nodes a, b, c, d, with a and b storing 1mb each, and a newcomer e.) Assume we have a (4,2) MDS code and one node leaves the system. How much data does a newcomer (e) have to download to construct a new encoded packet? This is the problem of repairing the code in distributed environments.

Code Repair: first thoughts. (Figure: nodes a, b, a+b and a+2b store 1mb each; e is the newcomer.) Downloading 2mb definitely works, but the newcomer (e) is downloading 2mb to store only 1mb! Q: Is it possible to download less data? It is possible to download 1.5mb! When coding is used, creating new fragments is not a trivial task. The problem is that to create a new fragment we must have access to the entire data object.

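Reducing repair bandwidth. Each node stores 1mb, split into two half-size packets:

node 1: a1, a2
node 2: b1, b2
node 3: a1+b1, a2+b2
node 4: a1+2b1, a2+2b2

To replace node 1, each surviving node sends one half-size combination of its two packets:

node 2 (coefficients 1, 1): b1+b2
node 3 (coefficients 1, 2): a1+b1+2a2+2b2
node 4 (coefficients 1, 3): a1+2b1+3a2+6b2

The newcomer e builds its stored packets from these three downloads.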

Repair Bandwidth for MDS. Theorem 1: For (n,k)-MDS codes, if each node stores $\alpha$ bits and a newcomer downloads $\beta$ bits from each existing node, then repair is possible with

$$\alpha_{MDS} = \frac{M}{k}, \qquad \beta_{MDS} = \frac{M}{k}\cdot\frac{1}{n-k}.$$

Proof by reduction to a flow on an (infinite) graph. (D., Godfrey, Wu, Wainwright, Ramchandran, IT Transactions (to appear))

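Proof sketch: Information flow graph. (Figure: source S feeds nodes a, b, c, d, each storing $\alpha$ = 1mb; the newcomer e downloads $\beta$ from each of the three surviving nodes; data collectors connect to the nodes.) The min cut seen by a data collector is $1 + 2\beta \ge 2$, so $\beta \ge 1/2$mb. Total download: 1.5mb.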

Proof sketch: reduction to multicasting. (Figure: the information flow graph with source S, nodes a, b, c, d, newcomer e, and one data collector per recoverable subset.) Repairing a code = multicasting on the information flow graph. Repair is possible iff the minimum of the min cuts is at least the file size M. (Ahlswede et al., Koetter & Medard, Ho et al.)

Overview. Motivation: distributed storage in data centers. The code repair problem. Minimizing repair bandwidth. Fundamental tradeoff between repair bandwidth and storage. Systematic repair.

Regenerating codes. (Figure: storage nodes a through g, per-node storage M/k, and a newcomer downloading $\beta$ from each of d nodes.) Repair bandwidth can be greatly reduced if we allow slightly more storage per node.

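Minimizing repair bandwidth. A newcomer downloads $\beta$ from each of d existing nodes:

$$\min\ \beta d \quad \text{s.t.}\quad \mathrm{MinCut}(DC_i) \ge M\ \ \forall i, \qquad d \in \{k, k+1, \dots, n-1\}.$$

This problem can be solved analytically.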

Ingredient 1: bounding the flow. Lemma: for any (potentially infinite) graph $G(\alpha,\beta,d)$, any data collector has flow at least

$$\mathrm{MinCut}(DC_i) \ge \sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\}.$$

Proof: sort topologically, count. The bound is tight since it is satisfied with equality for this graph.
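
The lemma's bound is easy to evaluate numerically. The sketch below uses hypothetical function names, writes `alpha` for the per-node storage and `beta` for the per-node download, and uses exact fractions so the comparison with the file size M is not affected by rounding. It reproduces the (4,2) example, where M = 2mb, alpha = 1mb and d = 3.

```python
from fractions import Fraction


def mincut_lower_bound(alpha, beta, d, k):
    """Sum over i = 0..k-1 of min{(d - i) * beta, alpha}."""
    return sum(min((d - i) * beta, alpha) for i in range(k))


def bound_meets_file_size(M, alpha, beta, d, k):
    """True when the lower bound on every data collector's min cut
    reaches the file size M."""
    return mincut_lower_bound(alpha, beta, d, k) >= M


# (4,2) code from the earlier slides: beta = 1/2 mb is just enough ...
ok_at_half = bound_meets_file_size(2, 1, Fraction(1, 2), d=3, k=2)
# ... while any smaller beta, e.g. 2/5 mb, falls short.
ok_at_two_fifths = bound_meets_file_size(2, 1, Fraction(2, 5), d=3, k=2)
```

With beta = 1/2 the bound equals 1 + 1 = 2, matching the 1.5mb total download quoted earlier; with beta = 2/5 it drops to 1 + 4/5.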

Ingredient 2: just relax.

$$\min\ \beta d \quad \text{s.t.}\quad \sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\} \ge M, \qquad d \in \{k, k+1, \dots, n-1\}.$$

Relax the integer constraint. Show that the integer and relaxed problems attain their optimum at the same point.

Minimum repair bandwidth. Theorem 2: The minimum repair bandwidth optimization problem has a unique optimum point:

Numerical example. File size M=20mb, k=20, n=25. Reed-Solomon: store $\alpha$=1mb, repair $\beta d$=20mb. Min-Storage RC: store $\alpha$=1mb, repair $\beta d$=4.8mb. Min-Bandwidth RC: store $\alpha$=1.65mb, repair $\beta d$=1.65mb. Fundamental tradeoff: what other points are achievable?
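
These numbers can be reproduced from the breakpoint function f(i) that the talk states in Theorem 3, evaluated at i = 0 (minimum storage) and i = k-1 (minimum bandwidth). The sketch below assumes repair from d = n-1 nodes, which the slide does not state explicitly; the function names are hypothetical, `alpha` is per-node storage and `gamma` is the total repair download.

```python
def breakpoint_f(i, M, k, d):
    """f(i) = 2Md / ((2k - i - 1) i + 2k (d - k + 1))."""
    return 2 * M * d / ((2 * k - i - 1) * i + 2 * k * (d - k + 1))


def min_storage_point(M, k, d):
    """Return (alpha, gamma) at the minimum-storage end of the tradeoff."""
    return M / k, breakpoint_f(0, M, k, d)


def min_bandwidth_point(M, k, d):
    """Return (alpha, gamma) at the minimum-bandwidth end, where the
    node stores exactly what it downloads."""
    gamma = breakpoint_f(k - 1, M, k, d)
    return gamma, gamma


M, k, n = 20, 20, 25
d = n - 1  # assumption: the newcomer contacts all surviving nodes

reed_solomon = (M / k, float(M))        # download the whole file to repair
msr = min_storage_point(M, k, d)        # (1.0, 4.8)
mbr = min_bandwidth_point(M, k, d)      # both about 1.655
```

The minimum-bandwidth value is 960/580, about 1.655mb, which the slide lists as 1.65mb.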

Storage-Communication tradeoff. Theorem 3: For any (n,k) code, where each node stores $\alpha$ bits, repairs from d existing nodes and downloads $d\beta=\gamma$ bits, the feasible region is bounded by a piecewise linear function, described as follows:

$$\alpha_{\min} = \begin{cases} \dfrac{M}{k}, & \gamma \in [f(0), +\infty), \\[2mm] \dfrac{M - g(i)\gamma}{k-i}, & \gamma \in [f(i), f(i-1)), \end{cases}$$

where

$$f(i) := \frac{2Md}{(2k-i-1)i + 2k(d-k+1)}, \qquad g(i) := \frac{(2d-2k+i+1)i}{2d}.$$
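
The piecewise function in Theorem 3 can be evaluated as follows. This is a sketch under two assumptions that the transcript does not spell out: the index i runs over 1, ..., k-1, and repair bandwidths below f(k-1) are infeasible (returned as `None`). The function names are hypothetical.

```python
def f(i, M, k, d):
    """Breakpoints of the tradeoff curve on the gamma axis."""
    return 2 * M * d / ((2 * k - i - 1) * i + 2 * k * (d - k + 1))


def g(i, k, d):
    """Slope coefficient of the i-th linear piece."""
    return (2 * d - 2 * k + i + 1) * i / (2 * d)


def alpha_min(gamma, M, k, d):
    """Minimum per-node storage for total repair download gamma,
    or None if gamma is below the smallest feasible bandwidth."""
    if gamma >= f(0, M, k, d):
        return M / k
    for i in range(1, k):  # assumed index range 1..k-1
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, k, d) * gamma) / (k - i)
    return None


# Endpoints for the numerical example (M=20, k=20, d=24)
gamma_msr = f(0, 20, 20, 24)
gamma_mbr = f(19, 20, 20, 24)
alpha_at_msr = alpha_min(gamma_msr, 20, 20, 24)
alpha_at_mbr = alpha_min(gamma_mbr, 20, 20, 24)
```

At the smallest feasible bandwidth the required storage equals the download, and at or above f(0) it equals M/k, so the two named regenerating codes sit at the ends of this curve.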

Storage-Communication tradeoff. (Figure: tradeoff curve against repair bandwidth $\beta d$, with the Min-Bandwidth Regenerating code and the Min-Storage Regenerating code marked.)

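Open Problem: Systematic repair. (Figure: nodes a and b store 1mb each, alongside coded nodes c and d; e is the newcomer.) From Theorem 1, a (4,2) MDS code can be repaired with

$$\alpha_{MDS} = \frac{M}{k}, \qquad \beta_{MDS} = \frac{M}{k}\cdot\frac{1}{n-k}.$$

What if we require perfect reconstruction, e = a?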

Repair vs Systematic Repair. (Figure: nodes x_1, x_2, ..., x_n, a newcomer rebuilding x_1 with downloads $\beta$ from d nodes, and data collectors.) Repair = multicasting. Systematic repair = multicasting with intermediate nodes having (overlapping) requests. Cut arguments might not be tight. Linear codes might not suffice (Dougherty et al.).

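Systematic Repair: (4,2) example. Each node stores two symbols:

node 1: x1, x2
node 2: x3, x4
node 3: x1+x3, x2+x4
node 4: x1+2x3, 2x2+3x4

To rebuild x1 and x2, each surviving node sends one combination:

node 2: x3+x4
node 3: x1+x2+x3+x4
node 4 (coefficients 2^-1, 3^-1): 2^-1 x1 + 2·3^-1 x2 + x3+x4

(Wu and D., ISIT 2009)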
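
In this example the unwanted term x3+x4 appears identically in all three downloads, so the newcomer can cancel it and solve a 2x2 system for x1 and x2. The sketch below checks this by brute force. It assumes arithmetic modulo the prime 7, since the slides do not name a field (any prime other than 2 and 3 would do), and the function name is hypothetical.

```python
P = 7  # assumed prime field size; must not be 2 or 3


def inv(a):
    """Multiplicative inverse modulo P (Python 3.8+)."""
    return pow(a, -1, P)


def repair_node_1(x1, x2, x3, x4):
    """Rebuild (x1, x2) from one symbol sent by each surviving node."""
    node2 = (x3, x4)
    node3 = ((x1 + x3) % P, (x2 + x4) % P)
    node4 = ((x1 + 2 * x3) % P, (2 * x2 + 3 * x4) % P)

    # One combined symbol per survivor, as listed on the slide
    s2 = (node2[0] + node2[1]) % P
    s3 = (node3[0] + node3[1]) % P
    s4 = (inv(2) * node4[0] + inv(3) * node4[1]) % P

    # Cancel the aligned interference x3 + x4
    u = (s3 - s2) % P  # x1 + x2
    v = (s4 - s2) % P  # 2^-1 x1 + 2*3^-1 x2

    # Solve [[1, 1], [2^-1, 2*3^-1]] (x1, x2)^T = (u, v)^T
    a, b, c, d = 1, 1, inv(2), (2 * inv(3)) % P
    det_inv = inv((a * d - b * c) % P)
    r1 = ((d * u - b * v) * det_inv) % P
    r2 = ((a * v - c * u) * det_inv) % P
    return r1, r2


all_recovered = all(
    repair_node_1(w, x, y, z) == (w, x)
    for w in range(P) for x in range(P)
    for y in range(P) for z in range(P)
)
```

The newcomer downloads three symbols to restore two, which is the same 1.5x ratio as in the functional repair example, but here the lost node's exact contents are restored.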

What is known about systematic repair. For (n,2), systematic repair can match the cutset bound [WD ISIT 09]. A (5,3) MSR systematic code exists (Cullina, D., Ho, Allerton 09). For k/n <= 1/2, systematic repair can match the cutset bound [Rashmi, Shah, Kumar, Ramchandran (2010)] [Suh, Ramchandran (2010)]. What can be done for high rates?

What is known about systematic repair. Given an error-correcting code, find the repair coefficients that reduce communication (over a field). Given some channel matrices, find the beamforming matrices that maximize the DoF (Cadambe and Jafar, Suh and Tse). (Papailiopoulos & D., working paper)

Distributed caching in mobiles. Network codes designed for distributed storage (regenerating codes) greatly reduce the communication required to maintain the desired redundancy. Nodes cache different content in a distributed way. Which content to cache? How much to store? How to find peers that have the desired content? Incentives for people to donate storage/bandwidth?

How much to store. Two files, each of size 1. Fix a total redundancy of 2. How to allocate storage?

How much to store. Coding helps, but finding the best allocation is nontrivial.

An easier problem

Allocations for one object


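Problem Description.

$$\max_{x}\ \Pr\Big[\sum_{i=1}^{n} x_i \mathbf{1}_i \ge 1\Big] \quad \text{s.t.}\quad \sum_{i=1}^{n} x_i \le T,$$

where $x_i$ is the amount stored on node i and $\mathbf{1}_i$ indicates that node i is available. Can be generalized to other models of node availability. Nonconvex problem. Harder than it looks.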

Distributed storage allocations. Symmetric allocations can be suboptimal: given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman; see also S. Jain, M. Demmer, R. Patra, and K. Fall, SIGCOMM 05.
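
The claim for n = 5, T = 12/5, p = 0.9 can be checked by enumerating all subsets of available nodes. The transcript does not preserve the nonsymmetric allocation shown on the slide, so the one used below, (3/5, 3/5, 2/5, 2/5, 2/5), is an illustrative assumption that happens to beat every symmetric allocation. "Symmetric" is taken here to mean an equal share T/m on m nodes and nothing on the rest; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def success_probability(allocation, p):
    """Probability that the available nodes together hold at least one
    unit of data, with each node available independently w.p. p."""
    n = len(allocation)
    total = 0.0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(allocation[i] for i in subset) >= 1:
                total += p ** r * (1 - p) ** (n - r)
    return total


def best_symmetric(n, T, p):
    """Best (probability, m) over allocations of T/m to m nodes."""
    best = (0.0, 0)
    for m in range(1, n + 1):
        alloc = [T / m] * m + [Fraction(0)] * (n - m)
        best = max(best, (success_probability(alloc, p), m))
    return best


T = Fraction(12, 5)
sym_prob, sym_m = best_symmetric(5, T, 0.9)

# Assumed example allocation (not taken from the slides); sums to 12/5
nonsym = [Fraction(3, 5)] * 2 + [Fraction(2, 5)] * 3
nonsym_prob = success_probability(nonsym, 0.9)
```

Under these assumptions the best symmetric choice spreads over 4 nodes and succeeds with probability about 0.9963, while the nonsymmetric allocation reaches about 0.9971. Exact fractions are used for the stored amounts so that sums equal to exactly 1 are classified correctly.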

Distributed storage allocations. Results can be obtained for different access models. For the i.i.d. model, maximal spreading x = T/n was shown to have asymptotically zero gap from optimality if Tp > 1 (Leong, D., Ho, NetCod 2009; Globecom submitted).
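
A small sketch of why the condition Tp > 1 matters for maximal spreading: with x = T/n on every node, recovery succeeds when at least n/T nodes are available, and the expected number available is np. The code below only computes the success probability of maximal spreading; it does not compute the optimum, so it illustrates the regime rather than the optimality gap itself. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def spreading_success(n, p, T):
    """Success probability when each of n nodes stores T/n and is
    available independently with probability p."""
    need = ceil(n / T)  # fewest available nodes whose shares reach 1
    return sum(
        comb(n, m) * p ** m * (1 - p) ** (n - m)
        for m in range(need, n + 1)
    )


T = Fraction(12, 5)
small_n = spreading_success(5, 0.9, T)      # Tp = 2.16 > 1
large_n = spreading_success(100, 0.9, T)    # approaches 1 as n grows
low_p = spreading_success(100, 0.3, T)      # Tp = 0.72 < 1, stays small
```

Passing T as a `Fraction` keeps the threshold `ceil(n / T)` exact.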

Open Problems. Are cut-set bounds tight? Are linear codes sufficient? What is the limit of interference alignment techniques? Repairing codes in small fields? Existing codes used in storage (e.g. EvenOdd code, B-Code, etc.)? Dealing with bit-errors (security)? (Dikaliotis, Ho, D., ISIT 10) What is the role of (non-trivial) network topologies? Allocations for multiple objects?

Coding for Storage wiki

fin

Conclusions. We proposed a theoretical framework for analyzing encoded information representations. Repair reduces to network coding, and flow arguments completely characterize what is possible. We identified and characterized a tradeoff between repair bandwidth and storage for any storage system. Numerous interesting questions in coding for data centers: repair/updates/disk IO vs network bandwidth. Systematic, deterministic, small finite field constructions are very interesting for real applications.