Network Coding for Distributed Storage Alex Dimakis USC
Overview. Motivation: data centers; mobile distributed storage for D2D. Specific storage problems: fundamental tradeoff between repair communication and storage; systematic repair (open problem); distributed storage allocations (open problem).
Motivation: Data centers. Warehouse-sized computing and storage facilities. Cost in the hundreds of millions. Large-scale distributed storage: thousands of servers, petabytes of disk space. Data centers are the next computing platform: web search, indexing, Gmail, Facebook, video storage.
Massive distributed data storage. Numerous disk failures per day. Must introduce redundancy in stored information. Replication or erasure coding? Coding can give orders of magnitude more reliability, but problems in creating and maintaining an encoded data representation have to be addressed.
Distributed caching in mobiles. Infrastructure is slow to deploy and upgrade. Delivery with opportunistic contacts [7DS, Haggle, ]. Extends coverage and capacity using free D2D bandwidth. Scales as the network gets dense [Grossglauser/Tse02]. 5/5/10
Distributed caching in mobiles. The video you want to watch is very likely to be downloaded by people nearby in the next day. Storage in phones is increasing more than anything else. Cache the popular content and use D2D to share.
MDS erasure codes. A file or data object is split into k=2 pieces, A and B. With n=3, store A, B, A+B: a (3,2) MDS code (single parity), used in RAID 5. With n=4, store A, B, A+B, A+2B: a (4,2) MDS code, which tolerates any 2 failures and is used in RAID 6.
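The (4,2) code above can be sketched in a few lines of Python. This is only an illustration: the slide does not name a field, so the prime modulus `P = 257` and the helper names `encode` and `decode` are assumptions. The sketch checks that any two of the four stored symbols recover A and B.

```python
from itertools import combinations

P = 257  # assumed prime field size (not specified on the slide)

# Rows of the (4,2) generator matrix: A, B, A+B, A+2B.
GENERATOR = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the four coded symbols stored on the four nodes."""
    return [(ga * a + gb * b) % P for ga, gb in GENERATOR]

def decode(i, yi, j, yj):
    """Recover (a, b) from the symbols held by any two distinct nodes i and j."""
    (p, q), (r, s) = GENERATOR[i], GENERATOR[j]
    det_inv = pow((p * s - q * r) % P, -1, P)  # inverse of the 2x2 determinant
    a = ((s * yi - q * yj) * det_inv) % P      # Cramer's rule
    b = ((p * yj - r * yi) * det_inv) % P
    return a, b

coded = encode(42, 200)
recovered = {decode(i, coded[i], j, coded[j]) for i, j in combinations(range(4), 2)}
print(recovered)
```

Every 2x2 submatrix of the generator has a nonzero determinant modulo an odd prime, which is what makes the code MDS in this sketch.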
Erasure codes are reliable. Replication vs. a (4,2) MDS erasure code (any 2 suffice to recover). [Figure: a file or data object split into A and B, stored as A, A, B, B under replication and as A, B, A+B, A+2B under the MDS code.] Erasure coding introduces redundancy in an optimal way. Very useful in practice, e.g. Reed-Solomon codes and fountain codes (LT and Raptor). Current storage architectures still use replication (Gmail makes 21 copies(!)). Can we improve storage efficiency? Replication: Pr[failure]=0.43. MDS erasure code: Pr[failure]=0.31.
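The two failure probabilities quoted on this slide are consistent with a simple model in which each of the four nodes fails independently with probability 1/2. That model is inferred here, not stated in the talk, and the replication layout (two copies of A, two of B) is likewise an assumption. A brute-force check under those assumptions:

```python
from itertools import product

Q = 0.5  # assumed independent per-node failure probability

def failure_probability(survives):
    """Sum the probability of every 4-node up/down pattern that loses the file."""
    total = 0.0
    for alive in product([True, False], repeat=4):
        if not survives(alive):
            prob = 1.0
            for up in alive:
                prob *= (1 - Q) if up else Q
            total += prob
    return total

# Replication: nodes 0, 1 hold A and nodes 2, 3 hold B; one copy of each is needed.
replication = failure_probability(lambda up: (up[0] or up[1]) and (up[2] or up[3]))
# (4,2) MDS code: any two surviving nodes suffice.
mds = failure_probability(lambda up: sum(up) >= 2)
print(replication, mds)  # 0.4375 0.3125
```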
New open problems. [Figure labels: A, B, network traffic.] Issues: communication, update complexity, repair communication?
Code Repair: problem statement. Repairing the code in distributed environments. [Figure: nodes a and b of 1mb each, further nodes c and d, and a newcomer e.] Assume we have a (4,2) MDS code and one node leaves the system. How much data does a newcomer (e) have to download to construct a new encoded packet?
Code Repair: first thoughts. [Figure: nodes a, b, a+b, a+2b of 1mb each, and a newcomer e.] When coding is used, creating new fragments is not a trivial task: the naive approach needs access to the entire data object to create a new fragment. Downloading 2mb definitely works, but the newcomer (e) is downloading 2mb to store only 1mb! Q: Is it possible to download less data? It is possible to download 1.5mb!
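Reducing repair bandwidth. [Figure: each 1mb packet is split into two halves. The surviving nodes store (b1, b2), (a1+b1, a2+b2) and (a1+2b1, a2+2b2). Each sends the newcomer one half-size combination: b1+b2 (coefficients 1, 1), a1+b1+2a2+2b2 (coefficients 1, 2) and a1+2b1+3a2+6b2 (coefficients 1, 3). The newcomer e forms its two stored halves from these three packets.]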
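The repair in this figure can be checked mechanically. The sketch below represents every half-packet as a coefficient vector over (a1, a2, b1, b2) and works modulo a prime. The field size `P = 11` and the newcomer's mixing coefficients `[1, 1, 1]` and `[1, 2, 3]` are hypothetical choices, since the slide text does not give them. The download coefficients are the ones on the slide. The check is that any two of the four nodes still span all four symbols after repair.

```python
from itertools import combinations

P = 11  # assumed prime field; the slide does not name one

def rank_mod_p(rows):
    """Rank of a matrix over GF(P) by Gaussian elimination."""
    rows = [[x % P for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)
        rows[rank] = [(x * inv) % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % P for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def combine(coeffs, vectors):
    """Linear combination of coefficient vectors over GF(P)."""
    return [sum(c * v[t] for c, v in zip(coeffs, vectors)) % P
            for t in range(len(vectors[0]))]

# Surviving nodes, as vectors over the basis (a1, a2, b1, b2).
node_b = [[0, 0, 1, 0], [0, 0, 0, 1]]   # b1, b2
node_c = [[1, 0, 1, 0], [0, 1, 0, 1]]   # a1+b1, a2+b2
node_d = [[1, 0, 2, 0], [0, 1, 0, 2]]   # a1+2b1, a2+2b2

# One half-size download per surviving node, using the slide's coefficients.
downloads = [combine([1, 1], node_b),    # b1+b2
             combine([1, 2], node_c),    # a1+b1+2a2+2b2
             combine([1, 3], node_d)]    # a1+2b1+3a2+6b2

# Hypothetical mixing coefficients for the two halves the newcomer stores.
node_e = [combine([1, 1, 1], downloads), combine([1, 2, 3], downloads)]

nodes = [node_b, node_c, node_d, node_e]
still_mds = all(rank_mod_p(x + y) == 4 for x, y in combinations(nodes, 2))
print(downloads, still_mds)
```

Three half-size packets amount to 1.5mb when each stored packet is 1mb, matching the figure quoted in the talk.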
Repair Bandwidth for MDS. Theorem 1: For $(n,k)$ MDS codes, if each node stores $\alpha$ bits and the newcomer downloads $\beta$ bits from each existing node, the minimum repair bandwidth is attained at $\alpha_{MDS} = \frac{M}{k}$, $\beta_{MDS} = \frac{M}{k} \cdot \frac{1}{n-k}$. Proof by reduction to a flow on an (infinite) graph. (D, Godfrey, Wu, Wainwright, Ramchandran, IT Transactions (to appear))
Proof sketch: information flow graph. [Figure: source S feeds storage nodes a, b, c, d, each holding 1mb; the newcomer e downloads β from each of three surviving nodes; data collectors read from the stored nodes.] A data collector that reads e and one surviving node sees a cut of $1 + 2\beta$, which must be at least the 2mb file size, so $\beta \ge 1/2$mb and the total download is 1.5mb.
Proof sketch: reduction to multicasting. [Figure: the same information flow graph with one data collector for every pair of nodes.] Repairing a code = multicasting on the information flow graph. Repair is feasible iff the minimum of the min cuts is at least the file size M. (Ahlswede et al.; Koetter & Medard; Ho et al.)
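The min-cut condition can be checked numerically on the (4,2) example. The sketch below builds the information flow graph and runs a plain Edmonds-Karp max flow for every data collector. It assumes node d is the one that failed, measures capacities in units of 0.1mb so they stay integers, and uses made-up node names such as `a_in` and `a_out`.

```python
from collections import defaultdict, deque
from itertools import combinations

INF = 10**9

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity map (mutated in place)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push   # residual edge
        flow += push

def min_cut_over_collectors(alpha, beta):
    """Smallest max flow over all data collectors reading any 2 live nodes."""
    best = INF
    for pair in combinations("abce", 2):
        cap = defaultdict(lambda: defaultdict(int))
        for x in "abcd":
            cap["S"][x + "_in"] = INF
            cap[x + "_in"][x + "_out"] = alpha      # storage per node
        for x in "abc":                             # assumed: d failed, e repairs from a, b, c
            cap[x + "_out"]["e_in"] = beta
        cap["e_in"]["e_out"] = alpha
        for x in pair:
            cap[x + "_out"]["DC"] = INF
        best = min(best, max_flow(cap, "S", "DC"))
    return best

print(min_cut_over_collectors(alpha=10, beta=5))  # 20
print(min_cut_over_collectors(alpha=10, beta=4))  # 18
```

With β = 0.5mb every collector can pull the full 2mb file; with β = 0.4mb a collector that reads e and one helper gets only 1.8mb, so repair at that rate is infeasible.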
Overview. Motivation: distributed storage in data centers. The code repair problem. Minimizing repair bandwidth. Fundamental tradeoff between repair bandwidth and storage. Systematic repair.
Regenerating codes. [Figure: storage nodes a to g, with labels M/k, β and d.] Repair bandwidth can be greatly reduced if we allow slightly more storage per node.
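Minimizing repair bandwidth. [Figure: the newcomer downloads β from each of d existing nodes.] $\min\ \beta d$ subject to $\mathrm{MinCut}(DC_i) \ge M$ for all $i$, with $d \in \{k, k+1, \ldots, n-1\}$. This problem can be solved analytically.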
Ingredient 1: bounding the flow. Lemma: for any (potentially infinite) graph $G(\alpha,\beta,d)$, any data collector $DC$ has flow at least $\mathrm{MinCut}(DC) \ge \sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\}$. Proof: sort topologically, count. The bound is tight since it is satisfied with equality for the graph shown.
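The bound in the lemma is easy to evaluate directly. A small sketch follows (the function name `cut_lower_bound` is illustrative), applied to the (4,2) example from the proof sketch with α = 1mb, d = 3 helpers and file size M = 2mb:

```python
from fractions import Fraction

def cut_lower_bound(alpha, beta, d, k):
    """Sum over i = 0..k-1 of min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(k))

print(cut_lower_bound(1, Fraction(1, 2), d=3, k=2))  # 2
print(cut_lower_bound(1, Fraction(2, 5), d=3, k=2))  # 9/5
```

With β = 1/2 the bound equals the 2mb file size, so repair is feasible; with β = 2/5 it drops to 1.8mb, below M.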
Ingredient 2: just relax. $\min\ \beta d$ subject to $\sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\} \ge M$, with $d \in \{k, k+1, \ldots, n-1\}$. Relax the integer constraint. Show that the integer and relaxed problems attain their optimum at the same point.
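As a worked consequence of this constraint, consider the minimum-storage case $\alpha = M/k$. This is a sketch of the standard argument, not a transcription of the slide. Each of the $k$ terms in the sum is at most $\alpha = M/k$, so the sum can reach $M$ only if every term is saturated:

```latex
\begin{align*}
\alpha = \frac{M}{k}:\quad
\sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\} \ge M
  &\iff (d-i)\beta \ge \frac{M}{k} \quad \text{for all } 0 \le i \le k-1 \\
  &\iff \beta \ge \frac{M}{k\,(d-k+1)} , \\
\beta d &\ge \frac{M d}{k\,(d-k+1)} .
\end{align*}
```

The right-hand side decreases as $d$ grows, so the best choice is $d = n-1$, which gives $\beta = \frac{M}{k} \cdot \frac{1}{n-k}$, the value in Theorem 1.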
Minimum repair bandwidth. Theorem 2: The minimum repair bandwidth optimization problem has a unique optimum point:
Numerical example. File size M=20mb, k=20, n=25. Reed-Solomon: store α=1mb, repair βd=20mb. MinStorage-RC: store α=1mb, repair βd=4.8mb. MinBandwidth-RC: store α=1.65mb, repair βd=1.65mb. Fundamental tradeoff: what other points are achievable?
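These numbers can be reproduced from the relaxed constraint alone. The sketch below fixes d = n-1 = 24 and, for a given storage α, finds the smallest β that satisfies the cut bound by bisection. The function names and the bisection approach are illustrative choices, not the talk's method of proof.

```python
def cut_bound(alpha, beta, d, k):
    """Sum over i = 0..k-1 of min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(k))

def min_repair_bandwidth(M, k, d, alpha, tol=1e-12):
    """Smallest d * beta with cut_bound >= M, by bisection on beta (needs k * alpha >= M)."""
    lo, hi = 0.0, float(M)      # beta = M always satisfies the bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cut_bound(alpha, mid, d, k) >= M:
            hi = mid
        else:
            lo = mid
    return d * hi

M, k, n = 20, 20, 25
d = n - 1
min_storage = min_repair_bandwidth(M, k, d, alpha=M / k)   # storage pinned at M/k = 1mb
min_bandwidth = min_repair_bandwidth(M, k, d, alpha=M)     # storage effectively unconstrained
print(f"{min_storage:.3f} {min_bandwidth:.3f}")  # 4.800 1.655
```

At the minimum-bandwidth point the largest term of the sum is dβ, so the node needs to store only α = βd, about 1.655mb, which the slide shows as 1.65mb.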
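Storage-Communication tradeoff. Theorem 3: for any $(n,k)$ code where each node stores $\alpha$ bits, repairs from $d$ existing nodes and downloads $d\beta = \gamma$ bits, the boundary of the feasible region is a piecewise linear function described as follows:
$\alpha_{\min}(\gamma) = \frac{M}{k}$ for $\gamma \in [f(0), +\infty)$, and $\alpha_{\min}(\gamma) = \frac{M - g(i)\gamma}{k - i}$ for $\gamma \in [f(i), f(i-1))$,
where $f(i) := \frac{2Md}{(2k-i-1)i + 2k(d-k+1)}$ and $g(i) := \frac{(2d-2k+i+1)i}{2d}$.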
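The piecewise formula translates directly into code. The sketch below uses exact rationals and assumes the index i runs over 1, ..., k-1, with no feasible point below γ = f(k-1); the slide text does not state the index range, so treat it as an assumption. It is evaluated at the parameters of the numerical example (M = 20, k = 20, d = 24).

```python
from fractions import Fraction

def f(i, M, k, d):
    return Fraction(2 * M * d, (2 * k - i - 1) * i + 2 * k * (d - k + 1))

def g(i, k, d):
    return Fraction((2 * d - 2 * k + i + 1) * i, 2 * d)

def alpha_min(gamma, M, k, d):
    """Minimum storage per node for repair bandwidth gamma, per the theorem's formula."""
    if gamma >= f(0, M, k, d):
        return Fraction(M, k)
    for i in range(1, k):                       # assumed index range 1..k-1
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, k, d) * gamma) / (k - i)
    raise ValueError("gamma is below f(k-1): no feasible storage")

M, k, d = 20, 20, 24
msr = f(0, M, k, d)          # bandwidth at the minimum-storage end
mbr = f(k - 1, M, k, d)      # bandwidth at the minimum-bandwidth end
print(msr, alpha_min(msr, M, k, d))   # 24/5 1
print(mbr, alpha_min(mbr, M, k, d))   # 48/29 48/29
```

The two endpoints match the MinStorage-RC (1mb, 4.8mb) and MinBandwidth-RC (about 1.65mb, 1.65mb) rows of the numerical example.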
Storage-Communication tradeoff. [Figure: tradeoff curve with axis label βd; the endpoints are marked Min-Bandwidth Regenerating code and Min-Storage Regenerating code.]
Open Problem: Systematic repair. [Figure: (4,2) code with 1mb nodes; the newcomer must satisfy e = a.] From Theorem 1, a (4,2) MDS code can be repaired by downloading $\beta_{MDS} = \frac{M}{k} \cdot \frac{1}{n-k}$ from each surviving node (with $\alpha_{MDS} = \frac{M}{k}$). What if we require perfect reconstruction?
Repair vs Systematic Repair. [Figure: nodes x_1, x_2, ..., x_n; a newcomer that must reproduce x_1 downloads β from each of d nodes; data collectors.] Repair = multicasting. Systematic repair = multicasting with intermediate nodes having (overlapping) requests. Cut arguments might not be tight. Linear codes might not suffice (Dougherty et al.).
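Systematic Repair: (4,2) example. [Figure: the four nodes store (x1, x2), (x3, x4), (x1+x3, x2+x4) and (x1+2x3, 2x2+3x4). To rebuild (x1, x2), the newcomer downloads x3+x4, x1+x2+x3+x4, and $2^{-1}x_1 + 2\cdot 3^{-1}x_2 + x_3 + x_4$, the last one formed with coefficients $2^{-1}$ and $3^{-1}$.] (Wu and D., ISIT 2009)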
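This example can be verified by direct computation. The sketch below reads the figure as follows: the three downloads share the same interference term x3+x4, which cancels and leaves two independent equations in x1 and x2. The prime field `P = 11` and the function names are assumptions; any prime of at least 5 keeps the needed inverses well defined.

```python
P = 11  # assumed prime field; P >= 5 keeps 2, 3 and 1/6 invertible

INV2, INV3 = pow(2, -1, P), pow(3, -1, P)

def store(x1, x2, x3, x4):
    """Contents of the four nodes, each holding two half-packets."""
    return [(x1, x2),
            (x3, x4),
            ((x1 + x3) % P, (x2 + x4) % P),
            ((x1 + 2 * x3) % P, (2 * x2 + 3 * x4) % P)]

def repair_node_one(nodes):
    """Rebuild (x1, x2) exactly from one half-size packet per surviving node."""
    _, n2, n3, n4 = nodes
    p2 = (n2[0] + n2[1]) % P                   # x3 + x4
    p3 = (n3[0] + n3[1]) % P                   # x1 + x2 + x3 + x4
    p4 = (INV2 * n4[0] + INV3 * n4[1]) % P     # x1/2 + 2*x2/3 + x3 + x4
    y1, y2 = (p3 - p2) % P, (p4 - p2) % P      # interference x3 + x4 cancelled
    h, t = INV2, (2 * INV3) % P                # y1 = x1 + x2, y2 = h*x1 + t*x2
    det_inv = pow((t - h) % P, -1, P)
    return ((t * y1 - y2) * det_inv) % P, ((y2 - h * y1) * det_inv) % P

ok = all(repair_node_one(store(x1, x2, x3, x4)) == (x1, x2)
         for x1 in range(P) for x2 in range(P)
         for x3 in (0, 3, 7) for x4 in (1, 5, 10))
print(ok)  # True
```

The newcomer again downloads three half-size packets, 1.5 packets in total, but this time recovers the lost systematic data itself rather than a new combination.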
What is known about systematic repair. For (n,2), systematic repair can match the cutset bound [WD ISIT 09]. A (5,3) MSR systematic code exists (Cullina, D, Ho, Allerton 09). For k/n <= 1/2, systematic repair can match the cutset bound [Rashmi, Shah, Kumar, Ramchandran (2010)] [Suh, Ramchandran (2010)]. What can be done for high rates?
What is known about systematic repair. Given an error-correcting code, find the repair coefficients that reduce communication (over a field). Given some channel matrices, find the beamforming matrices that maximize the DoF (Cadambe and Jafar; Suh and Tse). (Papailiopoulos & D, working paper)
Distributed caching in mobiles. Network codes designed for distributed storage (regenerating codes) greatly reduce the communication required to maintain the desired redundancy. Nodes cache different content in a distributed way. Which content to cache? How much to store? How to find peers that have the desired content? Incentives for people to donate storage/bandwidth?
How much to store. Two files, each of size 1. Fix a total redundancy of 2. How to allocate storage?
How much to store. Coding helps, but finding the best allocation is nontrivial.
An easier problem
Allocations for one object
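Problem Description. $\max_{x}\ \mathrm{Prob}\left[\sum_{i=1}^{n} x_i Y_i \ge 1\right]$ subject to $\sum_{i=1}^{n} x_i \le T$ and $x_i \ge 0$, where $Y_i$ indicates whether node $i$ is accessible. Can be generalized to other models of node availability. Nonconvex problem. Harder than it looks.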
Distributed storage allocations. Symmetric allocations can be suboptimal: given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman; also see S. Jain, M. Demmer, R. Patra, and K. Fall, SIGCOMM 05.
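For n = 5 the claim can be checked by enumerating all access patterns. The sketch below assumes each node is accessible independently with probability p and that a unit-size object is recovered when the accessible storage sums to at least 1. The nonsymmetric allocation (3/5, 3/5, 2/5, 2/5, 2/5) is one example found for this illustration and is not necessarily the allocation shown on the slide.

```python
from fractions import Fraction
from itertools import product

def recovery_probability(allocation, p):
    """P[sum of x_i over accessible nodes >= 1], nodes accessible i.i.d. with prob. p."""
    total = Fraction(0)
    for pattern in product([0, 1], repeat=len(allocation)):
        if sum(x for x, up in zip(allocation, pattern) if up) >= 1:
            prob = Fraction(1)
            for up in pattern:
                prob *= p if up else 1 - p
            total += prob
    return total

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)

# Symmetric allocations: spread the budget T evenly over m of the n nodes.
symmetric = {m: recovery_probability([T / m] * m + [Fraction(0)] * (n - m), p)
             for m in range(1, n + 1)}
best_m = max(symmetric, key=symmetric.get)

# One nonsymmetric allocation with the same budget (chosen for illustration).
nonsymmetric = [Fraction(3, 5)] * 2 + [Fraction(2, 5)] * 3
print(best_m, float(symmetric[best_m]), float(recovery_probability(nonsymmetric, p)))
# 4 0.9963 0.99711
```

Exact rationals are used so that sums such as 3/5 + 2/5 compare correctly against the threshold 1.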
Distributed storage allocations. Results can be obtained for different access models. For the i.i.d. model, maximal spreading ($x_i = T/n$) was shown to have asymptotically zero gap from optimality if $Tp > 1$ (Leong, D., Ho, NetCod 2009; Globecom submitted).
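Under maximal spreading the recovery probability has a closed binomial form, which makes the role of Tp > 1 easy to see numerically. The sketch below assumes the i.i.d. access model with a unit-size object; the values T = 2 and p = 3/5 are hypothetical. It illustrates only that recovery becomes near-certain as n grows, not the optimality gap result itself.

```python
from fractions import Fraction
from math import ceil, comb

def maximal_spreading_probability(n, T, p):
    """Recovery probability when every node stores T/n (i.i.d. access with prob. p)."""
    needed = ceil(Fraction(n) / T)     # accessible nodes required to reach 1
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(needed, n + 1))

T, p = Fraction(2), Fraction(3, 5)     # hypothetical values with T * p > 1
for n in (20, 200):
    print(n, float(maximal_spreading_probability(n, T, p)))
```

When T·p > 1 the expected accessible storage n·p·(T/n) exceeds 1, so the binomial tail concentrates above the threshold as n increases.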
Open Problems. Cut-set bounds tight? Linear codes sufficient? What is the limit of interference alignment techniques? Repairing codes in small fields? Existing codes used in storage (e.g. EvenOdd code, B-Code, etc.)? Dealing with bit-errors (security)? (Dikaliotis, Ho, D, ISIT 10) What is the role of (non-trivial) network topologies? Allocations for multiple objects?
Coding for Storage wiki
fin
Conclusions. We proposed a theoretical framework for analyzing encoded information representations. Repair reduces to network coding, and flow arguments completely characterize what is possible. We identified and characterized a tradeoff between repair bandwidth and storage for any storage system. Numerous interesting questions remain in coding for data centers: repair, updates, and disk I/O vs network bandwidth. Systematic, deterministic, small-finite-field constructions are very interesting for real applications.