Erasure Coding for Cloud Communication and Storage Cheng Huang, Jin Li Microsoft Research Tutorial at IEEE ICC (June, 2014) 1
Erasure coding has become a key technology in realizing the vision, with a focus on improving network performance and reducing cloud storage cost
Erasure Coding 101
encoding / decoding
replication: data a=2, b=3; stored as a=2, a=2, b=3, b=3
simple parity: data a=2, b=3; stored as a=2, b=3, a+b=5
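The parity slide can be made concrete with a minimal sketch. It uses plain integer addition, as on the slide (a + b = 5); the function names are illustrative and not from the tutorial, and practical systems usually work with XOR over bytes instead.

```python
def encode_parity(a, b):
    """Store both data symbols plus one parity symbol (their sum)."""
    return [a, b, a + b]

def recover(stored):
    """Rebuild a single erased symbol (marked None) from the other two."""
    a, b, p = stored
    if a is None:
        a = p - b
    elif b is None:
        b = p - a
    elif p is None:
        p = a + b
    return [a, b, p]

stored = encode_parity(2, 3)
print(stored)                     # [2, 3, 5]
print(recover([None, 3, 5]))      # [2, 3, 5]
```

Any one of the three stored symbols can be lost, at a storage cost of 1.5x instead of the 2x needed by replication.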
n = 4, k = 2: data a=2, b=3; stored as a, b, a + b, a + 2b
Addition table:
+ | 0 1 2 3
0 | 0 1 2 3
1 | 1 0 3 2
2 | 2 3 0 1
3 | 3 2 1 0

Multiplication table:
X | 0 1 2 3
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 3 1
3 | 0 3 1 2

Encoding: a=2, b=3 gives a + b = 1 and a + 2xb = 3
Decoding: with a=? and b=? lost, solve a + b = 1 and a + 2xb = 3 to obtain a=2, b=3
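The (n = 4, k = 2) example can be replayed in a few lines using the two tables from this slide. This is only a sketch: the brute-force decoder and the names ADD, MUL, COEFFS, encode and decode are illustrative choices, not the tutorial's implementation.

```python
from itertools import combinations

# Finite-field tables copied from the slide (addition is XOR).
ADD = [[0, 1, 2, 3],
       [1, 0, 3, 2],
       [2, 3, 0, 1],
       [3, 2, 1, 0]]
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

# Each stored symbol is c1*a + c2*b for one coefficient pair (c1, c2).
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]     # a, b, a+b, a+2b

def encode(a, b):
    return [ADD[MUL[c1][a]][MUL[c2][b]] for c1, c2 in COEFFS]

def decode(known):
    """known maps symbol index -> value for any 2 surviving symbols.
    Brute force over the 16 candidate (a, b) pairs."""
    matches = [(a, b) for a in range(4) for b in range(4)
               if all(encode(a, b)[i] == v for i, v in known.items())]
    assert len(matches) == 1, "two symbols should pin down (a, b)"
    return matches[0]

stored = encode(2, 3)
print(stored)                               # [2, 3, 1, 3]
print(decode({2: stored[2], 3: stored[3]})) # (2, 3)

# Any 2 of the 4 stored symbols are enough.
for i, j in combinations(range(4), 2):
    assert decode({i: stored[i], j: stored[j]}) == (2, 3)
```

The stored values match the slide (a + b = 1, a + 2xb = 3), and the final loop checks that every pair of surviving symbols recovers the data.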
Figure: stored symbols a, b, a + b, a + 2xb
X 0 1 1 1 X 1 1 1 0
Erasure coding has been around for decades. The purpose of this tutorial is to cover innovative designs and applications of erasure coding from recent years that address the needs of new application scenarios
Erasure Coding in Cloud-Based Social Gaming
How to ensure a universally smooth gaming experience? Improve the tail!
interaction gap
Many messages arriving late
Plot: latency (ms, axis 0 to 2500) at the 95%, 99%, and 99.9% percentiles; US/CAN & Europe only
Imagine what's next: open to all markets, launch on mobile
Figure: packet exchange timeline with S: and R: counters; time axis 0, RTT, 2RTT, 3RTT
Plot: latency (ms, axis 0 to 2500) at the 95%, 99%, and 99.9% percentiles, TCP vs. Pangolin; annotation: 60%
Pangolin overhead is only 6.1%!
Latency: end-to-end delay/ping (e.g. 100 ms)
Packet loss: burst or random
Limited bandwidth: e.g. <2 Mbps vs. 100 Mbps for LAN
Effect of Packet Loss on Real-Time Multimedia Communication
Figure: transmission error over time; reconstructed video frame subjected to packet loss
RemoteFX for WAN
Diagram labels: Application, Rate Control, Transport Protocol, Congestion Control, Network Feedback, Transmission Strategy, Original Packets, Coded Packets
Key components
  Estimate the channel condition (packet loss probability) and use a cost function to determine whether to send i) an original packet, ii) an FEC packet, or iii) a resent packet
  Use a random linear code (network coding) to mix packets
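The idea of mixing packets with a random linear code can be sketched as follows. The slide does not say which field is used; this sketch assumes GF(2), where mixing is plain XOR, and represents each packet as one integer. The names rlc_encode and rlc_decode are hypothetical.

```python
import random

def rlc_encode(packets, rng):
    """Return (mask, payload): a random nonzero XOR combination of the packets.
    Bit i of mask says whether packets[i] is included."""
    mask = rng.randrange(1, 1 << len(packets))
    payload = 0
    for i, pkt in enumerate(packets):
        if (mask >> i) & 1:
            payload ^= pkt
    return mask, payload

def rlc_decode(coded, k):
    """Gaussian elimination over GF(2). Returns the k source packets,
    or None if the received combinations do not yet have full rank."""
    basis = {}                      # highest set bit -> (mask, payload)
    for mask, payload in coded:
        while mask:
            top = mask.bit_length() - 1
            if top not in basis:
                basis[top] = (mask, payload)
                break
            mask ^= basis[top][0]
            payload ^= basis[top][1]
    if len(basis) < k:
        return None
    solved = []
    for i in range(k):              # row i only involves packets 0..i
        mask, payload = basis[i]
        for j in range(i):
            if (mask >> j) & 1:
                payload ^= solved[j]
        solved.append(payload)
    return solved

packets = [0x11, 0x22, 0x33]        # toy payloads, one integer per packet
coded = [(0b011, 0x11 ^ 0x22), (0b110, 0x22 ^ 0x33), (0b111, 0x11 ^ 0x22 ^ 0x33)]
print(rlc_decode(coded, 3) == packets)   # True
print(rlc_decode(coded[:2], 3))          # None
```

The receiver does not care which coded packets arrive, only that enough independent combinations do, which is what makes the scheme attractive on lossy links.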
Design Goal
Minimize sequential decodability delay
  The time from when a packet is sent to the time that packet and all previous packets are available
  A good indicator of user-perceived performance
Example: packets A, B, C are sent; B is lost (x) and resent
  Delay for B is much larger because of the retransmission
  Delay for C is larger than the propagation delay alone because it has to wait for B
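The definition of sequential decodability delay can be written down directly. The timings below are made-up numbers for the A, B, C example (10 ms send spacing, 50 ms one-way delay, B's retransmission arriving at 160 ms); only the definition itself comes from the slide.

```python
def sequential_decodability_delays(send_times, arrival_times):
    """delay[i] = time when packet i and all earlier packets are available,
    minus the time packet i was sent."""
    delays = []
    ready = float("-inf")
    for sent, arrived in zip(send_times, arrival_times):
        ready = max(ready, arrived)     # must also wait for earlier packets
        delays.append(ready - sent)
    return delays

send = [0, 10, 20]         # A, B, C
arrive = [50, 160, 70]     # B only arrives after being resent
print(sequential_decodability_delays(send, arrive))   # [50, 150, 140]
```

C arrives after only 50 ms on the wire, yet its sequential decodability delay is 140 ms because it cannot be used before B.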
Delay
Sequential decodability delay is a function of
  Probability of sequential decodability, which is a function of
    Channel loss characteristics (loss increases with network congestion)
    Transmission strategy: which packets have been sent (original/FEC) and the coding structure of the FEC packets
    (delay is caused by the coding structure as well as by retransmissions)
  Propagation delay
  Network queuing delay (increases with network congestion)
How to Minimize Delay: Loss Prevention and Loss Mitigation
1. Don't cause self-induced congestion: this minimizes packet loss and network queuing delay
2. Use a flow control strategy that does not induce loss, e.g. TCP enters its congestion avoidance phase only when it encounters loss; once loss has already happened, you suffer
3. Maximize transmission rate: aggressive ramp-up and state remembering after a burst
4. Use FEC to proactively correct any remaining losses
Components of Sequential Decodability Delay
Terms labelled in the delay equation:
  Expected value of delay for packet l
  Propagation + network queuing delay: the time it takes a packet to travel from sender to receiver
  Probability of sequential decodability of packet l based upon transmitted packets up to k
  Time between transmission opportunities: 1/transmission rate
Minimizing Components of Sequential Decodability Delay
  Propagation + network queuing delay: preventing self-induced congestion minimizes this term (it minimizes network queuing delay)
  Loss-dependent term: minimizing loss minimizes this term, done by
    Using a wise coding strategy
    Backing off on rate increase prior to loss
  Time between transmission opportunities: minimize this term by maximizing the transmission rate
    Quick ramp-up
    State remembering for the initial point
Packet Encoding Strategy
  Assume these terms are constant: it is flow/congestion control's job to minimize them (also assume congestion-based loss is at a minimum)
  Term to minimize over all packets being considered; for practical reasons, only consider those currently in the sender's queue
  By considering all packets currently in the queue, balance
    the gain for packets already sent from sending FEC, against
    the delay of original packets waiting in the queue
  Have a fast algorithm to determine the probability given a particular coding structure and probability of loss
Erasure Coding in P2P Multiparty Conferencing
Multi-party Conferencing Scenario
  Every user wants to view audio/video from all other users and is a source of its own audio/video stream
  Maximize Quality-of-Experience (QoE)
Challenges
  Network bandwidth is limited
  Low end-to-end delay is required
  Network conditions are time-varying
  A distributed solution is needed that does not require global network knowledge
Existing audio/video conferencing products
  Apple iChat AV, Halo, TelePresence, Windows Live Messenger
Comparison of Distribution Approaches
MCU-assisted multicast (Halo): high load on the MCU, expensive, not scalable with an increasing number of peers or groups
Simulcast (Apple iChat AV): as group size and heterogeneity increase, video quality deteriorates due to the peer uplink bandwidth constraint
Peer-assisted multicast: optimal utilization of each peer's uplink bandwidth; no MCU required, but one can assist as a helper
Celebrity (A P2P Multiparty Conferencing with Network Coding)
Objectives
  Stringent end-to-end delay requirement: <200 ms
  Unknown network topology
  Limited and unknown network bandwidth
  Time-varying network conditions
Celebrity: Overview
Data multicast via a hybrid tree and mixing solution
  Source sends one copy of the content via low-delay spanning trees
    Can explore all depth-1 and depth-2 trees
  Each node outputs mixtures (network-coded packets) on each link at a certain rate
    Redundancy is provided by network coding
Distributed link rate adaptation that collectively maximizes the delay-limited capacities of the sessions
  Critical sessions and links get more resources
    Approach 1: driven by session and link innovation measurement
    Approach 2: driven by link states and critical cut computation
  Responds to link congestion signals (loss and delay)
    Similar to TCP, TFRC, DCCP
Hybrid Tree + Coding Approach
Distribution trees
  +: Propagation delay is known from the tree structures
  -: Updates of the trees need to be done centrally
  How to deal with packet loss? How to find a good set of trees?
Network coding
  +: Information will find its way to the sinks; built-in resilience to packet losses
  +: Updates of the link rates can potentially be done in a distributed manner
  How to reduce the decoding delay? How to provide a delay guarantee?
Hybrid tree + coding approach
  Best of both worlds (tree packets are always sent immediately)
Network Coding
Experimental Results
Erasure Coding in Video Streaming
coding in network
Figure: overlay network of peers
Timeline of applications (axis: time)
Basic applications: downloading, file sharing (Napster, BitTorrent)
Advanced applications: streaming, VoIP (Skype), video streaming
Figure: blocks a, b, c
chunks
Segment: 4 seconds, 180 KB
Block: 1 KB, 180 blocks/segment
350 kbps video
Figure labels: delivered segments, priority region, outside priority
Erasure Coding in Cloud Storage
Performance, Storage Cost, Reliability: good perf, minimize cost
Replication vs. Reed-Solomon coding (data: a=2, b=3)

                  replication    Reed-Solomon coding
stored            a, a, b, b     a, b, a+b
storage           2x             1.5x
reconstruction    1              2
Causes of reconstruction
  permanent failure
  temporary unavailability (90+%): hot storage nodes, rolling update
With Reed-Solomon coding (a=2, b=3, a+b), reconstruction is on the critical path and frequent enough

                  replication    Reed-Solomon coding
storage           2x             1.5x
reconstruction    1              2
high reconstruction cost inevitable price for erasure coding
Plot: reconstruction cost vs. storage overhead, showing Reed-Solomon codes and replication
Pyramid Codes
Reed-Solomon 12 + 3: 12 data nodes (d1 ... d12) and 3 parity nodes (C1, C2, C3)
reconstruction cost: 12
Pyramid Codes
Construction: take an arbitrary Reed-Solomon (RS) code and split one RS parity into multiple local parities
12 + 3 RS (12 data nodes d1 ... d12, 3 parity nodes C1, C2, C3) becomes 12 + 4 Pyramid (C1 split into C1,1 and C1,2)
reconstruction cost: 6
  d1 d2 d3 d4 d5 d6       C1,1
  d7 d8 d9 d10 d11 d12    C1,2
  C2 C3
CASE I (layout: d1 ... d6 with C1,1; d7 ... d12 with C1,2; global C2, C3)
  recover d5 from C1,1
  recover d8 and d12 from C2 and C3
CASE II (layout: d1 ... d6 with C1,1; d7 ... d12 with C1,2; global C2, C3)
  combine C1,1 and C1,2 into C1
  this converts the 12 + 4 Pyramid code back to the 12 + 3 RS code
  recover the 3 failures (d8, d11 and d12) in the RS code
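Splitting a parity and combining it again can be illustrated with XOR. This sketch assumes, for simplicity, that C1 is the plain XOR of all 12 data fragments; in an actual Reed-Solomon code the parities are finite-field combinations, but the split into two halves works the same way. The data values are made up.

```python
from functools import reduce
from operator import xor

def xor_all(values):
    return reduce(xor, values, 0)

data = [17, 42, 7, 99, 5, 64, 23, 8, 81, 3, 250, 12]   # d1..d12 (made-up bytes)

c1 = xor_all(data)            # one parity over all 12 fragments
c1_1 = xor_all(data[:6])      # local parity over d1..d6
c1_2 = xor_all(data[6:])      # local parity over d7..d12

# Combining the two local parities gives back the original parity (CASE II).
print(c1_1 ^ c1_2 == c1)      # True

def recover_local(group, parity, lost):
    """Rebuild group[lost] from the 5 surviving fragments plus the local parity."""
    survivors = [v for i, v in enumerate(group) if i != lost]
    return parity ^ xor_all(survivors), len(survivors) + 1

value, reads = recover_local(data[:6], c1_1, lost=4)    # d5, as in CASE I
print(value == data[4], reads)                          # True 6
```

A single lost fragment is rebuilt from 6 reads inside its own group rather than 12 reads across the whole stripe.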
Pyramid layout: parities C1, C2, C3 over d1 ... d12
reconstruction cost of d1: 3
Pyramid layout: parities C1, C2, C3 over d1 ... d12
reconstruction cost of d1 and d2: 6
Pyramid layout: parities C1, C2, C3 over d1 ... d12
decoding is analogous to climbing up the Pyramid
Plot: reconstruction cost vs. storage overhead, showing Reed-Solomon codes, Pyramid Codes, and replication
Maximal Recoverability
  d1 d2 d3 d4 d5 d6       C1,1
  d7 d8 d9 d10 d11 d12    C1,2
  C2 C3
Decoding Tanner graph
  Left: failed data nodes (d5, d6, d8, d12)
  Right: surviving parity nodes (C1,1, C1,2, C2, C3)
Recoverability Theorem: recoverability is tied to a full matching in the decoding Tanner graph
Here the decoding Tanner graph contains a full matching
Decoding Tanner graph
  Left: failed data nodes (d1, d2, d5, d6)
  Right: parity nodes (labelled C1,1, C1,2, C3, C4 in the figure)
Here the decoding Tanner graph contains no full matching
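The full-matching condition for the two failure patterns can be checked with a small bipartite matching routine. This is a sketch under stated assumptions: it uses the 12 + 4 Pyramid layout (d1 ... d6 under C1,1, d7 ... d12 under C1,2, and C2, C3 covering everything), assumes all four parities survive, and tests only the matching condition, not actual decoding. The function names are illustrative.

```python
GROUP1 = {f"d{i}" for i in range(1, 7)}      # covered by C1,1

def neighbours(node):
    """Parities whose equation involves this data node."""
    local = "C1,1" if node in GROUP1 else "C1,2"
    return [local, "C2", "C3"]

def has_full_matching(failed):
    """True if every failed data node can be paired with a distinct parity."""
    match = {}                                # parity -> data node

    def try_assign(node, seen):
        for parity in neighbours(node):
            if parity in seen:
                continue
            seen.add(parity)
            # Take a free parity, or re-route the node currently using it.
            if parity not in match or try_assign(match[parity], seen):
                match[parity] = node
                return True
        return False

    return all(try_assign(node, set()) for node in failed)

print(has_full_matching(["d5", "d6", "d8", "d12"]))   # True
print(has_full_matching(["d1", "d2", "d5", "d6"]))    # False
```

In the second pattern all four failures sit in the first group, so only three parities (C1,1, C2, C3) involve them and no full matching can exist.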
First class of MR codes
MR codes in cloud deployment (Windows Azure Storage)
LRC in Windows Azure Storage
Three replicas of a sealed extent (3 GB)
Reed-Solomon 6 + 3: d0 d1 d2 d3 d4 d5 p1 p2 p3
storage overhead: 3x -> 1.5x
reconstruction cost: 6
used in Google GFS II (as of 2012)
sealed extent (3 GB): d0 d1 d2 d3 d4 d5 p0 p1 p2
overhead: (6+3)/6 = 1.5x
sealed extent (3 GB)
  d0 ... d5, p0 p1 p2: overhead (6+3)/6 = 1.5x
  d0 ... d11, p0 p1 p2 p3: overhead (12+4)/12 = 1.33x
12 + 4 (d0 ... d11, p0 ... p3): reconstruction is twice as expensive, requiring 12 fragments (12 disk I/Os, 12 net transfers)
Conventional Reed-Solomon Coding, sealed extent (3 GB)

layout                           Storage Overhead    Reconstruction Cost
6 + 3  (d0 ... d5, p1 ... p3)    1.5x                6 reads
12 + 4 (d0 ... d11, p1 ... p4)   1.33x               12 reads

LRC
sealed extent (3 GB): x0 x1 x2 x3 x4 x5 | y0 y1 y2 y3 y4 y5
LRC 12+2+2: 12 data fragments, 2 local parities and 2 global parities
storage overhead: (12 + 2 + 2) / 12 = 1.33x
Local parity: reconstruction requires only 6 fragments
LRC 12+2+2 reliability: RS 12+4 > LRC 12+2+2 > RS 6+3
Plot: Reconstruction Read Cost (0 to 12) vs. Storage Overhead (1.2 to 2), Reed-Solomon vs. LRC
  Reed-Solomon points: RS 12+4, RS 10+4, RS 6+3
  LRC points: LRC (12+2+2), LRC (12+4+2)
  Annotations: same cost, 1.5x -> 1.33x; same overhead, half cost (6 -> 3)
RS 10+4: HDFS-RAID at Facebook
RS 6+3: GFS II (Colossus) at Google
RS (6 + 3): reconstruction cost = 6
RS (14 + 4): reconstruction cost = 14
LRC (14 + 2 + 2): reconstruction cost = 7
14% savings: millions of $ savings!
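The numbers on this slide follow from simple arithmetic. In the sketch below, the 14% figure is read as the storage saving of LRC (14 + 2 + 2) relative to RS (6 + 3); that reading is an assumption, as are the helper names. Reconstruction cost is counted as the number of fragments read to rebuild one lost data fragment.

```python
def overhead(data, parities):
    """Stored fragments divided by data fragments."""
    return (data + parities) / data

def rs_reads(data):
    """Reed-Solomon rebuilds one fragment from `data` other fragments."""
    return data

def lrc_reads(data, local_groups):
    """LRC rebuilds a data fragment inside its local group."""
    return data // local_groups

print(overhead(6, 3), rs_reads(6))                      # 1.5 6
print(round(overhead(14, 4), 2), rs_reads(14))          # 1.29 14
print(round(overhead(14, 4), 2), lrc_reads(14, 2))      # 1.29 7

saving = 1 - overhead(14, 4) / overhead(6, 3)
print(round(saving * 100))                              # 14
```

LRC (14 + 2 + 2) keeps the reconstruction cost close to that of RS (6 + 3) while storing about 14% less.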
LRC in Hadoop
8% cold
10 + 4 Reed-Solomon: x0 ... x9, p0 p1 p2 p3
single failure reconstruction requires 10 fragments (10 disk I/Os, 10 net transfers)
Data x0 ... x9 (coefficients c0 ... c9), global parities p0 ... p3 (coefficients c0 ... c3)
  s0: local parity
  s1: local parity
  s2: implied parity
add 2 local parities to the existing 10 + 4 RS code
choose c_i carefully so that the implied parity can be derived
4 global parities and 2 local parities sum to zero
single failure of any chunk can be reconstructed from 5 fragments
14% extra storage for 50% savings in reconstruction
LRC in Hierarchical Storage
Note: no need to tolerate 2 JBOD failures
New erasure codes designed for multi-level durability requirements can reduce storage space
Figure labels: JBOD enclosure, local parity, global parity
  x1 x2 x3 x4 px
  y1 y2 y3 y4 py   q
  z1 z2 z3 z4 pz
storage overhead: 1.33x (LRC 12+3+1) < 1.5x (RAID6 4+2)
But does LRC indeed tolerate failures of 1 JBOD + 1 HDD?
Figure labels: JBOD enclosure, local parity, global parity
  x1 x2 x3 x4 px
  y1 y2 y3 y4 py   q
  z1 z2 z3 z4 pz
y3 and z3 are reconstructed using local parities py and pz
x3 and x4 are then reconstructed using px and global parity q
Shipped in Windows Server 2012 R2 and Windows 8.1
PMDS and SD Codes
PMDS Codes (m = 4, n = 7, r = 1, s = 2)
  d0  d1  d2  d3  d4  d5  p0
  d6  d7  d8  d9  d10 d11 p1
  d12 d13 d14 d15 d16 d17 p2
  d18 d19 d20 d21 q0  q1  p3
m rows, n columns
n drives, m x n sectors
r row parities in each row
s global parities
tolerate r failures per row and s additional failures anywhere
recoverable case I (m = 4, n = 7, r = 1, s = 2)
  r = 1 drive (column) failure
  s = 2 additional sector failures anywhere
recoverable case II (m = 4, n = 7, r = 1, s = 2)
  r = 1 failure per row
  s = 2 additional failures anywhere
recoverable case II (m = 4, n = 7, r = 1, s = 2)
  d11 and d19 are recoverable from their row parities
  4 parities for the remaining 4 failures, similar to LRC
PMDS codes are Maximally Recoverable (MR) codes
case I vs. case II: what if we restrict to only case I?
SD Codes (m = 4, n = 7, r = 1, s = 2)
  d0  d1  d2  d3  d4  d5  p0
  d6  d7  d8  d9  d10 d11 p1
  d12 d13 d14 d15 d16 d17 p2
  d18 d19 d20 d21 q0  q1  p3
m rows, n columns
n drives, m x n sectors
r row parities in each row
s global parities
tolerate r column failures and s additional failures anywhere
case I vs. case II
SD codes handle case I, but not case II
There are many constructions that are valid as SD codes but not as PMDS codes.
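The difference between the two guarantees can be expressed as two pattern checks. This is only a sketch of the stated tolerance conditions (r failures per row versus r whole columns, each plus s extra failures); it does not decode any concrete code, and the function names and the two example patterns are made up for illustration.

```python
from itertools import combinations

def pmds_tolerates(failures, m, r, s):
    """Up to r failures in every row, plus at most s further failures anywhere."""
    extra = 0
    for row in range(m):
        in_row = sum(1 for (i, _) in failures if i == row)
        extra += max(0, in_row - r)
    return extra <= s

def sd_tolerates(failures, n, r, s):
    """Some r whole columns (drives), plus at most s further failures anywhere."""
    for cols in combinations(range(n), r):
        outside = [f for f in failures if f[1] not in cols]
        if len(outside) <= s:
            return True
    return False

m, n, r, s = 4, 7, 1, 2
# (row, column) positions of failed sectors
case_1 = {(0, 3), (1, 3), (2, 3), (3, 3), (0, 1), (2, 5)}   # drive 3 + 2 sectors
case_2 = {(0, 0), (1, 1), (2, 2), (3, 3), (1, 5), (3, 4)}   # 1 per row + 2 sectors
print(pmds_tolerates(case_1, m, r, s), sd_tolerates(case_1, n, r, s))   # True True
print(pmds_tolerates(case_2, m, r, s), sd_tolerates(case_2, n, r, s))   # True False
```

Case II spreads the per-row failures over different drives, so it falls within the PMDS guarantee but outside the SD guarantee.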
Efficient Repair of MDS Codes
Figure: storage nodes holding a1, a2, b1, b2 and combinations of them
Efficient Repair of Existing Codes
~20+% savings in general
Theoretical Bound on Efficient Repair
Efficient repair: 1.83x (69% savings!)
Single Failure Repair of 6 + 6 MDS Code

                                      Reed-Solomon Coding    Regenerating Coding
# of nodes participating in repair    6                      11
# of network transfers                6x                     1.83x
# of disk I/Os                        6x                     up to 11x
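The 1.83x and 69% figures can be reproduced with a short calculation. The formula used here is not on the slide: it is the commonly cited repair bandwidth for minimum-storage regenerating codes, where each of d helper nodes sends 1/(d - k + 1) of a fragment. The function name is illustrative.

```python
from fractions import Fraction

def msr_repair_transfer(k, d):
    """Network transfer, in units of one fragment, to repair one node
    when d helpers each send 1/(d - k + 1) of a fragment."""
    return Fraction(d, d - k + 1)

k, d = 6, 11                     # 6 + 6 code: all 11 surviving nodes help
rs = Fraction(k)                 # Reed-Solomon transfers k whole fragments
rc = msr_repair_transfer(k, d)
print(rc)                                    # 11/6, about 1.83
print(round(float(1 - rc / rs) * 100))       # 69
```

Contacting more nodes (11 instead of 6) lowers the total network transfer, although every contacted node still has to read from disk.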
network transfer: 3 (optimal), disk I/O: 4 (no saving)
Figure label: XOR before transmitting
Regenerating Codes may require more disk I/Os than network transfers. Unfortunately, most RC papers do not discuss the difference!
Simple Regenerating Codes
(n=6, k=4, f=2)-SRC
  MDS precode: (6,4)-RS, (6,4)-RS
  placement: node1 node2 node3 node4 node5 node6
  tolerates two arbitrary failures
  any chunk recoverable with 2 I/Os
  overhead: 3/2 * 6/4 = 2.25x
single failure recovered efficiently
  2 I/Os for each chunk
  6 I/Os in total for all three chunks
  disk I/O = network I/O in repair
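The overhead and repair numbers for the (n=6, k=4, f=2) example can be checked as follows. The general formulas are an assumption inferred from the slide's 3/2 * 6/4 and "2 I/Os for each chunk, three chunks per node"; the function names are illustrative.

```python
from fractions import Fraction

def src_overhead(n, k, f):
    """f MDS-coded chunks plus one extra parity chunk, on top of an (n, k) precode."""
    return Fraction(f + 1, f) * Fraction(n, k)

def src_repair_ios(f):
    """A node stores f + 1 chunks; each is rebuilt from f other chunks."""
    chunks_per_node = f + 1
    reads_per_chunk = f
    return chunks_per_node * reads_per_chunk

print(float(src_overhead(6, 4, 2)))   # 2.25
print(src_repair_ios(2))              # 6
```

The extra storage buys a repair that touches only a few chunks, with disk reads equal to network transfers.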