Erasure Codes Made So Simple, You ll Really Like Them W. David Schwaderer August 7, 214 schwaderer_1@comcast.net Santa Clara, CA 1 Agenda Errors Versus Erasures HDD Bit Error Rate Implications RAID 4, 5, and 6 Review Objects and Dispersed Storage The Math Summary Questions 2 Errors Versus Erasures Communications Error Detection Asynchronous (Start/Stop) Communication Parity (e.g. Even/Odd) TCP/IP Checksums Ethernet CCITT-32 CRC (x+1) * Special Polynomial: (x 32 +x 26 +x 23 +x 22 +x 16 +x 12 +x 11 +x 1 +x 8 +x 7 +x 5 +x 4 +x 2 +x 1 +1 ) Low-Order Term Coefficients: 1.11.1.1.111.111.111 == x4c11db7 (32 bits) Ethernet Frame HDR Payload CRC Transmitter Modulo 2 Divisor: x4c11db7 Receiver Modulo 2 Divisor: x4c11db7 Receiver remainder!= xdebb2e3 Error! position unknown Erasure: Error in known position Key Take Away: Special Polynomials; Magic Numbers; Bit-Wise, Modulo 2 Arithmetic 3 Santa Clara, CA 1
HDD Facts of Life Nominal raw BER values: 1-5 to 1-6 => Improve reliability using ECC ECC-corrected BER ratings: Desktop hard disks ~ 1 in 1 14 Enterprise hard disks ~ 1 in 1 15 (ten times better). Desktop hard disks have ~8X the capacity of an enterprise disks. => Large capacity disks experience inevitable uncorrectable bit errors Disk arrays use RAID to improve reliability. (Spreads data across multiple disks) RAID ~ Redundant Array of Inexpensive Disks ~ Redundant Array of Independent Disks 4 Disk 1 RAID 4 & 5 Parity Stripe Across LUN Devices... Disk 2 Disk 3 Disk N-1 Disk P RAID 5 Distributes Parity... Disk 1 Disk 2 Disk 3 Disk N-1 Disk P ERASURE! Rebuilds are fraught with peril.days and weeks.marginal reliability Large capacity disks experience inevitable uncorrectable bit errors Recovered Disk 1 N Total Disks, N-1 Disks, N-1 of N Can Lose One Disk, May Have Rebuild Problems 5 RAID 4 & 5 Parity Stripe Disk 1 Disk 2 Disk 3... Disk N-1 Disk P unsigned long SectorLba; for (SectorLba =, SectorLba < MAX_LBA, SectorLba++) { RAID 5 Distributes Parity 1 2 1 2 1 2 1 2 1 2 Disk 1 Sector[SectorLba] Disk 2 Sector[SectorLba] Disk 3 Sector[SectorLba] Disk N-1 Sector[SectorLba] Parity Sector[SectorLba] } 51 51 51 51 51 6 Santa Clara, CA 2
RAID 6 Parity Disk 1 Disk 2 Disk 3 Disk 4...???? Disk N-3 Disk P Disk Q Disk R ERASURES! Rebuilds are fraught with peril.days and weeks.marginal reliability? Large capacity disks experience inevitable uncorrectable bit errors Recovered Disk 1 Recovered Disk 2 Recovered Disk 3 N Total Disks, N-3 Disks, N-3 of N Can Lose Three Disks, Even Fewer Rebuild Problems 7 Objects & Information Dispersal One Object (of many on device?) 1 2 3 13 14 P1 P2 P3 P4 P5 P6 P7 Disk Dispersing Object s Exploits: Collective storage device pool isolation/reliability Distributed scale-out infrastructure design strengths 1. Distributed rebuilds for lost object slices 2. Network bandwidth load balancing Eliminates 7X data redundancy and RAID rebuild weaknesses. However, extreme data availability brings no free lunch: Computations Read/Update/Write Workload Amplification Scale at Tail Laggard Compensations 14 of 21 8 Replication Storage Reclamation Object d Object Parities 9 Santa Clara, CA 3
Replication Storage Reclamation d Object Parities >5X Reclaimed Capacity 1 Club Members and Their Labels Finite Shape Club How many? Non Non Non Always One Member Three Non- Members Answer: Four Members 1 2 3 1 3 2 2 1 3 2 3 1 3 1 2 3 2 1 Take Away: Club members and their labels are different. Once assigned, label assignments are immutable. 11 Club Member Arithmetic ZERO 1 2 3 ZERO ZERO + = * = + = * = + = * = Reciprocal! MOD-4(6) + = * = +/- -/+ + = ZERO ZERO Take Away: Label assignments enable arithmetic operations. It s all about the labels Santa Clara, CA 4
Polynomial Shorthand Consider Degree 8 Polynomials ax 8 + bx 7 + cx 6 + dx 5 + ex 4 + fx 3 + gx 2 + hx 1 + kx Example Degree 8 Polynomial 4X 8 + X 7 + 2X 6 + 16X 5 + 1X 4 + 8X 3 + (-22)X 2 + 19X 1 + 36X Example Degree 8 Polynomial Shorthand 4..2.16.1.8.-22.19.36 Label! 13 X 8 + X 4 + X 3 + X 2 + 1 (Special Polynomial) X 8 + X 7 + X 6 + X 5 + 1X 4 + 1X 3 + 1X 2 + X 1 + 1X = 1.1.111 = x1d Polynomial Label! If α is a root, by definition: α 8 + α 4 + α 3 + α 2 + 1 = So, α 8 + α 4 + α 3 + α 2 + 1 = (α 4 + α 3 + α 2 + 1) = (α 4 + α 3 + α 2 + 1) or: α 8 = α 4 + α 3 + α 2 + 1 α = 1 = α7 + α 6 + α 5 + α 4 + α 3 + α 2 + α + 1 ~ x1 α 1 = α = α7 + α 6 + α 5 + α 4 + α 3 + α 2 + 1α + ~ x2 α 2 =α2 = α7 + α 6 + α 5 + α 4 + α 3 + 1α 2 + α + ~ x4 α 3 =α3 = α 7 + α 6 + α 5 + α 4 + 1α 3 + α 2 + α + ~ x8 α 4 =α4 = α 7 + α 6 + α 5 + 1α 4 + α 3 + α 2 + α + ~ x1 α 5 =α5 = α 7 + α 6 + 1α 5 + α 4 + α 3 + α 2 + α + ~ x2 α 6 =α6 = α 7 + 1α 6 + α 5 + α 4 + α 3 + α 2 + α + ~ x4 α 7 =α7 = 1α 7 + α 6 + α 5 + α 4 + α 3 + α 2 + α + ~ x8 α 8 =α4 + α 3 + α 2 + 1 = α 7 + α 6 + α 5 + 1α 4 + 1α 3 + 1α 2 + α + 1 ~ x1d α 9 =α(α 8 ) = α(α 4 + α 3 + α 2 + 1) = α 5 + α 4 + α 3 + α ~ x3a α 1 =α 2 (α 8 ) = α 2 (α 4 + α 3 + α 2 + 1) = α 6 + α 5 + α 4 + α 2 ~ x74 α 11 =α 3 (α 8 ) = α 3 (α 4 + α 3 + α 2 + 1) = α 7 + α 6 + α 5 + α 3 ~ xe8 b 7 b 6 b 5 b 4 b 3 b 2 b 1 b b 7 b 6 b 5 b 4 b 3 b 2 b 1 b α 12 =α 4 (α 8 ) = α 4 (α 4 + α 3 + α 2 + 1) = α 8 + α 7 + α 6 + α 4 = (α 4 + α 3 + α 2 + 1) + α 7 + α 6 + α 4 = α 7 + α 6 + α 3 + α 2 + 1 ~ xcd Linear Feedback Shift Register α 254 = α 7 + α 3 + α 2 + α 1 ~ x8e α 255 =α (α 254 ) = α (α 7 + α 3 + α 2 + α 1 ) = α 8 + α 4 + α 3 + α 2 = (α 4 + α 3 + α 2 + 1) + α 4 + α 3 + α 2 = 1 = α 14 + Linear Algebra Review X + Y + 4 + 2 = 9 3X - Y + 6-5 = 2 X + Y = 3 3X - Y = 1 4X = 4 => X == 1 X + Y = 3 => Y == 2 15 Santa Clara, CA 5
Galois Field Overview A finite set of elements (e.g. 2 8, 2 16, etc.) that have: Arithmetic operator Multiplicative operator A zero member Negatives and Reciprocals ~ + ~ * Field operations always produce another element in the set. (Closure) In this discussion, our Galois Field has: 256 elements (thingees) labeled x1 to xff, and x - [GF(8)] An Arithmetic operator (addition and subtraction) ~ XOR A Multiplicative operator ( * and ) ~ exponent operations 16 Galois Multiplicative Operator (X ) (Y) = (α log α X ) ( α log α Y ) = α [(log α X + log α Y) modulus 255] Noting that α K α -K = α K-K = α = 1...it follows that 1/(α K ) = α -K 255 non-zero members So... (X )/(Y) = (α log α X )/( α log α Y ) = (α log α X ) (1/( α log α Y )) = (α log α X ) (α -log α Y ) = α [(log α X - log α Y ) modulus 255] Note: Exponent additions/subtractions operations are base 16. 17 Example Calculations x2b x6f = x8f From Log Table (Table 2) x2b = α xda 16 = α 218 1 x6f = α x3d 16 = α 61 1 (Base 1 for Understanding Ease) So x2b x6f = α 218 1 α 61 1 From Powers Table (Table 1) α x18 = x8f = α (218 1 + 61 1)mod255 1 = α (279 1)mod255 1 = α 24 1 = α x18 16 18 Santa Clara, CA 6
Example Calculations (cont.) Arithmetic (Addition, Subtraction): Multiplication: x3f x12 x2d x28 == 1.111 b XOR 1.1 b =.11 b = x5 = α xa6 α xe = αxa6 + xe (From Table 2) (Base 16 Regular Addition) = α (x186)mod 255 = α x87 = xa9 Division: x3f (1/(x12)) = α xa6 α -xe = αxa6 - xe = α x1a5 - xe (From Table 1) (From Table 2) (Base 16 Regular Subtraction) (xa6 = xa6 + xff = ox1a5) = α xc5 = x8d (From Table 1) 19 Oddities Negative Values ~ XOR Thus, for any element V, V = -V since (V XOR V) = Reciprocals Let log α V == xnn (V ) Then 1/V = 1/(α xnn ) = α -xnn Proof: v/v = 1 = (α xnn ) / (α xnn ) = (α xnn ) [1/(α xnn )] = (α xnn ) (α -xnn ) (?) = (α xnn + -xnn ) = α = 1 2 Double Disk Failure Remembering Parity Disk P Generation: D 1 D 2 D 3 D N-2 = P (Normal RAID 5 Parity Generation) Parity Disk Q Generation: (K 1 D 1) (K 2 D 2) (K 3 D 3). (K N-2 D N-2 ) = Q Losing Disk 1 and 2 gives D 1 D 2 = P D 3 D N-2 = V 1 (1) (K 1 D 1) (K 2 D 2) = Q (K 3 D 3) (K N-2 D N-2 ) = V 2 (2) So... (K 1 D 1) (K 1 D 2) = (K 1 V 1) (K 1 D 1) (K 2 D 2) = V 2 Gives... D 2 (K 1 K 2) = (K 1 V 1) V 2 Or D 2 = ((K 1 V 1) V 2 ) / (K 1 K 2) Recover D 1 using RAID 5 Logic with P values 21 Santa Clara, CA 7
Summary Easy to Solve Independent Linear Equations X + Y = 3 3*X - Y = 1 Similarly, Losing Two Values Results in: X Y = V 1 x3 X Y = V 2 22 The End Thank you! schwaderer_1@comcast.net 23 Santa Clara, CA 8