A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster



Transcription:

A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster. K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran

Outline. Introduction: erasure coding in data centers (low storage, high fault-tolerance; high download & disk IO during recovery). Measurements from the Facebook warehouse cluster in production. Proposed alternative: Piggybacked-RS codes (same storage overhead & fault tolerance; 30% reduction in download & disk IO).

Need for Redundant Storage. Frequent unavailability in data centers: commodity components fail frequently; software glitches, maintenance shutdowns, power failures. Redundancy gives more reliability and availability.

Popular approach: Replication. Multiple copies of data across machines. E.g., GFS and HDFS store 3 replicas by default. Example with two copies: block 1 = a, block 2 = a, block 3 = b, block 4 = b (a, b: data blocks). Copies are typically stored across different racks.

Petabyte-scale data: replication is expensive. Moderately sized data: storage is cheap, replication is viable. Multiple tens of PBs of aggregate storage: storage is no longer cheap, replication is expensive.

Erasure Codes: replication vs. Reed-Solomon (RS) code

                    Replication                   Reed-Solomon (RS) code
block 1             a                             a       (data block)
block 2             a                             b       (data block)
block 3             b                             a+b     (parity block)
block 4             b                             a+2b    (parity block)
Redundancy          2x                            2x
First-order         tolerates any one failure     tolerates any two failures
comparison
In general          lower MTTDL, high storage     order of magnitude higher MTTDL
                    requirement                   with much less storage
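The claim that the layout a, b, a+b, a+2b survives any two failures can be checked with a short sketch. It assumes arithmetic modulo the prime 257 for simplicity (production RS codes typically work over GF(2^8)), and the names `encode` and `decode` are illustrative, not taken from the talk.

```python
from itertools import combinations

P = 257  # assumption: a small prime field stands in for GF(2^8)

# Each stored block is a linear combination (ca, cb) of the data symbols a and b.
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]  # a, b, a+b, a+2b


def encode(a, b):
    """Return the four stored symbols of one stripe."""
    return [(ca * a + cb * b) % P for ca, cb in COEFFS]


def decode(survivors):
    """Recover (a, b) from any two surviving (block_index, value) pairs."""
    (i, u), (j, v) = survivors
    (c1, d1), (c2, d2) = COEFFS[i], COEFFS[j]
    det = (c1 * d2 - c2 * d1) % P
    inv = pow(det, -1, P)  # modular inverse; det is non-zero for every pair
    a = ((u * d2 - v * d1) * inv) % P
    b = ((c1 * v - c2 * u) * inv) % P
    return a, b


stripe = encode(42, 200)
# Lose any two of the four blocks and decode from the remaining two.
recovered = [decode([(i, stripe[i]), (j, stripe[j])])
             for i, j in combinations(range(4), 2)]
print(all(r == (42, 200) for r in recovered))
```

With replication (a, a, b, b) the same test would fail whenever both copies of a, or both copies of b, are lost, which is why it tolerates only one arbitrary failure at the same 2x redundancy.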

Erasure Codes. Using RS codes instead of 3-replication on less-frequently accessed data has led to savings of multiple petabytes in the Facebook warehouse cluster.

Reed-Solomon (RS) Codes. A (#data, #parity) RS code tolerates the failure of any #parity blocks; these (#data + #parity) blocks constitute a stripe. The Facebook warehouse cluster uses a (10, 4) RS code. Example: the (2, 2) RS code with blocks a, b, a+b, a+2b has #data = 2 (data blocks), #parity = 2 (parity blocks), and 4 blocks in a stripe.
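The storage cost implied by the (#data, #parity) notation is simple arithmetic; the sketch below compares the codes named on the slides with 3-replication. The helper name `storage_overhead` is hypothetical.

```python
def storage_overhead(n_data, n_parity):
    """Stored bytes per byte of user data for an (n_data, n_parity) code."""
    return (n_data + n_parity) / n_data


rs_10_4 = storage_overhead(10, 4)   # (10, 4) RS: 14 blocks stored per 10 data blocks
rs_2_2 = storage_overhead(2, 2)     # (2, 2) RS example: 4 blocks per 2 data blocks
three_replication = 3.0             # three full copies of every block

print(rs_10_4, rs_2_2, three_replication)
```

The (10, 4) code tolerates any 4 block failures per stripe while 3-replication tolerates any 2 lost copies of a block, so the lower overhead does not come at the cost of fault tolerance.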

Why RS codes? Maximum possible fault-tolerance for a given storage overhead: storage-capacity optimal, i.e. maximum-distance-separable (MDS) in coding theory parlance. Flexibility in choice of parameters: supports any #data and #parity. However, RS codes result in increased download and disk IO during data recovery.

Data Recovery: Increased download & disk IO

                    Replication                   Reed-Solomon code
block 1             a                             a
block 2             a                             b
block 3             b                             a+b
block 4             b                             a+2b
Download & IO       1x (read the other copy)      2x (e.g. read b and a+b to rebuild a)

In general: Download & IO required = #data x (size of data to be recovered)
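As a rough illustration of the formula, the sketch below evaluates it for the (10, 4) code and the 256 MB block size quoted in the system description of this talk. The function name is hypothetical.

```python
BLOCK_MB = 256  # block size quoted in the system description


def recovery_download_mb(n_data, lost_mb=BLOCK_MB):
    """Download & IO required = #data x (size of data to be recovered)."""
    return n_data * lost_mb


rs_cost = recovery_download_mb(10)          # (10, 4) RS: read 10 surviving blocks
replication_cost = recovery_download_mb(1)  # replication: read one surviving copy

print(rs_cost, replication_cost)
```

Under these numbers a single lost block costs ten blocks' worth of reads and transfers with RS, versus one block with replication.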

Data Recovery: Burden on TOR switches. [Diagram: an AS/Router above four TOR switches; node 1 stores a, node 2 stores b, node 3 stores a+b, node 4 stores a+2b.] Recovery burdens the already oversubscribed Top-of-Rack (TOR) and higher-level switches.

Brief System Description. HDFS cluster with multiple thousands of nodes. Multiple tens of PBs and growing. Data is immutable until deleted. Reducing storage requirements is of high importance. Uses a (10, 4) RS code to reduce storage requirements on less-frequently accessed data: multiple PBs of RS-coded data.

Brief System Description. [Figure: one stripe with data blocks 1, 2, ..., 10 and parity blocks 11, ..., 14. Figure labels: 256 MB (block size), 1 byte (one slice across the blocks).]

Machine Unavailability Events. From HDFS Name-Node logs: logged when there is no heartbeat for > 15 min; blocks are marked unavailable and a periodic recovery process runs. [Plot: number of machine-unavailability events logged per day.] Median of 50 machine-unavailability events logged per day.

Missing blocks per stripe

# blocks missing in stripe    % of stripes with missing blocks
1                             98.08
2                             1.87
3                             0.036
4                             9 x 10^-6
5                             9 x 10^-9

Dominant scenario: single block recovery

#Blocks Recovered & Cross-rack Transfers. Median of 180 TB transferred across racks per day for recovery operations. Around 5 times that under 3-replication.

Piggybacking: Toy Example. Step 1: Take a (2, 2) Reed-Solomon code.

                      first byte    second byte
block 1 (data)        a1            b1
block 2 (data)        a2            b2
block 3 (parity)      a1+a2         b1+b2
block 4 (parity)      a1+2a2        b1+2b2

Piggybacking: Toy Example. In the (2, 2) RS code, recovery download & IO = 4 bytes: to rebuild block 1 (a1, b1), block 2 (a2, b2) and block 3 (a1+a2, b1+b2) are read.

Piggybacking: Toy Example. Step 2: Add piggybacks to parity nodes. The second byte of block 4 becomes b1+2b2+a1; all other symbols are unchanged. No additional storage!

Fault-Tolerance (toy example). Same fault tolerance as the RS code: can tolerate the failure of any 2 nodes. Example with blocks 3 and 4 surviving: recover a1 and a2 from the first bytes (a1+a2 and a1+2a2), subtract the piggyback a1 from b1+2b2+a1, then recover b1 and b2 from b1+b2 and b1+2b2.

Recovery (toy example). Download & IO is only 3 bytes (instead of 4 bytes as in RS). To rebuild block 1 (a1, b1), read b2 from block 2, b1+b2 from block 3, and b1+2b2+a1 from block 4. Subtract b2 from b1+b2 to get b1; then subtract b1+2b2 from b1+2b2+a1 to get a1.
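The toy example can be traced in code. This is a minimal sketch that assumes arithmetic modulo the prime 257 (a real deployment would use a finite field such as GF(2^8)); the function names are illustrative.

```python
P = 257  # assumption: prime field used only to keep the arithmetic simple


def encode(a1, a2, b1, b2):
    """Return four blocks, each a (first_byte, second_byte) pair."""
    return [
        (a1, b1),
        (a2, b2),
        ((a1 + a2) % P, (b1 + b2) % P),
        ((a1 + 2 * a2) % P, (b1 + 2 * b2 + a1) % P),  # piggyback: + a1
    ]


def repair_block1(blocks):
    """Rebuild block 1 from three downloaded symbols."""
    b2 = blocks[1][1]   # second byte of block 2
    s = blocks[2][1]    # b1 + b2 from block 3
    t = blocks[3][1]    # b1 + 2*b2 + a1 from block 4
    downloaded = 3
    b1 = (s - b2) % P
    a1 = (t - b1 - 2 * b2) % P
    return (a1, b1), downloaded


def decode_from_parities(block3, block4):
    """Recover all data when both data blocks (1 and 2) are lost."""
    s_a, s_b = block3
    t_a, t_b = block4
    a2 = (t_a - s_a) % P      # (a1 + 2*a2) - (a1 + a2)
    a1 = (s_a - a2) % P
    t_b = (t_b - a1) % P      # subtract the piggyback
    b2 = (t_b - s_b) % P      # (b1 + 2*b2) - (b1 + b2)
    b1 = (s_b - b2) % P
    return a1, a2, b1, b2


blocks = encode(7, 11, 13, 17)
print(repair_block1(blocks))
print(decode_from_parities(blocks[2], blocks[3]))
```

The first call rebuilds block 1 from 3 symbols rather than 4; the second shows that losing both data blocks is still recoverable once the piggyback is subtracted.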

General Piggybacking Recipe. To construct a Piggybacked-RS code: Step 1: take an RS code with identical parameters. Step 2: add carefully designed functions from one byte stripe on to another. This retains the same fault-tolerance and storage overhead; the piggyback functions are designed to reduce the amount of download and IO for recovery. General theory and algorithms: K. V. Rashmi, Nihar Shah, K. Ramchandran, "A Piggybacking Design Framework for Read- and Download-efficient Distributed Storage Codes," in IEEE International Symposium on Information Theory (ISIT), 2013.

(10,4) Piggybacked-RS: an alternative to the (10,4) RS code currently used in HDFS.

(10,4) Piggybacked-RS code. Step 1: Take a (10, 4) Reed-Solomon code.

              first byte            second byte
block 1       a1                    b1
...
block 10      a10                   b10
block 11      f1(a1, ..., a10)      f1(b1, ..., b10)
block 12      f2(a1, ..., a10)      f2(b1, ..., b10)
block 13      f3(a1, ..., a10)      f3(b1, ..., b10)
block 14      f4(a1, ..., a10)      f4(b1, ..., b10)

(10,4) Piggybacked-RS code. Step 2: Add piggybacks.

              first byte            second byte
block 1       a1                    b1
...
block 10      a10                   b10
block 11      f1(a1, ..., a10)      f1(b1, ..., b10)
block 12      f2(a1, ..., a10)      f2(b1, ..., b10) + f4(a1, a2, a3, 0, ..., 0)
block 13      f3(a1, ..., a10)      f3(b1, ..., b10) + f4(0, ..., 0, a4, a5, a6, 0, ..., 0)
block 14      f4(a1, ..., a10)      f4(b1, ..., b10) + f4(0, ..., 0, a7, a8, a9, 0)

(10,4) Piggybacked-RS code. Tolerates any 4 block failures. From any 10 surviving blocks: (1) recover a1, ..., a10 like in RS, using the first bytes; (2) subtract the piggybacks (functions of a1, ..., a10) from the second bytes; (3) recover b1, ..., b10 like in RS.

(10,4) Piggybacked-RS code. Efficient data recovery of block 1 (a1, b1). Read b2, ..., b10 from blocks 2 to 10, f1(b1, ..., b10) from block 11, f2(b1, ..., b10) + f4(a1, a2, a3, 0, ..., 0) from block 12, and a2, a3 from blocks 2 and 3. Then: (1) recover b1, ..., b10 like in RS; (2) subtract f2(b1, ..., b10); (3) remove the effect of a2 and a3 to get a1. Download & IO: 20 in RS, 13 in Piggybacked-RS.
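The repair of block 1 can be sketched end to end. Everything below is an illustration under stated assumptions: arithmetic is modulo the prime 257 rather than over GF(2^8), the parity functions f1..f4 are taken to be rows of a Cauchy matrix (a hypothetical choice that keeps the base code MDS), and f4 is used for the piggybacks as in the Step 2 slide.

```python
P = 257          # assumption: prime field instead of GF(2^8)
K, R = 10, 4     # (10, 4) code
PIGGY = 3        # index of f4, the function used for the piggybacks

# Hypothetical parity coefficients: Cauchy matrix entries 1 / (x_j - y_i) mod P.
COEF = [[pow((K + j) - i, -1, P) for i in range(K)] for j in range(R)]


def f(j, x):
    """Parity function f_{j+1} applied to a length-K vector x."""
    return sum(c * v for c, v in zip(COEF[j], x)) % P


def group(a, lo, hi):
    """Copy of a with every entry outside positions lo..hi-1 set to zero."""
    return [v if lo <= i < hi else 0 for i, v in enumerate(a)]


def encode(a, b):
    """Return 14 blocks as (first_byte, second_byte) pairs."""
    blocks = list(zip(a, b))
    blocks.append((f(0, a), f(0, b)))  # block 11 carries no piggyback
    for j, lo in zip((1, 2, 3), (0, 3, 6)):
        piggyback = f(PIGGY, group(a, lo, lo + 3))
        blocks.append((f(j, a), (f(j, b) + piggyback) % P))
    return blocks


def repair_block1(blocks):
    """Rebuild block 1 and count the symbols downloaded."""
    b_rest = [blocks[i][1] for i in range(1, K)]   # 9 symbols: b2..b10
    f1_b = blocks[K][1]                            # 1 symbol: f1(b)
    partial = sum(c * v for c, v in zip(COEF[0][1:], b_rest))
    b1 = (f1_b - partial) * pow(COEF[0][0], -1, P) % P
    b = [b1] + b_rest
    y = blocks[K + 1][1]                           # 1 symbol: f2(b) + piggyback
    a2, a3 = blocks[1][0], blocks[2][0]            # 2 symbols: a2, a3
    piggy = (y - f(1, b)) % P                      # f4(a1, a2, a3, 0, ..., 0)
    rest = COEF[PIGGY][1] * a2 + COEF[PIGGY][2] * a3
    a1 = (piggy - rest) * pow(COEF[PIGGY][0], -1, P) % P
    downloaded = len(b_rest) + 1 + 1 + 2
    return (a1, b1), downloaded


a = list(range(1, 11))
b = list(range(101, 111))
blocks = encode(a, b)
print(repair_block1(blocks))
```

The repair reads 13 symbols, matching the count on the slide, whereas plain RS would read both bytes from 10 blocks, i.e. 20 symbols.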

(10,4) Piggybacked-RS code. Efficient data recovery, repair cases shown on the same layout: repair of blocks 1, 2, 3 (piggyback f4(a1, a2, a3, 0, ..., 0) stored in block 12); repair of blocks 4, 5, 6 (piggyback f4(0, ..., 0, a4, a5, a6, 0, ..., 0) stored in block 13); repair of blocks 7, 8, 9 (piggyback f4(0, ..., 0, a7, a8, a9, 0) stored in block 14); repair of block 10.

Expected Performance. Storage efficiency and reliability: no additional storage vs. RS; same fault-tolerance vs. RS. Reduced recovery download & disk IO: 30% less for single block recoveries in a stripe; potential reduction of > 50 TB of cross-rack traffic per day. Recovery time: expect faster recovery; need to connect to more nodes, but the system is limited by disk and network bandwidth; corroborated by preliminary experiments; hence, expect higher MTTDL.
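The "> 50 TB per day" figure can be sanity-checked from numbers given earlier in the talk. This is only a rough plausibility check: it assumes, without support from the slides, that recovery traffic splits across stripes in proportion to the share of stripes missing exactly one block.

```python
median_cross_rack_tb_per_day = 180  # measured median reported in the talk
single_block_reduction = 0.30       # claimed saving for single block recoveries
single_block_share = 0.9808         # share of affected stripes missing one block

# Assumption: traffic is proportional to the share of single-block stripes.
estimate_tb = (median_cross_rack_tb_per_day
               * single_block_reduction
               * single_block_share)

print(round(estimate_tb, 1))
```

Under that assumption the estimate lands a little above 50 TB per day, in line with the figure on the slide.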

Related Work: Measurements. Existing studies: availability studies (Schroeder & Gibson 2007, Jiang et al. 2008, Ford et al. 2010, etc.); comparisons between replication and erasure codes (Rodrigues & Liskov 2005, Weatherspoon & Kubiatowicz 2002, etc.). Our focus: increased network traffic due to increased downloads during recovery of erasure-coded data; measurements from the Facebook warehouse cluster in production.

Related Work: Codes for Efficient Data Recovery. Huang et al. (Windows Azure) 2012 and Sathiamoorthy et al. (Xorbas) 2013 add additional parities: need extra storage. Hu et al. (NCFS) 2011, a network file system using repair-by-transfer codes (Shah et al.): need extra storage. Khan et al. (Rotated-RS) 2012: #parity <= 3 (also, #data <= 36). Xiang et al., Wang et al. (Optimized RDP & EVENODD) 2010: #parity <= 2. Our solution, Piggybacked-RS: no additional storage (storage-capacity optimal); any #data & #parity; as good as or better than Rotated-RS, optimized RDP & EVENODD.

Summary and Future Work. Erasure codes require higher download & IO for recovery. Measurements from the Facebook warehouse cluster in production. Piggybacked-RS, an alternative to RS: no additional storage required; same fault-tolerance as RS; 30% reduction in download & disk IO for recovery. Future work: implementation in HDFS (in progress at UC Berkeley); empirical evaluation.