A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems

Alessandro Duminuco and Ernst Biersack
EURECOM, Sophia Antipolis, France
{duminuco,biersack}@eurecom.fr

Abstract

In distributed storage systems, erasure codes represent an attractive solution to add redundancy to stored data while limiting the storage overhead. They are able to provide the same reliability as replication while requiring much less storage space. Erasure coding breaks the data into pieces that are encoded and then stored on different nodes. However, when storage nodes permanently abandon the system, new redundant pieces must be created. For erasure codes, generating a new piece requires the transmission of k pieces over the network, resulting in a k times higher reconstruction traffic as compared to replication. Dimakis proposed a new class of codes, called Regenerating Codes, which are able to provide both the storage efficiency of erasure codes and the communication efficiency of replication. However, Dimakis gave only a theoretical description of the codes without discussing implementation issues or computational costs. We have done a real implementation of Random Linear Regenerating Codes that allows us to measure their computational cost, which can be significant if the parameters are not chosen properly. However, we also find that there exist parameter values that result in a significant reduction of the communication overhead at the expense of a small increase in storage cost and computation, which makes these codes very attractive for distributed storage systems.

1. Introduction

P2P (Peer-to-Peer) systems have received a lot of attention in recent years. In particular, the research community has shown an increasing interest in the use of P2P systems for file storage [1], [2], [3], [4]. This application can be very attractive for two main reasons: (i) centralized solutions are expensive; (ii) common PCs are equipped with high-capacity local disks, which are often underutilized. The main challenge in designing storage systems is to guarantee the persistence of the stored data. This is nontrivial because storage peers are not totally reliable: they may face failures, data corruption or accidental data losses.
Adding redundancy to the stored data is the basic tool to achieve data durability in spite of failures of the storing peers. The simplest redundancy scheme is replication, which consists in creating multiple copies of the data and spreading them over different locations. A more complex method is represented by erasure codes: the data are processed in order to produce k + h pieces such that any k of them are sufficient to reconstruct the original file, then these pieces are spread across different peers. Previous works have shown [5], [6] how erasure codes are able to provide the same level of reliability as replication with much lower storage requirements.

Redundancy alone is not enough to provide data durability. Since peers might permanently leave the system, some of the initial redundancy might be lost. This means that the number of pieces or replicas present in the system diminishes with time. Periodically this number must be restored by the maintenance, which is performed by means of repairs. A repair consists in rebuilding a lost replica or piece using the available ones. A number of works [5], [7], [8] showed that every piece repaired in erasure codes requires that k other available pieces be read (which corresponds to the size of the original data), while in replication the repair of one replica requires that only one other replica be read. In distributed systems data accesses translate into network transfers. For this reason, under bandwidth constraints, like in P2P systems, erasure codes become impractical and replication is the only feasible solution.

Different solutions have been proposed to couple the storage efficiency of erasure codes with the communication efficiency of replication. Rodrigues and Liskov [5] proposed a hybrid replication/erasure code solution, in which a full replica of the file is held by a special peer, while other peers store erasure coded pieces. Repairs are always performed using the full replica, with a communication cost equal to the replication case. This method, however, introduces an asymmetry in the data maintenance, causing both a higher complexity of the system and a loss in terms of storage efficiency.
Duminuco and Biersack [8] proposed a class of codes called Hierarchical Codes, in which the repair communication cost is on average much smaller than for erasure codes. However, Hierarchical Codes have the disadvantage that not all the subsets of k pieces are sufficient to reconstruct the original file. Finally, Dimakis et al. [7]
proposed a generalization of traditional erasure codes called Regenerating Codes, for which the communication cost during repair is significantly reduced as compared to Reed-Solomon erasure codes. Dimakis et al. in [7] presented a theoretical framework that allows to prove the existence of these codes, while Wu et al. showed in [9] how to build deterministic Regenerating Codes based on linear codes. However, neither of the two papers investigated the computational cost or proposed how to implement such codes.

We believe that Regenerating Codes represent a very attractive solution for redundancy schemes in P2P storage systems and deserve a deeper analysis with respect to implementation and deployment, which is the subject of this paper. In section 2 we provide an introduction to data redundancy schemes in P2P storage systems and recall the main theoretic results on Regenerating Codes ([7]). In section 3 we describe our implementation of Random Linear Regenerating Codes, while we perform in section 4 an analytical evaluation of their cost in terms of storage and computation. In section 5 we test our implementation and evaluate the different cost performance trade-offs. Finally, section 6 concludes the paper.

2. Background

2.1. Data Redundancy Schemes

In this section we give a formal description of the operations performed in a common redundancy scheme. These operations can be grouped in three distinct phases, which follow the life cycle of a generic file that is stored in the system.

1) Insertion: The insertion consists in processing the file, creating (k + h) redundant pieces and distributing them over distinct peers. The processing can be as trivial as building replicas of the file (footnote 1), or can be a complex coding operation. No matter which redundancy scheme is used, the property of these pieces is that any k of them are sufficient to reconstruct the original file (footnote 2).
2) Maintenance: Maintenance consists in rebuilding the redundancy lost due to peer failures. Maintenance is performed by the means of repairs. A repair requires the cooperation of d peers that send data to a new peer (footnote 3), called newcomer, which in turn processes the received data to obtain a new piece. We refer to d as the repair degree.
If the repair is correctly executed, the new piece has the same properties as all the others, i.e. with any (k - 1) other pieces it forms a set of k pieces sufficient to reconstruct the original file.
3) Reconstruction: If the owner of the file wants to retrieve it from the system, a reconstruction needs to be performed. The reconstruction consists in downloading data from k peers and processing them to obtain the original file.

Footnote 1. This is the replication case where k = 1.
Footnote 2. There exist redundancy schemes in which this property is not satisfied, like in [8], but these schemes are not of interest in this work.
Footnote 3. A new peer is a peer that at the moment is not storing any piece of the file.

In the rest of the paper we refer to the size of the file as |file| and to the size of a piece as |piece| (in general we refer to the size of a generic object x as |x|). From the description it is clear that a redundancy scheme implies three kinds of costs:

1) Storage: Redundancy implies that the stored file consumes more storage space than the original file. The storage requirement is easily computed by:

$$|\text{storage}| = (k + h)\,|\text{piece}| > |\text{file}|$$

2) Communication: All three phases in the life cycle require data to be transferred among peers. At insertion, all the pieces must be transferred, which amounts to a volume of |storage|. At maintenance, for every repair d peers upload each an amount of data equal to |repair_up| to the newcomer for a total of |repair_down|, with the obvious relation: |repair_down| = d |repair_up|. At reconstruction, the file owner needs to download at least an amount of data equal to |file| (see section 3.2 for details).
3) Computation: When coding is used, all the three phases described require processing of data (footnote 4). At insertion, all the pieces need to be coded with a cost of CPU(encoding). At repair, part of the processing is done on the d participating peers, denoted as CPU(repair)_up, and part is done on the newcomer, denoted as CPU(repair)_down. At reconstruction, the original file must be reconstructed from k pieces with a cost CPU(reconstruction).

The particular redundancy scheme defines how the redundant data are generated and handled and what is the cost in terms of computation, communication, and storage. As an example let us consider traditional erasure codes (like Reed-Solomon codes [10]).
For these codes, the following two constraints hold w.r.t. the repair degree d and the piece size:

$$d = k, \qquad |\text{piece}| = \frac{|\text{file}|}{k} \tag{E1}$$

which means that every repair is performed collecting data from d = k existing peers and that every peer stores an amount of data equal to 1/k of the file size. It can be shown that given these constraints, the amount of data that needs to be transferred from every participating peer to the newcomer is equal to the size of a piece, which means that in total an amount equivalent to the size of the whole file will be transmitted. In terms of maintenance, the communication costs are: |repair_up| = |piece| and |repair_down| = |file|.

Footnote 4. In case of replication there is no processing.
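As a quick numeric illustration of constraint E1, here is a worked example under assumed values: k = 32 and h = 32 (the parameters adopted later in the paper) and a file of 1 MByte, taken here as 1024 KB.

```latex
\begin{align*}
|\text{piece}| &= \frac{|\text{file}|}{k} = \frac{1024\ \text{KB}}{32} = 32\ \text{KB} \\
|\text{repair}_{up}| &= |\text{piece}| = 32\ \text{KB} \\
|\text{repair}_{down}| &= d\,|\text{repair}_{up}| = 32 \cdot 32\ \text{KB} = 1024\ \text{KB} = |\text{file}| \\
|\text{storage}| &= (k + h)\,|\text{piece}| = 64 \cdot 32\ \text{KB} = 2048\ \text{KB} = 2\,|\text{file}|
\end{align*}
```

Rebuilding a single 32 KB piece therefore costs a full file's worth of network traffic.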
Note that this means that for every new bit that we create during a repair, k existing bits need to be transferred. The computation costs are implementation dependent (see section 4 for details).

2.2. Description of Regenerating Codes

In this section we give a quick overview of the main properties of Regenerating Codes from [7]. In essence Regenerating Codes try to address the following question: what is the impact on the communication cost if we relax the constraints defined for traditional erasure codes given in eq. E1? Given k and h, Regenerating Codes can take k · h different values for the pair of parameters (d, |piece|). In fact Regenerating Codes can be considered a generalization of traditional erasure codes, which trade off increased storage cost for reduced communication cost.

More formally, a generic Regenerating Code, denoted by RC(k, h, d, i), sets the following constraints on the repair degree d and the piece size:

$$d \in [k,\, k + h - 1], \qquad |\text{piece}| = p(d, i)\,|\text{file}|, \qquad i \in [0,\, k - 1] \tag{E2}$$

Given a repair degree d, the parameter i, which we define as the piece expansion index, determines the piece size through the function p(d, i), which is defined (footnote 5) as:

$$p(d, i) = \frac{2(d - k + i + 1)}{2k(d - k + 1) + i(2k - i - 1)}$$

It can be proved that RC(k, h, d, i) requires that each of the d peers participating to a repair transfer to the newcomer an amount of data at least equal to |repair_up| = r(d, i) |file|, where r(d, i) is defined as:

$$r(d, i) = \frac{2}{2k(d - k + 1) + i(2k - i - 1)}$$

and consequently |repair_down| = d r(d, i) |file|.

Footnote 5. We reformulated the expressions given in [7] in a different way to facilitate the successive computations.

In this paper we fix the values k = 32 and h = 32, which allows the system to sustain up to 32 losses. We consider this reasonable under the massive churn we may observe in an Internet scenario [3]. However, results with other parameters show the same trends.

Fig. 1 depicts how the piece size |piece| and the volume of repair traffic |repair_down| evolve as a function of d and i for a code with k = 32 and h = 32. In particular all the values are relative to the piece size and the volume of repair traffic required by a traditional erasure code, which in the framework of Regenerating Codes would correspond to RC(32, 32, 32, 0), i.e. with d = 32 and i = 0. As described in the previous section these reference values are: |piece| = |file|/32 and |repair_down| = |file|. We see that moving to a larger repair degree d and to a larger piece size (increasing the piece expansion index i) it is possible to obtain an impressive reduction of the repair traffic.

[Figure 1. Size of the pieces and repair communication cost (in log-scale) normalized by the reference values of a traditional erasure code, for RC(32, 32, d, i). Panel (a): |piece| stretch as a function of d, one curve per value of i. Panel (b): |repair_down| reduction as a function of d, one curve per value of i.]

Authors in [7] identify two notable cases for the values i = 0 and i = k - 1. For i = 0, the size of the pieces stays constant at the minimum possible size and the codes are called Minimum Storage Regenerating codes (MSR). For i = k - 1, repair traffic is minimized and the codes are called Minimum Bandwidth Regenerating codes (MBR).

3. Random Linear Implementation

Existing works on Regenerating Codes [7], [9] present the theoretic framework that supports the construction of Regenerating Codes and give an intuition for a possible implementation based on random linear codes, without providing details. We propose a precise description of such an implementation and discuss its practical implications.

3.1. Traditional Random Linear Codes

Let us first explain how Random Linear Codes work for the case of traditional erasure codes. The essence of random linear codes is that all the operations are linear operations
over a Galois field on fixed size data fragments. Again we describe these operations following the life cycle of a file.

1) Insertion: In this phase we have to create k + h pieces of size |file|/k. To do that, it is enough to break the file in n_file = k equal size (original) fragments, and compute any of the k + h pieces as a random linear combination of them. The random coefficients used for such combinations are stored along with the pieces.
2) Maintenance: As already explained, a repair in traditional erasure codes requires the transfer of the whole piece from d participating peers to the newcomer. The newcomer then builds the new piece performing a random linear combination of the d received pieces. Again the resulting coefficients are stored along with the new piece.
3) Reconstruction: The owner of the file downloads k pieces from k other peers and uses these pieces to reconstruct the file. The procedure consists of inverting, if possible, the matrix composed by the coefficients of all the received pieces, and multiplying the inverted matrix by the pieces. The results of such a multiplication are the original fragments, i.e. the original file.

Theory on Random Linear Network Codes [11], [12], [13] says that the probability to successfully invert the matrix upon reconstruction depends only on the size of the Galois Field and that this probability can be made arbitrarily close to 1 by increasing the size of the Galois Field. For all practical purposes a field size equal to 2^16 is considered sufficient.

3.2. Random Linear Regenerating Codes

In traditional erasure codes, random linear implementation is straightforward, because all the operations are performed on a set of pieces, which means that the size of the piece can be used as the basic unit of information in all the linear combinations and in the decoding. In Regenerating Codes things are different because they allow that the amount of data stored, |piece|, is not necessarily equal to the amount of data transmitted by a participant upon a repair, |repair_up|, and that the amount of data downloaded by a newcomer, |repair_down|, is not a multiple of |piece|.
In other words the basic unit of information, which is the size of the fragments we break the original file into, cannot be the size of the piece anymore. If we denote this size as |fragment|, we can write the constraints it has to fulfill:

$$|\text{file}| = n_{file}\,|\text{fragment}|, \qquad |\text{piece}| = n_{piece}\,|\text{fragment}|, \qquad |\text{repair}_{up}| = n_{repair}\,|\text{fragment}| \tag{E3}$$

where n_file, n_piece and n_repair are integers. Using the equations in section 2.2, we can compute:

$$\frac{|\text{piece}|}{|\text{repair}_{up}|} = \frac{p(d, i)}{r(d, i)} = d - k + i + 1$$

and:

$$\frac{|\text{file}|}{|\text{repair}_{up}|} = \frac{1}{r(d, i)} = \frac{2k(d - k + 1) + i(2k - i - 1)}{2}$$

Both ratios are integers. This means that we can set n_repair = 1, which corresponds to setting |fragment| = |repair_up|, and consequently:

$$n_{file} = \frac{2k(d - k + 1) + i(2k - i - 1)}{2}, \qquad n_{piece} = d - k + i + 1 \tag{E4}$$

Given these parameters we can describe the operations needed in Random Linear Regenerating Codes:

1) Insertion: We break the file in n_file equal size original fragments, and compute any of the k + h pieces as n_piece random linear combinations of them. The random coefficients used for such combinations are stored along with the piece. They form a (n_piece, n_file) matrix (footnote 6).
2) Maintenance: A repair involves d existing peers, which send data to the newcomer. The data sent by any of the d peers correspond to the results of one random linear combination of the n_piece fragments contained in the stored piece, as depicted in Fig. 2(a). The newcomer thus receives d fragments and the corresponding coefficients and obtains its new piece as n_piece random linear combinations of them, as depicted in Fig. 2(b). Note that in the particular case of d = n_piece the newcomer does not need to perform linear combinations of the received fragments, since they already constitute the new piece.
3) Reconstruction: The owner of the file downloads k pieces from k peers, which correspond to n_piece · k fragments, along with the coefficients, which form a (n_piece · k, n_file) matrix. It tries to find n_file independent rows in the coefficient matrix, then it inverts the resulting square submatrix and finally multiplies this matrix by the concerned fragments. An important remark is that if the file owner downloads k pieces, it potentially downloads an amount of data quite bigger than the file size. In [7] it is claimed that this can represent a significant drawback for Regenerating Codes.
In our implementation, we eliminate this shortcoming: we download only the coefficients, we extract a full-rank square submatrix, we invert it, and finally we download only the n_file fragments corresponding to the invertible submatrix that was extracted. In this way we always download an amount of data equal to the file size, without paying any extra cost.

Footnote 6. In our notation a (n, m) matrix is a matrix with n rows and m columns.
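The fragment counts of eq. E4 and the resulting sizes can be sketched in a few lines of C. This is only an illustrative helper, not the authors' code: the type `rc_params` and the function names are hypothetical, and all sizes are returned as fractions of |file|.

```c
#include <assert.h>

/* Fragment counts of RC(k, h, d, i) as given by eq. E4.
   Type and function names are illustrative, not taken from the paper. */
typedef struct {
    long n_file;   /* fragments in the whole file   */
    long n_piece;  /* fragments in one stored piece */
} rc_params;

rc_params rc_compute(long k, long h, long d, long i)
{
    rc_params p;
    assert(k >= 1 && h >= 1);
    assert(d >= k && d <= k + h - 1);   /* repair degree range (E2)   */
    assert(i >= 0 && i <= k - 1);       /* expansion index range (E2) */
    /* i * (2k - i - 1) is always even, so the division by 2 is exact. */
    p.n_file  = k * (d - k + 1) + i * (2 * k - i - 1) / 2;
    p.n_piece = d - k + i + 1;
    return p;
}

/* All sizes below are expressed as fractions of |file|. */
double rc_piece(rc_params p)
{
    return (double)p.n_piece / (double)p.n_file;
}

double rc_repair_up(rc_params p)
{
    return 1.0 / (double)p.n_file;      /* one fragment per participant */
}

double rc_repair_down(rc_params p, long d)
{
    return (double)d / (double)p.n_file;
}

double rc_storage(rc_params p, long k, long h)
{
    return (double)(k + h) * rc_piece(p);
}
```

For the traditional erasure code RC(32, 32, 32, 0) this yields n_file = 32 and n_piece = 1, i.e. a storage of twice the file size and a repair download equal to the file size, matching the reference values of section 2.2.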
[Figure 2. Repair scheme on the participant side (a) and on the newcomer side (b). Every arrow indicates a participation to a random linear combination.]

4. Analytical Evaluation

In this section we perform an analytical evaluation of the Random Linear Regenerating Codes. To do this, we give a formal description of the linear operations performed. All the data we handle can be interpreted as a sequence of values, called elements, in a given Galois Field. Usually the size of such a field is chosen to be equal to 2^q, since this speeds up the computation. In this case every value is a sequence of q bits. A common choice is q = 16, which corresponds to an element size of 2 bytes. Every fragment is thus represented by a vector of l_frag = (|fragment|/q) elements. The whole file is thus represented by a (n_file, l_frag) matrix, denoted as F_{n_file, l_frag}. A set of n encoded fragments is represented as a (n, l_frag) matrix E_{n, l_frag} (footnote 7). This matrix can always be represented as a set of linear combinations of the original fragments:

$$E_{n,\,l_{frag}} = C_{n,\,n_{file}}\; F_{n_{file},\,l_{frag}}$$

where the entries of C_{n, n_file} are elements in the field and represent the coefficients associated with the set of fragments.

Footnote 7. In our notation F_{n,m} corresponds to a set of original fragments, while E_{n,m} corresponds to a set of encoded fragments.

[Figure 3. Coefficient overhead (in log-scale) of RC(32, 32, d, i) for a 1 MByte file, as a function of d, one curve per value of i.]

4.1. Impact of coefficients

The first question we address is the impact of the coefficients on the storage and on the communication costs. Since with every fragment we associate a set of n_file coefficients, the relative impact of the coefficients is given by the ratio:

$$r_{coeff} = \frac{n_{file}\, q}{|\text{fragment}|} = \frac{n_{file}^{2}\, q}{|\text{file}|}$$

This ratio can be interpreted as the overhead due to coefficients: for every bit of data we need r_coeff bits of coefficients. Note that this ratio is inversely proportional to the size of the file we store. This means, as one could expect, that the bigger the file the smaller the coefficient overhead. More importantly, the overhead increases with the square of n_file, which increases significantly as we increase the parameters d and i in Regenerating Codes (see eq. E4).
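As a rough numeric check of this ratio, consider a worked example under assumed values: k = h = 32, q = 16 bits, and a 1 MByte file taken here as 2^20 bytes (8 · 2^20 bits). The first case is the configuration with the largest n_file (d = 63, i = 31), the second is the traditional erasure code (d = 32, i = 0).

```latex
\begin{align*}
n_{file}(63, 31) &= k(d - k + 1) + \frac{i(2k - i - 1)}{2}
                  = 32 \cdot 32 + \frac{31 \cdot 32}{2} = 1520 \\
r_{coeff}(63, 31) &= \frac{n_{file}^{2}\, q}{|\text{file}|}
                   = \frac{1520^{2} \cdot 16}{8 \cdot 2^{20}}
                   = \frac{36\,966\,400}{8\,388\,608} \approx 4.41 \\[4pt]
n_{file}(32, 0) &= 32 \\
r_{coeff}(32, 0) &= \frac{32^{2} \cdot 16}{8 \cdot 2^{20}} = \frac{1}{512} \approx 0.002
\end{align*}
```

Under these assumptions the overhead spans more than three orders of magnitude across the parameter space, from about 0.2% of the data to more than four times the data.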
To understand the impact of this additional cost, let us consider the class of regenerating codes RC(32, 32, d, i) and let us assume that the field size is 2^q with q = 16, which corresponds to an element size of 2 bytes. In Fig. 3 we plot the values of the coefficient overhead when the original file size is |file| = 1 MByte for all the possible values of d and i. For such a small file size, the coefficient overhead is not negligible: in the most expensive configuration, for 1 bit of data more than 4 bits of coefficients are needed, which is clearly unacceptable. By increasing the file size, this overhead decreases (footnote 8). The implication of figure 3 is that when using Regenerating Codes, system designers need to choose a minimum size for storage objects that is significantly bigger than for traditional erasure codes.

Footnote 8. The actual overhead is given by the values shown in figure 3 divided by the file size in MBytes.

4.2. Computational Complexity

One of the main concerns in the employment of coding in real systems is the computational effort that it requires. In this section we propose a formal analysis of Random Linear Regenerating Codes. All the operations are performed in Galois Fields. Therefore, we need to make sure to control the cost of the operations by choosing the right field size. If we set the field size equal to 2^q, with q = 16, all the operations are performed on unsigned short integers (2 bytes). In this case:

- Additions and subtractions correspond to an XOR operation between two elements.
- Multiplication and division are performed in the log space. For example: a · b becomes exp(log a + log b). log and exp for all the possible values in the field are computed offline and stored, which requires 256 KB of memory for q = 16. The operations log and exp can then be implemented as value lookups in a vector,
which allows divisions and multiplications to be implemented with 3 lookups and 1 addition.

All the operations we perform in Regenerating Codes can be reduced to: (1) linear combinations and (2) matrix inversions. Let us analyze them in detail:

1) A linear combination of n vectors of length l consists in n · l additions and n · l multiplications, for a total of 5nl operations.
2) The inversion of a square (n, n) matrix consists in n^3 additions and n^3 multiplications, which can be implemented in 5n^3 operations. Actually for Regenerating Codes the situation is slightly different: we have a (m, n) matrix, with m ≥ n, from which we need to extract n rows that are linearly independent, which will result in a (n, n) submatrix that can then be inverted. Extraction and inversion are done in parallel and the cost will vary according to the particular matrix between the bounds 5n^3 and 5mn^2.

Now we have all the basic tools to compute the complexity of Regenerating Codes along the lifetime of a file:

1) Insertion: In this phase we perform (k + h) n_piece linear combinations of n_file fragments, for a total number of operations equal to:

$$\text{CPU(encoding)} = 5\,(k + h)\; n_{file}\; n_{piece}\; l_{frag}$$

Using the definitions of the different parts (n_file l_frag = |file|/q) we obtain:

$$\text{CPU(encoding)} = \frac{5}{q}\,(k + h)\; n_{piece}\; |\text{file}| \tag{E5}$$

2) Maintenance: As already explained, in a repair, part of the work is done on the participating peers and another part is done on the newcomer. On every participating peer we perform one linear combination of n_piece fragments, which corresponds to:

$$\text{CPU(repair)}_{up} = 5\; n_{piece}\; l_{frag}$$

Doing some manipulations we obtain that the number of operations is proportional to the size of the piece:

$$\text{CPU(repair)}_{up} = \frac{5}{q}\,|\text{piece}| \tag{E6}$$

On the newcomer we perform n_piece linear combinations of d fragments, which corresponds to:

$$\text{CPU(repair)}_{down} = 5\, d\; n_{piece}\; l_{frag} = d \cdot \text{CPU(repair)}_{up} \tag{E7}$$

Note that every fragment is also associated with a set of coefficients. This means that every time a new fragment is generated as a linear combination of other existing fragments, this operation must be performed also on the corresponding coefficients, in order to obtain the coefficients associated with the new fragment.
In terms of computation cost, this can be taken into account by assuming that the fragment size is virtually increased by the size of the coefficients, which is given by the overhead in section 4.1.
3) Reconstruction: We can split the reconstruction in two phases: (1) we need to extract n_file linearly independent rows from a (k n_piece, n_file) matrix, and then invert the obtained submatrix; (2) we multiply this submatrix by the corresponding encoded fragments. According to these two phases, the cost of reconstruction can be split in two components as well:

$$\text{CPU(reconstruction)} = \text{CPU(inversion)} + \text{CPU(decoding)}$$

As explained before, the cost of the inversion is bounded by two limits:

$$5\, n_{file}^{3} \;<\; \text{CPU(inversion)} \;<\; 5\, k\; n_{piece}\; n_{file}^{2} \tag{E8}$$

The decoding, then, corresponds to n_file linear combinations of n_file fragments, which leads to:

$$\text{CPU(decoding)} = 5\; n_{file}^{2}\; l_{frag} = \frac{5}{q}\; n_{file}\; |\text{file}|$$

Note that all the costs, except for the inversion cost, are linearly dependent on the file size |file| (this holds also for repair, since |piece| is in turn proportional to |file|).

5. Experimental Evaluation

In this section we evaluate the resource requirements of Regenerating Codes. For this purpose, we wrote an optimized C implementation of Random Linear Regenerating Codes that we executed on an Intel Core 2 Duo CPU at 2.66 GHz. We executed all the operations performed in the life cycle of a stored file, as described in section 4, and measured the time needed to perform these operations. All the experiments have been done for a file of 1 MByte in size and the Regenerating Code parameters are fixed to k = 32, h = 32, while d and i can take all possible values.

5.1. Computational Cost

To have a basis for comparing different configurations of Regenerating Codes, we first show the results obtained for a traditional erasure code (i.e. a Regenerating Code RC(32, 32, 32, 0)) when a file of 1 MByte is stored. Let t_{d,i} denote the time needed by a particular operation for a Regenerating Code RC(32, 32, d, i). The following table shows the time t_{32,0} needed for each operation:

Operation             t_{32,0} [sec]
Encoding              .5
Participant Repair    0
Newcomer Repair       .1
Matrix Inversion      .
Decoding              .5
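To make the cost model of section 4.2 concrete, the log/exp-table arithmetic and a linear combination over GF(2^16) can be sketched as follows. This is a minimal illustration, not the authors' optimized implementation: the field polynomial `GF_POLY` (x^16 + x^12 + x^3 + x + 1) is an assumption, since the paper does not state which one was used, and all function names are hypothetical. The two tables together occupy about 256 KB, in line with the figure quoted in section 4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF_ORDER 65535u    /* number of non-zero elements of GF(2^16) */
#define GF_POLY  0x1100Bu  /* x^16 + x^12 + x^3 + x + 1: assumed, not from the paper */

static uint16_t gf_log[65536];  /* 128 KB */
static uint16_t gf_exp[65535];  /* 128 KB */

/* Build both tables by enumerating the powers of the generator x. */
void gf_init(void)
{
    uint32_t v = 1;
    for (uint32_t e = 0; e < GF_ORDER; e++) {
        gf_exp[e] = (uint16_t)v;
        gf_log[v] = (uint16_t)e;
        v <<= 1;                 /* multiply by x */
        if (v & 0x10000u)
            v ^= GF_POLY;        /* reduce modulo the field polynomial */
    }
}

/* Addition and subtraction are both a XOR. */
uint16_t gf_add(uint16_t a, uint16_t b)
{
    return (uint16_t)(a ^ b);
}

/* Multiplication: 3 table lookups and 1 integer addition. */
uint16_t gf_mul(uint16_t a, uint16_t b)
{
    uint32_t s;
    if (a == 0 || b == 0)
        return 0;                /* log(0) is undefined */
    s = (uint32_t)gf_log[a] + gf_log[b];
    if (s >= GF_ORDER)
        s -= GF_ORDER;
    return gf_exp[s];
}

/* Division: same scheme, subtracting the logarithms. */
uint16_t gf_div(uint16_t a, uint16_t b)
{
    uint32_t s;
    assert(b != 0);
    if (a == 0)
        return 0;
    s = (uint32_t)gf_log[a] + GF_ORDER - gf_log[b];
    if (s >= GF_ORDER)
        s -= GF_ORDER;
    return gf_exp[s];
}

/* out = sum over t of coeff[t] * fragment t, where the n fragments of
   len elements each are stored back to back in frags. */
void gf_lincomb(uint16_t *out, const uint16_t *frags,
                const uint16_t *coeff, size_t n, size_t len)
{
    size_t t, j;
    for (j = 0; j < len; j++)
        out[j] = 0;
    for (t = 0; t < n; t++)
        for (j = 0; j < len; j++)
            out[j] = gf_add(out[j], gf_mul(coeff[t], frags[t * len + j]));
}
```

`gf_init()` must be called once before any other function. Encoding a piece, the participant side of a repair and decoding all reduce to calls of `gf_lincomb` with different inputs; an optimized version would typically hoist the lookup of `gf_log[coeff[t]]` out of the inner loop.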
Note that the participant repair has a computation time of zero because in traditional erasure codes repairs do not require any computation at the participant side, which simply sends to the newcomer the entire piece.

Let us now introduce the results obtained for the general case of Regenerating Codes RC(32, 32, d, i). To understand the computational overhead of these codes, we consider the ratio between the time t_{d,i} and the time t_{32,0} measured for traditional erasure codes. We call this ratio computation overhead r_cpu:

$$r_{cpu} = \frac{t_{d,i}}{t_{32,0}}$$

The computation overhead tells us how much a given Regenerating Code is slower than a traditional erasure code.

[Figure 4. Computation overhead for RC(32, 32, d, i). Panels: (a) Encoding; (b) Repair: Participant side; (c) Repair: Newcomer side; (d) Reconstruction: Matrix Inversion; (e) Reconstruction: Decoding.]

Following the life cycle of a file we have:

1) Insertion: We show in Fig. 4(a) the computation overhead of the initial encoding of the file. We see that the overhead grows linearly with d and i. This is consistent with eq. E5, which says that the cost is proportional to n_piece, which is in turn linear in d and i as we can see from eq. E4.
2) Maintenance: Fig. 4(b) shows the computation overhead on the participant side (footnote 9). In this case the computation overhead grows slightly more than linearly with d and i, since as we know from eq. E6 it is proportional to the piece size, which in turn has the behavior shown in Fig. 1(a). Fig. 4(c) shows the computation overhead on the newcomer side. From eq. E7, this cost is proportional to d times the cost on the participant side, which is confirmed by the roughly quadratic relation with d shown by the plot. Note that for i = k - 1 the overhead falls to zero, since for this configuration the newcomer does not need to combine the received blocks, but simply stores them (c.f. section 3.2).
3) Reconstruction: The reconstruction requires the inversion of the matrix of coefficients and then the decoding of the fragments. Fig. 4(d) shows the computation overhead for the inversion, which as we know from eq. E8 grows roughly as n_file^3. Inversion can be computationally very expensive, in particular for large values of d and i. Fig. 4(e) shows the computation overhead of the decoding, whose shape closely resembles the one for encoding (see Fig. 4(a)), which is expected since both perform analogous operations.

Footnote 9. Note that this cost is equal to zero in traditional erasure codes. For this reason the normalization is done by the smallest value larger than zero, which occurs for d = 33 and i = 0 and is equal in terms of computation time to .3 sec.

5.2. Bottleneck Network Bandwidth

As outlined in section 2.1, a redundancy scheme introduces three different costs, namely computation, storage
and communication. So far we have only considered computation. However, what we are really interested in is to evaluate which resource (computation or communication) is the overall performance bottleneck of the system.

Table 1. Resource requirements of RC(32, 32, d, i) for a 1 MByte file. The five middle columns give the bottleneck network bandwidth of each operation; the last two columns give the communication cost |repair_down| and the storage cost |storage|.

d    i    Encoding    Repair:        Repair:       Reconstruction:     Reconstruction:   |repair_down|   |storage|
                      Participant    Newcomer      Matrix Inversion    Decoding
32   0    31. Mbps                   777.3 Mbps    7.8 Mbps            4.6 Mbps          1 MB            2 MB
63   30   655 Kbps    11. Mbps       1. Mbps       383 Kbps            48 Kbps           42.47 KB        2.61 MB
32   30   1.9 Mbps    1.6 Mbps       1.6 Mbps      1.6 Mbps            1.3 Mbps          62.18 KB        3.76 MB
40   1    3.1 Mbps    7.5 Mbps       76.8 Mbps     1.5 Mbps            .5 Mbps           128.4 KB        2.006 MB

In a distributed storage system the data handled must be transferred over the network. Let us assume that the transfer operation is pipelined with the coding, which means in the case of insertion that each fragment is transmitted as soon as it is produced by the initial encoding step. If the transfer takes longer than the computation, then the bottleneck is communication, and the use of a computationally more efficient code will not make the insertion operation faster. This means that whether or not computation has an impact on the overall performance of the system depends on the available network bandwidth of the participating peers. For this purpose we want to know the minimum network bandwidth of a peer for which the computation represents the bottleneck for the overall performance. We call this bandwidth the bottleneck network bandwidth, which is denoted as bnb.

The bottleneck network bandwidth can be computed as the bandwidth for which the transfer time is equal to the computation time. If t_{d,i} denotes the time needed to perform an operation for RC(32, 32, d, i) and data_{d,i} denotes the amount of data handled by that operation that needs to be transmitted over the network, we have:

$$bnb_{d,i} = \frac{data_{d,i}}{t_{d,i}}$$

From the above definition it is clear that the bottleneck network bandwidth also gives the amount of data that can be processed per unit of time by the coding/decoding operation.
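The definition above translates directly into code. The following sketch is illustrative only: the function names are hypothetical, and 1 Mbps is assumed to mean 10^6 bit/s.

```c
#include <assert.h>

/* Bottleneck network bandwidth in Mbit/s (assumed here to be 10^6 bit/s):
   the bandwidth at which sending data_bytes takes exactly cpu_seconds. */
double bnb_mbps(double data_bytes, double cpu_seconds)
{
    assert(data_bytes >= 0.0 && cpu_seconds > 0.0);
    return data_bytes * 8.0 / cpu_seconds / 1e6;
}

/* Returns 1 if, on a link of link_mbps, the computation (not the
   transfer) limits the operation, i.e. the link is faster than the
   bottleneck network bandwidth; returns 0 otherwise. */
int cpu_is_bottleneck(double data_bytes, double cpu_seconds, double link_mbps)
{
    return link_mbps > bnb_mbps(data_bytes, cpu_seconds);
}
```

For instance, an operation that handles 10^6 bytes in one second of computation has a bottleneck network bandwidth of 8 Mbps: on any slower link the transfer is the limiting factor, and a faster code would not speed up the operation.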
The values of data_{d,i} for the different operations are computed as follows:

- Encoding: This operation produces the (k + h) initial pieces. The amount of data produced that is sent over the network is given by the size of these pieces: data = (k + h) |piece|.
- Participant Repair: This operation produces a single fragment plus the corresponding coefficients. The amount of data that is sent over the network is: data = (1 + r_coeff) |fragment|.
- Newcomer Repair: This operation produces a new piece and its coefficients from d received fragments and their coefficients. The amount of data that is received from the network is given by the size of d fragments plus the corresponding coefficients: data = d (1 + r_coeff) |fragment|.
- Inversion: This operation extracts n_file independent rows from the received (k n_piece, n_file) matrix (which describes the k pieces used for reconstruction), and inverts the submatrix obtained. This means that the amount of data that is received for this operation is given by the size of the coefficients of the k pieces: data = k r_coeff |piece|.
- Decoding: This operation produces the original file by multiplying the matrix obtained from the inversion by the corresponding encoded fragments. The amount of data that is received for this operation is given by the size of n_file fragments, i.e. the file size: data = |file|.

Table 1 shows the bottleneck network bandwidth for all the operations in the life cycle of a file for different values of d and i. The last two columns show the volume of repair traffic |repair_down| and the total amount of data stored in the system |storage|.

The first row, with d = 32, i = 0, presents the results for a traditional erasure code, which minimizes the storage requirement at the expense of a very large volume of repair traffic (|repair_down| = |file|). In row two we consider a code with d = 63 and i = 30, which minimizes the repair traffic. However, as we know from figure 4, this particular code has the highest computational costs, which result in bottleneck network bandwidth values that can be as low as a few hundred Kbps.

However, if we remember the results presented in Fig.
1(b), which shows the savings in repair traffic for Regenerating Codes, we know that most of the savings are already achieved by quite small values of d, i.e. where d = k or where d is slightly larger than k. For this reason, the next two rows of table 1 consider Regenerating Codes with values of d = 32 and d = 40 that illustrate how we can trade off storage requirement and repair traffic:

- If we have plenty of storage space, we can use a big value for the piece expansion index i: for d = 32, i = 30 the storage space required almost doubles as compared to the one required by traditional erasure codes. However, the reduction in repair traffic as compared to traditional erasure codes is still almost as good as for the regenerating code with d = 63, i = 30, which minimizes the repair traffic.
- On the other hand, if storage space matters, we can choose a code with a small i, and a d slightly larger
than k, which still preserves most of the reduction in repair traffic. Results for d = 40, i = 1 are shown in the fourth row of table 1. If we compare the results to the best one achievable for each resource (see first two rows), we see that we achieve a close to minimal storage requirement (2.006 MB vs. 2 MB), and a repair traffic (128.4 KB) that is almost one order of magnitude less than for traditional erasure codes.

From the results presented so far, we can conclude that Regenerating Codes, as compared to traditional erasure codes, can provide substantial reductions in repair traffic, at almost no extra cost in terms of storage space required. However, this gain comes at the price of a much higher computational cost, as can be seen when looking at the encoding and reconstruction performance, which is nearly one order of magnitude lower than for traditional erasure codes. With the current implementation, we can encode/decode in the order of 1 GByte of data per hour.

This performance may be too low for a large data center. However, we feel that Regenerating Codes are best suited for those systems that do not insert or retrieve very large amounts of data and that need to do a significant amount of repairs. An example is given by peer-to-peer data backup systems, where the data maintenance, due to the high node churn, is far more frequent than data insertion or retrieval.

[Figure 5. Illustration of the trade-offs provided by Regenerating Codes.]

6. Conclusion

Regenerating Codes can be seen as a generalization of previously known redundancy schemes based on replication and erasure codes. They allow to trade off not only communication and storage requirements, but also computational costs. We schematically depict this trade-off in figure 5.

We proposed a practical implementation of Regenerating Codes, based on Random Linear Codes. We presented and evaluated its performance trade-offs. We saw that the important savings provided in terms of repair traffic do not come for free, as Regenerating Codes have much lower coding and decoding rates.

However, we feel that Regenerating Codes have a lot of potential in environments where repairs are frequent and the available bandwidth to carry repair traffic is limited, as is for instance the case in Internet-wide peer-to-peer backup systems.

As future work, we plan to deploy Random Linear Regenerating Codes in a real P2P storage system. We want to compare the performance of Regenerating Codes to other existing solutions, in particular traditional erasure codes and Hierarchical Codes [8], under different conditions with respect to data volume and available bandwidth.

References

[1] A. Adya, W. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. Douceur, J. Howell, J. Lorch, M. Theimer, and R. Wattenhofer, "Farsite: Federated, available and reliable storage for an incompletely trusted environment," in OSDI, 2002.
[2] F. Dabek et al., "Wide-area cooperative storage with CFS," in SOSP, 2001.
[3] A. Haeberlen, A. Mislove, and P. Druschel, "Glacier: Highly durable, decentralized storage despite massive correlated failures," in NSDI, 2005.
[4] H. Weatherspoon, "Design and evaluation of distributed wide-area on-line archival storage systems," Ph.D. dissertation, University of California, Berkeley, 2006.
[5] R. Rodrigues and B. Liskov, "High availability in DHTs: Erasure coding vs. replication," in IPTPS, 2005.
[6] H. Weatherspoon and J. D. Kubiatowicz, "Erasure coding vs. replication: A quantitative comparison," in IPTPS, 2002.
[7] A. G. Dimakis, B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," Computer Research Repository (CoRR), vol. arXiv:0803.0632v1, http://arxiv.org/abs/0803.0632, Mar. 2008.
[8] A. Duminuco and E. Biersack, "Hierarchical codes: How to make erasure codes attractive for peer-to-peer storage systems," in IEEE P2P, 2008.
[9] Y. Wu, A. Dimakis, and K. Ramchandran, "Deterministic regenerating codes for distributed storage," in Annual Allerton Conference, 2007.
[10] J. S. Plank, "A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems," Software Practice & Experience, vol. 27, no. 9, pp. 995-1012, September 1997.
[11] S. Acedanski, S. Deb, M. Medard, and R. Koetter, "How good is random linear coding based distributed networked storage?" in NETCOD, 2005.
[12] S.-Y. R. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Transactions on Information Theory, vol. 49, no. 2, February 2003.
[13] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network information flow," IEEE Transactions on Information Theory, vol. 46, no. 4, July 2000.