Distributed Storage Allocations for Optimal Delay

Derek Leong, Department of Electrical Engineering, California Institute of Technology, Pasadena, California 91125, USA, derekleong@caltech.edu
Alexandros G. Dimakis, Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA, dimakis@usc.edu
Tracey Ho, Department of Electrical Engineering, California Institute of Technology, Pasadena, California 91125, USA, tho@caltech.edu

Abstract: We examine the problem of creating an encoded distributed storage representation of a data object for a network of mobile storage nodes so as to achieve the optimal recovery delay. A source node creates a single data object and disseminates an encoded representation of it to other nodes for storage, subject to a given total storage budget. A data collector node subsequently attempts to recover the original data object by contacting other nodes and accessing the data stored in them. By using an appropriate code, successful recovery is achieved when the total amount of data accessed is at least the size of the original data object. The goal is to find an allocation of the given budget over the nodes that optimizes the recovery delay incurred by the data collector; two objectives are considered: (i) maximization of the probability of successful recovery by a given deadline, and (ii) minimization of the expected recovery delay. We solve the problem completely for the second objective in the case of symmetric allocations (in which all nonempty nodes store the same amount of data), and show that the optimal symmetric allocation for the two objectives can be quite different. A simple data dissemination and storage protocol for a mobile delay-tolerant network is evaluated under various scenarios via simulations. Our results show that the choice of storage allocation can have a significant impact on the recovery delay performance, and that coding may or may not be beneficial depending on the circumstances.

I. INTRODUCTION

Consider a network of $n$ mobile storage nodes. A source node creates a single data object of unit size (without loss of generality), and disseminates an encoded representation of
it to other nodes for storage, subject to a given total storage budget $T$. Let $x_i$ be the amount of coded data eventually stored in node $i \in \{1, \ldots, n\}$ at the end of the data dissemination process. Any amount of data may be stored in each node, as long as the total amount of storage used over all nodes is at most the given budget $T$, that is, $\sum_{i=1}^{n} x_i \le T$.

At some time after the completion of the data dissemination process, a data collector node begins to recover the original data object by contacting other nodes and accessing the data stored in them. We make the simplifying assumption that the stored data is instantaneously transmitted on contact; this approximates the case where there is sufficient bandwidth and time for data transmission during each contact. This data recovery process continues until the data object can be recovered from the cumulatively accessed data. Let random variable $D$ denote the recovery delay incurred by the data collector, defined as the earliest time at which successful recovery can occur, measured from the beginning of the data recovery process.

(This work has been supported in part by the Air Force Office of Scientific Research under grant FA9550-10-1-0166 and Caltech's Lee Center for Advanced Networking.)

By using an appropriate code for the data dissemination process and eventual storage, successful recovery can be achieved when the total amount of data accessed by the data collector is at least the size of the original data object. This can be accomplished with random linear codes [1], [2] or a suitable MDS code, for example. Thus, if $r_d \subseteq \{1, \ldots, n\}$ is the set of all nodes contacted by the data collector by time $d$, then the recovery delay $D$ can be written as

$$D \triangleq \min\Big\{ d : \sum_{i \in r_d} x_i \ge 1 \Big\}.$$

Our goal is to find a storage allocation $(x_1, \ldots, x_n)$ that produces the optimal recovery delay, subject to the given budget constraint. Specifically, we shall examine the following two objectives involving the recovery delay $D$: (i) maximization of the probability of successful recovery by a given deadline $d$, or recovery probability $P[D \le d]$, and (ii) minimization of the expected recovery delay $E[D]$. By solving for the optimal allocation, we will
also be able to determine whether coding is beneficial for recovery delay. For example, uncoded replication would suffice if each nonempty node is to store the data object in its entirety (i.e., $x_i \ge 1$ for all $i \in S$, and $x_i = 0$ for all $i \notin S$, where $S$ is some subset of $\{1, \ldots, n\}$); the data collector would not need to combine data accessed from different nodes in order to recover the data object.

The nodes of the network are assumed to move around and contact each other according to an exogenous random process; they are unable to change their trajectories in response to the data dissemination or recovery processes. (The recovery delay could be improved significantly if nodes were otherwise allowed to act on oracular knowledge about future contact opportunities [3], for example.)

Most work on delay-tolerant networking traditionally assumes that the data object is intended for immediate consumption; both the data dissemination and recovery processes would therefore begin at the same time, and the recovery delay would
be measured from the beginning of the data dissemination process. In contrast, our model more accurately reflects the characteristics of longer-term storage, where the data object can be consumed long after its creation. Nonetheless, our model can still be a good approximation for short-term storage, especially when the data dissemination process occurs very rapidly, as in the case of binary SPRAY-AND-WAIT [4], where the number of nodes disseminating or "spraying" data grows exponentially over time.

We also note that in most of the literature involving distributed storage, either the data object is assumed to be replicated in its entirety (see, e.g., [4]), or, if coding is used, every node is assumed to store the same amount of coded data (see, e.g., [5]-[9]). Allocations of a storage budget with nodes possibly storing different amounts of data are not usually considered.

A. Our Contribution

This paper attempts to address the gaps in our understanding of how the choice of storage allocation can affect the recovery delay performance. We formulate a simple analytical model of the problem and show that the maximization of the recovery probability $P[D \le d]$ can be expressed in terms of the reliability maximization problem introduced in [10]. It turns out that the simple strategies of spreading the budget minimally (i.e., uncoded replication) and spreading the budget maximally over all nodes (i.e., assigning $x_i = T/n$ for all $i$) may both be suboptimal; in fact, the optimal allocation may not even be symmetric (we say that an allocation is symmetric when all nonzero $x_i$ are equal). Applying our earlier results [11], we can show that minimal spreading is optimal among symmetric allocations when the deadline $d$ is sufficiently small, while maximal spreading is optimal among symmetric allocations when the deadline $d$ is sufficiently large.

For the minimization of the expected recovery delay $E[D]$, we are able to characterize the optimal symmetric allocation completely: minimal spreading (i.e., uncoded replication) turns out to be optimal whenever the budget $T$ is an integer; otherwise, the amount of spreading in the optimal symmetric allocation increases with the fractional part of $T$.
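As a rough numerical illustration of this characterization, the sketch below evaluates the closed-form expressions that Section II derives for a symmetric allocation with $m$ nonempty nodes: the recovery probability $P[B(m, p_{\lambda,d}) \ge \lceil m/T \rceil]$ with $p_{\lambda,d} = 1 - e^{-\lambda d}$, and the expected recovery delay $\frac{1}{\lambda} \sum_{i=1}^{\lceil m/T \rceil} \frac{1}{m - \lceil m/T \rceil + i}$. The function names, the use of exact rational arithmetic, and the example values $n = 20$ and $\lambda = 1$ are illustrative assumptions and are not taken from the paper.

```python
from fractions import Fraction
from math import ceil, comb, exp, floor


def recovery_probability(lam, d, T, m):
    """P[D <= d] for the symmetric allocation with m nonempty nodes.

    Recovery needs at least ceil(m/T) of the m nonempty nodes to be
    contacted, each independently with probability 1 - exp(-lam * d).
    """
    k = ceil(Fraction(m) / Fraction(T))
    p = 1.0 - exp(-lam * d)
    return sum(comb(m, r) * p**r * (1.0 - p) ** (m - r) for r in range(k, m + 1))


def expected_delay(lam, T, m):
    """E_D(lam, T, m) = (1/lam) * sum_{i=1..k} 1/(m - k + i), k = ceil(m/T)."""
    k = ceil(Fraction(m) / Fraction(T))
    return sum(Fraction(1, m - k + i) for i in range(1, k + 1)) / Fraction(lam)


def candidates(n, T):
    """Candidate values of m: floor(T), floor(2T), ..., floor(floor(n/T)*T), n."""
    T = Fraction(T)
    k_max = floor(Fraction(n) / T)
    return sorted({floor(k * T) for k in range(1, k_max + 1)} | {n})


def best_symmetric(n, lam, T):
    """Number of nonempty nodes m minimizing the expected recovery delay."""
    return min(candidates(n, T), key=lambda m: expected_delay(lam, T, m))


if __name__ == "__main__":
    # Budgets are passed as exact fractions to avoid floating-point
    # rounding inside the floor and ceiling operations.
    for T in (Fraction(2), Fraction(3, 2), Fraction(19, 10)):
        m = best_symmetric(20, 1, T)
        print(f"T = {float(T)}: best m = {m}, E[D] = {float(expected_delay(1, T, m)):.4f}")
```

Under these assumed values, the integer budget $T = 2$ gives $m = 2$ (minimal spreading), whereas $T = 1.5$ gives $m = 3$ and $T = 1.9$ gives $m = 19$, so the number of nonempty nodes grows with the fractional part of $T$.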
Interestingly, our analytical results demonstrate that the optimal symmetric allocation for the two objectives can be quite different. In particular, when the budget $T$ is an integer, we observe a phase transition in the optimal symmetric allocation as the deadline $d$ increases, for the maximization of recovery probability $P[D \le d]$; however, minimal spreading (i.e., uncoded replication) alone turns out to be optimal for the minimization of expected recovery delay $E[D]$.

We proceed to apply our theoretical insights to the design of a simple data dissemination and storage protocol for a mobile delay-tolerant network. Our protocol generalizes SPRAY-AND-WAIT [4] by allowing the use of variable-size coded packets. Using network simulations, we compare the performance of different symmetric allocations under various circumstances. These simulations allow us to capture the transient dynamics of the data dissemination process that were simplified in the analytical model. Our main result shows that a maximal spreading of the budget is optimal in the high recovery probability regime. Specifically, maximal spreading can lead to a significant reduction in the wait time required to attain a desired recovery probability. Besides validating the predictions made in our theoretical analysis, these simulations also reveal several interesting properties of the allocations under different circumstances.

B. Other Related Work

Jain et al. [12] and Wang et al. [13] evaluated the delay performance of symmetric allocations experimentally in the context of routing in a delay-tolerant network. Our results complement and generalize several aspects of their work.

We present a theoretical analysis of the problem in Section II, and undertake a simulation study in Section III. Proofs of theorems can be found in the extended version of this paper [14].

II. THEORETICAL ANALYSIS

We adopt the following notation throughout the paper:

$n$: total number of storage nodes, $n \ge 2$
$\lambda$: contact rate between any given pair of nodes, $\lambda > 0$
$x_i$: amount of data stored in node $i \in \{1, \ldots, n\}$, $x_i \ge 0$
$T$: total storage budget, $1 \le T \le n$
$D$: random variable denoting recovery delay

The indicator function is denoted by $I[G]$, which equals $1$ if statement $G$ is
true, and $0$ otherwise. We use $B(n, p)$ to denote the binomial random variable with $n$ trials and success probability $p$. An allocation $(x_1, \ldots, x_n)$ is said to be symmetric when all nonzero $x_i$ are equal; for brevity, let $\bar{x}(n, T, m)$ denote the symmetric allocation for $n$ nodes that uses a total storage of $T$ and contains exactly $m \in \{1, \ldots, n\}$ nonempty nodes, that is,

$$\bar{x}(n, T, m) \triangleq \Big( \underbrace{\tfrac{T}{m}, \ldots, \tfrac{T}{m}}_{m \text{ terms}}, \underbrace{0, \ldots, 0}_{(n-m) \text{ terms}} \Big).$$

The number of contacts between any given pair of nodes in the network is assumed to follow a Poisson distribution with rate parameter $\lambda$; the time between contacts is therefore described by an exponential distribution with mean $1/\lambda$. Let $W_1, \ldots, W_n$ be i.i.d. random variables denoting the times at which the data collector first contacts node $1, \ldots, n$, respectively, where $W_i \sim \text{Exponential}(\lambda)$.

A. Maximization of Recovery Probability $P[D \le d]$

Let the given recovery deadline be $d > 0$, and let the subset of nodes contacted by the data collector by time $d$ be $r \subseteq \{1, \ldots, n\}$. Successful recovery occurs by time $d$ if and only if the total amount of data stored in the subset $r$ of nodes is at least $1$. In other words, the recovery delay $D$ is at most $d$ if and only if $\sum_{i \in r} x_i \ge 1$. Since the data collector contacts each node by time $d$ independently with constant probability $p_{\lambda,d}$, given by

$$p_{\lambda,d} \triangleq P[W \le d] = F_W(d) = 1 - e^{-\lambda d},$$
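it follows that the probability of contacting exactly a subset $r$ of nodes by time $d$ is $p_{\lambda,d}^{|r|} (1 - p_{\lambda,d})^{n - |r|}$. The recovery probability $P[D \le d]$ can therefore be obtained by summing over all possible subsets $r$ that allow successful recovery:

$$P[D \le d] = \sum_{r \subseteq \{1, \ldots, n\}} p_{\lambda,d}^{|r|} (1 - p_{\lambda,d})^{n - |r|} \, I\Big[ \sum_{i \in r} x_i \ge 1 \Big]. \quad (1)$$

We seek an optimal allocation $(x_1, \ldots, x_n)$ of the budget $T$ (that is, subject to $\sum_{i=1}^{n} x_i \le T$, where $x_i \ge 0$ for all $i$) that maximizes $P[D \le d]$, for a given choice of $n$, $\lambda$, $d$, and $T$. This problem matches the reliability maximization problem of [11] with $p_{\lambda,d}$ as the access probability; we recall that the optimal allocation may be nonsymmetric and can be difficult to find. However, if we restrict the optimization to only symmetric allocations, then we can specify the solution for a wide range of parameter values of $p_{\lambda,d}$ and $T$. Specifically, if $\lambda$ or $d$ is sufficiently small, e.g., $p_{\lambda,d} \le 1/\lceil T \rceil$, then $\bar{x}(n, T, m = \lfloor T \rfloor)$, which corresponds to a minimal spreading of the budget (i.e., uncoded replication), is an optimal symmetric allocation. On the other hand, if $\lambda$ or $d$ is sufficiently large, e.g., $p_{\lambda,d} \ge 4/(3 \lfloor T \rfloor)$, then either $\bar{x}(n, T, m = \lfloor \lfloor n/T \rfloor T \rfloor)$ or $\bar{x}(n, T, m = n)$, which correspond to a maximal spreading of the budget, is an optimal symmetric allocation.

B. Minimization of Expected Recovery Delay $E[D]$

Rewriting (1) in terms of the underlying random variables gives us the following cdf for the recovery delay $D$:

$$F_D(t) = \sum_{r \subseteq \{1, \ldots, n\}} \big( F_W(t) \big)^{|r|} \big( 1 - F_W(t) \big)^{n - |r|} \, I\Big[ \sum_{i \in r} x_i \ge 1 \Big].$$

Differentiating $F_D(t)$ with respect to $t$ produces the pdf

$$f_D(t) = \sum_{r \subseteq \{1, \ldots, n\}} \big( F_W(t) \big)^{|r| - 1} \big( 1 - F_W(t) \big)^{n - |r| - 1} \big( |r| - n F_W(t) \big) f_W(t) \, I\Big[ \sum_{i \in r} x_i \ge 1 \Big].$$

Therefore, assuming $\sum_{i=1}^{n} x_i \ge 1$, which is necessary for successful recovery, we can compute the expected recovery delay as follows:

$$E[D] = \int_0^\infty t f_D(t) \, dt = \sum_{r \subseteq \{1, \ldots, n\}} \Big( \int_0^\infty t \big( F_W(t) \big)^{|r| - 1} \big( 1 - F_W(t) \big)^{n - |r| - 1} \big( |r| - n F_W(t) \big) f_W(t) \, dt \Big) I\Big[ \sum_{i \in r} x_i \ge 1 \Big]$$

$$= \frac{1}{\lambda} \Big[ H_n - \sum_{r \subseteq \{1, \ldots, n\}: \, 1 \le |r| \le n-1} \frac{1}{(n - |r|) \binom{n}{|r|}} \, I\Big[ \sum_{i \in r} x_i \ge 1 \Big] \Big], \quad (2)$$

where $H_n \triangleq \sum_{i=1}^{n} \frac{1}{i}$ is the $n$th harmonic number. We seek an optimal allocation $(x_1, \ldots, x_n)$ of the budget $T$ (that is, subject to $\sum_{i=1}^{n} x_i \le T$, where $x_i \ge 0$ for all $i$) that minimizes $E[D]$, for a given choice of $n$, $\lambda$, and $T$. Note that the optimal allocation is independent of $\lambda$ for the minimization of $E[D]$ but not for the maximization of $P[D \le d]$.

Fig. 1. Plot of expected recovery delay E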
[D] against budget $T$ for each symmetric allocation $\bar{x}(n, T, m)$, for a fixed choice of $(n, \lambda)$. Parameter $m$ denotes the number of nonempty nodes in the symmetric allocation. The black curve gives a lower bound for the expected recovery delay of an optimal allocation, as derived in Lemma 1.

The optimal value of $E[D]$ can be bounded as follows:

Lemma 1: The expected recovery delay $E[D]$ of an optimal allocation is at least

$$\frac{1}{\lambda} \Big( H_n - \sum_{r=1}^{n-1} \frac{1}{n - r} \min\Big( \frac{rT}{n}, 1 \Big) \Big).$$

We make the following conjecture about the optimal allocation, based on our numerical observations:

Conjecture 1: A symmetric optimal allocation always exists for any $n$, $\lambda$, and $T$.

As a simplification, we now proceed to restrict the optimization to only symmetric allocations (which are easier to describe and implement, and appear to perform well). For the symmetric allocation $\bar{x}(n, T, m)$, successful recovery occurs by a given deadline $d$ if and only if $\lceil 1/(T/m) \rceil = \lceil m/T \rceil$ or more nonempty nodes are contacted by the data collector by time $d$, out of a total of $m$ nonempty nodes. It follows that the resulting recovery probability is given by

$$P[D \le d] = P\big[ B(m, p_{\lambda,d}) \ge \lceil m/T \rceil \big].$$

We therefore obtain the following cdf and pdf for the recovery delay $D$:

$$F_D(t) = \sum_{r = \lceil m/T \rceil}^{m} \binom{m}{r} \big( F_W(t) \big)^{r} \big( 1 - F_W(t) \big)^{m - r},$$

$$f_D(t) = \lceil m/T \rceil \binom{m}{\lceil m/T \rceil} \big( F_W(t) \big)^{\lceil m/T \rceil - 1} \big( 1 - F_W(t) \big)^{m - \lceil m/T \rceil} f_W(t).$$

Thus, we can compute the expected recovery delay as follows:

$$E[D] = \int_0^\infty t f_D(t) \, dt = \frac{1}{\lambda} \sum_{i=1}^{\lceil m/T \rceil} \frac{1}{m - \lceil m/T \rceil + i} \triangleq E_D(\lambda, T, m).$$

Fig. 1 compares the performance of different symmetric allocations over different budgets, for an instance of $n$ and $\lambda$; the value of $m$ corresponding to the optimal symmetric allocation appears to change in a nontrivial manner as we vary the budget $T$. Fortunately, we can eliminate many candidates for
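the optimal value of $m$ by making the following observation (a similar observation was made in the maximization of the recovery probability [11]): For fixed $n$, $\lambda$, and $T$, we have $\lceil m/T \rceil = k$ when $m \in \big( (k-1)T, kT \big]$, for $k = 1, 2, \ldots, \lfloor n/T \rfloor$, and finally, $\lceil m/T \rceil = \lfloor n/T \rfloor + 1$ when $m \in \big( \lfloor n/T \rfloor T, n \big]$. Since $\frac{1}{\lambda} \sum_{i=1}^{k} \frac{1}{m - k + i}$ is decreasing in $m$ for constant $\lambda$ and $k$, it follows that $E_D(\lambda, T, m)$ is minimized over each of these intervals of $m$ when we pick $m$ to be the largest integer in the corresponding interval. Thus, given $n$, $\lambda$, and $T$, we can find an optimal $m^*$ that minimizes $E_D(\lambda, T, m)$ over all $m$ from among $\lceil n/T \rceil$ candidates:

$$\big\{ \lfloor T \rfloor, \lfloor 2T \rfloor, \ldots, \lfloor \lfloor n/T \rfloor T \rfloor, n \big\}. \quad (3)$$

Note that when $m = \lfloor kT \rfloor$, $k \in \mathbb{Z}^+$, the expected recovery delay simplifies to the following expression:

$$E_D(\lambda, T, m = \lfloor kT \rfloor) = \frac{1}{\lambda} \sum_{i=1}^{k} \frac{1}{\lfloor kT \rfloor - k + i}.$$

By further eliminating suboptimal candidate values for $m$ using suitable bounds for the harmonic number, we are able to completely characterize the optimal symmetric allocation for any $n$, $\lambda$, and $T$:

Theorem 1: Suppose $T = a + l$, where $a \in \mathbb{Z}^+$ and $0 \le l < 1$. If $l \le 1 - \frac{1}{\lfloor n/T \rfloor}$, then $\bar{x}\big(n, T, m = \big\lfloor \lfloor \tfrac{1}{1-l} \rfloor T \big\rfloor\big)$ is an optimal symmetric allocation; if $l > 1 - \frac{1}{\lfloor n/T \rfloor}$, then either $\bar{x}(n, T, m = \lfloor \lfloor n/T \rfloor T \rfloor)$ or $\bar{x}(n, T, m = n)$ is an optimal symmetric allocation.

If the budget $T$ is an integer (i.e., $l = 0$), then $l \le 1 - \frac{1}{\lfloor n/T \rfloor}$ is always true, and so $\bar{x}(n, T, m = T)$, which corresponds to a minimal spreading of the budget (i.e., uncoded replication), is an optimal symmetric allocation. However, if the budget $T$ is not an integer (i.e., $l > 0$), then the amount of spreading in the optimal symmetric allocation increases with the fractional part of $T$, up to a point at which either $\bar{x}(n, T, m = \lfloor \lfloor n/T \rfloor T \rfloor)$ or $\bar{x}(n, T, m = n)$, which correspond to a maximal spreading of the budget, becomes optimal. Minimal spreading (i.e., uncoded replication) therefore performs well over the whole range of budgets, being optimal among symmetric allocations whenever $T$ is an integer.

In conclusion, we note that the optimal symmetric allocation for the two objectives can be quite different. In particular, when the budget $T$ is an integer, we observe a phase transition from a regime where minimal spreading is optimal to a regime where maximal spreading is optimal, as the deadline $d$ increases, for the maximization of recovery probability $P[D \le d]$; however, with the averaging over both regimes, minimal spreading (i.e., uncoded replication) alone turns out to be optimal for the minimization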
of expected recovery delay $E[D]$.

III. SIMULATION STUDY

We apply our theoretical insights to the design of a simple data dissemination and storage protocol for a mobile delay-tolerant network. Our protocol extends SPRAY-AND-WAIT [4] by allowing nodes to store coded packets that are each $1/w$ the size of the original data object, where parameter $w$ is a positive integer; successful recovery occurs when the data collector accesses at least $w$ such packets. Different symmetric allocations of the given total storage budget can be realized by choosing different values of $w$; the original protocol, which uses uncoded replication, corresponds to $w = 1$.

A. Protocol Description

The source node begins with a total storage budget of $T$ times the size of the original data object, which translates to $wT$ coded packets, each $1/w$ the size of the original data object. Whenever a node with more than one packet contacts another node without any packets, the former gives half its packets to the latter. The actual amount of data stored or transmitted by a node never exceeds the size of the original data object (or $w$ packets) since the excess packets can always be generated on demand (using random linear coding, for example). To reduce the total transmission cost incurred, a node can also directly transmit one packet to each node it meets when it has $w$ or fewer packets left; otherwise, these last few packets would be transmitted multiple times by different nodes. The dissemination process is completed when no node has more than one packet.

B. Network Model and Simulation Setup

We implemented a discrete-time simulation of $n$ wireless mobile nodes in a grid. A random waypoint mobility model is assumed where at each time step, each node moves a random distance $L \sim$ Uniform[5,] towards a selected destination; on arrival, the node selects a random point on the grid as its next destination. Each node has a communication range of 2, and the bandwidth of each point-to-point link is large enough to support the transmission of $w$ packets at each time step. A maximal number of transmissions are randomly scheduled at each time step such that (i) a node can transmit to or receive from only one other
node in range, and (ii) only one node may transmit in the range of a node receiving a transmission. In addition to this baseline scenario, we also considered the following two scenarios: (i) a high-mobility scenario, where the distance traveled by each node is increased to $L \sim$ Uniform[25,5], and (ii) a high-connectivity scenario, where the communication range is increased to 8.

The recovery delay incurred by the data collector is measured for two cases: (i) when the data recovery process begins at time 0, i.e., at the beginning of the data dissemination process, and (ii) when the data recovery process begins at time 2, i.e., when the data dissemination process is already underway or completed. (This is a more appropriate performance metric for longer-term storage.)
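The dissemination rule of Section III-A can be sketched as follows. This is a minimal toy model under stated assumptions, not the simulator used for the results reported here: contacts are assumed to be uniformly random node pairs rather than the outcome of a random waypoint model with a communication range, halving an odd packet count is assumed to round down, and the optimization of directly transmitting single packets is omitted. The function name `disseminate` and its parameters are hypothetical.

```python
import random


def disseminate(n, T, w, rng, max_steps=100_000):
    """Toy binary spraying of w*T coded packets, each of size 1/w.

    Returns the number of packets held by each of the n nodes once no
    node holds more than one packet (or once max_steps is reached).
    """
    if w * T > n:
        raise ValueError("need w*T <= n so that every packet can reach its own node")
    packets = [0] * n
    packets[0] = w * T  # node 0 plays the role of the source
    for _ in range(max_steps):
        if max(packets) <= 1:
            break  # dissemination is complete
        # Assumption: one uniformly random pairwise contact per time step.
        a, b = rng.sample(range(n), 2)
        for src, dst in ((a, b), (b, a)):
            # A node with more than one packet gives half of them
            # to a node that currently holds no packets.
            if packets[src] > 1 and packets[dst] == 0:
                give = packets[src] // 2
                packets[src] -= give
                packets[dst] += give
                break
    return packets


if __name__ == "__main__":
    state = disseminate(n=100, T=10, w=3, rng=random.Random(0))
    print("nonempty nodes:", sum(1 for p in state if p > 0))
```

In this sketch the total number of packets is conserved, so a completed run leaves $wT$ nodes holding one packet of size $1/w$ each, which is how the choice of $w$ selects a symmetric allocation of the budget.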
Fig. 2. Plots of required wait time $d(P_S)$ against desired recovery probability $P_S$ (semilogarithmic scale), for a fixed budget $T$. Each colored line represents a specific choice of parameter $w \in \{1, \ldots, n/T\}$, with $w = 1$ (darkest) corresponding to a minimal spreading of the budget (i.e., uncoded replication), and $w = n/T$ (lightest) corresponding to a maximal spreading of the budget. The mean recovery delay corresponding to each line is indicated by a square marker.

We ran the simulation 5 times for each choice of budget $T \in \{5,,2\}$ and parameter $w \in \{1, 2, \ldots, n/T\}$ under each scenario, with a random pair of nodes appointed as the source and data collector for each run.

C. Simulation Results

We briefly summarize our findings here; detailed simulation results can be found in the extended version of this paper [14]. Fig. 2 shows how the required wait time $d(P_S)$, given by

$$d(P_S) \triangleq \min\{ d : P[D \le d] \ge P_S \},$$

varies with the desired recovery probability $P_S$, for a fixed budget $T$; these plots essentially describe how much time must elapse before a desired percentage of data collectors are able to recover the data object. The phase transition predicted in the analytical model (Section II-A) is clearly discernible in all the plots, except for the high-connectivity scenario with recovery starting at time 0. The mean recovery delay performance is also consistent with our analysis (Section II-B), with minimal spreading of the budget ($w = 1$) being optimal.

We observe that in the high recovery probability regime, maximal spreading of the budget ($w = n/T$) can lead to a significant reduction in the required wait time (by as much as 4% to 6% in the baseline and high-mobility scenarios). We also note that the recovery start time appears to have a limited impact on the delay performance for the baseline and high-mobility scenarios: for recovery starting at time 0, the different allocations yield about the same performance in the low recovery probability regime; this can be explained by the similarity of the different allocations in the early stages of the dissemination process, when only a few nodes have been reached by the source directly or indirectly through relays.

REFERENCES

[1] T. Ho, M.
Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong, "A random linear network coding approach to multicast," IEEE Trans. Inf. Theory, vol. 52, no. 10, pp. 4413-4430, Oct. 2006.
[2] C. Fragouli, J.-Y. Le Boudec, and J. Widmer, "Network coding: An instant primer," ACM SIGCOMM Comput. Commun. Rev., vol. 36, no. 1, pp. 63-68, Jan. 2006.
[3] S. Jain, K. Fall, and R. Patra, "Routing in a delay tolerant network," in Proc. ACM SIGCOMM, Aug. 2004.
[4] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, "Spray and Wait: An efficient routing scheme for intermittently connected mobile networks," in Proc. ACM SIGCOMM Workshop Delay-Tolerant Netw., Aug. 2005.
[5] S. Acedański, S. Deb, M. Médard, and R. Koetter, "How good is random linear coding based distributed networked storage?" in Proc. Workshop Netw. Coding, Theory, and Appl. (NetCod), Apr. 2005.
[6] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Ubiquitous access to distributed data in large-scale sensor networks through decentralized erasure codes," in Proc. Int. Symp. Inf. Process. Sensor Netw. (IPSN), Apr. 2005.
[7] A. Kamra, V. Misra, J. Feldman, and D. Rubenstein, "Growth codes: Maximizing sensor network data persistence," in Proc. ACM SIGCOMM, Sep. 2006.
[8] Y. Lin, B. Liang, and B. Li, "Data persistence in large-scale sensor networks with decentralized fountain codes," in Proc. INFOCOM, May 2007.
[9] S. A. Aly, Z. Kong, and E. Soljanin, "Fountain codes based distributed storage algorithms for large-scale wireless sensor networks," in Proc. ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN), Apr. 2008.
[10] R. Kleinberg, R. Karp, C. Papadimitriou, and E. Friedman, personal correspondence between R. Kleinberg and A. G. Dimakis, Oct. 2006.
[11] D. Leong, A. G. Dimakis, and T. Ho, "Symmetric allocations for distributed storage," in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Dec. 2010.
[12] S. Jain, M. Demmer, R. Patra, and K. Fall, "Using redundancy to cope with failures in a delay tolerant network," in Proc. ACM SIGCOMM, Aug. 2005.
[13] Y. Wang, S. Jain, M. Martonosi, and K. Fall, "Erasure-coding based routing for opportunistic networks," in Proc. ACM SIGCOMM Workshop Delay-Tolerant Netw., Aug. 2005.
[14] D. Leong, A. G. Dimakis, and T. Ho, "Distributed storage allocations for optimal delay." [Online]. Available: http://purlorg/et/2