Online Algorithms for Uploading Deferrable Big Data to The Cloud


Linquan Zhang, Zongpeng Li, Chuan Wu, Minghua Chen
University of Calgary, The University of Hong Kong, The Chinese University of Hong Kong

Abstract: This work studies how to minimize the bandwidth cost for uploading deferrable big data to a cloud computing platform, for processing by a MapReduce framework, assuming the Internet service provider (ISP) adopts the MAX contract pricing scheme. We first analyze the single ISP case and then generalize to the MapReduce framework over a cloud platform. In the former, we design a Heuristic Smoothing algorithm whose worst-case competitive ratio is proved to fall between $2 - 1/(D+1)$ and $2(1 - 1/e)$, where $D$ is the maximum tolerable delay. In the latter, we employ the Heuristic Smoothing algorithm as a building block, and design an efficient distributed randomized online algorithm, achieving a constant expected competitive ratio. The Heuristic Smoothing algorithm is shown to outperform the best known algorithm in the literature through both theoretical analysis and empirical studies. The efficacy of the randomized online algorithm is also verified through simulation studies.

I. INTRODUCTION

Cloud computing is emerging as a new computing paradigm that enables prompt and on-demand access to computing resources. As exemplified in Amazon EC2 [1] and Linode [2], cloud providers invest substantially into their data centre infrastructure, providing a virtually unlimited "sea" of CPU, RAM and bandwidth resources to cloud users, often assisted by virtualization technologies. The elastic and on-demand nature of cloud computing assists cloud users to meet their dynamic and fluctuating demands with minimal management overhead, while the cloud ecosystem as a whole achieves economies of scale through cost amortization.

Typical computing jobs hosted in the cloud include large scale web applications [3] and big data analytics [4]. In such data-intensive applications, a large volume of information (up to terabytes or even petabytes) is periodically transmitted between the user location and the cloud, through the public Internet.
Parallel to utility bill reduction in data centres (computation cost control), bandwidth charge minimization (communication cost control) now represents a major challenge in the cloud computing paradigm [5], [6], [7], where a small fraction of improvement in efficiency translates into millions of dollars in annual savings across the world [8]. Commercial Internet access, particularly the transfer of big data, is nowadays routinely priced by the Internet service providers (ISPs) through a percentile charge model, a dramatic departure from the more intuitive total-volume based charge model as in residential utility billing or the flat-rate charge model as in personal Internet and telephone billing [5], [9], [7], [10].

Specifically, in a θ-th percentile charge scheme, the ISP divides the charge period, e.g., 30 days, into small intervals of equal fixed length, e.g., 5 minutes. Statistical logs summarize traffic volumes witnessed in different time intervals, sorted in ascending order. The traffic volume of the θ-th percentile interval is chosen as the charge volume. For example, under the 95th-percentile charge scheme, the cost is proportional to the traffic volume sent in the 8208th ($95\% \times 30 \times 24 \times 60/5 = 8208$) interval in the sorted list [9], [7], [10]. The MAX contract model is simply the 100th-percentile charge scheme. Such percentile charge models are perhaps less surprising when one considers the fact that infrastructure provisioning cost is more closely related to peak instead of average demand.

Due to both its new algorithmic implications and its economic significance in practice, this interesting percentile charge model has soon spawned a series of studies. Most of these endeavours examine cost saving strategies and opportunities through careful traffic scheduling, multi-homing (subscribing to multiple ISPs), and inter-ISP traffic shifting.

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), and grants from Hong Kong RGC under the contracts HKU 778 and HKU.
© 2014 IEEE
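The percentile charge rule described above can be made concrete with a short sketch. The function name `charge_volume`, the ceiling-based rank, and the synthetic traffic log are illustrative assumptions; an actual ISP may round the percentile rank differently.

```python
def charge_volume(volumes, theta):
    """Traffic volume of the theta-th percentile interval (integer theta in 1..100).

    Assumption: the rank is the ceiling of theta% of the number of intervals.
    """
    ordered = sorted(volumes)                       # ascending order, as in the ISP's log
    rank = (theta * len(ordered) + 99) // 100       # integer ceiling, 1-based rank
    return ordered[rank - 1]

# 30 days of 5-minute intervals
SLOTS_PER_MONTH = 30 * 24 * 60 // 5

# Hypothetical log in which the i-th smallest interval carries volume i.
log = list(range(1, SLOTS_PER_MONTH + 1))
p95 = charge_volume(log, 95)     # 95th-percentile contract
p100 = charge_volume(log, 100)   # MAX contract: the busiest interval
```

With this synthetic log the 95th-percentile contract charges the 8208th entry of the sorted list, while the MAX contract charges the single busiest interval, which is why smoothing the peak matters under MAX pricing.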
However, they model the cost minimization problem with a critical, although sometimes implicit, assumption that all data generated at the user location have to be uploaded to the cloud immediately, without any delay [9], [10]. Consequently, the solution space is restricted to traffic smoothing in the spatial domain only. Real-world big data applications reveal a different picture, in which a reasonable amount of uploading delay (often specified in a service level agreement, or SLA) is tolerable by the cloud user, providing a golden time window for traffic smoothing in the temporal domain, which can substantially slash peak traffic volumes and hence communication cost. An example lies in astronomical data from observatories, which are periodically generated at huge volumes but require no urgent attention. Another well-known example is human genome analyses [4], where data are also "big" but not time-sensitive.

The main challenge of effective temporal domain smoothing lies in the uncertainty in future data arrivals. Therefore a practical cost minimization solution is inherently an online algorithm, making periodical optimization decisions based on
hitherto input. It is, again, surprising to discover that the online cost minimization problem for deferrable upload under percentile charging, even when defined over a single link from one source to one receiver only, is still highly non-trivial, exhibiting a rich combinatorial structure, yet never studied before in the literature of either computer networking or theoretical computer science (with the only exception below) [5].

The only study of the online cost minimization problem under percentile charges that we are aware of is a recent work of Golubchik et al. [5], which focuses exclusively on the single point-to-point link case. The online algorithm they present, referred to as Simple Smoothing here, is extremely simple, and involves evenly smoothing every input across its window of tolerable delay for upload. Nonetheless, this seemingly straightforward algorithm is proven to approach the offline optimum within a small constant under the MAX model.

In this work, we first design our own online algorithm for a single link, also adopting the MAX model, in preparation for the MapReduce data processing case. Based on the insight that Simple Smoothing ignores valuable information including the maximum volume recorded so far and the current amount of backlogged data and their deadlines, we tailor a more sophisticated solution, which incorporates a few heuristic smoothing ideas and is hence referred to as Heuristic Smoothing. We prove that Heuristic Smoothing always guarantees a competitive ratio no worse than that of Simple Smoothing, under any possible data arrival pattern. Theoretical analysis shows that Heuristic Smoothing can achieve a worst-case competitive ratio between $2 - 1/(D+1)$ and $2(1 - 1/e)$, where $D$ is the tolerable delay.

We further extend the single link case to a cloud scenario where multiple ISPs are employed to transfer big data dynamically for processing using a MapReduce-like framework. Data are routed from the cloud user to mappers and then reducers, both residing in potentially different data centres of the cloud [6]. We apply Heuristic Smoothing as a plug-in module for designing a distributed and randomized online algorithm with very low computational complexity.
The competitive ratio guaranteed by the randomized online algorithm increases from that of Heuristic Smoothing by a small constant factor.

Extensive evaluations are conducted to investigate the performance of the proposed online algorithms. The results show that Heuristic Smoothing performs much better than Immediate Transfer (IT), a straightforward algorithm that ignores temporal smoothing. Meanwhile Heuristic Smoothing also achieves smaller competitive ratios than Simple Smoothing does. In most cases tested, the observed competitive ratio of Heuristic Smoothing is smaller than 1.5, better than the theoretical upper bound, and relatively close to the offline optimum. Such superior performance is attributed to less abrupt responses to highly volatile traffic demand. Empirical studies for the cloud scenario further verify the efficacy of the randomized cost reduction algorithm, in terms of both scalability and competitive ratio.

In the rest of this paper, we discuss related work in Sec. II, and introduce the system model in Sec. III. Heuristic Smoothing and the randomized algorithm for the cloud scenario are designed and analyzed in Sec. IV and Sec. V, respectively. Evaluation results are in Sec. VI. Sec. VII concludes the paper.

II. RELATED WORK

Similar to deferring data upload to minimize the peak bandwidth demand, there have been studies on scheduling CPU tasks to minimize the maximum CPU speed, which is closely related to the power consumption. Yao et al. [11] initially provide an optimal offline algorithm, the YDS algorithm, to minimize power consumption by scaling CPU speed under the assumption that the former is a convex function of the latter. Bansal et al. [12] further propose the BKP algorithm with a competitive ratio of $e$, for minimizing the maximum speed when facing arbitrary inputs with different delay requirements, and arbitrary workload patterns. Towards new challenges brought by the proliferation of multi-core processors, Albers et al. [13] design an online algorithm for multi-processor job scheduling without inter-process job migration. Bingham et al. [14] and Angel et al. [15] further propose polynomial-time offline optimal algorithms, with migration of jobs considered.
Greiner et al. [16] generalize a $c$-competitive online algorithm for a single processor into a randomized $cB_\alpha$-competitive online algorithm for multiple processors, where $B_\alpha$ is the $\alpha$-th Bell number. Different from the MAX traffic charge model in this work, they focus on the total volume based energy charges computed by integrating instantaneous power consumption over time.

In recent years, data centre workload scheduling with deadline constraints has been extensively studied in the cloud computing literature. Gupta et al. [17] analyze the energy minimization problem in a data center when available deadline information of the workload may be used to defer job execution for reduced energy consumption. Yao et al. [18] tackle the power reduction problem with deferrable workloads in data centers using the Lyapunov optimization approach, for approximate time averaged optimization.

A few studies exist on the transfer of big data to the cloud. Cho et al. [19] design a static cost-aware planning system for transferring large amounts of data to the cloud provider via both the Internet and courier services. Considering a dynamic transfer scheme where data is produced dynamically, Zhang et al. [6] propose two online algorithms to minimize the total transfer cost. Different from this work, they assume mandatory immediate data upload, and adopt a total volume based charge model instead of the percentile charge model.

Goldenberg et al. [9] study the multi-homing problem under 95-percentile traffic charges. Grothey et al. [10] investigate a similar problem through a stochastic dynamic programming approach. They both leverage ISP subscription switching for traffic engineering, so that the charge volume is minimized. However, data traffic in their studies cannot be deferred. Adler et al. [20] focus on careful routing of data traffic between two types of ISPs (Average contract, Maximum contract) to pursue the optimal online solution, leading to an online optimization problem similar to the classic ski-rental problem. Golubchik
et al. [5] study the minimization of transmission cost by exploiting a small tolerable delay when ISPs adopt a 95-percentile or MAX charge model, focusing on a single link only, and proposing the Simple Smoothing algorithm.

III. SYSTEM MODEL

We consider a cloud user who generates large amounts of data dynamically over time, required for transfer into a cloud or a federation of clouds for processing using a MapReduce-like framework. The mappers and reducers may reside in geographically dispersed data centres. The big data in question can tolerate bounded upload delays specified in their SLA.

A. The MapReduce Framework

MapReduce, initially unveiled by Dean and Ghemawat [21], is a programming model targeting at efficiently processing large datasets in parallel. A typical MapReduce application includes two functions, map and reduce, both written by the users. Map processes input key/value pairs, and produces a set of intermediate key/value pairs. The MapReduce library combines all intermediate values associated with the same intermediate key $I$ and then passes them to the reduce function. Reduce then merges these values associated with the intermediate key $I$ to produce smaller sets of values.

There are four stages in the MapReduce framework: pushing, mapping, shuffling, and reducing. The user transfers workloads to the mappers during the pushing stage. The mappers process them during the mapping stage, and deliver the processed data to the reducers during the shuffling stage. Finally the reducers produce the results in the reducing stage. In a distributed system, mapping and reducing stages can happen at different locations. The system will deliver all intermediate data from mappers to reducers during the shuffling stage, and the cloud providers may charge for inter-datacentre traffic during the shuffling stage.

Recent studies [22], [23] suggest that the relation between intermediate data size and original data size depends closely on the specific application. For applications such as n-gram models, intermediate data size is much bigger, and the bandwidth cost charged by the cloud provider cannot be neglected. We use $\beta$ to denote the ratio of intermediate data size to original data size, consistent with its use in Eqn. (5) and Tab. I.

B. Cost Minimization for MapReduce Applications

We model a cloud user producing a large volume of data every hour, as exemplified by astronomical observatories [6]. As shown in Fig. 1, the data location is multi-homed with multiple ISPs, for communicating with data centers. Through the infrastructure provided by ISP $m$, data can be uploaded to a corresponding data centre DC $m$. Each ISP has its own traffic charge model and pricing function. After arrival at the data centers, the uploaded data will be processed using a MapReduce-like framework. Intermediate data need to be transferred among data centers in the shuffling stage. Towards a general model, we again assume that multiple ISPs are employed by the cloud to communicate among its distributed data centers, e.g., ISP A for communicating between one pair of data centres and ISP B for communicating between another pair. If two inter-DC connections are covered by the same ISP, it can be equivalently viewed as two ISPs with identical traffic charge models.

[Fig. 1. An illustration of the network for deferrable data upload under the MapReduce framework: data sources at the user location, mappers at DC 1, DC 2, DC 3, and reducers at DC 1', DC 2', DC 3'.]

The system runs in a time-slotted fashion. Each time slot is 5 minutes. The charge period is a month (30 days). $M$ and $R$ denote the set of mappers and the set of reducers, respectively. Since each mapper is associated with a unique ISP in the first stage, we employ $m \in M$ to represent the ISP used to connect the user to mapper $m$. All mappers use the same hash function to map the intermediate keys to reducers [23]. The upload delay is defined as the duration between when data are generated and when they are transmitted to the mappers. We focus on uniform delays, i.e., all jobs have the same maximum tolerable delay $D$, which is reasonable assuming data generated at the same user location are of similar nature and importance. We use $W_t$ to represent each workload released at the user location in time slot $t$.
Let $x^m_{d,t}$ be a decision variable indicating the portion of $W_t$ assigned to mapper $m$ at time slot $t+d$. The cost of ISP $m$ is indicated by $f_m(V_m)$, where $V_m$ is the maximum traffic that goes through ISP $m$ in any time slot. To ensure all workload is uploaded into the cloud, we have:

$$0 \le x^m_{d,t} \le 1, \quad \forall m \in M, \tag{1}$$

$$\sum_{m \in M} \sum_{d=0}^{D} x^m_{d,t} = 1, \quad \forall t. \tag{2}$$

Given the maximum tolerable uploading delay $D$, the traffic $V_m^t$ between the user and mapper $m$ is:

$$V_m^t = \sum_{d=0}^{D} W_{t-d}\, x^m_{d,t-d}, \quad \forall m \in M. \tag{3}$$

Let $V_m$ be the maximum traffic volume of ISP $m$, which will be used in the calculation of bandwidth cost. $V_m$ satisfies:

$$V_m \ge V_m^t, \quad \forall t. \tag{4}$$

We assume that ISPs in the first stage, connecting the user to mappers, employ the same charging function $f_m$; and ISPs in the second stage, from mappers to reducers, use the same charging function $f_{m,r}$. Both charging functions $f_m$ and $f_{m,r}$ are non-decreasing and convex. We further assume that the first stage is non-splittable, i.e., each workload is uploaded
through one ISP only.

The user decides to deliver the workload to mapper $m$ in time slot $t$. Assume it takes a unit time to transmit data via ISPs. Let $M_m^{t+1}$ denote the total data size at mapper $m$ in time slot $t+1$. $M_m^{t+1}$ can be calculated as the summation of all transmitted workloads at time slot $t$:

$$M_m^{t+1} = \sum_{d=0}^{D} W_{t-d}\, x^m_{d,t-d}, \quad \forall m \in M.$$

Assume the mappers take 1 time slot to process a received workload. Therefore the mappers will transfer data to the reducers in time slot $t+2$. Let $V_{m,r}^{t+2}$ be the traffic from mapper $m$ to reducer $r$ in time slot $t+2$:

$$V_{m,r}^{t+2} = \beta M_m^{t+1} y_{m,r}^{t+2}, \quad \forall m \in M, r \in R. \tag{5}$$

The maximum traffic volume of the ISP $(m, r)$, $V_{m,r}$, satisfies:

$$V_{m,r} \ge V_{m,r}^{t+2}, \quad \forall t. \tag{6}$$

Notice that the MapReduce framework partitions the output pairs (key/value) of mappers to reducers using hash functions. All values for the same key are always reduced at the same reducer no matter which mapper they come from. Furthermore, we assume that data generated in the data locations are uniformly mixed, therefore we have:

$$y_{m,r}^{t+2} = z_r, \quad \forall m \in M, r \in R. \tag{7}$$

This equation also implies that the superscript of $y_{m,r}^{t+2}$ can be ignored.

Now we can formulate the overall traffic cost minimization problem for the cloud user, under the MAX contract charge model:

$$\text{minimize} \quad \sum_{m} f_m(V_m) + \sum_{m,r} f_{m,r}(V_{m,r}) \tag{8}$$

subject to:

$$V_m \ge V_m^t, \quad \forall t, m, \tag{8a}$$
$$V_{m,r} \ge V_{m,r}^t, \quad \forall t, m, r, \tag{8b}$$
$$\sum_{d=0}^{D} x^m_{d,t} = n_m, \quad \forall t, m, \tag{8c}$$
$$\sum_{m} n_m = 1, \tag{8d}$$
$$0 \le x^m_{d,t} \le 1, \quad n_m \in \{0, 1\}, \tag{8e}$$

where $V_m^t$ and $V_{m,r}^t$ are defined in Eqn. (3) and Eqn. (5), respectively. $n_m$ is a binary variable indicating whether ISP $m$ is employed or not. For ease of reference, notations are summarized in Tab. I.

TABLE I. NOTATION

Symbol         Definition
$D$            the maximum delay from the time data is generated to the time the data location begins to transmit it to the mappers.
$M$            the set of mappers.
$R$            the set of reducers. Some mapper and reducer may be in the same location, i.e., $M \cap R \neq \emptyset$.
$W_t$          the workload released in user location at time slot $t$.
$x^m_{d,t}$    the portion of the workload $W_t$ that is assigned to mapper $m$ at time slot $t+d$.
$\beta$        the ratio of the size of output of a mapper to the size of its input.
$y_{m,r}^t$    the portion of the output of mapper $m$ that is transmitted to reducer $r$ at time slot $t$.
$z_r$          the portion of the key space mapped into reducer $r$.
$V_m^t$        the total traffic that goes through ISP $m$ at time slot $t$.
$f_m(y)$       the cost of ISP $m$ for the input $y$.

IV. THE SINGLE ISP CASE

We first investigate the basic case that includes one mapper and one reducer only, co-located in the same data center, with no bandwidth cost between the pair. Given a MAX charge model at the ISP, the algorithm tries to exploit the allowable delay by scheduling the traffic to the best time slot within the allowed time window, for reducing the charge volume. This can be illustrated through a toy example: in $t = 1$, a job (100 MB, max delay = 9 time slots) is released; in the following time slots, no jobs are released. If the algorithm smooths the traffic across the 10 time slots, the charge volume can be reduced to 10 MB/5min, from 100 MB/5min if immediate transmission is adopted.

A. The Primal & Dual Cost Minimization LPs

We can drop the location index $(m, r)$ in this basic scenario of one mapper and one reducer located in the same data centre. Note that the charging function $f_m$ is a non-decreasing function of the maximum traffic volume. Minimizing the maximum traffic volume therefore implies minimizing the bandwidth cost. Consequently, the cost minimization problem in our basic single ISP scenario can be formulated into the following (primal) linear program (LP):

$$\text{minimize} \quad V \tag{9}$$

subject to:

$$\sum_{d=0}^{\min\{D,\, t-1\}} W_{t-d}\, x_{d,t-d} \le V, \quad \forall t \in \mathcal{T}, \tag{9a}$$
$$\sum_{d=0}^{D} x_{d,t} = 1, \quad \forall t \in \mathcal{T}_D, \tag{9b}$$
$$x_{d,t} \ge 0, \; V \ge 0, \quad \forall d \in \mathcal{D}, t \in \mathcal{T}_D, \tag{9c}$$

where $\mathcal{T} = [1, T]$, $\mathcal{T}_D = [1, T-D]$, $\mathcal{D} = [0, D]$, and $x_{d,t} = 0$, $\forall t > T-D$, $d \in \mathcal{D}$.

Introducing dual variables $y$ and $z$ to constraints (9a) and (9b) respectively, we formulate the corresponding dual LP:

$$\text{maximize} \quad \sum_{t=1}^{T-D} z_t \tag{10}$$

subject to:

$$\sum_{t=1}^{T} y_t \le 1, \tag{10a}$$
$$z_t \le W_t\, y_{t+d}, \quad \forall t \in \mathcal{T}_D, d \in \mathcal{D}, \tag{10b}$$
$$y_t \ge 0, \quad \forall t \in \mathcal{T}, \tag{10c}$$
$$z_t \text{ unconstrained}, \quad \forall t \in \mathcal{T}_D. \tag{10d}$$
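The primal and dual LPs above can be exercised numerically without an LP solver. The sketch below only evaluates a given schedule against (9) and a given dual assignment against (10), so that weak duality can be observed on a small instance. The helper names `primal_value` and `dual_value`, the 0-based slot indices, and the sample instance are assumptions made for illustration.

```python
from fractions import Fraction

def primal_value(W, x, D):
    """Peak slot traffic V of a schedule; x[t][d] is the share of W[t] sent in slot t + d.

    W has length T and its last D entries are the zero padding.
    """
    T = len(W)
    load = [Fraction(0)] * T
    for t in range(T - D):
        # constraints (9b) and (9c): shares are non-negative and sum to one
        assert sum(x[t]) == 1 and all(s >= 0 for s in x[t])
        for d in range(D + 1):
            load[t + d] += W[t] * x[t][d]
    return max(load)   # the smallest V satisfying (9a)

def dual_value(W, y, z, D):
    """Objective of the dual LP after checking feasibility of (y, z)."""
    T = len(W)
    assert sum(y) <= 1 and all(v >= 0 for v in y)                     # (10a), (10c)
    for t in range(T - D):
        assert all(z[t] <= W[t] * y[t + d] for d in range(D + 1))     # (10b)
    return sum(z)

# Hypothetical instance: one burst of 6 units, D = 2, two padded slots.
D = 2
W = [6, 0, 0, 0, 0]
x_immediate = [[1, 0, 0]] * 3                 # send everything on arrival
x_even = [[Fraction(1, 3)] * 3] * 3           # spread over D + 1 slots
y = [Fraction(1, 3)] * 3 + [0, 0]
z = [2, 0, 0]

v_immediate = primal_value(W, x_immediate, D)
v_even = primal_value(W, x_even, D)
lower_bound = dual_value(W, y, z, D)
```

Any feasible dual value is a lower bound on any feasible primal value. Here the even schedule meets the dual bound, so it is optimal for this instance, while immediate transfer is a factor of $D + 1$ worse.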
The input begins with $W_1$ and ends with $W_{T-D}$, and $W_{T-D+1} = 0, \ldots, W_T = 0$ is padded to the tail of the input. We use $\mathsf{P}$ and $\mathsf{D}$ to denote the objective values of feasible solutions to the primal and dual LPs, respectively.

The optimization in (9) is a standard linear program. For an offline optimal solution, one can simply solve (9) using a standard LP solution algorithm such as the simplex method or the interior-point method.

B. Online algorithms

The simplest online solution in the basic one ISP scenario is the immediate transfer algorithm (IT). Once a new job arrives, IT transfers it to mappers immediately without any delay. Next we analyze the competitive ratio of IT, as compared to the offline optimum.

Theorem 1. IT is $(D+1)$-competitive.

Proof: Consider the following input: $(W, 0, 0, \ldots, 0)$. IT will process it immediately with bandwidth cost $W$. However the offline optimal algorithm will divide the workload into small pieces, $(W/(D+1), W/(D+1), \ldots, W/(D+1), 0, \ldots, 0)$, feasible within the deadline $D$, with maximum traffic volume $W/(D+1)$. Hence

$$\text{Competitive ratio } \lambda \ge \frac{W}{W/(D+1)} = D+1.$$

We hence obtain a lower bound on the competitive ratio of IT, $D+1$. Next we prove $D+1$ is also an upper bound. Without exploiting any delays, IT provides a feasible solution to the primal problem, which is denoted as $\mathsf{P}_{IT}$:

$$\mathsf{P}_{IT} = \max_t W_t.$$

Now we design a feasible solution to the dual problem as follows (assume $\tau = \arg\max_t W_t$):

$$y_t = \begin{cases} 1/(D+1) & \text{if } t = \tau, \ldots, \tau+D \\ 0 & \text{otherwise} \end{cases} \qquad z_t = \begin{cases} W_t/(D+1) & \text{if } t = \tau \\ 0 & \text{otherwise} \end{cases}$$

$$\mathsf{D} = \frac{W_\tau}{D+1}.$$

So the competitive ratio is:

$$\lambda = \frac{\mathsf{P}_{IT}}{OPT} \le \frac{\mathsf{P}_{IT}}{\mathsf{D}} = D+1.$$

Remarks: if $D = 0$, i.e., jobs are not deferrable, the offline optimal algorithm degrades into IT, agreeing with the theorem, which claims IT is 1-competitive ($D + 1 = 1$).

IT is apparently not ideal, and may lead to high peak traffic and high bandwidth cost as compared with the offline optimum. Golubchik et al. [5] design a cost-aware algorithm that strives to spread out bandwidth demand by utilizing all possible delays, referred to as the Simple Smoothing Algorithm. Upon receiving a new workload, Simple Smoothing evenly divides it into $D + 1$ parts, and processes them one by one in the current time slot and the following $D$ time slots, as shown in Algorithm 1.
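IT and Simple Smoothing as described above can be sketched in a few lines. The function names and the sample burst are illustrative assumptions; exact rational arithmetic is used only to keep the comparison free of rounding noise.

```python
from fractions import Fraction

def immediate_transfer(W):
    """IT: each workload is uploaded in the slot in which it is released."""
    return [Fraction(w) for w in W]

def simple_smoothing(W, D):
    """Simple Smoothing: W[t] is split evenly over slots t, t+1, ..., t+D."""
    traffic = [Fraction(0)] * (len(W) + D)
    for t, w in enumerate(W):
        for d in range(D + 1):
            traffic[t + d] += Fraction(w) / (D + 1)
    return traffic

# Single burst followed by silence, the input used in the proof of Theorem 1.
D = 4
W = [10, 0, 0, 0, 0, 0]
peak_it = max(immediate_transfer(W))
peak_smooth = max(simple_smoothing(W, D))
ratio = peak_it / peak_smooth   # equals D + 1 on this input
```

On this input the smoothed schedule coincides with the offline optimum, so the ratio between the two peaks is exactly the $D + 1$ gap of Theorem 1.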
Algorithm 1 The Simple Smoothing Algorithm [5]
    for $\tau = 1$ to $T - D$ do
        for $d = 0$ to $D$ do
            $x_{d,\tau} = 1/(D+1)$
        end for
    end for

Theorem 2. [5] The competitive ratio of Simple Smoothing is $2 - \frac{1}{D+1}$.

Theorem 2 can be proven through weak LP duality, i.e., using a feasible dual as the lower bound of the offline optimal.

Simple Smoothing is very simple, but guarantees a worst case competitive ratio smaller than 2. Nonetheless, there is still room for further improvements, since Simple Smoothing ignores available information such as the hitherto maximum traffic volume transmitted, and the current pressure from backlogged traffic and their deadlines. Such an observation motivated our design of the more sophisticated Heuristic Smoothing algorithm for the case $D \ge 1$, as shown in Algorithm 2. Here $T$ is the charge period, $\tau$ is the current time slot, and $H_d$ is the total volume of data that have been buffered for $d$ time slots.

Algorithm 2 The Heuristic Smoothing Algorithm
    $V_{max} = 0$
    $W_\tau = 0$, $\forall \tau = T-D+1, \ldots, T$
    $H_d = 0$, $\forall d = 1, \ldots, D$
    for $\tau = 1$ to $T$ do
        $V_\tau = \min\left\{ W_\tau + \sum_{d=1}^{D} H_d,\; \max\left\{ V_{max},\; \frac{W_\tau}{D+1} + \frac{\sum_{d=1}^{D} H_d}{D} \right\} \right\}$
        if $V_{max} < V_\tau$ then
            $V_{max} = V_\tau$
        end if
        Transfer the traffic following the Earliest Deadline First (EDF) strategy
        Update $H_d$, $\forall d = 1, \ldots, D$
    end for

Theorem 3. The competitive ratio of Heuristic Smoothing is lower bounded by $2(1 - 1/e)$.

Proof: Consider the following input: $(W, W, \ldots, W, 0, \ldots, 0)$ whose first $D + 1$ time slots are $W$. The traffic demand $V_\tau$ increases until time slot $D + 1$:

$$V_{D+1} = \frac{W}{D+1} + \frac{W}{D+1} + \frac{(D-1)W}{(D+1)D} + \cdots + \frac{(D-1)^{D-1}W}{(D+1)D^{D-1}} = \frac{W}{D+1}\left(1 + D\left(1 - \left(1 - \frac{1}{D}\right)^{D}\right)\right).$$

We can find a feasible primal solution which yields the charge volume $\frac{D+1}{2D+1}W$. This primal solution is an upper bound of the offline optimum. Therefore the lower bound of the competitive ratio is

$$\lambda \ge \frac{V_{D+1}}{\frac{D+1}{2D+1}W} = \frac{2D+1}{(D+1)^2}\left(1 + D\left(1 - \left(1 - \frac{1}{D}\right)^{D}\right)\right) \to 2\left(1 - \frac{1}{e}\right)$$

as $D \to +\infty$. Notice that
$\frac{2D+1}{(D+1)^2}\left(1 + D\left(1 - \left(1 - \frac{1}{D}\right)^{D}\right)\right)$ is a decreasing function for $D \in [1, +\infty)$; we further have $\lambda \ge 2(1 - 1/e)$.

Theorem 4. The competitive ratio of Heuristic Smoothing is upper-bounded by $2 - \frac{1}{D+1}$.

Proof: We take the Simple Smoothing algorithm (Algorithm 1) as a benchmark, and we prove that $\mathsf{P}_{smooth} \ge \mathsf{P}_{heuristic}$, where $\mathsf{P}_{heuristic}$ is the charged volume produced by Algorithm 2 (Heuristic Smoothing).

Algorithm 2 will only increase the traffic demand when $\frac{W_\tau}{D+1} + \sum_{d=1}^{D} H_d / D$ exceeds $V_{max}$. Therefore, we rearrange $H_d$ to compute the maximum traffic demand. Let

$$V_{t+D} = \frac{W_{t+D}}{D+1} + \frac{W_{t+D-1}}{D+1} + \frac{(D-1)W_{t+D-2}}{(D+1)D} + \cdots + \frac{(D-1)^{D-1}W_t}{(D+1)D^{D-1}}.$$

Then $\mathsf{P}_{heuristic} = \max_t V_{t+D}$. Let $\tau = \arg\max_t V_{t+D}$, and we have

$$\mathsf{P}_{smooth} = \max_t \sum_{i=t}^{t+D} \frac{W_i}{D+1} \ge \sum_{i=\tau}^{\tau+D} \frac{W_i}{D+1} \ge \frac{W_{\tau+D}}{D+1} + \frac{W_{\tau+D-1}}{D+1} + \frac{(D-1)W_{\tau+D-2}}{(D+1)D} + \cdots + \frac{(D-1)^{D-1}W_\tau}{(D+1)D^{D-1}} = \mathsf{P}_{heuristic}.$$

Since the Simple Smoothing algorithm is $\left(2 - \frac{1}{D+1}\right)$-competitive, the competitive ratio of Algorithm 2 cannot be worse than $2 - \frac{1}{D+1}$.

From the proof above, we have the following corollary.

Corollary 1. For any given input, the charge volume resulting from Heuristic Smoothing is always equal to or smaller than that of Simple Smoothing.

Algorithm Complexity. All three online algorithms discussed have moderate time complexity, making them light-weight for practical applications. More specifically, IT, Simple Smoothing and Heuristic Smoothing have a time complexity of $O(T - D)$, $O((T - D)D)$, and $O(TD)$, respectively.

V. CLOUD SCENARIO

In this section, we apply the algorithms designed for the single ISP case to the cloud scenario, which utilizes a MapReduce-like framework for processing big data. Define $Cost_1 = \sum_m f_m(V_m)$, $Cost_2 = \sum_{m,r} f_{m,r}(V_{m,r})$, and adopt power charge functions by letting $f_m(x) = f_{m,r}(x) = x^\alpha$, $\alpha > 1$.

A. Algorithm Design

The two-phase MapReduce cost optimization problem is defined in (8), and is a discrete optimization with integer variables. Consequently, an offline solution that solves such an integer program has a high computational complexity, further motivating the design of an efficient online solution.

A naive online algorithm selects a fixed mapper and schedules the traffic on the corresponding ISP using the Simple Smoothing Algorithm.

Theorem 5. The competitive ratio of the naive online algorithm is lower bounded by $|M|^{\alpha-1}$, where $|M|$ is the number of mappers.

Proof: Consider the input $(W, \ldots, W, 0, \ldots, 0)$ whose first $D + 1$ items are $W$. Measuring traffic in units of one smoothed piece $W/(D+1)$, we can verify that the charge volume is $D+1$. The corresponding cost is $(D+1)^\alpha + \sum_r (\beta z_r (D+1))^\alpha$.

Next we consider a more intelligent algorithm that assigns the $j$-th workload to the mapper $(j \bmod |M|)$. This algorithm acts as the upper bound of the offline optimum. Its charge volume is $\frac{D+1}{|M|}$. The corresponding cost is $|M| \left(\frac{D+1}{|M|}\right)^\alpha + |M| \sum_r \left(\beta z_r \frac{D+1}{|M|}\right)^\alpha$. Therefore,

$$\text{Competitive ratio} \ge \frac{(D+1)^\alpha + \sum_r (\beta z_r (D+1))^\alpha}{|M| \left(\frac{D+1}{|M|}\right)^\alpha + |M| \sum_r \left(\beta z_r \frac{D+1}{|M|}\right)^\alpha} = |M|^{\alpha-1}.$$

We next present a distributed randomized online algorithm for (8). For each workload, the user chooses ISPs uniformly at random to transfer the data to a randomly selected mapper. Formally, let $\mathcal{W}$ be the randomized workload assignment allocating each workload to mappers. For each selected ISP, the user runs Heuristic Smoothing to guide one-stage traffic deferral and transmission, as shown in Algorithm 3.

Algorithm 3 Randomized Uploading Scheme
    Generate a randomized workload assignment $\mathcal{W}$ which allocates each workload to a randomly selected mapper.
    For each ISP, apply the single ISP algorithm, e.g., Algorithm 2, to schedule the traffic.

We analyze Algorithm 3 by building a connection between the uploading scheme $\pi$ and the randomized workload assignment $\mathcal{W}$. We combine $\pi$ and $\mathcal{W}$ to a new uploading scheme $\pi_{\mathcal{W}}$. Let $t_0 < t_1 < \cdots < t_e = T$. During each interval $[t_i, t_{i+1})$, each ISP is employed to transfer at most one workload in the uploading scheme $\pi$. If a workload is processed in $[t_i, t_{i+1})$, then it cannot be finished before $t_{i+1}$. Due to the MAX charge model, the transfer speed for workload $w$ in $[t_i, t_{i+1})$ is a single speed, say $v_{i,w}$. If workload $w$ is not processed in $[t_i, t_{i+1})$, we set $v_{i,w} = 0$. Therefore, for any given $i$, there are at most $|M|$ non-zero values of $v_{i,w}$. Assume there are $n$ workloads, forming a set $\mathbb{W}$. Let $\Omega_m = \{w \mid \text{all workloads assigned to ISP } m\} \subseteq \mathbb{W}$. In scheme $\pi_{\mathcal{W}}$, the user transfers data at speed $\sum_{w \in \Omega_m} v_{i,w}$ in time interval $[t_i, t_{i+1})$. Let $\phi_n(\Omega_m)$ be the probability that exactly the workloads $\Omega_m$ are allocated to ISP $m$.
$$\phi_n(\Omega_m) = \left(\frac{1}{|M|}\right)^{|\Omega_m|} \left(1 - \frac{1}{|M|}\right)^{n - |\Omega_m|}$$
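As a quick sanity check on this probability, the sketch below evaluates $\phi_n$ for a subset of a given size and confirms that the probabilities over all subsets add up to one. The function name `phi` and the sample values of $n$ and $|M|$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def phi(n, k, num_mappers):
    """Probability that one fixed set of k out of n workloads, and no other
    workload, is assigned to a given ISP under uniform random assignment."""
    p = Fraction(1, num_mappers)
    return p**k * (1 - p)**(n - k)

# Hypothetical sizes: 6 workloads, 3 mappers.
n, num_mappers = 6, 3
# There are comb(n, k) distinct subsets of size k, each with probability phi(n, k, .).
total = sum(comb(n, k) * phi(n, k, num_mappers) for k in range(n + 1))
```

The sum over all subsets is the binomial expansion of $(1/|M| + (1 - 1/|M|))^n$, which is why it equals one regardless of $n$ and $|M|$.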
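We next define the function $\Lambda_n(x)$, where $x \in \mathbb{R}^n \setminus \{0\}$:

$$\Lambda_n(x) = |M| \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \left(\sum_{w \in \Omega} x_w\right)^{\alpha} \Big/ \sum_{w=1}^{n} x_w^{\alpha}.$$

Lemma 1. Given any uploading scheme $\pi$ and a randomized workload assignment $\mathcal{W}$, we have a randomized uploading scheme $\pi_{\mathcal{W}}$, which satisfies:

$$E(Cost_1(\pi_{\mathcal{W}}) + Cost_2(\pi_{\mathcal{W}})) \le \max_x \Lambda_{|M|}(x)\, (Cost_1(\pi) + Cost_2(\pi)).$$

Proof: Since the traffic pattern in ISP $(m, r)$, $\forall r$, is exactly the same as in ISP $m$, we only consider one stage. Let us consider scheme $\pi$ first. In the first stage, the cost is:

$$Cost_1(\pi) = \sum_{m \in M} \max_{i,w} (v^m_{i,w})^\alpha \ge \max_i \Sigma_{|M|}(v^\alpha_{i,w}),$$

where $v^m_{i,w}$ indicates the transfer speed in ISP $m$ during $[t_i, t_{i+1})$ for workload $w$, and $\Sigma_{|M|}(v^\alpha_{i,w})$ is the sum of the largest $|M|$ values of $v^\alpha_{i,w}$ when given $i$. The inequality holds because there are at most $|M|$ non-zero speeds for any given duration $[t_i, t_{i+1})$. We next have the cost of the second stage:

$$Cost_2(\pi) = \sum_{m,r} \max_{i,w} (\beta z_r v^m_{i,w})^\alpha = \beta^\alpha \sum_r z_r^\alpha \sum_m \max_{i,w} (v^m_{i,w})^\alpha \ge \beta^\alpha \sum_r z_r^\alpha \max_i \Sigma_{|M|}(v^\alpha_{i,w}).$$

The cost of the first stage in $\pi_{\mathcal{W}}$ is:

$$E(Cost_1(\pi_{\mathcal{W}})) = \sum_{m \in M} \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \max_i \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha = |M| \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \max_i \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha.$$

The second equality above holds because the assignment is uniformly random. Similarly, the cost of the second stage in $\pi_{\mathcal{W}}$ is:

$$E(Cost_2(\pi_{\mathcal{W}})) = \sum_{m \in M} \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \sum_r \max_i \left(z_r \sum_{w \in \Omega} \beta v_{i,w}\right)^\alpha = |M| \beta^\alpha \sum_r z_r^\alpha \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \max_i \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha.$$

Again, because for any $[t_i, t_{i+1})$ there are at most $|M|$ non-zero values of $v_{i,w}$, we have

$$\frac{|M| \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha}{\Sigma_{|M|}(v^\alpha_{i,w})} = \frac{|M| \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha}{\sum_{w=1}^{n} v^\alpha_{i,w}} = \Lambda_n(v) = \Lambda_{|M|}(v'),$$

where $v'$ is an $|M|$-dimensional subvector of $v \in \mathbb{R}^n \setminus \{0\}$, which contains all non-zero transfer speeds in $[t_i, t_{i+1})$. Therefore, the ratio for the first stage is:

$$\frac{E(Cost_1(\pi_{\mathcal{W}}))}{Cost_1(\pi)} \le \frac{|M| \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \max_i \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha}{\max_i \Sigma_{|M|}(v^\alpha_{i,w})} \le \frac{|M| \sum_{\Omega \subseteq \mathbb{W}} \phi_n(\Omega) \left(\sum_{w \in \Omega} v_{i^*,w}\right)^\alpha}{\Sigma_{|M|}(v^\alpha_{i^*,w})} \le \max_x \Lambda_{|M|}(x),$$

where $i^* = \arg\max_i \left(\sum_{w \in \Omega} v_{i,w}\right)^\alpha$. Similarly, the ratio for the second stage is also bounded by $\max_x \Lambda_{|M|}(x)$, i.e., $\frac{E(Cost_2(\pi_{\mathcal{W}}))}{Cost_2(\pi)} \le \max_x \Lambda_{|M|}(x)$. This proves Lemma 1.

Let $S(\alpha, j)$ be the $j$-th Stirling number for $\alpha$ elements, defined as the number of partitions of a set of size $\alpha$ into $j$ subsets [24].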
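Stirling numbers of the second kind, as just defined, obey a standard recurrence, and summing them over $j$ yields the Bell numbers mentioned in Sec. II. The sketch below uses that recurrence; the function names `stirling2` and `bell` are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition a set of n elements into k non-empty subsets."""
    if n == k:
        return 1          # includes the empty partition of the empty set
    if k == 0 or k > n:
        return 0
    # The n-th element either joins one of the k existing subsets
    # or forms a new subset on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n):
    """Bell number: total number of partitions of a set of n elements."""
    return sum(stirling2(n, j) for j in range(1, n + 1))

first_bell_numbers = [bell(a) for a in (1, 2, 3, 4)]
```

For small exponents these values stay small, which is what keeps the constant in the randomized algorithm's competitive ratio modest.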
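Let $B_\alpha$ be the $\alpha$-th Bell number, defined as the number of partitions of a set of size $\alpha$ [24]. The Bell number is relatively small when $\alpha$ is small: $B_1 = 1$, $B_2 = 2$, $B_3 = 5$, $B_4 = 15$. The definitions also imply:

$$\sum_{j=1}^{\alpha} S(\alpha, j) = B_\alpha.$$

The following lemma is proven by Greiner et al. [16].

Lemma 2. [16] $\forall \alpha \in \mathbb{N}$ and $\alpha \le |M|$,

$$\max_x \Lambda_{|M|}(x) = \sum_{j=1}^{\alpha} S(\alpha, j) \frac{|M|!}{|M|^j (|M| - j)!}.$$

Theorem 6. Given a $\lambda$-competitive algorithm with respect to cost for the single ISP case, the randomized online algorithm is $\lambda B_{\lceil \alpha \rceil}$-competitive in expectation.

Proof: Let $\pi^*$ be the optimal uploading scheme; the corresponding randomized uploading scheme is $\pi^*_{\mathcal{W}}$. The algorithm we use is $\pi_{\mathcal{W}}$. Since the workloads in $\pi^*_{\mathcal{W}}$ and $\pi_{\mathcal{W}}$ are the same, we have:

$$E(Cost_1(\pi_{\mathcal{W}})) \le \lambda E(Cost_1(\pi^*_{\mathcal{W}})) \tag{11}$$

since the algorithm is $\lambda$-competitive. Similarly,

$$E(Cost_2(\pi_{\mathcal{W}})) \le \lambda E(Cost_2(\pi^*_{\mathcal{W}})) \tag{12}$$

since the traffic pattern in ISP $(m, r)$, $\forall r$, is exactly the same as in ISP $m$. Lemma 1 implies:

$$E(Cost_1(\pi^*_{\mathcal{W}}) + Cost_2(\pi^*_{\mathcal{W}})) \le \max_x \Lambda_{|M|}(x)\, (Cost_1(\pi^*) + Cost_2(\pi^*)). \tag{13}$$

Since $\Lambda_{|M|}(x)$ is a monotonically increasing function of $\alpha$, we use $\lceil \alpha \rceil$ as an upper bound of $\alpha > 1$, obtaining a corresponding upper bound of $\Lambda_{|M|}(x)$. Combining Eqn. (11), (12) and (13) as well as Lemma 2, we have the following expected cost of the randomized online algorithm: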
$$\begin{aligned} E(Cost_1(\pi_{\mathcal{W}}) + Cost_2(\pi_{\mathcal{W}})) &\le \lambda E(Cost_1(\pi^*_{\mathcal{W}}) + Cost_2(\pi^*_{\mathcal{W}})) \\ &\le \lambda \max_x \Lambda_{|M|}(x)\, (Cost_1(\pi^*) + Cost_2(\pi^*)) \\ &\le \lambda \sum_{j=1}^{\lceil \alpha \rceil} S(\lceil \alpha \rceil, j) \frac{|M|!}{|M|^j (|M| - j)!} (Cost_1(\pi^*) + Cost_2(\pi^*)) \\ &\le \lambda \sum_{j=1}^{\lceil \alpha \rceil} S(\lceil \alpha \rceil, j)\, (Cost_1(\pi^*) + Cost_2(\pi^*)) \\ &= \lambda B_{\lceil \alpha \rceil}\, OPT. \end{aligned}$$

Remark: For a single link, we can employ Heuristic Smoothing, whose competitive ratio is smaller than 2 with respect to maximum traffic volume. Then the competitive ratio of Algorithm 2 is $2^\alpha$ in cost. Thus Algorithm 3 is $2^\alpha B_{\lceil \alpha \rceil}$-competitive in expectation. When $\alpha = 2$, the competitive ratio is 8, a constant regardless of the number of mappers.

VI. PERFORMANCE EVALUATION

We have implemented Simple Smoothing, Heuristic Smoothing, as well as the randomized online algorithm, for performance evaluation through simulation studies. The default input $W_t$ is generated uniformly at random, as shown in Fig. 2, where all data are normalized, i.e., scaled down by $\max_t W_t$. We assume there are 5 mappers at different locations, and 5 reducers at different locations. We choose $\alpha = 2$, thus the charge function is $f_m(x) = f_{m,r}(x) = x^2$.

A. The Single ISP Case

First we compare Heuristic Smoothing with Simple Smoothing. The two algorithms are executed under a delay requirement $D = 5$. Fig. 3 illustrates the traffic volume scheduled at each time slot. Compared with Simple Smoothing, Heuristic Smoothing results in a maximum traffic volume that is about 8% smaller. Heuristic Smoothing tries to exploit the available delay to average the traffic and is less sensitive to the fluctuation of traffic demand, as compared with Simple Smoothing. For example, the traffic of Simple Smoothing increases abruptly where the input has high traffic demand and goes down where the demand is low. In comparison, Heuristic Smoothing results in more even traffic distributions around the same time slots.

Next we examine how the tolerable delay affects the performance of the proposed online algorithms. We execute Simple Smoothing, Heuristic Smoothing and IT against a variety of tolerable delays $D$. We also compute the offline optimum as a benchmark. The observed competitive ratios are shown in Fig. 4. The results suggest that both Simple Smoothing and Heuristic Smoothing perform much better than IT.
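A minimal sketch of the single ISP comparison described above is given here, under stated assumptions. `heuristic_smoothing` follows Algorithm 2 with a FIFO queue, which coincides with EDF because all workloads share the same delay $D$. `offline_optimum` does not solve LP (9); it substitutes a closed-form interval-density bound (workloads released in slots $a..b$ must all be sent within slots $a..b+D$), which we assume matches the LP optimum for uniform delays. All names and the sample input are illustrative.

```python
from collections import deque
from fractions import Fraction

def heuristic_smoothing(W, D):
    """Per-slot traffic produced by Heuristic Smoothing (requires D >= 1)."""
    queue = deque()                       # [release slot, remaining volume], oldest first
    v_max, traffic = Fraction(0), []
    for t, w in enumerate([Fraction(w) for w in W] + [Fraction(0)] * D):
        backlog = sum((rem for _, rem in queue), Fraction(0))   # sum of H_d
        v = min(w + backlog, max(v_max, w / (D + 1) + backlog / D))
        v_max = max(v_max, v)
        queue.append([t, w])
        budget = v
        # Serve the oldest data first; also discard entries that are already empty.
        while queue and (budget > 0 or queue[0][1] == 0):
            sent = min(queue[0][1], budget)
            queue[0][1] -= sent
            budget -= sent
            if queue[0][1] == 0:
                queue.popleft()
        if queue and queue[0][0] + D <= t:
            raise RuntimeError("a workload missed its deadline")
        traffic.append(v)
    return traffic

def offline_optimum(W, D):
    """Assumed offline benchmark: maximum interval density of the input."""
    best = Fraction(0)
    for a in range(len(W)):
        total = Fraction(0)
        for b in range(a, len(W)):
            total += W[b]
            best = max(best, total / (b - a + D + 1))
    return best

# Worst-case style input from the proof of Theorem 3: D + 1 equal workloads.
D = 2
W = [3, 3, 3]
schedule = heuristic_smoothing(W, D)
ratio = max(schedule) / offline_optimum(W, D)
```

On this input the observed ratio agrees with the expression $\frac{2D+1}{(D+1)^2}(1 + D(1 - (1 - 1/D)^D))$ from the proof of Theorem 3 evaluated at $D = 2$. Replacing `W` with randomly drawn workloads gives a small-scale version of the experiment reported here.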
Heuristic Smoothing also beats Simple Smoothing, by a smaller margin. Heuristic Smoothing approaches the offline optimum rather closely; the observed competitive ratios are always below 1.5, much better than the theoretically proven upper bound in Theorem 4.

Heuristic Smoothing is further evaluated under other random inputs, including Poisson distribution in Fig. 5, Gaussian distribution in Fig. 6 and a specifically designed random input in Fig. 7. All results verify that Heuristic Smoothing works best among the three online cost minimization algorithms.

B. The Cloud Scenario

We implemented the randomized algorithm in Algorithm 3 and the naive algorithm in Sec. V. They are evaluated under three types of inputs: uniform distribution, Poisson distribution and Gaussian distribution. We compare the costs of the two algorithms using these inputs, as shown in Fig. 8, Fig. 9 and Fig. 10, respectively. We observe that the randomized algorithm achieves much lower cost than the naive algorithm, in particular with longer tolerable delays. For example, Fig. 8 shows that the randomized algorithm saves approximately 45% cost as compared with the naive algorithm when $D = 5$, and it saves more than 68% under a longer tolerable delay. This suggests that longer tolerable delays provide the randomized algorithm more room for maneuver, leading to a more evident cost reduction.

We further investigate the influence of $\beta$, the ratio of the intermediate data size to the original data size. Results are shown in Fig. 11. When $D$ is small, a large $\beta$ causes a rather high cost. However when a large $D$ is used, even a large $\beta$ only produces a relatively small cost.

[Fig. 11. Relationship between traffic cost and parameters $D$, $\beta$ (normalized cost vs. $\beta$).]

VII. CONCLUSION

ISPs now charge big data applications with a new, interesting percentile based model, leading to new online algorithm design problems for minimizing the traffic cost paid for uploading big data to the cloud. We studied two scenarios for such online algorithm design in this work. A Heuristic Smoothing algorithm is proposed in the single link case, with proven better performance than the best alternative in the literature, and a smaller competitive ratio below 2.
A randomized online algorithm is designed for the MapReduce framework, achieving a constant competitive ratio by employing Heuristic Smoothing as a building module. We have focused on MAX charge rules, and leave similar online algorithm design for 95-percentile charge rules as future work.

Fig. . Uniformly Random Input.
Fig. 3. Simple Smoothing vs. Heuristic Smoothing, D = .
Fig. 4. Competitive ratio over various delay window sizes under input of uniform distribution.
Fig. 5. Competitive ratio over various delay window sizes under input of Poisson distribution.
Fig. 6. Competitive ratio over various delay window sizes under input of Gaussian distribution.
Fig. 7. Competitive ratio over various delay window sizes under a specifically designed input.
Fig. 8. Comparison between the proposed randomized algorithm and the native algorithm under input of uniform distribution and β = .
Fig. 9. Comparison between the proposed randomized algorithm and the native algorithm under input of Poisson distribution and β = .
Fig. . Comparison between the proposed randomized algorithm and the native algorithm under input of Gaussian distribution and β = .

REFERENCES

[1] Amazon Elastic Compute Cloud.
[2] Linode, https://www.linode.com/speedtest/.
[3] Amazon EC2 Case Studies.
[4] E. E. Schadt, M. D. Linderman, J. Sorenson, L. Lee, and G. P. Nolan, Computational Solutions to Large-scale Data Management and Analysis, Nat Rev Genet, vol. , no. 9, pp. , Sep. .
[5] L. Golubchik, S. Khuller, K. Mukherjee, and Y. Yao, To Send or not to Send: Reducing the Cost of Data Transmission, in Proc. of IEEE INFOCOM, 3.
[6] L. Zhang, C. Wu, Z. Li, C. Guo, M. Chen, and F. Lau, Moving Big Data to The Cloud: An Online Cost-Minimizing Approach, IEEE Journal on Selected Areas in Communications, vol. 3, no. , pp. 7 7, 3.
[7] H. Wang, H. Xie, L. Qiu, A. Silberschatz, and Y. Yang, Optimal ISP Subscription for Internet Multihoming: Algorithm Design and Implication Analysis, in Proc. of IEEE INFOCOM, 5.
[8] S. Peak, Beyond Bandwidth: The Business Case For Data Acceleration, White Paper, 3.
[9] D. K. Goldenberg, L. Qiu, H. Xie, Y. R. Yang, and Y. Zhang, Optimizing Cost and Performance for Multihoming, in Proc. of ACM SIGCOMM, 4.
[10] A. Grothey and X. Yang, Top-percentile Traffic Routing Problem by Dynamic Programming, Optimization and Engineering, vol. , pp. , .
[11] F. Yao, A. Demers, and S. Shenker, A Scheduling Model for Reduced CPU Energy, in Proc. of IEEE FOCS, 995.
[12] N. Bansal, T. Kimbrel, and K. Pruhs, Speed Scaling to Manage Energy and Temperature, J. ACM, vol. 54, no. , pp. 3: 3:39, Mar. 7.
[13] S. Albers, F. Müller, and S. Schmelzer, Speed Scaling on Parallel Processors, in Proc. of ACM SPAA, 7.
[14] B. Bingham and M. Greenstreet, Energy Optimal Scheduling on Multiprocessors with Migration, in Proc. of IEEE ISPA, 8.
[15] E. Angel, E. Bampis, F. Kacem, and D. Letsios, Speed Scaling on Parallel Processors with Migration, in Euro-Par Parallel Processing, ser. Lecture Notes in Computer Science, C. Kaklamanis, T. Papatheodorou, and P. Spirakis, Eds. Springer Berlin Heidelberg, , vol. 7484, pp. .
[16] G. Greiner, T. Nonner, and A. Souza, The Bell is Ringing in Speed-scaled Multiprocessor Scheduling, in Proc. of ACM SPAA, 9.
[17] M. A. Adnan, Y. Ma, R. Sugihara, and R. Gupta, Dynamic Deferral of Workload for Capacity Provisioning in Data Centers.
[18] Y. Yao, L. Huang, A. Sharma, L. Golubchik, and M. Neely, Data Centers Power Reduction: A Two Time Scale Approach for Delay Tolerant Workloads, in Proc. of IEEE INFOCOM, .
[19] B. Cho and I. Gupta, New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks, in Proc. of IEEE ICDCS, .
[20] M. Adler, R. K. Sitaraman, and H. Venkataramani, Algorithms for Optimizing the Bandwidth Cost of Content Delivery, Comput. Netw., vol. 55, no. 8, pp. 47 4, Dec. .
[21] J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Commun. ACM, vol. 5, no. , pp. 7 3, Jan. 8.
[22] S. Rao, R. Ramakrishnan, A. Silberstein, M. Ovsiannikov, and D. Reeves, Sailfish: A Framework for Large Scale Data Processing, Yahoo! Labs, Tech. Rep., .
[23] B. Heintz, A. Chandra, and R. K. Sitaraman, Optimizing MapReduce for Highly Distributed Environments, Department of Computer Science and Engineering, University of Minnesota, Tech. Rep., .
[24] H. Becker and J. Riordan, The Arithmetic of Bell and Stirling Numbers, American Journal of Mathematics, vol. 7, no. , pp. , 948.