Multi-timescale Distributed Capacity Allocation and Load Redirect Algorithms for Cloud System

Danilo Ardagna, Sara Casolari, Michele Colajanni, Barbara Panicucci. Report n. 2011.23

Danilo Ardagna, Sara Casolari, Michele Colajanni, Barbara Panicucci
Politecnico di Milano, Dipartimento di Elettronica e Informazione
Università di Modena e Reggio Emilia, Dipartimento di Ingegneria dell'Informazione
Email: {ardagna,panicucci}@elet.polimi.it, {sara.casolari,michele.colajanni}@unimore.it

Abstract

Resource management remains one of the main issues in cloud computing because system resources have to be continuously allocated to handle workload fluctuations while guaranteeing Service Level Agreements (SLA) to the end users. In this paper, we propose capacity allocation algorithms able to coordinate multiple distributed resource controllers operating in geographically distributed cloud sites. Capacity allocation solutions are integrated with a load redirection mechanism which, when necessary, distributes incoming requests among different sites. The overall goal is to minimize the costs of allocated resources in terms of virtual machines, while guaranteeing SLA constraints expressed as a threshold on the average response time. We propose a distributed solution which integrates workload prediction and distributed non-linear optimization techniques. Experimental results show that the proposed solutions improve on other heuristics proposed in the literature without penalizing SLAs, and that they are close to the global optimum which can be obtained by an oracle with perfect knowledge of the future offered load.

Keywords: Cloud systems, Performance modeling, Resource management, Capacity allocation, Load balancing, SLA.

I. INTRODUCTION

Cloud computing is a paradigm that aims at streamlining the on-demand provisioning of software, hardware, and data as services, and providing end-users with flexible and scalable services accessible through the Internet [28].

Cloud infrastructures live in an open world characterized by continuous changes in the environment and in the requirements they have to meet. These changes occur autonomously and unpredictably, and they are out of the control of the cloud provider. Nevertheless, cloud-based services must be provided with different Service Level Agreements in terms of reliability, security and performance.

In this paper, we focus on solutions for performance guarantees that are able to dynamically adapt the resources of the cloud infrastructure in order to satisfy SLAs and to minimize costs. To this purpose, we integrate workload prediction models into capacity allocation techniques that are able to coordinate multiple distributed resource controllers working in geographically distributed cloud sites. These allocation techniques can work together with a dynamic load redirection mechanism which, during peak loads, moves requests from heavily loaded sites to other sites. The proposed request distribution algorithms aim at optimizing the average response time of user requests and at satisfying SLA requirements.

In cloud architectures that are characterized by geographically distributed systems, any centralized approach for capacity allocation and load balancing is subject to critical design limitations, including lack of scalability and expensive communication costs [20]. Therefore, distributed solutions are mandatory [7], [34], [57], [31]. This paper adopts a distributed approach where capacity allocation and load redirect are modelled as non-linear programming problems. The optimization problems are solved by implementing decomposition techniques which are integrated with traffic predictive models, used to estimate the incoming workload at each site and the request rate redirected from heavily loaded sites to the others.

We compare our approach with other heuristics proposed in the literature [23], [59], [58]. A large set of experimental results demonstrates that the proposed solutions save costs and do not incur SLA violations. It is also worth observing that our solutions are close to the global optimum that can be obtained by an oracle having perfect knowledge of the future workload.

The remainder of the paper is organized as follows. Section II formalizes the problem. Section III describes our reference framework and design assumptions. The formulation of the optimization problem is presented in Section IV. The prediction techniques are introduced in Section V. The experimental results demonstrating the improvements of the proposed solutions are reported in Section VI. Other literature approaches are discussed in Section VII. Conclusions are finally drawn in Section VIII.

II. PROBLEM STATEMENT

In this paper we take the perspective of a provider which offers transactional Web-based services (WS) hosted by multiple sites of an Infrastructure-as-a-Service (IaaS) provider. The hosted services represent heterogeneous applications in terms of resource demands, workload intensities, and SLA requirements.

[Fig. 1. Cloud System Reference Framework. Diagram labels: IaaS Provider; sites i = 1 to 4; Local workload manager; Local CA and LR manager; Virtualized Servers; Local WS arrival rates; Execution rate of local arrivals; Redirect rate of local arrivals.]

Services with different SLAs and workload profiles are categorized into independent classes. We assume that an SLA contract associated with each WS class is established between the WS provider and its end users. This contract specifies the SLA levels expressed in terms of the average response time $R_k$ that the WS provider must meet while responding to end users' requests for a given service class $k$. Overall, the system serves a set $K$ of WS classes, and the average response time thresholds, which have to be guaranteed on a five-minute time scale, are denoted by $\bar{R}_k$.

Applications are hosted on virtual machines (VM) which are provided on a pay-per-use basis by the IaaS provider. For the sake of simplicity, we assume that each VM hosts one Web service application. Multiple VMs implementing the same WS class can run in parallel at each physical location. We assume that the VMs are homogeneous in terms of RAM and CPU capacity [4] and evenly share the incoming workload (this corresponds to the solution currently implemented by IaaS providers [6]). Furthermore, services can be located on multiple geographically distributed sites (see Figure 1). For example, Amazon Elastic Compute Cloud (EC2) allows software providers to deploy VMs in five world regions.

IaaS providers usually charge software providers on an hourly basis [4]. Hence, the WS provider has to face the Capacity Allocation (CA) problem, which consists in determining every hour the optimal number of VMs for each WS class in each IaaS site according to the average load predicted on an hourly basis, while guaranteeing SLA constraints. We denote by $T_1$ the mid-long time scale adopted for VM provisioning.

[Fig. 2. Cloud System Reference Framework.]

If the resources of a site are insufficient (e.g., because of an unpredictable workload peak), incoming requests can even be redirected to other sites. As in other approaches, dynamic Load Redirection (LR) [12], [59] is performed periodically every $T_2 \ll T_1$ time instants (see Figure 2), at a more fine-grained time scale (e.g., 5 to 10 minutes), on the basis of a short-term prediction of future WS workloads [1], [16], or can be triggered by a monitoring system in order to react to unexpected events, such as system failures.

By considering two different time scales, we are able to capture two types of information related to the IaaS sites [59]. The fine-grained workload traces exhibit a high variability due to the short-term variations of the typical Web-based workload; for this reason, the fine-grained time scale provides useful information for the dynamic load redirections. At the coarse-grained time scale, the workload traces are smoother and are not characterized by the instantaneous peaks typical of the fine-grained time scale. These characteristics allow us to use the mid-long time scale algorithms to predict the workload trend, which represents useful information for the capacity allocation algorithm.

We denote by $I$ the set of IaaS sites. For simplicity, we assume that at each site VMs are homogeneous in terms of computing and storage capacity. Even in the case of heterogeneous VMs, cloud providers have a limited set of available configurations, say $S$. Hence, a site with heterogeneous resources can be modelled as $S$ sites with homogeneous resources. The capacity of VMs at site $i$ is denoted by $C_i$, while for each site $i$ and WS class $k$ we denote with $\hat{\Lambda}_i^k$ the arrival rate predicted at the time scale $T_1$ coming from the time zone where the site is located, and with $\tilde{\Lambda}_i^k$ the corresponding prediction at the time scale $T_2$ (see Figure 2). Finally, $\Lambda_i^k$ will indicate the real local arrival rate. The objective of the CA problem is to determine the number of VMs able to serve $\hat{\Lambda}_i^k$ requests/sec, while minimizing VM costs and guaranteeing that $R_k \le \bar{R}_k$.

We assume that a WS provider can establish two different contracts with the IaaS provider. Namely, it may be possible to access VMs on a pure on-demand basis, and the WS provider will be charged on an hourly basis (see, e.g., the Amazon EC2 on-demand pricing scheme [4]). Otherwise, it may be possible to pay a fixed annual flat rate for each VM and then access the VMs on a pay-per-use basis with a fee lower than in the pure on-demand case (see, e.g., the Amazon EC2 reserved instances pricing scheme [4]). The time unit cost (e.g., $ per hour of VM usage) for the use of flat VMs at site $i$ is denoted by $c_i$, while the cost for on demand VMs will be denoted by $\hat{c}_i$, with $c_i < \hat{c}_i$.

The CA problem solution determines every $T_1$ time unit the number of flat VMs to be allocated to WS class $k$ at site $i$, $N_i^k$, and the number of on demand VMs to be allocated to class $k$ at site $i$, $M_i^k$. We will denote with $\bar{N}_i$ the number of flat VMs available at site $i$, obtained through the annual flat contract. On the other hand, the LR problem aims at determining (every $T_2$ time instants) the execution rate of local arrivals for class $k$ at site $i$, $x_i^k$, and the redirect rate of class $k$ at site $i$ toward the other sites, $z_i^k$, in order to satisfy the prediction $\tilde{\Lambda}_i^k$ for the local arrivals, while guaranteeing that $R_k \le \bar{R}_k$. For the sake of clarity, the notation adopted in this paper is summarized in Table I.

III. REFERENCE FRAMEWORK AND DESIGN ASSUMPTIONS

Our dynamic CA and LR techniques combine a workload predictor and an optimization model. In the following we model each WS class hosted in a VM as an M/G/1 queue in tandem with a delay center [18], as done in [40], and we adopt the typical assumption in Web service containers [44], [3] that requests are served according to the processor sharing scheduling discipline.
Multiple VMs can run in parallel to support the same application. In that case, the workload is evenly shared among the multiple instances (see Figure 3). As discussed in [40], the delay center $D_k$ allows us to model network delays and/or protocol delays introduced in establishing connections, etc. The future performance of each WS class is obtained on the basis of the forecasted workloads. The optimization model uses these estimates to determine the number of VM instances $N_i^k$ and $M_i^k$, the execution rate of local arrivals for each class $x_i^k$ and, possibly, the workload redirected to other sites $z_i^k$.
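As a minimal sketch of this performance model (not the authors' implementation), the snippet below evaluates the average response time of a class whose load is evenly shared by n VMs, using the expression D + 1/(C mu - Lambda/n) that Section IV-A obtains from the M/G/1 processor sharing queue in tandem with a delay center. The function names and the numeric values are illustrative assumptions, and `min_vms` applies the response time threshold to a single site, which is a simplification of the aggregate SLA constraint used in the CA problem.

```python
import math


def response_time(arrival_rate, n_vms, capacity, mu, delay):
    """Average response time D + 1 / (C*mu - Lambda/n) when n VMs share the load evenly."""
    per_vm_rate = arrival_rate / n_vms
    service_rate = capacity * mu
    if per_vm_rate >= service_rate:
        raise ValueError("unstable queue: per-VM load must stay below C*mu")
    return delay + 1.0 / (service_rate - per_vm_rate)


def min_vms(arrival_rate, capacity, mu, delay, threshold):
    """Smallest integer n with response_time(...) <= threshold (single-site simplification)."""
    slack = threshold - delay
    if slack <= 0.0 or capacity * mu <= 1.0 / slack:
        raise ValueError("the threshold cannot be met by adding VMs")
    # From D + 1/(C*mu - Lambda/n) <= threshold: n >= Lambda / (C*mu - 1/(threshold - D)).
    return max(1, math.ceil(arrival_rate / (capacity * mu - 1.0 / slack)))


# Illustrative values (assumptions): 50 req/s, C = 1, mu = 0.5 req/s, D = 0.5 s,
# and a threshold of 3/mu = 6 s.
n = min_vms(50.0, 1.0, 0.5, 0.5, 6.0)
```

With these numbers one VM fewer would push the average response time just above the threshold, which is the kind of trade-off the optimization model resolves jointly for all classes and sites.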

TABLE I
PARAMETERS AND DECISION VARIABLES FOR CAPACITY ALLOCATION AND LOAD REDIRECT PROBLEMS.

System parameters
$I$: Set of sites
$K$: Set of WS classes
$C_i$: VM instances capacity at site $i$
$c_i$: Time unit cost for flat VMs at site $i$
$\hat{c}_i$: Time unit cost for on demand VMs at site $i$
$\bar{N}_i$: Number of flat VMs available at site $i$
$T_1$: Long term CA time horizon
$T_2$: Short term LR time horizon
$\Lambda_i^k$: Real local arrival rate for WS class $k$ at site $i$
$\hat{\Lambda}_i^k$: Local arrival rate prediction for WS class $k$ at site $i$ at time scale $T_1$
$\tilde{\Lambda}_i^k$: Local arrival rate prediction for WS class $k$ at site $i$ at time scale $T_2$
$\hat{\lambda}_i^k$: Estimation of the overall request redirect to site $i$ for WS class $k$ at time scale $T_2$
$\mu_k$: Maximum service rate of a capacity 1 VM for executing WS class $k$ requests
$D_k$: Queueing delay for processing class $k$ requests
$d_{i,j}^k, \forall j \neq i$: Network transfer time for redirecting class $k$ requests from site $i$ to site $j$
$g_{i,j}^k = 1/d_{i,j}^k, \forall j \neq i$: Conductance of the communication link $(i,j)$ for class $k$ requests
$G_i^k = \sum_{j \neq i} g_{i,j}^k$: Equivalent conductance seen from site $i$ to the other sites for class $k$ requests
$R_i^k$: Average response time for executing WS class $k$ requests at site $i$
$\bar{R}_k$: WS class $k$ request average response time threshold

Decision variables
$N_i^k$: Number of flat VMs allocated for class $k$ requests at site $i$
$M_i^k$: Number of on demand VMs allocated for class $k$ requests at site $i$
$x_i^k$: Execution rate of local arrivals for WS class $k$ requests at site $i$
$z_i^k$: Redirect of WS class $k$ requests at site $i$ toward other sites

For the sake of simplicity, if the workload is redirected to other sites, the fraction of workload sent to each individual site $j$ is inversely proportional to the network transfer time $d_{i,j}^k$ for redirecting class $k$ requests from site $i$ to site $j$ or, equivalently, is directly proportional to the conductance $g_{i,j}^k$ of the class $k$ request network link between sites $i$ and $j$, defined as $g_{i,j}^k = 1/d_{i,j}^k$. In other words, if we define the equivalent conductance $G_i^k$ at site $i$ for class $k$ as:

$$G_i^k = \sum_{j \in I, j \neq i} g_{i,j}^k$$

the overall load at site $i$ due to the redirect of other sites is given by:

$$\sum_{j \in I, j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k}.$$

[Fig. 3. System Performance Model. Diagram labels: $N + M$ VMs in parallel, delay center $D$, service rate $\mu$.]

At the time scale $T_2$, the total rate of class $k$ requests executed at site $i$ is the sum of the requests executed from local arrivals, i.e., $x_i^k \le \tilde{\Lambda}_i^k$, and the requests executed from the redirect, which, according to the previous equation, gives:

$$x_i^k + \sum_{j \in I, j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k}. \qquad (1)$$

In other words, in our LR scheme requests can be redirected only once. Otherwise, multiple hops could penalize some individual requests, thus increasing the overall response time variance of requests within the same WS class.

The next section formulates the optimization problems and devises a decentralized solution for the CA and LR problems. Section V discusses the prediction models adopted in this paper.

IV. OPTIMIZATION PROBLEM FORMULATION

As discussed in Section II, Capacity Allocation and Load Redirect are performed at different time scales. We formulate the Capacity Allocation problem in Section IV-A, while the Load Redirect approach is presented in Section IV-B.

A. Capacity Allocation problem

The CA problem is solved with time period $T_1$. It aims at minimizing the overall costs for flat and on demand VM instances of multiple distributed IaaS sites, while guaranteeing that the average response time of each class is lower than the SLA threshold. The CA determines the number of VMs $N_i^k$ and $M_i^k$ required to serve the arrival rate $\hat{\Lambda}_i^k$. In this phase the traffic redirected to other sites is not considered. Indeed, preliminary results have shown that the LR mechanism, even if significant at the lower time scale $T_2$, introduces a limited increment to each class local incoming workload $\hat{\Lambda}_i^k$ at time scale $T_1$.

If $\mu_k$ denotes the maximum service rate of a capacity 1 VM for executing WS class $k$ requests, the response time for executing locally the WS class $k$ at site $i$ is given by

$$R_i^k = \frac{1}{C_i \mu_k - \frac{\hat{\Lambda}_i^k}{N_i^k + M_i^k}}.$$

In particular, it must be (M/G/1 equilibrium condition) $\hat{\Lambda}_i^k < C_i \mu_k (N_i^k + M_i^k)$, and the total response time for class $k$ requests over all sites is:

$$R_k = D_k + \frac{\sum_{i \in I} \hat{\Lambda}_i^k R_i^k}{\sum_{j \in I} \hat{\Lambda}_j^k}$$

where $D_k$ denotes the queueing network delay (see Figure 3). After some elementary algebra, the CA problem can be formulated as:

(CA)

$$\min_{N_i^k, M_i^k} \sum_{i \in I} \sum_{k \in K} \left( c_i N_i^k + \hat{c}_i M_i^k \right) \qquad (2)$$

subject to

$$\hat{\Lambda}_i^k < C_i \mu_k (N_i^k + M_i^k) \quad \forall k \in K, \forall i \in I \qquad (3)$$

$$\sum_{i \in I} \frac{\hat{\Lambda}_i^k (N_i^k + M_i^k)}{C_i \mu_k (N_i^k + M_i^k) - \hat{\Lambda}_i^k} \le (\bar{R}_k - D_k) \sum_{j \in I} \hat{\Lambda}_j^k \quad \forall k \in K \qquad (4)$$

$$\sum_{k \in K} N_i^k \le \bar{N}_i \quad \forall i \in I \qquad (5)$$

$$N_i^k, M_i^k \ge 0 \quad \forall k \in K, \forall i \in I$$

where constraints (5) guarantee that the number of flat VMs allocated to the whole set of classes at site $i$ is at most equal to the number of flat VMs available at that site.

Note that, in the problem formulation, we have not imposed the variables $N_i^k$ and $M_i^k$ to be integer, as in reality they are. Integer variables make the solution much more difficult because of the non-linear constraints (4). We therefore decided to deal with continuous variables, actually considering a relaxation of the real problem. However, if the optimal values of the variables are fractional and they are rounded to the closest integer solution, the gap between the solution of the real integer problem and the relaxed one is very small, which justifies the use of a relaxed model. Furthermore, we can always choose a rounding to an integer solution which preserves feasibility.

The CA problem has a linear objective function over a convex set. Indeed, constraints (3) and (5) are linear, while the functions $\frac{\hat{\Lambda}_i^k (N_i^k + M_i^k)}{C_i \mu_k (N_i^k + M_i^k) - \hat{\Lambda}_i^k}$ are convex since their Hessian is of the form:

$$H = \frac{2 C_i \mu_k (\hat{\Lambda}_i^k)^2}{\left( C_i \mu_k (N_i^k + M_i^k) - \hat{\Lambda}_i^k \right)^3} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

whose eigenvalues are non-negative in the feasible set (in particular because of condition (3)). Hence, the global optimum solution can be obtained by solving (CA) in parallel at each site and by adopting standard non-linear solvers. This requires that each site broadcasts its $\hat{\Lambda}_i^k$ predictions, which can be obtained considering only local information. Since this broadcast is performed every $T_1$ time instants, the network overhead for the CA solution is negligible.

B. Load Redirect problem

Once the number of on demand instances has been determined, local requests can be dynamically redirected to other sites with time granularity $T_2$ in order to, e.g., avoid episodic local congestion due to the variability of the incoming workload at time granularity $T_2$ around its average hourly predicted value (see Figure 2). According to equation (1), the average response time for executing locally WS class $k$ at site $i$ (i.e., without considering the network delay due to redirects) is given by:

$$R_i^k = D_k + \frac{1}{C_i \mu_k - \frac{x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k}}{N_i^k + M_i^k}}.$$

During the time interval $T_2$ the number of executions of class $k$ requests at site $i$ is $T_2 x_i^k + T_2 \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k}$, and the average response time for remote requests is given by both $R_i^k$ and the delay $d_{j,i}^k$. Therefore, per unit of time, the total response time for executing class $k$ requests at site $i$ is:

$$\left( x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} \right) R_i^k + \sum_{j \neq i} d_{j,i}^k \frac{g_{j,i}^k z_j^k}{G_j^k} = \left( x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} \right) R_i^k + \sum_{j \neq i} \frac{z_j^k}{G_j^k},$$

and the total response time for class $k$ requests over all sites is on average:

$$R_k = \frac{\sum_{i \in I} \left[ \left( x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} \right) R_i^k + \sum_{j \neq i} \frac{z_j^k}{G_j^k} \right]}{\sum_{j \in I} \tilde{\Lambda}_j^k}.$$

The goal of our load redirect scheme is to cooperatively minimize the request average response times. Formally, the LR can be formulated as a constraint programming problem, since $R_k \le \bar{R}_k$ must hold and the cost for request execution is determined by the CA solution and is not influenced by the LR decision variables. However, in order to provide an efficient distributed solution, in our LR problem formulation we consider the total request response time as the metric to be minimized. Indeed, experimental results have shown that introducing an objective function allows us to speed up the convergence of the distributed algorithm while relying on standard non-linear solvers. The LR problem can be formulated as follows:

(LR)

$$\min_{x_i^k, z_i^k} \sum_{k \in K} \sum_{i \in I} \left[ D_k \left( x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} \right) + \frac{(N_i^k + M_i^k) \left( x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} \right)}{C_i \mu_k (N_i^k + M_i^k) - \left( x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} \right)} + \sum_{j \neq i} \frac{z_j^k}{G_j^k} \right]$$

subject to

$$x_i^k + z_i^k = \tilde{\Lambda}_i^k \quad \forall k \in K, \forall i \in I \qquad (6)$$

$$x_i^k + \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k} < C_i \mu_k (N_i^k + M_i^k) \quad \forall k \in K, \forall i \in I \qquad (7)$$

$$x_i^k, z_i^k \ge 0 \quad \forall k \in K, \forall i \in I.$$

Constraints (6) ensure that the overall class $k$ requests at node $i$ are either locally executed or redirected toward the other sites, while constraints (7) guarantee that VM saturation conditions are avoided.

(LR) defines a centralized load balancing problem: all the system information (i.e., the local incoming workload predictions $\tilde{\Lambda}_i^k$) has to be gathered and used to get the optimal workload balancing. However, for large scale cloud systems, this centralized load balancing scheme is unsuitable because the $\tilde{\Lambda}_i^k$ broadcast may add a significant network overhead in massively distributed systems (recall that $T_2$ is around 5-10 minutes).

The main idea we adopted to obtain a fully decentralized algorithm is to exploit prediction techniques to estimate at each site also the workload due to the request redirect from other sites. In other words, each term $\sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k}$ is replaced by the request redirect estimation $\hat{\lambda}_i^k$, which becomes a parameter of the optimization problem. The idea of predicting the requests coming from other sites differs from traditional workload prediction approaches, since they are especially oriented to forecasting the local incoming traffic [14], [47] or the local load conditions [21], [29], [43].

Note that the prediction $\tilde{\Lambda}_i^k$ at time scale $T_2$ exhibits a noisier behaviour with respect to $\hat{\Lambda}_i^k$ evaluated at time scale $T_1$. Furthermore, if $\hat{\Lambda}_i^k < \tilde{\Lambda}_i^k$, the CA results in an underprovisioned system and load redirection has to be performed in order to satisfy the M/G/1 queue equilibrium condition and to guarantee that $R_k \le \bar{R}_k$. As a consequence of the noisy behaviour of $\tilde{\Lambda}_i^k$, the request redirection is performed occasionally and the corresponding workload is characterized by spikes.
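The conductance-based redirect rule behind equation (1) can be illustrated with a small sketch. The snippet below is an illustration under stated assumptions, not the authors' code: the function names, the 3-site transfer-time matrix `d` and the redirect rates `z` are hypothetical. It computes the load each site receives from the others and the aggregate network delay term of the (LR) objective, which reduces to $(|I| - 1) \sum_i z_i / G_i$ because each redirected request from site $j$ to site $i$ pays $d_{j,i} = 1/g_{j,i}$.

```python
def equivalent_conductance(d, i):
    """G_i = sum over j != i of g_{i,j}, with g_{i,j} = 1 / d_{i,j}."""
    return sum(1.0 / d[i][j] for j in range(len(d)) if j != i)


def incoming_redirect(d, z):
    """Load received by each site i: sum over j != i of g_{j,i} * z_j / G_j."""
    sites = range(len(d))
    big_g = [equivalent_conductance(d, j) for j in sites]
    return [
        sum((1.0 / d[j][i]) * z[j] / big_g[j] for j in sites if j != i)
        for i in sites
    ]


def network_delay_term(d, z):
    """Sum over i and j != i of d_{j,i} * (g_{j,i} * z_j / G_j)."""
    sites = range(len(d))
    big_g = [equivalent_conductance(d, j) for j in sites]
    return sum(
        d[j][i] * (1.0 / d[j][i]) * z[j] / big_g[j]
        for i in sites for j in sites if j != i
    )


# Hypothetical 3-site example: transfer times in seconds, redirect rates in req/s.
d = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 4.0],
     [2.0, 4.0, 0.0]]
z = [6.0, 0.0, 3.0]
received = incoming_redirect(d, z)
```

In this example the redirected load is conserved (the received rates sum to the redirected rates), and the closer site gets the larger share of each redirect.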

The statistical behaviour of $\lambda_i^k$ and the temporal constraint imposed by the application context are the two main elements that influence the choice of the prediction model used by the LR algorithm, and will be described in Section V.

The last term in the objective function can be rewritten as:

$$\sum_{i \in I} \sum_{j \neq i} \frac{z_j^k}{G_j^k} = (|I| - 1) \sum_{i \in I} \frac{z_i^k}{G_i^k}.$$

Then, the problem (LR) becomes:

(LR2)

$$\min_{x_i^k, z_i^k} \sum_{k \in K} \sum_{i \in I} \left[ D_k \left( x_i^k + \hat{\lambda}_i^k \right) + \frac{(N_i^k + M_i^k)(x_i^k + \hat{\lambda}_i^k)}{C_i \mu_k (N_i^k + M_i^k) - (x_i^k + \hat{\lambda}_i^k)} + (|I| - 1) \frac{z_i^k}{G_i^k} \right]$$

subject to

$$x_i^k + z_i^k = \tilde{\Lambda}_i^k \quad \forall k \in K, \forall i \in I,$$

$$x_i^k + \hat{\lambda}_i^k < C_i \mu_k (N_i^k + M_i^k) \quad \forall k \in K, \forall i \in I,$$

$$x_i^k, z_i^k \ge 0 \quad \forall k \in K, \forall i \in I.$$

(LR2) can be separated into independent problems, one per site and class, by dropping the constant terms $D_k \hat{\lambda}_i^k$ in the objective function and omitting the variable indexes:

(LR2')

$$\min_{x, z} \; D x + \frac{(N + M)(x + \hat{\lambda})}{C \mu (N + M) - (x + \hat{\lambda})} + (|I| - 1) \frac{z}{G}$$

subject to

$$x + z = \tilde{\Lambda} \qquad (8)$$

$$x + \hat{\lambda} < C \mu (N + M) \qquad (9)$$

$$x, z \ge 0 \qquad (10)$$

Each problem (LR2') can be solved independently at each site from local information, without the need to exchange data with other sites. The objective function of (LR2') is convex. Indeed, the Hessian is:

$$H = \begin{pmatrix} \frac{2 C \mu (N + M)^2}{\left( C \mu (N + M) - (x + \hat{\lambda}) \right)^3} & 0 \\ 0 & 0 \end{pmatrix} \qquad (11)$$

and in the feasible set, under condition (9), the eigenvalues are non-negative. Then, the global optimum can be identified by applying the Karush-Kuhn-Tucker (KKT) conditions, which are necessary and sufficient for optimality in convex problems. The solution is obtained through the following theorem.

Theorem 1: The global optimum solution for problem (LR2') can be obtained considering the following cases:

Case a:

$$x = (N + M) \left( C \mu - \sqrt{\frac{C \mu}{\frac{|I| - 1}{G} - D}} \right) - \hat{\lambda} \qquad (12)$$

$$z = \tilde{\Lambda} + \hat{\lambda} - (N + M) \left( C \mu - \sqrt{\frac{C \mu}{\frac{|I| - 1}{G} - D}} \right) \qquad (13)$$

Case b: $x = \tilde{\Lambda}$, $z = 0$, under the condition $\tilde{\Lambda} + \hat{\lambda} < C \mu (N + M)$.

Case c: $x = 0$, $z = \tilde{\Lambda}$, under the condition $\hat{\lambda} < C \mu (N + M)$.

Proof. See Appendix A.

The global optimum can be obtained by inspection, evaluating the objective function for the three cases reported above. Case a corresponds to heavy load conditions for the system, where the locally executed and redirected workloads can be obtained by (12) and (13). Case b corresponds to light load conditions, where the overall load (local and redirected from other sites) is executed locally. Finally, in Case c the site serves only requests coming from other sites, while the whole local incoming workload is redirected to other sites. This last solution corresponds to the unusual situation arising when the site is under-provisioned and/or the delay to redirect requests to other sites is very low.

V. WORKLOAD PREDICTION MODELS

For each site and WS class, our CA and LR algorithms use a prediction of the (real) local incoming workload $\Lambda_i^k$ coming from the time zone where the site is located and also an estimate of the workload due to the request redirect from other sites, $\lambda_i^k = \sum_{j \neq i} \frac{g_{j,i}^k z_j^k}{G_j^k}$. $\Lambda_i^k$ is predicted at the two different time scales $T_1$ and $T_2$ in order to respect the temporal constraints imposed by the CA and LR problems, respectively. An estimate of $\lambda_i^k$ is required only at time scale $T_2$.

The workload characterization literature is especially oriented to the analysis of incoming workload models and their statistical properties (e.g., [38], [14]), but it does not deal with the complexity and mostly unknown statistics of the redirected requests $\lambda_i^k$. However, in our setting the estimation of the request redirect provides a simplification and useful information for the solution of the LR problem.
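Theorem 1 lends itself to a direct implementation by inspection. The following is a minimal sketch under stated assumptions rather than the authors' code: the function names and numeric values are hypothetical, `net_cost` stands for $(|I| - 1)/G$, and the extra check that the Case a point satisfies $0 \le x \le \tilde{\Lambda}$ is an assumption added here so that every candidate respects constraints (8)-(10).

```python
import math


def lr2_objective(x, lam_local, lam_in, n_vms, c_mu, delay, net_cost):
    """Objective of (LR2') with z = lam_local - x.

    net_cost stands for (|I| - 1) / G, the cost of redirecting one request.
    """
    load = x + lam_in
    queueing = n_vms * load / (c_mu * n_vms - load)
    return delay * x + queueing + net_cost * (lam_local - x)


def solve_lr2(lam_local, lam_in, n_vms, c_mu, delay, net_cost):
    """Return (x, z) chosen by inspection among Cases a, b and c."""
    capacity = c_mu * n_vms
    candidates = []
    if net_cost > delay:  # Case a is defined only for a positive radicand.
        x_a = n_vms * (c_mu - math.sqrt(c_mu / (net_cost - delay))) - lam_in
        # Assumption: keep Case a only when it satisfies (8)-(10).
        if 0.0 <= x_a <= lam_local and x_a + lam_in < capacity:
            candidates.append(x_a)
    if lam_local + lam_in < capacity:  # Case b: serve all local arrivals.
        candidates.append(lam_local)
    if lam_in < capacity:  # Case c: redirect all local arrivals.
        candidates.append(0.0)
    if not candidates:
        raise ValueError("redirected load alone saturates the site")
    best = min(
        candidates,
        key=lambda x: lr2_objective(x, lam_local, lam_in, n_vms, c_mu, delay, net_cost),
    )
    return best, lam_local - best


# Hypothetical site: 10 VMs, C*mu = 1 req/s, D = 0.1 s, (|I|-1)/G = 4 s.
heavy = solve_lr2(lam_local=8.0, lam_in=1.0, n_vms=10, c_mu=1.0, delay=0.1, net_cost=4.0)
light = solve_lr2(lam_local=2.0, lam_in=1.0, n_vms=10, c_mu=1.0, delay=0.1, net_cost=4.0)
```

In the heavy-load call the stationary point of Case a is selected and part of the local arrivals is redirected, while in the light-load call Case b wins and nothing is redirected. Because only a handful of arithmetic operations are involved, the per-site cost is negligible.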

[Fig. 4. Daily behaviour of $\Lambda$ and $\lambda$.]

[Fig. 5. Auto-correlation function of real incoming workload.]

Data reaching the site represent the historical basis for prediction and estimation models. Each piece of historical information referring to one workload can be considered as a time series. An important premise is that no prediction model can work if the analyzed time series does not exhibit some predictability characteristics, that is, some temporal dependency. The auto-correlation analysis of the time series allows us to show the presence of time dependence and to establish whether or not prediction is achievable.

Figure 4 reports an example of the daily behaviour that characterizes the incoming workload $\Lambda_i^k$ and the redirect workload from other sites $\lambda_i^k$. The latter has been obtained by solving several instances of the centralized formulation of the LR problem under realistic conditions. The considered workloads exhibit two different behaviours. In particular, $\Lambda_i^k$ is affected by fluctuations because of diurnal activity. The high values of its auto-correlation function (ACF), shown in Figure 5, confirm that the measures of the incoming workload are correlated. On the other hand, $\lambda_i^k$ shows jittery perturbations and spikes varying in time and intensity that cause a quick decay of the ACF values shown in Figure 5. A high value of the auto-correlation function suggests that a past measure may be used to predict future values with some degree of accuracy. The converse holds when the ACF between two points tends to zero. It is worth pointing out that the results shown in Figure 5 are representative of a very large set of analyses we performed by considering the incoming workload of real systems (including an e-commerce site, an on-line banking site, and a large Italian University Web system).

The lesson learned from our results is the following. Independently of the cloud site, the slow decay of the auto-correlation function of the incoming workload $\Lambda_i^k$ confirms its predictability, while the low auto-correlation of $\lambda_i^k$ evidences its unpredictable nature. For this reason, a reliable prediction of $\lambda_i^k$ using only its past values is unfeasible. To overcome this issue and to provide a reliable representation of $\lambda_i^k$ without the side effects of spikes, we estimate its actual value by using a filtering technique based on the Exponential Weighted Moving Average (EWMA), as suggested in [8]. In this way, we are able to reduce the out-of-scale values caused by instantaneous spikes and to extract a smoothed representation of the redirected workload. In particular, the estimate $\hat{\lambda}_i^k(t)$ at time $t$ is evaluated as follows:

$$\hat{\lambda}_i^k(t) = \alpha \lambda_i^k(t) + (1 - \alpha) \hat{\lambda}_i^k(t - 1) \qquad (14)$$

where $0 < \alpha \le 1$ is typically set to $1/(1 + 2\pi f)$ and $f$ is the cutoff frequency of the EWMA filter. See [39] for further details.

Concerning the prediction of $\Lambda_i^k$, we rely on the Exponential Smoothing (ES) method, which has been adopted in many fields for decades [23], [8] and is suitable for run-time and non-stationary applications. In general, each prediction mechanism is characterized by several alternative implementations, where the choice between filtering or not filtering the input data, and the choice of the best parameters of the model in a static or dynamic way, are the most significant [22]. These alternatives characterize every prediction model and are especially important when the models are used in an autonomic context without the possibility of human intervention and interpretation.

In order to predict the local arrival rate $\Lambda_i^k$, the choice of the ES models is motivated by the application context, which is characterized by short-time predictions suitable for fast and autonomic decisions subject to the real-time constraints of cloud systems. ES is an intuitive forecasting method that unequally weights the samples of the input time series $\Lambda_i^k$ [32]. Non-uniform weighting is achieved through smoothing parameters which determine how much importance is assigned to each sample. In this paper, we consider a version of ES where parameters are dynamically chosen in order to adapt the prediction model to the workload fluctuations that characterize modern clouds [59].
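The EWMA filter of equation (14) takes only a few lines. The sketch below is illustrative: the function name, the sample values and the choice of initializing the estimate with the first observation are assumptions, since the text does not specify the initial condition.

```python
def ewma_filter(samples, alpha):
    """Apply equation (14): est(t) = alpha * sample(t) + (1 - alpha) * est(t - 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    estimates = []
    for sample in samples:
        if estimates:
            estimates.append(alpha * sample + (1.0 - alpha) * estimates[-1])
        else:
            estimates.append(sample)  # Assumed initialization with the first sample.
    return estimates


# Hypothetical redirected load (req/s) with one isolated spike.
smoothed = ewma_filter([0.0, 0.0, 10.0, 0.0, 0.0], alpha=0.5)
```

With `alpha=0.5` the spike of 10 req/s is halved when it occurs and then decays geometrically, which is the smoothing effect the filter is meant to provide; smaller values of `alpha` damp spikes more strongly at the price of a slower reaction.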
We limit the discussion to the prediction of the arrival rate $\Lambda_i^k$ at time scale $T_2$. The ES prediction model at time scale $T_1$ can be obtained in the same way. At sample $t$, the ES model estimates the local arrival rate $T_2$ steps ahead, $\tilde{\Lambda}_i^k(t + T_2)$, as a weighted average of the last sample $\Lambda_i^k(t)$ and of the corresponding predicted sample $\tilde{\Lambda}_i^k(t)$, that is:

$$\tilde{\Lambda}_i^k(T_2) = \frac{1}{T_2} \sum_{t=1}^{T_2} \Lambda_i^k(t)$$

$$\tilde{\Lambda}_i^k(t + T_2) = \gamma_i^k(t) \Lambda_i^k(t) + (1 - \gamma_i^k(t)) \tilde{\Lambda}_i^k(t), \quad t > T_2$$

where $\tilde{\Lambda}_i^k(T_2)$ is the initial predicted value and $0 < \gamma_i^k(t) < 1$ is the smoothing factor at the current sample $t$, related to site $i$ and class $k$, that determines how much weight is assigned to each sample. We obtain a dynamic ES model by re-evaluating the smoothing factor $\gamma_i^k(t)$ at each prediction sample $t$. There are different proposals for the dynamic estimation of $\gamma_i^k(t)$ (e.g., [54], [56], [32], [26]). A widely used procedure is the one proposed by Trigg and Leach [54], who define the smoothing parameter as the absolute value of the ratio of the smoothed error, $A_i^k(t)$, to the absolute error, $E_i^k(t)$:

$$\gamma_i^k(t) = \left| \frac{A_i^k(t)}{E_i^k(t)} \right|$$

The smoothed and absolute errors are equal to:

$$A_i^k(t) = \phi \, \epsilon_i^k(t) + (1 - \phi) A_i^k(t - T_2)$$

$$E_i^k(t) = \phi \, |\epsilon_i^k(t)| + (1 - \phi) E_i^k(t - T_2)$$

where $\epsilon_i^k(t)$ is the forecast error at sample $t$, $\epsilon_i^k(t) = \Lambda_i^k(t) - \tilde{\Lambda}_i^k(t)$, and $\phi$ is set arbitrarily, with 0.2 being a common choice [54]. This dynamic choice of $\gamma_i^k(t)$ improves the prediction quality and limits the delay problem related to the traditional ES model based on a static choice of the $\gamma$ parameter. We use an analogous implementation of the ES prediction model to predict the local arrival rate $T_1$ steps ahead, $\hat{\Lambda}_i^k$. For the sake of simplicity, in the remainder of the paper the sample index $t$ will be omitted.

VI. EXPERIMENTAL RESULTS

The proposed resource management algorithms have been evaluated for a variety of system and workload configurations. Section VI-A presents the experimental settings and the results on the scalability of our algorithms. Section VI-B presents a cost-benefit evaluation of our solution compared with other heuristics and state-of-the-art techniques [23], [59], [58]. Finally, Section VI-C shows the results of the application of our resource management algorithms in a real prototype environment deployed in Amazon EC2.

A. Algorithm Performance

To evaluate the efficiency of the proposed algorithms, we use a large set of randomly generated instances. All tests are performed on a VMWare virtual machine based on Ubuntu 9.10 server running on an Intel Nehalem dual socket quad-core system with 32 GB of RAM. The virtual machine has a dedicated physical core with guaranteed performance and 4 GB of reserved memory. We use SNOPT 7.2.4 as the non-linear solver [36].
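For reference, the dynamic ES predictor of Section V, whose execution time is negligible compared with the solver runs discussed here, can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes one sample per $T_2$ interval (so the forecast horizon is one step), initializes the forecast with the mean of the first `warmup` samples, and falls back to $\gamma = \phi$ while no forecast error has been observed. The function name and the sample values are hypothetical.

```python
def adaptive_es(samples, phi=0.2, warmup=3):
    """Return (forecasts, gammas) for one-step-ahead Trigg-Leach smoothing.

    forecasts[n] is the prediction issued after observing samples[warmup + n].
    """
    if len(samples) <= warmup:
        raise ValueError("need more samples than the warm-up length")
    forecast = sum(samples[:warmup]) / warmup  # Initial value: mean of the first samples.
    smoothed_err = 0.0  # A(t)
    absolute_err = 0.0  # E(t)
    forecasts, gammas = [], []
    for sample in samples[warmup:]:
        error = sample - forecast
        smoothed_err = phi * error + (1.0 - phi) * smoothed_err
        absolute_err = phi * abs(error) + (1.0 - phi) * absolute_err
        # Assumption: fall back to phi while no error has been observed yet.
        gamma = abs(smoothed_err / absolute_err) if absolute_err > 0.0 else phi
        forecast = gamma * sample + (1.0 - gamma) * forecast
        forecasts.append(forecast)
        gammas.append(gamma)
    return forecasts, gammas


# Hypothetical arrival rates (req/s) with a step change.
forecasts, gammas = adaptive_es([10.0] * 5 + [20.0] * 10)
```

On the step change the smoothing factor jumps to 1 and the forecast follows the new level immediately, which illustrates how the dynamic choice of the smoothing factor limits the lag of a static ES model.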

TABLE II
CAPACITY ALLOCATION PROBLEM SOLUTION EXECUTION TIME (SEC).

|K|, |I|    Time      |K|, |I|    Time      |K|, |I|     Time
100, 20     2.3       500, 20     29.3      1000, 20     34.5
100, 40     6.4       500, 40     33.3      1000, 40     98.6
100, 60     9.3       500, 60     65.6      1000, 60     160.9

The number of cloud sites $|I|$ has been varied between 20 and 60, and the number of request classes $|K|$ between 100 and 1000. The maximum service rate of a capacity one VM for executing class $k$ requests, $\mu_k$, has been varied uniformly in (0.1, 1), as in [3], while we set $\bar{R}_k = 3/\mu_k$, as in [11], [3], [12]. Table II reports, for problem instances of different sizes, the computational time in seconds required to solve the CA problem (figures are the means computed on ten different runs), demonstrating that our solution can determine the global optimal number of VM instances in a few minutes; hence it is suitable to be applied on an hourly basis. The computational time for solving a single (LR2') problem instance is a few milliseconds. Overall, the LR solution is largely scalable and the load redirect mechanism can reasonably be applied at 5-10 minute time scales without introducing any bottleneck.

B. Comparison with alternative literature proposals

In order to evaluate the quality of our approach, we compare it against the solution which can be obtained by an oracle that has perfect knowledge of the future workloads and performs capacity allocation and request redirection with time periods $T_1$ and $T_2$, respectively. Furthermore, we perform a cost-benefit evaluation of our technique by considering other heuristics with a twofold aim. On the one hand, we compare our solution with other state-of-the-art proposals which exploit the utilization principle and determine the number of VM instances according to a utilization threshold upper bound [23], [59], [58], [4]. On the other hand, we evaluate the effectiveness of the LR mechanism in the cloud. Indeed, in cloud systems resource provisioning can be performed in very few minutes; hence one can argue that, instead of redirecting the load to other sites, the allocation of additional VMs to manage traffic peaks could be more effective. In this Section we report the results of the comparison of our CA and LR mechanisms with a set of solutions which perform a more fine-grained CA at multiple time scales. In the remainder of this Section the following alternative solutions will be considered:

17 Oracle: The CA s performed every hour, whle th LR tme perod s 5 mnutes. The worload predcton s 100% accurate,.e., for any tme nstant Λ = Λ and Λ = Λ. Heurstc 1: The CA s performed on a 5 mnutes tme horzon and the number of VMs s determned accordng to utlzaton thresholds as n other approaches proposed n the lterature [23], [59], [58] and currently mplemented also by IaaS provders (see, e.g., the very recent release of Amazon AWS Elastc Beanstal [5]). In the evaluaton, a lfe span of one hour for each nstantated VM has been consdered. The number of VMs s determned such that the utlzaton of the VMs s equal to a gven threshold τ 1. The VM provsonng s further trggered f the predcton of the VMs utlzaton s hgher than a second threshold τ 2 > τ 1. Multple analyses have been performed by adoptng dfferent thresholds: (τ 1, τ 2 ) = (40%, 50%), (50%, 60%), and (60%, 80%). Heurstc 2: Same as Heurstc 1 but the number of VMs s determned by optmally solvng the CA problem reported n Secton IV-A every 5 mnutes. Heurstc 3: Same as Heurstc 2 but wth a 10 mnutes tme horzon. The performance parameters of the request classes have been randomly generated as n Secton VI-A. The local ncomng worload s obtaned from the traces, at 5 mnutes sample tme nterval, related to a large dynamc Web-based system. We consder three scenaros: Normal day scenaro: It descrbes the baselne worload where the number of clents requests changes by followng the b-modal profle shown n [15]. Heavy day scenaro: It exhbts a 40% unform ncrement n the number of clent requests wth respect to the baselne worload. Nosy day scenaro: It adds a nose component to the heavy day scenaro. We added a whte nose wth zero mean and standard devaton equal to 10% of the heavy day pea. In ths way, we test the robustness of the overall soluton n hghly varable contexts. All scenaros are representatve of the typcal Web-based worload that s characterzed by heavytaled dstrbutons [25], [13]. 
Moreover, the heavy scenarios add burst arrivals and flash crowds [38] that contribute to augment the request skew, and they represent a more stressful testbed for prediction models. The motivation behind this choice is to demonstrate that our prediction algorithm works even in critical scenarios and that our CA+LR mechanisms are robust to workload variability, although the toughest goal of predicting hot spot events remains an open issue beyond the scope of this paper. In particular, the prediction model considered in this paper provides an accurate prediction whose mean square error [22] is always lower than 10%.
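The construction of the heavy and noisy day scenarios from a baseline trace can be sketched as follows. This is only a minimal illustration in Python: the function names, the fixed random seed, and the clipping of negative rates to zero are assumptions of the sketch, not details reported for the actual evaluation.

```python
import random


def heavy_day(baseline, increment=0.40):
    """Apply a uniform increment (40% by default) to every sample of the baseline trace."""
    return [rate * (1.0 + increment) for rate in baseline]


def noisy_day(heavy, noise_fraction=0.10, seed=0):
    """Add zero-mean white noise whose standard deviation is a fraction of the heavy-day peak.

    Clipping at zero is an assumption of this sketch: a request rate cannot be negative.
    """
    rng = random.Random(seed)  # fixed seed (assumption) so that the sketch is reproducible
    sigma = noise_fraction * max(heavy)
    return [max(0.0, rate + rng.gauss(0.0, sigma)) for rate in heavy]


if __name__ == "__main__":
    baseline = [100.0, 200.0, 150.0]  # hypothetical 5-minute samples, in requests/sec
    heavy = heavy_day(baseline)
    noisy = noisy_day(heavy, seed=1)
    print(heavy)
    print(noisy)
```

The noise amplitude is tied to the peak of the heavy trace rather than to each sample, so low-traffic intervals receive proportionally larger perturbations.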

Overall we have considered 12 sites. In our LR solution, the decisions are made according to the prediction $\hat{\Lambda}$, which allows us to determine the overall traffic to be executed locally. This load includes two parts: the portion $x \le \hat{\Lambda}$ of the local incoming workload, and the traffic redirected by the other sites. In the evaluation we consider as the local workload to the cloud sites the real workload $\Lambda$, and the load actually redirected to other sites is the traffic exceeding the prediction-based value of the locally executed load. In the following quantitative analysis, in our solution we set $T_1 = 1$ hour and $T_2 = 5$ minutes.

Figures 6-11 plot the VM instantaneous costs over the 24 hours for the normal, heavy, and noisy day scenarios. Table III reports the percentage savings of our approach with respect to other solutions considering the total costs over the whole day. In particular, Figures 6-8 compare our solution with Heuristic 1 for different values of the thresholds $(\tau_1, \tau_2)$, while Figures 9-11 compare our approach with the oracle and Heuristics 2 and 3. Results show that our approach is always the most convenient one and it is very close to the oracle solution. Furthermore, even if the oracle has a perfect knowledge of the future, it does not always lead to the minimum instantaneous cost. Indeed, if the prediction is slightly lower than the real traffic, then our solution is cheaper than the oracle since a lower number of VMs is adopted. On the other hand, the oracle has the advantage of having no chance to violate the SLA (since there are no unexpected situations). Concerning this latter issue, Figures 12-13 report, as an example, the plot of the ratio $R/\bar{R}$ of the average response time to the response time threshold for a class considered as a reference at site 1. The plot shape is pretty general and is independent of the considered site and request class. As the results show, Heuristic 1 is very sensitive to the adopted thresholds.
The (40%, 50%) case is very conservative: it is around 35% more expensive than our approach but it always satisfies the response time threshold (the ratio is strictly lower than 1). Vice versa, the (60%, 80%) case has costs close to our solution (only 2-4% higher) but it introduces a very large number of SLA violations, especially in the noisy day scenario (see Figure 13). Our solution introduces overall only 37 violations over the 3456 time intervals considered in the 12 sites over the whole day. Furthermore, Heuristic 1 is more sensitive to traffic variability (the costs incurred in the noisy day scenario are higher). Heuristics 2 and 3 perform better than Heuristic 1, since the number of VMs is optimally determined by the CA problem solutions. However, the LR mechanism is still effective since it reduces costs by 4-12%. The fine-grained resource allocation introduced by Heuristics 2 and 3 indeed ends up in over-provisioning and better performance, while the LR mechanism allows forwarding traffic spikes to other locations without

incurring any additional capacity allocation or significant SLA violations.

Fig. 6. VM instances costs for the normal day scenario.
Fig. 7. VM instances costs for the heavy day scenario.
Fig. 8. VM instances costs for the noisy day scenario.
Fig. 9. VM instances costs for the normal day scenario.
Fig. 10. VM instances costs for the heavy day scenario.
Fig. 11. VM instances costs for the noisy day scenario.
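The sensitivity of Heuristic 1 to its thresholds can be illustrated with a minimal sketch of a threshold-based sizing rule. The sketch assumes a single request class and a utilization model U = load / (n * mu), where mu is the maximum service rate of one VM; the function names and this utilization model are assumptions made for illustration and are not taken from the compared implementations.

```python
import math


def utilization(load, n_vms, mu):
    """Average VM utilization under the assumed model U = load / (n_vms * mu)."""
    return load / (n_vms * mu)


def heuristic1_vms(predicted_load, current_vms, mu, tau1, tau2):
    """Return the number of VMs selected by a two-threshold rule.

    Provisioning is triggered only when the predicted utilization exceeds tau2;
    the new allocation is then sized so that utilization drops to (at most) tau1.
    """
    if not 0.0 < tau1 < tau2:
        raise ValueError("thresholds must satisfy 0 < tau1 < tau2")
    if current_vms > 0 and utilization(predicted_load, current_vms, mu) <= tau2:
        return current_vms  # below the trigger threshold: keep the current allocation
    return max(1, math.ceil(predicted_load / (mu * tau1)))


if __name__ == "__main__":
    mu = 49.02  # requests/sec per VM, the value fitted in the EC2 test
    for tau1, tau2 in [(0.40, 0.50), (0.50, 0.60), (0.60, 0.80)]:
        print((tau1, tau2), heuristic1_vms(100.0, 3, mu, tau1, tau2))
```

For the same predicted load, the conservative (40%, 50%) pair allocates more VMs than the (60%, 80%) pair, which is the cost versus SLA trade-off discussed above.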

Alternative solution          % Savings
                              Normal day   Heavy day   Noisy day
Oracle                         0.00         0.00        0.00
Heuristic 1 - (40%, 50%)      35.47        34.86       36.84
Heuristic 1 - (50%, 60%)      19.53        18.83       21.40
Heuristic 1 - (60%, 80%)       3.12         2.25        4.93
Heuristic 2                    4.30         3.26        4.44
Heuristic 3                   11.56        10.27        6.98

TABLE III
VM PERCENTAGE COST SAVINGS OVER THE 24 HOURS OBTAINED BY OUR APPROACH.

Fig. 12. Response time threshold ratio for a reference class, normal day scenario.
Fig. 13. Response time threshold ratio for a reference class, noisy day scenario.

C. Amazon EC2 Test

The effectiveness of our resource management algorithms has also been evaluated on a real prototype environment deployed on Amazon EC2. We performed experiments running the JSP implementation of the SPECweb2005 [52] benchmark. In particular, we have considered the banking workload, which simulates the access to an online banking Web site implementing a full HTTPS load. SPECweb2005 includes four components: the load generators, the client coordinator, the Web server, and the back-end simulator. The SPECweb2005 load generators inject workload to the system according to a closed model. User sessions are started according to a given number of users who continuously send requests for dynamic Web pages, wait for an average think time $Z = 10$ s, and then access another page or leave the system according to

a pre-defined session profile (footnote 1). The client coordinator initializes all the other systems, monitors the test, and collects the results. The Web server is the component target of the performance assessment (Apache Tomcat 5.5.27 in our setup), while the back-end simulator emulates the database and application parts of the benchmark and it is used to determine the dynamic content of the Web pages. The Web server has been deployed on a large instance, while the load generators, the client coordinator, and the back-end simulator have been hosted by extra-large Amazon instances (in this way we are guaranteed that they are not the system bottleneck). The test is performed by deploying VM instances in the Virginia (Site 1), North California (Site 2), and Europe (Site 3) Amazon regions. We have obtained an estimate of the maximum service rate parameters and the network delay among different Amazon sites by performing an extensive off-line profiling along the lines of [46], [40], collecting a set of statistics and minimizing the mean square error for the response time. Figure 14 shows how the adopted performance model fits the measured data, where $D = 0.66$ sec. and $\mu = 49.02$ requests/sec. The maximum percentage error on the average response time estimation is less than 20%. The network transfer delays estimated for the three sites are reported in Table IV.

Network Transfer Delay      Value [sec]
d_{1,2}, d_{2,1}            0.20
d_{1,3}, d_{3,1}            0.29
d_{2,3}, d_{3,2}            0.40

TABLE IV
NETWORK TRANSFER DELAY ESTIMATED AMONG EC2 AMAZON REGIONS.

We set $\bar{R} = 0.6$ seconds as the threshold for the average response time, and the overall test lasts two hours. We have generated an appropriate traffic profile (reported in Figures 15-17). In particular, Site 1 is characterized by a fluctuating workload, while at Sites 2 and 3 the workload has an almost linear trend (with some noise superimposed). We run the CA algorithm twice, at the beginning of each hour. The LR algorithm is run every 10 minutes. The number of on-demand VMs allocated at each site for the two hours is reported in Table V.
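Since the load generators follow a closed model while an open performance model needs an arrival rate, the response time law $N = (R + Z)\,\Lambda$ can be used to relate the two. The sketch below is a minimal illustration with hypothetical function names and a hypothetical user population; it compares the exact rate $N/(R + Z)$ with the approximation $N/Z$, which is reasonable when the response time is much smaller than the think time.

```python
def arrival_rate_exact(n_users, think_time, response_time):
    """Arrival rate from the response time law N = (R + Z) * Lambda."""
    return n_users / (response_time + think_time)


def arrival_rate_open(n_users, think_time):
    """A priori approximation Lambda = N / Z, valid when R is much smaller than Z."""
    return n_users / think_time


def relative_overestimate(n_users, think_time, response_time):
    """Relative error of the approximation with respect to the exact rate."""
    exact = arrival_rate_exact(n_users, think_time, response_time)
    approx = arrival_rate_open(n_users, think_time)
    return (approx - exact) / exact


if __name__ == "__main__":
    # Z = 10 s is the think time of the test; R = 0.6 s is the response time
    # threshold, used here as a worst case. N = 500 users is a hypothetical value.
    print(arrival_rate_open(500, 10.0))
    print(arrival_rate_exact(500, 10.0, 0.6))
    print(relative_overestimate(500, 10.0, 0.6))
```

The relative overestimate reduces algebraically to R / Z, so it does not depend on the number of users; with a response time at the 0.6 s threshold and a 10 s think time it is 6%.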
Footnote 1: Our optimization framework is based on an open performance model. We have estimated the overall incoming workload a priori as $\Lambda = N/Z$, since in the considered range of the number of users, the VM response time was significantly lower than the user think time (we recall that, by the response time law, $N = (R + Z)\,\Lambda$).

In every region, the load is evenly shared among multiple instances at

each site by registering the VMs with an Amazon Elastic Load Balancer [4]. The load is redirected in 6 of the 12 time intervals and, sometimes, one site concurrently executes the workload coming from the other two regions. Table VI details the nature of the workload executed at each site for every time interval. The overall incoming workload to North California is higher than the others, especially during the second hour. As a consequence, the traffic redirection from Site 2 towards the others is performed more frequently.

Fig. 14. Performance Model Data Fitting.

Hour   Site 1   Site 2   Site 3
1      2        3        2
2      2        4        2

TABLE V
NUMBER OF VMS ALLOCATED AT EACH AMAZON REGION.

Figures 18-20 show the overall traffic served at the three sites. Finally, Figure 21 reports the average response time (evaluated every 10 seconds) and shows that our CA+LR algorithms are effective, since the system satisfies the SLA in all but two cases and it is able to react to abrupt workload variations.

VII. RELATED WORK

With the development of autonomic computing systems, dynamic resource allocation techniques have received great interest within both industry and academia. The solutions proposed can be classified as centralized and distributed. In a centralized approach, a dedicated entity is in charge of establishing

resource allocation, admission control or load balancing for the autonomic system and has a global knowledge of the resources state in the whole network [12], [55]. Centralized solutions are not suitable for geographically distributed systems, such as the cloud or, more in general, massively distributed systems [7], [34], [31], since no single entity has global information about all system resources. Indeed, the communication overhead required to share the resource state information is not negligible, and the delay in acquiring state information from remote nodes could lead a centralized resource manager to very inaccurate decisions due to dynamic changes in system conditions, as well as resource consumption fluctuations or unexpected events [34].

Fig. 15. Local Incoming Workload at Virginia EC2 Site.
Fig. 16. Local Incoming Workload at North California EC2 Site.
Fig. 17. Local Incoming Workload at Europe EC2 Site.

Distributed resource management policies have been proposed to efficiently govern geographically distributed systems that cannot implement centralized decisions and support strong interactions among the remote nodes [7]. Distributed resource management is very challenging, since one node's decisions may inadvertently degrade the performance of the overall system, even if they greedily optimize the

performance of its nodes. Sometimes, local decisions could even lead the system to unstable oscillations [33]. It is difficult to determine the best control mechanism at each node in isolation, so that the overall system performance is optimized. Dynamically choosing when, where and how to allocate resources, and coordinating the resource allocation accordingly, is an open problem and is becoming more relevant with the advances of clouds [31].

LR Time Instant   Site 1                      Site 2             Site 3
1                 Local                       Local              Local
2                 Local                       Local + Site 1     Local + Site 1
3                 Local                       Local              Local
4                 Local + Site 2 + Site 3     Local + Site 3     Local + Site 2
5                 Local                       Local              Local
6                 Local + Site 2              Local              Local + Site 2
7                 Local                       Local              Local
8                 Local + Site 2              Local + Site 1     Local + Site 1 + Site 2
9                 Local + Site 3              Local + Site 3     Local
10                Local                       Local              Local
11                Local                       Local              Local
12                Local                       Local + Site 1     Local + Site 1

TABLE VI
WORKLOAD EXECUTED AT EACH AMAZON REGION.

One of the first contributions for resource management in geographically distributed systems has been proposed in [7], where novel autonomic distributed load balancing algorithms have been proposed. The algorithms aim to reduce the communication overhead among the system components and to gradually shift portions of requests among the distributed nodes. The authors have proposed a trend-based activation scheme that is based on local system information and exploits cooperation among other nodes. In [49], mechanisms to optimize performance within a geographical node and to redirect requests to the best remote node have been proposed. In distributed streaming networks, the authors of [34] have proposed a joint admission control and resource allocation scheme, while [57] has proposed optimal scheduling techniques. Scheduling in streaming systems faces the problem that the incoming workload would far exceed system capacity much of the time. In cloud-based systems, a joint solution for the capacity allocation and load balancing among multiple IaaS sites based on Lagrangian decomposition techniques has been presented

in [11], [10], while in [48] a structured peer-to-peer network based on distributed hash tables has been proposed, supporting service discovery, self-managing, and load-balancing of cloud-based applications. In the distributed resource management area, researchers are also borrowing some ideas from the biological world [50]. Biology-inspired techniques have been used in self-aggregation algorithms to establish and maintain groups of software components that cooperate to reach a common goal [27], [45], to implement performance and energy-aware virtual machines live migration, and for the self-provisioning of cloud-based applications [19]. To the best of our knowledge, this paper is the first contribution that proposes an analytical solution to the capacity allocation and load redirect mechanism for cloud systems.

Fig. 18. Overall traffic served at Virginia EC2 site.
Fig. 19. Overall traffic served at North California EC2 site.
Fig. 20. Overall traffic served at Europe EC2 site.
Fig. 21. Average response time measured for the SPECweb2005 banking workload (axes: Time [10 sec], Response time [sec]).

With the focus on load redirect, a considerable amount of work has been done facing the problem of sharing the load evenly in massively distributed Web systems [24]. Two-level dispatching schemes, where client requests are initially assigned by the DNS, and each Web server may redirect a request to any

other server of the system through the HTTP redirection mechanism, have been proposed in [9], [20]. DNS-based routing is the first solution that has been proposed to handle multiple Web servers hosting a Web site. It was originally conceived for locally distributed Web systems, even if now it is commonly used in geographically distributed Web systems [41]. Other request redirection strategies that use the built-in HTTP mechanism have been proposed in [17], [35]. Finally, in [42] the URL rewriting mechanism has been proposed. Such a mechanism, integrated with a multiple-level DNS routing technique, is also used by Content Delivery Networks, such as Akamai [2].

Prediction algorithms also have an important role in the development of autonomic computing systems. A recent survey can be found in [22]. Prediction in a noisy context related to autonomic decisions does not require just the application of one suitable model, but a complex mechanism with several alternatives. The presence of dozens of models, each with many parameters, disorients the choice because, to the best of our knowledge, there are no criteria for choosing which model is better in a certain context, how to set the model parameters, and whether input data treatment is really necessary. Several models were proposed to forecast the resource state of servers that can be extended to Internet data center contexts. For example, the Linear Regression model was applied to Web-based systems [21], Exponential Smoothing to the running time of jobs [30], [51], Holt's methods to the throughput of large TCP transfers [37], the Autoregressive and Autoregressive Integrated Moving Average models to the load [29], [53] and network traffic [47], and tendency-based predictors, such as the Cubic Spline, to the CPU load [43].

VIII. CONCLUSION

In this paper, distributed Capacity Allocation and Load Redirect algorithms have been proposed, aiming at the minimization of costs in infrastructure as a service cloud systems. The presented solution acts at multiple time scales and integrates prediction techniques with non-linear optimization models.
A sound decomposition of the optimization problems has been provided. The effectiveness of our approach has been assessed by performing simulations and experiments in a real prototype environment running in Amazon EC2. Synthetic as well as realistic workloads and a number of different scenarios of interest have been considered. A comparison with other solutions proposed in the literature shows that our solutions outperform alternative methods, providing costs up to 35% lower without introducing significant SLA violations. Furthermore, our solutions are very close to the ones found by an oracle with perfect knowledge of the future. Ongoing work aims at extending the optimization problem in order to also support the decision on the capacity of the running instances and to model the use of private infrastructures.

ACKNOWLEDGEMENT

The experimentation on Amazon EC2 has been supported by the Amazon AWS in Education research grant. The work of Danilo Ardagna and Barbara Panicucci has been partially supported by the GAME-IT research project funded by Politecnico di Milano and by the European Commission, Programme IDEAS-ERC, Project 227977-SMScom. Sara Casolari and Michele Colajanni acknowledge the support of MIUR-PRIN project AUTOSEC "Autonomic Security".

REFERENCES

[1] B. Abraham and J. Ledolter. Statistical Methods for Forecasting. John Wiley and Sons, 1983.
[2] Akamai. http://www.akamai.com.
[3] J. Almeida, V. Almeida, D. Ardagna, I. Cunha, C. Francalanci, and M. Trubian. Joint admission control and resource allocation in virtualized servers. Journal of Parallel and Distributed Computing, 70(4):344-362, 2010.
[4] Amazon Inc. Amazon Elastic Cloud. http://aws.amazon.com/ec2/.
[5] Amazon Inc. AWS Elastic Beanstalk. http://aws.amazon.com/elasticbeanstalk/.
[6] Amazon Inc. Elastic Load Balancing. http://aws.amazon.com/elasticloadbalancing/.
[7] M. Andreolini, S. Casolari, and M. Colajanni. Autonomic request management algorithms for geographically distributed internet-based systems. In SASO, 2008.
[8] M. Andreolini, S. Casolari, and M. Colajanni. Models and framework for supporting run-time decisions in web-based systems. ACM Trans. on the Web, 2(3), 2008.
[9] D. Andresen, T. Yang, and O. H. Ibarra. Towards a scalable distributed WWW server on networked workstations. Journal of Parallel and Distributed Computing, 42:91-100, 1997.
[10] D. Ardagna, S. Casolari, and B. Panicucci. Flexible distributed capacity allocation and load redirect algorithms for cloud systems. In Proc. of the 4th International Conference on Cloud Computing (IEEE Cloud 2011). To appear, 2011.
[11] D. Ardagna, C. Ghezzi, B. Panicucci, and M. Trubian. Service provisioning on the cloud: Distributed algorithms for joint capacity allocation and admission control. In ServiceWave, 2010.
[12] D. Ardagna, B. Panicucci, M. Trubian, and L. Zhang.
Energy-Aware Autonomic Resource Allocation in Multi-tier Virtualized Environments. IEEE Trans. on Services Computing, available online.
[13] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a large Web-based shopping system. 1(1):44-69, Aug. 2001.
[14] Y. Baryshnikov, E. Coffman, G. Pierre, D. Rubenstein, M. Squillante, and T. Yimwadsana. Predictability of Web server traffic congestion. In Proc. of the 10th Workshop of Web Content Caching and Distribution, Sophia Antipolis, FR, Sep. 2005.
[15] Y. Baryshnikov, E. Coffman, G. Pierre, D. Rubenstein, M. Squillante, and Y. Yimwadsana. Predictability of web server traffic congestion. In WCW Proc., 2005.
[16] M. Bennani and D. Menascé. Resource Allocation for Autonomic Data Centers Using Analytic Performance Models. In IEEE Int'l Conf. Autonomic Computing Proc., 2005.
[17] T. Berners-Lee, R. Fielding, and H. Frystyk. Hypertext Transfer Protocol - HTTP/1.0. RFC 1945, May 1996.
[18] G. Bolch, S. Greiner, H. de Meer, and K. Trivedi. Queueing Networks and Markov Chains. J. Wiley, 1998.

[19] A. B. Caprarescu, N. M. Calcavecchia, E. Di Nitto, and D. J. Dubois. SOS cloud: Self-organizing services in the cloud. In BIONETICS '10: Proceedings of the 5th International Conference on Bio-Inspired Models of Network, Information and Computing Systems, 2010.
[20] V. Cardellini, M. Colajanni, and P. Yu. Request redirection algorithms for distributed web systems. IEEE Transactions on Parallel and Distributed Systems, 14(4):355-368, 2003.
[21] S. Casolari and M. Colajanni. Short-term prediction models for server management in internet-based contexts. Elsevier - Decision Support Systems, 48, 2009.
[22] S. Casolari and M. Colajanni. On the selection of models for runtime prediction of system resources. Autonomic Systems, Springer (Eds. Danilo Ardagna, Li Zhang), 2010.
[23] L. Cherkasova and P. Phaal. Session-Based Admission Control: A Mechanism for Peak Load Management of Commercial Web Sites. IEEE Transactions on Computers, 51(6), June 2002.
[24] M. Colajanni, P. S. Yu, and V. Cardellini. Dynamic load balancing in geographically distributed heterogeneous web servers. In ICDCS, pages 295-302, 1998.
[25] M. E. Crovella, M. S. Taqqu, and A. Bestavros. Heavy-tailed probability distributions in the World Wide Web. In A Practical Guide To Heavy Tails, pages 3-26. Chapman and Hall, New York, 1998.
[26] J. Dennis. A performance test of a run-based adaptive exponential smoothing. Production and Inventory Management, 19, 1978.
[27] E. Di Nitto, D. Dubois, and R. Mirandola. Self-aggregation algorithms for autonomic systems. In Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007. 2nd, 2007.
[28] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali. Cloud Computing: Distributed Internet Computing for IT and Scientific Research. IEEE Internet Computing, 13(5):10-13, 2009.
[29] P. Dinda and D. O'Hallaron. Host load prediction using linear models. Cluster Computing, 3(4), Dec. 2000.
[30] M. Dobber, G. Koole, and R. van der Mei. Dynamic load balancing experiments in a grid. In Proc.
of the 5th IEEE International Symposium on Cluster Computing and the Grid, May 2005.
[31] H. Erdogmus. Cloud computing: Does nirvana hide behind the nebula? IEEE Softw., 26(2):4-6, 2009.
[32] S. Everette and J. Gardner. Exponential smoothing: State of the art. Journal of Forecasting, 4, 1985.
[33] P. Felber, T. Kaldewey, and S. Weiss. Proactive hot spot avoidance for web server dependability. Reliable Distributed Systems, IEEE Symposium on, pages 309-318, 2004.
[34] H. Feng, Z. Liu, C. H. Xia, and L. Zhang. Load shedding and distributed resource control of stream processing networks. Perform. Eval., 64(9-12):1102-1120, 2007.
[35] R. T. Fielding, J. Gettys, J. C. Mogul, H. F. Frystyk, L. Masinter, P. J. Leach, and T. Berners-Lee. Hypertext Transfer Protocol - HTTP/1.1. RFC 2616, June 1999.
[36] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal on Optimization, 12:979-1006, 2002.
[37] Q. He, C. Dovrolis, and M. Ammar. On the predictability of large transfer TCP throughput. In Proc. of ACM SIGCOMM 2005, Aug. 2005.
[38] J. Jung, B. Krishnamurthy, and M. Rabinovich. Flash crowds and denial of service attacks: Characterization and implications for CDNs and Web sites. In WWW2002 Proc., Honolulu, HI, May 2002.
[39] M. Kendall and J. Ord. Time Series. Oxford University Press, 1990.

[40] D. Kumar, A. N. Tantawi, and L. Zhang. Real-time performance modeling for adaptive software systems with multi-class workload. In MASCOTS, 2009.
[41] T. T. Kwan, R. E. McGrath, and D. A. Reed. NCSA's World Wide Web server: Design and performance. IEEE Computer, 28(11):68-74, 1995.
[42] Q. Li and B. Moon. Distributed cooperative Apache web server. In Proc. of 10th Int'l World Wide Web Conf., May 2001.
[43] Y. Lingyun, I. Foster, and J. M. Schopf. Homeostatic and tendency-based CPU load predictions. In Proc. of the 17th Parallel and Distributed Processing Symp., Nice, FR, 2003.
[44] D. A. Menascé and V. Dubey. Utility-based QoS Brokering in Service Oriented Architectures. In IEEE ICWS Proc., pages 422-430, 2007.
[45] E. D. Nitto, D. Dubois, R. Mirandola, F. Saffre, and R. Tateson. Self-aggregation techniques for load balancing in distributed systems. In Proceedings of the 2008 Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems, 2008.
[46] G. Pacifici, W. Segmuller, M. Spreitzer, and A. Tantawi. CPU demand for web serving: Measurement analysis and dynamic estimation. Perform. Eval., 65(6-7):531-553, 2008.
[47] Y. Qiao and P. Dinda. Network traffic analysis, classification, and prediction. Technical report, 2003.
[48] R. Ranjan, L. Zhao, X. Wu, A. Liu, A. Quiroz, and M. Parashar. Peer-to-peer cloud provisioning: Service discovery and load-balancing. In N. Antonopoulos and L. Gillam, editors, Cloud Computing, Computer Communications and Networks, pages 195-217. Springer London, 2010.
[49] S. Ranjan and E. Knightly. High-performance resource allocation and request redirection algorithms for web clusters. IEEE Trans. Parallel Distrib. Syst., 19:1186-1200, September 2008.
[50] M. Shackleton and P. Marrow. Editorial, special issue on nature-inspired computation. BT Technology Journal, 2000.
[51] K. Shum. Adaptive distributed computing through competition. In Proc. of the 3rd IEEE International Conference on Configurable Distributed Systems, May 1996.
[52] SPEC. The SPECweb2005 benchmark. http://www.spec.org/web2005/.
[53] N. Tran and D.
Reed. Automatic ARIMA time series modeling for adaptive I/O prefetching. 15(4):362-377, Apr. 2004.
[54] D. Trigg and A. Leach. Exponential smoothing with an adaptive response rate. Operational Research Quarterly, 18, 1967.
[55] B. Urgaonkar, G. Pacifici, P. J. Shenoy, M. Spreitzer, and A. N. Tantawi. Analytic modeling of multitier Internet applications. ACM Transactions on the Web, 1(1), January 2007.
[56] D. Whybark. Comparison of adaptive forecasting techniques. Logistics Transportation Review, 8.
[57] J. L. Wolf, N. Bansal, K. Hildrum, S. Parekh, D. Rajan, R. Wagle, K.-L. Wu, and L. Fleischer. SODA: An optimizing scheduler for large-scale stream-based distributed computer systems. In Middleware, 2008.
[58] A. Wolke and G. Meixner. TwoSpot: A cloud platform for scaling out web applications dynamically. In ServiceWave, 2010.
[59] X. Zhu, D. Young, B. Watson, Z. Wang, J. Rolia, S. Singhal, B. McKee, C. Hyser, D. Gmach, R. Gardner, T. Christian, and L. Cherkasova. 1000 islands: An integrated approach to resource management for virtualized data centers. Journal of Cluster Computing, 12(1):45-57, 2009.

APPENDIX A

Theorem 1 Proof. Let us introduce multipliers $L_1$ for constraint (8), $L_2$ for the queue equilibrium condition (9), and $L_3$, $L_4$ for the non-negativity constraints on $x$ and $z$ respectively ($L_1$ is unrestricted in sign, while the other multipliers have to be non-negative). The Lagrangian for the (LR2) problem is

$$L(x, z, L_1, L_2, L_3, L_4) = D\,x + \frac{(N+M)\,(x+\lambda)}{C\mu(N+M) - (x+\lambda)} + (|I|-1)\,z\,G - L_1\,(\Lambda - x - z) - L_2\,\bigl(C\mu(N+M) - x - \lambda\bigr) - L_3\,x - L_4\,z.$$

The KKT conditions provide the following set of equations, to be solved together with the feasibility of $x$ and $z$ for problem (LR2):

$$\frac{\partial L}{\partial x} = D + \frac{C\mu\,(N+M)^2}{\bigl(C\mu(N+M) - (x+\lambda)\bigr)^2} + L_1 + L_2 - L_3 = 0$$
$$\frac{\partial L}{\partial z} = (|I|-1)\,G + L_1 - L_4 = 0$$
$$L_1\,(\Lambda - x - z) = 0$$
$$L_2\,\bigl(C\mu(N+M) - x - \lambda\bigr) = 0$$
$$L_3\,x = 0$$
$$L_4\,z = 0$$

First of all, note that we always have $L_2 = 0$, since the queue equilibrium condition is a strict inequality. Let us now examine the possible cases:

$x > 0$, $z > 0$: If we assume $x > 0$ and $z > 0$ then, by the complementary slackness conditions on the non-negativity constraints, we get $L_3 = L_4 = 0$. From the second equation of the KKT system we obtain $L_1 = -(|I|-1)\,G$, and then we can derive equation (12). Finally, from (8) we get case a, equation (13).

$x = \Lambda$: If we set $x = \Lambda$, then from equation (8) we get $z = 0$. The solution is feasible iff $\Lambda + \lambda < C\mu(N+M)$ holds. Hence, we get the case b conditions.

$z = \Lambda$: If we set $z = \Lambda$, from equation (8) we get $x = 0$, and hence the local incoming workload is totally redirected. The solution is feasible iff $\lambda < C\mu(N+M)$ holds. Hence, we get the case c conditions.

$x = 0$, $0 \le z < \Lambda$ or $0 \le x < \Lambda$, $z = 0$: Under these assumptions no solution can be obtained, since equation (8) cannot hold.
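The interior case ($x > 0$, $z > 0$) can be checked numerically with a small sketch. It assumes that, after substituting $z = \Lambda - x$ from constraint (8), the (LR2) objective reads $f(x) = D x + n (x + \lambda) / (C \mu n - (x + \lambda)) + g (\Lambda - x)$, with $n = N + M$ and $g = (|I| - 1) G$. This form, the function names, and the numeric values of $g$, $\lambda$ and $\Lambda$ are assumptions made for illustration. Setting $f'(x) = 0$ then gives $x = C \mu n - \lambda - n \sqrt{C \mu / (g - D)}$, which requires $g > D$.

```python
import math


def objective(x, workload, lam, D, C, mu, n_vms, g):
    """Assumed single-site objective, with z = workload - x substituted from constraint (8)."""
    capacity = C * mu * n_vms
    return D * x + n_vms * (x + lam) / (capacity - (x + lam)) + g * (workload - x)


def interior_x(lam, D, C, mu, n_vms, g):
    """Stationary point of the assumed objective for the interior case (x > 0, z > 0)."""
    if g <= D:
        raise ValueError("an interior stationary point requires g > D")
    capacity = C * mu * n_vms
    return capacity - lam - n_vms * math.sqrt(C * mu / (g - D))


if __name__ == "__main__":
    # D and mu are the values fitted in the EC2 test; all other values are hypothetical.
    params = dict(lam=10.0, D=0.66, C=1.0, mu=49.02, n_vms=2, g=1.0)
    x_star = interior_x(**params)
    h = 1e-4
    # Central finite difference of the objective around the stationary point.
    slope = (objective(x_star + h, 80.0, **params)
             - objective(x_star - h, 80.0, **params)) / (2 * h)
    print(x_star, slope)
```

The stationary point corresponds to case a only when it satisfies $0 < x < \Lambda$; otherwise one of the boundary cases applies.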