Control-theoretical load-balancing for cloud applications with brownout

Jonas Dürango¹, Manfred Dellkrantz¹, Martina Maggio¹, Cristian Klein², Alessandro Vittorio Papadopoulos¹, Francisco Hernández-Rodriguez², Erik Elmroth² and Karl-Erik Årzén¹

¹Department of Automatic Control, Lund University, Sweden; ²Department of Computing Science, Umeå University, Sweden

Abstract

Cloud applications are often subject to unexpected events like flash crowds and hardware failures. Without predictable behavior, users may abandon an unresponsive application. This problem has been partially solved on two separate fronts: first, by adding a self-adaptive feature called brownout inside cloud applications to bound response times by modulating user experience, and, second, by introducing replicas (copies of the application having the same functionalities) for redundancy and adding a load-balancer to direct incoming traffic. However, existing load-balancing strategies interfere with brownout self-adaptivity. Load-balancers are often based on response times, which are already controlled by the self-adaptive features of the application; hence they are not a good indicator of how well a replica is performing. In this paper, we present novel load-balancing strategies, specifically designed to support brownout applications. They base their decision not on response time, but on user experience degradation. We implemented our strategies in a self-adaptive application simulator, together with some state-of-the-art solutions. Results obtained in multiple scenarios show that the proposed strategies bring significant improvements when compared to the state-of-the-art ones.

I. INTRODUCTION

Cloud computing has dramatically changed the management of computing infrastructures. On one hand, public infrastructure providers, such as Amazon EC2, allow service providers, such as Dropbox and Netflix, to deploy their services on large infrastructures with no upfront cost [9], by simply leasing computing capacity in the form of Virtual Machines (VMs).
On the other hand, the flexibility offered by cloud technologies, which allow VMs to be hosted by any Physical Machine (PM) (or server), favors the adoption of private clouds [7]. Therefore, self-hosting service providers themselves are converting their computing infrastructures into small clouds.

One of the main issues with cloud computing infrastructures is application robustness to unexpected events. For example, flash crowds are sudden increases in the number of end-users, which may raise the required capacity by up to five times [7]. Similarly, hardware failures may temporarily reduce the capacity of the infrastructure while the failure is repaired [5]. Also, unexpected performance degradations may arise due to workload consolidation and the resulting interference among co-located applications [27]. Due to the large magnitude and short duration of such events, it may be economically too costly to keep enough spare capacity to properly deal with them. As a result, unexpected events may lead to infrastructure overload, which translates into unresponsive services, leading to dissatisfied end-users and revenue loss.

Corresponding author: jonas.durango@control.lth.se. This work was partially supported by the Swedish Research Council (VR) for the projects "Cloud Control" and "Power and temperature control for large-scale computing infrastructures", and through the LCCC Linnaeus and ELLIIT Excellence Centers.

Cloud services therefore greatly benefit from self-adaptation techniques [35], such as brownout [2, 25]. A brownout service adapts itself by reducing the amount of computation it executes to serve a request, so as to maintain the response time around a given setpoint. In essence, some computations are marked as mandatory (for example, displaying product information in an e-commerce website), while others are optional (for example, recommending similar products). Whenever an end-user request is received, the service can choose whether to execute the optional code according to its available capacity and to the previously measured response times.
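A minimal sketch of the per-request decision just described, written in Python (the language of the simulator used later in the paper). It assumes the probability `theta` of executing the optional code is already known and uses a hypothetical product page; `handle_request` and the response fields are illustrative names, not part of any real framework.

```python
import random


def handle_request(product_id, theta, rng=random):
    """Serve one request of a hypothetical e-commerce product page, brownout style.

    theta is the current probability of executing the optional code.
    """
    # Mandatory part: always executed.
    response = {"product": f"details of {product_id}"}
    # Optional part: executed only with probability theta.
    if rng.random() < theta:
        response["recommendations"] = [f"items similar to {product_id}"]
    return response


# Usage: with theta = 0.3, roughly 30% of the responses carry recommendations.
gen = random.Random(42)
served = [handle_request("p1", 0.3, gen) for _ in range(10_000)]
fraction_full = sum("recommendations" in r for r in served) / len(served)
```

Here the optional part is decided by an independent coin flip per request; the controller that sets `theta` from measured response times is a separate component.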
Note that executing optional code directly translates into a better service for the end-user and more revenue for the service provider. This approach has proved to be successful for dealing with unexpected events [2]. However, in that work, brownout services were composed of a single replica, i.e., a single copy of the application, running inside a single VM. In this paper, we extend the brownout paradigm to services featuring multiple replicas, i.e., multiple, independent copies of the same application, serving the user the same data, hosted inside individual VMs. Since each VM can be hosted by a different PM, this enhances brownout services in two directions. First, the scalability of a brownout application (the ability of an application to deal with more users by adding more computing resources) is improved, since applications are no longer limited to using the resources of a single PM. Second, resilience is improved: in case a PM fails, taking down a replica, other replicas whose VMs are hosted on different PMs can seamlessly take over.

The component that decides which replica should serve a particular end-user request is called a load-balancer. Although load-balancing techniques have been widely studied [5, 23, 24, 29], state-of-the-art load-balancers forward requests based on metrics that cannot discriminate between a replica that is avoiding overload by not executing the optional code and a replica that is not subject to overload. Therefore, the novelty of our problem consists in finding a brownout-compliant load-balancing technique that is aware of each replica's self-adaptation mechanism.

The contribution of this paper is summarized as follows. We present extensions to load-balancing architectures and the required enhancements to the replicas that convey information about served optional content and allow
to deal with brownout services efficiently (Section III). We propose novel load-balancing algorithms that, by receiving information about the adaptation happening at the replica level, try to maximize the performance of brownout services in terms of frequency of execution of the optional code (Section IV). We show through simulations that our brownout-aware load-balancing algorithms outperform state-of-the-art techniques (Section V).

II. RELATED WORK

Load-balancers are standard components of Internet-scale services [4], allowing applications to achieve scalability and resilience [5, 8, 4]. Many load-balancing policies have been proposed, aiming at different optimizations, spanning from equalizing processor load [37] to managing memory pools [3, 32], to specific optimizations for iterative algorithms [4]. Load-balancing policies often target web server systems [, 26], where one of the most important goals is to bound the maximum response time that the clients are exposed to [9]. Load-balancing strategies can be guided by many different purposes, for example geographical [2, 33], driven by the electricity price to reduce the datacenter operation cost [5], or specifically designed for cloud applications [5, 23, 24].

Load-balancing solutions can be divided into two different types: static and dynamic. Static load-balancing refers to a fixed, non-adaptive strategy to select a replica to direct traffic to [3, 38]. The most commonly used technique is based on selecting each replica in turn, called Round Robin (RR). It can be either deterministic, storing the last selected replica, or probabilistic, picking a replica at random (Random). However, due to their static nature, such techniques would not perform well when applied to brownout-compliant applications, as they take into account neither the inherent fluctuations of a cloud environment nor the control strategy at the replica level, which leads to changing capabilities of the replicas. On the contrary, dynamic load-balancing is based on measurements of the current system state.
One popular option is to choose the replica that had the lowest response time in the past. We refer to this algorithm as Fastest Replica First (FRF) if the choice is based on the last measured response time of each replica, and FRF-EWMA if the choice is based on an Exponentially Weighted Moving Average over the past response times of each replica. A variation of this algorithm is Two Random Choices (2RC) [28], which randomly chooses two replicas and assigns the request to the fastest one, i.e., the one with the lowest maximum response time. Through experimental results, we determined that FRF, FRF-EWMA and 2RC are unsuitable for brownout applications. They base their decision on response times alone, which leads to inefficient decisions for brownout services. Indeed, such services already keep their response time at a given setpoint, at the expense of reducing the ratio of optional content served. Hence, by measuring response time alone, it is not possible to discriminate between a replica that is avoiding overload by not executing the optional code and a replica that is not subject to overload and executes all optional code, since both achieve the desired response times.

Another adopted strategy is based on the pending request count and generally called Shortest Queue First (SQF): the load-balancer tracks the pending requests and selects the replica with the least number of requests waiting for completion. This strategy pays off in architectures where the replicas have similar capacities and the requests are homogeneous. To account for non-homogeneity, Pao and Chen proposed a load-balancing solution using the remaining capacity of the replicas to determine how the next request should be managed [3]. The capacity is determined through a combination of factors like the remaining available CPU and memory, the network transmission and the current pending request count. Other approaches have been proposed that base their decision on remaining capacity.
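The baseline policies discussed so far can be sketched as small selection functions. This is an illustrative reading of the descriptions above, not the code of the cited systems; the function names and the EWMA smoothing factor `alpha` are assumptions.

```python
import random


def round_robin(state, n):
    """Deterministic RR: store the last selected replica and move to the next one."""
    state["last"] = (state.get("last", -1) + 1) % n
    return state["last"]


def pick_random(n, rng=random):
    """Probabilistic variant (Random): pick a replica uniformly at random."""
    return rng.randrange(n)


def fastest_replica_first(response_times):
    """FRF: replica with the lowest (last measured or averaged) response time."""
    return min(range(len(response_times)), key=response_times.__getitem__)


def ewma(previous, sample, alpha=0.3):
    """Exponentially weighted moving average used by FRF-EWMA (alpha is assumed)."""
    return alpha * sample + (1.0 - alpha) * previous


def two_random_choices(max_response_times, rng=random):
    """2RC: sample two distinct replicas, keep the one with the lower maximum response time."""
    i, j = rng.sample(range(len(max_response_times)), 2)
    return i if max_response_times[i] <= max_response_times[j] else j


def shortest_queue_first(pending):
    """SQF: replica with the fewest requests waiting for completion."""
    return min(range(len(pending)), key=pending.__getitem__)
```

FRF-EWMA is obtained by feeding `fastest_replica_first` with per-replica `ewma` values instead of the last samples. Every function ranks replicas by response time or queue length only: two brownout replicas that both sit at their response-time setpoint look identical to FRF and 2RC, even if one serves all optional content and the other almost none.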
However, since brownout applications indirectly control CPU utilization by adjusting the execution of optional content, so as to be prepared for possible request bursts, remaining capacity alone is not an indicator of how well a brownout replica is performing.

A merge of the fastest-replica and the pending-request-count approaches was implemented in the BIG-IP Local Traffic Manager [6], where the replicas are ranked based on a linear combination of response times and number of routed requests. Since the exact specification of this algorithm is not open, we tried to mimic it as follows: a Predictive load-balancer ranks the replicas based on the difference between the past metrics and the current ones. One of the solutions proposed in this paper extends the idea of looking at the difference between the past behavior and the current one, although our solution observes the changes in the ratio of optional code served and tries to maximize the number of requests served with the full computation enabled.

Dynamic solutions can be control-theoretical [2, 42] and can also account for the cost of applying the control action [4] or for the load trend [2]. This is especially necessary when the load-balancer also acts as a resource allocator, deciding not only where to route the current request but also how many resources it would have to execute, like in [3]. In these cases, the induced sudden lack of resources can result in poor performance. However, we focus only on load-balancing solutions, since brownout applications already take care of the potential lack of resources [2].

III. PROBLEM STATEMENT

Load-balancing problems can be formulated in many ways. This is especially true for the case addressed in this paper, where the load-balancer should distribute the load to adaptive entities that themselves play a role in adjusting to the current situation. This section discusses the characteristics of the considered infrastructure and formulates the problem under analysis.
Figure 1 illustrates the software architecture that is deployed to execute a brownout-compliant application composed of multiple replicas. Despite the modifications needed to make it brownout-compliant, the architecture is widely accepted as the reference one for cloud applications [5].

[Diagram: clients → load-balancer (rate $\lambda$) → replica $i$ (rate $\lambda_i$), $i = 1, \dots, n$; a controller inside each replica $i$ measures $t_i$ and sets $\theta_i$.]
Fig. 1. Architecture of a brownout-compliant cloud application featuring multiple replicas.

Given the generic cloud application architecture, the application can only be accessed through the load-balancer. The clients are assumed to be closed-loop: they first send a request, wait for the reply, then think by waiting for an exponentially distributed time interval, and repeat. This client model is a fairly good approximation for users that interact with websites requiring a pre-defined number of requests to complete a goal, such as buying a product [6] or booking a flight. The resulting traffic has an unknown but measurable rate $\lambda$.

Each client request is received by the load-balancer, which sends it to one of the $n$ replicas. The chosen replica produces the response and sends it back to the load-balancer, which forwards it to the original client. We measure the response time of a request as the time spent within the replica, assuming that the load-balancer execution and the routing itself take negligible time. Since the responses are routed back through the load-balancer, it is possible to attach to them information that aids balancing decisions.

Each replica $i$ receives a fraction $\lambda_i$ of the incoming traffic and is a stand-alone version of the application. More specifically, each replica receives requests at a rate $\lambda_i = w_i \lambda$, such that $w_i \ge 0$ and $\sum_i w_i = 1$. In this case, the load-balancer simply computes the replica weights $w_i$ according to its load-balancing policy.

Special to our case is the presence of a controller within each replica [2]. This controller receives periodic measurements of the response time $t_i$ of the requests served by the replica, and adjusts the percentage $\theta_i$ of requests served with optional components. Here $t_i$ is the 95th percentile of the response times over a control period.
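Under the weighted scheme above, the load-balancer only needs to draw a replica index with probability $w_i$ for each incoming request, and each replica can summarize a control period by its 95th percentile. The sketch below is a hypothetical illustration: `dispatch` and `percentile_95` are illustrative names, the draw uses Python's `random.choices`, and the nearest-rank percentile is one of several common definitions (the text does not state which one the replicas use).

```python
import random


def dispatch(weights, rng=random):
    """Weighted routing: replica i is chosen with probability w_i,
    so on average it receives requests at rate lambda_i = w_i * lambda."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def percentile_95(samples):
    """Nearest-rank 95th percentile of the (non-empty) response times of one control period."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * N) with integer arithmetic
    return ordered[rank - 1]
```

Drawing per request keeps the load-balancer stateless between weight updates; the long-run share of requests reaching replica $i$ converges to $w_i$.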
Following the approach of [2], we model the response times from a replica as
$$ t_{k+1} = \alpha_k \, \theta_k, $$
where $\alpha_k$ is an unknown parameter estimated online (details omitted here). The control loop is then closed using the PI controller
$$ \theta_{k+1} = \theta_k + \frac{1-p}{\hat{\alpha}_k} \, e_{k+1}, $$
where $e_{k+1}$ is the control error and $p$ the closed-loop pole. As the controller output is restricted, anti-windup measures are employed. In our experiments, $p$ is set to .99 and the replica control period to .5 s, while the load-balancer acts every second.

As prescribed by the brownout paradigm, a replica responds to requests either partially, where only mandatory content is included in the reply, or fully, where both mandatory and optional content are included. This decision is taken independently for each request, with probability $\theta_i$ of serving the full response. The service rate for a partial response is $\mu_i$, while a full response is generated with rate $M_i$. Obviously, partial replies are faster to compute than full ones; hence, $\mu_i \ge M_i$. Assuming the replica is not saturated, it serves requests fully at a rate $\lambda_i \theta_i$ and partially at a rate $\lambda_i (1 - \theta_i)$.

Many alternatives can be envisioned on how to extend existing load-balancers to deal with brownout-compliant applications. In our choice, the load-balancer receives information about $\theta_i$ from the replicas. This solution results in less computationally intensive load-balancers than the case where the load-balancer should somehow estimate the probability of executing the optional components, but it requires additional communication. The overhead, however, is very limited, since only one value is reported per replica. For the purpose of this paper, we assume that, to aid load-balancing decisions, each replica piggy-backs the current value of $\theta_i$ on the reply, so that this value can be observed by the load-balancer with limited overhead. The load-balancer does not have any knowledge of how each replica controller adjusts the percentage $\theta_i$; it only knows the reported value. This allows us to completely separate the action of the load-balancer from that of the self-adaptive application.
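The replica-side loop can be sketched in a few lines. This sketch assumes that $e_{k+1}$ is the setpoint minus the measured response time and that clamping $\theta$ to $[0, 1]$ is the anti-windup measure; the online estimator of $\alpha_k$ is not shown (it is omitted in the text as well), so `alpha_hat` is simply passed in. `dimmer_update` and `serve_and_report` are hypothetical names.

```python
def dimmer_update(theta, t_measured, setpoint, alpha_hat, pole=0.99):
    """One controller step: theta_{k+1} = theta_k + (1 - p) / alpha_hat * e_{k+1}.

    alpha_hat is the current online estimate of alpha (its estimator is not shown).
    Clamping theta to [0, 1] acts as a simple anti-windup scheme.
    """
    error = setpoint - t_measured  # e_{k+1}: positive when the replica is faster than required
    theta_next = theta + (1.0 - pole) / alpha_hat * error
    return min(1.0, max(0.0, theta_next))  # saturate the control output


def serve_and_report(theta, body):
    """Hypothetical reply format: the replica piggy-backs its current theta on the response."""
    return {"body": body, "theta": theta}


# Closed-loop check on the model t_{k+1} = alpha * theta_k with a known alpha.
alpha, setpoint, theta = 2.0, 1.0, 0.1
for _ in range(60):
    t = alpha * theta
    theta = dimmer_update(theta, t, setpoint, alpha_hat=alpha, pole=0.5)
# theta converges to setpoint / alpha = 0.5, the error shrinking by a factor `pole` per step.
```

With an exact estimate $\hat{\alpha} = \alpha$, the update gives $\theta_{k+1} - \theta^\ast = p\,(\theta_k - \theta^\ast)$ with $\theta^\ast$ equal to the setpoint divided by $\alpha$, so $p$ is indeed the closed-loop pole: a value close to 1, such as the one used in the experiments, yields slow but cautious adaptation.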
Given this architecture, we want to solve the problem of designing a load-balancer policy. Knowing the values of $\theta_i$ for each replica $i \in [1, n]$, a load-balancer should compute the values of the weights $w_i$ such that
$$ \sum_{k} \sum_{i=1}^{n} w_i(k)\,\theta_i(k) \tag{1} $$
is maximized, where $k$ denotes the discrete time. Given that we have no knowledge of the evolution in time of the involved quantities, we aim to maximize the quantity $\sum_{i=1}^{n} w_i \theta_i$ at every time instant, assuming that this will maximize the quantity defined in Equation (1). In other words, at any time instant the load-balancer should maximize the ratio of requests served with the optional part enabled. In practice, this would also maximize the application owner's revenue [2].

IV. SOLUTION

This section describes three different solutions for balancing the load directed to self-adaptive brownout-compliant applications composed of multiple replicas. The first two strategies are heuristic solutions that take into account the self-adaptivity of the replicas. The third alternative is based on optimization, with the aim of providing guarantees on the best possible behavior.

A. Variational principle-based heuristic (VPBH)

Our first solution is inspired by the predictive approach described in Section II. The core of the predictive solution is to examine the variation of the involved quantities. While in its classical form this solution relies on variations of response
times or pending request count per replica, our solution is based on how the control variables $\theta_i$ are changing. If the percentage $\theta_i$ of optional content served is increasing, the replica is assumed to be less loaded, and more traffic can be sent to it. On the contrary, when the optional content decreases, the replica will receive less traffic, to decrease its load and allow it to increase $\theta_i$.

The replica weights $w_i$ are initialized to $1/n$, where $n$ is the number of replicas. The load-balancer periodically updates the values of the weights based on the values of $\theta_i$ received from the replicas. At time $k$, denoting with $\Delta\theta_i(k)$ the variation $\theta_i(k) - \theta_i(k-1)$, the solution computes a potential weight $\tilde{w}_i(k+1)$ according to
$$ \tilde{w}_i(k+1) = w_i(k)\,\bigl[1 + \gamma_P\,\Delta\theta_i(k) + \gamma_I\,\theta_i(k)\bigr], \tag{2} $$
where $\gamma_P$ and $\gamma_I$ are constant gains, related respectively to a proportional and an integral load-balancing action. As calculated, the $\tilde{w}_i$ values can be negative. This is clearly not feasible; therefore, negative values are truncated to a small but still positive weight $\varepsilon$. Using a positive weight instead of zero allows us to probe the replica and see whether it is responding favorably to new incoming requests. Moreover, the computed values do not respect the constraint that their sum is equal to 1, so they are then re-scaled according to
$$ w_i(k+1) = \frac{\max\bigl(\tilde{w}_i(k+1), \varepsilon\bigr)}{\sum_{j=1}^{n} \max\bigl(\tilde{w}_j(k+1), \varepsilon\bigr)}. \tag{3} $$

We selected $\gamma_P = .5$ based on experimental results. Once $\gamma_P$ is fixed, increasing the integral gain $\gamma_I$ calls for a stronger action on the load-balancing side: the load-balancer then takes decisions that are strongly influenced by the current values of $\theta_i$, greatly improving performance at the cost of a more aggressive control action. On the contrary, decreasing $\gamma_I$ smooths the control signal, possibly resulting in performance loss due to a slower reaction time. The choice of the integral gain thus allows one to exploit the trade-off between performance and robustness. For the experiments we chose $\gamma_I = 5$.

B. Equality principle-based heuristic (EPBH)

The second policy is based on the heuristic that a near-optimal situation is reached when all replicas serve the same percentage of optional content. Based on this assumption, the control variables $\theta_i$ should be as close as possible to one another. If the values of $\theta_i$ converge to a single value, the traffic is routed so that each replica can serve the same percentage of optional content, i.e., a more powerful replica receives more traffic than a less powerful one. This approach therefore selects weights that encourage the control variables $\theta_i$ to converge towards the mean $\frac{1}{n}\sum_j \theta_j$. The policy computes a potential weight
$$ \tilde{w}_i(k+1) = w_i(k) + \gamma_e \Bigl( \theta_i(k) - \frac{1}{n}\sum_{j=1}^{n} \theta_j(k) \Bigr), \tag{4} $$
where $\gamma_e$ is a strictly positive parameter that determines how fast the algorithm converges. For the experiments we chose $\gamma_e = .25$. The weights are simply modified proportionally to the difference between the current control value of each replica and the average control value of all replicas. Clearly, the same saturation and normalization described in Equation (3) has to be applied to this solution, to ensure that the weights sum to one and are positive, i.e., that all the incoming traffic is directed to the replicas and that each replica receives at least some requests.

C. Convex optimization based load-balancing (COBLB)

The third approach is to update the replica weights based on the solution of an optimization problem, where the objective is to maximize the quantity $\sum_i w_i \theta_i$. In this solution, each replica is modeled as a queueing system using a Processor Sharing (PS) discipline. The clients are assumed to arrive according to a Poisson process with intensity $\lambda_i$, and upon arrival they enter the queue, where they receive a share of the replica's processing capability. The simplest queueing models assume the time required for serving a request to be exponentially distributed with rate $\mu$. However, in the case of brownout, the requests are served either with or without optional content, with rates $M_i$ and $\mu_i$, respectively.
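The two-rate service model can be checked numerically with a short Monte Carlo sketch: each request uses rate $M_i$ with probability $\theta_i$ and rate $\mu_i$ otherwise, and the empirical mean service time should approach $(1-\theta_i)/\mu_i + \theta_i/M_i$, the quantity that enters the M/G/1/PS formulas that follow. The parameter values are arbitrary and the function names hypothetical.

```python
import random


def sample_service_time(theta, M, mu, rng=random):
    """Draw one service time for a replica with dimmer theta.

    With probability theta the request includes optional content (rate M);
    otherwise only the mandatory part is served (rate mu, with mu >= M).
    """
    rate = M if rng.random() < theta else mu
    return rng.expovariate(rate)


def mean_service_time(theta, M, mu):
    """Mean of the two-component mixture: (1 - theta) / mu + theta / M."""
    return (1.0 - theta) / mu + theta / M


gen = random.Random(7)
samples = [sample_service_time(0.3, M=2.0, mu=20.0, rng=gen) for _ in range(200_000)]
empirical_mean = sum(samples) / len(samples)
```

Only the mean matters for the PS response-time formula, which is why a single number per replica suffices in the derivation, even though the distribution itself is a mixture.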
The distribution of service times $S_i$ for the replicas can therefore be modelled as a mixture of two exponential distributions, with probability density function
$$ f_{S_i}(t) = (1-\theta_i)\,\mu_i e^{-\mu_i t} + \theta_i\, M_i e^{-M_i t}, \tag{5} $$
where $t$ represents continuous time and $\theta_i$ is the probability of activating the optional components. Thus, a request entering the queue of replica $i$ receives an exponentially distributed service time whose rate is $M_i$ with probability $\theta_i$ and $\mu_i$ with probability $1-\theta_i$. The resulting queueing system model is of type M/G/1/PS and has been proven suitable to simulate the behavior of web servers []. It is known that for M/G/1 queueing systems adopting the PS discipline, the mean response times depend on the service time distribution only through its mean [22, 34], here given for each replica by
$$ \bar{\mu}_i = E[S_i]^{-1} = \Bigl[ \frac{1-\theta_i}{\mu_i} + \frac{\theta_i}{M_i} \Bigr]^{-1}. \tag{6} $$
The mean response times of an M/G/1/PS system are in turn given by
$$ \tau_i = \frac{1}{\bar{\mu}_i - \lambda w_i}. \tag{7} $$
The service rates $\bar{\mu}_i$ needed to ensure that there is no stationary error can be obtained by inverting Equation (7):
$$ \bar{\mu}_i = \frac{1 + \tau_i \lambda w_i}{\tau_i}, \tag{8} $$
with $\tau_i$ being the setpoint for the response time of replica $i$. Combining Equations (6) and (8), it is then possible to calculate the steady-state control variables $\theta_i$ that give the desired behavior:
$$ \theta_i = \frac{M_i(\mu_i \tau_i - \lambda w_i \tau_i - 1)}{(1 + \lambda w_i \tau_i)(\mu_i - M_i)} = \frac{A_i - B_i w_i}{C_i + D_i w_i}, \tag{9} $$
with $A_i$, $B_i$, $C_i$ and $D_i$ all positive. Note that the values of $\theta_i$ are not used in the replicas and are simply computed by the
optimization-based load-balancer as the optimal stationary conditions for the control variables $\theta_i$. Clearly, one could also think of using these values within the replicas, but in this investigation we want to completely separate the load-balancing policy from the replicas' internal control loops.

Recalling that $\theta_i$ is the probability of executing the optional components when producing the response, the values $\theta_i$ should be constrained to belong to the interval $[0, 1]$, yielding the following inequalities (under the reasonable assumptions that $\tau_i > 1/M_i$ and $\mu_i \ge M_i$):
$$ \frac{A_i - C_i}{B_i + D_i} \le w_i \le \frac{A_i}{B_i}. \tag{10} $$
Using these inequalities as constraints, it is possible to formally state the optimization problem as
$$ \max_{w} \; J = \sum_{i=1}^{n} w_i \theta_i = \sum_{i=1}^{n} w_i \, \frac{A_i - B_i w_i}{C_i + D_i w_i} \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = 1, \quad \frac{A_i - C_i}{B_i + D_i} \le w_i \le \frac{A_i}{B_i}. \tag{11} $$
Since the objective function $J$ is concave and the constraints are linear in $w_i$, this is a convex optimization problem (maximization of a concave function over a polyhedron) and can be solved using efficient methods [8]. We use an interior-point algorithm, implemented in CVXOPT, a Python library for convex optimization problems, to obtain the values of the weights. Notice that solving optimization problem (11) guarantees that the best possible solution is found for the single-time-instant problem, but requires a lot of knowledge about the individual replicas. In fact, while the other solutions require knowledge only about the incoming traffic and the control variables of each replica, the optimization-based solution relies on knowledge of the service rates of requests with and without optional content, $M_i$ and $\mu_i$, which might not be available and could require additional computations to be estimated correctly.

V. EVALUATION

In this section we describe our experimental evaluation, discussing the performance indicators used to compare different strategies, the simulator developed and used to emulate the behavior of brownout-compliant replicas driven by the load-balancer, and our case studies.

A. Performance indicators

Performance measures are necessary to objectively compare different algorithms.
Our first performance indicator is defined as the percentage $\%_{oc}$ of the total requests served with the optional content enabled, which is a reasonable metric given that we assume that users perform a certain number of clicks to use the application. We also introduce other performance metrics to compare the implemented load-balancing techniques. For this, we use the user-perceived stability $\sigma_u$ [2]. This metric refers to the variation of performance as observed by the users, and it is measured as the standard deviation of response times. Its purpose is to measure the ability of the replicas to respond timely to the client requests. The entire brownout framework aims at stabilizing the response times; therefore it should achieve better user-perceived stability regardless of the presence of the load-balancer. However, the load-balancing algorithm clearly influences the perceived response times, so it is logical to check whether the newly developed algorithms achieve better perceived stability than the classical ones. Together with the user-perceived stability, we also report the average response time $\mu_u$, to distinguish algorithms that achieve a low response time with possibly high fluctuations from solutions that achieve a higher but more stable response time.

CVXOPT: http://cvxopt.org/

B. Simulator

To test the load-balancing strategies, a Python-based simulator for brownout-compliant applications is used. In the simulator, it is easy to plug in new load-balancing algorithms. The simulator is based on the concepts of Client, Request, LoadBalancer and Replica. When a new client is defined, it can behave according to the open-loop client model, where it simply issues a certain number of unrelated requests (as is true for clients that respect the Markovian assumption), or according to the closed-loop one [, 36].
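Since new load-balancing algorithms can be plugged into the simulator, the weighted strategies of Section IV reduce to functions that map the reported $\theta_i$ to new weights once per load-balancer period. The sketch below is a hypothetical illustration of such plug-ins: the names, the value of $\varepsilon$ and the gains in the usage are assumptions, not the simulator's API. For COBLB it does not use the CVXOPT interior-point solver mentioned in Section IV; instead it exploits the fact that the COBLB objective is a sum of concave per-replica terms under one equality constraint and box bounds, and bisects on the Lagrange multiplier of $\sum_i w_i = 1$, a standard technique for separable concave allocation problems. The coefficients follow from expanding Equation (9): $A_i = M_i(\mu_i\tau_i - 1)$, $B_i = M_i\lambda\tau_i$, $C_i = \mu_i - M_i$ and $D_i = (\mu_i - M_i)\lambda\tau_i$.

```python
def normalize(raw, eps=1e-3):
    """Saturation and normalization as in Eq. (3): truncate to eps, rescale to sum 1."""
    clipped = [max(w, eps) for w in raw]
    total = sum(clipped)
    return [w / total for w in clipped]


def vpbh_step(w, theta, theta_prev, gamma_p, gamma_i, eps=1e-3):
    """VPBH, Eq. (2): favor replicas whose theta is high and increasing."""
    raw = [wi * (1.0 + gamma_p * (t - tp) + gamma_i * t)
           for wi, t, tp in zip(w, theta, theta_prev)]
    return normalize(raw, eps)


def epbh_step(w, theta, gamma_e, eps=1e-3):
    """EPBH, Eq. (4): move each theta_i towards the mean of all thetas."""
    mean = sum(theta) / len(theta)
    raw = [wi + gamma_e * (t - mean) for wi, t in zip(w, theta)]
    return normalize(raw, eps)


def coblb_coefficients(M, mu, tau, lam):
    """A_i, B_i, C_i, D_i of Eq. (9); assumes mu > M and tau > 1/M."""
    return M * (mu * tau - 1.0), M * lam * tau, mu - M, (mu - M) * lam * tau


def objective_term(w, A, B, C, D):
    """w_i * theta_i, with the steady-state theta_i of Eq. (9)."""
    return w * (A - B * w) / (C + D * w)


def _slope(w, A, B, C, D):
    """Derivative of objective_term; it decreases in w because the term is concave."""
    return (A * C - 2.0 * B * C * w - B * D * w * w) / (C + D * w) ** 2


def _best_w(nu, lo, hi, coef, iters=80):
    """Maximize objective_term(w) - nu * w over [lo, hi] by bisection on the slope."""
    if _slope(lo, *coef) <= nu:
        return lo
    if _slope(hi, *coef) >= nu:
        return hi
    a, b = lo, hi
    for _ in range(iters):
        m = 0.5 * (a + b)
        if _slope(m, *coef) > nu:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


def coblb_step(M, mu, tau, lam, iters=200):
    """Maximize sum_i w_i * theta_i s.t. sum_i w_i = 1 and 0 <= theta_i <= 1."""
    coefs = [coblb_coefficients(m, u, t, lam) for m, u, t in zip(M, mu, tau)]
    # Bounds on w_i from 0 <= theta_i <= 1, additionally clipped to [0, 1].
    bounds = [(max(0.0, (A - C) / (B + D)), min(1.0, A / B)) for A, B, C, D in coefs]
    if sum(lo for lo, _ in bounds) > 1.0 or sum(hi for _, hi in bounds) < 1.0:
        raise ValueError("no feasible weights for these parameters")
    # Bisection on the Lagrange multiplier nu of the constraint sum_i w_i = 1.
    nu_lo = min(_slope(hi, *c) for (_, hi), c in zip(bounds, coefs))
    nu_hi = max(_slope(lo, *c) for (lo, _), c in zip(bounds, coefs))
    for _ in range(iters):
        nu = 0.5 * (nu_lo + nu_hi)
        total = sum(_best_w(nu, lo, hi, c) for (lo, hi), c in zip(bounds, coefs))
        if total > 1.0:
            nu_lo = nu  # too much weight handed out: raise the price
        else:
            nu_hi = nu
    nu = 0.5 * (nu_lo + nu_hi)
    return [_best_w(nu, lo, hi, c) for (lo, hi), c in zip(bounds, coefs)]
```

Because each term $w_i\theta_i$ is concave in $w_i$, its slope decreases monotonically, so the per-replica maximizer is unique for every multiplier value and the total weight is monotone in the multiplier, which is what makes the outer bisection valid.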
Closed-loop clients issue a request and wait for the response; when they receive the response, they think for some time (in the simulations this time is exponentially distributed with mean s) and subsequently send another request to the application. While this second model is more realistic, the first one is still useful to simulate the behavior of a large number of clients. The simulator implements both models, to allow for complete tests, but we evaluate our results with closed-loop clients, given the nature of the applications, which require users to perform a certain number of clicks.

Requests are received by the load-balancer, which directs them towards different replicas. The load-balancer can work on a per-request basis or based on weights. The first case is used to simulate policies like Round Robin, Random, Shortest Queue First and so on, which do not rely on the concept of weights. The weighted load-balancer is used to simulate the strategies proposed in this paper.

Each replica simulates the computation necessary to serve the request and chooses whether it should be executed with or without the optional components activated. If the optional content is served, the service time is a random number drawn from a Gaussian distribution with mean $\phi_i$ and variance ., while if the optional content is not served, the mean is $\psi_i$ and the variance is .. The parameters $\phi_i$ and $\psi_i$ are specified when replicas are created and can be changed during the execution. The service rate of requests with the optional component is $M_i = 1/\phi_i$, while for serving only the mandatory part of the request the service rate is $\mu_i = 1/\psi_i$. The replicas also execute an internal control loop to select their control variables $\theta_i$ [2]. The replicas use PS to process the requests in the queue, meaning that each of the $n$ active
requests gets $1/n$ of the processing capability of the replica.

The simulator receives as input a Scenario, which describes what can happen during the simulation. The scenario definition supports the insertion of new clients and the removal of existing ones. It also allows turning replicas on and off at specific times during the execution, and changing the service times of every replica, both for the optional components and for the mandatory ones. This simulates a change in the amount of resources given to the machine hosting the replica, and it is based on the assumption that these changes are unpredictable and can happen at the architecture level, for example due to the cloud provider co-locating more applications onto the same physical hardware, thereby reducing their computation capability [39]. With the scenarios, it is easy to simulate different working conditions and to have a complete overview of the changes that might happen during the load-balancing and replica execution. In the following, we describe two experiments conducted to compare the load-balancing strategies under different execution conditions.

C. Reacting to client behavior

The aim of the first test is to evaluate the performance of different algorithms when new clients arrive and existing clients disconnect. In the experiment the infrastructure is composed of four replicas. The first replica is the fastest one and has $\phi_1 = .5$ s (average time to execute both the mandatory and the optional components) and $\psi_1 = .5$ s (average time to compute only the mandatory part of the response). The second replica is slower, with $\phi_2 = .25$ s and $\psi_2 = .25$ s. The third and fourth replicas are the slowest ones, having $\phi_{3,4} = .5$ s and $\psi_{3,4} = .5$ s. Clients adhere to the closed-loop model. 5 clients are accessing the system at time s, and of them are removed after 2s. At time 4s, 25 more clients query the application, and 25 more arrive again at 6s. 4 clients disconnect at time 8s and the simulation is ended at time s.
The right column in Figure 2 shows the control variable $\theta_i$ for each replica, while the left column shows the effective weights $w_i$, i.e., the weights actually assigned by the load-balancing strategies, computed a posteriori. Since solutions like RR do not assign the weights directly, we computed the effective values that result from the load-balancing assignments. The algorithms are ordered by decreasing percentage $\%_{oc}$ of optional content served: EPBH achieves the best percentage overall, followed by VPBH and by COBLB. For this scenario, the brownout-aware strategies achieve better results in terms of percentage of optional content served. The SQF algorithm is the only existing one capable of achieving similar (yet lower) performance in terms of optional content delivered. The scenario also illustrates the benefit of using a brownout-aware strategy, as there is a constant underutilization of replica 1 with SQF.

[Figure: one row per algorithm, ordered by decreasing $\%_{oc}$ (legend values: EPBH 8.9%, VPBH 78.9%, COBLB 78.%, SQF 67.%, FRF-EWMA 6.8%, 2RC 5.4%, FRF 47.9%, Random 4.2%, RR 4.%, Predictive 26.9%); left panels: effective weights $w$ vs. t [sec]; right panels: $\theta$ vs. t [sec].]
Fig. 2. Results of a simulation with four replicas and clients entering and leaving the system at different time instants. The left column shows the effective weights, while the right column shows the control variables for each replica. The first replica is shown in black solid lines, the second in blue dashed lines, the third in green dash-dotted lines, and the fourth in red dotted lines.

To analyze the effect of the load-balancing strategies on the replicas' response times, Figure 3 shows box plots of the maximum response time experienced by the replicas. The load-balancing strategies are ordered from left to right based on the percentage of optional code $\%_{oc}$ achieved. The bottom line of each box represents the first quartile, the top line the third, and the red line is the median. The red crosses show the outliers. In addition to the classical box plot information, the black dots show for each algorithm the average value of the maximum response time measured during the experiment, also considering the outliers.

[Figure: box plots of the maximum response time, one per algorithm, in the same order as in Fig. 2.]
Fig. 3. Box plots of the maximum response time in all the replicas for every control interval. Each box spans from the first quartile to the third. The red line shows the median; outliers are represented with red crosses, while the black dots indicate the average value (also considering the outliers).

The box plots clearly show that all the solutions presented in this paper yield distributions with outliers, as do almost all the solutions from the literature. The only exception seems to be SQF, which has very few outliers and a predictable maximum response time, with a median just slightly higher than the one achieved by VPBH. EPBH offers the highest percentage of optional content served by sacrificing the response time bound. From this additional information one can conclude that the solutions presented in this paper should be tuned carefully if response time requirements are hard. For example, for certain tasks users prefer a very responsive application over many features; hence the revenue of the application owner may be increased through lower response times. Notice that the proposed heuristics (EPBH and VPBH) have tunable parameters that can be used to exploit the trade-off between response time bounds and optional content. This case study features only a limited number of replicas.
However, we have conducted additional tests, also in more complex scenarios featuring up to 2 replicas, with results similar to the ones presented herein. In the next section we test the effect of infrastructural changes on load-balancing solutions and response times.

D. Reacting to infrastructure resources

In the second case study the architecture is composed of five replicas. At time 0 s, the first replica has φ_1 = .7s and ψ_1 = .s. The second and third replicas are medium fast, with φ_2,3 = .4s and ψ_2,3 = .2s. The fourth and fifth replicas are the slowest, with φ_4,5 = .7s and ψ_4,5 = .s. At time 25 s the amount of resources assigned to the first replica is decreased, so that φ_1 = .35s and ψ_1 = .5s. At time 50 s, the fifth replica receives more resources, achieving φ_5 = .7s and ψ_5 = .s. The same happens to the fourth replica at time 75 s.

TABLE I
PERFORMANCE WITH VARIABLE INFRASTRUCTURE RESOURCES

Algorithm     % oc     µ_u     σ_u
COBLB         9.9%     .78     .97
EPBH          89.5%    .6      .95
VPBH          87.7%    .2      .9
SQF           83.3%    .55     .4
RR            75.5%    .       2.42
Random        72.9%    .86     2.23
2RC           72.2%    .74     .64
FRF           7.4%     .27     2.3
FRF-EWMA      5.4%     .44     3.4
Predictive    47.4%    .66     3.48

Table I reports the percentage % oc, the average response time µ_u and the user-perceived stability σ_u for the different algorithms. It should be noted again that our strategies serve a higher percentage of optional content at the expense of slightly higher response times. However, COBLB is capable of obtaining both low response times and a high percentage of optional content served. This is due to the amount of information it uses, since we assume that the computation times for the mandatory and optional parts are known. The optimization-based strategy reacts fast to changes and achieves predictability in the application behavior. Again, if not all the necessary information is available, it is possible to implement strategies that better exploit the trade-off between bounded response time and optional content.

VI. CONCLUSION

We have revisited the problem of load-balancing different replicas in the presence of self-adaptivity inside the application.
This is motivated by the need of cloud applications to withstand unexpected events like flash crowds, resource variations or hardware changes. To fully address these issues, load-balancing solutions need to be combined with self-adaptive applications, such as brownout. However, simply combining them without special support leads to poor performance. We described three load-balancing strategies specifically designed to support brownout-compliant cloud applications. The experimental results clearly show that incorporating the application adaptation in the design of load-balancing strategies pays off in terms of predictable behavior and maximized performance. They also demonstrate that SQF is the best non-brownout-aware solution, and therefore it should be used whenever it is not possible to adopt one of our proposed solutions. SQF acts on a per-request basis, and the information it uses is much more up to date with respect to the current infrastructure status; this is an advantage over weight-based solutions and helps SQF serve requests faster. In future work we plan to investigate brownout-aware per-request solutions. Finally, the application model used in this paper assumes a finite number of clicks per user, therefore the developed load-balancing strategies maximize the percentage of optional content served. However, when a different application model is taken into account, maximizing the absolute number of requests served with optional content is another possible goal that should be investigated in future work.

REFERENCES

[1] F. Alomari and D. Menascé. Efficient Response Time Approximations for Multiclass Fork and Join Queues in Open and Closed Queuing Networks. In: Parallel and Distributed Systems, IEEE Transactions on 99 (2013).
[2] M. Andreolini, S. Casolari, and M. Colajanni. Autonomic Request Management Algorithms for Geographically Distributed Internet-Based Systems. In: SASO. 2008.
[3] D. Ardagna, S. Casolari, M. Colajanni, and B. Panicucci. Dual Time-scale Distributed Capacity Allocation and Load Redirect Algorithms for Clouds. In: J. Parallel Distrib. Comput. 72.6 (2012).
[4] J. M. Bahi, S. Contassot-Vivier, and R. Couturier. Dynamic Load Balancing and Efficient Load Estimators for Asynchronous Iterative Algorithms. In: IEEE Trans. Parallel Distrib. Syst. 16.4 (Apr. 2005).
[5] L. A. Barroso and U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Synthesis Lectures on Computer Architecture. Morgan & Claypool, 2009.
[6] BIG-IP Local Traffic Manager. http://www.f5.com/products/big-ip/big-ip-local-traffic-manager/. Accessed: 2013.
[7] P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. Characterizing, modeling, and generating workload spikes for stateful services. In: SOCC. 2010.
[8] S. Boyd and L. Vandenberghe. Convex Optimization. New York, NY, USA: Cambridge University Press, 2004. ISBN: 0521833787.
[9] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility. In: Future Generation Computer Systems 25.6 (2009).
[10] J. Cao, M. Andersson, C. Nyberg, and M. Kihl. Web server performance modeling using an M/G/1/K*PS queue. In: 10th International Conference on Telecommunications ICT 2003. Vol. 2. 2003, pp. 1501-1506.
[11] V. Cardellini, M. Colajanni, and P. S. Yu. Request Redirection Algorithms for Distributed Web Systems. In: IEEE Trans. Parallel Distrib. Syst. 14.4 (Apr. 2003).
[12] S. Casolari, M. Colajanni, and S. Tosi. Self-Adaptive Techniques for the Load Trend Evaluation of Internal System Resources. In: ICAS. 2009.
[13] Y. Diao, C. W. Wu, J. Hellerstein, A. Storm, M. Surendra, S. Lightstone, S. Parekh, C. Garcia-Arellano, M. Carroll, L. Chu, and J. Colaco. Comparative studies of load balancing with control and optimization techniques. In: ACC. 2005.
[14] Y. Diao, J. Hellerstein, A. Storm, M. Surendra, S. Lightstone, S. Parekh, and C. Garcia-Arellano. Incorporating cost of control into the design of a load balancing controller. In: RTAS. 2004.
[15] J. Doyle, R. Shorten, and D. O'Mahony. Stratus: Load Balancing the Cloud for Carbon Emissions Control. In: Cloud Computing, IEEE Transactions on 1.1 (2013).
[16] D. F. García and J. García. TPC-W E-Commerce Benchmark Evaluation. In: Computer 36.2 (Feb. 2003), pp. 42-48.
[17] A. Gulati, G. Shanmuganathan, A. Holler, and I. Ahmad. Cloud-scale resource management: challenges and techniques. In: HotCloud. 2011.
[18] J. Hamilton. On designing and deploying internet-scale services. In: LISA. 2007.
[19] C. Huang and T. Abdelzaher. Bounded-latency content distribution feasibility and evaluation. In: IEEE Transactions on Computers 54 (2005).
[20] H. Kameda, E.-Z. Fathy, I. Ryu, and J. Li. A performance comparison of dynamic vs. static load balancing policies in a mainframe-personal computer network model. In: CDC. 2000.
[21] C. Klein, M. Maggio, K.-E. Årzén, and F. Hernández-Rodriguez. Brownout: Building more Robust Cloud Applications. In: ICSE. May 2014.
[22] L. Kleinrock. Time shared systems: A theoretical treatment. In: Journal of the ACM 14.2 (1967), pp. 242-261.
[23] M. Lin, Z. Liu, A. Wierman, and L. L. H. Andrew. Online algorithms for geographical load balancing. In: IGCC. 2012.
[24] Y. Lu, Q. Xie, G. Kliot, A. Geller, J. R. Larus, and A. Greenberg. Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services. In: Perform. Eval. 68.11 (Nov. 2011).
[25] M. Maggio, C. Klein, and K.-E. Årzén. Control strategies for predictable brownout in Cloud Computing. In: IFAC WC. Aug. 2014.
[26] S. Manfredi, F. Oliviero, and S. Romano. A Distributed Control Law for Load Balancing in Content Delivery Networks. In: IEEE/ACM Transactions on Networking 21.1 (2013).
[27] J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations. In: MICRO. 2011, pp. 248-259.
[28] M. Mitzenmacher. The Power of Two Choices in Randomized Load Balancing. In: IEEE Trans. Parallel Distrib. Syst. 12.10 (Oct. 2001).
[29] S. Nakrani and C. Tovey. On Honey Bees and Dynamic Server Allocation in Internet Hosting Centers. In: Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems 12.3-4 (Sept. 2004).
[30] L. Ni and K. Hwang. Optimal Load Balancing in a Multiple Processor System with Many Job Classes. In: IEEE Transactions on Software Engineering 11.5 (1985).
[31] T.-L. Pao and J.-B. Chen. The Scalability of Heterogeneous Dispatcher-Based Web Server Load Balancing Architecture. In: PDCAT. 2006.
[32] R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed Prefetching and Caching. In: SOSP. 1995.
[33] S. Ranjan, R. Karrer, and E. Knightly. Wide area redirection of dynamic content by Internet data centers. In: INFOCOM. 2004.
[34] M. Sakata, S. Noguchi, and J. Oizumi. An Analysis of the M/G/1 Queue under Round-Robin Scheduling. In: Operations Research 19.2 (1971), pp. 371-385.
[35] M. Salehie and L. Tahvildari. Self-adaptive Software: Landscape and Research Challenges. In: ACM Trans. Auton. Adapt. Syst. 4.2 (May 2009).
[36] B. Schroeder, A. Wierman, and M. Harchol-Balter. Open Versus Closed: A Cautionary Tale. In: NSDI. 2006.
[37] J. A. Stankovic. An Application of Bayesian Decision Theory to Decentralized Control of Job Scheduling. In: IEEE Trans. Comput. 34.2 (Feb. 1985).
[38] A. N. Tantawi and D. Towsley. Optimal Static Load Balancing in Distributed Computer Systems. In: J. ACM 32.2 (Apr. 1985).
[39] L. Tomás and J. Tordsson. Improving Cloud Infrastructure Utilization Through Overbooking. In: CAC. 2013.
[40] L. Wang, V. Pai, and L. Peterson. The Effectiveness of Request Redirection on CDN Robustness. In: OSDI. 2002.
[41] J. L. Wolf and P. S. Yu. On Balancing the Load in a Clustered Web Farm. In: ACM Trans. Internet Technol. 1.2 (Nov. 2001).
[42] L. Zhang, Z. Zhao, Y. Shu, L. Wang, and O. W. W. Yang. Load balancing of multipath source routing in ad hoc networks. In: ICC. 2002.