Self-Adaptive Capacity Management for Multi-Tier Virtualized Environments

Ítalo Cunha, Jussara Almeida, Virgílio Almeida, Marcos Santos
Computer Science Department
Federal University of Minas Gerais
Belo Horizonte, Brazil, 323-97
{cunha, jussara, virgilio, marcos}@dcc.ufmg.br

Abstract

This paper addresses the problem of hosting multiple applications on a provider's virtualized multi-tier infrastructure. Building from a previous model, we design a new self-adaptive capacity management framework, which combines a two-level SLA-driven pricing model, an optimization model and an analytical queuing-based performance model to maximize the provider's business objective. Our main contributions are the more accurate multi-queue performance model, which captures application-specific bottlenecks and the parallelism inherent to multi-tier platforms, as well as the solution of the extended and much more complex optimization model. Our approach is evaluated via simulation with synthetic as well as realistic workloads, in various scenarios. The results show that our solution is significantly more cost-effective, in terms of the provider's achieved revenues, than the approach it is built upon, which uses a single-resource performance model. It also significantly outperforms a multi-tier static allocation strategy for heavy and unbalanced workloads. Finally, preliminary experiments assess the applicability of our framework to virtualized environments subjected to capacity variations caused by the processing of management and security-related tasks.

I. INTRODUCTION

Modern Internet and Web-based services commonly rely on computing-based outsourcing as a financially attractive approach to host their increasingly popular services []. In this scenario, the service provider signs Service Level Agreement (SLA) contracts with an infrastructure provider. In order to be profitable, the service provider demands that a significant fraction of its customer requests be served with a quality that meets specific requirements.
On the other hand, the goal of the infrastructure provider, or simply the provider, is to devise the most cost-effective strategy for managing its available resources, shared among a number of hosted applications. The capacity management of this shared infrastructure is particularly challenging for several reasons. Current Web services demand complex and heterogeneous multi-tier service platforms composed of HTTP servers, application servers and, possibly, a database server. Moreover, application heterogeneity and typically high daily workload fluctuations [2] cannot be effectively accommodated with traditional static capacity management strategies. In such a scenario, resource virtualization [3], [4], [5] can build a much more cost-effective environment. A virtualization mechanism creates isolated virtual machines (VMs) on top of the physical infrastructure, each one composed of a set of virtual instances of the physical resources, and dedicated to serving a single application.

The design of cost-effective capacity management strategies for the hosting infrastructure is further challenged by new demands from the service providers. There is a growing interest in establishing service contracts where payment is proportional to the resources actually used [6]. Moreover, such contracts should provide guarantees not only on service throughput but also on the response time observed by each request [7]. The latter implies a demand for guarantees on the response time tail distribution, as opposed to the traditional requirements on average response time. These requirements imply the need for more complex SLA contracts, business and pricing models, and, ultimately, for more sophisticated capacity management solutions.

Integrated management of such complex networked systems demands analytic models that combine different techniques to represent important facets of the system. In [8], we have proposed a self-adaptive capacity management solution that addresses some of the aforementioned challenges.
Our solution combines a pricing model, built from a two-level SLA contract model, a queuing-based performance model, and an optimization model to dynamically allocate the available capacity among the hosted applications, aiming at maximizing the provider's business objectives. As in other previous work [9], [], [], [2], our performance model represents the fraction of the physical resources assigned to each application as a single resource (i.e., a single queue). This is a simplified representation of the system, especially for heterogeneous applications running on multi-tier platforms. Thus, it may lead to significant inaccuracies, ultimately impacting the cost-effectiveness of the solution. Other common limitations of previously proposed resource management frameworks include lack of business-oriented goals and use of average performance guarantees [2], [3].

This paper builds on our previous work, and proposes a significantly more cost-effective capacity management framework for multi-tier virtualized systems. The framework assumes the physical infrastructure of each tier is virtualized, assigning one local VM to each hosted application. In this case, the performance model represents each VM as a separate queue, thus capturing the inherent parallelism of multi-tier platforms missed by the single-resource models. In this multi-tier model, the computation of the probability of response time SLA violations becomes significantly more challenging, which ultimately adds to the optimization model complexity. Two alternative estimates of this probability are proposed, which, combined with an extended optimization model, yield two self-adaptive multi-tier approaches.

We compare our two approaches against our previous self-adaptive single-resource model as well as a multi-tier static allocation strategy. Simulation experiments with synthetic and realistic workload profiles are used to assess the relative cost-effectiveness of the analyzed strategies in various interesting scenarios. One such scenario models dynamic changes in the available capacity as a consequence of local processing by management and security-related tasks.

Our main conclusions are as follows. First, our multi-tier self-adaptive approaches scale reasonably well to practical scenarios. Second, in case of heavy and unbalanced workloads, they yield significant revenue gains to the provider, compared with the multi-tier static allocation. Third, the performance inaccuracies introduced by the single-resource model lead to very conservative allocation decisions. As a consequence, the single-resource strategy is outperformed by the multi-tier self-adaptive and static approaches by orders of magnitude, even when the hosted applications are homogeneous and have balanced service demands. Finally, our multi-tier self-adaptive approaches are also much more robust than the single-resource model in the face of capacity variations due to the local execution of management and security-related tasks.

This paper is organized as follows. Section II discusses related work. Our multi-tier virtualized environment and the pricing model upon which our approaches are built are described in Section III. Section IV presents our self-adaptive framework. Simulation results are presented in Section V. Conclusions and future work are offered in Section VI.

II. RELATED WORK

A significant amount of effort has been dedicated to the design of efficient methods for autonomic resource management in modern computing systems.
In particular, some previous work focused on improving system performance by applying admission control and scheduling mechanisms [3] as well as techniques for allocating shared capacity among hosted applications [2]. However, these studies focus only on system performance and lack business-oriented goals. Other related topics include the use of reward-driven request prioritization [4] and the management of grid systems [5].

A considerable amount of work applies analytic queuing models for autonomic capacity management, usually combining admission control and capacity allocation with the objective of maximizing the provider's business objective. Models using M/M/1 and M/G/1 queues with FIFO and processor sharing scheduling are considered in [], [], employing different approximations to the response time of M/G/1 queues. The capacity manager proposed in [9] includes operational costs (e.g., energy) in the optimization model, aiming at minimizing the revenue losses due to SLA violations and management costs. In [8], we combine an SLA-driven pricing model, a queuing-based performance model, and an optimization model to dynamically allocate available capacity among hosted applications, aiming at maximizing the provider's revenues. Models with a single service center (either an M/M/1 or an M/G/1 queue) are used to provide estimates of the response time tail distribution.

A number of previous studies, including [6], [7], propose queuing-based performance models specifically for multi-tier systems. More recent multi-tier models address issues such as the caching of responses at tiers [8] and work conservation [9]. In common, these models focus on performance estimation only. They are not coupled to optimization models, and usually provide only average performance estimates.

Our approach improves on previous work, combining a more accurate multi-tier performance model and an optimization model, taking into account probabilistic performance guarantees and being driven by the provider's business objective, expressed in a flexible two-level pricing model.

III. INFRASTRUCTURE MODEL

This section describes the target platform (Section III-A) of our self-adaptive capacity management framework as well as the pricing model (Section III-B) it is built upon.

A. Virtualized Hosting Platform

We consider a scenario where a provider hosts multiple third-party Web services. Each such Web service may be composed of different request types, characterized by different workloads, different service demands on resources, and executed by independent software components. We refer to each such request type as an application class, and assume the infrastructure hosts $N$ independent classes from all services.

We consider that the provider's infrastructure is composed of multiple ($K$) tiers, as is the case for many Web services. Each tier is responsible for a specific task in the process of serving a request (e.g., presentation, application and database tiers). Tiers operate in parallel and requests visit tiers in sequence. That is, a request from class $i$ enters tier $j$, is served, and then leaves the system with probability $p_{i,j}$ or proceeds to tier $j+1$ with probability $(1 - p_{i,j})$ ($i = 1..N$, $j = 1..K$, $p_{i,K} = 1$). Each tier is hosted on separate hardware, which is shared by all classes.

Such an infrastructure must provide performance isolation between classes, among other desired features. Thus, we assume each tier runs a virtualization mechanism, such as Xen [3] and Denali [4], which provides service differentiation and performance isolation for hosted application classes, simplifying load balancing and allowing the dynamic allocation of resources to each class. In fact, such technologies are currently experiencing a renewed interest as a means for server consolidation, improving system security, reliability and availability, reducing costs and providing flexibility.

The considered hosting platform is shown in Fig. 1. On top of each tier's physical infrastructure, a virtualization layer creates $N$ isolated virtual machines (VMs), one for each class. Given $K$ tiers, each request from a given application class is served by $K$ VMs, dedicated to that class. This hosting model isolates classes from one another, each using $K$ VMs as if they were dedicated servers, working at a fraction of the total (physical) capacity. The virtualization layers allow the provider to dynamically increase or decrease the amount of physical resources dedicated to a class on each tier, independently. Hence, we define the capacity allocation problem as the determination of physical capacity fractions for each class $i$ at each tier $j$.

Fig. 1. Multi-Tier Virtualized Service Hosting Platform

Fig. 2. Self-Adaptive Capacity Management (from one tier's perspective)

We assume that VMs employ admission control schemes to avoid unwanted situations, such as service instability due to capacity limitations or security attacks, or to guarantee that the response time requirements are met. In this work, we focus on the dynamic capacity allocation for hosted classes across all tiers, with the goal of maximizing the provider's business objective, described next. By provisioning each tier separately, we can take specific application bottlenecks into consideration.

B. Pricing Model

In [8], we propose a pricing scheme that addresses the high variability of application workloads in online services. Many services, which usually receive low to moderate load, are suddenly inundated by an exceptional surge of requests. This phenomenon, known as flash crowds, generates congestion at the service infrastructure, causing significant delays to customers. Due to the highly dynamic nature of Internet workloads, we propose contracts with two levels of requirements, which correspond to two different operation modes, namely, normal and surge.

In the following discussion, we refer to the service provider as the customer, which establishes a contract with the infrastructure provider to host its service. In the normal operation mode, customers contract the service level which satisfies their needs for the majority of time, whereas in the surge operation mode, a higher service level limit is established, up to which the provider has an incentive to assign extra capacity so as to accommodate occasional load peaks.
From the business standpoint, this approach can be advantageous both to customers, who pay for extra capacity only when needed, and to providers, who can offer more attractive service plans by operating with more flexibility.

In this work, the SLA performance requirements quantify the capacity of the provider's infrastructure to process transactions, given that per-request response time probabilistic guarantees are satisfied. Moreover, the proposed SLA contracts contain performance targets for each operation mode. For the normal operation mode, the SLA defines a throughput $X_i^{NSLA}$ for each class $i$, which the provider is expected to satisfy, given that the class arrival rate is high enough. In case of SLA violations, the provider agrees to refund part of the service charged to its customers. This penalty is proportional to the difference between $X_i^{NSLA}$ and the actual valid throughput (see below). For the surge operation mode, the SLA defines $X_i^{SSLA} \ge X_i^{NSLA}$, the throughput up to which a customer is willing to pay a reward to the provider for serving requests in excess of $X_i^{NSLA}$. Rewards are also proportional to the extra valid throughput achieved. Penalties and rewards are calculated using SLA parameters $c_i$ and $r_i$ per unit of throughput below or above $X_i^{NSLA}$, respectively.

The valid throughput is composed of all requests that were served with a response time that satisfies the specified SLA. We consider a tail distribution response time requirement stating that the response time of requests from class $i$ must not exceed a given threshold $R_i^{SLA}$ for more than $\alpha_i$% of the time. In other words, $P(R_i > R_i^{SLA}) \le \alpha_i$, where $R_i$ is the response time of a class $i$ request.

Note that the proposed approach can be extended to more than two SLA levels, specifying multiple performance targets so as to account for different levels of demands from applications and customers.

Given this pricing model, the provider's business objective is defined as the provision of capacity to VMs that execute the applications so as to maximize the net revenues from penalties and rewards.

IV. CAPACITY MANAGEMENT FOR MULTI-TIER SERVICES

This section describes our self-adaptive capacity management model for multi-tier Web services. Section IV-A presents the self-adaptive framework, which is built from our previous single-resource model [8]. The most significant new contributions lie in the more accurate performance model and in the extended optimization model designed to use it, presented in Sections IV-B and IV-C, respectively.

A. Self-Adaptive Framework

Our self-adaptive framework proposes a system operation model based on feed-forward control, shown in Fig. 2. Its core entity is the capacity manager, which is called periodically to allocate the capacity available on each tier among the hosted application classes, aiming at maximizing the provider's business objective. We refer to the interval between consecutive interventions as the controller interval.

TABLE I. SLA, SYSTEM AND WORKLOAD FORECASTING PARAMETERS

Symbol            Description
$X_i^{NSLA}$      Valid throughput required for class $i$ in normal mode (req/s).
$X_i^{SSLA}$      Maximum valid throughput for class $i$ in surge mode (req/s).
$R_i^{SLA}$       Response time requirement for class $i$ (sec).
$\alpha_i$        Upper-bound on the probability of response time exceeding $R_i^{SLA}$ for a class $i$ request.
$c_i$             Penalty cost for a unit of class $i$ throughput below $X_i^{NSLA}$.
$r_i$             Reward for a unit of class $i$ throughput in excess of $X_i^{NSLA}$.
$N$               Number of hosted classes (and, thus, of VMs on each tier).
$K$               Number of tiers.
$\nu_{i,j}$       Maximum utilization planned for a VM on tier $j$ for class $i$.
$d_{i,j}$         Average service time of class $i$ requests at tier $j$ physical infrastructure running on its full capacity (sec).
$p_{i,j}$         Probability of class $i$ requests leaving the system after visiting tier $j$.
$\lambda_i$       Predicted class $i$ arrival rate (in req/s) for the next interval.

At the end of each controller interval, the capacity manager receives estimates of the workload expected for each class in the next interval as well as the SLA requirements, the average service time (i.e., service demand) of requests from each class on each tier, and the routing probabilities for each class (i.e., probabilities that requests leave the system after visiting each tier). These parameters are used to compute the fraction of the capacity available at each tier that should be given to the local VM (i.e., on the tier) assigned to each class. They are also used to estimate the fraction of the expected request rate for each class that can be accepted into the system without violating capacity limitations. The new capacity allocation is then sent to the virtualization layer, which updates the virtual resource mappings accordingly. Note that SLA requirements and system configuration parameters may change whenever contracts change (i.e., application classes are added or removed, SLA requirements change).
Note also that the controller intervals could have fixed or variable durations, depending on the characteristics of the system and the stability of the workloads of hosted classes. Regardless, the minimum duration is constrained by the time the capacity manager requires to reconfigure the system. Finally, we assume future workload estimates are provided by a workload forecaster module, which implements one of the existing forecasting methods [2], and that an admission control mechanism (such as those in [2]) is used to enforce the per-interval accepted request rates. The design and evaluation of these modules are outside the present scope.

The parameters used by the capacity manager, describing the pricing model and SLA requirements, system configuration and workload characteristics, are defined in Table I. We assume all requests from a class are statistically indistinguishable, thus having the same average service time on each tier infrastructure (i.e., running at full capacity), given by $d_{i,j}$. Parameter $\nu_{i,j}$, an upper-bound on the utilization planned for the VM assigned to class $i$ on tier $j$ ($\nu_{i,j} < 1$), is introduced to minimize long-term response time degradation due to overload, keeping a certain level of stability in the VMs.

Fig. 3. Multi-Tier System Model.

The capacity manager is built from an optimization model that links an analytical performance model with the two-level SLA-driven pricing model presented in Section III-B. The performance and optimization models are presented next.

B. Performance Model

This section presents an analytical queuing model to estimate the performance metrics used by the capacity manager, namely, per-tier resource utilization, system throughput and the probability of response time SLA violations for each class.

Our model assumes that request arrivals from each class follow a Poisson process, as observed in real systems [], [], [22]. Class $i$ requests accepted into the system arrive at the first tier with rate $\lambda_i^{acc}$, and leave the system after visiting tier $j$ with probability $p_{i,j}$ ($p_{i,K} = 1$).
We assume that classes have exponentially distributed service times at each tier, leaving the study of other, application-specific, patterns for future work. Under these assumptions, the multi-tier virtualized system is modeled as $N$ tandem queue networks, one for each class, as shown in Figure 3. The VM assigned to each class on each tier is represented as an M/M/1 queue with FCFS scheduling [23]. This queue has often been used as a reasonable model for transactional service centers [], [], [2], [8].

A class $i$ request has a system response time equal to the sum of its residence times at each queue, which are assumed to be independent of each other. This assumption trades accuracy for solving time and deployability. Nevertheless, we claim that our solution captures the system's primary performance aspects and trade-offs, improving on previous models, while still solving a complex optimization model efficiently for practical scenarios. Capturing interdependencies is left for future work.

Since each application is guaranteed to have access to at least the amount of resources assigned to it, we estimate the average service time of a request from class $i$ at its assigned VM on tier $j$, $\hat{d}_{i,j}$, by the average service time at the tier with full capacity, inflated by the fraction of its capacity currently assigned to class $i$, given by $f_{i,j}$. In other words, $\hat{d}_{i,j} = d_{i,j} / f_{i,j}$. Moreover, given the routing probabilities $p_{i,j}$, the effective request arrival rate from class $i$ at tier $j$ is given by $\lambda_{i,j}^{e} = \lambda_i^{acc} \prod_{k=1}^{j-1} (1 - p_{i,k})$. Note that $\lambda_{i,1}^{e} = \lambda_i^{acc}$. Furthermore, $\lambda_{i,j}^{e} = \lambda_i^{acc}$ for every tier if $p_{i,j} = 0$, $\forall i \in 1..N$ and $j = 1..K-1$. Parameters $d_{i,j}$ and $p_{i,j}$ can be estimated in a pre-production execution of each class, as discussed in [8].
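The two per-tier quantities defined above can be sketched in a few lines of Python. This is a minimal illustration for a single class, not the authors' implementation; the function names and the sample values are hypothetical.

```python
def effective_arrival_rates(lam_acc, p):
    """Effective arrival rate at each tier for one class.

    lam_acc: accepted request rate at the first tier (req/s).
    p:       p[j] is the probability of leaving the system after tier j
             (0-based list; the last entry is expected to be 1.0).
    """
    rates = []
    stay = 1.0  # probability that a request reaches the current tier
    for p_j in p:
        rates.append(lam_acc * stay)
        stay *= 1.0 - p_j
    return rates


def vm_service_times(d, f):
    """Full-capacity service times d[j] inflated by capacity fractions f[j]."""
    return [d_j / f_j for d_j, f_j in zip(d, f)]


# Hypothetical example: two tiers, 20% of requests leave after tier 1.
lam_e = effective_arrival_rates(100.0, [0.2, 1.0])
d_vm = vm_service_times([0.0006, 0.0004], [0.5, 0.5])
print(lam_e, d_vm)
```

With half of each tier assigned to the class, the VM service times are twice the full-capacity values, and only the requests that did not leave after tier 1 reach tier 2.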

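We are now ready to present our performance model. Note that, given the job flow balance condition [23], which states that all accepted requests are actually processed by each VM, class $i$ system throughput is given by the accepted request rate $\lambda_i^{acc}$. Moreover, the utilization of tier $j$ by class $i$, $\rho_{i,j}$, can be estimated by the product of class $i$ average service time at its assigned VM on tier $j$, $\hat{d}_{i,j}$, and the rate at which its requests arrive at the tier [23]. That is, $\rho_{i,j} = \lambda_{i,j}^{e} \hat{d}_{i,j}$.

The most challenging component of our performance model is the estimate of the probability that a request from class $i$ violates its response time SLA. Given the residence time of a class $i$ request at tier $j$, $R_{i,j}$, and its system response time $R_i = \sum_{j=1}^{K} R_{i,j}$, our goal is to estimate $P(R_i > R_i^{SLA})$.

Note that $R_{i,j}$ is the response time of an M/M/1 queue, which is exponentially distributed with parameter $\gamma_{i,j} = \frac{1}{\hat{d}_{i,j}} - \lambda_{i,j}^{e}$ [23]. Moreover, recall that the sum of $K$ independent exponential variables with rates $\gamma_{i,j}$ ($j = 1..K$) follows a hypoexponential distribution [24]. Thus, the probability of a response time SLA violation by a class $i$ request is equal to the complement of the cumulative distribution of the hypoexponential variable with parameters $\gamma_{i,j}$. In other words,

$$P(R_i > R_i^{SLA}) = \sum_{j=1}^{K} \left( \prod_{k=1, k \ne j}^{K} \frac{\gamma_{i,k}}{\gamma_{i,k} - \gamma_{i,j}} \right) e^{-\gamma_{i,j} R_i^{SLA}} \tag{1}$$

Because our queuing-based performance model captures the parallelism that can exist in a multi-tier environment, we expect Equation 1 to be more precise than applying any of the approximations presented in [8] to our target environment. Based on a single-resource model, the most cost-effective approximation is derived from the response time of a single M/M/1 queue, which is exponentially distributed with parameter $\frac{1}{\sum_{j=1}^{K} \hat{d}_{i,j}} - \lambda_i^{acc}$ [8]. However, Equation 1 is also more complex, which may compromise the model solution time.

$$\begin{aligned}
\max \quad & \sum_{i=1}^{N} g_i(\lambda_i^{acc}) \\
\text{s.t.} \quad & \lambda_i^{acc} \le \min(\lambda_i, X_i^{SSLA}) && \forall i \in 1..N && (a) \\
& \hat{d}_{i,j} = d_{i,j} / f_{i,j} && \forall i \in 1..N,\ j \in 1..K && (b) \\
& \rho_{i,j} = \lambda_{i,j}^{e} \hat{d}_{i,j} \le \nu_{i,j} && \forall i \in 1..N,\ j \in 1..K && (c) \\
& \sum_{i=1}^{N} f_{i,j} \le 1 && \forall j \in 1..K && (d) \\
& 0 \le f_{i,j} \le 1 && \forall i \in 1..N,\ j \in 1..K && (e) \\
& \lambda_{i,j}^{e} = \lambda_i^{acc} \prod_{k=1}^{j-1} (1 - p_{i,k}) && \forall i \in 1..N,\ j \in 1..K && (f) \\
& P(R_i > R_i^{SLA}) \le \alpha_i && \forall i \in 1..N && (g)
\end{aligned}$$

Fig. 4. Capacity Manager Optimization Model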
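As an illustration of Equation 1, the sketch below evaluates the hypoexponential tail for one class from its per-tier rates. The function names are hypothetical, and the code assumes all rates are positive and pairwise distinct, since the closed form divides by their differences.

```python
from math import exp


def residence_rates(lam_e, d_vm):
    """Rate of the exponential residence time at each tier (M/M/1 queue)."""
    return [1.0 / d_j - lam_j for lam_j, d_j in zip(lam_e, d_vm)]


def hypoexp_tail(gammas, r_sla):
    """P(R > r_sla) for a sum of independent exponentials with distinct rates."""
    total = 0.0
    for j, g_j in enumerate(gammas):
        coeff = 1.0
        for k, g_k in enumerate(gammas):
            if k != j:
                coeff *= g_k / (g_k - g_j)
        total += coeff * exp(-g_j * r_sla)
    return total


# Hypothetical two-tier example: 50 req/s reach both tiers.
gammas = residence_rates([50.0, 50.0], [0.0012, 0.0008])
print(hypoexp_tail(gammas, 0.005))
```

For a single tier the expression reduces to the exponential tail of one M/M/1 queue, which is a convenient sanity check.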
Given the complexity of Equation 1, we also consider an approximation derived from Chebyshev's Inequality [23], which provides the following upper-bound on the probability of a violation for class $i$:

$$P(R_i > R_i^{SLA}) \le \frac{Var[R_i]}{(R_i^{SLA} - E[R_i])^2} \tag{2}$$

$E[R_i]$ and $Var[R_i]$ are the mean and variance of the system response time for class $i$, and are equal to the sum of the corresponding measures of each exponential component of the hypoexponential distribution. In other words, $E[R_i] = \sum_{j=1}^{K} \frac{1}{\gamma_{i,j}}$ and $Var[R_i] = \sum_{j=1}^{K} \left( \frac{1}{\gamma_{i,j}} \right)^2$. We do not use the simpler Markov's Inequality [23], which depends only on the mean response time, as this upper-bound is typically very loose, leading to more conservative and less cost-effective allocation decisions [8].

C. Optimization Model

The core component of our capacity manager is the optimization model shown in Figure 4. Its main decision variable is vector $f_{i,j}$, the fraction of tier $j$'s capacity assigned to the local VM responsible for class $i$. The objective function expresses the provider's business objective, given by the sum, over all classes, of the net revenues from individual penalties and rewards. The penalty (or reward) for class $i$, $g_i$, depends on $\lambda_i^{acc}$, which, in turn, is constrained by the values of $f_{i,j}$. Before elaborating on the formulation of $g_i$, we describe the model constraints.

Constraint (a) states that the accepted request rate for each class is limited by its predicted arrival rate and by the maximum throughput the provider can capitalize upon when the class is in surge mode. Constraint (b) defines the average service time of class $i$ at its assigned VM on tier $j$. Constraint (c) states that the utilization of tier $j$ by class $i$, $\rho_{i,j}$, which depends on its SLA requirements and accepted request rate $\lambda_i^{acc}$, is limited by the maximum planned utilization for the corresponding VM. Constraints (d) and (e) impose obvious limits on vector $f_{i,j}$. Constraint (f) defines the effective arrival rate at each tier. Finally, constraint (g) expresses the SLA response time requirement. Two variants of the model are created by using the expressions in Equations 1 and 2. Note that this constraint expresses the trade-off between throughput and quality of service.
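The constraints of the Chebyshev variant can be checked for a candidate solution with a short sketch like the one below. It is illustrative only: the function names are hypothetical, the check covers one class at a time, and returning 1.0 when the bound is uninformative (threshold not above the mean, or a non-positive rate) is an assumption of this sketch rather than part of the model.

```python
def chebyshev_bound(gammas, r_sla):
    """Upper bound of Equation 2 on P(R > r_sla), capped at 1.0."""
    if any(g <= 0.0 for g in gammas):
        return 1.0  # unstable queue: no useful bound (sketch assumption)
    mean = sum(1.0 / g for g in gammas)
    var = sum((1.0 / g) ** 2 for g in gammas)
    if r_sla <= mean:
        return 1.0  # bound is uninformative at or below the mean
    return min(1.0, var / (r_sla - mean) ** 2)


def class_is_feasible(lam_acc, lam_pred, x_ssla, p, d, f, nu, r_sla, alpha):
    """Check constraints (a), (b), (c), (f) and (g) for a single class."""
    if lam_acc > min(lam_pred, x_ssla):            # constraint (a)
        return False
    gammas, stay = [], 1.0
    for p_j, d_j, f_j, nu_j in zip(p, d, f, nu):
        if f_j <= 0.0:
            return False                           # no capacity at this tier
        lam_e = lam_acc * stay                     # constraint (f)
        d_vm = d_j / f_j                           # constraint (b)
        if lam_e * d_vm > nu_j:                    # constraint (c)
            return False
        gammas.append(1.0 / d_vm - lam_e)
        stay *= 1.0 - p_j
    return chebyshev_bound(gammas, r_sla) <= alpha  # constraint (g)


def shares_are_valid(f, tol=1e-9):
    """Check constraints (d) and (e) on the allocation matrix f[i][j]."""
    in_range = all(0.0 <= x <= 1.0 for row in f for x in row)
    per_tier = all(sum(row[j] for row in f) <= 1.0 + tol
                   for j in range(len(f[0])))
    return in_range and per_tier
```

A solver explores the allocation space directly; a checker of this kind is mainly useful for validating a returned allocation or for a brute-force search on very small instances.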
A smaller value of $\alpha_i$ assures that most class $i$ requests are served with short response times. However, fewer requests are accepted into the system, and throughput is lower. Larger values of $\alpha_i$ allow more requests into the system and thus higher throughput. However, accepted requests observe longer response times more frequently.

We now turn to the formulation of $g_i$, the provider's revenue from class $i$. Recall that rewards are given to the provider whenever the accepted request rate from class $i$ exceeds its throughput SLA requirement for normal operation mode, i.e., whenever $\lambda_i^{acc} > X_i^{NSLA}$. Constraint (a) guarantees the upper-bound on such rewards based on $X_i^{SSLA}$. On the other hand, penalties are incurred if $\lambda_i^{acc}$ is lower than $X_i^{NSLA}$, as long as this is due to capacity limitations and not to lower request arrival rates. In other words, a penalty is incurred whenever $\lambda_i^{acc} < \min(\lambda_i, X_i^{NSLA})$. Thus, the net revenue obtained from class $i$, $g_i$, is given by:

$$g_i = \begin{cases} -c_i \left( \min(\lambda_i, X_i^{NSLA}) - \lambda_i^{acc} \right) & \lambda_i^{acc} \le X_i^{NSLA} \\ r_i \left( \lambda_i^{acc} - X_i^{NSLA} \right) & \lambda_i^{acc} > X_i^{NSLA} \end{cases} \tag{3}$$

Note that $g_i$, and thus the objective function, increases with $\lambda_i^{acc}$. However, $\lambda_i^{acc}$ is constrained by the workload and SLA contract (constraint (a)), by limitations on resource utilization (constraint (c)), and, above all, by the response time SLA requirement (constraint (g)). The last two constraints indirectly link the values of $\lambda_i^{acc}$ to the decision variables $f_{i,j}$.

The optimization model shown in Figure 4 is an extension, for multi-tier environments, of the one proposed in [8]. As in [8], the main challenge, from the optimality and solution time perspectives, lies in the piecewise linear objective function and in the response time constraint (g). If the two-step piecewise linear objective, computed from Equation 3, is concave (i.e., $c_i \ge r_i$, $\forall i \in 1..N$), it can be expressed as a set of linear constraints which can be easily solved. Here, we have focused on this scenario. Otherwise, a binary variable $\delta_i$ can be used to combine penalties and rewards into a single expression for $g_i$, as in [8]. Approximating the objective function by a polynomial is an alternative solution.

The probability of a response time SLA violation derived from Chebyshev's inequality (Equation 2) as well as the approximations proposed in [8], though convex and non-linear functions in the valid range of the decision variables, are reasonably simple. Thus, the optimization models can be easily solved. This is true even for the most precise approximation proposed in [8], which is based on the exponential distribution of an M/M/1 queue response time [8]. The expression derived from the hypoexponential distribution (Equation 1), on the other hand, is much more complex, yielding a much more challenging optimization model. In particular, the distribution is undefined whenever two of its parameters $\gamma_{i,j}$ have equal values, making the problem unsolvable.
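The singularity can be seen numerically. The sketch below, with hypothetical function names and sample values, evaluates the closed form of Equation 1 for two rates that are equal, where it fails, and for two rates that are nearly equal, where it stays close to the Erlang tail that is the exact distribution of a sum of identically distributed exponential variables.

```python
from math import exp, factorial


def hypoexp_tail(gammas, r_sla):
    """Closed form of Equation 1; undefined when two rates coincide."""
    total = 0.0
    for j, g_j in enumerate(gammas):
        coeff = 1.0
        for k, g_k in enumerate(gammas):
            if k != j:
                coeff *= g_k / (g_k - g_j)  # zero denominator if g_k == g_j
        total += coeff * exp(-g_j * r_sla)
    return total


def erlang_tail(k, gamma, r_sla):
    """P(R > r_sla) for a sum of k i.i.d. exponentials with rate gamma."""
    x = gamma * r_sla
    return exp(-x) * sum(x ** n / factorial(n) for n in range(k))


try:
    hypoexp_tail([2.0, 2.0], 1.0)
except ZeroDivisionError:
    print("Equation 1 is undefined for equal rates")

# Nearly equal rates: the closed form approaches the Erlang tail.
print(hypoexp_tail([2.0, 2.0 + 1e-6], 1.0), erlang_tail(2, 2.0, 1.0))
```

Very small rate differences also amplify floating-point cancellation in the closed form, which is one more reason to treat the equal-rate region separately.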
Several strategies can be used to remedy this problem. First, the hypoexponential function can be approximated by a polynomial, with a compromise in optimality. Second, one can solve different instances of the model, covering complementary regions of the solution space where the function is clearly defined, and then take the maximum solution from all instances as the global optimum. We implemented this strategy and successfully tested it with a small number of tiers (2) and application classes (up to 4). However, it does not scale well for larger numbers of tiers and classes. Finally, one can approximate terms of the hypoexponential distribution with equal parameter values by an Erlang distribution. This approximation is asymptotically exact, as the sum of identically distributed exponential variables has an Erlang distribution [24]. We have selected this approach due to its better scalability, discussed in Section V-A.

Our optimization model was implemented and tested in AMPL [25], a modeling language for mathematical programming. We ran a number of different solvers, and all of them converged to the same solution for all tested inputs.

V. EXPERIMENTATION

In this section, we evaluate the cost-effectiveness of our self-adaptive capacity management framework for multi-tier environments, comparing it with our previous self-adaptive strategy based on a single-resource model [8] and with a static capacity allocation, in various scenarios. The main metric for comparison is the provider's net revenue obtained with each strategy.

In our experiments, we consider a two-tier environment (i.e., $K = 2$), applicable to a service platform with a front-end server (e.g., an HTTP server) and a back-end resource (e.g., a storage area network or a database server). We evaluate the two variants of our capacity manager that estimate the probability of response time SLA violations using the hypoexponential distribution and Chebyshev's inequality.
In the single-resource model approach, the probability of a response time SLA violation is estimated from the exponential distribution of the response time of a single M/M/1 queue [8], with per-class average service time equal to the sum of the average service times at both tiers. Finally, the static allocation assumes the best capacity allocation at each tier for the given workloads, assigning a fixed fraction of tier $j$'s total capacity to class $i$ that is proportional to its average utilization over class $i$'s entire workload. It also uses the system response time distribution for the network of two M/M/1 queues (Equation 1) to estimate the maximum request rate from each class that can be admitted into the system while still meeting the response time SLAs. Thus, like our new approaches, it is based on a more accurate representation of the system. These strategies are referred to, in this section, as the hypoexponential, Chebyshev, single-resource and static approaches.

We built an event-driven simulator that models the system as a tandem queue with two centers, and is fed with workload traces from $N$ application classes. For the self-adaptive strategies, the simulator is coupled to an optimization model solver, which is called at the end of each controller interval to calculate the capacity allocation vector $f_{i,j}$ and the accepted request rate $\lambda_i^{acc}$ for each class $i$ and tier $j$, for the next interval. During each interval, per-request response time as well as per-class throughput and tier utilization are collected and used to compute the provider's revenue.

Our simulator employs a fair admission control mechanism, which accepts a class $i$ request with probability $\lambda_i^{acc} / \lambda_i$. Thus, the assumption of Poisson arrivals holds for the accepted requests. This is a conservative approach compared to other mechanisms that aim at minimizing the inter-arrival time variance [23]. Moreover, we assume every accepted request visits both tiers, and that the maximum planned utilization for all VMs is 95% (i.e., $p_{i,1} = 0$ and $\nu_{i,j} = 0.95$, for $i = 1..N$).
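The admission rule and the revenue rule of Equation 3 can be sketched as follows. This is a simplified illustration rather than the simulator's code: the function names and sample values are hypothetical, penalties are read as negative revenue, and the revenue is computed from the accepted rate instead of a measured valid throughput.

```python
import random


def admit(lam_acc, lam, rng=random):
    """Accept a request with probability lam_acc / lam (Poisson thinning)."""
    if lam <= 0.0:
        return False
    return rng.random() < min(1.0, lam_acc / lam)


def net_revenue(lam_acc, lam, x_nsla, c, r):
    """Net revenue of one class following Equation 3."""
    if lam_acc > x_nsla:
        return r * (lam_acc - x_nsla)           # reward in surge mode
    return -c * (min(lam, x_nsla) - lam_acc)    # penalty (zero if none is due)


# Hypothetical example: 30 of 100 req/s can be accepted.
rng = random.Random(0)
accepted = sum(admit(30.0, 100.0, rng) for _ in range(20000))
print(accepted / 20000, net_revenue(30.0, 100.0, 50.0, 2.0, 1.0))
```

Independent random thinning of a Poisson stream yields another Poisson stream, which is why the accepted requests still satisfy the arrival assumption of the performance model.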
Finally, since our current focus is on the cost-effectiveness of our new approaches, we compare them in a best-case scenario to understand trade-offs: we assume there is no time limitation for adapting the system, and that an ideal workload forecasting, where future arrival rates are known a priori, is used. The selection and evaluation of a practical workload forecasting method, among several existing time series techniques [2] with varying degrees of accuracy, is left for future work.

In Section V-A, we briefly discuss the scalability of our capacity manager. Simulation results for synthetic workloads and for more realistic workload profiles are given in Sections V-B and V-C. Finally, Section V-D discusses a scenario where both workload and available capacity dynamically change. Our simulator was validated by comparing the measured system response time with the analytical model (Equation 1), with errors under %. All results presented are averages of 5 runs (2 in Section V-A), with standard deviation under 2% of the means.

A. Capacity Manager Scalability

We evaluate the scalability of our multi-tier framework for configurations with up to 6 classes. We focus on the hypoexponential approach, due to its higher complexity and longer average solving times. Our experiments are conducted using the SNOPT nonlinear solver [26], on a computer with a 2 GHz AMD Sempron 24 CPU and 52 MB of RAM.

Fig. 5. Scalability of Self-Adaptive Multi-Tier Capacity Manager. (Plot of average solving time (s) versus number of application classes (N), with a linear fit.)

Figure 5 shows the average solving time as the number of classes increases. The linear fitting of the data indicates that average solving time, typically under second, increases with a small factor of the number of classes. Thus, our new capacity manager scales well to practical scenarios.

B. Synthetic Workloads

This section presents experimental results for synthetic workloads and two application classes. Two scenarios illustrate the main trade-offs and benefits of our solution. The controller interval is set to seconds in both scenarios.

TABLE II. PER-TIER AVERAGE SERVICE TIMES IN SCENARIOS 1 AND 2

         Scenario 1                        Scenario 2
Class    $d_{i,1}$ (ms)   $d_{i,2}$ (ms)   $d_{i,1}$ (ms), $d_{i,2}$ (ms)
1        0.6              0.4              .7
2        0.4              0.6              .7

TABLE III. BUSINESS MODEL PARAMETER VALUES ($i = 1..N$)

Scenario   $R_i^{SLA}$   $X_i^{NSLA}$   $X_i^{SSLA}$   $c_i$   $r_i$   $\alpha_i$
1 and 2    . s           5 req/s        2 req/s        .       .5      .
3          2 s           .8 req/s       req/s          35      75      .
4          5 s           .8 req/s       req/s          75      875     .

In scenario 1, requests from each class arrive according to the periodic step-like non-homogeneous Poisson processes shown in Figure 6-a). Arrival rates vary from to requests per second, with steps and periods of and seconds, respectively. Both workloads have identical profiles with a shift in their periods. This is an interesting scenario for the self-adaptive approaches, which are able to reassign the idle capacity from the underloaded VMs to the overloaded ones to satisfy the SLA requirements. The self-adaptive capacity manager is called at the end of each controller interval, which coincides with instants when per-class request rates change.

Average service times for each class at each tier as well as business model parameter values are given in Tables II and III, respectively. Note that classes 1 and 2 have bottlenecks at tiers 1 and 2, respectively. In this case, our self-adaptive multi-tier approaches are able to dynamically assign, for each application, more resources to the tier it needs the most. Nevertheless, note that, for each class, the service time imbalance is not very significant. Moreover, both classes have equal pricing model parameter values, as our interest is in the relative cost-effectiveness of the approaches analyzed.

Figure 6-b) shows the provider's net revenue achieved by each approach throughout the simulation. The repeating pattern of the curve is produced by the periodic behavior of the workloads. The hypoexponential and Chebyshev approaches yield quantitatively similar revenues throughout the simulation. Interestingly, both approaches provide only marginal gains (%) over the static approach when classes have complementary loads, even though this is the best scenario for self-adaptive approaches, as they can reassign idle capacity from the underloaded VMs to the overloaded ones. This is because the load imposed at each tier is very light throughout the simulation.
Thus, the ability to self-adapt does not play a significant role, and most requests are admitted into the system by all three strategies. Revenues are mostly dictated by the opportunities for capitalizing on an application class running in surge mode, which are the same for all approaches. On the other hand, with the single-resource approach, the per-class average service time is equal to ms, making the infrastructure slightly underprovisioned for the aggregate request rates. The revenue gains provided by our self-adaptive multi-tier approaches over the single-resource approach vary from 7% (during peaks) to 3% (during valleys). In fact, when both classes have similar load, the single-resource approach, based on a simplified system model, is significantly outperformed even by the static approach. Figure 6-c) summarizes these results, showing the revenue cumulative distributions over all intervals. The self-adaptive multi-tier approaches yield an overall revenue gain of 28% over the single-resource approach.
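As an illustration of the kind of workload used in this scenario, the following sketch generates arrival times from a periodic step-like non-homogeneous Poisson process by thinning. It is a minimal sketch and not the simulator used in the experiments: the function names, the two-level square-wave shape of the rate, and all numeric values are assumptions chosen only for illustration.

```python
import random


def step_rate(t, low, high, period, shift=0.0):
    """Two-level periodic step: `high` in the first half of each period, `low` in the second."""
    phase = (t + shift) % period
    return high if phase < period / 2 else low


def nhpp_arrivals(rate_fn, rate_max, horizon, rng):
    """Arrival times in [0, horizon) of a non-homogeneous Poisson process, by thinning.

    rate_max must be an upper bound on rate_fn over the whole horizon.
    """
    arrivals = []
    t = 0.0
    while True:
        # Candidate events come from a homogeneous process of rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        # Keep each candidate with probability rate_fn(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            arrivals.append(t)


# Hypothetical parameters (not the values used in the paper).
LOW, HIGH, PERIOD, HORIZON = 2.0, 10.0, 1000.0, 2000.0
rng = random.Random(42)
class1 = nhpp_arrivals(lambda t: step_rate(t, LOW, HIGH, PERIOD), HIGH, HORIZON, rng)
# Class 2 has the same profile shifted by half a period, so its peaks
# fall in the valleys of class 1 (complementary loads).
class2 = nhpp_arrivals(
    lambda t: step_rate(t, LOW, HIGH, PERIOD, PERIOD / 2), HIGH, HORIZON, rng
)
```

With this shape, each class produces on average (HIGH + LOW) / 2 * HORIZON arrivals, and the half-period shift gives the two classes complementary peaks, which is the situation in which reassigning idle capacity between VMs can pay off.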

Fig. 6. Experimental Results for Synthetic Workloads: Scenario 1. (a) Workload Profiles: arrival rate (reqs/s) over time (s) for Class 1 and Class 2. (b) Provider's Revenue over Time. (c) Provider's Revenue Distribution: P(revenue < r) versus revenue r.

Fig. 7. Provider's Revenue for Synthetic Workloads: Scenario 2. (a) Revenue over Time. (b) Revenue Distribution: P(revenue < r) versus revenue r.

Fig. 8. Application Class Performance Metrics for Synthetic Workloads: Scenario 2. (a) λ acc: accepted request rate (reqs/s) over time (s). (b) Successes and Failures: average number of successes/failures per second over time (s). (c) Response Time Distribution: P(response < R) versus response time R (s).

We now turn our evaluation to scenario 2, characterized by heavier workloads with more unbalanced tier average service times. Workload profiles are identical to those in Figure 6-a), but rates vary from 2 to 2 arrivals per second. System parameter values are shown in Tables II and III. Figures 7-a) and 7-b) show the provider's revenue obtained with each approach. The hypoexponential approach leads to higher revenues than the other self-adaptive multi-tier approach when per-class request rates are balanced. Overall revenue gains are around 2%. As one might expect, the approximation error becomes more significant for heavier loads. Unlike the previous scenario, both approaches significantly outperform the static strategy (up to 58%) in intervals when classes have complementary request rates. Moreover, the single-resource strategy, with revenues fluctuating from -458 to -57, is outperformed by the three other strategies by orders of magnitude.
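The effect of the performance model on these results can be checked with a small numeric example. The sketch below assumes that each tier behaves as an M/M/1 queue fully dedicated to one class, so that its response time is exponential with rate 1/d - λ, and that the single-resource model treats both tiers as one M/M/1 queue whose service time is the sum of the per-tier service times. The function names and the values of d1, d2, λ and the response time threshold are hypothetical and are not taken from Tables II and III.

```python
import math


def mm1_rate(service_time, arrival_rate):
    """Rate (mu - lambda) of the exponential response time of a stable M/M/1 queue."""
    mu = 1.0 / service_time
    if arrival_rate >= mu:
        raise ValueError("queue is unstable: arrival rate must be below 1/service_time")
    return mu - arrival_rate


def exp_tail(rate, r):
    """P(R > r) when R is exponential with the given rate."""
    return math.exp(-rate * r)


def hypoexp_tail(rate1, rate2, r):
    """P(R > r) when R is the sum of two independent exponentials."""
    if math.isclose(rate1, rate2):
        # Equal rates degenerate to an Erlang-2 distribution.
        return (1.0 + rate1 * r) * math.exp(-rate1 * r)
    return (rate2 * math.exp(-rate1 * r) - rate1 * math.exp(-rate2 * r)) / (rate2 - rate1)


# Hypothetical values: per-tier service times (s), arrival rate (req/s), threshold (s).
d1, d2, lam, r_sla = 0.006, 0.004, 50.0, 0.1

single = mm1_rate(d1 + d2, lam)                   # one queue, service time d1 + d2
tier1, tier2 = mm1_rate(d1, lam), mm1_rate(d2, lam)  # two queues in series

mean_single = 1.0 / single
mean_multi = 1.0 / tier1 + 1.0 / tier2
tail_single = exp_tail(single, r_sla)
tail_multi = hypoexp_tail(tier1, tier2, r_sla)

print(f"mean response: single={mean_single:.4f}s multi-tier={mean_multi:.4f}s")
print(f"P(R > {r_sla}s): single={tail_single:.2e} multi-tier={tail_multi:.2e}")
```

Under these assumptions the single-queue model predicts both a larger mean response time and a heavier tail at the threshold than the two-queue hypoexponential model, so a manager relying on it has to allocate more capacity or admit fewer requests to satisfy the same response time constraint.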
We note that two primary factors that impact the cost-effectiveness of the capacity management strategies are the ability to adapt to workload changes and the performance model accuracy, which impacts both capacity allocation and admission control decisions. The fixed capacity allocation significantly penalizes the static approach for heavy and heterogeneous workloads (i.e., scenario 2). Furthermore, in both analyzed scenarios (and in an omitted scenario with fully balanced and homogeneous applications), the single-resource approach is significantly penalized by its simpler, and thus less accurate, performance model, where response times are exponentially distributed with mean given by the sum of the average service times at each tier. On the other hand, the hypoexponential and static approaches use the hypoexponential distribution of response time for two M/M/1 queues. It can be easily shown that, for fixed average service times, the mean of the exponential distribution is always larger [24]. Thus, in order to meet the response time constraint, the single-resource approach is forced to make more conservative allocation and admission control decisions, incurring lower revenues.

This conclusion is illustrated in Figures 8-a)-c), which show the accepted request rate, the rates of response time SLA hits and misses (i.e., violations) for the accepted requests, and the response time distribution for one application class. Rates for the other self-adaptive multi-tier approach, omitted, are between those of the hypoexponential and static approaches. Note that the more aggressive allocation and admission control decisions made by the hypoexponential and static approaches lead to a larger number of SLA misses. Nevertheless, Figure 8-c) shows that the target SLA constraint (P(R > .) < .) is met by all approaches. Note also that, even accepting a larger number of requests, the other self-adaptive multi-tier approach has a more skewed response time distribution than the static approach. The static allocation, allied to more aggressive admission control decisions, results in longer queues at each tier, thus increasing response time.

Finally, to verify the impact of mispredicting the incoming workload, we ran simulations using different controller intervals for scenario 2. We chose interval lengths that do not coincide with the instants when the workload changes. For interval lengths of 3 and 6 seconds, the overall loss in revenue compared to the results shown in Figure 7-a) is only 5% (8%), and % (5%), respectively, for the hypoexponential (other self-adaptive multi-tier) model. Thus, our solution is reasonably robust to workload mispredictions.

Fig. 9. Request Traces from Two Real E-Business Applications. (a) Application 1. (b) Application 3. Both panels: arrival rate (reqs/s) over time (s).

Fig. 10. Revenue Cumulative Distribution for Realistic Workloads. (a) Scenario 3. (b) Scenario 4. Both panels: P(revenue < r) versus revenue r.

TABLE IV. PER-TIER AVERAGE SERVICE TIMES IN SCENARIOS 3 AND 4
Columns: Scenario; average service time (s); application classes 1 to 8
Scenario 3, d_{i,1}: .5 2..5 3. - - - -
Scenario 3, d_{i,2}: 3..5 2..5 - - - -
Scenario 4, d_{i,1}: .25.25.75.9.85..5.5
Scenario 4, d_{i,2}: .5.5..85.9.75.25.25

C. Realistic Workload Profiles

In this section, we evaluate the capacity management approaches for more realistic workload profiles. New workloads are built from traces containing the request rate, at each 5-minute interval, to 4 different real e-business applications, over a period of 3 months (from /23/24 to 2/23/25). Confidentiality agreements prevent us from disclosing the source of our workloads. All four traces have similar load profiles, with peaks around the same time. Request rates vary widely, with a peak of 7 requests per second and an average of .78 requests per second.
Figures 9-a) and 9-b) show the request rate variation for two applications, on a typical day. Realistic workloads are built by assuming request arrivals follow non-homogeneous Poisson processes, with rates changing at each 5-minute interval, and given by the traces.

Two new configuration scenarios are considered in our analysis. In scenario 3, we simulate 4 application classes with the workloads built from our traces. Scenario 4 uses a larger number (i.e., 8) of application classes, whose workloads are built by duplicating each of the 4 baseline request traces, and shifting the requests in each replica by 6 hours to the future. Average service times at each tier are selected so as to make them underprovisioned for the aggregated workload. Moreover, the aggregated demand on each tier as well as the throughput SLA requirements are fixed in both scenarios. The parameter values are given in Tables III and IV. The controller interval is set to 5 minutes.

The revenue cumulative distributions for both scenarios are shown in Figures 10-a) and 10-b). The hypoexponential approach yields the highest revenues in both cases. In scenario 3, the static strategy is as good as the hypoexponential approach. The very similar profiles of all four workloads leave little room for improvements from dynamic management. Note, however, its significant degradation in scenario 4, which has more opportunities for dynamic capacity allocation among the 8 classes. In this case, the hypoexponential approach provides an average per-day revenue gain of 429%. As in scenario 2, the hypoexponential approach is more cost-effective than the other self-adaptive multi-tier approach, yielding average per-day revenue gains of 2% and 26% in scenarios 3 and 4, respectively. Again, the single-resource approach is outperformed by orders of magnitude.

Fig. 11. Provider's Revenue Over Time for Dynamic Capacity (Scenario 5): revenue over time (s) for the hypoexponential and single-resource approaches.

D. Varying the Available Capacity

In our last scenario, we evaluate the applicability of the self-adaptive strategies when the total capacity available at one of the tiers (i.e., tier 1) drops suddenly. This would be the case, for instance, when the first tier (e.g., an HTTP server) is the target of a malicious attack (e.g., a Denial-of-Service attack) and has to dedicate some of its capacity to recovery and other management tasks. During this period, the local capacity available for serving legitimate requests from the hosted classes decreases. The other tier (e.g., an application or a database server) can still dedicate its full capacity to them. We ran simulations with the same workload and system parameters used in scenario 3. However, we limit the simulation to the day shown in Figure 9. Tier 1 capacity is dynamically reduced by 25% during the highlighted 6-hour period. Average service times are scaled up accordingly. Figure 11 shows the revenues for the hypoexponential and single-resource approaches. Clearly, the multi-tier hypoexponential approach is much more robust, yielding significantly higher revenues even when tier 1 capacity is reduced. Scenario 5 was devised to allow a preliminary evaluation of the applicability of our self-adaptive multi-tier framework for virtualized environments under security attacks. More sophisticated testing scenarios as well as workload and system modeling strategies are directions we intend to pursue next.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we presented a new self-adaptive capacity management framework for multi-tier virtualized systems. Built from a previous single-resource model, our extended framework shares with its origin the two-level SLA-driven pricing model. However, it implements a much more accurate multi-queue performance model, which captures application-specific bottlenecks and the parallelism inherent to multi-tier architectures, as well as an extended and much more challenging optimization model. We ran simulation experiments for five configuration scenarios, which highlighted different trade-offs of our new solutions, compared to the single-resource model and a multi-tier static allocation strategy. Our main conclusions are threefold. First, our multi-tier self-adaptive solutions scale well and are significantly more cost-effective than the static allocation for heavy and unbalanced workloads.
Second, the simplified and inaccurate single-resource performance model leads to very conservative allocation decisions, which ultimately compromise its cost-effectiveness compared to the multi-tier self-adaptive approaches and, commonly, to the multi-tier static approach. This was true even for homogeneous and balanced workloads. Finally, our multi-tier approaches are robust and can be applied for capacity management of virtualized environments subjected to capacity variations due to the local execution of management and security-related tasks.

Possible directions for future work include: (a) extending our models for alternative application traffic patterns and to capture each tier's specific resources individually; (b) including operational costs (e.g., energy) in our framework; (c) designing richer business models; (d) further evaluating the solution for environments under stress (i.e., attacks); and (e) prototyping the capacity manager in a real system.

ACKNOWLEDGMENT

This work was developed in collaboration with HP Brazil R&D.

REFERENCES

[1] J. Ross and G. Westerman, "Preparing for Utility Computing: The Role of IT Architecture and Relationship Management," IBM Systems Journal, vol. 43, no. 1, pp. 5-19, 2004.
[2] J. Chase, D. Anderson, P. Thakar, A. Vahdat, and R. Doyle, "Managing Energy and Server Resources in Hosting Centers," in 18th ACM SOSP, Banff, Canada, 2001.
[3] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of Virtualization," in 19th ACM SOSP, Bolton Landing, NY, 2003.
[4] A. Whitaker, M. Shaw, and S. Gribble, "Scale and Performance in the Denali Isolation Kernel," in 5th OSDI, Boston, MA, 2002.
[5] G. Banga, P. Druschel, and J. Mogul, "Resource Containers: a New Facility for Resource Management in Server Systems," in 3rd OSDI, New Orleans, LA, 1999.
[6] J. Wilkes, J. Mogul, and J. Suermondt, "Utilification," in 11th ACM SIGOPS European Workshop: Beyond the PC, Leuven, Belgium, 2004.
[7] X. Liu, X. Zhu, S. Singhal, and M. Arlitt, "Adaptive Entitlement Control to Resource Containers on Shared Servers," in 9th IFIP/IEEE IM, Nice, France, 2005.
[8] B. Abrahao, V. Almeida, J. Almeida, A. Zhang, D. Beyer, and F. Safai, "Self-Adaptive SLA-Driven Capacity Management for Internet Services," in 10th IEEE/IFIP NOMS, Vancouver, Canada, 2006.
[9] J. Almeida, V. Almeida, D. Ardagna, C. Francalanci, and M. Trubian, "Resource Management in the Autonomic Service-Oriented Architecture," in 3rd IEEE ICAC, Dublin, Ireland, 2006.
[10] Z. Liu, M. S. Squillante, and J. L. Wolf, "On Maximizing Service-Level-Agreement Profits," in 3rd ACM Conference on Electronic Commerce, Tampa, Florida, 2001.
[11] D. Villela, P. Pradhan, and D. Rubenstein, "Provisioning Servers in the Application Tier for e-commerce Systems," in 12th IEEE International Workshop on Quality of Service, Passau, Germany, 2004.
[12] D. Menascé and M. Bennani, "Autonomic Virtualized Environments," in IEEE International Conference on Autonomic and Autonomous Systems, Silicon Valley, CA, 2006.
[13] L. Cherkasova and P. Phaal, "Session-Based Admission Control: A Mechanism for Peak Load Management of Commercial Web Sites," IEEE Transactions on Computers, vol. 51, no. 6, pp. 669-685, 2002.
[14] D. Menascé, V. Almeida, R. Fonseca, and M. Mendes, "Business-Oriented Resource Management Policies for e-commerce Servers," Performance Evaluation, vol. 42, no. 2-3, pp. 223-239, 2000.
[15] J. Rolia, X. Zhu, M. Arlitt, and A. Andrzejak, "Statistical Service Assurances for Applications in Utility Grid Environments," Performance Evaluation, vol. 58, no. 2+3, pp. 319-339, 2004.
[16] F. Baskett, M. Chandy, R. Muntz, and F. Palacios, "Open, Closed, and Mixed Networks of Queues with Different Classes of Customers," Journal of the ACM, vol. 22, no. 2, pp. 248-260, 1975.
[17] J. Rolia and K. Sevcik, "The Method of Layers," IEEE Transactions on Software Engineering, vol. 21, no. 8, pp. 689-700, 1995.
[18] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi, "An Analytical Model for Multi-Tier Internet Services and its Applications," in ACM SIGMETRICS, Banff, Canada, 2005.
[19] Y. Diao, J. Hellerstein, S. Parekh, H. Shaikh, M. Surendra, and A. Tantawi, "Modeling Differentiated Services of Multi-Tier Web Applications," in 14th IEEE MASCOTS, Washington, DC, 2006.
[20] B. Abraham and J. Ledolter, Statistical Methods for Forecasting. John Wiley and Sons, 1983.
[21] H. Perros and K. Elsayed, "Call Admission Control Schemes: A Review," IEEE Communications Magazine, vol. 34, no. 11, pp. 82-91, 2003.
[22] V. Paxson and S. Floyd, "Wide Area Traffic: the Failure of Poisson Modeling," IEEE/ACM Transactions on Networking, vol. 3, no. 3, pp. 226-244, 1995.
[23] L. Kleinrock, Queueing Systems. John Wiley and Sons, 1975.
[24] K. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications. John Wiley and Sons, 2002.
[25] R. Fourer, D. Gay, and B. Kernighan, AMPL: A Modeling Language for Mathematical Programming. Boyd and Fraser, 1993.
[26] P. Gill, W. Murray, and M. Saunders, "SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization," SIAM Journal on Optimization, vol. 12, no. 4, pp. 979-1006, 2002.