Cloud Auto-Scaling with Deadline and Budget Constraints




Preliminary version. Final version appears in Proceedings of the 11th ACM/IEEE International Conference on Grid Computing (Grid 2010). Oct 25-28, 2010. Brussels, Belgium.

Cloud Auto-Scaling with Deadline and Budget Constraints

Ming Mao, Jie Li, Marty Humphrey
Department of Computer Science, University of Virginia, Charlottesville, VA, USA 22904
{ming, jl3yh, humphrey}@cs.virginia.edu

Abstract: Clouds have become an attractive computing platform which offers on-demand computing power and storage capacity. Their dynamic scalability enables users to quickly scale the underlying infrastructure up and down in response to business volume, performance desires and other dynamic behaviors. However, challenges arise when considering non-deterministic instance acquisition time, multiple VM instance types, the unique cloud billing model and user budget constraints. Planning enough computing resources for the user's desired performance at low cost, in a way that also adapts automatically to workload changes, is not a trivial problem. In this paper, we present a cloud auto-scaling mechanism to automatically scale computing instances based on workload information and performance desire. Our mechanism schedules VM instance startup and shut-down activities. It enables cloud applications to finish submitted jobs within the deadline by controlling the underlying instance numbers, and reduces user cost by choosing appropriate instance types. We have implemented our mechanism on the Windows Azure platform and evaluated it using both simulations and a real scientific cloud application. Results show that our cloud auto-scaling mechanism can meet user-specified performance goals at lower cost.

Keywords: cloud computing; auto-scaling; dynamic scalability; integer programming

I. INTRODUCTION

Clouds have become an attractive computing platform which offers on-demand computing power and storage capacity. Their dynamic scalability enables users to scale the underlying infrastructure up and down in response to business volume, performance desires and other dynamic behaviors. To offload cloud administrators' burden and automate scaling activities, cloud computing platforms also offer mechanisms to automatically scale VM capacity up and down based on user-defined policies, such as AWS auto-scaling [1]. Using auto-scaling, users define triggers by specifying performance metrics and thresholds. Whenever the observed performance metric is above or below the threshold, a predefined number of instances is added to or removed from the application. For example, a user can define a trigger like "Add 2 instances when CPU usage is above 60% for 5 minutes." Such automation greatly enhances the cloud's dynamic scalability benefits. It transparently adds more resources to handle increasing workload and shuts down unnecessary machines to save cost. In this way, users do not have to worry about capacity planning; the underlying resource capacity adapts to the application's real-time workload.

However, challenges arise when people look deeper into these mechanisms. In cloud auto-scaling mechanisms, the performance metrics normally include CPU utilization, disk operations, bandwidth usage, etc. Such infrastructure-level performance metrics are good indicators of system utilization, but they cannot clearly reflect the quality of service a cloud application is providing, or tell whether the performance meets the user's expectation. Choosing an appropriate performance metric and finding a precise threshold is not a straightforward task, and cases become more complicated if the workload pattern is continuously changing. Moreover, considering individual utilization information only may not be robust to scale [9]. For example, a cluster going from 1 to 2 instances increases capacity by 100%, while going from 10 to 11 instances increases capacity by only 10%. Current simple auto-scaling mechanisms normally ignore such non-constant effects when adding a fixed number of resources.
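To make this baseline concrete, the kind of rule such services evaluate can be sketched as follows. This is a minimal Python illustration, not any provider's actual API; the window length, thresholds and step sizes are assumptions taken from the example trigger above.

```python
from collections import deque

# Minimal sketch of a threshold trigger such as
# "Add 2 instances when CPU usage is above 60% for 5 minutes".
# All constants below are illustrative assumptions.
WINDOW_MINUTES = 5          # the condition must hold for this long
UPPER, LOWER = 60.0, 20.0   # CPU utilization thresholds (%)
STEP_OUT, STEP_IN = 2, 1    # fixed instance counts to add / remove

samples = deque(maxlen=WINDOW_MINUTES)  # one CPU sample per minute

def on_cpu_sample(cpu_percent, current_instances):
    """Return the desired instance count after one metric sample."""
    samples.append(cpu_percent)
    if len(samples) < WINDOW_MINUTES:
        return current_instances          # not enough history yet
    if all(s > UPPER for s in samples):
        return current_instances + STEP_OUT
    if all(s < LOWER for s in samples):
        return max(1, current_instances - STEP_IN)
    return current_instances
```

Because the rule always applies the same fixed step regardless of cluster size, it exhibits exactly the non-constant effect described above: adding 2 instances doubles a 2-instance cluster but barely changes a 50-instance one.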
Another factor such auto-scaling mechanisms overlook is the time lag to boot a VM instance. Though instance acquisition requests can be made at any time, the instances are not immediately available to users. Such instance startup lag typically involves finding the right spot for the requested instances in the cloud data center, downloading the specified OS image, booting the virtual machine, finishing network setup, etc. Based on our experiences and research [5], it can take as long as 10 minutes to start an instance in Windows Azure, and such startup lag can change over time. In other words, it is very likely that users request instances too late if they do not take the instance startup time into account.

Cost is also an issue worth careful consideration when using the cloud. Cloud computing instances are charged by the hour, and a fraction of an hour is counted as a whole hour. Therefore, it can be a waste of money to shut a machine down before a whole hour of operation. In addition to the full-hour billing principle, clouds now usually offer various instance types, such as high-CPU and high-I/O instances. Choosing appropriate instance types based on the application workload can further save the user money and improve performance. We believe cloud scaling activities can be done better by considering different instance types rather than just manipulating instance numbers.

In this paper, we present a cloud dynamic scaling mechanism which automatically scales the underlying cloud infrastructure up and down to accommodate changing workloads, based on an application-level performance metric: job deadline. During the scaling activities, the mechanism tries to form a cheap VM startup plan by choosing appropriate instance types, which can save cost compared to considering only one instance type.

The rest of this paper is organized as follows. Section II introduces related work. Section III identifies cloud scaling characteristics and describes the application performance model. Section IV formalizes the problem and details our implementation architecture on the Windows Azure platform. Section V evaluates our mechanism using both simulations and a real scientific application. Section VI concludes the paper and describes future work.

II. RELATED WORK

There have been a number of works on dynamic resource provisioning in virtualized computing environments [9][10][12][4]. Feedback control theory has been applied in these works to create autonomic resource management systems. In [9][10], a target range is proposed to solve the control stability issue. Further, [9] focuses on control system design. It points out that resizing instances is a coarse-grained actuator when applying control theory in a cloud environment, and proposes proportional thresholding to fix the non-constant effect problem. These works use infrastructure-level performance metrics and mainly focus on applying control theory in the cloud environment. They do not consider various VM types or the total running cost. In [8], dynamic scaling is explored for cloud web applications. They considered web-server-specific scaling indicators, such as the number of current users and the number of current connections. The work uses simple triggers and thresholds to determine the instance number, and likewise does not consider VM type information or budget constraints. In [4], they considered extending computing capacity using cloud instances and compared the incurred cost of different policies.

Particularly in cloud computing, dynamic scalability becomes more attractive and practical because of the unlimited resource pool. Most cloud providers offer cloud management APIs to let users control their purchased computing infrastructure programmatically, but few of them directly offer a complete solution for automatic scaling activities in the cloud. Amazon Web Services auto-scaling is one of them. AWS auto-scaling is a mechanism to automatically scale virtual machine instances up and down based on user-defined triggers [1]. Triggers describe the thresholds of an observed performance metric, which may include CPU utilization, network usage and disk operations. Whenever the monitored metric is above the upper limit, a predefined number of instances is started, and when it is below the lower limit, a predefined number of instances is shut down. Another work worth mentioning here is RightScale [3]. It works as a broker between users and cloud providers by providing unified interfaces. Users can interact with multiple cloud providers on one screen. The nicely designed user interface, highly customized OS images and many predefined utility scripts enable users to deploy and manage their cloud applications quickly and conveniently. For dynamic scaling, they borrow the idea of triggers and thresholds but broadly extend the choice of scaling indicators. In addition to system utilization metrics, they support some popular middleware performance metrics, such as MySQL connections, Apache HTTP server requests and DNS queries. However, these scaling indicators may not be able to support all application types, and not all of them directly reflect quality-of-service requirements. Also, they do not consider cost explicitly. To the best of our knowledge, our work is the first auto-scaling mechanism which addresses both performance and budget constraints in the cloud.

III. CLOUD SCALING
A. Cloud Scaling Characteristics and Analysis

As a computing platform, clouds have distinct characteristics compared to utility computing and grid computing. We have identified the following characteristics which can largely affect the way people use cloud platforms, especially in cloud scaling activities.

Unlimited resources, limited budget. Clouds offer users unlimited computing power and storage capacity. Though by default the resource capacity is capped at some number, e.g., 20 compute units per account in Windows Azure, such a usage cap is not a hard constraint; cloud providers allow users to negotiate for more resources. Unlimited resources enable applications to scale to extremely large sizes. On the other hand, these unlimited resources are not free. Every cycle used and byte transferred is going to appear on the bill. A budget cap is a necessary constraint for users to consider when they deploy applications in clouds. Therefore, a cloud auto-scaling mechanism should explicitly consider user budget constraints when acquiring resources.

Non-ignorable VM instance acquisition time. Though cloud instance acquisition requests can be made at any time and computing power can be scaled up to extremely large sizes, this does not mean the cloud scales fast. Based on our previous experiences and research [5], it can take around 10 minutes or more from an instance acquisition request until the instance is ready to use. Moreover, such instance startup lag keeps changing over time. On the other side, VM shutdown time is quite stable, around 2-3 minutes in Windows Azure. This implies that users have to consider two issues in cloud dynamic scaling activities. First, count in the computing power of pending instances. If an instance is in pending status, it is going to be ready soon; ignoring pending instances may result in booting more instances than necessary and therefore wastes money. Second, count how long each pending instance has been acquired and how much longer it needs before it is ready to use. If the startup delay can be well observed and predicted, the application administrator can acquire machines in advance and prepare early for workload surges.

Full-hour billing model. The pay-as-you-go billing model is attractive because it saves money when users shut down machines. However, VM instances are always billed by the hour, and fractional consumption of an instance-hour is counted as a full hour. In other words, 1 minute and 60 minutes of usage are both billed as 1 hour, and if an instance is started and shut down twice within an hour, the user is charged for two instance-hours. The shutdown time therefore can greatly affect cloud cost.
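The effect of this rounding can be shown in a few lines of Python (a minimal sketch; the session timings are made up, and the $0.085/hour rate anticipates the General instance price used in the evaluation of Section V):

```python
import math

def billed_hours(sessions):
    """Each session is (start_min, stop_min); every started
    instance-hour is billed as a full hour."""
    return sum(math.ceil((stop - start) / 60.0) for start, stop in sessions)

RATE = 0.085  # $/hour, General-instance rate from Table II

# One instance kept up for 61 minutes: 2 billed hours.
print(billed_hours([(0, 61)]) * RATE)            # 0.17
# Started and stopped twice inside one hour: also 2 billed hours.
print(billed_hours([(0, 10), (30, 40)]) * RATE)  # 0.17
# Shut down just before the full hour: 1 billed hour, no waste.
print(billed_hours([(0, 59)]) * RATE)            # 0.085
```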

If cloud auto-scaling mechanisms do not consider this factor, they can easily be tricked by fluctuating workloads. Therefore, a reasonable policy is that whenever an instance is started, it should be shut down only when it approaches a full hour of operation.

Multiple instance types. Instead of offering one suits-all instance type, clouds now normally offer various instance types for users to choose from. Users can start different types of instances based on their applications and performance requirements. For example, EC2 instances are grouped into three families: standard, high-CPU and high-memory. Standard instances are suitable for general-purpose applications. High-CPU instances are well suited for computing-intensive applications, like image processing. High-memory instances are more suitable for I/O-intensive applications, like database systems and memory caching applications. One important point is that instances are charged differently, and not necessarily proportionally to their computing power. For example, in EC2, c1.medium costs twice as much as m1.small, but it offers 5 times the compute power of m1.small. Thus for computing-heavy jobs it is cheaper to use c1.medium instead of the least expensive m1.small. Therefore, users need to choose instance types wisely. Choosing cost-effective instance types can both improve performance and save cost.

B. Cloud Application Performance Model

In this paper, we consider the problem of controlling cloud application performance by automatically manipulating the running instance types and instance numbers. Instead of using infrastructure-level performance metrics, we target an application-level performance metric: the response time of a submitted job. We believe a direct performance metric can better reflect users' performance requirements, and therefore can better instruct cloud scaling mechanisms in precise VM scheduling. At the same time, we introduce cost as the other goal in our cloud scaling mechanism. Our problem statement is how to enable cloud applications to finish all submitted jobs before the user-specified deadline with as little money as possible. To keep the cloud application performance model general and simple, we consider a single queue model as shown in Fig. 1. We also make the following assumptions.

- The workload consists of non-dependent jobs submitted to the job queue. Users have no knowledge about the incoming workload in advance.
- Jobs are served in FCFS manner and are fairly distributed among the running instances. Every instance can only process a single job at a time.
- All jobs have the same performance goal, e.g., a 1-hour response time deadline (from submission to finish). The deadline can be dynamically changed.
- VM instance acquisition requests can be made at any time, but it may take a while for a newly requested pending instance to become ready to use. We call this time the VM startup delay.
- There can be different classes of jobs, such as computing-intensive jobs and I/O-intensive jobs. A job class may have different processing times on different instance types. For example, a computing-intensive job runs faster on high-CPU machines than on high-I/O machines.
- The job queue is large enough to hold all unprocessed jobs, and its performance scales well with an increasing number of instances.

Figure 1. Cloud application performance model

IV. SOLUTION & ARCHITECTURE

Based on the problem description in the previous section, we formalize the problem in this section and present our implementation architecture in Windows Azure.

A. Solution

One of the key insights to this problem is that, to finish all submitted jobs before the deadline, the auto-scaling mechanism needs to ensure that the computing power of all acquired VM instances is large enough to handle the workload. We summarize the key variables in Table I.
TABLE I. KEY VARIABLES USED IN THE CLOUD PERFORMANCE MODEL

  Variable   Meaning
  J_i        the i-th job class
  n_i        the number of J_i jobs submitted in the queue
  V_v        the v-th VM type
  I_j        the j-th instance (running or pending)
  c_v        the cost per hour of VM type V_v
  d_v        the average startup delay of VM type V_v
  s_j        the time instance I_j has already spent in pending status
  t_{i,v}    the average processing time of job J_i running on V_v
  D          deadline (e.g., 1 hour or 100 seconds)
  C          budget constraint (dollars/hour)
  W          workload: jobs that need to be finished
  P          computing power: jobs that can be finished

Using the above notation, we define the system workload as a vector W. For each job class J_i, there are n_i submitted jobs:

  W = (J_i, n_i)

The computing power of an instance I_j can be represented as a vector P_j. The idea is to calculate how many jobs of each job class could be finished before the deadline on instance I_j. We use the ratio of the deadline to the individual completion time (assuming all jobs were finished by that instance) to approximate the number of jobs that can be finished.
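To make the bookkeeping concrete, one way to represent W and the per-instance power vector in code is sketched below (Python; this is our illustration rather than the paper's implementation, and the dictionary layout, helper name and processing times are assumptions chosen to match the worked example later in this section). It applies the deadline-to-completion-time ratio just described, including the pending-instance correction formalized next.

```python
# Workload W: job class -> number of queued jobs.
W = {"J1": 60, "J2": 60, "J3": 60}

# t[job_class][vm_type]: average processing time in seconds (assumed).
t = {"J1": {"V1": 360, "V2": 360, "V3": 360},
     "J2": {"V1": 720, "V2": 180, "V3": 360},
     "J3": {"V1": 180, "V2": 720, "V3": 360}}

def power(vm_type, D, pending_remaining=0.0):
    """Jobs of each class this instance can finish before deadline D
    (seconds). For a pending instance, the remaining startup delay
    d_type - s is subtracted from its useful time."""
    useful = max(0.0, D - pending_remaining)
    return {j: useful / t[j][vm_type] for j in t}

# A running V2 instance with a 1-hour deadline:
print(power("V2", 3600))          # {'J1': 10.0, 'J2': 20.0, 'J3': 5.0}
# A pending V2 instance still 600 s away from being ready:
print(power("V2", 3600, 600))     # roughly {'J1': 8.3, 'J2': 16.7, 'J3': 4.2}
```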

For a running instance I_j, its computing power is

  P_j = (J_i, n_i), where n_i = D / t_{i, type(I_j)}

For an instance whose status is pending, its computing power is reduced by the remaining startup delay, where s_j is the time already spent starting the instance:

  P_j = (J_i, n_i), where n_i = (D - (d_{type(I_j)} - s_j)) / t_{i, type(I_j)}

Therefore, the total computing power of the current instances can be represented as P = sum_j P_j. Clearly, if W > P, we need to start more instances P' (the prime marks new instances) to handle the increased workload. The problem becomes finding a VM instance combination plan in which

  P' >= W - P

At the same time, we also want to minimize the cost we spend on these newly added instances:

  Min( sum c_{type(I')} )

In cases where there is insufficient budget, the idea is to generate as much computing power as possible within the budget constraint:

  Max( P' )  subject to  sum c_{type(I')} <= C - sum c_{type(I)}

When an instance I_s is approaching a full hour of operation, we need to decide whether or not to shut the machine down. In this case, we can calculate the computing power without instance I_s and compare it with the workload. If the remaining computing power is still big enough to handle the workload, we can remove the instance:

  P - P_s >= W

To better explain the problem, we can go through a simple example. Assume we have three job classes (J_1, J_2, J_3) and three VM types (V_1, V_2, V_3). Currently, the workload in the system is [60, 60, 60] and there are two running instances, I_1 (type V_1) and I_2 (type V_2). Our goal is to find a VM type combination [n'_1, n'_2, n'_3] whose computing power is greater than or equal to the target computing power and whose cost is minimal among all possible VM type combinations. With I_1 able to finish [10, 5, 20] jobs and I_2 able to finish [10, 20, 5] jobs of each class before the deadline, the target computing power P' >= W - P is

  J_1: x >= 60 - 10 - 10 = 40
  J_2: y >= 60 - 5 - 20 = 35
  J_3: z >= 60 - 20 - 5 = 35

With single instances of V_1, V_2 and V_3 able to finish [10, 5, 20], [10, 20, 5] and [10, 10, 10] jobs of each class respectively, the plan [n'_1, n'_2, n'_3] must satisfy

  J_1: 10 n'_1 + 10 n'_2 + 10 n'_3 >= x = 40
  J_2:  5 n'_1 + 20 n'_2 + 10 n'_3 >= y = 35
  J_3: 20 n'_1 +  5 n'_2 + 10 n'_3 >= z = 35

where we minimize

  Min( c_1 n'_1 + c_2 n'_2 + c_3 n'_3 )

subject to the budget constraint

  c_1 n'_1 + c_2 n'_2 + c_3 n'_3 + c_{type(I_1)} + c_{type(I_2)} <= C

From the above analysis, our cloud auto-scaling mechanism reduces to several integer programming problems. We try to minimize the cost or maximize the computing power under either computing power constraints or budget constraints. There are quite a few standard approaches to solving integer programming problems, such as cutting-plane and branch-and-bound methods [13][14]. We will not duplicate the details here.

In addition to determining the number and type of VM instances, there are some other cases, like admission control and deadline miss handling, which are also interesting to consider in cloud auto-scaling mechanisms. However, our work's intention is not to create a hard real-time cloud system in which all job deadlines are guaranteed; we focus on automatic resource provisioning based on both performance goals and budget constraints. Deadline is just the metric we choose, because it better reflects users' performance desires. Therefore, in real practice we believe these are more like policy questions, and users can choose their own policies based on their applications. For example, to maintain service availability and basic computing power, users can decide on a minimum number of running instances; in other words, even if there is no workload, a cloud application will always have at least one running instance. For admission control, when there is insufficient budget, the auto-scaling mechanism can either accept the job and try to run with the maximum computing power within the user budget constraint, or simply deny the job. In either case, users may want to get a notification from the mechanism. For deadline miss handling, users can either leave it alone or allow the auto-scaling mechanism to acquire as many instances as possible to speed up the remaining processing.
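As a concrete sketch of this optimization, the short Python example below solves the instance-combination problem from the worked example by brute-force enumeration rather than a real cutting-plane or branch-and-bound solver (our implementation uses an integer-programming solver instead; the prices and the enumeration bound here are illustrative assumptions):

```python
from itertools import product

# Per-instance jobs finished before the deadline, per job class
# (columns J1, J2, J3), from the worked example; prices are assumed.
POWER = {"V1": (10, 5, 20), "V2": (10, 20, 5), "V3": (10, 10, 10)}
PRICE = {"V1": 0.085, "V2": 0.17, "V3": 0.17}

def cheapest_plan(target, max_per_type=8):
    """Cheapest counts (n'_1, n'_2, n'_3) whose combined power covers
    `target` in every job class. Brute force is fine at this scale;
    the paper solves the same model with integer programming."""
    best, best_cost = None, float("inf")
    types = sorted(POWER)
    for counts in product(range(max_per_type + 1), repeat=len(types)):
        covered = all(
            sum(n * POWER[v][k] for n, v in zip(counts, types)) >= target[k]
            for k in range(len(target))
        )
        cost = sum(n * PRICE[v] for n, v in zip(counts, types))
        if covered and cost < best_cost:
            best, best_cost = dict(zip(types, counts)), cost
    return best, best_cost

# Target computing power from the worked example: [40, 35, 35].
plan, cost = cheapest_plan((40, 35, 35))
print(plan, round(cost, 3))   # {'V1': 3, 'V2': 1, 'V3': 0} 0.425
```

With these assumed prices the cheapest plan is three V_1 instances plus one V_2 instance, illustrating how the optimizer mixes types rather than buying only the cheapest one.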
In our implementation, we have implemented these policies, let users configure which policy is most appropriate for their cases, and allow users to implement their own policies as well.

B. Architecture

We have designed and implemented our cloud auto-scaling mechanism in Windows Azure [2]. Figure 2 shows the architecture of our implementation. The implementation includes four components: the performance monitor, the history repository, the auto-scaling decider and the VM manager. The performance monitor observes the current workload in the system, collects actual job processing time and arrival pattern information, and updates the history repository. The VM manager works as the adapter between our auto-scaling mechanism and cloud providers. It monitors all pending and ready VM instances, and updates the history repository with the actual startup times of different VM types. Moreover, it executes the VM startup plan generated by the auto-scaling decider and directly invokes the cloud provider's resource provisioning APIs; in our case, this is the Windows Azure management API. Our intention is that the VM manager hides all cloud provider details and can easily be replaced with other cloud adapters. Such information hiding enhances the reusability and customizability of our implementation when working with different cloud providers.
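The adapter role of the VM manager can be pictured with a minimal interface sketch (Python here; the class and method names are our illustration, not the actual implementation, which calls the Windows Azure management API):

```python
from abc import ABC, abstractmethod

class CloudAdapter(ABC):
    """Provider-specific details live behind this boundary, so the
    decider never talks to a cloud API directly."""

    @abstractmethod
    def start_instances(self, vm_type: str, count: int) -> None: ...

    @abstractmethod
    def stop_instance(self, instance_id: str) -> None: ...

    @abstractmethod
    def list_instances(self) -> list:
        """Return (instance_id, vm_type, status, minutes_running)
        tuples; status is 'pending' or 'ready'."""

class AzureAdapter(CloudAdapter):
    # Hypothetical stub; a real version would invoke the Windows
    # Azure management API to change the instance count.
    def start_instances(self, vm_type, count): ...
    def stop_instance(self, instance_id): ...
    def list_instances(self): return []
```

Swapping in a different subclass is all that is needed to target another provider, which is the reusability point made above.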

The history repository contains two data structures. One is the configuration file, which includes the application deadline, budget constraint, monitor execution interval, etc. As shown in Fig. 2, application administrators can dynamically control the behavior of the cloud auto-scaling mechanism by changing the configuration file. The other data structure is the historical data table, which records the historical job processing time and arrival pattern information provided by the performance monitor, and the instance startup delay information provided by the VM manager. By maintaining historical data, the repository improves the precision of the input parameters and also helps the decider prepare early for possible workload surges.

The decider is the core of our cloud auto-scaling mechanism. Relying on real-time workload and VM status information from the performance monitor and VM manager, as well as configuration parameters and historical records from the history repository, it solves the integer programming problem we formalized in the previous section and generates a VM startup plan for the VM manager to execute. The VM startup plan can be empty, because the workload may be well handled by the existing instances, or it can contain instance type and number pairs to notify the VM manager to acquire enough computing power. In our current implementation, we use Microsoft Solver Foundation [11] to solve the integer programming problem. Instance acquisition actions are initiated by the decider: after every sleep interval, it invokes the logic to determine the VM startup plan. On the other side, instance release actions are initiated by the VM manager, because it monitors which instances are approaching a full hour of operation and could be potential shut-down targets. But it has to ask the decider whether the remaining computing power is large enough to handle the workload. We have published our current implementation as a library and plugged it into the MODIS application [7]. The evaluation of our mechanism in this real scientific application can be found in the next section.

Figure 2. Architecture of cloud auto-scaling in Azure

V. EVALUATION

In this section, we evaluate our mechanism using both simulations and a real scientific application (MODIS) running in Windows Azure. Through the simulation framework, we can easily control the input parameters, such as workload pattern and job processing time, which helps to identify the key factors in our mechanism. Moreover, using simulation greatly reduces the evaluation time and cost. The scientific application tests our mechanism's performance in a real environment. In our evaluation, we simulated three types of jobs: mix, computing-intensive and I/O-intensive. At the same time, we simulated three types of machines: General, High-CPU and High-IO. We summarize the simulation parameters in Table II. The simulation data is derived from the pricing tables and instance descriptions of EC2. For example, in EC2, a c1.medium instance costs twice as much as m1.small, but it offers 5 times the compute power of m1.small [1]. In our case, we assume mix jobs are half computation and half I/O, and the speedup factor of the more powerful machines is 4-5.

TABLE II. SIMULATION PARAMETERS AND AVERAGE PROCESSING TIMES

                                  Mix                 Computing-intensive   I/O-intensive
  Arrival rate                    Avg 300 jobs/hour   Avg 300 jobs/hour     Avg 300 jobs/hour
                                  STD 50 jobs/hour    STD 50 jobs/hour      STD 50 jobs/hour
  General ($0.085/hour,           Average 300s        Average 300s          Average 300s
    startup delay 600s)           STD 50s             STD 50s               STD 50s
  High-CPU ($0.17/hour,           Average 210s        Average 75s           Average 300s
    startup delay 720s)           STD 25s             STD 15s               STD 50s
  High-IO ($0.17/hour,            Average 210s        Average 300s          Average 75s
    startup delay 720s)           STD 25s             STD 50s               STD 15s
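For readers who want to reproduce this setup, a sketch of how such a workload can be generated from the Table II parameters follows (Python; treating arrivals and processing times as Gaussian draws is our assumption based on the Avg/STD parameters):

```python
import random

random.seed(0)

# Table II parameters: hourly arrival rate and per-machine processing
# time (seconds) for mix jobs.
ARRIVAL_AVG, ARRIVAL_STD = 300, 50
PROC = {"General": (300, 50), "High-CPU": (210, 25), "High-IO": (210, 25)}

def jobs_this_hour():
    """Draw the number of mix jobs arriving in one hour."""
    return max(0, int(random.gauss(ARRIVAL_AVG, ARRIVAL_STD)))

def processing_time(machine):
    """Draw a processing time for one mix job on `machine`."""
    avg, std = PROC[machine]
    return max(1.0, random.gauss(avg, std))

for hour in range(3):
    n = jobs_this_hour()
    sample = processing_time("General")
    print(f"hour {hour}: {n} jobs, e.g. {sample:.0f}s on General")
```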
A. Deadline

For the deadline performance goal, we consider two cases.

1) Stable workload with changing deadline. We generate the workload using Table II and plot the job response times in Fig. 3. Every data point in the graph reflects the job response times in a 5-minute interval, and we record the average, minimum and maximum response time over all jobs finished in that interval. The deadline is first set to 3600s, then changed to 5400s, and finally switched back. The purpose is to evaluate our mechanism's reaction to dynamic changes in the user's performance requirement. Fig. 3 shows that more than 95% of jobs are finished within the deadline, and most of the misses happen at the second deadline change. This is mainly because our auto-scaling mechanism runs every 5 minutes and VM instances only become ready 10-12 minutes after the acquisition requests. Besides, we also calculate the instantaneous instance utilization rate. Job processing is considered utilized, while all other cases, such as pending and idling, are considered unutilized. The high utilization rate (average 94%) shows that our mechanism does not aggressively acquire instances to guarantee the deadline; 6% of the time is spent on VM startups.

2) Changing workload with fixed deadline. In this test, we fix the deadline at 3600s and create three workload peaks. The base workload is 300 mix jobs per hour. The first workload peak adds another 300 mix jobs per hour. The second peak adds 300 computing-intensive jobs per hour, and the third one adds 300 I/O-intensive jobs per hour. The purpose of this

test is to evaluate our mechanism's reaction to suddenly increasing workload and job type changes. Such a workload pattern is normally seen in large-volume data processing applications, in which data computation and analysis are performed in the daytime, and data backups and movements are performed at night and on holidays. From Fig. 4, we can see that the deadline goal is well met for all three workload peaks. When the workload goes back to normal, the instances over-acquired during peak moments quickly reduce the job response time. As more and more unnecessary instances are shut down (as they approach full-hour operation), the response time goes back to the average.

Figure 3. Stable workload with changing deadline

Figure 4. Changing workload with fixed deadline

B. Cost

Using the same evaluation as we did for changing workload with fixed deadline, we compare the cost of using different types of VM instances. The VM type combinations are illustrated in Table III. Fig. 5 shows the comparison result.

TABLE III. INSTANCE TYPE CHOICES AND TOTAL COST

              VM types                      Total cost ($)   % more than optimal
  Choice #1   General                       98.52            (43%)
  Choice #2   High-CPU                      128.86           (87%)
  Choice #3   High-IO                       129.71           (88%)
  Choice #4   General, High-CPU, High-IO    78.62            (14%)
  Optimal     General, High-CPU, High-IO    68.85

To evaluate the performance of our mechanism, in addition to the four choices, we also calculate the optimal cost possible for the same workload and compare our solution with it. The optimal solution can be obtained because we know the workload in advance and we assume we can always put a job on the most cost-effective machine, e.g., put computing-intensive jobs on High-CPU instances for processing. From Fig. 5, we can see that by considering all available instance types (Choice #4), our mechanism can adapt to the workload changes and choose cost-effective instances. In this way, the real-time cost always stays close to the optimal cost. On the other side, General instances always perform on average for all three workload peaks, while High-CPU and High-IO can only save cost on their preferred workload surges. Fig. 6 shows the accumulated cost. Choice #4 incurs 14% more cost than the optimal solution and saves 20% compared to the General instance choice and 45% compared to High-CPU and High-IO. Because of symmetry, High-CPU and High-IO instances end up with almost the same cost. General instances have a lower cost on average; therefore, in the long run, they outperform the High-CPU and High-IO cases. By choosing appropriate instance types, Choice #4 incurs less cost in all three workload peaks, like the optimal solution; hence, it outperforms all the other cases. There are two reasons why our solution cannot make the optimal decision. First, the auto-scaling decider does not know the future workload and can only make decisions locally. Second, it cannot control which running instance processes a job.

Figure 5. Instantaneous cost of changing workload & fixed deadline

Figure 6. Accumulated cost of changing workload & fixed deadline

C. MODIS

In addition to simulations, we have also applied our approach to a real scientific cloud application, MODIS [7]. MODIS is a cloud application built on the Windows Azure platform for large-volume biophysical data processing. It integrates data from ground-based sensors with Moderate Resolution Imaging Spectroradiometer satellite data. It is now used by the biometeorology lab at UC Berkeley. We first introduce the MODIS workload and the configuration parameters applied. The MODIS workload can be understood in the following way: 20XX indicates the year, Terra and Aqua represent satellite images, and (x-y) represents the period from day x to day y. For all our tests, we use all 15 available tile images in the MODIS system for a single day's data processing. For example, Terra 2004 (10-12) means processing all 15 tiles of Terra images from 2004 Jan 10th to Jan 12th. This implies that in total 45 (15 x 3) jobs are submitted at once. In our evaluation, we find the actual job processing times range from 10 sec to 13 min with an average of 5 min, and jobs are processed most cost-effectively on small instance types. We set the performance monitor interval to 1 min, the decider interval to 5 min, and the initial average VM delay to 15 min, and we only notify users when a deadline is missed.

In the MODIS evaluation, we run both moderate scale (up to 20 instances) and large scale (up to 90 instances) tests. In the moderate scale evaluation, two test cases are randomly selected: one is Terra satellite 2004 (10-12) and the other is Aqua 2008 (30-32). We record the test results in Table IV, including both performance and instance hours consumed (i.e., cost). The table shows that the 2 and 3 hour deadline goals are better met than the 1 hour deadline for the same workloads. After investigating the VM instance startup history, we find this is largely because the instance startup delay was beyond our expectation. For example, in the 1 hour deadline tests, the average startup delay is around 22 minutes; some instances even took 50 minutes to be ready. There is little time left for our mechanism to react in such cases. On the contrary, in the longer deadline tests, our mechanism acquired fewer instances, and hence the results are less affected by startup delay variances. In both test cases, the theoretical computing power needed is 4 instance hours (if all jobs were processed by a single instance). All tests actually acquired more than this, e.g., 9 or 10 instance hours in the 1 hour deadline test cases. This is caused by making up for VM startup delay and the imprecision of the initial job processing time configuration. With longer deadlines, such over-acquisition is corrected, because fewer instances are acquired and the job processing time is also updated from the historical table. Therefore, the longer deadline test cases also incur less cost.

TABLE IV. MODIS MODERATE SCALE EVALUATION

                              1 hour deadline    2 hour deadline   3 hour deadline
  Terra 2004 (10-12)          18 min late        8 min early       2 min early
  Total 45 jobs               9 C.H. or $1.08    6 C.H. or $0.72   5 C.H. or $0.60
  (4 C.H.* or $0.48)
  Aqua 2008 (30-32)           15 min late        2 min early       29 min early
  Total 45 jobs               10 C.H. or $1.20   7 C.H. or $0.84   5 C.H. or $0.60
  (4 C.H. or $0.48)

  * C.H. = computing hour; 1 C.H. = $0.12 in Windows Azure

For the large scale (up to 90 instances) MODIS evaluations, we performed two tests and recorded the results in Table V. Similar to the moderate scale evaluations, the longer deadline tests show better results. Again, unexpected VM startup delay is the dominating factor. We find Windows Azure has longer VM startup delays and larger variances in large-size instance acquisition cases. For example, in the Terra & Aqua 2006 (1-75) 2 hour deadline test, the average VM startup delay is 40 minutes, and there is one instance which was still not ready 2 hours later.
For the 2006 (1-150) 2 hour deadline test, our decider calculation shows 95 instances are needed, which is beyond our resource limit. This job is successfully identified and denied.

TABLE V. MODIS LARGE SCALE EVALUATION

                               2 hour deadline      4 hour deadline
  Terra & Aqua 2006 (1-75)     20 min late          6 min early
  Total 1125 jobs              170 C.H. or $20.40   132 C.H. or $15.84
  (93 C.H. or $11.16)
  Terra & Aqua 2006 (1-150)    Admission denied     22 min early
  Total 2250 jobs                                   243 C.H. or $29.16
  (185 C.H. or $22.20)

To better demonstrate our mechanism's working details, we present instance acquisition and release information for the test case Terra & Aqua 2006 (1-75) with a 4 hour deadline in Fig. 7. This test includes 1125 jobs in total, all submitted at time 0. As shown in the figure, after around 40 minutes, the decider had started 34 instances (instances 1-34) to handle the workload. The real instance acquisition took much longer than we had configured. Therefore, around 1.5 hours later, the decider started another 6 instances (instances 35-40) to make up for the unexpected startup delay. When approaching 2 full hours of operation, these 6 instances were shut down due to the decreased workload. After all jobs were finished, instances 1 to 34 were shut down as they approached 4 hours of operation. At that time, only one instance was kept alive to maintain service availability. In this case, the theoretical job processing time needed is 93 hours. The real instance hours consumed were 132 hours, with 36 hours spent on VM startup. Both the moderate and large scale tests show that longer deadlines yield better performance and incur less cost. This is because longer deadline tests are less affected by VM startup delay and have more chances to use the updated job processing times.

Figure 7. Instance acquisition and release

VI. CONCLUSION & FUTURE WORK

In this paper, we present a mechanism to dynamically scale cloud computing instances based on deadline and budget information. The mechanism automatically scales VM instances up and down by considering two aspects of a cloud application: performance and budget. From the performance perspective, our cloud auto-scaling mechanism enables cloud applications to finish all submitted jobs within the desired deadline by acquiring enough VM instances. From the cost perspective, it reduces user cost by acquiring appropriate instance types which cost less money, and by shutting down unnecessary instances as they approach full-hour operation. We interpreted instance startup plan generation as an optimization problem and used integer programming to solve it. We have designed and implemented our mechanism on the Windows Azure platform, and have evaluated it using both simulations and a real scientific application, MODIS. The evaluation results show that our mechanism can provision enough instances to meet users' deadline performance goals. Even in cases of dynamic deadline changes or sudden workload surges, it adapts well to the outside behavior: more than 90 percent of submitted jobs meet the deadline. In our solution, integer programming is used to identify the most cost-effective instance types based on the job composition of the incoming workload; therefore, our approach incurs less cost than fixed instance type choices. The cost comparison shows that choosing appropriate instance types can save 20%-45% compared to fixed instance types, while incurring about 15% more than the optimal cost. The MODIS evaluation shows that VM startup delay plays quite an important role in cloud auto-scaling mechanisms. Long, unexpected VM startup delays affect not only the performance but also the utilization rate, and therefore the cost, especially for short deadline cases. Workload and job processing time are also very important factors in our mechanism, because these two directly affect the number and type of provisioned instances. We use the history repository to improve their precision in our implementation.

In the future, one extension of our work is to support job-class-level deadlines and extend the cloud application performance model to multi-tier architectures. By considering each job class individually and controlling its execution instance, better performance can be achieved by running jobs on the most cost-effective instance types, saving more money than fair job distribution. Currently, we are trying to use multiple queues to submit jobs by class. In a multi-tier application environment, the amount of resources needed to achieve the QoS goals might be different at each tier and may also depend on the availability of resources in other tiers. In both cases, a global view of the application is needed to generate optimized resource provisioning plans. Second, in addition to on-demand pay-as-you-go instances, clouds now offer other types of instances as well, such as spot instances and reserved instances. Spot instances cost around 1/3 of regular instance prices; e.g., the average price of an m1.small spot instance is 3 cents an hour, while it costs 8.5 cents an hour for the same type of on-demand instance. The cheaper cost comes from the fact that cloud providers can automatically shut down users' spot instances if the spot price rises above the user's predefined bid price. Reserved instances are even cheaper in the long run, in exchange for paying a contract fee in advance. Complexities are added if cloud auto-scaling considers these cheaper instances, because, based on our experience, spot instances take even longer and more non-deterministic time to start.
The auto-scaling controller needs to consider all these factors to make a VM instance scheduling decision. To maintain service availability, reserved instances can be treated as the always-running instances. The other direction we are working on is workflow execution in the cloud. In this paper, we model the workload as submitted jobs in a queue. The cost-saving VM startup plan can only be considered within an interval instead of globally, because users can never know the future workload in advance. In a workflow context, however, it is different: users can foresee all the jobs and their dependencies, so a globally optimized VM startup plan can be generated. Besides, data movement cost could make it an even more interesting problem. We also consider extending our evaluations to other real applications, like well-known Internet workload traces, to see how our mechanism works in different workload contexts.

REFERENCES

[1] AWS auto-scaling. http://aws.amazon.com/autoscaling/
[2] Windows Azure. http://www.microsoft.com/windowsazure/
[3] RightScale. http://www.rightscale.com
[4] M. Assuncao et al., "Evaluating the Cost-Benefit of Using Cloud Computing to Extend the Capacity of Clusters," 18th ACM International Symposium on High Performance Distributed Computing (HPDC 2009), pp. 141-150.
[5] Z. Hill, J. Li, M. Mao, A. Ruiz-Alvarez, and M. Humphrey, "Early Observations on the Performance of Windows Azure," 1st Workshop on Scientific Cloud Computing, 2010.
[6] R. Doyle, J. Chase, O. Asad, W. Jin, and A. Vahdat, "Model-Based Resource Provisioning in a Web Service Utility," Proceedings of the USENIX Symposium on Internet Technologies and Systems, 2003.
[7] J. Li, D. Agarwal, M. Humphrey, C. van Ingen, K. Jackson, and Y. Ryu, "eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Azure Platform," IPDPS, 2010.
[8] T. C. Chieu, A. Mohindra, A. A. Karve, and A. Segal, "Dynamic Scaling of Web Applications in a Virtualized Cloud Computing Environment," ICEBE 2009, pp. 281-286.
[9] H. Lim, S. Babu, J. Chase, and S. Parekh, "Automated Control in Cloud Computing: Challenges and Opportunities," 1st Workshop on Automated Control for Datacenters and Clouds, June 2009.
[10] P. Padala, K. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, and K. Salem, "Adaptive Control of Virtualized Resources in Utility Computing Environments," EuroSys, 2007.
[11] Microsoft Solver Foundation. http://code.msdn.microsoft.com/solverfoundation
[12] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, "Dynamic Provisioning of Multi-tier Internet Applications," ICAC, 2005.
[13] B. Rountree, D. Lowenthal, S. Funk, V. Freeh, B. de Supinski, and M. Schulz, "Bounding Energy Consumption in Large-Scale MPI Programs," SC 2007, November 10-16, 2007.
[14] V. Swaminathan and K. Chakrabarty, "Real-Time Task Scheduling for Energy-Aware Embedded Systems," IEEE Real-Time Systems Symposium, November 2000.