Flexible Distributed Capacity Allocation and Load Redirect Algorithms for Cloud Systems Danilo Ardagna 1, Sara Casolari 2, Barbara Panicucci 1 1 Politecnico di Milano,, Italy 2 Universita` di Modena e Reggio Emilia, Dipartimento di Ingegneria dell Informazione, Italy IEEE Cloud 2011, Washington DC, 6 th July 2011
Research scenario 2 Cloud Computing: u On-demand software/hardware delivering on a pay per use basis u End-users obtain the benefits of the infrastructure without the need to implement it directly u Cloud providers maximize the utilization of their physical resources, minimizing energy costs and obtaining economies of scale Major issues: u Development of efficient service provisioning policies u Modern Clouds live in an open world characterized by continuous changes which occur autonomously and unpredictably
Our contribution 3 Workload prediction-based capacity allocation techniques able to coordinate multiple distributed resource controllers Dynamic load redirection mechanism which determines the requests to be redirected during peak loads Requests distribution optimized according to the average response time
Problem statement 4 WS provider perspective offering multiple transactional WSs hosted at multiple sites of an IaaS provider SLA contract, associated with each WS class k specifies the QoS levels: R k R k WSs are deployed in VMs, each VM hosts a single Web service applica4on
Problem statement 5 Mul2ple (homogeneous) VMs implemen4ng the same WS class can run in parallel Services can be located on mul2ple sites IaaS provider charges the WS provider on a hourly basis
Problem statement 6 Capacity Allocation (CA): Determine the optimal number of VMs for each WS class in each IaaS site according to a prediction of the incoming workload (T 1 mid-long time scale) Load Redirection (LR): If a site resources are insufficient, incoming requests redirected to other sites (T 2 << T 1 short-term time scale)
Cloud System Reference Framework 7 IaaS Provider i=1 i=2 Local workload manager Virtualized Servers Flat VMs! "# $% "# & & & 23 4( 7( 23 4 ( 23 5( 7(!)(!)(!)(!"#$%&'()&*+",-().,"$.#( /&#-( 23 6(!)( On demand VMs!!" #$! "" #$%#$! &" $$!!" #$! %" #$&#$! '" $$ Local WS arrival rates i=4! "# $%! &# $%'$%! (# %% i=3! "# $%! &# $%'$%! (# %% Execution rate of local arrivals! "# $%! &# $%'$%! (# %% Redirect rate of local arrivals Local workload manager! "# $%! &# $%'$%! (# %% Local CA and LR manager Virtualized Servers
Prediction model time scales 8
I K C i c i c i N i Problem formulation System parameters 9 Set of sites Set of WS classes VM instances capacity at site i Time unit cost for flat VMs at site i Time unit cost for on demand VMs at site i Number of flat VMs available at site i µ k Maximum service rate of a capacity 1 VM for executing WS class k requests d i,j,i= j Delay (s) for requests redirecting from site i to site j g i,j = 1 d,i= j Conductance of the communication link (i,j) i,j G i = g i,j,i= j Equivalent conductance seen from site i to the other sites j
Problem formulation Decision variables 10 Nk i Mk i x i k zk i Number of flat VMs allocated for class k request at site i Number of on demand VMs allocated for class k request at site i Execution rate of local arrivals for WS class k request at site i Redirect of WS class k request at site i toward other sites
Design assumptions 11 Each WS class hosted in a VM is modeled as an M/G/1-PS queue The fraction of workload redirected to other sites is inversely proportional to the network delay/directly proportional to the conductance g i,j = 1/d i,j The overall load at site i due to the redirect of other sites is given by: g j,i z j k G j j I,j=i The total rate of class k requests executed at site i is given by: x i k + j=i g j,i z j k G j
Prediction models 12 Exponential Smoothing (ES) models to predict the local arrival rate Λ i k Simple model motivated by the application context: Short-time predictions suitable for real-time autonomic decisions We consider a version of ES, where parameters are dynamically chosen: Λ i k(t + T 1 )=γk(t) i Λ i k(t)+(1 γk(t))λ i i k(t), t > T 1 Λ i k(t 1 )= 1 T 1 T 1 t=1 Λ i k(t)
Prediction models 13 Dynamic ES model by re-evaluating the smoothing factor γ i k (t) at each prediction sample t We use the Trigg and Leach procedure: γ i k (t) = Ai k (t) E i k (t) A i k(t) =φ i k(t)+(1 φ)a i k(t T 1 ) E i k(t) =φ i k(t) +(1 φ)e i k(t T 1 )
Capacity Allocation problem 14 The goal of CA problem is to: u Minimize the overall costs for flat and on demand VM instances of multiple distributed IaaS sites u Guaranteeing that the average response time of each class is lower than the SLA threshold u Solved every T 1 time instant WS requests average response time: R i k = 1 C i µ k Λ i k N i k +Mi k R k = i Λ i k Ri k Λ j k j
Capacity Allocation problem min N i k,m i k k c i Nk i + ci Mk i i 15 i Λ i k <C i µ k (Nk i + Mk) i Λ i k (N k i + M k i) C i µ k (Nk i + M k i) Λ R k Λ j i k k j Nk i N i, i I k K k K, i I k K
Load Redirect problem 16 The goal of LR problem is to: u Cooperatively minimize request average response times u Avoid episodic local congestions due to the variability of the incoming workload u Solved every T 2 time instant WS requests average response time: R i k = C i µ k 1 x i k + j=i g j,i z j k G j N i k +M i k R i k = R i k + j=i x i k + j=i z j k G j g j,i z j k G j
Load Redirect problem min x i k,zi k (N k i + M k i) x i k + j=i k i C i µ k (Nk i + M k i) (xi k + j=i g j,i z j k G j g j,i z j k G j ) + j=i 17 z j k G j x i k + z i k = Λ i k k K, i I x i k + j=i g j,i z j k G j <C i µ k (N i k + M i k) k K, i I x i k,z i k 0 k K, i I Distributed decomposable solution relying on Lagrangian techniques
Load Redirect problem decomposition min x i,y i,z i,w i i (N i + M i ) x i + y i C i µ (N i + M i ) (x i + y i ) + wi 18 x i + z i = Λ i x i + y i <C i µ (N i + M i ) y i = j=i g j,i z j G j i I i I i I w i = j=i z j G j i I x i,y i,z i,w i 0 i I
Duality Theory Primal problem D* Dual problem P* D* P* Lagrangian relaxation (LB) Any feasible solution P* (UB) Strong duality
Load Redirect problem Lagrangian relaxation min x i,y i,z i,w i i (N i + M i ) x i + y i C i µ (N i + M i ) (x i + y i ) + wi + +Θ i y i j=i g j,i z j G j +ηi w i j=i z j G j x i + z i = Λ i x i + y i <C i µ (N i + M i ) x i,y i,z i,w i 0 i I i I i I The relaxed problem further separates into I sub-problems
Dual decomposition For a given set of Θ i s and η i s defines the dual function L(Θ, η) and the dual problem is then given by: max Θ,η L(Θ, η) The dual problem can be solved by using a sub-gradient method: Θ i (t + 1) = Θ i (t)+α t y i g j,i z j G j j=i η i (t + 1) = η i (t)+β t w i j=i z j G j
Experimental results Scalability analysis 22 Large set of randomly generated instances: u I has been varied between 20 and 60 u K has been varied between 100 and 1000 Average execution time required to solve instances of maximum size is lower than 3 minutes and one minute for the CA and LR problems, respectively
Experimental results Comparison with alternative methods 23 Heuristic 1: u The CA is performed every 5 minutes and the number of VMs is determined according to utilization thresholds u The number of VMs is determined such that the utilization of the VMs is equal to a given threshold τ 1 u VM provisioning is further triggered if the prediction of the VMs utilization is higher than a second threshold τ 2 > τ 1 u Multiple analyses have been performed by adopting different thresholds: (τ 1, τ 2 ) = (40%, 50%), (50%, 60%), and (60%, 80%)
Experimental results Comparison with alternative methods 24 Heuristic 2: Same as Heuristic 1 but the number of VMs is determined by optimally solving our CA problem every 5 minutes Heuristic 3: Same as Heuristic 2 but with a 10 minutes time horizon
Experimental results Comparison with alternative methods 25 Local incoming workload has been obtained from the traces of a very large dynamic Web-based system: u Normal day scenario: It describes the baseline workload (bimodal requests profile) u Heavy day scenario: It exhibits a 40% increment in the number of the client requests with respect to the baseline u Noisy day scenario: It is characterized by the same request profile belonging to the heavy day scenario with an additional noise component
26 Experimental results Comparison with alternative methods
Experimental results Comparison with alternative methods 27
Experimental results Validation on Amazon EC2 28
Conclusions and future work 29 Prediction-based distributed CA and LR algorithms for IaaS cloud system minimizing the cost of the running VMs Experimental results shown that our solutions significantly improve other heuristics Future work will extend the validation of our solution considering a larger experimental setup
Thank you! 30 Questions? Danilo Ardagna Politecnico di Milano,, Italy ardagna@elet.polimi.it