1 Heuristic policies for SLA provisioning in Cloud-based service providers. L. Silvestri, E. Casalicchio, V. Cardellini, V. Grassi, F. Lo Presti. DISP, Università degli studi di Roma Tor Vergata. InfQ2010

2 Agenda: Problem definition; System Architecture; SLA Definition; Problem Formulation; Problem Solution; Heuristic Algorithms; Experimental Results; Conclusions

3 Problem definition. In SOAs, many different providers can offer equivalent services at different costs and QoS levels. Providers and clients stipulate SLAs to define the level of QoS that should be guaranteed. Service providers face a capacity planning problem: respecting SLAs while minimizing operational costs. To react quickly to traffic bursts and avoid overprovisioning, providers can lease computational power from cloud infrastructure providers when needed. Efficient algorithms for automatic service provisioning are needed to prevent SLA violations under sudden workload fluctuations.

4 Problem Statement. This capacity planning problem can be stated as follows: under unpredictable and suddenly changing workload conditions, find the set of VMs that should be allocated to guarantee SLA fulfillment while minimizing the allocation cost over a medium/long-term time horizon.

5 System Architecture

6 SLA Definition. We consider an SLA given by the tuple $(\tau, X_{max}, T, V_{max})$, where: $\tau$ is the observation period used to compute the average response time; $X_{max}$ is the maximum value for the average response time in an observation period; $T$ is the SLA time span, defined as a multiple of $\tau$; $V_{max}$ is the maximum fraction of observation periods in $T$ where the observed average response time $X$ can exceed $X_{max}$. The fraction $V_{T,\tau}$ of observation periods where $X_{max}$ is exceeded is defined as $V_{T,\tau} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{\hat{x}_{\tau,i} > X_{max}\}$, where $N = T/\tau$ and $\hat{x}_{\tau,i}$ is the average response time measured at the cloud dispatcher in the $i$-th observation period of the time span $T$.
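The SLA check described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `violation_fraction` and `sla_fulfilled` are hypothetical, and the input is assumed to be the list of per-period average response times measured over one time span T.

```python
from typing import Sequence


def violation_fraction(avg_response_times: Sequence[float], x_max: float) -> float:
    """Fraction of observation periods whose average response time exceeds x_max."""
    if not avg_response_times:
        raise ValueError("need at least one observation period")
    violations = sum(1 for x in avg_response_times if x > x_max)
    return violations / len(avg_response_times)


def sla_fulfilled(avg_response_times: Sequence[float], x_max: float, v_max: float) -> bool:
    """The SLA holds for this time span if the violation fraction is at most v_max."""
    return violation_fraction(avg_response_times, x_max) <= v_max
```

For example, with four observation periods of which two exceed the threshold, the fraction is 0.5, so the SLA holds for `v_max = 0.5` but not for `v_max = 0.25`.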

7 Problem Formulation. We can formulate an optimization problem: minimize the total allocation cost $C$ while guaranteeing SLA fulfillment in every time span, where: $T' = M T$ is the medium-term time horizon; $m_j$ is the number of VMs allocated in the $j$-th time span; $\lambda_{i,j}$ is the arrival rate in the $i$-th observation period of the $j$-th time span; $\mu$ is the service rate of each VM; $x_{i,j}$ is the service response time observed at the cloud dispatcher in the $i$-th observation period of the $j$-th time span; $c$ is the cost to use a VM for $T$ time units; $C = \sum_{j=1}^{M} m_j c$ is the total allocation cost over the medium-term time horizon.
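One possible formal statement of this problem is sketched below. It is an assumption reconstructed from the definitions on this slide and the SLA slide, not the slide's own formula: the constraint is assumed to be the SLA condition applied independently in each of the $M$ time spans, and the superscript $(j)$ on the violation fraction is a notation introduced here for illustration.

```latex
\begin{aligned}
\min_{m_1,\dots,m_M} \quad & C = \sum_{j=1}^{M} m_j \, c \\
\text{subject to} \quad & V^{(j)} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{ x_{i,j} > X_{max} \} \le V_{max},
  \qquad j = 1,\dots,M \\
& m_j \in \{1, 2, \dots\}, \qquad j = 1,\dots,M
\end{aligned}
```

Here $x_{i,j}$ depends on $\lambda_{i,j}$, $\mu$, and $m_j$ through whatever performance model is adopted for the system.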

8 Problem Solution. If the average arrival rate $\lambda_{i,j}$ is known in advance, the solution can be easily computed: the optimal allocation, which yields the minimum $C$ over $T'$, is given by the minimum number of VMs needed to guarantee SLA fulfillment in each time span $T$. In real environments, $\lambda_{i,j}$ is not known and is very difficult (or even impossible) to predict. We propose heuristic algorithms that solve the problem by finding a suboptimal allocation.
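When the arrival rates are known, the minimum number of VMs for one time span can be found by a simple search. The sketch below is illustrative only. The slides do not state the performance model, so it assumes each VM behaves as an M/M/1 queue and that the dispatcher splits the load evenly across VMs; the names `mm1_response_time`, `min_vms`, and `max_vms` are hypothetical.

```python
import math
from typing import Sequence


def mm1_response_time(arrival_rate: float, service_rate: float, vms: int) -> float:
    """Mean response time, ASSUMING M/M/1 VMs with the load split evenly."""
    per_vm_rate = arrival_rate / vms
    if per_vm_rate >= service_rate:
        return math.inf  # the queue is unstable, so the response time is unbounded
    return 1.0 / (service_rate - per_vm_rate)


def min_vms(arrival_rates: Sequence[float], service_rate: float,
            x_max: float, v_max: float, max_vms: int = 1000) -> int:
    """Smallest m whose violation fraction over the time span is at most v_max."""
    n = len(arrival_rates)
    for m in range(1, max_vms + 1):
        violations = sum(
            1 for lam in arrival_rates
            if mm1_response_time(lam, service_rate, m) > x_max
        )
        if violations / n <= v_max:
            return m
    raise ValueError("no feasible allocation within max_vms")
```

Repeating this search for every time span gives the cost-minimal allocation under the assumed model. Feeding it the arrival rates observed in the previous time span instead of the true future ones gives an RMVA-style reactive policy.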

9 Heuristic VMs Allocation. RMVA (Reactive Model-based VMs Allocation): reactive policy that computes the optimal solution for the forthcoming time span $T$, assuming that the arrival rate in $T$ will be equal to the one observed in the previous time span. RVVA (Reactive Violation-based VMs Allocation): reactive policy that chooses to allocate/deallocate VMs on the basis of the number of SLA violations observed in the previous time span. PVVA (Proactive Violation-based VMs Allocation): proactive policy that chooses to allocate/deallocate VMs on the basis of the number of SLA violations predicted for the next time span.
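A violation-based policy can be pictured as a simple control step executed at the end of every time span. The slides do not give the exact decision rule, so everything in this sketch is an assumption: the one-VM step size, the `release_ratio` threshold for deallocation, and the function name `violation_based_step` are all hypothetical.

```python
def violation_based_step(current_vms: int, violation_fraction: float, v_max: float,
                         release_ratio: float = 0.5, min_vms: int = 1) -> int:
    """Return the number of VMs for the next time span.

    ASSUMED rule (not taken from the slides):
    - add one VM if the violation fraction exceeded v_max;
    - release one VM if it stayed below release_ratio * v_max;
    - otherwise keep the current allocation.
    """
    if violation_fraction > v_max:
        return current_vms + 1
    if violation_fraction < release_ratio * v_max and current_vms > min_vms:
        return current_vms - 1
    return current_vms
```

Passing the violation fraction observed in the previous time span gives reactive (RVVA-style) behavior; passing a forecast for the next time span gives proactive (PVVA-style) behavior.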

10 RVVA - PVVA. PVVA is similar to RVVA but, to decide whether to allocate/deallocate VMs, it uses the predicted value for the number of violations in the next $T$ instead of the last observed value. Forecasting is done through exponential smoothing: $\hat{V}_T = \alpha V_{T-1} + (1 - \alpha) \hat{V}_{T-1}$. In our experiments we used $\alpha$ =
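The exponential smoothing forecast can be sketched as follows. The helper names `smooth_forecast` and `forecast_series` and the initial forecast of 0.0 are assumptions made for illustration, and the value of alpha in the usage note is arbitrary rather than the one used in the experiments.

```python
from typing import List, Sequence


def smooth_forecast(observed_prev: float, forecast_prev: float, alpha: float) -> float:
    """One step of exponential smoothing: alpha * V_{T-1} + (1 - alpha) * V_hat_{T-1}."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * observed_prev + (1.0 - alpha) * forecast_prev


def forecast_series(observations: Sequence[float], alpha: float,
                    initial_forecast: float = 0.0) -> List[float]:
    """Forecast for the span following each observation, in order."""
    forecasts = []
    forecast = initial_forecast  # ASSUMED starting value; the slides do not specify one
    for observed in observations:
        forecast = smooth_forecast(observed, forecast, alpha)
        forecasts.append(forecast)
    return forecasts
```

With an arbitrary `alpha = 0.5` and observed violation fractions `[0.2, 0.0, 0.4]`, each forecast is the midpoint between the latest observation and the previous forecast. A larger alpha makes the forecast track recent observations more closely.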

11 Simulation setting. To evaluate the proposed heuristics we considered 2 metrics: $C$, the total cost over the medium-term period $T'$, and $V_{T',T}$, the percentage of SLA violations over $T'$. The average value of each metric was computed by running a CSIM-based event-driven simulation model. In the simulation we used: $T$ = 60 minutes; $\tau$ = 5 minutes (i.e. $N = 12$); an allocation cost of 0.1\$ per hour per VM; $M = 370$ ($T'$ of about 15 days).
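The simulation parameters fit together as shown in the short calculation below. The constant and function names are hypothetical; the numeric values are the ones listed on this slide.

```python
T_MINUTES = 60           # SLA time span T
TAU_MINUTES = 5          # observation period
M_SPANS = 370            # number of time spans in the horizon
COST_PER_VM_HOUR = 0.1   # dollars per VM per hour

# Observation periods per time span.
N_PERIODS = T_MINUTES // TAU_MINUTES

# Length of the medium-term horizon in days.
HORIZON_DAYS = M_SPANS * T_MINUTES / (60 * 24)

# Cost c of keeping one VM allocated for one time span T.
COST_PER_VM_SPAN = COST_PER_VM_HOUR * T_MINUTES / 60


def total_cost(vms_per_span):
    """Total allocation cost C: sum of m_j * c over all time spans."""
    return sum(vms_per_span) * COST_PER_VM_SPAN
```

With these values, N is 12, the horizon is roughly 15.4 days, and keeping a single VM allocated for the whole horizon costs about 37 dollars.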

12 Simulation workload. System workload generated using a portion of the trace from the 1998 FIFA World Cup. Trace stretched from seconds to minutes.

13 Experimental Results (1): VMs allocation (total allocation cost)

14 Experimental Results (2): Percentage of SLA violations $V_{T',T}$

15 Experimental Results (3): Average response time

16 Results Summary

| Policy | $V_{T',T}$ | $C$ (\$) |
|---|---|---|
| Optimal | 0.14% ± 0.04% | |
| RMVA | 13.03% ± 0.28% | ± 1.5 |
| RVVA | 2.89% ± 0.22% | ± 2.5 |
| PVVA | 2.53% ± 0.27% | ± |

17 Conclusions. We proposed three algorithms for automated service provisioning in a cloud environment. Experiments show that violation-based and reactive algorithms perform better than model-based and proactive ones. Future work: model improvement (remove unrealistic assumptions); algorithm improvement (use of more accurate prediction models); implementation and evaluation in a real system.

