Genetic Algorithms for Energy Efficient Virtualized Data Centers 6th International DMTF Academic Alliance Workshop on Systems and Virtualization Management: Standards and the Cloud Helmut Hlavacs, Thomas Treutner University of Vienna, Austria 26.10.2012 1 / 40
Table of Contents 1 Scenario 2 Utilization Trace Data 3 Balanced First Fit 4 Genetic Algorithm 5 Parameters 6 Results 7 Performance Evaluation 8 Conclusions 2 / 40
Abstract In A Nutshell Efficiency by dynamic consolidation + workload forecasting Heterogeneous infrastructure in terms of power, resources Evaluation of real traces, University of Vienna Central IT Dept. CPU traces of 35 VMs, 4 weeks, 2h resolution, VMware Business infrastructure scenario: Energy costs are just one of several parts of operational costs Use a cost model! Cost model, configurable penalties for several cost categories, minimize total weighted costs Multi-objective combinatorial optimization problem Comparison of total weighted costs: Balanced First Fit heuristic, Genetic Algorithm, Load Balancing Forecasting: (S)ARIMA, Holt-Winters 3 / 40
Scenario Scenario Scenario Highly variable workload intensity, often periodic. No (little?) number-crunching, its not a HPC cluster etc. Minimize energy consumption while avoiding under-provisioning, before reaching 100% utilization! Queuing issues! Need resources for live migration! Status costs: Energy: Linearly correlated with CPU util, future work: SPECpower Overloads: Queuing, Bad QoS, loose revenue, non-linear, ideally continuous function! Reconfiguration costs: Live Migration: Resource intensive process Server Boots/Shutdowns: Costs energy, puts mechanical/electrical strain? 4 / 40
Scenario Non-linear overload cost function Overload Cost 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.2 0.4 0.6 0.8 1 Utilization Figure: Markovian M/M/1 queue, P(T > x) =1 F T (x) =e µ(1 ρ)x, ρ as CPU util, service rate µ and max response time x must be supplied 5 / 40
Scenario Static Over-provisioning Demand Capacity Resources 0 0.5 1 1.5 2 2.5 3 Time [Days] 6 / 40
Scenario Peak-provisioning Demand Capacity Resources 0 0.5 1 1.5 2 2.5 3 Time [Days] 7 / 40
Scenario Static under-provisioning Demand Capacity Resources 0 0.5 1 1.5 2 2.5 3 Time [Days] 8 / 40
Scenario Dynamic Provisioning for Actual Demand Demand Capacity Resources 0 0.5 1 1.5 2 2.5 3 Time [Days] 9 / 40
Utilization Trace Data Diagnostic time series plot 400 600 800 1000 9pm 1am 5am 9am 1pm 5pm 10 / 40
Utilization Trace Data Periodic, seasonal resource demand Utilization [MHz] 2200 2000 1800 1600 1400 1200 1000 800 600 400 200 0 0 50 100 150 200 250 300 350 Interval 11 / 40
Utilization Trace Data Bursty resource demand 1400 1200 Utilization [MHz] 1000 800 600 400 200 0 0 50 100 150 200 250 300 350 Interval 12 / 40
Utilization Trace Data Complete change in resource demand 350 300 Utilization [MHz] 250 200 150 100 50 0 50 100 150 200 250 300 350 Interval 13 / 40
Utilization Trace Data Seasonal resource demand with a changing mean 180 160 Utilization [MHz] 140 120 100 80 60 40 0 50 100 150 200 250 300 350 Interval 14 / 40
Balanced First Fit Balanced First Fit Bin packing related heuristic, inherently inflexible Not influencable by cost model, but evaluated by it Needs a sorting criteria for the bins, SPECpower ssj2008 score In a nutshell, three phases: 1 Check servers for utilizations exceeding threshold, if so, remove VMs resource-balanced until not overloaded, add VMs to homelessvms 2 Try to map homelessvms beginning with most energy-efficient. Do this resource-balanced again. If still homelessvms, force action. 3 Try to consolidate less energy-efficient servers, only if all of its VMs can be migrated to more efficient servers, and vmconsolidationinertia reached Hysteresis control: Turn of servers if rmidletimeout reached 15 / 40
Genetic Algorithm Genetic Algorithm Meta-heuristic, directly influencable by cost model Fitness value is reciprocal to the cost of a solution Lower cost solutions have higher survival chances Cross-over, Mutation, Evaluation, Selection Elitism Selection, Roulette Wheel Selection Max number of generations, stop if quality not increasing for n generations Ensures good quality and runtime Multi-threading by using demes, randomly exchanging solutions Several mutators defined Solution defined by mapping matrix Rows are servers Columns are VMs 16 / 40
Genetic Algorithm Genetic Algorithm: Crossover operator Single point crossover 1 0 0 1 X Father = 0 1 1 0 ; X Mother = 0 0 0 0 X Son = 1 0 1 0 0 1 0 1 ; X Daughter = 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 17 / 40
Genetic Algorithm Genetic Algorithm: swaprm operator Swap two rows X old = X new = 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 18 / 40
Genetic Algorithm Genetic Algorithm: swapvm operator Swap two columns X old = X new = 1 0 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 19 / 40
Genetic Algorithm Genetic Algorithm: migratevm operator Migrate a VM X old = X new = 1 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 20 / 40
Genetic Algorithm Genetic Algorithm: consolidaterm operator Move all 1 s to another row X old = X new = 1 0 0 1 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 21 / 40
Parameters Parameters 28 days of trace, first 7 reserved for forecasting, 21 for eval Hundreds of VMs, resampled from the trace data (scaling, memory alloc) 26 servers, taken from SPECpower ssj2008, high diversity Optional linear interpolation to emulate more frequent measurements VMware export limitation Optional forecasting, GNU R, auto-model-building for each VM in each interval to consider change in workload pattern (S)ARIMA: Limit parameter search and data, takes very long Use 95th upper bound as forecast, very conservative! Non-linear overload costs: For every minute of an interval, for every VM running on an overloaded host, multiply cost function value with rmutilizationpenalty and sum up Penalize long intervals, as overloads are longer or harder detectable 22 / 40
Parameters Relevance Parameter Value cpuutilizationwarninglevel 0.6 memoryutilizationwarninglevel 0.8 utilizationcostfunctionmu 10 utilizationcostfunctionallowedresponsetime 1 All energypenalty 600 migrationpenalty 1 rmutilizationpenalty 10 bootpenalty 1 shutdownpenalty 5 Load Balancing variancepenalty 100000 BFF rmidletimeoutseconds 900 vmconsolidationinertiaseconds 600 numberofthreads 4 numberofgenerations 200 maxgenerationsoffitnessnotincreased 10 sizeofpopulation 800 GA and LB crossoverrate 0.5 exchangerate 0.1 migratevmrate 0.3 swaprmrate 0.1 swapvmrate 0.1 elitism true Optional Forecasting requiredperiodsforforecasting 3 23 / 40
Parameters Simulation Input Data MHz 1.6e+06 1.4e+06 1.2e+06 1e+06 800000 600000 400000 200000 0 0 50 100 150 200 250 Interval VM CPU Demand RM CPU Capacity RM CPU Warning Level Capacity Figure: The time series of total VM CPU demand, server capacity and quota capacity within the warning level used in the simulations. 24 / 40
Results Total weighted costs 6e+06 5e+06 4e+06 Cost 3e+06 2e+06 1e+06 0 GA-ARIMA GA-HoWi GA BFF-ARIMA BFF-HoWi BFF LB GA-ARIMA GA-HoWi GA BFF-ARIMA BFF-HoWi BFF LB GA-ARIMA GA-HoWi GA BFF-ARIMA BFF-HoWi BFF LB GA-ARIMA GA-HoWi GA BFF-ARIMA BFF-HoWi BFF LB Energy Migration Boot 7200s 3600s Shutdown Overload 1800s 900s 25 / 40
Results Dynamic Provisioning Demand Capacity Resources 0 0.5 1 1.5 2 2.5 3 Time [Days] 26 / 40
Results Overprovisioning [%] 60 50 40 30 20 10 0-10 -20-30 0 50 100 150 200 250 300 350 400 450 500 Interval (Available-Used)/Used Figure: Provisioning efficiency for an interval length of 3600 s and the BFF heuristic without forecasting. 27 / 40
Results 140 120 Overprovisioning [%] 100 80 60 40 20 0 0 50 100 150 200 250 300 350 400 450 500 Interval (Available-Used)/Used Figure: Provisioning efficiency for an interval length of 3600 s and the BFF heuristic with Holt-Winters forecasting. 28 / 40
Results Overprovisioning [%] 55 50 45 40 35 30 25 20 15 10 5 0 0 200 400 600 800 1000 1200 1400 1600 1800 2000 Interval (Available-Used)/Used Figure: Provisioning efficiency for an interval length of 900 s and the BFF heuristic with Holt-Winters forecasting. 29 / 40
Results Changing the overload penalty 500000 400000 Cost 300000 200000 100000 0 GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi 10 20 40 100 200 500 1000 Overload Cost Energy Migration Boot Shutdown Overload 30 / 40
Results Changing the migration penalty Cost 900000 800000 700000 600000 500000 400000 300000 200000 100000 0 GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi 1 50 100 Migration Cost Energy Migration Boot Shutdown Overload 31 / 40
Results Changing the energy penalty Cost 1.4e+06 1.2e+06 1e+06 800000 600000 400000 200000 0 GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi GA-HoWi BFF-HoWi 600 1200 2400 Energy Cost Energy Migration Boot Shutdown Overload 32 / 40
Results VM consolidation inertia, Holt-Winters forecasting Cost 1e+06 900000 800000 700000 600000 500000 400000 300000 200000 100000 0 0 300 600 900 1800 3600 7200 VM Consolidation Inertia [s] Energy Migration Boot Shutdown Overload 33 / 40
Results VM consolidation inertia, no forecasting 1.2e+06 1e+06 800000 Cost 600000 400000 200000 0 0 300 600 900 1800 3600 7200 14400 28800 VM Consolidation Inertia [s] Energy Migration Boot Shutdown Overload 34 / 40
Performance Evaluation Performance: Hardware Platform Specification Platform: Low Power Desktop Server CPU AMD E-350 PhenomII X4 955 Intel Xeon E5-2670 CPU Frequency 1.6 GHz 3.2 GHz 2.6 GHz CPU Cores 2 4 8 CPU L2-Cache 2x512 KiB 4x512 KiB 8x256 KiB CPU L3-Cache N/A 6 MiB shared 20 MiB shared CPU TDP 18 W 125 W 115 W Memory 2 GiB 16 GiB 64 GiB Table: Description of platforms used in the performance evaluations. 35 / 40
Performance Evaluation Balanced First Fit 2.5 2 Walltime [s] 1.5 1 0.5 0 BFF-HoWi BFF BFF-HoWi BFF AMD E-350 AMD Phenom II X4 Figure: Runtime per interval of BFF with/without Holt-Winters forecasting on a low power CPU and a high-end desktop CPU. 36 / 40
Performance Evaluation Genetic Algorithm 350 300 250 Walltime [s] 200 150 100 50 0 GA-HoWi GA GA-HoWi GA AMD E-350 AMD Phenom II X4 Figure: Runtime per interval of GA with/without Holt-Winters forecasting on a low power CPU and a high-end desktop CPU. 37 / 40
Performance Evaluation Genetic Algorithm Parallelization Speedup 2.5 2 Speedup 1.5 1 0.5 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Numer of Cores AMD PhenomII X4 3.2GHz Dual Xeon E5-2670 2.6GHz Figure: Speedup achieved by parallelizing the genetic algorithm. 38 / 40
Conclusions Conclusions Flexible cost model feeding into a GAs fitness function Easy adaptation to diverse optimization demands Case study parameter sets, drastic reduction of total costs For long intervals, forecasting is essential Heuristics faster, but inflexible to changing parameters (energy price, overload costs etc.) GA can do load balancing by changing single parameter Future Work: Non-linear, heterogeneous live migration costs Heterogeneous VM overload costs (priorities) Penalizing co-existence of VM pairs on a host (customer isolation, performance issues, security) Speed up GA by storing final solution of the last n intervals, replaying them to solution population GA multi-threading speedups?! 39 / 40
Conclusions Q&A Thank you for your attention! http://www.metavisor.org was sponsored by the Internet Foundation Austria within the Netidee 2012 funding programme - http://www.netidee.at 40 / 40