Revealing the MAPE Loop for the Autonomic Management of Cloud Infrastructures
Michael Maurer, Ivan Breskovic, Vincent C. Emeakaroha, and Ivona Brandic
Distributed Systems Group, Institute of Information Systems, Vienna University of Technology, Austria
ivona@infosys.tuwien.ac.at
Cloud delivery Types
[Figure labels: Custom made systems; Tailoring, combinations, adaptations, wrapping,...; Standardized products; SMEs; escience; FOSII; private, hybrid, public]
Cloud Anatomy
Source: Buyya, Yeo, Venugopal, Broberg, Brandic. Cloud Computing and Emerging IT Platforms: Vision, Hype and Reality for Delivering Computing as the 5th Utility, Elsevier Science 2009.
Automatically adapt to users' needs!
- Software failures
- Hardware failures
- Load changes
Challenge: Attaining SLA Agreements vs. optimizing energy consumption...
Problem statement
Service Level Agreement (SLA):
- CPU Power: 512 MIPS
- Memory: 1024 MB
- Storage: 1000 GB
- Incoming Bandwidth: 10 Mbit/s
- Outgoing Bandwidth: 20 Mbit/s
Clouds:
- dynamic on demand: computing as utility
- unforeseen load changes
- autonomic adaptation and (re-)provisioning of resources
- very scalable
What does an appropriate (autonomic!) management loop look like?
2 conflicting goals:
1. Minimize SLA violations
2. Maximize energy efficiency
MAPE Loop (FoSII Infrastructure)
SLA Agreements
Speculative approach: May we allocate fewer resources than agreed, but more than actually utilized at the specific point in time, and not violate SLAs?
What we provide? | What the consumer utilizes? | What was agreed in the SLA? | Violation?
500 GB | 400 GB | >= 1000 GB | NO
500 GB | 510 GB | >= 1000 GB | YES
1000 GB | 1010 GB | >= 1000 GB | NO
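The three rows of the table are consistent with one reading of the violation rule: a violation occurs only when the consumer needs more than is provided and the provider supplies less than was agreed. The following is a minimal sketch under that assumption; the names `StorageSample` and `is_violation` are illustrative and not part of the FoSII infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSample:
    provided_gb: float  # what the provider currently allocates
    utilized_gb: float  # what the consumer currently utilizes
    agreed_gb: float    # minimum agreed in the SLA (">= 1000 GB")


def is_violation(sample: StorageSample) -> bool:
    # Assumed rule: the consumer needs more than is provided, and the
    # provider supplies less than it agreed to in the SLA.
    return (sample.provided_gb < sample.utilized_gb
            and sample.provided_gb < sample.agreed_gb)


# The three rows of the table above.
samples = [
    StorageSample(500, 400, 1000),
    StorageSample(500, 510, 1000),
    StorageSample(1000, 1010, 1000),
]
results = [is_violation(s) for s in samples]
print(results)  # [False, True, False]
```

In the third row the consumer utilizes more than was agreed, so under this reading the provider is not in violation even though demand exceeds supply.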
Preventing SLA Violations: Knowledge DBs
Predict SLA violations before they happen
Problems:
- How to identify possible SLA violations ahead of time
- Thresholds for the SLA parameter values where we have to react
- Tradeoff: prevention of SLA violations vs. doing nothing and paying penalties
- Consider non SLA parameters like energy efficiency, carbon footprint
Possible Solutions: Rules Systems, Default Logic, Situation Calculus, Case Based Reasoning,...
CBR - Cases
[Figures: Some possible actions; Typical CBR case]
Case Based Reasoning (CBR)
Credits: Michael Maurer
Knowledge Management in Clouds with CBR
[Figure labels: Measurements; Threshold; Measure; Results; Feedback; Capacity constraint; Rule to engage CBR; Case Based Reasoning; Trigger Action]
Actions:
- VM resource management
- VM deployment
- PM management
Implementation of the Simulation Engine
- Normalization of the parameter impacts
- Similarity measurements
- Utility functions
- Violations
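To illustrate how normalization and similarity measurement can fit together in case retrieval, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the value ranges in `RANGES`, the Euclidean distance, the parameter names, and the two stored cases. The slides do not state which formulas the simulation engine uses, and utility functions are not covered here.

```python
import math

# Hypothetical value ranges used to scale each SLA parameter to [0, 1],
# so that no parameter dominates the distance because of its unit.
RANGES = {"storage_gb": (0.0, 2000.0), "bandwidth_mbit": (0.0, 100.0)}


def normalize(measurement):
    """Scale every parameter of a measurement to [0, 1]."""
    return {
        name: (measurement[name] - low) / (high - low)
        for name, (low, high) in RANGES.items()
    }


def distance(a, b):
    """Euclidean distance between two normalized measurements."""
    na, nb = normalize(a), normalize(b)
    return math.sqrt(sum((na[name] - nb[name]) ** 2 for name in RANGES))


def most_similar(case_base, query):
    """Return the stored case whose measurement is closest to the query."""
    return min(case_base, key=lambda case: distance(case["measurement"], query))


# Two illustrative cases; the actions are placeholders.
case_base = [
    {"measurement": {"storage_gb": 900.0, "bandwidth_mbit": 10.0},
     "action": "increase storage"},
    {"measurement": {"storage_gb": 200.0, "bandwidth_mbit": 10.0},
     "action": "decrease storage"},
]
query = {"storage_gb": 850.0, "bandwidth_mbit": 12.0}
best = most_similar(case_base, query)
print(best["action"])  # increase storage
```

Without the normalization step, the storage parameter (in GB) would outweigh bandwidth (in Mbit/s) simply because its numeric range is larger.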
SLA knowledge management Simulation
Goal of the simulation: Evaluate the quality of a knowledge base with respect to analyzing measurements
- Input: Measurements (Monitored Metrics)
- Output: Action to execute
- Evaluation: Compare the number of SLA violations to the utilization of resources
  - violate as few parameters as possible while utilizing as few resources as possible
  - increase energy efficiency
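The evaluation criterion can be sketched as two numbers computed over a trace of measurements: the share of measurements that violate the SLA and the average utilization of the provided resources. The definitions below are assumptions for illustration (utilization as utilized divided by provided, capped at 100%); the function name `evaluate` and the trace values are hypothetical and not results from the simulation.

```python
def evaluate(trace, agreed):
    """Return (violations in %, utilization in %) for one SLA parameter.

    trace: list of (provided, utilized) pairs, one per measurement.
    agreed: the minimum value agreed in the SLA.
    """
    # Assumed violation rule: less provided than utilized and than agreed.
    violations = sum(1 for provided, utilized in trace
                     if provided < utilized and provided < agreed)
    # Assumed utilization: fraction of the provided resource actually used.
    utilization = sum(min(utilized, provided) / provided
                      for provided, utilized in trace) / len(trace)
    return 100.0 * violations / len(trace), 100.0 * utilization


# Hypothetical storage trace in GB: (provided, utilized).
trace = [(500.0, 400.0), (500.0, 510.0), (1000.0, 1010.0), (800.0, 400.0)]
violation_pct, utilization_pct = evaluate(trace, agreed=1000.0)
print(violation_pct, round(utilization_pct, 1))
```

A good knowledge base keeps the first number low while keeping the second high, which is the trade-off between SLA violations and resource (and thus energy) efficiency.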
Simulation Design
- Monitor (simulated): New measurement of an SLA
- Analysis I: Queries knowledge base
- Knowledge base: Recommends action
- Plan I: Maps action onto PMs
- Plan II: Prevents oscillations and schedules execution of actions
- Executor (simulated): Executes action
Quality of recommended actions (decisions) = Violations vs provided resources
(1) What do we provide?
(2) What does the customer utilize?
(3) What did we agree in the SLA?
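The monitor, analysis, plan, and executor steps can be sketched as one simulated loop. This is a toy illustration, not the simulation engine itself: the trace, the 90% and 50% thresholds that stand in for the knowledge base, the fixed 100 GB step, and the oscillation rule (suppress an action that directly reverses the previous one) are all assumptions. Mapping actions onto PMs (Plan I) is omitted.

```python
TRACE = [400.0, 480.0, 250.0, 250.0]  # hypothetical storage utilization in GB


def monitor(step):
    """Simulated monitor: return the next measurement."""
    return TRACE[step]


def analyze(utilized, provided):
    """Stand-in for the knowledge base query: recommend an action."""
    if utilized > 0.9 * provided:
        return "increase"
    if utilized < 0.5 * provided:
        return "decrease"
    return "none"


def plan(action, previous):
    """Stand-in for Plan II: drop an action that reverses the previous one."""
    if {action, previous} == {"increase", "decrease"}:
        return "none"
    return action


def execute(action, provided, step_gb=100.0):
    """Simulated executor: apply the action to the provided amount."""
    if action == "increase":
        return provided + step_gb
    if action == "decrease":
        return provided - step_gb
    return provided


def simulate(provided=500.0):
    previous = "none"
    executed = []
    for step in range(len(TRACE)):
        utilized = monitor(step)
        recommended = analyze(utilized, provided)
        action = plan(recommended, previous)
        provided = execute(action, provided)
        executed.append(action)
        previous = action
    return executed, provided


executed, final_provided = simulate()
print(executed, final_provided)
# ['none', 'increase', 'none', 'decrease'] 500.0
```

In the third iteration the recommended decrease is suppressed because it would directly undo the preceding increase; it is executed one iteration later.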
Simulation Results
[Figure: three plots over # Iterations (2, 5, 10, 20): Violations [%], Utilization [%], and RAE; legend: Alpha=0.1, Alpha=0.3, Alpha=0.5, No CBR]
Next challenge: Rule-based approach
Rules using Drools
[Figures: Rule increasing; Rule decreasing]
Policy Modes
Global view of the Cloud infrastructure
Policy Mode | Description
green | Plenty of resources left. Over-provisioning allowed.
green-orange | Heavy over-consumption forbidden.
orange | Resource is becoming scarce, but SLA demand can be fulfilled if no over-consumption takes place. Thus, over-provisioning is forbidden.
orange-red | Over-provisioning forbidden. Initiate outsourcing of some applications.
red | Over-provisioning forbidden. SLA resource requirements of all consumers cannot be fulfilled. If possible, a specific choice of applications is outsourced. If not enough, applications with higher reputation points or penalties are given priority over applications with less impact. SLAs of latter ones are deliberately broken to ensure SLAs of former ones.
Current/Future Work
Future Work
- Translation of Resource Utilization to Energy Efficiency
- Development and evaluation of different knowledge management techniques
- Development of heuristics to select the most appropriate KM technique
- Transition from the simulation to a real world test-bed