ASCETiC Project
D4.1 Intra-Layer Cloud Stack Adaptation

Project Acronym: ASCETiC
Project Title: Adapting Service lifecycle towards Efficient Clouds
Project Number: 610874
Instrument: Collaborative Project
Start Date: 01/10/2013
Duration: 36 months
Thematic Priority: ICT-2013.1.2 Software Engineering, Services and Cloud Computing
Work Package: WP4, Static Energy-Efficiency
Due Date: M24
Submission Date: 30/09/2015
Version: 1.0
Status: Final
Main Editor(s): Richard Kavanagh (ULE), Django Armstrong (ULE), Karim Djemame (ULE)
Reviewer(s): Odej Kao (TUB), Jean-Christophe Deprez (CETIC)
Project co-funded by the European Commission within the Seventh Framework Programme

Dissemination Level
PU | Public | X
PP | Restricted to other programme participants (including the Commission) |
RE | Restricted to a group specified by the consortium (including the Commission) |
CO | Confidential, only for members of the consortium (including the Commission) |
Version History

Version | Date | Comments, Changes, Status | Authors, contributors, reviewers
0.01 | 01/05/2015 | ToC agreed | Richard Kavanagh (ULE), Karim Djemame (ULE)
0.05 | 12/06/2015 | Integrating initial content for section 2 | Richard Kavanagh (ULE), Django Armstrong (ULE), Raül Sirvent (BSC), Mario Macías (BSC), David Garcia Perez (ATOS)
0.06 | 08/07/2015 | Integrating Application monitor and pricing modellers. | Richard Kavanagh (ULE), Mario Macías (BSC), Alexandros Kostopoulos (AUEB)
0.1 | 20/07/2015 | Integrating SLA Managers and programming model. | Richard Kavanagh (ULE), Luca Porrini (HP), Davide Sommacampagna (HP), Raül Sirvent (BSC)
0.11 | 21/07/2015 | Adding preface/position of deliverable in reading path | Richard Kavanagh (ULE), Julia Wells (ATOS)
0.12 | 31/07/2015 | Adding SaaS KPI Modelling and Visualisation Tools | Richard Kavanagh (ULE), Jean-Christophe Deprez (CETIC)
0.13 | 03/08/2015 | Adding the outline of the Software User guide | Richard Kavanagh (ULE)
0.14 | 11/08/2015 | Adding further updates from CETIC. | Richard Kavanagh (ULE), Christophe Ponsard (CETIC)
0.15 | 26/08/2015 | Adding content to section 2.4.5 and section 3.3 | Richard Kavanagh (ULE), Marc Körner (TUB), David Ortiz (BSC)
0.2 | 02/09/2015 | Adding section 2.6 KPIs and Metrics | Richard Kavanagh (ULE), Davide Sommacampagna (HP)
0.21 | 02/09/2015 | Inclusion of Section 1 | Richard Kavanagh (ULE)
0.25 | 07/09/2015 | Structural updates and review document. | Richard Kavanagh (ULE), Karim Djemame (ULE), Django Armstrong (ULE)
0.4 | 11/09/2015 | Adding updates before Venice meeting. | Richard Kavanagh (ULE), David Garcia Perez (ATOS), Marc Körner (TUB), Mario Macías (BSC), Raül Sirvent (BSC), Luca Porrini (HP), Davide Sommacampagna (HP)
0.6 | 22/09/2015 | Adding updates post Venice meeting. | Richard Kavanagh (ULE), Django Armstrong (ULE), Alexandros Kostopoulos (AUEB), Luca Porrini (HP), Marc Körner (TUB), David Garcia Perez (ATOS), Raül Sirvent (BSC)
0.62 | 22/09/2015 | Adding further updates and adding new section on novelty and conclusion | Richard Kavanagh (ULE), Django Armstrong (ULE), David Rojoa (ATOS)
0.64 | 22/09/2015 | Adding updates from AUEB | Richard Kavanagh (ULE), Alexandros Kostopoulos (AUEB)
0.66 | 23/09/2015 | Completing section 3 and minor changes to various sections. | Richard Kavanagh (ULE), Karim Djemame (ULE)
0.68 | 24/09/2015 | Adding revisions to section 2.2.2 | Richard Kavanagh (ULE), Jean-Christophe Deprez (CETIC)
0.7 | 25/09/2015 | Adding revisions to section 2.2.7 | Richard Kavanagh (ULE), Jean-Christophe Deprez (CETIC)
0.74 | 28/09/2015 | Adding editorial changes suggested by internal review. | Richard Kavanagh (ULE), Django Armstrong (ULE)
0.8 | 28/09/2015 | Adding changes by various partners relating to comments made by the internal review. | Richard Kavanagh (ULE), Luca Porrini (HP), Davide Sommacampagna (HP), Mario Macías (BSC)
0.81 | 29/09/2015 | Fixing section numbering and improving grammar | Richard Kavanagh (ULE)
0.82 | 29/09/2015 | Updates to section 2.2.1 given feedback from the internal review. | Richard Kavanagh (ULE), Raül Sirvent (BSC)
0.84 | 29/09/2015 | Updates regarding the pricing modellers and other minor updates. | Richard Kavanagh (ULE), Alexandros Kostopoulos (AUEB), Jean-Christophe Deprez (CETIC)
0.90 | 29/09/2015 | Updating the section 2.7. Inserting additional references and general document check. | Richard Kavanagh (ULE), Karim Djemame (ULE), Davide Sommacampagna (HP)
1.0 | 30/09/2015 | Final checks | Richard Kavanagh (ULE)
Table of Contents

Version History
List of Figures
List of Tables
Preface Positioning this deliverable in the ASCETiC project
Executive Summary
1. Introduction
1.1 Purpose
1.2 Structure
1.3 Glossary of Acronyms
2. Scientific Report on Components
2.1 How ASCETiC Achieves Energy-awareness and Adaptation
2.2 SaaS SDK Tools
2.2.1 ASCETiC Programming Model
2.2.2 SaaS KPI Modelling and Visualisation Tools
2.3 PaaS layer
2.3.1 PaaS Self-Adaptation Manager
2.3.2 PaaS Energy Modeller
2.3.3 PaaS Virtual Machine Contextualizer
2.3.4 PaaS Pricing Modeller
2.3.5 PaaS SLA Manager
2.4 IaaS Layer
2.4.1 Virtual Machine Manager
2.4.2 IaaS Energy Modeller
2.4.3 IaaS Pricing Modeller
2.4.4 Infrastructure Monitor
2.4.5 Infrastructure Manager
2.4.6 IaaS SLA Manager
2.5 Overall ASCETiC System Flow
2.5.1 OVF Library and Interoperability Experimentation
2.5.2 Objectives
2.5.3 Testbed
2.5.4 Application
2.5.5 Results
2.6 Other Components
2.6.1 SaaS Application Packager
2.6.2 SaaS Virtual Machine Image Constructor
2.6.3 Code Optimizer Plug-in
2.6.4 PaaS Application Manager
2.6.5 Application Monitor
2.6.6 PaaS Provider Registry
2.7 KPI and Metrics
3. Architectural Component Novelty
4. Software User Guide
4.1 SaaS Layer
4.1.1 Using the Programming Model
4.2 PaaS layer
4.2.1 REST API
4.2.2 AMQP
4.2.3 Application Monitor
4.3 IaaS layer
5. Conclusions
References

List of Figures

Figure 1: Y2 deliverables reading path
Figure 2: GAT vs. NIO, task generation case
Figure 3: GAT vs. NIO, file transfer case
Figure 4: GAT vs. NIO, object generation case
Figure 5: High Level Structure of NFR and some relationships
Figure 6: Minimisation of indirect energy costs of SaaS application.
Figure 7: CPU comparison between two nodes in the same test run
Figure 8: Filtering on aspects.
Figure 9: Energy vs. Time
Figure 10: PaaS SAM Overall Workflow
Figure 11: Lab Setup
Figure 12: Model testing
Figure 13: Response time of concurrent user requests to generate ISO images
Figure 14: Time to prepare a VM image.
Figure 15: Time measurements of recontextualization.
Figure 16: Breakdown of time spent during recontextualization.
Figure 17: Comparison of payments by two applications to IaaS providers as a function of application QoS diversity. IaaS providers employing static pricing are not competitive because they require higher payments for at least one application.
Figure 18: Aggregate net benefit for the cases of a two-part tariff incorporating energy charges, and a static pricing scheme. In competitive markets, applications always benefit the most under the two-part tariff.
Figure 19: VM optimisation performance for different local search algorithms and search time (30 hosts 30 VMs 20% average load)
Figure 20: VM optimisation performance for different local search algorithms and search time (30 hosts 30 VMs 45% average load)
Figure 21: VM optimisation performance for different local search algorithms and search time (50 hosts 50 VMs 21% average load)
Figure 22: VM optimisation performance for different local search algorithms and search time (50 hosts 50 VMs 51% average load)
Figure 23: VM optimisation performance for different local search algorithms and search time (100 hosts 100 VMs 20% average load)
Figure 24: VM optimisation performance for different local search algorithms and search time (100 hosts 100 VMs 47% average load)
Figure 25: Data serving CPU vs. average operations/second
Figure 26: Data analytics Watts vs. execution time
Figure 27: Data caching Watts vs. requests/second
Figure 28: Data serving Watts vs. average operations/second
Figure 29: Calibration of the Energy Modeller in ASCETiC
Figure 30: Example of Calibration Data on Testnode4 Leeds Testbed
Figure 31: Trace of Power Consumption on Testnode4
Figure 32: Distribution of Error in the Power Model
Figure 33: Recursive calculation of energy charges C(T) up to time T by the energy charges C(Tk) and the energy charge during the time period from Tk up to T, where Tk is the last instant the energy price changed prior to T.
Figure 34: IaaS provider profits in a monopoly using a two-part tariff incorporating energy charges (solid curve) and a static price (dashed).
Figure 35: Available metrics from IPMI sensors on a Dell PowerEdge R430 server
Figure 36: Host power consumption monitored via IPMI during instantaneous switching of CPU load from 100% to 0%.
Figure 37: TUB Cloud Testbed
Figure 38: Three Tier Web Application Architecture.
Figure 39: Application life-cycle phases against the number of application server instances.
Figure 40: Linear relationships of application life-cycle phases against the number of application server instances.
Figure 41: Power and CPU Utilization during deployment
Figure 42: Communications workflow in a multi-provider scenario (this figure is a simplification of the ASCETiC Y2 architecture, intended only to explain the multi-provider scenario at PaaS level).
Figure 43: Application Monitor main dashboard
Figure 44: Application Monitor dynamic time-series graphing
Figure 45: PaaS Layer KPIs and Metrics
Figure 46: SLA Negotiation
Figure 47: Training application models
Figure 48: Monitoring
Figure 49: Tab with Deployment Properties to specify optimization and boundaries
Figure 50: System status section
Figure 51: Active applications section
Figure 52: Recently finished deployments section
Figure 53: Overall view of "App Metrics" section
Figure 54: New graph modal form
Figure 55: Virtual Machine Manager - Dashboard
Figure 56: Virtual Machine Manager - VMs
Figure 57: Virtual Machine Manager - Images

List of Tables

Table 1: Acronyms
Table 2: Goodness of fit for the Energy Modeller's Calibration
Table 3: Application life-cycle phases against number of application server instances.
Table 4: Energy and Power Metrics at the IaaS Layer
Table 5: IaaS VM Manager Performance Metrics
Table 6: IaaS General Monitoring Metrics
Table 7: PaaS Layer Metrics
Table 8: SaaS - Metrics for defining KPIs and utility function on operational cost.
Table 9: SaaS - Metrics for defining power and energy usage of applications
Table 10: SaaS - Time Performance Metrics
Table 11: Summary of Novelty within the ASCETiC Architecture
Preface

Positioning this deliverable in the ASCETiC project

This deliverable is the second (of three) scientific report documents and corresponds to the second year of the project's progress. Its position in the reading path of deliverables is shown below.

Figure 1: Y2 deliverables reading path

ASCETiC follows an incremental development approach; hence, for the duration of the project, a separate scientific report is produced each year that advances on the previous one. The first year mainly focused on making the life cycle phases and Cloud components of each service layer energy-aware; the second year enables adaptation at each service layer in isolation; finally, the third year will add adaptation across service layers to reach global energy effectiveness.
Executive Summary

The purpose of this deliverable is to present the contribution of ASCETiC's components to science. Furthermore, instructions are given on how they can be used by actors from each of the architecture layers: SaaS, PaaS and IaaS. The scientific contributions are presented individually per component, comparing them to current solutions and describing how each component goes a step beyond the state of the art.

The key action of the second year of the project was to perform self-adaptation on a per-layer basis. In the case of the SaaS layer, the SaaS Modelling Tools advance on the approach of specifying at design time what needs to be measured in an application or a service when it is deployed and executed, by capturing an application's ability to adapt. In addition, the ASCETiC Programming Model related components are now not only aware of the energy consumption of an application but can also adapt using advanced scheduling techniques that take account of the different versions of an application's core elements, the target platform and the consumption profile.

The PaaS layer components are orchestrated by the Application Manager. This is now complemented by a new component called the Self-Adaptation Manager, which manages applications at runtime and invokes change in cases where SLA terms are being violated. The component that provides energy awareness is the Energy Modeller, which provides estimates of the energy consumption of applications and of events inside an application. At the heart of enabling adaptation is the VM Contextualizer, which can inject probes into application images for monitoring and can change the context of the VMs running an application. The Pricing Modeller is used to calculate and predict prices taking energy into account.
Pricing and energy information is utilised by the SLA Manager to select the best SLA offer; the SLA Manager also defines SLAs and monitors their conformance. The Application Monitor stores and analyses application-level metrics obtained with probes.

Regarding the IaaS layer, the Energy Modeller (this time the one in the IaaS layer) is again a key component, since it can predict and measure the energy of hosts and VMs as well as assist in the ranking of hosts for VM placement. The energy information is collected by means of the Infrastructure Manager and stored in the Infrastructure Monitor, together with other KPIs. The Energy Modeller has been extended this year with an emulated Watt meter, which provides scalability in cases where Watt meters are not attached to each physical host. The SLA Manager works together with its corresponding component in the PaaS layer to select the best offer, as does the Pricing Modeller. The VM Manager performs the deployment of VMs according to a policy specified by the owner of the infrastructure, and the set of policies it may use has been expanded in the second year. The VM Manager has also been extended to include an adaptation manager that can reschedule VMs to maintain the performance and energy efficiency of the IaaS layer.

In general, the key research challenge that the ASCETiC Toolbox has solved in the second year is the ability to take adaptive actions based upon factors such
as price, energy consumption and performance within each layer of the Toolbox, and to examine the effect that these actions have upon the running applications.

In addition to the detailed scientific contributions, we also include user guides for each layer. The most extensive guides are in the SaaS layer, since both the Programming Model and the SaaS Modelling Tools expose graphical interfaces that application or service developers may use to create new applications or services with energy-awareness capabilities. Both SaaS manuals include a Getting Started section intended as a quick reference for end users. The PaaS and IaaS layers each have a central component that is the entry point to the layer (the Application Manager for the PaaS and the VM Manager for the IaaS); therefore, their interfaces are referenced as the User Guide. In the case of the VM Manager, a dashboard is also offered.

Our conclusions emphasize that the second-year objectives of the project have been fulfilled. The energy consumption of applications can be measured and utilised within the constraints of each layer to perform adaptation within the ASCETiC architecture.
1. Introduction

1.1 Purpose

This document is the official deliverable D4.1: Intra-layer Cloud Stack Adaptation, which is described in the DoW as a scientific report describing the scientific work and software packages in the second iteration of the ASCETiC Toolbox. The ASCETiC project delivers not only the software produced in Year 2 but also this document, which explains the individual scientific contributions of the components that form the ASCETiC Toolbox and how each layer in the architecture is used. This deliverable is therefore divided into three sections, corresponding to the scientific contributions, the novelty and the user guides.

1.2 Structure

The scientific contributions of the ASCETiC Toolbox components are described in a short paper format: first motivating the problem to be solved, second analysing current related work on the topic or topics addressed, third detailing the specific scientific contributions, and fourth providing conclusions and future directions of the research. The section starts with a sub-section that puts in context how ASCETiC achieves energy-awareness in the second year of the project and then explains the scientific contributions of the components, organized by layer. This is followed by a summary of the novelty of the components in Year 2.

The second major part of this document includes the user guides for each layer, detailing how a Cloud end user, designer, programmer or administrator is able to use the functionalities of each layer: SaaS, PaaS and IaaS. In particular, the SaaS layer has been divided in two: the ASCETiC Programming Model and the SaaS Modelling Tools, the two main tools included in the layer. These two sub-sections first present a Getting Started guide and then detail the functionalities provided and the known limitations.
At the end of this deliverable we also provide some conclusions about our scientific achievements, highlighting the relevance and potential scientific impact of the research that is taking place in the ASCETiC project.

1.3 Glossary of Acronyms

Acronym | Definition
AM | Application Manager
AMI | Amazon Machine Images
API | Application Programming Interface
APM | Application Package Manager
APPM | APPlication Monitoring
ASCETiC | Adapting Service lifecycle towards EfficienT Clouds
CE | Core Element
CEI | Core Element Interface
CellBE | Cell Broadband Engine
CLI | Command Line Interface
COMPSs | Component Superscalar
CPU | Central Processing Unit
CXE | Apache CXF
DB | Database
DFS | Distributed File System
DHCP | Dynamic Host Configuration Protocol
DoW | Description of Work
DSL | Domain Specific Language
EC2 | Amazon Elastic Compute Cloud
Gb | Gigabyte
GHz | GigaHertz
GUI | Graphical User Interface
IaaS | Infrastructure as a Service
IDE | Integrated Development Environment
IM | Infrastructure Manager
IP | Infrastructure Provider
J2EE | Java 2 Enterprise Edition
JDT | Java Development Tools
JIT | Just In Time
JVM | Java Virtual Machine
KPI | Key Performance Indicator
KVM | Kernel-based Virtual Machine
LDAP | Lightweight Directory Access Protocol
MAC Address | Media Access Control Address
MAPE | Monitor, Analyse, Plan and Execute
NFS | Network File System
NIC | Network Interface Card
OE | Orchestration Elements
OVF | Open Virtualization Format
PaaS | Platform as a Service
PAM | Pluggable Authentication Module
PKC | Public Key Cryptography
PM | Programming Model
PM plug-in | Programming Model plug-in
PMR | Programming Model Runtime
POSIX | Portable Operating System Interface
PUE | Power Usage Effectiveness
QCow2 | QEMU Copy On Write
QEMU | Quick EMUlator
QoS | Quality of Service
RAM | Random Access Memory
RE | Requirements Engineering
ReqIF | Requirements Interchange Format
REST | Representational state transfer
ROI | Return on Investment
RRDtool | Round Robin Database Tool
SaaS | Software as a Service
SLA | Service Level Agreement
SLAM | SLA Manager
SP | Service Provider
SPU | Synergistic Processing Unit
SSH | Secure Shell
Tb | Terabyte
TOSCA | Topology and Orchestration Specification for Cloud Applications
TPS | Third Party services
UI | User Interface
UML | Unified Modelling Language
VDI | Virtual Disk Image
VHD | Virtual Hard Disk
VM | Virtual Machine
VMC | Virtual Machine Contextualizer tools
VMDK | Virtual Machine Disk
VMIC | VM Image Constructor
VMM | Virtual Machine Manager
VPN | Virtual Private Network
XML | extensible Mark-up Language

Table 1: Acronyms

2. Scientific Report on Components

2.1 How ASCETiC Achieves Energy-awareness and Adaptation

The ASCETiC architecture focuses on providing novel methods and tools to support software developers aiming at optimising energy efficiency and minimising the carbon footprint resulting from designing, developing, deploying and running software in Clouds. ASCETiC and its proposed architecture focus upon:
a) Providing models for green and efficient software design, supporting sustainability and high quality of service levels at all stages of software development and execution;
b) A framework that identifies energy efficiency parameters and metrics for Cloud services;
c) Measuring, analysing, evaluating and adapting energy usage in software development and execution;
d) Integrating energy and quality efficiency into service construction, deployment and operation, leading to an Energy Efficiency Embedded Service Lifecycle.

Energy awareness and adaptation are essential to the ASCETiC architecture and are achieved in each of the three layers of the Cloud stack. In the 2nd year of the project, we focus on intra-layer adaptation, in which each layer adapts independently, whereas in the 3rd year the layers will cooperate through inter-layer adaptation.

In the SaaS layer, a collection of components interacts to facilitate the modelling, design and construction of a Cloud application. These components
help in evaluating the energy consumption of a Cloud application during its construction. They take the concrete form of a number of plug-ins that developers can use directly within an Integrated Development Environment (IDE) and that interact with the wider ASCETiC framework. ASCETiC focuses on Cloud services made of several shared software components, which are utilised many times. These components can then be characterised, which allows the SaaS tools to relate software design and energy use. This relationship further depends on the deployment conditions and the correct operation of the service, which is achieved by means of an adaptive environment. In aid of the goal of energy efficiency, the SaaS components therefore provide means of packaging Cloud applications in a way that enables provider-agnostic deployment while maintaining energy awareness.

The PaaS layer provides middleware functionality for a Cloud application and facilitates the deployment and operation of the application as a whole. This layer focuses upon selecting the most appropriate provider for a given set of energy and performance requirements and tailoring the application to the selected provider's hardware environment. It provides facilities such as application-level monitoring and support for Service Level Agreement (SLA) formation and negotiation to facilitate energy and Quality of Service (QoS) requirements. It goes further at runtime by utilising SLA guarantees to manage power and energy consumption levels in an automated and adaptive fashion, adapting once previously agreed constraints are realised.

The IaaS layer provides admission, allocation and management of virtual resources through the orchestration of the ASCETiC IaaS layer components. Energy consumption is monitored, estimated and optimized using translated PaaS-level application requirements.
The focus on self-adaptation can be seen in each layer, with each component adding to the ability of the overall architecture to adapt. In the SaaS layer, generic requirement patterns on quality properties are captured in a quantifiable fashion. The expression of these metrics, together with representative workloads, enables the level and types of adaptation to be specified. This is achieved through offering deployment alternatives and selecting the most appropriate configuration for the underlying infrastructure. This is all visualised so that a SaaS development team can easily interpret how the different deployment strategies work.

The captured application requirements are realised in the PaaS layer by the Application Manager, which enables the deployment of the application. Self-adaptation then continues in the PaaS layer through the collaboration between the Application Manager and other key components such as the Self-Adaptation Manager (SAM), the SLA Manager and the Application Monitor. The SLA Manager continually monitors SLA conformance with the aid of the Application Monitor, while the Self-Adaptation Manager decides when to adapt the application through horizontal scaling.
The last layer, which supports the whole infrastructure, is the IaaS layer. The Virtual Machine Manager (VMM) is at the heart of the adaptation at this layer. Unlike the PaaS layer, which focuses on application-level metrics, it focuses on optimising the VMs both at deployment and again at runtime. In order to do this it utilises the energy and pricing modellers as well as key performance data from the Infrastructure Monitor, and it performs rescheduling in order to adapt either on particular events, such as the submission of new VMs, or periodically.

The individual components that have been developed to achieve this energy-aware, adaptive architecture are discussed in the rest of this section.

2.2 SaaS SDK Tools

2.2.1 ASCETiC Programming Model

The ASCETiC Programming Model section refers to the innovations provided by four components of the SaaS layer in the ASCETiC Architecture. They are the Programming Model, Programming Model Plug-in, Programming Model Runtime and Programming Model Packager.

2.2.1.1 Motivation

During the first year of the project, the ASCETiC Programming Model (PM) was put into the context of a set of research topics, namely: frameworks/languages to program services and applications for the Cloud, energy-aware applications, energy-aware programming and energy gains from compilers. These topics were analysed in depth in the Y1 SotA revisit (an accompanying document to Deliverable D2.1.1: ASCETiC Requirement Specification - version 1); we therefore refer the reader to the Y1 SotA revisit to understand the key contributions provided by the ASCETiC PM in these topics. More specifically, during the first year we successfully delivered the versioning technique, which allows an application to include a single CE with different implementations, together with the possibility of implementing energy-aware policies in the ASCETiC PM Runtime (a greedy policy was provided as a proof-of-concept).
For the second year, with self-adaptation at intra-layer level as the global objective of the project, one of the natural topics to explore was to propose more complex policies to adapt the execution of the application at run time. As shown in the Related Work sub-section, self-adaptation at the software development level has already been considered by other frameworks, but without taking into account the three parameters we consider for the optimization: performance, energy and cost.

2.2.1.2 Related Work

The work related to the research topics dealt with in the ASCETiC Programming Model has already been presented in the accompanying document to Deliverable D2.1.2: ASCETiC Requirements Specification (Year 2). We refer the reader to that document for more details. In this section we provide a summary of what was provided previously in D2.1.2.
Regarding self-adaptation capabilities at software development level, as previously mentioned, other Cloud programming solutions have already included them in the past. Although performance is one of the most common parameters found in related work when optimizing the execution of applications in the Cloud (e.g. Pegasus [1], Taverna [2], Galaxy [3] or Swift [4]), other tools such as Aneka [5] and GreenPipe [6] also consider parameters such as cost in the first case (accounting for it) and energy in the second (but provided by the user and tailored for a particular case). Currently, no other work optimizes a combination of energy, cost and performance as we propose. Our main scientific contribution is a set of scheduling policies that take into account application-level knowledge (i.e. the workflow describing the application) with the objectives of:
- Minimizing energy consumption: while considering boundaries for cost and performance.
- Minimizing cost: while considering boundaries for energy consumption and performance.
- Maximizing performance: while considering boundaries for energy consumption and cost.

As a side note, it is important to highlight that the most common current practice for adapting the execution of Cloud applications and services is horizontal elasticity driven from the application. We see horizontal elasticity as a mechanism that requires several layers of the architecture (and their related information) to work together; we therefore envision it being implemented during the third year of the project, as part of inter-layer adaptation. Next year, optimizing performance, energy or cost will therefore also consider the possibility of requesting new machines or reducing their number at runtime.
2.2.1.3 Scientific Contributions

The scientific contribution provided by the ASCETiC PM for this second year of the project is:
- Scheduling policies at application-level to optimize performance, energy or cost

However, other developments also help to achieve lower energy consumption in the execution of applications:
- Persistent workers
- Object cache

In the following paragraphs we describe both the main innovation and the other developments.

Scheduling policies at application-level to optimize performance, energy or cost

As previously mentioned in the motivation section, in Year 1 we implemented a greedy policy in the runtime that allocates tasks to the machine consuming the least energy for a particular CE. This implementation allowed us to demonstrate during the first-year review that dealing with energy consumption at application level is feasible and worthwhile for saving energy. However, being
greedy when selecting where to run a CE was not our final goal, since a greedy policy is not necessarily the optimal policy for reducing energy consumption. Therefore, in Year 2 we implemented three new policies that exploit the information available in the ASCETiC Toolbox: the estimates and predictions of energy and cost for the execution of a CE provided by the PaaS Energy and Pricing Modellers, combined with the information available in the ASCETiC PM runtime itself regarding the execution time of past CEs and the prediction of that time for future ones. With this variety of information, the policies we have implemented are:
- Minimizing energy consumption while considering boundaries for cost and performance: we consider the use case where an end user cares about spending the minimum energy when running their application. However, the user also wants to avoid degrading the total execution time too much (i.e. the application should not run extremely slowly) and to keep the cost below a certain maximum (i.e. very expensive machines should not be used even when they are more energy efficient).
- Minimizing cost while considering boundaries for energy consumption and performance: in this case, the objective of the user is to execute the application as cheaply as possible, while limiting the influence that optimizing the cost has on both the energy (i.e. select cheap machines but with reasonable energy profiles) and the performance (i.e. cheap machines but not the slowest ones).
- Maximizing performance while considering boundaries for energy consumption and cost: in the third case, users want to execute the application as fast as possible, even if this requires spending more money on faster machines (up to a certain level, since money is finite), but they also care about not spending too much energy in the execution.
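The three policies above share one shape: optimize a single objective over the candidate resources that respect the boundaries on the other parameters. The following Python sketch illustrates that shape only; it is not the ASCETiC PM runtime implementation. The names `Estimate`, `Bounds` and `select_resource` are hypothetical, and the per-CE predictions are assumed to be already available (energy and cost from the PaaS Energy and Pricing Modellers, execution time from the runtime's history).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Estimate:
    """Hypothetical prediction for running one CE on one resource."""
    resource: str
    energy_wh: float  # assumed to come from the Energy Modeller
    cost: float       # assumed to come from the Pricing Modeller
    time_s: float     # assumed to come from the runtime's execution history


@dataclass(frozen=True)
class Bounds:
    """Upper limits chosen by the end user; None means unbounded."""
    max_energy_wh: Optional[float] = None
    max_cost: Optional[float] = None
    max_time_s: Optional[float] = None

    def admits(self, e: Estimate) -> bool:
        # A candidate is feasible only if it respects every bound that is set.
        return ((self.max_energy_wh is None or e.energy_wh <= self.max_energy_wh)
                and (self.max_cost is None or e.cost <= self.max_cost)
                and (self.max_time_s is None or e.time_s <= self.max_time_s))


# Each policy minimizes one quantity; maximizing performance is expressed
# here as minimizing the predicted execution time.
OBJECTIVES = {
    "energy": lambda e: e.energy_wh,
    "cost": lambda e: e.cost,
    "performance": lambda e: e.time_s,
}


def select_resource(candidates: Sequence[Estimate],
                    objective: str,
                    bounds: Bounds) -> Optional[Estimate]:
    """Return the feasible candidate that is best for the objective, or None."""
    feasible = [c for c in candidates if bounds.admits(c)]
    if not feasible:
        return None
    return min(feasible, key=OBJECTIVES[objective])
```

With no bounds set, the "energy" objective reduces to the Year 1 greedy choice of the least energy-consuming machine; setting `max_cost` or `max_time_s` removes machines that are too expensive or too slow before that choice is made.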
It is important to note that we assume the three parameters are dynamic during the execution of an application, since they are calculated for a specific CE. This is quite obvious for performance and energy, but not for cost, since a fixed cost of a physical host could be directly mapped to the Virtual Machine that is using it (so the user is charged whether or not the machine is being used). In the ASCETiC Toolbox the Pricing Modeller implements a dynamic pricing scheme, where the price of a physical host is divided between its VMs and between its applications, which allows the PM runtime to optimize it. If the price had been fixed, the only possibility to optimize the cost would be to implement elasticity (i.e. request more or fewer machines to run the application), which, as already stated, we foresee implementing in Year 3.
In addition, when selecting the metrics to be specified as boundaries, we found that two main choices were available. Global boundaries are metrics for the whole makespan of the workflow to be executed, such as the total execution time of the application, the total energy spent and the total cost in a set of resources. These metrics are clearly oriented to the whole workflow of an application. On the other hand, we also considered instant boundaries, such as maximum W, maximum
€/hour and maximum execution time for a CE, which, as the name indicates, are metrics considered at a particular moment in time for a particular machine.
Several reasons made us select instant boundaries in preference to global ones. The first is the nature of ASCETiC PM applications: depending on the application, we may or may not have the complete workflow available to make the scheduling plan in advance. This is because an application may need some information in its main algorithm while progressing, which causes a synchronization point in the master process and thus stops the workflow generation. Applying a global boundary to only a part of the whole workflow would therefore not make sense. Apart from the workflow generation issues, another reason is that global boundaries are more interesting when elasticity can be driven from the application. When elasticity comes into the picture, the runtime would be able to ask for more machines to be added to the pool of resources in the more parallel regions of the application and to decrease that number in the more sequential ones. These actions would serve not only to speed up the application, but also to reduce its energy consumption or its cost, according to the selected policy and the boundaries (e.g. add faster machines to speed up the execution, but take care not to exceed a total energy consumption figure). However, as mentioned previously, elasticity is envisioned for the third year of the project, as an inter-layer mechanism. A final reason is that global boundaries only make sense for ASCETiC PM-like applications (batch oriented) but not for service-like applications, where a service is waiting for requests to be dispatched.
Therefore, our implementation uses instant boundaries to drive the optimization of energy, performance or cost, whichever is specified by the end user before the execution of the application.
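The selection step implied by the three policies and the instant boundaries described above can be sketched as follows. This is only an illustrative sketch, not the ASCETiC PM runtime code: the names `Prediction`, `Boundaries`, `OBJECTIVES` and `select_machine` are hypothetical, and it assumes that a per-machine prediction of power, price and execution time is already available for the CE to be scheduled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Prediction:
    """Predicted figures for running one CE on one machine (hypothetical fields)."""
    machine: str
    power_w: float          # predicted power draw while running the CE, in watts
    price_per_hour: float   # predicted price, in currency units per hour
    exec_time_s: float      # predicted execution time of the CE, in seconds


@dataclass(frozen=True)
class Boundaries:
    """Instant boundaries; None means the metric is left unbounded."""
    max_power_w: Optional[float] = None
    max_price_per_hour: Optional[float] = None
    max_exec_time_s: Optional[float] = None


# Maps the parameter to optimize to the quantity that should be minimized.
OBJECTIVES = {
    "energy": lambda p: p.power_w * p.exec_time_s,                # joules
    "cost": lambda p: p.price_per_hour * p.exec_time_s / 3600.0,  # currency units
    "performance": lambda p: p.exec_time_s,                       # seconds
}


def within(p: Prediction, b: Boundaries) -> bool:
    """True if the prediction respects every boundary that is set."""
    checks = [
        (p.power_w, b.max_power_w),
        (p.price_per_hour, b.max_price_per_hour),
        (p.exec_time_s, b.max_exec_time_s),
    ]
    return all(limit is None or value <= limit for value, limit in checks)


def select_machine(predictions: List[Prediction], objective: str,
                   bounds: Boundaries) -> Optional[Prediction]:
    """Pick the feasible machine that minimizes the chosen objective."""
    feasible = [p for p in predictions if within(p, bounds)]
    if not feasible:
        return None  # no machine satisfies the boundaries
    return min(feasible, key=OBJECTIVES[objective])
```

With no boundaries set and the "energy" objective, this reduces to the greedy Year 1 behaviour of picking the machine with the lowest predicted energy for the CE; the boundaries simply remove candidates before the minimum is taken.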
It is also worth mentioning that these optimization algorithms, combined with the versioning capabilities presented during Year 1, enable interesting options for deciding how to execute an application on a set of resources: the optimization can choose not only between different machines, but also between different pieces of software implementing the same functionality. We have also modified the PM Plug-in component to allow end users to specify graphically which parameter is to be optimized and the corresponding boundaries in each case. This information is then included in the OVF that describes the application and is read by the ASCETiC PM runtime to schedule tasks according to both the selected policy and the specified boundaries.
Persistent workers
A problem that the previous ASCETiC PM version had in terms of energy consumption was that the workers where the CEs were executed created a new process at the beginning of each task to run the core and destroyed it at the end. Thus, the Just In Time (JIT) compiler did not optimize the Java byte code of
the CE, causing an important performance loss. An envisaged modification of the runtime system deployment to reduce the application execution time and improve its performance was to convert these short-lived worker processes into processes persistently deployed on the worker nodes. During this second year, we have implemented this feature, which allows gains in execution time that in turn are expected to have a direct impact on the energy saving of an application. The optimizations performed on the code by the JIT compiler will of course be more noticeable in applications that are fully programmed in Java (i.e. all the work is done by the source code provided) than in applications that call an external binary to perform calculations, such as external simulator software, where the JIT compiler cannot introduce optimizations at all.
Object cache
Another operation commonly performed in the ASCETiC PM runtime that has an impact on energy consumption is the serialization of objects. When a task needs an object that has been generated on another machine, the runtime can send the object to the machine where it is needed by first serializing it to disk, then sending it to the new machine and then de-serializing it. In order to avoid these serializations and transfers of objects between machines, we have implemented an intermediate object cache for the master process and for each worker in the computation. Whenever an object is cached, there is no need to transfer it from one machine to another through serialization, so the PM runtime reduces the energy consumption and the time associated with these processes. This cache is implemented in the same way as similar cache mechanisms for distributed environments. This means that data consistency is maintained between the distributed copies of a single object, invalidating and updating the copies when needed.
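The behaviour of the object cache described above (serve a locally cached copy when it is current, transfer through serialization only on a miss or when the copy is stale) can be illustrated with a small sketch. It is a simplified model written for this explanation, not the PM runtime implementation: the `Node` and `Runtime` classes, the version counter used for invalidation and the use of `pickle` in place of a real network transfer are all assumptions.

```python
import pickle


class Node:
    """A master or worker process with its own local object cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # object id -> (version, object)


class Runtime:
    """Tracks the latest version of each object and counts transfers."""

    def __init__(self):
        self.latest = {}    # object id -> (version, owner node)
        self.transfers = 0  # number of serialize/transfer/deserialize cycles

    def publish(self, node, obj_id, obj):
        """A task on `node` produced or updated an object.

        Bumping the version implicitly invalidates all older cached copies.
        """
        version = self.latest.get(obj_id, (0, None))[0] + 1
        node.cache[obj_id] = (version, obj)
        self.latest[obj_id] = (version, node)

    def fetch(self, node, obj_id):
        """Return the object on `node`, transferring it only if needed."""
        version, owner = self.latest[obj_id]
        cached = node.cache.get(obj_id)
        if cached is not None and cached[0] == version:
            return cached[1]  # cache hit: no serialization, no transfer
        # Miss or stale copy: serialize at the owner, deserialize locally.
        payload = pickle.dumps(owner.cache[obj_id][1])
        self.transfers += 1
        obj = pickle.loads(payload)
        node.cache[obj_id] = (version, obj)
        return obj
```

In this model, a second task on the same worker that needs the same unchanged object causes no further transfer, which is where the time and energy savings would come from.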
2.2.1.4 Evaluation
The new self-adaptation capabilities of the PM Runtime are evaluated using the GPF use case (a real application). Therefore, to avoid repetition between deliverables, we refer the reader to D6.3.2: GreenPrefab Use Case Prototype version 2 for the testing of these capabilities. Here we include further evaluations of the other improvements made to the Runtime, namely the Persistent Workers capability and the Object Cache, which are the features we evaluate in this section.
To carry out this evaluation, we have designed a synthetic benchmark that is run both with the previous implementation of the runtime (using the Grid Application Toolkit (GAT) [7]) and with the new one, which uses the Java Non-blocking I/O library (NIO) [8]. The benchmark can be configured to run different numbers of tasks with file and object transfers between the master and the workers during the execution, while the size of the cache is fixed. This flexibility
allows us to evaluate the difference in performance between the GAT and NIO implementations. We evaluate three scenarios:
- Task generation: generate n tasks that perform a 1s wait.
- File transfer: generate n tasks that have a 10 MB file as an input parameter (without sharing files between tasks).
- Object transfer: generate n tasks that have a 10 MB object as an input parameter (without sharing files between tasks).
In all cases there is no computation at all in the tasks, so our tests show purely runtime overhead. This is why the plots show that adding more workers does not contribute much to speeding up the benchmark execution.
One could fairly object that decreasing the execution time of an application does not always imply that energy is reduced. Proof of this can be found in related work analysing the energy consumption of the different functional units and banks of the machine architecture (i.e. energy spent in the register or memory banks, in the processor, etc.). However, if we analyse the energy spent in cloud environments, the idle time of machines makes a large contribution to the overall energy spent. Thus, in this particular case, we can reasonably assume that decreasing the application execution time contributes to reducing the total energy consumption.
Figure 2: GAT vs. NIO, task generation case
In Figure 2 we see that, in all tested cases (from 1 to 100 tasks), the NIO implementation outperforms the GAT one for the Task generation case of the benchmark. Only in the case of one task and 4 workers (1+4 case) is the time similar. In the best case, the PM Runtime overhead is reduced by a factor of 7.
Figure 3: GAT vs. NIO, file transfer case
In Figure 3, we see that NIO again performs much faster than GAT when the data passed between the master and the different workers are files (from 10 to 500 tasks have been tested, again using different worker configurations). The same happens when the data are objects, as shown in Figure 4. In the best cases, the total execution time is reduced by a factor of between 2 and 3.
Figure 4: GAT vs. NIO, object generation case
Thus, we can conclude from these experiments that with the NIO version, which implements the Persistent Workers capability and the Object Cache, the PM
Runtime contributes to decreasing the execution time of the application, and accordingly its energy consumption.
2.2.1.5 Future Contributions
For the final year of the ASCETiC project, our objective is inter-layer self-adaptation, so the different policies applied at different levels will have to agree in order to avoid pursuing contradictory objectives. A clear mechanism that will derive from inter-layer communication is elasticity driven from the application, which will make our optimization policies richer. This is because the ASCETiC PM runtime will be able to request new machines or reduce their number, depending on the objective specified by the user, leading to a possible consolidation of tasks in the same virtual machine, or the opposite.
We also envision the implementation of a locality-aware policy that avoids data transfers with the objective of saving energy. This policy will try to avoid the energy spent by the network and/or the storage when transferring data from one worker to another, and will also consider data already available in shared disk spaces (pre-fetched data transfers), again to avoid transfers during the execution. A clear prerequisite for accomplishing this objective is that we are able to account for the energy spent by the network and/or disks when handling data.
Finally, since the elasticity capabilities will make it much more interesting to apply global boundaries to the scheduling of the workflow, we will also study alternatives for implementing global boundaries for applications that do not generate the whole workflow in advance, for instance by considering historical data of past executions to predict a particular metric for the whole workflow execution.
2.2.1.6 Conclusions
The ASCETiC PM is a Cloud-unaware programming model that eases the way applications are constructed and deployed by using the Eclipse PM plug-in.
Applications are first developed using sequential Java programming and are later deployed to the Cloud using the ASCETiC PaaS framework capabilities. It is a general-purpose programming model that hides the details of directly using the Cloud from end users and that now includes energy-awareness in its runtime, offering a complete solution for end users to program energy-aware applications.
The energy-aware policies implemented in this second year pursue the objective of optimizing, at application level, the energy, performance or cost of the applications, while considering certain boundaries for the optimization. In addition, the persistence of workers and the object cache that have been implemented allow the PM runtime to save additional amounts of energy when running an application. These savings come from the Java JIT compiler optimizations and from the avoided object serializations, respectively.
2.2.2 SaaS KPI Modelling and Visualisation Tools
In contrast to the ASCETiC PM, which proposes a cloud-unaware programming model, in other situations the SaaS development team prefers to remain aware
of the underlying Cloud infrastructures, composed of various digital resources such as VMs, open containers or block storage. They will then be able to adapt the architecture of their SaaS application to better exploit these digital resources. The SaaS KPI Modelling and Visualisation Tools are a suite of tools that helps a SaaS development team and SaaS provider explore various Cloud deployment configuration alternatives for a SaaS application. Based on the result of this exploration, the SaaS provider will learn realistic KPI thresholds to use on different metrics during production time. In cases where these thresholds are not deemed good enough, the SaaS development team will then be able to use the tool suite to identify components or application features to improve or make optional.
In other words, the SaaS KPI Modelling and Visualisation Tools are a suite of tools provided to SaaS development teams to help them understand the runtime quality properties of the Cloud applications they develop. This knowledge can then be used to establish realistic trade-off measures and to identify where it may be worth refactoring the given SaaS application to include variability points on which self-adaptation at the level of IaaS, PaaS and SaaS becomes possible.
The modelling tools are provided as projects of different Eclipse plugins:
- jucmnav: an Eclipse plugin for goal-oriented requirement modelling
- Papyrus: an Eclipse plugin for UML modelling
- Acceleo: an Eclipse plugin for performing model-to-text generation
The Visualisation tool is provided as a web application and requires access to the ASCETiC Application Monitor service provided at the PaaS layer.
2.2.2.1 Motivation
Developing software applications or services for the Cloud remains a fairly new concept. Although one might be tempted to believe that it does not much affect how application developers work, this is far from the truth.
At first sight, developing SaaS applications or services may resemble developing traditional web applications. However, the variety of Cloud providers, ranging from a private Cloud at a SaaS developing company or federated private Clouds between partner companies to community Clouds, generates many potential deployment configurations for a SaaS provider. Each deployment alternative comes with its advantages and disadvantages. For instance, deploying a complete SaaS application on one's private Cloud makes it possible to tweak the underlying hardware and virtualisation layer to achieve the best time performance for the given SaaS. However, this added performance comes at a significant cost, since the SaaS provider must acquire hardware and also provision the human resources needed to maintain that hardware and tweak the virtualisation layer. In addition, the SaaS provider also bears the cost induced by the energy consumed by the SaaS application. At the other end of the spectrum, a SaaS application can be fully operated at a single community Cloud provider such as Amazon, Google, Microsoft or others. In such cases, there is a certain loss of control for the SaaS provider, who may not know whether, for instance, the throughput between two SaaS components deployed on different VMs will achieve or maintain the desired level, thus forcing the SaaS application to rely on content delivery network services from the
selected Cloud providers. In the end, the SaaS provider is often led to use Cloud provider specific features that ultimately create a Cloud provider lock-in for the SaaS application.
Furthermore, different community Cloud providers propose different performance and cost models for computation, storage or network services. Deploying a SaaS application at a single community Cloud provider may not yield the best option in terms of cost. On the other hand, deploying various components of a SaaS application on different private and community Clouds could degrade performance in certain respects.
Therefore, the many potential deployment alternatives for a SaaS application can become overwhelming for a SaaS development team. At the same time, considering all important runtime quality properties at development time increases the chance of success for the resulting SaaS application. In particular, the SaaS development team must clearly identify how to measure dynamic aspects of a SaaS application such as time performance and cost, including how energy consumption could affect the cost when running a part of a SaaS application in one's private Cloud. In the future, it is realistic to imagine that community Clouds may also develop pricing models that include energy aspects.
To help a SaaS development team explore objectively how its SaaS application behaves in terms of time performance, cost and energy consumption for various deployment configuration alternatives, the following challenges should be addressed:
- (Challenge 1) Develop generic requirement patterns on quality properties to consider and how to evaluate them in a quantifiable way (based on metrics and measurements)
- (Challenge 2) Help the SaaS development team express how measurements of various metrics must be performed for the SaaS application and with what workloads (representative of target customer groups)
- (Challenge 3) Help the SaaS development team express the various deployment alternatives considered and for which measurements of time, cost and energy should be obtained (on Cloud testbeds where the ASCETiC toolbox at the IaaS and PaaS layers is installed)
- (Challenge 4) Provide an intuitive visual interface for the SaaS development team to interpret easily how various deployment alternatives performed on the selected metrics
2.2.2.2 Related Work
Overall, the SaaS KPI and visualisation tools propose an operational approach to perform a particular assessment with the architecture trade-off analysis method (ATAM) [9], focused on the runtime quality characteristics of various deployment alternatives of a SaaS application to be operated in Cloud infrastructures. However, unlike traditional assessment with ATAM or other approaches that propose to quantify non-functional requirements [11], which attempt to set values at the time of requirement and architecture analysis before any executable system exists, the application of ATAM proposed here assumes that an executable application exists. For instance, a legacy application needs to be migrated to operate in the Cloud, or a SaaS application is developed incrementally and earlier iterations have produced
a portion of the SaaS application that can be studied to better understand its runtime behaviour in terms of time, cost and energy. Both scenarios, migration of a legacy application and incremental development of a SaaS application, are commonplace; we therefore believe that this application of ATAM to objectively assess deployment alternatives in the Cloud will cover a significant percentage of Cloud application development projects.
In [12][13], past research efforts in the context of ASCETiC already explored how to augment UML, first to specify energy goals and related questions (related to Challenge 1) and second to define what to measure and how to measure it, i.e. with what representative workloads (Challenge 2). Furthermore, in [14], a BIRT prototype to visualise measurement results (in relation to Challenge 4) is illustrated.
Although the augmented UML is appropriate to solve Challenge 2 on specifying what to measure and how, the current approach for specifying goals and questions showed shortcomings. First, it did not allow relationships between goals on different metrics to be expressed easily. Second, hiding goals and questions inside a UML stereotype makes it hard for the SaaS development team to remember what the measurement goals are. Finally, developing more complex goal refinement strategies, with potentially different levels of refinement to identify metrics of interest, and making available a repository of generic goal refinement patterns that can easily be displayed to and understood by the SaaS development team requires a dedicated approach. In particular, UML modelling, even with the SysML notion of Requirement, is not appropriate. In the last two decades, a large body of research has studied how to express and visualise models of system goals, requirements and their relationships [16] [17] [18]. The effort in [18] subsequently evolved into the ITU-T standard Z.151 [20] and led to the development of the open source tool jucmnav [19].
Consequently, this tool seemed adequate as a starting point to express goal refinement patterns relevant to the ASCETiC context.
The BIRT prototype also proved to be a weak attempt, leaving little room for dynamic interaction between the SaaS development team and the data collected. In particular, an important missing feature in BIRT reporting is the ability to filter results and regenerate them in a reactive manner. Furthermore, it is also important to explore different graphical approaches and widgets that facilitate the filtering of results to identify which deployment alternatives perform better on which metrics, and thus let the user visually understand the trade-offs between time, energy and operational cost aspects. Consequently, a custom web application based on JavaScript and existing widget libraries seemed the most appropriate.
Regarding Challenge 3 on facilitating the specification of various deployment alternatives, European research efforts have produced CAML and CloudML, under the ARTIST and PaaSage FP7 projects respectively. However, it is worth noting that these projects have a different objective from the one targeted in ASCETiC. In particular, using CAML or CloudML, a SaaS development team has the power to model graphically every aspect of how to deploy their SaaS application, using a full-blown model-driven approach. On the other hand, the ASCETiC vision is rather that devops who usually specify
how a SaaS application is to be deployed usually prefer textual representations over graphical ones. Consequently, we rely on devops to provide various deployment scripts using older textual scripting technologies such as Chef. The role of the deployment models in ASCETiC is rather to express where the various Chef scripts need to be executed to enable the targeted deployment alternatives. The ASCETiC deployment model, in turn, relies on these Chef scripts to execute in the correct order, open the appropriate communication ports, etc., so as to end up with a properly functioning SaaS application deployment. For parameters that depend on a given deployment alternative (for example, when a given port should be used instead of the standard one), the deployment model, which explicitly refers to the Chef server URL, may also add these parameters at invocation time.
2.2.2.3 Scientific Contributions
In Year 2 of ASCETiC, a noteworthy contribution is proposed to address Challenge 4 on an intuitive interface to identify deployment configurations that achieve feasible quantitative non-functional requirements on time performance, energy and cost. On-going work, to continue in Year 3, addresses Challenge 1 on Generic SaaS Goals and Requirements patterns.
Given the on-going nature of the work to address Challenge 1, we will not present here an exhaustive catalogue of goals and requirement patterns on various non-functional aspects but rather highlight our general design approach with a focus on Cloud computing. Given that the approach followed in Year 1 was goal-based, it is quite natural to consider a goal-oriented requirements engineering (GORE) framework to deal with such NFR in a larger perspective, since it provides all the tools to capture NFR as well as their inter-relationships, such as contributions or conflicts. In the scope of this work we considered GRL and the related open source jucmnav tool [19].
Figure 5: High Level Structure of NFR and some relationships
In order to identify the key NFR, we must also consider the stakeholders concerned, because subsequent reasoning on NFR can be done for different sets of stakeholders depending on the kind of deployment considered. From a SaaS application provider's point of view, response time and price are the most common aspects on which metrics are used to evaluate a SaaS application, for these are the ones that directly impact end users. From a service host's point of view, the overall cost is the most obvious parameter to minimize. This cost minimization is subject to multiple constraints, which are affected by NFR.
Figure 6: Minimisation of indirect energy costs of SaaS application.
For the service to remain useful and profitable, the quality of service must reach a certain level. Service Level Agreements (SLAs) contractually ensure that the required quality of service is achieved consistently by defining measurements. We rely here on the large amount of work already carried out by several European projects [21]. Figure 5 shows some of them, such as availability, performance (response time, resource and also energy), security/privacy of data, location/access/portability of data, exit strategy, etc. A distinctive feature of our work is of course the inclusion of energy NFR, which also relate to the "Green SLAs" that are being considered for the efficient use of resources, and particularly energy, by services and applications [8].
At this high level, we can state some generic conflicts: for example, redundancy will increase energy consumption, and security will also require more resources (CPU, transfer volume), for instance to cope with encryption, and thus increase energy demand. However, other contributions might depend on the application, e.g. a good data replication strategy can provide a positive energy balance for data-intensive applications. In the end, the assessment needs to be carried out in the scope of a specific application, which can be designed or deployed in different ways, thus triggering the need to explore a design space in which energy efficiency becomes a parameter of the total cost function.
Our approach to exploring the design space is to keep the SaaS application designer in control (1) by providing a specific visualisation tool, detailed in the next section, which allows the designer to compare the behaviour of different possible designs corresponding to different deployment configuration alternatives, and (2) by providing decision support in the form of a measurable goal graph.
This means that requirements need to have an associated KPI (identified from our KPI profile [22]) and that the refinement/contribution structure is decorated explicitly, as in the structure depicted in Figure 6, which shows metrics related to the goal of minimising the operating cost of a data centre hosting the SaaS application. This declared structure can later be used by the visualisation tool to guide the SaaS application designer in exploring measurement results, determining what realistic thresholds to use for different metrics and formulating a utility function for evaluating each quality criterion or cost aspect separately.
Intuitive Interface to quantify non-functional requirements
The first and simplest visualisation uses a simple chart to compare a given metric between multiple measurement sets. A measurement set is a set of measures taken under given conditions. For example, a measurement set may contain data about the response time, CPU, RAM and energy usage of one virtual machine (VM) during one particular test (a search in a product catalogue, for example), while another measurement set may have been taken for the same search but with an underlying VM that has a different CPU frequency, RAM and disk size, which conceptually corresponds to a different deployment configuration alternative of the SaaS application. These tests are repeated multiple times and the measures are aggregated to get reasonable estimates.
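As an illustration of the aggregation step mentioned above, the following sketch reduces several repetitions of the same test to one estimate per metric. The text does not state which aggregate the tool uses; the median is chosen here purely as an assumption because it is robust to occasional outlier runs, and the function name `aggregate` and the metric names are hypothetical.

```python
from statistics import median


def aggregate(runs, reducer=median):
    """Collapse repeated runs of one test into a single value per metric.

    `runs` is a list of dicts mapping a metric name to the value measured
    in one repetition. A metric missing from some runs is aggregated over
    the runs that do report it.
    """
    metrics = set().union(*(run.keys() for run in runs))
    return {
        metric: reducer([run[metric] for run in runs if metric in run])
        for metric in sorted(metrics)
    }


# Three repetitions of the same catalogue search on one VM configuration.
# The values are made up for illustration only.
search_runs = [
    {"response_time_ms": 120, "energy_wh": 1.0},
    {"response_time_ms": 100, "energy_wh": 1.2},
    {"response_time_ms": 500, "energy_wh": 1.1},  # one slow outlier run
]
estimate = aggregate(search_runs)
```

Passing a different `reducer` (for example `statistics.mean`) changes the estimate without changing the structure of the measurement set.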
Figure 7: CPU comparison between two nodes in the same test run
These measures can then be compared on a graph. For example, the response time can easily be displayed using a bar chart, with one bar for each measurement set, while the CPU usage can be displayed using a line chart, where it is visualised over the duration of the test. Figure 7 illustrates a visualisation comparing CPU history on two nodes of the same experiment. This kind of graph allows developers to:
- Identify potential peaks or long CPU-intensive operations that could be investigated further and optimised locally
- Compare two SaaS application instances corresponding to two different deployment configuration alternatives to see the actual effects of a change on the quality criteria of interest or on operational cost.
This visualisation is focused on comparing one specific aspect between multiple versions of a service or application. The second visualisation takes another point of view and visualises tests and versions according to several of their characteristics.
The second view approaches the data set from the opposite side. Instead of taking different tests and versions and showing how they compare on one specific aspect, this view takes multiple aspects and, for each of them, displays how the various measurement sets are distributed. It then allows the user to filter out the unwanted parts of the distributions and see which measurement sets match the filter.
Figure 8 illustrates this concept more clearly: three bar graphs display the distributions of the measurement sets over three different aspects: Energy consumption, operational Cost and Time performance. On each chart, the x axis represents an arbitrary measure for the given aspect and the y axis represents the number of measurement sets. The example below assumes that a utility function is provided to aggregate one or several metrics on a given aspect.
The three utility functions range from 0 to 70 for energy behaviour and from 0 to 100 for operational cost and time performance.
Figure 8: Filtering on aspects.
Figure 8 also shows the filtering mechanism. Using sliders on each graph, the user can easily select the parts of each chart to include in the filter. In this example, there is no restriction on the energy measures, there is a filter from 0 to 30 on the utility function related to cost and a filter from 0 to 40 on the utility function related to the time performance of each workload exercised on each deployment configuration alternative.
The user can immediately see the impact of the filters on how the results are distributed on a scatter plot for two of the aspects. In Figure 9 these are energy behaviour and time performance, whose respective utility functions remain in the filtered ranges shown in Figure 8: 0-70 for the energy utility function and 0-40 for the time performance utility function. Each dot on the graph represents the scores of a given workload script exercised on a given deployment configuration alternative of a SaaS application (each alternative is assigned a different colour, except for grey, which is reserved for filtered-out points).
Figure 9: Energy vs. Time
Figure 9 shows the result of the filtering on cost and time on the energy versus time plot. We can clearly see that the time dimension (y axis) has been capped at a given level: all the blue points are under the level selected by the filter. However, not all the points under that line are blue. On the left side, for example, a few dots are grey, even though there is no filter on the energy (x axis) aspect. This is an effect of the filter on the cost: the grey dots on the left side of the graph (low energy) had a higher cost and were excluded by the filter on the cost aspect.
Figure 9 only shows results of exercising different workloads on a single deployment configuration alternative of a SaaS application.
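The filtering logic described above can be sketched as follows. This is an illustration under stated assumptions rather than the code of the visualisation tool (which the text describes as a JavaScript web application): the `Score` record, the `passes` and `colour_points` functions and the palette are hypothetical, and each score is assumed to be the output of a utility function for the corresponding aspect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Utility scores of one workload on one deployment alternative."""
    alternative: str
    workload: str
    energy: float
    cost: float
    time: float


def passes(score, filters):
    """True if every filtered aspect lies within its selected range.

    `filters` maps an aspect name to an inclusive (low, high) range;
    aspects that are not listed are unrestricted.
    """
    return all(low <= getattr(score, aspect) <= high
               for aspect, (low, high) in filters.items())


def colour_points(scores, filters, palette):
    """Pair each score with its plot colour; filtered-out points are grey."""
    return [
        (s, palette[s.alternative] if passes(s, filters) else "grey")
        for s in scores
    ]


# Filter ranges from the example in the text: cost 0-30, time 0-40,
# and no restriction on energy.
FILTERS = {"cost": (0, 30), "time": (0, 40)}
```

A point with low energy but a cost score above 30 is coloured grey even though energy is unrestricted, which matches the grey dots on the left side of the scatter plot; widening the time range to 0-80 would turn more points back to the colour of their alternative.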
Assuming the application is still in the development stage, the graph would help the SaaS development team observe the following: many points are greyed out, so the tested deployment configuration alternative will likely yield a significant number of SLA violations for the given filtered ranges on time performance and energy behaviour. The SaaS development team would therefore conclude that one of three options applies: other deployment configuration alternatives need to be tested; if the current deployment configuration is to be used, the SaaS application will likely need refactoring to perform better on trade-offs between time performance and energy behaviour; or the filters need to be relaxed. For example, allowing a range from 0 to 80 on the time-performance utility function would show most of the points in blue.

2.2.2.4 Future Contributions

Our experiments so far are limited to partial data sets manually collected from a news publication application currently being migrated to the Cloud, complemented by a test data generator. The next step in our work will be to achieve a complete integration with the ASCETiC testbed and validate the approach on two case studies. In this process, we will also enrich our goal refinement pattern database and further develop the visualisation capabilities of the tool in order to provide the best assistance to the Cloud application developer.

2.2.2.5 Conclusions

Whereas in Year 1 our work was restricted to direct reasoning on energy requirements, in practice it is always necessary to take account of other non-functional requirements (NFRs) that can have an energy impact. In Year 2, we considered such requirements and were especially interested in conflicting NFRs such as performance (requiring more resources to be mobilised, e.g. to ensure good service response time) and security (requiring extra layers of software and communication overheads). Our contributions are the following: At the requirements level, we develop an approach to capture NFRs on various quality aspects in the form of goals, as well as their relationship to energy goals.
Ongoing work elaborates pattern-oriented goal refinement for various quality aspects, to identify high-level goals for evaluating such aspects and then the catalogue of metrics that a SaaS provider could use to elaborate actual KPIs for a given SaaS application. At the design level, our existing UML profile supports the collection of metrics related to energy, cost and time. At the run-time level, we provide a visualisation tool that allows developers to explore different design alternatives and guides them in the selection of a compromise. We support the collection of a wider set of metrics based on an extended set of probes, e.g. to collect response time related to performance.
2.3 PaaS layer

2.3.1 PaaS Self-Adaptation Manager

2.3.1.1 Motivation

The PaaS Self-Adaptation Manager (PaaS SAM) is the principal component in the PaaS layer for deciding on the adaptation required to maintain SLAs inside the ASCETiC framework. Its overall aim is to manage the trade-offs between energy, performance and cost during adaptation at runtime. This component is new in Y2 and its architectural contributions are discussed in this section.

2.3.1.2 Related Work

The PaaS Self-Adaptation Manager is an adaptive system [24], which is implemented using fuzzy logic [26], [27]. The principal purpose of self-adaptation is to provide environments that are self-configuring, self-healing, self-optimizing and self-protecting, with the aim of enabling large systems to self-manage. In the ASCETiC project the aim is to manage trade-offs between energy, performance and cost without interference from PaaS operators. PaaS layer adaptation includes aspects such as multi-clouds, heterogeneity of pricing models, rescaling VMs and services, and selecting new lower-power services. The PaaS SAM is similar to the recovery planner of the SHoWA framework [28]. The recovery planner is likened to a disease database, with a set of rules on how to treat certain anomalies in performance. The Self-Adaptation Manager goes further and specifies conditions such as recent violations of a similar nature and recent adaptation responses. It also avoids pure thresholds and utilises fuzzy logic in order to give a more refined response during adaptation. The aim is to further graduate the adaptive behaviour of the manager, thus avoiding situations where adaptation is performed for minor blips in performance or where similar anomalies in performance have recently been mitigated.
The Synthesis of Cost-effective Adaptation Plans (SCOAP) framework [29] is a similar PaaS/application-oriented adaptation framework, which focuses on the economic costs of utilising Cloud infrastructures under various pricing models. It aims to select the correct number of VMs of different types and pricing schemes to meet demand. It achieves this by utilising previous trace logs and queuing-theory-based demand models to match demand to the correct resources. The PaaS SAM also holds similarities with Mistral [30], a controller framework that optimizes power consumption and performance benefits, as well as the impact of adaptations. Mistral's focus is however on the IaaS layer. Mistral utilises models for both power and performance to compare the future perceived reward of adaptation against the incurred cost of adapting. Each possible adaptation is represented as part of a graph structure, with adaptations transitioning between configuration states. Each state is then assessed for its perceived lifetime and benefits according to the power and performance utility models that have been employed. This is in contrast to the PaaS SAM's focus on the user's applications and SLA violation events, in which the overall utility of an application is maintained by invoking changes to the environment in order to maintain or restore application performance.

2.3.1.3 Scientific Contributions

The PaaS SAM is the principal decision engine for deciding on the type of adaptation to make. It is notified of the need to adapt by the SLA Manager, as shown in Figure 10.

Figure 10: PaaS SAM Overall Workflow

The adaptation rules then run in two stages. The first stage indicates the type of adaptation to make, such as adding or removing VMs, by assessing the causes of the SLA breach. The second stage indicates the exact nature of this adaptation, such as what type of VM to add or which VM should be deleted. Notifications of SLA breaches principally contain the following information:

• Time: the timestamp of the detected violation
• Value: a raw value representing how large the breach is (the measured value of the violation)
• Type of violation message: either "violation" if a violation is detected, or "warning" if the guarantee is near the violation threshold
• SLA UUID: the UUID of the SLA
• SLA Agreement Term: used to distinguish between different constraint terms
• SLA Guaranteed State: provides information on the border conditions of the SLA:
  o Guarantee Id: the metric to be monitored
  o Operator: such as greater than, less than, equal
  o Guaranteed Value: the value of the threshold

The events that are provided to the PaaS SAM are for the guarantees on the application's power consumption and on the overall energy consumption of an application. The principal actuators made available to the PaaS SAM are the ability to add and remove VMs from an application and to terminate the application as a whole. In the future this list will be extended to scale a VM vertically, i.e. in terms of its allocated memory and CPUs. The PaaS SAM therefore has to manage the decision rules that map between the event notifications and the potential actuators.

In its most basic mode of operation, a tuple of <Agreement Term, Direction, Response Type> is utilised to determine the form of adaptation to take. Examples of this are <energy_usage_per_app,lt,remove_vm> or <power_usage_per_app,lt,remove_vm>. This mode includes an overall threshold value, which determines how many events are required before a rule fires, on the assumption that temporary reports of SLA breaches can be ignored. An example would be VM power becoming too high due to a short burst of CPU utilisation.

In a more advanced mode of operation, fuzzy logic is used with the following input parameters:

• Current metric difference: the current difference between the guaranteed value and the actual measured value.
• Trend difference: the difference between the first detected breach's value and the current detected breach's value.
• Energy usage per app: the count of how many times the event has fired.
• Power usage per app: the count of how many times the event has fired.

These criteria therefore allow the construction of more complicated threshold rules that can account for an SLA breach being self-correcting, a minor passing breach, or warnings of an impending breach that can be averted. The PaaS Self-Adaptation Manager is at the heart of the adaptation in the ASCETiC PaaS layer.
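The basic mode of operation can be sketched as follows. This is an illustration under stated assumptions rather than the PaaS SAM code: the class names `Violation` and `BasicRuleEngine`, the default threshold of 3 events, and the decision to reset the event count after a rule fires are all hypothetical. Only the rule tuples and the idea of an event-count threshold come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Violation:
    """Subset of the SLA breach notification fields (hypothetical type)."""
    agreement_term: str  # e.g. "power_usage_per_app"
    operator: str        # "lt", "gt" or "eq", from the guaranteed state
    value: float         # measured value of the violation
    guaranteed: float    # threshold value from the SLA


# Rules keyed by (Agreement Term, Direction) and mapped to a Response Type,
# using the two example tuples given in the text.
RULES = {
    ("energy_usage_per_app", "lt"): "remove_vm",
    ("power_usage_per_app", "lt"): "remove_vm",
}


class BasicRuleEngine:
    """Fires a rule only after `threshold` matching events have been seen."""

    def __init__(self, rules, threshold=3):  # threshold value is an assumption
        self.rules = rules
        self.threshold = threshold
        self.counts = Counter()

    def notify(self, violation) -> Optional[str]:
        key = (violation.agreement_term, violation.operator)
        if key not in self.rules:
            return None  # no rule maps this event to an actuator
        self.counts[key] += 1
        if self.counts[key] < self.threshold:
            return None  # treat as a temporary breach for now
        self.counts[key] = 0  # assumption: start counting afresh after firing
        return self.rules[key]
```

With a threshold of 3, the first two notifications for `power_usage_per_app` return `None` and the third returns `remove_vm`, so a short burst of CPU utilisation that triggers one or two events causes no adaptation.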
The manager is demonstrated further in the use cases documented in deliverables D6.2.2 and D6.3.2.

2.3.1.4 Future Contributions

In the future the Self-Adaptation Manager will be extended to consider aspects such as the economic cost of invoking an actuator and the payback period for invoking change on the system. It will also proactively manage potential future breaches by subscribing to warning notification events from the SLA Manager.

2.3.1.5 Conclusions

The Self-Adaptation Manager is at the heart of deciding upon the adaptive response in the PaaS layer, working in collaboration with the Application Manager and the SLA Manager in order to manage adaptation at runtime. It is capable of performing this in a selective fashion that ensures adaptation only occurs when it is absolutely necessary. As a new component in Year 2, its focus has been on power and energy awareness and on horizontal scaling of applications.

2.3.2 PaaS Energy Modeller

The role of the PaaS Energy Modeller is to provide per-application energy information. Such information includes power consumption information from power measurements collected by the infrastructure layer and events generated by the application and reported by probes.

2.3.2.1 Motivation

The PaaS Energy Modeller provides aggregated measurements of consumption (Wh) and average instant power (W) for each application and its events. This information is required by other components:

• The Pricing Modeller, which needs to know the current consumption to derive the billing information and also to forecast the price change of an application deployment.
• The Self-Adaptation Manager, which needs to learn the current consumption in order to take decisions; these decisions allow the optimization of energy consumption based on information available to the PaaS layer.
• The SLA Manager, to support monitoring of current SLA violations or proactive monitoring to identify future violations of SLAs.
2.3.2.2 Related Work

Modelling energy per application at the PaaS level differs fundamentally from infrastructure energy modelling: it does not take into account physical infrastructure information (server, power meter readings), but uses the following:

• Information about the topology of the virtual machines supporting an application (number of VMs), published by the Application Manager on the message queue when the application configuration changes (e.g. a VM is deployed or destroyed).
• VM instant power measurements from the AMQP interfaces. This data is exchanged between IaaS and PaaS.
• Application event data generated by application-level probes and sent to the Application Monitor.

The PaaS Energy Modeller has the important role of providing information about application behaviour in terms of energy consumption, to allow both developers and Cloud operators to understand the energy impact of their development and deployment choices. This behaviour is unpredictable and is an important challenge [31]. Moreover, the PaaS Energy Modeller accounts for the energy consumption of specific events occurring within an application, thus extending the analysis to the application level.

2.3.2.3 Scientific Contributions

The PaaS Energy Modeller focuses on energy awareness at the application level rather than only at the infrastructure level, allowing the infrastructure to operate under its own optimization policies without interference. In this way, for Y2, ASCETiC develops intra-layer optimization strategies that will pave the way for the inter-layer optimization to be developed in Y3, based on results of the current and past year. Work occurring in Y2 allows the building of novel services for cloud providers and extends the traditional pricing schemes offered by public cloud providers, as analysed in [32]. In order to achieve this objective, the PaaS Energy Modeller cooperates with the Pricing Modeller and the SLA Manager:

• The SLA Manager allows users to define energy-related metrics that impact the application's consumption.
• The Pricing Modeller allows the definition of a pricing scheme for a configuration that takes into account energy consumption and application deployment.
• The PaaS EM itself evaluates consumption information for the application and its events, as well as the forecasted future consumption.

By combining the previously mentioned features, the PaaS layer provides estimates of consumption and costs per application and its events. Moreover, it checks for SLA violations that occur during the application deployment time. The PaaS Energy Modeller supports cloud operators in evaluating how a change in the current configuration, for instance adding a new front-end virtual machine to a currently deployed application, impacts its energy consumption and costs. These services allow cloud operators to evaluate the costs and benefits of configuration changes by learning whether the price of the modified deployment will be acceptable given the increased performance levels.
Power Modelling Experiment

The PaaS Energy Modeller provides an interface to estimate the power and consumption of an application and its events. When invoked, this interface builds an energy model trained with data collected from a running application. The model provides:

• Estimation of future application instant power
• Estimation of future application power consumption

The estimation of instant power uses the historical data collected from the application's consumption. In particular, for each VM within the same deployment, a power model is generated from historical data of <Power,CPU,Memory>. This model has two inputs, <CPU,Memory>, and one output, <Power>. When an estimation is required, the modeller projects the CPU and memory consumption trends into the future and provides these two values to the energy model, which returns the expected power. If the estimation is requested for the whole application, this calculation is applied to each VM within the application deployment and the predicted powers are summed to provide the predicted power for the application as a whole.

Estimation of consumption for an application is based on the previous estimation of future power. For each VM, the EM first estimates the future power value and then uses this value to calculate the energy consumed between the last available power sample from the VM and the future estimated power value. This consumption is then added to the accumulated energy consumption to provide the estimated consumption. As before, if the consumption is computed for the whole application, the estimate is computed for each VM belonging to the application deployment and all values are summed.

During the Y2 tests for modelling energy consumption, a small lab environment was set up to validate an innovative approach to modelling energy, which takes advantage of neural-network-based models. The lab environment consists of one physical machine, a ProLiant BL460 blade. Since there were no power probes in the test lab, power measurements were collected using IPMI tools [62], which allow seamless access to server interfaces implementing the IPMI protocol (such as the HP iLO interface provided by the HP Blade server). To collect the virtual machine's performance metrics (i.e. CPU and memory utilization), a probe was installed in the application container, using the Java Sigar library [63]. Finally, in order to coordinate the collection of power measurements and performance metrics at the same time, the two functions were embedded in a simple Java application that invoked the IPMI tool via scripts and queried the metrics via the Sigar library; the application stores all data in a local database. To generate a meaningful level of resource usage, the Python application CPU-load-generator [64] was installed in the container. The workload application and the probe were deployed in a Docker [65] container built from a Dockerfile. To deploy the container, RancherOS [66], a small operating system dedicated to the deployment of Docker containers, was used.
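Returning to the estimation procedure described at the start of this experiment, the per-VM and per-application calculations can be sketched as follows. This is a minimal illustration, not the Energy Modeller's code. Several details are assumptions because the text does not specify them: the trend is projected linearly from the first and last sample, the energy between the last sample and the predicted power is integrated with the trapezoid rule, and the power model is passed in as a plain function standing in for the trained neural network. All names (`VmState`, `predict_power`, `predict_energy`, `predict_application`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]  # (timestamp in seconds, CPU %, memory %)


def project_trend(samples: List[Sample], t_future: float) -> Tuple[float, float]:
    """Linearly extend the CPU and memory trend to t_future (assumed method)."""
    (t0, cpu0, mem0), (t1, cpu1, mem1) = samples[0], samples[-1]
    frac = (t_future - t1) / (t1 - t0)
    return cpu1 + (cpu1 - cpu0) * frac, mem1 + (mem1 - mem0) * frac


@dataclass
class VmState:
    samples: List[Sample]                   # historical utilisation samples
    last_power_w: float                     # last available power sample (W)
    accumulated_wh: float                   # energy consumed so far (Wh)
    model: Callable[[float, float], float]  # (cpu, memory) -> power in W


def predict_power(vm: VmState, t_future: float) -> float:
    """Feed the projected <CPU,Memory> values to the VM's power model."""
    cpu, mem = project_trend(vm.samples, t_future)
    return vm.model(cpu, mem)


def predict_energy(vm: VmState, t_future: float) -> float:
    """Accumulated Wh plus the energy between the last sample and t_future."""
    t_last = vm.samples[-1][0]
    hours = (t_future - t_last) / 3600.0
    # Trapezoid rule between last measured and predicted power (assumption).
    return vm.accumulated_wh + 0.5 * (vm.last_power_w + predict_power(vm, t_future)) * hours


def predict_application(vms: List[VmState], t_future: float) -> Tuple[float, float]:
    """Sum the per-VM predictions to obtain application-level power and energy."""
    return (sum(predict_power(vm, t_future) for vm in vms),
            sum(predict_energy(vm, t_future) for vm in vms))
```

The application-level figures are plain sums of the per-VM figures, as the text describes; any model that maps <CPU,Memory> to <Power> can be substituted for the `model` callable.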
The Dockerfile used to build the container was:

    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y python python-pip curl git wget default-jre
    RUN git clone https://github.com/beloglazov/cpu-load-generator.git
    RUN ./cpu-load-generator/install-lookbusy.sh
    RUN mkdir sigar
    ADD . /sigar
    CMD python /cpu-load-generator/cpu-load-generator.py -n 1 300 /cpu-load-generator/test.data & cd /sigar && java -Djava.library.path='/sigar/' -jar ascetic.jar

To collect the data set used to train a model, the following steps, shown in Figure 11, were performed:

[a] collect instant power (W) measurements from the server iLO interface
[b] collect performance metrics within the application via the Sigar library (CPU and memory utilization)
[c] inject a workload using CPU-load-generator
[d] store in a database one row for each reading of <POWER,CPU,MEMORY> with its timestamp
[e] train and verify the model
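Steps [a], [b] and [d] amount to a sampling loop that writes one timestamped row per reading. The sketch below shows that loop in Python purely for illustration; the experiment itself used a Java application with the IPMI tool and Sigar. The three reader callables are hypothetical stand-ins for those data sources, and the table name, column names and sampling interval are assumptions.

```python
import sqlite3
import time


def collect(read_power, read_cpu, read_memory, samples, interval_s=1.0, db_path=":memory:"):
    """Store one <POWER,CPU,MEMORY> row per reading, with its timestamp.

    read_power, read_cpu and read_memory are caller-supplied functions
    (hypothetical stand-ins for the IPMI and Sigar probes).
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(ts REAL, power REAL, cpu REAL, memory REAL)"
    )
    for _ in range(samples):
        # Read all three values back to back so they share one timestamp.
        row = (time.time(), read_power(), read_cpu(), read_memory())
        conn.execute("INSERT INTO readings VALUES (?, ?, ?, ?)", row)
        time.sleep(interval_s)
    conn.commit()
    return conn
```

Reading power and utilisation in the same loop iteration keeps each row internally consistent, which matters when the rows are later used as training pairs of inputs <CPU,Memory> and output <Power>.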
Figure 11: Lab Setup

The model is based on a neural network built with Neuroph [67], a Java-based IDE that allows different kinds of neural networks to be created and trained. Neuroph is also available as a Java jar library. In our scenario we trained a Multi-Layer Perceptron network with 2 input neurons (CPU, Memory) and 1 output neuron (Power), using 3 layers. After training the model with 2324 samples of data collected from the application running inside the container, we tested it by providing the same input samples and comparing the power samples estimated by the model with the measured ones. The result is shown in Figure 12. On average, the model's error was 7%.
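The text does not state how the average error was defined. One plausible reading, assumed here, is the mean absolute error relative to the measured power, computed over all tested samples. The function name `mean_relative_error` is hypothetical.

```python
def mean_relative_error(measured, estimated):
    """Mean of |estimated - measured| / measured over paired samples.

    Assumed definition; the source only reports "an average error".
    """
    if not measured or len(measured) != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(e - m) / m for m, e in zip(measured, estimated))
    return total / len(measured)
```

Under this definition a returned value of 0.07 would correspond to a 7% average error. Because the model was tested on the same samples used for training, such a figure measures fit rather than predictive accuracy on unseen data.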
Figure 12: Model testing

The test demonstrated that Neuroph is a good candidate for modelling the energy consumption of an application, in particular by embedding its Java libraries into the Energy Modeller's prediction. The experiment was also a preliminary test of providing power estimation at the VM level without power meters, using only server sensors. To investigate this scenario further, experiments with CPU steal time will be required, in order to allocate the fraction of physical CPUs used by the virtual host as discussed in (A Measurement Study of Server Utilization in Public Clouds, Dec. 2011), and to compare the error with the average error of a model that has access to physical probes. In addition, collecting CPU steal time metrics requires specific hypervisors: the one used, VirtualBox, always reported a CPU steal time of zero (even when running the application inside a VM instead of a container), so further tests with a different virtual infrastructure are required.

2.3.2.4 Future Contributions

During the first year, the Energy Modeller at the PaaS layer focused on measuring the consumption of an application and its virtual machines, while in the second year the forecasting functionality was introduced. In the third year, more tests will be performed to evaluate other energy modelling strategies. In particular, three important contributions will be evaluated and developed during the final year:

• Identifying a training data set
• Profiling the power behaviour of application components (e.g. front end, database)
• Establishing the relationship between events and application performance and consumption
A training data set is required by the EM to initialize its energy models. Without data collected from a deployment, it is impossible to estimate the consumption of an application. A training data set enables the energy model for an application's VMs and events to be trained in the absence of historical data collected from a past deployment. We would like to identify a strategy, used by the Energy Modeller, to create these data sets. One such strategy could be to attach a training set to the application (in the OVF or a repository), allowing the EM to build an initial energy model to support estimation.

Profiling is another feature that we would like to investigate in Y3. By learning how an application component (typically a VM) behaves in terms of resource and energy utilization, we could provide better estimates of application consumption and help identify the impact, in terms of energy consumption, of a change in the application configuration (for example, how much more energy will be consumed by the application if a new front end is added).

Finally, another functionality identified for Y3 is to model application events in terms of their impact and performance, in particular by trying to estimate the resource utilization of a single event and how it affects overall application performance. Such information could then be used during negotiation to define SLA values.

2.3.2.5 Conclusions

In conclusion, during this year and the next, the PaaS Energy Modeller leverages its integration with PaaS components such as the Pricing Modeller, the SLA Manager and the Self-Adaptation Manager to provide novel cloud platform services that go beyond the current paradigm of per-resource pricing schemes.
In this way ASCETiC will allow application developers to create applications with their specific performance needs expressed by negotiable SLAs, while supporting cloud platform providers in delivering competitive pricing schemes that leverage energy-efficient resource allocation strategies at all cloud layers.

2.3.3 PaaS Virtual Machine Contextualizer

The role of the Virtual Machine Contextualizer (VMC) tools within the ASCETiC architecture is to embed software dependencies of a service into a VM image during deployment and configure these dependencies at runtime via an infrastructure-agnostic contextualization mechanism. This agnostic approach enables interoperability between IaaS providers. Additionally, the VMC enables the use of energy probes for gathering VM-level energy performance metrics at the PaaS level. The VMC also supports multi-provider scenarios in the ASCETiC project through its recontextualization mechanism. Recontextualization enables the dynamic reconfiguration of an application's components during migration to a different provider's resources.

2.3.3.1 Motivation

The need for the VMC component is motivated by the requirement to support self-adaptation. Its recontextualization mechanism enables self-adaptation by providing functionality to reconfigure a VM after a migration event to a new execution environment. The component maintains a layer of abstraction between the application and the provider's infrastructure so that the application is unaffected by environmental change.

2.3.3.2 Related Work

The long-lasting deployments of systems such as Cloud services create a need for maintenance and management of the application during the operations phase. Such maintenance may include, for example, applying system security updates across all associated nodes. For small-scale systems this can be done manually, but the process is time-consuming and error-prone. The rapid and automated elasticity of Cloud services further limits the feasibility of manual system management, as VM instances may be added or removed automatically at any time of day. Configuration management is a well-established concept for managing distributed systems during runtime and is not specific to the field of Cloud computing. Aiello et al. [33] define configuration management as: "A management process that focuses on establishing and maintaining consistent system performance and functional attributes using requirements, design and operational information throughout a system's lifecycle." Computing-oriented configuration management tools such as CFEngine, Puppet and Chef are commonly used in large-scale hosting on physical platforms. These tools provide a number of benefits, including: the reproducibility and automation of software configuration across an unlimited number of (virtual) machines; continuous vigilance over running systems with automated repairs and alert mechanisms; enhanced control over and rationalisation of large-scale deployments; and the ability to build up a knowledge base to document and trace the history of a system as it evolves. The CFEngine [34] project provides automated configuration management of large networked systems. CFEngine can be deployed to manage many different types of computer system, such as servers, desktops and mobile/embedded devices.
The project was started in 1993 by Mark Burgess at Oslo University as a way to automate the management of dissimilar Unix workstations. Burgess's work developed the foundations of self-healing systems and, as a precursor, heavily influenced the ideas of Autonomic Computing developed later by IBM. Puppet [35] is a configuration management system originally inspired by CFEngine. Puppet provides graph-based and model-driven approaches to configuration management through a simplified declarative domain-specific language that was designed to be human readable. The model-driven solution enables the configuration of software components as a class, a collection of related resources where a resource is a unit of configuration. Resources can be compiled into a catalogue that defines resource dependencies using a directed acyclic graph. A catalogue can be applied to a given system to configure it. Chef [36] arose from the Ruby-on-Rails community out of dissatisfaction with Puppet's non-deterministic graph-based ordering of resources. In contrast to Puppet, Chef places emphasis on starting up services from newly provisioned clean systems, where the sequence and execution of configuration tasks is fixed and known by the user. This makes Chef particularly well suited to the paradigm of Cloud computing, where VM instances are short-lived and new instances are spawned from a newly provisioned base image. Chef uses the analogy of cooking and creates recipes, which are bundles of installation steps or scripts to be executed.

Although existing configuration management tools are well suited to Cloud computing, they do not resolve all the issues surrounding the configuration of an application in a dynamic environment. Most notably, these systems operate at the application level for automated management of runtime reconfigurations across a large number of system nodes. Contextualization operates at a lower layer of the system stack. It offers a multi-purpose mechanism for adapting a generically configured VM to a specific and dynamically assigned execution environment. For example, the settings required to connect a booting VM to a VPN can be supplied by the infrastructure at boot time. Configuration management tools and contextualization are complementary techniques for dynamic reconfiguration. Contextualization deals with lower-level IP specifics such as network configuration and platform-specific settings, while configuration management can be used to manage updates at the application level. Recontextualization takes this one step further and can be used to adapt to system change, including making newly migrated VMs operate properly in the (potentially different) system environment of a new host. Recontextualization is the autonomous updating of configuration for individual components of an application and its supporting software stack during runtime, for the purpose of adapting to a new environment.
2.3.3.3 Contributions

Although the VMC component is no longer a major state-of-the-art (SotA) scientific contribution in itself, it enables the self-adaptation of the PaaS layer (in the Application Manager) and the IaaS layer (in the Virtual Machine Manager) through its contextualization and recontextualization functionality. These two mechanisms provide facilities to continue reporting energy metrics while the underlying infrastructure of a VM changes due to self-adaptation. One example of this support for self-adaptation is enabling an application to scale horizontally through the contextualization of worker nodes within a head node. Another example is the continuous and uninterrupted reporting of VM-level probes during a migration event from one host to another. The remaining content of this section provides scientific insight into the technical capabilities of the VMC as part of its recontextualization functionality.

To assess the VMC, several experiments were performed to evaluate its performance and feasibility. For all tests, Libvirt version 0.9.9 was used to monitor and manage the VMs. QEMU-KVM version 1.0.50 and Xen version 4.0.0 were used as hypervisors, both running on the same hardware using CentOS 5.5 (final) with kernel version 2.6.32.24. The hosts used in these tests are on the same subnet, have shared storage, and each comprise a quad-core Intel Xeon X3430 CPU @ 2.40GHz, 4GB DDR3 @ 1333MHz, a 1GBit NIC and a 250GB 7200RPM WD RE3 HDD.
Figure 13: Response time of concurrent user requests to generate ISO images

To confirm the validity of our contextualization approach in such a use case, we created and tested a prototype of our contextualization tools on our Cloud testbed, using a dual-CPU (Intel Xeon E5630) server with 16 GB of RAM and a 1 TByte WD SATA 7200 rpm HDD. Figure 13 and Figure 14 provide evidence of the potential performance of our approach to contextualization with regard to preparing VM images with sizes in the range of 1-5 GByte in increments of 1 GByte, and with varying numbers of concurrent user requests from 10 to 100 in increments of 10, to create ISO CD images containing 1 MByte of context data. The results show adequate scalability and response time over 10 iterations of the experiment with minimal variance, as shown by the error bars on the graphs.

Figure 14: Time to prepare a VM image.
Figure 15: Time measurements of recontextualization.

The results of the evaluation are shown in Figure 15. The first set of bars illustrates the time to migrate a VM from one host to another with recontextualization running and context data attached, and the second set illustrates the same migrations with recontextualization turned off and no virtual devices mounted. The third set illustrates the time spent within the recontextualizer software during the tests from the first set, measured from when the migration event was received in the recontextualizer until the device had been removed and reattached. The values shown are averages over ten runs, and all bars have error bars showing the (marginal) standard deviations, which are all in the 0.03 to 0.07 s range. Based on the evaluation we conclude that the recontextualization process adds an overhead of about 18% with either hypervisor compared to normal migrations. For KVM, most of the extra time required for recontextualization is spent outside the bounds of our component, and is likely associated with processing events and the extra overhead imposed by preparing a migration with virtual devices attached. In the case of Xen, the device management functionality in Libvirt proved unreliable, and we therefore had to bypass the Libvirt API and rely on sub-process calls from the recontextualizer to Xen using the xm utility. This workaround increased the time needed for recontextualization in the Xen case.

Figure 16: Breakdown of time spent during recontextualization.
There are four major phases associated with the recontextualization process. First, information about the VM corresponding to the event is resolved using Libvirt when the migration event is received. In the second phase, any current virtual contextualization device is identified and detached. Third, new contextualization information is prepared and bundled into a virtual device (ISO9660) image. Finally, the new virtual device is attached to the VM. A detailed breakdown of the time spent in the different phases of recontextualization is presented in Figure 16. The above-mentioned workaround for Xen interactions affects the second and fourth phases (detaching and attaching of devices), most likely increasing the time required for processing. In the first and third phases Xen requires significantly more time than KVM, despite the VMs being managed using the same calls in the Libvirt API. This indicates performance flaws either in the link between Libvirt and Xen or in the core of Xen itself.

2.3.3.1 Conclusion

To conclude, this section of the deliverable has reported on the motivation, related work and contributions of the VMC. The VMC embeds the software dependencies of a service into a VM image during deployment and configures these dependencies at runtime via an infrastructure-agnostic contextualization mechanism. This approach enables interoperability between providers and enables the use of energy probes for gathering VM-level energy performance metrics at the PaaS level. The VMC also supports multi-provider scenarios in the ASCETiC project through its recontextualization mechanism. Finally, a feasibility study evaluates the performance of the component from a scientific perspective, validating the mechanisms implemented.

2.3.4 PaaS Pricing Modeller

2.3.4.1 Motivation

The goal of the PaaS Pricing Modeller is to provide energy-aware cost estimation related to the operation of applications employing VMs on a specific IaaS provider.
Since the IaaS Pricing Modeller (see section 2.4.3.3) includes pricing schemes which incorporate energy costs explicitly, the applications using such IaaS providers must be informed of their estimated energy costs prior to deployment and during operation. Such estimation is important because it facilitates cost comparisons between potential IaaS providers prior to deployment, hence fostering competition across IaaS providers. Through this cost estimation functionality and cost-aware provider selection by the PaaS SLA Manager, the PaaS layer essentially acts as a broker of the available Cloud infrastructures for executing applications.

2.3.4.2 Related Work

Depending on whether or not the platform layer is conceived as a separate stakeholder standing between the infrastructure and the applications, two different cases can be identified:
- Full information exchange between the PaaS and IaaS layers is allowed.
- Partial or even no information is passed from the IaaS to the PaaS layer.
The literature on pricing platforms for Cloud services deals exclusively with the full information case. The work in [39] introduces a pricing mechanism for grid computing, with the aim of showing how a broker can accept the most appropriate jobs to be computed on time and on budget. Paw et al. [40] propose a Cloud broker service (STRATOS) which facilitates the deployment and runtime management of Cloud application topologies using Cloud services sourced on the fly from multiple providers, based on requirements specified as higher-level objectives. In [41] a new type of service is considered, where video-on-demand providers make reservations for bandwidth guarantees from the Cloud at negotiable prices to support continuous media streaming. We note that none of these research efforts considers the possibility of passing energy-related pricing information to the SaaS/User layer, as done by the PaaS Pricing Modeller.

If the IaaS provision and PaaS provision roles are played by different stakeholders, so that full exchange of information is no longer justified, information asymmetry exists between the IaaS and PaaS layers. Information asymmetry models, which examine what happens when one party to an interaction has relevant information that the others do not, are considered in [38]. A relevant aspect is that the PaaS provider has the role of an intermediary in a two-sided market: the market for Cloud services (between infrastructure and applications) on one side and the market for user services (between applications and their users) on the other. The two markets are not independent, as users also select applications on the basis of perceived performance, which depends on the Cloud infrastructure selected by each application. The recognition of this interdependence has altered the business model followed by intermediaries and has recently attracted a lot of attention in the economics literature; see, e.g., [37].
2.3.4.3 Scientific Contributions

The main functionalities offered by the PaaS Pricing Modeller are to:
- Predict and calculate the payments incurred by an application (or a component of it) based on the different pricing schemes employed by IaaS providers, under different deployment cases.
- Predict and calculate specifically the payments due to the energy consumption of an application (or a component of it) when such tariffs are employed by IaaS providers. (If the latter do not charge for energy consumption explicitly, the information on energy costs is of no use to applications.)

The prices the PaaS Pricing Modeller computes are necessarily based on the pricing schemes of the underlying IaaS providers. Whether or not the PaaS and IaaS providers represent separate stakeholders, the PaaS Pricing Modeller should be sufficiently flexible to allow the expression of the different business goals in each case. In general, the announced prices may depend on:
- The tariffs employed by IaaS providers.
- Historical information and/or forecasts related to the different tariffs employed by IaaS providers, under different deployment scenarios.
- The specific PaaS provider business goals.
Currently, the PaaS Pricing Modeller is designed not to actively pursue profit maximization. (This will be the case under competition, where profits are marginal, but it is not the case in monopolistic or oligopolistic situations.) Thus the prices announced to applications correspond to the total cost of running the application's VMs on a specific infrastructure. Hence the pricing modeller currently acts as an aggregator of the costs incurred across all VMs pertaining to the same application. This is also true for prediction: price estimation involves the aggregation of the price estimates for each VM of the same application.

As previously described, a key feature of the PaaS Pricing Modeller is that it facilitates competition between IaaS providers, because its brokering functionality allows applications to select the most cost-effective infrastructures. The main results of such competition are the following:
- Applications have the incentive to use the service of the PaaS Pricing Modeller.
- Competition favours IaaS providers which incorporate energy consumption charges, as the IaaS Pricing Modeller does.

That is, both IaaS providers and applications have the incentive to adopt the respective layers of the ASCETiC architecture, while each pursues its own business goals. Thus, the PaaS (and IaaS) Pricing Modeller provides the economic incentives for adopting the ASCETiC architecture. In the rest of this section, we use a theoretical analysis to verify that competition indeed achieves the results outlined above.

In order to analyse the effect of such competition, one is required to model the economic decisions of all relevant stakeholders, i.e., the IaaS providers, application owners and their users, as the end result depends on the complex interaction of their actions. A microeconomic model which incorporates the decisions of these stakeholders is introduced in section 2.4.3.3, which describes the IaaS Pricing Modeller.
(We advise the reader to first consult section 2.4.3.3 and then continue with the present section.) For ease of reference, the following quantities are defined in section 2.4.3.3:

| Variable | Explanation |
|---|---|
| $v_i$ | Number of VMs belonging to application $i$ in the infrastructure |
| $r_i$ | Reward per executed instruction accrued to application $i$ |
| $\lambda_i(v_i)$ | Demand (instructions/s) of application $i$ when $v_i$ VMs are used |
| $\pi_0$ | Static price charged per VM per unit of time |
| $\pi_1$ | Price charged for each watt consumed by a VM |
| $P_i(v_i)$ | Power consumption of application $i$ when $v_i$ VMs are used |
| $\pi_e$ | Price charged for each watt by the energy provider |
| $c(m)$ | Maintenance cost of $m$ active physical servers. It is assumed $c(m) = cm$, for some constant $c$. |
| $\rho$ | Consolidation degree, defined as $\rho = \sum_i v_i / m$ |

More specifically, we prove the claim that, for an IaaS provider, charging for VM energy in addition to a flat fee per VM, as done by the two-part tariff in section 2.4.3.3, is optimal under competition: the provider cannot increase the net benefit of its applications without suffering losses (i.e., negative profits). Using the model notation, the aggregate net benefit across applications is

$$\sum_i \left[ r_i \lambda_i(v_i) - \pi_0 v_i - \pi_1 P_i(v_i) \right].$$

Under ideal competition without entry costs, no IaaS provider will be able to make strictly positive profits, because in that case it will be left without demand. (The demand will be attracted by other providers with a smaller, albeit nonzero, profit margin.) Thus competitive providers will necessarily barely cover their costs, i.e.,

$$\pi_0 \sum_i v_i + \pi_1 \sum_i P_i(v_i) = \pi_e \sum_i P_i(v_i) + c \, \frac{\sum_i v_i}{\rho}.$$

Hence the aggregate net benefit across applications is

$$\sum_i \left[ r_i \lambda_i(v_i) - \pi_0 v_i - \pi_1 P_i(v_i) \right] = \sum_i \left[ r_i \lambda_i(v_i) - \pi_e P_i(v_i) - c \, \frac{v_i}{\rho} \right] \le \sum_i \left[ r_i \lambda_i(v_i^*) - \pi_e P_i(v_i^*) - c \, \frac{v_i^*}{\rho} \right],$$

where $v_i^*$ is the demand if application $i$ itself owned the infrastructure, i.e., $v_i^*$ maximizes $r_i \lambda_i(v_i) - \pi_e P_i(v_i) - c \, v_i / \rho$. Hence the maximum aggregate net benefit while profits are exactly zero is achieved when the IaaS provider employs a two-part tariff $(\pi_0, \pi_1)$, similar to that of the IaaS Pricing Modeller, with $\pi_0 = c/\rho$ and $\pi_1 = \pi_e$. In this tariff, the true energy price is passed on to the application, while the flat fee covers the per-server maintenance costs.

Now let us compare a provider with a two-part tariff as above against any other provider offering application $i$ a net benefit strictly greater than $r_i \lambda_i(v_i^*) - \pi_e P_i(v_i^*) - c \, v_i^* / \rho$. Since the first provider achieves the maximum aggregate net benefit, the second provider will necessarily provide a net benefit to some other application $j$ which is lower than $r_j \lambda_j(v_j^*) - \pi_e P_j(v_j^*) - c \, v_j^* / \rho$. (Otherwise, the aggregate net benefit of the second provider would exceed the maximum possible value.) Thus the second provider will not be competitive for application $j$.
Hence the optimal pricing scheme under competition is to use the above two-part tariff. As a numerical example in the next section shows, an IaaS provider employing the static pricing scheme described in section 2.4.3.3 cannot be competitive for any choice of static price. The above shows that IaaS providers that incorporate energy charges in a two-part tariff are the most competitive. It is self-evident that applications have the incentive to use the functionality of the PaaS Pricing Modeller, since by doing so they can select providers offering a higher net benefit.
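The zero-profit property of the two-part tariff, together with the per-VM aggregation performed by the PaaS Pricing Modeller, can be illustrated with a short sketch. This is not the project's implementation: the class `VmUsage`, the function names and the numbers in the usage note are hypothetical, and power is treated as a constant average draw per VM purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class VmUsage:
    """Hypothetical per-VM record: average power draw in watts."""
    power_w: float


def competitive_two_part_tariff(pi_e: float, c: float, rho: float) -> Tuple[float, float]:
    """Return (pi_0, pi_1) with pi_0 = c / rho and pi_1 = pi_e."""
    if rho <= 0:
        raise ValueError("consolidation degree rho must be positive")
    return c / rho, pi_e


def application_charge(vms: Sequence[VmUsage], pi_0: float, pi_1: float) -> float:
    """Aggregate the per-time-unit charge over all VMs of one application."""
    return sum(pi_0 + pi_1 * vm.power_w for vm in vms)


def provider_profit(apps: List[List[VmUsage]], pi_0: float, pi_1: float,
                    pi_e: float, c: float, rho: float) -> float:
    """Revenue minus energy and maintenance cost, using m = sum(v_i) / rho servers."""
    n_vms = sum(len(vms) for vms in apps)
    total_power = sum(vm.power_w for vms in apps for vm in vms)
    revenue = sum(application_charge(vms, pi_0, pi_1) for vms in apps)
    cost = pi_e * total_power + c * n_vms / rho
    return revenue - cost
```

For instance, with the illustrative values `pi_e=0.285`, `c=2.0` and `rho=10`, calling `competitive_two_part_tariff` and then `provider_profit` on any set of applications yields a profit of zero up to floating-point rounding, which is the break-even condition used in the derivation above.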
Evaluation

Figure 17: Comparison of payments by two applications to IaaS providers as a function of application QoS diversity. IaaS providers employing static pricing are not competitive because they require higher payments for at least one application.

As an illustration of the competition between IaaS providers and the effect of the pricing scheme, we consider an example which examines the net benefit of two applications as a function of their diversity. We assume that the users of each application do not tolerate average request response delays above some value, which is specific to that application. Figure 17 depicts the payments per time unit incurred by each application under two different pricing schemes: i) the static price scheme described in section 2.4.3.3, which does not take energy consumption into account, and ii) the two-part tariff described in section 2.4.3.3, which incorporates energy consumption. The parameter values used are $R_1 = R_2 = 20$, $r_1 = r_2 = 1.5$, $\rho = 10$, $\pi_e = 0.285$, $p_0 = 10$, $p_1 = 5$, $\lambda_1^{\max} = \lambda_2^{\max} = 50$, $\mu = 50$, $c = 0$. The price parameters of each scheme are chosen under the assumption of ideal competition, i.e., as described in the previous section. The horizontal axis represents the maximum tolerable delay for users of application 1 (normalized to that of application 2). For stringent delay requirements, when the maximum tolerable delay is less than 0.3, application 1 does not use the provider with static pricing at all, since the high costs outweigh the benefits. That provider hosts application 2 only, at a competitive price. When the delay requirements of application 1 are less stringent, the demand rises and application 1 starts using the static provider, but at a cost which is not competitive: the payments of application 1 exceed those required by the provider employing a two-part tariff.
As applications become less diverse (i.e., maximum tolerable delay close to 1), the two providers are almost equally attractive, although the provider offering the two-part tariff is slightly more so.
For values of the maximum tolerable delay above 1, the less tolerant users now belong to application 2, and they bear most of the costs with both providers. Nevertheless, the static provider continues not to be competitive, as the payments it requires from application 2 exceed those required by the provider employing the two-part tariff.

Figure 18: Aggregate net benefit for the cases of a two-part tariff incorporating energy charges, and a static pricing scheme. In competitive markets, applications always benefit the most under the two-part tariff.

In Figure 18 the aggregate net benefit over all applications is depicted for the two-part tariff and the static pricing scheme. The overall net benefit under static pricing may decrease if some applications have stringent delay requirements. The example suggests that applications with more stringent delay requirements benefit from the PaaS Pricing Modeller, since through it they may select providers which incorporate energy consumption into their pricing schemes, as the IaaS Pricing Modeller does. At the same time, IaaS providers of this type are favoured by the competition, since they are able to offer the most cost-effective services to their applications. Thus, IaaS providers also benefit from the introduction of a PaaS Pricing Modeller that fosters competition.

2.3.4.4 Future Contributions

During the third year of the project, inter-layer communication will be enabled. We intend to evaluate the economic incentives for applications to adopt an energy-aware SaaS layer, such as the one in the ASCETiC architecture. In particular, we will explore the effect of provider selection through energy-aware benchmarking prior to application deployment, as well as the effect of energy-aware task scheduling within the application during operation. As is the case with the other two layers, we expect such a SaaS layer to lead to more efficient operating points in spite of the opposing business goals of the different stakeholders.
We intend to test those hypotheses under more realistic market scenarios, i.e., under non-ideal competition and in oligopolistic situations.
2.3.4.5 Conclusions

In year 1 of the project, the component supported rudimentary functionality, which mainly consisted of the ability to calculate a simple mark-up on the price announced by the IaaS provider. Such a case makes sense when the PaaS layer serves a single IaaS provider, which happens, for example, if the platform and cloud provision are under the auspices of a single stakeholder. During the second year, we focused on the prediction and calculation of the application cost when multiple IaaS providers may exist, each offering multiple pricing schemes. This is of interest mainly when the IaaS and PaaS provider roles are played by different stakeholders. In this case, the PaaS provider acts as a broker of cloud infrastructure and fosters competition between IaaS providers. As shown by a theoretical analysis and numerical examples, the PaaS Pricing Modeller creates the economic incentives for adoption of the IaaS and PaaS layers of the ASCETiC architecture by IaaS providers and applications respectively.

2.3.5 PaaS SLA Manager

2.3.5.1 Motivation

The PaaS SLA Manager (SLAM) negotiates PaaS-level SLA terms with the upper layers. It also negotiates with multiple IaaS SLAMs, in order to select the best one to host the application based on the IaaS terms specified in the SLA proposal. On one side, it allows the upper (User / SaaS) layer to negotiate the specific guarantee terms that the application can ensure; on the other side, the PaaS SLA Manager negotiates with its IaaS counterparts, from different providers, to collect the resources needed to fulfil the plan. The PaaS SLAM also supports SLA monitoring, to enable enforcement policies. However, the enforcement is performed by the Self Adaptation Manager component and not by the SLAM itself.

2.3.5.2 Related Work

PaaS layer negotiation is based on energy and application SLA terms: these terms are essential to describe different deployment needs.
Moreover, the SLA Manager needs to be capable of negotiating with multiple IaaS providers, comparing multiple offers and selecting the best one. For this purpose, the Contrail SLA manager components [54] have been re-engineered and extended to satisfy these requirements, thus becoming the baseline for the ASCETiC SLA management capability at both the IaaS and PaaS levels. The extension of the original work added support for application and energy terms and for OVF resources to be included in the PaaS negotiation. More detailed information regarding SLA negotiation can be found in D2.1.2 ASCETiC Requirement Specification-v2 SoTA.

2.3.5.3 Contributions

The novel capabilities of the PaaS SLA Manager are:

- Extensible SLA terms: existing SLA terms have been extended to support negotiation of energy terms and performance terms at PaaS level. To ensure a proper negotiation at PaaS level, some of the terms (such as power_usage_per_app or energy_usage_per_app) must be translated into the corresponding IaaS terms (in this case, power_usage_per_vm and energy_usage_per_vm) and negotiated at IaaS level. This can be accomplished in different ways: for example, the PaaS value can simply be divided by the number of virtual machines to obtain the same value for each VM; a more sophisticated approach takes into account the number of cores of each VM and weights the values accordingly; furthermore, cpu_speed is another value that can be analysed to properly distribute the PaaS value between the VMs. Different approaches could also be specified by the user and taken from the OVF.

- Management of multiple IaaS providers: the PaaS SLA Manager is capable of handling negotiation with multiple providers and selecting the best offers that fit the negotiated user SLA. This is accomplished through integration with the Provider Registry component, which indicates the endpoints of the providers. To ensure an efficient negotiation, it is desirable to add a filter that limits the number of providers to negotiate with and chooses those most likely to respond positively to each request. This can be based on previous experience, which could be maintained by the Provider Registry itself (by updating each provider's history after every negotiation in which it is involved). In this way it is possible to determine in advance whether a certain provider can support a certain service, on the basis of its offered services and their characteristics.

- Selection based on different criteria: the ASCETiC release implements a sophisticated logic for selecting the best offer. In particular, the decision algorithm for choosing between IaaS provider offers is configurable and also based on energy information. The selection process takes several steps:

1. Normalization. This represents the relationship between the request and the offer. The guarantee terms of the offer are normalized against the guarantee terms of the proposal, simply by dividing the value of the offer guarantee term by the value of the proposal guarantee term.

2. Weighting. User preferences can be expressed at the beginning of the negotiation and injected as optional information in the SLA proposal. Each criterion is a pair {Guarantee Term, Weight}, where Weight is a real number in the range [0,1]. The closer the Weight is to 1, the more relevant the related Guarantee Term is for the user. Once the user criteria are extracted from the SLA template proposal, the normalized values are weighted using the criteria weights, simply by multiplying the normalized term value by the associated weight.

3. Evaluation. An evaluation algorithm is applied to the set of weighted values obtained in the previous step.

4. Ordering. The SLA offers are ordered according to the output values of the previous evaluation step.
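Steps 1 to 4 can be sketched as follows. This is a minimal illustration rather than the SLA Manager's code: the function names, the default weight of 1.0, the plain-sum evaluation function and the convention that a higher score ranks first are all assumptions made for the example, and the sketch does not address terms for which a lower value is preferable.

```python
from typing import Callable, Dict, List

Terms = Dict[str, float]  # guarantee term name -> numeric value


def normalize(offer: Terms, proposal: Terms) -> Terms:
    """Step 1: divide each offered guarantee term by the proposed value."""
    normalized = {}
    for name, requested in proposal.items():
        if name not in offer:
            continue  # term not covered by this offer
        if requested == 0:
            raise ValueError(f"cannot normalize against zero for term {name!r}")
        normalized[name] = offer[name] / requested
    return normalized


def weight(normalized: Terms, criteria: Dict[str, float], default: float = 1.0) -> Terms:
    """Step 2: multiply each normalized term by its weight in [0, 1]."""
    weighted = {}
    for name, value in normalized.items():
        w = criteria.get(name, default)  # default weight is an assumption
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"weight for {name!r} must lie in [0, 1]")
        weighted[name] = value * w
    return weighted


def sum_of_terms(weighted: Terms) -> float:
    """Step 3 (placeholder): an assumed evaluation that sums the weighted values."""
    return sum(weighted.values())


def rank_offers(offers: Dict[str, Terms], proposal: Terms, criteria: Dict[str, float],
                evaluate: Callable[[Terms], float] = sum_of_terms) -> List[str]:
    """Step 4: order offer identifiers by score, best (highest) first."""
    scores = {oid: evaluate(weight(normalize(terms, proposal), criteria))
              for oid, terms in offers.items()}
    return sorted(scores, key=lambda oid: scores[oid], reverse=True)
```

Keeping the evaluation as a pluggable function mirrors the configurable decision algorithm described above: a price-only evaluation or a distance-based one can be passed as `evaluate` without changing the other steps.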
5. Filtering. The ordered list of SLA offers may be filtered so that only a maximum number of SLA offers are returned to the Federation.

It is possible to choose between a price-based approach, which considers only the price in the sorting algorithm, and an approach called Max Average Virtual System Distance. In the latter algorithm, an "average" Virtual System is computed for each offer. The algorithm calculates the score as the average of the distances from the origin of all the Virtual Systems in the same SLA offer, in a multi-dimensional space with one dimension for each Guarantee Term. There is a one-to-one correspondence between points and SLA offers. The score $d_j$, $j = 1, \dots,$ number of offers, of each point is calculated as

$$d_j = \frac{1}{k} \sum_{s=1}^{k} \sqrt{\sum_{i=1}^{t} W_{is}^{2}},$$

where $k$ is the number of Virtual Systems, $t$ is the number of Guarantee Terms for each Virtual System and $W_{is}$ is the weighted value of the $i$-th term for the $s$-th Virtual System (weights can be specified in the SLA template; otherwise default values are stored in the configuration). The farthest point from the origin represents the best SLA offer. This algorithm works well when all the terms are numeric; otherwise a different approach must be considered.

- Monitoring capabilities: the PaaS SLAM supports SLA monitoring to enable enforcement policies. In this way, it is possible for ASCETiC to react to SLA violations and to execute appropriate remediation actions, so that the violated term returns below the negotiated threshold as soon as possible. The PaaS SLAM subscribes to the event queue and waits for application events. When an Application Starting Event is retrieved, the SLAM needs to recover all of the information about this application from the Application Manager. Once this information is recovered, the PaaS SLAM asks the Application Monitor to initiate monitoring of a given SLA (related to the application) and a given set of terms, and receives the id of the queue on which term measurements will be published. The PaaS SLAM instantiates a subcomponent (the PaaS Violation Checker), which subscribes to this queue (and to the application events queue), retrieves measurements and compares them with the thresholds. Whenever a violation is identified, a Violation Notification Event is published on a given queue. The monitoring process terminates when an Application Terminated Event is retrieved from the event queue.

2.3.5.4 Conclusions

The PaaS SLAM finds a trade-off between the conflicting requirements of resource performance and energy savings. In particular, thanks to the ASCETiC energy information and the negotiated terms, it is possible to find the best compromise between performance and energy saving. The PaaS SLAM can manage multiple IaaS providers and implements a sophisticated logic for selecting the best offer. The PaaS SLAM supports SLA monitoring, to enable enforcement policies. Future releases will not only monitor SLA violations but will also use threshold monitoring to inform the PaaS layer before a violation occurs, in order to enable proactive adaptation.

2.4 IaaS Layer

2.4.1 Virtual Machine Manager

2.4.1.1 Motivation

The Virtual Machine Manager (VMM) is the component responsible for deploying virtual machines. The VMM also provides an API that can be used to shut down, reboot, destroy, etc. virtual machines. The main functionality of the VMM is to deploy virtual machines according to a policy specified by the owner of the infrastructure.
Several policies have been included in the VMM:
- Energy-aware: deploys the VMs on the hosts where they consume the least energy, according to the models and predictions from the IaaS Energy Modeller.
- Price-aware: deploys the VMs on the hosts with the lowest price. The cost of deploying a virtual machine on a specific host is provided by the Pricing Modeller.
- Distribution: distributes the virtual machines so as to maximize the number of servers used. When using this algorithm, if two scenarios use the same number of servers, the one where the load of each server is more balanced is considered to be better. This policy is not energy-saving but aims to maximize performance, and hence can be used for comparison with the other policies.
- Consolidation: distributes the virtual machines in a way that minimizes the number of servers being used.
- Group by application: tries to group VMs that are part of the same application on the same host. This can be useful to maximize data locality and reduce the communication between different hosts.
- Random: deploys the VMs randomly. This policy is useful as a baseline against which to compare the effectiveness of the other policies.

The reason for providing several policies is simple: each policy has its own advantages and disadvantages. For example, when the consolidation policy is chosen, the energy consumption will be lower than when the distribution policy is used. However, there is a cost: the performance of the virtual machines is likely to be degraded when using the consolidation policy.

In order to apply the scheduling policies described, the VMM needs to interact with other ASCETiC components:
- Energy Modeller: the VMM interacts with the Energy Modeller to request predictions of the energy consumption that a virtual machine would have on a particular host.
- Pricing Modeller: the VMM interacts with the Pricing Modeller to request predictions of the cost of deploying a virtual machine on a specific host.
- Infrastructure Monitor: the VMM interacts with the Infrastructure Monitor to obtain the load of each host of the cluster.

Apart from the scheduling of virtual machines using the policies described above, the VMM offers other functionalities:
- It can manage the life-cycle of virtual machines. This means that, using the API that the VMM provides, it is possible to reboot, shut down, suspend, restart and destroy virtual machines. It is also possible to query the state of each deployed virtual machine at any moment and retrieve information about it: CPUs, RAM and disk reserved, its IP address, the host where it is deployed, its creation date and time, etc.
- The VMM can also be used to manage the images from which virtual machines are instantiated.
Specifically, using the VMM it is possible to retrieve the information of all the images that have been registered, delete them and upload new ones from public URLs or from a local URI accessible by the VMM.
- Using the VMM it is possible to retrieve information about the hosts available in the cluster where it is operating. It is possible to check the capacity of the hosts in terms of CPUs, RAM and disk, as well as their current load. The VMM is also able to check the power consumption of each of the hosts at any given time.
- Finally, the VMM can be used to calculate price and energy estimates. Given the characteristics of a virtual machine or a set of virtual machines, the VMM is able to calculate what their power consumption would be, as well as the price associated with that consumption.

2.4.1.2 Related Work

The scheduling of VMs on physical hosts has been one of the keys to implementing self-adaptation at the IaaS level during the second year of the project.
The problem of scheduling n virtual machines on m physical hosts is NP-hard [54]. Several authors have tackled this problem from different angles. There are different metrics that can be optimized (energy, performance, price, etc.) and there are many techniques and algorithms that can be applied to find an acceptable solution in a reasonable amount of time. Here are some examples:
- In [44] the VM placement problem is characterized as a bin packing problem. The authors compare the effectiveness of several construction heuristics such as first fit, best fit, etc. and try to minimize the number of active hosts in their infrastructure to save energy.
- In [45], the author discusses some of the approaches used when tackling the problem of scheduling VMs, such as bin packing, constraint programming and genetic algorithms.
- In [46], the authors propose fuzzy logic techniques combined with genetic algorithms. They try to minimize a function that takes into account three different objectives: minimizing energy consumption, minimizing temperature peaks in the servers and minimizing the wasted resources in the servers.
- In [47], the authors try to minimize a function that takes into account three different objectives: number of idle servers, network cost and number of VM migrations.
- In [48], the researchers approach the VM placement problem as a multiple multidimensional knapsack problem. Their objective is to maximize the VM placement ratio. This differs from our approach in the ASCETiC project, because they do not focus on saving energy. The paper discusses two algorithms based on the knapsack problem and compares them against typical approaches to the bin-packing problem such as first fit, first fit decreasing, etc. The authors argue that those bin-packing algorithms tend to leave more free space on the servers and, as a consequence, are less effective when the goal is to maximize the VM placement ratio.

The previous approaches do not mention self-adaptation at operation time.
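Several of the works above use first fit decreasing as a bin-packing baseline for VM placement. As a point of reference, the heuristic can be sketched as follows. The sketch assumes a single resource dimension and identical host capacities, which is a simplification; the function and parameter names are hypothetical.

```python
from typing import Dict, List


def first_fit_decreasing(vm_demands: Dict[str, float], host_capacity: float) -> List[List[str]]:
    """Place VMs on as few identical hosts as the heuristic finds.

    VMs are sorted by decreasing demand, and each one is placed on the
    first already-open host with enough remaining capacity; a new host
    is opened only when none fits.
    """
    hosts: List[List[str]] = []   # VM names per host
    free: List[float] = []        # remaining capacity per host
    ordered = sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True)
    for vm, demand in ordered:
        if demand > host_capacity:
            raise ValueError(f"VM {vm!r} does not fit on any host")
        for idx, remaining in enumerate(free):
            if demand <= remaining:
                hosts[idx].append(vm)
                free[idx] -= demand
                break
        else:
            # No open host can take this VM: open a new one.
            hosts.append([vm])
            free.append(host_capacity - demand)
    return hosts
```

Minimizing the number of open hosts in this way corresponds to a consolidation objective. It does not by itself account for heterogeneous hosts, multiple resource dimensions or VMs that are already running.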
Their focus is on VM scheduling at deployment time. Next, we present some publications that discuss the use of VM scheduling policies to adapt Clouds at operation time:

- In [49], the authors discuss OpenStack Neat [50], a tool that they have developed. The tool is based on a previous publication by the same authors [60]. As the name indicates, OpenStack Neat is a tool that can be used along with OpenStack. The aim of the proposed software component is to save energy by applying a VM consolidation algorithm. To evaluate the algorithm that they propose, they run their experiments using real-world workload traces from PlanetLab [51]. Their algorithm can be divided into 4 steps:
1) Host underload detection. They detect hosts with low utilization and check whether the VMs that they are hosting can be migrated, so that the host can be put into a suspended state to save energy.
2) Host overload detection. This step consists of detecting hosts with high utilization and then migrating a few VMs until the capacity of the host is no longer exceeded. The reason for doing this is that, when a host is overloaded, the performance of the VMs running on that host suffers.
3) VM selection. In this step, the algorithm selects the VMs that should be migrated when a host is overloaded.
4) VM placement. In this step, the algorithm selects the host to which the VMs selected in the previous step should be migrated. The authors view the VM placement problem as a bin-packing problem and propose a selection algorithm based on the best fit decreasing approach.

- The Snooze project [52][53][61] proposes a VM management system that offers relocation of VMs at operation time. Their approach is based on the ant-colony optimization heuristic.

- In [57], the authors propose a self-adapting system that tries to minimize the overall number of used servers and the number of VM migrations, as they consider migration to be an expensive operation. They compare the VM placement problem with the bin-packing problem and point out an important difference: in the initial state of the bin-packing problem, the items are not inside the bins, whereas in the VM placement problem, when rebalancing, the VMs are already running on hosts. In their placement algorithm, they assume homogeneous hosts, consider CPU and memory, and compare it against the first fit decreasing approach used in the bin-packing problem.

- In [58], the authors propose a software tool that can be integrated with Eucalyptus [59]. This tool is able to perform live migrations at operation time. The tool presented in the paper only triggers live migrations when it detects that a live migration can leave a server without VMs, so that the server can be put into a suspended state.
In [68], the authors propose a tool to achieve self-adaptation at operation time by using machine-learning techniques.

Next is a non-exhaustive list of some of the limitations that we have detected in some of the presented publications:

- Usage of computer simulations instead of real environments.
- Lack of integration with popular middlewares such as OpenStack, OpenNebula, etc.
- Lack of integration with popular monitoring solutions such as Zabbix [43] and Ganglia [42].
- Assuming homogeneous servers or VMs.
- Focusing just on VM placement at deployment time but not on self-adaptation at operation time.
- Usage of synthetic workloads instead of 'real-world' use cases or workloads based on traces published by companies like Google [56].
- Assuming that using the minimum number of hosts is enough to achieve an energy-efficient VM placement. This is true to a certain extent, but it can fail in a cluster with heterogeneous hosts.
- Considering only a few dimensions of a VM. For example, some studies only consider CPU, but there are many other dimensions that could be used: memory, disk, network, etc.

We believe that we can address those limitations during the second and the third year of the ASCETiC project.

The algorithms, techniques and tools presented above are VM-centric. However, there are other scheduling techniques. Next, we briefly review two of them: batch scheduling and BLO-driven scheduling.

Batch scheduling

GreenSlot [69] is a parallel batch job scheduler that predicts the amount of solar energy that will be available in the near future and schedules the workload to maximize green energy consumption while meeting the jobs' deadlines and reducing brown energy consumption, monetary costs and environmental impact. Willow [70] assumes that reductions in green energy supply affect hosts differently, so it tries to adapt to the energy and thermal profile of the data center by managing the migration of tasks between servers. Blink [71] manages host power states when the amount of green energy varies but the data center is not connected to the electrical grid. An application's blinking policy decides whether each node is active or inactive at any instant, based on both its workload characteristics and energy constraints.

BLO-driven scheduling

There is some previous work on VM scheduling driven by Business-Level Objectives (BLOs). Considering BLOs during the negotiation and allocation of SLAs has been shown to be effective in achieving BLOs such as Revenue Maximization [72] and the discrimination of clients according to their relation with the provider or their agreed level of QoS [73].
A similar, holistic approach has also been considered, in which a central manager decides the placement of services according to the assessment of a Risk assessor or an Eco-efficiency assessor, chosen according to the desired Business-Level Objective (Risk Minimization, Ecological-efficiency maximization, Energy-efficiency maximization) [74].

2.4.1.3 Scientific Contributions

The main contribution of this component is the ability to deploy virtual machines in a way that minimises the energy consumed by a cluster. The schedulers of the most popular Cloud middlewares such as OpenStack do not include energy-aware policies. In addition, our VM manager can also apply scheduling and management policies that consider the pricing estimations from the ASCETiC IaaS Pricing Modeller.

Figure 19 to Figure 24 show the behaviour of the self-adaptation manager at IaaS. It is a component that adapts OptaPlanner [75] to optimize the placement of Virtual Machines within a large set of hosts, by providing location and time constraints.

The initial experiments try to find the best configuration of OptaPlanner for finding the optimum allocation of VMs. The configuration consists of the time limit within which the VM Manager has to find a suboptimal solution (1, 3 or 5 minutes) and the algorithm used for the local search of the optimum placement: hill climbing, late acceptance, late simulated annealing, simulated annealing, step counting and tabu search. Please refer to the OptaPlanner documentation for a complete explanation of each algorithm [76].

Periodically, the configured optimization algorithm is executed to check whether the utility of the objective function can be increased by reorganizing the VMs across the hosts by means of VM migration. Each experiment starts with a random placement of the VMs and lets the algorithm find a better placement according to the consolidation policy. In all cases, a solution can be found in which no host is overloaded (neither in CPU nor memory nor disk).

The Y-axis of each graph represents the number of idle hosts after a reallocation of resources. The higher the better, since it means a higher degree of consolidation and, in consequence, more hosts can be suspended to save energy costs.

Figure 19: VM optimisation performance for different local search algorithms and search time (30 hosts 30 VMs 20% average load)
Figure 20: VM optimisation performance for different local search algorithms and search time (30 hosts 30 VMs 45% average load)

Figure 21: VM optimisation performance for different local search algorithms and search time (50 hosts 50 VMs 21% average load)

Figure 22: VM optimisation performance for different local search algorithms and search time (50 hosts 50 VMs 51% average load)
Figure 23: VM optimisation performance for different local search algorithms and search time (100 hosts 100 VMs 20% average load)

Figure 24: VM optimisation performance for different local search algorithms and search time (100 hosts 100 VMs 47% average load)

The results of the experiments show that Simulated Annealing is generally the worst algorithm. There are no big differences across the other algorithms, but Late Acceptance is the best algorithm in almost all cases, followed by Step Counting Hill Climbing. There is no fixed ranking for the others. The other main conclusion is that there is no substantial difference between running the algorithm for 1, 3 or 5 minutes. In all cases, the solution found is near the optimum one (for example, for 50 hosts at 50% load the VMs are consolidated so that the number of idle hosts is near 25).

The low-level system metrics (CPU, memory, disk, network...) are not always the best indicator of the performance of an application. For example, 100% CPU in a given host may involve lower application performance than 80% CPU in another host. When scheduling the allocation of VMs, higher resource usage does not always indicate that the application is performing better in terms of application metrics (e.g. response time or throughput for web applications, or execution time for HPC workloads).

Initial experiments on application metrics have been performed at IaaS level. The complete set of CloudSuite benchmarks [77] has been executed and their performance metrics have been measured, specifically:

- Data analytics - execution time
- Data caching - average requests per second
- Data serving - average operations per second
- Graph analytics - execution time
- Software testing - coverage
- Web search - throughput
- Web serving - throughput
- Media streaming - maximum concurrent users

For example, Figure 25 shows the relationship between the number of CPUs and operations/second for the Intel Xeon (blue) and AMD Opteron (red) families of processors, for data serving workloads. The main conclusion is that Intel-family CPUs provide higher performance in terms of application metrics. The remaining benchmarks of the suite follow similar tendencies.

Figure 25: Data serving CPU vs. average operations/second

In terms of energy efficiency, Figure 26, Figure 27 and Figure 28 show the relation between Watts and application performance for the Intel Xeon (blue) and AMD Opteron (red) families of processors for the data-related benchmarks. The remaining benchmark results follow similar tendencies. The main conclusion is that Intel Xeon processors are always clearly more energy-efficient than Opterons, because they provide the same performance (or higher) with significantly lower consumption.

Figure 26: Data analytics Watts vs. execution time
Figure 27: Data caching Watts vs. requests/second

Figure 28: Data serving Watts vs. average operations/second

The VM Manager with its self-adaptation features is at the heart of the adaptation in the ASCETiC IaaS layer. It is demonstrated further in the use cases that are documented in the deliverables D6.2.2 and D6.3.2.

2.4.1.4 Future Contributions

There are two future lines of work:

1. To continue investigating the relation between system-level resources and application performance.
2. To optimize the configuration of the OptaPlanner library so that suboptimal solutions can be found in less time, by tuning the parameters, heuristics and configuration. We must consider that the execution times for current configurations will increase exponentially with the number of resources and the addition of new constraints (VM pinning to a given host, time restrictions, performance restrictions, etc.).

2.4.1.5 Conclusions

The aim of the Virtual Machine Manager is to deploy virtual machines and manage their life cycle by applying a specific policy selected by the owner of the infrastructure. This is an NP-hard problem, for which no polynomial-time algorithm that finds an optimal solution is known.
The main scientific contribution of the Virtual Machine Manager will be the evaluation and implementation of several heuristic algorithms that can be applied when deploying virtual machines and when migrating them at operation time. The owner of the infrastructure will be able to choose between different policies that guide the placement of the virtual machines. The most important policy offered will be the one that deploys the virtual machines in a way that minimizes the energy consumption of the infrastructure. In the project we will pay special attention to that policy, because it is aligned with the main objective of ASCETiC: minimizing energy consumption. However, several other policies will be offered: consolidation of virtual machines, distribution of virtual machines, grouping by application, cost minimization, etc.

2.4.2 IaaS Energy Modeller

The IaaS Energy Modeller is the principal component in the IaaS layer for predicting energy usage and generating historic logs of usage in the ASCETiC framework. The contributions that enhance this component in Y2 are discussed in this section.

2.4.2.1 Motivation

The ASCETiC project is built around the concept of saving energy. To achieve this, other ASCETiC components need detailed information about energy usage. One energy-saving strategy is the efficient placement of VMs, and the IaaS Energy Modeller is built to accommodate queries that aid in this regard. It is therefore built around the fundamental units of hosts and virtual machines (VMs). The main feature of the energy modeller is its prediction capability, with a particular focus on virtual machine placement.
The enhancements to the energy modelling at the IaaS layer therefore fall into the following main areas:

- Enhanced profiling of physical resources: This aims to provide better host power models and higher quality calibration of those models, thus reducing estimation errors associated with the physical resources. Calibration has been improved with a standalone calibrator that can more tightly control the conditions of calibration runs. Physical resources can now also be ranked by performance per Watt, in order to enhance the selection of the most suitable host for a VM.
- Enhanced scalability with emulated Watt meters: Attaching a Watt meter to every machine in a given infrastructure is generally impractical. To address this scalability issue, we have provided the capacity to emulate Watt meters.
- Enhanced workload specification and VM characterisation: Aside from accurate power models of physical resources, an enhanced understanding of the workloads induced by virtual machines is needed in order to better predict the power consumption of the different types of virtual machine used in an ASCETiC compliant architecture. This has included the analysis of OVF descriptions of VMs in order to better characterise them and thus provide insights into potential future workload patterns.

2.4.2.2 Related Work

An important part of prediction is the characterisation of both the physical resources and the workload. Physical resource characterisation has given rise to energy profiling and testing frameworks such as JouleUnit [24]. In addition to frameworks aimed specifically at energy profiling, there are more generalised monitoring frameworks such as Zabbix [43] and Ganglia [42]. The more generalised frameworks fit the requirements of ASCETiC more closely, given that they can monitor compute environments at scale, which matches the target environment of the project.

In order to drive the placement of VMs onto hosts, the characterisation of the physical resources is required. Various frameworks have been developed to achieve this over the years. The majority have used linear models [78], [80], [82], [83], while others have used lookup table structures [84] and other techniques [81]. In most cases these linear models have shown a high degree of accuracy in providing the power profile of resources, usually within 5%, or less than 3W, of the actual value.

Many models can also be described as additive models, such as [78], [82]. These models estimate the power consumption of each of the major physical components separately and sum the results. The idle power consumption is treated as an additional parameter of the model that is simply added to the other load characteristics. This practice often ignores the fact that utilising one physical component of the host, such as I/O, often requires another, such as CPU, thus making it hard to isolate the true cause of an increase in power consumption.
In other cases more complicated bias mechanisms [80], or other mechanisms such as principal component analysis [81], are used to create the model underlying the host's profile.

In addition to the characterisation of physical resources, the workload must be characterised as well. Workload characterisation is often used in Clouds to assist auto-scaling, but it may also be used to predict future workloads. Linear Regression based CPU Utilization Prediction (LiRCUP) [83] is one such example of CPU load prediction, which is aimed at maintaining service level agreements.

2.4.2.3 Scientific Contributions

Physical resource profiling

The advancements made in the second year regarding physical resource profiling include the introduction of new models, the automatic selection of models through goodness of fit testing and improvements made to calibration through a new standalone calibration tool. This tool now includes benchmarking of physical hosts, in order to provide the performance per Watt of physical hosts.

The benchmarking tool chosen for integration into the energy modeller was SciMark2 [86]. This benchmarking tool was tested on a physical resource called testnode4, which has an Intel Xeon X3430 2.4GHz CPU with 4 cores, 16 GB RAM and a 256GB hard disk. The benchmark gave an average flops rating of 1090.91 flops. The maximum power consumption was 124.1W. The flops per Watt at the maximum power consumption is therefore 1090.91 / 124.1 = 8.79 flops/W. This new benchmarking feature provides additional ranking functionality for physical hosts, thus assisting in the selection of the best physical host for a given VM.

The calibration procedure has also been enhanced to remove noise from the calibration dataset. This is achieved by carefully selecting the times at which measurements are taken.

Figure 29: Calibration of the Energy Modeller in ASCETiC

A sequence of timed workloads is launched during calibration, each of which induces a degree of power consumption on the physical host undergoing calibration. The workload is changed for each of these calibration periods. This is demonstrated in Figure 29 as two periods of load with different levels of power consumption. Each measurement period provides several data points for the calibration dataset, and several measurement periods are used to cover the entire range of possible workloads. The periods just after a load is induced and just before it ends are considered unreliable in terms of the validity of the data. One cause is that the CPU value may not have had time to settle and become consistent.
In addition, the CPU utilisation measure and the power consumption measure are taken by different processes, which need time to reach a steady state in order to avoid synchronisation issues. The standalone calibrator therefore avoids taking measurements just after a load is induced and just before it ends. It also checks the stability of incoming metric values, hence avoiding synchronisation issues and noise due to other tasks on the system, such as cron jobs, inducing unplanned load.
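The measurement selection described above can be sketched as follows. This is only an illustration under assumed parameters: the `Sample` type, the `stable_samples` function, the 5 second guard interval and the jitter threshold are hypothetical and are not taken from the standalone calibrator itself.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    time: float    # seconds since the start of the calibration run
    cpu: float     # CPU utilisation in the range 0..1
    power: float   # measured power in Watts


def stable_samples(samples, load_start, load_end, guard=5.0, max_jitter=0.05):
    """Return samples that are usable as calibration data points.

    Samples closer than `guard` seconds to the start or end of the load
    period are discarded, as are samples whose CPU reading jumps by more
    than `max_jitter` relative to the previous sample in the window.
    The first sample in the window only serves as a reference value.
    """
    window = [s for s in samples
              if load_start + guard <= s.time <= load_end - guard]
    kept = []
    for previous, current in zip(window, window[1:]):
        if abs(current.cpu - previous.cpu) <= max_jitter:
            kept.append(current)
    return kept


# Hypothetical load period from t=0s to t=30s, sampled every 5 seconds.
# The reading at t=20s mimics an unplanned spike, e.g. from a cron job.
readings = [
    Sample(0.0, 0.10, 62.0), Sample(5.0, 0.48, 86.0), Sample(10.0, 0.50, 88.0),
    Sample(15.0, 0.50, 88.5), Sample(20.0, 0.80, 110.0), Sample(25.0, 0.50, 89.0),
    Sample(30.0, 0.20, 70.0),
]
kept = stable_samples(readings, load_start=0.0, load_end=30.0)
```

Only the readings at 10s and 15s survive in this example: the edges of the load period are trimmed, and both the spike and the reading that follows it fail the stability check.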
The standalone calibrator can also take measurements directly on the host machine, which further mitigates synchronisation issues by avoiding aspects such as network delay, at the cost of a small overhead caused by the observer effect.

Figure 30: Example of Calibration Data on Testnode4 Leeds Testbed

The calibration data above (Figure 30) shows the relationship between power and CPU utilisation. A linear model as well as a polynomial fit is applied. It can be observed that both $R^2$ values are very similar, with the polynomial fit providing a slightly better result. The sum of the square error and root mean square error are shown in the table below.
Measure of fit                        Linear       Polynomial
Correlation Coefficient (R^2)         0.9831       0.9867
Sum of the Square Error (SSE)         6454.4326    5069.8023
Root Mean Square Error (RMSE)         2.5217       2.2327

Table 2: Goodness of fit for the Energy Modeller's Calibration

We can see in Figure 30 that below 10% CPU utilisation the calibration data shows a lower gradient than in the rest of the graph. This transition period is observable thanks to the sensitivity of the calibrator. It gives rise to the need for more degrees of freedom than have traditionally been used by previous works that rely on linear models alone. The calibration illustrates an improvement in the fitting of the model that drives the energy modeller. The accuracy of this physical host profile can be further illustrated by the use of the Watt meter emulator.

Watt Meter Emulation

The Watt meter emulator is a basic tool that allows power values for hosts without associated Watt meters to be used within the ASCETiC framework. This allows for several possibilities, which include: 1) expansion beyond the set of physical hosts that have Watt meters, 2) removal of Watt meters after an initial training phase and 3) cloning of a physical host's calibration data.

The emulation works by using the energy modeller's host resource profiling models and calibration data. It examines the calibration data and applies the model with the best fit. It then monitors the required metrics of the physical resource to provide an estimate of the actual power consumption of the physical host. It can work in a single instance mode that provides a power estimate for a named host, or it can provide an estimate for every physical resource that has calibration data. This enables the energy modeller to use the estimated value in cases where a measured value is unavailable. These power values are important both as a record of current power values and as historical logs for the physical hosts and VMs. These logs can drive both billing and future predictions of power and energy consumption.

In order to demonstrate the emulated Watt meter in operation and to highlight the accuracy of the physical resource profiling, we illustrate its use as part of a trace.
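The emulation steps described above (fit candidate models to the calibration data, keep the best fit, then map a monitored metric to a power estimate) can be sketched as follows. This is a simplified illustration under stated assumptions: CPU utilisation is the only metric, RMSE is the selection criterion, and the function names and calibration values are hypothetical rather than the energy modeller's actual API or data.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns c with y ~ c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations A c = b.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def rmse(coeffs, xs, ys):
    sse = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / len(xs))


def best_model(xs, ys, degrees=(1, 2)):
    """Fit one model per degree and keep the one with the lowest RMSE."""
    fits = [polyfit(xs, ys, d) for d in degrees]
    return min(fits, key=lambda c: rmse(c, xs, ys))


# Hypothetical calibration data: CPU utilisation (0..1) against Watts.
cpu = [0.0, 0.25, 0.5, 0.75, 1.0]
watts = [60.0, 72.0, 88.0, 108.0, 132.0]
model = best_model(cpu, watts)
estimate = predict(model, 0.6)   # emulated power reading at 60% CPU
```

With this synthetic data the quadratic model is selected, since the values were generated from a quadratic curve. On real calibration data the difference between the models is much smaller, as Table 2 shows.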
Figure 31: Trace of Power Consumption on Testnode4

Figure 31 shows a trace of power consumption over time. A sequence of periods of high CPU load is induced and the actual power is measured along with the estimated power derived from a polynomial model. The average absolute error of the model is 2.32W, with an average error of -1.72W. Thus in this case the model underestimates the actual energy consumption over time. Expressed as a percentage of the average actual power consumption, the absolute error is 2.82% (average of absolute error/average power). The overall energy consumption for this trace is therefore underestimated by 2.10% (average error/average power). Judging from Figure 31, the majority of this error appears to be associated with periods of higher load.
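The error measures quoted above follow directly from their definitions, as the sketch below shows. The `error_summary` function and the four-sample trace are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the data behind Figure 31.

```python
def error_summary(actual, estimated):
    """Summarise the estimation error of a power trace (values in Watts)."""
    n = len(actual)
    errors = [est - act for act, est in zip(actual, estimated)]
    mean_power = sum(actual) / n
    mean_error = sum(errors) / n                       # signed: negative = underestimate
    mean_abs_error = sum(abs(e) for e in errors) / n   # magnitude only
    return {
        "mean_error_w": mean_error,
        "mean_abs_error_w": mean_abs_error,
        # average of absolute error / average power
        "abs_error_pct": 100.0 * mean_abs_error / mean_power,
        # average error / average power: the relative error in total energy
        "energy_error_pct": 100.0 * mean_error / mean_power,
    }


# Hypothetical trace of measured and estimated power.
actual = [80.0, 100.0, 120.0, 100.0]
estimated = [81.0, 98.0, 116.0, 99.0]
summary = error_summary(actual, estimated)
```

Because the samples are equally spaced, the ratio of average error to average power equals the relative error in the energy consumed over the trace, which is why the signed percentage can be read as an energy underestimate.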
Figure 32: Distribution of Error in the Power Model

The distribution of errors in the trace of Figure 31 is shown in Figure 32. It is illustrated with a normal distribution, with the most frequently occurring error being around 0 to -2W. This confirms that the model with its current calibration data will underestimate the energy consumption of a physical host, yet the absolute error in power consumption is in keeping with the literature.

2.4.2.4 Future Contributions

The future direction of this component will be to enhance the results it provides in several directions. These will include:

- Enhancing workload profiles to make better use of VM workload history information in determining the average power consumption of a VM over time.
- Enhancing resource profiling through improvements in calibration and models that use a greater range of characterisation metrics.
- Automating the selection of the best workload prediction mechanism for a given VM instance.
- Including the effects of remote storage in the profiling of VMs' power utilisation.

2.4.2.5 Conclusions

This year the IaaS Energy Modeller has focused on calibration and on fitting the existing metrics of the models, with the aim of eliminating noise in the calibration data. This has included the automated detection of the goodness of fit of the models and the selection of the most appropriate model accordingly. This has led away from using purely linear models, which are predominant in the literature.
The IaaS Energy Modeller has been extended with a new component called the Watt meter emulator, which enables scalability by removing the need to have a Watt meter attached to each machine, and which allows Watt meters to be removed from the production infrastructure. The estimates of power over time were found to mildly underestimate the measured values, so energy estimates over a period of time can equally be expected to underestimate the true value by a small but quantifiable factor.

2.4.3 IaaS Pricing Modeller

2.4.3.1 Motivation

The majority of IaaS providers charge for their services, which come in the form of VMs with specific performance characteristics, using fixed rates per unit of time. The rates may depend on specific VM characteristics such as CPU speed, available memory, network bandwidth, etc. In some cases prices are dynamic: for example, in Amazon the pricing can vary in time and depend on bids made by other IaaS users. In any case, prices do not reflect the actual use of resources by a VM: the price of a VM with some specific characteristics is the same regardless of the real usage of resources made by this VM. Due to the nature of program execution, the actual usage may vary unpredictably and differ significantly from the static VM characteristics.

Economic theory suggests that services should be priced according to the actual cost of the resources involved in offering these services. For example, consider what happens in restaurants charging a flat fee per person, e.g., all-you-can-eat menus. Persons having light meals will essentially subsidize the meals of persons who consume a lot of food. Because of this, light-meal persons may find the flat fee too high to take the deal and consider other alternatives. This drop in the demand of light-meal persons may result in decreased overall profits compared to the case of menus charged on the basis of portions ordered. Of course, the capability of measuring actual resource usage might not be available.
One of the basic functionalities of the IaaS Energy Modeller is the ability to account for the energy used by each VM. Since energy is a major cost factor for IaaS providers, pricing schemes which incorporate the energy usage explicitly (i.e., the actual usage, not the average one) may bring greater profitability and be more desirable. The goal of the IaaS Pricing Modeller component is to use the capability of the IaaS Energy Modeller to provide energy-aware cost estimation through two different energy-aware pricing schemes.

2.4.3.2 Related Work

Current charging schemes offer limited options with respect to the workload demands of the jobs. Most Cloud providers adopt an intuitive way of leasing their resources [113],[114],[115],[116],[117], but this simplicity does not come without a cost. A more advanced approach would be to consider trade-offs between energy and cost, such as approaches that use CPU frequency scaling [119], albeit uncommon in production infrastructures.

Microsoft Azure [118] charges for the resources employed (i.e., VM instances, database, traffic) on a per hour usage basis. A more composite pricing policy is employed by Google. In particular, the Google Cloud Platform Pricing Calculator [120] enables customers to estimate the cost of a set of Cloud services in terms of storage, computational needs, etc. Customers can provide input for a wide range of parameters, including the required number of servers, the amount of stored data, the datacentre location, the egress traffic, the average server usage per day or week, etc. Although such an approach does not take dynamic parameters into account, it focuses mainly on the cost function of the provider, which is along our lines of interest.

Furthermore, another important challenge is dealing with the inefficiency of existing power management techniques, since specific workloads require all servers to remain up regardless of traffic intensity. Due to economies of scale, many IaaS providers establish energy saving policies, such as consolidation, in order to deal with energy demand patterns.

Another reason that makes prices based on energy usage desirable is that it is common for energy prices to vary in time for various reasons, e.g., varying availability of energy sources, time-of-day pricing and demand response schemes. Thus, aspects related to energy consumption should also be incorporated within the pricing models of an IaaS provider.

2.4.3.3 Scientific Contributions

In the first year of the project, a basic cost function and a basic price model were proposed, mostly for integration purposes.
During the second year of the project, the IaaS Pricing Modeller focuses mainly on two aspects:

- Offering theoretically justified energy-aware pricing schemes.
- Handling a (perhaps) time-varying energy cost faced by IaaS providers, as announced by energy providers.

The IaaS Pricing Modeller calculates the charges incurred by a VM during its operation, or predicts charges based on estimates of future usage. It should also be able to compute the portion of the charge due to energy usage. First let us define some useful notions to be used in what follows:

- Price: the (time) average charge incurred by a VM per unit of time, measured in euros per second.
- Charge: the total charge incurred by a VM, measured in euros.
- Energy price: the price per unit of energy, in euros per Watt-second.
- Energy charge: the total charge due to the energy usage of a VM, measured in euros.
- Static price: the portion of the price not explicitly depending on energy consumption. Usually it depends on the static characteristics of a VM, for example CPU speed, memory, maximum network bandwidth, etc. It could also be the result of a market mechanism, e.g., an auction for computing resources.
- Static charge: the total charge due to static prices.
- Billing: the calculation of a price or charge incurred by a specific VM based on past usage.
- Prediction: the calculation of a price or charge estimate concerning the future usage of a specific VM, given a prediction of its energy (or power) consumption.
- Pricing scheme: a formula for computing the price.

We describe three sample pricing schemes.

A two-part tariff

The price $p$ of a VM (starting at time 0 and up to time $T$) is computed by the formula

$$p = \frac{1}{T}\int_0^T p_{static}(VM, t)\,dt + \frac{1}{T}\int_0^T p_{energy}(t)\,W(t)\,dt$$

where:

- $VM$: a parameter identifying the VM
- $p_{static}(VM, t)$: the static price of the VM at time $t$
- $p_{energy}(t)$: the energy price at time $t$
- $W(t)$: the power usage of the VM at time $t$, as provided by the IaaS Energy Modeller

We assume that the energy price changes only at the time instants $T_0 = 0 < T_1 < T_2 < \dots$ (see Figure 33) and let the energy consumption during the corresponding time period be as given by the red curve in Figure 33.

Figure 33: Recursive calculation of energy charges $C(T)$ up to time $T$ by the energy charges $C(T_k)$ and the energy charge during the time period from $T_k$ up to $T$, where $T_k$ is the last instant the energy price changed prior to $T$.
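The recursive calculation illustrated in Figure 33 can be sketched in a few lines of code. This is a minimal illustration, assuming that the cumulative energy of a VM is available from the energy modeller; the `EnergyCharge` class, its method names and the numeric values are hypothetical.

```python
class EnergyCharge:
    """Per-VM bookkeeping for energy charges under a changing energy price.

    Stores C(T_k), the energy consumed up to T_k and the energy price in
    force since T_k, where T_k is the time of the last price change.
    """

    def __init__(self, energy_price):
        self.price = energy_price      # euros per Watt-second since T_k
        self.charge_at_change = 0.0    # C(T_k)
        self.energy_at_change = 0.0    # energy consumed from 0 to T_k

    def charge(self, total_energy):
        """C(T), given the VM's cumulative energy (Watt-seconds) up to T."""
        energy_since_change = total_energy - self.energy_at_change
        return self.charge_at_change + self.price * energy_since_change

    def on_price_change(self, total_energy, new_price):
        """Fold the charges so far into C(T_k), then switch to the new price."""
        self.charge_at_change = self.charge(total_energy)
        self.energy_at_change = total_energy
        self.price = new_price


# Hypothetical values: price 2 per Ws for the first 100 Ws, then 3 per Ws.
vm = EnergyCharge(energy_price=2.0)
vm.on_price_change(total_energy=100.0, new_price=3.0)
total = vm.charge(total_energy=150.0)   # 100 * 2 + 50 * 3
```

Only three numbers are stored per VM, so a price change costs one update per VM and the complete power history never has to be integrated again.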
With the energy price constant between consecutive price changes, the total energy charges $C(T) = \int_0^T p_{energy}(t)\,W(t)\,dt$ incurred up to time $T$ can be calculated from $C(T_k)$ as

$$C(T) = C(T_k) + p_{energy}(T_k)\int_{T_k}^{T} W(t)\,dt \qquad (1)$$

Thus, in order to be able to calculate the charge for any VM, one must keep track of $C(T_k)$, i.e., the charges incurred up to the last price change, the current energy price $p_{energy}(T_k)$ and the energy $\int_0^{T_k} W(t)\,dt$ consumed (by this VM) up to the last price change. The energy consumption $\int_{T_k}^{T} W(t)\,dt$ appearing in (1) can then be computed as the difference $\int_0^{T} W(t)\,dt - \int_0^{T_k} W(t)\,dt$. Thus, on a price change one must iterate through all the VMs in the infrastructure and update $C(T_k)$ and $\int_0^{T_k} W(t)\,dt$. The energy-related part of the price $p$ is computed from the total energy charge $C(T)$ as $C(T)/T$.

A two-part tariff with energy saving discounts

In certain cases it may be undesirable for applications not to know the charges ahead of time, as may happen under the pricing scheme of the previous section. A simple alternative is to pay a lump sum and then apply a discount based on the actual power consumption. In this way it is not possible to pay more than the initial lump sum payment. More specifically, the price $p$ is computed by the formula

$$p = \frac{1}{T}\int_0^T p_{static}(VM, t)\,dt + \min\left\{\frac{1}{T}\int_0^T p_{energy}(t)\,W(t)\,dt - \frac{1}{T}\int_0^T p_{energy}(t)\,W_{nominal}\,dt,\; 0\right\}$$

where $W_{nominal}$ is the nominal average power consumption, i.e., the power consumption already accounted for in the static price. Any average power consumption above $W_{nominal}$ does not increase the price above the (time average) static price. Deviations below $W_{nominal}$ result in a proportional discount.

A static pricing scheme

Here the price depends only on the static price, i.e.,

$$p = \frac{1}{T}\int_0^T p_{static}(VM, t)\,dt$$

If the static price does not vary in time, i.e., $p_{static}(VM, t)$ is constant in the time parameter $t$, then no time averaging is necessary. If it does vary then, similarly to the above analysis, the total static charge $\int_0^T p_{static}(VM, t)\,dt$ up to time $T$ can be written as

$$\int_0^{T_k} p_{static}(VM, t)\,dt + p_{static}(VM, T_k)\,(T - T_k)$$

i.e., the total static charges incurred up to time $T_k$ plus the static charges from that point onwards, where $T_k$ is here the time of the last static price change. Thus, in order to keep track of the static charges incurred by any VM, the total static charge up to the last static price change (footnote 1) should be stored for each VM. Consequently, every time the static price changes one must update the static charges for each VM in the infrastructure. The static price up to time $T$ is computed from the static charges as $\frac{1}{T}\int_0^T p_{static}(VM, t)\,dt$.

Evaluation

An evaluation of an IaaS pricing scheme necessarily involves a study of the actions taken by other economic agents besides IaaS providers. For this reason we consider a microeconomic model which incorporates the actions of IaaS providers, applications and their users. Since an action of any of these agents triggers a chain of subsequent reactions by the others, we are interested in determining the equilibrium of such interactions. First we state the model assumptions regarding each economic agent.

IaaS providers: each has an unlimited number of physical servers at its disposal. Each server is populated by VMs belonging to possibly different applications, and the CPU speed is split equally among the VMs. Let $v_i$ be the number of VMs used by application $i$. The provider is able to scale freely, i.e., the server consolidation policy is such that the number of active physical servers $m$ scales in proportion to the number of VMs in the infrastructure, i.e., $\sum_i v_i / m = \rho$, where the constant $\rho$ is the consolidation degree. If the CPU speed of a physical server is $\mu$, then $\mu/\rho$ is the CPU speed dedicated to each VM running in the infrastructure. We consider a two-part tariff specified by the parameters $\pi_0, \pi_1$, where $\pi_0$ is the static price and $\pi_1$ is the energy price. Notice that a static pricing scheme has $\pi_1 = 0$.
The provider strives to maximize its profits, given by

$$\pi_0 \sum_i v_i + \pi_1 \sum_i P_i(v_i) - \pi_e \sum_i P_i(v_i) - c(m)$$

where $P_i(v_i)$ is the average power consumed by the $i$-th application when it uses $v_i$ VMs. More specifically,

$$P_i(v_i) = p_0 m + p_1 \lambda_i(v_i) = \frac{p_0 v_i}{\rho} + p_1 \lambda_i(v_i),$$

where $p_0$ is the idle power, $p_1$ is the marginal power consumption and $\lambda_i(v_i)$ is the demand of application $i$ expressed, e.g., in CPU instructions per second. $\pi_e$ is the price per watt charged by the energy provider. $c(m)$ is the maintenance cost involved in operating $m$ servers; we assume it is linear, i.e., $c(m) = cm$ for some constant $c$.

Applications: each one decides how many VMs to buy from a particular IaaS provider such that the benefit minus the payments to the IaaS provider is maximized. The net benefit of application $i$ is defined as $r_i \lambda_i(v_i) - \pi_0 v_i - \pi_1 P_i(v_i)$, where $r_i$ is the reward per completed instruction of application $i$. For example, if an application owner profits from selling customer data then $r_i$ is the expected revenue per request.

(Footnote 1: For example, if the static price is the spot price of a market mechanism.)

Application demand: each application $i$ has a different demand (rate of instructions to be executed at the VMs of this application) $\lambda_i^{max}$, which recedes to 0 if the average processing delay of each instruction becomes excessive. In particular, we assume each request derives a benefit $R_i - \beta_i d_i(\lambda_i)$ from its execution, where $R_i, \beta_i$ are constants and

$$d_i(\lambda_i) = \frac{1}{\frac{\mu}{\rho} - \frac{\lambda_i}{v_i}}$$

is the average processing delay based on an M/M/1 queueing model. Requests will balk if the benefit at the moment is negative. Thus, either $R_i > \beta_i d_i(\lambda_i^{max})$ and $\lambda_i(v_i) = \lambda_i^{max}$, or $R_i < \beta_i d_i(\lambda_i^{max})$ and $R_i = \beta_i d_i(\lambda_i(v_i))$, i.e., $\lambda_i(v_i) = \left(\frac{\mu}{\rho} - \frac{\beta_i}{R_i}\right) v_i$. More compactly:

$$\lambda_i(v_i) = \min\left\{ \left(\frac{\mu}{\rho} - \frac{\beta_i}{R_i}\right) v_i,\; \lambda_i^{max} \right\}.$$

Since the two-part tariff has more degrees of freedom, the maximum profit derived by an IaaS provider acting as a monopolist is never below its profits if a static pricing scheme is used instead. Actually, a two-part tariff can yield strictly higher profits, as the following example shows. Consider the case of two applications with

$$\pi_e\left(\frac{p_0}{\rho} + p_1\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right)\right) > r_2\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right) > r_1\left(\frac{\mu}{\rho} - \frac{\beta_1}{R_1}\right) > \pi_e\left(\frac{p_0}{\rho} + p_1\left(\frac{\mu}{\rho} - \frac{\beta_1}{R_1}\right)\right)$$

A monopolist clearly would not want to serve application 2 since it would suffer losses:

$$\text{profit} \le r_2\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right) - \pi_e\left(\frac{p_0}{\rho} + p_1\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right)\right) < 0$$

Since $r_1\left(\frac{\mu}{\rho} - \frac{\beta_1}{R_1}\right) \ge \pi_0$ implies $r_2\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right) > \pi_0$, it is not possible to exclude application 2 and at the same time include application 1 with the static pricing scheme.
This does not happen under a two-part tariff with $\pi_0 = 0$, $\pi_1 > \pi_e$, since then $\pi_1\left(\frac{p_0}{\rho} + p_1\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right)\right) > r_2\left(\frac{\mu}{\rho} - \frac{\beta_2}{R_2}\right)$ (i.e., application 2 is excluded) and $r_1\left(\frac{\mu}{\rho} - \frac{\beta_1}{R_1}\right) > \pi_e\left(\frac{p_0}{\rho} + p_1\left(\frac{\mu}{\rho} - \frac{\beta_1}{R_1}\right)\right)$ (i.e., application 1 is included).

The above example might suggest that the static price in a two-part tariff is useless. This is not the case as the following example shows: Let r 1 = 1, r 2 2 = 3, p 0 = 1 p 2 ρ 2 1 = 1, μ β 1 = 4, μ β 2 = 4 then a direct calculation shows 2 ρ R 1 7 ρ R 2 that the profit is maximized only at π 0 = 1, π 2 1 = 1. 3
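The charge bookkeeping of equation (1) and the discounted two-part tariff described earlier in this section can be illustrated with a short sketch. It is not the IaaS Pricing Modeller implementation: the names `VmCharge`, `on_price_change` and `discounted_price` are hypothetical, the energy price is assumed to be constant between price changes, and the cumulative energy of a VM is assumed to be supplied by a monitor.

```python
from dataclasses import dataclass


@dataclass
class VmCharge:
    """Per-VM state needed to evaluate equation (1) (hypothetical structure)."""
    charge_at_change: float = 0.0  # C(T_k): charges up to the last price change
    energy_at_change: float = 0.0  # integral of W(t) dt from 0 to T_k
    price: float = 0.0             # p_energy(T_k), constant until the next change

    def charge(self, energy_now: float) -> float:
        """C(T) = C(T_k) + p_energy(T_k) * (E(T) - E(T_k))."""
        consumed_since_change = energy_now - self.energy_at_change
        return self.charge_at_change + self.price * consumed_since_change

    def on_price_change(self, energy_now: float, new_price: float) -> None:
        """Fold the charges so far into C(T_k) and start a new price interval."""
        self.charge_at_change = self.charge(energy_now)
        self.energy_at_change = energy_now
        self.price = new_price


def discounted_price(static_avg: float, energy_charge_avg: float,
                     nominal_charge_avg: float) -> float:
    """Two-part tariff with discount: static price plus min(actual - nominal, 0)."""
    return static_avg + min(energy_charge_avg - nominal_charge_avg, 0.0)


# Example with assumed units: 50 units of energy at price 0.10, then 30 at 0.20.
vm = VmCharge(price=0.10)
vm.on_price_change(energy_now=50.0, new_price=0.20)
total = vm.charge(energy_now=80.0)

# Consumption below nominal earns a discount; above nominal the price is capped.
below_nominal = discounted_price(1.0, 0.3, 0.5)
above_nominal = discounted_price(1.0, 0.7, 0.5)
```

On a price change, a provider following this scheme would call `on_price_change` once for every VM in the infrastructure, which mirrors the iteration over all VMs described in the text.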
Figure 34: IaaS provider profits in a monopoly using a two-part tariff incorporating energy charges (solid curve) and a static price (dashed).

As a numerical illustration, consider the profits of a monopolistic IaaS provider under two scenarios: in the first the provider employs a two-part tariff, while in the second it uses a static price. The parameter values used are $R_1 = R_2 = 20$, $r_1 = r_2 = 1.5$, $\rho = 10$, $\pi_e = 0.285$, $p_0 = 10$, $p_1 = 5$, $\lambda_1^{max} = \lambda_2^{max} = 50$, $\mu = 50$, $c = 0$. Figure 34 depicts the profits as a function of the maximum average request response delay tolerated by the users of application 1 (normalized by the maximum tolerated delay for application 2). The profits brought by the two-part tariff are never lower than those brought by the static pricing scheme. They coincide only if the quality-of-service characteristics of the two applications are the same. The greater the diversity between the applications, the greater the difference in profits. To summarize, we have obtained the following:

Result: the profit of IaaS providers in a monopoly increases if a two-part tariff incorporating energy costs is used, compared to a static pricing scheme.

Meaning: IaaS providers have an incentive to adopt the ASCETiC IaaS layer regardless of whether the upper layers exist or not.

The case of ideal competition is considered in the section describing the PaaS Pricing Modeller, as the latter is necessary in some form or another in order for applications to be able to get estimates of their energy costs. We refer the reader to the PaaS Pricing Modeller section 2.3.4.

2.4.3.4 Future contributions

Future steps for the IaaS Pricing Modeller component will be to enhance functionality and improve model accuracy, including:

- Incorporation of specific VM characteristics and time variation into the calculation of static prices.
- Construction of a more elaborate model where VMs residing on the same servers are not isolated.
- Consideration of a model with applications which scale in response to energy-related information during operation. (This will be useful for the third year of the project where inter-operation of layers is considered.)
- Analysis of more realistic market cases which are neither monopolies nor ideally competitive.

2.4.3.5 Conclusions

The IaaS Pricing Modeller component takes into account both energy consumption and the variation in energy prices. We have proposed two tariffs which provably yield greater profits to the monopolistic IaaS providers that employ them, compared to the ubiquitous static pricing schemes. In competitive markets too, two-part tariffs are optimal, as shown in the section on the PaaS Pricing Modeller. Thus a key function of the IaaS Pricing Modeller is to provide the economic incentives for IaaS providers to adopt the ASCETiC IaaS layer.

2.4.4 Infrastructure Monitor

The Infrastructure Monitor is responsible for monitoring the resources, such as CPU, memory and network, that are being consumed both at physical host level and at virtual machine (VM) level. Both physical hosts and VMs should be monitored in terms of performance and energy. The component was designed to fulfil the following requirements:

- Monitoring of the different KPIs defined by the project.
- Historical statistics of the different metrics collected for the different hosts.
- Pulling of the metrics to other components.
- Storage of the metrics for future reference.
- Interoperability with the different components and applications running in different operating systems.

2.4.4.1 Related Work

In October 2012, the report titled Harmonizing Global Metrics for Data Center Energy Efficiency [121] was published with the objective of finding a collection of common metrics to measure energy efficiency in a data center, whether Cloud related or not.
A large amount of research is being conducted in order to render data centers energy efficient and low carbon emitting. This has gained momentum recently, as the huge increase in the use of Cloud services has forced an increase in hardware on the Cloud service provider side to support these services. The use of metrics to measure that impact was the focus of the European Games Project [122]. Building on the metrics selected in Games [122] and the Data Center Efficiency Report [123], the ECO2Clouds [124] European project examines those metrics and also focuses on the adaptations or additions needed for Cloud and Federated Cloud infrastructures. The objective of ECO2Clouds is to reduce the CO2 impact of executing applications on the Cloud. The ECO2Clouds project tackles the problem by first building the infrastructure needed to measure energy in a federated Cloud infrastructure and associating that measure with CO2 production [126]. From there it generates reports that help the application owner or programmer to understand the
impact of the different actions of their application in a specific Cloud infrastructure. It is the responsibility of the application owner to take measures to reduce its environmental impact in the future [126].

The metrics collected by the previous project are obtained through low-level monitoring. Low-level monitoring, which is performed to monitor the status of the Cloud infrastructure, is already employed by Cloud providers as it is an essential component of their management and maintenance tasks. There are many tools available to the provider to set up a low-level monitoring framework, such as the popular Ganglia [42], Nagios [125] or Zabbix [43]. As part of the managed Cloud services they provide, some Cloud providers also offer high-level monitoring services to their Cloud consumers (such as Amazon Cloud Watch), which vary in capability from simple monitoring and alerting to customizable monitoring tailored to the user.

2.4.4.2 Scientific Contributions

In this project the Infrastructure Monitor relies on a proven technology: Zabbix, which is used worldwide to monitor different types of infrastructure, from Cloud providers whose physical hosts run large collections of VMs to the equipment in telecom infrastructures. Using Zabbix enables us to focus on the more innovative part of the work in the ASCETiC project, namely the definition of the different KPIs and associated metrics that enable the different ASCETiC components to determine the best way to deploy an application so as to optimize energy usage.

One of the main focuses during the second year was to remove Zabbix probes from inside the VMs. In this way, it is no longer necessary to install external software in the VMs of the users of ASCETiC. This work has been done using the LibVirt libraries to obtain the resource consumption of the different users' VMs (the current IaaS providers in ASCETiC use KVM hypervisors).
Using the LibVirt API [127] it is possible to extract the following metrics from the KVM hypervisor without having to install IaaS monitoring probes in the VMs:

Metric              Description
cpu_time            CPU time usage in nanoseconds of the physical CPUs of the host by the VM.
user_time           User CPU time usage in nanoseconds of the physical CPUs of the host by the VM.
system_time         System CPU time usage in nanoseconds of the physical CPUs of the hypervisor host by the VM.
memory              Total physical host memory allocated to the VM.
wr_bytes            Number of bytes written to the main disk of the VM.
wr_operations       Number of write operations on the main disk of the VM.
rd_bytes            Number of bytes read from the main disk of the VM.
rd_operations       Number of read operations on the main disk of the VM.
flush_operations    Number of flush operations on the main disk of the VM.
wr_total_times      Total time spent on write operations on the main disk of the VM.
rd_total_times      Total time spent on read operations on the main disk of the VM.
flush_total_times   Total time spent on flush operations on the main disk of the VM.
rx_bytes            Number of bytes received by the main network interface of the VM.
rx_drop             Number of received packets dropped by the main network interface of the VM.
rx_packets          Number of packets received by the main network interface of the VM.
rx_errors           Number of errors when receiving packets on the main network interface of the VM.
tx_bytes            Number of bytes transmitted by the main network interface of the VM.
tx_drop             Number of transmitted packets dropped by the main network interface of the VM.
tx_packets          Number of packets transmitted by the main network interface of the VM.
tx_errors           Number of errors when transmitting packets on the main network interface of the VM.

In addition, a metric derived from the CPU time usage gives the percentage of the physical host CPUs used by a VM at a given time. This metric is calculated using the following equations (see footnote 2):

$$cputimediff = cputime_{now} - cputime_{t\ \mathrm{seconds\ ago}}$$

$$\%CPU = \frac{100 \times cputimediff}{t \times nr\_cores \times 10^{9}}$$

Some of these metrics are published to IaaS AMQP topics:

Metric name                 metricid   Queue name
Cpu                         Cpu        vm.<vmid>.item.cpu
Memory                      Memory     vm.<vmid>.item.memory
Power                       Power      vm.<vmid>.item.power
Network-bytes-transmitted   tx-bytes   vm.<vmid>.item.tx-bytes
Network-bytes-received      rx-bytes   vm.<vmid>.item.rx-bytes
Energy                      Energy     vm.<vmid>.item.energy

The output message format is the same for all of these topics:

    {
      "name": <String>,       // metricid
      "value": <double>,
      "units": <String>,
      "timestamp": <long>
    }

This information is pushed to the AMQP topics every minute.

Footnote 2: Based on the virt-top method to calculate % of CPU usage: http://people.redhat.com/~rjones/virt-top/faq.html
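The derived %CPU metric and the topic message format described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the timestamp is assumed to be in milliseconds (the text only states that it is a long), and the lower-casing of the metric identifier in the topic name is inferred from the queue names listed above.

```python
import json

NANOS_PER_SECOND = 10**9


def cpu_percent(cpu_time_prev_ns: int, cpu_time_now_ns: int,
                interval_s: float, nr_cores: int) -> float:
    """Percentage of the host CPU capacity used by a VM over the interval."""
    cpu_time_diff = cpu_time_now_ns - cpu_time_prev_ns
    return 100.0 * cpu_time_diff / (interval_s * nr_cores * NANOS_PER_SECOND)


def topic_for(vm_id: str, metric_id: str) -> str:
    """Build a topic name following the vm.<vmid>.item.<metric> pattern."""
    return f"vm.{vm_id}.item.{metric_id.lower()}"


def metric_message(metric_id: str, value: float, units: str,
                   timestamp_ms: int) -> str:
    """Serialise one sample with the four fields of the message format."""
    return json.dumps({
        "name": metric_id,
        "value": value,
        "units": units,
        "timestamp": timestamp_ms,  # assumed to be milliseconds
    })


# A VM that accumulated 30 s of CPU time during a 60 s interval on an 8-core host.
usage = cpu_percent(0, 30 * NANOS_PER_SECOND, interval_s=60, nr_cores=8)
topic = topic_for("vm-42", "Cpu")
payload = json.loads(metric_message("Cpu", usage, "%", 1441200000000))
```

Dividing by the number of cores normalises the value to the whole host, so a VM that keeps one core of an 8-core host fully busy reports 12.5% rather than 100%.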
In section 0 the KPIs that the project defines for the IaaS layer are presented. As can be seen, the majority of those KPIs are derived metrics calculated by the IaaS Energy Modeller (see section 2.4.2). The Energy Modeller collects from Zabbix the VM metrics that are extracted directly from the Libvirt API, performs its calculations and writes the resulting power and energy metrics for that VM back into Zabbix. The only exception to this workflow is the Estimated Power KPI per host, which is calculated by the Emulated Watt Meter and the specific physical host probes installed in Zabbix.

In addition to the above, the second year of the ASCETiC project has evaluated the use of IPMI sensors integrated in current generation server hardware. These devices, traditionally used to remotely manage and monitor larger clusters of physical machines, are starting to appear on the market with power sensing capabilities. For example, ULEEDS recently procured a new cluster of Dell PowerEdge enterprise grade servers that have the following built-in IPMI sensors:

    cscloud1n1 /etc/one # ipmitool -I lanplus -L USER -U monitor -P monitor -H cscloud1n14 sdr elist full
    Fan1A            30h  ok  7.1   3000 RPM
    Fan1B            31h  ok  7.1   2160 RPM
    Fan2A            32h  ok  7.1   3480 RPM
    Fan2B            33h  ok  7.1   2520 RPM
    Fan3A            34h  ok  7.1   3360 RPM
    Fan3B            35h  ok  7.1   2400 RPM
    Fan4A            36h  ok  7.1   3240 RPM
    Fan4B            37h  ok  7.1   2400 RPM
    Fan5A            38h  ok  7.1   3360 RPM
    Fan5B            39h  ok  7.1   2400 RPM
    Fan6A            3Ah  ok  7.1   3360 RPM
    Fan6B            3Bh  ok  7.1   2520 RPM
    Inlet Temp       04h  ok  7.1   24 degrees C
    Current 1        6Ah  ok  10.1  0.60 Amps
    Current 2        6Bh  ns  10.2  No Reading
    Voltage 1        6Ch  ok  10.1  240 Volts
    Voltage 2        6Dh  ns  10.2  No Reading
    Pwr Consumption  77h  ok  7.1   112 Watts
    Temp             0Eh  ok  3.1   64 degrees C
    Temp             0Fh  ok  3.2   61 degrees C
    CPU Usage        FDh  ok  7.1   0 percent
    IO Usage         F1h  ok  7.1   0 percent
    MEM Usage        F2h  ok  7.1   0 percent
    SYS Usage        F3h  ok  7.1   1 percent

Figure 35: Available metrics from IPMI sensors on a Dell PowerEdge R430 server

With this capability we have
explored the possibility of using these metrics as a replacement for a stand-alone power meter directly attached to each device, which is not feasible in large cluster deployments due to cost. While exploring the use of these IPMI metrics, changes had to be made to the ASCETiC Infrastructure Monitor backed by Zabbix. This was achieved through the integration with Zabbix of libopenipmi, a library for interfacing with a large range of vendor IPMI devices. This enabled the sensor data to be periodically scraped and stored. During this integration process, a number of limitations were exposed
with the IPMI power metric, shown in Figure 35 as Pwr Consumption. Previously, polling intervals of power taken from directly attached power meters could be set to a minimum interval of 1 second. Using the IPMI power consumption metric, due to device connection overhead and the metric's slow response to changes in load, this interval was limited to 15 seconds. Additionally, we experienced issues with the granularity and accuracy of the readings taken from the IPMI device: changes in power consumption smaller than 10-20 Watts would not register. These limitations can be seen in the following graph:

Figure 36: Host power consumption monitored via IPMI during instantaneous switching of CPU load from 100% to 0%.

From Figure 36, the low granularity of readings, the high polling interval and the delay in reacting to the instantaneous load change can be seen. With this in mind, from an experimental perspective the impact of coarse granularity and a high polling interval is concerning, especially when evaluating the energy efficiency of events within a Cloud application that are short-lived. On the other hand, the impact of these limitations on real Cloud systems is less dramatic, as monitoring intervals there are traditionally in the order of minutes. It should be noted that these limitations are not fundamental and could be alleviated in the future as built-in power sensing technologies improve. Thus this evaluation of IPMI capabilities could provide motivation to server hardware vendors such as Dell to make improvements that could add value and differentiate their products from competitors.

2.4.4.3 Future Work

At this stage the Infrastructure Monitor solution for ASCETiC is stable. It would be possible to extend it during Y3 by the addition of new probes coming from a future KPI taskforce.

2.4.4.4 Conclusions

The Infrastructure Monitor is a key component in the monitoring of physical hosts and VMs within the ASCETiC architecture.
This year's work has enhanced the scalability and applicability of the ASCETiC toolbox by ensuring that VMs can be monitored out of the box, without the modifications at the IaaS layer that the installation of Zabbix monitoring probes would require.
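The concern raised in the IPMI evaluation, that a 15 second polling interval can hide short-lived events, can be illustrated with a small numerical sketch. The power profile below is entirely hypothetical (a host idling at 100 W with a single 5 second burst at 200 W) and the function names are illustrative; only the 1 second and 15 second polling intervals are taken from the text.

```python
def energy_joules(samples):
    """Trapezoidal integration of (time_s, power_w) samples, in joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def power_at(t):
    """Hypothetical host: 100 W idle, with a 5 s burst at 200 W from t = 20 s."""
    return 200.0 if 20 <= t < 25 else 100.0


def sample(interval_s, duration_s=60):
    """Poll the hypothetical host every interval_s seconds."""
    return [(t, power_at(t)) for t in range(0, duration_s + 1, interval_s)]


fine = energy_joules(sample(1))     # 1 s polling, as with an attached meter
coarse = energy_joules(sample(15))  # 15 s polling, as with the IPMI metric
missed = fine - coarse              # energy of the burst that was never seen
```

In this constructed case the 15 second polling never samples the burst, so the estimate is the pure idle energy. With monitoring intervals in the order of minutes and long-running load, the relative error would be much smaller, which is consistent with the remark that the impact on real Cloud systems is less dramatic.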
2.4.5 Infrastructure Manager

The Infrastructure Manager (IM) is the connection to all bare metal hardware resources. It provides access to and manages all the virtualized compute, storage and networking resources.

2.4.5.1 Motivation

Since the Infrastructure Manager is the link between the ASCETiC Cloud stack and the hardware resources, it is one of the key components supporting energy efficient Cloud computing. It further builds the foundation for the inter layer Cloud stack self-adaptation and provides the corresponding technologies, components and APIs. The key requirement coming from the VM Manager is that the IM needs to support VM live migrations in order to enable inter layer self-adaptation mechanisms on the IaaS layer. Thus, the challenge is to support these live migrations by using state of the art technologies to provide fast and stable VM migrations between different physical host nodes. This will provide an appropriate environment for a comprehensive energy efficient Cloud computing concept, as aimed for by the ASCETiC project in general.

2.4.5.2 Background

VM live migration is a well-explored topic [132] and is available in almost all Infrastructure Manager platforms [133], such as VMWare. It is usually used for maintenance cycles and host or operating system updates. Live migration allows, for example, customer VMs to be migrated without turning them off or affecting them with any other noticeable downtime.

2.4.5.3 Contributions

In order to provide the ASCETiC Cloud stack in general, and the VM Manager in particular, with fast VM live migrations, a special testbed and IM setup is used at TUB. The IM is a modified OpenStack installation using an SDN access and aggregation network in combination with the latest CephFS beta release. This setup allows fast migrations with minimal impact on the services provided by the VMs located in the testbed.
CephFS is deployed and distributed over ten dedicated nodes, which host the Ceph object storage and monitors. Each of these host nodes contributes two physical hard disks to the DFS object storage. This enables fast VM memory and disk hand-overs between nodes, fulfilling the self-adaptation criteria with regard to the IM and testbed.

2.4.5.4 Conclusions

The current IM deployment, based on OpenStack in combination with CephFS as the DFS, covers all requirements for the ASCETiC Cloud stack self-adaptation on the IaaS layer. The Y2 use-cases will show how efficiently and stably this mechanism works with the currently available software releases.

2.4.5.5 Future Work

Software-defined networking based VM migrations with dynamic QoS allocation are a new and promising application to further speed up the migration process [134]. We plan to further investigate these opportunities since the Y2 testbed now supports a completely SDN enabled network resource
management and we have already done some work [135][136] in this particular area.

2.4.6 IaaS SLA Manager

2.4.6.1 Motivation

The SLA Manager at the IaaS layer allows the negotiation of the resources and power consumption of the virtual systems where applications will be deployed. The IaaS SLA Manager, upon agreement with its corresponding PaaS SLA Manager, produces a contract in the form of a list of SLA agreements: it includes the computational resources agreed, their maximum energy consumption and the price that the consumer will pay for using the resources. The SLA Manager supports the standard OVF specification to declare computational resources. It also supports SLA monitoring, to enable enforcement policies. However, the enforcement is performed by the Self Adaptation Manager component and not by the SLAM itself.

2.4.6.2 Related Work

The negotiation of resources between customers and infrastructure providers is a relevant area of interest in Cloud computing. In particular, results from two projects have been used in the ASCETiC SLA Managers. One project is SLA@SOI, which defines a flexible, extensible, machine readable SLA model for describing arbitrary SLAs. It also provides a reference architecture to define and negotiate SLAs for different business cases. The second project is Contrail [54], a Cloud software stack made of components that work together to group independent Clouds into one integrated federated Cloud provider. The Cloud federation can split work, if possible, and split the associated SLAs, thus distributing the work over the resource providers that (best) meet the SLAs. The unit of packaging and distribution is a so-called OVF Package, which may contain one or more virtual systems, each of which can be deployed to a virtual machine.
2.4.6.3 Contributions

The novel capabilities of the IaaS SLA Manager are:

Extensible SLA Terms: the IaaS SLAM integrates different features in order to provide a novel approach to SLA infrastructure negotiation based on energy awareness. In particular, it supports both energy and performance SLA terms in order to meet the virtual system requirements needed by applications to ensure consistent performance while, at the same time, providing the energy awareness capabilities required by infrastructure providers in order to optimize the usage of their data centre resources. This is the current list of supported terms:

Name                  Description
vm_cores              Number of cores assigned to a VM
cpu_speed             CPU frequency assigned to a VM
Memory                RAM size assigned to a VM
disk_size             Disk size assigned to a VM
power_usage_per_vm    Power usage for each VM
energy_usage_per_vm   Energy usage for each VM
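The energy-related terms in the list above can be read as upper bounds agreed for a VM, in line with the "maximum energy consumption" mentioned in the motivation. A minimal sketch of checking one measurement against such bounds is given below. The `Violation` structure, the `check_measurement` function, the threshold values and their units are all assumptions made for illustration and do not describe the actual IaaS SLAM code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Violation:
    """Hypothetical record describing one violated SLA term."""
    vm_id: str
    term: str
    measured: float
    threshold: float


def check_measurement(vm_id: str, term: str, measured: float,
                      thresholds: Dict[str, float]) -> Optional[Violation]:
    """Return a Violation if the measured value exceeds the agreed upper bound."""
    limit = thresholds.get(term)
    if limit is not None and measured > limit:
        return Violation(vm_id, term, measured, limit)
    return None  # term not negotiated, or measurement within the bound


# Assumed agreement: at most 20 W and 500 Wh per VM (illustrative values).
agreed = {"power_usage_per_vm": 20.0, "energy_usage_per_vm": 500.0}

within = check_measurement("vm-1", "power_usage_per_vm", 18.5, agreed)
breach = check_measurement("vm-1", "power_usage_per_vm", 23.0, agreed)
```

Keeping the agreed terms in a plain mapping means that new terms can be added without changing the check itself, which fits the "extensible SLA terms" idea.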
Monitoring capabilities: the IaaS SLAM supports SLA monitoring, to enable enforcement policies. In this way, it is possible for ASCETiC to react to SLA violations and to execute appropriate remediation actions so that the violated term returns below the negotiated threshold as soon as possible. A violation acts as a trigger for the self-adaptation performed by other components. The IaaS SLAM subscribes to the event queue and waits for VM events. When a VM Startup Event is retrieved, the SLAM needs to recover all of the information about this VM from the VM Manager. The IaaS SLAM instantiates a subcomponent (the IaaS Violation Checker) which subscribes to the queue where the Infrastructure Monitor writes the measurements (and to the VM events queue), retrieves them and compares them with the thresholds. Whenever a violation is identified, a Violation Notification Event is published on a given queue. The monitoring process terminates when a VM shutdown event is retrieved from the event queue.

2.4.6.4 Conclusions

The current implementation of the IaaS SLA Manager handles a set of terms inside the negotiation process, introducing concepts that satisfy the energy or performance needs of the virtual systems negotiated and allowing the infrastructure manager to support different business models and scenarios for the IaaS and PaaS layers. The IaaS SLAM supports SLA monitoring, to enable enforcement policies. Future releases will not only monitor SLA violations but, by using threshold monitoring, will also inform the IaaS layer before a violation occurs in order to enable proactive adaptation.

2.5 Overall ASCETiC System Flow

2.5.1 OVF Library and Interoperability Experimentation

The energy consumption of Cloud computing continues to be an area of significant concern as data centre growth continues to increase. This section reports on interoperability within the architecture.
The architecture supports interoperability through the use and implementation of the Open Virtualization Format (OVF) standard [137] within the OVF API component. OVF is an open standard for defining, packaging and distributing virtual appliances that can run virtualized on a Cloud. Initial performance evaluation results of the architecture are presented. The results show that implementing Cloud provider interoperability is feasible and incurs minimal performance and energy overhead during application deployment in comparison to the time taken to instantiate Virtual Machines.

2.5.2 Objectives

The objectives of the experiments are to ascertain the viability of using OVF as a means to enable interoperability between Cloud providers while capturing energy requirements and having minimal impact on the energy consumption of the Cloud system as a whole. Thus the performance (response time) of the service life cycle phases, rather than the energy characteristics of the architecture, is the focus, since the longer a phase lasts the more energy is consumed. However,
insight is provided on the power consumption of deploying a VM versus the overhead of using OVF.

2.5.3 Testbed

To perform the experiments outlined in this section, a Cloud testbed was used. The Cloud testbed is located at the Technische Universität Berlin (see Figure 37). The computing cluster consists of sixteen nodes. Each of these nodes is equipped with two quad-core 2.66 GHz processors, 32 GB of RAM, 750 GB of local hard disk capacity and an IPMI card for administration. Each node is connected to two different networks and is able to transfer at the full speed of one Gbit/s on both simultaneously. The first network is dedicated to infrastructure management as well as regular data exchange between the nodes. The second network is reserved for storage area network usage, where storage nodes are accessible through a distributed file system. While some hardware information is obtainable through IPMI, we measure the energy consumption of each node just before the power supply unit. Each energy meter can measure voltage, current and power consumption. We use identical energy meters to guarantee comparable measurements.

Figure 37: TUB Cloud Testbed

The actual devices are Gembird EnerGenie Energy Meters [87] that share their measurements over the local network. These devices can measure power up to 2500 Watts with an accuracy of 2% and are able to deliver two measurements per second. A dedicated node collects all measurements regularly and can share the aggregated information with monitoring components. Additionally, the Cloud testbed deploys OpenStack [88] to manage virtual infrastructure and Zabbix [43] to store monitored data. The VMM component within the architecture was configured to use an energy aware scheduling algorithm that tries to minimise the energy consumption of newly provisioned VMs.
The application we have chosen to fulfil the objectives of the experiments is a generic three tier web application. The three tier web application is composed of a set of VM images as illustrated in Figure 38. The load balancer image, implemented via HAProxy [89], distributes load between application servers. These application servers each comprise a single JBoss [90] web container running within a Java VM and have a photo album application pre-installed. The photo album application stores and retrieves data within a single MySQL [91] image.

Figure 38: Three Tier Web Application Architecture.

2.5.4 Application

To ascertain the feasibility of using OVF and showcase the energy awareness of the architecture in the life-cycle of a Cloud application, the experimental results in the subsequent section measure the time taken to complete each life-cycle phase. Additionally, the energy consumption and resource usage of the deployed VMs are recorded. The experiments record the time taken to run the Cloud application through its life-cycle in five different VM configurations, each performed over ten iterations to reduce variance. Each configuration of VMs requires the use of both the load balancer and the MySQL backend, after which the number of JBoss application server instances is incremented from one to five VMs. In addition to this, the per-VM attributed energy consumption and CPU utilization are recorded as part of the standard functionality of the architecture, amongst many other metrics and KPIs that are readily available for retrieval.
2.5.5 Results

Figure 39: Application life-cycle phases against the number of application server instances.

Figure 39 shows the time associated with each phase of the life-cycle of an active application. The general performance message from the experimental results is that increasing the number of VMs impacts different life-cycle phases in different ways. From the graph it can be seen that, as the number of application server instances increases from 1 to 5 VMs, the time taken by the Submission and Negotiation phases using the OVF application descriptor remains negligible. The time to contextualize is constant at around 5 seconds. The Deployment and Undeployment time of the VMs via OpenStack increases as the number of application server instances increases and, oddly, the initialization time of the application decreases. This decrease in the initialization phase time is an artefact that can be attributed to the specific application and to the mechanism used to detect when initialization of the application is complete. It is due to the non-deterministic order in which the application server instances register with the HAProxy load balancer, whereby only a single instance is required for the application stack to be functional. As the number of instances increases, there is a higher probability that one of these instances will be online early and available to service requests.
Figure 40: Linear relationships of application life-cycle phases against the number of application server instances.

From Figure 40 the linear relationships between the time associated with the completion of each life-cycle phase and the number of application server instances can be seen. A full set of the data from the experiment, including the standard deviation of phase times, can be seen in Table 3.

Table 3: Application life-cycle phases against number of application server instances.

                    Application Server Instances
Phase               1                  2                  3                  4                  5
Submission          0.4s (+/- 0.59)    0.4s (+/- 0.40)    0.2s (+/- 0.40)    0.2s (+/- 0.40)    0.2s (+/- 0.40)
Negotiation         0.2s (+/- 0.40)    0.0s (+/- 0.40)    0.2s (+/- 0.40)    0.2s (+/- 0.40)    0.4s (+/- 0.49)
Contextualization   5.4s (+/- 0.80)    5.0s (+/- 0.80)    5.4s (+/- 0.80)    4.4s (+/- 0.80)    5.4s (+/- 0.80)
Deployment          56.0s (+/- 1.26)   72.4s (+/- 0.98)   87.8s (+/- 0.98)   103.2s (+/- 2.32)  120.0s (+/- 1.26)
Initialization      122.4s (+/- 3.61)  108.2s (+/- 3.72)  93.4s (+/- 3.72)   79.0s (+/- 2.85)   70.2s (+/- 4.02)
Undeployment        33.0s (+/- 1.10)   41.0s (+/- 0.80)   50.4s (+/- 0.80)   60.4s (+/- 1.50)   67.8s (+/- 0.75)

From the standard deviation it can be seen that the variance between experimental iterations is minimal.

Finally, Figure 41 shows the aggregated power consumption and CPU utilization as a moving average for a single deployment of the three tier web application using 2 application server instances and its other associated VMs. This provides early insight into the overheads of using the ASCETiC architecture. From the graph it can be seen that deployment and initialization of the virtual machines account for the majority of the energy consumed through the time period 5-250 seconds. The energy consumption directly attributable to the architecture, in the initial phases of submission, negotiation and contextualization that utilize an OVF description, is minimal.
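The linear trend reported for Figure 40 can be checked directly against the mean values in Table 3. The sketch below fits an ordinary least-squares line to the Deployment and Undeployment rows; the helper `linear_fit` is written for this illustration and is not part of the ASCETiC toolbox.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Mean phase times in seconds from Table 3, for 1 to 5 application servers.
instances = [1, 2, 3, 4, 5]
deployment_s = [56.0, 72.4, 87.8, 103.2, 120.0]
undeployment_s = [33.0, 41.0, 50.4, 60.4, 67.8]

deploy_slope, deploy_intercept = linear_fit(instances, deployment_s)
undeploy_slope, undeploy_intercept = linear_fit(instances, undeployment_s)
```

With the Table 3 values, each additional application server adds roughly 15.9 seconds to deployment and 8.9 seconds to undeployment.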
Figure 41: Power and CPU Utilization during deployment

2.6 Other Components

2.6.1 SaaS Application Packager

This component is in charge of packaging non-Programming Model applications provided in XML format and adapting them into a format that the ASCETiC PaaS layer understands, so that this layer is able to deploy the application. The Application Packager is implemented as an Eclipse IDE plugin that the user operates through a graphical user interface, which makes the process transparent and easy from the user's point of view. Without this component these changes would have to be made by hand, which is tedious and error-prone.

2.6.1.1 Motivation

The Application Packager is motivated by the need to collect the annotated applications coming from the modelling tools and convert them to the native OVF format used by the rest of the ASCETiC layers.

2.6.1.2 Contributions

The Application Packager coordinates with the Chef repository and the VMIC to prepare the images necessary to deploy the user application. It also starts the deployment of the application to the PaaS layer and notifies the user of the possible SLA agreements that the PaaS layer has reached with the IaaS infrastructures.

2.6.2 SaaS Virtual Machine Image Constructor

The Virtual Machine Image Constructor (VMIC) is responsible for the generation of images within the ASCETiC toolbox.

2.6.2.1 Motivation

The VMIC is motivated by the need to support self-adaptation by automating service construction at the SaaS level. This reduces a Cloud
application's development life-cycle and enables changes to an application to be made faster for the purpose of improving energy efficiency.

2.6.2.2 Contributions

Although the VMIC by itself is not scientifically novel and does not provide any scientific findings, the component does support a self-adaptive software development process in the ASCETiC SaaS SDK layer of the architecture. This is achieved through the automation of image construction. Without it, the burden and cost of iteratively adapting an application to use less energy during software development, through incremental out-of-band (of normal application operation) test deployments, would be too high. In addition, the VMIC could be considered a contribution from the perspective of Software Engineering. No current software solution provides capabilities to both generate base images that contain a functional operating system and install and configure a Cloud application automatically. For comparison, Packer [92] is a free open source tool that can be used to create golden images for multiple platforms from a single source configuration but does not provide support for the automated installation of software into these images. Another tool with similar functionality to the VMIC is Vagrant [93]. This tool enables software development teams to create identical development environments but does not provide a mechanism to automate the deployment of software into these environments.

2.6.3 Code Optimizer Plug-in

The Code Optimizer Plug-in (COP), currently a standalone plug-in component still in active development, will play an essential role in reducing the energy consumed by an application. This is planned to be achieved through the self-adaptation of the software development process, by giving SaaS Java software developers the ability to directly understand the energy footprint of the code they write.
2.6.3.1 Motivation

The key motivation behind the COP is the lack of Java tools available to software developers to assess and profile the energy consumption of the applications they create. Current tools support either performance profiling or energy awareness on mobile platforms only.

2.6.3.2 Related Work

Two works relate strongly to the functionality and potential scientific outcome of the COP component. They go some way towards providing application assessment and profiling at the software development level but have deficiencies. The first, JVM Monitor [23], is a general-purpose Java profiler integrated with Eclipse that monitors the CPU, thread and memory usage of Java applications but does not consider energy consumption. The second, JouleUnit [24], is an Eclipse-based workbench that provides tools for the visualization of energy profiling results on a per-unit-test basis and has been designed for the Android mobile operating system.

2.6.3.3 Scientific Contributions

The proposed novelty beyond the SotA of this component is in its generic Java profiling capabilities (above those available in the discipline of mobile computing), which enable the energy assessment of code out-of-band of an application's normal operation within a developer's IDE.
2.6.3.4 Future Contributions

Future contributions are expected to cover the creation of an energy model that, once calibrated on a software developer's local machine, will enable runtime prediction and energy attribution. This will take the form of static code analysis and runtime energy profiling. The static code analysis functionality will enable the detection of energy consumption hot spots, while the runtime energy profiling functionality will make it possible to run an application outside of normal operation and ascertain its power consumption by feeding profiled performance metrics into an energy model.

2.6.3.5 Conclusions

To conclude, whilst the COP component is still under development, there are many exciting avenues of research that can fill the current energy-awareness gap in the software development of Java Cloud-based applications.

2.6.4 PaaS Application Manager

2.6.4.1 Motivation

The Application Manager (AM) component manages user applications that are described as virtual appliances, formed by a set of interconnected Virtual Machines (VMs). It has been designed to fulfil the following requirements:

- Allow the deployment of applications composed of several VMs.
- Manage the lifecycle of an application running in one or several Infrastructure Providers.
- Enable the deployment of an application based on energy-aware requirements.

It uses the standard Open Virtualization Format (OVF) document for users to define their applications.

2.6.4.2 Related Work

Updating the related work already presented in D3.1, we present a series of works that advance the state of the art in Application Management in the Cloud, or towards which the market and research appear to be moving. Kerstesz et al [128] employ the traditional MAPE loop: monitoring, analysis, planning and execution. In [129], Carrasco et al.
present an extended TOSCA framework that can deploy applications into heterogeneous Cloud infrastructures. Cloudify version 3.0 [130] uses TOSCA as its native way to define an application in a Cloud infrastructure, moving away from other non-standard description languages. As described in the previous version of this document, Cloudify is able to manage the orchestration of an application in a Cloud environment. The Apache Foundation released the Brooklyn project [131], which aims to help developers manage, from their application, a deployment in a Cloud environment by using a collection of runtime policies.

2.6.4.3 Contributions

Although the Application Manager by itself is not scientifically novel and does not provide any scientific findings, it has a central role in coordinating the work of the different PaaS layer components: PaaS Energy Modeller, PaaS SLA Manager, PaaS Self-Adaptation Manager and Price Modeller. During this year the main focus of the development of the Application Manager was the integration of all
those components to support a multi-provider scenario (more information is given below). The Application Manager offers a REST interface to the user. Using this interface the user can submit (from the ASCETiC SaaS layer) an application to be deployed into the ASCETiC IaaS provider. The application needs to be defined in the standard OVF format. The Application Manager also offers a Graphical User Interface, which is detailed in the next section of this component. At the same time, the Application Manager publishes messages about the status of, or changes in, an application deployment to a message queue, so that other components can react to them. Also, during this second year, the Application Manager added support for elasticity actions, which can be used either by the user or by the PaaS Self-Adaptation Manager.

Multi-provider workflow

Figure 42: Communications workflow in a multi-provider scenario (this figure is a simplification of the ASCETiC Y2 architecture, intended only to explain the multi-provider scenario at PaaS level). Components shown: PaaS Application Manager, PaaS AMQP Broker, Provider Registry, PaaS SLA Manager, Application Monitor, IaaS Providers, VM Manager, IaaS AMQP Broker, IaaS SLA Manager, Physical Hosts, VMs and monitoring probes.

We can identify two different workflows for the multi-IaaS provider scenario at PaaS level: the first at the moment of deployment and the second after the application has been deployed. These are the steps for the deployment scenario:

1. When the Application Manager receives an OVF, it creates an OVF template from it and passes it to the PaaS SLA Manager.
2. The PaaS SLA Manager checks the available IaaS providers by querying the Provider Registry. It negotiates with each of them, and the best offer can be selected manually or automatically.
3. After the provider has been selected, the Application Manager can start the preparation of the images for that provider and deploy them, checking the endpoints with the Provider Registry.

Once the application has been deployed, the following interactions can happen:
- The Application Manager subscribes to the IaaS AMQP Broker and resubmits all the relevant messages for the deployed VMs to the PaaS AMQP Broker.
- For messages related to the energy and power consumption of VMs, the Application Manager stores that information in the Application Monitor at PaaS level.
- The PaaS SLA Manager monitors the application deployment continuously to verify that the agreement terms are met as expected.
- The deployed application VMs can have application monitoring probes, installed by the user, that send information directly to the Application Monitor at PaaS level.

2.6.4.4 Future Contributions

The future work of the Application Manager will focus on exposing to the SaaS layer new functionality of the different components it integrates, and on supporting the deployment of an application in different providers. Although initially planned for this second year, a study of TOSCA as an application definition format, together with initial basic support for it, will be carried out at the level of the Application Manager during the third year.

2.6.4.5 Conclusions

In summary, the Application Manager coordinates the workflow and the interactions between the different PaaS components, with the exception of the Application Monitor. It verifies that each step for an application submitted from the SaaS layer has been successfully executed before passing to the next step. After the application is deployed, it is the entry point for the user to obtain the details of the application in the IaaS provider: to get information about the VMs that compose the application, or to access the PaaS Energy Modeller for energy estimations or consumption reports.

2.6.5 Application Monitor

2.6.5.1 Motivation

The Application Monitor allows the applications and processes that are running inside the virtual machines to push information related to the events and metrics that are important to achieve energy-awareness of the applications.
The application-related metrics cannot be measured by the IaaS layer, since that layer is not able to measure the events that occur inside a VM and because there is no single pattern of metrics/events for all the potential applications that can run in the Cloud. The Application Monitor has been designed to fulfil the following requirements:

- Heterogeneity: every application has its own set of metrics and the PaaS/IaaS layers are unaware of them. The Application Monitor must be flexible enough to allow the client applications to define their own set of metrics and their respective data structures.
- High frequency/low latency: many applications that access the Cloud will push their metrics simultaneously. Some applications may send metrics at a relatively high frequency (every second). It is important to support a high frequency of metric pushes while keeping a low response time to the clients.
- Scalability: the chosen solution must be scalable, so that the low latency requirement can be met by adding new hosts.
- Support for analytics: to minimize network traffic, the Application Monitor must support in-situ analytics of its information.
- Decoupling and easy integration: the Application Monitor must be able to work in a tightly coupled architecture, such as the ASCETiC PaaS layer, and also be easily integrated with a wide range of applications and components.
- Visualization: the ability to visualize graphical information about the application metrics, in the form of time-series graphs.

In addition to the technical implementation of the Application Monitor component, the work within this component also includes the definition and analysis of the set of metrics that describes the performance of a given application, as well as its implementation as Application Probes within the client-side application framework.

2.6.5.2 Related Work

There are many existing software components that partially fulfil the requirements of the Application Monitor. However, none of the following related works implements the full set of features that the ASCETiC Application Monitor needs in order to fulfil the project requirements. RRDTool [94] is a data logging and graphing system for time-series data that stores the information in a table-based Round-Robin Database (RRD). While RRDTool performs well on small and medium-sized installations, some experiments reported that RRDTool does not scale to a Cloud-scale solution because of the high number of system I/O calls [95]. JRobin [96] is a Java reimplementation of RRDTool that aims to enhance the scalability of RRDTool by implementing robust cache and threading support. Leaving aside the scalability issue, both RRDTool and JRobin store their information in a single, big table. The heterogeneity of the potential applications requires defining more complex data structures, as stated in the previous section. Cube [97] is a time-series data collection and analysis framework that is essentially very similar to the ASCETiC Application Monitor.
It is implemented in NodeJS [98] and uses a MongoDB backend; it allows clients to push timestamped JSON documents and provides an interface for their later analysis. We decided to discard Cube as a candidate for implementing our Application Monitor because, while NodeJS claims to be a highly scalable solution for web services, we argue that it does not allow low latency for CPU-intensive applications such as database analytics. The main reason is that NodeJS does not support multithreading. The other main argument against Cube is that its analytics are performed at the NodeJS service layer, while our Application Monitor performs this task in the MongoDB backend, thus maximizing scalability (since MongoDB implements MapReduce) and minimizing the transfer of data between the service and data layers. Because of the heterogeneous nature of Cloud applications, there is no uniform set of probes and metrics to monitor the performance of every type of application deployed within the ASCETiC infrastructure. One of the main contributions of the ASCETiC Application Monitor is to enable Cloud applications to define a schema-free set of documents that can be reported from the Application Probes to the Application Monitor for later retrieval and analysis. Vaquero et al. [100] define a set of scalability metrics for several types of applications, from the point of view of a PaaS provider. However, their metrics
are defined mostly in terms of low-level metrics (network, compute, disk, ...) and how they impact the PaaS Quality of Service. The work of Singh et al. [101] intends to generalize software metrics for a SaaS platform. It differentiates between software-specific and SaaS-specific metrics. It evaluates high-level, business-related metrics such as cost in order to enforce the economic feasibility of the Cloud. However, the purpose of our metrics work in ASCETiC is more related to low-level metrics, in order to optimize the performance of the applications in terms of energy. The work of Yazbek et al. [102] differentiates service-oriented measurement in different layers (development, IT, Business, Marketplace, ...). It describes an infrastructure for monitoring Cloud applications in different aspects and defines a semantic Metric ontology as well as the complete process cycle to evaluate the proposed metrics.

2.6.5.3 Scientific Contributions

There is no real scientific work in the Application Monitor, in the sense of providing contributions to the current state-of-the-art knowledge. The Application Monitor is, however, an essential piece of engineering work that offers an innovative set of features to fulfil its requirements.

Heterogeneity

MongoDB is a document-based database that allows storing and analysing JSON documents, which gives applications the flexibility to define their own monitoring data structures according to the users' preferences.

Low latency

The services frontend has been built with the Play! Framework [99], a lightweight Scala/Java framework which, unlike NodeJS-based and similar frameworks, allows multithreaded services for CPU-intensive workloads. Most of the calculations have been moved to the database layer to minimize the data transfer between hosts. We also take advantage of the analytic mechanisms bundled with MongoDB. This also helps to fulfil the analytics requirement stated previously.
The database has been configured to be round robin, overwriting the oldest data when the database reaches a maximum size and new data is inserted. Given the MongoDB implementation, this has a positive impact on performance.

Scalability

Both Play! and MongoDB are designed for scalability. In addition, both the services and the data structures have been designed to avoid any type of dependency between them, so an operation in a given node/thread should not negatively impact the performance of the others. The scalability studies of the ASCETiC Application Monitor have already been described in Deliverable 3.1 of this project [103].

Support for analytics

The Application Monitor supports all the analytics capabilities provided by the MongoDB analytics engine, which accepts JSON queries to transform and aggregate the stored documents. To make the analytics easier for users and programmers, a new query language has been created for this component: MongoAL
(MongoDB Aggregation Language) [104], a human-readable, SQL-like language that softens the learning curve of MongoDB-based analytics.

Decoupling and easy integration

The probes can send their information through simple REST services. The public source repository of the Application Monitor [105] provides many example scripts for sending monitoring metrics, as well as a plugin to integrate the application probes with the DropWizard Metrics reporting library. To decouple the integration with the rest of the components of the PaaS layer and to allow other service architectures in addition to the classical RPC-style patterns, the Application Monitor has been integrated with the Advanced Message Queuing Protocol (AMQP) 1.0, which allows other components to subscribe to metrics-related events (periodical values, deployments, undeployments, etc.) [107].

Visualization

The Application Monitor aims to be a user-friendly application that allows users to retrieve real-time information about applications and metrics (Figure 43 and Figure 44). It uses state-of-the-art web technologies to provide a usable and modern user interface:

- AngularJS [108] to provide a highly dynamic environment.
- Bootstrap [109] libraries to allow a responsive look and feel that self-adapts to the device and the screen size.
- D3.js [110] and HighCharts [111] for dynamic visualization of information.

Figure 43: Application Monitor main dashboard
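As an illustration of the "simple REST services" used by the probes, the sketch below builds a schema-free JSON event and the HTTP request that would carry it. The URL, the field names (`appId`, `nodeId`, `timestamp`, `data`) and the helper names are hypothetical placeholders chosen for this example; the real interface is the one documented in the Application Monitor repository [105].

```python
import json
import time
import urllib.request

# Hypothetical endpoint; replace with the address of a real deployment.
MONITOR_URL = "http://appmonitor.example:9000/event"


def build_event(app_id, node_id, data, timestamp_ms=None):
    """Assemble a schema-free event document (field names are assumptions)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {"appId": app_id, "nodeId": node_id,
            "timestamp": timestamp_ms, "data": data}


def build_request(event, url=MONITOR_URL):
    """Wrap the event in an HTTP POST request with a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(event, url=MONITOR_URL):
    """Send the event and return the HTTP status code."""
    with urllib.request.urlopen(build_request(event, url), timeout=5) as response:
        return response.status
```

Because the `data` field is free-form, each application can report its own metric structure, which is the heterogeneity requirement stated earlier; `send` is only useful against a running service.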
Figure 44: Application Monitor dynamic time-series graphing

2.6.5.4 Future Contributions

We plan to add machine-learning mechanisms to implement watchdogs that would warn the ASCETiC components, as well as the application administrators, of some events in advance (e.g. a metric is about to exceed a given threshold). This task would be done with the support of the Energy Modeller components.

2.6.5.5 Conclusions

The Application Monitor brings to the ASCETiC ecosystem the capability to record and analyse a broad range of application-level metrics. We have chosen a tailored solution because existing data logging systems do not fulfil all the requirements the ASCETiC project has with respect to the monitoring of application metrics. While the Application Monitor intends to be as generic as possible in order to handle all types of metrics, the probes that run on the application side must be specific to each application. This means that application owners must define them according to their concrete interests. Depending on the type of application, the probes may be executed at different levels. For example, metrics related to process information (e.g. the %CPU consumed by a given set of processes) will run as an independent script within the VM; metrics related to concrete events (e.g. a batch task starts/ends, a service request has been received) must be implemented within the application itself. In the latter case, it is not necessary to modify the software directly in order to support the Application Monitor, but the application owners must integrate some type of proxy, HTTP filter or dependency injection to tightly integrate the probes with their applications.

2.6.6 PaaS Provider Registry

This component is in charge of storing the specific details of the different IaaS providers.

2.6.6.1 Motivation

The PaaS layer of ASCETiC can connect to one or several IaaS providers that are used to deploy the different applications.
It is necessary to store in a registry the specific information for each one of these providers.
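A registry of this kind, and its use in the deployment steps of section 2.6.4.3, can be sketched as follows. This is an in-memory illustration only: the class and field names are hypothetical, and choosing the cheapest offer is an assumption standing in for whatever criterion the PaaS SLA Manager or the user actually applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Details stored per IaaS provider (field names are assumptions)."""
    provider_id: int
    name: str
    vm_manager_url: str
    sla_manager_url: str
    amqp_url: str


class ProviderRegistry:
    """In-memory stand-in for the registry database."""

    def __init__(self):
        self._providers = {}

    def register(self, provider):
        self._providers[provider.provider_id] = provider

    def get(self, provider_id):
        return self._providers[provider_id]

    def all(self):
        return list(self._providers.values())


def select_best_offer(offers):
    """Pick the provider id with the lowest offered price (assumed criterion).

    offers maps provider_id -> price returned by the negotiation step.
    """
    return min(offers, key=offers.get)


if __name__ == "__main__":
    registry = ProviderRegistry()
    registry.register(Provider(1, "provider-a", "http://a.example/vmm",
                               "http://a.example/sla", "amqp://a.example"))
    registry.register(Provider(2, "provider-b", "http://b.example/vmm",
                               "http://b.example/sla", "amqp://b.example"))
    chosen = registry.get(select_best_offer({1: 12.0, 2: 9.5}))
    print(chosen.name, chosen.vm_manager_url)
```

The lookup after selection mirrors step 3 of the deployment workflow, where the Application Manager checks the endpoints of the selected provider before deploying.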
2.6.6.2 Contributions

There is no novelty beyond the SotA for this component. It is simply a necessary component so that the PaaS layer knows all the endpoints of the IaaS layer and their specific details. The actual implementation is essentially a REST interface to a database that the different PaaS components can query for the details of the available providers.

2.7 KPI and Metrics

2.7.1.1 Introduction

Management of KPIs, or SLA management, is a key feature of innovative cloud platforms because it enables cloud users to specify their application performance requirements in order to satisfy their business needs. At the same time, it allows cloud providers to allocate their resources in the most efficient way. In order to handle SLAs, each cloud layer needs to collect metrics and calculate KPI values to achieve its objectives. For instance, the IaaS layer, which is interested in optimizing resource utilization, supports metrics related to the utilization of infrastructure resources. The PaaS layer, whose interest is in application performance, collects metrics related to that performance regardless of the infrastructure, such as the elapsed time to perform specific operations. This growing level of abstraction increases the complexity of defining and measuring metrics and KPIs. As a result, existing energy-efficiency techniques require adaptation in order to bring benefits in a stratified cloud computing infrastructure [112]. While virtual systems at the IaaS layer have well-defined characteristics (e.g. CPUs, memory, storage) and metrics (e.g. availability, power, CPU load, memory utilization), applications at the PaaS layer and services at the SaaS layer have a wide variety of characteristics and metrics. In fact, application metrics and KPIs depend on the services they implement (e.g.
web servers, CMS) and even on the way they are implemented (n-tier applications); this requires an approach for describing how to measure application behaviour (metrics) and performance (KPIs) that is consistent across different scenarios. Not only does each layer have its own goals, and hence its own metrics and KPIs; there must also be consistency across layers. This consistency enables the PaaS layer to negotiate with the underlying physical infrastructure (IaaS) for the resources it requires, as specified by the SaaS layer. This mechanism of metric and KPI translation is a fundamental capability, especially from the perspective of Y3, where the aim is to enable inter-layer optimization by allowing cooperation and translation of KPIs across the different ASCETiC Cloud layers. This will extend the benefits of per-layer optimization capabilities across the whole cloud stack, thus allowing cloud service providers to specify and negotiate their required service performance via SLA terms bound to application-specific KPIs. At the same time, it will be the role of the ASCETiC cloud stack layers to translate such performance needs into infrastructure requirements. This year, the first milestone has been to identify and implement, according to the role of each layer, the metrics and KPIs of interest. In particular, metrics specific to ASCETiC energy-aware services allow the IaaS layer to implement workload optimization capabilities, the PaaS layer to implement energy- and cost-aware services, and the SaaS layer to support users in defining requirements and optimization goals for their applications.
In the next sections we discuss the KPIs and related metrics that are of relevance for each Cloud layer. The discussion starts with the IaaS layer, then moves to the PaaS layer and finally to the SaaS layer, with an increasing level of abstraction and complexity of metrics and KPIs. Finally, this section concludes with a brief discussion of the important issue of handling KPIs and metrics across the Cloud platform.

2.7.1.2 IaaS

The IaaS layer is in charge of allocating VMs to physical hosts. This requires metrics such as VM-level and host-level power and energy consumption. Metrics at this layer therefore fall into several categories, namely:

- Energy and Power Metrics
- VM Manager Metrics
- General Infrastructure Monitoring Metrics

In order to compute such metrics it relies on the:

- Energy Modeller: to calculate the power and energy consumption of VMs and hosts.
- Virtual Machine Manager: to assign VMs to physical hosts, using as well as determining appropriate VMM performance metrics.
| Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers |
|---|---|---|---|---|---|---|
| Flops Per Watt | Mflops / Watt | Y2 | Energy Modeller | The benchmarked flops / maximum host power | Per Host | Static |
| Flops | Mflops | Y2 | Energy Modeller | The benchmarked flops, from SciMark2 | Per Host | Static |
| Total Current VM Power Usage | Watt (W) | Y2 | Energy Modeller | Sum of all VM allocated power | Per IaaS Provider | |
| Total Current Host Power Usage | Watt (W) | Y2 | Energy Modeller | Sum of the power consumption of all known hosts | Per IaaS Provider | |
| VM power to host power ratio | double | Y2 | Energy Modeller | iaas_total_vm_power / iaas_total_host_power | Per IaaS Provider | |
| Host power unallocated to VMs | Watt (W) | Y2 | Energy Modeller | iaas_unallocated_host_power | Per IaaS Provider | |
| Estimated Power | Watt (W) | Y2 | Emulated Watt Meter + Zabbix + Custom Scripts | As derived from linear or polynomial regression model + CPU utilisation | Per Host | Delay in CPU Utilisation |

Table 4: Energy and Power Metrics at the IaaS Layer

Flops Per Watt and Flops are metrics new to Year 2 that better describe a physical host. They are static properties defined during calibration; Flops Per Watt compares the speed of the physical host to the power that it consumes at full load.
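The formulas in Table 4 can be sketched in a few lines. The code below is illustrative only: the `Host` structure and the function names are hypothetical, unallocated power is taken as the power of hosts that currently run no VMs, and the regression coefficients passed to `estimated_power` are placeholder values rather than calibrated ones.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Per-host inputs (hypothetical structure for this sketch)."""
    name: str
    power_w: float            # current host power draw (W)
    max_power_w: float        # power at full load, from calibration (W)
    mflops: float             # benchmarked Mflops, e.g. from SciMark2
    vm_power_w: list = field(default_factory=list)  # power allocated per VM (W)


def flops_per_watt(host):
    """Benchmarked flops divided by maximum host power."""
    return host.mflops / host.max_power_w


def iaas_total_vm_power(hosts):
    """Sum of the power allocated to all VMs."""
    return sum(sum(h.vm_power_w) for h in hosts)


def iaas_total_host_power(hosts):
    """Sum of the power consumption of all known hosts."""
    return sum(h.power_w for h in hosts)


def vm_to_host_power_ratio(hosts):
    return iaas_total_vm_power(hosts) / iaas_total_host_power(hosts)


def iaas_unallocated_host_power(hosts):
    """Power drawn by hosts that currently run no VMs."""
    return sum(h.power_w for h in hosts if not h.vm_power_w)


def estimated_power(cpu_utilisation, coefficients):
    """Evaluate a polynomial power model: sum of c_i * utilisation**i."""
    return sum(c * cpu_utilisation ** i for i, c in enumerate(coefficients))


HOSTS = [
    Host("host-a", power_w=200.0, max_power_w=250.0, mflops=1000.0,
         vm_power_w=[60.0, 80.0]),
    Host("host-b", power_w=100.0, max_power_w=250.0, mflops=1000.0),
]
```

With the two example hosts, 140 W of the 300 W drawn is attributed to VMs and the idle host-b contributes 100 W of unallocated power, which is the kind of saving that consolidation could target.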
The infrastructure as a whole can be analysed with the current power metrics for all VMs, for all physical hosts, or with the ratio between these two values. The VM power to host power ratio can be used as a guide to determine how much power is not being apportioned to VMs. Equally, the metric Host power unallocated to VMs can be used in a similar fashion, as it is derived from hosts that are idle without VMs. These give a notion of how much power can be saved through consolidation and the switching off of physical hosts, but may also be used to ensure that the maximum amount of power is attributed to the VMs of the IaaS provider's users. The last power metric is called Estimated Power; its principal purpose is to aid scalability and to remove the need for a Watt meter to be attached to every physical host. It works by calibrating a model that maps host utilisation to power consumption and is further discussed in section 2.4.2.3. The VMM is an important part of the IaaS layer and it performs the assignment of VMs to physical hosts.
The metrics that it provides mainly describe the performance of the IaaS provider as a whole. These metrics are:

| Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers |
|---|---|---|---|---|---|---|
| vmm-rebalance-time | Seconds (s) | Y2 | VMM | Time needed to find a "rebalanced" VM placement | Per IaaS Provider | |
| vmm-consolidation-score | | Y1 | VMM | idle_hosts/hosts | Per IaaS Provider | |
| vmm-distribution-score | | Y1 | VMM | non_idle_hosts/hosts | Per IaaS Provider | |
| vmm-avg-deployment-time | Seconds (s) | Y1 | VMM | Avg time needed to deploy a VM (from deployment request to "active" state) | Per IaaS Provider | |
| vmm-current-vms-power | Watt (W) | Y1 | VMM | Current power consumption of all the VMs deployed | Per IaaS Provider | |
| vmm-current-vms-cost | | Y1 | VMM | Cost of all the VMs currently deployed | Per IaaS Provider | |

Table 5: IaaS VM Manager Performance Metrics

These metrics fall into several categories:

- Ratios: vmm-consolidation-score and vmm-distribution-score provide an indication of how well the VM manager is keeping resources busy.
- Timings: vmm-rebalance-time and vmm-avg-deployment-time provide measures of key events such as deployment duration and how long it takes to reschedule VMs.
- Power and Cost: the last category ensures that the VM manager can determine both the cost and the power consumption of the VMs deployed on the infrastructure.

The final category of metrics at the IaaS layer is the general infrastructure monitoring metrics. In addition to the metrics provided by default by the Zabbix monitoring infrastructure, the following metrics have been added:

| Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers |
|---|---|---|---|---|---|---|
| Spot CPU | double | Y2 | Zabbix + Custom Scripts | The current CPU utilisation | Per Host | Last second |
| Spot CPU per Core | double | Y2 | Zabbix + Custom Scripts | The current CPU utilisation for the nth core | Per Host | Last second |
| Network In | bytes | Y2 | Zabbix + Custom Scripts | Sum of all traffic coming into a node on the 'eth' adaptors | Per Host | Last second |
| Network Out | bytes | Y2 | Zabbix + Custom Scripts | Sum of all traffic coming out of a node on the 'eth' adaptors | Per Host | Last second |
| Disk In | bytes | Y2 | Zabbix + Custom Scripts | Sum of all traffic written to disk (all sd[a-z] disks) | Per Host | Last second |
| Disk Out | bytes | Y2 | Zabbix + Custom Scripts | Sum of all traffic read from disk (all sd[a-z] disks) | Per Host | Last second |
| Spot Cache Utilization - miss count | Integer | Y2 | Zabbix + Custom Scripts | The count of cache misses | Per Host | Last second |
| Spot Cache Utilization - miss fraction | double | Y2 | Zabbix + Custom Scripts | cache misses / cache references | Per Host | Last second |
| Spot Cache Utilization - references | Integer | Y2 | Zabbix + Custom Scripts | The count of cache references | Per Host | Last second |
| Spot Utilization - page fault | Integer | Y2 | Zabbix + Custom Scripts | The count of page faults on the host | Per Host | Last second |

Table 6: IaaS General Monitoring Metrics

In order to profile the utilisation of physical hosts and VMs correctly, various metrics have been created to obtain real-time measurements of the hosts. This avoids issues such as averaging over long periods, which makes it harder to compare metrics and to associate a metric with the power consumed. These metrics cover CPU (including on a per-core basis), network, disk and cache. The metrics typically measure the last second, although this is customisable, and they use scripts that rely on the Linux proc file structure as well as perf and LibVirt.

2.7.1.3 PaaS

Before discussing how the PaaS layer supports metrics and KPIs, it is important to clarify the key role of events within this layer. Events play an important role at the PaaS layer in defining application performance. They are created during the development phase and at deployment time, and they trigger interactions between the application and the ASCETiC cloud platform. Events triggered by the application are captured by probes installed inside the virtual machine and reported to the PaaS monitoring system. Events can be used to monitor the application behaviour: for instance, whenever it sends a query to the database or invokes an internal method or library.
Events allow behaviour or interactions with specific components (library, database) or subsystems (I/O system, OS call, external library) to be captured at the fine-grained application level, as opposed to the coarse-grained level of the virtual machine. In this way, events describe relevant behaviours of applications that can be measured by the PaaS layer. Such measurements include the time an event lasted, the resources it consumed, the number of times it was triggered, and its power consumption and cost. The PaaS layer also uses events to compute KPIs such as the average consumption, the cost, the number of events occurring within a period of time (e.g. events per hour) and the average time it takes an event to complete (event average duration).

The PaaS layer requires the inputs and outputs shown in Figure 45 in order to handle events and thus obtain KPI and metric values.

Figure 45: PaaS Layer KPIs and Metrics (diagram: the SaaS layer supplies application events and application SLAs to the PaaS layer; the IaaS layer supplies events data and consumption data; the PaaS layer outputs the current KPI values, the current cost per application, and the current and future energy per application)

The input from the SaaS layer:
• Application Events
• Application SLAs

Application events are triggered by applications and collected by the monitoring system. They can be used to calculate the average duration and the total number of events generated by a deployed application. Application SLAs provide the PaaS layer with a mechanism to define acceptable application performance. The PaaS layer monitors KPIs and compares their values with the SLAs specified by the SaaS layer in order to detect anomalies in the current application deployment and, if necessary, takes the required actions to address the issue and restore application performance.

In order to deliver its platform services, the PaaS layer requires measurements related to an application deployment:
• Events data (duration, number of events)
• Consumption data (power, energy)

Events data includes information such as the application component where the event occurred (the VM), the time when it started and the time when it ended. Together with event information, energy consumption information is required by the PaaS layer to calculate the energy spent by an application and its events. Power measurements are retrieved from the interface between the IaaS and PaaS layers, while event information is stored in the application monitor.
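The event KPIs and the per-event energy described above can be illustrated with a minimal sketch. It assumes that events are available as records with a VM, a type, and start and end times, and that power is available as (timestamp, Watt) samples. The names `Event`, `events_per_hour`, `average_duration` and `energy_wh` are hypothetical and are not part of the ASCETiC APIs. Energy is approximated by trapezoidal integration of the power samples that fall inside the requested interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: where the event occurred and when."""
    vm: str       # application component (VM) where the event occurred
    kind: str     # event type, e.g. "db_query" (illustrative label)
    start: float  # start time in seconds
    end: float    # end time in seconds


def events_per_hour(events, kind, period_s):
    """Number of events of one type, normalised to a one-hour period."""
    count = sum(1 for e in events if e.kind == kind)
    return count / (period_s / 3600.0)


def average_duration(events, kind):
    """Average duration in seconds of events of one type (0.0 if none)."""
    durations = [e.end - e.start for e in events if e.kind == kind]
    return sum(durations) / len(durations) if durations else 0.0


def energy_wh(samples, start, end):
    """Integrate (time_s, watts) samples lying within [start, end].

    Uses the trapezoidal rule between consecutive samples and converts
    Joules to Watt-hours. No interpolation is done at the interval edges,
    which is a simplification made for this sketch.
    """
    window = [(t, p) for t, p in samples if start <= t <= end]
    joules = sum((t1 - t0) * (p0 + p1) / 2.0
                 for (t0, p0), (t1, p1) in zip(window, window[1:]))
    return joules / 3600.0


samples = [(0, 100.0), (10, 100.0), (20, 200.0), (30, 200.0)]
events = [
    Event("vm1", "db_query", 0, 10),
    Event("vm1", "db_query", 20, 30),
    Event("vm1", "upload", 0, 30),
]
```

With these inputs, `energy_wh(samples, 0, 30)` integrates the power over the whole upload event, while `average_duration(events, "db_query")` averages the two query events.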
The PaaS layer provides the following information:
• Current KPIs for each application
• Current application cost
• Current consumption
• Predicted application consumption
• Predicted application cost

A KPI, computed at application deployment time, provides a value against which an SLA is evaluated to detect performance anomalies. In case of a violation, the PaaS layer can react to correct the suboptimal situation. The same applies to energy consumption and costs: they are also compared against values specified at the SaaS layer, in order to ensure that the application deployment and its consumption do not violate price and consumption SLAs. The following table reports the metrics that have been identified during the ASCETiC activities.

Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers
ENERGY_APPLICATION | WattHour (Wh) | Y1 | Energy Modeller | Integration of the power value over a time interval for the VM where the application is running | - | Available Samples Time
ENERGY_EVENT | WattHour (Wh) | Y1 | Energy Modeller | Integration of the power value over a time interval for events of the same type | - | Available Samples Time
POWER_APPLICATION | Watt | Y1 | Energy Modeller | Average of the power value over a time interval for the VM where the application runs | AVG | Available Samples Time
POWER_EVENT | Watt | Y1 | Energy Modeller | Average of the power value over a time interval for events of the same type | AVG | Available Samples Time
EVENT_COUNT | Total/Time | Y2 | Energy Modeller | Count the number of events of the same type over a given amount of time (hour, day) | COUNT | Available Events
EVENT_DURATION | Seconds | Y2 | Energy Modeller | Calculate the average duration for events of the same type | AVG | Available Events Timestamps
FixedPrice-per-VMType-per-hour | - | Y3 | Pricing Modeller | Calculate the price that a type of VM (database, frontend) has | - | Available Samples
Power-per-VM type per-hour | Watt | Y3 | Energy Modeller | Calculate the average instant power that a type of VM (database, frontend) has | AVG | Available Energy Samples

Table 7: PaaS Layer Metrics

The metrics at the PaaS layer can be split into two main categories:
• Application metrics, such as ENERGY_APPLICATION and POWER_APPLICATION, are collected to measure the current consumption of an application at the level of its virtual machine.
Such metrics can be calculated over the period of time during which the application has been deployed.
• Event metrics, such as ENERGY_EVENT, POWER_EVENT, EVENT_COUNT and EVENT_DURATION, are collected to measure performance characteristics of an application's events. They are intended to be used by the energy modeller to estimate the consumption per event and the events' performance (duration and number). In a future implementation, these metrics could be used to forecast application behaviour and to connect application-level KPIs to infrastructure KPIs.

The current PaaS layer handles events generated by the application in order to measure its consumption and costs. Future work will leverage events to enable KPI translation across the different Cloud layers. By defining application behaviour in terms of the performance of its events, the PaaS layer could enable optimization across all layers by translating SaaS metrics and KPIs into IaaS infrastructure requirements to be negotiated with the Cloud providers.

2.7.1.4 SaaS

Measurement at the SaaS layer can serve different purposes depending on whether it takes place at development time (initial development or refactoring) or at production time.

During a development or refactoring stage, measurement sessions can be used to study how the current SaaS application implementation performs against different quality criteria and to estimate how much it would cost to operate in production. For this purpose, the SaaS development team would perform measurement test sessions with a set of workloads crafted to represent targeted customer groups or to exercise particular application behaviour of interest. Measurement test sessions would then be conducted on an ASCETiC Cloud testbed whose underlying physical hardware is representative of what could be expected from many IaaS providers. Different deployment configuration alternatives of the SaaS application can be instantiated on the testbed, for instance with certain software components of the SaaS application collocated on the same VM or distributed across different VMs. The workloads crafted by the development team can then be exercised on each SaaS application instance corresponding to a feasible deployment configuration alternative.
Subsequently, measurement results can be used first to identify the deployment configuration alternatives with the best quality and operational cost profile and second to determine realistic KPI thresholds for the various metrics of interest. If none of the deployment configuration alternatives studied achieves the desired quality level for certain quality criteria, or an acceptable operating cost, then refactoring will most likely be needed until a solution that is effective for business operation is identified.

It is worth mentioning that during development time, given that measurement test sessions are fully controlled, it is possible to obtain more accurate measurements for certain metrics. For instance, it is possible to craft a workload that targets a specific isolated behaviour, which would be impossible to achieve in production. Furthermore, the SaaS development team may also decide to probe their own application to collect measurements on specific sub-components or tasks of the application. While such measurements may also be of interest in production, in certain cases they are exploratory and their purpose is to help the SaaS development team identify particular aspects of the SaaS application implementation that would benefit from refactoring.

Importantly, during development time, the main objective of measurement test sessions is to help the SaaS provider, along with the SaaS development team, to specify a utility/evaluation function for each aspect that will later be measured at production time. As highlighted in Section 2.2.2, elaborating desirable and realistic utility functions, even on a single quality criterion, is a difficult exercise if done without referencing measurement data. Thus, providing a combined utility function that captures desirable trade-offs between quality criteria and operational cost usually proves impossible without referencing actual measurement data.

During development time, the SaaS provider and the SaaS development team may also elect to explore deployment configuration alternatives where one part of a SaaS application is hosted in an ASCETiC Cloud while another part is hosted at an IaaS provider whose Cloud environment is not running the ASCETiC PaaS and IaaS components. In such cases, certain measurements, notably energy-related ones, will only be available for the part of the SaaS application under the management of the ASCETiC Cloud. Conducting experiments where part of the application is not monitored for energy remains very relevant, since not all IaaS providers will elect to share energy consumption information. The SaaS provider would, on the other hand, not rule out such IaaS providers if their service offering is competitive on price.

During production time, measurement takes place mostly to verify that the desired quality criteria and operational cost level are maintained within the desired threshold values. When violations are detected, several scenarios are possible, in a non-mutually exclusive way:
• The SaaS provider is notified.
• The ASCETiC PaaS layer is notified and manages the execution of generic self-adaptation actions related to horizontal or vertical elasticity.
• The SaaS application is notified and executes self-adaptation actions that go beyond horizontal or vertical elasticity, for instance discarding sessions from certain non-paying users, redeploying a degraded version of the SaaS application with less functionality, etc.
Such SaaS-level self-adaptation is assumed to be managed by the SaaS application itself. However, the monitoring of KPI values and, ultimately, of violations of the specified utility functions can be provided by a service of the ASCETiC PaaS layer.

After consulting the ASCETiC use case partners and interviewing additional SaaS providers, it has been determined that, next to energy, metrics on time performance and on operational cost would provide very useful information for self-adaptation actions. It is also acknowledged that other criteria such as security or reliability are important and could provide additional self-adaptation opportunities to be studied later. Consequently, the remainder of this section focuses on metrics for energy-related behaviour, time performance and operational cost.

During Year 1, the definition of energy consumption behaviour effectiveness and energy efficiency followed the goal-question-metric paradigm. This approach proved useful for specifying what to measure and how to measure it, and it is therefore repeated in Year 2. The sub-sections below on operational cost, energy and time are thus structured by first identifying goals, followed by questions and then the metrics useful to evaluate these goals.
Operational Cost goal, question and metrics

Although the ultimate goal of a SaaS provider is to minimise operational cost, a more pragmatic goal consists of maintaining operational cost below an anticipated threshold. The metrics used to determine this threshold depend on several factors identified in the questions below.

Goal: Maintain the operational cost of providing a SaaS application in the Cloud (in private/community Cloud in single/multi/hybrid deployment mode)
Viewpoint: SaaS provider

Important aspects to capture when aggregating measurements of various metrics into a relevant utility function for evaluating operational cost are:
• Do the IaaS provider hosting the SaaS application and the SaaS provider belong to the same organisation? (In other words, is the SaaS application hosted in a private Cloud?) Answering this question will help determine whether a portion of the operational costs related to the following aspects should be included:
o Hardware amortisation
o Human resources dedicated to the support of hardware and Cloud management
o Global energy consumed to operate a data centre that is not attributed directly to a single SaaS application, such as energy spent on building-wide cooling or lighting
• For each IaaS provider considered (private or external), does it propose different pricing models?
• For each IaaS provider considered (private or external), does it propose dynamic pricing models? If yes,
o What digital resources or IaaS behaviour are available at fixed prices?
o What digital resources or IaaS behaviour are available at varying prices? For example:
  - Total or average Watt-hours consumed over a given period
  - Number of SaaS requests over a given period
  - Number of bits or bytes transferred over a given period
  - Number of support assistance requests with a guaranteed short response time, etc.
• For each external IaaS provider considered, does it propose variable pricing according to the number or volume of digital resources leased?
• Are human resources spent to deploy the SaaS application for a new customer?
• Are human resources spent to support the SaaS application for existing customers?

From the questions above, the metrics of potential interest for use as KPIs to build a utility function on operational cost are listed in Table 8.
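As a minimal sketch of how the private-Cloud cost portions raised in the first question could enter a utility function, the example below normalises yearly totals per VM-minute, following the ratio used by the per-VM-minute formulas of Table 8, and adds them to a fixed VM price only when the deployment is private. The function names, the parameters and the decision to simply sum the rates are illustrative assumptions and do not describe an ASCETiC component.

```python
def cost_per_vm_minute(yearly_cost, yearly_vm_minutes):
    """Yearly cost divided by the VM-minutes operated per year."""
    if yearly_vm_minutes <= 0:
        raise ValueError("yearly_vm_minutes must be positive")
    return yearly_cost / yearly_vm_minutes


def operational_cost(vm_minutes, fixed_price_per_vm_minute, private_cloud,
                     amortisation_per_vm_minute=0.0,
                     support_per_vm_minute=0.0,
                     indirect_per_vm_minute=0.0):
    """Estimate the operational cost of running VMs for `vm_minutes`.

    Assumption of this sketch: hardware amortisation, hardware support and
    global indirect costs are only charged when the SaaS provider also owns
    the infrastructure (private Cloud).
    """
    rate = fixed_price_per_vm_minute
    if private_cloud:
        rate += (amortisation_per_vm_minute
                 + support_per_vm_minute
                 + indirect_per_vm_minute)
    return vm_minutes * rate


# Illustrative figures only: 52,560 currency units of yearly amortisation
# spread over one VM running all year (525,600 minutes).
amortisation = cost_per_vm_minute(52560.0, 525600)
```

For example, `operational_cost(60, 0.5, True, amortisation_per_vm_minute=amortisation)` would add the amortisation share to one hour of VM usage, whereas the same call with `private_cloud=False` would charge the fixed price only.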
Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers
AvgHardwareAmortization-cost-per-vm-minutes | - | Y3 | User Input | Estimated total amortization cost of hardware at private data centre per year / Total VM-minutes operated per year at data centre | - | Constant, updated every year (or more frequently if necessary)
AvgHardwareSupport-Cost-per-VM-minutes | - | Y3 | User Input | Estimated total cost dedicated to supporting hardware at private data centre per year / Total VM-minutes operated per year at data centre | - | Constant, updated every year (or more frequently if necessary)
GlobalIndirectCost-per-VM-minutes | - | Y3 | User Input | Estimated global indirect cost for private data centre per year / Total VM-minutes operated per year at data centre | - | Constant, updated every year (or more frequently if necessary)
Avg-HR-Cost-for-SaaSApplicationDeployment-per-customer | - | Y3 | - | Estimated HR cost spent on setting up a new customer | - | Constant, updated every year (or more frequently if necessary)
Avg-HR-Cost-for-SaaSApplicationSupport-per-month | - | Y3 | - | Average HR cost spent on supporting a SaaS application | - | Constant, updated every year (or more frequently if necessary)
FixedPrice-per-VMType-per-minute | - | Y2 | PaaS API | - | - | -
FixedPrice-per-StorageType-per-GBytes Size | - | Y2 | PaaS API | - | - | -
DynamicPrice-per-AvgWattHour-of-VM per-minute | - | Y2 | PaaS API | - | - | -
DynamicPrice-per-AvgWattHour-of-Storage per-gbyte transferred | - | Y2 | PaaS API | - | - | -
AvgFixedPrice-per-event-type | - | Y3 | PaaS API | - | - | -
AvgDynamicPrice-per-event-type | - | Y3 | PaaS API | - | - | -

Table 8: SaaS - Metrics for defining KPIs and utility function on operational cost.

In addition to the metrics above, which are useful for evaluating operational cost during production time, the SaaS development team may be interested in measurements that help them evaluate the potential gain of refactoring certain components or tasks. For instance, certain functional tasks of a SaaS application may require large or frequent transfers of data across certain SaaS application components.
Given that certain IaaS providers may have a dynamic pricing scheme based on the number of requests or the number of bits transferred across digital resources, it may be desirable to decouple such heavy data-handling tasks from other ones. The decoupled tasks could then run independently in their own VM, open container or process. Importantly, monitoring would then be possible with application-independent probes, and different types of self-adaptation actions could be envisaged to reduce the operational cost incurred by these heavy data-handling tasks. Thus, by using measurement sessions with probes configured to measure a particular application scope, limited to certain components such as VMs but also open containers or OS processes, a SaaS development team will be able to collect additional measurements of interest, provided that probes for measuring open containers and processes are added to the ASCETiC toolbox. Instead of listing additional metrics in Table 8, we simply note that the metrics related to VMs could be transposed to open containers or OS processes.

Energy Goals and Metrics

In Year 1, the goal on Energy Consumption Behaviour Effectiveness was defined with questions and related metrics. These metrics are reviewed in Table 9 and additional ones are presented. Additional goals related to energy are proposed:

Goal: Benefit from IaaS special conditions on digital resources by capping the power usage of an application or its VMs.
Viewpoint: SaaS provider

Transitively, the IaaS provider can exploit the known constraint on the maximal power used by a SaaS application or its VMs. This will facilitate scheduling more concurrent VMs and using the underlying hardware resources more efficiently. However, to accept this constraint on power, a SaaS provider will expect to benefit from special conditions such as a drastically reduced cost, free access to digital resources in the future, etc. Questions related to the power capping constraints are:
• Can the entire SaaS application scope have its power limited during certain periods of time?
• Are there only some application VMs whose power can be limited during certain periods of time? For instance, the load balancing server might still be able to serve the desired number of requests and users using a CPU with reduced frequency during off-peak working hours.
• Are there only some application events whose power can be limited? For instance, an upload of files may take place in batch mode in the background throughout the day, or even in a future period, when the processing of data from these files is not needed or may be delayed.

Goal: Identify what refactoring could benefit the SaaS application
Viewpoint: SaaS application development team

As mentioned for operational cost, the SaaS development team can fully control the workload exercised on a SaaS application as well as the measurement scope to focus on, using custom probes if needed. Thus, once the SaaS application VMs or even the SaaS application code are probed, specially crafted workloads can be exercised on the SaaS application to obtain the desired measurements. Although the questions of interest to the SaaS development team differ from those of the SaaS provider, metrics on Watt-hour consumption answer questions of interest to both stakeholders. In brief, Watt-hour measurements at the application and VM level will help the SaaS provider identify deployment configuration alternatives with adequate results for production use, while the same set of metrics taken on a narrower scope will assist the SaaS development team in spotting the type of refactoring needed to improve the energy consumption profile of the SaaS application. Questions of interest to the SaaS development team are:
• What is the energy consumption of a given set of VMs, open containers and/or OS processes when a particular task is executed with different types of workloads?
• How significant is the energy consumption of data handling in comparison to the energy spent on computation for a whole SaaS application, or for a set of its VMs, open containers and/or OS processes?

Finally, a last goal, intended to appeal to eco-conscious SaaS end-users, is identified. Given the increased awareness of climate change due to carbon emissions, many citizens have consciously decided to be careful about their energy consumption. SaaS providers may therefore be interested in sharing energy information with SaaS end-users.

Goal: Increase the awareness of SaaS end-users of the energy consumption related to the Cloud services they consume.
Viewpoint: SaaS provider (to appeal to SaaS end-users)

A useful way to present this information might be to compare a given end-user with other groups of end-users with a similar profile. Alternatively, comparing the energy consumption related to a given end-user's usage of the SaaS application with that of common electrical appliances used in everyday life, such as a light bulb, a toaster or a hair dryer, may provide useful information.
Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers
AvgWattHour-for-Application (over time interval) | WattHour (Wh) | Y1 | PaaS level | API to PaaS App Manager (see ENERGY_APPLICATION at PaaS level for complete definition) | AVG | User specified time interval
AvgWattHour-for-Application-VM (over time interval) | WattHour (Wh) | Y1 | PaaS level | API to PaaS App Manager (ENERGY_VM at PaaS level) | AVG | User specified time interval
AvgWattHour-for-application-event (over time interval) | WattHour (Wh) | Y1 | PaaS level | API to PaaS App Manager (see ENERGY_EVENT at PaaS level for complete definition) | AVG | User specified time interval
AvgWatt-for-application | Watt | Y1 | PaaS level | API to PaaS App Manager (see POWER_APPLICATION at PaaS level for complete definition) | AVG | User specified time interval
AvgWatt-for-application VM | Watt | Y1 | PaaS level | API to PaaS App Manager (POWER_VM at PaaS level) | AVG | User specified time interval
AvgWatt-for-application-event | Watt | Y1 | PaaS level | API to PaaS App Manager (see POWER_EVENT at PaaS level for complete definition) | AVG | User specified time interval
ChronologicalWattHour-for-Application (over time interval) | Seq [WattHour] | Y1 | PaaS level | Sequence of AvgWattHour taken every 10 seconds during a specified time interval on all application VMs | AVG | User specified time interval
ChronologicalWattHour-for-Application-VM (over time interval) | Seq [WattHour] | Y1 | PaaS level | Sequence of AvgWattHour taken every 10 seconds during a specified time interval on a given application VM | AVG | User specified time interval
ChronologicalWattHour-for-event (over time interval) | Seq [WattHour] | Y1 | PaaS level | Sequence of AvgWattHour taken every 10 seconds during a specified time interval | AVG | User specified time interval
MaxWatt-for-application (over time interval) | Watt | Y3 | PaaS level | API to PaaS App Manager (MAXPOWER_APPLICATION) | - | User specified time interval
MaxWatt-for-application-VM (over time interval) | Watt | Y3 | PaaS level | API to PaaS App Manager (MAXPOWER_VM) | - | User specified time interval
MaxWatt-for-application-event (over time interval) | Watt | Y3 | PaaS level | API to PaaS App Manager (MAXPOWER_EVENT) | - | User specified time interval
WattHourRatioComputeVSDataHandling-for-application (over time interval) | Ratio of AvgWattHour / AvgWattHour | Y3? | PaaS level | Compare the ratio of energy consumed by all application VMs performing data handling against energy consumed performing computation | Ratio | User specified time interval
WattHourRatioComputeVSDataHandling-for-application-VM (over time interval) | Ratio of AvgWattHour / AvgWattHour | Y3? | PaaS level | Compare the ratio of energy consumed by a given VM performing data handling against energy consumed performing computation | Ratio | User specified time interval
WattHourRatioComputeVSDataHandling-for-application-event (over time interval) | Ratio of AvgWattHour / AvgWattHour | Y3? | PaaS level | Compare the ratio of energy consumed by an application event performing data handling against energy consumed performing computation | Ratio | User specified time interval
AvgWatt-Hour-for-aSaaSEnduser | Watt-Hour | Y3? | SaaS | Estimate the average energy consumption for a given end-user | Avg | User specified time interval
WattHourRatio-for-aSaaSEnduser-VS-Avg Watt-hour of application (over time interval) | Ratio of AvgWattHour / AvgWattHour | Y3? | SaaS | Compare the average Watt-hour for a given end-user vs the average Watt-hour for all end-users (over the same time interval) | Ratio | User specified time interval

Table 9: SaaS - Metrics for defining power and energy usage of applications
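The two end-user metrics at the bottom of Table 9, together with the appliance comparison suggested for eco-conscious end-users, can be sketched as follows. This is a hedged illustration only: the function names are hypothetical and the appliance power ratings are rough placeholder values chosen for the example, not figures from the ASCETiC project.

```python
# Illustrative appliance power ratings in Watts (assumed values).
APPLIANCE_WATTS = {
    "light bulb": 10.0,
    "toaster": 1000.0,
    "hair dryer": 1500.0,
}


def watt_hour_ratio(user_wh, all_users_wh):
    """Ratio of one end-user's Watt-hours to the average over all end-users.

    Both values are assumed to cover the same time interval.
    """
    average = sum(all_users_wh) / len(all_users_wh)
    return user_wh / average


def appliance_minutes(user_wh, appliance):
    """Minutes the named appliance could run on the same amount of energy."""
    return user_wh / APPLIANCE_WATTS[appliance] * 60.0
```

A ratio above 1 means the end-user consumed more than the average user over the interval. For instance, an end-user who consumed 5 Wh used the energy a 10 W light bulb draws in 30 minutes.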
Time Performance Goals and Metrics

The main goal related to time performance is:

Goal: Maintain specified deadlines for various operations of a SaaS application
Viewpoint: SaaS provider

Time performance can be gauged using the duration of tasks (or events) of a SaaS application, the data transfer speed and the quantity of data that must be transferred and processed. Questions that help identify a utility function on time performance are:
• Must all application tasks or events be processed at the same speed?
o If not, which types of requests have more crucial deadline constraints?
• Is it important to maintain an overall average response time that is normally distributed for all requests?
• Is it important to always remain under a given deadline?
• Do events with crucial response-time constraints require transferring large amounts of data at speed?

Metric Name | Units | Year | Component Providing the Metric | Formula (if any) | Aggregation Level | Time to which measure refers
Avg-Response-time-for all-application-events (over time interval) | Millisec | Y2 | PaaS level | Average time taken to complete an event (or a request) over a given time period | Avg | User specified time interval
Avg-Response-time-for a given type of application-events (over time interval) | Millisec | Y2 | PaaS level | Average time taken to complete a given event (or request) over a given time period | Avg | User specified time interval
Max-Response-time-for all-application-events (over time interval) | Millisec | Y3 | PaaS level | Maximum time taken to complete an event (or a request) over a given time period | Max | User specified time interval
Max-Response-time-for a given type of application-events (over time interval) | Millisec | Y3 | PaaS level | Maximum time taken to complete a given event (or request) over a given time period | Max | User specified time interval
application-events-count (over time interval) | Count (listed as Millisec in the source table) | Y2 | PaaS level | Count the number of events over a given time period | Count | User specified time interval
Avg (K)bit/bytes transferred-for all-application-events (over time interval) | (K)bit/bytes (listed as Millisec in the source table) | Y3 | S/PaaS level | Average data transfer rate for all application events (or requests) over a given time period | Avg | User specified time interval
Avg (K)bit/bytes transferred-for a given type of application-events (over time interval) | (K)bit/bytes (listed as Millisec in the source table) | Y3 | S/PaaS level | Average data transfer rate related to a given event (or request) over a given time period | Avg | User specified time interval

Table 10: SaaS - Time Performance Metrics
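To make the response-time metrics of Table 10 and the deadline question concrete, the sketch below groups response-time samples by event type, computes the average, maximum and count, and lists the event types whose maximum exceeds a deadline. The input format, the function names and the use of the maximum as the deadline criterion are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def response_time_metrics(samples):
    """Aggregate (event_type, response_time_ms) samples per event type."""
    by_type = defaultdict(list)
    for event_type, millis in samples:
        by_type[event_type].append(millis)
    return {
        event_type: {"avg": mean(values), "max": max(values), "count": len(values)}
        for event_type, values in by_type.items()
    }


def deadline_violations(metrics, deadlines_ms):
    """Event types whose maximum response time exceeds their deadline.

    Event types without a configured deadline are ignored.
    """
    return sorted(
        event_type
        for event_type, deadline in deadlines_ms.items()
        if event_type in metrics and metrics[event_type]["max"] > deadline
    )


samples = [("login", 100), ("login", 300), ("report", 2000)]
metrics = response_time_metrics(samples)
late = deadline_violations(metrics, {"login": 250, "report": 5000})
```

Here only the `login` events breach their deadline, since one of them took 300 ms against a 250 ms limit. A utility function that tolerates occasional misses could compare the average instead of the maximum.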
As mentioned for Operational Cost and Energy, the metrics in Table 10 can be taken during development time, when the SaaS development team can control the type of workloads exercised on the SaaS application and can set an accurate scope so that measurements are taken only on a given set of VMs, open containers or OS processes, or even on more specific SaaS elements if custom measurement probes are provided or injected into the SaaS application.

2.7.1.5 Metrics within the Cloud stack functionalities

Important ASCETiC functionalities are enabled by KPIs across the different Cloud layers, in particular in the following scenarios:
• SLA negotiation
• Application training
• Monitoring at PaaS and IaaS layer

SLA negotiation (Figure 46) involves metrics from the initial interaction between the SaaS layer and the users requesting the negotiation. The user specifies a set of constraints inside an SLA template, and the template is then used to negotiate with the PaaS layer the resources that will be allocated to the application. Such constraints can be expressed as an upper and/or a lower bound on SLA resource terms (e.g. CPU, VM and memory). The later deployment request will have to satisfy these requirements in order to be accepted. Whenever the PaaS layer receives a negotiation request, it collects information about the available IaaS providers and then negotiates the requested resources with each provider by generating an IaaS SLA template. The lower layer checks its current resources (server resource availability, power consumption), estimates the price of its offer and returns the offer to the PaaS layer. The PaaS layer, based on the responses from all IaaS providers, selects the best offer and returns it to the SaaS layer.

Figure 46: SLA Negotiation

The PaaS layer can provide an estimation of consumption at application level, but in order to deliver such functionality, it requires an initial dataset to be available.
This training set is used to model the application energy consumption and then to estimate its deployment costs (Figure 47). The data set is created by collecting the metrics handled at the PaaS layer about the energy spent by the application VMs. The training, which is specific to each application, is carried out by a dedicated set of VMs that are in charge of reproducing a synthetic workload on a deployed application. This can be achieved by using workload generation tools such as JMeter scripts or application-specific functionalities.
Figure 47: Training application models

Once an application is deployed, management of SLAs is required (Figure 48). In order to carry out this task, the Cloud platform monitors the current values of the KPIs to detect whether a violation occurs at a given level (PaaS or IaaS). If a violation occurs, then it is possible to take action to mitigate it. At the PaaS layer in particular, SLAs are checked on a per-application basis, for instance to detect whether a deployment is consuming more energy than has been agreed, or whether the current price is higher than a given threshold. At the IaaS layer, not only power is considered: the current status of the infrastructure can also be taken into account, in particular with the goal of reducing overall resource utilization through an efficient deployment and migration strategy.

Figure 48: Monitoring

The three functionalities introduced above, which involve the management of metrics and KPIs at different levels, are steps toward future inter-layer optimization, where the full cooperation of the PaaS and IaaS layers will deliver even greater benefits in terms of efficient resource allocation. For instance, monitoring could take place in an integrated fashion, allowing the PaaS layer to ask the IaaS layer to verify whether a more efficient placement can be achieved for a currently deployed application in order to limit its overall power consumption. In another scenario, cooperation between PaaS and IaaS will allow SaaS users to check the energy required to run a specific configuration of their application. Carrying out such a scenario requires knowledge of both the application that is going to run and the available infrastructure resources.
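The negotiation and monitoring steps of this section can be summarised in a small sketch. It assumes that SLA constraints are lower/upper bounds on resource terms, that the "best" offer is simply the cheapest one satisfying all bounds, and that a violation is a KPI exceeding its agreed threshold. The `Offer` type, the field names and both functions are hypothetical and do not reflect the actual ASCETiC SLA Manager interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical IaaS offer returned during negotiation."""
    provider: str
    cpu: int
    memory_mb: int
    price: float


def within_bounds(value, lower=None, upper=None):
    """True if value respects the optional lower and upper bounds."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def select_best_offer(offers, bounds):
    """Return the cheapest offer satisfying every (lower, upper) bound.

    `bounds` maps an Offer field name to a (lower, upper) pair, where either
    element may be None. Returns None if no offer qualifies. Choosing by
    price alone is an assumption of this sketch.
    """
    acceptable = [
        offer for offer in offers
        if all(within_bounds(getattr(offer, term), lower, upper)
               for term, (lower, upper) in bounds.items())
    ]
    return min(acceptable, key=lambda offer: offer.price) if acceptable else None


def sla_violations(kpis, thresholds):
    """KPIs whose current value exceeds the agreed threshold."""
    return {name: value for name, value in kpis.items()
            if name in thresholds and value > thresholds[name]}


offers = [
    Offer("provider-a", cpu=2, memory_mb=4096, price=0.30),
    Offer("provider-b", cpu=4, memory_mb=4096, price=0.25),
    Offer("provider-c", cpu=1, memory_mb=2048, price=0.10),
]
best = select_best_offer(offers, {"cpu": (2, None), "memory_mb": (2048, 8192)})
violated = sla_violations({"energy_wh": 120.0, "price": 0.20},
                          {"energy_wh": 100.0, "price": 0.25})
```

In this example `provider-c` is rejected despite being cheapest because it offers fewer CPUs than the lower bound, and only the energy KPI is reported as violated.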
3. Architectural Component Novelty

In this section we summarise in tabular form, on a per-component basis, the novelty presented in the Year 2 ASCETiC architecture.

Component Name | Novelty
ASCETiC Programming Model | Scheduling policies support at application level to optimize performance, energy and cost.
SaaS KPI Modelling and Visualisation Tools | 1) Generic SaaS Goals and Requirements patterns. 2) Intuitive interface to identify deployment configurations that achieve feasible quantitative non-functional requirements on performance, energy and cost.
PaaS Self-Adaptation Manager | Decision engine for deciding on the type of adaptation to make at PaaS.
PaaS Energy Modeller | Energy awareness support at application level to allow the infrastructure to operate under its optimization policies without interference.
PaaS Virtual Machine Contextualizer | Embedding of the software dependencies of a service into a VM image and their configuration at runtime via an infrastructure-agnostic contextualization mechanism. Interoperability support between IaaS providers as well as multi-provider scenarios through recontextualization.
PaaS Pricing Modeller | Novel pricing models for energy-aware cost estimation per VM.
PaaS SLA Manager | Extensible SLA terms to support negotiation of energy and performance terms at PaaS level. SLA monitoring support.
Virtual Machine Manager | Minimisation of cluster energy consumption in Virtual Machine deployment. Support of scheduling and management policies that consider the pricing estimations.
IaaS Energy Modeller | 1) Physical resource profiling: enhanced models, their automatic selection through goodness-of-fit testing, and improvements made to calibration through a new standalone calibration tool. 2) Watt meter emulator that allows power values for hosts without associated Watt meters to be used within the ASCETiC framework.
IaaS Pricing Modeller | 1) IaaS energy-aware pricing schemes. 2) Handling of the time-varying energy cost faced by IaaS providers, as announced by energy providers.
Infrastructure Monitor | Enhancement of the scalability and applicability of the ASCETiC toolbox by ensuring VMs can be monitored out of the box, without the modification of the IaaS layer caused by the installation of Zabbix monitoring probes.
Infrastructure Manager | Enhanced OpenStack installation using an SDN access and aggregation network in combination with CephFS, allowing fast VM migration with minimal impact on running services.
IaaS SLA Manager | Extensible SLA terms to support negotiation of energy and performance terms at IaaS level. SLA monitoring support.

Table 11 - Summary of Novelty within the ASCETiC Architecture

Next, the user guides for each layer of the ASCETiC toolbox are presented.

4. Software User Guide

In this section we present the user guides for each layer of the ASCETiC toolbox.

4.1 SaaS Layer

4.1.1 Using the Programming Model

The ASCETiC Programming Model user guide was already presented in the first year deliverable D3.1: Static Energy Efficiency, more precisely in Section 3.1.1. To avoid duplicating content we do not repeat what was explained in D3.1, and we encourage the reader to consult that section if needed. Here we focus only on the changes made to the interface in order to support the new scheduling policies implemented in the runtime during this second year of the project.

As extensively described in Section 2.2.1, the three self-adaptation algorithms introduced during this second year in the ASCETiC PM Runtime are:

• Minimizing energy consumption, while considering boundaries for cost and performance.
• Minimizing cost, while considering boundaries for energy consumption and performance.
• Maximizing performance, while considering boundaries for energy consumption and cost.

Figure 49 shows a screenshot of the Deployment view in the PM Plug-in. In the Deployment Properties part of this tab, the user can specify the Optimization Parameter (i.e. Performance, Energy or Cost), together with the Power and Price boundaries.
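The meaning of an Optimization Parameter combined with boundaries can be sketched as a constrained selection: keep the options that respect the boundaries on the two other dimensions, then pick the best one for the chosen parameter. The sketch below is only an illustration under stated assumptions. The `Candidate` class, its fields and `select_configuration` are hypothetical, execution time stands in for performance and power stands in for energy; it does not reproduce the scheduling policies of the PM Runtime.

```python
from dataclasses import dataclass

# Maps each Optimization Parameter to the (hypothetical) field it minimises.
FIELD = {"Performance": "time_s", "Energy": "power_w", "Cost": "price"}


@dataclass(frozen=True)
class Candidate:
    """An illustrative option with estimated time, power and price."""
    name: str
    time_s: float   # lower time means higher performance
    power_w: float
    price: float


def select_configuration(candidates, optimise, bounds):
    """Pick the candidate that is best for `optimise` within the other bounds.

    `bounds` maps a parameter name to its maximum allowed value; a bound on
    the optimised parameter itself is ignored. Returns None if no candidate
    satisfies the bounds.
    """
    feasible = [
        c for c in candidates
        if all(getattr(c, FIELD[k]) <= limit
               for k, limit in bounds.items() if k != optimise)
    ]
    return min(feasible, key=lambda c: getattr(c, FIELD[optimise]), default=None)


candidates = [
    Candidate("small", time_s=120.0, power_w=80.0, price=0.10),
    Candidate("medium", time_s=70.0, power_w=130.0, price=0.18),
    Candidate("large", time_s=40.0, power_w=210.0, price=0.30),
]

choice = select_configuration(candidates, "Energy",
                              {"Performance": 100.0, "Cost": 0.25})
print(choice.name if choice else "no feasible configuration")
```

With these made-up numbers, minimising energy under a 100 s time bound and a 0.25 price bound rules out "small" (too slow) and "large" (too expensive), leaving "medium".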
Figure 49: Tab with Deployment Properties to specify optimization and boundaries

If the user wants to provide a boundary on the maximum execution time of a CE, this has to be specified in the constraints specification of each CE included in the application. How to add a constraint to a CE was already presented in Deliverable D3.1, so we refer the reader to that document in case of doubt.

In conclusion, the impact on end users is very low: they only need to specify the policy and the boundaries, and nothing else needs to be changed in their application.

4.2 PaaS layer

The PaaS layer offers three different ways to interact with the SaaS layer or the application owner: a REST interface, a set of AMQP topics and queues, and a graphical user interface. The following subsections describe how to interact with each of those interfaces.

4.2.1 REST API

The PaaS layer exposes two REST interfaces to the user: one provided by the Application Manager to control the deployment, and another provided by the Application Monitor to access all the application monitoring information. The Application Manager API is detailed at http://docs.applicationmanager.apiary.io/. The Application Monitor is further discussed in Section 4.2.3.

4.2.2 AMQP

Several components of the PaaS layer send messages over AMQP that an application can use to react to events at PaaS or IaaS level. Next, an example is presented of the topics and queues published by the Application Manager.

4.2.2.1 Messages produced by the Application Manager

The AMQP messages produced by the Application Manager inform other components so that they can react to events in the deployment of an application. At this stage the Application Manager does not listen on any topic to accept commands, as it does through its REST interface. The following table lists the topics and their message formats:

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.SUBMITTED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "SUBMITTED" }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.NEGOTIATING
Message: { "applicationid" : "APP-NAME", "deploymentid" : "456", "status" : "NEGOTIATING" }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.NEGOTIATED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "NEGOTIATED" }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.CONTEXTUALIZING
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "CONTEXTUALIZING" }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.CONTEXTUALIZED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "CONTEXTUALIZED" }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.DEPLOYING
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "DEPLOYING" }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.VM.<VM-ID>.DEPLOYED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "DEPLOYING", "vms" : [ { "vmid" : "VM-ID", "iaasvmid" : "IAAS-ID", "ovfid" : "OVF-ID", "status" : "ACTIVE" } ] }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.DEPLOYED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "DEPLOYED", "vms" : [ { "vmid" : "VM-ID", "iaasvmid" : "IAAS-ID", "ovfid" : "OVF-ID", "status" : "ACTIVE" } ] }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.VM.<VM-ID>.DELETED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "DEPLOYING", "vms" : [ { "vmid" : "VM-ID", "iaasvmid" : "IAAS-ID", "ovfid" : "OVF-ID", "status" : "DELETED" } ] }

Topic: APPLICATION.<APP-NAME>.DEPLOYMENT.<DEPLOYMENT-ID>.DELETED
Message: { "applicationid" : "APP-NAME", "deploymentid" : "DEPLOY-ID", "status" : "TERMINATED", "vms" : [ { "vmid" : "VM-ID", "iaasvmid" : "IAAS-ID", "ovfid" : "OVF-ID", "status" : "DELETED" } ] }

4.2.3 Application Monitor

The Application Monitor is the principal component for monitoring in the PaaS layer. It presents its monitoring data through a GUI which has two main tabs:

• System Status: provides an overall view of the recently running applications and the status of the Application Monitor system itself.
• App Metrics: shows real-time information about a selected set of metrics from the applications.

The rest of this section details each of the tabs.

System status tab

This page has three sections:

• System status (Figure 50): shows gauges that indicate the health of the Application Monitor: CPU of the system, CPU consumed by the process, System memory Usage and Object File Descriptors.
• Active applications (Figure 51): gives an overall view of the active applications and the nodes that they use. Each active application is represented by an orange circle (labelled with the name of the application) linked to grey circles (labelled with the name of each node that executes the application).
• Recently Finished Deployments (Figure 52): shows information about the latest deployments on the system, specifically:
  o Id of the application
  o Id of the deployment
  o Start and end time of the deployment
  o Total energy consumption (in Wh) for the deployment.

Figure 50: System status section
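As a reminder of what a total in Wh represents, the sketch below integrates a series of power samples over the lifetime of a deployment and converts the result from joules to watt-hours. It is a generic trapezoidal calculation with a hypothetical function name and made-up sample values; how the Application Monitor itself derives the reported figure is not specified here.

```python
def energy_wh(samples):
    """Integrate (time in seconds, power in watts) samples into watt-hours.

    Uses the trapezoidal rule between consecutive samples; samples are
    assumed to be sorted by time.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 3600.0  # 1 Wh = 3600 J


# Made-up samples: 100 W for half an hour, then rising linearly to 200 W.
samples = [(0, 100.0), (1800, 100.0), (3600, 200.0)]
print(energy_wh(samples))
```

For these samples the first half hour contributes 50 Wh and the second 75 Wh.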
Figure 51: Active applications section

Figure 52: Recently finished deployments section

App metrics section

This section provides an overall view of user-selected metrics for given applications, in real time (Figure 53).
Figure 53: Overall view of "App Metrics" section

Metric windows can be added and removed according to user preferences. The X button in the top-right corner of a window removes it. To add a new metric window, click the Add new graph button, which opens a form (Figure 54) asking for the following information:

• Application ID: the ID of the application to monitor. Start typing and an auto-complete popup menu will appear.
• Deployment ID: this field is optional. If left empty, the average values over all the deployments are shown. Start typing and an auto-complete popup menu will appear.
• Metric path: as metrics are reported as JSON trees, each individual value can be retrieved by its path in the JSON tree. Start typing and an auto-complete popup menu will appear (as shown in Figure 54).
• Description: a human-readable description shown as the title of the graph.

Figure 54: New graph modal form
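Two conventions used in Section 4.2 lend themselves to a short sketch: the dotted topic names of the Application Manager messages (Section 4.2.2.1) and the idea of addressing one value in a JSON tree by its path. Both helpers below are hypothetical. `parse_topic` assumes that application names and identifiers contain no dots, and `resolve_path` assumes a dot-separated path with numeric segments for list positions, which may differ from the syntax the Application Monitor actually accepts. No AMQP client library is used; the functions only work on strings that a consumer would already have received.

```python
import json


def parse_topic(topic):
    """Split an Application Manager topic into application, deployment, VM and event."""
    parts = topic.split(".")
    if parts[:1] != ["APPLICATION"] or parts[2:3] != ["DEPLOYMENT"]:
        raise ValueError(f"unexpected topic: {topic}")
    if len(parts) == 5:                       # deployment-level event
        vm, event = None, parts[4]
    elif len(parts) == 7 and parts[4] == "VM":  # VM-level event
        vm, event = parts[5], parts[6]
    else:
        raise ValueError(f"unexpected topic: {topic}")
    return {"application": parts[1], "deployment": parts[3],
            "vm": vm, "event": event}


def resolve_path(tree, path):
    """Follow a dot-separated path through nested dicts and lists."""
    node = tree
    for key in path.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


# A topic and message body in the format listed in Section 4.2.2.1.
info = parse_topic("APPLICATION.shop.DEPLOYMENT.456.VM.vm-1.DEPLOYED")
body = json.loads('{ "applicationid" : "shop", "deploymentid" : "456", '
                  '"status" : "DEPLOYING", "vms" : [ { "vmid" : "vm-1", '
                  '"iaasvmid" : "IAAS-ID", "ovfid" : "OVF-ID", '
                  '"status" : "ACTIVE" } ] }')
print(info["event"], resolve_path(body, "vms.0.status"))

# A made-up metrics tree, addressed by path.
metrics = {"cpu": {"load": 0.75}, "memory": {"used_mb": 512}}
print(resolve_path(metrics, "cpu.load"))
```

The application name "shop", the identifiers and the metrics tree are invented for the example.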
4.3 IaaS layer

The principal component for controlling the IaaS layer is the Virtual Machine Manager (VMM). It provides both a REST interface and a GUI for end users. Requests to the Virtual Machine Manager are made in the same way as requests to any other REST service and enable interaction between the PaaS and IaaS layers.

For the end user we also provide a graphical interface developed using AngularJS, a JavaScript front-end framework. It implements most of the functionality offered by the REST interface. The graphical interface contains the following sections:

• Dashboard: shows the power consumption of each of the nodes of the cluster and their load (see Figure 55).
• Virtual Machines: shows information about the virtual machines deployed in the cluster (name, size, host, creation date, etc.), as shown in Figure 56.
• Images: contains information about the VM images uploaded to the cluster, as illustrated in Figure 57.
• Scheduling algorithms: allows the user to configure the deployment policy (consolidation, distribution, energy-aware, performance-aware, group by app, random) and also to ask the system to look for a VM placement that is better than the current one according to the selected policy.
• Self-adaptation: allows the user to configure the self-adaptation engine of the VMM. The user can choose whether the self-adaptation engine should try to find a new VM placement every time a VM is deployed or destroyed, or periodically. The user can also select the algorithm used by the self-adaptation engine (simulated annealing, hill climbing, etc.) and the configuration parameters for those algorithms.
• Hosts: contains information about the hosts in the cluster (load, current power, etc.).
• Logs: shows logs that allow the user to verify that the VMM is running without problems.

Figures 55 to 57 show screenshots of the functionality described above.
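To give an intuition for what "looking for a better VM placement" with hill climbing means, the following sketch repeatedly moves a single VM to another host whenever the move lowers an estimated cluster power. Everything in it is an assumption made for illustration: the linear power model, the rule that an empty host draws no power, the host and VM figures, and the function names. It is not the VMM's actual implementation.

```python
def total_power(placement, vms, hosts):
    """Estimate cluster power in watts for a placement (vm -> host).

    hosts maps a host name to (cpu capacity, idle watts, full-load watts).
    Hosts without VMs are assumed to be switched off; overloaded hosts make
    the placement infeasible (infinite power).
    """
    load = {h: 0.0 for h in hosts}
    for vm, host in placement.items():
        load[host] += vms[vm]
    power = 0.0
    for h, (capacity, idle_w, max_w) in hosts.items():
        if load[h] > capacity:
            return float("inf")
        if load[h] > 0:
            power += idle_w + (max_w - idle_w) * load[h] / capacity
    return power


def hill_climb(placement, vms, hosts):
    """Accept single-VM moves that reduce estimated power until none is left."""
    current = dict(placement)
    best = total_power(current, vms, hosts)
    improved = True
    while improved:
        improved = False
        for vm in sorted(current):
            for host in sorted(hosts):
                if host == current[vm]:
                    continue
                candidate = dict(current)
                candidate[vm] = host
                power = total_power(candidate, vms, hosts)
                if power < best:
                    current, best, improved = candidate, power, True
    return current, best


hosts = {"h1": (4, 100.0, 200.0), "h2": (4, 100.0, 200.0)}
vms = {"a": 1, "b": 1}                      # CPU demand per VM
placement, power = hill_climb({"a": "h1", "b": "h2"}, vms, hosts)
print(placement, power)
```

In this toy cluster the search consolidates both VMs on one host, reducing the estimate from 250 W to 150 W. Hill climbing stops at the first placement that no single move improves, which is why alternatives such as simulated annealing are offered for escaping local optima.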
Figure 55: Virtual Machine Manager - Dashboard

Figure 56: Virtual Machine Manager - VMs
Figure 57: Virtual Machine Manager - Images

5. Conclusions

This deliverable describes the scientific contributions of the Year 2 prototype of the ASCETiC Toolbox. It aids the understanding of the scientific innovations of each component and of the Toolbox as a whole, and explains how to use the Toolbox through the User Guides.

The scientific contributions are presented first globally and then component by component, distinguishing between components that are scientifically relevant and components that provide basic services to the Toolbox. In the SaaS layer, both the SaaS Modelling Tools and the ASCETiC Programming Model present innovative ways of programming green applications and services, of specifying user requirements, and of adapting the architecture. The Energy Modellers of both the PaaS and IaaS layers are essential for the ASCETiC Toolbox to achieve energy-awareness, and further assist in adaptation and decision support through predictive models. The Pricing Modellers have implemented various pricing schemes, setting the basis for next year. Thanks to the Energy Modeller and the Pricing Modeller, other components are augmented with energy and price features, such as the Application Manager, the SLA Manager, the Self-Adaptation Manager and the VM Manager. Through these components the overall architecture is made capable of adapting to meet the energy goals of users on a per-layer basis.

To summarise the enhanced features, we include a section tabulating the key novelty of the components of the overall architecture. Finally, we provide the User Guides, which will help our potential initial users to get started with the first release of the Toolbox, use it, and provide us with their feedback so that we can improve it in future releases.
128 We believe this document also proves that the second year release of the ASCETiC Toolbox is a solid base to start tackling inter-layer Cloud stack adaptation, which will be considered during the third and final year of the project. References [1] Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G Bruce Berriman, John Good, Anastasia Laity, Joseph C Jacob, Daniel S Katz, Pegasus: A framework for mapping complex scientific workflows onto distributed systems, Journal of Scientific Programming, Volume 13, Num 3, pp 219-237, 2005. [2] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble (2013): The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, 41(W1): W557-W561. doi:10.1093/nar/gkt328 [3] Goecks, J, Nekrutenko, A, Taylor, J and The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010 Aug 25;11(8):R86. [4] Yong Zhao; Hategan, M.; Clifford, B.; Foster, I.; von Laszewski, G.; Nefedova, V.; Raicu, I.; Stef-Praun, T.; Wilde, M., "Swift: Fast, Reliable, Loosely Coupled Parallel Computation," Services, 2007 IEEE Congress on, vol., no., pp.199,206, 9-13 July 2007. doi: 10.1109/SERVICES.2007.63 [5] Vecchiola, Christian, Xingchen Chu, and Rajkumar Buyya. "Aneka: a software platform for.net-based cloud computing." High Speed and Large Scale Scientific Computing (2009): 267-295. [6] Yaokuan Mao, Wenjun Wu, Hui Zhang, and Liang Luo. 2012. GreenPipe: A Hadoop Based Workflow System on Energy-efficient Clouds. 
In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW '12). IEEE Computer Society, Washington, DC, USA, 2211-2219. [7] The Java Grid Application Toolkit: http://www.cs.vu.nl/ibis/javagat.html [8] Java Non-Blocking Input / Output library: JSR 51 (NIO) http://www.jcp.org/en/jsr/detail?id=51, JSR 203 (NIO2) http://www.jcp.org/en/jsr/detail?id=203 [9] Unified Modelling Language (UML) version 2.4.1, [Online]. http://www.omg.org/spec/uml/2.4.1 [Accessed: 24 July 2015]. This version (2.4.1) has been formally published by ISO as the 2012 edition standard: ISO/IEC 19505-1 and 19505-2. [10] Rick Kazman, Mark H. Klein, Paul C. Clements, ATAM: Method for Architecture Evaluation, Pub. Software Engineering Institute, CMU/SEI Report Number: CMU/SEI-2000-TR-004, August 2000, URL accessed on July 24 2015: http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=5177 [11] Raquel Hill, Jun Wang and Klara Nahrstedt, Quantifying Non-Functional Requirements: A Process Oriented Approach, Proceedings of the 12th IEEE International Requirements Engineering Conference (RE 04), URL:
129 http://cairo.cs.uiuc.edu/publications/papers/cre_raquel.pdf (accessed on July 24 2015) [12] Jean-Christophe Deprez and Christophe Ponsard, Energy related Goals and Questions for Cloud Services, MeGSuS 2014, Rotterdam, October 6 2014. [13] Christophe Ponsard and Jean-Christophe Deprez, Driving the Evolution of Cloud Software towards Energy Awareness, SATTOSE 2015 (to appear) [14] Christophe Ponsard and Jean-Christophe Deprez, Jacques Flamand, A UML KPI Profile for Energy Aware Design and Monitoring of Cloud Services, ICTSOFT 2015 (to appear) [15] Systems Modeling Language (SysML), Version 1.3, Release Date: June 2012. [Online]. http://www.omg.org/spec/sysml/1.3/pdf [accessed: 24July 2015] [16] A. Dardenne, S. Fickas and A. van Lamsweerde, Goal-Directed Concept Acquisition in Requirements Elicitation, Proceedings IWSSD-6: Sixth International Workshop on Software Specification and Design. IEEE Computer Society Press, 1991, pp. 14-21. [17] Daniel Gross, Eric Yu, From Non-Functional Requirements to Design through Patterns, February 2001, Volume 6, Issue 1, pp 18-36. [18] Pourshahid, A., Chen, P., Amyot, D., Forster, A.J., Weiss, M. (2007) Business Process Monitoring and Alignment: An Approach Based on the User Requirements Notation and Business Intelligence Tool. Proc. of the 10th Workshop on Requirements Engineering (WER'07), Toronto, Canada, May, 80-91. [19] jucmnav, tool for the User Requirements Notation. [Online]. http://cserg.site.uottawa.ca/ucm/bin/view/projetseg/webhome (accessed on July 24 2015) [20] Z.151 - User Requirements Notation (URN) - Language definition, ITU-T standard Z.151. [Online]. http://www.itu.int/itu- T/aap/AAPRecDetails.aspx?AAPSeqNo=1806 (accessed on July 24 2015) [21] European Commission: Cloud Computing Service Level Agreements - Exploitation of Research Results (2013) [22] Ponsard, C., Deprez, J.C., Flamand, J.: A UML KPI Pro le for Energy Aware Design and Monitoring of Cloud Services. 
In: 10th International Conference on Software Engineering and Applications (ICSOFT-EA) (July 2015) [23] JVM Monitor - Java profiler integrated with Eclipse. [Online]. http://www.jvmmonitor.org/ [Accessed - 08/06/15]. [24] C. Wilke, S. Götz, and S. Richly, JouleUnit: A Generic Framework for Software Energy Profiling and Testing, in Proceedings of the 2013 Workshop on Green in/by Software Engineering, 2013, pp. 9 14. [25] IBM, An architectural blueprint for autonomic computing. IBM, p. 34, 2005. [26] P. Cingolani and J. Alcalá-Fdez, jfuzzylogic: a Java Library to Design Fuzzy Logic Controllers According to the Standard for Fuzzy Control Programming. [27] P. Cingolani and J. Alcala-Fdez, jfuzzylogic: a robust and flexible Fuzzy- Logic inference system language implementation, in Fuzzy Systems (FUZZ- IEEE), 2012 IEEE International Conference on, 2012, pp. 1 8. [28] J. P. Magalhaes and L. Moura Silva, A Framework for Self-Healing and Self-Adaptation of Cloud-Hosted Web-Based Applications, in Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on, 2013, vol. 1, pp. 555 564.
130 [29] D. Perez-Palacin, R. Mirandola, and R. Calinescu, Synthesis of Adaptation Plans for Cloud Infrastructure with Hybrid Cost Models, in Software Engineering and Advanced Applications (SEAA), 2014 40th EUROMICRO Conference on, 2014, pp. 443 450. [30] G. Jung, M. A. Hiltunen, K. R. Joshi, R. D. Schlichting, and C. Pu, Mistral: Dynamically Managing Power, Performance, and Adaptation Cost in Cloud Infrastructures, in Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on, 2010, pp. 62 73. [31] Self-adaptation Challenges for Cloud-based Applications: A Control Theoretic Perspective. Soodeh Farokhi, Pooyan Jamshidi, Ivona Brandic, Erik Elmroth [32] S. Shang, Y. Wu, J. Jiang and W. Zheng, An Intelligent Capacity planning Model for Cloud Market,, Journal of Internet Services and Information Security [33] R. Aiello and L. Sachs, Configuration Management Best Practices: Practical Methods that Work in the Real World, 1st ed. Addison-Wesley Professional, 2010. [34] CFEngine 3 - Configuration Management Software for Agile System Administrators, September 2015. [Online]. Available: http://cfengine.com/ [35] Puppet - IT Automation for System Administrators, September 2015. [Online]. Available: http://puppetlabs.com/ [36] Chef - A Systems Integration Framework, September 2015. [Online]. Available: http://aws.amazon.com [37] Armstrong, M. (2006) Competition in Two Sided Markets, RAND Journal of Economics, 37: 668-691. [38] S. Balakrishnan and M. P. Koza. Information asymmetry, adverse selection and joint-ventures: Theory and evidence. Journal of Economic Behavior & Organization, Volume 20, Issue 1, January 1993, Pages 99 117. [39] R. Kavanagh, and K. Djemame. "A grid broker pricing mechanism for temporal and budget guarantees". Computer Performance Engineering, 2011. [40] P. Pawluk, B. Simmons, M. Smit, M. Litoiu and S. Mankovski. Introducing STRATOS: A Cloud Broker Service. IEEE 6th International Conference on Cloud Computing, 2013. [41] D. Niu, C. 
Feng and B. Li. "A theory of cloud bandwidth pricing for videoon-demand providers". In Proceedings of IEEE INFOCOM, 2012. [42] Ganglia Project, Ganglia Monitoring System, 2012. [Online]. Available: http://ganglia.sourceforge.net/ [43] ZABBIX SIA, Homepage of Zabbix:: An Enterprise-Class Open Source Distributed Monitoring Solution, 2014. [Online]. Available: http://www.zabbix.com/ [44] D. Bonde. Techniques for Virtual Machine Placement in Clouds. Master Thesis. http://www.cse.iitb.ac.in/synerg/lib/exe/fetch.php?media=public:students: dhaval:report.pdf [45] A. Shankar. Virtual Machine Placement in Computing Clouds. [Online]. http://www.cse.iitb.ac.in/synerg/lib/exe/fetch.php?media=public:students: dhaval:anjana.pdf [46] Jing Xu; Fortes, J.A.B., Multi-Objective Virtual Machine Placement in Virtualized Data Center Environments. Green Computing and Communications (GreenCom), 2010 IEEE/ACM Int'l Conference on & Int'l
131 Conference on Cyber, Physical and Social Computing (CPSCom), vol., no., pp.179,188, 18-20 Dec. 2010 [47] Dong Jiankang; Wang Hongbo; Li Yangyang; Cheng Shiduan, "Virtual machine scheduling for improving energy efciency in IaaS cloud," Communications, China, vol.11, no.3, pp.1,12, March 2014 [48] Ricardo Stegh Camati, Alcides Calsavara, Luiz Lima Jr. Solving the Virtual Machine Placement Problem as a Multiple Multidimensional Knapsack Problem. ICN 2014: The Thirteenth International Conference on Networks pp.253-260. [49] Beloglazov A. and Buyya R. (2014), OpenStack Neat: a framework for dynamic and energy-efficient consolidation of virtual machines in OpenStack clouds, Concurrency and Computation: Practice and Experience. [50] OpenStack Neat - Dynamic Consolidation of Virtual Machines on OpenStack. [Online]. http://www.openstack-neat.org/ [51] planetlab-workload-traces [Online]. https://github.com/beloglazov/planetlab-workload-traces [52] Snooze - Virtual machines management system [Online]. http://snooze.inria.fr/ [53] Eugen Feller. Autonomic and Energy-Efficient Management of Large-Scale Virtualized Data Centers. Distributed, Parallel, and Cluster Computing. Universit e Rennes 1, 2012. [54] Contrail project. [Online]. http://contrail-project.eu/ [55] Speitkamp, B.; Bichler, M., "A Mathematical Programming Approach for Server Consolidation Problems in Virtualized Data Centers," Services Computing, IEEE Transactions on, vol.3, no.4, pp.266,278, Oct.-Dec. 2010 [56] https://code.google.com/p/googleclusterdata/ [57] Aziz Murtazaev and Sangyoon Oh. Sercon: Server Consolidation Algorithm using Live Migration of Virtual Machines for Green Computing. IETE Technical Review, 28 (3), 2011. [58] Graubner, P.; Schmidt, M.; Freisleben, B., "Energy-Efficient Management of Virtual Machines in Eucalyptus," Cloud Computing (CLOUD), 2011 IEEE International Conference on, vol., no., pp.243,250, 4-9 July 2011 [59] https://www.eucalyptus.com/eucalyptus-cloud/iaas [60] A. 
Beloglazov, J. Abawajy, R. Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing, Future Generation Computer Systems, 28 (5), 2012, 755 768. [61] E. Feller, L. Rilling, C. Morin. Energy-aware ant colony based workload placement in clouds, in proceedings of the 12th ACM/IEEE International Conference on Grid Computing, Grid 2011, Lyon, France, 2011. [62] IPMI Tool. [Online]. http://sourceforge.net/projects/ipmitool/ [63] Sigar Java Library. [Online]. https://github.com/hyperic/sigar [64] CPU Load Generator. [Online]. https://github.com/beloglazov/cpu-loadgenerator [65] Docker. [Online]. https://www.docker.com/ [66] Rancher OS distribution. [Online]. https://github.com/rancher/os [67] NeuroPh Java Library page. [Online]. http://neuroph.sourceforge.net/ [68] J.L. Berral, I. Goiri, R. Nou, F. Julia, Ferran, J. Guitart, R. Gavalda, J. Torres. Towards energy-aware scheduling in data centers using machine learning. In Proceedings of the 1st International Conference on Energy-Efficient
132 Computing and Networking. New York, NY, USA: ACM, 2010 (e-energy 10), pp. 215 224. [69] I. Goiri, K. Le, Md.E. Haque, R. Beauchea, T.D. Nguyen, J. Guitart, J. Torres, and R. Bianchini. GreenSlot: Scheduling Energy Consumption in Green Datacenters. In Supercomputing (SC 11), Seattle, WA, USA, November 12-18, 2011. [70] K. Kant, M. Murugan, and D. H. C. Du. Willow: A Control System for Energy and Thermal Adaptive Computing. In proceedings of the International Parallel and Distributed Processing Symposium, May 2011. [71] N. Sharma, S. Barker, D. Irwin, and P. Shenoy. Blink: Managing Server Clusters on Intermittent Power. In proceeding of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2011. [72] M. Macías, J. Guitart. Using Resource-level Information into Nonadditive Negotiation Models for Cloud Market Environments. 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010). Osaka, Japan, April 19-23, 2010, pp 325-332 [73] M. Macías, J. Guitart. Client Classification Policies for SLA Negotiation and Allocation in Shared Cloud Data centers. 8th International Workshop on the Economics and Business of Grids, Clouds, Systems, and Services (GECON 2011). Paphos, Cyprus. December 5, 2011 [74] J. Guitart, M. Macías, K. Djemame, T. Kirkham, M. Jiang, D. Arsmstrong. Riskdriven Proactive Fault-tolerant Operation of IaaS Providers. To be presented in 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013). Bristol, UK. December 2-5, 2013 [75] OptaPlanner constraint satisfaction solver. [Online]. http://www.optaplanner.org [76] OptaPlanner User Guide. [Online]. https://docs.jboss.org/drools/release/6.0.0.cr5/optaplannerdocs/html_single/#constructionheuristics [77] CloudSuite - Benchmarks. [Online]. http://parsa.epfl.ch/cloudsuite/cloudsuite.html [78] A. Kansal, F. Zhao, J. Liu, N. Kothari, and A. A. 
Bhattacharya, Virtual Machine Power Metering and Provisioning, in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010, pp. 39 50. [79] G. G. Castañé, A. Núñez, P. Llopis, and J. Carretero, E-mc2: A formal framework for energy modelling in cloud computing, Simul. Model. Pract. Theory, vol. 39, no. 0, pp. 56 75, 2013. [80] A. E. H. Bohra and V. Chaudhary, VMeter: Power modelling for virtualized clouds, in Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, 2010, pp. 1 8. [81] H. Yang, Q. Zhao, Z. Luan, and D. Qian, imeter: An integrated {VM} power model based on performance profiling, Futur. Gener. Comput. Syst., vol. 36, no. 0, pp. 267 286, 2014. [82] J. W. Smith, A. Khajeh-Hosseini, J. S. Ward, and I. Sommerville, CloudMonitor: Profiling Power Usage, in Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, 2012, pp. 947 948. [83] F. Farahnakian, P. Liljeberg, and J. Plosila, LiRCUP: Linear Regression Based CPU Usage Prediction Algorithm for Live Migration of Virtual Machines in Data Centers, in Software Engineering and Advanced Applications (SEAA), 2013 39th EUROMICRO Conference on, 2013, pp. 357 364.
133 [84] Z. Jiang, C. Lu, and Y. Cai, VPower: Metering power consumption of VM, 2013 IEEE 4th Int. Conf. Softw. Eng. Serv. Sci., pp. 483 486, May 2013. [85] K. Singh, M. Bhadauria, and S. A. McKee, Real Time Power Estimation and Thread Scheduling via Performance Counters, SIGARCH Comput. Arch. News, vol. 37, no. 2, pp. 46 55, Jul. 2009. [86] Roldan Pozo and Bruce Miller, Java SciMark 2.0, 2004. [Online]. Available: http://math.nist/gov/scimark2/ [87] GEMBIRD Deutschland GmbH. EGM-PWM-LAN data sheet. [Online]. http://gmb.nl/repository/6736/egm-pwm-lan manual 7f3db9f9-65f1-4508- a986-90915709e544.pdf 2013. [88] OpenStack: Open source software for building private and public clouds. [Online]. http://www.openstack.org/ [Accessed - 08/06/15] [89] HAProxy - A Reliable, High Performance TCP/HTTP Load Balancer. [Online]. http://www.haproxy.org/ [Accessed - 08/06/15] [90] JBoss Application Server. [Online]. http://jbossas.jboss.org/ [Accessed - 08/06/15] [91] MySQL - Open Source Database. [Online]. http://www.mysql.com/ [Accessed - 08/06/15]. [92] Packer Identical machine images for multiple platforms. [Online]. https://www.packer.io/ [Accessed - 08/06/15] [93] Vagrant Development environments made easy. [Online]. https://www.vagrantup.com/ [Accessed - 08/06/15] [94] RRDTool. [Online]. http://oss.oetiker.ch/rrdtool/ [95] RRDTool Scalability. [Online]. http://net.doit.wisc.edu/~dwcarder/rrdcache/ [96] JRobin. [Online]. http://www.opennms.org/wiki/jrobin [97] Cube. [Online]. http://square.github.io/cube/ [98] NodeJS. [Online]. http://www.nodejs.org [99] Play! Framework. [Online]. http://www.playframework.com [100] Vaquero, L. M., Rodero-Merino, L., & Buyya, R. (2011). Dynamically scaling applications in the cloud. ACM SIGCOMM Computer Communication Review, 41(1), 45-52. [101] Singh, Ram, Avinash Bhagat, and Navdeep Kumar. "Generalization of Software Metrics on Software as a Service (SaaS)." Computing Sciences (ICCS), 2012 International Conference on. IEEE, 2012. 
[102] Yazbek, Hashem, et al. "Service-Oriented Measurement Infrastructure." Software Engineering Research, Management and Applications (SERA), 2010 Eighth ACIS International Conference on. IEEE, 2010. [103] D3.1. Static Energy Efficiency. ASCETiC Project. CT-2013.1.2 Software Engineering, Services and Cloud Computing. October 2014. [Online]. www.ascetic.eu/assets/docs/usermanual.pdf [104] MongoAL: MongoDB aggregation Language. [Online]. https://github.com/mariomac/mongoal [105] Application Monitor public repository. [Online]. https://github.com/mariomac/appmon [106] DropWizard Metrics. [Online]. https://dropwizard.github.io/metrics/3.1.0/ [107] Advanced Queue Messaging Protocol 1.0. [Online]. https://www.amqp.org/ [108] AngularJS. [Online]. https://angularjs.org/ [109] Bootstrap. [Online]. http://getbootstrap.com/ [110] D3.js. [Online]. http://d3js.org/
134 [111] Highcharts. [Online]. http://www.highcharts.com/ [112] Recent Trends in Energy-Efficient Cloud Computing Toni Mastelic, Ivona Brandic Publication date: 2015/1 Journal: Cloud Computing, IEEE, Volume 2, Issue 1, Pages 40-47 [113] Amazon EC2 Pricing. [Online]. http://aws.amazon.com/ec2/pricing/. [114] GoGrid Cloud Hosting. [Online]. http://www.gogrid.com. [115] RackSpace Cloud Hosting. [Online]. http://www.rackspace.com/cloud. [116] JoyentCloud. [Online]. http://www.joyentcloud.com. [117] ElasticHosts. [Online]. http://www.elastichosts.com/ [118] Windows Azure [Online]. http://www.microsoft.com/windowsazure. [119] D. Lucanin, I. Pietri, I. Brandic, and R. Sakellariou, A Cloud Controller for Performance-Based Pricing, in Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on, 2015, pp. 155 162. [120] Google Cloud Platform Pricing Calculator. [Online]. https://cloud.google.com/products/calculator/ [121] Harmonizing Global Metrics for Data Center Energy Efficiency White Paper. [Online]. http://iet.jrc.ec.europa.eu/energyefficiency/sites/energyefficiency/files/files /documents/ict_coc/dppe_e_20140430.pdf [122] GAMES- Green Data Centers: [Online]. http://www.greendatacenters.eu/ [Accessed: 9 th September 2015]. [123] Data Center Efficiency Assessment. Natural Resources Defence Council. August 2014. [Online]. https://www.nrdc.org/energy/files/data-centerefficiency-assessment-ip.pdf. [124] ECO2Clouds Experiment Awareness of CO2 in Federated Cloud Sourcing. [Online]. http://eco2clouds.eu/. [Accessed: 9 th September 2015]. [125] Nagios The Industry Standard In IT Infrastructure Monitor. [Online]. https://www.nagios.org/. [Accessed: 9 th September 2015]. 
[126] Usman Wajid, Cinzia Cappiello, Pierluigi Plebani, Barbara Pernici, Nikolay Mehandjiev, Monica Vitali, Michael Gienger, Kostas Kavoussanakis, David Margery, David Garcia Perez, Pedro Sampaio, "On Achieving Energy Efficiency and Reducing CO2 Footprint in Cloud Computing," accepted for publication in IEEE Transactions on Cloud Computing, 2015.
[127] Libvirt Virtualization API. [Online]. http://libvirt.org/ [Accessed: 9th September 2015].
[128] A. Kertesz, G. Kecskemeti, and I. Brandic. An interoperable and self-adaptive approach for SLA-based service virtualization in heterogeneous Cloud environments. Future Generation Computer Systems. Elsevier. Vol. 32, pp. 54-68. 2015.
[129] J. Carrasco, J. Cubo, and E. Pimentel. Propuesta de metodología de despliegue de aplicaciones en nubes heterogéneas con TOSCA. XIX Jornadas de Ingeniería del Software y Bases de Datos. pp. 321-334. Cádiz, 2014.
[130] Cloud Orchestration & Automation Made Easy. [Online]. http://getcloudify.org [Accessed: 9th September 2015].
[131] Brooklyn: A library that simplifies application lifecycle and management. [Online]. https://brooklyn.incubator.apache.org/ [Accessed: 9th September 2015].
[132] Leelipushpam, P.G.J.; Sharmila, J., "Live VM migration techniques in cloud environment: A survey," in Information & Communication Technologies (ICT), 2013 IEEE Conference on, pp. 408-413, 11-12 April 2013.
[133] VMware, Virtual Machine Migration Comparison: VMware vSphere vs. Microsoft Hyper-V. [Online]. http://www.vmware.com/files/pdf/vmwvmotion-verus-live-migration.pdf
[134] Nasim, Robayet; Kassler, Andreas J., "Network-centric Performance Improvement for Live VM Migration," in Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on, pp. 106-113, June 27 - July 2, 2015.
[135] Koerner, Marc; Stanik, Alexander; Kliem, Andreas, "An Approach for QoS Constraint Networks in Cloud Environments," Fourth International Conference on Network of the Future (NoF'13), pp. 1-3.
[136] Koerner, Marc; Stanik, Alexander; Kao, Odej, "Applying QoS in Software Defined Networks by Using WS-Agreement," Cloud Computing Technology and Science (CloudCom), Proceedings of the 2014 IEEE 6th International Conference on. IEEE Computer Society, pp. 893-898, 2014.
[137] Open Virtualization Format (OVF): A standard from the Distributed Management Task Force. [Online]. http://www.dmtf.org/standards/ovf [Accessed: 08/06/15].