ADAPTIVE CLOUD SCHEDULING


A dissertation submitted to The University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences

2014

Abdelkhalik Elsaid Mohamed Mosa

School of Computer Science

Table of Contents

Abstract
Declaration
Copyright
Acknowledgements
Chapter 1: Introduction
    Motivation
    Research Aims, Objectives and Scope
    Methodology
    Contributions
    Dissertation Organization
Chapter 2: Background and Related Work
    Cloud Computing
        Cloud Deployment Models
        Cloud Services Architecture
        Cloud Computing Enabling Technologies
    Virtualization Technology
    Green Computing
    Related Work
        Heuristic Approach
        Utility Functions
    Cloud Computing Simulation Tools
        Existing Cloud Simulators
        CloudSim
Chapter 3: Design
    System architecture
    Input, Processing and Output Model
    Utility Function Definition
    Cost Model Development
        The Finite Discrete Markov Chain Prediction Model
        Modelling CPU Utilization using Markov Chain Prediction
        CPU Utilization based on VM utilization
        Calculating Energy Consumption
        Calculating Possible Sources of SLA Violation
    Optimization Algorithm
        Representation design
        Initial Population
        Evaluation/Fitness function
        Genetic operators
        Convergence
Chapter 4: Implementation
    Steps for creating a basic cloud datacentre
        Initializing the CloudSim Package
        Creating the Data Centre
        Creating the Cloud Broker
        Creating the List of the Virtual Machines
        Creating the cloudlets
        Starting the Simulation
        Stopping the Simulation
        Printing the results
    Implementing the Adaptive VMs Assignment
        Finding Source and Destination Hosts
        The implementation of the utility function
    Integrating the utility based strategy with CloudSim
    The Class Hierarchy
    Elements of the Adaptive cloud scheduling system
    Configuring the Experiments
    Conclusion of the Implementation
Chapter 5: Evaluation
    Performance Metrics
    Experiments Setup
    Experiment
    Experiment
    Experiment
    Conclusion
Chapter 6: Conclusion and Future Ideas
    Conclusion and Discussion
    Future Work
        Improving the Cost Model
        Considering all Computing Resources
        Multi-objective Optimization
        Generalized Framework for Adaptive Cloud Scheduling
        Improving the search
Bibliography
Word count:

List of Figures

Figure 2-1: Cloud Deployment Models, from [10]
Figure 2-2: Cloud architecture stack diagram, from [10]
Figure 2-3: Operating System virtualization, from [10]
Figure 2-4: Hypervisor based virtualization, from [10]
Figure 2-5: Hosted Virtualization
Figure 2-6: Current State, Action, and Possible State, from [34]
Figure 2-7: The Action policies example
Figure 2-8: The goal policies example
Figure 2-9: Utility policy example
Figure 2-10: CloudSim architecture, from [5]
Figure 3-1: Green cloud system Architecture, from [45]
Figure 3-2: Input, Processing and Output of the Adaptive Scheduling Problem
Figure 3-3: CPU Utilization Transition State Diagram
Figure 3-4: CPU Utilization Prediction Algorithm Using Markov Model
Figure 3-5: The pseudo-code for computing CPU Utilization
Figure 3-6: Power Consumption according to different utilizations, from [31]
Figure 3-7: Predicted Energy Cost
Figure 3-8: Violation cost depending on the number of VMs in violation
Figure 3-9: Pseudo-code for Calculating the Cost of PDM
Figure 3-10: General framework for the evolutionary algorithm, from [49]
Figure 3-11: Solution vector representation
Figure 4-1: Steps for creating a basic cloud datacentre
Figure 4-2: The class hierarchy of the cloud adaptive scheduling problem
Figure 4-3: Monitoring, Analysis, Planning and Execution (MAPE) loop design model
Figure 5-1: Overall SLA violation to energy consumption after running configuration 1.1 of the first experiment, 10 times, using the utility and the heuristics based approaches
Figure 5-2: Overall SLA violation to energy consumption after running configuration 1.2 of the first experiment, 10 times, using the utility and the heuristics based approaches
Figure 5-3: Overall SLA violation to energy consumption after running configuration 1.3 of the first experiment, 10 times, using the utility and the heuristics based approaches
Figure 5-4: Overall SLA violation to energy consumption running Configuration times using the utility and the heuristics based approaches
Figure 5-5: Overall SLA violation to energy consumption running Configuration times, using the utility and the heuristics based approaches
Figure 5-6: Overall SLA violation to energy consumption running Configuration 3.1 and configuration times using the utility and the heuristics based approaches

List of Tables

Table 5-1: Running Configuration times using the utility based approach
Table 5-2: Running Configuration times using the heuristics based approach
Table 5-3: Running Configuration times using the utility based approach
Table 5-4: Running Configuration times using the utility based approach
Table 5-5: The results of allocating 150 VMs to 150 hosts after running the experiment 10 times using the utility based approach
Table 5-6: The results of allocating 150 VMs to 150 hosts after running the experiment 10 times using the heuristics based approach
Table 5-7: Summary results of experiment
Table 5-8: The results of allocating 150 VMs to 100 hosts after running Configuration times using the utility based approach
Table 5-9: The results of allocating 150 VMs to 100 hosts after running Configuration times using the heuristics based approach
Table 5-10: The results of allocating 200 VMs to 100 hosts after running Configuration 2.3, 10 times using the utility based approach
Table 5-11: The results of allocating 200 VMs to 100 hosts after running Configuration times using the heuristics based approach
Table 5-12: Summary results of experiment
Table 5-13: The results of allocating 50 VMs to 50 hosts with energy cost of 3 after running Configuration times using the utility based approach
Table 5-14: Summary results of experiment

List of Codes

Code 4-1: Building the initial population of assignments
Code 4-2: Selecting parents for mutation
Code 4-3: Selecting parents for crossover
Code 4-4: The getmutated() method
Code 4-5: The getcrossover() method
Code 4-6: The population for the next generation
Code 4-7: Calculation of the energy cost and the violation cost
Code 4-8: Calculating PDM cost

Abstract

Cloud computing plays a significant role in today's computing by delivering computing resources as pay-as-you-go services over the Internet. Many organizations and individuals all over the world rely on cloud environments to support their applications, platforms and even infrastructure. As a result of the huge demand for cloud services, cloud providers have had to build enormous data centres to meet this increase in users' needs. However, these huge data centres consume great amounts of power, which not only contributes to the data centres' operating costs but also increases carbon dioxide emissions. Energy-efficient algorithms are therefore required for minimizing the operating costs of the data centres and for building green, energy-aware cloud environments. Our goal in this work is to design and evaluate an optimized adaptive resource allocation and management algorithm that dynamically assigns virtual machines to the existing hosts in the cloud data centre. This algorithm will not only save energy but also meet the agreed-upon quality of service (QoS). Existing work followed a heuristic approach for managing the energy-performance trade-off. In contrast, this work makes use of utility functions along with optimization for deciding which VMs should be allocated to which physical hosts while achieving the desired goal. This work describes in detail the selection of the utility properties, the creation of the utility function, and the relevant optimization technique for maximizing the required utility. The proposed technique is validated by analysing and evaluating its performance using the CloudSim framework.

Declaration

No portion of the work referred to in this dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. The author of this dissertation (including any appendices and/or schedules to this dissertation) owns certain copyright or related rights in it (the "Copyright"), and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this dissertation, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has entered into. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the dissertation, for example graphs and tables ("Reproductions"), which may be described in this dissertation, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this dissertation, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy, in any relevant Dissertation restriction declarations deposited in the University Library, in The University Library's regulations, and in The University's Guidance for the Presentation of Dissertations.

Acknowledgements

First and foremost, praises and thanks to Allah SWT, the Almighty, who has granted me countless blessings, wisdom and perseverance during this research project and, indeed, throughout my life. I gratefully express my deepest gratitude to my supervisor, Professor Norman Paton, for his great mentorship and wise supervision. Without your continuing support and encouragement, this dissertation would never have become a reality. My beloved parents, the people in this world whom I love most from the bottom of my heart: I appreciate all the sacrifices you made to raise me. I could not find words precious enough to express my grateful thanks to my beloved brothers (Mahmoud, Mohamed, Kamal, Yasser, Salah and Emad) and sisters (Fatema and Amany). Moreover, I am grateful to my dearest wife Walaa for putting up with my late hours, my spoiled weekends and my bad temper; I love you. Finally, I would like to thank Dr. Ahmed Sobhy and Dr. Tarek Gaber; I cannot forget your help and support. Love and thanks are also extended to Eng. Mostafa Zayed, Dr. Abdulrahman Alghamdi, Eng. Abdullah Al-Ahmari, Dr. Mohamed El-Sawy and Eng. Mohamed Safwat.

Chapter 1: Introduction

This chapter presents the motivation for this research and delineates the aims and objectives of the project. It also provides an outline summarizing what will be done in the following chapters.

1.1 Motivation

Cloud computing is a new computing paradigm for delivering computing resources and services over the Internet [1]. Organizations, business owners and even individuals have started using cloud services extensively instead of building their own data centres to provide the required services. Due to the high demand for cloud services, cloud providers had to build large-scale data centres to meet cloud users' needs. For example, according to [2], in 2012 Amazon EC2 had more than 454,000 servers in 7 different regions all over the world, and this number is continuously increasing. These giant data centres with hundreds of thousands of servers consume great amounts of energy. This consumption results in a notable increase in the data centres' operating costs, which also affects cloud users. Moreover, it harms the environment through the increased carbon dioxide emissions from these data centres. As a result, reducing energy consumption in cloud data centres became a goal for cloud providers, both for their own benefit and for the environment.

Reducing wasted energy involves two parallel actions. The first is increasing the efficiency of power consumption in the computing nodes of the cloud infrastructure. The second is improving resource utilization, which can be done by deploying efficient resource monitoring and scheduling algorithms. Improving the infrastructure efficiency is a hardware issue, so it is out of the scope of this work. Building efficient resource scheduling algorithms that save energy while meeting the service level agreement (SLA) is, in contrast, the main goal of this work.

Many data centre hosts keep running even though they are underutilized. The average CPU utilization is less than 50% [3], and even a completely idle server still consumes about 70% of the maximum power that the server normally consumes [4]. Therefore, efficient resource allocation and management algorithms are required to alleviate the energy consumption problem and the resulting CO2 emissions. These algorithms should switch underutilized servers into sleep mode after migrating their virtual machines to other servers that are not underutilized. By switching these unused servers into sleep mode, the amount of energy consumed and the CO2 emissions will be reduced. Moreover, the cloud provider's return on investment (ROI) will increase as the total energy cost is reduced. In addition, the resource allocation algorithm will migrate virtual machines away from overloaded servers to meet the required quality of service (QoS), which achieves the desired level of user satisfaction. To sum up, the expected benefits of efficient resource monitoring and scheduling algorithms for cloud users, cloud providers and the environment were the driving force for conducting this research.

1.2 Research Aims, Objectives and Scope

The aim of the project is to design, implement and evaluate an optimized, energy-aware, adaptive resource scheduling algorithm. This algorithm should dynamically assign virtual machines to physical hosts in the cloud data centre, taking into consideration both minimizing power consumption and meeting the negotiated service level agreements (SLAs). The proposed strategy depends on utility functions and evolutionary algorithms for finding an effective and efficient assignment of VMs; each assignment is rated against the utility fitness function. To achieve the aims of this research, the following objectives need to be accomplished:

1. Identifying and analysing existing techniques for the dynamic allocation of virtual machines to physical hosts in cloud computing environments.

2. Selecting the properties that are required for the utility definition.
This will be followed by defining the utility function, which aims to capture the utility of an assignment without violating the cloud user's and cloud provider's constraints. This will inform the definition of the associated cost model for computing energy consumption and SLA violation costs.

3. Implementing a search over the space of possible assignments of VMs to physical hosts, using genetic algorithms.

4. Extending the current CloudSim toolkit [5] by setting up the experiment and integrating it with the implementation of the optimization algorithm.

5. Evaluating and comparing the proposed utility-based policy with the existing heuristic, action-based techniques in [6] using performance metrics.

This research addresses the problem of dynamically allocating virtual machines to hosts based upon a utility-based policy. The task of dynamically allocating workload to the virtual machines is, in contrast, out of the scope of this research.

1.3 Methodology

The following research methodology was followed to achieve the research aim.

1. Background reading and literature review: Background reading and a review of the literature were conducted to identify what has been previously addressed and how it was addressed. The reading involved reviewing research papers, journal articles and book chapters that describe the strategies and techniques required for addressing the research problem. The background reading was also crucial for understanding cloud computing concepts, the cloud deployment models and the cloud services architecture. Moreover, this reading clarified how efficient cloud computing environments can help in creating a green and energy-efficient environment. The background reading was followed by a review of the state of the art in dynamic cloud scheduling techniques. In particular, the heuristic approach proposed in [6] and [31] for the dynamic consolidation and deconsolidation of VMs, which saves energy while meeting the SLA, was thoroughly reviewed, as it is the approach against which the results of this research will be compared. The review and the background reading were important so as not to re-invent the wheel. Moreover, they enabled building a solid background and a deep understanding of the technical concepts related to the research and to cloud computing generally.

2. Reviewing existing policies for handling self-management systems: Understanding the different policies used for achieving self-management in cloud computing environments is crucial to this research, as adaptive cloud scheduling involves monitoring and self-management of the cloud data centre. The review showed that there are three major policy types that can be used for building autonomic computing systems [7]: rule-based action policies, goal policies and utility-function-based policies. The utility-function-based policy is the self-management policy that has been deployed for monitoring and managing the cloud data centre in this research.

3. Choosing and understanding the simulation toolkit: A survey of currently used cloud simulation tools was conducted. The survey showed that there is a number of cloud simulation tools such as MDCSim, GreenCloud, iCanCloud and CloudSim. A comparison among these tools showed that iCanCloud is currently the most powerful cloud modelling and simulation tool [8], as it provides more features than any of the other simulation toolkits. However, the strategy proposed in this research will be simulated using the CloudSim toolkit, so that the results of this project can be easily compared with the results of the heuristic approach [6], which was implemented using CloudSim.

4. Understanding the heuristic approach and testing its results: The heuristic approach in [6] was closely studied. The experiments reported for this approach were checked and analysed by rerunning them using the CloudSim toolkit. Moreover, the values of the performance metrics were thoroughly reviewed and reported.

5. Developing the proposed strategy using utility functions: The first step in the development was finding where the implementation of the utility-based approach should be hooked into the existing CloudSim toolkit. Genetic algorithms were used to optimize the utility function and find a robust assignment that respects the constraints. The implementation of the genetic algorithm started by building the initial population of VMs-to-hosts assignments and selecting parents for the mutation and crossover operations. This was followed by the implementation of the mutation and crossover functions. The utility function was implemented according to the description and design shown in Chapter 3. CloudSim was extended to support the utility-based approach by integrating the optimization problem with the cloud environment.

6. Project evaluation: After setting up the experiment, application workloads (cloudlets) were simulated using synthetic data. The next step was choosing and defining the performance metrics, followed by defining the different experiments to be conducted and the objectives of each. The objective of the first experiment is to assess the effectiveness of the utility-based strategy in a lightly loaded data centre; in this experiment the number of VMs equals the number of hosts to which they are allocated. The second experiment appraises the impact of a larger number of VMs per physical machine on both the energy consumption and the overall SLA violations. The last experiment assesses the impact of different cost ratios on the strategy. General conclusions are drawn for all the conducted experiments, with supporting graphs that compare the results of the proposed approach with those of the heuristic approach [6].

1.4 Contributions

The contributions of this research can be classified into five different areas:

1. Survey and literature review: The first contribution is a survey of the state of the art in dynamic resource allocation techniques in cloud computing environments.

2. Utility definition: The second contribution is the definition of the utility function and the identification of the utility properties. This utility function represents the objective of the optimization problem.

3. Cost model design: The third contribution is the cost model, which predicts the percentage of CPU utilization. The predicted CPU utilization is used for computing both the energy consumption cost and the SLA violation (SLAV) cost, which are crucial for calculating the utility.

4. Metaheuristic optimization: The fourth contribution is the design and implementation of the genetic algorithm that searches over the space of all possible assignments of VMs to physical hosts. This genetic algorithm seeks to find an effective assignment rated against fitness criteria.

5. Performance evaluation: The last contribution is the evaluation of the performance metrics resulting from running the proposed utility-based approach. The evaluation also involves analysing the results and comparing them with the results of the heuristics-based approach [6].

1.5 Dissertation Organization

The dissertation contains six chapters. Chapter 1 has presented the motivation behind this research, the research aims, objectives and scope, the methodology and the main contributions. The remainder of the dissertation is structured as follows:

Chapter 2: Background and Related Work. This chapter examines the general background related to cloud computing, together with a review of energy-efficient computing. This is followed by a review of current dynamic cloud scheduling techniques and a review of utility functions and how they can be used to solve our problem. Finally, it surveys currently used cloud simulation tools and reviews CloudSim in depth.

Chapter 3: Design. This chapter describes the problem and the system model, in addition to the definition of the utility function and the design of the cost model. Moreover, it covers the representation design and the design of the genetic algorithm.

Chapter 4: Implementation. This chapter describes all the steps required for implementing the genetic algorithm and the utility function. The implementation also involves setting up the experiment using CloudSim and integrating the implemented algorithm with CloudSim.

Chapter 5: Evaluation. This chapter presents all the parameters required for setting up the experiment and the definition of the performance metrics. An analysis of the simulation results follows the running of the experiments.

Chapter 6: Conclusion and Future Work. This chapter summarizes the results obtained and draws conclusions. Finally, it lists a number of future ideas that need further research.
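As a concrete illustration of the approach outlined in this chapter, the genetic algorithm searches over assignment vectors in which position i holds the index of the host that VM i is placed on. The sketch below is a minimal, hypothetical illustration, not the actual CloudSim integration of Chapter 4: the class name, the fixed host capacity and the toy fitness (fewer active hosts as an energy proxy, host overload as an SLA-violation proxy) are all assumptions standing in for the utility function and cost model defined in Chapter 3.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Minimal sketch of a GA searching over VM-to-host assignments.
 * assignment[i] = index of the host that VM i is placed on.
 * The fitness is a stand-in for the utility function of Chapter 3.
 */
public class GeneticAssignment {

    static final int CAPACITY = 4; // illustrative: VMs a host can take without "violation"

    // Higher is better: -(hosts used) - 10 * (overloaded VM slots).
    static double fitness(int[] assignment, int numHosts) {
        int[] load = new int[numHosts];
        for (int host : assignment) load[host]++;
        int used = 0, overload = 0;
        for (int l : load) {
            if (l > 0) used++;
            if (l > CAPACITY) overload += l - CAPACITY;
        }
        return -used - 10.0 * overload;
    }

    static int[] randomAssignment(int numVms, int numHosts, Random rnd) {
        int[] a = new int[numVms];
        for (int i = 0; i < numVms; i++) a[i] = rnd.nextInt(numHosts);
        return a;
    }

    // One-point crossover: copy p1, then take p2's genes after a random cut.
    static int[] crossover(int[] p1, int[] p2, Random rnd) {
        int cut = 1 + rnd.nextInt(p1.length - 1);
        int[] child = Arrays.copyOf(p1, p1.length);
        System.arraycopy(p2, cut, child, cut, p2.length - cut);
        return child;
    }

    // Mutation: move one randomly chosen VM to a random host.
    static void mutate(int[] a, int numHosts, Random rnd) {
        a[rnd.nextInt(a.length)] = rnd.nextInt(numHosts);
    }

    // Binary tournament selection: pick two candidates, keep the fitter.
    static int[] tournament(int[][] pop, int numHosts, Random rnd) {
        int[] a = pop[rnd.nextInt(pop.length)], b = pop[rnd.nextInt(pop.length)];
        return fitness(a, numHosts) >= fitness(b, numHosts) ? a : b;
    }

    /** Evolve a population and return the best assignment found. */
    public static int[] run(int numVms, int numHosts, int generations, Random rnd) {
        int popSize = 30;
        int[][] pop = new int[popSize][];
        for (int i = 0; i < popSize; i++) pop[i] = randomAssignment(numVms, numHosts, rnd);
        for (int g = 0; g < generations; g++) {
            int[][] next = new int[popSize][];
            for (int i = 0; i < popSize; i++) {
                int[] child = crossover(tournament(pop, numHosts, rnd),
                                        tournament(pop, numHosts, rnd), rnd);
                if (rnd.nextDouble() < 0.2) mutate(child, numHosts, rnd);
                next[i] = child;
            }
            pop = next;
        }
        int[] best = pop[0];
        for (int[] cand : pop)
            if (fitness(cand, numHosts) > fitness(best, numHosts)) best = cand;
        return best;
    }

    public static void main(String[] args) {
        int[] best = run(12, 6, 40, new Random(42));
        System.out.println("Best assignment: " + Arrays.toString(best)
                + " fitness=" + fitness(best, 6));
    }
}
```

Running main evolves 30 candidate assignments for 40 generations and prints the fittest one; in the dissertation proper, this toy fitness is replaced by the utility computed from the predicted energy and SLA-violation costs.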

Chapter 2: Background and Related Work

This chapter describes the general background and the literature review required as a starting point for this research. The background involves reviewing the information required for understanding cloud computing and its related technologies. This general background is followed by a review of previous work related to dynamic resource allocation techniques in cloud computing environments.

2.1 Cloud Computing

Cloud computing is a new computing paradigm in which computing resources (hardware, platforms and application software) are provided as elastic, on-demand services over the Internet [1]. In this computing model, all computation and data storage are done by remote hosts located in the cloud providers' data centres. These remote data centres, which utilize virtualization technology to consolidate multiple virtual servers onto physical servers, are what is technically referred to as the cloud [9]. The deployment of cloud computing offers many advantages over conventional data centres. For example, cloud computing provides computing resources on demand and in an elastic manner, so that resources can be increased or decreased according to cloud consumers' needs. Furthermore, cloud computing eliminates up-front commitments by cloud users, simplifies server operation and management, and improves resource utilization via the virtualization of physical servers [43].

2.1.1 Cloud Deployment Models

The cloud computing environment is called a public (or external) cloud when the cloud services are delivered to the public on a pay-as-you-go basis [10]. In a public cloud, therefore, the cloud providers and users belong to different organizations or companies. Many international companies, such as Amazon, IBM, Microsoft and Oracle, provide public cloud services. Due to security considerations such as availability and data privacy, some organizations build their own private (or internal) clouds, which are accessible only by authorized users within the organization and not publicly accessible to non-authorized users. In contrast to the public cloud, both cloud users and providers in a private cloud belong to the same organizational entity. In between, the hybrid cloud provides some services in-house by making use of a private cloud, while other services are provided by public clouds. On one hand, hybrid clouds use private clouds to support the security-critical services; on the other hand, the non-critical services are performed by the public cloud to take advantage of its scalability and cost effectiveness [9], [10]. Figure 2-1, taken from [10], shows the different cloud deployment models.

Figure 2-1: Cloud Deployment Models, from [10]

2.1.2 Cloud Services Architecture

The architecture of cloud computing services is represented by a layered (stack) model, in which each layer has specific functions and provides its own services [10]. Proposals differ in the number of layers, and over time new layers may be added to provide specific services. Regardless of the number of layers, cloud environments generally follow the Everything as a Service (XaaS) paradigm, where X is a variable that refers to the category of the provided service. According to [10], the cloud environment may support four distinct services in four main layers, namely Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and, finally, Human as a Service (HaaS). Figure 2-2, from [10], shows the layered architecture of cloud services.

Figure 2-2: Cloud architecture stack diagram, from [10]

As shown in Figure 2-2, the IaaS layer consists of two sub-layers, namely the resource set and the infrastructure services. The resource set sub-layer represents the physical and virtualized resources such as processing, storage, memory and bandwidth. The infrastructure services sub-layer provides services for the infrastructure, such as storage services as in Amazon S3 [11], Dropbox [12] and Google Bigtable [13], or virtual server recovery as in Bluelock virtual recovery [14].

The PaaS layer provides developer-oriented services, such as different programming and execution environments and database management systems (DBMSs). Examples of well-known PaaS offerings are Google App Engine [15] and the Facebook platform [16]. SaaS provides applications directed at end users, such as office suites, image processing or customer relationship management (CRM) applications. SaaS relieves end users from installing and updating software. Examples of SaaS are Google Docs [17], Adobe Photoshop Express [18] and Salesforce.com [19]. Finally, the HaaS layer shows that the cloud computing paradigm includes not only IT services but also services provided by people. The main category of the HaaS layer is crowdsourcing, where work is done online by a crowd of people. Crowdsourcing enables the provision of services that are either impossible for computers or cannot easily or accurately be done by them, such as design services and accurate full-text translation. Amazon Mechanical Turk [20] is an example of a marketplace that offers crowdsourcing services.

2.1.3 Cloud Computing Enabling Technologies

Technically speaking, cloud computing represents the natural evolution of computing paradigms, and hence it is not a revolution. Cloud computing makes use of a number of existing technologies such as distributed computing, autonomic computing, virtualization, web services, service-oriented architecture (SOA) and Web 2.0. Among these technologies, virtualization is considered the most important, as it represents the most significant difference between a traditional data centre and a cloud data centre [21]. As a result of its importance, and its relevance to this project on adaptive cloud scheduling, virtualization technology will be discussed in some detail.

2.2 Virtualization Technology

Cloud computing would not be feasible without virtualization technology. Virtualization is a technique that breaks the physical computing resources of a host/server down into a number of fully isolated virtual environments or machines with different operating systems and applications [10]. Hardware virtualization helps reduce hardware costs and energy consumption, as one physical server can be used to logically create a group of virtual servers, which in turn increases resource utilization.

Virtualization provides a number of tangible benefits for cloud providers, cloud users and the environment. For cloud providers, virtualized servers can use the physical resources efficiently by fully utilizing the existing resources, which reduces operational costs. In addition, the management of virtual servers can be automated, so that no user intervention is required to allocate virtual machines to the physical hosts. For cloud users, the process of building servers and running applications becomes easier and faster; moreover, they pay only for the resources they actually use, reducing the need for overprovisioning to cope with peak demand. Finally, cloud computing helps reduce power consumption, which in turn helps reduce CO2 emissions.

Virtualization can be applied at different levels, such as the operating system, platform (either full virtualization or paravirtualization), storage and application levels [10]. Operating system virtualization creates multiple identical isolated containers that share the same operating system kernel. These containers are also called jails, virtual private servers (VPS) and virtualization engines (VE). This technique is commonly used in virtual hosting environments, as the resulting overhead is small compared to other virtualization techniques. Figure 2-3, from [10], illustrates operating system virtualization.

Figure 2-3: Operating System virtualization, from [10]

In contrast to operating system virtualization, platform virtualization enables users to run different operating systems simultaneously.
There are two types of platform virtualization; the first type is full virtualization while the second type is

paravirtualization. Full virtualization emulates the entire virtual machine and deploys either a hypervisor or a hosted architecture. In the hypervisor architecture, also called bare-metal, the virtualization layer (the hypervisor or virtual machine monitor (VMM)) is installed directly above the hardware, as shown in Figure 2-4 from [10]. In the hosted architecture, the VMM is installed over the host operating system and the hosted guest OSs are located above the VMM, as shown in Figure 2-5. VMware Workstation and Oracle VirtualBox are two examples of hosted desktop virtualization. In contrast to OS virtualization, the hosted architecture allows for creating a number of virtual machines with different operating systems. In paravirtualization, there is communication between the guest OS and the hypervisor layer, and it involves modifications to the operating system kernel.

Figure 2-4: Hypervisor based virtualization, from [10]

Figure 2-5: Hosted Virtualization (applications and guest OSs running above a VMM installed on the host operating system and hardware)

In storage virtualization, multiple network storage devices are pooled together to be logically seen as a single storage device. This pooling of storage resources makes the tasks of backup and recovery easier due to the central management of the distributed storage. Network virtualization allows the creation of a number of virtual local area networks (VLANs) from a single physical network. This type of virtualization makes cloud resources appear in the local network. In addition, cloud services can be accessed via virtual IPs instead of real ones. Application virtualization insulates the application software from the OS on which it runs [10], allowing applications built for one OS to run on another. For instance, to run MS Windows applications on a Unix OS, the user can use software such as Wine [22].

Virtualization allows consolidating a number of VMs onto one physical host instead of running multiple independent hosts. This VM consolidation makes the process of creating new servers easier and on demand [23]. Furthermore, it contributes to minimizing energy consumption in cloud environments compared to traditional data centres. However, minimizing energy consumption in the IaaS layer of a cloud environment is still a research challenge; solving it efficiently contributes to building a green cloud computing environment [21].

2.3 Green Computing

Green computing, also called green IT or energy-efficient computing, aims at building energy-efficient computers or any other technology that is environment-friendly [21], [24]. Moreover, green computing is also concerned with reducing the energy consumption of computing resources by implementing energy-efficient techniques. These techniques involve saving energy across all possible computing resources, including the CPU, storage, cooling systems, interfaces, and even network devices in the cloud data centre.
Improving energy efficiency by minimizing the energy consumed by all computing resources contributes to creating a green cloud computing environment [21]. Green cloud computing solutions are for the good of cloud users, cloud providers, and the environment. Building such solutions reduces energy consumption, minimizes operating costs and hence saves money. Moreover,

the amount of carbon dioxide emitted will be reduced, which has a positive impact on the environment.

2.4 Related Work

Cloud resource scheduling is the process of allocating either VMs onto physical hosts or workload onto VMs, according to constraints given by both the cloud users and providers [25]. According to one of the existing classifications, cloud scheduling techniques can be either static or dynamic [26]. Static scheduling requires that information about all resources and tasks is available by the time the application is scheduled; in addition, the resources are assumed to be available all the time. In dynamic scheduling, by contrast, the scheduler doesn't have any knowledge about the resources in advance, which means that any failure or change of resources is considered and handled through rescheduling.

Various proposals have addressed scheduling in the cloud and other distributed environments. Cardosa et al. [27] addressed the allocation of virtual machines to physical hosts with the goal of minimizing energy consumption in virtualized heterogeneous computing environments. The authors made use of existing features in virtualization technologies such as Xen [28] and VMware [29]. Existing VMMs already provide a number of handy parameters such as min, max, and shares. The min and max parameters specify the minimum and maximum amount of resources that can be allocated to a VM, while the shares parameter governs how contended physical resources are divided among the VMs. Previous allocation techniques didn't take advantage of these parameters. Experiments showed that making use of them improves the data centre utility by 47%. In their approach, the amount of resources allocated to VMs can be fine-tuned based upon power consumption and application utilities.
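To illustrate how the min, max and shares parameters interact, the following sketch gives a simplified, hypothetical interpretation of these VMM controls. The single-pass redistribution and all the names are assumptions for illustration, not Cardosa et al.'s actual mechanism.

```python
def allocate(vms, capacity):
    """Simplified sketch of min/max/shares semantics: every VM is
    guaranteed its 'min', leftover capacity is divided in proportion
    to 'shares', and each grant is capped at 'max'. A real VMM would
    also redistribute the surplus freed by capped VMs; this sketch
    deliberately does not."""
    grant = {v['name']: v['min'] for v in vms}      # guaranteed minimums
    spare = capacity - sum(grant.values())          # capacity left to share
    total_shares = sum(v['shares'] for v in vms)
    for v in vms:
        extra = spare * v['shares'] / total_shares  # proportional slice
        grant[v['name']] = min(v['max'], grant[v['name']] + extra)
    return grant
```

With two VMs holding shares 1 and 3, the second VM receives three times as much of the spare capacity as the first, on top of its guaranteed minimum.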
The first shortcoming of this work is that it only handles static allocation, which means that the resources assigned to the VMs can't be adjusted at run-time. In addition, it didn't uphold rigid SLAs, and it requires previous knowledge of the priorities of the applications in order to define the shares parameter. Finally, the CPU was the only resource taken into account when making VM reallocation decisions.

Verma et al. [30] implemented a cost-aware technique for the dynamic allocation of applications to virtual machines. The cost-aware technique handles both the power and migration costs. The authors applied heuristics for the bin packing problem and relied upon continuous optimization to balance power consumption against performance. However, the proposed algorithms didn't support strict SLA requirements, and violations of the service level agreements can occur because of workload variability. In addition, this work addressed the applications-to-VMs placement problem but didn't handle the VMs-to-hosts allocation problem.

Despite the great efforts in the previously stated resource allocation techniques in [27] and [30], they either concentrated on application placement or didn't strictly handle the required SLA. Anton Beloglazov et al. [31] developed an energy-aware VM allocation technique by following a heuristic approach that manages the energy-performance trade-off. This heuristic method is primarily based upon the analysis of historical data of the virtual machines' resource usage. This work followed a divide-and-conquer approach, splitting the problem into four smaller sub-problems. This heuristic, action-based approach is described in more detail below, as it is the main work that the proposed project will be compared against.

2.4.1 Heuristic Approach

The first sub-problem addressed by Anton Beloglazov et al. [31] was when to migrate a virtual machine. They, in turn, divided this sub-problem into two others: host overload detection and host under-load detection. An overloaded host requires transferring a number of virtual machines from the overloaded host to another, non-overloaded host. This migration process helps in meeting the service level agreement (SLA) by avoiding performance degradation.
On the other hand, all virtual machines in an underutilized host should be migrated to another host. After this migration, the host should be switched into sleep mode to minimize energy consumption and hence improve the resource utilization of the host or hosts to which the virtual machines are migrated.

Host Overload/Underload Detection

Three basic techniques are used to determine when a host is considered overloaded [31]. The first uses static thresholds for both the upper and lower bounds of resource utilization: the overall CPU utilization of all VMs in a host should always lie between the upper and lower limits. This technique is easy to apply and may be effective in environments with a static workload. However, static thresholds are not appropriate for environments with dynamic, changeable and unpredictable workloads. The other two techniques adjust the utilization threshold value based upon the applications' workload patterns. The first is based upon an adaptive utilization threshold using either the Median Absolute Deviation (MAD) or the Interquartile Range (IQR). MAD is a measure of statistical dispersion, and it is more robust than the standard deviation and variance as it is more resistant to outliers [31]. The second technique relies on either local regression (LR) or robust local regression (LRR). According to the authors' evaluation, the local regression-based algorithm outperformed the other techniques.

The authors proposed a straightforward algorithm for host under-load detection. The algorithm tries to move all virtual machines in the current host to other hosts. If this migration is possible, the host is considered under-loaded; otherwise it is not treated as under-loaded.

VM Selection

The second sub-problem was: which VMs from the overloaded host should the algorithm select for migration? The authors proposed three different techniques for VM selection, namely the minimum migration time policy (MMT), the random selection policy (RS) and the maximum correlation policy (MC). Using the MMT policy, the selected VM is the one that can be migrated faster than any other virtual machine in the overloaded host, i.e. the one with the least migration time.
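The MAD-based adaptive overload check described above can be sketched as follows. The safety multiplier and the exact threshold formula are assumptions for illustration, not the precise parameterisation used in [31].

```python
import statistics

def mad_threshold(cpu_history, safety=2.5):
    """Adaptive upper utilization threshold in the spirit of [31]'s
    MAD technique: the more variable the host's CPU history, the
    lower the threshold. `safety` is an assumed tuning parameter."""
    median = statistics.median(cpu_history)
    mad = statistics.median(abs(u - median) for u in cpu_history)
    return 1.0 - safety * mad

def is_overloaded(cpu_history, current_utilization):
    """A host is flagged overloaded once its current CPU utilization
    reaches the adaptive threshold."""
    return current_utilization >= mad_threshold(cpu_history)
```

A perfectly steady history leaves the threshold at 100%, while a bursty history pulls it down, triggering migrations earlier.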
The random selection policy chooses the virtual machine to be migrated based upon a uniformly distributed discrete random variable whose values represent the set of VMs allocated to the host [31]. The maximum correlation policy (MC) selects the virtual machines whose CPU utilization has the highest correlation with that of the other VMs on

the same host. This correlation is calculated by applying the multiple correlation coefficient. According to the authors' evaluation, MMT was the best policy for VM selection.

VM Placement

The third sub-problem was the placement of the VMs selected for migration from the overloaded and underutilized hosts. The VM placement problem is an instance of the bin packing problem. The authors solved it by modifying the best fit decreasing (BFD) algorithm [32] to make it power-aware, naming the modified algorithm Power-Aware BFD (PABFD). The PABFD algorithm works as follows: it sorts the list of virtual machines in decreasing order of their CPU utilizations, and each virtual machine in the list is allocated to the host that yields the lowest increase in power consumption after the allocation. By applying this algorithm, the more energy-efficient machines are chosen.

Switching idle hosts off

The fourth and last sub-problem was which hosts to turn on or off, and when. Switching an idle host off saves power, because a completely idle host/server still consumes about 70% of its peak power. Some idle hosts should be reactivated and switched on in case of any violation of the SLA.

The work in [6], [31] used heuristics for the adaptive cloud scheduling problem. In this work, an alternative approach is deployed for finding an efficient and robust schedule for the same problem. The proposed approach utilizes utility functions along with genetic algorithms; therefore, the following section discusses utility functions and how they can be used for creating self-management systems.

2.4.2 Utility Functions

Utility functions provide a common framework for creating self-managed and self-optimized autonomic computing systems by capturing the preferences of an agent [33]. This agent can be either a human being or software that acts on a human's behalf,

and its preferences are expressed in terms of a multi-attribute utility function. The agent selects the state that maximizes the utility; the best choice is the state with the largest utility value [34].

The adaptive cloud scheduling problem is a self-management and self-optimization problem, where the cloud datacentre manager should manage VM provisioning according to the desired objectives. Utility functions are a well-known method for representing an agent's preferences in autonomic computing systems [34]. They are relevant for autonomic computing systems as they focus on the desired state, providing a clear and straightforward basis for decision making. Autonomic computing problems are usually addressed in real-world systems using one of three kinds of policy: rule-based action policies, goal policies, and utility functions. Figure 2-6, from [34], exhibits a general framework that applies to all three. To understand how this framework acts, suppose that we have a system with a number of states, where each state S is represented as a vector of attributes. The current state transitions to a new state σ depending on the action a that is taken; this means that different actions applied to the current state may lead to different possible states.

Figure 2-6: Current State, Action, and Possible State, from [34]

Action policies are typically expressed in the form IF (condition) THEN (action), where the condition describes the current state of the system. This type of policy

doesn't explicitly specify the new state that the system will reach after applying the action. To illustrate this concept, suppose that the goal is to minimize energy consumption while meeting the SLA in the cloud datacentre. Host utilization reflects energy consumption, as energy consumption is proportional to host utilization [44]. We assume that when a host is 100% utilized it is overutilized, and some VMs need to be migrated to another host to meet the SLA. Moreover, if the host utilization is less than or equal to 30%, for example, the host is underutilized and wastes power; in this case, all VMs on this underutilized host should be migrated to another host and the host switched to sleep mode. The pseudo-code in Figure 2-7 demonstrates an example of using action policies in autonomic computing environments.

IF (HostUtilization >= 100%) THEN
    migrateSomeVMsToAnotherHost()
ELSE IF (HostUtilization <= 30%) THEN
    migrateAllVMsToAnotherHost()
END IF

Figure 2-7: The action policies example

Here, HostUtilization represents the percentage of host utilization, migrateSomeVMsToAnotherHost() migrates some VMs from the host until it is no longer overloaded, in order to meet the SLA, and migrateAllVMsToAnotherHost() migrates all VMs from an under-utilized host in order to save energy. The heuristics-based approach presented in [6] and [31] utilized action policies for the adaptive cloud scheduling problem.

Goal policies do not specify exactly what should be done in the current state; instead they only specify the desired outcome. The system computes the action that will cause it to move from the current state to a state with the desired properties [34]. The pseudo-code in Figure 2-8 demonstrates how goal policies can be used in autonomic computing environments.

30% <= HostUtilization <= 100%

Figure 2-8: The goal policies example
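The action policy of Figure 2-7 translates directly into code. In this sketch the 100%/30% thresholds come from the running example in the text, and the two migration callbacks are hypothetical placeholders for the real migration routines.

```python
def action_policy(host_utilization, migrate_some_vms, migrate_all_vms):
    """Rule-based action policy from Figure 2-7: react to the current
    state without specifying the state the system will end up in."""
    if host_utilization >= 1.0:
        migrate_some_vms()   # relieve the overloaded host to meet the SLA
    elif host_utilization <= 0.3:
        migrate_all_vms()    # drain the under-utilized host so it can sleep
```

Nothing happens for utilizations between the two thresholds, which is exactly the policy's blind spot: it never reasons about which resulting state would be best.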

Therefore, only the desired state needs to be specified and the system will determine how to reach it. In effect, goal policies perform a kind of binary classification of the state of the system: each state is either accepted or rejected according to the goal policy. In the example in Figure 2-8, a conflict arises whenever the goal cannot be satisfied even though the state is actually correct. For example, suppose that the utilization of a host is 20%: it will be rejected by the goal policy, although it might be the only possible state if the VMs representing the 20% of host utilization cannot be migrated to another host.

Utility function policies can be viewed as an extension of goal policies in which the desired state need not be specified in advance. In contrast to goal policies, the desired state is computed by repeatedly selecting the state with the highest utility from the feasible ones; utility functions therefore do not perform the kind of classification done by goal policies. The pseudo-code in Figure 2-9 exhibits a solution to the same problem previously solved with action and goal policies.

Utility(a, t) = Income(a, t) - TotalEnergyCost(a, t) - PredictedViolationCost(a, t)
Income(a, t) = Σ_{VM ∈ a} IncomePerVM(VM, costPerCPU)

Figure 2-9: Utility policy example

Here, a is the assignment of the list of VMs to the list of hosts in the datacentre, and t is the time spent in the assignment. The calculation of both TotalEnergyCost(a, t) and PredictedViolationCost(a, t) is based on the host and VM utilization levels. The utility is expressed in monetary terms or any other value, and the objective is finding a robust assignment that maximizes the overall profit. Theoretically, goal policies and utility function policies are more relevant to self-managing systems and autonomic computing than action policies, as they focus on the required state rather than the current state [34].
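The utility-function policy of Figure 2-9 can be sketched as follows. The income and cost terms here are deliberately simple hypothetical stand-ins (a fixed price per VM, a fixed energy cost per active host, and no violations), not the dissertation's cost model; the point is the selection rule, which evaluates every feasible assignment and keeps the one with the highest utility instead of classifying states as accept/reject.

```python
def utility(assignment, price_per_vm=1.0):
    """Figure 2-9 style utility: income minus predicted costs.
    `assignment` maps a VM name to a host name; the cost constants
    below are assumed for illustration only."""
    income = price_per_vm * len(assignment)
    active_hosts = set(assignment.values())
    energy_cost = 0.5 * len(active_hosts)   # assumed cost per active host
    violation_cost = 0.0                    # assumed: no violations here
    return income - energy_cost - violation_cost

def best_assignment(candidates):
    """A utility-function policy picks the feasible state with the
    highest utility value."""
    return max(candidates, key=utility)
```

With these stand-in costs, consolidating both VMs onto a single host activates fewer hosts and therefore wins the comparison.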
Moreover, utility functions are considered better than goal policies as they allow more flexible behaviour. Goal policies, by contrast, can't express fine-grained agent preferences as they only perform a binary classification of the current

state of the system. This means that a goal policy accepts a state as long as it satisfies the policy, even if there is a better state that should be considered instead.

2.5 Cloud Computing Simulation Tools

After the background reading and the literature review, which resulted in the decision to deploy utility functions for the adaptive cloud scheduling problem, it is time to think about the implementation and evaluation of the research. Implementing a real cloud computing environment to test the proposed solution is not simple and costs time and money. Moreover, evaluating the performance of cloud scheduling using different applications and service models under different conditions is a challenge in a real cloud environment. Simulation tools are widely used to simulate the behaviour of real devices and environments; many such tools are used for implementing research work in computer networks, such as NS-2 [35] and OPNET [36], and in grid systems, such as GridSim [37] and MicroGrid [38]. Using simulation tools, the implementation and evaluation processes in cloud computing environments become feasible and easier. Furthermore, anyone who knows how to use the simulation tool can reproduce the tests.

2.5.1 Existing Cloud Simulators

Currently, there are a number of cloud simulation tools, such as CloudSim [5], GreenCloud [39], iCanCloud [40], and MDCSim [41]. CloudSim, GreenCloud and iCanCloud are open source, while MDCSim is commercial. CloudSim uses the Java programming language, GreenCloud uses C++/OTcl, MDCSim uses C++/Java, and iCanCloud uses C++. CloudSim, GreenCloud, and MDCSim neither support parallel experiments nor provide models for public cloud providers. On the contrary, iCanCloud supports parallel experiments and provides models for the Amazon public cloud. According to the comparison in [40], iCanCloud is the most powerful compared to CloudSim, GreenCloud, and MDCSim.
However, CloudSim will be used for the implementation of this project so that the project's results can be easily compared with the results of the heuristic approach [6], which was also implemented using

35 CloudSim. The following section discusses the process of creating a cloud environment using the CloudSim toolkit CloudSim CloudSim is a cloud modelling and simulation tool that is used for simulating both the cloud computing infrastructure and the cloud services. Technically speaking, CloudSim is an open source library that was built using the Java programming language. CloudSim was built at the University of Melbourne by the Computer Science and Software Engineering Department in the Cloud Computing and Distributed Systems (CLOUDS) Laboratory [42]. Developers can extend or replace existing Java classes, and new algorithms and scenarios can be added CloudSim Architecture CloudSim was created based on a layered architecture with three basic layers namely, user code, CloudSim and the core simulation layer. This layered architecture makes it easier to add new classes or update existing ones. The CloudSim core simulation engine is located at the bottom of the CloudSim stack. This engine is responsible for processing different simulation events and creating main entities of the cloud such as the data centre, host, VM, and the broker. In addition, this layer is also responsible for managing queues and handling communication between existing entities. The CloudSim layer is located in the middle of the stack, with functionalities such as the allocation of existing VMs to hosts besides the allocation of CPU, memory, storage and bandwidth resources. All the work in this project will be done on the CloudSim layer. Finally, the user code layer is at the top of the stack which allows the cloud user to apply different cloud scenarios such as the number of hosts and VMs in addition to the number and size of applications based upon the user requirements. Figure 2-10, from [5], shows the CloudSim architecture. 35

Figure 2-10: CloudSim architecture, from [5]

Modelling the Cloud

The first step towards modelling a complete cloud environment is the creation of the required data centres. These data centres, with their physical hosts and VMs, model the cloud infrastructure. CloudSim provides the Datacenter class for creating the cloud environment's data centres and defines a data centre's characteristics using the DatacenterCharacteristics class. This class represents resource properties such as the resource architecture, the management policy (either time-shared or space-shared), the operating system, the cost and the time zone where the resource is located. After creating the data centre and defining its characteristics, the policy for allocating VMs to physical hosts should be defined. CloudSim provides a class named VmAllocationPolicy which selects a host from the host list for VM deployment [5]. CloudSim uses the Host class for modelling the physical server. This class has a number of attributes that define the host's capabilities, such as the number of available CPU cores and the amount of RAM, storage, and bandwidth. The VmScheduler class

defines how the host's CPU cores will be allocated to the VMs, using either time or space sharing. The time-shared policy dynamically distributes the capacity of the existing cores among the VMs, whereas the space-shared policy assigns specific CPU cores to specific VMs. Since virtualization technology is at the heart of cloud computing, CloudSim provides the Vm class for modelling the virtual machine. This class stores the VM characteristics, such as the number of CPU cores per VM, the memory size, the priority, and the virtual machine manager (VMM). The CloudletScheduler class is used for scheduling the application workload onto the available CPU resources. Cloud application services are represented by the Cloudlet class; each cloudlet has its own size or length. Furthermore, CloudSim models the cloud market using a layered approach represented by cost metrics for both the IaaS and SaaS models. Moreover, CloudSim models network behaviour, dynamic entity creation, federation of clouds, and datacentre power consumption.

To sum up, this chapter discussed the required background related to cloud computing and green cloud computing environments. Furthermore, a review of the literature on adaptive cloud scheduling was conducted, and the existing cloud simulation toolkits were reviewed. This groundwork informed the decisions concerning the strategy and tools for solving the research problem: utility functions will be used for solving the adaptive VMs-to-hosts assignment problem, and the CloudSim toolkit will be used for simulating it. The next step is choosing the utility attributes, defining the utility function and designing the cost model; everything related to the design is described in the following chapter.

Chapter 3 : Design

The step that follows the background reading and the review of related work is designing the strategy that will be used for solving the problem. That is the purpose of this chapter, which begins by describing the green cloud computing environment architecture, followed by the input, processing, and output (IPO) model of the adaptive cloud scheduling problem. The goal of the design is to delineate the approach that will be followed for solving the problem. This approach begins with defining the utility function and designing the cost model; the final step is designing the genetic algorithm, which searches the search space for an efficient assignment that maximizes the utility.

3.1 System architecture

Cloud architects seek to build a green cloud computing architecture that efficiently saves energy without violating SLAs. Figure 3-1, from [45], shows a general system architecture of a green cloud computing environment that can be set up to provide an energy-aware resource scheduling solution. This architecture essentially consists of four different layers, ranging from the cloud consumers down to the physical hosts in the datacentre.

Figure 3-1: Green cloud system Architecture, from [45]

The first layer represents the cloud consumers or their brokers, who request cloud services from the cloud providers. In this architecture, the cloud consumer may differ from the cloud user, as the cloud consumer might be any organization that hosts its applications at a cloud provider so that these applications are accessible to the cloud users. The second layer, the green service allocator, represents the interface between cloud consumers and the physical infrastructure of the cloud provider. This layer is divided into two sub-layers: the first is the interface to the consumer, while the second is the interface to the cloud resources. The consumer interface sub-layer is responsible for negotiating the service level agreement (SLA) with the cloud consumer and determining the penalties resulting from any violations of the SLA. The agreement covers the service cost, the expected quality of service (QoS) and the penalties in case of SLA violations. The consumer interface sub-layer analyses consumer requests through the service analyser component, and the result of this analysis is either the acceptance or the rejection of these requests for cloud services. Moreover, it assigns privileges and prioritizes users according to their characteristics through the consumer profiler component. Finally, it calculates the cost of the provided services through the pricing component. The cloud interface sub-layer schedules the computing resources and services through the service scheduler component. Resource utilization and the associated costs are monitored by the accounting component, which makes use of historical data to improve the scheduling process. This sub-layer is also responsible for tracking energy consumption through the energy monitor component; this tracking information can later be used to determine whether a physical machine should be switched on, off, or turned into another power-saving mode such as sleep mode.
Most of the research into service scheduling and monitoring in cloud computing takes place in the sub-layers of the green service allocator layer. The third layer represents the collection of VMs built above the physical servers in the cloud datacentre. These VMs are dynamically created, deleted and migrated among running hosts according to consumers' needs. The last layer is the physical infrastructure of the cloud computing environment, which consists of a number of hosts. Energy-efficient adaptive resource scheduling should decide when

and which physical servers should be turned on or off for energy saving while meeting SLAs, which is the ultimate goal of this research.

3.2 Input, Processing and Output Model

Figure 3-2 exhibits the input, processing, and output (IPO) model, which provides a general schema of the proposed solution to the adaptive cloud scheduling problem.

Input: N physical hosts, each characterized by parameters such as CPU capacity, defined in million instructions per second (MIPS), RAM, storage and bandwidth. Cloud consumers submit requests for the allocation of M VMs characterised by the same parameters as the hosts but with different values. Cloud users or their brokers submit cloud applications with different requirements, defined by the cloudlets.

Processing: The processing is divided into two main parts: the initial cloud scheduling algorithm and the adaptive cloud scheduling algorithm. The initial cloud scheduling is responsible for accepting new requests from the cloud consumers, and either accepts or rejects these requests according to the availability of resources. The initial allocation assigns VMs according to the resources required by the cloud consumer. This initial allocation of VMs to hosts can be seen as a bin packing problem in which the physical hosts, with their variable capabilities, represent bins of variable sizes, and the VMs that need to be allocated represent the items to be placed in the bins. This problem can be solved using the best fit decreasing (BFD) algorithm [32]. The adaptive allocation uses the output of the initial allocation as its input to produce an adaptive schedule based on observed resource utilization rather than the anticipated resource requirements used in the initial allocation. The adaptive allocation makes use of utility functions to produce a robust and efficient assignment that maximizes the overall utility.
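The initial VMs-to-hosts allocation described above can be sketched as a capacity-only best fit decreasing, under the simplifying assumption that a single CPU demand number summarises each VM; a real allocation would consider several resource dimensions (CPU, RAM, storage, bandwidth) at once.

```python
def best_fit_decreasing(vm_demands, host_capacities):
    """Initial allocation as bin packing: place each VM, largest
    demand first, on the feasible host with the least remaining
    capacity (the 'best fit'). Returns VM index -> host index,
    with None for requests that must be rejected."""
    free = list(host_capacities)            # remaining capacity per host
    placement = {}
    for vm, demand in sorted(enumerate(vm_demands), key=lambda x: -x[1]):
        candidates = [h for h, cap in enumerate(free) if cap >= demand]
        if not candidates:
            placement[vm] = None            # rejected: no host can fit it
            continue
        h = min(candidates, key=lambda i: free[i])  # tightest feasible host
        free[h] -= demand
        placement[vm] = h
    return placement
```

Sorting by decreasing demand places the hardest-to-fit VMs first, which is what makes BFD a good heuristic for this NP-hard problem.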
A genetic algorithm is used to search for the assignment that maximizes the utility function. The details of the utility function definition, its attributes and the cost model development are presented in Sections 3.3 and 3.4.

Output: The output is the adaptive VMs-to-hosts assignment according to the utilization and SLA requirements. The output also reports the values of some performance metrics, such as the energy consumption and the percentage of SLA violation.

Figure 3-2: Input, Processing and Output of the Adaptive Scheduling Problem

3.3 Utility Function Definition

The utility function specifies the self-managing policy adopted for the adaptive cloud scheduling problem. The overall goal of the utility is maximizing the benefit of the adaptive allocation of VMs by minimizing energy consumption and minimizing any source of violation of the negotiated SLA. As a result, the properties of the utility function are the total amount of energy consumption (E) and the percentage of SLA violation (SLAV). The high-level definition of the utility of assigning the VMs list to the hosts list is formulated in terms of its cost components as follows in (1):

Utility(a, t) = PredictedEnergyCost(a, t) + PredictedViolationCost(a, t) + PDMCost(a, t)   (1)

where a is a vector representing the assignment of the list of VMs to the hosts list in the datacentre; t is the total time of this assignment, which is the same as the scheduling interval; and PredictedEnergyCost(a, t) is the cost of the energy consumed due to the assignment. Any violation of the SLA exposes the cloud provider to a penalty

which should be paid to the cloud users. In this utility definition, two different violation costs are computed. The first, PredictedViolationCost(a, t), represents the cost of SLA violation and is computed by counting the number of VMs that are in violation due to the assignment. The second, PDMCost(a, t), is the penalty due to the degradation in VM performance caused by migrating VMs among hosts.

Maximizing the utility is achieved by minimizing the sources of cost defined by the different terms of the utility definition in (1). Therefore the utility should be expressed as an inverse relationship with the sum of the different costs, which gives the final definition of the utility function in (2):

    Utility(a, t) = 1 / (PredictedEnergyCost(a, t) + PredictedViolationCost(a, t) + PDMCost(a, t))    (2)

The total energy cost in the datacentre is the sum of the predicted energy cost per host. The algorithms for calculating the predicted energy cost, the predicted violation cost and the cost of performance degradation due to migration are all shown in the cost model development in Section 3.4 and its sub-sections.

3.4 Cost Model Development

The cost model involves predicting the amount of energy consumed due to the allocation. Predicting CPU utilization is crucial for calculating the expected energy consumption [4], [44]. The proposed algorithm for computing the expected CPU utilization is based on the percentage of VM utilization at the time of the allocation. Moreover, a Markov chain prediction model can be used along with the proposed algorithm to predict the future level of CPU utilization. The following section therefore discusses Markov chains and how they can be used to predict CPU utilization, which in turn is used to calculate energy consumption.
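The inverse-cost utility in (2) is straightforward to compute once the three cost terms are known. The following is a minimal sketch, assuming the costs have already been calculated; the method names are illustrative, not CloudSim API:

```java
public class Utility {
    /** Utility of an assignment as the inverse of its total predicted cost, as in (2). */
    static double utility(double energyCost, double violationCost, double pdmCost) {
        double totalCost = energyCost + violationCost + pdmCost;
        // Guard against a (theoretical) zero-cost assignment.
        return totalCost == 0 ? Double.MAX_VALUE : 1.0 / totalCost;
    }

    public static void main(String[] args) {
        System.out.println(utility(2.0, 1.5, 0.5)); // prints 0.25
    }
}
```

Because the genetic algorithm only compares assignments, any monotone decreasing function of total cost would serve equally well as a fitness value; the inverse is simply the form chosen in (2).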

3.4.1 The Finite Discrete Markov Chain Prediction Model

A finite discrete Markov chain will be used for modelling CPU utilization. A finite Markov chain is a discrete stochastic process with a finite number of states defined at integer values of time n = 0, 1, .... A discrete system is represented by a set S of states and the transitions between those states, where S is referred to as the state space [46]. A discrete stochastic process is a discrete system in which transitions occur randomly according to some probability distribution. A finite Markov chain has an initial probability distribution vector and a transition probability matrix. The initial probability distribution vector V is an n-dimensional vector that holds the probability distribution over the states, (P(s1), ..., P(sn)), where P(si) is the probability of state si. The transition probability matrix is an n × n matrix T = (Pij), where Pij is the probability of moving from state i to state j; Pij is therefore the conditional probability Pij = P(j | i). The transition probability matrix is a square matrix of non-negative real numbers in which each row sums to one [46].

The relationship between CPU utilization and energy consumption is now clear, and Markov chains can be used for modelling CPU utilization: the distribution over utilization levels after one step can be predicted by multiplying V * T. We now investigate the modelling of CPU utilization using Markov chains.

Modelling CPU Utilization using Markov Chain

Our goal here is to define both the initial probability distribution vector and the transition probability matrix so that we can predict the CPU utilization level. Before defining the vector and the matrix, the states of CPU utilization need to be defined, followed by a state-based probability model. The level of CPU utilization defines the different states of the system, such that each level defines a different state.
This level is defined by the percentage of CPU utilization over the specified period. CPU utilization data should be collected at different time intervals from the datacentre hosts' log data. Off-line prediction is then enabled using a Markov chain based on the CPU utilization log data. In a Markov chain, every event in the sequence is

only dependent on the event that occurred directly before it. The first step is defining the different states of CPU utilization.

Definition of System States

CPU utilization will be partitioned into a number of different levels, where each level represents a different state. In this work, the CPU utilization level is partitioned into 11 intervals ranging from 0% to 100%, which define the different states of the system. Random variables S1, S2, ..., Sn are used for defining these levels as follows:

S1: represents the CPU utilization level in the range [0, 10)%.
S2: represents the CPU utilization level in the range [10, 20)%.
S3: represents the CPU utilization level in the range [20, 30)%.
S4: represents the CPU utilization level in the range [30, 40)%.
S5: represents the CPU utilization level in the range [40, 50)%.
S6: represents the CPU utilization level in the range [50, 60)%.
S7: represents the CPU utilization level in the range [60, 70)%.
S8: represents the CPU utilization level in the range [70, 80)%.
S9: represents the CPU utilization level in the range [80, 90)%.
S10: represents the CPU utilization level in the range [90, 100)%.
S11: represents the CPU utilization level of exactly 100%, i.e. the server is fully utilized.

The last state, S11, is used for computing SLAV in the case that the server is fully utilized (100%). Once the states of the system are known, both the initial probability distribution vector and the transition probability matrix can be created and calculated.

Initial Distribution Vector

The initial probability distribution is an n-dimensional vector that holds the probability distribution over the states of the system. This means that the probability of each state

should be computed first, or it can be randomly generated. The initial distribution vector V has the format (P(s1), ..., P(sn)), where P(si) is the probability of state i. The probability of each state in V is calculated as P(si) = n(si)/n(s), where n(si) is the number of observed events in state si and n(s) is the total number of observed events. The predicted distribution vector after one move is the result of multiplying V * T, where T is the n × n transition probability matrix [46]. Historical data will be used for computing the initial distribution vector V over the different states of the CPU utilization level.

CPU Utilization Transition State Diagram

The state transition diagram shows the states of the Markov chain and the transition probabilities between those states. Figure 3-3 shows the state transition diagram for our CPU utilization example using only 4 states (S1, S2, S3 and Sn, with edges labelled by transition probabilities such as P(S1 | Sn) and P(Sn | S1)).

Figure 3-3: CPU Utilization Transition State Diagram

Where P(Sj | Si) is the conditional probability of state Sj given state Si. This transition state diagram is used for building the transition probability matrix.

Transition Probability Matrix

The transition probability matrix is an n × n matrix giving the conditional probability of moving from one state to another. For the ten regular utilization states it has the form:

          S1       S2       ...   S9       S10
    S1    P1,1     P1,2     ...   P1,9     P1,10
    S2    P2,1     P2,2     ...   P2,9     P2,10
    ...
    S9    P9,1     P9,2     ...   P9,9     P9,10
    S10   P10,1    P10,2    ...   P10,9    P10,10

Where Pij is the conditional probability of moving from state i to state j; the matrix can therefore be rewritten with entries Pij = P(Sj | Si):

          S1            S2            ...   S10
    S1    P(S1 | S1)    P(S2 | S1)    ...   P(S10 | S1)
    S2    P(S1 | S2)    P(S2 | S2)    ...   P(S10 | S2)
    ...
    S9    P(S1 | S9)    P(S2 | S9)    ...   P(S10 | S9)
    S10   P(S1 | S10)   P(S2 | S10)   ...   P(S10 | S10)

The data used for computing the probabilities of moving among the different states of the system are historical resource utilization data. These historical data can be monitoring traces collected from virtualization servers.
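As an illustrative sketch (not part of the dissertation's implementation), V and T could be estimated empirically from a host's utilization trace, using the 11 states defined above; the mapping from a utilization percentage to a state index is an assumption consistent with the interval definitions:

```java
import java.util.Arrays;

public class MarkovEstimator {
    static final int STATES = 11; // S1..S10 are 10% bands, S11 is exactly 100%

    /** Maps a utilization percentage to a state index 0..10. */
    static int stateOf(double utilPercent) {
        return utilPercent >= 100.0 ? 10 : (int) (utilPercent / 10.0);
    }

    /** Empirical initial distribution V: relative frequency of each state in the trace. */
    static double[] estimateV(double[] trace) {
        double[] v = new double[STATES];
        for (double u : trace) v[stateOf(u)]++;
        for (int i = 0; i < STATES; i++) v[i] /= trace.length;
        return v;
    }

    /** Empirical transition matrix T: row-normalized counts of observed moves. */
    static double[][] estimateT(double[] trace) {
        double[][] t = new double[STATES][STATES];
        for (int k = 0; k + 1 < trace.length; k++) {
            t[stateOf(trace[k])][stateOf(trace[k + 1])]++;
        }
        for (double[] row : t) {
            double sum = Arrays.stream(row).sum();
            if (sum > 0) for (int j = 0; j < STATES; j++) row[j] /= sum;
        }
        return t;
    }
}
```

Rows of the estimated T that were never observed are left as all zeros here; a production implementation would need a smoothing policy for such unseen states.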

Computing the Distribution Vector of the CPU Utilization

We have defined the initial distribution vector V1 = (P(s1), P(s2), ..., P(sn)), where n is the total number of states in the system, and the transition probability matrix T containing the probabilities of moving from one state to another. The predicted distribution vector is the result of multiplying the initial distribution vector by the transition probability matrix: the formula V2 = V1 * T computes the successor predicted distribution V2 from the current distribution V1 and the transition probability matrix T. The result is a vector in which each element represents the probability of a particular state. Consequently, a method is needed for determining the expected level of CPU utilization from this state-based distribution vector. Given the calculated probability distribution vector V = [P(s1), ..., P(s10)], the expected CPU utilization is modelled using the following formula from [50]:

    Σ (i = 1..n) P(Si) · Utility(Si)

Where n is the total number of states and Utility(Si) is the percentage of CPU utilization represented by state Si.

Markov Model Prediction Algorithm

Given the current distribution vector and the transition probability matrix, the CPU utilization is estimated as shown in Figure 3-4. The utilizationVector contains the probabilities of each state in the system; it is calculated by multiplying the current distribution vector by the transition matrix. The expected CPU utilization is then computed by summing, over all states, the probability of each state multiplied by its utility.

utilizationLevel MarkovPrediction(currentUtilization[]) {
    utilizationVector[]
    utilizationLevel = 0
    utilizationVector = currentUtilization[] * transitionMatrix[][]
    for i = 1 to length(utilizationVector) {
        // averageOfEachState[i] is the representative utilization of state i, e.g. its interval midpoint
        utilizationLevel += utilizationVector[i] * averageOfEachState[i]
    }
    return utilizationLevel
}

Figure 3-4: CPU Utilization Prediction Algorithm Using the Markov Model

Predicting CPU Utilization based on VM utilization

In this work, genetic algorithms are deployed to search for an assignment of VMs to hosts that maximizes utility. The output of the initial allocation problem presented in Section 3.2 is the current assignment, i.e. the allocation of the VMs according to the requested VM resources. The genetic algorithm produces a number of candidate assignments that are compared to the current assignment. If the best candidate assignment is better than (i.e. has a higher fitness than) the current assignment, then the candidate assignment replaces the current one. The key idea is passing the candidate assignment as a parameter to the PredictedUtilization(a) function for computing the predicted utilization. To explain the pseudo-code in Figure 3-5, which exhibits the algorithm for predicting hosts' CPU utilization based on the utilization of the assigned VMs, let us define two different types of assignment. The first, c, represents the current assignment that is in use and running in the system at the moment; initially, this assignment is the output of the initial allocation. The second, a, represents the candidate assignment to which we might change if its utility is high enough.

allPMsUtilizations[] PredictedUtilization(a) {
    allPMsUtilizations[] = 0
    foreach pm in the assignment a {
        Assignment c = getCurrentAssignment(pm)
        addedVMs = the VMs associated with pm in a that are not in c
        removedVMs = the VMs associated with pm in c that are not in a
        u[pm] = pm.getUtilization()
                + sum{ added.getUtilization() | added <- addedVMs }
                - sum{ removed.getUtilization() | removed <- removedVMs }
        // u[pm] = MarkovPrediction(u[pm])
        allPMsUtilizations[] <- add(u[pm])
    }
    return allPMsUtilizations[]
}

Figure 3-5: The pseudo-code for computing CPU utilization

Where pm refers to the physical machine and pm.getUtilization() returns the current CPU utilization of the physical machine in MIPS, obtained from monitoring. The sum over addedVMs returns the total CPU utilization of the VMs that are in a and not in c, while the sum over removedVMs returns the total CPU utilization of the VMs that are in c and not in a; u[pm] stores the utilization of the host in MIPS under the candidate assignment. Due to time constraints, MarkovPrediction(u[pm]) is not used in the final implementation when calculating CPU utilization; the utilization is calculated only from the utilization of the VMs, which is why the line calling MarkovPrediction(u[pm]) is commented out in the algorithm. Having shown how to compute the level of CPU utilization on each host, we can now calculate energy consumption and its related cost.

Calculating Energy Consumption

Two different techniques can be used in this project for calculating energy consumption. The first is defining a power model and then calculating energy consumption from it. The second is based on real power consumption data, such as that provided by the SPECpower_ssj2008 benchmark [47], which reports a server's power consumption in Watts against its utilization level.

For defining the power model, the host components used in the model must be selected first. The CPU, memory, cooling systems, disk storage and network interfaces are examples of host components that consume power. However, this work considers only the CPU, as it is the resource that consumes most of the host's power [6]. Previous studies [4] showed that a completely idle server still consumes about 70% of the power that a fully utilized host consumes. According to the work in [6], the power model P(u) can be estimated using formula (3):

    P(u) = k · Pmax + (1 − k) · Pmax · u    (3)    From [6]

Where k is the fraction of power consumed when the server is totally idle, which is about 70% of the maximum power consumption; Pmax is the maximum power consumption when the server is 100% utilized; and u is the CPU utilization. CPU utilization varies over time, as its value reflects the current workload. Therefore, CPU utilization is a function of time, u(t), and the overall amount of energy consumed by the host (E) is the integral of the power model over the time during which the host is on:

    E = ∫ from t0 to t1 of P(u(t)) dt    (4)    From [6]

Real power consumption data provided by benchmarks considers all host components, including memory, when measuring power consumption. Using such measured data is more realistic, especially for hosts where memory consumes a considerable amount of power that should not be neglected. In this work, the real power consumption data provided by the SPECpower_ssj2008 benchmark is used for calculating power consumption. Two different servers are used in the experiments, as they are already provided by CloudSim: an HP ProLiant ML110 G4 and an HP ProLiant ML110 G5. Figure 3-6, from [31], shows how the power consumption of those two servers, in Watts, varies with the level of utilization.
Figure 3-6: Power Consumption according to different utilizations, from [31]
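The linear power model in (3) and a discrete approximation of the energy integral in (4) can be sketched as follows; k = 0.7 and the sample values are assumptions for illustration, not measured data:

```java
public class PowerModel {
    static final double K = 0.7; // idle power fraction, following [4]

    /** Power in Watts at CPU utilization u in [0, 1], as in formula (3). */
    static double power(double pMax, double u) {
        return K * pMax + (1 - K) * pMax * u;
    }

    /** Energy in Joules: discrete approximation of (4) over fixed-length samples. */
    static double energy(double pMax, double[] utilSamples, double sampleSeconds) {
        double e = 0;
        for (double u : utilSamples) e += power(pMax, u) * sampleSeconds;
        return e;
    }

    public static void main(String[] args) {
        // A 100 W server at 50% utilization: 70 W idle share + 15 W dynamic share.
        System.out.println(power(100, 0.5));
    }
}
```

With real SPECpower_ssj2008 data, power(pMax, u) would be replaced by a lookup (with interpolation) into the measured Watts-versus-utilization table for the server model.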

Energy Cost Prediction Algorithm

The algorithm in Figure 3-7 shows the pseudo-code for calculating the predicted energy cost for all running hosts in the datacentre. The calculation of the predicted energy cost depends on the predicted CPU utilization, whose calculation was shown above; the power model is computed using the real power consumption data described above. The pseudo-code in Figure 3-7 obtains the utilization of all physical machines from the method PredictedUtilization(a), and getPower(cpuUtilization[pm]) returns the power according to the power model used. The result of this algorithm is the cost of the total energy consumed due to the assignment over the specified time period (the time during which the assignment is active).

float PredictedEnergyCost(a, t) {
    constant UNIT_ENERGY_COST_PER_SEC
    t = schedulingInterval
    powerConsumedPerPM[] = 0
    totalPowerConsumed = 0
    totalEnergyCost = 0
    allPMsUtilizations[] = PredictedUtilization(a)
    foreach pm in the assignment {
        float cpuUtilization[pm] = allPMsUtilizations[pm]
        powerConsumedPerPM[pm] = pm.getPowerModel().getPower(cpuUtilization[pm])
        totalPowerConsumed += powerConsumedPerPM[pm]
    }
    totalEnergyCost = totalPowerConsumed * UNIT_ENERGY_COST_PER_SEC * t
    return totalEnergyCost
}

Figure 3-7: Predicted Energy Cost

Calculating Possible Sources of SLA Violation

The required QoS level can be achieved by meeting the agreed-upon SLA. For applications in the SaaS layer, the SLA can be defined in terms of attributes such as the maximum response time and the minimum throughput delivered by the cloud application. The values of these attributes commonly differ among different types of applications and hence should not be used for defining the SLA in IaaS,

where each VM usually runs different applications. As a consequence, the attributes of the SLA in IaaS should be workload independent, so that they can be applied to any VM regardless of the running applications. The SLA in an IaaS environment is said to be met when the VM delivers, at any time, 100% of its capabilities [31], which are defined by the VM parameters. If the performance delivered by the VM is lower than the capabilities defined by the VM parameters, then an SLA violation (SLAV) has occurred. In this work, two sources of SLA violation in the IaaS layer are computed. The first is SLA violation per host, which happens when the host is over-utilized. The second is the violation resulting from the migration of VMs among hosts while the adaptive scheduling algorithm is running. The calculation of these two sources is described in the following two sub-sections.

Violation Cost Prediction due to Host Over-utilization

SLA violation on a host occurs when the CPU utilization of that host reaches 100%. In that case, the CPU scheduling overheads among the VMs sharing the physical host inevitably lead to an SLAV in one or more of the VMs. The number of VMs in violation can then be counted and the violation cost computed accordingly. Figure 3-8 shows the pseudo-code for calculating the violation cost depending on the number of VMs in violation.

float PredictedViolationCost(a, t) {
    violationsCount = 0
    t = schedulingInterval
    constant SLA_VIOLATION_COST_PER_SEC
    foreach pm in assignment a {
        violation = 0
        VMList = list of VMs from a in pm
        VMList = sort(VMList) by demand
        demand = sum of demands in VMList   // current utilization
        supply = pm.getTotalMips()
        while (demand > supply) {
            violation++
            demand = demand - VMList[violation].demand
        }
        violationsCount += violation
    }
    return violationsCount * SLA_VIOLATION_COST_PER_SEC * t
}

Figure 3-8: Violation cost depending on the number of VMs in violation
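The per-host over-utilization count can be sketched in plain Java. This is one plausible reading of the pseudo-code, removing the largest-demand VMs first until the host fits; the names are illustrative assumptions:

```java
import java.util.Arrays;

public class ViolationCount {
    /**
     * Counts how many VMs must be discounted (largest demand first) before the
     * host's total demand fits within its MIPS capacity.
     */
    static int vmsInViolation(double[] vmDemandsMips, double supplyMips) {
        double[] demands = vmDemandsMips.clone();
        Arrays.sort(demands); // ascending; we then walk from the largest downwards
        double demand = Arrays.stream(demands).sum();
        int violations = 0;
        for (int i = demands.length - 1; i >= 0 && demand > supplyMips; i--) {
            violations++;
            demand -= demands[i];
        }
        return violations;
    }

    public static void main(String[] args) {
        // Total demand 3000 MIPS on a 2500 MIPS host: dropping the 1500 MIPS VM suffices.
        System.out.println(vmsInViolation(new double[]{1500, 1000, 500}, 2500)); // prints 1
    }
}
```

The returned count would then be multiplied by SLA_VIOLATION_COST_PER_SEC and the scheduling interval, as in Figure 3-8.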

Violation Cost Prediction due to VM Migrations

The second source of violation happens during VM migration, and the violation in this case depends on the time taken to complete the migration process successfully. This violation results from the degradation of performance during migration and is hence called performance degradation due to migrations (PDM) [31]. It can be computed as shown in Figure 3-9.

float PDMCost(a, t) {
    migrationTime = 0
    pdmViolationCost = 0
    constant PDM_COST_PER_SEC
    foreach vm in the assignment {
        if (the vm is migrated) {
            migrationTime = vm.getMips() / pm.getBw()
            pdmViolationCost += migrationTime * PDM_COST_PER_SEC
        }
    }
    return pdmViolationCost
}

Figure 3-9: Pseudo-code for Calculating the Cost of PDM

Where migrationTime is computed by dividing the VM's processing capability by the available network bandwidth. The cost of performance degradation due to migration is the migration time multiplied by the cost of PDM per second. This completes the definition of the utility function and the design of the cost model for estimating everything required to compute it; this notion of utility can now be used to inform the search for an efficient assignment.

3.5 Optimization Algorithm

Finding a good assignment rated against the utility (fitness) function involves searching over candidate assignments. A genetic evolutionary algorithm can be used to search this space efficiently for an appropriate assignment. Genetic algorithms are an evolutionary computation method and a powerful stochastic search and optimization technique based on Darwin's principle of natural selection [48]. Genetic algorithms look for a good solution rated against

fitness criteria. Genetic algorithms search a population of individuals in parallel and therefore have the ability to avoid being trapped in locally optimal solutions, unlike traditional methods. Moreover, rather than optimizing the parameters themselves, genetic algorithms work on the chromosome, which is an encoded version of the potential solution's parameters. Figure 3-10, from [49], shows a general framework for the evolutionary algorithm: it starts with an initial population, followed by parent selection, mutation and crossover recombination, until the algorithm converges.

Figure 3-10: General framework for the evolutionary algorithm, from [49]

The following sub-sections describe the basic components of the genetic algorithm according to [50], together with the design of each component for finding an efficient assignment for the adaptive cloud scheduling problem.

Representation design

The representation encodes the solution, i.e. the assignment of the VMs to the physical hosts, and any adaptation is a modification of this representation. The representation is a vector of m elements, where each element represents a VM and the value of that element is the ID of the host to which the VM is assigned. Figure 3-11 shows the representation of the solution vector for our problem.

    VM:   VM1    VM2    VM3    VM4    ...  VMm
    Host: Host4  Host1  Host1  Host9  ...  Hosti

Figure 3-11: Solution vector representation

For this research problem, the representation in Figure 3-11 is simple and represents the solution efficiently.

Initial Population

For the genetic algorithm to work, there must be a way of creating an initial population of solutions to the problem. The initial population consists of a number of individuals, where each individual is considered a candidate solution to the problem. In this work, an individual is a vector of elements, where each element represents a VM and its value is the ID of the host to which the VM is assigned, as exhibited in Figure 3-11. The initial population is randomly generated based on the output of the initial allocation.

Evaluation/Fitness function

Each individual candidate solution is evaluated against the fitness objective function. In this work, the fitness function is the utility function defined in Section 3.3. A higher fitness means a better result, as the goal is to maximize the utility of the assignment.

Genetic operators

Genetic operators perform stochastic transformations to generate new offspring individuals. These transformations are mutation and crossover.

Mutation

The mutation function creates new individuals by changing one or more gene values in a single individual. These new individuals are similar to current individuals, with changes applied according to a pre-defined probability. Mutation is used to change the population from one generation to the next. In this work, there are two different mutation probabilities. The first is

MUTATION_PROBABILITY_DURING_INIT, which defines the probability of changing a VM-to-host mapping when building the initial population. The second is MUTATION_PROBABILITY, which defines the probability of changing a VM-to-host mapping across generations. The values of these two mutation probabilities should be selected carefully, as they are crucial to the genetic algorithm.

Crossover

The crossover function creates new individuals by trying to combine good parts of two individuals. The crossover probability should also be tuned carefully to help produce better assignments. In the implementation, the NEW_CROSSOVER_GENES_RATIO parameter specifies how many new crossover gene pairs to generate in each iteration.

Convergence

A new population is formed by selecting the fitter individuals from the parent population. After the specified number of generations, the algorithm converges to the best individual, which hopefully represents a robust solution to the problem.

In conclusion, the utility function has been defined as the inverse of the sum of the energy and violation costs, and the cost model has been designed for predicting those costs. Two different types of violation cost are calculated based on the designs shown in this chapter: the per-VM violation when the host is overloaded, and the violation resulting from the degradation in performance due to migration. Due to time limitations, the Markov chain is not implemented in the prediction of CPU utilization. A genetic algorithm is used to search the space of assignments for an appropriate one. The following chapter describes the implementation of the genetic algorithm and the utility function based on the designs shown in this chapter.
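The mutation and crossover operators on the vector representation of Figure 3-11 can be sketched as follows. This is a hypothetical standalone illustration, not the dissertation's implementation; the parameter names mirror those above and the probability value is an assumption:

```java
import java.util.Random;

public class GeneticOperators {
    static final double MUTATION_PROBABILITY = 0.1; // assumed value
    static final Random RNG = new Random(42);

    /** Mutation: reassign each VM to a random host with a small probability. */
    static int[] mutate(int[] genotype, int hostCount) {
        int[] child = genotype.clone();
        for (int vm = 0; vm < child.length; vm++) {
            if (RNG.nextDouble() < MUTATION_PROBABILITY) {
                child[vm] = RNG.nextInt(hostCount);
            }
        }
        return child;
    }

    /** One-point crossover: copy the head of one parent and the tail of the other. */
    static int[] crossover(int[] p1, int[] p2) {
        int point = 1 + RNG.nextInt(p1.length - 1); // crossover point in 1..m-1
        int[] child = p1.clone();
        System.arraycopy(p2, point, child, point, p2.length - point);
        return child;
    }
}
```

Because every gene is a host ID, both operators always yield a syntactically valid assignment; any capacity violations they introduce are penalized through the utility-based fitness rather than repaired explicitly.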

Chapter 4 : Implementation

This chapter illustrates the implementation of the utility-based approach. It starts by describing the basic steps required for building a cloud computing environment using the CloudSim toolkit, followed by the implementation of the adaptive VM assignment using the genetic algorithm, whose role is to search the possible assignments for an efficient one that maximizes the utility. The implementation was performed in the Java programming language in the Eclipse IDE, using the CloudSim toolkit.

4.1 Steps for creating a basic cloud datacentre

There are eight basic steps for creating and running the simulated cloud infrastructure using CloudSim. Figure 4-1 shows these steps, which are discussed in the following sub-sections:

1. Initializing the CloudSim package
2. Creating the data centres
3. Creating the broker (the interface between user and provider)
4. Creating the virtual machines
5. Creating the cloudlets
6. Starting the simulation
7. Stopping the simulation
8. Printing the results

Figure 4-1: Steps for creating a basic cloud datacentre

4.1.1 Initializing the CloudSim Package

First of all, the CloudSim package needs to be initialized using the init() method of the CloudSim class. The package must be initialized before creating any other entity.

CloudSim.init(num_user, calendar, trace_flag);

This initialization specifies the number of cloud users through the num_user parameter and the starting time of the simulation through the calendar parameter; the optional trace_flag parameter traces the simulation events if set to true.

4.1.2 Creating the Data Centre

After initializing CloudSim, a data centre should be created. A data centre contains a number of physical hosts, which represent the computing resources. At least one data centre should be created, using the createDatacenter("datacentre_name") method, which returns an object of type Datacenter.

Datacenter datacenter0 = createDatacenter("datacentre_name");

The data centre is given a unique name through the datacentre_name parameter. It contains a list of physical hosts stored in an array list, and each host in this list has its own capabilities for CPU, RAM, bandwidth and storage.

4.1.3 Creating the Cloud Broker

The third step is the creation of the broker, which works as an interface between the cloud user and the cloud provider. The broker acts on behalf of the cloud user: it hides the creation, destruction and management of VMs, and it submits the VMs and cloudlets. It is created using the createBroker() method, which returns an object of type DatacenterBroker; the getId() method returns the broker id.

DatacenterBroker broker = createBroker();
int brokerId = broker.getId();

4.1.4 Creating the List of the Virtual Machines

An array list for storing the VMs in the cloud data centre is created using the following code snippet.

private static List<Vm> vmlist;
vmlist = new ArrayList<Vm>();

After creating the VM list, at least one VM needs to be created using the Vm class and added to the list. For example, the following code creates a VM called vm1.

Vm vm1 = new Vm(vmId, brokerId, CPUInMips, pesNumber, ram, bandwidth, size, vmm, provisioning_policy);

Where vmId is the virtual machine id and brokerId refers to the owner of the VM, i.e. the id of the broker in CloudSim. CPUInMips is a measure of the CPU speed in million instructions per second, which maps to the CPU frequency. pesNumber is the number of CPU cores; ram is the amount of memory in megabytes; bandwidth is the network bandwidth in megabits; size is the storage size in megabytes; vmm is the type of virtual machine manager (VMM) used, such as Xen. provisioning_policy is the cloudlet scheduling policy, which can be either time-shared, using the constructor of the class CloudletSchedulerTimeShared, or space-shared, using the constructor of the class CloudletSchedulerSpaceShared. In space-shared scheduling, cloudlets are executed one by one on the CPU, while in time-shared scheduling, cloudlets are executed simultaneously on the CPU. The list of virtual machines should then be submitted to the broker for execution.

broker.submitVmList(vmlist);

4.1.5 Creating the cloudlets

Now it is time to create the application services to be executed by the VMs. A cloudlet represents the application services that run on top of the virtual machines. A cloudlet list should be created with at least one cloudlet.

private static List<Cloudlet> cloudletList;
cloudletList = new ArrayList<Cloudlet>();

Cloudlet cloudlet = new Cloudlet(cloudletId, length, pesNumber, fileSize, outputSize, CPUutilizationModel, RAMutilizationModel, BWutilizationModel);

Where cloudletId is the ID of the cloudlet and length is the length of the cloudlet expressed in MI. MI is an abbreviation of millions of instructions, meaning that a CPU of 1000 MIPS can process a cloudlet of 1000 MI in one second. pesNumber is the number of processing elements (CPU cores) that will process the cloudlet; fileSize is the size of the cloudlet file in bytes; and outputSize is the size in bytes of the cloudlet's output after execution. CPUutilizationModel is the utilization model of the CPU, which monitors CPU usage by the cloudlet; RAMutilizationModel is the utilization model of the RAM, which controls RAM usage by the cloudlet; and BWutilizationModel is the utilization model of the bandwidth, which controls bandwidth usage by the cloudlet. Both the cloudlet's broker and the VM that will execute the cloudlet should be set. After that, the cloudlet should be added to the cloudlet list, which in turn is submitted to the broker for execution by the specified VM.

broker.submitCloudletList(cloudletList);

The cloudlets should then be bound to the VMs through which they will be executed.

broker.bindCloudletToVm(cloudlet1.getCloudletId(), vm1.getId());

4.1.6 Starting the Simulation

Now all the CloudSim entities have been created and the cloudlets have been submitted for execution by the VMs. The simulation is then started using the startSimulation() method, which waits for all the entities to complete their execution.

CloudSim.startSimulation();

4.1.7 Stopping the Simulation

The simulation stops once its entities have finished, or when any entity requests that it stop. The stopSimulation() method throws a NullPointerException if it is called before CloudSim has been initialised.

CloudSim.stopSimulation();

4.1.8 Printing the Results

Finally, the desired output messages and the values of the performance metrics are printed using the appropriate methods. The proposed utility-based strategy is built on top of a data centre such as the one described in Section 4.1 and its sub-sections.

4.2 Implementing the Adaptive VMs Assignment

The adaptive VMs assignment is implemented in the optimizeAllocation(List) method, which optimizes the current allocation where possible. This method takes the list of VMs as a parameter and returns a map describing the best assignment found. The following steps, accompanied by code snippets, show how the adaptive assignment of VMs has been integrated into CloudSim.

1. Building the initial population:
a. The current allocation is used as a genotype (a solution vector representing an assignment) in the initial population, so that it can be kept if nothing better is found. The size of the initial population is determined by the value of the POPULATION_SIZE parameter.
b. Completely random initialization is not used for the initial population, as it might produce completely random allocations that make no sense. Instead, the current allocation is mutated to create the list of genotypes forming the initial population. The probability of mutation during initialization is given by the MUTATION_PROBABILITY_DURING_INITIALIZATION parameter. Each genotype in the initial population is evaluated against the utility fitness function.

c. At the end of the creation of the initial population, each individual stores a genotype together with its utility. The code snippet in Code 4-1 shows the building of the initial population. The implementation of the utility(g) method is shown in Section 4.3.

Code 4-1: Building the initial population of assignments

2. Running the genetic algorithm:
a. Parent selection:
i. Selecting parents for mutation: a portion of the current population is randomly selected to be mutated; the size of this portion is determined by multiplying POPULATION_SIZE by NEW_MUTATED_GENOTYPES_RATIO. The code snippet in Code 4-2 shows how the selection for mutation is done.

Code 4-2: Selecting parents for mutation
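Since the original listing in Code 4-1 is not reproduced here, the initial-population step can be illustrated with the following sketch. It assumes a genotype is an int array mapping each VM index to a host index; the class name, parameter values and the fixed random seed are all illustrative, not the dissertation's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of step 1: seeding the GA population from the
// current VM-to-host allocation (genotype[i] = host index of VM i).
public class InitialPopulation {
    static final int POPULATION_SIZE = 10;                              // illustrative value
    static final double MUTATION_PROBABILITY_DURING_INITIALIZATION = 0.1;
    static final Random RNG = new Random(42);                           // fixed seed for the example

    // Copy the genotype, reassigning each VM to a random host with the
    // given probability (the mutation used during initialization).
    static int[] mutate(int[] genotype, int numHosts, double probability) {
        int[] child = genotype.clone();
        for (int vm = 0; vm < child.length; vm++) {
            if (RNG.nextDouble() < probability) {
                child[vm] = RNG.nextInt(numHosts);
            }
        }
        return child;
    }

    // The current allocation is kept unchanged as the first individual,
    // so the GA can never return something worse than the status quo.
    static List<int[]> buildInitialPopulation(int[] currentAllocation, int numHosts) {
        List<int[]> population = new ArrayList<>();
        population.add(currentAllocation.clone());
        while (population.size() < POPULATION_SIZE) {
            population.add(mutate(currentAllocation, numHosts,
                    MUTATION_PROBABILITY_DURING_INITIALIZATION));
        }
        return population;
    }
}
```

In the real implementation each genotype would additionally be evaluated with utility(g) as it is added, so the population stores genotype-utility pairs.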

ii. Selecting parents for crossover: a portion of the current population is also randomly selected; the size of this portion is determined by multiplying POPULATION_SIZE by NEW_CROSSOVER_GENOTYPES_RATIO. The code snippet in Code 4-3 shows how the selection for crossover is done.

Code 4-3: Selecting parents for crossover

b. Mutation: each genotype resulting from the selection is mutated using the getMutated(genotype, MUTATION_PROBABILITY) method. The mutation creates a new, separate population of the mutated genotypes along with their utilities. The code snippet in Code 4-4 shows how the getMutated(genotype, mutationProbability) method is implemented.

Code 4-4: The getMutated() method
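The ratio-based parent selection described in steps 2a.i and 2a.ii can be sketched as follows. This is an assumed reconstruction (the actual listings are Code 4-2 and Code 4-3, not reproduced here): a random portion of the population, sized as the population size times the relevant ratio, is drawn without replacement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of parent selection for mutation and crossover.
// The same routine serves both, with ratio set to
// NEW_MUTATED_GENOTYPES_RATIO or NEW_CROSSOVER_GENOTYPES_RATIO.
public class ParentSelection {
    static List<int[]> selectRandomPortion(List<int[]> population,
                                           double ratio, Random rng) {
        int count = (int) Math.round(population.size() * ratio);
        List<int[]> pool = new ArrayList<>(population);
        Collections.shuffle(pool, rng);                 // uniform random order
        return new ArrayList<>(pool.subList(0, count)); // take the first 'count' parents
    }
}
```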

c. Crossover: two genotypes from the selected parents are chosen at random to be crossed over, creating a new genotype. The crossover operation creates a new, separate population of the crossed-over genotypes along with their utilities. The code snippet in Code 4-5 shows how the selection and crossover are done.

Code 4-5: The getCrossover() method

d. After the mutation and crossover processes there is a new population, whose size is the size of the initial population plus the size of the population produced by mutation plus the size of the population produced by crossover. This new population is sorted by utility fitness. The population for the next generation is created from it by keeping the best genotypes, according to the KEEP_BEST_RATIO, and then selecting randomly from the remaining genotypes, bounded by the population size. This means that the population for the next generation is selected from both the children and the parents. The code snippet in Code 4-6 shows how the population of the next generation is created.
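Because the listings in Code 4-5 and Code 4-6 are not reproduced here, the crossover and survivor-selection steps can be illustrated with the sketch below. The single-point crossover and the elitism-plus-random-fill scheme mirror the description in steps 2c and 2d, but the operator details (cut point, tie handling) are assumptions, and the Individual class is a made-up helper pairing a genotype with its utility.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of steps 2c-2d of the genetic algorithm.
public class NextGeneration {
    static class Individual {
        final int[] genotype;
        final double utility;
        Individual(int[] genotype, double utility) {
            this.genotype = genotype;
            this.utility = utility;
        }
    }

    // Single-point crossover: genes before the cut come from parent a,
    // the rest from parent b.
    static int[] crossover(int[] a, int[] b, int cut) {
        int[] child = a.clone();
        for (int i = cut; i < child.length; i++) {
            child[i] = b[i];
        }
        return child;
    }

    // Keep the best keepBestRatio share of the combined population
    // (elitism), then fill the remaining slots randomly from the rest,
    // bounded by the population size.
    static List<Individual> select(List<Individual> combined, int populationSize,
                                   double keepBestRatio, Random rng) {
        List<Individual> sorted = new ArrayList<>(combined);
        sorted.sort(Comparator.comparingDouble((Individual ind) -> ind.utility).reversed());
        int keep = Math.min((int) Math.round(populationSize * keepBestRatio), sorted.size());
        List<Individual> next = new ArrayList<>(sorted.subList(0, keep));
        List<Individual> rest = new ArrayList<>(sorted.subList(keep, sorted.size()));
        Collections.shuffle(rest, rng);
        next.addAll(rest.subList(0, Math.min(populationSize - keep, rest.size())));
        return next;
    }
}
```

Selecting survivors from parents and children together is the design point made in step 2d: good parents can outlive weak offspring, while the random fill preserves diversity.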

Code 4-6: The population for the next generation

3. The resulting assignment: the genetic algorithm runs in a loop whose length is defined by the NUMBER_OF_GENERATIONS parameter. After the last generation there is a final population of genotypes, and the last step is to build a migration map from the best valid genotype found while running the genetic algorithm.

4.2.1 Finding Source and Destination Hosts

For the adaptive VMs assignment to work, we need to distinguish two types of hosts in the data centre. The first type is the source hosts: hosts from which VMs can be moved away. A host is a source host if its utilization is greater than zero. The second type is the target hosts: hosts to which VMs can be moved. A host is considered a target host if it is not over-utilized. The findSourceAndTargetHosts() method shown in Code 4-7 builds the lists of source and target hosts.

Code 4-7: Making lists of source and target hosts
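As the listing in Code 4-7 is not reproduced here, the classification rule it implements can be sketched as follows. The rule itself comes directly from the text above (source: utilization greater than zero; target: not over-utilized); the class name and the numeric over-utilization threshold are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of findSourceAndTargetHosts().
public class HostClassifier {
    static final double OVER_UTILIZATION_THRESHOLD = 0.9; // assumed value

    // A host running at least one VM (utilization > 0) can donate VMs.
    static List<Integer> sourceHosts(double[] utilization) {
        List<Integer> sources = new ArrayList<>();
        for (int h = 0; h < utilization.length; h++) {
            if (utilization[h] > 0.0) {
                sources.add(h);
            }
        }
        return sources;
    }

    // A host that is not over-utilized can receive migrated VMs.
    static List<Integer> targetHosts(double[] utilization) {
        List<Integer> targets = new ArrayList<>();
        for (int h = 0; h < utilization.length; h++) {
            if (utilization[h] <= OVER_UTILIZATION_THRESHOLD) {
                targets.add(h);
            }
        }
        return targets;
    }
}
```

Note that the two sets overlap: a moderately loaded host is both a source and a target, which is what lets the optimizer rebalance rather than only evacuate.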

4.3 The Implementation of the Utility Function

The implementation of the utility function follows the design described in Chapter 3. To compute the utility, the energy cost, the SLA violation cost and the cost of performance degradation due to migration (PDM) need to be calculated. The code snippet in Code 4-8 shows the calculation of the energy cost and the violation cost, based on the design in Chapter 3.

Code 4-8: Calculation of the energy cost and the violation cost

The PDM cost is calculated by multiplying the time needed for migration by the cost of that time, as described in Chapter 3. The code snippet in Code 4-9 shows how the cost of performance degradation due to migration is calculated.

Code 4-9: Calculating the PDM cost
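With the listings in Code 4-8 and Code 4-9 not reproduced here, the shape of the cost model can be illustrated by the sketch below. It is an assumption-laden stand-in: the prices are made up, the migration time is estimated simply as RAM size over bandwidth, and utility is taken as the negative of total cost; the real constants and formulas are those defined in Chapter 3.

```java
// Hypothetical sketch of the cost model behind the utility function.
public class UtilityFunction {
    static final double ENERGY_PRICE_PER_KWH = 0.10;   // $/kWh (illustrative)
    static final double SLA_PENALTY_PER_SECOND = 0.05; // $/s of violation (illustrative)
    static final double PDM_PRICE_PER_SECOND = 0.02;   // $/s of degradation (illustrative)

    static double energyCost(double energyKWh) {
        return energyKWh * ENERGY_PRICE_PER_KWH;
    }

    static double violationCost(double violationSeconds) {
        return violationSeconds * SLA_PENALTY_PER_SECOND;
    }

    // Migration time estimated as RAM size over available bandwidth,
    // then priced per second of degraded performance (the PDM cost).
    static double pdmCost(double ramMegabytes, double bandwidthMegabytesPerSec) {
        double migrationSeconds = ramMegabytes / bandwidthMegabytesPerSec;
        return migrationSeconds * PDM_PRICE_PER_SECOND;
    }

    // Higher utility corresponds to a lower total cost of an assignment.
    static double utility(double energyKWh, double violationSeconds,
                          double ramMegabytes, double bandwidthMegabytesPerSec) {
        return -(energyCost(energyKWh)
                + violationCost(violationSeconds)
                + pdmCost(ramMegabytes, bandwidthMegabytesPerSec));
    }
}
```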

4.4 Integrating the Utility-Based Strategy with CloudSim

4.4.1 The Class Hierarchy

Figure 4-2 shows the class hierarchy of the implementation of the adaptive cloud scheduling problem. The graph was built with the help of the Architexa plugin [51] for the Eclipse IDE for Java developers [52]. The implementation of the genetic algorithm is hooked into the optimizeAllocation(List) method of the PowerVmAllocationPolicyMigrationGeneticAlgorithm class, a subclass of PowerVmAllocationPolicyMigrationAbstract, which in turn is a child of PowerVmAllocationPolicyAbstract in the org.cloudbus.cloudsim.power package. The PowerVmAllocationPolicyAbstract class extends the VmAllocationPolicy class in the org.cloudbus.cloudsim package.

Figure 4-2: The class hierarchy of the adaptive cloud scheduling problem


More information

Migration of Virtual Machines for Better Performance in Cloud Computing Environment

Migration of Virtual Machines for Better Performance in Cloud Computing Environment Migration of Virtual Machines for Better Performance in Cloud Computing Environment J.Sreekanth 1, B.Santhosh Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,

More information

How To Make A Virtual Machine Aware Of A Network On A Physical Server

How To Make A Virtual Machine Aware Of A Network On A Physical Server VMready Virtual Machine-Aware Networking White Paper Table of Contents Executive Summary... 2 Current Server Virtualization Environments... 3 Hypervisors... 3 Virtual Switches... 3 Leading Server Virtualization

More information

Study of Various Load Balancing Techniques in Cloud Environment- A Review

Study of Various Load Balancing Techniques in Cloud Environment- A Review International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-04 E-ISSN: 2347-2693 Study of Various Load Balancing Techniques in Cloud Environment- A Review Rajdeep

More information

Sla Aware Load Balancing Algorithm Using Join-Idle Queue for Virtual Machines in Cloud Computing

Sla Aware Load Balancing Algorithm Using Join-Idle Queue for Virtual Machines in Cloud Computing Sla Aware Load Balancing Using Join-Idle Queue for Virtual Machines in Cloud Computing Mehak Choudhary M.Tech Student [CSE], Dept. of CSE, SKIET, Kurukshetra University, Haryana, India ABSTRACT: Cloud

More information

solution brief September 2011 Can You Effectively Plan For The Migration And Management of Systems And Applications on Vblock Platforms?

solution brief September 2011 Can You Effectively Plan For The Migration And Management of Systems And Applications on Vblock Platforms? solution brief September 2011 Can You Effectively Plan For The Migration And Management of Systems And Applications on Vblock Platforms? CA Capacity Management and Reporting Suite for Vblock Platforms

More information

Managing Overloaded Hosts for Dynamic Consolidation of Virtual Machines in Cloud Data Centers Under Quality of Service Constraints

Managing Overloaded Hosts for Dynamic Consolidation of Virtual Machines in Cloud Data Centers Under Quality of Service Constraints IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 7, JULY 2013 1366 Managing Overloaded Hosts for Dynamic Consolidation of Virtual Machines in Cloud Data Centers Under Quality of Service

More information

Virtual Machines. www.viplavkambli.com

Virtual Machines. www.viplavkambli.com 1 Virtual Machines A virtual machine (VM) is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software

More information

Data Centers and Cloud Computing. Data Centers

Data Centers and Cloud Computing. Data Centers Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

2) Xen Hypervisor 3) UEC

2) Xen Hypervisor 3) UEC 5. Implementation Implementation of the trust model requires first preparing a test bed. It is a cloud computing environment that is required as the first step towards the implementation. Various tools

More information

Relational Databases in the Cloud

Relational Databases in the Cloud Contact Information: February 2011 zimory scale White Paper Relational Databases in the Cloud Target audience CIO/CTOs/Architects with medium to large IT installations looking to reduce IT costs by creating

More information

Power Aware Live Migration for Data Centers in Cloud using Dynamic Threshold

Power Aware Live Migration for Data Centers in Cloud using Dynamic Threshold Richa Sinha et al, Int. J. Comp. Tech. Appl., Vol 2 (6), 2041-2046 Power Aware Live Migration for Data Centers in Cloud using Dynamic Richa Sinha, Information Technology L.D. College of Engineering, Ahmedabad,

More information

Virtual Machine Placement in Cloud systems using Learning Automata

Virtual Machine Placement in Cloud systems using Learning Automata 2013 13th Iranian Conference on Fuzzy Systems (IFSC) Virtual Machine Placement in Cloud systems using Learning Automata N. Rasouli 1 Department of Electronic, Computer and Electrical Engineering, Qazvin

More information

Survey on Models to Investigate Data Center Performance and QoS in Cloud Computing Infrastructure

Survey on Models to Investigate Data Center Performance and QoS in Cloud Computing Infrastructure Survey on Models to Investigate Data Center Performance and QoS in Cloud Computing Infrastructure Chandrakala Department of Computer Science and Engineering Srinivas School of Engineering, Mukka Mangalore,

More information

Energy Efficient Resource Management in Virtualized Cloud Data Centers

Energy Efficient Resource Management in Virtualized Cloud Data Centers Energy Efficient Resource Management in Virtualized Cloud Data Centers Anton Beloglazov and Rajkumar Buyya Cloud Computing and Distributed Systems (CLOUDS) Laboratory Department of Computer Science and

More information

Distributed and Cloud Computing

Distributed and Cloud Computing Distributed and Cloud Computing K. Hwang, G. Fox and J. Dongarra Chapter 3: Virtual Machines and Virtualization of Clusters and datacenters Adapted from Kai Hwang University of Southern California March

More information

APPLICATION PERFORMANCE MONITORING

APPLICATION PERFORMANCE MONITORING APPLICATION PERFORMANCE MONITORING PRACTICAL WAYS TO MONITOR THE END USER EXPERIENCE WHITE PAPER Performance of key applications is a critical item to monitor in many IT environments where users depend

More information

Resource Scheduling in Cloud using Bacterial Foraging Optimization Algorithm

Resource Scheduling in Cloud using Bacterial Foraging Optimization Algorithm Resource Scheduling in Cloud using Bacterial Foraging Optimization Algorithm Liji Jacob Department of computer science Karunya University Coimbatore V.Jeyakrishanan Department of computer science Karunya

More information

can you effectively plan for the migration and management of systems and applications on Vblock Platforms?

can you effectively plan for the migration and management of systems and applications on Vblock Platforms? SOLUTION BRIEF CA Capacity Management and Reporting Suite for Vblock Platforms can you effectively plan for the migration and management of systems and applications on Vblock Platforms? agility made possible

More information

1.1.1 Introduction to Cloud Computing

1.1.1 Introduction to Cloud Computing 1 CHAPTER 1 INTRODUCTION 1.1 CLOUD COMPUTING 1.1.1 Introduction to Cloud Computing Computing as a service has seen a phenomenal growth in recent years. The primary motivation for this growth has been the

More information

Performance Management for Cloudbased STC 2012

Performance Management for Cloudbased STC 2012 Performance Management for Cloudbased Applications STC 2012 1 Agenda Context Problem Statement Cloud Architecture Need for Performance in Cloud Performance Challenges in Cloud Generic IaaS / PaaS / SaaS

More information

Networking for Caribbean Development

Networking for Caribbean Development Networking for Caribbean Development BELIZE NOV 2 NOV 6, 2015 w w w. c a r i b n o g. o r g Virtualization: Architectural Considerations and Implementation Options Virtualization Virtualization is the

More information

CHAPTER 2 THEORETICAL FOUNDATION

CHAPTER 2 THEORETICAL FOUNDATION CHAPTER 2 THEORETICAL FOUNDATION 2.1 Theoretical Foundation Cloud computing has become the recent trends in nowadays computing technology world. In order to understand the concept of cloud, people should

More information

CLOUD COMPUTING An Overview

CLOUD COMPUTING An Overview CLOUD COMPUTING An Overview Abstract Resource sharing in a pure plug and play model that dramatically simplifies infrastructure planning is the promise of cloud computing. The two key advantages of this

More information