Virtual Machine Instance Scheduling in IaaS Clouds



Naylor G. Bachiega, Henrique P. Martins, Roberta Spolon, Marcos A. Cavenaghi
Departamento de Ciência da Computação, UNESP - Univ Estadual Paulista, Bauru, Brazil

Renata S. Lobato, Aleardo Manacero
Departamento de Ciência da Computação e Estatística, UNESP - Univ Estadual Paulista, São José do Rio Preto, Brazil

Abstract: With the steady increase in the use of computers, problems such as energy demand and floor space in data centers are arising worldwide. Many solutions are being designed to address these issues, among them Cloud Computing, which builds on existing technologies, such as virtualization, to reduce energy consumption and space allocation in data centers and large companies. The cloud is shared by multiple customers and allows elastic growth: new resources, such as hardware or software, can be hired and added to the platform at any time. In this model, customers pay for the resources they use rather than for the whole underlying architecture. It is therefore important to determine how efficiently those resources are distributed within the cloud. This paper proposes and develops a scheduling algorithm for the cloud that can efficiently define the distribution of resources within the architecture.

Keywords: Cloud Computing; Scheduling Algorithm; Virtualization

I. INTRODUCTION

Cloud Computing is seen as a trend in the current scenario of almost all organizations. Its advantages include reduced hardware and maintenance costs, accessibility, flexibility, and a highly automated process in which the client does not need to worry about software upgrades [1]. Sabahi [2] defines Cloud Computing as a network environment based on the sharing of computational resources. In fact, clouds are based on the Internet and try to hide their complexity from customers.

II. CLOUD COMPUTING

The cloud refers to the hardware and software delivered as services over the Internet by data centers. Companies that provide clouds use virtualization technologies, combined with their ability to deliver computing resources through their network infrastructure. Cloud Computing uses virtualization to create an environment (the cloud) that allocates instances (virtualized operating systems) according to the available resources (physical machines). These virtual machine instances are allocated onto the physical machines that make up the cloud environment.

A. Classes of Services

According to Buyya [3], Cloud Computing is divided into three service classes, according to the type of service offered by providers: Infrastructure as a Service (IaaS), Software as a Service (SaaS) and Platform as a Service (PaaS):

Software as a Service (SaaS): in this class, applications reside on top of the model, offering "software on demand". The applications are accessible from various devices, for example through a Web browser (e.g. webmail). The customer does not manage or control the cloud infrastructure, such as network, servers, operating systems, storage, or even the application itself. Billing for the service, in this case, can be based on the number of users [4].

Platform as a Service (PaaS): provides an environment for developers to build, test and deploy their applications without caring about the infrastructure, the amount of memory or hardware requirements. Examples of this class are the Google Apps service, which offers a scalable environment for developing and hosting Web applications, and Microsoft Azure [4].
Infrastructure as a Service (IaaS): in this class of service, the customer has cloud processing, networking, storage and other computing resources at their disposal, on which they can install operating systems or any other system. The customer does not manage or control the underlying cloud infrastructure and pays only for the structure actually used. Examples of IaaS services are Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Eucalyptus, OpenNebula and OpenStack [4].

Besides the three classes of service mentioned above, other authors also consider CaaS (Communications as a Service), DaaS (Datacenter as a Service), KaaS (Knowledge as a Service) and HaaS (Hardware as a Service) [5].

B. Benefits from Cloud Computing

The main benefit brought by Cloud Computing is scalability. Servers that are not being used still create management problems and consume energy: a server at full load and one at low load draw almost the same amount of electricity, so idle servers are not viable. With the demand-based resource provisioning provided by the cloud, it is easier to scale the system, adding resources only when they are needed. This reduces power consumption and management effort, optimizing the use of servers, network and storage space. The economics of clouds involve the following aspects [6]:

Economy of scale from the provider's view: achieved by large data centers, which minimize operating costs related to power consumption, personnel and management. The minimization is a direct result of pooling multiple resources in a single domain.

Economy of scale from the demand view: results from demand aggregation, which reduces the inefficiencies caused by load variations and increases server load.

Economy of scale from the multi-tenancy view: since the degree of sharing can be increased, the per-tenant cost of server management can be reduced.

III. SCHEDULING ON THE CLOUD

One of the most challenging problems in parallel and distributed computing is the scheduling problem. The goal of scheduling is to determine an assignment of tasks to processing units so as to optimize certain performance indices [9]. It should be noted that there are two kinds of scheduling within the architecture:

Scheduler in the manager: its scheduling algorithms assign virtual machine instances to the computing nodes responsible for processing them.

Scheduler in the hypervisor of the computing node: this scheduling algorithm lives in the operating system of the physical machine and shares the processor among the instances hosted there.

The cloud infrastructure therefore includes a resource scheduler. Unlike an operating system, which generally works with processes of low granularity, the cloud manager works with virtual machine instances, which, compared to processes, can be considered of high granularity. Thus, the scheduler of an IaaS cloud allocates virtual machine instances and must determine which node will host each one. Unlike a common process, a cloud virtual machine instance remains active, whether it is consuming resources or not, until some action (a user request or a hardware/software failure) interrupts it.

Some items must be evaluated before the cloud scheduler decides which node should receive a new resource request (a minimal measurement sketch is given at the end of this section):

Free processing capacity of the node;
Amount of total memory available;
Amount of secondary memory (disk) available;
Free read/write capacity of the secondary memory;
Free upstream and downstream capacity of the network.

A major problem in scheduling is determining the cost of a task. The cloud has the analogous problem of determining the processing, disk, memory and network cost of a virtual machine instance before it can be scheduled. In such cases it is necessary to use an adaptive scheme, in which the algorithms and parameters used for scheduling decisions change dynamically according to the previous, current and/or predicted state of the resources [7].
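The quantities listed above can all be measured on a node with standard operating system counters. The sketch below, assuming the psutil library (the one later used by the prototype in Section IV), derives read/write and network throughput from counter deltas over a short sampling window; the function name and window length are illustrative assumptions, not part of the original work.

# Sketch: measuring the node quantities listed above with psutil (illustrative).
import time

import psutil

def free_resources(window=1.0):
    # Sample cumulative disk and network counters over a short window
    # to estimate the read/write and up/down bandwidth currently in use.
    d0, n0 = psutil.disk_io_counters(), psutil.net_io_counters()
    cpu_used = psutil.cpu_percent(interval=window)   # also serves as the sampling delay
    d1, n1 = psutil.disk_io_counters(), psutil.net_io_counters()
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage('/')
    return {
        'cpu_free_percent': 100.0 - cpu_used,                          # free processing capacity
        'mem_total_bytes': mem.total,                                  # total memory
        'mem_available_bytes': mem.available,                          # memory available
        'disk_free_bytes': disk.free,                                  # secondary memory available
        'disk_read_bps': (d1.read_bytes - d0.read_bytes) / window,     # current read load
        'disk_write_bps': (d1.write_bytes - d0.write_bytes) / window,  # current write load
        'net_up_bps': (n1.bytes_sent - n0.bytes_sent) / window,        # upstream usage
        'net_down_bps': (n1.bytes_recv - n0.bytes_recv) / window,      # downstream usage
    }

if __name__ == '__main__':
    print(free_resources())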
As can be seen in Figure 1, in grid computing, which is in some ways similar to the cloud, adaptive scheduling is performed with a focus on the heterogeneity of candidate resources, the dynamic performance of resources and the diversity of applications [7]. The taxonomy divides adaptive scheduling into adaptation of resources, adaptation of the application and adaptation to dynamic performance.

Fig. 1. Taxonomy of scheduling in Grid [7].

According to Casavant [8], when a metric is available for evaluating a solution, this can be used to decrease the time required to find an acceptable solution (schedule). The factors that determine this approach include:

Availability of a function to evaluate a solution;
Time needed to evaluate a solution;
Ability to judge, according to some metric, the value of an optimal solution;
Provision of an intelligent mechanism to limit the solution space.

As discussed, in a cloud environment the scheduler needs to evaluate the condition of the computing nodes (approximate or heuristic) before allocating the next virtual machine instance (static scheduling); it must select the node with the most available resources, since it is not possible to measure accurately how many resources the new instance will need (suboptimal); and it must measure periodically (adaptive) and relocate instances if necessary (load balancing), so as not to harm the performance of the other instances already running on the node.

A. Scheduling Algorithms of Open-Source Clouds

There are several scheduling algorithms used to balance processing and distribute resources. In open-source clouds, the main algorithms are deterministic, using static scores to determine the node that will host a new instance. This score does not take into account the current condition of the resources available in the cloud, which often affects the performance of the architecture as well as the services delivered to customers by service providers. Considering that the current scheduling algorithms of open-source clouds determine cloud resources statically, this study aimed to create a dynamic scheduling algorithm that determines which computing nodes in a cloud have the resources to efficiently host new virtual machine instances.

IV. METHODOLOGY AND TESTING ENVIRONMENT

A cloud was built for the development of this work. As can be seen in Figure 2, four computers were used, one as the manager and the other three as computing nodes. OpenStack was used as the cloud management system; it was chosen because it is open source, has an active community and has extensive documentation.

Fig. 2. Cloud infrastructure (scheduling, virtual infrastructure management, virtual machines and cloud nodes).

For better observation of the results, and to allow more detailed comparisons during development, the algorithm was tested with two virtual machine behaviors, in an attempt to simulate a real production environment:

Virtual machines with constant consumption: three virtual machine images were created with the Ubuntu Server 12.04 operating system, each running a startup script that produces a different, constant consumption of resources (processing, memory, network and disk), as in Figure 3.

Virtual machines with variable resource consumption: in this model, the startup script produces a varying consumption by means of threads that are started randomly.

Fig. 3. Instances of constant consumption.

Assuming that virtual machines can adopt these two behaviors within a cloud, constant and variable resource consumption, a script was developed to create this scenario, as shown in Figure 4 (a sketch is given after the tables below). The process begins with 1 to 10 threads; each subthread starts 1 to 3 procedures (disk, CPU and network), each with a random lifetime, and the process restarts when the subthreads finish.

Fig. 4. Simulation consumption script.

Table I describes the initial state of each physical node: network capacity, amount of memory, CPU clock and number of cores.

TABLE I. NODE RESOURCES
Node  Network   Memory  CPU (MHz)  Cores
01    100 Mbps  4 GB    2000       4
02    100 Mbps  4 GB    2000       4
03    100 Mbps  4 GB    2000       4

Table II shows the initial state of the nodes with respect to the resources available before the scheduling tests.

TABLE II. AVAILABLE RESOURCES
Node  CPU     Network  Memory  Disk
01    99.7 %  100 %    91 %    100 %
02    99.8 %  100 %    91.2 %  100 %
03    99.8 %  100 %    90 %    100 %
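The behavior shown in Figure 4 can be reconstructed roughly as follows. This is a minimal sketch, assuming plain Python threads and simple busy-loop, file-write and loopback-UDP substitutes for the actual consumption procedures; function names, lifetimes and sizes are illustrative, not taken from the original script.

# Minimal sketch of the load-generation script in Fig. 4 (illustrative, not the original code).
import os
import random
import socket
import tempfile
import threading
import time

def burn_cpu(deadline):
    while time.time() < deadline:          # busy loop until the random lifetime expires
        _ = sum(i * i for i in range(1000))

def burn_disk(deadline):
    with tempfile.NamedTemporaryFile() as f:
        while time.time() < deadline:      # repeatedly write and flush a small block
            f.write(os.urandom(64 * 1024))
            f.flush()
            f.seek(0)

def burn_network(deadline):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while time.time() < deadline:          # send UDP datagrams to the loopback interface
        try:
            s.sendto(os.urandom(1024), ('127.0.0.1', 9999))
        except OSError:
            pass
    s.close()

PROCEDURES = [burn_cpu, burn_disk, burn_network]

def subthread():
    # Each subthread starts 1 to 3 procedures, each with a random lifetime.
    workers = []
    for proc in random.sample(PROCEDURES, random.randint(1, 3)):
        deadline = time.time() + random.randint(5, 60)
        t = threading.Thread(target=proc, args=(deadline,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()

while True:
    # The process (re)starts 1 to 10 subthreads and waits for them to finish.
    threads = [threading.Thread(target=subthread) for _ in range(random.randint(1, 10))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()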

As described above, the script initially starts 1 to 10 threads, and each of these threads starts 1 to 3 new threads with a specific consumption: CPU, disk or network. Note that memory is not included in the script because, when a virtual machine instance is loaded, KVM removes the corresponding portion of memory from the physical machine and allocates it to the virtual machine.

V. RESULTS AND DISCUSSION

A. Tests with the Standard Scheduling Algorithm of OpenStack

To start the tests, thirty virtual machine instances were launched using a script that randomly selects one of the three images with different, but constant, loads. As can be seen in Figure 5, node 1 had more resources available, yet node 2 received 10 instances, overloading it more than the others.

Fig. 5. Test release: instances of constant consumption.

After that, eight launches of variable-consumption instances were made, and measurements were taken at ten minutes, twenty minutes, thirty minutes and one hour, as shown in Figure 6.

Fig. 6. Test release: instances of variable consumption.

There were variations in resource consumption, but these variations followed a pattern. This happens because the hypervisor scheduler inside each node has its own load-balancing mechanism, balancing resources among the active VMs, which makes it possible to estimate, by an average, the quantity of available node resources.

B. Prototype

The prototype of the algorithm consists of two parts: the first constantly evaluates the free resources of each node and saves this information in a database; the second selects the node with the most resources at the moment a virtual machine is instantiated. Thus, the algorithm monitors the amount of free resources on each node of the private cloud, creates an index for the manager containing the nodes with the most available resources, and schedules new virtual machine instance requests onto those nodes. It should be noted that, in open-source managers, this selection is otherwise done manually by the user.

1) Node Resources

A program was created that monitors, at predefined time intervals, the amount of resources on each node. The program is written in Python, the same language used by OpenStack, and runs on every node of the cloud. To monitor the amount of available resources on the node, the Python library PSUTIL was used. This library contains functions that measure the use of resources such as CPU, memory, network and disk. In the specific case of the disk, its speed is determined by a read/write test that calculates transfer rates. The program was scheduled to collect machine information and store it in a database every five minutes; this interval was based on the uptime command found in Unix and Linux systems, which reports the load averages of these systems. The program has three important functions (a monitoring sketch is given below):

resources: executed only once, it determines, through performance analysis, the maximum data transfer capacity of the disk and the network, the number of processor cores, and the memory and processing available.

check_node: executed every five minutes, it stores in a database the amount of resources available on that node.

check_instance: executed every five minutes, it stores in a database the amount of resources consumed by each instance.
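A minimal sketch of the node-side monitoring loop is given below, assuming psutil for the measurements and SQLite for storage; the database schema, table name, file path and timing loop are illustrative assumptions, since the paper does not specify them, and the resources and check_instance functions are omitted for brevity.

# Sketch of the per-node monitor (check_node). Schema and storage choice are assumptions.
import sqlite3
import time

import psutil

DB_PATH = 'node_monitor.db'       # illustrative path
INTERVAL = 5 * 60                 # five minutes, as in the prototype

def init_db(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS node_state (
                        ts REAL, cpu_free REAL, mem_free INTEGER,
                        disk_free INTEGER, net_sent INTEGER, net_recv INTEGER)""")
    conn.commit()

def check_node(conn):
    # Store the amount of resources currently available on this node.
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage('/')
    net = psutil.net_io_counters()
    conn.execute("INSERT INTO node_state VALUES (?, ?, ?, ?, ?, ?)",
                 (time.time(),
                  100.0 - psutil.cpu_percent(interval=1),
                  mem.available,
                  disk.free,
                  net.bytes_sent,
                  net.bytes_recv))
    conn.commit()

if __name__ == '__main__':
    conn = sqlite3.connect(DB_PATH)
    init_db(conn)
    while True:
        check_node(conn)
        time.sleep(INTERVAL)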
2) Choice Node

The second algorithm runs on the manager. It captures all the records in the database holding the resource information of each node, analyses them and scores them. To do so, it evaluates the records of an active node's condition over the last 24 hours; since this is only a prototype, future revisions may parameterize node behavior using neural networks or other dynamic algorithms. To determine the node with the most resources, a simple average of the records is taken and weights are applied to the resources that most influence the performance of a virtual machine; in this test, higher weights were given to CPU and memory. Based on this score, the algorithm chooses the node with the most available resources before launching the virtual machine instance (a sketch of this scoring step is given below).
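A minimal sketch of this selection step follows, assuming each node's records from the last 24 hours have already been read from the database as dictionaries of free-resource percentages; the weight values and field names are illustrative, not those used in the prototype.

# Sketch of the weighted scoring used to choose a node (weights and fields are illustrative).
from typing import Dict, List

# Higher weights on CPU and memory, as in the prototype's tests.
WEIGHTS = {'cpu_free': 0.35, 'mem_free': 0.35, 'disk_free': 0.15, 'net_free': 0.15}

def node_score(records: List[Dict[str, float]]) -> float:
    # Simple average of each resource over the node's last-24h records,
    # then a weighted sum of the averaged free percentages.
    if not records:
        return 0.0
    avg = {k: sum(r[k] for r in records) / len(records) for k in WEIGHTS}
    return sum(WEIGHTS[k] * avg[k] for k in WEIGHTS)

def choose_node(history: Dict[str, List[Dict[str, float]]]) -> str:
    # history maps node name -> list of snapshots from the last 24 hours.
    return max(history, key=lambda node: node_score(history[node]))

if __name__ == '__main__':
    history = {
        'node01': [{'cpu_free': 90.0, 'mem_free': 80.0, 'disk_free': 95.0, 'net_free': 99.0}],
        'node02': [{'cpu_free': 40.0, 'mem_free': 55.0, 'disk_free': 90.0, 'net_free': 97.0}],
    }
    print(choose_node(history))   # -> 'node01', the node with the most free resources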

As shown in Figure 7, the prototype distributed the load among the participating nodes more evenly than the current OpenStack algorithm. It is therefore essential to receive feedback from each node participating in the cloud, reporting its real capacity of available resources. For instances of constant consumption, the prototype achieved its goal, distributing the load evenly among the instances.

Fig. 7. Results for instances of constant consumption.

Since one of the motivations for Cloud Computing is the reduction of energy consumption, the prototype could also be used to allocate as many instances as possible on a single node, allowing the manager to hibernate unused nodes without compromising the performance of the virtualized operating systems.

As with constant consumption, the prototype also handled instances of variable consumption (Figure 8), because it evaluates the condition of the nodes before scheduling a VM request. For instances of variable consumption, future improvements to this algorithm could migrate instances away from overloaded nodes and monitor those nodes after migration.

Fig. 8. Results for instances of variable consumption.

In open-source clouds, the main scheduling algorithms are deterministic, using static scores to determine the node that will host an instance. This score does not take into account the condition of the resources available in the cloud, which can hinder the performance of the architecture and affect the services delivered to customers by service providers. The results in this paper show that instances which behave like processes of the operating system, as under the KVM hypervisor, allow the consumed resources to be analysed and recorded, and the amount of resources available on each node of the cloud to be calculated. With this information, it is possible to determine at least two policies for scheduling virtual machine instance requests that justify the use of Cloud Computing:

To distribute the load among the nodes of the cloud, thereby improving the quality of the service provided;

To allocate the maximum number of instances to a node until its resources are exhausted, turning off unused nodes.

Therefore, this prototype has shown that feedback from the processes that run the VM instances is essential to determine with some precision the resource capacity of each node participating in the cloud, making it possible for a manager to decide which operating policy will be used for the architecture, chosen between quality of service and/or energy savings (a sketch of the two policies is given below):

Quality of service: ensures the availability of the service to clients by distributing the resources among the nodes without overloading any specific node;

Energy savings: makes it possible for large data centers to monitor and allocate the right number of instances per node, disabling inactive nodes and thereby reducing energy consumption and the amount of carbon dioxide released into the atmosphere.
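A minimal sketch of these two placement policies follows, assuming each node's availability has already been reduced to a single free-capacity score (for example, by the weighted average shown earlier) and that the requested instance size is expressed in the same unit; names and values are illustrative.

# Sketch of the two placement policies (illustrative names and units).
from typing import Dict, Optional

def place_quality_of_service(free: Dict[str, float], demand: float) -> Optional[str]:
    # Spread the load: pick the node with the most free capacity that fits the request.
    candidates = {n: f for n, f in free.items() if f >= demand}
    return max(candidates, key=candidates.get) if candidates else None

def place_energy_savings(free: Dict[str, float], demand: float) -> Optional[str]:
    # Consolidate: pick the node with the least free capacity that still fits the request,
    # so the remaining nodes can stay empty and be powered down.
    candidates = {n: f for n, f in free.items() if f >= demand}
    return min(candidates, key=candidates.get) if candidates else None

if __name__ == '__main__':
    free = {'node01': 70.0, 'node02': 25.0, 'node03': 55.0}
    print(place_quality_of_service(free, demand=20.0))  # -> 'node01'
    print(place_energy_savings(free, demand=20.0))      # -> 'node02'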
Thus, it can be concluded that Cloud Computing brings great benefits, among them lower energy consumption, physical space savings in data centers, easy provisioning and sizing, and APIs for external interfaces, among others. However, determining how resources will be provisioned in the cloud is of the utmost importance to ensure its success and its adoption by large companies.

REFERENCES

[1] BHADAURIA, Rohit; CHAKI, Rituparna. A survey on security issues in cloud computing. CoRR, abs/1109.5388, 2011.
[2] SABAHI, F. Cloud computing security threats and responses. In: Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on, May 2011.
[3] BUYYA, Rajkumar; BROBERG, James; GOSCINSKI, Andrzej M. Cloud Computing: Principles and Paradigms. John Wiley and Sons: San Francisco, 2011.
[4] SASIKALA, P. Cloud computing: present status and future implications. Available at: <http://www.inderscience.com/storage/f123101246118579.pdf>. Last access: 29 Apr. 2013.
[5] HE, Zhonglin; HE, Yuhua. Analysis on the security of cloud computing. Proc. SPIE, Qingdao, China, p. 7752-775204, 2011.
[6] BACHIEGA, Naylor G.; et al. Open source cloud computing: characteristics and an overview. Available at: <http://worldcomp.org/p2013/pdp3537.pdf>. Last access: 10 Jan. 2014.
[7] DONG, F.; AKL, S. G. Scheduling algorithms for grid computing: state of the art and open problems. Kingston, Ontario, Canada, January 2006.
[8] CASAVANT, Thomas L.; KUHL, Jon G. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, New York, v. 14, n. 2, p. 141-154, Feb. 1988.