Energy efficient cloud autoscaling. Martynas Puronas BSc Computer Science 2013/2014


The candidate confirms that the work submitted is their own and that appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of student)

Summary

Energy-related costs are rising and have become a significant part of the cost of running a datacentre. Reducing these costs would give datacentres higher profit margins and would also help reduce greenhouse gas emissions. This project looks at ways of scaling web applications running in the cloud in an energy-efficient manner. It delivers a solution capable of allocating virtual machines in a virtual cluster whilst taking into account the energy costs of the underlying physical machines.

List of Acronyms

ANN   Artificial Neural Network
API   Application Programming Interface
AWS   Amazon Web Services
CLI   Command Line Interface
IaaS  Infrastructure as a Service
PaaS  Platform as a Service
SaaS  Software as a Service
SLA   Service-Level Agreement
VM    Virtual Machine

Contents

Summary
List of Acronyms
Contents
Table of Figures
1. Introduction
    Overview
    Aims
    Objectives
    Minimum requirements
    Methodology
        Research
        Design and Implementation
        Evaluation
    Schedule
2. Background research
    Introduction
        Mainframes
        Cluster computing
        Grid computing
        Cloud computing
    Cloud models
        Infrastructure as a service (IaaS)
        Platform as a service (PaaS)
        Software as a Service (SaaS)
    Types of cloud
        Public
        Private
        Hybrid
    Virtualisation
    Types of virtualisation
        Full virtualisation
        Paravirtualisation
        Hardware assisted virtualisation
    Energy efficiency
    Related work in cloud auto-scaling
    Summary
3. Design and testing environment
    Introduction
    Design of auto-scaling application
    Architecture of the web application
    Testing environment
    Summary
4. Predictive model
    Introduction
    Supervised machine learning
    Artificial Neural Networks
        Introduction
        Training the network
        Architecture
        Data
        Development
    Other types of predictive models
    The Slashdot effect
    Summary
5. Load testing tool
    Requirements
    Research
        ApacheBench
        Httperf
        JMeter
    Development
    Summary
6. Provisioning module
    Introduction
    Image of VM
        Web application
        Load balancing
    Allocation strategies
    Implementation
    Summary
7. Monitoring module
    Introduction
    Performance metrics
        Passive collection
        Active collection
    Configuration
        Response time thresholds
        Monitored URL
        Response time measurement frequency and moving average's window
        Request count measurement frequency
    Implementation
    Summary
8. The Brain Module
    Introduction
    Implementation
    Tests
        Without Prediction module
        With Prediction module
    Summary
9. Evaluation
    Introduction
    Summary of results
    Evaluation of solution
        Scalability
        Modularity and extensibility
        Practical usage
    Relation to other work
    Objectives and requirements
        Objectives
        Minimum requirements
        Methodology
    Future work
    Summary
10. Conclusion
Appendix A: Personal reflection
Appendix B: Ethical issues
Appendix C: Bibliography
Appendix D

Table of Figures

Figure 1: Design of the auto-scaling solution
Figure 2: Architecture of a load balancer with multiple application servers
Figure 3: Multilayer perceptron network with 1 hidden layer
Figure 4: Elman network
Figure 5: Jordan network
Figure 6: Sample of NASA's access log
Figure 7: Sample of the summary file generated from log file
Figure 8: Diagram of HTTP Load balancer, worker instances and their weights
Figure 9: Pseudocode for the auto-scaling algorithm
Figure 10: VMs over time in Test Case
Figure 11: Number of VMs over time in Test Case
Figure 12: Number of VMs over time in Test Case 3 Run
Figure 13: Number of VMs over time in Test Case 3 Run
Figure 14: CPU utilisation of VMs in Test Case 3 Run
Figure 15: Memory usage of VMs in Test Case 3 Run
Figure 16: CPU utilisation of VMs in Test Case 3 Run
Figure 17: Memory usage of VMs in Test Case 3 Run
Figure 18: Number of VMs over time in Test Case
Figure 19: CPU utilisation of VMs in Test Case
Figure 20: Memory usage of VMs in Test Case
Figure 21: Gantt chart

1. Introduction

1.1 Overview

The biggest advantage of cloud applications is elasticity: an application can seamlessly scale up or down depending on its usage. Developers do not have to worry about the infrastructure and can focus on developing their applications and deploying them to the cloud; the task of managing the application's platform becomes the responsibility of the cloud provider.

Cloud service providers run large datacentres to meet their users' demands, and having over a hundred thousand servers is not uncommon. As a result, energy-related costs can become one of the biggest factors in overall expenses, so datacentre providers are looking for ways to reduce these costs and increase their profit margins. Moreover, reducing energy consumption leads to a smaller carbon footprint at a time when pressure from environmental organisations and governments is as high as ever.

This project addresses the problem by developing a model for provisioning virtual machines on the cloud whilst taking energy costs into account. Datacentres consist of heterogeneous hardware: old machines are replaced by new ones which are usually more powerful and more efficient. The developed model will try to exploit this fact to achieve its goal.

1.2 Aims

The main aim of this project is to develop an application which scales a web application deployed on a virtual cluster. The number of machines in the cluster depends on the application's load: if the load increases, an extra machine should be added to the cluster; if the load decreases to the point where the extra machine no longer brings a performance improvement, it should be removed. Apart from reacting to changes in the workload, the auto-scaling application should also try to predict them. That way the cluster can adjust the number of machines in advance, reducing both the time the cluster is overloaded and the time it is under-utilised, thus minimising the cluster's energy consumption.

1.3 Objectives

To reach the project's goals, the following tasks have to be accomplished:

- Train a model for workload prediction. Implement a supervised machine learning model which estimates workload based on past data.
- Develop a tool to test how the cluster scales with varying workload. Develop tools which can be used to test whether the auto-scaling solution responds appropriately to the workload by increasing or decreasing the virtual cluster's size.
- Develop a virtual machine provisioning model. Develop a model which can communicate with the cloud management platform, allocate and deallocate VMs (Virtual Machines) and choose which of the available physical hosts to deploy a VM on.
- Evaluate energy savings compared to other virtual machine provisioning models. Compare how different VM allocation strategies contribute to energy consumption.

1.4 Minimum requirements

Tasks that must be achieved to get a passing mark:

- Provide a literature review. Compose an introduction to the history of cloud computing, types of clouds and work done by previous researchers in the cloud auto-scaling area.
- Develop a model for predicting workload. Choose a supervised machine learning technique and implement it to predict workload.
- Develop the load testing tool. Develop a tool which can send requests at a fixed rate in order to test how the auto-scaling solution responds to workload.
- Create images of the VMs. Create an image of a VM on the cloud management platform and install the necessary software on it.

1.5 Methodology

1.5.1 Research

Investigate work done by other researchers, the points they addressed and the conclusions they made, and see whether any improvements can be made to their work. Look at auto-scaling solutions provided by cloud providers to get an idea of how they approached this problem.

1.5.2 Design and Implementation

Construct a model based on supervised machine learning to predict changes in the workload. To do this, a supervised machine learning technique will have to be chosen and training data will have to be acquired.

Develop a tool for load testing which will be used in the later stages of the project when testing of the auto-scaling takes place. Build a sample application which will be scaled in the cloud and create images of VMs with the software and the sample application deployed. Create the Provisioning and Monitoring modules. Once all the modules are developed, wire them together into a single working application.

1.5.3 Evaluation

Evaluate how the different VM allocation strategies worked and how they contributed to overall energy consumption, and evaluate what impact the predictive model had on energy consumption and the application's performance.

1.6 Schedule

The plan is to subdivide the aims into smaller milestones so that a measurable deliverable can be produced at the end of every 1-2 weeks (Table 1). A milestone consists of a working piece of software and a chapter in the report.

Milestone no. | Duration (weeks) | Deliverable
1 | 1 | Provide a literature review
2 | 2 | Develop a prediction model
3 | 1 | Develop load testing tool
4 | 2 | Develop client which communicates with the cloud manager and provisions virtual machines
5 | 1 | Build the module for measuring the performance of the web server
6 | 2 | Wire all the individual components together and run experiments
7 | 2 | Evaluate the solution and reflect on the results

Table 1: Project's timetable

2. Background research

2.1 Introduction

Cloud computing took decades to evolve, from mainframes in the 1950s and grid computing in the 1990s to the launch of Amazon Web Services (AWS) in 2006 [1]. Today there are several cloud providers to choose from (Amazon, Google, Microsoft Azure etc.), each offering different platforms. This chapter presents a brief history of cloud computing.

Mainframes

Mainframes were most popular from the 1950s to the 1980s. They were very powerful and, for the time, very reliable machines used for bulk data processing [2]. They were very large machines, stored in dedicated rooms [3]. The first commercially successful mainframe was the UNIVAC I, built in 1951 [4]. IBM developed its first mainframe, the IBM 701 Defense Calculator, in 1952 [5] and has dominated the mainframe market ever since [6].

Cluster computing

One of the biggest problems with mainframes was their price: they were extremely expensive. Even today, they can cost as much as $75,000 [7]. Because of this, businesses started looking for cheaper ways to perform their computational tasks, and cluster computing was invented. Built from commodity hardware, interconnected by a high-speed network and managed by special software called middleware, a cluster appears as a single system to the application developer [2]. Clusters offer the same advantages as mainframes: they are powerful, fault-tolerant and can easily be extended when needed. Clusters usually consist of similar machines running the same operating system and middleware, connected on the same network [8].

Grid computing

If cluster computing can be thought of as connecting computers within an organisation, then grid computing can be thought of as connecting computers across many organisations. Grids emerged in the 1990s when organisations started to connect their clusters. Because the clusters belonged to different organisations, a grid consisted of heterogeneous computing nodes [2]. A group of organisations which pool their resources together is called a virtual organisation. It is the responsibility of the virtual organisation to manage resource allocation to its members to satisfy their requirements. The resources provided by the members of the virtual organisation are computing clusters, databases, special network devices (sensors) etc. [8]. Members of the virtual organisation usually share a common

goal that all the participants are working towards. An example of a grid application is the WorldWide Telescope project [9], where a group of astronomers share their literature, images and other findings [10].

Cloud computing

Cloud computing, also known as utility computing, is the newest computing paradigm in IT. The main idea behind it is that users of computing services are charged based on usage, just like other utilities such as electricity, gas and water. A person would only need a credit card to have access to practically unlimited computing services. The services provided to the user can range from infrastructure and runtime environments to applications [2].

Usage-based billing is very attractive to businesses, which often have to deal with variable workloads. In the traditional environment, a business would have to choose one of the following:

- Have more servers than needed to serve the load during peak times, leaving the resources underutilised for the rest of the time.
- Have high utilisation rates during regular workload and be overloaded during peak loads.

Cloud computing frees the business from choosing between the two by providing dynamic resource provisioning. New resources can be provided to the organisation in minutes instead of days [11] and released when they are no longer needed, allowing the business to deal with the load efficiently and in a cost-effective manner.

2.2 Cloud models

The vision of the cloud sees everything provided to the user as a service (XaaS or X-as-a-Service). The most basic types of service are Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS), although there can be many others [12].

2.2.1 Infrastructure as a service (IaaS)

The IaaS model frees the user from dealing with hardware and its administration. It becomes the responsibility of the cloud service provider to administer the physical hardware (replace broken parts, provide high-speed bandwidth, build the physical location where the servers are stored, secure physical access to the servers etc.). The customer gets a virtual machine from the provider, and it is the client's responsibility to manage the software installed on it. Billing is based on the amount of resources

allocated (number of CPUs, size of memory etc.) and on the time the virtual machine was running; usually it is an hourly charge. AWS was the first to offer such a service in 2006 with the launch of the Elastic Compute Cloud (EC2) [1]. Today there are many more IaaS providers to choose from; among the most popular are Windows Azure [13], Google [14], IBM [15] and Rackspace [16].

2.2.2 Platform as a service (PaaS)

The main problem with IaaS is that users are required to install the software and middleware they need themselves. PaaS addresses this problem by providing the environment and platform required for application development and deployment. This could be programming languages, libraries, software or anything else required for the user's application to run. A PaaS platform can also provide auto-scaling, version control software (git, SVN) and IDE integration. This can save customers development time and allow them to reach the market faster [17]. The most popular PaaS solutions today are AWS Elastic Beanstalk [18], Google App Engine [19], OpenShift [20] and Heroku [21].

2.2.3 Software as a Service (SaaS)

Unlike IaaS and PaaS, which are aimed at developers, SaaS provides a service which is consumable by a non-technical person [17]. The SaaS model promotes a multi-tenant architecture: the application is hosted by the SaaS provider and its users access it via the Internet. This approach solves numerous problems compared to traditional desktop applications. One of them is maintenance: users do not have to upgrade their software themselves, as this is done by the provider. As a result, users are always running the latest software and can benefit from the latest changes [17]. Another advantage is simplified deployment: since the only requirement is an internet connection and a web browser, users can work on different devices (PCs, tablets, smartphones) from any location in the world.

However, the SaaS model has some disadvantages. First, customising the software is often difficult, as direct access to the application's code might not be provided or might be exposed only through restrictive APIs (Application Programming Interfaces). Secondly, some organisations might not want their data to leave their premises due to legal restrictions or security concerns. Examples of SaaS applications are SalesForce.com [22], Google Apps [23] and Dropbox [24].

2.3 Types of cloud

Public

Public clouds were the first type of cloud to appear. They are available to everyone, and their main purpose is to reduce IT infrastructure costs. They are quite popular with new businesses, which are able to start delivering their services without significant investment in their own infrastructure [2].

Private

Private clouds, as the name suggests, are not available to the general public. These clouds are built by organisations for their internal use. Most often, those organisations either already have an IT infrastructure, want to reduce their costs by consolidating their resources, or have to process sensitive data which must not leave their premises [2]. Private clouds can still offer cost savings, but not as big as those a public cloud can offer.

Hybrid

One of the biggest drawbacks of private clouds is that they cannot scale up on demand when exposed to high loads. This can be solved by hybrid clouds. The main idea behind this architecture is to use the private cloud to exploit existing IT infrastructure and use a public cloud to serve peak loads when the existing infrastructure is at full capacity [2].

2.4 Virtualisation

Virtualisation refers to the separation of the physical layer (hardware) from the software layer (operating system) [11]. This abstraction allows the operating system (called the guest operating system) to run on any hardware supported by the Virtual Machine Manager (also referred to as the hypervisor). The hypervisor usually supports multiple guest OSs running at the same time. This helps organisations achieve several goals:

1. Fast resource provisioning. Without virtualisation, if a department within an organisation requests a new OS set-up, a new physical machine has to be purchased and set up, which can take days or even weeks. With virtualisation, a new OS can be set up and ready for work in minutes.

2. High utilisation rates. Usually an operating system does not utilise all of the available resources and has room for additional workload. This might be acceptable in some scenarios, but as the number of machines grows, the amount of wasted resources can quickly accumulate

into significant unused CPU time and memory. Virtualisation allows higher utilisation rates to be achieved by running multiple OSs on the same physical server.

3. Simpler migration. Migrating a virtualised operating system from one physical machine to another is as simple as copying the image of the system to a new location. The migration can even be done with zero downtime [25].

These features allowed virtualisation to become a key part of the IaaS model; without virtualisation technology, cloud computing as we know it would not be possible.

2.5 Types of virtualisation

There exist several types of virtualisation, each with its own advantages and drawbacks, so it is the responsibility of the datacentre's architect to choose the most suitable one.

2.5.1 Full virtualisation

Full virtualisation means running an operating system directly on top of the hypervisor without any modifications to the guest operating system. The guest operating system is completely unaware that it is running inside a virtual machine. This isolation also increases security: if one virtual machine is compromised, the others can continue to work. Full virtualisation solutions try to obtain maximum efficiency by executing safe instructions directly on the CPU and trapping any sensitive calls inside the hypervisor [2]. This trapping can have a negative impact on performance.

2.5.2 Paravirtualisation

Paravirtualisation tries to avoid the performance overhead caused by trapping sensitive calls by making the guest OS aware that it is virtualised. This allows the guest OS to communicate with the hypervisor directly, avoiding expensive call trapping. The guest OS's kernel must be modified to achieve this; therefore the OSs available to be paravirtualised are mostly limited to open source products [26].

2.5.3 Hardware assisted virtualisation

In this model, the hardware architecture itself supports virtualisation, aiming to reduce the performance penalties caused by call trapping. Hardware vendors have their own solutions to achieve this, but they are incompatible with one another [2].

2.6 Energy efficiency

Energy-related costs consist of the cost of power and of the power and cooling infrastructure, and can amount to as much as 42% of all operating costs, as shown by [27]. Therefore it is

important for cloud providers to cut these costs as much as possible. Improving efficiency in a datacentre can be approached from many perspectives: one can try to achieve efficiency at the infrastructure, hardware or software level, as shown by [28].

2.7 Related work in cloud auto-scaling

Auto-scaling in the cloud context refers to dynamic provisioning of resources [29]. If an application suddenly experiences a high workload or its latency increases, additional resources can be allocated. In the opposite scenario, when an application is idle, resources can be released. All of this is done automatically, without human intervention; what the cloud operator has to define is a set of rules for when resources are added or removed.

Research in cloud auto-scaling is mostly concerned with provisioning virtual machines to complete tasks within certain constraints. The constraints can be budget, time, energy consumption or any combination of the three. Work by [30] was concerned with budget and time constraints. Their main observation was that the billing period must be taken into account when provisioning virtual machines. For example, if a cloud provider bills based on hourly usage of a virtual machine, it is inefficient to shut it down before the hour has finished, as the customer has already paid for it. Even worse, if the machine is shut down before the hour's end and an increase in workload then requires a new machine to start, the cloud user pays for the hour's work twice. According to their findings, it is far better to leave the machine running until the hour expires and shut it down then, even if there is not enough workload for that machine. [30] also argue that VM start-up time has to be taken into account when developing a VM provisioning model, as well as the types of VM available for rent, since some can be optimised towards faster computation while others towards larger memory.
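The billing-period observation from [30] can be illustrated with a short sketch (the function name and sampling parameters are illustrative assumptions, not part of any cited work): a VM becomes a candidate for shutdown only when it is idle and its paid-for billing period is nearly over.

```python
def should_terminate(uptime_minutes, load_is_low, billing_period=60, margin=5):
    """Terminate a VM only when it is idle AND close to the end of a
    billing period, since the remainder of the period is already paid for.

    uptime_minutes: how long the VM has been running
    load_is_low:    True when the VM's work could be absorbed elsewhere
    """
    minutes_into_period = uptime_minutes % billing_period
    near_period_end = minutes_into_period >= billing_period - margin
    return load_is_low and near_period_end

# An idle VM 55 minutes into its hour may be shut down,
# but an idle VM only 20 minutes in should be kept running.
print(should_terminate(55, True))   # True
print(should_terminate(20, True))   # False
```

A rule like this avoids the double-payment scenario described above: a machine kept until the end of its paid hour can still absorb a sudden workload increase at no extra cost.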
Auto-scaling is a particularly interesting topic in the case of web applications, as they have truly variable workloads (visitor traffic). Web applications also have certain architectural constraints: a typical web application has one HTTP server, one or more application servers and one relational database management system. Work done in [31] was concerned with scaling a web application. They chose the number of sessions in an application container as a performance metric and used it in deciding whether more application servers should be added to the infrastructure or underutilised servers should be removed. I think that the chosen metric is not an accurate indicator of a server's performance. First, it does not correlate with the workload the CPU is experiencing: a user who is not logged in might cause

a higher workload than a logged-in one. Secondly, sessions are persisted on the server between requests and remain there until they expire, depending on the server's configuration, so it is possible for a server to have a high number of active sessions but a small workload because the users who logged in have stopped using the application while their sessions remain on the server. Another metric has to be chosen to evaluate a server's performance more accurately.

Exploiting energy costs to maximise profits was discussed by [32]. They tried to distribute the load amongst datacentres which are charged different prices for electricity. The prices can vary due to differences in time zones (electricity can be cheaper at night than during the day). Their results showed that models which take into account energy consumption and its price can result in significant savings.

Most of the authors mentioned in their reports that their results would have been even better had they had a model to predict future load. Research done by [33] tried to address this problem by building two models to predict the future load: a trained artificial neural network and a linear regression model. They found that the artificial neural network was better at predicting the workload. In my project I will also use neural networks to predict the load, but instead of predicting the CPU utilisation of the servers in the cluster, I will train the network to predict the number of visitors.

2.8 Summary

This chapter gave an introduction to cloud computing: its history and major developments, an introduction to virtualisation, the types of virtualisation and how it is used in cloud computing, why energy efficiency is important, what cloud auto-scaling is and what research has been done in that field. This chapter marks the completion of the first milestone of this project. The next chapter gives an overview of the proposed solution's design and the testing environment.

3. Design and testing environment

3.1 Introduction

Scaling an application on the cloud requires special considerations: not every application can be scaled, and some architectures work better than others. This chapter presents an overview of the auto-scaling application's design, the architecture of the web application that will be scaled, and information about the testing environment.

3.2 Design of auto-scaling application

The tasks of the auto-scaling application are VM provisioning, performance monitoring and modelling future workload, so it makes sense to separate these tasks into separate modules to achieve loose coupling between software components. The overall design of the system is shown in Figure 1.

Figure 1: Design of the auto-scaling solution

In total there are going to be four modules:

- Monitoring Module. This module will monitor the web application's response time, collect various performance metrics (such as memory consumption, CPU usage, etc.) from each server in the cluster, and record the total number of requests to the web server.
- Predictive Model. A supervised machine learning model which, given the application's past load, will predict its future load. The application will load a model which has already been trained; the model will not be retrained whilst the application is running. An Artificial Neural Network will be used to implement this model.
- Provisioning Module. A client which will communicate with the cloud manager to add or remove servers from the virtual cluster. It will also be responsible for starting the required processes on newly instantiated VMs and reconfiguring the cluster to use the new VM.
- Brain Module. This module will be the application's main entry point: it will instantiate the other modules and communicate with them to decide whether VMs should be added to or removed from the cluster.

3.3 Architecture of the web application

The most common way to scale a web application is to have one HTTP load balancer and multiple application servers. The role of the load balancer is to accept an incoming HTTP request and forward it to one of the application servers. The application server then processes the request and replies to the load balancer, which forwards the response back to the client. The client is unaware of which application server processed its request or how many application servers there are in the cluster. This architecture allows servers to be added to or removed from the cluster whilst leaving the other servers untouched. It also achieves higher availability: if one of the application servers goes down for any reason (software or hardware failure, maintenance), the other application servers in the cluster continue to process requests. Figure 2 shows this architecture.

Figure 2: Architecture of a load balancer with multiple application servers

3.4 Testing environment

All experiments are going to be run on a cloud testbed deployed by Leeds University's School of Computing. The cloud is managed by the OpenNebula [34] cloud infrastructure manager and is deployed on 7 physical machines.

3.5 Summary

In this chapter the design of the auto-scaling solution and the architecture of the web application's runtime environment were introduced. The auto-scaling solution will consist of three independent modules (Provisioning, Monitoring and Prediction) communicating with a main module (Brain) responsible for making decisions regarding the size of the application server cluster. The following chapters present a detailed overview of each module and its implementation.

4. Predictive model

4.1 Introduction

The relationship between a cloud provider and its customer is defined in a service-level agreement (SLA), in which the provider defines the quality-of-service parameters under which the service is delivered [2]. These parameters can be elasticity, throughput, response time and others [35]. The SLA also defines the penalties imposed if the agreement is breached: if the agreement is breached multiple times, penalties can range from a rebate of fees to an early termination of the contract [36] [37]. It is therefore in the cloud provider's best interest to comply with the agreement to maximise profits. On the other hand, cloud providers have to allocate resources to their customers carefully, as over-provisioned resources will be used inefficiently and will result in increased operating costs. Finding the middle ground between complying with the SLA and efficient resource allocation yields the highest profits.

When implementing these agreements, cloud providers might allocate more resources to the customer when a quality-of-service parameter goes over a certain threshold. For example: if a web service's response time is longer than 1 second for 10 minutes, add another application server to the cluster. While this may work in certain cases, there are drawbacks. First, during the 10-minute period the application would have an unacceptable response time, which could result in visitor dissatisfaction. Secondly, due to the fluctuating nature of web traffic, the application might experience slow responses for 9 minutes, then an acceptable response time for 1 minute (under the 1-second threshold), and then another 9 minutes of slow responses. This simple yet very likely scenario would not violate the SLA, but would be unacceptable in practice. Such reactive rules may not always work well, as they can leave the application performing unacceptably.
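The threshold rule described above (scale up if the response time exceeds 1 second for 10 minutes) can be expressed as the following sketch; the function name and the once-per-minute sampling interval are illustrative assumptions.

```python
def needs_extra_server(response_times, threshold=1.0, breach_duration=10):
    """Reactive rule: returns True only if the response time (sampled once
    per minute) has stayed above the threshold for the whole breach
    duration. A single fast sample resets the countdown, which is exactly
    the loophole discussed above."""
    recent = response_times[-breach_duration:]
    return len(recent) == breach_duration and all(t > threshold for t in recent)

# Nine slow minutes, one fast minute, nine slow minutes: never triggers,
# even though users saw 18 minutes of slow responses.
samples = [1.5] * 9 + [0.8] + [1.5] * 9
print(needs_extra_server(samples))  # False
```

The sketch makes the weakness concrete: a brief dip below the threshold prevents the rule from ever firing, so the SLA is technically honoured while the application remains slow in practice.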
However, if the cloud provider could predict the changes in the application's load, resources could be allocated and deallocated without these negative effects on the application's response time.

4.2 Supervised machine learning

Supervised machine learning can be used to solve this problem. In this approach, a model is given data with inputs and expected outputs. The model then tries to find the relationship between input and output so that, given an unseen set of inputs, the correct output, or a good enough approximation of it, is produced [38]. Predicting web traffic falls under time series prediction. A time series is a collection of observations over equally spaced time periods; examples are the closing price of a stock index, daily temperature or GDP [39]. One technique often used to train the model is called the sliding window technique [40]: the input is n sequential observations and the (n + 1)th observation is the output. I will be using this technique to train my model.
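The sliding window technique can be sketched as follows: each window of n consecutive observations forms one training input, and the observation immediately after it is the target output. The helper name and the sample traffic values are illustrative assumptions.

```python
def sliding_window(series, n):
    """Turn a time series into (input window, target) training pairs."""
    pairs = []
    for i in range(len(series) - n):
        window = series[i:i + n]   # n sequential observations (the input)
        target = series[i + n]     # the (n + 1)th observation (the output)
        pairs.append((window, target))
    return pairs

traffic = [120, 150, 170, 160, 180, 210]   # e.g. requests per minute
print(sliding_window(traffic, 3))
# [([120, 150, 170], 160), ([150, 170, 160], 180), ([170, 160, 180], 210)]
```

Each pair is one training example for the model: given the last three observations, predict the next one.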

4.3 Artificial Neural Networks

4.3.1 Introduction
An Artificial Neural Network (ANN) is a model which tries to simulate the brain's neural network. The reason why ANNs are appealing to researchers of Artificial Intelligence is their fault tolerance, flexibility and parallelism [41]. Just as a biological neural network, an ANN consists of neurons which are connected with one another. Primarily, there are two types of networks: recurrent and non-recurrent. Non-recurrent networks do not permit self-connections or loops, whereas recurrent networks allow them. Each connection between two neurons has an associated value indicating the strength of the connection. These values are called weights: the larger the weight, the stronger the connection between the two neurons. Each neuron in the network receives input from the neurons attached to it and sends its output to all the neurons it is connected to. The output of a neuron is defined as follows:

o = f(Σ_{i=1}^{N} w_i · x_i − θ)

where o is the calculated output, w_i is the weight of the connection from neuron i, x_i is the output of neuron i and θ is the bias. The function f is called a transfer function. The transfer function is needed to put the output within a certain range. For example, the sigmoid function limits the output to the range [0, 1], the hyperbolic tangent function limits it to [-1, 1] and the step function assigns either 0 or 1.

4.3.2 Training the network
Training the network means changing the weights so that the output produced by the network is correct [41]. Most training algorithms start by initialising the weights to random values. Then a pattern is fed into the network and the output is observed. If the output is not correct, the weights are adjusted to minimise the error. The procedure is repeated until the weights converge. The resulting network has to be tested on unseen data to see how well it behaves.

4.3.3 Architecture
There are several architectures for ANNs. The main difference between them is the way nodes in the hidden and output layers are connected. Different architectures are better suited for different tasks: non-recurrent networks are better suited for classification, whereas recurrent networks are more suited for time series prediction as they have memory [42].
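The neuron output formula from 4.3.1 can be computed in a few lines. The sketch below uses illustrative names and picks the sigmoid as the transfer function:

```java
public class Neuron {
    // Sigmoid transfer function, limiting the output to the range [0, 1].
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // o = f(sum_{i=1..N} w_i * x_i - theta), where theta is the bias.
    public static double output(double[] weights, double[] inputs, double theta) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sigmoid(sum - theta);
    }
}
```

With all weights at zero and no bias, the output sits at the sigmoid's midpoint of 0.5, which is why training starts from random rather than zero weights in practice.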

Multilayer Perceptron Network
The multilayer perceptron network is a non-recurrent network which consists of one input layer, one output layer and zero or more hidden layers (Figure 3). Each node in the input layer is connected to every node in the first hidden layer (or the output layer if there aren't any hidden layers). Each node in a hidden layer is connected to every node in the next hidden layer (or the output layer if there is only one hidden layer) [40].

Figure 3: Multilayer perceptron network with 1 hidden layer (input, hidden and output layers)

Elman Network
The Elman network (Figure 4) is a recurrent network. It has one input layer, one hidden layer and one output layer, just like the multilayer perceptron network. However, it has additional nodes in the input layer. These nodes provide activation feedback: they feed the output of the hidden layer from the previous iteration back into the hidden layer [43].

Jordan Network
The Jordan network (Figure 5) is similar to the Elman network, except that the recurrent nodes provide output feedback: the additional nodes feed the output of the output nodes from the previous iteration to the hidden layer [43].

Figure 4: Elman network
Figure 5: Jordan network

4.3.4 Data
The data I will be using to train the neural network is NASA's web access log from 1 July 1995 to 31 July 1995, provided by [44]. Although the data is old, it is still relevant today, as browsing patterns should not change much over time. A summary of the log file was created which contains the number of requests in each 15-minute timeframe. A sample of the log file used in training the network is shown in Figure 6 and a sample from the summary file is shown in Figure 7.

Figure 6: Sample of NASA's access log (GET request lines in Common Log Format: client hostname, timestamp and requested path)

Figure 7: Sample of the summary file generated from the log file (each row holds six consecutive 15-minute request counts, forming one sliding-window pattern)

4.3.5 Development
I chose to use a sliding window of size 5 for the training. To develop the best model to predict traffic changes I had to decide on the following parameters:
Architecture. I had to decide if I was going to use the Elman or the Jordan architecture. Recurrent networks are better suited for time series prediction, so the multilayer perceptron network was not considered.
Number of nodes. The number of nodes in the hidden layer.
Training data size. The number of patterns from the data set to use in training the network.
Data format. Should the data be represented as the number of hits in a timeframe, or as the change between two consecutive timeframes?

To build the network I used the Encog framework [45]. To find the best parameters for the network, I wrote a program which tested various combinations of them. The program generated Jordan and Elman networks with 1 to 15 nodes in the hidden layer, using 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% and 50% of the data for training, and tested whether the data should be represented as the actual number of hits or as the change between two consecutive timeframes. The program trained each network until the weights converged (meaning that the error did not change, or changed insignificantly, over 5000 iterations) or a maximum number of training iterations had been made. The results showed that the lowest error on unseen data was achieved by a Jordan network with 9 nodes in the hidden layer, using 50% of the data for training and representing the data as the actual number of hits. This network configuration will be used in developing the scaling application. Appendix D presents the results obtained from the simulations.

4.4 Other types of predictive models
Apart from artificial neural networks, there are many other predictive models that could have been implemented instead. One of them is multiple linear regression [46]. The formula for multiple linear regression is y = β0 + β1·x1 + β2·x2 + … + βn·xn + u. Here y is the value being modelled (the dependent variable), the x variables are the independent variables and u is a disturbance term (error). In order to use this type of model, the coefficients β have to be found. One way to do that is to use the Ordinary Least Squares method [46]. The main idea behind this method is to minimise Σ(y − ŷ)² over all observations, where y is the value from an observation and ŷ is the value predicted by the model.

4.5 The Slashdot effect
The Slashdot effect [47] refers to a sudden spike in traffic. The term originates from the news website Slashdot, which posts links to news articles submitted by its users.
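To make the Ordinary Least Squares idea from section 4.4 concrete, the sketch below fits the single-predictor case y = β0 + β1·x + u in closed form; the multiple-predictor case requires matrix algebra, and the class and method names here are illustrative:

```java
public class Ols {
    // Returns {beta0, beta1} minimising sum (y - (b0 + b1*x))^2 using
    // the closed-form normal equations for one predictor:
    //   b1 = cov(x, y) / var(x),  b0 = mean(y) - b1 * mean(x)
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double b1 = sxy / sxx;
        return new double[]{ meanY - b1 * meanX, b1 };
    }
}
```

On exactly linear data the residuals vanish and the fitted coefficients recover the generating line.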
When an article is featured on Slashdot's front page and the site which posted the news item goes down due to heavy network traffic, it is often said that the site was "slashdotted". This type of traffic is impossible to predict as it is essentially random: the users of Slashdot decide which website is featured on the front page. Because of this randomness, a predictive model cannot be expected to predict these traffic surges.

4.6 Summary
The second milestone of this project was successfully achieved: a predictive model was developed using an artificial neural network, after a series of experiments determined the most efficient network configuration. The next chapter looks at developing tools for testing the auto-scaling solution.

5. Load testing tool

5.1 Requirements
As mentioned in the Introduction of this report, one of this project's objectives is to deliver a tool which helps to test the elasticity of the final application. The testing tool should be able to do the following:
Request throttling. The tool should implement some kind of request throttling, i.e. it should not send requests one after another, but send a fixed number of requests per second.
Resend logged requests. The tool should be able to resend requests logged in Apache's access log. Time intervals between the requests must be preserved.
Command line interface (CLI). The tool should be CLI based so it could be used in the testing environment or integrated with other command line tools should this be required in the future.

5.2 Research
Before developing my own tool, I investigated whether there were any existing tools that I could use.

ApacheBench
ApacheBench (AB) [48] is one of the most popular tools available for web server benchmarking. It comes together with the Apache Web Server. AB offers a CLI with a number of runtime configurations, such as the number of requests to send, the concurrency level, support for a proxy, etc. However, it can neither resend requests from an access log nor throttle the request throughput.

Httperf
Httperf is a web server benchmarking tool from HP Labs [49]. It offers similar features to AB and, in addition, can limit the number of requests per second. However, the request throttling is not flexible: the throughput rate cannot change over time. While this would allow testing whether the cluster can scale up, it would not cover cases where the cluster has to scale down.

JMeter
Apache JMeter [50] is another web benchmarking tool from the Apache Software Foundation. JMeter can resend requests from an Apache log and offers a CLI, but it is not capable of request throttling.

5.3 Development
Since none of the tools I investigated offered all the features I require, I had to write my own. I wrote the tool in Java. It offers a CLI, can read requests from Apache's access log and preserves the time between subsequent requests. Since it is built in Java, not only can it be run from the command line, it can also be used in other Java applications (such as integration tests). To build the CLI I used the Commons CLI library [51]. This tool achieves two of the goals I previously set: it offers a CLI and can resend requests from Apache's access log, preserving the time between requests. To achieve the remaining goal, request throttling, I wrote another application which reads an XML file and generates a log file with requests based on the XML. The generated log file can later be used to resend the requests. For example, to define a workload where 10 requests per second are sent to the server for the first 10 seconds, and then 20 requests per second for the next 10 seconds, the following XML would be written:

<Jobs>
  <Job>
    <Duration>10</Duration>
    <Throughput>10</Throughput>
    <RequestPath>/</RequestPath>
  </Job>
  <Job>
    <Duration>10</Duration>
    <Throughput>20</Throughput>
    <RequestPath>/</RequestPath>
  </Job>
</Jobs>

Although this might seem a hackish way to achieve the goals, the two applications I wrote are independent of each other and can be used separately.

5.4 Summary
The investigated tools did not meet the requirements, therefore a custom tool meeting the defined requirements was developed. A table summarising the capabilities of the custom tool and the investigated tools is presented in Table 2. The developed tools meet all of the defined criteria, which means that Milestone 3 of this project was achieved.

                        ApacheBench  Httperf  JMeter  Custom
Request throttling      No           Partial  No      Yes
Resend logged requests  No           Yes      Yes     Yes
Command line interface  Yes          Yes      Yes     Yes

Table 2: Summary of load testing tools' features
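The XML workload format from 5.3 could be expanded into a per-request schedule along these lines. This is a sketch with assumed names; the report does not show the real tool's internals, and the sketch ignores the RequestPath element for brevity:

```java
import java.util.ArrayList;
import java.util.List;

public class Workload {
    // One <Job> element: run for `duration` seconds at `throughput` requests/second.
    public record Job(int duration, int throughput) {}

    // Expands jobs into request send times (milliseconds from start),
    // spacing each second's requests evenly to implement throttling.
    public static List<Long> schedule(List<Job> jobs) {
        List<Long> times = new ArrayList<>();
        long offset = 0;
        for (Job job : jobs) {
            long gap = 1000L / job.throughput(); // ms between two requests
            for (int s = 0; s < job.duration(); s++) {
                for (int r = 0; r < job.throughput(); r++) {
                    times.add(offset + s * 1000L + r * gap);
                }
            }
            offset += job.duration() * 1000L;
        }
        return times;
    }
}
```

For the two-job example from 5.3 this yields 300 request timestamps, with the second job's first request scheduled exactly 10 seconds in.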

6. Provisioning module

6.1 Introduction
The provisioning module is one of the key components of the auto-scaling solution. Its job is to communicate with the cloud manager to allocate or deallocate VMs. The provisioning module has to be capable of different VM allocation strategies.

6.2 Image of VM
Before the development work can begin, a VM image has to be created with the necessary software installed. Originally I used Ubuntu [52], as it is one of the most popular Linux distributions with huge resources and a large community [53], but the image provided by OpenNebula's marketplace could not configure its network settings when the VM was instantiated, therefore I decided to use a different operating system. Switching to Debian Squeeze [54] solved the problem: it managed to configure its network settings without any issues. This is rather strange, as Ubuntu is a Debian-based operating system and there should not be any reason why one would work and the other would not. The software installed onto the image was the Java 6 Runtime Environment, the Java 6 Development Kit [55], Apache HTTP Server 2.2 [56], Apache Tomcat 7 [57] and collectd [58]. I could have had two different images, one with the HTTP server installed and the other with Tomcat installed, but for simplicity I only have one image with both installed, and the decision of which service to start is taken by the Provisioning Module.

6.3 Web application
For testing purposes I wrote a simple web application which multiplies two matrices. This is both memory and CPU intensive, as it requires O(n^2) memory and O(n^3) operations. The input parameter n is passed as a request parameter to the application, which then creates two n-by-n matrices and multiplies them. After writing the application I deployed it to the VM image.

6.4 Load balancing
The architecture of the web application is going to have multiple Tomcat instances behind an Apache HTTP server, which is going to act as a load balancer.
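The matrix-multiplication workload described in 6.3 boils down to a triple loop. The actual servlet code is not reproduced in the report; this sketch shows only the computational core, with n taken from the request parameter in the real application:

```java
public class MatrixWorkload {
    // Multiplies two n-by-n matrices: O(n^2) memory for the operands and
    // the result, and O(n^3) multiply-add operations.
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)      // k-outer order improves cache locality
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }
}
```

Because both runtime and memory grow polynomially with n, a single request parameter gives fine-grained control over how hard each request loads the server.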
In practice, this is a fairly common setup; however, administrators have to take one additional thing into consideration: session replication across the Tomcat instances, as each user's requests may be handled by different Tomcat instances due to load balancing. In my case, the web application is not stateful, which makes the setup slightly easier as I do not need to worry about session replication. The HTTP server has the mod_jk module installed to handle the communication between the HTTP server and the Tomcat instances using the AJP protocol [59]. The mod_jk module implements a weighted round-robin algorithm to distribute the workload, meaning that some instances could receive a higher workload than others. In my setup all Tomcat instances are weighted equally. See Figure 8 for an overview.

Figure 8: Diagram of the HTTP load balancer, worker instances and their weights

The Provisioning Module is responsible for the following tasks:
Starting one Tomcat instance and the HTTP server. When the Provisioning Module is instantiated it should start one Tomcat instance and one HTTP server instance, and configure the HTTP server to forward all incoming requests to the Tomcat instance.
Increasing / decreasing the number of Tomcat instances. If the Provisioning Module is asked to allocate more resources, it should start a new VM instance from the prepared image, start Tomcat on the new instance and reconfigure the HTTP server so that it knows about the new server and starts forwarding some of the requests to it. The Provisioning Module must also be able to stop a VM running Tomcat, if asked to do so, and reconfigure the HTTP server to stop forwarding requests to the stopped instance.

6.5 Allocation strategies
As mentioned before, there are several different strategies implemented by the Provisioning Module. How well these strategies behave in terms of energy saving will be compared in the evaluation phase. The strategies I propose are:
Greedy. The greedy strategy allocates as many VMs as possible on a single host before starting to allocate VMs on a different host. If a VM has to be deallocated, it chooses the one running on the least desirable host. The aim of this strategy is to prioritise certain hosts over others, as they could be more powerful or efficient.
Random. Allocate a VM on a randomly selected host from all available hosts. Deallocate a random VM from all running VMs.

6.6 Implementation
Before the Provisioning Module starts, it reads a Java properties file where information about the hosts' efficiency is stored. Each host is assigned a real number which describes its energy consumption: a larger coefficient means higher energy consumption and a lower coefficient means lower energy consumption. These coefficients are only used by the Greedy strategy: it tries to allocate VMs on hosts with the lowest energy usage coefficients and deallocate VMs running on hosts with the largest coefficients. The Random strategy disregards these coefficients. To make sure that the strategies work correctly I added unit tests which cover basic cases as well as some edge cases. Once the information about the hosts is read and the module is instructed to start, the module allocates two VMs: one for running Tomcat and the other for running Apache. Setting up the Tomcat instance is very straightforward: the module simply connects to the VM via Secure Shell (SSH) (the JSch [60] library was used for this) and starts Tomcat.
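The greedy selection from 6.5 can be sketched as a single stream pipeline, assuming each host's energy coefficient and remaining VM capacity are known; the names here are illustrative, not the project's actual classes:

```java
import java.util.Map;
import java.util.Optional;

public class GreedyStrategy {
    // Picks the host with the lowest energy coefficient that still has a
    // free slot for one more VM; empty if no host has capacity left.
    public static Optional<String> pickHost(Map<String, Double> energyCoefficient,
                                            Map<String, Integer> freeSlots) {
        return energyCoefficient.entrySet().stream()
                .filter(e -> freeSlots.getOrDefault(e.getKey(), 0) > 0)
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

Because the cheapest host keeps being chosen until it is full, VMs naturally pack onto the most energy-efficient hosts first, which is exactly the greedy behaviour described above; deallocation would use the symmetric max over hosts running VMs.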
Setting up the Apache instance is more involved: the Provisioning Module connects to the VM via SSH and creates the workers.properties file required by the mod_jk module, which tells mod_jk about the available Tomcat instances: their IP addresses, listening ports, communication protocol and work share. Once the configuration file is created, the Apache server can be started, after which the deployed web application can accept incoming HTTP requests, which get routed to the running Tomcat instance. In a real-world setup, the administrator might want to start with more than one Tomcat instance rather than wait for additional instances to be allocated.
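Generating the workers.properties content could look like the sketch below. The directive names (worker.list, type=ajp13, type=lb, balance_workers, lbfactor) are standard mod_jk configuration, but the worker names, the AJP port 8009 and the equal weights are assumptions for illustration, not taken from the project's code:

```java
import java.util.List;

public class WorkersConfig {
    // Builds a workers.properties file for mod_jk: one AJP worker per
    // Tomcat host plus a load-balancer worker distributing over them.
    public static String generate(List<String> tomcatHosts) {
        StringBuilder sb = new StringBuilder("worker.list=balancer\n");
        StringBuilder members = new StringBuilder();
        for (int i = 0; i < tomcatHosts.size(); i++) {
            String name = "tomcat" + i;
            sb.append("worker.").append(name).append(".type=ajp13\n");
            sb.append("worker.").append(name).append(".host=").append(tomcatHosts.get(i)).append("\n");
            sb.append("worker.").append(name).append(".port=8009\n");
            sb.append("worker.").append(name).append(".lbfactor=1\n"); // equal weights, as in 6.4
            if (i > 0) members.append(",");
            members.append(name);
        }
        sb.append("worker.balancer.type=lb\n");
        sb.append("worker.balancer.balance_workers=").append(members).append("\n");
        return sb.toString();
    }
}
```

Scaling the cluster then reduces to regenerating this file with one host more or fewer and reloading Apache, which matches the process described in the next paragraph.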

If a Tomcat instance is allocated or deallocated as a result of a change in the workload, the process is slightly different: the Provisioning Module has to SSH into the VM running the Apache server, update the workers.properties file to reflect the changes and reload the Apache server. Reloading is preferred over restarting the server, as reloading does not cause downtime [61]. Before an allocation strategy can decide where to deploy a new VM, the Provisioning Module has to gather information from all available hosts about resources allocated by other users, i.e. allocated memory, CPU time etc., and decide if there are enough resources for one additional VM. Calculating available CPU is rather easy: the module just has to sum all allocated CPU time shares. Calculating available memory is slightly harder: not only does memory allocated for VMs have to be taken into account, but also memory used by Xen's hypervisor, the control domain [62] and by the VM itself. Therefore the Provisioning Module reads how much memory is reserved by Xen from a configuration file, as this value can change from one setup to another, and parses the VM's template to get the amount of memory used by a VM.

6.7 Summary
A second module of the auto-scaling solution was developed. The Provisioning Module is going to allocate and deallocate VMs on the cloud, start the required processes on them and reconfigure the cluster. The Provisioning Module is a layer which abstracts the details of communicating with the cloud infrastructure manager. It will be invoked by the Brain Module whenever the Brain Module decides that the cluster's size has to be adjusted. With this, the 4th milestone of this project was achieved.

7. Monitoring module

7.1 Introduction
The Monitoring Module is another key component of the auto-scaling solution. Its job is to monitor the web application's performance and collect performance metrics of the VMs in the virtual cluster. These metrics include CPU utilisation, memory consumption, the web application's response time and the number of requests the web application receives.

7.2 Performance metrics

7.2.1 Passive collection
Performance metrics are collected from the VMs running Tomcat instances. To collect the data of interest, collectd [58] was chosen. It is a daemon which runs in the background and collects statistics. What makes it great is that it has a built-in network module which allows configuring a server-client model where each client sends all the collected data to a server where it is aggregated. This significantly simplifies the auto-scaling application, as it does not have to poll the VMs constantly to collect the data; instead, a push model is implemented where each client sends the data at regular time intervals and the collecting server is unaware of the number of clients. In my solution the VM running Apache collects the data, because this VM is never stopped, as opposed to the VMs running Tomcat, which might be stopped by the application, in which case all the data on that server would be lost. Collectd is started whenever a Tomcat instance is added to the cluster. It collects memory consumption and CPU usage while it runs and sends them to the server every 10 seconds. The server then aggregates the data and stores it in CSV format. Note that the data collected by collectd is not used by the auto-scaling application to decide if the virtual cluster's size needs to be adjusted. The data is collected for later investigation of the application's performance by a developer, to learn how the application behaves under load. This side of the module can be thought of as passive data collection. Originally, it was planned to collect energy consumption as well using powertop [63], but it was unable to give any readings inside the VMs, although it did give power consumption readings when installed on a physical machine. Most likely this is because the underlying library used by powertop, lm-sensors [64], has to communicate with the hardware directly and, because of the virtualised environment, it is unable to obtain any readings.

7.2.2 Active collection
In addition to passive collection, the Monitoring Module also performs active collection of data. Active collection in this context means that the collected performance metrics can trigger the auto-scaling application to adjust the virtual cluster's size, i.e. remove or add servers. The data collected by the active part of the module is the web application's response time and the request count the HTTP load balancer receives. The web application's response time is used to determine the web application's health. If the response time is too high, it means that the current virtual cluster is struggling with the workload and more resources have to be allocated; if the response time is very fast, it means that the resources in the cluster are underutilised and a VM can be shut down. Some cloud providers (like AWS Elastic Beanstalk) let developers configure various thresholds based on which VMs are provisioned, such as CPU usage, network throughput and many others, but in the web application context one performance metric stands out: response time. Users do not want to wait a long time for their requests to be fulfilled, so from their point of view a web application must respond to their requests as fast as possible. Another reason to choose response time as the performance metric to monitor is that it sums up all the other metrics (CPU utilisation, memory usage etc.) in one number: a web application might be memory intensive, CPU intensive or I/O intensive, and setting up thresholds for each metric can be difficult. Moreover, these configurations would be different for every web application. Therefore, measuring response time gives an accurate overview of the health of the entire web application.

7.3 Configuration
Configuring this module requires setting a few parameters.

Response time thresholds
The module requires two threshold values for the response time: a minimum response time and a maximum response time. When a measured response time is lower than the minimum response time, a VM running a Tomcat instance is removed and the HTTP server is reconfigured to stop forwarding requests to that VM. The reasoning behind this is that the VM cluster is dealing with the workload well and some of the resources can be released. Similarly, if a measured response time is higher than the maximum response time threshold, it means that the virtual cluster is having problems with the workload and needs more resources to serve the requests under the maximum threshold.
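The threshold check described above reduces to a three-way decision. A minimal sketch with illustrative names:

```java
public class ScalingDecision {
    public enum Action { SCALE_UP, SCALE_DOWN, HOLD }

    // Compares the averaged response time against the configured minimum
    // and maximum thresholds (all in the same time unit, e.g. milliseconds).
    public static Action decide(double avgResponseTime, double min, double max) {
        if (avgResponseTime > max) return Action.SCALE_UP;   // cluster is struggling
        if (avgResponseTime < min) return Action.SCALE_DOWN; // resources underutilised
        return Action.HOLD;
    }
}
```

Everything between the two thresholds deliberately maps to HOLD; as the next paragraphs explain, that dead band has to be wide enough to absorb the response-time shift caused by adding or removing a single VM.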

One of the difficulties with choosing these two parameters correctly is that the difference between them (maximum response time minus minimum response time) has to allow for some fluctuations. For example, in early experiments I found that if the difference between these two values is too small, a newly allocated VM would cause the response time to fall below the minimum response time threshold, which would then cause a VM to be removed from the cluster. After it is removed, the response time would go above the maximum response time threshold and cause a new VM to be allocated again. These oscillations should be avoided, as they are a very inefficient use of resources.

Monitored URL
The web application's response time is monitored by sending an HTTP request to the web application and measuring the response time. Therefore, the auto-scaling application needs to know where to send the HTTP request. The URL has to be chosen carefully. If the workload associated with the URL is small, the response time may be quick while the application responds slowly to other requests, giving the wrong impression that the application's overall performance is acceptable and additional resources are not needed. Similarly, if the workload behind the chosen URL is very heavy, the response time will be long and additional resources will be provisioned even though they might not be needed.

Response time measurement frequency and moving average's window
In order to reduce noise in the measurements, two additional parameters have to be provided for the Monitoring Module: the response time measurement frequency and the moving average's window. The first parameter specifies how often the Monitoring Module should send HTTP requests to the web application. The other specifies the window size of the simple moving average.
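A simple moving average over a fixed window can be sketched as below; the class is illustrative, not the project's actual code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MovingAverage {
    private final int window;
    private final Deque<Double> samples = new ArrayDeque<>();
    private double sum = 0.0;

    public MovingAverage(int window) {
        this.window = window;
    }

    // Adds a measurement, evicting the oldest one once the window is
    // full, and returns the current simple moving average.
    public double add(double value) {
        samples.addLast(value);
        sum += value;
        if (samples.size() > window) {
            sum -= samples.removeFirst();
        }
        return sum / samples.size();
    }
}
```

Keeping a running sum makes each update O(1) regardless of the window size, which matters little here but is the idiomatic way to compute a sliding average of periodic measurements.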
Rather than using a single measurement to decide if additional resources should be provisioned, the Monitoring Module calculates a simple moving average of the measurements. This smooths the fluctuations in the measurements and reduces the noise. The auto-scaling application uses this moving average in all its decision making.

Request count measurement frequency
The request count measurement frequency parameter tells the Monitoring Module how often it should retrieve the number of HTTP requests received from the HTTP load balancer. This measurement is later used by the Prediction Module, therefore the measurement frequency has to match the parameters the supervised machine learning model was trained on. Since the artificial neural network was trained on data aggregated into 15-minute timeframes, the request count measurement frequency must also be configured to be 15 minutes. Any other measurement frequency would provide unusable data. If for some reason this parameter had to be changed, the artificial neural network would have to be retrained on a correspondingly aggregated data set.

7.4 Implementation
The Monitoring Module is built as an event-driven system. It emits two events: one when the average response time is calculated (Response Time Event) and the other when the number of HTTP requests received by the HTTP load balancer is sampled (Request Count Event). The Monitoring Module periodically measures the response time and the number of requests received using Java's Concurrency API. Once a sample is collected, an event is emitted. The Brain Module (responsible for making scaling decisions) listens for the incoming events and decides if the results provided by the Monitoring Module are within the thresholds. If not, the Brain Module communicates with the Provisioning Module and allocates or deallocates workers from the virtual cluster. To implement this event-based communication, the observer pattern was used, with the Brain Module acting as the observer and the Monitoring Module as the subject (observable). Whilst working on the Monitoring Module, the most difficult part was dispatching events. When designing the module, two choices were present. One approach was to have an infinite loop in the main thread where, during each iteration, the Monitoring Module would check if it needs to sample the response time or the request count, and sample the data. The problem with this approach is that if one task took a very long time, it would delay the execution of another task. For example, under high load the response time can be very long. This could then delay the measurement of the request count received by the HTTP load balancer, and the measured request count would be larger than the actual number of requests received in the defined timeframe.
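The observer-pattern wiring between the Monitoring Module (subject) and the Brain Module (observer) can be sketched as follows; all class and method names here are illustrative, not taken from the project's code:

```java
import java.util.ArrayList;
import java.util.List;

public class Events {
    // The event contract: observers are notified with the averaged measurement.
    public interface ResponseTimeListener {
        void onResponseTime(double avgMillis);
    }

    // Subject: holds the listener list and emits events after each sample.
    public static class MonitoringModule {
        private final List<ResponseTimeListener> listeners = new ArrayList<>();

        public void addListener(ResponseTimeListener l) {
            listeners.add(l);
        }

        public void emitResponseTime(double avgMillis) {
            for (ResponseTimeListener l : listeners) {
                l.onResponseTime(avgMillis);
            }
        }
    }

    // Observer: a real Brain Module would compare the value against the
    // thresholds here and call the Provisioning Module if needed.
    public static class BrainModule implements ResponseTimeListener {
        public double lastSeen = -1;

        public void onResponseTime(double avgMillis) {
            lastSeen = avgMillis;
        }
    }
}
```

The subject never needs to know what its observers do with the events, which is what lets the Brain Module's scaling logic evolve independently of the measurement code.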
This inaccurate measurement would propagate further into the system by being provided to the Predictive Model by the Brain Module, resulting in estimations based on inaccurate data. Another problem with the single-threaded blocking approach arises when a new VM is instantiated: whilst the Provisioning Module is waiting for the VM to boot and is configuring it, no measurements could be taken by the Monitoring Module. Therefore, it was clear that a multithreaded approach had to be implemented. In the multithreaded implementation, I used the java.util.concurrent.ScheduledExecutorService class from Java's Concurrency API to schedule periodic tasks measuring the server's performance. This class allows setting the period between task executions, so if one task takes a long time to complete it does not block the other task. However, the way ScheduledExecutorService is implemented (by java.util.concurrent.ScheduledThreadPoolExecutor), if an execution of a task takes longer than the defined delay, the queued executions are started immediately one after another once the running task completes. To avoid this, the Monitoring Module checks when the last task was run: if the time between the current task's start and the last task's run is greater than the defined delay, the task terminates. Another thing to consider is that the Brain Module is listening for any incoming events and may instantiate new VMs. In early experiments, when I did not consider this case, the Brain Module would be notified by the Monitoring Module about a slow response time and start to instantiate a VM, but before it was instantiated, another run of the response time measurement would be put in the queue and tasks would start to accumulate. To avoid this, the Brain Module stops the response time measurement in the Monitoring Module and reschedules it once the new VM is deployed.

7.5 Summary
The Monitoring Module's responsibility is to report the web application's health to the Brain Module, which uses that information to decide if the cluster's size needs to be adjusted. The Monitoring Module consists of two parts: one which actively monitors the web application's response time and reports it to the Brain Module, and another where collectd, started by the Provisioning Module, collects CPU and memory utilisation metrics which can be analysed later. The successful development of the Monitoring Module marks the completion of Milestone 5.
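The stale-task check from 7.4 can be isolated into a single predicate. The sketch below follows the condition as stated (terminate when the gap since the previous run exceeds the configured delay, i.e. the run is a piled-up catch-up execution); names are illustrative, and a real implementation would likely add some tolerance for scheduling jitter:

```java
public class TaskGuard {
    // Returns true when the task execution should terminate immediately:
    // more than one delay period has passed since the previous run, so
    // this run is a queued catch-up execution rather than a timely one.
    public static boolean shouldSkip(long lastRunStartMillis, long nowMillis, long delayMillis) {
        return nowMillis - lastRunStartMillis > delayMillis;
    }
}
```

The task body would call this at its start and return early when it holds, so that a backlog of queued executions drains without producing a burst of stale measurements.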

8. The Brain Module

8.1 Introduction

The Brain Module is the final piece of the auto-scaling application. Its objectives are to instantiate all the other modules (Prediction, Provisioning and Monitoring) and communicate with them to scale the virtual cluster based on their feedback.

8.2 Implementation

The Brain Module is the entry point of the application. It loads its configuration from a Java properties file whose path is passed as a command line argument; argument parsing is handled by the Apache Commons CLI library [51]. The configuration file includes not only parameters for all the modules but also settings for Apache log4j [65], a logging library for the Java platform. The configuration file includes the following settings:

- OpenNebula credentials: username, password and the URL of OpenNebula's API.
- Provisioning Module settings: the path to a VM template, VM credentials, the paths to collectd's configuration file and executable, the list of hosts managed by OpenNebula, the VM allocation strategy and the amount of memory reserved for Xen.
- Monitoring Module settings: the URL of the web application to monitor, the response time thresholds, and the measurement frequencies for response time and request count.
- Prediction Module settings: the path to the serialised neural network architecture, the window size used in training, parameters describing the data the neural network was trained on, and the number of requests per second one server in the cluster can handle.

The Brain Module first instantiates the Prediction Module, then the Provisioning Module and lastly the Monitoring Module. Once the Monitoring Module is created, it starts listening for emitted Response Time Events and Request Count Events. The Brain Module then enters an infinite loop, waiting for incoming events and responding to them by either adjusting the virtual cluster's size or ignoring them. Pseudocode for the auto-scaling algorithm is given in Figure 9.
if response_time_measurement_received then
    if average_response_time > maximum_response_time then
        allocate_machine()
        clear_response_time_average()
    endif
    if average_response_time < minimum_response_time then
        deallocate_machine()
        clear_response_time_average()
    endif
endif
if request_count_received then
    derived_cluster_size = requests_received / requests_per_second_allowed_on_vm
    if derived_cluster_size > actual_server_size then
        allocate_machine()
        clear_response_time_average()
    else if derived_cluster_size < actual_server_size then
        deallocate_machine()
        clear_response_time_average()
    endif
endif

Figure 9: Pseudocode for the auto-scaling algorithm

8.3 Tests

8.3.1 Without the Prediction Module

Setup

All the components have been built, wired together and tested individually, so it is now possible to test the complete auto-scaling application. The main purpose of these tests is to see whether the virtual cluster is scaled up and down based on the workload, whether the performance metrics gathered by collectd correspond to the workload, and whether the load testing tool built earlier can be used on a larger scale. Since the workload for all of these tests is artificially generated by the load testing tool, there is no reason to use the Predictive Model, as there are no patterns in the workload that it could exploit; the Predictive Model is therefore disabled for all of the following tests. All VMs had one virtual CPU core and were allocated 0.5 CPU time and 256 MB of memory.

Test 1

The first test case checks whether the auto-scaling solution is capable of scaling the Tomcat cluster up and down based on changes in the workload. The application's parameters and the workload are defined in Table 3 and Table 4 respectively. In early experiments I found that one VM was capable of handling around 35 requests per second before failing to respond within 1 second, so I would expect the cluster to scale to two servers, then to three servers, and then back down to two servers.

Parameter                                Value
Monitored URL                            /DemoWebsite/performance?n=200
Interval of response time measurement    120 seconds
Moving average's window size             5 measurements
Maximum response time                    1000 milliseconds
Minimum response time                    400 milliseconds

Table 3: Application's parameters for Test Case 1

Duration      Throughput    URL
20 minutes    70            /DemoWebsite/performance?n=
   minutes    103           /DemoWebsite/performance?n=
   minutes    70            /DemoWebsite/performance?n=200

Table 4: Workload for Test Case 1

Figure 10 shows that this prediction almost holds: for a brief moment there are 4 VMs running, but the cluster quickly returns to the predicted 3 VMs.

Figure 10: VMs over time in Test Case 1
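The decision rules of Figure 9, together with the moving average of response times they rely on, could be rendered in Java roughly as follows. The class and method names are invented for illustration, and rounding the derived cluster size up is a choice of this sketch (the pseudocode's division is unspecified); the real Brain Module reacts to events emitted by the Monitoring Module rather than being called directly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the auto-scaling decisions: +1 means allocate a VM, -1 means
// deallocate one, 0 means do nothing. Thresholds and the window size come
// from the application's configuration (e.g. 400 ms / 1000 ms / 5 samples).
public class ScalingPolicy {
    private final double minResponseMs;
    private final double maxResponseMs;
    private final int requestsPerSecondPerVm; // ~35 in early experiments
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();

    public ScalingPolicy(double minResponseMs, double maxResponseMs,
                         int requestsPerSecondPerVm, int windowSize) {
        this.minResponseMs = minResponseMs;
        this.maxResponseMs = maxResponseMs;
        this.requestsPerSecondPerVm = requestsPerSecondPerVm;
        this.windowSize = windowSize;
    }

    // Response-time rule: compare the moving average against the thresholds
    // and clear the average whenever a scaling decision is taken.
    public int onResponseTime(double responseMs) {
        if (window.size() == windowSize) {
            window.removeFirst();
        }
        window.addLast(responseMs);
        if (window.size() < windowSize) {
            return 0; // not enough samples yet
        }
        double average = window.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
        if (average > maxResponseMs) { window.clear(); return +1; }
        if (average < minResponseMs) { window.clear(); return -1; }
        return 0;
    }

    // Request-count rule: derive the cluster size the workload needs.
    public int onRequestCount(int requestsPerSecond, int actualClusterSize) {
        int derived = (int) Math.ceil(
                (double) requestsPerSecond / requestsPerSecondPerVm);
        if (derived > actualClusterSize) { window.clear(); return +1; }
        if (derived < actualClusterSize) { window.clear(); return -1; }
        return 0;
    }
}
```

With Test 1's numbers, a sustained 70 requests per second against VMs that handle about 35 each derives a cluster size of two, matching the expected scale-up.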

Test 2

In this test I wanted to show the importance of selecting appropriate response time thresholds. The application's configuration is in Table 5 and the workload is in Table 6.

Parameter                                Value
Monitored URL                            /DemoWebsite/performance?n=
Interval of response time measurement    seconds
Moving average's window size             5 measurements
Minimum response time                    800 milliseconds
Maximum response time                    1000 milliseconds

Table 5: Application's configuration for Test Case 2

Duration      Throughput    URL
70 minutes    69            /DemoWebsite/performance?n=200

Table 6: Workload for Test Case 2

Figure 11: Number of VMs over time in Test Case 2

Figure 11 shows that with this configuration one VM is not enough to deal with the requests (the response time is too long), but with two VMs the workload is too small and one VM is deallocated. This oscillation is a very inefficient use of resources: when there is one VM, the response time is not acceptable (it is greater than the maximum response time, which in a real-world application could correspond to an SLA requirement), and when there are two VMs in the virtual cluster, the resources are not fully utilised (according to the constraints expressed via the response time thresholds). The key point to take from this experiment is that when more resources are allocated to deal with an increased workload, the utilisation of all the other VMs will drop. This drop should be accounted for when running an auto-scaling application, so that it does not trigger resource deallocation.

Test 3

In Test Case 3 I wanted to see how the response time sampling interval and the moving average's window size affect the auto-scaling application's decisions, i.e. if the sampling is done at shorter intervals with a smaller window size, would the cluster's size be different? In the first run the response time was measured every 20 seconds with 4 measurements in the moving average's window (Table 7); in the second run it was measured every 120 seconds with 5 measurements in the window (Table 9). The workloads for the first and second runs are in Table 8 and Table 10 respectively.
Parameter                                Value
Monitored URL                            /DemoWebsite/performance?n=
Interval of response time measurement    20 seconds
Moving average's window size             4 measurements
Minimum response time                    400 milliseconds
Maximum response time                    1000 milliseconds

Table 7: Application's configuration for Test Case 3 Run 1

Duration      Throughput    URL
30 minutes    70            /DemoWebsite/performance?n=200

Table 8: Workload for Test Case 3 Run 1

Parameter                                Value
Monitored URL                            /DemoWebsite/performance?n=
Interval of response time measurement    120 seconds
Moving average's window size             5 measurements
Minimum response time                    400 milliseconds
Maximum response time                    1000 milliseconds

Table 9: Application's configuration for Test Case 3 Run 2

Duration      Throughput    URL
30 minutes    70            /DemoWebsite/performance?n=200

Table 10: Workload for Test Case 3 Run 2

During the first run the auto-scaling application unexpectedly scaled the cluster to three VMs and stayed at that size until the end of the workload, as shown in Figure 12, even though two VMs should have been enough. In the second run, when the application was configured with a longer response time measurement interval and a larger moving average window, the cluster was scaled to two VMs instead (Figure 13). Because the workload was the same (only the duration differed), VMs in the second run had higher CPU utilisation rates than VMs in the first run and slightly higher memory usage: CPU utilisation was around 60% during the first run (Figure 14) and about 90% during the second run (Figure 16), and the memory consumption of the first two VMs was slightly higher in the second run (Figure 15 and Figure 17).

Figure 12: Number of VMs over time in Test Case 3 Run 1

Figure 13: Number of VMs over time in Test Case 3 Run 2

Figure 14: CPU utilisation of VMs in Test Case 3 Run 1

Figure 15: Memory usage of VMs in Test Case 3 Run 1

Figure 16: CPU utilisation of VMs in Test Case 3 Run 2

Figure 17: Memory usage of VMs in Test Case 3 Run 2

Test 4

One of the parameters the auto-scaling application requires to monitor the health of the virtual cluster is the URL to which HTTP requests are sent and whose response time is measured. Due to the nature of web applications, requests to different URLs cause the application server to perform different work: sometimes it might respond with a cached or static response, whereas at other times it might have to perform resource-intensive work which cannot be cached. This test addresses the issue of selecting the monitored URL correctly. I configured the application to monitor a URL corresponding to a big workload (multiplying two 600-by-600 matrices). Unlike in previous tests, no web requests were sent to the web server.

Parameter                                Value
Monitored URL                            /DemoWebsite/performance?n=600
Interval of response time measurement    seconds
Moving average's window size             5 measurements
Minimum response time                    400 milliseconds
Maximum response time                    1000 milliseconds

Table 11: Application's configuration for Test Case 4

Duration      Throughput    URL
(none)

Table 12: Workload for Test Case 4

The auto-scaling application was stopped after it allocated 4 VMs (Figure 18); it was clear that it would keep allocating VMs until all the resources on the cloud were used up. The CPU utilisation graph in Figure 19 is particularly interesting in this test: it is possible to see when each HTTP request was sent to the server simply by looking at the spikes, and even which VM in the cluster served the request. Memory usage (Figure 20) shows a similar pattern: with every request a VM serves, its memory usage increases sharply. To avoid the problem shown in this test, one would have to either change the URL used for monitoring the response time or increase the maximum response time threshold.

Figure 18: Number of VMs over time in Test Case 4

Figure 19: CPU utilisation of VMs in Test Case 4

Figure 20: Memory usage of VMs in Test Case 4

8.3.2 With the Prediction Module

The previous section confirmed that all of the components (except the Prediction Module) were wired together correctly and that the virtual cluster is scaled up and down correctly in response to the workload, so it was now time to test the auto-scaling application with the Prediction Module enabled, using real-world data. Originally I planned to use the Wikipedia access trace provided by [66], but due to the number of requests in the trace my tool could not send requests as fast as it was reading the log file (the tool became a bottleneck), so I used an access trace from the web server of ClarkNet, a former internet service provider [44]. Because the requests were sometimes sparse, they were replayed twice as fast, e.g. requests received in a 3-hour timeframe were resent in 1.5 hours. Because of this speed-up I also halved the window size in the Prediction Module from 15 minutes to 7.5 minutes.

To calculate the power consumption I will use the exponential power model proposed by [67]. In their work the researchers show that it is possible to estimate power consumption from the CPU's utilisation rate together with the CPU's average idle and peak power consumption. The formula is given in Equation 1, where P is the power consumption, D the average power consumption when idle, M the average power consumption under heavy load and U the CPU's utilisation:

P = D + (M - D) * U^0.5

Equation 1

This formula will be used to calculate the total power consumed by all VMs. Obviously, the estimate produced by this formula will not be exact in the context of virtualisation: each VM is allocated only a slice of the physical CPU, so 100% CPU utilisation on a VM might not correspond to 100% utilisation of the physical CPU. For the purposes of comparison, however, this assumption is acceptable.
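Equation 1 translates directly into code. The sketch below, with invented names, also shows how per-sample estimates could be integrated into a kilowatt-hour total from collectd's utilisation samples; the wattages used in any example call are placeholders, not the D and M values taken from [68].

```java
// Power estimate from Equation 1: P = D + (M - D) * U^0.5, where D is the
// idle power draw in watts, M the power draw under heavy load, and U the
// CPU utilisation as a fraction in [0, 1].
public final class PowerModel {
    private PowerModel() {}

    public static double estimateWatts(double idleWatts, double maxWatts,
                                       double utilisation) {
        if (utilisation < 0.0 || utilisation > 1.0) {
            throw new IllegalArgumentException("utilisation must be in [0, 1]");
        }
        return idleWatts + (maxWatts - idleWatts) * Math.sqrt(utilisation);
    }

    // Integrate a series of utilisation samples, taken at a fixed interval,
    // into a total energy figure in kilowatt-hours.
    public static double totalKwh(double idleWatts, double maxWatts,
                                  double[] utilisationSamples,
                                  double sampleIntervalSeconds) {
        double wattSeconds = 0.0;
        for (double u : utilisationSamples) {
            wattSeconds += estimateWatts(idleWatts, maxWatts, u)
                    * sampleIntervalSeconds;
        }
        return wattSeconds / 3_600_000.0; // watt-seconds -> kilowatt-hours
    }
}
```

Summing `totalKwh` over every VM in the cluster gives the total consumption figure compared in the tests that follow.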
Also, during the tests all VMs were allocated on physical hosts with the same CPU model (Intel Xeon X3360), and all VMs were given the same CPU share. The values for parameters D and M are those provided by [68]. To measure how the Prediction Module performs, I ran six tests: three 3-hour slices from ClarkNet's access trace were each replayed twice, once with the Prediction Module enabled and once with it disabled. Every request in the access log was substituted with /DemoWebsite/performance?n=330, which corresponds to a workload equivalent to multiplying two 330-by-330 matrices. The results from the tests are presented in Table 13.

                                                 Data Set 1    Data Set 2    Data Set 3
With Prediction module
  Average response time (ms)
  Total power consumed (kWh)
Without Prediction module
  Average response time (ms)
  Total power consumed (kWh)
Percentage change with Prediction module enabled
  Average response time
  Total power consumed                               +5.2%         +0.4%        +20.5%

Table 13: Results from testing the Prediction module

The results from the experiments show that the Prediction Module improved the average response time significantly, although it led to higher power consumption.

8.4 Summary

In this final implementation chapter the 6th Milestone of this project was achieved: all the components developed in the previous chapters were wired together into working software. The auto-scaling solution responded to changes in the workload by adjusting the cluster's size, and it was able to make adequate predictions about future workload and prepare for it accordingly.

9. Evaluation

9.1 Introduction

In this chapter I give an overview of the results from the experiments conducted, evaluate the solution's design and implementation and the choice of methodology, and present possible extensions to the auto-scaling tool which were not implemented in this project due to time and resource constraints.

9.2 Summary of results

Originally it was planned to evaluate how different VM allocation strategies (random vs. greedy) contribute to energy consumption, but due to technical limitations this was impossible, so instead I reflect on how the predictive model influenced energy consumption. Runs with the Prediction module enabled used more energy on all three data sets (+5.2%, +0.4% and +20.5% respectively, as shown in Table 13). However, the average response time improved only in the first and third cases, by 285 ms and 683 ms respectively. The response time did not improve in the second test because there the predictive model triggered one VM to be removed but never triggered VMs to be added, whereas in test cases 1 and 3 it triggered the Brain Module to add VMs to the cluster, improving the average response time but also worsening the power consumption. From this it is clear that average response time and power consumption trade off against each other: higher power consumption results in faster response times, and lower power consumption leads to longer response times. The auto-scaling solution can function without the predictive model, and it will have lower power consumption that way, but the web application becomes less responsive, as additional resources are then provisioned only once all the VMs in the virtual cluster are over capacity.

9.3 Evaluation of the solution

9.3.1 Scalability

Scalability describes how well a system copes as the problem's size increases [69].
The auto-scaling tool should scale very well: even under a high workload it communicates with at most two VMs at any time, namely the load balancer (when checking the response time and getting the request count) or the load balancer plus a newly instantiated worker (when the processes on the worker VM have to be started and the load balancer has to be reconfigured). The only bottleneck in this setup is the load balancer itself, e.g. a workload so heavy that two load balancers would be required.

9.3.2 Modularity and extensibility

Modularity in software engineering refers to code being divided logically into distinct sections, where each section's responsibilities and goals are clear [70]. The auto-scaling solution is made of four modules. Three of them (Prediction, Provisioning and Monitoring) have no direct communication between them, so a change in one module does not affect the others. The main module, the Brain Module, is where all the communication happens, so if there were a need to add modules or change an existing one, only the Brain Module would have to be made aware of the changes.

9.3.3 Practical usage

The developed solution is capable of scaling a virtual cluster in the cloud. It can be used on a cloud shared by multiple users, as it makes no assumptions about whether the hosts are used by others: it simply checks whether a host has enough free resources for another VM to be instantiated. However, a few issues would have to be addressed before the solution could be used in a production environment. First, it does not monitor the health of the VMs in the cluster. If a VM in the cluster gets killed for some reason, the auto-scaling solution will not be aware of it; if the VM's disappearance degrades the web application's performance, a new VM will be allocated, but the auto-scaling tool will believe the cluster is bigger than it actually is. Some form of health-checking and recovery mechanism would therefore have to be added. Secondly, the current version of the tool runs only application containers and a load balancer, but in practice it would also have to manage VMs running persistent storage services.
These issues were not addressed in this project as they would have been out of scope.

9.3.4 Relation to other work

The authors of [31] present a similar auto-scaling solution: they have a VM provisioning and monitoring system and exactly the same web application architecture (an HTTP load balancer with multiple workers behind it), but they chose the number of active sessions as the performance metric. In Chapter 2 I raised the concern that session count might not be the best performance metric, and the researchers did not show how sessions relate to CPU or memory utilisation, whereas the tests conducted in Chapter 8 (The Brain Module) show a clear correlation between response time and CPU utilisation.

Work by [33] showed that artificial neural networks can be used to predict future workload. The work presented in this report can be thought of as a practical implementation of the researchers' approach, confirming their findings.

9.4 Objectives and requirements

9.4.1 Objectives

At the beginning of the project, the following goals were set:

1. Train a model for load prediction.
2. Develop a tool to test how the cluster scales with varying workload.
3. Develop a virtual machine provisioning model.
4. Evaluate energy savings compared to other virtual machine provisioning models.

Objective 1 was achieved successfully: through a number of tests the optimal network configuration was found, and the predictive model was integrated into the auto-scaling solution, providing noticeable improvements in the web application's response time. The results found in this project also show that a predictive model trained on one organisation's workload can be used to predict the workload of another organisation.

The load testing tool was developed as part of Objective 2 and was used in testing the auto-scaling solution. Although it could not handle a large request throughput, it was not designed to do so, and it still proved useful in the early stages of the project and in the project's evaluation.

The VM provisioning model (Objective 3) was developed and integrated into the final solution. Not only could it allocate and deallocate VMs, it also started and configured the required processes, and it offered VM allocation strategies which could either place VMs on physical hosts randomly or prioritise certain hosts over others.

The original Objective 4, comparing different VM allocation strategies in terms of energy savings, could not be achieved due to technical limitations that were unforeseen when the objectives were set at the beginning of the project.
Instead, I compared how the predictive model affected energy consumption and the application's performance.

9.4.2 Minimum requirements

In order to achieve a passing mark in this project, a set of minimum requirements was defined:

1. Provide a literature review of cloud computing history, types of clouds and work done by previous researchers in the cloud auto-scaling area.
2. Develop a model for predicting workload.
3. Develop the load testing tool.
4. Create images of the virtual machines that will be used in the cluster.

The literature review was provided in Chapter 2. It covered a brief history of cloud computing and the developments that led to current technology, and gave an introduction to the types of cloud (public, private and hybrid), cloud models (IaaS, PaaS and SaaS), types of virtualisation (full virtualisation, paravirtualisation and hardware-assisted virtualisation) and the work done by other researchers in the cloud auto-scaling area.

The model for predicting the workload was developed in Chapter 4. A recurrent artificial neural network was chosen as the predictive model. The choice proved to be correct, as the model managed to improve the response time of the web application in two out of three cases.

The load testing tool was developed in Chapter 5. It met all of the requirements (request throttling, replaying Apache's access log, a command-line interface) and was used in the auto-scaling solution's evaluation.

The VM image was created in Chapter 6. The chosen software stack installed on the image (Debian, Apache, Tomcat and collectd) worked well and was easy to configure and manage; no serious issues were encountered whilst using it.

9.5 Methodology

The entire project was approached with an Agile methodology: the deliverables, the auto-scaling tool and the load testing tool, were broken down into milestones. Each milestone meant delivering a usable module and writing about it in the final report at the end of the sprint, which took from one to two weeks. Had I followed the waterfall technique, I would have had to write the entire report at the end of the project, when all the work had been done.
This would have been a very difficult task, as I would have had to reflect on progress made months earlier. Because the work was done incrementally, the report was written as I progressed through the milestones. More detailed information on how I managed to keep up with the original timetable (Table 1) is presented in the Gantt chart in Figure 21.

Figure 21: Gantt chart

9.6 Future work

The current auto-scaling solution provides a good working framework for future work: its modules are loosely coupled, so additional functionality could be added, or existing functionality changed, without refactoring the entire codebase. Some things that could be added to the framework:

- Making the Provisioning Module cloud-agnostic. Currently it works only with OpenNebula, but it would be useful to extend it to work with other cloud infrastructure managers or public IaaS solutions (Amazon AWS, Microsoft Azure or Google Compute Engine).
- Continuous network training. At the moment the neural network is trained once and then loaded at application start-up, but workload patterns can change over time. It would be interesting to see whether a neural network could be trained continuously on the most recent data.
- Additional predictive models. An Artificial Neural Network is just one of many available predictive models; others could be implemented and their performance compared.
- Monitoring additional performance parameters. CPU utilisation, memory usage or network traffic thresholds could be added alongside response time monitoring.
- Updating the web application in a live environment. The web application is part of the VM image, but in practice a web application is constantly maintained and changed, so it needs to be updatable whilst deployed across multiple servers.

9.7 Summary

This last chapter of the report marks the completion of the final milestone of the project: an overview of the results obtained from the experiments in Chapter 8 was given, together with an evaluation of the auto-scaling solution and the project's management approach. The project delivered a working cloud auto-scaling tool capable of predicting workload and improving the application's performance based on those predictions. Although one of the objectives could not be achieved, all the other objectives and the minimum requirements were achieved and exceeded.

10. Conclusion

The main objective of this project was to build a piece of software that could take advantage of the cloud's elasticity and scale up or down depending on the workload. The project focused on scaling an application running in a web environment, but the ideas presented in this work should apply in other contexts as well.

The project began with an introduction to cloud computing: its historical development, cloud types and the work done by other researchers. A detailed walkthrough of each component's design and development was then given, showing and justifying the decisions made. The tools required for testing the auto-scaling solution, as well as a sample web application, were developed. After development was complete, it was shown that the built solution responds to changes in the workload by adjusting the virtual cluster's size, and that a predictive model can be used to predict the workload and adjust the cluster accordingly.

One of the project's original goals was to measure the power consumption of a VM, but this turned out not to be possible due to technical limitations, so instead I compared how the predictive model contributed to estimated energy consumption. Although the predictive model was successful in predicting the workload, it had a negative effect on energy consumption: test cases which used the predictive model consumed more energy. This shows that in order to achieve better energy utilisation, some SLA requirements have to be made less restrictive, i.e. the application must be allowed to suffer a performance penalty.

Appendix A Personal reflection

I think my project went well. All of the minimum requirements were delivered and exceeded, and most of the objectives were met. The main reason I wanted to do this project was to gain a deeper understanding of cloud architecture: how a cloud is built and managed. I started preparing as soon as I found out what I would be doing, studying virtualisation and doing background reading during my Christmas break. I think this was a smart decision, because it meant I had to spend less time on background research when the Final Year Project officially started and could move to the implementation part of the project sooner.

As the implementation stage progressed, I found myself e-mailing the cloud testbed's administrator more and more, trying to solve technical issues. These issues were critical and were halting development work, and as a result I fell behind schedule. I am not exactly sure why I had so many problems; it could have been because the cloud's administrator could not spend enough of his time administrating the cloud, or simply because OpenNebula had many unresolved bugs and I kept running into them. Fortunately, I did not fall too far behind and was able to finish everything on time. To anyone else planning to work on the School of Computing's cloud testbed, I would recommend allocating a time buffer for problems like this.

Another problem I ran into was that I set an objective to measure a VM's power consumption, which turned out to be impossible. I encountered this problem because I set the objective before having any practical experience with VMs and did not know it was unfeasible.

Looking back at the project's management as a whole, I think that approaching it with an Agile methodology was a very smart choice, and I would strongly recommend it over the waterfall technique.
Many of my friends who followed the waterfall technique had to write two reports, a mid-project report and a final one, whereas I wrote a single report as I reached each milestone. Overall, I think I approached the project correctly and achieved what I wanted: I learnt how clouds are built, and I managed to build a working PaaS solution.

Appendix B Ethical issues

No ethical issues were encountered in this project.

Appendix C Bibliography

[1] Amazon Web Services, "About AWS," [Online]. Available: [Accessed ].
[2] R. Buyya, C. Vecchiola and S. T. Selvi, Mastering Cloud Computing: Foundations and Applications Programming, Morgan Kaufmann.
[3] P. E. Ceruzzi, Computing: A Concise History, MIT Press.
[4] Computer History Museum, "Timeline of Computer History," [Online]. Available: [Accessed ].
[5] IBM, "The IBM 700 Series. Computing Comes to Business," [Online]. Available: 03.ibm.com/ibm/history/ibm100/us/en/icons/ibm700series/. [Accessed ].
[6] L. Dignan, "IBM leads server race, Cisco breaks top 5," [Online]. Available: [Accessed ].
[7] R. Mullins, "IBM Debuts Lower Cost $75,000 Mainframe," InformationWeek, [Online]. Available: [Accessed ].
[8] A. S. Tanenbaum and M. Van Steen, Distributed Systems: Principles and Paradigms, 2nd edition, Prentice Hall.
[9] Microsoft, "WorldWide Telescope," [Online]. Available: [Accessed ].

[10] G. Coulouris, J. Dollimore and T. Kindberg, Distributed Systems: Concepts and Design, 5th edition, Addison-Wesley.
[11] M. Portnoy, Virtualization Essentials, John Wiley & Sons.
[12] A. Hendryx, "Cloudy Concepts: IaaS, PaaS, SaaS, MaaS, CaaS & XaaS," ZDNet, [Online]. Available: [Accessed ].
[13] Microsoft, "Windows Azure: Microsoft's Cloud Platform, Cloud Hosting, Cloud Services," [Online]. Available: [Accessed ].
[14] Google Inc., "Compute Engine - Google Cloud Platform," [Online]. Available: [Accessed ].
[15] IBM, "IBM Cloud Computing: Infrastructure as a Service (IaaS) - United States," [Online]. Available: [Accessed ].
[16] Rackspace US, Inc., "Public cloud hosting, computing, storage, and networking by Rackspace," [Online]. Available: [Accessed ].
[17] J. Rhoton, Cloud Computing Explained: Implementation Handbook for Enterprises, Recursive Press.
[18] Amazon Web Services, "AWS Elastic Beanstalk - Application Management & PaaS in the Cloud," [Online]. Available: [Accessed ].
[19] Google Inc., "Google App Engine - Google Developers," [Online]. Available: [Accessed ].
[20] Red Hat Inc., "OpenShift by Red Hat," [Online]. Available: [Accessed ].
[21] Heroku Inc., "Heroku Cloud Application Platform," [Online]. Available: [Accessed ].

66 [22] Salesforce.com Inc., CRM and Cloud Computing To Grow Your Business - Salesforce.com UK, [Online]. Available: [Accessed ]. [23] Google Inc., Google Apps for Business United Kingdom, [Online]. Available: [Accessed ]. [24] Dropbox Inc., Dropbox, [Online]. Available: [Accessed ]. [25] Microsoft, Virtual Machine Live Migration Overview, [Online]. Available: [Accessed ]. [26] VMware Inc., Understanding Full Virtualization, Paravirtualization, and Hardware Assist., [Online]. Available: [Accessed ]. [27] J. Hamilton, Cooperative Expendable Micro-Slice Servers (CEMS): Low Cost, Low Power Servers for Internet-Scale Services, 4th Biennial Conference on Innovative Data Systems, [28] A. Berl, E. Gelenbe, M. di Girolamo, G. Giuliani, H. de Meer, M. Quan Dang and K. Pentikousis, Energy-Efficient Cloud Computing, The Computer Journal, vol. 53, no. 7, pp , [29] Google Inc., Web Apps Articles & Tutorials Google Cloud Platform, [Online]. Available: [Accessed ]. [30] M. Mao, J. Li and M. Humphrey, Cloud auto-scaling with deadline and budget constraints, th IEEE/ACM International Conference on Grid Computing (GRID), pp , [31] T. C. Chieu, A. Mohindra, A. A. Karve and A. Segal, Dynamic Scaling of Web Applications in a Virtualized Cloud Computing Environment, ICEBE '09. IEEE International Conference on e-business Engineering, pp , [32] K. Le, R. Bianchini, M. Martonosi and T. D. Nguyen, Cost- and Energy-Aware Load Distribution Across Data Centers, in HotPower'09,

67 [33] S. Islam, J. Keung, K. Lee and A. Liu, Empirical prediction models for adaptive resource provisioning in the cloud, Future Generation Computer Systems, vol. 28, no. 1, pp , [34] OpenNebula Project, OpenNebula Flexible Enterprise Cloud Made Simple, [Online]. Available: [Accessed ]. [35] Cloud Computing Working Group, Spec Open Systems Group, Report on Cloud Computing to the OSG Steering Committee, April [Online]. Available: [Accessed ]. [36] J. M. Myerson, Best practices to develop SLAs for cloud computing, [Online]. Available: [Accessed ]. [37] J. M. Myerson, Use SLAs in a Web services context, Part 1: Guarantee your Web service with a SLA, [Online]. Available: [Accessed ]. [38] E. Alpaydin, Introduction to Machine Learning, 2nd Edition, MIT Press, [39] B. Ingram, Definition of a Time Series - Business Statistics - UIowa Wiki, [Online]. Available: [Accessed ]. [40] T. G. Dietterich, Machine learning for sequential data: a review, Structural, Syntactic, and Statistical Pattern Recognition, vol. 2396, pp , [41] B. Yegnanarayana, Artificial Neural Networks, Prentice-Hall of lndia Private Limited, [42] P. D. McNelis, Neural Networks in Finance: Gaining Predictive Edge in the Market, Academic Press Inc., [43] D. P. Mandic and J. A. Chambers, Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability, Wiley-Blackwell, [44] Lawrence Berkeley National Laboratory, Traces In The Internet Traffic Archive, [Online]. Available: 58

68 [Accessed ]. [45] Heaton Research Inc., Encog Machine Learning Framework Heaton Research, [Online]. Available: [Accessed ]. [46] C. Brooks, Introductory Econometrics for Finance (Second Edition), Cambridge University Press, [47] D. Terdiman, Solution for Slashdot Effect?, [Online]. Available: [Accessed ]. [48] Apache Software Foundation, ab - Apache HTTP server benchmarking tool - Apache HTTP Server, [Online]. Available: [Accessed ]. [49] HP Labs, Welcome to the httperf homepage, [Online]. Available: [Accessed ]. [50] Apache Software Foundation, Apache JMeter - Apache JMeter, [Online]. Available: [Accessed ]. [51] Apache Software Foundation, Commons CLI - Home, [Online]. Available: [Accessed ]. [52] Canonical Ltd, The world's most popular free OS Ubuntu, [Online]. Available: [Accessed ]. [53] S. J. Vaughan-Nichols, The 5 most popular Linux distributions ZDNet, ZDNet, [Online]. Available: [Accessed ]. [54] Software in the Public Interest Inc., Debian -- The Universal Operating System, [Online]. Available: [Accessed ]. [55] Oracle Corporation, OpenJDK, [Online]. Available: [Accessed ]. 59

69 [56] Apache Software Foundation, Welcome! - The Apache HTTP Server Project, [Online]. Available: [Accessed ]. [57] Apache Software Foundation, Apache Tomcat - Welcome!, [Online]. Available: [Accessed ]. [58] F. Forster, Start page collectd The system statistics collection daemon, [Online]. Available: [Accessed ]. [59] Apache Software Foundation, The Apache Tomcat Connector - Documentation Index, [Online]. Available: [Accessed ]. [60] JCraft Inc., JSch - Java Secure Channel, [Online]. Available: [Accessed ]. [61] Software in the Public Interest Inc., Debian Policy Manual - The Operating System, [Online]. Available: [Accessed ]. [62] XenSource Inc., Appendix D. Xen Memory Usage, [Online]. Available: [Accessed ]. [63] Intel Corporation, PowerTOP 01.org, [Online]. Available: [Accessed ]. [64] F. Looijaard, lm-sensors, [Online]. Available: [Accessed ]. [65] Apache Software Foundation, Apache log4j 1.2 -, [Online]. Available: [Accessed ]. [66] G. Urdaneta, G. Pierre and M. van Steen, Wikipedia Workload Analysis for Decentralized Hosting, Elsevier Computer Networks, vol. 53, no. 11, pp , [67] C.-H. Lien, Y.-W. Bai and M.-B. Lin, Estimation by Software for the Power Consumption, IEEE Transactions on instrumentation and measurement, vol. 56, no. 5, pp ,

70 [68] Snapsort Inc., Intel Xeon X3360, [Online]. Available: [Accessed ]. [69] A. B. Bondi, Characteristics of scalability and their impact on performance, Proceedings of the 2nd international workshop on Software and performance, Ottawa, ON, Canada, pp , [70] D. Drysdale, Engineering, High-Quality Software, Lulu.com,

Appendix D

Due to the size of the results, only a small sample is provided. Table 14 shows results from when the neural networks tried to predict the changes in the traffic. Table 15 shows results from the simulations when the network tried to predict the number of hits in a timeframe. Each table covers 15 Elman and 15 Jordan network configurations, all trained on a sample size of 0.05.

Table 14: Predicting changes in traffic

Sample size   Architecture   Nodes   Iterations   Training error   Testing error
0.05          elman          …       …            …                …
0.05          jordan         …       …            …                …

Table 15: Predicting the number of requests

Sample size   Architecture   Nodes   Iterations   Training error   Testing error
0.05          elman          …       …            …                …
0.05          jordan         …       …            …                …
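The Elman runs summarised above can be illustrated with a minimal sketch of the evaluation protocol: a hand-rolled Elman-style network (one tanh hidden layer whose previous activations are fed back as context units) is trained on part of a traffic series and its one-step-ahead error is measured on a held-out part. This is illustrative only — all names, parameters, and the synthetic trace below are assumptions, the training rule is simplified to a delta rule on the output weights, and the project itself used a machine-learning framework rather than hand-written code.

```python
import math
import random

# Illustrative Elman-style recurrent network: the hidden layer's previous
# activations are fed back as "context" inputs at the next time step.
class ElmanNet:
    def __init__(self, n_hidden, seed=1):
        rnd = random.Random(seed)
        self.n_hidden = n_hidden
        # input (one value) + context -> hidden; hidden -> single output
        self.w_ih = [[rnd.uniform(-0.5, 0.5) for _ in range(1 + n_hidden)]
                     for _ in range(n_hidden)]
        self.w_ho = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.context = [0.0] * n_hidden

    def reset(self):
        self.context = [0.0] * self.n_hidden

    def step(self, x):
        inp = [x] + self.context
        hidden = [math.tanh(sum(w * v for w, v in zip(row, inp)))
                  for row in self.w_ih]
        self.context = hidden  # remembered for the next step
        return sum(w * h for w, h in zip(self.w_ho, hidden))

def mse(net, series):
    """Mean squared one-step-ahead prediction error over a series."""
    net.reset()
    errs = [(net.step(series[t]) - series[t + 1]) ** 2
            for t in range(len(series) - 1)]
    return sum(errs) / len(errs)

def train(net, series, iterations, lr=0.05):
    """Delta-rule training of the output weights only (a simplification;
    full Elman training also adapts the input/context weights)."""
    for _ in range(iterations):
        net.reset()
        for t in range(len(series) - 1):
            err = net.step(series[t]) - series[t + 1]
            # after step(), net.context holds the current hidden activations
            for j in range(net.n_hidden):
                net.w_ho[j] -= lr * err * net.context[j]

# Synthetic "traffic" trace with a daily-looking cycle, split train/test
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
train_part, test_part = series[:192], series[192:]

net = ElmanNet(n_hidden=8)
before = mse(net, test_part)
train(net, train_part, iterations=200)
after = mse(net, test_part)
print(f"testing error before: {before:.4f}, after: {after:.4f}")
```

Each (architecture, nodes, iterations) combination in the tables corresponds to one such train-then-evaluate run; a Jordan network differs in that the output, rather than the hidden layer, is fed back as the context input.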


More information

ASCETiC Whitepaper. Motivation. ASCETiC Toolbox Business Goals. Approach

ASCETiC Whitepaper. Motivation. ASCETiC Toolbox Business Goals. Approach ASCETiC Whitepaper Motivation The increased usage of ICT, together with growing energy costs and the need to reduce greenhouse gases emissions call for energy-efficient technologies that decrease the overall

More information

Relational Databases in the Cloud

Relational Databases in the Cloud Contact Information: February 2011 zimory scale White Paper Relational Databases in the Cloud Target audience CIO/CTOs/Architects with medium to large IT installations looking to reduce IT costs by creating

More information

Last time. Data Center as a Computer. Today. Data Center Construction (and management)

Last time. Data Center as a Computer. Today. Data Center Construction (and management) Last time Data Center Construction (and management) Johan Tordsson Department of Computing Science 1. Common (Web) application architectures N-tier applications Load Balancers Application Servers Databases

More information

Cluster, Grid, Cloud Concepts

Cluster, Grid, Cloud Concepts Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of

More information

BUILDING SAAS APPLICATIONS ON WINDOWS AZURE

BUILDING SAAS APPLICATIONS ON WINDOWS AZURE David Chappell BUILDING SAAS APPLICATIONS ON WINDOWS AZURE THINGS TO THINK ABOUT BEFORE YOU START Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents Illustrating SaaP and

More information

How to Do/Evaluate Cloud Computing Research. Young Choon Lee

How to Do/Evaluate Cloud Computing Research. Young Choon Lee How to Do/Evaluate Cloud Computing Research Young Choon Lee Cloud Computing Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing

More information

PARALLELS CLOUD SERVER

PARALLELS CLOUD SERVER PARALLELS CLOUD SERVER An Introduction to Operating System Virtualization and Parallels Cloud Server 1 Table of Contents Introduction... 3 Hardware Virtualization... 3 Operating System Virtualization...

More information

A Gentle Introduction to Cloud Computing

A Gentle Introduction to Cloud Computing A Gentle Introduction to Cloud Computing Source: Wikipedia Platform Computing, Inc. Platform Clusters, Grids, Clouds, Whatever Computing The leader in managing large scale shared environments o 18 years

More information

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain

More information

Introduction to Cloud Services

Introduction to Cloud Services Introduction to Cloud Services (brought to you by www.rmroberts.com) Cloud computing concept is not as new as you might think, and it has actually been around for many years, even before the term cloud

More information

How AWS Pricing Works May 2015

How AWS Pricing Works May 2015 How AWS Pricing Works May 2015 (Please consult http://aws.amazon.com/whitepapers/ for the latest version of this paper) Page 1 of 15 Table of Contents Table of Contents... 2 Abstract... 3 Introduction...

More information

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com ` CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS Review Business and Technology Series www.cumulux.com Table of Contents Cloud Computing Model...2 Impact on IT Management and

More information