Exploiting Private and Hybrid Clouds for Compute Intensive Web Applications
Aleksandar Draganov
August 17, 2011
MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2011
Abstract

Cloud computing changes the way software and hardware are purchased and used. An increasing number of applications are becoming web based, since these are available from anywhere and from any device. Such applications use the infrastructures of large scale data centres and can be provisioned efficiently. Hardware, on the other hand, representing basic computing resources, can also be delivered to match specific demands without the consumer having to actually own it. The cloud computing model provides benefits for private enterprise environments where a significant physical infrastructure already exists. Private cloud management platforms have been emerging in the last several years, providing new opportunities for efficient management of internal infrastructures and leading to high utilization. The resources in a private cloud may become insufficient to satisfy the demands for higher computational power or storage capacity. In cases where local provisioning is not sufficient, the private resources can be extended with resources from other public remote infrastructures. The resulting infrastructure appears as one hybrid entity. In other words, a private cloud can be extended to a hybrid cloud by adding resources to its capacity from public cloud providers where required. This work investigates the usage of an open source cloud management platform (OpenNebula [1]) to create a private cloud and to use it for hosting a compute intensive web application by managing a farm of virtual web servers to meet its demands. The benefits of using such an approach, along with the issues it raises, are explained. The chosen algorithm for the web application (LU factorisation) represents a generalized case of applications where the complexity is O(N³) and the input data size is N². An approach where the computational load is increased is also tested. The results are reported as requests served per second (i.e. throughput). Two scenarios are covered: utilizing a small existing private infrastructure (i.e. a private cloud) and extending the private infrastructure with external resources (i.e. a hybrid cloud) on Amazon Web Services [2]. OpenNebula proved to be easy to use and robust software. Its capabilities ensured convenient management of the virtual web server farm within both private and hybrid clouds. The network bandwidth appeared to be the most significant limiting factor for the effective use of the farm, influenced by the heterogeneous character of the setup, the virtualization of the network interfaces, their sharing between virtual machines and the size of the input data. However, increasing the execution time (i.e. heavier problems) proved to lessen the impact of those issues.
Contents

Chapter 1  Introduction
Chapter 2  Background
  2.1  Cloud computing
  2.2  Main benefits of cloud computing
  2.3  Virtualization role
  2.4  Choice of cloud management system
  2.5  OpenNebula
  2.6  Amazon Web Services (AWS)
  2.7  Web applications
  2.8  Load balancing
  2.9  Benchmarking web servers
Chapter 3  Web server farm design
  3.1  Cloud software stack
    3.1.1  Intel VT-X
    3.1.2  Scientific Linux 6 (Host OS)
    3.1.3  KVM kernel modules
    3.1.4  Libvirt
    3.1.5  OpenNebula
  3.2  Web server software stack
    3.2.1  Java
    3.2.2  Load balancing/proxy server
  3.3  Private cloud web server farm setup
    3.3.1  Networking
  3.4  Hybrid cloud web server farm setup
    3.4.1  VM performance comparison
  3.5  System of linear equations solver
    3.5.1  The algorithm
    3.5.2  The web application
  3.6  Running the cloud
    3.6.1  Preparations
    3.6.2  Resources allocation
  3.7  Running the application
    3.7.1  Load Balancer
    3.7.2  Web server
  3.8  Benchmarking
    3.8.1  Benchmarking tool
    3.8.2  Benchmarking
Chapter 4  Results and Analysis
  4.1  Results
  4.2  Private cloud web server farm
    4.2.1  LB deployed on VM
    4.2.2  LB deployed on a physical machine
    4.2.3  Increased load - LB deployed on VM
    4.2.4  Increased load - LB deployed on a physical machine
  4.3  Hybrid cloud web server farm
  4.4  Cloud management platform role
    4.4.1  Centralized control
    4.4.2  EC2 instances managing
    4.4.3  User management
    4.4.4  Cloud API
    4.4.5  Monitoring
Chapter 5  Conclusions and Future Work
  5.1  Conclusions
  5.2  Future work
Appendix A  Experimental Data
  A.1  Standard Load Application - VM LB
  A.2  Standard Load Application - Physical Machine LB
  A.3  Increased Load Application - VM LB
  A.4  Increased Load Application - Physical Machine LB
Appendix B  Results of iperf tests
  B.1  From ph-cplab to VM LB
  B.2  From ph-cplab to physical machine LB
  B.3  To local worker (NAT-based network virbr0) from LB VM
  B.4  To public worker (physical network br0) from LB VM
  B.5  To public worker from physical machine LB
  B.6  To local worker (virbr0) from physical machine LB
  B.7  To EC2 medium instance from VM LB
  B.8  To EC2 medium instance from physical machine LB
Appendix C  Work plan modifications
References
List of Tables

Table 2.1: EC2 instance types
Table 3.1: Theoretical IP addresses organization for private cloud
Table 5.1: Standard load application throughputs with LB on VM. Private cloud server farm
Table 5.2: Standard load application throughputs with LB on physical machine. Private cloud server farm
Table 5.3: Increased load application throughputs with LB on VM. Hybrid cloud server farm
Table 5.4: Increased load application throughputs with LB on physical machine. Hybrid cloud server farm
List of Figures

Figure 2.1: OpenNebula main components. Reproduced from [1]
Figure 2.2: Horizontally scaled web servers
Figure 3.1: Standard OpenNebula cloud organization
Figure 3.2: The actual OpenNebula cloud organization used
Figure 3.3: Private cloud web server farm
Figure 3.4: Standard private cloud network configuration
Figure 3.5: The actual private cloud network configuration
Figure 3.6: Hybrid cloud web server farm. Reproduced from [54]
Figure 3.7: Performance comparison between EC2 instances and local VM. Single executions
Figure 3.8: Performance comparison between EC2 instances and local VM. Thread pool with size 10
Figure 3.9: Size of data transferred in relation to problem size
Figure 3.10: Throughput generated by 1 web server for N = 50, 150, 250 for different number of running threads
Figure 4.1: Throughput generated by private cloud web server farm for different problem sizes with LB deployed on VM
Figure 4.2: Network bandwidths for private cloud web server farm with VM LB
Figure 4.3: Throughput speedup for private cloud web server farm with VM LB
Figure 4.4: Network bandwidths for private cloud web server farm with LB on a physical machine
Figure 4.5: Throughput generated by private cloud web server farm for different problem sizes with LB deployed on a physical machine
Figure 4.6: Throughput speedup for private cloud web server farm with LB deployed on a physical machine
Figure 4.7: Throughput generated by private cloud web server farm for different problem sizes with LB deployed as VM for increased load application
Figure 4.8: Throughput speedup for private cloud web server farm with VM LB for increased load application
Figure 4.9: Throughput generated by private cloud web server farm for different problem sizes with LB deployed on a physical machine for increased load application
Figure 4.10: Throughput speedup for private cloud web server farm with LB deployed on a physical machine for increased load application
Figure 4.11: Network bandwidths for hybrid cloud web server farm with LB on a physical machine
Figure 4.12: Throughput generated by hybrid cloud web server farm for different problem sizes with LB deployed as VM for increased load application
Figure 4.13: Throughput speedup for hybrid cloud web server farm with LB deployed on a physical machine for increased load application
Abbreviations

AMI   Amazon Machine Image
AWS   Amazon Web Services
CLI   Command Line Interface
CRM   Customer Relationship Management
DNS   Domain Name System
EBS   (Amazon) Elastic Block Storage
EC2   (Amazon) Elastic Compute Cloud
ECU   (Amazon) Elastic Compute Unit
ERP   Enterprise Resource Planning
HTTP  Hypertext Transfer Protocol
IaaS  Infrastructure as a Service
JVM   Java Virtual Machine
KVM   Kernel-based Virtual Machine
LB    Load Balancer
MAC   Media Access Control
NAT   Network Address Translation
NIC   Network Interface Card
OS    Operating System
PaaS  Platform as a Service
QoS   Quality of Service
RDS   (Amazon) Relational Database Service
RR    Round-Robin
S3    (Amazon) Simple Storage
SaaS  Software as a Service
SLA   Service Level Agreement
SQS   (Amazon) Simple Queue Service
SSL   Secure Sockets Layer
VM    Virtual Machine
VMM   Virtual Machine Manager
Acknowledgements

I would like to thank my supervisor, Dr Charaka J. Palansuriya, for the time spent and his tireless and patient guidance and support during the project. I am also very grateful to Mr. Maciej Olchowik for all his helpful advice and suggestions. I would also like to acknowledge the efforts of Mr. Craig Morris and Ms. Fiona Bisset from EPCC support in providing me with the hardware resources and help required for the project. Finally, I would like to thank my family and my friends for their infinite support throughout the last 12 months.
Chapter 1 Introduction

There is an ever increasing movement towards the adoption of cloud computing within the IT departments of academic and business environments. Building on the advances of virtualization technologies, the concept of the Cloud offers a convenient way of efficiently organizing and using different computing resources. Many cloud services have been introduced in recent years, some of them offered publicly over the Internet (public clouds) and others deployed within internal networks (private clouds). Public cloud services are available in three types of service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [3]. IaaS provides control over running Virtual Machines (VMs), networks and any other basic computing resources. PaaS offers a platform where users can deploy their applications if they use the specific tools and technologies of the PaaS provider. SaaS is normally a web application that runs on top of a cloud infrastructure. The consumer pays only for what is used: an hour of CPU or a gigabyte of memory for the first two service models, and a monthly fee per person for SaaS. Cloud providers often offer different fee schemes so they can match most requirements. Private clouds are deployed within the organization and managed by its system administrators. They require more work to set up and maintain, but provide better flexibility and can be used for different purposes: virtual clusters, hosting web applications or deploying VMs used as desktops. Private clouds can be extended to hybrid clouds by adding resources to their capacity from public providers. There are several open source platforms that can be used for managing private clouds. One of the main characteristics of a cloud platform is its elasticity. From the consumer's perspective the cloud seems to have unlimited resources that can be provisioned just for the time period required to accomplish a task and then released with minimum effort when they are no longer required. This feature allows quick scaling and better utilization and provisioning of the resources. This project focuses on the installation and usage of an open source cloud management platform to create and manage a private cloud for hosting web applications that require significant computational time. An important aspect of this work is to investigate how the private cloud can be extended to a hybrid
cloud when the internal resources are no longer adequate. The focus is on web applications since this is an ever more popular way of providing software, making it available for PCs, tablets, smartphones and any other device that can access it. The project has 4 main objectives:

- To show that private cloud platforms provide benefits for hosting compute intensive web applications;
- To show that public clouds can be used seamlessly for extending the private resources, thus transforming a private cloud into a hybrid one (i.e. cloud bursting);
- To identify issues that can arise in the usage of private and hybrid clouds for web applications;
- To investigate the possible usage of existing resources for a cloud infrastructure.

The OpenNebula [1] software will be utilized for creating a small private cloud that will be used for performing various experiments with a simple web application. The steps of the deployment process will be discussed along with the major issues that arose during the setup. Some of the important capabilities OpenNebula provides will be described. One of these features is the logic that offers control over VMs within Amazon Web Services [2]. The cloud has to be configured to use those external resources in addition to the local setup and thereby increase its capacity. The web application itself will be deployed on running VMs. The project does not aim to outline the advantages and disadvantages of any specific web technology, programming language or platform, but some performance aspects of the chosen setup will be discussed.
Chapter 2 Background

2.1 Cloud computing

The cloud computing model has become an important concept in the last few years. Many major companies and independent individuals have given definitions for the Cloud and cloud computing according to their strategies, visions and products [4], but most of them focus on several important characteristics: resources are provided rapidly, on demand, they are highly scalable, and in the case of public services the consumer pays only for what is used. The resources might be computational power, storage, networks or applications. IT resources are therefore now elastic - they can be provisioned for the exact duration they are necessary and released when the consumer no longer needs them. As described in [4], some definitions include other aspects such as Internet-driven service, monitoring and (self-) management, Service Level Agreement (SLA) management, controllability through an API and guaranteed Quality of Service (QoS). But those are mostly specific to some platforms or services. As mentioned earlier, there are three main models of cloud computing:

Software as a Service

In general SaaS is a normal web site whose users often pay a fee to gain access to specific features of the system for a fixed period of time, usually per week, month or year. SaaS software runs in a cloud environment, hence it should be able to serve a number of users together. Popular areas where SaaS software is available are accounting, project management, ERP, CRM, document management and many others [5].

Platform as a Service

The consumers of the service can deploy their applications on top of the cloud platform supported by the provider. The applications must be developed using the APIs and programming languages made available by the PaaS provider. Therefore software development is restricted to a specific platform and a limited set of technologies. Since
every PaaS provider has its own special features and mechanisms, developers end up creating software intended to run only on a particular platform (i.e., vendor lock-in). The most popular platforms at the moment are Google App Engine [6] and the Microsoft Windows Azure Platform [7].

Infrastructure as a Service

The provider of the service delivers basic resources: computational power, networks, storage, operating systems. Those resources are scalable and elastic. They can be provisioned on demand and made accessible in minimal time. The consumers of the service have full control over the running virtual machines (VMs), hence being able to choose all the elements of the software stack that their applications require. But the providers cannot guarantee that resources such as VMs or storage will not fail and they do not provide backup for the data. That is a responsibility of the user. Other research [4] even differentiates further services such as Computing, Storage, and Database that usually fit into the models mentioned above, but are offered as separate products by some cloud providers.

2.2 Main benefits of cloud computing

One of the main features that differentiate cloud computing from traditional computing is that clouds are usually programmable. With PaaS the user is supplied with an API that allows starting, stopping and other operations over workers and storage [8] [9]. In IaaS the API can be used to run, stop, migrate, clone or delete VMs [8] [11] [12]. In such a context the cloud can indeed be called self-manageable, although it only provides an API that can be used to achieve this. Private cloud management platforms also offer an API for the same purposes as IaaS. A cloud system can run a number of VMs within a physical machine in complete isolation from each other. As a result applications can easily be provisioned only with the resources they require. In case of under-provisioning another VM can be started in order to handle part of the load generated towards the application and later destroyed when no longer necessary, so the resources can be used for other purposes or applications. Better utilization is an advantage that comes from the virtualization technologies, but cloud platforms provide mechanisms to control many physical nodes from a centralized point, organizing the nodes into server farms and improving the process of their management. Details such as the virtualization software, host OS or hardware organization are hidden from the consumer of the service, providing a new level of abstraction. Undoubtedly better utilization and server consolidation have financial benefits when considering the existing infrastructure that many organizations have. But public IaaS providers ensure that an organization can rent significant external resources instead of purchasing its own. Small- or medium-sized businesses can considerably decrease their
expenses with such an approach. No capital investment for building an infrastructure, moderate ongoing costs, minimal effort to provision new computational power, no staff dedicated entirely to maintaining the hardware setup, and high availability (e.g., Amazon Web Services) are the main benefits for organizations that use IaaS services. A detailed cost comparison between internal IT, managed services, and IaaS approaches is published in chapter 1 of [13]. For bigger organizations that can invest in data centre infrastructure and have the expertise to build and maintain it (e.g., a university) it might be more advantageous to have the resources internally. That is, to have a private cloud.

2.3 Virtualization role

The increasing popularity of the cloud computing approach is driven by the advancement of virtualization technology. It allows different logical machines to share the same hardware but run isolated from each other. As a result physical hosts can be utilized better and the computing resources can be allocated easily in a very flexible manner. These isolated running operating systems (OS) are called virtual machines (VMs). A number of virtualization software solutions are available. They usually include a hypervisor, also called a virtual machine manager (VMM). The hypervisor assigns resources to the VMs and lets them operate as if they were running on different machines independently. It runs on the host hardware directly (bare-metal) or as a part of the OS, by modifying the kernel or by using the system services. Based on this, and on the way the VMs communicate with the physical devices, the virtualization is either full virtualization or para-virtualization [14]. Recently the hardware vendors added a new virtualization feature (Intel VT-x [15], AMD-V [16]) which allows a guest OS to directly interact with the VMM layer [14]. Popular virtualization hypervisors are VMware vSphere [14], XEN [18], KVM [19], and Microsoft Hyper-V [20]. The last one cannot be used on Linux machines. Red Hat has cut the support for XEN in their new RHEL 6 and has replaced it with KVM. Therefore the support for XEN in Scientific Linux, which is the OS used for the current project, is also being dropped. As KVM packages for the latest Scientific Linux 6 are prebuilt and ready to install, it is a suitable choice for the scenarios in the current dissertation. Virtualization also applies to other physical resources such as storage and networks. Though virtualization software abstracts the hardware and provides a flexible and agile way to control it, it is not a cloud in itself. A layer that controls it has to be deployed so it can access the entire infrastructure within a data centre and manage all the resources. This layer is called a cloud management platform.
2.4 Choice of cloud management system

The choice of an IaaS platform to be used for this project presents a challenge, since software in this field is relatively new and new versions are released in short cycles. For example, OpenNebula had three beta releases between July 2010 and July 2011. The other widely used cloud platform is called Eucalyptus [21]. Eucalyptus and OpenNebula share their most important features:

- Both projects had first versions in 2008 and are equally mature nowadays;
- Both can be installed and run on the majority of the Linux distributions available;
- Both can use XEN, KVM, and VMware virtualization;
- Both are EC2 compatible;
- Both can be configured to cloud-burst using resources from Amazon Web Services (AWS);
- Both have user management, etc.

In [22] different cloud management systems are compared, but they have all changed dramatically since the publication of the article. For example, both OpenNebula and Eucalyptus can now exploit EC2 resources and there is a Graphical User Interface (GUI) available for OpenNebula. The 4caaSt project cloud analysis [23] provides up to date detailed descriptions of existing platforms, including OpenStack [24] which is still very new and not as mature as OpenNebula and Eucalyptus. Eucalyptus is dual licensed and some features, such as VMware support and Windows guest OS support, are only available in the commercial version. The creators of OpenNebula maintain a blog [25] where practical experience is shared between all the users. OpenNebula has also been proven to work effectively in large scale environments [26]. Based on its proven capabilities and its large and active community, OpenNebula was chosen for the experimental work in this project.

2.5 OpenNebula

OpenNebula is a fully open-source toolkit to build IaaS Private, Public and Hybrid Clouds [1].

Figure 2.1: OpenNebula main components. Reproduced from [1].
The platform consists of two processes running on a front-end or central machine, as shown in Figure 2.1. The first process is called the OpenNebula daemon and it is responsible for controlling the cloud platform modules and the virtual machines. The other process, called the scheduler, decides where to place each virtual machine by accessing the database OpenNebula holds on the front-end. The information stored in the database includes the available resources on the nodes, based on the requirements of previously submitted VMs, so the scheduler does not need to contact the hypervisors. For each host added to the list of hosts, 3 drivers have to be specified so that the daemon can use it. The virtualization driver (VMM driver) is used for communication with the hypervisor installed on the node and to control the VMs running on it. The daemon uses the Transfer Manager (TM) driver to perform operations with the storage system on the host in order to control the VM images. The last driver, called the Information Manager (IM) driver, allows the daemon to monitor the hosts. Different drivers can be specified and used with different nodes. Working with OpenNebula requires describing images, virtual networks and VMs in template files that can later be submitted to the daemon; an illustrative template is sketched after the list of objects below. Management of the platform is usually performed through the command line, but there is also a GUI that includes the majority of the functions and replaces the template files: OpenNebula Sunstone. OpenNebula works with several basic objects through the command line:

- User - represents a user that has access to the system. The command used is oneuser.
- Host - represents a host/node that is controlled by the front-end. A group of drivers has to be specified for each host. The command used is onehost.
- Image - represents an OS image. Its basic parameters are the name of the image, whether it is public and available to all users or not, a short description and the path to the image. Once submitted, OpenNebula copies the image to a new location and changes its name. Every image is described with a template file. The command used is oneimage.
- Virtual network - represents a network shared by the VMs. Networks are either ranged or fixed. For the ranged networks the only required parameters are a size (class B or C), a network address and a bridge. OpenNebula assigns IPs to the VMs automatically. The fixed networks are defined with a set of IPs, possibly MAC addresses, and a bridge. The number of addresses specified is the maximum number of VMs that can use it. Networks are also described with a template file. The command is onevnet.
- Virtual machine - represents a guest OS or a VM, also described with a template file. The basic parameters are CPU, memory, image and network. OpenNebula automatically assigns an IP (if available) to the instance and places it on a host of its choice. Every VM may be in one of several states. The most significant of them are pending (waiting for the scheduler to place it somewhere), running, stopped and failed. The command is onevm.
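To make the template-driven workflow more concrete, the sketch below shows what a minimal VM template of this kind might look like. The names and values are illustrative placeholders rather than the templates actually used in the project, so the exact attributes should be checked against the OpenNebula documentation for the installed version.

    # worker.template - illustrative OpenNebula VM template (placeholder values)
    NAME   = "web-worker"
    CPU    = 1
    MEMORY = 1870

    DISK   = [ IMAGE_ID = 0 ]            # a previously registered web server image
    NIC    = [ NETWORK = "private" ]     # one of the defined virtual networks

    # Submitted and inspected from the command line:
    #   onevm create worker.template
    #   onevm list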
The users list on the OpenNebula home page [1] is updated often. The most significant companies and projects that control internal resources with it are CERN, China Mobile, ESA, KPMG, Fermilab, Telefonica, Reservoir, StratusLab, BonFIRE, etc. The creators of OpenNebula have suggested two ways [27] to extend the internal infrastructure with Amazon EC2 VMs: a virtual cluster and a web server farm. For both of their scenarios they use a VPN channel in order to make the remote nodes part of the private network. The network latencies caused by the Internet connection, and especially by the VPN channel, affect the performance, but the results clearly show an improvement of the throughput when using EC2 instances. However, for the web server benchmarking they have only performed tests that access static files.

2.6 Amazon Web Services (AWS)

Amazon has described several use cases [28] covering many IT services such as application hosting, backup and storage, databases, e-commerce, HPC, web hosting, search engines and on-demand workforce. For each case they offer a group of products that can be exploited for the deployment of a high quality service. The major products and services available are:

- Elastic Compute Cloud (EC2) - provides compute capacity;
- Simple Storage (S3) - a scalable storage;
- Elastic Block Storage (EBS) - block level storage that is designed to be utilized with EC2;
- SimpleDB - non-relational database;
- Relational Database Service (RDS) - highly scalable RDBMS in the cloud;
- Simple Queue Service (SQS) - highly scalable queue;
- CloudWatch - provides more detailed monitoring for the resources being used;
- CloudFront - used for distributed delivery in order to make content easily accessible with low latencies and fast data transfers.

The services exploited in the current project are EBS and EC2. EBS is only used to store the Amazon Machine Image (AMI), so its part is insignificant. The AMI is used as a prototype of a VM. Amazon provides many images that are preconfigured and publicly available for common needs. Once the image is chosen, a key to access the running VM is provided, a security group that acts as a firewall has to be specified, and the type of instance must be selected for the image. The fee paid for a running VM depends on the instance type.
Instance type                         | Memory  | ECUs                        | I/O Perf. | Storage  | Platform  | $ per hour
Micro                                 | 613 MB  | Up to 2                     | Low       | EBS only | 32/64 bit |
Small                                 | 1.7 GB  | 1                           | Moderate  | 160 GB   | 32 bit    |
Large                                 | 7.5 GB  | 4                           | High      | 850 GB   | 64 bit    | 0.38
Extra Large                           | 15 GB   | 8                           | High      | 1690 GB  | 64 bit    | 0.76
High-Memory Extra Large               | 17.1 GB | 6.5                         | Moderate  | 420 GB   | 64 bit    |
High-Memory Double Extra Large        | 34.2 GB | 13                          | High      | 850 GB   | 64 bit    |
High-Memory Quadruple Extra Large     | 68.4 GB | 26                          | High      | 1690 GB  | 64 bit    | 2.28
High-CPU Medium                       | 1.7 GB  | 5                           | Moderate  | 350 GB   | 32 bit    | 0.19
High-CPU Extra Large                  | 7 GB    | 20                          | High      | 1690 GB  | 64 bit    |
Cluster Compute Quadruple Extra Large | 23 GB   | 33.5                        | Very High | 1690 GB  | 64 bit    | 1.60*
Cluster GPU Quadruple Extra Large     | 22 GB   | 33.5, 2 x NVIDIA M2050 GPUs | Very High | 1690 GB  | 64 bit    | 2.10*

Table 2.1: EC2 instance types.

All instance types available with the EC2 service are shown in Table 2.1. There are configurations for almost any requirements, including a cluster with a 10 Gigabit network and a GPU cluster, hence there are resources which are suitable even for HPC applications. The prices shown are for Linux usage in the data centre in Ireland; however, clusters are only offered in the main data centre in the US. A free tier of services, including micro instances only, is offered to newly registered users for their first 12 months. One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor [2]. EC2 services are organized in different data centres called regions. Amazon isolates the regions from each other to achieve greater fault tolerance, improve stability, and to help prevent issues within one region from affecting another [29]. A performance comparison between the basic instance types for a Geosciences application is described in [30]. The authors achieved good CPU utilization for small, medium, and large instances, but their problem does not make efficient use of the bigger instances. Though all the AWS products are advertised as highly available, flexible, and scalable, they do not fit every problem. As shown in [31], both the financial model and the performance of S3 are not suitable for big scale scientific projects since these have different requirements for data usage. However, benchmarking AWS or any other
cloud service is in general problem specific, therefore every study involving a different application will produce new results. The AWS Management Console, available on the AWS website, provides a convenient way to control EC2 instances and the other services they offer. The web interface uses the EC2 API Tools [32], which are also available to the users so they can control the services remotely. These tools provide all the operations as a command line utility. They include many operations such as registering, deploying and terminating an instance, listing all the running instances and all the regions, security group operations, etc. Installing the tools is straightforward: they only require a certificate and a private key so the user can authenticate remotely. Several environment variables have to be set up to specify the key, the certificate and the URL of the region that will be used.

2.7 Web applications

The majority of web sites or web services are deployed on physical or virtual servers. If the capacity the underlying hardware provides is not enough and cannot reasonably handle peaks, the response time will increase and the throughput of the architecture might be unacceptably low. Conversely, if the hardware is good enough to serve the peaks, then it is probably over-provisioned when the demand is not high. The elastic, dynamic provisioning that cloud computing services offer can be exploited to handle unexpected loads and to keep the provisioning close to the exact requirements of the application by adding virtual machines representing workers. Later on those added workers can be destroyed with minimum effort. The most notable example of exploiting public cloud computing self-provisioning mechanisms is Animoto [33] [34] [35] [36]. This web application allows its users to generate movies with their pictures and music uploaded from their computer or from another web location. The templates for the movies and a small playlist are available on the Animoto servers, so some of the data that the algorithms use is not uploaded by the users. One movie generation takes several minutes of processing, which makes the application very suitable for a web server farm where each request is processed independently. Using EC2, S3 and SQS, Animoto automatically scaled in just a few days from several hundred instances to thousands of instances [34]. Other compute intensive applications that can make effective use of cloud computing are problems that either do not require big input data or whose data is already on the server, such as data mining, map browsing, etc. However, they pose another challenge for the developers - distributing huge volumes of data.

2.8 Load balancing

Different load balancing techniques are discussed and compared in [37] in the context of public cloud services. The hardware-based Load Balancer (LB) is very scalable, efficient and robust, but it is usually not offered in public cloud environments and is typically expensive. Domain Name System (DNS) load balancing is another
option, but it also has serious disadvantages since the DNS servers cache information for a fixed time and therefore prevent elasticity. Layer 2 optimization approaches are disabled in Amazon EC2 because of security issues. Software load balancing is not very scalable because of CPU and network bandwidth limitations. To overcome these constraints the authors suggest a client-side load balancing scheme which indeed seems to be very scalable, but it targets AWS. The client accesses a page stored in S3 which is empty but includes JavaScript that holds the logic for the load balancing. The script then selects a back-end server and loads the actual content of the page. However, such a client-side mechanism will not fit the private cloud platform because the web servers are expected to have private addresses which are not accessible directly from the clients. Layer 2 optimizations will not be possible for cloud bursting with Amazon. DNS cannot deal with short peaks and hardware-based load balancers are still expensive. For compute intensive applications the CPU limitation of the software LB might not affect the performance, but the network bandwidth will most probably be an issue for a range of problems. However, it is the easiest load balancing mechanism to deploy and control within the private cloud while allowing scaling out to public services.

Figure 2.2: Horizontally scaled web servers.
The classic way to scale web servers is horizontal scaling [36], shown in Figure 2.2. The users request content from a domain or IP address that is associated with the load balancer (LB). The LB re-distributes the incoming requests to the workers using a specific algorithm. The web servers can access shared content such as a database or file system concurrently if necessary. The shared resources might also be distributed. When a worker sends a response to the client the traffic again goes through the LB.

2.9 Benchmarking web servers

There are existing tools designed to benchmark web servers. One such benchmark is SPECweb2005 [38] [39], which includes 3 different tests:

- Banking - simulates the load users would generate against a banking system. It aims at testing SSL connections and is therefore the most CPU intensive of all.
- Ecommerce - simulates an e-commerce website where a mixture of HTTP and SSL connections occurs.
- Support - only focuses on HTTP connections and simulates a support website where the users mostly browse but also download big files, up to 40MB.

The benchmark comes with ready-made PHP or JSP applications and the tests run against them. However, the current project aims at compute-intensive applications that do not fit any of the cases covered by SPECweb2005. Another powerful tool is httperf [40] [41]. It is designed to generate and sustain server overload [40]. The tool provides many options and is intended to be universal. However, its main goal is also benchmarking the components of the system rather than the application itself. All the generated results are based on the received responses, but for benchmarking the current application it is important to know whether all the calculations were successful and no errors occurred during the execution. Therefore a different approach for benchmarking the entire setup with the application has to be considered.
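For reference, a typical httperf invocation of the kind described above might look like the sketch below; the host name, URI and rates are placeholders rather than values used in this project.

    httperf --server lb.example.org --port 80 --uri /solver \
            --num-conns 200 --num-calls 1 --rate 20 --timeout 60

Such a run reports connection, reply-rate and error statistics only; it does not check whether the numerical results returned by the application are correct, which is why a custom benchmarking approach is adopted later in this work.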
Chapter 3 Web server farm design

In order to build a small private cloud, two physical machines with equivalent configurations were deployed in the University of Edinburgh gigabit network. Each one has a 4-core Intel CPU and 8GB of memory. The first machine (i.e. Node 1 in the setup) is used as a front-end (where the OpenNebula software runs) for the cloud and also as a node or host for VMs. The other one (Node 2) only hosts VMs.

3.1 Cloud software stack

OpenNebula requires a number of other tools and technologies in order to be able to start and stop VMs by using the virtualization hypervisor, Kernel-based Virtual Machine (KVM).

Figure 3.1: Standard OpenNebula cloud organization

The standard cluster-like style of organizing the resources used by the cloud platform is shown in Figure 3.1. Normally the OpenNebula daemon and scheduler run on a front-end separately from all the workers. However, the hardware available for the project is
limited and dedicating a whole machine to the front-end is not appropriate. Instead the front-end and one of the hosts are in fact the same machine, as shown in Figure 3.2. The OpenNebula software and the virtualization software run together on node 1.

Figure 3.2: The actual OpenNebula cloud organization used

3.1.1 Intel VT-X

This hardware-assisted Intel Virtualization Technology combined with software-based virtualization solutions provides maximum system utilization [15]. It represents an extension that allows specific privileged instructions for x86 architectures to be issued from the guest OSs through the hypervisor. The hardware virtualization support is required by the KVM kernel modules in order to perform full virtualization. It is available on many Intel processors [42].

3.1.2 Scientific Linux 6 (Host OS)

It is the main Linux distribution used within EPCC. It has support for KVM and all the virtualization software is prebuilt and available as packages.

3.1.3 KVM kernel modules

The Kernel-based Virtual Machine (KVM) software includes two loadable kernel modules: a common KVM module and a chip-specific one (kvm-intel or kvm-amd). They extend the OS kernel with functionality that is not available within the OS core, thus transforming it into a hypervisor. When the modules are loaded the hypervisor can host Linux and Windows machines. Each machine has private virtualized hardware: a network interface card (NIC), disk, graphics adapter, etc. [19].

3.1.4 Libvirt

Libvirt [43] is an API or a toolkit that provides control over hypervisors running on a modern Linux system. The main goal of the project is to provide a common and stable
layer sufficient to securely manage domains on a node [43], where a domain is an OS running on a VM. Libvirt can also be managed remotely. In the current setup OpenNebula controls the KVM hypervisor by interacting with the libvirt API on each node separately. Libvirt also installs a private bridge on the host so that the VMs running on it can easily be connected in a small private Network Address Translation (NAT) based network.

3.1.5 OpenNebula

OpenNebula calls libvirt functions for all the VM operations such as creating, destroying, shutting down, migrating, etc. The OpenNebula daemon running on the front-end must be able to connect to the nodes through password-less ssh. In the official installation instructions [44] it is suggested that all the nodes use only one private key copied to a shared location on the front-end. However, in the current setup it is more secure to have individual private keys for each machine, to only allow password-less connections from the front-end, and not to expose private keys in a public network. For a big number of nodes it is easier to have only one key. The OpenNebula user's home directory on the front-end is exported on the network, mounted on each node and made the home folder there as well. In this way all the hosts share the same ssh settings and have access to a centralized storage for all the running images. The latest stable version at the time the practical work of the project started was OpenNebula 2.2, so it is the one installed.

3.2 Web server software stack

3.2.1 Java

Java [45] as a language and platform is widely used for web applications and at the same time it is often used for scientific applications, and therefore there are many existing libraries such as the one chosen for the project, JLapack [46]. Though the Java Virtual Machine (JVM) might have some limitations, especially with its heap memory size, its performance is still sufficient for the current setup. Furthermore, there are a number of Java web containers to host web applications. Tomcat [47] is an open-source, widely used and robust Java Servlet container that powers many big-scale applications.

3.2.2 Load balancing/proxy server

As discussed in 2.8, a software load balancer is not the most scalable solution, so an efficient server that proxies the incoming traffic is a necessity. Nginx [48] [49] is a stable, secure and high-performing web server and proxy server. It can be easily configured
and its ability to handle requests asynchronously makes it very efficient in terms of memory and CPU usage [49]. In the current setup nginx acts as a load balancer.

3.3 Private cloud web server farm setup

Figure 3.3: Private cloud web server farm

The private cloud server farm shown in Figure 3.3 hosts the web application. Clients send requests to the load balancer, which only proxies them to the workers. The LB and the web servers are running VMs controlled entirely by OpenNebula: deployed, monitored and later deleted. The OpenNebula scheduler decides where to put each worker. However, the LB has to be placed on a machine that is connected to a public network. Currently both nodes are connected. Another option, in case no node can be accessed from the outside, is configuring a port on the front-end that forwards the traffic to the proxy. The web servers have to be visible to the LB.
3.3.1 Networking

Figure 3.4: Standard private cloud network configuration

The ideal configuration for a cloud would normally include a dedicated private network used only by the cloud, so that the administrator has complete control over it, as illustrated in Figure 3.4. Then, for example, several single-port or dual/quad-port NIC cards can be added to each node and connected to the network switch so the VMs do not share the same physical Ethernet interface. The front-end is the only machine connected to the public network, but it also has access to the private network for the cloud in order to reach each host. IP addresses should be assigned to the front-end, the nodes and the VMs. An example IP address organization is listed in Table 3.1. In such a configuration a major problem is where the LB is deployed. Several different approaches exist. The LB might be installed on the front-end, either as a VM or directly on the physical machine. It can also be deployed on any node with a port from the front-end re-directed to its internal IP, or an additional Ethernet interface can be added to one of the hosts so the LB VM can use it and connect directly to the public network.
Table 3.1: Theoretical IP addresses organization for private cloud (machine, address and network columns for the front-end, nodes and VMs)

A VM can only use networking through a virtualized NIC, which is the major performance bottleneck if there is intense data transfer in or out of a VM. Connecting a VM to a local or public network and making it accessible to others is achieved through Ethernet bridging [50] [51] [44]. For each physical host a virtual interface representing a bridge has to be added and logically attached to the physical network interface. Thereby the VMs running on the node are able to access the local network and the other machines are also able to see them. However, the traffic goes through the bridge, which is shared between all the VMs and the host OS. There is also another approach available for VM networking. Each machine needs to have access to the local network through User Networking [50] and thus gains access to a VPN server. As a consequence all the VMs, along with any other machine that is able to contact the server, can be organized together in a private network with OpenVPN [52], for example. However, this approach is not efficient and the encrypted VPN connection represents additional overhead to the network performance. It should be used only if the limitations caused by the decreased network bandwidth are not critical. Full control over the network was impossible for the current project and due to EPCC policies the machines for the cloud were deployed within a public network. In this way the hosts can be directly accessed from the Internet and they are not controlled by the LCFG [53] system used by the EPCC support. Having the nodes for the cloud in a public network causes several issues. First, the IPs cannot be assigned and used freely because they are public. Second, the gigabit network the machines are connected to is used by other people, so its performance can vary over time. Furthermore, the LB receives the incoming traffic and just a moment later redistributes the requests to the workers using the same network, which means sharing the same bandwidth. The actual private cloud web server farm used in this project includes the LB and 7 web servers, with 4 VMs on each node, as shown in Figure 3.5. The LB and three of the workers are deployed on node 2 and configured to use the private NAT-based network that libvirt provides (bridge virbr0) in order to avoid usage of public IPs. The LB has another virtual network interface that connects it to the public network through the
public bridge (br0), thus being accessible from the Internet. The web servers deployed on the front-end, however, require public IPs in order to receive requests from the LB.

Figure 3.5: The actual private cloud network configuration

Though connected in different ways, the network bandwidth between the LB and the workers running on node 1 and node 2 is similar (see B.3 and B.4). This is the maximum outgoing traffic through a bridge with the bridge-utils package provided by the OS, which means there is almost a 10x overhead in this direction. Obviously, because of these limitations, a specific mapping of the VMs over the hosts has to be done. However, in a standard configuration the only requirement might be for the LB location.

3.4 Hybrid cloud web server farm setup

OpenNebula provides functionality to control EC2 instances through the EC2 API tools. Instances can be easily launched and terminated from the private cloud in the same way the internal VMs are controlled - thus making the cloud hybrid. However, OpenNebula does not make full use of the EC2 command line tools. Furthermore, OpenNebula does not provide monitoring for the running EC2 instances. The current setup, shown in Figure 3.6, represents the standard configuration suggested in [54]. The running EC2 instances are accessible remotely, and from the LB in particular, so it can use them as workers and re-send requests to them in the same way it does for the local workers. However, the data sent across the Internet might be read by third parties, which could be a problem if the application serves internal organizational needs. If that is the case then setting up a VPN should be considered, but as discussed earlier the encrypted connections introduce additional overhead. In the current setup it is accepted that the data is not sensitive.
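As an illustration of how such EC2-backed workers can be described, the sketch below shows an OpenNebula-style VM template with an EC2 section. The AMI identifier, keypair, ports and driver names are placeholders based on the OpenNebula 2.x hybrid-cloud documentation as far as it is recalled here, and should be verified against the installed version.

    # ec2-worker.template - illustrative hybrid template (placeholder values)
    NAME   = "ec2-web-worker"
    CPU    = 1
    MEMORY = 1700

    # Attributes used when the scheduler places this VM on the EC2 "host"
    EC2 = [ AMI              = "ami-00000000",
            KEYPAIR          = "my-keypair",
            INSTANCETYPE     = "c1.medium",
            AUTHORIZED_PORTS = "22,8080" ]

    # The EC2 "host" itself is registered with the EC2 drivers, e.g.:
    #   onehost create ec2 im_ec2 vmm_ec2 tm_dummy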
The maximum network bandwidth from the LB to any running EC2 instance is also limited by the bridge, but in this case it can vary since it is dependent on the Internet connection.

Figure 3.6: Hybrid cloud web server farm. Reproduced from [54]

3.4.1 VM performance comparison

The main issue when involving VMs outside of the private infrastructure to cooperate with local resources is the fact that the external provider probably does not offer instances that exactly match the performance of the internal machines in terms of CPU, memory and network bandwidth. Consequently the first action to undertake when using the hybrid cloud is to establish which of the EC2 machines shown in Table 2.1 has productivity similar to a local web server. A significant difference in the performance of the internal and the external machines might introduce additional load imbalance. Micro instances included in the free tier were not tried since they are intended for less demanding applications, but may still consume significant compute cycles periodically and allow bursting CPU capacity when additional cycles are available [55]. Though significantly cheaper than the other instances, they are not predictable in terms of CPU and only provide 613MB of memory, which is not enough for the application. The machines that look similar to the local workers are the small instance and the High-CPU medium instance. In order to compare them with the local worker, the application was run on each of them and the average timings from 20 executions are shown in Figure 3.7.
Figure 3.7: Performance comparison between EC2 instances and local VM (execution time in ms against problem size for the local VM, m1.small and c1.medium). Single executions.

The small instance is far from the productivity of the local worker. The medium instance, though also slower, has 2 logical CPUs and, when using a thread pool to execute 10 tasks concurrently (Figure 3.8), outperforms the local web server, but the difference is not dramatic, so for the purposes of the hybrid cloud the High-CPU medium instance will be used. The results in Figure 3.8 are also average timings obtained from 20 executions.

Figure 3.8: Performance comparison between EC2 instances and local VM (execution time in s against problem size for the local VM, m1.small and c1.medium). Thread pool with size 10.
3.5 System of linear equations solver

3.5.1 The algorithm

The chosen algorithm consists of 2 separate operations - LU factorisation and partial pivoting. Their complexities are respectively O(N³) and O(N²). The data increases in an N² fashion [56]. Even though it is an unusually efficient algorithm [56], the increased amount of context switching caused by the virtualization should affect the performance. Though the problem is not particularly suitable for the web, it is simple enough to deploy and test, and represents a compute load similar to applications such as the image processing algorithms used in image hosting websites, mathematical services such as [57], etc. It is thus comparable to all O(N³) problems where the input data increases with N². Such applications are different not only because they require more time for processing, but also because they need much bigger input data than standard web applications. Figure 3.9 shows the increase of the data sent to the web server with the increase of the size of the input data. It includes the 2-dimensional N² array with the parameters and a 1-dimensional N array representing the Right Hand Side (RHS).

Figure 3.9: Size of data transferred in relation to problem size (kbytes against N)

3.5.2 The web application

The web application consists of a simple Java Servlet [58] which accepts POST HTTP data parameters sent from the clients in the same way files are sent, as sketched below. The input data is organized into arrays and passed to the JLapack methods that solve the system. After the calculations have finished, the Servlet sends a response to the client.
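A minimal sketch of a servlet of this kind is shown below. The parameter names, the serialization format and the in-line Gaussian-elimination solver are illustrative assumptions; the actual project passes the arrays to JLapack instead.

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Sketch only: parameter names ("n", "data") and the solver are placeholders;
    // the real application delegates the solve step to JLapack.
    public class SolverServlet extends HttpServlet {

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            int n = Integer.parseInt(req.getParameter("n"));
            String[] tokens = req.getParameter("data").split(",");

            // Rebuild the N x N coefficient matrix followed by the N-element RHS.
            double[][] a = new double[n][n];
            double[] b = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    a[i][j] = Double.parseDouble(tokens[i * n + j]);
                }
                b[i] = Double.parseDouble(tokens[n * n + i]);
            }

            double[] x = solve(a, b);   // LU-style elimination with partial pivoting

            StringBuilder out = new StringBuilder();
            for (double v : x) out.append(v).append(' ');
            resp.setContentType("text/plain");
            resp.getWriter().write(out.toString().trim());
        }

        // Gaussian elimination with partial pivoting; stands in for the JLapack call.
        private double[] solve(double[][] a, double[] b) {
            int n = b.length;
            for (int k = 0; k < n; k++) {
                int p = k;                               // pick the pivot row
                for (int i = k + 1; i < n; i++)
                    if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
                double[] tr = a[k]; a[k] = a[p]; a[p] = tr;
                double tb = b[k]; b[k] = b[p]; b[p] = tb;
                for (int i = k + 1; i < n; i++) {        // eliminate below the pivot
                    double f = a[i][k] / a[k][k];
                    for (int j = k; j < n; j++) a[i][j] -= f * a[k][j];
                    b[i] -= f * b[k];
                }
            }
            double[] x = new double[n];                  // back substitution
            for (int i = n - 1; i >= 0; i--) {
                double s = b[i];
                for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
                x[i] = s / a[i][i];
            }
            return x;
        }
    }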
The application is built [59] and the generated war file is deployed into the Tomcat container.

3.6 Running the cloud

3.6.1 Preparations

Using the commands OpenNebula provides, several actions were undertaken:

- 3 hosts were added and configured so OpenNebula can access them, deploy and manage VMs. These hosts are node 1 (i.e. the front-end), node 2, and a host that represents the EC2 services.
- 2 networks were added: a private and a public one. The private network is used for the NAT-based communications that take place on node 2, including the LB and workers. The public network connects the LB and the web servers on node 1.
- 1 image was added so the VMs can use it. It is configured to start the application on boot. The same image is used for the LB, where manual adjustments are required for the list of the web servers and for reloading the configuration of the proxy. The image includes scripts provided by OpenNebula for configuring the network interfaces based on their Media Access Control (MAC) addresses so that each VM can start with networking.
- Several templates were written so they can be used to start VMs both locally and in EC2. Templates are simple text files that OpenNebula uses for describing images, networks and VMs.

Using the AWS Management Console a few additional steps were completed:

- An Amazon Machine Image (AMI) was prepared with functionality similar to that of the local image and stored in EBS so it can be used to start new instances. Since EBS is a paid service, a snapshot of a running instance could have been made and downloaded locally, but because of the limited time duration of the hybrid cloud experiments, and for simplicity, the AMI was kept on EBS.
- The default security group was altered so it accepts connections on the required ports.

3.6.2 Resources allocation

Each host has 8GB of memory, with 0.5GB of that reserved for the host itself. The LB is configured to use 1.5GB and the workers 1.87GB each, so all the workers have the same resources available. Each host has a 4-core CPU, so the most natural configuration is 1 core per VM to avoid some of them using the same core. Unfortunately, at least one of the workers has to share a core with the host.
A user might expect a cloud management platform to actually pin VMs to particular cores of the underlying hardware. However, the only reason OpenNebula requires the CPU parameter seems to be to check that the sum of the CPU power used by all the VMs is less than what is available. For example, if the host has 2 cores and there are 2 VMs running, each submitted with 0.7 CPU, the system will not allow another one with 0.7, but a VM that requires 0.6 will make it through. In other words, OpenNebula does not map CPU cores to running VMs. However, KVM represents running VMs as Linux processes, which allows the OS scheduler to schedule them [60], but they can also be manually controlled with virsh [61]. According to the statistics the top command returns, the physical CPU utilization during the experiments was close to the maximum, which means that the virtualization software and the OS perform a reasonable mapping of the physical resources to VMs, so the cloud platform does not need to do it explicitly. However, cloud management systems might benefit from having a feature that allows more precise control over VM-to-CPU mapping.

3.7 Running the application

3.7.1 Load Balancer

2 different approaches were attempted. The first one - the LB running on a VM - was already described in section 3.3. This approach exhibits network bandwidth limitations caused by the bridge. The bandwidth between a physical machine on the network representing a client and the LB VM varies between 250Mbits/sec and 600Mbits/sec (see B.1), and the bandwidth from the LB to the workers is also limited (see B.3 and B.4). In order to partially overcome this issue a second approach was tried: installing the LB on the physical host. By eliminating the data transfer from a machine that uses a virtualized network device the bandwidths are better: more than 900Mbits/sec to the LB (see B.2), around 350Mbits/sec from the LB to the public workers on node 1 (see B.5), and the bandwidth from the LB to the local workers using the NAT-based network is given in B.6. However, the LB deployed on a physical host has one significant disadvantage: if the host fails, another machine has to be configured instead of only deploying a new Load Balancer VM (LB VM). Improving the LB VM networking could be achieved by using several NIC cards or one multi-port NIC card. The configuration of the nginx proxy (i.e., the LB) is trivial. It requires a list of servers that will handle the incoming traffic. The default module for load balancing includes Round-Robin (RR) and weighted RR algorithms [62]. Load balancing requests with varying sizes can be accomplished with the fair upstream module [63]. It knows the number of requests served by each server and re-distributes the incoming requests to the least loaded server. For the goals of the project using RR is sufficient. Adding new servers to the setup does not require stopping the proxy. The configuration is reloaded by sending a signal to the parent process, which kills the nginx worker and starts a new one.
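A sketch of an nginx configuration of this kind is shown below; the worker addresses and ports are placeholders, not the ones used in the project.

    # nginx.conf fragment - illustrative load-balancer configuration
    http {
        upstream workers {
            # Round-Robin is the default distribution strategy
            server 192.168.122.11:8080;   # worker on the libvirt NAT network (virbr0 defaults to 192.168.122.0/24)
            server 192.168.122.12:8080;
            server 198.51.100.21:8080;    # placeholder public address for a worker on node 1
        }
        server {
            listen 80;
            location / {
                proxy_pass http://workers;
            }
        }
    }

After adding or removing a server line, the configuration can be reloaded without stopping the proxy, for example by sending a HUP signal to the master process (kill -HUP $(cat /var/run/nginx.pid)).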
3.7.2 Web server

Each worker has the application war file deployed in Tomcat and starting on boot. A major influence on the performance of the application and the web servers is the Java heap memory size. By default it is clearly insufficient for memory demanding applications, so it has to be increased. The more memory that is allocated for the heap, the bigger the capacity of the application. Usually web servers process a large number of short requests. However, the current application requires more time to generate a response, and if many threads are working concurrently on the server this may worsen the performance.

Figure 3.10: Throughput generated by 1 web server for N = 50, 150, 250 for different numbers of running threads (throughput in req/s against the size of the Tomcat thread pool)

Tomcat works with a thread pool that handles the requests. Figure 3.10 illustrates the throughput (number of requests served per second) for the application running on 1 web server. On an OS installed directly on the hardware, the increased context switching caused by running many threads would be expected to have a significant impact on performance. However, Figure 3.10 shows that the throughput is nearly constant for any number of threads and for any problem size, which suggests that the context switching impact is negligible compared with the virtualization overhead. The average time to solve each request is nevertheless longer with a bigger number of threads, and in a real application such an effect would be undesirable. Furthermore, if many threads run concurrently and they solve big problem sizes, the setup will quickly reach the maximum Java heap size and run out of memory. So, in order to keep the capacity of the servers high and the average time to solve each request reasonable, a small thread pool size should be chosen. All the other experiments in the project have been performed with 10 threads working on each web server.
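Both settings discussed above are ordinary Tomcat configuration; a minimal sketch might look as follows. The heap sizes shown are illustrative assumptions rather than the exact values used in the project.

    # $CATALINA_HOME/bin/setenv.sh - enlarge the Java heap available to the web application
    export CATALINA_OPTS="-Xms256m -Xmx1024m"

    <!-- $CATALINA_HOME/conf/server.xml - limit the request-handling thread pool -->
    <Connector port="8080" protocol="HTTP/1.1" maxThreads="10" />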
3.8 Benchmarking

3.8.1 Benchmarking tool

As described in section 2.9, the existing web benchmarks are not suitable for the current setup, so another approach has to be considered. In order to simulate realistic load for an application that is time consuming, the number of http connections does not need to be significant. Several hundred simultaneous connections for the smallest problem size are completely sufficient to establish the real throughput of the web application. Hence the benchmarking tool can be run from a single location, reaching the maximum possible throughput without being limited by the performance of the hosting machine.

The benchmarking application relies on Java multithreading mechanisms. A number of threads are started through a thread pool and synchronised just before every thread establishes an HTTP connection; then they wait for the response. After it is received they are synchronised again so that the correct timings can be measured, and afterwards their execution finishes (a simplified sketch of this scheme is shown at the end of this section). The barrier used was provided in the OOP in HPC course. Before the thread pool is started, data is generated and built into a string which is later sent with the POST method to the web application. The data generation algorithm provided by EPCC in [56] allows easy verification of the results of the calculations. The design of the tool allows each thread to perform a number of requests, and for every request each thread can provide a different problem size. However, for benchmarking the current setup only 1 request is executed by each thread, with the same problem size for the whole thread pool. The motivation behind this is to determine the best possible performance of the setup, which is clearly reached when the problem sizes solved concurrently are equal.

3.8.2 Benchmarking

Different locations were considered to host the tool. However, in order to make use of all the network bandwidth available to the LB and to produce realistic results, a machine within the university network was chosen. All the experimental data was collected by executing the benchmarking tool from ph-cplab. The bandwidth between these machines and the cloud nodes is very close to gigabit (see B.2). If the LB is deployed on a VM the bandwidth can vary (see B.1) because of the virtualization of the network interface. Given that the network bandwidth varies and other people might be using it at the time, average timings would not be comparable. Instead, minimum timings (i.e. maximum throughputs) were collected from two groups of 15 executions at different moments, in order to reduce the chance of other students using the ph-cplab network intensively.
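The following is a minimal sketch of the benchmarking idea described in section 3.8.1, not the project's actual tool: a fixed-size pool of threads is released through a barrier, each thread sends one POST request, and the batch is timed once the last response arrives. The URL, payload and thread count are hypothetical placeholders.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.CyclicBarrier;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class LoadBenchmark {
        public static void main(String[] args) throws Exception {
            final int threads = 100;                        // number of concurrent requests (assumption)
            final String payload = "matrix=...";            // POST body with the generated input data
            final URL target = new URL("http://lb.example:80/solver");  // hypothetical LB address

            final CyclicBarrier start = new CyclicBarrier(threads + 1); // +1 for the timing thread
            final CountDownLatch done = new CountDownLatch(threads);

            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                pool.submit(() -> {
                    try {
                        start.await();                      // all threads fire together
                        HttpURLConnection con = (HttpURLConnection) target.openConnection();
                        con.setDoOutput(true);
                        con.setRequestMethod("POST");
                        try (OutputStream out = con.getOutputStream()) {
                            out.write(payload.getBytes("UTF-8"));
                        }
                        con.getResponseCode();              // block until the response arrives
                                                            // (a full client would also read the body)
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                });
            }

            start.await();                                  // release the threads
            long t0 = System.nanoTime();
            done.await();                                   // wait for the last response
            long t1 = System.nanoTime();

            double seconds = (t1 - t0) / 1e9;
            System.out.printf("%d requests in %.2f s -> %.2f req/s%n",
                              threads, seconds, threads / seconds);
            pool.shutdown();
        }
    }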
Chapter 4 Results and Analysis

4.1 Results

The results of the experiments will be illustrated with the following 3 types of figures:

- Throughput graphs, in order to show that the throughput of the application increases with an increasing number of virtual web servers (i.e. VMs).
- Throughput speedup, which shows the rate at which the throughput increases and which, for bigger problem sizes, might not be clear from the throughput figure alone. It represents the ratio between the current throughput and the throughput for 1 web server (defined below).
- Network bandwidth: since the network connections have a significant impact on performance, the bandwidths between all communicating points in the presented setup will be illustrated.

All the collected throughput data is based on the maximum throughputs (i.e. minimum timings) from 2 groups of 15 independent executions of concurrent http requests, as explained in section 3.8.2. For the biggest problem size test the smallest number of threads generating load was used. For the application that simulates heavier computational load, fewer http requests were used, since the web servers are expected to be busier and the throughput can be calculated with a smaller number of connections.
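For reference, the throughput speedup used throughout this chapter is simply the ratio described above; written in LaTeX notation,

    S(n) = \frac{T(n)}{T(1)}

where T(n) denotes the maximum measured throughput (in requests per second) with n web servers; ideal (linear) speedup corresponds to S(n) = n.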
4.2 Private cloud web server farm

4.2.1 LB deployed on a VM

Figure 4.1: Throughput generated by the private cloud web server farm for different problem sizes (N = 100, 150, 200, 250) with the LB deployed on a VM (throughput in req/s against the number of web servers)

Figure 4.1 shows the throughput of the setup: bigger problem sizes yield a lower number of requests served per second, since they require more time for data transfer and calculations. The application cannot effectively use the available workers in the private cloud when using the VM LB. The first 3 web servers (i.e. number 1, 2 and 3 on Figure 4.1) are local to the node that hosts the LB and they use the private bridge (virbr0), which is also a gateway for them. The rest of the workers make use of the public bridge (br0) available on each host. The network bandwidths shown on Figure 4.2 limit the scalability of the problem. The LB gets only half of the available bandwidth of the gigabit network when contacted from the client in the local network. But the outgoing connection from the LB is what really restricts the setup: the bandwidths through both bridges provide only around 110-125Mbits/sec between the LB and a web server. Furthermore, the LB shares the same bandwidth for both receiving requests from the clients and re-distributing them to the workers on node 2 (i.e. number 4, 5, 6 and 7 on Figure 4.1). The virtual gateway (i.e. the private bridge virbr0) cannot handle the intensive incoming and outgoing traffic effectively at the same time, and when all 3 local workers are involved in the farm the throughput of the setup drops, especially for bigger problem sizes where the input data is bigger. However, the physical network provides more efficient connections and the performance slightly improves when involving the other web servers.
Figure 4.2: Network bandwidths (Mbits/sec) for the private cloud web server farm with the VM LB (bars: 1. to the LB; 2. from the LB to a local worker; 3. from the LB to another worker; 4. between physical machines in the local network)

Considering the nature of the problem, the rate at which the throughput increases should be close to linear speedup, but it is only acceptable for 2 web servers and only for the biggest problem size (Figure 4.3). The data sent to the LB and then re-sent over the same network connection represents the part of the problem that limits the speedup of the throughput and therefore cannot be improved. The poor speedup with an increasing number of web servers also means that the CPU utilization in each of them deteriorates, since most of the time is spent on data transfers, especially for the smaller problem sizes.

Figure 4.3: Throughput speedup for the private cloud web server farm with the VM LB (speedup against the number of web servers for N = 100, 150, 200, 250)
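The limiting effect of the non-parallelisable data transfer to the LB can be expressed with a simple Amdahl-style bound, added here only for clarity; the fraction f was not measured in the project:

    S(n) \le \frac{1}{f + (1 - f)/n}

where f is the fraction of the request time spent transferring data over the shared client-to-LB link and n is the number of web servers. As n grows, the speedup saturates at roughly 1/f, which is consistent with the behaviour seen in Figure 4.3.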
4.2.2 LB deployed on a physical machine

In order to improve the bandwidth and to diminish the impact caused by the virtualized networking devices, the LB was moved to one of the physical hosts, increasing the speeds considerably as shown on Figure 4.4. The transparent columns represent the bandwidths from the VM LB setup. The incoming traffic from the clients to the LB is increased to the maximum available from the Gigabit connections. All the workers are now able to receive data at higher rates: the connection to the local workers is 6 times faster and to the web servers connected over the physical network up to 3 times faster.

Figure 4.4: Network bandwidths (Mbits/sec) for the private cloud web server farm with the LB on a physical machine (bars: 1. to the LB (VM LB); 2. to the LB; 3. from the LB to a local worker (VM LB); 4. from the LB to a local worker; 5. from the LB to another worker (VM LB); 6. from the LB to another worker; 7. between physical machines in the local network)

After these modifications the throughputs are significantly better, as illustrated on Figure 4.5.
Figure 4.5: Throughput (req/s) generated by the private cloud web server farm for different problem sizes (N = 100, 150, 200, 250) with the LB deployed on a physical machine

Figure 4.6: Throughput speedup for the private cloud web server farm with the LB deployed on a physical machine (N = 100, 150, 200, 250)

Figure 4.6 shows the speedups of the throughput for the different problem sizes. The speedups for all problem sizes are comparable. This means that the network bandwidth capacity is now big enough to handle bigger problem sizes without affecting the throughput and its speedup. The speedup is not linear, because the time for transferring the data to the LB is the same for any number of web servers and cannot be improved, but the fact that the rate of increase is close to the ideal shows much better CPU utilization. However, the speedup appears to saturate at 6-7 web servers, having reached its maximum. Therefore benchmarking this setup in the hybrid cloud is not reasonable.
Though this setup has much better performance than the LB deployed as a VM, there are still critical limitations. Web servers connected to the physical network and waiting for requests from the LB share the same bandwidth, because they share the single Ethernet device of the machine that hosts them. In addition, the LB traffic is also shared between the clients and some of the workers. And, as discussed in section 3.7.1, if the LB fails a new machine has to be configured, instead of only submitting a new, already prepared VM on a working host, which would clearly provide minimum downtime. Clearly this kind of problem requires careful tuning of the network configuration and complete control over it.

Even though the chosen application is CPU consuming, it also requires large input data whose transfer proves to be a big challenge. However, in order to continue the investigation, the computational load was raised by repeating the calculations 20 times for each request. Increasing the computational time in this way decreased the impact of the time necessary to send the data to the server and of the shortcomings of the network organization. Similar, more CPU demanding problems are object recognition, video processing, document conversion, etc. For the chosen algorithm the increased load could represent repeating the calculations for a more precise solution, or solving the system of equations for different initial conditions (i.e. different RHSs).

4.2.3 Increased load - LB deployed on a VM

The setup is the same as for the normal application, but the tests showed better throughput speedup, as can be seen on Figure 4.7 and Figure 4.8.

Figure 4.7: Throughput (req/s) generated by the private cloud web server farm for different problem sizes (N = 75, 125, 175, 225) with the LB deployed as a VM for the increased load application
Figure 4.8: Throughput speedup for the private cloud web server farm with the VM LB for the increased load application (N = 75, 125, 175, 225)

The throughput and the speedup increase steadily for any number of web servers and for every problem size that was tried within the private cloud, due to the increased computational load. The speedup for the smaller problem sizes is still influenced by the slow network connections. This allows further experiments to be conducted by extending the cloud to a hybrid one and using computing power from EC2.

4.2.4 Increased load - LB deployed on a physical machine

Figure 4.9: Throughput (req/s) generated by the private cloud web server farm for different problem sizes (N = 75, 125, 175, 225) with the LB deployed on a physical machine for the increased load application
Figure 4.10: Throughput speedup for the private cloud web server farm with the LB deployed on a physical machine for the increased load application (N = 75, 125, 175, 225)

Running the increased load application with the LB on a physical machine is also beneficial. The throughput and speedup graphs are almost straight lines, with the speedup very close to linear, as shown on Figure 4.9 and Figure 4.10. Such kinds of problems obviously do not require a perfect network setup, but overcoming the disadvantages mentioned before could improve the performance even more. Those results provide a good starting point for the cloud bursting experiments.

4.3 Hybrid cloud web server farm

The hybrid cloud experiments were conducted using the increased load setup with the LB deployed on the physical machine as a base, since it proved to be the most scalable one. As described earlier, the High-CPU medium instance was chosen for the hybrid cloud experiments. The network bandwidth between the internal and the external machines should in general be smaller than the bandwidth within the private network, but that is not the case for the current setup, as shown on Figure 4.11. In terms of bandwidth there is no difference between the LB sending data to a non-local worker that shares a NIC card with other VMs and the LB sending data to an EC2 instance. The NAT-based network has good performance when the traffic is only incoming.
Figure 4.11: Network bandwidths (Mbits/sec) for the hybrid cloud web server farm with the LB on a physical machine (bars: 1. to the LB; 2. from the LB to a local worker; 3. from the LB to another worker; 4. from the LB to an EC2 medium instance)

As expected, the throughput still increases at a very good rate, as illustrated on Figure 4.12. Consequently, the speedup on Figure 4.13 is also increasing and is still close to linear. It is slightly better for bigger problem sizes, since they take more time for calculations and the non-parallelizable part (the data transfer) is less significant. The hybrid cloud setup introduces a small load imbalance, since the EC2 medium instances have slightly better performance and should finish the calculations before the local workers. However, this is hard to follow and control because the bandwidth varies (see B.8), so sometimes the imbalance is compensated and sometimes worsened. Therefore, for the balanced load generated by the benchmarking tool the proxy only needs the RR algorithm for re-distribution. If the bandwidth were constant and, for example, heterogeneous resources were used within the private cloud, a weighted RR [62] approach should be considered, based on the performance of each worker (see the sketch below).
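For illustration, the weighted variant mentioned above would only change the upstream block of the nginx configuration; a sketch with hypothetical addresses and weights:

    upstream workers {
        server 10.0.0.21:8080 weight=2;   # a faster worker receives twice as many requests
        server 192.168.122.11:8080;       # default weight is 1
        server 192.168.122.12:8080;
    }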
Figure 4.12: Throughput (req/s) generated by the hybrid cloud web server farm for different problem sizes (N = 75, 125, 175, 225) with the LB deployed as a VM for the increased load application

Figure 4.13: Throughput speedup for the hybrid cloud web server farm with the LB deployed on a physical machine for the increased load application (N = 75, 125, 175, 225)

Even though the network has several disadvantages, the throughput still increases and the application scales very successfully. This also means that this group of problems is a good choice for pure public cloud solutions. Furthermore, AWS also offers cluster configurations with 10 Gigabit networks (see Table 2.1) that can be exploited if better I/O performance is required and the budget fits into the pricing scheme.
4.4 Cloud management platform role

4.4.1 Centralized control

Clearly the whole setup could be built without a cloud platform, and it would have the same performance and the same issues. VMs can be started directly from the virtualization software, with direct access to all of its functionality at the moment their instances are created. However, each hypervisor has to be controlled separately, and managing a big data centre with many nodes in this way would be a very time consuming task. OpenNebula provides a centralized control point that contacts each hypervisor without manual intervention. It can also manage different hypervisors together and it provides centralized storage for VM images. In addition, its scheduler decides which node will host the guest OS, choosing the least busy node.

4.4.2 Managing EC2 instances

OpenNebula also provides control over EC2 instances, so the cloud user can easily perform most of the basic operations provided by the EC2 API Tools. In this way the transition from private to hybrid, when required, needs minimum effort once OpenNebula is configured. Only a simple VM template that describes the EC2 instance to be run has to be provided so it can be used for VM submission. There are also limitations on the maximum number of VMs that can be launched for small, large and extra large instances. Specifying a maximum number for medium instances is not documented, and if it is added to the configuration in the same style the limitation appears not to work.

4.4.3 User management

OpenNebula also provides user management control. By using it, resources can easily be assigned to different users so they will be able to manage their own instances and use them for different purposes.

4.4.4 Cloud API

The provisioning of resources can be entirely automated by using the OpenNebula OCCI API specification [11], which is based on the Open Cloud Computing Interface [64], or the OpenNebula EC2 API [12], which allows the platform to be used like EC2. Such dynamic resource allocation, if implemented properly, would eliminate human interaction with the platform when scaling. In this way the provisioning of the setup would be improved by allowing a user to scale out or scale in when required, for example as a response to changes in the required computational load.

4.4.5 Monitoring

The main issue with the platform during the experiments was monitoring the running VMs. Though OpenNebula provides some statistics, they are either not reliable or not dynamic.
Figure 4.14, Figure 4.16 and Figure 4.17 were captured for the best performing setup during the execution of the application on all the local VMs. Figure 4.15 and Figure 4.17 are screenshots of Virtual Machine Manager [65], a widely used graphical tool for virtualization control. At that moment the CPU usage is expected to be high.

Figure 4.14: Monitoring OpenNebula VMs through its command line tools

Figure 4.14 shows the result of listing all VMs that OpenNebula is running, using its command line interface. The listing has 2 fields that are designed to provide information about the resources of a VM, according to the Command Line Interface (CLI) documentation [66]. The memory information, though not of particular interest, is static and displays only what was specified in the template file before the VM submission. The data in the CPU column should present the percentage of CPU used by the VM [66], but it is also constant over long intervals, although it does change. Compared to Figure 4.17, where the CPU usage of node 1 is almost at its maximum at the same moment, it is clear that this is not real-time information. This might be a problem caused by the virtualization software and the information it provides, or the purpose of the field may be to display the average CPU utilization for the entire period the VM has been running, which seems possible based on the numbers on the figure. Monitoring VMs through Virtual Machine Manager, shown on Figure 4.15, is more reliable and the CPU usage was followed in this way during the experiments.
Figure 4.15: Monitoring VMs through Virtual Machine Manager

The OpenNebula Sunstone GUI for managing the platform uses the same numbers for monitoring as the CLI. For the hosts, as shown on Figure 4.16, the data is reliable, and after refreshing it is comparable with the statistics presented by Virtual Machine Manager illustrated on Figure 4.17. The interval the OpenNebula daemon waits between monitor executions can be changed; however, if it is too short it might cause overheads. The number of simultaneous monitor operations OpenNebula performs is set to 5, since monitoring has proved to be expensive in large-scale environments.

Figure 4.16: Monitoring OpenNebula hosts through OpenNebula Sunstone

Figure 4.17: Monitoring hosts through Virtual Machine Manager
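For reference, command-line checks of this kind combine the OpenNebula CLI with the ordinary libvirt tools on the hosting node; a hedged sketch, where the domain name is a hypothetical example of the usual one-<id> naming:

    # list the VMs OpenNebula believes it is running, with their reported CPU/memory fields
    onevm list

    # cross-check on the hosting node with libvirt, which reports the real domain state
    virsh list --all
    virsh dominfo one-12        # per-domain details, including accumulated CPU time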
Chapter 5 Conclusions and Future Work

5.1 Conclusions

This project investigated the usage of a private cloud and its suitability to host compute intensive web applications. Such compute intensive web applications could represent image and video processing, document conversion, mathematical services, etc. In addition, it provided results for cloud bursting and scaling the local resources out, thus transforming the cloud into a hybrid. Several steps were performed to accomplish this:

- Two physical machines were used to create a private cloud, one of them sharing 2 functions: front-end and hosting VMs.
- A cloud management platform was installed and configured.
- A simple compute intensive web application was developed.
- A suitable instance offered by EC2 was determined in order to resemble the performance of the local VMs.
- The private cloud was configured to make use of external resources.
- A simple benchmarking scheme was created.
- Tests were performed to prove that the web application benefits from involving a bigger number of web servers within the private and the hybrid cloud.

The cloud management platform provides a centralized point for managing hosts with virtualization enabled. It also controls the running virtual machines through the virtual machine managers, organizing them in virtual networks and thus allowing convenient control over farms of virtual servers. A virtual machine can be started within several minutes, with the platform automatically choosing the most suitable node to host the new guest OS, which for a large scale environment would be a significant facilitation. Setting the cloud platform to use Amazon EC2 requires several simple configurations. Once they are accomplished, the private cloud can be transformed into a hybrid cloud with minimum effort every time additional computational resources are required.

Deploying and managing a cloud effectively when good performance is required is hard when the cloud administrator does not have full control over the underlying hardware components. The confined freedom to change the network configuration influenced the
results and restricted the speedup of the throughput with the increasing number of web servers for the original application. The main limiting factors were that the LB used the same connection to receive from the clients and to re-distribute to the web servers, that the web servers shared a NIC card along with the host, and the fact that the virtualization of networking devices causes overheads. The effect of the last issue was lessened by moving the proxy server from a VM to one of the physical nodes of the cloud. A standard network configuration that does not present the outlined disadvantages was also discussed.

The initial choice of application, where the size of the input data is N^2 and the complexity of the algorithm is O(N^3), appeared to be well suited for the setup, but the network bandwidths caused by the non-standard configuration of the network limited its performance. However, after the load of the application was raised 20 times to represent more CPU demanding problems, the network impact decreased and the speedup of the throughput was brought close to linear. The speedup kept increasing at good rates for the hybrid cloud web server farm tests. In some web applications the processing could be much heavier (e.g. multimedia processing in Animoto), thus providing good speedup of the throughput for very big server farms. Therefore compute intensive web applications in general are well suited for both private and hybrid environments.

The techniques used in this project could be applied to existing resources, such as machines in the training rooms in the University of Edinburgh, for building a private cloud in order to provide, for example, on demand access to VMs to satisfy relatively short term requirements for computational power. Every Linux machine, apart from the front-end, needs several software packages and configurations in order to become a node of the cloud. However, there are a number of networking performance limitations, such as sharing NIC cards, that have to be considered if good I/O performance is required.

OpenNebula itself appears to be very easy to use and robust. It avoids vendor lock-in and supports the most popular hypervisors. Though not thoroughly documented, it has an active community that is eager to help with problems and provides good support for the increasing number of users. It can easily be configured to use EC2, and it offers an API that provides the same functionality as the EC2 Query API, along with an OCCI API. Though the version used for the project (OpenNebula 2.2) has some confusing VM monitoring statistics, it is evolving rapidly, and before the end of the current project a new beta version (OpenNebula 3) was released and announced to have detailed monitoring statistics as a new feature.

5.2 Future work

The mechanisms OpenNebula provides [11] [12] for external dynamic control will give an application the ability to automatically scale resources based on its requirements. In this way it will be able to react very quickly to any increase in the load and manage all the requests in a reasonable time. Different approaches could be followed for such provisioning: it could either be based on the CPU/memory usage of each VM or
based on the problem sizes of the requests handled by the workers. It will be necessary to replace the proxy server with a queue service such as Amazon SQS, or something similar that holds the requests and knows exactly the size of the queue, in order to take decisions about launching or shutting down VMs (a sketch of this idea is given below). OpenNebula will surely add many new features to its functionality in the coming years, and the requirements of its growing community will grow even further. These mechanisms and new features will provide a solid ground for further research into developing private and hybrid clouds for academic and business environments.
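As a rough illustration of the provisioning loop suggested above (purely hypothetical: the queue-length helper, thresholds, template name and VM selection are placeholders; only the onevm commands are standard OpenNebula CLI):

    # naive scale-out/scale-in loop driven by the length of a request queue (sketch only)
    while true; do
        PENDING=$(queue_length)          # hypothetical helper returning the number of queued requests
        if [ "$PENDING" -gt 100 ]; then
            onevm create worker.one      # start another worker from the prepared OpenNebula template
        elif [ "$PENDING" -lt 10 ]; then
            onevm shutdown "$IDLE_VM_ID" # retire an idle worker; choosing which one is omitted here
        fi
        sleep 60
    done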
Appendix A Experimental Data

This appendix contains the maximum throughputs generated from the benchmarking code and used for the figures in Chapter 4.

A.1 Standard Load Application - VM LB

Table 5.1: Standard load application throughputs with the LB on a VM (rows: number of web servers; columns: problem sizes N = 100, 150, 200, 250). Private cloud server farm.

A.2 Standard Load Application - Physical Machine LB

Table 5.2: Standard load application throughputs with the LB on a physical machine (rows: number of web servers; columns: problem sizes N = 100, 150, 200, 250). Private cloud server farm.

A.3 Increased Load Application - VM LB

Table 5.3: Increased load application throughputs with the LB on a VM (rows: number of web servers, including the EC2-hosted ones; columns: problem sizes N = 75, 125, 175, 225). Hybrid cloud server farm.

A.4 Increased Load Application - Physical Machine LB

Table 5.4: Increased load application throughputs with the LB on a physical machine (rows: number of web servers, including the EC2-hosted ones; columns: problem sizes N = 75, 125, 175, 225). Hybrid cloud server farm.
Appendix B Results of iperf tests

The bandwidth between the LB and the workers placed in different locations is measured with iperf. Every test runs for 300 seconds and reports results every 10 seconds. The purpose of listing these results in detail is to show the behaviour of the network, since it appeared to have a significant impact on the performance of the examined scenarios.
58 B.1 F rom ph-cplab to V M LB -bash-3.2$ iperf -c t 300 -i Client connecting to , TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 300 MBytes 252 Mbits/sec [ 3] sec 352 MBytes 295 Mbits/sec [ 3] sec 397 MBytes 333 Mbits/sec [ 3] sec 366 MBytes 307 Mbits/sec [ 3] sec 454 MBytes 381 Mbits/sec [ 3] sec 408 MBytes 343 Mbits/sec [ 3] sec 402 MBytes 337 Mbits/sec [ 3] sec 370 MBytes 310 Mbits/sec [ 3] sec 384 MBytes 322 Mbits/sec [ 3] sec 419 MBytes 352 Mbits/sec [ 3] sec 438 MBytes 368 Mbits/sec [ 3] sec 455 MBytes 382 Mbits/sec [ 3] sec 608 MBytes 510 Mbits/sec [ 3] sec 590 MBytes 495 Mbits/sec [ 3] sec 676 MBytes 567 Mbits/sec [ 3] sec 554 MBytes 465 Mbits/sec [ 3] sec 520 MBytes 436 Mbits/sec [ 3] sec 498 MBytes 418 Mbits/sec [ 3] sec 463 MBytes 388 Mbits/sec [ 3] sec 660 MBytes 553 Mbits/sec [ 3] sec 730 MBytes 613 Mbits/sec [ 3] sec 545 MBytes 457 Mbits/sec [ 3] sec 574 MBytes 481 Mbits/sec [ 3] sec 630 MBytes 528 Mbits/sec [ 3] sec 566 MBytes 475 Mbits/sec [ 3] sec 604 MBytes 506 Mbits/sec [ 3] sec 603 MBytes 506 Mbits/sec [ 3] sec 648 MBytes 543 Mbits/sec [ 3] sec 578 MBytes 484 Mbits/sec [ 3] sec 607 MBytes 510 Mbits/sec [ 3] sec 15.0 GBytes 430 Mbits/sec 47
59 B.2 F rom ph-cplab to physical machine L B -bash-3.2$ iperf -c t 300 -i Client connecting to , TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 1.09 GBytes 934 Mbits/sec [ 3] sec 1.07 GBytes 922 Mbits/sec [ 3] sec 1.09 GBytes 933 Mbits/sec [ 3] sec 1.08 GBytes 929 Mbits/sec [ 3] sec 1.05 GBytes 905 Mbits/sec [ 3] sec 1.02 GBytes 880 Mbits/sec [ 3] sec 1.08 GBytes 932 Mbits/sec [ 3] sec 1.09 GBytes 934 Mbits/sec [ 3] sec 1.09 GBytes 936 Mbits/sec [ 3] sec 1.08 GBytes 932 Mbits/sec [ 3] sec 1.09 GBytes 935 Mbits/sec [ 3] sec 1.09 GBytes 932 Mbits/sec [ 3] sec 1.09 GBytes 932 Mbits/sec [ 3] sec 1.08 GBytes 930 Mbits/sec [ 3] sec 1.09 GBytes 932 Mbits/sec [ 3] sec 1.09 GBytes 935 Mbits/sec [ 3] sec 1.09 GBytes 934 Mbits/sec [ 3] sec 1.09 GBytes 934 Mbits/sec [ 3] sec 1.09 GBytes 933 Mbits/sec [ 3] sec 1.08 GBytes 930 Mbits/sec [ 3] sec 1.09 GBytes 932 Mbits/sec [ 3] sec 1.09 GBytes 936 Mbits/sec [ 3] sec 1.08 GBytes 929 Mbits/sec [ 3] sec 1.09 GBytes 934 Mbits/sec [ 3] sec 1.09 GBytes 933 Mbits/sec [ 3] sec 1.08 GBytes 929 Mbits/sec [ 3] sec 1.09 GBytes 932 Mbits/sec [ 3] sec 1.09 GBytes 936 Mbits/sec [ 3] sec 1.09 GBytes 933 Mbits/sec [ 3] sec 1.09 GBytes 933 Mbits/sec [ 3] sec 32.5 GBytes 930 Mbits/sec 48
60 B.3 To local worker (N A T-based network virbr0) from L B V M [sasho@t3400msc2vm ~]# iperf -c t 300 -i Client connecting to , TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 134 MBytes 113 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 134 MBytes 113 Mbits/sec [ 3] sec 132 MBytes 110 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 134 MBytes 112 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 135 MBytes 113 Mbits/sec [ 3] sec 133 MBytes 111 Mbits/sec [ 3] sec 133 MBytes 111 Mbits/sec [ 3] sec 134 MBytes 113 Mbits/sec [ 3] sec 138 MBytes 116 Mbits/sec [ 3] sec 134 MBytes 113 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 133 MBytes 112 Mbits/sec [ 3] sec 133 MBytes 112 Mbits/sec [ 3] sec 133 MBytes 111 Mbits/sec [ 3] sec 135 MBytes 113 Mbits/sec [ 3] sec 134 MBytes 112 Mbits/sec [ 3] sec 141 MBytes 118 Mbits/sec [ 3] sec 140 MBytes 117 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 141 MBytes 119 Mbits/sec [ 3] sec 134 MBytes 112 Mbits/sec [ 3] sec 140 MBytes 117 Mbits/sec [ 3] sec 134 MBytes 112 Mbits/sec [ 3] sec 140 MBytes 117 Mbits/sec [ 3] sec 132 MBytes 111 Mbits/sec [ 3] sec 132 MBytes 111 Mbits/sec [ 3] sec 3.96 GBytes 113 Mbits/sec 49
61 B.4 To public worker (physical network br0) from L B V M [sasho@t3400msc2vm ~]# iperf -c t 300 -i Client connecting to , TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 156 MBytes 131 Mbits/sec [ 3] sec 148 MBytes 124 Mbits/sec [ 3] sec 150 MBytes 125 Mbits/sec [ 3] sec 141 MBytes 119 Mbits/sec [ 3] sec 144 MBytes 121 Mbits/sec [ 3] sec 142 MBytes 119 Mbits/sec [ 3] sec 144 MBytes 121 Mbits/sec [ 3] sec 144 MBytes 120 Mbits/sec [ 3] sec 151 MBytes 127 Mbits/sec [ 3] sec 150 MBytes 126 Mbits/sec [ 3] sec 148 MBytes 124 Mbits/sec [ 3] sec 150 MBytes 126 Mbits/sec [ 3] sec 149 MBytes 125 Mbits/sec [ 3] sec 151 MBytes 127 Mbits/sec [ 3] sec 144 MBytes 121 Mbits/sec [ 3] sec 139 MBytes 117 Mbits/sec [ 3] sec 151 MBytes 127 Mbits/sec [ 3] sec 144 MBytes 121 Mbits/sec [ 3] sec 144 MBytes 121 Mbits/sec [ 3] sec 147 MBytes 123 Mbits/sec [ 3] sec 145 MBytes 122 Mbits/sec [ 3] sec 150 MBytes 126 Mbits/sec [ 3] sec 151 MBytes 127 Mbits/sec [ 3] sec 144 MBytes 120 Mbits/sec [ 3] sec 146 MBytes 122 Mbits/sec [ 3] sec 143 MBytes 120 Mbits/sec [ 3] sec 143 MBytes 120 Mbits/sec [ 3] sec 145 MBytes 121 Mbits/sec [ 3] sec 142 MBytes 119 Mbits/sec [ 3] sec 144 MBytes 121 Mbits/sec [ 3] sec 4.29 GBytes 123 Mbits/sec 50
62 B.5 To public worker from physical machine L B -bash-4.1$ iperf -c t 300 -i Client connecting to , TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 416 MBytes 349 Mbits/sec [ 3] sec 408 MBytes 342 Mbits/sec [ 3] sec 415 MBytes 348 Mbits/sec [ 3] sec 409 MBytes 343 Mbits/sec [ 3] sec 408 MBytes 342 Mbits/sec [ 3] sec 412 MBytes 345 Mbits/sec [ 3] sec 416 MBytes 349 Mbits/sec [ 3] sec 411 MBytes 345 Mbits/sec [ 3] sec 405 MBytes 340 Mbits/sec [ 3] sec 419 MBytes 351 Mbits/sec [ 3] sec 412 MBytes 345 Mbits/sec [ 3] sec 417 MBytes 349 Mbits/sec [ 3] sec 438 MBytes 368 Mbits/sec [ 3] sec 403 MBytes 338 Mbits/sec [ 3] sec 402 MBytes 337 Mbits/sec [ 3] sec 408 MBytes 342 Mbits/sec [ 3] sec 411 MBytes 345 Mbits/sec [ 3] sec 406 MBytes 340 Mbits/sec [ 3] sec 405 MBytes 340 Mbits/sec [ 3] sec 414 MBytes 347 Mbits/sec [ 3] sec 409 MBytes 343 Mbits/sec [ 3] sec 400 MBytes 336 Mbits/sec [ 3] sec 411 MBytes 345 Mbits/sec [ 3] sec 407 MBytes 342 Mbits/sec [ 3] sec 424 MBytes 356 Mbits/sec [ 3] sec 410 MBytes 344 Mbits/sec [ 3] sec 407 MBytes 341 Mbits/sec [ 3] sec 413 MBytes 347 Mbits/sec [ 3] sec 409 MBytes 343 Mbits/sec [ 3] sec 408 MBytes 342 Mbits/sec [ 3] sec 12.0 GBytes 345 Mbits/sec 51
63 B.6 To local worker (virbr0) from physical machine L B -bash-4.1$ iperf -c t 300 -i Client connecting to , TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 814 MBytes 683 Mbits/sec [ 3] sec 815 MBytes 684 Mbits/sec [ 3] sec 797 MBytes 669 Mbits/sec [ 3] sec 824 MBytes 691 Mbits/sec [ 3] sec 799 MBytes 670 Mbits/sec [ 3] sec 821 MBytes 689 Mbits/sec [ 3] sec 826 MBytes 693 Mbits/sec [ 3] sec 823 MBytes 690 Mbits/sec [ 3] sec 812 MBytes 681 Mbits/sec [ 3] sec 767 MBytes 644 Mbits/sec [ 3] sec 756 MBytes 635 Mbits/sec [ 3] sec 755 MBytes 633 Mbits/sec [ 3] sec 759 MBytes 637 Mbits/sec [ 3] sec 770 MBytes 646 Mbits/sec [ 3] sec 769 MBytes 645 Mbits/sec [ 3] sec 774 MBytes 649 Mbits/sec [ 3] sec 773 MBytes 649 Mbits/sec [ 3] sec 774 MBytes 649 Mbits/sec [ 3] sec 772 MBytes 648 Mbits/sec [ 3] sec 767 MBytes 644 Mbits/sec [ 3] sec 759 MBytes 636 Mbits/sec [ 3] sec 756 MBytes 634 Mbits/sec [ 3] sec 758 MBytes 636 Mbits/sec [ 3] sec 764 MBytes 641 Mbits/sec [ 3] sec 761 MBytes 639 Mbits/sec [ 3] sec 760 MBytes 637 Mbits/sec [ 3] sec 757 MBytes 635 Mbits/sec [ 3] sec 763 MBytes 640 Mbits/sec [ 3] sec 772 MBytes 648 Mbits/sec [ 3] sec 772 MBytes 648 Mbits/sec [ 3] sec 22.8 GBytes 654 Mbits/sec 52
64 B.7 To E C2 medium instance from V M LB [sasho@t3400msc2vm ~]# iperf -c ec eu-west-1.compute.amazonaws -t 300 -i Client connecting to ec eu-west-1.compute.amazonaws.com, TCP 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 69.4 MBytes 58.2 Mbits/sec [ 3] sec 146 MBytes 122 Mbits/sec [ 3] sec 140 MBytes 117 Mbits/sec [ 3] sec 143 MBytes 120 Mbits/sec [ 3] sec 154 MBytes 129 Mbits/sec [ 3] sec 155 MBytes 130 Mbits/sec [ 3] sec 148 MBytes 124 Mbits/sec [ 3] sec 156 MBytes 131 Mbits/sec [ 3] sec 137 MBytes 115 Mbits/sec [ 3] sec 138 MBytes 116 Mbits/sec [ 3] sec 140 MBytes 117 Mbits/sec [ 3] sec 135 MBytes 113 Mbits/sec [ 3] sec 135 MBytes 113 Mbits/sec [ 3] sec 150 MBytes 126 Mbits/sec [ 3] sec 138 MBytes 116 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 134 MBytes 113 Mbits/sec [ 3] sec 135 MBytes 113 Mbits/sec [ 3] sec 135 MBytes 113 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 130 MBytes 109 Mbits/sec [ 3] sec 142 MBytes 119 Mbits/sec [ 3] sec 134 MBytes 112 Mbits/sec [ 3] sec 134 MBytes 113 Mbits/sec [ 3] sec 136 MBytes 114 Mbits/sec [ 3] sec 140 MBytes 118 Mbits/sec [ 3] sec 137 MBytes 115 Mbits/sec [ 3] sec 137 MBytes 115 Mbits/sec [ 3] sec 149 MBytes 125 Mbits/sec [ 3] sec 4.04 GBytes 116 Mbits/sec 53
65 B.8 To E C2 medium instance from physical machine L B -bash-4.1$ iperf -c ec eu-west-1.compute.amazonaws.com -t 300 -i Client connecting to ec eu-west-1.compute.amazonaws.com, TCP port 5001 TCP window size: 16.0 KByte (default) [ 3] local port connected with port 5001 [ ID] Interval Transfer Bandwidth [ 3] sec 416 MBytes 349 Mbits/sec [ 3] sec 393 MBytes 330 Mbits/sec [ 3] sec 442 MBytes 371 Mbits/sec [ 3] sec 356 MBytes 299 Mbits/sec [ 3] sec 382 MBytes 321 Mbits/sec [ 3] sec 451 MBytes 379 Mbits/sec [ 3] sec 370 MBytes 310 Mbits/sec [ 3] sec 340 MBytes 285 Mbits/sec [ 3] sec 282 MBytes 236 Mbits/sec [ 3] sec 329 MBytes 276 Mbits/sec [ 3] sec 565 MBytes 474 Mbits/sec [ 3] sec 323 MBytes 271 Mbits/sec [ 3] sec 338 MBytes 284 Mbits/sec [ 3] sec 400 MBytes 335 Mbits/sec [ 3] sec 373 MBytes 313 Mbits/sec [ 3] sec 290 MBytes 243 Mbits/sec [ 3] sec 256 MBytes 215 Mbits/sec [ 3] sec 293 MBytes 246 Mbits/sec [ 3] sec 257 MBytes 216 Mbits/sec [ 3] sec 289 MBytes 242 Mbits/sec [ 3] sec 288 MBytes 241 Mbits/sec [ 3] sec 318 MBytes 267 Mbits/sec [ 3] sec 274 MBytes 230 Mbits/sec [ 3] sec 276 MBytes 232 Mbits/sec [ 3] sec 366 MBytes 307 Mbits/sec [ 3] sec 421 MBytes 353 Mbits/sec [ 3] sec 398 MBytes 334 Mbits/sec [ 3] sec 297 MBytes 249 Mbits/sec [ 3] sec 382 MBytes 321 Mbits/sec [ 3] sec 517 MBytes 434 Mbits/sec [ 3] sec 10.4 GBytes 299 Mbits/sec 54
Appendix C Work plan modifications

The fact that the project required separate machines presented issues that were not carefully considered at the beginning. Due to EPCC policies, direct remote access to the machine was refused after it was initially granted. At the same time the machine was in a network fully controlled by the LCFG system, so for every basic operation, such as adding users or installing packages, EPCC support had to be contacted. After 2 weeks of working in this manner it was acknowledged that this working style was not flexible enough. Hence the decision was taken to move the machine to another network and to add another machine to the setup, along with changing the OS, which was also time consuming. As a consequence the practical work started on 6 June, delaying the schedule of the project by several weeks. Though such a risk was not considered at the beginning, the work plan of the project had an extension part, which was dropped, and the long period planned mainly for writing was used so that the work could reach its objectives.
67 References [1] OpenNebula, [2] Amazon Web Services, [3] NIST: The NIST Definition of Cloud Computing, [online], (Accessed: 27 February 2011) [4] X. Chen, G. B. Wills, L. Gilbert, D. Bacigalupo. TeciRes Report: Using Cloud For Research: A Technical Review, University of Southampton, June 2010 [5] SaaS Directory, [6] Google App Engine, [7] MS Windows Azure Platform, [8] Official Website, [online], (Accessed: 29 June 2011) [9] Windows Azure: API References for Windows Azure, MSDN Official Website, [online], (Accessed: 29 June 2011) [10] AWS: Amazon Elastic Compute Cloud API Reference, Amazon Web Services Official Website, [online] July 2011, (Accessed: 29 June 2011) [11] OpenNebula: OpenNebula OCCI Specification, OpenNebula Official Website, [online], (Accessed: 24 July, 2011) [12] OpenNebula: OpenNebula EC2 User Guide, OpenNebula Official Website, [online], (Accessed: 24 July, 2011) [13] G. Reese, Cloud Application Architectures 1 st ed [14] VMWare: Understanding Full Virtualization, Paravirtualization, and Hardware Assist, VMWare Official Website, [online] 2007, (Accessed: 4 June 2011) [15] Intel Virtualization Technology, [16] AMD Virtualization Technology, [17] VMWare vsphere, [18] XEN, [19] KVM, 56
68 [20] Microsoft Hyper-V Server, [21] Eucalyptus, [22] D. Cerbelaud, S. Garg, J. Overview of the State-of-the-art Open Source VM-based Cloud Management, Springer-Verlag [23] 4caaSt: Analysis of the State of the Art and Definitions of Requirements, 4caaSt Project Official Website, [online], of_the_art_and_definition_of_requirements_v1.1.pdf (Accessed: 2 June 2011) [24] Open Stack, [25] OpenNebula Blog, [26] OpenNebula: CERN Cloud Scaling to 16,000 VMs, OpenNebula Blog, [online], (Accessed: 3 August 2011) [27] R. Moreno- of Cluster-, pp [28] Amazon AWS Solutions, [29] Amazon Elastic Comput Cloud: Region and Availability Zone FAQ, AWS Documentation, [online], AQ_Regions_Availability_Zones.html (Accessed: 1 June 2011) [30] Q. Huang, C. Yang, D. Nebert, K. Liu, H. Wu. ACM, pp 35-38, 2010 [31] M. Palankar, M. Ripeanu, S. Garfinkel., pp 55-64, 2008 [32] AWS: Amazon EC2 API Tools, Amazon Web Services Official Website, [online] August 2006, (Accessed: 15 June 2011) [33] Animoto, [34] Animoto Scaling Through Viral Growth, Amazon Web Services Blog, [online] April 2008, (Accessed: 2 June 2011) [35] Lecturer Notes in Computer Science: Cloud Computing, Springer Berlin/Heidelberg, pp [36] Sun Microsystems. Introduction to cloud computing architectures, White Paper, 2009 [37] -, pp , 2010 [38] Standard Performance Evaluation Corporation, SPECweb2005, [39] pdf (Accessed: 9 July 2011) [40] httperf, [41] A Tool for Measuring Web Server Performance Evaluation Review Volume 26, Number 3, pp 31-57
69 37, 1998 [42] Intel Virtualization Technology List, [43] Libvirt, [44] OpenNebula: Planning the Installation 2.2, OpenNebula Official Website, [online], (Accessed: 20 April 2011) [45] Java SE, [46] JLapack, [47] Apache Tomcat, [48] Nginx, [49] - Linux Journal Issue 173, 2008 [50] KVM: KVM Networking, Official KVM web site, [online], (Accessed: 27 June 2011) [51] IBM: Quick Start Guide for installing and running KVM, IBM Linux Information, [online] 2009, /kvminstall/liaaikvminstallstart.htm (Accessed: 15 June 2011) [52] OpenVPN, [53] Large Scale Unix Configuration System, [54] OpenNebula Use Cases: Scaling out Web Servers to Amazon EC2, OpenNebula Official Website, [online] December 2010, (Accessed: 16 April 2011) [55] AWS: Announcing Micro Instances for Amazon EC2, AWS Official Website, [online] September 2010, (Accessed: 23 July 2011) [56] Applied Numeric Algorithms, Dense Linear Algebra, (course slides, EPCC, The University of Edinburgh, 2010) [57] WolframAlpha, [58] Java Servlet Technology, [59] Apache Ant, [60] Red Hat: KVM Kernel Based Virtual Machine, Red Hat Official Website, [online] 2009, (Accessed: 29 June 2011) [61] Libvirt Virtualization API: Virsh Command Reference, Libvirt Official Website, [online], (Accessed: 17 July 2011) [62] Nginx: HttpUpstreamModule, Nginx Wiki, [online], (Accessed: 24 June, 2011) [63] Nginx: HttpUpstreamFairModule, Nginx Wiki, [online], (Accessed: 24 June, 2011) [64] OCCI, [65] Virtual Machine Manager, [66] OpenNebula: Command Line Interface, OpenNebula Official Website, [online], (Accessed: 26 April 2011) 58
How AWS Pricing Works (Please consult http://aws.amazon.com/whitepapers/ for the latest version of this paper) Page 1 of 15 Table of Contents Table of Contents... 2 Abstract... 3 Introduction... 3 Fundamental
A quantitative comparison between xen and kvm
Home Search Collections Journals About Contact us My IOPscience A quantitative comparison between xen and kvm This content has been downloaded from IOPscience. Please scroll down to see the full text.
Purpose-Built Load Balancing The Advantages of Coyote Point Equalizer over Software-based Solutions
Purpose-Built Load Balancing The Advantages of Coyote Point Equalizer over Software-based Solutions Abstract Coyote Point Equalizer appliances deliver traffic management solutions that provide high availability,
RED HAT ENTERPRISE VIRTUALIZATION & CLOUD COMPUTING
RED HAT ENTERPRISE VIRTUALIZATION & CLOUD COMPUTING James Rankin Senior Solutions Architect Red Hat, Inc. 1 KVM BACKGROUND Project started in October 2006 by Qumranet - Submitted to Kernel maintainers
HRG Assessment: Stratus everrun Enterprise
HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at
Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases
NASA Ames NASA Advanced Supercomputing (NAS) Division California, May 24th, 2012 Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases Ignacio M. Llorente Project Director OpenNebula Project.
EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
Open Source Cloud Computing Management with OpenNebula
CloudCamp Campus Party July 2011, Valencia Open Source Cloud Computing Management with OpenNebula Javier Fontán Muiños dsa-research.org Distributed Systems Architecture Research Group Universidad Complutense
MODULE 3 VIRTUALIZED DATA CENTER COMPUTE
MODULE 3 VIRTUALIZED DATA CENTER COMPUTE Module 3: Virtualized Data Center Compute Upon completion of this module, you should be able to: Describe compute virtualization Discuss the compute virtualization
Introduction to Cloud Computing
Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services
What is Cloud Computing? Why call it Cloud Computing?
What is Cloud Computing? Why call it Cloud Computing? 1 Cloud Computing Key Properties Advantages Shift from CAPEX to OPEX Lowers barrier for starting a new business/project Can be cheaper even in the
THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD
THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD By Yohan Wadia ucalyptus is a Linux-based opensource software architecture that implements efficiencyenhancing private and hybrid clouds within an enterprise s
Virtualization and the U2 Databases
Virtualization and the U2 Databases Brian Kupzyk Senior Technical Support Engineer for Rocket U2 Nik Kesic Lead Technical Support for Rocket U2 Opening Procedure Orange arrow allows you to manipulate the
An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud
An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud Esri, 380 New York St., Redlands, CA 92373-8100 USA TEL 909-793-2853 FAX 909-793-5953 E-MAIL [email protected] WEB esri.com
Dynamic Load Balancing of Virtual Machines using QEMU-KVM
Dynamic Load Balancing of Virtual Machines using QEMU-KVM Akshay Chandak Krishnakant Jaju Technology, College of Engineering, Pune. Maharashtra, India. Akshay Kanfade Pushkar Lohiya Technology, College
White Paper. Recording Server Virtualization
White Paper Recording Server Virtualization Prepared by: Mike Sherwood, Senior Solutions Engineer Milestone Systems 23 March 2011 Table of Contents Introduction... 3 Target audience and white paper purpose...
CLOUD COMPUTING & SECURITY -A PRACTICAL APPROACH
CLOUD COMPUTING & SECURITY -A PRACTICAL APPROACH ORGANIZED BY र ट र य इल क ट र नक एव स चन प र य गक स थ न, ग रखप र National Institute of Electronics and Information Technology (NIELIT) Gorakhpur An Autonomous
Cloud Computing and E-Commerce
Cloud Computing and E-Commerce Cloud Computing turns Computing Power into a Virtual Good for E-Commerrce is Implementation Partner of 4FriendsOnly.com Internet Technologies AG VirtualGoods, Koblenz, September
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies
NetScaler VPX FAQ. Table of Contents
NetScaler VPX FAQ Table of Contents Feature and Functionality Frequently Asked Questions... 2 Pricing and Packaging Frequently Asked Questions... 4 NetScaler VPX Express Frequently Asked Questions... 5
An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform
An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform A B M Moniruzzaman 1, Kawser Wazed Nafi 2, Prof. Syed Akhter Hossain 1 and Prof. M. M. A. Hashem 1 Department
PARALLELS CLOUD SERVER
PARALLELS CLOUD SERVER Performance and Scalability 1 Table of Contents Executive Summary... Error! Bookmark not defined. LAMP Stack Performance Evaluation... Error! Bookmark not defined. Background...
GUEST OPERATING SYSTEM BASED PERFORMANCE COMPARISON OF VMWARE AND XEN HYPERVISOR
GUEST OPERATING SYSTEM BASED PERFORMANCE COMPARISON OF VMWARE AND XEN HYPERVISOR ANKIT KUMAR, SAVITA SHIWANI 1 M. Tech Scholar, Software Engineering, Suresh Gyan Vihar University, Rajasthan, India, Email:
CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies. Virtualization of Clusters and Data Centers
CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies Lecture 4 Virtualization of Clusters and Data Centers Text Book: Distributed and Cloud Computing, by K. Hwang, G C. Fox, and J.J. Dongarra,
Windows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...
Oracle Database Scalability in VMware ESX VMware ESX 3.5
Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises
Solution for private cloud computing
The CC1 system Solution for private cloud computing 1 Outline What is CC1? Features Technical details Use cases By scientist By HEP experiment System requirements and installation How to get it? 2 What
Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts
Part V Applications Cloud Computing: General concepts Copyright K.Goseva 2010 CS 736 Software Performance Engineering Slide 1 What is cloud computing? SaaS: Software as a Service Cloud: Datacenters hardware
A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)
1. Computation Amazon Web Services Amazon Elastic Compute Cloud (Amazon EC2) provides basic computation service in AWS. It presents a virtual computing environment and enables resizable compute capacity.
SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX
White Paper SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX Abstract This white paper explains the benefits to the extended enterprise of the on-
Data Centers and Cloud Computing. Data Centers. MGHPCC Data Center. Inside a Data Center
Data Centers and Cloud Computing Intro. to Data centers Virtualization Basics Intro. to Cloud Computing Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises
Auto-Scaling Model for Cloud Computing System
Auto-Scaling Model for Cloud Computing System Che-Lun Hung 1*, Yu-Chen Hu 2 and Kuan-Ching Li 3 1 Dept. of Computer Science & Communication Engineering, Providence University 2 Dept. of Computer Science
Comparing Free Virtualization Products
A S P E I T Tr a i n i n g Comparing Free Virtualization Products A WHITE PAPER PREPARED FOR ASPE BY TONY UNGRUHE www.aspe-it.com toll-free: 877-800-5221 Comparing Free Virtualization Products In this
Comparing Open Source Private Cloud (IaaS) Platforms
Comparing Open Source Private Cloud (IaaS) Platforms Lance Albertson OSU Open Source Lab Associate Director of Operations [email protected] / @ramereth About me OSU Open Source Lab Server hosting for Open
Full and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study
Creating Value Delivering Solutions Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study Chris Zajac, NJDOT Bud Luo, Ph.D., Michael Baker Jr., Inc. Overview
