Computer Technology and Application 4 (2013), David Publishing

Performance of the Cloud-Based Commodity Cluster

Van-Hau Pham, Duc-Cuong Nguyen and Tien-Dung Nguyen
School of Computer Science and Engineering, International University, Hochiminh City 70000, Vietnam

Received: September 05, 2013 / Accepted: October 01, 2013 / Published: October 25, 2013.

Abstract: A traditional HPC (High Performance Computing) cluster is built on top of physical machines. It is usually not practical to reassign these machines to other tasks, because software installation is time consuming; as a result, the machines are usually dedicated to the cluster. Virtualization technology provides an abstraction layer that allows several different operating systems (with different software packages) to run on top of one physical machine, and cloud computing provides an easy way for the user to manage and interact with the computing resources (in this case, the virtual machines). In this work, we demonstrate the feasibility of building a cloud-based cluster for HPC on top of a set of desktop computers interconnected by Fast Ethernet. Our cluster has several advantages. For instance, its deployment time is quite short: we need only 5 min to deploy a cluster of 30 machines. Several performance benchmarks have also been carried out. As expected, for the embarrassingly parallel problem, performance scales linearly with the cluster size.

Key words: Cloud computing, HPC cluster, virtualization, performance.

1. Introduction

A cluster is most often used for HPC (High Performance Computing). In 1994, the first cluster based on commodity-grade computers, called Beowulf, was created. With this model, we do not need expensive computers or network devices to build a cluster.
As with all traditional computing models, using such a cluster has the following shortcoming: we need in-house hardware dedicated, among other things, to the cluster. It is usually not practical to use these computers for other purposes, because the software installation process is usually time consuming. In our work, we show that we can build the cluster without dedicated hardware. This is achieved by using cloud computing: given a set of hardware and software, the purpose of the cloud provider is to provide as many computing services as possible (of course, with a certain SLA in place) for the cloud user. The three main widely-accepted service models are Software as a Service (SaaS), such as the Amazon Flexible Payments Service provided by Amazon; Platform as a Service (PaaS), such as Google App Engine provided by Google; and Infrastructure as a Service (IaaS), such as the EC2 (Elastic Compute Cloud) instances provided by Amazon. If the cloud is widely accessible, it is called a public cloud; on the other hand, if access to the cloud is limited to the employees of a single organization, it is called a private cloud. Whatever the deployment model (public or private) is, cloud providers need to handle the following situation: users' requests may vary widely, so providers need to prepare various types of computing resources to fulfill users' needs. This is where virtualization technologies such as VMware, Xen and KVM (Kernel-based Virtual Machine) come into play. Thanks to virtualization, one can run different operating systems (Windows, Linux, etc.), and thus different types of applications, in parallel on a single physical machine. By adopting virtualization technology, cloud computing is able to provide different computing services on top of a given set of hardware.

(Corresponding author: Van-Hau Pham, lecturer, Ph.D., research field: cloud computing.)
This explains why virtualization is often mentioned when talking about cloud computing.
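As a toy illustration of this multiplexing role, one can model the IaaS layer as a first-fit placement of virtual machine requests onto a pool of physical hosts. This sketch is entirely our own and all names in it are illustrative; real schedulers in platforms such as Eucalyptus or OpenStack are far more elaborate:

```python
# Toy first-fit placement of VM requests onto physical hosts: a minimal
# illustration of how an IaaS layer multiplexes one hardware pool.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    ram_mb: int                               # free RAM on the host
    vms: list = field(default_factory=list)   # VMs placed on the host

def place(hosts, vm_name, ram_mb):
    """Assign the request to the first host with enough free RAM."""
    for h in hosts:
        if h.ram_mb >= ram_mb:
            h.ram_mb -= ram_mb
            h.vms.append(vm_name)
            return h.name
    return None  # request rejected: no capacity left

hosts = [Host("node1", 2048), Host("node2", 2048)]
print(place(hosts, "linux-vm", 1536))    # -> node1
print(place(hosts, "windows-vm", 1024))  # -> node2 (node1 has only 512 MB left)
```

Different guests (here a Linux and a Windows request) end up sharing the same pool of hardware, which is exactly the property the cluster in this paper relies on.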
In this work, we demonstrate the feasibility of building a cloud-based computer cluster for HPC on top of a set of desktop computers. The defining characteristic of our work is that the cloud is not built on powerful, dedicated hardware (which would be the choice of professional cloud providers). Since the cloud platform software (on which the HPC cluster is based) is just like any other program on the computer, we can use these computers for other purposes besides the cloud-based cluster. By means of an experimental study, we show that the deployment time of the HPC cluster is extremely short and that, for the embarrassingly parallel problem, the performance of the newly built cluster scales linearly with the cluster size. The rest of the paper is organized as follows: Section 2 describes how our system is built; Section 3 presents several experiments; Section 4 provides a discussion; and Section 5 concludes the paper.

2. Materials and Methods

2.1 Architecture of the Cloud-Based Cluster

The architecture consists of three layers, as depicted in Fig. 1. The lowest layer consists of physical machines, communication links and network devices. This layer provides the foundation for the cloud. It is important to clarify that these machines are used to build the HPC cluster only occasionally, in an on-demand fashion. The hardware is not constantly used for the cloud, as it would be in the case of a professional cloud provider (whether public or private). This is why we say that we do not need dedicated hardware. In the second layer, we install the software that manages the cloud platform. At the time of writing, several open-source solutions provide IaaS, such as Eucalyptus, OpenNebula and OpenStack. Most cloud platforms work with all the popular virtualization technologies, such as KVM, VMware and Xen.
This layer is responsible for creating the virtual machines. It is worth noting that different cloud platforms use different sets of terms to describe their components.

Fig. 1 System architecture (cluster, cloud platform, resource pool).

To simplify the explanation, we adopt the following simple architecture: a cloud consists of a node controller (master node) that controls the other nodes (compute nodes). The compute nodes are where the virtual machines are created. The master node performs several management tasks, such as receiving user requests (for virtual machines), monitoring the available resources of the compute nodes, scheduling the compute nodes to handle the user requests, etc. In our case, we create at most one virtual machine per compute node. The arguments for this decision are that: (1) our compute nodes are not very powerful, so it is not practical to have multiple virtual machines running on one of them at the same time; and (2) having only one virtual machine per node avoids resource competition between virtual machines, which is the main drawback of virtualization technologies [11, 12]. The third layer consists of the cluster middleware, such as MPI, PVM and Hadoop, which makes the virtual machines work together as a cluster.

2.2 Testing Aspects

Our objective is to study the feasibility of building a cloud-based cluster on top of a set of commodity-grade computers. To measure this, we consider the following two parameters: the time needed to deploy a cluster, and the performance of the newly built cluster.

2.2.1 Deployment Time

The deployment time starts when the user issues a request for virtual machine(s) with a certain configuration (for instance, Fedora with 512 MB of RAM) to the master. Based on the scheduling policy in place, the master identifies the compute nodes that are in charge of the request. The selected compute nodes then
create the corresponding virtual machines. In reality, to create such a virtual machine, the compute node needs to copy the virtual machine image (including the root file system image, ramdisk and kernel) from the master. This image must be uploaded to the master beforehand by the cloud administrator (or a user). As a result, to build an HPC cluster of N machines based on a certain virtual machine image I, we need to copy I to N compute nodes. The copying time, plus the time for the compute nodes to create the virtual machines, makes up the deployment time of the cluster. The deployment time of one virtual machine depends mostly on the following two parameters:

Size of the root file system image: the root file system is the content of the virtual machine, which includes all the programs, libraries, etc. Any compute node in the cloud that is to run the corresponding virtual machine must copy this image from the master. Clearly, the bigger the image, the longer the copy time;

Cache: the compute node keeps a copy of each virtual machine image that has been copied to it, which reduces the time needed to initiate the corresponding virtual machine in the future.

2.2.2 Computing-Bound Application

As an example of an application that is computation intensive but requires little communication between nodes (a.k.a. an embarrassingly parallel problem), we propose Par2PK-means (Parallel Two-Phase K-means), based on the model of 2PK-means (Two-Phase K-means). In the first phase of 2PK-means, the original dataset is split into several sub-datasets, on each of which clustering is carried out to produce intermediate clusters. The second phase takes all the intermediate clusters as input and produces the final clusters. The difference between 2PK-means and Par2PK-means is that in the latter the clustering of all the sub-datasets happens simultaneously.
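The two-phase scheme can be sketched in plain Python, with multiprocessing standing in for the parallel first phase (our actual experiments use Hadoop; all names, the 1-D data, and the deterministic initialization below are illustrative choices for this sketch, not the evaluated implementation):

```python
# Illustrative sketch of the two-phase k-means scheme: phase 1 clusters
# the sub-datasets simultaneously, phase 2 clusters the intermediate
# centroids on a single worker.
from multiprocessing import Pool

def kmeans(points, k, iters=30):
    """Plain 1-D k-means with a deterministic, evenly spread initialization."""
    pts = sorted(points)
    centroids = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def par2pk_means(points, k, workers=4):
    # Phase 1: split the dataset and cluster the sub-datasets in parallel.
    chunks = [points[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        intermediate = pool.starmap(kmeans, [(c, k) for c in chunks])
    # Phase 2: cluster the intermediate centroids (negligible work).
    merged = [c for cents in intermediate for c in cents]
    return kmeans(merged, k)

if __name__ == "__main__":
    data = [float(x % 7) for x in range(600)] + \
           [100.0 + x % 7 for x in range(600)]
    print(sorted(par2pk_means(data, 2)))  # two centroids, near 3 and 103
```

Because each worker touches only its own sub-dataset and only the tiny set of intermediate centroids crosses worker boundaries, the scheme needs almost no inter-node communication, which is what makes it embarrassingly parallel.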
As such, we can increase the processing speed of the clustering task in the first phase. By design, most of the workload of Par2PK-means occurs in the first phase; the computational effort of the second phase is negligible. It is worth clarifying that our purpose is not to argue for the usefulness or correctness of 2PK-means; rather, we use Par2PK-means as an example of a real-life application that belongs to the class of embarrassingly parallel problems.

2.2.3 Communication-Bound Application

The HPL (High Performance LINPACK) benchmark is a widely-used tool that measures the number of double-precision floating-point operations a distributed-memory system can perform per unit of time by solving a dense system of linear equations. HPL is built on BLAS (Basic Linear Algebra Subprograms). We use this benchmark to measure the performance of our system; compared with the previous test case, this application is expected to use more bandwidth for communication.

3. Implementation and Results

For historical reasons, we used two different setups to carry out our performance tests. Although this difference contributes little, for the sake of precision we present both testing platforms here:

Setup 1: This setup consists of 31 machines (used for teaching purposes at our university), all connected to a central switch (Fast Ethernet ports). The machines fall into the following categories: 18 machines (3.0 GHz Pentium D CPU, 160 GB hard disk, and 1 Gigabit Ethernet card) and 13 machines (2.7 GHz CPU, 1 GB RAM, 250 GB hard disk, and 1 Gigabit Ethernet card). All the machines run CentOS 5.4. As the virtualization technology, we chose Xen, since it does not need hardware support. We installed the Eucalyptus software on all of them;

Setup 2: The second platform consists of 10 Ubuntu (server edition) machines (2.8 GHz dual-core CPU, 2 GB RAM, 1 Gigabit Ethernet card).
These machines are also connected to a central switch (Fast Ethernet ports). We deployed OpenStack as the cloud
computing platform and KVM as the virtualization technology.

Eucalyptus and OpenStack use different terms to describe their sub-components; a full description of all the relevant terms is beyond the scope of this paper. To simplify the presentation, in both Setup 1 and Setup 2 we adopt the terms master node and compute node, with the following convention: the master node controls the cloud, and the compute nodes are where the virtual machines are created.

3.1 Deployment Time

No-cache scenario: On Setup 1, we deployed a virtual machine image I of 1,400 MB with different cluster sizes. The results are presented in Table 1. As we can see, the deployment time of the cluster increases linearly with the cluster size.

With-cache scenario: Both Eucalyptus and OpenStack use a cache to reduce the time needed to deploy a virtual machine. Table 2 presents the deployment time with cache for several cluster sizes. As we can see, we need around 5 min to deploy a cluster, whatever its size.

3.2 Computing-Bound Application

As mentioned earlier, to test the speedup of an embarrassingly parallel application, we use a real-life example, Par2PK-means. We ran the test on Setup 2, in which each virtual machine is configured with a 2.8 GHz CPU and 1 GB of memory. Hadoop and Java 1.6.0_26 are used as the MapReduce system, preinstalled into the virtual machine image; they are used for all experiments in this section. Regarding the data, we use a dataset of 29,050,600 data objects with a total size of 1.23 GB; each object has four attributes. The evaluation results are presented in Fig. 2. As we can observe, the speedup ratio increases almost linearly with the cluster size.

3.3 HPL Benchmark

As mentioned earlier, in this experiment we want to

Table 1 Deployment time without cache.
Cluster size   Time duration   Duration/virtual machine
30             69 m 42 s       2 m 19.4 s
15             34 m 54 s       2 m 19.6 s
10             23 m 16 s       2 m 19.6 s
5              13 m 27 s       2 m 42 s
1              4 m 42 s        4 m 42 s

Table 2 Deployment time with cache.

Cluster size   Duration
30             5 m 10 s
15             4 m 26 s
1              4 m 15 s

Fig. 2 Relationship between speedup ratio and cluster size.

measure the performance of our virtual cluster in solving a dense system of equations using the HPL benchmark tool. The following packages were used: Open MPI version 1.6.4, ATLAS, and HPL version 2.1, installed into a 64-bit Ubuntu virtual machine image. To run the benchmark with different cluster sizes, we need to adjust several parameters, following the suggestions in Ref. . The important parameters are listed in Table 3, and the results are shown in Fig. 3. As we can see, the speedup ratio increases as nodes are added to the cluster, but the relationship is not one-to-one: when the cluster size increases by a factor of 10, the speed increases only by a factor of 26.1/7.29 = 3.6.

4. Discussion

Our HPC cluster is built on top of virtual machines, so the performance of virtual machines is a legitimate concern. In fact, when trying to measure several performance-related aspects
Table 3 HPL configuration parameters.

Nodes   Problem size   Number of processes (P * Q)
1       12,416         1 * 1
2       17,664         1 * 2
4       24,960         2 * 2
6       30,720         2 * 3
8       35,456         2 * 4
10      39,552         2 * 5

Fig. 3 HPL benchmark result.

such as CPU, memory, I/O and hard disk for Linux-VServer and Xen, the authors of Ref.  concluded that heavy network usage from competing VMs can introduce delays as high as 100 ms in round-trip times. The authors of Ref.  were also interested in the performance of virtual machines, but considered it in the context of cloud computing: testing EC2 instances (virtual machines provided by Amazon), they stated that "even though the data center network is lightly utilized, virtualization can still cause significant throughput instability and abnormal delay variations". Similarly, after benchmarking a set of 75 EC2 instances, the authors of Ref.  concluded that multiple virtual machines (VMs) can share CPUs and main memory surprisingly well in cloud computing, but that network and disk I/O sharing is more problematic. Given the recent popularity of virtualization technologies, several attempts have been made to build clusters on top of virtual machines. For instance, the authors of Ref.  studied the feasibility of using virtual machines to build an HPC cluster. Going a step further in this direction, the authors of Ref.  built a framework that helps to deploy a virtual-machine-based HPC cluster; in particular, they proposed a method to distribute the virtual machine image for building the HPC cluster. Our HPC cluster distinguishes itself from these previous works in two respects: (1) it is based on virtual machines, which differentiates it from Ref. ; and (2) we take advantage of cloud computing, so there is no need to implement our own mechanism to manage the virtual machines and images: the cloud platform does that job for us. In spirit, our work is quite similar to the one in Ref. .
However, the differences between the two are significant: the work in Ref.  is a theoretical analysis of a cloud-based HPC cluster, whereas our work proposes a concrete deployment model.

5. Conclusions

In this paper, we have demonstrated the feasibility of building a cloud-based HPC cluster on top of a set of commodity-grade computers. Taking advantage of the Eucalyptus and OpenStack cloud platforms, we have shown that the cloud-based cluster can be deployed extremely quickly: we need only 5 min to deploy a cluster of 30 machines. Moreover, for the embarrassingly parallel problem, performance scales linearly with the cluster size. This positive result shows that a cloud-based cluster built on top of commodity-grade computers is a promising solution for occasional HPC needs, especially for academic institutions. However, more intensive performance tests with various configurations are needed to reinforce our findings.

References

R.G. Brown, Engineering a Beowulf-Style Computer Cluster, Duke University, Physics Department.
Amazon Web Services, Amazon Flexible Payments Service (Amazon FPS) [Online] (accessed August 28, 2013).
Google Cloud Platform Home Page, https://cloud.google.com/products/app-engine (accessed August 28, 2013).
Amazon Web Services, Amazon Elastic Compute Cloud (Amazon EC2) [Online] (accessed August 27, 2013).
VMware Virtualization Technology Home Page (accessed August 27, 2013).
Xen Project, Linux Foundation Collaborative Projects Home Page (accessed August 27, 2013).
Kernel-based Virtual Machine Home Page (accessed August 28, 2013).
Eucalyptus, Open Source AWS-Compatible Private Clouds [Online], Eucalyptus Systems, Inc. Home Page (accessed August 28, 2013).
OpenStack Home Page (accessed August 27, 2013).
M. Armbrust, A. Fox, R. Griffith, et al., A view of cloud computing, Communications of the ACM 53 (4) (2010).
G.H. Wang, T.S. Eugene Ng, The impact of virtualization on network performance of Amazon EC2 data center, in: Proceedings of the 29th Conference on Information Communications, 2010.
Message Passing Interface Forum Home Page (accessed August 27, 2013).
Parallel Virtual Machine Home Page (accessed August 28, 2013).
Hadoop Home Page (accessed August 28, 2013).
D.T. Pham, S.S. Dimov, C.D. Nguyen, A two-phase k-means algorithm for large datasets, Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 218 (10) (2004).
A. Petitet, R.C. Whaley, J. Dongarra, A. Cleary, HPL: A portable implementation of the High-Performance Linpack Benchmark for distributed-memory computers [Online], Innovative Computing Laboratory (accessed August 28, 2013).
How do I tune my HPL.dat file? [Online], Advanced Clustering Technologies, Inc. Home Page (accessed August 28, 2013).
J. Whiteaker, F. Schneider, R. Teixeira, Explaining packet delays under virtualization, ACM SIGCOMM Computer Communication Review 41 (1) (2011).
L. Youseff, R. Wolski, B. Gorda, C. Krintz, Evaluating the performance impact of Xen on MPI and process execution for HPC systems, in: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing, 2006, p. 1.
W. Huang, J.X. Liu, B. Abali, D.K. Panda, A case for high performance computing with virtual machines, in: Proceedings of the 20th Annual International Conference on Supercomputing, 2006.
M.F. Mergen, V. Uhlig, O. Krieger, J. Xenidis, Virtualizing high performance computing, SIGOPS Operating Systems Review 40 (2006) 8-11.
J. Geelan, Twenty-one experts define cloud computing, Cloud Computing Journal [Online], January 24, 2009 (accessed August 27, 2013).
Beowulf Home Page (accessed August 27, 2013).