BRIEF REPORT

SCIENCE CHINA Information Sciences
July 2010 Vol. 53 No. 7: 1481–1486
doi: 10.1007/s11432-010-4011-z

An introduction to Tsinghua Cloud

ZHENG WeiMin 1,2

1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2 Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China

Received February 22, 2010; accepted April 28, 2010

Abstract Cloud computing has become a trend that draws attention from both academia and industry all over the world. This paper unveils details of Tsinghua Cloud, a comprehensive solution for cloud computing developed by Tsinghua University.

Keywords cloud computing, distributed file system, virtual computing environment, data sharing

Citation Zheng W M. An introduction to Tsinghua Cloud. Sci China Inf Sci, 2010, 53: 1481–1486, doi: 10.1007/s11432-010-4011-z

1 Introduction

Cloud computing [1–3] refers to services (including hardware such as CPUs and storage, platforms, and applications) that are provided and consumed over the Internet on demand. It is the latest computing paradigm to emerge on the journey toward the long-held dream of computing as a utility, and it is widely recognized as having great potential to achieve this vision. Grid computing [4] is based on a resource-centric design, aiming to integrate the computing resources of multiple administrative domains to meet high-performance computing requirements in engineering and scientific research. Cloud computing, by contrast, is based on a user- and task-centric design with the objective of delivering elastic storage and computation to common business applications using data centers [5]. Unnecessary details of the underlying infrastructure are therefore shielded from users, which eases application development and programming in the cloud.

Due to its great potential, cloud computing keeps drawing increasing attention from both academia and industry all over the world. In academia, more and more dedicated conferences are emerging: the ACM Symposium on Cloud Computing (SOCC), the IEEE International Conference on Cloud Computing (CLOUD), the USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), and the International Conference on Cloud Computing (CloudCom), to name but a few. Research subjects include architectures and platforms for cloud computing (e.g., OpenNebula, Nimbus and Eucalyptus), programming models (e.g., MapReduce [6], Pig Latin [7], Dryad and DryadLINQ [8]), resource management, tool development, applications of cloud computing, pricing models, and so on. In industry, many IT companies such as Google, Microsoft, Amazon, IBM, Yahoo! and Intel have released cloud computing initiatives or products. Typically, three kinds of cloud services are identified: Infrastructure as a Service (IaaS) (e.g., Amazon EC2 and S3, and IBM Blue Cloud), Platform as a Service (PaaS) (e.g., Google App Engine, Microsoft Windows Azure and Salesforce Force.com), and Software as a Service (SaaS) (e.g., Salesforce SFA, Google Docs and Microsoft Dynamics CRM).

Cloud computing has also gained significant attention in enterprises and academic institutions in China. As a result, substantial work on cloud computing has been conducted in these organizations. In this paper, we report on one such effort, Tsinghua Cloud, a comprehensive solution for cloud computing developed by the High Performance Computing Institute of Tsinghua University.

(email: zwm-dcs@tsinghua.edu.cn)
© Science China Press and Springer-Verlag Berlin Heidelberg 2010  info.scichina.com  www.springerlink.com

2 Tsinghua Cloud

2.1 Overview

Tsinghua Cloud stems from our data grid project sponsored by the National High-Tech Research & Development (863) Program of China. It was initially designed with two objectives: (1) easing the process for both researchers and users to utilize resources in the cloud; and (2) building a platform to explore and tackle the potential challenges of cloud computing.

The architecture of Tsinghua Cloud is illustrated in Figure 1. Carrier, a distributed file system, provides scalable, simple storage services over large-scale heterogeneous machines. Above Carrier, two services, Corsair and Nova, are supplied. Corsair delivers easy-to-use services for data storage and sharing, while Nova enables users to build their own computing environments (e.g., virtual hosts, virtual clusters and services) in an easier, more productive and universal way. To facilitate application development, four kinds of interfaces are provided, namely GUI (graphical user interface), API (application programming interface), Shell and Web.

Tsinghua Cloud has the following features. Its design abides by the low-coupling principle; that is, Carrier, Corsair and Nova can be utilized independently to deliver limited functionality or in combination to provide more advanced functions.
It is a comprehensive solution covering both data storage and computation. Moreover, it emphasizes the key role of data in support of computation, which is why the Nova service is built on top of Carrier. It highlights data sharing among users.

In the rest of this section, we explain Carrier, Corsair and Nova in detail.

2.2 Carrier

Carrier is a distributed file system that aims to provide high-performance, highly available and convenient storage services. As shown in Figure 2, it is a loosely coupled system consisting of five parts: supervisors, clients, metadata servers, data servers, and the fuse module. The metadata servers provide a global, tree-structured namespace. Note that more than one metadata server is deployed in order to avoid a bottleneck and a single point of failure. The data servers provide space to hold file data. Files in Carrier are divided into multiple 32 MB chunks and stored on several data servers. The clients provide operations to access file data and metadata. The fuse module enables users to use Carrier in the same way as a local file system. The supervisors handle system management issues such as machine failures, replication adjustment, load balancing, data integrity checking, and garbage collection.

A series of experiments has been conducted to compare Carrier with the Hadoop Distributed File System (HDFS) [9]. Experimental results show that for files of 256 MB, the write and read speeds of Carrier are around 1.3-fold and 1.95-fold those of HDFS; for files of 32 KB, the gains in write and read speeds are 3-fold and 3.4-fold, respectively. In short, Carrier can efficiently handle workloads of both small and large files. The improvement in efficiency is due to optimizations in metadata organization, the data access path, garbage collection, etc. In more detail, multiple metadata servers are deployed to balance workloads, thereby avoiding a bottleneck.
On the metadata servers, a lightweight R/W (read/write) locking mechanism is implemented, which further improves their efficiency. Finally, chunk ids in Carrier are generated by clients using a UUID (universally unique identifier) algorithm, rather than by metadata servers. In this way, the number of interactions between a client and a metadata server needed to write one chunk is reduced from 2 to 1. Since each interaction takes place over the network, reducing the number of interactions improves performance.

Figure 1 The architecture of Tsinghua Cloud.

Figure 2 Functional components of Carrier.

2.3 Corsair

Corsair is designed to facilitate data storage and sharing on top of Carrier and other storage resources. Figure 3 shows its architecture. It mainly consists of a storage service, a mapping service, a search engine, client tools and a user service. The storage service is used to access data in Carrier and other storage resources. The mapping service is responsible for translating logical names to physical ones. The search engine allows users to find desired files among tens of terabytes of resources. The client tool provides users with a unified view of local and remote files; it looks much like Windows Explorer. The user service is responsible for user authentication, authorization and community management.
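
As a minimal sketch of the client-side chunk id generation described for Carrier in Section 2.2 (not the actual Carrier implementation), the following Python fragment splits data into 32 MB chunks and tags each chunk with an id generated locally. The function names, the choice of a random (version 4) UUID, and the interaction counter are illustrative assumptions; the paper specifies only the chunk size, the use of a UUID algorithm, and the reduction from 2 interactions to 1 per chunk.

```python
import uuid

# 32 MB chunk size, as stated in Section 2.2.
CHUNK_SIZE = 32 * 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_id, chunk_bytes) pairs for the given data.

    Hypothetical helper: the id is generated on the client, so no
    request to a metadata server is needed just to obtain an id.
    uuid4 is an assumption; the paper does not name a UUID version.
    """
    for offset in range(0, len(data), chunk_size):
        chunk_id = uuid.uuid4().hex
        yield chunk_id, data[offset:offset + chunk_size]


def metadata_interactions(file_size: int, per_chunk: int,
                          chunk_size: int = CHUNK_SIZE) -> int:
    """Count per-chunk client-to-metadata-server interactions for a write.

    Only the per-chunk interactions mentioned in the paper are counted;
    any other metadata traffic (e.g., file creation) is ignored.
    """
    num_chunks = -(-file_size // chunk_size)  # ceiling division
    return num_chunks * per_chunk
```

Under these assumptions, a 256 MB file occupies 8 chunks, so `metadata_interactions(256 * 1024 * 1024, 2)` gives 16 with server-assigned ids, while `metadata_interactions(256 * 1024 * 1024, 1)` gives 8 with client-generated ids.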

Storage managed by Corsair is categorized into three types: public storage (accessible to anyone), community storage (accessible within a community) and personal storage (accessible to a single person). By default, personal storage is 2 GB and community storage is 100 GB. Since any user can create a community and has full control over its accessibility, file sharing becomes very easy.

Compared with related work (e.g., Amazon S3), Corsair provides an interface with which users can uniformly manage local and remote files. In addition, through community storage it enables user-controllable sharing of files in the cloud, a feature rarely supported by current cloud storage services. Both features enhance users' experience of cloud storage.

2.4 Nova

Nova aims to ease the burden of configuring a computing environment and to enable users to utilize remote computing resources in an easier and more productive way. As shown in Figure 4, the Nova service consists of one master node (or more as needed) and many worker nodes. A master node is a physical machine dedicated to managing all physical and virtual machines; it provides services such as the information service, worker dispatch service, storage dispatch service, and virtual machine (VM) monitor service. A worker node is a physical machine responsible for running VMs to complete various computing tasks.

In Tsinghua Cloud, the Nova service, as shown in Figure 1, is built on top of Carrier. This is done using the fuse module of Carrier. When configuring a computing environment, users can specify a space in Carrier, which is then automatically attached as an independent drive for data input and output. In this way, the key role of data in support of computation is highlighted, because users no longer need to pay extra attention to downloading and uploading data. We would also like to point out that basing Nova on Carrier is not mandatory.
Since Tsinghua Cloud adopts a low-coupling design, the Nova service can run independently, without the support of Carrier. In that case, however, users must handle data downloading and uploading themselves, which means extra work for them.

Compared with related work (e.g., Amazon EC2), the Nova service has the following distinguishing features. Firstly, Nova is more productive. Besides virtual hardware and system software, Nova also provides virtualized application software. As shown in Figure 4, the application software a user requests is automatically deployed after the operating system (OS) image has been loaded and the VM has started. In this way, users immediately obtain a complete computing environment in which to carry out their tasks. Secondly, users do not need to install or configure client software, because configuration of and access to the computing environment are all done through a Web browser. In this way, the users' burden is eased. Finally, Nova provides inherent support for integration with the storage cloud, as mentioned above, which makes it even more attractive.

3 Initial achievements and concluding remarks

To date, Corsair has been deployed on the Tsinghua campus for more than one year, with more than 12000 registered users and 327 communities. Its client tools have been downloaded more than 50000 times. The daily usage of the Corsair service is over 3000 person-times, with an average throughput of around 1.3 TB/day. Nova has been used in a high-performance computing course at our university to deliver student-specific experimental environments. It is also used in bioinformatics research, with more than 70 bioinformatics software packages available for users to customize. In addition, we have developed three mobile applications based on Tsinghua Cloud, namely ifriend, icamera and imovie.
ifriend enables users to back up and restore contact information to and from the cloud, icamera lets mobile users take advantage of the unlimited space in the cloud to store photos taken with their mobile phones, and imovie demonstrates how to use mobile clients to browse and play video files stored in the cloud.

Through the work above, we have gained the following experience with cloud computing. Firstly, cloud computing does enhance users' experience of remotely using resources provided or operated by third parties. We therefore believe that cloud computing has a bright future. Secondly, cloud computing is still an early-stage technology. Challenges remain in security and privacy, service reliability and availability, programming models, interoperability and so on, which are waiting to be addressed.

Figure 3 Functions and internal structure of Corsair service.

Figure 4 Functional components of Nova service and their interactions.

Acknowledgements This work was supported by the National Natural Science Foundation of China (Grant No. 60773145), and the National High-Tech Research & Development Program of China (Grant Nos. 2009AA01A130, 2006AA01A101, 2006AA01A108, 2006AA01A111 and 2006AA01A117).

References

1 Buyya R, Yeo C S, Venugopal S, et al. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Futur Gener Comp Syst, 2009, 25: 599–616
2 Dikaiakos M D, Katsaros D, Mehra P, et al. Cloud computing: distributed Internet computing for IT and scientific research. IEEE Internet Comput, 2009, 13: 10–13
3 Armbrust M, Fox A, Griffith R, et al. Above the clouds: A Berkeley view of cloud computing. Technical Report No. UCB/EECS-2009-28, University of California at Berkeley, 2009
4 Foster I, Kesselman C. The Grid 2: Blueprint for a New Computing Infrastructure. San Francisco: Morgan Kaufmann Publishers, 2004
5 Sriram I. SPECI, a simulation tool exploring cloud-scale data centres. LNCS, 2009, 5931: 381–392
6 Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM, 2008, 51: 107–113
7 Olston C, Reed B, Srivastava U, et al. Pig Latin: a not-so-foreign language for data processing. In: Proc of ACM SIGMOD, Vancouver, BC, Canada, 2008. 1099–1110
8 Yu Y, Isard M, Fetterly D, et al. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In: Proc of OSDI, San Diego, CA, USA, 2008. 1–14
9 The Hadoop Distributed File System. http://hadoop.apache.org/hdfs/, accessed Feb. 11, 2010