An introduction to Tsinghua Cloud


BRIEF REPORT
SCIENCE CHINA Information Sciences, July 2010, Vol. 53, No. 7: 1481-1486
doi: 10.1007/s11432-010-4011-z

An introduction to Tsinghua Cloud

ZHENG WeiMin (1,2)
1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2 Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China

Received February 22, 2010; accepted April 28, 2010

Abstract  Cloud computing has become a trend that draws attention from both academia and industry all over the world. This paper describes Tsinghua Cloud, a comprehensive cloud computing solution developed by Tsinghua University.

Keywords  cloud computing, distributed file system, virtual computing environment, data sharing

Citation  Zheng W M. An introduction to Tsinghua Cloud. Sci China Inf Sci, 2010, 53: 1481-1486, doi: 10.1007/s11432-010-4011-z

1 Introduction

Cloud computing [1-3] refers to services (including hardware such as CPUs and storage, platforms, and applications) that are provided and consumed over the Internet on demand. It is the latest computing paradigm to emerge on the way toward the long-held dream of computing as a utility, and it is widely recognized as having great potential to realize this vision. Grid computing [4] follows a resource-centric design, aiming to integrate computing resources from multiple administrative domains to meet the high-performance computing requirements of engineering and scientific research. Cloud computing, by contrast, follows a user- and task-centric design, with the objective of delivering elastic storage and computation to common business applications using data centers [5]. As a result, unnecessary details of the underlying infrastructure are hidden from users, which eases application development and programming in the cloud.
Because of its great potential, cloud computing continues to draw growing attention from academia and industry worldwide. In academia, dedicated conferences keep emerging, such as the ACM Symposium on Cloud Computing (SOCC), the IEEE International Conference on Cloud Computing (CLOUD), the USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), and the International Conference on Cloud Computing (CloudCom), to name just a few. Research subjects include architectures and platforms for cloud computing (e.g., OpenNebula, Nimbus and Eucalyptus), programming models (e.g., MapReduce [6], Pig Latin [7], Dryad and DryadLINQ [8]), resource management, tool development, cloud applications, pricing models, and so on. In industry, many IT companies, such as Google, Microsoft, Amazon, IBM, Yahoo! and Intel, have released cloud computing initiatives or products. Three kinds of cloud services are typically identified: Infrastructure as a Service (IaaS) (e.g., Amazon EC2 and S3, and IBM Blue Cloud), Platform as a Service (PaaS) (e.g., Google App Engine, Microsoft Windows Azure and Salesforce Force.com), and Software as a Service (SaaS) (e.g., Salesforce SFA, Google Docs and Microsoft Dynamics CRM).

Cloud computing has also gained significant attention in enterprises and academic institutions in China, and substantial work on it has been conducted in these organizations. In this paper, we report on one such effort: Tsinghua Cloud, a comprehensive cloud computing solution developed by the High Performance Computing Institute of Tsinghua University.

(email: zwm-dcs@tsinghua.edu.cn)
© Science China Press and Springer-Verlag Berlin Heidelberg 2010    info.scichina.com    www.springerlink.com

2 Tsinghua Cloud

2.1 Overview

Tsinghua Cloud stems from our data grid project sponsored by the National High-Tech Research & Development (863) Program of China. It was initially designed with two objectives: (1) to make it easier for both researchers and users to utilize resources in the cloud; and (2) to build a platform for exploring and overcoming the potential challenges of cloud computing.

The architecture of Tsinghua Cloud is illustrated in Figure 1. Carrier, a distributed file system, provides scalable, simple storage services over large-scale heterogeneous machines. Two services, Corsair and Nova, are built on top of Carrier. Corsair delivers easy-to-use services for data storage and sharing, while Nova enables users to build their own computing environments (e.g., virtual hosts, virtual clusters and services) in an easier, more productive and more universal way. To facilitate application development, four kinds of interfaces are provided: GUI (graphical user interface), API (application programming interface), Shell and Web.

Tsinghua Cloud has the following features:
(1) Its design follows the low-coupling principle; that is, Carrier, Corsair and Nova can be used independently to deliver limited functionality, or in combination to provide more advanced functions.
(2) It is a comprehensive solution covering both data storage and computation. Moreover, it emphasizes the key role of data in supporting computation, which is why the Nova service is built on top of Carrier.
(3) It highlights data sharing among users.

In the rest of this section, we explain Carrier, Corsair and Nova in detail.

2.2 Carrier

Carrier is a distributed file system that aims to provide high-performance, highly available and convenient storage services. As shown in Figure 2, it is a loosely coupled system consisting of five parts: supervisors, clients, metadata servers, data servers, and the fuse module. The metadata servers provide a global, tree-structured namespace; more than one metadata server is deployed to avoid a bottleneck and a single point of failure. The data servers provide space to hold file data: files in Carrier are divided into 32 MB chunks, which are stored on several data servers. The clients provide operations to access file data and metadata. The fuse module enables users to use Carrier in the same way as a local file system. The supervisors handle system management issues such as machine failures, replication adjustment, load balancing, data integrity checking, and garbage collection.

A series of experiments has been conducted comparing Carrier with the Hadoop Distributed File System (HDFS) [9]. The results show that for 256 MB files, Carrier's write and read speeds are about 1.3 and 1.95 times those of HDFS; for 32 KB files, the write and read speedups are 3 and 3.4 times, respectively. In short, Carrier can efficiently handle workloads of both small and large files. The efficiency gains come from optimizations to metadata organization, the data access path, garbage collection, and so on. In more detail, multiple metadata servers are deployed to balance the workload, thereby avoiding a bottleneck.
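Carrier's chunk layout can be illustrated with a minimal Python sketch. It assumes fixed 32 MB chunks addressed by byte offset and uses random (version 4) UUIDs as chunk ids. The paper states only that clients generate chunk ids with a UUID algorithm, so the UUID version and the names `ChunkRef`, `plan_chunks` and `locate` are hypothetical choices for illustration, not Carrier's interface.

```python
import uuid
from dataclasses import dataclass

CHUNK_SIZE = 32 * 1024 * 1024  # 32 MB, the chunk size stated for Carrier


@dataclass
class ChunkRef:
    chunk_id: str  # client-generated UUID (hypothetical representation)
    index: int     # position of the chunk within the file
    offset: int    # byte offset of the chunk's first byte in the file
    length: int    # number of bytes held by this chunk


def plan_chunks(file_size: int) -> list[ChunkRef]:
    """Split a file of file_size bytes into fixed-size chunks.

    Each chunk id is a random UUID minted locally on the client, so no
    round trip to a metadata server is needed just to obtain an id.
    """
    chunks = []
    for index, offset in enumerate(range(0, file_size, CHUNK_SIZE)):
        length = min(CHUNK_SIZE, file_size - offset)  # last chunk may be short
        chunks.append(ChunkRef(str(uuid.uuid4()), index, offset, length))
    return chunks


def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset within that chunk)."""
    return divmod(offset, CHUNK_SIZE)


if __name__ == "__main__":
    plan = plan_chunks(100 * 1024 * 1024)  # a 100 MB file -> 4 chunks
    print(len(plan), [c.length for c in plan])
    print(locate(70 * 1024 * 1024))  # byte 70 MB lies in chunk 2, 6 MB in
```

Because the id is created on the client, a chunk write in this scheme needs no separate id request to a metadata server, which is in line with the one-interaction chunk write described for Carrier.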
On the metadata servers, a lightweight R/W (read/write) locking mechanism further improves their efficiency. Finally, chunk ids in Carrier are generated by clients using the UUID (universally unique identifier) algorithm, rather than by the metadata servers. As a result, the number of interactions between a client and a metadata server needed to write one chunk drops from 2 to 1. Since each interaction goes over the network, fewer interactions mean better performance.

[Figure 1: The architecture of Tsinghua Cloud.]
[Figure 2: Functional components of Carrier.]

2.3 Corsair

Corsair is designed to facilitate data storage and sharing on top of Carrier and other storage resources. Figure 3 shows its architecture. It mainly consists of a storage service, a mapping service, a search engine, client tools and a user service. The storage service is used to access data in Carrier and other storage resources. The mapping service translates logical names into physical ones. The search engine lets users find the files they need among tens of terabytes of resources. The client tool gives users a unified view of local and remote files and looks much like Windows Explorer. The user service handles such things as user authorization, authentication and community management.
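As a rough illustration of what a logical-to-physical mapping service does, the Python sketch below keeps a table from normalized logical paths to physical locations. The `MappingService` and `PhysicalLocation` types, the backend labels, and the path normalization rule are all assumptions made for this example; the paper does not describe how Corsair stores its mappings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalLocation:
    backend: str  # storage resource, e.g. "carrier" (hypothetical label)
    path: str     # backend-specific physical name


@dataclass
class MappingService:
    """Toy logical-to-physical name table; not Corsair's actual design."""
    table: dict[str, PhysicalLocation] = field(default_factory=dict)

    @staticmethod
    def _normalize(logical: str) -> str:
        # Collapse repeated slashes and drop a trailing slash so that
        # "/community//hpc/a/" and "/community/hpc/a" name the same file.
        parts = [p for p in logical.split("/") if p]
        return "/" + "/".join(parts)

    def register(self, logical: str, location: PhysicalLocation) -> None:
        self.table[self._normalize(logical)] = location

    def resolve(self, logical: str) -> PhysicalLocation:
        key = self._normalize(logical)
        if key not in self.table:
            raise FileNotFoundError(logical)
        return self.table[key]


if __name__ == "__main__":
    svc = MappingService()
    svc.register("/community/hpc/notes.txt", PhysicalLocation("carrier", "/vol3/0a1b"))
    print(svc.resolve("/community//hpc/notes.txt/"))
```

A real service would also need to persist the table and keep it consistent across servers; both are omitted here.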

Storage managed by Corsair falls into three categories: public storage (accessible to anyone), community storage (accessible within a community) and personal storage (accessible to a single person). By default, a personal storage space is 2 GB and a community storage space is 100 GB. Since any user can create a community and has full control over its accessibility, file sharing becomes very easy. Compared with related work (e.g., Amazon S3), Corsair provides an interface through which users can manage local and remote files uniformly. In addition, through community storage it enables user-controllable sharing of files in the cloud, a feature rarely supported by current cloud storage services. Both features enhance the user experience of cloud storage.

2.4 Nova

Nova aims to ease the burden of configuring computing environments and to let users utilize remote computing resources in an easier and more productive way. As shown in Figure 4, the Nova service consists of one master node (or more as needed) and many worker nodes. A master node is a physical machine dedicated to managing all physical and virtual machines; it provides an information service, a worker dispatch service, a storage dispatch service, and a virtual machine (VM) monitor service. A worker node is a physical machine responsible for running VMs to complete various computing tasks.

In Tsinghua Cloud, the Nova service is built on top of Carrier, as shown in Figure 1, by means of Carrier's fuse module. When configuring a computing environment, users can specify a space in Carrier, which is then automatically attached as an independent drive for data input and output. This reflects the key role of data in supporting computation: users no longer need to pay extra attention to downloading and uploading data. We would also like to point out that basing Nova on Carrier is not a must.
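The master node's worker dispatch service has to decide which worker node should host a new VM. The paper does not describe the placement policy, so the Python sketch below assumes a simple least-loaded rule with a per-node VM cap; `WorkerNode`, `WorkerDispatcher` and the `capacity` field are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    name: str
    capacity: int     # maximum number of VMs this node may host (assumed)
    running: int = 0  # VMs currently running on this node


class WorkerDispatcher:
    """Toy master-side dispatcher that places each VM on the least-loaded worker.

    The least-loaded policy is an assumption for illustration only.
    """

    def __init__(self, workers: list[WorkerNode]) -> None:
        self.workers = workers

    def dispatch(self) -> WorkerNode:
        # Only nodes below their VM cap are eligible (this also skips capacity 0).
        candidates = [w for w in self.workers if w.running < w.capacity]
        if not candidates:
            raise RuntimeError("no worker node has free capacity")
        # Lowest load ratio wins; ties are broken by name for determinism.
        target = min(candidates, key=lambda w: (w.running / w.capacity, w.name))
        target.running += 1
        return target

    def release(self, worker: WorkerNode) -> None:
        # Called when a VM on this worker shuts down.
        if worker.running == 0:
            raise ValueError(f"{worker.name} has no running VMs")
        worker.running -= 1


if __name__ == "__main__":
    d = WorkerDispatcher([WorkerNode("w1", 4), WorkerNode("w2", 2)])
    print([d.dispatch().name for _ in range(5)])
```

Breaking ties by node name only keeps the placement deterministic; a fuller dispatcher could also weigh CPU, memory and storage load reported by the information service.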
Because Tsinghua Cloud follows a low-coupling design, the Nova service can also run independently, without Carrier. In that case, however, users must handle data downloading and uploading themselves, which means extra work for them.

Compared with related work (e.g., Amazon EC2), the Nova service has the following notable features. First, Nova is more productive: besides virtual hardware and system software, it also provides virtualized application software. As shown in Figure 4, the application software a user requests is deployed automatically after the operating system (OS) image has been loaded and the VM has started, so users immediately obtain a complete computing environment for their tasks. Second, users do not need to install or configure client software, because computing environment configuration and access are all done through a Web browser, which eases the users' burden. Finally, Nova provides inherent support for integration with the storage cloud, as described above, which makes it even more attractive.

3 Initial achievements and concluding remarks

To date, Corsair has been deployed on the Tsinghua campus for more than one year, with more than 12000 registered users and 327 communities. Its client tools have been downloaded more than 50000 times. Daily usage of the Corsair service exceeds 3000 person-times, with an average throughput of around 1.3 TB/day. Nova has been used in a high performance computing course at our university to deliver student-specific experimental environments. It is also used in bioinformatics research, with more than 70 bioinformatics software packages available for users to customize. In addition, we have developed three mobile applications based on Tsinghua Cloud, namely ifriend, icamera and imovie.
ifriend enables users to back up contact information to the cloud and restore it from the cloud; icamera lets mobile users take advantage of the unlimited space in the cloud to store photos taken with their mobile phones; and imovie demonstrates how to use mobile clients to browse and play video files stored in the cloud.

Through the work above, we have gained the following experiences of cloud computing. First, cloud computing DOES enhance the user experience of remotely using resources provided or operated by third parties. We therefore believe that cloud computing has a bright future. Second, cloud computing is still an early-stage technology. Challenges in security and privacy, service reliability and availability, programming models, interoperability and so on are still waiting to be addressed.

[Figure 3: Functions and internal structure of the Corsair service.]
[Figure 4: Functional components of the Nova service and their interactions.]

Acknowledgements  This work was supported by the National Natural Science Foundation of China (Grant No. 60773145) and the National High-Tech Research & Development Program of China (Grant Nos. 2009AA01A130, 2006AA01A101, 2006AA01A108, 2006AA01A111 and 2006AA01A117).

References
1 Buyya R, Yeo C S, Venugopal S, et al. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Futur Gener Comp Syst, 2009, 25: 599-616

2 Dikaiakos M D, Katsaros D, Mehra P, et al. Cloud computing: distributed Internet computing for IT and scientific research. IEEE Internet Comput, 2009, 13: 10-13
3 Armbrust M, Fox A, Griffith R, et al. Above the clouds: A Berkeley view of cloud computing. Technical Report No. UCB/EECS-2009-28, University of California at Berkeley, 2009
4 Foster I, Kesselman C. The Grid 2: Blueprint for a New Computing Infrastructure. San Francisco: Morgan Kaufmann Publishers, 2004
5 Sriram I. SPECI, a simulation tool exploring cloud-scale data centres. LNCS, 2009, 5931: 381-392
6 Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM, 2008, 51: 107-113
7 Olston C, Reed B, Srivastava U, et al. Pig Latin: a not-so-foreign language for data processing. In: Proc of ACM SIGMOD, Vancouver, BC, Canada, 2008. 1099-1110
8 Yu Y, Isard M, Fetterly D, et al. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In: Proc of OSDI, San Diego, CA, USA, 2008. 1-14
9 The Hadoop Distributed File System. http://hadoop.apache.org/hdfs/, accessed Feb. 11, 2010