Cloud Computing and Software Services: Theory and Techniques
Edited by Syed A. Ahson and Mohammad Ilyas
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL

by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
International Standard Book Number-13: (Ebook-PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access http://www.copyright.com/ or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Contents

Preface
Editors
Contributors

1 Understanding the Cloud Computing Landscape
  Lamia Youseff, Dilma M. Da Silva, Maria Butrico, and Jonathan Appavoo
2 Science Gateways: Harnessing Clouds and Software Services for Science
  Nancy Wilkins-Diehr, Chaitan Baru, Dennis Gannon, Kate Keahey, John McGee, Marlon Pierce, Rich Wolski, and Wenjun Wu
3 Enterprise Knowledge Clouds: Next Generation Knowledge Management Systems?
  Jeff Riley and Kemal Delic
4 Real Cases and Applications of Cloud Computing
  Jinzy Zhu
5 Large-Scale Data Processing
  Huan Liu
6 Toward a Reliable Cloud Computing Service
  Thomas J. Hacker
7 Abstractions for Cloud Computing with Condor
  Douglas Thain and Christopher Moretti
8 Exploiting the Cloud of Computing Environments: An Application's Perspective
  Raphael Bolze and Ewa Deelman
9 Granules: A Lightweight Runtime for Scalable Computing with Support for Map-Reduce
  Shrideep Pallickara, Jaliya Ekanayake, and Geoffrey Fox
10 Dynamic and Adaptive Rule-Based Workflow Engine for Scientific Problems in Distributed Environments
  Marc Frincu and Ciprian Craciun
11 Transparent Cross-Platform Access to Software Services Using GridSolve and GridRPC
  Keith Seymour, Asim YarKhan, and Jack Dongarra
12 High-Performance Parallel Computing with Cloud and Cloud Technologies
  Jaliya Ekanayake, Xiaohong Qiu, Thilina Gunarathne, Scott Beason, and Geoffrey Fox
13 BioVLAB: Bioinformatics Data Analysis Using Cloud Computing and Graphical Workflow Composers
  Youngik Yang, Jong Youl Choi, Chathura Herath, Suresh Marru, and Sun Kim
14 Scale-Out RDF Molecule Store for Efficient, Scalable Data Integration and Querying
  Yuan-Fang Li, Andrew Newman, and Jane Hunter
15 Enabling XML Capability for Hadoop and Its Applications in Healthcare
  Jianfeng Yan, Jin Zhang, Ying Yan, and Wen-Syan Li
16 Toward a QoS-Focused SaaS Evaluation Model
  Xian Chen, Abhishek Srivastava, and Paul Sorenson
17 Risk Evaluation-based Selection Approach for Transactional Services Composition
  Hai Liu, Kaijun Ren, Weimin Zhang, and Jinjun Chen

Index
Preface

Cloud computing has gained significant traction in recent years. The proliferation of networked devices, Internet services, and simulations has resulted in large volumes of data being produced. This, in turn, has fueled the need to process and store vast amounts of data. These data volumes cannot be processed by a single computer or a small cluster of computers. Furthermore, in most cases, these data can be processed in a pleasingly parallel fashion. The result has been the aggregation of a large number of commodity hardware components in vast data centers.

Among the forces that have driven the need for cloud computing are falling hardware costs and burgeoning data volumes. The ability to procure cheaper, more powerful CPUs, coupled with improvements in the quality and capacity of networks, has made it possible to assemble clusters at increasingly attractive prices. By facilitating access to an elastic set of resources (meaning that the available resource pool can expand or contract over time), cloud computing has demonstrable applicability to a wide range of problems in several domains.

Among the many applications that benefit from cloud computing and cloud technologies, the data/compute-intensive applications are the most important. The deluge of data and the highly compute-intensive applications found in many domains, such as particle physics, biology, chemistry, finance, and information retrieval, mandate the use of large computing infrastructures and parallel processing to achieve considerable performance gains in analyzing data. The addition of cloud technologies creates new trends in performing parallel computing. The introduction of commercial cloud infrastructure services has allowed users to provision compute clusters fairly easily and quickly by paying for the duration of their use of the resources.
The provisioning of resources happens in minutes, as opposed to the hours or days required in the case of traditional queue-based job-scheduling systems. In addition, the use of such virtualized resources allows the user to completely customize the virtual machine images and use them with administrative privileges, another feature that is hard to achieve with traditional infrastructures.

Appealing features within cloud computing include access to a vast number of computational resources and inherent resilience to failures. The latter feature arises because in cloud computing the focus of execution is not a specific, well-known resource but rather the best available one. The availability of open-source cloud infrastructure software and open-source virtualization software stacks allows organizations to build private clouds to improve the resource utilization of the available computation facilities. The possibility of dynamically provisioning additional resources by leasing from commercial cloud infrastructures makes the use of private clouds more promising. Another characteristic of many programs written for cloud computing is that they tend to be stateless. Thus, when failures do take place, the appropriate computations are simply relaunched with the corresponding datasets.

This book provides technical information about all aspects of cloud computing, from basic concepts to research grade material including future directions. It captures the current state of cloud computing and serves as a comprehensive source of reference material on this subject. It consists of 17 chapters authored by 50 experts from around the world. The targeted audience includes designers and/or planners for cloud computing systems, researchers (faculty members and graduate students), and those who would like to learn about this field. The book is expected to have the following specific salient features:

  To serve as a single comprehensive source of information and as reference material on cloud computing
  To deal with an important and timely topic of emerging technology of today, tomorrow, and beyond
  To present accurate, up-to-date information on a broad range of topics related to cloud computing
  To present the material authored by the experts in the field
  To present the information in an organized and well-structured manner

Although, technically, the book is not a textbook, it can certainly be used as a textbook for graduate courses and research-oriented courses that deal with cloud computing. Any comments from the readers will be highly appreciated.

Many people have contributed to this book in their own unique ways.
First and foremost, we would like to express our immense gratitude to the group of highly talented and skilled researchers who have contributed 17 chapters to this book. All of them have been extremely cooperative and professional. It has also been a pleasure to work with Rich O'Hanley and Jessica Vakili of CRC Press; we are extremely grateful to them for their support and professionalism. Special thanks are also due to our families who have extended their unconditional love and support throughout this project.

Syed Ahson
Seattle, Washington

Mohammad Ilyas
Boca Raton, Florida
Editors

Syed Ahson is a senior software design engineer at Microsoft. As part of the Mobile Voice and Partner Services group, he is currently engaged in research on new end-to-end mobile services and applications. Before joining Microsoft, Syed was a senior staff software engineer at Motorola, where he contributed significantly in leading roles toward the creation of several iDEN, CDMA, and GSM cellular phones. He has extensive experience with wireless data protocols, wireless data applications, and cellular telephony protocols. Before joining Motorola, Syed worked as a senior software design engineer at NetSpeak Corporation (now part of Net2Phone), a pioneer in VoIP telephony software.

Syed has published more than 10 books on emerging technologies such as Cloud Computing, Mobile Web 2.0, and Service Delivery Platforms. His recent books include Cloud Computing and Software Services: Theory and Techniques and Mobile Web 2.0: Developing and Delivering Services to Mobile Phones. He has authored several research articles and teaches computer engineering courses as adjunct faculty at Florida Atlantic University, Boca Raton, where he introduced a course on smartphone technology and applications. Syed received his MS in computer engineering from Florida Atlantic University in July 1998, and his BSc in electrical engineering from Aligarh University, India.

Dr. Mohammad Ilyas is an associate dean for research and industry relations and professor of computer science and engineering in the College of Engineering and Computer Science at Florida Atlantic University, Boca Raton, Florida. He is also currently serving as interim chair of the Department of Mechanical and Ocean Engineering. He received his BSc in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan. From March 1977 to September 1978, he worked for the Water and Power Development Authority, Lahore, Pakistan.
In 1978, he was awarded a scholarship for his graduate studies and he received his MS in electrical and electronic engineering in June 1980 from Shiraz University, Shiraz, Iran. In September 1980, he joined the doctoral program at Queen's University in Kingston, Ontario, Canada, where he completed his PhD. His doctoral research was about switching and flow control techniques in computer communication networks. Since September 1983, he has been with the College of Engineering and Computer Science at Florida Atlantic University. From 1994 to 2000, he was chair of the Department of Computer Science and Engineering. From July 2004 to September 2005, he served as interim associate vice president for research and graduate studies. For one academic year, he was on sabbatical leave with the Department of Computer Engineering, King Saud University, Riyadh, Saudi Arabia.

Dr. Ilyas has conducted successful research in various areas, including traffic management and congestion control in broadband/high-speed communication networks, traffic characterization, wireless communication networks, performance modeling, and simulation. He has published 1 book, 16 handbooks, and over 160 research articles. He has also supervised 11 PhD dissertations and more than 38 MS theses to completion. He has been a consultant to several national and international organizations. Dr. Ilyas is an active participant in several IEEE technical committees and activities and is a senior member of IEEE and a member of ASEE.
Contributors

Jonathan Appavoo
Department of Computer Science
Boston University
Boston, Massachusetts
and
IBM Thomas J. Watson Research Center
Yorktown Heights, New York

Chaitan Baru
San Diego Supercomputer Center
University of California at San Diego
La Jolla, California

Scott Beason
Community Grids Laboratory
Pervasive Technology Institute
Indiana University
Bloomington, Indiana

Raphael Bolze
Information Sciences Institute
University of Southern California
Marina del Rey, California

Maria Butrico
IBM Thomas J. Watson Research Center
Yorktown Heights, New York

Jinjun Chen
Centre for Complex Software Systems and Services
Faculty of Information and Communication Technologies
Swinburne University of Technology
Melbourne, Victoria, Australia

Xian Chen
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada

Jong Youl Choi
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Ciprian Craciun
Computer Science Department
West University of Timisoara
and
Research Institute e-Austria Timisoara
Timisoara, Romania

Dilma M. Da Silva
IBM Thomas J. Watson Research Center
Yorktown Heights, New York

Ewa Deelman
Information Sciences Institute
University of Southern California
Marina del Rey, California

Kemal Delic
Institut d'Administration des Entreprises
Université Pierre-Mendès-France
Grenoble, France

Jack Dongarra
Department of Electrical Engineering and Computer Science
University of Tennessee
Knoxville, Tennessee
and
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee

Jaliya Ekanayake
Community Grids Laboratory
Pervasive Technology Institute
Indiana University
and
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Geoffrey Fox
Community Grids Laboratory
Pervasive Technology Institute
Indiana University
and
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Marc Frincu
Computer Science Department
West University of Timisoara
and
Research Institute e-Austria Timisoara
Timisoara, Romania

Dennis Gannon
Data Center Futures
Microsoft Research
Redmond, Washington

Thilina Gunarathne
Community Grids Laboratory
Pervasive Technology Institute
Indiana University
and
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Thomas J. Hacker
Computer and Information Technology
Discovery Park Cyber Center
Purdue University
West Lafayette, Indiana

Chathura Herath
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Jane Hunter
School of Information Technology & Electrical Engineering
The University of Queensland
Brisbane, Queensland, Australia

Kate Keahey
Computation Institute
University of Chicago
Chicago, Illinois
and
Argonne National Laboratory
Argonne, Illinois

Sun Kim
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Wen-Syan Li
SAP Research China
Shanghai, People's Republic of China

Yuan-Fang Li
School of Information Technology & Electrical Engineering
The University of Queensland
Brisbane, Queensland, Australia

Hai Liu
College of Computer
National University of Defense Technology
Changsha, People's Republic of China

Huan Liu
Accenture Technology Labs
San Jose, California

Suresh Marru
School of Informatics and Computing
Indiana University
Bloomington, Indiana

John McGee
Renaissance Computing Institute
University of North Carolina
Chapel Hill, North Carolina

Christopher Moretti
Department of Computer Science and Engineering
University of Notre Dame
Notre Dame, Indiana

Andrew Newman
School of Information Technology & Electrical Engineering
The University of Queensland
Brisbane, Queensland, Australia

Shrideep Pallickara
Department of Computer Science
Colorado State University
Fort Collins, Colorado

Marlon Pierce
Community Grids Laboratory
Pervasive Technology Institute
Indiana University
Bloomington, Indiana

Xiaohong Qiu
Community Grids Laboratory
Pervasive Technology Institute
Indiana University
Bloomington, Indiana

Kaijun Ren
College of Computer
National University of Defense Technology
Changsha, People's Republic of China

Jeff Riley
School of Computer Science and Information Technology
RMIT University
Melbourne, Victoria, Australia

Keith Seymour
Department of Electrical Engineering and Computer Science
University of Tennessee
Knoxville, Tennessee

Paul Sorenson
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada

Abhishek Srivastava
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada

Douglas Thain
Department of Computer Science and Engineering
University of Notre Dame
Notre Dame, Indiana

Nancy Wilkins-Diehr
San Diego Supercomputer Center
University of California at San Diego
La Jolla, California

Rich Wolski
Eucalyptus Systems
University of California at Santa Barbara
Santa Barbara, California

Wenjun Wu
Computation Institute
University of Chicago
Chicago, Illinois

Jianfeng Yan
SAP Research China
Shanghai, People's Republic of China

Ying Yan
SAP Research China
Shanghai, People's Republic of China

Youngik Yang
School of Informatics and Computing
Indiana University
Bloomington, Indiana

Asim YarKhan
Department of Electrical Engineering and Computer Science
University of Tennessee
Knoxville, Tennessee

Lamia Youseff
Department of Computer Science
University of California, Santa Barbara
Santa Barbara, California

Jin Zhang
SAP Research China
Shanghai, People's Republic of China

Weimin Zhang
College of Computer
National University of Defense Technology
Changsha, People's Republic of China

Jinzy Zhu
IBM Cloud Computing Labs & HiPODS
IBM Software Group/Enterprise Initiatives
Beijing, People's Republic of China

Chapter 1
Understanding the Cloud Computing Landscape

Lamia Youseff, Dilma M. Da Silva, Maria Butrico, and Jonathan Appavoo

Contents
1.1 Introduction
1.2 Cloud Systems Classifications
1.3 SPI Cloud Classification
  1.3.1 Cloud Software Systems
  1.3.2 Cloud Platform Systems
  1.3.3 Cloud Infrastructure Systems
1.4 UCSB-IBM Cloud Ontology
  1.4.1 Applications (SaaS)
  1.4.2 Cloud Software Environment (PaaS)
  1.4.3 Cloud Software Infrastructure
  1.4.4 Software Kernel Layer
  1.4.5 Cloud Hardware/Firmware
1.5 Jackson's Expansion on the UCSB-IBM Ontology
1.6 Hoff's Cloud Model
1.7 Discussion
References

1.1 Introduction

The goal of this chapter is to present an overview of three different structured views of the cloud computing landscape: the SPI cloud classification, the UCSB-IBM cloud ontology, and Hoff's cloud model. Each of these three models strives to convey the interdependency between the different cloud systems as well as to show their potential and limitations. Furthermore, the models vary in their degree of simplicity and comprehensiveness in describing the cloud computing landscape. We find that these models are complementary and that by studying the three structured views, we get a general overview of the landscape of this evolving computing field.

1.2 Cloud Systems Classifications

The three cloud classification models present different levels of detail of the cloud computing landscape, since they emerged at different times in the evolution of this computing field. Although they have different objectives (some are for academic understanding of the novel research area, while others target identifying and analyzing commercial and market opportunities), they collectively expedite comprehending some of the interrelations between cloud computing systems. Although we present them in this chapter in chronological order of their emergence (which also happens to reflect the degree of detail of each model), this order does not reflect the relative importance or acceptance of one model over the other. Rather, the three models and their extensions are complementary, reflecting different views of the cloud.

We first present the SPI model, the oldest of the three, in Section 1.3. The second classification is the UCSB-IBM ontology, which we detail in Section 1.4. We also present a discussion of a recent extension to this ontology in Section 1.5. The third classification is Hoff's cloud model, which we present in Section 1.6.
We discuss the importance of these classifications and their potential impact on this emerging computing field in Section 1.7.

1.3 SPI Cloud Classification

As the area of cloud computing was emerging, the systems developed for the cloud were quickly stratified into three main subsets of systems: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Early on, these three subsets of the cloud were discussed by several cloud computing experts, such as in [24,30,31]. The SPI model was formed from this general classification of cloud systems; its name denotes the Software, Platform, and Infrastructure systems of the cloud, respectively.

1.3.1 Cloud Software Systems

This subset of cloud systems represents applications built for and deployed on the cloud over the Internet, which are commonly referred to as Software as a Service (SaaS). The target user of this subset of systems is the end user. These applications, which we shall refer to as cloud applications, are normally browser based with predefined functionality and scope, and they are sometimes accessed for a fee based on a usage metric predefined by the cloud SaaS provider. Some examples of SaaS are the Salesforce customer relationship management (CRM) system and Google Apps, such as Google Docs and Google Spreadsheets. SaaS is considered by end users to be an attractive alternative to desktop applications for several reasons. For example, having the application deployed at the provider's data center lessens the hardware and maintenance requirements on the users' side. Moreover, it simplifies the software maintenance process: because developers retain access to their software service deployed at the provider's data center, they can apply frequent upgrades and fixes to their applications.

1.3.2 Cloud Platform Systems

The second subset of this classification features the cloud platform systems. In this class of systems, denoted as Platform as a Service (PaaS), the provider supplies a platform of software environments and application programming interfaces (APIs) that can be utilized in developing cloud applications. Naturally, the users of this class of systems are developers who use specific APIs to build, test, deploy, and tune their applications on the cloud platform. One example of systems in this category is Google's App Engine, which provides Python and Java runtime environments and APIs for applications to interact with Google's runtime environment.
Arguably, Microsoft Azure can also be considered a platform service that provides an API and allows developers to run their applications in the Microsoft Azure environment. Developing an application for a cloud platform is analogous, to some extent, to developing a web application for the old web server model, in the sense that developers write code and deploy it on a remote server. For end users, the final result is a browser-based application. However, the PaaS model is different in that it can provide additional services to simplify application development, deployment, and execution, such as automatic scalability, monitoring, and load balancing. Furthermore, application developers can integrate other services provided by the PaaS system into their applications, such as authentication services and user interface components. All of this is provided through a set of APIs supplied by the platform. As a result, the PaaS class is generally regarded as accelerating software development and deployment. In turn, cloud software built for the cloud platform normally has a shorter time-to-market. Some academic projects have also emerged to support a more thorough understanding of PaaS, such as AppScale.
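
To make the developer's side of a platform service more concrete, the following is a minimal sketch of the kind of code a developer might hand to a Python runtime on a cloud platform. It assumes the platform accepts a standard WSGI callable (PEP 3333); the function names `application` and `call_locally` and the response text are illustrative choices, not taken from any particular provider's API. Under that assumption, hosting, scaling, and load balancing would be the platform's responsibility rather than the developer's.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI application: the unit of code a developer would deploy."""
    body = b"Hello from a cloud application\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_locally(app):
    """Invoke the app once without a server, as a local smoke test."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible fake request
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_locally(application)
print(status, body.decode().strip())
```

The same callable can be exercised locally, as above, and then deployed unchanged, which is one way to read the chapter's analogy between platform development and the older web server model.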

Another feature that typifies PaaS services is the provision of APIs for metering and billing information. Metering and billing permit application developers to more readily develop a consumption-based business model around their application. Such support helps integrate and enforce the relationships between end users, developers, PaaS providers, and any lower-level providers, while enabling developers and providers to realize economic value from their services.

1.3.3 Cloud Infrastructure Systems

The third class of systems, according to the SPI classification model, provides infrastructure resources, such as compute, storage, and communication services, in a flexible manner. These systems are denoted as Infrastructure as a Service (IaaS). Amazon's Elastic Compute Cloud (EC2) and the Enomalism elastic computing infrastructure are arguably the two most popular examples of commercial systems available in this cloud category. Recent advances in operating system (OS) virtualization have facilitated the implementation of IaaS and made it plausible on existing hardware. In this regard, OS virtualization technology enables a level of indirection with respect to direct hardware usage. It allows direct computer usage to be encapsulated and isolated in the container of a virtual machine (VM) instance. As a result, OS virtualization enables all software and associated resource usage of an individual hardware user to be treated as a schedulable entity that is agnostic to the underlying physical resources that it is scheduled to use. Therefore, OS virtualization allows IaaS providers to control and manage efficient utilization of the physical resources by enabling the exploitation of both time division and statistical multiplexing, while maintaining the familiar and flexible interface of individual standard hardware computers and networks for the construction of services using existing practices and software.
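
The statistical multiplexing argument can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a model of any real provider: each VM is assumed to need either one CPU or none in each time slot, independently with a fixed probability, and the function names and parameter values are hypothetical. It compares the capacity needed when every VM gets hardware sized for its own peak against the capacity needed when all VMs share hardware sized for the peak of their combined demand.

```python
import random


def simulate_demand(num_vms, num_slots, busy_prob, rng):
    """Return one demand trace per VM: 1 CPU when busy, 0 when idle (toy model)."""
    return [
        [1 if rng.random() < busy_prob else 0 for _ in range(num_slots)]
        for _ in range(num_vms)
    ]


def dedicated_capacity(traces):
    """CPUs needed if each VM has its own hardware sized for its own peak."""
    return sum(max(trace) for trace in traces)


def multiplexed_capacity(traces):
    """CPUs needed if all VMs share hardware sized for the aggregate peak."""
    return max(sum(slot) for slot in zip(*traces))


rng = random.Random(0)  # fixed seed so repeated runs give the same traces
traces = simulate_demand(num_vms=50, num_slots=1000, busy_prob=0.2, rng=rng)
dedicated = dedicated_capacity(traces)
shared = multiplexed_capacity(traces)
print(f"dedicated: {dedicated} CPUs, multiplexed: {shared} CPUs")
```

The aggregate peak can never exceed the sum of the individual peaks, and it is usually far smaller when the VMs are busy at different times; that gap is the utilization gain the provider captures.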
Virtualization-based multiplexing is particularly attractive to IaaS providers given the underutilization of the energy-hungry, high-speed processors that constitute the infrastructure of data centers. Amazon's infrastructure service, EC2, is one example of an IaaS system, where users can rent computing power on Amazon's infrastructure by the hour. In this space, there are also several academic open-source cloud projects, such as Eucalyptus and Virtual Workspaces.

1.4 UCSB-IBM Cloud Ontology

The UCSB-IBM cloud ontology emerged through a collaboration between academia (University of California, Santa Barbara) and industry (IBM T.J. Watson Research Center) in an attempt to understand the cloud computing landscape. The end goal of this effort was to facilitate the exploration of the cloud computing area as well as to advance educational efforts in teaching and adopting cloud computing.

Figure 1.1 UCSB-IBM Cloud Computing Classification Model depicted as five layers, with three constituents to the cloud infrastructure layer. From top to bottom, the layers are: cloud applications (e.g., SaaS); cloud software environments (e.g., PaaS); cloud software infrastructures, comprising computational resources (IaaS), storage (DaaS), and communications (CaaS); software kernels and middleware; and firmware/hardware (HaaS).

In this classification, the authors used the principle of composability from a Service-Oriented Architecture (SOA) to classify the different layers of the cloud. Composability in SOA is the ability to coordinate and assemble a collection of services to form composite services. In this sense, cloud services can also be composed of one or more other cloud services. By the principle of composability, the UCSB-IBM model classified the cloud in five layers. Each layer encompasses one or more cloud services. Cloud services belong to the same layer if they have an equivalent level of abstraction, as evidenced by their targeted users. For example, all cloud software environments (also known as cloud platforms) target programmers, while cloud applications target end users. Therefore, cloud software environments would be classified in a different layer than cloud applications. In the UCSB-IBM model, the five layers compose a cloud stack, where one cloud layer is considered higher in the cloud stack if the services it provides can be composed from the services that belong to the underlying layer. The UCSB-IBM cloud model is depicted in Figure 1.1. The first three layers of the UCSB-IBM cloud are similar to the SPI classification, except that the authors break the infrastructure layer into three components. The three components that compose the UCSB-IBM infrastructure layer are computational resources, storage, and communications.
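
The layering rule described above can be sketched as a small data model. This is an illustration under stated assumptions rather than part of the ontology itself: the service names are hypothetical, and the check assumes that a service may be composed from services in its own layer or any lower layer, which is one possible reading of the composability principle.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Layer(IntEnum):
    """The five layers of the stack, numbered from the bottom up."""
    FIRMWARE_HARDWARE = 1
    SOFTWARE_KERNEL = 2
    SOFTWARE_INFRASTRUCTURE = 3  # computational resources, storage, communications
    SOFTWARE_ENVIRONMENT = 4     # platforms, targeted at programmers
    APPLICATION = 5              # targeted at end users


@dataclass
class Service:
    name: str
    layer: Layer
    composed_from: List["Service"] = field(default_factory=list)


def respects_stack(service: Service) -> bool:
    """True if the service, and everything beneath it, is composed only from
    services in the same layer or a lower one (an assumed reading of the rule)."""
    return all(
        part.layer <= service.layer and respects_stack(part)
        for part in service.composed_from
    )


# Hypothetical services, named for illustration only.
vm_hosting = Service("vm_hosting", Layer.SOFTWARE_INFRASTRUCTURE)
blob_store = Service("blob_store", Layer.SOFTWARE_INFRASTRUCTURE)
web_platform = Service("web_platform", Layer.SOFTWARE_ENVIRONMENT, [vm_hosting, blob_store])
crm_app = Service("crm_app", Layer.APPLICATION, [web_platform])

# A service that reaches upward in the stack violates the ordering.
inverted = Service("inverted", Layer.SOFTWARE_INFRASTRUCTURE, [crm_app])

print(respects_stack(crm_app))   # True
print(respects_stack(inverted))  # False
```

Using an ordered enumeration makes the phrase "higher in the cloud stack" a plain integer comparison, which keeps the check short.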
In the rest of this section, we explain this ontology's components in more detail.

1.4.1 Applications (SaaS)

Similar to the SPI model, the first layer is the cloud application layer. The cloud application layer is the layer most visible to the end users of the cloud. Normally, users access the services provided by this layer through the browser via web