Grant Agreement No. FP7-318484

Title: Analysis of existing Cloud technologies and Cloud modelling concepts and prototype requirements
Authors: Nicolas Ferry (SINTEF), Arnor Solberg (SINTEF), Alessandro Rossini (SINTEF), Santo Lombardo (POLIMI), Oscar Locatelli (POLIMI), Marco Brambilla (POLIMI), Marcos Almeida (SOFTEAM), Anthonin Abhervé (SOFTEAM)
Editor: Nicolas Ferry (SINTEF)
Reviewers: Tabassum Sharif (FLEXI) and Giuliano Casale (Imperial)
Identifier: Deliverable # D4.1
Nature: Report
Version: 1
Date: 1 April 2013
Status: Final
Diss. level: Public

Executive Summary

This deliverable presents an analysis of the state of the art in Cloud technologies and Cloud modelling concepts. This analysis is done at both the Cloud-enabled Computation Independent Model and Cloud Provider Independent Model levels. Subsequently, it presents the WP4 requirements according to the requirements specification template provided in D3.1.1. The approach is use case based.

Copyright 2013 by the MODAClouds consortium. All rights reserved. The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement no. 318484 (MODAClouds).
Members of the MODAClouds consortium:

Politecnico di Milano (Italy)
Stiftelsen SINTEF (Norway)
Institute E-Austria Timisoara (Romania)
Imperial College of Science, Technology and Medicine (United Kingdom)
SOFTEAM (France)
Siemens Program and System Engineering (Romania)
BOC Information Systems GMBH (Austria)
Flexiant Limited (United Kingdom)
ATOS Spain S.A. (Spain)
CA Technologies Development Spain S.A. (Spain)

Published MODAClouds documents

These documents are all available from the project website located at http://www.modaclouds.eu/

Public Final Version, Dated April 1st 2013
Contents

1 INTRODUCTION
1.1 CONTEXT AND OBJECTIVES
1.2 STRUCTURE OF THE DOCUMENT
2 SURVEY CLOUD TECHNOLOGIES AND CLOUD MODELLING CONCEPTS
2.1 KEY CHALLENGES FROM MODACLOUDML GENERAL OBJECTIVES AND CASE STUDIES
2.1.1 MODACloudML general objectives
2.1.2 Case study 1: Project management server
2.1.3 Case study 2: Business process modelling system
2.1.4 Case study 3: Health-care application
2.1.5 Case study 4: A smart city urban safety planner
2.1.6 Summary of MODACloudML challenges
2.2 STATE OF THE ART
2.2.1 Cloud-enabled Computation Independent Modelling concepts and technologies
2.2.2 Modelling concepts and technologies for the provisioning, deployment and adaptation of applications in the cloud (CPIM/CPSM)
2.2.3 Data persistence
2.3 DESIGN TIME SCHEMA TRANSFORMATION
2.3.1 Synthesis of the state of the art
3 REQUIREMENT SPECIFICATION
3.1 CONTEXT AND SYSTEM OVERVIEW
3.1.1 Context
3.1.2 System boundary model
3.2 USE CASE SPECIFICATION FOR THE CPIM LEVEL SPECIFICATION
3.2.1 Use case heading
3.2.2 Use case description
3.2.3 Use case scenarios
3.2.4 Information model
3.2.5 Interface specification
3.2.6 QoS requirements
3.3 USE CASE SPECIFICATION FOR THE CPSM DERIVATION
3.3.1 Use case heading
3.3.2 Use case description
3.3.3 Use case scenarios
3.3.4 Information model
3.3.5 Interface specification
3.3.6 QoS requirements
3.4 USE CASE SPECIFICATION FOR THE CLOUDAPP PROVISIONING AND DEPLOYMENT USE CASES
3.4.1 Use case heading
3.4.2 Use case description
3.4.3 Use case scenarios
3.4.4 Information model
3.4.5 Interface specification
3.4.6 QoS requirements
3.5 USE CASE SPECIFICATION FOR THE MODEL BASED RUNTIME MANAGEMENT AND ADAPTATION USE CASES
3.5.1 Use case heading
3.5.2 Use case description
3.5.3 Use case scenarios
3.5.4 Information model
3.5.5 Interface specification
3.5.6 QoS requirements
3.6 CIM MODELLING SUPPORT
3.6.1 Context and system overview
3.6.2 Use case specification for the Define Application Services use case
3.6.3 Use case specification for the Define Services Orchestration use case
3.6.4 Use case specification for the Define Service Requirements use case
4 ROADMAP
5 BIBLIOGRAPHY
Table of Figures

Figure 1 MODAClouds Architecture
Figure 2 The stack of cloud solutions
Figure 3 Anatomy of an application in Cloudify
Figure 4 Overview of the models@runtime approach
Figure 5 ORM layer
Figure 6 Key-Value Data Model
Figure 7 Document-based Data Model
Figure 8 Column-oriented Data Model
Figure 9 Graph-based Data Model
Figure 10 Scope with respect to the MODAClouds reference architecture; Run time adaptation, Data Synchronization and MODACloudML CIM, CPIM and CPSM modelling.
Figure 11 Overall MODAClouds approach, the scope of WP 4 is indicated by the blue squares, while the green squares are interacting elements.
Figure 12 System boundary model.
Figure 13 Main scenarios of the CPIM level specification.
Figure 14 Information model for PIM level specification
Figure 15 Main scenarios of the CPSM derivation.
Figure 16 Main scenarios of the CPSM derivation.
Figure 17 Exemplifying the vision of a provisioning and deployment wizard
Figure 18 Main scenarios of the model based management and adaptation use cases
Figure 19 CIM Model
Figure 20 System boundary model
Figure 21 Use Case Figure
Figure 22 Information: Define Application Services
Figure 23 Use Case diagram
Figure 24 Information: Define Services Orchestration
Figure 25 Requirements Use Case diagram
Figure 26 Information: Define Service Requirements
1 Introduction

1.1 Context and objectives

Cloud computing is a computing model enabling ubiquitous network access to a shared and virtualised pool of computing capabilities (e.g., network, storage, processing, and memory) that can be rapidly provisioned with minimal management effort [1]. The landscape of cloud computing encompasses a multitude of cloud providers, as well as several infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) [1] solutions. The ability to run, monitor and adapt multi-cloud systems (i.e., applications and services on multiple clouds) allows exploiting the peculiarities of each cloud solution and hence optimising the performance, availability, and cost of the applications and services. However, these cloud solutions are typically heterogeneous and the features they provide are often incompatible. This diversity is an obstacle to promoting interoperability and preventing vendor lock-in. Indeed, it hinders the exploitation of the full potential of cloud computing by increasing the complexity of development and administration of multi-cloud systems.

Several academic and industrial projects aim at addressing this challenge (see [2], [3], [4], [5], to mention but a few) by providing seamless solutions for provisioning, deployment, monitoring and adaptation of cloud systems. The results from these projects are of paramount importance in promoting interoperability and preventing vendor lock-in, but they are not sufficient to properly manage the complexity of development and administration of multi-cloud systems [6].

MODAClouds proposes a model-driven approach for the design and execution of applications on multiple clouds.
The model-driven approach, commonly summarised as "model once, generate anywhere", is particularly relevant when it comes to provisioning and deploying applications and services across multiple clouds, as well as migrating the source code from one cloud to another. The model-driven engineering (MDE) approach adopted allows developers to build the system at various levels of abstraction. As depicted in Figure 1, the three levels envisioned are: (i) the Cloud-enabled Computation Independent Model (CIM) to describe an application and its data, (ii) the Cloud-Provider Independent Model (CPIM) to describe cloud concerns related to the application in a cloud-agnostic way, and (iii) the Cloud-Provider Specific Model (CPSM) to describe the cloud concerns needed to deploy and provision the application on a specific cloud.

On the basis of this architecture, this document presents the state of the art in cloud modelling concepts and environments at all levels, as well as requirements for the MODACloudML platform that will be developed within WP 4. This platform provides methods and techniques for provisioning, deployment, and adaptation on multiple clouds. In this respect, CPIM and CPSM can be regarded as the models that can be manipulated by tools for provisioning, deployment and adaptation of cloud-based applications. At these levels, the deliverable presents a classification of the state of the art in tools for the provisioning and deployment of applications in the cloud and discusses their use of MDE. Since this discussion focuses on computational resources and elements of an application, the deliverable also presents the state of the art in data persistence. One outcome of the latter is to help express preferences, for instance when choosing a database. Then, in the rest of the deliverable, we specify WP4 requirements at all levels (CIM, CPIM and the derivation of CPSM) and for the provisioning, deployment and adaptation features. These specifications are driven and illustrated by use cases.
[Figure 1 MODAClouds Architecture. Labels recoverable from the figure: Design-time, Developer, IDE, DSS, Decision making, New or legacy applications design, Code development, Semi-automatic transformation, Automatic deployment, CIM, CPIM, CPSM, Run-time, Service Operator, Management, Monitoring & Data synchronization, Run-time adaptation, Goal: QoS assurance & costs minimization.]

1.2 Structure of the document

The remainder of the document is organised as follows. Section 2 highlights some challenges related to the definition of MODACloudML, and presents an analysis of the state of the art of cloud technologies and cloud modelling concepts on the basis of these challenges. In particular, it outlines the state of the art in modelling concepts and technologies at the CIM level, and then it discusses cloud concerns at CPIM and CPSM levels. On this basis, Section 3 presents the specifications of WP4 requirements, based on use cases. Finally, Section 4 presents the roadmap for WP4.
2 Survey cloud technologies and cloud modelling concepts

This Section presents an analysis of the state of the art in cloud technologies and cloud modelling concepts. The different works presented are discussed with respect to the challenges to be addressed within MODACloudML.

2.1 Key challenges from MODACloudML general objectives and case studies

This Section presents the general objectives of MODACloudML and the associated challenges. These challenges are extracted from the MODAClouds case studies.

2.1.1 MODACloudML general objectives

The MODACloudML platform will provide a software engineering methodology and a tool for the model-driven design of multi-cloud systems (i.e., applications and services on multiple clouds). The purpose of the proposed modelling approach is to hide the technical details of cloud providers from designers while helping them to fully exploit the providers' peculiarities by capturing requirements on distribution, QoS, and other concerns that are important when deploying an application in the clouds. The scope of MODACloudML is to provide methods and techniques for provisioning, deployment, and adaptation on multiple clouds (please note that with this multi-cloud aspect the scope of the platform is broader than the scope of the runtime). The engineering of components, services or any other software artefact can be done using any of the many approaches that already exist for that purpose.

2.1.2 Case study 1: Project management server

The Modelio TeamWork server is a versioned repository which stores models defined within the Modelio CASE tool. A prototype of this server is currently deployed on top of Softeam's own experimental cloud infrastructure. In a production environment, some QoS requirements should be respected and, for instance, during peaks of demand on the server, we should be able to move the server from one provider to another or scale the system.
Extracted challenges: When migrating from one cloud to another, the model-driven approach adopted by MODACloudML should abstract from IaaS provider-specific concepts to ease the process and thus decrease the cost of migration. However, this abstraction should not prevent exploiting providers' peculiarities, for instance, for scalability purposes. This scalability concern also requires that the models of the system be adapted at runtime to scale up and down.

2.1.3 Case study 2: Business process modelling system

ADOxx is an application that supports business process modelling. From the end-user perspective, deploying this application on the cloud could bring benefits in terms of availability and performance. Report analysis is a complex activity that can be time consuming depending on several factors such as the size of the model repository or the complexity of the analysis. In such cases, this analysis is executed asynchronously as a batch job during the night. By using appropriate cloud providers and solutions (such as cloud elasticity) this time can be reduced by 50%.

Extracted challenges: A challenge for MODACloudML from this case study is the migration of the legacy application to the cloud. MODACloudML should be agnostic to any development paradigm and technology, meaning that developers can design and implement the applications and services based on their preferred paradigms and technologies. Another challenge is to provide a methodology and tools to fully exploit clouds' peculiarities by focusing on cloud concerns rather than implementation details.

2.1.4 Case study 3: Health-care application

The Health-care application is an existing monolithic application for the home-based treatment of patients affected by dementia and also for the formulation of a home care strategy by their care-givers.
A high-level architecture of the Health-care MODAClouds case study includes:
(i) The patients' and carers' data storage, which will benefit from the cloud in terms of scalability and performance and will be deployed in a private IaaS to better address privacy and security issues.
(ii) A Server Application that implements the core functionalities of the platform, i.e. secure communication with client applications, risk assessment and analysis, and adverse event detection, plus a Web-based Graphical User Interface (GUI) for clinicians and platform administrators. This will benefit from the cloud in terms of scalability by taking advantage of the auto-scaling and pay-per-use features the cloud offers, and it will require the cloud to establish a high-level SLA oriented to the application and not to the infrastructure.
(iii) A Carers Client Application used by carers and patients to access the services of the platform and deployed on the cloud as a Virtual Desktop. This requires allocating and de-allocating environments as carers come and go.

Extracted challenges: The described cloud-based applications will be deployed in a federated multi-cloud PaaS/IaaS infrastructure that implements a hybrid cloud scenario. MODACloudML should provide modelling concepts for multiple private, public, or hybrid clouds at both IaaS and PaaS levels. MODACloudML should enable the deployment of such multi-cloud systems.

2.1.5 Case study 4: A smart city urban safety planner

The goal of this case study is to develop a city urban safety planner for the management of fire incidents. The scenario considers an area where a high-density population is served by an old, hard-to-maintain gas pipe network. Gas detectors, traffic sensors in the road, CCTV cameras and electricity circuit breakers are in place and are managed by an already existing Internet of Things (IoT) platform. The goal of the case study is to develop a city planner able to predict: (i) the potential failure of gas detector sensors by analysing data from gas sensors, and (ii) the impact of a fire by analysing the videos taken from nearby CCTV cameras. The planner should be deployed on PaaS with different characteristics. The processing power of the infrastructure also has to scale up and down very quickly to manage the peaks of data flows, which are not constant across the measuring areas. Finally, data replication and migration mechanisms have to be put in place to avoid loss of data in case of failure of one of the application instances.

Extracted challenges: MODACloudML should support the deployment of the same application on multiple clouds.
The planner should be easily deployable on different PaaS; hence, MODACloudML should enable the deployment of applications on PaaS in a cloud-agnostic way. Another challenge from this case study is related to data migration and replication. MODACloudML should enable data replication on multiple clouds.

2.1.6 Summary of MODACloudML challenges

On the basis of the challenges extracted from our four case studies, we can summarize the challenges of the MODACloudML platform as follows:
- Providing modelling concepts for the provisioning, deployment and adaptation of applications on multiple clouds, mixing both IaaS and PaaS levels.
- Providing, through the model-driven architecture, various levels of abstraction that allow developers to abstract from IaaS/PaaS provider-specific concepts while fully exploiting clouds' peculiarities.
- Being agnostic to any development paradigm and technology, meaning that developers can design and implement applications and services based on their preferred paradigms and technologies.
- Providing an approach and concepts for data replication on multiple clouds at both IaaS and PaaS levels.

The analysis of the state of the art will focus on the ability of the various modelling concepts and technologies presented to address some of these challenges.

2.2 State of the art

The state of the art is organized on the basis of the MODAClouds architecture. We present modelling concepts and technologies first at the CIM level and then at the CPIM level. The latter is decomposed into two subsections on (i) provisioning, deployment and adaptation of software artefacts and (ii) data persistence.

2.2.1 Cloud-enabled Computation Independent Modelling concepts and technologies

The Cloud-enabled Computation Independent Model (CIM) describes cloud applications at the service level. Hence, for a given application, it contains the description of the services that compose it.
In addition, it contains the public interfaces of each service, the business processes that describe their orchestration and the domain model of the data exchanged by them through their public interfaces. The CIM also supports requirements defining constraints associated with the application, and a model of the resource consumption associated with the use of its services. In this Section we describe the existing and related technologies that allow a developer to describe each part of the CIM.
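To make the ingredients of a CIM listed above more concrete, the following is a minimal sketch, not the MODACloudML metamodel: it merely groups services, their public operations, a domain model, an orchestration, requirements and resource demands in one structure and checks that they reference each other consistently. All class, field and metric names (CimModel, Requirement, "response_time_ms", the example services) are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Operation:
    """One operation of a service's public interface (hypothetical shape)."""
    name: str
    input_entity: str    # domain entity consumed
    output_entity: str   # domain entity produced


@dataclass
class Service:
    name: str
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Requirement:
    """A constraint attached to a service, e.g. an upper bound on a metric."""
    service: str
    metric: str          # illustrative metric name, not a standard vocabulary
    max_value: float


@dataclass
class CimModel:
    services: List[Service]
    domain_entities: List[str]
    orchestration: List[Tuple[str, str]]   # (caller, callee) service names
    requirements: List[Requirement] = field(default_factory=list)
    resource_demand: Dict[str, float] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return human-readable consistency problems (empty if none)."""
        known = {s.name for s in self.services}
        entities = set(self.domain_entities)
        found = []
        for s in self.services:
            for op in s.operations:
                for e in (op.input_entity, op.output_entity):
                    if e not in entities:
                        found.append(f"{s.name}.{op.name}: unknown entity {e}")
        for caller, callee in self.orchestration:
            for name in (caller, callee):
                if name not in known:
                    found.append(f"orchestration: unknown service {name}")
        for r in self.requirements:
            if r.service not in known:
                found.append(f"requirement: unknown service {r.service}")
        for name in self.resource_demand:
            if name not in known:
                found.append(f"resource demand: unknown service {name}")
        return found


# Hypothetical example: a front end calling a model repository.
example = CimModel(
    services=[
        Service("Repository", [Operation("commit", "Model", "Revision")]),
        Service("Frontend", [Operation("submit", "Model", "Revision")]),
    ],
    domain_entities=["Model", "Revision"],
    orchestration=[("Frontend", "Repository")],
    requirements=[Requirement("Repository", "response_time_ms", 500.0)],
    resource_demand={"Repository": 0.2},
)
print(example.problems())  # prints []
```

Note that nothing in this structure mentions a cloud provider or a deployment target, which is the point of a computation independent model; the languages surveyed in this Section offer far richer notations for each of these parts.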
We start this Section by analysing approaches that are able to describe services and their interaction. This is the case of the Service Oriented Architecture (SOA) related technologies, which define applications as sets of services that communicate through well-defined interfaces. SOA-enabled technologies may therefore be used to define cloud-enabled applications without going into the fine details of deployment. Most of the time, services are simply modelled by means of general purpose languages such as UML. For example, in [7], the generic UML concepts of class and interface are used to define services and their public interfaces respectively. Some service-specific languages have also been designed for SOA, for example SoaML [8] and SOMF [9]. SoaML defines a MOF metamodel and a UML profile, while SOMF defines a completely new language for defining service-related concepts. SoaML reuses and extends the UML concepts of components and ports to define respectively the services and their interfaces. Classes are used to define the entities in the domain model of the application that are manipulated by the services. SOMF also includes a sublanguage called Cloud Computing Modelling Notation (CCMN), whose concepts (such as IaaS, PaaS and SaaS clouds, clouds of clouds, and service orchestration based on activity diagrams) could be embedded into MODACloudML.

Apart from these industry-driven initiatives, academic works such as the Unified Service Description Language (USDL) [10] also provide the necessary abstraction to describe a service oriented architecture. This language goes even further by allowing designers to specify, besides services and their interfaces, non-functional aspects of these services (e.g. marketing, pricing, legal, certification, documentation, etc.). These aspects are, however, not cloud specific. This is a drawback that will be dealt with by MODACloudML.

The works outlined in the previous paragraph deal with the generic concept of service.
Other works address the specific concept of Web Service. These works can be separated into two groups. The first group is formed by languages like WSDL [11], which enable the specification of a list of services, interfaces, data types and orchestration processes at a syntactical level. The second group is formed by the so-called Semantic Web languages such as WSML [12] and OWL-S [13], which enable the specification of the semantics of the services, besides their syntax. The semantics is defined by means of logic formalisms with the objective of achieving high-level services such as automatic service selection, discovery and composition. This is, however, out of the scope of MODAClouds.

The main drawback of the approaches in the previous category is that they do not allow for the description of non-functional requirements and constraints. There are, however, other approaches that take this kind of constraint into consideration. For example, several extensions of WSDL add non-functional requirements to service interface descriptions [14] [15]. These extensions usually consist of logical languages that link each service to a list of so-called policy assertions that it should enforce at runtime. At the modelling level, the OMG UML profile for QoS [16] also allows a designer to specify QoS requirements and to connect them to service descriptions. Reusing and extending such languages is part of the scope of MODAClouds. For more details about modelling languages allowing the specification of QoS constraints and requirements, please refer to deliverable D5.1.

Another important aspect of the CIM that is usually neglected by service specifications is the model of the resource consumption associated with the services that compose the application.
In MODAClouds, this piece of information is used both to find a CPIM and CPSM combination that respects the constraints and requirements imposed by the CIM, and to provide a feedback loop in which this part of the model can be updated from monitoring information collected at runtime. The state of the art in the representation of resource consumption is, however, not in the scope of the present document. For more details, please refer to deliverable D2.1.

Besides the service definitions and their associated requirements and resource consumption information, the CIM should also provide a description of the interactions between services, which is called service orchestration. There are several languages for the definition of service orchestrations [17] [18] [19] [20] [21] [22]. In this Section we focus on standards such as the Business Process Execution Language (BPEL) [17] and the Web Services Choreography Description Language (WS-CDL) [18]. They have been proposed by OASIS and W3C respectively and focus on service orchestration in the implementation phase. Their specifications should therefore be detailed and executable. The execution of such specifications is often delegated to execution engines. These engines do not target cloud computing environments. MODACloudML orchestration specifications at the CIM level do not target full executability, since they are intended to be transformed into a CPIM by means of a semi-automated mapping.

Other languages, such as the one presented in [22], represent academic efforts in defining high-level languages intended for the initial phases of a project, in which high-level declarative descriptions of processes are preferred to low-level imperative ones. In [22], message exchanges between services are used to describe interactions, which are then composed into choreographies. The main drawback of such choreographies is that they are intended only for the initial phases of the project, i.e. their mapping into lower-level orchestration descriptions is not supported. In the MODAClouds approach, the orchestration at the CIM level is semi-automatically mapped into a CPIM level architecture that implements such orchestration.

Finally, the efforts in defining Enterprise Architecture Frameworks (EAF) may be useful for defining cloud-enabled computation independent models of applications. This is because EAFs are intended to represent the abstract working of enterprises, from their interactions with external actors to their internal services and orchestration protocols. We can cite two important standards in this domain: RM-ODP [23] and TOGAF [24]. For example, TOGAF includes the description of business services and their interfaces, and uses data entities to specify the data handled by these services. Both standards define custom multi-view metamodels which are mapped into UML profiles. The main drawback of such metamodels is that they do not target cloud applications, e.g. cloud-specific requirements and QoS constraints cannot be defined and enforced.

The CIM is provided as an input to cloud solutions for the provisioning and deployment of applications on the cloud, together with resources such as the code of the application to be deployed (e.g., a war or jar file). Within MODAClouds this model will be semi-automatically translated and further refined at the CPIM level. CPIM and CPSM include cloud concepts such as IaaS, PaaS or SaaS elements. They are basically the models that can be manipulated by tools for provisioning, deployment and adaptation of cloud-based applications. The next Section presents the state of the art in such tools.
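The relation between a provider-independent and a provider-specific model can be illustrated with a small sketch. It is not the MODAClouds transformation: it assumes a CPIM reduced to a list of node requirements, a hypothetical catalogue of provider offerings with made-up instance types and prices, and a simple "cheapest offering that satisfies the requirement" rule standing in for the refinement step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeRequirement:
    """CPIM level: what a node needs, with no provider mentioned."""
    name: str
    min_cores: int
    min_ram_gb: float


@dataclass(frozen=True)
class Offering:
    """One entry of a provider catalogue (all values are invented)."""
    provider: str
    instance_type: str
    cores: int
    ram_gb: float
    hourly_cost: float


@dataclass(frozen=True)
class ProvisionedNode:
    """CPSM level: a node bound to a concrete provider offering."""
    name: str
    offering: Offering


def derive_cpsm(cpim: List[NodeRequirement],
                catalogue: List[Offering],
                provider: str) -> List[ProvisionedNode]:
    """Bind every CPIM node to the cheapest adequate offering of `provider`."""
    result = []
    for node in cpim:
        candidates = [
            o for o in catalogue
            if o.provider == provider
            and o.cores >= node.min_cores
            and o.ram_gb >= node.min_ram_gb
        ]
        if not candidates:
            raise ValueError(f"{provider} has no offering for {node.name}")
        cheapest = min(candidates, key=lambda o: o.hourly_cost)
        result.append(ProvisionedNode(node.name, cheapest))
    return result


# Hypothetical catalogue and CPIM, for illustration only.
catalogue = [
    Offering("provider-a", "a.small", 1, 2.0, 0.05),
    Offering("provider-a", "a.large", 4, 16.0, 0.40),
    Offering("provider-b", "b.medium", 2, 4.0, 0.12),
    Offering("provider-b", "b.xlarge", 8, 32.0, 0.90),
]
cpim = [
    NodeRequirement("app-server", min_cores=2, min_ram_gb=4.0),
    NodeRequirement("database", min_cores=4, min_ram_gb=16.0),
]

for target in ("provider-a", "provider-b"):
    types = [n.offering.instance_type for n in derive_cpsm(cpim, catalogue, target)]
    print(target, types)
```

The same CPIM yields a different CPSM for each target provider, which is the property that makes migration between clouds cheaper when the provider-independent model is kept as the primary artefact.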
2.2.2 Modelling concepts and technologies for the provisioning, deployment and adaptation of applications in the cloud (CPIM/CPSM)

The cloud market counts numerous cloud solutions at different levels of the cloud stack, such as IaaS providers, IaaS/PaaS libraries, as well as PaaS frameworks. As mentioned, this diversity hinders interoperability and promotes vendor lock-in. In the following we present each of these solutions and explain how they build upon each other to form a stack (see Figure 2). We also discuss the representations and models used, and sometimes provided, by some of these solutions.

[Figure 2 The stack of cloud solutions]

2.2.2.1 Providers

There is nowadays a plethora of providers. The literature encompasses several taxonomies and surveys of providers [ProdanOstermann09,LiYangKZ10], but the cloud computing market has been constantly evolving in recent years, and data collected just a few years ago is already outdated.
Table 1 shows a classification that outlines current major public IaaS providers. The list of providers is by no means exhaustive, but it includes the ones that we believe are the current major players, at least in the European and North American markets. This classification is based on headquarters, data centres' location, and uptime service level agreement (SLA).

Table 1 Providers

Provider | Headquarters | Data centres location | Uptime SLA | IaaS stack
Amazon AWS | USA | USA, Brazil, Ireland, Japan, Singapore, Australia | 99.95% | Proprietary
AT&T Cloud Architect | USA | USA | 100.00% | OpenStack
Bit Refinery | USA | USA, UK | 100.00% | VMWare vcloud
GoGrid | USA | USA, Netherlands | 100.00% | Proprietary
Google Compute Engine | USA | USA, EU (Unspecified) | 99.95% | Proprietary
Hosting.com | USA | USA | 100.00% | VMWare vcloud
HP Cloud | USA | USA | 99.95% | OpenStack
IBM SmartCloud Enterprise | USA | USA, Germany, Japan | 99.90% | OpenStack
Microsoft Windows Azure | USA | USA, Ireland, Netherlands, Hong Kong, Singapore | 99.95% | Proprietary
Nephoscale | USA | USA | 99.90% | Proprietary, OpenStack (Storage only)
OpSource | USA | USA, France, UK | 100.00% | Proprietary
RackSpace | USA | USA, UK, Hong Kong | 100.00% | OpenStack
ReliaCloud | USA | USA | 100.00% | VMWare vcloud
Softlayer | USA | USA, Netherlands, Singapore | 100.00% | Proprietary, OpenStack (Storage only)
Terramark | USA | USA, Canada, Brazil, Colombia, Dominican Republic, Belgium, France, Germany, Ireland, Italy, Luxembourg, Netherlands, Spain, Sweden, Turkey, UK, China, Japan, Singapore, Australia | 100.00% | VMWare vcloud
Aruba Cloud | Italy | Italy | 99.95% | Proprietary
CloudSigma | Switzerland | Switzerland, USA | 100.00% | Proprietary
Gandi | France | France, USA | 99.95% | Proprietary
GreenQloud | Iceland | Iceland | 100.00% | CloudStack
Lunacloud | UK | France, Germany, Latvia, Portugal | 99.99% | Proprietary (AWS EC2/S3 compatible)
Memset | UK | UK | 99.99% | Proprietary, OpenStack (Storage only)

The headquarters column shows that 15 providers are based in the USA while only six are based in Europe. However, the data centres' location column shows that 17 providers have data centres in the USA while 16 have data centres in Europe. This information is particularly relevant with respect to data protection laws and regulations, such as the EU data protection directive (Directive 95/46/EC) and the upcoming data protection regulation (to be adopted in 2014), which restrict the geographical locations where, for instance, the data of EU residents can be stored and processed.

The uptime SLA column shows that all the providers promise at least 99.9% uptime. This indicates that for many applications there is no significant difference in terms of uptime SLAs; for some types of applications, however, it is important to exceed 99.9% uptime and approach 100%. Moreover, the uptime SLA does not reflect the actual uptime, but rather a contract between the provider and its clients, and recent years have witnessed several severe outages at major providers [Jansen11]. The interested reader may use CloudSleuth's Global Provider View (see [25]) to understand the reliability and consistency of the most popular providers.

Public providers (see Section 2.2.2.1) have traditionally been offering a set of proprietary APIs for the provisioning, deployment, monitoring, and (partially) adaptation of cloud capabilities. Some minor providers have been implementing APIs which are compatible with the ones from leading providers, such as the Amazon AWS [26] APIs.
This solution may increase the interoperability across some providers. However, it does not solve the vendor lock-in problem. Since, as explained in Section 2.1.6, MODACloudML targets the provisioning, deployment and adaptation of applications on multiple clouds, vendor lock-in remains an issue with respect to this multi-cloud concern.

2.2.2.2 Stacks

A first step towards solving this problem is provided by IaaS stacks such as OpenStack [27] and VMWare vcloud [28] for creating and managing infrastructures of cloud services in private, public, and hybrid clouds. Table 2 shows a classification of these stacks based on the license, implementation languages, hypervisors supported, and main contributors of each stack.

Apache CloudStack [29] is free software that has been part of the Apache Incubator project since 2012. It was originally developed by Citrix and is currently maintained by the Apache Software Foundation. CloudStack provides features such as resource management, user management, an API, and a user interface.

Eucalyptus [30] is a free software project initiated in 2008. It is developed and maintained by Eucalyptus Systems. Eucalyptus allows building Amazon AWS-compatible private and hybrid clouds.

OpenNebula [31] is a free software project initiated in 2008. It is sponsored by C12G, a cloud computing company associated with the Scientific Park of Madrid, and maintained by the OpenNebula Community. OpenNebula aims at developing the industry standard solution for creating and managing virtualised enterprise data centres and IaaS clouds.
OpenStack [27] is a free software project launched in 2010. It was originally developed by Rackspace and NASA and is currently maintained by the OpenStack Foundation with contributions from all the major players in cloud computing. OpenStack allows controlling pools of computing, storage, and networking resources throughout a datacentre. It provides an API and a dashboard that allow consumers to seamlessly provision resources.

Table 2 Stacks

Stack | License | Implementation languages | Supported hypervisors | Adopted by | Main contributors
CloudStack | Apache License 2.0 | Java | KVM, Citrix Xen, VMware vSphere | GreenQloud | Citrix, Apache Software Foundation
Eucalyptus | General Public License v3 | Java, C | KVM, Citrix Xen, VMware vSphere | - | Eucalyptus Systems
OpenNebula | Apache License 2.0 | C++, C, Ruby, Java, Shell script, lex, yacc | KVM, Citrix Xen, Oracle VM, VMware vSphere | - | OpenNebula Community
OpenStack | Apache License 2.0 | Python | KVM, Citrix Xen, VMware vSphere | AT&T Cloud Architect, HP Cloud, IBM SmartCloud Enterprise, Nephoscale (storage), RackSpace, Softlayer (storage), Memset (storage) | RackSpace, NASA
VMware vCloud | Commercial | - | VMware vSphere | Bit Refinery, Hosting.com, ReliaCloud, Terremark | VMware

As depicted by the "Adopted by" column of Table 2, the cloud market seems to be consolidating at the IaaS level towards a few IaaS stacks. Of the 21 public providers listed in Table 1, seven adopt OpenStack (four fully and three partially), four adopt VMware vCloud, one adopts CloudStack, and the remaining nine adopt proprietary stacks (although one is compatible with the Amazon AWS APIs). This indicates that OpenStack has gained relatively wide acceptance across public providers, and that VMware vCloud is also supported by several providers. This trend may increase the interoperability across providers adopting the same stack.
However, it does not contribute significantly to addressing the challenge of supporting the development and management of multi-cloud systems.

2.2.2.3 Libraries

A second step towards supporting multi-cloud systems is provided by IaaS/PaaS libraries such as jclouds [32], DeltaCloud [33], and Simple Cloud [34]. These libraries provide abstraction layers that facilitate the provisioning and deployment of multi-cloud systems through a single interface. They support numerous IaaS providers as well as IaaS stacks (see Figure 2). These libraries sit at the border between the IaaS and PaaS levels since they allow, for instance, a developer to run scripts on the infrastructure or to deploy a load balancer that may rely on platform services. Table 3 shows a classification of these libraries based on the license, implementation language, and supported providers/stacks of each library.

Table 3 Libraries

Library | License | Implementation language | Supported providers/stacks
jclouds [32] | Apache License 2.0 | Java | http://www.jclouds.org/documentation/reference/supportedproviders/
libcloud [35] | Apache License 2.0 | Python | http://libcloud.apache.org/supported_providers.html
DeltaCloud [33] | Apache License 2.0 | Ruby | http://deltacloud.apache.org/supported-providers.html
fog [36] | MIT License | Ruby | http://fog.io/about/supported_services.html
Simple Cloud [34] | BSD license | PHP | -

fog [36] is a Ruby API providing access to compute and storage facilities on multiple clouds. It helps developers test and simulate their deployments by providing an in-memory representation of cloud resources.

jclouds [32] is a Java and Clojure API delivering an abstraction layer over the APIs of IaaS providers and stacks. It helps developers describe generic virtual machines by means of templates. It also allows deploying multiple virtual machines and managing them as a group.

libcloud [35] is a Python API providing solutions for managing multiple clouds that are akin to those of jclouds.

Simple Cloud [34] is a PHP API delivering mechanisms for managing the life-cycle of a virtual machine on multiple clouds. It offers interfaces for data storage, document storage, and message queue services. It also provides mechanisms for monitoring a virtual machine (e.g., computing, memory, storage, and network usage).

Most of these libraries are language-dependent since they are designed to interface with programming languages such as Ruby, Java, and PHP. This is not the case for DeltaCloud [33], another API providing drivers for computing and storage facilities. It consists of a REST interface: clients send requests to a DeltaCloud server (on a local machine or on a public DeltaCloud instance) that wraps the drivers for the various cloud providers. However, such an approach can introduce a single point of failure.
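The abstraction-layer idea shared by these libraries can be illustrated with a minimal sketch. It is plain Java and deliberately does not use the jclouds API: the names `ComputeDriver`, `Template`, `Node`, `InMemoryDriver`, `CloudRegistry`, and the provider names are hypothetical and chosen only for illustration. The in-memory driver loosely mirrors the in-memory representation that fog offers for testing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a provider-neutral compute layer; this is NOT the jclouds API. */
final class MultiCloudSketch {

    /** Generic description of the virtual machine a developer asks for. */
    static final class Template {
        final int minCores;
        final int minRamMb;
        Template(int minCores, int minRamMb) {
            this.minCores = minCores;
            this.minRamMb = minRamMb;
        }
    }

    /** A provisioned node, as reported back by a driver. */
    static final class Node {
        final String id;
        final String provider;
        Node(String id, String provider) {
            this.id = id;
            this.provider = provider;
        }
    }

    /** The single interface that every provider-specific driver implements. */
    interface ComputeDriver {
        String providerName();
        Node createNode(String group, Template template);
    }

    /** In-memory driver standing in for a real provider API (handy for tests). */
    static final class InMemoryDriver implements ComputeDriver {
        private final String name;
        private int created = 0;
        InMemoryDriver(String name) { this.name = name; }
        @Override public String providerName() { return name; }
        @Override public Node createNode(String group, Template template) {
            created++;
            return new Node(name + "/" + group + "-" + created, name);
        }
    }

    /** Client-side registry: callers name a provider, never a provider-specific API. */
    static final class CloudRegistry {
        private final Map<String, ComputeDriver> drivers = new LinkedHashMap<>();

        void register(ComputeDriver driver) {
            drivers.put(driver.providerName(), driver);
        }

        /** Creates count nodes from one template and returns them as a group. */
        List<Node> createNodesInGroup(String provider, String group, int count, Template template) {
            ComputeDriver driver = drivers.get(provider);
            if (driver == null) {
                throw new IllegalArgumentException("No driver registered for " + provider);
            }
            List<Node> nodes = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                nodes.add(driver.createNode(group, template));
            }
            return nodes;
        }
    }

    public static void main(String[] args) {
        CloudRegistry registry = new CloudRegistry();
        registry.register(new InMemoryDriver("provider-a"));
        registry.register(new InMemoryDriver("provider-b"));
        Template small = new Template(1, 1024);
        for (Node node : registry.createNodesInGroup("provider-b", "web", 2, small)) {
            System.out.println(node.id);
        }
    }
}
```

In this sketch the registry lives inside the client process, so no central server mediates the calls. A REST-based design such as the one described for DeltaCloud would instead place the drivers behind a shared server, which is where the single point of failure arises.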
IaaS libraries provide common access to multiple clouds; however, they do not provide any mechanism for the automatic provisioning and deployment of applications and services on the clouds. They do not rely on a classical Model-Driven Architecture (MDA) but, most of the time, provide a code-based model of the infrastructure. For instance, jclouds provides a POJO model of the infrastructure that includes concepts such as:

- NodeMetadata: description of a node with metadata such as imageid, CPU, RAM, security policy, etc.
- Template: an abstract representation of a node with parameters such as mincpu, OS type, etc.
- NodeInGroup: a set of nodes to be managed together
- Script: a set of commands to be executed on nodes
- Provider: information about the provider

Since jclouds works at the IaaS level, applications and services are not modelled.

2.2.2.4 Frameworks

The latest step towards supporting multi-cloud systems is provided by some specific PaaS frameworks. These frameworks aim at reducing the complexity of managing multi-cloud systems. They provide capabilities for the provisioning, deployment, monitoring, and adaptation of multi-cloud systems without being language-dependent. They partially reuse the IaaS and PaaS libraries (see Figure 2).

As claimed in [37], two main types of PaaS can be distinguished. One type of PaaS, such as OpenShift [38], considers the underlying IaaS as a black box, i.e., it does not provide visibility and control over the underlying infrastructure. Another type of PaaS considers the same IaaS as a white box, i.e., it provides full visibility and control over the underlying infrastructure. Without visibility
and control on the underlying infrastructure, developers cannot explicitly adapt the infrastructure to optimise performance, availability, and cost.

In this Section, we present PaaS frameworks that provide visibility of the IaaS level since this is an objective of the MODACloudML platform. Some of them rely on so-called DevOps tools such as Chef [39] and Puppet [40] that automate the deployment of applications and services, as well as the management of cloud capabilities. With visibility and control on both IaaS and PaaS levels, developers can exploit the peculiarities of cloud solutions at each level of the cloud stack. These frameworks embed simple mechanisms to monitor the topology of the infrastructure and metrics about resource consumption (e.g., computing, memory, storage, and networking), in addition to feedback about the status of the application. They also offer cloud-specific adaptation mechanisms such as load balancing, auto scaling or automatic failure recovery.

Table 4 shows a classification of the latter type of frameworks based on license, implementation languages, interfaces, provisioning and deployment support, monitoring support, and adaptation support. More technical details about Cloudify and Cloud Foundry can be found in D6.1.
Table 4 Frameworks

Tool: Cloudify [4]
  License: Apache License 2.0
  Implementation languages: Java, Groovy, JavaScript
  Interface: CLI, web-based monitoring interface, REST API to the Cloudify service
  Supported providers/stacks: Amazon, OpenStack, Azure, HP cloud, RackSpace, your own local provider
  Monitoring support: application and deployment status and logs; resource metrics
  Adaptation capabilities: auto-scaling based on metrics on resources and number of instances; automatic failure recovery

Tool: Scalr [5]
  License: Apache License 2.0
  Implementation languages: Python, PHP, JavaScript
  Interface: REST API, web-based user interface
  Supported providers/stacks: Amazon, OpenStack, RackSpace, Nimbula, Eucalyptus, IDC Frontier, CloudStack, Cloud Foundry
  Monitoring support: application status and logs; load statistics; notifications when anything happens to a farm
  Adaptation capabilities: auto-scaling of the infrastructure, including the database, when overloaded (CPU, RAM, disk, network) or when scheduled through the task manager; automatic failure recovery

Tool: Cloud Foundry [41]
  License: Apache License 2.0
  Implementation languages: Ruby, Java, JavaScript
  Interface: REST API, CLI, Eclipse plug-in
  Supported providers/stacks: Amazon, OpenStack, Rackspace, Piston, Eucalyptus, your own local provider
  Monitoring support: application status; environment variables; application logs; resource metrics
  Adaptation capabilities: change the number of instances associated with an application; automatic failure recovery
Cloudify [4] is an open-source project developed by GigaSpaces that focuses on the deployment and execution of applications on the cloud. It supports a large range of providers and offers basic scalability features. To deploy applications, Cloudify proposes a model inspired by Chef [39] that involves the following concepts:

- Service recipe: describes general information about the service, including its required infrastructure, how it should be used, and the probes to monitor it.
- Service: a cluster of service instances that make up an application tier.
- Application recipe: describes the configuration (including provisioning and scaling rules) of an application and the services it is made of.
- Application: a set of services working together, described in an application recipe. The Cloudify manager deployed on a cloud allows cloud operators to manage several applications in the same infrastructure.
- Probes: used to monitor the status of the system; they can be built-in, scripted or plugins.

Figure 3, from [4], describes the anatomy of an application in Cloudify.

Figure 3 Anatomy of an application in Cloudify

Scalr [5] is also an open-source project. It has a specific focus on scalability, with more advanced features in this area than the two other frameworks. It proposes a model that can be manipulated through a graphical user interface and that involves the following concepts:

- Server farms: a set of components with specified roles to be deployed
- Components: types of elements to be deployed (e.g., databases, load balancers, application servers, etc.)
- Roles: describe the configuration of a component (e.g., scaling options, settings, parameters, load balancing options, ...)
- Auto-scaling rules: based on a metric, describe when to scale in or out
- Config templates: pre-defined configurations for a role

Cloud Foundry [41] is both a PaaS hosted by VMware and an open-source project with a Micro version for local deployment.
The concepts that can be manipulated through the Cloud Foundry API are the following (see footnote 1):

- Resources: entities with metadata. They can be: Organization, User, Space, Application, Runtime, Framework, Service, ServicePlan, ServiceInstance, ServiceBinding, ServiceAuthToken.
- Associations: relations between entities
- Actions: change the state of the system, e.g., start a resource with 5 instances and id 2
- Errors: HTTP response codes

These frameworks are important to optimise the performance, availability, and cost of multi-cloud systems. However, they do not come with any structured approach, and the provided methods and tools remain at a technical level. The developer will thus typically be left hacking at code level rather than engineering multi-cloud systems following a structured, tool-supported methodology.

1 http://cloudfoundry.github.com/docs/reference/cc-api.html
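The metric-based auto-scaling rules offered by these frameworks can be sketched as a simple threshold rule. The example below is a hypothetical illustration in plain Java and does not reproduce the Scalr, Cloudify, or Cloud Foundry APIs; the class `Rule`, its parameters, and the threshold values in `main` are assumptions made for the sake of the example.

```java
/** Hypothetical threshold-based auto-scaling rule; not the Scalr or Cloudify API. */
final class AutoScalingSketch {

    static final class Rule {
        final double scaleOutAbove; // metric value above which one instance is added
        final double scaleInBelow;  // metric value below which one instance is removed
        final int minInstances;
        final int maxInstances;

        Rule(double scaleOutAbove, double scaleInBelow, int minInstances, int maxInstances) {
            if (scaleInBelow >= scaleOutAbove || minInstances < 1 || maxInstances < minInstances) {
                throw new IllegalArgumentException("Inconsistent scaling rule");
            }
            this.scaleOutAbove = scaleOutAbove;
            this.scaleInBelow = scaleInBelow;
            this.minInstances = minInstances;
            this.maxInstances = maxInstances;
        }

        /** Returns the instance count to run next, given the current count and metric value. */
        int decide(int currentInstances, double metricValue) {
            int desired = currentInstances;
            if (metricValue > scaleOutAbove) {
                desired = currentInstances + 1; // scale out
            } else if (metricValue < scaleInBelow) {
                desired = currentInstances - 1; // scale in
            }
            // Never leave the allowed range, whatever the metric says.
            return Math.max(minInstances, Math.min(maxInstances, desired));
        }
    }

    public static void main(String[] args) {
        // Illustrative values: scale out above 75% CPU, scale in below 25%, keep 2 to 4 instances.
        Rule cpuRule = new Rule(75.0, 25.0, 2, 4);
        System.out.println(cpuRule.decide(2, 90.0));
        System.out.println(cpuRule.decide(2, 10.0));
    }
}
```

A rule of this kind is purely reactive and works one step at a time; real frameworks typically add cool-down periods and scheduled scaling, which are omitted here.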
2.2.2.5 EU projects

Several on-going European projects are providing stacks, libraries or frameworks for the provisioning, deployment, monitoring and adaptation of cloud-based systems at the IaaS or PaaS levels. In this Section, we present these projects with a focus on their ability to target multi-cloud systems and their use of model-driven techniques. More technical details about Cloud4SOA, mOSAIC and OPTIMIS can be found in D6.1.

Project | Objective
REMICS [42] [43] | The MODACloudML approach is based on the work done in REMICS [43], which provides modelling concepts enabling model-driven provisioning and deployment of cloud-based systems at the IaaS level. A domain-specific language called PIM4Cloud provides a first step for designers to model applications to be deployed on the cloud. The proposed approach is cloud provider independent and inspired by component models. The language is implemented using Scala as a host language. Another mechanism focuses on the provisioning of computational resources at the IaaS level and manipulates concepts such as providers and nodes with properties such as CPU, RAM, etc. MODACloudML will extend it to the PaaS level.
4CaaSt [44] | The 4CaaSt project delivers a solution for elastic and optimised hosting of Internet-scale multi-tier applications. This solution is based on Chef to monitor the execution and manage the life-cycle of applications and services [45].
ARTIST [46] | ARTIST aims at providing MDE techniques for representing applications and services as well as cloud infrastructures and platforms. The expected outcomes of the project are a vendor- and platform-independent methodology and an automation-oriented toolset for the re-engineering, migration, maintenance and evolution of cloud-based applications. Since ARTIST is also a project from call 8 of FP7-ICT, only little information is available at this stage of the project.
CELAR (Cloud ELAsticity Provisioning) [47] | CELAR aims at delivering an automated and highly customisable system for the elastic provisioning of resources in cloud computing platforms at the IaaS level. The expected outcomes of the project are a middleware for elastic provisioning that automatically manages and adapts cloud resources, an information system describing cloud resources and providing a search mechanism, and a scalable monitoring tool. Since CELAR is also a project from call 8 of FP7-ICT, only little information is available at this stage of the project.
Cloud4SOA [48] | The Cloud4SOA project supports developers of cloud-based systems with multi-platform management, monitoring and migration by semantically interconnecting heterogeneous PaaS offerings. The deployment process can be carried out through the Cloud4SOA API exposed by the Cloud4SOA PaaS platform adapters. The solution currently supports Cloud Foundry, OpenShift, and Amazon Elastic Beanstalk.
CloudScale [49] | CloudScale aims at supporting scalable service engineering. The expected outcomes of the project are tools and methods that model design alternatives, analyse their effect on scalability and cost, and detect scalability problems by analysing code. The ScaleDL language will serve as a basis for these tools. Since CloudScale is also a project from call 8 of FP7-ICT, only little information is available at this stage of the project.
Contrail [50] | Contrail aims at solving the vendor lock-in problem by providing a solution at
both IaaS and PaaS levels to allow providers to integrate resources from other clouds. It also allows applications to switch cloud providers seamlessly. The solution requires an agreement on the adoption of a common technology stack among cloud providers.
mOSAIC [51] | mOSAIC tackles the vendor lock-in problem by providing an open-source platform including an API for the provisioning and deployment of applications on multiple clouds. The API allows developing cloud applications with an abstraction of IaaS services that enables the migration of these applications from one cloud to another.
OPTIMIS [52] | The OPTIMIS toolkit supports provisioning on multi-cloud and federated cloud infrastructures and optimising the use of resources. The toolkit provides tools for IaaS providers, service providers and developers. The Service Deployer is responsible for the deployment of services while the Service Manager is responsible for the operation of the services by keeping track of all runtime data.
Reservoir [53] | Reservoir has defined an architecture for future IaaS clouds. It provides solutions for the provisioning and scalability of resources on demand. An expected outcome of the project is to enable providers of cloud infrastructure to dynamically partner with each other. The description of how to manage an application on a cloud infrastructure is given through a Service Definition Manifest, which is a contract between the service and the infrastructure [54]. The abstract syntax of this language is defined using the Essential Meta-Object Facility (EMOF) in order to be independent of any specific implementation platform. Some constraints on the behaviour of the underlying infrastructure can be expressed in OCL. This abstract syntax is used to define the syntax of languages such as the application description language or the elasticity rules.
PaaSage [55] | The main goal of PaaSage is to deliver an open and integrated platform to support both the design and operation of cloud-based systems, together with an accompanying methodology that allows model-driven provisioning, deployment, and adaptation of these systems independently of the underlying cloud infrastructures. MODAClouds and PaaSage are collaborating on the research and development of what will be the core elements of MODACloudML (which are referred to in PaaSage as CloudML).

2.2.2.6 Discussion

The stacks, libraries and frameworks presented in this Section provide mechanisms to automate the provisioning and deployment of applications on multiple clouds. However, as explained in [6], there is a "... need for developers to be able to design their software systems for multiple Clouds and for operators to be able to deploy and re-deploy these systems on various Clouds depending on the convenience. The current Cloud literature, however, does not seem to pose attention to this issue as it is focused on considering the perspective of the Cloud providers, by offering mechanisms for auto scaling of Clouds and for interoperability and federation between Clouds."

MDE is a well-known approach to tame the complexity of designing complex systems. Models enable developers to work at a high level of abstraction by focusing on cloud concerns rather than implementation details. Model transformations relieve developers of repetitive and error-prone tasks such as coding. The model-driven approach, commonly summarised as "model once, generate anywhere", is particularly relevant when it comes to the provisioning and deployment of applications and services across multiple clouds, as well as to migrating them from one cloud to another. Even if none of the solutions presented in this state of the art fully
rely on a model-driven approach at both IaaS and PaaS levels, some of the concepts to be modelled within MODACloudML can be expressed by these solutions, and they will be a source of inspiration during the design of the modelling language.

The frameworks presented in this state of the art also offer some cloud-specific mechanisms for adaptation and self-adaptation [56], such as load balancing, auto scaling or failure recovery. These adaptations are triggered when some of the constraints specified at design-time are no longer fulfilled. These constraints are related either to computing resources (e.g., the CPU usage should be below 75%) or to desired topologies (e.g., the service should be deployed and running on at least two virtual machines). Self-adaptive systems are generally based on a control loop such as the well-known Monitor-Analyse-Plan-Execute loop from autonomic computing [56]. The inputs of the reasoning system (Analyse and Plan) are observables describing the running system and its context. The outputs are a set of planned adaptation actions.

However, the adaptation of multi-cloud systems is becoming ever more complex. Models can also help in taming such complexity. The models@runtime [57] [58] paradigm proposes to leverage models during the execution of adaptive software systems to monitor and control the way they adapt. This way, adaptation mechanisms can benefit from MDE at runtime. The models@runtime layer can be applied as a pattern for the design of the Monitoring and Execution (enactment of the adaptation, either compositional or parametric [59]) mechanisms of the loop. Models@runtime provide an abstract representation of the running system that is causally connected to the underlying state of the system, which facilitates reasoning, simulation and the enactment of adaptation actions. A change in the running system is automatically reflected in a model of the current system. Any modification applied to this model can be enacted on the running system on demand.
A classical architecture to achieve this is depicted in Figure 4, from [60]. The current model of the running system can be used by a reasoning system that will produce the target model of the system. Before adapting the system, some validation process can be performed on the target model (step 1). If the validation passes, the difference between the target model and the current model of the system is computed (step 2). Then, the adaptation engine enacts the adaptation only on the parts of the system which are included in this difference (step 3). Finally, the model of the current system is updated again (step 4).

Figure 4 Overview of the models@runtime approach

The models@runtime approach enables the continuous evolution of the system with no strict boundaries between design-time and runtime activities. Thanks to the use of models, it provides a well-defined interface to monitor the system and adapt it. It also provides a way to measure the importance of changes in the system and to analyse the delay before their enactment on the running system. In general, the stacks, libraries and frameworks we have presented do not provide such an abstraction.

These concerns will also be considered during the design of MODACloudML: a runtime model of the running system will be provided at the CPSM level. It will describe the real topology and deployment of the system and then provide cloud-specific information. On the one hand, any modification in the specification of the system at the CIM or CPIM level will be reflected at the CPSM level and then automatically on the running system. On the other hand, any change in the running system can be checked against the CPIM.
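The four steps described above (validate, compute the difference, enact, update) can be sketched as follows. This is a minimal illustration under strong assumptions, not the MODACloudML runtime or the architecture of [60]: a model is reduced to a map from component names to instance counts, the validation rule is invented for the example, and the running system is simulated by a mutable map. All class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical models@runtime loop; a model maps component names to instance counts. */
final class ModelsAtRuntimeSketch {

    /** Step 1: validate the target model (assumed rule: at least one instance each). */
    static boolean isValid(Map<String, Integer> target) {
        for (int count : target.values()) {
            if (count < 1) {
                return false;
            }
        }
        return true;
    }

    /** Step 2: compute the difference as component name -> change in instance count. */
    static Map<String, Integer> diff(Map<String, Integer> current, Map<String, Integer> target) {
        Set<String> names = new TreeSet<>(current.keySet());
        names.addAll(target.keySet());
        Map<String, Integer> delta = new TreeMap<>();
        for (String name : names) {
            int change = target.getOrDefault(name, 0) - current.getOrDefault(name, 0);
            if (change != 0) {
                delta.put(name, change); // unchanged components are left out
            }
        }
        return delta;
    }

    /** Step 3: enact the adaptation only on the components present in the difference. */
    static void enact(Map<String, Integer> runningSystem, Map<String, Integer> delta) {
        for (Map.Entry<String, Integer> entry : delta.entrySet()) {
            int updated = runningSystem.getOrDefault(entry.getKey(), 0) + entry.getValue();
            if (updated == 0) {
                runningSystem.remove(entry.getKey());
            } else {
                runningSystem.put(entry.getKey(), updated);
            }
        }
    }

    /** Runs steps 1 to 4 and returns the refreshed model of the current system. */
    static Map<String, Integer> adapt(Map<String, Integer> runningSystem, Map<String, Integer> target) {
        if (!isValid(target)) {
            throw new IllegalArgumentException("Target model rejected by validation");
        }
        Map<String, Integer> current = new TreeMap<>(runningSystem); // model of the running system
        enact(runningSystem, diff(current, target));
        return new TreeMap<>(runningSystem); // step 4: update the current model
    }

    public static void main(String[] args) {
        Map<String, Integer> running = new TreeMap<>();
        running.put("web", 2);
        running.put("db", 1);
        Map<String, Integer> target = new TreeMap<>();
        target.put("web", 3);
        target.put("db", 1);
        target.put("cache", 1);
        System.out.println("difference: " + diff(running, target));
        System.out.println("current model after adaptation: " + adapt(running, target));
    }
}
```

Because only the entries in the difference are touched, the `db` component in the example is never restarted or reconfigured, which is the main benefit of step 3.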
When designing and operating applications and services in the cloud, design decisions are not only related to computational entities but also to data representation and persistence. The next Section presents the state of the art in mechanisms for data persistence.

2.2.3 Data persistence

This Section presents the state of the art in mechanisms for data persistence. This study can support design decisions about the choice of such mechanisms on the basis of their properties.

2.2.3.1 Object-oriented mechanisms for data persistence

The object-oriented (OO) paradigm is undisputedly among the most widespread approaches to producing code. It encourages modularization and code reuse, and it is well suited to developing large and complex software systems. Still, complex systems need complex data models and complete storage solutions. To address the increasing need to easily manage data in OO languages, Object/Relational Mapping (ORM) has been proposed as a good practice for designing data and making them persistent. The persistence concept refers to the characteristic of state that outlives the process that created it. It is obtained by storing data in non-volatile storage such as hard drives or databases [61]. Although persistence does not imply any particular kind of storage (file, database), it is quite common to use relational database systems (RDBMSs) as non-volatile storage. As an alternative, object databases (ODBMSs) are often used since their data model is quite compatible with the OO data model.

In the ORM style, the developer declares data objects using the OO model and then enriches the model by marking the objects, or the parts of them, that should be made persistent. Meta-models (which frequently consist of annotations mixed with OO code) are used to enrich the initial OO data model, as shown in the code below.
    import javax.jdo.annotations.PersistenceCapable;
    import javax.jdo.annotations.Persistent;
    import javax.jdo.annotations.PrimaryKey;

    // Key is a datastore-specific key type; its import is not shown in the source.
    @PersistenceCapable
    public class ContactInfo {
        @PrimaryKey
        private Key key;

        @Persistent
        private String streetaddress;
    }

Code 1 OO annotated class

The ORM framework is responsible for mapping and storing objects in the persistence layer. Furthermore, the framework provides an API to load and manipulate data.

Figure 5 ORM layer

Figure 5 depicts the concepts explained above. The usage of ORM for the development of applications brings several benefits:

- Productivity improves because the framework is responsible for automatically generating the code for data management. The user accesses data by using specific query languages provided by the ORM.
- ORM forces developers to strongly decouple the data model domain from the business logic domain.
- ORM increases the amount of reusable code and enhances the maintainability of the application.
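The mapping responsibility of an ORM framework can be illustrated with a deliberately small sketch. It does not use JDO, JPA, or Hibernate: the `@Persistent` annotation declared below is a hypothetical stand-in defined inside the example, and `toRow` is an invented helper that uses reflection to turn the annotated fields of an object into a column-name to value row. Real ORM frameworks additionally handle keys, relationships, queries, and transactions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical miniature object mapper; real ORMs such as Hibernate do far more. */
final class OrmSketch {

    /** Stand-in for a persistence annotation (NOT javax.jdo.annotations.Persistent). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Persistent {}

    /** Plain OO data object enriched with persistence metadata. */
    static final class ContactInfo {
        @Persistent private final String streetaddress;
        @Persistent private final String city;
        private final String sessionNote; // not annotated, so never stored

        ContactInfo(String streetaddress, String city, String sessionNote) {
            this.streetaddress = streetaddress;
            this.city = city;
            this.sessionNote = sessionNote;
        }
    }

    /** Maps the annotated fields of an object to a column-name -> value row. */
    static Map<String, Object> toRow(Object entity) {
        Map<String, Object> row = new TreeMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Persistent.class)) {
                field.setAccessible(true); // the fields are private
                try {
                    row.put(field.getName(), field.get(entity));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field.getName(), e);
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow(new ContactInfo("1 Main Street", "Oslo", "draft")));
    }
}
```

The business code only declares which fields are persistent; how they reach the storage layer is decided in one place (`toRow` here), which is the decoupling benefit listed above.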
Several tools based on ORM have been released in recent years, but only a few of them are widely adopted. Hibernate [61], for instance, is a de facto standard ORM for Java developers. It is a very mature ORM solution that is compliant with the Java Persistence API (JPA) [62] and Java Data Objects (JDO) [62]. Furthermore, it offers a proprietary SQL-like query language to manage data objects (HQL) [63]. JPA and JDO are two standard persistence technologies for Java. Based on the differences between JPA and JDO, JPA can be regarded as a subset of JDO. The reader will find complete documentation for both in [62].

The available RDBMSs and the OO programming paradigm have demonstrated their unsuitability when large quantities of data have to be handled. In this case, scalability and the ability to distribute data and to parallelize computations become critical issues. NoSQL databases and the Map-Reduce (MR) paradigm represent the emerging solutions to deal with these issues. In the next Section we first focus on the MR paradigm, which can be combined with NoSQL databases; NoSQL is then discussed in Section 2.2.3.3.

2.2.3.2 Map Reduce

MR is not new in the context of distributed computing, but it has been rediscovered and brought back to the scene thanks to Google. In [64] Google presents the Map Reduce idea and its own implementation, which is assumed to run on a large cluster of machines. In [64], we read that "a typical MapReduce computation processes many terabytes of data on thousands of machines" and that "Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day". Google had the merit of demonstrating that a simple infrastructure composed of several clusters, combined with the MR paradigm, is the right way of looking at big data problems.
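To make the paradigm concrete, the sketch below counts words with a map phase, a grouping (shuffle) phase, and a reduce phase. It is a single-process illustration written for this discussion, not Google's implementation from [64]; the class and method names are hypothetical, and a real MR runtime would distribute the map and reduce calls across many machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process word count that mimics the map, shuffle and reduce phases. */
final class MapReduceSketch {

    /** Map phase: emit one (word, 1) pair per word of an input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    /** Reduce phase: combine all values emitted for one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Driver: map every line, group the pairs by key (shuffle), then reduce each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("the cloud");
        lines.add("The Cloud scales");
        System.out.println(run(lines));
    }
}
```

Since each `map` call depends only on its own line and each `reduce` call only on its own key, both phases can be parallelized without coordination, which is what makes the paradigm attractive on large clusters.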
Influenced by the Google experience, an Apache project was started with the idea of realising an open-source product similar to Google's. As a result, Hadoop [65] has been released. It is a complete framework for the distributed storage and processing of large data sets across clusters. It is mainly composed of:

- Hadoop Distributed File System: a distributed file system
- Hadoop YARN: job scheduler and cluster resource manager
- Hadoop MapReduce: the Hadoop MR environment

as well as several other projects that are fully compliant with the basic architecture. Nowadays, Hadoop has become the open-source solution used by private users as well as by big ICT companies. For instance, Amazon offers VMs with a complete Hadoop solution, Cloudera offers a Cloud solution based on Hadoop, and Microsoft discontinued Dryad [66], its research project for writing parallel and distributed programs, in order to support Hadoop. To conclude, Hadoop is a very complete open-source alternative for conducting experiments with scalable systems and the parallel programming paradigm.

2.2.3.3 NoSQL

For a long period, since 1970, Relational Database Management Systems (RDBMSs) have been largely adopted as storage solutions [67]. SQL was the main reason for the success of RDBMSs: it provides a comprehensive and ad-hoc query language to manipulate data. More recently, non-relational (or not-only relational) databases, often termed NoSQL, have emerged [68], especially in the context of widely distributed systems, and have generated both interest and criticism. NoSQL databases are not new: the term NoSQL was first used in 1998 for a relational database that omitted the use of SQL (No SQL) [69]. There are two schools of thought concerning the meaning of NoSQL. The first one believes the term refers to relational databases without SQL support (No SQL), while the second one refers to non-relational databases (Not Only SQL). In this document we use the term NoSQL with the meaning of distributed non-relational databases.
One of the major problems often mentioned is the heterogeneity of the languages and interfaces that NoSQL databases offer to developers and users. Different platforms and languages have been proposed, and applications developed for one
system require significant effort to be migrated to another one. Furthermore, some crucial properties (such as transactionality) are missing in the typical NoSQL approaches.

From a theoretical point of view, the need for a uniform classification and a generalization of principles for NoSQL databases is widely recognized and was described by Cattell in [70], which reports a detailed characterization of non-relational systems. Stonebraker [71] highlights the absence of a consolidated standard for NoSQL models and the absence of a formal query language for those models. Kossmann and Kraska [72] analyze the offerings of the main PaaS storage providers (Amazon, Google, and Microsoft) and examine their common features and differences from a data perspective. Leymann et al. provide a taxonomy for Cloud Data Hosting Solutions in their survey [73]. On the transactionality aspect, some early studies have been done within the CumuloNimbo project [74] [75], which proposes some initial statements and visions on the topic.

The most interesting features of NoSQL databases, in our opinion, are their ability to scale based on the workload, the fact that they are schema-less, and the fact that they do not guarantee the ACID (Atomicity, Consistency, Isolation, Durability) properties (ACID is not always a system requirement). [76] offers an even longer list of interesting characteristics:

- Avoidance of unneeded complexity.
- High throughput.
- Horizontal scalability and running on commodity hardware.
- Avoidance of expensive Object-Relational mapping: NoSQL databases are often designed to store complex data structures in a way that is closer to object-oriented programming languages than to relational databases.
- Complexity and cost of setting up database clusters: NoSQL databases are simple systems that are specifically designed to run in distributed environments, and they generally do not require the administrator role.
- Possibility to establish a trade-off between reliability and performance: this covers cases in which sharing data is more important than persisting it, for instance for performance reasons. To clarify, consider a situation in which several web processes need to share the HTTP user session. In this case it may be more convenient not to persist the session, to the detriment of reliability.
- The current "one size fits all" database thinking was and is wrong: in the past, the trend for managing data was to adopt RDBMSs as the unique solution for every problem domain. Nowadays, different NoSQL databases are designed for different problem domains.
- The myth of effortless distribution and partitioning of centralized data models: [77] discusses the disadvantages of developing a data model for a centralized setting. Shalom suggests designing data models to fit into a partitioned environment even if there will be only one centralized database server initially. This approach avoids exceedingly late and expensive changes to the application code.
- Movements in programming languages and development frameworks: since NoSQL databases are not required to be general-purpose data stores, NoSQL offerings tend to focus on a specific technology, often related to a specific programming language.
- Requirements of cloud computing: the paper refers to scalability and low administration overhead.
- The RDBMS plus caching-layer pattern/workaround vs. systems built from scratch with scalability in mind: this point is explained in [78]. In this blog post, Hoff reports the architecture design of real systems that need scalability: "Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. [...] With a little perspective, it's clear the MySQL+memcached era is passing."
- Yesterday's vs. today's needs: in the 1960s and 1970s, databases were designed for single, large high-end machines.
In contrast, the trend in many large companies today is to adopt hardware that will predictably fail. Consequently, applications are designed to adapt dynamically to failures.

2.2.3.3.1 Taxonomy
It is now clear that NoSQL databases are ad hoc solutions, in the sense that each has been designed with a particular problem space in mind and aims at the best performance in its context of use. For this reason they differ deeply from one another, and categorizing them is not simple and often involves exceptions. Nevertheless, taxonomies based on their data model have been provided by Yen in [79] and by Cattell in [70]. Table 5 compares these two taxonomies. We extend them by including a new category, the Graph-based category.

Table 5 NoSQL taxonomy
Cattell taxonomy: Key-Value Store; Document Stores; Extensible Record Stores
Yen taxonomy: Key-Value-Cache; Key-Value-Store; Eventually-Consistent K-V-S; Ordered-Key-Value-Store; Data-Structures Server; Document Store; Object Store; Wide Columnar Store; Tuple-store
Our taxonomy: Key-Value; Document-based; Column-oriented; Graph-based

2.2.3.3.2 Data Model
Table 5 shows how many different data models NoSQL systems adopt. In this paragraph we illustrate the main, most widespread data models and the corresponding commercial or free offerings.

2.2.3.3.2.1 Key-Value
Key-Value stores (KVs) have a simple data model (see Figure 6), similar to a map or dictionary, and provide simple operations that allow users to fetch or put data by key. KVs have existed for a long time; an example is BerkeleyDB [80], which was developed in 1996 at the University of California. Modern KVs have been designed with particular attention to scalability, and some of them provide a rich ad hoc query language. The data model is completely schema-free and optimized for simple operations based on the key only.

The main candidate in this category is Amazon Dynamo DB [81], because of its influence on a number of NoSQL systems. Project Voldemort [82] is a KV store developed and still used at LinkedIn. Redis [83] is particular in that it allows matching keys by range or regular expression. Memcached [84] and MemcacheDB [85] are two KV solutions widely used by very large web sites to reduce database load. The first is a completely in-memory data store; the second adds persistence to the first by relying on Berkeley DB [80]. Scalaris [86] is developed entirely in Erlang, a functional language designed for distributed systems, and takes advantage of the properties of that language. Its interesting features are strict consistency, a complex query model and ACID properties.

Figure 6 Key-Value Data Model
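To make the key-value data model described above concrete, the following is a minimal in-memory sketch in Python. It is an illustration only and does not reproduce the API of any of the products cited: the class name `KeyValueStore`, its methods and the sample keys are assumptions introduced here. Values are opaque byte strings, and the only access paths are by key or by a glob-style key pattern.

```python
import fnmatch
from typing import Dict, List, Optional


class KeyValueStore:
    """Toy key-value store (hypothetical API): values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only associates it with the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed, False otherwise.
        return self._data.pop(key, None) is not None

    def keys(self, pattern: str = "*") -> List[str]:
        # Glob-style key matching, loosely in the spirit of key-pattern lookups.
        return sorted(k for k in self._data if fnmatch.fnmatchcase(k, pattern))


store = KeyValueStore()
store.put("session:42", b'{"user": "alice"}')
store.put("session:43", b'{"user": "bob"}')
store.put("cart:42", b"[]")
print(store.keys("session:*"))
```

Because the store cannot look inside a value, any query on the content of a session would have to be implemented by the application, which is the main difference from the document-based model.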
2.2.3.3.2.2 Document-based
The document-based (DB) data model (see Figure 7) treats the value field as a document. A document here is not a document in the traditional sense (article, MS Word file, etc.): values may be nested documents or lists as well as scalar values, and the attribute names are defined dynamically for each document at runtime. A document is a schema-less data model because the attributes are not defined in a global schema and a wide range of values is permitted. Contrary to KVs, which have opaque value fields, DBs have structured value fields. Most of them use JSON (JavaScript Object Notation) or JSON-like documents. JSON has the advantage of being compatible with web-based technologies (it is natively supported by JavaScript), and it also supports data types, which makes document stores very developer-friendly.

Apache CouchDB [87] is a document database written in Erlang. It provides REST access over HTTP. Documents consist of named fields that have a key/name and a value. A field name has to be unique within a document, and its assigned value may be a string (of arbitrary length), a number, a boolean, a date, an ordered list or an associative map. The document is represented as a JSON object. MongoDB [88] is the most popular DB [89]. Its document is a JSON-style document represented in BSON format in order to limit data size. It provides a rich query language to manage the document value.

Figure 7 Document-based Data Model

2.2.3.3.2.3 Column-oriented
The Column-Oriented (CO) data model (see Figure 8) optimizes operations on the columns of a table instead of its rows. In this sense the table is stored and processed by columns.
This approach is more efficient when an aggregate has to be computed over many rows but involves only a subset of their columns, because reading that smaller subset of data can be faster than reading all the data. The data model is a hybrid between a fixed tuple data schema and a document: families of attributes are defined in a schema, but new attributes can be added (within an attribute family) on a per-record basis. Attributes may be list-valued. Such stores are also described as "[sparse], distributed, persistent multidimensional sorted [maps]" [90].

Google's BigTable [90] is designed to scale to a very large size. It is used by over sixty projects at Google, including web indexing, Google Earth, Google Analytics, etc. HBase [91] is a clone of BigTable developed in Java. It is part of the Hadoop distribution. A notable usage of HBase is the Facebook chat. Cassandra [92] [93] is famous since it was developed by Facebook. In [92], Lakshman describes Cassandra as a distributed storage system for managing structured data that is designed to scale to a very large size [90]. It shares many design and implementation strategies with databases [90] but does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format [90].
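The "sparse, sorted map" view of the column-oriented model can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the data model of BigTable, HBase or Cassandra: the class `ColumnFamilyTable` and the sample row keys are hypothetical, everything is held in memory, and the timestamp dimension of BigTable-style stores is omitted. Column families are fixed up front, while individual columns inside a family can differ from row to row.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple


class ColumnFamilyTable:
    """Toy sparse table (hypothetical): row key -> 'family:qualifier' -> value."""

    def __init__(self, families: Iterable[str]) -> None:
        # Families play the role of the schema; qualifiers are free per row.
        self._families = frozenset(families)
        self._rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key: str, family: str, qualifier: str) -> Optional[str]:
        # Missing cells simply do not exist (sparseness); no NULL padding.
        return self._rows.get(row_key, {}).get(f"{family}:{qualifier}")

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        # Range query on row keys, returned in sorted key order.
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, dict(self._rows[key])


table = ColumnFamilyTable(["info", "stats"])
table.put("user#001", "info", "name", "Alice")
table.put("user#001", "stats", "logins", "12")
table.put("user#002", "info", "name", "Bob")
table.put("user#002", "info", "email", "bob@example.org")  # only this row has it
for row_key, columns in table.scan("user#001", "user#003"):
    print(row_key, columns)
```

The `scan` method also hints at why the query model of these stores is restricted to row keys and indexed values: the sorted row key is the only access path this structure supports efficiently.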
Figure 8 Column-oriented Data Model

2.2.3.3.2.4 Graph-based
Graph-based (GB) data models are graphs, and in this sense they differ deeply from the other categories discussed. The data model is best suited to problem domains that can be represented as graphs, such as social relationships, maps, etc. Dedicated query languages allow querying the databases with classical graph operators such as neighbour, path, distance, etc. Unlike the other NoSQL systems presented above, these systems generally provide ACID transactions. Among others we mention Neo4j [94], an open-source implementation of GB. Nodes store data and edges represent relationships. The data model is called a property graph to indicate that edges can have properties. Neo4j provides a REST interface and a Java API.

Figure 9 Graph-based Data Model

2.2.3.3.3 Query Model
How to query a NoSQL database is a debated question, because of the deep differences between their data models. Many NoSQL databases provide only very elementary operations to manipulate data; others expose more complex and powerful query languages. The absence of SQL forces developers to learn a different language for each NoSQL solution adopted. The current situation of NoSQL systems is similar to that of RDBMSs before SQL was introduced as a common query language. A good discussion of the query languages provided by NoSQL systems is given in [95]. The categories are briefly discussed below.

2.2.3.3.3.1 Key-Value
Due to their simple data model, the data manipulation layer consists of simple key-based instructions such as put, get and delete. Some implementations also provide REST interfaces, which are very useful in the context of web applications. For instance, Membase [96] offers a REST API natively.
2.2.3.3.3.2 Document-based
This category offers rich data manipulation solutions. It often provides a powerful query language that allows range queries on values, secondary indexes, queries on nested documents and operators such as and, or, etc. While MongoDB [88] supports additional operations like count and distinct, Riak [97] is optimized to traverse links between documents easily.

2.2.3.3.3.3 Column-oriented
These stores provide range queries (queries that retrieve records where some values are in a defined range) and operators such as in, and, or and regular expressions, restricted to row keys or indexed values. Every column family store offers a query language similar to SQL, but only keys and indexed values can be used in the where clauses. Although SQL-like query languages are commonly adopted by different products, no common query language has emerged.

2.2.3.3.3.4 Graph-based
The query languages fall into two categories: languages that define graph pattern matching, and languages that offer traversal instructions to query a graph. The traversal policy can be breadth-first or depth-first. Breadth-first traversal is the better choice to obtain a shortest path (in number of edges), whereas depth-first traversal can reach some solution quickly while keeping little state. SPARQL [98] is a popular declarative query language that provides graph pattern matching. An example of a graph traversal query language is Gremlin [99].

2.2.3.3.4 Polyglot Persistence
Based on the concept of Polyglot Programming coined by Neal Ford in 2006 [100], Martin Fowler proposes the idea of Polyglot Persistence [101]. With this term, Fowler refers to the idea that applications should be designed using a mix of data storage solutions, to take advantage of the different data models that suit specific parts of the application. However, mixed architectures introduce a mixture of interfaces to be learned. This discussion introduces two open points that are becoming increasingly relevant research areas.
The first concerns the definition of a common data management language (as SQL is for RDBMSs). The second concerns the choice of the best architecture based on the application needs: which metrics could be used to decide the best storage in terms of data model? In our opinion, the CAP properties (Consistency, Availability and Partition tolerance) can represent a good metric. We briefly present CAP in the next paragraph.

2.2.3.4 CAP Theorem
NoSQL systems cannot be presented without a brief discussion of the CAP theorem. The idea on which the theorem is based was introduced by Eric Brewer in 2000 [102]. This conjecture was formalized in 2002 by Seth Gilbert and Nancy Lynch [103] and became famous as the CAP theorem. The theorem states that:

CAP theorem: "It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties: (i) Availability and (ii) Atomic consistency in all fair executions (including those in which messages are lost)."

Even though this statement does not explicitly mention network partitioning, a distributed network complies with the asynchronous network model and, in case of partitioning, with the asynchronous network model with message losses. The theorem formalizes the idea that a distributed system has three desirable properties (Consistency, Availability and Partition tolerance), of which only two can be satisfied at the same time. One consequence of CAP is that many NoSQL systems, which favour availability and partition tolerance, do not guarantee ACID properties and reliable processing of database transactions. In the literature this alternative is indicated by the term BASE (Basically Available, Soft state, Eventual consistency).

2.2.3.5 Discussion
NoSQL solutions are numerous and heterogeneous. They offer different data models and query models; some of them are able to guarantee ACID properties. In practice, every NoSQL database is specialized for certain use cases. At the moment there is no tool that supports designers in choosing the right solution for a given application.
The discussion presented in this document could help in expressing some preferences. For instance, a database can be chosen depending on the access layer it provides (APIs, language SDKs, query language expressiveness). Other characteristics to take into account are the CAP properties of the software system (the degree of consistency, availability and partition tolerance). Bearing in mind the data model, KVs should be used for very fast and simple operations. Document databases show their benefits when schema flexibility and a rich query language are required. Column-oriented stores should be used if the application has to manage large quantities of data. Graph-based databases should be used in domains where the relationships between entities are as important as the entities themselves.

2.3 Design Time Schema Transformation
Schema transformation is a wide research field, addressed from a variety of model management perspectives. Bernstein and Melnik [104] present the recent state of the art in this field and, indirectly, outline an overview of the major approaches and achievements. A theoretical approach has been proposed by Atzeni et al. with ModelGen, an operator that translates schemas from one model to another, and its implementation MIDST [105]. The approach translates both schema and data: a wide family of models is handled by using a metamodel in which models can be succinctly and precisely described. The approach expresses the translation as Datalog rules and exposes the source and target of the translation in a generic relational dictionary. More recently, this work has been extended to cover runtime model translations as well [106]. Fagin et al. introduced Clio as a data exchange system, which nevertheless shares several features with typical data schema translation solutions. Clio aims at building a completely defined mapping between two schemas, given a set of user-defined correspondences [107]. Given the large number of different NoSQL data models, NoSQL systems are certainly an important context in which to apply schema transformation techniques.
Related to this, Atzeni et al. proposed a common programming interface to NoSQL systems (and also to relational ones) called SOS (Save Our Systems) [108], whose goal is to support application development by hiding the specific details of the various systems. It is based on a metamodeling approach, in the sense that the specific interfaces of the individual systems are mapped to a common one. The tool provides interoperability as well, since a single application can interact with several systems at the same time. Another schema transformation approach is presented in [109]. The aim of this work is twofold. First, since real systems are divided into relational and non-relational parts, the author proposes a mapping mechanism to store generic non-relational (NoSQL) data in an RDBMS. Second, on top of that, he provides a common data access layer for both relational and non-relational data. UnQL [110] is another approach towards the standardization of query languages, designed to query semi-structured and document data (i.e. XML and JSON documents).

2.3.1 Synthesis of the state of the art
We have proposed an overview of the state of the art in cloud techniques and modelling concepts relevant to the CIM and CPIM/CPSM levels of the MODAClouds model-driven architecture. At the CIM level, we have presented existing service-oriented modelling techniques and languages that can be used to represent, in a cloud computation independent way, legacy applications to be deployed on the cloud. At the CPIM level, we have first presented a classification of cloud solutions for the provisioning and deployment of applications in the cloud, with a focus on their ability to manage multi-cloud systems and their use of models. As a result, we have clearly identified the benefits of, and the need for, a model-based approach in order to tame the complexity of designing such systems.
However, the concepts manipulated by these solutions will serve as a basis for the definition of the MODACloudML models and metamodels. Among the frameworks and libraries presented, some offer a few simple adaptation features. We have also identified a need for the usage of models at runtime in order to tame the complexity of adaptation and ease the reasoning process for self-adaptation. Finally, we have presented an overview of existing approaches for data persistence. This overview highlighted the specificities of each of them, and we believe it will help us with the data synchronization features and in expressing preferences, for instance when choosing a database. On the basis of this analysis of the state of the art and of the features provided by existing tools, we specify in the next Section the requirements for WP4's prototype.

3 Requirement specification
In this Section the requirements elicited in WP4 are specified according to the use case template defined in D3.1.1.
3.1 Context and system overview

3.1.1 Context
Category name: MODACloudML platform (Model abstractions, modelling IDE and model based runtime management exploiting Models@runtime)

The scope of the following use case specifications covers design time (specification), deployment time (Cloud application provisioning and deployment) and run time (run time adaptation using models@runtime). Figure 10 identifies this scope in the context of the MODAClouds reference architecture.

Figure 10 Scope with respect to the MODAClouds reference architecture: run time adaptation, data synchronization and MODACloudML CIM, CPIM and CPSM modelling.

Figure 11 depicts the overall process applying the model based MODAClouds approach. The CloudApp developer first specifies CIM level constraints, such as constraints on location, cloud platforms to use, cost profiles, QoS constraints, etc. Then a platform independent model of the deployment options and of the provisioning of resources is specified at the CPIM level and transformed to MODAClouds platform specific representations (e.g., in the form of JSON templates and exploiting the MODAClouds execution platform). At this stage the CloudApp provider specifies actual deployment schemes based on the deployment templates specified by the CloudApp developer, and the cloud-enabled system(s) are deployed on actual Cloud platforms. The MODAClouds execution environment supports the monitoring, reasoning and adaptation of the running system. The monitoring and reasoning mechanisms are developed in WP5 and WP6, while the actual adaptation engine (for the enactment of the adaptation) is developed in WP4 by exploiting Models@runtime mechanisms.

Note in particular that modelling with MODACloudML is scoped to the specification of provisioning, deployment and adaptation in the cloud. Engineering of the components, systems and services can be done using any of the many approaches that already exist for that purpose. MODACloudML should not exclude any of these approaches; both model driven and non-model driven approaches can be applied for the engineering of software components, services, artefacts and systems. Thus, the scope of the MODACloudML platform is limited to the task of provisioning and deployment on multi-cloud platforms, as well as runtime adaptation of these cloud-enabled systems and services.

Figure 11 Overall MODAClouds approach; the scope of WP 4 is indicated by the blue squares, while the green squares are interacting elements.

3.1.2 System boundary model
The system boundary model for the WP 4 CloudML platform is shown below.
Figure 12 System boundary model.

The MODACloudML platform consists of three main modules: i) the CloudApp specification, which includes the CIM level and CPIM level specifications as well as the derivation of the CPSM level specifications; ii) CloudApp Deployment and provisioning, where the CloudApp Provider can easily deploy cloud-enabled application(s); and iii) Model based runtime management and adaptation, which provides the model based representation of the system as well as the cloud application adaptations. Monitoring and Reasoning are central components to understand and reason about when, what and how to adapt. Their development is the responsibility of WP5 and WP6, while the MODACloudML platform (WP4) is responsible for developing the mechanisms that conduct the requested adaptation. Thus, the Monitoring may observe information exposed by the Models@runtime, and the Reasoning decides when and how to adapt. The Enact Adaptation use case then executes the adaptation: it uses the model representation of the system to manipulate or reconfigure the system and enacts the specified adaptation (models@runtime). Each of the use cases is elaborated further in the subsections below, according to the use case template described in D3.1.1.

3.2 Use case specification for the CPIM level specification

3.2.1 Use case heading
Use case name: CPIM level specification use case
Use case ID: UC-MC.wp4.MODACloudML platform. CPIM level specification.-v01
Revision and Reference: Revision: 01; Reference: NA
Status: In progress
Priority of accomplishment: Must have: The system must implement this goal/assumption to be accepted.
Author(s) and date: Arnor Solberg, 20 December 2012

3.2.2 Use case description
Use case diagram (Figure): See the CPIM level specification use case of the system boundary model. This use case also includes the adaptation specification and data migration use cases.

Goal: The goal of the CPIM level specification use case is to enable the CloudApp Developer to specify the provisioning, deployment and QoS constraints of a cloud application in a platform independent way. This also includes support for the migration (e.g., wrapping) of existing systems and the specification of their provisioning, deployment and QoS constraints, as well as the enablement of data migration between heterogeneous platforms by specifying mappings between the relevant platforms. The CPIM level specification also includes the specification of the adaptation model. This model specifies application specific, cloud based adaptation to optimize utility within the set of specified constraints for different context situations.

Actors:
- CloudApp Developer: is responsible for specifying CIM level constraints, CPIM level provisioning, deployment and QoS constraints, migration and data migration. The CloudApp Developer also performs the derivation of the CPIM specifications to CPSM models. The CPSM models are exploited for run time management and adaptation.
- CloudApp Adaptation Engineer: is responsible for specifying the cloud based adaptation model of the cloud-enabled application.

3.2.3 Use case scenarios
The main success scenario is indicated in the Figure below.
Figure 13 Main scenarios of the CPIM level specification.

The Figure above shows the main scenarios of the CPIM level specification use case. Note that it shows the flow of a single iteration; the development will typically follow an iterative and incremental approach. Note also that the specification of the adaptation model can have iterations of its own within each development iteration cycle.

Triggers:
- CIM level constraints are specified or evolved.
- The CloudApp developer is ready to specify the CPIM level.

Preconditions:
- The CIM level specification of the current iteration is performed.
- The application specification is available.

Post conditions:
- The CPIM level specification of the current iteration is finalized and ready to be transformed to the CPSM representation (serialization of the model according to the MODACloudML metamodel).
3.2.4 Information model
The current information model for the PIM level specification is shown in the Figure below. This model is the initial version of the MODACloudML metamodel and shows the modelling concepts to be used for the CPIM level specification. It includes concepts for modelling the provisioning and deployment. Additional models will be provided for the modelling of QoS constraints, adaptation, data synchronization and migration.

Figure 14 Information model for PIM level specification

3.2.5 Interface specification
Façade: The interface of the CPIM level use case will in essence be the MODACloudML IDE. Its core will therefore be CRUD operations for creating and editing MODACloudML models according to the MODACloudML metamodel (see the initial version in the information model above). In addition, it will include general administration of the models (loading, saving, etc.). The details will be identified as part of the actual development of the MODACloudML IDEs, according to the needs and requirements of the case study providers and other sources.

3.2.6 QoS requirements
The relevant QoS requirements for the CPIM level use case are in general related to the ease of use and usability of the MODACloudML IDE and related tools (including the response time of the tools for modelling and compilation). In general these requirements should be at the common level for tool development in the context of a research project. Thus, advanced user interfaces and thorough performance tuning are out of reach within the duration of the project; however, the tooling should be easy to understand and use for the case study providers of the project.
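As an illustration of the CRUD-style façade described in Section 3.2.5, the sketch below shows what a minimal model repository could look like. It is not the MODACloudML IDE interface, which the deliverable leaves to be defined during development: the class `ModelRepository`, its method names and the representation of a model as a plain dictionary are all assumptions made for this example. Saving and loading are shown as JSON serialization to a string.

```python
import json
from typing import Dict


class ModelRepository:
    """Hypothetical CRUD façade over named models stored as dictionaries."""

    def __init__(self) -> None:
        self._models: Dict[str, dict] = {}

    def create(self, name: str, model: dict) -> None:
        if name in self._models:
            raise ValueError(f"model already exists: {name}")
        self._models[name] = dict(model)

    def read(self, name: str) -> dict:
        # Return a copy so callers cannot modify the stored model by accident.
        return dict(self._models[name])

    def update(self, name: str, changes: dict) -> None:
        self._models[name].update(changes)

    def delete(self, name: str) -> None:
        del self._models[name]

    def save(self) -> str:
        # "Saving" is reduced to JSON serialization for this sketch.
        return json.dumps(self._models, sort_keys=True)

    @classmethod
    def load(cls, text: str) -> "ModelRepository":
        repo = cls()
        repo._models = json.loads(text)
        return repo


repo = ModelRepository()
repo.create("shop", {"level": "CPIM", "components": ["web"]})
repo.update("shop", {"components": ["web", "db"]})
restored = ModelRepository.load(repo.save())
print(restored.read("shop"))
```

A real IDE would validate each operation against the metamodel; that check is deliberately left out here.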
3.3 Use case specification for the CPSM derivation

3.3.1 Use case heading
Use case name: CPSM derivation use case
Use case ID: UC-MC.wp4.MODACloudML platform. CPSM derivation.-v01
Revision and Reference: Revision: 01; Reference: NA
Status: In progress
Priority of accomplishment: Must have: The system must implement this goal/assumption to be accepted.
Author(s) and date: Arnor Solberg, 12 March 2013

3.3.2 Use case description
Use case diagram (Figure): CPSM derivation (see also the system boundary model).

Goal: The goal of the CPSM derivation use case is to derive the CPSM level models from the CPIM level specification. The CPSM level models are the models exploited at run time. The CPIM level models are transformed to CPSM level models through model transformations. The CPSM derivation use case includes some model checking to ensure significant qualities of the derived CPSM models.

Actors:
- CloudApp Developer: is responsible for specifying CIM level constraints, CPIM level provisioning, deployment and QoS constraints, migration and data migration. The CloudApp Developer also performs the derivation of the CPIM specifications to CPSM models. The CPSM models are exploited for run time management and adaptation.

3.3.3 Use case scenarios
The main success scenario is indicated in the Figure below.
Figure 15 Main scenarios of the CPSM derivation.

The Figure above shows the main scenarios of the CPSM derivation use case. The CloudApp developer acquires the CPIM level specification, then prepares and starts the CPSM derivation. The transformation is executed, and its output is the CPSM representation of the given CPIM level specification.

Triggers:
- The CloudApp developer starts the CPSM derivation.

Preconditions:
- The CPIM level specification is available and ready for derivation to CPSM.

Post conditions:
- The CPIM level specification is transformed to the CPSM representation (serialization of the model according to the CloudML metamodel).

3.3.4 Information model
The information model of the CPSM level will be similar to the information model of the CPIM level (see Section 3.2.4). However, some platform specific properties and concepts will be added.

3.3.5 Interface specification
Façade: The interface of the CPSM derivation will include two main methods: one for checking properties of the CPIM level model and one for transforming the CPIM level model to the CPSM level.
- ModelCheckingReport checkCPIMProperties(CPIMmodelSpec model)
- CPSMmodel CPIMtoCPSM(CPIMmodelSpec model)

3.3.6 QoS requirements
The relevant QoS requirement for the CPSM derivation is mainly the performance of the transformation. In general there are no very strict performance requirements for the compilation of the CPIM specification. However, performance should never be so poor that the CloudApp developer avoids a necessary compilation. Performance naturally depends on the size and complexity of the CPIM specification; however, a compilation should normally not exceed 30 seconds.
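The two façade methods listed in Section 3.3.5 can be illustrated with a small Python sketch. Only the two operation names and their input and output roles come from the specification above; everything else is an assumption made for illustration: the fields of the three data classes, the `min_cores` requirement, the `INSTANCE_TYPES` catalogue, the provider name and the extra `provider` argument of the transformation are hypothetical, and the actual MODACloudML transformation may work quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CPIMModelSpec:
    name: str
    # Component name -> provider independent requirements (hypothetical).
    components: Dict[str, Dict[str, int]] = field(default_factory=dict)


@dataclass
class ModelCheckingReport:
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


@dataclass
class CPSMModel:
    name: str
    provider: str
    # Component name -> provider specific instance type.
    instances: Dict[str, str] = field(default_factory=dict)


# Hypothetical provider catalogue: (instance type, cores), smallest first.
INSTANCE_TYPES = {"example-provider": [("small", 1), ("medium", 2), ("large", 4)]}


def check_cpim_properties(model: CPIMModelSpec) -> ModelCheckingReport:
    """Checks a few structural properties before the transformation."""
    report = ModelCheckingReport()
    if not model.components:
        report.errors.append("model has no components")
    for component, req in model.components.items():
        if req.get("min_cores", 0) < 1:
            report.errors.append(f"{component}: min_cores must be >= 1")
    return report


def cpim_to_cpsm(model: CPIMModelSpec, provider: str) -> CPSMModel:
    """Maps each component to the smallest instance type that fits."""
    report = check_cpim_properties(model)
    if not report.ok:
        raise ValueError("; ".join(report.errors))
    cpsm = CPSMModel(model.name, provider)
    for component, req in model.components.items():
        for type_name, cores in INSTANCE_TYPES[provider]:
            if cores >= req["min_cores"]:
                cpsm.instances[component] = type_name
                break
        else:
            raise ValueError(f"{component}: no instance type is large enough")
    return cpsm


spec = CPIMModelSpec("shop", {"web": {"min_cores": 2}, "db": {"min_cores": 3}})
cpsm = cpim_to_cpsm(spec, "example-provider")
print(cpsm.instances)
```

Running the check inside the transformation reflects the statement in Section 3.3.2 that the derivation includes some model checking; exposing it as a separate function also lets the developer validate a model without deriving anything.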
3.4 Use case specification for the CloudApp provisioning and deployment use cases

3.4.1 Use case heading
Use case heading for the Fill in deployment wizard use case
Use case name: Fill in deployment wizard
Use case ID: UC-MC.wp4.MODACloudML platform. Fill in deployment wizard.-v01
Revision and Reference: Revision: 01; Reference: NA
Status: In progress
Priority of accomplishment: Must have: The system must implement this goal/assumption to be accepted.
Author(s) and date: Arnor Solberg, 12 March 2013

Use case heading for the CloudApp deployment use case
Use case name: Initiate CloudApp deployment
Use case ID: UC-MC.wp4.MODACloudML platform. CloudApp deployment.-v01
Revision and Reference: Revision: 01; Reference: NA
Status: In progress
Priority of accomplishment: Must have: The system must implement this goal/assumption to be accepted.
Author(s) and date: Arnor Solberg, 12 March 2013

3.4.2 Use case description
Use case diagram (Figure): Initiate CloudApp provisioning and deployment use case (see also the system boundary model).

Goal: The goal of the initiate CloudApp provisioning and deployment use cases is to specify the deployment and execution profile(s) of a cloud application. The profiles are derived from the CPIM specifications, and the CloudApp provider specifies the profile(s) through wizards in which the options can be selected (the Fill in deployment wizard use case). The initiate CloudApp deployment use case then initiates the automatic deployment of the application, based on the specified profile(s), through the execution platform.

Actors:
- CloudApp Provider: is responsible for specifying the deployment and execution profile of the actual cloud application, and for deploying and providing it accordingly.
3.4.3 Use case scenarios

The main success scenario is indicated in the Figure below.

[Activity diagram "CloudApp deployment and provisioning". Lanes: CloudApp provider, Deployment and provisioning engine. Flow: CloudApplication and CPIM specification, Specify deployment and execution profile, Deploy application, Perform deployment and provisioning, Running cloud application.]

Figure 16 Main scenarios of the CPSM derivation.

The Figure above shows the main scenario of the CloudApp provisioning and deployment use cases. The CloudApp provider specifies the deployment and execution profile(s) using wizards that are based on the CPIM specification. The executable artefacts of the cloud application are also required. Once the profile(s) are specified, the CloudApp provider initiates the deployment of the application, and the Deployment and provisioning engine performs the deployment and initial provisioning of the cloud application.

Triggers: The CloudApp provider obtains the executable artefacts of the application and the CPIM specification, and initiates the specification of the deployment and execution profile(s).
Preconditions: The executable artefacts of the application and the CPIM specification are available.
Post conditions: The cloud application is deployed and running.

3.4.4 Information model

The information model will be similar to the information model of the CPIM level (see Section 3.2.4). In addition, it will include specific concepts for specifying provisioning and deployment profiles. An illustration of the initial vision for a provisioning and deployment wizard is shown in the Figure below.
Figure 17 Exemplifying the vision of a provisioning and deployment wizard

3.4.5 Interface specification

Façade: The interface model of the provisioning and deployment mainly supports the wizard for specifying deployment and execution profiles, including provisioning and adaptation constraints. In addition, it covers the actual automatic deployment, which requires the profile specifications as well as the executable artefacts of the application.

3.4.6 QoS requirements

Relevant QoS requirements for the provisioning and deployment use cases are usability and ease of use, as well as performance of the automatic deployment. In general, these requirements should be at the level commonly expected of tools developed in the context of a research project. Advanced user interfaces and thorough performance tuning are thus out of reach within the duration of the project; however, the tooling should be easy to understand and use, with acceptable performance as required by the case study providers of the project.

3.5 Use case specification for the Model based runtime management and adaptation use cases

3.5.1 Use case heading

Use case heading for the Models@Runtime use case
Use case name: Models@Runtime
Use case ID: UC-MC.wp4.MODACloudML platform. Models@Runtime.-V01
Revision and Reference: Revision: 01, Reference: NA
Status: In progress
Priority of accomplishment: Must have (the system must implement this goal/assumption to be accepted)
Author(s) and date: Arnor Solberg, 12 March 2013

Use case heading for the enact adaptation use case
Use case name: EnactAdaptation
Use case ID: UC-MC.wp4.MODACloudML platform. EnactAdaptation.-V01
Revision and Reference: Revision: 01, Reference: NA
Status: In progress
Priority of accomplishment: Must have (the system must implement this goal/assumption to be accepted)
Author(s) and date: Arnor Solberg, 12 March 2013

3.5.2 Use case description

Use case diagram (Figure): Model based runtime management and adaptation use cases (see also the system boundary model).

[Use case diagram, System Boundary Model "CloudApp Model based RunTime Management and adaptation". Elements: Models@Runtime, GetModels@runtime Information, Monitoring (WP6), «use», Enact Adaptation, Reasoning (WP6).]

Goal: The goal of the Model based runtime management use cases is to provide models at runtime for managing and adapting the running cloud application. Models@runtime technologies will be applied and evolved to support a model based representation of the system that is causally connected to the running system. Runtime adaptation is performed by manipulating the model, so that a change can be checked at the model level before the adaptation is actually enacted.

Actors:
CloudApp: The cloud application that is dynamically configured according to manipulations of its model representation using models@runtime technologies.
Monitoring: The monitoring mechanisms are responsible for monitoring the application and its environment. Some specific information about the running cloud application is provided to the monitoring component through the models@runtime, which are causally connected to and in sync with the running application. The monitoring component will exploit a set of different mechanisms to monitor the application and its environment. These are further elaborated in the WP6 deliverables.
Reasoning: The reasoning component is responsible for reasoning on the monitored information and deciding whether to adapt the running cloud application. To make a decision, the reasoner can consume data from the monitoring and from the models@runtime. When an adaptation decision is made, the reasoning component provides it to the enact adaptation use case, which executes the adaptation using the models@runtime mechanisms.

3.5.3 Use case scenarios

The main success scenario is indicated in the Figure below.

[Activity diagram "Model based runtime management and adaptation". Lanes: Monitoring, Reasoning, Adaptation manager, Models@runtime. Flow: do continuous monitoring, Monitored information, reasoning and decision making, adapt?, trigger adaptation, Adaptation decision, conduct adaptation, perform adaptation, Adapted CloudApp.]

Figure 18 Main scenarios of the model based management and adaptation use cases

The Figure above shows the main scenarios of the model based management and adaptation use cases. The Monitoring is responsible for continuous monitoring and feeds the monitored data to the Reasoning, which reasons on it and decides whether to trigger an adaptation. If an adaptation is decided, the reasoner provides the adaptation decision to the Adaptation manager, which is responsible for enacting the adaptation. The Adaptation manager uses the models@runtime to perform the actual adaptation. The result of this adaptation process is the running adapted cloud application.

Triggers: At the start of the execution of a MODAClouds based cloud
application the continuous monitoring process will be initiated, as well as the reasoning on the monitored data.
Preconditions: Monitoring mechanisms such as sensors and the required monitoring APIs (e.g., for monitoring cloud infrastructure status) are available.
Post conditions: A successful adaptation is enacted and the adapted cloud application is up and running.

3.5.4 Information model

For the enactment of the adaptation, the MODACloudML metamodel will be the basis (see the information model of the CPIM level, Section 3.2.4). For the monitoring and reasoning, the information models will be elaborated in WP 5 and WP 6.

3.5.5 Interface specification

Façade: The interface between the monitoring and the reasoning will be the exchange of the monitored data. The interface between the reasoning and the adaptation manager will be the description of the adaptation decision (e.g., the addition or removal of components such as cloud resources, of features, etc.). The models@runtime will also provide information on the state of the running application through an API which can be exploited by the monitoring component.

3.5.6 QoS requirements

The most important QoS requirement for the model based runtime management and adaptation is performance; however, this requirement will vary depending on the kind of adaptation. Some adaptations are time critical and should not be noticeable to the users, while other adaptations (e.g., moving the cloud application or part of it to another cloud infrastructure to optimize cost) are not necessarily time critical. Furthermore, the performance of the adaptations typically also depends on the performance of third party components (e.g., components of the applied cloud infrastructure). These issues and considerations need to be clarified further in the project.
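The interfaces of Section 3.5.5 (monitored data flowing to the reasoning, an adaptation decision flowing to the adaptation manager, and the models@runtime used for enactment) can be sketched in Python as follows. This is only an illustration under stated assumptions: the runtime model is reduced to an instance count per component, the reasoning uses a hypothetical load threshold, and the propagation of the new model to the running system is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RuntimeModel:
    """Hypothetical models@runtime view: component name -> instance count."""
    instances: Dict[str, int] = field(default_factory=dict)

    def copy(self) -> "RuntimeModel":
        return RuntimeModel(dict(self.instances))


@dataclass
class AdaptationDecision:
    """Add (positive delta) or remove (negative delta) instances."""
    component: str
    delta: int


def reason(monitored: Dict[str, float], model: RuntimeModel,
           threshold: float = 0.8) -> Optional[AdaptationDecision]:
    """Reasoning: decide whether to adapt, given monitored load values."""
    for component, load in monitored.items():
        if component in model.instances and load > threshold:
            return AdaptationDecision(component, 1)
    return None  # no adaptation needed


def enact(decision: AdaptationDecision, model: RuntimeModel) -> RuntimeModel:
    """Adaptation manager: apply the decision to a copy of the model and
    check it before it would be pushed to the running system."""
    candidate = model.copy()
    candidate.instances[decision.component] += decision.delta
    if candidate.instances[decision.component] < 1:
        raise ValueError("adaptation would remove the last instance")
    return candidate
```

Working on a copy of the model reflects the idea that an adaptation is checked at the model level before it is enacted; the original model is left untouched if the check fails.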
3.6 CIM Modelling Support

3.6.1 Context and system overview

3.6.1.1 Context

Category name: Design time Modelling

This Section concerns the modelling support for the Computation Independent Model (CIM) to be provided by MODAClouds. The proposed approach is in fact a Service Oriented Approach (SOA). The CIM is therefore composed of a set of service definitions, a model of their orchestration and a model of the requirements they are supposed to fulfil. The CIM model is designed by the system designer; however, parts of it may be reused or reverse engineered from legacy models and code. The following picture illustrates the contents of the CIM model:
Figure 19 CIM Model

More specifically, the CIM model is composed of a definition of a set of services (Service Definition) along with their public interfaces (Public Interface). These services exchange data (Exchanged Data Definition) and cooperate in a way defined by a Service Orchestration. Furthermore, the CIM model should contain a requirements specification comprising a set of Business Requirements, the requirements on the quality of the service provided by these services (QoS Requirements) and the requirements on the data to be manipulated (Data Requirements). The Service Definitions may also contain Legacy Content which, even though it is not directly part of the CIM, is necessary for the correct implementation of the CIM to CPIM transformation and of the subsequent transformations to a CPSM model and then to code.

3.6.1.2 System boundary model

Figure 20 System boundary model
3.6.2 Use case specification for the Define Application Services use case

3.6.2.1 Use case heading

Use case name: Define Application Services
Use case ID: UC-MC.WP4.CIM.Define Application Services.-V01
Revision and Reference: Revision: 01, Reference: NA
Status: In progress
Priority of accomplishment: Must have
Author(s) and date: Marcos Almeida, 6 March 2013

3.6.2.2 Use case description

The CIM model contains a set of service specifications. This use case allows the designer to create one such specification. Each service specification defines a set of public interfaces that contain a description of the data exchanged by different services (see Information Model).

Use case diagram (Figure): Figure 21 Use Case Figure

Goal: Define the high-level black box services that compose the application.

3.6.2.3 Information model

Figure 22 Information: Define Application Services
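The structure described for this use case (service specifications, their public interfaces and the data they exchange) can be pictured with a small Python sketch. The class and attribute names below are hypothetical and are not taken from the MODACloudML metamodel; the rule that service names must be unique is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExchangedData:
    """Description of a piece of data exchanged between services."""
    name: str
    attributes: List[str]


@dataclass
class PublicInterface:
    """Public interface of a service and the data it exchanges."""
    name: str
    exchanged: List[ExchangedData] = field(default_factory=list)


@dataclass
class ServiceDefinition:
    """High-level black box service of the application."""
    name: str
    interfaces: List[PublicInterface] = field(default_factory=list)


@dataclass
class CIMModel:
    """Container for the service specifications of a CIM."""
    services: List[ServiceDefinition] = field(default_factory=list)

    def define_service(self, service: ServiceDefinition) -> None:
        # Assumed rule: service names are unique within one CIM model.
        if any(s.name == service.name for s in self.services):
            raise ValueError(f"service {service.name!r} already defined")
        self.services.append(service)
```

A designer would, for example, define a "Billing" service exposing one public interface that exchanges an "Order" data description, and add it to the CIM model with `define_service`.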
3.6.3 Use case specification for the Define Services Orchestration use case

3.6.3.1 Use case heading

Use case name: Define Services Orchestration
Use case ID: UC-MC.WP4.CIM.Define Non-functional Requirements.-V01
Revision and Reference: Revision: 01, Reference: NA
Status: In progress
Priority of accomplishment: Could have
Author(s) and date: Marcos Almeida, 6 March 2013

3.6.3.2 Use case description

This use case allows the designer to specify a service orchestration, which defines how the services specified by means of the Define Application Services use case interact.

Use case diagram (Figure): Figure 23 Use Case diagram

Goal: Define the orchestration of the services that compose the application.

3.6.3.3 Use case scenarios

The main success scenario is shown in the Figure below (note that this Figure is for illustrative purposes only; it is thus partly synthetic and at a high abstraction level).

Preconditions: The service specifications should have been previously defined.

3.6.3.4 Information model

Figure 24 Information: Define Services Orchestration
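The precondition of this use case (services must be defined before they are orchestrated) can be shown with a short Python sketch. Representing an orchestration as a list of caller/callee pairs is an assumption made for illustration only; the function name and signature are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def define_orchestration(defined_services: Set[str],
                         interactions: List[Tuple[str, str]]
                         ) -> Dict[str, List[str]]:
    """Build a caller -> callees map, enforcing the precondition that
    every service taking part in the orchestration is already defined."""
    orchestration: Dict[str, List[str]] = {}
    for caller, callee in interactions:
        for name in (caller, callee):
            if name not in defined_services:
                raise ValueError(f"service {name!r} is not defined")
        orchestration.setdefault(caller, []).append(callee)
    return orchestration
```

Rejecting unknown service names early keeps the orchestration consistent with the service specifications produced by the Define Application Services use case.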
3.6.4 Use case specification for the Define Service Requirements use case

3.6.4.1 Use case heading

Use case name: Define Service Requirements
Use case ID: UC-MC.WP4.CIM.Define Services Requirements.-V01
Revision and Reference: Revision: 01, Reference: NA
Status: In progress
Priority of accomplishment: Should have
Author(s) and date: Marcos Almeida, 6 March 2013

3.6.4.2 Use case description

This use case allows the designer to define the requirements that should be fulfilled by the services and the orchestration defined in the CIM model. Requirements may be business requirements, QoS requirements or data requirements.

Use case diagram (Figure): Figure 25 Requirements Use Case diagram

Goal: Define the business, QoS and data requirements that should be fulfilled by the application services.

3.6.4.3 Use case scenarios

The main success scenario is shown in the Figure below (note that this Figure is for illustrative purposes only; it is thus partly synthetic and at a high abstraction level).

Preconditions: The service specifications should have been previously defined.
3.6.4.4 Information model

Figure 26 Information: Define Service Requirements

4 Roadmap

In this first year of the project, the MODAClouds consortium will focus on a proof of concept of MODACloudML. An initial version of the MODACloudML models and metamodels at the various levels (CIM - CPIM - CPSM) for the provisioning, deployment and adaptation of multi-cloud systems will be provided, as well as a first provisioning and deployment engine. The relation with other tools from other WPs will be discussed. In terms of timeline, the implementation of each requirement listed in the following table will evolve in three steps, following the delivery dates of the deliverables at months 12, 24 and 30.

Table 6 Summary of WP4 requirements

#  Use case scenarios (UC-MC.wp4.*)   Status        Priority
1  CPIM level specification           In progress   Must Have
2  CPSM derivation                    In progress   Must Have
3  Fill in deployment wizard          In progress   Must Have
4  CloudApp deployment                In progress   Must Have
5  Models@runtime                     In progress   Must Have
6  Conduct adaptation                 In progress   Must Have
7  Define Application Services        In progress   Must Have
8  Define Service Orchestration       In progress   Could Have
9  Define Service Requirements        In progress   Should Have

We envision the integration of the above features into the MODACloudML framework, which will enable the specification of cloud concerns at design-time (using CPIMs) and their enactment at runtime (using the "models@runtime" platform). The models@runtime platform ensures the connection between the models and the running cloud applications. As explained earlier in this deliverable, enforcing a causal connection between the system and its models will help the transition from purely descriptive models (i.e., design models) to productive models (i.e., runtime models) and, in turn, ensure that models remain up-to-date throughout the system life span.
The development of this models@runtime platform will follow agile practices, in order to maintain a strong match between the delivered product and user requirements, which invariably evolve during projects. This iterative development will result in three main milestone deliveries, scheduled at months 12, 24 and 30. Their key features will be broken down as follows:

Preliminary Release (Month 12). This first release will provide the minimal platform to support the end-to-end development of cloud-based applications: from simplified CPIM level specifications down to deployment on real cloud IaaS or PaaS. This platform will mainly focus on enacting the top-down approach, where models are defined manually and then enacted on various cloud platforms.

Intermediary Release (Month 24). This second release will provide the features needed for the bottom-up approach, where models are built directly from the running applications and fed back into design tools or adaptation engines, in order to ease maintenance and system evolution, such as system migration from one cloud to another. Key refinements will be new platforms supported during the CPSM derivation and better support for application service definition. This complete models@runtime platform will then undergo a first evaluation, whose results will drive the platform finalisation.

Final Release (Month 30). The final release will integrate feedback from the case studies. It will also integrate support for service requirements and service orchestrations.