An SLA-based Broker for Cloud Infrastructures

Journal of Grid Computing manuscript No. (will be inserted by the editor)

Antonio Cuomo, Giuseppe Di Modica, Salvatore Distefano, Antonio Puliafito, Massimiliano Rak, Orazio Tomarchio, Salvatore Venticinque, Umberto Villano

Abstract The breakthrough of the Cloud comes from its service-oriented perspective, where everything, including the infrastructure, is provided as a service. This model is attractive and convenient for both providers and consumers; as a consequence, the Cloud paradigm is quickly growing and spreading widely, also in non-commercial contexts. In such a scenario, we propose to incorporate some elements of volunteer computing into the Cloud paradigm through the Cloud@Home solution, bringing into the mix nodes and devices provided by potentially any owner or administrator, disclosing high computational resources to contributors and also allowing their utilization to be maximized. This paper presents and discusses the first step towards providing quality of service and service level agreement facilities on top of unreliable, intermittent Cloud providers. Some of the main issues and challenges of Cloud@Home, such as the monitoring, management and brokering of resources according to service level requirements, are addressed through the design of a framework core architecture. All the tasks committed to the architecture's modules and components, as well as the most relevant component interactions, are identified and discussed from both the structural and the behavioural viewpoints. Some encouraging experiments on an early implementation prototype deployed in a real testing environment are also documented in the paper.

Keywords Cloud Computing, SLA, QoS, Resource Brokering.

Antonio Cuomo, Umberto Villano: Dipartimento di Ingegneria, Università degli Studi del Sannio, Italy.
Giuseppe Di Modica, Orazio Tomarchio: Dipartimento di Ingegneria Elettrica, Elettronica ed Informatica, Università di Catania, Italy.
Salvatore Distefano: Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy.
Antonio Puliafito: Dipartimento di Matematica, Università di Messina, Italy.
Massimiliano Rak, Salvatore Venticinque: Dipartimento di Ingegneria dell'Informazione, Seconda Università di Napoli, Italy.

1 Introduction

Among the several definitions of Cloud computing available in the literature, one of the most authoritative is that provided by NIST [32]: "Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This Cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models." It is important to remark that this definition identifies availability as a key concept of the Cloud paradigm. This concept falls into the broader class of quality of service (QoS), service level agreement (SLA) and related issues, topics of primary and strategic importance in the Cloud. In this context, the focus of this paper is on the Infrastructure as a Service (IaaS) provisioning model. IaaS Clouds are built up to provide infrastructures such as computing, storage and communication systems, for free or for a fee, with or without QoS/SLA guarantees. There are multiple frameworks able to provide computing IaaS services: Eucalyptus [28], OpenNebula [20], Nimbus [29], PerfCloud [12], Clever [42] and OpenStack [41], to name a few. All of them, as well as the existing proprietary solutions (e.g., Amazon EC2, Rackspace), aggregate and manage powerful and reliable underlying computing resources (usually single or multiple interconnected datacenters) to build up the Cloud IaaS infrastructure. A different approach is instead proposed by Cloud@Home, a project funded by the Italian Ministry for Education and Research [15]. Cloud@Home aims at building an IaaS Cloud provider using computing, storage and sensing resources also acquired from volunteer contributors.
The basic assumption on which Cloud@Home relies is that the resources offered on a volunteer basis are not reliable and cannot provide levels of QoS comparable to those offered by commercial public Clouds. We believe the volunteer approach can provide benefits in both business and open contexts. In business environments, one of the main sources of cost and complexity for companies is expanding, maintaining, tuning and optimizing the hardware resources in order to effectively satisfy highly demanding, domain-specific software and to ensure adequate productivity levels. The Cloud@Home technology will enable companies to organize their computing resources, which are sometimes distributed over several sites, in order to meet the demands of such software. Indeed, Cloud@Home allows a company to aggregate its sites into a federation, to which each site can contribute with its available and underexploited hardware resources to provide added-value, guaranteed services. In open contexts, a possible scenario that magnifies the Cloud@Home features and approach is the academic one. Let us imagine that several universities or, in general, research institutions worldwide need to collaborate on a scientific project that requires a huge amount of hardware resources. Moreover, let us assume that each institution owns a private datacenter, made up of heterogeneous computing resources, each having a different level of utilization (depending on their geographic coordinates and time zones, some datacenters may be underexploited with respect to others). Cloud@Home will provide the institutions with tools to build up a federation of datacenters acting as a Cloud broker, to which each partner can contribute with its own (i.e., not utilized or underexploited) resources according to its scheduled availability. Pushing the approach to its limits, one can also imagine a scenario where private users aggregate into a federation and share their resources for each other's needs.
In order to implement such an ambitious idea, a three-phase roadmap was scheduled in the Cloud@Home project: i) development of quality of service (QoS) and service level agreement (SLA) brokering/federation mechanisms for private Cloud providers; ii) development of

billing and reward mechanisms for merging both private and public Clouds; iii) development of tools for involving single-resource (desktop, laptop, cluster) volunteer contributors. The focus here is on the first step of the roadmap. The paper discusses the implementation of a framework for federating and brokering private Clouds, able to provide QoS guarantees on top of best-effort providers. The strength of the proposal is that it offers a solution that is efficient from a cost perspective (when needed, resources can be borrowed from the system, thus avoiding the need to purchase them externally) and, at the same time, sufficiently reliable, as mechanisms for guaranteeing QoS are also provided. Looking at the state of the art in this research field, the management of the service quality of leased resources is sometimes partially covered by commercial Clouds (availability, reliability and high-level performance are natively supported by very powerful, high-quality datacenters) and often neglected in current Cloud frameworks. The objectives of the Cloud@Home project and its rationale, limits and application contexts have been critically discussed in [5], a preliminary version of this work mainly focusing on the concepts and ideas behind Cloud@Home. The present work tries to further develop and implement such ideas, more specifically by: i) identifying and characterizing the system actors; ii) detailing the architectural design and its modules and components; iii) providing details on the interactions among the components implementing the Cloud@Home functionalities; iv) reporting on the current prototype implementation and the testbed. The remainder of the paper is organized as follows. A brief overview of the state of the art is first reported in Section 2. Then, Section 3 describes the core architecture. Sections 4, 5, 6 and 7 delve into the details of the architectural modules and components.
Section 8 presents the system from a dynamic perspective, while Section 9 deals with a real testbed on which tests have been carried out. Conclusions and future developments are discussed in Section 10.

2 Related work

Cloud@Home aims at implementing a brokering-based Cloud provider starting from resources shared by different providers, addressing QoS- and SLA-related issues, as well as resource management, federation and brokering problems. Since these issues span different topics, in the following we identify some of them and provide an overview of the current state of the art.

Volunteer and Cloud Computing. The idea of volunteer Clouds has recently emerged as one of the most interesting topics in Cloud computing. Some work is available in the literature, also inspired by Cloud@Home, which is one of the first attempts in this direction [16]. In [14] the authors present the idea of leveraging volunteer resources to build a form of dispersed Clouds, or nebulas, as they call them. Those nebulas are not intended to be general purpose, but to complement the offering of traditional homogeneous Clouds in areas where a more flexible, less guaranteed approach can be beneficial, such as testing environments or applications where data are intrinsically dispersed and centralising them would be costly. Some requirements and possible solutions are presented. BoincVM [38] is an integrated Cloud computing platform that can be used to harness volunteer computing resources, such as laptops, desktops and server farms, for running CPU-intensive scientific applications. It leverages existing technologies (the BOINC platform and VirtualBox) along with some projects currently under development: VMWrapper, VMController and CernVM. Thus, it is a kind

of volunteer-on-Cloud approach, whereas Cloud@Home can be classified as a Cloud-on-volunteer model. In [3] the authors investigate how a mixture of dedicated (and so highly available) and non-dedicated (and so highly volatile) hosts can be used to provision a processing tier of a large-scale Web service. They propose an operational model that guarantees long-term availability despite host churn, by ranking non-dedicated hosts according to their availability behavior. Through experimental simulation results they demonstrate that the technique is effective in finding a suitable balance between costs and service quality. Although the technique is interesting and the results are encouraging, the paper provides no evidence of either a possible implementation or an architectural design of the overall infrastructure framework that should implement the idea. An approach that can be categorized under the volunteer Cloud is the P2P Cloud. It has been proposed in several papers, such as the ones cited above, and particularly in storage Cloud contexts. An interesting implementation of this idea is proposed in [22]. That work specifically focuses on peer reliability, proposing a distributed mechanism to enable churn-resistant, reliable services, which allows reserving, monitoring and using resources provided by the unreliable P2P system, and maintains long-term resource reservations through controlled redundant resource provision. The evaluation results obtained through simulation show that using KAD measurements for predicting the lifetime of peers allows for 100% successful reservations under churn with very low traffic overhead. As in the above case, there is no real implementation of the proposed solution. Anyway, the monitoring and prediction tools developed can be of interest for Cloud@Home.

Federation, InterCloud and resource provisioning from multiple Clouds. Management of resources in the Cloud is a complex topic.
In [4,37] examples of resource management techniques are discussed, while in [26] a policy-based technique facing resource management in Cloud environments is proposed. Even if Cloud computing is an emerging field, the need to move beyond the limitations of provisioning from a single provider is gaining interest in both academic and commercial research. In [25], the authors move from a datacenter model (in which clusters of machines are dedicated to running Cloud infrastructure software) to an ad-hoc model for building Clouds. The proposed architecture aims at providing management components to harvest resources from non-dedicated machines already in existence within an enterprise. The need for intermediary components (Cloud coordinators, brokers, exchange) is explained in [11], where the authors outline an architecture for a federated network of Clouds (the InterCloud). The evaluation is conducted on a simulated environment modeled through the CloudSim framework, showing significant improvements in average turnaround time and makespan in some test scenarios. Federation issues in Cloud environments have been considered, and some research projects focusing on this specific topic actively investigate possible solutions, such as RESERVOIR [36] and, more recently, mOSAIC [31] and OPTIMIS [21]. With specific regard to [31,21], the approach they propose is to implement a brokering system that acquires resources from different Cloud providers and offers them in a custom way to their users. As regards the brokering solution for federation, Cloud@Home pioneeringly identified and proposed such a modality as early as 2009 [16].

SLA Management in Clouds. In service-oriented environments, several proposals addressing the negotiation of dynamic and flexible SLAs have appeared [19]. However, to the best of our knowledge, none of the main commercial IaaS providers (Amazon, Rackspace, GoGRID,...) offers negotiable SLAs.
What they usually propose is an SLA contract that specifies simple guarantees on uptime percentage or network availability. Moreover, most of the

providers offer additional services (for example, Amazon CloudWatch) which monitor the state of target resources (e.g., CPU utilization and bandwidth). Open Cloud engine software like Eucalyptus, Nimbus and OpenNebula also implements monitoring services for the private Cloud provider, but does not provide solutions for SLA negotiation and enforcement. A survey of the SLAs offered by commercial Cloud providers can be found in [44]. In [24] the authors describe a system able to combine SLA-based resource negotiations with virtualized resources, pointing out that no approach in the current literature takes both these aspects into account. A global infrastructure aiming at offering SLAs on any kind of Service Oriented Infrastructure (SOI) is the objective of the SLA@SOI project [39], which proposes a general architecture that can be integrated into many existing solutions. Anyway, this interesting solution is hard to adopt fully in an infrastructure composed of unreliable resources such as the ones targeted by the Cloud@Home project. Recently, in the context of the mOSAIC project, some solutions have appeared that offer user-oriented SLA services to final users [35,1].

A critical analysis. Due to the large number of computational resources often available in different environments (like scientific labs, office terminals, academic clusters), there is a clear need for solutions able to reuse such resources in a Cloud computing fashion. The state of the art described above illustrates that a few attempts have been made to apply volunteer computing approaches in this direction. The main limits of such solutions are related to different aspects: (1) the definition of the real use cases where the volunteer computing approach can be applied, (2) the integration with resources that come from commercial providers and (3) a clear evaluation of the quality of the services obtained with such resources.
The first open issue needs a clear identification of the types of resources that can be shared with the volunteer approach and of their possible usage. To the best of the authors' knowledge, such an analysis is not yet available, apart from [3], a work-in-progress paper which partly anticipates some of the ideas presented here. The second problem, instead, is strictly related to what is called Cloud federation, i.e., the idea of integrating resources from different Cloud providers using different techniques (like brokering or Cloud bursting). As outlined above, there is a lot of interest in federation, but few results take into consideration the effect of integrating commercial-based and volunteer-based resources, and the different issues that arise in such a context. In any case, even though brokering solutions are now available, few of them are stable; furthermore, the techniques to be adopted are open questions, and the way in which the brokering functionalities should be offered is, at the state of the art, not clearly defined. The third problem is a well-known one in the literature: how to guarantee service level agreements on top of Cloud providers? Even if a lot of effort has been spent in this direction, no one has come out with a stable proposal. Commercial Cloud providers use natural language to describe the functionalities, the terms of use and the service levels of their offers. Research projects (like Contrail and OPTIMIS, described above) try to offer frameworks that can be integrated into Cloud providers, but these are usually heavy to maintain and hard to customize. The only available standard (WS-Agreement) has only one stable framework implementing its features (WSAG4J). Moreover, all such proposals focus on the integration of SLA management systems from the Cloud provider perspective. Implementing SLA mechanisms on top of federated resources is still an open question. Existing solutions
focus on how to integrate SLA management frameworks into complex datacenters; the problem is translated into a resource optimization problem. Some results focus on how to optimize resources at the brokering level, i.e., after the resources are obtained and without control over the physical infrastructure. More recently, solutions aiming at offering SLA functionalities on top of

brokering systems have been proposed [10, 8, 33, 7]. To the best of the authors' knowledge, none of them takes into account volunteer and time-bounded availability scenarios.

3 System Overview

From a wider perspective, Cloud@Home aims at merging the Cloud and the volunteer computing paradigms. Cloud@Home collects infrastructure resources from different providers and offers them to the end users through a uniform interface, in an IaaS fashion. As depicted in Fig. 1, resources are gathered from heterogeneous providers, potentially ranging from commercial Cloud providers, offering highly reliable services, to single PCs, voluntarily shared by their owners, which, by their nature, cannot provide guarantees on the QoS.

Fig. 1: Resource aggregation from different Cloud providers.

The main goal of Cloud@Home is to provide a set of tools for building up a new, enhanced provider of resources (namely, a Cloud@Home provider) that is not yet another classic Cloud provider, but instead acts as an aggregator of resources offered by third-party providers. A Cloud@Home provider collects heterogeneous resources from different Cloud providers adopting diverse resource management policies, and offers such resources to the users in a uniform way. This results in an added-value infrastructure that also provides mechanisms and tools for implementing, managing and achieving QoS requirements defined through, and managed by, a specific SLA process. Indeed, in order to deal with the heterogeneity and churn of resources, a Cloud@Home provider can use a set of tools and services dedicated to SLA management, monitoring and enforcement. A goal of Cloud@Home is to release the above-discussed tools to the Cloud community. Any interested organization may use such tools to build a Cloud@Home provider. Nothing prevents the instantiation of multiple Cloud@Home providers, each one collecting and aggregating resources from different resource providers.
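The aggregation model of Fig. 1 can be illustrated with a minimal broker that dispatches uniform resource requests to heterogeneous provider drivers. This is a hypothetical Python sketch: all class and method names (`ProviderDriver`, `create_vm`, `Broker`) are our own illustrative assumptions, not the actual Cloud@Home API.

```python
from abc import ABC, abstractmethod

class ProviderDriver(ABC):
    """Uniform interface hiding a specific Cloud engine (hypothetical)."""
    @abstractmethod
    def create_vm(self, cores: int, ram_mb: int) -> str: ...

class CommercialDriver(ProviderDriver):
    def create_vm(self, cores, ram_mb):
        # A real driver would call the provider's API; stubbed here.
        return f"commercial-vm({cores}c/{ram_mb}MB)"

class VolunteerDriver(ProviderDriver):
    def create_vm(self, cores, ram_mb):
        return f"volunteer-vm({cores}c/{ram_mb}MB)"

class Broker:
    """Aggregates resources from heterogeneous providers behind one interface."""
    def __init__(self):
        self.drivers = {}
    def register(self, name: str, driver: ProviderDriver):
        self.drivers[name] = driver
    def provision(self, provider: str, cores: int, ram_mb: int) -> str:
        # Users see one call regardless of the underlying engine.
        return self.drivers[provider].create_vm(cores, ram_mb)

broker = Broker()
broker.register("public-cp", CommercialDriver())
broker.register("lab-pc", VolunteerDriver())
print(broker.provision("lab-pc", 2, 2048))  # volunteer-vm(2c/2048MB)
```

The design point is that the broker never depends on a concrete engine: adding a new Resource Owner means registering one more driver.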
Furthermore, a resource provider is allowed to join any Cloud@Home system it wishes. As explained in Section 5, the availability of resources for a given request

is assessed by the Cloud@Home provider at run time, and the management of the resource status (free, busy, etc.) is up to the resource provider itself. To implement the wide and ambitious Cloud@Home vision, it was necessary to adequately design and organize the work into phases according to the project aims and goals. In this paper we focus on the first phase, which aims at identifying, specifying and implementing the Cloud@Home building blocks and its core architecture, restricting the scope to private Cloud providers.

3.1 Actors

On the backend side, the Cloud@Home provider interfaces with Cloud providers and performs the brokering of their services. It has to deal with the different levels of service quality they are natively able to deliver. On the frontend side, the Cloud@Home provider has to allow the final users to access the resources in a uniform way, providing them with the required, sustainable QoS specified through the SLA process. In such a context, it is possible to identify three main actors involved in the management process: Users, Admins and Resource Owners.

A User interacts with the Cloud@Home provider in order to request resources along with the desired quality of service. Users are also provided with tools to negotiate the desired QoS and, at service provision time, to check that the promised QoS is actually being delivered.

A Cloud@Home Admin builds up and manages the Cloud@Home provider. The Admin is the manager of the infrastructure and, in particular, is in charge of the infrastructure activation, configuration and management. The Admin decides which services provided by the infrastructure must be activated/deactivated. Furthermore, in the case of QoS/SLA-enabled infrastructures and services, the Admin specifies the policies that have to be adopted to carry out the SLA negotiation process and the QoS enforcement.

A Resource Owner shares its resources with the system.
Besides private sharers, the category of Resource Owners also encompasses commercial offerers (e.g., mainstream IaaS Cloud providers). In other terms, a Resource Owner is a potential Cloud provider for Cloud@Home, even if some Resource Owners are not able to provide any standalone Cloud service. The role of Resource Owners can be classified and specialized into public contributors (i.e., well-known public Clouds automatically enrolled by the system) and volunteer contributors. Volunteer contributors are Resource Owners that voluntarily share their resources to build up a Cloud@Home provider. We can further categorize volunteer contributors as:

Private Clouds: standalone administrative domains, which may have their own QoS/SLA management and other resource facilities and services. They can be voluntarily involved in Cloud@Home according to their administrators' needs and wills, thus becoming contributors.

Individuals: anyone who wants to voluntarily share their own desktop, laptop, cluster or generic resource/device with a community.

In this paper we specifically focus on private Clouds, narrowing the issues and related solutions to this class of volunteer contributors. In other words, here we restrict the concept of volunteer to just private Cloud contributors.

Fig. 2: The system architecture.

3.2 A Modular Architecture

According to the scenario discussed above, in the following we identify the main blocks and modules of the Cloud@Home architecture, considering just private Clouds as Resource Owners. This includes the basic functionalities and mechanisms for implementing a Cloud@Home provider on top of private Cloud contributors. It can also be considered the core architecture, the starting point to extend and generalize when public Clouds and/or individuals are involved in Cloud@Home as Resource Owners. The main goal of the architecture is to address some of the issues raised above. The architecture offers a set of components that can be easily used to build up one's own Cloud brokering solution. Fig. 2 depicts the Cloud@Home core architecture: it is organized into modules, each composed of units providing specific functionalities, named components. Cloud@Home components are themselves delivered as resources hosted on Cloud providers. Following the separation of concerns principle, four main modules have been logically identified, grouping the main functionalities: the Resource Abstraction module, the SLA Management module, the Resource Management module and the Frontend module. As shown above, some of the main components are designed to deal with SLA issues.

The Resource Abstraction module hides the heterogeneity of resources (computing, storage and sensor elements) collected from Resource Owners and offers the User a uniform way to access them. It provides a layer of abstraction adopting a standard, implementation-agnostic representation. It also implements drivers in order to convert requests expressed in the intermediate representation into actual invocations on the interface of the Resource Owner. On top of the Resource Abstraction module, Cloud@Home provides tools for the management of the SLAs that have to be negotiated with the Users.
The definition of formal guarantees on the performance that the resources must deliver is achieved through the SLA Management module. Users can negotiate the quality level of the requested resources. The negotiation process relies on performance prediction and on statistics of the providers' historical

availability in order to assess the sustainability of Users' requests. Statistics are built from information collected on the actual performance and QoS recorded for the supplied resources. A mobile agent-based monitoring service is responsible for gathering those data.

The Resource Management module is the core of the system. It is in charge of the provision of resources and of the SLA enforcement. The most important functionalities provided by this module are related to resource management: in particular, the module is responsible for resource enrolment, discovery, allocation/re-allocation, and activation/deactivation. These activities are carried out in accordance with the SLA goals, applying the procedures defined in the SLA Management module.

Finally, the Frontend module acts as an interface to the IaaS for both the Admin and the Users. It just collects Admin and User requests and dispatches them to the appropriate system module. In terms of implementation, components are able to interact with each other through standardized, service-oriented interfaces. Such a choice enables flexible component deployment schemes. Pushing the virtualization paradigm to its limits, a single component can even be offered as a customized virtual machine hosted by any Cloud provider.

4 Resource Abstraction

Cloud@Home mainly acts as an intermediary, acquiring different types of infrastructural resources from different Resource Owners and delivering them to Users. To cope with resource and provider heterogeneity, a resource abstraction scheme has been devised. The Resource Abstraction module encompasses the components providing the abstraction and the necessary logic to map abstract resources onto real implementations. Heterogeneous resources lack uniformity in some specific characteristics, properties and aspects, as they consist of dissimilar or diverse elements.
Such differences can be broadly categorized according to three resource aspects:

Type - a possible classification could be based on the resource's intended function (i.e., the resource type), distinguishing among computing, sensor and storage resources. This specific aspect of resource heterogeneity has been investigated in [17]. In the present work, we specifically focus on computing resources, since we are mainly interested in describing the high-level mechanisms and solutions to deal with QoS, SLA and resource management in Cloud@Home. However, the proposed solutions do not depend on the type of resources and can be easily adapted to sensor and storage resources.

Hardware - resources can physically differ in their characteristics, ranging from the internal architecture (CPU, symmetric multiprocessor, shared memory, etc.) to the devices (number of cores, controllers, buses, disks, network interfaces, memories, etc.) and so on. Hardware heterogeneity issues of computing resources can be mainly addressed by virtualization.

Software - software environments can differ in operating systems, compilers, libraries, protocols, applications, etc. In the Cloud context, as above, computing resource heterogeneity is overcome through virtualization.

To cope with this complexity, we decided to adopt an implementation-agnostic representation and access interface for every class of resources, embracing current standardization efforts whenever possible. Another issue to adequately take into account is the delivery of resources from providers, for which the specific acquisition modality must be defined and reflected in the interface.
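The three heterogeneity aspects above (Type, Hardware, Software) could be captured in an implementation-agnostic resource descriptor along these lines. This is an illustrative sketch only: the field and class names are our own assumptions, not part of the Cloud@Home interface or of the OCCI model.

```python
from dataclasses import dataclass, field
from enum import Enum

class ResourceType(Enum):
    """The 'Type' aspect: intended function of the resource."""
    COMPUTING = "computing"
    STORAGE = "storage"
    SENSOR = "sensor"

@dataclass
class HardwareSpec:
    """The 'Hardware' aspect, normalized by virtualization."""
    cores: int
    ram_mb: int
    disk_gb: int

@dataclass
class SoftwareSpec:
    """The 'Software' aspect: OS and installed environment."""
    os: str
    packages: list = field(default_factory=list)

@dataclass
class ResourceDescriptor:
    """Provider-independent description of a shared resource."""
    rtype: ResourceType
    hardware: HardwareSpec
    software: SoftwareSpec

vm = ResourceDescriptor(
    ResourceType.COMPUTING,
    HardwareSpec(cores=4, ram_mb=8192, disk_gb=100),
    SoftwareSpec(os="linux", packages=["mpich"]),
)
print(vm.rtype.value)  # computing
```

A descriptor of this shape is what the abstraction layer would hand to the Provider Drivers, which then translate it into an engine-specific request.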

Based on the abstractions discussed above, a computing resource interface has been designed to enable access to the provisioned infrastructure. First of all, computing resources are defined in terms of basic blocks like Virtual Machines and Virtual Clusters. A reference standard has been chosen as the native interface to be supported for computing resource management: the OGF Open Cloud Computing Interface (OCCI, [30]). As for the delivery of resources, Cloud@Home currently defines two acquisition modalities for computing resources:

Charged - resources obtained from public Cloud providers at a certain cost or fee, providing guaranteed QoS levels;

Volunteer - resources obtained from Resource Owners that voluntarily support Cloud@Home. These might range from Cloud-enabled academic clusters, which deliver their resources for free (possibly subject to internal administration policies and restrictions), to laboratories providing single machines outside their working hours, to single desktops willing to contribute to the infrastructure. Such resources are intermittent but, in the case of private Cloud volunteer contributors, the availability is slotted, i.e., the private Cloud provider defines a time window, an interval, or more specifically a series of intervals, in which and when it is available as a Resource Owner. Otherwise, in the case of individuals, no guarantees are provided on the shared resources. In this paper, restricting the scope to just private Cloud volunteer contributors, we assume resources are provided specifying the availability time windows.

Besides the just-described computing resource interface, the Resource Abstraction module contains components to support the practical implementation of such an interface. These are the Provider Drivers, which implement tools for the acquisition of resources by enabling the interaction with the Resource Owners.
They receive OCCI-compliant resource requests from other components, convert them into the target provider's interface, perform the actual invocation on the Resource Owner and return the results, which are again converted into OCCI. In this way, it is possible to interact with several open and commercial Cloud platforms too, like Amazon EC2 and Rackspace. A generic OCCI driver is provided as well, so that Resource Owners whose infrastructure implements this interface are automatically able to interact with the higher-level services by directly exchanging OCCI-compliant messages and requests. In the current implementation, mainly focusing on private Cloud contributors, we have implemented drivers for the PerfCloud [12] and Clever [42] providers, two Cloud frameworks adopted in the context of the Cloud@Home project. PerfCloud is a solution for integrating the Cloud and Grid paradigms, an open problem that is attracting growing research interest [27, 43]. The PerfCloud approach aims at building an IaaS (Infrastructure as a Service) Cloud environment upon a Grid infrastructure, leasing virtual computing resources that usually (but not necessarily) are organized in virtual clusters. Clever builds a Cloud provider out of independent hosts through a set of peer-to-peer based services. The choice of a decentralized P2P infrastructure allows the framework to provide fault tolerance with respect to issues like host volatility. In case a new Cloud framework wants to join Cloud@Home, it ought to provide an OCCI-compliant interface or implement a driver for its own infrastructure.

5 Resource management

From a high-level point of view, a Cloud@Home provider is an intermediary (or broker) for the acquisition of resources from different Resource Owners. In this way, it delegates the

low-level management of the infrastructure needed to interface with such Resource Owners. Two important tasks of a Cloud@Home provider are therefore the search for Resource Owners (providers) and the acquisition of resources that eventually will be delivered to the final Users. This section describes the components implementing such tasks: the Registry and the Resource & QoS Manager.

5.1 Registry

The Registry component collects information on resource providers and on the way their offered resources can be accessed. It provides a simple interface through which resource providers can subscribe to Cloud@Home (i.e., decide to share their resources with it). At subscription time, resource providers must supply a Resource Provider Descriptor (RPD) file. As briefly shown in Listing 1, the file contains the following sections:

CloudEngine - identifies the Cloud solution adopted by the provider. This information is needed by Cloud@Home to set up the correct drivers. As discussed in Section 4, drivers for PerfCloud and Clever have been fully developed. Other engines must use the generic OCCI driver in order to interact with Cloud@Home.

WindowShare - contains the schedule of the time windows during which the resource provider is willing to share resources with Cloud@Home. This information is used when new user requests arrive, to filter out providers that are not willing to share their resources at the time requested by the User.

Security - contains the subsection(s) dedicated to the kind of credentials accepted by the provider. In the example proposed, the provider adopts the PerfCloud engine, which makes use of GSI credentials from the Globus Security Infrastructure [40]. In this case, a subset of the information contained in the PKI certificate is reported. In order to gain access, the user needs the credentials for the target Virtual Organization.

AccessPoint - contains the information needed to access the target provider (IP address and qualified names, in the example).
Listing 1: Resource Provider Descriptor

[..]
<Provider>
  <CloudEngine>PerfCloud</CloudEngine>
  <WindowShare>
    <Window>
      <Day>WorkingDay</Day>
      <From>6:00 PM CEST</From>
      <To>6:00 AM CEST</To>
    </Window>
    <Window>
      <Day>HolyDay</Day>
      <From>12:00 AM CEST</From>
      <To>12:00 PM CEST</To>
    </Window>
  </WindowShare>
  <Security>
    <GRIDAuthentication>
      <Issuer>O=Grid, OU=GlobusTest, OU=simpleCA-CloudAtHome, CN=Globus Simple CA</Issuer>

      <Subject>O=Grid, OU=GlobusTest, OU=simpleCA-CloudAtHome, CN=Globus Simple CA</Subject>
    </GRIDAuthentication>
  </Security>
  <AccessPoint>
    <IP> </IP>
    <Name></Name>
    <Name>Antares</Name>
  </AccessPoint>
</Provider>
[..]

At discovery time, i.e. when resources must be enrolled to satisfy a User's request, the Registry can be queried to retrieve the list of providers that are eligible to serve the request according to their WindowShare and Security settings.

5.2 Resource & QoS Manager

The Resource & QoS Manager (RQM) is a crucial component in the architecture, as it is responsible for acquiring the virtual resources from the providers and for ensuring that the negotiated QoS is being delivered. As shown in Fig. 3, the crosscutting tasks of the RQM require it to be able to interface with all the other subsystems. To this end, the RQM has been designed as an asynchronous event-based system, denoted as RQMCore, which reacts to requests coming from other components.

Fig. 3: The Resource & QoS Manager

The RQM core tasks are Request Management and SLA Enforcement. Request Management consists in the translation of User requests into actual resource selection and allocation. Resource requests that do not involve SLA negotiations are directly forwarded to the RQM by the Frontend. SLA-based resource requests are handed to the RQM by the SLA Manager (described in Section 6.1): the associated policies and procedures that must be used in SLA Enforcement are defined through the SLA Manager and stored in the RQM Core databases.
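The discovery-time filtering on WindowShare data described in Section 5.1 can be sketched as follows. The window and provider structures here are invented stand-ins for the XML descriptor of Listing 1:

```python
# Illustrative sketch of discovery-time filtering on WindowShare data.
# Providers are kept only if one of their shared time windows covers the
# hour at which the User's request must be served. Structures are invented.

def hour_in_window(hour, start, end):
    """True if 'hour' falls in a window; windows may wrap past midnight
    (e.g. 18 -> 6 covers the evening and the early morning)."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end

def eligible_providers(providers, requested_hour):
    """Keep only providers whose shared time windows cover the request."""
    return [p["name"] for p in providers
            if any(hour_in_window(requested_hour, s, e)
                   for (s, e) in p["windows"])]

providers = [
    {"name": "Antares", "windows": [(18, 6)]},   # evenings and nights
    {"name": "DayLab",  "windows": [(9, 17)]},   # office hours only
]
# A request arriving at 22:00 can only be served by Antares.
```

Handling windows that wrap around midnight is the only subtlety; the real Registry would additionally match the Security and Duration constraints.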

To fulfill these tasks, the RQM performs many activities. A workflow view of the activities and the interactions they entail is described in Section 8.2.2, while an overview from the RQM perspective is provided here. Activities related to Request Management include:

Provider Selection. The RQM can query the Registry to obtain the list of subscribed providers. The query can include filtering criteria on parameters specified in the Resource Provider Descriptor of Listing 1.

Resource acquisition. Once a suitable provider has been found, the Provider Drivers are set up to carry out resource acquisition.

Logging. The RQM logs all its operations and their status for further inspection and bookkeeping.

More complex activities involve cooperation between multiple modules and are oriented to the management of SLA-based requests and SLA Enforcement:

Availability guarantees. To provide availability guarantees, the RQM needs a forecast of the availability level of the resources shared by a provider. The availability of a provider resource can be described with the well-known equation:

ProviderAvailability = MTBF / (MTBF + MTTR)    (1)

where MTBF is the mean time between failures and MTTR is the mean time to repair of a single resource provided by the corresponding Cloud provider. To estimate the MTBF of a provider resource, the RQM first retrieves historical heartbeat provider data available from the monitoring subsystem, which is based on the Mobile Agents based GriD Architecture (MAGDA) described in Section 6.3. The historical data can be used to obtain a forecast of the provider MTBF by invoking the forecast service provided by the Cloud@Home Autonomic Service Engine (CHASE), a component described in Section 6.2. With regards to the MTTR, it can be defined as:

MTTR = T_fd + T_boot    (2)

where T_fd is the time required to detect a resource failure.
This is related to the rate at which the monitoring subsystem performs checks, which is bounded by the MAGDA timeout, a parameter that specifies how often MAGDA agents have to report on the resource status. T_boot is the time required for the system to boot up another virtual machine in substitution of the failed one. Such time depends both on the computing power of the virtual machine and on the complexity of the VM image. Again, the RQM can obtain a forecast for this value by feeding the forecast service in CHASE with historical boot time data (logs of virtual machine boot times) obtained from MAGDA. Once the MTBF and the MTTR are evaluated, the RQM is able to compute the provider availability through Equation 1.

Alert reaction. The RQM uses alerts generated by the monitoring subsystem to activate the SLA Enforcement process. The policies to be activated are expressed through simple triples [<parameter>, <condition>, <procedure>], which formalize the procedures that have to be triggered when a given parameter satisfies a certain condition. The policies offered to the administrator are represented in terms of a simple template, which enables the administrator to configure them: the responsibility for the correctness of the policies is up to the administrator. The current implementation uses a format akin to JSON.
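Putting Equations (1) and (2) together, the availability estimate can be sketched numerically. The figures below are invented; in Cloud@Home they would come from the CHASE forecast service fed with MAGDA monitoring history:

```python
# Numerical sketch of Equations (1) and (2). All figures are invented
# examples, not measurements from the actual system.

def mttr(t_failure_detection, t_boot):
    """Equation (2): failure detection delay plus VM boot time, in seconds."""
    return t_failure_detection + t_boot

def provider_availability(mtbf, mttr_value):
    """Equation (1): fraction of time the provider resource is up."""
    return mtbf / (mtbf + mttr_value)

# Suppose the MAGDA timeout bounds failure detection to 30 s, historical
# boot logs forecast a 90 s boot, and the forecast MTBF is 12 hours.
t_fd, t_boot = 30.0, 90.0
mtbf_s = 12 * 3600.0
avail = provider_availability(mtbf_s, mttr(t_fd, t_boot))
```

With these example numbers the estimated availability is a little above 99.7%, which would satisfy the 97% target used later in Listing 2.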

Let us describe an example use of policies. Through the MAGDA monitoring component, the system uses heartbeat messages to verify whether a node is alive. It stores the heartbeat information using two variables: HBfail, representing the number of failed heartbeats, and HBsuccess, representing the number of consecutive successful heartbeats. We wish to specify a policy that uses the heartbeat results to detect a machine crash and perform a restart. The policy states that if the number of failed heartbeats hits a certain threshold X, the machine must be restarted. However, random failed heartbeats may happen, for example because of a very short network unavailability, that are not symptoms of a crashed machine. To avoid the accumulation of randomly failed heartbeats, we reset the HBfail counter if there are at least Y consecutive successful heartbeats. The described policy can be specified as follows:

{Policy: [[Heartbeat,HBfail>X,restart],
          [Heartbeat,HBsuccess>Y,resetHBfail],
          [Heartbeat,HBsuccess>Y,resetHBsuccess]]}

where restart, resetHBfail and resetHBsuccess are identifiers of procedures that respectively restart the virtual machine (the first) and reset the heartbeat counters (the other two). The procedure definitions are collected in a database local to the RQM. When multiple policies are applicable to the current situation (i.e. all their conditions evaluate to true), all the policies are applied without any order. Again, the responsibility for verifying that this does not lead to the application of conflicting policies is up to the administrator.

Performance guarantees. When the SLA involves application-level performance parameters, the RQM can provide guarantees through predictions on the performance of a resource configuration, provided by the CHASE simulation-based performance prediction service.
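A toy interpreter for the [<parameter>, <condition>, <procedure>] triples above makes the enforcement semantics concrete. Thresholds, state variables and procedure bodies are illustrative; the real RQM stores procedures in its local database:

```python
# Toy interpreter for the heartbeat policy triples. Every policy whose
# condition holds on the current state fires; conditions are evaluated
# first, then all selected procedures run (no ordering is guaranteed,
# mirroring the behaviour described in the text). Names are illustrative.

X, Y = 3, 5   # example thresholds for failed / consecutive-success heartbeats

policy = [("Heartbeat", lambda s: s["HBfail"] > X, "restart"),
          ("Heartbeat", lambda s: s["HBsuccess"] > Y, "resetHBfail"),
          ("Heartbeat", lambda s: s["HBsuccess"] > Y, "resetHBsuccess")]

procedures = {
    "restart":        lambda s: s.update(restarted=True, HBfail=0),
    "resetHBfail":    lambda s: s.update(HBfail=0),
    "resetHBsuccess": lambda s: s.update(HBsuccess=0),
}

def enforce(state, policies):
    """Collect every applicable procedure, then apply them all."""
    to_fire = [proc for _param, cond, proc in policies if cond(state)]
    for proc in to_fire:
        procedures[proc](state)

state = {"HBfail": 4, "HBsuccess": 0, "restarted": False}
enforce(state, policy)   # 4 failed heartbeats > X -> the machine is restarted
```

Evaluating all conditions before running any procedure matches the "all applicable policies are applied" rule and keeps one procedure's side effects from masking another policy's trigger.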
To enable the predictions, the RQM must provide CHASE with a description of the application (included in the user request), the user QoS requirements (part of the SLA) and benchmark data of the provider machines (obtained through the monitoring subsystem).

6 Service Level Agreement management

One of the most critical Cloud-related issues is the management of the Service Level Agreement. This is particularly true for Cloud@Home, since it tries to mix the Cloud and the volunteer paradigms. The volunteer contribution in Cloud@Home dramatically complicates the SLA management task: the volatility of resources (resource providers can asynchronously join or leave the system without any message) has to be taken into account when enforcing QoS requirements. In Cloud@Home, the SLA Management module is in charge of the negotiation and the monitoring of SLAs, and collaborates with the RQM component for the enforcement of the QoS. The SLA Management module is composed of three components: the SLA Manager, the Autonomic Service Engine (CHASE) [34] and the Mobile Agent Based Grid Architecture (MAGDA) [6]. The features of these components are briefly discussed in the following.

6.1 SLA Manager

The SLA Manager is in charge of managing the SLA templates to be offered to Users. A User can refer to these templates to start the negotiation procedure with the SLA

Manager, which will eventually produce an SLA. In this case, resources (which are the negotiation object of the SLA) are virtualized and, in general, can be provided by several different Cloud providers enforcing different management policies: SLAs are then crucial to guarantee the quality of virtualized resources and services in such a heterogeneous environment. We recall that the resource context we are addressing is heterogeneous from different points of view. As discussed in Section 4, Cloud@Home aggregates resource providers that are by their nature heterogeneous: on the one hand the commercial providers, seeking to maximize their profit, and on the other the volunteer providers, that just share their underutilized resources. Secondly, the provided resources themselves are heterogeneous in terms of the computational power they are able to supply. The SLA management must take into account this heterogeneity, and enforce the appropriate strategy according to the nature of the providers and the resources involved. Given that resource availability is the only QoS parameter that Cloud@Home is currently addressing, when commercial providers are involved in the provision of resources the Cloud@Home SLA strategy will guarantee no more than what the providers' proprietary SLAs claim to guarantee, and will apply the very same penalties for unattended QoS. When, instead, providers voluntarily share resources, the SLA produced for a specific provision aims at just forecasting the minimum service level (again, in terms of resource availability) that Cloud@Home will likely be able to sustain. The client of volunteered resources is aware that the resources are volatile, and that Cloud@Home strives to guarantee the provision; should the provided service perform worse than the agreed minimum level, no penalty would be applied.
For volunteer scenarios, we are planning to develop an incentive mechanism (supported by the SLA framework itself) that rewards those providers that best guarantee the promised QoS. The basic principle is that the better the SLA is honoured, the more credits the provider gains. Credits can then be used by providers to acquire new resources within the federation. The SLA Manager adopts the WS-Agreement protocol [2] for the interactions. WS-Agreement compliant templates are used by the Users to specify the required quality level. An example of a template filled with the User's required functional and non-functional parameters is reported in the following:

Listing 2: Resource and Availability request in WS-Agreement

<ws:ServiceDescriptionTerm ws:Name="cluster REQUEST" ws:ServiceName="SET VARIABLE">
  <mod:Cluster xmlns:mod="">
    <Compute>
      <architecture>x86</architecture>
      <cpucores>4</cpucores>
      [...]
      <title>compute1</title>
    </Compute>
    <Compute>
      <architecture>x86</architecture>
      [...]
      <title>compute2</title>
    </Compute>
    [...]
  </mod:Cluster>
</ws:ServiceDescriptionTerm>
[...]
<wsag:GuaranteeTerm wsag:Name="Availability Cluster">
  <wsag:Variables>

    <wsag:Variable wsag:Name="NodeAvailability" wsag:Metric="ch:availability"/>
    <wsag:ServiceLevelObjective>97.0</wsag:ServiceLevelObjective>
    <wsag:Variable wsag:Name="Duration" wsag:Metric="ch:hours"/>
    <wsag:ServiceLevelObjective>8</wsag:ServiceLevelObjective>
  </wsag:Variables>
  [..]
</wsag:GuaranteeTerm>

In the ServiceDescriptionTerm section the features of the needed resource are expressed (in this case the User is asking for a cluster of two nodes). It is important to remark that the resource feature request complies with the OCCI specification format. At acquisition time, this information will be extracted from the SLA and used to make an explicit request to OCCI-compliant providers. As for the Duration parameter in the GuaranteeTerm section, it will be used at discovery time to filter out providers that do not share resources at the time, and for the duration, requested by the User. Finally, through the NodeAvailability parameter the User specifies the required QoS (non-functional parameters), which in this specific case is targeted at 97.0%.

6.2 CHASE

CHASE (Cloud@Home Autonomic Service Engine) is a framework that allows to add self-optimization capabilities to grid and Cloud systems. It evolves from an existing framework for the autonomic performance management of service-oriented architectures [13]. The engine allows to identify the best set of resources to be acquired and the best way to use them from the application point of view. CHASE is a modular framework that follows the autonomic computing paradigm, providing components to fully manage a grid/Cloud computing element. For the operation of CHASE as a stand-alone autonomic manager, the interested reader is invited to consult the work in [34]. The focus here is on the design of the services provided by the framework to support the operation of Cloud@Home, namely the forecast service and the performance prediction service. The forecast service provides a forecast of future values from historical data.
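As a rough sketch of what a forecast-from-history service of this kind can look like, the following fits a simple AR(1) model by least squares to a series of (invented) boot-time measurements; this is an illustration in the spirit of the CHASE forecast service, not the actual engine:

```python
# Minimal sketch of an autoregressive forecast over historical data.
# The AR(1) fit and the sample series are illustrative, not CHASE code.

def ar1_forecast(series):
    """One-step-ahead AR(1) forecast: x[t] - m = phi * (x[t-1] - m)."""
    m = sum(series) / len(series)
    centered = [x - m for x in series]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(a * a for a in centered[:-1])
    phi = num / den if den else 0.0   # least-squares AR(1) coefficient
    return m + phi * centered[-1]

boot_times = [92.0, 88.0, 95.0, 91.0, 90.0, 93.0]   # seconds, invented
next_boot = ar1_forecast(boot_times)
```

The same function could be fed with heartbeat-failure history to produce the MTBF forecast used in Equation (1).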
It takes as input a time series of values and produces a forecast based on autoregressive methods. The forecast service is used when the RQM needs to evaluate the provider availability, for which estimates of MTBF and MTTR are required. For the MTBF, historical data of heartbeat failures are used to produce the forecast. For the MTTR, historical boot times of the specific virtual machine image on a specific provider are used. The collection of historical data is performed through the MAGDA platform, described in the next subsection. The performance prediction service is a simulation-based estimator of application performance parameters, like execution time and resource usage. In particular, the CHASE simulator, fed with a) an application description, b) information regarding the current state of the system in terms of resource availability and load, and c) the user's requested QoS, builds a parameterized objective function to be optimized. The optimization engine drives the simulator to explore the space of possible configurations in order to find a configuration

that meets the demands. The performance prediction service is used during the negotiation to evaluate the sustainability of performance guarantees. It can also be invoked when the monitored QoS agreed in the SLA is at risk of violation. New simulations are run with up-to-date settings in order to search for alternative scheduling decisions (like migrating or adding more VMs) that can solve the QoS problem.

6.3 Mobile Agent based Application Monitoring

The MAGDA (Mobile Agent based Grid Architecture) component constantly carries out the monitoring of the QoS level provided by the leased resources. MAGDA [6] is a mobile agent platform implemented as an extension of JADE, a FIPA-standard [23] compliant agent platform developed by TILAB [9]. The MAGDA toolset allows to create an agent-enabled Cloud in which mobile agents are deployed on different virtual machines connected by a real or a virtual network. The details of how the MAGDA platform can interact with Cloud environments are discussed in past work [18]. The emphasis here will be on the description of the MAGDA-based monitoring service. This has been designed as a multi-agent system that distributes tasks among specialized agents. It contemplates both static and mobile agents: the former are responsible for performing complex reasoning on the knowledge base, so they are statically executed where the data reside; the latter usually need to move to the target resources in order to perform local measurements or to get system information. The Archiver is a static agent that configures the monitoring infrastructure, collects and stores measurements, and computes statistics. According to the parameters to be monitored, the kind of measurement and the provider technology, the Archiver starts different Meters, which are implemented as mobile agents that the Archiver can dispatch where needed. Observers periodically check a set of rules to detect critical situations.
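The Archiver/Meter division of labour just described might be sketched as follows, with plain objects standing in for the mobile agents; all names and the probe behaviour are illustrative, not the MAGDA API:

```python
# Sketch of the Archiver/Meter interaction for the heartbeat metric.
# In MAGDA these are agents exchanging messages; here plain objects stand
# in for them, and every name is invented for illustration.

class Archiver:
    """Static agent: collects measurements and computes simple statistics."""
    def __init__(self):
        self.samples = {}            # resource -> list of alive/dead booleans
    def report(self, resource, alive):
        self.samples.setdefault(resource, []).append(alive)
    def availability(self, resource):
        hits = self.samples.get(resource, [])
        return sum(hits) / len(hits) if hits else None

class Meter:
    """Mobile agent stub: probes one resource and reports to the Archiver."""
    def __init__(self, resource, probe, archiver):
        self.resource, self.probe, self.archiver = resource, probe, archiver
    def tick(self):
        self.archiver.report(self.resource, self.probe())

archiver = Archiver()
# A real Meter would ping the VM; here the probe fails once out of four.
responses = iter([True, True, False, True])
meter = Meter("vm-7", lambda: next(responses), archiver)
for _ in range(4):
    meter.tick()
observed = archiver.availability("vm-7")   # 3 of 4 heartbeats succeeded
```

An Observer in this picture would periodically read such statistics from the Archiver and raise the alerts consumed by the RQM's SLA Enforcement.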
They query the Archiver to know about the statistics and eventually notify applications if some checks have failed. Applications can use an Agent-bus service to subscribe for being alerted about each detected event. They can also invoke MAGDA services to start, stop or reconfigure the monitoring infrastructure. Finally, applications can access the complete knowledge base to retrieve information about the Cloud configuration, the monitoring configuration, statistics and the history of past failed checks. In the current prototype, the Meters are used to collect three kinds of metrics:

Image boot times: a Meter is configured to start up as soon as the MAGDA platform is loaded. This provides an estimate of the time required to boot the virtual machine.

Heartbeats: heartbeats are sent by the Meters to verify the liveness of the resource on which they are residing.

Benchmark figures: Meters are able to execute different kinds of benchmarks, which vary from simple local data sampling (actual CPU utilization, memory available, etc.) to distributed benchmarking (evaluating distributed data collection, or evaluating the global state with snapshot algorithms).

The MAGDA component poses a number of issues in terms of deployment. The agent platform, the Archiver and the bus service can be deployed as a dedicated virtual machine which coordinates all the available agents. MAGDA Meters, instead, must be installed inside the virtual machines of the user. A dedicated Java-based agent execution environment is installed and configured as a start-up service in the user VM. From the portal hosted in the Frontend, it is possible to manage (start, stop, migrate, etc.) the mobile agents in order to control monitoring or other functionalities on top of all the computational resources

hosting a MAGDA container. In the current implementation, if the user does not accept the installation of the agent platform and the roaming of mobile agents in its virtual machine, the system cannot provide the required monitoring and the associated services like health check and performance prediction.

7 Frontend

The main role of the Frontend within the architecture is to provide an access point to the system functionalities. The Frontend provides the reference links to all the components, thus operating as a glue entity for the architecture. It serves the incoming requests by triggering appropriate processes on the specific components in charge of serving such requests. In order to provide a user-friendly and comprehensive interface to the system, the Frontend was implemented as an extensible and customizable Web portal. Furthermore, in accordance with the everything-as-a-service philosophy, the developed Frontend component, like the other components, can be deployed as a virtual machine image, enabling the setup of an infrastructure access point with a modest amount of work (as shown in Section 8). More specifically, the Frontend component has to manage both the User and the Admin incoming requests. For this reason it has been split into two parts, as discussed in the following.

7.1 User Frontend

The main goal of the User Frontend is to provide access to the tools and the services implemented by the system in favor of the end users. Such tools are based on and involve lower-level functionalities and operations that are hidden from the users, masking the internal organization and resource provisioning model in a Cloud fashion. Moreover, ubiquity and fault tolerance have to be guaranteed, following a service-oriented provisioning model. For such a reason the User Frontend is implemented as a Web service, providing suitable SOAP and/or REST interfaces. The User Frontend exposes the main services allowing end users to access resources.
In case the provider implements guarantees on the resource provisioning, adequate services for negotiating the SLA and for monitoring the agreed QoS level on the resources are required. In such cases the Frontend pre-processes, splits and forwards the incoming requests to the corresponding SLA and QoS management components according to the specified requirements. More specifically, the User Frontend exposes the following functionalities:

resource negotiation - through which the User can trigger the resource negotiation process with the SLA Management module (in particular, downloading SLA templates, issuing SLA proposals, proposing SLA counter-offers);

resource management - allowing the User to manage (activate/deactivate) the leased resources obtained by the IaaS through the Resource Management module;

SLA monitoring - providing the User with tools for monitoring the status of the SLA regulating the resource provision, through a specific interface to the SLA Manager;

settings and preference management - allowing the User to customize the user interface with specific user settings and configurations.
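The Frontend's pre-process-and-forward role can be sketched as a small dispatcher that maps incoming operations onto the components behind it. The paths, component names and handler names below are invented for illustration, not the actual REST interface:

```python
# Hypothetical sketch of the User Frontend forwarding requests to the
# components in charge of serving them. Routes and handler names are
# invented for illustration.

def handle(route, registry):
    """Dispatch a request path to the component responsible for it."""
    component, action = registry.get(route, (None, None))
    if component is None:
        return 404, "unknown operation"
    return 200, f"{component}:{action}"

ROUTES = {
    "/sla/templates":   ("SLAManager", "download_templates"),
    "/sla/negotiate":   ("SLAManager", "start_negotiation"),
    "/resources/start": ("RQM", "activate_resource"),
    "/sla/status":      ("SLAManager", "monitor_sla"),
    "/preferences":     ("Frontend", "save_settings"),
}

status, body = handle("/sla/negotiate", ROUTES)
```

The table makes the split visible: negotiation and monitoring operations land on the SLA Manager, resource activation on the RQM, and interface settings stay in the Frontend itself.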

7.2 Admin Frontend

From the provider perspective, one of the most interesting characteristics of the system is to provide a service for building up a Cloud@Home provider and the corresponding Infrastructure-as-a-Service by selecting the basic components and services it has to provide. In this way an Admin can customize their service (i.e. the IaaS Cloud provisioning), can decide to either include SLA/QoS management or provide the infrastructure without any guarantee (best effort), and finally can specify where the components have to be deployed. The Admin Frontend implements the services to set up, customize and implement such choices, driving the Admin through all the required steps for a correct service establishment. More specifically, such services are:

system management - allows the administrator to set up, configure and deploy a Cloud@Home infrastructure and to customize all the services it provides, and therefore also the Frontend interface;

negotiation management - through which negotiation policies can be defined, fine-tuned and deployed;

QoS management - allows the administrator to specify the QoS policies that must be put in force to sustain the SLAs.

The basic components that need to be included in any provider configuration are those of the Resource Abstraction module. A minimal system configuration, building an IaaS with no QoS guarantees, must provide such tools and services, which just implement the basic system management mechanisms. In case the Admin needs to set up a Cloud@Home infrastructure providing QoS on top of resources, it is necessary to enhance the minimal configuration with the Resource and SLA management modules, thus deploying the whole configuration shown in Fig. 2. Different configurations can be identified to best fit and meet the Admin requirements, not necessarily involving all the components identified above.
It is important to remark that the component selection is not directly performed by the Admin, who just customizes the provisioning model they want to implement. The component selection is therefore automatically performed by the Frontend tools according to the requirements specified by the Admin. The deployment is instead mainly in charge of the Admin.

8 Cloud@Home in action

In the previous sections the system has been described from a static perspective. The stress has been on the architectural modules and components, and on the functionality that each analyzed entity is able to offer. The current section, instead, intends to give the reader a dynamic view of the system, focusing on the activities that are triggered within the system and on the interactions taking place among the components of the architecture.

8.1 Process View

Some activities must be carried out in order to create from scratch a new Cloud@Home system and to get it ready to work, as shown in Fig. 4. In the first stage (image retrieval) the overall

Fig. 4: Cloud@Home infrastructure set-up process

system image, made up of specific software components, must be retrieved. Several versions of the system are available, each providing a different flavor of services, ranging from the very basic to the full-fledged. As discussed in Section 7, the basic version is equipped with just the Resource Abstraction module. The User is aware that they will not get any guarantee on the performance of the resources provided, and cannot claim support in case of QoS degradation. Thus, the service will be provided on a best-effort basis. The full-fledged version of the system, instead, features the SLA/QoS management service. In the following we will discuss the activities to be carried out in the case that a full-fledged system must be set up. Once the image is retrieved, the system profiling activity is performed, during which the Admin specifies the policies and the strategies that will have to be adopted for SLA negotiation and enforcement, respectively. After that, the components can be deployed (system deployment), configured in order to properly work with each other (system configuration), and run (system boot-up). The system is then up and ready to accept User requests. The overall request management process (shown in Fig. 5) involves many activities, some of which are optional, in the sense that they can be either triggered or not, depending on the User's specific request. Such process starts with the SLA negotiation of the resource functional parameters and, optionally, of the QoS level (non-functional parameters) that the system must support at provision time. In the latter case, upon a successful termination of the negotiation activity, the User will receive a formal guarantee (in the form of an SLA) that the requested resources will be provisioned and, if required, the QoS levels will be sustained.
Should the negotiation fail, the request would simply be discarded and the User would have to issue a new request. Otherwise, upon a successful negotiation, the requested resources are activated and assigned to the User (resource delivery). From this point onwards, the User can use the resources. If the SLA also requested QoS, a monitoring infrastructure is set up to keep the performance of the delivered resources under constant control (resource monitoring): in such activity, the non-functional parameters contributing to the overall QoS (namely, the availability) are monitored to ensure that none of the SLA terms is being violated. Whenever any QoS parameter is about to be violated, a recovery action is triggered (QoS recovery): in this step countermeasures are taken in order to bring the QoS back to safe levels. Upon a successful recovery, the originally requested QoS is restored and the monitoring process, which had been temporarily put on stand-by, is resumed in order to detect new possible faults. Should instead the recovery fail, the resource provision would be stopped and the SLA would be terminated. Finally, unless the User or the system decides (for any reason) to prematurely force the termination, the resource provision

Fig. 5: Request management process

will end at the termination time specified in the SLA, and the resources will be released (termination).

8.2 Interaction view

This subsection describes in detail, through a practical approach, how to set up a Cloud@Home provider and how the latter negotiates and enforces QoS on top of resources voluntarily shared by the Cloud providers. In subsection 8.2.1, we first show how an Admin can build and set up the components that implement the Cloud@Home infrastructure according to the IaaS service they want to provide (with or without fault tolerance, QoS-SLA management, etc.). The QoS negotiation and enforcement dynamics are described in subsection 8.2.2 and subsection 8.2.3, respectively. As regards the described scenarios, the target resources taken into account are academic clusters hosting a Cloud middleware (e.g., Eucalyptus [28], Nimbus [29], PerfCloud [12], Clever [42], OpenStack [41]) able to provide Virtual Clusters (VCs). We also assume that such clusters have a reliable frontend during the provider availability time windows, whereas computing nodes are unreliable since, for example, they might crash due to power outages or they might periodically be unavailable for a while to perform reconfiguration or maintenance. The scenario we consider implements a Cloud@Home infrastructure able to satisfy User requests for VCs including specific QoS requirements. In particular, the QoS is expressed in terms of availability, i.e., the fraction of time (percentage) the node must be up and reachable with respect to a 24-hour base time, as specified in Section 5.2. In order to do that, we assume all the nodes belong to the same Cloud provider. Although Cloud@Home is able to pick resources from different providers, in this specific case all resources are acquired from the same provider. The VC images host the MAGDA agent-based monitoring service.

8.2.1 System Setup

The Admin is the actor in charge of the deployment and the setup of the Cloud@Home infrastructure.
As already pointed out in Section 3, Cloud@Home components are provided with service-oriented interfaces. Such a choice enables flexible deployment schemes. Inspired by the as-a-service paradigm, individual components (or groups of components) can be packaged together and run within virtual machines offered by any (even commercial) Cloud provider.




More information

Risk assessment-based decision support for the migration of applications to the Cloud

Risk assessment-based decision support for the migration of applications to the Cloud Institute of Architecture of Application Systems University of Stuttgart Universittsstrae 38 D 70569 Stuttgart Diplomarbeit Nr. 3538 Risk assessment-based decision support for the migration of applications

More information

Final Report. DFN-Project GRIDWELTEN: User Requirements and Environments for GRID-Computing

Final Report. DFN-Project GRIDWELTEN: User Requirements and Environments for GRID-Computing Final Report DFN-Project GRIDWELTEN: User Requirements and Environments for GRID-Computing 5/30/2003 Peggy Lindner 1, Thomas Beisel 1, Michael M. Resch 1, Toshiyuki Imamura 2, Roger Menday 3, Philipp Wieder

More information

State of art in the field of Adaptive Service Composition Monitoring and Management

State of art in the field of Adaptive Service Composition Monitoring and Management D5.1 Version: 0.7 Date: 2008-07-30 Author: UNITN Dissemination status: PU Document reference: D5.1 State of art in the field of Adaptive Service Composition Monitoring and Management Project acronym: COMPAS

More information

Analysis of the state of the art and defining the scope

Analysis of the state of the art and defining the scope Grant Agreement N FP7-318484 Title: Authors: Editor: Reviewers: Analysis of the state of the art and defining the scope Danilo Ardagna (POLIMI), Giuliano Casale (IMPERIAL), Ciprian Craciun (IEAT), Michele

More information

THÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Présentée et soutenue le 03/07/2014 par : Cheikhou THIAM

THÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Présentée et soutenue le 03/07/2014 par : Cheikhou THIAM THÈSE En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE Délivré par : l Université Toulouse 3 Paul Sabatier (UT3 Paul Sabatier) Présentée et soutenue le 03/07/2014 par : Cheikhou THIAM Anti

More information

Cyber Security and Reliability in a Digital Cloud

Cyber Security and Reliability in a Digital Cloud JANUARY 2013 REPORT OF THE DEFENSE SCIENCE BOARD TASK FORCE ON Cyber Security and Reliability in a Digital Cloud JANUARY 2013 Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics

More information

The HAS Architecture: A Highly Available and Scalable Cluster Architecture for Web Servers

The HAS Architecture: A Highly Available and Scalable Cluster Architecture for Web Servers The HAS Architecture: A Highly Available and Scalable Cluster Architecture for Web Servers Ibrahim Haddad A Thesis in the Department of Computer Science and Software Engineering Presented in Partial Fulfillment

More information

Scalability and Performance Management of Internet Applications in the Cloud

Scalability and Performance Management of Internet Applications in the Cloud Hasso-Plattner-Institut University of Potsdam Internet Technology and Systems Group Scalability and Performance Management of Internet Applications in the Cloud A thesis submitted for the degree of "Doktors

More information

Power System Control Centers: Past, Present, and Future

Power System Control Centers: Past, Present, and Future Power System Control Centers: Past, Present, and Future FELIX F. WU, FELLOW, IEEE, KHOSROW MOSLEHI, MEMBER, IEEE, AND ANJAN BOSE, FELLOW, IEEE Invited Paper In this paper, we review the functions and architectures

More information

By Nicholas R. Jennings and Stefan Bussmann

By Nicholas R. Jennings and Stefan Bussmann odern control systems must meet increasingly demanding requirements stemming from the need to cope with significant degrees of uncertainty, as well as with By Nicholas R. Jennings and Stefan Bussmann Mmore

More information

Priorities for Research on Current and Emerging Network Technologies

Priorities for Research on Current and Emerging Network Technologies 101010011011101100111001010111001001010101110101111010001100111 Priorities for Research on Current and Emerging Network Technologies About ENISA The European Network and Information Security Agency (ENISA)

More information

Design and Evaluation of a Wide-Area Event Notification Service

Design and Evaluation of a Wide-Area Event Notification Service Design and Evaluation of a Wide-Area Event Notification Service ANTONIO CARZANIGA University of Colorado at Boulder DAVID S. ROSENBLUM University of California, Irvine and ALEXANDER L. WOLF University

More information

Arbeitsberichte der Hochschule für Wirtschaft FHNW Nr. 28. Enterprise Architectures for Cloud Computing

Arbeitsberichte der Hochschule für Wirtschaft FHNW Nr. 28. Enterprise Architectures for Cloud Computing Arbeitsberichte der Hochschule für Wirtschaft FHNW Nr. 28 Enterprise Architectures for Cloud Computing Laura Aureli, Arianna Pierfranceschi, Holger Wache ISSN Nr. 1662-3266 (Print) Nr. 1662-3274 (Online)

More information

Service Composition in Open Agent Societies

Service Composition in Open Agent Societies Service Composition in Open Agent Societies 1 Service Composition in Open Agent Societies Agostino Poggi, Paola Turci, Michele Tomaiuolo Abstract Agentcities is a network of FIPA compliant agent platforms

More information

University: Andrés Terrasa Company: Salvador Ferris

University: Andrés Terrasa Company: Salvador Ferris POLYTECHNIC UNIVERSITY OF VALENCIA Faculty of Computer Science University: Andrés Terrasa Company: Salvador Ferris Move away from frequented tracks and walk through footpaths Pythagoras To my parents

More information

Deliverable D.A2a Business SLA Management

Deliverable D.A2a Business SLA Management Project no. FP7-216556 Instrument: Integrated Project (IP) Objective ICT-2007.1.2: Service and Software Architectures, Infrastructures and Engineering Deliverable D.A2a Business SLA Management Keywords:

More information

No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics

No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics Herodotos Herodotou Duke University Fei Dong Duke University Shivnath Babu Duke

More information



More information

Digital Forensic Trends and Future

Digital Forensic Trends and Future Digital Forensic Trends and Future Farhood Norouzizadeh Dezfoli, Ali Dehghantanha, Ramlan Mahmoud, Nor Fazlida Binti Mohd Sani, Farid Daryabar Faculty of Computer Science and Information Technology University

More information

Just in Time Clouds: Enabling Highly-Elastic Public Clouds over Low Scale Amortized Resources

Just in Time Clouds: Enabling Highly-Elastic Public Clouds over Low Scale Amortized Resources Just in Time Clouds: Enabling Highly-Elastic Public Clouds over Low Scale Amortized Resources Rostand Costa 1,2, Francisco Brasileiro 1 1 Federal University of Campina Grande Systems and Computing Department

More information