POLICY BASED MANAGEMENT OF MODULAR CLOUD STORAGE SYSTEMS


Master Thesis
Natural Science Faculty of the University of Basel
Department of Mathematics and Computer Science
Databases and Information Systems Group

Examiner: Prof. Heiko Schuldt
Supervisor: M.Sc. Filip-Martin Brinkmann

B.Sc. Florian Lindörfer

Acknowledgements

First, I want to thank my advisor Filip-Martin Brinkmann and Prof. Heiko Schuldt, who both supported my work and always provided helpful input during the last six months. Furthermore, I want to thank the other members of the DBIS group of the University of Basel. Some of them gave important input during my master thesis, and most of them supported me on my way towards it. I also thank all my colleagues who accompanied me through the courses and work at the University of Basel. Above all, Steven Rose and Ivan Giangreco deserve a lot of respect for discussing different problems and providing motivation in difficult times. It is difficult to list all the people who influenced my bachelor and master studies, and I hereby want to thank everyone who made this master thesis possible.

Abstract

Cloud storage systems are becoming increasingly common. However, selecting a system that fits one's requirements still requires expertise in storage systems. Additionally, it is not always possible to find a matching solution, since most providers only offer specialized solutions. Modular cloud storage systems are a new approach to overcome the latter problem: they make it possible to compose customized solutions from a set of modules. But this also brings the need for deeper expertise in the mechanisms of storage systems. In this thesis, we propose a policy system that reduces the complexity of getting from the requirements to a matching data management system. The policy system reduces the need for expertise about data management systems by letting the customer formulate his requirements as a high-level policy and then automatically producing a module composition and configuration that adapts to these requirements. Thus, the customer only deals with high-level properties of data management systems. To reach this goal, we developed a policy language for describing policies and modules, together with an implementation of a policy system that, given module descriptions, translates a policy into a system configuration. The resulting system configuration is then used to enforce a data management system. We evaluated our implementation based on a theoretical scenario.

Table of Contents

Acknowledgements
Abstract
1 Introduction: Terminology; Scope of Work; Scenario as a framework; Policy system; Policy translation
2 Scenario: Overview; System Components and their Requirements; Queue; Raw data; Forecast data; Archive and statistics; Improvement of Cloud Management through a Policy System
3 Related Work: Policy Language; Deployment and Management System for Policies; Modular Storage Cloud; UBStore; OSGi; Structure of UBStore; Service search; Configuration Problem
4 Policy System: From Requirements to System Configuration; Characterising modules; Requirement to policy; Policy to system configuration; Binding between Policy System and Modular Cloud Storage System; Providing module descriptions; Enforcing system configurations; Policy Language; Module description; Policy definition; Policy Translation; Assumptions; Problem analysis; Algorithm overview; First phase - service matching and filtering; Second phase - optimizing variables
5 Implementation: Implementation Environment; Binding between the Policy System and UBStore; Bundle Descriptor; Service Manager; Policy System; Policy translation; Policy Language; Policy language and according classes; Specific requirement and variable definition types; Mapping between Java objects and XML representation
6 Evaluation: Evaluation Preparation; Properties of data management systems; Modules for a modular cloud storage system; Policies; Evaluation Environment; Quality of Solutions; Setup; Results; Performance of the Policy System; Results
7 Discussion and Conclusion: Quality and Performance; Quality; Performance; Summary; Policy Language; 7.3 Properties of Cloud Storage Systems; Translation between Policy and Properties; Lessons Learned; OSGi; Real world scenarios; Time
Future Work: Properties of Cloud Storage Systems; Policy Language and Module Description; Mapping Requirements to Variables; Policy System; Policy Translation
Bibliography
A Appendix: A.1 Full XSD of the policy language; A.2 UML diagram of objects for policy, module and system configuration; A.3 Definition of modules used for evaluation; A.4 Policies used for the evaluation

1 Introduction

Cloud storage systems are becoming increasingly common. They promise large, scalable storage with simplified administration, little risk and low costs, since the customer only pays for what he uses. Most cloud storage providers offer specialized solutions for specific scenarios, and customers have to compare the properties of different solutions to decide on the one that fits their requirements best. Although a huge variety of cloud systems exists, it is likely that the customer's requirements are not perfectly matched. To overcome this problem, Kossmann et al. recently proposed Cloudy [1]. Cloudy is a modular cloud storage system that can be configured by the customer to meet his requirements. With this system, the customer avoids comparing and selecting among different solutions. But despite the possibility to compose a system that satisfies the requirements, the customer still has to examine different modules and their properties and cannot directly specify his requirements or high-level goals. For example, a customer may decide on a specific consistency protocol but cannot express that his high-level goal is the highest consistency that still allows high throughput.

Cloudy is not the only modular storage system currently being developed. For example, Jindal [2] proposed OctopusDB with the goal of creating a one-size-fits-all database. Similar to Cloudy, this system splits a database system into modules that are selected at runtime. But unlike a modular cloud storage system such as Cloudy, OctopusDB concentrates on the data storage part, while a modular cloud storage system spans from the low-level storage layer up to the access layer, which is commonly reachable over the internet. Thus, OctopusDB does not directly address a customer requiring a standalone solution, but can be seen as a possible part of a modular cloud storage system.
To overcome the problem of getting from the customer's requirements to a customized cloud storage system, we propose a policy-based management system for modular cloud storage systems (in short, "policy system"). The policy system allows the user to express his requirements independently of the modules available in the modular cloud storage system. Given a policy, the policy system composes the necessary modules and deploys them. We have focused on cloud storage systems, since only the dynamics and scalability of a cloud system offer the possibility to flexibly compose new systems. For example, in the case of a traditional data center of a single company, it is not easy to scale the hardware, and a large number of modules would be unnecessary, since there are only a few use cases.

1.1 Terminology

Since some terms may be misleading or need further clarification, the following list describes terms that are central and frequently used in this work:

customer: A customer can be a person or a company that requires a data management system. The customer has "real" requirements (in the sense of an idea or thought) that he wants a data storage system to fulfil. Thus, the requirements represent the interest of the customer. One goal of our work is that the customer need not be an expert in database systems, but can also be someone who only knows how to formulate his high-level requirements.

module: We think of a module as a functional part that provides and references services. The provided services offer functionality for the tasks the module is specialized in.

data management system: Instead of the misleading term "storage system", we use "data management system" in this work. A data management system describes a container with hardware and software for storing, managing and accessing data. A data management system can reside on a single node or be distributed over multiple nodes.

modular cloud storage system: A modular cloud storage system is the system that hosts the policy system and provides the ability to create new data management systems.

characteristics, properties and variables: In this work, we use the terms "properties" and "characteristics" for similar things. A data management system has characteristics that are expressed through the properties of the modules it consists of. On the technical layer, we also use the term "variable" instead of "property", since it expresses the dynamics during the configuration of the data management system.

system configuration: A system configuration provides information about a set of selected modules, the connections between them and the required properties.

1.2 Scope of Work

Our high-level goal in this thesis was to simplify and accelerate the process of getting from the need for a data management system to a solution that satisfies the requirements of the customer. We focused on the area of modular cloud storage systems, since those systems allow data management systems to be deployed in a dynamic and flexible way. In this thesis, we investigated how a policy system could help to fulfil our high-level goal and which components are needed. In this work, we define the following parts:

1. a concept for describing and defining modules

2. a policy language for describing policies and module descriptions
3. a system that translates a policy into an initial system configuration
4. a system that enforces a system configuration

Our policy system is embedded in a modular cloud storage system running in an online environment, as shown in figure 1.1. Our policy language is used to define a policy and module descriptions. The policy is then translated into a system configuration, using the module descriptions. For the evaluation, we implemented the policy language, the policy translation system and a binding for communicating with the modular cloud storage system.

Figure 1.1: Overview showing our contribution marked with a gray background. The customer formulates his requirements and afterwards transmits them to an online modular cloud storage system containing our policy system. Developers define modules for the system and, additionally, module descriptions that are used by the policy system to compose a system configuration. This system configuration is used to enforce the data management system required by the customer.

The process described in this work concentrates on the initial transmission of a policy and thus on an initial system configuration. As we discuss in this work, it is necessary to validate the policy during runtime, since the module descriptions only estimate the characteristics of the resulting data management system. The runtime monitoring is left for future work.

Scenario as a framework

Currently, in the area of data management systems, most systems are configured manually and result in specialized solutions for different use cases. Due to the huge variety of solutions, there is no uniform terminology for data management systems. The ACID properties (atomicity, consistency, isolation and durability) and the three properties of the CAP theorem (consistency, availability and partition tolerance) are well-known and well-defined terms, but for properties in the areas of performance, cloud systems (e.g. elasticity, scalability) or data access (e.g. data types, security, protocols), there are no general terms with corresponding measures. Furthermore, for most properties, their influence on other properties is not well studied in general. For example, it is not always clear how elasticity and scalability influence consistency and durability.

The ideal target of a modular cloud storage system would be to match as many different use cases as possible. Policy-based configuration also requires a uniform terminology for characterising the modules of the different use cases by their properties. Therefore, we decided to sketch a scenario with different use cases of data management systems, which is described in detail in chapter 2. The scenario was used as a guideline for a possible choice of modules and a set of properties that would be necessary to describe the characteristics and help to distinguish between the different use cases. By evaluating our implementation using the defined scenario, we showed that it is theoretically possible to compose a customized data management system, given high-level requirements. Future work will show whether this applies to real-world scenarios, too. Therefore, we discuss the evaluation in chapter 7 and highlight possible future work in chapter

Policy system

In chapter 4, we describe different approaches to designing a policy system and discuss our decisions and solution. As we describe in more detail in chapter 4, we split the process of getting from the need for a data management system to a solution into three parts: "requirements to policy", "describing modules" and "policy to system configuration".
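To make the terminology of section 1.1 and the three parts more concrete, the following Java sketch models module descriptions, a policy and a system configuration as plain records. All type and method names (`ModuleDescription`, `Policy`, `SystemConfiguration`, `PolicyTranslator.translate`) are hypothetical and do not reflect the classes of the implementation described in chapter 5. The translator is deliberately naive: it binds every required or referenced service to the first module that provides it, so it only illustrates the data flow from policy to system configuration, not the optimization described later.

```java
import java.util.*;

// Hypothetical module description: provided and referenced services plus
// estimated properties ("variables" on the technical layer).
record ModuleDescription(String name, Set<String> provides, Set<String> references,
                         Map<String, Double> properties) {}

// Hypothetical policy: services the customer needs and minimum property values.
record Policy(Set<String> requiredServices, Map<String, Double> minimumProperties) {}

// System configuration: selected modules, connections (service -> module)
// and the properties the resulting data management system must meet.
record SystemConfiguration(Set<String> selectedModules, Map<String, String> connections,
                           Map<String, Double> requiredProperties) {}

final class PolicyTranslator {
    // Naive "policy to system configuration" step: bind each required or
    // referenced service to the first module that provides it.
    static Optional<SystemConfiguration> translate(Policy policy, List<ModuleDescription> modules) {
        Deque<String> open = new ArrayDeque<>(policy.requiredServices());
        Map<String, String> connections = new TreeMap<>();
        Set<String> selected = new TreeSet<>();
        while (!open.isEmpty()) {
            String service = open.pop();
            if (connections.containsKey(service)) continue;      // already bound
            Optional<ModuleDescription> provider = modules.stream()
                .filter(m -> m.provides().contains(service)).findFirst();
            if (provider.isEmpty()) return Optional.empty();     // policy not satisfiable
            ModuleDescription m = provider.get();
            connections.put(service, m.name());
            selected.add(m.name());
            open.addAll(m.references());                         // resolve references, too
        }
        return Optional.of(new SystemConfiguration(selected, connections,
                                                   policy.minimumProperties()));
    }
}
```

The first-match choice is exactly what a real policy translation cannot afford: as soon as several modules provide the same service, the policy system has to decide which one best satisfies the policy.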
First, we describe and discuss the environment of the policy system. Afterwards, we describe the components of the policy system. Our main contribution is the concept of the full system and especially the algorithm for getting from a policy to a module composition and configuration. The implementation of our policy system is described in chapter 5. We based our implementation on the modular cloud storage system UBStore, which is currently being developed, and describe how to embed our policy system in this environment. In chapter 3, we describe UBStore to explain our decisions and extensions.

Policy translation

By translating the policy, given a set of module descriptions, into a module composition that can be configured, we define an optimization or configuration problem. As we describe in this work, we used a genetic algorithm to optimize system configurations. Therefore, in chapter 3 we describe the area of configuration problems and genetic algorithms and point out that we used well-established methods to develop a concept and a first prototype of our policy system.
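Why this is an optimization problem can be seen from the size of the search space. The sketch below assumes, hypothetically, that every service slot can be filled by any module providing that service and that modules violating a hard lower bound on a property are filtered out first, loosely in the spirit of the first phase of the policy translation ("service matching and filtering"). The names `Candidate`, `SearchSpace` and the single `availability` property are illustrative assumptions.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical candidate: a module that can fill a service slot, with one property.
record Candidate(String module, String service, double availability) {}

final class SearchSpace {
    // Matching and filtering: group candidates by service, drop those below the bound.
    static Map<String, List<Candidate>> matchAndFilter(List<Candidate> all, double minAvailability) {
        return all.stream()
            .filter(c -> c.availability() >= minAvailability)
            .collect(Collectors.groupingBy(Candidate::service, TreeMap::new, Collectors.toList()));
    }

    // Slots are filled independently, so the number of compositions is the
    // product of the candidate counts per required service (0 if one is missing).
    static long compositions(Map<String, List<Candidate>> slots, Set<String> requiredServices) {
        long count = 1;
        for (String service : requiredServices) {
            count *= slots.getOrDefault(service, List.of()).size();
        }
        return count;
    }
}
```

Even in this simplified form the count grows quickly: ten service slots with five candidates each already give 5^10 = 9,765,625 compositions before any variable is tuned, which is why exhaustive enumeration is impractical.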

2 Scenario

An appropriate scenario for developing and testing the policy system should be realistic, with different data and different use cases. Ideally, we would use different requirement specifications and the resulting solutions of some companies that trade in data. Since these specifications are key to the success of these companies, we do not have access to this information. Therefore, we decided to come up with a small scenario that includes gathering and delivering data. A weather service combines a processing pipeline and a delivery service. The processing pipeline generates the forecast from gathered data, while the delivery service serves different customers, who may receive updates at varying intervals. Therefore, the scenario offers different components, data types and requirements. For example, one container is necessary for storing gathered data in real time, while another container has to store the forecast data and provide access for customers, but is only updated twice a day with new forecasts. In the following sections, we give a detailed description of the scenario.

2.1 Overview

In our scenario, the weather service needs six data management systems and has three external components, as shown in figure 2.1. The first external component is a sensor network, which delivers data at frequent intervals. The second external component is the group of large customers, such as newspaper agencies or large online portals. Additionally, the weather service offers geo-specific data for private customers. During our work, our focus was on the definition of the different containers inside a cloud system. In our scenario, the weather service decided to outsource its data management systems to a cloud system. This has the advantage of lower maintenance costs and more flexible adaptation to future amounts of data.

[Figure 2.1 diagram: Cloud System containing Queue, Raw data, Forecast data, Archive, Statistics and User DB; external components Sensor Network, Large Customers and Private Customers]

Figure 2.1: System of the weather service with six examples of different data management systems. The six systems are embedded in a scenario that supports the definition of requirements.

2.2 System Components and their Requirements

Queue

A queue stores all data received from the sensor network of the weather service until it is validated in a preprocessing step. The queue must be able to store data periodically, but not for long. Since the sensor network is not able to store data, the queue requires high availability. High consistency is required to avoid duplicate preprocessing of the data.

Raw data

Most of the time, the data management system for raw data must be able to handle write accesses. When a new forecast is computed, it must be able to handle many read accesses. Furthermore, the container has to handle additional but small read accesses from some customers. The container may have to handle many read accesses at once, but consistency can be relaxed, since it is not important that every private customer has access to the data in real time.

Forecast data

The forecast data of the weather service is static content, since forecasts are not updated in real time. Therefore, the data management system for the forecast data mainly has to support read accesses from private users, large customers and the archive. Both the accesses from the large customers and those from the archive occur at fixed time intervals and are thus easy to support. Since the forecast data is the core business of the weather service, high availability is necessary.

Archive and statistics

The most important property of the archive is the ability to hold large data sets. There are no requirements for high availability. Computation of statistics and data transfers from the data management system for forecast data are infrequent tasks, which do not have special requirements. The important requirement of the archive is to support the OLAP requests needed for generating statistics. The statistics are stored in a simple system for small amounts of structured data.

2.3 Improvement of Cloud Management through a Policy System

With existing cloud storage solutions, the weather service would have to evaluate multiple suppliers and decide which suppliers have solutions that support its requirements and which solution fits best. A policy system could simplify the decision process, since, in the ideal case, the weather service defines its requirements and transmits them to different suppliers that automatically try to find a solution. The resulting system configurations would be created with the goal of matching the requirements. Therefore, the weather service would not have to evaluate multiple solutions and understand their characteristics, but would only need to know how to define its requirements. A modular cloud storage system without the policy system would also support customized solutions for the requirements of the customer (in this case the weather service), but the customer would still need to understand the characteristics of the different modules.
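The requirements of section 2.2 are, in essence, what a policy has to capture. The following sketch writes them down as data, using a hypothetical three-level scale (LOW, MEDIUM, HIGH) and hypothetical property names. The levels are an interpretation of the prose above, not values defined by the policy language of chapter 4, and the satisfaction check is a simplified illustration.

```java
import java.util.*;

// Hypothetical ordinal scale for high-level properties (LOW < MEDIUM < HIGH).
enum Level { LOW, MEDIUM, HIGH }

// Hypothetical high-level policy: a container name plus required property levels.
record ContainerPolicy(String container, Map<String, Level> requirements) {
    // Satisfied if every required level is reached; missing properties count as LOW.
    boolean isSatisfiedBy(Map<String, Level> offered) {
        return requirements.entrySet().stream().allMatch(e ->
            offered.getOrDefault(e.getKey(), Level.LOW).compareTo(e.getValue()) >= 0);
    }
}

final class WeatherServicePolicies {
    static List<ContainerPolicy> all() {
        return List.of(
            // Queue: sensors cannot buffer data, duplicate preprocessing must be avoided.
            new ContainerPolicy("Queue", Map.of("availability", Level.HIGH,
                                                "consistency", Level.HIGH)),
            // Raw data: write-heavy with read bursts, consistency can be relaxed.
            new ContainerPolicy("RawData", Map.of("writeThroughput", Level.HIGH,
                                                  "readThroughput", Level.HIGH,
                                                  "consistency", Level.LOW)),
            // Forecast data: read-mostly core business.
            new ContainerPolicy("ForecastData", Map.of("availability", Level.HIGH,
                                                       "readThroughput", Level.HIGH)),
            // Archive: large data sets and OLAP requests, no availability demand.
            new ContainerPolicy("Archive", Map.of("capacity", Level.HIGH,
                                                  "olapSupport", Level.HIGH)));
    }
}
```

Formulating requirements at this level is what section 2.3 asks of the customer; mapping such levels to concrete modules is left to the policy system.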

3 Related Work

The policy system together with the modular storage cloud can be divided into three parts: a policy language, a deployment and management system for policies, and the modular storage cloud itself. The policy language should be designed to describe abstract requirements for data management systems. The deployment and management system is responsible for the negotiation process, the translation from policies to module compositions and the verification during runtime. The modular storage system consists of a core that manages a set of nodes and deploys modules on them.

3.1 Policy Language

Policy languages and frameworks are an established research area. Most research focuses on privacy policies. Three of the most popular privacy or security policy languages are XACML [3], PONDER [4] and EPAL [5]. These languages mainly differ in their expressive power. For example, XACML is a declarative language that is designed for access control but has some general processing rules that would also apply to other use cases. There are also policy frameworks that can be flexibly adapted to different use cases. Rei [6] is a policy framework that was designed for security aspects but is very flexible, since it is based on OWL-Lite. APPEL [7] is designed to adapt to different use cases, but only deals with policies for how a system should automatically react to certain stimuli in certain situations. Therefore, it does not fit the scenario of expressing constraint-like properties for the configuration of a data management system. Another research area that deals with policies is web services. One of the most popular policy languages is WS-Policy [8]. WS-Policy is an XML-based policy language that uses assertions to specify guidelines for security, quality and version of service. A policy language for modular cloud storage would mainly consist of guidelines for quality requirements such as consistency, availability and throughput.
Therefore, the concept of assertions, as used in WS-Policy, could be suitable. Related to policy languages, there is also the research area of service level agreements. The languages used for these agreements are designed for quality of service and accounting. Service level agreement languages like WSLA [9] and WS-Agreement [10] mainly address business aspects of web services. But since they also deal with quality, it might be possible to adapt some concepts to modular cloud storage. In the area of data management systems, there is only little work related to policy systems. Traditional data management systems are large containers that might be distributed, but mostly have very strict mechanisms for distribution and configuration. Large databases for mass-consumption applications bring the need for adaptive systems that provide access that is as fast as possible. For example, Kadambi et al. recently presented a policy system for specifying replication constraints for selective replication [11]. This approach shows that there are policies concerning data management systems, but they operate on the level of the data rather than on the level of the characteristics, where we want to place our policy language. Currently, there are no policy languages that directly address the composition of storage systems. So far, the automatic composition of data management systems is not well studied. The need for such systems arose only with cloud environments, where storage as a service is offered to different customers. As we mentioned in the introduction, there is no uniform terminology for properties of data management systems. Additionally, one of our goals was to simplify the selection of a data management solution, which also includes giving the customer the ability to express his requirements instead of having to evaluate the modules of the data management system. Therefore, we decided to take existing policy languages and frameworks as examples, but to develop our own compact and basic policy language. This approach has the advantage of adapting better to evolving terminologies and requirements during the analysis of how to describe data management systems.
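The idea of assertions can be illustrated without any XML: an assertion is a named constraint on one property, and a policy alternative holds if all of its assertions hold. The sketch below loosely mirrors the WS-Policy normal form, in which a policy is an `wsp:ExactlyOne` over alternatives and each alternative is an `wsp:All` over assertions. It is a simplification for illustration, not the WS-Policy data model, and all names (`Assertion`, `AssertionPolicy`, the property keys) are hypothetical.

```java
import java.util.*;
import java.util.function.DoublePredicate;

// Hypothetical assertion: a named constraint on one numeric property.
record Assertion(String property, String description, DoublePredicate test) {
    boolean holds(Map<String, Double> properties) {
        Double value = properties.get(property);
        return value != null && test.test(value);   // a missing property fails
    }
}

// Loosely mirrors the WS-Policy normal form ExactlyOne { All { assertions }, ... }:
// the policy is met if at least one alternative has all assertions satisfied.
record AssertionPolicy(List<List<Assertion>> alternatives) {
    Optional<Integer> firstSatisfiedAlternative(Map<String, Double> properties) {
        for (int i = 0; i < alternatives.size(); i++) {
            if (alternatives.get(i).stream().allMatch(a -> a.holds(properties))) {
                return Optional.of(i);
            }
        }
        return Optional.empty();
    }
}
```

For quality requirements such as consistency, availability and throughput, each assertion reduces to a comparison on an estimated property, which is the kind of compact construct a basic policy language for data management systems needs.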
3.2 Deployment and Management System for Policies

Most policy languages and frameworks come with systems for deploying and processing the policies. The associated systems for the policy languages and frameworks discussed in the previous section are well studied and established. Due to the novelty of modular cloud storage systems, there is no related work on how to manage policies in such systems. But since the deployment and management of policies do not vary much between scenarios, our main focus was on the processing of the policy, or more precisely, the translation process. Since the time for a master thesis is limited, the design of the deployment and management process is deferred to future work. We also assumed that it would be easier to define a deployment and management system once the requirements of the policy language and the connection to the modular cloud storage are defined.

3.3 Modular Storage Cloud

The goal of our work was to provide not only the theoretical policy system, but also an implementation for evaluation purposes. Currently, the only existing modular storage cloud is Cloudy, which we mentioned previously. But there is further research that deals with the modular composition of database systems, such as i6db [12] or DBNet [13]. The combination of these systems together with Cloudy gave us some guidelines for possible modules and compositions. Furthermore, there is ongoing research related to cloud computing that studies the live migration of storage systems. For example, Zephyr [14] is a system that allows the live migration of databases. Thus, we could build on current research to anticipate the capabilities of future modular cloud storage systems, whose development was not part of this master thesis.

3.4 UBStore

As described previously, this thesis builds on the currently evolving area of modular cloud storage systems. UBStore, developed by the Databases and Information Systems Group (DBIS) of the University of Basel, is another modular cloud storage system that is currently in development. While the concepts of this thesis are designed to fit modular cloud storage systems in general, we paid attention to compatibility with UBStore. The implementation of the prototype uses and extends UBStore. UBStore aims to provide a modular framework and platform for the implementation and evaluation of cloud storage systems (in our case, data management systems). The modular structure should provide fast and flexible composition to support further research. As figure 3.1 shows, UBStore (implemented in Java) uses OSGi, which provides a module system for the Java platform. Data management systems run in the environment of UBStore and can select services using the mechanisms of OSGi.

Figure 3.1: The implementation of the policy system developed for this thesis is based on UBStore and OSGi. The layers show the dependencies of a data management system running in UBStore. The data management system uses OSGi service mechanisms for searching and receiving services from UBStore.

OSGi

To understand UBStore, one has to understand OSGi. UBStore is based on OSGi, since OSGi is a module system and a service platform with a dynamic component model.
OSGi is a standard for the Java platform and provides an environment for modular applications. A module is a bundle that contains classes, jars and configuration files that explicitly describe its dependencies. Figure 3.2 shows the layers and components of OSGi that are important for our use case; they are explained in the following descriptions:

Bundles: Bundles are jars that contain the implementation of the "module" together with additional manifest headers that describe the bundle and its dependencies.

Services: The service layer dynamically connects services. Services are parts of a bundle and are usually defined using Java interfaces. Commonly, three parts are involved in a service binding: service definition, service consumer and service provider. The service definition is a class that defines the service (usually a Java interface). The service consumer is a class that requires an implementation of the service definition, and the service provider is that implementation (also a class).

Service Registry: The service registry defines APIs for registering, unregistering and finding services. OSGi additionally provides a ServiceTracker utility that notifies about state changes of services (e.g. registered, unregistered).

Life Cycle: The life cycle layer provides an API for the management of bundles. Bundles can be installed, started, stopped, updated and uninstalled. After a bundle is installed, it is in the state INSTALLED; once its dependencies are resolved, its state is set to RESOLVED. A RESOLVED bundle can be started (STARTING) and is afterwards ACTIVE.

Modules: The module layer defines encapsulation and dependencies. This primarily concerns how a bundle can import and export code.

JVM: An OSGi implementation runs on top of the Java Virtual Machine (JVM), which defines the platform the application is running on.

Figure 3.2: Important layers of the OSGi model. Bundles are parts of the application running in the OSGi environment. Bundles are modules that can have multiple service definitions, consumers and providers, which indirectly define the connections between bundles.
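The life cycle described above is a small state machine. The following sketch models the bundle states named in the OSGi specification (INSTALLED, RESOLVED, STARTING, ACTIVE, STOPPING, UNINSTALLED) with a simplified set of transitions. It is an illustration only: the real framework exposes these states as integer constants on `org.osgi.framework.Bundle` and performs the transitions itself, and details such as lazy activation are omitted. `BundleState` and `BundleLifeCycle` are hypothetical names.

```java
import java.util.*;

// Simplified OSGi bundle life cycle; illustration only, not the framework API.
enum BundleState {
    INSTALLED, RESOLVED, STARTING, ACTIVE, STOPPING, UNINSTALLED;

    // Successor states allowed in this simplified model.
    Set<BundleState> next() {
        return switch (this) {
            case INSTALLED -> EnumSet.of(RESOLVED, UNINSTALLED);           // resolve or uninstall
            case RESOLVED -> EnumSet.of(STARTING, INSTALLED, UNINSTALLED); // start, update, uninstall
            case STARTING -> EnumSet.of(ACTIVE, STOPPING);                 // activator succeeds or fails
            case ACTIVE -> EnumSet.of(STOPPING);                           // stop (also before uninstall)
            case STOPPING -> EnumSet.of(RESOLVED);                         // stopped bundles stay resolved
            case UNINSTALLED -> EnumSet.noneOf(BundleState.class);         // terminal state
        };
    }
}

final class BundleLifeCycle {
    private BundleState state = BundleState.INSTALLED;

    BundleState state() { return state; }

    // Move to the target state, or fail if the simplified model forbids it.
    void transition(BundleState target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " not allowed");
        }
        state = target;
    }
}
```

Note that an ACTIVE bundle cannot jump directly to UNINSTALLED in this model; uninstalling an active bundle first stops it.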
As described above, the service layer and the service registry only define APIs for connecting services. This means, for example, that a bundle has to include code that registers its services on startup, which makes registration very flexible at runtime. But it also means that there is no explicit definition of which services a class provides and references, which can lead to high complexity.
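The three roles of a service binding and the notification idea behind a service tracker can be shown with a deliberately tiny, in-memory registry. This is a hypothetical sketch for illustration, not the OSGi API; in a real framework the corresponding calls are `BundleContext.registerService`, `BundleContext.getServiceReference` and the `ServiceTracker` class. The sketch copies one OSGi rule: when several providers are registered for the same service definition, the lookup returns the one with the highest ranking and, on a tie, the one registered first.

```java
import java.util.*;
import java.util.function.Consumer;

// Service definition: a Java interface, as is usual in OSGi.
interface KeyValueStore { String get(String key); }

// Hypothetical in-memory registry; NOT the OSGi API.
final class ToyRegistry {
    private record Entry(Class<?> type, Object provider, int ranking, long id) {}

    private final List<Entry> entries = new ArrayList<>();
    private final List<Consumer<Object>> listeners = new ArrayList<>();
    private long nextId = 0;

    // Provider side: typically called from the start-up code of a bundle.
    <S> void register(Class<S> type, S provider, int ranking) {
        entries.add(new Entry(type, provider, ranking, nextId++));
        listeners.forEach(l -> l.accept(provider));   // tracker-like notification
    }

    void unregister(Object provider) {
        entries.removeIf(e -> e.provider() == provider);
    }

    // Consumer side: highest ranking wins, ties go to the earliest registration.
    <S> Optional<S> lookup(Class<S> type) {
        return entries.stream()
            .filter(e -> e.type() == type)
            .sorted(Comparator.comparingInt(Entry::ranking).reversed()
                              .thenComparingLong(Entry::id))
            .map(e -> type.cast(e.provider()))
            .findFirst();
    }

    void onRegister(Consumer<Object> listener) { listeners.add(listener); }
}
```

Because the consumer only asks for `KeyValueStore.class`, it cannot influence which provider it receives; this is the limitation that the service search of UBStore has to deal with.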

Components

Another possibility is declarative services, which avoid coded connections by defining components in XML files. A component is a class together with its provided and referenced services. Furthermore, one can define the cardinality of connections and the methods that are called to bind a service provider. One advantage of declarative services is the automatic service connection (registering, finding, ...) based on the XML definitions. Thus, they provide a way of redefining service bindings without changing the implementation and avoid start-up code (e.g. for registering services).

Structure of UBStore

UBStore is designed as a modular system that can be deployed on several nodes. The instances on different nodes have the same basic services, which provide functionality for module management and communication between nodes. The UBStore core is implemented as an OSGi bundle that also defines the basic services, which are bound using declarative services. The core mainly consists of a class Controller that provides itself as a service and references the basic services. In the following, we describe these basic services, which can also be seen in figure 3.3:

Controller: The Controller is the central class of the UBStore management.
Bootstrapper: The Bootstrapper service is called on start-up.
Log: The Log service is used as a centralized logger for UBStore and other modules.
Router: The Router service maps data to the nodes UBStore is running on.
Shepherd: The Shepherd service is used for managing and querying other UBStore instances.
Remote: The Remote service defines an RMI interface to other UBStore instances.

Service search

For our work, it is important that UBStore builds directly on the service selection mechanisms of OSGi and thus does not support manual service selection. If the OSGi service registry is asked for a service provider for a given service definition, it selects this provider automatically.
If we require a specific service, we have to use service trackers, which observe all service providers and thus allow us to find, for example, a specific implementation class. For our policy system, this means that we have to provide mechanisms that build on service trackers for obtaining the selected services. Furthermore, the policy system has to provide mechanisms that allow the modules of the data management system to obtain the selected services (see figure 3.4). The idea behind UBStore is that each instance runs the same module composition and thus forms a homogeneous distributed system. We can build on this mechanism and thus only need to handle the policy on one node (instance). Since it is not yet clear how this mechanism will finally work, we concentrated on the deployment on one instance.
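Since the registry always picks the provider itself, the policy system needs a layer that remembers which provider the system configuration selected. The sketch below shows one hypothetical way to do this: a tracker-like component records every provider it is notified about, keyed by implementation class, and modules ask it for the selected provider instead of asking the registry directly. All names (`SelectedServices`, `providerAdded`, `ReplicationProtocol`, `Paxos`, `PrimaryBackup`) are illustrative and not taken from UBStore.

```java
import java.util.*;

// Service definition used by modules of a data management system (hypothetical).
interface ReplicationProtocol { String name(); }

// Two competing providers of the same service definition (hypothetical).
final class Paxos implements ReplicationProtocol { public String name() { return "paxos"; } }
final class PrimaryBackup implements ReplicationProtocol { public String name() { return "primary-backup"; } }

// Hypothetical lookup that returns the provider chosen by the system
// configuration instead of the one the service registry would pick.
final class SelectedServices {
    // Providers seen so far: service definition -> (implementation class -> provider).
    private final Map<Class<?>, Map<Class<?>, Object>> tracked = new HashMap<>();
    // Selection from the system configuration: service definition -> implementation class.
    private final Map<Class<?>, Class<?>> selection;

    SelectedServices(Map<Class<?>, Class<?>> selection) { this.selection = Map.copyOf(selection); }

    // Tracker callback: a provider was registered.
    <S> void providerAdded(Class<S> type, S provider) {
        tracked.computeIfAbsent(type, t -> new HashMap<>()).put(provider.getClass(), provider);
    }

    // Tracker callback: a provider was unregistered.
    <S> void providerRemoved(Class<S> type, S provider) {
        Map<Class<?>, Object> byClass = tracked.get(type);
        if (byClass != null) byClass.remove(provider.getClass());
    }

    // Modules ask for the definition and receive exactly the selected provider, if present.
    <S> Optional<S> get(Class<S> type) {
        Class<?> wanted = selection.get(type);
        if (wanted == null) return Optional.empty();   // service not part of the configuration
        Object provider = tracked.getOrDefault(type, Map.of()).get(wanted);
        return Optional.ofNullable(provider).map(type::cast);
    }
}
```

If the selected provider disappears, the lookup returns nothing rather than silently falling back to another implementation, so the deviation from the system configuration stays visible.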

Figure 3.3: The UBStore core together with its basic modules. Each instance of UBStore runs these modules, which provide basic functionality for management and communication between nodes.

Figure 3.4: The policy system needs to extend UBStore with an alternative mechanism to obtain services.

3.5 Configuration Problem

As mentioned previously, our policy translation is an optimization problem. Optimization is a large and long-established research area. Numerous papers present a wide variety of algorithms for different types of problems, as the survey by Archetti et al. [15] from … shows. Optimization problems are still not solved in general today, but for uncertain environments, evolutionary algorithms have been widely used. Even within this sub-area, numerous techniques exist, as described in the survey by Jin et al. [16]. Our optimization problem is to find an optimized system configuration, which consists of modules and of properties that can be computed arbitrarily. As we explain in chapter 4, we make no assumptions about the function space and thus about the environment of the optimization problem. This suggests the use of an evolutionary algorithm. Since the policy translation is only one part of our work and our current primary target is to show that such a system is feasible, we decided not to investigate the selection of the algorithm in depth, but used the well explored and widely used genetic algorithm. Goldberg [17] describes the genetic algorithm and its applications, which shows how well established the knowledge in this area is. Furthermore, as we describe later, we use a reward (maximization) and costs (minimization) to rate and optimize a system configuration. In the area of optimization, this is known as multi-objective optimization, for which a large body of knowledge exists. But again, finding a one-size-fits-all approach for combining the measures of different objectives is still an open problem, as described by Konak et al. [18]. For our work, we decided to use a basic and robust weighted sum approach for calculating the fitness, as we explain later.
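As a minimal sketch of the weighted sum approach, assuming that a system configuration has already been rated with a single reward value and a single cost value, the two objectives can be collapsed into one fitness value that grows with the reward and shrinks with the cost. The names `Rating` and `WeightedSumFitness` and the linear formula `rewardWeight * reward - costWeight * cost` are assumptions for illustration; the concrete fitness calculation used in this work is the one explained later.

```java
// Hypothetical rating of one candidate system configuration.
record Rating(double reward, double cost) {}

final class WeightedSumFitness {
    private final double rewardWeight;
    private final double costWeight;

    WeightedSumFitness(double rewardWeight, double costWeight) {
        if (rewardWeight < 0 || costWeight < 0) {
            throw new IllegalArgumentException("weights must be non-negative");
        }
        this.rewardWeight = rewardWeight;
        this.costWeight = costWeight;
    }

    // Collapses both objectives into one scalar to be maximized:
    // a higher reward increases the fitness, a higher cost decreases it.
    double fitness(Rating r) {
        return rewardWeight * r.reward() - costWeight * r.cost();
    }
}
```

A genetic algorithm would evaluate this function for every individual of a population and prefer individuals with a higher fitness during selection; the weights express how strongly costs count against the reward.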

21 4 Policy System

This chapter describes the concepts developed during this thesis. First, in section 4.1, we describe and discuss the environment of the policy system and give a rough overview of the process the policy system supports. In the following sections, we describe the concepts of the components of the policy system in more detail. Section 4.2 describes the components necessary to communicate with the modular cloud storage system. Section 4.3 describes our policy language, which we use to formulate policies and module descriptions. Section 4.4 describes the components that find a matching solution for a given policy, which is our main contribution.

4.1 From Requirements to System Configuration

For this work, we assumed a simple use case. As shown in the second case of figure 4.1, we have on one side the customer with his requirements and on the other side the modular cloud storage system with its choice of modules. In the simplest case, the customer would take the set of modules and compose them himself. If the system offers many low-level modules, the customer has to spend a lot of time evaluating them. If the modular cloud storage provider or a third party provides experts, the result depends crucially on whether the experts succeed in translating the requirements into a system configuration. With a policy system (third case in figure 4.1), we avoid that the customer has to evaluate the modules himself and also reduce the risk in the translation process. The translation from the customer's requirements to a system configuration can be viewed as three separate parts. First, the modular cloud storage system is responsible for providing information on the characteristics of the modules in a form the policy system understands. For example, this information can be provided by developers or gathered by automatically evaluating the modules.
Second, the customer has to translate his requirements into a policy that the policy system can understand. Ideally, the customer needs no external help for writing the policy. This might be impossible, but since the customer does not have to deal with the modular cloud storage system itself, we can accommodate the customer by defining the policy language on a high and abstract level. The third part of the translation is the matching between the policy and the information about the modules.

Figure 4.1: Three possibilities of getting from the requirements (in the sense of a thought) to a data management system. The first approach is the one currently used and requires the customer to compare different solutions before selecting one, which can be time-consuming and may not result in a good solution. The second approach uses a modular cloud storage system but has the problem of comparing different modules before composing a solution by hand. This approach requires deep expertise in data management systems but offers a customized solution. The third approach uses our policy-based composition of the modules in the modular cloud storage system. Thus, the customer receives a customized solution without deep expertise in the mechanisms of data management systems.

Characterising modules

Definition of a module

Modules can be defined on different layers of abstraction. On the lowest level, we find modules that are defined by the programming language of the system; in the case of Java, these would be classes. On a much higher level, modules could be whole systems. For example, one could define an RDBMS or a key-value store as a module. Those modules could be extended with modules that define different hardware configurations or different access interfaces, like HTTP or more specialized protocols. Taking classes as modules has the main problem that in many cases a single class does not provide a functionality on its own. For example, a locking protocol provides its functionality only together with the right implementation of a lock. Defining modules as whole systems with only a few modules as extensions would have the advantage of a simple configuration problem, but it conflicts with our goal to provide high customizability. For our system, we decided to define modules as units that provide and reference some functionality (services) and have properties that might be changeable (e.g. attributes). A module can reference or provide none, one or multiple services. Thus, on the implementation level, a module can be seen as a class that provides some functionality, together with classes that contain data or characterise the functionality in more detail. On the other hand, the definition also allows more abstract modules, like a whole master-slave system with all its characteristics. Since we did not specify any direct relation between modules and classes, different interpretations of modules are possible. For example, one could see the provided services as tags. From this perspective, the tags signal that the module is able to provide information about a certain topic or keyword.

Describing modules

As described previously, module descriptions could be generated automatically or written by developers. Automatically testing modules and thereby identifying the influenced properties could avoid some problems. Automated tests could evaluate non-functional properties like performance or availability in more detail than a developer. Automatic tests would also allow more frequent changes of modules, since the re-evaluation would not depend on knowledge transfer between developers. But automatic evaluation might also cause problems. For example, an automatic evaluation system might relate some modules to unrelated properties due to statistical noise. The definition of modules by developers always carries the risk of inaccurate information. This might be caused by inaccurate knowledge, missing guidelines between developers on how to choose measures, or even malicious intent. Nonetheless, we decided to use this manual approach of describing modules.
We made this decision to keep the complexity of the whole work low and because the translation from policy to system configuration can be viewed as a problem independent of the description process. Furthermore, since our current evaluations take place in a theoretical environment, the problems of the manual approach are reduced; in particular, the problems of malicious intent and missing agreements can be eliminated.

Requirement to policy

One of our primary goals was to reduce the time needed to get from requirements to a solution. A policy system can hide complex structures and relations in a system and thus reduces the complexity of the decision problem. Nevertheless, a user of a policy system requires knowledge about the solution he is specifying requirements for. It remains to be investigated how deep and specialized this knowledge should be. In our case, the less knowledge we demand about data management systems, the more complex the task of finding a solution to a policy becomes. On the other hand, the more details the user can express in the policy, the greater the risk that the user defines specialized requirements, and the more difficult it becomes to understand the user's high-level goal. The advantage of having the user define high-level goals is the ability to update the modular cloud storage system frequently and thus to optimize the system continuously. If the user defined specialized requirements, those requirements would likely depend on specific modules. Then, after an update of the modular cloud storage system, the policy system would likely not be able to find a new solution that matches the requirements. With high-level goals, in contrast, the policy system can re-evaluate the policy and find another solution, since high-level goals specify the overall properties of the whole system independently of its inner structure. For our work, we assumed a customer who at least knows the general properties of a data management system. This primarily includes the ACID properties and consistency, availability and partition tolerance from the CAP theorem, but also other properties that can be defined globally. More precisely, we assume that the customer sees the required solution as a black box that shows the required characteristics, specified as a set of requirements in the form of a policy. In this case, the provider can introduce new modules, change existing modules or even remove modules with only a small risk of destroying the ability of the policy system to adapt existing policies. Moreover, a customer likely only has to know a basic set of characteristics/properties, which could furthermore be reused for different providers.

Policy to system configuration

In the previous sections, we described the environment of the policy system. The process of getting from a policy to zero or more system configurations depends on the environment and can be described in five steps:

1. The customer transmits the formulated policy to the policy system.
2. The policy system receives a set of module descriptions from the modular cloud storage system.
3. The policy system tries to find a matching system configuration for the given policy.
4. The policy system offers possible solutions to the customer, who selects one.
5. The policy system enforces the selected solution.

Steps 1 and 4 form the negotiation process between customer and provider. As described previously in chapter 3, we left the detailed specification of these steps as future work, since well-established solutions exist for them. This primarily means that we ignored legal and financial circumstances. Our contributions are the concepts of steps 2, 3 and 5, with step 3 as the main contribution. In the remainder of this work, we refer to step 3 as the policy translation. Since the five steps above are only a rough overview, it is important to mention some details about the assumptions and conditions. In step 1, we assume that the customer transmits a semantically valid policy. Furthermore, we assume that the policy has a name and a version. This allows us to identify two policies as the same via the triple (customer, name, version).
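Identifying policies via the triple (customer, name, version) maps naturally onto a value type with structural equality. The sketch below uses the hypothetical types `PolicyId` and `PolicyStore`; a Java record makes two identifiers with equal customer, name and version equal, so a policy submitted twice is recognized as the same policy. Storing the policy as its XML text is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Two policies are treated as the same policy if customer, name and version are equal.
record PolicyId(String customer, String name, String version) {}

class PolicyStore {
    private final Map<PolicyId, String> policies = new HashMap<>();

    // Stores the policy document; returns false if a policy with the
    // same (customer, name, version) triple is already known.
    boolean add(PolicyId id, String policyXml) {
        return policies.putIfAbsent(id, policyXml) == null;
    }

    int size() { return policies.size(); }
}
```

A new version of a policy therefore receives its own entry, while resubmitting an unchanged triple is detected.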

During the process of getting from a policy to a system configuration, we assume that we receive a current snapshot of the set of available modules in step 2. This ensures that the set of modules does not change during step 3. This is a valid assumption, since we also assume that modules only change during infrequent updates of the modular cloud storage system. In the interest of the provider, it is important to keep the modular cloud storage system stable and thus to plan and prepare every update carefully. In step 4, we assume that step 3 returned either no solution or a sorted list with a maximum number of solutions. It is the task of the negotiation protocol to offer the solutions to the customer and to give him the possibility to accept a solution, abort the whole process or select another alternative action. Alternative actions may include changing the policy and returning to step 3. Step 5 is only executed if a valid solution was chosen. A valid solution must contain information about all required dependencies of the involved modules. This primarily means that all required connections between different modules and all required properties must be defined. Furthermore, for the simplicity of the algorithm, we assume that the modular cloud storage system has enough resources to enforce the solution. Ideally, the whole process from step 1 to 4 replaces the process of comparing different solutions of different providers, but it does not include the formulation of the requirements (in this case, in the form of a policy). Cloud systems have the advantage of a short time-to-market compared to the classic set-up of an own solution. Therefore, our target is to keep the time between step 1 and 4 as short as possible, at most in the low range of hours. Ideally, the process takes no more than a few minutes. This target is also supported by the fact that the resulting system configuration is only an approximation.
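The contract of step 3 assumed above (no solution, or a sorted list limited to a maximum number of solutions) can be sketched as follows, assuming every candidate configuration already carries a fitness value. The `Solution` record and `SolutionSelector` are hypothetical names; the sketch sorts by descending fitness and truncates the list.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical result of the policy translation: a configuration and its fitness.
record Solution(String configurationId, double fitness) {}

final class SolutionSelector {
    private static final Comparator<Solution> BY_FITNESS =
            Comparator.comparingDouble(Solution::fitness);

    // Returns at most maxSolutions candidates, best (highest fitness) first.
    // An empty result corresponds to "no solution found".
    static List<Solution> best(List<Solution> candidates, int maxSolutions) {
        return candidates.stream()
                .sorted(BY_FITNESS.reversed())
                .limit(maxSolutions)
                .toList();
    }
}
```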
As described previously, we currently have no exact definition of data management system properties, and some properties depend on the data traffic at runtime. Thus, it is reasonable to keep the time needed to find the initial solution for the data management system low, given that we cannot say whether this solution is optimal. As we describe in detail in chapter 8, future work needs to monitor and optimize the initial data management system solution. Such a runtime analysis would be able to evaluate the system based on its actual behaviour during operation.

4.2 Binding between Policy System and Modular Cloud Storage System

In the previous section, we gave a high-level overview of the policy system and explained which steps must be defined to specify a policy system. This and the following sections describe the identified parts of the policy system in more detail. This section starts with the binding between the modular cloud storage system and the policy system, which allows the policy system to get information about the available modules and to enforce the selected system configuration. Since the binding is specific to the modular cloud storage system used, we explain the concepts that were necessary to extend UBStore (see chapter 3). For this thesis, we extended UBStore with mainly two concepts: a mechanism for providing module descriptions and a mechanism that manages the modules and thus is able to enforce the selected services. Figure 4.2 shows the conceptual components for binding the policy system to, in our case, UBStore; they are explained in the following subsections.

Providing module descriptions

The modular cloud storage system (in our case UBStore) needs to provide the set of available modules. Since our interpretation of modules is not bound to the layer of classes, like bundles in the case of OSGi, we decided that each bundle should be responsible for providing module descriptions. This also binds the module descriptions to the implementation and thus ensures that only module descriptions for loaded implementations are available. For further extensions, we generalized this concept by introducing a bundle descriptor (see figure 4.2), which could provide not only the module descriptions but also additional information. To keep dependencies small, the bundle descriptor of a module (bundle) should only provide module definitions formulated in the policy language. This way, the interpretation of the policy language remains part of the policy system, which thus controls the consistency of the interpretation. Therefore, we define that the policy system is responsible for tracking all loaded bundle descriptors, receiving the module definitions and finally parsing them. On the side of the policy system, we need a component that manages the modules. The final task of this module manager (see figure 4.2) is to provide the set of module descriptions to the policy translation algorithm. To fulfil this task, the module manager should keep track of all available and loaded bundle descriptors and get the module descriptions from them. An appropriate solution would be to use OSGi service trackers.

Figure 4.2: The binding between the policy system and UBStore. The Controller takes requests for services and delegates them to the service manager. Thus, only the Controller knows about the service manager, and the bundles are independent of the policy system. The module manager is responsible for tracking all bundle descriptors and thus the available module descriptions.
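A minimal sketch of the module manager, assuming that a bundle descriptor is exposed through a simple interface that returns its module descriptions as policy-language documents. `BundleDescriptor`, `ModuleManager` and the callback names are hypothetical; in UBStore, the add/remove callbacks would be triggered by an OSGi service tracker, and parsing the XML is omitted. Returning a copy matches the assumption that step 2 delivers a snapshot that does not change during the policy translation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical contract a bundle implements to publish its module descriptions.
interface BundleDescriptor {
    List<String> moduleDescriptions();
}

class ModuleManager {
    private final Set<BundleDescriptor> descriptors = new LinkedHashSet<>();

    // Called when a bundle descriptor becomes available (e.g. a bundle is started).
    synchronized void descriptorAdded(BundleDescriptor d) { descriptors.add(d); }

    // Called when a bundle descriptor disappears (e.g. a bundle is stopped).
    synchronized void descriptorRemoved(BundleDescriptor d) { descriptors.remove(d); }

    // Returns an immutable snapshot of all currently available module descriptions.
    synchronized List<String> snapshot() {
        List<String> all = new ArrayList<>();
        for (BundleDescriptor d : descriptors) {
            all.addAll(d.moduleDescriptions());
        }
        return List.copyOf(all);
    }
}
```

Because the snapshot is a copy, bundles that are loaded or unloaded while the policy translation runs do not affect the set of modules it works on.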

Enforcing system configurations

The system configuration produced by the policy system contains precise information about which service implementation is selected for which service requester. Furthermore, it contains, for each module, the properties that have to be set. Since OSGi, and therefore UBStore, is not designed for fully manual service selection (see chapter 3), we had to introduce a new service selection concept for UBStore. As figure 4.2 shows, the policy system introduces a service manager, which is used by the Controller of the UBStore core. For separation of concerns, the Controller should provide the same interface as the service manager and forward all requests to it. Thus, the bundles do not depend on the implementation of the policy system (see figure 4.2). Since the service manager provides an alternative to the automatic service selection of OSGi, it should support functionality similar to that of the OSGi service registry. In our case, this mainly concerns methods for getting services, since registration can still be managed by OSGi. This also means that the service manager has to use OSGi service trackers to obtain all available services. An implementation might add a counterpart to the OSGi service tracker. Since the policy system should enforce a policy, such a counterpart should only notify the service consumer about state changes of the selected service. This concept is comparable to a service listener. Furthermore, the service manager is responsible for enforcing the attributes defined in the system configuration. Since OSGi does not support manually setting arbitrary properties, the implementation should also add an interface for services that enables the enforcement of properties.

4.3 Policy Language

In this section, we describe the design and underlying concepts of our policy language. As mentioned previously in chapter 3, we decided to develop our own simple policy language.
This has the advantage that the language can better adapt to the concepts evolving during this master thesis, and it avoids complex adaptations to the concepts of an existing policy language or framework. The policy language is the most abstract of a total of three layers (see figure 4.3) that define the syntax of policies and module descriptions. In section 4.4, we define the middle layer by presenting our approach for the policy translation and deducing the methods that define a specific requirement or variable definition (as introduced in this section). Finally, in chapter 5, we describe the third layer, which defines the specific implementations of the requirement and variable definition types. We had mainly two requirements for our policy language. First, the language should describe both the policies formulated by the customer and the module definitions formulated by the developer. Second, both the requirements of the policy and the characteristics of the modules should be easily extensible. Furthermore, it is important that both policies and module definitions are human-readable, and we also wanted to use a language that is commonly supported. Therefore, following the example of many other policy languages, we decided to base our policy language on XML. In the policy system, the language elements are mapped to implementations. The implementations define, for example, how the fulfilment of a requirement is rewarded and how the characteristics of a system configuration are computed.

Figure 4.3: The three layers from the XML representation of the policy language to the implementation of the objects defining the semantics of the policy language elements. The policy translation algorithm works on the abstract definition of the policy language elements.

As explained previously, we did not investigate the policy negotiation process in depth, which is why language elements that might be necessary for legal and financial arrangements during the negotiation are missing. Furthermore, as described in chapters 1 and 2, we based the design of the policy language and our evaluation on a scenario that contains six exemplary data management systems embedded in the context of a weather service. Thus, we cannot claim to have developed a complete set of language elements that works for all use cases. Therefore, this chapter only discusses the underlying concepts and the language elements for the definition of requirements and characteristics (properties). The different concrete types of requirements and properties are described in chapter 5 together with the evaluation environment. In the following subsections, we describe the models of module definitions and policies. Furthermore, the next section contains the language element definitions as XML Schema Definitions.

Module description

As described previously, we defined modules as units that provide and reference some functionality (services) and have properties that might be changeable (e.g. attributes). In this section, we describe our model of a module in detail. It is important that we do not define any direct mapping between our modules and concepts on the layer of the programming language used. In the case of Java, for example, this means that a module can consist of one or more classes.
Since there is no such direct mapping, we have no influence on the instantiation of classes on the layer of the programming language. Thus, we define our modules as singletons to keep the complexity of the module composition problem low. Most cases where multiple instances of the same module would be required can be circumvented. Multiple instances with the same configuration can be delegated to the implementation of the module; for example, the module might have an attribute that defines the number of instances. The other case, multiple instances of the same module with different configurations, can be solved by defining multiple modules. For example, a logging module could provide two modes: one for debug output and one for secure logging (e.g. for access logging in a bank), and two other modules require different modes. We can solve this by defining a module for each mode. There might be other cases where this strategy is not applicable, but they can be supported by adding specific variable definitions that handle the selection of multiple values. In this thesis, we only added the variable definitions that were necessary for supporting the six examples from our scenario, which is exactly why we defined extensibility as one of our primary requirements.

Module attributes and structure

As we can see in listing 4.1, a module has a name, a version and a type. It is identified by the tuple (name, version). This is not strictly necessary for the policy system but might be helpful for future work. The type is necessary for categorizing modules. We assume that a data management system produced by a modular cloud storage system consists of at least one module. Furthermore, we assume that every data management system has at least one base module that provides information about the environment (e.g. the modular cloud storage system or the hardware). Other modules might be added to the system to fulfil the requirements of the base modules, but there might also be modules that are not necessary to fulfil any requirement but add some functionality.
For example, a module that regularly archives data older than a certain date will not be required by other modules, but it might be useful as an extension and influences the necessary storage size. This results in three categories (types) a module can belong to: standard (default), base and extension. Modules of the type standard are only added to a system if they are required, base modules are mandatory and extension modules are optional.

<xs:complexType name="module">
  <xs:sequence>
    <xs:element ref="providedService" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="referencedService" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="variableDefinitions" minOccurs="0">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="variableDefinition"/>
          <!-- specific types of variable definitions are included using XSD element substitution -->
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
  <xs:attribute name="version" type="xs:string" use="required"/>
  <xs:attribute name="name" type="xs:string" use="required"/>
  <xs:attribute name="moduletype" type="type" default="standard"/>
</xs:complexType>

<xs:simpleType name="type">
  <xs:restriction base="xs:string">
    <xs:enumeration value="base"/>
    <xs:enumeration value="standard"/>
    <xs:enumeration value="extension"/>
  </xs:restriction>
</xs:simpleType>

Listing 4.1: XSD of the module, the root element of a module description. The specific types of variable definitions can vary depending on the implementation.

A module consists of a set of referenced services, a set of provided services and a set of variable definitions. These variable definitions define variables that describe the properties of the module, which are the characteristics of the data management system, but they also define the properties that can be set to modify the module (e.g. attributes). We use the term variable here to express the dynamics during the composition process, although a "variable" might also be a fixed value (constant). As we describe in section 4.4, we use the variable definitions to instantiate variables that are used to evaluate the system configuration. For example, a variable definition might define that the module provides strong consistency with the value 1.0, or that the module has an attribute that defines the memory size and can take values between 1 and …

Referenced and provided service

The referenced and provided services express the connections between modules. A provided service expresses the ability of the module to provide information and/or functionality for a specific "topic", while a referenced service expresses the requirement to obtain such information and/or functionality.
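The module element of listing 4.1 could be represented inside the policy system by an in-memory type like the following sketch. `ModuleType` and `ModuleDescription` are hypothetical names; provided services are reduced to plain strings and variable definitions are left out. The compact constructor applies the XSD default `standard` when no type is given, and two descriptions denote the same module if their (name, version) tuples match.

```java
import java.util.List;

// Mirrors the simple type "type" of listing 4.1.
enum ModuleType { BASE, STANDARD, EXTENSION }

// Hypothetical in-memory form of a parsed module description.
record ModuleDescription(String name, String version, ModuleType type,
                         List<String> providedServices) {

    ModuleDescription {
        if (type == null) {
            type = ModuleType.STANDARD; // default value from the XSD
        }
        providedServices = List.copyOf(providedServices);
    }

    // Base modules are part of every configuration.
    boolean isMandatory() { return type == ModuleType.BASE; }

    // Extensions may be added without being required by another module.
    boolean isOptional() { return type == ModuleType.EXTENSION; }

    // A module is identified by the tuple (name, version) only.
    boolean sameModule(ModuleDescription other) {
        return name.equals(other.name()) && version.equals(other.version());
    }
}
```

Standard modules are neither mandatory nor optional extensions; a composition algorithm would only add them when another selected module references one of their services.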
<xs:complexType name="referencedService">
  <xs:all/>
  <xs:attribute name="requester" type="xs:string" use="required"/>
  <xs:attribute name="service" type="service" use="required"/>
  <xs:attribute name="bindingness" type="bindingness" use="required"/>
</xs:complexType>

<xs:simpleType name="service">
  <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:simpleType name="bindingness">
  <xs:restriction base="xs:string">
    <xs:enumeration value="mandatory"/>
    <xs:enumeration value="optional"/>
  </xs:restriction>
</xs:simpleType>

Listing 4.2: XSD of the referenced service together with the definition of service and bindingness.

As listing 4.2 shows, a referenced service has three attributes: service, requester and bindingness. The service is an identifier (here defined as a string) of the referenced service class (e.g. an interface in terms of Java). For example, the service might be "Storage", expressing that the module needs another module that can provide information or functionality related to the topic "Storage". The requester is an identifier for distinguishing between different service references and may be any value. A referenced service is identified by the tuple (service, requester). Thus, it is possible to obtain multiple service instances by referencing the same service multiple times. This might be necessary, for example, if a module requires two logging modules with different abilities (e.g. a debug logger and a secure logger). Finally, a referenced service has a bindingness, which accepts two values: mandatory and optional.

<xs:complexType name="providedService">
  <xs:all/>
  <xs:attribute name="monopoly" type="xs:boolean" default="true"/>
  <xs:attribute name="service" type="service" use="required"/>
  <xs:attribute name="provider" type="service" use="required"/>
</xs:complexType>

Listing 4.3: XSD of the provided service. The definition of service can be found in listing 4.2.

Similar to the referenced service, a provided service has three attributes: service, provider and monopoly (see listing 4.3). The service defines the service class that is provided by the corresponding module.
Unlike a referenced service, a provided service is identified by its service alone; thus, a module may not have two provided services with the same service. This makes sense, since we defined all modules as singletons, as described previously. The service manager needs the provider of a provided service to identify the providing component; the provider is the only direct connection between a module and a class on the implementation layer. In our case of UBStore, and therefore OSGi, the provider is the class name of the OSGi service. If the provided service is only used to provide information and no functionality, the provider may have any value, since it is only used for identifying the characteristics of the data management system and will not be requested on the implementation layer. The attribute monopoly is a flag that signals that this module cannot be used together with other modules that provide the same service. For example, for most replication control protocols it would not make sense to have multiple implementations (e.g. 2PC can only enforce consistency if no other replication control protocol is active).
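The rules for referenced and provided services suggest a validity check for a candidate system configuration: every mandatory reference, identified by (service, requester), must be bound to a provided service of the same service, and a service flagged as monopoly may be provided by only one selected module. The following sketch is one possible formulation under these assumptions; the types `Bindingness`, `ReferencedService`, `ProvidedService` and `ConfigurationCheck` are hypothetical and not the implementation of this thesis.

```java
import java.util.List;
import java.util.Map;

enum Bindingness { MANDATORY, OPTIONAL }

record ReferencedService(String service, String requester, Bindingness bindingness) {
    // Identity as described in the text: the tuple (service, requester).
    String key() { return service + "/" + requester; }
}

// monopoly defaults to true in listing 4.3; here it is always given explicitly.
record ProvidedService(String service, String provider, boolean monopoly) {}

final class ConfigurationCheck {
    // bindings maps a reference key to the provided service selected for it.
    static boolean isValid(List<ReferencedService> references,
                           Map<String, ProvidedService> bindings,
                           List<ProvidedService> selectedProvided) {
        for (ReferencedService ref : references) {
            ProvidedService bound = bindings.get(ref.key());
            if (bound == null) {
                if (ref.bindingness() == Bindingness.MANDATORY) {
                    return false; // unsatisfied mandatory dependency
                }
            } else if (!bound.service().equals(ref.service())) {
                return false; // bound to a provider of a different service
            }
        }
        for (ProvidedService p : selectedProvided) {
            long providersOfSameService = selectedProvided.stream()
                    .filter(q -> q.service().equals(p.service()))
                    .count();
            if (p.monopoly() && providersOfSameService > 1) {
                return false; // a monopoly service is provided more than once
            }
        }
        return true;
    }
}
```

Optional references may stay unbound, so a configuration that leaves, for example, an optional logging reference open still passes this check.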

Variable definition
A variable definition describes a property of a module and has one of three possible relations to the environment (its communication information): input, output, or input and output. In the first case, the variable describes an attribute or property that can be used to modify the module. In the second case, the variable is a property that characterises the module and thus the data management system. The third case covers both the first and the second case: the variable is a characteristic that can be manipulated directly. One example is the data type that a data management system uses for its output. An access module might have an input/output variable that offers two values (e.g. XML and JSON). If the customer requires a data management system with XML output, this value can be set directly as a property of the module and at the same time expresses the corresponding characteristic. We defined the variable definition as abstractly as possible, which leaves the decision on how to determine the value to the concrete types of variable definitions. The basic variable definition, as shown in listing 4.4, has only two attributes: name and visibility. The visibility can take two values (public and private) and determines whether the policy can address the variable. This means that if the policy requires a variable x to have the value a, the policy system only considers variables whose name equals x and that are defined as public. Thus, the name of a public variable definition is the name of a data management system characteristic, whereas a private variable definition may use any other name for internal identification. We decided not to make the communication information part of the policy language definition, since it is specific to the different types of variable definitions.
For example, an attribute can be defined as input only, since in most cases the configuration of a data management system does not represent its abstract characteristics. Therefore, we left the communication information to the implementation of specific variable definition types, as we will describe later.
<xs:complexType name="variableDefinition" abstract="true">
  <xs:all/>
  <xs:attribute name="name" type="xs:string" use="required"/>
  <xs:attribute name="visibility" type="visibility" use="required"/>
</xs:complexType>

<xs:complexType name="boundedVariableDefinition" abstract="true">
  <xs:complexContent>
    <xs:extension base="variableDefinition">
      <xs:sequence>
        <xs:element ref="range"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:complexType name="range" abstract="true">
  <xs:all/>
</xs:complexType>

<xs:simpleType name="visibility">
  <xs:restriction base="xs:string">
    <xs:enumeration value="public"/>
    <xs:enumeration value="private"/>
  </xs:restriction>
</xs:simpleType>
Listing 4.4: XSD of the basic variable definition together with the definition of visibility.
The type variableDefinition is declared abstract, since it only serves as the common base for the specific types of variable definitions defined later. The bounded variable definition is an abstract specialization of the variable definition that adds a range. One of our ultimate targets is to find a module composition that matches a given policy, yet the basic variable definition does not impose any computational restrictions. Variables with a predefined range can help to reduce the complexity of the module composition process, since the range enables an analysis of satisfiability. Variables with a discrete set of possible values are especially suitable for discarding modules at an early stage of the composition process. Therefore, we included bounded variable definitions in our basic module model. As shown in listing 4.4, the bounded variable definition only adds a range, whose definition is also left to further specialization.
Policy definition
As described previously, our requirement was a simple and extensible policy definition. Furthermore, we argued that we did not consider the negotiation process, so the language may lack some elements. Our simple model of a policy must provide an identifier and a set of requirements.
In our case, we only considered requirements related to characteristics of the data management system (which, during the module composition process, are expressed as variables instantiated according to variable definitions).
Policy attributes and structure
As described previously and shown in listing 4.5, a policy has two attributes: name and version. A policy is identified by the tuple (name, version). For its management inside the modular cloud storage system, it can be identified by the triple (name, version, customer). Furthermore, a policy consists of a set of requirements, whose specific types come with the implementation. For module descriptions, it might be useful to provide generic variable definitions that let the developer define the mechanism for determining the value of a variable. A specific requirement type, however, needs a definition of how the fulfilment of the requirement is rewarded. Here, we recommend that the implementation defines this, so that the customer only has to choose between different requirement types. For example, the implementation could provide a "constantRequirement" that has a value as an attribute. Its internal implementation would define the reward as 1.0 iff the requirement is fulfilled and 0.0 otherwise. This abstraction simplifies the task of formulating a policy, since the customer only has to deal with the provided requirement types.
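To illustrate how an implementation could fix the reward semantics of a requirement type, the following Java sketch models the "constantRequirement" mentioned above. It is a simplification under stated assumptions. The characteristics of a data management system are represented as a map from public variable names to values. The names Requirement, reward, Policy and outputDataType are hypothetical and are not taken from the thesis implementation.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a hypothetical requirement type whose reward semantics
// are fixed by the implementation, so the customer only picks the type and a value.
public class ConstantRequirementSketch {

    interface Requirement {
        // Reward for a system whose public characteristics are given by name (assumed model).
        double reward(Map<String, Object> characteristics);
    }

    // Rewards 1.0 iff the referenced characteristic is set and equals the constant, else 0.0.
    record ConstantRequirement(String variableName, Object value) implements Requirement {
        @Override
        public double reward(Map<String, Object> characteristics) {
            Object actual = characteristics.get(variableName);
            return (actual != null && actual.equals(value)) ? 1.0 : 0.0;
        }
    }

    // A policy is identified by (name, version) and holds a set of requirements.
    record Policy(String name, String version, List<Requirement> requirements) {}

    public static void main(String[] args) {
        Requirement xmlOutput = new ConstantRequirement("outputDataType", "XML");
        Policy policy = new Policy("examplePolicy", "1.0", List.of(xmlOutput));
        System.out.println(xmlOutput.reward(Map.of("outputDataType", "XML")));  // 1.0
        System.out.println(xmlOutput.reward(Map.of("outputDataType", "JSON"))); // 0.0
        System.out.println(policy.requirements().size());                       // 1
    }
}
```

Because the reward rule lives inside the requirement type, a customer writing a policy only supplies the variable name and the expected value.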

<xs:complexType name="policy">
  <xs:sequence>
    <xs:element ref="requirement" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="version" type="xs:string" use="required"/>
  <xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>
Listing 4.5: XSD of the policy, the root element of a policy definition. In further work, it might be necessary to add elements related to legal and financial topics.
Requirements
A basic requirement has only a bindingness (see listing 4.6). The bindingness defines whether the requirement is mandatory or optional and is thus a simple form of weighting. As mentioned previously, we concentrated on the category of requirements that relate to a variable defining a characteristic of the data management system. In our model, the variable requirement is an extension of the basic requirement and adds the name of the variable. As described for the variable definitions, only public variables are visible to the policy. Thus, the variableName of a variable requirement must be the name of a public variable definition.
<xs:complexType name="requirement" abstract="true">
  <xs:all/>
  <xs:attribute name="bindingness" type="bindingness" use="required"/>
</xs:complexType>

<xs:complexType name="variableRequirement" abstract="true">
  <xs:complexContent>
    <xs:extension base="requirement">
      <xs:all/>
      <xs:attribute name="variableName" type="xs:string" use="required"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:simpleType name="bindingness">
  <xs:restriction base="xs:string">
    <xs:enumeration value="mandatory"/>
    <xs:enumeration value="optional"/>
  </xs:restriction>
</xs:simpleType>
Listing 4.6: XSD of the requirement and the variable requirement, together with the definition of bindingness.
Both requirement and variableRequirement are declared abstract, since they only define the common base of specialized requirement types.
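The visibility rule for variable requirements can be sketched in Java as follows. For brevity, the sketch flattens the abstract types of listing 4.6 into a single record that carries both the bindingness and the variableName. The method resolve and the example variable names are hypothetical. The sketch assumes Java 16 or later for records and Stream.toList.

```java
import java.util.List;

// Illustrative sketch only: how a variable requirement could be bound to the
// variable definitions it may address, following the visibility rule.
public class VisibilitySketch {

    enum Visibility { PUBLIC, PRIVATE }

    enum Bindingness { MANDATORY, OPTIONAL }

    record VariableDefinition(String name, Visibility visibility) {}

    // Flattens listing 4.6: bindingness from requirement, variableName from variableRequirement.
    record VariableRequirement(Bindingness bindingness, String variableName) {

        // Only public definitions whose name equals variableName are visible to the policy.
        List<VariableDefinition> resolve(List<VariableDefinition> definitions) {
            return definitions.stream()
                    .filter(d -> d.visibility() == Visibility.PUBLIC)
                    .filter(d -> d.name().equals(variableName))
                    .toList();
        }
    }

    public static void main(String[] args) {
        List<VariableDefinition> definitions = List.of(
                new VariableDefinition("consistency", Visibility.PUBLIC),
                new VariableDefinition("internalBufferSize", Visibility.PRIVATE));

        VariableRequirement consistency = new VariableRequirement(Bindingness.MANDATORY, "consistency");
        VariableRequirement buffer = new VariableRequirement(Bindingness.OPTIONAL, "internalBufferSize");

        System.out.println(consistency.resolve(definitions).size()); // 1
        System.out.println(buffer.resolve(definitions).isEmpty());   // true (private variable)
    }
}
```

A requirement that names a private variable simply resolves to nothing. This keeps internal module variables out of reach of customer policies.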

Policy Translation
The policy translation is the main contribution of this thesis work. We define the policy translation as the process of composing possible system configurations, given a policy (provided by the customer) and a set of module descriptions (provided by the modular cloud storage system). In the previous section 4.3, we described our representation of policies and module descriptions as seen by a customer or a module developer. This representation is the most abstract of three layers that define the semantics of policies and module descriptions. In this section, we define the middle layer (see figure 4.4) by presenting our approach to the policy translation and deriving the methods that specific requirement and variable definition types have to provide (see section 4.3). In chapter 5, we will describe the third layer, which defines the specific implementation of specific requirement and variable definition types.
Figure 4.4: The three layers from the XML representation of the policy language to the implementation of the objects defining the semantics of the policy language elements. The policy translation algorithm works on the abstract definition of the policy language elements.
For our policy translation algorithm, we need to specify the involved objects: policy, module description and system configuration. While section 4.3 defined the properties of the policy and the module description, this section adds the definition of the system configuration and further methods and properties needed by the algorithm. The full UML diagram of policy, module description and system configuration can be found in the appendix (A.2). Note that the UML diagram only defines the conceptual model. For performance reasons, the implementation differs from this model, as we will explain later.
Assumptions
As described in section 4.1.3, we assume a set of modules that does not change during the policy translation.
The policy translation is embedded in the scenario of a customer who transmits his requirements (formulated as a policy) to the modular cloud storage system and ultimately wants a deployed data management system. Ideally, the whole process takes no longer than a few minutes. Therefore, the assumption of a static set of modules is justified. Currently, we cannot say whether the resulting system configuration is the optimal solution, partly because a definition of data management system properties is missing. For this reason, we also did not restrict how the characteristics of a data management system are determined. They are defined by the variable definitions of the module descriptions, but the determination of their values is left to the implementation of specific variable definition types. Thus, we developed an algorithm that is able to find a solution without knowledge of the solution space.
Variable definition revisited
As described previously, we left the definition of the communication information to the implementation. Therefore, the inner representation of a variable definition defines two methods (isInput and isOutput), as shown in figure 4.5. Variables inherit the properties of their associated variable definition, and we refer to them as input, output and input/output variables. Every input/output variable is both an input variable and an output variable.
UML class: VariableDefinition (name : String, visibility : Visibility, isInput() : boolean, isOutput() : boolean)
Figure 4.5: The variable definition with methods for specifying the communication information. The full UML diagram with the definition of the policy, module and system configuration objects can be found in the appendix (A.2). This diagram only shows the properties that define a variable definition. Other properties and methods shown in the full diagram are required by the algorithm and explained in the following subsections.
Problem analysis
The result of the policy translation should be a list of weighted and valid system configurations that fulfil the policy. A system configuration defines which modules are used, how they are connected and their properties (attributes).
Finally, the best system configuration (the one with the highest weight) can be used for deployment. Alternatively, the n best system configurations can be presented to the customer, who then selects the system configuration to be deployed. Thus, the algorithm should return not only the best system configuration but a list of weighted and valid system configurations. Figure 4.6 shows the system configuration together with these three components. The field modules contains the modules that compose the system configuration. For each variable definition of the modules in modules, the system configuration contains a variable that is associated with its variable definition and has a value (initially null). The variables are used to evaluate the characteristics of the data management system. Bounded variables are defined similarly. For each referenced service of the modules in modules, the system configuration creates a service link. Initially, the service link is associated with its referenced service and its field providedService is null. During the policy translation, the service link can be associated with any matching provided service of the modules in modules.
UML classes: SystemConfiguration (modules : Module[], 0..* serviceLinks, 0..* variables); ServiceLink (providedService : ProvidedService, referencedService : ReferencedService); Variable (variableDefinition : VariableDefinition, 0..1 value); Value (value : Object); BoundedVariable (boundedVariableDefinition : BoundedVariableDefinition)
Figure 4.6: The system configuration with associated objects and properties. The full UML diagram with the definition of the policy and module objects can be found in the appendix (A.2). This diagram only shows the properties that define a system configuration. Other properties and methods shown in the full diagram are required by the algorithm and explained in the following subsections.
We previously stated that the results of the policy translation algorithm have to be valid and fulfil the policy. The following definitions specify these two properties:
valid: A system configuration is valid iff the following conditions hold for its components and those of the used modules:
1. all service links with non-optional referenced services are associated with a matching provided service (the service of the referenced service equals the service of the provided service)
2. for every monopoly provided service, no other provided service has an equal service
fulfilled: A system configuration fulfils a given policy iff the following conditions hold for all its public output variables that are referenced by requirements (the name of the associated variable definition equals the variableName of the requirement):
1. the value of the public output variable is not null
2. the value of the public output variable fulfils the requirements that reference it, according to their implementation
The definition of "fulfilled" requires the implementation of a requirement to express whether its referenced variable fulfils it. Furthermore, the variable definition has to specify how the value of its variable is determined. Therefore, we define the method isFulfilled for a requirement and the method getValue for a variable definition:
Requirement.isFulfilled(systemConfiguration : SystemConfiguration) : boolean
Given a system configuration, the method returns true if the requirement is fulfilled by the referenced public output variable of the given system configuration. Otherwise, it returns false. For equal system configurations, the method returns equal results.
VariableDefinition.getValue(systemConfiguration : SystemConfiguration) : Value
Given a system configuration, the method calls Variable.update(systemConfiguration) on all variables it depends on and returns a value that depends on the components of the system configuration. In the case of an input variable, the method returns the current value of the variable associated with this variable definition. If the value cannot be determined, the method returns null. For equal system configurations, the method returns equal results.
With the method getValue, we can determine the value of a variable. However, the number of variables in a system configuration can be very large and their evaluation time-consuming. Hence, our target is to evaluate only those variables that influence the requirements. Since the dependencies are left to the implementation, it makes sense to leave the evaluation to the implementations, too.
Therefore, we defined the following update methods of Requirement and Variable, which call each other recursively and thus update only the necessary variables:
Requirement.update(systemConfiguration : SystemConfiguration) : void
Given the system configuration, the method updates all resources it depends on. For variables, it calls Variable.update(systemConfiguration).
Variable.update(systemConfiguration : SystemConfiguration) : void
If isFinal of this variable is true, this method does not change the value of this variable. Otherwise, it calls variableDefinition.getValue(systemConfiguration) and sets its own value to the received value.
Algorithm overview
Our policy translation algorithm consists of two phases. In the first phase, we try to solve most functional requirements, while in the second phase we try to optimize the remaining requirements (primarily non-functional requirements). Functional requirements specify the functionality of a product, in this case the data management system. The functionality is defined by the service links and by properties like the data type of output data, the number of nodes, the hardware and similar. Most of these properties can be defined as bounded variables with discrete ranges containing a small number of elements. In the first phase, we exploit those variables to reduce the solution space by identifying variables whose values can be fixed, thereby reducing the set of variables left to optimize. First of all, the service links are analysed so that only system configurations that are valid with respect to referenced and provided services are considered. In the second phase, all unfixed variables are optimized to fulfil the requirements. These requirements primarily reference non-functional properties like consistency, performance and availability, which likely have a continuous range of possible values. Since we make no assumptions about the solution space, we decided to use a genetic algorithm. The genetic algorithm optimizes the reward of a system configuration, which is defined by the requirements of the policy.
First phase - service matching and filtering
The first phase deals with the service links and the variables of possible system configurations. Since we have no knowledge about how the variable values are determined, we have to evaluate every possible variation of module compositions and input variable values. Another consequence is that, in most cases, we cannot make assumptions about how the reward changes during the second phase. Hence, the first phase composes all valid system configurations (given a set of modules and the policy), tries to fix values and returns them as an unordered collection for further optimization. Since the second phase uses a genetic algorithm, and this class of algorithms generally does not scale well with problem complexity, our target was to reduce the number of possible system configurations as far as possible. As defined previously, there exist three types of modules: "base", "standard" and "extension".
These three types have to be treated differently during module composition. First, an initial system configuration containing all base modules is generated, which produces the initial service requirements. Then, standard and extension modules whose provided services match the unsatisfied service links are added, since this is necessary for composing a valid system configuration. Beyond that, we treat extension modules as optional, since they are not needed for producing a valid system configuration. Finally, all extension modules that can influence the variables referenced by the requirements are added to the previously constructed system configurations, where possible. In summary, the first phase consists of the following steps, which are shown in figure 4.7 and explained in detail in the following subsections:
1. generate the initial system configuration containing all base modules
2. satisfy all unsatisfied service links
3. filter out system configurations that will not be able to fulfil the policy
4. extend the system configurations with extension modules that can influence variables referenced by the requirements
5. repeat step 2 to satisfy new service links induced by the extensions
6. repeat step 3 to reduce the number of possible system configurations again
Figure 4.7: The two phases of the policy translation algorithm, which takes a policy as input. During the first phase, the possible and valid system configurations are constructed. The second phase optimizes the properties of the system configurations.
Satisfying service links
Listing 4.7 shows the pseudo-code of the algorithm used for satisfying service links. The queue newSystemConfigurations stores the system configurations with unsatisfied service links, and systemConfigurations stores those that are valid. On line 9, the next system configuration to be satisfied is stored in currentSystemConfiguration, and on line 12 the next service link to be satisfied is stored in currentServiceLink. The function nextService goes through the service links of the current system configuration and searches for an unsatisfied service link for which there exist modules with a matching provided service. If there is no such service link, the function returns null. Hence, currentServiceLink is null if no further service link can be satisfied. Lines 15 to 21 try to find provided services and set them. First, on line 15 we check whether currentSystemConfiguration already contains a matching provided service that is declared as monopoly. In this case, we have to take this provided service, since otherwise the system configuration would not be valid. The function getPossibleProvidedServices on line 16 searches all provided services (from the modules in modules) that match the service link and are possible. A provided service is not possible if it is declared as monopoly and currentSystemConfiguration already has a provided service with an equal service. Finally, on lines 17 to 20, for every provided service except one, currentSystemConfiguration is copied, the service link in the copy that equals currentServiceLink is searched, and the provided service is assigned to it. On line 21, the remaining provided service is assigned to the current service link.
If necessary, the module that provides the assigned service is added to the system configuration. Once no service link is left that can be satisfied, the system configuration is tested on line 25 and added to systemConfigurations if it is valid. The result is a collection of all valid system configurations, given the modules.
1  function satisfyServiceLinks(SystemConfiguration baseSystemConfiguration, Module[] modules)
2  begin
3    SystemConfiguration[] systemConfigurations := []
4    Queue<SystemConfiguration> newSystemConfigurations := [baseSystemConfiguration]
5    SystemConfiguration currentSystemConfiguration := null
6    ServiceLink currentServiceLink := null
7
8    while newSystemConfigurations.isNotEmpty()
9      currentSystemConfiguration := newSystemConfigurations.poll()
10
11     repeat forever
12       currentServiceLink := nextService(currentSystemConfiguration, modules)
13       if currentServiceLink is null then break
14
15       if not tryToSetMonopoly(currentSystemConfiguration, currentServiceLink) then
16         ProvidedService[] providedServices := getPossibleProvidedServices(modules, currentSystemConfiguration, currentServiceLink)
17         for int i := 1 to providedServices.size() - 1
18           SystemConfiguration newSystemConfiguration := copy(currentSystemConfiguration)
19           setProvidedService(newSystemConfiguration, findEqual(newSystemConfiguration, currentServiceLink), providedServices[i])
             newSystemConfigurations.add(newSystemConfiguration)  // enqueue the branch so that its remaining links are satisfied too
20         end for
21         setProvidedService(currentSystemConfiguration, currentServiceLink, providedServices[0])
22       end if
23     end repeat
24
25     if currentSystemConfiguration.isValid()
26       systemConfigurations.add(currentSystemConfiguration)
27     end if
28   end while
29
30   return systemConfigurations
31 end
Listing 4.7: Algorithm for satisfying service links as pseudo-code.
The arguments are the initial system configuration containing the base modules (baseSystemConfiguration) and all available modules of the modular cloud storage system (modules). If there exist multiple possibilities for satisfying a service link, the system configuration is branched.
Filtering system configurations
This part of the algorithm has two targets. First, we want to find variables that can be constrained or whose value we can determine. If the value can be determined, we set it and thus fix the variable. This reduces the number of variables that can change or have to be optimized by the genetic algorithm. Second, we want to check whether the system configuration is able to fulfil the policy at all. To declare a variable as fixed, we add the attribute isFinal to the definition of Variable (see appendix A.2). This flag can be set to true to signal to the algorithm that the value must not be changed any more.
1  function filter(SystemConfiguration[] unfilteredSystemConfigurations, Policy policy)
2  begin
3    SystemConfiguration[] filteredSystemConfigurations := []
4
5    for each SystemConfiguration systemConfiguration in unfilteredSystemConfigurations
6      boolean fulfillable := true
7
8      for each Requirement requirement in policy.requirements
9        if requirement instanceof VariableRequirement then
10         VariableRequirement variableRequirement := (VariableRequirement) requirement
11         if not isExtensible(systemConfiguration, variableRequirement) then
12           if not variableRequirement.isFulfillableForValue(systemConfiguration, true) then
13             fulfillable := false
14             break
15           end if
16         end if
17       else
18         if not requirement.isFulfillable(systemConfiguration) then
19           fulfillable := false
20           break
21         end if
22       end if
23     end for
24     if not fulfillable then continue  // discard this system configuration, check the next one
25
26     for each Variable variable in systemConfiguration.variables
27       if variable.value is not null
28         variable.isFinal := true
29       end if
30     end for
31
32     for each VariableRequirement variableRequirement in policy.requirements
33       if variableRequirement.isConstant(systemConfiguration) then
34         variableRequirement.update(systemConfiguration)
35         if not variableRequirement.isFulfilled(systemConfiguration) then
36           fulfillable := false
37           break
38         end if
39       end if
40     end for
41     if not fulfillable then continue  // discard this system configuration, check the next one
42
43     filteredSystemConfigurations.add(systemConfiguration)
44   end for
45
46   return filteredSystemConfigurations
47 end
Listing 4.8: Pseudo-code for filtering system configurations. The target is to set certain variables final and to check whether the policy can be fulfilled at all. As we have no knowledge of how the values are determined and how the requirements are evaluated, we leave the analysis to the specific implementations.
Listing 4.8 shows the filtering as pseudo-code: every system configuration is analysed and added to the resulting collection if it can possibly fulfil the policy. The following steps are taken:
line 8-24: First, it is checked whether the requirements can be fulfilled (Requirement.isFulfillable). A variable requirement is only checked if the function isExtensible returns false. isExtensible checks whether there exist modules (either extension modules or standard modules, since extension modules could require other standard modules) that are valid to be added to the system configuration and that could influence the variable requirement. Thus, VariableRequirement and VariableDefinition have to provide information about their possible dependencies (VariableRequirement.getPossibleDependencies and VariableDefinition.getPossibleDependencies). In the case of variable requirements, it is not only checked whether they can be fulfilled
but also whether certain values can be set (VariableRequirement.isFulfillableForValue, Variable.isFulfillableForValue and VariableDefinition.isFulfillableForValue). If this is the case, the value is set.
line 26-30: After values have been set in the previous step (where possible), the variables whose values are not null are marked as final.

line 32-41: In a last step, the algorithm checks whether there exist variable requirements whose value will not change (VariableRequirement.isConstant, Variable.isConstant and VariableDefinition.isConstant). If this is the case, a final update is performed and it is tested whether the variable requirement is fulfilled.
These analysis steps require information that depends on the implementation and semantics of requirements and variable definitions. Thus, the methods referenced in this context are defined here, but their implementation is left to the specific types. The following methods have to be implemented for specific types:
Requirement.isFulfillable(systemConfiguration : SystemConfiguration) : boolean
The method returns false if the given system configuration is not able to fulfil this requirement. If it cannot be decided whether this requirement can be fulfilled, the method returns true.
VariableRequirement.isFulfillableForValue(systemConfiguration : SystemConfiguration, tryToSetValue : boolean) : boolean
The method calls Variable.isFulfillableForValue(systemConfiguration, value, tryToSetValue) for every variable of systemConfiguration it depends on. If this variable requirement requires an explicit value, the argument value is this value. Otherwise, value is null. If at least one variable this requirement depends on is not fulfillable, the method returns false.
VariableRequirement.isConstant(systemConfiguration : SystemConfiguration) : boolean
The method calls Variable.isConstant(systemConfiguration) for every variable it depends on. It returns true if all variables it depends on are constant. Otherwise, it returns false.
VariableRequirement.getPossibleDependencies(modules : Module[]) : VariableDefinition[]
Given the modules modules, this method returns a collection of all variable definitions that might influence this variable requirement.
To obtain this collection, this method calls VariableDefinition.getPossibleDependencies(modules) for every variable definition that might have an associated variable that might influence this variable requirement, and merges the results.

Variable.isFulfillableForValue(systemConfiguration : SystemConfiguration, value : Value, tryToSetValue : boolean) : boolean
The method calls variableDefinition.isFulfillableForValue(systemConfiguration, value, tryToSetValue) and returns false if the result is false.

Variable.isConstant(systemConfiguration : SystemConfiguration) : boolean
The method returns true if the attribute isFinal of the variable is true or if variableDefinition.isConstant(systemConfiguration) returns true.

VariableDefinition.isFulfillableForValue(systemConfiguration : SystemConfiguration, value : Value, tryToSetValue : boolean) : boolean
The method returns false if value is not null and getValue cannot return value given systemConfiguration. If tryToSetValue is true, value is not null and isInput() is true, this method checks whether value is an allowed value. If it is allowed, the method sets the value of the variable of systemConfiguration that is assigned to this variable definition to value. If value is not allowed, the method returns false.

VariableDefinition.isConstant(systemConfiguration : SystemConfiguration) : boolean
The method returns true if getValue always returns an equal value given systemConfiguration.

VariableDefinition.getPossibleDependencies(modules : Module[]) : VariableDefinition[]
Given the modules modules, this method returns a collection of all variable definitions that might influence a variable that is assigned to this variable definition. To obtain this collection, this method calls VariableDefinition.getPossibleDependencies(modules) for every variable definition that might have an associated variable that might influence the assigned variable, and merges the results.

Extending system configurations

In this step, the algorithm takes the existing system configurations and extends them with the extension modules from the given set of modules. Using VariableDefinition.getPossibleDependencies and VariableRequirement.getPossibleDependencies, we can determine the extension modules that are able to influence the requirements. Since we have no further knowledge about how the requirements are influenced, we have to add all combinations of the extension modules that are able to influence the requirements. The pseudo-code for extending the system configurations is shown in listing 4.9. For every system configuration, we take the collection of possible extension modules and check for each extension module whether it is valid to add the module to the system configuration. If it is not valid, we remove the extension module from the collection (similar to lines 11 to 15 in the listing). Given n possible extension modules, there exist $2^n$ combinations of them.
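The enumeration performed by generatePossibleCombinations in listing 4.9 can be illustrated with a bit mask: bit i of a counter decides whether the i-th candidate module is part of a combination, so counting from 0 to 2^n - 1 visits every combination exactly once. The following is a minimal Java sketch under the assumption that modules can be treated as opaque values; the class PossibleCombinations is hypothetical and not part of the thesis implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring generatePossibleCombinations from listing 4.9. */
final class PossibleCombinations {

    private PossibleCombinations() {
    }

    /**
     * Returns all 2^n subsets of the given modules, starting with the empty one.
     * Bit i of the counter mask decides whether modules.get(i) is included.
     */
    static <M> List<List<M>> generate(List<M> modules) {
        int n = modules.size();
        if (n > 30) {
            // 2^31 combinations would overflow the int counter (and memory).
            throw new IllegalArgumentException("too many extension modules: " + n);
        }
        List<List<M>> combinations = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<M> combination = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    combination.add(modules.get(i));
                }
            }
            combinations.add(combination);
        }
        return combinations;
    }
}
```

For two hypothetical extension modules, PossibleCombinations.generate(List.of("compression", "encryption")) returns [[], [compression], [encryption], [compression, encryption]]. The empty combination corresponds to the unextended system configuration, and the exponential growth is why the filtering on lines 11 to 15 matters: every candidate removed there halves the number of copies created.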
For each possible combination, we create a copy of the system configuration and add the selected extension modules, as seen on lines 17 to 27. It is important to check for each module whether it is still valid (line 21), because the modules already added from the same combination may have changed the copied configuration.

1  function extend(SystemConfiguration[] systemConfigurations, Module[] modules, Policy policy)
2  begin
3      Queue<SystemConfiguration> newSystemConfigurations := []
4      Module[] possibleModules := []
5
6      possibleModules := getExtensionModulesWithInfluence(modules, policy)
7
8      for each SystemConfiguration systemConfiguration in systemConfigurations
9          Module[] modulesToBeAdded := []
10
11         for each Module module in possibleModules
12             if canAddModule(systemConfiguration, module) then
13                 modulesToBeAdded.add(module)
14             end if
15         end for
16
17         for each Module[] combination in generatePossibleCombinations(modulesToBeAdded)
18             SystemConfiguration newSystemConfiguration := systemConfiguration.clone()
19
20             for each Module module in combination
21                 if canAddModule(newSystemConfiguration, module) then
22                     newSystemConfiguration.modules.add(module)
23                 end if
24             end for
25
26             newSystemConfigurations.add(newSystemConfiguration)
27         end for
28     end for
29     return newSystemConfigurations
30 end

Listing 4.9: Pseudo-code for extending system configurations. Possible extension modules are identified and added in all possible combinations.

Second phase - optimizing variables

Like in the first phase, we assume for the second phase a set of modules and a policy as given. Additionally, we assume a given collection of valid system configurations. The target of this phase is to optimize the given system configurations (see figure 4.7) by varying the input variables, which defines an optimization problem. To reach this target, the algorithm applies a genetic algorithm to each system configuration, collects the results and returns the n best results. Note that a genetic algorithm is not guaranteed to find the optimal solution. However, as mentioned previously, the search for an initial solution cannot guarantee an optimal solution anyway: such a solution can only be found by evaluating the system configurations at runtime, since it depends on the use case.

As mentioned previously, we decided to use a genetic algorithm because we have only little knowledge about the search space. This results from the fact that we do not know how the values of the variables of the system configurations are determined. There are two reasons that support the decision for a search heuristic (like a genetic algorithm) instead of an exhaustive search.
First, as soon as the set of input variables contains at least one variable with an infinite range (e.g. a continuous range), the search space is infinite, which makes an exhaustive search impossible. Second, even a finite search space can be very large, so a search heuristic can be more efficient. Other algorithms like gradient descent or the simplex algorithm can also be excluded. We can write our optimization problem as

$$\max_{\mathbf{i}} \; r_{p,s}(\mathbf{i}) \qquad (4.1)$$

where r is a reward function defined by a policy p and a system configuration s, which takes a vector $\mathbf{i}$ of input variable values $i_j$ as input. We make no assumptions about r and cannot say whether it is differentiable (needed e.g. for gradient descent) or linear (needed e.g. for the simplex algorithm). The following subsections describe our adapted genetic algorithm together with its free parameters and the resulting problems.

Genetic algorithm

A simple genetic algorithm is defined as follows:
1. generate an initial population of individuals
2. evaluate the fitness of each individual
3. repeat until termination (e.g. a maximum number of iterations is reached or the fitness is sufficient):
   3.1. breed new individuals through crossover of the best-fit individuals
   3.2. mutate the new individuals with a certain probability
   3.3. evaluate the fitness of the new individuals
   3.4. replace the least-fit part of the population with the new individuals

In our adapted genetic algorithm, an individual is represented by a set of variables with assigned values. Since we need to optimize different system configurations and compare the results later, we defined two further objects, as seen in figure 4.5: a weighted variable set and a weighted system configuration. The weight of both objects is the fitness (fitness) calculated by the genetic algorithm.

[Figure 4.8: class diagram with WeightedVariableSet (variables : Variable[], fitness : float) and WeightedSystemConfiguration (systemConfiguration : SystemConfiguration, fitness : float)]
Figure 4.8: The weighted variable set is used to store a set of input variables that are varied to optimize the system configuration they belong to. In terms of a genetic algorithm, the weighted variable set represents the individual, with the variables as its chromosomes. The weighted system configuration is used to store the resulting system configuration with the variables from the weighted variable set applied.
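The two objects of figure 4.8 map naturally onto small immutable Java types. The sketch below is an illustration rather than the thesis code: it uses records (Java 16 or later, unlike the Java 1.6 of the implementation), simplifies the variables to a map from variable name to a double value, and leaves the system configuration as a type parameter. The comparator orders weighted system configurations by descending fitness, as needed for returning the n best results.

```java
import java.util.Comparator;
import java.util.Map;

/**
 * Individual of the genetic algorithm: values of the open input variables
 * (simplified here to name -> value) together with the fitness of the individual.
 */
record WeightedVariableSet(Map<String, Double> variables, double fitness) {
}

/**
 * A system configuration (type parameter S) with the variables of a weighted
 * variable set applied, carrying the fitness of that variable set.
 */
record WeightedSystemConfiguration<S>(S systemConfiguration, double fitness) {

    /** Comparator for sorting by descending fitness (highest fitness first). */
    static <T> Comparator<WeightedSystemConfiguration<T>> byDescendingFitness() {
        return Comparator.comparingDouble(
                (WeightedSystemConfiguration<T> w) -> w.fitness()).reversed();
    }
}
```

Keeping the fitness inside the object means it is computed once per individual and can be reused when the results of different system configurations are compared.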
Listing 4.10 shows the pseudo-code for the function optimize, which goes through all system configurations and optimizes their input variables. The resulting variable sets are applied to the system configuration and stored as weighted system configurations with the fitness of the corresponding weighted variable set.

1  function optimize(SystemConfiguration[] systemConfigurations, Policy policy, int populationSize, int numberOfChildren, int maxNumberOfIterations, float mutationProbability)
2  begin
3      WeightedSystemConfiguration[] weightedSystemConfigurations := []
4
5      for each SystemConfiguration systemConfiguration in systemConfigurations
6          Variable[] variables := []
7          WeightedVariableSet[] population := []
8          WeightedVariableSet[] newPopulation := []
9
10         variables := getAllOpenInputVariables(systemConfiguration)
11         population := generateRandomPopulation(variables, populationSize)
12
13         for 1 to maxNumberOfIterations
14             calculateAndSetFitness(population, systemConfiguration, policy)
15             if iterationsSinceLastImprovement(population) > 50 then break
16
17             newPopulation := []
18             for 1 to numberOfChildren
19                 WeightedVariableSet child := reproduce(population)
20                 mutate(child, mutationProbability)
21                 newPopulation.add(child)
22             end for
23
24             newPopulation.addAll(getNFittest(population, populationSize - numberOfChildren))
25
26             population := newPopulation
27         end for
28
29         calculateAndSetFitness(population, systemConfiguration, policy)
30         weightedSystemConfigurations.addAll(generateSystemConfigurationsWithAppliedVariables(population, systemConfiguration))
31     end for
32
33     sortDescendingByFitness(weightedSystemConfigurations)
34     return weightedSystemConfigurations
35 end

Listing 4.10: Pseudo-code for the optimization, using an adapted genetic algorithm. The parameters maxNumberOfIterations, populationSize and numberOfChildren must be positive, and numberOfChildren must be smaller than or equal to populationSize. The parameter mutationProbability is a real number between 0 and 1.

On line 10 of the above listing, we identify the variables that can be varied for optimization (getAllOpenInputVariables). We take all input variables of the system configuration whose flag isFinal is not yet set to true. Thus, after applying the genetic algorithm, the system configuration will have values for all input variables. The function generateRandomPopulation takes the identified variables and generates populationSize random individuals. Each individual returned by this function has an undefined weight (fitness) and contains a copy of the identified variables. Each variable has a random value, which in the case of a bounded variable (see appendix A.2) lies in the defined range. The fitness calculation of the genetic algorithm is executed on lines 14 and 29 by the function calculateAndSetFitness. For each variable set, the function applies the variables to the system configuration, evaluates the fitness (a number between 0 and 1) and assigns the result to WeightedVariableSet.fitness. The fitness calculation is described in more detail in the next section.

As we can see on line 13, the algorithm runs for a maximum number of iterations. This is necessary because we cannot tell whether we have found the optimal solution, so the algorithm might otherwise never terminate (compare step 3 of the abstract genetic algorithm at the beginning of this section). In our variant of the genetic algorithm, we also use another termination criterion: if the algorithm is stalling, which means that the fitness of the best individual does not change or changes only slightly, we terminate the algorithm. This is represented by the method iterationsSinceLastImprovement on line 15. In our case, we decided to use a static limit of 50 iterations and required a minimum change of 0.1. It is difficult to prove whether this is sufficient, but since we only approximate a solution, we did not investigate this further.
These conditions force a fast termination, which avoids unnecessary time consumption. The reproduction (breeding) and mutation of the genetic algorithm (compare steps 3.1 and 3.2 of the abstract genetic algorithm at the beginning of this section) are executed on lines 18 to 22. The method reproduce selects two parent individuals at random, where each individual i of the population is selected with the probability

$$p_i = \frac{f_i}{\sum_{j=1}^{m} f_j} \qquad (4.2)$$

where m is the number of individuals in the population and $f_i$ the fitness of individual i. Given the two parents, a random number r is chosen. The new individual is created by copying the variables $v_1$ to $v_r$ from the first parent and $v_{r+1}$ to $v_n$ from the second parent, where n is the number of variables in an individual. Finally, the method mutate changes the value of a random variable randomly (respecting the range, if provided) with the probability mutationProbability. The step on line 24 is needed to keep populationSize - numberOfChildren old individuals. This ensures that good individuals are not discarded and are kept in case reproduction does not lead to better individuals. The free parameters that are provided as arguments of the function optimize are explained in the following sections.
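The reproduce and mutate steps can be sketched in Java as follows. This is a simplified illustration, not the thesis implementation: an individual is assumed to be a double[] of input variable values with its fitness kept in a parallel array, every variable is assumed to have a range [min[k], max[k]], and the class GeneticOperators is hypothetical. Parents are drawn by fitness-proportionate (roulette-wheel) selection according to equation 4.2, the child is built by one-point crossover at a random cut r, and mutation redraws one random variable within its range with probability mutationProbability.

```java
import java.util.Random;

/** Hypothetical genetic operators mirroring reproduce and mutate (listing 4.10, lines 19-20). */
final class GeneticOperators {

    private final Random random;

    GeneticOperators(Random random) {
        this.random = random;
    }

    /** Picks an index with probability fitness[i] / sum(fitness) (roulette wheel). */
    int select(double[] fitness) {
        double total = 0;
        for (double f : fitness) {
            total += f;
        }
        if (total <= 0) {
            // All individuals have fitness 0: fall back to a uniform choice.
            return random.nextInt(fitness.length);
        }
        double threshold = random.nextDouble() * total;
        double cumulative = 0;
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i];
            if (threshold < cumulative) {
                return i;
            }
        }
        return fitness.length - 1; // guards against rounding at the upper end
    }

    /** One-point crossover: v_1..v_r from the first parent, v_{r+1}..v_n from the second. */
    double[] reproduce(double[][] population, double[] fitness) {
        double[] first = population[select(fitness)];
        double[] second = population[select(fitness)];
        int n = first.length;
        int r = random.nextInt(n + 1); // number of variables taken from the first parent
        double[] child = new double[n];
        System.arraycopy(first, 0, child, 0, r);
        System.arraycopy(second, r, child, r, n - r);
        return child;
    }

    /** With the given probability, redraws one random variable within its range. */
    void mutate(double[] child, double[] min, double[] max, double mutationProbability) {
        if (child.length > 0 && random.nextDouble() < mutationProbability) {
            int k = random.nextInt(child.length);
            child[k] = min[k] + random.nextDouble() * (max[k] - min[k]);
        }
    }
}
```

Individuals with fitness 0, i.e. configurations that violate a mandatory requirement, are never chosen as parents by select unless the whole population has fitness 0. The uniform fallback for that case is an assumption of the sketch, since equation 4.2 is undefined when all fitness values are 0.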

Fitness calculation

The ideal solution to a policy is reached if each requirement, including the optional requirements, is fulfilled as well as possible. Naturally, the costs of the solution serve as a second measure. This also applies to our case, since the solution is a product sold to a customer. In the simplest case, we would count the number of fulfilled requirements (given a system configuration), divide it by the number of requirements and obtain a fitness for the system configuration. But there might also exist requirements that require a value in a certain range to be fulfilled, where one value might nonetheless be better than another (e.g. the greater the better). Therefore, we decided to add a further method to the definition of the requirement:

Requirement.getReward(systemConfiguration : SystemConfiguration) : float
Given a system configuration, the method returns a value from 0 to 1. The method may not return 0 if Requirement.isFulfilled(systemConfiguration) returns true.

With the method defined above, we have two measures for the fitness of a system configuration: reward and costs. This interpretation means that we allow the algorithm to compensate mandatory requirements with a low reward by optional requirements. Consequently, the writer of the policy is responsible for defining mandatory requirements in a range that allows the compensation with optional requirements. To meet the requirement that the algorithm returns only fulfilling system configurations, we defined that the fitness is 0 if at least one mandatory requirement is not fulfilled.
This results in the following definition of the fitness $f_{s,p}$, given the system configuration s and the policy p:

$$f_{s,p} = \begin{cases} w \cdot c + (1 - w) \cdot \dfrac{\sum_{j=1}^{k} r_{req_j}}{k}, & \text{if } s \text{ is fulfilling} \\ 0, & \text{otherwise} \end{cases} \qquad (4.3)$$

where k is the number of requirements in p, $r_{req_j}$ the reward of requirement $req_j$, w the weight of the costs and c the cost score, defined as follows:

$$c = \begin{cases} 1 - \dfrac{\sum_{i=1}^{l} c_{mod_i}}{c_{max}}, & \text{if } \sum_{i=1}^{l} c_{mod_i} \le c_{max} \\ 0, & \text{otherwise} \end{cases} \qquad (4.4)$$

where l is the number of modules in s, $c_{mod_i}$ the costs of module $mod_i$ and $c_{max}$ the maximum costs. To determine the costs of a module, we test whether the module has a variable named "costs". If this variable exists, its value is taken as the costs of the module; otherwise the costs are 0. The weight of the costs w and the maximum costs $c_{max}$ are free parameters and are thus explained in the next section.

Open parameters and problems

The definitions in the previous sections introduce multiple free parameters that influence the behaviour of the algorithm. The following list defines the free parameters and describes their influence. In chapter 6, we describe the evaluation of the algorithm using different values for the parameters, after explaining how these values were selected.

weight of costs
The weight of costs balances the costs against the reward when calculating the fitness. A large value leads to a cheap solution, while a small value leads to a high-quality solution.

populationSize
The population size determines the number of resulting system configurations for each system configuration provided as input. For the genetic algorithm, a larger population size results in a longer runtime, but also in a higher probability of generating a good initial individual.

maxNumberOfIterations
If the algorithm reaches the maximum number of iterations, it is terminated. This termination criterion only serves to avoid an endless loop. For our case, we defined a fixed maximum number of iterations. As we will explain for the evaluation in chapter 6, we set this free parameter to 1000 to limit the time consumption of the algorithm.

mutationProbability
The mutation probability defines the probability with which a new individual receives a randomly changed value. If the probability is too low, it is likely to take longer to breed better individuals. If the mutation probability is too high, good values are likely to be discarded more often, which also results in slower progress. A further risk arises from a high mutation probability combined with a number of children equal to the population size: in this case, good individuals are likely to be replaced with worse individuals more often.

numberOfChildren
The number of children defines how many old individuals are replaced by new individuals. If the number of children equals the population size, all individuals are replaced. This bears the risk of discarding good individuals that were better than their children.
Thus, it is reasonable to choose a number of children smaller than the population size. On the other hand, a smaller number of children slows down the breeding of new individuals, and the algorithm takes longer to find a good solution.

maximum costs
The maximum costs are left as a free parameter to be defined in the implementation of the policy system. One possibility is to define them as a setting of the policy system. This is a valid solution, since the provider of the modular cloud storage system should be able to define a maximum price for his solutions.
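Equations 4.3 and 4.4 translate directly into code. The following Java sketch assumes that the rewards of all requirements, a flag stating whether all mandatory requirements are fulfilled, and the module costs (the value of each module's "costs" variable, or 0 if it has none) have already been determined; the class FitnessCalculation and its method names are hypothetical. The parameter costWeight corresponds to the weight of costs w and maxCosts to $c_{max}$.

```java
/** Hypothetical helper implementing the fitness of equations 4.3 and 4.4. */
final class FitnessCalculation {

    private FitnessCalculation() {
    }

    /** Cost score c of equation 4.4: 1 for zero total costs, 0 once the total reaches maxCosts. */
    static double costScore(double[] moduleCosts, double maxCosts) {
        double total = 0;
        for (double cost : moduleCosts) {
            total += cost;
        }
        return total <= maxCosts ? 1 - total / maxCosts : 0;
    }

    /**
     * Fitness f_{s,p} of equation 4.3: weighted sum of the cost score and the mean
     * requirement reward, or 0 if any mandatory requirement is not fulfilled.
     */
    static double fitness(boolean allMandatoryFulfilled, double[] rewards,
                          double[] moduleCosts, double maxCosts, double costWeight) {
        if (!allMandatoryFulfilled) {
            return 0;
        }
        double rewardSum = 0;
        for (double reward : rewards) {
            rewardSum += reward;
        }
        // Assumption: a policy without requirements counts as fully rewarded.
        double meanReward = rewards.length == 0 ? 1 : rewardSum / rewards.length;
        return costWeight * costScore(moduleCosts, maxCosts) + (1 - costWeight) * meanReward;
    }
}
```

For example, with two requirements rewarded 1.0 and 0.5, two modules costing 20 and 30, $c_{max}$ = 100 and w = 0.25, the cost score is 1 - 50/100 = 0.5 and the fitness is 0.25 · 0.5 + 0.75 · 0.75 = 0.6875. Treating a policy without requirements as fully rewarded is an assumption of the sketch, because equation 4.3 is undefined for k = 0.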

5 Implementation

The previous chapter described the underlying concepts of our policy system and identified three layers. On the first layer, we described the outer representation of policies and module descriptions by defining a policy language using XML Schema Definitions. The second layer described the inner representation by defining objects together with their variables and methods; the requirements for these objects were deduced from the policy language and from the needs of our algorithm. This chapter deals with the third layer (see figure 5.1) and describes the implementation details of specific requirement and variable definition types.

Figure 5.1: The three layers from the XML representation of the policy language to the implementation of the objects defining the semantics of the policy language elements. The policy translation algorithm works on the abstract definition of the policy language elements.

Furthermore, this chapter describes the implementation of the policy system itself and the binding to UBStore. Note that the implemented policy system does not implement all concepts discussed in chapter 4. As mentioned previously, this thesis concentrates on the policy translation and the binding to the modular cloud storage system, and leaves the policy negotiation to future work.

5.1 Implementation Environment

For our implementation, we used Java 1.6.0_26 together with the UBStore system and Eclipse Equinox [19] (version 3.6.2). UBStore was provided as an OSGi bundle named ch.unibas.cs.dbis.ubstore together with its sources, which allowed us to extend the implementation and thus to implement the binding between the policy system and UBStore. Like UBStore, our policy system is implemented as an OSGi bundle, named ch.unibas.cs.dbis.ubstore.management.policy. Where appropriate, we used OSGi Declarative Services to declare bindings between components. For better readability, we always use class names without the full package name in the following sections. Both UBStore and the policy system use the name of the bundle as the base package name for all their classes to avoid duplicate class names. This means that all classes are placed in the package ch.unibas.cs.dbis.ubstore or its sub-packages.

5.2 Binding between the Policy System and UBStore

As described in section 4.2, we require two mechanisms for binding the policy system and the modular cloud storage system (in our case UBStore). The first mechanism provides module descriptions to the policy system. The second mechanism enforces a selected system configuration by managing the services of UBStore. Our implementation follows the specifications from section 4.2. This means that the following components are specified as Java classes or interfaces: the Bundle Descriptor (IBundleDescriptor) and the Service Manager (IServiceManager).

Bundle Descriptor

A Bundle Descriptor provides information about an OSGi bundle used with UBStore. To signal the presence of module descriptions, the bundle must contain a class that implements IBundleDescriptor and is registered as an IBundleDescriptor service.

IBundleDescriptor
Defines an interface for providing module descriptions to the policy system.
Collection<String> getModuleDefinitions()
Returns a collection of module descriptions in XML format. To avoid inconsistencies, this method should always return an equal collection while the implementing class is registered.

Our implementation also comes with a class FileBasedBundleDescriptor that implements IBundleDescriptor. This implementation reads the property UBStore-Module in the manifest file of the bundle. The property is expected to contain a list of relative names of files in the bundle that contain XML specifications of module definitions. The class also provides two further methods for activation and deactivation.

FileBasedBundleDescriptor
Implements IBundleDescriptor and loads module definitions from files contained in the bundle.

activate()
Loads all module descriptions from the files listed in UBStore-Module in the manifest file of the bundle. After loading all module descriptions, this instance is registered as a service using IBundleDescriptor.class.getName().

deactivate()
Unregisters this instance.

Both the interface and the class are provided with the UBStore bundle.

Module manager

To detect and collect the provided module descriptions, the policy system bundle contains a module manager. The module manager uses an OSGi ServiceTracker to detect registering and unregistering implementations of IBundleDescriptor. The module descriptions are then collected by calling IBundleDescriptor.getModuleDefinitions and are converted to instances of the Module class (described in section 5.4). If an implementation is unregistered, the module manager removes its module descriptions. The module manager of our implementation is provided as class UBStoreModuleManager and is described in the following with its important methods.

UBStoreModuleManager
Tracks implementations of IBundleDescriptor and manages their provided module descriptions. The class implements the interface ModuleManager.

Collection<Module> getModules()
Returns all available standard and extension modules.

Collection<Module> getBaseModules()
Returns all available base modules.

Service Manager

The service manager service definition, as described in section 4.2, is implemented as interface IServiceManager, provided by the UBStore bundle. Using OSGi Declarative Services, the central UBStore class Controller references an implementation of this interface and also implements the interface itself. Thus, the Controller works as a proxy for accessing the functionality of the service manager. The policy system bundle comes with an implementation UBStoreTargetSystem of IServiceManager that is also bound to the module manager UBStoreModuleManager.
Thus, it provides the communication interface to UBStore for the policy system. The binding between UBStoreTargetSystem and the policy system core is again implemented using OSGi Declarative Services.

UBStoreTargetSystem
Provides the communication interface to UBStore and implements the methods needed to enforce the current system configuration. The class tracks the provided services specified in the current system configuration and sets the corresponding properties if the implementation of a provided service implements the interface IConfigurableService. The class implements the interface TargetSystem for abstraction towards the policy system.

Object getService(String requester, String serviceClassName)
Given the requester (ReferencedService.requester) and the serviceClassName (ReferencedService.serviceClassName), the method returns the specified implementation. The specified implementation is found using OSGi Service Trackers and ProvidedService.provider of the provided service specified in the corresponding service link of the current system configuration.

addServiceListener(IServiceListener serviceListener, String requester, String serviceClassName)
Registers the service listener serviceListener for the tuple (requester, serviceClassName). The service listener is notified if the implementation for the tuple (requester, serviceClassName) is registered or unregistered.

removeServiceListener(IServiceListener serviceListener, String requester, String serviceClassName)
Unregisters the associated service listener for the tuple (requester, serviceClassName).

getModuleManager()
Returns the module manager.

setSystemConfiguration(SystemConfiguration systemConfiguration)
Sets systemConfiguration as the current system configuration to be enforced. The method initiates OSGi Service Trackers for all provided services defined in systemConfiguration.

The class UBStoreTargetSystem implements the methods according to the specification defined previously. The classes IServiceListener and ServiceEvent provide an alternative to the OSGi service tracker for the policy-based management: a service listener can be registered at the service manager and is notified when the state of the selected service changes.

IServiceListener
The IServiceListener interface defines only one method for notification.

serviceChanged(ServiceEvent serviceEvent)
Called by the implementation of IServiceManager when the service implementation that matches the tuple (requester, serviceClassName) becomes available or unavailable.

ServiceEvent
The class ServiceEvent is transmitted to the IServiceListener by the service manager.
Type type
Specifies the type of the event, which can be "REGISTERED" or "UNREGISTERED".

String requester
The requester provided when calling IServiceManager.addServiceListener.

String serviceClassName
The serviceClassName provided when calling IServiceManager.addServiceListener.

Object service
A reference to the specified implementation, as returned by IServiceManager.getService.

As discussed previously, OSGi provides no way to manually set arbitrary properties. As described for IServiceManager, its implementation therefore uses an interface IConfigurableService for setting the properties specified by the system configuration:

IConfigurableService
The IConfigurableService interface defines two methods for setting and getting properties.

setProperty(String name, Object value)
Used by the implementation of IServiceManager for setting specified properties (attributes). name equals VariableDefinition.name, and value is the value selected by the policy translation algorithm.

Object getProperty(String name)
Gets the property and can be used for checking the specified properties. This method is currently not used by our implementation.

To run a consistent system with UBStore and the policy system, it is important that all services respect the properties set through the interface IConfigurableService and obtain their referenced services by using the methods specified by the interface IServiceManager. The Controller should always be used as a proxy, since this central class is reachable by every service in UBStore.

5.3 Policy System

The policy system core implementation mainly contains a management class PolicyManagementSystem, which manages the components of the policy system, and a deployment class PolicyDeploymentSystem, which deploys a system configuration given a policy by using the interface PolicyTranslationEngine. The class PolicyDeploymentSystem provides the possibility to limit the number of results of the translation engine and to customize the selection process (SystemConfigurationSelectionCoordinator) of the system configuration that is finally deployed using the interface TargetSystem. Thus, it is possible to plug in further implementations of a negotiation process. The methods of the PolicyDeploymentSystem are described below:

PolicyDeploymentSystem
Provides the interface for deploying a system configuration given a policy.
To select a system configuration that results from the policy translation algorithm, the implementation uses an implementation of the interface SystemConfigurationSelectionCoordinator and calls its method SystemConfiguration selectSystemConfiguration(List<RankedSystemConfiguration> rankedSystemConfiguration). Translated policies are stored together with the selected system configuration for consistency. For translation, the class uses the interface PolicyTranslationEngine.

deploy(Policy policy, boolean forceReevaluation)
Translates the given policy, selects a system configuration using the configured SystemConfigurationSelectionCoordinator and sets it using the TargetSystem interface. If policy was deployed previously and forceReevaluation is false, the stored system configuration is used and the policy is not translated again.

setSystemConfigurationSelectionCoordinator(SystemConfigurationSelectionCoordinator systemConfigurationSelectionCoordinator)
Sets the SystemConfigurationSelectionCoordinator to be used.

setMaxResults(int maxResults)
Sets the maximum number of results that the policy translation algorithm should return when deploy is called. This is also the maximum number of results offered to the SystemConfigurationSelectionCoordinator for selection.

getSelectedSystemConfiguration(Policy policy)
Given the policy policy, this method returns the system configuration that was previously deployed by the method deploy for an equal policy.

Policy translation

Our policy translation algorithm is implemented according to the following interfaces:

PolicyTranslationEngine

List<WeightedSystemConfiguration> translate(Policy policy, ModuleManager moduleManager, int maxResults)
Returns a ranked list of resulting system configurations that are fulfilling and valid. The module manager moduleManager is used, together with the policy policy, to obtain the necessary set of modules. The argument maxResults defines the maximum number of system configurations returned. The result is sorted in descending order of weight and contains only the n best results with the highest weights.

FunctionalRequirementSolver
Interface for the implementation of the first phase of the policy translation algorithm.

Collection<SystemConfiguration> solve(Policy policy, ModuleManager moduleManager)
Returns a collection of all possible valid system configurations that might be fulfilling. The result is computed from the given policy and the module manager, which provides the necessary set of modules.

NonFunctionalRequirementSolver
Interface for the implementation of the second phase of the policy translation algorithm.

List<WeightedSystemConfiguration> solve(Policy policy, Collection<SystemConfiguration> systemConfigurations, ModuleManager moduleManager, int maxResults)
Optimizes the system configurations systemConfigurations given the policy and the module manager. The final results for each system configuration are merged, and the maxResults best results are returned as a list.
The list is sorted in descending order of weight.
Our implementation of the PolicyTranslationEngine (TwoPhasePolicyTranslationEngine) simply calls the implementation of FunctionalRequirementSolver (FilteredServiceMatchingAlgorithm) first and then the implementation of NonFunctionalRequirementSolver (GeneticAlgorithm), passing it the result of the first phase. We implemented both phases according to the concepts described in section 4.4. For performance reasons, our goal was to pre-calculate as much as possible. For the FilteredServiceMatchingAlgorithm, two maps (Java HashMap) are calculated in advance: providedServices and propertyInfluenceMap. The first map (providedServices) maps from Service to Set<ProvidedServices> and thus provides fast access to the possible provided services for a given service. This reduces the algorithmic complexity of the service matching, since it avoids iterating over all provided services multiple times. The second map (propertyInfluenceMap) maps from String to a collection of modules, inferred from the results returned by the method getPossibleDependencies (defined in section 4.4). This method returns all variable definitions that a variable requirement might depend on. By identifying the module of each of these variable definitions, we obtain a collection of all modules that might influence a variable requirement. The key of this map is a string that uniquely identifies each requirement of the given policy. Thus, this map reduces the runtime for checking whether a variable requirement might be extensible (needed for filtering as described in section 4.4), since the method getPossibleDependencies does not have to be re-evaluated.
For the GeneticAlgorithm, it is not possible to reduce the runtime or the algorithmic complexity directly. The most time-consuming part is the fitness calculation, which mainly depends on the implementation of the requirements and variable definitions. For this reason, the definition of the genetic algorithm in section 4.4 includes the object WeightedVariableSet, which stores the fitness for later reuse. As the only performance optimization of the implementation, we decided to parallelize the optimization: the genetic algorithm is applied to each system configuration separately, and the optimizations of different system configurations are independent of each other. The parallelization is implemented using a Java FixedThreadPool with a size of twice the number of available CPUs. To avoid large memory consumption, the results of each optimized system configuration are merged with the previous results, and the merged result is immediately pruned to a size of maxResults.
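The two-phase translation with a parallel second phase and immediate merge-and-prune can be sketched in Java as follows. This is a minimal illustration under simplifying assumptions, not the thesis implementation: system configurations are reduced to strings, the class TwoPhaseSketch and the record Weighted are hypothetical stand-ins for TwoPhasePolicyTranslationEngine and WeightedSystemConfiguration, and phase 1 (the filtered service matching) is assumed to have already produced the candidate list. Only the pool size (twice the number of available CPUs) and the pruning to maxResults follow the text.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of the two-phase translation: phase 1 is assumed to have
// produced the candidate configurations; phase 2 optimizes them in parallel.
class TwoPhaseSketch {

    // Hypothetical stand-in for WeightedSystemConfiguration.
    record Weighted(String configuration, double weight) {}

    static List<Weighted> translate(Collection<String> candidates,
                                    Function<String, List<Weighted>> optimizer,
                                    int maxResults) {
        // Pool size as described in the text: twice the number of available CPUs.
        int threads = 2 * Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Weighted>>> futures = new ArrayList<>();
            for (String candidate : candidates) {
                // Candidates are independent, so each one is optimized as a separate task.
                futures.add(pool.submit(() -> optimizer.apply(candidate)));
            }
            List<Weighted> best = new ArrayList<>();
            for (Future<List<Weighted>> future : futures) {
                // Merge each partial result and prune immediately to maxResults entries.
                best.addAll(future.get());
                best.sort(Comparator.comparingDouble(Weighted::weight).reversed());
                if (best.size() > maxResults) {
                    best.subList(maxResults, best.size()).clear();
                }
            }
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during optimization", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("optimization task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Toy phase-2 optimizer: every candidate yields two weighted variants.
        Function<String, List<Weighted>> optimizer = c -> List.of(
                new Weighted(c + "-1", c.charAt(0) - 'A' + 0.5),
                new Weighted(c + "-2", c.charAt(0) - 'A' + 0.1));
        System.out.println(translate(List.of("A", "B", "C"), optimizer, 2));
    }
}
```

Pruning after every merge keeps the merged list bounded by maxResults plus one partial result; a bounded priority queue would be an alternative way to express the same n-best selection.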
5.4 Policy Language
The implementation of the policy language covers three parts. First, we explain the implementation of the objects defined in section 4.4 and used by the policy translation algorithm. Second, we explain the implementation of the specific types of requirements and variable definitions, as mentioned and referenced previously. Finally, in the third part, we describe the implementation of the mapping between the internal objects used in the implementation and the XML representation.
Policy language and corresponding classes
Previously, we explained the objects and methods needed for the policy translation by referring to the UML diagram shown in appendix A.2. Our implementation follows this specification, except for the values (Value), which are replaced by a type system. For performance reasons, we added methods to the class SystemConfiguration that provide access to mappings and pre-calculated collections. The implementation contains mappings (Java HashMap) from Identifier to service links, variables and public variables. The class Identifier stores a name, a variable name or a composition of those. It is used as a memory-efficient alternative to always storing strings. Furthermore, we introduced a pre-calculated collection of all cost variables contained in the system configuration, which allows a faster fitness calculation.
As mentioned previously, we replaced the object Value by a type system. Different variables require different data types. For example, the number of nodes in a system should be represented as an integer, while the consistency should most likely be represented as a float value. Since our model allows connections between arbitrary variables, we also had to ensure compatibility between different value types. Therefore, we implemented our type system by defining the interface Type<T>, where the generic argument T is the Java type of the value. The interface Type augments the wrapped Java type with methods for comparison and conversion, for interoperability between different variable values. We implemented four basic data types that implement the Type interface: FloatType, IntegerType, StringType and BooleanType. The implementation allows comparison and conversion between IntegerType and FloatType. It follows from the type system that classes dealing with values have to specify their type. We implemented this using generics for the following classes: VariableRequirement<T extends Type<?>>, Variable<T extends Type<?>>, BoundedVariable<T extends Type<?>>, VariableDefinition<T extends Type<?>>, BoundedVariableDefinition<T extends Type<?>> and Range<T extends Type<?>>.
Specific requirement and variable definition types
For our implementation, we decided to provide a basic set of flexible requirement and variable definition types. As explained previously, we do not claim to provide a complete definition of the policy language, since our work is based only on a theoretical scenario with a limited number of use cases.
Our goal was to define the specific requirements and variable definitions based on a consistent concept with easily understandable semantics.
Requirements
Since we concentrated on the policy translation, we focused on variable requirements; most other requirements would depend on the negotiation process. We defined that a variable requirement always references exactly one public variable, namely the one with VariableRequirement.variableName equal to Variable.variableDefinition.name. If multiple public variables match, only the first one is referenced; in the following, this variable is called variable. Furthermore, we defined the following basic behaviour for the implementation of variable requirements (according to the specifications in section 4.4):
VariableRequirement<T extends Type<?>>
Abstract class of all variable requirements with the basic definitions of their behaviour.
boolean isFulfilled(SystemConfiguration systemConfiguration)
Returns true if the requirement is optional and false if the referenced variable or its value is null. Otherwise, the subclass has to specify the result.
boolean isFulfillable(SystemConfiguration systemConfiguration)
Returns true if the requirement is optional and false if the referenced variable is null. Otherwise, the result of variable.isFulfillable(systemConfiguration) is returned.
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, boolean tryToSetValue)
Returns true if the requirement is optional and false if the referenced variable is null. Otherwise, the subclass has to specify the result.
boolean isConstant(SystemConfiguration systemConfiguration)
Returns false if the referenced variable is null. Otherwise, the result of variable.isConstant(systemConfiguration) is returned.
float getReward(SystemConfiguration systemConfiguration)
Returns 0 if the referenced variable or its value is null. Otherwise, the subclass has to specify the result.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules)
Returns all variable definitions of the modules contained in modules whose VariableDefinition.name equals the variableName of this variable requirement.
For a simple description of requirements, we can distinguish between two cases: either an exact value or a value in a specific interval is required. Since our implementation is only based on a theoretical scenario, we decided to implement only these two cases for first evaluations. The following descriptions define a constant and an interval requirement, both extending the basic definition of a variable requirement described above (class VariableRequirement):
ConstantRequirement<T extends Type<?>>
Defines a requirement that enforces a specific value for the referenced variable.
T value
The constant value required for fulfilling this requirement.
boolean isFulfilled(SystemConfiguration systemConfiguration)
Returns true if variable.value is equal to value. Otherwise, the method returns the value specified by the superclass.
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, boolean tryToSetValue)
Returns the value specified by the superclass where it applies, and otherwise the result of variable.isFulfillableForValue(systemConfiguration, value, tryToSetValue).
float getReward(SystemConfiguration systemConfiguration)
Returns 1 if variable.value is equal to value. Otherwise, the method returns the value specified by the superclass.
IntervalRequirement<T extends Type<?>>
Defines a requirement that enforces a value between from and to for the referenced variable. Additionally, it is possible to define whether the value should be as large or as small as possible.
T from
The minimum value required for fulfilling this requirement.
T to
The maximum value required for fulfilling this requirement.
TargetValue targetValue
Defines the behaviour of this requirement and can be set to ANY (all values between from and to are treated equally), AS_LARGE_AS_POSSIBLE (a larger value is better than a smaller one) or AS_SMALL_AS_POSSIBLE (a smaller value is better than a larger one).
boolean isFulfilled(SystemConfiguration systemConfiguration)
Returns true if variable.value is greater than or equal to from and less than or equal to to. Otherwise, the method returns the value specified by the superclass.
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, boolean tryToSetValue)
Returns the value specified by the superclass where it applies, and otherwise the result of variable.isFulfillable(systemConfiguration).
float getReward(SystemConfiguration systemConfiguration)
Returns 0 if this requirement is not fulfilled. Otherwise, the method returns 1 if targetValue is ANY; for AS_LARGE_AS_POSSIBLE, it returns a value linearly interpolated between 0 and 1 (1 if variable.value equals to), and for AS_SMALL_AS_POSSIBLE the inverse of AS_LARGE_AS_POSSIBLE (1 if variable.value equals from).
Variable definitions
For an easily understandable concept, we defined that each variable definition only uses local variables (of the same module) and those of referenced modules (reached via referenced services). This emphasizes the semantics of referenced services in cases where primarily information, rather than only functionality, is needed. As defined in section 4.3, a variable definition defines the communication information, and we identified three types of communication with the environment (output to the policy and input from the algorithm): input, output and input/output. We can ignore the case where a variable definition is declared neither as input nor as output, since this case is covered by a private variable. While input and output define the possible communication or behaviour of the variable, the visibility finally decides whether the variable can be referenced by a variable requirement. For our basic implementation, we first defined implementations for all three communication cases.
First, the input variable definition is implemented as Attribute<T extends Type<?>> (extending BoundedVariableDefinition) and allows the algorithm to specify the value. Second, the input/output variable definition is implemented as BijectiveVariable<T extends Type<?>> (extending BoundedVariableDefinition) and allows the algorithm to specify the value and the policy to access this value directly. Third, the output variable definition is implemented as ComplexVariable<T extends Type<?>> (extending VariableDefinition) and allows the value to be calculated from local variables and those of referenced modules. All three variable definitions describe active cases, where the value is calculated or set by the algorithm. For the passive cases, we defined two further variable definitions. First, AliasVariable<T extends Type<?>> (extending VariableDefinition) maps an existing variable to another name. For example, a compression module might have a variable that defines the compression level. In the context of accessing data, this variable should be referenced as output compression level, and in the context of storing data as storage compression level. Second, Constant<T extends Type<?>> (extending VariableDefinition) defines a constant value.
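The difference between active and passive variable definitions can be illustrated with a small Java sketch. It is a hypothetical simplification: a plain Map from variable names to values stands in for SystemConfiguration, ranges are reduced to a [min, max] interval of doubles, the compression-level bounds are illustrative, and only Attribute, AliasVariable (here Alias) and Constant are modelled; BijectiveVariable and ComplexVariable are omitted. It also assumes that isFulfillableForValue succeeds exactly when the value lies inside the range, and stores the value only in that case and only if tryToSetValue is set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of active (Attribute) and passive (Alias, Constant)
// variable definitions; a Map stands in for SystemConfiguration.
class VariableDefinitionSketch {

    interface Definition {
        Object getValue(Map<String, Object> config);
        boolean isConstant(Map<String, Object> config);
    }

    // Active input variable: the algorithm may choose any value in [min, max].
    record Attribute(String name, double min, double max) implements Definition {
        public Object getValue(Map<String, Object> config) { return config.get(name); }
        public boolean isConstant(Map<String, Object> config) { return false; }

        // True if value lies in the range; stores it only if requested and valid.
        boolean isFulfillableForValue(Map<String, Object> config, double value, boolean tryToSetValue) {
            boolean inRange = value >= min && value <= max;
            if (inRange && tryToSetValue) {
                config.put(name, value);
            }
            return inRange;
        }
    }

    // Passive: exposes another definition under a different name.
    record Alias(Definition target) implements Definition {
        public Object getValue(Map<String, Object> config) { return target.getValue(config); }
        public boolean isConstant(Map<String, Object> config) { return target.isConstant(config); }
    }

    // Passive: a fixed value that never changes.
    record Constant(Object value) implements Definition {
        public Object getValue(Map<String, Object> config) { return value; }
        public boolean isConstant(Map<String, Object> config) { return true; }
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        Attribute level = new Attribute("compressionLevel", 1, 9);
        Alias storageLevel = new Alias(level);
        System.out.println(level.isFulfillableForValue(config, 5, true));
        System.out.println(storageLevel.getValue(config));
        System.out.println(level.isFulfillableForValue(config, 12, true));
    }
}
```

In main, the alias plays the role of the storage compression level from the example above: it holds no value of its own and simply reads through to the attribute.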

Attribute and BijectiveVariable extend the bounded variable definition and thus define a range of possible values. For defining a range, our implementation provides a class for each data type: BooleanRange (BooleanType), ContinuousRange (FloatType), DiscreteRange (IntegerType) and SetRange (primarily StringType, but also applicable to all other data types). Since the different variable definitions differ significantly in their implementation, we did not define any general method behaviour on the abstract level of VariableDefinition and BoundedVariableDefinition. The following list directly defines the methods of the specific variable definitions that determine their behaviour:
AliasVariable<T extends Type<?>>
Defines an alias for a variable of a referenced module.
String referencedServiceService
Identifies, together with referencedServiceRequester, the local referenced service of the module containing this variable definition. In the following, this referenced service is called referencedService.
String referencedServiceRequester
Identifies, together with referencedServiceService, the local referenced service of the module containing this variable definition (referencedService).
String variableName
The name of the variable of the module that is connected to referencedService. In the following, this variable is called variable; it must have variable.variableDefinition.name equal to variableName.
T getValue(SystemConfiguration systemConfiguration)
Returns null if variable is null. Otherwise, calls variable.update(systemConfiguration) before returning variable.getValue().
boolean isConstant(SystemConfiguration systemConfiguration)
Returns false if variable is null, and otherwise variable.isConstant(systemConfiguration).
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, Value value, boolean tryToSetValue)
Returns false if variable is null, and otherwise variable.isFulfillableForValue(systemConfiguration, value, tryToSetValue).
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules)
Returns all variable definitions of those modules in modules that contain a provided service matching referencedService, restricted to variable definitions with VariableDefinition.name equal to variableName.
Attribute<T extends Type<?>>
Defines an input variable.
T getValue(SystemConfiguration systemConfiguration)
Returns the value of the variable associated with this attribute.
boolean isConstant(SystemConfiguration systemConfiguration)
Returns false.
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, Value value, boolean tryToSetValue)
Returns true if the range of this attribute contains value, and otherwise false. If tryToSetValue is true and the range contains value, the method sets the value of the variable associated with this attribute to value.

Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules)
Returns an empty collection.
BijectiveVariable<T extends Type<?>>
Defines an input/output variable.
T getValue(SystemConfiguration systemConfiguration)
Returns the value of the variable associated with this variable definition.
boolean isConstant(SystemConfiguration systemConfiguration)
Returns false.
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, Value value, boolean tryToSetValue)
Returns true if the range of this variable definition contains value, and otherwise false. If tryToSetValue is true and the range contains value, the method sets the value of the associated variable to value.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules)
Returns an empty collection.
ComplexVariable<T extends Type<?>>
Defines an output variable whose value is calculated from local variables and those of referenced modules. The calculation is executed by an implementation of the interface Calculation, which provides the method Type<?> calculate(SystemConfiguration systemConfiguration, VariableDefinition<?> variableDefinition). Furthermore, the class contains a mapping from String to VariableBinding, which maps the variable names used in the calculation to variable bindings that identify local variables or those of referenced modules. The calculation and its classes are explained in detail in the following section.
Calculation calculation
The calculation defined for this complex variable.
T getValue(SystemConfiguration systemConfiguration)
Returns calculation.calculate(systemConfiguration, this).
boolean isConstant(SystemConfiguration systemConfiguration)
Calls VariableBinding.isConstant(systemConfiguration, this.module) for each variable binding of calculation and returns true if all of these calls return true. Otherwise, the method returns false.
boolean isFulfillableForValue(SystemConfiguration systemConfiguration, Value value, boolean tryToSetValue)
Calls VariableBinding.isFulfillable(systemConfiguration, this.module) for each variable binding of calculation and returns true if all of these calls return true. Otherwise, the method returns false.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules)
Calls VariableBinding.getPossibleDependencies(modules, this.module) for each variable binding of calculation and returns the merged results.
Constant<T extends Type<?>>
Defines a constant output variable.
T value
The constant value.
T getValue(SystemConfiguration systemConfiguration)
Returns value.
boolean isConstant(SystemConfiguration systemConfiguration)
Returns true.

boolean isFulfillableForValue(SystemConfiguration systemConfiguration, Value value, boolean tryToSetValue)
Returns true if the given value is equal to its own value. Otherwise, the method returns false.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules)
Returns an empty collection.
Calculation
As described in the definition of ComplexVariable, the interface Calculation is used to evaluate the value. This mechanism allows very flexible evaluations, since a Calculation can contain a script that determines the value in arbitrarily complex ways. The following descriptions explain the interface Calculation together with the abstract class VariableBinding, which is used for binding to the variables of the system configuration.
Calculation<T extends Type<?>>
Defines an interface for calculating complex variables.
T type
A template for the result returned by calculate, since Java does not allow the instantiation of generic type parameters.
Map<String, VariableBinding> variableBindings
Maps the variable names used in the script to VariableBinding instances that connect to variables (instances of Variable).
T calculate(SystemConfiguration systemConfiguration, Module owner)
Calculates a variable value using variableBindings. The argument owner is needed to provide a VariableBinding with information about the local environment when searching for variables.
VariableBinding
Abstract class that binds a variable to the context of a Calculation. A VariableBinding always resolves to an instance of Variable, which we refer to as variable.
Variable<?> getVariable(SystemConfiguration systemConfiguration, Module owner)
Returns variable, or null if the referenced variable (defined by the subclass) cannot be found in the context of owner.
boolean isFulfillable(SystemConfiguration systemConfiguration, Module owner)
Returns true if getVariable(systemConfiguration, owner) does not return null. Otherwise, the method returns false.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules, Module owner)
To be defined by the subclass.
boolean isConstant(SystemConfiguration systemConfiguration, Module owner)
Returns variable.isConstant(systemConfiguration) if variable is not null. Otherwise, the method returns true.
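How a complex variable obtains its value through Calculation and VariableBinding can be sketched as follows. The sketch is hypothetical in several respects: a Java function replaces the Lua script, variables are modelled as a map from names to a record Var holding a value and a constant flag, and only a local binding (LocalBinding, in the spirit of LocalVariableBinding) is implemented. The isConstant logic follows the rules stated above: a binding whose variable cannot be resolved counts as constant, and a calculation is constant only if all of its bindings are.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of Calculation and VariableBinding; a Java function
// replaces the Lua script and a Map stands in for the system configuration.
class CalculationSketch {

    // Hypothetical variable: a numeric value and whether it is constant.
    record Var(double value, boolean constant) {}

    interface VariableBinding {
        Var getVariable(Map<String, Var> config);

        default boolean isFulfillable(Map<String, Var> config) {
            return getVariable(config) != null;
        }

        // As in the text: an unresolved variable counts as constant.
        default boolean isConstant(Map<String, Var> config) {
            Var v = getVariable(config);
            return v == null || v.constant();
        }
    }

    // Binds to a variable of the owning module by its name.
    record LocalBinding(String variableName) implements VariableBinding {
        public Var getVariable(Map<String, Var> config) {
            return config.get(variableName);
        }
    }

    record Calculation(Map<String, VariableBinding> bindings,
                       Function<Map<String, Double>, Double> script) {

        // Resolves all bindings, then evaluates the script; null if a binding is unresolved.
        Double calculate(Map<String, Var> config) {
            Map<String, Double> inputs = new LinkedHashMap<>();
            for (Map.Entry<String, VariableBinding> entry : bindings.entrySet()) {
                Var v = entry.getValue().getVariable(config);
                if (v == null) {
                    return null;
                }
                inputs.put(entry.getKey(), v.value());
            }
            return script.apply(inputs);
        }

        // Constant only if every bound variable is constant (cf. ComplexVariable.isConstant).
        boolean isConstant(Map<String, Var> config) {
            return bindings.values().stream().allMatch(b -> b.isConstant(config));
        }
    }

    public static void main(String[] args) {
        // Mirrors Listing 5.1: a = c * 100, where c is bound to the local variable b.
        Calculation a = new Calculation(Map.of("c", new LocalBinding("b")), in -> in.get("c") * 100);
        Map<String, Var> config = Map.of("b", new Var(0.6, false));
        System.out.println(a.calculate(config));
        System.out.println(a.isConstant(config));
    }
}
```

Keeping the binding lookup separate from the script mirrors the split between VariableBinding and Calculation: the script only sees resolved input values, while bindings decide where those values come from.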

Following the principle of our variable determination, we implemented only two subclasses of VariableBinding: one for accessing local variables (of the same module) and another for accessing variables of referenced modules, similar to AliasVariable. The following description defines these two subclasses:
LocalVariableBinding
Binds to a local variable, associated with a variable definition of the module owner.
String variableName
The name of the referenced variable.
Variable<?> getVariable(SystemConfiguration systemConfiguration, Module owner)
Returns the referenced variable variable, which is the variable associated with the variable definition of owner whose VariableDefinition.name equals variableName.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules, Module owner)
Returns all variable definitions of modules whose VariableDefinition.name equals variableName.
ReferencedServiceVariableBinding
Binds to a variable of a referenced module (defined by a referenced service). This class works similarly to AliasVariable.
String referencedServiceService
Identifies, together with referencedServiceRequester, the local referenced service of the module owner. In the following, this referenced service is called referencedService.
String referencedServiceRequester
Identifies, together with referencedServiceService, the local referenced service of the module owner (referencedService).
String variableName
The name of the variable of the module that is connected to referencedService.
Variable<?> getVariable(SystemConfiguration systemConfiguration, Module owner)
Returns the referenced variable of the module connected to referencedService of the module owner. If the variable cannot be found, this method returns null.
Collection<VariableDefinition<?>> getPossibleDependencies(Collection<Module> modules, Module owner)
Returns all variable definitions of those modules in modules that contain a provided service matching referencedService, restricted to variable definitions with VariableDefinition.name equal to variableName.
For the implementation of Calculation, we used the Java Scripting API together with Luaj [20] (implemented in the class LuaCalculation). Luaj is a lightweight interpreter for the scripting language Lua, written in Java. We selected a scripting language to flexibly support different ways of determining values. An expression evaluator written in Java would be faster, but since we do not yet know exactly how the properties of data management systems can be calculated, we leave an optimized implementation of Calculation to future work.
To calculate the value, the class LuaCalculation first updates and retrieves the values of the variables bound by variableBindings. Afterwards, the defined script (member variable script) is evaluated. Currently, the implementation only supports single-line expressions, since it evaluates the string "result="+script. An important detail of our implementation is that Luaj currently does not support multi-threading. Therefore, we had to synchronize access to the Luaj script engine, which allows only serial calculation of complex variables. This is a weakness of our current implementation, since the genetic algorithm runs in parallel threads, as described previously. Threads can thus be blocked while waiting for access to the script engine, which can increase the time consumption of the overall optimization algorithm.
To improve the performance of the class LuaCalculation and to mitigate the synchronization problem, we additionally introduced a cache. The cache stores the resulting value for a given set of variable bindings and their associated variable values. Thus, we can avoid blocking on the script engine and better utilize the worker threads that run the genetic algorithm. The cache stores at most 1000 results and is cleared when it reaches this size. We use this technique instead of a FIFO queue, since the genetic algorithm tries new values randomly, so there is no clear pattern of recurring calculations.
Mapping between Java objects and XML representation
Another important part of the implementation is the mapping between the internal classes (e.g. module and policy) and the XML representation defined by a customer or a developer.
For this mapping, we used JAXB (Java Architecture for XML Binding), which was part of the Java Standard Edition at the time. JAXB provides marshalling and unmarshalling between XML and Java objects using annotations. Thus, it is easy to define new implementations of requirements and variable definitions, since no parser has to be extended. The schema of our policy language is defined in the namespace http://dbis.cs.unibas.ch/ubstore/management/policy, and the full version can be found in appendix A.1. For mapping the data types (extending Type), we used the JAXB mapping based on XMLSchema-instance type attributes (xsi:type).
Listings 5.1 and 5.2 show valid examples of a module and a policy. An example of XMLSchema-instance used to define an integer type (IntegerType) can be found on line 8 of Listing 5.1. Furthermore, the module defines a provided service (line 3) and two variable definitions. The first definition, starting on line 5, shows a complex variable with a Lua calculation that takes the second variable definition, starting on line 16, as input and multiplies it by 100. The second variable definition defines a bijective variable with possible input values from 0.5 to 0.7.
 1 <?xml version="1.0" encoding="utf-8" standalone="yes"?>
 2 <module xmlns="http://dbis.cs.unibas.ch/ubstore/management/policy" version=" " name="DatabaseProviderA">
 3
 4   <variableDefinitions>
 5     <complexVariable visibility="public" name="a">
 6       <luaCalculation>
 7         <script>c*100</script>
 8         <type xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integerType"/>
 9         <variableBindings>
10           <variableBinding name="c">
11             <localVariableBinding variableName="b"/>
12           </variableBinding>
13         </variableBindings>
14       </luaCalculation>
15     </complexVariable>
16
17       <continuousRange>
18         <min>0.5</min>
19         <max>0.7</max>
20       </continuousRange>
21
22   </variableDefinitions>
23 </module>
Listing 5.1: Example of a valid module description.
Listing 5.2 shows a policy that requires the variable "a" to be between 0.7 and 0.8 and "b" to be exactly "X". The first requirement is defined as an interval requirement (starting on line 3). Its target value is "any", so all values from 0.7 to 0.8 are treated equally. The second requirement is defined as a constant requirement, starting on line 7.
 1 <?xml version="1.0" encoding="utf-8" standalone="yes"?>
 2
 3
 4     <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floatType">0.7</from>
 5     <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floatType">0.8</to>
 6
 7   <constantRequirement bindingness="mandatory" variableName="b">
 8     <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringType">X</value>
 9   </constantRequirement>
10
Listing 5.2: Example of a valid policy.
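To illustrate how the two requirements of Listing 5.2 would be rewarded under the getReward definitions of ConstantRequirement and IntervalRequirement, the following Java sketch evaluates them on plain values. It is a simplification resting on assumptions: Java Number values are compared as doubles to mimic the comparison between IntegerType and FloatType, a non-matching constant is assumed to yield a reward of 0, and the guard for an interval of zero width (from equal to to) is an addition not described in the text. The class and method names are hypothetical.

```java
// Hypothetical sketch of the getReward semantics of ConstantRequirement and
// IntervalRequirement on plain Java values (not the thesis Type classes).
class RequirementRewardSketch {

    enum TargetValue { ANY, AS_LARGE_AS_POSSIBLE, AS_SMALL_AS_POSSIBLE }

    // 1 if the value equals the required constant, otherwise 0 (assumed).
    // Numbers are compared as doubles, mimicking IntegerType/FloatType interoperability.
    static float constantReward(Object value, Object required) {
        if (value == null) {
            return 0f;
        }
        if (value instanceof Number n && required instanceof Number r) {
            return n.doubleValue() == r.doubleValue() ? 1f : 0f;
        }
        return value.equals(required) ? 1f : 0f;
    }

    // 0 outside [from, to]; otherwise 1 for ANY or a linear interpolation.
    static float intervalReward(Number value, double from, double to, TargetValue target) {
        if (value == null) {
            return 0f;
        }
        double v = value.doubleValue();
        if (v < from || v > to) {
            return 0f;
        }
        if (target == TargetValue.ANY || from == to) {   // zero-width guard is an addition
            return 1f;
        }
        double position = (v - from) / (to - from);     // 0 at from, 1 at to
        return (float) (target == TargetValue.AS_LARGE_AS_POSSIBLE ? position : 1 - position);
    }

    public static void main(String[] args) {
        // Listing 5.2: "a" must lie in [0.7, 0.8] (target "any") and "b" must equal "X".
        System.out.println(intervalReward(0.75, 0.7, 0.8, TargetValue.ANY));
        System.out.println(intervalReward(0.9, 0.7, 0.8, TargetValue.ANY));
        System.out.println(constantReward("X", "X"));
    }
}
```

Because every reward lies in [0, 1], rewards of different requirements stay comparable when they are combined into a fitness value.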

6 Evaluation
In this chapter, we present the evaluation of our implementation of the policy translation algorithm. The goal was to evaluate whether the policy translation algorithm is able to find suitable solutions, given a policy and a set of modules. As mentioned previously, the policy system together with the modular cloud storage system should offer customized data management systems without the need to compare multiple solutions or to work out how to compose a modular data management system. Thus, the time needed to get from requirements to a matching solution should be reduced. Therefore, we split the evaluation into a qualitative and a quantitative part.
The qualitative part analyses whether the policy translation algorithm is able to produce a valid solution that fulfills the policy as well as possible. An ideal evaluation would analyse real requirements of companies, formulate them as policies and compare the solutions enforced by the policy system at runtime against the requirements. To evaluate the quality, the enforced solutions could be compared to solutions that were composed manually by experts. Since we do not have access to such real-world resources, we defined artificial requirements based on our scenario from Chapter 2.
The quantitative part of the evaluation analyses the runtime performance of the policy system. Since the ideal policy system should work as an "on-demand" system for customized data management systems, we evaluated the time consumed by the policy translation algorithm. To be able to draw conclusions about real-world scenarios, we also based our quantitative evaluation on the artificial requirements derived from our scenario.
6.1 Evaluation Preparation
Based on the scenario, we defined artificial policies and modules. The policies are necessary, since we do not have access to real-world scenarios.
The set of modules is necessary, since at the moment we do not have access to a modular cloud storage system that allows a large variety of different solutions. If we want to define modules or policies, we have to agree on a terminology and characterisation of data management systems. As argued previously, there is currently no widely accepted terminology that spans the whole area. Therefore, we identified a terminology based on our scenario.
Properties of data management systems
For identifying properties/characteristics of data management systems, we mainly based our assumptions on two sources: well-known properties from the literature (ACID, CAP) and properties that discriminate between the systems in our scenario. For each property, we defined the value type and, if possible, a range. Our assumptions are based on the theoretical scenario and the modular cloud storage system described in Chapter 2. Thus, we only outlined possible property definitions. In an evaluation using a real-world modular cloud storage system, it would be necessary to identify and analyse the properties by evaluating the existing modules.
Properties from literature
The ACID properties are well-known properties of every database system. When it comes to distributed database systems (as provided by cloud storage), the CAP theorem plays an important role, too. All of these properties have a best and a worst case. Thus, they can be limited to the interval [0, 1], where 1 means strong enforcement. The following table describes the properties in detail:
Name                | Description                                       | Range
atomicity           | all-or-nothing execution                          | [0, 1]
availability        | all queries are handled                           | [0, 1]
consistency         | all nodes see the same data at the same time (*)  | [0, 1]
durability          | no data loss                                      | [0, 1]
isolation           | transactions do not influence each other          | [0, 1]
partition tolerance | the system operates despite network partitions    | [0, 1]
(*) For simplicity, the consistency of ACID is not addressed explicitly.
Furthermore, we added properties from the area of distributed databases and their usage in cloud storage. For example, "elasticity" and "scalability" play an important role in cloud systems, but it is difficult to define a measurement for such properties. For our evaluation, it is sufficient to assume measurability and to leave the exact definition of a measurement to evaluations of real-world scenarios.
We selected the following properties as a representative set:

Name                Description                                          Range
elasticity          the system is able to adapt to the system load       [0, 1]
redundancy          number of copies                                     {1, ..., n}
replication speed   best if data is replicated immediately (eager)       [0, 1]
scalability         in our case, the size to which the storage space     {1, ..., n} (in GB)
                    is able to scale

Finally, we added two basic properties: costs and latency. The "costs" are a basic property of every product and are required by the algorithm. We added (access) "latency" as a measure of performance and because its calculation involves many modules (all modules that are involved in accessing data), which makes it a complex update task for the genetic algorithm.

Name      Description                                                Range
costs     the total costs of the system                              {0, ..., n}
latency   the roughly estimated latency when accessing a data item   {0, ..., n} (in ms)

Scenario specific properties
Specific to our scenario, but also occurring in other scenarios, are properties that address business interests and "services" that are added to a data management system. For legal reasons, a weather service might require that the nodes are placed in a certain country and that the system is not multi-tenant. The "services" mark significant differences between the six data management systems of our scenario. For example, the system for forecast data needs an archiving service, since the daily growing data would otherwise slow down the system. The type of supported indexing can also be interpreted as a service that adds functionality to the data management system. For example, the archive needs OLAP support for generating statistics, while the queue towards the sensor network does not need indexing at all. We selected the following properties:

Name             Description                                           Range
archiving        whether archiving is enabled                          {true, false}
authentication   whether authentication mechanisms are required        {required, not required}
erasing          whether old data is deleted                           {true, false}
geo-location     the geographic location of the system nodes           {Switzerland, USA} (examples)
indexing         type of index support                                 {no, olap, b-tree} (examples)
location         location inside the cloud system (whether the         {near-boundary, inside}
                 system should be reachable over the web)
multi tenancy    whether the system nodes can be shared with other     {true, false}
                 customers
accesslogging    the supported type of logging for accesses (for       {no, debug, secure} (examples)
                 example, a bank might need logs of all accesses)
storagelogging   the supported type of logging for data manipulation   {no, debug, secure} (examples)

Extending property domains
The properties in the previous sections mainly address non-functional properties or properties that are specific to our scenario. In addition, we added the following basic properties from the domain of how the data is accessed:

Name                Description                                       Range
access data type    the data type of returned data                    {SQL, JSON, XML, CSV} (examples)
compression         the quality of compression                        ≥ 1 (1 means no compression)
freshness           whether the system is freshness aware             {true, false}
interface type      the protocol type of the data access interface    {http, proprietary} (examples)
storage data type   the type of stored data (for example, plain       {KeyValue, Table, XML, File} (examples)
                    text files or tables)
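The value types in the tables above fall into three kinds: normalized values in [0, 1], countable values such as {1, ..., n}, and enumerated values. The following Java sketch shows one possible way to check a value against such a property domain. The class names (PropertyDomain, NormalizedDomain, IntegerRangeDomain, EnumDomain) are hypothetical and are not part of our policy language or implementation; boolean properties such as archiving could, under this assumption, be modelled as an enumeration of "true" and "false".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the three kinds of value ranges used in the property tables.
interface PropertyDomain {
    boolean contains(Object value);
}

// Normalized properties such as atomicity or durability: a real number in [0, 1],
// where 1 means strong enforcement.
final class NormalizedDomain implements PropertyDomain {
    @Override
    public boolean contains(Object value) {
        if (!(value instanceof Double)) {
            return false;
        }
        double v = (Double) value;
        return v >= 0.0 && v <= 1.0;
    }
}

// Countable properties such as redundancy {1, ..., n} or costs {0, ..., n}.
final class IntegerRangeDomain implements PropertyDomain {
    private final long min;

    IntegerRangeDomain(long min) {
        this.min = min;
    }

    @Override
    public boolean contains(Object value) {
        if (!(value instanceof Integer || value instanceof Long)) {
            return false;
        }
        return ((Number) value).longValue() >= min;
    }
}

// Enumerated properties such as indexing {no, olap, b-tree}.
final class EnumDomain implements PropertyDomain {
    private final Set<String> allowed;

    EnumDomain(String... allowed) {
        this.allowed = new HashSet<>(Arrays.asList(allowed));
    }

    @Override
    public boolean contains(Object value) {
        return value instanceof String && allowed.contains(value);
    }
}
```

For example, new IntegerRangeDomain(1) accepts a redundancy of 3 but rejects 0, and new EnumDomain("no", "olap", "b-tree") rejects an index type that the table does not list.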

Modules for a modular cloud storage system
For the policy language, we defined three basic types of modules: "base", "standard" and "extension". We followed the same principle for designing modules for the evaluation. The motivation for base modules is the need to define an environment for the data management system. Only together with information about the hardware is it possible to make statements about the properties of a data management system. For example, the performance depends on the hardware and the availability on the number of nodes. Therefore, we decided to start with two base modules, Hardware and System. The Hardware module provides information about the physical properties of each node (e.g. memory, CPU and geo-location). The System module provides information about the system itself, such as the number of instances used. Both modules provide information, but they also have some important attributes that can be selected by the algorithm to match the policy. These two modules do not directly belong to the data management system, but the data management system depends on them: it runs on the hardware and is embedded in a larger system. For enforcing a data management system, our System module provides a "blueprint" by referencing basic services that should always exist in a data management system. These basic services should define a complete but abstract data management system. To be complete, a data management system can be defined as having a service that controls the hardware (in our case mainly the storage) and an access service that handles queries. Since we also want to be able to compose distributed data management systems, we added a third basic service that handles the connections and communication between multiple nodes.
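The "blueprint" idea can be illustrated with a simplified completeness check: every service that some module references must be provided by a module of the composition. The following Java sketch assumes a hypothetical ModuleDescription class that holds plain sets of provided and referenced services; it is not the module description format of our policy language. The module names SimpleStorage, SimpleAccess and SimpleDistribution used below are invented for illustration and are not modules of our evaluation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified module description: a module provides services and
// references the services it depends on.
final class ModuleDescription {
    final String name;
    final Set<String> provides;
    final Set<String> references;

    ModuleDescription(String name, String[] provides, String[] references) {
        this.name = name;
        this.provides = new HashSet<>(Arrays.asList(provides));
        this.references = new HashSet<>(Arrays.asList(references));
    }
}

final class Blueprint {
    private Blueprint() {
    }

    // Returns the referenced services that no module of the composition provides.
    // An empty result means the composition is complete in this simplified sense.
    static Set<String> missingServices(List<ModuleDescription> composition) {
        Set<String> provided = new HashSet<>();
        Set<String> missing = new HashSet<>();
        for (ModuleDescription module : composition) {
            provided.addAll(module.provides);
            missing.addAll(module.references);
        }
        missing.removeAll(provided);
        return missing;
    }
}
```

For example, a System module that references Access, Distribution and Storage, combined with Hardware, a storage module and an access module, still reports Distribution as missing until a module providing that service is added.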
Figure 6.1 shows the two base modules together with the "blueprint" of a data management system.

Services and modules
In a first step, we took the three basic services and analysed the previously listed properties of data management systems. For properties that belong to the domain of one of the three basic services, we defined new services if different values of the property would require significantly different implementations. Once again, we first added the base modules themselves as services: Hardware and System. For the Access service, we added services that define properties like the interface type and the authentication, and also the decision on where to place data during read and update operations. Therefore, we defined the following services in the domain of the Access service:

Access Interface (defines the property interface type)
Authentication (provides an authentication mechanism)
Account Management (extension for Authentication)
Concurrency Control Protocol (schedules transactions and mainly influences the ACID properties)

Figure 6.1: The two base modules (Hardware and System) together with the three basic services (dashed contour). The Access service should access data using the Distribution service, which uses the Storage service. The connections between the modules and services are only recommendations.

The Distribution service and its referenced services define the distribution of nodes and data together with the communication mechanisms. These functionalities can be varied in different combinations and thus should be split into different services. The domain of the Distribution service can delegate the definitions and implementations to the following services:

Distribution
Routing (defines the communication between nodes)
Replication Control Protocol (decides which nodes are involved in transactions and thus mainly influences the properties of the CAP theorem)
Partitioning (decides how the data is distributed)

The Storage service defines how the data is stored using the hardware. Most properties are defined on higher levels, and thus we defined only this basic service in this domain:

Storage

For the remaining properties, which define additional extensions to the data management system, we added further services:

Compression (adds a compression method)
Indexing (adds indexing support)
Freshness (adds freshness awareness)
Durability (defines how old data is deleted)
Archiving (defines how old data is archived)
Logging (defines how something is logged)

For each of the services, we defined different modules that allow a large variety of possible module compositions. Thus, we have a reasonable set of modules that approximates a real-world scenario. This enables us to evaluate the policy translation algorithm and to make first assumptions about real-world scenarios. The service definitions above result in a total of 18 services, for which we defined 39 modules. Most modules are different variations of the three basic services and their related services. For the extending services, we defined in most cases only one module that enables a certain extending functionality. The full list of all modules together with their provided services, referenced services and influenced properties can be found in Appendix A.

Important variable definitions
The full XML files of the 39 module descriptions would be too long to include in this thesis. Therefore, we describe two important variable definitions (properties) in the following list:

costs: As the module overview in Appendix A.3 shows, all modules except the System module define a variable "costs". As described for the algorithm, we use this variable to calculate the costs for the fitness calculation. Most modules have constant costs, while the Hardware module calculates its costs based on the number of instances defined by the System module and on its other properties (e.g. scalability and multi tenancy).
latency: The latency is calculated using complex variable calculations that include most modules.
Thus, when calculating the latency, a large number of complex variables have to be updated by executing scripts, which makes the latency a complex and time-intensive case for our evaluation.

Policies
Given the set of properties defined above and our scenario, we defined policies for the evaluation. For each policy, we selected the important properties and their characteristics that specify the required data management system. Due to time constraints at the end of this master thesis, we only evaluated three of the six data management systems of the weather service scenario. To evaluate different conditions, we selected the archive and the queue, since these systems differ in many requirements.

Additionally, we selected the raw data system as the third data management system, since its requirements are similar to those of a traditional RDBMS. The following subsections sketch and describe the policy definition for each of the three data management systems. The policy language representation of these three policies can be found in Appendix A.

Archive
The archive is a large-scale system that, most importantly, should have a high durability. Another important aspect is its usage for generating statistics, which results in the requirement for OLAP support. The following list sketches the important properties that we specified for this policy:

availability: the archive does not need a high availability, since it is not frequently accessed
compression: since the archive has to store a large amount of data, a good compression would be beneficial
durability: the archive must avoid data loss
indexing: OLAP support is required
scalability: the archive is a large-scale system

Queue
The queue is a small-scale system that must provide high availability and low latency for high throughput. The following list sketches the important properties that we specified for this policy:

authentication: since the queue is accessible over the web, it requires authentication
elasticity: the system should have a high elasticity, since it should always scale with the sensor network traffic
latency: the system requires a low latency for a high throughput
scalability: the queue is a small-scale system

Raw Data
The data management system for raw data is a mid-scale system that is similar to an RDBMS. Since the data can be accessed by customers, the system requires strong consistency. Raw data can be corrected manually, since there might be errors in sensor data. To retain the quality of the data, accesses to the system should be logged.
The following list sketches the important properties that we specified for this policy:

authentication: since the data can be accessed by customers, the raw data system requires authentication
consistency: strong consistency

access logging: for data quality, accesses to the system should be logged
scalability: the raw data system is a mid-scale system

6.2 Evaluation Environment
In the following sections, we describe the detailed qualitative and quantitative evaluation. For both evaluations, we used the same environment, based on the environment of our implementation described in chapter 5. The tests for the evaluation were implemented using JUnit [21] (version 4.8.1), and we used the JUnit modules of Equinox to run JUnit in the OSGi environment. The test cases directly use the classes FilteredServiceMatchingAlgorithm and GeneticAlgorithm. This allowed us to control the execution of the two algorithm phases without having to deal with the remaining policy system. For executing the JUnit tests, we used a system with 4 GB memory and an Intel Core i7 M620 at 2.67 GHz with 2 hyper-threaded cores (resulting in 4 virtual cores). Our genetic algorithm was executed using 8 parallel threads, i.e. twice the number of virtual cores. We used twice the number of cores because the implementation of LuaCalculation does not support multi-threading, and thus we had to synchronize the access to the script engine. Since the genetic algorithm is executed in parallel for multiple system configurations, threads frequently block when two system configurations need to evaluate a complex variable at the same time.

6.3 Quality of Solutions
For evaluating the quality of our implementation, we had to compare the result of the policy system with the "real" requirements of the customer. The "real" requirements are not the requirements defined in the policy or in other formulations, but the requirements that the customer wants to express with the policy. The ideal qualitative evaluation would compare the "real" requirements with the enforced data management system, as shown in figure 6.2. As discussed previously, we do not have access to those resources.
We could have developed artificial "real" requirements based on our scenario. But even then, we could not have used them to evaluate our system configurations, since we do not know how they relate to the enforced data management system. For this reason, we had to reduce our evaluation to the comparison between the resulting system configurations and the policy (see figure 6.2). Provided that future evaluations of the policy language establish a good mapping between the "real" requirements and the policy, and between the enforced data management system and the system configuration, the results of our evaluation would also apply to the runtime evaluation between the "real" requirements and the enforced data management system.

Figure 6.2: Relations between "real" requirements, enforced data management system and the abstract layer defined by the policy and the system configuration (containing the module descriptions). The dashed lines show possible evaluations.

Setup
Our target for the qualitative evaluation was to try different settings of the free parameters defined in section 4.4 for the optimization using the genetic algorithm. We executed the test as follows:

1. call FilteredServiceMatching.solve 10 times
2. call GeneticAlgorithm.solve 10 times for each setting (see next subsection)

We used the resulting possible system configurations of FilteredServiceMatching.solve as input for the genetic algorithm. This is valid because the first phase of our algorithm is deterministic and always results in the same possible system configurations.

Free parameters
During the test execution, we set the maximum number of iterations to 1000 and the threshold for stalling to 0.1. This means that the genetic algorithm terminated either when the fitness of the best individual did not improve by at least 0.1 within 50 iterations, or after 1000 iterations. The number of children was set to the population size minus 10, and the maximal number of results returned after optimization was 10, too. We kept the number of children fixed at the number of results to enforce a stable optimization of the 10 best system configurations. Another reason for this decision was that the number of children should be greater than 0, since we know only little about the value calculation for variables. Other effects could be examined by changing the population size and thus the percentage of discarded system configurations. For the other free parameters, we used the following values:

costsweight (0, 0.5): We evaluated the algorithm with and without taking the costs into account. With this, we wanted to investigate whether it is necessary to take costs into account and how this affects the fulfilment of the policy.

mutationprobability (0.2, 0.5, 0.8): We tried three different values of the mutation probability to sample different extremes of this parameter and thus to evaluate the robustness of the algorithm to different frequencies of random mutation. Since we set the number of children to 10, different mutation probabilities should not affect the quality of the results, but they could affect the number of iterations until termination. In the worst case, a high mutation probability could prevent the algorithm from finding a good solution.

populationsize (20, 40, 60): We used the different population sizes to test whether a larger set of initial individuals raises the probability of finding a good solution. This could be the case because a larger population size means a higher probability of starting near the optimum.

We tested the genetic algorithm with each combination of values as a setting. This resulted in a total of 18 settings and thus a total of 180 test runs for each policy.

Results
Reward and costs
For the test runs explained above, we collected data about fitness, reward and costs of the best results. Figures 6.3, 6.4 and 6.5 show the resulting reward for the different settings (defined in the previous section). It is important to note that we took the reward instead of the fitness (which can contain costs and reward), since only the reward shows how well the algorithm could fulfil the requirements defined by the policy. As we can see in all three diagrams, the best system configuration returned never reaches a reward of 1.0. Furthermore, we can see that the algorithm is always able to return an equally rewarded solution, independent of the settings. Note at this point the two different scales used for mean and standard deviation in the diagrams. Despite the almost equal rewards, we can observe differences in the standard deviation.
While it is not obvious which effect the mutation probability and the population size have, the costs weight clearly shows that the standard deviation is always larger when the costs are taken into account for the fitness calculation. As described earlier, one of our targets was to investigate the effects of taking the costs into account. While Figures 6.3, 6.4 and 6.5 show that taking the costs into account changes the optimized reward only slightly, Figures 6.6, 6.7 and 6.8 show the effect on the resulting costs. The three figures clearly show that, when the costs are not taken into account, the mean costs and the standard deviation are higher than when the costs are taken into account. Clearly, the costs are not kept under control by only optimizing the reward. However, we can also see that in the case of the "archive" policy, the requirements indirectly enforce low costs. Similar to the reward evaluation, it is not clear which effect the mutation probability and the population size have on the cost optimization. But again, we can say that these two parameters do not directly disturb the optimization process.
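The costs weight combines reward and costs into a single fitness value. The following Java sketch assumes a linear weighting, fitness = (1 - w) · reward + w · (1 - normalized costs), which yields the pure reward for w = 0 and takes half the reward and half the costs into account for w = 0.5. The normalization by an upper cost bound (maxCosts) is a hypothetical choice for illustration and is not taken from our implementation; the total costs are modelled as the sum of the "costs" variables of the modules.

```java
// Hypothetical sketch of a fitness value that mixes policy reward and costs.
final class Fitness {
    private Fitness() {
    }

    // Total costs of a system configuration as the sum of the modules' "costs" variables.
    static double totalCosts(double... moduleCosts) {
        double sum = 0.0;
        for (double costs : moduleCosts) {
            sum += costs;
        }
        return sum;
    }

    // reward: degree to which the policy is fulfilled, in [0, 1]
    // maxCosts: assumed upper bound used to normalize the costs (hypothetical)
    // costsWeight: 0 optimizes only the reward, 0.5 weights reward and costs equally
    static double of(double reward, double costs, double maxCosts, double costsWeight) {
        if (costsWeight < 0.0 || costsWeight > 1.0 || maxCosts <= 0.0) {
            throw new IllegalArgumentException("invalid costsWeight or maxCosts");
        }
        double normalizedCosts = Math.min(1.0, Math.max(0.0, costs / maxCosts));
        // Cheaper configurations score higher on the costs term.
        return (1.0 - costsWeight) * reward + costsWeight * (1.0 - normalizedCosts);
    }
}
```

With w = 0, two configurations with equal reward are indistinguishable regardless of their costs; with w = 0.5, the cheaper one receives the higher fitness. This is consistent with the lower mean costs observed when the costs are taken into account.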

Figure 6.3: The reward of the best result (according to the overall fitness) returned by the policy translation algorithm (first and second phase) given the "archive" policy. The test was run for multiple settings (each 10 times). The diagram shows the results for each combination of mutation probability and population size. The left bars indicate the mean and the standard deviation for running the genetic algorithm without taking the costs into account (for the fitness calculation). The right bars indicate the corresponding values for running the algorithm with half the reward and half the costs taken into account.

Figure 6.4: The reward of the best result returned by the policy translation algorithm (first and second phase) given the "queue" policy. The test was run as described previously (see figure 6.3).

Figure 6.5: The reward of the best result returned by the policy translation algorithm (first and second phase) given the "raw data" policy. The test was run as described previously (see figure 6.3).

Figure 6.6: The costs of the best result returned by the policy translation algorithm (first and second phase) given the "archive" policy. The test was run as described previously (see figure 6.3). The costs result from the module descriptions and are not related to an existing currency.

Figure 6.7: The costs of the best result returned by the policy translation algorithm (first and second phase) given the "queue" policy. The test was run as described previously (see figure 6.3). The costs result from the module descriptions and are not related to an existing currency.

Figure 6.8: The costs of the best result returned by the policy translation algorithm (first and second phase) given the "raw data" policy. The test was run as described previously (see figure 6.3). The costs result from the module descriptions and are not related to an existing currency.

Analysis of system configurations
We do not know the ideal module composition, since an "artificial" optimum would only make sense if an expert composed modules independently of our module descriptions. Therefore, we did not analyse the system configurations returned by the algorithm in depth, but searched them for conspicuous modules or variables. One conspicuous example is the combination of the MasterSlave module and the instance variable of the System module. As we found out, the instance variable was often set to 1, which does not fit the semantics of a master-slave system. This indicates an optimization problem rather than a problem with the capabilities of the policy language for module descriptions. In a real-world scenario, it should be clear that a master-slave system normally requires at least 2 instances. Since our module descriptions do not offer the possibility to express such system-internal constraints, the optimization algorithm reduced the number of instances in order to reduce the costs.

For the "archive" policy, we further evaluated the modules used in the returned system configurations. As expected, the system configurations always included the System and the Hardware module, since they are base modules and thus mandatory. Further modules that were always included are ComplexAccess, FileStorage, MultipleNodes, OlapIndexing and TokenBasedAuthentications. This shows that those modules are directly bound to requirements of the policy. Furthermore, the system configurations included other modules that occurred infrequently, under different conditions. For example, AdvancedCompression and DelayedCompression occurred more frequently when the costs were taken into account. The module descriptions show that compression modules are cheaper than storage in the Hardware module. Other modules, like StaticRouting and DistributedHashTable, each occurred in nearly half of the system configurations. For the "archive" policy, both modules provide similar features and both provide the service Routing, resulting in a random selection between the two.

6.4 Performance of the Policy System
As described previously, we executed the first and the second phase of the algorithm multiple times for the qualitative evaluation. For the quantitative evaluation, we used the data gathered from the qualitative evaluation, since these test runs provided enough data.

Results
First phase
Tables 6.1, 6.2 and 6.3 show the time consumption and the resulting system configurations of the different steps of the first phase of the algorithm (see chapter 4). Since the first phase is deterministic, we ran it 10 times without varying the settings, which only influence the second phase. As expected, the three tables show that the first service matching step always results in the same number of possible system configurations, since this step does not depend on the policy. The first filtering is able to reduce the number of system configurations significantly for all three policies. The extension step raises the number again, which is expected since, as explained previously, this step tries to extend the system configurations with all possible combinations of extension modules, resulting in exponential growth (worst case). Since our extension modules do not require services other than the basic services, the second service matching does not raise the number of possible system configurations. Finally, the second filtering again removes some solutions that became decidable through the extension step. Regarding the time consumption, the first service matching and filtering consume the most time. The other steps can build on the results of the previous steps and consume significantly less time. But while the time consumption of the first step is almost equal for the three policies, the time consumption of the second step differs.
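The exponential growth of the extension step can be made concrete with a small sketch: if every subset of n optional extension modules is added to a configuration, one configuration becomes 2^n configurations. The following Java code is a hypothetical illustration of this worst case only; it is not the extension step of our implementation, and the class name ExtensionStep is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtensionStep {
    private ExtensionStep() {
    }

    // Extends one possible system configuration with every subset of the optional
    // extension modules: n extension modules yield 2^n configurations (worst case).
    static List<List<String>> extend(List<String> configuration, List<String> extensions) {
        int n = extensions.size();
        if (n > 30) {
            throw new IllegalArgumentException("too many extension modules for a bit mask");
        }
        List<List<String>> result = new ArrayList<>();
        // Each bit of the mask decides whether the corresponding extension is included.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> extended = new ArrayList<>(configuration);
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    extended.add(extensions.get(i));
                }
            }
            result.add(extended);
        }
        return result;
    }
}
```

With three extensions (for example Compression, Indexing and Logging), a single configuration already turns into 8 candidates, which is why a subsequent filtering step is needed.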
Looking at the results, we can see that the "raw data" policy consumes more time for the second step than the other two. This is expected, since the "raw data" policy also defines more requirements. In summary, the first service matching consumes the most time, but overall the first phase is fast and finishes in less than 5 seconds for all three policies.

Second phase
The second phase of the algorithm consists of repeated updates of the system configurations, including complex variables that run Lua scripts. Figures 6.9, 6.10 and 6.11 show the time consumption of the second phase for the different policies and settings. For gathering the data, we executed the second phase 10 times for each policy and setting, resulting in 180 runs for each policy. Similar to the diagrams for the reward and the costs, these diagrams show the mean and the standard deviation with and without taking the costs into account for each combination of mutation probability and population size.

Table 6.1: The time consumption of the different steps of the first phase of the policy translation algorithm, given the "archive" policy (columns: 1. matching, 1. filtering, extending, 2. matching, 2. filtering). The test was run 10 times; the first row shows the mean of the resulting time measurements in ms, the second row the standard deviation in ms and the third row the resulting number of system configurations after the executed step.

Table 6.2: The time consumption of the different steps of the first phase of the policy translation algorithm, given the "queue" policy. See the description of table 6.1 for further explanations.

Table 6.3: The time consumption of the different steps of the first phase of the policy translation algorithm, given the "raw data" policy. See the description of table 6.1 for further explanations.

We can clearly see that the population size has a significant impact on the time consumption, while the mutation probability does not. Furthermore, the calculation of costs has a visible impact on the time consumption too, since the costs are not calculated when the costs weight is set to 0. Comparing the different policies, the "raw data" policy again takes longer than the other two. This clearly shows that the time consumption depends on the policy, since the "archive" and the "raw data" policy have the same number of possible system configurations as input.

Figure 6.9: The total time consumed for executing the second phase of the algorithm (optimization) given the "archive" policy, in this case for 960 possible system configurations (previously returned by the first phase). The test was run for multiple settings (each 10 times). The diagram shows the results for each combination of mutation probability and population size. The left bars indicate the mean and the standard deviation for running the genetic algorithm without taking the costs into account (for the fitness calculation). The right bars indicate the corresponding values for running the algorithm with half the reward and half the costs taken into account.

Figure 6.10: The total time consumed for executing the second phase of the algorithm (optimization) given the "queue" policy, in this case for 576 possible system configurations. The test was run as described previously (see figure 6.9).

Figure 6.11: The total time consumed for executing the second phase of the algorithm (optimization) given the "raw data" policy, in this case for 960 possible system configurations. The test was run as described previously (see figure 6.9).

For making assumptions about the scalability of the algorithm, we also analysed the time consumption of the genetic algorithm itself, while the previous figures relate to the overall time consumption of the optimization phase. The gathered data was again taken from the qualitative evaluation, but we selected only specific runs of the genetic algorithm for the "raw data" policy. To make the values comparable, we furthermore selected only runs of the genetic algorithm that terminated after 50 iterations. Figure 6.12 shows the time consumption for multiple runs of the genetic algorithm, sorted by the number of variables that had to be updated for each system configuration for the fitness calculation. The figure shows that overall there is a trend towards higher time consumption for larger numbers of variables to be updated. On the other hand, the diagram shows that for the same number of variables, the time consumption can vary significantly, and that it varies more for larger numbers of variables. This large variation indicates that the number of variables is not the only factor influencing the time consumption.
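To quantify a trend such as the one shown in figure 6.12, a power function t = a · v^b (time t, number of updated variables v) can be fitted by linear least squares on ln t = ln a + b · ln v. The following Java sketch assumes that this kind of function is what the figure calls a "geometric" trend; this interpretation and the class name PowerTrend are assumptions for illustration, not part of our evaluation code.

```java
// Hypothetical power-law trend t = a * v^b, fitted by least squares in log-log space.
final class PowerTrend {
    final double a;
    final double b;

    private PowerTrend(double a, double b) {
        this.a = a;
        this.b = b;
    }

    static PowerTrend fit(double[] variables, double[] times) {
        int n = variables.length;
        if (n < 2 || times.length != n) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            if (variables[i] <= 0.0 || times[i] <= 0.0) {
                throw new IllegalArgumentException("values must be positive");
            }
            double x = Math.log(variables[i]);
            double y = Math.log(times[i]);
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        double denominator = n * sxx - sx * sx;
        if (Math.abs(denominator) < 1e-12) {
            throw new IllegalArgumentException("variables must not all be equal");
        }
        // Slope and intercept of the regression line ln t = ln a + b * ln v.
        double b = (n * sxy - sx * sy) / denominator;
        double lnA = (sy - b * sx) / n;
        return new PowerTrend(Math.exp(lnA), b);
    }

    double predict(double variables) {
        return a * Math.pow(variables, b);
    }
}
```

An exponent b well above 1 would indicate super-linear growth of the time consumption with the number of updated variables; given the large variation discussed above, such a fit describes only the overall trend, not individual runs.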

Figure 6.12: For this test, we took all 10 × 960 runs of the genetic algorithm with the settings described in the diagram title. The diagram shows all runs that terminated after 50 iterations. This means that they were terminated because of stalling and did not succeed in improving the fitness right from the beginning, since the stalling limit was set to 50 iterations. The algorithm still had to calculate the fitness 50 times. The diagram sorts the runs by the number of updated variables and shows the consumed time after the 50 iterations. Additionally, the diagram shows the trend of the time consumption using a geometric function.
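The termination rule from the free parameters (stop after at most 1000 iterations, or when the best fitness has not improved by at least 0.1 within 50 iterations) explains why a run that never improves still computes the fitness exactly 50 times. The following Java sketch is a hypothetical reconstruction of this rule; the class name Termination and the choice to measure improvements against the last significantly improved best fitness are assumptions.

```java
// Hypothetical termination rule: stop after maxIterations, or when the best fitness
// has not improved by at least stallThreshold within stallWindow iterations.
final class Termination {
    private final int maxIterations;
    private final int stallWindow;
    private final double stallThreshold;
    private double referenceFitness;
    private int referenceIteration = 0;
    private int iteration = 0;

    Termination(int maxIterations, int stallWindow, double stallThreshold, double initialBestFitness) {
        this.maxIterations = maxIterations;
        this.stallWindow = stallWindow;
        this.stallThreshold = stallThreshold;
        this.referenceFitness = initialBestFitness;
    }

    // Records the best fitness after one more iteration; returns true if the run should stop.
    boolean update(double bestFitness) {
        iteration++;
        if (bestFitness >= referenceFitness + stallThreshold) {
            // Significant improvement: restart the stall window from this iteration.
            referenceFitness = bestFitness;
            referenceIteration = iteration;
        }
        return iteration >= maxIterations || iteration - referenceIteration >= stallWindow;
    }

    int iterations() {
        return iteration;
    }
}
```

Under these assumptions, a run whose best fitness never improves stops after exactly 50 iterations, which corresponds to the runs selected for figure 6.12, while a run that keeps improving by more than the threshold only stops at the iteration limit.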

7 Discussion and Conclusion
At the moment, the area of modular cloud storage systems (or modular cloud systems in general) is still in its infancy. As mentioned in chapter 3, there are efforts to support modular systems in this area, but the compilations are missing. While systems like Cloudy are beginning to exploit the possibilities of modular cloud storage systems, it is still not clear how such systems can be used efficiently. In this thesis, we presented the following components for the process of getting from requirements to a data management system (see figure 7.1):

1. a concept for describing and defining modules
2. a policy language for describing policies and module descriptions
3. a system that translates a policy into an initial system configuration
4. a system that enforces a system configuration

We derived modules and policies from a theoretical scenario and used them to evaluate the implementation of our policy translation. We showed that the system is able to compose suitable solutions, but also that further evaluations using real-world scenarios will be necessary.

Figure 7.1: Process of getting from requirements to a data management system (solution) using a policy. The policy system is part of the modular cloud storage system.

7.1 Quality and Performance

As the evaluation showed, the system is able to consistently produce good solutions with respect to the defined policy and the possibilities of the modules. At the moment, we can only estimate the quality of the enforced data management system with respect to the "real" requirements that maximize the benefit of the customer. But given that the policy is a good representation of the "real" requirements and that the module descriptions provide a good estimate of the enforced data management system, we can assume that our system would consistently produce data management systems with high rewards.

Quality

When we speak of good solutions, we cannot fully verify that the algorithm always finds a nearly optimal solution. However, given the small variance of the results and the random nature of the genetic algorithm, there seems to be little room for better solutions, since such solutions would most likely have been found by chance. We evaluated different values for the free parameters of the algorithm. The following list discusses the effect of each parameter:

costs weight: We evaluated two different values (0 and 0.5) for the weight assigned to the costs during the fitness calculation. The goal was to show whether it is appropriate to take the costs into account. The evaluation shows that optimizing the costs is necessary, since otherwise the costs are not always under control, although for specific policies they may happen to stay low. We also observe for our case that minimizing the costs does not reduce the quality of the solution. The reason for this seems to be that the algorithm tries enough possibilities to find a solution that maximizes the reward while also minimizing the costs.

mutation probability: It is not clear which mutation probability gives the best results. However, a high mutation probability does not directly lead to a higher variance of the best results returned. This can be explained by the fact that our variant of the genetic algorithm always keeps some of the best individuals (elitism) and thus avoids discarding good solutions.

population size: The results of varying the population size show that a large population does not automatically lead to better results. We conclude that, given our scenario and problem size, a relatively small population seems to be sufficient to find a good solution. Moreover, a larger population means a significantly higher time consumption, which is not justified given the uncertain improvement of the results.

The above observations show that the algorithm seems to be robust and that it is appropriate to take the costs into account in the fitness calculation. However, we have to keep in mind that for other scenarios the mutation probability and the population size could have a significantly larger influence on the results. This is likely, since with a growing search space the probability of finding good parameters is reduced. Thus, the two free parameters could be helpful to cover the search space.
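The following Java sketch shows one generation of an elitist genetic algorithm with a cost-weighted fitness, to make the role of these parameters concrete. It illustrates the idea under stated assumptions and is not the thesis implementation. The fitness formula `(1 - costWeight) * reward - costWeight * cost` (reward and cost normalized to [0, 1]) is assumed, crossover is omitted for brevity, and all names are hypothetical. Because the `eliteCount` best individuals are copied unchanged, a high `mutationProbability` only affects the remaining slots, so the best fitness of a generation never decreases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: one generation of an elitist genetic algorithm (crossover omitted). */
public class ElitistStep {

    /**
     * Assumed weighted fitness with reward and cost both normalized to [0, 1].
     * costWeight = 0 ignores the costs; costWeight = 0.5 weighs reward and costs equally.
     */
    public static double fitness(double reward, double cost, double costWeight) {
        return (1.0 - costWeight) * reward - costWeight * cost;
    }

    /**
     * Builds the next generation: the eliteCount best individuals are copied unchanged,
     * the remaining slots are filled with mutated copies of parents from the better half.
     */
    public static List<int[]> nextGeneration(List<int[]> population,
                                             ToDoubleFunction<int[]> fitnessOf,
                                             int eliteCount,
                                             double mutationProbability,
                                             int valuesPerGene,
                                             Random random) {
        List<int[]> ranked = new ArrayList<>(population);
        // Sort descending by fitness, so index 0 holds the best individual.
        ranked.sort(Comparator.comparingDouble(fitnessOf).reversed());

        List<int[]> next = new ArrayList<>();
        for (int i = 0; i < eliteCount && i < ranked.size(); i++) {
            next.add(ranked.get(i).clone()); // elitism: best individuals survive unchanged
        }
        int parentPool = Math.max(1, ranked.size() / 2);
        while (next.size() < population.size()) {
            int[] child = ranked.get(random.nextInt(parentPool)).clone();
            for (int g = 0; g < child.length; g++) {
                if (random.nextDouble() < mutationProbability) {
                    child[g] = random.nextInt(valuesPerGene); // mutate this gene
                }
            }
            next.add(child);
        }
        return next;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int genes = 6;
        int valuesPerGene = 4;
        // Toy evaluation (assumption): the reward counts genes with a non-zero value,
        // the cost grows with the chosen values; both are normalized to [0, 1].
        ToDoubleFunction<int[]> fitnessOf = ind -> {
            double nonZero = 0;
            double sum = 0;
            for (int v : ind) {
                if (v > 0) nonZero++;
                sum += v;
            }
            return fitness(nonZero / ind.length, sum / (ind.length * (valuesPerGene - 1.0)), 0.5);
        };

        List<int[]> population = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            int[] individual = new int[genes];
            for (int g = 0; g < genes; g++) individual[g] = random.nextInt(valuesPerGene);
            population.add(individual);
        }
        for (int generation = 0; generation < 30; generation++) {
            population = nextGeneration(population, fitnessOf, 2, 0.1, valuesPerGene, random);
        }
        population.sort(Comparator.comparingDouble(fitnessOf).reversed());
        int[] best = population.get(0);
        System.out.println(Arrays.toString(best) + " fitness " + fitnessOf.applyAsDouble(best));
    }
}
```

With `costWeight = 0` the assumed fitness reduces to the reward alone, which corresponds to the setting in which the costs are ignored.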

Performance

We have seen that the algorithm is able to produce appropriate solutions in a time that would allow "on demand" creation of customized data management systems (in our case, in under approximately 5 minutes). This means that the policy system is able to reduce the time and the complexity of the process of getting from a policy to a solution. Requirement specification is also necessary for the traditional evaluation of different offered solutions, and requirements engineering suggests that formulating requirements is feasible. Therefore, we also expect the whole process of getting from an idea to a solution to be faster than the traditional approach of comparing and selecting solutions by hand.

Looking at the details of the quantitative evaluation, we can see that the second phase of the algorithm takes much longer than the first phase. The main reason for the high time consumption of the second phase is clearly the variable calculation using Lua scripts. It is also important that the time needed to optimize each system configuration varies considerably, while the time needed for matching services and filtering stays nearly constant for the same set of modules. This means that it is important to reduce the number of possible system configurations passed to the optimization, in order to limit the impact of the unpredictability of the genetic algorithm. This is also emphasized by the result that the time consumption does not depend only on the number of updated variables, but apparently also largely on which variables are updated. The results show that, in our case, filtering drastically reduces the number of possible system configurations. The results also show that it is important to keep the number of extension modules low, since extending the configurations with those modules lets the number of possible system configurations grow exponentially (in the worst case).
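A simplified counting model illustrates how the number of possible system configurations grows. This Java sketch is an assumption for illustration and not the matching algorithm of the thesis. It treats every required service as an independent choice among its candidate modules, where a "monopoly" service has exactly one candidate. It treats every extension module as an optional addition. Together, this gives an upper bound on the configurations before filtering. All names are hypothetical.

```java
import java.util.List;

/** Hypothetical upper bound on the number of system configurations (simplified model). */
public class ConfigurationCount {

    /**
     * @param candidatesPerService number of matching provider modules per required service
     *                             (1 for a "monopoly" service)
     * @param extensionModules     number of optional extension modules
     * @return the product of the candidate counts times 2^extensionModules
     * @throws ArithmeticException if the bound overflows a long
     */
    public static long worstCaseConfigurations(List<Integer> candidatesPerService, int extensionModules) {
        long count = 1;
        for (int candidates : candidatesPerService) {
            count = Math.multiplyExact(count, candidates); // choose one provider per service
        }
        for (int i = 0; i < extensionModules; i++) {
            count = Math.multiplyExact(count, 2L); // each extension is either included or not
        }
        return count;
    }

    public static void main(String[] args) {
        // Ten services, all monopolies: matching alone yields a single configuration.
        System.out.println(worstCaseConfigurations(List.of(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 0)); // 1
        // Three of the ten services have 2, 3 and 4 candidates: 2 * 3 * 4 = 24.
        System.out.println(worstCaseConfigurations(List.of(2, 3, 4, 1, 1, 1, 1, 1, 1, 1), 0)); // 24
        // Five optional extension modules multiply this by 2^5 = 32.
        System.out.println(worstCaseConfigurations(List.of(2, 3, 4, 1, 1, 1, 1, 1, 1, 1), 5)); // 768
    }
}
```

In this model every additional extension module doubles the bound, while additional monopoly services leave it unchanged. This is why filtering, which reduces the configurations that remain for the optimization, matters most once extension modules are involved.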
Thus, the main effort should be put into optimizing the analysis of fulfilling solutions (filtering), since otherwise the extension modules could lead to a high memory consumption and the optimization to a high time consumption. It is also visible that the first service matching results in a large number of possible system configurations. However, we believe that the 39 modules used in our scenario are a good estimate for real world scenarios, whereas this is uncertain for the variable definitions. Furthermore, the number of possible system configurations does not grow exponentially as long as the number of "monopoly" services is high, which seems realistic. Thus, the critical part of the first phase is mainly the extension of the system configurations using extension modules.

Summary

In summary, our policy translation is robust and seems to find good solutions. The robustness is a good characteristic for other scenarios, and we expect that for real world scenarios the system could find good solutions, given that the policy and module descriptions are good estimators. However, our evaluation also shows that the performance is sensitive to the module descriptions, and it is not clear whether the system will perform well for real world scenarios. The main factors for this uncertainty are the exponential growth while extending the system configurations and the unpredictability of the genetic algorithm, together with the usage of scripted calculations.

7.2 Policy Language

For this thesis, we were only able to evaluate the policy language on a subjective level. In our opinion, describing modules is much more complex than describing requirements on a high level. We expect that real world scenarios would have much more sophisticated requirements. But since much effort has been put into requirements engineering over the last decades, it is likely that the policy language can be adapted to real world scenarios. We see the difficulty in describing modules, since it is more difficult to express, for example, non-functional properties in advance than to verify them during runtime. Therefore, we believe that it is important to always regard the module descriptions, and thus the system configuration, as an estimate of the implementation and thus of the enforced data management system. This suggests that runtime verification of the policy is necessary to really provide a system configuration that matches the policy.

Our current definition of the policy language used for the evaluation showed problems in describing all characteristics of modules. For example, we observed that the current implementation does not allow a module to define constraints on referenced services. This is mainly visible in the case of the previously mentioned MasterSlave module, which requires an exact number of instances. Another problem is that, with the current implementation, it is difficult to express support for multiple definitions of the same kind. For example, it is difficult to express that the system should support both XML and JSON for data output. These weaknesses are problems of the current implementation; the flexible concept of designing requirements and variable definitions does not prevent supporting such expressions.
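A small Java sketch can illustrate the two missing expressions. It is hypothetical: neither method exists in the current policy language or its implementation, the service name `storageNode` is invented, and a real extension would also have to be reflected in the XSD (see appendix A.1). A set-valued requirement is fulfilled only if a configuration supports every requested value. An exact-instance constraint checks how many instances are bound to a referenced service.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of two expressions the current policy language cannot state. */
public class RequirementExtensions {

    /** Set-valued requirement: met only if every required value is supported (e.g. XML and JSON). */
    public static boolean allValuesSupported(Set<String> required, Set<String> supported) {
        return supported.containsAll(required);
    }

    /**
     * Constraint a module places on one of its referenced services: an exact number of
     * instances (e.g. a MasterSlave module that needs exactly two nodes).
     */
    public static boolean exactInstances(Map<String, Integer> instancesPerService,
                                         String referencedService, int expected) {
        return instancesPerService.getOrDefault(referencedService, 0) == expected;
    }

    public static void main(String[] args) {
        Set<String> outputFormats = Set.of("XML", "JSON", "CSV"); // formats a configuration offers
        System.out.println(allValuesSupported(Set.of("XML", "JSON"), outputFormats)); // true
        System.out.println(allValuesSupported(Set.of("XML", "YAML"), outputFormats)); // false

        Map<String, Integer> instances = Map.of("storageNode", 2);
        System.out.println(exactInstances(instances, "storageNode", 2)); // true
        System.out.println(exactInstances(instances, "storageNode", 3)); // false
    }
}
```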
Overall, we think that the policy language enables the customer to select a data management system on a much higher and more abstract level, compared to evaluating different solutions of different providers or evaluating and composing modules of a modular cloud storage system. Thus, it is likely that the policy language enables more customers without deep expertise in the mechanisms of data management systems to obtain a matching data management system.

7.3 Properties of Cloud Storage Systems

As discussed previously, there is no standardized terminology for data management systems. Furthermore, it is not standardized how most of the possible properties can be measured or calculated. The consequences for our system are:

1. we cannot define a complete set of properties that characterise all possible data management systems
2. we cannot restrict the solution space for calculating properties

To address the first point, we developed an extensible policy language. To overcome the second point, we developed a policy translation algorithm that uses a genetic algorithm, which is able to optimize configuration problems without knowledge about the solution space. The extensible design of the policy language could help to easily support new developments that require new properties or measurements. Furthermore, the flexible interpretation of modules, services and variables allows the system to adapt easily to other scenarios (with different interpretations of properties), which can still be handled by our algorithm. Thus, our system offers a robust framework for further research and work in this area, but this also comes with disadvantages. An ideal target for policy systems would be the possibility to send the policy to different providers and then compare the offered data management systems. The extensible design of our system makes it difficult to compare different offers, since we cannot build on a standardized set of properties. Another disadvantage of our algorithm is its complexity, since, depending on the module descriptions and the policy design, the time consumption for composing system configurations can grow exponentially. Thus, the lack of clarity about the properties of data management systems leads our system to support flexible adaptation, but it also prevents gains in efficiency and reduces the comparability between solutions of different modular cloud storage systems.

7.4 Translation between Policy and Properties

We defined our module descriptions and policies based on the analysis of our scenario. But we also have to note that this is an artificial case, where the module developer and the customer are the same person. In real world scenarios it is very likely that the developer and the customer have different views on the properties of data management systems. These views can differ considerably because of the non-uniform terminology and the uncertainty about how to measure the properties. This creates great uncertainty for the application in real world scenarios.
One interpretation emphasizes the problem with the properties of data management systems and calls for standardization. Another interpretation would be the need for flexible matching between different names and measures. Given that the area of databases, and thus of data management systems, is well established and investigated, it is likely either that it is impossible to create a uniform terminology or that there has been no need for one yet. We believe that the first possibility is quite likely, and also that a standardized terminology could quickly become obsolete, given the huge variety of data management systems.

7.5 Lessons Learned

OSGi

One problem during the implementation of this thesis work was that OSGi is not intended for fully manual composition and configuration of modules (services). This makes it difficult to use existing mechanisms for our policy system. On the other hand, we observed that OSGi defines basic models for connecting services, but there also exist multiple extensions that change the service composition. For example, the most basic way is to use the API methods for registering and getting services and the service trackers, while Declarative Services are a widely used XML based approach. Implementations of the OSGi standard like Equinox, as well as the Spring Framework, also use their own definitions of services, as we did for our UBstore modules, which are compatible with the policy system.

Real world scenarios

It is difficult to develop and evaluate a system that is built mostly on theoretical assumptions. This mainly concerns the missing implementation of a modular cloud storage system, but also the uncertainty of properties and the lack of real world scenarios. As we explained, we cannot be sure how the system would perform for the requirements of companies, since we can only very roughly estimate their requirements and the properties commonly addressed. This is one of the most important problems that should be covered by future work.

Time

Mainly because of the theoretical basis of this thesis work, it was difficult to set time limits for investigating the different topics. When designing the system and preparing the evaluation, the uncertainty about properties left an open area for further work. For example, we tried to conduct a survey about which properties are commonly used to describe data management systems, but quickly noticed that questions concerning this area can be very unclear. This once again emphasizes the uncertainty in this area and suggests that much more effort is needed to gather information about properties of data management systems. The system design analysis, i.e. whether the model for modules and variables is sufficient for different scenarios, also needed much time. Once again, the theoretical assumptions inhibited a straightforward design of the algorithm, whereas verifying the model against a large set of existing modules would have been easier and less time consuming.
The examples above, in combination with the time limit of the master thesis, showed that it is important to keep a good structure during the work and to recognize where things are getting too deep into details, which is not always easy in an open research area like modular cloud storage systems.

8 Future Work

As the discussion in chapter 7 suggests, this thesis leaves a large amount of future work. The following subsections present the main topics that could be investigated.

8.1 Properties of Cloud Storage Systems

The lack of an agreed terminology and of associated measurements is one of the big topics that need further research. The goal should be to evaluate whether the space of possible properties could be limited or even standardized. The main benefit of this investigation would be new findings on how to optimize the algorithm. But the findings would also make it possible to improve the policy language and to develop a fixed set of requirements and variable definitions. Another aspect of this topic is to further assess the feasibility of the whole system. In the worst case, the findings would suggest that it is impossible to approximate the properties of a data management system, which would make the policy system impossible or unrealistically complex.

8.2 Policy Language and Module Description

Further investigations on how to define the policy language are closely related to the evaluation of properties of data management systems. In addition, there should be evaluations that examine how customers want to formulate their requirements and how developers can easily describe their modules. The second aspect would also need mechanisms that ensure a good quality of the descriptions. As discussed previously, the current implementation of the policy language could also need further expressions and semantics. The following list gives examples:

policies for referenced services: One extension could be the support of defining policies for each referenced service. This would allow formulating constraints that address the referenced module. For example, a MasterSlave module could express that it needs exactly two instances/nodes.

multiple values: Another extension could be the support of requirements that require more than one value. For example, a requirement could express the need for both an HTTP interface and another proprietary interface for accessing the data management system.

Together with the policy language for module descriptions comes the question whether the description by developers is the best solution. Further work could investigate whether it is possible to generate module descriptions by evaluating modules automatically.

8.3 Mapping Requirements to Variables

During this thesis, we identified the problem that developer and customer do not use the same terminology. To overcome this problem, given that there is no clear and standardized terminology for data management systems, one could map the requirements to variables using an ontology. An example of this approach is the work by Lamparter et al. [22], where an ontology is used for policy based management of web services. Future work for our policy system could be the creation of an ontology using OWL together with the mapping of property names. Furthermore, one could define an ontology that also maps different measures for properties.

8.4 Policy System

As mentioned in this work, we left the negotiation process to further work. The final target should be a policy system that is reachable over the web and allows a full negotiation process for selecting a solution and finally deploying a data management system. This mainly requires the support of financial and legal aspects. Another important extension of the policy system would be runtime monitoring. Since it is only possible to fully verify the solution during runtime, further work should define a system for runtime evaluation. This system should evaluate the compliance of the enforced data management system, but also try to optimize the system configuration.
For the runtime evaluation and the subsequent optimization, it will be necessary to evaluate how the policy translation algorithm can be adapted to work with runtime evaluation data instead of module descriptions. We suggest that, similar to the genetic algorithm, the runtime optimization could use a method from the area of unsupervised learning or even reinforcement learning, since these areas are also able to deal with little knowledge about the search space.

8.5 Policy Translation

As we have observed, it is not certain whether the quantitative performance is good enough for real world scenarios. Therefore, the policy translation should be optimized in two directions:

1. the first phase should be optimized to limit the search space for the optimization (better filtering)
2. for the second phase, it would be possible to reduce the time consumption by restricting the function space for variables and replacing the script calculation by more efficient methods

Another possibility to improve the policy translation could be to store best practices for recurring patterns in policies and module compositions. With this feature, it would be possible to improve the policy translation engine over time.
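Storing best practices could start as simply as remembering the best configuration found so far for each policy. The following Java sketch is hypothetical, and its class name and data representations are assumptions. A policy is represented as a map from variable names to required values, and a configuration as a plain string. Results are keyed by a canonical form of the policy, so the order of the requirements does not matter, and a result is kept only if its fitness is higher than that of the stored one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical store of the best known configuration per policy. */
public class BestPracticeStore {

    private final Map<String, String> bestConfiguration = new HashMap<>();
    private final Map<String, Double> bestFitness = new HashMap<>();

    /** Canonical key: requirements sorted by variable name, so their order does not matter. */
    static String key(Map<String, String> requirements) {
        return new TreeMap<>(requirements).toString();
    }

    /** Records a translation result and keeps it only if it beats the stored one. */
    public void record(Map<String, String> requirements, String configuration, double fitness) {
        String k = key(requirements);
        Double previous = bestFitness.get(k);
        if (previous == null || fitness > previous) {
            bestFitness.put(k, fitness);
            bestConfiguration.put(k, configuration);
        }
    }

    /** Returns the best known configuration for an equivalent policy, if any. */
    public Optional<String> lookup(Map<String, String> requirements) {
        return Optional.ofNullable(bestConfiguration.get(key(requirements)));
    }

    public static void main(String[] args) {
        BestPracticeStore store = new BestPracticeStore();
        Map<String, String> policy = new TreeMap<>(Map.of("dataOutput", "JSON", "availability", "high"));
        store.record(policy, "configuration A", 0.6);
        store.record(policy, "configuration B", 0.8); // better: replaces A
        store.record(policy, "configuration C", 0.7); // worse: ignored
        System.out.println(store.lookup(policy).orElse("none")); // configuration B
    }
}
```

A stored configuration could then, for example, seed the initial population of the genetic algorithm. Recognizing recurring patterns rather than identical policies would require a similarity measure between policies, which this sketch does not attempt.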

Bibliography

[1] Kossmann, D., Kraska, T., Loesing, S., Merkli, S., Mittal, R., and Pfaffhauser, F. Cloudy: a modular cloud storage system. Proc. VLDB Endow., 3 (2010).
[2] Jindal, A. The mimicking octopus: Towards a one-size-fits-all database architecture. In VLDB PhD Workshop (2010).
[3] Moses, T. eXtensible Access Control Markup Language TC v2.0 (XACML) (2005). URL 0-core-spec-os.pdf.
[4] Damianou, N., Dulay, N., Lupu, E., and Sloman, M. The Ponder Policy Specification Language. In Proceedings of the International Workshop on Policies for Distributed Systems and Networks, POLICY '01. Springer-Verlag, London, UK (2001).
[5] Ashley, P., Hada, S., Karjoth, G., Powers, C., and Schunter, M. Enterprise Privacy Authorization Language (EPAL 1.2) (2003).
[6] Kagal, L. Rei: A Policy Language for the Me-Centric Project. Technical report (2002).
[7] Turner, K. J., Reiff-Marganiec, S., Blair, L., Campbell, G. A., and Wang, F. APPEL: An Adaptable and Programmable Policy Environment and Language (2007).
[8] Vedamuthu, A. S., Orchard, D., Hirsch, F., Hondo, M., Yendluri, P., Boubez, T., and Yalçinalp, U. Web Services Policy Framework. Technical report (2007).
[9] Ludwig, H., Keller, A., Dan, A., King, R. P., and Franck, R. Web Service Level Agreement (WSLA) Language Specification (2003). URL &rep=rep1&type=pdf.
[10] Andrieux, A., Czajkowski, K., Dan, A., Keahey, K., Ludwig, H., Nakata, T., Pruyne, J., Rofrano, J., Tuecke, S., and Xu, M. Web Services Agreement Specification (WS-Agreement).
[11] Kadambi, S., Chen, J., Cooper, B., Lomax, D., Ramakrishnan, R., Silberstein, A., Tam, E., and Garcia-Molina, H. Where in the World is My Data? Proceedings of the VLDB Endowment, 4(11) (2011).

[12] Irmert, F., Daum, M., and Meyer-Wegener, K. A new approach to modular database systems. In Proceedings of the 2008 EDBT workshop on Software engineering for tailor-made data management, SETMDM '08. ACM, New York, NY, USA (2008).
[13] Tok, W. H. and Bressan, S. DBNet: A Service-Oriented Database Architecture. In International Workshop on Database and Expert Systems Applications (DEXA) (2006).
[14] Elmore, A., Das, S., Agrawal, D., and Abbadi, A. E. Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms. In SIGMOD. ACM (2011).
[15] Archetti, F. and Schoen, F. A survey on the global optimization problem: general theory and computational approaches. Annals of Operations Research, 1(2) (1984).
[16] Jin, Y. and Branke, J. Evolutionary optimization in uncertain environments - a survey. IEEE Transactions on Evolutionary Computation, 9(3) (2005).
[17] Goldberg, D. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley (1989).
[18] Konak, A., Coit, D., and Smith, A. Multi-objective optimization using genetic algorithms: A tutorial. Reliability Engineering & System Safety, 91(9) (2006).
[19] The Eclipse Foundation. Eclipse Equinox.
[20] Geeknet. Luaj.
[21] Object Mentor. JUnit.
[22] Lamparter, S., Ankolekar, A., Studer, R., and Grimm, S. Preference-based Selection of Highly Configurable Web Services (2007).

A Appendix

A.1 Full XSD of the policy language

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema elementFormDefault="qualified" version="1.0" targetNamespace="http://dbis.cs.unibas.ch/ubstore/management/policy" xmlns:ubs="http://dbis.cs.unibas.ch/ubstore/management/policy" xmlns:tns="http://dbis.cs.unibas.ch/ubstore/management/policy" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="aliasVariable" type="tns:aliasVariable"/>
  <xs:element name="attribute" type="tns:attribute"/>
  <xs:element name="bijectiveVariable" type="tns:bijectiveVariable"/>
  <xs:element name="booleanRange" type="tns:booleanRange"/>
  <xs:element name="complexVariable" type="tns:complexVariable"/>
  <xs:element name="constant" type="tns:constant"/>
  <xs:element name="constantRequirement" type="tns:constantRequirement"/>
  <xs:element name="continuousRange" type="tns:continuousRange"/>
  <xs:element name="discreteRange" type="tns:discreteRange"/>
  <xs:element name="intervalRequirement" type="tns:intervalRequirement"/>
  <xs:element name="javaScriptCalculation" type="tns:javaScriptCalculation"/>
  <xs:element name="localVariableBinding" type="tns:localVariableBinding"/>
  <xs:element name="luaCalculation" type="tns:luaCalculation"/>
  <xs:element name="module" type="tns:module"/>
  <xs:element name="policy" type="tns:policy"/>
  <xs:element name="providedService" type="tns:providedService"/>
  <xs:element name="referencedService" type="tns:referencedService"/>
  <xs:element name="referencedServiceVariableBinding" type="tns:referencedServiceVariableBinding"/>
  <xs:element name="setRange" type="tns:setRange"/>

  <xs:complexType name="module">
    <xs:sequence>
      <xs:element ref="tns:providedService" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="tns:referencedService" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="variableDefinitions" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:choice minOccurs="0" maxOccurs="unbounded">
              <xs:element ref="tns:attribute"/>
              <xs:element ref="tns:bijectiveVariable"/>
              <xs:element ref="tns:constant"/>
              <xs:element ref="tns:aliasVariable"/>
              <xs:element ref="tns:complexVariable"/>
            </xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="moduleType" type="tns:type"/>
  </xs:complexType>

  <xs:complexType name="providedService">
    <xs:all/>
    <xs:attribute name="monopoly" type="xs:boolean"/>
    <xs:attribute name="service" type="tns:service" use="required"/>
    <xs:attribute name="provider" type="tns:service" use="required"/>
  </xs:complexType>

  <xs:simpleType name="service">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <xs:complexType name="referencedService">
    <xs:all/>
    <xs:attribute name="requester" type="xs:string" use="required"/>
    <xs:attribute name="service" type="tns:service" use="required"/>
    <xs:attribute name="bindingness" type="tns:bindingness" use="required"/>
  </xs:complexType>

  <xs:complexType name="variableDefinition" abstract="true">
    <xs:all/>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="visibility" type="tns:visibility" use="required"/>
  </xs:complexType>

  <xs:complexType name="attribute">
    <xs:complexContent>
      <xs:extension base="tns:boundedVariableDefinition">
        <xs:all/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="boundedVariableDefinition" abstract="true">
    <xs:complexContent>
      <xs:extension base="tns:variableDefinition">
        <xs:choice>
          <xs:element ref="tns:booleanRange"/>
          <xs:element ref="tns:continuousRange"/>
          <xs:element ref="tns:discreteRange"/>
          <xs:element ref="tns:setRange"/>
        </xs:choice>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="range" abstract="true">
    <xs:all/>
  </xs:complexType>

  <xs:complexType name="booleanRange">
    <xs:complexContent>
      <xs:extension base="tns:range">
        <xs:all/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="continuousRange">
    <xs:complexContent>
      <xs:extension base="tns:range">
        <xs:all>
          <xs:element name="max" type="tns:floatType"/>
          <xs:element name="min" type="tns:floatType"/>
        </xs:all>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:simpleType name="floatType">
    <xs:restriction base="xs:float"/>
  </xs:simpleType>

  <xs:complexType name="discreteRange">
    <xs:complexContent>
      <xs:extension base="tns:range">
        <xs:all>
          <xs:element name="max" type="tns:integerType"/>
          <xs:element name="min" type="tns:integerType"/>
        </xs:all>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:simpleType name="integerType">
    <xs:restriction base="xs:int"/>
  </xs:simpleType>

  <xs:complexType name="setRange">
    <xs:complexContent>
      <xs:extension base="tns:range">
        <xs:sequence>
          <xs:element name="element" type="xs:anyType" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:simpleType name="booleanType">
    <xs:restriction base="xs:boolean"/>
  </xs:simpleType>

  <xs:simpleType name="stringType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <xs:complexType name="bijectiveVariable">
    <xs:complexContent>
      <xs:extension base="tns:boundedVariableDefinition">
        <xs:all/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="constant">
    <xs:complexContent>
      <xs:extension base="tns:variableDefinition">
        <xs:sequence>
          <xs:element name="value" type="xs:anyType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="aliasVariable">
    <xs:complexContent>
      <xs:extension base="tns:variableDefinition">
        <xs:all/>
        <xs:attribute name="variableName" type="xs:string" use="required"/>
        <xs:attribute name="referencedServiceRequester" type="xs:string" use="required"/>
        <xs:attribute name="referencedServiceService" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="complexVariable">
    <xs:complexContent>
      <xs:extension base="tns:variableDefinition">
        <xs:choice>
          <xs:element ref="tns:luaCalculation"/>
          <xs:element ref="tns:javaScriptCalculation"/>
        </xs:choice>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="calculation" abstract="true">
    <xs:all>
      <xs:element name="type" type="xs:anyType"/>
      <xs:element name="variableBindings" type="tns:mapping"/>
    </xs:all>
  </xs:complexType>

  <xs:complexType name="mapping">
    <xs:sequence>
      <xs:element name="variableBinding" type="tns:mapElements" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="mapElements">
    <xs:choice>
      <xs:element ref="tns:localVariableBinding"/>
      <xs:element ref="tns:referencedServiceVariableBinding"/>
    </xs:choice>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:complexType name="variableBinding" abstract="true">
    <xs:all/>
    <xs:attribute name="variableName" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:complexType name="localVariableBinding">
    <xs:complexContent>
      <xs:extension base="tns:variableBinding">
        <xs:sequence/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="referencedServiceVariableBinding">
    <xs:complexContent>
      <xs:extension base="tns:variableBinding">
        <xs:sequence/>
        <xs:attribute name="referencedServiceRequester" type="xs:string" use="required"/>
        <xs:attribute name="referencedServiceService" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="luaCalculation">
    <xs:complexContent>
      <xs:extension base="tns:calculation">
        <xs:sequence>
          <xs:element name="script" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="javaScriptCalculation">
    <xs:complexContent>
      <xs:extension base="tns:calculation">
        <xs:sequence>
          <xs:element name="script" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="policy">
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="tns:constantRequirement"/>
        <xs:element ref="tns:intervalRequirement"/>
      </xs:choice>
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:complexType name="requirement" abstract="true">
    <xs:all/>
    <xs:attribute name="bindingness" type="tns:bindingness" use="required"/>
  </xs:complexType>

  <xs:complexType name="constantRequirement">
    <xs:complexContent>
      <xs:extension base="tns:variableRequirement">
        <xs:sequence>
          <xs:element name="value" type="xs:anyType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="variableRequirement" abstract="true">
    <xs:complexContent>
      <xs:extension base="tns:requirement">
        <xs:all/>
        <xs:attribute name="variableName" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="intervalRequirement">
    <xs:complexContent>

105 Appendix <x s : e x t e n s i o n base=" t n s : v a r i a b l e R e q u i r e m e n t "> 293 < x s : a l l> 294 <x s : e l e m e n t name=" to " type=" xs:anytype "/> 295 <x s : e l e m e n t name=" from " type=" xs:anytype "/> 296 </ x s : a l l> 297 <x s : a t t r i b u t e name=" targetvalue " type=" t n s : t a r g e t V a l u e " use=" r e q u i r e d "/> 298 </ x s : e x t e n s i o n> 299 </ xs: complexcontent> 300 </ xs: complextype> <xs:simpletype name=" type "> 303 < x s : r e s t r i c t i o n base=" x s : s t r i n g "> 304 <xs:enumeration value=" base "/> 305 <xs:enumeration value=" standard "/> 306 <xs:enumeration value=" e x t e n s i o n "/> 307 </ x s : r e s t r i c t i o n> 308 </ xs:simpletype> <xs:simpletype name=" b i n d i n g n e s s "> 311 < x s : r e s t r i c t i o n base=" x s : s t r i n g "> 312 <xs:enumeration value="mandatory"/> 313 <xs:enumeration value=" o p t i o n a l "/> 314 </ x s : r e s t r i c t i o n> 315 </ xs:simpletype> <xs:simpletype name=" v i s i b i l i t y "> 318 < x s : r e s t r i c t i o n base=" x s : s t r i n g "> 319 <xs:enumeration value=" p u b l i c "/> 320 <xs:enumeration value=" p r i v a t e "/> 321 </ x s : r e s t r i c t i o n> 322 </ xs:simpletype> <xs:simpletype name=" t argetvalue "> 325 < x s : r e s t r i c t i o n base=" x s : s t r i n g "> 326 <xs:enumeration value="any"/> 327 <xs:enumeration value=" aslargeaspossible "/> 328 <xs:enumeration value=" assmallaspossible "/> 329 </ x s : r e s t r i c t i o n> 330 </ xs:simpletype> 331 </ xs: schema> Listing A.1: The full XSD of the policy language together with specific implementations. The XSD isgenerated using JAXB. JAXB defines the variable definitions and requirements as choice instead of using inheritance.
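To make the semantics of the two concrete requirement types in Listing A.1 more tangible, the following Python sketch checks a constantRequirement and an intervalRequirement against the variable values of one candidate system configuration. It is a minimal illustration, not the thesis implementation: the class and function names are hypothetical, and treating both interval bounds as inclusive and ignoring targetValue when testing fulfilment are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

MANDATORY = "mandatory"  # values of the XSD bindingness enumeration
OPTIONAL = "optional"


@dataclass
class ConstantRequirement:
    """Fulfilled when the variable has exactly the required value."""
    variable_name: str
    value: Any
    bindingness: str = MANDATORY

    def is_fulfilled(self, variables: Dict[str, Any]) -> bool:
        return variables.get(self.variable_name) == self.value


@dataclass
class IntervalRequirement:
    """Fulfilled when the variable lies between from_ and to.

    Assumption: both bounds are inclusive, and targetValue ("any",
    "aslargeaspossible", "assmallaspossible") only ranks fulfilling
    configurations, so it is not consulted here.
    """
    variable_name: str
    from_: float
    to: float
    target_value: str = "any"
    bindingness: str = MANDATORY

    def is_fulfilled(self, variables: Dict[str, Any]) -> bool:
        value = variables.get(self.variable_name)
        return value is not None and self.from_ <= value <= self.to


Requirement = Union[ConstantRequirement, IntervalRequirement]


def mandatory_requirements_fulfilled(requirements: List[Requirement],
                                     variables: Dict[str, Any]) -> bool:
    """True if every mandatory requirement holds; optional ones are ignored."""
    return all(r.is_fulfilled(variables)
               for r in requirements if r.bindingness == MANDATORY)


if __name__ == "__main__":
    # Hypothetical variable values of one candidate system configuration.
    variables = {"authentication": "required", "consistency": 1.0, "latency": 40}
    policy = [
        ConstantRequirement("authentication", "required"),
        ConstantRequirement("consistency", 1.0),
        IntervalRequirement("latency", 10, 50, target_value="assmallaspossible"),
    ]
    print(mandatory_requirements_fulfilled(policy, variables))  # True
```

A real translation would also use targetValue and optional requirements to rank the fulfilling configurations (the role of getReward in Figure A.1); the sketch only answers the yes/no question.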

A.2 UML diagram of objects for policy, module and system configuration

[Figure A.1: UML class diagram. A Policy (name, version) holds 0..* Requirements (bindingness; update, isFulfilled, isFulfillable, getReward), specialised by VariableRequirement (variableName; isFulfillableForValue, isConstant, getPossibleDependencies). A Module (name, version, moduleType) has 0..* ReferencedServices (requester, service, bindingness), 0..* ProvidedServices (provider, monopoly, service) and 0..* VariableDefinitions (name, visibility; getPossibleDependencies, isFulfillable, isFulfillableForValue, isConstant, isInput, isOutput, getValue). A SystemConfiguration comprises modules, ServiceLinks (each connecting one ReferencedService to one ProvidedService) and Variables. Further classes: Service (className), BoundedVariableDefinition (with one Range), BoundedVariable (with one BoundedVariableDefinition), Range (isContinuous, isCountable, getMin, getMax, getElements, contains, getLength, getRandomElement) and Value (value : Object). Enumerations: Bindingness (MANDATORY, OPTIONAL), ModuleType (BASE, STANDARD, EXTENSION), Visibility (PUBLIC, PRIVATE).]

Figure A.1: The core objects needed to describe policies, module descriptions and system configurations. Furthermore, the UML diagram shows the methods that define the behaviour, and thus the semantics, of the policy language elements. This behaviour must be defined for the policy translation algorithm to be applied.
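The Range methods in Figure A.1 can be illustrated with the two concrete ranges defined in the XSD: discreteRange (integer bounds min and max) and setRange (an explicit list of elements). The following Python sketch is an assumption about their behaviour rather than the thesis code; in particular, interpreting getLength as the number of admissible values is a guess, and getMin/getMax are omitted for setRange because its elements need not be ordered.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class DiscreteRange:
    """Integers from min_ to max_, both inclusive, as in <discreteRange>."""
    min_: int
    max_: int

    def is_continuous(self) -> bool:
        return False

    def is_countable(self) -> bool:
        return True

    def get_min(self) -> int:
        return self.min_

    def get_max(self) -> int:
        return self.max_

    def get_elements(self) -> List[int]:
        return list(range(self.min_, self.max_ + 1))

    def contains(self, value: Any) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and self.min_ <= value <= self.max_)

    def get_length(self) -> float:
        # Assumption: the length is the number of admissible values.
        return float(self.max_ - self.min_ + 1)

    def get_random_element(self) -> int:
        return random.randint(self.min_, self.max_)  # inclusive on both ends


@dataclass
class SetRange:
    """An explicit set of admissible values, as in <setRange>."""
    elements: List[Any] = field(default_factory=list)

    def is_continuous(self) -> bool:
        return False

    def is_countable(self) -> bool:
        return True

    def get_elements(self) -> List[Any]:
        return list(self.elements)

    def contains(self, value: Any) -> bool:
        return value in self.elements

    def get_length(self) -> float:
        return float(len(self.elements))

    def get_random_element(self) -> Any:
        return random.choice(self.elements)
```

For example, the discreteRange with min 1 and max 3 in Listing A.3 would yield the elements [1, 2, 3], and the setRange in Listing A.2 would contain "SQL", "JSON", "XML" and "CSV". A random element is what a search-based translation could use to sample candidate values.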

A.3 Definition of modules used for evaluation

The following table lists the module type, the provided services, the referenced services and the influenced properties of each module used for the evaluation. It is followed by Listings A.2 and A.3, which show the definitions of two of these modules in the policy language; both XML definitions were used for the evaluation.

Module | Module Type | Provided Services | Referenced Services | Properties
--- | --- | --- | --- | ---
System | base | System | Access, Distribution, Storage | instances
Hardware | base | Hardware | System | costs, scalability, durability, location, geolocation, multitenancy
BTreeIndexing | standard | Indexing | Hardware | costs, indexing, latency
ProprietaryInterface | extension | Interface | Access, Hardware | costs, interfacetype, latency
TokenBasedAuthentication | standard | Authentication | none | costs, authentication
ConsistencyRationing | standard | ReplicationControlProtocol | ConcurrencyControlProtocol, Routing, System, Hardware | costs, availability, consistency, latency, redundancy, replicationspeed, atomicity, isolation
SecureLogger | standard | Logging | none | costs, logging
DistributedHashTable | standard | Routing, Partitioning | Hardware, System | costs, elasticity, latency, partitiontolerance, redundancy
ComplexAccess | standard | Access | Distribution, Logging, Authentication, ConcurrencyControlProtocol, Compression, Hardware | costs, accessdatatype, accesslogging, latency
DomainPartitioning | standard | Partitioning | none | costs, partitiontolerance
HorizontalPartitioning | standard | Partitioning | none | costs, partitiontolerance
SimpleUserManagement | standard | AccountManagement | none | costs
S2PL | standard | ConcurrencyControlProtocol | ReplicationControlProtocol, Hardware | costs, atomicity, availability, consistency, isolation, latency
NoConcurrencyControlProtocol | standard | ConcurrencyControlProtocol | ReplicationControlProtocol, Hardware | costs, atomicity, availability, consistency, isolation, latency
DebugLogger | standard | Logging | none | costs, logging
AccountBasedAuthentication | standard | Authentication | AccountManagement | costs, authentication
DelayedCompression | standard | Compression | none | costs, compression, latency
NoCompression | standard | Compression | none | costs, compression, latency
DurabilityManager | extension | Durability | none | costs, erasing
SimpleAccess | standard | Access | ConcurrencyControlProtocol, Distribution, Logging, Hardware | costs, accessdatatype, accesslogging, latency
FatsStreamCompression | standard | Compression | none | costs, compression, latency
Freshness | extension | Freshness | none | costs, freshness
HttpInterface | extension | Interface | Access, Hardware | costs, interfacetype, latency, loadbalancing
QuorumProtocol | standard | ReplicationControlProtocol | ConcurrencyControlProtocol, Routing, System, Hardware | costs, availability, consistency, latency, redundancy, replicationspeed, atomicity, isolation
ArchiveManager | extension | Archiving | Durability | costs, erasing, archiving
MultipleNodes | standard | Distribution | Storage, Routing, Partitioning, ReplicationControlProtocol, Hardware, System | costs, availability, consistency, elasticity, latency, partitiontolerance, redundancy, replicationspeed, scalability
OLAPIndexing | standard | Indexing | Hardware | costs, indexing, latency
FileStorage | standard | Storage | Hardware, Compression, Indexing | costs, elasticity, storagedatatype, scalability, durability, latency, storagelogging
InMemoryStorage | standard | Storage | Hardware, Compression, Indexing | costs, elasticity, storagedatatype, scalability, durability, latency, storagelogging
2PC | standard | ReplicationControlProtocol | ConcurrencyControlProtocol, Routing, System, Hardware | costs, availability, consistency, latency, redundancy, replicationspeed, accesses, atomicity, isolation
SingleNode | standard | Distribution | Storage, Hardware | costs, availability, consistency, elasticity, latency, partitiontolerance, redundancy, replicationspeed, scalability
LazyReplication | standard | ReplicationControlProtocol | ConcurrencyControlProtocol, Routing, System, Hardware | costs, availability, atomicity, consistency, latency, redundancy, replicationspeed, accesses, isolation
MasterSlave | standard | Distribution, Routing, Partitioning, ReplicationControlProtocol | Storage, Hardware | costs, availability, consistency, elasticity, latency, partitiontolerance, redundancy, replicationspeed, scalability, atomicity, isolation
AdvancedCompression | standard | Compression | none | costs, compression, latency
MultiVersionConcurrencyControl | standard | ConcurrencyControlProtocol | ReplicationControlProtocol, Hardware | costs, atomicity, availability, consistency, isolation, latency
StaticRouting | standard | Routing, Partitioning | Hardware, System | costs, elasticity, latency, partitiontolerance, redundancy
NoLogger | standard | Logging | none | costs, logging
KeyBasedPartitioning | standard | Partitioning | none | costs, partitiontolerance
NoIndexing | standard | Indexing | none | costs, indexing, latency
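The module table suggests a simple completeness check for a module composition: every service referenced by a chosen module should be provided by some chosen module. The Python sketch below encodes a subset of the table rows and reports the referenced services that a candidate composition leaves unresolved. The function names and the rule itself are illustrative assumptions; the sketch ignores bindingness, the monopoly flag and all variables and requirements, which the thesis' translation additionally takes into account.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (provided services, referenced services) for a subset of the table rows.
MODULES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "System": ({"System"}, {"Access", "Distribution", "Storage"}),
    "Hardware": ({"Hardware"}, {"System"}),
    "SimpleAccess": ({"Access"},
                     {"ConcurrencyControlProtocol", "Distribution", "Logging", "Hardware"}),
    "SingleNode": ({"Distribution"}, {"Storage", "Hardware"}),
    "InMemoryStorage": ({"Storage"}, {"Hardware", "Compression", "Indexing"}),
    "NoCompression": ({"Compression"}, set()),
    "NoIndexing": ({"Indexing"}, set()),
    "NoLogger": ({"Logging"}, set()),
    "S2PL": ({"ConcurrencyControlProtocol"}, {"ReplicationControlProtocol", "Hardware"}),
    "LazyReplication": ({"ReplicationControlProtocol"},
                        {"ConcurrencyControlProtocol", "Routing", "System", "Hardware"}),
    "StaticRouting": ({"Routing", "Partitioning"}, {"Hardware", "System"}),
}


def unresolved_services(composition: Iterable[str]) -> List[str]:
    """Referenced services that no module in the composition provides."""
    provided: Set[str] = set()
    referenced: Set[str] = set()
    for name in composition:
        provides, references = MODULES[name]
        provided |= provides
        referenced |= references
    return sorted(referenced - provided)


def providers_of(service: str) -> List[str]:
    """All known modules that can provide the given service."""
    return sorted(name for name, (provides, _) in MODULES.items() if service in provides)
```

For a composition of System, Hardware, SimpleAccess, SingleNode, InMemoryStorage, NoCompression, NoIndexing, NoLogger and S2PL, the only unresolved service is ReplicationControlProtocol; adding LazyReplication resolves it but introduces a reference to Routing, which StaticRouting then provides.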

<?xml version="1.0" encoding="utf-8"?>
<module xmlns="http://dbis.cs.unibas.ch/ubstore/management/policy" version="1.0" name="ComplexAccess">

  <referencedService service="Distribution" bindingness="mandatory" requester="ComplexAccess" />
  <referencedService service="Logging" bindingness="mandatory" requester="ComplexAccess" />
  <referencedService service="Authentication" bindingness="mandatory" requester="ComplexAccess" />
  <referencedService service="ConcurrencyControlProtocol" bindingness="mandatory" requester="ComplexAccess" />
  <referencedService service="Compression" bindingness="mandatory" requester="ComplexAccess" />
  <referencedService service="hardware" bindingness="mandatory" requester="ComplexAccess" />
  <variableDefinitions>
    <constant visibility="private" name="costs">
      <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integerType">1000</value>
    </constant>

      <setRange>
        <element xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringType">SQL</element>
        <element xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringType">JSON</element>
        <element xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringType">XML</element>
        <element xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringType">CSV</element>
      </setRange>

    <aliasVariable visibility="public" name="accessLogging" variableName="logging" referencedServiceRequester="ComplexAccess" referencedServiceService="Logging" />
    <complexVariable visibility="private" name="latency">
      <luaCalculation>
        <script>latency+5.0/performance+compressionlatency+protocollatency</script>
        <type xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floatType" />
        <variableBindings>
          <variableBinding name="latency">
            <referencedServiceVariableBinding variableName="latency" referencedServiceRequester="ComplexAccess" referencedServiceService="Distribution" />
          </variableBinding>
          <variableBinding name="compressionlatency">
            <referencedServiceVariableBinding variableName="latency" referencedServiceRequester="ComplexAccess" referencedServiceService="Compression" />
          </variableBinding>
          <variableBinding name="performance">
            <referencedServiceVariableBinding variableName="performance" referencedServiceRequester="ComplexAccess" referencedServiceService="hardware" />
          </variableBinding>
          <variableBinding name="protocollatency">
            <referencedServiceVariableBinding variableName="latency" referencedServiceRequester="ComplexAccess" referencedServiceService="ConcurrencyControlProtocol" />
          </variableBinding>
        </variableBindings>
      </luaCalculation>
    </complexVariable>
  </variableDefinitions>
</module>

Listing A.2: Definition of the ComplexAccess module using the policy language. The module references many other services, to which it delegates functionality.

<?xml version="1.0" encoding="utf-8"?>
<module xmlns="http://dbis.cs.unibas.ch/ubstore/management/policy" version="1.0" name="DistributedHashTable">

  <referencedService service="hardware" bindingness="mandatory" requester="DistributedHashTable" />
  <referencedService service="system" bindingness="mandatory" requester="DistributedHashTable" />
  <variableDefinitions>
    <constant visibility="private" name="costs">
      <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integerType">200</value>
    </constant>
    <constant visibility="private" name="elasticity">
      <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floatType">1.0</value>
    </constant>
    <complexVariable visibility="private" name="latency">
      <luaCalculation>
        <script>2/performance*math.log(instances)</script>
        <type xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floatType" />
        <variableBindings>
          <variableBinding name="performance">
            <referencedServiceVariableBinding variableName="performance" referencedServiceRequester="DistributedHashTable" referencedServiceService="hardware" />
          </variableBinding>
          <variableBinding name="instances">
            <referencedServiceVariableBinding variableName="instances" referencedServiceRequester="DistributedHashTable" referencedServiceService="system" />
          </variableBinding>
        </variableBindings>
      </luaCalculation>
    </complexVariable>
    <complexVariable visibility="private" name="partitionTolerance">
      <luaCalculation>
        <script>0.3*(redundancy-1)</script>
        <type xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floatType" />
        <variableBindings>
          <variableBinding name="redundancy">
            <localVariableBinding variableName="redundancy" />
          </variableBinding>
        </variableBindings>
      </luaCalculation>
    </complexVariable>

      <discreteRange>
        <min>1</min>
        <max>3</max>
      </discreteRange>

  </variableDefinitions>
</module>

Listing A.3: Definition of the DistributedHashTable module using the policy language. Most variables depend on the number of instances.
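Each complexVariable above computes its value with a Lua script whose free variables are resolved through variableBindings, either from a referenced service or from a local variable. The following Python sketch mirrors the three scripts of Listings A.2 and A.3 as plain functions and resolves the referenced-service bindings from a dictionary keyed by (service, variable name). This flat dictionary is a simplification assumed for illustration, the function names are hypothetical, and the thesis evaluates the Lua code itself. We assume the DHT scripts read 2/performance*math.log(instances) and 0.3*(redundancy-1); Python's math.log, like Lua's one-argument math.log, is the natural logarithm.

```python
import math
from typing import Dict, Tuple

# Values offered by referenced services, keyed by (service, variable name).
ServiceValues = Dict[Tuple[str, str], float]


def dht_latency(values: ServiceValues) -> float:
    # Mirrors the Lua script 2/performance*math.log(instances).
    performance = values[("Hardware", "performance")]
    instances = values[("System", "instances")]
    return 2 / performance * math.log(instances)


def dht_partition_tolerance(redundancy: int) -> float:
    # Mirrors 0.3*(redundancy-1); redundancy is bound to a local variable.
    return 0.3 * (redundancy - 1)


def complex_access_latency(values: ServiceValues) -> float:
    # Mirrors latency+5.0/performance+compressionlatency+protocollatency,
    # where each operand is bound to a variable of a referenced service.
    return (values[("Distribution", "latency")]
            + 5.0 / values[("Hardware", "performance")]
            + values[("Compression", "latency")]
            + values[("ConcurrencyControlProtocol", "latency")])
```

Because ComplexAccess binds its own latency to the latency of the Distribution service, latencies propagate along the service links: whichever module provides Distribution (for instance one using a DHT for routing) contributes its latency as one summand.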

116 Appendix 110 A.4 Policies used for the evaluation The following listings show the three policies (using the policy language) used for the evaluation: 1 <?xml version=" 1. 0 " encoding="utf 8"?> 2 3 4 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 9</ from> 5 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 6 7 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" a u t h e n t i c a t i o n "> 8 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">notrequired</ value> 9 </ constantrequirement> 10 11 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 4</ from> 12 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 13 14 15 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 5</ from> 16 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">5. 0</ to> 17 18 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" storagedatatype "> 19 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">Table</ value> 20 </ constantrequirement> 21 22 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 9</ from> 23 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 24 25

117 Appendix <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 2</ from> 27 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 28 29 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" i n d e x i n g "> 30 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">olap</ value> 31 </ constantrequirement> 32 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" i n t e r f a c e T y p e "> 33 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">p r o p r i e t a r y</ value> 34 </ constantrequirement> 35 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" geolocation "> 36 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">S w i t z e r l a n d</ value> 37 </ constantrequirement> 38 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" multitenancy "> 39 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" booleantype ">f a l s e</ value> 40 </ constantrequirement> 41 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" redundancy "> 42 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">3</ value> 43 </ constantrequirement> 44 45 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">1000</ from> 46 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">2000</ to> 47 48 Listing A.4: The archive of the weather service is a large-scale data management system that requires a good durability. 
Additionally, the system requires OLAP support. 1 <?xml version=" 1. 0 " encoding="utf 8"?> 2 3 4 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 9</ from>

118 Appendix <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 6 7 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" a u t h e n t i c a t i o n "> 8 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">r e q u i r e d</ value> 9 </ constantrequirement> 10 11 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 8</ from> 12 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 13 14 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" c o n s i s t e n c y "> 15 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ value> 16 </ constantrequirement> 17 18 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 2</ from> 19 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 20 21 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" i n t e r f a c e T y p e "> 22 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">http</ value> 23 </ constantrequirement> 24 25 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">0</ from> 26 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">100</ to> 27 28 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" l o c a t i o n "> 29 <value x m l n s : x s i=" h t t p : //www. w3. 
org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">nearboundary</ value> 30 </ constantrequirement> 31 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" geolocation "> 32 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" stringtype ">S w i t z e r l a n d</ value>

119 Appendix </ constantrequirement> 34 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" multitenancy "> 35 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" booleantype ">f a l s e</ value> 36 </ constantrequirement> 37 38 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 5</ from> 39 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 40 41 42 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">1</ from> 43 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">2</ to> 44 45 <constantrequirement b i n d i n g n e s s="mandatory" variablename=" r e p l i c a t i o n S p e e d "> 46 <value x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ value> 47 </ constantrequirement> 48 49 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">10</ from> 50 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" integertype ">50</ to> 51 52 Listing A.5: The queue of the weather service is a small-scale data management system that does not require special extensions. This system requires high availability and throughput 1 <?xml version=" 1. 0 " encoding="utf 8"?> 2 3 4 <from x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">0. 9</ from> 5 <to x m l n s : x s i=" h t t p : //www. w3. org /2001/XMLSchema i n s t a n c e " x s i : t y p e=" floattype ">1. 0</ to> 6

  <constantrequirement bindingness="mandatory" variablename="authentication">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringtype">required</value>
  </constantrequirement>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">0.6</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">1.0</to>

  <constantrequirement bindingness="mandatory" variablename="consistency">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">1.0</value>
  </constantrequirement>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">0.4</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">1.0</to>

  <constantrequirement bindingness="mandatory" variablename="interfaceType">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringtype">http</value>
  </constantrequirement>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integertype">0</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integertype">100</to>

  <constantrequirement bindingness="mandatory" variablename="location">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringtype">nearboundary</value>
  </constantrequirement>
  <constantrequirement bindingness="mandatory" variablename="accessLogging">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringtype">secure</value>
  </constantrequirement>
  <constantrequirement bindingness="mandatory" variablename="geolocation">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="stringtype">Switzerland</value>
  </constantrequirement>
  <constantrequirement bindingness="mandatory" variablename="multitenancy">
    <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="booleantype">false</value>
  </constantrequirement>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">0.5</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">1.0</to>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integertype">2</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integertype">3</to>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">0.7</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="floattype">1.0</to>

    <from xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integertype">10</from>
    <to xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="integertype">50</to>

Listing A.6: The data management system for storing raw data has to be a mid-scale system. Since we defined that customers of the weather system can access the data, the system requires strong consistency and high availability.
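The listings above state requirements such as strong consistency, a fixed geolocation or disabled multitenancy as mandatory constant requirements whose values carry an `xsi:type`. As a minimal sketch, assuming a hypothetical `<policy>` root element and a module configuration given as a plain dictionary (neither is taken from the thesis implementation), the following Python reads the `constantrequirement` elements of such a policy and reports which mandatory constants a candidate configuration fails to match.

```python
import xml.etree.ElementTree as ET

# Attribute key under which ElementTree stores xsi:type (namespace URI in braces).
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Converters for the xsi:type values that occur in the listings above.
CONVERTERS = {
    "booleantype": lambda s: s.strip().lower() == "true",
    "floattype": float,
    "integertype": int,
    "stringtype": str.strip,
}


def typed_value(elem):
    """Convert the text of a <value> element according to its xsi:type."""
    return CONVERTERS[elem.get(XSI_TYPE)](elem.text or "")


def constant_requirements(policy_xml):
    """Map each variablename to (bindingness, typed value) for all <constantrequirement>s."""
    root = ET.fromstring(policy_xml)
    return {
        req.get("variablename"): (req.get("bindingness"), typed_value(req.find("value")))
        for req in root.iter("constantrequirement")
    }


def unmet_mandatory(policy_xml, offered):
    """Sorted names of mandatory constants that `offered` lacks or sets differently."""
    return sorted(
        name
        for name, (bindingness, wanted) in constant_requirements(policy_xml).items()
        if bindingness == "mandatory" and offered.get(name) != wanted
    )


# Hypothetical policy: the <policy> root element is assumed, not taken from the thesis.
SAMPLE = """<policy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <constantrequirement bindingness="mandatory" variablename="consistency">
    <value xsi:type="floattype">1.0</value>
  </constantrequirement>
  <constantrequirement bindingness="mandatory" variablename="geolocation">
    <value xsi:type="stringtype">Switzerland</value>
  </constantrequirement>
  <constantrequirement bindingness="mandatory" variablename="multitenancy">
    <value xsi:type="booleantype">false</value>
  </constantrequirement>
</policy>"""

# Hypothetical module configuration offering a different geolocation.
offered = {"consistency": 1.0, "geolocation": "Germany", "multitenancy": False}
print(unmet_mandatory(SAMPLE, offered))  # ['geolocation']
```

The sketch covers only constant requirements, which are compared by equality after type conversion; the `from`/`to` bounds in the listings would need an analogous interval check against the values a module offers.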

122