Cost-optimized, Policy-based Data Management in Cloud Environments



Ilir Fetai, Filip-Martin Brinkmann
Databases and Information Systems Research Group, University of Basel

Current State in the Cloud: A zoo of different DBMS
Current DBMS differ in:
- APIs
- Guarantees (e.g., consistency models, replication)
- Limitations (e.g., no support for archiving)
- Costs

DMS wish list
DMS = Data Management System
- Construct my DBMS via declarative specifications
  - Declarative specifications = policies
- Policy management needed
  - A way to express requirements of applications, transactions, end-users, etc., and monitor their fulfilment
  - Examples:
    - Control data consistency
    - Specify possible data replication locations
    - Control data archiving
    - Control costs!
- Modular DMS architecture needed

Scenario
- Client:
  - uses services
  - has different requirements
- Service Provider:
  - provides different services
  - uses the underlying DMS infrastructure
  - has requirements on data management
- Cloud DMS Provider:
  - provides the DMS infrastructure

Agenda
- Policy-based and modular Data Management Systems (POLAR DMS)
- Cost-effective & Policy-based Data Management
  - Data Consistency
  - Data Replication
  - Data Archiving
- Architecture of POLAR DMS
- Conclusion

POLAR DMS - Requirements
- User-friendly & understandable policy-based and modular DMS
- Specify policies at different levels: cloud provider, application/service provider, and even end-customer
- DMS adjusts at runtime based on policies
- Minimal core
- Implement DMS functionality as exchangeable modules

Policies in POLAR DMS
- End-user policies
  - May override default transaction policies
  - Example: paid services
- Transaction policies
  - May override default application policies
- Application policies
  - Applications may specify default policies valid for the entire application
- System policies
  - The underlying system may also specify its own policies
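The override chain on this slide can be read as a layered lookup in which the most specific level wins. The following is a minimal Python sketch under stated assumptions, not the POLAR DMS implementation: the level names, the policy keys (`consistency`, `archive`, `cost_budget`) and the placement of system policies at the lowest precedence are all hypothetical.

```python
from collections import ChainMap

# Precedence from most to least specific. The slide states that end-user
# policies may override transaction policies and transaction policies may
# override application policies. Putting "system" last is an assumption.
LEVELS = ("end_user", "transaction", "application", "system")


def resolve_policies(policies_by_level):
    """Merge per-level policy dicts so that more specific levels win."""
    layers = [policies_by_level.get(level, {}) for level in LEVELS]
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(*layers))


effective = resolve_policies({
    "system": {"consistency": "EC", "archive": False},
    "application": {"consistency": "SI"},
    "transaction": {"cost_budget": 5.0},
    "end_user": {"consistency": "1SR"},
})
print(effective)
```

In this example the end-user setting for `consistency` replaces the application and system defaults, while `archive` and `cost_budget` fall through from the levels that define them.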

Conflicting Policies
- POLAR DMS must also handle policy conflicts!

Modularity in POLAR DMS
Build your own DMS by composing it from modules.
- Service Provider:
  - Application developers: specify policies at transaction level. The system may need to adjust at runtime (load modules, etc.)
  - Application architects: choose the modules you need for your architecture or specify application-wide policies
- Cloud DMS Provider:
  - Administrators: specify system policies and configure the DMS

Cost-effective & Policy-based Data Management
- POLAR DMS provides:
  - Policy mechanism
  - Modular architecture
- In the context of POLAR DMS:
  - Analyse a concrete subset of possible data management policies
  - Data management aspects: consistency, replication & archiving

Data Consistency
- Data consistency level
  - A contract between DMS provider and customer
  - Customer: from application developer to end-customer
- A range of consistency levels exists
  - Strong consistency: low availability, but more user-friendly
  - Weak consistency: high availability, but less user-friendly
- Spectrum, from weak to strong consistency: EC (eventual consistency), SI (snapshot isolation), 1SR (one-copy serializability)

Data consistency in the Cloud
- Current cloud data systems provide weak consistency
  - More oriented towards scalability
- If strong consistency is needed, use a traditional DBMS
  - These are less scalable due to the consistency overhead
- Disadvantages of current solutions
  - Different DBMS types for different consistency levels
  - No way to further influence the behaviour of data consistency via policies!
- Kraska et al. [3]: Consistency rationing
  - Categorize data based on their consistency requirements: 1SR, EC, Adaptive
- Our approach: adaptive consistency based on policies at transaction level!

Idea: Cost-based Consistency Control - C3
- Implement a range of consistency levels
- Provide a policy API at transaction level
- Adjust consistency at runtime based on the policies - C3 [1]
- Current consistency levels (CL): 1SR, SI and EC
  - Cost models for 1SR, SI and EC are defined in [2]
  - 1SR & SI generate no or only rare inconsistencies, but are expensive
  - EC is cheap, but may generate high inconsistency costs
- Policies: cost budget (CB) for a transaction & inconsistency costs (IC)

C3 Architecture
- Provides an API at transaction level (CB, IC)
  - Different combinations possible
  - Enforce a consistency level by setting IC =
- Is based on a modular architecture
  - Implement new protocols
  - Define their cost models
  - C3 will choose the cheapest consistency protocol
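The selection step ("C3 will choose the cheapest consistency protocol") can be illustrated with a toy cost comparison. This is a sketch only: the actual cost models are defined in [2] and are not reproduced here. The `Level` class, the numeric costs, the inconsistency probabilities and the fallback rule for an insufficient budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    exec_cost: float         # hypothetical cost of running the protocol
    p_inconsistency: float   # hypothetical probability of an inconsistency


# Made-up figures that only mirror the slide's qualitative statement:
# 1SR and SI are expensive but (almost) never inconsistent, EC is cheap.
LEVELS = [
    Level("1SR", exec_cost=10.0, p_inconsistency=0.0),
    Level("SI", exec_cost=6.0, p_inconsistency=0.01),
    Level("EC", exec_cost=1.0, p_inconsistency=0.2),
]


def choose_level(cost_budget, inconsistency_cost, levels=LEVELS):
    """Return the level with the lowest expected total cost.

    cost_budget corresponds to CB and inconsistency_cost to IC.
    """
    def total(level):
        return level.exec_cost + level.p_inconsistency * inconsistency_cost

    affordable = [lvl for lvl in levels if total(lvl) <= cost_budget]
    # Assumption: if nothing fits the budget, fall back to the cheapest level.
    return min(affordable or levels, key=total)


print(choose_level(100.0, 1.0).name)     # cheap inconsistencies
print(choose_level(100.0, 1000.0).name)  # expensive inconsistencies
```

With a low IC the cheap level wins, and as IC grows the expected inconsistency penalty pushes the choice towards the stronger levels.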

ECO-1SR
- C3 provides a meta-consistency level
  - Improves transaction costs & performance by switching between concrete consistency levels
- Can the concrete levels be improved?
  - Again, based on policies
- Work in progress: efficient & cost-optimized 1SR (ECO-1SR)

ECO-1SR vs. Traditional 1SR
- Traditional implementation of 1SR
  - Two-Phase Locking (2PL) and Two-Phase Commit (2PC)
  - All replicas synchronized eagerly before the transaction returns to the user
- ECO-1SR
  - Optimize transaction execution
  - Optimize transaction commit: combination of eager & lazy replica synchronization

ECO-1SR Optimizations
- Transaction execution: which replica is best suited for executing the transaction?
  - Choose the replica with the highest capacity and freshness, and the lowest cost
- Transaction commit: how many and which replicas to synchronize eagerly?
  - How many: depends on data popularity (i.e., transaction popularity)
  - Which: choose the ones with the highest capacity, highest popularity and lowest cost
  - All other replicas are updated lazily
  - Active refresh in case transactions access stale data
- The 'right' choice reduces the overhead of providing 1SR
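The commit-time choice of which replicas to synchronize eagerly can be sketched as a ranking problem. The sketch below is illustrative only: the slides do not give a scoring formula, so the `Replica` fields, their normalization to the range 0..1 and the equal weighting in `score` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    capacity: float    # normalized to 0..1, higher is better
    popularity: float  # normalized to 0..1, higher is better
    cost: float        # normalized to 0..1, lower is better


def score(replica):
    # Equal weights are an assumption; the slide only names the criteria.
    return replica.capacity + replica.popularity - replica.cost


def split_eager_lazy(replicas, eager_count):
    """Rank replicas and return (eager, lazy) lists.

    eager_count stands in for the "how many" decision, which the slide
    ties to data popularity.
    """
    ranked = sorted(replicas, key=score, reverse=True)
    return ranked[:eager_count], ranked[eager_count:]


replicas = [
    Replica("a", capacity=0.9, popularity=0.8, cost=0.2),
    Replica("b", capacity=0.5, popularity=0.5, cost=0.5),
    Replica("c", capacity=0.7, popularity=0.9, cost=0.4),
]
eager, lazy = split_eager_lazy(replicas, eager_count=2)
print([r.name for r in eager], [r.name for r in lazy])
```

Replicas in the second list would be updated lazily and refreshed actively if a transaction later reads stale data from them.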

Cost-based Data Replication
- How to provide efficient and cost-optimized replication?
  - What is the best number of replicas?
  - What is the best replica type?
  - What is the best replica location?
- Example 1
  - Many highly important transactions are expected in the next period
  - High importance: high profit, or high penalty if constraints are violated!
  - Should more replicas be generated? Which replica types are more cost-effective?
- Example 2
  - Legal issues: data can be stored/replicated only in specific locations
  - Specify replication policies per data object or application
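Example 2 combines a legal constraint with a cost goal, which can be sketched as "filter by allowed locations, then take the cheapest". The function name, the location names and the per-replica costs below are hypothetical and only illustrate the idea.

```python
def pick_locations(candidates, allowed, replica_count):
    """Pick the cheapest locations that the replication policy permits.

    candidates: dict mapping location name to a hypothetical cost per replica
    allowed: set of locations permitted by the (hypothetical) policy
    """
    legal = {loc: cost for loc, cost in candidates.items() if loc in allowed}
    if len(legal) < replica_count:
        raise ValueError("replication policy cannot be satisfied")
    return sorted(legal, key=legal.get)[:replica_count]


candidates = {"eu-west": 3.0, "us-east": 1.0, "eu-central": 2.0, "ch-zurich": 4.0}
allowed = {"eu-west", "eu-central", "ch-zurich"}
chosen = pick_locations(candidates, allowed, replica_count=2)
print(chosen)
```

The cheapest location overall is skipped here because the policy does not permit it.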

Data Archiving - Motivation
- Assumption: freshness-aware key-value store
  - Timestamped/versioned data items
  - Freshness-centric operations
- Observation:
  - Different versions of the same data item exist concurrently, by design
  - Different applications/clients have different freshness requirements
[Figure: four replicas of the key CharliesAccount connected by eager and lazy replication. Two replicas hold value 200.00 (timestamp 3), one holds 1000.00 (timestamp 1) and one holds 500.00 (timestamp 2).]

Data Archiving - Motivation (continued)
- Idea: do not merely deal with these properties, but exploit them
- Goals:
  - Archive cloud data in situ
  - Allow for policy enforcement
  - More freshness-centric operations
  - Enhanced replica placement

K/V Store Infrastructure
- Example: freshness-aware K/V store for a webshop
  - Few writes, many reads: few update-, many read*-semantics
  - read(): latest local data sufficient
[Figure: the webshop issues update(items on stock) and read(items on stock). Updates propagate between replicas by eager replication and by lazy/pull replication. A replica holds itemsonstock: 100, timestamp: 1.]

K/V Store Infrastructure II
- Example: freshness-aware K/V store for a webshop
  - Few writes, many reads: few update-, many read*-semantics
  - read(): latest local data sufficient
  - readnotolderthan(time t): when a certain freshness level is required
    - Possibly route / refresh necessary
    - The operation is feasible, but potentially expensive
[Figure: the webshop issues readnotolderthan(7). Replicas hold itemsonstock: 10 (timestamp 7), itemsonstock: 80 (timestamp 3) and itemsonstock: 100 (timestamp 1). The request is served via refresh() or route().]
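A freshness-bounded read of this kind can be sketched as "serve locally if fresh enough, otherwise fetch from a fresher replica". The `Node` class and the function below are hypothetical; in particular, pulling the fresher version into the local node is only one possible reading of refresh(), and returning the source name stands in for route().

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    value: int
    timestamp: int


def read_not_older_than(t, local, others):
    """Return (value, name of the node that had the data).

    Serves from the local node if its version has timestamp >= t.
    Otherwise it takes the freshest remote version that satisfies t and
    refreshes the local node with it (assumed refresh semantics).
    """
    if local.timestamp >= t:
        return local.value, local.name
    fresh = [node for node in others if node.timestamp >= t]
    if not fresh:
        raise LookupError(f"no replica holds a version with timestamp >= {t}")
    source = max(fresh, key=lambda node: node.timestamp)
    local.value, local.timestamp = source.value, source.timestamp
    return local.value, source.name


# Values taken from the figure on this slide.
local = Node("local", value=100, timestamp=1)
others = [Node("r1", value=80, timestamp=3), Node("r2", value=10, timestamp=7)]
first = read_not_older_than(7, local, others)
second = read_not_older_than(5, local, others)
print(first, second)
```

The first call has to reach a remote replica, which is the potentially expensive case. The second call is served locally because the refresh left a sufficiently fresh version behind.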

K/V Store Infrastructure III
- Other read-semantics are necessary as well. However, they are either expensive or impossible (or even both!)
- Examples:
  - readnotyoungerthan(t): How is the stock of a book developing?
  - readbetween(t1, t2): How was it yesterday?
  - readallbetween(t1, t2): How exactly did it fluctuate during our sales week?
- Data has to be reconstructed via local logs or route/defresh.
[Figure: the webshop issues readnotyoungerthan(2). Replicas hold itemsonstock: 80 (timestamp 3) and itemsonstock: 100 (timestamp 1). The request is served from a local log or via route/defresh().]

Archive Infrastructure
- An archive helps to roll back time
- Different archive layouts are possible:
  - Dedicated archive
  - Integrated archive
[Figure: the two layouts, each showing Update, Archive update and Archive read flows between nodes.]

Data Archiving with Policies
- Control over archiving is necessary
  - Which data?
  - How fast?
  - How widely spread?
  - What about successive updates?
  - How long? (minimal, maximal TTL)
  - Who may access data?
  - How much $ to spend?
  - etc.
- Policies are needed

Data Archiving with Policies: example data policy
Each question maps to one policy field:
- Which data? Key: CharliesAccount
- How fast? Speed: eager
- How widely spread? Spread: max
- What about successive updates? Successive updates: infinite
- How long? TTLmin: 10 years, TTLmax: 10 years
- Who may access data? Access: charlie;accountmanager
- How much $ to spend? Budget: infinite
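The example data policy can be written down as a small record type. This is a sketch with assumed semantics: the class, the field types and the helper methods are hypothetical, and reading TTLmin as "must be kept at least this long" and TTLmax as "must be gone after this long" is an interpretation, not something the slides state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivePolicy:
    key: str
    speed: str                 # e.g. "eager" or "lazy"
    spread: str                # e.g. "max"
    successive_updates: float  # number of versions to keep, math.inf = all
    ttl_min_years: float
    ttl_max_years: float
    access: frozenset
    budget: float              # math.inf = unlimited

    def may_read(self, principal):
        return principal in self.access

    def may_delete(self, age_years):
        # Assumed meaning of TTLmin: minimum retention period.
        return age_years >= self.ttl_min_years

    def must_delete(self, age_years):
        # Assumed meaning of TTLmax: maximum retention period.
        return age_years >= self.ttl_max_years


# Values taken from the example policy on the slide.
policy = ArchivePolicy(
    key="CharliesAccount",
    speed="eager",
    spread="max",
    successive_updates=math.inf,
    ttl_min_years=10,
    ttl_max_years=10,
    access=frozenset({"charlie", "accountmanager"}),
    budget=math.inf,
)
print(policy.may_read("charlie"), policy.may_delete(3), policy.must_delete(10))
```

With TTLmin equal to TTLmax, as in the example, archived versions are kept for exactly ten years under this reading.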

*-Semantics
- Rich(er) read*-semantics can be handled efficiently with the archive
  - Without archive: read(), readnotolderthan(t)
  - With archive: readnotyoungerthan(t), readbetween(t1, t2), readallbetween(t1, t2)
[Figure: production replicas hold versions t = 5, t = 3 and t = 2. Archive nodes hold version ranges t = 3..5 and t = 1..4. The webshop issues readallbetween(3,4) and readallbetween(2,4), which are served in Steps 1 to 3.]
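Why an archive makes the richer reads cheap can be seen from a versioned store kept sorted by timestamp, where each range read becomes a binary search. The `Archive` class below is a hypothetical in-memory sketch, and the exact semantics are assumptions: `read_not_younger_than(t)` is taken to return the latest version with timestamp at most t, and `read_all_between(t1, t2)` all versions with timestamps in the closed interval.

```python
import bisect


class Archive:
    """Versions of one data item, kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def archive_update(self, timestamp, value):
        i = bisect.bisect_left(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def read_all_between(self, t1, t2):
        lo = bisect.bisect_left(self._timestamps, t1)
        hi = bisect.bisect_right(self._timestamps, t2)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))

    def read_not_younger_than(self, t):
        i = bisect.bisect_right(self._timestamps, t)
        if i == 0:
            raise LookupError(f"no version with timestamp <= {t}")
        return self._timestamps[i - 1], self._values[i - 1]


# Stock levels reused from the webshop figures.
archive = Archive()
for ts, stock in [(7, 10), (1, 100), (3, 80)]:
    archive.archive_update(ts, stock)

print(archive.read_all_between(2, 7))
print(archive.read_not_younger_than(2))
```

A readbetween(t1, t2) that needs only one version could return any element of the `read_all_between` result.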

Putting the Pieces Together
- High-level, conceptual vision of a modular & policy-based DMS
- Concepts of:
  - Data Consistency
  - Data Replication
  - Data Archiving
- Recall: we want to reach policy-based and modular DMSs
- To put the pieces together, we built a concrete architecture

Architecture of POLAR DMS
- Based on OSGi
  - Module-based framework for Java
  - Facilitates run-time changes of modules
- Basic architecture
  - Implement core modules: Data Access, Routing, etc.
  - Implement additional functional modules: C3, ECO-1SR, Data Archiving, etc.
- Policy-controlled module selection
  - Finding the optimal module selection and configuration
[Figure: POLAR DMS core with the policy system and extensions such as S3 Access, ECO-1SR and C3.]
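Policy-controlled module selection can be illustrated with a small registry that installs and removes modules at run time and picks one provider per required capability. This is a language-neutral sketch in Python, not OSGi code and not the POLAR DMS design: the `ModuleRegistry` class, the capability names and the per-module costs are hypothetical, and picking the cheapest provider per capability is only optimal when modules do not interact.

```python
class ModuleRegistry:
    """Toy stand-in for a run-time module framework."""

    def __init__(self):
        self._modules = {}  # name -> (set of capabilities, cost)

    def install(self, name, capabilities, cost):
        self._modules[name] = (set(capabilities), cost)

    def uninstall(self, name):
        del self._modules[name]

    def select(self, required):
        """Map each required capability to its cheapest installed provider."""
        chosen = {}
        for capability in required:
            providers = [
                (cost, name)
                for name, (caps, cost) in self._modules.items()
                if capability in caps
            ]
            if not providers:
                raise LookupError(f"no module provides {capability!r}")
            chosen[capability] = min(providers)[1]
        return chosen


registry = ModuleRegistry()
registry.install("C3", {"consistency"}, cost=2.0)
registry.install("ECO-1SR", {"consistency"}, cost=5.0)
registry.install("Data Archiving", {"archiving"}, cost=1.0)

before = registry.select(["consistency", "archiving"])
registry.uninstall("C3")  # run-time change of the module set
after = registry.select(["consistency"])
print(before, after)
```

Removing a module changes the outcome of the next selection without restarting anything, which is the property the slide attributes to the OSGi-based architecture.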

Conclusion
- POLAR DMS consists of a unified model for categorizing, assessing and creating Data Management Systems (DMS)
- This is achieved by a policy-driven framework architecture
- We built concrete functional modules, enabling cost-effective and policy-based:
  - Data Consistency
  - Data Replication
  - Data Archiving
- We provide a concrete architecture of POLAR DMS

Thank You! Questions?

References
[1] Ilir Fetai and Heiko Schuldt. Cost-Based Data Consistency in a Data-as-a-Service Cloud Environment. Proceedings of the 5th International Conference on Cloud Computing (CLOUD 2012), Honolulu, HI, USA, June 2012.
[2] Ilir Fetai and Heiko Schuldt. Cost-Based Adaptive Concurrency Control in the Cloud. Technical Report, Department of Mathematics and Computer Science, University of Basel, February 2012.
[3] Kraska et al. Consistency rationing in the cloud: pay only when it matters. Proc. VLDB Endow., 2009.