Cloud Computing - A Database Perspective. Donald Kossmann Systems Group, ETH Zurich http://systems.ethz.ch





Agenda
- Promises of Cloud Computing
- Benchmarking the State of the Art
  - Amazon, Google, Microsoft
- Towards a Silver Bullet
  - Asking the right questions

The Cloud 15 Years Ago
[diagram: Browsers, http, Internet, Service 1 with DB 1, Service 2 with DB 2]

The Cloud Today
[diagram: Browsers, http, Internet, Service 1 with DB 1, Service 2 with DB 2]

Promises of Cloud Computing
- Cost
  - reduce cost: utilization, commoditization, optimization; prevention of failures, having failures
  - pay as you go (cap-ex vs. op-ex)
  - easy to test!
- Time to market
  - avoid unnecessary steps: HW provisioning, purchasing, testing
  - difficult to test! can only test scalability / elasticity

Cow vs. Supermarket
- Cow
  - Big upfront investment
  - Produces more than you need
  - Uses up resources
  - Maintenance needed
  - Unpleasant waste product
- Supermarket: Bottle of Milk
  - Small continuous expense
  - Buy what you need
  - Less resources needed
  - No maintenance
  - Waste is not my problem
[Abadi 2009], [Kraska 2009]

Questions to ask
- Questions you ask your supermarket
  - How expensive is one litre of milk?
  - Is the price always the same? What if I have a big party?
  - Can you deliver?
- Questions you do not ask
  - How fast is the supermarket?
  - Good enough is good enough!

What to optimize?

Feature                   Traditional   Cloud
Cost [$]                  fixed         optimize
Performance [tps, secs]   optimize      fixed
Scale-out [#cores]        optimize      fixed
Predictability [σ($)]     -             fixed
Consistency [%]           fixed         ???
Flexibility [#variants]   -             optimize

Put $ on the y-axis of your graphs!!!
[Florescu & Kossmann, SIGMOD Record 2009]

Agenda
- Promises of Cloud Computing
- Benchmarking the State of the Art
  - Amazon, Google, Microsoft
- Towards a Silver Bullet
  - Asking the right questions

What is out there?
- Amazon
  - IaaS: S3 and EC2
  - PaaS: SQS, SimpleDB, RDS
- Google App Engine
  - PaaS: Integrated DB, AppServer, and Cache
- Microsoft Azure
  - PaaS: Windows Azure, SQL Azure, .Net Azure
- Gazillions of start-ups (e.g., 28msec)
- Research prototypes (e.g., Crescando)

Tested Services

                 MS Azure          Google App Eng                  AWS RDS    AWS S3
Business Model   PaaS              PaaS                            PaaS       IaaS
Architecture     Replication       Part. + Repl. (+Dist. Control)  Classic    Distr. Control
Consistency      SI                SI                              Rep. Read  EC
Cloud Provider   Microsoft         Google                          Amazon     Flexible
Web/App Server   .Net Azure        AppEngine                       Tomcat     Tomcat
Database         SQL Azure         DataStore                       MySQL      --
Storage / FS     Simple DataStore  GFS                             --         S3
App-Language     C#                Java/AppEngine                  Java       Java
DB-Language      SQL               GQL                             SQL        Low-Lev. API
HW Config.       Part. automatic   Automatic                       Manual     Manual

Tested Services

                        AWS SimpleDB       AWS MySQL   AWS MySQL/R
Business Model          PaaS               IaaS        IaaS
Architecture            Replication        Classic     Replication
Consistency             EC                 Rep. Read   Rep. Read
Cloud Provider          Amazon             Flexible    Flexible
Web/App Server          Tomcat             Tomcat      Tomcat
Database                SimpleDB           MySQL       MySQL
Storage / File System   --                 EBS         EBS
App-Language            Java               Java        Java
DB-Language             SimpleDB Queries   SQL         SQL
HW Config               Partly Automatic   Manual      Manual

Cost [m$/WI]

EBs            1       10      100     500     1000    Fully Utilized
MySQL          0.635   0.072   0.020   0.006   0.006   0.005
MySQL/R        2.334   0.238   0.034   0.008   0.006   0.005
RDS            1.211   0.126   0.032   0.008   0.006   0.005
SimpleDB       0.384   0.073   0.042   0.039   0.037   0.037
S3             1.304   0.206   0.042   0.019   0.011   0.009
Google AE      0.002   0.028   0.033   0.042   0.176   0.042
Google AE/C    0.002   0.018   0.026   0.028   0.134   0.028
Azure          0.775   0.084   0.023   0.006   0.006   0.005
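One way to read the cost table is to compare what a web interaction costs at the lowest load (1 EB, i.e. emulated browser in TPC-W terms) with what it costs when the system is fully utilized. The sketch below copies the numbers from the table; the helper name `low_load_penalty` is an illustrative choice, not something from the talk.

```python
# Cost per web interaction in m$/WI, copied from the table above.
# Columns: 1, 10, 100, 500, 1000 EBs, then "Fully Utilized".
COST = {
    "MySQL":       [0.635, 0.072, 0.020, 0.006, 0.006, 0.005],
    "MySQL/R":     [2.334, 0.238, 0.034, 0.008, 0.006, 0.005],
    "RDS":         [1.211, 0.126, 0.032, 0.008, 0.006, 0.005],
    "SimpleDB":    [0.384, 0.073, 0.042, 0.039, 0.037, 0.037],
    "S3":          [1.304, 0.206, 0.042, 0.019, 0.011, 0.009],
    "Google AE":   [0.002, 0.028, 0.033, 0.042, 0.176, 0.042],
    "Google AE/C": [0.002, 0.018, 0.026, 0.028, 0.134, 0.028],
    "Azure":       [0.775, 0.084, 0.023, 0.006, 0.006, 0.005],
}


def low_load_penalty(costs):
    """Ratio of cost/WI at 1 EB to cost/WI when fully utilized (hypothetical metric)."""
    return costs[0] / costs[-1]


penalties = {name: low_load_penalty(costs) for name, costs in COST.items()}

# Print services from smallest to largest penalty.
for name, ratio in sorted(penalties.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {ratio:8.1f}x")
```

A ratio far above 1 means an idle system is expensive per interaction; a ratio below 1 (the two Google rows) means low load is cheaper per interaction than peak load.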

Cost predictability [m$/WI]

               mean ± σ
MySQL          0.015 ± 0.077
MySQL/R        0.043 ± 0.284
RDS            0.030 ± 0.154
SimpleDB       0.063 ± 0.089
S3             0.018 ± 0.098
Google AE      0.029 ± 0.016
Google AE/C    0.021 ± 0.011
MS Azure       0.010 ± 0.058

- Ideal: Cost / WI is constant
  - low σ better (ideal: σ=0)
  - cost independent of load
- Google clear winner here!
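The "low σ is better" criterion can be checked directly against the slide's numbers. The sketch below ranks services by σ and, as an added assumption not made on the slide, by the coefficient of variation σ/mean, which normalizes the spread by the average cost.

```python
# (mean, sigma) of cost per web interaction in m$/WI, copied from the slide above.
STATS = {
    "MySQL":       (0.015, 0.077),
    "MySQL/R":     (0.043, 0.284),
    "RDS":         (0.030, 0.154),
    "SimpleDB":    (0.063, 0.089),
    "S3":          (0.018, 0.098),
    "Google AE":   (0.029, 0.016),
    "Google AE/C": (0.021, 0.011),
    "MS Azure":    (0.010, 0.058),
}


def coefficient_of_variation(mean, sigma):
    """Spread relative to the mean; this metric is an assumption, the slide uses sigma."""
    return sigma / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in STATS.items()}

lowest_sigma = min(STATS, key=lambda name: STATS[name][1])
lowest_cv = min(cv, key=cv.get)

print("lowest sigma:", lowest_sigma)
print("lowest sigma/mean:", lowest_cv)
```

Under both measures the two Google App Engine configurations come out ahead, which matches the slide's conclusion.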

Relative Cost Factors
[figure: one panel each for MySQL, Azure, SimpleDB, S3]

Throughput [WIPS]

Overload Behavior
[figure: one panel each for RDS, SimpleDB, App Engine]

Agenda
- Promises of Cloud Computing
- Benchmarking the State of the Art
  - Amazon, Google, Microsoft
- Towards a Silver Bullet
  - Asking the right questions

Reference Architecture

Client
  |  HTTP / XML, JSON, HTML
Web Server
  |  FCGI, ... / XML, JSON, HTML
App Server
  |  SQL / records
DB Server
  |  get/put / block
Store

Open Questions
- How to map stack to IaaS?
- How to implement store layer?
- What consistency model?
- What programming model?
- Whether and how to cache?
[diagram: Client, Web Server, App Server, DB Server, Store]

Variant I: Partition Workload by Tenant
[diagram: Clients send requests (HTTP; XML, JSON, HTML) to a Workload Splitter; behind it, the Web Server, App Server, and DB Server layers each run as Server-A and Server-B (FCGI, ...; SQL / records); the Store layer is split into Store-A and Store-B (get/put block)]

Metaphor: Shopping Mall If a shop is successful, you need to move it! (popularity vs. growth of product assortment)

Partition Workload by Tenant
- Principle
  - partition data by tenant
  - route request to DB of that tenant
- Advantages
  - reuse existing database stack (RDBMS)
  - flexibility to use DAS or SAN/NAS
- Disadvantages
  - multi-tenant problem [Salesforce]: optimization, migration, load balancing, fix cost
  - silos: need DB federator for inter-tenant requests
  - expensive HW and SW for high availability
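The principle of Variant I (partition data by tenant, route each request to that tenant's database) can be sketched as a lookup table in front of independent databases. All class and tenant names below are hypothetical, and a plain dictionary stands in for a full RDBMS stack.

```python
from dataclasses import dataclass, field


@dataclass
class TenantDatabase:
    """Stand-in for one tenant's complete database stack (hypothetical)."""
    name: str
    rows: dict = field(default_factory=dict)


class TenantRouter:
    """Workload splitter: maps a tenant id to the database that owns its data."""

    def __init__(self):
        self._placement = {}  # tenant id -> TenantDatabase

    def place(self, tenant, db):
        self._placement[tenant] = db

    def route(self, tenant):
        try:
            return self._placement[tenant]
        except KeyError:
            raise LookupError(f"no database assigned to tenant {tenant!r}") from None

    def put(self, tenant, key, value):
        self.route(tenant).rows[key] = value

    def get(self, tenant, key):
        return self.route(tenant).rows.get(key)


server_a = TenantDatabase("Server-A")
server_b = TenantDatabase("Server-B")

router = TenantRouter()
router.place("shop-1", server_a)
router.place("shop-2", server_b)

# The same key in two tenants lands in two different databases.
router.put("shop-1", "order:1", {"total": 10})
router.put("shop-2", "order:1", {"total": 99})
```

In this sketch, moving a tenant means copying its rows and calling `place` again, and a request spanning two tenants has no single database to run on, which is where the migration and silo disadvantages listed above show up.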

Variant II: Partition Workload by Request
[diagram: Clients send requests (HTTP; XML, JSON, HTML) to a Workload Splitter; behind it, Server-A and Server-B each combine the Web Server, App Server, and DB Server layers (FCGI, ...; SQL / records); the layer between servers and storage is marked ???; storage is a set of shared stores (e.g., S3) accessed via get/put block]

Metaphor: Internet Department Store
- If a product is successful, you stock up its supply
- Transparent and fine-grained reprovisioning
- Cost of reprovisioning much lower!!!
[diagram: several Server-A instances and a Server on top of shared stores (e.g., S3)]

Partition Workload by Request
- Principle
  - fine-grained data partitioning by page or object
  - any server can handle any request
  - implement DBMS as a library (not server)
- Advantages
  - avoids disadvantages of Variant I
- Disadvantages
  - new synchronization problem
  - whole new breed of systems
  - caching not effective
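A minimal sketch of the Variant II principle, assuming a shared get/put store and a round-robin workload splitter (both are illustrative stand-ins, not the systems measured in the talk). The servers keep no data of their own, so any of them can serve any request.

```python
import itertools


class SharedStore:
    """Hypothetical get/put store shared by all servers (an S3-like service)."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects.get(key)

    def put(self, key, value):
        self._objects[key] = value


class StatelessServer:
    """Runs the data-management logic as a library call against the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.handled = 0  # bookkeeping for this demo only, not application state

    def handle(self, op, key, value=None):
        self.handled += 1
        if op == "put":
            self.store.put(key, value)
            return value
        return self.store.get(key)


store = SharedStore()
servers = [StatelessServer(f"server-{i}", store) for i in range(3)]
dispatch = itertools.cycle(servers)  # round-robin workload splitter

# The write and the following read are served by two different servers.
next(dispatch).handle("put", "item:42", {"stock": 5})
result = next(dispatch).handle("get", "item:42")
```

The sketch deliberately ignores concurrency: two servers updating the same object at once would need coordination, which is the new synchronization problem named in the list above.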

Experiments Revisited: Cost / WI (m$)

                     Low Load   Peak Load
Variant 1 (RDS)      1.212      0.005
Variant 2 (S3)       -          0.007
Variant 2 (Google)   0.002      0.028
Variant 1 (Azure)    0.775      0.005

Only V2 is cheap and predictable!

Throughput [WIPS]
[figure: series labelled V1, V2; V1; V1, V2]
- High performance can be achieved with both!
- Warning: Still a hypothesis. S3 is only EC.

Conclusion & Future Work
- Trying to define the problem
  - benchmark what is out there
  - work with players: Enterprise Computing Center (Amadeus, Credit Suisse, Microsoft, SAP)
- Find silver bullet for data management in cloud
  - Develop reference architectures

Reference
Kossmann, Kraska: Data Management in the Cloud: Promises, State-of-the-art, and Open Questions. Datenbank Spektrum, 2010. http://www.pubzone.org/dblp/journals/dbsk/kossmannk10
(Published by Springer Open Access.)
(Summary of several talks and papers of the DB community on cloud computing.)

Open Questions
- How to map traditional DB stack to IaaS?
- How to implement the storage layer?
- What is the right consistency model?
- What is the right programming model?
- Whether and how to make use of caching?

Storage Layer Questions
- What is the interface?
  - get/put or more?
  - return records or blocks?
- How to implement this interface?
  - data structures
  - storage media and network
  - clustering of data
  - granularity of partitioning / sharding
[diagram: Client, Web Server, App Server, DB Server, get/put block, Store]

API of Store
- Assumption: Hardware trends will change the game
  - most data in MM or SSD / PCM
- Proposition 1: return tuples rather than blocks
  - blocks are an artifact of old HW (i.e., disks)
- Proposition 2: push down predicates
  - because it can be done (power depends on impl.)
  - compensates for cost of remote storage
  - (old idea; e.g., idisks)
- Open: consistency contract of store?
  - store might have nice properties (e.g., monotonic writes)
  - store might have nice primitives (e.g., counter, set/test)
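The two propositions can be illustrated by contrasting a block interface with a tuple interface that evaluates a predicate inside the store. This is a sketch under assumptions: `TupleStore`, its method names, and the example schema are all hypothetical, and `rows_shipped` simply counts what would cross the network.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, str, float]  # (id, name, price): an example schema, not from the talk


class TupleStore:
    """Hypothetical store offering both a block API and a tuple API."""

    def __init__(self, rows: List[Row], block_size: int = 2):
        self._rows = list(rows)
        self._block_size = block_size
        self.rows_shipped = 0  # rows sent to the caller, for comparison

    def get_block(self, block_no: int) -> List[Row]:
        """Block API: ships every row in the block, relevant or not."""
        start = block_no * self._block_size
        block = self._rows[start:start + self._block_size]
        self.rows_shipped += len(block)
        return block

    def scan(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        """Tuple API with predicate push-down: ships only matching rows."""
        for row in self._rows:
            if predicate(row):
                self.rows_shipped += 1
                yield row


rows = [(1, "milk", 1.2), (2, "bread", 2.5), (3, "cheese", 7.0), (4, "wine", 12.0)]

# Block interface: fetch both blocks, filter on the client side.
block_store = TupleStore(rows)
client_filtered = [r for b in range(2) for r in block_store.get_block(b) if r[2] > 5]

# Tuple interface: the store applies the filter before shipping anything.
tuple_store = TupleStore(rows)
pushed_down = list(tuple_store.scan(lambda r: r[2] > 5))
```

Both paths return the same rows; the difference is how many rows leave the store, which is the cost that push-down is meant to offset when storage is remote.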

Store Implementation Variants
- DAS
  - local disks with physically exclusive access
  - put/get interface; no synchronization
  - only works for V1
- Key-value stores (e.g., Dynamo)
  - DHTs with concurrent access
  - put/get interface; no synchronization
  - works for V1 and V2; makes more sense for V2
- A DBMS (e.g., MySQL, SQL Server)
  - have been misused before
- ClockScan [Unterbrunner et al. 2009]
  - massively shared scans in a distributed system

Open Questions
- How to map traditional DB stack to IaaS?
- How to implement the storage layer?
- What is the right consistency model?
- What is the right programming model?
- Whether and how to make use of caching?

CAP Theorem
- Three properties of distributed systems
  - Consistency (ACID transactions w. serializability)
  - Availability (nobody is ever blocked)
  - resilience to network Partitioning
- Result
  - it is trivial to achieve 2 out of 3
  - it is impossible to have all three
- Highly controversial discussion [Brantner 08, Abadi 10]
  - Levels of availability (always vs. never available)
  - What is the difference between CP and CA?

What have people done?
- Client-side Consistency Models [Tanenbaum]
  - read & write monotonicity, session consistency, ...
- Time-line consistency [PNUTS08]
  - all writes are consistent (except across DC)
  - read consistency: either don't care or latest version
- New DB transaction models
  - Escrow, Reservation Pattern [O'Neil 86], [Gawlick 09]
  - SAGAs and compensation; e.g., in BPEL [G.-Molina, Salem]
  - SAP, Amadeus et al. [Buck-Emden], [Kemper et al. 98]
- Define logical unit of consistency [Azure]
  - strong consistency within LuC
  - app-defined consistency across LuCs
- Educate Application Developers [Helland 2009]
Callouts: Sacrifice Consistency. Push problem to app programmer!
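As one concrete instance of a client-side consistency model, the sketch below enforces monotonic reads within a session: the client remembers the highest version it has seen and refuses an answer from a replica that is further behind. The versioning scheme and all names are assumptions made for illustration, and only a single object is modelled.

```python
class StaleRead(Exception):
    """Raised when a replica is older than what the session has already seen."""


class Replica:
    """Hypothetical replica holding one versioned value."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Accept only updates that are newer than the current state.
        if version > self.version:
            self.version, self.value = version, value

    def read(self):
        return self.version, self.value


class Session:
    """Client-side guard that gives monotonic reads across replicas."""

    def __init__(self):
        self.last_seen = 0

    def read(self, replica):
        version, value = replica.read()
        if version < self.last_seen:
            raise StaleRead(f"replica at v{version}, session already saw v{self.last_seen}")
        self.last_seen = version
        return value


fresh, stale = Replica(), Replica()
fresh.apply(1, "a")
stale.apply(1, "a")
fresh.apply(2, "b")  # this update has not yet reached the stale replica

session = Session()
first = session.read(fresh)
try:
    session.read(stale)
    rejected = False
except StaleRead:
    rejected = True
```

On a rejected read the application has to retry elsewhere or wait, which is one way the problem ends up with the app programmer.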

What have we done?
- Levels of Consistency, Cost Tradeoffs [Brantner08]
  - can achieve read/write monotonicity + A + P
  - the more consistency, the higher the cost
- Economic models for consistency [Amadeus], [Kraska09]
  - Classify the data as A, B, C
  - A data always strong consistent
  - C data always eventually consistent
  - B data handled like A or C data adaptively
  - Cost model that estimates business impact of inconsistency
Callouts: Follow tradition to push problem to app programmer! But for a different reason!!!
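The A/B/C classification can be sketched as a small decision function. The rule used for B data here (pick strong consistency when the estimated cost of an inconsistency exceeds the extra cost of strong consistency) is an assumption for illustration; the slide only says that B data is handled adaptively with the help of a cost model.

```python
from enum import Enum


class Category(Enum):
    A = "A"  # always strongly consistent
    B = "B"  # decided adaptively
    C = "C"  # always eventually consistent


def choose_consistency(category, expected_inconsistency_cost=0.0,
                       strong_consistency_cost=0.0):
    """Return "strong" or "eventual" for one operation.

    Both cost arguments are hypothetical inputs from a cost model and
    are only consulted for B data.
    """
    if category is Category.A:
        return "strong"
    if category is Category.C:
        return "eventual"
    # B data: pay for strong consistency only when inconsistency would cost more.
    if expected_inconsistency_cost > strong_consistency_cost:
        return "strong"
    return "eventual"


# Example: the same B item is treated differently as its risk changes.
quiet_period = choose_consistency(Category.B, 0.01, 0.10)
busy_period = choose_consistency(Category.B, 0.50, 0.10)
```

The cost figures passed in would come from whatever model estimates the business impact of an inconsistency; the values above are made up.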

Cost per 1000 TAs [$] (TPC-W)

                            Total   Adjustable   Fixed
Nothing                     0.15    0            0.15
Durability                  1.8     1.1          0.7
Monotonicity                2.1     1.4          0.7
Monotonicity + Atomicity    2.9     2.6          0.3

[Brantner+08]

Latency per TA [secs] (TPC-W)

                            Avg.   Max.
Nothing                     11.3   12.1
Durability                  4.0    5.9
Monotonicity                4.0    6.8
Monotonicity + Atomicity    2.8    4.6

[Brantner+08]

Open Questions
- How to map traditional DB stack to IaaS?
- How to implement the storage layer?
- What is the right consistency model?
- What is the right programming model?
- Whether and how to make use of caching?

Programming Model
- Properties of a programming lang. for the cloud
  - support DB-style + OO-style + CEP-style
  - avoid keeping state at app servers for V2
- Many languages will work in the cloud
  - SQL, XQuery, Ruby, .Net/LINQ, ...; J2EE will not work
- Open (research) questions
  - do OLAP on the OLTP data: My guess is yes!
  - rewrite your apps: My guess is yes!

Caching
- Many Variants Possible
  - this is just one
- V1: caching mandatory
- V2: caching prohibitive
- TPC-W Experiments
  - marginal improvements for Google AppEngine
- No low hanging fruit

Agenda
- Promises of Cloud Computing
- Benchmarking the State of the Art [SIGMOD 2010]
  - Amazon, Google, Microsoft
- Towards a Silver Bullet [ICDE 2010]
  - Asking the right questions

Case Study: Bets Made by 28msec
- How to map traditional DB stack to IaaS?
  - implemented both architectures (V1 + V2)
  - V1 only in a single server variant for low end
- How to implement the storage layer?
  - EBS for V1; KVS for V2
- What is the right consistency model?
  - ACID for V1; configurable for V2
- What is the right data + programming model?
  - XML & XQuery
- Whether and how to make use of caching?
  - No! (Only for code / precompiled query plans)

Conclusion & Future Work
- Hopefully started a new benchmark war
  - Tested elasticity, cost and cost predictability
  - Vendors interested in cost experiments
  - Academia interested in scalability experiments
  - More bits available: http://www.pubzone.org
- Find silver bullet for data management in cloud
  - Develop a reference architecture
  - Combine partitioning + replication + distributed control
- ETH Systems Group: http://www.systems.ethz.ch