Amr El Abbadi. Computer Science, UC Santa Barbara amr@cs.ucsb.edu



Similar documents
Divy Agrawal and Amr El Abbadi Department of Computer Science University of California at Santa Barbara

Big Data, Deep Learning and Other Allegories: Scalability and Fault- tolerance of Parallel and Distributed Infrastructures.

Scalable and Elastic Transactional Data Stores for Cloud Computing Platforms

How To Manage A Multi-Tenant Database In A Cloud Platform

Transactions Management in Cloud Computing

A Taxonomy of Partitioned Replicated Cloud-based Database Systems

Aaron J. Elmore, Carlo Curino, Divyakant Agrawal, Amr El Abbadi. cs.ucsb.edu microsoft.com

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

TRANSACTION MANAGEMENT TECHNIQUES AND PRACTICES IN CURRENT CLOUD COMPUTING ENVIRONMENTS : A SURVEY

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

Introduction to NOSQL

DATABASE REPLICATION A TALE OF RESEARCH ACROSS COMMUNITIES

Hosting Transaction Based Applications on Cloud

Cassandra A Decentralized Structured Storage System

Low-Latency Multi-Datacenter Databases using Replicated Commit

Scalable Transaction Management with Snapshot Isolation on Cloud Data Management Systems

High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Do Relational Databases Belong in the Cloud? Michael Stiefel

bigdata Managing Scale in Ontological Systems

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Serializability, not Serial: Concurrency Control and Availability in Multi Datacenter Datastores

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Benchmarking and Analysis of NoSQL Technologies

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

MASTER PROJECT. Resource Provisioning for NoSQL Datastores

Managing Geo-replicated Data in Multi-datacenters

Scalable Transaction Management on Cloud Data Management Systems

Elasticity Primitives for Database as a Service

Distributed Systems. Tutorial 12 Cassandra

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

Performance of Scalable Data Stores in Cloud

A Survey of Distributed Database Management Systems

Structured Data Storage

Big Data Storage Architecture Design in Cloud Computing

How To Write A Database Program

NoSQL Data Base Basics

Slave. Master. Research Scholar, Bharathiar University

Practical Cassandra. Vitalii

MyDBaaS: A Framework for Database-as-a-Service Monitoring

Cloud Computing at Google. Architecture

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division

CLOUD DATA MANAGEMENT: SYSTEM INSTANCES AND CURRENT RESEARCH

Database as a Service Polystore Systems Self-Managed Elastic Systems

Preparing Your Data For Cloud

Introduction to Apache Cassandra

Lecture Data Warehouse Systems

Challenges for Data Driven Systems

Cloud Computing Is In Your Future

Data Consistency on Private Cloud Storage System

From the Monolith to Microservices: Evolving Your Architecture to Scale. Randy linkedin.com/in/randyshoup

NoSQL. Thomas Neumann 1 / 22

How To Build Cloud Storage On Google.Com

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Secure Cloud Transactions by Performance, Accuracy, and Precision

An Approach to Implement Map Reduce with NoSQL Databases

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

Introduction to Database Systems CSE 444

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

Data Management Challenges in Cloud Computing Infrastructures

NoSQL Databases. Nikos Parlavantzas

The Cloud to the rescue!

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

Data Management in the Cloud. Zhen Shi

Cloud Computing with Microsoft Azure

Distributed Architecture of Oracle Database In-memory

Database Scalability {Patterns} / Robert Treat

Yahoo! Cloud Serving Benchmark

TECHNICAL OVERVIEW HIGH PERFORMANCE, SCALE-OUT RDBMS FOR FAST DATA APPS RE- QUIRING REAL-TIME ANALYTICS WITH TRANSACTIONS.

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

On the Design and Scalability of Distributed Shared-Data Databases

nosql and Non Relational Databases

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

INTRODUCTION TO CASSANDRA

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1

LARGE-SCALE DATA STORAGE APPLICATIONS

Transcription:

Amr El Abbadi Computer Science, UC Santa Barbara amr@cs.ucsb.edu Collaborators: Divy Agrawal, Sudipto Das, Aaron Elmore, Hatem Mahmoud, Faisal Nawab, and Stacy Patterson.

Client Site Client Site Client Site Load Balancer (Proxy) App Server App Server App Server App Server App Server Database becomes the Scalability Bottleneck MySQL Master DB Cannot leverage elasticity

Client Site Client Site Client Site Load Balancer (Proxy) App Server App Server App Server App Server App Server MySQL Master DB

Client Site Client Site Client Site Load Balancer (Proxy) App Server App Server App Server App Server App Server Key Value Scalable and Elastic, but limited consistency and Stores operational flexibility

Operations on a single row are atomic. Objective: make read operations single-sited! Scalability and Elasticity: Data is partitioned across multiple servers. Bigtable, PNUTS, Dynamo, Hypertable, Cassandra, Voldemort

Scale-up Classical enterprise setting (RDBMS) Flexible ACID transactions Transactions in a single node Scale-out Cloud friendly (Key value stores) Execution at a single server Limited functionality & guarantees No multi-row or multi-step transactions

Application developers need higher-level abstractions: MapReduce paradigm for Big Data analysis Transaction Management in DBMSs

It s nice to have JOINs It s nice to have transactions After 30 years of development, it seems that SQL Databases have some solid features, like the query analyzer. NoSQL is like the Wild West; SQL is civilization Gee, there sure are a lot of tools oriented toward SQL Databases. Peter Wayner at InfoWorld Seven Hard Truths about NoSQL technologies July 2012.

RDBMS Fission Fusion Key Value Stores ElasTraS [HotCloud 09, TODS] Cloud SQL Server [ICDE 11] RelationalCloud [CIDR 11] G-Store [SoCC 10] MegaStore [CIDR 11] ecstore [VLDB 10] Walter [SOSP 11]

These systems question the wisdom of abandoning the proven data management principles Gradual realization of the value of the concept of a transaction and other synchronization mechanisms Avoid distributed transactions by co-locating data items that are accessed together

Pre-defined partitioning scheme e.g.: Tree schema ElasTras, SQLAzure (TPC-C) Workload driven partitioning scheme e.g.: Schism in RelationalCloud

TM Master Health and Load Management OTM OTM Lease Management OTM Durable Writes Metadata Manager Master and MM Proxies Txn Manager P 1 P 2 P n Log Manager DB Partitions Distributed Fault-tolerant Storage

Semantically pre-defined as Entity Groups Blogs, email, maps Cheap transactions in Entity groups (common)

Grouping Protocol Key Group Ownership of keys at a single node One key selected as the leader Followers transfer ownership of keys to leader

How does the leader execute transactions? Caches data for group members underlying data store equivalent to a disk Transaction logging for durability Cache asynchronously flushed to propagate updates Guaranteed update propagation Leader Transaction Manager Cache Manager Log Followers Asynchronous update Propagation

Load Balancer Application/ Web/Caching tier Database tier

Decoupled storage architectures ElasTraS, G-Store, Deuteronomy, MegaStore Persistent data is not migrated Albatross [VLDB 2011] Shared nothing architectures SQL Azure, Relational Cloud, MySQL Cluster Migrate persistent data Zephyr [SIGMOD 2011]

Need to tolerate catastrophic failures Geographic Replication How to support ACID transactions over data replicated at multiple datacenters One-copy serializablity: Gives Consistency and Replication. Clients can access data in any datacenter, appears as single copy with atomic access Major challenges: Latency bottleneck (cross data center communication) Distributed synchronization Atomic commitment

The Paxos Approach

Megastore-Google (CIDR11) PaxosCP-UCSB (VLDB12) user User User User Paxos Paxos Log position 1 2 3 4 5 Log position 1 2 3 4 5 Transactio n α β γ Transactio n α β γ

MDCC Berkeley (EuroSys 13) User user Paxos User User Log position Transactio n 1 2 3 4 5 α β γ Log position Transactio n 1 2 3 4 5 α β γ

Asynchronous coordination

Message Futures UCSB [CIDR 13] User user Replicated Log User Replicated Log

Message Futures Data center A Data center B Log A-12 carries a reservation with value 12 Transactions requesting to commit in the green area are assigned reservation number 12 Green area transactions commit at point 13 if no conflicts observed with Log B 12 13 Log A-12 Log B Transactions in the red area and earlier are included in Log B Log B acknowledges the receipt of reservation 12

Message Futures cases Data center B sends Logs at a higher rate. DC A DC B 12 Immediate commits A new transaction in the immediate commit zone will have its reservation (12) already acknowledged

Transaction execution ON fault-tolerant replicated storage

Spanner Google [OSDI 12] User user 2PC Paxos Paxos Leader Paxos Leader Paxos Leader

Replication on Consistent ACID Data Centers

Replicated Commit --UCSB [inprogress] User user 2PC Paxos Data center Leader Data center Leader

The Cloud is the inevitable choice for hosting Big Data Ecosystems. The Cloud poses unique challenges: Scalability, elasticity, and performance Distribution and Consistency Fault-tolerance We are in the era of Globalization REALLY!