Middle-R: A Middleware for Scalable Database Replication Ricardo Jiménez-Peris Distributed Systems Laboratory Universidad Politécnica de Madrid (UPM) Joint work with Bettina Kemme, Marta Patiño, Yi Lin and Damian Serrano.
Contents
- Motivation
- Snapshot Isolation
- One-Copy Snapshot Isolation
- Partial Replication
- Other Issues
Motivation
Current solutions to data replication exhibit limited scalability for three main reasons:
- Restrictive concurrency/replica control.
- Full replication.
- Engineering of data replication (e.g. writeset extraction).
This talk addresses all three issues.
Concurrency and Replica Control Bottlenecks
1-copy serializability has been the traditional correctness criterion. What is wrong with it?
- Read-write conflicts restrict potential concurrency...
- ...and even more so in a replicated setting.
What is the alternative? Snapshot isolation (the Oracle success story).
Serializability
The result is the same as executing the transactions serially. Conflicts: read/write and write/write.
[Figure: timelines of T0 (w(x)), T1 (r(x), w(x), w(y)) and T2 (r(x), w(x)), showing a concurrent execution and an equivalent serial one.]
Snapshot Isolation
Transactions read from a snapshot of the committed data taken when the transaction starts. Conflict: only write/write.
[Figure: timelines of T0, T1 and T2; reads are served from each transaction's start snapshot, and only write/write conflicts block.]
Snapshot Isolation: Atomicity
- Provides each transaction with a snapshot of the committed state of the database as of transaction start.
- Thanks to multi-versioning it avoids read-write conflicts.
- Concurrent writes are forbidden: write-write conflicts.
- Splits atomicity into two points: one at the start of the transaction, where all reads happen atomically, and another at commit time, where all writes happen atomically.
Snapshot Isolation: What is Lost? Write Skew
We want to enforce the integrity constraint x + y > 0. Initial values: x = 50, y = 50.
Two concurrent transactions:
- T1: extract 50 euros from x. It reads the snapshot (x=50, y=50), checks x + y = 0 + 50 > 0 after its update, and writes x = 0.
- T2: extract 80 euros from y. It reads the same snapshot (x=50, y=50), checks x + y = 50 - 30 > 0 after its update, and writes y = -30.
The writesets {x} and {y} do not overlap, so SI lets both commit. Final values: x = 0, y = -30, so x + y = -30 < 0 and the constraint is violated.
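The write-skew scenario can be replayed with a minimal simulation of SI's rules (plain Python, not Middle-R code; all names are illustrative): each transaction reads its start snapshot, and validation only checks write-write conflicts, so the anomaly slips through.

```python
def run_write_skew():
    db = {"x": 50, "y": 50}           # committed state, invariant: x + y > 0

    snap1 = dict(db)                  # T1 starts: snapshot of committed data
    snap2 = dict(db)                  # T2 starts concurrently, same snapshot

    # T1 withdraws 50 from x; the check passes on T1's snapshot.
    assert snap1["x"] - 50 + snap1["y"] > 0
    ws1 = {"x": snap1["x"] - 50}

    # T2 withdraws 80 from y; the check also passes on T2's snapshot.
    assert snap2["x"] + snap2["y"] - 80 > 0
    ws2 = {"y": snap2["y"] - 80}

    # SI validation: abort only on overlapping writesets.
    # {x} and {y} are disjoint, so both commit.
    assert not (ws1.keys() & ws2.keys())
    db.update(ws1)
    db.update(ws2)
    return db

final = run_write_skew()
print(final, "x+y =", final["x"] + final["y"])   # x+y = -30: invariant broken
```

Each transaction's own check is honest against its snapshot; it is the combination of the two disjoint writes that breaks the invariant, which is exactly what write/write conflict detection cannot see.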
1-Copy-SI [SIGMOD 05, TODS 09]
1-Copy-SI defines 1-copy correctness for SI databases: it allows exactly the executions possible in a centralized SI database and disallows all others. In practical terms:
- The DB at each replica provides SI.
- The system guarantees that read operations see a globally consistent (1-copy equivalent) set of snapshots.
- Distributed concurrent txns are not allowed to modify the same tuples.
Implementing 1-Copy-SI: Consistency
- To prevent distributed concurrent txns from modifying the same tuples, a global validation is needed.
- To guarantee that read operations see a globally consistent set of snapshots, the order in which local transactions are started and remote transactions committed at each replica must be constrained.
Implementing 1-Copy-SI: Main Components
[Figure: architecture overview. Clients connect to replicas; each replica runs a replica-control component on top of its local DB, executing local txns and applying the updates of remote txns; a validator coordinates the replicas.]
Implementing 1-Copy-SI: The Protocol
[Figure: txn1 and txn2 both write x at different replicas. At commit, each replica sends its writeset (WS1, WS2) for validation. Validation succeeds for WS1: all replicas get WS1, apply it and commit txn1. Validation fails for WS2 because of the conflict on x: txn2 is aborted.]
Implementing 1-Copy-SI: Consistency
Committing all txns in the same order at all replicas has some cost. Would it be enough to commit only write-conflicting txns in the same order? No: if one replica commits w1(x) before w2(y) while another commits w2(y) before w1(x), readers r3(x,y) and r4(x,y) at the two replicas can observe snapshots that no single commit order could produce. Committing only conflicting txns in the same order does not guarantee a 1-copy consistent set of snapshots for reads!
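The problem can be made concrete with a small sketch (plain Python, not Middle-R code; names are illustrative). Two replicas apply the non-conflicting writes w1(x) and w2(y) in opposite orders; a reader at each replica takes its snapshot between the two commits, and the resulting snapshots cannot both arise from any single commit sequence.

```python
def apply_in_order(commits):
    """Apply commits in the given order; record the snapshot visible
    after each commit."""
    db = {"x": 0, "y": 0}
    snapshots = []
    for name, key, val in commits:
        db[key] = val
        snapshots.append(dict(db))
    return snapshots

w1 = ("w1", "x", 1)
w2 = ("w2", "y", 1)

snaps_a = apply_in_order([w1, w2])    # replica A: c1 then c2
snaps_b = apply_in_order([w2, w1])    # replica B: c2 then c1

r3 = snaps_a[0]                       # reader at A after c1: new x, old y
r4 = snaps_b[0]                       # reader at B after c2: old x, new y
print(r3, r4)                         # {'x': 1, 'y': 0} {'x': 0, 'y': 1}

# In a centralized SI database, readers' snapshots are totally ordered
# by their commit prefixes; here neither snapshot is a prefix of the
# other, so no centralized execution could have produced both.
```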
1-Copy-SI: Replica Control
One solution: commit update txns in the same order at all replicas. Problem: SI DBs use locking for writes, so deadlocks become possible between the local DBs and the replication logic.
[Example: local txn Tj holds the write lock on y, so remote txn Tr (already successfully validated and ordered before Ti) blocks on W(y). Local txn Ti finishes execution but must wait for Tr to commit first. Tj then requests W(x) and blocks on Ti. Cycle: Tr waits for Tj, Tj waits for Ti, Ti waits for Tr. Deadlock!]
1-Copy-SI: Replica Control
The alternative is to allow non-conflicting update txns to commit in any order (e.g. Tr and Ti in the deadlock example). But it is then necessary to delay the start of local txns while there are holes in the commit order (i.e. while the set of committed transactions is not a prefix of the sequence of validated transactions). Replicas therefore alternate between committing update transactions in parallel and starting new transactions.
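The "no holes" condition can be sketched as a simple prefix check (illustrative names, not Middle-R's API): a replica may start new local txns only while its committed txns form a prefix of the globally validated sequence.

```python
def can_start_local_txn(validated, committed):
    """True iff the committed txns are exactly a prefix of `validated`.

    validated: list of txn ids in global validation order.
    committed: set of txn ids already committed at this replica.
    """
    prefix_len = 0
    for txn in validated:
        if txn in committed:
            prefix_len += 1
        else:
            break
    # A hole exists if some txn *after* the prefix has already committed.
    return all(t not in committed for t in validated[prefix_len:])

validated = ["t1", "t2", "t3", "t4"]

print(can_start_local_txn(validated, {"t1", "t2"}))   # True: clean prefix
print(can_start_local_txn(validated, {"t1", "t3"}))   # False: hole at t2
```

While the second case holds, new local txns would be delayed; starting one there could give it a snapshot containing t3's updates but not t2's, which no centralized SI execution could produce.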
1-Copy-SI: Validation
Validation of updates must be done at the global level. It can be centralized or decentralized; Middle-R uses decentralized validation, which is inherently fault-tolerant. In the decentralized case, txn updates are sent for validation in total order to guarantee that validation is deterministic.
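A minimal certification sketch of this idea (illustrative, not the Middle-R implementation): every replica receives writesets in the same total order and applies the same deterministic rule, so all replicas reach identical commit/abort decisions without exchanging votes.

```python
def certify(delivered):
    """delivered: list of (txn_id, start_pos, writeset) in total order.
    start_pos is the position of the last commit included in the txn's
    snapshot (-1 = empty prefix). A txn aborts if its writeset overlaps
    that of any txn committed after the txn's snapshot."""
    decisions = {}
    committed = []                     # list of (commit_pos, writeset)
    for pos, (txn, start_pos, ws) in enumerate(delivered):
        conflict = any(cpos > start_pos and ws & cws
                       for cpos, cws in committed)
        if conflict:
            decisions[txn] = "abort"   # same decision at every replica
        else:
            decisions[txn] = "commit"
            committed.append((pos, ws))
    return decisions

# txn1 and txn2 are concurrent and both write x -> the one delivered
# second aborts; txn3 writes y and commits.
log = [("txn1", -1, {"x"}), ("txn2", -1, {"x"}), ("txn3", -1, {"y"})]
print(certify(log))   # {'txn1': 'commit', 'txn2': 'abort', 'txn3': 'commit'}
```

Because the input order and the rule are identical everywhere, no replica needs to ask any other how it decided; this is why total-order delivery is enough for decentralized, fault-tolerant validation.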
The Limits of Full Replication
[Figure: with 2 or 3 sites, each site splits its capacity between local work and the remote work of applying other sites' updates.]
Updates impose a fraction of remote (non-productive) work, and the more copies, the more remote work. Full replication can scale to 20-30 replicas depending on the workload. Alternative: partial replication.
Partial Replication
[Figure: sites storing the subsets {A,B}, {B,C} and {A,C}; a transaction txn(a, B, C) accessing a, B and C cannot run at a single site.]
Distributed transactions might be needed!
Hybrid Replication
[Figure: full replicas store A, B and C; partial replicas store subsets such as {A,B}, {B,C} and {A,C}. txn(a, B, C) runs at a full replica; txn(b, C) can run at a partial replica.]
- Transactions can be executed either at partial replicas or at full replicas.
- Full replicas prevent distributed transactions.
- Partial replicas provide scalability.
Full vs. Hybrid vs. Partial
[Figure: scale-out as a function of the number of replicas (up to ~90), with a 20% update workload, for full, hybrid and partial replication.]
Issues with Partial Replication
- Hybrid replication prevents distributed transactions, but full replicas become a bottleneck.
- Pure partial replication scales well, but distributed txns are possible.
- How to provide 1CSI for a partially replicated system? When a txn becomes distributed it will get different snapshots at different replicas.
- How to provide a consistent snapshot for distributed txns?
- Can validation be partial (i.e. non-global)? Not in general.
Partial Validation Constraints
[Figure: objects A-F spread across sites, with transactions txn(a), txn(b,c), txn(d,e) and txn(d,f) linking objects into sub-graphs.]
If a site stores a replica of an object, it must receive the writesets of all objects in the same sub-graph. But if transaction access patterns are unknown, every replica needs all the writesets.
Preserving Snapshots
- Globally consistent snapshot: distributed transactions must execute on the proper snapshot at every participant.
- Session consistency: before executing a txn, the snapshot must already include the former updates of the same session.
Preserving Snapshots: Dummy Transactions
- A set of transactions is begun for each snapshot.
- A transaction is assigned to a snapshot when it starts.
- An operation of a transaction must get that snapshot before being executed at a site.
- Transaction aborts may occur because of unavailable snapshots.
- A garbage collector closes unused dummy transactions based on piggybacked info.
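A hedged sketch of the dummy-transaction idea (class and method names are invented for illustration, not Middle-R's API): at each commit the replica opens a placeholder transaction that pins that snapshot; a distributed txn's operation is later attached to the snapshot of its global start, and a garbage collector closes snapshots nobody can still need.

```python
class SnapshotManager:
    def __init__(self):
        self.open_snapshots = {}       # version -> pinned committed state
        self.version = 0
        self.state = {}

    def commit(self, writeset):
        """Apply a writeset and open a dummy txn pinning the new snapshot."""
        self.state.update(writeset)
        self.version += 1
        self.open_snapshots[self.version] = dict(self.state)

    def attach(self, start_version):
        """Give an arriving operation the snapshot of its txn's global
        start, or fail (forcing an abort) if it was already collected."""
        if start_version not in self.open_snapshots:
            raise RuntimeError("snapshot unavailable: abort txn")
        return self.open_snapshots[start_version]

    def gc(self, oldest_needed):
        """Close dummy txns older than `oldest_needed` (piggybacked info)."""
        for v in [v for v in self.open_snapshots if v < oldest_needed]:
            del self.open_snapshots[v]

mgr = SnapshotManager()
mgr.commit({"x": 1})
mgr.commit({"y": 2})
print(mgr.attach(1))      # {'x': 1}: the older snapshot is still pinned
mgr.gc(oldest_needed=2)   # version 1 is now closed; attach(1) would abort
```

The abort in `attach` corresponds to the slide's point that transactions may abort because their snapshot is no longer available.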
Scalability: Evaluation Setup
- TPC-W: a benchmark that simulates an Amazon-like online book store.
- Database backend: PostgreSQL 7.2.
- Servlet server: Tomcat 5.5.9.
- TPC-W tables are partitioned by tuples.
- Homogeneous sites running Fedora Core 3: two AMD Athlon 2 GHz processors, 1 GB of RAM, 100 Mb Ethernet.
12 Replicas: Throughput [Figure]
12 Replicas: Response Time [Figure]
Scale Out [Figure]
Database Replication: Architectural Approaches
White box:
- Pros: takes advantage of many optimizations performed within the database kernel.
- Cons: requires access to source code; tightly integrated with the implementation of the regular database functionality.
Black box:
- Pros: the replication is performed outside the database.
- Cons: the database can be used only through its client interface.
Proposed approach: combine the advantages of both in a gray-box approach.
Writeset Extraction
- The writeset can be obtained in a standard way using triggers (black box, standard).
- In some DBs the writeset can be obtained through proprietary log-mining facilities (black box, non-standard).
- A function can be implemented in the DB to extract the writeset (gray box), either in binary or in textual form (SQL "update ... where primary key" statements).
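The trigger-based, black-box option can be illustrated with plain SQL triggers, here demonstrated via Python's built-in sqlite3 for self-containment (the talk's systems are PostgreSQL/MySQL; the table and trigger names are illustrative).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE writeset(tbl TEXT, pk INTEGER, balance INTEGER);
    -- After every UPDATE, record the new row image as part of the writeset.
    CREATE TRIGGER ws_accounts AFTER UPDATE ON accounts
    BEGIN
        INSERT INTO writeset VALUES ('accounts', NEW.id, NEW.balance);
    END;
    INSERT INTO accounts VALUES (1, 50), (2, 50);
""")

con.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
print(con.execute("SELECT * FROM writeset").fetchall())
# [('accounts', 1, 0)]
```

The middleware would read the `writeset` table at commit time and ship its rows to the other replicas; the gray-box alternative replaces this with an extraction function inside the DB, avoiding the per-row trigger overhead measured on the next slides.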
Cost of Writeset Extraction
[Figures: commercial DB — triggers vs. log mining vs. no extraction; MySQL — SQL extraction service vs. no extraction; PostgreSQL — binary writeset service vs. trigger vs. no extraction.]
Towards MiddleCloud
High availability requires not only fault tolerance but also online recovery. Availability is not the right measure; performability is. We have developed online recovery for our replication protocols that provides a peak performance close to 90-95% of that of the system without recovery.
Towards MiddleCloud
Elasticity:
- Self-provisioning: adding new replicas without disrupting peak performance or violating consistency; detecting when the load has dropped to the point where a replica can be decommissioned.
- Self-configuration of partial replicas: deciding where to create replicas dynamically.
Towards MiddleCloud: A PostgreSQL Cloud
- Replication across data centers requires special care: group communication across WANs performs badly.
- A centralized validator outperforms the other architectural approaches in this setting.
- Sticking to one WAN interaction per update txn, with local txn execution, is crucial.
- Interconnecting clusters requires hierarchical replication protocols aware of the network topology (local sites vs. remote sites).
Multi-tier Replication
- High availability for multi-tier architectures introduces interesting problems: the application server caches data items, and current application servers (AS) combined with SI databases do not work!
- We have developed SI-aware caching that provides end-to-end SI consistency.
- We have developed a replication solution for J(2)EE that replicates pairs of application servers and databases, guaranteeing 1CSI.
Questions?