Database replication for commodity database services

Similar documents
DATABASE REPLICATION A TALE OF RESEARCH ACROSS COMMUNITIES

Survey on Comparative Analysis of Database Replication Techniques

Database Replication with Oracle 11g and MS SQL Server 2008

Data Replication and Snapshot Isolation. Example: Cluster Replication

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

Ganymed: Scalable Replication for Transactional Web Applications

Data Distribution with SQL Server Replication

Postgres-R(SI): Combining Replica Control with Concurrency Control based on Snapshot Isolation

Database Replication Techniques: a Three Parameter Classification

Pangea: An Eager Database Replication Middleware guaranteeing Snapshot Isolation without Modification of Database Servers

New method for data replication in distributed heterogeneous database systems

Scaling out by distributing and replicating data in Postgres-XC

Strongly consistent replication for a bargain

Database Replication: a Tale of Research across Communities

Ph.D. Thesis Proposal Database Replication in Wide Area Networks

Database Replication: A Survey of Open Source and Commercial Tools

Database Replication with MySQL and PostgreSQL

Distributed Data Management

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Tier Architectures. Kathleen Durant CS 3200

DISTRIBUTED AND PARALLELL DATABASE

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Analysis of Caching and Replication Strategies for Web Applications

Building Highly Available Database Applications with Geronimo and Derby

Applying Database Replication to Multi-player Online Games

Data Management in the Cloud

CHAPTER 6: DISTRIBUTED FILE SYSTEMS

Boosting Database Replication Scalability through Partial Replication and 1-Copy-Snapshot-Isolation

Examining the Performance of a Constraint-Based Database Cache

Chapter 18: Database System Architectures. Centralized Systems

BBM467 Data Intensive ApplicaAons

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Bryan Tuft Sr. Sales Consultant Global Embedded Business Unit

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Consistent Database Replication at the Middleware Level

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases

Transactional Replication in Hybrid Data Store Architectures

Distributed Software Development with Perforce Perforce Consulting Guide

Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication

Conflict-Aware Load-Balancing Techniques for Database Replication

Big data management with IBM General Parallel File System

A Shared-nothing cluster system: Postgres-XC

Eloquence Training What s new in Eloquence B.08.00

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Geo-Replication in Large-Scale Cloud Computing Applications

Distributed System Principles

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Oracle Weblogic. Setup, Configuration, Tuning, and Considerations. Presented by: Michael Hogan Sr. Technical Consultant at Enkitec

Outline. Failure Types

Postgres Plus Advanced Server

Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich

Adaptive Middleware for Data Replication

Tushar Joshi Turtle Networks Ltd

A Scalable Data Platform for a Large Number of Small Applications

Apuama: Combining Intra-query and Inter-query Parallelism in a Database Cluster

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

GORDA: An Open Architecture for Database Replication

A distributed system is defined as

Distribution transparency. Degree of transparency. Openness of distributed systems

High Availability with Windows Server 2012 Release Candidate

Byzantium: Byzantine-Fault-Tolerant Database Replication

be architected pool of servers reliability and

System Models for Distributed and Cloud Computing

High Availability Database Solutions. for PostgreSQL & Postgres Plus

High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

Conflict-Aware Scheduling for Dynamic Content Applications

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Service Oriented Architectures

The PostgreSQL Database And Its Architecture

Google File System. Web and scalability

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Chapter 3 - Data Replication and Materialized Integration

Distributed Databases

Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication

Availability Digest. MySQL Clusters Go Active/Active. December 2006

Robust Snapshot Replication

Transaction Management Overview

Concurrency control. Concurrency problems. Database Management System

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts

Correctness Criteria for Database Replication: Theoretical and Practical Aspects

Client/Server and Distributed Computing

Agenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases

Data Consistency on Private Cloud Storage System

Transactions and ACID in MongoDB

SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK

Enhancing an Application Server to Support Available Components

Distributed Databases

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds

Microsoft SQL Server: MS Performance Tuning and Optimization Digital

Configuration Management of Massively Scalable Systems

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies

Goals. Managing Multi-User Databases. Database Administration. DBA Tasks. (Kroenke, Chapter 9) Database Administration. Concurrency Control

Splice Machine: SQL-on-Hadoop Evaluation Guide

A Proposal for a Multi-Master Synchronous Replication System

Do Relational Databases Belong in the Cloud? Michael Stiefel

Pronto: High Availability for Standard Off-the-shelf Databases

Chapter 2 Distributed Database Management Systems: Architectural Design Choices for the Cloud

How to Choose Between Hadoop, NoSQL and RDBMS

Transcription:

Database replication for commodity database services Gustavo Alonso Department of Computer Science ETH Zürich alonso@inf.ethz.ch http://www.iks.ethz.ch

Replication as a problem Gustavo Alonso. ETH Zürich. 2

How to replicate data? Depending on when the updates are propagated: Synchronous (eager) Asynchronous (lazy) Depending on where the updates can take place: Primary Copy (master) Update Everywhere (group) Primary copy Update everywhere Synchronous Asynchronous Gustavo Alonso. ETH Zürich. 3

Theory The name of the game is correctness and consistency Synchronous replication is preferred: copies are always consistent (1-copy serializability) programming model is trivial (replication is transparent) Update everywhere is preferred: system is symmetric (load balancing) avoids single point of failure Other options are ugly: inconsistencies centralized formally incorrect Synchronous Asynchronous Primary copy Update everywhere Gustavo Alonso. ETH Zürich. 4

and practice The name of the game is throughput and response time Asynchronous replication is preferred: avoid transactional coordination (throughput) avoid 2PC overhead (response time) Primary copy is preferred: design is simpler (centralized) trust the primary copy Other options are not feasible: overhead deadlocks do not scale Synchronous Asynchronous Primary copy Update everywhere Gustavo Alonso. ETH Zürich. 5

The dangers of replication... SYNCHRONOUS Coordination overhead distributed 2PL is expensive 2PC is expensive prefer performance to correctness Communication overhead 5 nodes, 100 tps, 10 w/txn = 5 000 messages per second!! UPDATE EVERYWHERE Deadlock/Reconciliation rates the probability of conflicts becomes so high, the system is unstable and does not scale Useless work the same work is done by all administrative costs paid by everybody all nodes must understand replication (not trivial) Gustavo Alonso. ETH Zürich. 6

Text book replication (BHG 87) SITE A SITE B SITE C BOT R(x) W(x) Upd Lock Upd request ack change Lock Upd Read One, Write All Each site uses 2PL Atomic commitment through 2PC Read operations are performed locally Write operations involve locking all copies of the data item (request a lock, obtain the lock, receive an acknowledgement) Optimizations are based on the idea of quorums...... Gustavo Alonso. ETH Zürich. 7

Response Time centralized database update T= T= replicated database update: 2N messages 2PC The way replication takes place (one operation at a time), increases the response time and, thereby, the conflict profile of the transaction. The message overhead is too high (even if broadcast facilities are available). Gustavo Alonso. ETH Zürich. 8

Deadlocks (Gray et al. SIGMOD 96) A B C BOT R(x) W(x) Lock Lock D BOT W(x) Approximated deadlock rate: TPS! Action_ Time! Actions! N 2 4! DB_ Size 2 5 3 if the database size remains constant, or 2 5 TPS! Action_ Time! Actions! N 2 4! DB_Size if the database size grows with the number of nodes. Optimistic approaches result in too many aborts. Gustavo Alonso. ETH Zürich. 9

Commercial systems R e p l i c a t i o n u s i n g d i s t r i b u t e d l o c k i n g 8 0 0 Response Time in ms 6 0 0 4 0 0 2 0 0 0 1 2 3 4 5 N u m b e r o f R e p l i c a s Gustavo Alonso. ETH Zürich. 10

Cost of Replication 60 50 40 30 20 10 0 Available CPU System with 50 nodes ws (replication factor) 0 0.1 0.3 0.5 0.7 0.9 1 Overall computing power of the system: N 1+ w! s!(n " 1) No gain with large ws factor (rate of updates and fraction of the database that is replicated) Quorums are questionable, reads must be local to get performance advantages. Gustavo Alonso. ETH Zürich. 11

GANYMED: Solving the replication problem Gustavo Alonso. ETH Zürich. 12

What can be done? Are these fundamental limitations or side effects of the way databases work? Consistency vs. Performance: is this a real trade-off? Cost seems to be inherent: if all copies do the same, no performance gain Deadlocks: typical synchronization problem when using locks Communication overhead: ignored in theory, a real showstopper in practice If there are no fundamental limitations, can we do better? In particular, is there a reasonable implementation of synchronous, update everywhere replication? Consistency is a good idea Performance is also a good idea Nobody disagrees that it would be nice but commercial systems have given up on having both!! Gustavo Alonso. ETH Zürich. 13

Consistency vs. Peformance We want both: Consistency is good for the application Performance is good for the system Then: Let the application see a consistent state...... although the system is asynchronous and primary copy This is done through: A middleware layer that offers a consistent view Using snapshot isolation as correctnes criteria REPLICATION MIDDLEWARE Asynchronous Primary copy I see a consistent state Gustavo Alonso. ETH Zürich. 14

Two sides of the same coin SNAPSHOT ISOLATION To the clients, the middleware offers snapshot isolation: Queries get their own consistent snapshot (version) of the database Update transactions work with the latest data Queries and updates do not conflict (operate of different data) First committer wins for conflicting updates PostgreSQL, Oracle, MS SQL Server ASYNCH PRIMARY COPY Primary copy: master site where all updates are performed Slaves: copies where only reads are peformed A client gets a snapshot by running its queries on a copy Middleware makes sure that a client sees its own updates and only newer snapshots Updates go to primary copy and conflicts are resolved there (not by the middleware) Updates to master site are propagated lazily to the slaves Gustavo Alonso. ETH Zürich. 15

Ganymed: Putting it together Based on JDBC drivers Only scheduling, no concurrency control, no query processing... Simple messaging, no group communication Very much stateless (easy to make fault tolerant) Acts as traffic controller and bookkeeper Route queries to a copy where a consistent snapshot is available Keep track of what updates have been done where (propagation is not uniform) Gustavo Alonso. ETH Zürich. 16

Linear scalability Gustavo Alonso. ETH Zürich. 17

Improvements in response time (!!!) Gustavo Alonso. ETH Zürich. 18

Fault tolerance (slave failure) Gustavo Alonso. ETH Zürich. 19

Fault tolerance (master failure) Gustavo Alonso. ETH Zürich. 20

GANYMED: Beyond conventional replication Gustavo Alonso. ETH Zürich. 21

Oracle master PostgreSQL slaves Gustavo Alonso. ETH Zürich. 22

Oracle master PostgreSQL slaves Gustavo Alonso. ETH Zürich. 23

Updates through SQL (Oracle-Postgres) Gustavo Alonso. ETH Zürich. 24

Updates through SQL (Oracle-Postgres) Gustavo Alonso. ETH Zürich. 25

DB2 master PostreSQL slaves Gustavo Alonso. ETH Zürich. 26

DB2 master PostreSQL slaves Gustavo Alonso. ETH Zürich. 27

Critical issues The results with a commercial master and open source slaves is still a proof of concept but a very powerful one More work needs to be done (in progress) Update extraction from the master Trigger based = attach triggers to tables to report updates (low overhead at slaves, high overhead at master) Generic = propagate update SQL statements to copies (high overhead at slaves, no overhead at master, limitations with hidden updates) Update propagation = tuple based vs SQL based SQL is not standard (particularly optimized SQL) Understanding workloads (how much write load is really present in a database workload) Replicate only parts of the database (table fragments, tables, materialized views, indexes, specialized indexes on copies...) Gustavo Alonso. ETH Zürich. 28

Query optimizations (DB2 example) SELECT J.i_id, J.i_thumbnail FROM item I, item J WHERE (I.i_related1 = j.i_id OR I.i_related2 = j.i_id OR I.i_related3 = j.i_id OR I.i_related4 = j.i_id OR I.i_related5 = j.i_id) AND i.i_id = 839; Gustavo Alonso. ETH Zürich. 29

Query optimization (DB2 example) SELECT J.i_id, J.i_thumbnail FROM item J WHERE J.i_id IN ( (SELECT i_related1 FROM item WHERE i_id = 839) UNION (SELECT i_related2 FROM item WHERE i_id = 839) UNION (SELECT i_related3 FROM item WHERE i_id = 839) UNION (SELECT i_related4 FROM item WHERE i_id = 839) UNION (SELECT i_related5 FROM item WHERE i_id = 839) ); Gustavo Alonso. ETH Zürich. 30

Understanding workloads TPC-W Browsing Shopping Ordering Updates 3.21 % 10.77 % 24.34 % Read-only 96.79 % 89.23 % 75.66 % Ratio 1 : 30.16 1 : 8.29 1 : 3.11 POSTGRES COST Browsing NON-OPTIMIZED SQL Ratio (avg) updates : read only 7:50 : 50.11 Ratio (total) updates : read only 7.50 : 1511.32 OPTIMIZED SQL Ratio (avg) updates : read only 6.92 : 10.39 Ratio (total) updates : read only 6.29 : 313.36 Shopping 6.38 : 49.35 6.38 : 409.11 6.28 : 6.59 6.28 : 54.63 Ordering 7.70 : 36.28 7.70 : 112.83 6.23 : 3.28 6.23 : 10.20 Gustavo Alonso. ETH Zürich. 31

A new twist to Moore s Law What is the cost of optimization? SQL rewriting = several days two/three (expert) people (improvement ratio between 5 and 10) Ganymed = a few PCs with open source software (improvement factor between 2 and 5 for optimized SQL, for non-optimized SQL multiply by X) Keep in mind: Copies do not need to be used, they can be kept dormant until increasing load demands more capacity Several database instances can share a machine (database scavenging) We do not need to replicate everything (less overhead for extraction) Gustavo Alonso. ETH Zürich. 32

SQL is not SQL Amongst the 3333 most recent orders, the query performs a TOP-50 search to list a category's most popular books based on the quantity sold SELECT * FROM ( SELECT i_id, i_title, a_fname, a_lname, SUM(ol_qty) AS orderkey FROM item, author, order_line WHERE i_id = ol_i_id AND i_a_id = a_id AND ol_o_id > (SELECT MAX(o_id)-3333 FROM orders) AND i_subject = 'CHILDREN' GROUP BY i_id, i_title, a_fname, a_lname ORDER BY orderkey DESC ) WHERE ROWNUM <= 50 Current version does very basic optimizations on the slave side. Further work on optimizations at the middleware layer will boost performance even more Optimizations can be very specific to the local data Virtual column specific to Oracle. In PostgreSQL = LIMIT 50 Use of MAX leads to sequential scan in Postgres, change to: SELECT o_id-3333 FROM orders ORDER BY o_id DESC LIMIT 1 Gustavo Alonso. ETH Zürich. 33

GANYMED: Our real goals Gustavo Alonso. ETH Zürich. 34

Database scavenging Ideas similar to cluster/grid computing tools that place computing jobs in a pool of computers We want to dynamically place database slaves for master databases in a pool of computers The goal is to have a true low cost, autonomic database cluster GANYMED DB-MASTER A DB-MASTER C DB-MASTER B DB-MASTER D SLAVE CLUSTER Gustavo Alonso. ETH Zürich. 35

Steps to get there We already have the performance and scalability gain We already have the ability to replicate commercial engines (Oracle, DB2, SQL Server) What is missing Optimization of write set extraction or SQL update propagation Optimization of SQL statements forwarded to slaves Optimization of replication strategies in slaves Dynamic creation of slaves (many possibilities) Autonomic strategies for dynamic creation/deletion of slaves Grid engine for resource management Gustavo Alonso. ETH Zürich. 36

Databases as commodity service Remote applications use the database through a web services enabled JDBC driver (WS-JDBC) INTERNET WEB SERVICES INTERFACE (WS-JDBC) DB-MASTER A GANYMED DB-MASTER C DB-MASTER B DB-MASTER D SLAVE CLUSTER Gustavo Alonso. ETH Zürich. 37

Conclusions Ganymed synthesizes a lot of previous work in DB replication Postgres-R (McGill) Middle-R (Madrid Technical Uni.) Middleware based approaches (U. Of T.) C-JDBC (INRIA Grenoble, Object Web)... Contributions There is nothing comparable in open source solutions Database independent Very small footprint Easily extensible in many context Can be turned into a lazy replication engine Can be used for data caching across WANs Almost unlimited scalability for dynamic content \ web data Very powerful platform to explore innovative approaches Databases as a commodity service Database scavenging Optimizations to commercial engines through open source slaves Gustavo Alonso. ETH Zürich. 38