State of the Art for MySQL Multi-Master Replication Robert Hodges, CEO

A Short and Depressing Introduction to Distributed Systems Continuent 2012 2

Why Do We Care About Multi-Master? The Dream: Multiple, active DBMS servers with exactly the same data 1. Simple high availability model 2. Operate systems over multiple sites 3. Access geographically close data 4. Enable communication between applications 3

There's Just One Problem... It Doesn't Work More Precisely: Multi-Master Systems Cannot Provide One Copy Serialization 4

Consistency In a Single DBMS: Transaction #1 (update foo set follower=23 where id=32) and Transaction #2 (update foo set follower=976 where id=32) hit the same row in table foo; ACID transactions serialize the updates, so only one final value (follower=976) survives 5

Consistency in a Distributed DBMS: the same two transactions run against separate copies of table foo; each copy applies its local update first, so the copies diverge, one ending with follower=976 and the other with follower=23 6

Ensuring Distributed Consistency: Pessimistic Locking (Wait your turn, pal!), Optimistic Locking (Early bird gets the worm), Conflict Resolution (Your mother cleans up later), Conflict Avoidance (Solve the problem by not having it) 7

Communication Implies Latency: coordinating the two copies of table foo means crossing the network (network latency shown on a log scale) 8

Communication Implies Latency: Async replication = No application latency; Synchronous replication = Application latency 9

So Can We Build Useful Applications? Absolutely. 10

MySQL Native Replication (Aka the basics) Continuent 2012 11

How Does It Work? Each master writes its changes to a binlog, which the other master reads and applies 12

Row vs. Statement Replication Statement replication = send client SQL Row replication = send changed rows Use row replication for multi-master 13
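The log format is a per-server setting; a minimal my.cnf sketch (the option exists in MySQL 5.1 and later):
[mysqld]
# Replicate changed rows rather than the client SQL text
binlog-format = ROW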

Server IDs Distinguish between different MySQL servers Prevent replication loops in multi-master [my.cnf] server-id=1... [my.cnf] server-id=2... [my.cnf] server-id=3... 14

Auto-Increment Key Offsets Set different keys on each server to avoid primary key collisions
[my.cnf]
server-id=1
auto-increment-offset = 1
auto-increment-increment = 4
...
[my.cnf]
server-id=2
auto-increment-offset = 2
auto-increment-increment = 4
... 15
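As a worked example of the settings above (the table name t is hypothetical), each server then hands out a disjoint, interleaved sequence of primary keys:
-- auto-increment-increment = 4 makes every server step by 4 from its own offset:
--   server 1 (offset 1) generates ids 1, 5, 9, 13, ...
--   server 2 (offset 2) generates ids 2, 6, 10, 14, ...
insert into t (data) values ('a');   -- gets id 1 when run on server 1
insert into t (data) values ('b');   -- gets id 2 when run on server 2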

Global Transaction IDs (GTID) New feature in MySQL 5.6 to provide globally unique IDs for transactions Consist of UUID plus sequence number Designed to capture location of original update as well as the sequence number of the transaction 3E11FA47-71CA-11E1-9E33-C80AA9429562:5660 16
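A hedged sketch of the my.cnf settings that turn GTIDs on in MySQL 5.6; note that 5.6 also requires binary logging and logged slave updates whenever gtid-mode is ON:
[mysqld]
gtid-mode = ON
enforce-gtid-consistency = true
log-bin = mysql-bin
log-slave-updates = 1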

MySQL Multi-Master Topologies: master-master replication (two masters replicating to each other) and circular replication (three or more masters in a ring) 17
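Wiring two servers into master-master replication uses the same statements as an ordinary master/slave setup, run once in each direction; a minimal sketch (host, credentials, and binlog coordinates are placeholders):
-- On server 1, replicate from server 2 (repeat on server 2, pointing back at server 1)
CHANGE MASTER TO
  MASTER_HOST='server2', MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
START SLAVE;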

Native Replication Summary Built in with well-known capabilities Very limited topology support Very limited conflict avoidance Not a good choice for multi-master if there are writes to more than 1 master GTIDs, multi-source (MariaDB) replication promise for the future 18

MySQL Cluster Continuent 2012 19

How Does It Work? An access layer of MySQL SQL nodes (MySQL A, MySQL B, ...) sits on top of a distributed storage layer of NDB data nodes (NDB1, NDB2, NDB3, NDB4) 20

MySQL Cluster Cross-Site Topology: a primary replica and a backup replica, each a complete cluster of MySQL SQL nodes and NDB data nodes (NDB1-NDB4); Transaction #1 (follower=23) executes against one site while Transaction #2 (follower=976) executes against the other 21

Eventual Consistency Algorithm NDB has built-in cross-cluster conflict detection based on epochs and primary keys Updates to primary always succeed Update to backup may be rolled back if primary has a conflicting update MySQL Cluster resends updates from the primary to cure conflicts on the slave Caveat: I am not a MySQL Cluster expert 22
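Conflict handling in MySQL Cluster is declared per table through the mysql.ndb_replication table; a hedged sketch (the table must be created first, and the exact binlog_type value should be checked against the MySQL Cluster manual; 7 is the value its NDB$EPOCH examples use):
-- Ask for epoch-based conflict detection on test.foo for all servers (server_id = 0)
INSERT INTO mysql.ndb_replication (db, table_name, server_id, binlog_type, conflict_fn)
VALUES ('test', 'foo', 0, 7, 'NDB$EPOCH()');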

MySQL Cluster Summary Allows active/active operation Innovative eventual consistency algorithm Covers failure of individual MySQL nodes Detects conflicts automatically on rows Limited by lack of NDB usage in general Check with MySQL Cluster folks for more See also Brendan Frazer's posts at http://messagepassing.blogspot.com 23

Galera Continuent 2012 24

How It Works: Group Communication Server A foo pk=1, v=5 foo pk=6, v=25 [1] pk=1, v=6 [2] pk=1, v=5 [3] pk=6, v=25 Group Ordering and Delivery Server C [1] pk=1, v=6 [2] pk=1, v=5 [3] pk=6, v=25 foo pk=1, v=6 Server B [1] pk=1, v=6 [2] pk=1, v=5 [3] pk=6, v=25 25

How It Works: Certification Server A [1] pk=1, v=6 [2] pk=1, v=5 [3] pk=6, v=25 foo pk=1, v=5 DEADLOCK Group Ordering and Delivery foo pk=6, v=25 COMMIT Server C [1] pk=1, v=6 [2] pk=1, v=5 [3] pk=6, v=25 foo pk=1, v=6 COMMIT Server B [1] pk=1, v=6 [2] pk=1, v=5 [3] pk=6, v=25 26

New Node Start-Up and SST (Initializing a new node)
# vi /etc/my.cnf   <== set node name
# mysql_install_db --user=mysql --datadir=/data/galera
# mysqld_safe &
1. Assess node state 2. Join the cluster 3. Request SST (= State Snapshot Transfer) 4. Recover DBMS Continuent 2012 27
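Before that first start, the node also needs Galera's wsrep settings in /etc/my.cnf. A minimal sketch, assuming the provider library path and the node/cluster names shown here (galera3 joining galera1 and galera2):
[mysqld]
binlog_format=ROW                    # Galera replicates row events
default_storage_engine=InnoDB        # wsrep replication covers InnoDB only
innodb_autoinc_lock_mode=2           # interleaved auto-increment locking
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name=galera_demo
wsrep_cluster_address=gcomm://galera1,galera2
wsrep_node_name=galera3
wsrep_sst_method=rsync               # SST method used when the node first joins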

Connect to Any Node for Writes
(galera1) mysql> create table test (id int primary key auto_increment, data varchar(30));
mysql> insert into test(data) values('g1');
mysql> select * from test;
+----+------+
| id | data |
+----+------+
|  3 | g2   |
|  4 | g1   |
+----+------+
(galera2) mysql> insert into test(data) values('g2');
mysql> select * from test;
+----+------+
| id | data |
+----+------+
|  3 | g2   |
|  4 | g1   |
+----+------+
Auto_increment keys handled by Galera Continuent 2012 28

Optimistic Locking in Action
(galera1) mysql> begin;
mysql> update test set data='from g1' where id=3;
mysql> commit;
mysql> select * from test;
+----+---------+
| id | data    |
+----+---------+
|  3 | from g1 |
|  4 | g1      |
+----+---------+
(galera2) mysql> begin;
mysql> update test set data='from g2' where id=3;
mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql> select * from test;
+----+---------+
| id | data    |
+----+---------+
|  3 | from g1 |
|  4 | g1      |
+----+---------+
Continuent 2012 29

Cross-Site Replication: Galera1 and Galera2 in us-east-1d and Galera3 in eu-west-1b, all joined through group ordering and delivery 30

Galera Replication Performance: Sysbench transactions per second vs. number of client threads, comparing a cluster of US-only nodes against a cluster spanning US and EU nodes 31

Failure Handling Crashed servers drop out of the cluster IST (= incremental state transfer) can repair nodes that are just out of date Quorum determined automatically; On loss of quorum all nodes stop Loss of quorum can stop entire sites from working if you operate cross-site 32
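Cluster membership and quorum can be checked from any surviving node with the wsrep status variables; a small sketch:
-- 'Primary' means this node belongs to the quorum component; 'non-Primary' nodes refuse writes
SHOW STATUS LIKE 'wsrep_cluster_status';
-- Number of nodes currently in the cluster
SHOW STATUS LIKE 'wsrep_cluster_size';
-- 'Synced' once the node has caught up via IST or SST
SHOW STATUS LIKE 'wsrep_local_state_comment';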

Fixing Broken Nodes
IST (= incremental state transfer) can repair nodes that are just out of date
Inconsistent nodes require full state transfer:
130422 16:28:09 [ERROR] WSREP: Local state seqno (88056) is greater than group seqno (88054): states diverged. Aborting to avoid potential data loss. Remove '/data/galera//grastate.dat' file and restart if you wish to continue. (FATAL)
SST is time-consuming for large data sets
Enabling binlogs may help determine current state 33

Replicating To/From Galera Clusters
Enable binlogs on all clusters & restart:
# Same server ID on all nodes
server-id=13
log-slave-updates=1
Try to connect with Tungsten!
130422 19:21:07 [ERROR] Slave SQL: Could not execute Update_rows event on table tungsten_g1.heartbeat; Can't find record in 'heartbeat', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 153, Error_code: 1032
A new Galera bug is born :( 34

Galera Summary Big plus #1: simplicity Big plus #2: synchronous replication Performance looks promising Optimistic locking not good for all workloads SST is problematic for large datasets Unstable for some use cases but improving fast 35

Tungsten Continuent 2012 36

How It Works: Eventual Consistency update foo set v=5 where pk=1 Server A Server C Possible Conflict! update foo set v=26 where pk=6 update foo set v=6 where pk=1 Server B 37

Tungsten Replicator Overview Master Download transactions via network Replicator THL (Transactions + Metadata) DBMS Logs Slave Replicator THL Apply using JDBC (Transactions + Metadata) 38
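Replication state and the THL can be inspected from the command line; a hedged sketch using the trepctl and thl utilities that ship with Tungsten Replicator (the service name sfo and the sequence numbers are hypothetical):
# Show pipeline state, applied latency, and the last applied sequence number
trepctl -service sfo status
# List stored transactions plus metadata from the Transaction History Log
thl -service sfo list -low 100 -high 110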

Tungsten Replication Service: a pipeline of stages, each performing extract, filter, and apply steps; transactions flow from the master DBMS through the Transaction History Log and an in-memory queue into the slave DBMS 39

Eventually Consistent Design MySQL servers are completely independent Replication provides multi-way links Transfer is asynchronous Links can be down for days or weeks if required It is the application's responsibility to ensure there are no conflicts 40

Asynchronous Multi-Master in Action: SFO, LAX, and NYC each run a replicator hosting one replication service per site (sfo, lax, nyc), so every site applies the other sites' updates. Best practice: Do not log slave updates 41

Connect to Any Node for Writes
(sfo) mysql> create table test (id int primary key auto_increment, data varchar(30));
mysql> insert into test(data) values('sfo');
mysql> select * from test;
+----+------+
| id | data |
+----+------+
|  3 | lax  |
|  4 | sfo  |
+----+------+
(lax) mysql> insert into test(data) values('lax');
mysql> select * from test;
+----+------+
| id | data |
+----+------+
|  3 | lax  |
|  4 | sfo  |
+----+------+
Auto_increment keys must be manually configured Continuent 2012 42

Failure Handling Replication stops and resumes automatically when the network link goes down Replication stops on replicator or DBMS failure and recovers after operator restart Conflicts can break replication Reconciliation is manual and potentially very difficult Use binlogs to trace inconsistencies 43
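To trace an inconsistency back to the offending transaction, the binlog can be decoded into readable pseudo-SQL; a sketch with mysqlbinlog (the log file name is hypothetical):
# Decode row events so conflicting before/after images can be compared across sites
mysqlbinlog --base64-output=decode-rows --verbose mysql-bin.000012 | less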

Tungsten Multi-Master Clusters: a local cluster in London (te1 master, te2 slave) and one in New York (te3 master, te4 slave), linked by ny and lon replication services. Cross-site replicators read from slaves; cross-site updates are unlogged 44

Connect to Master when Slave Fails: when a slave is down for failure or maintenance, the cross-site replicator connects to that cluster's master instead 45

Use Filters to Detect/Avoid Conflicts
Name           Purpose
ignore_server  Drop transactions from server-id
rename         Rename schemas/tables/columns
replicate      Control schemas/table replication
shardfilter    Control shard replication
You can also write your own filters! 46

Wide Range of Topologies All Masters Snowflake Star 47

Tungsten Summary Big plus #1: flexibility Big plus #2: handles mixed workloads/dirty operating conditions well Replication is in large-scale production use Lacks conflict resolution Filters are powerful but a pain to write Can link between local clusters across sites 48

What Will Be in Next Year's Talk? Continuent 2012 49

Improvements to Replication Oracle: Stable GTIDs and a host of replication improvements MariaDB: Multi-source replication, GTIDs, more improvements Galera: Cover more use cases, GTIDs Tungsten: Conflict resolution and better filters 50

Better Programming Models: planes that did not leave before 4pm vs. planes that landed before 4pm; SELECT flights that did not land before 4pm vs. SELECT flights that landed before 4pm 51

More Data on Multi-Site Operation Public cloud infrastructure is improving Networks are more stable Synchronous replication is back in favor We'll see how it works out! 52

560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009 e-mail: sales@continuent.com Our Blogs: http://scale-out-blog.blogspot.com http://datacharmer.org/blog http://www.continuent.com/news/blogs Continuent Web Page: http://www.continuent.com Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator Continuent 2012.