Synchronous multi-master clusters with MySQL: an introduction to Galera




Synchronous multi-master clusters with MySQL: an introduction to Galera. Henrik Ingo, OUGF Harmony conference, Aulanko. Please share and reuse this presentation, licensed under the Creative Commons Attribution license: http://creativecommons.org/licenses/by/3.0/

Agenda
* MySQL replication issues
* Galera internal architecture
* Installing and configuring it
* Synchronous multi-master clustering, what does it mean?
* Load balancing and other options
* How network partitioning is handled
* WAN replication
* How does it perform?
* In-memory workload
* Scale-out for writes - how is it possible?
* Disk-bound workload
* NDB shootout

So what is Galera all about?

Created by Codership Oy. Participated in 3 MySQL cluster developments since 2003; started the Galera work in 2007. Galera is free, open source; Codership offers support and consulting. Percona XtraDB Cluster, based on Galera, launched in 2012. Galera is (and can be) integrated into other MySQL and non-MySQL products.

MySQL replication challenges
* Asynchronous = you will lose data
* MySQL 5.5 semi-sync: better, but falls back to asynchronous when in trouble...
* Single-threaded => slave lag => 50-80% performance penalty?
* Master-slave: read-write splitting, failovers, diverged slaves
* Low level: manually provision the DB to the slave, configure binlog positions, ...
(Diagram: a master replicating asynchronously to a slave.)
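The slave lag problem is easy to see on a plain asynchronous slave; a hedged illustration (the output values here are made up):
mysql> SHOW SLAVE STATUS\G
...
Seconds_Behind_Master: 312        -- the slave is already minutes behind the master
Exec_Master_Log_Pos: 107350283    -- the binlog position you have to track by hand for failover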

So what about DRBD?
* Synchronous, yay!
* But it's a cold standby
* No scale-out (but you can combine it with MySQL replication...)
* 50% performance penalty?
* (SAN-based HA has these same issues, btw)
(Diagram: a primary and its disk, replicated block-by-block via DRBD to a cold standby and its disk.)

Galera in a nutshell
* True multi-master: read & write to any node
* Synchronous replication
* No slave lag
* No integrity issues
* No master-slave failovers or VIP needed
* Multi-threaded slave, no performance penalty
* Automatic node provisioning
(Diagram: three masters connected to each other through Galera replication.)

Ok, let's take a closer look...

A Galera cluster is... (Architecture diagram: a patched MySQL server exposes the wsrep API - visible through SHOW STATUS LIKE "wsrep%" and SHOW VARIABLES - which plugs into the Galera group communication library for replication; InnoDB (and MyISAM) sit below as storage engines, and Snapshot State Transfer is handled by mysqldump, rsync, xtrabackup, etc.) Download: http://www.codership.com/downloads/download-mysqlgalera

Starting your first cluster. Let's assume we have nodes 10.0.0.1, 10.0.0.2 and 10.0.0.3.
On node 1, set this in my.cnf:
wsrep_cluster_address="gcomm://"
Then start the first node:
/etc/init.d/mysql start
Important! Now change this in my.cnf to point to any of the other nodes - don't leave it with the empty gcomm string!
wsrep_cluster_address="gcomm://10.0.0.3"
On node 2, set this in my.cnf:
wsrep_cluster_address="gcomm://10.0.0.1"
And start it:
/etc/init.d/mysql start
Node 2 now connects to node 1. Then do the same on node 3:
wsrep_cluster_address="gcomm://10.0.0.1" # or .2
/etc/init.d/mysql start
Node 3 connects to node 1, after which it establishes communication with all nodes currently in the cluster. wsrep_cluster_address should be set to any one node that is currently part of the cluster. After startup it doesn't have any meaning; all nodes connect to all others all the time.
Tip: Galera outputs a lot of info to the error log, especially when nodes join or leave the cluster. Use one of these to observe the SST:
ps aux | grep mysqldump
ps aux | grep xtrabackup
ps aux | grep rsync
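Putting the pieces together, a minimal my.cnf sketch for a joining node might look like this (the cluster name, SST method and provider path are assumptions - adjust them to your installation):
[mysqld]
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
wsrep_cluster_address="gcomm://10.0.0.1"    # any node already in the cluster
wsrep_cluster_name="galera_demo"            # assumed name; must be identical on every node
wsrep_sst_method=rsync                      # or mysqldump / xtrabackup
binlog_format=row
default-storage-engine=innodb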

Checking that nodes are connected and ready:
SHOW STATUS LIKE "wsrep%";
...
wsrep_local_state            4
wsrep_local_state_comment    Synced (6)
...
wsrep_cluster_conf_id        54    -- increments when a node joins or leaves the cluster
wsrep_cluster_size           3     -- nr of nodes connected
wsrep_cluster_state_uuid     3108998a-67d4-11e1-...    -- UUID generated when the first node is started; all nodes share the same UUID. Identity of the clustered database.
wsrep_cluster_status         Primary    -- Primary or Non-Primary cluster component?
...
Q: Can you tell me which nodes are connected to this node/component? A: See the error log.
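If you only want the essentials, a shorter health check looks like this (a sketch; wsrep_ready is the per-node readiness flag):
SHOW STATUS LIKE 'wsrep_cluster_size';      -- how many nodes this node currently sees
SHOW STATUS LIKE 'wsrep_cluster_status';    -- should say Primary
SHOW STATUS LIKE 'wsrep_ready';             -- ON when this node accepts queries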

Your first query
# mysql -uroot -prootpass -h 10.0.0.1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your connection id is 11
Server version: 5.1.53 wsrep_0.8.0
mysql> create table t (k int primary key auto_increment, v blob);
mysql> show create table t;
CREATE TABLE `t` (
  `k` int(11) NOT NULL auto_increment,
  `v` blob,
  PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> insert into t (v) values ("first row");
mysql> insert into t (v) values ("second row");

How it looks from another node
# mysql -uroot -prootpass -h 10.0.0.2
Welcome to the MySQL monitor. Commands end with ; or \g.
Your connection id is 11
Server version: 5.1.53 wsrep_0.8.0
mysql> show create table t;
CREATE TABLE `t` (
  `k` int(11) NOT NULL auto_increment,
  `v` blob,
  PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> select * from t;
k    v
1    first row
4    second row
Galera automatically sets auto_increment_increment and auto_increment_offset.
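That is also why the second row got key 4 instead of 2: each node inserts with its own offset, so concurrent inserts on different nodes can never collide. You can see the values Galera chose like this (the numbers are examples; they depend on cluster size and node):
mysql> show variables like 'auto_increment%';
auto_increment_increment    3    -- equals the cluster size
auto_increment_offset       2    -- different on every node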

More interesting Galera (wsrep) status vars:
wsrep_local_send_queue        34
wsrep_local_send_queue_avg    0.824589
wsrep_local_recv_queue        30
wsrep_local_recv_queue_avg    1.415460
wsrep_flow_control_paused     0.790793    -- fraction of time that Galera replication was halted because a slave was overloaded. Bigger than 0.1 means you have a problem!
wsrep_flow_control_sent       9
wsrep_flow_control_recv       52
wsrep_cert_deps_distance      105.201550
wsrep_apply_oooe              0.349014
wsrep_apply_oool              0.012709
wsrep_apply_window            2.714530
Tip: these counters reset every time you read them with SHOW STATUS. So read them once, wait 10 seconds, then read them again and use the second values.
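Following that tip, sampling flow control over an interval looks like this (a sketch):
SHOW STATUS LIKE 'wsrep_flow_control_paused';    -- discard this first reading, it resets the counter
-- wait ~10 seconds, then:
SHOW STATUS LIKE 'wsrep_flow_control_paused';    -- this second value is the pause fraction for the interval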

Did you say parallel slave threads?
mysql> show variables\G
...
*************************** 296. row ***************************
Variable_name: wsrep_slave_threads
        Value: 32
mysql> SHOW PROCESSLIST;
...
31    system user    NULL    Sleep    32    committed 933    NULL
32    system user    NULL    Sleep    21    committed 944    NULL
MySQL replication lets the master proceed and leaves the slave lagging; a Galera cluster is tightly coupled, so master throughput will degrade if any slave can't keep up. For best throughput, use 4-64 slave threads (especially for disk-bound workloads). It is a generic solution: it works with any application and schema, and commit order is preserved. Also possible: out-of-order commits - a small additional benefit, but unsafe for most apps.
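Raising the thread count is a single my.cnf setting (the value below is just an example starting point within the 4-64 range mentioned above):
wsrep_slave_threads=16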

MySQL options with Galera-friendly values:
# (This must be substituted by wsrep_format)
binlog_format=row
# Currently only the InnoDB storage engine is supported
default-storage-engine=innodb
# No need to sync to disk when replication is synchronous
sync_binlog=0
innodb_flush_log_at_trx_commit=0
innodb_doublewrite=0
# To avoid issues with 'bulk mode inserts' using autoinc
innodb_autoinc_lock_mode=2
# This is a must for parallel applying
innodb_locks_unsafe_for_binlog=1
# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0

Setting wsrep and Galera options:
# Most settings belong to the wsrep API = part of MySQL
#
# State Snapshot Transfer method
wsrep_sst_method=mysqldump
#
# SST authentication string. This will be used to send SST to joining nodes.
# Depends on the SST method. For the mysqldump method it is root:<root password>
wsrep_sst_auth=root:password
#
# Use of the Galera library is opaque to MySQL. It is a "wsrep provider".
# Full path to the wsrep provider library, or 'none'
#wsrep_provider=none
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
#
# Provider-specific configuration options.
# Here we increase window sizes for a WAN setup
wsrep_provider_options="evs.send_window=512; evs.user_send_window=512;"
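After a restart it is worth checking that the provider really got loaded; a quick sanity check (the version string will of course differ per build):
SHOW VARIABLES LIKE 'wsrep_provider';         -- should show the library path, not 'none'
SHOW STATUS LIKE 'wsrep_provider_name';       -- Galera
SHOW STATUS LIKE 'wsrep_provider_version';    -- version of the loaded Galera library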

Synchronous Multi-Master Clustering

Multi-master = you can write to any node & no failovers are needed. What do you mean, no failover???
* Use a load balancer
* The application sees just one IP
* Write to any available node, round-robin
* If a node fails, just write to another one
* What if the load balancer fails? -> Turtles all the way down
(Diagram: applications -> load balancer -> Galera nodes.)

Protip: the JDBC and PHP MySQL drivers come with built-in load balancing!
* No Single Point of Failure
* One less layer of network components
* Is aware of transaction states and errors
jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadbalanceblacklisttimeout=5000
(Diagram: each application connects to all Galera nodes directly through the driver's load balancer.)

Load balancer on each app node
* Also no Single Point of Failure
* The LB is an additional layer, but localhost = pretty fast
* Need to manage more load balancers
* Good for languages other than Java and PHP
(Diagram: each application host runs its own local load balancer in front of the Galera nodes.)

You can still do VIP-based failovers. But why? => You can run a 2-node Galera cluster like this.
(Diagram: two nodes, each running a clustering framework that fails a virtual IP over between them, while Galera replicates between the two MySQL servers.)

Understanding the transaction sequence in Galera
(Diagram: on the master, the user transaction runs BEGIN, SELECT, UPDATE, COMMIT; at commit time the writeset goes through group communication, which assigns it a global transaction ID, then certification either commits or rolls it back, followed by the InnoDB commit and the commit return to the client - this is the only added commit delay. On a slave, the same writeset passes certification and is either applied and committed to InnoDB or discarded.)
* Optimistic locking between nodes = risk for deadlocks
* Certification = deterministic
* Virtual synchrony = committed events are written to InnoDB after a small delay
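In practice the deadlock risk shows up as a failed COMMIT when another node certified a conflicting write first. A hypothetical session (the error is the standard MySQL deadlock error, which Galera reuses for certification failures):
mysql> BEGIN;
mysql> UPDATE t SET v = 'changed on node 1' WHERE k = 1;
mysql> COMMIT;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
-- the application should simply retry the whole transaction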

How network partitioning is handled aka How split brain is prevented

Preventing split brain. If part of the cluster can't be reached, it means either:
* the node(s) have crashed, or
* the nodes are fine and it's a network connectivity issue = a network partition.
A node cannot know which of the two has happened. A network partition may lead to split brain if both parts continue to commit transactions; split brain results in 2 diverging clusters and 2 diverged datasets. The clustering software must ensure there is only 1 cluster partition active at all times.
(Diagram: load balancers and Galera nodes separated by a network partition.)

Quorum. Galera uses quorum-based failure handling:
* When cluster partitioning is detected, the majority partition "has quorum" and can continue
* A minority partition cannot commit transactions, but will attempt to re-connect to the primary partition
* A load balancer will notice the errors and remove failed nodes from its pool

What is majority? 50% is not a majority:
* Any failure in a 2-node cluster = both nodes must stop
* A 4-node cluster split in half = both halves must stop
* pc.ignore_sb exists, but don't use it
* You can manually/automatically re-enable one half by setting wsrep_cluster_address
* Use 3, 5, 7... nodes
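On a node that has lost quorum you can see the state directly; a sketch (such a node refuses normal queries until it rejoins a primary partition):
SHOW STATUS LIKE 'wsrep_cluster_status';    -- Non-Primary
SHOW STATUS LIKE 'wsrep_cluster_size';      -- only the nodes this partition can still reach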

WAN replication

WAN replication
* Works fine
* Use higher timeouts and send windows
* No impact on reads
* No impact within a transaction
* Adds 100-300 ms to commit latency
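The "higher timeouts and send windows" typically go into wsrep_provider_options, as shown on the configuration slide earlier; a sketch with illustrative values (tune them to your actual round-trip times):
wsrep_provider_options="evs.keepalive_period=PT3S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.install_timeout=PT1M; evs.send_window=512; evs.user_send_window=512"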

WAN with asynchronous MySQL replication
* You can mix Galera replication and MySQL replication, but it can give you a headache :-)
* A good option on a slow WAN link (China, Australia)
* You'll possibly need more nodes than in a pure Galera cluster
* Remember to watch out for slave lag, etc...
* If the binlog position is lost (e.g. due to a node crash), you must re-provision the whole cluster
* Mixed replication is also useful when you want an asynchronous slave (such as time-delayed, or filtered)

Failures & WAN replication. Q: What does the 50% rule mean when I have 2 datacenters?
(Diagram: two datacenters, each holding half of the cluster's DBMS nodes.)

Pop Quiz. Q: What does the 50% rule mean when I have 2 datacenters? A: Better have 3 datacenters too.
(Diagram: the same DBMS nodes spread across three datacenters.)

WAN replication with uneven node distribution. Q: What does the 50% rule mean when you have an uneven number of nodes per datacenter?
(Diagram: five DBMS nodes distributed unevenly across datacenters.)

WAN replication with uneven node distribution. Q: What does the 50% rule mean when you have an uneven number of nodes per datacenter? A: Better distribute nodes evenly. (We will address this in a future release.)

Benchmarks!

Baseline: single node (sysbench oltp, in-memory)
* Red, blue: constrained by the InnoDB group commit bug (fixed in Percona Server 5.5, MariaDB 5.3 and MySQL 5.6)
* Brown: InnoDB syncs, the binlog doesn't
* Green: no InnoDB syncing either
* Yellow: no InnoDB syncs, Galera wsrep module enabled
http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster

3-node Galera cluster (sysbench oltp, in-memory)
http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster

Comments on the 3-node cluster (sysbench oltp, in-memory)
* Yellow and red are equal -> no overhead or bottleneck from Galera replication!
* Green, brown = writing to 2 and 3 masters -> scale-out for a read-write workload!
* top shows 700% CPU utilization (8 cores)
http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster

Sysbench disk-bound (20GB data / 6GB buffer), tps. EC2 with local disk (note: pretty poor I/O here).
* Blue vs red: turning off innodb_flush_log_at_trx_commit gives a 66% improvement
* Scale-out factors: 2N = 0.5 x 1N, 4N = 0.5 x 2N
* The 5th node hit an EC2 weakness; a later test scaled a little more, up to 8 nodes
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited

Sysbench disk-bound (20GB data / 6GB buffer), latency. As before:
* Not syncing InnoDB decreases latency
* Scale-out decreases latency
* Galera does not add latency overhead
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited

Galera and NDB shootout: sysbench "out of the box" - Galera is 4x better. Ok, so what does this really mean? That Galera is better...
* for this workload
* with default settings (Severalnines)
* and is pretty user friendly and general purpose
NDB excels at key-value and heavy-write workloads (which sysbench is not), and would benefit here from PARTITION BY RANGE.
http://codership.com/content/whats-difference-kenneth

Summary
* Many MySQL replication idioms go away: synchronous, multi-master, no slave lag, no binlog positions, automatic node provisioning.
* Many LB and VIP architectures are possible; the JDBC/PHP load balancer is best.
* Also works for WAN replication; adds 100-300 ms to commit.
* Quorum based: the majority partition wins. Minimum 3 nodes. Minimum 3 data centers.
* Great benchmark results: more performance, not less, with replication!
* Galera is good where InnoDB is good: a general-purpose, easy-to-use HA cluster.
Compared to other alternatives:
* Galera is better than MySQL replication: synchronous, parallel slave, multi-master, easier
* Galera is better than DRBD: no performance penalty, active/active, easier
* Galera is better than NDB: based on good old InnoDB, more general purpose, easier

Questions? Thank you for listening! Happy Clustering :-)