Binlog Servers at Booking.com


Binlog Servers at Booking.com Jean-François Gagné jeanfrancois DOT gagne AT booking.com Presented at Oracle Open World 2015

Booking.com

Booking.com: Based in Amsterdam since 1996. Online hotel and accommodation agent: 170 offices worldwide, more than 819,000 properties in 221 countries, 42 languages (website and customer service). Part of the Priceline Group. And we use MySQL: thousands (1000s) of servers, ~85% replicating; more than 110 masters, of which ~25 have more than 50 slaves and ~8 have more than 100 slaves.

Binlog Server: Session Summary
1. Replication and the Binlog Server
2. Extreme Read Scaling
3. Remote Site Replication (and Disaster Recovery)
4. Easy High Availability
5. Other Use-Cases (Crash Safety, Parallel Replication and Backups)
6. Binlog Servers at Booking.com
7. New master without touching slaves

Binlog Server: Replication
One master, one or more slaves. The master records all writes in a journal: the binary logs. Each slave:
- downloads the journal and saves it locally (IO thread): the relay logs,
- executes the relay logs on the local database (SQL thread),
- can itself produce binary logs in order to act as a master (log-slave-updates).
Replication is:
- asynchronous, so slaves can lag,
- single-threaded (in MySQL 5.6), so slaves can be slower than the master.
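As a sketch of the mechanics above (hostname, user, password and coordinates are illustrative, not from the talk), a slave is attached to its master with the usual commands, after which the IO thread downloads the binary logs and the SQL thread applies them:

```sql
-- On the slave; host, user and coordinates are illustrative.
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'binlog.000042',
  MASTER_LOG_POS  = 4;
START SLAVE;          -- starts the IO and SQL threads
SHOW SLAVE STATUS\G   -- Seconds_Behind_Master reports the lag
```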

Binlog Server: Booking.com
Typical replication deployment:

        M
        |
  +-----+--...--+------------+
  S1    S2      Sn           M1
                             |
                        +--...--+
                        T1      Tm

The Si and Tj are for read scaling; M1 is the DR master, and an Intermediate Master :-(

Binlog Server: What
A Binlog Server (BLS) is a daemon that:
- downloads the binary logs from the master,
- saves them identically as on the master,
- serves them to slaves.

  A ---> /X\
          |
        +-+-+
        B   C

A or X are the same for B and C: by design, the binary logs served by A and X are the same.
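Because a Binlog Server serves the binary logs over the normal replication protocol, and serves exactly the same files as the master, repointing a slave from A to X needs no coordinate translation. As a hypothetical sketch (the hostname is illustrative):

```sql
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST = 'binlog-server-x.example.com';
START SLAVE;
-- Same binary log filenames and positions as on A: nothing else changes.
```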

Binlog Server: Read Scaling
Typical replication topology for read scaling:

        M
        |
  +-----+-----+--...--+
  S1    S2    S3      Sn

When there are too many slaves, the network of M is overloaded: 100 slaves at 1 Mbit/s each is very close to 1 Gbit/s. Online schema change (OSC) or purging data with row-based replication (RBR) becomes hard. The result is slave lag, or a master unreachable for writes.

Binlog Server: Read Scaling
Typical solution: fan-out with Intermediate Masters (IM):

        M
        |
  +-----+--...--+
  M1    M2      Mm
  |     |       |
 S1... T1...   Z1...Zi

But Intermediate Masters bring problems:
- log-slave-updates: IMs are slower than plain slaves,
- lag on an IM means all of its slaves are lagging,
- a rogue transaction on an IM infects all of its slaves,
- failure of an IM stops replication on all of its slaves (and action must be taken fast).

Binlog Server: Read Scaling
Solving the IM problems with a shared disk: filers (expensive) or DRBD (doubling the number of servers); sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 mean slower replication, hence lag. After a crash of an Intermediate Master, InnoDB recovery is needed, so replication on its slaves is stalled (lag), and the cache is cold, so replication will be slow (more lag).
Solving the IM problems with GTIDs: they allow slave repointing, at the cost of added complexity :-( But they do not completely solve the lag problem :-( And we cannot migrate online with MySQL 5.6 :-( :-(
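For reference, the durability settings mentioned above would look like this in my.cnf; both are on the commit path, which is why they slow down replication on the Intermediate Master:

```
[mysqld]
sync_binlog = 1                     # sync the binary log to disk at every commit
innodb_flush_log_at_trx_commit = 1  # sync the InnoDB redo log at every commit
```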

Binlog Server: Read Scaling
New solution: replace the IMs by Binlog Servers:

        M
        |
  +-----+--...--+
 /I1\  /I2\    /Im\
  |     |       |
 S1..Si Sj..  ..Sn

A BLS should not lag, and if a BLS fails, its slaves are repointed to the other BLSs (easy by design).

Binlog Server: Remote Site
Typical deployment for a remote site:

        A
        |
  +-----+-----+---------+
  B     C     D         E
                        |
                  +-----+-----+
                  F     G     H

E is an Intermediate Master :-( with the same problems as for read scaling.

Binlog Server: Remote Site
Ideally, we would like this:

        A
        |
  +-----+-----+-----+-----+-----+-----+
  B     C     D     E     F     G     H

No lag and no Single Point of Failure (SPOF). But there is no master on the remote site for writes (an easily solvable problem), and it is expensive in WAN bandwidth (a harder problem to solve).

Binlog Server: Remote Site
New solution: a Binlog Server on the remote site:

        A
        |
  +-----+-----+--------+
  B     C     D       /X\
                       |
                 +-----+-----+-----+
                 E     F     G     H

Binlog Server: Remote Site
Or deploy 2 Binlog Servers to get better resilience:

        A
        |
  +-----+-----+--------+
  B     C     D       /X\ ---> /Y\
                       |        |
                    +--+--+  +--+--+
                    E     F  G     H

If Y fails, repoint G and H to X. If X fails, repoint Y to A, and E and F to Y.

Binlog Server: Remote Site
Interesting property: if A fails, E, F, G & H converge to a common state:

        A (failed)
        |
  +-----+-----+--------+
  B     C     D       /X\
                       |
                 +-----+-----+-----+
                 E     F     G     H

So promoting a new master is easy on the remote site.

Binlog Server: Remote Site
Step-by-step master promotion:
1. The first slave that is up to date can become the new master.
2. Run SHOW MASTER STATUS (or RESET MASTER), and RESET SLAVE ALL, on the new master.
3. Writes can be pointed to the new master.
4. Once a slave is up to date, repoint it to the new master at the position noted in step 2.
5. Keep delayed/lagging slaves under X until they are up to date.
6. Once no slave is left under X, recycle it as a Binlog Server for the new master.
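Steps 2 and 4 above can be sketched in SQL (hostname and coordinates are illustrative, not from the talk):

```sql
-- Step 2, on the new master:
SHOW MASTER STATUS;   -- note File and Position, e.g. binlog.000051 / 194
RESET SLAVE ALL;      -- drop the old replication configuration

-- Step 4, on each slave once it is up to date:
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',
  MASTER_LOG_FILE = 'binlog.000051',
  MASTER_LOG_POS  = 194;
START SLAVE;
```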

Binlog Server: High Availability
This property can be used for high availability:

        A
        |
       /X\
        |
  +-----+-----+-----+-----+-----+-----+-----+
  B     C     D     E     F     G     H     I

Binlog Server: Other Use-Cases
Better Crash-Safe Replication: http://blog.booking.com/better_crash_safe_replication_for_mysql.html
Better Parallel Replication in MySQL 5.7 (LOGICAL_CLOCK): http://blog.booking.com/better_parallel_replication_for_mysql.html
Easier Point-in-Time Recovery: http://jfg-mysql.blogspot.com/2015/10/binlog-servers-for-backups-and-point-in-time-recovery.html

Binlog Server: Better // Replication
Four transactions flowing through X -> Y -> Z (B = begin, C = commit):

On X: ----Time---->     T1 B---C  T2 B---C  T3 B-------C  T4 B-------C
On Y: ----Time---->        B---C     B---C     B-------C     B-------C
On Z: ----Time-------->    B---C     B---C     B-------C     B-------C

An IM might stall the parallel replication pipeline. To benefit from parallel replication, the IM must disappear; the Binlog Server allows exactly that.

Binlog Server: Point-in-Time Recovery
Implementing point-in-time recovery means regularly taking a backup of the database and saving the binary logs of that database. Executing point-in-time recovery means restoring the backup, then applying the binary logs.

  M ---> /X\ ---> S
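As an illustrative sketch (file names, positions, dates and the backup method are assumptions, not from the talk): after restoring a backup, the binary logs kept by the Binlog Server are replayed up to the target time with mysqlbinlog:

```
# 1. Restore a backup known to end at binlog.000120, position 4 (illustrative).
# 2. Replay the binary logs saved by the Binlog Server up to the target time:
mysqlbinlog --start-position=4 \
            --stop-datetime="2015-10-01 12:00:00" \
            binlog.000120 binlog.000121 binlog.000122 | mysql
```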

BLS@Booking.com
Reminder: typical deployment at Booking.com:

        M
        |
  +--...--+...+-----------+
  S1      Sj  Sn          M1
                          |
                     +--...--+
                     T1      To

BLS@Booking.com
We are deploying Binlog Server Clusters to offload masters:

        M
        |
  +-----+-----+...+-----------+--...--+-----+
 /A1\  /A2\   Sj  Sn          M1     /Z1\  /Z2\
  |     |
  +--+--+
  S1 ... Si

We have in production: more than 40 Binlog Servers, more than 20 BLS Clusters, and more than 650 slaves replicating from Binlog Servers.

BLS@Booking.com
What is a Binlog Server Cluster?
- At least 2 Binlog Servers,
- replicating from the same master,
- with independent failure modes (not the same switch/rack/...),
- with a service DNS entry resolving to all their IP addresses.

        M
        |
     +--+--+
    /A1\  /A2\
     +--+--+
     S1 ... Si

Failure of a BLS is transparent to the slaves: thanks to DNS, the slaves connected to a failing Binlog Server reconnect to the others. This also allows easy maintenance/upgrade of a Binlog Server.
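The service DNS entry could look like this (an illustrative zone-file sketch; names, TTL and addresses are assumptions): slaves point at the service name, and on a failure the reconnect resolves to a surviving Binlog Server:

```
; Illustrative DNS zone fragment: one service name, all BLS addresses
bls-cluster-a.example.com.  60  IN  A  10.0.0.11   ; A1
bls-cluster-a.example.com.  60  IN  A  10.0.0.12   ; A2
```

Slaves then simply use CHANGE MASTER TO MASTER_HOST = 'bls-cluster-a.example.com'.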

BLS@Booking.com
We are deploying BLS side-by-side with the IMs to reduce delay:

        M
        |
  +-----+-----+...+-----------+------------+
 /A1\  /A2\   Sj  Sn          M1 --> /B1\ /B2\
  |     |                             |    |
  +--+--+                             +--+-+
  S1 ... Si                           Tk ... To

BLS@Booking.com
We are deploying a new data center without IMs:

        M
        |
  +-----+-----+...+-----------+------------+-----------------+
 /A1\  /A2\   Sj  Sn          M1 --> /B1\ /B2\ --> /C1\ /C2\
  |     |                             |    |        |    |
  +--+--+                             +--+-+        +--+-+
  S1 ... Si                           Tk ... To     U1 ... Up

HA with Binlog Servers
Distributed Binlog Serving Service (DBSS):

        M
        |
  [ DBSS spanning all data centers ]
        |
  +--+--+...  +--+--+...  +--+--+...
  S1 ... Sn   T1 ... Tm   U1 ... Uo

Properties:
- A single Binlog Server failure does not disrupt the service (resilience).
- Minimises inter-data-center bandwidth requirements.
- Allows promoting a new master without touching any slave.

HA with Binlog Servers
Zoom in on the DBSS: [diagram: inside the DBSS, chains of Binlog Servers, one chain per data center, each Binlog Server forwarding the binary logs to the next]

HA with Binlog Servers
Crash of the master: [diagram: the master is gone; the Binlog Servers keep serving the binary logs to the slaves]

HA with Binlog Servers
Crash of the master, step 1: level the Binlog Servers (the slaves will follow). [diagram: the Binlog Servers exchange binary logs until they all serve the same, most advanced, state]

HA with Binlog Servers
Crash of the master, step 2: promote a slave as the new master (there is a trick). [diagram: a slave becomes the new master and the Binlog Servers replicate from it]

HA with Binlog Servers
Crash of the master, the trick: the same binary log filename is needed on the master and the slaves.
1. FLUSH BINARY LOGS on the candidate master until its binary log filename follows the latest one available on the BLSs.
2. On the new master: PURGE BINARY LOGS TO '<latest binary log file>' and RESET SLAVE ALL.
3. Point the writes to the new master.
4. Make the Binlog Servers replicate from the new master.
From the point of view of the Binlog Servers, the master has simply rebooted with a new server_id and a new UUID.
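The trick can be sketched in SQL (the filename is illustrative; in reality it must follow the latest file served by the Binlog Servers):

```sql
-- On the candidate master:
FLUSH BINARY LOGS;   -- repeat until the current filename is, e.g., binlog.000123
PURGE BINARY LOGS TO 'binlog.000123';
RESET SLAVE ALL;
-- Then point the writes here and make the Binlog Servers replicate from here.
```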

New Master w/o Touching Slaves

        M
        |
     [ DBSS ]
        |
  S1...Sn  T1...Tm  U1...Uo

New Master wo Touching Slaves +\-/+ X +/-\+ +---------------------------------------------------------+ +----+-------+-----------+-------+-----------+-------+----+ S1... Sn T1... Tm U1... Uo 34

New Master wo Touching Slaves +\-/+ X T1 +/-\+ +------------------------+--------------------------------+ +----+-------+-----------+-------+-----------+-------+----+ S1... Sn T2... Tm U1... Uo 35

New Master w/o Touching Slaves
FLUSH BINARY LOGS in a loop is ugly (but it works). A RESET MASTER at/to a given binlog.00xxxx would be much nicer: https://bugs.mysql.com/bug.php?id=77438
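That ugly loop could look like this (a hypothetical shell sketch; TARGET and the mysql client invocation are assumptions, not from the talk):

```
# Flush until the candidate master's current binary log filename
# reaches the target served by the Binlog Servers.
TARGET='binlog.000123'   # illustrative
while [ "$(mysql -N -e 'SHOW MASTER STATUS' | cut -f1)" != "$TARGET" ]; do
  mysql -e 'FLUSH BINARY LOGS'
done
```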

Binlog Servers with Orchestrator
Orchestrator is the tool we use for managing Binlog Servers: https://github.com/orchestrator/orchestrator

Binlog Server: Links
http://blog.booking.com/mysql_slave_scaling_and_more.html
http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
HOWTO Install and Configure Binlog Servers:
http://jfg-mysql.blogspot.com/2015/04/maxscale-binlog-server-howto-install-and-configure.html
http://blog.booking.com/better_crash_safe_replication_for_mysql.html
http://blog.booking.com/better_parallel_replication_for_mysql.html
(http://blog.booking.com/evaluating_mysql_parallel_replication_2-slave_group_commit.html)
http://jfg-mysql.blogspot.nl/2015/10/binlog-servers-for-backups-and-point-in-time-recovery.html
Note: the Binlog Server concept should work with any version of MySQL (5.7, 5.6, 5.5 and 5.1).

Questions? Jean-François Gagné, jeanfrancois DOT gagne AT booking.com