Parallel Replication for MySQL in 5 Minutes or Less




Parallel Replication for MySQL in 5 Minutes or Less
Featuring Tungsten Replicator
Robert Hodges, CEO, Continuent

About Continuent
- Continuent is the leading provider of data replication and clustering for open source relational databases
- Our products:
    - Tungsten Replicator: high-performance MySQL replication
    - Tungsten Enterprise: commercial replication and data management solution for MySQL and PostgreSQL
- Our services:
    - Consulting on Tungsten, plus replication and clustering in general
    - Subscriptions for commercial products

Adding Parallel Replication in 5 Minutes or Less

Problem: Single-Threaded Replication
[Diagram: master and slave; the slave runs one I/O thread and one SQL thread]
- The I/O thread downloads the binlog from the master, writes relay logs, and updates coordinates in master.info
- The SQL thread reads and applies the relay logs, and updates coordinates in relay-log.info
- The MySQL SQL thread is overwhelmed by CPU- or I/O-bound queries

Solution: Tungsten Parallel Slave
[Diagram: master and slave, with Tungsten Replicator and its disk logs next to the slave]
- The slave's native I/O and SQL threads are disabled
- Tungsten Replicator downloads the binlog from the master
- Tungsten converts binlogs to a local log, then applies them to the slave using parallel client connections

Installing Tungsten Parallel Replication
    $ tar -xvzf tungsten-replicator-2.0.5-357.tar.gz
    $ cd tungsten-replicator-2.0.5-357
    $ tools/tungsten-installer --direct -a \
        --service-name=parallel --native-slave-takeover \
        --master-host=127.0.0.1 --master-port=33306 \
        --master-user=msandbox --master-password=msandbox \
        --slave-host=127.0.0.1 --slave-port=33307 \
        --slave-user=msandbox --slave-password=msandbox \
        --home-directory=/opt/continuent \
        --property=replicator.store.parallel-queue.maxofflineinterval=5 \
        --svc-parallelization-type=disk --buffer-size=100 \
        --channels=30 --thl-port=2115 --rmi-port=10010 \
        --skip-validation-check=mysqlpermissionscheck \
        --skip-validation-check=mysqlapplierserveridcheck \
        --start-and-report
(The options marked in red on the original slide are required for MySQL sandboxes.)

Understanding Tungsten Parallel Replication

What Is Tungsten Replicator?
- Tungsten Replicator is a fast, open source replication engine for open source databases
- GPL v2 license
- Written in Java
- Designed for speed and flexibility
- http://code.google.com/p/tungsten-replicator

Tungsten Replicator Architecture
[Diagram: master and slave hosts, each running a Tungsten Replicator process configured by replicator.properties]
- Master replicator: reads the MySQL binlogs (tails the binlog or logs in as a client) and writes to its Transaction History Log
- Transport: transactions plus metadata travel over a TCP/IP connection to the Transaction History Log on the slave side
- Slave replicator: applies transactions to the slave MySQL using JDBC

How Can We Parallelize?
- Look for workloads that have independent streams of updates
- [Diagram: a shared database plus tenant_1 through tenant_4] Tenants are independent of each other but depend on shared data; shared data requires full serialization
- [Diagram: the same databases] Tenant database updates can move in parallel streams

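Replicator Pipelines and Parallel Apply
[Diagram: a pipeline of stages inside the Tungsten Replicator process. Labels: binlog, Transaction History Log (THL), parallel queue, shard.list file, channels, slave DBMS]
- Each stage extracts events from its source and applies them to its target
- The first stage also assigns a shard ID to each transaction
- A middle stage applies events to the parallel queue
- The final stage runs several extract/apply channels in parallel against the slave DBMS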

Sharding Rules for Safe Parallel Apply
SQL Statement/Row Update                              | Shard ID
------------------------------------------------------+------------------------------------
use myschema; create table foo (id int);              | myschema
use myschema; create table yourschema.foo (id int);   | #UNKNOWN (full serialization req'd)
begin; insert into yourschema.foo values(1); commit;  | yourschema
begin; insert into myschema.foo values(1);            | #UNKNOWN (full serialization req'd)
  insert into yourschema.foo values(1); commit;       |
Summary: Serialize if parallel apply is unsafe
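The rules on this slide can be sketched in a few lines of Python. This is a simplified model that only reproduces the four cases shown, not Tungsten's actual implementation: the function name `assign_shard_id` and its parameters are hypothetical, and the sketch assumes the set of schemas a transaction references has already been extracted.

```python
# Simplified model of the sharding rule on the slide (hypothetical names,
# not Tungsten's real code): a transaction gets a schema as its shard ID
# only if it touches exactly one schema; otherwise it must be serialized.
UNKNOWN_SHARD = "#UNKNOWN"


def assign_shard_id(referenced_schemas, default_schema=None):
    """Return the shard ID for one transaction.

    referenced_schemas: schemas named explicitly (e.g. yourschema.foo).
    default_schema: schema selected with `use`, if any (statement replication).
    """
    schemas = set(referenced_schemas)
    if default_schema is not None:
        schemas.add(default_schema)
    if len(schemas) == 1:
        return next(iter(schemas))
    # Zero or several schemas: parallel apply is not known to be safe.
    return UNKNOWN_SHARD


# The four rows of the slide's table:
print(assign_shard_id([], default_schema="myschema"))
print(assign_shard_id(["yourschema"], default_schema="myschema"))
print(assign_shard_id(["yourschema"]))
print(assign_shard_id(["myschema", "yourschema"]))
```

The conservative default matters: whenever the sketch cannot pin a transaction to a single schema, it falls back to `#UNKNOWN`, which matches the slide's summary of serializing when parallel apply is unsafe.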

Parallel On-Disk Queue
[Diagram: THL (store) -> thl-to-q (stage) -> THLParallelQueue with queues 1, 2, 3 -> q-to-dbms (stage, one channel per queue) -> DBMS]
- Global sync counter
- Per-channel read threads
- One queue per channel
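One way to picture the queue-per-channel layout is a router that sends each transaction to a channel based on its shard ID. The sketch below assumes a simple hash-based mapping; the slides only mention a shard.list file and do not describe how shards map to channels, so `channel_for` and `route` are hypothetical helpers rather than Tungsten's API.

```python
import zlib

UNKNOWN_SHARD = "#UNKNOWN"


def channel_for(shard_id, channels):
    """Map a shard ID to a channel number, or None if it must be serialized.

    Assumption: a stable hash (CRC32) modulo the channel count. The same
    shard always lands on the same channel, which keeps its updates in order.
    """
    if shard_id == UNKNOWN_SHARD:
        return None
    return zlib.crc32(shard_id.encode("utf-8")) % channels


def route(shard_ids, channels):
    """Distribute a stream of transactions (by sequence number) to queues."""
    queues = {c: [] for c in range(channels)}
    serialized = []  # transactions that must be applied with no parallelism
    for seqno, shard_id in enumerate(shard_ids):
        channel = channel_for(shard_id, channels)
        if channel is None:
            serialized.append(seqno)
        else:
            queues[channel].append(seqno)
    return queues, serialized


queues, serialized = route(["tenant_1", "tenant_2", "#UNKNOWN", "tenant_1"], 4)
print(queues, serialized)
```

In this model every entry in `serialized` is a point where all channels would have to drain before the transaction is applied, which is why even a small share of `#UNKNOWN` transactions hurts.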

How Fast Is Tungsten?

Understanding What to Test
[Diagram: applications connect to a master and a slave; each has an InnoDB buffer pool backed by disk/SSD storage]
Typical web property:
1. Data size >> buffer pool
2. Storage typically on disk
3. Heavy read traffic plus writes
4. Slaves lag after maintenance

Sysbench Performance Tests
Head-to-head: Tungsten vs. MySQL replication
Test Scenario  | Databases | Rows/Db | Data Size
---------------+-----------+---------+----------
Cache-resident | 30        | 10K     | 430 MB
I/O bound      | 30        | 10M     | 68 GB
- HP ProLiant server with dual Xeon L5520, 72 GB RAM, 1 TB HP Smart Array RAID 1+0
- MySQL version 5.1.57, 10 GB InnoDB buffer pool
- Run a 1-hour sysbench OLTP load
- Start the slave with an empty buffer pool
- Measure throughput and total catch-up time

Cache-Resident Total Binlog Comparison
[Chart] Tungsten Replicator: ~17 minutes; MySQL Replication: ~30 minutes

Cache-Resident Throughput Comparison
[Chart] Tungsten Replicator: ~17 minutes; MySQL Replication: ~30 minutes
- Tungsten is ~1.8x faster than MySQL; both exceed master throughput

I/O Bound Total Binlog Comparison
[Chart] Tungsten Replicator: ~51 minutes; MySQL Replication: ~228 minutes

I/O Bound Throughput Comparison
[Chart] Tungsten Replicator: ~51 minutes; MySQL Replication: ~228 minutes
- Tungsten is ~4.5x faster than MySQL; only Tungsten exceeds master throughput
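The speedup figures follow directly from the catch-up times on the charts. A small check, using only the approximate minutes quoted on the slides (the dictionary layout and function name are illustrative):

```python
# Approximate catch-up times in minutes, as quoted on the benchmark slides.
CATCH_UP_MINUTES = {
    "cache-resident": {"tungsten": 17, "mysql": 30},
    "io-bound": {"tungsten": 51, "mysql": 228},
}


def speedup(scenario):
    """Ratio of MySQL catch-up time to Tungsten catch-up time."""
    times = CATCH_UP_MINUTES[scenario]
    return times["mysql"] / times["tungsten"]


for name in CATCH_UP_MINUTES:
    print(f"{name}: {speedup(name):.1f}x")
# prints:
# cache-resident: 1.8x
# io-bound: 4.5x
```

Because the inputs are rounded readings from the charts, the ratios are only as precise as the "~" in the slides suggests.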

Tips for Maximizing Parallel Replication Performance

Pick the Right Workload
- Parallel replication is great for I/O-bound workloads
    - Small buffer pool compared to dataset size
    - Very large data sets
    - Slaves that also take read-only queries
- Cache-resident workloads see less benefit
    - Dataset < buffer cache size
    - Few I/O-bound updates

Pick the Right Application Profile
- Multi-tenant applications
    - Independent customer databases
    - Minimal or no shared data between tenants
    - Uniform distribution of updates across schemas
- Horizontally sharded applications
    - Data distributed across schemas
    - No cross-shard queries
(Rule of thumb: more than 1-2% serialization kills parallel apply performance)

Allocate Capable Hardware
- Ensure enough RAM
    - Large InnoDB buffer pool
    - 1 GB RAM for the Tungsten Replicator JVM
    - 500 MB to 2 GB of OS page cache for parallel replication
- Fast storage
    - BBU (battery backup unit) for fast fsync
    - RAID
    - Good controller cache to buffer updates
- 1-2 CPUs required for replication
- Run off-board to avoid master impact
(Not enough testing on SSD to offer recommendations yet)

Tune Tungsten Properly
[Diagram: the log path runs from the Java Virtual Machine (in-memory index; 1 writer thread, 0+ reader threads) through the Java stream classes and the OS page cache to disk/SSD. Access pattern: sequential writes with random reads, or sequential writes only in the best case]
- Log options
    - buffersize: 128 KB
    - dochecksum: costs CPU but probably worth it
    - fsynconflush: slow on non-BBU storage but required for crash-safe slaves
    - logfilesize: larger files mean a longer seek time on start-up
- OS settings
    - Ensure at least 1 GB in page cache
    - Use InnoDB O_DIRECT if on-board with MySQL
- Storage settings
    - A separate device is best but not required

Where Parallel Apply Does Not Help
- Single large database
    - Difficult to parallelize transactions safely
    - Tungsten serializes everything
- Cross-schema queries
    - Multi-tenant app with shared data between tenants
    - Messy single application spread across schemas
    - Tungsten serializes cross-schema updates
- Economy-class hardware or VMs
    - Adding more threads does not help if one thread already hogs the disk

Off On Your Own

Home Sweet Home
http://code.google.com/p/tungsten-replicator

More Than Just Parallel Apply
- Global transaction IDs
- Flexible transaction filters
- Replicate from MySQL to PostgreSQL/Oracle/MongoDB
- Backup and restore integration
- Cross-version replication: 5.5 -> 5.1 -> 5.0 -> 4.1
- Row and statement replication
- Automatic consistency checks
- Multi-master replication
- Parallel replication (of course!)
- And extensible too

More Tungsten Talks
- 1:30pm: MySQL Parallel Replication in 5 Minutes or Less (Robert Hodges)
- 2:30pm: MySQL Replication Outside the Box: Multiple Masters, Fan-in, Parallel Apply (Giuseppe Maxia)
- 4:30pm: MySQL Sandbox: a Framework for Productive Laziness (Giuseppe Maxia)

Conclusion
- Tungsten parallel replication works for MySQL version 5.0+
- You can enable slave takeover in a few minutes
- Tungsten Replicator is up to 4.5x faster than MySQL built-in replication (I/O-bound sysbench test)
Try out Tungsten Replicator today!

Contact Information for Continuent
HQ: 560 S. Winchester Blvd., Suite 500, San Jose, CA 95128
Tel: (866) 998-3642
Fax: (408) 668-1009
E-mail: sales@continuent.com
Our blogs:
- http://scale-out-blog.blogspot.com
- http://datacharmer.blogspot.com
- http://flyingclusters.blogspot.com
Continuent web site: http://www.continuent.com
Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator