Testing and Verifying your MySQL Backup Strategy

About the Author Ronald BRADFORD Testing and Verifying your MySQL Backup Strategy Ronald Bradford http://ronaldbradford.com @RonaldBradford 16 years with MySQL / 26 years with RDBMS Senior Consultant at MySQL Inc (06-08) Consultant for Oracle Corporation (96-99) Published author of 4 MySQL books http://ronaldbradford.com/presentations/ 2015.08 OTN TOUR 2015 Uruguay www.uyoug.org.uy Argentina www.aroug.org Chile www.cloug.cl Peru www.peoug.org Latin America Oracle User Groups Community www.laouc.net "No one cares if you can backup, only that you can restore." Adapted from W. Curtis Preston - Backup & Recovery (O'Reilly 2009)

Problem To verify your backup is to restore Test frequently Test for exceptions Restoring is a two part process Restore static backup Restore to point in time "Testing is about breaking your software. Testing is not about checking if it works." Ronald Bradford, 2006 Requirements? Constraints? What is most important? Your restore completes successfully? Your restore matches your backup files? Your restored system has all the data it should have? Your system is back online ASAP? All of these requirements are necessary for success What part takes the most time to restore? Copying your backup files across network? Uncompressing your backup? Restore the backup? Point in time recovery?

Backup Options review Product Options mysqldump mydumper XtraBackup MySQL Enterprise Backup LVM/SAN Snapshot Filesystem copy New options Included with binary distribution Open source Maintained by Oracle Only tool for schema information Poor defaults mysqldump Locks schemas (by default) mysqldump Inconsistent for multi schema applications Does not obtain position needed for point in time recovery Does not backup all objects by default Single threaded Reads all data (into database buffers) Defaults

mysqldump mydumper ASCII copy of data editable Cross platform/version compatible Can perform partial backups Parallel implementation based on mysqldump Open source Community developed and maintained Max Bubenick - Percona https://launchpad.net/mydumper xtrabackup xtrabackup Open source Maintained by 3rd party company Widely used for large InnoDB installations Non-blocking backup (of transactional data) Consistent file copy https://www.percona.com/doc/percona-xtrabackup/

MEB MySQL enterprise backup snapshot Commercial license bundled with subscription support Oracle provided enterprise version Features similar to Xtrabackup LVM/SAN Snapshot DB agnostic MyLVMBackup Open source Community Maintained http://www.lenzg.net/mylvmbackup/ snapshot upcoming Read Consistent view FLUSH TABLES WITH READ LOCK Can take a long time on highly concurrent systems mysqlpump (5.7.8) Open source Developed by Oracle rewrite of mysqldump parallel implementation of mysqldump https://dev.mysql.com/doc/refman/5.7/en/mysqlpump.html

point in time What is missing? MySQL Binary Logs Are they enabled? Binary log position Obtained consistently SaaS (per schema issues) Backup Options binary log file copy cp rsync Use a slave --log-slave-updates DRBD - Disk Replicated Block Device mysqlbinlog --read-from-remote-server (New in 5.6) Binary logs are sequential files cp/rsync mysqladmin flush-logs e.g. AWS 5 minutes strategy

mirrored binary log Remote copy binary logs as created No automatic restart management Requires MySQL 5.6+ (can be on a slave) B&R Strategy Considerations Design choices storage engines Time to backup Time to restore Consistency Flexibility Partial Capabilities Cost InnoDB only MyISAM mysql schema Other storage engines Mixed data solutions

Requirements Technical Requirements (Regardless of method) Schema & Data Schema Size/Tables etc Replication architecture Auditing Security Checksums Storage Engines Do these help to test and verify? schema & objects meta info Can only be obtained via mysqldump Required for test systems & subsets Auditability of schema changes Schema size Tables size Table objects

meta info ALL Schemas Size Meta Info Table size SELECT FROM FORMAT(SUM(data_length+index_length)/1024/1024,2) AS total_mb, FORMAT(SUM(data_length)/1024/1024,2) AS data_mb, FORMAT(SUM(index_length)/1024/1024,2) AS index_mb, COUNT(DISTINCT table_schema) AS schema_cnt, COUNT(*) AS tables, CURDATE() AS today, VERSION() information_schema.tables\g *************************** 1. row *************************** total_mb: 5344.63 data_mb: 4545.49 index_mb: 799.13 schema_cnt: 7 tables: 103 today: 2012-04-03 VERSION(): 5.1.61-0ubuntu0.11.10.1-log Your daily verification step should include this DROP TABLE IF EXISTS db_size; CREATE TABLE db_size( table_schema VARCHAR(64) NOT NULL, table_name VARCHAR(64) NOT NULL, engine VARCHAR(64) NOT NULL, row_format VARCHAR(10) NULL, table_rows INT UNSIGNED NOT NULL, avg_row INT UNSIGNED NOT NULL, total_mb DECIMAL(7,2) NOT NULL, data_mb DECIMAL(7,2) NOT NULL, index_mb DECIMAL(7,2) NOT NULL, created_date DATETIME NOT NULL, INDEX (created_date,table_name) ) ENGINE=InnoDB; Keep a copy of all tablesize info per backup https://github.com/ronaldbradford/mysql/blob/master/sql/database_size.sql https://github.com/ronaldbradford/mysql/blob/master/sql/db_size_table.sql Meta Info Table size architecture replication topology DELIMITER $$ DROP PROCEDURE IF EXISTS generate_db_size$$ CREATE PROCEDURE generate_db_size() BEGIN DECLARE l_created_date DATETIME; SET l_created_date := NOW(); INSERT INTO db_size(table_schema, table_name, engine, row_format, table_rows, avg_row, total_mb, data_mb, index_mb, created_date) SELECT table_schema, table_name, engine, row_format, table_rows, avg_row_length AS avg_row, ROUND((data_length+index_length)/1024/1024,2) AS total_mb, ROUND((data_length)/1024/1024,2) AS data_mb, ROUND((index_length)/1024/1024,2) AS index_mb, l_created_date FROM information_schema.tables WHERE table_schema=database() AND table_type='base TABLE' ORDER BY 6 DESC; SELECT l_created_date AS created_date, DATABASE() AS table_schema, COUNT(*) AS tables, FORMAT(SUM(total_mb),2) AS total_mb, FORMAT(SUM(data_mb),2) AS data_mb, FORMAT(SUM(index_mb),2) AS index_mb FROM db_size WHERE created_date = l_created_date; https://github.com/ronaldbradford/mysql/blob/master/sql/generate_db_size.sql Do you backup the master or the slave? Binary position option changes Data drift? Do you backup the entire DB? Is some data static? Can a recovered system start without need for all data?

Backup Options SNAPSHOT USAGE $ sudo su - $ sync ; lvcreate -L1G -s -n dbsnapshot /dev/db/p0 $ mkdir -p /mnt/dbsnapshot $ mount -o ro /dev/db/dbsnapshot /mnt/dbsnapshot $ du -sh /mnt/dbsnapshot $ ls -al /mnt/dbsnapshot $ mkdir /mysql/backup/snapshot1 $ cp -r /mnt/dbsnapshot/* /mysql/backup/snapshot1 Testing & Verification $ sudo su - $ mylvmbackup TIP: mylvmbackup does all the hard work http://effectivemysql.com/article/configuring-a-new-hard-drive-for-lvm http://effectivemysql.com/article/using-mysql-with-lvm http://www.lenzg.net/mylvmbackup/ Incomplete backup file Missing objects Locking Missing point in time position Restore time considerations Fake objects Situations Trigger hell ^C does not kill mysqldump Backup Options mysqldump USAGE $ time mysqldump --all-databases > /mysql/backup/dump1.sql real 1m31.631s user 1m12.533s sys 0m10.893s $ echo $? 0 $ ls -lh /mysql/backup/dump1.sql -rw-rw-r-- 1 uid gid 2.9G 2012-04-03 03:04 /mysql/ backup/dump1.sql

Backup Options VERIFICATION $ time mysqldump --all-databases > /mysql/backup/dump1.sql mysqldump: Got error: 1142: SELECT,LOCK TABL command denied to user 'root'@'localhost' for table 'cond_instances' when using LOCK TABLES $ echo $? 2 TIP: Error checking is essential and easy to implement Invalid objects Error examples Lack of permissions mysqldump: Couldn't execute 'SHOW FIELDS FROM `alarm`': View 'schema.alarm' references invalid table(s) or column(s) or function(s) or definer/ invoker of view lack rights to use them (1356) mysqldump: Got error: 1044: Access denied for user 'rbradfor'@'localhost' to database 'schema' when using LOCK TABLES Backup Options mysqldump USAGE Backup Options mysqldump USAGE $ time mysqldump --all-databases \ --events --routines > /mysql/backup/dump1.sql $ time mysqldump --all-databases \ --events --routines \ --single-transaction > /mysql/backup/dump1.sql All objects are NOT included by default Remove default schema locking

Backup Options mysqldump USAGE Backup improvment Separate schema/objects $ time mysqldump --all-databases \ --events --routines \ --single-transaction \ --master-data > /mysql/backup/dump1.sql Point in Time recovery position # Schema $ mysqldump --all-databases --no-data \ --skip-triggers --add-drop-database \ > /mysql/backup/schema.sql TIP: Include daily dumps of database objects mysqldump: Error: Binlogging on server not active $ time mysqldump --all-databases \ --events --routines \ --single-transaction \ --dump-slave > /mysql/backup/dump1.sql Fail on test system? Different for master or slave # Objects $ mysqldump --all-databases --no-data --no-create-info \ --events --routines --triggers \ > /mysql/backup/objects.sql # Just the data $ mysqldump --all-databases --no-create-info \ --single-transaction --master-data --skip-triggers \ > /mysql/backup/data.sql Restore Options mysqldump USAGE $ time mysqldump --all-databases > /mysql/backup/dump1.sql real 1m31.631s... $ time mysql u[user] -p -f < dump1.sql > dump1.out 2>&1 real 14m13.817s user 1m6.960s Continue on error or sys 0m1.516s stop on error? $ echo $? 0 $ ls -l dump1.out -rw-rw-r-- 1 uid gid 0 2012-04-08 04:07 dump1.out Fake objects mysqldump USAGE DROP TABLE IF EXISTS `demo`; /*!50001 DROP VIEW IF EXISTS `demo`*/; SET @saved_cs_client = @@character_set_client; SET character_set_client = utf8; /*!50001 CREATE TABLE `demo` ( `zip` tinyint NOT NULL, `lat` tinyint NOT NULL, `lon` tinyint NOT NULL ) ENGINE=MyISAM */; SET character_set_client = @saved_cs_client; /*!50001 DROP TABLE IF EXISTS `demo`*/; /*!50001 DROP VIEW IF EXISTS `demo`*/; /*!50001 CREATE ALGORITHM=UNDEFINED */ /*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER */ /*!50001 VIEW `demo` AS select ùszip`.`zip` AS `zip`,ùszip`.`lat` AS `lat`,ùszip`.`lon` AS `lon` from ùszip` */;

Fake objects mysqldump USAGE Compression -- -- Temporary view structure for view `demo` -- DROP TABLE IF EXISTS `demo`; /*!50001 DROP VIEW IF EXISTS `demo`*/; /*!50001 CREATE VIEW `demo` AS SELECT 1 AS `zip`, 1 AS `lat`, 5.7 syntax 1 AS `lon`*/;... -- -- Final view structure for view `demo` -- /*!50001 DROP VIEW IF EXISTS `demo`*/; /*!50001 CREATE ALGORITHM=UNDEFINED */ /*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER */ /*!50001 VIEW `demo` AS select ùszip`.`zip` AS `zip`,ùszip`.`lat` AS `lat`,ùszip`.`lon` AS `lon` from ùszip` */; How long to compress the data? Which compression do you use? How long to uncompress the data? 95% of recovery is from last backup Best strategy is to have an uncompressed copy on server. Compression Backup needs Utility Comp (s) Dec (s) Saving lzo (-3) 21 34 48% pigz (-1) 43 33 64% pigz [-6] 105 25 69% gzip [-6] 232 78 69% bzip2 540 175 74% lzo (-9) 20m 82 58% lzma 58m 180 78% xz 59m 160 78% Static Backup Useless without Binary Log position mysql> SHOW MASTER STATUS\G File: mysql-bin.020616 Position: 63395562 Binlog_Do_DB: Binlog_Ignore_DB: WARNING: Can work on slave and provide the wrong information Depends greatly on data types

Backup Needs Situations Binary Log $ mysqldump --master-data (or --dump-slave) CHANGE MASTER TO MASTER_LOG_FILE= 'mysql-bin.000146', MASTER_LOG_POS=810715371; Xtrabackup $ cat xtrabackup_binlog_info mysql-bin.000001 37522 MEB $ grep binlog meta/backup_variables.txt binlog_position=mysql-bin.000017:5555 mydumper $ cat export-20120407-230027/metadata Log: mysql-bin.000017 Pos: 8328 Lost connection Transaction too large ROW format statement SET max_allowed_packet Single threaded what about Failover Is failover a better option? A different strategy? Do you have high availability (HA)implemented? Top Benefit Improved time to recovery Top Risk [Potential] loss of high availability

failover issues What to consider Failover does not cater for all situations Data recovery to a prior point in time e.g. security problem building testing systems Building out replicas Entirely different set of problems to verify Forget the technology? What is most important for your business? B&R/DR decision should be based on loss/ cost to business RPO/RDO/SLA Terminology Conclusion MTTD - Mean Time To Detect MTTR - Mean Time to Recover RPO - Recovery Point Objective RDO - Recovery Data Objective SLA - Service Level Agreement B&R is a complex process Business will want all of the data, all of the time Justify the business needs first Discounts [easier] options Determining business priorities is important for any strategy

Ronald Bradford 220 pages dedicated to B&R http://j.mp/em-book2 http://ronaldbradford.com me@ronaldbradford.com @RonaldBradford