Using Tungsten Replicator to solve replication problems
Neil Armitage, Cluster Implementation Engineer, Continuent
Giuseppe Maxia, QA Director, Continuent

ABOUT US
Neil Armitage
* Continuent Tungsten Deployment and Support Engineer, Continuent, Inc
* 20 years development and DB experience
Giuseppe Maxia, a.k.a. "The Data Charmer"
* QA Director, Continuent, Inc
* 25 years development and DB experience
* Long-time MySQL community member
* Oracle ACE Director

Tungsten Replicator
* Global transaction ID
* Multiple masters
* Multiple sources
* Flexible topologies
* Parallel replication
* Heterogeneous replication
* ... and more

What Tungsten Replicator is NOT
* Automated management
* Automatic failover
* Transparent connections
All of the above (and more) are available in a commercial solution named Continuent Tungsten (a.k.a. Tungsten Enterprise).

What are we talking about?
* Requirements
* Components
* Installation
* Topologies
* Administration
* Troubleshooting
Tungsten Replicator Concepts
* Replicator: the replication engine
* Role: master, slave, direct slave
* Service: a.k.a. "pipeline"
* Stage: extract, queue, apply

Tungsten Replicator Components
* THL: Transaction History Log
* Service schema: makes the node crash proof
* Properties file: service definition
* Tools: ruling from a centralized location

Tungsten Replicator in a nutshell
[Diagram: host1 (master) and host2 (slave); the master binlog; a THL on each host carrying the global transaction ID; trep_commit_seqno on each host, holding origin, seqno, and eventid]

Planning
* Hosts
* Topology
* Stand-alone or taking over

[Diagram of topologies: master-slave, fan-in slave, all-masters, star, and heterogeneous (MySQL to Oracle, Oracle to MySQL)]
Installation

Installation
* System requirements
* Validate first
* Deploying from a single location

Installation - tools
* tools/tungsten-installer
* tools/configure-service
* tools/update
(Using the cookbook recipes, you hardly see them)

Tungsten in practice: Installation

Installation
* Check the requirements
* Get the binaries
* Expand the tarball
* Run cookbook

REQUIREMENTS
* Java JRE or JDK (Sun/Oracle or OpenJDK)
* Ruby 1.8 (only during installation)
* ssh access to the same user on all nodes
* MySQL user with all privileges

Installation - Choices
* --master-slave
* --direct

master-slave
[Diagram: host1 (master) with binlog and THL; host2 (slave) with THL; host3 (slave) with THL]

direct
[Diagram: host1 (master) with binlog; host2 (slave) with relay log and THL; host3 (slave) with relay log and THL]
Overview of Virtual Machines
* Copy zip files from USB key
* Expand on local disk
* Start all 4 machines in VirtualBox

Virtual Machines
* 4 nodes: host1 to host4
* Running CentOS 6.3 and Percona 5.5
* Root and tungsten password = password
* localhost port 2222 redirects to port 22 on the hosts
    ssh -p 2222 tungsten@localhost

VERY important definitions
* Staging directory: where you unpack the software and run the installer. There is generally only one, on one host. It can be discarded after installation.
* Installation directory: where your installed software will go. There is one for every host.

Example
* host1
    Staging directory: $HOME/tungsten-replicator-2.0.8-167
    Installation directory: /opt/replication
* host2
    Installation directory: /opt/replication
* host3
    Installation directory: /opt/replication
Requirements: how to
Step by step: how it happened

Installing VMs
Step-by-step demo
Overview of Tungsten cookbook

tungsten cookbook
    tungsten-replicator-2.0.8-167
    +--/cluster-home
    +--/cookbook
    +--/tools
    +--/tungsten-replicator

tungsten cookbook
    tungsten-replicator-2.0.8-167
    +--/cookbook
       +--COMMON_NODES.sh
       +--USER_VALUES.sh
       +--NODES_MASTER_SLAVE.sh
       +--install_master_slave
       +--show_cluster
       +--test_cluster
       ...

tungsten cookbook
    tungsten-replicator-2.0.8-167
    +--/cookbook
       +--COMMON_NODES.sh
       +--USER_VALUES.sh
       +--NODES_ALL_MASTERS.sh
       +--install_all_masters
       +--show_cluster
       +--test_cluster
       ...

tungsten cookbook
    tungsten-replicator-2.0.8-167
    +--/cookbook
       +--COMMON_NODES.sh
       +--USER_VALUES.sh
       +--NODES_STAR.sh
       +--install_star
       +--show_cluster
       +--test_cluster
       ...

tungsten cookbook
    tungsten-replicator-2.0.8-167
    +--/cookbook
       +--COMMON_NODES.sh
       +--USER_VALUES.sh
       +--NODES_FAN_IN.sh
       +--install_fan_in
       +--show_cluster
       +--test_cluster
       ...

tungsten cookbook
    $ cat COMMON_NODES.sh
    export NODE1=host1
    export NODE2=host2
    export NODE3=host3
    export NODE4=host4

tungsten cookbook
    $ cat USER_VALUES.sh
    # User defined values for the cluster to be installed.
    export TUNGSTEN_BASE=$HOME/installs/cookbook
    export DATABASE_USER=tungsten
    export BINLOG_DIRECTORY=/var/lib/mysql
    export MY_CNF=/etc/my.cnf
    export DATABASE_PASSWORD=secret
    export DATABASE_PORT=3306
    export TUNGSTEN_SERVICE=cookbook
    export RMI_PORT=10000
    export THL_PORT=2112
    export START_OPTION=start

Getting started: VALIDATE FIRST
    export VERBOSE=1
    ./cookbook/check_cookbook
    ./cookbook/validate_cluster

Sample master-slave installation
* edit cookbook/COMMON_NODES.sh
* edit cookbook/USER_VALUES.sh
* run cookbook/install_master_slave
and then:
* run cookbook/show_cluster
* run cookbook/test_cluster
What does the installation do
1: Validate all servers (host1, host2, host3, host4)
   Report all errors

What does the installation do
1 (again): Validate all servers (host1, host2, host3, host4)

What does the installation do
2: Install Tungsten on all servers (host1, host2, host3, host4)
    $HOME/tinstall/
        config/
        releases/
        relay/
        thl/
        tungsten/
        backups/

Example (from manual installation)
    ssh r2 chmod 444 $HOME/tinstall
    ./tools/tungsten-installer \
      --master-slave --master-host=r1 \
      --datasource-user=tungsten \
      --datasource-password=secret \
      --service-name=dragon \
      --home-directory=$HOME/tinstall \
      --thl-directory=$HOME/tinstall/logs \
      --relay-directory=$HOME/tinstall/relay \
      --cluster-hosts=r1,r2,r3,r4 --start
    ERROR >> qa.r2.continuent.com >> /home/tungsten/tinstall is not writeable

Example
    ssh r2 chmod 755 $HOME/tinstall
    ./tools/tungsten-installer \
      --master-slave --master-host=r1 \
      --datasource-user=tungsten \
      --datasource-password=secret \
      --service-name=dragon \
      --home-directory=$HOME/tinstall \
      --thl-directory=$HOME/tinstall/logs \
      --relay-directory=$HOME/tinstall/relay \
      --cluster-hosts=r1,r2,r3,r4 --start
    # no errors
After installation: a tour of the cookbook utilities

General principles (1)
* Scripts without an extension are designed to be launched by users, e.g.
    ./cookbook/help
    ./cookbook/install_master_slave
* Scripts with the extension ".sh" are either for internal use only or deprecated.
* ./cookbook/install_* scripts can be used before installing. Almost everything else requires an installed topology.

General principles (2)
* After installation there is a file CURRENT_TOPOLOGY in the staging directory.
* Cookbook scripts can be used either from the staging directory or from the installation directory.

Cookbook tour: help and checks
    ./cookbook/check_cookbook
    ./cookbook/help
    ./cookbook/readme

Cookbook tour: getting information
    ./cookbook/show_cluster
    ./cookbook/paths
    ./cookbook/backups
    ./cookbook/services
    ./cookbook/query_node {node} {query}
    ./cookbook/query_all_nodes {query}

Cookbook tour: inspecting replication
    ./cookbook/replicator
    ./cookbook/trepctl
    ./cookbook/thl
    ./cookbook/show_conf
    ./cookbook/edit_conf
    ./cookbook/show_log
    ./cookbook/vimlog
    ./cookbook/emacslog

Cookbook tour: testing tools
    ./cookbook/test_cluster
    ./cookbook/start_load [start stop]
    ./cookbook/test_all_topologies

Cookbook tour: powerful admin tools
    ./cookbook/heartbeat
    ./cookbook/switch
    ./cookbook/add_node_master_slave
    ./cookbook/add_node_star
    ./cookbook/copy_backup
    ./cookbook/clear_cluster    # <--- CAUTION!
More installation

DRY-RUN
* A method to simulate installation
* Does NOT perform installation
* Does NOT even do validation
* It only shows the commands used to install
* Allows you to get the commands and do an installation manually (e.g. when you can't ssh between nodes)

DRY-RUN
    export DRYRUN=1
    ./cookbook/install_master_slave
Intro to multi-master installation

How tungsten-installer Works for Basic Master/Slave Deployment
[Diagram: from the staging copy of the files, the installer checks prerequisites, copies code, and configures db1, db2, and db3]

From Master/Slave Replication...
[Diagram: db1, db2, and db3 each run a replicator with service alpha]
tungsten-installer: install master and slaves on the whole cluster

To Multi-Master
[Diagram: db1 and db2 each run a replicator with services alpha and bravo]
* tungsten-installer: install master on db1
* tungsten-installer: install master on db2
* configure-service: install slave service on db1
* configure-service: install slave service on db2

tungsten-installer master 1 (creating service 'alpha')
    TUNGSTEN_HOME=/home/tungsten/installs/cookbook
    ./tools/tungsten-installer \
      --master-slave \
      --master-host=$master1 \
      --datasource-port=3306 \
      --datasource-user=tungsten \
      --datasource-password=secret \
      --datasource-log-directory=/var/lib/mysql \
      --service-name=alpha \
      --home-directory=$TUNGSTEN_HOME \
      --cluster-hosts=$master1 \
      --start
Notice: --cluster-hosts has only one host

tungsten-installer master 2 (creating service 'bravo')
    TUNGSTEN_HOME=/home/tungsten/installs/cookbook
    ./tools/tungsten-installer \
      --master-slave \
      --master-host=$master2 \
      --datasource-port=3306 \
      --datasource-user=tungsten \
      --datasource-password=secret \
      --datasource-log-directory=/var/lib/mysql \
      --service-name=bravo \
      --home-directory=$TUNGSTEN_HOME \
      --cluster-hosts=$master2 \
      --start
Notice: --cluster-hosts has only one host

Configure Service master 1
    TUNGSTEN_HOME=/home/tungsten/installs/cookbook
    $TUNGSTEN_HOME/tungsten/tools/configure-service -C \
      --quiet \
      --host=$master1 \
      --datasource=$master1 \
      --local-service-name=alpha \
      --role=slave \
      --service-type=remote \
      --release-directory=$TUNGSTEN_HOME/tungsten \
      --skip-validation-check=thlstoragecheck \
      --master-thl-host=$master2 \
      --master-thl-port=2112 \
      --svc-start \
      bravo
Notice: bravo is the master service in host 2

Configure Service master 2
    TUNGSTEN_HOME=/home/tungsten/installs/cookbook
    $TUNGSTEN_HOME/tungsten/tools/configure-service -C \
      --quiet \
      --host=$master2 \
      --datasource=$master2 \
      --local-service-name=bravo \
      --role=slave \
      --service-type=remote \
      --release-directory=$TUNGSTEN_HOME/tungsten \
      --skip-validation-check=thlstoragecheck \
      --master-thl-host=$master1 \
      --master-thl-port=2112 \
      --svc-start \
      alpha
Notice: alpha is the master service in host 1
From Master/Slave Replication...
[Diagram: db1, db2, and db3 each run a replicator with service db1]
    ./cookbook/install_master_slave

How Do I Install Fan-In Replication?
[Diagram: db1 runs service db1; db2 runs service db2; db3 runs services db1 and db2]
    ./cookbook/install_fan_in

How Do I Install Multi-Master?
[Diagram: db1 and db2 each run a replicator with services db1 and db2]
    ./cookbook/install_all_masters

How Do I Extend Multi-Master?
[Diagram: db1, db2, and db3 each run a replicator with services db1, db2, and db3]

How Do I Extend Multi-Master?
[Diagram: db1, db2, db3, and db4 each run a replicator with services db1, db2, db3, and db4]

How Do I Install a Star Topology?
[Diagram: db1 runs services db1 and db3; db2 runs services db2 and db3; db3 (HUB) runs services db1, db2, and db3]
    ./cookbook/install_star

How Do I Extend a Star Topology?
[Diagram: db1 runs services db1 and db3; db2 runs services db2 and db3; db4 runs services db3 and db4; db3 (HUB) runs services db1, db2, db3, and db4]

How Do I Extend a Star Topology?
[Diagram: db1 runs services db1 and db3; db2 runs services db2 and db3; db4 runs services db3 and db4; db5 runs services db3 and db5; db3 (HUB) runs services db1, db2, db3, db4, and db5]

BI-DIR: the painless way
* edit cookbook/COMMON_NODES.sh
* edit cookbook/USER_VALUES.sh
* remove two nodes
* edit the variables in cookbook/NODES_ALL_MASTERS.sh
* cookbook/install_all_masters

Multiple masters: fan-in
Steps:
* install a master service in each node
* install a slave service for each master in the fan-in node
or:
    cookbook/install_fan_in

Multiple masters: star topology
Steps:
* install a master service in each server
* in the hub, install a slave service for each spoke
* in each spoke, install a slave service for the hub, using the bypass option
or:
    cookbook/install_star

Taking Over from Standard Replication
    cookbook/install_standard_replication
    cookbook/takeover
Replication Management

Common Commands
* replicator
* trepctl
* thl
* the Tungsten service schema

replicator
* It's the service provider
* You launch it once when you start
* You may restart it when you change the configuration

trepctl: Tungsten Replicator ConTroLler
* It's the driving seat for your replication
* You can start, update, and stop services
* You can get specific info

trepctl: Tungsten Replicator Controller
* put services online or offline
* check status
* skip events
* inspect internals
* change roles
* heartbeat
* backup/restore
* ... and a lot more

thl: Transaction History Log
Gives you access to the Tungsten transaction history logs

thl: Transaction History Log
* info
* index
* list (total, a specific event, or by range)
* purge

Tungsten service schema
* One for each service
* Named "tungsten_service_name", e.g. tungsten_alpha, tungsten_dragon
* Most important table: trep_commit_seqno

Looking at the Tungsten service db
    select * from tungsten_dragon.trep_commit_seqno\G
    ******************* 1. row *******************
              task_id: 0
                seqno: 102
               fragno: 0
            last_frag: 1
            source_id: qa.r1.continuent.com
         epoch_number: 0
              eventid: mysql-bin.000002:0000000000018903;0
      applied_latency: 0
     update_timestamp: 2012-02-06 05:56:12
             shard_id: tungsten_dragon
    extract_timestamp: 2012-02-06 05:56:09

Where are the tools
In the tungsten directory: $TUNGSTEN_BASE/tungsten/tungsten-replicator/bin
    replicator    # the daemon
    trepctl       # replicator controller
    thl           # transaction history log tool

Starting and stopping the replicator
    cd $TUNGSTEN_BASE/tungsten/tungsten-replicator/bin
    ./replicator status
    Tungsten Replicator Service is running (PID:32400).
    ./replicator stop
    Stopping Tungsten Replicator Service...
    Stopped Tungsten Replicator Service.
    ./replicator start
    Starting Tungsten Replicator Service...
    ... or ./cookbook/replicator ...
Checking replicator vitals
    trepctl services
    Processing services command...
    NAME              VALUE
    ----              -----
    appliedlastseqno: -1      # bad sign?
    appliedlatency  : -1.0
    role            : slave
    servicename     : dragon
    servicetype     : local
    started         : true
    state           : ONLINE
    Finished services command...

Sending a heartbeat
    trepctl -host $MASTER_HOST heartbeat
    trepctl services
    Processing services command...
    NAME              VALUE
    ----              -----
    appliedlastseqno: 102
    appliedlatency  : 3.139
    role            : slave
    servicename     : dragon
    servicetype     : local
    started         : true
    state           : ONLINE
    Finished services command...

Replicator status (1)
    trepctl status
    Processing status command...
    NAME                     VALUE
    ----                     -----
    appliedlasteventid     : mysql-bin.000002:0000000000018903;0
    appliedlastseqno       : 102
    appliedlatency         : 3.139
    clustername            : default
    currenteventid         : NONE
    currenttimemillis      : 1328504342058
    dataserverhost         : qa.r4.continuent.com
    extensions             :
    latestepochnumber      : 0
    masterconnecturi       : thl://qa.r1.continuent.com:2112/
    masterlistenuri        : thl://qa.r4.continuent.com:2112/
    maximumstoredseqno     : 102
    minimumstoredseqno     : 0
    [...]

Replicator status (2)
    [...]
    offlinerequests        : NONE
    pendingerror           : NONE
    pendingerrorcode       : NONE
    pendingerroreventid    : NONE
    pendingerrorseqno      : -1
    pendingexceptionmessage: NONE
    resourceprecedence     : 99
    rmiport                : 10000
    role                   : slave
    seqnotype              : java.lang.Long
    servicename            : dragon
    servicetype            : local
    simpleservicename      : dragon
    sitename               : default
    sourceid               : qa.r4.continuent.com
    state                  : ONLINE
    timeinstateseconds     : 245.215
    uptimeseconds          : 245.539
    Finished status command...
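
The status listing above is plain "name : value" text, so it is easy to check from a script. The following is a minimal sketch, not part of Tungsten: the helper names `status_field` and `replicator_healthy` are hypothetical, and the sample input is a shortened copy of the output shown on the slides. It assumes the first colon on a line separates the name from the value.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from `trepctl status`-style
# text read on stdin. Only the FIRST colon is treated as the separator, because
# values such as thl://host:2112/ contain colons themselves.
status_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")
            if (pos == 0) next                  # header and banner lines
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Hypothetical helper: succeed only if the service is ONLINE with no pending error.
replicator_healthy() {
    status_text=$(cat)
    state=$(printf '%s\n' "$status_text" | status_field state)
    error=$(printf '%s\n' "$status_text" | status_field pendingerror)
    [ "$state" = "ONLINE" ] && [ "$error" = "NONE" ]
}

# Shortened sample taken from the slides.
sample='appliedlastseqno       : 102
masterconnecturi       : thl://qa.r1.continuent.com:2112/
pendingerror           : NONE
state                  : ONLINE'

printf '%s\n' "$sample" | status_field masterconnecturi
printf '%s\n' "$sample" | replicator_healthy && echo "replicator looks healthy"
```

On a live node the same functions would presumably be fed with `trepctl status | status_field state`.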
A failover scenario 1: MySQL native replication

1. One master, two slaves
Loading the employees test database

2. Master goes away
* Stop replication
* Slaves are updated at different levels
    # 2
    select count(*) from titles
    333,145
    # 3
    select count(*) from titles
    443,308

3. Look into Slave #2 binary logs
Find the last transaction

4. Look into Slave #3 binary logs
1. Find the transaction that was last in slave #2
2. Recognize that last transaction in the log of slave #3 (this can actually take you a LOOOONG TIME)
3. Get the position immediately after this transaction (e.g. 134000 in file mysql-bin.000018)

5. Promote Slave #3 to master
* In slave #2:
    CHANGE MASTER TO
      master_host='slave_3_ip',
      master_user='slavename',
      master_password='slavepassword',
      master_log_file='mysql-bin.000018',
      master_log_pos=134000;
A failover scenario 2: Tungsten Replicator

1. One master, two slaves
Loading the employees test database

2. Master goes away
* Stop replication
* Slaves are updated at different levels
    # 2
    select count(*) from titles
    333,145
    # 3
    select count(*) from titles
    443,308

3. No need to find the last transaction
    # simply change roles
    trepctl -host slave3 setrole -role master
    trepctl -host slave2 setrole \
      -role slave -uri thl://slave3
    trepctl -host slave3 online
    State: ONLINE
    trepctl -host slave2 online
    State: GOING-ONLINE:SYNCHRONIZING

4. Check that the slave has synchronized
    # new master
    select seqno from tungsten.trep_commit_seqno;
    78
    # new slave
    select seqno from tungsten.trep_commit_seqno;
    64

5. Tell the replicator to hurry up
    # new master
    trepctl -host slave3 flush
    Master log is synchronized with database at log sequence number: 78
    # new slave
    trepctl -host slave2 wait -applied 78
    ONLINE
    select seqno from tungsten.trep_commit_seqno;
    78

6. ... and we're done
    # new master
    select count(*) from employees.titles
    count(*)
    443308
    # new slave
    count(*)
    443308
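
The failover steps above can be collected into a small script. This is a sketch under stated assumptions, not a Tungsten tool: `pick_most_advanced` and `failover_commands` are hypothetical helper names, the script only prints the trepctl commands shown on the slides instead of running them, and it writes `-host` on every command. Promoting the slave with the highest seqno mirrors what the slides do (slave #3 was ahead of slave #2).

```shell
#!/bin/sh
# Hypothetical helper: read "host seqno" lines on stdin and print the host
# with the highest applied seqno, i.e. the most advanced slave.
pick_most_advanced() {
    awk 'NR == 1 || $2 + 0 > best { best = $2 + 0; host = $1 }
         END { print host }'
}

# Hypothetical helper: print (do not run) the role-change commands.
# $1 = slave to promote, $2 = slave to re-point, $3 = seqno reported by flush.
failover_commands() {
    new_master=$1
    slave=$2
    seqno=$3
    echo "trepctl -host $new_master setrole -role master"
    echo "trepctl -host $slave setrole -role slave -uri thl://$new_master"
    echo "trepctl -host $new_master online"
    echo "trepctl -host $slave online"
    echo "trepctl -host $new_master flush"
    echo "trepctl -host $slave wait -applied $seqno"
}

# Seqno values as read from trep_commit_seqno on each surviving slave.
candidate=$(printf 'slave2 64\nslave3 78\n' | pick_most_advanced)
echo "promoting: $candidate"
failover_commands "$candidate" slave2 78
```

Printing the commands first, in the spirit of the cookbook's DRYRUN mode, lets you review them before touching a live cluster.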
Planned role switch
    cookbook/install_master_slave
    cookbook/switch

Switching roles in master/slave replication (1)
[Diagram: db1, db2, and db3 each run a replicator with service db1; all three nodes are online]

Switching roles in master/slave replication (2)
[Diagram: same three nodes; one node is offline, two are online]

Switching roles in master/slave replication (3)
[Diagram: same three nodes; one node is offline, two are online]
Wait for transactions to be applied

Switching roles in master/slave replication (4)
[Diagram: same three nodes; all offline]
Slaves go offline

Switching roles in master/slave replication (5)
[Diagram: same three nodes; all offline]
Slave is promoted. Notice: 2 masters, but offline

Switching roles in master/slave replication (6)
[Diagram: same three nodes; all offline]
Old master becomes slave

Switching roles in master/slave replication (7)
[Diagram: same three nodes; all offline]
Slaves are directed to the new master

Switching roles in master/slave replication (8)
[Diagram: same three nodes; all online]
All nodes go online, using the new master
Tungsten GTID vs MySQL 5.6 GTID
* What is GTID
* How it works in Tungsten
* How it works (or not) in MySQL 5.6

Without global transaction ID
[Diagram: commits on master A land in its binlog; slaves B and C each track them by their own binlog position]

With global transaction ID
[Diagram: commits on master A; master A and slaves B and C all identify the same transaction as id#200]

Tungsten and global transaction ID: activation
(none) active by default

Tungsten and global transaction ID: status
    trepctl status
    Processing status command...
    NAME                     VALUE
    ----                     -----
    appliedlasteventid     : mysql-bin.000002:0000000000001442;0
    appliedlastseqno       : 6
    appliedlatency         : 0.862
    clustername            : default
    currenteventid         : NONE
    currenttimemillis      : 1354304680923
    dataserverhost         : qa.r4.continuent.com

Tungsten and global transaction ID: seeing transactions
    thl list -seqno 6
    SEQ# = 6 / FRAG# = 0 (last frag)
    - TIME = 2012-11-30 20:44:35.0
    - EPOCH# = 0
    - EVENTID = mysql-bin.000002:0000000000001442;0
    - SOURCEID = qa.r1.continuent.com
    - SQL(0) = insert into test.v1 values (1, 'inserted by node #1') /* SERVICE = [cookbook] */

Tungsten and global transaction ID: changing master connection
    trepctl offline
    trepctl online -seqno 105

Tungsten and global transaction ID: crash-safe slave tables
    mysql -e 'select * from tungsten_cookbook.trep_commit_seqno\G'
    *************************** 1. row ***************************
              task_id: 0
                seqno: 6
               fragno: 0
            last_frag: 1
            source_id: qa.r1.continuent.com
         epoch_number: 0
              eventid: mysql-bin.000002:0000000000001442;0
      applied_latency: 0
     update_timestamp: 2012-11-30 20:44:35
             shard_id: test
    extract_timestamp: 2012-11-30 20:44:35

Tungsten and global transaction ID: crash-safe tables and parallel replication
    mysql -e 'select seqno, source_id, shard_id, update_timestamp from tungsten_cookbook.trep_commit_seqno'
    +-------+----------------------+----------+---------------------+
    | seqno | source_id            | shard_id | update_timestamp    |
    +-------+----------------------+----------+---------------------+
    |     7 | qa.r1.continuent.com | db1      | 2012-11-30 20:54:14 |
    |     8 | qa.r1.continuent.com | db2      | 2012-11-30 20:54:14 |
    |     9 | qa.r1.continuent.com | db3      | 2012-11-30 20:54:14 |
    |    10 | qa.r1.continuent.com | db4      | 2012-11-30 20:54:14 |
    |    11 | qa.r1.continuent.com | db5      | 2012-11-30 20:54:14 |
    |    12 | qa.r1.continuent.com | db6      | 2012-11-30 20:54:14 |
    |    13 | qa.r1.continuent.com | db7      | 2012-11-30 20:54:14 |
    |    14 | qa.r1.continuent.com | db8      | 2012-11-30 20:54:14 |
    |    15 | qa.r1.continuent.com | db9      | 2012-11-30 20:54:14 |
    |    16 | qa.r1.continuent.com | db10     | 2012-11-30 20:54:14 |
    +-------+----------------------+----------+---------------------+
MySQL 5.6 and global transaction ID: activation
    mysqld --log-slave-updates \
      --gtid-mode=on \
      --enforce-gtid-consistency
WARNING: before MySQL 5.6.10, it was --disable-gtid-unsafe-statements

MySQL 5.6 and global transaction ID: seeing transactions
    #121203 11:15:49 server id 1 end_log_pos 344 CRC32 0x45b25c8f GTID [commit=yes]
    SET @@SESSION.GTID_NEXT= '7A77A490-3D3A-11E2-8CC9-7DCF9991097B:2'/*!*/;
    # at 344
    #121203 11:15:49 server id 1 end_log_pos 423 CRC32 0x873c8fac Query thread_id=3 exec_time=0 error_code=0
    SET TIMESTAMP=1354533349/*!*/;
    BEGIN
    /*!*/;
    # at 423
    #121203 11:15:49 server id 1 end_log_pos 522 CRC32 0xb4bf4372 Query thread_id=3 exec_time=0 error_code=0
    SET TIMESTAMP=1354533349/*!*/;
    insert into t1 values (1)

MySQL 5.6 and global transaction ID: status
    show slave status\G
    *************************** 1. row ***************************
           Slave_IO_State: Waiting for master to send event
              Master_Host: 127.0.0.1
              Master_User: rsandbox
              Master_Port: 13233
            Connect_Retry: 60
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 1837
           Relay_Log_File: mysql_sandbox13234-relay-bin.000005
            Relay_Log_Pos: 2047
    Relay_Master_Log_File: mysql-bin.000002
    ...
       Retrieved_Gtid_Set: 46E13434-3B28-11E2-BF47-6C626DA07446:1-7
        Executed_Gtid_Set: 46E13434-3B28-11E2-BF47-6C626DA07446:1-7

MySQL 5.6 and global transaction ID: changing master connection
    CHANGE MASTER TO
      master_log_file='mysql-bin.000003',
      master_log_pos=1234
    # No global transaction ID is used

MySQL 5.6 and global transaction ID: crash-safe slave table
    select * from slave_relay_log_info\G
    ********************* 1. row ********************
      Number_of_lines: 7
       Relay_log_name: ./mysql_sandbox13234-relay-bin.000005
        Relay_log_pos: 2047
      Master_log_name: mysql-bin.000002
       Master_log_pos: 1837
            Sql_delay: 0
    Number_of_workers: 5
                   Id: 1
    # NO global transaction ID is used!

MySQL 5.6 and global transaction ID: crash-safe slave table + parallel
    select * from mysql.slave_worker_info\G
                            Id: 12
                Relay_log_name: ./mysql_sandbox13234-relay-bin.000007
                 Relay_log_pos: 4299
               Master_log_name: mysql-bin.000002
                Master_log_pos: 7155
     Checkpoint_relay_log_name: ./mysql_sandbox13234-relay-bin.000007
      Checkpoint_relay_log_pos: 1786
    Checkpoint_master_log_name: mysql-bin.000002
     Checkpoint_master_log_pos: 4642
              Checkpoint_seqno: 9
         Checkpoint_group_size: 64
       Checkpoint_group_bitmap: ?
    # NO global transaction ID is used!
Filters

Tungsten Replication Service Pipeline
[Diagram: a pipeline of three stages, each with Extract, Filter, and Apply steps; data flows from the master DBMS to the Transaction History Log, then to an in-memory queue, then to the slave DBMS]

Restrict replication to some schemas and tables
    ./tools/tungsten-installer \
      --master-slave -a \
      ... \
      --svc-extractor-filters=replicate \
      "--property=replicator.filter.replicate.do=test,*.foo" \
      ... \
      --start-and-report
    # test="test.*" -> same drawback as binlog-do-db in MySQL
    # *.foo = table 'foo' in any database
    # employees.dept_codes,employees.salaries => safest way

Exclude some schemas and tables from replication
    ./tools/tungsten-installer \
      --master-slave -a \
      ... \
      --svc-extractor-filters=replicate \
      "--property=replicator.filter.replicate.ignore=test,*.foo" \
      ... \
      --start-and-report
    # test="test.*" -> same drawback as binlog-ignore-db in MySQL
    # *.foo = table 'foo' in any database
    # employees.dept_codes,employees.salaries => safest way
    # DO NOT MIX .do and .ignore!
    # (you can do it, but it may not do what you mean)

Change name of replicated schema
    -a --svc-applier-filters=dbtransform \
    --property=replicator.filter.dbtransform.from_regex1=stores \
    --property=replicator.filter.dbtransform.to_regex1=playground
    # from_regex1=stores -> name of the schema in the master
    # to_regex1=playground -> name of the schema in the slave
    # WARNING: requires "USE schema_name" to work properly.
Multi-master: Conflict prevention

CONFLICTS
Continuent 2012

What's a conflict
Data modified by several sources (masters). Creates one or more of:
* data loss (unwanted delete)
* data inconsistency (unwanted update)
* duplicated data (unwanted insert)
* replication break

Data duplication
[Diagram: masters alpha, bravo, and charlie share this table]
    id  name   amount
    1   Joe    100
    2   Frank  110
    3   Sue    100
* alpha inserts: 4 Matt 140
* charlie inserts: 4 Matt 130
BREAKS REPLICATION

auto_increment offsets are not a remedy
* A popular recipe: auto_increment_increment + auto_increment_offset
* They don't prevent conflicts
* They hide duplicates

Hidden data duplication
[Diagram: masters alpha (offset 1), bravo (offset 2), and charlie (offset 3) share this table]
    id  name   amount
    1   Joe    100
    2   Frank  110
    3   Sue    100
* alpha INSERT: 11 Matt 140
* charlie INSERT: 13 Matt 130
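
Why the duplicate stays hidden can be worked out with a little arithmetic. The sketch below assumes MySQL's documented rule that a server generates values of the form auto_increment_offset + N * auto_increment_increment and picks the smallest one above the current maximum. The function name `next_auto_increment` is hypothetical, and the increment of 10 is an assumption inferred from the ids 11 and 13 on the slide.

```shell
#!/bin/sh
# Hypothetical helper: next auto-increment value for one server.
# $1 = current max id, $2 = auto_increment_increment, $3 = auto_increment_offset
next_auto_increment() {
    max=$1; inc=$2; off=$3
    if [ "$max" -lt "$off" ]; then
        # The series has not started yet: the first value is the offset itself.
        echo "$off"
    else
        # Smallest value of the form off + N*inc that is greater than max.
        echo $(( off + ( (max - off) / inc + 1 ) * inc ))
    fi
}

# All masters start with ids 1..3. alpha (offset 1) and charlie (offset 3)
# both insert 'Matt' before seeing each other's row.
echo "alpha   inserts id $(next_auto_increment 3 10 1)"
echo "charlie inserts id $(next_auto_increment 3 10 3)"
```

The two ids differ, so no duplicate-key error stops replication, and both 'Matt' rows end up on every server: the conflict is hidden rather than prevented.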
Data inconsistency
[Diagram: masters alpha, bravo, and charlie share this table]
    id  name   amount
    1   Joe    100
    2   Frank  110
    3   Sue    100
* alpha UPDATE: 3 Sue 108
* charlie UPDATE: 3 Sue 105

Data loss
[Diagram: masters alpha, bravo, and charlie share this table]
    id  name   amount
    1   Joe    100
    2   Frank  110
    3   Sue    100
* alpha UPDATE: 3 Sue 108
* charlie DELETE: record #3
MAY BREAK REPLICATION
Conflict handling strategies
* Resolving after the fact (planned for future use): needs information that is missing in async replication
* Avoiding: requires synchronous replication with 2PC
* Preventing (used by Tungsten): setting and enforcing a split sources policy
* Transforming and resolving (planned for future use): all records are converted to INSERTs; conflicts are resolved within a given time window
Multi-master: Conflict prevention

Tungsten conflict prevention in a nutshell
1. Define the rules (which master can update which database)
2. Tell Tungsten the rules
3. Define the policy (error, drop, warn, or accept)
4. Let Tungsten enforce your rules

Tungsten conflict prevention facts
* Sharded by database
* Defined dynamically
* Applied on the slave services
* Methods:
    error: make replication fail
    drop: drop silently
    warn: drop with warning

Tungsten conflict prevention applicability
* Unknown shards: the schema being updated is not planned
    actions: accept, drop, warn, error
* Unwanted shards: the schema is updated from the wrong master
    actions: accept, drop, warn, error
* Whitelisted shards: can be updated by any master

Conflict prevention directives
    --svc-extractor-filters=shardfilter
    replicator.filter.shardfilter.unknownshardpolicy=error
    replicator.filter.shardfilter.unwantedshardpolicy=error
    replicator.filter.shardfilter.enforcehomes=false
    replicator.filter.shardfilter.allowwhitelisted=false
Conflict prevention in a star topology: alpha updates employees
[Diagram: Host1 (master: alpha, database: employees); Host2 (master: bravo, database: buildings); Host3, the hub (master: charlie, database: vehicles)]

Conflict prevention in a star topology: alpha updates vehicles
[Diagram: Host1 (master: alpha, database: employees); Host2 (master: bravo, database: buildings); Host3, the hub (master: charlie, database: vehicles)]

Conflict prevention in an all-masters topology: alpha updates employees
[Diagram: Host1 (master: alpha, database: employees); Host2 (master: bravo, database: buildings); Host3 (master: charlie, database: vehicles)]

Conflict prevention in an all-masters topology: charlie updates vehicles
[Diagram: Host1 (master: alpha, database: employees); Host2 (master: bravo, database: buildings); Host3 (master: charlie, database: vehicles)]

Conflict prevention in an all-masters topology: bravo updates employees
[Diagram: Host1 (master: alpha, database: employees); Host2 (master: bravo, database: buildings); Host3 (master: charlie, database: vehicles)]

Conflict prevention in an all-masters topology: charlie updates employees
[Diagram: Host1 (master: alpha, database: employees); Host2 (master: bravo, database: buildings); Host3 (master: charlie, database: vehicles)]

Setting conflict prevention rules
    trepctl -host host1 -service charlie \
      shard -insert < shards.map
    cat shards.map
    shard_id   master       critical
    personnel  alpha        false
    buildings  bravo        false
    vehicles   charlie      false
    test       whitelisted  false
    # charlie is slave service in host 1

Setting conflict prevention rules
    trepctl -host host2 -service charlie \
      shard -insert < shards.map
    cat shards.map
    shard_id   master       critical
    personnel  alpha        false
    buildings  bravo        false
    vehicles   charlie      false
    test       whitelisted  false
    # charlie is slave service in host 2

Setting conflict prevention rules
    trepctl -host host3 -service alpha \
      shard -insert < shards.map
    trepctl -host host3 -service bravo \
      shard -insert < shards.map
    cat shards.map
    shard_id   master       critical
    personnel  alpha        false
    buildings  bravo        false
    vehicles   charlie      false
    test       whitelisted  false
    # alpha and bravo are slave services in host 3
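
To make the rules concrete, the sketch below classifies an event against a shards.map file in the way the slides describe: allowed when the shard's master matches the originating service, whitelisted when any master may update it, unwanted when it comes from the wrong master, and unknown when the shard is not listed. This is an illustration only, not Tungsten's shardfilter implementation. `classify_event` is a hypothetical helper, and the `payroll` shard is made up to show the unknown case. The configured policy (accept, drop, warn, or error) would then decide what happens to unknown and unwanted events.

```shell
#!/bin/sh
# Hypothetical helper: classify one event against a shards.map-style file.
# $1 = map file, $2 = shard (schema) of the event, $3 = originating service.
classify_event() {
    awk -v shard="$2" -v service="$3" '
        $1 == "shard_id" { next }               # skip the header line
        $1 == shard {
            found = 1
            if ($2 == "whitelisted")  print "whitelisted"
            else if ($2 == service)   print "allowed"
            else                      print "unwanted"
            exit
        }
        END { if (!found) print "unknown" }' "$1"
}

# The map from the slides, written to a temporary file.
map=$(mktemp)
trap 'rm -f "$map"' EXIT
cat > "$map" <<'EOF'
shard_id   master       critical
personnel  alpha        false
buildings  bravo        false
vehicles   charlie      false
test       whitelisted  false
EOF

classify_event "$map" buildings bravo    # allowed: bravo owns buildings
classify_event "$map" vehicles  alpha    # unwanted: vehicles belongs to charlie
classify_event "$map" test      alpha    # whitelisted: any master may update it
classify_event "$map" payroll   bravo    # unknown: payroll is not in the map
```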
Conflict prevention demo reminder
* Server #1 can update "employees"
* Server #2 can update "buildings"
* Server #3 can update "vehicles"

Sample correct operation (1)
    mysql #1> create table employees.names( ... )
    # all servers receive the table
    # all servers keep working well

Sample correct operation (2)
    mysql #2> create table buildings.homes( ... )
    # all servers receive the table
    # all servers keep working well

Sample incorrect operation (1)
    mysql #2> create table employees.nicknames( ... )
    # Only server #2 receives the table
    # slave service in hub gets an error
    # slave service in #1 does not receive anything

Sample incorrect operation (2)
    #3 $ trepctl services simple_services
    alpha   [slave]  seqno: 7  - latency: 0.136  - ONLINE
    bravo   [slave]  seqno: -1 - latency: -1.000 - OFFLINE:ERROR
    charlie [master] seqno: 66 - latency: 0.440  - ONLINE

Sample incorrect operation (3)
    #3 $ trepctl -service bravo status
    NAME                     VALUE
    ----                     -----
    appliedlasteventid     : NONE
    appliedlastseqno       : -1
    appliedlatency         : -1.0
    (...)
    offlinerequests        : NONE
    pendingerror           : Stage task failed: q-to-dbms
    pendingerrorcode       : NONE
    pendingerroreventid    : mysql-bin.000002:0000000000001241;0
    pendingerrorseqno      : 7
    pendingexceptionmessage: Rejected event from wrong shard: seqno=7 shard ID=employees shard master=alpha service=bravo
    (...)

Fixing the issue
    mysql #1> drop table if exists employees.nicknames;
    mysql #1> create table if not exists employees.nicknames ( ... );
    #3 $ trepctl -service bravo online -skip-seqno 7
    # all servers receive the new table

Sample whitelisted operation
    mysql #2> create table test.hope4best( ... )
    mysql #1> insert into test.hope4best values ( ... )
    # REMEMBER: 'test' was explicitly whitelisted
    # All servers get the new table and records
    # But there is no protection against conflicts
administration 160 160
Viewing THL Events thl info log directory = /home/tungsten/installs/master_slave/thl/dragon/ min seq# = 0 max seq# = 101 events = 101 161 161
viewing THL events thl index LogIndexEntry thl.data.0000000001(0:102) 162 162
viewing THL events thl index [...] LogIndexEntry thl.data.0000000001(0:18) LogIndexEntry thl.data.0000000002(19:33) LogIndexEntry thl.data.0000000003(34:35) LogIndexEntry thl.data.0000000004(36:3641) LogIndexEntry thl.data.0000000005(3642:3712) LogIndexEntry thl.data.0000000006(3713:3838) LogIndexEntry thl.data.0000000007(3839:3949) LogIndexEntry thl.data.0000000008(3950:4011) LogIndexEntry thl.data.0000000009(4012:4039) LogIndexEntry thl.data.0000000010(4040:4057) LogIndexEntry thl.data.0000000011(4058:4067) LogIndexEntry thl.data.0000000012(4068:4073) LogIndexEntry thl.data.0000000013(4074:4085) LogIndexEntry thl.data.0000000014(4086:4095) LogIndexEntry thl.data.0000000015(4096:4101) LogIndexEntry thl.data.0000000016(4102:4111) 163 163
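The index makes it possible to tell which log file holds a given event. The sketch below assumes that each `LogIndexEntry` line has the form `file(first:last)` with an inclusive seqno range, which the contiguous ranges above suggest. `thl_file_for` is a hypothetical helper name, not part of the `thl` tool.

```shell
# Hypothetical helper: read `thl index` output on standard input and print
# the log file whose seqno range contains $1. Prints nothing if none does.
thl_file_for() {
    awk -v n="$1" '
        $1 == "LogIndexEntry" {
            split($2, p, /[(:)]/)     # p[1] = file, p[2] = first, p[3] = last
            if (n + 0 >= p[2] + 0 && n + 0 <= p[3] + 0) { print p[1]; exit }
        }
    '
}

# A few entries copied from the index listing above.
thl_index='LogIndexEntry thl.data.0000000003(34:35)
LogIndexEntry thl.data.0000000004(36:3641)
LogIndexEntry thl.data.0000000005(3642:3712)
LogIndexEntry thl.data.0000000006(3713:3838)'

printf '%s\n' "$thl_index" | thl_file_for 3700   # prints: thl.data.0000000005
```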
viewing THL events thl list -seqno 102 [...] SEQ# = 102 / FRAG# = 0 (last frag) - TIME = 2012-02-06 05:56:09.0 - EPOCH# = 0 - EVENTID = mysql-bin.000002:0000000000018903;0 - SOURCEID = qa.r1.continuent.com - METADATA = [mysql_server_id=10;is_metadata=true;service=dragon;shard=tungsten_dragon;heartbeat=none] - TYPE = com.continuent.tungsten.replicator.event.repldbmsevent - OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 1, foreign_key_checks = 1, unique_checks = 1, sql_mode = 'IGNORE_SPACE', character_set_client = 8, collation_connection = 8, collation_server = 8] - SCHEMA = tungsten_dragon - SQL(0) = UPDATE tungsten_dragon.heartbeat SET source_tstamp= "2012-02-06 05:56:09", salt= 2, name= "NONE" WHERE id= 1 /* SERVICE = [dragon] */ 164 164
Skipping a THL Event trepctl online -skip-seqno 1092 trepctl online -skip-seqno 1092,1093,1094 # see example 165 165
Adding a Member Let's see the cookbook, and use it 166 166
parallel replication 167 167
Replicator Pipeline Architecture (diagram; labels: Tungsten Replicator process, pipeline of three stages each with Extract and Apply, Assign Shard ID, Transaction History Log (THL), parallel queue with channels, MySQL binlog, shard.list file, slave DBMS) 168
Parallel replication facts Sharded by database Good choice for slave lag problems Bad choice for single database projects 169 169
Parallel Replication test (diagram) Concurrent sysbench on 30 databases running for 1 hour TOTAL DATA: 130 GB RAM per server: 20 GB MySQL slave: STOPPED Tungsten slave (direct: alpha): OFFLINE Slaves will have 1 hour lag 170
measuring results (diagram) MySQL slave: START Tungsten slave (direct: alpha): ONLINE Recording catch-up time 171
MySQL native replication slave catch up in 04:29:30 172
Tungsten parallel replication slave catch up in 00:55:40 173
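The two catch-up times are easier to compare as a ratio. The sketch below converts both HH:MM:SS figures to seconds and divides them. `to_seconds` and `speedup` are hypothetical helper names introduced for this illustration.

```shell
# Convert HH:MM:SS to seconds. awk is used rather than shell arithmetic so
# that fields with leading zeros such as "08" are not misread as octal.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print how many times longer $1 is than $2, to two decimals.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

to_seconds 04:29:30          # 16170 (MySQL native replication)
to_seconds 00:55:40          # 3340  (Tungsten parallel replication)
speedup 04:29:30 00:55:40    # 4.84
```

In this test the Tungsten slave caught up a little under five times faster than the native one.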
Parallel replication made simpler FROM HERE... 174
Parallel replication made simpler TO HERE 175
Parallel replication made simpler 176
parallel replication direct slave facts No need to install Tungsten on the master Tungsten runs only on the slave Replication can revert to native slave with two commands (trepctl offline; start slave) Native replication can continue on other slaves Failover (either native or Tungsten) becomes a manual task 177 177
installing parallel replication MORE_OPTIONS='--channels=10' ./cookbook/install_master_slave 178 178
Checking parallel replication trepctl status trepctl status -name tasks trepctl status -name shards trepctl status -name stores 179 179
Parallel replication demo 180 180
Troubleshooting 181 181
Identify the Failed Component Steps 1. trepctl services 2. trepctl -service SVC_NAME status 3. look at the logs 4. Take action 182 182
reading the logs ls $TUNGSTEN_BASE/tungsten/tungsten-replicator/logs/ trepsvc.log user.log ... or ./cookbook/show_log # let's see it in practice 183 183
Parting thoughts 184 184
Open source Tungsten Replicator now includes Oracle-to-MySQL and Oracle-to-Oracle extractors and appliers! 185 185
560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009 e-mail: sales@continuent.com Our Blogs: http://scale-out-blog.blogspot.com http://datacharmer.blogspot.com http://flyingclusters.blogspot.com Continuent Website: http://www.continuent.com Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator Continuent 2012 186 186