2000 databases later Kristian Köhntopp http://google.com/+kristiankohntopp
TL;DR
- About 2000 MySQL instances in production
- Mostly running MySQL 5.5, some 5.6 beta since May 2012
- We made a gamble that we would be all-out on 5.6 by now, and lost that gamble
- The reason: the interesting new features, mostly in the optimizer
Database Environment in numbers
14 TB of unique data
73 TB archived unique data
445 TB space used by MySQL databases
2000 instances of MySQL running
7 database administrators
Environment
- CentOS
- Apache
- Perl with Plack
- MySQL 5.5, moving to 5.6
- A lot of local custom tools
Servers and Versions
Replication chains
Cold standby
- Master/Standby: shared storage (on filer, previously DRBD)
- Migratory VIP
- Failover is manual (scripted); the automated solution proved less reliable than the manual one
Master hardware
- HP DL380 G7
- 12 physical cores (24 HT)
- 96/192 GB memory
- 6 drives in RAID-10 (or filer)
Unreliable slaves
- HP BL460c G7
- 12 physical cores (24 HT)
- 96/192 GB memory
- 2 drives in RAID-1 (or 1 SSD)
Situation May 2012: Need 5.6
Use case that breaks it: availability data
- Normalized data transformed into a flattened, read-optimized view
- Massively parallel writes
MySQL 5.5 does not cut it
- InnoDB internal locks stalling with 24 cores, especially on writes
- Redo log limited to 4 GB max, again a problem for writes
Fix: MySQL 5.6
- Fewer global locks
- 32 GB redo log (with 200 concurrent writers, typically 24 GB of redo log busy)
- SSD-optimized flushing strategy
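The points above map to a handful of InnoDB options new or improved in 5.6. A hedged my.cnf sketch; the exact values are inferred from the slide, not Booking.com's actual configuration:

```ini
# my.cnf fragment (illustrative values)
[mysqld]
# 5.6 lifts the 4 GB redo log ceiling: 2 files x 16 GB = 32 GB total
innodb_log_files_in_group = 2
innodb_log_file_size      = 16G
# SSD-friendly flushing: do not flush neighboring pages
innodb_flush_neighbors    = 0
innodb_io_capacity        = 2000
```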
Problems
- Using MySQL 5.6.5 (beta)
- Initial problems with I_S (INFORMATION_SCHEMA): a crash bug
- Core-dumping 192 GB of memory to SSD takes 15 minutes (800 GB SSD total size)
- Uploading the core dump to Oracle for analysis :-)
Fixes
- Workaround for the crash bug found: I_S tables with 2 concurrent reads + read-only transactions = boom
- Stop monitoring = good
- Fixed in later 5.6 releases
- The second showstopper was also fixed in a newer version
September 2012: Situation good
MySQL 5.6, autumn 2012
- Excellent dev work, Oracle! Most stable beta experience ever!
- Without 5.6 we would not have made it through the summer peak
- MySQL 5.6 beta: at the core of the business
Typical MySQL release
- New features have new bugs
- Previous features greatly improved
Late 2012: A bold plan
Ambitious migration goal
- Be able to move all of production to 5.6 in less than 30 days (20 work days)
- Minimize network traffic
- Fully orchestrated automation
- No human involved after the 'go' decision
Planning the migration
- The 5.1 to 5.5 migration was a pain: mysql_upgrade did not work for us
- Instead: create an initial 5.5 slave under a 5.1 master, test with it, clone from it
- That took ages to complete
- With the current data volume and networking situation, not possible to complete in time
Beyond Puppet: Orchestration
Goal: to control all subsystems in a service, to allow mass operations without affecting availability and QoS.
Parts needed
Orchestration
- Central controller
- Service group definitions and dependencies
- Flexible repointing: a load balancer OR dynamic repointing
- Faster data distribution (faster than rsync, that is)
Parts we have: serverdb
- Asset and configuration management
- Django application, MySQL backend
- Defines groups
- Drives Puppet, Nagios, Graphite
- Drives my.cnf generation through Puppet
Parts we are not happy with: clone script (/bin/bash)
- Maintains inheritance data for debugging
- Shovels data from the clone source to the destination
- Driven by a serverdb state machine
- Uses rsync (and ssh)
- Need not be fast, because it runs unattended
Moving to Python
- Painful decision: keep legacy bash scripts, but no more bash development
- 'dba' command as a shell frontend to .py scripts
- Hierarchy of Python classes
Moving to Python
- Better reuse
- Better error reporting and robustness
- Better security
- Fewer processes, fewer connections
- Threading, concurrency control
Fast Data Distribution
Shoveling data faster
- Use 'nc': a Python-based unencrypted socket
  - Used in cloning and backups
  - Works at network speed, but does not scale
- Use BitTornado: a modified torrent tracker
  - Takes network topology into account
  - Used in mass cloning
  - Long setup time, but scales just fine
Fast data distribution
- 'rsync' is still the default method: a 'secure' default
- 'nc' and 'torrent' provide scaling options: less CPU, less network bandwidth
Repointing
Old cell structure
- Puppet generates "db.conf"
- "db.conf" points to fixed databases per "handle"
Old cell structure
- Slow to change: a Puppet run is required
- Hard to control: Puppet runs eventually
- Error backpropagation: db failure → FE error → LB drops the entire cell
- Capacity loss: an error in a single db → loss of the complete cell
Old cell structure
- Goal: a DBA-controlled, local change
- Temp hack, not using db.conf
- "Solution": iptables!
Repointing

    def reflect(self, destination, port):
        if not self._forwarding_enabled:
            self._enableforwarding()
        ret, out, err = self._run(
            'sudo', get_os_command('iptables'),
            '-t', 'nat', '-A', 'PREROUTING',  # chain name was missing on the slide
            '-p', 'tcp', '--dport', str(port),
            '-j', 'DNAT', '--to-destination', destination)
        logging.info("traffic redirect status: %s, %s, %s" % (ret, out, err))
        return ret
Repointing
Works. That's about the only redeeming feature it has.
Repointing (done right)
DB-level load balancing
- Who is doing it? A query through all channels came back empty.
- What are the typical implementation pitfalls?
Solutions considered: haproxy (rejected)
- Single-threaded
- Routes all traffic through the haproxy process
- SPOF, bottleneck
Solutions considered: LVS Direct Routing mode (rejected)
- Bulk traffic does not go through the LB
- Ugly ARP hack
- Not usable due to network topology constraints (broadcast domain)
Solutions considered: haproxy redirector + client lib
- Modify Bookings::Db to ask haproxy which server to contact ('redirect')
- No traffic through the LB
- Make the connection stick to the last target if haproxy is down
- Ran as an interim solution, successfully.
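Bookings::Db is Perl, but the redirect-with-stickiness idea can be sketched in a few lines of Python. All names here are hypothetical illustrations, not Booking.com's actual API:

```python
import socket

class RedirectingClient:
    """Ask the LB which backend to contact; stick to the last known
    target when the LB itself is unreachable (hypothetical sketch)."""

    def __init__(self, lb_addr, ask_lb=None):
        self.lb_addr = lb_addr
        self.last_target = None
        # injectable for testing; real code would talk to haproxy
        self._ask_lb = ask_lb or self._ask_haproxy

    def _ask_haproxy(self, handle):
        # would perform a tiny exchange with haproxy returning "host:port"
        raise NotImplementedError

    def pick_target(self, handle):
        try:
            self.last_target = self._ask_lb(handle)
        except OSError:
            if self.last_target is None:
                raise  # no cached target to fall back to
            # LB down: stick to the last target it gave us
        return self.last_target
```

The stickiness is what made this safe as an interim solution: a dead load balancer degrades into "keep using the server you already had" instead of an outage.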
Solutions considered: haproxy as a DNS server
- Modify haproxy to answer DNS queries
- The FE resolves a virtual db name through the haproxy-controlled DNS server
- No Bookings::Db change required
- Also works with straight connections
- Currently in production.
Solutions considered: new 'cell-free' design
Benefits
- Easier maintenance: automated rolling restarts possible (wrapper script as a POC)
- Enhanced capacity: a single queue, multiple consumers
Towards Orchestration
DBA command
- Inserts commands into a job queue: Django + MySQL backend (job table)
- A task runner fetches jobs and executes them:
  repoint, motd, check idle, take MySQL down, perform job, start mysql, warmup, undo motd, unpoint
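The queue-plus-runner loop is simple to sketch. This uses sqlite3 as a stand-in for the Django/MySQL job table so the example is self-contained; the schema and column names are assumptions, not the real serverdb schema:

```python
import sqlite3

# Hypothetical stand-in for the Django-managed job table
SCHEMA = """CREATE TABLE IF NOT EXISTS job (
    id INTEGER PRIMARY KEY,
    host TEXT, command TEXT, state TEXT DEFAULT 'pending')"""

def enqueue(db, host, command):
    """What 'dba <command>' would do: insert a job row."""
    db.execute("INSERT INTO job (host, command) VALUES (?, ?)", (host, command))

def run_pending(db, execute):
    """Task runner: fetch pending jobs in order and run each via
    `execute(host, command)`, marking it 'done' or 'failed'."""
    rows = db.execute(
        "SELECT id, host, command FROM job WHERE state = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, host, command in rows:
        try:
            execute(host, command)
            state = 'done'
        except Exception:
            state = 'failed'
        db.execute("UPDATE job SET state = ? WHERE id = ?", (state, job_id))
```

Keeping the queue in a database means the controller and the runners need no direct connection to each other, which is what allows "no human involved after the 'go' decision".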
DBA command
# dba downtime 2h 'Kris is testing'
# dba clone_slave_from_dedicated_source bc43bprdb-01
OR
# dba torrent_get_share --path='/mnt/clone_snapshot'
# dba retrieve_root_mycnf
# dba warmup
Orchestration
- Central queue
- Central error reporting
- LB stats from haproxy
- Central realtime service quality status information
Orchestration
- Central service quality information
- Group definitions from serverdb
- Centralized orchestration controller
Monitoring
- Graphite + Merlin, reimplemented as an Etsy-style dashboard
- Icinga/Check_MK
- Merlin (deprecated)
- DBAng dashboard
- Controlrooms (business-level monitoring)
- Show_errors page
Management POV
Strategy
We are an agile organization. We value velocity (development speed) over all other factors. The task of operations is to enable development.
Operations vs. incident
We made database upgrades part of regular operational scripting, at scale. An upgrade is no longer an abnormal thing to be avoided.
Giving management choice
Upgrade decisions are based on features and benefits, not on cost and manpower. "We found … to be useful and stable, and would like to have it available across all of production next month."
March 2013: Still not using MySQL 5.6
CPU Comparison
Testing in production
- Take 3x the normal number of FE blades
- Increase the LB weight in small steps
- Wait until the load stabilizes
- Check "siege" latency, check the error log
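The ramp-up loop above can be sketched as follows. The hooks `set_weight` and `get_latency`, the step sizes, and the regression threshold are all illustrative assumptions, not Booking.com's actual procedure:

```python
import time

def ramp_up(set_weight, get_latency, baseline_ms,
            steps=(10, 25, 50, 75, 100), settle_s=300, max_regress=1.5):
    """Increase the LB weight step by step, falling back on regression.
    `set_weight`/`get_latency` are hypothetical hooks into the LB and
    the metrics system. Returns the last weight that kept latency
    within `max_regress` times the baseline."""
    good = 0
    for weight in steps:
        set_weight(weight)
        time.sleep(settle_s)          # wait until the load stabilizes
        if get_latency() > baseline_ms * max_regress:
            set_weight(good)          # regression: back off to last good step
            return good
        good = weight
    return good
```

The design choice mirrors the slide: small steps with a settle period between them turn a risky production test into a sequence of individually revertible changes.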
Load Test Results
http://booking.com/jobs We Are Growing! We Are Hiring!