Redundant Storage Cluster
For When It's Just Too Big
Bob Burgess, Radian6 Technologies
MySQL User Conference 2009
Scope
  - Fundamentals of MySQL Proxy
  - Fundamentals of LuaSQL
  - Description of the Redundant Storage Cluster
  - Architecture
  - Sample code

Scope: NOT
  - Complete course on MySQL Proxy
  - Lua programming course
  - Amazing Lua tricks
  - Complete code listing of cluster (posted online)
The Problem
Growth!
  [Chart: blog posts, millions per day; vertical axis from 0 to 12]
Current Solution
  [Diagram: a MyISAM merge table over one table per month]
  - Data over NFS can be unreliable
  - Separate copy for each DB that could use it (Master / Replicas)
Goals
  - Single place for all content
  - Redundancy without complete duplication
  - Add storage by adding nodes
  - Survive a server failure

Existing Products
  - HiveDB
      - Java/Hibernate-based
      - No redundancy
  - Spock Proxy
      - No redundancy
  - MySQL Cluster
      - All indexes in RAM
Cluster
  [Diagram: the master DB and the other DBs each hold a federated "content" table that points at a load balancer. Behind the load balancer are nodes 1, 2, ... n; each node runs MySQL Proxy with a Lua script and has its own content and dir tables.]
Component Walk-Through
  - Federated Engine
  - Load Balancer
  - MySQL Proxy
  - Lua
  - LuaSQL
Federated Engine
  Just a pointer to another table
    create table sample (
      id int primary key,
      value varchar(100)
    ) engine=federated
      connection='mysql://user:password@host:9999/schema/sample'
Federated Engine
    create table sample (
      id int primary key,
      val int,
      name varchar(50)
    )
  Each client query below is followed by the statement the federated engine sends to the remote server:
    select id,name from sample;
        SELECT `id`, `val`, `name` FROM `sample`;
    select id,name from sample where id=2;
        SELECT `id`, `val`, `name` FROM `sample` WHERE `id` = '2';
    select val,name from sample where val=2;
        SELECT `id`, `val`, `name` FROM `sample`;
    select * from sample limit 10;
        SELECT `id`, `val`, `name` FROM `sample`;
Federated Engine
  Each client query below is followed by the statement the federated engine sends to the remote server:
    select max(id) from sample;
        SELECT `id`, `val`, `name` FROM `sample`;
    select count(*) from sample;
        SELECT `id`, `val`, `name` FROM `sample`;
    insert into sample (id,name) values (5,'bob');
        INSERT INTO `sample` (`id`, `val`, `name`) VALUES ('5', NULL, 'bob');
    insert into sample values (5,10,'bob');
        INSERT INTO `sample` (`id`, `val`, `name`) VALUES ('5', '10', 'bob');
Load Balancer Options
  - MySQL Proxy Load Balancer
      - For read balancing only
      - For traditional master/slave architecture
  - Custom load balancer
      - MySQL Proxy & Lua script
      - Still an option, depending on future architecture

Load Balancer Options
  - Linux Networking: NAT
      - LB stays involved in session
      - Limited scalability
      - Slower option
  - Linux Networking: Direct Routing
      - LB hands off session
      - Node answers client directly
      - Much more scalable
MySQL Proxy
  - Communicates with a MySQL client using the MySQL network protocol
  - Provides an API to a script environment
      - Pass query from client
      - Return result set to the client
  - Uses the Lua scripting language
MySQL Proxy: Supplied Constants and Methods
  global
    proxy.global.connect_retry=3
    proxy.global.conremote={}
    proxy.global.mysqlenv=luasql.mysql()
    proxy.global.conremote[tonumber(details.nodeid)] = proxy.global.mysqlenv:connect(...)
  queries: list of queries going to the server
    proxy.queries:reset()
MySQL Proxy: Supplied Constants and Methods
  response: response from server to client
    proxy.response.type = proxy.MYSQLD_PACKET_OK
    proxy.response.resultset = {...}
  Constants
    MYSQLD_PACKET_OK
    MYSQLD_PACKET_ERR
    MYSQL_TYPE_LONG
    PROXY_SEND_RESULT
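MySQL Proxy
  [Diagram: three setups]
  MySQL client -> (3306) MySQL server
  MySQL client -> (4040) MySQL Proxy -> (3306) MySQL server
  MySQL client -> (4040) MySQL Proxy with Lua and LuaSQL -> (3306) MySQL server

MySQL Proxy
  [Diagram: MySQL client <-> MySQL Proxy (Lua) <-> MySQL server; the Lua script hooks in at read_query( ) and read_query_result( )]

MySQL Proxy
  [Diagram: MySQL client <-> MySQL Proxy (Lua) <-> MySQL server, with the Lua script also making LuaSQL calls]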
LuaSQL
  Connect directly to databases from Lua scripts
  environment
    :connect -> connection
      :execute (select) -> cursor
        :fetch -> return set
      :execute (ins/upd/del) -> return code & error msg
LuaSQL Connection Timeouts
  - Connection drops after sleeping 10s
  - Call the execute method
  - If the cursor object is nil:
      - call the connect method for the environment
      - call execute again
LuaSQL Connection Timeouts
    function select_db (node, sql)
      local reconnects = 0
      local cur
      local err
      repeat
        -- execute returns a cursor on success, or nil plus an error message
        cur, err = proxy.global.conremote[tonumber(node)]:execute(sql)
        if cur == nil then
          print('Reconnecting select_db. Error='..tostring(err))
          reopen_remote_db(node)
        end
        reconnects = reconnects + 1
      until (cur ~= nil or reconnects == proxy.global.connect_retry)
      if cur == nil then
        error("could not reconnect / No result set in select_db", 2)
      else
        return cur:fetch()  -- returns the first row only
      end
    end
LuaSQL Connection Timeouts
    function execute_db (node, sql)
      local reconnects = 0
      local LOST_CONNECTION = "MySQL server has gone away"
      local rc
      local err
      repeat
        -- execute returns a row count on success, or nil plus an error message
        rc, err = proxy.global.conremote[tonumber(node)]:execute(sql)
        if rc ~= nil then print("execute_db RC="..rc) end
        if err ~= nil then print("execute_db error="..err) end
        -- plain-text find (4th argument true), so the message is not treated as a pattern
        if rc == nil and err:find(LOST_CONNECTION, 1, true) then
          print('Reconnecting execute_db. Error='..err)
          reopen_remote_db(node)
        end
        reconnects = reconnects + 1
      until (rc ~= nil
             or reconnects == proxy.global.connect_retry
             or not err:find(LOST_CONNECTION, 1, true))
      if rc == nil and err:find(LOST_CONNECTION, 1, true) then
        error("could not reconnect in execute_db.", 2)
      else
        -- success, or an SQL error other than a lost connection
        return rc, err
      end
    end
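The retry pattern in select_db and execute_db can be tried without a database. The following is a minimal sketch, assuming a stand-in connection object: make_flaky_connection and execute_with_retry are hypothetical names invented for this illustration and are not part of LuaSQL or of the cluster code. The reconnect callback stands in for reopen_remote_db.

```lua
-- Hypothetical stand-in for a LuaSQL connection (not part of LuaSQL):
-- it fails `failures` times with the lost-connection message, then succeeds.
function make_flaky_connection(failures)
  local conn = { calls = 0 }
  function conn:execute(sql)
    self.calls = self.calls + 1
    if self.calls <= failures then
      return nil, "LuaSQL: MySQL server has gone away"
    end
    return 1  -- pretend one row was affected
  end
  return conn
end

-- Run `sql` on `conn`, retrying only when the connection was lost.
-- `reconnect` is called between attempts; `max_tries` bounds the loop.
-- Returns rc, nil, attempts on success, or nil, err on failure.
function execute_with_retry(conn, sql, max_tries, reconnect)
  local LOST_CONNECTION = "MySQL server has gone away"
  local rc, err
  for attempt = 1, max_tries do
    rc, err = conn:execute(sql)
    if rc ~= nil then
      return rc, nil, attempt
    end
    if not tostring(err):find(LOST_CONNECTION, 1, true) then
      break  -- a real SQL error: retrying will not help
    end
    if attempt < max_tries then reconnect() end
  end
  return nil, err
end

-- Usage: two dropped connections, then success on the third try.
local reconnects = 0
local rc, err, attempts = execute_with_retry(
  make_flaky_connection(2), "update t set x=1", 3,
  function() reconnects = reconnects + 1 end)
print(rc, err, attempts, reconnects)
```

Unlike execute_db, this sketch reports failure through its return values rather than calling error(), which keeps it easy to exercise outside the proxy.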
System Info Script
  - Keeps cluster up to date on disk usage
  - Updates the Node table on all nodes with this node's disk size & free space
Accepting Queries from Client
  - Query comes in to Proxy; read_query is called
  - The query appears in read_query's parameter
  - The first byte of that parameter indicates the type of query:
      proxy.COM_QUERY (Query)
      proxy.COM_PROCESS_INFO (Process List)
      proxy.COM_CONNECT (Connect)
      proxy.COM_PROCESS_KILL (Kill)
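The first-byte check described above can be sketched as a small helper. This is an illustration only: describe_packet is a hypothetical name, and the proxy table is passed in as an argument so the code also runs outside MySQL Proxy. The fake_proxy table is a stand-in whose numeric values follow the MySQL protocol command numbers; under MySQL Proxy the real proxy table would be passed instead.

```lua
-- Split a packet into its command type (first byte) and payload (the rest).
-- `proxy` is whatever table supplies the COM_* constants.
function describe_packet(proxy, packet)
  local command = packet:byte(1)
  local payload = packet:sub(2)
  local names = {
    [proxy.COM_QUERY]        = "query",
    [proxy.COM_PROCESS_INFO] = "process list",
    [proxy.COM_CONNECT]      = "connect",
    [proxy.COM_PROCESS_KILL] = "kill",
  }
  return names[command] or "other", payload
end

-- Stand-in for the table MySQL Proxy provides (assumption for this sketch).
local fake_proxy = {
  COM_QUERY = 3, COM_PROCESS_INFO = 10, COM_CONNECT = 11, COM_PROCESS_KILL = 12,
}

-- Usage: a COM_QUERY packet carrying "select 1".
print(describe_packet(fake_proxy, string.char(3) .. "select 1"))
```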
Returning a result set
  - Two response types: "OK" and Error
  - Set properties of the "response" object in the "proxy" environment

Returning a result set: Error
  Array: type, errmsg, errcode, sqlstate
    proxy.response = {
      type     = proxy.MYSQLD_PACKET_ERR,
      errmsg   = "Malformed INSERT statement.",
      errcode  = 1064,
      sqlstate = "42000"
    }
    return proxy.PROXY_SEND_RESULT

Returning a result set: OK, Empty Result Set
  Array: type
    proxy.response.type = proxy.MYSQLD_PACKET_OK
    return proxy.PROXY_SEND_RESULT
Returning a result set: OK, Full Result Set
  Array:
    type (value)
    resultset (table)
      fields (table): one entry per column, each with a type and a name
      rows (table): one entry per row, each a list of values
Returning a result set: OK, Full Result Set
    proxy.response.type = proxy.MYSQLD_PACKET_OK
    proxy.response.resultset = {
      fields = {
        {type=proxy.MYSQL_TYPE_LONGLONG,   name="blogpostid"  },
        {type=proxy.MYSQL_TYPE_LONG,       name="partitionkey"},
        {type=proxy.MYSQL_TYPE_VAR_STRING, name="rawcontent"  }
      },
      rows = {
        { tonumber(itemid), tonumber(partkey), contentvalue }
      }
    }
    return proxy.PROXY_SEND_RESULT
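The OK and Error responses shown on these slides can be wrapped in two small helpers. This is a sketch under stated assumptions: send_item and send_error are hypothetical names, and the proxy table is passed in as a parameter so the code can run outside MySQL Proxy. The fake_proxy values are placeholders for this illustration, not the real constant values.

```lua
-- Fill in a one-row result set for a stored item and tell the proxy to send it.
function send_item(proxy, itemid, partkey, contentvalue)
  proxy.response = {
    type = proxy.MYSQLD_PACKET_OK,
    resultset = {
      fields = {
        { type = proxy.MYSQL_TYPE_LONGLONG,   name = "blogpostid"   },
        { type = proxy.MYSQL_TYPE_LONG,       name = "partitionkey" },
        { type = proxy.MYSQL_TYPE_VAR_STRING, name = "rawcontent"   },
      },
      rows = {
        { tonumber(itemid), tonumber(partkey), contentvalue },
      },
    },
  }
  return proxy.PROXY_SEND_RESULT
end

-- Fill in an error response and tell the proxy to send it.
function send_error(proxy, errmsg, errcode, sqlstate)
  proxy.response = {
    type     = proxy.MYSQLD_PACKET_ERR,
    errmsg   = errmsg,
    errcode  = errcode,
    sqlstate = sqlstate,
  }
  return proxy.PROXY_SEND_RESULT
end

-- Stand-in for the table MySQL Proxy provides; values are placeholders.
local fake_proxy = {
  MYSQLD_PACKET_OK = "ok", MYSQLD_PACKET_ERR = "err",
  MYSQL_TYPE_LONGLONG = "longlong", MYSQL_TYPE_LONG = "long",
  MYSQL_TYPE_VAR_STRING = "var_string", PROXY_SEND_RESULT = "send_result",
}

-- Usage: build a response for item 42 in partition 200904.
send_item(fake_proxy, "42", "200904", "hello")
print(fake_proxy.response.resultset.rows[1][3])
```

Inside read_query, the return value of either helper would be returned directly, as in the slides.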
Cluster Operation Overview
Cluster: Librarian
  - Talks to the client
  - Accepts items to store
  - Retrieves items / gives them to client
  - The single Lua script that runs under Proxy
Cluster: Librarian
  - Directory
      - itemid
      - nodeid
      - partitionkey
  - Event table
      - serial no.
      - event type
      - event data

Cluster: Librarian
  - Content_partitionKey
      - itemid
      - item compressed into largeblob
  - Node table
      - nodeid
      - connection / authentication info
      - system status (disk)
      - capacity factor
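Cluster: Librarian
  [Flowchart: handling an insert]
  1. Is it an insert? (no: handled elsewhere)
  2. Is the syntax good? (bad: return an error)
  3. Does the item already exist? (yes: return an error)
  4. Store on this node
  5. On "table doesn't exist" error: create the table, then store on this node
  6. Update ALL directories
  7. Store an Event
  8. Return OK to client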
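The insert flow can be walked through with in-memory tables in place of MySQL. This is only a sketch of the decision order: new_node and store_item are hypothetical names, the node record is an invented structure, the "insert" event type is an assumption, and only this node's directory is updated (the real Librarian updates the directory on every node).

```lua
-- Hypothetical in-memory stand-in for one cluster node.
function new_node(nodeid)
  return { nodeid = nodeid, directory = {}, tables = {}, events = {} }
end

-- Store one item, following the flowchart order:
-- duplicate check, create the partition table if missing, store,
-- update the directory, record an event.
function store_item(node, itemid, partkey, content)
  if node.directory[itemid] ~= nil then
    return nil, "Duplicate entry '" .. itemid .. "' for key 1"
  end
  local tbl = node.tables[partkey]
  if tbl == nil then
    tbl = {}                      -- the "create table" step
    node.tables[partkey] = tbl
  end
  tbl[itemid] = content
  node.directory[itemid] = { nodeid = node.nodeid, partitionkey = partkey }
  node.events[#node.events + 1] = { type = "insert", data = itemid }
  return true
end

-- Usage: the second insert of the same item is rejected.
local node = new_node(1)
print(store_item(node, 7, "200904", "first post"))
print(store_item(node, 7, "200904", "again"))
```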
Cluster: Librarian
  Returning an error
    proxy.response = {
      type     = proxy.MYSQLD_PACKET_ERR,
      errmsg   = "Duplicate entry '"..itemid.."' for key 1",
      errcode  = 1062,
      sqlstate = "23000"
    }
    return proxy.PROXY_SEND_RESULT

  Content Tables
  - MyISAM (concurrent_insert=2)
  - One table per partition key
  - For us: 300 MB / hour
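Cluster: Librarian
  [Flowchart: handling a select]
  1. Is it a select? (no: handled elsewhere)
  2. Is the syntax good? (bad: return an error)
  3. Is the condition 1=0? (yes: return empty result set to client)
  4. Is it for the info table? (yes: calculate max(id) and count(*), return result set to client)
  5. Read the directory:
      - remote: get item from remote db, return result set to client
      - local: get item from this db, return result set to client
      - doesn't exist: return empty result set to client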
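The directory lookup at the end of the select flow has three outcomes: local, remote, or missing. A minimal sketch follows, assuming the directory is held as a Lua table mapping each itemid to the list of nodeids that store it; that layout and the name locate_item are assumptions for this illustration.

```lua
-- Decide where to read an item from.
-- directory: itemid -> list of nodeids holding the item (assumed layout)
-- me: this node's id
-- Returns "local", "remote" or "missing", plus the nodeid to read from.
function locate_item(directory, itemid, me)
  local holders = directory[itemid]
  if holders == nil or #holders == 0 then
    return "missing"
  end
  for _, nodeid in ipairs(holders) do
    if nodeid == me then
      return "local", me        -- prefer our own copy
    end
  end
  return "remote", holders[1]   -- otherwise read from the first listed node
end

-- Usage: item 10 is on nodes 2 and 3, item 11 only on node 3.
local directory = { [10] = { 2, 3 }, [11] = { 3 } }
print(locate_item(directory, 10, 2))
print(locate_item(directory, 11, 2))
print(locate_item(directory, 12, 2))
```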
Cluster: Librarian
  Federated
    select max(id) from sample;
        SELECT `id`, `val`, `name` FROM `sample`;
  Client
    create table _content_info (
      _max_id bigint unsigned,
      _count bigint unsigned
    ) engine=federated
      connection=(...schema/_content_info...);
  Target
    create view _content_info as
      select max(id) _max_id, count(*) _count from _content;
Cluster: Librarian
  show table status like `_content`;
      Auto_increment 0
  cluster node add
  cluster node status
  cluster node offline
Cluster: Rebalancer
  - Enforces redundancy policy
      - for new inserts
      - to heal from node loss
  - Rebalance
      - node added
      - equal disk usage
Cluster: Rebalancer
  Policy enforce for new items
  1. Get the earliest new item (no new items: go to next step)
  2. Find another node with the most free space
  3. Copy the item there
  4. Update all directories
  5. Next step
Cluster: Rebalancer
  Policy enforce for other items
  1. Get one item from the Directory which exists on an insufficient number of nodes and has "me" as the first listed node (no items: go to next step)
  2. Copy the item to the node with the most free space that doesn't already have it
  3. Update all directories
  4. Next step
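Choosing "the node with the most free space that doesn't already have it" is a single pass over the Node table. A minimal sketch, assuming nodes are given as a list of records with nodeid and free fields and the current holders as a set keyed by nodeid; pick_copy_target and both layouts are hypothetical.

```lua
-- Pick the node with the most free space that does not already hold the item.
-- nodes:   list of { nodeid = <id>, free = <free space> }
-- holders: set of nodeids that already hold the item, e.g. { [2] = true }
-- Returns the chosen nodeid, or nil if every node already has a copy.
function pick_copy_target(nodes, holders)
  local best
  for _, node in ipairs(nodes) do
    if not holders[node.nodeid] and (best == nil or node.free > best.free) then
      best = node
    end
  end
  return best and best.nodeid or nil
end

-- Usage: node 2 has the most space but already holds the item.
local nodes = {
  { nodeid = 1, free = 100 },
  { nodeid = 2, free = 500 },
  { nodeid = 3, free = 300 },
}
print(pick_copy_target(nodes, { [2] = true }))
```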
Cluster: Rebalancer
  Rebalance
  1. Do I have the most free space of all the nodes? (no: done)
  2. Find the node with the least free space
  3. Find an item on that node that's not on this node (that obeys the redundancy policy)
  4. Move the item from that node to me
  5. Update all directories
  6. Done
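The first two rebalance decisions (am I the emptiest node, and which node is the fullest) can be sketched as one function. The node list layout and the name pick_rebalance_source are assumptions for this illustration; on a tie for most free space, the node listed first is treated as the emptiest.

```lua
-- Decide whether this node should pull an item, and from which node.
-- nodes: list of { nodeid = <id>, free = <free space> }
-- me:    this node's id
-- Returns the nodeid with the least free space, or nil if this node
-- does not have the most free space (or there is nobody else to pull from).
function pick_rebalance_source(nodes, me)
  local most, least
  for _, node in ipairs(nodes) do
    if most == nil or node.free > most.free then most = node end
    if least == nil or node.free < least.free then least = node end
  end
  if most == nil or most.nodeid ~= me or least.nodeid == me then
    return nil
  end
  return least.nodeid
end

-- Usage: node 2 is the emptiest, node 1 the fullest.
local nodes = {
  { nodeid = 1, free = 100 },
  { nodeid = 2, free = 500 },
  { nodeid = 3, free = 300 },
}
print(pick_rebalance_source(nodes, 2))
print(pick_rebalance_source(nodes, 3))
```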
Member Add
  - Add new node to Node table
  - Get a copy of the directory from all nodes (a piece from each)

Member Remove
  - Set free-space margin to 0
  - Force "free disk space" for this node to 0
  - Wait for Rebalance to copy everything off
  - Restore the free-space margin and update the Node list

Member Fail
  - Update node list
  - Remove all directory entries for this node

Information Age-Out
  - Drop tables for obsolete partition keys
  - Remove directory entries for those partitions
Table Optimizer
  - information_schema.tables: compare data_free to data_length
  - "Lock" that partition on that node (row in PartitionLock table)
  - Run optimize table, but abandon if any node fails
  - Unlock table
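The slide does not say how data_free is compared to data_length. One plausible reading, shown here only as an assumption, is a ratio test against a configurable threshold; needs_optimize and the threshold parameter are hypothetical.

```lua
-- Decide whether a table is fragmented enough to be worth optimizing.
-- data_free and data_length are the values from information_schema.tables;
-- threshold is an assumed tuning knob (e.g. 0.25 = 25% reclaimable space).
function needs_optimize(data_free, data_length, threshold)
  if data_length <= 0 then
    return false  -- empty table: nothing to reclaim
  end
  return data_free / data_length >= threshold
end

-- Usage: 30% of the table is free space, threshold is 25%.
print(needs_optimize(30, 100, 0.25))
```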
Backup
  - "Write Lock" one partition across all nodes
  - Copy that partition table for all nodes
  - Copy directory entries for that node
  - Unlock
Development Directions
  - Bulletproof error handling
  - Performance tuning

Alternative Architectures
  - Load Balancer choices
  - Move SQL parsing to a complex proxy-based load balancer, communicate with nodes on network sockets
  - Librarian in Perl, Java, C
  - Alternative databases
Resources
  - MySQL Documentation
  - http://forge.mysql.com/wiki/mysql_proxy
  - http://forge.mysql.com/tools/search.php?t=tag&k=mysqlproxy
  - http://www.lua.org
  - http://www.keplerproject.org/luasql
  - http://lua-users.org
  - http://jan.kneschke.de/projects/mysql/mysql-proxy
  - http://www.linuxvirtualserver.org
Thank you!
Bob Burgess
bob.burgess@radian6.com
www.radian6.com/mysql