Adding Indirection Enhances Functionality
The Story Of A Proxy
Mark Riddoch & Massimiliano Pinto
Introductions
- Mark Riddoch: Staff Engineer, VMware; formerly Chief Architect, MariaDB Corporation
- Massimiliano Pinto: Senior Software Engineer, MariaDB Corporation
Agenda
- Why indirection
- MaxScale as a platform for indirection
- Examples of functionality that can be achieved
Why Indirection?
- You can always solve any problem by adding a level of indirection
Principles
- Interrupt -> Interrupt Vector Table -> Interrupt Handler
- Hiding physical resources
- Operating system traps
- Jump tables
- Interrupt handler tables
- Device drivers
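The jump table idea can be sketched in a few lines of Python. This is only an illustration: the interrupt numbers and handler names are hypothetical, and a dictionary stands in for a hardware vector table.

```python
# Hypothetical interrupt numbers and handlers, for illustration only.
def handle_timer():
    return "timer tick"

def handle_keyboard():
    return "key press"

# The "vector table": callers know only the interrupt number,
# not which function services it.
vector_table = {0: handle_timer, 1: handle_keyboard}

def raise_interrupt(number):
    """Look up the handler indirectly through the table and invoke it."""
    return vector_table[number]()

# A replacement handler that can be swapped in without touching callers.
def handle_keyboard_logged():
    return "key press (logged)"
```

Assigning `vector_table[1] = handle_keyboard_logged` changes what `raise_interrupt(1)` does while every caller stays the same, which is the point of the extra level of indirection.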
Brokers
- Indirection in a network environment
- Map requests to providers
- Name servers
- Object Request Brokers
Load Balancers
- Schedulers
- A typical web site uses an HTTP load balancer
Altering Behavior
- Device drivers
- Plugins
- Inheritance
- Overriding methods by use of function pointers
Scale Out
- Virtual Memory Management: use of virtual memory to increase visible memory space
- Logical Volume Manager: create large disk volumes from many physical volumes
Reasons for Indirection
- Hide physical resources
- Increase fault tolerance
- Alter behaviors
- Balance load
- Scale Out Performance
Everything Has A Cost
- Indirection is no exception
- Extra processing cycles
- Extra memory references
- Extra network hops
- Need to evaluate cost vs. benefit
MaxScale as a platform for indirection
What is MaxScale?
- Simple answer: a database proxy
- Provides flexibility via plugins
- Content-aware network component
Indirection Within MaxScale
- MaxScale is the indirection layer between clients and databases
- MaxScale provides internal indirection via service definitions and plugins
Service Endpoints
- MaxScale has a service concept
- One proxy may have several services
- Each service defines a virtual database
- Several virtual databases may map to the same physical databases
- Each service defines a set of plugins to implement service functionality
Protocol Plugins
- Separate protocols for client side and server side of proxy
- Allows different data stores to be used at the backend
- Clients may talk with different protocols
Authentication Plugins
- Provide for different authentication models in the clients and servers
- Map between different authentication models
Monitor Plugin
- Allows MaxScale to monitor the states and roles of backend databases
- Monitoring data is used in routing decisions to determine the best endpoint for requests
Filter Plugin
- Pipeline of filters, based on the Unix pipes mechanism
- Modify, block, duplicate or log requests
- Add hints to pass on to later components
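A filter pipeline of this kind might look like the following Python sketch. It is not MaxScale's plugin API: the request dictionary, the filter signature (return the request, or None to block it) and the hint name `route-to-slave` are all assumptions made for the illustration.

```python
from typing import Callable, List, Optional

# Assumed interface: a filter returns the (possibly modified) request,
# or None to block it.
Filter = Callable[[dict], Optional[dict]]

def make_log_filter(log: List[str]) -> Filter:
    """Log filter: records the SQL text and passes the request on unchanged."""
    def log_filter(request: dict) -> Optional[dict]:
        log.append(request["sql"])
        return request
    return log_filter

def block_drop(request: dict) -> Optional[dict]:
    """Blocking filter: rejects statements that start with DROP."""
    if request["sql"].lstrip().lower().startswith("drop"):
        return None
    return request

def hint_reads(request: dict) -> Optional[dict]:
    """Hint filter: tags SELECT statements for a later routing stage."""
    if request["sql"].lstrip().lower().startswith("select"):
        return {**request, "hints": request["hints"] + ["route-to-slave"]}
    return request

def run_pipeline(filters: List[Filter], request: dict) -> Optional[dict]:
    """Pass the request through each filter in order, stopping if blocked."""
    current: Optional[dict] = request
    for f in filters:
        current = f(current)
        if current is None:
            return None
    return current
```

As with Unix pipes, each stage sees only the output of the previous one, so filters can be reordered or added without changing the others.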
Router Plugin
- Determines which database to route a request to
- May route individual requests or connections
- Accepts hints from filters earlier in the process
Content Aware
- MaxScale includes a SQL parser
- Individual statements and session history may be used to decide the best destination for a statement
MaxScale usage examples
Read Scalability
- Read/Write splitting with MySQL Replication
- Each application uses only 1 connection
- MaxScale monitors the state of each node and selects only available nodes
- MaxScale creates 2 connections: one R/W on the Master node and one R/O, load balanced on the Slave nodes
[Diagram: MaxScale and Database nodes]
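The routing decision behind read/write splitting can be sketched as below. This is a simplification under stated assumptions, not MaxScale's implementation: statements are classified by their first keyword rather than by a SQL parser, transactions and session state are ignored, the `down` set stands in for the monitor, and falling back to the master when no slave is available is a choice made for this sketch.

```python
class ReadWriteSplitter:
    """Toy router: reads go round-robin to available slaves, the rest to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.down = set()   # nodes a monitor has marked unavailable (assumed input)
        self._next = 0      # round-robin counter

    def route(self, sql):
        words = sql.split()
        # Crude classification: only statements starting with SELECT count as reads.
        is_read = bool(words) and words[0].upper() == "SELECT"
        candidates = [s for s in self.slaves if s not in self.down]
        if not is_read or not candidates:
            return self.master
        target = candidates[self._next % len(candidates)]
        self._next += 1
        return target
```

The application sees a single endpoint; which backend actually serves a statement is decided per request.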
Update Safe Galera
- Read/Write Splitting routes writes to a single node
- Each application uses only 1 connection
- MaxScale monitors the state of each Galera Cluster node and selects only synced nodes
- One node is selected as Master, for write operations
- MaxScale load balances the client connections for reads and sends writes to one node, avoiding conflicts; 2 connections are used
[Diagram: MaxScale, Master and Database nodes]
Schema Based Sharding
- Sharding is a method of splitting a single database server into separate parts
- Each schema is located on a different database server
- MaxScale will appear to the client as a database server with the combination of all the schemas
- 1 R/W connection to each server
[Diagram: Schema1, Schema2, Schema3]
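A minimal Python sketch of schema-based routing follows. The schema-to-server map and server names are hypothetical, and the regular expression is a crude stand-in for real SQL parsing; it only shows how one virtual server can present the union of the schemas while sending each statement to the server that owns the schema.

```python
import re

# Hypothetical mapping from schema name to the server that holds it.
SHARDS = {"schema1": "server1", "schema2": "server2", "schema3": "server3"}

def visible_schemas():
    """The client sees the combination of all schemas on one virtual server."""
    return sorted(SHARDS)

def route(sql, current_schema=None):
    """Pick a server from the first schema-qualified name, else the current schema."""
    for schema, _table in re.findall(r"\b(\w+)\.(\w+)\b", sql):
        server = SHARDS.get(schema.lower())
        if server is not None:
            return server
    if current_schema is not None:
        return SHARDS.get(current_schema.lower())
    return None  # no schema information: the sketch cannot decide
```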
Replication Proxy
- MaxScale requests and receives binlog records from the Master independently of any slave
- Binlog records are stored to allow replaying of those records to the Slave servers
- The Slave servers must be able to request historical binlog records without sending any additional traffic to the Master server
- Binlog records received from the Master are relayed to the Slaves that are able to accept them, i.e. those with minimal lag behind the Master
Query Metrics & Logging
- The QLA filter writes a copy of each query to a log file per user connection
- Logged queries can be restricted using regular expressions, connection source address and connection user name
- The top filter monitors every SQL statement that passes through it
- The top N execution times are kept, along with the SQL text itself, and a list sorted by execution time is written to a file when the client session closes
Log examples:
  15:16:23.333 8/04/2015, INSERT INTO sbtest values(1,0,' ','abcd)
  22.985 select sum(salary), year(from_date) from salaries s.
  5.304 select d.dept_name as "Department", y.y1 as Year
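The bookkeeping the top filter is described as doing (keep the N slowest statements, report them sorted by execution time) can be sketched with a bounded min-heap. The class and method names are hypothetical and this is not the filter's actual code.

```python
import heapq

class TopQueries:
    """Keep the n slowest statements seen during a session."""

    def __init__(self, n):
        self.n = n
        self._heap = []  # min-heap of (seconds, sql); the fastest entry is evicted first

    def record(self, sql, seconds):
        entry = (seconds, sql)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop whichever entry is now the fastest.
            heapq.heappushpop(self._heap, entry)

    def report(self):
        """Entries sorted by execution time, slowest first."""
        return sorted(self._heap, reverse=True)
```

A session would call `record()` once per statement and write `report()` to its file on close; memory stays bounded by N regardless of how many statements pass through.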
Legacy Support
- Modify queries to compensate for legacy applications or database schemas:
  - Use of deprecated statements
  - Statements that have been altered
- Example: TYPE= versus ENGINE= in the CREATE TABLE statement

  [CreateTableFilter]
  type=filter
  module=regexfilter
  options=ignorecase
  match=type[ ]*=
  replace=engine=
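The effect of the match and replace settings can be reproduced with Python's `re` module. This sketch only mirrors the pattern from the slide; it says nothing about how the regexfilter module is implemented.

```python
import re

# Same pattern and replacement as the example configuration,
# with the ignorecase option expressed as re.IGNORECASE.
MATCH = re.compile(r"type[ ]*=", re.IGNORECASE)
REPLACE = "engine="

def rewrite(sql):
    """Rewrite the legacy TYPE= table option to ENGINE=."""
    return MATCH.sub(REPLACE, sql)

print(rewrite("CREATE TABLE t (id INT) TYPE = MyISAM"))
```

Note that the pattern is purely textual: it would also rewrite `type =` in a WHERE clause on a column named `type`, so a rule like this needs to be evaluated against the application's actual queries.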
Parallel Databases
- May duplicate to different database backend types
- The tee filter can be used in a filter pipeline of a service to make a copy of a request from the client and dispatch it to another service within MaxScale
- Duplicate some or all queries via the tee filter
- The tee filter has mechanisms to limit the queries that are replicated, using regular expressions, connection source address and connection user name
[Diagram: Protocol -> Filter -> Router -> Protocol, with INSERT INTO T1 (...) duplicated by the filter to a second Filter -> Router -> Protocol chain]
Query Firewall
- The database firewall filter is used to block queries that match a set of rules
- It can be used to prevent harmful queries from reaching the backend database instances
- It can limit access to the database based on a more flexible set of rules
Examples:
  rule query_regex deny regex '.*select.*from.*user_data.*'
  users %@% match all rules query_regex
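What the example deny rule does to a query can be illustrated in Python. This is a sketch of the matching logic only: the rule is applied to every user (as `%@%` suggests), case-insensitive matching is an assumption, and the function name is hypothetical.

```python
import re

# The regular expression from the example rule; ignoring case is an assumption.
DENY = re.compile(r".*select.*from.*user_data.*", re.IGNORECASE)

def allowed(sql):
    """Return False when the statement matches the deny rule."""
    return DENY.search(sql) is None
```

With this rule, `allowed("SELECT name FROM user_data")` is False while a query against another table passes through.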
Non-overlapping, multi-master, write-safe Galera Cluster
- Split writes on table or schema such that only one node in the cluster gets writes for a particular schema or table
- Reduce or remove the possibility of write set conflicts
- Still benefit from multiple nodes supporting the write load
- 1 R/W connection to each server
[Diagram: DB1.TBL1 on node1, DB2.TBL2 on node2]
What's coming
- Simple Table-Key based sharding
- Binlog router compatibility with MariaDB 10
- Client Side SSL
- Backend persistent connections
- Launchable scripts from monitors
- REST/JSON for admin interface
- Zero Copy Response
- New authentication plugins
- Sharding improvement with cross shard joins
Get involved
- Check on GitHub, MariaDB Source:
  https://github.com/mariadb-corporation/maxscale
  https://downloads.mariadb.org/
- Bug reports: https://mariadb.atlassian.net/projects/mxs
- Google groups: https://groups.google.com/forum/#!forum/maxscale
Thank you! & Questions