How MySQL and NoSQL Coexist Matt Yonkovit - Percona
Who Am I
  Matt Yonkovit - Principal Consultant, Percona
  MySQL -> Sun veteran
  15+ years of database experience
  Lots of fun at parties
What We do
Where MySQL and the Ecosystem is today:
Lots of Data
MySQL focus: scale up
  Look at the advances made within InnoDB
    Scalability to 32+ cores
    SSD enhancements
    Dealing with mega-sized buffer pools
  This fits with hardware
    Multi-core
    Lots of memory standard
    Cheap flash
We Need More Servers!
Industry Says Scale Out
  Cloud computing
    Rise of EC2
    Scale on demand
    Pay for what you need
  Big companies run 1000s of servers
  Big Data retention drives more scale out
    Retention and analysis outpace hardware growth
  This is part of the cyclical nature of computing. I remember hearing this from Oracle when RAC was first introduced; before that it was time sharing.
Developers want easy and fast
  SQL is powerful
  But SQL can be overly complex for many operations
  Sometimes optimizers in RDBMSs do stupid things, and developers can do it better
  SQL is yet another component a developer needs to learn
  Even for simple SQL operations there is overhead to parse/execute
ORM Sucks
  Because developers want to focus on being super developer ninjas, they often turn to ORMs
  ORMs (like Active Record) work great for simple applications, but tend to bork when you have complex mappings
Fast Moving Changes
  What data is stored, and how it is stored, is changing at a breakneck pace
  Changes to large databases are hard
  Example: in a presentation, Craigslist said it took 1 month to alter a table in their archive DB
  Other alters can still take days or weeks
Missed opportunities
  While the MySQL community as a whole has done an awesome job, we did miss a few things:
    Online table alters (add/modify column)
    Scale out vs. scale up
    Flexible data types
How other groups tackled the problems:
The Rise of NoSQL
  RDBMSs of old did not keep up with the demand
  Need for fast, efficient data access
  Eliminate the pain points
  Eliminate the unneeded fluff
  Many websites got along well without things like functions, stored procs, full ACID compliance, etc.
NoSQL Covers a lot
  Key/value, e.g. memcached, Redis
  Column stores, e.g. HBase
  Document stores, e.g. Mongo, Couch
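To make the three families concrete, here is a minimal sketch of how one record might look under each model, using only plain Python structures. The record, key names, and column-family names are invented for illustration and do not reflect any product's API or storage format.

```python
import json

# Hypothetical record used for illustration only.
user = {"id": 42, "name": "Ann", "tags": ["mysql", "nosql"]}

# Key/value: the store sees an opaque value behind a single key.
kv_store = {f"user:{user['id']}": json.dumps(user)}

# Column store: values grouped by row key, then column family, then column.
column_store = {
    "42": {
        "profile": {"name": "Ann"},
        "tags": {"0": "mysql", "1": "nosql"},
    }
}

# Document store: a nested, self-describing document that can be
# searched by its fields.
document_store = [user]


def find_by_tag(docs, tag):
    """Return documents whose 'tags' list contains the given tag."""
    return [d for d in docs if tag in d.get("tags", [])]


print(json.loads(kv_store["user:42"])["name"])
print(column_store["42"]["profile"]["name"])
print(find_by_tag(document_store, "nosql")[0]["name"])
```

The practical difference is in what the store can see: a key/value store can only fetch by key, while the other two models expose some structure to the server.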
More developer centric interface
  Instead of relying on a SQL interface, many NoSQL solutions allow developers to stay in code and directly access objects using their programming language of choice, e.g. MongoDB's JavaScript interface
  Allow data to be retrieved in an easily consumable format, e.g. JSON, binary object, etc.
Scale out
  Remove the complexities and automate sharding
  Make full use of multiple servers for complex tasks, e.g. map reduce
  Ability to add servers on demand
  Support for fail-over and replication
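As a rough illustration of what "automated sharding" has to take care of, the sketch below places keys on servers with a stable hash and then counts how many keys would move when one server is added. The function names and the modulo placement scheme are assumptions made for this example; no product named in this deck is claimed to work this way.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number with a stable hash (naive modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def place(keys, num_shards):
    """Group keys by the shard they would land on."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user:{i}" for i in range(1000)]
before = place(keys, 4)

# Count the keys whose shard changes when a fifth server is added.
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))

print({shard: len(members) for shard, members in before.items()})
print(f"{moved} of {len(keys)} keys change shards when going from 4 to 5")
```

With plain modulo placement, adding a server relocates a large share of the keys, which is one reason doing this by hand is painful and why having the data store manage placement is attractive.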
Easy to change
  Change is a certainty in life... Users will demand it!
  Make sure that you are not bound to a rigid structure
  Allow for changes on the fly
  Flexible schema
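A small sketch of what a flexible schema means in practice: documents written at different times can carry different fields, and no table alter is needed. The documents and the `photos` field are hypothetical.

```python
# Hypothetical documents; the second was written after a new field appeared.
docs = [
    {"_id": 1, "title": "Couch for sale", "price": 50},
    {"_id": 2, "title": "Bike", "price": 120, "photos": ["a.jpg", "b.jpg"]},
]


def photo_count(doc):
    """Readers must tolerate documents written before 'photos' existed."""
    return len(doc.get("photos", []))


for doc in docs:
    print(doc["_id"], photo_count(doc))
```

The cost of the change moves from the database to the code: every reader has to handle both old and new shapes.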
Custom Features
  Add support for features that are missing in MySQL or that solve a specific need
    GridFS (Mongo)
    Super Columns (Cassandra)
    Lists/Sets (Redis)
Where are we now? SQL -vs- NOSQL?
Too Much FUD
  A lot of developers like to hate on MySQL
  A NoSQL solution (like Mongo) does not mean:
    You can stop thinking about your data types and schema design
    Adding nodes will solve all your issues
    You can get sloppy with code
    You can fully replace a relational DB
  Bad code is bad code
As complex as you make it
  You still have to think with NoSQL
    How will this data be used?
    Will we have to correlate this data with other data in the system?
    Will duplication of data cause issues? Bloat space?
Too Much FUD
  A lot of people in MySQL hate on NoSQL
  Not all NoSQL solutions are made the same
    Some of them are durable, or are adding durability features (e.g. Mongo in 1.8+)
    While in most NoSQL solutions more memory = better performance, the same holds true for MySQL
    Not every solution is eventually consistent
  Remember, MySQL started as that other technology, mocked by classic DBAs
  Many of these solutions are in a similar place to where MySQL was 6-7 years ago
MySQL is not Sitting Idle
Lessons from NDB
  In many ways MySQL has had a NoSQL solution available for years
  NDB (MySQL Cluster) is at its core a NoSQL solution with a SQL wrapper
  Many deployments of NDB need to bypass the SQL layer and write directly to the NDB API in order to achieve optimal performance
Parsing SQL Can be slow
  libmysql (Akira's numbers)

  samples   %        symbol name
  748022    7.7355   MYSQLParse(void*)
  219702    2.2720   my_pthread_fastmutex_lock
  205606    2.1262   make_join_statistics( )
  198234    2.0500   btr_search_guess_on_hash

  Sources:
    http://www.mysqlperformanceblog.com/2011/03/16/where-does-handlersocket-really-save-youtime/
    http://www.slideshare.net/akirahiguchi/handlersocket-20100629en-5698215
Handler Socket
  Developed by DeNA
  Direct access to the InnoDB storage engine, bypassing SQL
  Key-value type access
  Yoshinori hit 750K QPS, faster than memcached and 7X faster than stock SQL
  http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-storyfor.html
Handler Socket Performance Benchmark by Vadim: http://www.mysqlperformanceblog.com/2010/11/02/handlersocket-on-ssd/
Memcached
  Oracle recently got in on the fun, introducing a Memcached interface for InnoDB
  Key-value access
  Uses libmemcached (known and used protocol)
  Promises the ability to use the distributed hash capabilities of memcached to shard (not available yet)
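For readers who have not seen it, the memcached text protocol is small enough to sketch by hand. The helpers below only build and parse the standard `set` and `get` messages; they open no connection, and nothing here is specific to the InnoDB interface, whose supported commands and options are not described in these slides. The helper names are made up for this example.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes):
    """Parse a single-key 'get' reply; return the value, or None on a miss."""
    if raw == b"END\r\n":
        return None
    header, rest = raw.split(b"\r\n", 1)
    parts = header.split(b" ")
    if parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply: {header!r}")
    length = int(parts[3])  # header is: VALUE <key> <flags> <bytes>
    return rest[:length]


print(build_set("user:42", b"Ann"))
print(build_get("user:42"))
print(parse_get_response(b"VALUE user:42 0 3\r\nAnn\r\nEND\r\n"))
```

There is no query text to parse and no plan to build: the request names a key and the reply carries the bytes, which is the appeal of a key-value front end on a storage engine.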
Question: Should you use MySQL or look at a NoSQL solution? Answer: It depends on your application.
Questions you should ask
  Do you need transactions?
  Can you risk data loss?
  What are your performance requirements?
  What level of risk is acceptable?
  Costs involved?
    Hard costs: servers, infrastructure
    Soft costs: developer time
General MySQL Considerations
  Durability important (Is some data loss acceptable?)
  3rd party applications
  Integration with other RDBMSs
  Already invested in SQL?
  OLTP type workloads?
    Transactions
    Joins
General NoSQL Considerations
  SQL is overkill
  Simple key values?
  Document oriented
  No standard form or consistent data stream
  Require CPU-based resources from several machines
  Huge amounts of archived data
  Rapid changes to the structures
  Possible data inconsistencies
  Eventual consistency as a by-product
Performance?
  Performance for both MySQL and NoSQL solutions can vary wildly. Impacted by:
    Schema/object design
    Datatypes
    Indexes
    ORM
    Drivers
  In many cases you can trade performance for reliability/consistency
Space
  With huge datasets, space can be at a premium
  Some NoSQL options can take up a lot more space than their MySQL counterparts
    XML example (no redundant data):
      2.5GB in MongoDB
      486MB in InnoDB
  Some solutions have you duplicate data for performance and simplicity
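One plausible contributor to a gap like this is that self-describing documents repeat every field name in every record, while a table stores column names once in the schema. The sketch below measures that effect on made-up rows, with JSON text standing in for both formats. It is not MongoDB's or InnoDB's on-disk layout, and it ignores indexes, padding, and other overhead that also affect real footprints.

```python
import json

# Hypothetical rows: (id, first_name, last_name, city).
fields = ("id", "first_name", "last_name", "city")
rows = [(i, f"first{i}", f"last{i}", "Austin") for i in range(1000)]

# Row-style storage: field names are stored once, alongside the values.
row_bytes = len(json.dumps(list(fields))) + sum(
    len(json.dumps(list(r))) for r in rows
)

# Document-style storage: every document repeats every field name.
doc_bytes = sum(len(json.dumps(dict(zip(fields, r)))) for r in rows)

print(row_bytes, doc_bytes, round(doc_bytes / row_bytes, 2))

# Ratio of the two figures quoted above, taking 1 GB as 1024 MB (an assumption).
print(round(2.5 * 1024 / 486, 1))
```

Short field names, or a store that compresses well, narrow the gap; either way it is worth measuring with your own data before sizing hardware.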
Thou shall Benchmark Trust, but verify
Verify
  I have run into a lot of people who leap before they test
    Asking for trouble
  Don't be trapped by legacy problems
    If you have bad SQL code, it's not MySQL's fault ;)
  Use specific benchmarks, not generic ones
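A specific benchmark can be as simple as timing the exact access pattern your application uses against each candidate. The harness below is a minimal sketch of that idea. SQLite and a Python dict are stand-ins chosen so the example runs anywhere; the numbers it prints say nothing about MySQL or any NoSQL product, and the table, keys, and function names are invented for illustration.

```python
import sqlite3
import timeit

# Stand-in workload: primary-key lookups through a SQL layer vs a plain dict.
N = 10_000
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO kv VALUES (?, ?)", ((i, f"value{i}") for i in range(N))
)
cache = {i: f"value{i}" for i in range(N)}


def sql_lookup(key):
    """Fetch one value by primary key through the SQL interface."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


def dict_lookup(key):
    """Fetch one value by key with no SQL layer involved."""
    return cache.get(key)


def bench(fn, keys, repeat=3):
    """Return the best wall-clock time over several passes through the keys."""
    return min(timeit.repeat(lambda: [fn(k) for k in keys], number=1, repeat=repeat))


keys = list(range(0, N, 10))  # replace with the access pattern your app has
for name, fn in (("sql", sql_lookup), ("dict", dict_lookup)):
    print(f"{name}: {bench(fn, keys):.4f}s for {len(keys)} lookups")
```

To make this meaningful, swap in the real client libraries, a realistic data volume, and the mix of reads and writes your application issues.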
NoSQL Options
MongoDB
  Nice JSON-centric data storage
  Do not underestimate MapReduce capabilities
  Better durability in version 1.8+
  Built-in sharding
  Replication
  No native joins
  Trade performance for consistency
  Data footprint can be bigger than MySQL in some cases
Redis
  Pros:
    Fast when fully in memory, but can use virtual memory
    Supports clustering
    Supports replication
    Super fast
    Complex data types like lists
  Cons:
    Using virtual memory prevents using other features
    Data loss possible
    No native joins
    Key access only
Cassandra
  Pros:
    Auto-sharding of data
    Replication
    Parallel processing
    Super Columns are interesting
  Cons:
    Eventual consistency
    Documentation not as deep as other solutions
    Speed is a by-product of adding more nodes
    Network chatty
    Durability issues
Hadoop
  Pros:
    Great with super large datasets (petabytes)
    Extreme parallelism
    Can do complex CPU-intensive tasks via map reduce
  Cons:
    Can be complex
    Needs time to produce results
    No native joins/indexes
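For anyone new to the map reduce model mentioned here, this is a single-process sketch of its three steps (map, group by key, reduce) applied to a word count. It is plain Python rather than Hadoop's API, and the function names are made up for the example. In a real cluster the map and reduce calls run in parallel on many machines and the grouping step moves data over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (word, 1) pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


lines = ["MySQL and NoSQL", "NoSQL and more NoSQL"]
print(map_reduce(lines))
```

Because each map call sees only its own line and each reduce call sees only its own key, the work splits cleanly across servers, which is where the parallelism comes from.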
Tokyo Cabinet
  Pros:
    Can be super fast
    Multiple table types to support different requirements
    Embeddable
    Replication
    Can use Memcached protocol
  Cons:
    No built-in sharding
    Durability/consistency issues
    Lack of documentation
    Can bog down if you run out of memory
Couch
  Resistant to corruption
  No sharding
  MVCC based (versioning)
  Can be slower than other solutions
  JavaScript
  REST interface
  Replication
  Map reduce
  Easy to get started
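The versioning idea can be sketched as a store that accepts an update only when the writer names the revision it started from. This is a toy model with integer revisions and invented class and method names. It is not Couch's revision format or its HTTP interface.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class VersionedStore:
    """Toy store: every update must name the revision it was based on."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, base_rev=0):
        current_rev = self._docs.get(doc_id, (0, None))[0]
        if base_rev != current_rev:
            raise ConflictError(
                f"{doc_id}: based on rev {base_rev}, store has rev {current_rev}"
            )
        self._docs[doc_id] = (current_rev + 1, body)
        return current_rev + 1


store = VersionedStore()
rev = store.put("ad:1", {"title": "Couch"})
store.put("ad:1", {"title": "Couch, green"}, base_rev=rev)
try:
    store.put("ad:1", {"title": "Sofa"}, base_rev=rev)  # stale revision
except ConflictError as exc:
    print("rejected:", exc)
```

Writers never block each other; instead the second writer learns that the document changed underneath it and has to re-read and retry.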
MySQL Tricks
  There are ways to solve the NoSQL issues in MySQL, but they are manual or add complexity
    Storage engines
    Shard Query
    XML data
    Sharding
    NDB/Cluster