Cassandra vs MySQL
SQL vs NoSQL database comparison
19th of November, 2015
Maxim Zakharenkov
Maxim Zakharenkov
Riga, Latvia
Java Developer/Architect
Company
Goals
- Explore some differences between SQL and NoSQL
- Compare Cassandra and MySQL
- Take a look at what is under the hood
- Figure out which database to use
MySQL
- No. 2 SQL database by popularity
- Open source (GPL)
- First release: 1995
Cassandra
- No. 2 NoSQL database by popularity
- Open source (Apache)
- First release: 2008
See: http://db-engines.com/en/ranking
Sample model
Users: name, surname
Comments: comment_id, text, created
MySQL: schema
Users: (PK, AI), name, surname
Comments: comment_id (PK, AI), text, created (IDX), (IDX)
PK = primary key, AI = auto-increment, IDX = secondary index
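A minimal runnable sketch of this schema, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite's types and AUTOINCREMENT differ from MySQL's AUTO_INCREMENT). The slide leaves the key column names blank, so id and user_id below are hypothetical names, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: SQLite, not InnoDB).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- hypothetical name for the (PK, AI) column
    name    TEXT,
    surname TEXT
);
CREATE TABLE Comments (
    comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER REFERENCES Users(id),    -- hypothetical name for the indexed user column
    text       TEXT,
    created    TEXT
);
CREATE INDEX idx_comments_user ON Comments(user_id);
CREATE INDEX idx_comments_created ON Comments(created);
""")

conn.execute("INSERT INTO Users (name, surname) VALUES ('John', 'Brown')")
conn.execute("INSERT INTO Users (name, surname) VALUES ('Bruce', 'Lee')")
conn.executemany(
    "INSERT INTO Comments (user_id, text, created) VALUES (?, ?, ?)",
    [(1, "Hello!", "2015-11-19"), (1, "Hi!", "2015-11-18"), (2, "Good!", "2015-11-19")],
)

# Case 1 style query: one user together with all of their comments (a join).
rows = conn.execute(
    "SELECT u.name, c.text FROM Users u, Comments c "
    "WHERE u.id = c.user_id AND u.id = ? ORDER BY c.comment_id",
    (1,),
).fetchall()
print(rows)  # [('John', 'Hello!'), ('John', 'Hi!')]
```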
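MySQL: storing data (InnoDB)
[Figure: the table is stored as a B-tree clustered on the primary key. The root covers keys 1-1200, inner nodes cover ranges (1-500, 501-1000, 1001-1200), and leaf pages (1-100, 101-200, ..., 900-950, 951-1000) hold the row data.]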
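The descent through such a tree can be sketched in a few lines. This is a simplified model, not InnoDB's page format: the node classes, the lookup function and the leaf sizes are hypothetical, and each leaf holds rows keyed by primary key, as in a clustered index. Every lookup visits one node per level, which is where the O(log n) cost comes from.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Leaf:
    rows: dict  # primary key -> row data (a clustered index keeps rows in its leaves)

@dataclass
class Inner:
    separators: list  # separators[i] is the smallest key under children[i + 1]
    children: list

def lookup(node, key):
    """Descend from the root to one leaf; returns (row or None, nodes visited)."""
    visited = 1
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.separators, key)]
        visited += 1
    return node.rows.get(key), visited

def leaf(lo, hi):
    """Hypothetical leaf page holding primary keys lo..hi."""
    return Leaf({k: f"row {k}" for k in range(lo, hi + 1)})

# Three levels, mirroring the slide's 1-1200 / 1-500, 501-1000, 1001-1200 ranges.
root = Inner([501, 1001], [
    Inner([251], [leaf(1, 250), leaf(251, 500)]),
    Inner([751], [leaf(501, 750), leaf(751, 1000)]),
    Inner([1101], [leaf(1001, 1100), leaf(1101, 1200)]),
])

print(lookup(root, 120))  # ('row 120', 3)
```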
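MySQL: storing index
Comments: comment_id (PK), text, created (IDX)
[Figure: a secondary index is a separate B-tree (root 1-1000, inner nodes 1-500 and 501-1000, leaves 1-100, 101-200, ..., 900-950). Its leaf entries store the comment_id primary key instead of the row data, so the row itself is then found through the primary-key B-tree.]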
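A sketch of the two-step lookup through a secondary index, assuming the index is on created (the figure does not say which column it covers). The sorted list stands in for the index B-tree and the dict for the clustered index; find_by_created and the sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Clustered index: primary key -> full row (a dict here; the real structure is a B-tree).
clustered = {
    1: {"comment_id": 1, "text": "Hello!", "created": "2015-11-18"},
    2: {"comment_id": 2, "text": "Hi!", "created": "2015-11-19"},
    3: {"comment_id": 3, "text": "Good!", "created": "2015-11-19"},
}

# Secondary index on `created`: sorted (created, primary key) entries, no row data.
secondary = sorted((row["created"], pk) for pk, row in clustered.items())

def find_by_created(day):
    """Range-scan the secondary index, then fetch each row by its primary key."""
    lo = bisect_left(secondary, (day,))
    hi = bisect_right(secondary, (day, float("inf")))
    return [clustered[pk] for _, pk in secondary[lo:hi]]

print([r["text"] for r in find_by_created("2015-11-19")])  # ['Hi!', 'Good!']
```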
Cassandra: schema
UsersAndComments:
(PK): timeuid
name (static)
surname (static)
comment_id (C): timeuid
comment
PK = partition key, C = clustering column, static = stored once per partition
Cassandra: storage
UsersAndComments partitions:
b027d040-4e69-11e5-8b53-0002a5d5c51b
- name: John, surname: Brown
- comment_id: a5bdb8e0-53d2-11e5-a445-3f2e96f4bdc5, comment: Hello!
- comment_id: 17baef80-53d3-11e5-a445-3f2e96f4bdc5, comment: Hi!
- comment_id: efb95901-53d1-11e5-a445-3f2e96f4bdc5, comment: Good!
efb95900-53d1-11e5-a445-3f2e96f4bdc5
- name: Bruce, surname: Lee
- comment_id: f6359621-53d3-11e5-a445-3f2e96f4bdc5, comment: Hello!
- comment_id: efb95901-53d1-11e5-a445-3f2e96f4bdc5, comment: Hi!
...
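A toy in-memory model of this layout, under stated assumptions: Table, upsert and read_partition are hypothetical names, Python's uuid1() stands in for Cassandra's timeuid, and rows are sorted by the UUID's timestamp on read instead of being kept sorted on disk. It shows the key idea: one partition per user, static columns stored once, one row per clustering key.

```python
import uuid
from collections import defaultdict

class Table:
    """Toy model: one partition key, static columns, one timeuid clustering column."""

    def __init__(self):
        # partition key -> {"static": {column: value}, "rows": {clustering key: {column: value}}}
        self.partitions = defaultdict(lambda: {"static": {}, "rows": {}})

    def upsert(self, pk, static=None, ck=None, **columns):
        part = self.partitions[pk]
        if static:
            part["static"].update(static)  # static columns: one copy per partition
        if ck is not None:
            # Inserts and updates are both upserts of cells in one row.
            part["rows"].setdefault(ck, {}).update(columns)

    def read_partition(self, pk):
        """Return (static columns, rows ordered by the timeuid's timestamp) or None."""
        part = self.partitions.get(pk)
        if part is None:
            return None
        rows = sorted(part["rows"].items(), key=lambda item: item[0].time)
        return part["static"], [dict(comment_id=ck, **cells) for ck, cells in rows]

t = Table()
john = uuid.uuid1()  # stands in for the user's timeuid partition key
t.upsert(john, static={"name": "John", "surname": "Brown"})
for text in ["Hello!", "Hi!", "Good!"]:
    t.upsert(john, ck=uuid.uuid1(), comment=text)

static, comments = t.read_partition(john)
print(static["name"], [c["comment"] for c in comments])
```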
Case 1: Read user data with comments
MySQL
select * from Users u, Comments c where u. = c. and = 3
- find user: O(log(U))
- find comments: C*O(log(C))
Cassandra
select * from UsersAndComments where = '1339d222-6e6a-...'
- find partition: O(1) in 90% of cases, O(log(X)) in the worst case
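Locating the partition starts with hashing the partition key to a token and finding the node that owns it; no search over users is involved. A minimal sketch, assuming one token per node, no virtual nodes, no replication and an MD5 token (the hash used by Cassandra's older RandomPartitioner; the default since Cassandra 1.2 is Murmur3Partitioner). Ring, owner and the node names are hypothetical. This covers only which node to ask, not the lookup inside that node.

```python
import hashlib
from bisect import bisect_left

def token(key: str) -> int:
    """Map a partition key to a position on the ring (MD5 here, an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

class Ring:
    def __init__(self, nodes):
        # One token per node; a node owns the keys whose token is <= its own
        # and > the previous node's token.
        self.tokens = sorted((token(name), name) for name in nodes)

    def owner(self, partition_key: str) -> str:
        """First node clockwise from the key's token, wrapping past the end."""
        i = bisect_left(self.tokens, (token(partition_key), ""))
        return self.tokens[i % len(self.tokens)][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("b027d040-4e69-11e5-8b53-0002a5d5c51b"))
```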
Case 2: Find users who posted today
MySQL
select * from Users u, Comments c where u. = c. and c.created = '2015-11-19'
- find comments: c*O(log(c))
- find users: u*O(log(u))
Cassandra
- Real men don't need joins!
- UsersAndComments is partitioned by user, so without a join or an index on the date every partition would have to be read.
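Why the existing table does not help: a minimal sketch with hypothetical user ids and function names, where each comment carries a readable date instead of a timeuid. Because the data is partitioned by user, the only way to answer "who posted on this day" is to read every partition.

```python
# partition key (hypothetical user id) -> partition contents
users_and_comments = {
    "john-id": {"name": "John", "comments": [("2015-11-19", "Hello!"), ("2015-11-18", "Hi!")]},
    "bruce-id": {"name": "Bruce", "comments": [("2015-11-17", "Hello!")]},
}

def users_who_posted(table, day):
    """No index on the day: every partition is read (a full-table scan)."""
    visited, found = 0, []
    for part in table.values():
        visited += 1
        if any(d == day for d, _ in part["comments"]):
            found.append(part["name"])
    return found, visited

print(users_who_posted(users_and_comments, "2015-11-19"))  # (['John'], 2)
```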
Cassandra: add a new table
UserCommentsByDay:
day (PK)
comment_id (C): timeuid
comment
user_name
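With the new table, each comment is written twice, once per query pattern, and the "who posted today" query reads a single partition. A sketch with plain dicts standing in for the two tables; post_comment, users_who_posted and the ids are hypothetical, and the timeuid clustering column is omitted. In real Cassandra the two writes could be grouped in a logged BATCH so that both are eventually applied.

```python
from collections import defaultdict

users_and_comments = defaultdict(list)    # user id -> [(day, comment)]      (partition per user)
user_comments_by_day = defaultdict(list)  # day -> [(comment, user_name)]    (partition per day)

def post_comment(user_id, user_name, day, comment):
    """Denormalized write: the same comment is stored once per query pattern."""
    users_and_comments[user_id].append((day, comment))
    user_comments_by_day[day].append((comment, user_name))

def users_who_posted(day):
    """Reads exactly one partition of the per-day table; no join, no scan."""
    return sorted({name for _, name in user_comments_by_day.get(day, [])})

post_comment("john-id", "John", "2015-11-19", "Hello!")
post_comment("bruce-id", "Bruce", "2015-11-19", "Hi!")
post_comment("john-id", "John", "2015-11-18", "Good!")
print(users_who_posted("2015-11-19"))  # ['Bruce', 'John']
```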
Case 2: Find users who posted today
MySQL
+1: no code changes required
+1: no data duplicates
Cassandra
+1: query is fast
- Data is duplicated: disk is cheap! Who cares?
Case 3: Add 1 column
MySQL
alter table Users add column gender bit(1);
- Long execution
- Requires extra memory
Cassandra
alter table UsersAndComments add gender boolean;
- Works immediately!
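A simplified model of why the two ALTER statements behave differently, assuming MySQL rebuilds the table for ADD COLUMN (as in the 2015-era versions the slide describes; newer MySQL 8.0 releases can add some columns instantly) while Cassandra only updates table metadata. Both classes are hypothetical.

```python
class RebuildTable:
    """MySQL-like (simplified): fixed row layout, so ADD COLUMN rewrites every row."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def add_column(self, name, default=None):
        self.columns.append(name)
        # Builds a complete new copy of the rows: O(rows) time and extra memory.
        self.rows = [row + (default,) for row in self.rows]
        return len(self.rows)  # number of rows rewritten

class SparseTable:
    """Cassandra-like (simplified): rows store only the cells they have."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [dict(zip(columns, r)) for r in rows]

    def add_column(self, name):
        self.columns.append(name)  # metadata-only change
        return 0  # no rows rewritten

    def get(self, i, column):
        if column not in self.columns:
            raise KeyError(column)
        return self.rows[i].get(column)  # a cell that was never written reads as null

data = [("John", "Brown"), ("Bruce", "Lee")]
mysql_like = RebuildTable(["name", "surname"], data)
cassandra_like = SparseTable(["name", "surname"], data)
print(mysql_like.add_column("gender"), cassandra_like.add_column("gender"))  # 2 0
print(cassandra_like.get(0, "gender"))  # None
```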
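Cassandra: SSTables
[Figure: inserts, updates and deletes go to an in-memory Memtable, which is flushed to immutable SSTables on disk (sstable 1, sstable 2, ..., sstable 50, sstable 51), each with its own Bloom filter (00110...). A read checks the Memtable and only those SSTables whose Bloom filter may contain the key. A compaction manager merges SSTables in the background.]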
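The read and write paths in the figure can be sketched as a tiny log-structured store. Everything here is a simplification with hypothetical names: the memtable flushes after a fixed number of keys, the commit log is omitted, deletes are not modelled, and the Bloom filter uses salted SHA-256 hashes instead of Cassandra's own hashing.

```python
import hashlib

class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size_bits=256, hashes=3):
        self.size, self.hashes, self.bits = size_bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))

class LSMStore:
    def __init__(self, memtable_limit=2):
        self.memtable, self.sstables, self.limit = {}, [], memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # a write only touches memory (commit log omitted)
        if len(self.memtable) >= self.limit:
            self._write_sstable(self.memtable)
            self.memtable = {}

    def _write_sstable(self, data, replace_all=False):
        bf = BloomFilter()
        for k in data:
            bf.add(k)
        table = (bf, dict(sorted(data.items())))  # immutable, sorted by key
        self.sstables = [table] if replace_all else self.sstables + [table]

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bf, table in reversed(self.sstables):  # newest SSTable first
            if bf.might_contain(key) and key in table:
                return table[key]
        return None

    def compact(self):
        merged = {}
        for _, table in self.sstables:  # oldest first, so newer values win
            merged.update(table)
        self._write_sstable(merged, replace_all=True)

store = LSMStore()
store.put("john", "v1")
store.put("bruce", "v1")  # memtable full: flushed as SSTable 1
store.put("john", "v2")
store.put("ann", "v1")    # flushed as SSTable 2
print(len(store.sstables), store.get("john"))  # 2 v2
store.compact()
print(len(store.sstables), store.get("john"), store.get("bruce"))  # 1 v2 v1
```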
Case 4: Write performance
MySQL
- Each insert/update/delete requires a search first
- Batch writes: ordered inserts are fast, random inserts are slow
- Tries to make changes in place (+1)
Cassandra
- Writes are fast (+1)
- Insert/update/delete do not require a search
- Every insert/update/delete is an append: a sequential write to the commit log plus an update of the in-memory Memtable
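How updates and deletes can be pure appends: every write gets a timestamp, a delete writes a tombstone, and a read keeps the version with the highest timestamp (last write wins, as Cassandra does per cell). A minimal sketch; the class, the counter used as a clock and the keys are hypothetical, and a real read would merge a few SSTables instead of scanning one list.

```python
import itertools

TOMBSTONE = object()  # marker appended by a delete

class AppendOnlyLog:
    """Every insert, update and delete is appended with a timestamp; nothing changes in place."""

    def __init__(self):
        self.entries = []                 # (timestamp, key, value)
        self._clock = itertools.count(1)  # stand-in for write timestamps

    def write(self, key, value):
        # No search before writing: just append.
        self.entries.append((next(self._clock), key, value))

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def read(self, key):
        """Reconcile all versions of a key: the highest timestamp wins."""
        versions = [(ts, v) for ts, k, v in self.entries if k == key]
        if not versions:
            return None
        _, value = max(versions, key=lambda tv: tv[0])
        return None if value is TOMBSTONE else value

log = AppendOnlyLog()
log.write("user:1", "John")
log.write("user:1", "Johnny")  # update = another append
log.delete("user:1")           # delete = append a tombstone
log.write("user:2", "Bruce")
print(log.read("user:1"), log.read("user:2"), len(log.entries))  # None Bruce 4
```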
Case 5: Transactions
MySQL (InnoDB)
- ACID compliant: atomic, consistent, isolated, durable
Cassandra
- Atomic (row level)
- Isolated (row level)
- Durable
- Eventually consistent
- Lightweight transactions
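Lightweight transactions give compare-and-set semantics, as in CQL's INSERT ... IF NOT EXISTS and UPDATE ... IF. The sketch below models only those semantics in a single process with a lock; Cassandra itself runs a Paxos round between replicas. LwtTable and its methods are hypothetical; the returned boolean plays the role of CQL's [applied] column.

```python
import threading

class LwtTable:
    """Single-process model of lightweight-transaction (compare-and-set) semantics."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, row):
        with self._lock:
            if key in self._rows:
                return False, self._rows[key]  # not applied; existing row returned
            self._rows[key] = dict(row)
            return True, None

    def update_if(self, key, column, expected, new):
        with self._lock:
            row = self._rows.get(key)
            if row is None or row.get(column) != expected:
                return False, row  # condition failed; current row returned
            row[column] = new
            return True, None

table = LwtTable()
print(table.insert_if_not_exists("jbrown", {"name": "John"})[0])  # True
print(table.insert_if_not_exists("jbrown", {"name": "Jim"})[0])   # False
print(table.update_if("jbrown", "name", "John", "Johnny")[0])     # True
print(table.update_if("jbrown", "name", "John", "Jack")[0])       # False
```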
Case 6: Horizontal scalability
MySQL
- Requires additional libraries to scale horizontally (e.g., for sharding)
- After sharding: no joins across shards
- After sharding: no auto-incremented keys
Cassandra
- Designed for horizontal scalability from the start
- Scales linearly
- Many client libraries support horizontal scaling
- Encourages the right design patterns for horizontal scaling
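What "additional libraries for sharding" typically do can be sketched as hash-modulo routing in the application. Keys are random UUIDs because auto-incremented ids would collide across shards. The function names and shard counts are hypothetical. The last line shows a drawback of plain modulo routing: growing from 4 to 5 shards moves about 80% of the keys, whereas a consistent-hashing ring such as Cassandra's moves only the share taken over by the new node.

```python
import hashlib
import uuid

def shard_for(key: str, shards: int) -> int:
    """Application-level routing: hash the key and take it modulo the shard count."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % shards

# Auto-increment ids would collide across shards, so keys are generated without coordination.
keys = [str(uuid.uuid4()) for _ in range(10_000)]

def moved_fraction(keys, old, new):
    """Share of keys that land on a different shard when the count changes."""
    return sum(shard_for(k, old) != shard_for(k, new) for k in keys) / len(keys)

print(round(moved_fraction(keys, 4, 5), 2))  # roughly 0.8
```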
Case 7: Fault tolerance
MySQL
- Supports master-slave replication
- Many complaints about master-master replication
- Many systems rely on manual failover
Cassandra
- Configurable replication factor
- Multiple data centers
- Rack aware
- Consistency levels
- Hinted handoff
- Read repair
- Manual repair
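Consistency levels come down to simple arithmetic. Assuming the usual definitions (QUORUM is a majority of the replicas, RF // 2 + 1), a read is guaranteed to overlap the latest acknowledged write when R + W > RF. The function names are hypothetical; the sketch only does the arithmetic.

```python
def quorum(rf: int) -> int:
    """QUORUM: a majority of the rf replicas."""
    return rf // 2 + 1

def strongly_consistent(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """Read and write replica sets must overlap: R + W > RF."""
    return read_replicas + write_replicas > rf

def tolerated_failures(rf: int, replicas_needed: int) -> int:
    """Replicas that may be down while a request at this level still succeeds."""
    return rf - replicas_needed

rf = 3
q = quorum(rf)
print(q, strongly_consistent(rf, q, q), tolerated_failures(rf, q))  # 2 True 1
print(strongly_consistent(rf, 1, 1))  # False: ONE/ONE is only eventually consistent
```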
Criterion | MySQL | Cassandra
Read performance | +1 | +1
Write performance | | +1
Multiple row queries | +1 |
Joins | +1 |
Transactions | +1 |
Schema changes | | +1
Scalability | | +1
Multiple data centers | | +1
Fault tolerance | | +1
Q&A
Maxim Zakharenkov: zakharenkov.maxim@gmail.com