Cassandra vs MySQL. SQL vs NoSQL database comparison




Cassandra vs MySQL. SQL vs NoSQL database comparison. 19th of November, 2015. Maxim Zakharenkov

Maxim Zakharenkov Riga, Latvia Java Developer/Architect Company

Goals
- Explore some differences between SQL and NoSQL
- Compare Cassandra and MySQL
- Take a look at what is under the hood
- Figure out which database to use

MySQL
- No. 2 SQL database in the world
- Open source (GPL)
- First release: 1995

Cassandra
- No. 2 NoSQL database in the world
- Open source (Apache)
- First release: 2008

See: http://db-engines.com/en/ranking

Sample model
Users: name, surname
Comments: comment_id, text, created

MySQL: schema
Users: (PK, AI), name, surname
Comments: comment_id (PK, AI), text, created (IDX), (IDX)

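MySQL: storing data (InnoDB)
[Figure: B-tree over the primary key. The root covers keys 1-1200 and branches into narrower ranges (1-500, 501-1000, 1001-1200), which branch again (1-100, 101-200, ..., 900-950, 951-1000). The leaves hold the row data.]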
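
The descent through the key ranges in the figure can be sketched in a few lines of Python. This is only an illustration of the idea, not InnoDB's actual page format: the two-level layout, the names ROOT, CHILDREN and find_leaf, and most of the leaf boundaries are assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical two-level index. Each number is the upper bound of a key range.
# The root bounds follow the figure; the inner bounds are mostly made up.
ROOT = [500, 1000, 1200]                  # child i covers keys <= ROOT[i]
CHILDREN = [
    [100, 200, 300, 400, 500],            # leaves under the 1-500 branch
    [600, 700, 800, 900, 950, 1000],      # leaves under the 501-1000 branch
    [1200],                               # single leaf under 1001-1200
]

def find_leaf(key):
    """Return (child_index, leaf_index) of the leaf whose range holds key."""
    if not 1 <= key <= ROOT[-1]:
        raise KeyError(key)
    child = bisect_left(ROOT, key)             # binary search in the root
    leaf = bisect_left(CHILDREN[child], key)   # binary search in the branch
    return child, leaf

location = find_leaf(925)   # lands in the branch 501-1000, leaf ending at 950
```

Each level is one binary search, which is where the O(log(N)) cost quoted on the later slides comes from.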

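MySQL: storing index
Comments: comment_id (PK), text, created (IDX)
[Figure: B-tree over the indexed column. The root covers 1-1000 and branches into 1-500 and 501-1000, then into ranges such as 1-100, 101-200 and 900-950. Each leaf entry holds a comment_id rather than the row data.]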

Cassandra: schema
UsersAndComments
- (PK): timeuuid
- name (static)
- surname (static)
- comment_id (C): timeuuid
- comment

Cassandra: storage
UsersAndComments: partitions

Partition b027d040-4e69-11e5-8b53-0002a5d5c51b
- name: John
- surname: Brown
- comment_id: a5bdb8e0-53d2-11e5-a445-3f2e96f4bdc5, comment: Hello!
- comment_id: 17baef80-53d3-11e5-a445-3f2e96f4bdc5, comment: Hi!
- comment_id: efb95901-53d1-11e5-a445-3f2e96f4bdc5, comment: Good!

Partition efb95900-53d1-11e5-a445-3f2e96f4bdc5
- name: Bruce
- surname: Lee
- comment_id: f6359621-53d3-11e5-a445-3f2e96f4bdc5, comment: Hello!
- comment_id: efb95901-53d1-11e5-a445-3f2e96f4bdc5, comment: Hi!
- ...
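
A rough Python model of one such partition may help: the static columns are stored once per partition, and the comments are rows inside it, ordered by the clustering column. The dictionary layout and the function read_partition are assumptions for illustration, and the pairing of comment ids with comment texts simply follows the order on the slide. Sorting uses the timestamp embedded in each time-based UUID, which is how timeuuid values are expected to be ordered.

```python
import uuid

def read_partition(partitions, key):
    """One lookup returns the static user columns and all comments, time-ordered."""
    partition = partitions[key]
    rows = sorted(partition["rows"].items(),
                  key=lambda item: uuid.UUID(item[0]).time)  # embedded timestamp
    return partition["static"], [comment for _, comment in rows]

# Sample data from the slide; the nesting itself is a hypothetical layout.
partitions = {
    "b027d040-4e69-11e5-8b53-0002a5d5c51b": {
        "static": {"name": "John", "surname": "Brown"},
        "rows": {
            "a5bdb8e0-53d2-11e5-a445-3f2e96f4bdc5": "Hello!",
            "17baef80-53d3-11e5-a445-3f2e96f4bdc5": "Hi!",
            "efb95901-53d1-11e5-a445-3f2e96f4bdc5": "Good!",
        },
    },
}

user, comments = read_partition(partitions, "b027d040-4e69-11e5-8b53-0002a5d5c51b")
```

The point of the layout is that the user and all of that user's comments come back from a single partition lookup, with no join.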

Case 1: read user data with comments

MySQL
select * from Users u, Comments c where u. = c. and = 3
- find user: O(log(U))
- find comments: C*O(log(C))

Cassandra
select * from UsersAndComments where = '1339d222-6e6a-...'
- find partition: O(1) in 90% of cases, O(log(X)) worst case
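
The relational side of Case 1 can be tried out with Python's built-in sqlite3 module standing in for MySQL. This is a sketch under assumptions: the key column name user_id, the index names and the sample rows are hypothetical, since the transcript does not show the actual column names.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # SQLite used here only as a stand-in for MySQL
db.executescript("""
    CREATE TABLE Users (
        user_id INTEGER PRIMARY KEY AUTOINCREMENT,   -- hypothetical column name
        name TEXT, surname TEXT);
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER, text TEXT, created TEXT);
    CREATE INDEX idx_comments_user ON Comments (user_id);
    CREATE INDEX idx_comments_created ON Comments (created);
""")
db.executemany("INSERT INTO Users (user_id, name, surname) VALUES (?, ?, ?)",
               [(3, "John", "Brown"), (4, "Bruce", "Lee")])
db.executemany("INSERT INTO Comments (user_id, text, created) VALUES (?, ?, ?)",
               [(3, "Hello!", "2015-11-19"),
                (3, "Hi!", "2015-11-18"),
                (4, "Good!", "2015-11-19")])

# One index lookup finds the user, then one index lookup per comment.
rows = db.execute(
    "SELECT u.name, c.text FROM Users u, Comments c "
    "WHERE u.user_id = c.user_id AND u.user_id = ? ORDER BY c.comment_id",
    (3,),
).fetchall()
```

The join touches two B-trees (the Users primary key and the Comments index), which matches the two cost terms listed for MySQL.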

Case 2: Find users who posted today

MySQL
select * from Users u, Comments c where u. = c. and c.created = '2015-11-19'
- find comments: C*O(log(C))
- find users: U*O(log(U))

Cassandra
- Real men don't need joins!

Cassandra: add new table
UserCommentsByDay
- day (PK)
- comment_id (C): timeuuid
- comment
- user_name
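
The query-first approach behind this extra table can be sketched in Python: every new comment is written twice, once per table, so that each query reads exactly one partition. The in-memory dictionaries and the function names add_comment and users_who_posted are hypothetical and only illustrate the pattern.

```python
from collections import defaultdict

# Two "tables", each keyed by its partition key (hypothetical in-memory model).
users_and_comments = defaultdict(list)     # user key -> [(comment_id, text)]
user_comments_by_day = defaultdict(list)   # day -> [(comment_id, text, user_name)]

def add_comment(user_key, user_name, comment_id, text, day):
    """The application writes the same comment to both tables."""
    users_and_comments[user_key].append((comment_id, text))
    user_comments_by_day[day].append((comment_id, text, user_name))

def users_who_posted(day):
    """A single partition read answers the question, with no join."""
    return sorted({name for _, _, name in user_comments_by_day.get(day, [])})

add_comment("u1", "John Brown", "c1", "Hello!", "2015-11-19")
add_comment("u2", "Bruce Lee", "c2", "Hi!", "2015-11-19")
add_comment("u1", "John Brown", "c3", "Good!", "2015-11-18")
posted_today = users_who_posted("2015-11-19")
```

The cost is duplicated data and an extra write in the application code, which is the trade-off the next slide scores.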

Case 2: Find users who posted today

MySQL
- +1: no code changes required
- +1: no data duplicates

Cassandra
- +1: query is fast
- Disk is cheap! Who cares?

Case 3: add 1 column

MySQL
alter table Users add column gender bit(1);
- Long execution
- Requires extra memory

Cassandra
alter table UsersAndComments add gender boolean;
- Works immediately!

Cassandra: SS-tables
[Figure: Insert/Update/Delete operations go to the Memtable. Reads consult the Memtable and the SSTables (sstable 1, sstable 2, ..., sstable 50, sstable 51), each of which has its own Bloom filter (00110...). A compaction manager works across the SSTables.]
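
The role of the per-SSTable Bloom filter can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: the class name, the bit-array size, the number of hash functions and the use of SHA-256 as the hash are all assumptions chosen to keep the example short.

```python
import hashlib

class BloomFilter:
    """Answers "definitely not here" or "maybe here" for a key."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0                      # an int used as a bit array

    def _positions(self, key):
        # Derive several bit positions per key (SHA-256 is a stand-in hash).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means the SSTable can be skipped without touching the disk.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

sstable_filter = BloomFilter()
sstable_filter.add("b027d040-4e69-11e5-8b53-0002a5d5c51b")
```

A read checks each filter first and opens only the SSTables whose filter answers "maybe". False positives are possible, false negatives are not.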

Case 4: Write performance

MySQL
- Each insert/update/delete requires a search
- Batch write:
  - Ordered inserts are fast
  - Random inserts are slow
- Tries to make changes in place (+1)

Cassandra
- Write is fast (+1)
- Insert/Update/Delete do not require a search
- Every Insert/Update/Delete is an append write to the disk
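
The append-only write path can be modelled with a toy log-structured store in Python. This is a simplified sketch, not Cassandra's code: the class TinyLSM, its flush threshold and the TOMBSTONE marker are hypothetical, and the commit log, timestamps and on-disk format are left out.

```python
TOMBSTONE = object()   # marker written instead of deleting data in place

class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}        # recent writes, held in memory
        self.sstables = []        # immutable flushed tables, oldest first
        self.limit = memtable_limit

    def write(self, key, value):
        # No search for an existing row: just record the newest value.
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def delete(self, key):
        self.write(key, TOMBSTONE)   # a delete is also just a write

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key):
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all SSTables into one, keeping only the newest live values.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]

store = TinyLSM(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)      # second write triggers a flush
store.write("a", 10)     # overwrite without reading the old value
store.delete("b")        # tombstone; triggers a second flush
```

Writes stay cheap because the work of reconciling old and new values is deferred to reads and to compaction.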

Case 5: Transactions

MySQL
- ACID compliant: Atomic, Consistent, Isolated, Durable

Cassandra
- Atomic (row level)
- Isolated (row level)
- Durable
- Eventually consistent
- Lightweight transactions

Case 6: Horizontal scalability

MySQL
- Requires additional libraries to support horizontal scaling (e.g. to do sharding)
- No joins any more
- No auto-incremented keys

Cassandra
- Designed for horizontal scaling from the start
- Linearly scalable
- Has many client libraries that support horizontal scaling
- Encourages the right design patterns for horizontal scaling
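
One reason a cluster can grow without reshuffling everything is consistent hashing: nodes and keys are placed on the same token ring, and each key belongs to the next node on the ring. The Python sketch below shows the idea with one token per node; the class Ring, the node names and the SHA-256-based token function are assumptions, not Cassandra's actual partitioner.

```python
import hashlib
from bisect import bisect_right

def token(value):
    """Map a string to a position on the ring (SHA-256 as a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((token(node), node) for node in nodes)

    def owner(self, key):
        # The first node clockwise from the key's token owns the key.
        positions = [t for t, _ in self.tokens]
        index = bisect_right(positions, token(key)) % len(self.tokens)
        return self.tokens[index][1]

keys = [f"user-{i}" for i in range(200)]
before = Ring(["node1", "node2", "node3"])
after = Ring(["node1", "node2", "node3", "node4"])

# Keys whose owner changed when node4 joined the ring.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
```

Only the keys that now fall to node4 change owner. All other keys stay where they were, which is what makes adding a node cheap.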

Case 7: Fault tolerance

MySQL
- Supports Master-Slave
- A lot of complaints about Master-Master
- Many systems use manual fault tolerance

Cassandra
- Configurable replication factor
- Multiple data centers
- Rack aware
- Consistency levels
- Hinted handoff
- Read repair
- Manual repair
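
How replication factor and consistency levels interact comes down to simple arithmetic: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The helper names in this Python sketch are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
q = quorum(rf)                                             # 2 of 3 replicas
quorum_both = read_sees_latest_write(rf, q, q)             # QUORUM write + QUORUM read
one_both = read_sees_latest_write(rf, 1, 1)                # ONE write + ONE read
all_write_one_read = read_sees_latest_write(rf, rf, 1)     # ALL write + ONE read
```

With a replication factor of 3, quorum reads and writes overlap and still tolerate one replica being down, while ONE/ONE trades that guarantee for lower latency.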

- Read performance: +1 +1
- Write performance: +1
- Multiple row queries: +1
- Joins: +1
- Transactions: +1
- Schema changes: +1
- Scalability: +1
- Multiple data centers: +1
- Fault tolerance: +1

Q&A
Maxim Zakharenkov: zakharenkov.maxim@gmail.com