How MySQL and NoSQL Coexist. Matt Yonkovit - Percona

Who Am I
- Matt Yonkovit, Principal Consultant, Percona
- MySQL -> Sun veteran
- 15+ years of database experience
- Lots of fun at parties

What We do

Where MySQL and the Ecosystem is today:

Lots of Data

MySQL focus: scale up
- Look at the advances made within InnoDB:
  - Scalability to 32+ cores
  - SSD enhancements
  - Dealing with mega-sized buffer pools
- This fits with hardware:
  - Multi-core
  - Lots of memory as standard
  - Cheap flash

We Need More Servers!

Industry Says Scale Out
- Cloud computing
  - Rise of EC2
  - Scale on demand
  - Pay for what you need
- Big companies run thousands of servers
- Big Data retention drives more scale out
  - Retention and analysis outpace hardware growth
- This is part of the cyclical nature of computing. I remember hearing this from Oracle when RAC was first introduced; before that it was time sharing.

Developers want easy and fast
- SQL is powerful
- But SQL can be overly complex for many operations
- Sometimes optimizers in RDBMSs do stupid things and developers can do it better
- SQL is yet another component a developer needs to learn
- Even for simple SQL operations there is overhead to parse and execute

ORM Sucks
- Because developers want to focus on being super developer ninjas, they often turn to ORMs
- ORMs (like Active Record) work great for simple applications, but tend to bork when you have complex mappings

Fast Moving Changes
- What data is stored, and how it is stored, is changing at a breakneck pace
- Changes to large databases are hard
- Example: in a presentation, Craigslist said altering a table in their archive DB took 1 month
- Other alters can still take days or weeks

Missed opportunities
- While the MySQL community as a whole has done an awesome job, we did miss a few things:
  - Online table alters (add/modify column)
  - Scale out vs. scale up
  - Flexible data types

How other groups tackled the problems:

The Rise of NoSQL
- RDBMSs of old did not keep up with the demand
- Need for fast, efficient data access
- Eliminate the pain points
- Eliminate the unneeded fluff
- Many websites got along well without things like functions, stored procs, full ACID compliance, etc.

NoSQL Covers a lot
- Key/value, e.g. memcached, Redis
- Column stores, e.g. HBase
- Document stores, e.g. Mongo, Couch

More developer centric interface
- Instead of relying on a SQL interface, many NoSQL solutions allow developers to stay in code and directly access objects using their programming language of choice, e.g. MongoDB's JavaScript interface
- Allow data to be retrieved in an easily consumable format, e.g. JSON, binary object, etc.
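To make the "stay in code" point concrete, here is a minimal Python sketch of query-by-example over plain in-memory objects. The `users` collection, its field names, and the `find` helper are hypothetical stand-ins, not the API of MongoDB or any other product; the sketch only illustrates records that are native objects on the way in and JSON on the way out.

```python
import json

# A hypothetical in-memory "collection": each record is a plain dict,
# so the application never leaves its own language to read or filter data.
users = [
    {"_id": 1, "name": "alice", "tags": ["admin", "dev"]},
    {"_id": 2, "name": "bob", "tags": ["dev"]},
]

def find(collection, query):
    """Return documents whose fields equal every key/value pair in `query`."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

matches = find(users, {"name": "bob"})

# The result is already an easily consumable structure; JSON is one call away.
payload = json.dumps(matches[0], sort_keys=True)
print(payload)
```

There is no query string to assemble or parse here, which is the convenience the slide describes; whether a given store offers indexes or richer operators behind such an interface is a separate question.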

Scale out
- Remove the complexities and automate sharding
- Make full use of multiple servers for complex tasks, e.g. map reduce
- Ability to add servers on demand
- Support for fail-over and replication
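As a rough illustration of what "automated sharding" has to decide for every key, the sketch below routes keys to servers by hashing. The server names and the simple modulo scheme are assumptions chosen for brevity; real systems typically use range splitting or consistent hashing so that adding a server does not move most of the keys.

```python
import hashlib

def shard_for(key, shards):
    """Pick a shard by hashing the key; the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

shards = ["db0", "db1", "db2"]  # hypothetical server names

# Group a few example keys by the shard they would be stored on.
placement = {}
for user_id in ("user:1", "user:2", "user:3", "user:4"):
    placement.setdefault(shard_for(user_id, shards), []).append(user_id)

print(placement)
```

When this logic lives in the application, every client has to agree on the shard list and the hash function. Moving it into the data store is the complexity the slide talks about removing.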

Easy to change
- Change is a certainty in life... users will demand it!
- Make sure that you are not bound to a rigid structure
- Allow for changes on the fly: flexible schema
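A small sketch of what a flexible schema looks like from the application side, assuming a document-style store where records are plain dicts. The `listings` data and the `photos` field are made up for illustration.

```python
# Two "generations" of the same record type living side by side.
# Field names here are hypothetical.
listings = [
    {"id": 1, "title": "Bike"},                                # written before the change
    {"id": 2, "title": "Sofa", "photos": ["a.jpg", "b.jpg"]},  # written after the change
]

def photo_count(doc):
    # The application, not the database, supplies the default for old records.
    return len(doc.get("photos", []))

counts = [photo_count(d) for d in listings]
print(counts)
```

No table alter is needed to introduce `photos`, but the cost does not vanish: every reader now has to cope with both shapes of record.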

Custom Features
- Add support for features missing in MySQL or that solve a specific need:
  - GridFS (Mongo)
  - Super Columns (Cassandra)
  - Lists/Sets (Redis)

Where are we now? SQL -vs- NOSQL?

Too Much FUD
- A lot of developers like to hate on MySQL
- A NoSQL solution (like Mongo) does not mean:
  - You do not have to think about your data types and schema design
  - Adding nodes will solve all your issues
  - You can get sloppy with code
- It will not fully replace a relational DB
- Bad code is bad code

As complex as you make it
- You still have to think with NoSQL:
  - How will this data be used?
  - Will we have to correlate this data with other data in the system?
  - Will duplication of data cause issues? Bloat space?

Too Much FUD
- A lot of people in MySQL hate on NoSQL
- Not all NoSQL solutions are made the same
- Some of them are durable, or are adding durability features (e.g. Mongo in 1.8+)
- While in most NoSQL solutions more memory = better performance, the same holds true for MySQL
- Not every solution is eventually consistent
- Remember MySQL started as that other technology, mocked by classic DBAs
- Many of these solutions are at a similar place to where MySQL was 6-7 years ago

MySQL is not Sitting Idle

Lessons from NDB
- In many ways MySQL has had a NoSQL solution available for years
- NDB (MySQL Cluster) is at its core a NoSQL solution with a SQL wrapper
- Many deployments of NDB need to bypass the SQL layer and write directly to the NDB API in order to achieve optimal performance

Parsing SQL Can be slow
libmysql (Akira's numbers):

samples   %        symbol name
748022    7.7355   MYSQLParse(void*)
219702    2.2720   my_pthread_fastmutex_lock
205606    2.1262   make_join_statistics( )
198234    2.0500   btr_search_guess_on_hash

Sources:
- http://www.mysqlperformanceblog.com/2011/03/16/where-does-handlersocket-really-save-youtime/
- http://www.slideshare.net/akirahiguchi/handlersocket-20100629en-5698215

Handler Socket
- Developed by DeNA
- Direct access to the InnoDB storage engine, bypassing SQL
- Key-value type access
- Yoshinori hit 750K QPS, faster than memcached and 7X faster than stock SQL
- http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-storyfor.html
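To show why there is so little to parse on this path, here is a sketch of the request lines a HandlerSocket client sends. It assumes the layout in the plugin's published protocol notes as I understand them (tab-separated tokens, one request per line) and should be checked against the plugin documentation. The database, table, and column names are hypothetical, escaping of special bytes is omitted, and nothing is sent over the network.

```python
def hs_open_index(index_id, db, table, index, columns):
    """Build the request line that opens an index for later lookups."""
    fields = ["P", str(index_id), db, table, index, ",".join(columns)]
    return ("\t".join(fields) + "\n").encode("utf-8")

def hs_find(index_id, op, keys, limit=1, offset=0):
    """Build a lookup request: comparison operator, key values, then limit/offset."""
    fields = [str(index_id), op, str(len(keys))] + list(keys) + [str(limit), str(offset)]
    return ("\t".join(fields) + "\n").encode("utf-8")

# Hypothetical schema: look up the row with primary key 101 in test.user.
open_req = hs_open_index(0, "test", "user", "PRIMARY", ["user_name", "user_email"])
find_req = hs_find(0, "=", ["101"])

print(open_req)
print(find_req)
```

Compared with a `SELECT`, each request is a handful of tokens that map almost directly onto an index lookup, which is consistent with the parser cost shown in the profile above disappearing.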

Handler Socket Performance Benchmark by Vadim: http://www.mysqlperformanceblog.com/2010/11/02/handlersocket-on-ssd/

Memcached
- Oracle recently got in on the fun, introducing a memcached interface for InnoDB
- Key-value access
- Uses libmemcached (known and widely used protocol)
- Promises the ability to use the distributed hash capabilities of memcached to shard (not available yet)
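For readers who have not looked at the memcached protocol, the sketch below builds the two text-protocol commands a client would send for key-value access. It shows only the standard memcached wire format; how the InnoDB interface maps keys and values onto table rows is not covered here, and the key name is made up.

```python
def build_set(key, value, flags=0, exptime=0):
    """Encode a memcached text-protocol 'set' command for a string value."""
    data = value.encode("utf-8")
    header = "set {} {} {} {}\r\n".format(key, flags, exptime, len(data))
    return header.encode("ascii") + data + b"\r\n"

def build_get(key):
    """Encode a memcached text-protocol 'get' command."""
    return "get {}\r\n".format(key).encode("ascii")

set_cmd = build_set("user:1", "alice")
get_cmd = build_get("user:1")

print(set_cmd)
print(get_cmd)
```

Because the format is this simple and already spoken by existing client libraries, an application can switch between a cache and a persistent store without changing its access code.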

Question: Should you use MySQL or look at a NoSQL solution?
Answer: It depends on your application.

Questions you should ask
- Do you need transactions?
- Can you risk data loss?
- What are your performance requirements?
- What level of risk is acceptable?
- Costs involved?
  - Hard costs: servers, infrastructure
  - Soft costs: developer time

General MySQL Considerations
- Durability important (is some data loss acceptable?)
- 3rd-party applications
- Integration with other RDBMSs
- Already invested in SQL?
- OLTP-type workloads?
  - Transactions
  - Joins

General NoSQL Considerations
- SQL is overkill
  - Simple key/values?
  - Document oriented
- No standard form or consistent data stream
- Require CPU-based resources from several machines
- Huge amounts of archived data
- Rapid changes to the structures
- Possible data inconsistencies
  - Eventual consistency (varies by product)

Performance?
- Performance for both MySQL and NoSQL solutions can vary wildly
- Impacted by:
  - Schema/object design
  - Data types
  - Indexes
  - ORM
  - Drivers
- In many cases you can trade performance for reliability/consistency

Space
- With huge datasets, space can be at a premium
- Some NoSQL options can take up a lot more space than their MySQL counterparts
- XML example (no redundant data):
  - 2.5GB in MongoDB
  - 486MB in InnoDB
- Some solutions have you duplicate data for performance and simplicity
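One plausible contributor to a gap like this is that schemaless records carry their field names with every record, while a table stores column names once. The sketch below compares the two layouts using JSON text as a crude size measure. The column names and data are invented, and the sketch does not model MongoDB's or InnoDB's actual on-disk formats, so it will not reproduce the figures above.

```python
import json

columns = ("id", "username", "age")  # hypothetical schema
rows = [(i, "user{}".format(i), i % 90) for i in range(1000)]

# Document style: every record repeats its own field names.
as_documents = sum(len(json.dumps(dict(zip(columns, r)))) for r in rows)

# Table style: field names are stored once, rows hold values only.
as_rows = len(json.dumps(columns)) + sum(len(json.dumps(r)) for r in rows)

ratio = as_documents / as_rows
print(as_documents, as_rows, round(ratio, 2))
```

Short field names, compression, and padding policies all shift the ratio, which is one more reason to measure with your own data rather than rely on someone else's number.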

Thou shalt Benchmark
- Trust, but verify

Verify
- I have run into a lot of people who leap before they test
  - Asking for trouble
- Don't be trapped by legacy problems
  - If you have bad SQL code, it's not MySQL's fault ;)
- Use specific benchmarks, not generic ones
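A specific benchmark can be as small as timing the operation your application actually performs. The harness below is a minimal sketch: `lookup_workload` is a placeholder that should be replaced by real calls against each candidate store, and taking the best of several runs is just one reasonable way to reduce noise.

```python
import time

def benchmark(operation, repeats=5):
    """Run `operation` several times and return the best wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return min(timings)

def lookup_workload():
    # Stand-in for the application's real access pattern (hypothetical):
    # build a keyed data set, then read every 7th key.
    table = {i: i * 2 for i in range(10000)}
    return sum(table[i] for i in range(0, 10000, 7))

best = benchmark(lookup_workload)
print("best of 5: {:.6f}s".format(best))
```

Run the same harness with the same data sizes and access mix against each option. A generic benchmark answers a question about someone else's workload.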

NoSQL Options

MongoDB
- Nice JSON-centric data storage
- Do not underestimate MapReduce capabilities
- Better durability in version 1.8+
- Built-in sharding
- Replication
- No native joins
- Trade performance for consistency
- Data footprint can be bigger than MySQL in some cases

Redis
Pros:
- Fast when fully in memory, but can use virtual memory
- Supports clustering
- Supports replication
- Super fast
- Complex data types like lists
Cons:
- Using virtual memory prevents using other features
- Data loss possible
- No native joins
- Key access only

Cassandra
Pros:
- Auto-sharding of data
- Replication
- Parallel processing
- Super Columns are interesting
Cons:
- Eventual consistency
- Documentation not as deep as other solutions
- Speed is a by-product of adding more nodes
- Network chatty
- Durability issues

Hadoop
Pros:
- Great with super large datasets (petabytes)
- Extreme parallelism
- Can do complex CPU-intensive tasks via map reduce
Cons:
- Can be complex
- Needs time to produce results
- No native joins/indexes
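For anyone who has not written a map reduce job, the sketch below runs the pattern in a single Python process. It is not Hadoop's or MongoDB's API; the log lines are invented, and the point is only the shape of the computation, which parallelizes because the map and reduce steps do not share state.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, 1) for every token in one input record."""
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(records):
    groups = defaultdict(list)
    for record in records:                  # map: could run on many machines
        for key, value in map_phase(record):
            groups[key].append(value)       # shuffle: group values by key
    return dict(reduce_phase(k, v) for k, v in groups.items())  # reduce

logs = ["GET /index", "GET /about", "POST /index"]  # hypothetical input
counts = map_reduce(logs)
print(counts)
```

A real framework spreads the map calls across nodes and moves the grouped values over the network, which is where both the parallelism and the latency come from.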

Tokyo Cabinet
Pros:
- Can be super fast
- Multiple table types to support different requirements
- Embeddable
- Replication
- Can use memcached protocol
Cons:
- No built-in sharding
- Durability/consistency issues
- Lack of documentation
- Can bog down if you run out of memory

Couch
Pros:
- Resistant to corruption
- MVCC based (versioning)
- JavaScript
- REST interface
- Replication
- Map reduce
- Easy to get started
Cons:
- No sharding
- Can be slower than other solutions
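The versioning point can be sketched as an optimistic update check: a write must name the revision it was based on, and a stale revision is rejected instead of silently overwriting newer data. The `VersionedStore` class and its integer revisions are hypothetical simplifications for illustration, not CouchDB's actual revision format or API.

```python
class ConflictError(Exception):
    """Raised when a write is based on an out-of-date revision."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=None):
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if current_rev != expected_rev:
            raise ConflictError("stale revision for {}".format(doc_id))
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev

store = VersionedStore()
rev1 = store.put("order:1", {"status": "new"})
rev2 = store.put("order:1", {"status": "paid"}, expected_rev=rev1)

# A second writer still holding rev1 is turned away rather than clobbering rev2.
try:
    store.put("order:1", {"status": "cancelled"}, expected_rev=rev1)
    conflict_detected = False
except ConflictError:
    conflict_detected = True

print(rev1, rev2, conflict_detected)
```

The losing writer has to re-read and retry, so conflict handling becomes application logic instead of lock waits inside the database.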

MySQL Tricks
- There are ways to solve the NoSQL issues in MySQL, but they are manual or add complexity:
  - Storage engines
  - Shard Query
  - XML data
  - Sharding
  - NDB/Cluster