MariaDB Cassandra interoperability



Similar documents
Apache Cassandra Query Language (CQL)

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

Cassandra vs MySQL. SQL vs NoSQL database comparison

New Features in MySQL 5.0, 5.1, and Beyond

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to am to 5 pm HOTEL DUBAI GRAND DUBAI

MySQL+HandlerSocket=NoSQL

Practical Cassandra. Vitalii

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

Comparing SQL and NOSQL databases

White Paper. Optimizing the Performance Of MySQL Cluster

How To Create A Table In Sql (Ahem)

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Data Modeling in the New World with Apache Cassandra TM. Jonathan Ellis CTO, DataStax Project chair, Apache Cassandra

Database Migration from MySQL to RDM Server

4 Logical Design : RDM Schema Definition with SQL / DDL

Simba Apache Cassandra ODBC Driver

B.1 Database Design and Definition

CSC 443 Data Base Management Systems. Basic SQL

MS ACCESS DATABASE DATA TYPES

Using SQL Server Management Studio

HareDB HBase Client Web Version USER MANUAL HAREDB TEAM

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone

SQL. Short introduction

Database Administration with MySQL

Introduction to SQL for Data Scientists

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Cassandra. Jonathan Ellis

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

news from Tom Bacon about Monday's lecture

Integrating Big Data into the Computing Curricula

Introduction to Cassandra

Information Systems SQL. Nikolaj Popov

Getting Started with User Profile Data Modeling. White Paper

High Availability Solutions for the MariaDB and MySQL Database

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

SQL Server An Overview

Linas Virbalas Continuent, Inc.

Big Data and Scripting Systems build on top of Hadoop

!"# $ %& '( ! %& $ ' &)* + ! * $, $ (, ( '! -,) (# *&23. mysql> select * from from clienti;

Synchronous multi-master clusters with MySQL: an introduction to Galera

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Easy-Cassandra User Guide

Apache Cassandra for Big Data Applications

Partitioning under the hood in MySQL 5.5

XEP-0043: Jabber Database Access

D61830GC30. MySQL for Developers. Summary. Introduction. Prerequisites. At Course completion After completing this course, students will be able to:

How To Scale Out Of A Nosql Database

Cloud Scale Distributed Data Storage. Jürmo Mehine

Introduction to Apache Cassandra

sqlite driver manual

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Python Driver 1.0 for Apache Cassandra

May 6, DataStax Cassandra South Bay Meetup. Cassandra Modeling. Best Practices and Examples. Jay Patel Architect, Platform

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

Introduction to Hbase Gkavresis Giorgos 1470

Going Native With Apache Cassandra. QCon London, 2014

MADOCA II Data Logging System Using NoSQL Database for SPring-8

CQL for Cassandra 2.2 & later

Real-time reporting at 10,000 inserts per second. Wesley Biggs CTO 25 October 2011 Percona Live

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

Information Technology NVEQ Level 2 Class X IT207-NQ2012-Database Development (Basic) Student s Handbook

Why Migrate from MySQL to Cassandra?

Services. Relational. Databases & JDBC. Today. Relational. Databases SQL JDBC. Next Time. Services. Relational. Databases & JDBC. Today.

Percona Server features for OpenStack and Trove Ops

CQL for Cassandra 2.0 & 2.1

FAQs. This material is built based on. Lambda Architecture. Scaling with a queue. 8/27/2015 Sangmi Pallickara

NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services

MySQL Cluster Deployment Best Practices

Spider Storage Engine: The sharding plugin of MySQL/MariaDB Introducing and newest topics. Spiral-Arm / Team-Lab Kentoku SHIBA

Introduction to Big Data Training

Extracting META information from Interbase/Firebird SQL (INFORMATION_SCHEMA)

The release notes provide details of enhancements and features in Cloudera ODBC Driver for Impala , as well as the version history.

Apache Sqoop. A Data Transfer Tool for Hadoop

4 Simple Database Features

CASSANDRA. Arash Akhlaghi, Badrinath Jayakumar, Wa el Belkasim. Instructor: Dr. Rajshekhar Sunderraman. CSC 8711 Project Report

MySQL Storage Engines

MongoDB Developer and Administrator Certification Course Agenda

Service Integration course. Cassandra

Brisk: More Powerful Hadoop Powered by Cassandra.

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD METAMARKETS

Agenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce

Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

So What s the Big Deal?

Xiaoming Gao Hui Li Thilina Gunarathne

Understanding NoSQL Technologies on Windows Azure

Structured Data Storage

Scaling Up 2 CSE 6242 / CX Duen Horng (Polo) Chau Georgia Tech. HBase, Hive

Why Zalando trusts in PostgreSQL

Data storing and data access

Open Source Technologies on Microsoft Azure

Transcription:

MariaDB Cassandra interoperability Cassandra Storage Engine in MariaDB Sergei Petrunia Colin Charles

Who are we Sergei Petrunia Principal developer of CassandraSE, optimizer developer, formerly from MySQL psergey@mariadb.org Colin Charles Chief Evangelist, MariaDB, formerly from MySQL colin@mariadb.org

Agenda An introduction to Cassandra The Cassandra Storage Engine (Cassandra SE) Data mapping Use cases Benchmarks Conclusion

Background: what is Cassandra A distributed NoSQL database Key-Value store Limited range scan suppor Optionally flexible schema Pre-defined static columns Ad-hoc dynamic columns Automatic sharding / replication Eventual consistency

Background: Cassandra's data model Column families like tables Row key columns Somewhat similar to SQL but some important differences. Supercolumns are not supported

CQL Cassandra Query Language Looks like SQL at first glance bash$ cqlsh -3 cqlsh> CREATE KEYSPACE mariadbtest... WITH REPLICATION ={'class':'simplestrategy','replication_factor':1}; cqlsh> use mariadbtest; cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key,... data1 varchar, data2 bigint... ) with compact storage; cqlsh:mariadbtest> insert into cf1 (pk, data1,data2)... values ('row1', 'data-in-cassandra', 1234); cqlsh:mariadbtest> select * from cf1; pk data1 data2 ------+-------------------+------- row1 data-in-cassandra 1234

CQL is not SQL Similarity with SQL is superficial cqlsh:mariadbtest> select * from cf1 where pk='row1'; pk data1 data2 ------+-------------------+------- row1 data-in-cassandra 1234 cqlsh:mariadbtest> select * from cf1 where data2=1234; Bad Request: No indexed columns present in by-columns clause with Equal operator cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2'; Bad Request: line 1:34 missing EOF at 'or' No joins or subqueries No GROUP BY, ORDER BY must be able to use available indexes WHERE clause must represent an index lookup.

Cassandra Storage Engine Starts a NoCQL movement Provides a view of Cassandra's data from MariaDB.

1. Load the Cassandra SE plugin Get MariaDB 10.0.1+ Load the Cassandra plugin From SQL: MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so'; Or, add a line to my.cnf: [mysqld]... plugin-load=ha_cassandra.so Check it is loaded MariaDB [(none)]> show plugins; +--------------------+--------+-----------------+-----------------+---------+ Name Status Type Library License +--------------------+--------+-----------------+-----------------+---------+... CASSANDRA ACTIVE STORAGE ENGINE ha_cassandra.so GPL +--------------------+--------+-----------------+-----------------+---------+

0 2. Connect to Cassandra Create an SQL table which is a view of a column family MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113'; Query OK, 0 rows affected (0.00 sec) MariaDB [test]> create table t2 (pk varchar(36) primary key, -> data1 varchar(60), -> data2 bigint -> ) engine=cassandra -> keyspace='mariadbtest' -> thrift_host='10.196.2.113' -> column_family='cf1'; Query OK, 0 rows affected (0.01 sec) thrift_host can be set per-table @@cassandra_default_thrift_host allows to Re-point the table to different node dynamically Not change table DDL when Cassandra IP changes.

1 Possible gotchas SELinux blocks the connection MariaDB [test]> create table t1 (... ) engine=cassandra... ; ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1] Packaging bug To get running quickly: echo 0 >/selinux/enforce Cassandra 1.2 and CFs without COMPACT STORAGE MariaDB [test]> create table t1 (... ) engine=cassandra... ; ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in keyspace mariadbtest Caused by a change in Cassandra 1.2 They broke Pig also We intend to update CassandraSE for 1.2

2 Accessing Cassandra data Can get Cassandra's data MariaDB [test]> select * from t2; +------+-------------------+-------+ pk data1 data2 +------+-------------------+-------+ row1 data-in-cassandra 1234 +------+-------------------+-------+ Can insert data MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123); Query OK, 1 row affected (0.00 sec) Cassandra sees inserted data cqlsh:mariadbtest> select * from cf1; pk data1 data2 ------+-------------------+------- row1 data-in-cassandra 1234 row2 data-from-mariadb 123

Data mapping between Cassandra and SQL

4 Data mapping between Cassandra and SQL create table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1' MariaDB table represents Cassandra's Column Family Can use any table name, column_family=... specifies CF.

5 Data mapping between Cassandra and SQL create table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1' MariaDB table represents Cassandra's Column Family Can use any table name, column_family=... specifies CF. Table must have a primary key Name/type must match Cassandra's rowkey

6 Data mapping between Cassandra and SQL create table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1' MariaDB table represents Cassandra's Column Family Can use any table name, column_family=... specifies CF. Table must have a primary key Name/type must match Cassandra's rowkey Columns map to Cassandra's static columns Name must be the same as in Cassandra Datatypes must match Can any subset of CF's columns

Datatype mapping CF column datatype determines MariaDB datatype Cassandra blob ascii text varint int bigint uuid timestamp boolean float double decimal counter MariaDB BLOB, VARBINARY(n) BLOB, VARCHAR(n), use charset=latin1 BLOB, VARCHAR(n), use charset=utf8 VARBINARY(n) INT BIGINT, TINY, SHORT CHAR(36) (text in MariaDB) TIMESTAMP (second precision), TIMESTAMP(6) (microsecond precision), BIGINT BOOL FLOAT DOUBLE VARBINARY(n) BIGINT

8 Dynamic columns Cassandra supports dynamic column families Can access ad-hoc columns with MariaDB's dynamic columns feature create table tbl ( rowkey type PRIMARY KEY column1 type,... dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes ) engine=cassandra keyspace=... column_family=...; insert into tbl values (1, column_create('col1', 1, 'col2', 'value-2')); select rowkey, column_get(dynamic_cols, 'uuidcol' as char) from tbl;

Data mapping is safe Cassandra SE will refuse incorrect mappings create table t3 (pk varchar(60) primary key, no_such_field int) engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1'; ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra' create table t3 (pk varchar(60) primary key, data1 double) engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1'; ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.utf8type'

Command mapping

Command Mapping Cassandra commands PUT (upsert) GET Scan DELETE (if exists) SQL commands SELECT GET/Scan INSERT PUT (upsert) UPDATE/DELETE read+write.

SELECT command mapping MariaDB has an SQL interpreter Cassandra SE supports lookups and scans Can now do Arbitrary WHERE clauses JOINs between Cassandra tables and MariaDB tables Batched Key Access is supported

DML command mapping No SQL semantics INSERT overwrites rows UPDATE reads, then writes Have you updated what you read DELETE reads, then deletes Can't be sure if/what you have deleted Not as bad as it sounds, it's Cassandra Cassandra SE doesn't make it SQL.

Cassandra SE use cases

Cassandra use cases Collect massive amounts of data Web page hits Sensor updates Updates are naturally non-conflicting Keyed by UUIDs, timestamps Reads are served with one lookup Good for certain kinds of data Moving from SQL entirely may be difficult

Cassandra SE use cases (1) Access Cassandra data from SQL Send an update to Cassandra Be a sensor Grab a piece of data from Cassandra This web page was last viewed by Last known position of this user was....

Cassandra SE use cases (2) Coming from MySQL/MariaDB side: Want a special table that is auto-replicated fault-tolerant Very fast? Get Cassandra, and create a Cassandra SE table.

8 Cassandra Storage Engine non-use cases Huge, sift-through-all-data joins Use Pig Bulk data transfer to/from Cassandra cluster Use Sqoop A replacement for InnoDB No full SQL semantics

A benchmark One table EC2 environment m1.large nodes Ephemeral disks Stream of single-line INSERTs Tried Innodb and Cassandra Hardly any tuning

0 Conclusions Cassandra SE can be used to peek at data in Cassandra from MariaDB. It is not a replacement for Pig/Hive It is really easy to setup and use

1 Going Forward Looking for input Do you want support for Fast counter columns updates? Awareness of Cassandra cluster topology? Secondary indexes??

2 Resources https://kb.askmonty.org/en/cassandrase/ http://wiki.apache.org/cassandra/datamodel http://cassandra.apache.org/ http://www.datastax.com/docs/1.1/ddl/column_family

3 Thanks! Q & A

4 Extra: Cassandra SE internals Developed against Cassandra 1.1 Uses Thrift API cannot stream CQL resultset in 1.1 Cant use secondary indexes Only supports AllowAllAuthenticator In Cassandra 1.2 CQL Binary Protocol with streaming CASSANDRA-5234: Thrift can only read CFs WITH COMPACT STORAGE