Use Your MySQL Knowledge to Become an Instant Cassandra Guru


1 Use Your MySQL Knowledge to Become an Instant Cassandra Guru. Percona Live Santa Clara 2014. Robert Hodges, CEO, Continuent. Tim Callaghan, VP/Engineering, Tokutek.

2 Who are we? Robert Hodges: CEO at Continuent. Database nerd since 1982, starting with M204; RDBMS since 1990; NoSQL since 2012. Designed Continuent Tungsten. Continuent offers clustering and replication for MySQL and other fine DBMS types. Tim Callaghan: VP/Engineering at Tokutek. Long-time database consumer (Oracle) and producer (VoltDB, Tokutek). Tokutek offers Fractal Tree indexes in MySQL (TokuDB) and MongoDB (TokuMX).

3 Cassandra! Cassandra, used by Netflix, eBay, Twitter, Reddit and many others, is one of today's most popular NoSQL databases. According to the website, the largest known Cassandra setup involves over 300 TB of data on over 400 machines. High-performance reads and writes. Linear scalability. High availability.

4 One Good Thing about Cassandra: Cassandra makes it easy to scale capacity. When existing cluster nodes are running out of space, start new nodes and let them join. Stored data redistributes automatically.

5 One Bad Thing about Cassandra
cql> create table foo (id int primary key, customerid int, orderid int, ordervalue double);
OK!
<Me: I think I'm gonna like Cassandra.>
cql> create index idx_foo_1 on foo (customerid, orderid);
ERROR! - Secondary indexes only support 1 column
<Me: I think I just changed my mind. CQL != SQL>
<Me: (much later) secondary indexes aren't that useful>

6 Today's Question: How can you use your MySQL knowledge to get up to speed on Cassandra?

7 CQL CQL is not SQL!

8 What is CQL? Cassandra originally used a Thrift RPC-based API. CQL was added in 0.8. It simplifies access and gives SQL users a smaller learning curve. So, you'll feel right at home: create table; data types; insert, update, select, delete. But Cassandra isn't pretending to be relational...

9 What ISN'T CQL? It's familiar, and then it's not! No joins. No foreign keys. No NOT NULL. No sum(), group(), min(), ... Some ORDER BY support. Single-row UPDATE. Limited secondary indexing. INSERT == UPDATE, but it all behaves like REPLACE INTO.

10 The Bottom Line on CQL: It's just a language. It's very similar to SQL. Don't blame CQL: what first appears as a limitation in Cassandra can also be a strength. CQL enables us to get started quickly. Try Cassandra 0.7 for a little while. Commit this to memory... You do not just throw data into Cassandra and add indexes later to make it fast.

11 Schema Design: "Easy to learn and difficult to master" - Nolan Bushnell (founder of Atari and Chuck E. Cheese's Pizza-Time)

12 Coming from a Relational World? Tradeoffs are hard. [Table comparing RDBMS and Cassandra on four features: Single Point of Failure, Cross Datacenter, Linear Scaling, Data Modeling. The cell values are not preserved.] *Source: Patrick McFadin (@PatrickMcFadin)

13 How is My Data Organized?
Relational Model    Cassandra Model
Database            Keyspace
Table               Column Family
Primary Key         Row Key
Column Name         Column Name/Key
Column Value        Column Value

14 What is a BigTable? Cassandra uses it for the data model. Supports extremely wide rows. Row lookup is fast and easily distributed. Columns are sorted by name. Up to 2 billion columns! [Diagram: one row key followed by many columns, each holding a column name, column value, timestamp, and TTL.]

15 What About Data Types? Unlike some other NoSQL databases, types are included via CQL. Usual suspects: ascii, bigint, blob, boolean, decimal, varchar, ... More interesting: uuid (global uniqueness); timeuuid (global uniqueness, sorted by time portion); inet (IPv4 or IPv6 address); varint (variable-precision integer).

16 How Do I Create a Static Table?
CREATE TABLE users ( user varchar, varchar, state varchar, PRIMARY KEY (user) );
These rows look and feel familiar. [Diagram: one row keyed by user, with each remaining column stored alongside its own timestamp and TTL.] The primary key is hashed for location/placement (more on that later), meaning you cannot range scan by PK. No NOT NULL or varchar sizing (enforce in applications). Reads (on username) and all writes are easily distributed. Schema flexibility without downtime:
alter table users add password varchar;

17 No Auto-Increment? Auto-increment is extremely difficult in a distributed system. Use a natural primary key, if possible. Remember, small primary keys aren't important when denormalizing. Or use uuid / timeuuid, generated in your client applications.
CREATE TABLE payments ( id timeuuid, user varchar, type varchar, amount decimal, PRIMARY KEY (id) );
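
Generating the key in the client can be sketched in a few lines of Python. This is only an illustration, assuming the application is free to mint its own ids: the standard library's uuid.uuid1() returns a version 1 (time-based) UUID, which is the kind of value a timeuuid column holds. The helper name new_payment_id is hypothetical.

```python
import uuid


def new_payment_id() -> uuid.UUID:
    """Return a version 1 (time-based) UUID minted on the client side."""
    return uuid.uuid1()


first = new_payment_id()
second = new_payment_id()

# Version 1 UUIDs embed a 60-bit timestamp, exposed as the .time attribute.
print(first.version, first.time)
print(first != second)
```

No coordination with the database or with other clients is needed to obtain the id, which is the point of the slide.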

18 What About Secondary Indexes?
CREATE TABLE users ( user varchar, varchar, state varchar, PRIMARY KEY (user));
-- OPTION 1 : create an index
CREATE INDEX idxubs on users (state);
-- OPTION 2 : create another table (store data twice)
CREATE TABLE usersbystate ( state varchar, user varchar, PRIMARY KEY (state, user));

19 Why Not Create Secondary Indexes?
select * from users where state = 'CA';
Secondary index. Pro: index is automatically maintained. Con: above query is sent to the entire cluster (slows everyone down).
Additional table. Pro: above query is directed to a single node (important for scalability). Con: index is manually maintained (insert data twice, which is OK).
General tips: low selectivity is bad (male/female); extremely high selectivity is bad (unique). In general, create an additional table and don't busy the entire cluster for reads.
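
The "insert data twice" approach can be sketched with two in-memory dictionaries standing in for the users and usersbystate tables. This is a toy model under that assumption, not Cassandra client code; the function names add_user and users_in_state are hypothetical.

```python
from collections import defaultdict

users = {}                         # stands in for table users, keyed by user
users_by_state = defaultdict(set)  # stands in for table usersbystate, keyed by state


def add_user(user: str, state: str) -> None:
    """Write to both tables so the lookup table stays in step with the base table."""
    previous = users.get(user)
    if previous is not None and previous != state:
        # The application, not the database, removes the stale lookup entry.
        users_by_state[previous].discard(user)
    users[user] = state
    users_by_state[state].add(user)


def users_in_state(state: str) -> list:
    """Answer the by-state query from the lookup table alone."""
    return sorted(users_by_state[state])


add_user("alice", "CA")
add_user("bob", "CA")
add_user("alice", "NY")
print(users_in_state("CA"))
```

The read touches a single key of the lookup table, which mirrors why the additional-table option directs the query to one node.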

20 How Do I Create a Dynamic Table?
CREATE TABLE payments ( user varchar, id timeuuid, type varchar, amount decimal, PRIMARY KEY (user, id) );
Compound PK creates a wide row. Remainder of columns are grouped. Still 1 row per user. [Diagram: one row keyed by user, holding type and amount cells for id1, id2, and so on, each with its own timestamp and TTL.] Affects how data is stored/organized. Still returned in row format to CQL. Enables ORDER BY id and range queries on id for user = ? queries. Remember, BigTables support up to 2 billion columns.

21 What About Relationships?
Relational:
table emp (empname text PK, deptid int,...);
index empidx1 on emp (deptid);
table dept (deptid int PK, deptname text);
Cassandra, storing department name with employee:
table emp (empname text PK, deptid int, deptname text);
Cassandra, storing employee names with department (up to ~2 billion):
table dept ( deptid int, deptname text, empname text, PRIMARY KEY ((deptid, deptname), empname));
Example row: key (1, Accounting) with columns FrankJones, FredJones, SamSmith.

22 What About Time Series Data?
CREATE TABLE dashboard ( dashboardid text, event_time timestamp, event_value double, PRIMARY KEY (dashboardid, event_time)) WITH CLUSTERING ORDER BY (event_time DESC);
Great for "last n" queries, like dashboards. WITH CLUSTERING ORDER BY (): data is stored in the given order in the BigTable row, in this case descending by event_time. Easy to access the most recent data.

23 How Do I Remove Time Series Data?
CREATE TABLE dashboard ( dashboardid text, event_time timestamp, event_value double, PRIMARY KEY (dashboardid, event_time)) WITH CLUSTERING ORDER BY (event_time DESC);
insert into dashboard (dashboardid, event_time, event_value) values (25, now(), ) using ttl 86400;
The TTL is defined at the data level. Data magically disappears. No more cron jobs.
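
The effect of a per-write TTL can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: the clock is passed in explicitly so the behavior is deterministic, a value counts as expired once ttl seconds have fully elapsed, and the class name TtlStore is hypothetical. It says nothing about how Cassandra stores or purges expired cells internally.

```python
class TtlStore:
    """Toy key-value store in which every write carries its own expiry time."""

    def __init__(self):
        self._cells = {}  # key -> (value, expires_at)

    def insert(self, key, value, ttl, now):
        # The expiry is attached to the data itself, not to a cleanup job.
        self._cells[key] = (value, now + ttl)

    def select(self, key, now):
        cell = self._cells.get(key)
        if cell is None or now >= cell[1]:
            return None  # expired data is simply no longer visible
        return cell[0]


store = TtlStore()
store.insert("dashboard-25", 42.0, ttl=86400, now=0)
print(store.select("dashboard-25", now=3600))
print(store.select("dashboard-25", now=86400))
```

The reader never issues a delete; the second lookup returns nothing purely because the write's lifetime has run out.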

24 Are There Other Cool Schemas? Collections: sets, lists, maps. An alternative to making the row wide. set<text>: ordered by CQL type comparator. list<text>: ordered by insertion. map<int,text>: unordered. Support for 64,000 objects per collection (but don't go crazy).
create table users ( user varchar, s set<text>, PRIMARY KEY (user));
insert into users (user, s) values ('tmcallaghan', {'tim@tokutek.com', 'timc@tokutek.com'});

25 Topic: Transactions

26 MySQL Transactions and Isolation
mysql> BEGIN;
...
mysql> INSERT INTO sample(data) VALUES ('Hello world!');
mysql> INSERT INTO sample(data) VALUES ('Goodbye world!');
...
mysql> COMMIT;
InnoDB creates an MVCC view of data, locks updated rows, and commits atomically. MyISAM locks the table and commits each row immediately.

27 Does Cassandra Have Transactions? "Transactions" includes a lot of things, so... No: updates on different rows are separate, like MyISAM; failed transactions on replicas might create partial writes (Cassandra does not guarantee clean-up). Yes: updates to single rows are atomic and isolated; updates to rows are durable (logged) as well. (Important: Cassandra rows can be bigger than MySQL rows.)

28 How Does Cassandra Handle Locks? Cassandra uses timestamps instead.
create columnfamily sample(id int primary key, data varchar, n int);
insert into sample(id, data, n) values(1, 'hello', 25);
insert into sample(id, data, n) values(2, 'goodbye', 27);
cqlsh:cbench> update sample set data='goodbye!' where id=2;
cqlsh:cbench> select id, writetime(data), writetime(n) from sample;
[Query output with columns id, writetime(data), and writetime(n); one row is annotated "Updated together" and the other "Updated separately". The timestamp values are not preserved.] Timestamps are the same for columns updated at the same time.

29 How Does Cassandra Handle Isolation? Row updates are completely isolated until they are completed Updates can propagate out to replicas at different times

30 Topic: Replication and HA

31 Review of MySQL Replication

32 How Does Cassandra Replication Work? Cassandra is fully multi-master. Updates are allowed at any location. Updates can happen even when there is a partition. [Diagram: a client program issues a write to a coordinator among db1, db2, db3, and db4; the write is proxied to the other instances.]

33 How Many Replicas Are There? The number of replicas and how they are distributed are properties of keyspaces.
CREATE KEYSPACE cbench WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '3' };
'class' names the strategy class used to distribute replicas; 'replication_factor': '3' keeps 3 copies of the data.

34 MySQL Partitioning: MySQL partitioning breaks a table into <n> tables. PARTITION is actually a storage engine. Tables can be partitioned by hash or range: hash = random distribution; range = user-controlled distribution (date range). Helpful in big data use cases. Partitions can usually be dropped efficiently, unlike:
delete from table1 where timefield < '12/31/2012';

35 How Does Cassandra Partition Data? Cassandra splits data by key across hosts using a ring to assign locations. [Diagram: a ring of eight virtual nodes, A through H. Host db1 holds ABCDEF, db2 holds CDEFGH, db3 holds EFGHAB, and db4 holds GHABCD.] Virtual node D gets ⅛ of the hash range. A copy is assigned to host db4 by the strategy.

36 Partitioning in Action: Replica placement is based on the primary key.
insert into sample(id, data) values (550e8400-e29b-41d4-a , 'hello!');
Run a hash function on the primary key value, assign it to a virtual node, and from there to actual hosts.
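
The hash, then virtual node, then host sequence can be sketched as a small ring in Python. This is a simplified illustration, not Cassandra's implementation: MD5 stands in for the real partitioner, the class name Ring and its parameters are hypothetical, and replicas are chosen by walking clockwise to the next distinct hosts, in the spirit of a simple placement strategy.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: each host owns several virtual nodes (tokens)."""

    def __init__(self, hosts, vnodes_per_host=2, replication_factor=3):
        self.replication_factor = replication_factor
        # Sorted list of (token, host); tokens are derived from a label hash.
        self.tokens = sorted(
            (self._hash(f"{host}-vnode-{i}"), host)
            for host in hosts
            for i in range(vnodes_per_host)
        )

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only as a convenient stable hash for this sketch.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, key: str) -> list:
        """Walk clockwise from the key's token, collecting distinct hosts."""
        start = bisect.bisect_right(self.tokens, (self._hash(key), ""))
        found = []
        for offset in range(len(self.tokens)):
            host = self.tokens[(start + offset) % len(self.tokens)][1]
            if host not in found:
                found.append(host)
            if len(found) == self.replication_factor:
                break
        return found


ring = Ring(["db1", "db2", "db3", "db4"])
placement = ring.replicas("some-primary-key")
print(placement)
```

With four hosts, two virtual nodes each, and three copies, every key lands on three of the four hosts, which matches the eight-segment ring described on the previous slide.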

37 What About Conflicts? Any client can update rows from any location. Cassandra resolves conflicts using timestamps. Conflicts are resolved at the cell level. The latest timestamp value wins.
UPDATE sample SET data = 'hello', age = 35 WHERE id = 352 (data: LOSE, age: WIN)
UPDATE sample SET data = 'bye' WHERE id = 352 (data: WIN)
Result: Id 352, Data bye, Age 35
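
Cell-level, last-write-wins merging can be sketched as follows. This is a minimal model assuming each write carries one timestamp for all of its columns; the function name apply_write and the timestamps 100 and 200 are made up for illustration, and ties between equal timestamps are simply left to the existing value.

```python
def apply_write(row: dict, columns: dict, timestamp: int) -> None:
    """Merge a write into a row, keeping the newest timestamp for each cell."""
    for name, value in columns.items():
        current = row.get(name)
        if current is None or timestamp > current[1]:
            row[name] = (value, timestamp)


def visible(row: dict) -> dict:
    """Strip timestamps, leaving the values a reader would see."""
    return {name: value for name, (value, _) in row.items()}


# Writes arrive in timestamp order.
in_order = {}
apply_write(in_order, {"data": "hello", "age": 35}, timestamp=100)
apply_write(in_order, {"data": "bye"}, timestamp=200)

# The same writes arrive in the opposite order at another replica.
reversed_order = {}
apply_write(reversed_order, {"data": "bye"}, timestamp=200)
apply_write(reversed_order, {"data": "hello", "age": 35}, timestamp=100)

print(visible(in_order))
print(visible(reversed_order) == visible(in_order))
```

Both replicas converge on data = bye and age = 35 regardless of arrival order, because each cell is compared on its own timestamp rather than the whole row being accepted or rejected.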

38 So Is This Like Galera? Galera uses ACID transactions with optimistic locking. It would accept the first transaction and roll back the second completely.
UPDATE sample SET data = 'hello', age = 35 WHERE id = 352 (data: WIN, age: WIN)
UPDATE sample SET data = 'bye' WHERE id = 352 (LOSE)
Result: Id 352, Data hello, Age 35

39 How Does Failover Work? Cassandra does not have failover. If a node fails or disappears, others continue. Writes or reads may stop if too many replicas fail. [Diagram: a client program issues a write to a coordinator; the write is proxied to the other instances db1, db2, and db4, while the third node is marked X as failed.]

40 Tunable Consistency: Clients define the level of consistency.
cqlsh:cbench> consistency all
Consistency level set to ALL.
cqlsh:cbench> update rtest set k1=4444 where id=3;
Unable to complete request: one or more nodes were unavailable.
cqlsh:cbench> select * from cbench.rtest where id=3;
Unable to complete request: one or more nodes were unavailable.
cqlsh:cbench> consistency quorum
Consistency level set to QUORUM.
cqlsh:cbench> update rtest set k1=4444 where id=3;
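
The arithmetic behind that session can be sketched as a small check. It assumes a replication factor of 3 with one replica down, which is consistent with ALL failing and QUORUM succeeding; the function names required_acks and can_complete are hypothetical, and only the levels ONE, QUORUM, and ALL are modeled.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def can_complete(level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy the requested level."""
    return live_replicas >= required_acks(level, replication_factor)


# Three copies of the data, one replica unavailable.
for level in ("ALL", "QUORUM", "ONE"):
    print(level, can_complete(level, replication_factor=3, live_replicas=2))
```

Lowering the level from ALL to QUORUM trades some consistency for availability: the request needs only two of the three replicas.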

41 What Happens To Failed Writes? Cassandra has several repair mechanisms for failures. Hinted Handoff: the coordinator remembers the write and replays it when the node returns. Read Repair: the coordinator for a later read notices that a value is out of sync. Node Repair: run an external process (nodetool) to scan for and fix inconsistencies.
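
Read repair in particular lends itself to a short sketch. The model below is an assumption-laden simplification: each replica is a plain dictionary mapping a key to a (value, timestamp) pair, the coordinator contacts every replica, and the function name read_with_repair is hypothetical.

```python
def read_with_repair(replicas: list, key: str):
    """Return the newest value for key and push it to any replica that lags."""
    answers = [replica.get(key) for replica in replicas]
    newest = max(
        (answer for answer in answers if answer is not None),
        key=lambda answer: answer[1],  # compare by timestamp
        default=None,
    )
    if newest is None:
        return None
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[0]


replicas = [
    {"k": ("old", 1)},  # missed the second write
    {"k": ("new", 2)},  # up to date
    {},                 # was down for both writes
]
value = read_with_repair(replicas, "k")
print(value, replicas)
```

After one read, all three copies agree on the newest value, without any separate repair job having run.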

42 What Else is There to Learn? "A lot" would be an understatement, but here are some topics to consider. You need to rethink your data model, so read up, practice, repeat. Data consistency is an application problem. Storage/internals: LSM, compaction, tombstones, Bloom filters.

43 What Should You Do?

44 Summary 1. We liked: CQL + tables made it easy to get started; HA; scaling model. Look [out] for: Don't bother trying to port your MySQL application. Query patterns are critical, so model around them. Pay attention to version when reading docs/blogs; Cassandra is quickly evolving (0.7, 1.0, 1.2, 2.0, ...).

45 Summary 2. Highly recommended reading. Book: Practical Cassandra: A Developer's Approach. DataStax has great docs. Presentations. Blogs.

46 Questions? Robert Hodges, CEO. Tim Callaghan, VP/Engineering.


Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3 Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM

More information

.NET User Group Bern

.NET User Group Bern .NET User Group Bern Roger Rudin bbv Software Services AG roger.rudin@bbv.ch Agenda What is NoSQL Understanding the Motivation behind NoSQL MongoDB: A Document Oriented Database NoSQL Use Cases What is

More information

MySQL Storage Engines

MySQL Storage Engines MySQL Storage Engines Data in MySQL is stored in files (or memory) using a variety of different techniques. Each of these techniques employs different storage mechanisms, indexing facilities, locking levels

More information

Understanding NoSQL Technologies on Windows Azure

Understanding NoSQL Technologies on Windows Azure David Chappell Understanding NoSQL Technologies on Windows Azure Sponsored by Microsoft Corporation Copyright 2013 Chappell & Associates Contents Data on Windows Azure: The Big Picture... 3 Windows Azure

More information

Geodatabase Programming with SQL

Geodatabase Programming with SQL DevSummit DC February 11, 2015 Washington, DC Geodatabase Programming with SQL Craig Gillgrass Assumptions Basic knowledge of SQL and relational databases Basic knowledge of the Geodatabase We ll hold

More information

NoSQL Databases. Nikos Parlavantzas

NoSQL Databases. Nikos Parlavantzas !!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!

More information

Design Patterns for Distributed Non-Relational Databases

Design Patterns for Distributed Non-Relational Databases Design Patterns for Distributed Non-Relational Databases aka Just Enough Distributed Systems To Be Dangerous (in 40 minutes) Todd Lipcon (@tlipcon) Cloudera June 11, 2009 Introduction Common Underlying

More information

Evaluation of NoSQL databases for large-scale decentralized microblogging

Evaluation of NoSQL databases for large-scale decentralized microblogging Evaluation of NoSQL databases for large-scale decentralized microblogging Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Decentralized Systems - 2nd semester 2012/2013 Universitat Politècnica

More information

Department of Software Systems. Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012

Department of Software Systems. Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012 1 MongoDB Department of Software Systems Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012 2 Contents Motivation : Why nosql? Introduction : What does NoSQL means?? Applications

More information

How To Create A Table In Sql 2.5.2.2 (Ahem)

How To Create A Table In Sql 2.5.2.2 (Ahem) Database Systems Unit 5 Database Implementation: SQL Data Definition Language Learning Goals In this unit you will learn how to transfer a logical data model into a physical database, how to extend or

More information

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

More information

Using Object Database db4o as Storage Provider in Voldemort

Using Object Database db4o as Storage Provider in Voldemort Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) September 2010 Abstract: In this article I will show you how

More information

How graph databases started the multi-model revolution

How graph databases started the multi-model revolution How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015 Welcome to Big Data 90% of the data in the world today has been created in the

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

No-SQL Databases for High Volume Data

No-SQL Databases for High Volume Data Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 The New Connected World Needs a Revolutionary New DBMS Today The Internet of Things 1990 s Mobile 1970 s Mainfram

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Synchronous multi-master clusters with MySQL: an introduction to Galera

Synchronous multi-master clusters with MySQL: an introduction to Galera Synchronous multi-master clusters with : an introduction to Galera Henrik Ingo OUGF Harmony conference Aulanko, Please share and reuse this presentation licensed under Creative Commonse Attribution license

More information

nosql and Non Relational Databases

nosql and Non Relational Databases nosql and Non Relational Databases Image src: http://www.pentaho.com/big-data/nosql/ Matthias Lee Johns Hopkins University What NoSQL? Yes no SQL.. Atleast not only SQL Large class of Non Relaltional Databases

More information

Services. Relational. Databases & JDBC. Today. Relational. Databases SQL JDBC. Next Time. Services. Relational. Databases & JDBC. Today.

Services. Relational. Databases & JDBC. Today. Relational. Databases SQL JDBC. Next Time. Services. Relational. Databases & JDBC. Today. & & 1 & 2 Lecture #7 2008 3 Terminology Structure & & Database server software referred to as Database Management Systems (DBMS) Database schemas describe database structure Data ordered in tables, rows

More information

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013 Database Management System Choices Introduction To Database Systems CSE 373 Spring 2013 Outline Introduction PostgreSQL MySQL Microsoft SQL Server Choosing A DBMS NoSQL Introduction There a lot of options

More information

Database Design Patterns. Winter 2006-2007 Lecture 24

Database Design Patterns. Winter 2006-2007 Lecture 24 Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each

More information

The Apache Cassandra storage engine

The Apache Cassandra storage engine The Apache Cassandra storage engine Sylvain Lebresne (sylvain@.com) FOSDEM 12, Brussels 1. What is Apache Cassandra 2. Data Model 3. The storage engine 1. What is Apache Cassandra 2. Data Model 3. The

More information

Understanding NoSQL on Microsoft Azure

Understanding NoSQL on Microsoft Azure David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Data on Azure: The Big Picture... 3 Relational Technology: A Quick

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

database abstraction layer database abstraction layers in PHP Lukas Smith BackendMedia smith@backendmedia.com

database abstraction layer database abstraction layers in PHP Lukas Smith BackendMedia smith@backendmedia.com Lukas Smith database abstraction layers in PHP BackendMedia 1 Overview Introduction Motivation PDO extension PEAR::MDB2 Client API SQL syntax SQL concepts Result sets Error handling High level features

More information

Referential Integrity in Cloud NoSQL Databases

Referential Integrity in Cloud NoSQL Databases Referential Integrity in Cloud NoSQL Databases by Harsha Raja A thesis submitted to the Victoria University of Wellington in partial fulfilment of the requirements for the degree of Master of Engineering

More information

Big Data with Component Based Software

Big Data with Component Based Software Big Data with Component Based Software Who am I Erik who? Erik Forsberg Linköping University, 1998-2003. Computer Science programme + lot's of time at Lysator ACS At Opera Software

More information

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services HBase Schema Design NoSQL Ma4ers, Cologne, April 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera ConsulFng on Hadoop projects (everywhere) Apache Commi4er HBase and Whirr

More information

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster.

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster. MongoDB 1. Introduction MongoDB is a document-oriented database, not a relation one. It replaces the concept of a row with a document. This makes it possible to represent complex hierarchical relationships

More information

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Cloud DBMS: An Overview Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Outline Definition and requirements S through partitioning A through replication Problems of traditional DDBMS Usage analysis: operational

More information

Apache Cassandra 1.2

Apache Cassandra 1.2 Apache Cassandra 1.2 Documentation January 21, 2016 Apache, Apache Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation 2016 DataStax, Inc. All rights reserved.

More information

NOSQL DATABASES AND CASSANDRA

NOSQL DATABASES AND CASSANDRA NOSQL DATABASES AND CASSANDRA Semester Project: Advanced Databases DECEMBER 14, 2015 WANG CAN, EVABRIGHT BERTHA Université Libre de Bruxelles 0 Preface The goal of this report is to introduce the new evolving

More information

A Shared-nothing cluster system: Postgres-XC

A Shared-nothing cluster system: Postgres-XC Welcome A Shared-nothing cluster system: Postgres-XC - Amit Khandekar Agenda Postgres-XC Configuration Shared-nothing architecture applied to Postgres-XC Supported functionalities: Present and Future Configuration

More information

LARGE-SCALE DATA STORAGE APPLICATIONS

LARGE-SCALE DATA STORAGE APPLICATIONS BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Case study: CASSANDRA

Case study: CASSANDRA Case study: CASSANDRA Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu Cassandra:

More information

Hands-on Cassandra. OSCON July 20, 2010. Eric Evans eevans@rackspace.com @jericevans http://blog.sym-link.com

Hands-on Cassandra. OSCON July 20, 2010. Eric Evans eevans@rackspace.com @jericevans http://blog.sym-link.com Hands-on Cassandra OSCON July 20, 2010 Eric Evans eevans@rackspace.com @jericevans http://blog.sym-link.com 2 Background Influential Papers BigTable Strong consistency Sparse map data model GFS, Chubby,

More information

Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014 www.datastax.com @DataStaxEU

Going Native With Apache Cassandra. NoSQL Matters, Cologne, 2014 www.datastax.com @DataStaxEU Going Native With Apache Cassandra NoSQL Matters, Cologne, 2014 www.datastax.com @DataStaxEU About Me Johnny Miller Solutions Architect @CyanMiller www.linkedin.com/in/johnnymiller We are hiring www.datastax.com/careers

More information

Distributed Data Stores

Distributed Data Stores Distributed Data Stores 1 Distributed Persistent State MapReduce addresses distributed processing of aggregation-based queries Persistent state across a large number of machines? Distributed DBMS High

More information