NoSQL replacement for SQLite (for Beatstream)
Antti-Jussi Kovalainen
Seminar OHJ-1860: NoSQL databases



Background

Inspiration: postgresapp.com

demo.beatstream.fi (for modern desktop browsers without a Flash blocker)

Backend
- Very light
- Access & modify data
- Relay commands to Last.fm
- Currently Ruby on Rails
So, we'll focus on the backend

My Super Complex Data Model

(Diagram: boxes for Songs, Playlists and the User's playlists; annotations "Copy the song row" and "Relation?")

Data

Users
- Few users
- Not many fields: username, email, password, lastfm_key
- Read when a user logs in
- Written rarely
- Needed because users have their own playlists

Songs
- Huge list of objects (array)
- Read after a user logs in
- Written rarely
- Basically just meta-data about a song: title, artist, album, tracknum, path, etc.

Playlists
- Possibly huge lists of objects (arrays)
- Read a good amount during a day
- Written a lot (probably)
- Owned by a user
- Contain songs
- Also need to track each song's position on the playlist
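The three entities can be written down as a minimal sketch. Python is used here only for illustration; Beatstream itself is Ruby on Rails. The field names follow the slides. Two choices are assumptions: the file path serves as the song's key, and a playlist is stored as an ordered list of song paths whose index is the position.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str
    email: str
    password: str  # assumption: a password hash, never the plain password
    lastfm_key: str


@dataclass
class Song:
    path: str  # assumption: the file path doubles as the song's unique id
    title: str
    artist: str
    album: str
    tracknum: int


@dataclass
class Playlist:
    owner: str  # username of the owning user
    name: str
    # Ordered song paths: a song's index in this list is its position.
    song_paths: list[str] = field(default_factory=list)

    def position_of(self, path: str) -> int:
        """Return the 0-based position of the first occurrence of path, or -1."""
        try:
            return self.song_paths.index(path)
        except ValueError:
            return -1


favourites = Playlist("ajk", "Favourites", ["Band/Album/02.mp3", "Band/Album/01.mp3"])
print(favourites.position_of("Band/Album/01.mp3"))  # 1
```

Keeping the order in the list itself means "position" needs no separate field. Any storage backend then has to persist that order faithfully.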

SQLite

Why I chose SQLite
- Easy to use, simple, familiar
- Ready to use in a new Rails project
- Simple data model, so why not a simple DBMS?
- Can easily implement CRUD, playlist sorting, etc.
- Great for rapid prototyping
- Doesn't require a separate server installation
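For comparison, the relational model that made SQLite attractive might look like the sketch below. It uses Python's standard `sqlite3` module for illustration. The table and column names are hypothetical, not Beatstream's actual Rails schema. The `position` column in the `playlist_songs` join table is one way to record where a song sits on a playlist.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    username   TEXT NOT NULL UNIQUE,
    email      TEXT,
    password   TEXT,
    lastfm_key TEXT
);
CREATE TABLE songs (
    id       INTEGER PRIMARY KEY,
    path     TEXT NOT NULL UNIQUE,
    title    TEXT,
    artist   TEXT,
    album    TEXT,
    tracknum INTEGER
);
CREATE TABLE playlists (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    name    TEXT NOT NULL
);
-- Join table: which song sits at which position on which playlist.
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id),
    song_id     INTEGER NOT NULL REFERENCES songs(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (playlist_id, position)
);
"""


def playlist_titles(conn: sqlite3.Connection, playlist_id: int) -> list[str]:
    """Return the song titles of one playlist, ordered by position."""
    rows = conn.execute(
        """
        SELECT s.title
        FROM playlist_songs AS ps
        JOIN songs AS s ON s.id = ps.song_id
        WHERE ps.playlist_id = ?
        ORDER BY ps.position
        """,
        (playlist_id,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users (id, username) VALUES (1, 'ajk')")
conn.executemany(
    "INSERT INTO songs (id, path, title) VALUES (?, ?, ?)",
    [(1, "Band/Album/01.mp3", "Intro"), (2, "Band/Album/02.mp3", "Outro")],
)
conn.execute("INSERT INTO playlists (id, user_id, name) VALUES (1, 1, 'Mix')")
conn.executemany(
    "INSERT INTO playlist_songs (playlist_id, song_id, position) VALUES (?, ?, ?)",
    [(1, 2, 0), (1, 1, 1)],
)
print(playlist_titles(conn, 1))  # ['Outro', 'Intro']
```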

Why Replace SQLite?

Why Replace?
1. Fear of bad concurrency
   - Multiple users + SQLite = bad memories
   - Writing playlists takes time
   - Writing the songs list takes time
   - SQLite locks up or corrupts data
   - Sadness
2. Try something new
   - Schemaless == even better for prototyping?
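SQLite allows only one writer per database file at a time. The sketch below reproduces the situation behind the "SQLite locks up" worry, using Python's standard `sqlite3` module for illustration. While one connection holds a write transaction, a second writer with a zero busy timeout is refused immediately. The database file name is hypothetical. With a non-zero `timeout`, the second writer would wait for the lock instead, which is exactly what long playlist or song-list writes would make noticeable.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path: str) -> bool:
    """Try to write to one SQLite file from two connections at the same time.

    Returns True if the second writer is refused with "database is locked".
    """
    # isolation_level=None: autocommit mode, so BEGIN/ROLLBACK are issued explicitly.
    writer1 = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0: give up immediately instead of waiting for the lock to be released.
    writer2 = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        writer1.execute("CREATE TABLE IF NOT EXISTS songs (path TEXT)")
        writer1.execute("BEGIN IMMEDIATE")  # writer1 now holds the write lock
        writer1.execute("INSERT INTO songs VALUES ('Band/Album/01.mp3')")
        try:
            writer2.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        except sqlite3.OperationalError as err:
            return "database is locked" in str(err)
        writer2.execute("ROLLBACK")
        return False
    finally:
        if writer1.in_transaction:
            writer1.execute("ROLLBACK")
        writer1.close()
        writer2.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    print(second_writer_is_blocked(os.path.join(tmp_dir, "beatstream.db")))
```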

Sidenote

At the Moment
- Moved to JSON files
  - Songs and Playlists are in JSON files
  - Users are in SQLite
- SQLite was just getting in the way: SQLite <--> SQL result <--> JSON
- Keep It Simple
- But this feels kinda icky
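A JSON-file store like this can stay simple, but a half-written songs.json after a crash would lose the whole library. The sketch below assumes one JSON file per collection (for example songs.json and playlists.json) and uses Python for illustration. Each save writes to a temporary file and renames it into place, so a reader sees either the old file or the new one. It does not coordinate concurrent writers: if two requests save the same file at once, the last rename wins and the other update is lost.

```python
import json
import os
import tempfile


def save_json(path: str, data) -> None:
    """Replace the JSON file at path without ever leaving a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem for os.replace to work.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_json(path: str, default):
    """Read a JSON file, or return default if it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as data_dir:
    songs_file = os.path.join(data_dir, "songs.json")
    save_json(songs_file, [{"path": "Band/Album/01.mp3", "title": "Intro"}])
    print(load_json(songs_file, [])[0]["title"])  # Intro
```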

songs.json Over 9000 objects

Back to regular programme

Features We Want
- Standalone / embeddable / portable
  - Can be embedded into the application, invisible to the Beatstream server user or admin
  - No separate server installation etc.
- Simple, easy to use
- SQLite-like performance or better
- Lightweight
- No availability, concurrency or consistency problems
  - The database can't be the reason a song won't play
  - When a song is added to a playlist, it should be there, and stay there
  - The frontend asks only once which songs are in the DB, and they should always be there

Features We Want (2)
- Can store the whole song library
  - Also read it all fast (or we cache it)
  - No need for sorting
- Can store sort information somehow
  - Sorting of playlists & playlist songs
  - Not as obvious as it sounds
- Can fetch only a certain user's playlists
  - Relational data!
- (Please, work with Ruby or JRuby)
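One reason playlist sorting is "not as obvious as it sounds": if positions are consecutive integers (0, 1, 2, ...), moving one song rewrites the position of every song between its old and new place. A common workaround, not something the slides propose, is to leave gaps between positions. A moved or inserted song then usually needs a single write, and the whole playlist is renumbered only when a gap runs out. The sketch below is in Python for illustration and assumes integer positions and a gap of 1024; both are arbitrary choices.

```python
from typing import Optional

GAP = 1024  # hypothetical spacing left between consecutive songs


def position_between(before: Optional[int], after: Optional[int]) -> Optional[int]:
    """Return an integer sort key strictly between two neighbouring positions.

    before/after are the keys of the songs the new one goes between
    (None at the start or end of the playlist). Returns None when the
    neighbours are adjacent integers, i.e. the playlist must be renumbered.
    """
    if before is None and after is None:
        return GAP  # first song on an empty playlist
    if before is None:
        return after - GAP  # insert at the top; keys may go negative, only order matters
    if after is None:
        return before + GAP  # append at the bottom
    if after - before < 2:
        return None  # no free integer left between the neighbours
    return (before + after) // 2


def renumber(count: int) -> list[int]:
    """Fresh, evenly spaced keys for a playlist of count songs."""
    return [GAP * (i + 1) for i in range(count)]
```

For example, moving a song between the songs at 1024 and 2048 gives it position 1536, and no other song is touched.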

NoSQL

CAP
- Consistency
- Availability
- Partition tolerance

CAP (2)
- CA seems best for us
  - Partition tolerance is useless for us
  - But it's the core idea of NoSQL? It allows horizontal scaling, and people like that
  - Some say CP == CA (source: http://dbmsmusings.blogspot.fi/2010/04/problems-with-cap-and-yahoos-little.html)
- AP might be OK too, with eventual consistency
  - MongoDB is like this, and people use it for the strangest things

Screw CAP

Why NoSQL?
- Data is simple, just lists of objects
  - Not relational data: Songs, Playlists, Users
- Don't need big queries, joins, analytics, versioning, or anything super-
- Document-oriented systems seem nice for this
  - Maybe JSON-oriented?

Why NoSQL? (2)
"try to limit the work done over your data and just store it, then retrieve it and show it to the user, do not over process the information. Manipulate JSON on the user interface and send it to the database with few or even none modification."
djondb (http://djondb.com/documentation.html)
I like this idea.

Why NoSQL? (3)
- Standalone / embeddable / portable: most SQLite replacement suggestions were NoSQL systems
- Schemaless
- Think different

Why NoSQL? (4)
- NoSQL systems usually concentrate on performance and scalability
- I'm not really concerned about those things right now
- Maybe I should not pick NoSQL then?

Why NoSQL? (5)
- Try new things
- Experiment
- It's what the cool kids use
- And in the end: tech doesn't matter. Until it does.

Choices

Criteria
STOP

Criteria
- Beatstream's backend is small-scale:
  - Fewer than 100,000 rows
  - Performance is rarely a problem
  - No horizontal scaling
- And I'm stressing over the database choice?

Criteria
- Standalone / embeddable / portable (no separate server software)
- Lightweight
- Keeps It Simple
- Can store and access our data easily: key-value, document-oriented, column-oriented, or something else that fits
- Good performance
- No availability, concurrency or consistency problems
- (Works with Ruby, or with Java for use with JRuby)

Criteria (2)
- In the future, someone might create a Spotify competitor using Beatstream, with millions of users
- Then scaling etc. becomes important, but it's not important now

Choices
Key-value
- Kyoto Cabinet
- LevelDB

Choices (2)
Document-oriented
- MongoDB
- CouchDB
- RavenDB
- Terrastore

Choices (3)
Other
- db4o

Findings

Kyoto Cabinet

Kyoto Cabinet
- Key-value store
- Standalone file-based database (also in-memory)
- Support for many languages (Ruby, Java, C#, PHP, etc.)
- Popular (community)
- Hash table or B+ tree based
  - Can't decide which one would be better for Beatstream; have to test
  - Hash table: records come back in random (unsorted) order, which is not an issue since sorting is done in the frontend
- Source: http://fallabs.com/kyotocabinet/

Kyoto Cabinet (2)
Notes:
- Can replace SQLite in apps that store simple data
- How do I store song and playlist data in a key-value store?
  - Need two collections/tables: songs and playlists
  - Separate database files for songs and playlists?
  - key: filepath --> value: song meta-data as JSON?
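The key-value layout considered on this slide can be tried without Kyoto Cabinet: one database file per collection, the file path as the key, and the song meta-data as a JSON value. The sketch below uses Python's standard `dbm` module purely as a stand-in key-value store. It is not Kyoto Cabinet's API, and `put_song`/`get_song` are hypothetical helper names.

```python
import dbm
import json
import os
import tempfile


def put_song(db, song: dict) -> None:
    """Store one song: key = file path, value = the song's meta-data as JSON."""
    db[song["path"].encode("utf-8")] = json.dumps(song).encode("utf-8")


def get_song(db, path: str):
    """Return the meta-data stored for path, or None if there is no such song."""
    raw = db.get(path.encode("utf-8"))
    return None if raw is None else json.loads(raw.decode("utf-8"))


# One database file per collection ("songs" here; "playlists" would get its own file).
with tempfile.TemporaryDirectory() as data_dir:
    with dbm.open(os.path.join(data_dir, "songs"), "c") as songs:
        put_song(songs, {"path": "Band/Album/01.mp3", "title": "Intro", "tracknum": 1})
        print(get_song(songs, "Band/Album/01.mp3")["title"])  # Intro
```

Songs map naturally onto this layout. Playlists are the harder part, because their order and ownership have to be encoded somehow.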

LevelDB

LevelDB
- Key-value store
- Standalone file-based database
- Support for many languages (C/C++, Ruby, Java)
- Built by Google for use in Google Chrome
- Sorted by key
- Fast writes & reads, slow if values are large
- Sources: http://code.google.com/p/leveldb/, http://en.wikipedia.org/wiki/leveldb

LevelDB (2)
Notes:
- Same as with Kyoto Cabinet: good for simple use, but how do I use key-value with Beatstream?
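One possible answer to "how do I use key-value with Beatstream?" leans on LevelDB's sorted keys. Encode the user, the playlist and the position into the key, and a scan over a key prefix returns one user's playlist in order. The sketch below fakes a sorted key-value store with Python's `bisect` module; it is not LevelDB's API. The key format `playlist:<user>:<playlist>:<position>` is a hypothetical scheme and assumes that user and playlist names never contain `:`.

```python
import bisect
from typing import Iterator, Tuple


class SortedKV:
    """Tiny in-memory stand-in for a store that keeps its keys sorted."""

    def __init__(self) -> None:
        self._keys: list[str] = []  # always kept in sorted order
        self._values: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


def playlist_key(user: str, playlist: str, position: int) -> str:
    # Zero-padding makes string order match numeric order of positions.
    return f"playlist:{user}:{playlist}:{position:06d}"


kv = SortedKV()
kv.put(playlist_key("ajk", "mix", 2), "Band/Album/03.mp3")
kv.put(playlist_key("ajk", "mix", 1), "Band/Album/01.mp3")
kv.put(playlist_key("other", "mix", 1), "Else/01.mp3")
print([path for _, path in kv.scan_prefix("playlist:ajk:mix:")])
# ['Band/Album/01.mp3', 'Band/Album/03.mp3']
```

The same idea covers "fetch only a certain user's playlists": scan the prefix `playlist:ajk:` instead.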

MongoDB

MongoDB
- Document-oriented
  - JSON-style documents!
  - Collections are logical and easy!
- For many languages (C/C++, Ruby, Java, C#, PHP, ...)
- Easy to use, simple
  - "Mongo is a schemaless relational database" (some people)
- Indexes instead of map/reduce functions
- Active community
  - Lots of plugins, etc.
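In a document store, the "copy the song row or use a relation?" question becomes embedding versus referencing. The sketch below shows both shapes as plain Python dicts in the JSON style that document databases store. The field names, and using the file path as MongoDB's `_id`, are hypothetical. `find` and `resolve` are simplified stand-ins written for illustration, not MongoDB driver calls.

```python
# Hypothetical documents in a "songs" collection; the file path is used as _id.
songs = [
    {"_id": "Band/Album/01.mp3", "title": "Intro", "artist": "Band", "tracknum": 1},
    {"_id": "Band/Album/02.mp3", "title": "Outro", "artist": "Band", "tracknum": 2},
]

# Option A (relation): the playlist stores song ids; array order is playlist order.
playlist_ref = {
    "owner": "ajk",
    "name": "Mix",
    "songs": ["Band/Album/02.mp3", "Band/Album/01.mp3"],
}

# Option B ("copy the song row"): the playlist embeds a copy of each song document.
playlist_embedded = {
    "owner": "ajk",
    "name": "Mix",
    "songs": [dict(songs[1]), dict(songs[0])],
}


def find(collection, query):
    """Simplified stand-in for a document query: equality on top-level fields only."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in query.items())]


def resolve(playlist, songs_by_id):
    """Replace a referencing playlist's song ids with the full song documents, in order."""
    return [songs_by_id[song_id] for song_id in playlist["songs"]]


songs_by_id = {song["_id"]: song for song in songs}
print([s["title"] for s in resolve(playlist_ref, songs_by_id)])  # ['Outro', 'Intro']
print(find([playlist_ref], {"owner": "ajk"})[0]["name"])  # Mix
```

Embedding makes reading a playlist a single lookup, but song meta-data is duplicated and must be updated in every playlist that contains it. Referencing keeps one copy per song at the cost of a second lookup.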

MongoDB (2)
Notes:
- A crash right after a successful write might lose data
- Embedding is not simple: need to build/find a C++ wrapper & launch the DB process from the app
- source: http://stackoverflow.com/questions/6115637/can-mongodb-be-used-as-an-embedded-database

CouchDB

CouchDB
- Document-oriented
- For many languages

CouchDB (2)
Notes:
- I would have an HTTP API accessing an HTTP API?
- Embedding is hard
  - Need to install Erlang somehow on the user's computer

RavenDB

RavenDB
- Document-oriented
- Standalone directory/file-based database
- For .NET and JavaScript (NodeJS)
- For detailed info on how RavenDB works, listen: http://herdingcode.com/wp-content/uploads/herdingcode-0083-ayende-rahien-on-RavenDB.mp3

RavenDB (2)
Notes:
- No Ruby support :(

Terrastore

Terrastore
- Document-oriented
- For Java; maybe it can be used from JRuby?
- Main feature is scalability without sacrificing consistency
- Seems easy to use

Terrastore (2)
Notes:
- Could not find out how to run it standalone

db4o

db4o
- Object database
- For Java
- Simple, easy
- Sidenote: works on Android out of the box

db4o (2)
Notes:
- Known issue with objects sometimes duplicating themselves

Eliminated Choices

Eliminated Choices
- Berkeley DB: no time to investigate
- Cassandra: the right choice when you need scalability and high availability
- SimpleDB: optimized to provide high availability; not really standalone / embeddable / portable, but in the cloud and invisible
- djondb: not standalone / embeddable / portable
- Couchbase: could not find a way to run it embedded; maybe it's the same as with CouchDB

Conclusions

Conclusions
- NoSQL systems promote their horizontal scalability, replication, sharding, etc.
  - Features I don't really care about right now
- Feels like I'm looking at the wrong thing in the wrong place (for Beatstream, at least)
- Only time will tell

Conclusions (2)
- MongoDB
- CouchDB
- Kyoto Cabinet

Conclusions (3)
- Simple key-values in SQLite: Kyoto Cabinet and LevelDB seem like excellent replacements
- Use cases: queue, word dictionary, user database, document database, session management, CMS cache

Conclusions (4)
- More complex relations: have a look at MongoDB, CouchDB or RavenDB (.NET)

Extra
- Later: convert the Users table in SQLite to the new NoSQL database
  - Songs can be re-created
  - Playlists is a new feature that hasn't been released yet
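Only the Users table has to survive the switch, so the migration can be a one-off copy. The sketch below is in Python for illustration. It assumes a Rails-style `users` table with the four fields listed on the Users slide. It also assumes a target store that behaves like a bytes-to-bytes mapping, such as a database opened with Python's `dbm` module or a plain dict in tests. `migrate_users` is a hypothetical helper name.

```python
import json
import sqlite3


def migrate_users(conn: sqlite3.Connection, store) -> int:
    """Copy every row of the users table into store, keyed by username.

    store can be any mapping that accepts bytes keys and values.
    Returns the number of users copied.
    """
    conn.row_factory = sqlite3.Row  # rows can then be turned into dicts by column name
    copied = 0
    for row in conn.execute("SELECT username, email, password, lastfm_key FROM users"):
        user = dict(row)
        store[user["username"].encode("utf-8")] = json.dumps(user).encode("utf-8")
        copied += 1
    return copied


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT,"
    " password TEXT, lastfm_key TEXT)"
)
conn.execute(
    "INSERT INTO users (username, email, password, lastfm_key)"
    " VALUES ('ajk', 'ajk@example.com', 'hash', 'key')"
)
store = {}
print(migrate_users(conn, store), list(store))  # 1 [b'ajk']
```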

Extra (2)
- Redis could be embeddable: "[communicate over unix domain socket] you can fork your main process, then run one of the exec*() functions in the child to start Redis."
- source: http://code.google.com/p/redis/issues/detail?id=276
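The fork-and-exec approach quoted above can be sketched in Python for illustration. The sketch assumes a `redis-server` binary is installed and on the `PATH`. `port`, `unixsocket` and `unixsocketperm` are standard Redis configuration directives, passed here on the command line. The socket path and the function names are hypothetical.

```python
import subprocess


def redis_argv(socket_path: str) -> list[str]:
    """Command line for a private Redis that listens only on a Unix domain socket."""
    return [
        "redis-server",
        "--port", "0",                # port 0: do not listen on TCP at all
        "--unixsocket", socket_path,  # accept clients on this socket file instead
        "--unixsocketperm", "700",    # only the owning user may connect
    ]


def start_redis(socket_path: str) -> subprocess.Popen:
    """Start Redis as a child process; subprocess performs the equivalent of fork + exec."""
    return subprocess.Popen(redis_argv(socket_path))
```

The application would call `start_redis("/tmp/beatstream-redis.sock")` at startup, point a Redis client at that socket, and call `terminate()` on the returned `Popen` object at shutdown.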

Thx! ajk.im @darep

Links!
- http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
- http://blog.nahurst.com/visual-guide-to-nosql-systems
- http://www.cs.tut.fi/~tjm/seminars/nosql2012/nosql-intro.pdf
- https://speakerdeck.com/u/kplawver/p/nosql-an-introduction
- http://fallabs.com/kyotocabinet/rubydoc/
- http://blog.creapptives.com/post/8330476086/leveldb-vs-kyoto-cabinet-my-findings
- http://herdingcode.com/wp-content/uploads/herdingcode-0083-ayende-rahien-on-RavenDB.mp3