NoSQL replacement for SQLite (for Beatstream)
Antti-Jussi Kovalainen
Seminar OHJ-1860: NoSQL databases
Background
Inspiration: postgresapp.com
Demo: demo.beatstream.fi (modern desktop browsers, without a Flash blocker)
Backend
- Very light
- Access & modify data
- Relay commands to Last.fm
- Currently Ruby on Rails
So, we'll focus on the backend.
My Super Complex Data Model
[Diagram: Songs, Playlists, Users; annotations "Copy the song row" and "Relation?"]
Data
Users
- Few users
- Not many fields: username, email, password, lastfm_key
- Read when a user logs in
- Written rarely
- Needed because users have their own playlists
Songs
- Huge list of objects (array)
- Read after the user logs in
- Written rarely
- Basically just metadata about a song: title, artist, album, tracknum, path, etc.
Playlists
- Possibly huge lists of objects (arrays)
- Read a good amount during a day
- Written a lot (probably)
- Owned by a user
- Contain songs
- Also need to track each song's position on the playlist
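One way to pin this data model down, as a minimal Python sketch: the class and field names (Song, Playlist, PlaylistEntry, song_path, owner) are assumptions for illustration, not Beatstream's actual code. Each playlist entry stores an explicit position and references the song by its path (copying the whole song row into the playlist would be the alternative).

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    path: str      # assumed unique, so it doubles as the song's identifier
    title: str
    artist: str
    album: str
    tracknum: int


@dataclass
class PlaylistEntry:
    position: int   # the song's position on the playlist
    song_path: str  # reference to Song.path (hypothetical design choice)


@dataclass
class Playlist:
    id: int
    owner: str      # username of the owning user
    name: str
    entries: list[PlaylistEntry] = field(default_factory=list)


def songs_in_order(playlist, library):
    """Resolve a playlist's entries into Song objects, ordered by position."""
    ordered = sorted(playlist.entries, key=lambda entry: entry.position)
    return [library[entry.song_path] for entry in ordered]
```

The library is just a dict from path to Song, so resolving a playlist is one lookup per entry.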
SQLite
Why I chose SQLite
- Easy to use, simple, familiar
- Ready to use in a new Rails project
- Simple data model, so get a simple DBMS?
- Can easily implement CRUD, playlist sorting, etc.
- Great for rapid prototyping
- Doesn't require a separate server installation
Why Replace SQLite?
Why Replace?
1. Fear of bad concurrency
   - Multiple users + SQLite = bad memories
   - Writing playlists takes time
   - Writing the songs list takes time
   - SQLite locks up or corrupts data
   - Sadness
2. Try something new
   - Schemaless == even better for prototyping?
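To make "SQLite locks up" concrete: SQLite allows only one writer at a time, so a second writer either waits (busy timeout) or fails with "database is locked". A minimal Python sketch using the standard sqlite3 module; the table and file names are made up for the demo, and it shows locking only, not data corruption.

```python
import os
import sqlite3
import tempfile


def demonstrate_write_lock():
    """Open a write transaction, then try a second write; return the second writer's error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute(
            "CREATE TABLE playlist_songs (playlist_id INTEGER, position INTEGER, song_path TEXT)"
        )
        setup.commit()
        setup.close()

        # Writer 1 manages its own transaction (isolation_level=None = autocommit mode).
        writer1 = sqlite3.connect(path, isolation_level=None)
        writer1.execute("BEGIN IMMEDIATE")  # take the write (RESERVED) lock right away
        writer1.execute("INSERT INTO playlist_songs VALUES (1, 1, 'a.mp3')")

        # Writer 2 uses timeout=0: fail at once instead of waiting for the lock.
        writer2 = sqlite3.connect(path, timeout=0)
        error = None
        try:
            writer2.execute("INSERT INTO playlist_songs VALUES (1, 2, 'b.mp3')")
        except sqlite3.OperationalError as exc:
            error = str(exc)  # expected: "database is locked"
        finally:
            writer2.rollback()  # release anything writer 2 still holds
            writer2.close()
            writer1.execute("COMMIT")
            writer1.close()
        return error
```

With Python's default timeout (5 seconds) the second writer would wait instead of failing immediately. WAL mode (PRAGMA journal_mode=WAL) lets readers run alongside a writer, but writes are still serialized.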
Sidenote
At The Moment
- Moved to JSON files
  - Songs and playlists are in JSON files
  - Users are in SQLite
- SQLite was just getting in the way
  - SQLite <--> SQL result <--> JSON
- Keep It Simple
- But this feels kinda icky
songs.json: over 9000 objects
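The JSON-file approach has its own concurrency trap: a crash halfway through rewriting songs.json leaves a truncated file. A common mitigation, sketched here in Python under the assumption that each JSON file is always rewritten as a whole, is to write a temporary file and atomically rename it over the old one. save_json_atomically and load_json are hypothetical helpers.

```python
import json
import os
import tempfile


def save_json_atomically(path, data):
    """Replace the JSON file at path in one step: readers see the old file or the new one, never half of it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before swapping files
        os.replace(tmp_path, path)  # atomic rename on POSIX systems
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

This only prevents half-written files; two processes saving the same playlist at once still race, and the last write wins.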
Back to regular programme
Features We Want
- Standalone / embeddable / portable
  - Can embed into the application, invisible to the Beatstream server user or admin
  - No separate server installation, etc.
- Simple, easy to use
- SQLite-like performance or better
- Lightweight
- No availability, concurrency or consistency problems
  - The database can't be the reason a song won't play
  - When a song is added to a playlist, it should be there, and stay there
  - The frontend asks only once which songs are in the DB, and they should always be there
Features We Want (2)
- Can store the whole song library
  - Also read it all fast (or we cache it)
  - No need for sorting
- Can store sort information somehow
  - Sorting of playlists & playlist songs
  - Not as obvious as it sounds
- Can fetch only a certain user's playlists
  - Relational data!
- (Please, work with Ruby or JRuby)
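Why playlist sorting is "not as obvious as it sounds": if the order is stored as explicit positions, moving one song renumbers every entry between its old and new index. A minimal Python sketch; move_song and changed_positions are hypothetical helpers, and changed_positions assumes a song appears at most once per playlist.

```python
def move_song(song_paths, from_index, to_index):
    """Move one song and return the full list of (position, song_path) pairs to persist."""
    reordered = list(song_paths)  # don't mutate the caller's list
    song = reordered.pop(from_index)
    reordered.insert(to_index, song)
    # Positions are 1-based and stored explicitly, because a key-value or
    # document store will not necessarily keep the list order for us.
    return [(position, path) for position, path in enumerate(reordered, start=1)]


def changed_positions(old_paths, new_pairs):
    """Return only the entries whose position changed, i.e. what must be rewritten.

    Assumes each song path occurs at most once in the playlist.
    """
    old = {path: position for position, path in enumerate(old_paths, start=1)}
    return [(position, path) for position, path in new_pairs if old[path] != position]
```

Moving the last song of a 500-song playlist to the top changes all 500 positions, which matters when playlists are written a lot.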
NoSQL
CAP: Consistency, Availability, Partition tolerance
CAP (2)
- CA seems best for us
  - Partition tolerance is useless
  - But it's the core idea of NoSQL? Allows horizontal scaling, and people like that
  - Some say CP == CA (source: http://dbmsmusings.blogspot.fi/2010/04/problems-with-cap-and-yahoos-little.html)
- AP might be OK too, with eventual consistency
  - MongoDB is like this, and people use it for the strangest things
Screw CAP
Why NoSQL?
- Data is simple, just lists of objects
  - Not relational data
  - Songs, Playlists, Users
- Don't need big queries, joins, analytics, versioning, or anything super-
- Document-oriented systems seem nice for this
  - Maybe JSON-oriented?
Why NoSQL? (2)
"try to limit the work done over your data and just store it, then retrieve it and show it to the user, do not over process the information. Manipulate JSON on the user interface and send it to the database with few or even none modification."
(djondb, http://djondb.com/documentation.html)
I like this idea.
Why NoSQL? (3)
- Standalone / embeddable / portable
  - Most SQLite replacement suggestions were NoSQL systems
- Schemaless
- Think different
Why NoSQL? (4)
- NoSQL systems usually concentrate on performance and scalability
- I'm not really concerned about those things right now
- Maybe I should not pick NoSQL then?
Why NoSQL? (5)
- Try new things
- Experiment
- It's what the cool kids use
- And in the end, tech doesn't matter. Until it does.
Choices
Criteria STOP
Criteria
Beatstream's backend is small-scale, with:
- Fewer than 100,000 rows
- Performance rarely a problem
- No horizontal scaling
And I'm stressing over the database choice?
Criteria
- Standalone / embeddable / portable (no separate server software)
- Lightweight
- Keeps It Simple
- Can store and access our data easily
  - Key-value, document-oriented, column-oriented, or something else that fits
- Good performance
- No availability, concurrency or consistency problems
- (Works with Ruby, or with Java for use with JRuby)
Criteria (2)
In the future:
- Someone might create a Spotify competitor using Beatstream, with millions of users
- Scaling, etc. becomes important, but it's not important now
Choices
Key-value:
- Kyoto Cabinet
- LevelDB
Choices (2)
Document-oriented:
- MongoDB
- CouchDB
- RavenDB
- Terrastore
Choices (3)
Other:
- db4o
Findings
Kyoto Cabinet
Kyoto Cabinet
- Key-value store
- Standalone file-based database (also in-memory)
- Support for many languages (Ruby, Java, C#, PHP, etc.)
- Popular (community)
- Hash table or B+ tree based
  - Can't decide which one would be better for Beatstream; have to test
  - Hash table: records come out in random rather than sorted order; not an issue, since sorting happens in the frontend
Source: http://fallabs.com/kyotocabinet/
Kyoto Cabinet (2)
Notes:
- Can replace SQLite in apps that store simple data
- How do I store song and playlist data in a key-value store?
  - Need two collections/tables: songs and playlists
  - Separate database files for songs and playlists?
  - key: filepath --> value: song metadata as JSON?
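A sketch of the "key: filepath --> value: song metadata as JSON" idea, using Python's built-in dbm module as a stand-in key-value store. This is not Kyoto Cabinet's API; it only shows the key/value layout for one songs database file (a playlists file would follow the same pattern). put_song, get_song and all_songs are hypothetical helpers.

```python
import dbm
import json

# Stand-in key-value store: Python's dbm, NOT Kyoto Cabinet's API.
# Layout: key = song file path, value = song metadata serialised as JSON.


def put_song(songs_db, song):
    songs_db[song["path"]] = json.dumps(song)


def get_song(songs_db, path):
    raw = songs_db.get(path)  # bytes, or None if the key is missing
    return None if raw is None else json.loads(raw)


def all_songs(songs_db):
    # Reading the whole library means walking every key; order is not guaranteed.
    return [json.loads(songs_db[key]) for key in songs_db.keys()]
```

Usage: open the store with dbm.open("songs", "c") in a with block and pass the handle to the helpers.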
LevelDB
LevelDB
- Key-value store
- Standalone file-based database
- Support for many languages (C/C++, Ruby, Java)
- Built by Google for use in Google Chrome
- Keeps data sorted by key
- Fast writes & reads, slow if values are large
Sources: http://code.google.com/p/leveldb/, http://en.wikipedia.org/wiki/leveldb
LevelDB (2)
Notes:
- Same as with Kyoto Cabinet: good for simple use, but how do I use key-value with Beatstream?
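One possible answer for an ordered store like LevelDB: encode the relation in the key, for example playlist:<username>:<id>, so all of one user's playlists sit next to each other and can be read with a single range scan. The Python class below only imitates an ordered store in memory (it is not LevelDB's API); the key scheme is an assumption and requires that usernames contain no colons.

```python
import bisect


class SortedKV:
    """In-memory stand-in for an ordered key-value store (keys kept sorted)."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted order: no later key can match either
            yield key, self._values[key]


def playlist_key(username, playlist_id):
    # Zero-padded id so lexicographic key order matches numeric order.
    return f"playlist:{username}:{playlist_id:06d}"


def user_playlists(kv, username):
    return [value for _, value in kv.scan_prefix(f"playlist:{username}:")]
```

LevelDB's iterators can seek to a key and walk forward in key order, which is the behaviour scan_prefix imitates here.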
MongoDB
MongoDB
- Document-oriented
  - JSON-style documents!
  - Collections are logical and easy!
- For many languages (C/C++, Ruby, Java, C#, PHP, ...)
- Easy to use, simple
  - "Mongo is a schemaless relational database" (some people)
- Indexes instead of map/reduce functions
- Active community
  - Lots of plugins, etc.
MongoDB (2)
Notes:
- Crash right after a successful write: might lose data
- Embedding is not simple: need to build/find a C++ wrapper & launch the DB process in the app
Source: http://stackoverflow.com/questions/6115637/can-mongodb-be-used-as-an-embedded-database
CouchDB
CouchDB
- Document-oriented
- For many languages
CouchDB (2)
Notes:
- I would have an HTTP API accessing an HTTP API?
- Embedding is hard
  - Need to install Erlang somehow on the user's computer
RavenDB
RavenDB
- Document-oriented
- Standalone directory/file-based database
- For .NET and JavaScript (Node.js)
Detailed info on how RavenDB works, listen to: http://herdingcode.com/wp-content/uploads/herdingcode-0083-ayende-rahien-on-RavenDB.mp3
RavenDB (2)
Notes:
- No Ruby support :(
Terrastore
Terrastore
- Document-oriented
- For Java; maybe usable from JRuby?
- Main feature is scalability without sacrificing consistency
- Seems easy to use
Terrastore (2)
Notes:
- Could not find out how to run it standalone
db4o
db4o
- Object database
- For Java
- Simple, easy
- Sidenote: works on Android out of the box
db4o (2)
Notes:
- Known issue with objects sometimes duplicating themselves
Eliminated Choices
Eliminated Choices
- Berkeley DB
  - No time to investigate
- Cassandra
  - "the right choice when you need scalability and high availability"
- SimpleDB
  - Optimized to provide high availability
  - Not really standalone / embeddable / portable, but in the cloud and invisible
- djondb
  - Not standalone / embeddable / portable
- Couchbase
  - Could not find a way to run it embedded; maybe it's the same as with CouchDB
Conclusions
Conclusions
- NoSQL systems promote their horizontal scalability, replication, sharding, etc.
  - Features I don't really care about right now
- Feels like I'm looking at the wrong thing in the wrong place (for Beatstream, at least)
- Only time will tell
Conclusions (2)
- MongoDB
- CouchDB
- Kyoto Cabinet
Conclusions (3)
- For simple key-value data currently kept in SQLite, Kyoto Cabinet and LevelDB seem like excellent replacements
- Use cases: queue, word dictionary, user database, document database, session management, CMS cache
Conclusions (4)
- For more complex relations, have a look at MongoDB, CouchDB or RavenDB (.NET)
Extra
Later:
- Convert the Users table in SQLite to the new NoSQL database
- Songs can be re-created
- Playlists are a new feature that hasn't been released yet
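Converting the Users table could be a one-off script. A minimal Python sketch, assuming the SQLite table is called users and has the columns listed on the Users slide (username, email, password, lastfm_key), and using Python's dbm module as a placeholder for whichever key-value store gets chosen; migrate_users is hypothetical.

```python
import dbm
import json
import sqlite3


def migrate_users(sqlite_path, dbm_path):
    """Copy every row of the (assumed) users table into a key-value store keyed by username."""
    source = sqlite3.connect(sqlite_path)
    source.row_factory = sqlite3.Row  # rows can be turned into dicts by column name
    try:
        rows = source.execute(
            "SELECT username, email, password, lastfm_key FROM users"
        ).fetchall()
    finally:
        source.close()

    with dbm.open(dbm_path, "c") as target:
        for row in rows:
            # Values are copied as-is: whatever the password column holds
            # (ideally a hash) moves over unchanged.
            target[row["username"]] = json.dumps(dict(row))
    return len(rows)
```

Songs don't need this treatment, since they can be re-created by rescanning the library.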
Extra (2)
Redis could be embeddable:
"[communicate over unix domain socket] you can fork your main process, then run one of the exec*() functions in the child to start Redis."
Source: http://code.google.com/p/redis/issues/detail?id=276
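The fork/exec approach from the quote, sketched in Python for POSIX systems. It assumes redis-server is on the PATH; --port 0 turns off the TCP listener and --unixsocket makes Redis listen on the given socket path, so only local processes can reach it. redis_argv and spawn_redis are hypothetical names.

```python
import os


def redis_argv(socket_path):
    """Command line for a private Redis: no TCP listener, only a Unix domain socket."""
    return ["redis-server", "--port", "0", "--unixsocket", socket_path]


def spawn_redis(socket_path):
    """Fork the current process and exec redis-server in the child (POSIX only).

    Returns the child's pid to the parent so it can stop Redis later.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with redis-server.
        try:
            os.execvp("redis-server", redis_argv(socket_path))
        finally:
            # Only reached if exec failed; never fall back into the parent's code.
            os._exit(127)
    return pid
```

The app would then connect through the socket path and stop Redis with os.kill(pid, signal.SIGTERM) on shutdown.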
Thx! ajk.im @darep
Links!
- http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
- http://blog.nahurst.com/visual-guide-to-nosql-systems
- http://www.cs.tut.fi/~tjm/seminars/nosql2012/nosql-intro.pdf
- https://speakerdeck.com/u/kplawver/p/nosql-an-introduction
- http://fallabs.com/kyotocabinet/rubydoc/
- http://blog.creapptives.com/post/8330476086/leveldb-vs-kyoto-cabinet-my-findings
- http://herdingcode.com/wp-content/uploads/herdingcode-0083-ayende-rahien-on-RavenDB.mp3