NoSQL Database Options




Introduction

For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has the most promise to be beneficial to me in my endeavors as a game developer. I chose Riak because I want to learn more about key-value stores, and neither of my other two choices is one: MongoDB is a document store and Cassandra is a column family store. For my analysis, I will compare the systems as potential storage options, as if presenting them to a (potentially non-technical) direct supervisor.

Analysis: MongoDB

MongoDB was created in 2007 by a company called 10gen, which changed its name to MongoDB Inc. following its document store's wild success [1]. The product went open source in 2009 and has since gained great popularity.

MongoDB's data model is document-oriented and highly flexible. For example, we can retrieve data by looking it up using regular expressions. This means we can pattern match instead of knowing exactly what we're looking for! In addition, every field can be indexed in MongoDB, which gives us lookup behavior similar to our existing relational databases.

MongoDB stores data on disk and scales horizontally through a process called sharding [2]. In practice, sharding partitions our data across different servers, and each shard can be replicated so that if one server dies, no data is lost. We can also add new servers to the database without having to shut it down! This way we can add more space without having to suspend our services.

In the CAP model, I would describe MongoDB as sacrificing C, or immediate consistency, in exchange for being available and partition tolerant (hardware failures do not affect data availability). Eventual consistency is achieved by propagating changes across all servers, but before those changes spread, it is possible to retrieve stale data from the database. This means that MongoDB is useful for running the many web services that value availability over consistency, but the shortcoming is worth keeping in mind. MongoDB is also ACID compliant only per document, not per transaction as we would think of it in relational terms.

As previously mentioned, MongoDB uses sharding to handle scaling issues. Given its popularity in the web industry, it's safe to say MongoDB scales well. Having said that, many companies, including Netflix, chose Cassandra over MongoDB precisely because of scalability. So it may not be the best at scaling, but it scales out far more readily than relational databases.

Analysis: Cassandra

Cassandra started off as a column family store created by Facebook employees to drive their Inbox Search functionality. They open sourced the project in 2008, and the Apache Software Foundation picked it up and carried it forward. Today, Cassandra has evolved into a partitioned row store. Cassandra uses the Cassandra Query Language (CQL), which looks and feels like SQL, and that is quite helpful for existing relational database programmers. Following this parallel, a column family in Cassandra is similar to a table in a relational database. Cassandra has also supported MapReduce functionality since version 0.6.

Cassandra stores data on disk across multiple nodes in a multi-server environment. The disk storage is organized into tables that are distributed, multi-dimensional maps indexed by key.

Cassandra, like MongoDB in practice, settles for eventual consistency, focusing on availability and partition tolerance. However, unlike MongoDB, Cassandra supports tunable consistency [3]. This means that availability and consistency can be traded off explicitly by specifying how many replica nodes must acknowledge a write (or answer a read) before the operation is considered successful.
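To make the tunable consistency tradeoff concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's API: the names `Replica`, `ToyCluster`, and `quorum_overlaps` are hypothetical, and the model deliberately leaves unacknowledged replicas stale to show the worst case (a real cluster would keep propagating the write in the background). The idea it illustrates is that with N replicas, a read that consults R of them is guaranteed to see a write acknowledged by W of them whenever R + W > N.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """One node's copy of a single value, tagged with a write timestamp."""
    value: Optional[str] = None
    timestamp: int = 0


class ToyCluster:
    """Hypothetical cluster holding N replicas of one key (not Cassandra's API)."""

    def __init__(self, n: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(n)]

    def write(self, value: str, timestamp: int, w: int) -> None:
        # Worst case: only the first w replicas receive the write
        # before the client is told that it succeeded.
        for replica in self.replicas[:w]:
            replica.value = value
            replica.timestamp = timestamp

    def read(self, r: int) -> Optional[str]:
        # Worst case: the read consults the last r replicas, i.e. the ones
        # least likely to have seen the write, and keeps the newest value.
        consulted = self.replicas[-r:]
        return max(consulted, key=lambda rep: rep.timestamp).value


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


cluster = ToyCluster(n=3)
cluster.write("v1", timestamp=1, w=3)  # fully replicated
cluster.write("v2", timestamp=2, w=1)  # acknowledged by one replica only
print(cluster.read(r=1))  # v1: stale, because 1 + 1 <= 3
print(cluster.read(r=3))  # v2: fresh, because 3 + 1 > 3
```

Low values of R and W favor speed and availability, while values satisfying R + W > N favor consistency, which is the tradeoff the text describes.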
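In addition, Cassandra 2.0 introduced lightweight transactions (compare-and-set operations), although Cassandra does not provide full ACID transactions in the relational sense.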

When it comes to scaling, Cassandra seems to be the name of the game. Netflix chose Cassandra over the other two databases analyzed here entirely because of scalability. It supports horizontal scaling (adding new machines) while the database is running. FamilySearch is migrating to it from Oracle relational databases to handle live scaling, as they have weekly spikes in traffic that greatly multiply their live accesses.

Analysis: Riak

Riak is a key-value storage system developed by people who moved from Akamai to Basho Technologies. Its first release was in August of 2009. Their goal was to create a web product that happened to use their own custom datastore on the backend. When the datastore created more interest than the web product, they decided to center their efforts on it, and it became Riak. Since 2009, Riak has matured to offer adaptive CAP approaches in which either eventual or immediate consistency can be supported. Riak uses a RESTful API for its basic operations, such as PUT, GET, DELETE and POST. It also allows for MapReduce use.

In Riak, data is stored as key-value pairs, which can be held in memory, stored on disk, or both. As with the other options discussed here, data is stored across multiple nodes on a network. Keys are located in near-constant time by hashing them for lookup.

Riak offers tunable consistency, similarly to Cassandra, but per bucket of key-value pairs. This allows each bucket to use either eventual or immediate consistency. As far as ACID goes, Riak does not support atomic transactions, and is therefore not ACID compliant.

Riak, like many other NoSQL databases, was intended to be used across a network with many nodes for redundancy. While the free version stops there, Riak Enterprise can duplicate data across multiple data centers, not just across multiple servers in one center. This puts it on the same scale as Cassandra, but informal comparisons made by business leaders between the two say that Riak doesn't scale quite as well.
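The hash-based key lookup mentioned for Riak can be sketched as follows. This is a simplified illustration of hashing a key onto a ring of partitions, not Riak's actual implementation: the function names, the round-robin assignment of partitions to nodes, and the default values for `num_partitions` and `n_val` are all assumptions made for the example. The point is that finding where a key lives takes one hash computation and some arithmetic, regardless of how many keys are stored.

```python
import hashlib
from typing import List

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


def key_position(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to an integer position on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(bucket: str, key: str, num_partitions: int = 64) -> int:
    """Map the ring position to one of num_partitions equal-sized slices."""
    return key_position(bucket, key) * num_partitions // RING_SIZE


def nodes_for(bucket: str, key: str, nodes: List[str],
              num_partitions: int = 64, n_val: int = 3) -> List[str]:
    """Pick up to n_val distinct nodes by walking the ring from the key's partition.

    Assumption for this sketch: partition p is owned by nodes[p % len(nodes)].
    """
    wanted = min(n_val, len(nodes))
    first = partition_for(bucket, key, num_partitions)
    owners: List[str] = []
    for offset in range(num_partitions):
        partition = (first + offset) % num_partitions
        node = nodes[partition % len(nodes)]
        if node not in owners:
            owners.append(node)
        if len(owners) >= wanted:
            break
    return owners


cluster_nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(partition_for("users", "alice"))
print(nodes_for("users", "alice", cluster_nodes))
```

Because the mapping depends only on the key and the ring layout, any node can compute which nodes hold a given key without consulting a central directory.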

Difference Comparison

When comparing these three NoSQL databases, the first factor that comes up is scalability. All three of them offer network scaling, but Cassandra seems to outshine the other two in the opinion of large companies that have compared these databases for large-scale use. In terms of data mining use, all three options support MapReduce. In addition, MongoDB supports regex lookup, which neither of the other options explicitly supports.

Conclusion

In conclusion, I believe that for general business use, Cassandra would be the best option. It has the best scalability, lowering the chances of needing a massive database overhaul or migration in the future. It also has a SQL-like language that will make retraining existing SQL developers much easier while still allowing complex queries without using MapReduce. Finally, its consistency is more adjustable than MongoDB's, though perhaps not more than Riak's, which could come in handy as business needs change.

REFERENCES:

[1] Harris, Derrick. 10gen embraces what it created, becomes MongoDB Inc. Gigaom Research, August 27, 2013. https://gigaom.com/2013/08/27/10gen-embraces-what-it-created-becomes-mongodbinc/

[2] MongoDB, Inc. Sharding and MongoDB. https://docs.mongodb.org/manual/sharding/

[3] Configuring Data Consistency. Datastax Documentation, 12 October 2015. http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html