The Quest for Extreme Scalability




As their audiences grow, very successful internet applications all face the same database issue: web servers can be multiplied without much difficulty (scale out), but relational databases cannot. Sustaining a growing database workload requires either buying more powerful hardware (scale up) or relying on clustering. Both solutions increase complexity and cost.

In this context, developers realized that relational databases and query languages might be the bottleneck. They are fundamentally designed for stable workloads and complex data extraction, which are less common in modern applications, where the ability to handle very large data sets while maintaining speed and scalability matters more. This realization led to the creation of the NoSQL movement, based on innovative, open source, non-relational database systems designed to meet specific requirements and manage extreme scalability on very large data sets.

Build to scale

«NoSQL» is a label for database management systems that make it possible to implement databases that are not only based on SQL. NoSQL is usually associated with extreme performance and/or the ability to manage extremely large data sets. NoSQL emerged as a movement in early 2009 during a meetup organized in San Francisco to discuss the growing number of open source distributed database management systems that do not attempt to provide ACID guarantees (atomicity, consistency, isolation, durability). There are close to 100 open source NoSQL projects under way today.

Many NoSQL projects start by implementing a specific data structure to solve a specific problem that SQL databases cannot solve. Despite this common starting point, NoSQL databases vary quite a bit, reflecting the fact that NoSQL is more a label covering a variety of atypical databases than a uniform set of equivalent solutions. What are the main categories of NoSQL databases?

Extreme and affordable scalability

There are four major types of NoSQL data models:
- Key-value stores, which provide a giant hash table to store data, very useful for high-audience applications constantly broadcasting data to their users: Memcached, Redis,
- Bigtable clones, which store data in a large, multi-dimensional sorted map, very useful to store, analyze and retrieve large amounts of data: HBase, Cassandra,
- Document stores, designed for semi-structured information: CouchDB, MongoDB,
- Graph databases, probably the most experimental type, designed for graph-like data: Neo4J.

According to Bruno Michel, lead developer in af83's R&D department, who has taken part in all of af83's NoSQL-related projects, «Solutions like Cassandra are designed as an effective answer to a specific problem, which is scalability with large data sets. Others, like MongoDB, are designed for Web application development in general, allowing more flexibility and performance. Others are meant to be solutions to very specific types of data, like graph databases or projects specialized in geographic data». What is the real-life impact of this variety of approaches?
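To make the four data models listed above more concrete, here is a minimal in-memory sketch in Python. It only illustrates the shape of the data in each model; it is not how Redis, HBase, MongoDB or Neo4J are implemented, and every name in it (`SortedMap`, `find`, `reachable`, the sample keys and documents) is hypothetical.

```python
from bisect import bisect_left, insort

# 1. Key-value store: one big hash table; values are opaque to the store.
key_value = {}
key_value["user:1001:name"] = "alice"


# 2. Bigtable-style store: a sorted map keyed by (row, column, timestamp).
class SortedMap:
    def __init__(self):
        self._keys = []    # kept sorted so that row scans are cheap
        self._values = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def scan_row(self, row):
        """Return every (column, timestamp, value) stored under one row key."""
        start = bisect_left(self._keys, (row,))
        result = []
        for key in self._keys[start:]:
            if key[0] != row:
                break
            result.append((key[1], key[2], self._values[key]))
        return result


wide_table = SortedMap()
wide_table.put("row1", "name", 1, "alice")
wide_table.put("row1", "name", 2, "alicia")   # newer version of the same cell
wide_table.put("row2", "name", 1, "bob")

# 3. Document store: semi-structured records; fields may differ per document.
documents = {
    "doc1": {"name": "Chez Paul", "city": "Paris", "tags": ["bistro"]},
    "doc2": {"name": "Sushi Bar", "city": "Lyon"},
}


def find(docs, **criteria):
    """Return the ids of documents whose fields match all given criteria."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# 4. Graph database: nodes and edges, queried by traversal.
graph = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}


def reachable(edges, start):
    """Return every node that can be reached by following edges from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbor in edges.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen
```

The difference between the models shows up in how each one is queried: by exact key, by row scan over sorted keys, by field match, or by traversal.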

The right tool for the right context

Bruno Michel says: «Benefits of NoSQL solutions depend on use cases». When considering NoSQL, users must select the database that best fits their application. The right choice leads to higher performance or lower cost. Olivier Desmoulin, founder of a geolocalized social network for foodies, certainly understands that: «We are serving up to 60,000 customers per day, using just one low-cost server which serves not only the Rails applications but also the whole MongoDB database. MongoDB's ability to handle geolocation was also very helpful. For a small start-up like us, NoSQL was critical to ensure scalability».

There are many such examples. Bruno Michel: «MongoDB is used by very popular websites like bit.ly, foursquare or disqus for its ability to deliver performance and scalability by using sharding.»

Despite the good news, choosing a NoSQL database is tough. According to Ori Pekelman, CTO of af83, who has extensive experience with NoSQL technologies and has spearheaded the use of NoSQL on numerous projects with af83 customers, «There are more than a hundred NoSQL projects going on, most solutions are very new to the market, mature projects are only two or three years old, and the hierarchy is constantly moving. This is a time of better opportunities for customers, but the choice is tough». Bruno Michel: «In the last three years, a lot of very promising solutions have appeared; some of them proved to be extremely hard to sustain, due to very slow development or project instability». What approach should you take to select and leverage the power of NoSQL?
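Sharding, cited above as the mechanism behind MongoDB's scalability, means splitting one data set across several servers so that each server holds only part of the keys. The sketch below shows the general idea with simple hash-based routing over in-memory dictionaries. It is an illustration under stated assumptions, not MongoDB's actual sharding implementation; the class `ShardedStore` and its methods are hypothetical names.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over a fixed number of shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database server (assumption).
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Use a stable hash so the same key always maps to the same shard,
        # regardless of process or machine.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(shard_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", user_id)
```

Each read or write touches a single shard, which is why adding shards can raise total capacity. A fixed modulo scheme like this one has to move most keys when the shard count changes, so real systems generally rely on more elaborate partitioning.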

Our Recommendations

1. Assess your situation
NoSQL databases were designed to handle very specific tasks: MongoDB was meant to be the database engine of a cloud-based application platform, Cassandra was designed to manage inbox search at Facebook, Memcached was designed to improve caching at LiveJournal. Unless you are going to implement your own NoSQL database management system, you should clearly define your functional requirements first, and then look for the appropriate solution, NoSQL or not: NoSQL should be used when scalability is the requirement, and avoided when running complex queries is the requirement.

2. Be realistic with your scalability requirements
NoSQL is not appropriate for every application and project. Typical use cases are public-facing internet applications with a really large audience and very large data sets. Typical NoSQL databases range from dozens of gigabytes to petabytes. Very few applications fit that definition. Smaller applications can still benefit from the sometimes comparatively lower requirements of NoSQL databases, but should assess whether they can sustain the NoSQL choice: NoSQL expertise is more difficult to find.

3. Check the performance thoroughly
Ori Pekelman: «We have been testing dozens of solutions; performance varies tenfold from one solution to another. Many benchmarks are available online, but due to the variety of approaches, it is sometimes very difficult to find proper feedback from experience on specific data volumes, read/write ratios or query distributions».

4. Pay attention to the open source projects tied to the solutions
Bruno Michel: «NoSQL solutions are only starting to be mature and production-ready, but customers need to be cautious as solutions and projects are not equal and evolve very quickly. Some solutions received a lot of visibility without the ability to deliver».

5. Check the availability of proper tools and documentation
Bruno Michel: «An issue that may be underestimated is the status of tools and documentation related to your solutions. This sort of shortcoming leaves your project at risk, whatever the performance levels of the database».
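Recommendation 3 mentions read/write ratios and query distributions as the parameters that published benchmarks rarely match. A minimal sketch of a workload driver that makes the read/write ratio explicit could look like the following. It assumes the store under test exposes dictionary-style `get` and item assignment, and it uses a plain `dict` as a stand-in; the function `run_workload` and its parameters are hypothetical, and a real benchmark would target the actual database client with realistic data volumes.

```python
import random
import time


def run_workload(store, operations, read_ratio, key_space=1000, seed=0):
    """Run a mixed read/write workload and report counts and elapsed time."""
    rng = random.Random(seed)          # fixed seed keeps runs repeatable
    reads = writes = 0
    started = time.perf_counter()
    for _ in range(operations):
        key = f"key:{rng.randrange(key_space)}"   # uniform key distribution
        if rng.random() < read_ratio:
            store.get(key)
            reads += 1
        else:
            store[key] = "x" * 100                # 100-byte payload (assumption)
            writes += 1
    elapsed = time.perf_counter() - started
    return {"reads": reads, "writes": writes, "seconds": elapsed}


# Compare the same store under read-heavy, balanced and write-heavy mixes.
results = [
    run_workload({}, operations=10_000, read_ratio=ratio)
    for ratio in (0.95, 0.5, 0.05)
]
```

Running the same driver against each candidate database, with the ratio and key distribution taken from your own application, gives figures that are easier to compare than generic online benchmarks.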

Famous NoSQL users
Facebook, Foursquare, Google and Yahoo are all NoSQL users. A quick search on Google will turn up plenty of coverage of their trials, errors and successes with NoSQL.

NoSQL is not always a one-stop data solution
There are cases where NoSQL will meet all your data requirements, but more often than not it will only meet part of them, and you will need to implement a combination of solutions that includes NoSQL.

Innovative features
Depending on the implementation, NoSQL databases often include non-traditional features such as the ability to run in memory (also known as «NoDisk»), sharding, or optimized mechanisms for geolocation.

Required reading
- The Dynamo paper, about Amazon's own highly scalable data store
- The Bigtable paper, about Google's own DBMS
- SQL Databases Don't Scale
- The slideshow from the June 11, 2009 NoSQL meetup in San Francisco