Cloud Scale Distributed Data Storage. Jürmo Mehine

Similar documents

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL Databases. Nikos Parlavantzas

Structured Data Storage

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Domain driven design, NoSQL and multi-model databases

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

Preparing Your Data For Cloud

Applications for Big Data Analytics

NoSQL: Going Beyond Structured Data and RDBMS

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

Slave. Master. Research Scholar, Bharathiar University

Advanced Data Management Technologies

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

3 Case Studies of NoSQL and Java Apps in the Real World

A Study of NoSQL and NewSQL databases for data aggregation on Big Data

Understanding NoSQL Technologies on Windows Azure

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Sentimental Analysis using Hadoop Phase 2: Week 2

Open Source Technologies on Microsoft Azure

Enterprise Operational SQL on Hadoop Trafodion Overview

Databases 2 (VU) ( )

NoSQL Evaluation. A Use Case Oriented Survey

Understanding NoSQL on Microsoft Azure

NoSQL Data Base Basics

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

Big Data Analytics. Rasoul Karimi

How To Scale Out Of A Nosql Database

@tobiastrelle. codecentric AG 1

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

NOSQL DATABASE SYSTEMS

Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014

MongoDB Developer and Administrator Certification Course Agenda

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Challenges for Data Driven Systems

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

NoSQL Databases. Polyglot Persistence

NOSQL DATABASES AND CASSANDRA

NoSQL Roadshow Berlin Kai Spichale

How To Handle Big Data With A Data Scientist

Integrating Big Data into the Computing Curricula

Choosing the right NoSQL database for the job: a quality attribute evaluation

NoSQL. What Is NoSQL? Why NoSQL?

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

NoSQL and Graph Database

Lecture Data Warehouse Systems

Journal of Cloud Computing: Advances, Systems and Applications

An Open Source NoSQL solution for Internet Access Logs Analysis

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Distributed Storage Systems

MEAP Edition Manning Early Access Program Neo4j in Action MEAP version 3

Institutionen för datavetenskap Department of Computer and Information Science

How To Use Big Data For Telco (For A Telco)

An Approach to Implement Map Reduce with NoSQL Databases

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Scalable SQL and NoSQL Data Stores

Introduction to NoSQL

Introduction to NOSQL

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

NoSQL Database Systems and their Security Challenges

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

So What s the Big Deal?

REAL-TIME BIG DATA ANALYTICS

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

NOSQL stores, graph databases and data analytics tools

The Quest for Extreme Scalability

Schema Extraction of Document Database - MongoDB. Master of Engineering. Computer Science and Engineering. Submitted By.

NoSQL Database Options

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

INTRODUCTION TO CASSANDRA

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Open source large scale distributed data management with Google s MapReduce and Bigtable

Application of NoSQL Database in Web Crawling

Can the Elephants Handle the NoSQL Onslaught?

Document Oriented Database

Cloud Big Data Architectures

NoSQL in der Cloud Why? Andreas Hartmann

Data Modeling for Big Data

NoSQL and Hadoop Technologies On Oracle Cloud

Transcription:

Cloud Scale Distributed Data Storage Jürmo Mehine 2014

Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented Column family Graph Five databases Riak MongoDB CouchDB HBase Cassandra 2/28

The Relational Model Data in tables (relations) Connections between data are expressed as foreign key references between rows The expected format of the data is specified with a schema Data is accessed using SQL 3/28

Database Scaling Vertical scaling on one machine Horizontal scaling across a cluster of machines Because of the way data is structured in relational databases, the relational model does not scale well horizontally 4/28

Keys, Values and Aggregates Non-relational data models are based on a simpler keyvalue structure An aggregate is a collection of data that is treated as a unit E.g. a customer and all of his orders In normalized relational databases aggregates are computed using JOIN operations Key-value, document-oriented and column family are said to be aggregate-oriented data models because they store aggregated data together Keyed aggregates make for a natural unit of data sharding 5/28

Benefits of the Key-value Model Horizontal scalability (suitable for cloud computing) Higher access speeds Flexible schemaless models suitable for unstructured data More suitable for Object Oriented languages 6/28

The NoSQL Movement NoSQL is a broad term with no clear boundaries Emergence of persistence solutions using nonrelational data models Schemaless data models Driven partly by the rise of cloud computing Simpler key-value based data models scale better horizontally than the relational model Tradeoff between data consistency and high availability 7/28

The NoSQL Landscape Distributed key-value based data stores Embedded non-relational data stores (LevelDB, Tokyo Cabinet) In-memory data stores (Redis, Memcached) Hosted services (Amazon S3, DynamoDB) Data analysis and processing platforms/tools (Hadoop/HDFS, Pig, Hive) Object databases (ZODB, GemStone) NewSQL (Drizzle, SQLFire) 8/28

Four Non-relational Data Models 9/28

The Key-value Model Data stored as keyvalue pairs The value is an opaque blob to the database Examples: Dynamo, Riak 10/28

The Document-oriented Model Data stored as keyvalue pairs Values have further structure in the form of fields and values Examples: CouchDB, MongoDB 11/28

Example Aggregates described in JSON using map and array data structures 12/28

The Column Family Model Data stored in large tabular structures (sparse matrix) Columns grouped into column families A record or row can be thought of as a two-level map Columns can be added at any time Examples: BigTable, Cassandra, HBase http://www.datastax.com/docs/1.1/ddl/column_family 13/28

Example Aggregates as rows in a column family store 14/28

Graph Databases Data stored as nodes and edges of a graph The nodes and edges can have fields and values Aimed at graph traversal queries in connected data Examples: Neo4J, FlockDB 15/28

Non-relational Data Models In non-relational data stores data is denormalized It is common to also store JSON documents in key-value stores The key-value, document-oriented and column family models are aggregate oriented models The other models are based on the key-value model We will not talk about graph databases, because they are not essential to cloud storage In reality the classification of databases into different models is not as straight forward Multiparadigm databases also exist (e.g ArangoDB) 16/28

Five Databases 17/28

Riak Key-value Based on Amazon's Dynamo specification Distributed decentralized persistent hash table Consistent Hashing http://basho.com/products/riak-overview/ 18/28

Riak Gossip protocol MVCC using vector clocks RESTful query interface Secondary indexes as item metadata Links and link walking Riak search MapReduce in Erlang and JavaScript 19/28

MongoDB Document oriented (BSON) Query language based on JavaScript GridFS for BLOB file storage Master-slave architecture Linking between documents Supports MapReduce in JavaScript http://www.mongodb.org/ 20/28

CouchDB Document oriented (JSON) RESTful query interface Built in web server Web based query front-end Futon http://docs.couchdb.org/en/latest/ 21/28

CouchDB MapReduce is the primary query method (JavaScript and Erlang) Materialized views as results of incremental MapReduce jobs CouchApps javascript-heavy web applications built entirely on top of CouchDB without a separate web server for the logic layer 22/28

HBase Column family Master-slave architecture Built on top of Hadoop Data stored in HDFS A single table with several column families Supports Java MapReduce http://hbase.apache.org/ 23/28

Cassandra Column family Data model from BigTable, distribution model from Dynamo (decentralized) http://cassandra.apache.org/ 24/28

Cassandra Different implementation of column family than HBase The concept of super-columns Static and dynamic column families Dynamic column families as materialized views on data CQL for querying data 25/28

Polyglot Persistence http://martinfowler.com/bliki/polyglotpersistence.html 26/28

Conclusions In recent years there has been a rise in non-relational (NoSQL) data stores This is related to the rise of cloud computing key-value based models offer better horizontal scalability across a cluster of computers The NoSQL landscape is extremely varied The four basic non-relational data models are: The key-value model The document-oriented model The column family model Graph databases 27/28

28/28