Rick Cattell's clustering from "Scalable SQL and NoSQL Data Stores", SIGMOD Record, 2010: extensible record stores, document stores, key-value stores




Columns: Year | System/Paper | Scale to 1000s | Primary Index | Secondary Indexes | Transactions | Joins/Analytics | Integrity Constraints | Views | Language/Algebra | Data model | my label

Year | System/Paper | Feature cells, in column order | Data model | my label
1971 | RDBMS | O | tables | sql-like
2003 | memcached | O O O O O O | key-val | nosql
2004 | MapReduce | O O O O O O | key-val | batch
2005 | CouchDB | record MR O O | document | nosql
2006 | BigTable (HBase) | record compat. w/MR / O O | ext. record | nosql
2007 | MongoDB | EC, record O O O O | document | nosql
2007 | Dynamo | O O O O O O | key-val | nosql
2008 | Pig | O O O / O | tables | sql-like
2008 | Hive | O O O O | tables | sql-like
2008 | Cassandra | EC, record O O | key-val | nosql
2009 | Voldemort | O EC, record O O O O | key-val | nosql
2009 | Riak | EC, record MR O | key-val | nosql
2010 | Dremel | O O O / O | tables | sql-like
2011 | Megastore | entity groups O / O / | tables | nosql
2011 | Tenzing | O O O O | tables | sql-like
2011 | Spark/Shark | O O O O | tables | sql-like
2012 | Spanner? | | tables | sql-like
2012 | Accumulo | record compat. w/MR / O O | ext. record | nosql
2013 | Impala | O O O O | tables | sql-like

Rick Cattell's clustering from "Scalable SQL and NoSQL Data Stores", SIGMOD Record, 2010: extensible record stores, document stores, key-value stores

5/18/2013 Bill Howe, UW

Terminology (Rick Cattell, 2010)
- Document = nested values, extensible records (think XML or JSON)
- Extensible record = families of attributes have a schema, but new attributes may be added
- Key-value object = a set of key-value pairs; no schema, no exposed nesting
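
The three terms can be made concrete with a small sketch. Everything below is hypothetical: the field names, the `FAMILIES` schema, and the `new_record` and `put` helpers are illustrations only and are not taken from Cattell's paper or from any particular system.

```python
# Hypothetical records illustrating the three terms; all names and values are made up.

# Key-value object: a flat set of key-value pairs, no schema, no exposed nesting.
kv_object = {"name": "Ada", "city": "London"}

# Document: nested values (think JSON), and any record may carry extra fields.
document = {
    "name": "Ada",
    "address": {"city": "London", "postcode": "N1"},
    "languages": ["en", "fr"],
}

# Extensible record: the attribute families are fixed by a schema,
# but new attributes may be added inside a family at any time.
FAMILIES = ("profile", "activity")


def new_record():
    """Create an empty record with one attribute map per declared family."""
    return {family: {} for family in FAMILIES}


def put(record, family, attribute, value):
    """Set an attribute, rejecting families that the schema does not declare."""
    if family not in record:
        raise KeyError(f"unknown attribute family: {family}")
    record[family][attribute] = value


row = new_record()
put(row, "profile", "name", "Ada")
put(row, "profile", "nickname", "Countess")  # new attribute, no schema change
```

In this sketch, writing to an undeclared family such as "billing" raises a `KeyError`, while a new attribute inside "profile" is accepted without any schema change.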

NoSQL Features (Rick Cattell, 2010)
- Ability to horizontally scale simple-operation throughput over many servers
  - Simple = key lookups, reads and writes of one or a few records
- The ability to replicate and partition data over many servers
  - Consider sharding and horizontal partitioning to be synonyms
- A simple API, no query language
- A weaker concurrency model than ACID transactions
- Efficient use of distributed indexes and RAM for data storage
- The ability to dynamically add new attributes to data records

ACID vs. BASE (Rick Cattell, 2010)
- ACID = Atomicity, Consistency, Isolation, and Durability
- BASE = Basically Available, Soft state, Eventually consistent
  - Don't use BASE; it didn't stick.
- Aside: Consistency means that any data written to the database must be valid according to all defined rules

Major Impact Systems (Rick Cattell, 2010)
- Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
- Dynamo pioneered the idea of [using] eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
- BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.


Memcached
- Main-memory caching service
  - Basic system: no persistence, replication, or fault tolerance
  - Many extensions provide these features (e.g. membrain, membase)
  - Mature system, still in wide use
- Important concept: consistent hashing

Regular Hashing
- Assign M data keys to N servers
- Assign each key k to server = k mod N
- Example: N=3
  - key 0 -> server 0
  - key 1 -> server 1
  - key 2 -> server 2
  - key 3 -> server 0
  - key 4 -> server 1
- Data keys: k0 = 367, k1 = 452, k2 = 776
- What happens if I increase the number of servers from N to 2N?
  - Every existing key needs to be remapped, and we're screwed.
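
A minimal sketch of the remapping cost under `k mod N` placement. The helper names `server_for` and `moved_fraction` and the key range are assumptions made for illustration. Every key has to be rehashed when N changes; the sketch counts how many of them actually land on a different server.

```python
def server_for(key: int, n_servers: int) -> int:
    """Regular hashing: place key k on server k mod N."""
    return key % n_servers


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when N goes from old_n to new_n."""
    keys = list(keys)
    moved = sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))
    return moved / len(keys)


keys = range(1200)  # hypothetical key population
doubled = moved_fraction(keys, old_n=3, new_n=6)   # N -> 2N
plus_one = moved_fraction(keys, old_n=3, new_n=4)  # N -> N + 1

print(f"3 -> 6 servers: {doubled:.0%} of keys move")
print(f"3 -> 4 servers: {plus_one:.0%} of keys move")
```

For this key range, half of the keys change servers when N doubles from 3 to 6, and three quarters change when a single server is added to 3. Either way a large share of the cached data has to be moved or refetched, which is the problem consistent hashing addresses.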

Consistent Hashing
[Figure labels: server id = 1, 2, 3; data key = 367, 452, 776]

Consistent Hashing: Routing
[Figure labels: server id = 1, 2, 3]
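
Only the figure labels survive for these two slides, so the following is a sketch of the usual ring formulation of consistent hashing rather than a reconstruction of the figures. The assumptions are: servers and keys are hashed onto the same circular space, a key is routed to the first server at or after its position (wrapping around), and positions come from the first 8 bytes of an MD5 digest. The `HashRing` class and the `server-`/`key-` label prefixes are hypothetical.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a label to a point on the ring (assumed: first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with one point per server."""

    def __init__(self, servers):
        self._points = sorted((ring_position(f"server-{s}"), s) for s in servers)

    def add(self, server):
        """Place a new server on the ring; existing points do not move."""
        bisect.insort(self._points, (ring_position(f"server-{server}"), server))

    def lookup(self, key):
        """Route a key to the first server at or after its ring position."""
        pos = ring_position(f"key-{key}")
        i = bisect.bisect_left(self._points, (pos,))
        return self._points[i % len(self._points)][1]  # wrap past the last point


ring = HashRing([1, 2, 3])
before = {k: ring.lookup(k) for k in range(1000)}
ring.add(4)
after = {k: ring.lookup(k) for k in range(1000)}

moved = [k for k in before if before[k] != after[k]]
print(f"{len(moved)} of {len(before)} keys moved, all of them to server 4")
for k in (367, 452, 776):  # the data keys from the slides
    print(f"data key {k} -> server {after[k]}")
```

With this routing rule, adding a server only takes over keys from its neighbor on the ring: any key that changes servers moves to the new server, and all other keys stay where they were. Production systems commonly place several points per server to even out the load, which this sketch omits.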