Distributed Data Stores




Distributed Persistent State
- MapReduce addresses distributed processing of aggregation-based queries
- But how to keep persistent state across a large number of machines?
- Distributed DBMS?
  - High resource requirements for unnecessary components, high total cost of ownership
  - No incremental ("elastic") scalability
  - Replication support for >10^5 nodes? Load balancing?
  - Very strict correctness model (ACID)

Brewer's Conjecture, a.k.a. the CAP Theorem
- Brewer, PODC 2000
- Gilbert, Lynch: ACM SIGACT News 33(2), 2002, pp. 51-59
- Non-functional requirements for distributed data stores: Consistency, Availability, Partition tolerance
- Choose 2!

Fault Tolerance
- Millions of hardware components in a cluster: disks, CPUs, memory, network adapters, network cabling, network switches
- Something is always broken!
- Availability and partition tolerance are crucial
- CAP implies: give up consistency

Eventual Consistency
- Less strict than ACID correctness
- Without further updates, all replicas eventually settle on the same state
- Variants: causal consistency, read-your-writes consistency, monotonic read consistency, monotonic write consistency
- Example: the Domain Name System (DNS)
- Werner Vogels: Eventually Consistent. Commun. ACM 52(1): 40-44 (2009)

Consistent Hashing
- How to allocate data items to N nodes?
- Naive approach: hash function h(o) mod N
- Problem: incremental scalability means frequent adding and removing of nodes, and rehashing all items each time is not feasible
- Consistent hashing: partition the hash-value space using indirection
- Karger et al.: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. STOC 1997

[Figures: hash ring diagrams. Objects o1-o5 are mapped to positions in the hash-value space; nodes t1-t3 are mapped into the same space, and each object is stored on the next node along the ring. When a node joins or leaves, only the objects on the adjacent arc move.]
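The ring illustrated above can be sketched in a few lines of Python. This is a minimal sketch, not an actual implementation: the `HashRing` class and the MD5-based hash function are assumptions made for this example.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a string to a position in a 128-bit hash-value space."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: each object is stored on the
    first node whose hash position follows the object's position."""

    def __init__(self):
        self._positions = []   # sorted node hash positions
        self._nodes = {}       # node position -> node name

    def add_node(self, node: str) -> None:
        pos = h(node)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = node

    def remove_node(self, node: str) -> None:
        pos = h(node)
        self._positions.remove(pos)
        del self._nodes[pos]

    def lookup(self, key: str) -> str:
        # First node clockwise from the key's position (with wrap-around).
        i = bisect.bisect_right(self._positions, h(key)) % len(self._positions)
        return self._nodes[self._positions[i]]
```

Looking up the owners of a set of keys before and after adding a node shows the key property: only the keys that fall on the new node's arc change owner, and every moved key moves to the new node; everything else stays put.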

Vector Clocks
- Mechanism to create a partial ordering of updates in distributed systems
- Detects causal relationships and concurrent updates
- Uses vectors of timestamps instead of simple timestamps: one timestamp per node
- Each node increments its own vector component at each local update
- Comparing two versions v1, v2:
  - If every component of v1 is <= the corresponding component of v2: v2 resulted from v1, so v1 is outdated
  - If some component of v1 is greater and another is smaller than the corresponding component of v2: concurrent updates occurred
- A resolved conflict uses the maximum of each component
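The comparison and merge rules above can be sketched as follows. This is a minimal sketch with clocks represented as dicts mapping node name to counter; the function names `compare` and `merge` are hypothetical.

```python
def compare(v1: dict, v2: dict) -> str:
    """Compare two vector clocks. Returns 'before' if v1 happened
    before v2, 'after' if v2 happened before v1, 'equal' if identical,
    and 'concurrent' if neither dominates the other."""
    nodes = set(v1) | set(v2)
    less = any(v1.get(n, 0) < v2.get(n, 0) for n in nodes)
    greater = any(v1.get(n, 0) > v2.get(n, 0) for n in nodes)
    if less and not greater:
        return "before"
    if greater and not less:
        return "after"
    return "concurrent" if less else "equal"

def merge(v1: dict, v2: dict) -> dict:
    """Clock of a resolved conflict: component-wise maximum."""
    return {n: max(v1.get(n, 0), v2.get(n, 0)) for n in set(v1) | set(v2)}
```

For example, `compare({"a": 2, "b": 1}, {"a": 1, "b": 2})` reports concurrent updates, which a store would then hand to conflict resolution, stamping the result with the component-wise maximum.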

Dynamo
- Key-value store: put / get / (delete); keys and values are byte strings
- Infrastructure for Amazon services: AWS S3, shopping cart, ...
- >100 service calls per Amazon web page
- DeCandia et al.: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007: 205-220

Dynamo Requirements
- Incremental scalability
- Symmetry/decentralization: no special node roles, no single points of failure
- Heterogeneity: nodes of different types (e.g., due to general technology progress)
- Always writable: never reject client updates (e.g., shopping cart additions)
- High performance: requirements apply to the 99.9th percentile, e.g., 300 ms per request at 500 requests/sec

Consistent Hashing in Dynamo
- Problem with the regular method: load is not uniformly distributed, and node performance varies
- Solution: map each node to multiple positions ("virtual nodes") on the hash ring; the number of virtual nodes depends on the node's performance
- Effects:
  - Finer granularity of key partitions (more nodes responsible for the same range)
  - More load on more powerful nodes
  - The effect of adding/removing a node is spread over many remaining nodes

Replication
- Fault tolerance implies replication of data
- Data is replicated to the first N nodes on its preference list
- Preference list of replication targets:
  - The nodes following the key's position on the hash ring
  - Size > N, to prepare for node failures
  - Contains distinct physical nodes: further virtual nodes of the same physical node are skipped
- Preference list information is replicated across all nodes
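Virtual nodes and preference-list construction can be combined in one short sketch. This is an illustration under assumed names (`build_ring`, `preference_list`), not Dynamo's actual code; a fixed number of virtual nodes per physical node stands in for the performance-dependent count described above.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a string to a position in a 128-bit hash-value space."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=8):
    """Sorted list of (position, physical node); each physical node
    appears at `vnodes` positions on the ring (its virtual nodes)."""
    return sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def preference_list(ring, key, n_replicas):
    """First n_replicas *distinct physical* nodes clockwise from the
    key's position, skipping further virtual nodes of an
    already-chosen physical node."""
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, h(key))
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == n_replicas:
            break
    return chosen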

Replication: Quorum-like System
- Send N requests, declare success once enough replies arrive
- Protocol parameters for "enough": R for reads, W for writes
- Fine-tuning:
  - R + W > N gives strong consistency
  - Change R and W depending on the application workload
  - The slowest replica of the R/W set determines latency
- Examples: N=2, R=1, W=2; N=3, R=2, W=2; N=100, R=1, W=100; N=4, R=1, W=2
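The reason R + W > N yields strong consistency is that every read quorum must then overlap every write quorum, so at least one node in any read set holds the latest write. For small N this overlap property can be checked exhaustively; `overlap_guaranteed` is a hypothetical helper written for this illustration.

```python
from itertools import combinations

def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """Exhaustively check (feasible for small n) that every read set of
    size r intersects every write set of size w out of n replicas --
    i.e., that a read is guaranteed to see the latest write."""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))
```

This confirms the slide's examples: N=3, R=2, W=2 is strongly consistent, while N=4, R=1, W=2 (R + W <= N) admits stale reads.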

Load Balancing
- Any node can accept a put/get request
- If the node is not on the key's preference list, it forwards the request to the first healthy node on that list
- If it is on the preference list, it coordinates the request (sends the redundant requests and replies to the client)

Sloppy Quorum / Hinted Handoff
- Use only the first N healthy nodes; skip non-responding/down nodes
- Handing off to less preferred nodes increases availability
- The intended recipient is added to the request as a hint:
  - Hinted writes are stored in a separate local store
  - The hint is used to propagate the update later, when the original recipient is back up
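A toy sketch of hinted handoff (all function and variable names here are assumptions for this example; per-node stores are plain dicts): a write that lands on a stand-in node carries a hint naming the down node it covers for, and the hint is used to hand the write back once that node recovers.

```python
def sloppy_write(pref_list, healthy, n, key, value, stores):
    """Write key/value to the first n *healthy* nodes on the (extended)
    preference list. Stand-in nodes outside the top n receive a hint
    naming the down node they are covering for."""
    intended = pref_list[:n]
    targets = [node for node in pref_list if node in healthy][:n]
    missed = [node for node in intended if node not in targets]
    for node in targets:
        hint = missed.pop(0) if node not in intended else None
        stores.setdefault(node, []).append((key, value, hint))
    return targets

def handoff(stores, node, now_healthy):
    """Forward a node's hinted writes to their intended recipients once
    those recipients are reachable again; keep the rest."""
    kept = []
    for key, value, hint in stores.get(node, []):
        if hint is not None and hint in now_healthy:
            stores.setdefault(hint, []).append((key, value, None))
        else:
            kept.append((key, value, hint))
    stores[node] = kept
```

With preference list [A, B, C, D], N=3, and B down, the write lands on A, C, and D, where D stores it with the hint "B"; when B recovers, `handoff` moves the hinted write from D to B.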

Eventual Consistency in Dynamo
- Inconsistent versions may occur:
  - Sloppy quorum in case of node failures
  - R + W <= N by configuration
- Vector clocks are used to discover inconsistencies during reads
- Syntactic inconsistencies (one vector clock strictly greater than the other) are resolved automatically
- Remaining inconsistencies are repaired by application code, e.g., merging shopping carts

Trade-offs
- Increasing W: higher durability, but less write availability and lower performance
- Increasing R: less inconsistency, but less read availability and lower performance
- Additional criteria for selecting the R/W nodes, e.g., placement in different data centers