Dynamo: Amazon's Highly Available Key-value Store




Dynamo: Amazon's Highly Available Key-value Store. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. SOSP 2007.

Intro Reliability and scalability are key operational requirements. Data, such as shopping carts, must always be available, even in the presence of hardware failures, communication failures, or natural disasters. The storage technology behind the shopping cart must ensure data can always be read and written, and data must be available across data centers. Failure is the norm and should not impact performance.

Intro Dynamo: a highly available and scalable distributed data store. Relational databases are too heavyweight for simple primary key access. Allows tradeoff between availability, consistency, cost-effectiveness and performance.

Intro Key features:
- data is partitioned and replicated using consistent hashing
- consistency is facilitated by object versioning
- consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol
- gossip-based distributed failure detection and membership protocol
- completely decentralized system with minimal need for manual administration
- storage nodes can be added and removed from Dynamo without requiring any manual partitioning or redistribution

Background Complex query management provided by RDBMS requires expensive hardware and operational overhead. Traditional DBs favor consistency over availability.

System Assumptions Query model: support read/write on data identified by a key. ACID Properties: trade consistency for availability. Efficiency: runs on commodity hardware. Provides throughput and latency guarantees. Assumptions: environment is non-hostile; scale up to hundreds of nodes.

Design Considerations Use eventual consistency to provide high availability. All updates eventually reach all replicas. Dynamo is always writeable, so conflicts are resolved during reads. The application provides conflict resolution.

Other Design Choices Incremental scalability Symmetry Decentralization Heterogeneity

Interface Two operations: get(key) and put(key, context, object). The context includes metadata such as the version of the object. An MD5 hash of the key determines where the object should be stored.
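The MD5-based placement can be sketched as follows; the helper name ring_position is hypothetical, not from the paper, but the hashing step is the one described above:

```python
import hashlib

def ring_position(key: str) -> int:
    """Map a key to a position on the 128-bit ring by hashing it
    with MD5, as the slide describes. (Helper name is illustrative.)"""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)
```

The resulting integer positions the object on the circular ID space used for partitioning.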

Partitioning Consistent hashing generates a circular ID space (ring). Each node is assigned a random position on the ring. A data item's key is hashed, and the item is stored at the first node whose position is larger than the item's hashed position. Virtual nodes help deal with heterogeneity: each physical node is assigned several positions (tokens) on the ring.
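The partitioning scheme above can be sketched as a minimal consistent-hash ring with virtual nodes. This is an illustrative toy (the Ring class and vnodes parameter are made-up names), not Dynamo's implementation:

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Ring position via MD5, as in the partitioning scheme."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring; each physical node gets
    several tokens (virtual nodes) to smooth out load."""
    def __init__(self, nodes, vnodes=8):
        self.tokens = sorted((h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))
        self.positions = [pos for pos, _ in self.tokens]

    def owner(self, key: str) -> str:
        # First token clockwise from the key's position (wrap around).
        i = bisect.bisect_right(self.positions, h(key)) % len(self.tokens)
        return self.tokens[i][1]
```

Because each node holds many tokens, adding or removing a node moves only the keys adjacent to its tokens, which is the incremental-scalability property the design calls for.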

Replication Each data item is stored on a coordinator chosen as described on the last slide. The coordinator replicates the data on N-1 successors on the ring. The list of nodes storing a piece of data is the preference list.
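The preference list can be sketched by walking the ring clockwise from the key and skipping tokens that belong to physical nodes already chosen (so virtual nodes of one machine count once). Function and parameter names here are illustrative:

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Ring position via MD5."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def preference_list(key, tokens, n=3):
    """tokens: sorted (position, physical_node) pairs, several per
    node (virtual nodes). Walk clockwise from the key's position,
    skipping already-chosen physical nodes, until N distinct nodes
    are collected. The first entry is the coordinator."""
    positions = [pos for pos, _ in tokens]
    i = bisect.bisect_right(positions, h(key))
    out = []
    for step in range(len(tokens)):      # at most one full lap
        node = tokens[(i + step) % len(tokens)][1]
        if node not in out:
            out.append(node)
        if len(out) == n:
            break
    return out
```

The coordinator (first node) stores the item locally and replicates it to the remaining N-1 nodes of the list.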

Data Versioning No updates should be lost. Vector clocks maintain versioning information. For a put, the client specifies the version it is updating. A get request may result in multiple versions returned.
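Vector-clock dominance and sibling detection can be sketched like this; versions are modeled as (clock, value) pairs, which is an assumption of this toy, not the paper's wire format:

```python
def vc_descends(a: dict, b: dict) -> bool:
    """True if clock a descends from clock b, i.e. a's counter for
    every node is at least b's. Then a supersedes b."""
    return all(a.get(node, 0) >= cnt for node, cnt in b.items())

def reconcile(versions):
    """Drop versions dominated by another version; the survivors are
    concurrent siblings, which a get returns to the client for
    application-level (semantic) reconciliation."""
    return [v for v in versions
            if not any(w is not v and vc_descends(w[0], v[0])
                       for w in versions)]
```

When only one version survives, the system reconciled automatically (syntactic reconciliation); multiple survivors mean the application must merge them, as with diverged shopping carts.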

Execution of Operations Any get or put can be handled by any node. A client can select a node using a load balancer or a partition-aware client library. A node (the coordinator) in the top N of the preference list handles the read/write; requests sent to other nodes are forwarded. Parameters R and W specify the number of nodes that must participate in a successful read/write. put - the coordinator generates the new vector clock, writes the version locally, and sends it to W-1 other nodes from the preference list; if W-1 respond, the write is successful. get - the coordinator collects R-1 versions of the data from nodes in the preference list and may return multiple versions to the client.
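The R/W quorum counting can be illustrated with a toy in-memory model (names like QuorumStore and the up parameter for "reachable replicas" are invented for this sketch):

```python
class QuorumStore:
    """Toy model of quorum-counted replication: N replica maps, a
    write needs W acks, a read needs R responses. Illustrates only
    the counting, not Dynamo's actual node machinery."""
    def __init__(self, n=3, r=2, w=2):
        self.replicas = [dict() for _ in range(n)]
        self.r, self.w = r, w

    def put(self, key, version, value, up=None):
        up = up if up is not None else range(len(self.replicas))
        acks = 0
        for i in up:                       # only reachable replicas ack
            self.replicas[i][key] = (version, value)
            acks += 1
        return acks >= self.w              # success only with W acks

    def get(self, key, up=None):
        up = up if up is not None else range(len(self.replicas))
        got = [self.replicas[i][key] for i in up if key in self.replicas[i]]
        if len(got) < self.r:
            raise RuntimeError("read quorum not met")
        return max(got)[1]                 # newest version wins (toy)
```

With R + W > N (e.g. N=3, R=2, W=2), any read quorum overlaps any write quorum in at least one replica, so a successful read always sees the latest successful write.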

Hinted Handoff Reads/writes are performed on the first N healthy nodes found by the coordinator. If a node is down, its data is sent to the next node in the ring, which keeps track of the intended recipient and delivers the data when that node recovers. Replicas are stored at multiple data centers. Merkle trees are used to keep replicas synchronized without significant overhead.
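The Merkle-tree comparison can be sketched by computing a root hash over a replica's key range: equal roots mean the replicas agree, and a real implementation would descend into subtrees to locate the differing keys. This uses SHA-1 and a simple pairing scheme as assumptions of the sketch:

```python
import hashlib

def leaf_hash(kv) -> str:
    """Hash of one (key, value) pair."""
    return hashlib.sha1(repr(kv).encode()).hexdigest()

def merkle_root(items) -> str:
    """Root hash over sorted key/value pairs. Equal roots imply the
    replicas hold identical data for this key range; unequal roots
    trigger a subtree-by-subtree comparison in a real system."""
    level = [leaf_hash(kv) for kv in sorted(items)]
    if not level:
        return hashlib.sha1(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last leaf if odd
            level.append(level[-1])
        level = [hashlib.sha1((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]
```

Exchanging only root (and subtree) hashes is what keeps anti-entropy cheap: replicas transfer actual data only for the branches that differ.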

Membership An administrator adds and removes nodes from the ring. Every second, each node contacts another to exchange membership information. At startup, a node chooses a random set of tokens and writes the selection to disk. This information is exchanged during gossiping. Some nodes are seeds; all nodes know about the seeds and eventually send them their membership information. When a node learns about a new node that should store some of the data it currently holds, it offers that data to the new node.
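The gossip exchange can be sketched as an anti-entropy merge of membership views; modeling a view as node-to-timestamp (keeping the newer entry) is an assumption of this sketch, not the paper's exact record format:

```python
def merge_views(local: dict, remote: dict) -> dict:
    """Merge two membership views, keeping the newer (larger
    timestamp) entry per node. Both peers apply this after an
    exchange, so views converge as gossip rounds proceed."""
    out = dict(local)
    for node, ts in remote.items():
        if ts > out.get(node, -1):
            out[node] = ts
    return out
```

Seeds make convergence robust: since every node eventually gossips with a seed, no subset of the ring can drift into a permanently partitioned view.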

Lessons Learned Typical values for (N, R, W) are (3, 2, 2). Experiments consider several hundred nodes across multiple data centers.

Results 99.9th percentile latencies for reads and writes over 30 days. Patterns are diurnal because request rates are diurnal. Writes are slower than reads because of disk access.