Megastore: Providing Scalable, Highly Available Storage for Interactive Services




Megastore: Providing Scalable, Highly Available Storage for Interactive Services
J. Baker, C. Bond, J.C. Corbett, JJ Furman, A. Khorlin, J. Larson, J-M Léon, Y. Li, A. Lloyd, V. Yushprakh
Google Inc.
Originally presented at CIDR 2011 by James Larson
Presented by George Lee

With Great Scale Comes Great Responsibility
- A billion Internet users
  - Small fraction is still huge
- Must please users
  - Bad press is expensive: never lose data
  - Support is expensive: minimize confusion
  - No unplanned downtime
  - No planned downtime
  - Low latency
- Must also please developers, admins

Megastore
- Started in 2006 for app development at Google
- Service layered on:
  - Bigtable (NoSQL scalable data store per datacenter)
  - Chubby (config data, config locks)
- Turnkey scaling (apps, users)
- Developer-friendly features
- Wide-area synchronous replication
  - Partition by "Entity Group"

Entity Group Examples

Application   Entity Groups    Cross-EG Ops
Email         User accounts    none
Blogs         Users, Blogs     Access control, notifications, global indexes
Mapping       Local patches    Patch-spanning ops
Social        Users, Groups    Messages, relationships, notifications
Resources     Sites            Shipments

Achieving Technical Goals
- Scale
  - Bigtable within datacenters
  - Easy to add Entity Groups (storage, throughput)
- ACID Transactions
  - Write-ahead log per Entity Group
  - 2PC or Queues between Entity Groups
- Wide-Area Replication
  - Paxos
  - Tweaks for optimal latency
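The "write-ahead log per Entity Group" idea can be illustrated with a minimal in-memory sketch. It assumes optimistic concurrency keyed on the log position, which the slide does not spell out; the names `EntityGroup`, `begin`, `commit`, and `ConflictError` are hypothetical and not Megastore's API.

```python
class ConflictError(Exception):
    """Raised when another transaction committed to the group first."""


class EntityGroup:
    """Toy entity group: one append-only log plus the state it produces."""

    def __init__(self, name):
        self.name = name
        self.log = []    # write-ahead log: one dict of mutations per commit
        self.data = {}   # state obtained by applying the log in order

    def begin(self):
        # A transaction remembers the log position it read at.
        return len(self.log)

    def commit(self, read_position, mutations):
        # Assumption: a commit succeeds only if nobody else has appended
        # to this group's log since the transaction began.
        if read_position != len(self.log):
            raise ConflictError(f"log of {self.name} moved past {read_position}")
        self.log.append(dict(mutations))   # log first ...
        self.data.update(mutations)        # ... then apply
        return len(self.log)


group = EntityGroup("user:alice")
position = group.begin()
group.commit(position, {"inbox_count": 1})
```

Because every commit to a group goes through that one log, transactions inside a group are serialized, which is also why the per-group update rate is limited.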

Two Phase Commit
- Commit request/voting phase
  - Coordinator sends query to commit
  - Cohorts prepare and reply
- Commit/Completion phase
  - Success: Commit and acknowledge
  - Failure: Rollback and acknowledge
- Disadvantage: Blocking protocol
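The two phases can be sketched as follows. This is a minimal single-process illustration with no networking, timeouts, or crash recovery; `Cohort` and `two_phase_commit` are hypothetical names chosen for the example.

```python
class Cohort:
    """Toy participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Voting phase: reply yes (prepared) or no (aborted).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(cohorts):
    # Phase 1: the coordinator sends the query to commit and collects votes.
    votes = [cohort.prepare() for cohort in cohorts]
    # Phase 2: commit only if every cohort voted yes, otherwise roll back.
    if all(votes):
        for cohort in cohorts:
            cohort.commit()
        return "committed"
    for cohort in cohorts:
        cohort.rollback()
    return "aborted"


print(two_phase_commit([Cohort("a"), Cohort("b")]))
print(two_phase_commit([Cohort("a"), Cohort("b", can_commit=False)]))
```

The blocking disadvantage is not modeled here: in a real deployment a cohort in the "prepared" state must wait for the coordinator's decision, so a coordinator failure at that point stalls it.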

Basic Paxos
- Prepare and Promise
  - Proposer selects proposal number N and sends a Prepare(N) request to acceptors
  - Acceptors promise or reject the request
- Accept! and Accepted
  - Proposer sends out value
  - Acceptors respond to proposer and learners

Message Flow: Basic Paxos

Client   Proposer      Acceptor     Learner
   |         |          |  |  |       |  |
   X-------->|          |  |  |       |  |  Request
   |         X--------->|->|->|       |  |  Prepare(N)
   |         |<---------X--X--X       |  |  Promise(N,{Va,Vb,Vc})
   |         X--------->|->|->|       |  |  Accept!(N,Vn)
   |         |<---------X--X--X------>|->|  Accepted(N,Vn)
   |<---------------------------------X--X  Response
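The message flow above can be sketched as a single-decree round in one process. This is a simplified illustration of textbook basic Paxos, not Megastore's variant; it omits learners, networking, and failures, and the names `Acceptor` and `propose` are hypothetical.

```python
class Acceptor:
    """Toy acceptor holding its highest promise and last accepted proposal."""

    def __init__(self):
        self.promised = -1          # highest proposal number promised
        self.accepted_n = -1        # number of the last accepted proposal
        self.accepted_value = None  # value of the last accepted proposal

    def prepare(self, n):
        # Promise only if n is higher than anything promised before.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        # Accept unless a higher-numbered Prepare has been promised.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one Prepare/Accept round; return the chosen value or None."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p[0]]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own.
    prev_n, prev_value = max(
        ((an, av) for _, an, av in promises), key=lambda pair: pair[0]
    )
    if prev_n >= 0:
        value = prev_value
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "A")
second = propose(acceptors, 2, "B")   # must re-propose the chosen value
```

In this run the second proposer ends up with the first proposer's value, which is the safety property the Promise replies exist to enforce.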

Paxos: Quorum-based Consensus
"While some consensus algorithms, such as Paxos, have started to find their way into [large-scale distributed storage systems built over failure-prone commodity components], their uses are limited mostly to the maintenance of the global configuration information in the system, not for the actual data replication."
-- Lamport, Malkhi, and Zhou, May 2009

Paxos and Megastore
- In practice, basic Paxos is not used
- Master-based approach?
- Megastore's tweaks
  - Coordinators
  - Local reads
  - Read/write from any replica
  - Replicate log entries on each write
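The slide only names "Coordinators" and "Local reads", so the following sketch rests on an assumption: that a coordinator at each replica tracks which entity groups that replica is fully caught up on, and a read is served locally only for those groups. All names (`Coordinator`, `read`, `local_read`, `majority_read`) are hypothetical.

```python
class Coordinator:
    """Toy per-replica coordinator: the set of entity groups known to be current."""

    def __init__(self):
        self.up_to_date = set()

    def validate(self, group):
        self.up_to_date.add(group)

    def invalidate(self, group):
        # Assumed to be called when a write to `group` was not applied
        # at this replica.
        self.up_to_date.discard(group)

    def is_current(self, group):
        return group in self.up_to_date


def read(group, coordinator, local_read, majority_read):
    # Fast path: the local replica has seen every write for this group.
    if coordinator.is_current(group):
        return local_read(group)
    # Slow path: consult other replicas to catch up, then mark as current.
    value = majority_read(group)
    coordinator.validate(group)
    return value


coordinator = Coordinator()
value = read(
    "user:alice",
    coordinator,
    local_read=lambda g: "served locally",
    majority_read=lambda g: "served after catch-up",
)
```

The design trade-off is that reads become cheap in the common case, while a write must account for any replica it could not reach before it can be acknowledged.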

Omissions
These were noted in the talk:
- No current query language
  - Apps must implement query plans
  - Apps have fine-grained control of physical placement
- Limited per-Entity Group update rate

Is Everybody Happy?
- Admins
  - linear scaling, transparent rebalancing (Bigtable)
  - instant transparent failover
  - symmetric deployment
- Developers
  - ACID transactions (read-modify-write)
  - single-system image makes code simple
  - little need to handle failures
- End Users
  - fast up-to-date reads, acceptable write latency
  - consistency
- many features (indexes, backup, encryption, scaling)

Take-Aways
- Constraints acceptable to most apps
  - Entity Group partitioning
  - High write latency
  - Limited per-EG throughput
- In production use for over 4 years
- Available on Google App Engine as HRD (High Replication Datastore)

Questions?
- Is the rate limitation on writes insignificant?
- Why not lots of RDBMSs?
- Why not NoSQL with mini-databases?