Building Scalable Cloud Storage
Alex Kesselman (alx@google.com)
Agenda
Desired System Characteristics
Scalability Challenges
Google Cloud Storage
What does a customer want from a cloud service?
Reliability, Big Data, Durability, Global Access, Speed, Low Cost, Atomic Updates, Simplicity, Strong Consistency
Our job: Keep service complexity in the Cloud... and out of the Client
Providing reliability and speed isn't free; it requires a fair amount of service complexity. Our design task is to hide most of that complexity behind the API. We've done our job when:
1. No matter where you are or how many of you there are, our service is virtually indistinguishable from talking to a single, fast, nearby machine that provides reliable service with ~100% uptime and virtually infinite storage.
2. You don't have to write a tome of boilerplate to use the system.
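As an illustration of how little boilerplate the API requires, here is a minimal sketch using the google-cloud-storage Python client; the bucket and object names are hypothetical placeholders, and credentials are assumed to be configured.

```python
# Minimal sketch: upload and read back an object with the google-cloud-storage
# client. Bucket name "my-example-bucket" and object name are placeholders.
from google.cloud import storage

client = storage.Client()                      # picks up default credentials
bucket = client.bucket("my-example-bucket")    # handle to an existing bucket
blob = bucket.blob("notes/hello.txt")          # object handle within the bucket

blob.upload_from_string("hello, cloud storage")    # write the object
print(blob.download_as_bytes().decode("utf-8"))    # read it back
```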
Storage Service Part 1
Various failure modes: What if I connect from the other side of the world? What if the box fails? What if the network fails? What if data gets corrupted? What if we need more storage?
For latency, you are forced to have boxes spread around the globe. This implies having to intelligently replicate the data.
For availability, there has to be a route when boxes or networks fail, and for continued availability, those boxes must self-heal afterwards.
For data integrity, you have to identify and recover from data corruption.
For "infinite" storage and "infinite" QPS, you need capacity planning that stays ahead of storage use and growth.
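A minimal sketch of the "identify and recover from corruption" point: store a checksum alongside each replica and fall back to a healthy copy when verification fails. CRC32 here is just a stand-in for a production checksum such as CRC32C.

```python
# Minimal sketch: detect corrupted replicas via checksums and read from a
# healthy one. zlib.crc32 stands in for a production checksum (e.g. CRC32C).
import zlib

def store(data: bytes) -> dict:
    """Return a record holding the payload and its checksum."""
    return {"data": data, "crc": zlib.crc32(data)}

def read_with_recovery(replicas: list[dict]) -> bytes:
    """Return the first replica whose checksum verifies; skip corrupt ones."""
    for replica in replicas:
        if zlib.crc32(replica["data"]) == replica["crc"]:
            return replica["data"]
    raise IOError("all replicas failed checksum verification")

# Example: one replica is silently corrupted, the read still succeeds.
good = store(b"object payload")
bad = dict(good, data=b"object pa\x00load")   # simulated bit rot
print(read_with_recovery([bad, good]))
```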
Storage Service Part 2
For durability: error correction codes and replication. Someone has to monitor service health and try to stave off incidents that might cause unavailability.
Now, what model to use?
1. BASE (Basically Available, Soft state, Eventually consistent): scales well due to cheaper reads and writes, but more work for the client developer to handle resource contention.
2. ACID (Atomicity, Consistency, Isolation, Durability): strong consistency means less work for the client developer, but it doesn't scale as well, sacrificing write latency and maximum QPS.
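A back-of-the-envelope sketch of the durability trade-off mentioned above: plain replication versus a Reed-Solomon style error-correcting code. The parameters (3 copies, 6 data + 3 parity shards) are illustrative, not production values.

```python
# Illustrative arithmetic only: storage overhead of plain replication vs. an
# erasure code with n data shards and k parity shards (Reed-Solomon style).
def replication_overhead(copies: int) -> float:
    return float(copies)                      # 3 copies -> 3.0x raw bytes

def erasure_overhead(n_data: int, k_parity: int) -> float:
    return (n_data + k_parity) / n_data       # RS(6,3) -> 1.5x raw bytes

print(replication_overhead(3))       # 3.0: survives loss of any 2 copies
print(erasure_overhead(6, 3))        # 1.5: survives loss of any 3 of 9 shards
```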
Scalability Challenges
The corpus size is growing exponentially
Systems require a major redesign every ~5 years
Complexity is inherent in dynamically scalable architectures
CAP
The CAP theorem states that any networked shared-data system can have at most two of three desirable properties:
1. Consistency (C)
2. High availability (A)
3. Tolerance to network partitions (P)
Mitigation (choosing C & P):
Regional architecture minimizing dependency on the global control plane
Degraded mode of operation during a partition, followed by recovery
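A minimal sketch of the mitigation idea, with entirely hypothetical names: a region keeps serving reads from its local replica during a partition, queues writes that would need the global control plane, and drains the queue on recovery.

```python
# Minimal sketch (hypothetical names): a region that degrades gracefully when
# the global control plane is unreachable, then reconciles after recovery.
class RegionalStore:
    def __init__(self):
        self.local = {}          # locally replicated data
        self.pending = []        # writes queued while partitioned
        self.partitioned = False

    def read(self, key):
        return self.local.get(key)              # reads stay available (possibly stale)

    def write(self, key, value):
        if self.partitioned:
            self.pending.append((key, value))   # degraded mode: accept and queue
        else:
            self.local[key] = value             # normal path: apply immediately

    def heal(self):
        """Partition resolved: replay queued writes, then resume normal mode."""
        for key, value in self.pending:
            self.local[key] = value
        self.pending.clear()
        self.partitioned = False

region = RegionalStore()
region.partitioned = True
region.write("obj1", "v1")        # queued, not lost
print(region.read("obj1"))        # None: stale but available
region.heal()
print(region.read("obj1"))        # "v1" after recovery
```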
Region Can Operate in Isolation
Dynamic Sharding
Static sharding of the object keyspace helps with hot spots, but hurts listing efficiency => dynamic sharding
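A minimal sketch of the idea: keep shards as contiguous key ranges so ordered listing stays cheap, and split a range when it becomes hot. All names and thresholds are hypothetical.

```python
# Minimal sketch (hypothetical names): range shards over the object keyspace.
# Ordered listing walks ranges in key order; a hot range is split dynamically.
import bisect

class RangeShards:
    def __init__(self):
        self.bounds = [""]              # lower bound of each shard, sorted
        self.load = {"": 0}             # requests observed per shard

    def shard_for(self, key: str) -> str:
        """Return the lower bound of the shard covering this key."""
        i = bisect.bisect_right(self.bounds, key) - 1
        return self.bounds[i]

    def record(self, key: str, hot_threshold: int = 1000):
        """Count a request; split the shard at this key if it is too hot."""
        lower = self.shard_for(key)
        self.load[lower] += 1
        if self.load[lower] > hot_threshold and key != lower:
            bisect.insort(self.bounds, key)       # new shard starts at the hot key
            self.load[lower] = 0
            self.load[key] = 0

shards = RangeShards()
for _ in range(1001):
    shards.record("photos/2024/img_")   # a hot prefix triggers a split
print(shards.bounds)                    # ['', 'photos/2024/img_']
```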
Incremental Processing
Maintenance overhead (e.g. billing) is proportional to the corpus size => incremental processing proportional to the daily churn rate
[Figure: large cold corpus vs. small daily churn of created and deleted objects]
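A minimal sketch (hypothetical names) of incremental processing: billing totals are updated from the day's change log of created and deleted objects rather than by rescanning the whole corpus.

```python
# Minimal sketch (hypothetical names): update per-user byte counts from the
# daily churn (created/deleted objects) instead of scanning the whole corpus.
from collections import defaultdict

def apply_daily_churn(usage: dict, created: list, deleted: list) -> dict:
    """usage maps user -> billed bytes; created/deleted are (user, size) events."""
    delta = defaultdict(int)
    for user, size in created:
        delta[user] += size
    for user, size in deleted:
        delta[user] -= size
    for user, change in delta.items():
        usage[user] = usage.get(user, 0) + change    # touch only users who churned
    return usage

usage = {"alice": 10_000, "bob": 5_000}              # the cold corpus stays untouched
usage = apply_daily_churn(usage,
                          created=[("alice", 2_000)],
                          deleted=[("bob", 1_000)])
print(usage)                                         # {'alice': 12000, 'bob': 4000}
```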
Isolation
It's hardly feasible to track millions of individual users in real time => focus on the top K users and throttle them
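A minimal sketch of the top-K idea: count requests per user over a window, and only the heaviest K users are candidates for throttling. Names and thresholds are hypothetical.

```python
# Minimal sketch (hypothetical names/thresholds): find the top K users by
# request count in the current window and throttle only those above a limit.
from collections import Counter
import heapq

def users_to_throttle(requests: list[str], k: int = 3, limit: int = 100) -> set[str]:
    """requests is the list of user ids seen in the window."""
    counts = Counter(requests)
    top_k = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
    return {user for user, count in top_k if count > limit}

window = ["heavy_user"] * 500 + ["normal_user"] * 20 + ["other"] * 5
print(users_to_throttle(window))     # {'heavy_user'}
```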
Google's Solution
Global data centers, each with Internet access, all connected through a private backbone. Devs and Site Reliability Engineers monitor and plan.
Split the problem into two parts:
1. Data service (i.e., how to store data on the boxes): replication and sharding; uses Colossus (successor of GFS) to store data.
2. Metadata service (map of object names to data service entity refs): gives fairly low latency, scalability, and limited ACID semantics; uses Spanner (successor of Megastore).
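A minimal sketch, with all names hypothetical, of the two-part split: the metadata service maps an object name to chunk references, the data service stores the chunk bytes, and a read resolves metadata first and then fetches chunks.

```python
# Minimal sketch (hypothetical names): metadata maps object names to chunk
# refs; a separate data service stores chunk bytes, here as plain dicts.
metadata_service = {}       # object name -> list of chunk ids
data_service = {}           # chunk id -> bytes

def put_object(name: str, data: bytes, chunk_size: int = 4):
    chunk_ids = []
    for i in range(0, len(data), chunk_size):
        chunk_id = f"{name}#{i}"                  # hypothetical chunk naming
        data_service[chunk_id] = data[i:i + chunk_size]
        chunk_ids.append(chunk_id)
    metadata_service[name] = chunk_ids            # metadata written last

def get_object(name: str) -> bytes:
    chunk_ids = metadata_service[name]            # 1) metadata lookup
    return b"".join(data_service[c] for c in chunk_ids)   # 2) data fetch

put_object("bucket/obj", b"hello metadata and data split")
print(get_object("bucket/obj"))
```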
Data Replication
End-user latency really matters
An application is simpler if it runs close to its data
Countries have legal restrictions on where data may be located
Plan and optimize data moves
Reduce costs by:
De-duplicating data chunks
Adjusting replication for cold data
Migrating data to cheaper storage
Caching reads
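A minimal sketch of chunk de-duplication: chunks are addressed by a content hash, so identical chunks are stored once no matter how many objects reference them. The chunking and naming here are purely illustrative.

```python
# Minimal sketch: content-addressed chunk store. Identical chunks hash to the
# same key and are stored only once; objects keep lists of chunk digests.
import hashlib

chunk_store = {}     # sha256 hex digest -> chunk bytes
objects = {}         # object name -> list of chunk digests

def put(name: str, data: bytes, chunk_size: int = 8):
    digests = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)     # stored once, reused thereafter
        digests.append(digest)
    objects[name] = digests

put("a.txt", b"shared prefix, unique tail A")
put("b.txt", b"shared prefix, unique tail B")
print(len(chunk_store))    # fewer stored chunks than total referenced chunks
```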
Continuous Multi-Criteria Optimization Problem
Replication Architecture
Fast response time; work prioritization; maximize utilization; minimize wastage.
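A minimal sketch of the work-prioritization point: replication tasks go into a priority queue so that urgent repairs run before routine background copies. Priorities and task names are hypothetical.

```python
# Minimal sketch (hypothetical priorities): urgent repair work is dequeued
# before routine background replication, keeping response times fast.
import heapq

queue = []   # (priority, sequence, task); lower priority value = more urgent
seq = 0

def submit(priority: int, task: str):
    global seq
    heapq.heappush(queue, (priority, seq, task))
    seq += 1

submit(10, "background copy: cold object to cheaper storage")
submit(0,  "repair: chunk below minimum replica count")
submit(5,  "rebalance: move replicas toward end users")

while queue:
    _, _, task = heapq.heappop(queue)
    print(task)    # repair first, then rebalance, then background copy
```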
Spanner
The basics:
Planet-scale structured storage
Next generation of the Bigtable stack
Provides a single, location-agnostic namespace
Manual and access-based data placement
Distributed cross-group transactions
Synchronous replication groups (Paxos)
Automatic failover of client requests
Spanner Universe Organization
A universe is a Spanner deployment
A zone is the unit of physical isolation
Each zone has one zonemaster and thousands of spanservers
Metadata Replication
Each spanserver is responsible for multiple tablet instances
A tablet maintains the following mapping: (key: string, timestamp: int64) -> string
Data and logs are stored on Colossus
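A minimal sketch of the tablet mapping shown above, (key, timestamp) -> value: each key keeps timestamped versions, and a read at a timestamp returns the latest version at or before it. This illustrates only the data model, not Spanner's implementation.

```python
# Minimal sketch of the (key: string, timestamp: int64) -> string mapping:
# multi-versioned values per key, read as of a given timestamp.
import bisect
from collections import defaultdict

class Tablet:
    def __init__(self):
        self.versions = defaultdict(list)   # key -> sorted list of (ts, value)

    def write(self, key: str, ts: int, value: str):
        bisect.insort(self.versions[key], (ts, value))

    def read(self, key: str, ts: int):
        """Return the value with the largest timestamp <= ts, or None."""
        entries = self.versions[key]
        i = bisect.bisect_left(entries, (ts + 1,)) - 1
        return entries[i][1] if i >= 0 else None

t = Tablet()
t.write("obj/acl", 100, "private")
t.write("obj/acl", 200, "public-read")
print(t.read("obj/acl", 150))   # "private": snapshot as of ts=150
print(t.read("obj/acl", 250))   # "public-read"
```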
Conclusion: Useful Principles of System Design
Scale across multiple orders of magnitude
Adapt dynamically to traffic changes
Degrade gracefully in the face of failures
Support data aggregation & incremental processing
Keep complexity at bay
References
More information: http://cloud.google.com/storage
1. J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al., "Spanner: Google's Globally-Distributed Database," ACM Transactions on Computer Systems (TOCS), vol. 31, no. 3, p. 8, 2013.
2. E. Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed," IEEE Computer, vol. 45, no. 2, pp. 23-29, 2012.
3. S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP '03), 2003.