Designing a Cloud Storage System




End-to-End Cloud Storage

When designing a cloud storage system, there is value in decoupling the system's archival capacity (its ability to persistently store large volumes of data) from its delivery capacity (its ability to deliver popular objects to a scalable number of users). The archival half need not support scalable performance, and likewise, the delivery half need not guarantee persistence. In practical terms, this translates into an end-to-end storage solution that includes a high-capacity, highly resilient Object Store in the data center, augmented with caches throughout the network to take advantage of aggregated delivery bandwidth at edge sites.

As depicted in the following diagram, the Object Store ingests data from some source (e.g., video prepared using a Content Management System) and delivers it to users via edge caches. The ingest interface is push-based and likely includes one or more popular storage APIs (e.g., WebDAV, S3), while the delivery interface is pull-based and corresponds to HTTP GET requests from the CDN. While the CDN should be source-agnostic (for example, content might originate from an upstream CDN or be transparently intercepted), it is increasingly the case that content delivered over a CDN is sourced from a data center as part of a cloud-based storage solution. This raises the question: is there anything we can learn by looking at storage from such an end-to-end perspective? There are three key lessons.

First, it makes little sense to build an Object Store using traditional SAN or NAS technology, for two reasons. The first has to do with providing the right level of abstraction. The CDN running at the network edge is perfectly capable of reading a large set of objects from the store, so there is no value in managing those objects with full file system semantics (i.e., NAS is a bad fit).
Similarly, the storage system needs to understand complete objects and not just blocks (i.e., SAN is not a good fit). The second reason is cost: it is simply more cost-effective to build a scalable Object Store from commodity hardware. This argument is well understood, and leverages the ability to achieve scalable performance and resiliency in software.

Verivue, Inc. www.verivue.com
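The pull-based delivery path described above amounts to a fill-on-miss loop at the edge: a cache serves an object locally when it can, and issues an HTTP GET to the Object Store (the origin) only on a miss. The following Python sketch is purely illustrative; the class and object names are hypothetical, not part of any Verivue API:

```python
# Illustrative sketch of a CDN edge cache filling from an origin on a miss.
# The origin stands in for the Object Store answering HTTP GETs; here it
# is modeled as a simple dictionary lookup.

class EdgeCache:
    def __init__(self, origin_fetch):
        self._store = {}                    # object name -> bytes held at the edge
        self._origin_fetch = origin_fetch   # callable standing in for an HTTP GET

    def get(self, name):
        """Return the object, filling the cache from the origin on a miss."""
        if name not in self._store:
            self._store[name] = self._origin_fetch(name)  # cache fill
        return self._store[name]

# Hypothetical origin holding one video segment.
origin = {"video/seg1.ts": b"segment-1-bytes"}
cache = EdgeCache(lambda name: origin[name])

assert cache.get("video/seg1.ts") == b"segment-1-bytes"  # miss: pulled from origin
origin.clear()
assert cache.get("video/seg1.ts") == b"segment-1-bytes"  # hit: served from the edge
```

Note how the second lookup succeeds even after the origin entry is gone: once an object is popular enough to be cached, delivery bandwidth comes from the edge, which is exactly the decoupling of delivery from archival capacity argued for above.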

Second, a general-purpose CDN that is able to deliver a wide range of content (from software updates to video, from large files to small objects, from live (linear) streams to on-demand video, from over-the-top to managed video) should not be handicapped by an Object Store that isn't equally flexible. In particular, it is important that the ingest function be low latency, support redundant encoders, and accommodate HTTP adaptive streaming. This makes it possible to deliver both on-demand and live video, the latter of which needs to be staged through an Object Store to support time shifting and nDVR.

Third, it is not practical to achieve scalable delivery purely from a data center. Data centers typically provide massive internal bandwidth, making it possible to build scalable storage from commodity servers, but Internet-facing bandwidth is generally limited and expensive. This is just another way to state the argument in favor of delivering content via a CDN: scalable delivery is best achieved from the edge.

The OneVantage Object Store adopts exactly this end-to-end design philosophy. It supports file ingest from content management systems via multiple ingest protocols and originates that content for live streaming and on-demand delivery via a CDN. In essence, the Object Store provides the root of the CDN hierarchy and serves cache misses from multiple CDN tiers downstream. It also offers a scale-out architecture for redundancy and storage expansion by leveraging commodity hardware, and in doing so supports a more cost-effective solution than purpose-built storage appliances.

Scale-Out Design

Object Store scales to billions of objects and petabytes of storage. It runs on clustered commodity servers incorporating the latest hardware technology. Both disks and nodes can be added to accommodate growing storage needs without service disruption.
Different-size disks can be mixed in a node, and different types of nodes can be mixed in a cluster, making it possible to always incorporate the latest Commercial Off-The-Shelf (COTS) hardware. I/O bandwidth and transaction processing capacity also grow linearly: as storage and nodes are added to the cluster, Object Store's ingest and delivery capacity increases proportionally. Transaction processing capacity can be adjusted independently of storage capacity by controlling the number of disks per node; for example, external disk shelves can be used to expand the direct-attached storage per node.

Object Store efficiently handles a high rate of small reads and writes because it has no centralized mechanisms (e.g., a replica-tracking database). Instead, ingest and delivery requests can be directed at any node, independent of which nodes currently store a replica of the object. Object Store distributes content evenly across all of the disks within the cluster using the same consistent hashing algorithms employed by the OneVantage HyperCache.

Availability and Durability

Object Store provides multiple levels of redundancy, employing both mirroring and automatic failure recovery to achieve high levels of fault tolerance.
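The consistent hashing mentioned above is what lets placement be computed at any node without a central replica database, and lets disks be added with minimal data movement. The HyperCache algorithm itself is not described here, so the following ring with virtual nodes is only an illustrative stand-in for that class of algorithm:

```python
import hashlib
from bisect import bisect

def _h(key: str) -> int:
    # Stable 64-bit hash derived from MD5 (an illustrative choice).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hash ring: each disk owns several virtual points."""
    def __init__(self, disks, vnodes=64):
        self._points = sorted((_h(f"{d}#{i}"), d)
                              for d in disks for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def disk_for(self, obj: str) -> str:
        # An object maps to the first virtual point clockwise from its hash.
        i = bisect(self._hashes, _h(obj)) % len(self._points)
        return self._points[i][1]

disks = ["disk-a", "disk-b", "disk-c", "disk-d"]
ring = Ring(disks)
placement = [ring.disk_for(f"obj-{n}") for n in range(1000)]
counts = {d: placement.count(d) for d in disks}
assert sum(counts.values()) == 1000 and all(c > 0 for c in counts.values())

# Growing the cluster moves only the objects whose arc the new disk takes
# over; everything else keeps its old placement.
bigger = Ring(disks + ["disk-e"])
moved = sum(1 for n in range(1000)
            if bigger.disk_for(f"obj-{n}") != placement[n])
assert 0 < moved < 1000  # only a fraction of objects relocate
```

The final assertion is the property that matters for non-disruptive expansion: adding a disk does not reshuffle the whole cluster, which a naive `hash(obj) % num_disks` scheme would.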

Disks in the Object Store are configured into redundancy groups: sets of nodes that are as isolated as possible from other nodes. A given deployment must have at least three redundancy groups. When content is ingested into the Object Store, it is replicated across multiple redundancy groups (two by default, though the replication factor is configurable). In small clusters it is sometimes necessary for a redundancy group to contain a single node. In large clusters it is typical to configure multiple nodes into the same group when those nodes share a network switch, with redundancy groups isolated from each other by separate switches, power, or geography. RAID is not required.

Disk and node failures are automatically detected and isolated. Content on a failed disk is automatically replicated to other disks to rapidly restore redundancy. The failure recovery time is a fraction of what it would be on a typical RAID-based system, which requires a lengthy RAID rebuild process. In addition, a background auditor continually validates the integrity of the objects. If a corrupted object is detected (due to the decay of physical storage media, for example), the file is quarantined and replaced with another replica.

Disaster recovery, i.e., surviving the failure of an entire site, is handled in one of two ways. The first is to explicitly synchronize content between two Object Store clusters located at distinct sites. In this case the two Object Stores operate autonomously, which means each maintains an independent set of redundancy groups. The second is to distribute a single Object Store cluster across multiple sites, in which case the Object Store's internal redundancy mechanisms cause objects to be replicated at a remote site.

Optimized for Streaming Applications

Object Store is uniquely tailored for streaming applications, particularly live and HTTP adaptive streaming. There are three considerations.
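The key property of redundancy groups is that every replica of an object lands in a different group, so a correlated failure (a shared switch or power feed) cannot take out all copies. A minimal sketch of that placement rule, assuming hypothetical group and node names and an illustrative selection function (the actual placement algorithm is not public):

```python
import hashlib

def place_replicas(obj: str, groups: dict, replicas: int = 2):
    """Pick `replicas` distinct redundancy groups for `obj`, then one node
    inside each, deterministically from the object name."""
    if replicas > len(groups):
        raise ValueError("need at least as many redundancy groups as replicas")
    names = sorted(groups)
    start = int(hashlib.md5(obj.encode()).hexdigest(), 16)
    # Consecutive groups starting from a hash-derived index are all distinct.
    chosen = [names[(start + k) % len(names)] for k in range(replicas)]
    return [(g, groups[g][start % len(groups[g])]) for g in chosen]

# Hypothetical deployment: three groups, isolated by rack/switch.
groups = {
    "rack-1": ["node-1a", "node-1b"],
    "rack-2": ["node-2a", "node-2b"],
    "rack-3": ["node-3a"],   # small groups may hold a single node
}
targets = place_replicas("video/movie.mp4", groups, replicas=2)
assert len({g for g, _ in targets}) == 2   # two copies, two distinct groups
```

Because the two copies never share a group, losing every node behind one switch still leaves a full replica elsewhere, which is why RAID inside a node is not required.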
First, live streaming applications require low-latency ingest and delivery of small video fragments, typically at multiple bit rates. Object Store is optimized to support ingest and delivery of a large number of live channels per node with predictable, low latency, even in the presence of failed disks or nodes. In contrast, RAID- or erasure-coding-based storage systems are typically not optimized for small-object writes. This is because, in addition to the overhead of parity calculations, every time a portion of a block stripe is updated, the rest of the stripe must be read back in order to compute the new parity. For example, on a RAID 5 array made from five disks, a particular stripe across those disks may have data on drives #1, #2, #3, and #4, and its parity block on drive #5. If a small-object write changes just that stripe's block on disk #2, disks #1, #3, and #4 must also be read to calculate the parity, which is then written to disk #5. The RAID controller must also ensure that changes to data and the associated parity occur as a single transaction; this is often handled with a two-phase commit, which adds performance overhead. Finally, writes must be serialized, which can affect latency. Hence the ingest latency on RAID-based storage systems is less predictable, which can be problematic for live applications.
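The parity arithmetic behind that small-write penalty is easy to show concretely. Blocks are modeled below as integers XORed together; real arrays XOR byte-by-byte, but the algebra is identical. The sketch also shows the read-modify-write shortcut (read old data plus old parity instead of the whole stripe), which reduces the reads but still turns a one-block update into a multi-disk read-plus-two-writes transaction:

```python
# RAID 5 small-write demonstration: parity is the XOR of the data blocks.

def parity(*blocks):
    p = 0
    for b in blocks:
        p ^= b
    return p

# A stripe across five disks: data on drives #1-#4, parity on drive #5.
d1, d2, d3, d4 = 0b1010, 0b0110, 0b1111, 0b0001
p = parity(d1, d2, d3, d4)

# Small write: update only the block on disk #2. Recomputing parity as
# described in the text forces reads of disks #1, #3, and #4:
d2_new = 0b0011
p_full = parity(d1, d2_new, d3, d4)

# Read-modify-write shortcut: P' = P ^ D_old ^ D_new needs only the old
# block and old parity, but it is still extra I/O plus two writes that
# must commit atomically.
p_rmw = p ^ d2 ^ d2_new
assert p_full == p_rmw
```

Either way, a write that touches one block costs several disk operations, which is why per-fragment live ingest on parity-protected storage has less predictable latency than the replication-based approach described above.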

Second, live streaming applications require redundant encoders to avoid a single point of failure, which implies Object Store must be able to simultaneously ingest multiple copies of each object. Even in such cases, Object Store replicates the content as if it had received a single request. There is no storage penalty for using redundant encoders, and critically, no content is lost in the event of an encoder or Object Store disk/server failure.

Third, some HTTP adaptive streaming protocols, such as Microsoft Smooth Streaming and Adobe HTTP Dynamic Streaming, require a translation from client fragment requests to server file offsets. Object Store can be complemented with origin heads to provide this functionality. These origin heads support a scale-out model consistent with the Object Store architecture: they can run on dedicated servers or on the same servers as the base Object Store functionality. For example, the following figure shows an eight-node cluster with six nodes dedicated to Object Store and two nodes running origin heads. Origin heads are not required for Apple HLS or native HTTP delivery.

Integrated Management

Object Store allows independent accounts to be created, and within these accounts, users can be defined with specific rights and privileges. Accounts can be created for in-house users as well as for third-party content providers. This allows operators to offer storage as a service, where typically there is an administrative account for the operator and a separate account for each content provider. Content providers then create users with the desired privileges within that account. All interaction with Object Store is cryptographically protected via HTTPS and conforms to a well-defined API; content providers are not granted direct access to individual Object Store nodes.
This API allows content providers to ingest content into Object Store using established content management protocols (e.g., FTP, WebDAV) and cloud storage interfaces (e.g., Rackspace, S3), thereby simplifying the transition to and from popular cloud storage services. The API also provides integrated control over how content is published via a CDN, including when content is published, where it can be accessed, and how long it remains available. Finally, audit logs record all access through the management API or the management user interface that sits on top of this API.

Summary

The OneVantage Object Store offers cloud-based, replicated HTTP storage, which can be used to persistently store media content that is subsequently delivered to users via a CDN. The solution leverages COTS hardware and state-of-the-art clustering software to scale to billions of objects and petabytes of data. It offers high availability, including geo-redundancy, supports multi-tenant usage scenarios, and supports APIs that ease integration with both cloud storage systems and widely distributed CDNs. And perhaps most uniquely, Object Store is optimized to support a full range of content, including live, nDVR, and on-demand streaming applications.

About the Author

Larry Peterson, Chief Scientist, Verivue

As Chief Scientist, Larry Peterson provides technical leadership and expertise for research and development projects. He is also the Robert E. Kahn Professor of Computer Science at Princeton University, where he served as Chairman of the Computer Science Department from 2003 to 2009. He also serves as Director of the PlanetLab Consortium, a collection of academic, industrial, and government institutions cooperating to design and evaluate next-generation network services and architectures. Larry has served as Editor-in-Chief of the ACM Transactions on Computer Systems, has been on the editorial boards of the IEEE/ACM Transactions on Networking and the IEEE Journal on Selected Areas in Communications, and is the co-author of the best-selling networking textbook Computer Networks: A Systems Approach.
He is a member of the National Academy of Engineering, a Fellow of the ACM and the IEEE, and the 2010 recipient of the IEEE Koji Kobayashi Computers and Communications Award. He received his Ph.D. from Purdue University in 1985. For more information on Verivue's Object Store solution, please visit: www.verivue.com/object-store.