Growing Pains of Cloud Storage

Yih-Farn Robin Chen
AT&T Labs - Research
November 9, 2014

1 Growth and Challenges of Cloud Storage

Cloud storage is growing at a phenomenal rate, fueled by multiple forces: mobile devices, social networks, and Big Data. Content is created anytime and anywhere on billions of smartphones and tablets; high-resolution photos and videos are frequently uploaded to the cloud automatically as soon as they are captured. A Gartner report predicts that consumer digital storage will grow to 4.11 zettabytes in 2016, with 36% of it in the cloud [4]. Social interactions and transactions on the Internet are routinely captured for targeted advertising. Beyond social networks and e-commerce, Big Data analytics is growing in many other sectors as well, including government, health care, media, and education. An IDC forecast suggests that Big Data storage is growing at a compound annual growth rate (CAGR) of 53% from 2011 to 2016 [1].

The growth of cloud storage has made it an expensive cost component of many cloud services and of today's cloud infrastructure. While raw storage is cheap, the performance, availability, and data durability requirements of cloud storage frequently dictate sophisticated, multi-tier, geo-distributed storage solutions. Amazon S3 offers eleven nines of data durability (99.999999999%), but some other cloud storage services demand even more stringent requirements because of the sheer number of objects stored in the cloud these days (1.3 billion Facebook users upload 350 million photos each day) and the importance of the data (who can afford to lose the video of a baby's first steps?). Data is frequently replicated or mirrored across multiple data centers to avoid catastrophic data loss, but copying data across data centers is very expensive; the networking cost is frequently proportional to the distance and bandwidth requirements between the data center sites.
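
To put the durability numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification, that the eleven-nines figure can be read as an expected annual object-loss fraction; the 350-million-photos-per-day rate is taken from the text above.

    # Back-of-the-envelope estimate: treat the quoted durability as the
    # probability that any single object survives one year (a simplification).
    annual_durability = 0.99999999999            # "eleven nines"
    loss_fraction = 1.0 - annual_durability      # ~1e-11 expected loss per object-year

    photos_per_day = 350_000_000                 # upload rate cited in the text
    objects_per_year = photos_per_day * 365      # ~1.28e11 new objects per year

    expected_losses = objects_per_year * loss_fraction
    print(f"objects stored in a year : {objects_per_year:.3e}")
    print(f"expected objects lost/yr : {expected_losses:.2f}")

Even at eleven nines, roughly one object per year would be expected to disappear at this upload rate, which is why services holding hundreds of billions of objects push for even more nines.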

Traditional storage systems use dedicated storage hardware and networking to guarantee that storage QoS requirements such as throughput, latency, and IOPS (input/output operations per second) are preserved. Unfortunately, these dedicated resources are frequently underutilized. Cloud computing promises efficient resource utilization by allowing multiple tenants to share the underlying networking, computing, and storage infrastructure, but it is difficult to provide end-to-end storage QoS guarantees to individual tenants without mechanisms to avoid interference. Typically, in a cloud environment like OpenStack, the backend block storage (such as LVM or Ceph RBD) is shared by multiple tenants through a storage virtualization layer like Cinder that attaches VMs to individual storage volumes, and it is difficult to provide customized storage QoS to meet different tenant needs. One exception is all-SSD storage arrays: some vendors (such as SolidFire) allow different tenants to allocate storage volumes with different QoS types and change them dynamically, but all-SSD solutions (on the order of $1,000/TB) are still very expensive compared to HDD-based solutions. Moreover, an IOPS guarantee in the backend is not sufficient, since other tenants may contend for network bandwidth or CPU capacity. Finally, to operate Web-scale solutions, infrastructure service providers are moving to scale-out solutions based on commodity hardware instead of storage appliances, which are frequently more expensive and harder to adapt to changing workloads or specific QoS requirements.

Any cloud solution architect must understand the tradeoffs among performance, reliability, and cost of cloud storage to provide an effective overall solution. Emerging trends are sweeping through the storage industry to address these issues. This editorial briefly discusses two software-based solutions: erasure-coded storage and software-defined storage (SDS). Incidentally, erasure coding may play a crucial role in offering design tradeoffs in certain software-defined storage solutions. Working together, these two technologies have huge potential to address the growing pains of cloud storage and to ease the transition from traditional IT storage solutions, as we believe cloud storage is likely to support a large portion of all IT storage needs in the future.

2 Erasure-Coded Storage

Erasure coding has been widely studied for distributed storage systems. It has recently been adopted by storage vendors such as EMC, Cleversafe, and Amplidata; by Internet service companies like Facebook, Microsoft, and Google; and by open-source software like Ceph, Swift, Quantcast QFS, and HDFS-RAID. The primary reason is that erasure-coded storage uses less space than fully replicated storage while providing similar or higher data durability.

To understand why erasure coding is becoming crucial in storage systems, we must explain some basics. Erasure coding is typically controlled by two key parameters: k and n. A file or file segment is broken into k chunks, erasure-coded and expanded into n chunks (n > k), and then distributed over n storage servers or hard disks. Any k chunks are sufficient to reconstruct the original file, so the system can tolerate the loss of up to m = n - k chunks without any data loss. One way to think about erasure coding is as a system of over-specified linear equations: you are given n linear equations in k variables, and picking any k of these n equations is sufficient to determine the values of those k variables. We frequently refer to the first k chunks as the primary chunks and the remaining m chunks as the parity chunks.
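
The "over-specified linear equations" view can be made concrete in a few lines. The sketch below is a toy illustration in Python/NumPy (real systems such as Reed-Solomon codes do the same arithmetic exactly in a finite field like GF(2^8), not in floating point): it encodes k = 6 data values into n = 9 coded values using a Vandermonde matrix and recovers the data from an arbitrary subset of k coded values.

    import numpy as np

    k, n = 6, 9                      # 6 data chunks expanded into 9 coded chunks (6+3)
    data = np.arange(1.0, k + 1.0)   # stand-in for the k data chunks

    # Each row of the Vandermonde matrix is one "linear equation" over the data;
    # any k rows are linearly independent, so any k coded values suffice.
    nodes = np.arange(1.0, n + 1.0)
    V = np.vander(nodes, k, increasing=True)   # shape (n, k)
    coded = V @ data                           # the n coded chunks

    # Simulate losing n - k = 3 chunks: keep an arbitrary subset of k survivors.
    survivors = [0, 2, 3, 5, 7, 8]
    recovered = np.linalg.solve(V[survivors], coded[survivors])
    print(np.allclose(recovered, data))        # True: any k chunks reconstruct the data

Systematic codes go one step further and make the first k rows an identity block, so the primary chunks are stored verbatim and can be read directly without decoding.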

Since we can vary k and m, a general erasure-coded storage solution in the form (k, n) or k + m offers much greater flexibility in trading off storage space against reliability than a RAID 6 system, which uses only two parity blocks and is equivalent to a k + 2 erasure-coded scheme.

A scalable distributed storage system such as HDFS or Swift that spans multiple racks or sites typically uses triple redundancy (three copies of each data block) to improve both availability and data durability. As cloud storage volume continues to grow exponentially, the triple-redundancy scheme becomes very expensive. As an example, the Quantcast QFS system uses 6+3 (k = 6 and m = 3) erasure coding and is designed to replace HDFS for MapReduce processing. HDFS with triple replication incurs 200% storage overhead, yet it can only tolerate the loss of up to ANY two copies of the same data block. A 6+3 erasure code, on the other hand, can tolerate the loss of up to ANY three coded chunks with only 50% storage overhead. Such significant cost saving, while maintaining the same or higher reliability, is why many storage systems are now incorporating erasure codes.

One concern with erasure-coded storage is the extra encoding/decoding time, which depends heavily on the strength of the erasure coding scheme: for a fixed k, a higher m (or n) incurs more computation overhead while providing higher reliability. As computing servers gain in performance, the computation overhead of commonly used erasure codes becomes more manageable, and the bottleneck frequently shifts to disk or network throughput. Another concern is the repair cost. Since a 6+3 code requires 6 chunks to repair one chunk, the networking cost of repairing a chunk is 6 times that of a simple replication scheme. Some Facebook experiments use a 10+4 erasure coding scheme, which incurs an even higher repair cost (but a lower storage overhead of 40%). Several repair schemes (such as Xorbas [6] and Hitchhiker [5]) have been proposed to reduce the repair bandwidth, with or without additional storage overhead.

As data durability becomes increasingly important for cloud storage, erasure coding can also play an important role in geo-distributed cloud storage: the chunks of an erasure-coded file can be placed in multiple data centers or racks to increase data durability. For example, a 9+15 or (9, 24) erasure-coded storage system may put 6 chunks each in New Jersey, Illinois, Texas, and California (the East, North, South, and West of the U.S.). Since any file can be reconstructed from 9 chunks, it can be rebuilt by retrieving chunks from any two data centers; the system therefore tolerates up to two data center failures, even in a major natural disaster like Hurricane Sandy in 2012, which caused a loss of 68 billion dollars and affected 24 states in the U.S. On the other hand, since each file retrieval requires accessing chunks from two data centers, it may incur longer latency and significant communication cost, which is fine for archival storage but not ideal for frequently accessed data. Alternatively, if we know the access patterns of certain files and most accesses turn out to come from New Jersey, we can place 9 chunks in New Jersey and 5 chunks each in Illinois, Texas, and California. This would allow most accesses to be completed with low network latency, at slightly lower reliability, since a single data center loss now has the potential to take out 9 rather than 6 chunks. The chunk placement problem in erasure-coded, geo-distributed storage systems affects latency, cost, and reliability, and is currently an active research field.
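
These placement tradeoffs are easy to quantify. The sketch below (Python; the placement dictionaries mirror the example above, and the helper names are purely illustrative) computes the storage overhead of a (k, n) code and the worst-case number of chunks that survive a given number of simultaneous data center failures.

    from itertools import combinations

    def overhead(k, n):
        """Storage overhead of a (k, n) erasure code, e.g. 0.5 for 6+3."""
        return (n - k) / k

    def worst_case_survivors(placement, failures):
        """Fewest chunks left after `failures` simultaneous data center losses."""
        sites = list(placement)
        worst = min(combinations(sites, failures),
                    key=lambda lost: sum(placement[s] for s in set(sites) - set(lost)),
                    default=())
        return sum(placement[s] for s in set(sites) - set(worst))

    k, n = 9, 24
    balanced = {"NJ": 6, "IL": 6, "TX": 6, "CA": 6}   # 6 chunks per data center
    skewed   = {"NJ": 9, "IL": 5, "TX": 5, "CA": 5}   # favor New Jersey readers

    print(f"storage overhead of 9+15: {overhead(k, n):.0%}")
    for name, placement in (("balanced", balanced), ("skewed", skewed)):
        for f in (1, 2, 3):
            left = worst_case_survivors(placement, f)
            status = "recoverable" if left >= k else "DATA LOSS"
            print(f"{name}: {f} DC failure(s) -> {left} chunks left ({status})")

Running it shows 12 surviving chunks for the balanced layout after two data center failures versus 10 for the skewed one, while three failures break both; the extra risk of the skewed layout therefore shows up as a thinner safety margin above k rather than as a lower whole-data-center fault tolerance.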

3 Software-Defined Storage

Cloud computing started with the virtualization of computing resources, followed by recent advances and rapid innovation in Software-Defined Networking (SDN), which aims to virtualize networking resources and separate the control plane from the data plane. To truly realize and complete the vision of a virtualized data center, however, we also need software-defined storage, which virtualizes storage resources and separates the storage management software from the underlying hardware. Unfortunately, unlike software-defined networking, there is no clear definition of what software-defined storage really is, even though many storage vendors claim to have SDS solutions. Most definitions of SDS include a list of desirable attributes [3][2]. Here I summarize the key ones that are pertinent to multi-tenant cloud storage solutions, call them the S.C.A.M.P. principles of SDS, and then give an overall definition of SDS:

Scale-out: SDS should enable a scale-out (horizontal scaling of low-cost, commodity hardware) rather than a scale-up (vertical scaling onto more powerful hardware) storage solution as the workload grows or changes over time. A scale-out solution is best implemented in a cloud environment with large pools of computing, networking, and storage resources. A cloud storage solution is never just about storage: all the necessary computing and networking resources must also scale accordingly to support common storage operations such as deduplication, compression, encryption/decryption, and erasure coding/replication.

Customizable: SDS should allow customization to meet specific storage QoS requirements. This lets customers purchase storage solutions based on their specific performance and reliability constraints and avoid the over-engineering that frequently happens when a cloud storage provider tries to meet the needs of multiple customers with diverse requirements. In a multi-tenant cloud with a shared storage backend, it is particularly difficult to guarantee the desired level of storage QoS. The latest version of OpenStack Cinder, which provides the block storage service, now allows multiple backends with different QoS types (such as different IOPS or throughput numbers) to partially address this issue.

Automation: SDS should automate the complete provisioning and deployment process without human intervention once the storage QoS requirements are clearly defined. The current practice is that a storage architect or a system administrator is intimately involved in designing and installing the storage system; that process is typically error-prone and not amenable to real-time changes that adapt to changing workloads or requirements.

Masking: SDS may mask the complexity of the underlying storage system (physical or virtualized) and of the distributed system (single or multiple sites), as long as it can present a common storage API (block, file system, object, etc.) and meet the QoS requirements. This gives infrastructure service providers greater flexibility in restructuring their resource pools or re-architecting their storage systems. For example, Ceph can present a block-device API even though the underlying implementation is built on its RADOS object store.

Policy Management: The SDS software must monitor and manage the storage system according to the specified policy and continue to meet the storage QoS requirements despite potential interference from other tenants' workloads. It also needs to handle failures and auto-scale the system when necessary to adapt to changing workloads. Note, however, as stated previously, that an end-to-end storage QoS guarantee in a multi-tenant cloud is a hard problem: it requires protecting resources along the entire path from a virtual machine to the storage volume. Microsoft's IOFlow [7] aims to provide an SDN-like controller that manages storage bandwidth allocation on the path from a VM to a storage volume.

Combining the S.C.A.M.P. principles, we can now give a definition of SDS: an SDS solution should automatically map customizable storage service requirements onto a scalable and policy-managed cloud storage service, with abstractions that mask the underlying storage hardware and distributed-system complexities.

Incidentally, erasure coding is a crucial technology for meeting the SDS customization requirement. For a fixed k, increasing n (or m, the number of parity chunks) increases reliability and the replication factor (and hence the storage cost); at the same time it increases the overall encoding/decoding time, and hence the required computation capacity, and may reduce performance. This allows an automated storage architect to examine the storage QoS requirements and pick particular erasure code parameters (k and m) that meet the minimal reliability and performance requirements with the least amount of storage overhead.
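
A minimal sketch of that selection step is shown below, under stated assumptions: chunk failures are modeled as independent with probability p within a repair window (a deliberately crude binomial model, not how any particular SDS product scores reliability), and the candidate list, target, and function names are illustrative inputs. It picks the candidate with the lowest storage overhead that still clears the reliability target.

    from math import comb

    def survival_probability(k, m, p):
        """P(data survives) when each of the n = k + m chunks fails independently
        with probability p and up to m chunk losses are tolerable."""
        n = k + m
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

    def pick_code(candidates, p, target):
        """Cheapest (k, m) whose modeled reliability meets the target."""
        feasible = [(k, m) for k, m in candidates
                    if survival_probability(k, m, p) >= target]
        return min(feasible, key=lambda km: km[1] / km[0], default=None)

    candidates = [(6, 3), (10, 4), (9, 15)]   # (k, m) schemes mentioned in the text
    p = 0.01          # assumed per-chunk failure probability per repair window
    target = 0.999999 # assumed reliability requirement from the tenant's QoS spec

    choice = pick_code(candidates, p, target)
    if choice:
        print(f"selected code: {choice}, overhead: {choice[1] / choice[0]:.0%}")
    else:
        print("no candidate meets the target")

With these numbers all three candidates clear the target and the 10+4 code wins on overhead (40%); tightening the target or raising p pushes the choice toward codes with more parity, which is exactly the reliability-versus-cost dial described above.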

4 Conclusion

The rapid growth of cloud storage has created challenges for storage architects, who must meet the diverse performance and reliability requirements of different customers while controlling cost in a multi-tenant cloud environment. Two emerging trends in the storage industry, erasure-coded storage and software-defined storage, have the potential to address these challenges and open up new opportunities for innovation. The two technologies are not entirely independent: erasure coding may play a major role in the customization and automation of software-defined storage, and in the optimal placement of storage chunks in geo-distributed cloud storage.

References

[1] Big Data Drives Big Demand for Storage. http://www.businesswire.com/news/home/20130416005045/en/big-data-drives-big-demand-storage-idc.

[2] Software-Defined Storage: Working Draft. SNIA (The Storage Networking Industry Association). http://www.snia.org/sites/default/files/snia Software Defined Storage White Paper-gcsv1.0k-draft.pdf.

[3] Earl, B., Black, D., Kistler, J., Novak, R., and Maheshwari, U. Panel on software-defined storage. In HotStorage '13 (June 2013). https://www.usenix.org/conference/hotstorage13/workshop-program/presentation/earl.

[4] Gartner. Gartner Says That Consumers Will Store More Than a Third of Their Digital Content in the Cloud by 2016. http://www.gartner.com/newsroom/id/2060215.

[5] Rashmi, K. V., Shah, N. B., Gu, D., Kuang, H., Borthakur, D., and Ramchandran, K. A Hitchhiker's Guide to Fast and Efficient Data Reconstruction in Erasure-Coded Data Centers. In Proceedings of the 2014 ACM Conference on SIGCOMM (2014), ACM, pp. 331-342.

[6] Sathiamoorthy, M., Asteris, M., Papailiopoulos, D., Dimakis, A. G., Vadali, R., Chen, S., and Borthakur, D. XORing Elephants: Novel Erasure Codes for Big Data. Proceedings of the VLDB Endowment, vol. 6 (2013), pp. 325-336.

[7] Thereska, E., Ballani, H., O'Shea, G., Karagiannis, T., Rowstron, A., Talpey, T., Black, R., and Zhu, T. IOFlow: A Software-Defined Storage Architecture. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (2013), ACM, pp. 182-196.