Moving Virtual Storage to the Cloud: Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage




Table of Contents

Overview
Understanding the Storage Problem
What Makes an Ideal Cloud Storage Solution for Hosters?
  Scalability
  Multiple Nodes
  Block-Based (Not File-Based) Roots for Virtual Environments
  Object Handling
  Cloning, Snapshotting and Deduplication
  Sparse Objects and Thin Provisioning
  Object Resizing and Support for Legacy File System Roots
  Redundancy
  Cluster Simplicity
  Storage Expansion
Summary of Requirements
  Must Have for Initial Deployment
  Nice to Have for Future Deployments
Conclusion

Overview

In traditional hosting models, storage is usually directly attached to the node serving up the virtual environments (VEs) [1]. The storage usually comes in the form of SATA devices with 1.5-3Gb/s interfaces and a sustained bandwidth of around 100MB/s. The great advantages of local storage are that it's fast (around 100MB/s) and scalable: as you add nodes, they bring more local storage with them. But the local nature of the traditional storage model is also a disadvantage, because if you want to migrate your virtual environments, you have to take a physical copy of their associated storage as well. This requirement makes the locally attached storage model inappropriate for dynamic, highly fluid environments, such as those found in the cloud.

The ideal virtual storage solution for hosters offering cloud services is one that provides the speed and scalability advantages of locally attached storage but adds the ability to migrate, scale, and snapshot the storage. In addition, its cost per terabyte must be similar to that of local storage, and it should provide object copy redundancy for higher data reliability.

The purpose of this white paper is to provide guidance to hosters who are thinking of moving from traditional storage to cloud storage to enhance their cloud offerings. By explaining how to evaluate the various features offered by the large range of storage providers in the market today, it will help you choose the system that's best for you.

Understanding the Storage Problem

Most hosting providers today use either some kind of storage area network (SAN) or direct attached storage (DAS). The former typically consists of a large machine with multiple disks in a RAID configuration, together with redundant power units; it exports its storage either as a block-level device via iSCSI or as a shared file system via NFS.

Neither approach is ideal for hosters, however. The problem with enterprise-class SAN solutions is that they are very expensive, so they will significantly decrease your per-customer margin. The common drawback of DAS is that it doesn't let you leverage unused disk space on a node whose other resources, such as CPU or memory, are already fully assigned to virtual environments. As a result, you're unable to make efficient use of your available disk space (see Figure 1).

[1] Virtual environments are individual Infrastructure as a Service (IaaS) units, provided either by hypervisor or container technology.
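The stranded-space problem can be made concrete with a toy node profile. The numbers below are invented purely for illustration (and chosen to land near the utilization figure in Figure 1): once a node's RAM is fully committed to VEs, the disk space left over on that node cannot be offered to VEs running elsewhere.

```python
# Illustrative (made-up) node profile showing how DAS strands disk space.

ram_gb, disk_tb = 64, 2.0          # hypothetical host: 64 GB RAM, 2 TB of DAS
ve_ram_gb, ve_disk_tb = 4, 0.045   # hypothetical VE: 4 GB RAM, ~45 GB of disk

ves_per_node = ram_gb // ve_ram_gb       # RAM is exhausted first: 16 VEs
disk_used = ves_per_node * ve_disk_tb    # 0.72 TB of the 2 TB actually used
print(f"disk utilization: {disk_used / disk_tb:.0%}")   # ~36%
```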

Figure 1. Large hosting providers in Europe and North America typically use only 36% of their disk space.

What Makes an Ideal Cloud Storage Solution for Hosters?

In this section, we look at various terms used to describe storage and relate them back to features that will (or won't) be useful in a hosting environment.

Scalability

Given the necessity of maintaining local scaling of storage bandwidth, it's clear that you need a distributed environment, because a centralized server system cannot scale. For instance, a centralized NFS server, even on a 10Gb/s link, can serve only about ten nodes before reaching its maximum bandwidth. Even if you add fabric-switching technology (such as a SAN or InfiniBand switch) to deliver the full available bandwidth from the servers to the clients, it still won't match the capacity you can get from a distributed environment.

Because of this requirement for a distributed system, any system built around a centralized server, whether it serves files (NFS) or blocks (iSCSI, Fibre Channel), is inappropriate. And although it's possible to use split-fabric technology to overcome some of the objections to a centralized store, such solutions tend to raise the cost per terabyte beyond acceptable limits, making this approach impractical for hosters as well.

Multiple Nodes

In a hosting environment, pretty much every box has one or two 1Gb/s Ethernet interfaces connected to a common switch, plus a local disk. This means that every node can serve and consume storage at 100MB/s, provided that the storage is spread evenly across the cluster, with each node serving storage to all the others at its maximum link speed. However, for this approach to work, the node count of the storage cluster must be the same as that of the compute cluster. This requirement is easiest to achieve if all the nodes in the cluster are both storage and compute nodes.

Therefore, in a hosting environment, scalable storage is best delivered by reusing the existing nodes to run as storage servers. But if you take this approach, it's important not to disrupt the existing services running on the nodes by overtaxing them with excessive resource requirements, so you need to choose a storage system that makes minimal use of resources.

It's interesting to note that a 64-node rack cluster with a fast Ethernet switch supporting fabric-switching technology can, using nothing more than 1Gb network cards and fairly run-of-the-mill SATA devices, deliver an aggregate storage bandwidth of around 50GB/s, provided the data placement is done correctly. That's a greater bandwidth capability than a super-fast SSD array on a modern InfiniBand network, which can deliver only around 40GB/s aggregate on the fabric. (The arithmetic behind these figures is sketched below.)
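As a rough check on these figures, here is the arithmetic in runnable form. All constants are assumptions consistent with the numbers quoted above (1Gb/s NICs, ~100MB/s SATA disks, and roughly eight disks per storage node to reach the ~50GB/s aggregate), not benchmark results:

```python
# Back-of-the-envelope storage bandwidth model (assumed figures, not measurements).

GBIT_BYTES = 1e9 / 8          # 1 Gb/s expressed in bytes per second (~125 MB/s)

disk_bw = 100e6               # ~100 MB/s sustained per run-of-the-mill SATA disk
node_link = 1 * GBIT_BYTES    # one 1 Gb/s NIC per node

# A centralized NFS server on a 10 Gb/s link saturates once its clients'
# aggregate demand reaches the uplink: roughly ten 1 Gb/s client nodes.
server_uplink = 10 * GBIT_BYTES
print(f"clients served by central server: ~{server_uplink / node_link:.0f}")

# A distributed cluster aggregates every node's local disk bandwidth. With
# good data placement, most reads are served from local replicas, so the
# aggregate is limited by the disks rather than by any single network link.
nodes = 64
disks_per_node = 8            # assumption implied by the ~50 GB/s figure above
aggregate = nodes * disks_per_node * disk_bw
print(f"64-node aggregate: ~{aggregate / 1e9:.1f} GB/s")   # ~51.2 GB/s
```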

Block-Based (Not File-Based) Roots for Virtual Environments

VEs may either (1) use a shared, file-based root (via NFS, a cluster file system such as GFS2 or Ceph or, in the case of containers, by bind-mounting directories directly from the host); or (2) use a block-based root (usually either iSCSI or a block projection of an image file).

The problem with the first approach is that all shared, file-based root systems suffer from a scaling problem as the number of VEs rises. That's because each VE root contains a large number of small files, and aggregating them in a file environment causes the file server to see a massively growing number of objects. As a result, metadata operations run into bottlenecks. To state the problem more precisely: if each root has N objects and there are M roots, tracking the combined objects requires effort that scales as N times M. Additionally, the objects must be tracked by the metadata, and in a root file system, the size of the objects can vary by ten orders of magnitude (that is, from a few bytes up to many gigabytes). Tracking objects of such variable sizes creates considerable metadata overhead. For all these reasons, we don't recommend using shared file-based roots for VEs in highly scalable cloud systems.

In contrast, block-based roots elegantly avoid the metadata scaling and sizing problems of shared file-based roots because the metadata is effectively partitioned. That is, each root runs a separate file system, which encapsulates its own metadata. Consequently, the metadata of the block export system needs to track only the metadata of the object representing the root, so scaling depends only on M (instead of N times M). In addition, because each image representing the block data ranges only from about a gigabyte to a terabyte in size (i.e., varying by only three orders of magnitude), you can use simpler techniques in the metadata to track these objects.
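A small, purely hypothetical calculation makes the difference in metadata load concrete:

```python
# Hypothetical figures illustrating metadata scaling for M VE roots that
# each contain N small files.

M = 1_000      # VE roots hosted on the cluster
N = 100_000    # files in a typical root file system

# Shared, file-based roots: the central file server's metadata must track
# every file of every root.
file_based_objects = N * M
print(f"file-based roots:  {file_based_objects:,} metadata objects")   # 100,000,000

# Block-based roots: the block store tracks one object per root; each
# root's own file system encapsulates its N-file metadata internally.
block_based_objects = M
print(f"block-based roots: {block_based_objects:,} metadata objects")  # 1,000
```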

Object Handling

The ideal backend for handling block-based roots is one that's capable of doing rapid, random updates to the objects. This requirement tends to rule out abstracted image storage like Amazon S3, which makes it very hard to do random updates. For the fastest possible updates, once the object layout has been identified from the metadata, you shouldn't need any further metadata communication to read from or write to the object. Further, the node performing the read or write should be able to communicate directly with the node(s) providing the object. Any encapsulation should be minimal, so the update process can use as much of the available network bandwidth as it needs.

Cloning, Snapshotting and Deduplication

Cloning involves using copy-on-write [2] techniques to produce a duplicate of an image object. Snapshotting involves making a point-in-time copy of the object, either to permit rollback from a fallible operation, such as an update, or simply to facilitate a backup. Deduplication involves identifying and combining storage regions with identical content in different objects.

All three techniques have been important for some time in enterprises that manage virtual environments. However, hosters currently have less need for deduplication, as surveys have shown that they have considerable available space in their environments. Thus, although deduplication may become a requirement in the future, it isn't currently high on hosters' feature lists.

[2] Copy on write is a technique that enables multiple images to share the same storage block as long as users only read from those images. Once a user writes to an image, the write creates a block that is unique to that image.
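To illustrate the copy-on-write mechanism that underlies cloning (and, with minor variations, snapshotting), here is a toy block map in Python. It is a sketch of the general technique, not the format used by any particular storage product:

```python
# Toy copy-on-write image object: clones share blocks with the master
# image until they are written to.

class CowImage:
    def __init__(self, blocks=None, parent=None):
        self.parent = parent          # master image, or None
        self.blocks = blocks or {}    # block_index -> bytes (private blocks only)

    def clone(self):
        # A clone owns no blocks of its own yet; every read falls through
        # to the parent, so cloning is O(1) regardless of image size.
        return CowImage(parent=self)

    def read(self, idx):
        if idx in self.blocks:
            return self.blocks[idx]
        if self.parent is not None:
            return self.parent.read(idx)
        return b"\0" * 4096           # unallocated blocks read as zeros

    def write(self, idx, data):
        # The first write to a shared block allocates a private copy; the
        # master and any sibling clones are unaffected.
        self.blocks[idx] = data

master = CowImage(blocks={0: b"boot", 1: b"rootfs"})
clone = master.clone()
clone.write(1, b"patched")
print(master.read(1), clone.read(1))  # b'rootfs' b'patched'
```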

Sparse Objects and Thin Provisioning

Sparse objects are objects in which not every byte has been allocated, so they consume less storage than their size would imply. Sparse objects exist because root image files don't necessarily occupy all the space allocated to them (to see this, just look at the free space on any computer). The technique of using sparse objects is also referred to as "thin provisioning" by array vendors.

As with deduplication, sparse objects are typically not of immediate interest to hosting providers, because they generally already have more storage than they need. The other factor that makes sparse objects a non-issue for hosters is the widespread use of cloning: in a root that's been cloned multiple times, unoccupied space will still point back to the master copy, so the only space a sparse object would save is a single block in the master copy, despite the existence of hundreds of clones. However, sparse objects are still a useful feature to keep in mind, particularly because they allow hosting providers to consider business plans based on overcommitting storage.
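The effect of sparse allocation is easy to demonstrate on a Linux machine. A minimal sketch (the path is arbitrary, and exact block accounting varies by file system):

```python
# Create a file with a 10 GiB logical size that allocates almost no blocks:
# seek past the end and write a single byte. Linux/POSIX behavior.
import os

path = "/tmp/sparse_demo.img"
with open(path, "wb") as f:
    f.seek(10 * 1024**3 - 1)   # logical size: 10 GiB
    f.write(b"\0")

st = os.stat(path)
print(f"logical size: {st.st_size / 1024**3:.1f} GiB")
# st.st_blocks counts the 512-byte units actually allocated on disk.
print(f"allocated:    {st.st_blocks * 512 / 1024:.0f} KiB")
os.remove(path)
```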

Object Resizing and Support for Legacy File System Roots

Objects representing roots should obviously be resizable, especially since cloud customers are usually charged per unit of storage and will therefore want to optimize their use of it. However, in a block-based root system, resizing depends not only on the capabilities of the object store, but also on the file system chosen for the root (a choice usually made by the consumer of the VE). The problem here is that, for practical reasons, many root file systems cannot be shrunk online (a classic example in Linux is the ext3 file system, which can be shrunk only while unmounted). It is therefore useful for any cloud object store to offer assistive technologies for shrinking legacy file systems that cannot shrink themselves.

Redundancy

Hosters today tend to provide redundancy for their local storage solutions by using hardware RAID systems on their individual nodes, so any cloud storage system based on these nodes can take advantage of the RAID system to provide initial redundancy. However, hosting providers need to be able to survive node failure as well as disk failure, so a cloud storage solution should also be able to duplicate objects across multiple nodes in the cluster. And because object duplication takes additional space, hosters should be able to specify the desired number of object copies.

Cluster Simplicity

One of the cardinal principles of system design is that a system should be as simple as possible, containing only as much complexity as is needed to perform all of its functions. Complexity beyond this point simply increases overhead, impairing the system's efficiency and generally weakening it. For cloud storage, this principle weighs against systems that allow shared access to objects: shared access adds complexity to the cluster algorithms, while the only use case for object roots is exclusive access anyway. In fact, mounting the same root on more than one machine will corrupt the underlying file system, so it is important that the storage system itself be able to detect and prevent this condition. Additional complexity also increases the amount of testing required to thoroughly debug the code, and since most organizations have fixed budgets for testing, the net effect is that testing is less comprehensive. For all these reasons, it's generally a bad idea to base your cloud storage on clustered file systems.

Storage Expansion

Since it's a given that customer storage requirements will only increase over time, expanding the capacity of the cloud storage system should be extremely easy, whether by adding new disks to individual nodes (preferably hot-plugged, so the storage system sees the new disks and simply absorbs them) or by adding additional nodes to the cluster.

Summary of Requirements

This section summarizes our recommendations for cloud storage, based on the preceding observations. Assuming you'll be deploying the cloud storage solution on mostly existing hardware, we've divided the requirements into two categories: must have for initial deployment, and nice to have for the future.

Must Have for Initial Deployment

The absolutely critical requirements for your initial cloud storage deployment are:

- Cost-effectiveness. The storage solution should be able to reuse your existing hardware setup, require little extra hardware, and be as light as possible in terms of its resource footprint.
- Multi-node performance. The solution must be spread over enough nodes to deliver the same level of performance as your current locally attached storage.
- Block-based objects. To ensure optimal performance and handling, the technology must be based on objects representing roots.
- Cloning and snapshotting. The solution must support copy-on-write use of master images, as well as the ability to freeze the state of the storage at any point in time.
- Hot-pluggability. The solution should be easy to expand by simply inserting additional nodes and devices.
- Failure tolerance and redundancy. At a minimum, the solution should protect against single-disk failure. Ideally, it should protect against single-node failure as well.
- Exclusive object access. The solution should ensure that an object representing a root file system is mounted only once in the cluster at any given time.

Nice to Have for Future Deployments

Some additional features that you may find convenient to add for future deployments are:

- Deduplication, to free up additional storage space.
- Sparse objects (thin provisioning), so you can safely overcommit storage.
- Assistance for shrinking legacy file systems, so customers who are charged per unit of storage can optimize their usage.

Conclusion

As the cloud revolution progresses, the ability to separate storage from your physical systems will become increasingly important. By understanding what your storage requirements are and how well different cloud storage systems match them, you'll be able to take full advantage of the benefits that cloud storage has to offer.

To learn more about how cloud storage systems can increase the reliability and scalability of your hosted services, and how Parallels helps service providers deliver cloud storage, please visit www.parallels.com/products/pcs.