Moving Virtual Storage to the Cloud




Moving Virtual Storage to the Cloud

White Paper

Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage

www.parallels.com

Table of Contents

Overview
Understanding the Storage Problem
What Makes an Ideal Cloud Storage Solution for Hosters?
    Scalability
    Multiple Nodes
    Block-Based (Not File-Based) Roots for Virtual Environments
    Object Handling
    Cloning, Snapshotting and Deduplication
    Sparse Objects and Thin Provisioning
    Object Resizing and Support for Legacy File System Roots
    Redundancy
    Cluster Simplicity
    Storage Expansion
Summary of Requirements
    Must Have for Initial Deployment
    Nice to Have for Future Deployments
Conclusion
Contact Us

Parallels Moving Virtual Storage to the Cloud

Overview

In traditional hosting models, storage is usually directly attached to the node serving up the virtual environments (VEs) [1]. The storage usually comes in the form of SATA devices with 1.5-3Gb/s interfaces and an approximate sustained bandwidth of around 100MB/s.

The great advantage of local storage is that it's fast (100MB/s) and scalable (as you add nodes, they come with more local storage). But the local nature of the traditional storage model is also a disadvantage: if you want to migrate your virtual environments, you have to take a physical copy of their associated storage as well. This requirement makes the locally attached storage model inappropriate for dynamic, highly fluid environments, such as those found in the cloud.

The ideal virtual storage solution for hosters offering cloud services is one that provides the speed and scalability advantages of locally attached storage but adds the ability to migrate, scale, and snapshot the storage. In addition, its cost per terabyte must be similar to that of local storage, and it should provide object copy redundancy for higher data reliability.

The purpose of this white paper is to provide guidance to hosters who are thinking of moving from traditional storage to cloud storage to enhance their cloud offerings. By explaining how to evaluate the various features offered by the large range of storage providers in the market today, it will help you choose the system that's best for you.

Understanding the Storage Problem

Most hosting providers today are using either some kind of storage area network (SAN) or direct attached storage (DAS). The latter typically consists of a large machine with multiple disks in a RAID configuration, together with redundant power units, and it exports its storage either as a block-level device via iSCSI or as a shared file system via NFS. Neither approach is ideal for hosters, however.
The problem with enterprise-class SAN solutions is that they are very expensive, so they will significantly decrease your per-customer margin. The common drawback of DAS is that it doesn't allow you to leverage unused disk space if other resources, such as CPU or memory, are already assigned to virtual environments. As a result, you're unable to make efficient use of your available disk space (see Figure 1).

Figure 1. Disk space utilization (% used vs. free) at six large hosting providers (SP1-SP6). Large hosting providers in Europe and North America typically use only 36% of their disk space.

    SP1: 13.52% used / 86.48% free
    SP2: 34.27% used / 65.73% free
    SP3: 42.75% used / 57.25% free
    SP4: 50.57% used / 49.43% free
    SP5: 28.36% used / 71.64% free
    SP6: 48.36% used / 51.64% free
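As a quick sanity check on the 36% figure in the caption, the per-provider "used" values from Figure 1 can be averaged directly. This is a minimal sketch; the assignment of used vs. free values is inferred from the chart legend and the average quoted in the caption.

```python
# Per-provider "used" percentages read off Figure 1 (SP1-SP6).
used = {"SP1": 13.52, "SP2": 34.27, "SP3": 42.75,
        "SP4": 50.57, "SP5": 28.36, "SP6": 48.36}

# Simple arithmetic mean across the six service providers.
average_used = sum(used.values()) / len(used)
print(f"Average disk space used: {average_used:.1f}%")  # -> 36.3%
```

The result matches the roughly 36% utilization quoted in the caption.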

What Makes an Ideal Cloud Storage Solution for Hosters?

In this section, we look at various terms used to describe storage and relate them back to features that will (or won't) be useful in a hosting environment.

SCALABILITY

Given the necessity of maintaining local scaling of storage bandwidth, it's clear that you need a distributed environment, because a centralized server system cannot scale. For instance, a centralized NFS server, even on a 10Gb/s link, can serve only about ten nodes before reaching its maximum bandwidth. Even if you add fabric-switching technology (such as a SAN or InfiniBand switch) to deliver the full available bandwidth from the servers to the clients, it still won't match the capacity you can get from a distributed environment.

Because of this requirement for a distributed system, any system built around centralized servers, whether file-based (NFS) or block-based (iSCSI, Fibre Channel), is inappropriate. And although it's possible to use split-fabric technology to overcome some of the objections to a centralized store, such solutions tend to raise the cost per terabyte beyond acceptable limits, making this approach impractical for hosters as well.

MULTIPLE NODES

In a hosting environment, virtually every box has one or two 1Gb/s Ethernet interfaces connected to a switch, plus a local disk. This means that in a storage hosting environment, it's possible to serve storage evenly at 100MB/s, provided that the storage is distributed evenly across the cluster, with each node serving storage to all the others at its maximum link speed. However, for this approach to work, the node count of the storage cluster must be the same as that of the compute cluster. This requirement is easiest to meet if all the nodes in the cluster are both storage and compute nodes. Therefore, in a hosting environment, scalable storage is best delivered by reusing the existing nodes as storage servers.
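The centralized-server ceiling mentioned under Scalability is simple arithmetic. This sketch uses the figures from the text (a 10Gb/s server link and roughly 100MB/s of sustained demand per node) to show where the ceiling sits:

```python
# How many nodes can one centralized server feed at full local-disk speed?
GBIT = 1e9 / 8                 # bytes per second in one gigabit
server_link = 10 * GBIT        # centralized NFS server on a 10 Gb/s link
per_node_demand = 100e6        # ~100 MB/s sustained, the local SATA figure above

nodes_at_full_speed = server_link / per_node_demand
print(f"Nodes served at full speed: {nodes_at_full_speed:.0f}")  # ~12
```

In practice, protocol and metadata overhead push this theoretical ceiling of about twelve down toward the ten nodes cited in the text.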
But if you take this approach, it's important not to disrupt the existing services running on the nodes by overtaxing them with excessive resource requirements, so you need to choose a storage system that makes minimal use of resources.

It's interesting to note that a 64-node rack cluster with a fast Ethernet switch supporting fabric-switching technology can, using nothing more than 1Gb network cards and fairly run-of-the-mill SATA devices, deliver an aggregate storage bandwidth of around 50GB/s, provided the data placement is done correctly. That's a greater bandwidth capability than a super-fast SSD array on a modern InfiniBand network, which can only deliver around 40GB/s aggregate on the fabric.

BLOCK-BASED (NOT FILE-BASED) ROOTS FOR VIRTUAL ENVIRONMENTS

VEs may either (1) use a shared, file-based root (via NFS, a cluster file system such as GFS2 or Ceph or, in the case of containers, by bind-mounting directly into the host); or (2) use a block-based root (usually either iSCSI or a block projection of an image file).

The problem with the first approach is that all shared, file-based root systems suffer from a scaling problem as the number of VEs rises. That's because each VE root contains a large number of small files, and aggregating them in a file environment causes the file server to see a massively growing number of objects. As a result, metadata operations run into bottlenecks. To explain this problem further: if each root has N objects and there are M roots, tracking the combined objects requires effort that scales as N times M. Additionally, the objects must be tracked by the metadata, and in a root file system, the size of the objects can vary by ten orders of magnitude (that is, from a few bytes up to many gigabytes). Tracking objects of such variable sizes creates considerable metadata overhead. For all these reasons, we don't recommend using shared file-based roots for VEs in highly scalable cloud systems.
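The N-times-M effect described above is easy to see with illustrative numbers. Both figures below are assumptions for the sake of the example, not measurements from this paper:

```python
# A shared file server must track every file in every VE root (N*M objects);
# a block-based store tracks one opaque image object per root (M objects).
N = 200_000   # assumed: small files in a typical root file system
M = 1_000     # assumed: virtual environments hosted on the cluster

shared_file_based = N * M   # metadata entries a central file server must manage
block_based = M             # metadata entries a block object store must manage

print(f"file-based roots:  {shared_file_based:,} tracked objects")
print(f"block-based roots: {block_based:,} tracked objects")
```

With these assumed figures, the file-based approach multiplies the metadata load by a factor of N, while the block-based approach keeps it proportional to the number of roots alone.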
In contrast, block-based roots elegantly avoid the metadata scaling and sizing problems of shared file-based roots because the metadata is effectively partitioned. That is, each root runs a separate file system, which encapsulates its own metadata. Consequently, the metadata of the block export system needs to track only the metadata of the object representing the root, so scaling depends only on M (instead of N times M). In addition, because each image representing the block data ranges only from about a gigabyte to a terabyte in size (i.e., varying by only three orders of magnitude), you can use simpler techniques in the metadata to track these objects.

OBJECT HANDLING

The ideal backend for handling block-based roots is one that's capable of doing incredibly rapid and random updates to the objects. This requirement tends to rule out abstracted image storage like Amazon S3, an approach that makes it very hard to do random updates. For the fastest possible updates, once the object layout has been identified from the metadata, you shouldn't need any further metadata communication to read from and write to the object. Further, the node performing the read or write should be able to communicate directly with the node(s) providing the object. Any encapsulation should be minimal, so the update process can use as much of the available network bandwidth as it needs.

CLONING, SNAPSHOTTING AND DEDUPLICATION

Cloning involves using copy-on-write [2] techniques to produce a duplicate of an image object. Snapshotting involves making a volatile copy of the object, either to permit rollback from a fallible operation, such as an update, or simply to facilitate a backup. Deduplication involves identifying and combining storage regions with identical content in different objects.

All three techniques have been important for some time in enterprises that manage virtual environments. However, hosters currently have less need for deduplication, as surveys have shown that they have considerable free space in their environments. Thus, although deduplication may become a requirement in the future, it isn't currently high on hosters' feature lists.
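The copy-on-write behavior behind cloning (see note 2) can be sketched in a few lines. This toy `CowImage` class is illustrative only, not the API of any real storage product:

```python
# Minimal sketch of copy-on-write cloning: clones share the master's blocks
# until they write, at which point a block unique to the clone is created.
class CowImage:
    def __init__(self, blocks=None, base=None):
        self.blocks = blocks if blocks is not None else {}  # block index -> data
        self.base = base                                    # master this clone came from

    def clone(self):
        # A clone starts with no blocks of its own; reads fall through to the master.
        return CowImage(base=self)

    def read(self, idx):
        if idx in self.blocks:
            return self.blocks[idx]
        return self.base.read(idx) if self.base else b"\0" * 4096

    def write(self, idx, data):
        # The first write to a shared block creates a private copy for this image.
        self.blocks[idx] = data

master = CowImage({0: b"root image data"})
clone = master.clone()
print(clone.read(0))   # served from the master: b'root image data'
clone.write(0, b"patched")
print(clone.read(0))   # now private to the clone: b'patched'
print(master.read(0))  # master is untouched: b'root image data'
```

A snapshot can be built the same way: freeze the current image as the read-only base and direct all subsequent writes into a fresh overlay.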
SPARSE OBJECTS AND THIN PROVISIONING

Sparse objects are objects in which not every byte has been allocated, so they actually consume less storage than their size would imply. Sparse objects exist because root image files don't necessarily occupy all the space they have been allocated (to see this, just look at the free space on any computer). The technique of using sparse objects is also referred to as thin provisioning by array vendors.

As with deduplication, the use of sparse objects typically is not of interest to hosting providers, because they generally already have more storage than they need. The other factor that makes sparse objects a non-issue for hosters is the widespread use of cloning: in a root that's been cloned multiple times, unoccupied space still points back to the master copy, so the only space saving a sparse object would create is a single block in the master copy, despite the existence of hundreds of clones. However, sparse objects are still a useful feature to keep in mind, particularly because they allow hosting providers to consider business plans based on overcommitting storage.

OBJECT RESIZING AND SUPPORT FOR LEGACY FILE SYSTEM ROOTS

Obviously, it should be a requirement that objects representing roots be resizable, especially since cloud customers are usually charged per unit of storage and therefore will want to optimize their use of it as much as possible. However, in a block-based root system, resizing depends not only on the capabilities of the object store, but also on the file system chosen for the root (a choice usually made by the consumer of the VE). The problem here is that, for practical reasons, many root file systems cannot actually be shrunk (a classic example in Linux is the ext3 file system). It is therefore useful for any cloud object store to have assistive technologies for shrinking legacy file systems that are otherwise unshrinkable.
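Sparse objects are easy to demonstrate at the file level: seeking past the end of a file and writing a single byte yields a large apparent size with almost nothing actually allocated. This is a minimal sketch, and it assumes a file system with sparse-file support (such as ext4):

```python
import os
import tempfile

# Create a sparse file: 100 MiB of apparent size, one byte actually written.
size = 100 * 1024 * 1024

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(size - 1)
    f.write(b"\0")     # only the final block is actually allocated
    path = f.name

st = os.stat(path)
print(f"apparent size:  {st.st_size} bytes")
print(f"allocated size: {st.st_blocks * 512} bytes")  # far smaller when sparse
os.unlink(path)
```

Thin-provisioned image objects in a cloud store apply the same idea one level up: the object advertises its full size to the VE while the store allocates blocks only as they are written.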

REDUNDANCY

Hosters today tend to provide redundancy for their local storage by using hardware RAID systems on their individual nodes, so any cloud storage system based on these nodes can take advantage of the RAID system to provide initial redundancy. However, hosting providers need to be able to survive node failure as well as disk failure, so a cloud storage solution should also be able to duplicate objects across multiple nodes in the cluster. And because object duplication takes additional space, hosters should be able to specify the desired number of object copies.

CLUSTER SIMPLICITY

One of the cardinal principles of system design is that a system should be as simple as possible, containing only as much complexity as is needed to perform all of its functions. Additional complexity beyond this point simply increases overhead, impairing the system's efficiency and generally weakening it. Clustered file systems illustrate the point: shared access to objects adds complexity to the cluster algorithms, yet the only use case for object roots is exclusive access. In fact, mounting the same root on more than one machine will corrupt the underlying file system, so it's important that the storage system itself be able to detect and prevent this condition. Additional complexity also increases the amount of testing required to thoroughly debug the code, but since most organizations have fixed budgets for testing, the net effect is that testing is less comprehensive. For all these reasons, it's generally a bad idea to base your cloud storage on clustered file systems.

STORAGE EXPANSION

Since it's a given that customer storage requirements will only increase over time, expanding the capacity of the cloud storage system should be extremely easy, whether by adding new disks to individual nodes (preferably by hot-plugging them, so the storage system sees the new disks and simply absorbs them) or by adding nodes to the cluster.
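Duplicating objects across distinct nodes with a configurable copy count can be sketched with rendezvous (highest-random-weight) hashing. This is one common placement technique, not necessarily the one any particular product uses, and all names here are illustrative:

```python
import hashlib

def place_object(object_id, nodes, copies=2):
    """Pick `copies` distinct nodes for an object, deterministically spread."""
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{object_id}:{n}".encode()).hexdigest(),
    )
    return ranked[:copies]

nodes = [f"node{i:02d}" for i in range(8)]
replicas = place_object("ve-042-root", nodes, copies=3)
print(replicas)  # three distinct nodes, each holding a copy of this root image
```

Because the ranking depends only on the object ID and the node names, any node can recompute an object's replica set without consulting a central directory, and losing any single node still leaves full copies elsewhere.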
Summary of Requirements

This section summarizes our recommendations for cloud storage, based on the preceding observations. Assuming you'll be deploying the cloud storage solution mostly on existing hardware, we've divided the requirements into two categories: must-have for initial deployment, and nice-to-have for the future.

MUST HAVE FOR INITIAL DEPLOYMENT

The absolutely critical requirements for your initial cloud storage deployment are:

- Cost-effectiveness. The storage solution should be able to reuse your existing hardware setup, require little extra hardware, and be as light as possible in terms of its resource footprint.
- Multi-node performance. The solution must be spread over enough nodes to deliver the same level of performance as your current locally attached storage.
- Block-based objects. To ensure optimal performance and handling, the technology must be based on objects representing roots.
- Cloning and snapshotting. The solution must support copy-on-write use of master images, as well as the ability to freeze the state of the storage at any point in time.

- Hot-pluggability. The solution should be easy to expand by simply inserting additional nodes and devices.
- Failure tolerance and redundancy. At a minimum, the solution should protect against single-disk failure. Ideally, it should protect against single-node failure as well.
- Exclusive object access. The solution should ensure that an object representing a root file system is mounted only once in the cluster at any given time.

NICE TO HAVE FOR FUTURE DEPLOYMENTS

Some additional features that you may find convenient to add for future deployments are:

- Deduplication, to free up additional storage space.
- Sparse objects (thin provisioning), so you can safely overcommit storage.
- Assistance for shrinking legacy file systems, so customers who are charged per unit of storage can optimize their use of storage.

Conclusion

As the cloud revolution progresses, the ability to separate storage from your physical systems will become increasingly important. By understanding what your storage requirements are and how well different cloud storage systems match them, you'll be able to take full advantage of the benefits that cloud storage has to offer. To learn more about how cloud storage systems can increase the reliability and scalability of your hosted services, and how Parallels helps service providers deliver cloud storage, please visit www.parallels.com/products/pcs.

NOTES

[1] Virtual environments are individual Infrastructure-as-a-Service (IaaS) units, provided either by hypervisor or container technology.
[2] Copy-on-write is a technique that enables multiple images to share the same storage block as long as users only read from those images. Once a user writes to an image, a block unique to that image is created.

Contact Us

For more information about Parallels hosting and cloud solutions, please contact:

Parallels, Inc.
500 SW 39th St., Suite 200
Renton, WA 98057
+1 425 282 6448
www.parallels.com

Copyright 2013 Parallels IP Holdings GmbH. All rights reserved.
Parallels and the Parallels logo are registered trademarks of Parallels IP Holdings GmbH. Other product and company names are the trademarks or registered trademarks of their respective owners.