Sheepdog: distributed storage system for QEMU
Kazutaka Morita, NTT Cyber Space Labs
9 August, 2010
Motivation
There is no open source storage system that fits an IaaS environment like Amazon EBS, where block-level volumes are attached to VMs running on host machines.
Requirements for the storage system:
- Scalability
- Reliability
- Manageability
Why another storage system?
Why not SAN storage?
- Large proprietary storage systems are too expensive
- Shared storage can be a single point of failure
Why not distributed file systems? (e.g. Ceph, Lustre)
- Complex configuration of cluster membership
[diagrams: VMs on host machines reach SAN storage through an FC switch; VMs on host machines reach a distributed file system's meta-data servers and data servers through an Ethernet switch]
Sheepdog
Fully symmetric architecture: there is no central node such as a meta-data server.
- Scalability: scales to 1000 nodes
- Reliability: data replication, no SPOF
- Manageability: autonomous, dynamic membership, advanced volume manipulation
[diagram: VMs on host machines connected by an Ethernet switch]
Design: not a general file system
We have simplified the design significantly:
- The API is designed specifically for QEMU
- Sheepdog cannot be used as a general file system
- One volume can be attached to only one VM at a time
[diagram: on each host machine, QEMU guests issue disk I/O to a local sheep (server process), which stores data on local disks and communicates with the sheeps on other nodes]
Sheepdog components
- Cluster management: corosync
- Object storage: a gateway and a storage server on each host
[diagram: each host machine runs QEMU VMs, corosync, a gateway, and a storage server]
Object storage
- Stores flexibly sized data items, each with a unique ID (objects)
- Clients do not care where objects are stored
- Two kinds of objects in Sheepdog: one writer / one reader, and no writer / multiple readers
How to store volumes?
- Volumes are divided into 4 MB data objects
- The allocation table is stored in the VDI object
[diagram: a volume (/dev/hda) is split into chunks; the VDI object's allocation table maps allocated chunk indexes to data objects in the object storage]
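The mapping above can be sketched in a few lines of Python. This is a toy model, not Sheepdog's actual code; the class and method names (`VDIObject`, `lookup`) are made up for illustration.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # volumes are divided into 4 MB data objects

class VDIObject:
    """Toy per-volume metadata: an allocation table mapping
    chunk index -> data object ID, as held by the VDI object."""
    def __init__(self):
        self.table = {}      # chunk index -> data object ID
        self.next_oid = 1    # toy object-ID allocator

    def lookup(self, offset, allocate=False):
        """Translate a byte offset in the volume to a data object ID."""
        idx = offset // OBJECT_SIZE
        if idx not in self.table:
            if not allocate:
                return None  # unallocated chunk: reads return zeros
            self.table[idx] = self.next_oid
            self.next_oid += 1
        return self.table[idx]

vdi = VDIObject()
oid = vdi.lookup(9 * 1024 * 1024, allocate=True)  # offset 9 MB -> chunk 2
assert vdi.lookup(9 * 1024 * 1024) == oid         # same chunk, same object
assert vdi.lookup(0) is None                      # chunk 0 was never written
```

A real implementation would persist the table inside the VDI object itself, so opening a volume only requires fetching that one object.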
Snapshot
- Copy the VDI object, and make the allocated data objects read-only
- Updating a read-only object causes copy-on-write
[diagram: creating a snapshot copies the VDI object and freezes the current data objects; a later update allocates new data objects via copy-on-write, while the snapshot's table keeps pointing at the old ones]
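The snapshot scheme can be modeled as below: a snapshot copies the allocation table and freezes the referenced objects, and a write to a frozen object redirects to a fresh object. A hedged toy sketch; the `Volume`/`snapshot`/`frozen` names are invented here, and real copy-on-write would also copy the old object's unmodified bytes on a partial write.

```python
class Volume:
    """Toy model of Sheepdog snapshots."""
    _next_oid = [100]  # shared toy object-ID allocator

    def __init__(self, store):
        self.store = store    # object store shared by volumes and snapshots
        self.table = {}       # chunk index -> data object ID (the VDI table)
        self.frozen = set()   # object IDs made read-only by a snapshot

    def _alloc(self):
        self._next_oid[0] += 1
        return self._next_oid[0]

    def snapshot(self):
        snap = Volume(self.store)
        snap.table = dict(self.table)            # copy the VDI object
        self.frozen |= set(self.table.values())  # data objects become read-only
        snap.frozen = set(self.frozen)
        return snap

    def write(self, idx, data):
        oid = self.table.get(idx)
        if oid is None or oid in self.frozen:    # unallocated or read-only
            oid = self.table[idx] = self._alloc()  # copy-on-write: new object
        self.store[oid] = data

    def read(self, idx):
        return self.store.get(self.table.get(idx))

store = {}
vol = Volume(store)
vol.write(0, b"v1")
snap = vol.snapshot()
vol.write(0, b"v2")           # frozen object -> copy-on-write
assert snap.read(0) == b"v1"  # the snapshot still sees the old data
assert vol.read(0) == b"v2"
```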
Where to store objects?
- We use consistent hashing to decide which node stores an object
- Each node is also placed on the ring
- Addition or removal of nodes does not significantly change the mapping of objects
[diagram: a hash ring with nodes A (ID 18), B (ID 55), C (ID 81), D (ID 133), and E (ID 169) placed at their hash positions]
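A minimal consistent-hashing sketch illustrating the claim above (the `HashRing` class is a generic textbook construction, not Sheepdog's implementation, which also handles replicas and virtual nodes):

```python
import bisect
import hashlib

def _hash(key):
    """Map a node name or object ID into the ring's hash space."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class HashRing:
    """Nodes and objects share one hash space; an object is stored on
    the first node clockwise from the object's hash position."""
    def __init__(self, nodes=()):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._ring, (_hash(node), node))

    def locate(self, obj_id):
        keys = [h for h, _ in self._ring]
        i = bisect.bisect(keys, _hash(obj_id)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["A", "B", "C"])
before = {oid: ring.locate(oid) for oid in map(str, range(100))}
ring.add("D")                       # a node joins the ring
after = {oid: ring.locate(oid) for oid in map(str, range(100))}
moved = sum(before[o] != after[o] for o in before)
# only the objects falling in D's new arc move; the rest stay put
assert 0 <= moved < 100
```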
Replication
- Many distributed storage systems use chain replication to maintain I/O ordering
- Sheepdog can use direct replication because write collisions cannot happen (one writer per volume)
- Direct replication: writes go from the gateway to all storage servers in parallel; reads go to any one of them
[diagram: chain replication forwards the write from storage server to storage server in sequence; direct replication fans the write out from the gateway]
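Direct replication can be sketched as a parallel fan-out from the gateway. A toy model under the single-writer assumption stated above; the replicas are plain dicts standing in for storage servers, and the function names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def direct_replicate(replicas, oid, data):
    """Direct replication: the gateway sends the write to every replica
    in parallel. This is safe here because each volume has exactly one
    writer, so two writers never race on the same object."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        list(pool.map(lambda r: r.__setitem__(oid, data), replicas))

def direct_read(replicas, oid):
    """Reads may be served by any single replica."""
    return replicas[0][oid]

replicas = [{}, {}, {}]            # three toy storage servers
direct_replicate(replicas, 7, b"block")
assert all(r[7] == b"block" for r in replicas)
assert direct_read(replicas, 7) == b"block"
```

With chain replication the same write would traverse the servers one after another, adding a network hop of latency per replica; the single-writer constraint is what lets Sheepdog skip that ordering mechanism.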
Cluster node management
Totem ring protocol:
- Dynamic membership management
- Totally ordered, reliable multicast
- Virtual synchrony
[diagram: machines A, B, and C deliver MSG1-MSG3 in the same order; after B goes down, the surviving nodes agree on the "B is down" event before delivering MSG4 and MSG5]
Cluster node management
Corosync cluster engine:
- An implementation of the totem ring protocol
- Adopted by well-known open source projects (Pacemaker, GFS2, etc.)
- Sheepdog uses corosync multicast to avoid a metadata server
[diagram: machines A, B, and C each multicast volume lock requests; because every node delivers "lock volume a" and "lock volume b" in the same total order, all nodes agree on which requests succeed and which fail]
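The lock scenario in the diagram can be simulated as below. The sketch assumes only the property the slides rely on: every node delivers the same multicast stream in the same order, so each node independently reaches the same lock state (class and field names are invented for illustration).

```python
class Node:
    """Each node applies the same totally-ordered message stream, so all
    nodes agree on who holds each volume lock without a metadata server."""
    def __init__(self, name):
        self.name = name
        self.locks = {}   # volume -> owner

    def deliver(self, msg):
        owner, volume = msg
        if volume not in self.locks:
            self.locks[volume] = owner  # first request in total order wins
            return owner == self.name   # did *this* node get the lock?
        return False                    # already locked -> request fails

nodes = [Node("A"), Node("B"), Node("C")]
# corosync delivers every multicast to every node in the same order
stream = [("A", "vol-b"), ("B", "vol-b"), ("C", "vol-a")]
results = {n.name: [n.deliver(m) for m in stream] for n in nodes}
# every node computed the same outcome: A holds vol-b, C holds vol-a,
# and B's conflicting request for vol-b failed everywhere
assert all(n.locks == {"vol-b": "A", "vol-a": "C"} for n in nodes)
```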
Performance (1 VM)
Compared configurations: local disk, NFS (local disk), NFS (NetApp FAS 2020 storage), and Sheepdog.
- CPU: Core2 Quad 2.4 GHz
- Memory: 1 GB
- Network: 1 Gbps
- Disk: SATA 7200 rpm
- Machines (Sheepdog): 8
- Data redundancy (Sheepdog): 1-3
Performance (1 VM)
$ dbench -s -S
Sheepdog throughput is comparable to NFS (local disk).
[chart: dbench throughput (0-40 MB/sec) for local disk, NFS (local disk), NFS (FAS 2020), and Sheepdog with rep=1, 2, and 3, measured on a physical machine and in a VM]
Performance (~256 VMs)
Compared configurations: Sheepdog vs. NFS (NetApp FAS 2020).
- CPU: Core2 Quad 2.4 GHz
- Memory: 1 GB
- Network: 1 Gbps
- Disk: SATA 7200 rpm
- Host machines: 8-64
- Virtual machines: 1-256
- Data redundancy: 3
[diagram: many VMs on host machines connected over Ethernet to either the Sheepdog cluster or the NFS server]
Performance (~256 VMs)
$ dbench -s -S
Throughput scales with the number of host machines.
TODO items
Short-term goals (in a few months):
- More scalability
- Support for libvirt and the EBS API
- Performance improvements
Long-term goals (in one or two years):
- Guarantee reliability and availability under heavy load
- Tolerance against network partitions (split brain)
- Load balancing based on I/O, CPU, and memory load
Conclusion
Sheepdog is a scalable, manageable, and reliable storage pool for IaaS environments.
We hope Sheepdog will become the de facto standard cloud storage system.
Further information:
- Project page: http://www.osrg.net/sheepdog/
- Mailing list: sheepdog@lists.wpkg.org
Appendix
Architecture: fully symmetric
- Zero configuration of cluster members
- Similar to the Isilon architecture
- Two deployment modes: use Sheepdog as network storage for external host machines, or as a virtual infrastructure where the VMs run on the Sheepdog nodes themselves
[diagram: left, VMs on separate host machines access the Sheepdog cluster over an Ethernet switch; right, VMs run directly on the Sheepdog cluster nodes]
Node membership history
- All nodes store the history of node membership
- Objects are stored with the version of the node membership (the epoch)

epoch  Node membership
1      A, B
2      A, B, C
3      A, B, C, D, E
4      B, C

[diagram: over time, machine C joins, then D and E join, then A, D, and E leave; each node's copies of an object are tagged with the epoch in which they were written (obj:1 through obj:4)]
Strong consistency
Epochs prevent clients from reading stale objects.
Example: A and B store obj at epoch 1. C joins (epoch 2), and obj is updated to obj'. Then B and C fail (epoch 3).
- Without epochs, a reader cannot tell A's old copy obj from the latest obj'.
- With epochs, the copies are tagged obj:1 and obj':2, so A's surviving copy is recognizably stale.
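The scenario above can be played out in a short sketch. This is a simplified model of the epoch-tagging idea only (the `Node`/`read_latest` names are invented; real Sheepdog also runs object recovery when the epoch changes):

```python
class Node:
    """A storage node holding object copies tagged with the epoch
    (membership version) in which each copy was written."""
    def __init__(self):
        self.objects = {}  # object ID -> (epoch, data)

def read_latest(nodes, oid):
    """Collect copies from the reachable nodes and trust the one with the
    newest epoch tag; a copy tagged with an older epoch is recognizably
    stale and never wins."""
    copies = [n.objects[oid] for n in nodes if oid in n.objects]
    return max(copies) if copies else None

A, B, C = Node(), Node(), Node()
A.objects["obj"] = (1, b"obj")    # epoch 1: A and B store obj
B.objects["obj"] = (1, b"obj")
# epoch 2: C joins and obj is updated to obj' (now placed on B and C)
B.objects["obj"] = (2, b"obj'")
C.objects["obj"] = (2, b"obj'")
# without epoch tags, a reader seeing A's copy and C's copy could not
# tell which one is current; with tags, the answer is unambiguous
assert read_latest([A, C], "obj") == (2, b"obj'")
```

If B and C then fail, only A's `(1, b"obj")` copy survives; its epoch tag tells the reader it predates the latest membership change, so the system can refuse to serve it as current data instead of silently returning stale content.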