Scaling Cloud Storage
Julian Chesterfield
Storage & Virtualization Architect
Outline
- Predicting cloud IO workloads
- Identifying the bottlenecks
- The distributed SAN approach
- OnApp's integrated storage platform
Predicting cloud IO workloads
- Can we estimate cloud IO workloads?
  - Provision SAN and optimise placement of VMs
  - Plan future growth and SAN scale
- There is no crystal ball!
- Best case: we can estimate whether workloads are:
  a) Read-heavy
  b) Write-heavy
  c) Read-write balanced
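The three workload classes above can be sketched as a simple ratio test. This is only an illustration: the function name `classify_workload`, the 65% threshold, and the "idle" case are assumptions, not part of the talk or of any OnApp tooling.

```python
def classify_workload(read_ops: int, write_ops: int, threshold: float = 0.65) -> str:
    """Classify a workload from observed read and write operation counts.

    `threshold` is an assumed cut-off: a workload counts as read-heavy when
    at least that fraction of its operations are reads, and as write-heavy
    when at most (1 - threshold) of them are reads.
    """
    total = read_ops + write_ops
    if total == 0:
        return "idle"  # no IO observed, nothing to classify
    read_fraction = read_ops / total
    if read_fraction >= threshold:
        return "read-heavy"
    if read_fraction <= 1 - threshold:
        return "write-heavy"
    return "read-write balanced"


if __name__ == "__main__":
    # Hypothetical counters sampled from three VMs.
    for reads, writes in [(900, 100), (100, 900), (500, 500)]:
        print(reads, writes, classify_workload(reads, writes))
```

In practice the counters would come from whatever IO statistics the platform exposes, and the threshold would be tuned against real traffic.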
Identifying the performance bottlenecks
- Provisioning SAN infrastructure requires planning
  - Calculate VM density per HV and provision sufficient network throughput
  - Switch load + switch redundancy
  - SAN network controller interfaces to support redundancy/multi-pathing
- And most importantly:
  - Scale up SAN network controller capacity to support maximum number of HV endpoints + aggregated VM IO load
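The sizing arithmetic behind this slide can be sketched as below. All names (`CloudPlan`, `san_controller_saturated`) and all figures are hypothetical, and the model assumes every VM generates the same average storage throughput, which real workloads will not.

```python
from dataclasses import dataclass


@dataclass
class CloudPlan:
    hypervisors: int     # number of HV endpoints attached to the SAN
    vms_per_hv: int      # planned VM density per hypervisor
    mbps_per_vm: float   # assumed average storage throughput per VM, in MB/s

    def per_hv_load(self) -> float:
        """Storage traffic each hypervisor's SAN-facing NICs must carry."""
        return self.vms_per_hv * self.mbps_per_vm

    def aggregate_load(self) -> float:
        """Total traffic converging on the central SAN controller."""
        return self.hypervisors * self.per_hv_load()


def san_controller_saturated(plan: CloudPlan, controller_mbps: float) -> bool:
    """True when the aggregated VM IO load exceeds controller capacity."""
    return plan.aggregate_load() > controller_mbps


if __name__ == "__main__":
    plan = CloudPlan(hypervisors=10, vms_per_hv=20, mbps_per_vm=5.0)
    print(plan.per_hv_load(), plan.aggregate_load())
    print(san_controller_saturated(plan, controller_mbps=800.0))
```

Per-hypervisor load stays constant as hypervisors are added, while the aggregate at the controller grows with every new HV, which is why the controller is the component that has to scale up.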
Traditional SAN architecture
- Scale-up of Hypervisors and Virtual Machines increases load on network
- SAN NIC bottlenecks can impact cloud performance
[Diagram: Virtual Machines, Hypervisors, SAN]
Distributed storage architecture
- Uses storage drives within each Hypervisor host
[Diagram: Virtual Machines, Hypervisors, SAN]
Distributed storage as a solution
- Storage Hypervisor: converged VM hypervisor platform and storage host
- Remove centralised SAN bottlenecks by distributing drives across hypervisors
- Maintain redundancy by replicating across hypervisors
- Maintain performance by removing centralised bottlenecks
OnApp's integrated storage platform
Architecture
- A smart, independent array of storage nodes
- Bonded NICs for bandwidth aggregation
- Commodity fast-switched ethernet backplane
[Diagram: hypervisors HV1, HV2 ... HVn joined by a multicast channel; physical drives (SATA 5600, SATA 7200, SSD) grouped into High, Mid and Low datastores; virtual storage drives for Customer A and Customer B]
Key Storage Features
- Fault tolerant
  - No central single point of failure
  - Self healing & user-initiated repair
  - Configurable data replication
  - Easy online data migration
- High performance
  - Configurable striping
  - Optimized Disk I/O
  - Optimized VM placement
  - Multiple performance tiers
- Low cost
  - Over-commit improves hardware ROI
  - Supported on commodity hardware
  - No need for separate SAN
  - Support & integration bundled in
- Built for Cloud
  - Cloud boot for fast deployment
  - Linear growth with hypervisors
  - Online hot-swap of drives
  - Physical & virtual disk IOPS reporting
Motivation
- Maintain performance at any scale
  - As Hypervisor deployment scales, so do IOPS and capacity
  - No single IO contention point
  - VM workload and data co-location
- Use commodity hardware within existing server chassis
  - Most server chassis are already deployed with some internal storage, with capacity to add more
  - Cloud boot removes the need for local persistent install
- Storage cost
  - Disrupt the current established enterprise storage market
  - Remove the dependency on expensive, centralised disk arrays
Decentralised Consistency #1
- Physical drives are grouped in Datastores (Diskgroups)
  - Each drive has a unique identifier
  - Drives are location independent
- Each Datastore has a particular replication and striping policy associated with it
- A new virtual disk object (or LUN) is assigned to a datastore
  - Content-owning members are selected from the diskgroup
  - The number of content replicas/logical stripe members is based on the Datastore policy
- Each vdisk content instance stores information about the other owners *for that unique piece of content*
=> Each vdisk content object is maintained consistently across the owning members only, via a transaction-based protocol
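The data model on this slide can be sketched roughly as follows. Everything here is hypothetical: the class and function names, the random choice of owners, and the assumption that the number of owning members equals replicas times stripes are illustrative only, and the transaction-based protocol itself is not modelled.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Drive:
    drive_id: str  # unique identifier, independent of which HV holds the drive


@dataclass
class Datastore:
    drives: List[Drive]
    replicas: int  # replication policy
    stripes: int   # striping policy


@dataclass(frozen=True)
class VdiskContent:
    vdisk_id: str
    drive_id: str           # the owning member holding this instance
    peers: FrozenSet[str]   # the other owners of the same vdisk content


def create_vdisk(ds: Datastore, vdisk_id: str, rng: random.Random) -> List[VdiskContent]:
    """Select owning members from the diskgroup according to datastore policy."""
    needed = ds.replicas * ds.stripes  # assumed sizing rule
    if needed > len(ds.drives):
        raise ValueError("datastore has too few drives for its policy")
    owners = rng.sample(ds.drives, needed)
    owner_ids = {d.drive_id for d in owners}
    # Each content instance records every other owner, and nothing else.
    return [
        VdiskContent(vdisk_id, d.drive_id, frozenset(owner_ids - {d.drive_id}))
        for d in owners
    ]


if __name__ == "__main__":
    ds = Datastore([Drive(f"drive-{i}") for i in range(6)], replicas=2, stripes=2)
    for content in create_vdisk(ds, "vdisk-a", random.Random(0)):
        print(content.drive_id, sorted(content.peers))
```

Because each instance carries the full ownership set, no central catalogue is needed to find the other copies of a vdisk.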
Decentralised Consistency #2
- Storage controller VMs report information only about content for which they are authoritative
- Content membership changes are strictly handled across the ownership group
- The same vdisk information is always reported by all members of the ownership set
=> A vdisk object is therefore known about correctly by any other member of the SAN, or not at all.
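The "known correctly, or not at all" property can be illustrated with a small resolver that a non-owning node might run over the reports it receives. The function name, the report format, and the choice to treat disagreeing reports as "unknown" are assumptions made for this sketch, not a description of the actual protocol.

```python
from typing import FrozenSet, List, Optional, Tuple

# A report is (reporter drive id, ownership set the reporter claims for the vdisk).
Report = Tuple[str, FrozenSet[str]]


def resolve_vdisk(reports: List[Report]) -> Optional[FrozenSet[str]]:
    """Return the ownership set for a vdisk, or None if it cannot be trusted.

    Only reports from members of the ownership set they describe are
    considered authoritative. If the authoritative reports disagree, the
    vdisk is treated as unknown rather than guessed at.
    """
    authoritative = [owners for reporter, owners in reports if reporter in owners]
    if not authoritative:
        return None  # nobody authoritative has reported: not known at all
    first = authoritative[0]
    if any(owners != first for owners in authoritative):
        return None  # inconsistent view: refuse to pick one
    return first


if __name__ == "__main__":
    owners = frozenset({"d1", "d2"})
    print(resolve_vdisk([("d1", owners), ("d2", owners)]))  # consistent view
    print(resolve_vdisk([("d3", owners)]))                  # non-owner report only
```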
Internal HV architecture
Optimised virtual machine placement
- When migrated, VM is located on the HV whose physical disk contains the data replica
- Writes are distributed, reads are local to minimize network traffic
[Diagram: VMs 1-5 running on HVs; each virtual disk is backed by data replicas on the HVs' physical disks]
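A replica-aware placement decision might look like the sketch below. The function name, the use of VM count as the load metric, and the fallback to the least-loaded HV when no replica holder is available are all assumptions for illustration; the slide only states that the VM lands on an HV holding a data replica.

```python
from typing import Dict, Optional, Set


def choose_target_hv(replica_hvs: Set[str], hv_vm_counts: Dict[str, int]) -> Optional[str]:
    """Pick a migration target for a VM.

    replica_hvs:  HVs whose physical disks hold a replica of the VM's data.
    hv_vm_counts: candidate HVs mapped to their current VM count.
    """
    # Prefer HVs holding a replica so that reads stay local to the host.
    local = [hv for hv in hv_vm_counts if hv in replica_hvs]
    pool = local or list(hv_vm_counts)  # assumed fallback: any candidate HV
    if not pool:
        return None
    # Least-loaded first; the HV name breaks ties deterministically.
    return min(pool, key=lambda hv: (hv_vm_counts[hv], hv))


if __name__ == "__main__":
    counts = {"hv1": 0, "hv2": 5, "hv4": 3}
    print(choose_target_hv({"hv2", "hv4"}, counts))
```

Here hv1 is the least loaded host overall, yet the VM is still sent to a replica holder, trading some balance for local reads.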
Availability
- First fully supported GA version available via OnApp Cloud v3.0 platform install
- Visit the OnApp stand for a demo & more detail
Thanks!
Julian Chesterfield
Storage & Virtualization Architect