The Future of Software-Defined Storage: What Does It Look Like in 3 Years' Time?
Richard McDougall, VMware, Inc.
This Talk: The Future of SDS
- Hardware futures & trends
- New workloads
- Emerging cloud-native platforms
Storage Workload Map
[Chart: IOPS (one thousand to 1 million) vs. capacity (tens of terabytes to tens of petabytes). Most SDS today focuses on the high-IOPS, tens-of-terabytes corner; new workloads sit toward tens of petabytes.]
What Are Cloud-Native Applications?
- Developer access via APIs
- Continuous integration and deployment
- Built for scale
- Availability architected into the application, not the hypervisor
- Microservices, not monolithic stacks
- Decoupled from infrastructure
What Do Linux Containers Need from Storage?
[Diagram: parent Linux / with /etc and /containers; Container 1 at /containers/cont1/rootfs and Container 2 at /containers/cont2/rootfs, each with its own etc and var.]
- Ability to copy and clone root images
- Isolated namespaces between containers
- QoS controls between containers
Option A: copy the whole root tree (see the sketch below)
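A minimal sketch of Option A, assuming a golden root image at /containers/golden/rootfs (the paths are illustrative):
# rsync -aHAX /containers/golden/rootfs/ /containers/cont1/rootfs/
Every container gets a full, writable copy, which is simple but slow to provision and wasteful of space.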
Containers and Fast Clone Using Shared Read-Only Images
[Diagram: / with /etc and /containers; /containers/cont1/rootfs (etc, var) pointing at a shared image.]
- Action: quickly point a child container at the same boot image
- Downside: the container root fs is not writable
(A read-only bind-mount sketch follows.)
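One way to share a single read-only image between containers is a read-only bind mount; a sketch, assuming the shared image again sits at /containers/golden/rootfs (paths illustrative):
# mount --bind /containers/golden/rootfs /containers/cont1/rootfs
# mount -o remount,ro,bind /containers/cont1/rootfs
The second step is needed because Linux applies the read-only flag on a remount of the bind, and it illustrates the downside above: the container cannot write to its own root.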
Docker Solution: Clone via Another Union File System (aufs). Except:
Docker's Direction: Leverage a Native Copy-on-Write Filesystem
[Diagram: parent Linux / with /etc and /containers; the golden image at /containers/golden/rootfs and Container 1 at /containers/cont1/rootfs, each with etc and var.]
- Goal: snapshot a single image into writable clones
- Boot clones as isolated Linux instances
# zfs snapshot cpool/golden@base
# zfs clone cpool/golden@base cpool/cont1
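To finish the picture, the clone can be mounted where the container expects its root; a hedged follow-on using the dataset and path names above (which are illustrative):
# zfs set mountpoint=/containers/cont1/rootfs cpool/cont1
ZFS clones share blocks with the snapshot, so each container costs only the data it actually changes.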
Shared Data: Containers Can Share a Filesystem Within a Host
[Diagram: / with /etc and /containers; /containers/cont1/rootfs and /containers/cont2/rootfs each with etc, var, and a shared directory visible to both container1 and container2.]
- Containers today: share within the host
- Containers future: share across hosts via clustered file systems
Docker Storage Abstractions for Containers
[Diagram: a Docker container with a non-persistent boot environment (/, /etc, /var) and persistent data at /data, declared via VOLUME /var/lib/mysql and backed by local disks.]
Docker Storage Abstractions for Containers (continued)
[Diagram: the same container, with the persistent /data volume (VOLUME /var/lib/mysql) backed by block volumes such as VSAN, Ceph RBD, ScaleIO, Flocker, etc.; the boot environment (/, /etc, /var) remains non-persistent.]
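A minimal sketch of the volume abstraction (the image name and host path are illustrative):
VOLUME /var/lib/mysql   (in the Dockerfile: mark the data directory as a volume)
# docker run -d --name db -v /data/mysql:/var/lib/mysql mysql
The container's root filesystem stays disposable; only the data under /var/lib/mysql persists on the host, or on an external block volume via a volume driver.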
Container Storage Use Cases
- Unshared volumes: running a container-based SQL or NoSQL database
- Shared volumes: sharing a set of tools or content across app instances
- Persist to external storage: object store for retention/archival, DBaaS for config/transactions
[Diagram: hosts running containers on a storage platform, or persisting through an API to cloud storage. A shared-volume sketch follows.]
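A hedged sketch of the shared-volume case on a single host (volume and image names are made up, and the exact docker volume syntax varies by Docker version):
# docker volume create tools
# docker run -d -v tools:/opt/tools --name app1 myapp
# docker run -d -v tools:/opt/tools --name app2 myapp
Both containers see the same /opt/tools content; sharing the same volume across hosts requires a clustered filesystem or a volume driver backed by shared storage.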
Eliminate the Silos: Converged Big Data Platform
- Compute layer: Hadoop (batch analysis), HBase (real-time queries), Big SQL (Impala, Pivotal HAWQ), NoSQL (Cassandra, Mongo, etc.), other (Spark, Shark, Solr, Platfora, etc.)
- Distributed resource manager or virtualization platform
- File system (HDFS, MapR, GPFS), POSIX, block storage
- Distributed storage platform spanning the hosts
Storage Platforms
Array Technologies
[Chart: the same IOPS vs. capacity map; arrays consolidate and increase performance, spanning one thousand to 10 million IOPS and tens of terabytes to tens of petabytes.]
Storage Media Technology: Capacities & Latencies (2016)
Technology | Per host | Per device | Latency | Scaled to human terms
DRAM | 4TB | 1TB | ~100ns | trip to Australia in 0.3 seconds
NVM | 4TB | 1TB | ~500ns | trip to Australia in 1.8 seconds
NVMe SSD | 48TB | 4TB | ~10us | trip to Australia in 36 seconds
Capacity-oriented SSD | 192TB | 16TB | ~1ms | trip to Australia in 1 hour
Magnetic storage | 384TB | 32TB | ~10ms | trip to Australia in 12 hours
Object storage | - | - | ~1s | trip to Australia in 50 days
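The "trip to Australia" column is just the latency multiplied by a constant scale factor of roughly 3.6 million (taking 500 ns -> 1.8 s as the reference): 10 us scales to about 36 seconds, 1 ms to about an hour, 10 ms to roughly 10 hours, and 1 s to roughly 40 days, which matches the rounded figures in the table.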
Storage Media Technology
[Chart: IOPS vs. capacity, from one thousand to 1 million IOPS and tens of gigabytes to tens of terabytes; 2D NAND serves as IOPS-oriented flash, 3D NAND as capacity-oriented flash.]
NVDIMMs
- NVDIMM-N (Type 1): DDR3/DDR4, standard DRAM with event-triggered transfer to NAND; DRAM + NAND as non-volatile memory, load/store access [2]
- NVDIMM-F (Type 3): DDR3/DDR4, RAM buffer with asynchronous block transfer to NAND; DRAM + NAND as non-volatile memory, block mode [1]
- Type 4: DDR4, synchronous transfer to NVM; next-generation load/store memory [1]
[Sidebar: the media latency ladder from the previous slide, DRAM ~100ns down to object storage ~1s.]
Intel 3D XPoint Technology
- Non-volatile
- 1000x higher write endurance than NAND
- 1000x faster than NAND
- 10x denser than DRAM
- Can serve as both system memory and storage
- Cross-point structure allows individual memory cells to be addressed; cacheline load/store should be possible
- Stackable to boost density
- Memory cells can be read or written without a transistor, reducing power and cost
1. Intel and Micron Produce Breakthrough Memory Technology, Intel
Storage Workload Map: SDS Solutions
[Chart: the IOPS vs. capacity map again; VSAN sits toward 1 million IOPS at tens of terabytes, HDFS toward tens of petabytes at lower IOPS.]
What's Really Behind a Storage Array?
[Diagram: two industry-standard x86 controllers, each running an OS, storage stack, and data services, clustered for fail-over, fronted by iSCSI, FC, etc., and attached to shared disk over SAS or FC.]
- Data services: de-dup, volumes, mirroring, caching, etc.
- Dual fail-over processors (industry-standard x86 servers)
- Shared disk over a SAS or FC network
Different Types of SDS (1): Fail-Over Software on Commodity Servers
[Diagram: two commodity servers, each running an OS, storage stack, and data services as software-defined storage.]
- Examples: Nexenta, Data ONTAP virtual appliance, EMC VNX VA, etc.
- Your servers: HP, Dell, Intel, Super-Micro, etc.
(1) Better: Software Replication Using Servers + Local Disks
[Diagram: two servers with local disks, each running an OS, storage stack, and data services, clustered with software replication between them.]
- Simpler server configuration
- Lower cost, less to go wrong
- Software replication
- Not very scalable
(2) Caching: Hot Core / Cold Edge
[Diagram: compute hosts with all-flash caches on a low-latency network, in front of a scale-out storage tier.]
- Compute layer: all-flash hot tier, caching reads/writes
- Scale-out storage tier behind it, exposed as a scale-out storage service (S3, iSCSI)
(3) Scale-Out SDS
[Diagram: a compute tier talking to a scale-out storage tier over iSCSI or a proprietary network protocol; each storage node runs an OS, storage stack, and data services.]
- Scalable
- More fault tolerance
- Rolling upgrades
- More management (separate compute and storage silos)
(4) Hyperconverged SDS
[Diagram: apps running on hypervisors across hosts, with the data services and storage stack (e.g., VSAN) embedded in the hypervisor.]
- Easier management (top-down provisioning)
- Scalable
- More fault tolerance
- Rolling upgrades
- Fixed compute-to-storage ratio
Storage Interconnects
- Compute-tier protocols: iSCSI, FC, FCoE, NVMe, NVMe over Fabrics
- Hardware transports: Fibre Channel, Ethernet, InfiniBand, SAS (via HCA)
- Device connectivity: SATA, SAS, NVMe
Network
- Lossy
- Dominant network protocol
- Enables a converged storage network: iSCSI, iSER, FCoE, RDMA over Ethernet, NVMe Fabrics
Device Interconnects
[Diagram: servers attaching SATA/SAS HDDs/SSDs via an HCA on the PCI-e bus, or NVMe SSDs directly on the PCI-e bus.]
- SAS: 6-12G @ 1.2 Gbytes/sec (*there are now PCIe extensions for storage)
- NVM over PCIe flash: Intel P3700 @ 3 Gbytes/sec, Samsung XS1715, future devices
PCIe serial speed per lane:
- Gen 1: 2 Gbits per lane, 256 MB/sec
- Gen 2: 4 Gbits per lane, 512 MB/sec
- Gen 3: 8 Gbits per lane, 1 GB/sec; 4 lanes (Gen3 x4) = 4 Gbytes/sec
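As a quick way to see what link a device actually negotiated, lspci reports the PCIe link status; a sketch, assuming an NVMe SSD at PCI address 0000:03:00.0 (the address is illustrative):
# lspci -s 0000:03:00.0 -vv | grep -i LnkSta
A Gen3 x4 link ("Speed 8GT/s, Width x4") gives roughly the 4 Gbytes/sec quoted above.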
Storage Fabrics
[Diagram: servers with NICs on the PCI-e bus, connected over a storage fabric.]
iSCSI (SCSI over TCP/IP)
- Pros: popular; Windows, Linux, and hypervisor friendly; runs over many networks and is routable (TCP); optional RDMA acceleration (iSER)
- Cons: limited protocol (SCSI); performance; setup, naming, and endpoint management
FCoE (FC over Ethernet)
- Pros: runs on Ethernet; can easily gateway between FC and FCoE
- Cons: not routable; needs special switches (lossless Ethernet); not popular; no RDMA option?
NVMe over Fabrics (the future!) - NVMe over Ethernet
- Pros: runs on Ethernet; same rich protocol as NVMe; in-band administration and data; leverages RDMA for low latency
- Cons:
Storage Fabrics (PCIe Rack Level)
[Diagram: hosts attached at 4 Gbytes/sec each to a PCIe switch, running NVMe over PCIe to PCIe shared storage with RAID, namespaces, QoS, and auto-tiering.]
- PCIe rack fabric: shared drives between hosts
Integrated PCIe Fabric
PCI-e Rack-Scale Compute + Storage
[Diagram: compute nodes and storage boxes connected by a PCIe fabric, with Storage-DMA / RDMA over PCIe between them.]
- PCIe fabric: share disk boxes across hosts
- Storage-DMA: host-to-host RDMA capability, leveraging the same PCI-e fabric
NVMe: The New Kid on the Block
- SCSI-like: block read/write transport (create, delete, read, write)
- NFS-like: objects/namespaces
- vVol-like: object + metadata + policy
[Diagram: NVMe objects and namespaces on an NVMe core, carried over PCIe or NVMe over Fabrics (FC, InfiniBand, RDMA, Ethernet).]
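On Linux, the nvme-cli tool exposes controllers and namespaces directly; a hedged example (device names depend on the system, and nvme-cli must be installed):
# nvme list
# nvme list-ns /dev/nvme0
# nvme id-ctrl /dev/nvme0
The first command lists NVMe devices and their namespaces; the others show the namespaces and controller identity data behind /dev/nvme0.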
Now/Future Interconnect (the future column is Richard's opinion)
- SATA: 6g now; NVMe in future
- SAS: 12g now; 24g next; NVMe in future
- PCI-e: proprietary now; 4 x 8g next; NVMe in future
- SCSI over FC / virtual NVMe
- FC: 8g now; 32g next; NVMe over Fabrics in future
- Ethernet: 10g now; 20/40g next; 100g in future
SDS Hardware Futures: Conclusions & Challenges
- Magnetic storage is phasing out: better IOPS per $ with flash since this year, better space per $ after 2016
- Devices approaching 1M IOPS and extremely low latencies
- Device latencies push for new interconnects and networks
- Storage (first cache) locality is important again
- Durability demands low-latency commits across the wire (think RDMA)
- The network is critical: flat 40G networks are the design point for SDS
Beyond Block: Software-Defined Data Services Platforms
Storage Protocols
- NoSQL, K-V, S3/Swift, SCSI, POSIX, SQL flavors, HDFS
Too Many Data Silos
[Diagram: a compute cloud where developers are happy, consuming separate APIs for block, object, DB, K-V, and big data, each with its own management and HCL.]
- Lots of management
- Different software stacks
- Different HCLs
- Walled resources
Option 1: Multi-Protocol Stack
[Diagram: one all-in-one software-defined storage stack behind the compute cloud, exposing big data, K-V, DB, object, and block APIs on a single HCL.]
- One stack to manage
- Least-capable services
- Shared resources
Option 2: Common Platform + Ecosystem of Services
[Diagram: a software-defined storage platform on a single HCL, hosting an ecosystem of services (block, Ceph, MySQL, MongoDB, Hadoop), each with its own API behind the compute cloud.]
- One platform to manage
- Richest services
- Shared resources
Opportunities for the SDS Platform
- Provide for the needs of cloud-native data services, e.g. MongoDB, Cassandra, Hadoop HDFS
- Advanced primitives in support of distributed data services:
  - Policies for storage services: replication, RAID, error handling
  - Fault-domain-aware placement/control
  - Fine-grained policies for performance, e.g. ultra-low latencies for database logs
  - Placement control for performance: locality, topology
  - Group snapshot and replication semantics
(A purely illustrative policy sketch follows.)
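As a purely hypothetical illustration of what per-volume policies might look like (sdsctl and all of its options are made up for this sketch, not a real tool):
# sdsctl volume create mongo-data --size 500G --replicas 3 --fault-domain rack
# sdsctl volume create mongo-log --size 20G --replicas 3 --latency ultra-low --placement local
The point is that replication, fault-domain placement, and latency targets become per-volume policy attached at creation time, rather than properties of a pre-carved LUN.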