21st Century Storage: What's New and What's Changing
Randy Kerns, Senior Strategist, Evaluator Group
Overview
New technologies in storage
- Continued evolution
- Each has great economic value
- Differing adoption rates
- Some require operational changes
Vendors offering solutions
- Differences vary
- Some vendors are slow to deliver
  - May have an intermediate solution first
Technologies Covered Here
Evaluator Group views these as the most important currently:
- Solid-state technology
- Scale-out storage systems
- Storage virtualization: disk virtualization
  - Includes storage pooling, thin provisioning, device protection
- Storage systems with storage as an application
- Object-based storage
Solid-State Technology Overview
Solid-State Technology
Multiple implementations
- DRAM-based: uses battery to retain state
- NAND flash
  - SLC (single-level cell): 1 or 0, 100,000s of writes
  - MLC (multi-level cell): 3 or 4 states, 10,000s of writes
  - Wear-leveling to balance usage across cells
  - ECC used for correction of single- and double-bit errors
- PCM (phase change memory): future
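The wear-leveling idea on this slide can be sketched in a few lines. This is a toy model only: real flash controllers also track bad blocks, static data, and garbage collection. The class name `WearLeveler` and the endurance figure are assumptions for illustration, not taken from any product.

```python
class WearLeveler:
    """Toy wear-leveling: always write to the least-erased block."""

    def __init__(self, num_blocks: int, endurance: int):
        self.erase_counts = [0] * num_blocks  # erase cycles consumed per block
        self.endurance = endurance            # assumed rated cycles per block

    def pick_block(self) -> int:
        # The block with the fewest erase cycles so far (ties go to the lowest index).
        return min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)

    def write(self) -> int:
        block = self.pick_block()
        if self.erase_counts[block] >= self.endurance:
            raise RuntimeError("all blocks have reached their rated endurance")
        self.erase_counts[block] += 1
        return block


leveler = WearLeveler(num_blocks=4, endurance=10_000)
for _ in range(1_000):
    leveler.write()
spread = max(leveler.erase_counts) - min(leveler.erase_counts)
```

Because every write goes to the least-worn block, the erase counts stay within one cycle of each other, which is the "balance usage across cells" goal.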
Solid-State Technology
Usages
- SSD: solid-state device, solid-state disk, or solid-state drive
- Flash memory cards for servers
- Tiering in storage systems
- Caching in storage systems
- All-solid-state storage systems
- Consumer
Solid-State Device Usage
Justifications for SSDs
- Rugged: no moving parts
- Performance: no rotational latency or actuator movement as with HDDs
- Reliability: MTBF typically 6x that of an HDD
- Operational costs: power requirement is typically 1/4 that of an HDD
Solid-State Devices in Storage
General forms
- As HDD replacement in existing storage system architecture
  - HDD form factor
  - Connected with a storage interface: Fibre Channel, SAS, SATA
  - For a server, still requires an HBA
- Memory extension: add-in card to the server or system on the PCIe bus (called a flash card)
- PCIe-SSD (or SSS): solid-state device that connects to the PCIe bus via an extender adapter
Trend: data reduction in SSDs
- Compression and deduplication to multiply capacity
- New techniques will drive towards price parity with HDDs
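To illustrate how compression and deduplication multiply effective capacity, here is a minimal sketch. It assumes fixed 4 KiB blocks, SHA-256 fingerprints to detect duplicates, and zlib for compression; these are illustrative choices, not a description of how any particular SSD or array does it.

```python
import hashlib
import zlib


def reduced_size(blocks: list[bytes]) -> int:
    """Bytes stored after dropping duplicate blocks and compressing the rest."""
    unique = {}
    for block in blocks:
        digest = hashlib.sha256(block).digest()  # fingerprint identifies duplicates
        if digest not in unique:
            unique[digest] = zlib.compress(block)
    return sum(len(compressed) for compressed in unique.values())


# Four logical 4 KiB blocks; the first two are identical.
blocks = [b"A" * 4096, b"A" * 4096, b"B" * 4096, bytes(range(256)) * 16]
logical = sum(len(b) for b in blocks)
physical = reduced_size(blocks)
ratio = logical / physical  # effective capacity multiplier for this sample data
```

The ratio depends entirely on the data: highly repetitive sample blocks like these reduce far better than typical production data would.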
Solid-State Devices in Storage
All-SSD systems
- Storage platforms that only use SSDs
- Can be designed with only SSD
  - Optimize for performance and data placement: equal wear-out
  - Considerations differ with compression and deduplication vs. storage systems with spinning disks
- Some traditional disk system vendors will sell systems with only SSDs installed
  - Still have design considerations for spinning disks
- Many new vendors
  - A great deal to learn about storage, reliability, availability, support, etc.
  - Each tries to feature something new
All SSD System Vendors PureStorage Nimbus Nimble Data Whiptail Nexgen Solid Fire Texas Memory Systems Violin Kaminario Virident CacheIQ
Storage System Tiering Within-the-Box
In-the-box tiering
- Major performance boost
- Limits the economic impact of the cost of higher-performance drives
- Utilizes resources better
[Figure: data movement between tiers: Solid-State Devices, High-Performance Disks, High-Capacity Disks]
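A rough sketch of how in-the-box tiering might decide placement: rank extents by access count and fill the fastest tier first. The function name, the slot counts, and the access counts are all hypothetical; real systems use more elaborate heat tracking and move data on a schedule.

```python
def place_in_tiers(access_counts: dict[str, int],
                   ssd_slots: int,
                   fast_disk_slots: int) -> dict[str, str]:
    """Assign the hottest extents to SSD, the next hottest to fast disks."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for position, extent in enumerate(ranked):
        if position < ssd_slots:
            tier = "solid-state"
        elif position < ssd_slots + fast_disk_slots:
            tier = "high-performance disk"
        else:
            tier = "high-capacity disk"
        placement[extent] = tier
    return placement


# Hypothetical access counts per extent over some sampling window.
counts = {"a": 900, "b": 40, "c": 500, "d": 3, "e": 75}
placement = place_in_tiers(counts, ssd_slots=1, fast_disk_slots=2)
```

Only one extent lands on the expensive tier here, which is the economic point of the slide: a small amount of solid-state capacity serves the bulk of the I/O.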
Tiering vs. Caching
Storage tiers
- Implies a particular price and performance metric
- Provides actual capacity
- All content resides on media
- Performance is limited only by media speed
- May be improved with caching
- Major gain: storage consolidation
- Actual location for data
Caching
- Not considered actual storage
- All capacity must be backed by non-volatile media
- Performance limited by size of cache
- Limited use for random workloads
- Data is transient
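The point that cache performance is limited by cache size can be shown with a small simulation. This sketch assumes a least-recently-used (LRU) replacement policy, which the slide does not specify; the class and variable names are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Read cache holding transient copies; the backing store keeps the real data."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block, backing_store):
        if block in self.entries:
            self.entries.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.entries[block]
        self.misses += 1
        data = backing_store[block]           # fetch from the actual location
        self.entries[block] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data


backing_store = {block: f"data-{block}" for block in range(100)}

small_set = LRUCache(capacity=10)
for _ in range(10):
    for block in range(10):       # working set fits in the cache
        small_set.read(block, backing_store)

large_set = LRUCache(capacity=10)
for _ in range(10):
    for block in range(20):       # working set is twice the cache size
        large_set.read(block, backing_store)
```

With the working set inside the cache, only the first pass misses. With a cyclic working set twice the cache size, LRU evicts each block before it is read again, so every read goes to the backing media.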
Scale-Out Storage File and Block
Scale-Out Storage: Benefits
Historically, storage systems scaled up by adding more storage devices
- Created an imbalance in access density (IOPS/GB)
- Economic choice created an administrative challenge
Scale-out technologies added more performance along with capacity
- More control function: controllers
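The access density argument is simple arithmetic. The sketch below uses made-up figures (100,000 IOPS per controller, 50,000 GB per shelf) purely to show the trend; they are not measurements of any system.

```python
def access_density(total_iops: float, total_gb: float) -> float:
    """Access density in IOPS per GB."""
    return total_iops / total_gb


CONTROLLER_IOPS = 100_000  # hypothetical performance of one controller
SHELF_GB = 50_000          # hypothetical capacity added per expansion step

# Scale-up: capacity grows, controller performance stays fixed.
scale_up = [access_density(CONTROLLER_IOPS, SHELF_GB * n) for n in (1, 2, 4)]

# Scale-out: each added node brings its own controller along with capacity.
scale_out = [access_density(CONTROLLER_IOPS * n, SHELF_GB * n) for n in (1, 2, 4)]
```

Under these assumptions, scale-up density halves each time capacity doubles, while scale-out density stays constant.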
Scale-Out vs. Scale Up
Scale-Out Storage: File Storage
Scale-out NAS
- Usually implemented using a distributed file system
  - Clustered hardware
  - Global namespace across nodes
- Key implementation points
  - Linkage between nodes: InfiniBand, 10Gb Ethernet
  - I/O balancing between nodes
  - Capacity balancing between nodes
  - Coherency across distance
  - Switching requirements
Scale-Out NAS Vendors
- Dell FS7600 / FS8600
- EMC Isilon
- HP StoreAll X9000
- HDS HNAS
- IBM SONAS
- NetApp FAS 8.1.1
- Xyratex ClusterStor
- And others
Scale-Out Storage: Block Storage
Two approaches used
- Multiple controller cards to common backend storage device pools
  - Typically backplane connected
  - Normally associated with high-end enterprise systems
- Federation of separate controller nodes (included with NAS systems in some cases as integrated unified storage)
  - Complexities in cache coherency and I/O routing
Vendor differences
Scale-Out Block Storage Vendors
- Data Direct Networks
- Dell EqualLogic
- HP Lefthand P4000
- IBM XIV
- And others
Storage Virtualization Disk Virtualization and Thin Provisioning
Storage Virtualization: Disk Virtualization
Recent storage system architecture
- Uses the storage pooling concept: carve up device resources into smaller, granular segments (typically called chunks)
- Allows new or more efficient operation
  - Finer-grained allocation of capacity
  - Enables thin provisioning
  - Allows new, selectable data protection
  - Leads to greater capacity efficiency
- Newly designed storage systems
- Added to older system designs
Thin Provisioning
Change in storage system design to allocate space only on write
Historically, the entire volume (LUN) was committed to the operating system for use
- All capacity allocated
- Operating system or application used space as needed and managed the space
- Led to inefficiencies: not all space for volumes was utilized
Typical Allocation
[Figure: Traditional allocation. The volume/LUN is fully allocated; unused capacity is trapped and unavailable for other uses.]
Disk Virtualization
When RAID was added to storage, the storage architecture was changed to use data from stripes across the drives in a RAID group
- Parity within the RAID group
- Distribution of data (wide striping) was based on the number of disks in the RAID group
Still have a capacity efficiency issue
- Volumes with trapped capacity
RAID Group Volumes
[Figure: LUN 1 through LUN 4 assigned from stripes in a RAID group of physical disk drives. Each RAID stripe holds data segments 0, 1, 2, 3 and parity P. Example: RAID 5 4+1 (data and parity).]
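The RAID 5 4+1 layout in the figure relies on XOR parity: the parity segment is the XOR of the four data segments, so any one missing segment can be recomputed from the other four. A minimal sketch, using tiny 4-byte segments as stand-ins for real stripe units:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe in a 4+1 group: four data segments plus one parity segment.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Simulate losing the drive holding segment 2 and rebuilding it
# from the three surviving data segments plus parity.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
```

Rebuilding a whole drive means repeating this for every stripe, reading all surviving drives in full.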
Thin Provisioning Trapped Capacity
[Figure: Thinly provisioned volumes LUN 1 through LUN 4, each showing used capacity and available capacity.]
Disk Virtualization
Storage system architecture that assigns capacity only as it is written; some systems have had this for some time, but it was not fully utilized
Creates storage pools
- Set of chunks of data; size of chunk could be specified at time of configuration
- Volumes were created without assigning real capacity
- As data was written, chunks were assigned
For modern architectures, chunks are distributed across physical disks to match data protection choices
- Different RAID levels
- Advanced data protection techniques
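The allocate-on-write behavior described here can be sketched as a volume that maps virtual chunks to pool chunks only when they are first written. `StoragePool`, `ThinVolume`, and the 1 MiB chunk size are hypothetical names and values for illustration; data protection and chunk placement across disks are left out.

```python
class StoragePool:
    """Pool of physical chunks shared by all thin volumes."""

    def __init__(self, total_chunks: int):
        self.free = list(range(total_chunks))  # physical chunk IDs not yet assigned

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("storage pool exhausted")
        return self.free.pop()


class ThinVolume:
    """Volume created without real capacity; chunks are assigned on first write."""

    def __init__(self, pool: StoragePool, virtual_chunks: int, chunk_size: int):
        self.pool = pool
        self.virtual_chunks = virtual_chunks
        self.chunk_size = chunk_size
        self.mapping = {}  # virtual chunk index -> physical chunk ID

    def write(self, offset: int, length: int) -> None:
        size = self.virtual_chunks * self.chunk_size
        if offset < 0 or length <= 0 or offset + length > size:
            raise ValueError("write outside the volume")
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for index in range(first, last + 1):
            if index not in self.mapping:      # assign a chunk only on first write
                self.mapping[index] = self.pool.allocate()

    @property
    def allocated_bytes(self) -> int:
        return len(self.mapping) * self.chunk_size


MIB = 1 << 20
pool = StoragePool(total_chunks=100)
volume = ThinVolume(pool, virtual_chunks=1000, chunk_size=MIB)  # larger than the pool
volume.write(offset=0, length=3 * MIB)   # touches chunks 0, 1, 2
volume.write(offset=MIB, length=MIB)     # rewrites chunk 1, no new allocation
```

The volume presents 1000 MiB to the host while consuming only the chunks actually written, which is also why an over-committed pool needs capacity monitoring.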
Storage Pooling
[Figure: A thin provisioned LUN draws chunks from a storage pool as required; the pool is built from physical disks.]
Storage Pooling with Data Protection Across Pool
[Figure: A thin provisioned LUN with protection of chunks. Chunks are added as required from the storage pool and are mapped to specific physical devices on the physical disks.]
Thin Provisioning
When space in a volume is no longer required (for example, data is deleted), the space can be returned to the storage pool; this is called space reclamation
Different approaches for space reclamation
- Storage system scans for zero blocks and returns those
- APIs with applications and file systems to send the SCSI UNMAP command to the storage system to release space
- Administrator copies a volume that has grown fat to another thin volume to return it to a thin state
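The first approach, scanning for zero blocks, can be sketched as follows. The data structures are assumptions for illustration: `chunks` maps an allocated chunk ID to its contents and `free_list` stands in for the storage pool's free chunk list.

```python
def reclaim_zero_chunks(chunks: dict[int, bytes], free_list: list[int]) -> int:
    """Return all-zero chunks to the pool and report how many were reclaimed."""
    reclaimed = [chunk_id for chunk_id, data in chunks.items() if not any(data)]
    for chunk_id in reclaimed:
        del chunks[chunk_id]        # the volume no longer maps this chunk
        free_list.append(chunk_id)  # the pool can hand it to another volume
    return len(reclaimed)


chunks = {
    0: bytes(4096),                 # all zeros: reclaimable
    1: b"data" + bytes(4092),       # holds data: must be kept
    2: bytes(4096),                 # all zeros: reclaimable
}
free_list: list[int] = []
count = reclaim_zero_chunks(chunks, free_list)
```

This only helps if the host actually writes zeros over deleted data; otherwise the UNMAP approach is what tells the storage system which space is free.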
Forward Error Correction
The RAID problem
- Large-capacity disk drives take a long time to rebuild
  - Potentially days
  - Probability of a second error occurring is high: loss of data
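A back-of-the-envelope calculation shows why rebuilds stretch toward days. The drive size and rebuild rates below are assumed values for illustration, not vendor figures; the effective rate drops sharply when the system keeps serving host I/O during the rebuild.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rewrite a whole drive at a sustained rebuild rate (decimal units)."""
    seconds = (capacity_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


idle_hours = rebuild_hours(capacity_tb=4, rebuild_mb_per_s=50)  # assumed lightly loaded system
busy_hours = rebuild_hours(capacity_tb=4, rebuild_mb_per_s=10)  # assumed heavy host I/O
busy_days = busy_hours / 24
```

Under these assumptions a 4 TB drive takes roughly 22 hours to rebuild when lightly loaded and more than four days when throttled to 10 MB/s, and the group is exposed to a second failure for that whole window.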
Forward Error Correction
The RAID problem, continued
- Multiple parity disks being utilized
  - Complexity: management and algorithms
  - How many should be used as capacity gets larger?
- Information is being distributed to multiple locations
  - Does RAID work with multi-site locations?
  - How much protection is required?
Forward Error Correction
New approach for data protection
- For failed drives
- For locations that don't respond when geographically dispersed
Information Dispersal Algorithms, or IDA
- Study area in computer science
- Early implementations from different sources: no large-scale adoption
- More correct term is Forward Error Correction
- "Erasure Codes" is also used, based on the details of implementation
Forward Error Correction
Selectivity on the amount of data protection
- Example: 12 of 16 drives must be present: tolerates loss of 4
- Example: 12 of 16 sites must respond with data: tolerates 4 sites not responding
- Important characteristic is the ability to set the protection level
Potential benefit
- Disk system with protection level calculated for the lifespan of the storage system
- Failures tolerated over time: no replacements expected (with warning and exception conditions)
- No service planned
  - Reduces warranty cost for vendor
  - Reduces service interruption / impact for IT
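The trade-off behind a "12 of 16" setting can be put into numbers. The sketch below does not implement an erasure code; it only computes the capacity overhead and, under the simplifying assumption that devices fail independently with the same probability, the chance that more devices fail than the scheme tolerates. The 5% failure probability is an arbitrary illustration.

```python
from math import comb


def storage_overhead(k: int, n: int) -> float:
    """Raw capacity consumed per unit of user data in a k-of-n scheme."""
    return n / k


def tolerated_failures(k: int, n: int) -> int:
    """Number of devices or sites that can be missing with data still readable."""
    return n - k


def loss_probability(k: int, n: int, p: float) -> float:
    """Probability that more than n - k of n devices fail, assuming each fails
    independently with probability p (a simplifying assumption)."""
    return sum(comb(n, f) * p**f * (1 - p) ** (n - f) for f in range(n - k + 1, n + 1))


overhead = storage_overhead(12, 16)           # about 1.33x raw capacity
tolerated = tolerated_failures(12, 16)        # 4
wide = loss_probability(12, 16, p=0.05)       # four losses tolerated
narrow = loss_probability(15, 16, p=0.05)     # one loss tolerated, for comparison
```

Raising the protection level means lowering k for a given n, which costs capacity but shrinks the loss probability quickly. That is the calculation a vendor would run against the planned lifespan of the system.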
Forward Error Correction
Current products with Forward Error Correction implementations
- EMC Isilon
- Amplidata
- Cleversafe
- EMC Atmos
- Scality
- NEC HydraStor
- DataDirect Networks Web Object Scaler
Expect others: this will become a competitive area
Storage Systems Storage as an Application
Hypervisor within the Storage System
Most storage system hardware uses standard processor technology
New: use of a hypervisor in the storage system
- Allows for multiple personalities
Changes the dynamic of the storage system
- Run embedded storage system software as an application
  - Block storage control software
  - NAS software: file system (clustered, distributed, single)
- Can run additional features as applications; an example is replication software
- Can run application software on the storage system
[Figure: Advanced storage system. Block storage, file storage, replication, and applications run on a hypervisor on the device.]
Object-based Storage
Object-based Storage
Renewed interest in object-based storage
First very successful product was Centera
- Used content address as object ID
New solutions developing
- Most use REST and HTTP for communication
- Generally, an object is best described as a file with additional metadata
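Using a content address as the object ID means the ID is derived from the data itself. The sketch below uses SHA-256 as the hash, which is an assumption for illustration and not a statement about how Centera computes its addresses; the class name is hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Toy object store where the object ID is a hash of the content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        object_id = hashlib.sha256(data).hexdigest()  # assumed hash function
        self._objects[object_id] = data               # identical content is stored once
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


store = ContentAddressedStore()
first_id = store.put(b"quarterly report")
second_id = store.put(b"quarterly report")    # same content, same ID
other_id = store.put(b"quarterly report v2")  # changed content, new ID
```

Two consequences follow from this design: identical content is stored only once, and any change to the content produces a different ID, which makes silent modification detectable.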
Object-based Storage
Storage systems designed to store and retrieve objects
Reasons:
- Ability to manage more objects than standard file systems
- Additional information (metadata) aids in characterization of information
  - Controls such as protection requirements, retention periods, other useful information
Standards exist for object-based storage
- Not necessarily completely followed
Applications must change to use object-based storage
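A small sketch of how metadata can act as a control, using a retention period as the example. The classes, field names, and dates are hypothetical and do not follow any particular object storage standard or product API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)  # descriptive attributes
    retain_until: Optional[datetime] = None       # retention control


class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key: str, obj: StoredObject) -> None:
        self._objects[key] = obj

    def delete(self, key: str, now: datetime) -> None:
        obj = self._objects[key]
        if obj.retain_until is not None and now < obj.retain_until:
            raise PermissionError(f"{key} is under retention until {obj.retain_until}")
        del self._objects[key]

    def __contains__(self, key: str) -> bool:
        return key in self._objects


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = ObjectStore()
store.put("scan-001", StoredObject(
    data=b"...",
    metadata={"department": "radiology", "protection": "two copies"},
    retain_until=created + timedelta(days=365),
))

try:
    store.delete("scan-001", now=created + timedelta(days=30))
    blocked = False
except PermissionError:
    blocked = True                       # deletion refused during retention

store.delete("scan-001", now=created + timedelta(days=366))  # allowed after expiry
```

The storage system, not the application, enforces the rule here, which is part of why applications have to change to take advantage of object-based storage.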
Object Storage System
[Figure: An application reaches the object storage system over HTTP/REST on a network (typically 10GbE); the system holds objects and metadata. File storage accessed via CIFS or NFS holds data and file attributes; administrator-defined metadata is added, and objects are sent over HTTP REST/CDMI to object storage (local or cloud).]
Object Storage Vendors
- Amplidata
- Data Direct Networks Web Object Scaler
- Dell DX
- EMC Centera
- HDS Hitachi Content Platform
- Others, including cloud locations such as Amazon S3
Summary
Storage systems are changing
- New capabilities
  - Storage as an application running in a virtual machine
  - Virtual machines within storage systems
- New architectures
  - Storage pooling in disk virtualization
  - Thin provisioning
Technology will cause more changes
- Solid-state technology
- Tiering and caching
- Object-based storage
More options, better opportunities and choices
Thank You! Questions?
Randy Kerns: randy@evaluatorgroup.com
Twitter: @rgkerns
Blog: http://itknowledgeexchange.techtarget.com/storage-soup/