Dynamic Disk Pools
Delivering Worry-Free Storage
Dr. Didier Gava, EMEA HPC Storage Architect
MEW Workshop 2012, Liverpool, UK
Historic View Of RAID Advancement

RAID 5 (1987)
- Single disk failure protection
- Required hot spare
- 100% performance in optimal state
- Degraded performance same as RAID 6

RAID 6 (2006)
- Double disk failure protection
- Reduced need for hot spares
- 100% performance in optimal state
- 60% to 70% performance in degraded state (with failed drive or during rebuild)
- Degraded performance continues until the disk is replaced and the data is rebuilt
- Rebuild of a drive is very time consuming

DDP (2012)
- Multiple disk failure protection
- Virtual hot spares
- 100% performance in optimal state
- 85% to 90% performance in degraded state
- Degraded performance continues until the pool is rebalanced
- Rebalance of the pool is at least 2x faster than a RAID 6 rebuild
Dynamic Disk Pools (DDP) Delivering New Levels Of Performance and Protection Consistent Performance Data Protection Versatile Efficiency NetApp Confidential - Limited Use
Dynamic Disk Pools Overview
- DDP dynamically distributes data, spare capacity, and protection information across a pool of drives
- Intelligent algorithm defines which drives should be used for segment placement (7 patents pending)
- Segments are dynamically recreated/redistributed as needed to maintain protection/distribution
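To make the idea of pool-wide segment placement concrete, here is a minimal sketch. It is not the DDP placement algorithm, which the slide only describes as patent pending. It assumes a hypothetical rule: each stripe is 10 segments wide (borrowing the 8+2 width from the RAID example in this deck) and is placed on the 10 least-loaded drives of a 24-drive pool. The function name `place_stripes` and all parameters are illustrative.

```python
import random

def place_stripes(num_drives, num_stripes, stripe_width, seed=0):
    """Toy placement: put each stripe on the least-loaded drives.

    Assumption for illustration only; the real DDP algorithm is not public.
    Returns (layout, load), where layout[i] lists the drives of stripe i
    and load[d] is the number of segments stored on drive d.
    """
    rng = random.Random(seed)
    load = [0] * num_drives
    layout = []
    for _ in range(num_stripes):
        # Order drives by current load; the random tiebreak avoids always
        # favouring low-numbered drives when loads are equal.
        order = sorted(range(num_drives), key=lambda d: (load[d], rng.random()))
        chosen = order[:stripe_width]
        for d in chosen:
            load[d] += 1
        layout.append(chosen)
    return layout, load

layout, load = place_stripes(num_drives=24, num_stripes=240, stripe_width=10)
print("segments per drive: min", min(load), "max", max(load))
```

With 240 stripes of 10 segments on 24 drives, every drive ends up holding the same number of segments, so no drive sits idle and none is a hotspot. A real implementation would also have to weigh capacity, enclosure layout, and protection constraints, which this sketch ignores.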
Data rebalancing in minutes vs. days
[Chart: Performance Impact of a Drive Failure, RAID 6 vs. DDP. Vertical axis: performance (0 to 120), with Optimal and Acceptable levels marked. Horizontal axis: time in hours, shown for 300GB, 900GB, 2TB and 3TB drives. Callouts: 96 minutes (estimated), 1.3 days, 2.5 days, 4+ days, business impact, no impact, 99% exposure improvement. Typical rebalancing improvements based on a 24-drive mixed workload.]
How It Works
Traditional RAID Volumes
- Disk drives organized into RAID groups
- Volumes reside across the drives in a RAID group
- Performance is dictated by number of spindles
- Hot spares sit idle until a drive fails
- Spare capacity is stranded
24-drive system with (2) 10-drive groups (8+2) and (4) hot spares
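The 24-drive example above can be worked through as simple arithmetic. The sketch below uses only the numbers on the slide (two 8+2 groups and four hot spares); the variable names are illustrative, and it counts whole drives rather than formatted capacity.

```python
# Worked arithmetic for the 24-drive traditional RAID example on the slide.
total_drives = 24
raid_groups = 2
data_per_group = 8      # the "8" in 8+2
parity_per_group = 2    # the "2" in 8+2
hot_spares = 4

drives_per_group = data_per_group + parity_per_group
# Sanity check: the groups and spares account for every drive.
assert raid_groups * drives_per_group + hot_spares == total_drives

data_drives = raid_groups * data_per_group
usable_fraction = data_drives / total_drives
idle_fraction = hot_spares / total_drives

print(f"Drives holding data:      {data_drives} of {total_drives} ({usable_fraction:.1%})")
print(f"Idle hot spares:          {hot_spares} of {total_drives} ({idle_fraction:.1%})")
print(f"Spindles behind a volume: {drives_per_group} of {total_drives}")
```

In this layout 16 of 24 drives (66.7%) hold data, 4 of 24 (16.7%) sit idle as hot spares, and any one volume is served by only 10 of the 24 spindles.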
Traditional RAID Drive Failure
- Data is recreated on hot spare
- Single drive responsible for all writes (bottleneck)
- Recreation happens linearly (one stripe at a time)
- All volumes in that group are significantly impacted
24-drive system with (2) 10-drive groups (8+2) and (4) hot spares
DDP Volumes
- Each volume's data, protection information and spare capacity is distributed across all drives in disk pool
- All drives are active; none are idle
- Spare capacity is available to all volumes
24-drive system with single 24-drive pool
Dynamic Disk Pools Drive Failure
- Data is rebalanced throughout the disk pool
- All drives share responsibility for writes
- Operations run in parallel
- Up to 8X faster return to optimal condition
24-drive system with single 24-drive pool
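The difference between a single hot spare absorbing all rebuild writes and a pool sharing them can be illustrated with a toy model. This is a sketch under stated assumptions, not the DDP rebalance algorithm: stripes are 10 segments wide and placed on random drives, and each lost segment is rewritten to the surviving drive outside its stripe with the fewest rebuild writes so far. The function name `rebuild_writes` and its parameters are hypothetical, and the model counts writes only, so it does not attempt to reproduce the "up to 8X" figure.

```python
import random

def rebuild_writes(num_drives=24, stripe_width=10, num_stripes=2400,
                   failed=0, seed=1):
    """Compare rebuild write concentration: one hot spare vs. a shared pool.

    Returns (affected, hot_spare_max, pool_max):
      affected      - stripes that lost a segment on the failed drive
      hot_spare_max - writes landing on the single hot spare
      pool_max      - writes landing on the busiest surviving pool drive
    """
    rng = random.Random(seed)
    # Toy layout: every stripe lives on a random set of distinct drives.
    stripes = [rng.sample(range(num_drives), stripe_width)
               for _ in range(num_stripes)]
    affected = [s for s in stripes if failed in s]

    # Hot-spare model: every lost segment is rewritten to the same drive.
    hot_spare_max = len(affected)

    # Pool model: spread lost segments over surviving drives, never placing
    # two segments of the same stripe on one drive.
    writes = {d: 0 for d in range(num_drives) if d != failed}
    for s in affected:
        candidates = [d for d in writes if d not in s]
        target = min(candidates, key=lambda d: writes[d])
        writes[target] += 1
    pool_max = max(writes.values())
    return len(affected), hot_spare_max, pool_max

affected, hot_spare_max, pool_max = rebuild_writes()
print("stripes affected:", affected)
print("writes on the hot spare:", hot_spare_max)
print("writes on the busiest pool drive:", pool_max)
```

In this model the hot spare has to absorb every rebuilt segment, while the busiest pool drive absorbs only a small share of them. Real rebuild times also depend on read traffic, host I/O, and controller limits, which the model leaves out.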
Storage Efficiency Improvement
Think GB/year vs GB/second
NetApp Confidential - Internal Use Only
Worry-Free Benefits of DDP in HPC Environments

Ease of deployment and expansion
- Ease of configuration
- Easy expansion reduces need for over-provisioning
- Flexible: ANY* number of drives is good; the system rebalances automatically

Consistent file system performance
- Consistent performance under all conditions and over time
- Dynamic Rebalancing quickly returns file system to optimal state

More efficient use of storage hardware
- Lower power, reduced cooling and smaller footprint
- Active sparing means all disks do useful work
- No spare disks to manage

Improved data protection
- Fastest return to optimal file system state
- Quickest return to high-availability state
- Critical segments are prioritized for reconstruction
- 99.999% availability storage system

Self-healing
- Data is dynamically rebalanced across remaining drives, reducing exposure to data loss
- ALL drives contribute to regeneration (scalable, very fast)
- ALL drives receive some regenerated segments

Service advantage
- Failed drives can remain in place until normal service action
- All drives participate in the fast Dynamic Rebalance process
More on www.sgi.com/ddp