NetApp HPC Solution for Lustre
Rich Fenton (fenton@netapp.com), UK Solutions Architect
Agenda
- NetApp Introduction
- Introducing the E-Series Platform
- Why E-Series for Lustre?
- Modular Scale-out
- Capacity Density
- Management Features
- Reference Architecture
NetApp at a Glance
- Founded in 1993, Sunnyvale, CA
- Member of the Fortune 500
- 12,000+ employees
- 150+ offices around the world
- Broad solutions portfolio
- Industry-leading partners
- Leading OEM storage provider
- Comprehensive professional services
- Global support
[Chart: NetApp revenue ($B), 2005 to 2012]
Scale Challenges in Two Domains
Content, Analytics, Bandwidth
Big Data Workload Targets
[Diagram: workloads mapped against four biases: scalability/bandwidth, resiliency and IOPS/performance, simplicity and cost, capacity and density]
Workloads shown: HPC, Hadoop, Technical Computing, FMV, Media & Entertainment, DSS/DW, Collaboration, Shared IT, App Dev, Tier-2 BP, StorageGRID, IT Infrastructure, Enterprise Repositories, Active Archives
NetApp Confidential - Internal Use Only
Current Big Data Solutions
- NetApp Open Solution for Hadoop: increase competitive advantage with insights from your own data
- HPC (Seismic Processing): accelerate time to results with scalable bandwidth, unmatched density, and proven reliability
- Full Motion Video: ingest, exploit and disseminate life-saving information faster
- HPC (Lustre): massively scalable solution for large-scale cluster computing
- Media Content Management: improve your media workflows to better monetize your content
- Content Repository (Object Storage): make informed decisions by leveraging more data from longer periods of time
Current E-Series Platforms
NetApp Confidential - NDA Required
Drive Enclosure Components
- DE5600: 2U, 24 x 2.5-inch SAS HDD and SSD; fully redundant. Use for highest bandwidth, highest IOPS, moderate capacity.
- DE1600: 2U, 12 x 3.5-inch SAS and NL-SAS HDD; fully redundant. Use for moderate bandwidth, moderate IOPS, high capacity.
- DE6600: 4U, 60 x 3.5-inch and 2.5-inch SAS and NL-SAS HDD and SSD; fully redundant. Use for highest bandwidth, high IOPS, highest capacity.
DE6600 Drive Enclosure
- High-density disk shelf supporting up to 60 SAS drives
- 5 horizontal drawers with 12 drives per drawer
- Just 4U in height; fits a standard 19-inch rack
- Up to 180TB capacity with 3TB 7.2K RPM SAS drives
- Superior RAS: drives remain online when a drawer is extended for service; individual drawer extension and front access enable safer drive replacement
- 80 PLUS high-efficiency power supplies: up to 10% reduction in power/cooling
- Optimal for a RAID 6 layout of 6 x (8+2)
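Why 6 x (8+2) suits this shelf comes down to arithmetic: 60 drives split into six 10-drive RAID 6 groups, and 10 drives split evenly across 5 drawers. The sketch below, assuming 3TB drives as on the slide, computes raw and usable capacity and checks one hypothetical drawer mapping (2 drives per group per drawer). That mapping is our illustrative assumption, not a description of how SANtricity actually places drives.

```python
DRAWERS = 5
SLOTS_PER_DRAWER = 12
DRIVE_TB = 3            # 3TB 7.2K RPM SAS drives, as on the slide
GROUPS = 6              # RAID 6 layout 6 x (8+2)
DATA, PARITY = 8, 2     # RAID 6 tolerates 2 failed drives per group

def capacity_tb():
    """Return (raw, usable) capacity in TB for the 6 x (8+2) layout."""
    drives = DRAWERS * SLOTS_PER_DRAWER          # 60 drives
    raw = drives * DRIVE_TB
    usable = GROUPS * DATA * DRIVE_TB            # parity drives hold no user data
    return raw, usable

def drawer_layout():
    """Hypothetical mapping: group g takes 2 adjacent slots in every drawer."""
    per_drawer = (DATA + PARITY) // DRAWERS      # 2 drives per group per drawer
    layout = {g: [] for g in range(GROUPS)}
    for drawer in range(DRAWERS):
        for g in range(GROUPS):
            for k in range(per_drawer):
                layout[g].append((drawer, g * per_drawer + k))
    return layout

def survives_drawer_loss(layout, lost_drawer):
    """True if no group loses more drives than RAID 6 can tolerate."""
    return all(sum(1 for d, _ in drives if d == lost_drawer) <= PARITY
               for drives in layout.values())

raw, usable = capacity_tb()
print(f"raw {raw} TB, usable {usable} TB")   # raw 180 TB, usable 144 TB
layout = drawer_layout()
print(all(survives_drawer_loss(layout, d) for d in range(DRAWERS)))  # True
```

Under this mapping each drawer holds exactly 12 drives (6 groups x 2), so the layout fills the shelf with no leftover slots.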
Storage System Controllers & ESM
5400 Controller (FC or IB host attach)
- Up to 16 x 8Gb/s FC ports or 4 x 40Gb/s IB ports per dual controller system
- Up to 24GB cache memory per dual controller system
- Rich RAID feature set, including T10 PI
- 3.1GB/s writes and 350K 4K read IOPS (SSD) per dual controller system
- Use for highest bandwidth and highest IOPS
2600 Controller (SAS, FC or iSCSI host attach)
- Up to 4 x 6Gb/s SAS, 8 x 8Gb/s FC, 2 x 10Gb/s iSCSI, 8 x 1Gb/s iSCSI
- Up to 8GB memory per dual controller system
- Rich RAID feature set, including T10 PI
- 1.4GB/s writes and 70K 4K read IOPS (SSD) per dual controller system
- Use for balanced price/performance
ESM
- Non-RAID electronic card installed in a drive enclosure when it is used as a JBOD
- SAS 2.0
Start Small, Grow Big
- Scale capacity by adding drive enclosures (e.g. DE6600)
- Scale bandwidth by adding systems (e.g. E5460)
Each 4U shelf holds either 2 x controller (E5460 dual controller system) or 2 x ESM (DE6600 drive enclosure).
[Figure: front and rear views]

| Configuration | Drives | Capacity (TB) | Writes, scaling enclosures (GB/s) | Writes, scaling systems (GB/s) |
|---|---|---|---|---|
| 1 x E5460 | 60 | 180 | 3.1 | 3.1 |
| 2 x E5460, or 1 x E5460 + 1 x DE6600 | 90-120 | 270-360 | 3.1 | 6.2 |
| 3 x E5460, or 1 x E5460 + 2 x DE6600 | 120-180 | 360-540 | 3.1 | 9.3 |
| 4 x E5460, or 1 x E5460 + 3 x DE6600 | 180-240 | 480-720 | 3.1 | 12.4 |
| 5 x E5460, or 1 x E5460 + 4 x DE6600 | 240-300 | 600-900 | 3.1 | 15.5 |
| 6 x E5460, or 1 x E5460 + 5 x DE6600 | 240-360 | 720-1080 | 3.1 | 18.6 |

NetApp Confidential - Internal and Partner Use Only
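The table reduces to a simple model: capacity follows the number of 60-drive shelves, while write bandwidth follows the number of controller pairs. A sketch, assuming fully populated shelves of 3TB drives and the 3.1GB/s per-system write figure; it reproduces the upper end of each capacity range and both bandwidth columns, and ignores host-side or network limits.

```python
SYSTEM_WRITE_GBPS = 3.1   # one E5460 dual controller system, writes (from the table)
DRIVES_PER_SHELF = 60
DRIVE_TB = 3              # assumed 3TB drives, matching the 180TB single-shelf figure

def scale(shelves, add_systems):
    """Max drives, raw TB and write GB/s for `shelves` 4U shelves.

    add_systems=True : every shelf is an E5460 (controllers in each shelf)
    add_systems=False: one E5460 plus (shelves - 1) DE6600 expansion shelves
    """
    drives = shelves * DRIVES_PER_SHELF
    raw_tb = drives * DRIVE_TB
    controller_pairs = shelves if add_systems else 1
    gbps = round(controller_pairs * SYSTEM_WRITE_GBPS, 1)
    return drives, raw_tb, gbps

for n in range(1, 7):
    print(n, scale(n, add_systems=False), scale(n, add_systems=True))
```

The design point the slide makes is visible in the `controller_pairs` line: adding DE6600 shelves adds spindles behind the same controllers, so bandwidth stays flat.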
Linear Scaling with E5424
- Scale bandwidth by adding systems (E5424)
- Scale capacity by adding drive enclosures (DE5600)
Each 2U shelf holds either 2 x 5400 controller (E5424 dual controller system) or 2 x ESM (DE5600 drive enclosure).
[Figure: front and rear views]

| Configuration | Drives | Capacity (TB) | Writes, scaling systems (GB/s)* | Writes, scaling enclosures (GB/s)* |
|---|---|---|---|---|
| 1 x E5424 | 24 | 14-22 | 1.7 | 1.7 |
| 2 x E5424, or 1 x E5424 + 1 x DE5600 | 48 | 29-43 | 3.4 | 3.0 |
| 3 x E5424, or 1 x E5424 + 2 x DE5600 | 72 | 43-65 | 5.1 | 3.1 |
| 4 x E5424, or 1 x E5424 + 3 x DE5600 | 96 | 58-86 | 6.8 | 3.1 |
| 5 x E5424, or 1 x E5424 + 4 x DE5600 | 120 | 72-108 | 8.5 | 3.1 |
| 6 x E5424, or 1 x E5424 + 5 x DE5600 | 144 | 86-130 | 10.2 | 3.1 |
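The capacity column is consistent with 24-drive shelves populated with either 600GB or 900GB 2.5-inch drives; those drive sizes are our assumption, not stated on the slide. The sketch below reproduces the capacity ranges and the linear systems-scaling column at 1.7GB/s per E5424. The enclosure-scaling column, which levels off near the 3.1GB/s controller write figure, is not modeled.

```python
E5424_WRITE_GBPS = 1.7        # per dual controller system, writes (from the table)
DRIVES_PER_SHELF = 24
DRIVE_SIZES_GB = (600, 900)   # assumption: smallest and largest drive options

def capacity_range_tb(shelves):
    """Raw capacity range in whole TB for fully populated 24-drive shelves."""
    drives = shelves * DRIVES_PER_SHELF
    low, high = (round(drives * gb / 1000) for gb in DRIVE_SIZES_GB)
    return low, high

def write_gbps_scaling_systems(systems):
    """Bandwidth grows linearly when every shelf carries its own controllers."""
    return round(systems * E5424_WRITE_GBPS, 1)

for n in range(1, 7):
    print(n, capacity_range_tb(n), write_gbps_scaling_systems(n))
```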
SANtricity Management
DDP Overview
- DDP (Dynamic Disk Pools) dynamically distributes data, spare capacity, and protection information across a pool of drives
- An intelligent algorithm defines which drives are used for segment placement (7 patents pending)
- Segments are dynamically recreated/redistributed as needed to maintain a balanced distribution
- Significantly faster return to optimal state following a drive failure (all drives participate in reconstruction)
NetApp Confidential - Limited Use
Traditional RAID Volumes
- Disk drives are organized into RAID groups
- Volumes reside across the drives in a RAID group
- Performance is dictated by the number of spindles
- Hot spares sit idle until a drive fails
- Spare capacity is stranded
[Figure: 24-drive system with (2) 10-drive groups (8+2) and (4) hot spares]
Traditional RAID Drive Failure
- Data is recreated on a hot spare
- A single drive is responsible for all writes (bottleneck)
- Recreation happens linearly (one stripe at a time)
- All volumes in that group are significantly impacted
[Figure: 24-drive system with (2) 10-drive groups (8+2) and (4) hot spares]
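The 24-drive traditional example can be tallied directly. This minimal sketch uses only the layout stated on the slides, two 8+2 groups plus four hot spares, to count drives by role, the share of drives left idle as spares, and how many spindles a single volume gets.

```python
TOTAL_DRIVES = 24
GROUPS = 2                 # (2) 10-drive groups
DATA, PARITY = 8, 2        # each group is 8+2
HOT_SPARES = 4

def drive_roles():
    """Count drives by role in the traditional layout shown on the slides."""
    roles = {"data": GROUPS * DATA, "parity": GROUPS * PARITY, "idle_spare": HOT_SPARES}
    assert sum(roles.values()) == TOTAL_DRIVES
    return roles

def spindles_per_volume():
    """A volume lives inside one RAID group, so it gets that group's spindles only."""
    return DATA + PARITY

roles = drive_roles()
print(roles)                                                          # {'data': 16, 'parity': 4, 'idle_spare': 4}
print(f"idle spare share: {roles['idle_spare'] / TOTAL_DRIVES:.1%}")  # idle spare share: 16.7%
print("spindles per volume:", spindles_per_volume())                  # spindles per volume: 10
```

When a drive fails, every reconstructed stripe unit for that drive lands on one of those four idle spares, which is the single-writer bottleneck the slide describes.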
DDP Volumes
- Each volume's data, protection information and spare capacity are distributed across all drives in the disk pool
- All drives are active; none are idle
- Spare capacity is available to all volumes
[Figure: 24-drive system with a single 24-drive pool]
DDP Drive Failure
- Data is reconstructed throughout the disk pool
- All drives share responsibility for writes
- Operations run in parallel
- Up to 10X faster return to optimal condition
[Figure: 24-drive system with a single 24-drive pool]
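NetApp's actual DDP placement algorithm is not described in these slides. The toy sketch below only illustrates the principle they state: if stripe pieces are spread across a 24-drive pool, rebuilding a failed drive's pieces spreads the writes across the surviving drives instead of one hot spare. The rotation-based placement, the 8+2 stripe width, the stripe count and the least-loaded target choice are all assumptions for illustration.

```python
from collections import Counter

POOL_DRIVES = 24     # single 24-drive pool, as on the slide
STRIPE_WIDTH = 10    # assumption: each stripe is 8 data + 2 parity pieces
STRIPES = 240        # hypothetical number of stripes in the pool

def place_stripes(n_drives, width, n_stripes):
    """Toy placement: rotate each stripe's starting drive so pieces spread evenly."""
    return [[(s + i) % n_drives for i in range(width)] for s in range(n_stripes)]

def rebuild_targets(stripes, failed, n_drives):
    """Count how many rebuilt pieces each surviving drive receives."""
    load = Counter()
    for stripe in stripes:
        if failed not in stripe:
            continue  # this stripe lost nothing
        # A rebuilt piece must go to a drive not already holding part of this stripe
        # (the failed drive is in the stripe, so it is excluded automatically).
        candidates = [d for d in range(n_drives) if d not in stripe]
        target = min(candidates, key=lambda d: (load[d], d))  # least-loaded first
        load[target] += 1
    return load

stripes = place_stripes(POOL_DRIVES, STRIPE_WIDTH, STRIPES)
load = rebuild_targets(stripes, failed=0, n_drives=POOL_DRIVES)
print("pieces rebuilt:", sum(load.values()))     # pieces rebuilt: 100
print("drives sharing the writes:", len(load))   # drives sharing the writes: 23
print("busiest drive writes:", max(load.values()), "pieces")
```

In the traditional layout every one of those rebuilt pieces would be written to a single hot spare. This model only shows how the writes are spread; it does not predict rebuild time, and the "up to 10X" figure on the slide is NetApp's claim, not an output of this sketch.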
Performance: Consistent Performance
Delivering consistent performance to maintain SLAs: stay in the green zone
- Performance drop is minimized following a drive failure
- Significantly faster (up to 10X) return to optimal state following a drive failure
- A large pool of spindles for every volume reduces hot spots; each volume is spread across all drives in the pool
- Dynamic distribution/redistribution is a non-disruptive background operation
[Chart: Performance Impact of a Drive Failure; performance (optimal vs. acceptable) over time, DDP vs. RAID rebuild]
Data Protection
Unmatched protection against drive failures and errors
- Shorter rebuild times reduce exposure to multiple cascading disk failures
- All drives participate in the dynamic rebuild, resulting in up to 10X faster return to optimal state
- Any stripes experiencing multiple drive failures are given reconstruction priority
- Protects against unrecoverable media errors during reconstruction
- Provides significant improvement in data protection; a larger pool provides even greater protection
[Chart: DDP dynamic rebuild vs. RAID rebuild over time]
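The exposure argument can be made concrete with a standard simplification: assume independent drive failures at a constant rate (exponential lifetimes) and estimate the chance that another drive fails before the rebuild finishes. The MTTF and rebuild durations below are hypothetical placeholders, not NetApp figures, and counting any second failure is deliberately pessimistic, since an 8+2 stripe survives two failed drives. The point is only that the risk scales roughly with the length of the rebuild window.

```python
import math

def p_another_failure(survivors, rebuild_hours, mttf_hours):
    """P(at least one of `survivors` drives fails during the rebuild window).

    Assumes independent drives with exponential lifetimes (constant failure rate).
    """
    rate = survivors / mttf_hours
    return 1.0 - math.exp(-rate * rebuild_hours)

MTTF_HOURS = 1_000_000   # hypothetical per-drive MTTF
SURVIVORS = 23           # 24-drive pool after one failure

slow = p_another_failure(SURVIVORS, rebuild_hours=24.0, mttf_hours=MTTF_HOURS)  # hypothetical long rebuild
fast = p_another_failure(SURVIVORS, rebuild_hours=2.4, mttf_hours=MTTF_HOURS)   # 10x shorter rebuild
print(f"{slow:.2e} vs {fast:.2e}, ratio {slow / fast:.1f}")
```

For small failure probabilities the ratio comes out very close to the ratio of rebuild times, so a 10X shorter rebuild gives roughly a 10X smaller window of exposure under these assumptions.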
E-Series with Lustre
Why E-Series for Lustre
- High bandwidth: 3GB/s writes in 4U; 30GB/s writes in 40U
- High density: 180TB per 4U; 1.8PB per 40U rack
- High availability: DDP?
- Serviceability
- Lustre support
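The rack-level figures are the 4U figures multiplied by ten shelves. A sketch of that arithmetic, assuming the whole 40U budget goes to 4U controller systems (so bandwidth scales with each shelf, as when scaling systems rather than enclosures) and ignoring rack space for servers, switches and cabling.

```python
SHELF_U = 4
RACK_U = 40
SHELF_WRITE_GBPS = 3     # "3GB/s writes in 4U"
SHELF_TB = 180           # "180TB per 4U"

def per_rack(rack_u=RACK_U, shelf_u=SHELF_U):
    """Return (shelves, write GB/s, raw PB) for a rack filled with controller shelves."""
    shelves = rack_u // shelf_u
    return shelves, shelves * SHELF_WRITE_GBPS, shelves * SHELF_TB / 1000

print(per_rack())   # (10, 30, 1.8)
```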
HPC Solution for Lustre Configuration
Lustre on E-Series References
- LLNL: 55PB of E-Series on Lustre at 1TB/s; Sequoia, the world's largest supercomputer
- Lund University, Sweden: genomic data, 200TB at 3GB/s
Whamcloud
- NetApp provides level 1 and level 2 support
- Worldwide support for Lustre
- Bug fixes / patches
- Contract to develop features
- Installation / deployment
- Training
More Information
Email: fenton@netapp.com