The Use of Flash in Large-Scale Storage Systems. Nathan.Rutman@Seagate.com





Seagate's Flash!
- Seagate acquired LSI's Flash Components division in May 2014
- Selling multiple formats / capacities today
- Nytro XP6500
- SandForce SF3700 flash controller

Why flash?
- Advantages
  - Flash is faster in latency and throughput
  - Flash is much faster for seeks
- Disadvantages
  - Flash is more expensive, less dense, and has limited write cycles
  - If it weren't, we would put flash everywhere
- Tradeoffs
  - Performance
  - Cost
  - Durability
  - System complexity

Architectures

Flash and Lustre
Where can we use flash in our Lustre systems?
- Flash on MDT
- Flash on OSS servers
- Flash on OST devices
- Flash in front of Lustre

Flash SSDs on the MDT
- Assumed to be a perfect candidate
  - Small, random IO
  - High IOPS
- But: a large MDS can use RAID to bundle spindles
- We sell a 7+7 RAID 10 of 10K drives; mdtest with SSHDs and SSDs shows no improvement
  - Even for WIBs and journals
- Disks are cheaper, with much better durability
- Some of our customers demand SSDs anyhow
- Conclusion: SSDs make sense for small MDTs, but for larger RAID pools the MDS should be your bottleneck, not the drives

Flash on OSTs
- SSHD
- Local metadata (journal device)
- Flash pool
- Flash cache

SSHD: Solid State Hybrid Drive
- Basics:
  - The magnetic media operates with the same characteristics as a plain HDD
  - Persistent backup of DRAM-based write cache
    - All writes go to DRAM and are coalesced to disk
    - Back-EMF powers writes from DRAM to flash
  - NAND flash is used for read caching
    - Read data is moved to flash by popularity
- Neutral:
  - Streaming performance is the same as HDD
  - Other caching layers (OST, client) limit the usefulness of the read cache
- Positive:
  - Double the random IO write/rewrite performance
  - Removes the effects of local filesystem metadata updates
  - Host-pinning: journal, block bitmaps, MMP
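The "all writes to DRAM, coalesced to disk" point can be illustrated with a small sketch. This is not the drive firmware's algorithm, which the slides do not describe; it only assumes, for illustration, that buffered writes are (offset, length) extents and that overlapping or adjacent extents can be merged into fewer, larger disk writes. The function name `coalesce` is hypothetical.

```python
def coalesce(writes):
    """Merge overlapping or adjacent (offset, length) extents.

    Illustrative only: models a write buffer turning many small
    random writes into fewer, larger sequential ones.
    """
    merged = []
    for off, length in sorted(writes):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            # Extent touches or overlaps the previous one: extend it.
            last_off, last_len = merged[-1]
            end = max(last_off + last_len, off + length)
            merged[-1] = (last_off, end - last_off)
        else:
            merged.append((off, length))
    return merged


# Four small writes arriving out of order become two disk writes.
buffered = [(8, 4), (0, 4), (4, 4), (20, 4)]
print(coalesce(buffered))
```

The fewer extents the buffer produces, the fewer seeks the magnetic media has to perform when the buffer is flushed.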

OST MD on flash devices
- SSHD for local filesystem operations
  - Block bitmaps
  - Write-intent bitmaps (WIBs)
  - Conclusion: should be a good use case for SSHD
- External MDRAID WIBs
  - Speed up RAID rebuilds
  - An optimization, not critical
  - Conclusion: put it on an SSD
- External EXT4 journals
  - Frequent writing makes flash durability a concern
  - Writes are sequential anyhow
  - Conclusion: use a fast HDD for reliability

Flash OST pool
- Basics:
  - Pools of SSD and HDD OSTs
  - Set striping per directory / use case
- Neutral:
  - No automatic migration
  - Can use the HSM policy engine and a special data mover to automate tiering
  - Simpler than new burst buffer software
- Positive:
  - High random-IO read/write performance
  - Easy to adjust sizes
- Conclusion: may make sense, depending on use case

OSS flash cache
Separate flash layer between OSTs and HDDs
- All IOs flow through a large local flash cache device
- Writethrough or writeback
- Positive:
  - Coalesce random writes
  - Cache reads
  - Bypass for cache miss: no penalty
  - Large capacity: may help sequential IO, to a point
  - Transparent to upper layers
  - Smart caching algorithms, LRUs
  - SCSI on PCIe or NVMe
- Conclusion: good choice for improving random and cached IO transparently
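As a rough illustration of the caching behaviour listed above, here is a minimal write-through LRU cache sketch. It is an assumption-laden toy, not the design of any Seagate product: the class name `FlashCache`, the block-number keys, and the dict standing in for the HDD tier are all hypothetical.

```python
from collections import OrderedDict


class FlashCache:
    """Toy write-through LRU cache in front of a slower backing store."""

    def __init__(self, backing, capacity_blocks):
        self.backing = backing          # dict: block number -> data (stands in for HDDs)
        self.capacity = capacity_blocks
        self.cache = OrderedDict()      # least recently used entry first
        self.hits = 0
        self.misses = 0

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]      # miss: read from the backing store
        self._insert(block, data)
        return data

    def write(self, block, data):
        self.backing[block] = data      # write-through: backing store is always current
        self._insert(block, data)


hdd = {i: f"block-{i}" for i in range(10)}
cache = FlashCache(hdd, capacity_blocks=2)
for b in (0, 1, 0, 2):
    cache.read(b)
print(cache.hits, cache.misses, list(cache.cache))
```

A writeback variant would defer the update of `backing` until eviction, which is what allows random writes to be coalesced but also means the cache device must be as reliable as the disks behind it.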

Interposing flash layer in front of Lustre
Separate flash and Lustre systems; burst buffer
- Basics
  - New software layer in front of Lustre
  - All IO written to this new interface
  - Layered tiers: primary to flash, secondary to Lustre
- Negative
  - Additional read latency if strictly tiered (stage-in, stage-out)
  - Complexity: more layers, new API, HSM, failure handling/reliability, HA/dual-porting, RDMA/0-copy
- Neutral
  - New frontend software stack
  - New frontend semantics
- Positive
  - Accelerates random and sequential IO
  - No Lustre overhead
  - Restricted interface with backend

Use Cases
When should we use flash?

Use cases
- Defensive IO
- Job-based staging
- Random IO
- Streaming IO
- Capability duty cycle (all IO, all the time?)
- Read-heavy loads

Defensive IO (checkpoint-restart)
To BB or not to BB?
- Use flash as a fast temporary cache
  - Snapshot all memory in 5 min
  - Spool off to disk (sequential) in 55 min
- 1PB, 5 min to flash: 3.3TB/s / 500MB/s = 6600 SSDs (@1TB)
- 1PB, 55 min to disk: 300GB/s / 100MB/s = 3000 HDDs (@8TB)
- And backend capacity of 50PB: 6250 HDDs = 27 min
- But wait a minute, we have 6600 extra SSDs in the system; let's buy 6600 Lustre HDDs instead: 12900 HDDs = 12.9 min and 103PB
- So for the cost delta of flash over HDD, you save 8 min of IO and lose 53PB of capacity
- What if we 2x the speed of flash and HDD? 8.7 min for HDD alone
- Conclusion: defensive IO with a BB can buy some compute time, for a cost - but DO THE MATH
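The arithmetic on this slide can be reproduced with a short script. It is a sketch using only the slide's own inputs (1PB checkpoint, 500MB/s per SSD, 100MB/s and 8TB per HDD, 50PB backend); decimal units and the helper names are assumptions. The slide rounds intermediate values (6600 SSDs, 12900 HDDs), so the exact results differ slightly. The last step shows one reading that reproduces the slide's 8.7 min figure: with 2x faster flash only half as many SSDs are needed, so only 3300 HDDs are bought instead.

```python
import math

PB, TB, MB = 10**15, 10**12, 10**6   # decimal units (assumption)


def drives_for_bandwidth(data_bytes, seconds, drive_bw):
    """Drives needed to move data_bytes within the given time."""
    return math.ceil(data_bytes / (seconds * drive_bw))


def minutes_to_write(data_bytes, n_drives, drive_bw):
    """Minutes to write data_bytes across n_drives in parallel."""
    return data_bytes / (n_drives * drive_bw) / 60


CHECKPOINT = 1 * PB
SSD_BW, HDD_BW = 500 * MB, 100 * MB
HDD_SIZE = 8 * TB

ssds = drives_for_bandwidth(CHECKPOINT, 5 * 60, SSD_BW)         # slide: ~6600
drain_hdds = drives_for_bandwidth(CHECKPOINT, 55 * 60, HDD_BW)  # slide: ~3000
capacity_hdds = math.ceil(50 * PB / HDD_SIZE)                   # slide: 6250

# Backend sized for capacity alone.
base_min = minutes_to_write(CHECKPOINT, capacity_hdds, HDD_BW)

# Spend the flash budget on 6600 more HDDs instead (slide's rounded count).
all_hdds = capacity_hdds + 6600
hdd_only_min = minutes_to_write(CHECKPOINT, all_hdds, HDD_BW)
hdd_only_pb = all_hdds * HDD_SIZE / PB

# 2x faster devices: half the SSDs needed, so 3300 extra HDDs at 200MB/s.
fast_min = minutes_to_write(CHECKPOINT, capacity_hdds + 3300, 2 * HDD_BW)

print(f"SSDs for 5 min checkpoint: {ssds}")
print(f"HDDs for 55 min drain:     {drain_hdds}")
print(f"50PB backend alone:        {base_min:.1f} min")
print(f"HDD-only alternative:      {hdd_only_min:.1f} min, {hdd_only_pb:.0f}PB")
print(f"2x speeds, HDD alone:      {fast_min:.1f} min")
```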

Job-based staging
- Job scheduler pre-stages all job data into flash
- Job does IO to flash
- Scheduler destages on completion
- Lustre flash pool or burst buffer
- Double buffers can hide stage time
  - Read and write
- Good for fixed dataset sizes and distinct filesets
- Bad for unknown sizes (e.g. searches)
- Requires scheduler knowledge of filesets
- Everything is written twice
  - Beware flash write cycles
- Conclusion: like defensive IO, do the math. Relative sizes mean disk-tier performance is free
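To see how double buffering can hide stage time, the following sketch compares a strictly serial schedule with a two-buffer pipeline. The model is an assumption made for illustration, not something the slides specify: one staging channel, one compute resource, two flash buffers, and jobs given as hypothetical (stage_in, run, stage_out) durations in minutes.

```python
def serial_time(jobs):
    """Stage in, run, and stage out each job with no overlap."""
    return sum(stage_in + run + stage_out for stage_in, run, stage_out in jobs)


def double_buffered_time(jobs):
    """Two flash buffers: staging for one job overlaps another job's run.

    The single staging channel performs, in order:
    in(0), in(1), out(0), in(2), out(1), in(3), ...
    """
    n = len(jobs)
    stager = 0.0                  # time at which the staging channel is free
    in_done = [0.0] * n
    run_end = [0.0] * n
    for i in range(min(2, n)):    # prefill both buffers
        stager += jobs[i][0]
        in_done[i] = stager
    for i in range(n):
        prev_end = run_end[i - 1] if i > 0 else 0.0
        run_end[i] = max(in_done[i], prev_end) + jobs[i][1]
        # Destage job i once it has finished and the channel is free.
        stager = max(stager, run_end[i]) + jobs[i][2]
        # That buffer is now free: stage in the job after next.
        if i + 2 < n:
            stager += jobs[i + 2][0]
            in_done[i + 2] = stager
    return stager


jobs = [(10, 60, 10)] * 3         # hypothetical durations in minutes
print(serial_time(jobs), double_buffered_time(jobs))
```

In this example only the first stage-in and the last stage-out remain visible; when staging takes longer than the run time, the staging channel becomes the bottleneck and the benefit shrinks.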

Random IO
- OST SSHD or OSS flash cache
  - Accelerates small IO
  - Limited cache size
- Flash pool
  - Accelerates specified IO
  - Manual direction
- BB
  - Accelerates all IO
  - New frontend software now must include cache handling, consolidation
- Hard drives aren't great at random IO
- Conclusion: flash pool is good for known random jobs, OSS flash cache is good for unknown ones

Streaming IO
- OST SSHD or OSS flash cache
  - Limited cache size, not generally useful
- Flash pool
  - Large cache size
- BB
  - Adds destaging time
  - Operating on the data, or archiving it?
- How continuous / large are your streams?
- Hard drives are good at this already
- Conclusion: flash pool works if you can age it out

Capability duty cycle
How much of the time do you need max bandwidth?
- Analysis of a major HPC production storage system*
  - 99% of the time, storage BW utilization < 33% of max
  - 70% of the time, storage BW utilization < 5% of max
- Assume:
  - Flash 5x disk speed
  - Even distribution of load between 70-99%
- Move the capability ... to the capacity
- Conclusion: buys minor (~5%) time impact

*DDN, MSST15 http://storageconference.us/2015/presentations/vildibill.pdf

Read-heavy loads
- If the working set is large, flash doesn't help
  - Big data, data mining
- If the working set fits into a flash buffer
  - Still have to stage in, or read through the cache and hope for repeat reads
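A back-of-the-envelope model shows why a large working set defeats a flash read cache. The sketch below assumes uniformly random reads over the working set (so the hit ratio is simply cache size over working-set size) and borrows the "flash 5x disk speed" assumption from the duty-cycle slide; the function names and the unit costs are hypothetical.

```python
def hit_ratio(cache_size, working_set):
    """Hit ratio under uniformly random reads (simplifying assumption)."""
    return min(1.0, cache_size / working_set)


def read_speedup(cache_size, working_set, disk_cost=5.0, flash_cost=1.0):
    """Average speedup over disk-only reads; flash assumed 5x faster."""
    h = hit_ratio(cache_size, working_set)
    avg_cost = h * flash_cost + (1.0 - h) * disk_cost
    return disk_cost / avg_cost


# Working set 10x the cache: barely faster than disk alone.
print(round(read_speedup(cache_size=1, working_set=10), 2))
# Working set fits in the cache: full flash speed, once the data is staged in.
print(round(read_speedup(cache_size=10, working_set=10), 2))
```

Real workloads are rarely uniform, so repeat reads of a hot subset can do better than this model suggests, which is the "hope for repeat reads" case on the slide.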

Summary

Architecture    Performance    Cost    Complexity
MDT flash       no gain        $$$     -
OST SSHD                       $       -
OST MD flash                   $       Θ
OST pool                       $$$     -
OSS cache                      $$      ΘΘ
BB              ? size dep     $$$     ΘΘΘ

(if perm: size dep)

What does Seagate do with flash?
- Seagate makes both HDDs and a full range of flash devices
  - All the systems that Seagate Systems Group sells contain flash in some form
  - Both are getting faster and bigger
- MDS: RAID10 10K, fast and durable
- OST: SSD for WIBs and journals
- OSS flash cache: will be offering SAS- and NVMe-based systems over the next 12 months
- Flash as an interposing layer in front of Lustre: instead, flash and HDD characteristics should be treated quantitatively, not qualitatively (no separate systems)

Thanks