Important Differences Between Consumer and Enterprise Flash Architectures



Similar documents
NAND Flash Architecture and Specification Trends

Solid State Drive (SSD) FAQ

Scaling from Datacenter to Client

OCZ s NVMe SSDs provide Lower Latency and Faster, more Consistent Performance

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

Redefining Flash Storage Solution

The Technologies & Architectures. President, Demartek

The Evolving NAND Flash Business Model for SSD. Steffen Hellmold VP BD, SandForce

Solid State Drives Data Reliability and Lifetime. Abstract

Comparison of NAND Flash Technologies Used in Solid- State Storage

How it can benefit your enterprise. Dejan Kocic Netapp

How SSDs Fit in Different Data Center Applications

Host Memory Buffer (HMB) based SSD System. Forum J-31: PCIe/NVMe Storage Jeroen Dorgelo Mike Chaowei Chen

Accelerating Datacenter Server Workloads with NVDIMMs

An Overview of Flash Storage for Databases

Low-Power Error Correction for Mobile Storage

Flash 101. Violin Memory Switzerland. Violin Memory Inc. Proprietary 1

Understanding endurance and performance characteristics of HP solid state drives

High-Performance SSD-Based RAID Storage. Madhukar Gunjan Chakhaiyar Product Test Architect

SLC vs MLC NAND and The Impact of Technology Scaling. White paper CTWP010

Temperature Considerations for Industrial Embedded SSDs

In-Block Level Redundancy Management for Flash Storage System

End to End Data Path Protection

A Comparison of Client and Enterprise SSD Data Path Protection

Solid State Drive Architecture

Solid State Technology What s New?

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

Accelerating Real Time Big Data Applications. PRESENTATION TITLE GOES HERE Bob Hansen

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

Make A Right Choice -NAND Flash As Cache And Beyond

QuickSpecs. PCIe Solid State Drives for HP Workstations

Promise of Low-Latency Stable Storage for Enterprise Solutions

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Solid State Storage in Massive Data Environments Erik Eyberg

Application-Focused Flash Acceleration

SOLID STATE DRIVES AND PARALLEL STORAGE

A Balanced Approach to Optimizing Storage Performance in the Data Center

Flash s Role in Big Data, Past Present, and Future OBJECTIVE ANALYSIS. Jim Handy

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

NV-DIMM: Fastest Tier in Your Storage Strategy

PCIe Storage Performance Testing Challenge

Technologies Supporting Evolution of SSDs

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland

SATA II 3Gb/s SSD. SSD630 Benefits. Enhanced Performance. Applications

WP001 - Flash Management A detailed overview of flash management techniques

Flash In The Enterprise

ZD-XL SQL Accelerator 1.5

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

Bringing Greater Efficiency to the Enterprise Arena

Data Center Solutions

Choosing the Right NAND Flash Memory Technology

NAND Flash & Storage Media

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Benefits of Solid-State Storage

Solid State Drive Technology

Accelerating I/O- Intensive Applications in IT Infrastructure with Innodisk FlexiArray Flash Appliance. Alex Ho, Product Manager Innodisk Corporation

NAND Flash Architecture and Specification Trends

Choosing Storage Systems

Data Center Solutions

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

Indexing on Solid State Drives based on Flash Memory

21 st Century Storage What s New and What s Changing

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

Flash-optimized Data Progression

The Myth of SSD Testing

NAND Basics Understanding the Technology Behind Your SSD

The Data Placement Challenge

HP Smart Array Controllers and basic RAID performance factors

Flash Fabric Architecture

1 Storage Devices Summary

Performance Beyond PCI Express: Moving Storage to The Memory Bus A Technical Whitepaper

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

Enterprise-class versus Desktopclass

HPe in Datacenter HPe3PAR Flash Technologies

An In-Depth Look at Variable Stripe RAID (VSR)

Booting from NAND Flash Memory

Price/performance Modern Memory Hierarchy

Flash Memory Basics for SSD Users

Flash Controller Architecture for All Flash Arrays

Don t Let RAID Raid the Lifetime of Your SSD Array

Managing the evolution of Flash : beyond memory to storage

The Efficient LDPC DSP System for SSD

Frequently Asked Questions March s620 SATA SSD Enterprise-Class Solid-State Device. Frequently Asked Questions

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

QuickSpecs. HP Z Turbo Drive

Firebird and RAID. Choosing the right RAID configuration for Firebird. Paul Reeves IBPhoenix. mail:

The Transition to PCI Express* for Client SSDs

WEBTECH EDUCATIONAL SERIES

SUPERTALENT PCI EXPRESS RAIDDRIVE PERFORMANCE WHITEPAPER

MCF54418 NAND Flash Controller

Nasir Memon Polytechnic Institute of NYU

Flash Performance in Storage Systems. Bill Moore Chief Engineer, Storage Systems Sun Microsystems

Transcription:

Important Differences Between Consumer and Enterprise Flash Architectures Robert Sykes Director of Firmware Flash Memory Summit 2013 Santa Clara, CA OCZ Technology

Introduction This presentation will describe key firmware design differences between consumer- and enterprise-class SSDs. Topics covered include end-to-end data protection, bad block recovery techniques for the latest NAND nodes such as Low-Density Parity Check (LDPC), advances in Flash Translation Layer techniques with a focus on firmware requirements for sustained performance, host interface buffering techniques, and a review of RAID and the flash interface.

Contents Consumer / Enterprise basics. End to End Data Protection Data Corruption / Correction (ECC, BCH, LDPC, RAID) Enterprise FTL architecture

Contents Consumer / Enterprise basics. End to End Data Protection Data Corruption / Correction (ECC, BCH, LDPC, RAID) Enterprise FTL architecture

Contents Consumer / Enterprise basics. End to End Data Protection Data Corruption / Correction (ECC, BCH, LDPC, RAID) Enterprise FTL architecture

Contents Consumer / Enterprise basics. End to End Data Protection Data Corruption / Correction (ECC, BCH, LDPC, RAID) Enterprise FTL architecture

Consumer vs Enterprise Basics

There have been several discussions over the past few months on the forums asking; What is the difference between consumer and enterprise grade SSDs? Snapshot of some of the discussions on the forums describing the differences: Interface is the difference. PCIe / SAS / NVMe are enterprise; SATA is consumer. Consumer: Cost / Capacity / Performance / Data Integrity Enterprise: Data Integrity / Performance / Capacity / Cost Enterprise has data redundancy

Consumer / Enterprise Basics Enterprise has consistent performance Enterprise drives have greater endurance Enterprise drives have additional raw capacity Enterprise drives require custom applications for specific needs Client / consumer drives contain a single or only two SSDs; Enterprise could be comprised of several SSDs

Consumer / Enterprise Basics Turning to specifications for the answer: JEDEC helps to define the difference by specifying workload differences. Data Usage Models JEDEC 218 and 219 define client and enterprise usage models Enterprise products will be used in heavy workload environments and are expected to perform.» JESD 218A defines this as 24hrs / day at 55ºC with 3 months retention at 40ºC Client drives generally have a lighter load.» JESD 218A defines this as 8hrs / day at 40ºC with 1 year retention at 30ºC Note: Data retention is defined here as the ability for the SSD to retain data in a power off state for a period of time.

Consumer / Enterprise Basics Enterprise SSDs should not really be defined by the interface, but by the application and system requirements. However, many enterprise systems are not surprisingly built around fast hosts such as PCIe / NVMe as some enterprise systems require high-end performance (high IOPS) with sustained data rates. Questions to ask when designing your system: Is the data WORM like (write once read many)? Is the data volatile data (like swap data)? Is the data redundant data (data backed up somewhere else)?

Consumer / Enterprise Basics Issues with sustained data read / writes: During sustained activity in an enterprise system, the probability of a system level issues increases. This could be anything from a host memory issue, DRAM failure or uncorrectable flash page. The following methods can be used to prevent system issues: End-to-End Data Protection Read Retry LDPC RAID Data Stirring Temperature Throttling Wear Leveling

End-to-End Data Protection

End-to-end Data Protection End-to-End Data Protection Why do we need it? Protect against silent errors» OS issues including device drivers problems» Any issues within the HW or FW of the storage controller (i.e., bus issues, firmware writing the wrong data, etc.) Metadata Application OS HBA Storage Metadata Metadata Validity Array Check Metadata Validity Check on each data transfer NAND

Data Corruption / Correction

Data Corruption / Correction What are the possible causes? NAND flash has an inherent weakness: (Quantum Mechanics!) as the NAND process shrinks so does the effectiveness of the oxide layer, in turn increasing the probability that a 0 becomes a 1 or vice versa when it shouldn t Electron Source Floating Gate Oxide Layer Drain

Data Corruption / Correction 100,000 P/E Cycle ECC Level Endurance 10,000 5,000 3,000 1000 SLC 5x nm MLC 3x nm MLC 2x nm MLC Future

Data Corruption / Correction How is ECC calculated? Spare area is used for Bad Block markers (Mandatory), ECC and User Specific Data X pages per block NAND Block Data Area Spare Area

Data Corruption / Correction The amount of spare area will depict the amount of ECC applied. For example: 25nm MLC may have 450 bytes of spare area allowing 24bit ECC per 1K of data 20nm MLC may have 750 bytes of spare area allowing 40bit plus ECC per 1K of data Note: The NAND vendors specify the level of ECC expected to meet the specified endurance of the drive Consumer drives could use BCH to achieve this level of ECC BCH (Bose, Chaudhuri, Hocquenghem; invented 1959)

Data Corruption / Correction Enterprise drives require greater endurance over consumer grade drives given the different requirements set out by the JEDEC standard; BCH may not be enough to meet these standards. One possible method to achieve this is to use LDPC (Low Density Parity Check) LDPC can easily utilise soft information LDPC can enhance endurance and retention Requires new ratio(s) of user data to parity data Compare with SNR/Frame Error Rate graphs, not a simple N-bit correction. How does LDPC work..

Data Corruption / Correction LDPC code word - satisfied LDPC code word error d3

Data Corruption / Correction LDPC code error v3 - decode iteration 0.6 1 0.3 0 1 0.5 01 1 0 0 V-nodes C-nodes V0-0 V1-0 V0-1 V1-1 V2-1 V2-1 V3-1 V4-1 V5-0

Data Corruption / Correction What happens when BCH and LDPC is not enough to recover the data? RAID could be used as a method to guarantee data but comes at the cost of additional NAND to store parity and data copies

RAID

RAID RAID protection As the NAND process shrinks, NAND becomes more susceptible to error, sometimes ECC, BCH and LDPC may not be enough HDD vendors use RAID to protect data across physical drives RAID 5 Controller A1 A2 A3 AP B1 B2 BP B3 C1 CP C2 C3 DP D1 D2 D3 Disk 1 Disk 2 Disk 3 Disk 4

RAID A similar approach can be taken with SSDs Block A1 Block B1 Blok C1 Block DP Die1 SSD Controller Block A2 Block B2 Block CP Block D1 Block A3 Block BP Block C2 Block D2 Die2 Die3 Block AP Block B3 Block C3 Block D3 Die4

RAID What is RAID protecting against? Latest NAND nodes could be susceptible to mass data loss die to charge issues within the NAND This can result in a whole world line data loss 1 1 1 0 0 1 0 0 1 1 1 0 0 1 0 0 Data Block marked as bad

Enterprise FTL

Enterprise FTL You can put a price on performance, but you cant buy low latency!

Enterprise FTL Host Basic FTL System Architecture NCQ Command Queue Event Queue FTL / BBM / WL NAND SATA Transfer Buffer Manager HIL FTL FIL DRAM

Enterprise FTL Enterprise drives may require sustained data rates that far exceed a general consumer grade NAND solution As such, the system requirements are far more complex than a consumer grade drive In the previous example the system is defined for SATA only system When considering the sustained data rates of NVMe, a different approach is required

Enterprise FTL In addition to coping with fast hosts, enterprise solutions also require reduced latency Updating the FTL mapping takes time: For 16K page there would be one entry for each 16K One entry is ~4B (i.e., for each 16K page written 4B of metadata must be stored in DDR) For a 256GB drive with 16K page NAND:» (256G / 16K ) *4B = 64MB For a 256GB drive with 8K page NAND:» (256G / 8K ) *4B = 128MB

Enterprise FTL Writing many mega bytes of mapping table data to the NAND on a regular basis could be a major contributing factor to latency issues in the system Instead of writing the complete table, latency can be reduced by splitting the table into smaller chunks of data and then creating a link between multiple maps

Enterprise FTL Now we have a new (very high level) FTL architecture we can address the performance bottlenecks

Enterprise FTL Enterprise FTL System Architecture HIL FIL PCIe 4 Host Lanes Gen3 Buffer Manager Multi CPU Soft LDPC Buffer Manager NAND FTL / BBM / WL SRAM DRAM

Enterprise FTL Other factors to consider in an enterprise system: Higher OP levels may be required to achieve good dirty performance Inclusion of super caps or batteries to protect against power loss Infield debugging, continuous logging of firmware behaviour accessible remotely for a speedy unobtrusive solution

Thank You!