Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

Similar documents
Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis

Solid State Drive Technology

Programming Matters. MLC NAND Reliability and Best Practices for Data Retention. Data I/O Corporation. Anthony Ambrose President & CEO

Low-Power Error Correction for Mobile Storage

Yaffs NAND Flash Failure Mitigation

NAND Flash Architecture and Specification Trends

BER-Based Wear Leveling and Bad Block Management for NAND Flash

Database!Fatal!Flash!Flaws!No!One! Talks!About!!!

SLC vs. MLC: An Analysis of Flash Memory

Linux flash file systems JFFS2 vs UBIFS

Trends in NAND Flash Memory Error Correction

An In-Depth Look at Variable Stripe RAID (VSR)

Important Differences Between Consumer and Enterprise Flash Architectures

Don t Let RAID Raid the Lifetime of Your SSD Array

SLC vs MLC: Proper Flash Selection for SSDs in Industrial, Military and Avionic Applications. A TCS Space & Component Technology White Paper

Temperature Considerations for Industrial Embedded SSDs

Industrial Flash Storage Module

NAND Flash Architecture and Specification Trends

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

The Technologies & Architectures. President, Demartek

SLC vs MLC: Which is best for high-reliability apps?

Intel Solid-State Drive 320 Series

Endurance Models for Cactus Technologies Industrial-Grade Flash Storage Products. White Paper CTWP006

NAND Flash Memory Reliability in Embedded Computer Systems

SLC vs MLC NAND and The Impact of Technology Scaling. White paper CTWP010

NAND Basics Understanding the Technology Behind Your SSD

Data Storage Time Sensitive ECC Schemes for MLC NAND Flash Memories

Solid State Drive (SSD) FAQ

3D NAND Technology Implications to Enterprise Storage Applications

Flash Memory Jan Genoe KHLim Universitaire Campus, Gebouw B 3590 Diepenbeek Belgium

Addressing Fatal Flash Flaws That Plague All Flash Storage Arrays

Comparison of NAND Flash Technologies Used in Solid- State Storage

WP001 - Flash Management A detailed overview of flash management techniques

Gaussian Tail or Long Tail: On Error Characterization of MLC NAND Flash

Incremental Redundancy to Reduce Data Retention Errors in Flash-based SSDs

Choosing the Right NAND Flash Memory Technology

How To Write On A Flash Memory Flash Memory (Mlc) On A Solid State Drive (Samsung)

PowerProtector: Superior Data Protection for NAND Flash

Lifetime Improvement of NAND Flash-based Storage Systems Using Dynamic Program and Erase Scaling

io3 Enterprise Mainstream Flash Adapters Product Guide

Solid State Technology What s New?

Advantages of e-mmc 4.4 based Embedded Memory Architectures

Offline Deduplication for Solid State Disk Using a Lightweight Hash Algorithm

High-Performance SSD-Based RAID Storage. Madhukar Gunjan Chakhaiyar Product Test Architect

Flash Memory Basics for SSD Users

An Overview of Flash Storage for Databases

The Efficient LDPC DSP System for SSD

1 / 25. CS 137: File Systems. Persistent Solid-State Storage

White Paper. Avoiding premature failure of NAND Flash memory. Inhalt

SOLID STATE DRIVES AND PARALLEL STORAGE

Flash Memory. Jian-Jia Chen (Slides are based on Yuan-Hao Chang) TU Dortmund Informatik 12 Germany 2015 年 01 月 27 日. technische universität dortmund

Technical Note. NOR Flash Cycling Endurance and Data Retention. Introduction. TN-12-30: NOR Flash Cycling Endurance and Data Retention.

How it can benefit your enterprise. Dejan Kocic Netapp

Flash-Friendly File System (F2FS)

Implementation and Challenging Issues of Flash-Memory Storage Systems

Flash s Role in Big Data, Past Present, and Future OBJECTIVE ANALYSIS. Jim Handy

Solid State Drives Data Reliability and Lifetime. Abstract

Data Distribution Algorithms for Reliable. Reliable Parallel Storage on Flash Memories

File System Management

SSD Server Hard Drives for IBM

Understanding endurance and performance characteristics of HP solid state drives

Nasir Memon Polytechnic Institute of NYU

A Flash Storage Technical. and. Economic Primer. and. By Mark May Storage Consultant, By James Green, vexpert Virtualization Consultant

In-Block Level Redundancy Management for Flash Storage System

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

NAND Flash Memories. Understanding NAND Flash Factory Pre-Programming. Schemes

MirrorBit Technology: The Foundation for Value-Added Flash Memory Solutions FLASH FORWARD

Algorithms and Methods for Distributed Storage Networks 3. Solid State Disks Christian Schindelhauer

The Bleak Future of NAND Flash Memory

Optimizing NAND Flash-Based SSDs via Retention Relaxation

Booting from NAND Flash Memory

Maximizing Your Server Memory and Storage Investments with Windows Server 2012 R2

Indexing on Solid State Drives based on Flash Memory

Flash Memories. João Pela (52270), João Santos (55295) December 22, 2008 IST

Nonlinear Multi-Error Correction Codes for Reliable NAND Flash Memories

IEEE Milestone Proposal: Creating the Foundation of the Data Storage Flash Memory Industry

Trabajo Memorias flash

Vol.2. Industrial SD Memory. Program/Erase Endurance. Extended Temperature. Power Failure Robustness

Solid State Storage in a Hard Disk Package. Brian McKean, LSI Corporation

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

Flash-optimized Data Progression

Rugged & Reliable Data Storage Solid-State State Flash Disk overview

Physical Data Organization

Frequently Asked Questions March s620 SATA SSD Enterprise-Class Solid-State Device. Frequently Asked Questions

RoHS Compliant SATA High Capacity Flash Drive Series Datasheet for SAFD 25NH-M

Managing the evolution of Flash : beyond memory to storage

THE CEO S GUIDE TO INVESTING IN FLASH STORAGE

USB Flash Drive Engineering Specification

Flash 101. Violin Memory Switzerland. Violin Memory Inc. Proprietary 1

Key Factors for Industrial Flash Storage

AN1819 APPLICATION NOTE Bad Block Management in Single Level Cell NAND Flash Memories

Transcription:

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu Carnegie Mellon University, *LSI Corporation 1

Many use cases: You Probably Know + High performance, low energy consumption 2

Flash Controller NAND Flash Memory Challenges Requires erase before program (write) High raw bit error rate CPU Raw Flash Memory Chips ECC Controller 3

Raw bit error rate (RBER) ~2000 ~3000 Limited Flash Memory Lifetime Goal: Extend flash memory lifetime at low cost P/E Cycle Lifetime ECC-correctable RBER Program/Erase (P/E) Cycles (or Writes Per Cell) 4

Retention Loss Charge leakage over time 0 0 Flash cell Retention error 1 One dominant source of flash memory errors [DATE 12, ICCD 12] 5

Before I show you how we extend flash lifetime NAND Flash 101 6

Threshold Voltage (V th ) Flash cell Flash cell 1 0 Normalized V th 7

Threshold Voltage (V th ) Distribution Probability Density Function (PDF) 1 0 Normalized V th 8

Read Reference Voltage (V ref ) DF V ref 1 0 Normalized V th 9

Multi-Level Cell (MLC) DF Erased (11) ER-P1 V ref P1 (10) P1-P2 V ref P2 (00) P2-P3 V ref P3 (01) Normalized V th 10

Threshold Voltage Reduces Over Time Before After some retention retention loss: loss: DF P1 (10) P2 (00) P3 (01) Normalized V th 11

Fixed Read Reference Voltage Becomes Suboptimal Before After some retention retention loss: loss: DF P1 (10) P1-P2 V ref P2 (00) P2-P3 V ref P3 (01) Raw bit errors Normalized V th 12

P1-P2 OPT P2-P3 OPT Optimal Read Reference Voltage (OPT) After some retention loss: DF P1 (10) P1-P2 V ref P2 (00) P2-P3 V ref P3 (01) Minimal raw bit errors Normalized V th 13

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage 14

Retention Failure After significant some retention retention loss: loss: DF P1 (10) P1-P2 V ref P2 (00) P2-P3 V ref P3 (01) Uncorrectable Correctable errors Normalized V th 15

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 16

To understand the effects of retention loss: - Characterize retention loss using real chips 17

To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 18

Characterization Methodology FPGA-based flash memory testing platform [Cai+,FCCM 11] 19

Characterization Methodology FPGA-based flash memory testing platform Real 20- to 24-nm MLC NAND flash chips 0- to 40-day worth of retention loss Room temperature (20⁰C) 0 to 50k P/E Cycles 20

Characterize the effects of retention loss 1. Threshold Voltage Distribution 2. Optimal Read Reference Voltage 3. RBER and P/E Cycle Lifetime 21

PDF 1. Threshold Voltage (V th ) Distribution P1 P2 P3 Normalized V th 22

1. Threshold Voltage (V th ) Distribution 40-day 0-day 40-day 0-day P1 P2 P3 Finding: Cell s threshold voltage decreases over time 23

2. Optimal Read Reference Voltage (OPT) 40-day OPT 0-day OPT 40-day OPT 0-day OPT P1 P2 P3 Finding: OPT decreases over time 24

RBER 3. RBER and P/E Cycle Lifetime P/E Cycles 25

Nominal Lifetime Extended Lifetime 3. RBER and P/E Cycle Lifetime Reading data with 7-day worth of retention loss. V ref closer to actual OPT Actual OPT ECC-correctable RBER Finding: Using actual OPT achieves the longest lifetime 26

Characterization Summary Due to retention loss Cell s threshold voltage (V th ) decreases over time Optimal read reference voltage (OPT) decreases over time Using the actual OPT for reading Achieves the longest lifetime 27

To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 28

Naïve Solution: Sweeping V ref Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC Finds the optimal read reference voltage Requires many read-retries higher read latency 29

Comparison of Flash Read Techniques Flash Read Techniques Lifetime (P/E Cycle) Performance (Read Latency) Fixed V ref Sweeping V ref Our Goal 30

Observations 1. The optimal read reference voltage gradually decreases over time Key idea: Record the old OPT as a prediction (V pred ) of the actual OPT Benefit: Close to actual OPT Fewer read retries 2. The amount of retention loss is similar across pages within a flash block Key idea: Record only one V pred for each block Benefit: Small storage overhead (768KB out of 512GB) 31

Retention Optimized Reading (ROR) Components: 1. Online pre-optimization algorithm Periodically records a V pred for each block 2. Improved read-retry technique Utilizes the recorded V pred to minimize read-retry count 32

1. Online Pre-Optimization Algorithm Triggered periodically (e.g., per day) Find and record an OPT as per-block V pred Performed in background Small storage overhead DF New V pred Old V pred Normalized V th 33

2. Improved Read-Retry Technique Performed as normal read V pred already close to actual OPT Decrease V ref if V pred fails, and retry DF OPT V pred Very close Normalized V th 34

Retention Optimized Reading: Summary Flash Read Techniques Lifetime (P/E Cycle) Performance (Read Latency) Fixed V ref Sweeping V ref 64% ROR Nom. Life: 2.4% 64% Ext. Life: 70.4% 35

To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 36

Retention Failure After some significant retention retention loss: loss: DF P1 (10) P1-P2 V ref P2 (00) P2-P3 V ref P3 (01) Uncorrectable Correctable errors Normalized V th 37

Leakage Speed Variation DF S low-leaking cell F ast-leaking cell Normalized V th 38

Initially, Right After Programming DF P2 P3 S S F F F S F S Normalized V th 39

After Some Retention Loss DF Fast-leaking cells have lower V th P2 P3 Slow-leaking cells have higher V th S S F F F S F S Normalized V th 40

Eventually: Retention Failure DF P2 OPT P3 S S F F F F S S Normalized V th 41

Retention Failure Recovery (RFR) Key idea: Guess original state of the cell from its leakage speed property Three steps 1. Identify risky cells 2. Identify fast-/slow-leaking cells 3. Guess original states 42

OPT σ OPT OPT+σ 1. Identify Risky Cells DF Risky + S = P2 cells + F = P3 Key Formula S F F S Normalized V th 43

OPT σ OPT OPT+σ 2. Identifying Fast- vs. Slow-Leaking Cells DF Risky + S = P2 cells + F = P3 Key Formula???? Normalized V th 44

OPT σ OPT OPT+σ 2. Identifying Fast- vs. Slow-Leaking Cells DF Risky + S = P2 cells + F = P3 Key Formula S??? F?? F S? Normalized V th 45

3. Guess Original States DF Risky + S = P2 cells + F = P3 Key Formula S F F S Normalized V th 46

RFR Evaluation Program with random data 28 days Expect to eliminate 50% of raw bit errors ECC can correct remaining errors Detect failure, backup data Recover data 12 addt l. days 47

To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 48

Conclusion Problem: Retention loss reduces flash lifetime Overall Goal: Extend flash lifetime at low cost Flash Characterization: Developed an understanding of the effects of retention loss in real chips Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage 64% lifetime, 70.4% read latency Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors Raw bit error rate 50%, reduces data loss 49

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu Carnegie Mellon University, *LSI Corporation 50

Backup Slides 51

RFR Motivation Data loss can happen in many ways 1. High P/E cycle 2. High temperature accelerates retention loss 3. High retention age (lost power for a long time) 52

What if there are other errors? Key: RFR does not have to correct all errors Example: ECC can correct 40 errors in a page Corrupted page has 20 retention errors, 25 other errors (45 total errors) After RFR: 10 retention errors, 30 other errors (40 total errors ECC correctable) 53