Managing the evolution of Flash : beyond memory to storage




Managing the evolution of Flash: beyond memory to storage
Tony Kim, Director, Memory Marketing, Samsung Semiconductor Inc.
Nonvolatile Memory Seminar, Hot Chips Conference
August 22, 2010, Memorial Auditorium, Stanford University

Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
Conclusions


NAND Technology Shifts Smoothly
Density has doubled every year since 2004.
There are breaking points in key technology beyond 2013: lithography shrink slows and NAND reliability degrades.
[Figure: density and reliability trends for SLC, MLC, and TLC. Density labels run from 64Mb to 256Gb; reliability labels run from 100K down to 0.5K.]
* SLC: single-level cell, MLC: multi-level cell, TLC: triple-level cell

Reliability Tolerance by Technology
Influence of scaling down the floating gate:
Increase of F-Poly interference: cell interference
Decrease of coupling ratio: V_PGM/ERS
Reduction of charge loss damage tolerance: charge loss
Ref: YunSeung Shin, Symposium on VLSI Circuits, pp. 156-159, 2005

JEDEC Standard: Cycle & Retention
Write operations should occur across the device lifetime.
10-year retention after the lifetime write cycles is impractical.
[Figure: retention versus endurance cycles, up to the maximum specified cycle count.]
Until 2006: 100% END cycle + 10y DTN
As of 2007: 10% END cycle + 10y DTN; 100% END cycle + 1y DTN

Controller: Critical for Maintaining Reliability
A more intelligent controller can offset some of the generic degradation of NAND reliability caused by scaling.
[Figure: endurance versus technology node (4xnm, 3xnm, 2xnm), showing the target endurance, the endurance of the cell, and the role of the controller; labels 3K for 2-bit and 1K for 3-bit.]

Technology Engine with ECC / Bit Error / FTL
A legacy controller with ECC alone cannot reliably handle 2-bit cells at 3xnm and beyond, let alone 3-bit cells.
Optimization with Flash cell characteristics in mind is crucial.
A new metric for reliability measurement is needed, such as lifetime data amount with a standard pattern.
[Figure: 3-bit MLC Vth distribution with levels R1 to R7 and bit errors. Legacy controller: Host I/F, FTL (S/W), ECC Engine, NAND. New controller: Host I/F, FTL (S/W), ECC Engine, and Bit Error Engine melded together, NAND.]

Samsung's Innovative Flash Technology
Samsung is exploring new technology to break the status quo.
Samsung believes 3D-NAND is the most likely successor to planar NAND.
[Figure: cell architecture of planar NAND and 3D-NAND.]
Planar NAND. Advantages: easy to produce with a simple process. Challenges: hard to shrink under 1xnm.
3D-NAND. Advantages: high reliability, process reuse. Challenges: hard to stack.

3D-NAND details
Potential benefit: better reliability and endurance than planar NAND, since the cell design rule is much more relaxed.
Issues in future scaling:
Bit cost is reduced by increasing the number of stacked layers, so a 2X increase with each generation becomes more difficult and unlikely.
Block size will be larger than the planar equivalent.
[Figure: 3D-NAND cell string (SSL gate, control gates (W/L), GSL gate, CSL, p-sub) and block size (MB, 0 to 4.5) versus number of stacking layers (8 to 32) for W/L share 4, 6, and 8.]


Why Is Software Needed for Flash?
Program works on a small data unit (page) and Erase on a large one (block), so updating data requires a newly allocated block and an updated mapping table.
[Figure: new data for logical block #1 is written to a page of new physical block #N, the old page in physical block #1 is invalidated, and the map table is updated.]
Logical block #1 is mapped to physical blocks #1 and #N after updating.
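The out-of-place update on this slide can be sketched in a few lines of Python. This is a minimal toy model, not Samsung's FTL: the class name `TinyFTL`, its fields, and the 4-page block size are all assumptions made for illustration.

```python
class TinyFTL:
    """Toy out-of-place update model; every name here is illustrative."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.free_blocks = list(range(1, num_blocks + 1))  # physical block numbers
        self.map_table = {}   # logical block -> physical blocks holding its data
        self.used_pages = {}  # physical block -> number of programmed pages
        self.location = {}    # (logical block, page) -> (physical block, slot)
        self.flash = {}       # (physical block, slot) -> data

    def write(self, logical_block, page, data):
        blocks = self.map_table.setdefault(logical_block, [])
        # Programmed pages cannot be overwritten, so a full block forces
        # allocation of a fresh block instead of an in-place update.
        if not blocks or self.used_pages[blocks[-1]] == self.pages_per_block:
            if not self.free_blocks:
                raise RuntimeError("no free block: garbage collection needed")
            new_block = self.free_blocks.pop(0)
            blocks.append(new_block)
            self.used_pages[new_block] = 0
        phys = blocks[-1]
        slot = self.used_pages[phys]
        self.flash[(phys, slot)] = data
        self.used_pages[phys] += 1
        # Repointing the map entry implicitly invalidates the older copy.
        self.location[(logical_block, page)] = (phys, slot)

    def read(self, logical_block, page):
        return self.flash[self.location[(logical_block, page)]]


ftl = TinyFTL(num_blocks=4, pages_per_block=4)
for page in range(4):
    ftl.write(1, page, f"old-{page}")   # fills physical block 1
ftl.write(1, 0, "new-0")                # update lands in a new block
```

After the update, `ftl.map_table[1]` holds two physical blocks, mirroring the slide's statement that the logical block is mapped to both the original and the newly allocated block. The stale copy of page 0 stays in flash until a later erase.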

FTL (Flash Translation Layer)
Manages mapping from logical to physical addresses.
Detects and maps out bad blocks.
Performs wear-leveling to extend life.
[Figure: two software stacks. With raw NAND: Application, File System, FTL (sector read/write above, page read/program/erase below), NAND Flash. With managed NAND: Application, File System, Host Driver (sector read/write), managed NAND containing FTL and NAND.]

Reclaiming Valid Data: Garbage Collection
Over time, Flash fills with a mix of valid and invalid data.
Free space must be reclaimed before new data can be written.
Garbage collection merges the valid data from scattered blocks.
[Figure: logical blocks #1 and #2 have been updated and logical block #3 is to be updated, but no free physical block remains. Physical blocks holding a mix of invalid and updated data are merged, and an erased block is allocated for logical block #3.]
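The merge step can be illustrated with a short Python sketch. It assumes a simplified model in which each block is a list of `(logical_page, data, is_valid)` tuples; the function name `garbage_collect` and the data layout are hypothetical, and real controllers choose victim blocks with more elaborate policies.

```python
def garbage_collect(flash, victims, target, pages_per_block):
    """Merge the valid pages of the victim blocks into an erased target block.

    flash maps block id -> list of (logical_page, data, is_valid) tuples.
    Returns the list of blocks that were erased and are free again.
    """
    if flash[target]:
        raise ValueError("target block must be erased")
    survivors = [page for block in victims for page in flash[block] if page[2]]
    if len(survivors) > pages_per_block:
        raise ValueError("valid data does not fit in one block")
    flash[target] = survivors      # copy the valid pages
    for block in victims:
        flash[block] = []          # erase: the whole block becomes free
    return list(victims)


flash = {
    1: [("A", "a1", False), ("B", "b1", True)],   # A was updated elsewhere
    2: [("A", "a2", True), ("C", "c1", False)],   # C was invalidated
    3: [],                                        # the only erased block
}
freed = garbage_collect(flash, victims=[1, 2], target=3, pages_per_block=2)
```

Two blocks are erased and one is consumed, so the net gain is one free block. The price is the extra page copies and erases, which is what the wear metric on the next slide measures.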

What is WAI (Wear Acceleration Index)?
WAI is an index that represents how much the FTL accelerates the wear-out of NAND.
$$\text{WAI} = \frac{\text{EraseCount} \times \text{BlockSizeBytes}}{\text{TotalWriteSizeBytes}}$$
Example (block size: 128KB): the host requests a write of 2 blocks, and the FTL performs a write of 4 blocks plus an erase of 4 blocks.
$$\text{WAI} = \frac{4 \times 128\,\text{KB}}{256\,\text{KB}} = 2$$
High WAI: short NAND lifetime (bad)
Low WAI: long NAND lifetime (good)
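The slide's example can be checked with a few lines of Python. The function name and argument names below are illustrative; only the formula and the numbers (4 erases, 128KB blocks, 256KB written by the host) come from the slide.

```python
KB = 1024


def wear_acceleration_index(erase_count, block_size_bytes, write_size_bytes):
    """WAI = erased bytes / bytes the host asked to write."""
    if write_size_bytes <= 0:
        raise ValueError("write size must be positive")
    return erase_count * block_size_bytes / write_size_bytes


# Host writes 2 blocks (256KB); the FTL ends up erasing 4 blocks of 128KB.
wai = wear_acceleration_index(erase_count=4,
                              block_size_bytes=128 * KB,
                              write_size_bytes=256 * KB)
print(wai)
```

A WAI of 1 would mean the FTL erases exactly as much as the host writes; the value of 2 here means the NAND wears twice as fast as the host workload alone would imply.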

What is Wear-leveling?
Hot data such as the FAT can wear out a certain portion of the cell array.
Wear-leveling maximizes the life span of NAND flash by using every cell evenly.
[Figure: erase count versus block number. Before wear-leveling, the frequently updated FAT blocks show high erase counts; after good wear-leveling, every cell is used evenly.]
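A small simulation shows the effect described here. It is a sketch under simplifying assumptions: every update of the hot data costs one block erase, and the wear-leveling policy is simply "erase the least-worn block next". The function `simulate` and its parameters are hypothetical.

```python
def simulate(num_blocks, updates, wear_leveling):
    """Return per-block erase counts after repeatedly updating hot data."""
    erase_counts = [0] * num_blocks
    for _ in range(updates):
        if wear_leveling:
            # Place the update on the block with the lowest erase count.
            block = min(range(num_blocks), key=lambda b: erase_counts[b])
        else:
            # Hot data such as the FAT always hits the same block.
            block = 0
        erase_counts[block] += 1
    return erase_counts


without_wl = simulate(num_blocks=8, updates=1000, wear_leveling=False)
with_wl = simulate(num_blocks=8, updates=1000, wear_leveling=True)
print(max(without_wl), max(with_wl))
```

Without wear-leveling one block absorbs all 1000 erases; with it, the worst-case block sees only 125, so the same workload takes eight times longer to reach any given endurance limit.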


NAND Flash Market Outlook ('10 ~ '15)
Samsung expects a NAND market CAGR of 53% between 2010 and 2015.
The key applications for NAND market growth over the next decade are Flash cards, smartphones, tablet PCs, and SSDs.
[Figure: worldwide annual units in 16Gb equivalents (billions) by year for Flash Card, Smart Phone, Tablet PC, and SSD; labels 6.7 billion and 56 billion; 2010-2015 5-year CAGR: 53%. Source: Samsung]
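The 53% figure can be sanity-checked from the two unit counts on the chart. The sketch below assumes that 6.7 billion is the 2010 value and 56 billion the 2015 value, which is a reading of the flattened figure rather than something the slide text states outright.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.5 means 50% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


growth = cagr(start_value=6.7, end_value=56.0, years=5)
print(f"{growth:.1%}")
```

Under that reading the result rounds to the 53% quoted on the slide, which suggests the two labels are indeed the endpoints of the forecast.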

Performance Multiplied with Multi-Chips
Flash storage performance can be easily expanded with multi-way write interleaving along with multiple channels.
During program busy, data can be loaded and programmed into other NAND devices on the same bus.
[Figure: Flash storage architecture with a Flash controller, NAND controllers No.1 and No.2, and chip enables nCE1 to nCE4; timing diagram of interleaved Load and PROGRAM phases on nCE1 and nCE2 of CH 1; write IOPS (0 to 5000) versus number of chips (1, 2, 4, 8, 16).]
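A simple timing model shows why interleaving scales until the shared bus saturates. The load and program times below (100 µs and 900 µs) are made-up values chosen for round numbers, not figures from the slide, and the model ignores command overhead and multiple channels.

```python
def pages_per_second(num_chips, t_load_us, t_prog_us):
    """Steady-state page writes per second on one shared bus.

    One round-robin pass over all chips takes at least num_chips loads on
    the bus, and at least one full load + program cycle of a single chip.
    """
    round_us = max(num_chips * t_load_us, t_load_us + t_prog_us)
    return num_chips * 1_000_000 / round_us


for chips in (1, 2, 4, 8, 16):
    print(chips, pages_per_second(chips, t_load_us=100, t_prog_us=900))
```

With these assumed timings, throughput doubles with each doubling of chips up to 8, then flattens once the bus is busy loading data all the time. Adding channels is what lifts that ceiling.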

Application Segmented by Technology
Applications will be fragmented by performance and reliability.
Closer communication is needed to understand user requirements and tailor appropriate solutions.
[Figure: endurance (50 to 300K) versus write performance (2.5MB/s, 5MB/s, 8MB/s, 13MB/s) for TLC, MLC, and SLC. Applications shown: consumer card and UFD, OEM UFD, SD Class 2, SD Class 4, SD Class 6, high-end card, MP3, PND, mobile phone, PMP, tablet, consumer SSD, low-end server SSD, mid-end enterprise SSD, and high-end server SSD.]


Storage Architecture Evolution: Technical Trend
A new environment around storage drives higher performance:
Multi-tasking with swap IO
High-performance apps with fast data IO
High-bandwidth networks with user IO
[Figure: storage bandwidth (MB/sec, 20 to 150) by year (2009 to 2013), shown against the system environment and the network environment.
Former eMMC: single task; text, photo.
eMMC 4.4: multi-task, app install, web access, multimedia, app-based.
UFS: multi-task, full browsing, multimedia play, async IO and sync IO, swap, web cache, HD media file, small random IO, big sequential write.
Network: 11~40Mbps 802.11b, 54Mbps 802.11g, 300Mbps 802.11n~g, 600Mbps 802.11n, 1Gbps 802.11ac, 1.3Gbps UWB.]

Architecture for High Performance and Low Latency
The OneNAND + moviNAND architecture can be unified to moviNAND only, given fast random write and low latency.
Implementing HPI can resolve the problem.
[Figure: three architectures, one 2-channel storage and two 1-channel storage, each built from a CPU, DRAM, controllers, and SLC/MLC NAND (OneNAND, eMMC); write throughput (MB/s) versus time, with the present behavior marked as problematic.]

Preparing Host for New Features of eMMC & UFS
The enhancements in read/write performance, data integrity at sudden system power failure, and lower power consumption at idle are realized only with the relevant support of the file system, and in some cases the OS.
The Linux open source community is far behind in aligning to them.
Chipset and handset vendors should work out the details of responsibility.
Features and their purpose (the slide table also has eMMC and UFS columns):
Trim: write speed-up
Reliable write: data integrity at power loss
HPI: write suspend for fast read access
Command queuing: read/write speed-up
Sleep, Power: lower power at idle

Data Attributes in Mobile System
Data in a mobile device has different attributes.
System data / code data / swap data: system operation, multi-tasking, and system applications. High-speed random I/O performance and data reliability are mainly required.
User data: high-density multimedia data read and write. Sequential I/O is mainly required.
[Figure: code, system, meta, swap, and user data placed by read-centric versus write-centric, sequential versus random, and reliability.]

New Feature for e-MMC 4.4: Multi Partition
Gives the host flexibility to manage eMMC 4.4.
Four general purpose partitions and an enhanced user data area can be set in the normal user data area.
[Figure: partition layout: Boot 1, Boot 2, RPMB (secure), general purpose partitions, enhanced user data area (SLC mode), normal user area.]
Partitions (NAND type; default size; remarks):
Boot Area Partition 1: SLC mode; 128KB; size is a multiple of 128KB (max. 32MB)
Boot Area Partition 2: SLC mode; 128KB; size is a multiple of 128KB (max. 32MB)
RPMB Area Partition: SLC mode; 128KB; size is a multiple of 128KB (max. 32MB)
General Purpose Partitions: MLC or enhanced area; 0KB; the available size is given by
$$(\text{EXT\_CSD}[145] \times 2^{16} + \text{EXT\_CSD}[144] \times 2^{8} + \text{EXT\_CSD}[143]) \times \text{HC\_WP\_GRP\_SIZE} \times \text{HC\_ERASE\_GRP\_SIZE} \times 512\,\text{KB}$$
User Area, Enhanced Area: SLC mode; 0KB; start address is a multiple of the Write Protect Group size
User Area, Default Area: MLC; 93.1%
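The general purpose partition size formula on this slide can be evaluated as follows. The sketch assumes that the garbled multipliers on bytes 145 and 144 are 2^16 and 2^8, so that the three EXT_CSD bytes form one 24-bit multiplier. The function name, the `base` parameter, and passing the two group sizes as plain arguments are choices made for this example.

```python
def gp_partition_size_kb(ext_csd, hc_wp_grp_size, hc_erase_grp_size, base=143):
    """Size in KB from three EXT_CSD multiplier bytes (low byte at `base`)."""
    multiplier = (ext_csd[base + 2] << 16) | (ext_csd[base + 1] << 8) | ext_csd[base]
    return multiplier * hc_wp_grp_size * hc_erase_grp_size * 512


# Hypothetical register contents, chosen only to exercise the arithmetic.
ext_csd = [0] * 512
ext_csd[143] = 2
ext_csd[144] = 1
size_kb = gp_partition_size_kb(ext_csd, hc_wp_grp_size=2, hc_erase_grp_size=4)
print(size_kb, "KB")
```

A multiplier of zero yields a size of zero, which matches the 0KB default listed for the general purpose partitions.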

Usage Model in eMMC 4.4
SLC and MLC partitions are to be tailored to use scenarios.
[Figure: host-side data mapped to eMMC 4.4 partitions. Host side: boot code, system code, system file, and security data (read IO, reliability), swap, application code, and user data (read/write IO, sequential pattern). eMMC 4.4 side: Boot Area Partition, RPMB Area Partition, General Purpose Area Partitions 1 and 2, Enhanced User Area Partition, and User Area Partition, divided between SLC and MLC.]

Better Multi-tasking Performance with Trim
Trim reduces long write latency and, in turn, read latency during multi-tasking.
Mobile phones, which very likely have some free space, benefit from Trim.
[Figure (source: Microsoft): conceptual view of read-thread delay caused by a long write thread with Busy. Upper case, "No impact as file deleted": thread 2 (64K writes) alternates Merge and Program while thread 1 (4K reads) completes three reads. Lower case, "Performance increase as files deleted": thread 2 only programs and thread 1 completes four reads.]

How Write Latency Affects Multi-tasking
Multi-tasking needs write latency below a time-out value for uninterrupted audio/video playback.
[Figure: on the host, process A (MP3 play) and process B (download), with an application stop; on a traditional eMMC device, read, write, and internal write (merge) operations; write throughput (MB/s) versus time.]

Interrupting Write-Busy Sustains Real-time Task
A long write-busy period of eMMC should be interrupted for a high-priority real-time task such as audio/video playback.
[Figure: host process A (video playing) and process B (cache write) issue CMD 0 and CMD 1, and the host requests HPI ahead of the time-out points. The device (moviNAND) stops CMD 0 and interrupts the write IO, serves the HPI command IO while CMD 0 is suspended, then resumes CMD 0 and the write IO of process B.]

Next Generation Embedded Storage: UFS
Universal Flash Storage is based on the serial M-PHY.
Serial interface: 300MB/s bandwidth
Native Command Queuing: supports parallel NAND Flash operation for random/sequential IO
Page mapping with DRAM: reduces internal merge operations
[Figure: host (system code data, swap data, user data) connected over a 300MB/s serial interface to the controller, which uses NCQ with queue depth 4 (CMD1 to CMD4) and DRAM for page mapping and buffering.]

UFS Command Queuing Enhances Performance
Normal data is buffered and written to different NAND dies in parallel.
The host system should mark critical data, such as file-system metadata, to be written synchronously.
[Figure: command queuing, reordering, and parallel requests. The UFS controller queues CMD 1 to CMD 6, reorders them, and operates CMD 1 and 3, then CMD 2 and 5, then CMD 4 and 6 across NAND chip no.1 and NAND chip no.2.]
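The reordering in the figure can be sketched as a per-die queue that dispatches one command to each die per round. The assignment of commands to dies below is an assumption chosen so that the output matches the figure's pairs (1 and 3, 2 and 5, 4 and 6); the function name `schedule` is hypothetical, and the sketch leaves out the synchronous handling of critical data.

```python
from collections import deque


def schedule(commands, num_dies):
    """Group queued commands into rounds that keep every die busy.

    commands is a list of (command_id, die_index) in arrival order.
    Each round holds at most one command per die, so a round can run
    in parallel; order within a die is preserved.
    """
    queues = [deque() for _ in range(num_dies)]
    for command_id, die in commands:
        queues[die].append(command_id)
    rounds = []
    while any(queues):
        rounds.append([q.popleft() for q in queues if q])
    return rounds


# Arrival order CMD 1..6; die 0 stands for NAND chip no.1, die 1 for no.2.
queued = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1)]
rounds = schedule(queued, num_dies=2)
print(rounds)
```

Without a queue the controller would see one command at a time and could not pair CMD 1 with CMD 3. A deeper queue gives it more chances to find work for every die, which is the point of the queue-depth discussion on the following slide.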

Queue Depth & Random IO Performance
Random performance depends on the number of parallel IOs.
[Figure: write IOPS and read IOPS for UFS without UCQ and UFS with UCQ at queue depths 1, 2, and 4.]
* IOPS (Input/Output Operations Per Second) is a common benchmark for computer storage media.
Optimized command queue depth:
Depends on the number of NAND channels or ways in the UFS device.
Consider data loss at sudden power-off.
Register for dynamic setting of command queue depth:
Add to the UCQ mode page.

UFS Standardization Roadmap
UFS version 1.0 is only the baseline.
Additions for performance, low power, and reliability will be put into the follow-up spec.
[Figure: UFS specification timeline from 2010 Q3 to 2011 Q4. Items: e-MMC features (ERASE/TRIM/Reliable write), NCQ, and Hybrid Power, with 1st showing (August interim meeting) and 2nd showing milestones, leading to Version 1.0, Version 1.x, and Final.]

Conclusions
Various NAND Flash technologies target different markets with different focuses in architecture and performance, resulting in a wider product portfolio.
High-performance mobile systems should tailor the use of different NAND Flash technologies within a single storage device, according to the performance and reliability needs of each application scenario.
NAND-friendly system-level solutions such as Trim, HPI, and Reliable Write on eMMC will play a critical role in managing high reliability and performance.
The command queue architecture of UFS provides a path to future high performance by significantly increasing effective IOPS.