Managing the evolution of Flash: beyond memory to storage
Tony Kim, Director, Memory Marketing, Samsung Semiconductor Inc.
Nonvolatile Memory Seminar, Hot Chips Conference
August 22, 2010, Memorial Auditorium, Stanford University

Contents
NAND Flash technology
Flash storage management
Flash storage architecture by apps
Future trend
Conclusions

NAND Technology Shifts Smoothly
Density has doubled every year since 2004.
There are breaking points in key technology beyond 2013: lithography shrink slows and NAND reliability degrades.
[Figure: two charts for SLC, MLC and TLC. Density rises from 64Mb to 256Gb; reliability in endurance cycles falls from about 100K through 50K, 10K, 5K, 3K, 1.5K and 1K to 0.5K.]
* SLC: single-level cell, MLC: multi-level cell, TLC: triple-level cell

Reliability Tolerance by Technology
Influence of scaling down the floating gate:
Increase of F-Poly interference: cell interference
Decrease of coupling ratio: V PGM/ERS damage
Reduction of charge loss tolerance: charge loss
Ref: YunSeung Shin, Symposium on VLSI Circuits, pp. 156-159, 2005

JEDEC Standard: Cycle & Retention
Write operations should occur across the device lifetime.
Requiring 10-year retention after the full lifetime write cycle count is impractical.
Until 2006: 100% endurance cycles + 10-year data retention.
As of 2007: 10% endurance cycles + 10-year data retention, and 100% endurance cycles + 1-year data retention.
[Figure: retention versus endurance cycle, marking the max. spec cycle and the 10-year retention point.]

Controller: Critical for Maintaining Reliability
A more intelligent controller can offset some of the generic degradation in NAND reliability caused by scaling.
[Figure: endurance versus technology node (4xnm, 3xnm, 2xnm). Cell endurance falls with scaling; the role of the controller is to close the gap to the target endurance of 3K for 2-bit and 1K for 3-bit cells.]

Technology Engine with ECC / Bit Error / FTL
A legacy controller with ECC alone cannot reliably handle 2-bit cells at 3xnm and beyond, let alone 3-bit cells.
Optimization with Flash cell characteristics in mind is crucial.
A new metric for reliability measurement is needed, such as the lifetime amount of data written with a standard pattern.
[Figure: 3-bit MLC threshold voltage (Vth) distributions with read levels R1 to R7 and bit errors.]
Legacy controller: Host I/F, FTL (S/W), ECC Engine, NAND.
New controller: Host I/F, FTL (S/W), ECC Engine and Bit Error Engine melded together, NAND.

Samsung's Innovative Flash Technology
Samsung is exploring new technology to break the status quo.
Samsung believes 3D-NAND is the most likely successor to planar NAND.
Planar NAND: easy to produce with a simple process (advantage); hard to shrink under 1xnm (challenge).
3D-NAND: high reliability and process reuse (advantages); hard to stack (challenge).
[Figure: cell architecture of planar NAND and 3D-NAND.]

3D-NAND Details
Potential benefit: better reliability and endurance than planar NAND, since the cell design rule is much more relaxed.
Issues in future scaling:
Bit cost is reduced by increasing the number of stacked layers, so a 2X increase per generation becomes more difficult and unlikely.
Block size will be larger than the planar equivalent.
[Figure: 3D-NAND cell string (SSL gate, control gates (W/L), GSL gate, CSL, p-sub) and block size (MB, 0 to 4.5) versus number of stacked layers (8 to 32) for W/L share 4, 6 and 8.]

Why Software Is Needed for Flash?
Flash is programmed in small units but erased in large units, so updating data requires a newly allocated block and an update of the mapping.
[Figure: new data for logical block #1 is written to new physical block #N, valid pages are copied, the old pages in physical block #1 are invalidated, and the map table is updated.]
After the update, logical block #1 is mapped to physical blocks #1 and #N.

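The out-of-place update described on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the FTL from the talk: it maps individual pages rather than whole blocks, and the class name `OutOfPlaceStore` and its fields are hypothetical.

```python
class OutOfPlaceStore:
    """Toy model of out-of-place updates (page-level mapping is an assumption)."""

    def __init__(self, num_blocks, pages_per_block):
        # Every physical page starts erased and free, in (block, page) order.
        self.free_pages = [(b, p) for b in range(num_blocks)
                           for p in range(pages_per_block)]
        self.map_table = {}   # logical page -> (block, page)
        self.invalid = set()  # physical pages that hold stale data
        self.data = {}        # physical page -> stored value

    def write(self, logical_page, value):
        if not self.free_pages:
            raise RuntimeError("no free page: garbage collection needed")
        new_loc = self.free_pages.pop(0)
        old_loc = self.map_table.get(logical_page)
        if old_loc is not None:
            # Flash cannot overwrite in place, so the old copy is only marked stale.
            self.invalid.add(old_loc)
        self.data[new_loc] = value
        self.map_table[logical_page] = new_loc  # map table update
        return new_loc

    def read(self, logical_page):
        return self.data[self.map_table[logical_page]]


store = OutOfPlaceStore(num_blocks=2, pages_per_block=2)
first = store.write(0, "v1")
second = store.write(0, "v2")  # the update lands on a new physical page
print(first, second, store.read(0), store.invalid)
```

The stale page stays occupied until its whole block is erased, which is why free space eventually has to be reclaimed.
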
FTL (Flash Translation Layer)
Manages the mapping from logical to physical addresses.
Detects and maps out bad blocks.
Performs wear-leveling to extend device life.
[Figure: two software stacks. With raw NAND: Application, File System, FTL (sector read/write above it, page read/program/erase below it), NAND Flash. With managed NAND: Application, File System, Host Driver (sector read/write), then FTL and NAND inside the managed NAND device.]

Reclaiming Valid: Garbage Collection
Over time, Flash fills with a mix of valid and invalid data.
Free space needs to be reclaimed to write new data.
Garbage collection merges the valid data from scattered blocks.
[Figure: logical blocks #1 and #2 have been updated and logical block #3 is to be updated, but no free physical block is left. The updated pages of physical blocks #1 and #2 are merged, and the erased block is allocated for logical block #3.]

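A minimal sketch of the merge step follows, assuming each block is a list of page slots where `None` means erased and a sentinel marks invalid data. The function name `merge_blocks` and the data layout are illustrative assumptions, not taken from the talk.

```python
INVALID = object()  # sentinel for a page whose data has been superseded


def merge_blocks(blocks, victims, target):
    """Copy the valid pages of the victim blocks into an erased target block,
    then erase the victims. Returns the number of pages copied."""
    valid = [page for v in victims for page in blocks[v]
             if page is not None and page is not INVALID]
    if any(page is not None for page in blocks[target]):
        raise ValueError("target block must be erased")
    if len(valid) > len(blocks[target]):
        raise ValueError("valid data does not fit into one block")
    # Program the target with the merged data; unused slots stay erased.
    blocks[target] = valid + [None] * (len(blocks[target]) - len(valid))
    for v in victims:
        blocks[v] = [None] * len(blocks[v])  # block erase
    return len(valid)


blocks = {
    1: [INVALID, "a", INVALID, "b"],
    2: ["c", INVALID, "d", INVALID],
    3: [None, None, None, None],  # the last erased block
}
copied = merge_blocks(blocks, victims=[1, 2], target=3)
print(copied, blocks[3])
```

Two half-valid blocks are consumed and one full block is produced, so the merge yields one net free block at the cost of four page copies and two erases.
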
What Is WAI (Wear Acceleration Index)?
WAI is an index that represents how much the FTL accelerates the wear-out of NAND.
WAI = (EraseCount × BlockSizeBytes) / TotalWriteSizeBytes
Example (block size: 128KB): the host requests a write of 2 blocks, and the FTL performs a write of 4 blocks plus an erase of 4 blocks.
WAI = (4 × 128KB) / 256KB = 2
High WAI: short NAND lifetime (bad).
Low WAI: long NAND lifetime (good).

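The slide's example can be checked with a short Python helper. The function name `wear_acceleration_index` and its parameter names are chosen here for illustration; the arithmetic is the slide's own.

```python
def wear_acceleration_index(erase_count, block_size_bytes, write_size_bytes):
    """Bytes erased by the FTL divided by bytes the host asked to write."""
    if write_size_bytes <= 0:
        raise ValueError("write size must be positive")
    return erase_count * block_size_bytes / write_size_bytes


KB = 1024
# Slide example: a 2-block (256KB) host write triggers 4 block erases.
wai = wear_acceleration_index(erase_count=4,
                              block_size_bytes=128 * KB,
                              write_size_bytes=256 * KB)
print(wai)  # 2.0
```

A WAI of 1 would mean the FTL erases exactly as much as the host writes; anything higher shortens NAND lifetime proportionally.
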
What Is Wear-leveling?
Hot data such as the FAT can wear out a particular portion of the cell array.
Wear-leveling maximizes the life span of NAND flash by using every cell evenly.
[Figure: erase count versus block number. Before wear-leveling, the frequently updated FAT produces a peak; after good wear-leveling, every cell is used evenly.]

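The effect can be illustrated with a toy simulation. It assumes every update of the hot data costs exactly one block erase and that wear-leveling simply picks the least-erased block; real FTL policies are more involved, and `simulate_erases` is a hypothetical name.

```python
def simulate_erases(num_blocks, updates, wear_leveling):
    """Return per-block erase counts after repeatedly updating one hot item."""
    erase_counts = [0] * num_blocks
    for _ in range(updates):
        if wear_leveling:
            # Place the update on the block that has been erased least so far.
            target = min(range(num_blocks), key=lambda b: erase_counts[b])
        else:
            # Without wear-leveling the hot data always hits the same block.
            target = 0
        erase_counts[target] += 1
    return erase_counts


before = simulate_erases(num_blocks=4, updates=100, wear_leveling=False)
after = simulate_erases(num_blocks=4, updates=100, wear_leveling=True)
print(before)  # [100, 0, 0, 0]
print(after)   # [25, 25, 25, 25]
```

The total number of erases is the same in both runs; only the worst-case block, which determines device life, changes.
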
NAND Flash Market Outlook ('10 ~ '15)
Samsung expects a NAND market CAGR of 53% between 2010 and 2015.
The key applications for NAND market growth over the next decade are Flash Card, Smart Phone, Tablet PC and SSD.
[Figure: worldwide annual units in 16Gb equivalents (billions) by year, stacked by SSD, Tablet PC, Smart Phone and Flash Card, growing from 6.7 billion to 56 billion; 2010-2015 5-year CAGR: 53%. Source: Samsung.]

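The 53% figure can be reproduced from the two unit counts on the chart, assuming 6.7 billion and 56 billion are the 2010 and 2015 endpoints. The helper name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.5 means 50% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


growth = cagr(start_value=6.7, end_value=56.0, years=5)
print(f"{growth:.1%}")  # 52.9%
```
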
Performance Multiplied with Multi-Chips
Flash storage performance scales easily with multi-way write interleaving combined with multiple channels.
While one NAND device is busy programming, data can be loaded and programmed into other NAND devices on the same bus.
[Figure: Flash storage architecture with a Flash controller containing NAND controllers No.1 and No.2, each driving NAND chips selected by nCE1 to nCE4; timing diagram of overlapped Load and PROGRAM phases on nCE1 and nCE2; write IOPS (0 to 5000) versus number of chips (1, 2, 4, 8, 16).]

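A simple steady-state model shows why interleaving helps and where it saturates. The timings below (100 us to load a page over the bus, 700 us to program it) are made-up illustration values, not figures from the talk, and `pages_per_second` is a hypothetical helper for a single channel.

```python
def pages_per_second(num_chips, load_us, program_us):
    """Write throughput of one channel with num_chips interleaved chips.

    Each chip needs load_us on the shared bus plus program_us of busy time.
    One round programs one page per chip and lasts as long as the slower of
    a single chip's full cycle and the bus time for all chips' loads.
    """
    if num_chips < 1:
        raise ValueError("need at least one chip")
    round_us = max(num_chips * load_us, load_us + program_us)
    return num_chips * 1_000_000 / round_us


for chips in (1, 2, 4, 8, 16):
    print(chips, pages_per_second(chips, load_us=100, program_us=700))
```

With these assumed timings throughput doubles with each doubling of chips until the shared bus is fully occupied at 8 chips; beyond that, more channels are needed rather than more ways.
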
Application Segmented by Technology
Applications will be fragmented by performance and reliability.
Closer communication is needed to understand user requirements and tailor appropriate solutions.
[Figure: endurance (50 to 300K) versus write performance (2.5MB/s, 5MB/s, 8MB/s, 13MB/s) for TLC, MLC and SLC, placing applications such as SD Class 2, Class 4 and Class 6 cards, consumer cards and UFD, OEM UFD, MP3 and PND, mobile phone and PMP, tablet and consumer SSD, high-end cards, low-end and high-end server SSD, and mid-end enterprise SSD.]

Storage Architecture Evolution: Technical Trend
A new environment for storage drives higher performance:
Multi-tasking with swap I/O.
High-performance apps with fast data I/O.
High-bandwidth networks with user I/O.
[Figure: storage bandwidth (MB/sec, 20 to 150) by year, 2009 to 2013, against the system and network environments.
Former eMMC: single task; text, photo.
eMMC 4.4: multi-task, app install, web access, multimedia; app-based.
UFS: multi-task, full browsing, multimedia play; async I/O and sync I/O; swap, web cache, HD media file; small random I/O, big sequential write.
Network: 11~40Mbps 802.11b, 54Mbps 802.11g, 300Mbps and 600Mbps 802.11n, 1Gbps 802.11ac, 1.3Gbps UWB.]

Architecture for High Performance and Low Latency
The OneNAND + moviNAND architecture can be unified into moviNAND alone, given fast random write and low latency.
Implementing HPI (High Priority Interrupt) can resolve the problem.
[Figure: 2-channel storage (CPU, OneNAND, DRAM, controller with SLC and MLC) versus 1-channel storage (CPU, eMMC with controller, SLC and MLC); write throughput (MB/s) over time, with the current behavior marked as problematic.]

Preparing Host for New Features of eMMC & UFS
The enhancements in read/write performance, data integrity at sudden system power failure and lower power consumption at idle are only realized with the relevant support from the file system and, in some cases, the OS.
The Linux open source community is far behind in aligning with them.
Chipset and handset vendors should work out the details of responsibility.
Features (eMMC, UFS) and their purpose:
Trim: write speed-up.
Reliable write: integrity at power loss.
HPI: write suspend for fast read access.
Command queuing: read/write speed-up.
Sleep power: lower power at idle.

Attributes in Mobile System
Data in a mobile device has different attributes.
System data / code data / swap data: used for system operation, multi-tasking and system applications; high-speed random I/O performance and data reliability are mainly required.
User data: high-density multimedia data read and write; sequential I/O is mainly required.
[Figure: code, system meta, swap and user data placed by read-centric versus write-centric access, sequential versus random pattern, and reliability.]

New Feature for eMMC 4.4: Multi Partition
Gives the host flexibility to manage eMMC 4.4.
Four general purpose partitions and an enhanced user data area can be set within the normal user data area.
[Figure: partition layout with Boot 1, Boot 2, RPMB (secure), general purpose partitions, enhanced user data area (SLC mode) and normal user area.]
Partitions (NAND type; default size; remarks):
Boot Area Partition 1: SLC mode; 128KB; size is a multiple of 128KB (max. 32MB).
Boot Area Partition 2: SLC mode; 128KB; size is a multiple of 128KB (max. 32MB).
RPMB Area Partition: SLC mode; 128KB; size is a multiple of 128KB (max. 32MB).
General Purpose Partitions: MLC or enhanced area; 0KB; available size is (EXT_CSD[145] × 2^16 + EXT_CSD[144] × 2^8 + EXT_CSD[143]) × HC_WP_GRP_SIZE × HC_ERASE_GRP_SIZE × 512KB.
User Area, Enhanced Area: SLC mode; 0KB; start address is a multiple of the Write Protect Group size.
User Area, Default Area: MLC; 93.1%.

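The general purpose partition size formula from the table can be evaluated as below. This is a sketch only: `ext_csd` is assumed to be the 512-byte EXT_CSD register already read from the device, the two group-size factors are passed in as plain integers, and the function name is hypothetical.

```python
def gp_partition_size_kb(ext_csd, hc_wp_grp_size, hc_erase_grp_size):
    """Size in KB of the partition described by EXT_CSD bytes 143 to 145."""
    # Bytes 145, 144, 143 form a 24-bit multiplier, most significant first.
    multiplier = (ext_csd[145] << 16) + (ext_csd[144] << 8) + ext_csd[143]
    return multiplier * hc_wp_grp_size * hc_erase_grp_size * 512


# Made-up register contents for illustration only.
ext_csd = [0] * 512
ext_csd[143] = 4
small = gp_partition_size_kb(ext_csd, hc_wp_grp_size=2, hc_erase_grp_size=1)
ext_csd[144] = 1
larger = gp_partition_size_kb(ext_csd, hc_wp_grp_size=2, hc_erase_grp_size=1)
print(small, larger)  # 4096 266240
```

A multiplier of zero gives a size of 0KB, which matches the default size listed for the general purpose partitions.
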
Usage Model in eMMC 4.4
SLC and MLC partitions are tailored to use scenarios.
[Figure: mapping of host-side data to eMMC 4.4 partitions. Host side: boot code, system code, system file, security, swap, application code, user data. eMMC 4.4: Boot Area Partition, RPMB Area Partition, General Purpose Area Partitions 1 and 2, Enhanced User Area Partition, User Area Partition, split between SLC and MLC. Attributes shown: read I/O and reliability; read/write I/O with a sequential pattern.]

Better Multi-tasking Performance with Trim
Trim reduces long write latency and, in turn, read latency during multi-tasking.
Mobile phones, which very likely have some free space, benefit from Trim.
[Figure (source: Microsoft): conceptual view of how a long write thread with Busy affects the read thread. Without Trim, thread 2 (64K writes) alternates Merge and Program, thread 1 (4K reads) completes fewer reads, and deleting files has no impact. With Trim, thread 2 only programs, thread 1 completes more reads, and performance increases as files are deleted.]

How Write Latency Affects Multi-tasking
Multi-tasking needs write latency to stay within the time-out value for uninterrupted audio/video playback.
[Figure: on the host, process A (MP3 play) reads while process B (download) writes to a traditional eMMC; a long internal write (merge) in the device causes the application to stop. Write throughput (MB/s) over time.]

Interrupting Write-Busy Sustains Real-time Task
A long write-busy period of the eMMC should be interrupted for a high-priority real-time task such as audio/video playback.
[Figure: HPI sequence between host and device (moviNAND). Process B issues a cache write (CMD 0) and the device starts the write I/O. Before the time-out point, process A (video playing) requests HPI; the device stops and suspends CMD 0, performs the I/O for CMD 1, then resumes CMD 0 to continue the write I/O of process B.]

Next Generation Embedded Storage: UFS
Universal Flash Storage is based on serial M-PHY.
Serial interface: 300MB/s bandwidth.
Native Command Queuing: supports parallel NAND Flash operation for random and sequential I/O.
Page mapping with DRAM: reduces internal merge operations.
[Figure: host (system code data, swap data, user data) connected over the 300MB/s serial interface to a controller with NCQ at queue depth 4 (CMD1 to CMD4) and DRAM for page mapping and buffering.]

UFS Command Queuing Enhances Performance
Normal data is buffered and written to different NAND dies in parallel.
The host system should indicate critical data, such as file-system metadata, to be written synchronously.
[Figure: command queuing followed by command reordering and parallel request. Queued commands CMD 1 to CMD 6 are reordered by the UFS controller and operated in pairs (CMD 1 and 3, CMD 2 and 5, CMD 4 and 6) across NAND chip no.1 and NAND chip no.2.]

Queue Depth & Random I/O Performance
Random performance depends on the number of parallel I/Os.
[Figure: write IOPS and read IOPS for UFS without UCQ and UFS with UCQ at queue depths 1, 2 and 4.]
* IOPS (input/output operations per second) is a common benchmark for computer storage media.
Optimized command queue depth:
Depends on the number of NAND channels or ways in the UFS device.
Consider data loss at sudden power-off.
Register for dynamic setting of command queue depth:
Add to the UCQ mode page.

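The link between queue depth and parallelism can be sketched with a batch model. It assumes the controller takes up to `queue_depth` commands at a time, runs commands for different dies in parallel and commands for the same die back to back; the names and the 100 us durations are illustrative, not measurements from the talk.

```python
from collections import defaultdict


def total_time_us(commands, queue_depth):
    """commands is a list of (die, duration_us) pairs in arrival order."""
    if queue_depth < 1:
        raise ValueError("queue depth must be at least 1")
    total = 0
    for start in range(0, len(commands), queue_depth):
        busy_per_die = defaultdict(int)
        for die, duration in commands[start:start + queue_depth]:
            busy_per_die[die] += duration
        # The batch finishes when the busiest die finishes.
        total += max(busy_per_die.values())
    return total


# Six commands alternating between two dies.
commands = [(1, 100), (2, 100)] * 3
for depth in (1, 2, 4):
    print(depth, total_time_us(commands, depth))
```

With two dies, going from depth 1 to depth 2 halves the total time (600 us to 300 us), while depth 4 brings no further gain in this model, in line with the slide's point that the useful depth depends on the number of channels or ways.
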
UFS Standardization Roadmap
UFS version 1.0 is only the baseline.
Additions for performance, low power and reliability will be put into the follow-up spec.
[Figure: timeline from 2010 Q3 to 2011 Q4. UFS specification: 1st showing (Aug. interim meeting), 2nd showing, Version 1.0, Version 1.x final. Follow-up items, each with a 2nd showing: e-MMC features (erase, trim, reliable write), NCQ, hybrid, power.]

Conclusions
Various NAND Flash technologies target different markets with different focuses in architecture and performance, resulting in a wider product portfolio.
High-performance mobile systems should tailor the use of different NAND Flash technologies within a single storage device, with regard to performance and reliability, for various application scenarios.
NAND-friendly system-level solutions such as Trim, HPI and Reliable Write on eMMC will play a critical role in managing high reliability and performance.
The command queue architecture on UFS provides a path to future high performance by significantly increasing effective IOPS.