Implementation and Challenging Issues of Flash-Memory Storage Systems Tei-Wei Kuo Department of Computer Science & Information Engineering National Taiwan University Agenda Introduction Management Issues Performance vsoverheads Other Challenging Issues Conclusion 3/29/2007 Embedded Systems and Wireless Networking Lab. 2 1
Introduction Why Flash Memory Diversified Application Domains Portable Storage Devices Consumer Electronics Industrial Applications SoC and Hybrid Devices Critical System Components 3/29/2007 Embedded Systems and Wireless Networking Lab. 3 Trends in VLSI Technology Source: www.icknowledge.com 3/29/2007 Embedded Systems and Wireless Networking Lab. 4 * This slide was from the ASP-DAC 06 talk delivered by Prof. Sang L. Min from the Seoul National University. 2
Introduction Trends in Storage Technology September 2006 Samsung 2GB USB 2.0 Flash Drive Price: $49.99 Less Rebate: -$25.00 Final Price: $24.99* T-One 2GB Microdrive/3600RPM $144.99 Source: Using multilevel cell NAND flash technology in consumer applications, Electronic Engineering Times, July,2005 3/29/2007 Embedded Systems and Wireless Networking Lab. 5 Introduction Trends in Storage Technology March 2007 Transcend 8GB CompactFlashCard Price: $84.85 ScanDisk4GB CompactFlashCard Price: $55.99 Microdrive 4GB Compact Flash Type II Source: Using multilevel cell NAND flash technology in consumer applications, Price: $116 Electronic Engineering Times, July,2005. Amazon.com 3/29/2007 Embedded Systems and Wireless Networking Lab. 6 3
The Ultimate Limit on Mechanical Devices ua MicrodriveExample Fly By Night 2,000,000 Miles Per Hour Boeing 747 1/100 Flying Height Source: Richard Lary, The New Storage Landscape: Forces shaping the storage economy, 2003. Source: http://www.hitachigst.com/ 3/29/2007 Embedded Systems and Wireless Networking Lab. 7 * This slide was from the ASP-DAC 06 talk delivered by Prof. Sang L. Min from the Seoul National University. Introduction The Characteristics of Storage Media Media DRAM NOR FLASH 15X NAND FLASH 400X DISK Read 60ns (2B) 2.56us (512B) 150ns (1B) 14.4us (512B) 100X 10.2us (1B) 35.9us (512B) 50X 12.4ms (512B) (average) Access time Write 60ns (2B) 2.56us (512B) 211us (1B) 3.52ms (512B) 201us (1B) 226us (512B) 12.4 ms(512b) (average) Erase 1.2s (16KB) 2ms (16KB) [Reference] DRAM:2-2-2 PC100 SDRAM. NOR FLASH: Intel 28F128J3A-150. NAND FLASH: Samsung K9F5608U0M. Disk: Segate Barracuda ATA II. 1 1. J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho. A space-efficient flash translation layer for compact-flash systems. IEEE Transactions on Consumer Electronics, 48(2):366 375, May 2002. 3/29/2007 Embedded Systems and Wireless Networking Lab. 8 - - 4
Introduction Single-Level Cell (SLC) Each Word Line is connected to control gates. Each Bit Line is connected to the drain. Control Gate Drain Source Cell I DS Selected cell 3/29/2007 Embedded Systems and Wireless Networking Lab. 9 Introduction Multi-Level Cell (MLC) vsslc 3/29/2007 Embedded Systems and Wireless Networking Lab. 10 5
Comparison of SLC and MLC 1-bit/Cell SLC NAND Flash 100,000 Program/Erase cycles (with ECC) [1] 10 years Data Retention [1] 2-bits/Cell MLC NAND Flash 10,000 Program/Erase cycles (with ECC) [2] 10 years Data Retention [2] 4-bits/Cell MLC NAND FLASH Developers (2006) M-systems, Intel, Samsung, and Toshiba [1] ST Micro-electronics NAND SLC large page datasheet (NAND08GW3B2A) [2] ST Micro-electronics NAND MLC large page datasheet (NAND04GW3C2A) * USD34.65 per GB for NOR, USD6.79 per GB for NAND 3/29/2007 Embedded Systems and Wireless Networking Lab. 11 Introduction Consumer Applications Electronic Engineering Times, July 2005 3/29/2007 Embedded Systems and Wireless Networking Lab. 12 6
Bandwidth Requirements Video ˇ ˇ Electronic Engineering Times, July 2005 3/29/2007 Embedded Systems and Wireless Networking Lab. 13 Bandwidth Requirements Audio ˇ ˇ Electronic Engineering Times, July 2005 3/29/2007 Embedded Systems and Wireless Networking Lab. 14 7
Introduction Challenges in Flash- Memory Storage Designs Requirements in Good Performance Limited Cost per Unit Strong Demands in Reliability Increasing in Access Frequencies Tight Coupling with Other Components Low Compatibility among Vendors 3/29/2007 Embedded Systems and Wireless Networking Lab. 15 Agenda Introduction Management Issues Performance vsoverheads Other Challenging Issues Conclusion 3/29/2007 Embedded Systems and Wireless Networking Lab. 16 8
Management Issues System Architectures AP AP AP AP File-System Layer Block Device Layer (FTL emulation) MTD drivers Flash Memory 3/29/2007 Embedded Systems and Wireless Networking Lab. 17 Management Issues Flash-Memory Characteristics Write one page 1 Page = 512B 1 Block = 32 pages(16kb) Block 0 Block 1 Block 2 Block 3 Erase one block 3/29/2007 Embedded Systems and Wireless Networking Lab. 18 9
Management Issues Flash-Memory Characteristics Write-Once No writing on the same page unless its residing block is erased! Pages are classified into valid, invalid, and free pages. Bulk-Erasing Pages are erased in a block unit to recycle used but invalid pages. Wear-Leveling Each block has a limited lifetime in erasing counts. 3/29/2007 Embedded Systems and Wireless Networking Lab. 19 Management Issues Flash-Memory Characteristics Example 1: Out-place Update A B C D Live pages pages Suppose that we want to update data A and B 3/29/2007 Embedded Systems and Wireless Networking Lab. 20 10
Management Issues Flash-Memory Characteristics Example 1: Out-place Update A B C D A B Dead pages 3/29/2007 Embedded Systems and Wireless Networking Lab. 21 Management Issues Flash-Memory Characteristics Example 2: Garbage Collection L D D L D D L D L L D L L L F D L F L L L L D F F L L F L L F D This block is to be recycled. (3 live pages and 5 dead pages) A live page A dead page A free page 3/29/2007 Embedded Systems and Wireless Networking Lab. 22 11
Management Issues Flash-Memory Characteristics Example 2: Garbage Collection D D D D D D D D L L D L L L L D Live data are copied to somewhere else. L F L L L L D L L L L F L L F D A live page A dead page A free page 3/29/2007 Embedded Systems and Wireless Networking Lab. 23 Management Issues Flash-Memory Characteristics Example 2: Garbage Collection F F F F F F F F L L D L L L L D L F L L L L D L The block is then erased. Overheads: live data copying block erasing. L L L F L L F D A live page A dead page A free page 3/29/2007 Embedded Systems and Wireless Networking Lab. 24 12
Management Issues Flash-Memory Characteristics Example 3: Wear-Leveling 100 10 L D D L D D L D L L D L L L F D A B Wear-leveling might interfere with the decisions of the blockrecycling policy. 20 15 L F L L L L D F F L L F L L F D Erase cycle counts C D A live page A dead page A free page 3/29/2007 Embedded Systems and Wireless Networking Lab. 25 Management Issues Challenges The write throughput drops significantly after garbage collection starts! The capacity of flash-memory storage systems increases very quickly such that memory space requirements grows quickly. Reliability becomes more and more critical when the manufacturing capacity increases! The significant increment of flash-memory access rates seriously exaggerates the Read/Program Disturb Problems! 3/29/2007 Embedded Systems and Wireless Networking Lab. 26 13
Agenda Introduction Management Issues Performance vs Overheads FTL vs NFTL Other Challenging Issues Conclusion 3/29/2007 Embedded Systems and Wireless Networking Lab. 27 System Architecture fwrite(file,data) process process process process Applications Block write (LBA,size) Flash I/O Requests File system (FAT, EXT2, NTFS...) Garbage Collection Address Translation File Systems FTL/NFTL Layer Device Driver Flash-Memory Storage System Control signals Physical Devices (Flash Memory Banks) 3/29/2007 Embedded Systems and Wireless Networking Lab. 28 14
Management Issues Flash- Memory Characteristics *FTL: Flash Translation Layer, MTD: Memory Technology Device 3/29/2007 Embedded Systems and Wireless Networking Lab. 29 Policies FTL FTL adopts a page-level address translation mechanism. The main problem of FTL is on large memory space requirements for storing the address translation information. 3/29/2007 Embedded Systems and Wireless Networking Lab. 30 15
Policies NFTL (Type 1) A logical address under NFTL is divided into a virtual block address and a block offset. e.g., LBA=1011 => virtual block address (VBA) = 1011 / 8 = 126 and block offset = 1011 % 8 = 3 Write data to LBA=1011 VBA=126 NFTL Address Translation Table (in main-memory)... (9)... Block Offset=3 A Chain Block Address = 9 Used Write If to the the page Write has to the page with been block used page with block offset=3 offset=3 A Chain Block Address = 23 Used A Chain Block Address = 50 3/29/2007 Embedded Systems and Wireless Networking Lab. 31 Policies NFTL (Type 2) A logical address under NFTL is divided into a virtual block address and a block offset. e.g., LBA=1011 => virtual block address (VBA) = 1011 / 8 = 126 and block offset = 1011 % 8 = 3 Write data to LBA=1011 VBA=126 NFTL Address Translation Table (in main-memory)... (9,23)... Block Offset=3 A Primary Block Address = 9 Used If the page has been Write to the used first free page A Replacement Block Address = 23 Used Used Used 3/29/2007 Embedded Systems and Wireless Networking Lab. 32 16
Policies NFTL NFTL is proposed for the large-scale NAND flash storage systems because NFTL adopts a block-level address translation. However, the address translation performance of read and write requests might deteriorate, due to linear searches of address translation information in primary and replacement blocks. 3/29/2007 Embedded Systems and Wireless Networking Lab. 33 Policies FTL or NFTL Memory Space Requirements Address Translation Time Garbage Collection Overhead Space Utilization FTL Large Short Less High NFTL Small Long More Low The Memory Space Requirements for one 1GB NAND (512B/Page, 4B/Table Entry, 32 Pages/Block) FTL: 8,192KB (= 4*(1024*1024*1024)/512) NFTL: 256KB (= 4*(1024*1024*1024)/(512*32)) Remark: Each page of small-block(/large-block) SLC NAND can store 512B(/2KB) data, and there are 32(/64)pages per block. Each page of MLCx2 NAND can store 2KB, and there are 128 pages per block. 3/29/2007 Embedded Systems and Wireless Networking Lab. 34 17
Address Translation Time - NFTL The address translation performance of read and write requests can be deteriorated, due to linear searches of physical addresses. 1. Assume that each block contains 8 pages. 2. Let LBA A, B, C, D, and E be written for 5, 5, 1, 1, and 1 times, respectively. Their data distribution could be like to what in the left figure. 3. For example, it might need to scan 9 spare areas for LBA B. 3/29/2007 Embedded Systems and Wireless Networking Lab. 35 Garbage Collection Overhead - NFTL 1. Copy the most-recent content to the new primary block. 2. Erase the old primary block and the replacement block. 3. Overhead is 2 block erases and 5 page writes. 3/29/2007 Embedded Systems and Wireless Networking Lab. 36 18
Space Utilization - NFTL 3 free pages are wasted. 3/29/2007 Embedded Systems and Wireless Networking Lab. 37 Agenda Introduction Management Issues Performance vs Overheads An Adaptive Two-Level Mapping Mechanism Other Challenging Issues Conclusion 3/29/2007 Embedded Systems and Wireless Networking Lab. 38 19
Motivation An adaptive two-level management design of a flash translation layer, called AFTL. Exploit the advantages of the fine-grained address mechanism and the coarse-grained address mechanism. Memory Space Requirements Address Translation Time Garbage Collection Overhead Space Utilization FTL Large Short Less High NFTL Small Long More Low AFTL A little larger than NFTL Much Better than NFTL Much Better than NFTL Much Better than NFTL 3/29/2007 Embedded Systems and Wireless Networking Lab. 39 AFTL Coarse-to-Fine Switching 1. AFTL doesn t erase the two blocks immediately. RPBA+ 5 RPBA+ 7 PPBA RPBA 2. AFTL moves the mapping information of the replacement block to the fine-grained hash table by adding finegrained slots. ( ( VBA,, PPBA,, RPBA 1) ) 3. The RPBA field of the corresponding mapping information is nullified. Chin-Hsien Wu and Tei-Wei Kuo, 2006, An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems, IEEE/ACM 2006 International Conference on Computer-Aided Design (ICCAD), November 5-9, 2006. 3/29/2007 Embedded Systems and Wireless Networking Lab. 40 20
AFTL Fine-to-Coarse Switching The number of the fine-grained slots is limited. Some least recently used mapping information of fine-grained slots should be moved to the coarse-grained hash table. (F, PBAE) RPBA+ 5 RPBA+ 7 ( VBA, PPBA, RPBA) PPBA RPBA 1. Assume that this fine-grained slot is to be replaced. 2. Data stored in the page with the given (physical) address are copied to the primary or replacement block of the corresponding coarse-grained slot, as defined by NFTL. 3. If there dose not exist any corresponding coarse-grained slot, a new one is created. 3/29/2007 Embedded Systems and Wireless Networking Lab. 41 AFTL Fine-to-Coarse Switching Coarse-to-fine switches would introduce fine-tocoarse switches and overhead in valid page copying. It is because the number of the fine-grained slots is limited. Stop any coarse-to-fine switch when some frequency bound in coarse-to-fine switches is reached. We set a parameter in the experiments to control the frequency of switches to explore the behavior of the proposed mechanism. 3/29/2007 Embedded Systems and Wireless Networking Lab. 42 21
cg cg cg The Advantages of AFTL Improve the address translation performance. It is because the moving of their mapping information to the fine-grained hash table. Improve the garbage collection overhead. The delayed recycling of any replacement block reduces the potential number of valid data copyings and blocks erased. Improve the space utilization. The delayed recycling of any primary block lets free pages of a primary block be likely used in the future. ( δ δ δ cg. RPBA+ 5 δ cg. RPBA+ 7. VBA, δ RPBA) PPBA, δ cg.ppba δ cg. RPBA Chin-Hsien Wu and Tei-Wei Kuo, 2006, An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems, IEEE/ACM 2006 International Conference on Computer-Aided Design (ICCAD), November 5-9, 2006. 3/29/2007 Embedded Systems and Wireless Networking Lab. 43 Agenda Introduction Management Issues Performance vs Overheads Performance Evaluation Other Challenging Issues Conclusion 3/29/2007 Embedded Systems and Wireless Networking Lab. 44 22
Performance Evaluation Performance Setup CPU RAM OS The characteristics of the experiment trace was over a 20GB disk. File Systems Applications Durations Total Write / Read Requests Different LBA s Intel Celeron 750MHz 320 MB Windows XP NTFS Web Applications, E-mail Clients, MP3 Player, MSN Messenger, Word, Excel, PowerPoint, Media, Player, Programming, and Virtual Memory Activities One week 13,198,805 / 2,797,996 sectors 1,669,228 3/29/2007 Embedded Systems and Wireless Networking Lab. 45 Performance Evaluation Performance Setup The maximum number of fine-grained slots is controlled by a parameter MFS. A parameter ST controls the frequency of switches between the two address translation mechanisms n/st. ST=0 => No constraint on the number of switches. Smaller ST => More switches. Larger ST => Less switches. 3/29/2007 Embedded Systems and Wireless Networking Lab. 46 23
Memory Space Requirements 1200 AFTL NFTL 1000 800 Memory Space(K) 600 400 200 0 MFS=2,500 MFS=5,000 MFS=7,500 MFS=10,000 MFS=12,500 MFS=15,000 NFTL 1. MFS ranged from 2,500, 5,000, 7,500, 10,000, 12,500, to 15,000. 2. AFTL uses a little more memory space than NFTL. 3/29/2007 Embedded Systems and Wireless Networking Lab. 47 Address Translation Performance Address Translation Time (ms) 1000000 950000 900000 850000 800000 750000 700000 650000 600000 MFS=2,500 ST=64 ST=32 ST=16 ST=0 NFTL MFS=5,000 MFS=7,500 MFS=10,000 MFS=12,500 MFS=15,000 NFTL 1. Larger MFS => smaller address translation time -More address translations going through the fine-grained address translation mechanism. 2. Smaller ST => longer address translation time -More coarse-to-fine switches 3/29/2007 Embedded Systems and Wireless Networking Lab. 48 24
Garbage Collection Overhead N u m b e r o f B l o c k s E ra s e d 550000 530000 510000 490000 470000 450000 430000 410000 390000 370000 350000 M F S =2,50 0 ST=64 ST=32 ST=16 ST=0 NFTL M F S =5,00 0 M F S=7,50 0 M F S=1 0,0 00 M F S=1 2,5 00 M F S=1 5,0 00 N FT L A v e r a g e N u m b e r o f V a l i d P a g e s C o p i e d 10 9 8 7 6 5 4 MFS =2,50 0 ST=64 ST=32 ST=16 ST=0 NFTL MFS =5,0 00 M F S =7,5 00 M F S =10,00 0 M F S =12,500 M F S =15,000 NF T L AFTL outperforms NFTL. -Coarse-to-fine switches can avoid immediate recycling of their primary and replacement blocks and related valid data copyings. 3/29/2007 Embedded Systems and Wireless Networking Lab. 49 Space Utilization Average Number of Pages Left 3 2.5 2 1.5 1 0.5 0 MFS=2,500 ST=64 ST=32 ST=16 ST=0 NFTL MFS=5,000 MFS=7,500 MFS=10,000 MFS=12,500 MFS=15,000 NFTL The Space utilization might be better under AFTL. -Coarse-to-fine switches can delay the recycling of replacement blocks. - pages of primary blocks might be used in the future.. 3/29/2007 Embedded Systems and Wireless Networking Lab. 50 25
cg cg cg Summary AFTL is proposed to exploit the advantages of finegrained/coarse-grained address translation mechanisms, and to switch dynamically and adaptively the mapping information between the two address translation mechanisms. AFTL does provide good performance in address mapping and space utilization and have garbage collection overhead and memory space requirements under proper management. 3/29/2007 Chin-Hsien Wu and Tei-Wei Kuo, 2006, Embedded An Adaptive Systems Two-Level and Wireless Management Networking for the Flash Lab. Translation Layer in Embedded Systems, 51 IEEE/ACM 2006 International Conference on Computer-Aided Design (ICCAD), November 5-9, 2006. ( δ δ δ cg. RPBA+ 5 δ cg. RPBA+ 7. VBA, δ RPBA) PPBA, δ cg.ppba δ cg. RPBA Agenda Introduction Management Issues Performance vsoverheads Other Challenging Issues Conclusion 3/29/2007 Embedded Systems and Wireless Networking Lab. 52 26
Challenging Issues Reliability Each Word Line is connected to control gates. Each Bit Line is connected to the drain. Control Gate Drain Source Cell I DS Selected cell 3/29/2007 Embedded Systems and Wireless Networking Lab. 53 Challenging Issues Reliability Read Operation When the floating gate is not charged with electrons, there is current I D (100uA) if a reading voltage is applied. ( 1 state) 5V 1V Program Operation Electrons are moved into the floating gate, and the threshold voltage is thus raised. 3/29/2007 Embedded Systems and Wireless Networking Lab. 54 27
Challenging Issues Reliability Over-Erasing Problems Fast Erasing Bits All of the cells connected to the same bit line of a depleted cell would be read as 1, regardless of their values. Read/Program Disturb Problems DC erasing of a programmed cell, DC programming of a non-programmed cell, drain disturb, etc. Flash memory that has thin gate oxide makes disturb problems more serious! Data Retention Problems Electrons stored in a floating gate might be lost such that the lost of electrons will sooner or later affects the charging status of the gate! 3/29/2007 Embedded Systems and Wireless Networking Lab. 55 Challenging Issues Observations The write throughput drops significantly after garbage collection starts! The capacity of flash-memory storage systems increases very quickly such that memory space requirements grows quickly. Reliability becomes more and more critical when the manufacturing capacity increases! The significant increment of flash-memory access rates seriously exaggerates the Read/Program Disturb Problems! Wear-leveling technology is even more critical when flash memory is adopted in many system components or might survive in products for a long life time! 3/29/2007 Embedded Systems and Wireless Networking Lab. 56 28
Wear Leveling versus Product Lifetime Settings File system: FAT16 file system with 8KB cluster size Flash memory: 256MB small-block flash memory with 100K erase cycles Updating of a 16MB file repeatedly with the throughput: 0.1MBs The file requires 2K clusters = 16MB 8KB(cluster size) The FAT size of this file is 4KB (2K(clusters) x 2 bytes) The 20% of blocks in flash memory joins the dynamic wear leveling Data of a 16MB file is stored in 1K blocks (16MB 16KB(block size)) Suppose flash memory is managed in the block level File systems update the FAT in each cluster writing so that FAT is updated 2K times for a 16MB file Writing of a 16MB incurs 1K block erases because of the reclaiming of invalid space. 3/29/2007 Embedded Systems and Wireless Networking Lab. 57 Wear Leveling versus Product Lifetime Ways in Data Updates In-Place-Updates: Rewriting on the Same Page Dynamic Wear Leveling: Rewriting over Another Page with Erasing over Blocks with Dead Pages Static Wear Leveling: Rewriting over Another Page with Erasing over Any Blocks Expected Lifetime of 16(MB) 100K NO Wear Leveling (in - place update) = 0.09(days) 0.1(MB/second) 2K 24 60 60 16(MB) 16K(blocks) 20% 100K Dynamic Wear Leveling = 197.5(days) 0.1(MB/second) (2K + 1K ) 24 60 60 16(MB) 16K(blocks) 100% 100K Static Wear Leveling = 987.5(days) 0.1(MB/second) (2K + 1K ) 24 60 60 3/29/2007 Embedded Systems and Wireless Networking Lab. 58 29
Wear Leveling versus Product Lifetime Static Wear Leveling Block-Level Mapping Use a counter for each block The garbage collector always finds the block with the least erase count. Some heuristic approach erases a block to maintain 2 free blocks when the garbage collector finds the erase count of the block is over a given threshold. Problems: High extra block erases and live-page copyings High main-memory consumption High computation cost : index in the selection of a victim block GC starts find a victim Update GC Write starts new data block data in 15block to block 3 4 block 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Flash Memory : index to the selected free block 5 5 5 54 5 5 54 45 54 5 5 5 45 4 4 5 Counter : block contains (some) valid data : free block : dead block 3/29/2007 Embedded Systems and Wireless Networking Lab. 59 Conclusion Summary The Characteristics of Flash Memory and Management Issues Popular Implementations: FTL vs NFTL Adaptive Two-Level Address Translation Performance, Cost, and Reliability Challenges 3/29/2007 Embedded Systems and Wireless Networking Lab. 60 30
Conclusion What Is Happening? Solid-State Storage Devices New Designs in the Memory Hierarchy More Applications in System Components and Products Challenging Issues: Performance, Cost, and Reliability Scalability Technology Reliability Technology Customization Technology 3/29/2007 Embedded Systems and Wireless Networking Lab. 61 Contact Information Professor Tei-Wei Kuo ktw@csie.ntu.edu.tw URL: http://csie.ntu.edu.tw/~ktw Flash Research: http://newslab.csie.ntu.edu.tw/~flash/ Office: +886-2-23625336-257 Fax: +886-2-23628167 Address: Dept. of Computer Science & Information Engr. National Taiwan University, Taipei, Taiwan 106 3/29/2007 Embedded Systems and Wireless Networking Lab. 62 31
Q & A 32