UFS 2.0 NAND Device Controller with SSD-Like Higher Read Performance Konosuke Watanabe Toshiba Santa Clara, CA 1
How did our embedded NAND storage device get SSD-like higher read performance? Santa Clara, CA 2
Outline Background Approaches Example results Santa Clara, CA 3
Embedded NAND Storage Device Host SoC Link NAND device controller NAND chips Single BGA package chip Connects to host SoC with standardized link Contains controller and NAND chips It must be small and lower power 10-20mm 1-2mm 10-20mm 1-2 W (active)...but now higher performance is also expected UFS 2.0: 1160MB/s (5.8Gbps x 2-lane) Santa Clara, CA 4 cf. SATA 3.0: 600MB/s
4KB random read (KIOPS) Performance of Current Embedded NAND Storage Device Products is Quite Low 100 50 DensBits DB3610 (128G) SanDisk inand Extreme (128G) Embedded SSD Samsung KLMCG8GEAC (64G) Samsung SSD 840 EVO (250G) SanDisk Ultra Plus SSD (256G) Crucial M500 (120G) Intel 335 (240G) SanDisk i100 (128G) SSD Intel 520 (240G) OCZ Vertex 3.20 (240G) OCZ Vertex 450 (256G) Corsair Force Series LX (256G) Intel 530 (240G) 0 100 200 300 400 500 600 700 Embedded Sequential read (MB/s) Read performance of embedded NAND storage device and SSD products Santa Clara, CA 5
Reason for Their Poor Performance and Our Challenge Embedded NAND storage device has strict limitation It must be small and lower power consumption Cannot take SSD s rich man s approach Improve performance by utilizing rich resources Massive NAND chips / channels Powerful CPU Large capacity RAM Goal: Achieve SSD-like higher performance without rich resources Focused on read performance of UFS 2.0 device Santa Clara, CA 6
Outline Background Approaches Example results Santa Clara, CA 7
Data Paths Strengthen (1) Conventional NAND Device Controller Embedded NAND storage device NAND device controller Logic + CPU Data paths NAND chips Host Host I/F NAND I/F Santa Clara, CA 8
Data Paths Strengthen (2) Strengthened Data Paths Minimally 5.8Gbps x 2-lane Embedded NAND storage device NAND device controller Logic + CPU 800MB/s 400MB/s x 2ch Host SoC M-PHY HOST I/F NAND I/F NAND chips Santa Clara, CA 9
Host NAND device controller NAND User data FTL data Random Read Latency Reduction (1) Latency of Random Read Random read is accompanied by a couple of NAND reads Target data (user data) read A couple of FTL data reads Special information for address translation NAND read latency is several 10s μsec Random read latency is larger than 100μsec Can be reduced by caching FTL data on large capacity RAM With increasing of package size and power consumption Santa Clara, CA 10
Host Memory FTL data cache Random Read Latency Reduction (2) Using Unified Memory Architecture (UMA) 1. Made NAND device controller available to access host memory Extended UFS standard 2. Cached FTL data on host memory 1. Reads host memory to get FTL data NAND device controller NAND User data FTL data Santa Clara, CA 2. Reads NAND to get user data Host memory read latency << NAND read latency Reduces random read latency by 50% in our evaluation environment With no increase in package size and chip power consumption 11
Random Read Throughput Boost (1) Parallel NAND Reading NAND chips can be read in parallel Multiple NAND chips (and channels) Multiple outstanding NAND read requests Parallelism is determined by number of NAND chips, channels and read requests SSD Our device NAND device controller NAND device controller Santa Clara, CA 12
Random Read Throughput Boost (2) Random Read Command Processor (RRCP) UM FTL data Pipeline1 Pipeline2 Pipeline3 NAND request manager Chip managers Channel arbiters Front-end Parallel out-of-order random read command processing Back-end Efficient NAND read access arbitration and management 7.0 NAND chips are activated in parallel on average in our evaluation environment (with 8 NAND chips) Santa Clara, CA 13
Outline Background Approaches Example results Santa Clara, CA 14
SSD-like Read Performance is Achieved 4KB random read (KIOPS) 100 50 0 Embedded SSD Our work 100 200 300 400 500 600 700 Sequential read (MB/s) 690MB/s 66.3KIOPS Our work (8 NAND chips) Santa Clara, CA 15
Package Size and Power Consumption can be Maintained Package size (with NAND chips) Package power Active (Seq. read) < 1.5 W Idle / Sleep 11.5mm x 13.0mm x 1.2mm < 1.45 mw / < 0.30 mw Depends on storage capacity cf. Typical level Size 10-20mm 1-2mm 10-20mm Power consumption 1-2 W (active) Santa Clara, CA 16
Conclusion Developed UFS 2.0 embedded NAND storage device Improved read performance with three approaches Strengthen data paths minimally Reduce random read latency by using unified memory architecture and FTL data caching Boost random read throughput by maximizing read command parallelism with Random Read Command Processor SSD-like read performance is achieved 4KB random read: 66.3KIOPS Sequential read: 690MB/s Package size and power consumption can be maintained Package size (with NAND chips) : 11.5mm x 13.0mm x 1.2mm Active (seq. read / write) : < 1.5 W Santa Clara, CA 17