Low-Power Error Correction for Mobile Storage Jeff Yang Principle Engineer Silicon Motion 1
Power Consumption The ECC engine will consume a great percentage of power in the controller Both RAID and LDPC become crucial to high reliable flash controller, which is employed in embedded applications. Stronger correction capability Longer ECC chunk (e.g., 2KB BCH is stronger than 1KB, and 2KB LDPC is much stronger than 1KB) Higher decoder resolution and parallelism architecture will reduce the decoding latency. Higher power consumption. Wider range corrupt data recovery Larger RAID recovery range will provide wider range data recovery. Consume more memory size and calculation logic. Higher power consumption. In normal operation For RAID engine, only the encoder part is active all the time. LDPC uses the hard-decoding only, but the encoder is always active in each flash write action. Power consumption reduction in hot region RAID encoding LDPC decoding SMI Confidential 2
RAID Engine The SSD module failure rate will be the Key for TLC adoption on SSD. Latest Planar NAND might be 1znm. Bit column, sense-amp, address decoder, or charge pump corrupted WL lining or WL to WL short. First popular 3D structure on BiCs. Inherit most of failure cases from 2D. High aspect ratio etching and larger number of layers control will contribute more failure cases. No Erase/Program failure, but with reading ECC failure. Need RAID engine to do the error recovery. In 3D TLC, three WL lining may cause data loss over 9 pages. High efficiency RAID with erasure decoding. Encoding with high efficiency memory access. Memory access dominates the encoding power. Low RAID recovery trigger rate. [Ref]: VLSI symp. 2009, pp.136-137, VLSI symp. 2008, pp. 192-193 VLSI symp. 2010, pp. 131-132, IEEE Trans. Electron Devices, vol.58, no.5, pp. 1006-1014, Apr.2011 Lower grade TLC flash with RAID to have 2.6K endurance 3
DSP Engine Power Consumption NAND Flash characteristics on Endurance and Retention. Retention capability on TLC degrades rapidly as P/E cycle increases. Reduce the power with Retention-aware algorithm: Predict the sensing point in the first read. Joint the soft-decoding and the Vt-tracking. Minimize the number of read operation The error bit number is the key factor of power consumption. Sensing the data which has lowest bit error rate Predict error types and select the decoding strategy. 4
NAND characteristic. SLC/MLC/TLC 5
TLC Error Behavior on Endurance X: Number of error bit Y: Number of ECC chunk Endurance 2K, 90% of the error bit number is less than 30 bits 2K: CSB-RBER = 2.7e-3 1K: CSB-RBER = 1.3e-3 Sensing the data with lowest error bit. Predict the error case and select the Soft-decoding/RAID strategy. trigger rate X: P/E cycle After the 2.7K cycling, it needs LDPC soft-decoding. RAID redundant overhead ~ 0.07%. Double strong endurance from LDPC hard to soft decoding. First RAID recovery occurs on 4.5K cycles. 6
LDPC Low Power Decoding Technology A mw 2A mw 3A mw Quantization Optimization for the Harddecoding mode. Change floating point calculation system 3A mw Traditional LDPC engine structure is here Clock gated control for shift register Smart Error prediction Retiming optimization Adaptive clipping Future work New decoding algo? 1.3A mw Power efficient LDPC engine structure 7
1KB BCH vs. LDPC decoding power consumption TSMC40nm on TLC flash Error bit range /1KB BCH LDPC NOTE 1~20 bit A mw 0.9A mw ~400MB/sec 20~40 bit B mw 1.4B mw ~400MB/sec 40~60 bit C mw 2.6 C mw ~300MB/sec Endurance BCH LDPC NOTE 0~1K P/E A mw 0.9A mw Hard-bit only 1.~2K B mw 1.4B mw Hard-bit only 2K~5K NA 2.6C mw, poor throughput soft-decoding trigger frequently. If requirement is 1K cycles, Will be the BCH enough to support the TLC? Flash Memory Summit 2013 8
After Data Retention at 500P/E Flash Memory Summit 2013 9
Comparison between 1KB and 2KB Error bit under 2KB chunk is 110bit. Almost 100% correctable rate for Hard-decision only. 851 MB/sec under 400MHz operation frequency. When the error bit under 1KB chunk is 60bit. Almost 100% correctable rate for Hard-decision only. 330 MB/sec under 400MHz operation frequency. 1KB vs. 2KB under the same error condition 1KB/60bit 2KB/110bit Area (gate) 1A 1.8A TP (@400MHz) 330 MB/sec 851 MB/sec Power (W) 1A 2.15A Power Efficiency 1A MB/J 1.2A MB/J Area Efficiency (per gate) 1A B/s 1.4A B /s UBER=1e-15 RBER=1e-2 RBER=1.25e-2 10 SMI Confidential
Ideal Solution for TLC-based SSD LDPC 1K+124B, t=~120 RBER= 1e-2, UBER=1e-15 BCH 2KB+228B, t=121bit, RBER=3.5e-2, UBER=1e-15 P/E = 2379 BCH 1KB+124B, t=70bit, RBER=3.1e-2, UBER=1e-15 P/E = 2164 The 2KB BCH engine only has limited improvement on endurance and retention. LDPC ideally fits TLC and maximizes the endurance of TLC-based SSD solution. LDPC + RAID is the ONLY reliable solution for TLC-based SSD. 11
Q & A Disclaimer Notice Although efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided as is as of the date of this document and always subject to change. 12