A New Chapter for NAND System Designs Using ClearNAND Memory




Jim Cooke, Senior Technical Marketing Manager, Micron Technology, Inc.
December 27, 2010

ECC Trends and Complexities

ECC trends have been on the rise since NAND was first introduced. Although it's not a new issue, the ECC required to support newer multilevel cell (MLC) and three-bit-per-cell NAND technologies is becoming increasingly difficult for system designers to keep up with. ECC has historically been used to improve the overall data reliability of NAND subsystems. However, as NAND cells shrink, fewer electrons are stored per floating gate. To compensate for the increasing bit error rates of these smaller-geometry cells, ECC requirements have to dramatically increase to maintain the desired system reliability.

As system requirements for ECC increase, the number of gates required to implement the ECC logic also increases, as does the system complexity. For example, 24 bits of ECC requires about 200,000 gates, while 40 bits of ECC requires about 300,000 gates. It is estimated that in the future, advanced ECC algorithms will approach close to 1 million gates. See Figure 1.

Figure 1: ECC Increases as Process Geometries Shrink (number of gates to implement MLC-2 ECC, and ECC bits required, versus process geometry in nm)

Many high-performance systems require multiple channels of NAND to reach the desired performance. In these systems, each channel typically has its own ECC logic. For example, a 10-channel SSD may have 10 channels of ECC logic implemented. If 60 bits of ECC were required for each of the 10 channels, the result would be 3 million gates just for the ECC logic.
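The gate-count scaling above reduces to simple arithmetic, sketched below. The per-engine figures are the approximate values quoted in the text and should be treated as illustrative, not as a sizing formula:

```python
# Per-engine gate counts quoted in the text (approximate, illustrative).
GATES_24_BIT_ECC = 200_000
GATES_40_BIT_ECC = 300_000

def total_ecc_gates(channels: int, gates_per_engine: int) -> int:
    # Each NAND channel carries its own ECC engine, so the total gate
    # count scales linearly with the channel count.
    return channels * gates_per_engine

# The 10-channel SSD example: ~300,000 gates per channel of 60-bit ECC
# comes to 3 million gates devoted to ECC alone.
print(total_ecc_gates(10, 300_000))  # 3000000
```

The linear scaling is the point: every added channel, and every strengthening of the per-channel code, multiplies the silicon spent on ECC rather than adding to it.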

ClearNAND Interface Choices

Fully Managed Solutions

Fully managed NAND interfaces are typically the simplest to interface with. Examples include eMMC memory or eSD, which use a traditional multimedia card or SD card interface. These multichip packages (MCPs) include a controller that implements the desired interface, as well as the ECC and block management required by the NAND. An advantage of fully managed solutions is that they require the least amount of effort from the host's perspective. The host software uses a straightforward block-level interface, and the controller within the MCP is responsible for the ECC, block management, and wear leveling.

The disadvantage is that they're single-threaded and based on a simple command, data, and acknowledge protocol, which limits performance. In addition, the processor used in these MCPs is typically a small 8-bit processor, which also impacts performance. These implementations are typically not very deterministic, because the internal controller can be doing block management or garbage collection activities in the background; consequently, they can be exposed to unexpected power loss. With regard to performance, sequential READ/WRITE operations provide reasonable bandwidth for large data transfers, while random READ/WRITE operations can yield lower performance for smaller data transfers.

Legacy NAND Interface

The NAND interface has traditionally been an asynchronous interface. Although interface speeds have improved up to 50 MHz in recent years, not much else has changed on this interface. Several years back, Micron and several other forward-thinking companies joined together to form a standards organization focused on simplifying the myriad of NAND timing and command specifications offered by the industry. The Open NAND Flash Interface (ONFI) organization developed the first version of its specification, ONFI 1.0. While there are many advantages to the original ONFI 1.0 specification, one of the biggest is the ability for the host to electronically detect the type of NAND device that is connected, as well as other important parameters like timing modes, page size, block size, and ECC requirements. This feature has been carried forward to all of the ONFI specifications and remains an important aspect of all ONFI standards.

Another significant accomplishment of the ONFI organization was the development of the synchronous NAND interface, also known as ONFI 2.0. ONFI 2.2 currently supports up to 200 megatransfers per second (200 MT/s) using a DDR, source-synchronous interface that remains backward compatible; that is, after powering up, the device can be used in asynchronous mode. However, for higher performance, the host interrogates the device to see if it is able to support the higher-speed synchronous interface before changing to it.

Direct NAND Solutions

Implementations that connect NAND directly to the host processor make the host responsible for managing the NAND. Hardware manages the ECC, while software typically performs all block management and wear-leveling operations. At first this may seem like a disadvantage. However, with today's typical embedded processors running at speeds of hundreds of megahertz and often over one gigahertz, these high-performance processors can accomplish block management much faster and can take advantage of deterministic, multithreading techniques to improve performance. In addition, with the host managing the device directly, the host software can make real-time decisions that help eliminate exposure to unexpected power failures.

As shown in Figure 2, the ONFI 2.2 specification (200 MT/s) was designed to accommodate up to 16 standard NAND loads. A typical implementation would use two 8-die packages. The standard 8-die, 100-ball BGA package includes two separate buses (DQ[7:0]1 and DQ[7:0]2), with each bus having four die wired together. Each of the four-die stacks is controlled with two chip enables. A typical design would wire the two data (DQ) buses together, forming a single 8-bit data bus for each package. A maximum configuration would consist of two 100-ball BGA packages, each containing eight die. Each of these standard 100-ball BGA packages requires four chip enables (CE#) to select a specific die; thus, the system controller needs to supply eight chip enables to support this configuration.
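The ONFI parameter-page discovery mentioned above can be sketched in host software. The parser below is a minimal illustration: the field offsets follow the ONFI 1.0 parameter-page layout as commonly documented (signature at byte 0, data bytes per page at byte 80, pages per block at byte 92, ECC correctability at byte 112), but they should be verified against the ONFI revision your device actually reports, and the example buffer is synthetic, not read from real hardware:

```python
import struct

def parse_onfi_parameter_page(buf: bytes) -> dict:
    # Decode a few fields of an ONFI parameter page, as returned by the
    # READ PARAMETER PAGE (ECh) command. Offsets per ONFI 1.0 as commonly
    # documented -- verify against the spec revision your device reports.
    if buf[0:4] != b"ONFI":
        raise ValueError("ONFI parameter page signature not found")
    page_bytes, = struct.unpack_from("<I", buf, 80)      # data bytes per page
    pages_per_block, = struct.unpack_from("<I", buf, 92)
    ecc_bits = buf[112]                                  # ECC correctability
    return {
        "page_bytes": page_bytes,
        "pages_per_block": pages_per_block,
        "ecc_bits": ecc_bits,
    }

# Synthetic example buffer (not captured from a real device):
pp = bytearray(256)
pp[0:4] = b"ONFI"
struct.pack_into("<I", pp, 80, 8192)   # 8KB page
struct.pack_into("<I", pp, 92, 128)    # 128 pages per block
pp[112] = 24                           # 24-bit ECC per codeword
print(parse_onfi_parameter_page(bytes(pp)))
```

This electronic self-description is what frees the host from hard-coding per-part timing and geometry tables.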

Figure 2: Typical High-Capacity Implementation Using Two 8-Die, 100-Ball BGA Packages (system controller driving Control & CE1-2 / DQ[7:0]1 and Control & CE3-4 / DQ[7:0]2 on each package)

ClearNAND Solutions

Figure 3 shows two system implementations: a traditional system where the processor interfaces directly with raw NAND, and a system using ClearNAND. Both implementations use the same ONFI hardware interface and similar 100-ball BGA ballouts. The ClearNAND example includes a thin controller packaged with the NAND in an MCP. The controller implements the ECC required by the NAND inside the MCP package. Utilizing the same ONFI asynchronous or synchronous interface enables designers to migrate easily from standard NAND to ClearNAND.

Micron offers two versions of ClearNAND: Standard and Enhanced. Standard ClearNAND, suggested mainly for consumer devices, implements the required ECC and provides a traditional asynchronous ONFI bus for easy migration. Enhanced ClearNAND manages ECC in addition to offering several performance-enabling features that are of most value to enterprise applications. It also supports both the asynchronous and synchronous versions of the ONFI 2.2 interface and is available in densities up to 64GB. By abstracting the ECC, both versions of ClearNAND will be able to handle the additional ECC that future generations of NAND will require. This eliminates the need for designers to continually redesign their ECC circuitry to keep up with NAND manufacturers' latest ECC requirements.

Enhanced ClearNAND

Figure 4 shows the Enhanced ClearNAND architecture. It supports a single ONFI 2.2 interface with a command, address, and data bus running at up to 200 MT/s. The V_DDI decoupling capacitor is common in eMMC products and other devices that include a controller; it is required to decouple the internal voltage regulator. For backward compatibility with standard NAND devices, the V_DDI connection is located on an unused pin. The controller supports two internal NAND buses, one for even logical unit numbers (LUNs) and one for odd LUNs. These two independent buses can each operate at up to 166 MT/s. In addition, each bus has its own ECC engine and can manage simultaneous READ or WRITE operations on the two buses. It is envisioned that future versions of the controller will support the ONFI 3.0 specification, which is targeting up to 400 MT/s.
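The bus arithmetic behind this architecture is worth making explicit. On an 8-bit bus, 1 MT/s moves one byte per microsecond, so a quick sketch (illustrative, using the rates stated above) shows why two internal buses can service the front-side bus and internal housekeeping at the same time:

```python
FRONT_BUS_MTS = 200      # ONFI 2.2 front-side bus
INTERNAL_BUS_MTS = 166   # each internal NAND bus

def internal_bus_for_lun(lun: int) -> int:
    # Even LUNs sit on internal bus 0, odd LUNs on internal bus 1,
    # per the even/odd split described in the text.
    return lun % 2

# Aggregate internal bandwidth exceeds the front-side bus, so the two
# internal buses can keep the external bus fed while also carrying
# internal copyback traffic.
print(2 * INTERNAL_BUS_MTS, ">", FRONT_BUS_MTS)  # 332 > 200
```

The even/odd LUN split means two independent transfers can be in flight as long as they target LUNs of opposite parity.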

Figure 3: Standard NAND vs. ClearNAND (a system using raw NAND, where the host implements ECC and the block management driver on the NAND bus, versus a system using ClearNAND, where the host implements only the block management driver; both share the same ONFI bus, and a controller plus NAND in a similar BGA-100 package forms the ClearNAND MCP device)

Figure 4: Enhanced ClearNAND Architecture (ONFI 2.2 front-side bus, up to 200 MT/s, async/sync, with V_DDI and CE# pins; two internal NAND buses, 0 and 1, each up to 166 MT/s, serving even and odd LUNs; BGA-100, 14 x 18mm)

Table 1: Enhanced ClearNAND Features and Benefits

Feature: ONFI-compatible controller and NAND in a standard package (14 x 18mm; 100-ball BGA)
Benefits: 1. Footprint compatible with raw NAND; 2. Manages and abstracts ECC; 3. Improves endurance (future versions); 4. Provides a platform for future enhancements

Feature: Single 200 MT/s front-side bus
Benefits: 1. Reduces loading on the bus; 2. Allows more NAND per channel; 3. Improves signal integrity

Feature: Dual 166 MT/s internal NAND buses
Benefits: 1. Enables parallel operations; 2. Local copyback offloads wear-leveling operations

Many of these basic features are covered in the following discussion of the four advanced functions offered by Enhanced ClearNAND: volume addressing, electronic data mirroring, interrupt, and internal copyback.

Volume Addressing

Volume addressing allows a single chip select or chip enable (CE#) to address up to 16 volumes. Each ClearNAND controller can support up to eight NAND die packaged in an MCP, and the controller provides a buffer for system controller access. The Enhanced ClearNAND design, shown in Figure 5, offers an eightfold improvement in density while maintaining or improving signal integrity and reducing the number of active chip enables required. This is because a single controller presents only one load to the system controller but supports up to eight NAND die in the MCP package.

There are two aspects to the volume addressing concept. The first is establishing the volume address for each of the packages. The volume address is appointed only once at initialization and is maintained until the power is cycled. The second aspect is the VOLUME SELECT command itself. This is a new command that is followed by a single-byte (actually only 4-bit) volume address. Once the intended volume is selected, it remains selected until another volume is selected.

The chip enable pin savings can be significant. For example, with a standard NAND implementation, a single channel requires eight chip enables to control two 8-die packages. A 32-channel example would require a total of 256 chip enable pins, whereas the Enhanced ClearNAND volume addressing feature can address the same amount of NAND using only 32 chip enable pins. What's more, these same 32 chip enable pins can be used to address eight times the density.

Electronic Data Mirroring

Enhanced ClearNAND supports electronic data mirroring, which allows the data bus signal order to be electronically remapped to one of two configurations. This is particularly useful for high-density designs with devices mounted on both the top and bottom of the PCB. The package is able to electronically detect whether it is mounted on the top or bottom of the PCB. This is accomplished using a specific initialization or reset sequence. For example, it's common practice to issue a RESET (FFh) command to the device on power-up. To accomplish the electronic DQ mirroring, the host must follow this FFh command with the traditional READ STATUS (70h) command. The top package detects this command sequence as FFh-70h; the bottom package recognizes the same sequence as FFh-0Eh, establishing that it is the bottom package, and reorders its data bus to align directly under the top package. This not only improves PCB routing but also improves signal integrity.
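The FFh-70h versus FFh-0Eh distinction falls out of the data bus being bit-reversed as seen by the mirrored package: FFh looks the same either way, while 70h does not. A quick sketch (the helper name is ours, not Micron's):

```python
def mirror_dq(byte: int) -> int:
    # Reverse the 8 DQ lines (DQ0<->DQ7, DQ1<->DQ6, ...), as seen by a
    # package whose data bus is mirrored on the bottom of the PCB.
    return int(f"{byte:08b}"[::-1], 2)

# RESET (FFh) is bit-symmetric, so both packages decode it identically;
# READ STATUS (70h) arrives at the mirrored package as 0Eh, which is how
# the bottom package identifies itself.
print(hex(mirror_dq(0xFF)), hex(mirror_dq(0x70)))  # 0xff 0xe
```

Choosing a detection sequence whose first byte is palindromic and whose second byte is not is what lets a single power-up sequence serve both mounting orientations.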

Figure 5: Typical High-Capacity Implementation Using Eight-Die, 100-Ball BGA ClearNAND Packages (system controller driving a shared Control & CE1 / DQ[7:0]1 bus to packages #1 through #16)

Ready/Busy# Redefined to Interrupt

Figure 6A shows the READY/BUSY# operation of standard NAND. The READY/BUSY# function is an open-drain signal. The standard 8-die, 100-ball BGA package includes four ready/busy (R/B#) pins. Like the chip enables, each of these pins is connected to two individual die; referring to Figure 6A, R/B#1 would be connected to Die 1 and Die 2. Because signal connections are at a premium, designers will either not use the R/B# signals or, if they do, they will wire-OR all of the R/B# signals from one package (shown at the bottom of Figure 6A). This is not ideal, because any single die that is busy pulls the entire R/B# signal LOW, preventing die that become ready from being detected. If the implementation does not use the R/B# signals, it has the option of using firmware to poll the status of each of the die in the package, which is time-consuming and wastes power. Neither case is ideal.

Enhanced ClearNAND redefines the existing R/B# pin as an interrupt pin. As shown in Figure 6B, the interrupt# signal, which is still open-drain, provides a real-time interrupt when a volume or LUN becomes ready. Designers can use this signal to provide real-time status to the system controller. In larger configurations supporting multiple packages on a single bus, the interrupt# signals can be wired together. When the system controller detects an interrupt, it can simply interrogate each of the packages or volumes to learn which volume has posted the new status. The interrupt# operation saves signals on the system controller while improving the system controller's ability to respond to status updates.

Internal Copyback

The last but perhaps most noteworthy feature of Enhanced ClearNAND is the internal COPYBACK function, also known as INTERNAL DATA MOVE. This function can provide a significant advantage in SSD systems when it comes to wear-leveling or garbage collection operations; that is, the process of collecting fragments of data scattered throughout the various pages and blocks of the NAND and coalescing them into a more streamlined block or sequence of blocks. It is similar to the old hard disk defragmentation utility.

Referring back to Figure 2, when using standard NAND, moving data fragments from one block to another typically requires the following sequence of operations:

1. The system controller issues a READ command and source address to access the source page of data.
2. The system controller inputs data from the device while calculating ECC and making any necessary corrections. Any updating of data or metadata is usually accomplished after this step.
3. The system controller calculates and appends the new ECC information before issuing a new PROGRAM command, destination address, and data sequence, which will store the data in the new block.

In this sequential operation, the NAND bus is busy while moving the source and destination data from and to the device. The timing of these operations can be significant. An ONFI 2.2 synchronous bus operating at 200 MT/s would require about 41µs to move the data, assuming an 8KB page. Because the data has to be moved from and to the device, we double this time to 82µs, which doesn't include the ECC overhead. While this sequence is being carried out, the ONFI bus is busy and cannot be used for other operations.

Enhanced ClearNAND is different in that it supports internal ECC. This built-in ECC enables the COPYBACK operation to be performed entirely inside the Enhanced ClearNAND package, assuming the source and destination of the data are within the same package. The system controller is still responsible for issuing the commands and addresses, as well as any modified data or metadata, but the data movement is handled by the ClearNAND controller and does not tie up the external ONFI bus. If the system controller is able to keep its wear-leveling and garbage collection operations within a single package, it can gain significant performance advantages.

Figure 7 shows an example using Enhanced ClearNAND on two ONFI channels, labeled Channel 0 and Channel 1. On both SSD channels, four INTERNAL DATA MOVE operations are occurring simultaneously without the external ONFI bus being used for the data movement. This frees the system controller and ONFI bus to move data from one package to another, if necessary. Depending on the architecture, some percentage of these operations may need to go between packages or even between ONFI buses. Taking advantage of the INTERNAL DATA MOVE operation can provide a significant performance improvement for garbage collection and wear-leveling operations.

Figure 6A: Ready/Busy# Operation (R/B# wire-OR'd across Die 1 through Die 8)
Figure 6B: Interrupt# Operation (interrupt# shared across Die 1 through Die 8)
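The ~41µs and ~82µs copyback figures quoted above can be reproduced directly, assuming an 8-bit bus (one byte per transfer) and ignoring command and ECC overhead:

```python
PAGE_BYTES = 8192          # 8KB page
BUS_MTS = 200_000_000      # 200 MT/s; one byte per transfer on an 8-bit bus

# Time to stream one page across the ONFI bus, in microseconds.
one_way_us = PAGE_BYTES / BUS_MTS * 1e6

# A block-to-block move reads the page out and writes it back in,
# occupying the bus for both transfers.
round_trip_us = 2 * one_way_us

print(round(one_way_us, 2), round(round_trip_us, 2))  # 40.96 81.92
```

Internal copyback removes both transfers from the external bus entirely, which is why keeping garbage collection within a single package pays off.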

Conclusion

Micron's Enhanced ClearNAND provides additional performance and features while eliminating the impact of NAND's ever-increasing ECC requirements. Because Enhanced ClearNAND supports a ballout similar to standard NAND 100-ball BGA devices, it's possible to design your product to support both. An example would be to include enough ECC in your system controller to support SLC NAND directly and to select Enhanced ClearNAND for your multilevel cell needs, where ECC can present more of a challenge.

The volume addressing feature of Enhanced ClearNAND enables higher densities to be addressed using fewer chip enables, saving potentially hundreds of pins in high-capacity implementations. Electronic data mirroring simplifies PCB design and routing while improving the signal integrity of the ONFI bus. The intelligent interrupt capability provides real-time status updates to the system controller and minimizes status polling by firmware. The dual internal buses provide improved COPYBACK operations, which in turn improve performance.

Figure 7: Copyback Using Enhanced ClearNAND (Channel 0 and Channel 1)

micron.com
Products are warranted only to meet Micron's production data sheet specifications. Products and specifications are subject to change without notice. Micron, the Micron logo, and ClearNAND are trademarks of Micron Technology, Inc. All other trademarks are the property of their respective owners. ©2010 Micron Technology, Inc. All rights reserved. 12/27/10 EN