DRAM Circuit and Architecture Basics

Similar documents
Memory unit. 2 k words. n bits per word

Computer Architecture

1. Memory technology & Hierarchy

A N. O N Output/Input-output connection

SOLVING HIGH-SPEED MEMORY INTERFACE CHALLENGES WITH LOW-COST FPGAS

Intel X38 Express Chipset Memory Technology and Configuration Guide

User s Manual HOW TO USE DDR SDRAM

Features. DDR SODIMM Product Datasheet. Rev. 1.0 Oct. 2011

Technical Note DDR2 Offers New Features and Functionality

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

Table 1: Address Table

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

1.55V DDR2 SDRAM FBDIMM

Table 1 SDR to DDR Quick Reference

Memory Basics. SRAM/DRAM Basics

ADQYF1A08. DDR2-1066G(CL6) 240-Pin O.C. U-DIMM 1GB (128M x 64-bits)

Chapter 5 :: Memory and Logic Arrays

Random Access Memory (RAM) Types of RAM. RAM Random Access Memory Jamie Tees SDRAM. Micro-DIMM SO-DIMM

Switch Fabric Implementation Using Shared Memory

RAM & ROM Based Digital Design. ECE 152A Winter 2012

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

Memory Testing. Memory testing.1

Computer Architecture

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

DDR SDRAM UDIMM MT16VDDT6464A 512MB MT16VDDT12864A 1GB MT16VDDT25664A 2GB

DDR SDRAM SODIMM. MT8VDDT3264H 256MB 1 MT8VDDT6464H 512MB For component data sheets, refer to Micron s Web site:

Technical Note. Initialization Sequence for DDR SDRAM. Introduction. Initializing DDR SDRAM

Memory Systems. Static Random Access Memory (SRAM) Cell

Mobile SDRAM. MT48H16M16LF 4 Meg x 16 x 4 banks MT48H8M32LF 2 Meg x 32 x 4 banks

GR2DR4B-EXXX/YYY/LP 1GB & 2GB DDR2 REGISTERED DIMMs (LOW PROFILE)

DDR2 SDRAM SODIMM MT16HTF12864H 1GB MT16HTF25664H 2GB

Understanding Memory TYPES OF MEMORY

RAM. Overview DRAM. What RAM means? DRAM

V58C2512(804/404/164)SB HIGH PERFORMANCE 512 Mbit DDR SDRAM 4 BANKS X 16Mbit X 8 (804) 4 BANKS X 32Mbit X 4 (404) 4 BANKS X 8Mbit X 16 (164)

DDR SDRAM SODIMM. MT9VDDT1672H 128MB 1 MT9VDDT3272H 256MB MT9VDDT6472H 512MB For component data sheets, refer to Micron s Web site:

DDR2 SDRAM SODIMM MT8HTF3264HD 256MB MT8HTF6464HD 512MB MT8HTF12864HD 1GB For component data sheets, refer to Micron s Web site:

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

Memory. The memory types currently in common usage are:

DQ0 DQ1 DQ2 DQ3 DQ4 DQ5 DQ6 DQ7 WE# RAS# A0 A1 A2 A3

DDR2 SDRAM SODIMM MT8HTF6464HDZ 512MB MT8HTF12864HDZ 1GB. Features. 512MB, 1GB (x64, DR) 200-Pin DDR2 SODIMM. Features

Chapter 1 Computer System Overview

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

CS250 VLSI Systems Design Lecture 8: Memory

Highlights of the High- Bandwidth Memory (HBM) Standard

Tuning DDR4 for Power and Performance. Mike Micheletti Product Manager Teledyne LeCroy

DDR4 Memory Technology on HP Z Workstations

Configuring and using DDR3 memory with HP ProLiant Gen8 Servers

DDR SDRAM SODIMM MT16VDDF6464H 512MB MT16VDDF12864H 1GB

Tuning DDR4 for Power and Performance. Mike Micheletti Product Manager Teledyne LeCroy

CHAPTER 11: Flip Flops

Low Power AMD Athlon 64 and AMD Opteron Processors

Random-Access Memory (RAM) The Memory Hierarchy. SRAM vs DRAM Summary. Conventional DRAM Organization. Page 1

Note: Data Rate (MT/s) CL = 3 CL = 4 CL = 5 CL = 6. t RCD (ns) t RP (ns) t RC (ns) t RFC (ns)

Static-Noise-Margin Analysis of Conventional 6T SRAM Cell at 45nm Technology

GETTING STARTED WITH PROGRAMMABLE LOGIC DEVICES, THE 16V8 AND 20V8

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

DDR subsystem: Enhancing System Reliability and Yield

Computer Systems Structure Main Memory Organization

361 Computer Architecture Lecture 14: Cache Memory

Features. DDR3 SODIMM Product Specification. Rev. 1.7 Feb. 2016

DDR2 SDRAM SODIMM MT8HTF12864HZ 1GB MT8HTF25664HZ 2GB. Features. 1GB, 2GB (x64, SR) 200-Pin DDR2 SDRAM SODIMM. Features

Configuring Memory on the HP Business Desktop dx5150

What Every Programmer Should Know About Memory

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

1 Gbit, 2 Gbit, 4 Gbit, 3 V SLC NAND Flash For Embedded

DDR2 SDRAM SODIMM MT4HTF6464HZ 512MB. Features. 512MB (x64, SR) 200-Pin DDR2 SODIMM. Features. Figure 1: 200-Pin SODIMM (MO-224 R/C C)

Motivation: Smartphone Market

With respect to the way of data access we can classify memories as:

DDR2 SDRAM UDIMM MT18HTF6472AY 512MB MT18HTF12872AY 1GB MT18HTF25672AY 2GB MT18HTF51272AY 4GB. Features

are un-buffered 200-Pin Double Data Rate (DDR) Synchronous DRAM Small Outline Dual In-Line Memory Module (SO-DIMM). All devices

DDR2 SDRAM UDIMM MT16HTF6464AY 512MB MT16HTF12864AY 1GB MT16HTF25664AY 2GB MT16HTF51264AY 4GB. Features

What is a bus? A Bus is: Advantages of Buses. Disadvantage of Buses. Master versus Slave. The General Organization of a Bus

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

DDR2 SDRAM FBDIMM MT36HTF25672F 2GB MT36HTF51272F 4GB. Features. 2GB, 4GB (x72, DR) 240-Pin DDR2 SDRAM FBDIMM. Features

Lecture 5: Gate Logic Logic Optimization

DDR3 memory technology

DDR2 SDRAM FBDIMM MT18HTF12872FD 1GB MT18HTF25672FD 2GB. Features. 1GB, 2GB (x72, DR) 240-Pin DDR2 SDRAM FBDIMM. Features

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

White Paper Utilizing Leveling Techniques in DDR3 SDRAM Memory Interfaces

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Vcc DQ1 DQ2 DQ3 DQ4 Vcc DQ5 DQ6 DQ7 DQ8 NC. NC NC WE# RAS# NC NC A0 A1 A2 A3 Vcc

Memory technology evolution: an overview of system memory technologies

AN10935 Using SDR/DDR SDRAM memories with LPC32xx

Basic Computer Organization

Byte Ordering of Multibyte Data Items

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Semiconductor Memories

Technical Bulletin. Arista LANZ Overview. Overview

Wiki Lab Book. This week is practice for wiki usage during the project.

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May ILP Execution

DDR3 DIMM Slot Interposer

Multi-Threading Performance on Commodity Multi-Core Processors

A 10,000 Frames/s 0.18 µm CMOS Digital Pixel Sensor with Pixel-Level Memory

Finite State Machine. RTL Hardware Design by P. Chu. Chapter 10 1

Transcription:

DRAM Circuit and Architecture Basics Overview Terminology Access Protocol Architecture Word Line Storage element (capacitor) Bit Line Switching element

DRAM Circuit Basics DRAM Cell Word Line Bit Line Storage element (capacitor) In/Out Buffers DRAM Column Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array

, Bitlines and Wordlines DRAM Circuit Basics Defined Bit Lines Word Line of DRAM Size: 8 Kb @ 256 Mb SDRAM node 4 Kb @ 256 Mb RDRAM node

DRAM Circuit Basics Sense Amplifier I 1 2 4 5 Sense and Amplify 3 6 6 s shown

DRAM Circuit Basics Sense Amplifier II : Precharged precharged to V cc /2 1 2 4 5 Sense and Amplify 3 6 V cc (logic 1) Gnd (logic 0) V cc /2

DRAM Circuit Basics Sense Amplifier III : Destructive Read 1 2 4 5 Sense and Amplify 3 6 Wordline Driven V cc (logic 1) Gnd (logic 0) V cc /2

DRAM Access Protocol ROW ACCESS DRAM Column Decoder CPU BUS MEMORY CONTROLLER AKA: OPEN a DRAM Page/ or ACT (Activate a DRAM Page/) or RAS ( Strobe) In/Out Buffers Decoder... Word Lines... Sense Amps... Bit Lines... Memory Array

once the data is valid on ALL of the bit lines, you can select a subset of the bits and send them to the output buffers... CAS picks one of the bits big point: cannot do another RAS or precharge of the lines until finished reading the column data... can t change the values on the bit lines or the output of the sense amps until it has been read by the memory controller DRAM Circuit Basics Column Defined Column: Smallest addressable quantity of DRAM on chip SDRAM*: column size == chip data bus width (4, 8,16, 32) RDRAM: column size!= chip data bus width (128 bit fixed) SDRAM*: get n columns per access. n = (1, 2, 4, 8) RDRAM: get 1 column per access. 4 bit wide columns #0 #1 #2 #3 #4 #5 One of DRAM * SDRAM means SDRAM and variants. i.e. DDR SDRAM

DRAM Access Protocol COLUMN ACCESS I DRAM Column Decoder CPU BUS MEMORY CONTROLLER READ Command or CAS: Column Strobe In/Out Buffers Decoder... Word Lines... Sense Amps... Bit Lines... Memory Array

DRAM Access Protocol Column Access II then the data is valid on the data bus... depending on what you are using for in/out buffers, you might be able to overlap a litttle or a lot of the data transfer with the next CAS to the same page (this is PAGE MODE) CPU BUS Out MEMORY CONTROLLER In/Out Buffers Decoder... Word Lines... Column Decoder Sense Amps... Bit Lines... Memory Array DRAM... with optional additional CAS: Column Strobe note: page mode enables overlap with CAS

DRAM Speed Part I NOTE How fast can I move data from DRAM cell to sense amp? Column Decoder DRAM CPU BUS t RCD MEMORY CONTROLLER In/Out Buffers Decoder... Word Lines... Sense Amps... Bit Lines... Memory Array RCD ( Command Delay)

DRAM Speed Part II How fast can I get data out of sense amps back into memory controller? t CAS aka t CASL aka t CL CPU BUS MEMORY CONTROLLER CAS: Column Strobe CASL: Column Strobe Latency CL: Column Strobe Latency In/Out Buffers Decoder... Word Lines... Column Decoder Sense Amps... Bit Lines... Memory Array DRAM

DRAM Speed Part III How fast can I move data from DRAM cell into memory controller? DRAM Column Decoder CPU BUS MEMORY CONTROLLER t RAC = t RCD + t CAS In/Out Buffers Decoder... Word Lines... Sense Amps... Bit Lines... Memory Array RAC (Random Access Delay)

DRAM Speed Part IV How fast can I precharge DRAM array so I can engage another RAS? DRAM Column Decoder CPU t RP BUS MEMORY CONTROLLER In/Out Buffers Decoder... Word Lines... Sense Amps... Bit Lines... Memory Array RP ( Precharge Delay)

DRAM Speed Part V How fast can I read from different rows? DRAM Column Decoder CPU BUS MEMORY CONTROLLER t RC = t RAS + t RP In/Out Buffers Decoder... Word Lines... Sense Amps... Bit Lines... Memory Array RC ( Cycle Time)

DRAM Speed Summary I What do I care about? t RCD t CAS t RP t RC = t RAS + t RP t RAC = t RCD + t CAS Seen in ads. Easy to explain Easy to sell Embedded systems designers DRAM manufactuers Computer Architect: Latency bound code i.e. linked list traversal RAS: Strobe CAS: Column Strobe RCD: Command Delay RAC :Random Access Delay RP : Precharge Delay RC : Cycle Time

DRAM Speed Summary II DRAM Type PC133 SDRAM Frequency Bus Width (per chip) Peak Bandwidth (per Chip) Random Access Time (t RAC ) Cycle Time (t RC ) 133 16 200 MB/s 45 ns 60 ns DDR 266 133 * 2 16 532 MB/s 45 ns 60 ns PC800 RDRAM 400 * 2 16 1.6 GB/s 60 ns 70 ns FCRAM 200 * 2 16 0.8 GB/s 25 ns 25 ns RLDRAM 300 * 2 32 2.4 GB/s 25 ns 25 ns DRAM is slow But doesn t have to be t RC < 10ns achievable Higher die cost Not adopted in standard Not commodity Expensive

DRAM latency isn t deterministic because of CAS or RAS+CAS, and there may be significant queuing delays within the CPU and the memory controller Each transaction has some overhead. Some types of overhead cannot be pipelined. This means that in general, longer bursts are more efficient. DRAM latency CPU A F B Mem Controller C D E 1 DRAM E 2 /E 3 A: Transaction request may be delayed in Queue B: Transaction request sent to Memory Controller C: Transaction converted to Command Sequences (may be queued) D: Command/s Sent to DRAM E 1 : Requires only a CAS or E 2 : Requires RAS + CAS or E 3: Requires PRE + RAS + CAS F: Transaction sent back to CPU DRAM Latency = A + B + C + D + E + F

DRAM Architecture Basics PHYSICAL ORGANIZATION NOTE x2 DRAM Column Decoder x4 DRAM Column Decoder x8 DRAM Column Decoder Buffers Sense Amps... Bit Lines... Buffers Sense Amps... Bit Lines... Buffers Sense Amps... Bit Lines... Decoder.... Memory Array Decoder.... Memory Array Decoder.... Memory Array x2 DRAM x4 DRAM x8 DRAM This is per bank Typical DRAMs have 2+ banks

let s look at the interface another way.. the say the data sheets portray it. [explain] main point: the RAS\ and CAS\ signals directly control the latches that hold the row and column addresses... DRAM Architecture Basics Read Timing for Conventional DRAM RAS CAS Access Column Access Transfer Column Column DQ out out

DRAM Evolutionary Tree since DRAM s inception, there have been a stream of changes to the design, from FPM to EDO to Burst EDO to SDRAM. the changes are largely structural modifications -- nimor -- that target THROUGHPUT. [discuss FPM up to SDRAM Everything up to and including SDRAM has been relatively inexpensive, especially when considering the pay-off (FPM was essentially free, EDO cost a latch, PBEDO cost a counter, SDRAM cost a slight re-design). however, we re run out of free ideas, and now all changes are considered expensive... thus there is no consensus on new directions and myriad of choices has appeared [ do LATENCY mods starting with ESDRAM... and then the INTERFACE mods ] Conventional DRAM (Mostly) Structural Modifications Targeting Throughput FPM EDO P/BEDO SDRAM ESDRAM Interface Modifications Targeting Throughput.............. MOSYS FCRAM $ VCDRAM Structural Modifications Targeting Latency Rambus, DDR/2 Future Trends

DRAM Evolution NOTE Read Timing for Conventional DRAM Access Column Access Transfer Overlap RAS Transfer CAS Column Column DQ out out

FPM aallows you to keep th esense amps actuve for multiple CAS commands... much better throughput problem: cannot latch a new value in the column address buffer until the read-out of the data is complete DRAM Evolution Read Timing for Fast Page Mode RAS Access Column Access Transfer Overlap Transfer CAS Column Column Column DQ out out out

solution to that problem -- instead of simple tri-state buffers, use a latch as well. by putting a latch after the column mux, the next column address command can begin sooner DRAM Evolution Read Timing for Extended Out RAS Access Column Access Transfer Overlap Transfer CAS Column Column Column DQ out out out

by driving the col-addr latch from an internal counter rather than an external signal, the minimum cycle time for driving the output bus was reduced by roughly 30% DRAM Evolution Read Timing for Burst EDO RAS Access Column Access Transfer Overlap Transfer CAS Column DQ

pipeline refers to the setting up of the read pipeline... first CAS\ toggle latches the column address, all following CAS\ toggles drive data out onto the bus. therefore data stops coming when the memory controller stops toggling CAS\ DRAM Evolution Read Timing for Pipeline Burst EDO RAS Access Column Access Transfer Overlap Transfer CAS Column DQ

main benefit: frees up the CPU or memory controller from having to control the DRAM s internal latches directly... the controller/cpu can go off and do other things during the idle cycles instead of wait... even though the time-to-first-word latency actually gets worse, the scheme increases system throughput DRAM Evolution Read Timing for Synchronous DRAM Clock RAS CAS Command Access Column Access Transfer Overlap Transfer ACT READ Col DQ (RAS + CAS + OE... == Command Bus)

output latch on EDO allowed you to start CAS sooner for next accesss (to same row) DRAM Evolution Inter- Read Timing for ESDRAM Regular CAS-2 SDRAM, R/R to same bank Clock Command latch whole row in ESDRAM -- allows you to start precharge & RAS sooner for thee next page access -- HIDE THE PRECHARGE OVERHEAD. ACT READ Col PRE Bank ACT READ Col DQ ESDRAM, R/R to same bank Clock Command ACT READ PRE ACT READ Col Bank Col DQ

neat feature of this type of buffering: write-around DRAM Evolution Write-Around in ESDRAM Regular CAS-2 SDRAM, R/W/R to same bank, rows 0/1/0 Clock Command ACT READ PRE ACT WRITE PRE ACT READ Col Bank Col Bank Col DQ ESDRAM, R/W/R to same bank, rows 0/1/0 Clock Command ACT READ PRE ACT WRITE READ Col Bank Col Col DQ (can second READ be this aggressive?)

main thing... it is like having a bunch of open row buffers (a la rambus), but the problem is that you must deal with the cache directly (move into and out of it), not the DRAM banks... adds an extra couple of cycles of latency... however, you get good bandwidth if the data you want is cache, and you can prefetch into cache ahead of when you want it... originally targetted at reducing latency, now that SDRAM is CAS-2 and RCD-2, this make sense only in a throughput way DRAM Evolution Internal Structure of Virtual Channel Bank B Bank A 2Kb Segment 2Kb Segment 2Kb Segment 2Kbit 16 Channels (segments) # DQs Input/Output Buffer DQs $ 2Kb Segment Decoder Sense Amps Sel/Dec Activate Prefetch Restore Read Write Segment cache is software-managed, reduces energy

FCRAM opts to break up the data array.. only activate a portion of the word line DRAM Evolution Internal Structure of Fast Cycle RAM SDRAM FCRAM 8K rows requires 13 bits tto select... FCRAM uses 15 (assuming the array is 8k x 1k... the data sheet does not specify) 13 bits Decoder Decoder 8M Array 8M Array 15 bits (8Kr x 1Kb) (?) Sense Amps Sense Amps t RCD = 15ns (two clocks) t RCD = 5ns (one clock) Reduces access time and energy/access

MoSys takes this one step further... DRAM with an SRAM interface & speed but DRAM energy [physical partitioning: 72 banks] auto refresh -- how to do this transparently? the logic moves tthrough the arrays, refreshing them when not active. but what is one bank gets repeated access for a long duration? all other banks will be refreshed, but that one will not. solution: they have a bank-sized CACHE of lines... in theory, should never have a problem (magic) DRAM Evolution Internal Structure of MoSys 1T-SRAM addr Bank Select Auto Refresh.............. $ DQs