Memory RWM NVRWM ROM Random Access Non-Random Access EPROM E 2 PROM Mask-Programmed Programmable (PROM) SRAM FIFO FLASH DRAM LIFO Shift Register CAM Memory Decoders M bits M bits N Words S 0 S 1 S 2 S N-2 S N_1 Word 0 Word 1 Word 2 Word N-2 Word N-1 Storage Cell A 0 A 1 A K-1 Decoder S 0 Word 0 Word 1 Word 2 Word N-2 Word N-1 Storage Cell Input-Output (M bits) Input-Output (M bits) N words => N select signals Too many select signals Decoder reduces # of select signals K = log 2 N 1
Array-Structured Memory Problem: ASPECT RATIO or HEIGHT >> WIDTH 2 L-K Bit Line Storage Cell A K A K+1 A L-1 Row Decoder Word Line Sense Amplifiers / Drivers M.2 K Amplify swing to rail-to-rail amplitude A 0 A K-1 Column Decoder Selects appropriate word Input-Output (M bits) Array Decoding 2
Hierarchical Memory Arrays Row Address Column Address Block Address Control Circuitry Block Selector Global Amplifier/Driver Global Data Bus I/O Advantages: 1. Shorter wires within blocks 2. Block address activates only 1 block => power savings Memory Timing Definitions Read Cycle READ Read Access Read Access Write Cycle WRITE Data Valid Write Access DATA Data Written 3
Memory Timing Approaches MSB LSB Address Bus Row Address Column Address RAS CAS Address Bus Address Address transition initiates memory operation RAS-CAS timing DRAM Timing Multiplexed Adressing SRAM Timing Self-timed Example: HM6264 8kx8 SRAM 4
HM6264 Interface Function Table 5
Timing Read Cycle 1 6
Read Cycle 1 85ns min 85ns max 85ns max 10ns min 85ns max 10ns min 45ns max 5ns min 30ns min 30ns min 30ns min 10ns min Read Cycle 2 7
Read Cycle 2 85ns max 10ns min 10ns min Write Timing 8
Write Cycle Write Cycle 85ns min 75ns min 0ns min 0ns min 75ns min 55ns min 0ns min, 30ns max 40ns min 0ns min 9
What Does All This Mean For a read: If you assert CS1, CS2, address, and OE all at the same time, it will be max 85ns before valid data are available at chip outputs For a write: You can assert CS1, CS2, address, data, and WE all at the same time if you want to You need to wait 55ns from WE edge, or 75ns from CS1/CS2 edge for write to have happened R/W Memories In General STATIC (SRAM) Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential DYNAMIC (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended 10
SRAM Circuits SRAM Cell, Transistors 11
SRAM, Resistive Pullups Array-Structured Memory Problem: ASPECT RATIO or HEIGHT >> WIDTH 2 L-K Bit Line Storage Cell A K A K+1 A L-1 Row Decoder Word Line Sense Amplifiers / Drivers M.2 K Amplify swing to rail-to-rail amplitude A 0 A K-1 Column Decoder Selects appropriate word Input-Output (M bits) 12
Memory Column Each column has all the support circuits Reading the Bit Single-ended read using an inverter Dynamic pre-charge on the bit lines P-types pull bit lines high 13
Reading the Bit 2 Single-ended read using an inverter Dynamic pre-charge on the bit lines Note the N-types used as pull-ups Reading the Bit 3 Differential read using sense amp Static N-type pullup on the bit lines 14
Read Waveforms Sense Amp 15
Sense Amp Transistors Column Organization 16
Write Circuits Write Circuit Simulation 0 17
Analog Sim, Circuit WL V DD M2 M4 M5 Q Q M6 M1 M3 BL BL Analog Analysis, Write WL V DD M4 M5 Q = 0 Q = 1 M6 M1 BL = 1 V DD BL = 0 k nm6 V, ( DD V Tn ) V 2 DD V ---------- ---------- DD k 2 8 pm4 V, ( DD V Tp ) V 2 DD V = ---------- ---------- DD 2 8 (W/L) n,m6 0.33 (W/L) p,m4 k ------------ nm5, V V 2 DD DD ---------- V 2 Tn ---------- 2 2 V 2 DD V k n, M1 ( V DD V Tn )---------- ----------- DD 2 8 = (W/L) n,m5 10 (W/L) n,m1 18
Analog Analysis, Read WL V DD BL M4 BL M5 Q = 0 Q = 1 M6 V DD M1 V DD V DD C bit C bit k nm5 ---------------, V V DD DD ----------- V 2 2 Tn ----------- 2 V DD V 2 = k 2 nm1, ( V DD V Tn )----------- ----------- DD 2 8 (W/L) n,m5 10 (W/L) n,m1 (supercedes read constraint) 6T SRAM Layout 19
Another 6T SRAM Layout SRAM bit from makemem (v1) 20
SRAM bit from makemem (v2) Array-Structured Memory Problem: ASPECT RATIO or HEIGHT >> WIDTH 2 L-K Bit Line Storage Cell A K A K+1 A L-1 Row Decoder Word Line Sense Amplifiers / Drivers M.2 K Amplify swing to rail-to-rail amplitude A 0 A K-1 Column Decoder Selects appropriate word Input-Output (M bits) 21
Row Decoders Select exactly one of the memory rows Simple versions are just gates Row Decoder Gates Standard gates Or, pseudo-nmos gates with static pull up Easier to make large fan-in NOR 22
Pre-decode Row Decoder Multiple levels of decoding can be more efficient layout Pre-decode Row Decoder Other circuit tricks for building row decoders 23
Array-Structured Memory Problem: ASPECT RATIO or HEIGHT >> WIDTH 2 L-K Bit Line Storage Cell A K A K+1 A L-1 Row Decoder Word Line Sense Amplifiers / Drivers M.2 K Amplify swing to rail-to-rail amplitude A 0 A K-1 Column Decoder Selects appropriate word Input-Output (M bits) Array-Structured Memory 24
Sharing Sense Amps Sense Amp Mux 25
Sense Amp Mux Decoded Column Decode 26
Improving Speed, Power Multi-Port Memory Very common to require multiple read ports Think about a register file, for example 27
Multi-Port Register Re1 Re0 Slightly larger cell, but with single-ended read makes a great register file Register File Slightly larger cell, but with single-ended read makes a great register file 28
Dynamic RAM Get rid of the pull-ups! Store info on capacitors Means that stored information leaks away Dynamic RAM Once you agree to use a capacitor for charge storage there are other ways to build this 29
3T DRAM Circuit BL1 BL2 WWL RWL WWL RWL M1 X M2 M3 X BL1 V DD V DD -V T C S BL2 V DD -V T ΔV No constraints on device ratios Reads are non-destructive Value stored at node X when writing a 1 = V WWL -V Tn 3T DRAM Layout BL2 BL1 GND RWL M3 M2 WWL M1 30
1 T DRAM Circuit 2-T (1-T) DRAM layout Note the increased gate size of the storage transistor Increases the capacitance 31
1T DRAM Observations 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out. DRAM memory cells are single ended in contrast to SRAM cells. The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation. Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design. When writing a 1 into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than V DD. WL 1T DRAM Read/Write BL WL Write "1" Read "1" M1 C S X GND V DD V T C BL BL V DD /2 V DD sensing V DD/2 Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance C S ΔV = V BL V PRE = ( V BIT V PRE )----------------------- C S + C BL Voltage swing is small; typically around 250 mv. 32
1T DRAM Cell Folded bit line Array of DRAM Cells Folded Bit Line 33
Reading a 1T DRAM Cell Charge Sharing DRAM Sense Amp 34
Photo of 1T DRAM Advanced DRAM Cells Try to get more capacitance per unit area Trench Capacitor 35
Examples of Advanced DRAMs Word line Insulating Layer Cell plate Capacitor dielectric layer Cell Plate Si Capacitor Insulator Storage Node Poly 2nd Field Oxide Refilling Poly Si Substrate Transfer gate Isolation Storage electrode Trench Cell Stacked-capacitor Cell Memory Timing Approaches MSB LSB Address Bus Row Address Column Address RAS CAS Address Bus Address Address transition initiates memory operation RAS-CAS timing DRAM Timing Multiplexed Adressing SRAM Timing Self-timed 36
DRAM Interface Extended Data Out Page Mode 37
Comments on Timing Architectural Issues 38
SDRAM - Use CAS for Bursts DDR SDRAM Double Data Rate 39
DRAM Timing RAMBUS DRAM (RDRAM) 40
RDRAM Bandwidth Maximum Bandwidth 41
Normal Bus for DRAM DIMMs RDRAM Bus 42
Deep Pipelining - High Latency RDRAM Addressing 43
Row Activate Command RDRAM System Arch 44
RDRAM Internal Arch Regular DRAM 45
Single Bank DRAM Multi-Bank DRAM 46
Peak Bandwidth ROM 47
ROM V DD Pull-up devices WL[0] WL[1] GND WL[2] WL[3] GND BL[0] BL[1] BL[2] BL[3] ROM 48
ROM ROM Layout Metal1 on top of diffusion WL[0] WL[1] Basic cell 10 λ x 7 λ GND (diffusion) Polysilicon Metal1 WL[2] 2 λ WL[3] Only 1 layer (contact mask) is used to program memory array Programming of the memory can be delayed to one of last process steps 49
ROM Layout Precharged ROM φpre V DD Precharge devices WL[0] WL[1] GND WL[2] WL[3] GND BL[0] BL[1] BL[2] BL[3] PMOS precharge device can be made as large as necessary, but clock driver becomes harder to design. 50
Precharged ROM Other Memory Cells 51
EPROM Non-Volatile ROM Erasable Programmable ROM EEPROM Electrically Erasable Programmable ROM Flash EEPROM Electrically Erasable Programmable ROM that is erased in large chunks All these devices rely on trapping charge on a floating gate Source Floating gate EPROM Gate Drain D t ox G t ox n + Substrate p n + S (a) Device cross-section (b) Schematic symbol 52
20 V Programming EPROM 0 V 5 V 20 V 10 V 5 V 5 V 0 V 2.5 V 5 V S D S D S D Avalanche injection. Removing programming voltage leaves charge trapped. Higher Vth (around 7v) means that 5v Vgs no longer turns on the transistor SiO2 is an excellent insulator Trapped charge can stay for years Programming results in higher V T. Erasing an EPROM Erase by shining UV light through window in the package UV radiation makes oxide slightly conductive Erasure is slow - from seconds to minutes depending on UV intensity Also the erase/program cycles are limited (around 1000), mainly as a result of the UV erasing But, EPROMs are simple and dense 53
Source Floating gate EEPROM Gate Drain I n + 20-30 nm Substrate p 10 nm (a) Flotox transistor Floating Gate Tunneling Oxide transistor WL n + V DD 10 V V GD 10 V (b) Fowler-Nordheim I-V characteristic BL (c) EEPROM cell during a read operation Thin oxide allows erasing in-system Fowler-Nordheim Tunneling EEPROM Two transistors instead of one The second keeps you from removing too much charge during erasure Bigger and not as dense as EPROM But, more erase/program cycles On the order of 10 5 Eventually you get permanently trapped charge in the SiO2 54
Flash EEPROM Control gate Floating gate erasure Thin tunneling oxide n + source programming n + drain p-substrate Essentially the same as EEPROM But, large regions erased at once Means you can monitor the voltages and don t need the extra access transistor Flash EEPROM 55
Realistic PROM Devices Content Addressable Mem Asks the question: Are there are any locations that hold this value? Used for tag memories in associative caches Or translation lookaside buffers Or other pattern matching applications 56
Content Addressable Mem Add the Match line Essentially a distributed NOR gate Content Addressable Mem 57
Programmable Logic Array Product Terms AND PLANE x 0 x 1 x 2 OR PLANE f 0 f 1 x 0 x 1 x 2 PLA Still useful for random combinational logic Standard cell ASIC tools may be replacing them They can generate dense AND-OR circuits 58
Pseudo-Static PLA Circuit GND GND GND GND V DD GND GND GND V DD x 0 x 0 x 1 x 1 x 2 x 2 f 0 f 1 AND-PLANE OR-PLANE φ AND Dynamic PLA GND V DD φ OR V DD φ AND x 0 x 0 x 1 x 1 x 2 x 2 AND-PLANE f 0 f 1 GND OR-PLANE φ OR 59
PLA Layout V DD And-Plane Or-Plane φ GND x 0 x 0 x 1 x 1 x 2 x 2 Pull-up devices f 0 f 1 Pull-up devices PLA vs. ROM Programmable Logic Array structured approach to random logic two level logic implementation NOR-NOR (product of sums) NAND-NAND (sum of products) IDENTICAL TO ROM! Main difference ROM: fully populated PLA: one element per minterm Note: Importance of PLA s has drastically reduced 1. slow 2. better software techniques (mutli-level logic synthesis) 60
FPGAs Field Programmable Gate Arrays Array of P-type and N-type transistors Sources and drains connected to Power and ground Metal Map gate structures to sea of gates Less expensive only modify metal masks 61