Lecture 12: MOS Decoders, Gate Sizing

Similar documents
Lecture 5: Gate Logic Logic Optimization

ECE410 Design Project Spring 2008 Design and Characterization of a CMOS 8-bit Microprocessor Data Path

Memory Basics. SRAM/DRAM Basics

Semiconductor Memories

e.g. τ = 12 ps in 180nm, 40 ps in 0.6 µm Delay has two components where, f = Effort Delay (stage effort)= gh p =Parasitic Delay

Layout of Multiple Cells

Static-Noise-Margin Analysis of Conventional 6T SRAM Cell at 45nm Technology

Lecture 5: Logical Effort

Contents. Overview Memory Compilers Selection Guide

RAM & ROM Based Digital Design. ECE 152A Winter 2012

Gate Delay Model. Estimating Delays. Effort Delay. Gate Delay. Computing Logical Effort. Logical Effort

Module 7 : I/O PADs Lecture 33 : I/O PADs

Lecture 7: Clocking of VLSI Systems

Chapter 9 Semiconductor Memories. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Optimization and Comparison of 4-Stage Inverter, 2-i/p NAND Gate, 2-i/p NOR Gate Driving Standard Load By Using Logical Effort

Layout and Cross-section of an inverter. Lecture 5. Layout Design. Electric Handles Objects. Layout & Fabrication. A V i

CS250 VLSI Systems Design Lecture 8: Memory

INSTITUTE OF AERONAUTICAL ENGINEERING Dundigal, Hyderabad

Module 4 : Propagation Delays in MOS Lecture 22 : Logical Effort Calculation of few Basic Logic Circuits

CMOS Binary Full Adder

University of Texas at Dallas. Department of Electrical Engineering. EEDG Application Specific Integrated Circuit Design

CSE140 Homework #7 - Solution

Alpha CPU and Clock Design Evolution

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

1. Memory technology & Hierarchy

Sequential 4-bit Adder Design Report

CHAPTER 11: Flip Flops

EE 42/100 Lecture 24: Latches and Flip Flops. Rev B 4/21/2010 (2:04 PM) Prof. Ali M. Niknejad

Chapter 5 :: Memory and Logic Arrays

3D NAND Technology Implications to Enterprise Storage Applications

Introduction to Digital System Design

Winbond W971GG6JB-25 1 Gbit DDR2 SDRAM 65 nm CMOS DRAM Process

Lab 3 Layout Using Virtuoso Layout XL (VXL)

Gates & Boolean Algebra. Boolean Operators. Combinational Logic. Introduction

Class 18: Memories-DRAMs

NAME AND SURNAME. TIME: 1 hour 30 minutes 1/6

Memory. The memory types currently in common usage are:

Mixed Logic A B A B. 1. Ignore all bubbles on logic gates and inverters. This means

CHAPTER 16 MEMORY CIRCUITS

Memory unit. 2 k words. n bits per word

A N. O N Output/Input-output connection

How To Design A Chip Layout

Low Power AMD Athlon 64 and AMD Opteron Processors

The components. E3: Digital electronics. Goals:

Pass Gate Logic An alternative to implementing complex logic is to realize it using a logic network of pass transistors (switches).

NTE2053 Integrated Circuit 8 Bit MPU Compatible A/D Converter

Memory Systems. Static Random Access Memory (SRAM) Cell

Class 11: Transmission Gates, Latches

Gates, Circuits, and Boolean Algebra

GETTING STARTED WITH PROGRAMMABLE LOGIC DEVICES, THE 16V8 AND 20V8

Let s put together a Manual Processor

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Computer Architecture

Fairchild Solutions for 133MHz Buffered Memory Modules

Digital to Analog Converter. Raghu Tumati

Sequential Circuit Design

CADENCE LAYOUT TUTORIAL

Document Contents Introduction Layout Extraction with Parasitic Capacitances Timing Analysis DC Analysis

Chapter 10 Advanced CMOS Circuits

1-Mbit (128K x 8) Static RAM

CMOS Thyristor Based Low Frequency Ring Oscillator

Lecture 30: Biasing MOSFET Amplifiers. MOSFET Current Mirrors.

Design of Energy Efficient Low Power Full Adder using Supply Voltage Gating

Reconfigurable ECO Cells for Timing Closure and IR Drop Minimization. TingTing Hwang Tsing Hua University, Hsin-Chu

Adder.PPT(10/1/2009) 5.1. Lecture 13. Adder Circuits

Lecture 10: Latch and Flip-Flop Design. Outline

MICROPROCESSOR AND MICROCOMPUTER BASICS

A New Paradigm for Synchronous State Machine Design in Verilog

StarRC Custom: Next-Generation Modeling and Extraction Solution for Custom IC Designs

Analog & Digital Electronics Course No: PH-218

MM74HC4538 Dual Retriggerable Monostable Multivibrator

ESP-CV Custom Design Formal Equivalence Checking Based on Symbolic Simulation

Systematic Design for a Successive Approximation ADC

Lecture Notes 3 Introduction to Image Sensors

Advanced VLSI Design CMOS Processing Technology

EE 209 Lab 1 Sound the Alarm

Latch Timing Parameters. Flip-flop Timing Parameters. Typical Clock System. Clocking Overhead

CHAPTER 7: The CPU and Memory

Chapter 7 Memory and Programmable Logic

IL2225 Physical Design

CS311 Lecture: Sequential Circuits

Field-Effect (FET) transistors

Sequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )

Digital Systems Ribbon Cables I CMPE 650. Ribbon Cables A ribbon cable is any cable having multiple conductors bound together in a flat, wide strip.

1.1 Silicon on Insulator a brief Introduction

CAD TOOLS FOR VLSI. FLOORPLANNING Page 1 FLOORPLANNING

Digital Logic Elements, Clock, and Memory Elements

Here we introduced (1) basic circuit for logic and (2)recent nano-devices, and presented (3) some practical issues on nano-devices.

Title : Analog Circuit for Sound Localization Applications

ICS379. Quad PLL with VCXO Quick Turn Clock. Description. Features. Block Diagram

Chapter 6 TRANSISTOR-TRANSISTOR LOGIC. 3-emitter transistor.

Interfacing 3V and 5V applications

Fabrication and Manufacturing (Basics) Batch processes

CACTI 3.0: An Integrated Cache Timing, Power, and Area Model

EEC 118 Lecture #17: Implementation Strategies, Manufacturability, and Testing

ICS514 LOCO PLL CLOCK GENERATOR. Description. Features. Block Diagram DATASHEET

Lecture 24. Inductance and Switching Power Supplies (how your solar charger voltage converter works)

Multiplexers Two Types + Verilog

DM Segment Decoder/Driver/Latch with Constant Current Source Outputs

Transcription:

Lecture 12: MOS Decoders, Gate Sizing MAH, AEN EE271 Lecture 12 1 Memory Reading W&E 8.3.1-8.3.2 - Memory Design Introduction Memories are one of the most useful VLSI building blocks. One reason for their utility is that memory arrays can be extremely dense. This density results from their very regular wiring. Memories come in many different types (RAM, ROM, EEPROM) and there are many different types of cells, but the basic idea and organization is pretty similar. We will look at the most common memory cell that is used today, a 6T sram cell, and then look at the other components needed to build complete memory system. We will also look at other types of memories. MAH, AEN EE271 Lecture 12 2

Peripheral Circuits decoder mux We need to build the decoder and wordline drive circuits, and the column select and bitline drive circuits. For both we need to build a decoder -- something to select the correct line. Lets look at building decoders for CMOS memories. MAH, AEN EE271 Lecture 12 3 Decoders A decoder is just a structure that contains a number of AND gates, where each gate is enabled for a different input value. For a n-bit to 2 n decoder, we need to build 2 n, n-input AND gates. And we want to build these AND gates so they layout nicely (in a regular way) MAH, AEN EE271 Lecture 12 4

Large Fanin AND Gates In CMOS building this type of gate causes a problem, since large fanin implies a series stack. We will see a little later in the notes that the best way to do this is to use a two-level decoder by predecoding the inputs. In nmos the problem was easy, large fanin NOR gates work well. So a collection of NOR gates solves the problem very nicely. MAH, AEN EE271 Lecture 12 5 CMOS Decoders In CMOS, a large fanin gate implies a series stack. So we need to build a decoder that does not use a large fanin gate. But how? Use a 2-level decoder. An n-bit decoder requires 2n wires A0, A0, A1, A1, Each gate is an n bit NOR (NAND gate) Could predecode the inputs Send,,,, A2 A3 Instead of A0, A0, A1, A1, Maps 4 wires into 4 wires that need to go to the decoder Reduces the number of inputs to the decode gate by a factor of two. MAH, AEN EE271 Lecture 12 6

Predecode Example A1 A1 A0 A0 2 Bit Predecode No Predecode MAH, AEN EE271 Lecture 12 7 Predecode Predecode is just like what we did when we needed to make a single six input AND gate. Did it in a few levels: predecode decode gate One can do a 2 input predecode, or a 3 input predecode A 2 input predecoder generates 4 outputs A 3 input predecoder generates 8 outputs The difference with standard logic is that we need to decode all possible inputs. This means that each predecode gate can be reused by many final decode gates. A little planning can yield a regular layout. MAH, AEN EE271 Lecture 12 8

Predecode A predecoded decoder: A 0 A 1 A 2 A 3 A 4 A 5 MAH, AEN EE271 Lecture 12 9 Layout Issues Often we need to build large array structures (for example we need a large RAM), so we want to layout the decoder in as little space as possible. We need to find a good way to layout this structure. Clearly we need to run the address lines through each decoder cell, and stack the decoder cells next to each other. MAH, AEN EE271 Lecture 12 10

Predecode Layout The output of the predecode gate need to drive the address lines. These address lines are usually high capacitance So usually it is better to use a NAND with an inverter buffer as the predecode cells. Cells can be placed on top of the address lines, or to the left of the address lines. predecode cells decode cells MAH, AEN EE271 Lecture 12 11 Decoder Cell Layout Need to have n and p transistors Need to take up minimum space Want it to be easy to program the cell While layout is regular each cell is different It connects to a different set of inputs Look at a couple of layout styles MAH, AEN EE271 Lecture 12 12

Decoder Layout Cell Area is proportional to n 2. Decoder area is n 3. A 0 Gnd Vdd A 0 A 1 A 1 A 2 A 2 The problem with this layout is that most of the space is wasted. All of the area under the wires is wasted. We should rotate the gate to fit under the wires. MAH, AEN EE271 Lecture 12 13 A Slightly Better Decoder Layout Better cell design (like we have talked about) Out1 Out0 A 0 A 0 A 1 A 1 A 2 A 2 Vdd Gnd In this layout, the basic cell remains unchanged, it is the wire contacts that are programmed. This is sometimes a good idea, since it lets you optimize the decode cell (in this case the 3 input gate) MAH, AEN EE271 Lecture 12 14

A Smaller Layout Leave space for all the tracks in the cell Address lines in M2/Poly Vdd Out1 Gnd Out0 A 0 A 0 A 1 A 1 A 2 A 2 Need to program the decoder by placing transistors, or metal. With predecode, you have more tracks per transistor. MAH, AEN EE271 Lecture 12 15 Wordline Driver Decoder is just part of the wordline drive circuit Also need to qualify the wordline (AND with clock) Also need to buffer the signal to drive WL cap Clock qualification can be done in the decoder A0 An Phi1 - just another input to the decoder Usually not a great idea, since this can lead to large skew Clock AND is usually done in last stage before driver decode_s1 can be large devices wordline_q1 Φ1 or use normal NAND gate MAH, AEN EE271 Lecture 12 16

Thin Drivers Wordline pitch of memory cell is not that tight (about 40λ), but not that large either. There are some memories (ROMs, drams) with much tighter pitch. For many of these applications you need thin gates and drivers. The minimum useful space is 16λ Decoder is here In Out 16λ Gnd Vdd Contacts can be shared For the wordline driver, I might use two of these drivers in parallel, to reduce the horizontal length (effectively fold the transistors again) MAH, AEN EE271 Lecture 12 17 Putting it Together Floorplan for a memory Bit Line Precharge Φ1 Memory Array Row Decode Mem Mem Drv Drv Decoder Decoder Predecoder Column Mux Bit IO 2:1 Mux & Bit IO R/W Address Built using Array constructs in Magic Decoder base is often array, with programming done by software Memory is built by arraying a cell that contains the cell and its mirror MAH, AEN EE271 Lecture 12 18

Transistor Sizing For memories (and other structures) you end up with long high cap wires Need to drive these large capacitors quickly, and this sets the device size We will look at chain of inverters first, and then think about gates Factors to consider in gate sizing: Need to think about the load you are driving Need to think about the load you present to your predecessor Why transistor sizes matter when you are driving a large capacitance 13ns falling 26ns rising min 4λ:2λ 2pF (10mm of metal2) MAH, AEN EE271 Lecture 12 19 Buffer (or Gate) Sizing But bigger gates have bigger input capacitance too: Delay = 4ns - falling 8ns - rising Delay = 0.3ns min 400-p 200-n 2pF Clearly we need to make the predriver larger too. Is there an optimal solution? Yes, in a way Minimize delay of chain - for the minimum all delays will match (why?) 1 f f 2 f 3 Equalizing delay principle applies to any critical path through gates. MAH, AEN EE271 Lecture 12 20

Buffer Chains to drive a Big Load Each gate drives a gate that is f times as large as it is. Assume that Wp = 2 Wn, so rise and fall times are the same 1. Final load cap is C L since the fanout (C L /C in ) of each gate is the same f N * C in = C L N = ln(c L /C in )/ln(f) Most books assume that the delay of a gate is: = f T gate = f * delay of a gate with a fanout of 1 Total delay = N f T gate = ln(c L /C in ) T gate f / ln(f) 4 3 2 1 0 has a minimum at e, but it is pretty flat MAH, AEN EE271 Lecture 12 21 Analysis is Slightly Wrong The optimal point of e assumes no wire delay and that the gate delay is Gate Delay = f T gate But this is not true, since the delay of a gate with no load is not zero. Each gate has some intrinsic delay from its own diffusion capacitance Gate Delay = T 0 + f T g = (f + α) T g.5 < α < 2 Total delay = N f T gate = ln(c L /C in ) T gate (f + α) / ln(f) 7 6 5 4 3 2 α = 1 α = 0 1 0 So, really should use f about 4 between stages. MAH, AEN EE271 Lecture 12 22

Gates Two issues: What about things other than inverters? Principle still holds. What happens when the wire load is not negligible? In general it is a little complicated General rule still holds - You should not be able to make any transistor bigger and decrease the delay (cost to your predecessor should equal your gain) For gates, you want to keep the loaded delays roughly equal This is not the same as keeping the fanout the same A NOR gate has a larger delay than an INV at the same fanout This difference is sometimes called logical effort (more later) MAH, AEN EE271 Lecture 12 23 Gate Sizing For a fixed number of stages: If any of the delays are not equal, Make the gate with the largest delay larger. Decreases its delay, and increases its predecessor s delay. But since its delay started larger, there will be a net win. Optimal is when the delays are equal For design, you don t want tons of SPICE or irsim simulations. What you really want is a spreadsheet or a program that solves the sizing problem as a linear optimization problem where each size is a dimension of the solution space. MAH, AEN EE271 Lecture 12 24

Wire Cap For fastest systems, want the wire capacitance to be small compared to gate capacitance But this leads to very large transistors. Compromise is to try to keep ratio from 30% to 70% wire For a standard cell library how big should the transistors be? Want the delay to have some tolerance to placements. Implies that the wire capacitance should be a small fraction of total Long wires are probably millimeters (.2pF) So, transistors should be pretty large (10-20x minimum size) OK, since transistors are the free things that fit under wires. Trend is toward larger transistors. Stick layout diagrams should show transistor widths. In industry, you don t default to minimum size transistors, you should default to 5-10x minimum size. MAH, AEN EE271 Lecture 12 25 + Wire Resistance Previous slides ignored wire resistance For short wires this is ok (R wire «R trans ) As wire gets longer R wire gets larger R trans gets smaller (larger transistor to drive larger capacitance) Can become an issue Wire resistance delay is proportional to length 2, Capacitance of wire is proportional to length Resistance is proportional to length too Sometimes add repeaters to reduce the total wire delay. Break the quadratic increase, but adds buffer delay MAH, AEN EE271 Lecture 12 26

+ Long Wires For long wires, we can separate the optimal wire into three regions: Buffer-Up/Down from input Optimal Repeater Distance Buffer-Up/Down to Load Optimal Repeater Size In the middle region, there is an optimal spacing that minimizes the delay Added buffer delay is matched by reduced wire delay Can use RC model to find optimal length, and repeater size The chain at the start and end buffer up or down from the driver and load to this natural size. This optimum distance is about 4 to 7mm in typical 0.5µ fabs. MAH, AEN EE271 Lecture 12 27 General Rules of Thumb (for speed) Try to keep the fanout of all gates to be less than 5 Try to keep the delays of the gates in a critical path roughly the same. Large fanin gates should have smaller fanout Keep fanin limited Often need short buffer chains (one inverter) Be flexible on sense of logic (push inversions around) Don t use minimum size transistors, unless you know the wire is short MAH, AEN EE271 Lecture 12 28