路 論 Chapter 15 System-Level Physical Design



Similar documents
Topics of Chapter 5 Sequential Machines. Memory elements. Memory element terminology. Clock terminology

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

Lecture 7: Clocking of VLSI Systems

Sequential Circuit Design

Power Reduction Techniques in the SoC Clock Network. Clock Power

Timing Methodologies (cont d) Registers. Typical timing specifications. Synchronous System Model. Short Paths. System Clock Frequency

Clocking. Figure by MIT OCW Spring /18/05 L06 Clocks 1

Lecture 11: Sequential Circuit Design

Lecture 10: Sequential Circuits

PowerPC Microprocessor Clock Modes

Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 16 Timing and Clock Issues

PROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN

Chapter 7 Memory and Programmable Logic

Sequential Circuits. Combinational Circuits Outputs depend on the current inputs

NAME AND SURNAME. TIME: 1 hour 30 minutes 1/6

Two-Phase Clocking Scheme for Low-Power and High- Speed VLSI

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

S. Venkatesh, Mrs. T. Gowri, Department of ECE, GIT, GITAM University, Vishakhapatnam, India

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Latch Timing Parameters. Flip-flop Timing Parameters. Typical Clock System. Clocking Overhead

Computer Architecture

Flip-Flops, Registers, Counters, and a Simple Processor

EE411: Introduction to VLSI Design Course Syllabus

Sequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

Lecture 10 Sequential Circuit Design Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Clock Distribution in RNS-based VLSI Systems

DATA SHEET. HEF40193B MSI 4-bit up/down binary counter. For a complete data sheet, please also download: INTEGRATED CIRCUITS

A New Paradigm for Synchronous State Machine Design in Verilog

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

DESIGN CHALLENGES OF TECHNOLOGY SCALING

Digital Logic Design Sequential circuits

ETEC 2301 Programmable Logic Devices. Chapter 10 Counters. Shawnee State University Department of Industrial and Engineering Technologies

Sequential 4-bit Adder Design Report

Alpha CPU and Clock Design Evolution

Static-Noise-Margin Analysis of Conventional 6T SRAM Cell at 45nm Technology

RAM & ROM Based Digital Design. ECE 152A Winter 2012

Layout of Multiple Cells

Computer Systems Structure Main Memory Organization

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Counters. Present State Next State A B A B

DM74LS169A Synchronous 4-Bit Up/Down Binary Counter

Read-only memory Implementing logic with ROM Programmable logic devices Implementing logic with PLDs Static hazards

Latches, the D Flip-Flop & Counter Design. ECE 152A Winter 2012

Sequential Logic: Clocks, Registers, etc.

Chapter 9 Latches, Flip-Flops, and Timers

We r e going to play Final (exam) Jeopardy! "Answers:" "Questions:" - 1 -

Chapter 2 Logic Gates and Introduction to Computer Architecture

Low Power AMD Athlon 64 and AMD Opteron Processors

EE 42/100 Lecture 24: Latches and Flip Flops. Rev B 4/21/2010 (2:04 PM) Prof. Ali M. Niknejad

Fault Modeling. Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults. Transistor faults Summary

McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures

TIMING-DRIVEN PHYSICAL DESIGN FOR DIGITAL SYNCHRONOUS VLSI CIRCUITS USING RESONANT CLOCKING

CHAPTER 11 LATCHES AND FLIP-FLOPS

CSE140: Components and Design Techniques for Digital Systems

CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE

Introduction to Programmable Logic Devices. John Coughlan RAL Technology Department Detector & Electronics Division

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow

. MEDIUM SPEED OPERATION - 8MHz . MULTI-PACKAGE PARALLEL CLOCKING FOR HCC4029B HCF4029B PRESETTABLE UP/DOWN COUNTER BINARY OR BCD DECADE

PROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics

Modeling Latches and Flip-flops

LOW POWER DESIGN OF DIGITAL SYSTEMS USING ENERGY RECOVERY CLOCKING AND CLOCK GATING

Computer organization

ECE410 Design Project Spring 2008 Design and Characterization of a CMOS 8-bit Microprocessor Data Path

The components. E3: Digital electronics. Goals:

Module 3: Floyd, Digital Fundamental

Having read this workbook you should be able to: recognise the arrangement of NAND gates used to form an S-R flip-flop.

Master/Slave Flip Flops

NOTE: The Flatpak version has the same pinouts (Connection Diagram) as the Dual In-Line Package.

Chapter 5. Sequential Logic

LAB #4 Sequential Logic, Latches, Flip-Flops, Shift Registers, and Counters

Interconnection Networks

WEEK 8.1 Registers and Counters. ECE124 Digital Circuits and Systems Page 1

Lab #5: Design Example: Keypad Scanner and Encoder - Part 1 (120 pts)

On-chip clock error characterization for clock distribution system

Combinational Logic Design Process

HCC/HCF4032B HCC/HCF4038B

Design Example: Counters. Design Example: Counters. 3-Bit Binary Counter. 3-Bit Binary Counter. Other useful counters:

54191 DM54191 DM74191 Synchronous Up Down 4-Bit Binary Counter with Mode Control

AC : PRACTICAL DESIGN PROJECTS UTILIZING COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD)

Chapter 2 Clocks and Resets

Experiment # 9. Clock generator circuits & Counters. Eng. Waleed Y. Mousa

So far we have investigated combinational logic for which the output of the logic devices/circuits depends only on the present state of the inputs.

True Single Phase Clocking Flip-Flop Design using Multi Threshold CMOS Technique

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

ASYNCHRONOUS COUNTERS

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

Memory Elements. Combinational logic cannot remember

Obsolete Product(s) - Obsolete Product(s)

DM54161 DM74161 DM74163 Synchronous 4-Bit Counters

EE552. Advanced Logic Design and Switching Theory. Metastability. Ashirwad Bahukhandi. (Ashirwad Bahukhandi)

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Topics. Flip-flop-based sequential machines. Signals in flip-flop system. Flip-flop rules. Latch-based machines. Two-sided latch constraint

54LS169 DM54LS169A DM74LS169A Synchronous 4-Bit Up Down Binary Counter

Registers & Counters

Transcription:

Introduction to VLSI Circuits and Systems 路 論 Chapter 15 System-Level Physical Design Dept. of Electronic Engineering National Chin-Yi University of Technology Fall 2007

Outline Clocked Flip-flops CMOS Clocking Styles Pipelined Systems Clock Generation and Distribution System Design Considerations

Clocked Flip-flops Synchronous design employs clocking signals to coordinate the movement of data through the system In Figure 15.2, the data bit D is loaded into the DFF only on a rising clock edge Q t + t ff ) = D( ) (15.1) ( 0 t0 Where t o is the rise edge, and t ff is the time delay when the output has this value In high speed design, the limiting circuit factor is the DFF delay time t ff that is determined by the electronics and the load» Decreasing t ff allows for a higher frequency clock» The tradeoff of speed and power Figure 15.1 Ideal clocking signal Figure 15.2 Timing in a DFF

Classical State Machines (1/2) Two models for state machines that use single-clock timing are shown in Figure 15.3» Moore machine and Mealy machine» Huffman model (Figure 15.4): contains both the Moore and Mealy models (a) Moore machine (b) Mealy machine Figure 15.3 Moore and mealy state machines Figure 15.4 Huffman model of a state machine

Classical State Machines (2/2) FPGA design are also heavily based on classical state machine theory» For examples, in combinational logic, Individual gates, PLAs, Programming Logic Device (PLD), and groups of multiplexors» Programming is achieved with EPROMs, fuses, SRAM arrays, and some contain lookup tables (LUT, e.g. FPGA) to aid in the design In Figure 15.5 m r r T > t + t + t ff d su (15.2) (15.3) (The AND-plane of the PLA can be programmed to produce minterms m r ) (The clock T must be large enough to allow completion) Where, t ff is the delay time form input to output of the flip-flop t d is the logic delay time through the PLA t su is the setup time of the flip-flip Figure 15.5 Huffman state machine using PLA logic

Outline Clocked Flip-flops CMOS Clocking Styles Pipelined Systems Clock Generation and Distribution System Design Considerations

Clocked Logic Cascades Ideal waveforms for the complimentary signal φ φ = 0 (15.4) (Figure 15.6) V max = V DD V Tn (15.5) (Figure 15.7) (a) FETs (b) Transmission gate Figure 15.7 Clock-controlled transistors Figure 15.6 Complementary clocks Figure 15.8 A clocked cascade

Timing Circles and Clock Skew Timing circles: are simple constructs that can be useful for visualizing data transfer» In Figure 15.9, this defines what is known as a 50% duty cycle Clock skew: the timing of a clock is out of phase with the system reference Figure 15.10 Clock generation circuit Figure 15.11 Clock skew Figure 15.9 Timing circle for a single-clock, dual-phase cascade Figure 15.12 Timing circle with clock skew

Circuit Effects and Clock Frequency (1/2) The logic-level description of the clocked cascade masks the circuit characteristics that determine the ultimate speed T 2 min = t FET + t NOT (15.7) f 1 = T min = max 1 2 t h (15.12) T 2 f f I max max leak T 2 min 1 = T = t r + t min 1 = T min = C max = in t h, FET HL, NOT 1 = 2( t r + t, FET 1 = 2( t r + t dv dt in, FET HL, NOT CL ) (15.8) ) (15.11) (15.12) (15.9) (15.10) V M V = DD V Tp 1+ W κ n ' β n L n = β p W κ p ' L : β n κ n ' = β κ ' p p p + β β n p β n V β p Tn (15.14) (15.15) (15.13) Figure 15.13 Shift register circuit (a) Circuit (b) Voltage decay Figure 15.14 Charge leakage in the shift register

Circuit Effects and Clock Frequency (2/2) Figure 15.15 Clocking waveforms with finite rise and fall times Figure 15.16 Static shift register design

Dual Non-overlapping Clocks In this technique, two distinct non-overlapping clocks 1 and 2 are used such that (Fig. 15.17 and 15.18) φ t) φ ( t) 0 (15.17) 1 ( 2 = Finite-state machines that are based on dualclock schemes can provide powerful interactive capabilities (Fig. 15.19) Figure 15.18 Timing circuit for a 2-clock network Figure 15.17 Dual non-overlapping clocks Figure 15.19 A dual-clock finite-state machine design

Other Multiple-clock Schemes It is possible to create different mutlipleclock schemes to control clocked logic cascades and state machine» For example, a triple, non-overlapping clock set However, in modern high-speed VLSI, complicated clocking schemes introduce too many problems» Solution: speed gains are accomplished by improved circuit design, processing, and architectural modifications» The most popular approach is to use a single-clock, dual-phase system Figure 15.20 Triple, non-overlapping clock signals Figure 15.21 Timing circle for a 3-clock non-overlapping network

Dynamic Logic Cascades (1/2) Dynamic logic circuits achieve synchronized data flow by controlling the internal operational states of the logic gate circuits» Typical domino logic state: In section 9.5 of Chapter 9» P: per-charge phase» E: Evaluation phase In Figure 15.23, Figure 15.22 Operation of a domino logic state Figure 15.23 A dynamic logic cascade

Dynamic Logic Cascades (2/2) In Figure 15.24, the data transfer into and out of a dynamic logic cascade is sequenced with the clock» The number of stages that can be included in the chain is determined by the delay for the case where every stage switches» However, this introduces charge leakage problem» Solution: charge keeper circuits True Single-Phase Clock (TSPC): use only a single clock throughout» Two TSPC latches are shown in Figure 15.25 Figure 15.24 Timing sequence in the domino cascade (a) n-block (b) p-block Figure 15.25 True single-phase clock latches

Outline Clocked Flip-flops CMOS Clocking Styles Pipelined Systems Clock Generation and Distribution System Design Considerations

Basic Concept of Pipelining Pipelining is a tech. that is used to increase the throughput of a sequential set of distinct data inputs through a synchronous logic cascade T > t + t + t + t ff d su s (15.18) Figure 15.26 Basic pipelined stage for timing analysis t hold < PW (15.19) (PW: pulse width of the clock) f < t ff + t d 1 + t su + t s (15.20) Where, t ff : flip-flop delay time t d : logic delay time t su : setup time of the flip-flip t hold : hold time of the flip-flip Figure 15.27 Waveform quantities for timing analysis

Pipelining (1/2) Pipelined systems are designed to increase the overall throughput of a set of sequential input states by dividing the cascade into small visualization of the problem (Figure 15.28) Once a circuit completes a calculation and passes the result on to the next stage, it remain idle for the rest of the clock cycle (Figure 15.29) Figure 15.28 Logic chains in a clocked system Figure 15.29 Circuit activity in a logic cascade

Pipelining (2/2) Since the delay through a logic gate varies its complexity and parasitics, the logic propagation rate will not be uniform Figure 15.31 A 4-stage pipeline Figure 15.30 Progression times in the logic cascade In Figure 15.31, dividing the long logic into small groups, add registers between the sections, and use a faster clock, then most of the circuits will be active at any given time 4 T + ( N 1) T = ( N + 3) T T pipe pipe i > t ff + t su + td, i + ts, i+ 1 T pipe = max{ T 1,..., T m } pipe (15.21) (15.22) (15.23) Figure 15.32 Pipeline with positive edge and negative edge-triggering

Outline Clocked Flip-flops CMOS Clocking Styles Pipelined Systems Clock Generation and Distribution System Design Considerations

Clock Distribution When frequencies f reach the 1GHz (10 9 Hz) level corresponding to a clock period of 1 T = = 1 f ns However, distribution of the clocking signal to various points of the chip is complicated because the intrinsic RC time delay increases as the square of the line length l according to τ = 2 Bl (15.24) (15.25) Figure 15.31 A 4-stage pipeline t t 2 2 1 = B( l b la 2 2 2 = B( l c lb ) ) (15.26) (15.27) Problems: clock skew and signal distortion will be very difficult to deal with in large chips Figure 15.31 A 4-stage pipeline

Clock Stabilization and Generation (1/2) Figure 15.35 A basic clock stabilization network Figure 15.36 Phase-locked loop (PLL) stabilization circuit

Clock Stabilization and Generation (2/2) Figure 15.37 Inverter-based clock generation circuit C = C line + CG (15.28) Figure 15.39 Generating complementary clocks using a latch Figure 15.38 Skew minimization circuit Figure 15.40 Circuit for producing non-overlapping clocks

Clock Routing and Driver Trees (a) Clocking points (b) First grouping (c) Interior routing Figure 15.41 Simplified view of the clock routing problem (a) First (b) Second (c) Third (d) Routing Figure 15.42 Partitioning steps for defining clocking groups

Driver Tree (1/2) Figure 15.43 Geometrical analysis of the letter H (a) Driver tree (b) Application to H-tree Figure 15.45 Driver tree arrangement Figure 15.44 Macro-level H-type distribution tree Figure 15.46 Driver tree design with interconnect parasitics

Driver Tree (2/2) (a) Driver tree (b) Chip distribution Figure 15.47 Distribution scheme with equivalent driver segments Figure 15.49 Single driver tree with multiple outputs Figure 15.48 Electrical circuit for a non-symmetrical distribution Figure 15.50 Asynchronous system clocking Figure 15.51 Operation of a self-timed element

Outline Clocked Flip-flops CMOS Clocking Styles Pipelined Systems Clock Generation and Distribution System Design Considerations

Bit Slice Design (1/2) For example, an ALU circuit design (Fig. 15.52) C = C( A, B; control) (15.30) Bit slice design is based on the fact that logic deals with data at the bit-level Figure 15.52 An n-bit ALU Figure 15.53 Block description

Bit Slice Design (2/2) With the bit slice philosophy, an n-bit ALU is created by paralleling n identical slices as shown in Figure 15.55» The repetition will also appear at the logic, circuit, and silicon levels» Advantage: library and instance building for large system design For example, in Verilog Figure 15.54 An ALU bit slice reg A, B, C; (1-bit data-type) reg[ 31,0] A, B, C; (32-bit data-type) Figure 15.55 ALU design with bit slices

Cache Memory Cache memory is designed to be place in between the CPU and the main memory to speed up the system operation» SRAM memory structure» Instruction-cache (I-cache) : is used to hold instructions that are fetched from the M.M. where the program to run freely» Data-cache (D-cache): is used to hold the results of computations Figure 15.57 shows a simple block diagram for a dual-issue superscalar design using I- cache and D-cache (Computer Architecture) (a) Basic system (b) Cache modification Figure 15.56 Adding cache memory Figure 15.57 Block diagram of a dualissue superscalar machine

Systolic Systems and Parallel Processing In a systolic system, the movement of the data is controlled by a clock, moving one phase on each cycle In parallel processing, a PE (processor element) can be as simple as an AND gate, or as a complex as a general-purpose microprocessor (Fig. 15.58)» Individual process elements communicate via switching arrays, which are controlled by either local or global signals» The strongest aspects of VLSI: Wafer scale integration, WSI Figure 15.58 Regular patterning in a parallel processing network