Sequential Circuit Design Lan-Da Van ( 倫 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2009 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/
Outlines Introduction Sequencing Methods Latches and Flip-Flops Sequential System Design Conclusion Lan-Da Van VLSI-06-2
Sequential Machines
Use memory elements to make primary output values depend on state as well as primary inputs.
Varieties: in a Mealy machine, outputs are a function of present state and inputs; in a Moore machine, outputs depend only on state.
The machine computes next state N and primary outputs O from current state S and primary inputs I.
Next-state function: $N = \delta(I, S)$.
Output function (Mealy): $O = \lambda(I, S)$.
Duty cycle: fraction of the clock period for which the clock is active (e.g., for an active-low clock, the fraction of time the clock is 0).
FSM Structure
Sequencing Elements
Latch: level sensitive. Also called transparent latch or D latch.
Flip-flop: edge triggered. Also called master-slave flip-flop, D flip-flop, or D register.
[Figure: latch and flop symbols with inputs D and clk and output Q; timing diagrams of clk, D, Q (latch), and Q (flop) contrasting transparent and edge-triggered behavior]
Memory Elements
Store a value as controlled by one or more control inputs, such as clock, load, or set-reset (S-R).
In CMOS, memory is created by capacitance (dynamic) or by feedback (static).
Storage elements:
  Latch: transparent while the internal memory is being set from the input.
  Flip-flop: not transparent; reading the input and changing the output are separate events.
Memory Categories
Memory Arrays
  Random Access Memory
    Read/Write Memory (RAM) (Volatile)
      Static RAM (SRAM)
      Dynamic RAM (DRAM)
    Read Only Memory (ROM) (Nonvolatile)
      Mask ROM
      Programmable ROM (PROM)
      Erasable Programmable ROM (EPROM)
      Electrically Erasable Programmable ROM (EEPROM)
      Flash ROM
  Serial Access Memory
    Shift Registers
      Serial In Parallel Out (SIPO)
      Parallel In Serial Out (PISO)
    Queues
      First In First Out (FIFO)
      Last In First Out (LIFO)
  Content Addressable Memory (CAM)
Setup & Hold Times
Setup time: time before the clock event during which the data input must be stable.
Hold time: time after the clock event for which the data input must remain stable.
[Figure: clock and data waveforms]
Sequencing Methods
Three methods: flip-flops, 2-phase transparent latches, and pulsed latches.
[Figure: pipeline and clock waveforms for each method. Flip-flops: clock clk with period T_c, combinational logic between two flops. 2-phase transparent latches: clocks φ1 and φ2 separated by t_nonoverlap, combinational logic split into half-cycles of T_c/2. Pulsed latches: pulse clock φp of width t_pw, combinational logic between two latches.]
Timing Diagrams
Contamination and propagation delays:
t_pd     Logic propagation delay
t_cd     Logic contamination delay
t_pcq    Latch/flop clk-to-Q propagation delay
t_ccq    Latch/flop clk-to-Q contamination delay
t_pdq    Latch D-to-Q propagation delay
t_cdq    Latch D-to-Q contamination delay
t_setup  Latch/flop setup time
t_hold   Latch/flop hold time
[Figure: timing diagrams for combinational logic (A to Y: t_cd, t_pd), a flop (clk, D, Q: t_setup, t_hold, t_ccq, t_pcq), and a latch (clk, D, Q: t_setup, t_hold, t_ccq, t_pcq, t_cdq, t_pdq)]
Max-Delay: Flip-Flops
$t_{pd} \le T_c - \underbrace{(t_{setup} + t_{pcq})}_{\text{sequencing overhead}}$
[Figure: flops F1 and F2 clocked by clk with combinational logic between Q1 and D2; waveforms showing t_pcq, t_pd, and t_setup within T_c]
Max-Delay Example (1/2)
Suppose the registers are built from flip-flops with a setup time of 62 ps, a hold time of -10 ps, a propagation delay of 90 ps, and a contamination delay of 75 ps.
Max-Delay Example (2/2)
$T_c \ge t_{pcq} + t_{pd} + t_{setup}$
$t_{pd} = 590 + 60 + 100 + 80 + 100 + 70 = 1000$ ps
$T_c \ge 90 + 1000 + 62 = 1152$ ps
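The arithmetic on this slide can be reproduced with a minimal sketch. The function name `min_cycle_time_flop` and the list name `stage_delays` are illustrative choices, not part of the slides; the numbers are the ones given in the example.

```python
def min_cycle_time_flop(t_pcq, t_pd, t_setup):
    """Smallest T_c (ps) that satisfies T_c >= t_pcq + t_pd + t_setup."""
    return t_pcq + t_pd + t_setup

# Stage delays along the path, as listed on the slide (ps).
stage_delays = [590, 60, 100, 80, 100, 70]
t_pd = sum(stage_delays)

t_c = min_cycle_time_flop(t_pcq=90, t_pd=t_pd, t_setup=62)
print(t_pd, t_c)  # 1000 1152
```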
Max Delay: 2-Phase Latches
$t_{pd} = t_{pd1} + t_{pd2} \le T_c - \underbrace{(2\,t_{pdq})}_{\text{sequencing overhead}}$
[Figure: latches L1, L2, L3 clocked by φ1, φ2, φ1 with Combinational Logic 1 between Q1 and D2 and Combinational Logic 2 between Q2 and D3; waveforms showing t_pdq1, t_pd1, t_pdq2, and t_pd2 within T_c]
Max Delay: Pulsed Latches
$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw})}_{\text{sequencing overhead}}$
[Figure: pulsed latches L1 and L2 with combinational logic between them]
Max-Delay Example
Re-compute the ALU self-bypass path cycle time if the flip-flop is replaced with a pulsed latch. The pulsed latch has a pulse width of 150 ps, a setup time of 40 ps, a hold time of 5 ps, a clk-to-Q propagation delay of 82 ps and contamination delay of 52 ps, and a D-to-Q propagation delay of 92 ps.
Solution:
$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw})}_{\text{sequencing overhead}}$
$T_c \ge \max(92 + 1000,\ 82 + 1000 + 40 - 150) = 1092$ ps
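As a rough check of the solution above, the pulsed-latch constraint can be evaluated directly. The function name `min_cycle_time_pulsed_latch` is an illustrative assumption; the inputs are the values stated in the example, with t_pd = 1000 ps for the ALU self-bypass path.

```python
def min_cycle_time_pulsed_latch(t_pd, t_pdq, t_pcq, t_setup, t_pw):
    """Smallest T_c (ps) with T_c >= t_pd + max(t_pdq, t_pcq + t_setup - t_pw)."""
    sequencing_overhead = max(t_pdq, t_pcq + t_setup - t_pw)
    return t_pd + sequencing_overhead

t_c = min_cycle_time_pulsed_latch(t_pd=1000, t_pdq=92, t_pcq=82,
                                  t_setup=40, t_pw=150)
print(t_c)  # 1092
```

Here the D-to-Q term dominates, because the pulse is wide enough to hide the setup time.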
Min-Delay: Flip-Flops
$t_{cd} \ge t_{hold} - t_{ccq}$
[Figure: flops F1 and F2 clocked by clk with combinational logic (CL) between Q1 and D2; waveforms showing t_ccq, t_cd, and t_hold]
Min-Delay Example
In the ALU self-bypass example with the flip-flop from Fig. 7.6, the earliest input to the late bypass multiplexer is the imm value coming from another flip-flop. Will this path experience any hold time failures?
Solution: No. The late bypass mux has $t_{cd} = 45$ ps. The flip-flops have $t_{hold} = -10$ ps and $t_{ccq} = 75$ ps. Hence $t_{cd} = 45$ ps is larger than $t_{hold} - t_{ccq} = -10 - 75 = -85$ ps.
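The same hold check can be written as a slack computation, sketched below. The function name `hold_slack_flop` is an illustrative assumption; a non-negative result means the constraint $t_{cd} \ge t_{hold} - t_{ccq}$ is met.

```python
def hold_slack_flop(t_cd, t_ccq, t_hold):
    """Slack (ps) on t_cd >= t_hold - t_ccq; a negative value is a hold failure."""
    return t_cd - (t_hold - t_ccq)

slack = hold_slack_flop(t_cd=45, t_ccq=75, t_hold=-10)
print(slack, slack >= 0)  # 130 True
```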
Min-Delay: 2-Phase Latches
$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap}$
Hold time is reduced by the nonoverlap.
Paradox: hold applies twice each cycle, vs. only once for flops. But a flop is made of two latches!
[Figure: latches L1 (φ1) and L2 (φ2) with combinational logic (CL) between Q1 and D2; waveforms showing t_nonoverlap, t_ccq, t_cd, and t_hold]
Min-Delay: Pulsed Latches
$t_{cd} \ge t_{hold} - t_{ccq} + t_{pw}$
Hold time is increased by the pulse width.
[Figure: pulsed latches L1 and L2 clocked by φp with combinational logic (CL) between Q1 and D2; waveforms showing t_pw, t_hold, t_ccq, and t_cd]
Time Borrowing
In a flop-based system:
  Data launches on one rising edge.
  It must set up before the next rising edge.
  If it arrives late, the system fails.
  If it arrives early, time is wasted.
  Flops have hard edges.
In a latch-based system:
  Data can pass through a latch while it is transparent.
  A long cycle of logic can borrow time into the next cycle,
  as long as each loop completes in one cycle.
Time Borrowing Example
[Figure (a): latches clocked by φ1, φ2, φ1 with combinational logic between them; borrowing time across a half-cycle boundary and across a pipeline stage boundary.]
[Figure (b): latches clocked by φ1 and φ2 in a loop; loops may borrow time internally but must complete within the cycle.]
How Much Borrowing?
2-Phase Latches: $t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap})$
Pulsed Latches: $t_{borrow} \le t_{pw} - t_{setup}$
[Figure: latches L1 (φ1) and L2 (φ2) with Combinational Logic 1 between Q1 and D2; waveforms showing T_c, T_c/2, t_nonoverlap, the nominal half-cycle 1 delay, t_borrow, and t_setup]
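A small sketch of the two borrowing limits follows. The function names are illustrative. The 2-phase numbers (T_c = 1000 ps, t_setup = 40 ps, t_nonoverlap = 50 ps) are hypothetical values chosen only for illustration; the pulsed-latch numbers reuse the 150 ps pulse width and 40 ps setup time from the earlier pulsed-latch example.

```python
def max_borrow_two_phase(t_c, t_setup, t_nonoverlap):
    """Upper bound (ps) on time borrowed through a 2-phase transparent latch."""
    return t_c / 2 - (t_setup + t_nonoverlap)

def max_borrow_pulsed(t_pw, t_setup):
    """Upper bound (ps) on time borrowed through a pulsed latch."""
    return t_pw - t_setup

# Hypothetical 2-phase clocking parameters (not from the slides).
borrow_2phase = max_borrow_two_phase(t_c=1000, t_setup=40, t_nonoverlap=50)

# Pulse width and setup time taken from the pulsed-latch max-delay example.
borrow_pulsed = max_borrow_pulsed(t_pw=150, t_setup=40)

print(borrow_2phase, borrow_pulsed)  # 410.0 110
```

With these numbers the 2-phase design can borrow far more than the pulsed design, which matches the qualitative comparison of the two methods.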
Clock Skew
We have assumed zero clock skew, but clocks really have uncertainty in arrival time. Skew:
  Decreases the maximum propagation delay.
  Increases the minimum contamination delay.
  Decreases time borrowing.
The clock must arrive at all memory elements in time to load data.
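Clock Skew: Flip-Flops
$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$
$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$
[Figure: flops F1 and F2 with skewed clocks, shown for the max-delay and min-delay cases]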
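The effect of skew on both flip-flop constraints can be sketched as below. The function name is illustrative. The flip-flop parameters are those of the earlier max-delay example; the cycle time of 1200 ps and the skew of 50 ps are hypothetical values, assumed here only to show the direction of the change.

```python
def flop_constraints_with_skew(t_c, t_pcq, t_ccq, t_setup, t_hold, t_skew):
    """Return (max allowed t_pd, min required t_cd) in ps for a flop pipeline."""
    max_t_pd = t_c - (t_pcq + t_setup + t_skew)
    min_t_cd = t_hold - t_ccq + t_skew
    return max_t_pd, min_t_cd

# Flop parameters from the max-delay example; t_c and t_skew are hypothetical.
params = dict(t_c=1200, t_pcq=90, t_ccq=75, t_setup=62, t_hold=-10)

no_skew = flop_constraints_with_skew(t_skew=0, **params)
with_skew = flop_constraints_with_skew(t_skew=50, **params)
print(no_skew)    # (1048, -85)
print(with_skew)  # (998, -35)
```

Skew shrinks the logic delay budget and raises the minimum contamination delay by the same amount.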
Clock Skew: Latches
2-Phase Latches:
$t_{pd} \le T_c - \underbrace{(2\,t_{pdq})}_{\text{sequencing overhead}}$
$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$
$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$
Pulsed Latches:
$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$
$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$
$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$
[Figure: latches L1, L2, L3 clocked by φ1, φ2, φ1 with Combinational Logic 1 and Combinational Logic 2 between them]
Two-Phase Clocking
If setup times are violated, reduce the clock speed.
If hold times are violated, the chip fails at any speed.
In this class, working chips are most important, and there are no tools to analyze clock skew.
An easy way to guarantee hold times is to use 2-phase latches with big nonoverlap times.
Call these clocks φ1, φ2 (ph1, ph2).
Signal Skew
Machine data signals must obey setup and hold times, so signal skew must be avoided.
Data Shoot Through
Latches do not cut combinational logic when the clock is active.
Latch-based machines must use multiple ranks of latches.
Multiple ranks require multiple phases of clock.
Data shoot through occurs if a single-phase latch is used.
Unbalanced Delays
Logic with unbalanced delays leads to inefficient use of logic.
[Figure: two logic stages, one needing only a short clock period and one needing a long clock period]
Retiming Solution
Retiming moves memory elements through combinational logic.
Property: retiming changes the encoding of values in registers, but proper values can be reconstructed with combinational logic.
Retiming must preserve the number of latches or registers around a cycle.
Summary
Flip-Flops: very easy to use, supported by all tools.
2-Phase Transparent Latches: lots of skew tolerance and time borrowing.
Pulsed Latches: fast, some skew tolerance and time borrowing, hold time risk.
Outlines Introduction Sequencing Methods Latches and Flip-Flops Sequential System Design Conclusion
Dynamic Latch (1/3)
Pass transistor latch (used in the 1970s):
  Pros: tiny; low clock load.
  Cons: V_t drop; stored charge leaks away; backdriving; diffusion input.
Transmission gate latch:
  No V_t drop.
  Stored charge still leaks away; backdriving; diffusion input.
  Requires inverted clock.
[Figure: schematics of both latches with input D, output Q, and clock φ]
Dynamic Latch (2/3)
Store charge on inverter gate capacitance:
φ = 0: transmission gate is off, inverter output is determined by storage node.
φ = 1: transmission gate is on, inverter output follows D input.
Dynamic Latch (3/3)
Inverting buffer:
  No V_t drop.
  Stored charge still leaks away.
  No backdriving.
  Fixes either the diffusion input (upper circuit) or the output noise sensitivity, with inverted output (lower circuit).
Setup and hold times are determined by the transmission gate: must ensure that the value stored on the transmission gate is solid.
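Stick Diagram
[Figure: stick diagram of the latch with rails V_DD and V_SS, input D, output Q, and clock lines φ]
Physical Layout
[Figure: physical layout of the latch with rails V_DD and V_SS, input D, output Q, and clock lines φ]
Multiplexer Dynamic Latch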
Static Latch (1/3)
Must use feedback to restore the value.
Some latches are static on one phase only (pseudo-static): they load on one phase and activate feedback on the other phase.
Example: SR latch.
Static Latch (2/3)
Tristate feedback:
  No V_t drop; leakage compensation.
  Backdriving risk; diffusion input.
  Not isolated from output noise.
  Requires inverted clock.
Buffered input:
  No V_t drop; leakage compensation.
  No backdriving; no diffusion input.
  Not isolated from output noise.
  Requires inverted clock.
[Figure: schematics of both latches with input D, internal node X, output Q, and clock φ]
Static Latch (3/3)
Buffered output:
  No V_t drop; leakage compensation.
  No backdriving; no diffusion input.
  Isolated from output noise.
  Requires inverted clock.
Widely used in Artisan standard cells:
  Very robust (most important).
  Rather large.
  Rather slow (1.5 to 2 FO4 delays).
  High clock loading.
[Figure: schematic with input D, internal node X, output Q, and clock φ]
Multiplexer Static Latches
Mux static latch:
  No V_t drop; leakage compensation.
  No backdriving; no diffusion input.
  Requires inverted clock.
[Figure: negative latch and positive latch built from multiplexers]
Recirculating Quasi-Static Latch
Eliminates the problem of the dynamic latch, where the value stored on the capacitor leaks away over time.
Quasi-static: the latch data will vanish if the clocks are stopped (i.e., the latch is static on one phase only).
Clocked Inverter
φ = 0: both clocked transistors are off, so the output is floating.
φ = 1: both clocked transistors are on, so the circuit acts as an inverter to drive the output.
[Figure: clocked inverter schematic and circuit symbol]
Clocked Inverter Latch
φ = 0: i1 is off, i2-i3 form the feedback circuit.
φ = 1: i2 is off, breaking feedback; i1 is on, driving i3 and the output.
This static latch is transparent when φ = 1.
Flip-Flops
Not transparent: use multiple storage elements to isolate the output from the input.
Edge-triggered: master-slave.
Master-Slave Flip-Flop
φ = 0: master latch is disabled; slave latch is enabled, but the master latch output is stable, so the slave passes the master's stored value to the output.
φ = 1: master latch is enabled, loading the value from the input; slave latch is disabled, maintaining the old output value.
[Figure: master and slave latches in series between D and Q, clocked by φ]
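The behavior described on this slide can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions: zero delays, one evaluation per time step, and both latches initialized to 0. The class names `Latch` and `MasterSlaveFlipFlop` and the sample waveforms are illustrative, not taken from the slides.

```python
class Latch:
    """Level-sensitive latch: q follows d while enabled, otherwise holds."""

    def __init__(self):
        self.q = 0  # assumed initial stored value

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Master is transparent when phi = 1, slave is transparent when phi = 0."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def step(self, d, phi):
        m = self.master.step(d, enable=(phi == 1))
        return self.slave.step(m, enable=(phi == 0))


ff = MasterSlaveFlipFlop()
phi_wave = [0, 1, 1, 0, 0, 1, 1, 0]
d_wave = [0, 1, 1, 0, 0, 0, 0, 1]
q_wave = [ff.step(d, phi) for d, phi in zip(d_wave, phi_wave)]
print(q_wave)  # [0, 0, 0, 1, 1, 1, 1, 0]
```

In this trace Q changes only at the steps where φ falls to 0, and it takes the value D had while φ was still 1; changes of D during φ = 0 do not reach the output.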
Latch-Based Flip-Flop
The storage nodes have to be refreshed at periodic intervals.
Resettable Latches and Flip-Flop
Static Latch-Based Flip-Flop: Clock Skew Problem
The 1-1 clock overlap introduces a race condition.
During the 1-1 overlap, node A is driven by both D and B.
[Figure: flip-flop built from two D-latches]
Outlines Introduction Sequencing Methods Latches and Flip-Flops Sequential System Design Conclusion
Sequential Machine Design Procedure
Step 1: Specification.
Step 2: Formulation. Obtain a state diagram or state table.
Step 3: State Assignment. Obtain a state table if only a state diagram is available, and assign binary codes to the states.
Step 4: Flip-Flop Input Equation Determination. Select flip-flop types and derive flip-flop equations from the next-state entries in the table.
Step 5: Output Equation Determination. Derive output equations from the output entries in the table.
Step 6: Optimization. Optimize the equations.
Step 7: Technology Mapping. Find a circuit from the equations and map it to flip-flops and gates.
Step 8: Verification. Verify correctness of the final design.
State Transition Graphs/Tables
Basic functional description of an FSM.
Symbolic truth table for the next-state and output functions: no structure of logic; no encoding of states.
The state transition graph and table are functionally equivalent.
State Assignment
Must find a binary encoding for the symbolic states: this is state assignment.
State assignment affects: combinational logic area; combinational logic delay; memory element area.
May also encode some machine inputs/outputs.
Example: One-bit Counter (1/4)
Easy to specify as a one-bit counter.
Harder to specify n-bit counter behavior.
Can specify an n-bit counter as a structure made of 1-bit counters.
State table:
Count  Cin  Next Count  Cout (Carry Out)
0      0    0           0
0      1    1           0
1      0    1           0
1      1    0           1
One-bit Counter Implementation (2/4)
XOR computes next value of this bit of counter.
NAND/inverter computes carry-out.
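The two gate functions can be checked against the one-bit counter state table with a short sketch. The function name `one_bit_counter` is illustrative; the NAND followed by an inverter is modeled as a plain AND.

```python
def one_bit_counter(count, cin):
    """Return (next_count, cout) for one counter bit."""
    next_count = count ^ cin  # XOR computes the next value of this bit
    cout = count & cin        # NAND followed by an inverter is an AND
    return next_count, cout

# Rows are (count, cin, next_count, cout), in the order of the state table.
table = [(c, i, *one_bit_counter(c, i)) for c in (0, 1) for i in (0, 1)]
for row in table:
    print(row)
```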
One-bit Counter Sticks Diagram (3/4)
[Figure: stick diagram between rails V_DD and V_SS with cells l1 (latch), n (nand), i (inv), x (xor), and l2 (latch); clock lines φ1 and φ2; signals C_in and C_out]
n-bit Counter Structure (4/4)
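Since the structure figure is not reproduced here, the following is only a sketch of one plausible composition: one-bit stages chained least significant bit first, each stage's carry-out feeding the next stage's carry-in, and the first carry-in tied to 1. The function name `counter_step` and these wiring details are assumptions.

```python
def counter_step(bits, cin=1):
    """Advance an n-bit counter by one clock.

    bits is a list of 0/1 values, least significant bit first.
    Returns (next_bits, carry_out).
    """
    next_bits = []
    carry = cin
    for b in bits:
        next_bits.append(b ^ carry)  # XOR: next value of this bit
        carry = b & carry            # AND: carry into the next stage
    return next_bits, carry

state = [0, 0, 0]
sequence = []
for _ in range(9):
    sequence.append(sum(b << i for i, b in enumerate(state)))
    state, _carry = counter_step(state)
print(sequence)  # [0, 1, 2, 3, 4, 5, 6, 7, 0]
```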
Example: 01 String Recognizer (1/5)
Behavior of a machine which recognizes 01 in a continuous stream of bits.
Operation:
  Waits for 0 to appear in state bit1.
  Goes into separate state bit2 when 0 appears.
  If 1 appears immediately after 0, can't have a 01 on the next cycle, so can go back to wait for 0 in state bit1.
Time    0     1     2     3     4     5
Input   0     0     1     1     0     1
State   Bit1  Bit2  Bit2  Bit1  Bit1  Bit2
Next    Bit2  Bit2  Bit1  Bit1  Bit2  Bit1
Output  0     0     1     0     0     1
State Transition Table (2/5)
Operation:
  Waits for 0 to appear in state bit1.
  Goes into separate state bit2 when 0 appears.
  If 1 appears immediately after 0, can't have a 01 on the next cycle, so can go back to wait for 0 in state bit1.
Input  Present  Next  Output
0      Bit1     Bit2  0
1      Bit1     Bit1  0
0      Bit2     Bit2  0
1      Bit2     Bit1  1
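The symbolic table can be executed directly as a lookup, which is a convenient way to check it against the example input stream 0, 0, 1, 1, 0, 1 from the recognizer slide. The names `TRANSITIONS` and `run_recognizer` are illustrative, and starting in state bit1 is assumed.

```python
# (present_state, input) -> (next_state, output), copied from the table.
TRANSITIONS = {
    ("bit1", 0): ("bit2", 0),
    ("bit1", 1): ("bit1", 0),
    ("bit2", 0): ("bit2", 0),
    ("bit2", 1): ("bit1", 1),
}

def run_recognizer(bits, state="bit1"):
    """Feed bits through the Mealy machine and collect the outputs."""
    outputs = []
    for b in bits:
        state, out = TRANSITIONS[(state, b)]
        outputs.append(out)
    return outputs

outputs = run_recognizer([0, 0, 1, 1, 0, 1])
print(outputs)  # [0, 0, 1, 0, 0, 1]
```

The output depends on both the present state and the current input, so this is a Mealy machine.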
State Transition Graph (3/5)
Equivalent to state transition table:
01 Recognizer Encoding (4/5)
Choose bit1 = 0, bit2 = 1; the truth table is then:
Input  Present  Next  Output
0      0        1     0
1      0        0     0
0      1        1     0
1      1        0     1
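Reading the encoded truth table row by row suggests two simple equations: the next state is the complement of the input, and the output is the AND of input and present state. These equations are derived here from the table and are not necessarily the exact gate network drawn on the implementation slide; the sketch below checks them against every row.

```python
# Rows are (input, present, next, output), copied from the truth table.
TRUTH_TABLE = [
    (0, 0, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 1, 0, 1),
]

def next_state(inp, present):
    """Candidate next-state equation: NOT input (present state is unused)."""
    return 1 - inp

def output(inp, present):
    """Candidate output equation: input AND present state."""
    return inp & present

matches = all(
    next_state(i, p) == n and output(i, p) == o
    for i, p, n, o in TRUTH_TABLE
)
print(matches)  # True
```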
01 Recognizer Logic Implementation (5/5)
After encoding, the truth table can be implemented in gates.
[Figure: gate-level implementation with memory elements (D, Q)]
Power Optimization
Memory elements stop glitch propagation.
[Figure: a glitch on a combinational signal blocked by a memory element]
Conclusions
You should learn in depth about the following topics:
  Latch
  Flip-flop
  Sequencing
  Sequential circuits
  Sequential system clock discipline