PROGETTO DI SISTEMI ELETTRONICI DIGITALI (Digital Systems Design)
Digital Circuits: Advanced Topics
Sequential circuit and metastability
Sequential circuit
A sequential circuit contains:
Storage elements: latches or flip-flops.
Combinational logic: implements a multiple-output switching function.
Inputs are signals from the outside. Outputs are signals to the outside. The other inputs of the combinational logic, the State or Present State, are signals from the storage elements. The remaining outputs, the Next State, are inputs to the storage elements.
[Figure: block diagram. Inputs and State feed the Combinational Logic, which produces Outputs and Next State; Next State feeds the Storage Elements.]
Bistable element
The basic memory block. It has two stable conditions (states) and can be used to store binary symbols.
If Q is high, the feedback to inverter 2 causes its output to be low, which in turn forces the output of inverter 1 high, so this is a stable state. If Q is low, the feedback to inverter 2 causes its output to be high, which in turn forces the output of inverter 1 low, so this is the second stable state.
Bistable element
The bistable circuit DOES NOT require any input to drive it. At power-on it automatically settles into one of the two stable states above and remains there indefinitely. It can be viewed as a ball resting on one side of a hill:
Bistable element
If the push is too weak (i.e. setup or hold time is violated), the ball may travel to the top of the hill (the output becomes metastable), stay there for some time, and then fall back to either stable state (the output eventually becomes stable). The ball may also rise only partway and come back (the output may show glitches).
Latch vs. Flip-flop
Latch (inputs D and En, output Q): level-sensitive.
Flip-flop (inputs D and Clk, output Q): edge-sensitive.
[Figure: symbols and example waveforms for the latch and the flip-flop.]
Synchronous and Asynchronous Sequential Circuits
Asynchronous sequential circuits: outputs and state change as soon as an input changes.
Synchronous sequential circuits: outputs and state change depending on a special input (the clock).
The following circuit is an asynchronous 2-bit binary up counter. It is called asynchronous because the two flip-flops are not clocked by the same signal:
Synchronous design
Combinational logic: larger circuits are difficult to predict.
Synchronous logic: driven by a CLOCK. Registers and flip-flops (memory) hold the intermediate inputs, and a new output is produced on every clock edge.
[Figure: combinational logic between registers, with the clock edges marked.]
Timing Waveforms (asynchronous)
[Figure: circuit with inputs A, B, C and output Y, and their waveforms from 0 ns to 60 ns.]
Timing Waveforms (synchronous)
[Figure: circuit with inputs A, B, C, internal signal D, and a flip-flop clocked by Clk with output Q; waveforms from 0 ns to 60 ns.]
Timing Waveforms (synchronous)
[Figure: the same circuit with registered inputs A_r and B_r; waveforms of A, B, A_r, B_r, C, D, Clk and Q from 0 ns to 60 ns.]
Timing Waveforms (Edge Detection)
[Figure: Long_In is sampled by cascaded flip-flops clocked by Clk, producing Q1_r and Q2_r; Q2_f and En_rise are derived from them. Waveforms of Clk, Long_In, Q1_r, Q2_r, Q2_f and En_rise from 0 ns to 60 ns.]
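The edge detector can be sketched behaviorally in Python. This is only an illustrative model, under the assumption that En_rise is high for one clock period when the first register holds 1 and the second still holds 0. The role of Q2_f in the original figure cannot be recovered, so it is left out, and the function name `detect_rising_edges` is hypothetical.

```python
def detect_rising_edges(long_in):
    """Model two cascaded flip-flops and return En_rise after each clock edge.

    long_in: iterable of 0/1 values, one sample of Long_In per rising clock edge.
    Assumption: En_rise = Q1_r AND (NOT Q2_r).
    """
    q1 = 0  # models Q1_r
    q2 = 0  # models Q2_r
    en_rise = []
    for sample in long_in:
        # Both flip-flops update on the same edge: q2 takes the OLD value of q1.
        q1, q2 = sample, q1
        en_rise.append(q1 & (1 - q2))
    return en_rise


if __name__ == "__main__":
    print(detect_rising_edges([0, 0, 1, 1, 1, 0, 0, 1, 1]))
```

However long Long_In stays high, the model produces an enable that lasts exactly one clock period, which matches the rule that enable signals are nominally one clock period long.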
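Gated Clocks
Eight-bit binary counter with gated clock (not recommended): the register Q(7 downto 0) is loaded with Q + "00000001", and its clock input is Clk gated by CountEn.
Eight-bit binary counter with free-running clock (recommended): the register is clocked directly by Clk, and CountEn drives a multiplexer (0/1) that selects the value presented at the D input.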
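The recommended structure can be sketched as a small behavioral model. It assumes the multiplexer passes Q unchanged when CountEn is 0 and Q + 1 when CountEn is 1; the function name `count_with_enable` is hypothetical.

```python
def count_with_enable(count_en_per_cycle, width=8):
    """Model a counter with a free-running clock and a clock-enable multiplexer.

    count_en_per_cycle: iterable of 0/1 values of CountEn, one per clock edge.
    Returns the register value Q after each clock edge.
    """
    mask = (1 << width) - 1
    q = 0
    history = []
    for count_en in count_en_per_cycle:
        # Multiplexer in front of the D input: hold Q or take Q + 1.
        d = (q + 1) & mask if count_en else q
        # The register is clocked on every edge; the clock itself is never gated.
        q = d
        history.append(q)
    return history


if __name__ == "__main__":
    print(count_with_enable([1, 1, 0, 1]))
```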
Synchronous design rules
All flip-flops are clocked only by the same free-running master clock.
No latches.
No asynchronous feedback.
No clock signals are derived from logic gating (unless the gating is produced by the tool, i.e. correct by construction).
Asynchronous flip-flop inputs (Preset or Clear) are used only for initialization.
Delays from one flip-flop to another are designed to be less than the clock period.
Asynchronous inputs pass through one or more flip-flops before being used elsewhere.
Enable (control) signals are nominally one clock period in length.
Note: synchronous design is not always the best method, but it is to be assumed unless other methods are absolutely necessary.
Metastability
In an asynchronous system, the relationship between data and clock is not fixed; therefore, violations of setup and hold times can occur. When this happens, the output may go to an intermediate level between its two valid states and remain there for an indefinite amount of time before resolving itself, or it may simply be delayed before making a normal transition.
Two stable points, one metastable point:
Single-Stage Synchronizer - Metastability
Minimum data set-up (tsu) and hold (thd) times must be met for the register to output synchronized data. The data input to the D flip-flop is asynchronous to the clock. The arrival time of the input data relative to the clock is not known, so a danger zone (decision or metastability window) is created.
After a clock-to-output delay (tco), the input data appears at the Q output. If the input data changes inside the danger zone, the Q output is likely to be metastable until the internal circuit settles to either a logic high or a logic low. The extra time required to resolve the logic state is called the resolution time (tr).
Timing Waveforms (Metastability)
[Figure: flip-flop with input D, clock Clk and output Q; waveforms from 0 ns to 60 ns for three cases.]
D_early, Q_early: D changes before the setup time.
D_late, Q_late: D changes after the hold time.
D_bad, Q_bad: D changes after the setup time or before the hold time, so Q is indeterminate.
Resolving the metastability
The resolving time constant, $\tau$, comes from the expression that describes the probability of a metastable event lasting longer than some time $t$.
If a FF is still metastable at time $t$, the probability that it resolves during an additional interval $\Delta t$ is the same regardless of its present age $t$. The time needed to resolve the metastability must therefore follow an exponential probability distribution.
Thus the probability that the system is still in the metastable state at time $t$ is: $R(t) = e^{-t/\tau}$
Calculating the MTBF
The Mean (average) Time Between Failures, or MTBF, of a system is the reciprocal of the failure rate in the special case where the failure rate is constant, i.e. failures are exponentially distributed.
Failure rate = Metastability rate * Probability of metastability
The probability of a metastable event lasting longer than some time $t$ is $R(t) = e^{-t/\tau}$.
Assume the data arrives uniformly over the clock cycle $T$. The probability that data arrives within the window $W$ during a clock period $T$ is $P = W/T = W f_c$, where $W$ is the metastability window and $f_c$ is the clock frequency.
If the data rate is $f_d$, the rate of metastability becomes $W f_c f_d$.
Failure rate $= W f_c f_d \, e^{-t_r/\tau}$
$\mathrm{MTBF} = \dfrac{e^{t_r/\tau}}{W f_c f_d}$
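As a quick numerical illustration, the MTBF expression can be evaluated directly. The parameter values below are made-up assumptions chosen only to show the order of magnitude and do not come from any data sheet; the function name `mtbf_seconds` is hypothetical.

```python
import math


def mtbf_seconds(t_r, tau, window, f_clk, f_data):
    """Evaluate MTBF = exp(t_r / tau) / (W * fc * fd), all values in SI units."""
    failure_rate = window * f_clk * f_data * math.exp(-t_r / tau)
    return 1.0 / failure_rate


if __name__ == "__main__":
    # Illustrative values only (assumptions): tau = 100 ps, W = 50 ps,
    # fc = 100 MHz, fd = 1 MHz, allowed resolution time tr = 2 ns.
    example = mtbf_seconds(t_r=2e-9, tau=100e-12, window=50e-12,
                           f_clk=100e6, f_data=1e6)
    print(f"MTBF = {example:.3e} s")
```

Because tr appears in the exponent, even a small increase in the allowed resolution time raises the MTBF by orders of magnitude.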
Estimating device parameters
A stable synchronous source is used for the measurement. The set-up and hold times of the input data relative to the clock can be adjusted while observing the Q output for a metastable event.
The metastability window, W, can be determined by accurately measuring the clock-to-output propagation delay (tco).
When the set-up time or the hold time is violated, the clock-to-output delay increases. When the measured tco is longer than tco_max (specified in the device data sheet), the device is considered to be in the metastable state.
Avoiding metastability
The most common way to avoid metastability is to add one or more synchronizing FFs on the signals that move from one clock domain to the other. This approach allows an entire clock period (minus the setup time of the following flip-flop) for metastable events to resolve themselves. It does, however, increase the latency with which the input is observed.
The event rate seen by the second FF is now the failure rate (1/MTBF) of the first FF, and in the overall MTBF the allowed resolve times are summed up!
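The slide's own formula did not survive extraction. One commonly quoted form that is consistent with the remark that the allowed resolve times add up is sketched below. It is an assumption, not the original equation; $t_{r1}$ and $t_{r2}$ are hypothetical names for the resolution time available at the first and second flip-flop.

```latex
\mathrm{MTBF}_{2\,\mathrm{FF}} \approx \frac{e^{(t_{r1} + t_{r2})/\tau}}{W f_c f_d}
```

Compared with the single-stage expression, the only change is the larger exponent, which is why each added stage improves the MTBF exponentially while adding one clock period of latency.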
Pipelining
Pipelining
Clock frequency is defined as the rate at which data flows into the system and appears at the output. Pipelining decreases the combinational delay by inserting registers in a long combinational path, which allows a higher clock frequency and hence higher performance.
For a perfect clock without any jitter, the clock signal reaches all banks of registers simultaneously. If the FFs are ideal (no tco, tsu or th), the maximum frequency FMAX is the reciprocal of the maximum delay path through the combinational logic.
FF and clock non-idealities
FFs have delays (tco, tsu, th).
In a real circuit, the clock input of register B arrives slightly later than at register A because of wire propagation delay: this is the clock skew, TSKW. Negative clock skew is due to early clocking, i.e. clocking of registers before the relevant data is successfully latched.
The variation between the arrival times of consecutive clock edges at the same point on the chip is defined as clock jitter, TJIT.
Real world circuit
The path in bold is the path with maximum delay between any two flip-flops in the circuit:
Real world circuit
The total delay between the two flip-flops along the path b, f, j, l, m, n, o is:
Assuming equal delays across all the flip-flops in the design (which might not be the actual case), we have the generalized formula for the minimum clock period (i.e. the maximum frequency) as:
The combinational delay in the above equation can be reduced by adding more FFs, thus increasing the maximum frequency at which the circuit can operate.
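The equations on this slide were lost in extraction. A commonly used budget, given here only as a sketch, treats skew and jitter as worst-case margins that add to the period. The symbol $t_{comb}$ for the combinational delay of the critical path and the sign convention for $T_{SKW}$ are assumptions, not taken from the source.

```latex
T_{min} = t_{co} + t_{comb} + t_{su} + T_{SKW} + T_{JIT},
\qquad
F_{MAX} = \frac{1}{T_{min}}
```

With ideal flip-flops and an ideal clock ($t_{co} = t_{su} = T_{SKW} = T_{JIT} = 0$) this reduces to the earlier statement that FMAX is the reciprocal of the maximum combinational delay.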
Pipelining again!
Pipelining:
splits the critical path (the path with maximum combinational delay) with memory elements between the clock cycles;
increases the calculations per second, since the clock period per stage is reduced;
but increases the overhead by adding memory elements.
Performance Increase
Consider a big array of combinational logic between registers. The latency of the circuit is also the clock period.
Consider the same circuit pipelined into n stages. The pipeline stage with the worst delay limits the clock period. The latency is n times the clock period. Ideally, each pipeline stage has equal delay.
Performance Increase
So the minimum possible clock period for any pipeline stage is:
Thus the final latency with this ideal clock period is:
We can now calculate the speed increase of a circuit after pipelining:
If we specify the register and clock overhead as a fraction k of the total clock period of the unpipelined circuit, then we have:
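The equations on these two slides were lost in extraction. The derivation below is a reconstruction under stated assumptions: $T_{comb}$ (total combinational delay) and $T_{ovh}$ (register and clock overhead per stage) are hypothetical symbols, and the stages are assumed to be perfectly balanced.

```latex
\begin{aligned}
T_{clk,1} &= T_{comb} + T_{ovh}
  && \text{(unpipelined: clock period = latency)} \\
T_{clk,n} &= \frac{T_{comb}}{n} + T_{ovh}
  && \text{(ideal period with } n \text{ equal stages)} \\
L_n &= n \, T_{clk,n} = T_{comb} + n \, T_{ovh}
  && \text{(latency of the pipelined circuit)} \\
S_n &= \frac{T_{clk,1}}{T_{clk,n}}
     = \frac{T_{comb} + T_{ovh}}{T_{comb}/n + T_{ovh}}
  && \text{(throughput speed-up)} \\
k &= \frac{T_{ovh}}{T_{comb} + T_{ovh}}
  \;\Rightarrow\;
  S_n = \frac{1}{(1-k)/n + k} = \frac{n}{1 + k\,(n-1)}
\end{aligned}
```

Under these assumptions the speed-up approaches $n$ when the overhead fraction $k$ is negligible and saturates at $1/k$ for large $n$, while the latency grows by $T_{ovh}$ with every added stage.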
Endianness
Little-Endian or Big-Endian: Which Is Better?
In little-endian form, assembly language instructions for accessing 1-, 2-, 4-byte or longer numbers proceed in exactly the same way for all formats. Also, because of the 1:1 relationship between address offset and byte number (offset 0 is byte 0), multiple-precision math routines are correspondingly easy to write.
In big-endian form, since the higher-order byte comes first, it is easy to test whether a number is positive or negative by looking at the byte at offset zero. The numbers are also stored in the order in which they are printed out, so binary-to-decimal routines are particularly efficient.
Issues Dealing with Endianness Mismatch
Endianness doesn't matter on a single system. It matters only when two computers are trying to communicate.
One possible solution for interfacing peripherals of opposite endianness is to choose addresses so that they remain constant (Address Invariance, where bytes remain at the same address).
Issues Dealing with Endianness Mismatch
Another solution for interfacing peripherals of opposite endianness is to choose the bit ordering to remain constant (Data Invariance, where addresses are changed).
Swapping
Byte swapping is an alternative way to achieve endianness conversion. This mode is useful in systems where the endianness is decided by the application itself.
An atomic data object that is N bytes in size (a) is treated like an N-element byte array (b). The elements of this byte array are then mirrored (Fig. 1c) to complete the byte swap of the original data object.
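The three steps (object, byte array, mirrored array) can be sketched with Python's built-in integer conversion methods. The helper name `swap_bytes` is hypothetical, and the value is assumed to be an unsigned integer that fits in `size` bytes.

```python
def swap_bytes(value, size):
    """Reverse the byte order of an unsigned integer that is `size` bytes wide."""
    # (a) -> (b): view the N-byte object as an N-element byte array.
    data = value.to_bytes(size, byteorder="big")
    # (b) -> (c): mirror the array so that the first byte becomes the last.
    mirrored = data[::-1]
    # Reinterpret the mirrored array with the same byte order as before.
    return int.from_bytes(mirrored, byteorder="big")


if __name__ == "__main__":
    print(hex(swap_bytes(0x11223344, 4)))
```

Applying the swap twice returns the original value, so the same routine serves for conversion in both directions.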
Debouncing Techniques
Behavior of a switch
When the metallic contacts of a switch strike together, their momentum and elasticity act together to cause bounce.
A Resistor-Capacitor (RC) network is probably the most common and easiest de-bouncing circuit:
Software de-bouncing
Solution A: Read the switch after waiting long enough for the bounces to settle. The only downside is the slow response time; this approach fails if the user wants to operate the switch much faster than once every 500 ms.
Solution B: Interrupt the CPU on switch activation and de-bounce in the ISR.
Solution C: Use a counter to eliminate the noise and validate the switch state. The counter counts up as long as the signal is Low and is reset when the signal is High. If the counter reaches a certain fixed value, which should be one or two times longer than the noise pulses, the current pulse is a valid pulse.
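Solution C can be sketched as follows. This is a minimal model that assumes the switch is sampled at a fixed rate and is active Low; the function name `debounce_low` and the `threshold` parameter (in samples) are hypothetical.

```python
def debounce_low(samples, threshold):
    """Return, for each sample, whether a valid Low pulse has been detected.

    samples: iterable of 0/1 readings of the switch input, taken at a fixed rate.
    threshold: number of consecutive Low samples required to accept the pulse.
    """
    counter = 0
    valid = []
    for level in samples:
        if level == 0:
            # Count up while the signal is Low, saturating at the threshold.
            counter = min(counter + 1, threshold)
        else:
            # Any High sample resets the counter, which filters short noise pulses.
            counter = 0
        valid.append(counter >= threshold)
    return valid


if __name__ == "__main__":
    print(debounce_low([1, 0, 1, 0, 0, 0, 0, 1], threshold=3))
```

In the usage example the single Low sample at the start is rejected as noise, and the pulse is accepted only once three consecutive Low samples have been seen.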
Commercial debouncer
For designs that do not include de-bounce circuitry on external inputs, the system may use external de-bounce ICs.
When the input does not equal the output, the XNOR gate issues a counter reset. When the switch input state is stable for the full qualification period, the counter clocks the flip-flop, updating the output. The under-voltage lockout circuitry ensures that the outputs are in the correct state at power-up.
Clock gating and reset strategies
Clock gating
In the traditional synchronous design style, the system clock is connected to the clock pin of every flip-flop in the design. This results in three major components of power consumption:
1. Power consumed by combinatorial logic whose values change on each clock edge (due to flops driving those combinational cells).
2. Power consumed by flip-flops (this is non-zero even if the inputs to the flip-flops, and therefore their internal state, are not changing).
3. Power consumed by the clock tree buffers in the design.
Gating the clock path substantially reduces the power consumed by a flip-flop.
Simple gate-based clock gating requires all enable signals to be held constant from the active (rising) edge of the clock until the inactive (falling) edge, to avoid truncating the generated clock pulse prematurely or generating multiple clock pulses (glitches on the clock): discard!
Clock gating
The latch-based clock gating style adds a level-sensitive latch to the design to hold the enable signal from the active edge of the clock until the inactive edge of the clock. The latch captures the state of the enable signal and holds it until the complete clock pulse has been generated. The enable signal need only be stable around the rising edge of the clock.
Only one input of the gate that turns the clock on and off changes at a time, ensuring that the output is free from glitches or spikes.
Synchronous reset
Synchronous resets are based on the premise that the reset signal will only affect or reset the state of the flip-flop on the active edge of a clock.
Advantages of using synchronous resets:
Synchronous resets generally ensure that the circuit is 100% synchronous.
Synchronous reset logic will synthesize to smaller flip-flops, particularly if the reset is gated with the logic generating the flop input.
Synchronous resets ensure that reset can only occur at an active clock edge. The clock works as a filter for small reset glitches.
Disadvantages:
Synchronous resets may need a pulse stretcher to guarantee a reset pulse wide enough to be present during an active edge of the clock.
A synchronous reset requires a clock in order to reset the circuit. This may be a problem in some cases where a gated clock is used to save power.
Synchronous reset
Asynchronous Reset
Asynchronous reset flip-flops incorporate a reset pin into the flip-flop design.
The most obvious advantage of asynchronous resets is that the circuit can be reset with or without a clock present.
The biggest problem with asynchronous resets is that they are asynchronous, both at assertion and at de-assertion. The assertion is a non-issue; the de-assertion is the issue. If the asynchronous reset is released at or near the active clock edge of a flip-flop, the output of the flip-flop could go metastable.
Another problem that an asynchronous reset can have, depending on its source, is spurious resets due to noise or glitches on the board or system reset.
Reset synchronizer
The reset synchronizer logic is designed to take advantage of the best of both asynchronous and synchronous reset styles.
An external reset signal asynchronously resets a pair of flip-flops, which in turn drive the master reset signal asynchronously through the reset buffer tree to the rest of the flip-flops in the design. The entire design will be asynchronously reset.
Reset Glitch Filtering
Multiple clock domains
Multiple Clock domains
Designs with multiple clocks can have clocks with different frequencies and/or clocks with the same frequency but different phases.
Metastability issues may arise; the system design must be partitioned so that each module works on one clock only.
A synchronizer module has to be provided for all signals that cross from one clock domain to another.
Synchronous Clock domain crossing
Clocks originating from the same clock root and having a known phase and frequency relationship between them are known as synchronous clocks. A crossing between such clocks is known as a synchronous clock domain crossing. Depending on the frequency and phase relationship, synchronizers may or may not be needed!
Synchronous clock domain crossings can be divided into several categories:
Clocks with the same frequency and zero phase difference: no domain crossing!
Clocks with the same frequency and constant phase difference: tighter constraint on the combinational logic delay due to smaller setup/hold margins.
Integer multiple clocks: the minimum possible phase difference between the active edges of the two clocks is always equal to the period of the fast clock, i.e. one complete cycle of the faster clock is always available for sampling.
Rational multiple clocks: the minimum phase difference between the two clocks can be small enough to cause metastability: synchronization needs to be done!
Transfer of control signals
Two or more flip-flops are cascaded to form a synchronizing circuit.
As previously seen, if the first flop of a synchronizer produces a metastable output, the metastability may be resolved before it is sampled by the second flip-flop. This method does not guarantee that the output of the second flip-flop will not go metastable, but it does decrease the probability of metastability.
Adding more flops to the synchronizer further reduces the probability of metastability.
Transfer of data signals: Synchronous FIFO
Simple synchronous FIFO architecture: reading and writing are done on the same clock.
The read and write addresses are generated by two pointers. A valid write enable increments the write pointer and a valid read enable increments the read pointer. A Status Block generates the fifo_empty and fifo_full signals.
The Dual Port Memory (DPRAM) can have either synchronous reads (an explicit read signal is provided before the FIFO output is valid) or asynchronous reads (valid data is available as soon as it is written).
FIFO_FULL and FIFO_empty
The FIFO is either full or empty when the read pointer equals the write pointer: it is necessary to distinguish between these two conditions.
The FIFO becomes full when a write causes both pointers to become equal in the next clock. This gives the following condition for asserting the fifo_full signal:
fifo_full = (read_pointer == (write_pointer + 1)) AND write
An alternative approach uses a counter that constantly indicates the number of full or empty locations left in the FIFO: additional hardware is needed!
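The counter-based alternative can be sketched with a small behavioral model. This is a software illustration under stated assumptions, not a hardware description: the class name `SyncFifo` and its method names are hypothetical, and one call to `write` or `read` stands for one clock cycle with the corresponding enable asserted.

```python
class SyncFifo:
    """Behavioral model of a synchronous FIFO with an occupancy counter."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.write_pointer = 0
        self.read_pointer = 0
        self.count = 0  # number of occupied locations

    @property
    def fifo_empty(self):
        return self.count == 0

    @property
    def fifo_full(self):
        return self.count == self.depth

    def write(self, item):
        """Store an item; return False (and do nothing) if the FIFO is full."""
        if self.fifo_full:
            return False
        self.mem[self.write_pointer] = item
        self.write_pointer = (self.write_pointer + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest item, or None if the FIFO is empty."""
        if self.fifo_empty:
            return None
        item = self.mem[self.read_pointer]
        self.read_pointer = (self.read_pointer + 1) % self.depth
        self.count -= 1
        return item


if __name__ == "__main__":
    fifo = SyncFifo(depth=2)
    fifo.write("a")
    fifo.write("b")
    # Pointers are equal both when empty and when full; the counter disambiguates.
    print(fifo.read_pointer == fifo.write_pointer, fifo.fifo_full, fifo.fifo_empty)
```

The model makes the ambiguity explicit: after two writes into a two-entry FIFO the pointers are equal again, and only the counter (the extra hardware) tells full from empty.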
Transfer of data signals: Asynchronous FIFO
An asynchronous FIFO is used to transfer data across two asynchronous clock domains.
Unlike handshake signaling, an asynchronous FIFO is used in performance-critical designs, where clock latency rather than system resources is the limiting factor.
The same approach as for the synchronous FIFO can be used, with special care taken in generating the FIFO empty and FIFO full signals to avoid metastability.
Transfer of data signals: Asynchronous FIFO
FIFO full timings
FIFO full timings
Since a typical synchronizer circuit consists of at least two FFs, synchronizing the read pointer to the write clock means that a change of the read pointer is seen only after two write clocks.
This blocks additional writes to the FIFO for a few extra cycles, but it is harmless.
FIFO empty timings
FIFO empty timings
For the FIFO empty calculation, the write pointer is synchronized to the read clock and compared against the read pointer.
Because of this, the read side sees delayed writes (a signal delayed by two clocks) and may still indicate FIFO empty even though the FIFO actually holds some data, but this is harmless.
Transfer of data signals: Asynchronous FIFO
Suppose the full and empty signals are generated using a counter that is changing from FFF to 000. Metastability can be avoided by synchronizing the counter, but the sampled value may still be widely off the mark (e.g. when the counter is sampled in the middle of its update).
A possible solution is to count in Gray code, where only one bit changes from one number to the next. A synchronized Gray counter will rarely yield a metastable sampled value, and the sampled value will be wrong in at most one bit.
Synchronizing the read or write pointer to the other clock means that a change of the pointer is seen only after two (N) clocks. This blocks additional writes or reads on the FIFO for a few extra cycles, but it is harmless!
Gray Code Implementation of FIFO Pointers
STEP I: Convert the Gray value to a binary value.
STEP II: Increment the binary value depending on some condition.
STEP III: Convert the binary value back to Gray.
STEP IV: Store the final Gray value of the counter in a register.
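The four steps can be sketched in Python using the standard reflected binary Gray code. The function names are hypothetical, and the increment condition of STEP II is modeled by a simple `enable` flag.

```python
def binary_to_gray(b):
    """Convert a non-negative binary value to reflected Gray code."""
    return b ^ (b >> 1)


def gray_to_binary(g):
    """Convert a reflected Gray code value back to binary."""
    b = 0
    while g:
        b ^= g   # accumulate g ^ (g >> 1) ^ (g >> 2) ^ ...
        g >>= 1
    return b


def gray_increment(gray_value, width, enable=True):
    """Return the next Gray-coded pointer value for a counter of `width` bits."""
    binary = gray_to_binary(gray_value)              # STEP I
    if enable:                                       # STEP II
        binary = (binary + 1) & ((1 << width) - 1)
    return binary_to_gray(binary)                    # STEP III (caller stores it: STEP IV)


if __name__ == "__main__":
    pointer = 0
    for _ in range(8):
        print(format(pointer, "03b"))
        pointer = gray_increment(pointer, width=3)
    # A binary counter going from FFF to 000 flips 12 bits; the Gray pointer flips one.
    print(bin(0xFFF ^ 0x000).count("1"),
          bin(binary_to_gray(0xFFF) ^ binary_to_gray(0x000)).count("1"))
```

Since consecutive pointer values differ in a single bit, including at the wrap-around, a pointer sampled in the other clock domain is either the old or the new value, never an unrelated one.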