2015 Symposium on Asynchronous Circuits and Systems A Pausible Bisynchronous FIFO for GALS Systems Ben Keller*, Ma@hew FojCk*, Brucek Khailany* *NVIDIA CorporaCon University of California, Berkeley May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 1
The Nightmare of Global Clocking Large chips have only a handful of clock domains Each clock domain is many mm 2, even in 28nm Made up of many par$$ons Tapeout signoff currently requires distribucng balanced synchronous clocks to every flip- flop in a clock domain 25mm 1 3 2 4 6 E 7 25mm 5 8 Clock domain ParCCons then closing SETUP and HOLD on all paths in the clock domain Across dozens of PVT corners! May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 2
Globally Asynchronous, Locally Synchronous! From this May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 3
Globally Asynchronous, Locally Synchronous! Local Local Clock Generator DVCO Local Local Clock Generator DVCO to this. Internal Logic Internal Logic Partition A Partition B May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 4
Towards Fine- Grained GALS Local Local Clock Generator DVCO Internal Logic Partition A Local Local Clock Generator DVCO Internal Logic Partition B Clock Domain Crossing No global clock! Local ring oscillator generates clock in each parccon No global Cming closure Reduce Cming margin Tracks local PVT variacon May improve tracking of high- frequency noise But there s a catch May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 5
Textbook Approach: Bisynchronous FIFO Data In Data Out tx_clk FIFO Memory (Dual Port RAM) Valid Full Write Pointer Logic Write Pointer Read Pointer Brute Force Synchronizers Read Pointer Logic Empty Ready tx_clk TX Interface rx_clk RX Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 6
The Brute Force Synchronizer MTBF e t Add latency uncl metastability is most likely resolved +3 cycles of latency at every parccon boundary! Need to do be@er May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 7
Pausible Clocking Basics Metastability happens when Data_Sync transitions near the clock edge. Clock R2 Data_Sync Sync_OK OK Phase Delay Phase Mutual exclusion circuit only allows G1 OR G2 to go high (never both). Data_Sync R1 R2 G1 G2 Sync_OK Clock Generator If data arrives when clock is high, let it through. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 8
Pausible Clocking Basics Metastability happens when Data_Sync transitions near the clock edge. Clock R2 Data_Sync Sync_OK OK Phase Delay Phase Mutual exclusion circuit only allows G1 OR G2 to go high (never both). Data_Sync R1 R2 G1 G2 Sync_OK Clock Generator If data arrives when clock is low, delay it until after the clock edge. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 9
Metastability in Pausible Clocks If the metastability lasts long enough, the next clock edge is delayed until it resolves safely. Data_Sync R1 G1 Sync_OK Clock R2 OK Phase Delay Phase R2 G2 Clock Generator Data_Sync Mutex Output metastability May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 10
Clock Pauses Are Rare Mutex Delay (ps) 1E+05 1E+04 1E+03 1E+02 Average impact on cycle time: <0.1% 99.99% of metastability events resolve in <100ps 96% of synchronizations have no metastability 1E+01 1E- 07 1E- 06 1E- 05 1E- 04 1E- 03 1E- 02 1E- 01 1E+00 1E+01 1E+02 1E+03 Difference in Arrival Times (ps) May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 11
SynchronizaCon With Pausible Clock Generators Pausible Clock Generator Mutex and pausible clock safely synchronize the signal. Synchronized signal arrives after (an average of) just 1 clock cycle! Clock Generator Demystifying Data Driven and Pausible Clocking Schemes, Mullins07 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 12
Pausible Clocks: Our ContribuCon Prior work in pausible clocks: Proposed pausible clocking as a technique for low- latency asynchronous interface crossings (Yun96) Integrated pausible clocks with asynchronous FIFOs to make GALS wrappers (Moore02, Fan09, others) We propose a circuit that: Pairs pausible clocking techniques with standard two- ported synchronous FIFOs Uses a novel flow- control scheme that enables low- latency asynchronous boundary crossings Integrates easily with standard toolflows May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 13
A Pausible Bisynchronous FIFO Data In Data Out tx_clk FIFO Memory (Dual Port RAM) Valid Full Write Pointer Logic Write Pointer Read Pointer Read Pointer Logic Empty Ready tx_clk rx_clk TX Interface RX Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 14
A Pausible Bisynchronous FIFO Data In Data Out tx_clk FIFO Memory (Dual Port RAM) Valid Full Write Pointer Logic Write Pointer Pausible Clock Generator Pausible Clock Generator Read Pointer Read Pointer Logic Empty Ready tx_clk rx_clk TX Interface RX Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 15
A Pausible Bisynchronous FIFO Pausible Clock Generator Pausible Clock Generator Pointers are synchronized via two-phase increment and acknowledge signals. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 16
Two-phase signals 1. Valid data is written to the FIFO. 2. Write pointer increment is toggled. 5. Read pointer increment is toggled. 6. After the RX clock, write pointer acknowledge is toggled. 3. The write pointer increment is synchronized. 4. Valid is asserted and data is read out of the FIFO. 7. Write pointer acknowledge is synchronized. The increment signal can now safely be reused. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 17
SimulaCon Results: Brute Force Synchronizer 8 Latency (RX Cycles) 7 6 5 4 3 2 1 Typical latency: 4 cycles Typical latency: 1.25 cycles Brute Force Synchronizer Pausible Synchronizer 0 0.5 1 2 4 TX Clock Period / RX Clock Period May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 18
Pausible Clocking Timing Constraints Minimum clock period: T 2 t r2 + t g2 + t fb Maximum insertion delay: T ins T t fb t g2 t fb RX Pointer Logic Maximum same-cycle work: T CL t fb + t g2 +T ins t g2 t r2 t ins May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 19
Pausible Clocking LimitaCons: Where to Put the Mutexes? Partition Pausible Interface Delay due to physical distance Pausible Interface Clock Generator Pausible Interface Pausible Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 20
Pausible Clocking LimitaCons: Where to Put the Mutexes? OpAon 1: Mutexes at the interface Reduces maximum clock rate OpAon 2: Mutexes at the center Increases latency + Improves maximum clock rate + Can bake the mutex into a black box with the clock generator Added Delay May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 21
Playing Nice with the Tools Most of the design is synchronous and works fine with synthesis tools FIFO is a standard dual- ported memory A single custom cell is needed Custom Cming constraints at clock domain crossings May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 22
Synchronizer Comparison Design Synchronous Interface Brute Force Synchronizer Pausible Synchronizer Average Latency (cycles) Area (um 2 ) Power (mw) Energy (fj/bit) 1 4968 4.08 25.5 ~4 5005 6.03 37.7 ~1.5 4808 5.41 33.8 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 23
Summary Pausible clocks have key advantages over standard synchronizers, including low latency and zero probability of failure We designed a pausible bisynchronous FIFO that pairs pausible clocking techniques with standard two- ported synchronous FIFOs Fast asynchronous interfaces that work well with standard toolflows are a key enabling technology of fine- grained GALS design May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 24
References R. Mullins and S. Moore, DemysCfying Data- Driven and Pausible Clocking Schemes, in Proc. IEEE Symposium on Asynchronous Cir- cuits and Systems, 2007, pp. 175 185. K. Yun and R. Donohue, Pausible clocking: a first step toward heterogeneous systems, in Proc. IEEE Interna$onal Conference on Computer Design, 1996, pp. 118 123. S. Moore et al., Point to point GALS interconnect, in Proc. IEEE Symposium on Asynchronous Circuits and Systems, 2002, pp. 69 75. X. Fan et al., Analysis and opcmizacon of pausible clocking based GALS design, IEEE Interna$onal Conference Computer Design, 2009. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 25
QuesCons? May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 26
Backup Slides May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 27
A data_in Pausible Bisynchronous FIFO tx_clk Dual-Port FIFO data_out ready wptr rptr ready valid Write Pointer Logic wptr_inc rptr_inc Read Pointer Logic valid From RX Side wptr_ack LAT LAT wptr_ack_sync rptr_ack_sync LAT LAT wptr_inc_sync To TX Side wptr_ack t_pause_start r1 r2 MUTEX g1 g2 C t_pause_start tx_clk r_pause_start r1 r2 MUTEX g1 g2 C r_pause_start rx_clk tx_clk_root rx_clk_root May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 28
Mutex Circuit SR Latch Metastability Filter R1 R2 G1 G2 0 0 0 0 1 0 1 0 0 1 0 1 1 1 R1 first - > G1 high R2 first - > G2 high Only G1 or G2 can be high at once (never both) If R1 and R2 transicon high simultaneously, metastability can result The metastability filter keeps both outputs low uncl metastability resolves May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 29
SimulaCon Setup TX Clock Domain RX Clock Domain Dummy Compute (TX) Interface Dummy Compute (RX) Implemented interface in Verilog Allows direct comparison to brute- force synchronizers via drop- in replacement May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 30
Minimum Clock Period Request arrives here t fb Clock R2 OK Phase Delay Phase Minimum clock period: t g2 T 2 t r2 + t g2 + t fb t r2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 31
InserCon Delay Clock_root Requests can arrive near the clock edge! Clock R2 t ins LAT r2 OK Phase Delay Phase Maximum insertion delay: T ins T t fb t g2 Clock_root t ins May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 32
CombinaConal Logic Delay T CL t fb Maximum same-cycle work: T CL t fb + t g2 +T ins t g2 t r2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 33
Metastability Margin t m t m t fb Minimum clock period: T 2 t r2 + t g2 + t fb Metastability margin: t m = T 2 (t r2 + t g2 + t fb ) t r2 t g2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 34
Average Latency Average latency: T L = T +T ins t r2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 35