A Pausible Bisynchronous FIFO for GALS Systems



Similar documents
Demystifying Data-Driven and Pausible Clocking Schemes

PROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics

Master/Slave Flip Flops

PROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics

White Paper Understanding Metastability in FPGAs

FPGA Clocking. Clock related issues: distribution generation (frequency synthesis) multiplexing run time programming domain crossing

Lecture 7: Clocking of VLSI Systems

Low latency synchronization through speculation

Timing Methodologies (cont d) Registers. Typical timing specifications. Synchronous System Model. Short Paths. System Clock Frequency

Lecture 11: Sequential Circuit Design

CLOCK DOMAIN CROSSING CLOSING THE LOOP ON CLOCK DOMAIN FUNCTIONAL IMPLEMENTATION PROBLEMS

Lab #5: Design Example: Keypad Scanner and Encoder - Part 1 (120 pts)

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

Lecture 10: Sequential Circuits

Asynchronous Bypass Channels

Having read this workbook you should be able to: recognise the arrangement of NAND gates used to form an S-R flip-flop.

EE 42/100 Lecture 24: Latches and Flip Flops. Rev B 4/21/2010 (2:04 PM) Prof. Ali M. Niknejad

8 Gbps CMOS interface for parallel fiber-optic interconnects

Pausible Clocking: A First Step Toward Heterogeneous Systems æ

ETEC 2301 Programmable Logic Devices. Chapter 10 Counters. Shawnee State University Department of Industrial and Engineering Technologies

TIMING-DRIVEN PHYSICAL DESIGN FOR DIGITAL SYNCHRONOUS VLSI CIRCUITS USING RESONANT CLOCKING

Lesson 12 Sequential Circuits: Flip-Flops

Clocking. Figure by MIT OCW Spring /18/05 L06 Clocks 1

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

To design digital counter circuits using JK-Flip-Flop. To implement counter using 74LS193 IC.

Sequential Circuits. Combinational Circuits Outputs depend on the current inputs

Latch Timing Parameters. Flip-flop Timing Parameters. Typical Clock System. Clocking Overhead

Power Reduction Techniques in the SoC Clock Network. Clock Power

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow

WEEK 8.1 Registers and Counters. ECE124 Digital Circuits and Systems Page 1

Memory Elements. Combinational logic cannot remember

EE552. Advanced Logic Design and Switching Theory. Metastability. Ashirwad Bahukhandi. (Ashirwad Bahukhandi)

DDR subsystem: Enhancing System Reliability and Yield

IE1204 Digital Design F12: Asynchronous Sequential Circuits (Part 1)

Lecture 10: Multiple Clock Domains

Abstract. Cycle Domain Simulator for Phase-Locked Loops

Experiment # 9. Clock generator circuits & Counters. Eng. Waleed Y. Mousa

Design Verification & Testing Design for Testability and Scan

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Design and Verification of Nine port Network Router

Chapter 2 Clocks and Resets

Sequential Logic: Clocks, Registers, etc.

Sequential Logic Design Principles.Latches and Flip-Flops

ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7

Qsys and IP Core Integration

Theory of Logic Circuits. Laboratory manual. Exercise 3

Chapter 9 Latches, Flip-Flops, and Timers

Measuring Metastability

Engr354: Digital Logic Circuits

Lecture 10 Sequential Circuit Design Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

BINARY CODED DECIMAL: B.C.D.

路 論 Chapter 15 System-Level Physical Design

Flip-Flops and Sequential Circuit Design. ECE 152A Winter 2012

Flip-Flops and Sequential Circuit Design

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN

Registers & Counters

Two-Phase Clocking Scheme for Low-Power and High- Speed VLSI

Lecture-3 MEMORY: Development of Memory:

Topics of Chapter 5 Sequential Machines. Memory elements. Memory element terminology. Clock terminology

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Model-Based Synthesis of High- Speed Serial-Link Transmitter Designs

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION

Nexus: An Asynchronous Crossbar Interconnect for Synchronous System-on-Chip Designs

COMBINATIONAL and SEQUENTIAL LOGIC CIRCUITS Hardware implementation and software design

A New Paradigm for Synchronous State Machine Design in Verilog

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Switch Fabric Implementation Using Shared Memory

ASYNCHRONOUS COUNTERS

University of Texas at Dallas. Department of Electrical Engineering. EEDG Application Specific Integrated Circuit Design

An All-Digital Phase-Locked Loop with High Resolution for Local On-Chip Clock Synthesis

How To Design A Chip Layout

White Paper Utilizing Leveling Techniques in DDR3 SDRAM Memory Interfaces

7. Latches and Flip-Flops

Counters. Present State Next State A B A B

Serial Communications

Chapter 13: Verification

Alpha CPU and Clock Design Evolution

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

VHDL GUIDELINES FOR SYNTHESIS

IL2225 Physical Design

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 16 Timing and Clock Issues

Testing Low Power Designs with Power-Aware Test Manage Manufacturing Test Power Issues with DFTMAX and TetraMAX

Digital Logic Design Sequential circuits

Digital Systems Based on Principles and Applications of Electrical Engineering/Rizzoni (McGraw Hill

PowerPC Microprocessor Clock Modes

Combinational Logic Design Process

SEQUENTIAL CIRCUITS. Block diagram. Flip Flop. S-R Flip Flop. Block Diagram. Circuit Diagram

A 1.62/2.7/5.4 Gbps Clock and Data Recovery Circuit for DisplayPort 1.2 with a single VCO

ARM Webinar series. ARM Based SoC. Abey Thomas

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

A Low-Latency Asynchronous Interconnection Network with Early Arbitration Resolution

AN 223: PCI-to-DDR SDRAM Reference Design

Module 3: Floyd, Digital Fundamental

Counters and Decoders

Hunting Asynchronous CDC Violations in the Wild

Digital Fundamentals

Transcription:

2015 Symposium on Asynchronous Circuits and Systems A Pausible Bisynchronous FIFO for GALS Systems Ben Keller*, Ma@hew FojCk*, Brucek Khailany* *NVIDIA CorporaCon University of California, Berkeley May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 1

The Nightmare of Global Clocking Large chips have only a handful of clock domains Each clock domain is many mm 2, even in 28nm Made up of many par$$ons Tapeout signoff currently requires distribucng balanced synchronous clocks to every flip- flop in a clock domain 25mm 1 3 2 4 6 E 7 25mm 5 8 Clock domain ParCCons then closing SETUP and HOLD on all paths in the clock domain Across dozens of PVT corners! May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 2

Globally Asynchronous, Locally Synchronous! From this May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 3

Globally Asynchronous, Locally Synchronous! Local Local Clock Generator DVCO Local Local Clock Generator DVCO to this. Internal Logic Internal Logic Partition A Partition B May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 4

Towards Fine- Grained GALS Local Local Clock Generator DVCO Internal Logic Partition A Local Local Clock Generator DVCO Internal Logic Partition B Clock Domain Crossing No global clock! Local ring oscillator generates clock in each parccon No global Cming closure Reduce Cming margin Tracks local PVT variacon May improve tracking of high- frequency noise But there s a catch May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 5

Textbook Approach: Bisynchronous FIFO Data In Data Out tx_clk FIFO Memory (Dual Port RAM) Valid Full Write Pointer Logic Write Pointer Read Pointer Brute Force Synchronizers Read Pointer Logic Empty Ready tx_clk TX Interface rx_clk RX Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 6

The Brute Force Synchronizer MTBF e t Add latency uncl metastability is most likely resolved +3 cycles of latency at every parccon boundary! Need to do be@er May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 7

Pausible Clocking Basics Metastability happens when Data_Sync transitions near the clock edge. Clock R2 Data_Sync Sync_OK OK Phase Delay Phase Mutual exclusion circuit only allows G1 OR G2 to go high (never both). Data_Sync R1 R2 G1 G2 Sync_OK Clock Generator If data arrives when clock is high, let it through. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 8

Pausible Clocking Basics Metastability happens when Data_Sync transitions near the clock edge. Clock R2 Data_Sync Sync_OK OK Phase Delay Phase Mutual exclusion circuit only allows G1 OR G2 to go high (never both). Data_Sync R1 R2 G1 G2 Sync_OK Clock Generator If data arrives when clock is low, delay it until after the clock edge. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 9

Metastability in Pausible Clocks If the metastability lasts long enough, the next clock edge is delayed until it resolves safely. Data_Sync R1 G1 Sync_OK Clock R2 OK Phase Delay Phase R2 G2 Clock Generator Data_Sync Mutex Output metastability May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 10

Clock Pauses Are Rare Mutex Delay (ps) 1E+05 1E+04 1E+03 1E+02 Average impact on cycle time: <0.1% 99.99% of metastability events resolve in <100ps 96% of synchronizations have no metastability 1E+01 1E- 07 1E- 06 1E- 05 1E- 04 1E- 03 1E- 02 1E- 01 1E+00 1E+01 1E+02 1E+03 Difference in Arrival Times (ps) May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 11

SynchronizaCon With Pausible Clock Generators Pausible Clock Generator Mutex and pausible clock safely synchronize the signal. Synchronized signal arrives after (an average of) just 1 clock cycle! Clock Generator Demystifying Data Driven and Pausible Clocking Schemes, Mullins07 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 12

Pausible Clocks: Our ContribuCon Prior work in pausible clocks: Proposed pausible clocking as a technique for low- latency asynchronous interface crossings (Yun96) Integrated pausible clocks with asynchronous FIFOs to make GALS wrappers (Moore02, Fan09, others) We propose a circuit that: Pairs pausible clocking techniques with standard two- ported synchronous FIFOs Uses a novel flow- control scheme that enables low- latency asynchronous boundary crossings Integrates easily with standard toolflows May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 13

A Pausible Bisynchronous FIFO Data In Data Out tx_clk FIFO Memory (Dual Port RAM) Valid Full Write Pointer Logic Write Pointer Read Pointer Read Pointer Logic Empty Ready tx_clk rx_clk TX Interface RX Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 14

A Pausible Bisynchronous FIFO Data In Data Out tx_clk FIFO Memory (Dual Port RAM) Valid Full Write Pointer Logic Write Pointer Pausible Clock Generator Pausible Clock Generator Read Pointer Read Pointer Logic Empty Ready tx_clk rx_clk TX Interface RX Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 15

A Pausible Bisynchronous FIFO Pausible Clock Generator Pausible Clock Generator Pointers are synchronized via two-phase increment and acknowledge signals. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 16

Two-phase signals 1. Valid data is written to the FIFO. 2. Write pointer increment is toggled. 5. Read pointer increment is toggled. 6. After the RX clock, write pointer acknowledge is toggled. 3. The write pointer increment is synchronized. 4. Valid is asserted and data is read out of the FIFO. 7. Write pointer acknowledge is synchronized. The increment signal can now safely be reused. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 17

SimulaCon Results: Brute Force Synchronizer 8 Latency (RX Cycles) 7 6 5 4 3 2 1 Typical latency: 4 cycles Typical latency: 1.25 cycles Brute Force Synchronizer Pausible Synchronizer 0 0.5 1 2 4 TX Clock Period / RX Clock Period May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 18

Pausible Clocking Timing Constraints Minimum clock period: T 2 t r2 + t g2 + t fb Maximum insertion delay: T ins T t fb t g2 t fb RX Pointer Logic Maximum same-cycle work: T CL t fb + t g2 +T ins t g2 t r2 t ins May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 19

Pausible Clocking LimitaCons: Where to Put the Mutexes? Partition Pausible Interface Delay due to physical distance Pausible Interface Clock Generator Pausible Interface Pausible Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 20

Pausible Clocking LimitaCons: Where to Put the Mutexes? OpAon 1: Mutexes at the interface Reduces maximum clock rate OpAon 2: Mutexes at the center Increases latency + Improves maximum clock rate + Can bake the mutex into a black box with the clock generator Added Delay May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 21

Playing Nice with the Tools Most of the design is synchronous and works fine with synthesis tools FIFO is a standard dual- ported memory A single custom cell is needed Custom Cming constraints at clock domain crossings May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 22

Synchronizer Comparison Design Synchronous Interface Brute Force Synchronizer Pausible Synchronizer Average Latency (cycles) Area (um 2 ) Power (mw) Energy (fj/bit) 1 4968 4.08 25.5 ~4 5005 6.03 37.7 ~1.5 4808 5.41 33.8 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 23

Summary Pausible clocks have key advantages over standard synchronizers, including low latency and zero probability of failure We designed a pausible bisynchronous FIFO that pairs pausible clocking techniques with standard two- ported synchronous FIFOs Fast asynchronous interfaces that work well with standard toolflows are a key enabling technology of fine- grained GALS design May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 24

References R. Mullins and S. Moore, DemysCfying Data- Driven and Pausible Clocking Schemes, in Proc. IEEE Symposium on Asynchronous Cir- cuits and Systems, 2007, pp. 175 185. K. Yun and R. Donohue, Pausible clocking: a first step toward heterogeneous systems, in Proc. IEEE Interna$onal Conference on Computer Design, 1996, pp. 118 123. S. Moore et al., Point to point GALS interconnect, in Proc. IEEE Symposium on Asynchronous Circuits and Systems, 2002, pp. 69 75. X. Fan et al., Analysis and opcmizacon of pausible clocking based GALS design, IEEE Interna$onal Conference Computer Design, 2009. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 25

QuesCons? May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 26

Backup Slides May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 27

A data_in Pausible Bisynchronous FIFO tx_clk Dual-Port FIFO data_out ready wptr rptr ready valid Write Pointer Logic wptr_inc rptr_inc Read Pointer Logic valid From RX Side wptr_ack LAT LAT wptr_ack_sync rptr_ack_sync LAT LAT wptr_inc_sync To TX Side wptr_ack t_pause_start r1 r2 MUTEX g1 g2 C t_pause_start tx_clk r_pause_start r1 r2 MUTEX g1 g2 C r_pause_start rx_clk tx_clk_root rx_clk_root May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 28

Mutex Circuit SR Latch Metastability Filter R1 R2 G1 G2 0 0 0 0 1 0 1 0 0 1 0 1 1 1 R1 first - > G1 high R2 first - > G2 high Only G1 or G2 can be high at once (never both) If R1 and R2 transicon high simultaneously, metastability can result The metastability filter keeps both outputs low uncl metastability resolves May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 29

SimulaCon Setup TX Clock Domain RX Clock Domain Dummy Compute (TX) Interface Dummy Compute (RX) Implemented interface in Verilog Allows direct comparison to brute- force synchronizers via drop- in replacement May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 30

Minimum Clock Period Request arrives here t fb Clock R2 OK Phase Delay Phase Minimum clock period: t g2 T 2 t r2 + t g2 + t fb t r2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 31

InserCon Delay Clock_root Requests can arrive near the clock edge! Clock R2 t ins LAT r2 OK Phase Delay Phase Maximum insertion delay: T ins T t fb t g2 Clock_root t ins May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 32

CombinaConal Logic Delay T CL t fb Maximum same-cycle work: T CL t fb + t g2 +T ins t g2 t r2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 33

Metastability Margin t m t m t fb Minimum clock period: T 2 t r2 + t g2 + t fb Metastability margin: t m = T 2 (t r2 + t g2 + t fb ) t r2 t g2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 34

Average Latency Average latency: T L = T +T ins t r2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 35