Demystifying Data-Driven and Pausible Clocking Schemes




Demystifying Data-Driven and Pausible Clocking Schemes
Robert Mullins
Computer Architecture Group, Computer Laboratory, University of Cambridge
ASYNC 2007, 13th IEEE International Symposium on Asynchronous Circuits and Systems

System-Timing: Emerging Challenges
- Higher-level control, timing and scheduling is naturally event-driven
- Current shift is from complex monolithic designs to networks of energy-efficient cores
- Distinct block-level and system-level timing challenges
- Network-level timing
  - Physically distributed
  - Activity may be sparse
  - Interconnect delay and power are significant
  - Significant variations in temperature, supply voltage and process parameters

Combining Local and Global Approaches to Timing
- Synchronization-free approaches
- Coping with metastability
  - Timing-safe: allocate a fixed period of time for metastability to resolve, e.g. a two flip-flop synchronizer
  - Value-safe: wait for metastability to resolve, e.g. clock stretching or pausing techniques; the clock is generated locally
- Value-safe ideas are less well understood and are avoided by industry
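
The timing-safe option trades resolution time against failure rate. As a rough illustration, the sketch below evaluates the widely quoted synchronizer estimate MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data). The formula is a standard textbook model rather than something taken from the talk, and every numeric value here is a hypothetical placeholder.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Estimate mean time between failures (seconds) of a timing-safe synchronizer.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       regeneration time constant of the flip-flop (s)
    t_window:  width of the metastability window (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical device and traffic parameters, chosen only for illustration.
TAU = 20e-12
T_WINDOW = 20e-12
F_CLK = 1e9
F_DATA = 1e8

half_cycle = synchronizer_mtbf(0.5e-9, TAU, T_WINDOW, F_CLK, F_DATA)
full_cycle = synchronizer_mtbf(1.0e-9, TAU, T_WINDOW, F_CLK, F_DATA)
print(f"MTBF with 0.5 ns to resolve: {half_cycle:.3e} s")
print(f"MTBF with 1.0 ns to resolve: {full_cycle:.3e} s")
```

In this model, extra resolution time improves MTBF exponentially but never makes the failure probability zero. That is the contrast with a value-safe scheme, which waits for as long as resolution takes.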

Advantages of a value-safe approach
- Efficiency
  - Synchronization delay is minimized
  - Opportunities for optimization
- Robustness
  - Inherently robust, no trade-off against performance
  - Only way to guarantee data is never lost, no MTBF
  - Could still have functional failures if we are delayed too long and don't hit performance requirements
- Transparency
  - Synchronous block is unaffected by the clocking wrapper
  - Less true for traditional synchronization and clock-gating approaches
- Simplicity and modularity
  - I aim to illustrate how simple these schemes are

Adding an asynchronous interface to a clock generator
[Figure, built up over four slides. Labels shown: CLOCK; then Req, C, Ack, CLOCK; then C, CLOCK; then Req, MUTEX, Grant, C, CLOCK]

Input register driven by a pausible clock
[Figure]

[Figure: two clock generator templates side by side. Labels shown: Req, C, Ack, CLOCK and Req, MUTEX, Grant, C, CLOCK]
- Data-driven clock: may need to add a mechanism to ensure the block receives enough clock edges, e.g. to flush a pipeline
- Pausible clock: need to add an explicit sleep mechanism if we want to halt the clock generator during periods of inactivity
- Helps classify and understand existing techniques. In reality, the design space is a continuum
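
To make the contrast concrete, here is a small behavioural sketch of the two templates. It is a simplified model under stated assumptions, not the circuits from the talk: the data-driven generator emits one rising edge per request, while the pausible generator free-runs and delays its next edge only when a request wins arbitration. The function names and the `hold` parameter (how long a granted request keeps the clock paused) are hypothetical.

```python
def data_driven_edges(request_times, min_period):
    """One rising edge per request; with no requests the clock stays idle."""
    edges = []
    for t in sorted(request_times):
        # An edge cannot follow the previous one by less than min_period.
        earliest = edges[-1] + min_period if edges else t
        edges.append(max(t, earliest))
    return edges


def pausible_edges(request_times, period, hold, n_edges):
    """Free-running clock; a request arriving before the next edge pauses it.

    Assumption: at most one request is served per cycle, and a granted
    request keeps the clock paused until `hold` after its arrival.
    """
    pending = sorted(request_times)
    edges = [0.0]
    while len(edges) < n_edges:
        nominal = edges[-1] + period
        edge = nominal
        if pending and pending[0] < nominal:
            # The request won arbitration, so the edge waits for its release.
            edge = max(nominal, pending.pop(0) + hold)
        edges.append(edge)
    return edges


print(data_driven_edges([0, 1, 10], min_period=4))
print(pausible_edges([3.5], period=4.0, hold=2.0, n_edges=4))
print(pausible_edges([], period=4.0, hold=2.0, n_edges=3))
```

The data-driven model produces no edges without requests, which is why a flush mechanism may be needed. The pausible model keeps ticking without requests, which is why an explicit sleep mechanism may be needed.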

Stretchable Clocks
A type of data-driven clock
1. Rising clock edge is generated
2. Stretch signal may be asserted (synchronously) in response to clk+
3. Low phase of clock is stretched until some operation has completed and the stretch signal is removed
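
The three steps can be sketched as a timing calculation. This is a minimal model, assuming the operation starts at the rising edge and that the next rising edge follows immediately once both the nominal period has elapsed and the stretch signal has been removed. The function and parameter names are illustrative only.

```python
def stretchable_clock_edges(op_durations, t_high, t_low):
    """Return rising-edge times for a clock whose low phase can be stretched.

    op_durations[i] is how long the operation started by edge i takes.
    A duration of 0 means the stretch signal is never asserted that cycle.
    """
    rise = 0.0
    edges = [rise]
    for op in op_durations:
        nominal_next = rise + t_high + t_low   # unstretched next edge
        stretch_released = rise + op           # when the operation completes
        # The low phase is held until the stretch signal is removed.
        rise = max(nominal_next, stretch_released)
        edges.append(rise)
    return edges


print(stretchable_clock_edges([0.0, 5.0, 1.0], t_high=1.0, t_low=1.0))
```

Only the cycle with the long operation is lengthened; the other cycles keep the nominal period.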

Stretchable Clocks
[Figure, built up over five slides. Labels shown: Req, C, Ack, CLOCK, Stretch. Annotation: Stretch delays Ack+]

Input Ports
- Arbitrated inputs: at most one input can be served per cycle
- Synchronised inputs: cannot proceed until multiple inputs are ready
- Sampled inputs: can progress with a variable number of data inputs (or none); need to also choose the event that triggers sampling of inputs
- The paper provides implementation details for each input port type for pausible and data-driven clock generators
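
The three input port types differ in which inputs a clock cycle may consume. The sketch below expresses each as a per-cycle decision over a list of ready flags. It is an abstraction for illustration: a real arbitrated port would serve whichever request wins the arbiter, whereas this model assumes a fixed lowest-index priority.

```python
from typing import List, Optional


def arbitrated(ready: List[bool]) -> Optional[int]:
    """Serve at most one input per cycle (assumed fixed priority: lowest index)."""
    for index, is_ready in enumerate(ready):
        if is_ready:
            return index
    return None


def synchronised(ready: List[bool]) -> bool:
    """Proceed only when every input is ready."""
    return all(ready)


def sampled(ready: List[bool]) -> List[int]:
    """Proceed with whichever inputs are ready, possibly none."""
    return [index for index, is_ready in enumerate(ready) if is_ready]


flags = [False, True, True]
print(arbitrated(flags))    # index of the single input served
print(synchronised(flags))  # whether the cycle may proceed
print(sampled(flags))       # indices of all inputs consumed this cycle
```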

Output Ports
- Scheduled: ensure data is output on a particular clock cycle, stall until data is consumed
- Registered: addition of an output register allows the next computation to proceed while data is consumed
- Polled: sample the output port ready signal and take appropriate action. Clock period is only ever extended to allow metastability to resolve, not because output is blocked
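
A simplified timing model shows why a registered output port can help. It assumes each result takes a fixed `compute` time to produce and a per-item time to be consumed. The scheduled port stalls the producer during consumption, while the registered port lets the next computation overlap with it. Both functions and their parameters are hypothetical, not taken from the paper.

```python
def scheduled_finish(compute, consume_times):
    """Total time when the producer stalls until each item is consumed."""
    t = 0.0
    for consume in consume_times:
        t += compute + consume
    return t


def registered_finish(compute, consume_times):
    """Total time with a single output register between producer and consumer."""
    write = 0.0      # when the previous result entered the output register
    consumed = 0.0   # when the output register became free again
    for consume in consume_times:
        compute_done = write + compute        # next result is ready
        write = max(compute_done, consumed)   # wait only if register is full
        consumed = write + consume
    return consumed


print(scheduled_finish(2.0, [3.0, 3.0, 3.0]))
print(registered_finish(2.0, [3.0, 3.0, 3.0]))
```

In this model the registered port never finishes later than the scheduled one, because computation of the next item is hidden behind consumption of the current one.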

A GALS Wrapper Example
- Free-running clock
- Asynchronous input: we know nothing about when data will arrive
- For simplicity, let's assume we can always accept new data
- Registered output feeding an asynchronous FIFO
- Simple to combine clock generator, input and output ports

A GALS Wrapper Example
- Step 1. Local clock generator with H/S interface [figure]
- Step 2. Pausible clock template [figure]
- Step 3. Provide registered output port support (stretchable clock template) [figure]
- Step 4. [figure]

Data-Driven Clocking for On-Chip Networks
- Why is global synchrony limiting for on-chip networks?
  - Reconfigurable networks, adaptive low-voltage interconnect drivers, irregular topologies, ...
- Problem with traditional synchronization techniques
  - Latency (could easily double best-case latency; our routers are single-cycle, support VCs, < 30 FO4)
- Problems with fully asynchronous implementations
  - Latency (for the router designs we have examined)
  - More difficult to speculate?
  - Scheduling is expensive?

Data-Driven Clocking for On-Chip Routers
- Router should be clocked when one or more inputs are valid (or flits are buffered)
- Elevator analogy
  - Free-running (paternoster) elevator
    - Chain of open compartments
    - Must synchronise before you jump on!
  - Traditional elevator (data-driven clock)
    - Wait for someone to arrive
    - Close doors, decide who is in and who is out
    - Metastability issue again (potentially painful!)
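
The clocking condition for the router can be written as a one-line predicate. This is only a sketch of the stated rule; the function name and arguments are illustrative.

```python
def router_needs_clock(input_valid, buffered_flits):
    """Clock the router if any input is valid or any flit is still buffered."""
    return any(input_valid) or buffered_flits > 0


print(router_needs_clock([False, False, False], buffered_flits=0))
print(router_needs_clock([False, True, False], buffered_flits=0))
print(router_needs_clock([False, False, False], buffered_flits=2))
```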

Data-Driven Clock with Sampled Inputs
- Incoming data is either admitted or locked out
- Sample inputs when at least one input is ready (and clock is low)
[Figure labels: Local Clock Generator Template; Assert Lock (Close Lift Doors)]
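
The "close the lift doors" step can be modelled as a partition of arrivals around a lock time. The sketch assumes the lock is asserted a fixed `door_delay` after the first input is ready and the clock is low. That delay, the port names and the tie-breaking rule are all hypothetical. In hardware, an arrival that coincides with the lock is resolved by waiting out metastability, not by a comparison.

```python
def close_doors(arrivals, clock_low_from, door_delay):
    """Split inputs into those admitted this cycle and those locked out.

    arrivals:       mapping of port name to arrival time
    clock_low_from: time at which the clock went low
    door_delay:     assumed delay between first-ready and asserting Lock
    """
    if not arrivals:
        return None, [], []   # nobody arrived, so the clock is not started
    first_ready = min(arrivals.values())
    lock_time = max(first_ready, clock_low_from) + door_delay
    admitted = sorted(p for p, t in arrivals.items() if t <= lock_time)
    locked_out = sorted(p for p, t in arrivals.items() if t > lock_time)
    return lock_time, admitted, locked_out


lock, admitted, locked_out = close_doors(
    {"north": 1.0, "east": 1.2, "west": 3.0}, clock_low_from=0.5, door_delay=0.5
)
print(lock, admitted, locked_out)
```

Inputs that are locked out are not lost; they simply wait for the next clock cycle.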

Clock Tree Insertion Delays
- Delay from root to leaf of the clock tree can be considerable (certainly non-zero!)
- If every clock cycle is the same, this clock insertion delay is not normally an issue
- If we stretch the clock, the insertion delay must be considered in our timing analysis (also true for clock gating in the synchronous world)
- Not difficult to handle, but can increase the time required to admit new data

Clock Tree Insertion Delays
[Figure. Annotation: Can place logic here]

Clock Tree Insertion Delays
- How do we handle multi-cycle insertion delays?
  - In practice, we would want to avoid very large synchronous blocks
- Need to ensure we admit data on the correct clock cycle
  - Cannot cheat and promote data!
  - We simply remember on which clock cycle data has been scheduled to be admitted
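
One way to picture "remembering the scheduled cycle" is a small bookkeeping structure. It tags data with the number of the clock edge launched at the root of the tree and releases it only when that same edge reaches the leaves. This is a hypothetical software analogy, not the mechanism described in the paper; the class and method names are invented for illustration.

```python
from collections import deque


class AdmissionScheduler:
    """Track which root clock edge each piece of data was scheduled on."""

    def __init__(self):
        self.root_cycle = 0        # edges launched at the clock-tree root
        self.leaf_cycle = 0        # edges that have reached the leaf registers
        self.scheduled = deque()   # (cycle, data) pairs awaiting their edge

    def launch_edge(self, data=None):
        """Launch a new edge at the root, optionally scheduling data on it."""
        self.root_cycle += 1
        if data is not None:
            self.scheduled.append((self.root_cycle, data))
        return self.root_cycle

    def edge_arrives(self):
        """An edge reaches the leaves; admit data only if it was scheduled on it."""
        self.leaf_cycle += 1
        if self.scheduled and self.scheduled[0][0] == self.leaf_cycle:
            return self.scheduled.popleft()[1]
        return None


scheduler = AdmissionScheduler()
scheduler.launch_edge()            # edge 1 carries no data
scheduler.launch_edge("flit A")    # data is scheduled on edge 2
scheduler.launch_edge()            # edge 3 is already in flight
print([scheduler.edge_arrives() for _ in range(3)])
```

Data scheduled on edge 2 is not promoted to edge 1, even though edge 1 reaches the leaves first.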

Summary
- Value-safe techniques are simple and robust
- Powerful framework for composing synchronous subsystems
  - Build efficient event-driven global communication and scheduling infrastructure?
  - Scope for supporting low-power techniques? (self-timed power-gating, DVFS support, timing speculation)
- Scope for exploiting event-driven scheduling and clocking at system level
- Synchronization costs are low enough to prompt use in on-chip network applications
- More in the paper, which aims to be a useful survey and hopefully fills some gaps too

Thank You!