Power Reduction Techniques in the SoC Clock Network. Clock Power



Transcription:

Power Reduction Techniques in the SoC Clock Network
Low Power Design for SoCs, ASIC Tutorial

Clock Power

Why clock power is important/large
  » Generally the signal with the highest frequency
  » Typically drives a large load
      - all sequential logic elements
      - all precharged/dynamic logic
      - distributed throughout the chip, so lots of wiring
  » The DEC 21164's clock accounts for 40% of total chip power
      - 3.75 nF total clock load
      - 20 W (out of 50 W) in the clock distribution network

Processor Power Budgets

[Figure: concentric pie charts of chip power by component; the surviving legend entries are Datapath, Memory, and I/O (pads).]
  Inner circle: low-end embedded microprocessor
  Next circle: high-end CPU with on-chip cache
  Next circle: MPEG2 decoder ASIC
  Outer circle: ATM switch ASIC

Clock Power Reduction

$P_{clock} = C V_{dd}^2 f$

Minimize voltage (V) using half-swing clocks
Minimize clock load (C)
  » clock gating
  » careful routing, distributed drivers
Minimize clock frequency (f)
  » DET (double-edge-triggered) flip-flops
  » localized frequency multipliers that raise the clock frequency inside each module
GALS design approach
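The three knobs in the clock power formula can be tried out numerically. The following is a minimal sketch, assuming a hypothetical baseline of 1 nF clock load, 3.3 V supply, and 100 MHz clock; the function name `clock_power` and all numeric values are illustrative and do not come from the slides.

```python
def clock_power(c_farads, vdd_volts, f_hertz):
    """Dynamic clock power P = C * Vdd^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * f_hertz


# Hypothetical baseline (not from the slides): 1 nF, 3.3 V, 100 MHz.
base = clock_power(1e-9, 3.3, 100e6)

# Each knob applied on its own, with an illustrative factor of two.
half_load = clock_power(0.5e-9, 3.3, 100e6)  # e.g. gating hides half the load
half_freq = clock_power(1e-9, 3.3, 50e6)     # e.g. DET flip-flops at half rate
half_volt = clock_power(1e-9, 1.65, 100e6)   # clock swing halved

for name, p in [("baseline", base), ("C/2", half_load),
                ("f/2", half_freq), ("V/2", half_volt)]:
    print(f"{name:9s} {p:.3f} W  ({p / base:.0%} of baseline)")
```

Because voltage enters quadratically, halving it cuts power to a quarter, while halving the load or the frequency only cuts it in half.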

Reduced Swing Clocks

[Figure: clock waveforms. Regular: the N-device clock and the P-device clock swing between Gnd and Vdd. Half swing: separate P-device and N-device clocks, each with a reduced swing; marked levels are Vdd, Vtp, Vtn, and Gnd.]

Half Swing Clocks

Advantages
  » as long as Vtn (Vtp) is less (greater) than 1/2 Vdd, the on-off characteristics of the nFET (pFET) are unchanged
Disadvantages
  » sequential element delay approximately doubled (propagation delay and setup/hold time) due to increased on-resistance
  » half-swing clock generator is built with charge sharing, so sleep modes are problematic
  » not appropriate for very low voltage systems
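The threshold condition and the size of the saving can be checked with a few lines. This is a rough sketch under the simple C·V²·f model; the function names and the example voltages are assumptions, and `vtp_mag` is taken to be the magnitude of the pFET threshold, so the pFET switching level is `vdd - vtp_mag`.

```python
def half_swing_feasible(vdd, vtn, vtp_mag):
    """True if both device types still switch with half-swing clocks.

    The nFET needs Vtn below Vdd/2; the pFET needs its switching level
    (Vdd - |Vtp|) above Vdd/2.
    """
    n_ok = vtn < vdd / 2
    p_ok = (vdd - vtp_mag) > vdd / 2
    return n_ok and p_ok


def half_swing_saving(vdd):
    """Fractional power saving when the clock swing drops from Vdd to Vdd/2,
    with load and frequency unchanged (P proportional to swing squared)."""
    full = vdd ** 2
    half = (vdd / 2) ** 2
    return 1 - half / full


# Hypothetical operating points, chosen only for illustration.
print(half_swing_feasible(3.3, 0.7, 0.8))  # comfortable margin at 3.3 V
print(half_swing_feasible(1.0, 0.6, 0.6))  # thresholds exceed Vdd/2 at 1.0 V
print(half_swing_saving(3.3))
```

The second call illustrates why the scheme does not suit very low supply voltages: once the thresholds approach Vdd/2 the devices no longer turn on and off cleanly. The saving of three quarters under this model matches the 75% figure in the title of the Kojima reference.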

Clock Gating

Most popular method for power reduction of clock signals and functional units
  » functional units are often idle, e.g., floating point units
  » needs a circuit to generate the enable signal
      - increases complexity of control logic
      - timing is critical to avoid clock glitches at the AND gate output
  » additional gate delay on the clock signal
      - the masking AND gate can replace a buffer in the clock distribution tree

[Figure: clock and enable feed an AND gate whose output clocks the functional unit.]

Glitch Free Clock Gating

[Figure: two glitch-free gating circuits, labeled (1) and (2), each producing a gated clock for a register (REG); remaining signal labels are A and B.]
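Why enable timing matters at the AND gate can be seen in a small sampled-waveform simulation. The sketch below compares a bare AND gate with a latch-based gate, a commonly used glitch-free arrangement; it is not necessarily the circuit drawn on the slide, and the waveforms and function names are made up for illustration.

```python
def naive_gate(clk, en):
    """Gated clock from a bare AND gate: any enable change while the
    clock is high shows up directly on the output."""
    return [c & e for c, e in zip(clk, en)]


def latch_gate(clk, en):
    """Gated clock with the enable passed through a latch that is
    transparent only while the clock is low."""
    out, held = [], 0
    for c, e in zip(clk, en):
        if c == 0:
            held = e          # latch follows enable during the low phase
        out.append(c & held)  # enable is frozen during the high phase
    return out


def high_pulse_widths(wave):
    """Lengths (in samples) of each run of 1s in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Four clock cycles, 8 samples each: 4 low, then 4 high.
clk = ([0] * 4 + [1] * 4) * 4
# Enable rises and falls in the middle of clock-high phases (worst case).
en = [0] * 14 + [1] * 8 + [0] * 10

print(high_pulse_widths(naive_gate(clk, en)))  # truncated pulses
print(high_pulse_widths(latch_gate(clk, en)))  # only full-width pulses
```

With the bare AND gate the output contains pulses narrower than the 4-sample high phase, which a downstream register may or may not capture. The latch version only ever emits complete clock pulses, at the cost of an extra storage element per gate.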

Clock Gated FSM Architecture

[Figure: FSM built from a register (Reg) and combinational logic (Comb Logic); an activation function block (AF) and a latch produce the gated clock for the register.]

AF: Activation Function, which evaluates to logic 1 when the clock needs to be stopped.

Clock Tree Construction to Facilitate Gating

Can insert clock gating at multiple levels in the clock tree
Can shut off an entire subtree if all gating conditions are satisfied

[Figure: H-tree clock network showing an idle condition and the resulting gated clock.]
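Shutting off whole subtrees can be sketched as a recursive walk over the clock tree. In this illustration a tree is a nested list and each leaf is a boolean saying whether that register bank needs the clock; the representation, the function names, and the example tree are all assumptions made for the sketch.

```python
def active(node):
    """A leaf is a bool (True = its registers need the clock).
    A branch is active if any leaf below it is active."""
    if isinstance(node, bool):
        return node
    return any(active(child) for child in node)


def count_toggling(node):
    """Count tree nodes (branches and leaves) that still see clock edges
    when every fully idle subtree is gated off at its root."""
    if not active(node):
        return 0  # gate here: nothing below this point toggles
    if isinstance(node, bool):
        return 1
    return 1 + sum(count_toggling(child) for child in node)


# Hypothetical three-level binary tree with a single busy leaf.
h_tree = [[[True, False], [False, False]],
          [[False, False], [False, False]]]
all_busy = [[[True, True], [True, True]],
            [[True, True], [True, True]]]

print(count_toggling(h_tree), "of", count_toggling(all_busy), "nodes toggling")
```

Gating at the highest fully idle node removes the switching of every wire and buffer beneath it, which is why placing gates at several levels of the tree pays off more than gating only at the leaves.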

Clock Driver Distribution Comparison

Dimension (cm)   SD (W)   DD (W)
0.25             0.052    0.051
0.5              0.206    0.101
0.75             0.464    0.152
1.0              0.825    0.202
1.25             1.29     0.253
1.5              1.85     0.303
1.75             2.53     0.354

SD = single driver, DD = distributed driver (H-tree)
3.3 V supply, 100 MHz frequency, 1 micron feature size

Clock Tree Structure Affects Gating

[Figure: two clock trees, (a) and (b), over registers R1 to R4 with enable conditions x1 to x4 and internal gates A and B; the internal-node labels that survive are x1+x3 and x2+x4.]

Assuming x1, x2, x3, x4 are mutually exclusive
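The driver distribution table above can be checked for how each scheme scales with chip dimension. The sketch below is only an observation on the tabulated numbers, not a model from the slides: it tests whether single-driver power tracks the square of the dimension and distributed-driver power tracks the dimension itself. The variable names are illustrative.

```python
# Values copied from the driver distribution comparison table.
dims = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]         # chip dimension, cm
sd = [0.052, 0.206, 0.464, 0.825, 1.29, 1.85, 2.53]    # single driver, W
dd = [0.051, 0.101, 0.152, 0.202, 0.253, 0.303, 0.354]  # distributed, W

# If SD ~ k1 * d^2 and DD ~ k2 * d, these ratios should be nearly constant.
sd_per_area = [s / d ** 2 for d, s in zip(dims, sd)]
dd_per_length = [w / d for d, w in zip(dims, dd)]

for d, a, b in zip(dims, sd_per_area, dd_per_length):
    print(f"d={d:4.2f} cm  SD/d^2={a:.3f} W/cm^2  DD/d={b:.3f} W/cm")

print(f"SD/DD at {dims[-1]} cm: {sd[-1] / dd[-1]:.1f}x")
```

Both ratios stay close to constant across the table, which is consistent with single-driver power growing quadratically with dimension while distributed-driver power grows roughly linearly.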

Multiple Frequency Clocks

f < f1 < f2 < f3

[Figure: a system clock at frequency f is distributed to the bus interface, I/O controller, parallel/serial interface, and RISC core; the modules run at local frequencies f1, f2, and f3.]

Key is in the design of the local circuits used to generate the clock signal in each module

Clock Frequency Multipliers

Circuit   Tech   Input Freq   Vdd     Power Diss   Area
PLL [1]   0.8µ   50 MHz       5 V     16 mW        0.31 mm²
PLL [2]   0.5µ   50 MHz       3.3 V   10 mW        0.52 mm²
DDL [3]          33 MHz       3.8 V   49.4 mW

[1] Young, 1992   [2] Alvarez, 1995   [3] Gupta
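The trade-off behind distributing a slow system clock and multiplying it locally can be put into numbers. This is a back-of-the-envelope sketch: the 200 pF global net, the 200 MHz fast clock, and the count of three modules are hypothetical, while the 50 MHz reference, 3.3 V supply, and 10 mW per multiplier are taken from the 0.5µ row of the table above. Power inside each module's own local clock tree is the same in both cases and is left out.

```python
def clock_power(c_farads, vdd_volts, f_hertz):
    """Dynamic power P = C * Vdd^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * f_hertz


def global_fast(c_global, vdd, f_fast):
    """Global net driven directly at the fastest module frequency."""
    return clock_power(c_global, vdd, f_fast)


def global_slow_with_multipliers(c_global, vdd, f_ref, n_modules, p_mult):
    """Global net driven at the slow reference, plus one local
    frequency multiplier per module."""
    return clock_power(c_global, vdd, f_ref) + n_modules * p_mult


C_GLOBAL = 200e-12   # hypothetical global clock net capacitance
VDD = 3.3            # from the table (0.5 micron row)
F_REF = 50e6         # from the table (input frequency)
F_FAST = 200e6       # hypothetical fastest local frequency
N_MODULES = 3        # hypothetical number of multiplied domains
P_MULT = 10e-3       # from the table (power per multiplier)

fast = global_fast(C_GLOBAL, VDD, F_FAST)
slow = global_slow_with_multipliers(C_GLOBAL, VDD, F_REF, N_MODULES, P_MULT)
print(f"fast global clock: {fast * 1e3:.1f} mW")
print(f"slow reference + multipliers: {slow * 1e3:.1f} mW")
```

The multipliers are pure overhead, so the approach only wins when the global net is large enough that running it at the reference frequency saves more than the multipliers consume.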

GALS Design Style

Reduce clock power consumption by using a Globally Asynchronous, Locally Synchronous (GALS) design style
Overheads for
  » local clock generation
      - independent clock generators
      - low-power global clock reference signal with local clock frequency multipliers
  » global asynchronous communication
Skew tolerant

GALS Architecture

[Figure: the bus interface, I/O controller, parallel/serial interface, and RISC core each run from their own local clock (f1, f2, f3) and exchange data through a handshake protocol.]
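The data handshake between locally clocked modules can be illustrated with a toy simulation. The slides do not say which protocol is used; the sketch below assumes a four-phase request/acknowledge handshake, lets each side act only on its own clock ticks, and ignores synchronizer latency and metastability. All names and clock periods are hypothetical.

```python
def run_handshake(items, sender_period=3, receiver_period=5, max_time=1000):
    """Transfer items between two clock domains with a four-phase
    req/ack handshake. Returns the items in the order received."""
    req = ack = 0
    bus = None
    received = []
    pending = list(items)
    for t in range(max_time):
        if t % sender_period == 0:        # sender's local clock edge
            if req == 0 and ack == 0 and pending:
                bus = pending.pop(0)      # drive data, then raise request
                req = 1
            elif req == 1 and ack == 1:
                req = 0                   # data was taken: drop request
        if t % receiver_period == 0:      # receiver's local clock edge
            if req == 1 and ack == 0:
                received.append(bus)      # capture data, raise acknowledge
                ack = 1
            elif req == 0 and ack == 1:
                ack = 0                   # return to idle
        if not pending and req == 0 and ack == 0:
            break                         # everything sent, link idle
    return received


print(run_handshake(["a", "b", "c"]))
print(run_handshake(["a", "b", "c"], sender_period=7, receiver_period=2))
```

Each item crosses exactly once and in order whatever the ratio of the two clock periods, which is the sense in which the style is skew tolerant. The price is several local clock cycles per transfer, the communication overhead listed on the slide.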

Key References

Alvarez, A wide bandwidth low voltage PLL for PowerPC microprocessors, IEEE Journal of SSC, 30:383-391, April 1995.
Chen, A simple technique for global clock power reduction, PSU Internal Report, 1998.
Chen, Clock power issues in system-on-a-chip designs, Proc. of Workshop on VLSI, pp. 48-53, March 1999.
Friedman, Clock distribution design in VLSI circuits: An overview, Proc. of ISCAS, pp. 1475-1478, May 1994.
Gupta, Features of differential delay line used on the embedded ultra low power Intel486, developer.intel.com/design/intarch/papers/ddl486.htm
Hemani, Lowering power consumption in clock by using GALS design style, Proc. of DAC, pp. 873-878, 1999.
Kojima, Half-swing clocking scheme for 75% power saving, IEEE Journal of SSC, 30(4):432-435, April 1995.
Tellez, Activity driven clock design for low power circuits, Proc. of ICCAD, pp. 62-65, Nov. 1995.
Young, A PLL clock generator with 5 to 110MHz of lock range for microprocessors, IEEE Journal of SSC, pp. 1599-1607, Nov. 1992.