McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures




Sheng Li, Jung Ho Ahn, Richard Strong, Jay B. Brockman, Dean M. Tullsen, Norman Jouppi. MICRO 2009

How do you estimate access time? How do you estimate power? How do you estimate area? CACTI started in 1994 from the following papers. It initially modeled only area and time; power was added later.

How do you estimate access time?

What type of models are these? Are they accurate? How could you estimate more accurately? With SPICE.

Power Estimation
- Circuit-level power estimation using SPICE
- Gate-level power estimation
  - The first such tool was integrated with Synopsys around 1995 (PowerMill)
- Low-power libraries are provided along with high-speed libraries for use in synthesis
- Synopsys, Cadence, and Magma all have built-in power estimation now
- Power management primitives are being incorporated into HDL-level design

Motivation
- Tools both limit and drive research directions
- New demands
  - Multicore/manycore
  - Evaluate power, timing, and area
  - Different sources of power dissipation
  - Deep-submicron

Related Work
- Wattch: enabled a tremendous surge in power-related research
- Orion: combined Wattch's core power model with a router power model
- CACTI: first tool for rapid power, area, and timing estimation

How did Wattch work?

Wattch methodology

What's wrong with previous work?
- Wattch
  - No timing and area models
  - Only models dynamic power consumption
  - Uses a simple linear scaling model
  - RUU model from SimpleScalar
- Orion2
  - No short-circuit power or timing
- Incomplete CACTI

Contributions
- First integrated power, area, and timing modeling framework
- Models all three types of power dissipation
- Complete, integrated solution for multithreaded and multicore processor power
- Handles deep-submicron technologies, which no longer scale linearly

Overview

Integrated Approach
- Power
  - Dynamic: similar to Wattch
  - Short circuit: IEEE TCAD '00
  - Leakage: MASTAR & Intel
- Timing
  - CACTI with extension
- Area
  - CACTI
  - Empirical modeling for complex logic

Hierarchical Approach

Architecture Level
- Core
  - Divided into several main units, e.g. IFU, EXU, LSU, OOO issue/dispatch unit
- NoC
  - Signal links and routers
- On-chip caches
  - Coherent cache
- Memory controller
  - 3 main hardware structures
  - Empirical model for the physical interface (PHY)
- Clocking
  - PLL and clock distribution network
  - Empirical model for PLL power

Circuit Level
- Wires
  - Short wires as a one-section Pi-RC model
  - Long wires as a buffered wire model
- Arrays
  - Based on CACTI with extensions
- Logic
  - Highly regular: CACTI
  - Less regular: model from Intel, AMD, and Sun
  - Highly customized: empirical, from Intel and Sun
- Clock distribution network
  - Separate circuit model
  - Global, domain, and local

Validation

Processor      Average error (W / %)    1st contributor (error / % in total power)
Niagara        1.47 W / 23%             74% / 1%
Niagara2       1.87 W / 26%             47% / 5%
Alpha 21364    3.4 W / 26%              45% / 3%
Xeon Tulsa     4.2 W / 17%              29% / 3%

Scaling & Clustering
- New generation: double the number of cores
- Keep the same micro-architecture

Parameters and Benchmarks

Area and Power

Performance and Efficiency

In-order vs. OOO with M5

Figure of Merit Metrics
1. EDA^2
2. EDA
3. ED
4. ED^2
5. AT
6. AT^2
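The six metrics listed above can be computed directly once a design's energy, delay, and area are known. The sketch below is illustrative only. It assumes E is energy, D is delay, A is area, and T is time (taken here to be the same quantity as delay), which is the usual reading of these abbreviations but is not spelled out on the slide. The function name and units are hypothetical.

```python
def figures_of_merit(energy_j, delay_s, area_mm2):
    """Return the six composite metrics for one design point.

    Assumption: E = energy, D = delay, A = area, T = time (same as delay).
    Lower values are better for every metric.
    """
    return {
        "EDA^2": energy_j * delay_s * area_mm2 ** 2,
        "EDA": energy_j * delay_s * area_mm2,
        "ED": energy_j * delay_s,
        "ED^2": energy_j * delay_s ** 2,
        "AT": area_mm2 * delay_s,
        "AT^2": area_mm2 * delay_s ** 2,
    }


if __name__ == "__main__":
    # Hypothetical design point: 2 J, 3 s, 4 mm^2.
    for name, value in figures_of_merit(2.0, 3.0, 4.0).items():
        print(name, value)
```

Which metric to rank by depends on what is being traded off: ED^2 weights delay more heavily than ED, and the area-weighted variants penalize large designs.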

Power and Energy
Power is drawn from a voltage source attached to the $V_{DD}$ pin(s) of a chip.
Instantaneous power: $P(t) = i_{DD}(t)\,V_{DD}$
Energy: $E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$
Average power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$
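In practice the supply current is usually available as sampled data (for example from a circuit simulation), so the energy integral is approximated numerically. The following is a minimal sketch using the trapezoidal rule. The function names and the sample values are hypothetical, and a constant supply voltage is assumed.

```python
def energy_joules(times_s, currents_a, vdd_v):
    """Trapezoidal estimate of E = integral of i_DD(t) * V_DD dt.

    Assumes V_DD is constant and the samples are ordered in time.
    """
    energy = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        mean_current = 0.5 * (currents_a[k] + currents_a[k - 1])
        energy += mean_current * vdd_v * dt
    return energy


def average_power_watts(times_s, currents_a, vdd_v):
    """P_avg = E / T over the sampled interval."""
    duration = times_s[-1] - times_s[0]
    return energy_joules(times_s, currents_a, vdd_v) / duration


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0]   # seconds (hypothetical samples)
    i = [1.0, 1.0, 1.0]   # amperes
    print(energy_joules(t, i, 2.0), average_power_watts(t, i, 2.0))
```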

Power Dissipation in CMOS ICs
There are 4 sources of power dissipation in digital ICs:
$P_{avg} = P_{switching} + P_{shortcircuit} + P_{leakage} + P_{static}$
- $P_{avg}$ = time-averaged power
- $P_{switching}$ = switching component of power
- $P_{shortcircuit}$ = short-circuit component of power
- $P_{leakage}$ = leakage component of power
- $P_{static}$ = static component of power

Dynamic Power Dissipation
- Caused by the switching of the circuits ($P_{switching} + P_{shortcircuit}$)
- A higher operating frequency means more frequent switching activity, which increases power dissipation

Switching Power Dissipation
[Figure: circuit with supply Vdd, input Vin, output Vout, and load capacitance CL]
Switching power is required to charge and discharge load capacitances when transistors switch.
Energy/transition $= C_L V_{DD}^2$
Power $=$ Energy/transition $\cdot f \cdot a = a\,C_L\,V_{DD}^2\,f$
- $C_L$ = load capacitance
- $f$ = clock frequency
- $a$ = activity factor
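The switching-power relation above is easy to evaluate for a quick sanity check. This is a small sketch with a hypothetical function name and made-up example values. It assumes a full voltage swing equal to the supply voltage.

```python
def switching_power_watts(activity, c_load_f, vdd_v, freq_hz):
    """Switching power: a * C_L * V_DD^2 * f (full swing assumed)."""
    return activity * c_load_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical block: 1 nF total switched capacitance, 1.0 V supply,
    # 1 GHz clock, 10% of the capacitance switching per cycle.
    print(switching_power_watts(0.1, 1e-9, 1.0, 1e9))
```

Because the supply voltage enters quadratically, halving it cuts switching power to a quarter at the same frequency and activity.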

Short Circuit Current
- When transistors switch, both the NMOS and PMOS networks may be momentarily ON at once
- This leads to a blip of short-circuit current
- It is < 10% of dynamic power if rise/fall times are comparable for input and output

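Short Circuit Current
[Figure: circuit with supply Vdd, input Vin, output Vout, and load capacitance CL, alongside a plot of supply current I_VDD (mA, 0 to 0.15) versus input voltage Vin (V, 0 to 5.0)]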

Power Dissipation in CMOS ICs
$P_{avg} = P_{switching} + P_{shortcircuit} + P_{leakage} + P_{static} = a\,C_L\,V\,V_{DD}\,f + I_{SC}\,V_{DD} + I_{leakage}\,V_{DD} + I_{static}\,V_{DD}$
- $C_L$ = load capacitance
- $V$ = voltage swing (in most cases $V = V_{DD}$)
- $f$ = clock frequency
- $I_{SC}$ = short-circuit current
- $I_{leakage}$ = leakage current
- $I_{static}$ = static current
- $a$ = activity factor
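The four-term expression above can be turned into a small calculator that also reports each component's contribution. This is a sketch only: the function name, argument names, and example values are hypothetical, and the currents are treated as time-averaged values.

```python
def power_breakdown(activity, c_load_f, v_swing_v, vdd_v, freq_hz,
                    i_sc_a, i_leakage_a, i_static_a):
    """Evaluate P_avg = a*C_L*V*V_DD*f + I_SC*V_DD + I_leakage*V_DD + I_static*V_DD.

    Returns each component and the total, all in watts.
    """
    parts = {
        "switching": activity * c_load_f * v_swing_v * vdd_v * freq_hz,
        "short_circuit": i_sc_a * vdd_v,
        "leakage": i_leakage_a * vdd_v,
        "static": i_static_a * vdd_v,
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # Hypothetical values: full swing (V = V_DD = 1.0 V), 2 nF, 1 GHz, a = 0.5.
    result = power_breakdown(0.5, 2e-9, 1.0, 1.0, 1e9, 0.05, 0.2, 0.0)
    for name, watts in result.items():
        print(f"{name}: {watts:.3f} W")
```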

Leakage Power
- The power that is dissipated by the devices when they are in the idle state
- Adds to average power, not peak power
- More expensive than dynamic power
- Independent of transistor utilization

CMOS Leakage Current
[Figure: (a) NMOS transistor, (b) ideal switch, (c) leaking device]
Any guesses on what percentage of chip power is leakage power nowadays?

Leakage Power Trends Source: ITRS

Leakage Power Trends in Nanoscale Source: Intel

Transistor Leakage Current Mechanisms
There are 6 different leakage current mechanisms in CMOS devices:
- Subthreshold leakage current ($I_{sub}$): weak-inversion conduction current
- Gate oxide tunneling current ($I_{gate}$): tunneling of electrons from substrate to gate
- Reverse-biased PN junction leakage current ($I_{rev}$): current flows into the well from the drain and source
- Gate-induced drain leakage current ($I_{gidl}$): due to the high field effect in the drain junction
- Hot-carrier gate leakage current ($I_{hot}$): due to the high electric field near the Si-SiO2 interface
- Punch-through leakage current ($I_{PT}$): due to the proximity of the drain and source
Subthreshold leakage current and gate oxide tunneling current are the most significant ones.

Subthreshold Leakage Current Even though the transistor is logically turned OFF, there is a non-zero leakage current through the channel. This current is known as the subthreshold leakage current because it occurs when the gate voltage is below its threshold voltage
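The slide does not give an expression for this current. A common first-order textbook model, used here as an assumption and not taken from the slides, is $I_{sub} = I_0 \exp\big((V_{GS} - V_{th}) / (n\,v_T)\big)$ with thermal voltage $v_T = kT/q$. The sketch below uses that model to show how strongly leakage depends on the threshold voltage. The prefactor `i0_a` and the slope factor `n` are hypothetical fitting parameters, and the drain-voltage dependence is ignored.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_k):
    """v_T = kT/q, about 26 mV near room temperature."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C


def subthreshold_current(i0_a, vgs_v, vth_v, temp_k=300.0, n=1.5):
    """First-order model: I_sub = I0 * exp((V_GS - V_th) / (n * v_T)).

    Assumed textbook form; i0_a and n are illustrative fitting parameters.
    """
    return i0_a * math.exp((vgs_v - vth_v) / (n * thermal_voltage(temp_k)))


def subthreshold_swing(temp_k=300.0, n=1.5):
    """Gate-voltage change (V) that changes I_sub by a factor of 10."""
    return n * thermal_voltage(temp_k) * math.log(10.0)


if __name__ == "__main__":
    swing = subthreshold_swing()
    high_vth = subthreshold_current(1e-6, 0.0, 0.3 + swing)
    low_vth = subthreshold_current(1e-6, 0.0, 0.3)
    print(f"swing = {swing * 1e3:.1f} mV/decade, ratio = {low_vth / high_vth:.2f}")
```

Under this model, lowering the threshold voltage by one subthreshold swing raises the off-state current tenfold, which is why threshold scaling makes leakage grow so quickly.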

What Affects Leakage Power?

Static Current
- Static CMOS circuits are not supposed to consume constant static power from a constant static current flow
- Sometimes designers deviate from the CMOS design style, for example with pseudo-NMOS circuits
- This is an example where power is traded for area efficiency
- The pseudo-NMOS circuit does not require a P-transistor network and saves half the transistors required for logic computation, compared to CMOS logic
- If an output signal is known to have a very high probability of being logic 1 (say 0.99), it may make sense to implement the computation in pseudo-NMOS logic

Static Current (contd.)
- The circuit has a special property: the static current flows only when the output is at logic 0
- This property can be exploited in low-power design
- In general, the pseudo-NMOS circuit is not used for random logic
- For special circuits such as PLAs or register files, it is useful because of its efficient area usage
- In such a circuit there is a constant current flow from Vdd to Vss
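Because the static current flows only while the output is at logic 0, the average static power scales with the probability that the output is 0. The sketch below illustrates this with the 0.99 probability mentioned earlier. The function name and the current and voltage values are hypothetical.

```python
def pseudo_nmos_static_power_watts(prob_output_one, i_static_a, vdd_v):
    """Average static power of a pseudo-NMOS gate.

    Static current is assumed to flow only while the output is at logic 0,
    so the average is weighted by P(output = 0) = 1 - P(output = 1).
    """
    if not 0.0 <= prob_output_one <= 1.0:
        raise ValueError("prob_output_one must be between 0 and 1")
    return (1.0 - prob_output_one) * i_static_a * vdd_v


if __name__ == "__main__":
    # Hypothetical gate: 1 mA static current at 1.0 V.
    always_low = pseudo_nmos_static_power_watts(0.0, 1e-3, 1.0)
    mostly_high = pseudo_nmos_static_power_watts(0.99, 1e-3, 1.0)
    print(always_low, mostly_high)
```

With the output at logic 1 for 99% of the time, the gate pays only about 1% of its worst-case static power, which is what makes the area saving worthwhile in that case.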

Basic Principles of Low Power Design
Power reduction can be achieved at most design abstraction levels:
- Material
- Process technology
- Physical design (floor plan, placement & routing)
- Circuit design techniques
- Transistor sizing
- Logic restructuring
- Architecture transformation
- Alternative computation algorithm

Basic Principles of Low Power Design
- Reduce voltage
- Reduce frequency
- Reduce capacitance
- Reduce activity factor
  - Use of different number representations, counting sequences, etc.
  - Alternate logic implementation
- Reduce static components of power
  - Transistor sizing
  - Layout techniques
  - Careful circuit design

Popular Techniques in Power Aware Design
- DVFS (Dynamic Voltage and Frequency Scaling)
- Low power modes
- Power gating
- Clock gating
- Logic restructuring
- Compiler control of power
- Operating system management of power
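The effect of DVFS and clock gating on dynamic power follows from the switching-power term $a\,C_L\,V_{DD}^2\,f$. The sketch below expresses this as scale factors relative to a baseline. It is a simplification that ignores leakage and short-circuit power and assumes execution time scales inversely with frequency. All names are hypothetical.

```python
def relative_dynamic_power(voltage_scale, frequency_scale, activity_scale=1.0):
    """Dynamic power relative to baseline, from P ~ a * C_L * V^2 * f.

    Clock gating is modeled as a reduced activity_scale.
    """
    return activity_scale * voltage_scale ** 2 * frequency_scale


def relative_energy_per_task(voltage_scale, frequency_scale, activity_scale=1.0):
    """Dynamic energy per fixed task, assuming run time scales as 1/f."""
    power = relative_dynamic_power(voltage_scale, frequency_scale, activity_scale)
    return power / frequency_scale


if __name__ == "__main__":
    # Hypothetical DVFS step: scale both voltage and frequency to 80%.
    print(relative_dynamic_power(0.8, 0.8))
    print(relative_energy_per_task(0.8, 0.8))
    # Hypothetical clock gating: 30% of switching activity removed.
    print(relative_dynamic_power(1.0, 1.0, activity_scale=0.7))
```

Lowering frequency alone reduces power but not the dynamic energy per task. The energy saving in this model comes from the accompanying voltage reduction.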
