Analysis (III) Low Power Design. Kai Huang


Chinese New Year: 1.3 billion urban exodus
[Figure: interactive map by Baidu, updated hourly. The thicker, brighter lines are the busiest routes. View shown: 28.01.2014, 9 am.]
1/28/2014 Kai.Huang@tum

Outline
General Remarks
Power and Energy
Basic Techniques
o Parallelism
o VLIW (parallelism and reduced overhead)
o Dynamic Voltage Scaling
o Dynamic Power Management

Power and Energy Consumption
Power is considered the most important constraint in embedded systems. [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
Power demands are increasing rapidly, yet battery capacity cannot keep up. [in Ditzel et al.: Power-Aware Architecting for Data-Dominated Applications, 2007, Springer]

Implementation Alternatives
[Figure: power efficiency of the implementation alternatives]

Energy Efficiency (Hugo De Man, IMEC, Philips, 2007)
Necessary to optimize HW and SW.
Use heterogeneous architectures.
Apply specialization techniques.
[H. de Man, Keynote, DATE 02]


Power and Energy are Related
In many cases, faster execution also means less energy, but the opposite may be true if power has to be increased to allow faster execution.
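The relationship can be made explicit with a short formula. This is a sketch under simple assumptions: a task runs for time T at power P(t), and a faster variant finishes in T/s at constant power P'. The symbols E, T, s and P' are introduced here for illustration and do not appear on the slides.

```latex
E = \int_0^{T} P(t)\,dt
\qquad\text{(constant power: } E = P \cdot T\text{)}

% Faster variant: speed-up s > 1 at constant power P'
E' = P' \cdot \frac{T}{s}
\qquad\Longrightarrow\qquad
E' < E \iff P' < s \cdot P
```

In words: a twofold speed-up saves energy only as long as the power needed for it less than doubles.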

Low Power vs. Low Energy
Minimizing the power consumption is important for
o the design of the power supply
o the design of voltage regulators
o the dimensioning of interconnect
o cooling (short term cooling): high cost (estimated to be rising at $1 to $3 per Watt for heat dissipation [Skadron et al. ISCA 2003]), limited space
Minimizing the energy consumption is important due to
o restricted availability of energy (mobile systems)
o limited battery capacities (only slowly improving)
o very high costs of energy (solar panels, in space)
o long lifetimes, low temperatures

Power Consumption of a CMOS Gate
I_leak: leakage current (subthreshold and gate-oxide leakage)
I_int: short circuit current
I_sw: switching current

Power Consumption of CMOS Processors
Main sources:
o Dynamic power consumption: charging and discharging capacitors
o Short circuit power consumption: short circuit path between supply rails during switching
o Leakage: leaking diodes and transistors; becomes one of the major factors due to shrinking feature sizes in semiconductor technology

Dynamic Voltage Scaling (DVS)
Power consumption of CMOS circuits (ignoring leakage):
$P \sim \alpha \, C_L \, V_{dd}^2 \, f$
V_dd: supply voltage
α: switching activity
C_L: load capacitance
f: clock frequency
Delay for CMOS circuits:
$\tau \sim C_L \, \frac{V_{dd}}{(V_{dd} - V_T)^2}$
V_dd: supply voltage
V_T: threshold voltage
Decreasing V_dd reduces P quadratically (f constant). The gate delay increases only reciprocally. The maximal frequency f_max decreases linearly.
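The scaling behaviour can be checked numerically. The sketch below assumes the usual first-order CMOS model, P ~ α·C_L·V_dd²·f for dynamic power and τ ~ C_L·V_dd/(V_dd − V_T)² for gate delay. The function names, the proportionality constant `k`, and all numeric values are hypothetical and chosen only for illustration.

```python
def dynamic_power(alpha, c_load, v_dd, freq):
    """First-order dynamic power, P = alpha * C_L * V_dd^2 * f (leakage ignored)."""
    return alpha * c_load * v_dd ** 2 * freq


def gate_delay(c_load, v_dd, v_t, k=1.0):
    """First-order gate delay, tau = k * C_L * V_dd / (V_dd - V_T)^2.

    k is a technology-dependent constant (hypothetical here).
    """
    if v_dd <= v_t:
        raise ValueError("v_dd must exceed the threshold voltage v_t")
    return k * c_load * v_dd / (v_dd - v_t) ** 2


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the slides.
    alpha, c_load, freq, v_t = 0.5, 1e-9, 50e6, 0.4
    reference = gate_delay(c_load, 5.0, v_t)
    for v_dd in (5.0, 4.0, 2.5):
        power = dynamic_power(alpha, c_load, v_dd, freq)
        # Maximal frequency is inversely proportional to the gate delay.
        rel_f_max = reference / gate_delay(c_load, v_dd, v_t)
        print(f"V_dd={v_dd} V  P={power:.4f} W  f_max relative to 5 V: {rel_f_max:.2f}")
```

With V_T small compared to V_dd, halving the supply voltage quarters the power at constant frequency, while the attainable frequency drops to roughly one half.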

Potential for Energy Optimization: DVS
Saving energy for a given task:
o Reduce the supply voltage V_dd
o Reduce switching activity α
o Reduce the load capacitance C_L
o Reduce the number of cycles #cycles
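These four levers follow from the energy of a task under the first-order dynamic power model (leakage ignored). A short derivation, assuming the task runs for the time t = #cycles / f:

```latex
E \sim P \cdot t
  = \alpha \, C_L \, V_{dd}^2 \, f \cdot \frac{\#\mathit{cycles}}{f}
  = \alpha \, C_L \, V_{dd}^2 \cdot \#\mathit{cycles}
```

The clock frequency cancels. Under this model, lowering f alone does not save energy; it helps only because a lower frequency permits a lower supply voltage.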

Example: Voltage Scaling [Courtesy, Yasuura, 2000]

Power Supply Gating
Power gating is one of the most effective ways of minimizing static power consumption (leakage)
o Cut off power supply to inactive units/components
o Reduces leakage


Use of Parallelism

Use of Pipelining


New ideas help...
[Figure: Pentium and Crusoe running the same multimedia application, as published by Transmeta, www.transmeta.com]

VLIW Architectures
Large degree of parallelism
o many computational units, (deeply) pipelined
Simple hardware architecture
o explicit parallelism (parallel instruction set)
o parallelization is done offline (compiler)

Transmeta is a typical VLIW Architecture
128-bit instructions (bundles):
o 4 operations per instruction
o 2 combinations of instructions allowed
Register files
o 64 integer, 32 floating point
Some interesting features
o 6 stage pipeline (2x fetch, decode, register read, execute, write)
o x86 ISA execution using software techniques
   Skip the binary compatibility problem!!
   Interpretation and just-in-time binary translation
o Speculation support
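To make the bundle idea concrete, here is a minimal sketch of packing and unpacking a 128-bit bundle that carries 4 operations. It assumes four equal 32-bit slots with slot 0 in the most significant bits. The slides do not give the actual encoding, so the layout and the names `split_bundle` and `pack_bundle` are purely illustrative.

```python
from typing import List

BUNDLE_BITS = 128
OPS_PER_BUNDLE = 4
# Assumption: the four operations occupy equal-width slots (32 bits each).
SLOT_BITS = BUNDLE_BITS // OPS_PER_BUNDLE
SLOT_MASK = (1 << SLOT_BITS) - 1


def split_bundle(bundle: int) -> List[int]:
    """Split a 128-bit bundle into its four operation words (slot 0 first)."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    return [
        (bundle >> (SLOT_BITS * (OPS_PER_BUNDLE - 1 - slot))) & SLOT_MASK
        for slot in range(OPS_PER_BUNDLE)
    ]


def pack_bundle(ops: List[int]) -> int:
    """Pack four operation words into one bundle; inverse of split_bundle."""
    if len(ops) != OPS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly 4 operations")
    bundle = 0
    for op in ops:
        if not 0 <= op <= SLOT_MASK:
            raise ValueError("each operation must fit in 32 bits")
        bundle = (bundle << SLOT_BITS) | op
    return bundle
```

Because the compiler (or, for Crusoe, the translation software) decides offline which operations share a bundle, the hardware only needs to dispatch the slots to its functional units.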

Transmeta


Spatial vs. Dynamic Voltage Management


Example: Intel XScale
The OS should schedule the distribution of the energy budget.

DVS Example: a) Complete Task ASAP
Task that needs to execute 10⁹ cycles within 25 seconds.

V_dd [V]               5.0   4.0   2.5
Energy per cycle [nJ]  40    25    10
f_max [MHz]            50    40    25
Cycle time [ns]        20    25    40

10⁹ cycles @ 50 MHz, finishing at t = 20 s, before the 25 s deadline.
$E_a = 10^9 \cdot 40 \cdot 10^{-9}\,\mathrm{J} = 40\,\mathrm{J}$
[Figure: V_dd² [V²] over time t [s], deadline at 25 s]

DVS Example: b) Two Voltages
Same task and operating points as in a).
750M cycles @ 50 MHz + 250M cycles @ 25 MHz, finishing exactly at the deadline (15 s + 10 s).
$E_b = 750 \cdot 10^6 \cdot 40 \cdot 10^{-9}\,\mathrm{J} + 250 \cdot 10^6 \cdot 10 \cdot 10^{-9}\,\mathrm{J} = 32.5\,\mathrm{J}$
[Figure: V_dd² [V²] over time t [s], deadline at 25 s]

DVS Example: c) Optimal Voltage
Same task and operating points as in a).
10⁹ cycles @ 40 MHz, finishing exactly at the deadline.
$E_c = 10^9 \cdot 25 \cdot 10^{-9}\,\mathrm{J} = 25\,\mathrm{J}$
[Figure: V_dd² [V²] over time t [s], deadline at 25 s]
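The three cases of the DVS example can be reproduced with a few lines of code. The sketch below takes the operating points from the example (energy per cycle and f_max for each supply voltage) and evaluates a schedule given as a list of (V_dd, cycles) segments. The names `OPERATING_POINTS` and `run_schedule` are illustrative and not part of the slides.

```python
# V_dd [V] -> (energy per cycle [nJ], f_max [MHz]), as in the example.
OPERATING_POINTS = {
    5.0: (40, 50),
    4.0: (25, 40),
    2.5: (10, 25),
}


def run_schedule(schedule):
    """Return (energy [J], time [s]) for a list of (v_dd, cycles) segments.

    Each segment runs at the maximal frequency of its supply voltage.
    """
    energy_nj = 0
    time_s = 0.0
    for v_dd, cycles in schedule:
        nj_per_cycle, f_max_mhz = OPERATING_POINTS[v_dd]
        energy_nj += cycles * nj_per_cycle
        time_s += cycles / (f_max_mhz * 10**6)
    return energy_nj / 1e9, time_s


if __name__ == "__main__":
    deadline_s = 25.0
    cases = {
        "a) ASAP at 5.0 V": [(5.0, 10**9)],
        "b) two voltages": [(5.0, 750 * 10**6), (2.5, 250 * 10**6)],
        "c) single 4.0 V": [(4.0, 10**9)],
    }
    for name, schedule in cases.items():
        energy, time_s = run_schedule(schedule)
        met = "met" if time_s <= deadline_s else "missed"
        print(f"{name}: {energy:.1f} J in {time_s:.1f} s (deadline {met})")
```

The results match the example: 40 J, 32.5 J and 25 J. Running at the lowest single voltage that just meets the deadline is cheapest among the three schedules.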


Dynamic Power vs. Static Power


Dynamic Power Management (DPM)

Reduce Power According to Workload

Reduce Static Power Example
Assumption
o Given arrival curve, buffer size and deadline requirement, power parameters
Problem statement
o To determine the on/off periods such that
   energy consumption is minimized
   no deadline violation and no buffer overflow occurs
For details, see the HuangDPMOffline2009 paper.

Basic Idea: Use RTC (Real-Time Calculus) to Compute Bounds
Two bounds are computed: the service demand to avoid deadline violation, and the service demand to avoid buffer overflow.

Basic Idea: Choose the Bound of Min Energy
Derive a periodic on/off curve whose energy consumption is minimized.
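The selection step can be sketched as follows. This is not the method of the referenced paper: the power model (constant `p_on` and `p_off` plus a fixed switching energy `e_switch` per period) and the `is_feasible` predicate, which stands in for the bounds on deadline violation and buffer overflow, are assumptions made for illustration, as are all function names.

```python
def average_power(t_on, t_off, p_on, p_off, e_switch):
    """Average power of a periodic on/off scheme over one period.

    e_switch is the (hypothetical) energy overhead of one off/on switch pair.
    """
    period = t_on + t_off
    if t_on < 0 or t_off < 0 or period <= 0:
        raise ValueError("t_on and t_off must be non-negative, period positive")
    return (p_on * t_on + p_off * t_off + e_switch) / period


def best_periodic_scheme(candidates, is_feasible, p_on, p_off, e_switch):
    """Pick the feasible (t_on, t_off) pair with the lowest average power.

    is_feasible(t_on, t_off) must return True only if the scheme causes
    neither a deadline violation nor a buffer overflow.
    Returns None if no candidate is feasible.
    """
    feasible = [c for c in candidates if is_feasible(c[0], c[1])]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda c: average_power(c[0], c[1], p_on, p_off, e_switch),
    )
```

For example, with `p_on=2.0`, `p_off=0.0` and no switching overhead, the pair (1, 3) has a lower average power than (1, 1); it is chosen only if the feasibility check admits an off period of that length.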

Bounding Delay Approximation
From two parameters to only T_off