Power Reduction Techniques in the SoC Clock Network

Low Power Design for SoCs ASIC Tutorial

Clock Power

Why clock power is important/large
  » Generally the signal with the highest frequency
  » Typically drives a large load
      » all sequential logic elements
      » all precharged/dynamic logic
      » distributed throughout chip, so lots of wiring
  » DEC 21164's clock accounts for 40% of total chip power
      » 3.75 nF total clock load
      » 20 W (out of 50 W) in clock distribution network
Processor Power Budgets

[Figure: concentric power-budget charts with segments including Datapath, Memory, and I/O (pads)]
  » Inner circle: low end embedded microprocessor
  » Next circle: high end CPU with on-chip cache
  » Next circle: MPEG2 decoder ASIC
  » Outer circle: ATM switch ASIC

Clock Power Reduction

$P_{\mathrm{clock}} = C \, V_{dd}^{2} \, f$

Minimize voltage (V) using half swing clocks
Minimize clock load (C)
  » clock gating
  » careful routing, distributed drivers
Minimize clock frequency (f)
  » DET (double edge triggered) flipflops
  » localized circuits to multiply frequency of clock
GALS design approach
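The relation P_clock = C Vdd^2 f on the Clock Power Reduction slide can be made concrete with a short calculation. The sketch below is illustrative only: the baseline load, supply, and frequency are assumed values picked for the example, and the names `clock_power`, `BASE`, and `saving` are hypothetical. It shows how much each of the three knobs (V, C, f) saves when halved.

```python
def clock_power(c_farads, vdd_volts, f_hertz):
    """Dynamic clock power P = C * Vdd^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * f_hertz


# Assumed baseline for illustration (not taken from the slides):
# 1 nF clock load, 3.3 V supply, 100 MHz clock.
BASE = dict(c_farads=1e-9, vdd_volts=3.3, f_hertz=100e6)


def saving(**changes):
    """Fractional power saving when some baseline parameters are replaced."""
    base = clock_power(**BASE)
    new = clock_power(**{**BASE, **changes})
    return 1.0 - new / base


print("baseline power (W):", clock_power(**BASE))
print("half swing clock  :", saving(vdd_volts=3.3 / 2))   # quadratic in V
print("half the load     :", saving(c_farads=0.5e-9))     # linear in C
print("half the frequency:", saving(f_hertz=50e6))        # linear in f
```

Because power is quadratic in the clock swing, halving V removes about three quarters of the power, which matches the 75% figure in the title of the Kojima reference. Halving C or f removes only half.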
Reduced Swing Clocks

[Figure: regular versus half swing clock waveforms for the N-device clock and the P-device clock, drawn between Vdd and Gnd, with the threshold levels Vtp and Vtn marked on the half swing waveforms]

Half Swing Clocks

Advantages
  » as long as Vtn (Vtp) is less (greater) than 1/2 Vdd, the on-off characteristics of the nfet (pfet) are unchanged
Disadvantages
  » sequential element delay approx. doubled (propagation delay and setup/hold time) due to increased on-resistance
  » half-swing clock generator done via charge sharing, so sleep modes problematic
  » not appropriate for very low voltage systems
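The threshold condition on the Half Swing Clocks slide also explains why the scheme does not suit very low voltage systems. The sketch below is one possible reading of that condition, not a circuit model: it assumes Vtn is measured up from Gnd and that the Vtp level in the figure is Vdd minus the pfet threshold magnitude. The function name and the example voltages are hypothetical.

```python
def half_swing_ok(vdd, vtn, vtp_magnitude):
    """Check the slide's condition for half swing clocking.

    Assumed interpretation:
      - the nfet turn-on level is vtn above Gnd and must lie below Vdd/2
      - the pfet turn-on level is vdd - vtp_magnitude and must lie above Vdd/2
    """
    half = vdd / 2.0
    vtp_level = vdd - vtp_magnitude
    return vtn < half and vtp_level > half


# Assumed example values: 0.7 V threshold magnitudes for both devices.
print(half_swing_ok(vdd=3.3, vtn=0.7, vtp_magnitude=0.7))  # thresholds inside the half swing
print(half_swing_ok(vdd=1.2, vtn=0.7, vtp_magnitude=0.7))  # Vdd/2 falls below the threshold
```

With these assumed thresholds the condition holds at 3.3 V but fails at 1.2 V, where half the supply is no longer enough to turn the devices on.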
Gating

Most popular method for power reduction of clock signals and functional units
  » often idle functional units, e.g., floating point units
  » need circuit to generate enable signal

[Figure: clock and enable feed an AND gate whose output clocks the functional unit]

  » increases complexity of control logic
  » timing critical to avoid clock glitches at AND gate output
  » additional gate delay on clock signal
      » masking AND gate can replace a buffer in the clock distribution tree

Glitch Free Gating

[Figure: glitch free gating schemes (1) and (2) with inputs A and B, a register (REG), and the resulting gated clock waveforms]
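The timing hazard at the AND gate output can be illustrated with a small discrete-time simulation. The sketch below is an assumption-laden illustration, not a reconstruction of the circuits in the figure: it compares a bare AND gate against a commonly used alternative in which a latch, transparent while the clock is low, holds the enable steady during the high phase. The waveforms, the active-high enable, and all function names are hypothetical.

```python
def clock(t, period=8):
    """Square wave: low for the first half of each period, high for the second."""
    return 1 if (t % period) >= period // 2 else 0


def enable(t):
    """Assumed enable that changes while the clock is high (at t=6 and t=22)."""
    return 1 if 6 <= t < 22 else 0


def naive_gated(steps):
    """Clock ANDed directly with the raw enable."""
    return [clock(t) & enable(t) for t in range(steps)]


def latched_gated(steps):
    """Enable passes through a latch that is transparent only while clock is low."""
    q = 0
    out = []
    for t in range(steps):
        if clock(t) == 0:
            q = enable(t)          # latch follows enable during the low phase
        out.append(clock(t) & q)   # q is frozen during the high phase
    return out


def pulse_widths(wave):
    """Lengths of each run of consecutive 1s in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


print("naive  :", pulse_widths(naive_gated(32)))
print("latched:", pulse_widths(latched_gated(32)))
```

In this run the bare AND gate produces truncated pulses whenever the enable changes during the high phase, while the latched version emits only full-width pulses. The price is that an enable change takes effect one clock phase later.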
Gated FSM Architecture

[Figure: blocks Reg, Comb Logic, AF, and Latch, producing a gated clock]

AF = activation function, which evaluates to logic 1 when the clock needs to be stopped.

Tree Construction to Facilitate Gating

Can insert clock gating at multiple levels in clock tree
Can shut off entire subtree if all gating conditions are satisfied

[Figure: H-tree network in which an idle condition produces a gated clock]
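The idea of shutting off an entire subtree can be sketched with a small tree model. This is a simplified illustration under stated assumptions: each node carries an arbitrary capacitance value, a gate is assumed to sit at the input of every node, and a subtree is gated only when all leaves beneath it are idle. The class, function names, and capacitance values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClockNode:
    name: str
    cap: float                      # capacitance switched by this node (arbitrary units)
    idle: bool = False              # meaningful for leaves only
    children: List["ClockNode"] = field(default_factory=list)


def subtree_idle(node):
    """A leaf is idle if flagged; an internal node is idle if every child is."""
    if not node.children:
        return node.idle
    return all(subtree_idle(c) for c in node.children)


def switched_cap(node):
    """Total capacitance still toggling, gating every fully idle subtree."""
    if subtree_idle(node):
        return 0.0
    return node.cap + sum(switched_cap(c) for c in node.children)


def build_tree(idle_leaves=()):
    """Two-level binary tree with assumed capacitances: root 4, branches 2, leaves 1."""
    leaves = [ClockNode(f"leaf{i}", 1.0, idle=(i in idle_leaves)) for i in range(4)]
    left = ClockNode("left", 2.0, children=leaves[:2])
    right = ClockNode("right", 2.0, children=leaves[2:])
    return ClockNode("root", 4.0, children=[left, right])


print(switched_cap(build_tree()))                    # nothing idle
print(switched_cap(build_tree(idle_leaves={0})))     # one leaf gated
print(switched_cap(build_tree(idle_leaves={0, 1})))  # whole left subtree gated
```

Gating one leaf saves only that leaf's load, whereas idling both leaves under the same branch also removes the branch itself. That is why the grouping of modules in the tree matters.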
Driver Distribution Comparison

Dimension (cm)   SD (W)   DD (W)
0.25             0.052    0.051
0.5              0.206    0.101
0.75             0.464    0.152
1.0              0.825    0.202
1.25             1.29     0.253
1.5              1.85     0.303
1.75             2.53     0.354

SD = single driver, DD = distributed driver (H-tree)
3.3 V supply, 100 MHz frequency, 1 micron feature size

Tree Structure Affects Gating

[Figure: two clock trees, (a) and (b), each with gating points A and B driving registers R1 to R4 under enable conditions x1 to x4; combined enables x1+x3 and x2+x4 are labeled]

Assuming x1, x2, x3, x4 are mutually exclusive
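The trend in the Driver Distribution Comparison table can be quantified by fitting a power law to each column. The sketch below uses only the values from the table. The helper name `scaling_exponent` is hypothetical, and a least-squares fit on log-log axes is just one convenient way to estimate the exponent.

```python
import math

# Values copied from the Driver Distribution Comparison table.
DIM_CM = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
SD_W = [0.052, 0.206, 0.464, 0.825, 1.29, 1.85, 2.53]
DD_W = [0.051, 0.101, 0.152, 0.202, 0.253, 0.303, 0.354]


def scaling_exponent(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


print("single driver exponent     :", round(scaling_exponent(DIM_CM, SD_W), 2))
print("distributed driver exponent:", round(scaling_exponent(DIM_CM, DD_W), 2))
print("SD/DD ratio at 1.75 cm     :", round(SD_W[-1] / DD_W[-1], 2))
```

The fit suggests that single-driver power grows roughly with the square of the chip dimension while distributed-driver power grows roughly linearly, so the gap widens to about a factor of seven at 1.75 cm.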
Multiple Frequency Clocks

f < f1 < f2 < f3

[Figure: a system clock at frequency f distributed to modules (bus interface, I/O controller, parallel serial interface, RISC core) that run at local frequencies f1, f2, f3]

Key is in the design of the local circuits used to generate the clock signal in each module

Frequency Multipliers

Circuit   Tech     Input Freq   Vdd     Power Diss   Area
PLL [1]   0.8 µm   50 MHz       5 V     16 mW        0.31 mm²
PLL [2]   0.5 µm   50 MHz       3.3 V   10 mW        0.52 mm²
DDL [3]   -        33 MHz       3.8 V   49.4 mW      -

[1] Young, 1992
[2] Alvarez, 1995
[3] Gupta
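A rough estimate shows why distributing a slow system clock and multiplying it locally can pay off. The sketch below is a back-of-the-envelope comparison under assumptions, not a design rule: the global network capacitance, the 200 MHz target frequency, and the module count are invented for illustration, while the 50 MHz input frequency, 3.3 V supply, and 10 mW multiplier power are taken from the second row of the Frequency Multipliers table. Function names are hypothetical.

```python
def network_power(c_farads, vdd_volts, f_hertz):
    """Dynamic power of a clock network, P = C * Vdd^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * f_hertz


def global_power_fast(c_global, vdd, f_fast):
    """Global network driven directly at the fastest module frequency."""
    return network_power(c_global, vdd, f_fast)


def global_power_slow(c_global, vdd, f_slow, n_modules, multiplier_watts):
    """Global network at the slow system clock plus one multiplier per module."""
    return network_power(c_global, vdd, f_slow) + n_modules * multiplier_watts


C_GLOBAL = 1e-9      # assumed global clock network load (F)
VDD = 3.3            # from the table (V)
F_SLOW = 50e6        # from the table (Hz)
F_FAST = 200e6       # assumed fastest local frequency (Hz)
N_MODULES = 3        # assumed number of locally clocked modules
PLL_WATTS = 0.010    # from the table, second row (W)

fast = global_power_fast(C_GLOBAL, VDD, F_FAST)
slow = global_power_slow(C_GLOBAL, VDD, F_SLOW, N_MODULES, PLL_WATTS)
print("global clock at f3     (W):", round(fast, 4))
print("global clock at f + PLLs (W):", round(slow, 4))
```

Only the global network is compared here. Each module's local clock load still switches at its own frequency in both cases, so it is left out of the difference.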
GALS Design Style

Reduce clock power consumption by using a Globally Asynchronous, Locally Synchronous (GALS) design style
Overheads for
  » local clock generation
      » independent clock generators
      » low power global clock reference signal with local clock frequency multipliers
  » global asynchronous communication
Skew tolerant

GALS Architecture

[Figure: bus interface (f1), I/O controller, parallel serial interface (f3), and RISC core (f2), each locally clocked and exchanging data through a handshake protocol]
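The slides name a data handshake protocol between locally clocked modules without specifying it. The sketch below assumes a four-phase request/acknowledge handshake, which is one common choice, and simulates a sender and a receiver that run on unrelated clock periods. It ignores synchronizer latency and metastability, which a real design has to handle. All names and periods are hypothetical.

```python
def simulate(items, sender_period=3, receiver_period=5, max_time=1000):
    """Transfer items across two clock domains with a four-phase handshake.

    Phases: sender raises req with data, receiver captures and raises ack,
    sender drops req, receiver drops ack.
    """
    req = ack = 0
    data = None
    pending = list(items)
    received = []
    for t in range(max_time):
        if t % sender_period == 0:              # sender's local clock edge
            if req == 0 and ack == 0 and pending:
                data = pending.pop(0)           # place data, then request
                req = 1
            elif req == 1 and ack == 1:
                req = 0                         # transfer acknowledged
        if t % receiver_period == 0:            # receiver's local clock edge
            if req == 1 and ack == 0:
                received.append(data)           # capture data, then acknowledge
                ack = 1
            elif req == 0 and ack == 1:
                ack = 0                         # return to idle
        if not pending and req == 0 and ack == 0:
            break                               # everything sent and link idle
    return received


print(simulate(["a", "b", "c"]))
print(simulate([1, 2, 3, 4], sender_period=7, receiver_period=2))
```

Because each side only reacts to the other's signal level, the transfer completes in order for any pair of clock periods in this model. This is the sense in which the GALS style tolerates skew between modules.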
Key References

Alvarez, A wide bandwidth low voltage PLL for PowerPC microprocessors, IEEE Journal of SSC, 30:383-391, April 1995.
Chen, A simple technique for global clock power reduction, PSU Internal Report, 1998.
Chen, Clock power issues in system-on-a-chip designs, Proc. of Workshop on VLSI, pp. 48-53, March 1999.
Friedman, Clock distribution design in VLSI circuits: An Overview, Proc. of ISCAS, pp. 1475-1478, May 1994.
Gupta, Features of differential delay line used on the embedded ultra low power Intel486, developer.intel.com/design/intarch/papers/ddl486.htm
Hemani, Lowering power consumption in clock by using GALS design style, Proc. of DAC, pp. 873-878, 1999.
Kojima, Half-swing clocking scheme for 75% power saving, IEEE Journal of SSC, 30(4):432-435, April 1995.
Tellez, Activity driven clock design for low power circuits, Proc. of ICCAD, pp. 62-65, Nov. 1995.
Young, A PLL clock generator with 5 to 110MHz of lock range for microprocessors, IEEE Journal of SSC, pp. 1599-1607, Nov. 1992.