«A 32-bit DSP Ultra Low Power accelerator»




E. Beigné, edith.beigne@cea.fr
CEA LETI MINATEC, Grenoble, France
www.cea.fr

- Low power SoC challenges: energy efficiency
- Fine-grain AVFS solutions
- FDSOI technology brings a new actuator
- Fine-grain AVFS with body biasing
- Silicon results in FDSOI 28nm
- Future IoT needs and conclusions
CEA. All rights reserved. Diego Puschini, ICDV 2014

Low Power SoC challenges
- Application: applicative flexibility, unpredictable load, multi-core architectures
- Technology: process variability, aging
- Architecture: massively parallel architectures
CEA. All rights reserved. 2015 COOL Chips, E. Beigné

Energy Efficiency
- Power consumption management follows two objectives today: reduce the overall power, or increase the global energy efficiency
- Performance no longer means only high frequency, but also low power
- The high performance / low power trade-off is difficult to reach
- Minimum power consumed at a given frequency

Dynamic Voltage Frequency Scaling
- DVFS for power-performance trade-off
- [Figure: a power domain containing the core, with a V regulator and an F generator driven by a controller (Ctrl.); plots with power, frequency and voltage axes]
- VDD and F are adjusted to ensure performance at minimum consumption
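To illustrate why voltage and frequency are scaled together, here is a minimal sketch based on the textbook first-order CMOS dynamic power model, P = a * C * VDD^2 * F. The model, the function name `dynamic_power` and all numeric values are illustrative assumptions; none of them come from the slides, and leakage is ignored.

```python
def dynamic_power(activity: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = a * C * VDD^2 * F (leakage ignored)."""
    return activity * cap_farads * vdd ** 2 * freq_hz


# Hypothetical operating points (values chosen for illustration only).
nominal = dynamic_power(0.1, 1e-9, 1.0, 1e9)    # 1.0 V, 1 GHz
scaled = dynamic_power(0.1, 1e-9, 0.5, 0.5e9)   # 0.5 V, 500 MHz

# In this model, halving both VDD and F divides dynamic power by 8.
ratio = nominal / scaled
```

The quadratic dependence on VDD is the reason a DVFS controller would lower the supply whenever the target frequency allows it.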

Adaptive Voltage Frequency Scaling
- AVFS approach: Fmax and PVT monitors, data fusion, adjustment
- [Figure: power domain with V regulator, Ctrl., F generator and core, plus the adjustment and data fusion blocks; plot of frequency vs. voltage]
- F is adjusted to reach Fmax for the given conditions (VDD, T)
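One adjustment step of such a loop could be sketched as follows. The slides do not specify the fusion rule, so this sketch assumes the most pessimistic monitor wins and that the clock is programmed a guard margin below the fused estimate; `fuse_fmax`, `avfs_step`, the margin and the readings are all hypothetical.

```python
from typing import Sequence


def fuse_fmax(estimates_mhz: Sequence[float]) -> float:
    """Data fusion (assumed rule): trust the most pessimistic monitor."""
    if not estimates_mhz:
        raise ValueError("at least one monitor estimate is required")
    return min(estimates_mhz)


def avfs_step(estimates_mhz: Sequence[float], margin: float = 0.05) -> float:
    """Return the clock frequency to program, a guard margin below the fused Fmax."""
    return fuse_fmax(estimates_mhz) * (1.0 - margin)


# Hypothetical monitor readings at the current (VDD, T) conditions.
f_set = avfs_step([980.0, 1000.0, 1015.0], margin=0.05)
```

Calling `avfs_step` again after VDD or temperature changes would move the programmed frequency with the new estimates, which is the adaptive part of the scheme.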

Body Biasing and Multi-VT in UTBB FD-SOI
- RVT, bulk-like well (NMOS over p-well, PMOS over n-well)
  - VBPMOS from -300mV to 3V
  - VBNMOS from -3V to 300mV
- LVT, flip-well (NMOS over n-well, PMOS over p-well)
  - VBPMOS from -3V to 300mV
  - VBNMOS from -300mV to 3V
[Flatresse et al., ISSCC 2013]
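The bias limits on this slide can be captured in a small lookup table. The sketch below reads each pair of voltages on the slide as a (min, max) range, which is an assumption about the original figure; the names `BODY_BIAS_LIMITS` and `bias_allowed` are hypothetical.

```python
# Body-bias limits in volts as (min, max), transcribed from the slide.
BODY_BIAS_LIMITS = {
    "RVT": {"nmos": (-3.0, 0.3), "pmos": (-0.3, 3.0)},  # bulk-like well
    "LVT": {"nmos": (-0.3, 3.0), "pmos": (-3.0, 0.3)},  # flip-well
}


def bias_allowed(flavor: str, device: str, vb: float) -> bool:
    """Check a requested body-bias voltage against the range for that device."""
    lo, hi = BODY_BIAS_LIMITS[flavor][device]
    return lo <= vb <= hi


# A +2 V NMOS bias fits the LVT (flip-well) range but not the RVT range.
lvt_ok = bias_allowed("LVT", "nmos", 2.0)
rvt_ok = bias_allowed("RVT", "nmos", 2.0)
```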

AVFS with Adaptive Body Biasing
- ABB in FD-SOI: Ultra Wide Voltage Range (UWVR); body bias increases efficiency
- [Figure: power domain with V regulator, Ctrl., F generator, BB generator and core, plus adjustment and data fusion; plots of frequency and efficiency vs. voltage, with VDD from 0 to 1.3V and the UWVR marked]
- VBB should be chosen to minimize consumption
Source: http://www.st.com/web/en/about_st/learn_fd-soi.html

Fine grain AVFS with ABB
- Granularity in complex SoC
  - Fine grain requires per-core actuators
  - Low area/complexity enables fine-grain AVFS with ABB
- Concession on performance: speed, discretization, switching cost, efficiency
- [Figure: several power domains, each with its own core, V regulator, Ctrl., F generator, BB generator, adjustment and data fusion; plot of frequency vs. voltage as an example of VDD discretization]
- Actuator constraints must be considered in the management policy
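The discretization constraint can be illustrated with a short sketch: a low-area regulator offers only a few supply levels, so a requested voltage has to be rounded up to the next one. The level values and the helper `snap_vdd_up` are hypothetical and are not taken from the chip.

```python
import bisect
from typing import Sequence


def snap_vdd_up(required_v: float, levels_v: Sequence[float]) -> float:
    """Return the lowest available supply level that is >= required_v."""
    levels = sorted(levels_v)
    i = bisect.bisect_left(levels, required_v)
    if i == len(levels):
        raise ValueError("required voltage exceeds the highest available level")
    return levels[i]


# Hypothetical set of discrete supply levels (volts).
LEVELS = [0.4, 0.6, 0.8, 1.0, 1.3]
chosen = snap_vdd_up(0.65, LEVELS)
```

Rounding up preserves timing at the cost of some efficiency, which is one form of the performance concession listed on the slide.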

Summarizing AVFS with ABB in FD-SOI
- VDD generator: low area for fine grain
- Fmax monitor for adaptive approach
- T monitor for adaptive approach
- F generator: low area for fine grain
- VBB generator to improve efficiency
- [Figure: power domain with V regulator, Ctrl., F generator, BB generator and core, plus adjustment and data fusion]

Something new for AVFS power management
- Dynamic power management today
- FD-SOI context (UWVR, increased efficiency)
- New opportunity: extended body bias range
- 3 on-chip constrained actuators
- Known target performance (Ftarget)
- n Power Modes (PM) available [Table columns: F, VDD, VBB, P]
- Which configuration minimizes power, taking into account actuator constraints?
- [Figure: power domain with V regulator, Ctrl., F generator, BB generator and core, plus adjustment and data fusion]
Source: http://www.st.com/web/en/about_st/learn_fd-soi.html
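The selection problem posed on this slide can be sketched as a search over a table of power modes: keep the modes that reach the target frequency, then take the one with the lowest power. The `PowerMode` type, the `pick_mode` helper and every table value below are invented for illustration; the slides do not give the actual policy or numbers.

```python
from typing import Iterable, NamedTuple, Optional


class PowerMode(NamedTuple):
    name: str
    f_mhz: float     # frequency reached in this mode
    vdd: float       # supply voltage (V)
    vbb: float       # body-bias voltage (V)
    power_mw: float  # power consumed in this mode


def pick_mode(modes: Iterable[PowerMode], f_target_mhz: float) -> Optional[PowerMode]:
    """Lowest-power mode that meets the target frequency, or None if none does."""
    feasible = [m for m in modes if m.f_mhz >= f_target_mhz]
    return min(feasible, key=lambda m: m.power_mw) if feasible else None


# Hypothetical power-mode table (all values invented).
MODES = [
    PowerMode("PM0", 200.0, 0.5, 0.0, 10.0),
    PowerMode("PM1", 500.0, 0.6, 1.0, 35.0),
    PowerMode("PM2", 500.0, 0.7, 0.0, 42.0),
    PowerMode("PM3", 1000.0, 0.9, 0.0, 150.0),
]
best = pick_mode(MODES, 450.0)
```

In this made-up table, PM1 and PM2 reach the same frequency, and the mode that uses body bias at a lower VDD is the cheaper of the two.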

DSP architecture
- 32-bit VLIW DSP core (V/F domain)
- [Block diagram labels: Control; Address Gen. (x2); Shift Reg.; Mux; Prog SRAM; Data SRAM; Register Array, 2 read / 1 write ports, 64x32b; 32b Multiplier; Bit Extension; Mux; Adder; 40b MAC; Bit Trunc; Cordics + Divider; Comp/Select; CODA; TMFLT-R; TMFLT-S (x6); Serial Interface; Input/Output RAM 2x1Kx32; PLL; Variability Power Control (CVP)]

Chip Micrograph
Technology:     STMicro UTBB FDSOI 28 nm
Transistors:    Flip-Well (LVT), L=24nm
Core area:      1 mm²
DSP benchmark:  FFT 1024
VDD range:      0.397V-1.3V
VBB range:      0V/±2V
References:
- R. Wilson et al., "A 460MHz at 397mV, 2.6GHz at 1.3V, 32b VLIW DSP, embedding FMAX tracking," International Solid-State Circuits Conference (ISSCC), pp. 452-453, February 2014
- E. Beigné et al., JSSC 2015

Power distribution: Body Biasing
- Power distribution includes all biasing voltages
- Thin VBB power stripes
- Body biasing supplied through specific IOs (±2V)

Optimized library sub-set
- Use of available standard cells optimized for low voltage
  - Extend the subset to the wide voltage range
  - Gate length modulation done through poly biasing
  - Keep the power-delay optimal cells in UWVR
- Cells are characterized over the [0.275V, 1.4V] VDD range and the [0V, ±2V] VBB range

Optimized Pulsed-latch FF
- Transmission-Gate Pulsed Latches
  - A latch and a pulse generator
  - Same behavior as a conventional Master-Slave FF
  - Proved to be the most energy-efficient in a large portion of the design space

Flip-flop      D-to-Q (ps)     Energy/cycle (fJ)   Area (µm²)
ST C2MOS       117.5           6                   4.41
ST SA          46.5 (-60%)     12.5 (+208%)        6.85 (+55%)
TGPLMuxScan    30.5 (-74%)     7.2 (+20%)          4.73 (+7%)
TGPLMuxClk     26 (-78%)       9.1 (+56%)          5.06 (+14%)
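One common way to compare such cells is the energy-delay product (EDP). The sketch below computes it from the D-to-Q delay and energy figures listed for the four flip-flops. Choosing EDP as the figure of merit is an assumption made here; the slide does not say which metric the authors used.

```python
# (D-to-Q delay in ps, energy per cycle in fJ), as listed on the slide.
FLIP_FLOPS = {
    "ST C2MOS": (117.5, 6.0),
    "ST SA": (46.5, 12.5),
    "TGPLMuxScan": (30.5, 7.2),
    "TGPLMuxClk": (26.0, 9.1),
}

# Energy-delay product in fJ*ps; lower is better.
edp = {name: delay * energy for name, (delay, energy) in FLIP_FLOPS.items()}

# Name of the cell with the lowest energy-delay product.
best = min(edp, key=edp.get)
```

With these figures both pulsed latches come out well below the two reference flip-flops, and `best` evaluates to the MuxScan variant.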

Timing margins reduction in UWVR: Fmax tracking
- Design in the typical corner instead of the worst-case approach
- Need to compensate for PVT variations in UWVR
- Reduce energy per operation or increase clock frequency
- Track the optimal frequency/voltage/biasing point
- The real maximum frequency is not available
- Functional in the UWVR / low area and power overhead
- Two complementary solutions proposed on-chip
  - Replica path based: CODA (ClOning DAta paths)
  - Timing slack sensor based: TMFLT (TiMing FauLT)

Fmax tracking (solution 1): Path Cloning
- 16 paths cloned at 1V signoff
- Correlation between Fmax and frequency predictions
  - 5 slots, 21 dies, at 1V, 25 °C: +4/-3% accuracy
  - 1 slot, 21 dies, over [0.8V-1.3V], 25 °C: 6% accuracy
- Not intrusive, but needs a complementary solution in UWVR

Fmax tracking (solution 2): Slack Monitoring
- 128 TMFLT-Sensors
- 1 TMFLT-Ring
- Analyze the registers' slack time
- 300ps detection window
- Programmable replica path
- Time-to-digital measurement
- [Figure: TMFLT sensor schematic (flip-flops, detection window, Warning output, clock gating, programmable delay between reg1 and reg2) and timing diagram showing the slack time]
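A purely behavioral sketch of slack monitoring is given below. It assumes the sensor raises a warning whenever the data arrives less than one detection window before the clock edge, and that a controller shortens the clock period until the next step would trigger a warning. Only the 300ps window comes from the slide; the function names, the step size and the path delay are hypothetical, and the real TMFLT circuit is not described in enough detail here to model it faithfully.

```python
def warning(period_ps: float, path_delay_ps: float, window_ps: float = 300.0) -> bool:
    """True when the slack (period minus path delay) is smaller than the window."""
    return (period_ps - path_delay_ps) < window_ps


def shortest_safe_period(path_delay_ps: float, window_ps: float = 300.0,
                         step_ps: float = 10.0, start_ps: float = 5000.0) -> float:
    """Shorten the period step by step, stopping before a warning would occur."""
    period = start_ps
    while period - step_ps > 0 and not warning(period - step_ps, path_delay_ps, window_ps):
        period -= step_ps
    return period


# Hypothetical monitored path with a 700 ps delay.
period = shortest_safe_period(700.0)
```

In this model the loop settles at a period equal to the path delay plus the detection window, so the window acts as the remaining timing margin.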

Fmax tracking (solution 2): Slack Monitoring
- Frequency estimation error compared to Fmax: ±4% at 1V, 21 dies
- [Plot: frequency error (%) vs. power supply (mV), from 600 to 1300 mV]
- Compared to the worst-case PVT corner (3σ) approach without Fmax tracking:
  - Energy gain of 40.6% @ 0.6V
  - Frequency gain of 24% @ 1V

DSP speed performance measurements
- [Plot: frequency (MHz) vs. VDD voltage (mV), from 300 to 1300 mV, one curve per FBB setting: 0V, 0.5V, 1.0V, 1.5V, 2.0V]
- Annotated points: 460MHz@397mV, 1GHz@570mV, 2.6GHz@1.3V
- FBB up to +2V and Fmax tracking: frequency up to 460MHz at minimum voltage

DSP energy performance measurements
- [Plot: energy/cycle (pJ/cycle) vs. frequency (MHz), from 300 to 2500 MHz, one curve per FBB setting: 0V, 0.5V, 1.0V, 1.5V, 2.0V]
- Annotated point: 62pJ/cycle @ 460MHz
- For a fixed voltage supply: lowest energy at 460MHz
- Power consumption: 370mW at 1V
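Energy per cycle and clock frequency can be turned into average power with P = E_cycle * F. The short sketch below applies this to the 62pJ/cycle @ 460MHz point; the helper `power_mw` is hypothetical, and the result is a derived figure, not a measurement quoted on the slide.

```python
def power_mw(energy_pj_per_cycle: float, freq_mhz: float) -> float:
    """P = E_cycle * F. pJ times MHz gives microwatts, so divide by 1000 for mW."""
    return energy_pj_per_cycle * freq_mhz / 1000.0


# Operating point annotated on the slide: 62 pJ/cycle at 460 MHz.
p_at_460mhz = power_mw(62.0, 460.0)
```

This comes to roughly 28.5 mW, about an order of magnitude below the 370mW reported at 1V.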

DSP energy performance measurements
- [Plot: energy/cycle (pJ/cycle) vs. frequency (MHz), from 300 to 2500 MHz, one curve per FBB setting: 0V, 0.5V, 1.0V, 1.5V, 2.0V; annotations: +59% frequency, +14% frequency, -20% energy/cycle]
- FBB and Fmax tracking:
  - Frequency increased by up to 59% (target 100pJ/cycle)
  - Energy reduced by 20% (target 1.7 GHz)

Comparison with State of the Art WVR
- [Plot: frequency (MHz), from 1 to 3000, vs. supply voltage (V), from 0 to 1.4V. Labelled points: This work; INTEL, 22nm, ISSCC 2012; INTEL, 32nm, ISSCC 2012; MIT, TI, 28nm, ISSCC 2011; NXP, IMEC, 32nm, ISSCC 2011; CSEM (100µW/MHz); TI, CEVA, 40nm; TI, 40nm; TI (200µW/MHz)]

What about FDSOI for IoT applications?
- Energy harvesting will bring more challenges to voltage supply levels
- Autonomous wireless sensor nodes require even more versatility
  - Moving the operating range from very low power to medium speed
- Leakage is a big constraint
  - Possible use of reverse body biasing and poly biasing
- Analog and RF perform well
  - Towards fully integrated mixed-signal nodes

Conclusion
- Circuit designers are faced with conflicting requirements
  - Decrease the power dissipation AND maintain the performance
  - In a world of increasing parametric variations
- This can be met thanks to a combination of design techniques
  - Reduce design margins and adapt to the environment: Sense-and-React paradigm
  - Leverage the technology's possibilities: well engineering and back-biasing in FDSOI
- Take-home message
  - Develop technology that brings additional actuators to circuit designers; this will unleash their creativity
  - Think about low leakage applications like IoT: low voltage performance, performance adaptation