Application of control theory to Many-Processor System-on-Chip (MPSoC) (computing platforms)
Grenoble Workshop on Autonomic Computing and Control, May 27, 2013
www.cea.fr
Suzanne LESECQ, CEA, LETI, DACLE/LIALP, suzanne.lesecq@cea.fr
D. Puschini, E. Beigné, W. Lombardi, A. Molnos, J. Mottin, V. Olive
L. Vincent (post-doc fellow with Persyval-lab)
Y. Akgul, M. Altieri, N.-M. Nguyen, M. Becher
T. Ducroux (with STM)
CEA. All rights reserved. S. Lesecq, STAARS workshop, May 27, 2014

Context
Computing platforms
- Embedded systems: mono-processor, DSP, smart sensor nodes
- Many-Processor System-on-Chip (MPSoC)
- Dedicated computing platforms, e.g. H.264 hardware encoder
Main challenge for embedded (mobile) platforms
- Power consumption P under performance constraints (Ftarget)
- Task to be finished before its deadline DL
[Figure: Processing Element (PE) with inputs supply voltage Vdd, clock frequency F and body bias Vbb, and outputs power consumption and temperature increase; timeline of a task run at Ftarget and finished before DL]

Main objective
Adaptive architecture to mitigate local but also dynamic PVT variations
- Need for T, V evolution
- Local DVFS control in each power domain: Vdd actuator and F actuator, with fast local adjustment
- Info extract (data fusion) feeding a global control over all power domains
Objective: reach the most energy-efficient and safe operating point
[Figure: power domains, each with a Processing Element, a Vdd actuator and an F actuator, connected to the global control]

With constraints
- Advanced technologies: power consumption highly depends on temperature. Thermal runaway!
- Low complexity of the control strategies
- Possibly, implementation in hardware (silicon)

Under variability
Dynamics for Process-Voltage-Temperature variations are very different
[Figure: illustrations of Process, Voltage and Temperature variations, taken from the sources below]
- J. Cain et al., "Electrical linewidth metrology for systematic CD variation characterization and causal analysis," Metrology, Inspection, and Process Control for Microlithography XVII, Proceedings of SPIE, vol. 5038, pp. 350-361, 2003.
- LIRMM
- P. Li et al., "Efficient full-chip thermal modeling and analysis," ICCAD-2004, IEEE/ACM International Conference on Computer Aided Design, pp. 319-326, 2004.
- Rui Zheng et al., "Circuit Aging Prediction for Low-Power Operation," CICC, 2009.
- Keng L. Wong et al., "Enhancing Microprocessor Immunity to Power Supply Noise With Clock-Data Compensation," IEEE Journal of Solid-State Circuits, vol. 41, no. 4, April 2006.
- J. Altet et al., "Thermal coupling in integrated circuits: application to thermal testing," IEEE Journal of Solid-State Circuits, vol. 36, pp. 81-91, 2001.

Adaptive architecture: power management
Power consumption
P = Pstat(Vdd, Vbb, T, techno) + Pdyn(F, Vdd^2(F, T), Vbb, activity)
Timing faults to be avoided
[Figure: V-F plane between Vmin and Vmax showing the non-functional zone; clock waveform showing a timing fault when the clock period Tclk is too short]
Temperature measurement/estimation
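
As a rough illustration of the power decomposition above, the following sketch uses the textbook CMOS approximations: switching power proportional to activity x C x Vdd^2 x F, and a leakage term that grows exponentially with temperature. All coefficients, the function names and the linear Fmax(Vdd) boundary are illustrative assumptions, not values or models taken from the talk; body bias is left out for brevity.

```python
import math

def dynamic_power(freq_hz, vdd, activity=0.2, c_eff=1e-9):
    """Textbook CMOS switching power: activity * C * Vdd^2 * F (assumed model)."""
    return activity * c_eff * vdd ** 2 * freq_hz

def static_power(vdd, temp_c, i0=5e-3, t_ref=25.0, t_slope=30.0):
    """Assumed leakage model: leakage current grows exponentially with T."""
    leakage_current = i0 * math.exp((temp_c - t_ref) / t_slope)
    return vdd * leakage_current

def total_power(freq_hz, vdd, temp_c):
    """P = Pstat + Pdyn for one operating point (F, Vdd, T)."""
    return static_power(vdd, temp_c) + dynamic_power(freq_hz, vdd)

def is_functional(freq_hz, vdd, slope_hz_per_v=2.0e9, v_threshold=0.4):
    """Assumed linear limit Fmax(Vdd); above it, timing faults may occur."""
    return freq_hz <= slope_hz_per_v * (vdd - v_threshold)

if __name__ == "__main__":
    for temp in (25.0, 85.0):
        print(temp, total_power(1e9, 1.0, temp), is_functional(1e9, 1.0))
```

With such a model, the same (F, Vdd) point costs more power when the die is hot, which is why the slides pair power management with temperature measurement or estimation.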

From previous talks (Alberto, Erik, Ada)
- Modulating control # finite set of values
- System # finite set of values
- Functional system (without our control/observation tools)
- Improve power efficiency
- Improve performances

Back to adaptive architecture: main issues (and FDSOI technology)
Local control, closed-loop systems
[Figure: power domain with Vdd actuator, F actuator and Vbb actuator around a core instrumented with Multiprobes, an activity monitor and TSM blocks; fast local adjustment; info extract (data fusion) producing estimated {V, T}; global control (scheduler, OS) selecting power modes {PM i}]
[Figure: Multiprobe internals: ring oscillators #1 to #7 (stages 1 to n), 8-to-1 multiplexer, 28-bit counter with overflow bit, start/stop, 3-bit address decoder, config, test, scan in / scan out]
[Figure: power versus time for a workload WL with deadline DL: power P1 during α.DL, then P2 during (1-α).DL, compared with Ptot]

Outline (platform level)
- (Process) Voltage Temperature estimation
  - Sensor
  - Local estimation of V and T
  - Validation on a hardware platform
- Choice of set point (F, Vdd(, Vbb))
- Control of set-points
- And particular implementations
[Figure: Models (Memory) and MultiProbe sensor (7 ROs) feeding the VT estimation]

Variability sensors
Monitor local variations of V and T
Integrated (on-chip) sensors
- Dedicated sensors: precise, absolute value, but limited V, T functioning range; analog, large size + ADC
- General-purpose sensors: ring oscillator, F_RO = f(P, V, T)
Proposed sensor: MultiProbe, a set of co-located ROs
- Standard cells: easy to design
- Small: easily integrated and replicated on chip
- V and T not directly read
- 31 µm x 14.4 µm ≈ 450 µm² in CMOS 32 nm
[Figure: MultiProbe block diagram: ring oscillators #1 to #7 (stages 1 to n), 8-to-1 multiplexer, 28-bit counter with overflow bit, start/stop, 3-bit address decoder, config, test, scan in / scan out]

Multiprobe sensor
- 7 ring oscillators with different architectures
- Exploit different behaviours in order to estimate the voltage and temperature values
[Figure: MultiProbe block diagram (same as on the previous slide); LowTherm]

VT estimation: principle
- Observers? Model
- Comparison between models and measurements
[Figure: power domain with a Processing Element and MProbes; MultiProbe (sensor: 7 ROs) and Models (Memory) feed the info extract (data fusion) block, where a goodness-of-fit test yields the estimated {V̂, T̂}]

VT estimation based on hypothesis testing
- Models (Memory): p x q grid of models M{Vi, Tj}, from M{V1, T1} ... M{V1, Tq} to M{Vp, T1} ... M{Vp, Tq}
- Models storage, then models reading
- Measurement: 7 frequencies from the MultiProbe (sensor: 7 ROs)
- Pre-treatment of the measurement, then build the CDF
- Goodness-of-fit test: Kolmogorov-Smirnov test, giving one p-value per model
- Output: estimated {V̂, T̂}

Simulation results
- Same processing chain, with aggregation of the p-values by weighted mean value
- Estimated state (x): V = 0.831 V, T = 11.66 °C
- Real state (o): V = 0.83 V, T = 12 °C
- Estimation mean errors: µ_εV = 2.42 mV, σ_εV = 5.00 mV; µ_εT = -0.58 °C, σ_εT = 7.46 °C
- Results depend on: number of models, statistical test, measurement pre-treatment
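
The estimation chain above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each model is assumed to be a list of 7 reference frequencies for one (V, T) point, the goodness of fit is the two-sample Kolmogorov-Smirnov statistic D, and the weight 1 - D stands in for the p-value used on the slides. The function names and the data layout are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def estimate_vt(measured, models):
    """Weighted mean of the model (V, T) points.

    `models` maps a (V, T) tuple to the reference frequencies of that model.
    The weight 1 - D is an assumed stand-in for the KS p-value.
    """
    weights = {vt: 1.0 - ks_statistic(measured, ref) for vt, ref in models.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no model is compatible with the measurement")
    v_hat = sum(w * v for (v, _), w in weights.items()) / total
    t_hat = sum(w * t for (_, t), w in weights.items()) / total
    return v_hat, t_hat

if __name__ == "__main__":
    # Made-up frequencies (MHz) for two hypothetical models.
    models = {
        (0.8, 10.0): [100, 101, 102, 103, 104, 105, 106],
        (0.9, 10.0): [200, 201, 202, 203, 204, 205, 206],
    }
    print(estimate_vt([100, 101, 102, 103, 104, 105, 106], models))
```

Averaging over all models rather than picking the single best one lets the estimate fall between grid points, which is consistent with the slide reporting V = 0.831 V for a real state of 0.83 V.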

Temporal performances
SThorm: 4 x (16 cores + 8 MProbes)
Software implementation
- 2500 clock cycles per model evaluation
- 605 µs @ 500 MHz
Hardware (dedicated) implementation
- 42 cycles per model
- 10 kbits of memory
- 9k gates
- 10 µs @ 500 MHz
Faster version for V estimation
[Figure: voltage estimation plot, annotated 10 µs]

Temporal performances
Monitoring / estimation using both methods (VT and V)
[Figure: VT estimation compared with V-only estimation, annotated x15]
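
The timing figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that the 605 µs software figure is the per-model cost multiplied by the number of models evaluated, and that the hardware version evaluates the same number of models; neither assumption is stated on the slides.

```python
CLOCK_HZ = 500e6  # 500 MHz, as quoted on the slide

def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count into microseconds at the given clock."""
    return cycles / clock_hz * 1e6

# Software: 605 us in total at 2500 cycles per model evaluation.
software_total_cycles = 605e-6 * CLOCK_HZ
models_evaluated = round(software_total_cycles / 2500)

# Hardware: 42 cycles per model, same number of models (assumption).
hardware_total_us = cycles_to_us(42 * models_evaluated)

print(models_evaluated, hardware_total_us)
```

Under these assumptions the software figure corresponds to 121 models (for instance an 11 x 11 grid of (V, T) points), and the hardware estimate comes out close to the 10 µs quoted on the slide.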

Validation: performed on STHORM platform
SThorm: 4 x (16 cores + 8 MProbes)
Measurements performed in an oven

Validation
Measurements on a Multiprobe in STHORM

What can we do now?
VT estimation for task allocation under thermal constraints
- Fast online thermal floorplan construction: know the temperature at the desired location (not the sensor location)
- Mitigation using thermal-aware scheduling (e.g. OpenCL) at cluster granularity
- DEMO at DAC 2014
[Figure: heat dissipated by 1 PE; implementation]
References
- L. Vincent, P. Maurine, S. Lesecq, and E. Beigné, "Embedding Statistical Tests for on-chip Dynamic Voltage and Temperature Monitoring", DAC 2012
- L. Vincent, P. Maurine, E. Beigné, S. Lesecq and J. Mottin, "Temperature and Fast Voltage On-Chip Monitoring using Low-Cost Digital Sensors", VARI 2013

Outline
- (Process) Voltage Temperature estimation
- Choice of set point (F, Vdd, Vbb): advanced technologies, new parameter to be adjusted
- Control of set-points
- And particular implementation

A third actuator with large output range?
FD-SOI?
Objective: choose the set point (F, Vdd, Vbb) under performance constraints
[Figure: control block driving the F actuator, Vdd actuator and Vbb actuator of a PE in a power domain (with on-chip actuators); Ptot versus Vbb between Vbb,min and Vbb,max at Ftarget, one curve per supply voltage Vdd,1, Vdd,2, Vdd,3]

State of the art
Combination of actuators
- Traditionally 2 actuators: F, Vdd (++); F, Vbb (+); Vdd, Vbb
- Ptot(F) profile is convex [1]
- 3 actuators: Vbb is modified once Vdd
- New opportunities: dynamic management? [2]
Continuous vs discrete actuator
- 3 continuous actuators: Vdd ∈ [Vdd,min, Vdd,max], Vbb ∈ [Vbb,min, Vbb,max], F ∈ [Flow, Fhigh]
- 2 continuous and 1 discrete: Vdd = Vdd,i with i = 1..n (implementation constraints), Vbb ∈ [Vbb,min, Vbb,max], F ∈ [Flow, Fhigh]
How to choose the appropriate configuration (F, Vdd, Vbb) to minimize power consumption in the case of 2 continuous actuators and 1 discrete?
[1] R. Rao, et al., "Energy optimal speed control of devices with discrete speed sets", DAC 2005
[2] F. Firouzi, et al., "Dynamic soft error hardening via joint body biasing and dynamic voltage scaling", Euromicro 2011

Motivations and assumptions
Motivations
- Several configurations for Ftarget
- Applying Ftarget directly is not optimal
- Which configuration (F, Vdd, Vbb) should be applied to minimize power consumption under performance constraints?
[Figure: Ptot versus F, with the power Pi of configurations (Vdd,1, Vbb) and (Vdd,2, Vbb) reaching Ftarget = Fi]
Assumptions
- Vdd discrete, Vbb continuous
- Ptot known or given for (Vdd, F); Vbb is adjusted to minimize Ptot
- Known performance constraints Ftarget
[Figure: control block driving the F actuator, Vdd actuator and Vbb actuator of a PE in a Voltage Frequency Island]

Proposition
3 actuators, out of which one is discrete (implementation constraints)
Proposed method
- Selection phase: PWCS
- Execution phase: is Ftarget in the PWCS?
  - Yes: Mode 1 (M1), apply one configuration at Ftarget belonging to the PWCS
  - No: Mode 2 (M2), apply the 2 closest configurations in the PWCS (hopping execution)
Ensure optimal power consumption on the whole frequency range
[Figure: Ptot(F) curve with points A and B, the PWCS, and modes M1 and M2 at different Ftarget values]
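
The two-phase method above can be illustrated with a small sketch. It assumes that the PWCS behaves like the lower convex envelope of the available (F, Ptot) operating points, which is an interpretation rather than a definition given on the slides, and it ignores the time and energy cost of switching between configurations. The function names and the numeric points are hypothetical.

```python
def lower_convex_envelope(points):
    """Selection phase (assumed): lower convex hull of (F, P) points."""
    best = {}
    for f, p in points:                 # keep the cheapest point per frequency
        best[f] = min(p, best.get(f, p))
    hull = []
    for f, p in sorted(best.items()):
        while len(hull) >= 2:
            (f1, p1), (f2, p2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or above the chord hull[-2] -> (f, p).
            if (p2 - p1) * (f - f1) >= (p - p1) * (f2 - f1):
                hull.pop()
            else:
                break
        hull.append((f, p))
    return hull

def plan(hull, f_target):
    """Execution phase: return ([(F, P, time_share)], mean power)."""
    for f, p in hull:
        if f == f_target:               # Mode 1: one configuration at f_target
            return [(f, p, 1.0)], p
    for (f_lo, p_lo), (f_hi, p_hi) in zip(hull, hull[1:]):
        if f_lo < f_target < f_hi:      # Mode 2: hop between the two neighbours
            share_hi = (f_target - f_lo) / (f_hi - f_lo)
            mean_p = (1 - share_hi) * p_lo + share_hi * p_hi
            return [(f_lo, p_lo, 1 - share_hi), (f_hi, p_hi, share_hi)], mean_p
    raise ValueError("f_target outside the achievable frequency range")

if __name__ == "__main__":
    # Made-up (MHz, mW) operating points; (1300, 400) is above the envelope.
    points = [(700, 100), (1000, 200), (1300, 400), (1600, 500), (2600, 1200)]
    envelope = lower_convex_envelope(points)
    print(plan(envelope, 1300))
```

In this made-up example, running half of the time at 1000 MHz and half at 1600 MHz delivers 1300 MHz on average for 350 mW, less than the 400 mW of the configuration that runs at 1300 MHz directly.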

Results on a DSP in FDSOI technology
DSP in STM 28 nm FD-SOI
- Vdd = {0.7, 0.9, 1.1, 1.3} V
- Vbb ∈ [0, 1.5] V
- F ∈ [700, 2560] MHz
Hopping execution (M2) vs. applying directly Ftarget with min(Ptot)
Up to 17 % power saving
[Figure: Ptot (mW, 0 to 1200) versus F (MHz, 600 to 2600) for Vdd = 0.7 V, 0.9 V, 1.1 V and 1.3 V, with the Ptot(F) curve, the PWCS and modes M1 and M2]
[Figure: power saving (%, 0 to 20) versus F (MHz, 600 to 2600)]
From ISSCC 2014: R. Wilson, E. Beigné, et al., "A 460 MHz at 397mV, 2.6 GHz at 1.3V, 32b VLIW DSP, embedding Fmax tracking".

Outline
- (Process) Voltage Temperature estimation
- Choice of set point (F, Vdd, Vbb)
- Control of set-points?
- And particular implementation
[Figure: control block driving the F actuator, Vdd actuator and Vbb actuator of a PE in a power domain (with on-chip actuators)]

Dynamic Voltage and Frequency Scaling
- Continuous clock actuator
- Continuous voltage actuator
- V-F relation
How to ensure staying in the functional zone?
[Figure: frequency actuator (CLK) and voltage actuator (Vdd) driving a Processing Element; V-F plane split into a functional zone and a non-functional zone (timing faults)]

Coupled drivers (difficult reuse)
Voltage-Controlled Oscillator (VCO)
Stay in safe domain?
- Imprecise clock frequency output (with jitter)
- Not used in recent technologies due to PVT variability
- Jointly designed actuators: reuse is difficult
[Figure: Energy Management Unit driving the voltage actuator; Vdd feeds both the VCO (CLK) and the Processing Element]
T. Burd, T. Pering, A. Stratakos and R. Brodersen, "A dynamic voltage scaled microprocessor system," in Solid-State Circuits Conference, 2000.

Non-coupled drivers (promote reuse!)
Phase- or Frequency-Locked Loop (PLL or FLL)
Stay in safe domain?
- Predefined sequence
- Poor power efficiency
- Poor performance during transient
[Figure: Energy Management Unit driving a frequency actuator (CLK) and a voltage actuator (Vdd) of a Processing Element; V-F plane with functional and non-functional zones]

Coupled actuators: joint control
Coupled Voltage-Frequency control
Objective: a mechanism to
- Jointly control V-F transient periods
- Increase power efficiency
[Figure: Energy Management Unit (top-level controller) sending set points to a joint control block, which drives the frequency actuator (CLK) and the voltage actuator (Vdd) of a Processing Element; V-F plane with functional and non-functional zones]

Joint control
[Figure: JOINT CONTROL BLOCK: frequency path with limiter F-Lim and increment ΔF driving the frequency actuator; voltage path with reference V-Ref, increment ΔV and limiter V-Lim driving the voltage actuator]
Hypotheses
- Closed-loop V-F actuators
- V and F are measurable without delay
- V-F actuators are black-box models
- Linear relation for V-F

Hardware implementation
[Figure: block diagram with SetRef, Control, GenV, CalcVpath, CalcFpath and GenF blocks]
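
To make the hypotheses above concrete, here is a minimal sketch of a step-limited joint V-F transition with mutual limiting. It is not the joint control block of the talk: the update order, the step sizes ΔV and ΔF, and the linear boundary F <= k (V - v0) are assumptions chosen for illustration, with F in MHz and V in volts.

```python
def joint_transition(v, f, v_ref, f_ref, dv=0.02, df=50.0,
                     k=2500.0, v0=0.3, max_steps=1000):
    """Move (V, F) towards (v_ref, f_ref) without leaving the functional zone.

    Assumed linear V-F boundary: F must stay <= k * (V - v0).
    The start and target points are expected to satisfy it.
    """
    path = [(v, f)]
    for _ in range(max_steps):
        if abs(v - v_ref) < 1e-9 and abs(f - f_ref) < 1e-9:
            break
        # Frequency step of at most df, limited by the measured voltage (F-Lim).
        f_next = f + max(-df, min(df, f_ref - f))
        f = min(f_next, k * (v - v0))
        # Voltage step of at most dv, limited by the measured frequency (V-Lim).
        v_next = v + max(-dv, min(dv, v_ref - v))
        v = max(v_next, f / k + v0)
        path.append((v, f))
    return path

if __name__ == "__main__":
    up = joint_transition(0.7, 900.0, 0.9, 1500.0, df=100.0)
    down = joint_transition(1.1, 1800.0, 0.7, 700.0)
    print(len(up), up[-1], len(down), down[-1])
```

When speeding up, the frequency is held back by the voltage that has actually been reached; when slowing down, the voltage cannot drop below what the current frequency requires. Every point of the path therefore stays in the functional zone without relying on a predefined sequence.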

Results
(Vdd = 0.9 V; T = -40 °C)
Nearly the size (area) of a Digital Frequency-Locked Loop¹
¹ C. Albea, D. Puschini, S. Lesecq and E. Beigné, "Optimal and robust control for a small-area FLL," in Proc. IEEE Mediterranean Conference on Control Automation, pp. 1100-1105, 2011.

Non-coupled
Performance: 3.81K cycles
[Figure: V-F trajectory with functional and non-functional zones]

Coupled
Performance: 5.81K cycles (52.33%)
[Figure: V-F trajectory with functional and non-functional zones]

Comparison
- Under-clocking (difference between the path and the reference)
- Improve energy efficiency during transitions
- Promote design reuse
- Silicon implementation?
[Figure: V-F plot, annotated 91.71%]

Outline
- (Process) Voltage Temperature estimation
- Choice of set point (F, Vdd, Vbb)
- Control of set-points
- And particular implementation

Dedicated platforms
e.g. H.264 hardware encoder
Video encoder, up to HD 1280 x 720

Dedicated platforms
VENGME platform (with VNU, Hanoi)
- IOs between blocks
- IOs inside blocks
- Split in various power domains
- Adapt (Vdd, F) in order to meet performance constraints and decrease power consumption

Summary
Needs for control in micro-electronics
- Hardware digital implementation: extra power consumption, extra silicon area
- Simple problems become complex ones (implementation constraints, ...)
- Complex problems per se: MEMS, new DC-DC architectures
- Silicon photonics for manycore in future micro-servers: thermal tuning to compensate for intrinsic resonance shift (due to PVT variability)

Thank you
Centre de Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex
Centre de Saclay, Nano-Innov PC 172, 91191 Gif-sur-Yvette Cedex