An Ultra-Low Energy Asynchronous Processor for Wireless Sensor Networks
L. Necchi, L. Lavagno, D. Pandini, L. Vanzago
Politecnico di Torino, ST Microelectronics
Wireless Sensor Networks
- Ad-hoc wireless networks
- Sensing
- Computation
- Actuation
Application areas:
- Monitoring
- Building automation
- Health care, medical
- Emergency response
- Automotive
Async 06 - March 13-15
Key WSN Requirements
- Flexibility (general-purpose design)
- High energy efficiency (battery powered)
- Extremely wide voltage supply range
  - Exhausted battery or energy scavenging
- Fast and inexpensive wake-up
  - Event-driven power management (not predictable)
- Sporadic high computational load
  - Encryption (security)
  - Aggregation, distributed data processing
Sensor node architecture
Main components of a WSN node:
- Microcontroller (Atmel AVR, TI MSP430)
- Memory
- Radio
- Sensors / Actuators
- Power supply
  - Battery (energy storage)
  - Power scavenging
Circuit-level Power Management
Techniques: Clock Gating, Power Gating, Dynamic Voltage Scaling (DVS), Adaptive Body Biasing
[Table: each technique is marked by whether it saves energy while idle or while active; scenarios: idle time, long deadlines]
DVS can be obtained by:
- Off-line pre-computed voltage/frequency tables
  - High delay margins
- Evaluated on-line: PowerWise, Razor, Asynchronous, De-synchronization
Closed-loop DVS techniques
- PowerWise: samples the output of a digital delay line with a high-frequency clock, and adjusts the supply voltage to deliver the required performance
- Razor: detects timing errors by comparing the values stored in duplicated slave latches, the second of which is clocked half a clock cycle later; on an error it restarts the pipeline and adjusts the supply voltage accordingly
- Asynchronous with dual-rail encoding: (quasi) delay-insensitive implementation that guarantees correctness for (almost) every supply voltage and process variation
- Asynchronous with bundled-data encoding: the output of a digital delay line is used directly to generate a local clock signal, so the delay period depends directly on the supply voltage
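The common idea behind these closed-loop schemes (a delay line that tracks the logic, with the supply lowered until the delay just meets the requirement) can be sketched in a few lines of Python. This is only an illustration, not the method of any of the systems named above: the alpha-power-law delay model, the parameter values (`vth`, `alpha`, `k`), the voltage range and the function names are all assumptions made for the example.

```python
def line_delay(vdd, vth=0.35, alpha=1.5, k=1.0):
    """Delay of a replica delay line at supply `vdd` (arbitrary units).

    ASSUMPTION: alpha-power-law model, delay ~ k * vdd / (vdd - vth)**alpha.
    The values of vth, alpha and k are illustrative, not from the slides.
    """
    if vdd <= vth:
        raise ValueError("supply must exceed the threshold voltage")
    return k * vdd / (vdd - vth) ** alpha


def lowest_safe_vdd(target_delay, v_min_mv=500, v_max_mv=1200, step_mv=10):
    """Step the supply down while the delay line still meets `target_delay`.

    Voltages are stepped in integer millivolts to avoid float accumulation.
    Returns the lowest supply (in volts) on the grid that meets the target.
    """
    if line_delay(v_max_mv / 1000) > target_delay:
        raise ValueError("target cannot be met even at the maximum supply")
    v_mv = v_max_mv
    while (v_mv - step_mv >= v_min_mv
           and line_delay((v_mv - step_mv) / 1000) <= target_delay):
        v_mv -= step_mv
    return v_mv / 1000


# Hypothetical workload: a deadline three times longer than the delay at 1.2 V.
relaxed = 3 * line_delay(1.2)
v = lowest_safe_vdd(relaxed)
print(f"supply can drop to {v:.2f} V")
```

In a bundled-data or de-synchronized circuit no explicit search is needed for correctness, because the delay line that generates the local timing slows down together with the logic; the loop above only models choosing a voltage for a given performance target.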
De-synchronization
[Figure: Synchronous (CK) -> Desynchronize -> Asynchronous]
Design Flow
Obtain an asynchronous implementation from a synchronous specification:
[Flow diagram: HDL/RTL, Synthesis & Optimization (Library), Netlist, De-synchronization, Netlist, Physical Design, Layout]
- Think synchronously
- Design synchronously
- De-synchronize (automatically)
- Test synchronously
- Run asynchronously
Synchronous circuit
[Figure: master-slave (MS) flip-flops clocked by CK]
De-synchronization
[Figure: the same circuit with local controllers (C) in place of CK]
De-synchronization
- Distributed micropipeline-style controllers replace the clock network
- The data path remains intact!
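Micropipeline control is conventionally built from Muller C-elements, whose output changes only when both inputs agree and otherwise holds its previous value. Assuming the boxes labelled C on these slides denote controllers of that kind, a minimal behavioral model looks like this (the class name and interface are hypothetical, chosen for illustration):

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, state=0):
        self.state = state  # current output value (0 or 1)

    def update(self, a, b):
        """Apply inputs a and b; return the (possibly unchanged) output."""
        if a == b:
            # Both inputs agree: the output follows them.
            self.state = a
        # Inputs disagree: the output holds its previous value.
        return self.state


c = CElement()
outputs = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(outputs)  # [0, 1, 1, 0]
```

The hold behavior is what lets a chain of such elements act as a handshake: a stage fires only after both its predecessor has requested and its successor has acknowledged.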
Flow equivalence [Guernic, Talpin, Lann, 2003]
[Figure: signals A and B]
Flow equivalence [Guernic, Talpin, Lann, 2003]
Synchronous behavior (timing diagram, value sequence per signal):
A: 1 3 0 2 1 5 3 1 6 0
B: 5 1 2 3 1 4 2 4 3 1
De-synchronized behavior (timing diagram, value sequence per signal):
A: 1 3 0 2 1 5 3 1 6 0
B: 5 1 2 3 1 4 2 4 3 1
Theorem: The de-synchronization model preserves flow equivalence.
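Flow equivalence compares, for each signal, only the sequence of values it carries, not the times at which they appear. A small Python check makes this concrete using the A and B sequences from the slide. The event representation `(time, signal, value)` and the timestamps are assumptions made for the example; only the value sequences come from the slide.

```python
def flows(events):
    """Project a trace of (time, signal, value) events onto per-signal value sequences."""
    per_signal = {}
    for _, signal, value in sorted(events, key=lambda e: e[0]):
        per_signal.setdefault(signal, []).append(value)
    return per_signal


def flow_equivalent(trace1, trace2):
    """Two traces are flow-equivalent if every signal carries the same value sequence."""
    return flows(trace1) == flows(trace2)


A = [1, 3, 0, 2, 1, 5, 3, 1, 6, 0]
B = [5, 1, 2, 3, 1, 4, 2, 4, 3, 1]

# Synchronous trace: A and B both change on every clock tick (period 10).
sync = ([(10 * i, "A", v) for i, v in enumerate(A)]
        + [(10 * i, "B", v) for i, v in enumerate(B)])

# De-synchronized trace: same values, but each signal has its own (made-up) timing.
desync = ([(7 * i + 1, "A", v) for i, v in enumerate(A)]
          + [(13 * i + 4, "B", v) for i, v in enumerate(B)])

print(flow_equivalent(sync, desync))  # True
```

The relative ordering of events on A and on B differs between the two traces, yet the check passes, which is exactly the freedom that de-synchronization exploits.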
De-synchronization Benefits
For the end user:
- Reduced electromagnetic emission
- Process variation tolerance
  - Enables partial average-case design, wrt process & environment variation (not wrt data-dependent delay)
The resulting circuit will be:
- Ready for frequency and voltage scaling
- Inherently more robust to delay variations
- Virtually free of performance or area overhead wrt synchronous
For the designer:
- Conventional EDA tools and design flow
- Limited design time and effort, fully automated
- Re-use of legacy designs
Asynchronous advantages not offered by de-synchronization
- Fine-grained power management
  - The de-synchronized circuit inherits the synchronous clock gating
- Fine-grained pipelining
  - The pipeline structure is not changed
- Data-dependent delays
  - Could be exploited by using a datapath with completion detection (work in progress)
- Robustness with respect to uncorrelated local variability
  - Would require completion detection
Synchronous Logic Interfacing
[Figure: controllers (C) interfacing FAST LOGIC; data path (not modified); handshaking line]
Synchronous Logic Interfacing
[Figure: controllers (C) interfacing SLOW LOGIC; external CK]
- Synchronized with an external slower clock
  - Just low EMI
Synchronous Logic Interfacing
[Figure: controllers (C) interfacing SELF-TIMED LOGIC]
- Example: SRAM with completion detection
Sensor node architecture
Main components of a WSN node:
- Microcontroller: Atmel AVR
- Memory
- Radio
- Sensors / Actuators
- Power supply
  - Battery (energy storage)
  - Power scavenging
Our Case Study
- Application-independent 8-bit CPU architecture:
  - Atmel AVR instruction set (like MICA2, MICAZ)
  - Core from OpenCores.org, implemented in a 130 nm technology
- Toolchain and lots of software are ready to use:
  - nesC, TinyOS, TinyDB, Surge, TOSSIM
- Aggressive energy management enabled by de-synchronization, using:
  - Dynamic Voltage Scaling
  - Zero wake-up time (no CK, no wait for PLL to restart)
Typical AVR architecture
[Figure: instruction memory and data memory; instruction fetch, instruction decode, memory access, ALU execution; 8-bit data path; address bus; external CK with clock distribution]
Design Choices
- Main target is energy efficiency (vs. speed)
  - Large delay margins (100%) to increase robustness at low voltage supply
- AVR core is really small (~4500 gates), hence we used a single controller
  - Reduced area overhead
  - No electromagnetic emission reduction
De-synchronized AVR
[Figure: instruction memory and data memory; instruction fetch, instruction decode, memory access, ALU execution; data path; address bus; a single controller (C) with a delay chain and handshake signal distribution]
Logic and Delay Line Matching
Energy Efficiency
[Plot: energy per instruction, power consumption, leakage per instruction and logic delay vs. voltage supply [V]]
Energy Efficiency
Some Past Work Comparison
- Philips 80c51 (H. van Gageldonk, 1998): asynchronous bundled-data implementation of the 8051 ISA, general purpose.
- Lutonium (A. Martin et al., 2003): asynchronous QDI implementation of the 8051 ISA.
- SNAP/LE (V. Ekanayake et al., 2004): asynchronous QDI processor specifically designed for WSN.
- Razor (D. Ernst et al., 2004): synchronous processor that estimates the best Vdd by dynamically monitoring the delay of the logic using a redundant latching scheme.
CONCLUSIONS
- Aggressive energy management using DVS
  - 14 pJ/instr @ 1.2 V (170 MIPS)
  - 2.7 pJ/instr @ 0.51 V (48 MIPS)
- Minimal overhead wrt synchronous counterpart
  - +6% area (due to FF->latch conversion)
  - -20% speed (could be improved by reducing margins)
- Future work:
  - Analysis with other SPICE-like simulators (Hsim)
  - Statistical simulations to check robustness wrt process variability (Monte Carlo)
  - Fabrication (?)
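The two operating points can be turned into average power figures with a short calculation, since power is energy per instruction times instruction rate. The sketch below assumes the quoted energy is the total per instruction and that the core runs continuously at the quoted MIPS rate; the function and variable names are illustrative.

```python
def average_power_watts(energy_pj_per_instr, mips):
    """Average power = energy per instruction * instructions per second."""
    return (energy_pj_per_instr * 1e-12) * (mips * 1e6)


# Operating points quoted on the slide.
p_high = average_power_watts(14.0, 170)  # 1.2 V
p_low = average_power_watts(2.7, 48)     # 0.51 V

energy_ratio = 14.0 / 2.7   # energy saved per instruction by scaling down
speed_ratio = 170 / 48      # throughput given up in exchange

print(f"1.2 V : {p_high * 1e3:.2f} mW")
print(f"0.51 V: {p_low * 1e3:.3f} mW")
print(f"energy/instr ratio {energy_ratio:.1f}x, speed ratio {speed_ratio:.1f}x")
```

Under these assumptions the core draws about 2.4 mW at 1.2 V and about 0.13 mW at 0.51 V: roughly a fivefold reduction in energy per instruction for about a 3.5-fold reduction in throughput.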