Low-Overhead Hard Real-time Aware Interconnect Network Router. Michel A. Kinsy, Department of Computer and Information Science, University of Oregon. Srinivas Devadas, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. Computer Architecture and
Current System Trends: Application-specific processors and multicore systems are converging. In this new phase, two distinct computing architectures are emerging: massively parallel homogeneous cores and heterogeneous many-core architectures.
Homogeneous Many-cores: Tilera-like homogeneous architectures target servers and cloud computing. They are great for trivially parallel applications. Having similar cores on the die makes the manufacturing and testing process more manageable. A homogeneous collection of cores also keeps the software support model simple. The drawback is a lack of hardware specialization for different tasks.
Heterogeneous Many-cores: Integration of heterogeneous technologies. These SoC architectures have a large number of processing units: programmable RISC/CISC cores, memory, DSPs, and accelerator function units/ASICs. [Figure: logic diagram of the hardware processing units in a typical smartphone today]
Complex SoC Architectures: SoC architectures need more research and standardization to encourage a high degree of reusability. [Figure: SoC design complexity trends, from the International Technology Roadmap for Semiconductors 2011 Report]
Design Challenges: The design of heterogeneous many-core systems shares many of the challenges of general-purpose homogeneous architectures, but there are a few added design constraints. SoC architectures are deployed in many computation environments that require concurrent execution of several tasks with different, sometimes opposing, performance goals. Integrating different services on the same computing platform requires rethinking the cores, the memory hierarchy, and the interconnect network. On these platforms, we need to think not only in terms of tasks and task parallelism, but also in terms of services and service guarantees, including quality of service at the on-chip network level.
Network Impact on Performance. [Figure: conventional multicore architecture execution profile; compute time (cycles, x10^9) and power (mW) versus number of cores (1 to 18)]
Dynamic Nature of On-Chip Routing. [Figure sequence: flows A, B, and C are added to the network one at a time, with endpoints labeled A_S/A_D, B_S/B_D, and C_S/C_D]
Conventional Router. [Figure: a 3x3 mesh network of processing elements (PE) and routers (R); the router architecture, with routing module, VC allocator, switch allocator, VC state, crossbar switch, and outputs; and the router's routing phases: Routing Computation (RC), Virtual Channel Allocation (VA), Switch Allocation (SA), and Switch Traversal (ST)]
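The four routing phases listed on this slide can be sketched as a simple per-flit state machine. This is a minimal illustration, assuming one cycle per phase and no stalls; the names `Stage`, `advance`, and `trace_head_flit` are hypothetical and do not come from the slides.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Router phases, in the order listed on the slide."""
    RC = "Routing Computation"
    VA = "Virtual Channel Allocation"
    SA = "Switch Allocation"
    ST = "Switch Traversal"


PIPELINE = [Stage.RC, Stage.VA, Stage.SA, Stage.ST]


def advance(stage: Optional[Stage]) -> Optional[Stage]:
    """Return the next phase, or None once the flit has left the router.

    Assumption: every phase takes one cycle and never stalls.
    """
    if stage is None:
        return None
    idx = PIPELINE.index(stage)
    return PIPELINE[idx + 1] if idx + 1 < len(PIPELINE) else None


def trace_head_flit() -> List[str]:
    """List the phases a head flit visits under the no-stall assumption."""
    visited = []
    stage: Optional[Stage] = PIPELINE[0]
    while stage is not None:
        visited.append(stage.name)
        stage = advance(stage)
    return visited
```

In a real router, allocation phases can stall when a virtual channel or switch port is busy, which is one source of the unpredictable latency that hard real-time traffic cannot tolerate.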
Network Level Quality of Service: The Hard Real-time Support (HRES) router is able to: decouple hard real-time and best-effort traffic using a two-datapath routing scheme; maximize link throughput while guaranteeing hard real-time timing constraints; provide fair link utilization between the two classes of traffic; be acknowledgment-free, retransmission-free, and lossless; and be deadlock- and livelock-free with no modification to the buffered datapath of the router. It has low hardware overhead for supporting hard real-time communications.
Hard Real-time Support (HRES) router. [Figure: flit formats at the router input. A best-effort latency flit carries the fields TY | VC | Payload, where TY is head, body, tail, or all. A guaranteed-latency flit carries the same fields, where TY is head, body, tail, or all, plus a guaranteed-latency bit.]
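The flit layout on this slide (type, VC, payload, with an extra guaranteed-latency bit in the type field) can be sketched as a pack/unpack pair. The field widths, the bit ordering, and the names `Flit`, `FlitType`, `encode`, and `decode` are assumptions made for illustration; the slide gives only the field names.

```python
from dataclasses import dataclass
from enum import IntEnum


class FlitType(IntEnum):
    HEAD = 0
    BODY = 1
    TAIL = 2
    ALL = 3  # single-flit packet


# Hypothetical field widths; the slide does not state them.
VC_BITS = 2
PAYLOAD_BITS = 32


@dataclass(frozen=True)
class Flit:
    ftype: FlitType
    vc: int
    payload: int
    guaranteed: bool = False  # guaranteed-latency bit

    def encode(self) -> int:
        """Pack as [guaranteed | type | vc | payload], most significant first."""
        if not 0 <= self.vc < (1 << VC_BITS):
            raise ValueError("vc out of range")
        if not 0 <= self.payload < (1 << PAYLOAD_BITS):
            raise ValueError("payload out of range")
        ty = (int(self.guaranteed) << 2) | int(self.ftype)
        return (ty << (VC_BITS + PAYLOAD_BITS)) | (self.vc << PAYLOAD_BITS) | self.payload


def decode(word: int) -> Flit:
    """Inverse of Flit.encode under the same assumed layout."""
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    vc = (word >> PAYLOAD_BITS) & ((1 << VC_BITS) - 1)
    ty = word >> (VC_BITS + PAYLOAD_BITS)
    return Flit(FlitType(ty & 0b11), vc, payload, bool((ty >> 2) & 1))
```

With this layout, a router can classify a flit as guaranteed-latency or best-effort by inspecting a single bit, without touching the payload.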
Hard Real-time Support (HRES) router. [Figure: router architecture with routing module, VC allocator, switch allocator, VC state, input, output, and guaranteed service selector]
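One plausible reading of the guaranteed service selector in this diagram is a per-output multiplexer between the guaranteed-latency path and the buffered best-effort path. The sketch below assumes that a guaranteed-latency flit always wins the link and that a losing best-effort flit stays buffered rather than being dropped; the slides do not state the actual selection policy, and `select_output` is a hypothetical name.

```python
from typing import Optional, Tuple


def select_output(guaranteed_flit: Optional[int],
                  best_effort_flit: Optional[int]) -> Tuple[Optional[int], bool]:
    """Pick the flit that drives the output link this cycle.

    Returns (flit, best_effort_stalled).

    Assumptions: the guaranteed-latency flit has strict priority, and a
    best-effort flit that loses the link is held for a later cycle
    (lossless), not discarded.
    """
    if guaranteed_flit is not None:
        return guaranteed_flit, best_effort_flit is not None
    return best_effort_flit, False
```

Under strict priority, keeping best-effort traffic from starving would have to be handled elsewhere, for example by limiting how much real-time load is routed over each link.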
Other Routers. [Figure: two alternative designs, a two-network router and a single shared crossbar router, each drawn with routing module, VC allocator, switch allocator, input VC state, crossbar switch, and outputs; a guaranteed service selector also appears in the diagram]
Area Comparison of Architectures
Router                   Number of ports   Cell area
HRES router              392               47190.44
Two-network router       721               51766.84
Single crossbar router   392               52812.94
Two-network router: the duplication of wires and logic leads to more cell area, and the network interface must change.
Single large crossbar: the switch arbitration logic is modified to give priority to the real-time traffic, which increases the switch arbitration datapath and the router critical path; real-time and normal traffic requests must be serialized.
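The relative cell-area difference between the designs follows directly from the figures listed on this slide. The short calculation below uses only those three values; the helper name `area_saving_percent` is hypothetical, and the slide does not state the area units.

```python
# Cell areas as listed on the slide (units are not stated there).
CELL_AREA = {
    "HRES router": 47190.44,
    "Two-network router": 51766.84,
    "Single crossbar router": 52812.94,
}


def area_saving_percent(baseline: str, candidate: str = "HRES router") -> float:
    """Percent by which `candidate` has less cell area than `baseline`."""
    base = CELL_AREA[baseline]
    return 100.0 * (base - CELL_AREA[candidate]) / base


if __name__ == "__main__":
    for name in ("Two-network router", "Single crossbar router"):
        print(f"HRES vs {name}: {area_saving_percent(name):.2f}% less cell area")
```

On these numbers, the HRES router uses about 8.8% less cell area than the two-network router and about 10.6% less than the single crossbar router.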
Area and Power: IBM 45-nm SOI CMOS technology cell library. Static power is comparable across the design points. Logic area is comparable across the design points.
QoS-Aware On-Chip Routing
Objective: $\sum_{i} \sum_{(u,v) \in E} f_{i[r]}(u,v) \cdot w_{i[r]}(u,v)$ (1)
subject to:
$\forall i,\ \forall (u,v) \in E:\ f_{i[r]}(u,v) \le S_{[r]}(u,v)$ (2)
$S_{[r]}(u,v) \le 1$ (3)
Non-starvation: $0 \le \sum_{i} f_{i[r]}(u,v) \le c_{[i]}(u,v) < c(u,v)$ (4)
$\forall (u,v) \in E:\ \sum_{i} f_{i}(u,v) + \sum_{i} f_{i[r]}(u,v) \le c(u,v)$ (5)
Deadlines: $\forall i,\ \forall j \in \{1,\dots,k_i\}:\ \sum_{i} f_{i[r]} \le K_{i[r]}$ (6)
$K_{i[r]} \le k_i$ (7)
$\sum_{(u,v) \in E} f_{(i,j)[r]}(u,v) \le f_{(i,j)[r]} \le \mathrm{deadline}_i$ (8)
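A candidate routing can be checked against constraints (4), (5), and (8) with a few lines of code. This is a sketch under stated assumptions, not the formulation's solver: flows are represented as per-link loads, constraint (4) is read as capping real-time load at a reserved share strictly below link capacity, and constraint (8) is read as a bound on a route's hop count. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # directed link (u, v)


def non_starvation_ok(real_time: Dict[Link, float],
                      reserved: Dict[Link, float],
                      capacity: Dict[Link, float]) -> bool:
    """Constraint (4): 0 <= real-time load <= reserved share < link capacity."""
    return all(
        0.0 <= real_time.get(link, 0.0) <= reserved[link] < cap
        for link, cap in capacity.items()
    )


def link_capacity_ok(best_effort: Dict[Link, float],
                     real_time: Dict[Link, float],
                     capacity: Dict[Link, float]) -> bool:
    """Constraint (5): best-effort plus real-time load fits on every link."""
    return all(
        best_effort.get(link, 0.0) + real_time.get(link, 0.0) <= cap
        for link, cap in capacity.items()
    )


def deadline_ok(route: List[Link], deadline_hops: int) -> bool:
    """Constraint (8), read here as a hop-count bound (an assumption)."""
    return len(route) <= deadline_hops
```

For example, on a link of capacity 1.0 with a reserved real-time share of 0.7, a real-time load of 0.5 alongside a best-effort load of 0.4 passes both the non-starvation and capacity checks, while a real-time load of 0.8 fails the non-starvation check.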
Hybrid electric vehicle (HEV) Application: A modern automobile has as many as 70 embedded electronic control units (ECUs)*. In the HEV, a motor drive with a controlled inverter system is needed to deliver powerful and efficient drive to the electric motor. Mixed-criticality application: hard real-time and best effort. [Figure: HEV block diagram (engine, converter, battery pack, synchronous motor and drive, planetary gear, synchronous generator and drive, gas pedal and brake, crash sensors) and the flow of one system step, from start of system-step through input analysis, previous distributed control state, previous state variables, control algorithm, guards evaluation, monitoring unit, data storage, state selection, continuous states 1 to 3, state variables, control system emulator, distributed control, output signals, system components, and actuators, to end of system-step] * 2009 Automotive Embedded Systems Handbook
Evaluation Results. [Figure: two HEV plots of the average number of packets received in one system step versus the number of state and monitoring variables (0 to 300). First plot: L Bless, Ls Bless, v QoS, p QoS, and HRES. Second plot: v QoS, p QoS, and HRES.]
Evaluation Results. [Figure: two HEV plots versus the number of state and monitoring variables (0 to 300). First plot: total system step latency (cycles, x10^5) for L Bless, Ls Bless, v QoS, p QoS, and HRES. Second plot: average number of packets received in one system step for C1: 50% capacity sharing, C2: 65% capacity sharing, and C3: 70% capacity sharing.]
Q & A Section. Thank You! More information at http://caes.cs.uoregon.edu/