Low-Overhead Hard Real-time Aware Interconnect Network Router

Michel A. Kinsy, Department of Computer and Information Science, University of Oregon
Srinivas Devadas, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
Computer Architecture and

Current System Trends
The convergence of application-specific processors and multicore systems
A new phase in which two distinct computing architectures are emerging:
  Massively parallel homogeneous cores
  Heterogeneous many-core architectures

Homogeneous Many-cores
Tilera-like homogeneous architectures: servers, cloud computing
Great for trivially parallel applications
Having similar cores on the die makes the manufacturing and testing process more manageable
A homogeneous collection of cores also keeps the software support model simple
Lack of specialization of hardware to different tasks

Heterogeneous Many-cores
Integration of heterogeneous technologies
These SoC architectures have a large number of processing units: programmable RISC/CISC cores, memory, DSPs, and accelerator function units/ASICs
[Figure: Logic diagram of the hardware processing units in a typical smart phone today]

Complex SoC Architectures
SoC architectures need more research and standardization to encourage a high degree of reusability
[Figure: SoC design complexity trends. Source: International Technology Roadmap for Semiconductors 2011 Report]

Design Challenges
The design of heterogeneous many-core systems shares many of the challenges of general-purpose homogeneous architectures, but there are a few added design constraints
SoC architectures are deployed in many computation environments that require concurrent execution of several tasks with different, sometimes opposing, performance goals
Integration of different services on the same computing platform requires rethinking the cores, the memory hierarchy, and the interconnect network
On these platforms, we need to think not only in terms of tasks and task parallelism, but also in terms of services and service guarantees
Quality of service at the on-chip network level

Network Impact on Performance
[Figure: Conventional multicore architecture execution profile. Compute time (cycles) and power (mW) versus number of cores]

Dynamic Nature of On-Chip Routing
[Figure, built up over four slides: the network is first shown with endpoint labels A S and A D, then B S and B D are added, then C S and C D]

Conventional Router
[Figure: 3x3 mesh network of processing elements (PE), each attached to a router (R)]
[Figure: Router architecture: routing module, VC allocator, switch allocator, VC state at the inputs, crossbar switch, outputs]
Router routing phases: Routing Computation (RC), Virtual Channel Allocation (VA), Switch Allocation (SA), Switch Traversal (ST)
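The four routing phases can be sketched as an ordered pipeline. This is a minimal illustration, not the router from the slides: the XY (dimension-ordered) routing function, the port names, and the one-cycle-per-stage figure are assumptions introduced here.

```python
from enum import Enum


class Stage(Enum):
    RC = "Routing Computation"
    VA = "Virtual Channel Allocation"
    SA = "Switch Allocation"
    ST = "Switch Traversal"


# Order in which a head flit visits the phases named on the slide.
PIPELINE = [Stage.RC, Stage.VA, Stage.SA, Stage.ST]


def xy_route(cur, dst):
    """Assumed routing function: dimension-ordered (XY) routing on a mesh.

    Returns a hypothetical output port name for the RC phase.
    """
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return "E" if dx > cx else "W"
    if dy != cy:
        return "N" if dy > cy else "S"
    return "LOCAL"


def head_flit_latency(hops, cycles_per_stage=1):
    """Zero-load cycles for a head flit to cross `hops` routers.

    Ignores link delay and contention; one cycle per stage is an assumption.
    """
    return hops * len(PIPELINE) * cycles_per_stage


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)), head_flit_latency(4))
```

Under these assumptions, every hop costs a full pass through RC, VA, SA, and ST, which is why the per-hop pipeline depth matters for latency-sensitive traffic.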

Network Level Quality of Service
The Hard Real-time Support (HRES) router is able to:
  Decouple hard real-time and best-effort traffic using a two-datapath routing scheme
  Maximize link throughput and guarantee hard real-time timing constraints
  Provide fairness of link utilization among the two classes of traffic
  Be acknowledgment-free, retransmission-free, and lossless
  Be deadlock and livelock free with no modification to the buffered datapath of the router
It has low hardware overhead for supporting hard real-time communications.

Hard Real-time Support (HRES) router
[Figure: Input 1 with two datapaths: a best-effort latency path with VC state and a guaranteed latency path]
Flit format: TY | VC | Payload
TY in the conventional flit: Head, Body, Tail, or All
TY in the HRES flit: Head, Body, Tail, or All, and a guaranteed-latency bit
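A flit with TY, VC, and Payload fields plus a guaranteed-latency bit could be packed into a single word as sketched below. The field widths (2-bit type, 2-bit VC, 32-bit payload), the field order, and the function names are assumptions made for illustration; the slides do not give them.

```python
from enum import IntEnum


class FlitType(IntEnum):
    HEAD = 0
    BODY = 1
    TAIL = 2
    ALL = 3  # single-flit packet


# Hypothetical field widths, not taken from the slides.
TYPE_BITS = 2
VC_BITS = 2
PAYLOAD_BITS = 32


def pack_flit(ftype, vc, payload, guaranteed):
    """Pack [guaranteed bit | TY | VC | Payload] into one integer word."""
    if not 0 <= vc < (1 << VC_BITS):
        raise ValueError("VC id out of range")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload out of range")
    word = int(bool(guaranteed))
    word = (word << TYPE_BITS) | int(ftype)
    word = (word << VC_BITS) | vc
    word = (word << PAYLOAD_BITS) | payload
    return word


def unpack_flit(word):
    """Inverse of pack_flit: returns (type, vc, payload, guaranteed)."""
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    word >>= PAYLOAD_BITS
    vc = word & ((1 << VC_BITS) - 1)
    word >>= VC_BITS
    ftype = FlitType(word & ((1 << TYPE_BITS) - 1))
    word >>= TYPE_BITS
    return ftype, vc, payload, bool(word & 1)


if __name__ == "__main__":
    w = pack_flit(FlitType.HEAD, 1, 0xDEADBEEF, guaranteed=True)
    print(unpack_flit(w))
```

With a layout like this, an input port only needs to inspect one bit to steer a flit onto the guaranteed-latency or the best-effort datapath.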

Hard Real-time Support (HRES) router
[Figure: HRES router architecture: routing module, VC allocator, switch allocator, VC state at input 1, and a guaranteed service selector at the output]
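One way to picture a guaranteed service selector at an output is a per-cycle choice between a guaranteed-latency queue and a best-effort queue. The sketch below is hypothetical and is not the HRES selector logic: the windowed budget, the `guaranteed_share` parameter, and the class name are assumptions chosen to illustrate capacity sharing between two traffic classes.

```python
from collections import deque


class OutputSelector:
    """Hypothetical selector between two traffic classes at one output."""

    def __init__(self, guaranteed_share=0.5, window=100):
        self.guaranteed = deque()   # flits from the guaranteed-latency path
        self.best_effort = deque()  # flits from the buffered best-effort path
        self.window = window
        # Guaranteed-latency slots allowed per window (assumed policy).
        self.budget = int(guaranteed_share * window)
        self.cycle = 0
        self.used = 0

    def tick(self):
        """Select at most one flit for the output link this cycle."""
        if self.cycle % self.window == 0:
            self.used = 0  # new window: refill the guaranteed budget
        self.cycle += 1
        # Guaranteed traffic goes first while it has budget, or whenever
        # no best-effort flit is waiting (work-conserving).
        if self.guaranteed and (self.used < self.budget or not self.best_effort):
            self.used += 1
            return self.guaranteed.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


if __name__ == "__main__":
    sel = OutputSelector(guaranteed_share=0.5, window=4)
    sel.guaranteed.extend(["g0", "g1", "g2", "g3"])
    sel.best_effort.extend(["b0", "b1", "b2", "b3"])
    print([sel.tick() for _ in range(8)])
```

The budget keeps best-effort traffic from starving, while the work-conserving branch avoids idling the link when only one class has flits waiting.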

Other Routers
[Figure: Two-network router and single shared crossbar router, each drawn with routing module, VC allocator, switch allocator, input VC state, crossbar switch, and outputs; the figure also includes a guaranteed service selector]

Area Comparison of Architectures

Architecture              Number of ports    Cell area
HRES router               392                47190.44
Two-network router        721                51766.84
Single crossbar router    392                52812.94

Two-network router:
  Duplication of wires and logic, which leads to more cell area
  Changes in the network interface
Single large crossbar:
  Switch arbitration logic is modified to give priority to the real-time traffic
  Increase in switch arbitration datapath and router critical path
  Real-time and normal traffic requests must be serialized
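The cell-area figures above can be turned into relative overheads with a few lines of arithmetic. The numbers are the ones listed on the slide; the helper name `overhead_percent` is introduced here for illustration.

```python
# Cell areas as listed on the slide (units as reported there).
CELL_AREA = {
    "HRES router": 47190.44,
    "Two-network router": 51766.84,
    "Single crossbar router": 52812.94,
}


def overhead_percent(baseline, other):
    """Extra cell area of `other` relative to `baseline`, in percent."""
    base = CELL_AREA[baseline]
    return (CELL_AREA[other] - base) / base * 100.0


if __name__ == "__main__":
    for name in ("Two-network router", "Single crossbar router"):
        print(f"{name}: {overhead_percent('HRES router', name):.2f}% more cell area")
```

Relative to the HRES router, the two-network router comes out at roughly 9.7% more cell area and the single crossbar router at roughly 11.9% more.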

Area and Power
IBM 45-nm SOI CMOS technology cell library
Comparable static power at the different design points
Comparable logic area at the different design points

QoS-Aware On-Chip Routing
[Equations (1)-(8): summation and relational symbols are missing in the source; terms are listed in their original order, with [...] marking a lost symbol between terms]
(1)  over i and (u, v) in E:  f_i[r](u, v) [...] w_i[r](u, v)
subject to:
(2)  for i, (u, v) in E:  f_i[r](u, v) [...] S[r](u, v)
(3)  S[r](u, v) [...] 1
Non-starvation:
(4)  0 [...] i f_i[r](u, v) [...] c[i](u, v) [...] c(u, v)
(5)  for (u, v) in E:  ( i f_i(u, v) + i f_i[r](u, v) ) [...] c(u, v)
Deadlines:
(6)  for i, j in {1,...,k_i}:  i f_i[r] [...] K_i[r]
(7)  K_i[r] [...] k_i
(8)  over (u, v) in E:  f_(i,j)[r](u, v) [...] f_(i,j)[r] [...] deadline_i
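Constraint (5) combines the best-effort flows f_i(u, v) and the real-time flows f_i[r](u, v) on each link with the link capacity c(u, v). Assuming it is an upper bound (total load on a link must not exceed its capacity, an interpretation not confirmed by the slide), a routing solution could be checked as follows. The function name and the dictionary-based data layout are hypothetical.

```python
def capacity_violations(best_effort, real_time, capacity):
    """Return links whose combined load exceeds capacity.

    best_effort, real_time: lists of per-flow dicts mapping a link (u, v)
    to the bandwidth that flow places on it.
    capacity: dict mapping each link (u, v) to c(u, v).
    Assumes constraint (5) reads: sum of both flow classes <= c(u, v).
    """
    violations = []
    for link, cap in capacity.items():
        load = sum(flow.get(link, 0) for flow in best_effort)
        load += sum(flow.get(link, 0) for flow in real_time)
        if load > cap:
            violations.append((link, load, cap))
    return violations


if __name__ == "__main__":
    capacity = {("a", "b"): 10, ("b", "c"): 10}
    best_effort = [{("a", "b"): 4}, {("b", "c"): 6}]
    real_time = [{("a", "b"): 5, ("b", "c"): 5}]
    print(capacity_violations(best_effort, real_time, capacity))
```

In this toy instance link (a, b) carries 9 units and passes, while link (b, c) carries 11 units against a capacity of 10 and is reported.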

Hybrid electric vehicle (HEV) Application
In a modern automobile there are as many as 70 electronic control units (ECUs) embedded in the vehicle*
In the HEV, a motor drive with a controlled inverter system is needed to deliver powerful and efficient drive to the electric motor
Mixed-criticality application: hard real-time and best effort
[Figure: HEV system components: engine, converter, battery pack, synchronous motor and drive, planetary gear, synchronous generator and drive, gas pedal and brake, crash sensors]
[Figure: Flow of one system step, from Start of System-Step to End of System-Step: input analysis, previous distributed control state, previous state variables, control algorithm, monitoring units, guards evaluation, data storage, state selection, continuous states 1 to 3, state variables, control system emulator, distributed control output signals, system components, actuators]
* 2009 Automotive Embedded Systems Handbook

Evaluation Results
[Figure, two panels, HEV: average number of packets received in one system step versus number of state and monitoring variables. One panel compares L Bless, Ls Bless, v QoS, p QoS, and HRES; the other compares v QoS, p QoS, and HRES]

Evaluation Results
[Figure, HEV: total system step latency (cycles) versus number of state and monitoring variables, for L Bless, Ls Bless, v QoS, p QoS, and HRES]
[Figure, HEV: average number of packets received in one system step versus number of state and monitoring variables, for C1: 50% capacity sharing, C2: 65% capacity sharing, C3: 70% capacity sharing]

Q & A Section
Thank You!
More information at http://caes.cs.uoregon.edu/