Systems on Chip and Networks on Chip: Bridging the Gap with QoS



Similar documents
A Generic Network Interface Architecture for a Networked Processor Array (NePA)

Asynchronous Bypass Channels

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip

Lecture 18: Interconnection Networks. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Low-Overhead Hard Real-time Aware Interconnect Network Router

Quality of Service (QoS) for Asynchronous On-Chip Networks

Architectures and Platforms

How To Provide Qos Based Routing In The Internet

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

4 Internet QoS Management

QoS Parameters. Quality of Service in the Internet. Traffic Shaping: Congestion Control. Keeping the QoS

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS?

What is a System on a Chip?

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Chapter 7 outline. 7.5 providing multiple classes of service 7.6 providing QoS guarantees RTP, RTCP, SIP. 7: Multimedia Networking 7-71

On-Chip Interconnection Networks Low-Power Interconnect

Distributed Systems 3. Network Quality of Service (QoS)

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Three Key Design Considerations of IP Video Surveillance Systems

Communication Networks. MAP-TELE 2011/12 José Ruela

Region 10 Videoconference Network (R10VN)

Traffic Engineering & Network Planning Tool for MPLS Networks

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Networks on Chip. on-chip interconnect: physical. Kees Goossens. Kees Goossens Eindhoven University of Technology 1

Motivation. QoS Guarantees. Internet service classes. Certain applications require minimum level of network performance:

7a. System-on-chip design and prototyping platforms

SOC architecture and design

Chapter 1 Reading Organizer

From Bus and Crossbar to Network-On-Chip. Arteris S.A.

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL

Switched Interconnect for System-on-a-Chip Designs

VOICE OVER IP AND NETWORK CONVERGENCE

Computer Systems Structure Input/Output

CS/ECE 438: Communication Networks. Internet QoS. Syed Faisal Hasan, PhD (Research Scholar Information Trust Institute) Visiting Lecturer ECE

Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles

Improving Quality of Service

White paper. Latency in live network video surveillance

PART III. OPS-based wide area networks

PCI Express Overview. And, by the way, they need to do it in less time.

SoC IP Interfaces and Infrastructure A Hybrid Approach

Quality of Service (QoS)) in IP networks

Quality of Service in the Internet. QoS Parameters. Keeping the QoS. Traffic Shaping: Leaky Bucket Algorithm

16/5-05 Datakommunikation - Jonny Pettersson, UmU 2. 16/5-05 Datakommunikation - Jonny Pettersson, UmU 4

Real-time apps and Quality of Service

18: Enhanced Quality of Service

Measuring IP Performance. Geoff Huston Telstra

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK

Lecture 4: Introduction to Computer Network Design

Internet Quality of Service

Solving Network Challenges

APPLICATION NOTE 209 QUALITY OF SERVICE: KEY CONCEPTS AND TESTING NEEDS. Quality of Service Drivers. Why Test Quality of Service?

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

White Paper Abstract Disclaimer

Switch Fabric Implementation Using Shared Memory

CHAPTER 6. VOICE COMMUNICATION OVER HYBRID MANETs

Introduction to System-on-Chip

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

Multimedia Requirements. Multimedia and Networks. Quality of Service

Link Layer. 5.6 Hubs and switches 5.7 PPP 5.8 Link Virtualization: ATM and MPLS

AN EVENT-BASED NETWORK-ON-CHIP MONITORING SERVICE

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

Analysis of IP Network for different Quality of Service

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

Optimizing Converged Cisco Networks (ONT)

Network Simulation Traffic, Paths and Impairment

Packetized Telephony Networks

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

OpenSoC Fabric: On-Chip Network Generator

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Lecture 16: Quality of Service. CSE 123: Computer Networks Stefan Savage

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Configuring QoS in a Wireless Environment

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP

Requirements of Voice in an IP Internetwork

Lecture 8 Performance Measurements and Metrics. Performance Metrics. Outline. Performance Metrics. Performance Metrics Performance Measurements

The Conversion Technology Experts. Quality of Service (QoS) in High-Priority Applications

Network management and QoS provisioning - QoS in the Internet

Latency on a Switched Ethernet Network

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

QoS issues in Voice over IP

PART II. OPS-based metro area networks

Qsys and IP Core Integration

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

The network we see so far. Internet Best Effort Service. Is best-effort good enough? An Audio Example. Network Support for Playback

STANDPOINT FOR QUALITY-OF-SERVICE MEASUREMENT

1. Memory technology & Hierarchy

Vorlesung Rechnerarchitektur 2 Seite 178 DASH

Quality of Service (QoS) EECS 122: Introduction to Computer Networks Resource Management and QoS. What s the Problem?

Introduction to LAN/WAN. Network Layer

Transcription:

Systems on Chip and Networks on Chip: Bridging the Gap with QoS Philips Research The Netherlands

sources of unpredictability 2 applications unpredictability architectures physical effects but we still want to build predictable systems! the quality of service concept helps

overview 3 1. application & user views leads to quality of service (QoS) concept 2. system on chip (SoC) design view leads to networks on chip (NoC) 3. NoCs and QoS: a synthesis to recuperate predictability types of QoS commitment and their costs 4. the Æthereal approach and architecture

1. application & user views applicationinduced unpredictability

1. future applications 5 convergence of application domains increased functionality and heterogeneity higher semantic content/entropy more dynamism 29000 27000 25000 23000 VBR MPEG DVD stream [VBR: variable bit rate] 21000 19000 17000 15000 worstcase load structural load running average instantaneous load

1. future applications 6 embedded and pervasive applications ("ambient intelligence") real time safety critical users expect predictable behaviour e.g. PC, mobile phone, TV, heating system, air bag low high quality of service is resource management for predictability

1. consumerelectronics requirements 7 consumerelectronics media processing is challenging signal processing hard real time very regular load high quality worst case typically on DSPs media processing hard real time irregular load high quality average case SoC/media processors multimedia soft real time irregular load "sloppy" quality average case PC/desktop

2. SoCs and NoCs architectureinduced unpredictability

2. systems on chip 9 Moore s law predicts exponential growth of resources but.. someone has to do the work to make it come true 1. deep submicron problems (DSM) wire vs. transistor speed, power, signal integrity 2. design productivity gap IP reuse, platforms, NoCs verification integration hell you! DSM nightmare

10 2. example SoC CAB Philips's advanced settop box and digital TV SoC Viper (pnx8500) MBS 0.18 m m / 8M + MPEG 1.8V / 4.5 W VIP 35 M transistors 82 clock domains 1394 TPI more than 50 IP blocks MMI+AICP Conditional access MSP MPI MIPS Viper TriMedia VLIW m m @ 0.09

2. composing local solutions 11 to solve both DSM and productivity gap issues: global approaches won t work with exponential problem so 1. break up problem (modularity) 2. then compose subsolutions 3. in a scalable fashion hierarchy helps abstraction helps

2. composing local solutions examples 12 for timing (closure): globally asynchronous, locally synchronous (GALS) for layout: IPlevel Mead & Conway, e.g. wiring strategies, tiling for architectures: chip multiprocessing (CMP), tiling, systolic/cell, for programming: e.g. Kahn process networks locally sequential, globally concurrent cpu cpu $ $ network mem $ cpu $ cpu locally shared memory, globally message passing

2. networks on chip 13 have to connect many local solutions heterogeneity, scalability through the decoupling of communication & computation networks on chip address this challenge from above (protocol stack, IP reuse) from below (DSM) ip ip ip ip ip ip ip network? on chip ip integration hell ip ip ip NoC DSM nightmare

2. networks on chip 14 twopronged approach deal with communication dynamism protocol stacks enable differentiated services routerbased network offers services application demands services hardware technology protocol stack is based on services scalable, compositional IP composition structure interconnect (wires, layout, timing)

2. networks on chip: examples 15 two types of component: routers tree transport data in packets network interfaces convert IP view (transactions, e.g. Amba, OCP) to network view (packets) NI R R R R R R NI NI NI NI mesh NI NI R R NI NI R R NI NI R R fat tree R R R R R R R R NI R NI R NI R NI NI NI NI

2. networks on chip: advantages 16 differentiated services offer different kinds of communication with one network scalable add routers and network interfaces for extra bandwidth (at the cost of additional latency) compositional add routers/nis without changing existing components e.g. timing, buffers efficient use of wires statistical multiplexing/sharing (average vs. worstcase) fewer wires less wire congestion point to point wires at high speed communication becomes reusable, configurable IP

3. NoCs and QoS: a synthesis managing unpredictability with quality of service

3. GULP? 18 local schedulers (RT)OS task switching interrupts cache strategy cache pollution interconnect busses, bridges networks memory controllers external memory IP IP IP IP cpu $ interconnect IP ext. mem mem arb. interconnect interconnect IP IP IP IP IP IP IP IP IP $ cpu IP IP e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority, what is the global behaviour, composed of interacting local solutions?

3. GULP? 19 example CPU @ 225MHz, 64KB I$ ~70 cycle latency to external memory single task switch ~13K cycles = RTOS overhead + task switch [1] ~6K cycles + cache reload due to pollution (10%) [2] ~7K cycles with 20 hard realtime video tasks @ 60Hz 1200 switches X 13K cycles 7% CPU load what about effective task throughput, latency? have to guarantee throughput and latency (for hard realtime IP) throughput (for soft realtime IP) minimal latency (for CPU control tasks) cycles/instr task X switch [1] [2] task Y time

3. GULP? 20 so, now we can make SoCs with NoCs, using our decoupled recomposed solutions get locally predictable, globally unpredictable behaviour (GULP) GALS: multiple clock domains leads to uncertainty in time or data power management: combining local autonomous probabilistic managers NUMA: local vs. remote shared memory, dynamic (cache/mem) paging Kahn process networks: how are sequential processes scheduled? interacting schedulers/resource managers but the user wanted (global) predictable behaviour..

3. QoS & GULP 21 our tenet is that a quality of service approach is essential to recuperate global predictability the user and application require it it fits well with NoC protocol stack quality of service is nothing more than 1. stating what service you want (negotiation) 2. having the provider either commit to or reject your request 3. renegotiate when your requirements change (re)negotiate steady states create a series of steady states that are predictable

3. example: QoS & VBR 22 29000 27000 25000 23000 21000 19000 17000 15000 [VBR: variable bit rate] (re)negotiate worstcase load structural load running average instantaneous load steady states

3. quality of service 23 QoS means reducing uncertainty to negotiation phase for both user and provider requires & enables resource management notion of commitment guaranteed versus besteffort service types of commitment 1. correctness e.g. uncorrupted data 2. completion e.g. no packet loss 3. bounds e.g. maximum latency

3. some remarks 24 the types of commitment are dependent e.g. cannot offer latency bound without completion this has repercussions for protocol stack & architecture data retransmission on unreliable lowswing wires immediately excludes guaranteed latency & jitter service users quality of service must be done at all levels physical: power manager of IP blocks link & network: network and communication links task level: CPU scheduler (RTOS), application software services service providers QoS is pervasive, it cannot be bolted on afterwards

3. some remarks 25 the statistical guarantees oxymoron e.g. guaranteeing >0% packet arrival implies QoS have to keep track of percentage lost post hoc analysis of behaviour of architecture is no guarantee unless boundary conditions of analysis are enforced (and then resources have to be managed QoS)

3. the cost of QoS 26 a) guaranteed bounds require worstcase resource dimensioning b) completion requires at least averagecase resources

3. the cost of QoS 27 besteffort services can have better average resource utilisation at the cost of unpredictable/unbounded worstcase behaviour the combination of besteffort & guaranteed services (c) is useful!

3. quality of service 28 IP integration is the problem SoC design becomes communication centric the NoC is the focus of the architecture to make SoCs predictable, NoCs must offer QoS ip ip ip ip ip ip ip network on chip ip ip ip ip predictable composition with QoS

4. NoCs and QoS: the Æthereal approach

4. Æthereal context 30 consumer electronics reliability & predictability are essential low cost is crucial time to market must be reduced NoCs and QoS helps on all accounts NoCs are focal point of SoCs QoS is essential for NoCs hence the Æthereal NoC offers differentiated services to manage (and hence reduce) resources to ease integration (and hence decrease TTM)

4. NoC services master IP LD, ST data slave IP 31 request communication services using connections opening & closing affect resource reservations with properties data integrity (uncorrupted data transfer) transaction ordering un/ordered per slave/connection transaction completion flow control data loss or not delivery bounds throughput, latency, jitter commitment correctness completion bounds

4. architecture decisions 32 we expect lossless, (partially) ordered connections with or without throughput guarantees to be most popular hence, to reduce costs, we implement this natively, i.e. don't drop data in the network(*) don't reorder data in the network no retransmissions & filtering of duplicates no reorder buffers not dropping data complicates congestion & deadlock issues (*) excluding network interfaces

4. architecture decisions 33 conceptually, two disjoint networks a network with throughput+latency guarantees (GT) a network without those guarantees (besteffort, BE) besteffort router programming guaranteed router priority/arbitration we have a several types of commitment in the network combine guaranteed worstcase behaviour with good average resource usage

4. besteffort router architecture 34 wormhole routing input queueing source routing source decides on path to follow through network other options (all feasible, areawise) input queuing with virtualcutthrough routing virtual output queuing & islip with wormhole routing output queuing with wormhole routing

4. guaranteedthroughput router 35 contentionfree routing synchronous, using slot tables timedivision multiplexed circuits storeandforward routing headerless packets information is present in slot table

4. architecture decisions 36 to offer guaranteed latency or bandwidth over finite interval cannot drop data must bound contention and congestion ratebased scheduling has high buffer costs (deep fifos) deadlinebased scheduling even higher buffer costs (deep priority queues) contentionfree routing low buffer costs (shallow fifos) NB, all require some notion of time

4. contentionfree routing 37 latency guarantees are easy in circuit switching emulate circuits with packet switching schedule packet injection in network such that they never contend for same link at same time in space: disjoint paths in time: timedivision multiplexing or a combination

o1 o2 o3 o4 o1 o2 o3 o4 38 slot table of router 1 i1 i3 i1 i1 i1 the input routed to the output at this slot 1 network 2 1 interface 2 1 1 network interface 2 2 1 1 1 1 router 1 3 3 4 4 router 3 2 2 2 2 1 1 1 1 router 2 3 3 2 2 1 1 network interface 2 2 input 2 for router 1 is output 1 for router 2 o1 i4 o2 o3 o4 i1 use slots to avoid contention divide up bandwidth

4. programming model 39 use besteffort packets to set up connections setup & teardown packets like in ATM (asynchronous transfer mode) distributed, concurrent, pipelined safe: always consistent compute slot assignment compile time, run time, or combination connection opening is guaranteed to complete (but without a latency guarantee) with commitment or rejection

4. architecture insights 40 memories (for packet storage) registerbased fifos are expensive RAMbased fifos are as expensive 80% of router is memory special hardware fifos are very useful 20% of router is memory stu iqu iqu iqu iqu switch msu iqu iqu speed of memories registers are fast enough RAMs may be too slow hardware fifos are fast enough routers based on registerfile and hardware fifos drawn to approximately same scale (1mm2, 0.26mm2)

4. architecture insights 41 router must be scalable, but up to a point in terms of area (switch, input or output buffers) in terms of speed (arbiter, memories) don't go beyond 8x8 routers (for fat tree)? switch is less of a problem than expected latency of router is essential increase data rate increase arbitration rate minimise number of hops in network topology choice think about tradeoff hops versus router latency

4. router results 42 a prototype router: 5 input and 5 output ports (arity 5) 0.25 mm2 CMOS12 500 MHz data path, 166 MHz control path flit size of 3 words of 32 bits 500x32 = 16 Gb/s throughput per link, in each direction 256 slots & 5x1 flit fifos for guaranteedthroughput traffic 6x8 flit fifos for besteffort traffic

4. router architecture 43 BQ GQ BQ GQ X data packets BQ GQ slot table arbiter flow control reconfiguration logic programming packets

5. conclusions

5. conclusions 45 future applications are more dynamic and embedded users wants predictable and reliable behaviour QoS bridges this apparent contradiction future SoC design relies on NoCs to solve DSM issues close the design productivity gap integration bliss? NoC DSM heaven? QoS can be provided by the NoC protocol stack (services) but, the NoC architecture must take this into account there is an increasing awareness of the need for predictable system design and the role NoCs can play

5. conclusions 46 the Æthereal NoC offers differentiated services with different types of commitment the Æthereal architecture aims to marry guaranteed worstcase behaviour with good average resource usage Æthereal's prototype routers show feasibility of NoCs