Time-predictable Processor and Network-on-Chip. Jens Sparsø Technical University of Denmark


1 Time-predictable Processor and Network-on-Chip Jens Sparsø Technical University of Denmark

2 T-CREST platform: Processor node: Patmos CPU; local memories (caches, SPM); DMA controllers. SDRAM memory controller node. Connected by NOC. Globally Asynchronous Locally Synchronous implementation. [Figure labels: Patmos CPU, I$, D$, SPM, DMA, Network Interface, Proc. node, MEM ctl, SDRAM.]

3 Research challenges: A basic MIPS-style processor pipeline is predictable, so it poses few research challenges. Communication and the memory hierarchy are where the action is in a chip multiprocessor (CMP).

4 Research challenges cont.: Inside a processor node: scratch pad memories; special caches (stack / method / ). Processor to processor: DMA-driven block transfers (message passing); nature: all-to-all. Processor to shared memory (SDRAM): arbitration in NOC and in memory controller; nature: all-towards-one. [Figure labels: Communication NOC, Memory NOC.]

5 WP2: Processor: Time-predictable processor, called Patmos. Flexibility to define the instruction set. A compiler is adapted for Patmos. Co-design for low WCET of: Patmos; compiler; WCET analysis.

6 Patmos: Intended as a research platform for real-time architecture. Dual-issue RISC-style processor. Full predication of all instructions. Local caches and memories: scratch pad memory (SPM); stack cache; method cache (FIFO and LRU replacement); data cache (set associative, LRU, write-through, no-allocation).

7 Pipeline Overview 7

8 Instruction set: 3-register ALU instructions. Immediate ALU instructions: short and long ALU (2nd issue slot). Load/store with register + offset: word, half word, byte; aligned access. Typed load/store for different data caches. Split load.

9 Instruction set cont.: Branch with offset and register indirect. Call instruction to support the method cache. Compare and predicate operations. Stack cache support instructions. Dual issue: ALU in both pipelines; load/store and branch in the 1st pipeline.
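The interplay of compare, predicate and fully predicated instructions can be illustrated with a toy interpreter. This is only a sketch: the tuple encoding, the opcode names (`cmplt`, `sub`) and the always-true predicate `p0` are assumptions made for illustration and are not the Patmos instruction encoding or assembler syntax.

```python
def run_predicated(program, regs):
    """Run guarded instructions in straight-line order (illustrative model only).

    Each instruction is (guard, op, dest, a, b). A guard is a predicate name
    such as "p1", or "!p1" for its negation. "p0" is assumed to be always true.
    """
    regs = dict(regs)
    preds = {"p0": True}
    issued = 0
    for guard, op, dest, a, b in program:
        issued += 1  # every instruction is issued, whether or not it takes effect
        negate = guard.startswith("!")
        value = preds.get(guard.lstrip("!"), False)
        if value == negate:
            continue  # guard evaluates to false: the instruction has no effect
        x, y = regs[a], regs[b]
        if op == "cmplt":
            preds[dest] = x < y  # compare writes a predicate
        elif op == "sub":
            regs[dest] = x - y
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs, issued


# |a - b| without a branch: both subtractions are issued, one takes effect.
ABS_DIFF = [
    ("p0", "cmplt", "p1", "a", "b"),
    ("p1", "sub", "r", "b", "a"),
    ("!p1", "sub", "r", "a", "b"),
]
```

In this model the number of issued instructions is the same for every input, which is the property that makes predicated code convenient for WCET analysis.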

10 Patmos status: Simulator complete; complete ISA coverage; executes the MiBench benchmark suite. Hardware described in VHDL. FPGA implementation; on-chip instruction ROM and data memory; test programs in assembler; tiny C programs are OK.

11 Patmos next steps: Executing MiBench in the HW. Stack cache implementation. Dual pipeline. More on caches, SPMs. Attach the Network-on-Chip. Integration with memory controller.

12 T-CREST platform: Processor node: Patmos CPU; local memories (caches, SPM); DMA controllers. SDRAM memory controller node. Connected by NOC. Globally Asynchronous Locally Synchronous implementation. [Figure labels: Patmos CPU, I$, D$, SPM, DMA, Network Interface, Proc. node, MEM ctl, SDRAM.]

13 WP3: Network on Chip: Two types of traffic. Processor to processor: DMA-driven block transfers, SPM to SPM (message passing); nature: all-to-all. Processor to shared memory (SDRAM): arbitration in NOC and in memory controller; nature: all-towards-one. Globally-Asynchronous Locally-Synchronous (GALS) implementation of the platform.

14 NOC-centric view: [Figure: CPU nodes, each with cache ($) and SPM, and a memory controller (M-ctl, MEM), attached to the NOC through network interfaces (NI). Labels: clock domain; read/write memory-style interface; mesochronous NIs; the NOC: asynchronous, packet switched, TDM based.]

15 Time-predictable NOC: Include NOC latency in WCET analysis. Note: most NOCs are not time predictable. Time predictable: Problem: avoid interference of traffic. Solution: end-to-end (virtual) circuits. T-CREST solution: TDM (Time Division Multiplexing) with a static schedule; HW implementation is simple and fast; WCET analysis is simple.
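Why a static TDM schedule keeps WCET analysis simple can be shown with a short calculation. The sketch below assumes a model that is not spelled out on the slide: one slot lasts one cycle, a sender may inject only in the slots its connection owns, and every hop adds a fixed, contention-free delay. The function name and parameters are hypothetical.

```python
def worst_case_latency(period, owned_slots, hops, cycles_per_hop=1):
    """Conservative upper bound, in cycles, for one word on a TDM virtual circuit.

    Assumed model: the word may have to wait for the longest gap between two
    consecutive owned slots, then crosses `hops` links without any contention.
    """
    if not owned_slots:
        raise ValueError("connection owns no slot")
    slots = sorted({s % period for s in owned_slots})
    # Distance from each owned slot to the next one, wrapping around the period.
    gaps = [
        (slots[(i + 1) % len(slots)] - s - 1) % period + 1
        for i, s in enumerate(slots)
    ]
    return max(gaps) + hops * cycles_per_hop
```

For example, a connection owning one slot in a 10-slot period over 3 hops gets a bound of 13 cycles in this model. The bound depends only on the schedule and the route, not on what other connections are doing.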

16 Principles: TDM-based NOC: [Figure: IP cores IP1 to IP5 connected through the NOC by virtual circuits c1 to c6; circuits are assigned to slots of the TDM period.] The NOC is a simple X-Y switched, pipelined structure. No buffering. No arbitration.
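With no buffering and no arbitration, the static schedule itself has to guarantee that two circuits never claim the same link in the same slot. A minimal checker is sketched below, assuming a 2D mesh with X-then-Y routing in which a word injected in slot t occupies the k-th link of its route in slot t + k. The data layout and function names are hypothetical.

```python
def xy_route(src, dst):
    """Directed links used by X-then-Y routing between (x, y) mesh coordinates."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def find_conflicts(schedule, period):
    """Return (first_circuit, second_circuit, link) for every link/slot clash.

    `schedule` is a list of (src, dst, injection_slot) tuples. A word is assumed
    to advance exactly one link per slot, so hop k uses slot (injection + k).
    """
    owner = {}
    conflicts = []
    for circuit in schedule:
        src, dst, slot = circuit
        for hop, link in enumerate(xy_route(src, dst)):
            key = (link, (slot + hop) % period)
            if key in owner:
                conflicts.append((owner[key], circuit, link))
            else:
                owner[key] = circuit
    return conflicts
```

An empty result means the schedule is contention-free under these assumptions, so routers can forward words blindly.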

17 Study of TDM scheduling: Static schedule. All-to-all: N x (N-1) virtual circuits. Same bandwidth on all circuits. Topologies: mesh, torus, bi-torus, ring.

18 TDM schedule period: Bounds: I/O bound (n-1); capacity bound (# links); bisection bound (half-to-half communication). Table: bound and minimum (or best found) schedule period per network size for mesh, torus, and bi-torus. Entries surviving extraction: 3x (10), 4x (19), 5x (30), 6x (45), 7x7 48 (63), 8x8 64 (86), 9x9 92 (113).
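The three lower bounds can be computed for a square mesh as follows. This is a sketch under stated assumptions, not the method used to produce the table: each of the n nodes sends one word per period to every other node, routes are shortest paths, links are unidirectional pairs, and the bisection is a straight cut between two columns. The function name and result keys are hypothetical.

```python
from itertools import product


def mesh_period_bounds(side):
    """Lower bounds on the all-to-all TDM period for a side x side mesh."""
    if side < 2:
        raise ValueError("need at least a 2x2 mesh")
    nodes = list(product(range(side), repeat=2))
    n = len(nodes)

    # I/O bound: a node injects one word per slot and has n-1 destinations.
    io_bound = n - 1

    # Capacity bound: total link-slots needed divided by available directed links.
    total_hops = sum(
        abs(ax - bx) + abs(ay - by) for (ax, ay) in nodes for (bx, by) in nodes
    )
    directed_links = 4 * side * (side - 1)
    capacity_bound = -(-total_hops // directed_links)  # ceiling division

    # Bisection bound: circuits going from one half to the other must share
    # the `side` directed links that cross the cut in that direction.
    left = side * (side // 2)
    right = n - left
    bisection_bound = -(-(left * right) // side)

    return {
        "io": io_bound,
        "capacity": capacity_bound,
        "bisection": bisection_bound,
        "best": max(io_bound, capacity_bound, bisection_bound),
    }
```

Under these assumptions the I/O bound dominates for a 2x2 mesh, while the bisection bound takes over as the mesh grows.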


19 A TDM-based router: [Figure: router with links, HPUs and a crossbar (X bar). Labels: payload flit, header, route flit.] Source: A. Hansson, M. Subburaman, K. Goossens, "Aelite: A Flit-Synchronous Network on Chip with Composable and Predictable Services," Proc. DATE 2009.

20 Asynchronous TDM router: Design is more challenging than it looks. Which asynchronous design style? Delay insensitive or bundled data? 2-phase or 4-phase signaling (events or signal levels)? TDM requires some form of common time (ticking), but most ticks carry no data. How to implement something that resembles clock gating (to avoid excessive power consumption)? [Figure labels: HPU, X bar.]

21 NI for T-CREST NOC: Original T-CREST plan (traditional NI design). New NI micro-architecture.

22 Traditional NI design (w. buffers): [Figure: CPU, $, SPM and NI on each side of the NOC. Labels: DMA driven; flow control; TDM based; flow control (credits).] Per-connection FIFO buffers and flow control account for 50-85% of NI hardware!

23 New T-CREST NI micro-architecture: [Figure: CPU, $, SPM and NI on each side of the NOC. Label: merge/combine DMA controllers and TDM schedule.] DMA controllers are placed in the NIs and driven by the TDM schedule. This avoids FIFO buffers and flow control, and saves 50-85% of traditional NI hardware.

24 New T-CREST NI-microarchitecture 24

25 Key features: SPM used for clock domain crossing. One DMA needed per connection, but only one active at any given time due to TDM; this enables an efficient table-based implementation of DMA controllers. End-to-end (i.e., SPM-to-SPM) data transfer avoids buffering and flow control.
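The idea of a table-based DMA implementation driven by the TDM schedule can be sketched as a small behavioural model. Everything here is an assumption for illustration: the class name, the table fields (`read_ptr`, `words_left`), and the choice of one word per slot are hypothetical and do not describe the actual T-CREST NI hardware.

```python
class TableDrivenNI:
    """Toy model of a network interface whose DMA state lives in one table."""

    def __init__(self, spm, slot_table, dma_table):
        self.spm = spm                # local scratch pad memory, a list of words
        self.slot_table = slot_table  # slot index -> connection id, or None if idle
        # One table row per connection instead of one DMA engine per connection.
        self.dma_table = {cid: dict(row) for cid, row in dma_table.items()}
        self.slot = 0

    def tick(self):
        """Advance one TDM slot; return (connection, word) if a word is sent."""
        cid = self.slot_table[self.slot]
        self.slot = (self.slot + 1) % len(self.slot_table)
        if cid is None:
            return None  # slot not owned by this NI
        row = self.dma_table[cid]
        if row["words_left"] == 0:
            return None  # owned slot, but no transfer pending
        word = self.spm[row["read_ptr"]]  # read directly from the SPM, no FIFO
        row["read_ptr"] += 1
        row["words_left"] -= 1
        return (cid, word)
```

Because the slot counter selects at most one table row per slot, a single datapath can serve all connections, and words go from SPM to network without an intermediate buffer.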

26 Summary: Dual-issue MIPS-style processor, Patmos: predicated instructions; typed load and store instructions; special caches; scratch pad memory. Network on chip: processor-to-processor DMA-driven block transfers (message passing); nature: all-to-all; TDM w. static schedule (aelite-like); asynchronous. Processor-to-memory controller (SDRAM): arbitration in NOC and in memory controller; nature: all-towards-one.

27 T-CREST: More info: T-CREST web site: most deliverables are public; some first papers. Development: most artifacts are open-source; hosted at GitHub.


Design of Networks-on-Chip for Real-time Multi-Processor Systems-on-Chip Jens Sparsø Department of Informatics and Mathematical Modelling Technical University of Denmark Email: jsp@imm.dtu.dk Abstract This