A Generic Network Interface Architecture for a Networked Processor Array (NePA)

Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh
EECS, University of California, Irvine

Outline
- Introduction
- Network-on-Chip
- Related Works
- Generic Network Interface
- Related Works
- Networked Processor Array (NePA)
- Architecture
- Generic Network Interface
- Programming Sequence
- Modular Wrapper for a Slave IP Core
- Case Studies: Memory / Turbo Decoder IP Cores
- Summary

Introduction
[Figure: gate delay and wire delay (ps) by year, 2003 to 2018. Series: Gate Delay (HP, LOP, LSTP); Global Wire Delay 1 and 2; Metal1 Wire Delay 1 and 2; Int. Wire Delay 1 and 2. From the ITRS 2004 Report.]
- In 2018, the interconnection delay is estimated to be 1000 times greater than the gate delay [ITRS]
- The interconnection network among multiple IPs becomes another challenging issue in System-on-Chip (SoC) design

Introduction (cont'd)
Current Trends in VLSI Technology
- Requirements
  - Computation-intensive applications
  - Highly integrated + low power
- Increasing number of computing resources in SoC
  - CPUs, DSPs, ASPs
- System platforms
  - MPSoC (Multi-Processor System-on-Chip) or CMP (Chip Multi-Processor)
  - Homogeneous/heterogeneous processors
- Similar to a small-scale distributed computer system
  - Interconnection?

Network-on-Chip
- Interconnection
  [Figure: IP blocks (CPU, RAM, ROM, DSP, co-processor, peripheral, I/O) attached through network interfaces to an interconnect network of switches and links.]
- Communication
  - The use of switching-based technology
  - Has been extensively used in computer networks
  - Communication between IPs can be packet based
- The key efficiency of NoC
  - Communication resources are SHARED!

Network-on-Chip (cont'd)
- Network-on-Chip (NoC) architecture
  - Network-like interconnection
  - Insertion of routers
  - Shortened wiring requirement
  - Alleviates scalability problems and removes the limitations of complex wiring
- Difference from computer network technology (Internet TCP/IP)
  - Simple and light-weight modification
  - Low power requirement for mobile applications
  - Performance and cost
- Different interface specifications of the integrated components make it considerably harder to adopt NoC techniques

Generic Network Interface
- The reuse of IP cores in a plug-and-play manner can be achieved by using a generic network interface (NI)
  - Reduces the design time of a new system
- Translates packet-based communication into a higher-level protocol
- Decouples computation from communication
- Hides the implementation details of the interconnection

Related Works
- Different packetization strategies
  - Software library, on-core and off-core implementation
  - A hardware wrapper implementation has the lowest area overhead and latency
- NI for standard interfaces such as OCP, DTL and AXI
  - Improves reuse of IP cores
  - Performance is penalized because of increased latency
- Generic architecture and automatic generation of the interface
  - Existing research limits the embedded IP cores to CPUs (ARM7 and MC68000)
  - Wrapper designs for application-specific cores still lack generic aspects

NePA Architecture: System Platform
[Figure: NePA system platform. Labeled blocks: Host I/F (HI), Memory Station (MS), Processing Element (PE), IP 1 to IP 4, routers. MS: NI, memory controller, router, data RAM. IP x: network interface, specific IP (FFT, Viterbi or Turbo coder), router. PE: NI, router, program RAM, data RAM, processor core (ARM / MIPS etc.).]

NePA Architecture: High-Performance Router Architecture
- Interconnect through FIFOs between neighboring PEs
  - Simple interconnect wiring
- Minimal (shortest) adaptive routing
  - Livelock-free
- Point-to-point single or block transfer
- Two disjoint sub-networks for west-to-east and east-to-west traffic
  - The network avoids cyclic dependencies
  - Resulting in deadlock freedom
- Prioritized packet delivery
[Figure: router block diagram with a Right Router and a Left Router; ports labeled W, E, N1, N2, S1, S2 and internal ports IntR / IntL (INT).]
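The split into two sub-networks and the minimal routing choice can be illustrated with a small sketch. This is not the authors' router logic: the coordinate convention (x grows eastward, y grows southward), the function names, and the tie-break for same-column traffic are all assumptions made for illustration.

```python
def select_subnetwork(src, dst):
    """Pick the sub-network by horizontal direction of travel.

    'right' carries west-to-east traffic, 'left' carries east-to-west.
    Assumption: a packet that stays in its column uses the right router.
    """
    src_x, _ = src
    dst_x, _ = dst
    return "right" if dst_x >= src_x else "left"


def minimal_ports(cur, dst):
    """List the output ports that bring a packet strictly closer to dst.

    A minimal adaptive router may choose any port in this list, so every
    hop reduces the remaining distance and the packet cannot livelock.
    """
    cur_x, cur_y = cur
    dst_x, dst_y = dst
    if (cur_x, cur_y) == (dst_x, dst_y):
        return ["INT"]          # deliver to the local (internal) port
    ports = []
    if dst_x > cur_x:
        ports.append("E")
    elif dst_x < cur_x:
        ports.append("W")
    if dst_y > cur_y:
        ports.append("S")       # assumption: y grows southward
    elif dst_y < cur_y:
        ports.append("N")
    return ports
```

Because a packet never changes its horizontal direction, eastbound and westbound packets never wait on each other's channels, which is the intuition behind the deadlock-freedom claim on the slide.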

Network Interface Prototype: Packetization Unit
- Builds the packet header and converts the data into flits
- Header builder: forms the head flit based on the information provided by registers
- DMA controller: automatically generates the address and read signals for the internal memory
- Flit controller: wraps the head flit and body flits into a packet
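The three blocks of the packetization unit can be mimicked in software. The sketch below is only a behavioral model: the slides do not give the flit layout, so flits are modeled as tagged tuples, and the register names (dest, type, tag, length, read_addr) are hypothetical.

```python
def build_head_flit(regs):
    """Header builder: collect the control fields from the NI registers."""
    # Field names are hypothetical; the real head-flit layout is not given.
    fields = {key: regs[key] for key in ("dest", "type", "tag", "length")}
    return ("HEAD", fields)


def dma_read(memory, start, count):
    """DMA controller: step through consecutive addresses and read each word."""
    for addr in range(start, start + count):
        yield memory[addr]


def packetize(regs, memory):
    """Flit controller: emit the head flit followed by the body flits."""
    flits = [build_head_flit(regs)]
    for word in dma_read(memory, regs["read_addr"], regs["length"]):
        flits.append(("BODY", word))
    return flits
```

For example, with `regs = {"dest": (1, 2), "type": "BLOCK", "tag": 0, "length": 3, "read_addr": 2}` and a memory holding 100, 101, 102, ..., `packetize` returns one head flit followed by body flits carrying 102, 103 and 104.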

Network Interface Prototype: Depacketization Unit
- Receives data from the interconnection network
- Flit controller: selects the head flit from a packet and passes it to the header parser
- Header parser: extracts control information from the head flit and asserts an interrupt signal to the OpenRISC core
- DMA controller: automatically writes the body flit data into the internal memory
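The receive path can be modeled in the same spirit. This is a minimal behavioral sketch, not the hardware design: flits are assumed to be ("HEAD", fields) or ("BODY", word) tuples, and the class and attribute names are hypothetical.

```python
class Depacketizer:
    """Behavioral model of the depacketization unit (names are hypothetical)."""

    def __init__(self, mem_size=16):
        self.memory = [0] * mem_size   # internal memory written by the DMA
        self.header = None             # control fields of the current packet
        self.interrupt = False         # interrupt line to the processor core
        self.write_addr = 0            # next address the DMA will write

    def receive(self, flit):
        kind, payload = flit
        if kind == "HEAD":
            # Flit controller hands the head flit to the header parser,
            # which stores the control fields and raises the interrupt.
            self.header = dict(payload)
            self.interrupt = True
        else:
            # DMA controller stores body data at consecutive addresses.
            self.memory[self.write_addr] = payload
            self.write_addr += 1
```

In this model the interrupt handler would read `header`, set `write_addr` for a BLOCK packet, and clear `interrupt` before the body flits are stored.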

Network Interface Prototype: Programming Sequence
- Sending a SINGLE packet
  - All required parameters are written to the associated registers
  - Writing the command register generates a complete packet
- Sending a BLOCK packet
  - sdatareg holds the number of data items
  - sreadaddrreg indicates the start address of the data in memory
- Receiving SINGLE/BLOCK packets
  - Parameters are accessed by the interrupt service routine
  - Accessing rdatareg completes the procedure for the current packet
  - For a BLOCK packet, the OpenRISC core sets the corresponding write address (wwriteaddrreg) for internal memory access
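The send side of this sequence can be sketched as an ordered list of register writes. Only sdatareg and sreadaddrreg are named on the slide; sdestreg, scmdreg, the command codes, and the use of sdatareg as the payload of a SINGLE packet are assumptions, and FakeBus is a stand-in that merely records writes.

```python
CMD_SINGLE = 1   # hypothetical command codes
CMD_BLOCK = 2


class FakeBus:
    """Stand-in for the register bus; it only records the writes."""

    def __init__(self):
        self.writes = []

    def write(self, reg, value):
        self.writes.append((reg, value))


def send_single(bus, dest, data):
    bus.write("sdestreg", dest)        # hypothetical destination register
    bus.write("sdatareg", data)        # assumption: payload of a SINGLE packet
    bus.write("scmdreg", CMD_SINGLE)   # the command write launches the packet


def send_block(bus, dest, start_addr, count):
    bus.write("sdestreg", dest)
    bus.write("sdatareg", count)            # number of data items (per the slide)
    bus.write("sreadaddrreg", start_addr)   # start address of the data in memory
    bus.write("scmdreg", CMD_BLOCK)         # the command write launches the packet
```

The point of the ordering is that the command register is written last, since that write is what triggers packet generation.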

Generic Network Interface: Modification of Packet for NI Access
- A slave IP core is not able to write the registers in the current NI
  - These registers are accessed by other cores over the network
- The opcode and operand of an instruction are located in the Tag and Data fields of the SINGLE packet
  - The Type field indicates that the packet contains an instruction for the NI
- The instruction decoder in the header parser fetches the opcode and operand from the packet
  - Updates the internal registers
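A minimal sketch of this decoding step follows. The slide fixes only where the opcode and operand travel (Tag and Data fields) and that the Type field marks an NI instruction; the type encodings, opcode values, and register names below are hypothetical.

```python
TYPE_DATA = 0             # hypothetical Type-field encodings
TYPE_NI_INSTRUCTION = 1

# Hypothetical opcode map: which internal NI register each opcode updates.
OPCODE_TO_REGISTER = {
    0x01: "base_addr",
    0x02: "reply_dest",
}


def handle_single_packet(packet, ni_regs):
    """Decode a SINGLE packet; return True if it was an NI instruction."""
    if packet["type"] != TYPE_NI_INSTRUCTION:
        return False                      # ordinary data, not handled here
    opcode = packet["tag"]                # opcode travels in the Tag field
    operand = packet["data"]              # operand travels in the Data field
    ni_regs[OPCODE_TO_REGISTER[opcode]] = operand
    return True
```

This lets a remote core configure the NI of a slave IP core that cannot write those registers itself.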

Generic Network Interface: Modular Wrapper for a Slave IP Core
- Un-buffered mode: data is exchanged as a data stream without an intermediate buffer
- Buffered mode: data is saved temporarily in an intermediate buffer
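The difference between the two modes can be shown with a small model. This is an illustration under assumptions, not the wrapper's actual design: the class name, the FIFO depth, and the back-pressure behavior when the FIFO is full are all hypothetical.

```python
from collections import deque


class ModularWrapper:
    """Toy model of the wrapper data path between the NI and a slave core."""

    def __init__(self, core_write, buffered, depth=4):
        self.core_write = core_write               # callback that feeds the core
        self.fifo = deque() if buffered else None  # no FIFO in un-buffered mode
        self.depth = depth                         # hypothetical FIFO depth

    def push(self, word):
        """NI side: hand one word to the wrapper; False means 'stall'."""
        if self.fifo is None:
            self.core_write(word)      # un-buffered: stream straight through
            return True
        if len(self.fifo) >= self.depth:
            return False               # assumption: a full FIFO back-pressures
        self.fifo.append(word)
        return True

    def drain(self):
        """Core side: consume everything buffered so far; return the count."""
        count = 0
        while self.fifo:
            self.core_write(self.fifo.popleft())
            count += 1
        return count
```

Un-buffered mode suits a core that can accept a word on every transfer, while buffered mode decouples the network from a core that consumes data in its own time.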

Generic Network Interface: Modular Wrapper for a Memory
- Holds data shared among a number of PEs; a synchronous SRAM model is assumed
- Wrapper design
  - Core type is slave IP
  - There are no control signals for initialization or status monitoring
  - The data interface is realized in un-buffered mode, removing the FIFOs between the NI and the memory
- Programming sequence
  - The base address is set to the desired value using a SINGLE packet
  - Sending a BLOCK packet stores data into memory
  - A read operation is done by sending a SINGLE packet
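The three-step programming sequence for the memory can be traced with a toy model. The slide does not specify the read-request format or whether the base address advances after a store, so the `count` parameter and the fixed base address below are assumptions, as are all names.

```python
class MemoryWrapper:
    """Toy model of a memory node reached through the generic NI."""

    def __init__(self, size):
        self.sram = [0] * size
        self.base = 0

    def set_base(self, addr):
        """Effect of a SINGLE packet that sets the base address."""
        self.base = addr

    def store_block(self, words):
        """Effect of a BLOCK packet: store its data starting at the base."""
        for offset, word in enumerate(words):
            self.sram[self.base + offset] = word

    def read(self, count=1):
        """Effect of a SINGLE read request (count is an assumption)."""
        return self.sram[self.base:self.base + count]
```

A remote PE would thus set the base address once, then issue BLOCK stores or SINGLE reads relative to it.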

Generic Network Interface: Modular Wrapper for a Turbo Decoder
- Stand-alone turbo decoder that processes data block by block
- Wrapper design
  - Core type is slave IP
  - Six signals are used for initialization and mode selection
  - The data interface adopts buffered mode, inserting FIFOs between the NI and the core
- Programming sequence
  - Before turbo decoding starts, the core is initialized by sending a packet that accesses the input control signals
  - Data is sent to the core using a BLOCK packet
  - When decoding of one block is completed, the NI automatically starts to send a packet to the other node

Summary
- Introduced the Networked Processor Array (NePA)
- Proposed a network interface architecture for the OpenRISC core
- Classified the possible IP cores for processing elements
- Proposed a modular wrapper for embedded IP cores
  - An allocation table was used to configure the modular wrapper
- Presented a programming model
- Case studies with memory and turbo decoder cores demonstrated the feasibility and efficiency of the proposal