Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3
1,3 Research Scholar, School of Electronics, Devi Ahilya University, Indore, M. P., India
2 Professor, School of Electronics, Devi Ahilya University, Indore, M. P., India

Abstract
Several techniques for improving load balancing in NoCs exist in the literature, such as Regional Congestion Awareness (RCA) and similar schemes. Other techniques are based on output port selection, using metrics such as the count of free virtual channels, the count of fluid buffers, the buffer occupancy time at reachable downstream neighbors, and the flit flow history based algorithm named Tracker. Among these techniques, Tracker has performed significantly better than the others. However, Tracker has so far been simulated and verified only with a NoC simulation tool, and no hardware implementation of a flit flow based algorithm exists in the literature. The proposed work is new in that no hardware implementation of the Tracker architecture has been reported in the Network on Chip research literature. It implements an improved version of the flit flow based technique used by Tracker on programmable hardware (Xilinx Virtex-5 FPGA) and achieves a frequency of MHz, as validated by experimental synthesis results. The improvement over the existing architecture is the insertion of additional buffers in the Tracker internal logic to achieve a better area / performance trade-off for chip multiprocessors.

Keywords: NoC, Tracker, Adaptive routing, MOE, Load balancing routing, Virtual channel router, Verilog, Virtex FPGA.

I. INTRODUCTION
Over the last decade, optimizing the computational efficiency of intellectual property (IP) cores has been the preferred direction of technological innovation in Systems on Chip (SoCs).
At the same time, reliable and efficient communication needs more emphasis if SoCs are to achieve high performance and throughput. This is because wire delays are becoming comparable to gate delays as feature sizes continue to shrink [1]. Networks on chip are considered efficient replacements for other forms of interconnect in chip multiprocessors and system-on-chip designs [1, 3]. A Network on Chip (NoC) architecture combines topology, routing algorithms, switching, power optimization, etc. Among the different topologies, the mesh is considered the most competent.

In the 2D mesh topology, the processing cores are arranged as rectangular tiles. Each IP core is connected to a local router, and the routers are interconnected with one another to form the mesh. Communication among the IP cores takes place in units of packets, flits and phits at different levels of abstraction. Among the three switching techniques (store and forward, wormhole, and virtual cut-through), wormhole switching is mostly preferred because it uses less buffer space and therefore needs less area and power. In wormhole switching, each packet is subdivided into a sequence of flow control units called flits. The control information is carried by the header flit. The router of a 2D mesh topology contains five bidirectional ports: one for each of the directions east, west, north and south, and one for the local tile (IP core). Every input port of a router can optionally be associated with a set of flit buffers, termed virtual channels (VCs). The use of virtual channels is a trade-off between lower average network latency and the area and power cost of the added buffers [5].
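The subdivision of a packet into flits described above can be illustrated with a small behavioural model. This is a minimal sketch, not the paper's Verilog: the 8-bit flit width is assumed from the 8-bit data ports mentioned later, and the `Flit` type, the `packetize` function and the head/body/tail labels are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

FLIT_WIDTH_BITS = 8  # assumption: one byte per flit, matching the 8-bit data ports


@dataclass
class Flit:
    kind: str     # "head", "body" or "tail" (illustrative labels)
    payload: int  # value carried by the flit, 0..255


def packetize(dest_node: int, data: bytes) -> List[Flit]:
    """Split a packet into a head flit (control information) plus data flits."""
    if not 0 <= dest_node < 2 ** FLIT_WIDTH_BITS:
        raise ValueError("destination does not fit in one flit")
    # The head flit carries the control information, here just the destination.
    flits = [Flit("head", dest_node)]
    for i, byte in enumerate(data):
        # The last data flit is marked as the tail of the packet.
        kind = "tail" if i == len(data) - 1 else "body"
        flits.append(Flit(kind, byte))
    return flits


flits = packetize(15, b"\x01\x02\x03")
kinds = [f.kind for f in flits]
```

Only the head flit needs to be inspected by the routing logic; the remaining flits follow the path it reserves, which is why wormhole switching needs little buffering per router.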
The available buffer space information is carried among the routers by control signals. Flits belonging to several VCs within the same input port arbitrate among themselves; the successful flits from the various input ports then undergo switch arbitration and allocation, and are finally forwarded through the crossbar switch to their respective output ports [2]. An NoC architecture typically chooses among deterministic, oblivious and adaptive routing algorithms to determine the route taken by a packet to its destination. Although adaptive routing is more complex to implement, it is still preferred because it offers better fault tolerance, higher network throughput and lower network latency than oblivious policies under non-uniform or bursty traffic. Despite these advantages, the performance of adaptive routing degrades when routing decisions are based only on local network load, which disturbs load balancing within the NoC.

This may introduce undesired hot spots. Adaptive routers choose the best of the available routes for an incoming packet using selection metrics such as the dynamically varying network congestion status. Buffers together with virtual channels are used to keep packets flowing through the routers. At the same time, handshaking mechanisms based on a set of control signals keep information synchronized among the routers. Computational units such as IP cores thus communicate efficiently with one another by producing and processing data packets and control signals through the NoC infrastructure. Adaptive routing chooses the best route for incoming packets from a set of available routes by using a proper selection metric that captures the dynamically varying congestion status [3]. Recent Network on Chip architectures generally need to satisfy requirements such as low latency, load balancing and deadlock-free routing as far as possible in order to keep the routing implementation optimized. At higher packet injection rates, the capability to handle the packet flow among neighboring routers and the number of allocated virtual channel buffers pose the biggest limitation. A typical adaptive router is adaptive to the extent that it chooses the best possible output port for an incoming flit. For this, the output port selection function chooses one of the output ports for the flit using an appropriate metric that accounts for congestion [6]. The metric chosen to represent congestion determines the selection strategy. Link utilization of the network can be improved by balancing the traffic across all the links.
The flit flow history based analysis proposed by Tracker [2] was the obvious choice over existing metrics such as the availability of free virtual channels [5, 6], buffer fluidity values [10] and buffer occupancy values [11]. Routing decisions are taken in such a way that less frequently used links are preferred.

II. RELATED WORK
The Tracker algorithm uses minimal odd-even (MOE) routing, which is one of the simplest and most commonly used deadlock-free adaptive routing algorithms in mesh NoCs [4]. The MOE algorithm makes a random selection among the available ports. Selection functions have to work on top of MOE to implement the routing function. The use of load information from neighboring switches for channel selection is described in [7]. A load balancing routing scheme based on random channel selection is proposed in [9]. Based on the past flow pattern, the authors of [8] estimate the network's congestion level and deterministically calculate optimized routing paths for all traffic flows. The count of free virtual channels in the adjacent downstream routers is also explored as a selection metric [5, 12]. The free VC status of reachable neighbors of the routers adjacent to the current node is investigated in [6]. The count of fluid buffers is explored in [10]. The history of buffer occupancy within a realistic time interval is discussed in [11]. The Tracker architecture includes a virtual channel router [3] that monitors the flow of flits through all its outgoing ports and exchanges this flit flow information with its neighbors. The flit flow information is computed as a Cumulative Flit Count (CFC), which indicates the contention level of an output port of a neighboring router. The architecture of the selection logic implemented in the Tracker router is shown in Fig. 1.

Fig. 1: Internal architecture of internal logic in Tracker [2]

Flit forwarding in a Tracker router is illustrated in Fig. 2. Flit F is sourced from node 4 and is destined to node 15.
It first reaches node 5. For flit F, the MOE routing function [3] chooses the east port (link to node 6) and the north port (link to node 9) as the possible output ports. At the same time, the flip-flop values from node 9 and node 6 reach node 5.
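The decision at node 5 can be sketched as a two-stage behavioural model: a routing function that lists the admissible output ports, followed by a selection function that consults the neighbors' flit counts. This is a minimal sketch, not the Tracker RTL. The routing function is reconstructed from the commonly cited minimal odd-even routing procedure. The node numbering (row by row, with north meaning +4) is inferred from the example, and the rule that the port with the lower CFC wins, as well as the CFC values themselves, are assumptions for illustration.

```python
from typing import Dict, List

WIDTH = 4  # 4 x 4 mesh; assumed numbering: node = row * WIDTH + column


def moe_ports(cur: int, src: int, dst: int) -> List[str]:
    """Admissible output ports under minimal odd-even routing (sketch)."""
    cx, cy = cur % WIDTH, cur // WIDTH
    sx = src % WIDTH
    dx, dy = dst % WIDTH, dst // WIDTH
    e0, e1 = dx - cx, dy - cy          # remaining hops in x and y
    if e0 == 0 and e1 == 0:
        return ["local"]
    vertical = "north" if e1 > 0 else "south"
    ports: List[str] = []
    if e0 == 0:
        ports.append(vertical)
    elif e0 > 0:                        # eastbound
        if e1 == 0:
            ports.append("east")
        else:
            # Turning away from east is only allowed in odd columns
            # (or in the source column, where no turn is taken).
            if cx % 2 == 1 or cx == sx:
                ports.append(vertical)
            # Going east is allowed unless it would force a turn
            # in an even destination column.
            if dx % 2 == 1 or e0 != 1:
                ports.append("east")
    else:                               # westbound
        ports.append("west")
        if e1 != 0 and cx % 2 == 0:
            ports.append(vertical)
    return ports


def select_port(candidates: List[str], neighbour_cfc: Dict[str, int]) -> str:
    """Assumed selection rule: prefer the port whose neighbor reports the lowest CFC."""
    return min(candidates, key=lambda port: neighbour_cfc[port])


candidates = moe_ports(cur=5, src=4, dst=15)             # ports offered at node 5
chosen = select_port(candidates, {"north": 3, "east": 7})  # made-up CFC values
```

With these made-up counts the flit leaves node 5 through the north port, towards node 9, because that link has carried fewer flits.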

Fig. 2: Router Architecture Network Model

III. PROPOSED IMPLEMENTATION
The proposed architecture of the improved Tracker design is shown in Fig. 3. Additional buffers are inserted in the existing Tracker internal logic to achieve a better area / performance trade-off for chip multiprocessors. The router network architecture model (Fig. 2) designed in [2] is implemented using the improved internal architecture. Sixteen nodes are configured as a 4 x 4 mesh network model, and the behavior of the Tracker node within the network is tested. Each node is called a test_node, and an array of 16 such nodes connected together forms a typical NoC architecture. The modified router architecture network model is written in Verilog HDL as a module named router_top. It has the following groups of input and output signals.

Fig. 3: Improved internal architecture of internal logic of tracker

Inputs:
a) clk, and reset of 16 bits.
b) west_data0_in, south_data0_in, south_data1_in, south_data2_in, south_data3_in, east_data3_in, east_data7_in, east_data11_in, east_data15_in, north_data15_in, north_data14_in, north_data13_in, north_data12_in, west_data12_in, west_data8_in, west_data4_in of 8 bits.
c) req0_w, req4_w, req8_w, req12_w, req12_n, req13_n, req14_n, req15_n, req15_e, req11_e, req7_e, req3_e, req0_s, req1_s, req2_s, req3_s of 1 bit.
d) busy0, busy1, busy2, busy3, busy4, busy5, busy6, busy7, busy8, busy9, busy10, busy11, busy12, busy13, busy14, busy15 of 4 bits.

Outputs:
west_0_out, south_0_out, south_1_out, south_2_out, south_3_out, east_3_out, east_7_out, east_11_out, east_15_out, north_15_out, north_14_out, north_13_out, north_12_out, west_12_out, west_8_out, west_4_out of 8 bits.

Among the inputs, group a) contains clk and a 16-bit reset, one reset bit for each test_node. Group b) consists of the 8-bit data inputs on the outer edges of the 4 x 4 mesh network model, named after the test_nodes (0 to 15) they belong to. Group c) is a collection of control signals: the request signals of neighboring nodes for data transfer. Each request signal is 1 bit wide. Group d) contains the status of the output ports of each test_node in the form of a busy signal.

Each test_node is associated with a 4-bit busy signal corresponding to the directions north, south, east and west. A busy bit is driven to 0 when the Tracker algorithm working within the test_node validates that the output path in the corresponding direction is available. Among the outputs, the ports of the outermost test_nodes are listed, each having 8-bit data ports.

Test_Node: The test node is shown in Fig. 4. It is the basic building block of the NoC architecture network model. It holds the information related to, and required by, all the neighbors surrounding it. A typical test_node has four input data ports of 8 bits each, named north_data, south_data, east_data and west_data, corresponding to the directions north, south, east and west respectively, along with the associated buffers. It also has a 4-bit request input coming from the upstream nodes. The 4-bit nsew_busy signal gives the desired direction of propagation of the flit as proposed by the MOE algorithm running within the test_node. The test_node has four output data ports of 8 bits each, named north_out, south_out, east_out and west_out, corresponding to the directions north, south, east and west respectively. To forward requests downstream there is a 4-bit signal o_nsew_req, and acknowledgements in response to requests are sent on the 4-bit nsew_ack signal. One path is selected and configured with exactly one receiving side and one output side. For example, if node 5 needs to receive data on the west side and output it to the north, the west request bit must be 1 and the north busy bit must be 0.

Fig. 4: Test_node

Inputs: north_data, south_data, east_data and west_data are 8 bits wide; nsew_req and nsew_busy are 4 bits wide.
Outputs: north_out, south_out, east_out and west_out are 8 bits wide; o_nsew_req and nsew_ack are 4 bits wide.
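The request/busy condition in the node 5 example can be written as a small behavioural check. This is a minimal sketch under an assumption the text does not state: the bit order inside the 4-bit nsew_req and nsew_busy fields is taken to be north, south, east, west from bit 3 down to bit 0. The helper names `bit_of` and `can_forward` are hypothetical.

```python
# Assumed bit positions inside the 4-bit nsew_* fields (not given in the text).
BIT = {"north": 3, "south": 2, "east": 1, "west": 0}


def bit_of(field: int, direction: str) -> int:
    """Extract the bit for one direction from a 4-bit nsew_* field."""
    return (field >> BIT[direction]) & 1


def can_forward(nsew_req: int, nsew_busy: int, in_dir: str, out_dir: str) -> bool:
    """A flit can pass when the input side requests and the output side is not busy."""
    return bit_of(nsew_req, in_dir) == 1 and bit_of(nsew_busy, out_dir) == 0


# Node 5 example: request arrives from the west, north output must be free.
west_to_north_ok = can_forward(0b0001, 0b0111, "west", "north")
west_to_north_blocked = can_forward(0b0001, 0b1000, "west", "north")
```

In the first call only the north busy bit is 0, so the west-to-north path is usable; in the second the north output is busy and the flit has to wait.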
According to the top-level bit configuration, each node can receive and send a value.

Test_wrapper: The 4 x 4 network model above is driven with data inputs, clock and resets, and its data outputs are observed, through a top-level module named test_wrapper, which encloses the router_top module. It has clk, a 16-bit reset, an 8-bit data_in and an 8-bit data_out. It is shown in Fig. 5.

Fig. 5: Test_wrapper

The router_top module is instantiated within test_wrapper. Data is sent in from the west direction of test_node 4 and routed according to the minimal odd-even (MOE) routing algorithm, applying the improved Tracker algorithm of the architecture shown in Fig. 3. The neighboring nodes are examined and their Cumulative Flit Count (CFC) values are observed on the way to test_node 15. The code is tested for one path (the configuration bits are generated according to the MOE algorithm). As depicted by the dark arrows in Fig. 2, the data enters at the west port of test_node 4, passes to test_node 5, then north to test_node 9, east to test_node 10, east to test_node 11 and finally north to test_node 15. At node 5 the data also has the option of continuing east, but because of the turn model restrictions of the MOE algorithm it could not then turn north at node 6 towards node 10, so it follows the most conveniently available path north to node 9. The test_wrapper is functionally verified with a test bench. After an initial reset during the first clock cycle, the desired response appears in the simulation result shown in Fig. 6. The test_wrapper code has been synthesized on a Xilinx Virtex-5 FPGA, and the resulting maximum clock frequency of MHz is reported in the results section.
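The turn restriction invoked above can be checked mechanically. The sketch below tests a node sequence against the two odd-even turn rules (no east-to-north or east-to-south turn in an even column, no north-to-west or south-to-west turn in an odd column). It is an illustration rather than the authors' test bench; the row-by-row node numbering with column 0 on the west edge is an assumption inferred from the described path.

```python
from typing import List, Tuple

WIDTH = 4  # 4 x 4 mesh; assumed numbering: node = row * WIDTH + column

MOVES = {(1, 0): "E", (-1, 0): "W", (0, 1): "N", (0, -1): "S"}


def direction(a: int, b: int) -> str:
    """Direction of the hop a -> b; raises KeyError if the nodes are not neighbors."""
    ax, ay = a % WIDTH, a // WIDTH
    bx, by = b % WIDTH, b // WIDTH
    return MOVES[(bx - ax, by - ay)]


def illegal_turns(path: List[int]) -> List[Tuple[int, str]]:
    """Return (node, turn) pairs that violate the odd-even turn rules."""
    bad = []
    for prev, cur, nxt in zip(path, path[1:], path[2:]):
        turn = direction(prev, cur) + direction(cur, nxt)
        even_column = (cur % WIDTH) % 2 == 0
        if turn in ("EN", "ES") and even_column:
            bad.append((cur, turn))
        elif turn in ("NW", "SW") and not even_column:
            bad.append((cur, turn))
    return bad


taken_path = illegal_turns([4, 5, 9, 10, 11, 15])      # path described in the text
via_node_6 = illegal_turns([4, 5, 6, 10, 11, 15])      # detour through node 6
```

The path through nodes 5, 9, 10 and 11 has no forbidden turn, while the alternative that continues east to node 6 and then turns north is rejected at node 6, which sits in an even column.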

IV. RESULTS AND DISCUSSIONS
The simulation result is shown in Fig. 6. According to the simulation, the data applied at the west input of source test_node 4 took around 23 clock cycles to reach the east port of destination test_node 15. Upon synthesis on a Xilinx Virtex-5 (XC5VLX50T-2FF1136), a maximum clock frequency of MHz is obtained.

Fig. 6: Functional Simulation of test_wrapper

Fig. 7: Synthesized top level block Test wrapper

V. CONCLUSION
Some techniques are based on output port selection, using metrics such as the count of free virtual channels, the count of fluid buffers, the buffer occupancy time at reachable downstream neighbors, and the flit flow history based algorithm named Tracker. Among these techniques, Tracker has performed significantly better than all the others. A hardware implementation of an improved adaptive NoC router with a flit flow history based load balancing selection strategy has been proposed. The proposed technique is an improved version of the Tracker architecture that inserts additional buffers in the existing Tracker internal logic to achieve a better area / performance trade-off for chip multiprocessors. The proposed work implements the improved flit flow history based technique used by Tracker on programmable hardware (Xilinx Virtex-5 FPGA) using Verilog HDL and achieves a frequency of MHz, as validated by experimental synthesis results.

REFERENCES
[1] W. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in DAC.
[2] John Jose, K.V. Mahathi, J. Shiva Shankar and Madhu Mutyam, "TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy," IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 5-8, 2012, San Jose, California, USA.
[3] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers Inc., USA.
[4] G. M. Chiu, "The odd-even turn model for adaptive routing," IEEE TPDS, vol. 11, no. 7.
[5] W. Dally, "Virtual-channel flow control," IEEE TPDS, vol. 3, no. 2.
[6] G. Ascia, et al., "Implementation and analysis of a new selection strategy for adaptive routing in NoC," IEEE TOC, vol. 57, no. 6.
[7] E. Nilsson, et al., "Load distribution with the proximity congestion awareness in a network-on-chip," in DATE.
[8] A. E. Kiasari, et al., "A framework for designing congestion-aware deterministic routing," in NoCArc.
[9] M. H. Cho, et al., "Path-based, randomized, oblivious, minimal routing," in NoCArc.
[10] Y. C. Lan, et al., "Fluidity concept for NoC: A congestion avoidance and relief routing scheme," in SoC.
[11] J. Jose, et al., "BOFAR: Buffer occupancy factor based adaptive router for mesh NoCs," in NoCArc.
[12] J. Kim, et al., "A low latency router supporting adaptivity for on-chip interconnects," in DAC.


Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

FPGA Implementation of IP Packet Segmentation and Reassembly in Internet Router*

FPGA Implementation of IP Packet Segmentation and Reassembly in Internet Router* SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 6, No. 3, December 2009, 399-407 UDK: 004.738.5.057.4 FPGA Implementation of IP Packet Segmentation and Reassembly in Internet Router* Marko Carević 1,a,

More information

FPGA Implementation of an Advanced Traffic Light Controller using Verilog HDL

FPGA Implementation of an Advanced Traffic Light Controller using Verilog HDL FPGA Implementation of an Advanced Traffic Light Controller using Verilog HDL B. Dilip, Y. Alekhya, P. Divya Bharathi Abstract Traffic lights are the signaling devices used to manage traffic on multi-way

More information

40G MACsec Encryption in an FPGA

40G MACsec Encryption in an FPGA 40G MACsec Encryption in an FPGA Dr Tom Kean, Managing Director, Algotronix Ltd, 130-10 Calton Road, Edinburgh EH8 8JQ United Kingdom Tel: +44 131 556 9242 Email: [email protected] February 2012 1 MACsec

More information

ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7

ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7 ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7 13.7 A 40Gb/s Clock and Data Recovery Circuit in 0.18µm CMOS Technology Jri Lee, Behzad Razavi University of California, Los Angeles, CA

More information

Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks

Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks Guru P.V. Thodime, Vinod M. Vokkarane, and Jason P. Jue The University of Texas at Dallas, Richardson, TX 75083-0688 vgt015000,

More information

SOCWIRE: A SPACEWIRE INSPIRED FAULT TOLERANT NETWORK-ON-CHIP FOR RECONFIGURABLE SYSTEM-ON-CHIP DESIGNS

SOCWIRE: A SPACEWIRE INSPIRED FAULT TOLERANT NETWORK-ON-CHIP FOR RECONFIGURABLE SYSTEM-ON-CHIP DESIGNS SOCWIRE: A SPACEWIRE INSPIRED FAULT TOLERANT NETWORK-ON-CHIP FOR RECONFIGURABLE SYSTEM-ON-CHIP DESIGNS IN SPACE APPLICATIONS Session: Networks and Protocols Long Paper B. Osterloh, H. Michalik, B. Fiethe

More information

Towards a Design Space Exploration Methodology for System-on-Chip

Towards a Design Space Exploration Methodology for System-on-Chip BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 14, No 1 Sofia 2014 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2014-0008 Towards a Design Space Exploration

More information

Token-ring local area network management

Token-ring local area network management Token-ring local area network management by BARBARA J. DON CARLOS IBM Corporation Research Triangle Park, North Carolina ABSTRACT This paper describes an architecture for managing a token-ring local area

More information

G.Vijaya kumar et al, Int. J. Comp. Tech. Appl., Vol 2 (5), 1413-1418

G.Vijaya kumar et al, Int. J. Comp. Tech. Appl., Vol 2 (5), 1413-1418 An Analytical Model to evaluate the Approaches of Mobility Management 1 G.Vijaya Kumar, *2 A.Lakshman Rao *1 M.Tech (CSE Student), Pragati Engineering College, Kakinada, India. [email protected]

More information

Optimizing Configuration and Application Mapping for MPSoC Architectures

Optimizing Configuration and Application Mapping for MPSoC Architectures Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : [email protected] 1 Multi-Processor Systems on Chip (MPSoC) Design Trends

More information

The proliferation of the raw processing

The proliferation of the raw processing TECHNOLOGY CONNECTED Advances with System Area Network Speeds Data Transfer between Servers with A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer

More information

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip www.ijcsi.org 241 A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip Ahmed A. El Badry 1 and Mohamed A. Abd El Ghany 2 1 Communications Engineering Dept., German University in Cairo,

More information

Chapter 2. Multiprocessors Interconnection Networks

Chapter 2. Multiprocessors Interconnection Networks Chapter 2 Multiprocessors Interconnection Networks 2.1 Taxonomy Interconnection Network Static Dynamic 1-D 2-D HC Bus-based Switch-based Single Multiple SS MS Crossbar 2.2 Bus-Based Dynamic Single Bus

More information

NIOS II Based Embedded Web Server Development for Networking Applications

NIOS II Based Embedded Web Server Development for Networking Applications NIOS II Based Embedded Web Server Development for Networking Applications 1 Sheetal Bhoyar, 2 Dr. D. V. Padole 1 Research Scholar, G. H. Raisoni College of Engineering, Nagpur, India 2 Professor, G. H.

More information

Ring Local Area Network. Ring LANs

Ring Local Area Network. Ring LANs Ring Local Area Network Ring interface (1-bit buffer) Ring interface To station From station Ring LANs The ring is a series of bit repeaters, each connected by a unidirectional transmission link All arriving

More information

LogiCORE IP AXI Performance Monitor v2.00.a

LogiCORE IP AXI Performance Monitor v2.00.a LogiCORE IP AXI Performance Monitor v2.00.a Product Guide Table of Contents IP Facts Chapter 1: Overview Target Technology................................................................. 9 Applications......................................................................

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394

More information

OpenSoC Fabric: On-Chip Network Generator

OpenSoC Fabric: On-Chip Network Generator OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.

More information

In-network Monitoring and Control Policy for DVFS of CMP Networkson-Chip and Last Level Caches

In-network Monitoring and Control Policy for DVFS of CMP Networkson-Chip and Last Level Caches In-network Monitoring and Control Policy for DVFS of CMP Networkson-Chip and Last Level Caches Xi Chen 1, Zheng Xu 1, Hyungjun Kim 1, Paul V. Gratz 1, Jiang Hu 1, Michael Kishinevsky 2 and Umit Ogras 2

More information

QoSIP: A QoS Aware IP Routing Protocol for Multimedia Data

QoSIP: A QoS Aware IP Routing Protocol for Multimedia Data QoSIP: A QoS Aware IP Routing Protocol for Multimedia Data Md. Golam Shagadul Amin Talukder and Al-Mukaddim Khan Pathan* Department of Computer Science and Engineering, Metropolitan University, Sylhet,

More information

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN USING DIFFERENT FOUNDRIES Priyanka Sharma 1 and Rajesh Mehra 2 1 ME student, Department of E.C.E, NITTTR, Chandigarh, India 2 Associate Professor, Department

More information

Computer Networking Networks

Computer Networking Networks Page 1 of 8 Computer Networking Networks 9.1 Local area network A local area network (LAN) is a network that connects computers and devices in a limited geographical area such as a home, school, office

More information

packet retransmitting based on dynamic route table technology, as shown in fig. 2 and 3.

packet retransmitting based on dynamic route table technology, as shown in fig. 2 and 3. Implementation of an Emulation Environment for Large Scale Network Security Experiments Cui Yimin, Liu Li, Jin Qi, Kuang Xiaohui National Key Laboratory of Science and Technology on Information System

More information

On-Chip Communications Network Report

On-Chip Communications Network Report On-Chip Communications Network Report ABSTRACT This report covers the results of an independent, blind worldwide survey covering on-chip communications networks (OCCN), defined as is the entire interconnect

More information

A Fast Path Recovery Mechanism for MPLS Networks

A Fast Path Recovery Mechanism for MPLS Networks A Fast Path Recovery Mechanism for MPLS Networks Jenhui Chen, Chung-Ching Chiou, and Shih-Lin Wu Department of Computer Science and Information Engineering Chang Gung University, Taoyuan, Taiwan, R.O.C.

More information

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility

More information

Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow

Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow International Journal of Soft Computing and Engineering (IJSCE) Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow Abdullah Al Masud, Hossain Md. Shamim, Amina Akhter

More information

RSVP- A Fault Tolerant Mechanism in MPLS Networks

RSVP- A Fault Tolerant Mechanism in MPLS Networks RSVP- A Fault Tolerant Mechanism in MPLS Networks S.Ravi Kumar, M.Tech(NN) Assistant Professor Gokul Institute of Technology And Sciences Piridi, Bobbili, Vizianagaram, Andhrapradesh. Abstract: The data

More information

Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects*

Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects* 15th IEEE Symposium on High-Performance Interconnects Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects* Dongkook Park, Reetuparna Das, Chrysostomos Nicopoulos, Jongman

More information

Customer Specific Wireless Network Solutions Based on Standard IEEE 802.15.4

Customer Specific Wireless Network Solutions Based on Standard IEEE 802.15.4 Customer Specific Wireless Network Solutions Based on Standard IEEE 802.15.4 Michael Binhack, sentec Elektronik GmbH, Werner-von-Siemens-Str. 6, 98693 Ilmenau, Germany Gerald Kupris, Freescale Semiconductor

More information

Interconnection Generation for System-on-Chip Design and Design Space Exploration

Interconnection Generation for System-on-Chip Design and Design Space Exploration Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. G. Fettweis Interconnection Generation for System-on-Chip Design and Design Space Exploration Dipl.-Ing. Markus Winter Vodafone Chair for Mobile

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 Load Balancing Heterogeneous Request in DHT-based P2P Systems Mrs. Yogita A. Dalvi Dr. R. Shankar Mr. Atesh

More information

Opnet Based simulation for route redistribution in EIGRP, BGP and OSPF network protocols

Opnet Based simulation for route redistribution in EIGRP, BGP and OSPF network protocols IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 9, Issue 1, Ver. IV (Jan. 2014), PP 47-52 Opnet Based simulation for route redistribution

More information

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E) Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers

More information

9/14/2011 14.9.2011 8:38

9/14/2011 14.9.2011 8:38 Algorithms and Implementation Platforms for Wireless Communications TLT-9706/ TKT-9636 (Seminar Course) BASICS OF FIELD PROGRAMMABLE GATE ARRAYS Waqar Hussain [email protected] Department of Computer

More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

More information

Local Area Network By Bhupendra Ratha, Lecturer School of Library and Information Science Devi Ahilya University, Indore Email: [email protected] Local Area Network LANs connect computers and peripheral

More information

A Framework for Automatic Generation of Configuration Files for a Custom Hardware/Software RTOS

A Framework for Automatic Generation of Configuration Files for a Custom Hardware/Software RTOS A Framework for Automatic Generation of Configuration Files for a Custom Hardware/Software Jaehwan Lee, Kyeong Keol Ryu and Vincent John Mooney III School of Electrical and Computer Engineering Georgia

More information

FPGA Clocking. Clock related issues: distribution generation (frequency synthesis) multiplexing run time programming domain crossing

FPGA Clocking. Clock related issues: distribution generation (frequency synthesis) multiplexing run time programming domain crossing FPGA Clocking Clock related issues: distribution generation (frequency synthesis) Deskew multiplexing run time programming domain crossing Clock related constraints 100 Clock Distribution Device split

More information