Scalable and Fault-tolerant Network-on-Chip Design Using the Quartered Recursive Diagonal Torus Topology



Similar documents
A Holistic Method for Selecting Web Services in Design of Composite Applications

Sebastián Bravo López

Performance Analysis of IEEE in Multi-hop Wireless Networks

Open and Extensible Business Process Simulator

Hierarchical Clustering and Sampling Techniques for Network Monitoring

Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes

Granular Problem Solving and Software Engineering

Chapter 1 Microeconomics of Consumer Theory

Channel Assignment Strategies for Cellular Phone Systems

Neural network-based Load Balancing and Reactive Power Control by Static VAR Compensator

Parametric model of IP-networks in the form of colored Petri net

Static Fairness Criteria in Telecommunications

A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs

Research on Virtual Vehicle Driven with Vision

A Keyword Filters Method for Spam via Maximum Independent Sets

Computational Analysis of Two Arrangements of a Central Ground-Source Heat Pump System for Residential Buildings

FIRE DETECTION USING AUTONOMOUS AERIAL VEHICLES WITH INFRARED AND VISUAL CAMERAS. J. Ramiro Martínez-de Dios, Luis Merino and Aníbal Ollero

REDUCTION FACTOR OF FEEDING LINES THAT HAVE A CABLE AND AN OVERHEAD SECTION

An Enhanced Critical Path Method for Multiple Resource Constraints

Recovering Articulated Motion with a Hierarchical Factorization Method

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 3, MAY/JUNE

Weighting Methods in Survey Sampling

Agent-Based Grid Load Balancing Using Performance-Driven Task Scheduling

Programming Basics - FORTRAN 77

A Three-Hybrid Treatment Method of the Compressor's Characteristic Line in Performance Prediction of Power Systems

How To Fator

OpenSession: SDN-based Cross-layer Multi-stream Management Protocol for 3D Teleimmersion

A novel active mass damper for vibration control of bridges

Computer Networks Framing

An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism

Improved SOM-Based High-Dimensional Data Visualization Algorithm

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules

Chapter 5 Single Phase Systems

In order to be able to design beams, we need both moments and shears. 1. Moment a) From direct design method or equivalent frame method

Implementation of PIC Based LED Displays

10.1 The Lorentz force law


Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System

Microcontroller Based PWM Controlled Four Switch Three Phase Inverter Fed Induction Motor Drive

A Context-Aware Preference Database System

' R ATIONAL. :::~i:. :'.:::::: RETENTION ':: Compliance with the way you work PRODUCT BRIEF

From a strategic view to an engineering view in a digital enterprise

Electrician'sMathand BasicElectricalFormulas

CIS570 Lecture 4 Introduction to Data-flow Analysis 3

Bandwidth Allocation and Session Scheduling using SIP

AngelCast: Cloud-based Peer-Assisted Live Streaming Using Optimized Multi-Tree Construction

MATE: MPLS Adaptive Traffic Engineering

Optimal Online Buffer Scheduling for Block Devices *

Wireless Transmission Systems. Instructor

Intelligent Measurement Processes in 3D Optical Metrology: Producing More Accurate Point Clouds

WORKFLOW CONTROL-FLOW PATTERNS A Revised View

Srinivas Bollapragada GE Global Research Center. Abstract

) ( )( ) ( ) ( )( ) ( ) ( ) (1)

Unit 12: Installing, Configuring and Administering Microsoft Server

Impedance Method for Leak Detection in Zigzag Pipelines

1.3 Complex Numbers; Quadratic Equations in the Complex Number System*

Capacity at Unsignalized Two-Stage Priority Intersections

Robust Classification and Tracking of Vehicles in Traffic Video Streams

VOLTAGE CONTROL WITH SHUNT CAPACITANCE ON RADIAL DISTRIBUTION LINE WITH HIGH R/X FACTOR. A Thesis by. Hong-Tuan Nguyen Vu

Soft-Edge Flip-flops for Improved Timing Yield: Design and Optimization

Henley Business School at Univ of Reading. Pre-Experience Postgraduate Programmes Chartered Institute of Personnel and Development (CIPD)

Classical Electromagnetic Doppler Effect Redefined. Copyright 2014 Joseph A. Rybczyk

SLA-based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments

protection p1ann1ng report

Software Ecosystems: From Software Product Management to Software Platform Management

Deduplication with Block-Level Content-Aware Chunking for Solid State Drives (SSDs)

Online Energy Generation Scheduling for Microgrids with Intermittent Energy Sources and Co-Generation

Trade Information, Not Spectrum: A Novel TV White Space Information Market Model

An integrated optimization model of a Closed- Loop Supply Chain under uncertainty

OpenScape 4000 CSTA V7 Connectivity Adapter - CSTA III, Part 2, Version 4.1. Developer s Guide A31003-G9310-I D1

Social Network Analysis Based on BSP Clustering Algorithm

Interpretable Fuzzy Modeling using Multi-Objective Immune- Inspired Optimization Algorithms

Learning Curves and Stochastic Models for Pricing and Provisioning Cloud Computing Services

5.2 The Master Theorem

Green Cloud Computing

Big Data Analysis and Reporting with Decision Tree Induction

Strategic Plan. Achieving our 2020 vision. Faculty of Health Sciences

Supply chain coordination; A Game Theory approach

THE PERFORMANCE OF TRANSIT TIME FLOWMETERS IN HEATED GAS MIXTURES

Deadline-based Escalation in Process-Aware Information Systems

INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS

WATER CLOSET SUPPORTS TECHNICAL DATA

Capacity of Multi-Channel Wireless Networks. Random (c, f) Assignment

Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising

A Game Theoretical Approach to Gateway Selections in Multi-domain Wireless Networks

Chapter 1: Introduction

Earthquake Loss for Reinforced Concrete Building Structures Before and After Fire damage

IT Essentials II: Network Operating Systems

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Scalable TCP: Improving Performance in Highspeed Wide Area Networks

Impact Simulation of Extreme Wind Generated Missiles on Radioactive Waste Storage Facilities

TECHNOLOGY-ENHANCED LEARNING FOR MUSIC WITH I-MAESTRO FRAMEWORK AND TOOLS

THERMAL TO MECHANICAL ENERGY CONVERSION: ENGINES AND REQUIREMENTS Vol. I - Thermodynamic Cycles of Reciprocating and Rotary Engines - R.S.

Equalization of Fiber Optic Channels

To the Graduate Council:

Transformation of Tracking Error in Parallel Kinematic Machining. Universität Bremen

INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS

AUDITING COST OVERRUN CLAIMS *

INTELLIGENCE IN SWITCHED AND PACKET NETWORKS

Discovering Trends in Large Datasets Using Neural Networks

Transcription:

alable and Fault-tolerant etwork-on-chip esign Using the Quartered Reursive iagonal Torus Topolog Xianfang Tan 1, Lei Zhang 1, hankar eelkrishnan 1, Mei Yang 1, Yingtao Jiang 1, Yulu Yang 1 epartment of Eletrial and Computer Engineering, Universit of evada, Las Vegas, V 89154 College of Information Tehnolog and iene, ankai Universit, Tianjin, China Emails: 1 {tan, zhangl6}@unlvnevadaedu, sankier@gmailom, {meiang, ingtao}@egrunlvedu ABTRACT etwork-on-a-hip (oc) is an effetive approah to onnet and manage the ommuniation between the variet of design elements and intelletual propert bloks required in large and omple sstem-on-hips In this paper, we propose a new oc arhiteture, referred as the Quartered Reursive iagonal Torus (QRT), whih is onstruted b overlaing diagonal torus ue to its small diameter and rih routing reourses, QRT is determined to be well suitable to onstrut highl salable ocs In QRT, data pakets an be routed through a proposed minimal routing algorithm based on the Johnson odes that have traditionall been used in finite state mahine designs It has been shown that this proposed routing algorithm with minor modifiations is apable of handling the single link/node failure The hardware ost of the proposed QRT arhiteture and its assoiated routing algorithm is revealed b designing two QRT routers whih have been snthesized using TMC 18μm CMO tehnolog Categories and ubjet esriptors: C1 Multiple ata tream Arhitetures (Multiproessors), Interonnetion arhitetures (eg, ommon bus, multiport memor, rossbar swith) General Terms: Algorithms, esign Kewords oc, QRT, routing algorithm, router 1 ITROUCTIO The development of VLI tehnolog has ontinuousl driven the inrease of on-hip apait The International Tehnolog Roadmap for emiondutors (ITR) [7] predits that stem-on- Chips (ocs) will grow to multi-billion transistors in net a few ears One of the major hallenges in designing suh highl integrated ocs will be to find an effetive wa to integrate multiple pre-designed Intelletual Propert (IP) ores while addressing pressing power and performane onerns [1] With a ommuniation entri design stle, etworks-on-chips (ocs) have emerged as a new oc paradigm [1] to overome the limitations of bus-based ommuniation infrastruture In general, a paket-based oc onsists of routers, the network interfae between the router and the IP, and the interonnetion network The major hallenges imposed upon oc designs inlude salabilit, energ effiien, and reonfigurabilit [] umerous This work is in part supported b F under grant no ECC-7168 Permission to make digital or hard opies of all or part of this work for personal or lassroom use is granted without fee provided that opies are not made or distributed for profit or ommerial advantage and that opies bear this notie and the full itation on the first page To op otherwise, or republish, to post on servers or to redistribute to lists, requires prior speifi permission and/or a fee GLVLI 8, Ma 4 6, 8, Orlando, Florida, UA Copright 8 ACM 978-1-5959-999-9/8/5$5 interonnetion network topologies have been onsidered for ocs, inluding mesh [9], torus [8], fat tree [5], honeomb [6], and quite a few others [] However, these interonnetion topologies either sale poorl or have no support for reonfigurabilit To address these two problems, a lass of topologies named Reursive iagonal Torus (RT) [1] has made its wa to oc designs [15] An RT struture is onstruted b reursivel overlaing - diagonal torus [1], and it has the following features: reursive struture with onstant node degree, smaller diameter and average distane, and embedded mesh/torus topolog Consider a torus network omposed of nodes, where = nk and both n and k are natural numbers The rank- torus is formed b reating a link between node (, ) and eah of its neighboring four nodes: (mod(±1, ), ) and (, mod(±1, )) On top of rank- torus, rank- 1 torus is formed b adding one link between node (, ) and eah of the four nodes (mod(±n, ), mod(±n, )) it needs to onnet to The diretion of the rank-1torus is at an angle of 45 degrees to the rank- torus On rank-1 torus, rank- torus an be formed b adding 4 links to a node in the same manner In a more general sense, a rank-(r+1) torus is formed upon rank-r torus In this paper, a speial tpe of RT struture, Quartered RT (QRT) will be presented The QRT arhiteture ehibits a number of desirable networking properties that make it partiularl appreiated in building large on-hip miro-networks B indeing the network nodes with the Johnson odes, a new minimal routing algorithm, named Johnson Coded Vetor Routing (JCVR), is proposed for QRT We further show that the JCVR algorithm with minor modifiations is apable of routing data pakets there is a single link/node failure in a QRT-based oc To eamine the hardware ost of the proposed QRT arhiteture and the JCVR algorithm, the QRT router is snthesized and the simulation results are reported The rest of the paper is organized as follows etion introdues the QRT struture and its network properties, followed b the introdution of the Johnson ode and the basi funtions defined for the Johnson ode in etion etion 4 desribes the JCVR algorithm and the fault tolerant routing variant under single link/node failure In etion 5, the design and implementation of the QRT router and the simulation results will be desribed Finall, etion 6 onludes the paper QRT TRUCTURE A IT ETWORK PROPERTIE As a speial tpe of RT, the QRT network is also onstruted b overlaing diagonal torus An -QRT has nodes where =4n and n is a positive integer A QRT is omposed of nodes, rank- links, and rank-1 links Eah node (, ), where, -1, in the QRT has 4 rank- links onneting to its four neighbors (mod(±1, ), ) and (, mod(±1, )) on rank- torus, and 4 rank-1 links

onneting to its another four neighbors (mod(±n, ), mod(±n, )) on rank-1 torus In total, an -QRT has nodes and 4 links Fig 1 shows the struture of 4-QRT Fig (a) shows the diretions of the 8 hannels, and eah hannel is labeled b the dimension, rank number, and diretion Fig (b) shows the oordinates of all eight nodes that are onneted to node (, ) through respetive hannels shown in Fig (a) Figure 1 truture of 4-QRT As QRT is a speial tpe of RT struture, the Vetor Routing (VR) algorithm [1] proposed for RT an be readil applied to QRT The goal of the vetor routing algorithm is to represent the routing vetor with an epression that ombines all unit vetors in rank- and rank-1 torus and the hop ounts However, the vetor routing algorithm annot guarantee it is minimal routing (ie, the number of hops on the routing path ma not be minimum) In this paper, we propose a minimal routing algorithm, named Johnson oded vetor routing algorithm In the following, the Johnson odes are introdued first before the JCVR algorithm is desribed JOHO COE A FUCTIO TO MAIPULATE THE COE 1 Johnson Codes The m-bit Johnson ode has M (M=m) ode words, and it an be denoted as { C, C 1, C, L, Ci, Ci+ 1, L, CM, CM 1 }, where ode word C i is given as ( i) ( i) ( i) ( i) ( i) ( i) ( i) = b b L b b b Lb b, and Ci 1 j j j+ 1 m () i + j < m or i + j m b i j = 1 m i + j < m For eample, the eight 4-bit Johnson ode words are, 1, 11, 111, 1111, 111, 11 and 1 On the other hand, given an m-bit Johnson ode word C i, i an be determined b the f a funtion as (a) (b) Figure (a) iretions of 8 hannels (b) Coordinates of 8 nodes onneted to (, ) through the hannels Lemma 1: The diameter of the -QRT network is n+1, where n=/4 Lemma : The average distane of the -QRT is n + n n +, where n=/4 16n The omplete proofs of these two lemmas, based on the routing algorithm to be presented in etion 4, are provided in the appendi Table 1 lists the diameters and the average distanes of QRT, mesh, torus, and hperube with different network sizes One an see that QRT has smaller diameter and average distane than the other three strutures for most network sizes Table 1 Comparison of four tpes of networks in terms of diameter (R) and average distane (A) etwork QRT Mesh Torus Hperube ize R A R A R A R A 4 4 147 6 67 4 4 8 8 1 14 5 8 4 6 16 16 5 77 167 16 8 8 4 9 651 6 1 16 1 5 bj if b = j= i = fa( Ci ) = m + bj if b = 1 j= Basi Funtions of Johnson Codes Two basi funtions of Johnson odes are defined: the distane funtion and the diretion funtion These funtions will be used in the JCVR algorithm The distane funtion is used to determine the Hamming distane between two Johnson-odes Given two m bits Johnson-odes: A and B, where A= a a 1 a a m-1 and B = b b 1 b b m-1, the distane p m 1 between them is = ( ) p a i b i The diretion funtion is used to determine the output diretion The diretion is alulated as: Case 1: p=, then = Case : 1 > ai a = and b =, then = ; < ai

Case : Case 4: Case 5: 1 < ai a =1 and b =1, then = ; > ai 1 ai a = and b =1, then = ; > ai 1 ai a =1 and b =, then = < ai For simpliit, we denote the alulation of the distane and p, = f r A, B diretion as one funtion f r as ( ) ( ) 4 ROUTIG ALGORITHM FOR QRT 41 JCVR Algorithm The routing steps from a soure node to its destination node an be r r r r r derived as A = vex1x1 + veyy 1 1 + vexx + veyy, where X r 1, Y r 1, X r, and Y r are the unit vetors in rank-1 torus and rank- torus (with their diretions shown in Fig (a)) and their respetive oeffiients, vex 1, vey 1, vex, vey, give the number of hops on orresponding dimensions In the JCVR algorithm, the oeffiients for the unit vetors are alulated at the soure node For -QRT, given the soure and the destination addresses =(, ), =(, ), where and ( and ) represent the X and Y oordinates of the soure (destination) node in m-bit (m=/) Johnson ode, respetivel The diretion and distane values of the two oordinates, (p, ) and (p, ), an be determined b alulating f r as disussed in etion The address of eah intermediate node M=(M, M ) on the routing path from to is given as, M M = f s = f s (,, ) (,, ) mod = mod mod = mod ( + n) ( n) ( + n) ( n) The JCVR algorithm for QRT is listed below < < Johnson Coded Vetor Routing Algorithm for QRT: begin initialize vex, vey, vex 1, vey 1 to zero let M =, M = while ( M and M ) do alulate (p, ) and (p, ) using f r funtion if ( = and =) then set vex, vey, vex 1, vey 1 to zero elseif (p +p n) then if ( ) then vex =vex + p end elseif ( <) then vex =vex p if ( ) then vey =vey +p elseif ( <) then vey =vey p go to end elseif (p +p >n) then if ( and ) then vex 1 =vex 1 +1 if ( and <) then vey 1 =vey 1 1 if ( < and ) then vey 1 =vey 1 +1 if ( < and <) then vex 1 =vex 1 1 alulate M and M using the f s funtion endif endwhile For an -QRT, the while loop will be eeuted no more than m=/4 times to omplete the alulation of the four oeffiients (vex 1, vey 1, vex, vey ) At the soure node, the four oeffiients are omputed and enapsulated into the paket At the soure node and eah intermediate node, the oeffiients for the unit vetors will be heked in a default order of vex 1, vey 1, vex, vey The paket will be routed to the diretion as determined b the first unit vetor with non-zero oeffiient (hop ount) and this vetor s oeffiient, depending on the routing diretion (+/-), will be deremented/inremented before the data pakets are atuall forwarded to the net hop towards the destination Compared to the VR algorithm, the advantage of the JCVR algorithm is that it an ahieve minimal routing (as proved in Lemma 1), whih is not guaranteed in VR 4 Fault Tolerane Routing for QRT under ingle Link/ode Failure It has been well reognized that fault-tolerane apabilit is vital for a oc sstem [], sine one fault link/proessor ma isolate a large fration of IP ores from being reahable b other ores et we show that with minor modifiation, the JCVR algorithm is apable of handling single link/node failure The following assumptions are adopted in this paper [1] 1) An link or node in the network an fail, and the fault omponents are unusable; that is, data will not be transmitted over a fault link or routed through a fault node ) The fault model is stati; that is, no new faults our during a routing proess ) Both soure and destination nodes (on rank- or rank-1 torus) are fault-free 4) The faults our independentl 5) If a node fails, all the eight links assoiated with the node on rank-/1 tori are onsidered unusable uring the routing proess, the net routing link/node is found to be fault in a QRT network, fault-tolerane routing an be realized dnamiall as desribed below The fault link enountered must be on one of the four dimensions (ie, X r 1, Y r 1, X r, and Y r ) There are two ases to onsider: Case 1: if there eists one or more other dimensions with non-zero oeffiient, the routing steps on suh dimension(s) will be ompleted first followed b the routing steps on the dimension with fault link/node Case : otherwise, one step on the dimension on the same rank and orthogonal to the dimension with fault link/node will be ompleted

first followed b the routing steps on the dimension with fault link/node and one opposite routing step on the orthogonal dimension Fig (a) shows an eample of Case 1 under single link failure The detoured routing path is -M -M 1 -, whih does not introdue etra hops ompared to the original routing path -M -M - imilarl, as shown in Fig 4(a), no etra hop is introdued on the detoured routing path for suh a ase under single node failure Fig (b) shows an eample of Case with a single link failure, where the detoured routing path has two etra hops than the original routing path As shown in Fig 4(b), the number of etra hops on the detoured routing path is also two for suh a ase under single node failure Fault Link Original Path ew Path as there are 9 links in a node The Loal (L) is reserved to attah the router to an IP, and all the other eight ports (whose names follow the eight diretions) onnet the router to all its neighbors in eight diretions Eah port has both input and output hannels The main funtions of the router inlude buffering, routing, sheduling, swithing, and flow ontrol To implement these funtions, the following omponents are designed: input hannel module, output hannel module, rossbar swith, and sheduler Fig 5 shows the top level blok diagram of the QRT router The router is designed as a Verilog HL soft-ore inorporating parameterized omponents In the following, the design of eah omponent is desribed in brief The details an be found in [1] M M M M M 1 M 4 M (a) Case1 (b) Case Figure Two ases under single link fault M M 1 M M M M 1 (a) Case1 (b) Case Figure 4 Two ases under single node fault Hene the following lemma an be derived Lemma : When a single fault link/node eists in the QRT network, the etra number of hops on the detoured route generated b the fault-tolerane JCVR routing algorithm, as ompared to the number of hops on the original route for a fault-free QRT, is no more than 5 EIG A IMPLEMETATIO OF QRT ROUTER To eamine the hardware ost of the proposed QRT arhiteture and the JCVR algorithm, the QRT router is designed at RTL level using Verilog HL and the design is subsequentl snthesized using TMC 18μm CMO tehnolog 51 QRT Router esign The router designed for QRT-based ocs has up to 9 ommuniation ports, named as, E, E, E,, W, W, W and L, Figure 5 Blok diagram of the QRT router Input Channel Module (ICM): The ICM performs the following funtions (i) It buffers the inoming pakets using a FIFO buffer and it analzes the header (ii) It performs the routing algorithm (iii) It generates a request ( req ) to the sheduler requesting the destined output hannel Eah ICM is omposed of Input Flow Controller (IFC), FIFO, and Input Controller (IC) The IC at a loal port implements the routing algorithm One a header flit reahes the IC, the routing vetors are alulated and updated in the header flit The IC at other ports onl implements the funtion of updating the routing vetors as desribed in etion Output Channel Module (OCM): The OCM is responsible for flow ontrol inside the router and outside the router It is omposed of two arhitetural bloks named Output Controller (OC) and Output Flow Controller (OFC) Crossbar with: The 9 9 rossbar swith has 9 data inputs, eah onneted b an ICM, 9 ontrol inputs from the sheduler, and 9 data outputs, eah onneted to an OCM In Verilog soft-ore implementation, a multipleer-based rossbar swith design is atuall adopted Parameterized Round Robin Arbiter based heduler (PRRA): The PRRA design is parameterized with the router size A 9-QRT router omposed of 9 deoders, 9 Parallel Round Robin Arbiters (PRRAs) [16], and 9 OR gates Eah PRRA is assoiated with one output port and responsible for arbitrating the requests to the output port from up to 9 input ports 5 nthesis Results The QRT routers using both VR and JCVR algorithms are implemented For both designs, Verilog HL odes [4] are generated and snthesized using nopss s design analzer [1] targeting TMC 18μm CMO tehnolog As the VR-based QRT router and JCVR-based QRT router onl differ on the design of the ICMs,

in the following, the performane of the ICMs and the whole router in terms of area (given as the number of -input A gates equivalent), timing, and power onsumption (both dnami power and leakage power) is reported The ICM ontains IC, FIFO and IFC Two tpes of ICs are designed: IC for the loal port (IC L ), the port that onnets the loal node to the router, and IC at other ports (IC O ) Two tpes of IC L are designed based on the VR and JCVR algorithms with fault-tolerane apabilit under single link/node failure Table lists the area, timing dela, and power of the ICM with IC L for both routing algorithms The results show that the ICM with VR-based IC L has less area, power onsumption, and iruit dela than the ICM with JCVR-based IC L Table : Results of ICMs with IC L ICM with IC L (VR) ICM with IC L (JCVR) Total Area (-input A gate) 857 58459 Power nami Power (mw) 59 655 Leakage Power (nw) 68 855 Timing ela (ns) 67 17 The results of other omponents an be found in [1] Table summarizes the results of the VR-based and the JCVR-based QRT router designs A moderate inrease of area, power onsumption and iruit dela for JCVR-based router is evident as ompared to a VRbased router However, as pointed out earlier, this etra ost is justifiable beause JCVR an provide minimum routing, a feature that is ritial in high performane ocs but VR fails to deliver Table : Results of QRT routers VR-based Router JCVR-based Router Area (-input A gates) 667 6664 nami Power (mw) 61 6 Leakage Power (nw) 7418 759 Timing ela (ns) 64 81 6 COCLUIO In this paper, Quartered Reursive iagonal Torus (QRT) onstruted b overlaing diagonal torus, was introdued as a novel topolog for ocs The QRT struture ehibits a number of desirable networking properties (inluding small diameter and average distane, onstant node degree), whih make it partiularl appreiated in building large on-hip miro-networks A minimal routing algorithm, the JCVR algorithm, was proposed for QRT struture It was shown that fault tolerane routing under single link/node failure an be handled b the JCVR algorithm with minor modifiations Compared with the VR algorithm, the JCVR algorithm ahieves minimal routing The tradeoff is the moderatel higher implementation ost of the JCVR-based router than the VRbased router Our future work inludes the stud of the laout sheme for the QRT struture and the fault tolerant routing sheme under multiple link/node failures 7 REFERECE [1] Benini L and e Miheli G, etworks on hip: a new oc paradigm, IEEE Computer, 5, 1, 7-78 [] Bjerregaard T, and Mahadevan, 6, A surve of researh and praties of etwork-on-hip, ACM Comput urv, 8, 1, 1-51 [] umitras T, Kerner, and Marulesu,, Towards onhip fault-tolerant ommuniation, Pro AP-AC, 5- [4] Feliijan T and Furber B, 4, An asnhronous on-hip network router with qualit-of-servie (Qo) support, Pro IEEE Int l oc Conf, 74-77 [5] Guerrier P and Greiner A,, A generi arhiteture for on hip paket-swithed interonnetions, Pro esign, Automation and Test in Europe, 5-56 [6] Hemani A, et al,, etwork on a hip: an arhiteture for billion transistor era, Pro IEEE orchip Conf [7] International Tehnolog Roadmap for emiondutors Roadmap, OI=http://publiitrsnet/ [8] Maresau T, et al,, Interonnetion networks enable fine-grain dnami multi-tasking on FPGAs, Pro 1th Conf FPGA, 795-85 [9] Millberg M, ilsson E, Thid R, Kumar, and Jantsh A, 4, The ostrum bakbone-a ommuniation protool stak for networks on hip, Pro Intl Conf VLI design, 69-96 [1] eelkrishnan, Yang M, Jiang Y, Zhang L, Yang Y, and Lu E, and Yun X, 8, esign and implementation of a parameterized oc router and its appliation to build PRTbased ocs, to be presented on ITG 8 [11] Ogras UY, Hu J, and Marulesu R, 5, Ke researh problems in oc design: a holisti perspetive, Pro COE+I, 69-74 [1] nopss nthesis Tutorial, OI=http://wwwfawebiitkgpernetin/~apal/lps7/webpag es/71_n_tutpdf [1] Yang M, Li T, Jiang Y, and Yang Y, 5, Fault-tolerant routing shemes in RT(,,1)/α-based interonnetion network for networks-on-hip designs, Pro IPA, 5-57 [14] Yang Y, Amano H, hibamura H, and ueoshi T, 199, Reursive diagonal torus: an interonnetion network for massivel parallel omputers, Pro 5th IEEE mp Parallel and istrib Proessing, 591-594 [15] Yu Y, Yang M, Yang Y, and Jiang Y, 5, A RT-based interonnetion network for salable oc designs, Pro ITCC, 7-78 [16] Zheng Q and Yang M, 7, Algorithm-hardware odesign of fast parallel round-robin arbiters, IEEE Trans Parallel and istrib st, 18, 1, 19-115 APPEIX A Proof of Lemma 1 Proof: To prove the lemma, we first introdue the Quad-tar etwork (Q) An -Q is onstruted based on a (+1)(+1) mesh b removing all nodes with their minimum distane (in number of hops) to the enter node (indeed as (, )) greater than and their related links Fig 6 shows the struture of a

-Q, with the enter node (C) and an edge node (E) marked The minimum distane between C and E is hops In an n-q, the total number of nodes in the network is given b U ( n) = 4 [ n + ( n ) + L + + 1] n( n 1) + = 4 + 1 = n + 1 n + 1 Figure 6 truture of -Q And the sum of lengths (in number of hops) of all shortest paths between all nodes in the network to C is [ + L+ + 1 ] ( n) = 4 n + ( n ) 4n( n + 1)( n + 1) = n = 6 ( n + 1)( n + 1) Fig 7 shows that the 1-QRT is partitioned into 8 Qs in different sizes (shown in different olored regions) based on their respetive loation to the original node O (the node at the left-up orner) ome of the Qs are overlapped In general, for an - QRT, it an be partitioned to one n-q (B in Fig 7), four n-q (F 1, F, F, and F 4 in Fig 7) and three (n-1)-q ( 1,, and in Fig 7), where n=/4 Figure 7 Partition of QRT to Qs Consider the n-q region B in Fig 7 Then the rank- path between an node in B and the original node O is less than or equal to n hops Aording to the JCVR algorithm, the routing path between an node in B and the original node O is less than n Then onsider the n-q regions F 1, F, F, and F 4 The rank- path between an node in these regions and their enter node is less than or equal to n hops Aording to the JCVR algorithm, the routing path between an node in F 1, F, F and F 4 to the original node O is less than n+1 1,, and are (n-1)-qs All rank- paths between nodes in these regions to their original nodes are less than or equal to n-1 hops Aording to JCVR algorithm, the routing path between an node in 1,, and to the original node O is less than n+1 ote that the ombination of all these regions is equivalent to the whole QRT network Beause the QRT is smmetri to an of its nodes, suh partition an be applied to all nodes Then we an have the first onlusion: The diameter of the -QRT network is less than or equal to n+1, where n = 4 As disussed in etion 41, the routing steps from a soure node to a destination node an be derived as r r r r r A = vex X + veyy + vex X + vey Y The length of a routing 1 1 1 1 path is determined b the summation of the four oeffiients and will not hange with the sequene of routing steps From Fig 7, it an be seen that an routing path whih involves more than rank-1 links is not minimal That is, an shortest routing path an have at most rank-1 links Consider the shortest routing path between the original node O and node X as marked in Fig 7 One of the shortest routing paths from O to X with rank-1 links is O P R (or O P T) plus n rank- links from R to X (or T to X), whose length is n+ And one of the shortest routing paths from O to X with 1 rank-1 link is O P plus n rank- links from P to X, whose length is n+1 And there are man shortest routing paths from O to X with onl rank- links with lengths n Then we an have the seond onlusion: The diameter of the -QRT network is greater than or equal to n+1, where n = 4 Combing the above two onlusions, Lemma 1 holds B Proof of Lemma Proof: The alulation of the average distane in QRT is performed b partitioning the network into a set of the Qs Aording to the proof of Lemma 1, b the JCVR algorithm, the total lengths of all shortest paths between one seleted node and all other nodes are um = ( n) + 4 { ( n) n[ 4 + ( n 1) ] + U ( n) [ 4 + ( n 1) ]} + [ ( n 1) + U ( n 1) ] n = + n n + And the number of all suh paths is -1=16n -1 Then the average distane between one seleted node and all other nodes is ( n) + 4 { ( n) n[ 4 + ( n ) ] + U ( n) [ 4 + ( n ) ]} + [ ( n ) + U ( n ) ] d = 16n n + n n + = 16n As the -QRT is a smmetri network, the average distane of - QRT is also d