Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors




2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011

Atefeh Khosravi, Siavash Khorsandi, and Mohammad K. Akbari
Department of Computer Engineering and Information Technology
Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
{a_khosravi, khorsand, akbarif}@aut.ac.ir

Abstract: Due to the significant growth of link speeds, the amount of data that must be stored on router line cards is rapidly increasing. A large number of memory modules is therefore required for packet storage, and a high-performance interconnection network is needed on the line card for communication between processing elements and memory modules. In this paper we propose a new interconnection network, the hyper node torus, which can support a larger number of nodes than other topologies, such as the common torus network, within a limited area. Simulation results show that the hyper node torus sustains more traffic load and achieves higher throughput than the torus network under different traffic loads. As a result, it offers better scalability and higher performance for use in high-speed packet processors.

Keywords: interconnection network; hyper node torus; torus; k-ary n-cube

I. INTRODUCTION

In current computer networks, the transmission links between network nodes are generally not the performance bottleneck of the system. Line capacities grow exponentially, following a Moore's-law-like trend, so the bottleneck tends to lie in the network nodes themselves [1]. Routers and switches, as the main components of networks, must receive, decode, encode, and switch packets. Most of these functions run on line cards, which receive packets from incoming ports and hand them to processors. These processing elements, and the memory modules that act as storage elements, are connected via an interconnection network on the line card. This network should provide high-throughput, low-delay communication between processing elements and memory modules, and it should remain scalable under the area limitations of a board implementation.

The common topologies used for interconnection networks on line cards are the shared bus and the crossbar switch. The shared bus has a scalability problem: it cannot grow with the number of processing elements and memory modules [7], since only two modules can use the bus at any given time. The crossbar switch also scales poorly, because the number of switching points grows on the order of N^2 with the number of nodes N. A further problem of the crossbar is that it works properly only as long as there is no congestion in the network; once congestion occurs, its performance decreases.

A promising interconnection network proposed for this purpose is the k-ary n-cube. However, according to the simulations reported in [2], the two-dimensional k-ary n-cube (the so-called torus network) does not provide good performance, because its diameter increases with network size. The same simulations showed that k-ary 3-cubes, i.e., three-dimensional networks, perform better, but the accompanying area analysis indicated that their implementation is expected to be large, making them undesirable under area limitations [7].
In this paper we propose the hyper node torus network, a two-dimensional interconnection network derived from the torus in which each torus node is replaced by a ring of four nodes. One advantage of the hyper node torus over the torus is that it can support more nodes without a significant increase in diameter. We evaluate the board area the hyper node torus needs for implementation and compare it with the area consumed by a torus network with the same number of nodes. We also simulate both networks under different simulation parameters to compare them under network load.

The rest of this paper is organized as follows. Section II introduces the hyper node torus network. Section III evaluates the performance of this network and compares it with the torus network. Section IV concludes the paper and presents some future research directions.

II. HYPER NODE TORUS INTERCONNECTION NETWORK

This section introduces the hyper node torus interconnection network, which is obtained from a torus network by replacing each node with a ring. As illustrated in figure 1, the network has a two-level structure: at the high level, a pair (x, y) gives a hyper node's position in the torus; at the low level, a coordinate z gives a node's position within the ring of its hyper node.

[Figure 1. 4×4 hyper node torus interconnection network]

The interconnection network has a distributed architecture, and each node can be a processor or a memory module. This allows network packets to be shared among these elements on the network line card. In this structure, memories are distributed among the processors, giving processors direct access to memory storage.
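To make the two-level ((x, y), z) addressing concrete, the following Python sketch enumerates the nodes of a 4×4 hyper node torus and the three links of each node (two ring links plus one inter-hyper-node torus link). The convention that the ring position z also selects the node's single external direction, and that the link lands on the opposite slot of the neighboring ring, is our illustrative assumption; the paper does not specify the slot-to-direction mapping.

```python
from itertools import product

K = 4        # hyper nodes per dimension (a 4x4 hyper node torus)
RING = 4     # nodes per ring: 2n for n = 2 dimensions

# Assumed convention (ours, for illustration): ring position z selects the
# node's external direction; the torus link lands on the opposite slot.
DIRS = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}
OPPOSITE = {0: 2, 1: 3, 2: 0, 3: 1}

def neighbors(x, y, z):
    """The 3 links of node ((x, y), z): two ring links plus one torus link."""
    ring = [((x, y), (z - 1) % RING), ((x, y), (z + 1) % RING)]
    dx, dy = DIRS[z]
    torus = (((x + dx) % K, (y + dy) % K), OPPOSITE[z])
    return ring + [torus]

nodes = [((x, y), z) for x, y, z in product(range(K), range(K), range(RING))]
assert len(nodes) == 2 * 2 * K**2      # matches N = 2n * k^n = 4k^2 below
print(len(nodes), neighbors(0, 0, 1))  # 64 nodes; a degree-3 adjacency list
```

The assert anticipates the node-count formula derived next: with n = 2 and k = 4, the network holds 64 nodes, and every node has exactly three links.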

The total number of nodes in this network is N = 2n·k^n, where k is the number of nodes in each dimension and n is the network dimension. The hyper node torus is therefore 2n times larger than torus and k-ary n-cube networks with the same k and n. As figure 1 shows, each node has degree 3, so the hyper node torus mitigates the high dimensionality and port limitation of k-ary n-cube networks.

A. Hyper Node Torus Routing Mechanism

The routing algorithm used for the hyper node torus interconnection network is based on simple XY routing, which delivers packets from the source node to the destination node along a shortest path. To adapt it to the two-level structure, we modify XY routing as follows: at each routing step, before comparing the (x, y) coordinates of the current node with those of the destination node, the z coordinates of the two nodes are compared first; once the exit path out of the ring has been selected, routing proceeds by comparing the (x, y) coordinates of the current and destination nodes (a sketch of this rule follows the summary below).

To hide message-passing delay in the network we use wormhole routing, in which a packet is divided into a header flit, several body flits, and a tail flit. The header flit is sent first and propagates the path through the interconnection network; the remaining flits follow it. As figure 2 illustrates for the busy time of a packet, only the header flit incurs the routing delay; the other flits incur only the switching delay at each node and the propagation delay on each hop.

[Figure 2. Wormhole routing busy time (t_r: routing delay, t_s: switching delay)]

In summary, the general routing mechanisms for this network are:
- Routing is based on comparing the destination node's coordinates with those of the current node.
- Routing always proceeds toward the destination, so a packet never moves away from it; this prevents livelock in the network.
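The sketch below gives one plausible reading of the modified XY rule just described: resolve the z coordinate first (walk the ring to the exit node), then route in X before Y across hyper nodes. The exit-slot convention and all names are our illustrative assumptions, reusing the mapping from the addressing sketch above; this is not the paper's implementation.

```python
# Same assumed slot-to-direction convention as the addressing sketch above.
DIRS = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}
OPPOSITE = {0: 2, 1: 3, 2: 0, 3: 1}

def ring_step(z, target, ring=4):
    """Move one slot around the ring, the short way toward `target`."""
    return (z + (1 if (target - z) % ring <= ring // 2 else -1)) % ring

def next_hop(cur, dst, k, ring=4):
    """One step of the modified XY routing rule (illustrative sketch)."""
    (x, y), z = cur
    (tx, ty), tz = dst
    if (x, y) == (tx, ty):
        # Same hyper node: walk the ring to the destination slot.
        return None if z == tz else ((x, y), ring_step(z, tz, ring))
    # Resolve z first: pick the ring slot whose link exits toward the
    # destination, X dimension before Y, shortest way around the torus.
    if x != tx:
        exit_z = 1 if (tx - x) % k <= k // 2 else 3   # +x or -x exit
    else:
        exit_z = 0 if (ty - y) % k <= k // 2 else 2   # +y or -y exit
    if z != exit_z:
        return ((x, y), ring_step(z, exit_z, ring))
    dx, dy = DIRS[z]                                   # cross one torus link
    return (((x + dx) % k, (y + dy) % k), OPPOSITE[z])

# Trace a route on a 4x4 hyper node torus:
cur, dst = ((0, 0), 0), ((2, 1), 2)
path = [cur]
while (nxt := next_hop(cur, dst, k=4)) is not None:
    path.append(nxt)
    cur = nxt
print(path)  # each hop is a ring move or a single torus-link crossing
```

In this sketch every hop either advances around a ring toward the chosen exit or crosses one torus link toward the destination, matching the livelock-freedom argument above.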

B. Area Analysis

One of the most important factors when designing an interconnection network on a board is the area limitation: besides providing high performance, the selected interconnection network must fit on the line card. According to [5], the area consumed by a torus network is given by (1), where N is the number of nodes forming the network, k is the number of nodes in each dimension of the torus, and L is the number of available wiring layers:

Area_Torus = 16N^2 / ((L^2 - 1) k^2) + o(N^2 / (L^2 k^2))      (1)

Replacing each node of the torus by a ring, the area for the hyper node torus becomes:

Area_HyperNodeTorus = 16N^2 / ((L^2 - 1) k^2 (log2 N)^2) + o(N^2 / (L^2 k^2 (log2 N)^2))      (2)

Assuming an implementation with L = 2 wiring layers, Tables I and II give the areas consumed by torus and hyper node torus networks with 16 and 64 nodes each. Note that these numbers do not represent the real area used by these networks on the board; they serve only as a comparison among different interconnection networks [5].

TABLE I. AREA FOR TORUS NETWORKS WITH 16 AND 64 NODES

  Network       | N  | Area expression           | Area
  4-ary 2-cube  | 16 | 16·16^2 / ((2^2-1)·4^2)   | ~85
  8-ary 2-cube  | 64 | 16·64^2 / ((2^2-1)·8^2)   | ~341

TABLE II. AREA FOR HYPER NODE TORUS NETWORKS WITH 16 AND 64 NODES

  Network               | N  | Area expression                        | Area
  2×2 hyper node torus  | 16 | 16·16^2 / ((2^2-1)·2^2·(log2 16)^2)    | ~21
  4×4 hyper node torus  | 64 | 16·64^2 / ((2^2-1)·4^2·(log2 64)^2)    | ~37
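As a quick numerical check of equations (1) and (2), the short script below (our own illustration, not from the paper) reproduces the values in Tables I and II using only the leading terms; the paper truncates the results to integers.

```python
from math import log2

def area_torus(N, k, L=2):
    # Equation (1), leading term only.
    return 16 * N**2 / ((L**2 - 1) * k**2)

def area_hnt(N, k, L=2):
    # Equation (2), leading term only; k is the hyper-node grid dimension.
    return 16 * N**2 / ((L**2 - 1) * k**2 * log2(N)**2)

print(area_torus(16, 4))   # ~85.3  (4-ary 2-cube, Table I)
print(area_torus(64, 8))   # ~341.3 (8-ary 2-cube, Table I)
print(area_hnt(16, 2))     # ~21.3  (2x2 hyper node torus, Table II)
print(area_hnt(64, 4))     # ~37.9  (4x4 hyper node torus, Table II)
```

The 64-node comparison (~37 versus ~341) is the roughly tenfold area advantage cited in the conclusion.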

III. PERFORMANCE EVALUATION

We use delay and throughput as the two standard metrics for comparing the performance of the torus and hyper node torus interconnection networks. The simulator used in this paper is Noxim [3], an open-source SystemC-based NoC simulator. Since Noxim is cycle-accurate, the computed delay is reported as the number of cycles a packet takes to reach its destination. Delay is computed from the propagation delay and the node delay at each hop, where the node delay consists of the routing delay and the switching delay. Following [4], the delay of a packet is given by (3), where h is the number of hops on the path, T_s is the switching delay, T_r the routing delay, T_p the propagation delay, L the message length, and W the channel width (so L/W is the packet length in flits):

Delay = h (T_s + T_r + T_p) + (L / W)(T_s + T_p)      (3)

A. Simulation Parameters

To compare simulation results on the two metrics mentioned above, we define the parameters used in the simulations. They fall into two groups: those held constant and those varied. The constant parameters are:

- Packet length: random, between 10 and 15 flits
- Buffer depth: 4 flits
- Simulation time: 10000 cycles
- Warm-up time: 1000 cycles
- Time distribution of traffic: Poisson

The parameters varied during simulation represent the offered load of the network:

- Packet injection rate at each element
- Traffic type: we use uniform and hot-spot traffic; hot-spot traffic increases the load on border nodes to model the line card's I/O ports.

B. Simulation Results

Several simulation scenarios were run with the parameters presented above. Since the number of elements on a line card can reach 64 [6], we use network sizes of 16 and 64 nodes for the simulations and result comparisons.

To compare the delay of the torus and hyper node torus networks, we applied random traffic with increasing load to both networks at each network size. We first compare a 4×4 torus with a 2×2 hyper node torus. As figure 3 shows, delay rises for both 16-node networks as the network load increases. Figure 4 shows the same comparison for an 8×8 torus and a 4×4 hyper node torus, both with 64 nodes, and exhibits the same behavior as figure 3. From both results we conclude that under increasing traffic load the hyper node torus obtains lower delay at either size; in other words, the hyper node torus can sustain a higher traffic load than the torus network.

[Figure 3. Delay vs. traffic load for networks with 16 nodes]
[Figure 4. Delay vs. traffic load for networks with 64 nodes]

Figure 5 compares the throughput of the 4×4 torus and the 2×2 hyper node torus under random traffic at three different traffic loads; the hyper node torus achieves the higher throughput. Figure 6 repeats the same scenario for the 8×8 torus and the 4×4 hyper node torus and shows the same advantage for the hyper node torus.

[Figure 5. Throughput comparison at different packet injection rates for networks with 16 nodes]
[Figure 6. Throughput comparison at different packet injection rates for networks with 64 nodes]

Across all of these scenarios and traffic loads, the hyper node torus delivers better throughput and successfully routes more packets to their destinations.

Since these interconnection networks are intended for router line cards, the corner and edge nodes will be in contact with the I/O ports, so the traffic rate is higher in those areas. The central nodes face a similar situation, because much of the network traffic is routed through that part of the network. We therefore apply hot-spot traffic at these nodes to approximate the real situation after implementation: each hot-spot node receives 20 percent more traffic than the other nodes of the network. The hot-spot nodes are defined as follows (a configuration sketch is given after figure 8 below):

- 4×4 torus, edge hot-spots: (0,0) and (0,1)
- 4×4 torus, central hot-spots: (1,1) and (2,1)
- 8×8 torus, edge hot-spots: (0,0) and (0,1)
- 8×8 torus, central hot-spots: (3,3) and (4,3)
- 2×2 hyper node torus, edge hot-spots: ((0,0),0) and ((0,1),0)
- 2×2 hyper node torus, central hot-spots: ((0,0),2) and ((1,0),0)
- 4×4 hyper node torus, edge hot-spots: ((0,0),0) and ((0,1),0)
- 4×4 hyper node torus, central hot-spots: ((1,1),2) and ((2,1),0)

Figure 7 shows throughput under hot-spot traffic for the 4×4 torus and the 2×2 hyper node torus, and figure 8 shows the same scenario for the 8×8 torus and the 4×4 hyper node torus, with the hot-spot nodes defined above and uniform traffic on all other nodes. Both figures show that with hot-spot traffic applied, the hyper node torus again achieves better throughput than the torus network. We therefore conclude that the hyper node torus interconnection network behaves better under realistic network traffic and routes more packets successfully than the torus network.

[Figure 7. Throughput with hot-spot traffic for networks with 16 nodes]
[Figure 8. Throughput with hot-spot traffic for networks with 64 nodes]
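A minimal sketch of how such a hot-spot pattern could be configured, assuming a per-node injection-rate table, is shown below. This mirrors the paper's description (hot-spot nodes receive 20 percent more traffic); it does not reflect Noxim's actual configuration syntax, and all names are illustrative.

```python
def injection_rates(nodes, hotspots, base_pir, boost=0.20):
    """Per-node packet injection rates: hot-spot nodes get `boost` more.

    nodes: all node addresses; hotspots: addresses from Section III.B;
    base_pir: packets/cycle/node for ordinary nodes. Illustrative only.
    """
    hot = set(hotspots)
    return {n: base_pir * (1 + boost if n in hot else 1) for n in nodes}

# Example: 4x4 torus with the edge and central hot-spots defined above.
torus_nodes = [(x, y) for x in range(4) for y in range(4)]
rates = injection_rates(torus_nodes,
                        hotspots=[(0, 0), (0, 1), (1, 1), (2, 1)],
                        base_pir=0.01)
print(rates[(0, 0)], rates[(3, 3)])   # 0.012 vs 0.01
```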

IV. CONCLUSION

In this paper we proposed a new interconnection network, the hyper node torus, as a structure for connecting processing elements and memory modules on router line cards. We defined different simulation scenarios under uniform and hot-spot traffic to compare the performance of the hyper node torus with the regular torus. As the simulation results illustrate, the hyper node torus provides slightly better performance in terms of latency and throughput, as well as in the communication bandwidth between processors and memory modules. Moreover, from the implementation viewpoint, the proposed interconnection network requires much less board area than a comparable regular torus: in a 64-node network, the required area is roughly 10% of that of the regular torus. As a result, the proposed network provides much better scalability for use in high-capacity network processors.

REFERENCES

[1] D. N. Armstrong, "Architectural Considerations for Network Processor Design," Literature Survey, Department of Electrical and Computer Engineering, University of Texas at Austin, March 2002.
[2] J. Engel and T. Kocak, "Off-chip communication architecture for high throughput network processors," Computer Communications (Elsevier), pp. 867-879, January 2009.
[3] Noxim: network-on-chip simulator [Online]. Available: http://sourceforge.net/projects/noxim/
[4] W. J. Dally, "Performance analysis of k-ary n-cube interconnection networks," IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990.
[5] C.-H. Yeh, B. Parhami, and E. A. Varvarigos, "Multilayer VLSI layout for interconnection networks," Proc. Int. Conf. Parallel Processing, pp. 33-40, 2000.
[6] J. Engel and T. Kocak, "k-ary n-cube based off-chip communications architecture for high-speed packet processors," 48th IEEE Midwest Symposium on Circuits and Systems, pp. 1903-1906, 2006.
[7] A. Agarwal, "Limits on interconnection network performance," IEEE Transactions on Parallel and Distributed Systems, pp. 398-412, 2002.