Interconnection Networks




CMPT765/408 08-1 Interconnection Networks, Qianping Gu

These notes are mainly based on Chapters 1, 2, and 4 of Interconnection Networks: An Engineering Approach by J. Duato, S. Yalamanchili, and L. Ni, and on Section 2.3.2.1 of Multiwavelength Optical Networks: A Layered Approach by T.E. Stern and K. Bala.

Interconnection for digital systems

Interconnection networks provide the interconnection for digital systems. Examples include the internal buses in VLSI circuits, telephone switches and networks, networks for parallel/distributed computing systems (including vector supercomputers, multicomputers, multiprocessors, and clusters/networks of workstations), LANs, MANs, WANs, and networks for industrial applications and electronic devices.

Parallel computing and networks

Parallel computing systems have been developed to meet the increasing demand for computing power. A bottleneck in parallel computing systems is the communication between processors. The performance of the interconnection network is therefore a critical issue in parallel computing, and this has been a major driving force for research on interconnection networks. The study of interconnection networks in parallel computing systems covers both performance and cost issues.

Parallel computer architecture

Distributed memory multiprocessors (multicomputers)

A multicomputer consists of a set of processors, each with its own memory, interconnected by a network. Communication between processors is realized by message passing on the interconnection network. It is easy to build a multicomputer with a large number of processors (and thus, in theory, a large computing power), but it is difficult to write programs for multicomputers because data and tasks need to be distributed to the processors in an efficient way.

Shared memory multiprocessors

In this model, all processors in a system share a common memory space. Communication between processors is realized by reading and writing shared memory cells through the interconnection network. This simplifies the data exchange between processors. When the system is small, the memory access time can be considered uniform for every processor. However, this is not true if the system is large.

Distributed shared-memory multiprocessors

This model combines the previous two models: each processor has a local cache memory, and all processors share a common main memory.

Multicomputers, shared memory multiprocessors, and distributed shared-memory multiprocessors are known as fine-grained parallel computing systems, because the computation and communication between processors can be performed in a highly synchronized way.

Network of workstations

This model refers to a set of workstations/PCs connected by a network such as a LAN. It can be further classified into two categories: NOW (network of workstations) and COW (cluster of workstations). A NOW is a system dedicated to parallel computing, and the performance of its network is usually considered when the system is built. A COW is a set of workstations/PCs connected by a network where the system may serve other purposes, but its spare computing power is used for parallel computing. This model is known as a coarse-grained parallel computing system. Computation on this model can be bulk synchronized.

Classification of Interconnection Networks

Interconnection networks provide the interconnection among end systems and can be classified into shared-medium networks, direct networks, indirect networks, and hybrid networks.

Shared-medium networks

In shared-medium networks, processors are connected by a common transmission medium such as a bus. All processors share the medium, which is passive and does not generate messages itself. To send a message to a destination, a source broadcasts the message on the medium and the destination picks it up. Because processors may send messages to the medium simultaneously, network access conflicts must be resolved. The nature of the shared medium also limits the bandwidth of the network and the number of end systems in the network. Ethernet is an example of a shared-medium network. The medium access control protocol used in Ethernet is known as CSMA/CD (carrier sense multiple access with collision detection).

Point-to-point networks

In point-to-point networks, end systems are connected by point-to-point communication links. These networks can be further classified into two categories: direct networks and indirect networks. In direct networks, point-to-point links directly connect each end system to some other end systems. In indirect networks, end systems are connected via one or more switches, and the switches are connected by point-to-point links.

Hybrid networks

Some networks have more complicated structures, such as hierarchical structures or hypergraph topologies. Such networks are classified as hybrid networks.

Direct Networks

A direct network consists of a set of nodes and a set of point-to-point links. Each node is directly connected to a small subset of the other nodes by links, and each node performs both routing and computing. A direct network is usually modeled as a graph, with the vertices and edges of the graph representing the nodes and links of the network, respectively. A direct network is characterized by its topology and by the routing/switching technologies used in the network.

Important topological properties of the network include:

node degree (the number of links connected to the node),
diameter (the maximum distance between two nodes in the network),
regularity (a network is regular when all nodes have the same degree),
symmetry (a network is symmetric when it looks alike from every node), and
orthogonality (a network is orthogonal if its nodes and links can be arranged in n dimensions such that each link is placed in exactly one dimension).

In direct networks, the paths for message transmission are selected by routing algorithms. The switching mechanism determines how inputs are connected to outputs in a node. All the switching techniques can be used in direct networks.

Popular topologies for direct networks

r-dimensional mesh

The r-dimensional mesh consists of $N = k_1 k_2 \cdots k_r$ nodes, with $k_i$ nodes along dimension $i$, $k_i \ge 2$. Each node is identified by a label $(a_1, \ldots, a_r)$, $0 \le a_i \le k_i - 1$ and $1 \le i \le r$. Two nodes $u = (a_1, \ldots, a_r)$ and $v = (b_1, \ldots, b_r)$ are connected by a link iff there is exactly one $j$ such that $a_j = b_j \pm 1$ and $a_i = b_i$ for all $i \ne j$. The most important mesh networks in practice are the 2-D mesh ($r = 2$, $k_1 = k_2 = n$, $N = n^2$) and the 3-D mesh ($r = 3$, $k_1 = k_2 = k_3 = n$, $N = n^3$).

r-dimensional torus

Similar to the r-dimensional mesh, the r-dimensional torus consists of $N = k_1 k_2 \cdots k_r$ nodes, with $k_i$ nodes along dimension $i$, $k_i \ge 2$. Each node is identified by a label $(a_1, \ldots, a_r)$, $0 \le a_i \le k_i - 1$ and $1 \le i \le r$. Two nodes $u = (a_1, \ldots, a_r)$ and $v = (b_1, \ldots, b_r)$ are connected by a link iff there is exactly one $j$ such that $a_j = (b_j \pm 1) \bmod k_j$ and $a_i = b_i$ for all $i \ne j$. The torus can be considered as the mesh with wrap-around connections. When $k_1 = k_2 = \cdots = k_r = k$, the r-dimensional torus is called the k-ary r-cube. When $r = 1$, the network is the ring. The 2-D torus ($r = 2$, $k_1 = k_2 = n$, $N = n^2$) and the 3-D torus ($r = 3$, $k_1 = k_2 = k_3 = n$, $N = n^3$) are important networks.

Hypercube

The n-dimensional hypercube (n-cube) consists of $2^n$ nodes. Each node is identified by $(a_1, \ldots, a_n)$, $a_i \in \{0, 1\}$, $1 \le i \le n$. There is a link between $u = (a_1, \ldots, a_n)$ and $v = (b_1, \ldots, b_n)$ iff $u$ and $v$ differ in exactly one bit position. The n-cube has degree $n$ and diameter $n$.

Tree

The k-ary tree is a tree in which every node except the leaves has exactly $k$ children. When $k = 2$, the k-ary tree is known as the binary tree.

Cube-connected cycles

The cube-connected cycles network can be considered as an n-dimensional hypercube of virtual nodes, where each virtual node is a ring of $n$ nodes ($n 2^n$ nodes in total). The cube-connected cycles network has node degree 3 and diameter $O(n)$.
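The adjacency rules for the mesh, torus, and hypercube above can be checked on small instances. The following is a minimal sketch, not part of the original notes: the function names (`mesh_neighbors`, `torus_neighbors`, `hypercube_neighbors`, `diameter`) are chosen here for illustration, hypercube nodes are encoded as n-bit integers rather than tuples, and the diameter is found by brute-force breadth-first search, which is only practical for small networks.

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, dims):
    """Neighbors of `node` in a mesh with dims[j] nodes along dimension j."""
    result = set()
    for j, k in enumerate(dims):
        for step in (1, -1):
            coord = node[j] + step
            if 0 <= coord < k:  # a mesh has no wrap-around links
                result.add(node[:j] + (coord,) + node[j + 1:])
    return result


def torus_neighbors(node, dims):
    """Same rule as the mesh, but coordinates wrap around modulo k_j."""
    result = set()
    for j, k in enumerate(dims):
        for step in (1, -1):
            coord = (node[j] + step) % k
            result.add(node[:j] + (coord,) + node[j + 1:])
    return result


def hypercube_neighbors(node, n):
    """Nodes are n-bit integers; neighbors differ in exactly one bit."""
    return {node ^ (1 << i) for i in range(n)}


def diameter(nodes, neighbors):
    """Largest shortest-path distance, found by BFS from every node."""
    best = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur):
                if nb not in dist:
                    dist[nb] = dist[cur] + 1
                    queue.append(nb)
        best = max(best, max(dist.values()))
    return best


grid_nodes = list(product(range(4), repeat=2))
mesh_diameter = diameter(grid_nodes, lambda v: mesh_neighbors(v, (4, 4)))
torus_diameter = diameter(grid_nodes, lambda v: torus_neighbors(v, (4, 4)))
cube_diameter = diameter(range(16), lambda v: hypercube_neighbors(v, 4))
print(mesh_diameter, torus_diameter, cube_diameter)
```

For the 4 x 4 mesh, the 4 x 4 torus, and the 4-cube this prints 6, 4, and 4, which illustrates how wrap-around links shorten the diameter of the mesh and agrees with the diameter n stated for the n-cube.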

Shuffle-Net

The $(\delta, k)$-shufflenet is a regular digraph of in-/out-degree $\delta$ with $N = k\delta^k$ nodes and $k\delta^{k+1}$ arcs. The nodes are arranged in $k$ columns, and each column has $\delta^k$ nodes. The nodes in each column are connected to the next column via $\delta^{k+1}$ arcs in a generalization of the perfect shuffle pattern. The $(\delta, k)$-shufflenet has diameter $d = 2k - 1$, so $N = \frac{d+1}{2}\,\delta^{(d+1)/2}$.

de Bruijn digraphs

The de Bruijn digraph $B(\delta, d)$ has in-/out-degree $\delta$, diameter $d$, and $N = \delta^d$ nodes. Each node has a label $(a_1, \ldots, a_d)$, $a_i \in \{0, 1, \ldots, \delta - 1\}$. There are arcs from node $v = (a_1, a_2, \ldots, a_d)$ to the nodes with labels $(a_2, \ldots, a_d, \alpha)$, $\alpha \in \{0, 1, \ldots, \delta - 1\}$.

Star graph

The n-dimensional star graph has $n!$ nodes, and each node is identified by a permutation of $(1, 2, \ldots, n)$. Nodes $u$ and $v$ are connected iff $v$ can be obtained by exchanging the 1st symbol with the $i$th symbol, $2 \le i \le n$, in the permutation of $u$. The n-dimensional star graph has node degree $n - 1$ and diameter $\lfloor 3(n-1)/2 \rfloor$.

Indirect Networks

In an indirect network, nodes are connected by a network of switches, which can be set dynamically to realize different topologies. Only nodes can be end systems (sources and destinations). The network can be modeled by a graph, with vertices for switches and edges for links. End systems are usually not shown. The source systems are connected to the inputs and the destination systems are connected to the outputs of the network. Typical indirect networks include the crossbar network and the multistage interconnection networks (MINs). The main factors for these networks are the topology, routing, and switching.

Crossbar network

An $r \times n$ crossbar network consists of $r$ input lines, $n$ output lines, and $rn$ cross points located at the intersections of the lines. At each cross point, the input line and the output line are connected by a binary switch which has two states, connected and disconnected. Any set of point-to-point connections (a permutation) can be realized on an $n \times n$ crossbar by closing one cross point in each row and each column. A problem with a crossbar switch is the large number of cross points in the switch.

Multistage interconnection networks (MIN)

In a MIN, the inputs are connected to the outputs through a number of switch stages. Key factors for a MIN include the number of stages and the connection pattern between stages.

Three-stage Clos networks

For a network with $n$ inputs and $n$ outputs, where $n = kp$, there are
$k$ crossbar networks of size $p \times m$ in the 1st stage,
$m$ crossbar networks of size $k \times k$ in the 2nd stage, and
$k$ crossbar networks of size $m \times p$ in the 3rd stage.
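Returning to the de Bruijn digraph and the star graph defined earlier in this part of the notes, their adjacency rules and stated diameters can be checked on small instances. The sketch below is illustrative only: the function names are hypothetical, labels are Python tuples, and the diameter is computed by exhaustive breadth-first search, following arcs in their forward direction for the digraph.

```python
from collections import deque
from itertools import permutations, product


def de_bruijn_successors(label, delta):
    """Arcs go from (a1, ..., ad) to (a2, ..., ad, alpha), 0 <= alpha < delta."""
    return [label[1:] + (alpha,) for alpha in range(delta)]


def star_neighbors(perm):
    """Exchange the 1st symbol with the i-th symbol, 2 <= i <= n."""
    result = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        result.append(tuple(p))
    return result


def diameter(nodes, successors):
    """Largest shortest-path distance, found by BFS from every node."""
    best = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in successors(cur):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


# B(2, 3): 2^3 = 8 nodes, expected diameter d = 3.
db_nodes = list(product(range(2), repeat=3))
db_diameter = diameter(db_nodes, lambda v: de_bruijn_successors(v, 2))

# 4-dimensional star graph: 4! = 24 nodes, expected diameter floor(3*3/2) = 4.
star_nodes = list(permutations(range(1, 5)))
star_diameter = diameter(star_nodes, star_neighbors)

print(len(db_nodes), db_diameter, len(star_nodes), star_diameter)
```

Both graphs reach many nodes with a small degree: the star graph connects 24 nodes with degree 3 and diameter 4, whereas a hypercube of degree 3 connects only 8 nodes.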

Generalized Clos networks

For $n$ a power of 2, taking $p = m = 2$, there are $n/2$ switches of size $2 \times 2$ in each of the 1st and 3rd stages, and 2 networks of size $n/2 \times n/2$ in the 2nd stage. Recursively realizing the networks in the 2nd stage, we get a network with $2\log_2 n - 1$ stages (the Beneš network, which has $O(n \log_2 n)$ switches of size $2 \times 2$).

A network with $S$ connection states requires at least $\log_2 S$ binary switching elements. An $n \times n$ network that realizes all permutations has $S = n!$ connection states and therefore needs at least $\log_2 n! \approx n \log_2 n - 1.44n$ binary switches.

Generalized MIN model

Assume that each stage has the same number of inputs and outputs. The connection patterns between stages can then be defined by permutations.

Basic Permutations

Assumptions: each stage has $k^{n-1}$ switches of size $k \times k$. The $k^n$ inputs/outputs are identified by $x_{n-1} \ldots x_0$, $0 \le x_i \le k - 1$ for $0 \le i \le n - 1$.

Perfect k-shuffle: $\sigma^k(x_{n-1} \ldots x_0) = x_{n-2} \ldots x_1 x_0 x_{n-1}$. For $k = 2$, $\sigma^2$ perfectly shuffles a deck of $N$ cards.

Inverse perfect shuffle connection: $(\sigma^k)^{-1}(x_{n-1} \ldots x_0) = x_0 x_{n-1} \ldots x_1$.

Digit reversal connection: $\rho^k(x_{n-1} \ldots x_0) = x_0 x_1 \ldots x_{n-1}$.

The $i$th k-ary butterfly permutation, $0 \le i \le n - 1$: $\beta_i^k(x_{n-1} \ldots x_{i+1} x_i x_{i-1} \ldots x_1 x_0) = x_{n-1} \ldots x_{i+1} x_0 x_{i-1} \ldots x_1 x_i$.

The $i$th cube connection $E_i$, $0 \le i \le n - 1$, $k = 2$: $E_i(x_{n-1} \ldots x_{i+1} x_i x_{i-1} \ldots x_0) = x_{n-1} \ldots x_{i+1} \bar{x}_i x_{i-1} \ldots x_0$, where $\bar{x}_i$ is the complement of bit $x_i$.

The $i$th k-ary baseline permutation, $0 \le i \le n - 1$: $\delta_i^k(x_{n-1} \ldots x_{i+1} x_i x_{i-1} \ldots x_1 x_0) = x_{n-1} \ldots x_{i+1} x_0 x_i x_{i-1} \ldots x_1$.

Classification of MINs

Blocking network: a path from a free input to a free output is not always possible because of conflicts with existing connection paths.
Nonblocking network: a path from a free input to a free output is always possible without affecting the existing connection paths.
Rearrangeable network: a path from a free input to a free output is always possible, with possible rearrangement of the paths for existing connections.
Unidirectional MINs: links and switches are unidirectional.
Bidirectional MINs: links and switches are bidirectional.
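The basic permutations defined above act on the digit string of a port address, so they are easy to express on tuples of digits. The following is a sketch under the assumption that an address is stored as a tuple `(x_{n-1}, ..., x_0)` with the most significant digit first; the helper names (`to_digits`, `from_digits`, and the permutation functions) are introduced here for illustration and do not come from the notes.

```python
def to_digits(value, n, k=2):
    """Write `value` as n base-k digits, most significant digit first."""
    digits = []
    for _ in range(n):
        digits.append(value % k)
        value //= k
    return tuple(reversed(digits))


def from_digits(x, k=2):
    """Inverse of to_digits."""
    value = 0
    for d in x:
        value = value * k + d
    return value


def perfect_shuffle(x):
    """sigma: x_{n-1} x_{n-2} ... x_0  ->  x_{n-2} ... x_0 x_{n-1}."""
    return x[1:] + x[:1]


def inverse_shuffle(x):
    """sigma^{-1}: x_{n-1} ... x_1 x_0  ->  x_0 x_{n-1} ... x_1."""
    return x[-1:] + x[:-1]


def digit_reversal(x):
    """rho: x_{n-1} ... x_0  ->  x_0 ... x_{n-1}."""
    return x[::-1]


def butterfly(x, i):
    """beta_i: exchange digit x_i with digit x_0."""
    n = len(x)
    y = list(x)
    y[n - 1 - i], y[n - 1] = y[n - 1], y[n - 1 - i]
    return tuple(y)


def cube(x, i):
    """E_i (k = 2 only): complement bit x_i."""
    n = len(x)
    y = list(x)
    y[n - 1 - i] ^= 1
    return tuple(y)


def baseline(x, i):
    """delta_i: rotate the i+1 least significant digits right by one."""
    n = len(x)
    head, tail = x[:n - 1 - i], x[n - 1 - i:]
    return head + tail[-1:] + tail[:-1]


# Where each of 8 positions goes under the perfect 2-shuffle.
shuffled = [from_digits(perfect_shuffle(to_digits(v, 3))) for v in range(8)]
print(shuffled)
```

For eight positions the perfect 2-shuffle sends position v to 2v mod 7 (with position 7 fixed), which is the interleaving obtained by cutting a deck of eight cards in half and riffling the halves together.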

Unidirectional MINs

A MIN with $N$ inputs/outputs and $k \times k$ switches needs at least $\log_k N$ stages to allow a connection between any input/output pair. For a MIN of $n$ stages, we number the stages $0, 1, \ldots, n - 1$ from left (input) to right (output). Let $C_i$ ($1 \le i \le n - 1$) denote the connection pattern between the $(i-1)$st stage and the $i$th stage, let $C_0$ be the connection pattern between the sources and the inputs of stage 0, and let $C_n$ be the connection pattern between the outputs of stage $n - 1$ and the destinations.

Baseline network: $C_0 = \sigma^k$; $C_i = \delta_{n-i}^k$ ($1 \le i \le n$).
Butterfly network: $C_0 = \beta_0^k$; $C_i = \beta_{n-i}^k$ ($1 \le i \le n - 1$); $C_n = \beta_0^k$.
Cube MINs: $C_0 = \sigma^k$; $C_i = \beta_{n-i}^k$ ($1 \le i \le n$).
Omega network: $C_i = \sigma^k$ ($0 \le i \le n - 1$); $C_n = \beta_0^k$.

Bidirectional MINs (BMINs)

A BMIN consists of bidirectional switches and links. Bidirectional switches support three types of connections: forward, backward, and turnaround. End systems are connected to one side (e.g., the left) of the network. A routing path is established by crossing stages in the forward direction, making a turnaround connection, and then crossing stages in the backward direction.

Butterfly BMINs: can be viewed as a folded Beneš network.
Inverse butterfly BMINs: can be viewed as a fat tree (used in the CM-5).

Hybrid Networks

Hierarchical networks: for example, hierarchical buses.
Cluster-based networks.
Hypergraph topologies.

Blocking properties of indirect networks

Accessibility: a network is fully accessible if there is a path from any input to any output in the network.

Non-blocking property: a set of one-to-one connection requests on an $N \times N$ network can be defined by a permutation. A network is non-blocking if any permutation can be realized by edge-disjoint paths in the network. Non-blocking properties can be further classified into rearrangeable non-blocking, wide-sense non-blocking, and strict-sense non-blocking, depending on whether the permutation is realized statically or dynamically.
Rearrangeable non-blocking

A network is rearrangeable non-blocking if any permutation can be realized by edge-disjoint paths when the entire permutation is known. In other words, any permutation can be statically realized. The word rearrangeable refers to the fact that, if the connection requests of a permutation arrive dynamically, the permutation can be realized by possibly rearranging active connections. This is equivalent to realizing the permutation statically.

Wide-sense non-blocking

When the connection requests of a permutation arrive dynamically in sequence, the permutation can be realized by edge-disjoint paths without rearranging active connections, subject to the condition that a suitably selected path is used for each new connection request. In other words, any permutation can be dynamically realized with the help of a wise path-selection algorithm.

Strict-sense non-blocking

When the connection requests of a permutation arrive dynamically in sequence, the permutation can be realized by edge-disjoint paths without rearranging active connections, and any idle path can be used for each new connection request. In other words, any permutation can be dynamically realized.

Obviously, strict-sense non-blocking implies wide-sense non-blocking, which in turn implies rearrangeable non-blocking.

Networks by 2 x 2 Switches

A $2 \times 2$ switch has two input links and two output links, and has a through state and a cross state for one-to-one connection from inputs to outputs.

An $r \times n$ crossbar network can be constructed from $rn$ switches at the $rn$ cross points of $r$ input lines and $n$ output lines. An $n \times n$ crossbar network is strict-sense non-blocking.

MINs: an n-dimensional MIN has $N = 2^n$ inputs/outputs and multiple stages of switches, with $N/2$ switches in each stage. A k-stage network has $kN/2$ switches. A necessary condition for full accessibility is that the network has at least $n$ stages. Sufficient conditions for full accessibility depend on the connection patterns between stages.

A k-stage MIN has $2^{kN/2}$ distinct states. To realize all permutations on an $N \times N$ network, at least $N!$ distinct states are needed, so $2^{kN/2} \ge N!$. Since $\log_2 N! = N \log_2 N - O(N)$, a necessary condition on the number of stages for rearrangeable non-blocking networks is $k \ge 2\log_2 N - O(1)$. Sufficient conditions on the number of stages for rearrangeable non-blocking networks depend on the connection patterns between stages. The n-dimensional Beneš network has $2\log_2 N - 1$ stages and is rearrangeable non-blocking.

Blocking MINs

There is a class of well-known n-dimensional fully accessible MINs with $n$ stages of $2 \times 2$ switches. There is a unique path from any input to any output in a network of this class. Examples include baseline networks, omega networks, butterfly networks, indirect binary n-cube networks, and the inverses of these networks. These networks are blocking networks. The networks in this class have similar structures; in particular, many of them are topologically or functionally equivalent.
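To see why a unique-path MIN blocks, the Omega network can serve as a concrete case. The sketch below assumes the usual description of the Omega network, in which every stage is preceded by a perfect shuffle and the switch at stage i selects its output according to bit i of the destination (most significant bit first); the function names are hypothetical. The link label after each stage is then the previous label shifted left by one bit with the next destination bit appended, and two paths collide when they carry the same label after the same stage.

```python
def omega_path(u, v, n):
    """Link labels on the unique path from input u to output v (n-bit ints)."""
    mask = (1 << n) - 1
    labels = [u]
    cur = u
    for i in range(n):
        bit = (v >> (n - 1 - i)) & 1      # destination bits, MSB first
        cur = ((cur << 1) & mask) | bit   # shuffle, then pick the switch output
        labels.append(cur)
    return labels


def collide(path_a, path_b):
    """True if the two paths use the same link after the same stage."""
    return any(a == b for a, b in zip(path_a, path_b))


def alpha(u, s, n):
    """Number of equal rightmost (least significant) bits of u and s."""
    count = 0
    while count < n and ((u >> count) & 1) == ((s >> count) & 1):
        count += 1
    return count


def beta(v, t, n):
    """Number of equal leftmost (most significant) bits of v and t."""
    count = 0
    while count < n and ((v >> (n - 1 - count)) & 1) == ((t >> (n - 1 - count)) & 1):
        count += 1
    return count


n = 3
path_a = omega_path(0b000, 0b000, n)
path_b = omega_path(0b100, 0b001, n)
print(path_a, path_b, collide(path_a, path_b))
```

Here inputs 000 and 100 request the distinct outputs 000 and 001, yet both paths need link 000 after the first stage, so the two connections cannot be set up at the same time. For distinct inputs and distinct outputs, a collision in this model occurs exactly when `alpha(u, s, n) + beta(v, t, n) >= n`.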

Routing on n-dimensional MINs

Routing on the n-dimensional Omega network $\Omega_n$

An input $u = (u_{n-1} \ldots u_1 u_0)$ and an output $v = (v_{n-1} \ldots v_1 v_0)$ can be connected by the unique path

$(u_{n-1} u_{n-2} \ldots u_0) \to (u_{n-2} \ldots u_0 v_{n-1}) \to (u_{n-3} \ldots u_0 v_{n-1} v_{n-2}) \to \cdots \to (u_{n-i} \ldots u_0 v_{n-1} \ldots v_{n-i+1}) \to \cdots \to (v_{n-1} \ldots v_0)$.

Collisions

If two paths share an edge, then a collision occurs. Paths $(u_{n-1} \ldots u_0) \to (v_{n-1} \ldots v_0)$ and $(s_{n-1} \ldots s_0) \to (t_{n-1} \ldots t_0)$ are edge-disjoint iff, for every $i$ indexing the labels along the path ($1 \le i \le n + 1$),

$u_{n-i} \ldots u_0 v_{n-1} \ldots v_{n-i+1} \ne s_{n-i} \ldots s_0 t_{n-1} \ldots t_{n-i+1}$.

For two paths $u \to v$ and $s \to t$, let $\alpha(u, s)$ be the largest $l$ such that the rightmost (least significant) $l$ bits of $u$ and $s$ are the same, and let $\beta(v, t)$ be the largest $l$ such that the leftmost (most significant) $l$ bits of $v$ and $t$ are the same. Paths $u \to v$ and $s \to t$ are edge-disjoint if $\alpha(u, s) + \beta(v, t) < n$.

Routing on other networks, such as butterfly networks, is similar.

Non-blocking networks

N x N crossbar network

This network is strict-sense non-blocking. A problem with the crossbar network is the large number, $O(N^2)$, of switches in the network. How to construct a non-blocking network with as few switches as possible has been a major research topic in circuit-switched networks. The number of switches can be reduced by increasing the number of stages in the network.

Clos network

The three-stage $N \times N$ Clos network ($N = kp$) has
$k$ crossbar networks of size $p \times m$ in the 1st stage,
$m$ crossbar networks of size $k \times k$ in the 2nd stage, and
$k$ crossbar networks of size $m \times p$ in the 3rd stage.

The outputs of one stage are connected to the inputs of the next stage by a shuffle pattern. More precisely, let the $k$ networks in the 1st stage be identified by an integer $x_0 \in \{0, 1, \ldots, k - 1\}$ and let the $m$ outputs of each switch be identified by an integer $x_1 \in \{0, 1, \ldots, m - 1\}$. Let the $m$ networks in the 2nd stage be identified by an integer $x_1 \in \{0, 1, \ldots, m - 1\}$ and let the $k$ outputs of each switch be identified by an integer $x_0 \in \{0, 1, \ldots, k - 1\}$. Then output $(x_1 x_0)$ of the 1st stage is connected to input $(x_0 x_1)$ of the 2nd stage. Similarly, output $(x_0 x_1)$ of the 2nd stage is connected to input $(x_1 x_0)$ of the 3rd stage.

A necessary and sufficient condition for the three-stage Clos network to be strict-sense non-blocking is $m \ge 2p - 1$. An outline of the proof follows. Consider any idle input/output pair $(u, v)$, where $u$ is an input of network $x_0$ in the 1st stage and $v$ is an output of network $x_0'$ in the 3rd stage. Since each 1st-stage network has $p$ inputs, at most $p - 1$ networks of the 2nd stage are used by connections from the inputs of network $x_0$ other than $u$. Likewise, at most $p - 1$ networks of the 2nd stage are used by connections to the outputs of network $x_0'$ other than $v$. With $m \ge 2p - 1$ networks in the 2nd stage and at most $2p - 2$ of them in use by these connections, at least one 2nd-stage network is not used by any existing connection from network $x_0$ or to network $x_0'$. Therefore, $(u, v)$ can be connected via that network of the 2nd stage.

For $N = p^2$, taking $m = 2p - 1$, a non-blocking Clos network can be realized with $O(N^{3/2})$ switches.

Reduce switches by increasing stages

The number of switches can be further reduced by increasing the number of stages with a recursive construction. Let $N = p^{r+1}$ ($r \ge 1$) and $m = 2p - 1$. The construction starts from a 3-stage network with
$k = p^r$ crossbars of size $p \times m$ in the 1st stage,
$m$ crossbars of size $p^r \times p^r$ in the 2nd stage, and
$p^r$ crossbars of size $m \times p$ in the 3rd stage.

Recursively realize the middle-stage crossbars until each crossbar in the 2nd stage becomes a $p \times p$ network. The number of switches in this construction is $O(N^{1+1/(r+1)})$.
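The saving of the three-stage Clos construction over a single crossbar can be checked with a little arithmetic. The sketch below counts cross points, under the assumption that each crossbar of size a x b contributes a*b cross points; the function names are chosen here for illustration.

```python
def clos_crosspoints(k, p, m):
    """Cross points of a three-stage Clos network:
    k crossbars of p x m, m crossbars of k x k, k crossbars of m x p."""
    return k * p * m + m * k * k + k * m * p


def strict_sense_clos(p):
    """N = p^2 inputs, so k = p, with the minimum m = 2p - 1 middle crossbars."""
    return clos_crosspoints(k=p, p=p, m=2 * p - 1)


for p in (4, 32):
    N = p * p
    print(N, N * N, strict_sense_clos(p))
```

With k = p the count simplifies to 3p^2(2p - 1), roughly 6N^{3/2}. For N = 16 this is 336 cross points against 256 for the crossbar, so the Clos network only pays off for larger N; at N = 1024 it needs 193536 cross points against 1048576.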
Beneš Networks

For $N = 2^n$, taking $p = m = 2$, there are
$N/2$ switches of size $2 \times 2$ in the 1st stage,
2 networks of size $N/2 \times N/2$ in the 2nd stage, and
$N/2$ switches of size $2 \times 2$ in the 3rd stage.

Recursively realizing the networks in the 2nd stage, we get a network of $2\log_2 N - 1$ stages, which is known as the n-dimensional Beneš network and has $O(N \log_2 N)$ switches of size $2 \times 2$. The Beneš network is rearrangeable non-blocking.

A strict-sense non-blocking network, known as the Cantor network, can be constructed from $n = \log_2 N$ copies of the Beneš network as follows. At the input stage, there are $N$ switches of size $1 \times n$. In the middle stage, there are $n$ copies of the Beneš network. At the output stage, there are $N$ switches of size $n \times 1$.
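The stage and switch counts of the Beneš network follow directly from the recursion above. As a small sketch (the function names are hypothetical), the recursion can be evaluated and compared with the closed form of $2\log_2 N - 1$ stages with $N/2$ switches each:

```python
def benes_stages(N):
    """Stages of the N x N Benes network, N a power of 2."""
    if N == 2:
        return 1                      # a single 2 x 2 switch
    return 2 + benes_stages(N // 2)   # outer stages plus the recursive middle


def benes_switches(N):
    """Number of 2 x 2 switches: N/2 + 2 * B(N/2) + N/2, with B(2) = 1."""
    if N == 2:
        return 1
    return N + 2 * benes_switches(N // 2)


for N in (2, 4, 8, 1024):
    log_n = N.bit_length() - 1        # log2(N) for a power of 2
    print(N, benes_stages(N), benes_switches(N), (2 * log_n - 1) * N // 2)
```

For N = 8 this gives 5 stages and 20 switches, compared with 64 cross points in an 8 x 8 crossbar. The price of the smaller count is that the network is only rearrangeable non-blocking.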

Network equivalence

Topological equivalence: two networks are topologically equivalent if one can be obtained from the other by relabeling switches and/or inputs/outputs of switches.

Functional equivalence: two networks are functionally equivalent if they realize the same set of permutations.

Routing on direct networks

A direct network consists of a set of nodes, each connected to some other nodes (its neighbors) by point-to-point links. Such a network is also known as a point-to-point network or message-passing network. The term is usually used for parallel/distributed computing systems. Communication in the network is realized by message passing. Usually a message is partitioned into packets. A packet consists of a header and a data area. The header carries routing and sequencing information. Usually each node in the network performs both computing and routing. In most systems, each node has a dedicated router to free the CPU from routing. Each router can usually realize non-blocking routing from its inputs to its outputs.

Communication time

A common metric for communication time is the communication latency, which has three parts:

Start-up latency: the time to handle a packet at the source and destination nodes. It depends on the protocols and the internal architectures of the nodes.

Network latency: the time between a packet leaving the source and arriving at the destination, assuming that there is no contention during the transmission. It depends on the topology of the network, the channel capacity, the switching technique, the routing algorithm, and the structure of the routers.

Blocking time: the delay caused by contention. Contention can happen on links and in routers. The blocking time depends on the dynamic behavior of the network. Queueing theory is a main methodology for analyzing the average blocking time.

The main factors affecting the communication latency include the topology, routing, flow control, and switching.
Topology

The topology refers to how nodes are interconnected by communication links (channels). Ideally, any two nodes are directly connected by a link; the graphs corresponding to such networks are complete. However, due to hardware constraints and cost, it is not feasible to build large complete networks. Much research has been done on network topologies. Some basic desirable properties of topologies include small diameter, small node degree, multiple disjoint paths between nodes, symmetry, and regularity. Some properties conflict with each other, such as diameter and node degree. Additional properties related to the performance of the network include the following.

Orthogonality: provides the basis for dimension-based routing, which is easy to realize.

Bisection width: the minimum number of links that must be removed to partition the network into two equal halves. This property is important to the fault tolerance of the network.

Channel bandwidth: the data rate of a link. It is the product of the channel width and the channel rate, where the channel width is the number of bits that can be transmitted in parallel on a link (the number of lines) and the channel rate is the peak transfer rate of bits on a single line. The channel bandwidth determines the performance of the link.

Routing

Routing selects a path for delivering messages from a source to a destination. The following strategies have been used for routing.

Source routing: the source node determines the routing path, which is fixed. The packet must carry the information on the routing path.

Distributed routing: each router decides the neighbor node to which it sends the message, without global information about the network. The algorithm used by the routers should be fast and easy to implement.

Deterministic routing: a unique routing path is used for each source and destination pair. The path does not depend on the dynamic state of the network. This is also known as oblivious routing.

Adaptive routing: the routing path for a source and destination pair may not be unique. The path depends on the dynamic state of the network.

Minimal routing: the routing path is always a shortest path.

We will study routing in more detail later.

Switching in direct networks

Both circuit switching and packet switching have been used in direct networks.
Circuit switching

In circuit-switching networks, a dedicated path is set up to connect a source and destination pair. The message is transferred along the path without buffering. The communication latency consists of the time for path set-up and the delay in transferring the message. In most circuit-switching parallel computing systems, the path is set up by passing a probe (control) packet from the source to the destination. Let $T$ be the communication latency, $L_c$ the length of the control packet, $L$ the length of the message, $B$ the channel bandwidth, and $d$ the length (the number of channels) of the path. Then

$T = d\,\frac{L_c}{B} + \frac{L}{B}$.

If $L_c \ll L$, the latency can be considered independent of $d$.

Circuit switching is not efficient for bursty traffic. For short messages, if a circuit is released after each message, the time spent on path set-up is excessive. If a circuit is not released, the channels in the circuit sit idle, resulting in low utilization. Another problem is that one busy channel can block a whole circuit.

Packet switching

In packet-switching networks, the path for transmitting messages from a source to a destination need not be dedicated or unique, and messages are transmitted with possible buffering at the intermediate routers on the path. This overcomes some of the problems of circuit-switching networks. We discuss a number of variants of packet-switching techniques.

Message-switching

In message-switching networks, messages are routed dynamically using routing algorithms. Usually the algorithms are distributed and adaptive. Messages are transmitted in a store-and-forward way: a message is stored in a buffer at each intermediate node, and the whole message is stored before it is forwarded to the next node. Each message has a header that contains routing information such as the source and destination addresses. A link (channel) can be shared by messages of multiple connections. Let $\beta$ be the start-up time for transmitting a message at a node. The communication latency for message-switching networks is

$T = d\left(\beta + \frac{L}{B}\right)$.
For a network with diameter $D$, the worst-case lower bound for the latency is $D\left(\beta + \frac{L}{B}\right)$. Problems with message-switching networks include the large buffers required at routers (because a buffer must be able to store messages of different sizes) and a latency proportional to the distance from the source to the destination.
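The two latency formulas above (circuit switching and store-and-forward message switching) can be compared numerically. The following is a minimal Python sketch, not part of the original notes; the function names and the sample parameter values are hypothetical and chosen only for illustration.

```python
def circuit_switching_latency(d, L_c, L, B):
    """Latency T = d * L_c / B + L / B for circuit switching.

    d   : number of channels on the path
    L_c : length of the control (probe) packet, in bits
    L   : length of the message, in bits
    B   : channel bandwidth, in bits per second
    """
    return d * L_c / B + L / B


def message_switching_latency(d, L, B, beta):
    """Latency T = d * (beta + L / B) for store-and-forward message switching.

    beta : start-up time for transmitting a message at a node, in seconds
    """
    return d * (beta + L / B)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    B = 1e9        # 1 Gbit/s channel
    L = 8000       # 8000-bit message
    L_c = 64       # 64-bit probe packet
    beta = 1e-6    # 1 microsecond start-up time per node

    for d in (1, 4, 16):
        t_circuit = circuit_switching_latency(d, L_c, L, B)
        t_message = message_switching_latency(d, L, B, beta)
        print(f"d={d:2d}  circuit={t_circuit:.3e} s  message={t_message:.3e} s")
```

With $L_c \ll L$, the circuit-switching latency changes little as $d$ grows, while the message-switching latency grows linearly with $d$, which is the contrast the formulas express.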

Packet-switching

In packet-switching networks, a message is partitioned into packets (usually of fixed size), and each packet has a header containing routing information. Each packet is routed independently in a store-and-forward way. This allows two types of parallelism in routing. One is that several links on the same path are used simultaneously for transferring packets (the packets are pipelined), and the other is that packets can be routed on multiple paths from the source to the destination. Because the packets have the same size, it is easier to control the buffer size at routers. Packet-switching networks have better resource utilization than message-switching networks, but they duplicate headers and incur more overhead at routers. Let $P$ be the size of a packet. Then the latency for the first packet is $d\left(\beta + \frac{P}{B}\right)$. If the packets are pipelined on the links of the path, the communication latency for the message is

$T = \left(d + \frac{L}{P} - 1\right)\left(\beta + \frac{P}{B}\right)$,

where $L/P$ is the number of packets. For a network with diameter $D$ and cut-width $C$ (the number of edges whose removal separates the network), the worst-case lower bound for the latency is

$\max\left\{D\beta, \frac{L}{BC}\right\}$.

Virtual cut-through

Virtual cut-through switching can be considered a mixture of circuit-switching and packet-switching. The idea is that when the next channel is available at an intermediate router, the router sends the received part of a packet to that channel before the entire packet has been stored. A router buffers an entire packet only when the output channel is busy. In the worst case, the latency of the network is similar to that of a packet-switched network. In the best case, the latency is similar to that of a circuit-switched network. In the ideal case, the latency is

$T = d\beta + \frac{L}{B}$.

Wormhole routing

In wormhole-switching networks, a packet is further partitioned into smaller units (flits). The first flit(s) contain the header and the other flits contain data.
The routing decision is based on the header flits, and the data flits follow the route of the header in a pipelined fashion (the router does not buffer data flits). Since data flits do not contain routing information, a contiguous path of channels is needed. When a header is blocked, all flits stop advancing and remain in the channels. This is different from virtual cut-through switching, where the data are moved from the channels into the buffers of the router. The latency of the network is

$T = d\beta + \frac{L}{B}$

if there is no contention. The advantages of wormhole networks are small latency (almost independent of $d$ when there is no contention) and very small buffers (just FIFO flit buffers). Problems include long blocking times when the network is congested and deadlocks in routing. A deadlock may look like this: packet A holds some resources while requesting others held by packet B, which in turn is requesting the resources held by A. The resources are buffers in store-and-forward and virtual cut-through networks, and channels in circuit-switching and wormhole networks. Much work has been done on preventing and avoiding deadlocks. The details are omitted here.

Virtual channels

In virtual channel (VC) switching, a message or packet is transmitted through a virtual path. A virtual path is not dedicated to one connection but can be shared by multiple connections. As in circuit-switching, there are three phases in VC switching: connection establishment, transmission, and connection termination. The advantages of virtual channel switching are a latency similar to that of circuit-switching networks and better resource utilization. The disadvantages are increased scheduling complexity and the fact that a physical channel shared by multiple virtual connections can become a bottleneck, which increases the latency.

Routing Strategies

Deterministic routing

For orthogonal networks, an efficient approach is dimension-ordered routing. The routing path consists of a sequence of links in a specific dimension order. The intermediate routers can easily compute the next output link from the source and destination addresses in the message. An example of dimension-ordered routing is E-cube routing on the hypercube network: a packet contains the destination address $d \in \{0, 1\}^n$.
When the packet arrives at a node with label $v \in \{0, 1\}^n$, the node computes $d \oplus v$, where $\oplus$ is the bitwise binary sum (exclusive OR), and uses the link on the dimension corresponding to the rightmost (least significant) 1 in $d \oplus v$ as the output link (if $d \oplus v = 0$, the packet has arrived at the destination $d$). Another example is mesh XY routing. In this approach, a packet is first routed in the X direction to the correct column and then in the Y direction to the correct row.

Adaptive routing

This routing strategy chooses routing paths based on the current conditions of the network. There are a number of approaches. Details are omitted.

Table look-up routing

This is a distributed routing strategy. Each node keeps a table that indicates which outgoing link to use for each destination. The Internet uses this approach. Key issues for this approach include obtaining the information needed to update the routing table and reducing the table size.
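The two dimension-ordered schemes described under deterministic routing (E-cube routing on the hypercube and mesh XY routing) can be illustrated with a minimal Python sketch. It is not taken from the original notes. The function names are hypothetical, node labels are assumed to be non-negative integers whose bits are the hypercube address, and mesh nodes are assumed to be (x, y) coordinate pairs.

```python
def ecube_next_dimension(v, d):
    """Return the dimension of the output link at node v for destination d.

    Computes d XOR v and picks the position of the rightmost
    (least significant) 1 bit. Returns None if v is the destination.
    """
    diff = d ^ v
    if diff == 0:
        return None
    # diff & -diff isolates the least significant 1 bit of diff.
    return (diff & -diff).bit_length() - 1


def ecube_path(src, dst):
    """List the nodes visited by E-cube routing from src to dst."""
    path = [src]
    v = src
    while True:
        dim = ecube_next_dimension(v, dst)
        if dim is None:
            return path
        v ^= 1 << dim          # traverse the link on dimension dim
        path.append(v)


def mesh_xy_path(src, dst):
    """List the nodes visited by XY routing on a 2D mesh.

    The packet first moves along X until the x coordinate matches,
    then along Y until the y coordinate matches.
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    # 3-dimensional hypercube: route from node 000 to node 101.
    print([format(node, "03b") for node in ecube_path(0b000, 0b101)])
    # 2D mesh: route from (0, 0) to (2, 1).
    print(mesh_xy_path((0, 0), (2, 1)))
```

In both functions the next hop depends only on the current node and the destination, so each router can make its decision locally, and the path for a given source and destination pair is always the same.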