Simulation Modelling Practice and Theory

Size: px
Start display at page:

Download "Simulation Modelling Practice and Theory"

Transcription

1 Simulation Modelling Practice and Theory 7 (9) 5 5 Contents lists available at ScienceDirect Simulation Modelling Practice and Theory journal homepage: Adaptive routing in wormhole-switched necklace-cubes: Analytical modelling and performance comparison Sina Meraji, Hamid Sarbazi-Azad * Department of Computer Engineering, Sharif University of Technology, Tehran, Iran School of Computer Science, Institute for Research in Fundamental Sciences, Tehran, Iran article info abstract Article history: Available online June 9 Keywords: Interconnecting network Hypercube Necklace hypercube Adaptive routing Wormhole switching Performance evaluation Analytical modelling The necklace hypercube has recently been introduced as an attractive alternative to the well-known hypercube. Previous research on this network topology has mainly focused on topological properties, VLSI and algorithmic aspects of this network. Several analytical models have been proposed in the literature for different interconnection networks, as the most cost-effective tools to evaluate the performance merits of such systems. This paper proposes an analytical performance model to predict message latency in wormholeswitched necklace hypercube interconnection networks with fully adaptive routing. The analysis focuses on a fully adaptive routing algorithm which has been shown to be the most effective for necklace hypercube networks. The results obtained from simulation experiments confirm that the proposed model exhibits a good accuracy under different operating conditions. Ó 9 Elsevier B.V. All rights reserved.. Introduction A large number of interconnection networks have been proposed and studied for highly parallel distributed-memory multicomputers [6]. Among them the hypercube has been one of the most famous ones which has many desirable properties such as logarithmic diameter and fault-tolerance. It is not, however, scalable from hardware cost point of view, i.e. when adding some few nodes to it, we have to duplicate the network size to reach the next specified network size. Other drawback with hypercubes was reported by Patel et al. [6] when considering VLSI layout. They showed that the minimum number of tracks for VLSI layout of an n-cube (n-dimensional hypercube) using a one-dimensional implementation has an order of network size. Their investigation revealed that for an example network of k processors (a -cube), at least 687 tracks are required for VLSI implementation. The necklace hypercube, introduced in [9], is a new interconnection network based on the hypercube network. While preserving most of properties of the hypercube, it has some other desirable properties such as hardware scalability and efficient VLSI layout that make it more attractive than an equivalent hypercube network [9]. There are three main approaches for performance evaluation of interconnection networks. The first one is monitoring the behaviour of the actual system; it can capture the effects of low-level design choices, but restricts experimentation with different router policies since it can be prohibitively expensive and time-consuming to change these features. Simulation is the second approach for performance evaluation of interconnection network. We may implement different routing algorithms, different switching methods, and different interconnection topologies with simulation environments, but simulation is time consuming especially when we study large networks. The last way is to using mathematical approaches for performance analysis of interconnection networks. Mathematical models are cost-effective and versatile tools for evaluating system * Corresponding author. address: azad@mail.ipm.ir (H. Sarbazi-Azad) X/$ - see front matter Ó 9 Elsevier B.V. All rights reserved. doi:.6/j.simpat.9.6.8

2 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) performance under different design alternatives. The significant advantage of analytical models over simulation is that they can be used to obtain performance results for large systems and their behaviour under network configurations and working conditions which may not be feasible to study using simulation on conventional computers due to the excessive computation and memory demands. Several researchers have recently proposed analytical models of popular interconnection networks, e.g. k-ary n-cubes, tori, hypercubes, and meshes [ 5,7,8,,7]. The most difficult part in developing any analytical model of adaptive routing is the computation of the probability of message blocking at a given router due to the number of combinations that have to be considered when enumerating the number of paths that a message may have used to reach its current position in the network. Almost all studies on necklace hypercube interconnection networks focus on topological properties and algorithmic issues. There has been hardly any study on performance evaluation of such networks and no analytical model proposed for necklace hypercubes. In this paper, we discuss performance issues of necklace hypercube graphs by introducing a reasonably accurate mathematical model to predict the average message latency in wormhole necklace hypercubes using a high-performance routing algorithm proposed in [,]. An early version of this paper appeared in []. The rest of this paper is organized as follows. In Section, the structure of the necklace hyeprcube is described. In Section, adaptive wormhole routing in the necklace hypercube is discussed. Section proposes a mathematical performance model for adaptive routing in wormhole necklace hypercube. Validation of the proposed performance model is realized in Section 5 using results obtained from simulation experiments. In Section 6, using the proposed analytical the performance of necklace hypercubes is compared to equivalent hypercube under different implementation constraints. Finally, Section 7 concludes the paper.. The necklace hypercube graph The necklace hypercube is an undirected graph that is based on a hypercube by appending a necklace of processors (or nodes, interchangeably) to each edge. That is, besides connecting adjacent nodes (according to the base hypercube topology), we connect i-th dimension neighbors by an array of nodes, as a necklace. The necklace length may be fixed or variable for different edge necklaces. In the former, there are fixed number of nodes between each two adjacent neighbors on a necklace. The network is called regular necklace hypercube, and can be defined as ðn; kþ-rnh, where n is the number of dimensions and k is the necklace size. With fixed length necklaces, however, scaling up the network is limited to specified network sizes indicated by n and k as n ðnk þ Þ. In the latter form of necklace hypercubes, named as irregular necklace hypercube, each necklace associated to channel i in the base hypercube, contains k i nodes. Hence, the network can be defined by a vector of n n elements, i.e. k ¼ðk ; k ;...; k n n Þ. Such a network, denoted as ðk ; k ;...; k n n Þ INH, has excellent scalability with theoretically no limitations to scale. The network size for a ðk ; k ;...; k n n Þ INH is given as n þ P n n i¼ k i. The second factor in network size expression (the sigma part) implies the scalability property as we can have any desirable value for k i ; 6 i 6 n n ; in the network. Fig. a and b shows some examples of regular and irregular necklace hypercubes. Dark nodes are the nodes in the base hypercube and grey nodes are the necklace nodes. The index of each necklace node in its necklace is shown inside the node. Edge number of each edge in the base hypercube is shown inside a grey rectangle which is based on the i th dimension edge of smaller base vertex. Let us now give a more formal definition of the necklace hypercube. () () 7 () () 6 () () () 7 () 6 9 () () () 5 8 () (a) 9 () () () 5 8 () (b) Fig.. Examples of necklace hypercubes: (a) A regular necklace hypercube with n = and k =. (b) An irregular necklace hypercube.

3 5 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) 5 5 Definition. A necklace hypercube is an undirected graph G ¼ ðv; EÞ, where u V is defined as u ¼ ðb; d; iþ with b (denoting the Base Vertex) being the address of hypercubic node (the one with smaller address in its dimension, d 6 d 6 nþ; d is the dimension number of the necklace (containing the node), and i (as the index) is the index of the node in its necklace. Note that d for the hypercube vertices (or base vertices) is zero. For simplicity, we index the nodes of a necklace from zero (zero for the base vertex) up to the number of nodes on the necklace. A ðn; kþ-rnh has n þðn n Þk nodes ( n nodes of degree n and n n k nodes of degree ), n n þ n n ðk þ Þ edges, and a diameter of n þ k. The bisection width of the necklace hypercube is twice that of its base hypercube. As the bisection width of an n-dimensional hypercube is n, the bisection width of the necklace hypercube based on an n-dimensional hypercube is then n [9]. One of the best features of necklace networks is their efficient VLSI layouts. According to the result given in [9], the number of wiring tracks sufficient and necessary for the single-row wiring layout of the n-dimensional hypercube (or n-cube), Q n, when the numeric node order is used is given by [9]: t num ðq n Þ¼ ð=þ n þ bn=c. For the ðn; kþ-rnh, in general, we can extend this expression. Note that each necklace can be placed in just two extra tracks. So, the number of wiring tracks sufficient and necessary for the -dimensional wiring layout of necklace hypercube ðn; kþ-rnh [9], when the numeric node order is used is t num ððn; kþ-rnhþ ¼ ð=þ n þ bn=c that is independent of k [9]. Hence, the VLSI layout for a (,)-RNH requires six tracks only. Note that (,)-RNH contains nodes and the corresponding hypercube (-cube as the nearest size) needs tracks, i.e. is twice the number of tracks needed for its equivalent necklace network.. Adaptive wormhole routing in necklace hypercubes Deterministic routing algorithm for the necklace hypercube was already proposed in []. It can be used with both packet switching and wormhole switching techniques. To define a fully adaptive routing algorithm, we must first define a deadlockfree routing (with any level of adaptivity). We can then use the deadlock-free routing algorithm as described in [] to construct a deadlock-free fully adaptive routing algorithm. In order to have a deadlock-free routing algorithm in the necklace hypercube, we use two classes of virtual channels: class A and class B. Virtual channels of class A are used when the message is at the source necklace; we use virtual channels of this class until we reach the first base vertex of the source necklace. Virtual channels of class B are used in the destination necklace; it means that when we enter the destination necklace, we use this class of virtual channels until we reach to the destination node. Each of these classes can be used for a deadlock-free routing algorithm in the base hypercube, e.g. e-cube routing. The minimum number of virtual channels in each class is one. Thus, we need at least two virtual channels per physical channel to implement a deadlock-free routing algorithm in necklace hypercubes. According to Duato s methodology, since the base deadlock-free routing algorithm requires two virtual channels, we can have a fully adaptive deadlock-free routing algorithm in the necklace hypercube using at least three virtual channels, two of which used by the base routing algorithm and the remaining one used in any possible way that can bring the message closer to its destination. We call the virtual channels used for the base deadlock-free routing base virtual channels and the remaining virtual channels as adaptive virtual channels. When there are more than three virtual channels per physical channel, the network performance is maximized when the extra virtual channels are used as adaptive virtual channels. Thus, with V virtual channels per physical channel, the best performance is achieved when we have V- adaptive virtual channels and two base virtual channels. This is of course for routing in necklaces; in the base hypercube we can use Duato s fully adaptive routing algorithm [6] which uses V- adaptive virtual channels and virtual channel for e-cube deterministic routing. Fig. shows the routing algorithm between nodes u ¼ðb ; d ; i Þ and v ¼ðb ; d ; i Þ in pseudo code. Note that the algorithm contains three main steps:. Move towards the nearest base neighbor using any of the V- adaptive virtual channels. If all V- adaptive virtual channels are busy use the virtual channel of class A from base virtual channels to move toward the nearest base neighbor in the source necklace.. Move towards the nearest neighbor of the destination using fully adaptive routing algorithm in the hypercube with V- virtual channels []. If all V- virtual channels are busy use the remaining virtual channel with e-cube routing in the base hypercube.. Now the current node is one of the base vertices of the destination necklace. Move towards the destination node using any of the V- adaptive virtual channels. If all V- virtual channels are busy use the virtual channel of class B from base virtual channels to move toward the destination node along the destination necklace.. The analytical model In this section, we derive an analytical performance model for wormhole adaptive routing in a necklace hypercube. Our analysis focuses on the routing algorithm which was introduced in previous section but the modelling approach used here can be equally applied for other routing schemes.

4 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) Routing (u, v): // u = ( b, d, i) and v = ( b, d, i) are the source and destination. u = b, u = b Location ( d); /* Location is a function that calculates the difference bit for the other extreme base vertex of a necklace. */ v = b, v = b Location( d ); CC = min( difference ( i, u), difference( i, u)) ; If there is any free virtual channel vc in V- adaptive virtual channels then set vc = vc else set vc = vc (class A base virtual channels); Send message through physical channel CC and virtual channel vc; Set the current node C; HD = H( C, v ), HD = H( C, v) ; // H(x,y) gives the Hamming distance between x and y If ( HD < HD) then Goto v using Duato fully adaptive routing algorithm else Goto v using Duato fully adaptive routing algorithm If there is any free virtual channel vc in V- adaptive virtual channels then set vc = vc else set vc = vc (class B virtual channels); Send message through physical channel i and virtual channel vc; End. Fig.. Fully adaptive routing algorithm in the necklace hypercube. The measure of interest in our model is the average message latency as a representative for network performance. The following assumptions are made when developing the proposed performance model. These assumptions have been widely used in similar modelling studies [ 5,7,8,,7,8]. (a) Messages are broken into some packet of fixed length of M flits. The flit transfer time between any two neighbouring nodes is assumed to be equal to one cycle. (b) Message destinations are uniformly distributed across the network nodes. (c) Nodes generate traffic independently of each other, which follow a Poisson process, with a mean rate of k g messages/ cycle. (d) Messages are transferred to the local processor through the ejection channel once they arrive at their destination. (e) V virtual channels per physical channel are used. These virtual channels are used according to the routing algorithm described in the previous section. According to the proposed fully adaptive routing algorithm, a message must cross three sub-networks from the source node to reach the destination node. At first, it moves from the source node to the nearest base node; then moves to the nearest base node of the destination node in the base hypercube; and finally crosses the destination necklace to reach the destination node. Therefore, we have message latencies: the mean message latency at the source necklace, Latency N, the mean message latency at the base hypercube, Latency H, and finally the mean message latency at the destination necklace, Latency N. Hence, the total mean message latency can be predicted as: Latency ¼ Latency H þ Latency N In order to compute the mean message latency we must consider three parameters: the mean network latency, S, that is the time to cross the network, the mean waiting time seen by a message in the source node to be injected into the network, W s. To model the effect of virtual channels multiplexing, the mean message latency is then scaled by a factor, V, representing the average degree of virtual channels multiplexing that takes place at a given physical channel []. Therefore, the mean message latency in each sub-network can be approximated as ðþ Latency ¼ðS þ W s ÞV: The average number of hops that a message makes across the necklace sub-network, d N, is given by d N ¼ k b X k=c i¼ i A value of dk=e must be added d k N, if the necklace length is odd. The average number of hops that a message makes across the base cube network, d H, is given by [] n d H ¼ n n ðþ ðþ ðþ

5 56 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) 5 5 Fig. shows a necklace of (,5)-RNH. In this figure all messages generated in node cross edge (,) except those that their destination is node, or. If we consider the messages that are generated in node and node 5 and their corresponding destinations are nodes, and 5 (most crossing edge (,)) the exact message traffic rate over this edge can be expressed as k g N N þ k g N þ k g N ¼ k g ð5þ We can compute the traffic over edge (,) as follows. All messages generated in node cross edge (,) except those that their destination is node, or 5. Also, all messages generated in node, for destinations not being node, or 5, cross this edge. Considering messages generated in node for node, we can write the average traffic rate over this channel as N k g N þ k N g N þ k g N ffi k g ð6þ Using a similar approach, the traffic rate over the channels of a necklace with k nodes can be written as k g ; k g ; k g ;...; k=k g. We have similar results for the edges in the other direction []. Therefore, we can compute the average traffic rate received by each necklace channel, k cn,as k cn ¼ k g P k= i¼ i k= ¼ðk= þ Þ k g The traffic rate over the edge that connects the last necklace node to the base node is k k g. As each base node has n necklaces and the traffic rate generated in a base node itself is k g, we can compute the traffic rate injected to the base hypercube by a base node as ð7þ k b ¼ k g þ n k k g ð8þ Fully adaptive routing in the base hypercube allows a message to use any available channel that brings it closer to its destination node, resulting in almost evenly distributed traffic rate over hypercubic channels. A router in the base hypercube has n output channels. Since each message travels, on average d hops to cross the network, the rate of messages received by each hypercubic channel, k ch, can be expressed as [] k ch ¼ k bn ð9þ ðn Þ Let us follow a typical message which makes d hops to reach at its destination. The average network latency, S, seen by the message crossing from the source to destination node, consists of two parts: one is the time of actual message transmission, and the blocking time in the network. Therefore, S, can be expressed as S ¼ M þ d þ dt b where M is the message length, and T b is the average blocking time seen by the message at each hop. The term T b is given by T b ¼ P block w with P Block being the probability that a message is blocked at the current channel and w is the mean waiting time to acquire a channel in the event of blocking. A message is blocked at a given channel in the necklace sub-network when all adaptive and deterministic virtual channels of the current physical channel are busy. Let P a be the probability that all adaptive virtual channels of a physical channel are busy and P a&d denote the probability that all adaptive and deterministic virtual channels of a physical channel are busy. In necklace sub-networks, we have only one path for the messages, so no adaptivity exists for messages moving across a necklace. In order to compute P a&dn (probability P a&d for necklaces), we must consider two cases: (a) The probability that all of V virtual channels of a physical channel are busy, P V, and (b) the probability that V- virtual channels of the V virtual channels associated to a physical channel are busy. In this case only one combination of V- virtual channels of the total V virtual channels can result in blocking. ðþ ðþ Fig.. A (,5)-RNH network.

6 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) So, the probability that all adaptive and deterministic virtual channels of a physical channel are busy can be expressed as P a&dn ¼ P V þ P V V V In order to compute the probability of blocking, P block, we average over all the probabilities that a message may be blocked crossing d hops in the network as P blockn ¼ d N Xd N P a&dn i¼ A message is blocked at a given channel in the hypercube network when all the adaptive virtual channels of the remaining dimensions to be visited and also the deterministic virtual channel of the current dimension are busy. We can then write P ah ; P a&dh and P blockh as []: P ah ¼ P V þ P V ðþ V V P a&dh ¼ P V ð5þ P blockh ¼ d XH dh i P a&dh ð6þ d H i¼ P ah where d H is the average number of hops that a message might take in the hypercube sub-network. To determine the mean waiting time, w, to acquire a virtual channel when a message is blocked, a physical channel is treated as an M/G/ queue with a mean waiting time of [9] qs þ C S w ¼ ð7þ ð qþ q ¼ k c S ð8þ C ¼ r S ð9þ S S where k c is the traffic rate on the channel (given by Eqs. (7) or (9) for necklace or hypercubic channels), is its service time calculated by Eq. (), and is the variance of service time distribution. Since the minimum service time at a channel is equal to the message length, M, following a suggestion given in [5], the variance of the service time distribution can be approximated as. Hence, the mean waiting time becomes ðþ ðþ w ¼ k cs ð þð M=SÞ Þ ð k c SÞ ðþ Similarly, modelling the local queue in the source node as an M/G/ queue, with the mean arrival rate k g =V and service time S with an approximated variance ðs MÞ, yields the mean waiting time seen by a message at the source node as [] k g v W s ¼ S ð þð M=SÞ Þ ð kg SÞ ðþ V The probability, P v, that v virtual channels are busy at a physical channel can be determined using a Markovian model. Let state p v ð 6 v 6 VÞ correspond to v virtual channels being busy. The transition rate out of state p v to state p v þ is the traffic rate k c (given by Eqs. (7) and (9)) while the rate out of state p v to state p v is ( is given by Eq. ()). The transition rates out of state p v are reduced by k c to account for the arrival of messages while a channel is in this state. The Markovian model results in the following steady state probability [], in which the service time of a channel has been approximated as the network latency of that channel, as ( P v ¼ ð k csþðk c SÞ v ; 6 v < V ðþ ðk c SÞ v ; v ¼ V: When multiple virtual channels are used per physical channel, they share the physical bandwidth in a time-multiplexed manner. The average degree of multiplexing of virtual channels, that takes place at a given physical channel, can then be estimated by [] P V v¼ V ¼ v p v P V v¼ vp : ðþ v

7 58 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) 5 5 The above equations reveal that there are several inter-dependencies between the different variables of the model. For instance, Eqs. () and () reveal that S is a function of w while Eq. (7) shows that w is a function of S. Given that closed-form solutions to such inter-dependencies are very difficult to determine, the different variables of the model are computed using an iterative technique. Fig. shows different steps of the iterative algorithm. 5. Validation of the model The proposed analytical model has been validated through a discrete-event simulator (Xmulator [5]) that mimics the behaviour of the described routing algorithms in the necklace hypercube at flit level. The simulator uses the same assumptions as the analysis, and some of these assumptions are detailed here with a view to making the network operation clearer. The network cycle time is defined as the transmission time of a single flit from one router to the next. Messages are generated at each node according to a Poisson process with a mean inter-arrival rate of k g messages/cycle. Message length is fixed at M flits. Destination nodes are determined using a uniform random number generator. The mean message latency is defined as the mean amount of time from the generation of a message until the last data flit reaches the local processor at the destination node. The other measures include the mean network latency, the time taken to cross the network, the mean queuing time at the source node, and the time spent at the local queue before entering the first network channel. Numerous validation experiments have been performed for several combinations of network sizes, message lengths, and number of virtual channels to validate the model. Figs. 5 8 depict latency results predicted by the model explained in the previous section plotted against those provided by the simulator for different scenarios. The horizontal axis in the figure shows the traffic generation rate at each node while the vertical axis shows the mean message latency. In Fig. 5, we consider two very large networks with more than,. We consider virtual channels per physical channel, and two different message lengths of M = and 6 flits (larger message size, e.g. 8 flits, took the network to saturation state for small values of traffic generation rates). In Fig. 6, we consider two large networks (more than nodes) with V = 8 virtual channels per physical channel, and three different message lengths of M =, 6 and 8 flits. Fig. 7 represents the same results for medium sized networks (about 56 nodes); here, we use V = 6 virtual channels per physical channel and message length M =, 6 and 8 flits. Finally, in Fig. 8, we have shown a comparison between the results given by the model and those gathered from the simulation experiments for two small networks (about nodes); here the number of virtual channels is V = per physical channel and message length is M =, 6 and 8 flits. The figures reveal that in all cases the analytical model can predict the mean message latency with a good degree of accuracy in the steady-state regions. Moreover, model predictions are still good even when the network operates in the heavy traffic region, and when it starts to approach the saturation region. However, some discrepancies around the saturation point are apparent. These can be accounted for by the approximations made to ease the derivation of different variables of the model, e.g. the approximation made to estimate the variance of the service time distribution at a channel. Such an approximation greatly simplifies the model as it allows us to avoid computing the exact distribution of the message service time at a given channel, which is not a straightforward task due to inter-dependencies between service times at successive channels as wormhole routing relies on a blocking mechanism for flow control. 6. Analytical performance comparison of hypercubes and necklace hypercubes In this section, we compare the performance of the necklace hypercube with its equivalent hypercube network. We use Boura s model [] to predict hypercube s behaviour and the proposed model here to predict the performance of equivalkent Iterative algorithm for solving the proposed model Procedure: for λ g =. until saturation point, do: Step ) Let S be initialized to M. Step ) Compute λ C N, λ C H, and w using equations (7), (9), and (). Step ) Compute P v using equation () and V using equation (). Step 5) Compute new S using equations () through (). If it is different from the old value by grater than ε (a predefine error limit) then go to step. Step 6) Compute W s, V, S and at last Latency using equations (), (), (), and (). Step 7) λg = λg + Step _ Rate End of procedure. Fig.. The pseudo code of iterative approach to solving model parameters.

8 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) Simul Simul Model Simul6 Model6 8 6 Model Simull6 Model E Fig. 5. The average message latency predicted by the model against simulation results when virtual channels are used per physical channel and message length is M =, 6 flits, in (,6)-RNH (left) and (,)-RNH (right) networks. Simul Simul Model Model 6 Simul6 Model6 5 Simul6 Model6 5 Simul8 Model8 Simul8 Series Fig. 6. The average message latency predicted by the model against simulation results when eight virtual channels are used per physical channel and message length is M =, 6, 8 flits, in (7,8)-RNH (left) and (7,)-RNH (right) networks. necklace hypercubes. We first compare the networks with equal number of nodes without considering any technological or implementation constraint. To do so, we consider a -dimensional hypercube and a (7, )-RNH, both with nodes. Fig. 9a shows the average message latency of these networks when message size is fixed at M = 6 flits and number of virtual channels per physical channel is V = 8. As can be seen in the figure, the hypercube performs better than the equivalent necklace network. One reason is the excessive number of channels existing in the hypercube which enables the network to tolerate higher traffic rates. Moreover, the average distance for the ðn; kþ-rnh is sþt, where 8 ðk ÞðkþÞðkþÞ ; if k isodd n ( s ¼ v þ n nþ k ð þ nkþ ; v ¼ 8 ðk þ k þ Þþ ; if n isodd >< ; t ¼ ðþ n 8 ðk ðk ÞðkþÞ þ k þ k mod Þ; if n iseven ; if k iseven n >:

9 5 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) 5 5 Avg total Message latency 5 Simul model simul6 model6 simul8 model8 Avg total Message latency 5 simul model simul6 model6 simul8 model Fig. 7. The average message latency predicted by the model against simulation results when six virtual channels are used per physical channel and message length is M =, 6, 8 flits, in (,8)-RNH (left) and (,)-RNH (right) networks. Simul Simul Model Simul6 5 Simul6 5 Simul8 Model6 Simul8 Model Model Model6 Model Fig. 8. The average message latency predicted by the model against simulation results when four virtual channels are used per physical channel and message length is M =, 6, 8 flits, in (,8)-RNH (left) and (,)-RNH (right) networks. Comparing it to the average distance in a binary n-cube (i.e. n/) reveals a big difference especially when the necklace size, k, is big. When an interconnection network is implemented on a chip or over a board, the bisection bandwidth has the most important effect on the performance. Thus, many studies, e.g. [,8], consider a constant bisection bandwidth when comparing two equivalent networks (with the same number of nodes). According to [], the multiplication of bisection width of the network (the number of edges to be cut in order to halve the network in two equivalent parts) and channel bandwidth is defined to be the bisection bandwidth of the network. The bisection width of the hypercube is equal to n, and the bisection width of the m-dimensional necklace hypercube is m [9]. Assuming two equivalent networks, a hypercube with bisection width of BW H and channel bandwidth of CW H, and a necklace hypercube with bisection width BW NH and channel bandwidth of CW NH, and considering a fixed network bisection bandwidth, we can write BW H CW H ¼ BW NH CW NH Thus, the flit size of the hypercube equals BW NH BW H CW NH. So, considering that the phit size of the necklace hypercube equals its flit size, the transmission time for a flit in the equivalent hypercube will be BW H BW NH times of the flit transmission time in the necklace hypercube. ð5þ

10 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) HC NH HC- NH a b 9 HC NH c Fig. 9. The average message latency predicted by the model for a dimensional hypercube and (7,)-RNH when eight virtual channels are used per physical channel and message length is M = 6, when (a) no constraints is considered, (b) under bisection bandwidth constraint, and (c) under pin-out constraint. Fig. 9b compares the performance of the -dimensional hypercube and (7, )-RNH under constant bisection bandwidth constraint. The message length is fixed at 6 flits and the number of virtual channels per physical channel is V = 8. As can be seen in this figure, the necklace hypercube outperforms the equivalent hypercube under bisection bandwidth constraint. Pin-out constraint also has an important impact on the performance, when an interconnection network is implemented by wiring between chips, boards, and cabinets []. Constant pin-out constraint has been already used by other researchers [,8] for comparing different networks. Pin-out of a node is defined to be the node degree (the number of channels in a node) multiplied by the channel width. Assuming two equivalent networks, a hypercube with node degree P H and channel width of CW H, and a necklace hypercube with bisection width P NH and channel width of CW NH, and considering pin-out constraint, we can write P H CW H ¼ P NH CW NH Using Eq. (6), we can calculate the ratio between the flit transmission time for the hypercube and necklace hypercube. The node degree in an n-dimensional hypercube is n, but there is a problem when we want to determine the pin-out of necklace hypercube. Here, we have two types of nodes: necklace nodes with degree of and the hypercube nodes with degree of n. As the number of nodes with degree is much larger than those with degree n, we can use the approximate value of for node degree. Fig. 9c shows the performance comparison of the -dimensional hypercube and (7, )-RNH under pin-out constraint. As depicted in the figure, the necklace hypercube performs better than the hypercube under pin-out constraint. ð6þ

11 5 S. Meraji, H. Sarbazi-Azad / Simulation Modelling Practice and Theory 7 (9) Conclusion and future work The necklace hypercube network has recently been introduced as an attractive alternative to the well-known hypercube. In this paper, we introduced a mathematical performance model of adaptive wormhole routing in necklace hypercubes and validated it through simulation experiments. We saw that the proposed model manages to achieve a good degree of accuracy while maintaining simplicity, making it a practical evaluation tool that can be used by the researchers in the field to gain insight into the performance behaviour of fully adaptive routing in wormhole-switched necklace hypercube. We then used the proposed analytical model to compare the performance of same-sized necklace hypercubes and hypercubes with and without implementation constraints. The considered implementation constraints were the bisection bandwidth and pin-out. When no implementation constraints were taken into account, the hypercube was shown to behave better than its equivalent necklace hypercube. However, under both bisection bandwidth and pin-out implementation constraints the necklace hypercube outperformed its equivalent hypercube. Our next objective is to analyse the performance of the necklace hypercube under important non-uniform traffic patterns. Examining other types of necklace networks, with different base networks, can also be another line of research for future work. Considering and analyzing irregular necklace networks can be another interesting research topic. Another network which is based on the hypercube and is very similar to necklace hypercube, is the stretched hypercube []. Conducting similar analytical studies for this network can be another line of future research. References [] A. Agarwal, Limits on interconnection network performance, IEEE Transactions on Parallel and Distributed Systems (99) 98. [] Y. Boura, C.R. Das, T.M. Jacob, A performance model for adaptive routing in hypercubes, in: Proceedings of the International Workshop on Parallel Processing, 99, pp. 6. [] B. Ciciani, M. Colajanni, C. Paolucci, An accurate model for the performance analysis of deterministic wormhole routing, in: Proceedings of the th International Parallel Processing Symposium, 997, pp [] W.J. Dally, Virtual channel flow control, IEEE Transactions on Parallel and Distributed Systems (99) 9 5. [5] J.T. Draper, J. Ghosh, A comprehensive analytical model for wormhole routing in multicomputer systems, Journal of Parallel and Distributed Computing (99). [6] J. Duato, S. Yalamanchili, L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, 5. [7] R. Greenberg, L. Guan, Modelling and comparison of wormhole routed mesh and torus networks, in: Proceedings of the 9th IASTED International Conference on Parallel and Distributed Computing and Systems, 997, pp [8] J. Kim, C.R. Das, Hypercube communication delay with wormhole routing, IEEE Transactions on Computers (99) [9] L. Kleinrock, Queueing Systems, vol., John Wiley, 975. [] S. Meraji, H. Sarbazi-Azad, Performance evaluation of deterministic routing in wormhole necklace cubes, in: Proceedings of the ACS/IEEE Conference on Computers and Applications, Amman, Jordan, 7. [] S. Meraji, A. Nayebi, H. Sarbazi-Azad, Performance evaluation of adaptive wormhole routing in necklace hypercubes, in: International Conference on Computing: Theory and Applications, IEEE Press, Kolkata, India, March 5 7, 7. [] S. Meraji, Performance evaluation of necklace hypercube interconnection networks, M.Sc. Thesis, Sharif University of Technology, October 6. [] S. Meraji, H. Sarbazi-Azad, A. Patooghy, Performance modeling of necklace hypercubes, in: 6th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS 7), held in Conjunction with IEEE IPDPS7, Long Beach, CA, USA, 6 March 7. [] H. Hashemi-Najafabadi, H. Sarbazi-Azad, P. Rajabzadeh, An accurate performance model of fully adaptive routing in wormhole-switched D-mesh multicomputers, Microprocessors and Microsystems (7) [5] A. Nayebi, S. Meraji, A. Shamaei, H. Sarbazi-Azad, Xmulator: a listener-based integrated simulation platform for interconnection networks, in: Proceedings of Asia Modelling Symposium, IEEE Press, Thailand, 7. [6] A. Patel, A. Kusalik, C. McCrosky, Area-efficient VLSI layout for binary hypercubes, IEEE Transactions on Computers 9 () [7] H. Sarbazi-Azad, M. Ould-Khaoua, L.M. Mackenzie, An accurate analytical model of adaptive wormhole routing in k-ary n-cube interconnection networks, Performance Evaluation () [8] H. Sarbazi-Azad, Constraint-based performance comparison of multi-dimensional interconnection networks with deterministic and adaptive routing strategies, Journal of Computers and Electrical Engineering () [9] P. Shareghi, H. Sarbazi-Azad, Topological properties of necklace networks, in: IEEE International Symposium on Parallel Architectures, Algorithms and Networks (IEEE I-SPAN), IEEE Computer Society Press, 5, pp. 5. [] P. Shareghi, H. Sarbazi-Azad, The stretched network: properties, routing, and performance, Journal of Information Science and Engineering (8) 6 78.

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

Topological Properties

Topological Properties Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E) Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

Interconnection Networks

Interconnection Networks CMPT765/408 08-1 Interconnection Networks Qianping Gu 1 Interconnection Networks The note is mainly based on Chapters 1, 2, and 4 of Interconnection Networks, An Engineering Approach by J. Duato, S. Yalamanchili,

More information

Load Balancing and Switch Scheduling

Load Balancing and Switch Scheduling EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Lecture 18: Interconnection Networks CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Project deadlines: - Mon, April 2: project proposal: 1-2 page writeup - Fri,

More information

Asynchronous Bypass Channels

Asynchronous Bypass Channels Asynchronous Bypass Channels Improving Performance for Multi-Synchronous NoCs T. Jain, P. Gratz, A. Sprintson, G. Choi, Department of Electrical and Computer Engineering, Texas A&M University, USA Table

More information

Interconnection Network Design

Interconnection Network Design Interconnection Network Design Vida Vukašinović 1 Introduction Parallel computer networks are interesting topic, but they are also difficult to understand in an overall sense. The topological structure

More information

Fault-Tolerant Routing Algorithm for BSN-Hypercube Using Unsafety Vectors

Fault-Tolerant Routing Algorithm for BSN-Hypercube Using Unsafety Vectors Journal of omputational Information Systems 7:2 (2011) 623-630 Available at http://www.jofcis.com Fault-Tolerant Routing Algorithm for BSN-Hypercube Using Unsafety Vectors Wenhong WEI 1,, Yong LI 2 1 School

More information

Load Balancing Between Heterogenous Computing Clusters

Load Balancing Between Heterogenous Computing Clusters Load Balancing Between Heterogenous Computing Clusters Siu-Cheung Chau Dept. of Physics and Computing, Wilfrid Laurier University, Waterloo, Ontario, Canada, N2L 3C5 e-mail: schau@wlu.ca Ada Wai-Chee Fu

More information

Performance of networks containing both MaxNet and SumNet links

Performance of networks containing both MaxNet and SumNet links Performance of networks containing both MaxNet and SumNet links Lachlan L. H. Andrew and Bartek P. Wydrowski Abstract Both MaxNet and SumNet are distributed congestion control architectures suitable for

More information

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility

More information

Parallel Programming

Parallel Programming Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix

More information

A Dynamic Link Allocation Router

A Dynamic Link Allocation Router A Dynamic Link Allocation Router Wei Song and Doug Edwards School of Computer Science, the University of Manchester Oxford Road, Manchester M13 9PL, UK {songw, doug}@cs.man.ac.uk Abstract The connection

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

Preserving Message Integrity in Dynamic Process Migration

Preserving Message Integrity in Dynamic Process Migration Preserving Message Integrity in Dynamic Process Migration E. Heymann, F. Tinetti, E. Luque Universidad Autónoma de Barcelona Departamento de Informática 8193 - Bellaterra, Barcelona, Spain e-mail: e.heymann@cc.uab.es

More information

Load Balancing Mechanisms in Data Center Networks

Load Balancing Mechanisms in Data Center Networks Load Balancing Mechanisms in Data Center Networks Santosh Mahapatra Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 33 {mahapatr,xyuan}@cs.fsu.edu Abstract We consider

More information

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of

More information

COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS

COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS PLAMENKA BOROVSKA, OGNIAN NAKOV, DESISLAVA IVANOVA, KAMEN IVANOV, GEORGI GEORGIEV Computer

More information

PART III. OPS-based wide area networks

PART III. OPS-based wide area networks PART III OPS-based wide area networks Chapter 7 Introduction to the OPS-based wide area network 7.1 State-of-the-art In this thesis, we consider the general switch architecture with full connectivity

More information

TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy

TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy John Jose, K.V. Mahathi, J. Shiva Shankar and Madhu Mutyam PACE Laboratory, Department of Computer Science and Engineering

More information

Load Balancing between Computing Clusters

Load Balancing between Computing Clusters Load Balancing between Computing Clusters Siu-Cheung Chau Dept. of Physics and Computing, Wilfrid Laurier University, Waterloo, Ontario, Canada, NL 3C5 e-mail: schau@wlu.ca Ada Wai-Chee Fu Dept. of Computer

More information

Optical interconnection networks with time slot routing

Optical interconnection networks with time slot routing Theoretical and Applied Informatics ISSN 896 5 Vol. x 00x, no. x pp. x x Optical interconnection networks with time slot routing IRENEUSZ SZCZEŚNIAK AND ROMAN WYRZYKOWSKI a a Institute of Computer and

More information

Computers and Electrical Engineering

Computers and Electrical Engineering Computers and Electrical Engineering 35 (2009) 803 814 Contents lists available at ScienceDirect Computers and Electrical Engineering journal homepage: www.elsevier.com/locate/compeleceng Technical Communication

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

Novel Hierarchical Interconnection Networks for High-Performance Multicomputer Systems

Novel Hierarchical Interconnection Networks for High-Performance Multicomputer Systems JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 20, 1213-1229 (2004) Short Paper Novel Hierarchical Interconnection Networks for High-Performance Multicomputer Systems GENE EU JAN, YUAN-SHIN HWANG *, MING-BO

More information

Components: Interconnect Page 1 of 18

Components: Interconnect Page 1 of 18 Components: Interconnect Page 1 of 18 PE to PE interconnect: The most expensive supercomputer component Possible implementations: FULL INTERCONNECTION: The ideal Usually not attainable Each PE has a direct

More information

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09 Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,

More information

SUPPORT FOR HIGH-PRIORITY TRAFFIC IN VLSI COMMUNICATION SWITCHES

SUPPORT FOR HIGH-PRIORITY TRAFFIC IN VLSI COMMUNICATION SWITCHES 9th Real-Time Systems Symposium Huntsville, Alabama, pp 191-, December 1988 SUPPORT FOR HIGH-PRIORITY TRAFFIC IN VLSI COMMUNICATION SWITCHES Yuval Tamir and Gregory L Frazier Computer Science Department

More information

PERFORMANCE STUDY AND SIMULATION OF AN ANYCAST PROTOCOL FOR WIRELESS MOBILE AD HOC NETWORKS

PERFORMANCE STUDY AND SIMULATION OF AN ANYCAST PROTOCOL FOR WIRELESS MOBILE AD HOC NETWORKS PERFORMANCE STUDY AND SIMULATION OF AN ANYCAST PROTOCOL FOR WIRELESS MOBILE AD HOC NETWORKS Reza Azizi Engineering Department, Bojnourd Branch, Islamic Azad University, Bojnourd, Iran reza.azizi@bojnourdiau.ac.ir

More information

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P) Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Introduction to Parallel Computing. George Karypis Parallel Programming Platforms

Introduction to Parallel Computing. George Karypis Parallel Programming Platforms Introduction to Parallel Computing George Karypis Parallel Programming Platforms Elements of a Parallel Computer Hardware Multiple Processors Multiple Memories Interconnection Network System Software Parallel

More information

Linear Crossed Cube (LCQ): A New Interconnection Network Topology for Massively Parallel System

Linear Crossed Cube (LCQ): A New Interconnection Network Topology for Massively Parallel System I. J. Computer Network and Information Security, 215, 3, 18-25 Published Online February 215 in MECS (http://www.mecs-press.org/) DOI: 1.5815/ijcnis.215.3.3 Linear Crossed Cube (): A New Interconnection

More information

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age. Volume 3, Issue 10, October 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Load Measurement

More information

Performance Evaluation of Router Switching Fabrics

Performance Evaluation of Router Switching Fabrics Performance Evaluation of Router Switching Fabrics Nian-Feng Tzeng and Malcolm Mandviwalla Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, LA 70504, U. S. A. Abstract

More information

An Ecient Dynamic Load Balancing using the Dimension Exchange. Ju-wook Jang. of balancing load among processors, most of the realworld

An Ecient Dynamic Load Balancing using the Dimension Exchange. Ju-wook Jang. of balancing load among processors, most of the realworld An Ecient Dynamic Load Balancing using the Dimension Exchange Method for Balancing of Quantized Loads on Hypercube Multiprocessors * Hwakyung Rim Dept. of Computer Science Seoul Korea 11-74 ackyung@arqlab1.sogang.ac.kr

More information

A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs

A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs ang u, Mei ang, ulu ang, and ingtao Jiang Dept. of Computer Science Nankai University Tianjing, 300071, China yuyang_79@yahoo.com.cn,

More information

Data Center Network Structure using Hybrid Optoelectronic Routers

Data Center Network Structure using Hybrid Optoelectronic Routers Data Center Network Structure using Hybrid Optoelectronic Routers Yuichi Ohsita, and Masayuki Murata Graduate School of Information Science and Technology, Osaka University Osaka, Japan {y-ohsita, murata}@ist.osaka-u.ac.jp

More information

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches 3D On-chip Data Center Networks Using Circuit Switches and Packet Switches Takahide Ikeda Yuichi Ohsita, and Masayuki Murata Graduate School of Information Science and Technology, Osaka University Osaka,

More information

A Power Efficient QoS Provisioning Architecture for Wireless Ad Hoc Networks

A Power Efficient QoS Provisioning Architecture for Wireless Ad Hoc Networks A Power Efficient QoS Provisioning Architecture for Wireless Ad Hoc Networks Didem Gozupek 1,Symeon Papavassiliou 2, Nirwan Ansari 1, and Jie Yang 1 1 Department of Electrical and Computer Engineering

More information

NOVEL PRIORITISED EGPRS MEDIUM ACCESS REGIME FOR REDUCED FILE TRANSFER DELAY DURING CONGESTED PERIODS

NOVEL PRIORITISED EGPRS MEDIUM ACCESS REGIME FOR REDUCED FILE TRANSFER DELAY DURING CONGESTED PERIODS NOVEL PRIORITISED EGPRS MEDIUM ACCESS REGIME FOR REDUCED FILE TRANSFER DELAY DURING CONGESTED PERIODS D. Todinca, P. Perry and J. Murphy Dublin City University, Ireland ABSTRACT The goal of this paper

More information

Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP)

Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP) 19th International Symposium on Computer Architecture and High Performance Computing Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP) Seung Eun Lee, Jun Ho Bahn, and

More information

Switched Interconnect for System-on-a-Chip Designs

Switched Interconnect for System-on-a-Chip Designs witched Interconnect for ystem-on-a-chip Designs Abstract Daniel iklund and Dake Liu Dept. of Physics and Measurement Technology Linköping University -581 83 Linköping {danwi,dake}@ifm.liu.se ith the increased

More information

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS 137 CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS 8.1 CONCLUSION In this thesis, efficient schemes have been designed and analyzed to control congestion and distribute the load in the routing process of

More information

A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS

A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS Mihai Horia Zaharia, Florin Leon, Dan Galea (3) A Simulator for Load Balancing Analysis in Distributed Systems in A. Valachi, D. Galea, A. M. Florea, M. Craus (eds.) - Tehnologii informationale, Editura

More information

Offline sorting buffers on Line

Offline sorting buffers on Line Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

More information

Why the Network Matters

Why the Network Matters Week 2, Lecture 2 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. So Far Overview of Multicore Systems Why Memory Matters Memory Architectures Emerging Chip Multiprocessors (CMP) Increasing

More information

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM 152 APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM A1.1 INTRODUCTION PPATPAN is implemented in a test bed with five Linux system arranged in a multihop topology. The system is implemented

More information

Data Center Switch Fabric Competitive Analysis

Data Center Switch Fabric Competitive Analysis Introduction Data Center Switch Fabric Competitive Analysis This paper analyzes Infinetics data center network architecture in the context of the best solutions available today from leading vendors such

More information

An Active Network Based Hierarchical Mobile Internet Protocol Version 6 Framework

An Active Network Based Hierarchical Mobile Internet Protocol Version 6 Framework An Active Network Based Hierarchical Mobile Internet Protocol Version 6 Framework Zutao Zhu Zhenjun Li YunYong Duan Department of Business Support Department of Computer Science Department of Business

More information

CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING

CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila

More information

Interconnection Networks

Interconnection Networks Interconnection Networks Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Three questions about

More information

Analyzing Mission Critical Voice over IP Networks. Michael Todd Gardner

Analyzing Mission Critical Voice over IP Networks. Michael Todd Gardner Analyzing Mission Critical Voice over IP Networks Michael Todd Gardner Organization What is Mission Critical Voice? Why Study Mission Critical Voice over IP? Approach to Analyze Mission Critical Voice

More information

An On-Line Algorithm for Checkpoint Placement

An On-Line Algorithm for Checkpoint Placement An On-Line Algorithm for Checkpoint Placement Avi Ziv IBM Israel, Science and Technology Center MATAM - Advanced Technology Center Haifa 3905, Israel avi@haifa.vnat.ibm.com Jehoshua Bruck California Institute

More information

A Comparison Study of Qos Using Different Routing Algorithms In Mobile Ad Hoc Networks

A Comparison Study of Qos Using Different Routing Algorithms In Mobile Ad Hoc Networks A Comparison Study of Qos Using Different Routing Algorithms In Mobile Ad Hoc Networks T.Chandrasekhar 1, J.S.Chakravarthi 2, K.Sravya 3 Professor, Dept. of Electronics and Communication Engg., GIET Engg.

More information

Journal of Parallel and Distributed Computing 61, 11481179 (2001) doi:10.1006jpdc.2001.1747, available online at http:www.idealibrary.com on Adaptive Routing on the New Switch Chip for IBM SP Systems Bulent

More information

Smart Queue Scheduling for QoS Spring 2001 Final Report

Smart Queue Scheduling for QoS Spring 2001 Final Report ENSC 833-3: NETWORK PROTOCOLS AND PERFORMANCE CMPT 885-3: SPECIAL TOPICS: HIGH-PERFORMANCE NETWORKS Smart Queue Scheduling for QoS Spring 2001 Final Report By Haijing Fang(hfanga@sfu.ca) & Liu Tang(llt@sfu.ca)

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

Decentralized Utility-based Sensor Network Design

Decentralized Utility-based Sensor Network Design Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA narayans@cs.usc.edu, bkrishna@usc.edu

More information

Assignment #3 Routing and Network Analysis. CIS3210 Computer Networks. University of Guelph

Assignment #3 Routing and Network Analysis. CIS3210 Computer Networks. University of Guelph Assignment #3 Routing and Network Analysis CIS3210 Computer Networks University of Guelph Part I Written (50%): 1. Given the network graph diagram above where the nodes represent routers and the weights

More information

AS the number of components in a system increases,

AS the number of components in a system increases, 762 IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, NO. 6, JUNE 2008 An Efficient and Deadlock-Free Network Reconfiguration Protocol Olav Lysne, Member, IEEE, JoséMiguel Montañana, José Flich, Member, IEEE Computer

More information

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL Sandeep Kumar 1, Arpit Kumar 2 1 Sekhawati Engg. College, Dundlod, Dist. - Jhunjhunu (Raj.), 1987san@gmail.com, 2 KIIT, Gurgaon (HR.), Abstract

More information

A Catechistic Method for Traffic Pattern Discovery in MANET

A Catechistic Method for Traffic Pattern Discovery in MANET A Catechistic Method for Traffic Pattern Discovery in MANET R. Saranya 1, R. Santhosh 2 1 PG Scholar, Computer Science and Engineering, Karpagam University, Coimbatore. 2 Assistant Professor, Computer

More information

An Adaptive Load Balancing to Provide Quality of Service

An Adaptive Load Balancing to Provide Quality of Service An Adaptive Load Balancing to Provide Quality of Service 1 Zahra Vali, 2 Massoud Reza Hashemi, 3 Neda Moghim *1, Isfahan University of Technology, Isfahan, Iran 2, Isfahan University of Technology, Isfahan,

More information

Chapter 2. Multiprocessors Interconnection Networks

Chapter 2. Multiprocessors Interconnection Networks Chapter 2 Multiprocessors Interconnection Networks 2.1 Taxonomy Interconnection Network Static Dynamic 1-D 2-D HC Bus-based Switch-based Single Multiple SS MS Crossbar 2.2 Bus-Based Dynamic Single Bus

More information

LOAD-BALANCED ROUTING IN INTERCONNECTION NETWORKS

LOAD-BALANCED ROUTING IN INTERCONNECTION NETWORKS LOAD-BALANCED ROUTING IN INTERCONNECTION NETWORKS A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT

More information

Load Balancing by MPLS in Differentiated Services Networks

Load Balancing by MPLS in Differentiated Services Networks Load Balancing by MPLS in Differentiated Services Networks Riikka Susitaival, Jorma Virtamo, and Samuli Aalto Networking Laboratory, Helsinki University of Technology P.O.Box 3000, FIN-02015 HUT, Finland

More information

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network Leveraging Torus Topology with Deadlock ecovery for Cost-Efficient On-Chip Network Minjeong Shin, John Kim Department of Computer Science KAIST Daejeon, Korea {shinmj, jjk}@kaist.ac.kr Abstract On-chip

More information

CONTINUOUS scaling of CMOS technology makes it possible

CONTINUOUS scaling of CMOS technology makes it possible IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 7, JULY 2006 693 It s a Small World After All : NoC Performance Optimization Via Long-Range Link Insertion Umit Y. Ogras,

More information

Christian Bettstetter. Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks

Christian Bettstetter. Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks Christian Bettstetter Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks Contents 1 Introduction 1 2 Ad Hoc Networking: Principles, Applications, and Research Issues 5 2.1 Fundamental

More information

CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION

CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION 31 CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION 3.1 INTRODUCTION In this chapter, construction of queuing model with non-exponential service time distribution, performance

More information

Provisioning algorithm for minimum throughput assurance service in VPNs using nonlinear programming

Provisioning algorithm for minimum throughput assurance service in VPNs using nonlinear programming Provisioning algorithm for minimum throughput assurance service in VPNs using nonlinear programming Masayoshi Shimamura (masayo-s@isnaistjp) Guraduate School of Information Science, Nara Institute of Science

More information

Adaptive DCF of MAC for VoIP services using IEEE 802.11 networks

Adaptive DCF of MAC for VoIP services using IEEE 802.11 networks Adaptive DCF of MAC for VoIP services using IEEE 802.11 networks 1 Mr. Praveen S Patil, 2 Mr. Rabinarayan Panda, 3 Mr. Sunil Kumar R D 1,2,3 Asst. Professor, Department of MCA, The Oxford College of Engineering,

More information

Technology-Driven, Highly-Scalable Dragonfly Topology

Technology-Driven, Highly-Scalable Dragonfly Topology Technology-Driven, Highly-Scalable Dragonfly Topology By William J. Dally et al ACAL Group Seminar Raj Parihar parihar@ece.rochester.edu Motivation Objective: In interconnect network design Minimize (latency,

More information

Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols

Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols Purvi N. Ramanuj Department of Computer Engineering L.D. College of Engineering Ahmedabad Hiteishi M. Diwanji

More information

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING Hussain Al-Asaad and Alireza Sarvi Department of Electrical & Computer Engineering University of California Davis, CA, U.S.A.

More information

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,

More information

Quality of Service Routing Network and Performance Evaluation*

Quality of Service Routing Network and Performance Evaluation* Quality of Service Routing Network and Performance Evaluation* Shen Lin, Cui Yong, Xu Ming-wei, and Xu Ke Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 {shenlin, cy, xmw,

More information

Chapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks

Chapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks Chapter 12: Multiprocessor Architectures Lesson 04: Interconnect Networks Objective To understand different interconnect networks To learn crossbar switch, hypercube, multistage and combining networks

More information

Performance Evaluation of Multi-Core Multi-Cluster Architecture (MCMCA)

Performance Evaluation of Multi-Core Multi-Cluster Architecture (MCMCA) Performance Evaluation of Multi-Core Multi-Cluster Architecture (MCMCA) Norhazlina Hamid, Robert J. Walters and Gary B. Wills School of Electronics & Computer Science, University of Southampton, SO17 1BJ,

More information

Performance advantages of resource sharing in polymorphic optical networks

Performance advantages of resource sharing in polymorphic optical networks R. J. Durán, I. de Miguel, N. Merayo, P. Fernández, R. M. Lorenzo, E. J. Abril, I. Tafur Monroy, Performance advantages of resource sharing in polymorphic optical networks, Proc. of the 0th European Conference

More information

A Fast Path Recovery Mechanism for MPLS Networks

A Fast Path Recovery Mechanism for MPLS Networks A Fast Path Recovery Mechanism for MPLS Networks Jenhui Chen, Chung-Ching Chiou, and Shih-Lin Wu Department of Computer Science and Information Engineering Chang Gung University, Taoyuan, Taiwan, R.O.C.

More information

DESIGN AND DEVELOPMENT OF LOAD SHARING MULTIPATH ROUTING PROTCOL FOR MOBILE AD HOC NETWORKS

DESIGN AND DEVELOPMENT OF LOAD SHARING MULTIPATH ROUTING PROTCOL FOR MOBILE AD HOC NETWORKS DESIGN AND DEVELOPMENT OF LOAD SHARING MULTIPATH ROUTING PROTCOL FOR MOBILE AD HOC NETWORKS K.V. Narayanaswamy 1, C.H. Subbarao 2 1 Professor, Head Division of TLL, MSRUAS, Bangalore, INDIA, 2 Associate

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks

Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks Guru P.V. Thodime, Vinod M. Vokkarane, and Jason P. Jue The University of Texas at Dallas, Richardson, TX 75083-0688 vgt015000,

More information

SIMULATION STUDY OF BLACKHOLE ATTACK IN THE MOBILE AD HOC NETWORKS

SIMULATION STUDY OF BLACKHOLE ATTACK IN THE MOBILE AD HOC NETWORKS Journal of Engineering Science and Technology Vol. 4, No. 2 (2009) 243-250 School of Engineering, Taylor s University College SIMULATION STUDY OF BLACKHOLE ATTACK IN THE MOBILE AD HOC NETWORKS SHEENU SHARMA

More information

1 Location Management in Cellular

1 Location Management in Cellular 1 Location Management in Cellular Networks JINGYUAN ZHANG Abstract In a cellular network, a service coverage area is divided into smaller areas of hexagonal shape, referred to as cells. The cellular concept

More information

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption

More information

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed

More information

Available online at www.sciencedirect.com. ScienceDirect. Procedia Computer Science 52 (2015 ) 902 907

Available online at www.sciencedirect.com. ScienceDirect. Procedia Computer Science 52 (2015 ) 902 907 Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 52 (2015 ) 902 907 The 4th International Workshop on Agent-based Mobility, Traffic and Transportation Models, Methodologies

More information

THE Internet is an open architecture susceptible to various

THE Internet is an open architecture susceptible to various IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 16, NO. 10, OCTOBER 2005 1 You Can Run, But You Can t Hide: An Effective Statistical Methodology to Trace Back DDoS Attackers Terence K.T. Law,

More information

Web Server Software Architectures

Web Server Software Architectures Web Server Software Architectures Author: Daniel A. Menascé Presenter: Noshaba Bakht Web Site performance and scalability 1.workload characteristics. 2.security mechanisms. 3. Web cluster architectures.

More information

A Study of Network Security Systems

A Study of Network Security Systems A Study of Network Security Systems Ramy K. Khalil, Fayez W. Zaki, Mohamed M. Ashour, Mohamed A. Mohamed Department of Communication and Electronics Mansoura University El Gomhorya Street, Mansora,Dakahlya

More information

Research Article ISSN 2277 9140 Copyright by the authors - Licensee IJACIT- Under Creative Commons license 3.0

Research Article ISSN 2277 9140 Copyright by the authors - Licensee IJACIT- Under Creative Commons license 3.0 INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An international, online, open access, peer reviewed journal Volume 2 Issue 2 April 2013 Research Article ISSN 2277 9140 Copyright

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information