A comparative study of bidirectional ring and crossbar interconnection networks

Size: px
Start display at page:

Download "A comparative study of bidirectional ring and crossbar interconnection networks"

Transcription

1 Computers and Electrical Engineering ) 43±57 A comparative study of bidirectional ring and crossbar interconnection networks Hitoshi Oi a, *, N. Ranganathan b a Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA b Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA Received 16June 1999; accepted 3 April 2000 Abstract For distributed shared memory multiprocessors, the choice and the design of interconnection networks have a signi cant impact on their performance. Bidirectional ring networks are considered to be physically fast due to their simple structure, but topologically slow since their communication latency grows proportionally to the number of nodes. In this paper, we will present a quantitative measure to the question of how physically fast a bidirectional ring has to be to overcome its topologically slow communication latency by comparing it to the crossbar, an interconnection network that has opposite characteristics to the ring network. A hybrid method, in which workload parameters extracted from memory traces are given to an analytical model is used for performance evaluation. Our study shows for 16nodes con guration, the performance of two networks are similar. For 32 and 64 nodes con gurations, the bidirectional ring outperforms the crossbar by 21% and 61% respectively, on the average of four parallel applications. Ó 2001 Elsevier Science Ltd. All rights reserved. Keywords: Interconnection networks; Distributed shared memory multiprocessor; Performance evaluation; Slotted ring; Crossbar 1. Introduction A distributed shared memory DSM) multiprocessor provides a view of a globally shared address space by sending coherence protocol messages between processing elements. Therefore, the interconnection network a ects the performance of a DSM multiprocessor signi cantly. * Corresponding author. Tel.: addresses: hitoshi@cse.fau.edu H. Oi), ranganat@csee.usf.edu N. Ranganathan) /01/$ - see front matter Ó 2001 Elsevier Science Ltd. All rights reserved. PII: S )

2 44 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Ring networks have the advantages of 1) xed node degree modular expandability), 2) simple network interface structure fast operation speed) and 3) low wiring complexity fast transmission speed). On the other hand, the main disadvantage of the ring network is its communication latency growth rate is proportional to the number of nodes. However, when nonlocal misses are likely to be destined to neighboring nodes, this disadvantage can be alleviated by using a bidirectional ring [1,2]. Crossbar switches have opposite characteristics: they can connect any pair of nodes by one hop, but the transmission speed is expected to be much slower due to wiring and circuit complexity [3]. In this paper, we will present a quantitative measure to the question of how physically fast a bidirectional ring has to be to overcome its topologically slow communication latency, by comparing its performance to that of the crossbar switch. This paper is organized as follows: The rest of this section introduces the past studies on the comparisons of the interconnection networks for multiprocessors and instances of multiprocessor using either ring or crossbar switch networks. The architectures of multiprocessors that will be assumed in our study are presented in Section 2. Our methodology for evaluating the performance and the pro le of the applications used in the performance evaluation are described in Section 3. The comparison of the bidirectional ring and the crossbar by estimated execution times ETs) is presented in Section 4. Some conclusions are provided in Section Related work There have been several researches in the comparisons of interconnection networks for multiprocessors in the past. Ravindran and Stumn compared the performance of multiprocessors using hierarchical ring and mesh networks [4]. The miss latency was used for performance comparison. Their study mainly showed maximum number of nodes at which the hierarchical ring outperformed the mesh network. Since they assumed a unidirectional ring, nearest-neighbor communication pattern was not taken into account. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor [5]. They investigated the e ect of the design choices such as coherence protocols snoopy, linked list and full-map directory) and processor speed, and compared their unidirectional ring architecture with the split-transaction bus. Lang et al. studied the e ective bandwidth of the crossbar switches, and compared it to that of the multiple-bus interconnection network using parametric simulations [3]. They assumed the dance-hall UMA architecture processors and memories are di erent sides of interconnection network). The instances of multiprocessors using crossbar switches or ring networks include the following. NUMAchine [6] is a multiprocessor developed at the University of Toronto. It has three hierarchy levels: a small number of PEs connected by a bus form a cluster called ``station'') and the stations are connected by two levels of local and central) rings. KSR-1 of Kendall Square Research was a cache-only memory architecture COMA) multiprocessor with hierarchy rings [7]. Exemplar S-Class of Hewlett Packard [8] and CP-PACS developed at University of Tsukuba [9] use crossbar switches. Convex SPP-1000 [10] uses a crossbar switch within a cluster and scalable coherence interface SCI) [11] rings between clusters.

3 2. Architecture model H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 45 In this section, we describe the architecture of the DSM multiprocessors using the bidirectional ring and the crossbar that will be assumed in this paper. Each processing element PE) consists of a processor, a memory unit with directory entries, L1 and L2 cache, and a network interface Fig. 1). PEs are connected by either a bidirectional ring or a crossbar switch network. It is assumed that both architectures are based on the CC-NUMA model with directory based cache coherency protocol. The memory unit within a PE is a part of the globally shared memory and is associated with directory entries corresponding to the part of global address space assigned to the PE. L1 cache is direct mapped, write-through and its access time and size are one clock cycle and 16KB. L2 cache is full associative, write-back and its access time t L2 ) and size are ve clock cycle and in nity. Block size of both L1 and L2 is 16bytes. L2 cache miss is responded either by the home node the PE that is assigned the portion of global address of the accessed data) or by the owner node the PE that owns a modi ed copy of the requested data in the cache). For a write access, invalidation scheme is used. On the bidirectional ring, a single invalidation message is broadcast by passing it all the way through the ring. On the other hand, the crossbar network is assumed to have the multicast functionality to send invalidate messages simultaneously to all the PEs that have copies of the block. The parameters that de ne the con guration of the system are listed in Table 1. t L2, t s, t r, t m, and t c are no-contention timing parameters, and they are represented in terms of processor clock cycles. In this paper, we consider small to medium scale multiprocessors, and hence we chose N ˆ 16, 32 and 64. The speed of crossbar network is xed, while that of ring network is varied widely to nd out the point where the performance of two con gurations match. Both bidirectional ring and crossbar are assumed to be at one level). On the bidirectional ring, a header message is divided into two packets, and it takes t r to transfer a packet between Fig. 1. DSM multiprocessor architecture.

4 46 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Table 1 System parameters timing parameters are in processor clock cycles) Symbol Description Value N Number of PE 16, 32, 64 t L2 L2 cache access time 5 t s Crossbar switch packet transfer time 4, 8, 16, 32, 64 t r Bidirectional ring packet transfer time 1±5 t m Memory access time 50 t c Protocol handling time 10 d p Data message packet length 4 adjacent PEs. This assumption of t r is to compensate that a node on the bidirectional ring has two links for each direction. On the bidirectional ring, each link connects a pair of adjacent PEs with a minimum wire length. Therefore, it is not di cult to keep t r constant for larger N. Thus, we use the same t r ˆ 1;...; 5 for all N. t s is the time to transfer a header message between any pair of PEs on the crossbar network. We have chosen t s as follows: First, we take two instances of shared memory multiprocessors using crossbar switches, and look at their relative speed between the processor and the crossbar, the dimensional size and the data path width of the crossbar, and the word length of the processor. Exemplar S-Class of Hewlett Packard uses 180 MHz 64-bits processors and 8 8 crossbars operating at 120 MHz with 64-bits datapath [8]. CP-PACS developed at University of Tsukuba uses 180 MHz 64-bits processors and crossbar switches with data bandwidth of 300 MB/s per node [9]. Using the above de nition, Exemplar and CP-PACS have t s ˆ 1:5 and t s ˆ 8 respectively. Taking these values into consideration, we have chosen t s ˆ 4 and 8 for N ˆ 16. Next, we consider t s for larger N 32 and 64). The delay of crossbar switch grows at O N 2 due to the length of wire and number of nodes [3]. Thus, we use t s ˆ 16and t s ˆ 64 for N ˆ 32 and 64, respectively. Unlike UMA including Exemplar) in which the delay of crossbar is dominant for the network latency, both the wire connection between PEs and crossbar, which is considered to be O N, and the delay of crossbar switches a ect the latency of the network in NUMA architecture. Hence, we also use lower values of t s ˆ 8 and t s ˆ 16and 32 for N ˆ 32 and 64, respectively. A data message is assumed to be four times longer than a header message d p ˆ 4. This value is chosen from the cache block size 16bytes for 32 bit processors) assumed when the trace les were collected. On the crossbar, a data message is transmitted in a contiguous 4t s time period, while on the bidirectional ring a data message is divided into multiple packets and they are sent in possibly) uncontiguous slots. Therefore, the transmission delay of a data message on the bidirectional ring is P 8t r. 1 We assume the slotted ring con guration for the bidirectional ring. 3. Performance model The methods that have been used for performance evaluation of computer systems include analytical models [12], parametric simulations [4], trace-driven simulations [13] and execution- 1 Equality holds when all the packets are transmitted in contiguous slots.

5 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 47 driven simulations [14]. In this paper, we use the hybrid approach by Barroso and Dubois [5] by extending it to the bidirectional ring and the crossbar. Below, we brie y describe the derivation of ET using the hybrid approach. The ET in clock cycles is given by ET ˆ Inst DataAccess; 1 where Inst is instruction fetches. We assume that all the instruction fetches are L1 cache hit. Let Refs, L1Miss, L2Miss, and L2MissLatency be the number of references, the number of L1 cache misses, the number of L2 cache misses, and the penalty of L2 cache miss, respectively. Thus, Eq. 1) becomes ET ˆ Refs L1Miss t L2 L2Miss L2MissLatency: 2 L2 cache miss is responded by either local or remote memory. Thus, L2MissLatency is de ned to consist of T, the time to transmit a message either request or data) to the network, P, the propagation delay of a message to reach the destination on the network, M, the time to access the memory, and C, the time of coherence protocol handling. These components of miss latency appear di erent number of times per miss depending on the access mode and the state of the memory block. Note that M includes contention delay while t m in Table 1 does not. P is constant t s on the crossbar while it is the average message traversal length extracted from the traces on the bidirectional ring. C is assumed to be constant t c i. e. no contention at L2 cache is modeled). We model the contention at network and memory. Thus, the values of parameters T and M are the functions of the utilizations of the resources, network and memory, respectively. The utilization of the bidirectional ring R u is R u ˆ tr P P H dtrv d p DtTrv ; ET where +H dtrv and +DtTrv are the sums of header and data message traversals, respectively. On the other hand, the utilization of the crossbar switch is SW u ˆ ts +H dtx d p +H dtx ; ET where +H dtx and +DtTx are the the numbers of header and data message transmissions. This is because on the crossbar switch network, a message only takes one hop to reach any destination PE. These utilization parameters are combined with the M=G=1 queue model to estimate the ET of applications on both architectures [15]. Thus, ET itself is a function of ET. ET ˆ Refs L1Miss t L2 L2Miss L2MissLatency ET : The derivation of ET is iterated until it converges to a tolerant level <0.1%). We use two metrics to evaluate the performance of two architectures. One is the ratio of ETs, namely, the ET on the crossbar divided by the ET on the bidirectional ring. Another metric is the scalability speed up), which is obtained by dividing the ET of the uniprocessor by that of the multiprocessor.

6 48 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Table 2 Pro le of trace les Application Refs Inst) FFT 7.44M 3.11M) Simple 27.03M 11.59M) Weather 31.76M 13.64M) Speech 22.55M 11.77M) N Miss Rate %) L1 L2 Write Local Trace les The pro le of the memory traces we use in our experiments are listed in Table 2. These trace les are from the MIT trace set [16], and are obtained from the ftp server of the TraceBase project at the New Mexico State University [17]. FFT, Simple and Weather were written in FORTRAN, and traces of these applications were derived using the postmortem method, a technique that generates a parallel trace from a uniprocessor execution. Speech was written in the programming language Mul-T, and its trace le was collected by inserting instrumentation codes into the program by the compiler. Write miss rate sixth column) is the fraction of write misses within all L2 cache misses. A write access to shared block is considered to be a write miss since it generates memory access and network tra c. The workload parameters of each application were extracted from the trace le as follows: Each trace le is fed to a cache/directory simulator developed at the laboratory for computer science of MIT, 2 and statistical information of the applications such as hit/miss, read/write, clean/dirty was collected. These trace les were collected on a 64 processor system. To extract data for the case of 16 32) processors, access traces from 4i to 4i 3 from 2i to 2i 1) were given to ith cache, where i ˆ 0;...; 15 i ˆ 0;...; 31. It is assumed that all the instruction fetches are cache hits. In addition to the original functionality, two modi cations were made on the cache simulator. First, two-level cache hierarchy was incorporated as described in the previous section. Another modi cation made to the cache simulator was to collection of communication distance number of links on the bidirectional ring) between the node where miss occurs and the home node. We assume that the rst PE that accesses a block is the home node of the block. When the miss node and the home node are the same, no network tra c is generated. On FFT and Speech, more than 2 This simulator was also provided by the TraceBase ftp server.

7 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 49 half of accesses to shared data are to local addresses, while on Simple and Weather the locality of shared access is quite low. This communication distance a ects the performance of the bidirectional ring signi cantly in two ways: First, the longer the distance the longer the propagation delay higher P). Second, the longer the distance the higher utilization of the ring higher T ). Figs. 12±14 show the distribution of request ± home node distance on the bidirectional ring. For N ˆ 64, the di erence of communication pattern among applications is observed most clearly. FFT and Speech exhibit all-to-all communication pattern while Weather and Simple have a certain level of nearest neighbor communication patterns. Especially for N ˆ 64 on Weather, 30% of non-local misses are accesses to adjacent nodes. For N ˆ 16and 32, the degree of nearest neighbor communication pattern of Weather and Simple are weaker than the case of N ˆ 64. Although the number of applications is not large, these applications exhibit variations in their workload parameters. 4. Bidirectional ring versus crossbar evaluation In this section, we present comparisons of the bidirectional ring and the crossbar using the model and the traces presented in the previous section Figures in this section are placed after the references). FFT Fig. 2) has all-to-all communication pattern Fig. 14), which is an advantage to the crossbar over the bidirectional ring. The speed up for larger N is medium among four applications Fig. 3). L2 miss rate of FFT is relatively small, and it increases by 36% for N ˆ 16! 64. However, its local miss rate is almost constant over N. This suppresses the increase of network tra c, and is advantageous for both architecture. Write miss rate of FFT decreases from 78% to 57% for N ˆ 16! 64. This is an advantage for the bidirectional ring since on the bidirectional Fig. 2. Relative performance/fft.

8 50 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Fig. 3. Speed up/fft. ring a write miss generates an invalidation message that traverses the entire ring and hence increases the ring utilization. With fast ring speed t r ˆ 1, the bidirectional ring outperforms the crossbar by 6%, 27%, and 73% for N ˆ 16, 32 and 64 and t s ˆ 4, 8, and 16, respectively. Miss rate of Simple increases for larger N while local miss rate decreases. These cause more network tra c and hence are disadvantage for both network. On the other hand, decreasing write miss rate and increasing nearest neighbor communication pattern of Simple are advantageous for the bidirectional ring. With t r ˆ 1, the bidirectional ring outperforms the crossbar by 5%, 19%, and 70% for N ˆ 16, 32 and 64 and t s ˆ 4, 8, and 16, respectively Fig. 4). When N ˆ 32! 64, the bidirectional ring achieves speed up of 40% while the performance of the crossbar decreases slightly even with an assumption of O N communication latency growth rate Fig. 5). Weather has relatively low miss rate, and hence the performance di erence between two networks is relatively small, especially for N ˆ 16and 32 Fig. 6). However, when N is increased from 32 to 64, the e ect of interconnection network becomes stronger due to increasing miss rate by 37% and decreasing local miss rate by 40%. The bidirectional ring can still achieve speed up of 46% with t r ˆ 1, while the performance of the crossbar stays almost the same Fig. 7). The sources of this performance di erence between two networks are considered to be decreasing write miss rate by 20% and increasing degree of nearest neighbor communication for N ˆ 32! 64. With t r ˆ 1, the bidirectional ring outperforms the crossbar by 2%, 9% and 58% for N ˆ 16, 32 and 64 and t s ˆ 4, 8, and 16, respectively. The miss rate and local miss rate of Speech are constant over increasing N. Therefore, the growth of communication latency of interconnection networks is the dominant factor for the performance. In addition, since Speech has the all-to-all communication pattern, relative performance of Speech is mainly a ected by the speed of both networks Fig. 8). With t r ˆ 1, the bidirectional ring outperforms the crossbar by 5%, 16%, and 45% for N ˆ 16, 32 and 64 and t s ˆ 4, 8, and 16, respectively. For N ˆ 32! 64, the bidirectional ring still provides some level of

9 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 51 Fig. 4. Relative performance/simple. Fig. 5. Speed up/simple. speed up 18% to 66%), while the performance of the crossbar 7 can be either increased by 32% or decreased by 42% depending on the speed of the switch Fig. 9). The average geometric mean) of relative performance and speed up of both networks are shown in Figs. 10 and 11. On the average of four applications, the bidirectional ring provides better performance by 4%, 21%, and 61% for N ˆ 16, 32 and 64 and t s ˆ 4, 8, and 16, respectively. The slowest ring speed for the bidirectional ring Figs. 12±14) to outperform the crossbar are shown in Table 3. For N ˆ 16, we need a very fast ring that can operate at the same speed as the

10 52 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Fig. 6. Relative performance/weather. Fig. 7. Speed up/weather. processor clock t r ˆ 1 to outperform a very fast implementation of a crossbar switch network t s ˆ 4. If the speed of a crossbar switch is moderate, a ring that operates at half the speed of the processor clock is su cient. For N ˆ 32, a bidirectional ring with half and one fourth the speed of the processor clock su ce to outperform the crossbar switch network with very fast and moderate t s ˆ 8 and 16) speed respectively. For N ˆ 64, it is much easier for a bidirectional ring to out-

11 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 53 Fig. 8. Relative performance/speech. Fig. 9. Speed up/speech. perform a crossbar switch network due to the latter physically slow operating speed. However, although it outperforms the crossbar, without su ciently fast ring speed, we cannot expect speed up on the bidirectional ring. For example, on the average of four applications, we have speed up of 48% on the bidirectional ring with t r ˆ 1 when N is increased from 32 to 64. On the other hand, if t r ˆ 5, which is su cient to outperform a moderately fast crossbar, the speed up is only 11% for N ˆ 32! 64.

12 54 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Fig. 10. Relative performance/average. Fig. 11. Speed up/average. 5. Conclusions In this paper, we have evaluated the performance of the bidirectional ring by comparing it to the performance of the crossbar network. We used a hybrid evaluation method which is a combination of an analytical model and workload parameters extracted from the memory traces of four parallel applications. Our study indicates that for a 16nodes con guration, which is a

13 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 55 Fig. 12. Home node distribution bidirectional ring, 16nodes). Fig. 13. Home node distribution bidirectional ring, 32 nodes). Fig. 14. Home node distribution bidirectional ring, 64 nodes).

14 56 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 Table 3 Ring speed to outperform crossbar N t s Application Average FFT Simple Weather Speech > > >5 >5 >5 >5 64 >5 >5 >5 >5 >5 typical size of crossbar switches used in some of actual multiprocessors, both architecture achieve similar performance for the link speed we have assumed. For a 32 nodes con guration, the bidirectional ring outperforms the crossbar by 21% on the average of four application programs, with an assumption that the growth rate of the communication latency of the crossbar is suppressed to O N. For a 64 nodes con guration, we can still expect speed up on the bidirectional ring 48% on the average) while that of the crossbar increases only by 8%. Further investigations could include analysis of dynamic behavior of both networks such as hot spot contention) using execution-driven simulations and the hierarchical network models. References [1] Oi H, Ranganathan N. Performance analysis of the bidirectional ring-based multiprocessor. Proceedings of the ISCA Tenth International Conference on Parallel and Distributed Computing Systems, p. 397±400. [2] Oi H, Ranganathan N. E ect of message length and processor speed on the performance of the bidirectional ringbased multiprocessor. Proceedings on the International Conference on Computer Design, October p. 267±72. [3] Lang T, Valero M, Alegre I. Bandwidth of crossbar and multiple-bus connection for multiprocessors. Trans Comp IEEE 1982;c31 12):1227±34. [4] Ravindran G, Stumm M. A performance comparison of hierarchical ring- and mesh-connected multiprocessor networks. Proceedings of International Symposium on High Performance Computer Architecture, February p. 58±69. [5] Barroso LA, Dubois M. Performance evaluation of the slotted ring multiprocessor. Trans Comp IEEE 1995;44 7):878±90. [6] Vranesic Z, et al. The NUMAchine Multiprocessor. Technical report, Department of electrical and computer engineering, Department of computer science, University of Toronto, June [7] Kendall square research corporation. KSR1 Technical Summary, [8] Exemplar System Architecture. Hewlett Packard. [9] Boku T, et al. Architecture of massively parallel processor CP-PACS. Proceedings of the 1997 Second Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, Fukushima, Japan, p. 31±40. [10] Sterling T, et al. An empirical evaluation of the convex SPP-1000 hierarchical shared memory system. Int J Parallel Prog 1996;24 4):377±96.

15 H. Oi, N. Ranganathan / Computers and Electrical Engineering ) 43±57 57 [11] Gustavason DB. Scalable coherent interface and related standards projects. IEEE MICRO 1992;12 1):10±22. [12] Zhang X, Yan Y. Comparative modeling and evaluation of CC-NUMA and COMA on hierarchical ring architecture. Trans Parallel Distributed Sys IEEE 1995;6 12):1316±31. [13] Farkas K, Vranesi Z, Stumn M. Scalable cache consistency for hierarchically structured multiprocessors. J Supercomput 1995;8:345±69. [14] Davis H, Goldschmidt SR, Hennessy R. Multiprocessor simulation and tracing using tango. Proceedings of the 1991 International Conference on Parallel Processing, vol. II, p. 99±107. [15] Bhuyan L, Ghosal D, Yang Q. Approximate analysis of single and multiple ring networks. Trans Comp IEEE 1989;38 7):1027±40. [16] Chaiken D, et al. Directory-based cache coherence in large-scale multiprocessors. IEEE Comp 1990;23 6):49±58. [17] ftp://tracebase.nmsu.edu/pub/, Parallel Architecture Research Laboratory, New Mexico State University. Hitoshi Oi is an assistant professor in the Department of Computer Science and Engineering at the Florida Atlantic University. Before joining FAU, he was working with HAL Computer Systems in Campbell, California, as a Computer Architect. He received a Ph.D. degree in Computer Science and Engineering from the University of South Florida. He also reveived a Bachelor's degree in Electronics and Communication Engineering and a Master's degree in Electrical Engineering, both from Meiji University. His research interests include computer architecture and performance evaluation, LSI design and veri cation, and non-standard logic. N. Ranganathan received the Ph.D. degree in Computer Science from the University of Central Florida, Orlando in He is currently a professor in the Department of Computer Science and Engineering and the Center for Microelectronics Research at the University of South Florida, Tampa. His research interests include VLSI design, design automation, hardware algorithms, computer architecture and parallel processing.

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

Components: Interconnect Page 1 of 18

Components: Interconnect Page 1 of 18 Components: Interconnect Page 1 of 18 PE to PE interconnect: The most expensive supercomputer component Possible implementations: FULL INTERCONNECTION: The ideal Usually not attainable Each PE has a direct

More information

Parallel Programming

Parallel Programming Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix

More information

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

Vorlesung Rechnerarchitektur 2 Seite 178 DASH

Vorlesung Rechnerarchitektur 2 Seite 178 DASH Vorlesung Rechnerarchitektur 2 Seite 178 Architecture for Shared () The -architecture is a cache coherent, NUMA multiprocessor system, developed at CSL-Stanford by John Hennessy, Daniel Lenoski, Monica

More information

Principles and characteristics of distributed systems and environments

Principles and characteristics of distributed systems and environments Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single

More information

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur Module 5 Broadcast Communication Networks Lesson 1 Network Topology Specific Instructional Objectives At the end of this lesson, the students will be able to: Specify what is meant by network topology

More information

Influence of Load Balancing on Quality of Real Time Data Transmission*

Influence of Load Balancing on Quality of Real Time Data Transmission* SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 6, No. 3, December 2009, 515-524 UDK: 004.738.2 Influence of Load Balancing on Quality of Real Time Data Transmission* Nataša Maksić 1,a, Petar Knežević 2,

More information

Interconnection Networks

Interconnection Networks Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode

More information

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394

More information

Chapter 2 Heterogeneous Multicore Architecture

Chapter 2 Heterogeneous Multicore Architecture Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is

More information

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture CMSC 611: Advanced Computer Architecture Parallel Computation Most slides adapted from David Patterson. Some from Mohomed Younis Parallel Computers Definition: A parallel computer is a collection of processing

More information

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility

More information

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09 Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,

More information

Lecture 23: Multiprocessors

Lecture 23: Multiprocessors Lecture 23: Multiprocessors Today s topics: RAID Multiprocessor taxonomy Snooping-based cache coherence protocol 1 RAID 0 and RAID 1 RAID 0 has no additional redundancy (misnomer) it uses an array of disks

More information

Computer Systems Structure Input/Output

Computer Systems Structure Input/Output Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices

More information

QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES

QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES SWATHI NANDURI * ZAHOOR-UL-HUQ * Master of Technology, Associate Professor, G. Pulla Reddy Engineering College, G. Pulla Reddy Engineering

More information

Why the Network Matters

Why the Network Matters Week 2, Lecture 2 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. So Far Overview of Multicore Systems Why Memory Matters Memory Architectures Emerging Chip Multiprocessors (CMP) Increasing

More information

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc (International Journal of Computer Science & Management Studies) Vol. 17, Issue 01 Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc Dr. Khalid Hamid Bilal Khartoum, Sudan dr.khalidbilal@hotmail.com

More information

Router Architectures

Router Architectures Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components

More information

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana

More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

More information

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,

More information

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches 3D On-chip Data Center Networks Using Circuit Switches and Packet Switches Takahide Ikeda Yuichi Ohsita, and Masayuki Murata Graduate School of Information Science and Technology, Osaka University Osaka,

More information

Operating System Multilevel Load Balancing

Operating System Multilevel Load Balancing Operating System Multilevel Load Balancing M. Corrêa, A. Zorzo Faculty of Informatics - PUCRS Porto Alegre, Brazil {mcorrea, zorzo}@inf.pucrs.br R. Scheer HP Brazil R&D Porto Alegre, Brazil roque.scheer@hp.com

More information

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

Agenda. Distributed System Structures. Why Distributed Systems? Motivation Agenda Distributed System Structures CSCI 444/544 Operating Systems Fall 2008 Motivation Network structure Fundamental network services Sockets and ports Client/server model Remote Procedure Call (RPC)

More information

Annotation to the assignments and the solution sheet. Note the following points

Annotation to the assignments and the solution sheet. Note the following points Computer rchitecture 2 / dvanced Computer rchitecture Seite: 1 nnotation to the assignments and the solution sheet This is a multiple choice examination, that means: Solution approaches are not assessed

More information

Operating Systems and Networks Sample Solution 1

Operating Systems and Networks Sample Solution 1 Spring Term 2014 Operating Systems and Networks Sample Solution 1 1 byte = 8 bits 1 kilobyte = 1024 bytes 10 3 bytes 1 Network Performance 1.1 Delays Given a 1Gbps point to point copper wire (propagation

More information

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL Sandeep Kumar 1, Arpit Kumar 2 1 Sekhawati Engg. College, Dundlod, Dist. - Jhunjhunu (Raj.), 1987san@gmail.com, 2 KIIT, Gurgaon (HR.), Abstract

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

How To Understand The Concept Of A Distributed System

How To Understand The Concept Of A Distributed System Distributed Operating Systems Introduction Ewa Niewiadomska-Szynkiewicz and Adam Kozakiewicz ens@ia.pw.edu.pl, akozakie@ia.pw.edu.pl Institute of Control and Computation Engineering Warsaw University of

More information

On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on Hulu

On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on Hulu On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on Hulu Dilip Kumar Krishnappa, Samamon Khemmarat, Lixin Gao, Michael Zink University of Massachusetts Amherst,

More information

Interconnection Network Design

Interconnection Network Design Interconnection Network Design Vida Vukašinović 1 Introduction Parallel computer networks are interesting topic, but they are also difficult to understand in an overall sense. The topological structure

More information

Measuring and Characterizing Cache Coherence Traffic

Measuring and Characterizing Cache Coherence Traffic Measuring and Characterizing Cache Coherence Traffic Tuan Bui, Brian Greskamp, I-Ju Liao, Mike Tucknott CS433, Spring 2004 Abstract Cache coherence protocols for sharedmemory multiprocessors have been

More information

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

More information

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of

More information

Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI. Article for InfoStor November 2003 Paul Griffith Adaptec, Inc.

Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI. Article for InfoStor November 2003 Paul Griffith Adaptec, Inc. Filename: SAS - PCI Express Bandwidth - Infostor v5.doc Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI Article for InfoStor November 2003 Paul Griffith Adaptec, Inc. Server

More information

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching 2012 2 nd International Conference on Information Communication and Management (ICICM 2012) IPCSIT vol. 55 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V55.5 Object Request Reduction

More information

Topological Properties

Topological Properties Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any

More information

Customer Specific Wireless Network Solutions Based on Standard IEEE 802.15.4

Customer Specific Wireless Network Solutions Based on Standard IEEE 802.15.4 Customer Specific Wireless Network Solutions Based on Standard IEEE 802.15.4 Michael Binhack, sentec Elektronik GmbH, Werner-von-Siemens-Str. 6, 98693 Ilmenau, Germany Gerald Kupris, Freescale Semiconductor

More information

Load Balancing Mechanisms in Data Center Networks

Load Balancing Mechanisms in Data Center Networks Load Balancing Mechanisms in Data Center Networks Santosh Mahapatra Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 33 {mahapatr,xyuan}@cs.fsu.edu Abstract We consider

More information

SIMULATION STUDY OF BLACKHOLE ATTACK IN THE MOBILE AD HOC NETWORKS

SIMULATION STUDY OF BLACKHOLE ATTACK IN THE MOBILE AD HOC NETWORKS Journal of Engineering Science and Technology Vol. 4, No. 2 (2009) 243-250 School of Engineering, Taylor s University College SIMULATION STUDY OF BLACKHOLE ATTACK IN THE MOBILE AD HOC NETWORKS SHEENU SHARMA

More information

Computer Networks Vs. Distributed Systems

Computer Networks Vs. Distributed Systems Computer Networks Vs. Distributed Systems Computer Networks: A computer network is an interconnected collection of autonomous computers able to exchange information. A computer network usually require

More information

Packetization and routing analysis of on-chip multiprocessor networks

Packetization and routing analysis of on-chip multiprocessor networks Journal of Systems Architecture 50 (2004) 81 104 www.elsevier.com/locate/sysarc Packetization and routing analysis of on-chip multiprocessor networks Terry Tao Ye a, *, Luca Benini b, Giovanni De Micheli

More information

An Active Packet can be classified as

An Active Packet can be classified as Mobile Agents for Active Network Management By Rumeel Kazi and Patricia Morreale Stevens Institute of Technology Contact: rkazi,pat@ati.stevens-tech.edu Abstract-Traditionally, network management systems

More information

System-Level Performance Analysis for Designing On-Chip Communication Architectures

System-Level Performance Analysis for Designing On-Chip Communication Architectures 768 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 20, NO. 6, JUNE 2001 System-Level Performance Analysis for Designing On-Chip Communication Architectures Kanishka

More information

LIN (Local Interconnect Network):

LIN (Local Interconnect Network): LIN (Local Interconnect Network): History: LIN (Local Interconnect Network) was developed as cost-effective alternate to CAN protocol. In 1998 a group of companies including Volvo, Motorola, Audi, BMW,

More information

Optical interconnection networks with time slot routing

Optical interconnection networks with time slot routing Theoretical and Applied Informatics ISSN 896 5 Vol. x 00x, no. x pp. x x Optical interconnection networks with time slot routing IRENEUSZ SZCZEŚNIAK AND ROMAN WYRZYKOWSKI a a Institute of Computer and

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)

More information

Interconnection Networks

Interconnection Networks CMPT765/408 08-1 Interconnection Networks Qianping Gu 1 Interconnection Networks The note is mainly based on Chapters 1, 2, and 4 of Interconnection Networks, An Engineering Approach by J. Duato, S. Yalamanchili,

More information

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems*

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems* A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems* Junho Jang, Saeyoung Han, Sungyong Park, and Jihoon Yang Department of Computer Science and Interdisciplinary Program

More information

The Quality of Internet Service: AT&T s Global IP Network Performance Measurements

The Quality of Internet Service: AT&T s Global IP Network Performance Measurements The Quality of Internet Service: AT&T s Global IP Network Performance Measurements In today's economy, corporations need to make the most of opportunities made possible by the Internet, while managing

More information

Chapter 2. Multiprocessors Interconnection Networks

Chapter 2. Multiprocessors Interconnection Networks Chapter 2 Multiprocessors Interconnection Networks 2.1 Taxonomy Interconnection Network Static Dynamic 1-D 2-D HC Bus-based Switch-based Single Multiple SS MS Crossbar 2.2 Bus-Based Dynamic Single Bus

More information

Flexible Deterministic Packet Marking: An IP Traceback Scheme Against DDOS Attacks

Flexible Deterministic Packet Marking: An IP Traceback Scheme Against DDOS Attacks Flexible Deterministic Packet Marking: An IP Traceback Scheme Against DDOS Attacks Prashil S. Waghmare PG student, Sinhgad College of Engineering, Vadgaon, Pune University, Maharashtra, India. prashil.waghmare14@gmail.com

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information Introduction Computer Network. Interconnected collection of autonomous computers that are able to exchange information No master/slave relationship between the computers in the network Data Communications.

More information

Real Time Bus Monitoring System by Sharing the Location Using Google Cloud Server Messaging

Real Time Bus Monitoring System by Sharing the Location Using Google Cloud Server Messaging Real Time Bus Monitoring System by Sharing the Location Using Google Cloud Server Messaging Aravind. P, Kalaiarasan.A 2, D. Rajini Girinath 3 PG Student, Dept. of CSE, Anand Institute of Higher Technology,

More information

Performance evaluation of TCP connections in ideal and non-ideal network environments

Performance evaluation of TCP connections in ideal and non-ideal network environments Computer Communications 24 2001) 1769±1779 www.elsevier.com/locate/comcom Performance evaluation of TCP connections in ideal and non-ideal network environments Hala ElAarag, Mostafa Bassiouni* School of

More information

Communications and Computer Networks

Communications and Computer Networks SFWR 4C03: Computer Networks and Computer Security January 5-8 2004 Lecturer: Kartik Krishnan Lectures 1-3 Communications and Computer Networks The fundamental purpose of a communication system is the

More information

Quality of Service Routing Network and Performance Evaluation*

Quality of Service Routing Network and Performance Evaluation* Quality of Service Routing Network and Performance Evaluation* Shen Lin, Cui Yong, Xu Ming-wei, and Xu Ke Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 {shenlin, cy, xmw,

More information

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory Computer Organization and Architecture Chapter 4 Cache Memory Characteristics of Memory Systems Note: Appendix 4A will not be covered in class, but the material is interesting reading and may be used in

More information

Scalable Source Routing

Scalable Source Routing Scalable Source Routing January 2010 Thomas Fuhrmann Department of Informatics, Self-Organizing Systems Group, Technical University Munich, Germany Routing in Networks You re there. I m here. Scalable

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Network Structure or Topology

Network Structure or Topology Volume 1, Issue 2, July 2013 International Journal of Advance Research in Computer Science and Management Studies Research Paper Available online at: www.ijarcsms.com Network Structure or Topology Kartik

More information

Assignment #3 Routing and Network Analysis. CIS3210 Computer Networks. University of Guelph

Assignment #3 Routing and Network Analysis. CIS3210 Computer Networks. University of Guelph Assignment #3 Routing and Network Analysis CIS3210 Computer Networks University of Guelph Part I Written (50%): 1. Given the network graph diagram above where the nodes represent routers and the weights

More information

A Review of Customized Dynamic Load Balancing for a Network of Workstations

A Review of Customized Dynamic Load Balancing for a Network of Workstations A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester

More information

A Routing Metric for Load-Balancing in Wireless Mesh Networks

A Routing Metric for Load-Balancing in Wireless Mesh Networks A Routing Metric for Load-Balancing in Wireless Mesh Networks Liang Ma and Mieso K. Denko Department of Computing and Information Science University of Guelph, Guelph, Ontario, Canada, N1G 2W1 email: {lma02;mdenko}@uoguelph.ca

More information

Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Communication Networks. MAP-TELE 2011/12 José Ruela

Communication Networks. MAP-TELE 2011/12 José Ruela Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)

More information

Building an Inexpensive Parallel Computer

Building an Inexpensive Parallel Computer Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University

More information

Asynchronous Bypass Channels

Asynchronous Bypass Channels Asynchronous Bypass Channels Improving Performance for Multi-Synchronous NoCs T. Jain, P. Gratz, A. Sprintson, G. Choi, Department of Electrical and Computer Engineering, Texas A&M University, USA Table

More information

CCNA R&S: Introduction to Networks. Chapter 5: Ethernet

CCNA R&S: Introduction to Networks. Chapter 5: Ethernet CCNA R&S: Introduction to Networks Chapter 5: Ethernet 5.0.1.1 Introduction The OSI physical layer provides the means to transport the bits that make up a data link layer frame across the network media.

More information

ADVANCED COMPUTER ARCHITECTURE: Parallelism, Scalability, Programmability

ADVANCED COMPUTER ARCHITECTURE: Parallelism, Scalability, Programmability ADVANCED COMPUTER ARCHITECTURE: Parallelism, Scalability, Programmability * Technische Hochschule Darmstadt FACHBEREiCH INTORMATIK Kai Hwang Professor of Electrical Engineering and Computer Science University

More information

Computer Networking: A Survey

Computer Networking: A Survey Computer Networking: A Survey M. Benaiah Deva Kumar and B. Deepa, 1 Scholar, 2 Assistant Professor, IT Department, Sri Krishna College of Arts and Science College, Coimbatore, India. Abstract- Computer

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

QoS issues in Voice over IP

QoS issues in Voice over IP COMP9333 Advance Computer Networks Mini Conference QoS issues in Voice over IP Student ID: 3058224 Student ID: 3043237 Student ID: 3036281 Student ID: 3025715 QoS issues in Voice over IP Abstract: This

More information

Dynamic Source Routing in Ad Hoc Wireless Networks

Dynamic Source Routing in Ad Hoc Wireless Networks Dynamic Source Routing in Ad Hoc Wireless Networks David B. Johnson David A. Maltz Computer Science Department Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213-3891 dbj@cs.cmu.edu Abstract

More information

Performance Evaluation of Mobile Agent-based Dynamic Load Balancing Algorithm

Performance Evaluation of Mobile Agent-based Dynamic Load Balancing Algorithm Performance Evaluation of Mobile -based Dynamic Load Balancing Algorithm MAGDY SAEB, CHERINE FATHY Computer Engineering Department Arab Academy for Science, Technology & Maritime Transport Alexandria,

More information

PCI Express* Ethernet Networking

PCI Express* Ethernet Networking White Paper Intel PRO Network Adapters Network Performance Network Connectivity Express* Ethernet Networking Express*, a new third-generation input/output (I/O) standard, allows enhanced Ethernet network

More information

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E) Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers

More information

Forced Low latency Handoff in Mobile Cellular Data Networks

Forced Low latency Handoff in Mobile Cellular Data Networks Forced Low latency Handoff in Mobile Cellular Data Networks N. Moayedian, Faramarz Hendessi Department of Electrical and Computer Engineering Isfahan University of Technology, Isfahan, IRAN Hendessi@cc.iut.ac.ir

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

Implementation of a Lightweight Service Advertisement and Discovery Protocol for Mobile Ad hoc Networks

Implementation of a Lightweight Service Advertisement and Discovery Protocol for Mobile Ad hoc Networks Implementation of a Lightweight Advertisement and Discovery Protocol for Mobile Ad hoc Networks Wenbin Ma * Department of Electrical and Computer Engineering 19 Memorial Drive West, Lehigh University Bethlehem,

More information

Chapter 18: Database System Architectures. Centralized Systems

Chapter 18: Database System Architectures. Centralized Systems Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

Chapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe

More information

Computer Networks. By Hardeep Singh

Computer Networks. By Hardeep Singh Computer Networks Contents Introduction Basic Elements of communication systemnetwork Topologies Network types Introduction A Computer network is a network of computers that are geographically distributed,

More information

Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols

Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols Purvi N. Ramanuj Department of Computer Engineering L.D. College of Engineering Ahmedabad Hiteishi M. Diwanji

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages Soohyun Yang and Yeonseung Ryu Department of Computer Engineering, Myongji University Yongin, Gyeonggi-do, Korea

More information

InfiniBand Clustering

InfiniBand Clustering White Paper InfiniBand Clustering Delivering Better Price/Performance than Ethernet 1.0 Introduction High performance computing clusters typically utilize Clos networks, more commonly known as Fat Tree

More information

A Reputation Replica Propagation Strategy for Mobile Users in Mobile Distributed Database System

A Reputation Replica Propagation Strategy for Mobile Users in Mobile Distributed Database System A Reputation Replica Propagation Strategy for Mobile Users in Mobile Distributed Database System Sashi Tarun Assistant Professor, Arni School of Computer Science and Application ARNI University, Kathgarh,

More information