Quality-of-service and error control techniques for mesh-based network-on-chip architectures


INTEGRATION, the VLSI journal 38 (2005)

Praveen Vellanki, Nilanjan Banerjee, Karam S. Chatha

Department of CSE, Arizona State University, P.O. BOX 87546, Tempe, AZ, USA

Received 16 June 2004; received in revised form 19 July 2004; accepted 21 July 2004

Abstract

Network-on-a-chip (NoC) has been proposed as a solution for addressing the design challenges of future high-performance system-on-chip architectures in the nanoscale regime. Many real-time applications require input data that arrives with low delay jitter. Such communication traffic can only be supported by incorporating multiple levels of service in the interconnection network. Further, as technology scales toward deep submicron, on-chip interconnects are becoming increasingly sensitive to noise sources such as power supply noise, crosstalk, and radiation-induced effects, which are likely to reduce the reliability of data. Hence, effective error control schemes are required for ensuring data integrity. This paper addresses two important aspects of NoC architectures, quality of service and error control schemes, and makes the following contributions: (i) it presents techniques for supporting guaranteed throughput (for low delay jitter traffic) and best-effort traffic quality levels in a NoC router, (ii) it presents architectures for integrating error control schemes in the NoC router architecture, and (iii) it presents cycle-accurate power and performance models of the two architecture enhancements for a mesh-based NoC architecture.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Network-on-chip; Quality-of-service; Error-control; Power consumption; Performance

Corresponding author: Department of Computer Science and Engineering, Arizona State University, Brickyard Suite 51, 699 South Mill Avenue, Tempe, AZ 85281, USA. E-mail addresses: pvellanki@asu.edu (P. Vellanki), nbanerjee@asu.edu (N. Banerjee), kchatha@asu.edu (K.S. Chatha).

doi:10.1016/j.vlsi

1. Introduction

The physical characteristics of nanoscale technologies will pose several challenges to system-on-chip (SoC) designers. Global signal delays will span multiple clock cycles [1,2]. Signal integrity will also be compromised due to increased RC effects, inductance, and cross-coupling capacitances [3]. Nanoscale packet-switched networks, or networks-on-chip (NoC), have been proposed as an architectural solution for SoC design in the nanoscale regime [4–8]. Packet switching supports asynchronous transfer of information. It provides extremely high bandwidth by distributing the propagation delay across multiple switches, thus pipelining the signal transmission. Packet-switching networks also support error detection and correction schemes that can be applied towards improving signal integrity. Quality of service (QoS) can be ensured by distinguishing between different types of traffic. In this paper, we present techniques for supporting multiple levels of service and error control schemes for a mesh-based NoC architecture.

Fig. 1 plots the variation in packet latency for destinations that are uniformly 3 hops away in a 4 × 4 mesh-based NoC architecture, for a router with 4 virtual channels at an injection rate of .5 packets/cycle/node. The x-axis denotes the packet latency, and the y-axis denotes the number of packets. The mean latency lies close to the peak of the plot. However, a large number of packets (214, over 5%) experience a transmission latency that is more than double the average latency. Such a large variance in latency is unacceptable for many NoC uses, such as traffic between a cache and lower-level memory, or between different processing elements of a multimedia application. We present techniques for supporting both low-jitter guaranteed throughput and best-effort traffic in a NoC router.
Cycle-accurate power and performance models for trade-off analysis of the two techniques are also presented.

Fig. 1. Variation in packet latency (x-axis: latency in cycles; y-axis: number of packets).

In the nanoscale regime, crosstalk on long global wires will be a major source of errors. Switching activity on aggressor links can cause errors by either forcing a logic transition on a stable victim link or by delaying the transition on a switching victim link. Both instances result in the capture of an incorrect logic level at the receiver. A number of error control schemes [9] have been proposed for general communication networks. In a NoC architecture, due to the stringent performance and power constraints, low-complexity and low-power error control schemes are desirable. Hence, we have implemented two low-overhead error control schemes: single error detection and retransmission (PAR), and single error correction (SEC). We also present the power and performance trade-offs of the two schemes under variable traffic profiles.

The trade-off between performance and power consumption of the interconnection network is a key question. The performance of the nanoscale interconnection network can be specified by the average latency of sending a message through the network, and by the bandwidth of the network. The power consumption of the network consists of the dynamic and leakage power consumption of its various components. This paper also presents results for power versus performance trade-off analysis for different service levels of traffic and error control schemes.

We integrated the QoS and error control schemes into a VHDL-based cycle-accurate power and performance model of the NoC architecture. The model is a parameterized register transfer level (RTL) design of the NoC architecture elements. The design is parameterized on (i) the size of packets, (ii) the length and width of physical links, (iii) the number and depth of virtual channels, and (iv) the switching technique. The model is annotated with delay, dynamic energy, and leakage energy estimates of the various components. It can estimate the latency, throughput, and dynamic and leakage power consumption of a NoC architecture. The RTL design for the QoS and error control circuitry was synthesized, and the SPICE-level netlist was extracted from the layout.
The design was then characterized for delay, and for dynamic and leakage power consumption, at 0.18 μm. The characterized values were integrated into the VHDL-based RTL design to build the cycle-accurate performance model.

The paper is organized as follows: Section 2 discusses previous work, Section 3 gives a quick overview of the NoC architecture and the cycle-accurate performance model, Section 4 discusses the QoS schemes, Section 5 discusses error control techniques, Section 6 discusses the packet format and protocol, Section 7 presents the experimental results, and Section 8 concludes the paper.

2. Previous work

In recent years a number of researchers have proposed architectures, performance evaluation techniques and optimization approaches for NoC. This section classifies and presents the existing research under four categories: seminal work, router architectures, performance models, and automated optimization approaches. Our paper discusses innovative router architectures for supporting guaranteed throughput and error control schemes in mesh-based on-chip interconnection networks, and presents power and performance evaluation models for the same. The work presented in our paper can be classified under both router architectures and performance models. Hence, in the following section we compare and contrast our work with existing techniques in both categories.

2.1. Seminal work

Guerrier et al. [4] presented a NoC design called SPIN that was based on a fat-tree topology. They also presented the router architecture and a cycle-accurate performance model for their NoC design. Sgroi et al. [5] discussed a platform-based SoC design methodology that proposed the inclusion of NoC for supporting on-chip communication. Dally et al. [6] demonstrated the feasibility of the NoC and estimated that the NoC places an area overhead of 6.6%. Benini et al. [7], in their conceptual paper on NoC, predict that packet-switched on-chip interconnection networks will be essential to address the complexity of future SoC designs. Kumar et al. [8] presented a conceptual system-level architecture that allowed a mesh-based NoC to accommodate large resources such as memory banks, FPGA areas, or high-performance multi-processors. Except for Guerrier et al. [4], none of the above works presented detailed architectures or performance models. We address [4] in more detail when we discuss NoC architectures and performance models.

2.2. NoC architectures

Several researchers have proposed architectures and related optimizations for on-chip interconnection networks. We classify the related research on NoC architectures based on the supported traffic service classes, error control schemes, and power optimizations.

2.2.1. Architectures for best effort traffic

In this paragraph we review the NoC architectures that support only the best effort traffic class. SPIN [4,10,11] was one of the seminal works to propose a detailed NoC architecture built with a fat-tree topology. Proteo [12,13] is a VSIA-compliant NoC architecture that can be configured for ring, star, and bus topologies. Xpipes [14] is a parameterized router architecture that can be utilized in arbitrary NoC topologies. As shown in Fig. 1, the best effort traffic class is limited by a large deviation from the average latency, which is not desirable for many real-time applications. In this paper we present a technique for supporting low-jitter guaranteed throughput traffic.

2.2.2. Architectures for guaranteed throughput traffic

Nostrum [15,16] is a protocol stack for a mesh-based NoC architecture that supports both best effort and guaranteed throughput traffic classes. Nostrum ensures bandwidth for guaranteed throughput traffic by reserving time slots, called looped containers, for its transmission on inter-router links. If no guaranteed throughput traffic is injected into the network, the time slots are not utilized. In contrast, we support guaranteed throughput traffic by reserving a certain number of virtual channels (buffers). Hence, if no guaranteed throughput traffic is injected into the network, the best effort traffic can be transported with maximum bandwidth. AEthereal [17,18] is also a mesh-based NoC architecture; it supports guaranteed throughput traffic by utilizing a centralized scheduler for allocation of link bandwidth. Our architecture utilizes a distributed scheme in which the traffic producer sets up a guaranteed throughput connection by reserving virtual channels, transfers the data, and then tears down the connection by giving up the virtual channels. Finally, neither of these two works presented detailed results for the performance and power consumption of their respective architectures.

2.2.3. Architectures with error control schemes

Bertozzi et al. [19] presented power versus performance results for point-to-point error control in an on-chip bus protocol based on the AMBA bus. Their work did not address NoC architectures, and did not consider the influence of network traffic on the performance of the error control schemes. Zimmer et al. [20] presented a fault model for NoC architectures. They also proposed a QoS scheme that treated control traffic with higher reliability than data traffic. In contrast, our paper presents a QoS scheme for guaranteed throughput and best-effort traffic. We also discuss the performance and power consumption of the error control schemes in the presence of variable traffic profiles for a mesh-based NoC architecture.

2.2.4. Architectural optimizations for low power

Worm et al. [21] proposed an adaptive low-power transmission scheme for NoC that minimized the voltage swing and frequency subject to the workload requirement. Chen et al. [22] proposed a power-aware buffer policy that minimized the leakage power consumption in virtual channels. Simunic et al. [23] proposed a system-level power reduction scheme for SoC architectures with on-chip interconnection networks. Their scheme applied dynamic voltage management and dynamic voltage scaling policies based on both local and global workload information. Our work is focused on architecture extensions and performance models for supporting guaranteed throughput and error control schemes.

2.3. Performance evaluation

Innovative performance evaluation models are required to address the design challenges of NoC-based interconnection architectures. Although there are a number of models for network performance evaluation [24–27], these models do not consider power consumption characteristics. Current system-level performance evaluation tools [28–30] are targeted towards shared bus architectures and do not consider interconnection networks.
Traditional solutions for on-chip global communication include models for various shared-bus [31–33] and ad hoc point-to-point interconnections. Wassal et al. [34] proposed system-level performance and power models for a shared-memory internet protocol/asynchronous transfer mode switching fabric. Ye et al. [35] analyzed the power consumption in the switch fabrics of network routers and proposed system-level models for the same. Pamunuwa et al. [36] performed a system-level analysis and estimated the wiring overhead and the gate count for implementing a mesh-based NoC architecture. They also estimated the power consumption by assuming switching activity on 5% of the gates. Wang et al. [37] proposed a power-performance simulator for interconnection networks called Orion. None of these models incorporates QoS or error control schemes. Bolotin et al. [38] proposed analytical models for system-level performance and cost estimation of NoC architectures. They did not address the power consumption in NoC.

2.4. Automated design techniques

In the recent past researchers have begun to address the problem of synthesizing custom NoC architectures and mapping communication traffic onto them. Pinto et al. [39] presented a quadratic programming based approach for synthesis of custom NoC architectures. Hu et al. [40] presented an integrated task and communication scheduling approach for mapping applications onto mesh-based NoC architectures. Murali et al. [41] presented a technique for bandwidth-constrained mapping of cores to mesh-based NoC architectures. As opposed to the synthesis techniques, this paper focuses on architectural extensions and performance models.

3. NoC architecture and characterization

In the following paragraphs we describe the architecture of the various NoC elements (physical links, routers), and the techniques applied for their characterization.

3.1. Physical links

The physical links include the data and control wires for communication between two router elements of the interconnection network.

3.1.1. Characterization of physical links

The power and performance of a physical link are determined by its width (number of bits of data and control signals), its length, and the capacitive load of the router. In nanoscale technologies, individual wires are modeled by distributed RLC expressions for accurate description of their physical characteristics [42]. The RLC and cross-coupling capacitances of the interconnection model were obtained from the Berkeley Predictive Technology Model website [43]. We characterized the links in sets of three wires, two wires, and a single wire, respectively, for 0.18 μm technology. The three- and two-wire sets included the distributed RLC effects and cross-coupling capacitances, while the single-wire model only included the distributed RLC effects. We considered three different types of links: local (≤ 1 mm), intermediate (> 1 mm and ≤ 4 mm), and global (> 4 mm) [1]. We obtained energy values for 64 (8 × 8), 16 (4 × 4) and 4 (2 × 2) different switching combinations for the three-, two- and single-wire sets, respectively.
The wire lengths were incremented in steps of 1 mm up to 1 mm, steps of 5 mm up to 4 mm, and steps of 1 mm up to 5 mm. Table 1 summarizes the switching energy consumed in 0.18 μm technology for three-wire-set switching for 1, 1 and 5 mm, respectively.

3.1.2. Performance evaluation of physical links

We included the link characterization values as a table in our performance model. The energy consumed by an n-bit wide link can be calculated from the energy consumed by the three-, two- and single-wire sets of similar length. For example, consider the 9-bit (odd) wide link shown on the left-hand side of Fig. 2. The total switching energy consumed by the link can be calculated by adding the switching energy consumed by the three-wire sets S0, S1, S2 and S3, and subtracting the energy consumed by the single wires A, B, and C, which are each shared by two neighbouring sets and would otherwise be counted twice. In the case of the 8-bit (even) wide link shown on the right-hand side of Fig. 2, S3 is a two-wire set, and its energy is included in the calculation in place of a fourth three-wire set. The length of the physical link, which is a major factor in determining its power consumption and performance, is specified by the designer.
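The wire-set bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`wire_sets`, `link_energy`, `toggle_count`) are hypothetical, the characterization tables are replaced by a toy lookup that charges 1 fJ per toggled wire, and the way the link is partitioned (three-wire sets overlapping by one wire, closed by a two-wire set when the width is even) is inferred from the 9-bit and 8-bit examples of Fig. 2.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical lookup type: energy (fJ) of a wire set switching old -> new.
EnergyFn = Callable[[Tuple[int, ...], Tuple[int, ...]], float]


def wire_sets(width: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a link into overlapping wire sets.

    Returns the (start, end) index range of each set and the indices of the
    wires shared by two neighbouring sets (A, B, C in Fig. 2).
    """
    if width < 3:
        raise ValueError("this sketch assumes at least three wires")
    sets = []
    start = 0
    while start + 3 <= width:          # three-wire sets, overlapping by one wire
        sets.append((start, start + 3))
        start += 2
    if start == width - 2:             # even width: close with a two-wire set
        sets.append((start, start + 2))
    shared = [s for s, _ in sets[1:]]  # first wire of every set but the first
    return sets, shared


def link_energy(old: Sequence[int], new: Sequence[int],
                set_energy: EnergyFn, wire_energy: EnergyFn) -> float:
    """Sum the set energies and subtract the doubly counted single wires."""
    sets, shared = wire_sets(len(old))
    total = sum(set_energy(tuple(old[a:b]), tuple(new[a:b])) for a, b in sets)
    total -= sum(wire_energy((old[i],), (new[i],)) for i in shared)
    return total


def toggle_count(old: Tuple[int, ...], new: Tuple[int, ...]) -> float:
    """Toy stand-in for the characterization table: 1 fJ per toggled wire."""
    return float(sum(o != n for o, n in zip(old, new)))
```

With the toy lookup, the result equals the number of toggled wires, which is a quick check that no wire is counted twice. In a real model, `set_energy` and `wire_energy` would index the characterized tables for the chosen wire length.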

Table 1. Three-wire-set characterization: switching energy (in fJ) at wire lengths of 1, 1 and 5 mm for each group of switching combinations.

Fig. 2. Performance evaluation of links. Left: odd number of links (three-wire sets S0, S1, S2, S3 sharing wires A, B, C). Right: even number of links (S3 is a two-wire set). Total Energy = E(S0) + E(S1) + E(S2) + E(S3) - E(A) - E(B) - E(C).

3.2. The NoC router

A router architecture that can be utilized in a 2D mesh topology is shown in Fig. 3. The router consists of five unit routers to communicate in the X-minus, X-plus, Y-minus, and Y-plus directions, and with the processor. Unit routers inside a single router are connected through a 5 × 5 crossbar. Data is transferred across routers, or between the processor and the corresponding router, by an asynchronous handshaking protocol. A single unit router is highlighted in the lower half of Fig. 3. It consists of input and output link controllers, virtual channels, a header decoder and an arbiter. Data arrives at an input virtual channel of a unit router from either the previous router or the processor connected to the same router. The header decoder decodes the header flit of the packet after receiving data from the input virtual channel, decides the packet's destination direction (X-, X+, Y-, Y+, processor), and sends a request to the arbiter of the unit router in the corresponding direction. Once the grant is received, the header decoder starts sending data from the input to the output virtual channel through the crossbar. The complete architecture and the detailed implementation can be found in [44].

Fig. 3. Router architecture.

We designed RTL models for each of the components separately. The larger components were characterized in terms of unit components such as a unit full adder, 2-bit comparator, 2:1 1-bit multiplexer, D flip-flop, and logic gates. SPICE netlists for 0.18 μm technology were extracted for each component and characterized for energy and performance (shown in Table 2). The power consumption of the entire router architecture is computed by including the characterized energy values as table lookups in the RTL model.

Table 2. Unit components

Component          Event                                Energy
Unit full adder    1-bit flip at the output             .96 pJ
                   2-bit flip at the output             .168 pJ
                   Input change but no output change    .552 pJ
                   Leakage                              .438 fJ
2-bit comparator   Output transition                    .15 pJ
                   Input change but no output change    .78 pJ
                   Leakage                              .77 fJ
2:1 multiplexer    Output transition                    .61 pJ
                   Input change but no output change    fJ
                   Leakage                              .13 fJ
D flip-flop        Output transition                    .189 pJ
                   Input change but no output change    .14 pJ
                   Leakage                              .34 fJ
NAND gate          Output transition                    .312 pJ
                   Input change but no output change    .117 fJ
                   Leakage                              .26 fJ
XOR gate           Output transition                    .675 pJ
                   Input change but no output change    .159 pJ
                   Leakage                              .126 fJ

4. Quality-of-service schemes

In this section we describe the QoS schemes that are supported by our architecture, and their performance and power characterization. The NoC architecture supports two levels of service: best effort (BE) and guaranteed throughput (GT). Each packet is divided into multiple flits. The flit is the unit of transfer between two routers. The packets are routed by a deterministic dimension-ordered source routing strategy.
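Dimension-ordered routing on a 2D mesh, in which a packet first travels along X until its x-offset is zero and then along Y, can be sketched as follows. This is a minimal illustration rather than the paper's RTL: the function name `xy_route`, the coordinate convention, and the representation of the route as a list of direction labels computed at the source are assumptions; only the direction names (X-, X+, Y-, Y+) come from the text.

```python
from typing import List, Tuple


def xy_route(src: Tuple[int, int], dst: Tuple[int, int]) -> List[str]:
    """Return the hop directions from src to dst, X dimension first."""
    (sx, sy), (dx, dy) = src, dst
    hops: List[str] = []
    # Travel in X until the x-offset is zero.
    hops += ["X+" if dx > sx else "X-"] * abs(dx - sx)
    # Only then travel in Y.
    hops += ["Y+" if dy > sy else "Y-"] * abs(dy - sy)
    return hops


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because every packet turns at most once, from the X dimension into the Y dimension, the cyclic channel dependencies that would cause deadlock cannot form, and packets between the same pair of nodes always follow the same path.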
This deadlock-free strategy first transmits the packet in the X-dimension until the x-offset is zero, and then transmits it in the Y-dimension. Both service levels ensure guaranteed and in-order delivery of packets. In the following paragraphs we first describe the BE service level, and then the GT service level.

4.1. Best effort traffic service level

BE packets are injected by the processor from the input queue into the input virtual channel of the router if the channel is not full. The processor checks the full signal before injecting the packet. Inside the network, the same strategy is followed to transmit each flit of the packet from the output virtual channel of one router to the input virtual channel of the neighboring router. Such a transmission strategy acts as an explicit hop-to-hop flow control mechanism and, together with the dimension-ordered routing, ensures guaranteed and in-order delivery of packets. There is a round robin priority based scheduling mechanism for each of the following tasks:

- Selection of an input virtual channel by the header decoder.
- Selection of an output virtual channel by the arbiter.
- Grant of the crossbar to the header decoder by the arbiter.
- Selection of the output virtual channel by the link controller.

In all the above decision mechanisms the scheduler is invoked (i) if the packet is partially transmitted and blocked, or (ii) after complete transmission of each packet. Since all the packets are of the same size, the BE round robin priority scheme approximates the theoretically optimal, work-conserving generalized processor sharing (GPS). The GPS scheme provides fair allocation of link bandwidth to all the packets.

4.2. Guaranteed throughput traffic service level

Many applications exhibit bursty traffic that must be transmitted from source to destination with a required throughput and low jitter. Examples are traffic between a cache and lower-level memory, or between various processing blocks of a multimedia processing engine. As demonstrated in Fig. 1, the BE traffic service level is unable to support the desired QoS. We support guaranteed throughput traffic by dividing the virtual channels between the GT and BE service levels. The number of virtual channels assigned to each service level is a design parameter that is specified by the designer. In the case of heavy network load the GT traffic can be transmitted on the BE virtual channels, but not vice versa.
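A round robin scheduler over virtual channels, in which channels carrying GT traffic are served before BE channels and channels within each class get equal priority, might look like the sketch below. It is a behavioural illustration only: the class name `PriorityRoundRobin`, the convention that the first `num_gt` channel indices are the GT channels, and the use of a set of requesting channel ids are assumptions, and the sketch does not model GT traffic borrowing BE channels under heavy load.

```python
from typing import List, Optional, Set


class PriorityRoundRobin:
    """Round robin within each class, GT class always served first."""

    def __init__(self, num_channels: int, num_gt: int) -> None:
        # Assumption: channels 0..num_gt-1 are GT, the rest are BE.
        self.groups: List[List[int]] = [
            list(range(num_gt)),
            list(range(num_gt, num_channels)),
        ]
        self.next_index = [0, 0]  # round robin pointer per class

    def grant(self, requesting: Set[int]) -> Optional[int]:
        """Pick the next requesting channel, or None if nobody requests."""
        for g, group in enumerate(self.groups):
            n = len(group)
            for i in range(n):
                idx = (self.next_index[g] + i) % n
                if group[idx] in requesting:
                    # Advance the pointer past the winner for fairness.
                    self.next_index[g] = (idx + 1) % n
                    return group[idx]
        return None
```

For a router with 4 virtual channels and 2 of them allocated to GT, `PriorityRoundRobin(4, 2)` alternates between channels 2 and 3 while only BE requests are present, and switches to channel 0 or 1 as soon as a GT request appears.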
The round robin service mechanism is modified to give priority to the GT traffic over the BE traffic. Within each of the two service levels, every virtual channel gets equal priority. The GT traffic is always transmitted as a stream of packets with a designer-specified fixed size. At the processor, the GT packets are queued until the stream size is reached. Once the desired stream size is reached, the GT protocol performs the following three steps: connection set-up, transmission, and tear-down. In the connection set-up, the virtual channels are reserved for the stream all the way from the source to the destination. The connection set-up stage might take a variable amount of time depending on the network load. Once the connection is set up, the stream can be transmitted with maximum throughput. After the entire stream has been transmitted, the reserved virtual channels are set free by the tear-down step. Since the GT traffic is always transmitted as a stream with maximum throughput, under-utilization of resources is prevented. Further, since the GT traffic is transmitted in discrete streams of fixed size, starvation of other GT traffic is also prevented. As the GT traffic can utilize virtual channels that are allocated for BE traffic, there is a possibility of starvation of BE traffic at high injection rates. However, as the experimental results will demonstrate, the starvation can be easily avoided by limiting the ratio of GT/BE traffic to be around .25 for a router with 4 virtual channels (two virtual channels allocated to GT). This is not unrealistic, as only a small portion of the total network traffic is expected to be supported on the GT traffic class.

4.3. Architecture and characterization for QoS schemes

The basic router [44], which supports only the BE service level, has been enhanced to support multiple levels of service as shown in Fig. 3. The round robin priority based scheduling units present in the header decoder, arbiter and output link controller have been modified to give priority to channels transferring GT traffic. For instance, if there are N virtual channels per node in the router, and K of these N channels have been allocated to transmit GT traffic, then the schedulers assign priority to these K channels to transfer data. If GT traffic is not present, then BE packets are allocated resources in a round robin manner. The energy model for the modified architecture is implemented utilizing the unit components shown in Table 2.

5. Error control schemes

In the nanoscale regime, crosstalk in long global communication wires is expected to be the major source of errors. In this paper, we focus on the crosstalk errors in the links between the routers. The error control schemes are incorporated into the output and input link controllers. The output link controller includes the encoder, and the input link controller includes the corresponding decoder. Due to the strict constraints on latency and power consumption, we have implemented low-overhead error control schemes. The two schemes that we implemented are PAR and SEC.

Single error detection and retransmission (PAR): The basic single-bit parity check method is used to detect the error, and re-transmission of data is requested in the presence of an error. The main idea behind this scheme is to enable error recovery based on re-transmission.
The hardware overhead is negligible since it requires only one extra bit of information per flit of data transfer. However, the latency per packet increases in the case of retransmission.

Single error correction (SEC): The basic (15,11) Hamming code [9] implementation with single error correction capability is utilized for this scheme. The decoder present in the input virtual channel controller of a router is more complex than the encoder at the output virtual channel controller, because of the correction circuitry. The hop-to-hop transmission of 11 bits of data requires 4 additional check bits.

5.1. Architecture and characterization of error control schemes

In our architecture, we have primarily concentrated on modification of the link controllers to incorporate the error model, as shown in Fig. 3. The data is encoded at the output link controller and is subsequently decoded at the input link controller before progressing through the next router towards its destination. This hop-based error detection and correction mechanism allows strong error control. The functionality and characterization of the link controllers are described below.
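The two codes named above, a single parity bit per flit (PAR) and a (15,11) Hamming code (SEC), can be illustrated with the following Python sketch. It is not the paper's encoder or decoder circuitry: the function names are hypothetical, even parity is assumed for PAR, and the textbook Hamming layout with check bits at positions 1, 2, 4 and 8 is assumed because the source does not state a bit ordering.

```python
from typing import List, Sequence, Tuple

CHECK_POS = (1, 2, 4, 8)                                      # assumed layout
DATA_POS = [p for p in range(1, 16) if p not in CHECK_POS]    # 11 data positions


def parity_bit(bits: Sequence[int]) -> int:
    """Even parity over the data bits of a flit (PAR encoder)."""
    p = 0
    for b in bits:
        p ^= b
    return p


def par_ok(bits: Sequence[int], parity: int) -> bool:
    """PAR decoder: False means an error was detected, so retransmit."""
    return parity_bit(bits) == parity


def hamming_encode(data: Sequence[int]) -> List[int]:
    """Encode 11 data bits into a 15-bit codeword (4 check bits)."""
    if len(data) != 11:
        raise ValueError("expected 11 data bits")
    code = [0] * 16                      # index 0 unused, positions 1..15
    for pos, bit in zip(DATA_POS, data):
        code[pos] = bit
    for c in CHECK_POS:                  # each check bit covers positions with bit c set
        for pos in range(1, 16):
            if pos != c and pos & c:
                code[c] ^= code[pos]
    return code[1:]


def hamming_decode(word: Sequence[int]) -> Tuple[List[int], int]:
    """Correct up to one flipped bit; return (data bits, syndrome)."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 16):
        if code[pos]:
            syndrome ^= pos              # non-zero syndrome = position of the error
    if syndrome:
        code[syndrome] ^= 1
    return [code[pos] for pos in DATA_POS], syndrome
```

The sketch makes the overhead difference visible: PAR adds one bit and can only flag an odd number of bit errors, whereas SEC adds four check bits to every 11 data bits and locates a single flipped bit directly from the syndrome.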

Single error detection and retransmission (PAR): This scheme is implemented as shown in Fig. 4. The input link controller has 2 states, S0 and S1. S0 represents the idle state, in which the state machine waits for a req from the output virtual channel of the neighboring router. Once it receives a req from the output virtual channel, it checks the output of the parallel error detection circuitry. In the absence of an error, it goes to S1, raising the ack signal and also the write signal (to its own input FIFO). In state S1, it lowers the write signal and stays in this state as long as the req signal remains high. Once the req signal is lowered, it returns to S0, lowering the ack signal in the transition. In the presence of an error, it also shifts to state S1, raising the ack signal to the previous output link controller. However, it keeps the write signal low in this case and waits for the req signal to go low before shifting back to S0, while raising the re-transmit signal. The output link controller has a complementary state sequence, as shown in Fig. 5. The characterized energy values for both link controllers are also shown in Figs. 4 and 5.

Fig. 4. Input link controller.

Fig. 5. Output link controller (ivc = input virtual channel, ovc = output virtual channel).

Single error correction (SEC): This scheme is similar to the above scheme, with the controllers having 2 states each. The difference lies in state S0 of the input link controller, where an error detection leads to a subsequent correction of the error before shifting to state S1. The characterized values of both link controllers are similar to those shown in Figs. 4 and 5. We characterize the PAR and the SEC circuitry in terms of unit XOR gates (energy values shown in Table 2).

5.2. Error generation model

Hegde et al. [45] developed a model that treats noise from various sources in CMOS circuitry as a Gaussian source. The model has been applied towards error estimation in SoC architectures [19,46]. In the model, it is assumed that the gate input is in error when the noise voltage $V_N$ exceeds the gate decision threshold voltage $V_{th}$, which is defined as

$$V_{th} = \frac{V_{dd}}{2}.$$

The model assumes that a signaling waveform has a certain noise $V_N$ added to it, and that $V_N$ has a normal distribution with a variance of $\sigma_N^2$ and a mean of 0. The probability of error is given by

$$Q\!\left(\frac{V_{dd}}{2\sigma_N}\right), \quad \text{where} \quad Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy$$

is the Gaussian pulse. We utilize the above model to generate errors in the individual wires of the NoC links.

6. Packet format and protocol

The message is partitioned into fixed-length packets that are in turn broken down into flits for efficient data transfer. A packet consists of three kinds of flits: the header flit, the data flit and the tail flit, which are differentiated by two bits of control information. The header flit contains the destination router (X,Y) of the packet. The header flit also contains one additional bit to indicate whether it is a best effort or a guaranteed throughput packet.

7. Results

We performed design space exploration and performance versus power trade-off analysis for a 4 × 4 mesh topology of a NoC-based interconnection network. Each unit router consisted of 4 virtual channels, with 2 channels each allocated to the GT and BE traffic service levels. The physical channels supported unidirectional communication with both data and control bits.
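As an aside on the Gaussian noise model of Hegde et al. [45] that is used to generate link errors, the per-wire error probability and a simple error injector can be sketched as follows. The function names and the example voltages are hypothetical and are not taken from the paper; the only relations used are the error probability $Q(V_{dd}/(2\sigma_N))$ and the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$.

```python
import math
import random
from typing import List, Sequence


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_probability(vdd: float, sigma_n: float) -> float:
    """Probability that the noise exceeds the threshold Vdd / 2."""
    return q_function(vdd / (2.0 * sigma_n))


def inject_errors(bits: Sequence[int], p_err: float,
                  rng: random.Random) -> List[int]:
    """Flip each wire independently with probability p_err."""
    return [b ^ 1 if rng.random() < p_err else b for b in bits]


if __name__ == "__main__":
    # Illustrative values only: 1.8 V supply, 0.3 V noise standard deviation.
    p = bit_error_probability(1.8, 0.3)
    print(p, inject_errors([0, 1, 1, 0, 1], p, random.Random(1)))
```

Treating every wire as an independent Bernoulli trial is a simplifying assumption of this sketch; it matches the statement that errors are generated in the individual wires of the links, but it does not model correlated crosstalk between neighbouring wires.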
The International Technology Roadmap for Semiconductors (ITRS) predicts that in the future the die size for high-end SoC architectures will be around 22 mm × 22 mm; Kumar et al. [8] have made similar predictions. Hence, we assume a chip dimension of 2 mm × 2 mm and consider the inter-router links to be 4.5 mm. In our experiments, the simulator generated two varieties of traffic to random destinations: uniformly distributed traffic and Poisson distributed traffic. The traffic was injected through the 16 processors by utilizing a uniform/Poisson distribution over a designer specified

time interval. In our architecture, due to the asynchronous communication protocol, it takes two clock cycles to transfer each flit. The network was allowed to stabilize for the first 1 cycles, after which it was run for 1, clock cycles. At the end of 1, clock cycles the total number of packets reaching the destination, their acceptance rate, and latencies were calculated. The acceptance rate is the number of packets received at the destination per cycle per node. The average dynamic and leakage power consumption of the various components was also calculated over 1, clock cycles. The clock width was assumed to be 3 ns.

In the following plots, we distinguish between queue and network latency. The queue latency denotes the time spent by the packet in the source queue after its generation and before its injection into the network. The network latency denotes the time required by the packet to travel from source to destination. The total latency of the packet is the sum of the queue and network latencies. Additionally, for the GT traffic packets, we consider the set-up latency, which is the time required to reserve the virtual channels from source to destination.

The BE packets were assumed to consist of 5 flits. The GT packets also consisted of 5 flits, and the GT stream was assumed to be 15 packets long. At a particular injection rate, the numbers of GT and BE packets to be generated are specified as a ratio r = GT/BE. The queue latency of the GT traffic is calculated as the difference between the time when the total stream has been generated and the time when the stream is injected into the network.

Evaluation of QoS schemes

Fig. 6 plots the variation in network latencies of GT and BE traffic when the destination is 3 hops away from the source at an injection rate of .5 packets/cycle/node.
While the BE traffic experiences a wide spectrum of network latencies, the GT traffic latency spectrum has a sharp spike. This plot validates that our router is able to provide guaranteed low-jitter latency for GT traffic transmission.

Fig. 6. Spectrum for BE/GT (number of packets versus latency in cycles).

Figs. 7 and 8 plot the network latency of the BE and GT traffic as the injection rate is varied from .25 to .1, and r is varied from .25 to 1. As can be observed from the plots, for all values of r, the average network latency of the BE traffic increases as the injection rate is increased. The average BE network latency also increases with increasing r, since GT traffic is given priority over BE traffic. The average network latency for the GT traffic, on the other hand, remains almost constant.

Fig. 7. Network latency for BE.
Fig. 8. Network latency for GT.

Figs. 9 and 10 plot the variation in queue latency of BE and GT traffic, respectively. The queue latency for BE traffic increases dramatically with rising injection rate and r. This observation is supported by the BE acceptance rate plot shown in Fig. 12. It should be noted that the BE queue latency soars to around 3 clock cycles for r = 1 and injection rate .1. The queue latency for GT traffic remains negligible at lower injection rates and low values of r, since fewer GT packets are generated by the processors and the resources can serve them without congestion. However, for higher values of r combined with higher injection rates we observe a considerable increase in queue latency because of heavy contention among GT traffic.

Fig. 9. Queue latency for BE.
Fig. 10. Queue latency for GT.
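The latency and acceptance-rate definitions behind these plots can be sketched in Python as follows. This is a minimal per-packet bookkeeping sketch, not the authors' simulator: the `PacketRecord` type and its field names are hypothetical, and the GT queue latency in the paper is measured per stream rather than per packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    generated_at: int  # cycle at which the source created the packet
    injected_at: int   # cycle at which the packet left the source queue
    received_at: int   # cycle at which the packet reached its destination

    @property
    def queue_latency(self) -> int:
        """Time spent in the source queue before injection."""
        return self.injected_at - self.generated_at

    @property
    def network_latency(self) -> int:
        """Time spent travelling from source to destination."""
        return self.received_at - self.injected_at

    @property
    def total_latency(self) -> int:
        """Sum of queue and network latency."""
        return self.queue_latency + self.network_latency


def acceptance_rate(num_received: int, cycles: int, nodes: int) -> float:
    """Packets received at the destinations per cycle per node."""
    if cycles <= 0 or nodes <= 0:
        raise ValueError("cycles and nodes must be positive")
    return num_received / (cycles * nodes)
```

For instance, a packet generated at cycle 10, injected at cycle 25, and received at cycle 60 has a queue latency of 15 cycles, a network latency of 35 cycles, and a total latency of 50 cycles.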

Fig. 11 plots the variation in connection set-up latency for the GT traffic. The set-up latency increases with both the injection rate and the ratio r. The increase has the smallest slope for r = .25.

Figs. 12-14 plot the acceptance rates for BE, GT, and combined traffic, respectively. As can be seen from the plots, at a particular r value the BE acceptance rate initially increases with the injection rate, peaks at an injection rate of around .5, and then falls. The acceptance rate for GT traffic, however, increases linearly with the injection rate and r. The priority given to GT traffic over BE traffic helps explain the variation in BE acceptance rate. The combined network acceptance rate rises linearly with the injection rate until the network becomes congested, and remains constant thereafter.

Fig. 11. Setup latency for GT.
Fig. 12. Acceptance rate BE.

Fig. 13. Acceptance rate GT.
Fig. 14. Acceptance rate BE and GT.

Figs. 15 and 16 plot the average dynamic and leakage power of the NoC, respectively, as the injection rate and r are varied. The dynamic power consumption closely follows the combined BE and GT acceptance rate plot shown in Fig. 14: at higher acceptance rates the dynamic power consumption is high, and vice versa. The peaks in the dynamic power consumption plots are also mirrored by troughs in leakage power consumption, and vice versa. The virtual channel buffers are the main contributors to both dynamic and leakage power consumption in the NoC.

Fig. 15. Dynamic power BE/GT.
Fig. 16. Leakage power BE/GT.

Fig. 17 plots the power consumed by the buffers at .5 injection rate. The power consumption of the GT virtual channel buffers increases as the GT/BE ratio increases from .25 to 1, since the utilization of the GT virtual channels increases with increasing values of r. However, for a GT/BE ratio of .5 the power consumption of the BE virtual channel buffers is higher than that of the GT virtual channel buffers. This is observed since the BE virtual channels can be used to transfer GT traffic but not vice versa.

The power consumption of the individual components of the router network at an injection rate of .5, for the different values of r, is shown in Fig. 18. The plots show that the virtual channel buffers are the dominant consumers of total power. It can also be observed that the header decoders, arbiter, and the link controllers contribute significantly to the total power consumption.

Fig. 17. FIFO power BE/GT (series: Fifo_BE, Fifo_GT).
Fig. 18. Component power (components: FIFO, header decoder, arbiter, crossbar, virtual controllers, links).

Figs. 19-29 show similar plots for the router network under Poisson traffic distribution. It should be noted that the latencies, acceptance rates, and power consumption for the Poisson traffic model are very similar to those of the uniform random traffic model. This indicates that our router design can effectively support both kinds of traffic profiles.
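The two synthetic traffic profiles compared here can be sketched in Python as follows. The paper states only that traffic goes to random destinations and is injected by the 16 processors using a uniform or Poisson distribution over a designer-specified interval; the helper names, the exclusion of the source node as a destination, and the use of exponential inter-arrival gaps to obtain Poisson arrivals are assumptions of this sketch.

```python
import random
from typing import List, Tuple

MESH = 4  # 4 x 4 mesh, i.e. 16 nodes


def random_destination(src: Tuple[int, int], rng: random.Random) -> Tuple[int, int]:
    """Pick a destination uniformly among all nodes other than the source (assumed)."""
    nodes = [(x, y) for x in range(MESH) for y in range(MESH) if (x, y) != src]
    return rng.choice(nodes)


def uniform_injection_times(count: int, interval: int, rng: random.Random) -> List[int]:
    """Place `count` packet injections uniformly at random within the interval."""
    return sorted(rng.randrange(interval) for _ in range(count))


def poisson_injection_times(rate: float, interval: int, rng: random.Random) -> List[int]:
    """Poisson arrivals: exponential inter-arrival gaps with a mean of 1/rate cycles."""
    times: List[int] = []
    t = rng.expovariate(rate)
    while t < interval:
        times.append(int(t))
        t += rng.expovariate(rate)
    return times
```

Seeding `random.Random` makes a run repeatable, which helps when comparing the two traffic models at the same nominal injection rate.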

Fig. 19. Network latency for BE (Poisson).
Fig. 20. Network latency for GT (Poisson).
Fig. 21. Queue latency for BE (Poisson).
Fig. 22. Queue latency for GT (Poisson).

The following conclusion can be inferred from the extensive experimentation performed with our router architecture supporting multiple levels of service: for a low value of r = .25, the GT traffic experiences almost zero queue latency and a low set-up latency. The acceptance rate of BE traffic is also high in this case. Hence, a low value of r (around .25) should be utilized when designing a NoC with GT and BE traffic service levels.

Evaluation of error control schemes

We characterized the NoC for 0.18 µm technology, and consider $V_{dd} = 1.8$ V. We evaluated the performance of the error control schemes by setting the noise voltage standard deviation, $\sigma_N$, to .5 V [45] and .36 V, respectively. The corresponding bit error rate is .35 (high, H in plots) and .63 (low, L in plots), respectively. The ratio of GT/BE packets generated, r, has been taken to be .25. Fig. 30 plots the overall acceptance rate of the NoC under low and high error rates using both the PAR and SEC error control schemes. The acceptance rate for the PAR scheme is lower than that for the SEC scheme at higher injection rates because of the latency involved in retransmission. At lower injection rates, the difference in acceptance rates between the two schemes diminishes because there is less traffic in the network.
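The Gaussian error model used to derive the bit error rates can be evaluated numerically with the complementary error function, using the identity Q(x) = erfc(x/√2)/2. The Python sketch below is illustrative only: the function names and the independent per-wire bit-flip helper `inject_errors` are assumptions, not the authors' simulator code. As a worked example, with V_dd = 1.8 V a σ_N of 0.5 V gives a probability of about 0.036, and a σ_N of 0.36 V gives about 0.0062.

```python
import random
from math import erfc, sqrt
from typing import List


def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution, Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def bit_error_probability(vdd: float, sigma_n: float) -> float:
    """Probability that zero-mean Gaussian noise exceeds the threshold V_dd / 2."""
    if sigma_n <= 0:
        raise ValueError("sigma_n must be positive")
    return q_function(vdd / (2.0 * sigma_n))


def inject_errors(bits: List[int], ber: float, rng: random.Random) -> List[int]:
    """Flip each wire's bit independently with probability `ber` (assumed model)."""
    return [b ^ int(rng.random() < ber) for b in bits]
```

A simulator could call `inject_errors` once per flit transfer on each link, so that both the PAR and SEC schemes see the same error statistics.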

Fig. 23. Setup latency for GT (Poisson).
Fig. 24. Acceptance rate BE (Poisson).
Fig. 25. Acceptance rate GT (Poisson).
Fig. 26. Acceptance rate BE and GT (Poisson).

Fig. 31 plots the network latencies under various injection and bit error rates. The network latency is always higher for the PAR scheme due to the retransmission delay. This is reinforced by the overall acceptance plot in Fig. 30. The average latency is higher at high bit error rates because more flits are corrupted and hence retransmitted.

Fig. 32 shows the network power consumption for low and high error rates using both the PAR and SEC schemes. At high injection rates, the SEC power consumption is higher than that of PAR because of the higher acceptance rates achieved by SEC. At low injection and low bit error rates, the power consumption of the SEC scheme is almost equal to that of the PAR scheme. However, the area consumed by the PAR implementation is lower than that of the SEC scheme, making it an attractive technique for error control in this case. For all other cases ({low bit error rate, high injection rate}, {high bit error rate, low injection rate}, {high bit error rate, high injection rate}), SEC is the preferred choice due to its high acceptance rates. Moreover, for the low injection and high bit error case, the power consumed by the retransmission circuitry offsets the power consumed by error correction. The results for the error control schemes are summarized in Table 3, which shows the appropriate error control scheme under different injection and bit error rates.

Fig. 33 shows the leakage power consumption for low and high error rates using both the PAR and SEC schemes. Leakage power consumption is higher in the PAR scheme than in the SEC scheme since its dynamic power consumption is lower, and vice versa.
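The difference between the two schemes compared above, detection with retransmission (PAR) versus in-place correction (SEC), can be illustrated with a small Python sketch built from XOR operations. The specific codes are assumptions: a single even-parity bit stands in for PAR, and a Hamming(7,4) code stands in for SEC, since this section does not state which single-error-correcting code the router uses. All function names are hypothetical.

```python
from typing import List, Tuple


def parity_encode(bits: List[int]) -> List[int]:
    """Append one even-parity bit (an XOR over all data bits)."""
    p = 0
    for b in bits:
        p ^= b
    return bits + [p]


def parity_ok(word: List[int]) -> bool:
    """True when the XOR of all bits is zero; False would trigger a retransmission."""
    acc = 0
    for b in word:
        acc ^= b
    return acc == 0


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> Tuple[List[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # the syndrome names the flipped position
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos
```

The parity check can only report that a word is corrupted, so the receiver has to request the flit again, whereas the Hamming decoder repairs a single flipped bit locally at the cost of more check bits and XOR logic.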

Fig. 27. Dynamic power BE and GT (Poisson).
Fig. 28. Leakage power BE/GT (Poisson).
Fig. 29. Component power (Poisson) (components: FIFO, header decoder, arbiter, crossbar, virtual controllers, links).
Fig. 30. Acceptance rate PAR/SEC (series: SEC_H, Parity_H, SEC_L, Parity_L; acceptance rate in packets/cycle/node versus injection rate).
Fig. 31. Network latency PAR/SEC (series: SEC_H, Parity_H, SEC_L, Parity_L; latency in cycles versus injection rate).
Fig. 32. Dynamic power PAR/SEC (series: SEC_H, Parity_H, SEC_L, Parity_L; power in mW versus injection rate).

8. Conclusion

In this paper, we presented a cycle-accurate performance and power evaluation model for BE and GT traffic with error correction/detection on a mesh-based NoC. We presented results for extensive design space exploration and performance versus power trade-off analysis of a 4 × 4 mesh architecture. The experimental results were presented for both uniform and Poisson traffic distributions. The results demonstrated that our architecture is able to provide excellent support for both GT and BE traffic schemes as long as the GT/BE traffic ratio is around .25. On the basis of their performance and power consumption characteristics, it was also shown that the PAR (single error control) scheme is better than the SEC (single error correction) scheme at low injection and low error rates. In all other circumstances the SEC scheme gives better performance.

The current version of the model is limited to mesh-based topologies supporting deterministic routing schemes and synthetically generated traffic. Future work will address developing router architectures and related power and performance models for generic topologies. Adaptive routing schemes will also be explored. Finally, design space exploration will be performed with communication traces of realistic benchmark applications that are mapped to NoC architectures.

Table 3. Summary of error control schemes

                        Injection rate (low)    Injection rate (high)
Bit error rate (low)    PAR                     SEC
Bit error rate (high)   SEC                     SEC

Fig. 33. Leakage power PAR/SEC (series: SEC_H, Parity_H, SEC_L, Parity_L; power in mW versus injection rate in packets/cycle/node).

References

[1] D. Sylvester, K. Keutzer, A global wiring paradigm for deep submicron design, IEEE Trans. Comput. Aided Design Integrated Circuits Systems (2)
[2] R. Ho, K. Mai, M. Horowitz, The future of wires, Proc. IEEE (21)
[3] J. Davis, D. Meindl, Compact distributed RLC interconnect models Part II: coupled line transient expressions and peak crosstalk in multilevel networks, IEEE Trans. Electron Devices 47 (11) (2)
[4] P. Guerrier, A. Greiner, A generic architecture for on-chip packet-switched interconnections, in: DATE, Paris, France, March 2.
[5] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, A. Sangiovanni-Vincentelli, Addressing the system-on-a-chip interconnect woes through communication-based design, in: Proceedings of Design Automation Conference, June 21, pp
[6] W.J. Dally, B. Towles, Route packets, not wires: on-chip interconnection networks, in: Proceedings of DAC, June 22.
[7] L. Benini, G. De Micheli, Networks on chips: a new SoC paradigm, IEEE Comput. (22) 7 78.

[8] S. Kumar, A. Jantsch, M. Millberg, J. Oberg, J.P. Soininen, M. Forsell, K.T.A. Hemani, A network on chip architecture and design methodology, in: IEEE Computer Society Annual Symposium on VLSI, Pittsburgh, Pennsylvania, April 22.
[9] S. Lin, D.J. Costello, Error Control Coding: Fundamentals and Applications, Prentice-Hall, Englewood Cliffs, NJ,
[10] A. Andriahantenaina, A. Greiner, Micro-network for SoC: implementation of a 32-port SPIN network, in: DATE, Munich, Germany, March 23.
[11] A. Andriahantenaina, H. Charlery, A. Greiner, L. Mortiez, C.A. Zeferino, SPIN: a scalable, packet switched, on-chip micro-network, in: DATE, Munich, Germany, March 23.
[12] D. Siguenza-Tortosa, J. Nurmi, Proteo: a new approach to network-on-chip, in: Proceedings of IASTED International Conference on Communication Systems and Network, Malaga, Spain, 22.
[13] D. Siguenza-Tortosa, J. Nurmi, VHDL-based simulation environment for Proteo NoC, in: High-Level Design Validation and Test Workshop, Paris, France, October 22.
[14] M. Dall Osso, G. Biccari, L. Giovanninni, D. Bertozzi, L. Benini, Xpipes: a latency insensitive parameterized network-on-chip architecture for multi-processor SoCs, in: Proceedings of ICCD, San Jose, CA, October 23.
[15] M. Millberg, E. Nilsson, R. Thid, S. Kumar, A. Jantsch, The Nostrum backbone: a communication protocol stack for networks on chip, in: VLSI Design Conference, Mumbai, India, January 24.
[16] M. Millberg, E. Nilsson, R. Thid, A. Jantsch, Guaranteed bandwidth using looped containers in temporally disjoint networks within the Nostrum network on chip, in: DATE, February 24, pp
[17] J. Dielissen, A. Radulescu, K. Goossens, E. Rijpkema, Concepts and implementation of the Philips network-on-chip, in: IP-Based SOC Design, November 23.
[18] E. Rijpkema, K.G.W. Goossens, A. Radulescu, Trade offs in the design of a router with both guaranteed and best-effort services for networks on chip, in: DATE, 24.
[19] D. Bertozzi, L. Benini, G. De Micheli, Low power error resilient encoding for on-chip data buses, in: DATE, 23.
[20] H. Zimmer, A. Jantsch, A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip, in: ISSS/CODES, 23.
[21] F. Worm, P. Ienne, P. Thiran, G. De Micheli, An adaptive low-power transmission scheme for on-chip networks, in: Proceedings of ISSS, Kyoto, Japan, 22.
[22] X. Chen, L.-S. Peh, Leakage power modeling and optimization in interconnection networks, in: Proceedings of ISLPED, Seoul, Korea, 23.
[23] T. Simunic, S. Boyd, Managing power consumption in networks on chips, in: Proceedings of DATE, Paris, France, 22.
[24] J. Duato, S. Yalamanchili, L. Ni, Interconnection networks, an engineering approach, IEEE Computer Society,
[25] H.J. Siegel, A model of SIMD machines and a comparison of various interconnection networks, IEEE Trans. Comput. 28 (12) (1979)
[26] W.J. Dally, Performance analysis of k-ary n-cube interconnection network, IEEE Trans. Comput. 39 (6) (199)
[27] J.F. Draper, J. Ghosh, A comprehensive analytical model for wormhole routing in multicomputer systems, J. Parallel Distributed Comput. 23 (1994)
[28] D. Brooks, V. Tiwari, M. Martonosi, Wattch: a framework for architectural-level power analysis and optimizations, in: International Symposium on Computer Architecture, 2, pp
[29] W. Ye, N. Vijaykrishnan, M. Kandemir, M.J. Irwin, The design and use of SimplePower: a cycle-accurate energy estimation tool, in: Proceedings of Design Automation Conference, June 2.
[30] T. Givargis, F. Vahid, J. Henkel, Instruction-based system-level power evaluation of system-on-a-chip peripheral cores, IEEE Trans. VLSI 1(6) (22).
[31] Arm Inc., AMBA specification,
[32] IBM, The CoreConnect bus architecture,
[33] D. Wingard, MicroNetwork-based integration of SOCs, in: DAC, Las Vegas, Nevada, June 21.
[34] A.G. Wassal, M.A. Hasan, Low-power system-level design of VLSI packet switching fabrics, IEEE Trans. CAD 2 (21)


More information

DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL

DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL IJVD: 3(1), 2012, pp. 15-20 DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL Suvarna A. Jadhav 1 and U.L. Bombale 2 1,2 Department of Technology Shivaji university, Kolhapur, 1 E-mail: suvarna_jadhav@rediffmail.com

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

Interconnection Networks

Interconnection Networks CMPT765/408 08-1 Interconnection Networks Qianping Gu 1 Interconnection Networks The note is mainly based on Chapters 1, 2, and 4 of Interconnection Networks, An Engineering Approach by J. Duato, S. Yalamanchili,

More information

Optimizing Configuration and Application Mapping for MPSoC Architectures

Optimizing Configuration and Application Mapping for MPSoC Architectures Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : Sebastien.Le-Beux@polymtl.ca 1 Multi-Processor Systems on Chip (MPSoC) Design Trends

More information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information Introduction Computer Network. Interconnected collection of autonomous computers that are able to exchange information No master/slave relationship between the computers in the network Data Communications.

More information

Peak Power Control for a QoS Capable On-Chip Network

Peak Power Control for a QoS Capable On-Chip Network Peak Power Control for a QoS Capable On-Chip Network Yuho Jin 1, Eun Jung Kim 1, Ki Hwan Yum 2 1 Texas A&M University 2 University of Texas at San Antonio {yuho,ejkim}@cs.tamu.edu yum@cs.utsa.edu Abstract

More information

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 4.7 A 2.7 Gb/s CDMA-Interconnect Transceiver Chip Set with Multi-Level Signal Data Recovery for Re-configurable VLSI Systems

More information

Requirements of Voice in an IP Internetwork

Requirements of Voice in an IP Internetwork Requirements of Voice in an IP Internetwork Real-Time Voice in a Best-Effort IP Internetwork This topic lists problems associated with implementation of real-time voice traffic in a best-effort IP internetwork.

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.

More information

Signal integrity in deep-sub-micron integrated circuits

Signal integrity in deep-sub-micron integrated circuits Signal integrity in deep-sub-micron integrated circuits Alessandro Bogliolo abogliolo@ing.unife.it Outline Introduction General signaling scheme Noise sources and effects in DSM ICs Supply noise Synchronization

More information

DESIGN CHALLENGES OF TECHNOLOGY SCALING

DESIGN CHALLENGES OF TECHNOLOGY SCALING DESIGN CHALLENGES OF TECHNOLOGY SCALING IS PROCESS TECHNOLOGY MEETING THE GOALS PREDICTED BY SCALING THEORY? AN ANALYSIS OF MICROPROCESSOR PERFORMANCE, TRANSISTOR DENSITY, AND POWER TRENDS THROUGH SUCCESSIVE

More information

Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC)

Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC) Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC) Stefan Aust, Harald Richter Department of Computer Science Clausthal University of Technology Julius-Albert-Str.

More information

Timing analysis of network on chip architectures for MP-SoC platforms

Timing analysis of network on chip architectures for MP-SoC platforms Microelectronics Journal 36 (2005) 833 845 www.elsevier.com/locate/mejo Timing analysis of network on chip architectures for MP-SoC platforms Cristian Grecu, Partha Pratim Pande*, André Ivanov, Res Saleh

More information

A Design Methodology for Application-Specific Networks-on-Chip

A Design Methodology for Application-Specific Networks-on-Chip A Design Methodology for Application-Specific Networks-on-Chip JIANG XU and WAYNE WOLF Princeton University JOERG HENKEL University of Karlsruhe and SRIMAT CHAKRADHAR NEC Laboratories America, Inc. With

More information

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications TRIPTI SHARMA, K. G. SHARMA, B. P. SINGH, NEHA ARORA Electronics & Communication Department MITS Deemed University,

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

AN ANALYSIS OF DELAY OF SMALL IP PACKETS IN CELLULAR DATA NETWORKS

AN ANALYSIS OF DELAY OF SMALL IP PACKETS IN CELLULAR DATA NETWORKS AN ANALYSIS OF DELAY OF SMALL IP PACKETS IN CELLULAR DATA NETWORKS Hubert GRAJA, Philip PERRY and John MURPHY Performance Engineering Laboratory, School of Electronic Engineering, Dublin City University,

More information

Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim Department of Computer Science and Engineering Texas A&M University

More information

Interconnection Networks

Interconnection Networks Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode

More information

Chapter 3 ATM and Multimedia Traffic

Chapter 3 ATM and Multimedia Traffic In the middle of the 1980, the telecommunications world started the design of a network technology that could act as a great unifier to support all digital services, including low-speed telephony and very

More information

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS 137 CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS 8.1 CONCLUSION In this thesis, efficient schemes have been designed and analyzed to control congestion and distribute the load in the routing process of

More information

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility

More information

Communication Networks. MAP-TELE 2011/12 José Ruela

Communication Networks. MAP-TELE 2011/12 José Ruela Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)

More information

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Bradley R. Quinton Dept. of Electrical and Computer Engineering University of British Columbia bradq@ece.ubc.ca

More information

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,

More information

Analysis of IP Network for different Quality of Service

Analysis of IP Network for different Quality of Service 2009 International Symposium on Computing, Communication, and Control (ISCCC 2009) Proc.of CSIT vol.1 (2011) (2011) IACSIT Press, Singapore Analysis of IP Network for different Quality of Service Ajith

More information

PART III. OPS-based wide area networks

PART III. OPS-based wide area networks PART III OPS-based wide area networks Chapter 7 Introduction to the OPS-based wide area network 7.1 State-of-the-art In this thesis, we consider the general switch architecture with full connectivity

More information

Student, Haryana Engineering College, Haryana, India 2 H.O.D (CSE), Haryana Engineering College, Haryana, India

Student, Haryana Engineering College, Haryana, India 2 H.O.D (CSE), Haryana Engineering College, Haryana, India Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A New Protocol

More information

White Paper Abstract Disclaimer

White Paper Abstract Disclaimer White Paper Synopsis of the Data Streaming Logical Specification (Phase I) Based on: RapidIO Specification Part X: Data Streaming Logical Specification Rev. 1.2, 08/2004 Abstract The Data Streaming specification

More information

路 論 Chapter 15 System-Level Physical Design

路 論 Chapter 15 System-Level Physical Design Introduction to VLSI Circuits and Systems 路 論 Chapter 15 System-Level Physical Design Dept. of Electronic Engineering National Chin-Yi University of Technology Fall 2007 Outline Clocked Flip-flops CMOS

More information

Architecture of distributed network processors: specifics of application in information security systems

Architecture of distributed network processors: specifics of application in information security systems Architecture of distributed network processors: specifics of application in information security systems V.Zaborovsky, Politechnical University, Sait-Petersburg, Russia vlad@neva.ru 1. Introduction Modern

More information

QoS issues in Voice over IP

QoS issues in Voice over IP COMP9333 Advance Computer Networks Mini Conference QoS issues in Voice over IP Student ID: 3058224 Student ID: 3043237 Student ID: 3036281 Student ID: 3025715 QoS issues in Voice over IP Abstract: This

More information

TIMING-DRIVEN PHYSICAL DESIGN FOR DIGITAL SYNCHRONOUS VLSI CIRCUITS USING RESONANT CLOCKING

TIMING-DRIVEN PHYSICAL DESIGN FOR DIGITAL SYNCHRONOUS VLSI CIRCUITS USING RESONANT CLOCKING TIMING-DRIVEN PHYSICAL DESIGN FOR DIGITAL SYNCHRONOUS VLSI CIRCUITS USING RESONANT CLOCKING BARIS TASKIN, JOHN WOOD, IVAN S. KOURTEV February 28, 2005 Research Objective Objective: Electronic design automation

More information

Automist - A Tool for Automated Instruction Set Characterization of Embedded Processors

Automist - A Tool for Automated Instruction Set Characterization of Embedded Processors Automist - A Tool for Automated Instruction Set Characterization of Embedded Processors Manuel Wendt 1, Matthias Grumer 1, Christian Steger 1, Reinhold Weiß 1, Ulrich Neffe 2 and Andreas Mühlberger 2 1

More information

CS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding

CS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding CS 78 Computer Networks Internet Protocol (IP) Andrew T. Campbell campbell@cs.dartmouth.edu our focus What we will lean What s inside a router IP forwarding Internet Control Message Protocol (ICMP) IP

More information

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION T.S Ghouse Basha 1, P. Santhamma 2, S. Santhi 3 1 Associate Professor & Head, Department Electronic & Communication Engineering,

More information

Adaptive DCF of MAC for VoIP services using IEEE 802.11 networks

Adaptive DCF of MAC for VoIP services using IEEE 802.11 networks Adaptive DCF of MAC for VoIP services using IEEE 802.11 networks 1 Mr. Praveen S Patil, 2 Mr. Rabinarayan Panda, 3 Mr. Sunil Kumar R D 1,2,3 Asst. Professor, Department of MCA, The Oxford College of Engineering,

More information

On-Chip Communication Architectures

On-Chip Communication Architectures On-Chip Communication Architectures Networks-on-Chip ICS 295 Sudeep Pasricha and Nikil Dutt Slides based on book chapter 12 1 Outline Introduction NoC Topology Switching strategies Routing algorithms Flow

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Automatic Deadlock Checking and Routing

Automatic Deadlock Checking and Routing A Tool for Automatic Detection of Deadlock in Wormhole Networks on Chip SAMI TAKTAK, JEAN-LOU DESBARBIEUX LIP6 Lab University Paris VI & CNRS and EMMANUELLE ENCRENAZ LSV ENS Cachan & CNRS We present an

More information

COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS

COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS PLAMENKA BOROVSKA, OGNIAN NAKOV, DESISLAVA IVANOVA, KAMEN IVANOV, GEORGI GEORGIEV Computer

More information

Transport layer issues in ad hoc wireless networks Dmitrij Lagutin, dlagutin@cc.hut.fi

Transport layer issues in ad hoc wireless networks Dmitrij Lagutin, dlagutin@cc.hut.fi Transport layer issues in ad hoc wireless networks Dmitrij Lagutin, dlagutin@cc.hut.fi 1. Introduction Ad hoc wireless networks pose a big challenge for transport layer protocol and transport layer protocols

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

International Journal of Electronics and Computer Science Engineering 1482

International Journal of Electronics and Computer Science Engineering 1482 International Journal of Electronics and Computer Science Engineering 1482 Available Online at www.ijecse.org ISSN- 2277-1956 Behavioral Analysis of Different ALU Architectures G.V.V.S.R.Krishna Assistant

More information

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc (International Journal of Computer Science & Management Studies) Vol. 17, Issue 01 Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc Dr. Khalid Hamid Bilal Khartoum, Sudan dr.khalidbilal@hotmail.com

More information

Clocking. Figure by MIT OCW. 6.884 - Spring 2005 2/18/05 L06 Clocks 1

Clocking. Figure by MIT OCW. 6.884 - Spring 2005 2/18/05 L06 Clocks 1 ing Figure by MIT OCW. 6.884 - Spring 2005 2/18/05 L06 s 1 Why s and Storage Elements? Inputs Combinational Logic Outputs Want to reuse combinational logic from cycle to cycle 6.884 - Spring 2005 2/18/05

More information

Design and Verification of Nine port Network Router

Design and Verification of Nine port Network Router Design and Verification of Nine port Network Router G. Sri Lakshmi 1, A Ganga Mani 2 1 Assistant Professor, Department of Electronics and Communication Engineering, Pragathi Engineering College, Andhra

More information

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS?

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS? 18-345: Introduction to Telecommunication Networks Lectures 20: Quality of Service Peter Steenkiste Spring 2015 www.cs.cmu.edu/~prs/nets-ece Overview What is QoS? Queuing discipline and scheduling Traffic

More information

CHAPTER 6. VOICE COMMUNICATION OVER HYBRID MANETs

CHAPTER 6. VOICE COMMUNICATION OVER HYBRID MANETs CHAPTER 6 VOICE COMMUNICATION OVER HYBRID MANETs Multimedia real-time session services such as voice and videoconferencing with Quality of Service support is challenging task on Mobile Ad hoc Network (MANETs).

More information

Interconnection Networks

Interconnection Networks Interconnection Networks Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Three questions about

More information

Overview of Network Hardware and Software. CS158a Chris Pollett Jan 29, 2007.

Overview of Network Hardware and Software. CS158a Chris Pollett Jan 29, 2007. Overview of Network Hardware and Software CS158a Chris Pollett Jan 29, 2007. Outline Scales of Networks Protocol Hierarchies Scales of Networks Last day, we talked about broadcast versus point-to-point

More information

Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow

Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow International Journal of Soft Computing and Engineering (IJSCE) Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow Abdullah Al Masud, Hossain Md. Shamim, Amina Akhter

More information

Analysis of Effect of Handoff on Audio Streaming in VOIP Networks

Analysis of Effect of Handoff on Audio Streaming in VOIP Networks Beyond Limits... Volume: 2 Issue: 1 International Journal Of Advance Innovations, Thoughts & Ideas Analysis of Effect of Handoff on Audio Streaming in VOIP Networks Shivani Koul* shivanikoul2@gmail.com

More information

Analysis of QoS Routing Approach and the starvation`s evaluation in LAN

Analysis of QoS Routing Approach and the starvation`s evaluation in LAN www.ijcsi.org 360 Analysis of QoS Routing Approach and the starvation`s evaluation in LAN 1 st Ariana Bejleri Polytechnic University of Tirana, Faculty of Information Technology, Computer Engineering Department,

More information

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network Leveraging Torus Topology with Deadlock ecovery for Cost-Efficient On-Chip Network Minjeong Shin, John Kim Department of Computer Science KAIST Daejeon, Korea {shinmj, jjk}@kaist.ac.kr Abstract On-chip

More information

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches 3D On-chip Data Center Networks Using Circuit Switches and Packet Switches Takahide Ikeda Yuichi Ohsita, and Masayuki Murata Graduate School of Information Science and Technology, Osaka University Osaka,

More information

Load Balancing Mechanisms in Data Center Networks

Load Balancing Mechanisms in Data Center Networks Load Balancing Mechanisms in Data Center Networks Santosh Mahapatra Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 33 {mahapatr,xyuan}@cs.fsu.edu Abstract We consider

More information

How To Test The Performance Of Different Communication Architecture On A Computer System

How To Test The Performance Of Different Communication Architecture On A Computer System Evaluation of the Traffic-Performance Characteristics of System-on-Chip Communication Architectures Kanishka Lahiri Dept. of ECE UC San Diego klahiri@ece.ucsd.edu Anand Raghunathan NEC USA C&C Research

More information