Evaluation of Architectural Alternatives to Reduce Power Consumption in a Network-on-Chip


Jaison Valmor Bruch, Cesar Albenes Zeferino
University of Vale do Itajaí (UNIVALI)
Laboratory of Embedded and Distributed Systems (LEDS)
Itajaí, Brazil
{jaison, zeferino}@univali.br

Abstract: This work aimed at improving the energy efficiency of a Network-on-Chip by applying and evaluating techniques to reduce the dynamic power dissipated by the network. Clock gating and data encoding techniques were applied in experiments based on SystemC simulation and FPGA synthesis. The results confirmed the effectiveness of these techniques in reducing the switching activity, and identified limitations of FPGA technology for the implementation of the evaluated techniques.

Keywords: Power consumption; Network-on-Chip; FPGA.

I. INTRODUCTION

Power consumption is one of the most critical issues in the design of digital systems, and the growing market for portable products and the inefficiency of batteries exacerbate this issue [1]. Therefore, energy efficiency should be considered at each stage of the development process [2]. Power dissipation is also among the main causes of the shift from the single-core paradigm to the multicore paradigm, in which performance is increased by adding cores to the chip instead of increasing the operating frequency [2].

A multicore system, also known as SoC (System-on-Chip), comprises a computer system fully integrated into a single chip. Future SoCs will integrate from dozens to hundreds of cores and will present critical requirements for communication, including scalable performance and parallelism in communication. Bus-based architectures, commonly used in SoCs with few cores, will not meet these requirements. The Network-on-Chip (NoC) approach was proposed as an alternative to shared buses.
These networks use point-to-point connections and present advantageous features regarding parallelism, operating frequency, power consumption, scalability and reusability [3].

The Laboratory of Embedded and Distributed Systems of the University of Vale do Itajaí has a project in the area of NoCs called SoCIN (System-on-Chip Interconnection Network). This project aims at exploring NoC architectures with low silicon costs for the implementation of scalable embedded systems with high demand for communication.

In systems using NoCs, the power consumed by the network mechanisms and by the flow of information between the cores is not negligible. Therefore, it is necessary to use techniques that improve the energy efficiency. The network used in the SoCIN project was not originally designed with these requirements in mind. Aiming at improving its energy efficiency, this study applied techniques to reduce the power consumption of this NoC by reducing its dynamic power dissipation. We applied clock gating and data encoding in experiments based on SystemC simulation and FPGA synthesis. Simulation results confirmed the effectiveness of these techniques in reducing the switching activity, but the techniques also resulted in silicon overhead and degradation of the maximum operating frequency.

This paper is organized in five sections, including this one. Section II presents the types of power consumption inherent in CMOS technology and some techniques used to reduce this consumption. Section III discusses some works that address the issue of power consumption in Networks-on-Chip. Section IV describes SoCIN, the reference architecture of this work, and discusses the implementation and evaluation of the techniques for reducing power dissipation. Section V presents the final conclusions.

II. REDUCING POWER CONSUMPTION IN CMOS CIRCUITS

Power consumption affects a large number of critical design decisions, such as cooling requirements, size of supply lines, supply capacity and the number of integrated circuits on a single chip [4]. According to [5], there are three main sources of power dissipation in CMOS circuits:

Dynamic power: when the device is switching, it dissipates dynamic power. The main source of dissipation is the switching activity due to the charging and discharging of capacitances [6];
Static power: even when the device has no switching activity, leakage currents in CMOS circuits dissipate static power [2];
Short-circuit currents: they occur in CMOS circuits during switching, when both the n-channel and the p-channel transistors conduct for a short period. This results in a short circuit, that is, a pulse of current from VDD to GND [7].

Several techniques are applied to reduce the power dissipation in CMOS circuits. Among the most widely used, we highlight clock gating, data encoding, voltage and frequency adjustment, and power gating. We used the first two techniques and describe them in the next paragraphs.
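As a rough illustration of why both chosen techniques target switching activity, the dynamic component is commonly approximated to first order as P = alpha * C * V^2 * f, where alpha is the activity factor. The sketch below only evaluates that textbook approximation; the function name and all numeric values are hypothetical and are not taken from the experiments of this work.

```python
def dynamic_power(alpha, capacitance_f, vdd_v, freq_hz):
    """First-order textbook estimate of CMOS dynamic power, in watts.

    alpha         -- activity factor (fraction of the capacitance switched per cycle)
    capacitance_f -- switched capacitance in farads
    vdd_v         -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within [0, 1]")
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point: only the activity factor changes.
baseline = dynamic_power(alpha=0.20, capacitance_f=1e-9, vdd_v=1.2, freq_hz=100e6)
reduced = dynamic_power(alpha=0.10, capacitance_f=1e-9, vdd_v=1.2, freq_hz=100e6)
saving = 1.0 - reduced / baseline
print(f"relative saving: {saving:.0%}")
```

In this approximation the power scales linearly with the activity factor, which is why halving the switching activity halves the estimated dynamic power when capacitance, voltage and frequency are unchanged.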

The clock gating technique consists of disabling the clock signal of a circuit when the operation it performs is irrelevant to the current state of the system. This shutdown decreases the dynamic power dissipation by reducing the switching activity [6].

The data transfer lines account for a significant amount of the total energy consumed by a chip. One of the factors that cause this consumption is the transition of signals on the wires, which depends on the characteristics of the transferred data. The number of transitions can be reduced with data encoding techniques [2]. Some of the existing encoding techniques include Bus-Invert [8], T-Bus-Invert [9], SILENT [10] and Gray [11].

III. RELATED WORK

This section presents some works that address the reduction of power consumption in Networks-on-Chip using some of the aforementioned techniques.

In [12], the authors applied the clock gating technique to the internal components of a router and to the entire router. In the first case, the buffers are driven by the clock signal only when it is necessary to store a new data value, and the arbiter is only updated when the last valid request is delivered. In the second case, when the router is idle, it is decoupled from the clock tree. The uniform traffic pattern and stream flows were applied in the experiments. The reductions in power dissipation were 72% and 58%, respectively.

In [13], the authors applied the following techniques for data encoding in a NoC: adaptive [14], Bus-Invert, Gray and Transition [15] coding. The experiments were based on traffic patterns of real applications (HTML, WAV, MP3 and JPG) and demonstrated that the effectiveness of the encoding schemes is highly dependent on the traffic pattern applied to the network.

In this work, we applied clock gating (applied in [12]) and Bus-Invert data encoding (applied in [13]) to SoCIN, as discussed in the next section.

IV. REDUCING POWER CONSUMPTION IN SOCIN

A. SoCIN architecture

SoCIN is a customizable Network-on-Chip based on a parameterizable router that can be configured to meet the performance and cost requirements of a target application. The following features can be customized: channel width, buffer depth and the techniques used for switching, routing, arbitration and flow control.

SoCIN uses a 2-D mesh topology in which each router has a communication port named Local, to which a single core or a subsystem can be attached. Besides this port, there are from two to four communication ports for connection with the neighbor routers. Each communication link is composed of two point-to-point simplex channels, with wires for data, packet framing and link flow control [16]. Each communication port is composed of two communication channels, the Input Channel and the Output Channel, and each one includes one parameterizable FIFO buffer (the FIFO at the output channels is optional).

In this work, we applied clock gating and Bus-Invert data encoding in order to reduce the dynamic power dissipated by the transfer of data through wires and by the storage of data in the FIFOs.

B. Applying clock gating to the FIFO buffers

Clock gating (CG) was applied to the circuitry of the FIFOs. These elements consist of flip-flop-based registers that store the flits blocked in the router. Buffers are synchronous circuits. Their registers are clocked at each clock cycle, even if there is no write operation. Therefore, each register adds a capacitive load to the clock tree and causes power consumption.

The clock gating mechanism was applied to the buffer registers in order to avoid unnecessary charging and discharging of capacitances. This approach enables the clock signal only for the register that stores a new value in a write operation. The circuit shown in Figure 1 was the first one used in this work.

Figure 1. Clock gating circuit (signals: Ena, CLK, GCLK).
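The gating policy described in this subsection (the clock reaches only the register that is being written) can be sketched with a small cycle-by-cycle model. This is a minimal illustration and not the SystemC model used in the experiments: it assumes the write pointer advances by one position per write, ignores reads and the full condition, and counts two clock transitions (rising and falling edge) per clocked cycle. The function name and the write trace are hypothetical.

```python
def simulate_fifo_clocking(write_enables, depth=8, gated=True):
    """Return the clock transitions seen by each FIFO register.

    write_enables -- one boolean per clock cycle (True = a flit is written)
    depth         -- number of registers in the FIFO
    gated         -- if True, only the register being written is clocked
    """
    transitions = [0] * depth
    wr_ptr = 0
    for wr in write_enables:
        for reg in range(depth):
            # Ungated: every register is clocked in every cycle.
            # Gated: only the register selected by the write pointer is clocked.
            if not gated or (wr and reg == wr_ptr):
                transitions[reg] += 2  # rising edge + falling edge
        if wr:
            wr_ptr = (wr_ptr + 1) % depth
    return transitions


# Hypothetical trace: 10 cycles, 4 of them with a write.
trace = [True, False, False, True, True, False, False, False, True, False]
ungated_total = sum(simulate_fifo_clocking(trace, gated=False))
gated_total = sum(simulate_fifo_clocking(trace, gated=True))
reduction = 1 - gated_total / ungated_total
print(ungated_total, gated_total, f"{reduction:.2%}")
```

In this model the saving grows with the buffer depth and with the fraction of idle cycles, since the ungated count depends on both while the gated count depends only on the number of writes.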
According to [17], the circuit shown in Figure 1 is the clock gating solution traditionally implemented in ASIC (Application-Specific Integrated Circuit) technologies. In this circuit, the gated clock signal (GCLK) is derived from an AND operation between the enable signal (Ena) and the clock signal (CLK). The latch shown in Figure 1 is necessary to avoid glitches at the output of the AND gate.

1) Evaluation of switching activity reduction

In order to evaluate the effectiveness of the clock gating solution in reducing the switching activity at the FIFOs of SoCIN, experiments were performed with two different buffers implemented in SystemC. The first model is a buffer built from registers that can count the switching activity of the clock signal, but with no clock gating. In the second buffer, the clock gating solution (Figure 1) was implemented in the registers of the FIFO.

The experiments consisted of injecting a stream composed of 5 1-flit packets into each of these buffers, each one a FIFO with 8 positions of 34 bits. Table I shows the values obtained in the experiments. Compared to the original buffer, the clock gating technique reduces the switching activity by 87.57%.

TABLE I. SWITCHING ACTIVITY IN FIFO BUFFERS
Original buffer   | Buffer with gated clock | Variation
8,032 transitions | 998 transitions         | -87.57%

2) Synthesis in FPGA

The use of clock gating in SoCIN was also evaluated in silicon, based on FPGA synthesis. The clock gating solution was described in VHDL and synthesized to the Altera Cyclone II EP2C35F672C6 device by using Altera Quartus II (version 9.1, service pack 2). We obtained the following metrics: (i) the silicon costs (expressed as the number of Look-Up Tables, LUTs, and flip-flops, FFs); (ii) the dynamic power dissipation (in mW); and (iii) the maximum operating frequency (in MHz).

The experiments were performed in a single router with FIFO buffers at each input channel, each FIFO with 8 positions of 34 bits. The dynamic power consumption was obtained with a simulation of the NoC working at 1 MHz for a simulated time of 1 microseconds (1 us).

Two traffic scenarios were used. In the first one, a zero-load traffic (with no data) was applied to the router in order to identify the power dissipated by the clock switching. In the second one, a flow composed of 5 1-flit packets was injected at the Local port in the direction of the East port. The injection rate was 5% of the channel bandwidth and the data switching activity was 100% (obtained by toggling all the bits of adjacent flits in the packet payload). The network parameters were set to wormhole switching, XY routing, round-robin arbitration and credit-based flow control.

The clock gating (CG) solution depicted in Figure 1 reduced the power dissipation. However, due to the clock skew effect in the FPGA, data integrity was lost, as observed by simulation. Table II presents the results obtained in the experiments. For the zero-load scenario, the reduction in the dynamic power dissipation was 24.53%. For the second scenario, the reduction was 14.8%. Furthermore, there was a decrease of 22.81% in the operating frequency and an overhead of 5.24% in LUTs when compared with the original implementation (with no clock gating).

TABLE II. RESULTS FOR ASIC CLOCK GATING (Pdin at 1 MHz)
              | LUTs  | FFs | Fmax (MHz) | Pdin Scenario 1 | Pdin Scenario 2
Original      | 1,509 |     |            |                 |
ASIC CG       | 1,588 |     |            |                 |
Variation (%) | +5.24 |     | -22.81     | -24.53          | -14.8

According to [17], the clock gating approach recommended for FPGA devices is implemented with a method different from the one typically adopted with ASIC technologies.
A feedback multiplexer is used to emulate the functionality of a gated clock. As Figure 2 shows, the enable signal (Ena) controls the selector of a multiplexer. If Ena equals 0, the output Q of the flip-flop is fed back to its input D. If Ena equals 1, the input of the flip-flop receives the current value of the input of the circuit.

Figure 2. Clock gating in FPGA (as recommended by [17]).

We applied this model of clock gating to the buffers and obtained a reduction of 14.17% in the power dissipation for the first scenario (zero load), but an increase of 8.21% in the second scenario. The silicon overhead and the reduction in performance were minimal. Besides the increase in power dissipation in the second scenario, this technique also corrupted the data injected into the router and, therefore, did not solve the problem found in the first clock gating implementation. Table III presents the values obtained in the experiments.

TABLE III. RESULTS FOR FPGA CLOCK GATING (Pdin at 1 MHz)
              | LUTs  | FFs | Fmax (MHz) | Pdin Scenario 1 | Pdin Scenario 2
Original      | 1,509 |     |            |                 |
FPGA CG       | 1,51  |     |            |                 |
Variation (%) |       |     |            | -14.17          | +8.21

The traditional clock gating circuits discussed so far were ineffective when synthesized to the target FPGA, either increasing the power dissipation or losing data integrity. We then applied an alternative circuit derived from the traditional approach used in ASIC technology, as shown in Figure 3. It follows the basic idea of clock gating: the register is clocked only when Ena equals 1 and CLK equals 0.

Figure 3. Alternative approach for clock gating in FPGA (signals: Ena, CLK, GCLK).

The previous clock gating techniques store the flits that compose the data packet at the rising edge of the clock signal. The alternative clock gating circuit inverts the clock signal that reaches the router and, therefore, the flits are written to the buffer at the falling edge of the clock. This solves the problem of clock skew because all the data bits are already stable at the time of writing to the buffer.
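The feedback-multiplexer scheme of Figure 2 can be modeled behaviorally as follows. This is a minimal sketch and not the VHDL description used in this work; the class and method names are hypothetical. It only reproduces the selection rule stated above: with Ena at 0 the flip-flop reloads its own output, and with Ena at 1 it loads the circuit input.

```python
class EnableRegister:
    """Flip-flop preceded by a feedback multiplexer (behavioral model)."""

    def __init__(self, initial=0):
        self.q = initial        # current output of the flip-flop
        self.clock_edges = 0    # rising edges that reached the flip-flop

    def rising_edge(self, d, ena):
        # In this model the clock is never gated: every edge reaches the flip-flop.
        self.clock_edges += 1
        # Multiplexer: select the new data when enabled, else feed Q back to D.
        self.q = d if ena else self.q
        return self.q


reg = EnableRegister()
stimulus = [(5, 1), (9, 0), (7, 0), (3, 1)]  # (data, enable) per cycle, hypothetical
outputs = [reg.rising_edge(d, ena) for d, ena in stimulus]
print(outputs, reg.clock_edges)
```

The stored value changes only in the enabled cycles, while the edge counter advances in every cycle, which is the main behavioral difference from the gated-clock circuits of Figures 1 and 3.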
In the other implementations, different propagation delays of the data signals and the clock signals resulted in the loss of synchronization and data integrity.

This approach reduced the power dissipation in both scenarios: 3.5% for Scenario 1 and 16.72% for Scenario 2, with a minimal silicon overhead (less than 1%). However, there was a considerable reduction in the maximum operating frequency, down to 92.4 MHz. Table IV summarizes these values.

TABLE IV. RESULTS FOR THE ALTERNATIVE CLOCK GATING (Pdin at 1 MHz)
               | LUTs  | FFs | Fmax (MHz) | Pdin Scenario 1 | Pdin Scenario 2
Original       | 1,509 |     |            |                 |
Alternative CG | 1,519 |     | 92.4       |                 |
Variation (%)  | < 1   |     |            | -3.5            | -16.72

Figure 4 compares the clock gating implementations for the two scenarios. It shows the efficiency of the alternative technique in reducing the dynamic power dissipation when compared to the classical solutions. In contrast, Figure 5 illustrates the maximum operating frequency obtained with each approach. The implementation that saves the most power is also the one that most degrades the operating frequency. Although this could be considered a drawback, it allows reaching better energy efficiency for systems that can operate at a lower clock frequency. Furthermore, the network is parameterizable and offers other resources to meet the performance requirements of a target application, such as different alternatives for switching, routing, flow control and arbitration.

Figure 4. Comparison between the clock gating implementations (dynamic power in mW for Scenario 1 and Scenario 2; series: Original, ASIC CG, FPGA CG, Alternative CG).

Figure 5. Performance comparison between the clock gating alternatives (maximum operating frequency in MHz; series: Original, ASIC CG, FPGA CG, Alternative CG).

C. Applying data encoding to the NoC links

In order to implement data encoding in SoCIN, we selected the Bus-Invert technique due to its simplicity. This technique inverts the bits of a data word to be transferred through a channel when more than 50% of its bits differ from the bits of the last word transferred, thereby reducing the switching activity along the channel. Typically, an additional wire is used to indicate when inverted data is being transferred. When this bit is '1', the bits of the word transferred through the channel must be inverted again by the receiver to recover the original value.

To determine whether a word must be inverted, the encoder calculates the number of differing bits (Hamming distance) between the data currently in the channel and the next word to be sent. If the Hamming distance is greater than half of the data channel width, all the data bits must be inverted.

The NoC used in this work uses a pair of bits for packet framing (the frame field). Originally, only three of the four combinations of the framing bits were defined (the first three rows of Table V).
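The basic Bus-Invert rule described above, in its usual form with one extra invert wire, can be sketched as follows. This is an illustrative model only: the 32-bit width, the assumption that the channel starts at zero, and all function names are hypothetical choices made for the example.

```python
WIDTH = 32                 # data bits per word (assumed for this example)
MASK = (1 << WIDTH) - 1


def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin((a ^ b) & MASK).count("1")


def bus_invert_encode(words):
    """Return (bus_value, invert_flag) pairs for a sequence of data words."""
    encoded = []
    bus = 0  # value currently on the channel (assumed to start at zero)
    for word in words:
        # Invert when more than half of the wires would otherwise toggle.
        invert = hamming(bus, word) > WIDTH // 2
        bus = (~word & MASK) if invert else word
        encoded.append((bus, int(invert)))
    return encoded


def bus_invert_decode(pairs):
    """Recover the original words by undoing the flagged inversions."""
    return [(~value & MASK) if flag else value for value, flag in pairs]


words = [0x00000000, 0xFFFFFFFF, 0x0000FFFF, 0xFFFF0001]
encoded = bus_invert_encode(words)
decoded = bus_invert_decode(encoded)
print([(hex(v), f) for v, f in encoded])
```

With this rule, no more than half of the data wires toggle between two consecutive transfers, at the cost of the extra wire that carries the invert flag.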
In order to avoid the addition of an extra wire to the data channel, the fourth combination of the framing bits ("11") was chosen to signal when the inversion is applied to the payload flits. No inversion can be applied to the header flit because it carries the routing information, and inverting it would require a decoder at each router port (which would be too expensive). Also, the trailer flit cannot be inverted because there is no way to signal that it is inverted without adding another bit to the channel.

TABLE V. FRAMING CODES
Frame field | Meaning
00          | Payload flit (packet body)
01          | Header flit (packet head)
10          | Trailer flit (packet tail)
11          | Inverted payload flit (packet body)

1) Evaluation of switching activity

The effectiveness of the Bus-Invert technique in reducing the switching activity due to the transfer and storage of packets was evaluated with SystemC RTL models of a FIFO buffer, a data encoder and a data decoder. The experiments were based on two SystemC simulation models. The first model included only the FIFO, which was implemented with registers able to measure the switching activity of the data that they store. The second model includes the encoder and the decoder, as shown in Figure 6.

Figure 6. FIFO buffer with encoder and decoder (FIFO interface signals: din, wr, wok, dout, rok, rd).

The switching activity was evaluated by injecting packets of flits, each flit being 34 bits wide. Two traffic patterns were used. In the first one (Scenario 1), the switching activity equals 100% of the bits between the header and the first payload flit and between each pair of adjacent flits of the payload. In the second traffic pattern (Scenario 2), the switching activity equals 50% of the data bits plus 1 bit. The aim of this approach is to identify the reduction in switching activity for the best case (100%) and the minimal reduction (50% + 1 bit). Table VI shows the results obtained in each experiment.
One can see that, in Scenario 1, the reduction in switching activity is 90.91% when the Bus-Invert encoding technique is employed. For Scenario 2, the reduction in switching activity is 10.53%. Therefore, the experiment indicates that the encoding lowers the switching activity and hence should reduce the dynamic power dissipated by the buffers. However, these results do not consider the impact of the circuits necessary for the encoding and decoding processes. This issue was evaluated with the synthesis in FPGA, described in the next subsection.
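The frame-field variant of Bus-Invert used in SoCIN, together with a simple transition count, can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the measurements of this work: the frame codes follow Table V, the Hamming distance is taken over the 32 data bits, the inversion threshold is half of the 34-bit flit width, and the packet contents and all names are hypothetical.

```python
DATA_BITS = 32
FLIT_BITS = DATA_BITS + 2          # 32 data bits + 2 framing bits
MASK = (1 << DATA_BITS) - 1
PAYLOAD, HEADER, TRAILER, INV_PAYLOAD = 0b00, 0b01, 0b10, 0b11  # per Table V


def ones(x):
    """Number of bits set in x."""
    return bin(x).count("1")


def encode(flits):
    """Invert payload flits that would toggle more than half of the flit
    width, and mark them with the frame code 11 instead of an extra wire."""
    out, prev = [], 0
    for frame, data in flits:
        # Header and trailer flits are never inverted.
        if frame == PAYLOAD and ones((prev ^ data) & MASK) > FLIT_BITS // 2:
            frame, data = INV_PAYLOAD, ~data & MASK
        out.append((frame, data))
        prev = data  # data value now present on the channel
    return out


def decode(flits):
    """Restore inverted payload flits to plain payload flits."""
    return [(PAYLOAD, ~d & MASK) if f == INV_PAYLOAD else (f, d) for f, d in flits]


def transitions(flits):
    """Bit toggles on the 34-bit channel between successive flits."""
    words = [(f << DATA_BITS) | d for f, d in flits]
    return sum(ones(a ^ b) for a, b in zip(words, words[1:]))


# Hypothetical packet whose payload toggles all data bits at every flit.
packet = ([(HEADER, 0)]
          + [(PAYLOAD, MASK if i % 2 == 0 else 0) for i in range(6)]
          + [(TRAILER, 0)])
encoded = encode(packet)
print(transitions(packet), transitions(encoded))
```

For this worst-case pattern the encoded stream toggles only the framing bits, since every second payload flit is sent inverted; the decoder output equals the original packet.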

TABLE VI. REDUCTION OF THE FIFOS SWITCHING ACTIVITY
System             | Scenario 1 (100% of switching) | Scenario 2 (50% + 1 bit of switching)
Without Bus-Invert | 421,336 transitions            | 242,584 transitions
With Bus-Invert    | 38,296 transitions             | 217,048 transitions
Variation          | -90.91%                        | -10.53%

2) Synthesis in FPGA

The use of the Bus-Invert technique in SoCIN was evaluated by implementing VHDL models of the encoder and the decoder and integrating them in a system that also includes a 4x1 mesh NoC (Figure 7) with 34-bit channels (32 bits for data and 2 bits for packet framing) and 8-flit FIFOs operating at 1 MHz. The models were synthesized to the Altera Cyclone II EP2C35F672C6 device under two levels of optimization to reduce power dissipation: off (no optimization) and extra-effort (the synthesis tool applies the maximum effort in the optimization process, but may degrade the operating frequency). The obtained metrics were: (i) the silicon costs; (ii) the maximum operating frequency; and (iii) the dynamic power dissipation.

Figure 7. A 4x1 NoC with Bus-Invert encoder and decoder.

The silicon overhead of using the encoder and the decoder (in comparison with a system composed only of the 4x1 NoC) was 5% in LUTs and 1% in flip-flops. The maximum operating frequency was degraded by 34.7% when no power optimization was done by Quartus II. With extra-effort, the reduction in operating frequency was 41.9%.

The experiments to obtain the power dissipation were based on the simulation of the sending of a single packet containing 1 Kbyte of data from the rightmost router in Figure 7 to the leftmost router in the figure, varying the number of switching bits between successive flits from only 1 bit (3.1%) to 32 bits (100%). The power evaluation of the Bus-Invert technique was performed for the following configurations:

1. NoC 4x1: injection of a packet in a system including only the NoC (with no encoder or decoder). This allowed us to determine the power dissipated by the transfer of packets without the influence of the coding technique;
2. NoC 4x1 (Manual encoding): injection of a manually encoded packet in the NoC-only system. With this, we identified the effect of the encoding on power dissipation without the extra costs caused by the encoding and decoding processes;
3. NoC 4x1 + Encoder: injection of a non-encoded packet in a system composed of the NoC and the encoder in order to identify the power dissipation added by the encoder; and
4. NoC 4x1 + Encoder + Decoder: injection of a non-encoded packet in the system of Figure 7 to measure the total cost of implementing the Bus-Invert technique.

Figure 8 shows the results of the experiments performed to evaluate the Bus-Invert coding technique without the automatic power optimization carried out by the synthesis tool.

Figure 8. Dynamic power without automatic power optimization (Pdin versus number of switching bits; curves: NoC 4x1, NoC 4x1 (Manual encoding), NoC 4x1 + Encoder, NoC 4x1 + Encoder + Decoder).

Curve NoC 4x1 indicates the power dissipated by the network when the injected packet is neither encoded nor decoded. As one can see, the power dissipation increases linearly with the number of switching bits.

Curve NoC 4x1 (Manual encoding) indicates the dynamic power dissipation when the coding technique is employed. In this case, the data injected in the network is already encoded. One can see a reduction in power dissipation when there are more than 17 switching bits. This occurs because each flit is composed of 34 bits, and 50% + 1 bit equals 18 bits.

In the configurations in which the encoder and the decoder were added to the NoC (configurations 3 and 4), there is an increase in power dissipation due to these components.
However, when the encoder starts the process of inversion of bits (with 18 switching bits), the increase in the dynamic power dissipation is even more significant. This power overhead is due to the encoding and decoding processes, but at least some power reduction was expected. This did not occur due to the placement done by the synthesis tool when performing the technology mapping to the FPGA with no power optimization.

By applying extra-effort for power optimization, we obtained better results, which are shown in Figure 9. However, the reduction of the dynamic power consumption in the fourth configuration (the complete system) occurs only when there are more than 26 switching bits. When applying the coding mechanisms, the power dissipation is lower than that of the original structure when the encoder begins the process of inversion. With 17 switching bits, the power dissipation increases, and a power reduction occurs only with 28 switching bits.

Figure 9. Dynamic power with automatic power optimization (Pdin in mW versus number of switching bits; curves: NoC 4x1, NoC 4x1 (Manual encoding), NoC 4x1 + Encoder, NoC 4x1 + Encoder + Decoder).

V. CONCLUSIONS

Based on the studies performed in this work, it was possible to identify several techniques for reducing power consumption in CMOS circuits that can be applied in a Network-on-Chip. Among these techniques, we chose to implement clock gating and data encoding based on the Bus-Invert approach.

The experiments performed in SystemC confirmed the effectiveness of the chosen techniques in reducing the switching activity. However, the synthesis of these techniques in FPGA did not produce the expected results.

The clock gating solutions described in the literature reduced the dynamic power dissipation. However, they presented problems related to the loss of data integrity. We then applied a solution that ensured data integrity and reduced the dynamic power dissipation at minimal silicon overhead. However, it reduced the maximum operating frequency.

When applying data encoding, the Bus-Invert encoder and decoder were not effective in reducing the dynamic power dissipation and, in many cases, led to an increase in the power dissipation. Although the silicon overhead was minimal, they resulted in a significant reduction of the maximum operating frequency.

As a final conclusion, we can state that the evaluated techniques are effective for reducing the switching activity, but the use of FPGA technology to obtain the silicon results did not allow a definitive evaluation of these techniques. Because of this, as future work, we intend to use ASIC technologies in order to overcome the constraints of FPGA technology.

ACKNOWLEDGMENTS

This research was funded by UNIVALI (University of Vale do Itajaí), INCT NAMITEC (National Institute for Science and Technology on Micro and Nanoelectronic Systems) and CNPq Brazilian funding agency (grant number /28-4).

REFERENCES

[1] J. M. Rabaey and M. Pedram (Eds.), Low Power Design Methodologies. Norwell, MA: Kluwer, 1996.
[2] S. Kaxiras and M. Martonosi, Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool, 2008.
[3] C. A. Zeferino, Redes-em-Chip: arquiteturas e modelos para avaliação de área e desempenho, PhD Thesis, UFRGS, Brazil, 2003. (in Portuguese)
[4] J. M. Rabaey, Digital Integrated Circuits. Englewood Cliffs, NJ: Prentice-Hall.
[5] A. P. Chandrakasan and R. Brodersen, Minimizing power consumption in digital CMOS circuits, Proceedings of the IEEE, 83(4), Apr.
[6] M. Keating, D. Flynn, R. Aitken, A. Gibbons, K. Shi, Low Power Methodology Manual. Springer, 2007.
[7] J. P. Colinge and C. A. Colinge, Physics of Semiconductor Devices. Boston, MA: Kluwer, 2002.
[8] M. R. Stan and W. P. Burleson, Bus-invert coding for low-power I/O, IEEE Transactions on VLSI Systems, Vol. 3, no. 1.
[9] J. C. S. Palma, Reduzindo o consumo de potência em redes intra-chip através de esquemas de codificação de dados, PhD Thesis, UFRGS, Brazil, 2007. (in Portuguese)
[10] K. Lee et al., SILENT: Serialized Low-Energy Transmission Coding for On-Chip Interconnection Networks, In: Proceedings of IEEE International Conference on Computer Aided Design, Nov. 2004.
[11] C. L. Su, C. Y. Tsui, and A. M. Despain, Low power architecture design and compilation techniques for high-performance processors, In: Proceedings of IEEE COMPCON, Feb.
[12] R. Mullins, Minimising dynamic power consumption in on-chip networks, In: Proceedings of International Symposium on System-on-Chip, Tampere, Finland, Nov. 2006.
[13] J. Palma, L. Indrusiak, F. G. Moraes, A. Garcia Ortiz, M. Glesner, R. Reis, Inserting Data Encoding Techniques into NoC-Based Systems, In: Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Mar. 2007.
[14] L. Benini, A. Macii, E. Macii, M. Poncino, and R. Scarsi, Architectures and synthesis algorithm for power efficient bus interfaces, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 19, no. 9, Sep. 2000.
[15] P. Ramos and A. Oliveira, Low Overhead Encodings for Reduced Activity in Data and Address Buses, In: Proceedings of IEEE International Symposium on Signals, Circuits and Systems, July.
[16] C. A. Zeferino et al., Avaliação de Desempenho de Rede-em-Chip Modelada em SystemC, In: Proceedings of the 27th Congress of Brazilian Computer Society - WPerformance, 2007. (in Portuguese)
[17] Y. Zhang, J. Roivainen, and A. Mämmelä, Clock-gating in FPGAs: A novel and comparative evaluation, In: Proceedings of the 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools, 2006.


More information


More information