Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC)
|
|
|
- Joshua Holland
- 10 years ago
- Views:
Transcription
1 Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC) Stefan Aust, Harald Richter Department of Computer Science Clausthal University of Technology Julius-Albert-Str Clausthal-Zellerfeld, Germany stefan.aust [email protected] Abstract This paper introduces a new approach for a network on chip (NOC) design which is based on a NlogN interconnect topology. The intended application area for the NOC is the real-time communication of multiprocessors that are hosted by a single Field Programmable Gate Array (FPGA). The proposed NOC is an on-chip multistage interconnection network for which an upper limit can be guaranteed that is at most needed for the latency while delivering data between sending and receiving processors. The reason for the deterministic interprocessor communication is the constant path length from input to any output port of the NOC. In contrast to contemporary NOCs, no intermediate routers exist. Thus, no overloaded router with hot spot problems can occur, and the proposed NOC can be used for real-time applications. Example NoCs of size 4x4 and 8x8 were implemented in VDHL, together with their softcore processors on Spartan3 and Virtex-4 and -5 FPGAs from Xilinx. Keywords network on chip; multistage interconnection network; softcore processor; real-time multiprocessor; FPGAbased multiprocessor I. INTRODUCTION The increasing quantity of logic cells that can be integrated into a single FPGA allows novel solutions by using the system on chip (SoC) paradigm. Just recently, multiprocessor system on chip (MPSoC) applications have become feasible that are hosted by a single FPGA [1,2]. In such MPSoCs, each processor exists only as Verilog or VDHL [3] description that can be extended or modified as needed, and that is afterwards synthesized for a target FPGA such as Spartan3 or Virtex-4/-5/-6 from Xilinx, for example. Because of the adaptability of the processor architecture to the demands of the real-time system, such computing devices are called soft-core or soft processors. MPSoCs with soft processors exhibit both, the high performance of parallel computers and the flexibility of reconfigurable hardware [4]. In real-time systems, data- and computing-intensive applications can make use of this technological progress. For instance, driver assistant systems in cars require to service more sensors and actuators than ever. Such applications demand higher computing power and less electrical power at the same time, while the system size has to be minimized. To match such demands, the proposed network on chip (NoC) design can be used in MPSoCs. In the future, we believe that MPSoCs will replace in part conventional electronic controller units in automobiles as well as in complex machinery [5]. The majority of embedded systems are located in realtime applications. Amongst others, the real-time performance of multiprocessor computers relies on the predictability of the interprocessor communication. For an MPSoC, deterministic behaviour of the interconnection network has to be guaranteed. This requirement is hardly to implement with conventional packet routing that takes place in direct, i.e. static networks. In static networks, adaptive multi-hop routing together with packet prioritization induces an undesirable indeterminism to network latency. The formation of hotspots due to excessive data traffic in router nodes excludes predictability also. We therefore propose, a new paradigm for MPSoCs, which makes use of multistage interconnection networks (MINs) as a network on chip. This paper is organized as follows: in section 2, the state of the art in NoCs is given. Section 3 makes a recap of MINs. Their utility and their problems in on-chip usage are investigated in section 4. In section 5, the topology of the proposed NoC is presented, and in section 6 its chosen implementation is described, together with the MPSoC for which it was developed. The paper ends with conclusion and literature reference in section 7. II. STATE-OF-THE ART IN NETWORKS ON CHIP Interprocessor communication in MPSoCs with tens of cores or more is no longer feasible by using shared buses due to their low intrinsic scalability in bandwidth and latency [6,7]. Also crossbar structures are no longer practicable due to their O(N 2 ) complexity if N becomes large. To overcome the von Neumann bottleneck and the O(N 2 ) increase in hardware, alternative architectures have been introduced by NoCs as the new paradigm in SoC design [8]. Since then considerable number of NoC designs have been proposed which provide diverse communication types and network topologies [7,9]. NoCs with direct (static) networks have been proposed by [10,11,12,13] such as mesh, tree, torus or hypercube. Some examples of these static topologies are given in Fig. 1. The basic principle of direct network topologies is that each processor is connected directly to a smaller number of neighbour processors where each processor acts in addition as a switch or router node for 47
2 introduced by Lawrie [15], consists of shuffle and exchange permutations in logn stages and can be defined by Ω n = ( σ n E ) n (1) Figure 1. Toplogies of direct networks: a) mesh; b) tree; c) hypercube. frames or packets, respectively. Routing is performed either statically or dynamically via the source and target information contained in the frame/packet headers. The communication channels between the processor nodes are operating on layer 3 of the ISO 7-layer model. Finally, some nodes provide additional communication channels for the necessary I/O. A desirable property of NoCs for MPSoCs is scalability, which means that for small and large processor numbers as well, the same basic interconnect structure, can be used in principle. Another property which is mandatory for real-time systems is that the achievable bandwidth and the maximum latency for data transfer is deterministic i.e. predictable. This means that an upper limit for the latency must be guaranteed that is at most needed for data transfer, as well as a lower limit for the bandwidth. This is required to build and program systems that can react in due time. However, all direct networks have the potential hazard of hot spots that are overloaded router nodes. Hot spots result in unpredictable bandwidth and latency, which in turn is not tolerable for real-time systems. Furthermore, scalability is also not possible if hot spots occur. This is why we propose an alternative to direct networks that can be used for NoCs in MPSoCs and that is based on indirect networks. In indirect networks, computing nodes are connected via a cascaded set of switches. Because of the switch arrangement, each path from source to target is of the same length, and every switch has to serve only a fixed number of traffic streams. Thus, hot spots cannot occur, if the rearrangable non-blocking subtype of indirect networks is used. Indirect networks are described in the next section. III. MULTISTAGE INTERCONNECTION NETWORK (MIN) By origin, MINs were proposed for telephone exchange systems, and later for parallel computers. Vector supercomputers, multiprocessors and multicomputers with processors on individual silicon chips were introduced two decades ago for high-performance computing. MINs have been designed to match their bandwidth and latency constraints and to support effective execution of parallelized algorithms. Therefore, MINs are known from parallel computers, and we have adopted these structures to provide for deterministic on-chip interprocessor communication. MINs connect computing nodes through a set of elementary switches that are organized in 1, 3 or logn stages, where N is the number of the ports the network features. The mathematical patterns between the switch stages are permutation functions, such as perfect shuffle, butterfly or bit reversal [14]. The Omega network, for example, which was where n is the total number of stages, σ is the shuffle permutation over n bits, and E is an exchange stage [16]. An example for an Omega network of size 8x8 is given in Fig. 2. By using the smallest possible switch size of 2x2, the construction of a MIN needs (N/2)log 2 N switches only, which is the minimum number possible at all. For comparison, a crossbar network requires N 2 switches. Typical representatives for logn-mins are Omega, Baseline and Butterfly networks [16,17,18,19]. They belong to the socalled delta subclass of MINs which means that routing through the logn-min is easily accomplished by using bit after bit of the target port address in order to set each switch so that it routes data either = (parallel) or x (cross). Because of the constant number of switches that data has to pass from the network input port to output port, a constant routing time or at least an upper limit for the routing canbe guaranteed. MINs are therefore beneficial for interprocessor communication with respect to latency, which is important for real-time applications. However, constant routing time cannot be guaranteed for transfers that take place at all input ports simultaneously. The reason is that each output can be reached from every input in principle, but there are permutations of inputs to outputs that can not be realized which is why MINs are called fully reachable but blocking. This is the main disadvantage of logn-mins. There are two other categories of MINs, which are called Clos and Benes networks that do not belong to the logn type and that are non-blocking [19]. Unfortunately, the Clos network has a switch complexity of (3/2)N N, and the Benes network has Nlog 2 N complexity which is both not the minimum logn-mins have. This means for the applicability of Clos and Benes MINs as on-chip networks that they consume more chip area as needed, and that they need more electrical power as logn-mins do. Both are disadvantages for VLSI integration. Furthermore, Clos and Benes networks are non-blocking only for the price of rearranging already existing internal paths through the network, which is a problem for ongoing real-time transmissions. During path rearrangement, no data transfer can take place. Finally, path Figure 2. Omega topology of size 8x8. 48
3 rearrangements require a central control instance that executes the rearranging algorithm. However, a central control unit prevents from scalability, and the rearranging algorithm is so complex that it consumes a processor by its own, together with considerable computing time. IV. IDEAL ON-CHIP MIN In general, we can state that there is no ideal on-chip interconnection network with good scalability, deterministic routing latency for real-time capability, minimum chip area and minimum power consumption at the same time. However, with the introduction of FIFO buffers, each disadvantage of the above networks could be solved, at least for many practical applications. In the case of logn-mins for example, FIFOs at all switch stages are needed as temporary storage for incoming data to hold them until the blocking situation is managed. Blocking management has to be made every time when the MPSoC requests a forbidden permutation from input to output ports. Delaying and serializing the needed transfers via FIFO accomplish this. Data is then delayed until blocking is over, and afterwards data is read out from the FIFO one after the other. A positive aspect is that blocking management can be achieved in a fully decentralized manner. In the case of Clos and Benes MINs, FIFOs are required only at the input ports to store incoming data until the internal path rearrangements have been accomplished. If a permutation from input to output needs rearrangement, then the input FIFOs are filled while the network is drained. When all network-internal paths are empty, rearrangement can take place by setting switches newly. After that, data is let again into the network. In both cases, the FIFO solution is not perfect because it introduces indeterministic delays in the MPSoC interprocessor communication. Depending on the filling state of a FIFO and depending on the needed transfers per time unit, more or less data frames or packets have to be temporarily stored in the FIFOs. Only by means of a fixed FIFO depth, an upper limit for the maximum latency can be stated for data delivery. However, this is sufficent in practice for many real-time applications. FIFO overflow can occur, but it is considered as a programming fault of the MPSoC. It has to be mentioned here also that is not the fault of the network but of the programmer if two input ports want to deliver data to the same output port at the same time. This is comparable to writing the same variable in a shared memory from two processors at the same time. To summarize, the state of the art in on-chip networks is that logn-mins are the best option because of their O(NlogN) scalability and their minimum chip and power consumption compared to busses, crossbars, Clos and Benes networks. Therefore, we propose a logn-min as the preferred NoC for MPSoC. In the next section we will explain which type of logn-min is best suited, and what we did to improve its real-time behaviour. V. TOPOLOGY OF THE PROPOSED MINOC The network topology we decided for is known as Baseline network [20]. In Fig. 3, a Baseline network of size 16x16 is presented, together with two routing examples. This topology has been introduced in 1980 by C. Wu and T. Feng to proof equivalence among logn-mins. The stages in the Baseline network are connected via an unshuffle wiring. The topology of the Baseline network is mathematically isomorphic to other networks of the log 2 N class but the network has technological advantages compared to other logn-mins. The production of the Baseline network is characterized by a recursive construction. Each stage is of 1,2,4,... sub-networks of the same type. From the view-point of the first stage, the Baseline consists of one switch block Figure 3. Baseline topology of size 16x16. of size NxN. The second stage contains two switch blocks of size (N/2) x (N/2) and so forth. That iterates down to the smallest blocks of size 2x2 as atomic elements. In addition, each stage has the minimum possible number of crossing wires, and the wires have minimum lengths [14]. Both features, recursive construction and minimum wiring, are advantages for implementation in VLSI or FPGA that are not found in other logn-mins. In Baseline networks, the routing algorithm evaluates the most significant bit of the n=logn bits of the target address first [16]. With every bit evaluation, the interval of possible output ports is halved. After n steps, the target output port is exactly specified. This routing algorithm is a good example of the divide-andconquer principle known from theoretical informatics. Finally, the recursive construction of the Baseline network eases its definition in VHDL. The VHDL code of a 2x2 switch is for example: signal A, B, C, D: std_logic_vector(0 to 31); shared variable S: boolean; C <= A when S = false -- parallel connection else B; -- cross connection D <= A when S = true -- parallel connection else B; -- cross connection 49
4 where A and B are inputs, C and D are outputs and S is the switch state. Not shown are FIFO buffers, but as we have learned from practice, the FPGA synthesis of the switch controller is much more complex than the switches and the FIFO. With our preferred MINoC, we yield a MPSoC of the symmetric multiprocessor type that is depicted in Fig. 4. Its architecture includes softcore processors P, local memory M P and shared memory M S. A. Switches and Wiring In the following sections of this paper we refer to message-based interprocessor communication. However, with the network coupling of shared memory (M S ) interprocessor communication via shared variables becomes feasible as well. As seen before, the switches feature two states for direct and crossed connection paths, but for parallel computing mechanisms for task synchronization are needed also. These can be implemented by two additional switch states called upper and lower broadcast (Fig. 6). Direct and crossed connection paths are used in point-to-point communication Figure 6. States of the switch: a) straight and cross b) upper and lower broadcast. Figure 4. Block diagram of the resulting MPSoC Architecture. With this architecture, both programming paradigms of message passing and shared memory are supported simultaneously. VI. IMPLEMENTATION OF THE MINOC The MINoC is implemented by adding a custom VHDL core to the existing description of a Xilinx Microblaze soft processor [21]. The block diagram of the so enhanced MicroBlaze is shown in Fig. 5. Our custom IP core realizes the MINoC for the MPSoC. It consists of three components: 1.) switches 2.) network interface, and 3.) network controller. The first component (switches) implements the described Baseline topology. The second component is the network interface. It connects the MicroBlaze via its proprietary FSL bus [22,23] to the MINoC input ports and output ports. The third component is the network controller, which we have introduced to improve real-time behaviour. The network controller allows for interprocessor communication only in fixed points in time. This can guarantee a better upper limit for latency in data delivery. Figure 5. Block diagram of the soft processor enhanced by a MINoC. between sender/receiver pairs via message passing. Broadcast communciation is needed for synchronous task start and stop and for distributing input data to processors. Both types of communication are required in the intended application domain of real-time embedded systems. The wire patterns between the stages are implemented via bidirectional communication channels that are defined as signals in VHDL. With bidirectional channels, handshake protocols between sender and receiver are implemented. B. Network Interface The soft processors are connected to the interconnection network via the Fast Simplex Link (FSL) from Xilinx [23]. Each FSL interface provides an uni-directional point-to-point communication channel that includes a FIFO buffer. The FIFO buffer decouples the processor clock from the network but it introduces indeterminism as described that cannot be avoided. However, the network controller reduces the jitter in message latency during data transfer. Since the FSL interface is an internal part of the soft processor, communication takes 2 processor cycles only for transferring a 32-bit word from sender to receiver if the FIFO buffers are free. Therefore, the FSL enables a high-speed interprocessor communication. When the FSL interface is added to the soft processor, the MicroBlaze instruction set is augmented with four additional instructions: Blocking Read (get) Non-blocking Read (nget) Blocking Write (put) Non-blocking Write (nput) These instructions are for reading from and writing data to the FIFO of the FSL interfaces. As soon as data are written to FIFO, the put instruction terminates and the NoC moves all data from source to destination FIFO. Finally, a get instruction reads the data out of the receive FIFO. 50
5 C. Network Controller The MIN operates not by packet- or message switching but circuit switching. This provides a direct connection between sender and receiver and delivers maximum speed for interprocessor communication since no frame or packet header is required that would be overhead only. Thus all data are received in sent order. Furthermore, a time-slot that is returning periodically is granted to each processor according to a scheduling policy where messages can be sent. The scheduling policy has been implemented in the network controller in hardware by means of VHDL. When the network controller schedules a data transfer, a communication time-slot opens, and the network controller establishes a physical path between a pair of processors. As long as the time-slot stays open, data can be sent directly from sender to receiver with full speed of the processor interfaces. After a time-slot has closed, the next path will be opened round-robin. In our tests we have used preemptive scheduling with a fixed slot time. For this purpose, a central network controller was implemented and tested. This network controller serves all connection requests from the processors. Preemption is made if a processor whose time slot has arrived does not want a data transfer through the network. The desribed scheduling policy is identical to task scheduling in real-time operating systems. The usage of a scheduling algorithm for data transfer results in better predictability to the network latency which suffers from the indeterminism of the FIFOs. Furthermore, several scheduling algorithms are possible, such as priority scheduling or earliest-deadline-first which are known from real-time operating systems. D. Overall Architecture The entire MPSoC including the MINoC has been implemented and tested with evaluation boards carrying FPGAs from Xilinx of the Spartan-3, Virtex-4 and Virtex-5 types. As boards have been used the ML 505 from Xilinx with a Virtex-5 FPGA, the XpressFX100 from PLDA with a Virtex-4 and the Spartan-3 starter kit board from Digilent with a Spartan-3 chip. With Spartan-3, a 4x4 network was implemented together with 4 MicroBlazes on the same Figure 7. Overall architecture of the MPSoC. FPGA. On the Virtex-4 and the Virtex-5 board, an MPSoC with a MINSoC of size 8x8 has been implemented. All processors are Xilinx MicroBlaze softcores that emulate a 32-bit RISC processor. Each soft processor features private local memory with Block RAM for instructions and data. Multiple Block RAMs are linked to processors via the Local Memory Bus (LMB) [21]. All MicroBlazes in turn are coupled to the interconnection network via the FSL-MIN interface as shown in Fig. 5. An overview of the system architecture as it was implemented is given in Fig. 7. With using the FSL processor interface we can measure that it takes two clock counts of the processor to write or read a single 32-bit data to or from the network interface. Once the connection from sender to receiver has been established no additional latency of the circuit-switched network exists, so that a network clock rate of 100 MHz induces a real data transfer rate of 1.6 Gbit/s per connection. Depending on the topology of the network multiple connections between sender/receiver pairs can be established at the same time without additional latencies. Higher network clock rates are feasible because the FSL can be operated in asynchronous mode. The maximum frequency depends on the FPGA chip that is used for implementation. Our design studies have shown, that the network topology can be implemented in FPGA hardware without problems. The challenge is the implementation of the network controller with its routing algorithm in an efficient way. But, once the network is fully implemented in FPGA hardware, the MINoC provides a deterministic highperformance network for interprocessor communication in MPSoCs. VII. CONCLUSION AND FUTURE WORK This paper proposes an on-chip multistage interconnection network with the least possible number of hardware, the minimum amount of wiring between stages and the minimum wire lengths. It can be used for highperformance interprocessor communication in real-time applications. Although logn-mins have been already researched and used in parallel super computers, they can be adapted also for network-on-chips as well. High bandwidth and low latency are combined with a deterministic behavior of interprocessor communication in the proposed NoC. The objective is to use MPSoCs in high-performance embedded systems with hard real-time constraints that can be found in electronic control units for cars or for production machinery. In future work we want to discuss the pros and cons of blocking and non-blocking MINs for real-time computing and their implementation in FPGA hardware. Blocking networks such as the Baseline network minimize the costs in hardware but they require a suitable scheduling strategy because not all permutations of sender/receiver pairs can be realized. Therefore further scheduling algorithms such as priority scheduling or earliest-deadline-first have to be considered for hardware implementation. In comparison with blocking networks, non-blocking networks as the Benes network can be operated without complex scheduling strategies since messages can be sent to each free receiver port at any time. On the other hand non-blocking networks exhibit extra costs in hardware plus they require complex routing algorithms due to the rearrangement of alternative connection paths. 51
6 REFERENCES [1] A. Jerraya and W. Wolf, Multiprocessor Systems-on-Chip, San Francisco, CA: Morgan Kaufmann, [2] M. Grant, Overview of the MPSoC design challenge, in Proceedings of the 43rd annual Design Automation Conference, San Francisco, CA, July 2006, pp [3] U. Heinkel, M. Padeffke, W. Haas, and T. Buerner, The VHDL Reference, Cichester, England: John Wiley & Sons, [4] C. Bobda, T. Haller, and F. Muehlbauer, Design of Adaptive Multiprocessor on Chip Systems, in SBCCI 2007, Rio de Janeiro, Sept. 2007, pp [5] S. Aust and H. Richter, Space Division of Processing Power For Feed Forward and Feed Back Control in Complex Production and Packaging Machinery, in WAC 2010, Kobe, Japan, Sept [6] P. Guerrier and A. Greiner, A generic architecture for on-chip packet switched interconnections, in DATE2000, March 2000, pp [7] H. G. Lee, N. Chang, U. Y. Ogras, and R. Marculescu, On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches, in ACM Transactions on Design Automation of Electronic Systems, Vol. 12, No. 3, Article 23, August [8] L. Benini and G. De Micheli, Networks on chips: a new SoC paradigm, in IEEE Computer magazine, vol. 35, no. 1, Jan. 2002, pp [9] D. Atienza, F. Angiolini, S. Murali, A. Pullini, L. Benini, and G. De Micheli, Network-on-Chip design and synthesis outlook, in Integration, the VLSI Journal, 41(2), Feb [10] S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Öberg, K. Tiensyrjä, and A. Hemani, A Network on Chip Architecture and Design Methodology, in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2002, pp [11] T. Bjerregaard and S. Mahadevan, A Survey of Research and Practices of Network-on-Chip, in Integrated Circuits and Systems Design (SBCCI) 2003, pp [12] C. A. Zeferino and A. A. Susin, SoCIN: A Parametric and Scalable Network-on-Chip, in ACM Computing Surveys (CSUR), vol. 38, issue 1, [13] L. Benini and G. De Micheli, Networks on Chips: Technology and Tools, San Francisco, CA: Morgan Kaufmann, [14] H. Richter, US-Patent 5,175,539 Interconnection Network. [15] D. H. Lawrie, Access and Alignment of Data in an Array Processor, in IEEE Transactions on Computers, vol. C-24, no. 12, December 1975, pp [16] H. Richter, Verbindungsnetzwerke für parallele und verteilte Systeme, in German, Heidelberg: Spektrum, [17] J. L. Hennessy and D. A. Patterson, Computer Architecture. A Quantitve Approach, 4th edition, San Francisco, CA: Morgan Kaufmann, [18] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks. An Engineering Approach, San Francisco, CA: Morgan Kaufmann, [19] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, San Francisco, CA: Morgan Kaufmann Publishers Inc., [20] C.-L. Wu and T.-Y. Feng, On a Class of Multistage Interconnection Networks, in IEEE Transactions on Computers, Vol. C-29, No. 8, August 1980, pp [21] Xilinx Inc., MicroBlaze Processor Reference Guide, Product Specification: UG081, v9.0, Jan [22] Xilinx Inc., Fast Simplex Link (FSL) Bus (v2.11b), Product Specification: DS449, June 2009 [23] H. P. Rosinger, Connecting Customized IP to the MicroBlaze Soft Processor Using the Fast Simplex Link (FSL) Channel, Xilinx Application Note: XAPP529, v1.3, May
Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip
Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana
System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1
System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect
Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy
Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of
Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!
Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel
Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)
Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems
A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs
A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs ang u, Mei ang, ulu ang, and ingtao Jiang Dept. of Computer Science Nankai University Tianjing, 300071, China [email protected],
Interconnection Networks
CMPT765/408 08-1 Interconnection Networks Qianping Gu 1 Interconnection Networks The note is mainly based on Chapters 1, 2, and 4 of Interconnection Networks, An Engineering Approach by J. Duato, S. Yalamanchili,
Switched Interconnect for System-on-a-Chip Designs
witched Interconnect for ystem-on-a-chip Designs Abstract Daniel iklund and Dake Liu Dept. of Physics and Measurement Technology Linköping University -581 83 Linköping {danwi,dake}@ifm.liu.se ith the increased
Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)
Lecture 18: Interconnection Networks CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Project deadlines: - Mon, April 2: project proposal: 1-2 page writeup - Fri,
Chapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe
Multistage Interconnection Network for MPSoC: Performances study and prototyping on FPGA
Multistage Interconnection Network for MPSoC: Performances study and prototyping on FPGA B. Neji 1, Y. Aydi 2, R. Ben-atitallah 3,S. Meftaly 4, M. Abid 5, J-L. Dykeyser 6 1 CES, National engineering School
Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng
Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption
Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors
2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,
SOCWIRE: A SPACEWIRE INSPIRED FAULT TOLERANT NETWORK-ON-CHIP FOR RECONFIGURABLE SYSTEM-ON-CHIP DESIGNS
SOCWIRE: A SPACEWIRE INSPIRED FAULT TOLERANT NETWORK-ON-CHIP FOR RECONFIGURABLE SYSTEM-ON-CHIP DESIGNS IN SPACE APPLICATIONS Session: Networks and Protocols Long Paper B. Osterloh, H. Michalik, B. Fiethe
A Generic Network Interface Architecture for a Networked Processor Array (NePA)
A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction
Interconnection Network
Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network
Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip
Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip Cristina SILVANO [email protected] Politecnico di Milano, Milano (Italy) Talk Outline
From Bus and Crossbar to Network-On-Chip. Arteris S.A.
From Bus and Crossbar to Network-On-Chip Arteris S.A. Copyright 2009 Arteris S.A. All rights reserved. Contact information Corporate Headquarters Arteris, Inc. 1741 Technology Drive, Suite 250 San Jose,
Lecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Chapter 2. Multiprocessors Interconnection Networks
Chapter 2 Multiprocessors Interconnection Networks 2.1 Taxonomy Interconnection Network Static Dynamic 1-D 2-D HC Bus-based Switch-based Single Multiple SS MS Crossbar 2.2 Bus-Based Dynamic Single Bus
Interconnection Networks
Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode
Components: Interconnect Page 1 of 18
Components: Interconnect Page 1 of 18 PE to PE interconnect: The most expensive supercomputer component Possible implementations: FULL INTERCONNECTION: The ideal Usually not attainable Each PE has a direct
Computer Network. Interconnected collection of autonomous computers that are able to exchange information
Introduction Computer Network. Interconnected collection of autonomous computers that are able to exchange information No master/slave relationship between the computers in the network Data Communications.
Optical interconnection networks with time slot routing
Theoretical and Applied Informatics ISSN 896 5 Vol. x 00x, no. x pp. x x Optical interconnection networks with time slot routing IRENEUSZ SZCZEŚNIAK AND ROMAN WYRZYKOWSKI a a Institute of Computer and
Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip
Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC
Scaling 10Gb/s Clustering at Wire-Speed
Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400
Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip
Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip Manjunath E 1, Dhana Selvi D 2 M.Tech Student [DE], Dept. of ECE, CMRIT, AECS Layout, Bangalore, Karnataka,
Communication Networks. MAP-TELE 2011/12 José Ruela
Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)
Behavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages
Behavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Computer
Design and Verification of Nine port Network Router
Design and Verification of Nine port Network Router G. Sri Lakshmi 1, A Ganga Mani 2 1 Assistant Professor, Department of Electronics and Communication Engineering, Pragathi Engineering College, Andhra
Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin
BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394
3D On-chip Data Center Networks Using Circuit Switches and Packet Switches
3D On-chip Data Center Networks Using Circuit Switches and Packet Switches Takahide Ikeda Yuichi Ohsita, and Masayuki Murata Graduate School of Information Science and Technology, Osaka University Osaka,
Parallel Programming
Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen [email protected] WS15/16 Parallel Architectures Acknowledgements Prof. Felix
Interconnection Network Design
Interconnection Network Design Vida Vukašinović 1 Introduction Parallel computer networks are interesting topic, but they are also difficult to understand in an overall sense. The topological structure
IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER NETWORKS. Received May 2010; accepted July 2010
ICIC Express Letters Part B: Applications ICIC International c 2010 ISSN 2185-2766 Volume 1, Number 1, September 2010 pp. 71 76 IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER
Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)
Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers
Quality of Service (QoS) for Asynchronous On-Chip Networks
Quality of Service (QoS) for synchronous On-Chip Networks Tomaz Felicijan and Steve Furber Department of Computer Science The University of Manchester Oxford Road, Manchester, M13 9PL, UK {felicijt,sfurber}@cs.man.ac.uk
DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL
IJVD: 3(1), 2012, pp. 15-20 DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL Suvarna A. Jadhav 1 and U.L. Bombale 2 1,2 Department of Technology Shivaji university, Kolhapur, 1 E-mail: [email protected]
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP SWAPNA S 2013 EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP A
Open Flow Controller and Switch Datasheet
Open Flow Controller and Switch Datasheet California State University Chico Alan Braithwaite Spring 2013 Block Diagram Figure 1. High Level Block Diagram The project will consist of a network development
CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION
CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION T.S Ghouse Basha 1, P. Santhamma 2, S. Santhi 3 1 Associate Professor & Head, Department Electronic & Communication Engineering,
MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL
MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL Sandeep Kumar 1, Arpit Kumar 2 1 Sekhawati Engg. College, Dundlod, Dist. - Jhunjhunu (Raj.), [email protected], 2 KIIT, Gurgaon (HR.), Abstract
Applying the Benefits of Network on a Chip Architecture to FPGA System Design
Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.
FPGA Implementation of IP Packet Segmentation and Reassembly in Internet Router*
SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 6, No. 3, December 2009, 399-407 UDK: 004.738.5.057.4 FPGA Implementation of IP Packet Segmentation and Reassembly in Internet Router* Marko Carević 1,a,
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES
QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES SWATHI NANDURI * ZAHOOR-UL-HUQ * Master of Technology, Associate Professor, G. Pulla Reddy Engineering College, G. Pulla Reddy Engineering
Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs
Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs Antoni Roca, Jose Flich Parallel Architectures Group Universitat Politechnica de Valencia (UPV) Valencia, Spain Giorgos Dimitrakopoulos
Asynchronous Bypass Channels
Asynchronous Bypass Channels Improving Performance for Multi-Synchronous NoCs T. Jain, P. Gratz, A. Sprintson, G. Choi, Department of Electrical and Computer Engineering, Texas A&M University, USA Table
AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK
Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,
Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware
Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Shaomeng Li, Jim Tørresen, Oddvar Søråsen Department of Informatics University of Oslo N-0316 Oslo, Norway {shaomenl, jimtoer,
Extending Platform-Based Design to Network on Chip Systems
Extending Platform-Based Design to Network on Chip Systems Juha-Pekka Soininen 1, Axel Jantsch 2, Martti Forsell 1, Antti Pelkonen 1, Jari Kreku 1, and Shashi Kumar 2 1 VTT Electronics (Technical Research
Why the Network Matters
Week 2, Lecture 2 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. So Far Overview of Multicore Systems Why Memory Matters Memory Architectures Emerging Chip Multiprocessors (CMP) Increasing
Real-Time (Paradigms) (51)
Real-Time (Paradigms) (51) 5. Real-Time Communication Data flow (communication) in embedded systems : Sensor --> Controller Controller --> Actor Controller --> Display Controller Controller Major
http://www.ece.ucy.ac.cy/labs/easoc/people/kyrkou/index.html BSc in Computer Engineering, University of Cyprus
Christos Kyrkou, PhD KIOS Research Center for Intelligent Systems and Networks, Department of Electrical and Computer Engineering, University of Cyprus, Tel:(+357)99569478, email: [email protected] Education
Principles and characteristics of distributed systems and environments
Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)
Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2
Reconfigurable Architectures Chapter 3.2 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design Coarse-Grained Reconfigurable Devices Recall: 1. Brief Historically development (Estrin Fix-Plus
Power Reduction Techniques in the SoC Clock Network. Clock Power
Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a
Multiprocessor System-on-Chip
http://www.artistembedded.org/fp6/ ARTIST Workshop at DATE 06 W4: Design Issues in Distributed, CommunicationCentric Systems Modelling Networked Embedded Systems: From MPSoC to Sensor Networks Jan Madsen
Chapter 14: Distributed Operating Systems
Chapter 14: Distributed Operating Systems Chapter 14: Distributed Operating Systems Motivation Types of Distributed Operating Systems Network Structure Network Topology Communication Structure Communication
Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP)
19th International Symposium on Computer Architecture and High Performance Computing Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP) Seung Eun Lee, Jun Ho Bahn, and
Packetization and routing analysis of on-chip multiprocessor networks
Journal of Systems Architecture 50 (2004) 81 104 www.elsevier.com/locate/sysarc Packetization and routing analysis of on-chip multiprocessor networks Terry Tao Ye a, *, Luca Benini b, Giovanni De Micheli
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] NIOS II 1 1 What is Nios II? Altera s Second Generation
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com
Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and
Low-Overhead Hard Real-time Aware Interconnect Network Router
Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering
Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師
Lecture 7: Distributed Operating Systems A Distributed System 7.2 Resource sharing Motivation sharing and printing files at remote sites processing information in a distributed database using remote specialized
Optimizing Configuration and Application Mapping for MPSoC Architectures
Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : [email protected] 1 Multi-Processor Systems on Chip (MPSoC) Design Trends
White Paper Increase Flexibility in Layer 2 Switches by Integrating Ethernet ASSP Functions Into FPGAs
White Paper Increase Flexibility in Layer 2 es by Integrating Ethernet ASSP Functions Into FPGAs Introduction A Layer 2 Ethernet switch connects multiple Ethernet LAN segments. Because each port on the
A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip
www.ijcsi.org 241 A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip Ahmed A. El Badry 1 and Mohamed A. Abd El Ghany 2 1 Communications Engineering Dept., German University in Cairo,
Module 15: Network Structures
Module 15: Network Structures Background Topology Network Types Communication Communication Protocol Robustness Design Strategies 15.1 A Distributed System 15.2 Motivation Resource sharing sharing and
SUPPORT FOR HIGH-PRIORITY TRAFFIC IN VLSI COMMUNICATION SWITCHES
9th Real-Time Systems Symposium Huntsville, Alabama, pp 191-, December 1988 SUPPORT FOR HIGH-PRIORITY TRAFFIC IN VLSI COMMUNICATION SWITCHES Yuval Tamir and Gregory L Frazier Computer Science Department
QoS issues in Voice over IP
COMP9333 Advance Computer Networks Mini Conference QoS issues in Voice over IP Student ID: 3058224 Student ID: 3043237 Student ID: 3036281 Student ID: 3025715 QoS issues in Voice over IP Abstract: This
CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE
CHAPTER 5 71 FINITE STATE MACHINE FOR LOOKUP ENGINE 5.1 INTRODUCTION Finite State Machines (FSMs) are important components of digital systems. Therefore, techniques for area efficiency and fast implementation
WBAN Beaconing for Efficient Resource Sharing. in Wireless Wearable Computer Networks
Contemporary Engineering Sciences, Vol. 7, 2014, no. 15, 755-760 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4686 WBAN Beaconing for Efficient Resource Sharing in Wireless Wearable
Chapter 16: Distributed Operating Systems
Module 16: Distributed ib System Structure, Silberschatz, Galvin and Gagne 2009 Chapter 16: Distributed Operating Systems Motivation Types of Network-Based Operating Systems Network Structure Network Topology
A Study of Network Security Systems
A Study of Network Security Systems Ramy K. Khalil, Fayez W. Zaki, Mohamed M. Ashour, Mohamed A. Mohamed Department of Communication and Electronics Mansoura University El Gomhorya Street, Mansora,Dakahlya
Chapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks
Chapter 12: Multiprocessor Architectures Lesson 04: Interconnect Networks Objective To understand different interconnect networks To learn crossbar switch, hypercube, multistage and combining networks
How To Understand The Concept Of A Distributed System
Distributed Operating Systems Introduction Ewa Niewiadomska-Szynkiewicz and Adam Kozakiewicz [email protected], [email protected] Institute of Control and Computation Engineering Warsaw University of
STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM
STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM Albert M. K. Cheng, Shaohong Fang Department of Computer Science University of Houston Houston, TX, 77204, USA http://www.cs.uh.edu
LogiCORE IP AXI Performance Monitor v2.00.a
LogiCORE IP AXI Performance Monitor v2.00.a Product Guide Table of Contents IP Facts Chapter 1: Overview Target Technology................................................................. 9 Applications......................................................................
The proliferation of the raw processing
TECHNOLOGY CONNECTED Advances with System Area Network Speeds Data Transfer between Servers with A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer
Introduction to System-on-Chip
Introduction to System-on-Chip COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University
How To Provide Qos Based Routing In The Internet
CHAPTER 2 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 22 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 2.1 INTRODUCTION As the main emphasis of the present research work is on achieving QoS in routing, hence this
TD 271 Rev.1 (PLEN/15)
INTERNATIONAL TELECOMMUNICATION UNION STUDY GROUP 15 TELECOMMUNICATION STANDARDIZATION SECTOR STUDY PERIOD 2009-2012 English only Original: English Question(s): 12/15 Geneva, 31 May - 11 June 2010 Source:
ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7
ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 4.7 A 2.7 Gb/s CDMA-Interconnect Transceiver Chip Set with Multi-Level Signal Data Recovery for Re-configurable VLSI Systems
PCI Express Overview. And, by the way, they need to do it in less time.
PCI Express Overview Introduction This paper is intended to introduce design engineers, system architects and business managers to the PCI Express protocol and how this interconnect technology fits into
