Design, Development, and Simulation/Experimental Validation of a Crossbar Interconnection Network for a Single-Chip Shared Memory Multiprocessor Architecture

Master's Project Report
June, 2002

Venugopal Duvvuri
Department of Electrical and Computer Engineering
University of Kentucky

Under the Guidance of
Dr. J. Robert Heath
Associate Professor
Department of Electrical and Computer Engineering
University of Kentucky

Table of Contents

Topic                                                                            Page Number
ABSTRACT                                                                         3
Chapter 1: Introduction, Background, and Positioning of Research                 4
Chapter 2: Types of Interconnect Systems                                         8
Chapter 3: Multistage Interconnection Systems Complexity                         16
Chapter 4: Design of the Crossbar Interconnect Network                           28
Chapter 5: VHDL Design Capture, Simulation, Synthesis and Implementation Flow    35
Chapter 6: Design Validation via Post-Implementation Simulation Testing          39
Chapter 7: Experimental Prototype Development, Testing, and Validation Results   61
Chapter 8: Conclusions                                                           65
References                                                                       66
Appendix A: Interconnect Network and Memory VHDL Code (Version 1)                67
Appendix B: Interconnect Network VHDL Code (Version 2)                           76

ABSTRACT

This project involves modeling, design, Hardware Description Language (HDL) design capture, synthesis, HDL simulation testing, and experimental validation of an interconnect network for a Hybrid Data/Command Driven Computer Architecture (HDCA) system, which is a single-chip shared memory multiprocessor architecture system. Various interconnect topologies that may meet the requirements of the HDCA system are studied and evaluated for use within the HDCA system. It is determined that the crossbar topology best meets the HDCA system requirements, and it is therefore used as the interconnect network of the HDCA system. Design capture, synthesis, simulation, and implementation are done in VHDL using Xilinx Foundation CAD software. A small, reduced-scale prototype design is implemented in a PROM-based Spartan XL Field Programmable Gate Array (FPGA) chip, which is successfully tested experimentally to validate the design and functionality of the described crossbar interconnect network.

Chapter 1: Introduction, Background, and Positioning of Research

This project is, first, a study of the different kinds of interconnect networks that may meet the requirements of a Hybrid Data/Command Driven Architecture (HDCA) multiprocessor system [2,5,6] shown in Figure 1.1. The project then involves VHSIC Hardware Description Language (VHDL) [10] description, synthesis, simulation testing, and experimental prototype testing of an interconnect network which acts as a Computing Element (CE) to data memory circuit switch for the HDCA system. The HDCA system is a multiprocessor shared memory architecture. The shared memory is organized as a number of individual memory blocks, as shown in Figures 1.1, 3.4a, and 4.1, and is explained in detail in later chapters. This kind of memory organization is required by this architecture. If two or more processors want to communicate with memory locations within the same memory block, lower-priority processors have to wait until the highest-priority processor completes its transaction. Only the highest-priority processor receives a request grant; the requests from the other, lower-priority processors are queued and are processed only after the completion of the highest-priority transaction. The interconnect network to be designed should be able to connect requesting processors on one side of the interconnect network to the memory blocks on the other side. The efficiency of the interconnect network increases as the possible number of parallel connections between the processors and the memory blocks increases. Interconnection networks play a central role in determining the overall performance of a multiprocessor system. If the network cannot provide adequate performance for a particular application, nodes (or CE processors in this case) will frequently be forced to wait for data to arrive. In this project, different types of interconnect networks which may be applicable to a HDCA system are addressed, and the advantages and disadvantages of these interconnects are discussed. Different types of interconnects, their routing mechanisms, and the complexity factor in designing the interconnects are also described in detail. This project includes design, VHDL description, and synthesis of an interconnect network based on the crossbar topology, the topology which best meets HDCA system requirements.

Figure 1.1: Single-Chip Reconfigurable HDCA System (Large File Memory May Be Off-Chip); Q => Multifunctional Queue

The crossbar topology is a very popular interconnect network in industry today. Interconnects are applied to different kinds of systems, each having its own requirements. In some systems, such as distributed memory systems, there must be a way for the processors to communicate with each other. A crossbar topology (single-sided topology) [1] can be designed to meet the requirement of inter-processor communication and is unique to distributed memory systems, because in distributed memory systems processors do not share a common memory. All the processors in the system are directly connected to their own memory and caches. No processor can directly access another processor's memory. All communication between the processors is made possible through the interconnection network. Hence, there is a need for inter-processor communication in distributed memory architectures. The crossbar topology suitable for these architectures is the single-sided crossbar network: all the processors are connected to an interconnection network, and communication between any two processors is possible. The HDCA system does not need an interconnect that supports inter-processor communication, as it is a shared memory architecture. For shared memory architectures, a double-sided crossbar network can be used as the interconnect network. This design needs some kind of priority logic, which prioritizes conflicting requests for memory accesses by the processors. It also requires a memory organization which is shared by all processors. The HDCA system requires the memory to be divided into memory blocks, each block containing memory locations with different address ranges. The actual interconnect design is a combination of a crossbar interconnect (double-sided topology) [1], priority logic, and a shared memory organization. Another interconnect architecture has been implemented as the interconnect for the CE to data memory circuit switch in an initial prototype of the HDCA system [2]. The initial HDCA prototype assumes no processor conflicts in accessing a particular memory block; such conflicts can be handled in the design presented here by the priority logic block. The input queue depth of the individual CE processors of a HDCA system is used by the priority logic block of the proposed interconnect network in granting requests to the processor having the deepest queue depth. The presented design is specific to the CE to data memory circuit switch for a HDCA system. The detailed crossbar interconnect network design is described in Chapter 4.

VHDL design capture, synthesis, and implementation procedures are discussed in Chapter 5. Chapter 6 includes the VHDL simulation testing setup and results. A test case is described in Chapter 6 which was tested during the pre-synthesis, post-synthesis, and post-implementation HDL simulation processes. In Chapter 7 an experimental prototype of the crossbar interconnect network is developed and tested to validate the presented interconnect architecture, design, and functionality.

Chapter 2: Types of Interconnect Systems

Interconnect networks can be classified as static or dynamic [11]. In the case of a static interconnection network, all connections are fixed, i.e., the processors are wired directly, whereas in the dynamic case there are routing switches in between. The decision whether to use a static or dynamic interconnection network depends on the type of problem to be solved by the computer system utilizing the interconnect system. Generally, static topologies are suitable for problems whose communication patterns can be predicted a priori reasonably well, whereas dynamic topologies (switching networks), though more expensive, are suitable for a wider class of problems. Static networks are mainly used in message-passing systems for inter-processor communication.

Types of Static Networks:

1. Star connected network:

Figure 2.1: Star Connected Network

In a star topology there is one central node computer, to which all other node computers are connected; each node has one connection, except the center node, which has N-1 connections. Routing in stars is trivial. If one of the communicating nodes is the center node, then the path is just the edge connecting them. If not, the message is routed from the source node to the center node, and from there to the destination node. Star networks are not suitable for large systems, since the center node becomes a bottleneck with an increasing number of processors. A typical star connected network is shown in Figure 2.1.

2. Meshes:

Figures 2.2 and 2.3 show a typical 1-Dimensional (1-D) mesh and a 2-D mesh, respectively. The simplest and cheapest way to connect the nodes of a parallel computer is to use a one-dimensional mesh. Each node has two connections, and boundary nodes have one. If the boundary nodes are connected to each other, we have a ring, and all nodes have two connections. The one-dimensional mesh can be generalized to a k-dimensional mesh, where each node (except boundary nodes) has 2k connections. In meshes, the dimension-order routing technique is used [12]. That is, routing is performed in one dimension at a time. In a three-dimensional mesh, for example, a message's path from node (a,b,c) to node (x,y,z) would move along the first dimension to node (x,b,c), then along the second dimension to node (x,y,c), and finally along the third dimension to the destination node (x,y,z). This type of topology is not suitable for building large-scale computers, since there is a wide range of latencies (the latency between neighboring processors is much lower than between non-neighbors), and the maximum latency grows with the number of processors.

Figure 2.2: 1-D Mesh
Figure 2.3: 2-D Mesh

3. Hypercubes:

The hypercube topology is one of the most popular and is used in many large-scale systems. A k-dimensional hypercube has 2^k nodes, each with k connections. Figure 2.4 shows a 4-D hypercube.

Figure 2.4: 4-D Hypercube

Hypercubes scale very well; the maximum latency in a k-dimensional (or "k-ary") hypercube is log_2(N), with N = 2^k. An important property of hypercube interconnects is the relationship between node number and which nodes are connected together. The rule is that any two nodes in the hypercube whose binary representations differ in exactly one bit are connected together. For example, in a four-dimensional hypercube, node 0 (0000) is connected to node 1 (0001), node 2 (0010), node 4 (0100), and node 8 (1000). This numbering scheme is called the Gray code scheme. A hypercube connected in this fashion is shown in Figure 2.5. A k-dimensional hypercube is nothing more than a k-dimensional mesh with only two nodes in each dimension, and thus the routing algorithm is the same as for meshes, apart from one difference. The path from node A to node B is calculated by simply computing the exclusive-OR, X = A XOR B, of the binary representations of nodes A and B. If the i-th bit in X is '1', the message is moved to the neighboring node in the i-th dimension. If the i-th bit is '0', the message is not moved. This means that it takes at most log_2(N) steps for a message to reach its destination (where N is the number of nodes in the hypercube).
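To make the XOR-based dimension-order routing concrete, the following is a minimal, hypothetical VHDL sketch of the next-hop computation for a single 4-D hypercube node. The entity name, ports, and fixed 4-bit node labels are illustrative assumptions only; no such module appears in this project, whose interconnect is a crossbar.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity hypercube_route is
  port (
    a        : in  std_logic_vector(3 downto 0);  -- current node label
    b        : in  std_logic_vector(3 downto 0);  -- destination node label
    next_hop : out std_logic_vector(3 downto 0);  -- neighbor to forward the message to
    done     : out std_logic                      -- '1' when a = b (message has arrived)
  );
end entity hypercube_route;

architecture behav of hypercube_route is
begin
  process (a, b)
    variable x : std_logic_vector(3 downto 0);
    variable n : std_logic_vector(3 downto 0);
  begin
    x := a xor b;            -- bits in which current and destination labels differ
    n := a;
    done <= '1';
    for i in 0 to 3 loop     -- dimension-order routing: lowest differing dimension first
      if x(i) = '1' then
        n(i) := not n(i);    -- move along dimension i to the neighboring node
        done <= '0';
        exit;
      end if;
    end loop;
    next_hop <= n;
  end process;
end architecture behav;
```

For example, routing from node 2 (0010) to node 8 (1000) gives X = 1010, so the message first moves along dimension 1 to node 0 (0000) and then along dimension 3 to node 8.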

Figure 2.5: Gray Code Scheme in Hypercube

Types of Dynamic Networks:

1. Bus-based networks: These are the simplest networks and are an efficient solution when cost is a concern and only a moderate number of processors is involved. Their main drawbacks are that the bus becomes a bottleneck to the memory when the number of processors grows large, and that a single point of failure can hang the system. To overcome these problems to some extent, several parallel buses can be incorporated. Figure 2.6 shows a bus-based network with a single bus.

Figure 2.6: Bus-based Network

2. Crossbar switching networks: Figure 2.7 shows a double-sided crossbar network having n processors (P_i) and m memory blocks (M_i). All the processors in a crossbar network have dedicated buses directly connected to all memory blocks. This is a non-blocking network, as a connection of a processor to a memory block does not block the connection of any other processor to any other memory block. In spite of their high speed, their use is normally limited to systems containing 32 or fewer processors, due to non-linear (n x m) complexity and cost. They are applied mostly in multiprocessor vector computers and in multiprocessor systems with multilevel interconnections.

Figure 2.7: Crossbar Network

3. Multistage interconnection networks: Multistage networks are designed with several small-dimension crossbar networks. Input/output connection establishment is done in two or more stages. Figures 2.8 and 2.9, shown below, are Benes and Clos multistage networks, respectively.

These are non-blocking networks and are suitable for very large systems, as their complexity is much less than that of crossbar networks. The main disadvantage of these networks is latency, which increases with the size of the network.

Figure 2.8: Benes Network (B(N,d))

Figure 2.9: Clos Network

Some of the multistage networks are compared with crossbar networks, primarily from the standpoint of complexity, in the next chapter. Table 2.1 shows some general properties of bus, crossbar, and multistage interconnection networks.

Property      Bus    Crossbar   Multistage
Speed         Low    High       High
Cost          Low    High       Moderate
Reliability   Low    High       High
Complexity    Low    High       Moderate

Table 2.1: Properties of Various Interconnection Network Topologies

Chapter 3: Multistage Interconnection Network Complexity

For the HDCA system, the desired interconnect should be able to establish non-blocking, high-speed connections from the requesting Computing Elements (CEs) to the memory blocks. The interconnect should be able to sense conflicts, such as two or more processors requesting connection to the same memory block, and give only the highest-priority processor the ability to connect. The multistage Benes network and Clos network, their complexity comparison with a crossbar network, and the advantages and disadvantages of these candidate networks are discussed in this chapter.

Crossbar Topology: A crossbar network is a non-blocking, very reliable, very high-speed network. Figure 2.7 is a typical single-stage crossbar network with N inputs/processors and M outputs/memory blocks. It is denoted by X(N,M). The complexity (crosspoint count) of a crossbar network is given by N x M. Complexity increases with an increase in the number of inputs or the number of outputs. This is the main disadvantage of the crossbar network; hence there is little scope for scalability of crossbar networks. The crossbar topology implemented for shared memory architectures is referred to as a double-sided crossbar network.

Benes Network: A Benes network is a multistage, non-blocking network. For any value of N, d should be chosen so that log_d(N) is an integer. The number of stages for an N x N Benes network is given by (2 log_d(N) - 1), and it has (N/d) crossbar switches in each stage. Hence B(N,d) is implemented with (N/d) x (2 log_d(N) - 1) crossbar switches. The general architecture of a Benes network (B(N,d)) is shown in Figure 2.8. In the figure:

N: Number of inputs or outputs,
d: Dimension of each crossbar switch (X(d,d)),
I: First-stage switch = X(d,d),
II: Middle-stage switch = B(N/d,d),
III: Last-stage switch = X(d,d).

The complexity (crosspoint count) of the network is given by (N/d) x (2 log_d(N) - 1) x d^2. Network latency is a factor of (2 log_d(N) - 1), because there are (2 log_d(N) - 1) stages between the input stage and the output stage. There are different possible routings from any input to any output. It is a limited-scalability architecture: for a B(N,d) implementation, N has to be a power of d. For all other configurations, a higher-order Benes network can be used, but at the cost of some hardware wastage. The main disadvantages of this network are the network latency and limited scalability. For very large networks, a Benes network implementation is very cost effective.

Clos Network: Figure 2.9 shows a typical N x M Clos network, represented by C(N,M). Blocks I and III are always crossbar switches, and II is a crossbar switch for a 3-stage Clos network. In implementations of higher-order Clos networks, II is a lower-order Clos network; for example, for a 5-stage Clos implementation, II is a three-stage Clos network. In Figure 2.9:

N: Number of processors,
M: Number of memory blocks,
K: Number of second-stage switches,
C1: Number of first-stage switches,
C2: Number of third-stage switches.

For a three-stage Clos network, I = X(N/C1, K), II = X(C1, C2), III = X(K, M/C2), and the condition for a non-blocking Clos implementation is K = N/C1 + M/C2 - 1. A three-stage Clos implementation for N = 16, M = 32, C1 = 4, C2 = 8 has K = 16/4 + 32/8 - 1 = 7. Each first-stage switch becomes a 4 x 7 crossbar switch, each second-stage switch becomes a 4 x 8 crossbar switch, and each third-stage switch becomes a crossbar switch of size 7 x 4 (I = X(4,7), II = X(4,8), III = X(7,4)).

The complexity of a Clos network is given by C_clos = K(N + M) + K(C1 x C2). Using the non-blocking condition, K = N/C1 + M/C2 - 1. For N = M and C1 = C2, K = 2N/C1 - 1, and hence C_clos = (2N/C1 - 1)(2N + C1^2). For an optimum crosspoint count for non-blocking Clos networks, N/C1 = (N/2)^(1/2), which gives C1^2 = 2N and hence C_clos = ((2N)^(1/2) - 1) x 4N (approximately). The main advantage of a Clos network implementation is its scalability: a Clos network can be implemented for any non-prime value of N. The disadvantages of this implementation are the network latency and its inefficiency for small systems. The network latency is a factor of the number of intermediate stages between the input stage and the output stage. From the complexity comparison shown in Table 3.1 and the charts shown in Figures 3.2 and 3.3, it can be seen that the crossbar topology for small systems and the Benes network for large systems match the requirements of the interconnect network for a HDCA system. For simplicity of comparison, the number of processors on the input side and the number of memory blocks on the output side are both assumed to be N. The comparison also holds for rectangular-size implementations of these topologies, which are not possible in the Benes network. The complexity comparison table for the three topologies studied so far is given in Table 3.1. In the table, column I is the complexity and column II is the corresponding network implementation for each value of N for the respective topologies. Chart 1, shown in Figure 3.2, is a graph of the complexity of the three topologies versus N, the number of processors or memory blocks, for lower values of N (N <= 16). Chart 2, shown in Figure 3.3, is the same graph for higher values of N (N >= 16).
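Collecting the Clos relations above, the worked evaluation below restates the N = 16, M = 32, C1 = 4, C2 = 8 example in one place; the value of K and the switch sizes are those given in the text, while the total crosspoint count is computed here from the stated complexity formula rather than quoted from the report.

```latex
\begin{align*}
K &= \frac{N}{C1} + \frac{M}{C2} - 1 = \frac{16}{4} + \frac{32}{8} - 1 = 7,\\
\text{stage I: } X\!\left(\frac{N}{C1}, K\right) = X(4,7), \quad
\text{stage II: } & X(C1, C2) = X(4,8), \quad
\text{stage III: } X\!\left(K, \frac{M}{C2}\right) = X(7,4),\\
C_{clos} &= K(N+M) + K(C1 \cdot C2) = 7(16+32) + 7(4 \cdot 8) = 336 + 224 = 560 \text{ crosspoints}.
\end{align*}
```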

Table 3.1: Complexity Comparison Table

  N     Crossbar             Benes                Clos
        I       II           I       II           I       II
  2     4       X(2,2)       4       B(2,2)       4       C(2,2)
  3     9       X(3,3)       9       B(3,3)       9       C(3,3)
  4     16      X(4,4)       24      B(4,2)       36      C(4,2)
  5     25      X(5,5)       25      B(5,5)       25      C(5,5)
  6     36      X(6,6)       80      B(8,2)       63      C(6,3)
  7     49      X(7,7)       80      B(8,2)       96      C(8,4)
  8     64      X(8,8)       80      B(8,2)       96      C(8,4)
  9     81      X(9,9)       81      B(9,3)       135     C(9,3)
  10    100     X(10,10)     224     B(16,2)      135     C(10,5)
  11    121     X(11,11)     224     B(16,2)      180     C(12,6)
  12    144     X(12,12)     224     B(16,2)      180     C(12,6)
  13    169     X(13,13)     224     B(16,2)      189     C(14,7)
  14    196     X(14,14)     224     B(16,2)      189     C(14,7)
  15    225     X(15,15)     224     B(16,2)      275     C(15,5)
  16    256     X(16,16)     224     B(16,2)      278     C(16,8)
  32    1024    X(32,32)     576     B(32,2)      896     C(32,8)
  64    4096    X(64,64)     1408    B(64,2)      2668    C(64,16)
  81    6561    X(81,81)     1701    B(81,3)      4131    C(81,9)
  128   16384   X(128,128)   3328    B(128,2)     7680    C(128,16)

Figure 3.2: Complexity Chart for N <= 16 (complexity versus N for the crossbar, Benes, and Clos topologies)

Figure 3.3: Complexity Chart for N >= 16 (complexity versus N for the crossbar, Benes, and Clos topologies)

The following equations are used to calculate the complexity of all three topologies for the different configurations given in Table 3.1:

C_clos = (2N/C1 - 1)(2N + C1^2), with N/C1 = (N/2)^(1/2) taken to the closest integer value,
C_benes = (N/d) x (2 log_d(N) - 1) x d^2,
C_crossbar = N^2.

From Table 3.1 and the charts in Figures 3.2 and 3.3, the crossbar topology has the lowest complexity for values of N <= 16. Hence the crossbar network is the best interconnect implementation for systems that have no more than 16 processors/memory blocks: the hardware required for the implementation is less than for all other possible implementations, it is faster than any other network, and it is a non-blocking network, as every input has connection capability to every output in the system. For systems with configurations larger than 16 x 16 but smaller than 64 x 64, the designer has to trade off between speed and complexity, because for multistage networks such as the Benes network the complexity is less than that of the crossbar network but at the cost of speed; the speed of multistage networks is much lower than that of the crossbar network. For systems having more than 64 x 64 configurations, the Benes network proves to be the best implementation. The HDCA system normally requires an interconnect with a complexity less than 256. A crossbar-implemented interconnect best suits the system: it has minimum complexity for the sizes of interconnect needed by the HDCA system, it is non-blocking since no processor has to share any bus with any other processor, and it is a very high-speed implementation since it has only one intermediate stage between processors and memory blocks.
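As a spot check of a few Table 3.1 entries against the crossbar and Benes equations above (the arithmetic below is worked out here and is not quoted from the report):

```latex
\begin{align*}
C_{crossbar}(N=16) &= N^2 = 16^2 = 256,\\
C_{benes}\big(B(16,2)\big) &= \frac{N}{d}\,(2\log_d N - 1)\,d^2 = \frac{16}{2}\,(2 \cdot 4 - 1)\,2^2 = 8 \cdot 7 \cdot 4 = 224,\\
C_{benes}\big(B(64,2)\big) &= \frac{64}{2}\,(2 \cdot 6 - 1)\,2^2 = 32 \cdot 11 \cdot 4 = 1408,\\
C_{benes}\big(B(81,3)\big) &= \frac{81}{3}\,(2 \cdot 4 - 1)\,3^2 = 27 \cdot 7 \cdot 9 = 1701.
\end{align*}
```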

Multiprocessor Shared Memory Organization:

Figure 3.4a: Multi-processor, Interconnect, and Shared Memory Organization

Figure 3.4a shows the organization of a multiprocessor shared memory system. Figure 3.4b shows the organization of the shared memory used in the HDCA system. By making 2^c = M, the shared memory architecture in Figure 3.4b can be used as the shared memory for the HDCA system. Figure 3.4c shows the organization of each memory block. In each memory block there are 2^b addressable locations, each a bits wide.
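A short worked summary of the addressing implied by this organization; the symbolic relations follow directly from the definitions above, and the specific values a = 4, b = 2, c = 2 are those of the reduced-scale prototype described in Chapter 6.

```latex
\begin{align*}
\text{total locations} &= M \cdot 2^{b} = 2^{c} \cdot 2^{b} = 2^{b+c} \quad \text{(each } a \text{ bits wide)},\\
\text{processor address width} &= b + c \text{ bits: the } c \text{ MSBs select the memory block, the } b \text{ LSBs the location},\\
\text{prototype } (a,b,c) = (4,2,2): &\;\; 4 \text{ blocks} \times 4 \text{ locations} = 16 \text{ locations of 4 bits, addressed by a 4-bit } \mathrm{ADDR[i]}.
\end{align*}
```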

Figure 3.4b: Shared Memory Organization (memory blocks MB[0] through MB[2^c - 1])

Figure 3.4c: Organization of Each Memory Block

Related Research in Crossbar Networks: The crossbar switches (networks) of today see wide use in a variety of applications, including network switching, parallel computing, and various telecommunications applications. By using Field Programmable Gate Arrays (FPGAs) or Complex Programmable Logic Devices (CPLDs) to implement the crossbar switches, design engineers have the flexibility to customize the switch to suit their specific design goals, as well as to obtain switch configurations not available with off-the-shelf parts. In addition, the use of in-system programmable devices allows the switch to be reconfigured if design modifications become necessary.

There are two types of implementations possible based on the crossbar topology: one is the single-sided crossbar implementation, shown in Figure 3.5, and the other is the double-sided crossbar implementation, shown earlier in Figure 2.7.

Figure 3.5: Single-sided Crossbar Network

The single-sided crossbar network is usually implemented and utilized in distributed memory systems, where all the nodes (or processors) connected to the interconnect need to communicate with each other. In the double-sided crossbar networks, which are usually utilized as the interconnect between processors and memory blocks in a multiprocessor shared memory architecture as shown in Figures 3.4a, 3.4b, and 3.4c, processors need to communicate with memory blocks, but processors do not communicate with processors, nor memory blocks with memory blocks. A crossbar network is implemented with serial buses or parallel buses. In a serial-bus crossbar network implementation, addresses and data are sent by the processor through a single-bit bus in a serial fashion and are fetched by the memory blocks at the rate of 1 bit on every active edge of the system clock. Some conventional crossbar switches use this protocol for the crossbar interconnect network. The other implementation is the parallel-bus crossbar network implementation, which is much faster than the serial-bus implementation. All the memory blocks fetch addresses on one clock and data on the following clocks.

This implementation consumes more hardware than the serial-bus crossbar network implementation, but it is a much faster network implementation and is hence used in some high-performance multiprocessor systems. The main issue in implementing a crossbar network is arbitration of processor requests for memory accesses. Processor request arbitration comes into the picture when two or more processors request memory access within the same memory block. There are different protocols that may be followed in designing the interconnect. One of them is a round-robin protocol: in the case of conflict among processor requests for memory accesses within the same memory block, requests are granted to the processors in a round-robin fashion. A fixed-priority protocol assigns fixed priorities to the processors; in case of conflict, the processor having the highest priority ends up having its request granted all the time. For a variable-priority protocol, as will be used in an HDCA system, the priorities assigned to processors dynamically vary over time based on some parameter (metric) of the system. In the HDCA system of Figure 1.1, all processors are dynamically assigned a priority depending upon their input Queue (Q) depth at any time. In the case of conflict, the processor having the highest (deepest) queue depth at that point in time gets the request grant. A design engineer has to choose among the above-mentioned protocols and the various kinds of implementations, depending on the system requirements, in designing a crossbar network. The HDCA system will need some kind of arbitration in case of processor conflicts. The interconnect design presented in this project is closely related to the design of a crossbar interconnect for distributed memory systems presented in [3]. Both designs address the possibility of conflicts between processor requests for memory accesses within the same memory block. Both designs use parallel address and data buses between every processor and every memory block. The design presented in this project differs from the interconnect network of [3] in two ways. First, the crossbar interconnect network presented in this project is suitable for a shared memory architecture: the crossbar topology used in this project is a double-sided topology, whereas a single-sided topology is used in the design of the crossbar interconnect network of [3]. Second, the priority arbitration scheme proposed in the interconnect network of [3] uses a fixed-priority scheme based on the physical distance between the processors, giving the closest processor the highest priority and the farthest processor the lowest priority.

The priority arbitration scheme presented in this design uses the input queue depth of the processors in determining the priorities. The HDCA system requires a double-sided, parallel-bus crossbar network, with variable priority depending upon the input queue depth of the processors. In this project, a double-sided, parallel-bus crossbar network using a variable-priority protocol is designed, implemented, and tested as the interconnect for the HDCA system. The detailed design of the implementation is described in the next chapter.

Chapter 4: Design of the Crossbar Interconnect Network

This chapter presents the detailed design of a crossbar interconnect which meets the requirements of the HDCA system of Figure 1.1. The organization of processors, interconnect, and memory blocks is shown in Figure 3.4a.

Shared Memory Organization: The shared memory organization used in this project is shown in Figures 3.4b and 3.4c. From Figure 3.4b, there are 2^c = M memory blocks, and the organization of each memory block is shown in Figure 3.4c. In each memory block there are 2^b addressable locations of a bits width. Hence the main memory, which includes all the memory blocks, has 2^(b+c) addressable locations of a bits width, the address bus of each processor is (b + c) bits wide, and the data bus of each processor is a bits wide.

Signal Description: The schematic equivalent to the behavioral VHDL description of the interconnect is shown in Figure 4.1. In general, a processor i of Figure 3.4a has CTRL[i], RW[i], ADDR[i], and QD[i] as inputs to the interconnect. CTRL[i] of processor i goes high when it wants a connection to a memory block. RW[i] goes high when it wants to read and goes low when it wants to write. ADDR[i] is the (b + c)-bit address of the memory location, and ADDR_BL[i] is the c-bit address of the memory block with which the processor wants to communicate; the memory block is indicated by the c MSBs of ADDR[i]. QD[i] is the queue depth of the processor. FLAG[i], an input to the processor, goes high to grant the processor's request; FLAG[i] is decided by the priority logic of the interconnect network. PI[i], DM[i][j], and IM[j] of Figure 4.1 are the different types of buses used in the interconnection network. The structure of these buses is shown in Figures 4.2 and 4.3. At any time, processors, represented by i, can request access to memory blocks, represented by MB[j]; that is, CTRL[i] of those processors goes high and their memory block address is ADDR_BL[i] = j.

Figure 4.1: Block Diagram of the Crossbar Interconnect Network (processors P[0] to P[N-1], decoders DEC[0] to DEC[N-1], priority logic blocks PRL[0] to PRL[M-1], and memory blocks MB[0] to MB[M-1])

Figure 4.2: PI[i] and DM[i][j] Bus Structures (the PI[i] bus and DM[i][j] bus have the same set of signal lines: CTRL[i], RW[i], ADDR[i] (b + c bits), DATA[i] (a bits), QDEP[i], and FLAG[i])

Figure 4.3: IM[j] Bus Structure (CTRL[i], RW[i], ADDR[i] (b + c bits), DATA[i] (a bits), and FLAG[i])

Hence, in Figure 4.2, the bus PI[i] of the requesting processors gets connected to the bus DM[i][j] through the decode logic DEC[i] of Figure 4.1, shown again in Figure 4.4. As shown in Figure 4.4, ADDR_BL[i] of the requesting processor is decoded by decoder DEC[i], which connects PI[i] to the DM[i][j] output bus of DEC[i]. Every memory block has a priority logic block, PRL[j], as shown in Figure 4.5. The function of this logic block is to grant a request to the processor having the deepest queue depth among the processors requesting memory access to the same memory block. As shown in Figure 4.5, once processor i gets a grant from the priority logic PRL[j], via the FLAG[i] signal of the DM[i][j] and PI[i] buses shown in Figures 4.1 and 4.2, the DM[i][j] bus is connected to the IM[j] bus by MUX[j] of Figure 4.5. Thus a connection is established via PI[i], DM[i][j], and IM[j] between processor i and memory block j.

This connection remains active as long as the processor holds the deepest queue depth and CTRL[i] of the processor is active. A priority logic block gives a grant only to the highest-priority processor.

Figure 4.4: Decode Logic (DEC[i])

Figure 4.5: Priority Logic (PRL[j]) (multiplexer MUX[j] selects among DM[0][j] through DM[N-1][j] onto IM[j], under control of PROC_SEL[j] generated by PR_LOGIC[j])

Figure 4.6: Priority Control Flow Chart for PR_LOGIC[j] in Figure 4.5

The queue depth of the processors is used in determining the priority. In cases where processors have the same queue depth, the processor having the highest processor identification number gets the highest priority. A processor can access a particular memory block as long as it has the highest priority to that memory block. If some other processor gains the highest priority for that particular memory block, the processor which is currently accessing the memory block has its connection disconnected; it will have to wait until it gets the highest priority for accessing that block again. The flow chart showing the algorithmic operation of the j-th priority logic block of Figure 4.5 is shown in Figure 4.6 above. To fully follow the flow chart of Figure 4.6, we must reconcile the signal names used in Figure 4.2 and Figure 4.6. The c MSBs of ADDR[i] of Figure 4.2 correspond to mbaddr[x] of Figure 4.6, where x has an integer value ranging from 0 to (2^c - 1). QDEP[i] of Figure 4.2 is the same as qd[i] of Figure 4.6. The number of processors in the figure is assumed to be 4, but the algorithm holds true for any number of processors. The PR_LOGIC[j] block of the priority logic of Figure 4.5 compares the current maximum queue depth with the queue depth of every processor, starting from the 0th processor. This comparison is done only for those processors whose CTRL is in the logic '1' state and which are requesting memory access to the memory block where the priority logic operation is performed. The integer value i shown in Figure 4.6 is the identification number of the processor having the deepest queue depth at that time. After completion of processor prioritizing, the processor i which has the deepest queue depth gets its request granted (FLAG[i] goes high) to access that memory block. This logic operation is structurally equivalent to the schematic shown in Figure 4.5, in which PROC_SEL[j] (= i in the flowchart shown in Figure 4.6) acts as the select input to the multiplexer MUX[j]. This condition is achieved in the VHDL code (Appendix A and Appendix B) by giving memory access to a processor (its FLAG[i] is set equal to logic '1') only if its CTRL[i] is in the logic '1' state and it has the deepest queue depth among the processors requesting access within the same memory block. The interconnect gives all the processors the flexibility of simultaneous reads or writes for those processors that are granted requests by the priority logic. In the best case all processors will have their requests granted; this is the case when CTRL of all processors is '1' and no two processors have the same ADDR_BL. In this case the binary value of FLAG is "1111" after the completion of all iterations of the priority logic.

The VHDL description of the crossbar interconnect network implementation has a single function which describes the processor prioritization done for all the memory blocks (i.e., the corresponding priority logic blocks). The described function works for any number of processors or memory blocks. Figures 4.1 through 4.6 show the block-level design of the interconnect network. In the best case, when all processors access different memory blocks, all the processors receive request grants (FLAGs in the logic '1' state) and all get their connections to different memory blocks. In this case the interconnect is used to the fullest of its capacity, with different processors communicating with different memory blocks simultaneously.
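To illustrate the priority selection just described, the following is a hypothetical VHDL sketch of one priority logic block, PRL[j], written directly from the flow chart of Figure 4.6 and the prose above for the prototype sizes (4 processors, c = 2, 4-bit queue depths). It is not the code of Appendix A or B; the packing of the block-address and queue-depth signals into single vectors is an assumption made for the sketch, and the '>=' comparison makes ties go to the higher-numbered processor, as required.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity prl_j is
  generic ( J : integer := 0 );                  -- index of the memory block this arbiter serves
  port (
    ctrl   : in  std_logic_vector(3 downto 0);   -- CTRL[i], request line of each processor
    mbaddr : in  std_logic_vector(7 downto 0);   -- 4 x 2-bit block addresses, processor i in bits 2i+1..2i
    qd     : in  std_logic_vector(15 downto 0);  -- 4 x 4-bit queue depths QDEP[i]
    sel    : out integer range 0 to 3;           -- PROC_SEL[j], drives MUX[j]
    flag   : out std_logic_vector(3 downto 0)    -- FLAG[i] grant bits for this block
  );
end entity prl_j;

architecture behav of prl_j is
begin
  process (ctrl, mbaddr, qd)
    variable max_qd : unsigned(3 downto 0);
    variable winner : integer range 0 to 3;
    variable found  : boolean;
  begin
    max_qd := (others => '0');
    winner := 0;
    found  := false;
    for i in 0 to 3 loop
      -- processor i competes only if it is requesting (CTRL[i] = '1') and its
      -- block address equals this block's index J; '>=' lets a later (higher-
      -- numbered) processor with an equal queue depth displace the current winner
      if ctrl(i) = '1'
         and to_integer(unsigned(mbaddr(2*i+1 downto 2*i))) = J
         and unsigned(qd(4*i+3 downto 4*i)) >= max_qd then
        max_qd := unsigned(qd(4*i+3 downto 4*i));
        winner := i;
        found  := true;
      end if;
    end loop;

    flag <= (others => '0');
    sel  <= winner;
    if found then
      flag(winner) <= '1';   -- grant the deepest-queue requester for block J
    end if;
  end process;
end architecture behav;
```

The process is purely combinational, so the grant is recomputed whenever CTRL, the block addresses, or the queue depths change; a processor therefore loses its connection as soon as another requester reaches a deeper queue depth, matching the behaviour described above.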

Chapter 5: VHDL Design Capture, Simulation, Synthesis, and Implementation Flow

VHDL, the VHSIC (Very High Speed Integrated Circuit) Hardware Description Language, became a key tool for design capture and development of personal computers, cellular telephones, and high-speed data communications devices during the 1990s. VHDL is a product of the Very High Speed Integrated Circuits (VHSIC) program funded by the Department of Defense in the 1970s and 80s. VHDL provides both low-level and high-level language constructs that enable designers to describe small and large circuits and systems. It provides portability of code between simulation and synthesis tools, as well as device-independent design. It also facilitates converting a design from a programmable logic implementation to an Application Specific Integrated Circuit (ASIC) implementation. VHDL is an industry standard for the description, simulation, modeling, and synthesis of digital circuits and systems. The main reason behind the growth in the use of VHDL can be attributed to synthesis, the reduction of a design description to a lower-level circuit representation. A design can be created and captured using VHDL without having to choose a device for implementation; hence it provides a means for device-independent design. Device-independent design and portability allow benchmarking a design using different device architectures and different synthesis tools.

Electronic Design Automation (EDA) Design Tool Flow: The EDA design tool flow is as follows:
Design description (capture) in VHDL
Pre-synthesis simulation for design verification/validation
Synthesis
Post-synthesis simulation for design verification/validation
Implementation (Map, Place and Route)
Post-implementation simulation for design verification/validation
Design optimization
Final implementation to FPGA, CPLD, or ASIC technology

The inputs to the synthesis EDA software tool are the VHDL code, synthesis directives, and the device technology selection. Synthesis directives include different kinds of external as well as internal directives that influence the device implementation process. The required device selection is done during this process.

Field Programmable Gate Array (FPGA): The FPGA architecture is an array of logic cells (blocks) that communicate with one another and with I/O via wires within routing channels. Like a semi-custom gate array, which consists of an array of transistors, an FPGA consists of an array of logic cells [8,10]. An FPGA chip consists of an array of logic blocks and routing channels, as shown in Figure 5.1. Each circuit or system must be mapped into the smallest square FPGA that can accommodate it.

Figure 5.1: FPGA Architecture

Each logic block consists of a number of RAM-based Look-Up Tables (LUTs), used for logic function implementation, and D-type flip-flops, in addition to several multiplexers used for signal routing or logic function implementation. FPGA routing can be segmented and/or un-segmented.

Un-segmented routing is when each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed. The design for this project is implemented (prototyped) on a Spartan XL FPGA chip, which is a Xilinx product [8]. It is a PROM-based FPGA; the one used for this project is an XCS10PC84 from the XL family. The FPGA is implemented with a regular, flexible, programmable architecture of Configurable Logic Blocks (CLBs) and routing channels, surrounded by I/O blocks. The FPGA is provided with a clock rate of 50 MHz, and there are two more configurable clocks on the chip. The Spartan XCS10PC84XL is an 84-pin device with 466 logic cells and is approximately equivalent to 10,000 gates. Typically, the gate range for the XL family chips runs from a few thousand up to roughly 40,000 gates. The XCS10 has a 14x14 CLB matrix with 196 total CLBs. There are 616 flip-flops in the chip, and the maximum number of available I/Os on the chip is 112.

Digilab XL Prototype Board: Digilab XL prototype boards [8] feature a Xilinx Spartan FPGA (either 3.3V or 5V) and all the I/O devices needed to implement a wide variety of circuits. The FPGA on the board can be programmed directly from a PC using an included JTAG cable, or from an on-board PROM. A view of the board is shown in the figure below. The board has one internally generated clock and two configurable clocks.

Figure 5.2: Digilab Spartan XL Prototyping Board

The Digilab prototype board contains 8 LEDs and a seven-segment display, which can be used for monitoring prototype input/output signals of interest when testing the prototype system programmed into the Spartan XL chip on the board.

VHDL Design Capture: Behavioral VHDL description was used in design capture and coding of the crossbar interconnect network design and logic. The structural equivalents of the two behavioral VHDL descriptions are shown in Figures 6.1 and 6.2 of the next chapter. The scenario of a processor trying to access a particular memory block and having its request granted (FLAG = '1') or rejected (FLAG = '0') can be generalized to all processors and all memory blocks. Hence, the main VHDL code has a function flg, an entity main, and a process P1, and it is possible to increase the number of processors, the number of memory blocks (and the address bus), the number of memory locations in each memory block (and the address bus), the width of the data bus, and the width of the processor queue depth bus. Appendix A contains the VHDL code, structured as shown in Figure 6.1, which describes the crossbar interconnect network assuming the number of processors, the number of memory blocks, and the number of addressable locations in each memory block to be 4; the input queue depth of each processor is 4 bits wide. This VHDL code is described considering the crossbar interconnect network and the shared memory as a single functional unit, as shown in Figure 6.1. This code is tested for correct design capture and interconnect network functionality via the pre-synthesis, post-synthesis, and post-implementation VHDL simulation stages and is downloaded onto the Xilinx-based Spartan XL FPGA [7,8] for prototype testing and evaluation. Appendix B contains the VHDL code describing only the crossbar interconnection network as a single functional unit, as depicted in Figure 6.2; this code has more I/O pins than the previous code and was tested via pre-synthesis and post-synthesis VHDL simulation. With the exceptions of the I/O pins and shared memory, the functionality of both VHDL interconnect network descriptions is the same and is identical to the description of the crossbar interconnect network organization and architecture design described in Chapter 4.
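For reference, a hypothetical sketch of what the interface of the Appendix A entity 'main' might look like, based on the bus widths stated above and the port names shown in Figures 6.1 and 6.3; the actual declarations, generics, and port ordering in Appendix A may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity main is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    data_in  : in  std_logic_vector(15 downto 0); -- 4 x 4-bit write data, processor i in bits 4i+3..4i
    addr_bus : in  std_logic_vector(15 downto 0); -- 4 x (b + c) = 4-bit processor addresses
    qdep     : in  std_logic_vector(15 downto 0); -- 4 x 4-bit input queue depths
    ctrl     : in  std_logic_vector(3 downto 0);  -- request line, one per processor
    rw       : in  std_logic_vector(3 downto 0);  -- read/write control, one bit per processor
    flag     : out std_logic_vector(3 downto 0);  -- request grants from the priority logic
    data_out : out std_logic_vector(15 downto 0)  -- 4 x 4-bit read data
  );
end entity main;
```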

Chapter 6: Design Validation via Post-Implementation Simulation Testing

There are two VHDL code descriptions of the interconnect network, one in Appendix A and the other in Appendix B. Both have the same interconnect network functionality but different modular structures. The VHDL code in Appendix A is described considering both the crossbar interconnect network and the shared memory as a single block, as shown in Figure 6.1; it has only a processor-side interface to the interconnect.

Figure 6.1: Block Diagram of the VHDL Code Described in Appendix A (module 'main' with inputs data_in, addr_bus, qdep, ctrl, rw, clk, and rst, outputs flag and data_out, and internal memory blocks MB0 through MB3)

Figure 6.2: Block Diagram of the VHDL Code Described in Appendix B (module 'main_ic' with processor-side signals addr_prc, data_in_prc, data_out_prc, qdep, ctrl, rw, and flag, memory-side signals addr_mem, data_in_mem, data_out_mem, and rw_mem, plus clk and rst)

The VHDL code in Appendix B describes the crossbar interconnect network as a single block, as shown in Figure 6.2, and has two interfaces: the processor-to-interconnect interface and the interconnect-to-memory-blocks interface. In both cases (Appendix A and Appendix B) the VHDL code describes a crossbar interconnect network interfaced to four processors and four memory blocks, with each memory block having 4 addressable locations. The data bus of each processor is taken to be 4 bits wide. Entity 'main' corresponds to the interconnect module described in Appendix A and entity 'main_ic' corresponds to the interconnect described in Appendix B. The VHDL descriptions of the interconnect network in Appendices A and B are written in a generic, parameterized manner such that additional processors and memory modules may be interfaced to the interconnect and the size of the memory modules may be increased. The size of the prototyped interconnect was kept small so that it would fit into the Xilinx Spartan XL chip on the prototype board shown in Figure 5.2; no functionality of the interconnect network was compromised by keeping the prototype to a small size. The VHDL description in Appendix A has a 16-bit data_in input port to the interconnect in which the 4-bit data buses of all 4 processors are included, as shown in Figure 6.3. The same is the case for addr_bus, qdep, ctrl, rw, data_out, and flag. Various scenarios are tested, such as:
1. all the processors writing to different memory locations in different memory blocks,
2. all the processors reading from different memory locations in different memory blocks,
3. two or more processors requesting access to different memory locations within the same memory block, with only the highest-priority processor getting the grant to access the memory block,
4. two or more processors requesting access to the same memory location, with only the highest-priority processor getting the grant to access the memory block,
5. two or more processors requesting access to the same memory block, with some of the processors having the same queue depth and only the highest-priority processor getting the grant to access the memory block,

6. one or more processors in an idle state.

Figure 6.3: Input Data Bus Format (the 16-bit DATA_IN[15:0] bus, composed of the 4-bit DATA fields of the four processors)

These scenarios are tested in the module 'main' during the pre-synthesis, post-synthesis, and post-implementation simulations. This module is also downloaded onto a Xilinx-based Spartan XL FPGA chip and is tested under the various scenarios described above. Figures 6.4, 6.5, and 6.6 show the post-implementation simulation tracers, which show the behaviour of the interconnect network and shared memory, described in the VHDL code in Appendix A, under the different scenarios. The same coding style is followed in the VHDL code for main_ic described in Appendix B. The above-mentioned scenarios of processor requests are also tested on the module 'main_ic' during the pre-synthesis and post-synthesis simulations. Figures 6.7 and 6.8 show the post-synthesis simulation tracers, which show the behaviour of the interconnect network under the different scenarios.

The simulation tracers in Figures 6.4 through 6.6 show the behaviour of the interconnect module output data_out and the shared memory in the different scenarios, which are explained below. A testcase 'top' is developed to generate input stimulus to the module 'main' and to display the control signals, address, and data of all processors on LEDs of the Spartan XL FPGA chip. The scnr, pid, addr, and data signals observed on the simulation tracers are used in developing the testcase 'top', which is described in detail in Chapter 7. In this chapter, the input stimulus and the output (data_out and the data in memory locations) observed on the simulation tracers are discussed. (All data values mentioned in the different scenarios and shown on the simulation tracers are represented in the hexadecimal system.)

Scenario 0:
Input stimulus:
data_in <= x"4321" ;
addr_bus <= x"fb73" ;
qdep <= x"1234" ;
ctrl <= x"f" ;
rw <= x"f" ;

In this case, processor '0' is requesting memory access within memory block '0' (location '3'), processor '1' within memory block '1' (location '3'), processor '2' within memory block '2' (location '3'), and processor '3' within memory block '3' (location '3'). Hence there is no conflict between the processors for memory accesses. Processors '0', '1', '2', and '3' (from the 16-bit 'data_in' bus) are writing '1', '2', '3', and '4' to the corresponding memory locations. All the processors get memory access, and hence the data '1', '2', '3', and '4' is written to memory location '3' in each of the four memory blocks. This data can be observed in the corresponding memory locations on simulation tracer 1 in Figure 6.4, for scnr = 0.

Scenario 1:

44 addr_bus <= x"37bf" ; qdep <= x"1234" ; ctrl <= x"f" ; rw <= x"0" ; In this case, processor '0' is requesting memory access within memory block '3' (location '3'), processor '1' within memory block '2' (location '3'), processor '2' within memory block '1' (location '3') and processor '3' within memory block '0' (location '3'). Hence there is no conflict between the processors for memory accesses. Processors '0', '1', '2' and '3' are reading '4', '3', '2' and '1' from the corresponding memory locations. As all the processors get the memory access and hence the data '4', '3', '2' and '1' is read from to the memory locations 'F', 'B', '7' and '3'. This data can be observed on the data_out bus, on the simulation tracer 1 in Figure 6.4, for scnr = 1. Scenarios '0' and '1' test the case of data exchange between processors. In the scenario '0', processor '0' writes '1' to memory location '3' in memory block '0' and processor '3' writes '4' to memory location '3' in memory block '3'. In scenario '1' processor '0' reads '4' (Data written by processor '3' in scenario '0') from memory location '3' in memory block '3' and processor '3' reads '1' (Data written by processor '0' in scenario '0') from memory location '3' in memory block '0'. Similarly data exchange between processors '1' and '2' is also tested. Scenario 2: Input stimulus: data_in <= x"aaaa" ; addr_bus <= x"cd56" ; qdep <= x"efff" ; ctrl <= x"f" ; rw <= x"5" ; 44


More information

On-Chip Interconnection Networks Low-Power Interconnect

On-Chip Interconnection Networks Low-Power Interconnect On-Chip Interconnection Networks Low-Power Interconnect William J. Dally Computer Systems Laboratory Stanford University ISLPED August 27, 2007 ISLPED: 1 Aug 27, 2007 Outline Demand for On-Chip Networks

More information

9/14/2011 14.9.2011 8:38

9/14/2011 14.9.2011 8:38 Algorithms and Implementation Platforms for Wireless Communications TLT-9706/ TKT-9636 (Seminar Course) BASICS OF FIELD PROGRAMMABLE GATE ARRAYS Waqar Hussain firstname.lastname@tut.fi Department of Computer

More information

How To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source)

How To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source) FPGA IMPLEMENTATION OF 4D-PARITY BASED DATA CODING TECHNIQUE Vijay Tawar 1, Rajani Gupta 2 1 Student, KNPCST, Hoshangabad Road, Misrod, Bhopal, Pin no.462047 2 Head of Department (EC), KNPCST, Hoshangabad

More information

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

Parallel Programming

Parallel Programming Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix

More information

Computer Systems Structure Input/Output

Computer Systems Structure Input/Output Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices

More information

Lecture N -1- PHYS 3330. Microcontrollers

Lecture N -1- PHYS 3330. Microcontrollers Lecture N -1- PHYS 3330 Microcontrollers If you need more than a handful of logic gates to accomplish the task at hand, you likely should use a microcontroller instead of discrete logic gates 1. Microcontrollers

More information

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP SWAPNA S 2013 EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP A

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Switch Fabric Implementation Using Shared Memory

Switch Fabric Implementation Using Shared Memory Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today

More information

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and

More information

CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE

CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE CHAPTER 5 71 FINITE STATE MACHINE FOR LOOKUP ENGINE 5.1 INTRODUCTION Finite State Machines (FSMs) are important components of digital systems. Therefore, techniques for area efficiency and fast implementation

More information

Introduction to Parallel Computing. George Karypis Parallel Programming Platforms

Introduction to Parallel Computing. George Karypis Parallel Programming Platforms Introduction to Parallel Computing George Karypis Parallel Programming Platforms Elements of a Parallel Computer Hardware Multiple Processors Multiple Memories Interconnection Network System Software Parallel

More information

LogiCORE IP AXI Performance Monitor v2.00.a

LogiCORE IP AXI Performance Monitor v2.00.a LogiCORE IP AXI Performance Monitor v2.00.a Product Guide Table of Contents IP Facts Chapter 1: Overview Target Technology................................................................. 9 Applications......................................................................

More information

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March -2013 1 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March -2013 1 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 2, Issue3, March -2013 1 FPGA IMPLEMENTATION OF HARDWARE TASK MANAGEMENT STRATEGIES Assistant professor Sharan Kumar Electronics Department

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)

More information

Chapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe

More information

What is a System on a Chip?

What is a System on a Chip? What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex

More information

Why the Network Matters

Why the Network Matters Week 2, Lecture 2 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. So Far Overview of Multicore Systems Why Memory Matters Memory Architectures Emerging Chip Multiprocessors (CMP) Increasing

More information

Topological Properties

Topological Properties Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any

More information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information Introduction Computer Network. Interconnected collection of autonomous computers that are able to exchange information No master/slave relationship between the computers in the network Data Communications.

More information

DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL

DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL IJVD: 3(1), 2012, pp. 15-20 DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL Suvarna A. Jadhav 1 and U.L. Bombale 2 1,2 Department of Technology Shivaji university, Kolhapur, 1 E-mail: suvarna_jadhav@rediffmail.com

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

Design and Verification of Nine port Network Router

Design and Verification of Nine port Network Router Design and Verification of Nine port Network Router G. Sri Lakshmi 1, A Ganga Mani 2 1 Assistant Professor, Department of Electronics and Communication Engineering, Pragathi Engineering College, Andhra

More information

INTRODUCTION TO DIGITAL SYSTEMS. IMPLEMENTATION: MODULES (ICs) AND NETWORKS IMPLEMENTATION OF ALGORITHMS IN HARDWARE

INTRODUCTION TO DIGITAL SYSTEMS. IMPLEMENTATION: MODULES (ICs) AND NETWORKS IMPLEMENTATION OF ALGORITHMS IN HARDWARE INTRODUCTION TO DIGITAL SYSTEMS 1 DESCRIPTION AND DESIGN OF DIGITAL SYSTEMS FORMAL BASIS: SWITCHING ALGEBRA IMPLEMENTATION: MODULES (ICs) AND NETWORKS IMPLEMENTATION OF ALGORITHMS IN HARDWARE COURSE EMPHASIS:

More information

Implementation Details

Implementation Details LEON3-FT Processor System Scan-I/F FT FT Add-on Add-on 2 2 kbyte kbyte I- I- Cache Cache Scan Scan Test Test UART UART 0 0 UART UART 1 1 Serial 0 Serial 1 EJTAG LEON_3FT LEON_3FT Core Core 8 Reg. Windows

More information

- Nishad Nerurkar. - Aniket Mhatre

- Nishad Nerurkar. - Aniket Mhatre - Nishad Nerurkar - Aniket Mhatre Single Chip Cloud Computer is a project developed by Intel. It was developed by Intel Lab Bangalore, Intel Lab America and Intel Lab Germany. It is part of a larger project,

More information

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394

More information

Read this before starting!

Read this before starting! Points missed: Student's Name: Total score: /100 points East Tennessee State University Department of Computer and Information Sciences CSCI 4717 Computer Architecture TEST 2 for Fall Semester, 2006 Section

More information

Introduction to Digital System Design

Introduction to Digital System Design Introduction to Digital System Design Chapter 1 1 Outline 1. Why Digital? 2. Device Technologies 3. System Representation 4. Abstraction 5. Development Tasks 6. Development Flow Chapter 1 2 1. Why Digital

More information

Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit

Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit 1 Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT OF FOR THE DEGREE IN Bachelor of Technology In Electronics and Communication

More information

Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic

Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic Aims and Objectives E 3.05 Digital System Design Peter Cheung Department of Electrical & Electronic Engineering Imperial College London URL: www.ee.ic.ac.uk/pcheung/ E-mail: p.cheung@ic.ac.uk How to go

More information

Serial Communications

Serial Communications Serial Communications 1 Serial Communication Introduction Serial communication buses Asynchronous and synchronous communication UART block diagram UART clock requirements Programming the UARTs Operation

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

MICROPROCESSOR AND MICROCOMPUTER BASICS

MICROPROCESSOR AND MICROCOMPUTER BASICS Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit

More information

Development of a Research-oriented Wireless System for Human Performance Monitoring

Development of a Research-oriented Wireless System for Human Performance Monitoring Development of a Research-oriented Wireless System for Human Performance Monitoring by Jonathan Hill ECE Dept., Univ. of Hartford jmhill@hartford.edu Majdi Atallah ECE Dept., Univ. of Hartford atallah@hartford.edu

More information

Chapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks

Chapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks Chapter 12: Multiprocessor Architectures Lesson 04: Interconnect Networks Objective To understand different interconnect networks To learn crossbar switch, hypercube, multistage and combining networks

More information

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001 Agenda Introduzione Il mercato Dal circuito integrato al System on a Chip (SoC) La progettazione di un SoC La tecnologia Una fabbrica di circuiti integrati 28 How to handle complexity G The engineering

More information

Chapter 7 Memory and Programmable Logic

Chapter 7 Memory and Programmable Logic NCNU_2013_DD_7_1 Chapter 7 Memory and Programmable Logic 71I 7.1 Introduction ti 7.2 Random Access Memory 7.3 Memory Decoding 7.5 Read Only Memory 7.6 Programmable Logic Array 77P 7.7 Programmable Array

More information

Fondamenti su strumenti di sviluppo per microcontrollori PIC

Fondamenti su strumenti di sviluppo per microcontrollori PIC Fondamenti su strumenti di sviluppo per microcontrollori PIC MPSIM ICE 2000 ICD 2 REAL ICE PICSTART Ad uso interno del corso Elettronica e Telecomunicazioni 1 2 MPLAB SIM /1 MPLAB SIM is a discrete-event

More information

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM 152 APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM A1.1 INTRODUCTION PPATPAN is implemented in a test bed with five Linux system arranged in a multihop topology. The system is implemented

More information

White Paper FPGA Performance Benchmarking Methodology

White Paper FPGA Performance Benchmarking Methodology White Paper Introduction This paper presents a rigorous methodology for benchmarking the capabilities of an FPGA family. The goal of benchmarking is to compare the results for one FPGA family versus another

More information

Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations

Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations Microelectronic System Design Research Group University Kaiserslautern www.eit.uni-kl.de/wehn Breaking the Interleaving Bottleneck in Communication Applications for Efficient SoC Implementations Norbert

More information

NORTHEASTERN UNIVERSITY Graduate School of Engineering

NORTHEASTERN UNIVERSITY Graduate School of Engineering NORTHEASTERN UNIVERSITY Graduate School of Engineering Thesis Title: Enabling Communications Between an FPGA s Embedded Processor and its Reconfigurable Resources Author: Joshua Noseworthy Department:

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Sistemas Digitais I LESI - 2º ano

Sistemas Digitais I LESI - 2º ano Sistemas Digitais I LESI - 2º ano Lesson 6 - Combinational Design Practices Prof. João Miguel Fernandes (miguel@di.uminho.pt) Dept. Informática UNIVERSIDADE DO MINHO ESCOLA DE ENGENHARIA - PLDs (1) - The

More information

Chapter 13: Verification

Chapter 13: Verification Chapter 13: Verification Prof. Ming-Bo Lin Department of Electronic Engineering National Taiwan University of Science and Technology Digital System Designs and Practices Using Verilog HDL and FPGAs @ 2008-2010,

More information

Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1

Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1 (DSF) Quartus II Stand: Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de Quartus II 1 Quartus II Software Design Series : Foundation 2007 Altera

More information

International Workshop on Field Programmable Logic and Applications, FPL '99

International Workshop on Field Programmable Logic and Applications, FPL '99 International Workshop on Field Programmable Logic and Applications, FPL '99 DRIVE: An Interpretive Simulation and Visualization Environment for Dynamically Reconægurable Systems? Kiran Bondalapati and

More information

Life Cycle of a Memory Request. Ring Example: 2 requests for lock 17

Life Cycle of a Memory Request. Ring Example: 2 requests for lock 17 Life Cycle of a Memory Request (1) Use AQR or AQW to place address in AQ (2) If A[31]==0, check for hit in DCache Ring (3) Read Hit: place cache word in RQ; Write Hit: replace cache word with WQ RDDest/RDreturn

More information

Hardware and Software

Hardware and Software Hardware and Software 1 Hardware and Software: A complete design Hardware and software support each other Sometimes it is necessary to shift functions from software to hardware or the other way around

More information

SYSTEM-ON-PROGRAMMABLE-CHIP DESIGN USING A UNIFIED DEVELOPMENT ENVIRONMENT. Nicholas Wieder

SYSTEM-ON-PROGRAMMABLE-CHIP DESIGN USING A UNIFIED DEVELOPMENT ENVIRONMENT. Nicholas Wieder SYSTEM-ON-PROGRAMMABLE-CHIP DESIGN USING A UNIFIED DEVELOPMENT ENVIRONMENT by Nicholas Wieder A thesis submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of

More information

Rapid System Prototyping with FPGAs

Rapid System Prototyping with FPGAs Rapid System Prototyping with FPGAs By R.C. Coferand Benjamin F. Harding AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Newnes is an imprint of

More information

PowerPC Microprocessor Clock Modes

PowerPC Microprocessor Clock Modes nc. Freescale Semiconductor AN1269 (Freescale Order Number) 1/96 Application Note PowerPC Microprocessor Clock Modes The PowerPC microprocessors offer customers numerous clocking options. An internal phase-lock

More information

Digital Systems Design! Lecture 1 - Introduction!!

Digital Systems Design! Lecture 1 - Introduction!! ECE 3401! Digital Systems Design! Lecture 1 - Introduction!! Course Basics Classes: Tu/Th 11-12:15, ITE 127 Instructor Mohammad Tehranipoor Office hours: T 1-2pm, or upon appointments @ ITE 441 Email:

More information

Behavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages

Behavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages Behavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Computer

More information

Programmable Logic IP Cores in SoC Design: Opportunities and Challenges

Programmable Logic IP Cores in SoC Design: Opportunities and Challenges Programmable Logic IP Cores in SoC Design: Opportunities and Challenges Steven J.E. Wilton and Resve Saleh Department of Electrical and Computer Engineering University of British Columbia Vancouver, B.C.,

More information

OpenSPARC T1 Processor

OpenSPARC T1 Processor OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware

More information

CSE2102 Digital Design II - Topics CSE2102 - Digital Design II

CSE2102 Digital Design II - Topics CSE2102 - Digital Design II CSE2102 Digital Design II - Topics CSE2102 - Digital Design II 6 - Microprocessor Interfacing - Memory and Peripheral Dr. Tim Ferguson, Monash University. AUSTRALIA. Tel: +61-3-99053227 FAX: +61-3-99053574

More information

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Shaomeng Li, Jim Tørresen, Oddvar Søråsen Department of Informatics University of Oslo N-0316 Oslo, Norway {shaomenl, jimtoer,

More information

Hardware Implementations of RSA Using Fast Montgomery Multiplications. ECE 645 Prof. Gaj Mike Koontz and Ryon Sumner

Hardware Implementations of RSA Using Fast Montgomery Multiplications. ECE 645 Prof. Gaj Mike Koontz and Ryon Sumner Hardware Implementations of RSA Using Fast Montgomery Multiplications ECE 645 Prof. Gaj Mike Koontz and Ryon Sumner Overview Introduction Functional Specifications Implemented Design and Optimizations

More information

Computer Organization & Architecture Lecture #19

Computer Organization & Architecture Lecture #19 Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of

More information

AQA GCSE in Computer Science Computer Science Microsoft IT Academy Mapping

AQA GCSE in Computer Science Computer Science Microsoft IT Academy Mapping AQA GCSE in Computer Science Computer Science Microsoft IT Academy Mapping 3.1.1 Constants, variables and data types Understand what is mean by terms data and information Be able to describe the difference

More information

Redundancy in enterprise storage networks using dual-domain SAS configurations

Redundancy in enterprise storage networks using dual-domain SAS configurations Redundancy in enterprise storage networks using dual-domain SAS configurations technology brief Abstract... 2 Introduction... 2 Why dual-domain SAS is important... 2 Single SAS domain... 3 Dual-domain

More information

Lab #5: Design Example: Keypad Scanner and Encoder - Part 1 (120 pts)

Lab #5: Design Example: Keypad Scanner and Encoder - Part 1 (120 pts) Dr. Greg Tumbush, gtumbush@uccs.edu Lab #5: Design Example: Keypad Scanner and Encoder - Part 1 (120 pts) Objective The objective of lab assignments 5 through 9 are to systematically design and implement

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level System: User s View System Components: High Level View Input Output 1 System: Motherboard Level 2 Components: Interconnection I/O MEMORY 3 4 Organization Registers ALU CU 5 6 1 Input/Output I/O MEMORY

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Contents. System Development Models and Methods. Design Abstraction and Views. Synthesis. Control/Data-Flow Models. System Synthesis Models

Contents. System Development Models and Methods. Design Abstraction and Views. Synthesis. Control/Data-Flow Models. System Synthesis Models System Development Models and Methods Dipl.-Inf. Mirko Caspar Version: 10.02.L.r-1.0-100929 Contents HW/SW Codesign Process Design Abstraction and Views Synthesis Control/Data-Flow Models System Synthesis

More information

AC 2007-2485: PRACTICAL DESIGN PROJECTS UTILIZING COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD)

AC 2007-2485: PRACTICAL DESIGN PROJECTS UTILIZING COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD) AC 2007-2485: PRACTICAL DESIGN PROJECTS UTILIZING COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD) Samuel Lakeou, University of the District of Columbia Samuel Lakeou received a BSEE (1974) and a MSEE (1976)

More information

Verification of Triple Modular Redundancy (TMR) Insertion for Reliable and Trusted Systems

Verification of Triple Modular Redundancy (TMR) Insertion for Reliable and Trusted Systems Verification of Triple Modular Redundancy (TMR) Insertion for Reliable and Trusted Systems Melanie Berg 1, Kenneth LaBel 2 1.AS&D in support of NASA/GSFC Melanie.D.Berg@NASA.gov 2. NASA/GSFC Kenneth.A.LaBel@NASA.gov

More information

LatticeECP3 High-Speed I/O Interface

LatticeECP3 High-Speed I/O Interface April 2013 Introduction Technical Note TN1180 LatticeECP3 devices support high-speed I/O interfaces, including Double Data Rate (DDR) and Single Data Rate (SDR) interfaces, using the logic built into the

More information

IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER NETWORKS. Received May 2010; accepted July 2010

IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER NETWORKS. Received May 2010; accepted July 2010 ICIC Express Letters Part B: Applications ICIC International c 2010 ISSN 2185-2766 Volume 1, Number 1, September 2010 pp. 71 76 IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER

More information