Interconnection Networks
Z. Jerry Shi, Assistant Professor of Computer Science and Engineering, University of Connecticut
* Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 *

Three questions about interconnection networks
- What is an interconnection network?
  - A programmable system that transports data between terminals
- Where do you find interconnection networks?
  - Used in almost all digital systems that are large enough to have two components to connect
  - The most common applications are in computer systems and communication switches
  - Connections between processors and memories, I/O devices and I/O controllers
  - Simple bus systems are used in many systems, but high processor performance demands fast interconnection networks
- Why are interconnection networks important?
  - They are the limiting factor in the performance of many systems
Architecture of Interconnection Networks
- TOPOLOGY: how to connect the nodes (processors, memories, router line cards, SoC modules)
- ROUTING AND DEADLOCKS: which path should a message take?
- FLOW CONTROL: how is the message actually forwarded from source to destination?
- ROUTER MICROARCHITECTURE: how to build the routers
- LINK ARCHITECTURE: how to build the links
- NETWORK INTERFACE: how do nodes talk to the network?

Metrics in Interconnection Networks
- Performance
  - Latency: how fast data can be transported through the network
  - Throughput: how many pieces of data (messages) can be transported in each time unit
- Power
- Area
- Cost
- Fault tolerance
- Quality of service
Topology
- An interconnection network consists of a set of shared router nodes and channels
- Topology refers to the arrangement of these nodes and channels
- Analogous to a roadmap
  - Channels (roads), packets (cars), router nodes (intersections)

Topological Properties
- Routing distance: number of links on a route
- Average distance
- Diameter: maximum routing distance
- Bisection bandwidth: the bandwidth crossing a minimal cut that divides the network in half
  - A network is partitioned by a set of links if their removal disconnects the graph
- Degree: number of communication links attached to a node
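The properties listed above can be measured directly on a small graph. The following is a minimal sketch, assuming the topology is given as an adjacency dictionary and that every link counts as one hop; the helper names (`hop_counts`, `topology_properties`, `ring`, `linear_array`) are illustrative and do not come from the slides.

```python
from collections import deque

def hop_counts(adj, src):
    """Breadth-first search: routing distance (in links) from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def topology_properties(adj):
    """Degree, diameter, and average distance of a connected topology."""
    all_d = [d for s in adj for t, d in hop_counts(adj, s).items() if t != s]
    return {
        "degree": max(len(nbrs) for nbrs in adj.values()),
        "diameter": max(all_d),
        "average_distance": sum(all_d) / len(all_d),
    }

def ring(n):
    """Hypothetical helper: bidirectional ring of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def linear_array(n):
    """Hypothetical helper: bidirectional linear array of n nodes."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

if __name__ == "__main__":
    print(topology_properties(ring(8)))
    print(topology_properties(linear_array(8)))
```

Averaging is done over all ordered pairs of distinct nodes, which is one common convention; other conventions (for example, including a node's distance to itself) give slightly different averages.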
Linear Arrays and Rings
- Linear array (nodes 0, 1, ..., N-2, N-1)
  - Diameter?
  - Average distance?
  - Bisection bandwidth?
  - Route A -> B given by relative address R = B - A
- Ring?
- Examples: Fiber Distributed Data Interface (FDDI), Scalable Coherent Interface (SCI), Fibre Channel Arbitrated Loop

Multidimensional Meshes, Tori, and Hypercubes
- d-dimensional k-ary torus (or k-ary d-cube)
  - $N = k^d$
  - Each dimension has k nodes, so a node can be located with a vector of d coordinates
  - A k-ary d-cube can be constructed from k k-ary (d-1)-cubes
  - The radix in each dimension may be different
    - For example, a 2,3,4-ary 3-cube
- d-dimensional k-ary mesh: similar to a torus
  - Cut the channels between the first and last node in every dimension
- Hypercube: binary d-cube
  - The radix in every dimension is 2, so each coordinate is either 0 or 1
Hypercubes
- Also called binary n-cubes
- Number of nodes $N = 2^n$
- Distance: $O(\log N)$ hops
- Good bisection bandwidth
- Complexity
  - Out-degree is $n = \log N$
- [Figure: hypercubes of dimension 0-D, 1-D, 2-D, 3-D, 4-D, 5-D]

Real World 2D mesh
- 1824-node Paragon: 16 x 114 mesh
Properties
- Routing
  - Relative distance: $R = (b_{d-1} - a_{d-1}, \ldots, b_0 - a_0)$
  - Traverse $r_i = b_i - a_i$ hops in each dimension
  - Dimension-order routing
- Degree?
- Diameter?
- Average distance: $dk/4$ for cube
- Bisection bandwidth? $k^{d-1}$ bidirectional links
- Physical layout?
  - 2D in $O(N)$ space
  - Higher dimension?

Embeddings in two dimensions
- [Figure: 6 x 3 x 2 network embedded in two dimensions]
- Embed multiple logical dimensions in one physical dimension using long wires
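The relative-address and dimension-order routing idea from the Properties slide can be sketched as follows. This is an illustrative sketch, not code from the slides: it assumes a torus (k-ary d-cube) in which each dimension wraps around, so the shorter of the two directions is taken, and it assumes coordinates are stored as tuples indexed from dimension 0. The function names are hypothetical.

```python
def relative_address(a, b, k):
    """Signed hop count r_i per dimension from node a to node b in a k-ary d-cube."""
    r = []
    for ai, bi in zip(a, b):
        delta = (bi - ai) % k      # hops needed in the + direction
        if delta > k // 2:         # going the other way round the ring is shorter
            delta -= k
        r.append(delta)
    return r

def dimension_order_route(a, b, k):
    """Nodes visited when each dimension is corrected fully, in order 0, 1, ..."""
    cur = list(a)
    path = [tuple(cur)]
    for dim, hops in enumerate(relative_address(a, b, k)):
        step = 1 if hops > 0 else -1
        for _ in range(abs(hops)):
            cur[dim] = (cur[dim] + step) % k
            path.append(tuple(cur))
    return path

if __name__ == "__main__":
    # 4-ary 2-cube: dimension 0 is one hop "backwards" through the wraparound link
    print(relative_address((0, 0), (3, 1), 4))
    print(dimension_order_route((0, 0), (3, 1), 4))
```

For a mesh (no wraparound links) the relative address reduces to the plain difference $b_i - a_i$ given on the slide.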
Topology Summary

Topology     | Degree       | Diameter          | Ave Dist          | Bisection     | D (D ave) at P=1024
1D Array     | 2            | $N-1$             | $N/3$             | 1             | huge
1D Ring      | 2            | $N/2$             | $N/4$             | 2             |
2D Mesh      | 4            | $2(\sqrt{N}-1)$   | $(2/3)\sqrt{N}$   | $\sqrt{N}$    | 63 (21)
2D Torus     | 4            | $\sqrt{N}$        | $(1/2)\sqrt{N}$   | $2\sqrt{N}$   | 32 (16)
k-ary n-cube | $2n$         | $nk/2$            | $nk/4$            | $nk/4$        | 15
Hypercube    | $n = \log N$ | $n$               | $n/2$             | $N/2$         | 10 (5)

- All have some bad permutations
  - Many popular permutations are very bad for meshes (transpose)
  - Randomness in wiring or routing makes it hard to find a bad one!

Trees
- Diameter and average distance are logarithmic
  - k-ary tree, height $d = \log_k N$
  - Address specified by a d-vector of radix-k coordinates describing the path down from the root
- Fixed degree
- Route up to the common ancestor and then down
  - $R = B \oplus A$
  - Let i be the position of the most significant 1 in R; route up i+1 levels
  - Route down in the direction given by the low i+1 bits of B
- H-tree space is $O(N)$ with $O(\sqrt{N})$ long wires
- Bisection BW?
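The tree routing rule (route up to the common ancestor, then down using the low bits of B) can be illustrated with a short sketch. It assumes a complete binary tree (k = 2) whose leaves are numbered left to right, and that a 0 bit means "left child" and a 1 bit means "right child"; these conventions and the function name `tree_route` are assumptions made for illustration.

```python
def tree_route(a, b):
    """Return (levels_up, down_bits) for a route between leaves a and b.

    levels_up is i+1, where i is the position of the most significant 1
    in R = a XOR b. down_bits are the low i+1 bits of b, most significant
    first, read as 0 = left child, 1 = right child (assumed convention).
    """
    r = a ^ b
    if r == 0:
        return 0, []               # source and destination are the same leaf
    i = r.bit_length() - 1         # position of the most significant 1 in R
    down_bits = [(b >> j) & 1 for j in range(i, -1, -1)]
    return i + 1, down_bits

if __name__ == "__main__":
    # Leaves 5 (0101) and 6 (0110) differ first at bit 1: go up 2 levels,
    # then down right (1) and left (0) to reach leaf 6.
    print(tree_route(0b0101, 0b0110))
```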
Fat-Trees
- Fatter links (really more of them) as you go up, so bisection BW scales with N

Butterflies
- Tree with lots of roots!
- $N \log N$ (actually $N/2 \times \log N$)
- Exactly one route from any source to any destination
- $R = A \oplus B$; at level i use the straight edge if $r_i = 0$, otherwise the cross edge
- Bisection: $N/2$
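The butterfly routing rule can be traced in a few lines. This sketch assumes that level i of the butterfly corrects bit i of the row address (least significant bit first); real butterflies may wire the levels in the opposite order, so treat the bit ordering and the name `butterfly_route` as assumptions.

```python
def butterfly_route(a, b, levels):
    """Edge taken at each level when routing from input row a to output row b."""
    r = a ^ b                      # bits where source and destination rows differ
    row = a
    edges = []
    for i in range(levels):
        if (r >> i) & 1:
            row ^= 1 << i          # cross edge flips bit i of the current row
            edges.append("cross")
        else:
            edges.append("straight")
    return edges, row

if __name__ == "__main__":
    edges, final_row = butterfly_route(0b101, 0b110, levels=3)
    print(edges, final_row)        # final_row equals the destination row
```

Because the edge at every level is fixed by R, there is exactly one route per source and destination pair, which matches the slide.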
Benes network and Fat Tree
- [Figure: INPUT -> butterfly network -> inverse butterfly network -> OUTPUT]
- A back-to-back butterfly can route all permutations
  - Offline
- What if you just pick a random midpoint?

Relationship of Butterflies to Hypercubes
- Wiring is isomorphic
- Except that the butterfly always takes log n steps
- Many other types of multistage interconnection networks
How Many Dimensions?
- n = 2 or n = 3
  - Short wires, easy to build
  - Many hops, low bisection bandwidth
  - Requires traffic locality
- n >= 4
  - Harder to build, more wires, longer average length
  - Fewer hops, better bisection bandwidth
  - Can handle non-local traffic
- k-ary d-cubes provide a consistent framework for comparison
  - $N = k^d$
  - Scale dimension (d) or nodes per dimension (k)

Real Machines
- Wide links, smaller routing delay
- Tremendous variation
Routing

Messages, Packets, Flits, Phits
- A flit (flow control digit) is the basic unit of bandwidth and storage allocation
- A phit (physical transfer digit) is the unit of information that is transferred across a channel in a single clock cycle
Typical Packet Format
- [Figure: packet made of a header (routing and control), data payload, and trailer (error code); each field is a sequence of digital symbols transmitted over a channel]
- A packet consists of different types of flits
  - Head, body, or tail
  - The head flit carries the packet's routing information
  - A packet has a format of HB*T*

Routing
- The routing algorithm determines
  - which of the possible paths are used as routes
  - how the route is determined
  - $R: N \times N \to C$, which at each switch maps the destination node to the next channel on the route
- Issues:
  - Routing mechanism
    - arithmetic
    - source-based port select
    - table driven
    - general computation
  - Properties of the routes
  - Deadlock freedom
Taxonomy of Routing Algorithms
- Deterministic
  - Route determined by (source, dest), not intermediate state (i.e. traffic)
  - Given two nodes x and y, the path $R_{x,y}$ is always the same
- Oblivious
  - Choose a route without considering any information about the network's current state
  - Example: a random algorithm
- Adaptive
  - Route influenced by traffic along the way
- Minimal
  - Only selects shortest paths

Example: routing on a ring
- Greedy
  - Always send the packet in the shortest direction
- Uniform random
  - Randomly pick a direction, with equal probability for either direction
- Weighted random
  - Randomly pick a direction, but weight the short direction with $1 - d/n$, where d is the length of the shortest path
- Adaptive
  - Send the packet in the direction for which the local channel has the lowest load
  - Record how many packets a channel has transmitted over the last T slots
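The three oblivious choices for the ring example can be written down compactly. The sketch below is illustrative only: it assumes an n-node bidirectional ring with nodes numbered 0 to n-1, returns +1 for the increasing direction and -1 for the decreasing direction, and takes n in the weight $1 - d/n$ to be the number of nodes. The function names and the `rng` parameter are hypothetical.

```python
import random

def ring_distances(src, dst, n):
    """Hop counts from src to dst in the + direction and in the - direction."""
    plus = (dst - src) % n
    return plus, n - plus

def greedy(src, dst, n):
    """Always pick the shortest direction (+1 on a tie)."""
    plus, minus = ring_distances(src, dst, n)
    return 1 if plus <= minus else -1

def uniform_random(src, dst, n, rng=random):
    """Pick either direction with equal probability."""
    return 1 if rng.random() < 0.5 else -1

def weighted_random(src, dst, n, rng=random):
    """Pick the short direction with probability 1 - d/n, else the long one."""
    d = min(ring_distances(src, dst, n))   # length of the shortest path
    short = greedy(src, dst, n)
    return short if rng.random() < 1 - d / n else -short

if __name__ == "__main__":
    rng = random.Random(0)                 # seeded so runs are repeatable
    picks = [weighted_random(0, 6, 8, rng) for _ in range(1000)]
    print("fraction sent the short way:", picks.count(-1) / len(picks))
```

The adaptive variant is not shown because it needs per-channel load counters, which the slide describes only in outline.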
Routing relation
- $R: N \times N \to \rho(P)$
  - The output of the relation is an entire path
  - There may be multiple paths
- $R: N \times N \to \rho(C)$
  - Routing is incremental
  - The output only indicates the channels that the packet may take at the current node
- $R: C \times N \to \rho(C)$
  - Similar to the second method
  - Uses the current channel instead of the current node

Adaptive Routing
- $R: C \times N \times \Sigma \to C$
- Essential for fault tolerance
  - At least multipath
- Can improve utilization of the network
- Simple deterministic algorithms easily run into bad permutations
- Fully/partially adaptive, minimal/non-minimal
- Can introduce complexity or anomalies
- A little adaptation goes a long way!
Routing Mechanism
- Need to select an output port for each input packet in a few cycles
- Simple arithmetic in regular topologies
  - Example: x, y routing in a grid
    - west (-x): Δx < 0
    - east (+x): Δx > 0
    - south (-y): Δx = 0, Δy < 0
    - north (+y): Δx = 0, Δy > 0
    - processor: Δx = 0, Δy = 0
- Reduce the relative address of each dimension in order
  - Dimension-order routing in k-ary d-cubes
  - Calculate preferred directions, then adjust one dimension at a time
  - Used in the Cray T3D, which connects up to 2048 DEC Alpha processing elements

Routing Mechanism (cont)
- Source-based
  - [Figure: message header carrying port selects P3, P2, P1, P0]
  - Mainly used in deterministic and oblivious routing
  - All routing decisions are made at the source, and the message header carries a series of port selects
  - Port selects are used and stripped en route
  - Fast, simple, and scalable
  - CS-2, Myrinet, MIT Arctic
- Node-table
  - More appropriate for adaptive routing
  - Decide the output channel based on the incoming channel and destination
  - Can redirect traffic if one output link is congested or fails
  - ATM, HIPPI
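The x, y routing example amounts to a five-way comparison on the remaining offset. A minimal sketch, assuming the offsets are destination minus current position and that east/north are the increasing x/y directions (the function names are illustrative):

```python
def select_port(dx, dy):
    """Output port for a packet with remaining offset (dx, dy): x first, then y."""
    if dx < 0:
        return "west"
    if dx > 0:
        return "east"
    if dy < 0:
        return "south"
    if dy > 0:
        return "north"
    return "processor"             # dx == 0 and dy == 0: deliver locally

def xy_route(src, dst):
    """Sequence of ports chosen hop by hop from src to dst on a 2D grid."""
    moves = {"west": (-1, 0), "east": (1, 0), "south": (0, -1), "north": (0, 1)}
    x, y = src
    ports = []
    while True:
        port = select_port(dst[0] - x, dst[1] - y)
        ports.append(port)
        if port == "processor":
            return ports
        x, y = x + moves[port][0], y + moves[port][1]

if __name__ == "__main__":
    print(xy_route((0, 0), (2, -1)))
```

Because the x offset is always reduced to zero before y is touched, a packet never turns from a y channel back onto an x channel.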
Deadlock
- How can it arise?
  - Necessary conditions:
    - Shared resource (buffers or channels)
    - Incrementally allocated
    - Non-preemptible
  - Think of a channel as a shared resource that is acquired incrementally
    - Source buffer, then destination buffer
    - Channels along a route
- How do you avoid it?
  - Deadlock avoidance: guarantee no deadlock
    - Constrain how channel resources are allocated. Example: dimension order
  - Deadlock recovery: deadlock is detected and corrected
- How do you prove that a routing algorithm is deadlock free?

Deadlock Freedom
- Resources are logically associated with channels
- Messages introduce dependences between resources as they move forward
- Need to articulate the possible dependences that can arise between channels
- Show that there are no cycles in the Channel Dependence Graph
  - Find a numbering of channel resources such that every legal route follows a monotonic sequence
  - => No traffic pattern can lead to deadlock
- The network need not be acyclic, only the channel dependence graph
- All deadlock avoidance techniques use some form of resource ordering
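Checking a channel dependence graph for cycles is an ordinary graph search. The sketch below assumes the graph is supplied as a dictionary mapping each channel to the channels a packet may request next while holding it; the channel names and the function `has_cycle` are hypothetical, and the recursive search is only suited to small graphs.

```python
def has_cycle(deps):
    """Depth-first search for a cycle in a channel dependence graph."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}

    def visit(channel):
        colour[channel] = GREY
        for nxt in deps.get(channel, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:      # back edge: nxt is already on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(colour.get(c, WHITE) == WHITE and visit(c) for c in list(deps))

if __name__ == "__main__":
    # Unidirectional 4-node ring: each channel depends on the next one round the ring
    ring_deps = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
    # Same ring with the wraparound dependence forbidden by the routing function
    ordered_deps = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
    print(has_cycle(ring_deps), has_cycle(ordered_deps))
```

An acyclic result corresponds to the existence of a channel numbering along which every legal route is monotonic, as stated on the slide.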
Deadlock Recovery
- Detection
  - Determining exactly whether the network is deadlocked is difficult
  - Most practical detection mechanisms are conservative
    - May have false positives
  - Timeout counters
    - Reset when making progress
- Recovery
  - Regressive: packets or connections that are deadlocked are removed
  - Progressive: keep the packets or connections in an escape buffer
    - Potentially has better performance
    - Routing using the escape buffer is designed to be deadlock-free

Flow Control
- Flow control determines how a network's resources are allocated
  - Resources: channel bandwidth, buffer capacity, etc.
  - Good flow control achieves a high fraction of ideal bandwidth and delivers packets with low, predictable latency
- Can also be viewed as a problem of contention resolution
  - The problem exists because we are sharing resources
  - Processor
    - Resources in a processor: ALUs, registers
    - How to run as many operations as possible, optimizing use of ALUs and registers
  - Network
    - Resources in a network: buffers, links
    - How to forward as many messages as possible, optimizing use of buffers and links
Contention
- Two packets trying to use the same link at the same time
  - Limited buffering
  - Drop?

Flow control protocols
- Bufferless
  - Dropping
  - Misrouting
  - Circuit switching
    - Header traverses the network and reserves resources
    - Data are then sent through the reserved path
- Buffered
  - Store-and-forward
  - Virtual cut-through
  - Wormhole
  - Virtual-channel
Simplest Flow Control: Dropping
- If two things arrive and I don't have resources, drop one of them
- Flow control protocol on the Internet
- Not used in interconnection networks. Why?

Time-space Diagram: Dropping
Next Simplest Flow Control: Misrouting
- If only one message can enter the network at each node, and one message can exit the network at each node, the network can never be congested. Right?
- Philosophy behind misrouting: intentionally route away from congestion
- No need for buffering

Circuit Switching
- Bufferless
- A probe sets up a path through the network
  - If the request flit is blocked, it is held in place (not dropped)
- Reserve all links
- Data are then sent through the links
- Simple router
  - Similar to the dropping case
  - Needs only one register to buffer the header
- When is this good? When is it not?
Time-space Diagram: Circuit Switching

Store-and-Forward
- Buffered flow control: flits can be stored in routing nodes
  - Flits arriving on cycle i do not have to leave on cycle i + 1
- Make intermediate stops and wait until the whole packet has arrived before moving on
- Two resources must be allocated to the packet
  - A packet-sized buffer at the other side of the channel
  - Exclusive use of the channel
- Other packets can use intermediate links
- Pros and cons?
Time-space Diagram: Store-and-Forward
- With store-and-forward, packets do not have to be divided into flits

Virtual Cut-through
- Why wait until the entire message has arrived at each intermediate stop?
- The head of the message can dash off first
  - Of course, the two resources must still be allocated
- When the head gets blocked, the whole message gets blocked at the intermediate node
Time-space Diagram: Virtual Cut-through

Wormhole
- Similar to virtual cut-through, but channel and buffers are allocated to flits rather than packets
- When the head flit arrives, it must acquire three resources before being forwarded to the next node
  - A virtual channel for the packet
    - State bits indicating the output channel, the state of the virtual channel (idle, waiting for resources, or active), and other information
  - One flit buffer
  - One flit of channel bandwidth
- Body flits do not need to acquire virtual channels
  - But they still need to be allocated a flit buffer and channel bandwidth
- The tail flit releases the virtual channel
- The channel is owned by a packet, but buffers are allocated on a flit-by-flit basis
  - When a flit cannot acquire a buffer, the channel goes idle
Time-space Diagram: Wormhole

Virtual Channel
- Associates several virtual channels with a single physical channel
  - When a packet blocks, instead of holding on to physical links so others cannot use them, it holds on to virtual links
- The head flit needs three resources to advance
  - A virtual channel, a downstream flit buffer, and channel bandwidth
- Subsequent body flits use the same virtual channel
  - But they still need to be allocated a flit buffer and channel bandwidth
  - However, these flits are not guaranteed access to the channel bandwidth
- Lanes on the highway
  - You have to compete with other cars
Time-space diagram: virtual-channel
- Arbitration may not be fair
- It can be winner-take-all

Link-level flow control
- Given that you can't drop packets, how do you manage the buffers? When can you send stuff forward, and when not?
- Three techniques
  - Credit-based: the upstream router keeps a count of the number of free flit buffers in each virtual channel downstream
  - On/off: a single bit indicates whether the upstream node can send or not
  - Ack/nack: the upstream node optimistically sends flits when they are available, and the downstream node sends back an ack or nack
- Flit-reservation
  - Reduces buffer turnaround time
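The credit-based technique comes down to one counter per downstream virtual channel. The sketch below is an illustration under stated assumptions: the class name `CreditCounter` and its methods are hypothetical, and wire and pipeline delays between sending a flit and receiving its credit are ignored.

```python
class CreditCounter:
    """Upstream view of one downstream virtual channel's free flit buffers."""

    def __init__(self, downstream_buffers):
        self.credits = downstream_buffers    # every buffer starts out free

    def can_send(self):
        return self.credits > 0

    def send_flit(self):
        """Consume one credit when a flit is forwarded downstream."""
        if not self.can_send():
            raise RuntimeError("no free downstream buffer: flit must wait")
        self.credits -= 1

    def receive_credit(self):
        """Downstream freed a buffer (a flit left it) and returned a credit."""
        self.credits += 1

if __name__ == "__main__":
    vc = CreditCounter(downstream_buffers=2)
    vc.send_flit()
    vc.send_flit()
    print("can send a third flit?", vc.can_send())
    vc.receive_credit()
    print("after one credit returns:", vc.can_send())
```

Because a flit is sent only when a credit is held, the downstream buffer can never overflow, so no flit has to be dropped.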
Link-level flow control
- Short links
  - [Figure: source and destination, each with an F/E flag, connected by Req, Ready/Ack, and Data lines]
- Long links
  - Several flits on the wire

Buffer turnaround time
- A flit leaves the downstream node
- A credit is sent to the current node
- The credit is processed and a flit is sent to the downstream node
- The downstream node receives the flit
- [Figure: timeline of buffer hold, credit delay, pipeline delay, wire delay, buffer use, and release; the full span is the buffer turnaround time]
Flit-reservation flow control
- Hides the overhead by separating the control and data networks
  - Control flits race ahead to reserve network resources
  - Can also streamline the delivery of credits
- Allows zero buffer turnaround time
- Not always possible to reserve resources
- The control head flit is similar to a typical head flit, but with an additional field that shows the time offset to the first data flit
  - The routing node knows when the data flit will arrive and starts to prepare a buffer now

Router (switch) microarchitecture: What's in a router?
- It's a system as well
  - Logic
    - State machines, arbiters, allocators
    - Control the movement through the router
    - Idle, routing, waiting for resources, active
  - Memory
    - Buffers
    - Store flits before forwarding them
    - SRAMs, registers, processor memory
  - Communication
    - Switches
    - Transfer flits from input to output ports
    - Crossbars, multiple crossbars, fully-connected, bus
Typical Router Design
- [Figure: input ports (receiver, input buffer), crossbar, output ports (output buffer, transmitter), and control (routing, scheduling)]

Router Components
- Output ports
  - Transmitter (typically drives clock and data)
- Input ports
  - Synchronizer aligns the data signal with the local clock domain
  - Essentially a FIFO buffer
- Crossbar
  - Connects each input to any output
  - Degree limited by area or pinout
- Buffering
- Control logic
  - Complexity depends on the routing logic and scheduling algorithm
  - Determine the output port for each incoming packet
  - Arbitrate among inputs directed at the same output
Buffer Organizations
- Input buffers
  - Buffering at each input port; stores flits until they get to leave through the switch to the next hop
- Central buffers
  - A central memory shared among every port
  - Functions as the switch as well
- Output buffers
  - Flits flow right through to the output port
  - Highest throughput, no head-of-line blocking

Input Buffered Router
- [Figure: input ports with routing logic R0 to R3, crossbar, output ports, and scheduling]
- Independent routing logic per input
  - FSM
- Scheduler logic arbitrates each output
  - Priority, FIFO, or random
- Head-of-line blocking problem
  - If an earlier flit is missed, the later flits hold the buffer
Output Buffered Router
- [Figure: input ports with routing logic R0 to R3, output ports with buffers, and control]
- Commit to output: limited adaptivity
- Switch has to handle input line speeds

Virtual-channel Router
Virtual-channel Router
- Packet: head, body, tail flits
  - Head
    - Routing: determine the output port
    - Request and arbitrate for next VC
    - Request and arbitrate for switch path
    - Request and arbitrate for buffer
    - Traverse switch
  - Body
    - Request and arbitrate for switch path
    - Request and arbitrate for buffer
    - Traverse switch
  - Tail
    - Request and arbitrate for switch path
    - Request and arbitrate for buffer
    - Traverse switch
    - Release switch path

State machines
- Control the state of the router
- Each input channel
  - G: Global state: is it idle? routing? waiting for VC? buffer?
  - R: Output port
    - Filled by routing
  - O: Output VC
    - Filled by VC allocation
  - P: Head and tail queue pointers
  - C: Credits
- Each output channel
  - G: Global state: idle? active? waiting for credits?
  - I: Input VC that is sending flits to this output port
  - C: Credit count
Pipelining of a typical virtual channel router
- [Pipeline diagram: the head flit passes through RC, VA, SA, ST; body flit 1, body flit 2, and the tail flit pass through SA, ST only]
- Cycle 0: Head flit arrives. G will change to R on the next cycle
- Cycle 1: RC (routing computation). R and G (=V) will be updated on the next cycle
- Cycle 2: VA (virtual channel allocation). On the next cycle, O and G (=A) will be updated. The state of the output channel will also be updated
- Cycle 3: SA (switch allocation)
- Cycle 4: ST (switch traversal)

Output arbiters
- N requesters (inputs) trying to get a single resource under contention (output)
- N:1 arbiter for each output
- Several types of arbiters
  - Fixed priority arbiter
  - Variable priority arbiter
    - Oblivious arbiter
    - Round robin arbiter
Fixed Priority Arbiter

Variable Priority Arbiter
- A one-hot priority signal p selects the highest priority
  - Only one of the p's can be 1
Variable Priority Arbiters: Oblivious
- Not dependent on previous grants or requests
  - Rotating priorities
  - Random priorities

Variable Priority Arbiters: Round robin
- The request that was last served should have the lowest priority
- Serve all other requests first before returning to this requestor
- If a grant is issued this cycle, the request next to the one receiving the grant will have the highest priority on the next cycle
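The round-robin rule can be modelled in software as a rotating starting point for the search. This is a behavioural sketch, not a description of the hardware on the slides; the class name `RoundRobinArbiter` and the use of an integer pointer in place of the one-hot priority signal are assumptions.

```python
class RoundRobinArbiter:
    """N:1 arbiter in which the most recently served requester has lowest priority."""

    def __init__(self, n):
        self.n = n
        self.next_priority = 0     # index that currently has the highest priority

    def arbitrate(self, requests):
        """Return the index of the granted request, or None if nothing is requested."""
        for offset in range(self.n):
            i = (self.next_priority + offset) % self.n
            if requests[i]:
                # The requester after the winner gets highest priority next cycle
                self.next_priority = (i + 1) % self.n
                return i
        return None                # no grant issued: priority is left unchanged

if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    for cycle in range(4):
        print(cycle, arb.arbitrate([True, False, True, False]))
```

With requesters 0 and 2 both asserting every cycle, the grants alternate between them, whereas a fixed priority arbiter would keep granting requester 0.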
Allocators
- NxM allocator: N requestors fighting for M resources
- Results:
  - A grant can be asserted only if the corresponding request is asserted
  - At most one grant for each input may be asserted
  - At most one grant for each resource may be asserted

Allocators In Routers
- VC allocator
  - Input VCs request a range of output VCs
  - E.g. a packet of VC0 arrives at the East input port. It's destined for the West output port, and would like to get any of the VCs of that output port
- Switch allocator
  - Input VCs of an input port request different output ports (e.g. one is going North, another is going West)
Simplest Allocators: Separable
- Approximate with two stages of arbitration
  - One on inputs, one on outputs. They can be in either order

Separable Allocator Example
- Dumb arbiters that always choose the first request
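A separable allocator built from "dumb" first-request arbiters can be sketched as two passes over a request matrix. The sketch assumes the input-first ordering (the slide notes either order is possible) and a matrix where `requests[i][j]` is true when input i wants output j; the function names are illustrative.

```python
def first_request(bits):
    """Dumb fixed-priority arbiter: index of the first asserted request, else None."""
    for i, bit in enumerate(bits):
        if bit:
            return i
    return None

def separable_input_first(requests):
    """Two-stage allocation: each input picks one output, then each output picks one input."""
    n_inputs = len(requests)
    n_outputs = len(requests[0])
    # Stage 1: every input arbitrates among its own requests
    chosen = [first_request(row) for row in requests]
    # Stage 2: every output arbitrates among the inputs that chose it
    grants = []
    for out in range(n_outputs):
        contenders = [chosen[i] == out for i in range(n_inputs)]
        winner = first_request(contenders)
        if winner is not None:
            grants.append((winner, out))
    return grants

if __name__ == "__main__":
    requests = [
        [True, True, False],
        [True, False, True],
        [False, True, False],
    ]
    print(separable_input_first(requests))
```

In this example input 1 loses output 0 to input 0 and is left without a grant even though output 2, which it also requested, stays idle. That lost match is why the slide calls the separable allocator an approximation.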
Switches
- The fabric that directs flits from an input port to an output port
- Design issue: number of input and output ports, and speedups
  - Speedup: the ratio of the total input bandwidth to the network's ideal capacity (the best throughput)
- Tradeoff between cost (delay, area, power) and performance (throughput)
- Tradeoff between leaving it up to allocation or simplifying the job for allocators

Crossbar switches
- [Figure: crossbar with input speedup = 1 and crossbar with input speedup = 2]
Effect of input speedup
- With a random allocator
- Throughput is the fraction of capacity

Several flit buffer organizations
- Central
  - Simple logical view
  - There are actually two switches: MUX in and demux out
  - Problems: bandwidth and latency
- Separate memory per input port
  - Virtual channels associated with a physical channel can share a buffer
Virtual Channel (VC) Buffer Organization
- One buffer per VC
  - Allows switches to access multiple VCs associated with one PC, but leads to poor memory utilization
- Approximations:
  - A small number of output ports on a single buffer
  - Divide VCs among buffers
    - Memory interleaving!

Case Study: Alpha router
Alpha router
- Torus
- Virtual cut-through (316 packet buffers)
- Adaptive routing: prefer to continue in the same dimension
- Deadlock avoidance
  - Coherence: requests may fill up buffers, stalling acks (solution: virtual channel class, order)
  - Network: escape virtual channel

Router microarchitecture
Router microarchitecture

Network Interface
- How a processor sends data to the network
- Shared memory cache-coherent multiprocessors
  - Interfaces caches with networks
- Message-passing multiprocessors
  - Interfaces the processor pipeline with networks
    - Dedicated register (or two registers)
    - Register map
    - Memory map
    - Virtual memory map
    - I/O interrupt + DMA
Cache-coherent SMP processor-network interface
- Highly optimized interface: from load/store to messages in a few cycles
- Request is placed in the memory request register
  - Tag: how to handle the reply, e.g., store the data in R24
  - Type: cacheable or not; read or write
- Cache hit: place in the reply register right away
- Cache miss: enter a miss status holding register (MSHR)
  - Use this to merge reads/writes as well
  - Number of MSHRs == number of pending memory references (4 to 32)

Cache-coherent SMP memory-network interface
- Messages from the network initialize a transaction status holding register (TSHR)
  - Messages may be queued
- The TSHR tracks the status of pending memory operations
- Example: for a non-cacheable read, the TSHR status changes:
  - Read pending (waiting for bank)
  - Bank activated (waiting for data)
  - Read complete (preparing message)
  - Idle (the reply message sent)
Message-passing multiprocessors: Dedicated register
- Send
  - Move a value to the network-out register
  - Special MOV instruction for the last word to terminate the packet
- Read
  - Block on the register until a packet arrives, or test the register and retry later
- Pros: fast
- Cons:
  - Long messages: the processor becomes a DMA engine!
  - Security: a process can hold the register forever

Register map
- Send a message atomically from a subset of the processor's general purpose registers
- Cons:
  - Long messages have to be segmented
  - Pressure on general purpose registers
  - Processors are still DMA engines
I/O interface
- Most common interface today, in PCs and clusters of workstations (e.g. Infiniband, Myrinet, PCI)
- Software-level messaging:
  - Interrupt triggers handler
  - Handler sets up DMA
  - DMA engine constructs packets from memory and sends them out to the network
- Physical-memory-mapped or virtual-memory-mapped

Case Study: Princeton SHRIMP
- Where: I/O bus
- How: virtual memory map
Virtual memory mapping
- Map_network(My_virtual_addr_range, Your_virtual_addr_range)
- Each virtual page -> local physical page -> remote physical page -> remote virtual address
- Store to these virtual addresses => network

Virtual memory map (SHRIMP)
Case Study: M-Machine Multicomputer
- Experimental multicomputer built at MIT and Stanford
- 2-D torus
- Multi-ALU processor (MAP) chip
More informationChapter 11 I/O Management and Disk Scheduling
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization
More informationAnnotation to the assignments and the solution sheet. Note the following points
Computer rchitecture 2 / dvanced Computer rchitecture Seite: 1 nnotation to the assignments and the solution sheet This is a multiple choice examination, that means: Solution approaches are not assessed
More informationFrom Hypercubes to Dragonflies a short history of interconnect
From Hypercubes to Dragonflies a short history of interconnect William J. Dally Computer Science Department Stanford University IAA Workshop July 21, 2008 IAA: # Outline The low-radix era High-radix routers
More informationCS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding
CS 78 Computer Networks Internet Protocol (IP) Andrew T. Campbell campbell@cs.dartmouth.edu our focus What we will lean What s inside a router IP forwarding Internet Control Message Protocol (ICMP) IP
More informationLOAD-BALANCED ROUTING IN INTERCONNECTION NETWORKS
LOAD-BALANCED ROUTING IN INTERCONNECTION NETWORKS A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT
More informationModule 15: Network Structures
Module 15: Network Structures Background Topology Network Types Communication Communication Protocol Robustness Design Strategies 15.1 A Distributed System 15.2 Motivation Resource sharing sharing and
More informationIntroduction. Abusayeed Saifullah. CS 5600 Computer Networks. These slides are adapted from Kurose and Ross
Introduction Abusayeed Saifullah CS 5600 Computer Networks These slides are adapted from Kurose and Ross Roadmap 1.1 what is the Inter? 1.2 work edge end systems, works, links 1.3 work core packet switching,
More informationChapter 12: Multiprocessor Architectures. Lesson 04: Interconnect Networks
Chapter 12: Multiprocessor Architectures Lesson 04: Interconnect Networks Objective To understand different interconnect networks To learn crossbar switch, hypercube, multistage and combining networks
More informationComputer Organization & Architecture Lecture #19
Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of
More informationOn-Chip Communication Architectures
On-Chip Communication Architectures Networks-on-Chip ICS 295 Sudeep Pasricha and Nikil Dutt Slides based on book chapter 12 1 Outline Introduction NoC Topology Switching strategies Routing algorithms Flow
More informationDEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP SWAPNA S 2013 EFFICIENT ROUTER DESIGN FOR NETWORK ON CHIP A
More informationInfiniBand Clustering
White Paper InfiniBand Clustering Delivering Better Price/Performance than Ethernet 1.0 Introduction High performance computing clusters typically utilize Clos networks, more commonly known as Fat Tree
More informationCommunication Networks. MAP-TELE 2011/12 José Ruela
Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)
More informationTCP over Multi-hop Wireless Networks * Overview of Transmission Control Protocol / Internet Protocol (TCP/IP) Internet Protocol (IP)
TCP over Multi-hop Wireless Networks * Overview of Transmission Control Protocol / Internet Protocol (TCP/IP) *Slides adapted from a talk given by Nitin Vaidya. Wireless Computing and Network Systems Page
More informationSynchronization. Todd C. Mowry CS 740 November 24, 1998. Topics. Locks Barriers
Synchronization Todd C. Mowry CS 740 November 24, 1998 Topics Locks Barriers Types of Synchronization Mutual Exclusion Locks Event Synchronization Global or group-based (barriers) Point-to-point tightly
More informationDistributed Computing over Communication Networks: Topology. (with an excursion to P2P)
Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...
More informationData Center Network Topologies: FatTree
Data Center Network Topologies: FatTree Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems and Networking September 22, 2014 Slides used and adapted judiciously
More informationChapter 14: Distributed Operating Systems
Chapter 14: Distributed Operating Systems Chapter 14: Distributed Operating Systems Motivation Types of Distributed Operating Systems Network Structure Network Topology Communication Structure Communication
More informationIntroduction to LAN/WAN. Network Layer
Introduction to LAN/WAN Network Layer Topics Introduction (5-5.1) Routing (5.2) (The core) Internetworking (5.5) Congestion Control (5.3) Network Layer Design Isues Store-and-Forward Packet Switching Services
More informationChapter 16: Distributed Operating Systems
Module 16: Distributed ib System Structure, Silberschatz, Galvin and Gagne 2009 Chapter 16: Distributed Operating Systems Motivation Types of Network-Based Operating Systems Network Structure Network Topology
More informationOperating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師
Lecture 7: Distributed Operating Systems A Distributed System 7.2 Resource sharing Motivation sharing and printing files at remote sites processing information in a distributed database using remote specialized
More informationChapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationPCI Express Basics Ravi Budruk Senior Staff Engineer and Partner MindShare, Inc.
PCI Express Basics Ravi Budruk Senior Staff Engineer and Partner MindShare, Inc. Copyright 2007, PCI-SIG, All Rights Reserved 1 PCI Express Introduction PCI Express architecture is a high performance,
More informationA Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator
A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator Nan Jiang Stanford University qtedq@cva.stanford.edu James Balfour Google Inc. jbalfour@google.com Daniel U. Becker Stanford University
More informationVorlesung Rechnerarchitektur 2 Seite 178 DASH
Vorlesung Rechnerarchitektur 2 Seite 178 Architecture for Shared () The -architecture is a cache coherent, NUMA multiprocessor system, developed at CSL-Stanford by John Hennessy, Daniel Lenoski, Monica
More informationCray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak
Cray Gemini Interconnect Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak Outline 1. Introduction 2. Overview 3. Architecture 4. Gemini Blocks 5. FMA & BTA 6. Fault tolerance
More informationPrinciples and characteristics of distributed systems and environments
Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single
More informationLocal Area Networks transmission system private speedy and secure kilometres shared transmission medium hardware & software
Local Area What s a LAN? A transmission system, usually private owned, very speedy and secure, covering a geographical area in the range of kilometres, comprising a shared transmission medium and a set
More informationInterconnection Networks. B649 Parallel Computing Seung-Hee Bae Hyungro Lee
Interconnection Networks B649 Parallel Computing Seung-Hee Bae Hyungro Lee Outline Introduction Interconnecting Two Devices Connecting More than Two Devices Network Topology Network Routing, Arbitration,
More informationRouting in packet-switching networks
Routing in packet-switching networks Circuit switching vs. Packet switching Most of WANs based on circuit or packet switching Circuit switching designed for voice Resources dedicated to a particular call
More informationQoS Switching. Two Related Areas to Cover (1) Switched IP Forwarding (2) 802.1Q (Virtual LANs) and 802.1p (GARP/Priorities)
QoS Switching H. T. Kung Division of Engineering and Applied Sciences Harvard University November 4, 1998 1of40 Two Related Areas to Cover (1) Switched IP Forwarding (2) 802.1Q (Virtual LANs) and 802.1p
More informationCROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING
CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility
More informationIntel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL
More informationThe Internet. Charging for Internet. What does 1000M and 200M mean? Dr. Hayden Kwok-Hay So
The Internet CCST9015 Feb 6, 2013 What does 1000M and 200M mean? Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering 2 Charging for Internet One is charging for speed (How fast the
More informationBroadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
More informationBehavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages
Behavior Analysis of Multilayer Multistage Interconnection Network With Extra Stages Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Computer
More informationUNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS
UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction
More informationA Low Latency Router Supporting Adaptivity for On-Chip Interconnects
A Low Latency Supporting Adaptivity for On-Chip Interconnects Jongman Kim Dongkook Park T. Theocharides N. Vijaykrishnan Chita R. Das Department of Computer Science and Engineering The Pennsylvania State
More informationInterconnection Networks
Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode
More informationReal-Time (Paradigms) (51)
Real-Time (Paradigms) (51) 5. Real-Time Communication Data flow (communication) in embedded systems : Sensor --> Controller Controller --> Actor Controller --> Display Controller Controller Major
More informationTransport Layer Protocols
Transport Layer Protocols Version. Transport layer performs two main tasks for the application layer by using the network layer. It provides end to end communication between two applications, and implements
More informationDesign and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip
Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip Manjunath E 1, Dhana Selvi D 2 M.Tech Student [DE], Dept. of ECE, CMRIT, AECS Layout, Bangalore, Karnataka,
More informationControlled Random Access Methods
Helsinki University of Technology S-72.333 Postgraduate Seminar on Radio Communications Controlled Random Access Methods Er Liu liuer@cc.hut.fi Communications Laboratory 09.03.2004 Content of Presentation
More informationComputer Systems Structure Input/Output
Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices
More informationLow-Overhead Hard Real-time Aware Interconnect Network Router
Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering
More informationWhat is a bus? A Bus is: Advantages of Buses. Disadvantage of Buses. Master versus Slave. The General Organization of a Bus
Datorteknik F1 bild 1 What is a bus? Slow vehicle that many people ride together well, true... A bunch of wires... A is: a shared communication link a single set of wires used to connect multiple subsystems
More informationDefinition. A Historical Example
Overlay Networks This lecture contains slides created by Ion Stoica (UC Berkeley). Slides used with permission from author. All rights remain with author. Definition Network defines addressing, routing,
More informationWhite Paper Abstract Disclaimer
White Paper Synopsis of the Data Streaming Logical Specification (Phase I) Based on: RapidIO Specification Part X: Data Streaming Logical Specification Rev. 1.2, 08/2004 Abstract The Data Streaming specification
More informationRecursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip
Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim Department of Computer Science and Engineering Texas A&M University
More informationHow To Understand The Concept Of A Distributed System
Distributed Operating Systems Introduction Ewa Niewiadomska-Szynkiewicz and Adam Kozakiewicz ens@ia.pw.edu.pl, akozakie@ia.pw.edu.pl Institute of Control and Computation Engineering Warsaw University of
More informationPerformance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09
Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,
More informationSwitch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
More informationConfiguration Discovery and Mapping of a Home Network
Communicating Process Architectures 2002 191 James Pascoe, Peter Welch, Roger Loader and Vaidy Sunderam (Eds.) IOS Press, 2002 Configuration Discovery and Mapping of a Home Network Keith PUGH Computer
More informationParallel Programming
Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix
More informationOpen Flow Controller and Switch Datasheet
Open Flow Controller and Switch Datasheet California State University Chico Alan Braithwaite Spring 2013 Block Diagram Figure 1. High Level Block Diagram The project will consist of a network development
More informationJournal of Parallel and Distributed Computing 61, 11481179 (2001) doi:10.1006jpdc.2001.1747, available online at http:www.idealibrary.com on Adaptive Routing on the New Switch Chip for IBM SP Systems Bulent
More informationFiber Channel Over Ethernet (FCoE)
Fiber Channel Over Ethernet (FCoE) Using Intel Ethernet Switch Family White Paper November, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR
More informationChapter 8 Interconnection Networks and Clusters
EEF011 Computer Architecture 計 算 機 結 構 Chapter 8 Interconnection Networks and Clusters 吳 俊 興 高 雄 大 學 資 訊 工 程 學 系 January 2005 Chapter 8. Interconnection Networks and Clusters 8.1 Introduction 8.2 A Simple
More informationSCALABILITY AND AVAILABILITY
SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase
More informationPerformance Analysis of Storage Area Network Switches
Performance Analysis of Storage Area Network Switches Andrea Bianco, Paolo Giaccone, Enrico Maria Giraudo, Fabio Neri, Enrico Schiattarella Dipartimento di Elettronica - Politecnico di Torino - Italy e-mail:
More informationChapter 2 Parallel Architecture, Software And Performance
Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program
More informationThe proliferation of the raw processing
TECHNOLOGY CONNECTED Advances with System Area Network Speeds Data Transfer between Servers with A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer
More informationOptical interconnection networks with time slot routing
Theoretical and Applied Informatics ISSN 896 5 Vol. x 00x, no. x pp. x x Optical interconnection networks with time slot routing IRENEUSZ SZCZEŚNIAK AND ROMAN WYRZYKOWSKI a a Institute of Computer and
More informationEfficient Built-In NoC Support for Gather Operations in Invalidation-Based Coherence Protocols
Universitat Politècnica de València Master Thesis Efficient Built-In NoC Support for Gather Operations in Invalidation-Based Coherence Protocols Author: Mario Lodde Advisor: Prof. José Flich Cardo A thesis
More informationA Link Load Balancing Solution for Multi-Homed Networks
A Link Load Balancing Solution for Multi-Homed Networks Overview An increasing number of enterprises are using the Internet for delivering mission-critical content and applications. By maintaining only
More information