Components: Interconnect

PE to PE interconnect: the most expensive supercomputer component.

Possible implementations:

FULL INTERCONNECTION: the ideal, usually not attainable.
Each PE has a direct link to every other PE. Nice in principle but costly: the number of links is proportional to the square of the number of PEs, so for a large number of PEs this becomes impractical. Therefore we will try two compromises: static interconnect networks and dynamic interconnect networks.
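A quick illustration (my own sketch, not part of the original notes) of why full interconnection does not scale: the number of point-to-point links grows quadratically with the number of PEs.

```python
# Each pair of PEs needs its own link, so a full interconnect of n PEs
# has n*(n-1)/2 links, roughly n^2/2.
def full_interconnect_links(n_pes: int) -> int:
    return n_pes * (n_pes - 1) // 2

for n in (4, 16, 64, 1024):
    print(f"{n:>5} PEs -> {full_interconnect_links(n):>7} links")
# 4 -> 6, 16 -> 120, 64 -> 2016, 1024 -> 523776
```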

BUS

Bus-based networks are perhaps the simplest: they consist of a shared medium common to all nodes. The cost of the network is proportional to the number of nodes, and the distance between any two nodes is constant, O(1). Ideal for broadcasting information among nodes.

However: the bounded bus bandwidth limits the total number of nodes. A partial remedy is to use caches, which introduces a new problem: cache coherence.

Bus networks: (a) without local memory / caches, (b) with local memory / caches.

CROSSBAR

A crossbar network connecting p processors to b memory banks is shown below.

This is a non-blocking network: a connection of one processor to a given memory bank does not block a connection of another processor to a different memory bank.

There must be p x b switches. It is reasonable to assume that b > p, from which it follows that the cost of the crossbar is high, at least O(p²). Like the fully connected network, it is therefore not very scalable.

LINEAR + RING

In an attempt to reduce interconnect cost, we try out sparser networks:
(a) Linear network: every node has two neighbours (except the terminal nodes).
(b) Ring, or 1D torus: every node has exactly two neighbours.

Note that by providing the wraparound link we halve the maximum distance between nodes and double the bisection bandwidth.

We may attempt a multidimensional generalization:

MESH + TORUS: 2D, 3D, etc.

(a) 2D mesh, (b) 2D torus, (c) 3D mesh.

Designers like 2D meshes for their easy wiring layout. Users like 3D meshes and 3D tori because many problems (weather modelling, structural modelling, etc.) map naturally to 3D topologies; this is so because we seem to inhabit a 3D universe. Note that n-D meshes and n-D tori need not have the same number of nodes in each dimension. This facilitates upgrades, at the cost of increased node-to-node distance.
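As a small illustration of the wraparound effect (a sketch of mine, assuming unit-cost hops), compare the maximum node-to-node distance in a linear array and in a ring of the same size:

```python
# Hop distance from node i to node j in a linear array vs. a ring of n nodes.
def linear_distance(i: int, j: int) -> int:
    return abs(i - j)

def ring_distance(i: int, j: int, n: int) -> int:
    d = abs(i - j)
    return min(d, n - d)          # the wraparound link allows going "the short way"

n = 16
print(max(linear_distance(0, j) for j in range(n)))   # 15 hops in the linear array
print(max(ring_distance(0, j, n) for j in range(n)))  # 8 hops in the ring
```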

Another multidimensional generalization: so far, when increasing the number of processors, we kept the network dimensionality constant. How about another approach: let's keep the number of processors in any given dimension constant (say, 2) and keep increasing the dimensionality. We get the hypercube.

HYPERCUBE (a.k.a. n-cube)

Observe the clever numbering scheme of the nodes in a hypercube, which facilitates message forwarding: each node gets a binary label, and neighbouring nodes differ in exactly one bit.
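A minimal sketch of how that numbering scheme is used for forwarding (my own example, assuming the usual convention that labels are n-bit numbers and neighbours differ in exactly one bit): a message is routed by correcting one differing bit per hop, so it needs at most n hops.

```python
# Route a message through an n-cube by flipping, at each hop, one bit in which
# the current node still differs from the destination node.
def hypercube_route(src: int, dst: int) -> list[int]:
    path, cur = [src], src
    while cur != dst:
        diff = cur ^ dst            # bits still to be corrected
        cur ^= diff & -diff         # flip the lowest differing bit (one hop)
        path.append(cur)
    return path

# 4-cube example: 0101 -> 0100 -> 0110 -> 1110
print([format(x, "04b") for x in hypercube_route(0b0101, 0b1110)])
```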

TREE

Basic concept: in a tree network there is only one path between any two nodes. The taller the tree, the worse the communication bottleneck at the upper levels of the tree. Two remedies are possible: we may have (a) static tree networks, or (b) dynamic tree networks. Alternatively, we may introduce fat tree networks (see below), in which links closer to the root carry more bandwidth.

Fat tree network.

The crossbar network is scalable in terms of performance, but not in terms of cost. Conversely, the bus network is scalable in terms of cost but not in terms of performance. Hence some designers feel the need to compromise:

MULTISTAGE NETWORKS

A multistage network connects a number of processors to a number of memory banks via a number of switches organized in layers, viz:

Each switch can be in one of the following positions: straight-through or criss-cross.

The example above is that of the Omega network.

OMEGA

An Omega network connects P processors to P memory banks (see below for P = 8).

An Omega network has (P/2) * log2(P) switches, so the cost of this network is lower than that of the crossbar network.
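To put the switch counts side by side (a sketch of mine, assuming 2x2 switches and b = P memory banks for the crossbar):

```python
from math import log2

# Omega: (P/2) * log2(P) two-by-two switches; crossbar: P * P switchpoints.
def omega_switches(p: int) -> int:
    return (p // 2) * int(log2(p))

for p in (8, 64, 1024):
    print(f"P = {p:>4}:  Omega {omega_switches(p):>6}   crossbar {p * p:>8}")
# P = 8: 12 vs 64;  P = 64: 192 vs 4096;  P = 1024: 5120 vs 1048576
```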

OMEGA (continued)

The Omega network belongs to the class of blocking networks. Observe that, in the diagram above, when P2 is connected to M6, P6 cannot talk to M4: the two connections contend for the same link.
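The blocking can be checked with a small simulation (my own sketch of the usual destination-tag routing, not taken from the notes): each stage applies a perfect shuffle to the line number and then a 2x2 switch sets the lowest bit of the line number to the next bit of the destination address. The connections P2 -> M6 and P6 -> M4 both need output line 5 of the first stage, so one of them must wait.

```python
# Destination-tag routing through an Omega network with 2**n_bits inputs.
# Returns the output line used after each stage.
def omega_path(src: int, dst: int, n_bits: int = 3) -> list[int]:
    mask = (1 << n_bits) - 1
    line, lines = src, []
    for stage in range(n_bits):
        line = ((line << 1) | (line >> (n_bits - 1))) & mask   # perfect shuffle
        bit = (dst >> (n_bits - 1 - stage)) & 1                # next destination bit
        line = (line & ~1) | bit                               # switch output (upper/lower)
        lines.append(line)
    return lines

print(omega_path(2, 6))   # [5, 3, 6]
print(omega_path(6, 4))   # [5, 2, 4]; both paths need line 5 after stage 1
```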

An Omega network can be static: the switches may remain in a fixed position (either straight-through or criss-cross). An Omega network can also be used to connect processors to processors. An example of such a network:

SHUFFLE EXCHANGE

Consider a set of N processors, numbered P0, P1, ..., PN-1. The perfect shuffle connects processors Pi and Pj by a one-way communication link if
    j = 2i           for 0 <= i <= N/2 - 1, or
    j = 2i + 1 - N   otherwise.
See below an example for N = 8, where arrows represent shuffle links and solid lines represent exchange links.

In other words, the perfect shuffle connects processor i with processor (2i mod (N-1)), with the exception of processor N-1, which is connected to itself.

Having trouble with this logic? Consider the following:

SHUFFLE EXCHANGE (continued)

Let's represent the numbers i and j in binary. If j can be obtained from i by a circular shift to the left, then Pi and Pj are connected by a one-way communication link, viz.:

A perfect unshuffle can be obtained by reversing the direction of the arrows, or by making all links bi-directional.
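Both views of the perfect shuffle can be checked with a few lines of code (my own sketch, for N = 8 processors):

```python
# Perfect shuffle, two equivalent definitions:
#   arithmetic: i -> 2*i mod (N-1), with processor N-1 mapping to itself;
#   bitwise:    i -> circular left shift of i's log2(N)-bit representation.
def shuffle_mod(i: int, n: int) -> int:
    return i if i == n - 1 else (2 * i) % (n - 1)

def shuffle_rotate(i: int, n_bits: int) -> int:
    mask = (1 << n_bits) - 1
    return ((i << 1) | (i >> (n_bits - 1))) & mask

N, BITS = 8, 3
for i in range(N):
    assert shuffle_mod(i, N) == shuffle_rotate(i, BITS)
    print(i, "->", shuffle_mod(i, N))   # 0->0, 1->2, 2->4, 3->6, 4->1, 5->3, 6->5, 7->7
```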

Other interconnect solutions:

A naïve solution: STAR

In this solution the central node plays the same role as the bus in bus-based networks, and it also suffers from the same shortcomings. However, this idea can be generalized:

STAR (continued)

A generalized star interconnection network has the property that, for a given integer N, we have exactly N! processors. Each processor is labeled with the permutation to which it corresponds. Two processors Pi and Pj are connected if label j can be obtained from label i by swapping the first symbol of i with one of its other symbols (any position except the first).

Below we have a star network for N = 4, i.e. a network of 4! = 24 processors.

Example: processors labeled 2134 and 3124 are connected with two links (swap the first symbol, 2, with the third symbol, 3).

NOTE: The whole idea is to make each node the center node of a small star!
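A small sketch (mine, following the adjacency rule above) that lists the neighbours of a node in the generalized star network:

```python
# Neighbours of a star-network node: swap the first symbol of its permutation
# label with the symbol in each of the other positions.
def star_neighbours(label: str) -> list[str]:
    neighbours = []
    for k in range(1, len(label)):
        s = list(label)
        s[0], s[k] = s[k], s[0]
        neighbours.append("".join(s))
    return neighbours

print(star_neighbours("2134"))   # ['1234', '3124', '4132']
# 3124 appears in the list, so 2134 and 3124 are indeed adjacent;
# every node of the N = 4 star has degree N - 1 = 3.
```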

DE BRUIJN

A de Bruijn network consists of N = d^k processors, each labeled with a k-digit word (a_{k-1} a_{k-2} ... a_1 a_0), where each a_j is a digit in radix d, i.e. one of (0, 1, ..., d-1).

The processors directly reachable from (a_{k-1} a_{k-2} ... a_1 a_0) are (a_{k-2} ... a_1 a_0 q) and (q a_{k-1} a_{k-2} ... a_1), where q is any digit (radix d).

Shown below is a de Bruijn network for d = 2 and k = 3.

The de Bruijn network can be seen as a generalization of a shuffle exchange network. It contains shuffle connections, but has a smaller diameter than the shuffle exchange (roughly half the diameter).
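The reachability rule can be written down directly (an illustrative sketch of mine, for radix d and k-digit labels stored as strings):

```python
# Neighbours of a de Bruijn node (a_{k-1} ... a_1 a_0):
#   shift left and append a new last digit q, or
#   shift right and prepend a new first digit q.
def de_bruijn_neighbours(word: str, d: int = 2) -> list[str]:
    digits = [str(q) for q in range(d)]
    left = [word[1:] + q for q in digits]     # shuffle-type connections
    right = [q + word[:-1] for q in digits]
    return left + right

print(de_bruijn_neighbours("110"))   # ['100', '101', '011', '111']
```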

BUTTERFLY

A butterfly network is made of (n+1) * 2^n processors organized into n+1 rows, each containing 2^n processors. The rows are labeled 0 ... n. Each processor has four connections to other processors (except the processors in the top and bottom rows).

Processor P(r, j), i.e. processor number j in row r, is connected to P(r-1, j) and P(r-1, m), where m is obtained by inverting the r-th significant bit in the binary representation of j.
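A sketch of that wiring rule (my own, assuming "r-th significant bit" means the r-th most significant of the n bits of j), giving the upward connections of P(r, j):

```python
# Upward connections of P(r, j) in a butterfly with n+1 rows of 2^n processors:
# P(r-1, j) and P(r-1, m), where m is j with its r-th most significant bit flipped.
def butterfly_up_neighbours(r: int, j: int, n: int = 3) -> list[tuple[int, int]]:
    m = j ^ (1 << (n - r))                    # flip the r-th most significant bit
    return [(r - 1, j), (r - 1, m)]

print(butterfly_up_neighbours(1, 0b011))      # [(0, 3), (0, 7)]
print(butterfly_up_neighbours(3, 0b011))      # [(2, 3), (2, 2)]
```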

PYRAMID

A pyramid consists of (4^(d+1) - 1)/3 processors organized in d+1 levels, as follows:
- Levels are numbered from d down to 0.
- There is 1 processor at level d.
- Every level below d has four times as many processors as the level immediately above it.

Note the connections between processors. The pyramid interconnection can be seen as a generalization of the ring binary tree network, or as a way of combining meshes and trees.
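The processor count is just a geometric series, which a two-line check confirms (sketch of mine):

```python
# Levels d, d-1, ..., 0 hold 1, 4, 16, ..., 4^d processors;
# the sum of the series is (4^(d+1) - 1) / 3.
def pyramid_nodes(d: int) -> int:
    return sum(4 ** level for level in range(d + 1))

for d in range(4):
    assert pyramid_nodes(d) == (4 ** (d + 1) - 1) // 3
    print(d, pyramid_nodes(d))   # 1, 5, 21, 85
```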

COMPARISON OF INTERCONNECTION NETWORKS

Intuitively, one network topology is more desirable than another if it is:
- more efficient
- more convenient
- more regular (i.e. easy to implement)
- more expandable (i.e. highly modular)
- unlikely to experience bottlenecks

Clearly no single interconnection network maximizes all these criteria; some tradeoffs are needed.

Standard criteria used by industry:
- Network diameter = the maximum number of hops necessary to link the two most distant processors.
- Network bisection width = the minimum number of links that must be severed to split the network into two halves (give or take one processor).
- Network bisection bandwidth = the minimum sum of bandwidths of the links that must be severed to split the network into two halves (give or take one processor).
- Maximum degree of PEs = the maximum number of links to/from one PE.
- Minimum degree of PEs = the minimum number of links to/from one PE.
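Two of these criteria, the diameter and the maximum degree, can be computed mechanically from an adjacency list (an illustrative sketch of mine using breadth-first search; the ring and hypercube below are just small test cases):

```python
from collections import deque

# Network diameter = longest shortest path; max degree = most links at one node.
def diameter_and_max_degree(adj: dict[int, list[int]]) -> tuple[int, int]:
    def eccentricity(src: int) -> int:
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    diameter = max(eccentricity(v) for v in adj)
    max_degree = max(len(links) for links in adj.values())
    return diameter, max_degree

ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}          # 8-node ring
cube = {i: [i ^ (1 << b) for b in range(3)] for i in range(8)}    # 3-cube
print(diameter_and_max_degree(ring))   # (4, 2)
print(diameter_and_max_degree(cube))   # (3, 3)
```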

COMPARISON OF INTERCONNECTION NETWORKS (continued)

Interconnect comparison at a glance:

Network Topology     Number of Nodes     Node Degree
Linear and Ring      d                   2
Shuffle-Exchange     2^d                 3
2D Mesh              d^2                 4
Hypercube            2^d                 d
Star                 m!                  m-1
De Bruijn            2^d                 4
Binary Tree          2^d - 1             3
Butterfly            (d+1) * 2^d         4
Omega                2^d                 2
Pyramid              (4^(d+1) - 1)/3     9