On-Chip Interconnection Networks Low-Power Interconnect

Size: px
Start display at page:

Download "On-Chip Interconnection Networks Low-Power Interconnect"

Transcription

1 On-Chip Interconnection Networks Low-Power Interconnect William J. Dally Computer Systems Laboratory Stanford University ISLPED August 27, 2007 ISLPED: 1 Aug 27, 2007

2 Outline Demand for On-Chip Networks What is Unique about On-Chip Networks The Power Problem Enabling Technologies Circuits - set the constraints Topology Micro-Architecture Summary ISLPED: 2 Aug 27, 2007

3 Urgent Demand for OCINs ISLPED: 3 Aug 27, 2007

4 The Future is CMPs OCINs are a Critical Component ISLPED: 4 Aug 27, 2007

5 Example CMP OCIN ISLPED: 5 Aug 27, 2007

6 Growing Complexity of SoCs Demands an On-Chip Interconnection Network Avner Goren TI EPF 2004 ISLPED: 6 Aug 27, 2007

7 So, what s different about on-chip networks? ISLPED: 7 Aug 27, 2007

8 State of Off-Chip Networks ISLPED: 8 Aug 27, 2007

9 Technology Trends bandwidth per router node (Gb/s) BlackWidow Torus Routing Chip Intel ipsc/2 J-Machine CM-5 Intel Paragon XP Cray T3D MIT Alewife IBM Vulcan Cray T3E SGI Origin 2000 AlphaServer GS320 IBM SP Switch2 Quadrics QsNet Cray X1 Velio 3003 IBM HPS SGI Altix 3000 Cray XT3 YARC year ISLPED: 9 Aug 27, 2007

10 Some History MARS Router 1984 Torus Routing Chip 1985 MDP 1991 Network Design Frame 1988 Robert Mullins Reliable Router 1994 ISLPED: 10 MAP 1998 Imagine 2002 YARC 2006 Aug 27, 2007

11 Some very good books ISLPED: 11 Aug 27, 2007

12 Topology Summary of Off-Chip Networks* Fit to packaging and signaling technology High-radix - Clos or FlatBfly gives lowest cost Routing Global adaptive routing balances load w/o destroying locality Flow control Virtual channels/virtual cut-through *oversimplified ISLPED: 12 Aug 27, 2007

13 So, what s different about on-chip networks? ISLPED: 13 Aug 27, 2007

14 Cost, Channels, Workload are Different Cost Off-chip: cost is channels - pins, connectors, cables, optics On-chip: cost is Si area and Power (storage and switches), wires plentiful Drives networks with many long, wide channels, few buffers Channel Characteristics On-chip RC lines - need a repeater every 1mm (or less) Short distance - low latency Can put logic in repeaters, motivates low-latency routers Workload CMP cache traffic SoC isochronous flows Design issues Floorplanning Different constraints motivate some surprising differences in design. ISLPED: 14 Aug 27, 2007

15 The Power Problem ISLPED: 15 Aug 27, 2007

16 Power is Dominated by Interconnect In 65nm A 32 bit add takes 0.8pJ Moving a 32 bit word 1mm takes 13pJ A 64-core CMP at 1GHz Estimated BW demand of 6.4GW/s (~200Gb/s) Average distance for each transfer of 10mm (each way) Power due to network ~50W Power due to 64 adds at 1GHz ~50mW Power due to 64 RISC processors at 1GHz ~13W Need to reduce network power by at least 10x ISLPED: 16 Aug 27, 2007

17 OCINs are an Enabling Technology Modular Standard interface enables design reuse Optimized Encapsulates circuits and wiring. Enables optimization. Efficient Just the wires needed to reach the destination are toggled. ISLPED: 17 Aug 27, 2007

18 OCINs vs Buses Most CMPs and SOCs today use one or more buses for interconnect A bus is just one type of OCIN Low bandwidth (one transation at a time) High concentration (all processors share one link) Inefficient (all wire segments toggled even for short communication) For both buses and other OCINs Control path is dominated by arbiters/allocators Data path is dominated by data movement and buffering Can do much better by searching more of the network design space ISLPED: 18 Aug 27, 2007

19 Circuits set Cost & Area Constraints for Architecture Can do substantially (10x-100x) better than default circuits ISLPED: 19 Aug 27, 2007

20 Enabling Technology is a Prerequisite Channels, Buffers, Switches Topology Routing Flow Control Microarchitecture ISLPED: 20 Aug 27, 2007

21 Channels 10x to 100x power reduction Eq signaling for faster propagation and increased repeater distance (D & P Chapter 8, Heaton 01) Elastic channels provide free buffers (Mizuno 01) Send 4-8 bits per cycle per wire (assuming 20FO4 cycle) ISLPED: 21 Aug 27, 2007

22 Buffers Dense arrays (vs. Flip-Flops or Latches) 1/10 area/bit 1/10 power for low-swing read Low-swing write 1/10 power for writes. Low-swing read - can keep swing low through muxes. ISLPED: 22 Aug 27, 2007

23 Switches Low-swing bit lines Operate at channel rate Reduces area and hence power Equalized drive Buffered crosspoints Integral allocation ISLPED: 23 Aug 27, 2007

24 Circuits Impact Architecture With standard-cell approach Power is approximately evenly split between channels, buffers, and routers With efficient circuits Channels 1/30, buffers 1/3 Routers dominate Routing >> Buffering >> Propagating Motivates topologies with fewer hops, longer channels. Just propagate bits - avoid buffering, really avoid routing ISLPED: 24 Aug 27, 2007

25 Properties of these elements drives optimal network organization ISLPED: 25 Aug 27, 2007

26 On-Chip Interconnection Network System = Processor Tiles Source: Balfour and Dally, ICS 06 ISLPED: 26 Aug 27, 2007

27 On-Chip Interconnection Network (2) System = Processor Tiles + Channels Source: Balfour and Dally, ICS 06 ISLPED: 27 Aug 27, 2007

28 Interconnection Network (3) System = Processor Tiles + Channels + Routers Source: Balfour and Dally, ICS 06 ISLPED: 28 Aug 27, 2007

29 Router Architecture Input-queued Virtual Channel Speculative Pipeline Source: Balfour and Dally, ICS 06 ISLPED: 29 Aug 27, 2007

30 Router Area Accurate modeling requires floorplan Source: Balfour and Dally, ICS 06 ISLPED: 30 Aug 27, 2007

31 Torus Source: Balfour and Dally, ICS 06 ISLPED: 31 Aug 27, 2007

32 Concentrated Mesh Source: Balfour and Dally, ICS 06 ISLPED: 32 Aug 27, 2007

33 Express Links Source: Balfour and Dally, ICS 06 ISLPED: 33 Aug 27, 2007

34 Network Replication Abundant wire resources build second network Resource allocation tradeoff Wide: [+] Serialization Latency [+] Router Energy Efficiency [ ] Router Area Replicated: [+] Decoupled Resources [+] Area Efficiency [?] Energy Efficiency [ ] Serialization Latency [+] SCALABLE Source: Balfour and Dally, ICS 06 ISLPED: 34 Aug 27, 2007

35 Energy Efficiency Network Energy Completion Time (normalized to Torus network) Concentrated Mesh (Replicated) Torus Fat-Tree Fat-Tree with Taper Mesh (Replicated) Source: Balfour and Dally, ICS 06 ISLPED: 35 Aug 27, 2007

36 Large differences in efficiency. Optimal topology not obvious, not regular and very sensitive to properties of network elements ISLPED: 36 Aug 27, 2007

37 Where is Energy Expended? Power [W] Buffers Switch Channels 0 Concentrated Mesh (Replicated) Torus Source: Balfour and Dally, ICS 06 ISLPED: 37 Aug 27, 2007

38 Flow Control Trade channel bandwidth (cheap) for buffer space (expensive) Make buffers shallow Compensate for lower duty factor by overprovisioning channels Little cost in energy Circuit switching (no buffers) Elastic buffers - use free buffers in the channels ISLPED: 38 Aug 27, 2007

39 Flow Control in an On-Chip FlatBfly S X D ISLPED: 39 Aug 27, 2007

40 View as Two Buffered Links S X D S X D ISLPED: 40 Aug 27, 2007

41 Channels Have Repeaters S X D ISLPED: 41 Aug 27, 2007

42 Buffers Decouple Channel Allocation in Time S X D S X D ISLPED: 42 Aug 27, 2007

43 Circuit Switching S X D S X D ISLPED: 43 Aug 27, 2007

44 With Elastic Buffers S X D S X D ISLPED: 44 Aug 27, 2007

45 Summary ISLPED: 45 Aug 27, 2007

46 Comments on Shekar s Talk There is more to life than meshes and buses Optimal networks, like flattened butterflies are better than either Buses weren t even good on boards Slow, no parallelism, excess power, no locality Even Shekar s numbers say buses are worse W vs 20-80W Differential signaling can be applied to networks too. Routers don t take 3-5 clocks - many examples of one clock routers See Mullins, ISCA 2004 Arbiters for circuit switched network no different than routers for packet switched But drops packets on collisions - packet switching more efficient Yes, embrace parallelism - with well-designed networks We re making progress - in December Shekar was saying buses, now he s advanced to circuit switching ISLPED: 46 Aug 27, 2007

47 Summary OCINs critically important Vital component of CMPs, SoCs Less mature technology than other components Naïve implementations consume significant Power Power is dominated by interconnect - operations are cheap Very different than off-chip networks Cost, Channels, Workloads, Design Issues Efficient network elements are enabling technology Energy and area efficient channels, buffers, & switches Change the equation for network design: channels << buffers << routers Topology Minimize diameter Concentrated Mesh with Express Channels Flattened Butterfly Bus is one extreme, but almost never the right answer Flow Control Minimize buffers at switch points Use elastic buffers to minimize latency Efficient OCIN designs yield significant gains ISLPED: 47 Aug 27, 2007

48 Backup Slides ISLPED: 48 Aug 27, 2007

49 On-Chip Flattened Butterfly Conventional 2D Mesh 2D Flattened Butterfly Source: Kim and Dally, to appear ISLPED: 49 Aug 27, 2007

50 On-Chip Flattened Butterfly dimension 1 Layout Mapping dimension 2 Source: Kim and Dally, to appear ISLPED: 50 Aug 27, 2007

51 Bypass Channels Conventional Flattened Butterfly Flattened Butterfly with Bypass Channels connected to local router Source: Kim and Dally, to appear ISLPED: 51 Aug 27, 2007

52 Latency Comparison Latency (normalized to mesh network) bitrev tornado UR MESH CMESH FBFLY-MIN FBFLY- NONMIN transpose randperm bitcomp FBFLY-BYP Source: Kim and Dally, to appear ISLPED: 52 Aug 27, 2007

53 Power Comparison Power (W) Memory Crossbar Channel 2 0 MESH CMESH FBFLY-MIN FBFLY- NONMIN Source: Kim and Dally, to appear FBFLY-BYP ISLPED: 53 Aug 27, 2007

From Hypercubes to Dragonflies a short history of interconnect

From Hypercubes to Dragonflies a short history of interconnect From Hypercubes to Dragonflies a short history of interconnect William J. Dally Computer Science Department Stanford University IAA Workshop July 21, 2008 IAA: # Outline The low-radix era High-radix routers

More information

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Lecture 18: Interconnection Networks CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Project deadlines: - Mon, April 2: project proposal: 1-2 page writeup - Fri,

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi University of Utah & HP Labs 1 Large Caches Cache hierarchies

More information

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Technology-Driven, Highly-Scalable Dragonfly Topology

Technology-Driven, Highly-Scalable Dragonfly Topology Technology-Driven, Highly-Scalable Dragonfly Topology By William J. Dally et al ACAL Group Seminar Raj Parihar parihar@ece.rochester.edu Motivation Objective: In interconnect network design Minimize (latency,

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

From Bus and Crossbar to Network-On-Chip. Arteris S.A.

From Bus and Crossbar to Network-On-Chip. Arteris S.A. From Bus and Crossbar to Network-On-Chip Arteris S.A. Copyright 2009 Arteris S.A. All rights reserved. Contact information Corporate Headquarters Arteris, Inc. 1741 Technology Drive, Suite 250 San Jose,

More information

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak Cray Gemini Interconnect Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak Outline 1. Introduction 2. Overview 3. Architecture 4. Gemini Blocks 5. FMA & BTA 6. Fault tolerance

More information

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

Photonic Networks for Data Centres and High Performance Computing

Photonic Networks for Data Centres and High Performance Computing Photonic Networks for Data Centres and High Performance Computing Philip Watts Department of Electronic Engineering, UCL Yury Audzevich, Nick Barrow-Williams, Robert Mullins, Simon Moore, Andrew Moore

More information

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E) Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E) 1 Topologies Internet topologies are not very regular they grew incrementally Supercomputers

More information

Why the Network Matters

Why the Network Matters Week 2, Lecture 2 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. So Far Overview of Multicore Systems Why Memory Matters Memory Architectures Emerging Chip Multiprocessors (CMP) Increasing

More information

Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs

Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs Antoni Roca, Jose Flich Parallel Architectures Group Universitat Politechnica de Valencia (UPV) Valencia, Spain Giorgos Dimitrakopoulos

More information

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy

TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy John Jose, K.V. Mahathi, J. Shiva Shankar and Madhu Mutyam PACE Laboratory, Department of Computer Science and Engineering

More information

Interconnection Network Design

Interconnection Network Design Interconnection Network Design Vida Vukašinović 1 Introduction Parallel computer networks are interesting topic, but they are also difficult to understand in an overall sense. The topological structure

More information

Topological Properties

Topological Properties Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Interconnection Networks

Interconnection Networks Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

Asynchronous Bypass Channels

Asynchronous Bypass Channels Asynchronous Bypass Channels Improving Performance for Multi-Synchronous NoCs T. Jain, P. Gratz, A. Sprintson, G. Choi, Department of Electrical and Computer Engineering, Texas A&M University, USA Table

More information

What is a System on a Chip?

What is a System on a Chip? What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex

More information

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana

More information

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

Low Power AMD Athlon 64 and AMD Opteron Processors

Low Power AMD Athlon 64 and AMD Opteron Processors Low Power AMD Athlon 64 and AMD Opteron Processors Hot Chips 2004 Presenter: Marius Evers Block Diagram of AMD Athlon 64 and AMD Opteron Based on AMD s 8 th generation architecture AMD Athlon 64 and AMD

More information

Interconnection Networks

Interconnection Networks Interconnection Networks Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Three questions about

More information

Use-it or Lose-it: Wearout and Lifetime in Future Chip-Multiprocessors

Use-it or Lose-it: Wearout and Lifetime in Future Chip-Multiprocessors Use-it or Lose-it: Wearout and Lifetime in Future Chip-Multiprocessors Hyungjun Kim, 1 Arseniy Vitkovsky, 2 Paul V. Gratz, 1 Vassos Soteriou 2 1 Department of Electrical and Computer Engineering, Texas

More information

Components: Interconnect Page 1 of 18

Components: Interconnect Page 1 of 18 Components: Interconnect Page 1 of 18 PE to PE interconnect: The most expensive supercomputer component Possible implementations: FULL INTERCONNECTION: The ideal Usually not attainable Each PE has a direct

More information

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip Cristina SILVANO silvano@elet.polimi.it Politecnico di Milano, Milano (Italy) Talk Outline

More information

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano Intel Itanium Quad-Core Architecture for the Enterprise Lambert Schaelicke Eric DeLano Agenda Introduction Intel Itanium Roadmap Intel Itanium Processor 9300 Series Overview Key Features Pipeline Overview

More information

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09 Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,

More information

Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip

Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip Manjunath E 1, Dhana Selvi D 2 M.Tech Student [DE], Dept. of ECE, CMRIT, AECS Layout, Bangalore, Karnataka,

More information

PCI Express Overview. And, by the way, they need to do it in less time.

PCI Express Overview. And, by the way, they need to do it in less time. PCI Express Overview Introduction This paper is intended to introduce design engineers, system architects and business managers to the PCI Express protocol and how this interconnect technology fits into

More information

A Low Latency Router Supporting Adaptivity for On-Chip Interconnects

A Low Latency Router Supporting Adaptivity for On-Chip Interconnects A Low Latency Supporting Adaptivity for On-Chip Interconnects Jongman Kim Dongkook Park T. Theocharides N. Vijaykrishnan Chita R. Das Department of Computer Science and Engineering The Pennsylvania State

More information

Low-Overhead Hard Real-time Aware Interconnect Network Router

Low-Overhead Hard Real-time Aware Interconnect Network Router Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering

More information

Chapter 1 Reading Organizer

Chapter 1 Reading Organizer Chapter 1 Reading Organizer After completion of this chapter, you should be able to: Describe convergence of data, voice and video in the context of switched networks Describe a switched network in a small

More information

Computer Systems Structure Input/Output

Computer Systems Structure Input/Output Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices

More information

inter-chip and intra-chip Harm Dorren and Oded Raz

inter-chip and intra-chip Harm Dorren and Oded Raz Will photonics penetrate into inter-chip and intra-chip communications? Harm Dorren and Oded Raz On-chip interconnect networks R X R X R X R X R T X R XT X T X T X T X X Electronic on-chip interconnect

More information

A Low-Radix and Low-Diameter 3D Interconnection Network Design

A Low-Radix and Low-Diameter 3D Interconnection Network Design A Low-Radix and Low-Diameter 3D Interconnection Network Design Yi Xu,YuDu, Bo Zhao, Xiuyi Zhou, Youtao Zhang, Jun Yang Dept. of Electrical and Computer Engineering Dept. of Computer Science University

More information

Parallel Programming

Parallel Programming Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix

More information

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network Leveraging Torus Topology with Deadlock ecovery for Cost-Efficient On-Chip Network Minjeong Shin, John Kim Department of Computer Science KAIST Daejeon, Korea {shinmj, jjk}@kaist.ac.kr Abstract On-chip

More information

- Hubs vs. Switches vs. Routers -

- Hubs vs. Switches vs. Routers - 1 Layered Communication - Hubs vs. Switches vs. Routers - Network communication models are generally organized into layers. The OSI model specifically consists of seven layers, with each layer representing

More information

8 Gbps CMOS interface for parallel fiber-optic interconnects

8 Gbps CMOS interface for parallel fiber-optic interconnects 8 Gbps CMOS interface for parallel fiberoptic interconnects Barton Sano, Bindu Madhavan and A. F. J. Levi Department of Electrical Engineering University of Southern California Los Angeles, California

More information

Interconnection Network of OTA-based FPAA

Interconnection Network of OTA-based FPAA Chapter S Interconnection Network of OTA-based FPAA 5.1 Introduction Aside from CAB components, a number of different interconnect structures have been proposed for FPAAs. The choice of an intercmmcclion

More information

Chapter 2 Heterogeneous Multicore Architecture

Chapter 2 Heterogeneous Multicore Architecture Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is

More information

A Dynamic Link Allocation Router

A Dynamic Link Allocation Router A Dynamic Link Allocation Router Wei Song and Doug Edwards School of Computer Science, the University of Manchester Oxford Road, Manchester M13 9PL, UK {songw, doug}@cs.man.ac.uk Abstract The connection

More information

Global Foundation Services

Global Foundation Services Global Foundation Services Introduction I work in Global Foundations Services (GFS) Lead R&D for Microsoft s next-generation end-to-end solutions for the cloud infrastructure We take a long term view

More information

A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator

A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator Nan Jiang Stanford University qtedq@cva.stanford.edu James Balfour Google Inc. jbalfour@google.com Daniel U. Becker Stanford University

More information

ARCHITECTING EFFICIENT INTERCONNECTS FOR LARGE CACHES

ARCHITECTING EFFICIENT INTERCONNECTS FOR LARGE CACHES ... ARCHITECTING EFFICIENT INTERCONNECTS FOR LARGE CACHES WITH CACTI 6.0... INTERCONNECTS PLAY AN INCREASINGLY IMPORTANT ROLE IN DETERMINING THE POWER AND PERFORMANCE CHARACTERISTICS OF MODERN PROCESSORS.

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

OpenSoC Fabric: On-Chip Network Generator

OpenSoC Fabric: On-Chip Network Generator OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation

More information

SAN Conceptual and Design Basics

SAN Conceptual and Design Basics TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer

More information

CCNA R&S: Introduction to Networks. Chapter 5: Ethernet

CCNA R&S: Introduction to Networks. Chapter 5: Ethernet CCNA R&S: Introduction to Networks Chapter 5: Ethernet 5.0.1.1 Introduction The OSI physical layer provides the means to transport the bits that make up a data link layer frame across the network media.

More information

Network Architecture and Topology

Network Architecture and Topology 1. Introduction 2. Fundamentals and design principles 3. Network architecture and topology 4. Network control and signalling 5. Network components 5.1 links 5.2 switches and routers 6. End systems 7. End-to-end

More information

- Nishad Nerurkar. - Aniket Mhatre

- Nishad Nerurkar. - Aniket Mhatre - Nishad Nerurkar - Aniket Mhatre Single Chip Cloud Computer is a project developed by Intel. It was developed by Intel Lab Bangalore, Intel Lab America and Intel Lab Germany. It is part of a larger project,

More information

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School Introduction to Infiniband Hussein N. Harake, Performance U! Winter School Agenda Definition of Infiniband Features Hardware Facts Layers OFED Stack OpenSM Tools and Utilities Topologies Infiniband Roadmap

More information

OC By Arsene Fansi T. POLIMI 2008 1

OC By Arsene Fansi T. POLIMI 2008 1 IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5

More information

Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim Department of Computer Science and Engineering Texas A&M University

More information

Latency Considerations for 10GBase-T PHYs

Latency Considerations for 10GBase-T PHYs Latency Considerations for PHYs Shimon Muller Sun Microsystems, Inc. March 16, 2004 Orlando, FL Outline Introduction Issues and non-issues PHY Latency in The Big Picture Observations Summary and Recommendations

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Packetization and routing analysis of on-chip multiprocessor networks

Packetization and routing analysis of on-chip multiprocessor networks Journal of Systems Architecture 50 (2004) 81 104 www.elsevier.com/locate/sysarc Packetization and routing analysis of on-chip multiprocessor networks Terry Tao Ye a, *, Luca Benini b, Giovanni De Micheli

More information

Interconnection Networks

Interconnection Networks CMPT765/408 08-1 Interconnection Networks Qianping Gu 1 Interconnection Networks The note is mainly based on Chapters 1, 2, and 4 of Interconnection Networks, An Engineering Approach by J. Duato, S. Yalamanchili,

More information

SeaMicro SM10000-64 Server

SeaMicro SM10000-64 Server SeaMicro SM10000-64 Server Building Datacenter Servers Using Cell Phone Chips Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Dhiraj Mallick, Jim Bauman, Sundar

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

How PCI Express Works (by Tracy V. Wilson)

How PCI Express Works (by Tracy V. Wilson) 1 How PCI Express Works (by Tracy V. Wilson) http://computer.howstuffworks.com/pci-express.htm Peripheral Component Interconnect (PCI) slots are such an integral part of a computer's architecture that

More information

Infrastructure Components: Hub & Repeater. Network Infrastructure. Switch: Realization. Infrastructure Components: Switch

Infrastructure Components: Hub & Repeater. Network Infrastructure. Switch: Realization. Infrastructure Components: Switch Network Infrastructure or building computer networks more complex than e.g. a short bus, some additional components are needed. They can be arranged hierarchically regarding their functionality: Repeater

More information

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

Clocking. Figure by MIT OCW. 6.884 - Spring 2005 2/18/05 L06 Clocks 1

Clocking. Figure by MIT OCW. 6.884 - Spring 2005 2/18/05 L06 Clocks 1 ing Figure by MIT OCW. 6.884 - Spring 2005 2/18/05 L06 s 1 Why s and Storage Elements? Inputs Combinational Logic Outputs Want to reuse combinational logic from cycle to cycle 6.884 - Spring 2005 2/18/05

More information

Driving force. What future software needs. Potential research topics

Driving force. What future software needs. Potential research topics Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #

More information

ICRI-CI Retreat Architecture track

ICRI-CI Retreat Architecture track ICRI-CI Retreat Architecture track Uri Weiser June 5 th 2015 - Funnel: Memory Traffic Reduction for Big Data & Machine Learning (Uri) - Accelerators for Big Data & Machine Learning (Ran) - Machine Learning

More information

Network Design. Yiannos Mylonas

Network Design. Yiannos Mylonas Network Design Yiannos Mylonas Physical Topologies There are two parts to the topology definition: the physical topology, which is the actual layout of the wire (media), and the logical topology, which

More information

Quality of Service (QoS) for Asynchronous On-Chip Networks

Quality of Service (QoS) for Asynchronous On-Chip Networks Quality of Service (QoS) for synchronous On-Chip Networks Tomaz Felicijan and Steve Furber Department of Computer Science The University of Manchester Oxford Road, Manchester, M13 9PL, UK {felicijt,sfurber}@cs.man.ac.uk

More information

Testing of Digital System-on- Chip (SoC)

Testing of Digital System-on- Chip (SoC) Testing of Digital System-on- Chip (SoC) 1 Outline of the Talk Introduction to system-on-chip (SoC) design Approaches to SoC design SoC test requirements and challenges Core test wrapper P1500 core test

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.

More information

Communicating with devices

Communicating with devices Introduction to I/O Where does the data for our CPU and memory come from or go to? Computers communicate with the outside world via I/O devices. Input devices supply computers with data to operate on.

More information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information Introduction Computer Network. Interconnected collection of autonomous computers that are able to exchange information No master/slave relationship between the computers in the network Data Communications.

More information

Designing HP SAN Networking Solutions

Designing HP SAN Networking Solutions Exam : HP0-J65 Title : Designing HP SAN Networking Solutions Version : Demo 1 / 6 1.To install additional switches, you must determine the ideal ISL ratio. Which ISL ratio range is recommended for less

More information

Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI. Article for InfoStor November 2003 Paul Griffith Adaptec, Inc.

Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI. Article for InfoStor November 2003 Paul Griffith Adaptec, Inc. Filename: SAS - PCI Express Bandwidth - Infostor v5.doc Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI Article for InfoStor November 2003 Paul Griffith Adaptec, Inc. Server

More information

Switched Interconnect for System-on-a-Chip Designs

Switched Interconnect for System-on-a-Chip Designs witched Interconnect for ystem-on-a-chip Designs Abstract Daniel iklund and Dake Liu Dept. of Physics and Measurement Technology Linköping University -581 83 Linköping {danwi,dake}@ifm.liu.se ith the increased

More information

Communication Networks. MAP-TELE 2011/12 José Ruela

Communication Networks. MAP-TELE 2011/12 José Ruela Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)

More information

InfiniBand Clustering

InfiniBand Clustering White Paper InfiniBand Clustering Delivering Better Price/Performance than Ethernet 1.0 Introduction High performance computing clusters typically utilize Clos networks, more commonly known as Fat Tree

More information

Efficient Built-In NoC Support for Gather Operations in Invalidation-Based Coherence Protocols

Efficient Built-In NoC Support for Gather Operations in Invalidation-Based Coherence Protocols Universitat Politècnica de València Master Thesis Efficient Built-In NoC Support for Gather Operations in Invalidation-Based Coherence Protocols Author: Mario Lodde Advisor: Prof. José Flich Cardo A thesis

More information

Storage at a Distance; Using RoCE as a WAN Transport

Storage at a Distance; Using RoCE as a WAN Transport Storage at a Distance; Using RoCE as a WAN Transport Paul Grun Chief Scientist, System Fabric Works, Inc. (503) 620-8757 pgrun@systemfabricworks.com Why Storage at a Distance the Storage Cloud Following

More information

Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2

Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2 Reconfigurable Architectures Chapter 3.2 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design Coarse-Grained Reconfigurable Devices Recall: 1. Brief Historically development (Estrin Fix-Plus

More information

The OSI & Internet layering models

The OSI & Internet layering models CSE 123 Computer Networks Fall 2009 Lecture 2: Protocols & Layering Today What s a protocol? Organizing protocols via layering Encoding layers in packets The OSI & Internet layering models The end-to-end

More information

PCI Express* Ethernet Networking

PCI Express* Ethernet Networking White Paper Intel PRO Network Adapters Network Performance Network Connectivity Express* Ethernet Networking Express*, a new third-generation input/output (I/O) standard, allows enhanced Ethernet network

More information

A Heuristic for Peak Power Constrained Design of Network-on-Chip (NoC) based Multimode Systems

A Heuristic for Peak Power Constrained Design of Network-on-Chip (NoC) based Multimode Systems A Heuristic for Peak Power Constrained Design of Network-on-Chip (NoC) based Multimode Systems Praveen Bhojwani, Rabi Mahapatra, un Jung Kim and Thomas Chen* Computer Science Department, Texas A&M University,

More information

Introduction to System-on-Chip

Introduction to System-on-Chip Introduction to System-on-Chip COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

OpenSPARC T1 Processor

OpenSPARC T1 Processor OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware

More information

Annotation to the assignments and the solution sheet. Note the following points

Annotation to the assignments and the solution sheet. Note the following points Computer rchitecture 2 / dvanced Computer rchitecture Seite: 1 nnotation to the assignments and the solution sheet This is a multiple choice examination, that means: Solution approaches are not assessed

More information

CHIP MULTIPROCESSOR COHERENCE AND INTERCONNECT SYSTEM DESIGN

CHIP MULTIPROCESSOR COHERENCE AND INTERCONNECT SYSTEM DESIGN CHIP MULTIPROCESSOR COHERENCE AND INTERCONNECT SYSTEM DESIGN by Natalie D. Enright Jerger A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical

More information

Peak Power Control for a QoS Capable On-Chip Network

Peak Power Control for a QoS Capable On-Chip Network Peak Power Control for a QoS Capable On-Chip Network Yuho Jin 1, Eun Jung Kim 1, Ki Hwan Yum 2 1 Texas A&M University 2 University of Texas at San Antonio {yuho,ejkim}@cs.tamu.edu yum@cs.utsa.edu Abstract

More information

SCSI vs. Fibre Channel White Paper

SCSI vs. Fibre Channel White Paper SCSI vs. Fibre Channel White Paper 08/27/99 SCSI vs. Fibre Channel Over the past decades, computer s industry has seen radical change in key components. Limitations in speed, bandwidth, and distance have

More information

Redundancy in enterprise storage networks using dual-domain SAS configurations

Redundancy in enterprise storage networks using dual-domain SAS configurations Redundancy in enterprise storage networks using dual-domain SAS configurations technology brief Abstract... 2 Introduction... 2 Why dual-domain SAS is important... 2 Single SAS domain... 3 Dual-domain

More information

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Chapter 2. Multiprocessors Interconnection Networks

Chapter 2. Multiprocessors Interconnection Networks Chapter 2 Multiprocessors Interconnection Networks 2.1 Taxonomy Interconnection Network Static Dynamic 1-D 2-D HC Bus-based Switch-based Single Multiple SS MS Crossbar 2.2 Bus-Based Dynamic Single Bus

More information