Asynchronous Bypass Channels



Similar documents
Use-it or Lose-it: Wearout and Lifetime in Future Chip-Multiprocessors

TRACKER: A Low Overhead Adaptive NoC Router with Load Balancing Selection Strategy

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

COMMUNICATION PERFORMANCE EVALUATION AND ANALYSIS OF A MESH SYSTEM AREA NETWORK FOR HIGH PERFORMANCE COMPUTERS

Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

vci_anoc_network Specifications & implementation for the SoClib platform

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Network Design

Lecture 18: Interconnection Networks. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Leveraging Torus Topology with Deadlock Recovery for Cost-Efficient On-Chip Network

Interconnection Networks

A Dynamic Link Allocation Router

CONTINUOUS scaling of CMOS technology makes it possible

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Efficient Built-In NoC Support for Gather Operations in Invalidation-Based Coherence Protocols

A Low-Radix and Low-Diameter 3D Interconnection Network Design

Interconnection Networks

Interconnection Network

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS

A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator

Packetization and routing analysis of on-chip multiprocessor networks

A Low Latency Router Supporting Adaptivity for On-Chip Interconnects

Communication Networks. MAP-TELE 2011/12 José Ruela

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Chapter 4. VoIP Metric based Traffic Engineering to Support the Service Quality over the Internet (Inter-domain IP network)

Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP)

Wide Area Networks. Learning Objectives. LAN and WAN. School of Business Eastern Illinois University. (Week 11, Thursday 3/22/2007)

Synthetic Traffic Models that Capture Cache Coherent Behaviour. Mario Badr

In-network Monitoring and Control Policy for DVFS of CMP Networkson-Chip and Last Level Caches

Topological Properties

Interconnection Networks

Quality of Service (QoS) for Asynchronous On-Chip Networks

Maximizing Server Storage Performance with PCI Express and Serial Attached SCSI. Article for InfoStor November 2003 Paul Griffith Adaptec, Inc.

A Heuristic for Peak Power Constrained Design of Network-on-Chip (NoC) based Multimode Systems

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip

Load Balancing Mechanisms in Data Center Networks

A Low-Latency Asynchronous Interconnection Network with Early Arbitration Resolution

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

Introduction to Parallel Computing. George Karypis Parallel Programming Platforms

Dynamic Congestion-Based Load Balanced Routing in Optical Burst-Switched Networks


Routing in packet-switching networks

Study Plan Masters of Science in Computer Engineering and Networks (Thesis Track)

Design and Verification of Nine port Network Router

Performance Analysis of Storage Area Network Switches

Solving Network Challenges

Scaling 10Gb/s Clustering at Wire-Speed

ΤΕΙ Κρήτης, Παράρτηµα Χανίων

DigiPoints Volume 1. Student Workbook. Module 4 Bandwidth Management

Switched Interconnect for System-on-a-Chip Designs

On-Chip Communication Architectures

Introduction to Quality of Service. Andrea Bianco Telecommunication Network Group

COMMUNICATION NETWORKS WITH LAYERED ARCHITECTURES. Gene Robinson E.A.Robinsson Consulting

Catnap: Energy Proportional Multiple Network-on-Chip

Why the Network Matters

Optimizing Configuration and Application Mapping for MPSoC Architectures

Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip

Smart Queue Scheduling for QoS Spring 2001 Final Report

Network management and QoS provisioning - QoS in the Internet

On-Chip Interconnection Networks Low-Power Interconnect

Distributed Explicit Partial Rerouting (DEPR) Scheme for Load Balancing in MPLS Networks

What is CSG150 about? Fundamentals of Computer Networking. Course Outline. Lecture 1 Outline. Guevara Noubir noubir@ccs.neu.

Low-Overhead Hard Real-time Aware Interconnect Network Router

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc

Faculty of Engineering Computer Engineering Department Islamic University of Gaza Network Chapter# 19 INTERNETWORK OPERATION

Ethernet. Ethernet Frame Structure. Ethernet Frame Structure (more) Ethernet: uses CSMA/CD

Protagonist International Journal of Management And Technology (PIJMT) Online ISSN Vol 2 No 3 (May-2015) Active Queue Management

Optical interconnection networks with time slot routing

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches

Preserving Message Integrity in Dynamic Process Migration

Virtual Private LAN Service (VPLS) Conformance and Performance Testing Sample Test Plans

LTE BACKHAUL REQUIREMENTS: A REALITY CHECK

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School

EFFECTS OF NoC ARCHITECTURAL PARAMETERS IN MPSoC PERFORMANCE

Analysis of QoS Routing Approach and the starvation`s evaluation in LAN

2 Basic Concepts. Contents

Three Key Design Considerations of IP Video Surveillance Systems

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

From Hypercubes to Dragonflies a short history of interconnect

Data Center Network Topologies: FatTree

QoS issues in Voice over IP

SAN Conceptual and Design Basics

CHAPTER 6. VOICE COMMUNICATION OVER HYBRID MANETs

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

How To Provide Qos Based Routing In The Internet

CS/ECE 438: Communication Networks. Internet QoS. Syed Faisal Hasan, PhD (Research Scholar Information Trust Institute) Visiting Lecturer ECE

COMPUTER NETWORKS - LAN Interconnection

PART III. OPS-based wide area networks

TDT 4260 lecture 11 spring semester Interconnection network continued

Chapter 2. Multiprocessors Interconnection Networks

DESIGN AND VERIFICATION OF LSR OF THE MPLS NETWORK USING VHDL

Joint ITU-T/IEEE Workshop on Carrier-class Ethernet

IRMA: Integrated Routing and MAC Scheduling in Multihop Wireless Mesh Networks

A First Approach to Provide QoS in Advanced Switching

Lecture 2.1 : The Distributed Bellman-Ford Algorithm. Lecture 2.2 : The Destination Sequenced Distance Vector (DSDV) protocol

Network Instruments white paper

Transcription:

Asynchronous Bypass Channels Improving Performance for Multi-Synchronous NoCs T. Jain, P. Gratz, A. Sprintson, G. Choi, Department of Electrical and Computer Engineering, Texas A&M University, USA

Table of Contents Motivation Architecture and Operation Topology and Routing Algorithm Evaluation Conclusion and Future Work

Motivation Feature size is decreasing and number of cores on a chip is increasing. Interconnect latency forms a major portion of the total latency seen by an application.

Motivation

Objectives of ABC Offer asynchronous bypass channels (ABCs) at the intermediate nodes to avoid Synchronization delay Latching delay Explore network topologies and routing algorithms to complement the ABC router. Offer a skew tolerant solution for multisynchronous networks.

Table of Contents Motivation Architecture and Operation Topology and Routing Algorithm Evaluation Conclusion and Future Work

ABC Network Architecture Clocking scheme Mesochronous Source-synchronous Routing algorithm Unicast Deterministic Source coded Flow control Wormhole On/Off

ABC Network Architecture 5 input and output ports Paths connecting the opposite cardinal directions are referred to as straight-paths All other paths are known as turnpaths. - + PE + - + ROUTER - - + NODE

Router Micro-Architecture Straight path : ABC path Pair of bi-synchronous FIFO and a synchronous FIFO. Turn path: bisynchronous FIFOs. A clock signal is also transmitted along with the data.

Router Operation ABC state Incoming flits utilize the ABC path. In this state a flit faces only the link delay and the multiplexer overhead. (Assumed to be 3/4 th of a clock cycle). The incoming clock is retained and transmitted with the data. Flits are also latched into the synchronous FIFO. FIFO state Incase of late arriving control flow from downstream Flushed away when not needed Flits are read out from one of the bisynchronous FIFOs. The flit faces a delay of 2 cycles to appear at the output of the bisynchronous FIFO. Same as in traditional synchronizing router designs. The local clock at the node is transmitted with the data.

Router Operation

Table of Contents Motivation Architecture and Operation Topology and Routing Algorithm Evaluation Conclusion and Future Work

Topology Cost of turn : 3 cycles. Cost of ABC : Link delay + Multiplexer overhead assumed.75 clock cycles. ABC routers favor straight paths over turns. Need to search for complementary topologies.

Topology Observations: Reduce the number of turns to improve performance 1-D topology? Removes all turns Hop count does not scale well with node count >2-D topology? High impact on resources required by ABC router (number of FIFOs grows exponentially w/ dimension above 2) 2-D topology? 2-D topologies a good fit for planar silicon implementation 2-D mesh requires a turn for most source and destination pairs

2-D Topologies Number of router ports required must be five. It must consist of only two chains. Each chain must pass through all the nodes. Chains must be ordered. Connection between neighboring nodes only. Overall maximum bisection bandwidth must be equivalent to a standard mesh topology.

2-D Topologies SERPENTINE HOOK H-SHAPE SICKLE

Routing Algorithm A modified version of standard XY dimension order routing (DOR). Routing algorithm selects the path with the least delay in terms of clock cycles with a bias towards the straight path. Route encoded at source node. Chains are ordered to avoid deadlocks S D

Table of Contents Motivation Architecture and Operation Topology and Routing Algorithm Evaluation Conclusion and Future Work

Methodology Baseline Network 7x7, 2-D mesh network XY DOR routing Router Two VCs per port 8-flit-deep Packets synchronized at every hop 3 cycle penalty ABC Network 7x7, Serpentine network ABC routing Router 8 bi-synch 6-flit-deep 4 synch FIFOs 3-flit-deep Packet length was varied randomly between two to five flits. Fully Synthesizable Verilog model used for simulations.

Methodology Synthetic load Traffic Generated: Uniform Random Bit Complement Transpose 1000 cycles of warm-up 5000 packets monitored Realistic Workload SPLASH-2 suite of benchmarks obtained from a forty-nine node, shared memory CMP system simulator 10,000 cycles of warm-up 500,000 cycles from the middle of each trace

Evaluation (Synthetic Traffic) Uniform Random Lower no-load latency. Higher saturation throughput. Kink at 15% injection rate.

Evaluation (Synthetic Traffic) Bit Complement Lower no-load latency. Higher saturation throughput. Kink at 10% injection rate. Better performance because of better utilization of ABC.

Evaluation (Synthetic Traffic) Transpose Higher throughput because of higher bisection bandwidth. Marginal improvement in no load latency.

RELATIVE LATENCY Evaluation (Realistic Traffic) 1.2 1 ABC BASELINE 0.8 0.6 0.4 0.2 0 SPLASH-2 SUITE

Table of Contents Motivation Architecture and Operation Topology and Routing Algorithm Evaluation Conclusion and Future Work

Conclusion Introduced Asynchronous Bypass channel router. Avoids synchronization and latching at each hop required in multi-synchronous designs. Lower per-hop latency New topology and routing algorithm to match ABC's strengths. Lower number of hops Provides improved performance versus baseline for all workloads examined.

Future Work Lower latency solutions for turn paths Optimal topologies for ABC enhanced routers Implementing algorithms based on dynamic congestion information for Routing Polymorphic Topologies Switch between the FIFO and ABC

Thank You

NUMBER OF CLOCK CYCLES Evaluation (Topologies) 25 20 15 AVERAGE WORST 10 5 0 TOPOLOGY