Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng





Why Architectural Low-Power Design?
- High-speed, large-volume communication among different parts of a chip
- Problem: power consumption
- Many methods: physical level, circuit level, system level
- One important aspect: architectural level

Outline
1. Introduction
2. Energy-Efficiency for Interconnection
3. Power Consumption of Switching and Routing
4. Example: MAIA
5. Summary


NoC Architecture
- Shared medium networks
- Single bus network
- Multiple bus network
- Direct and indirect networks
- Switch and route infrastructure
- Hybrid network
- Hierarchical and heterogeneous architecture

Hierarchical and Heterogeneous Architecture
- Locality reduces the cost of global connections
- Wiring channels are created along the sides of each module
- Switchbox
- Generalized mesh structure


Architectural Effect on Power Consumption
- 50% of power consumption comes from interconnect wires*
- Short-range local communication: fast and power efficient
- Long-range global communication: complex protocol, slow and power inefficient
* D. L. Liu and C. Svensson, "Power consumption estimation in CMOS VLSI chips," IEEE Journal of Solid-State Circuits, June 1994.

Power Dissipation of Bus Structure
- Poor energy efficiency because each data transfer is broadcast
- The load capacitance of the entire bus has to be driven during each data transfer
- $P = \frac{1}{2} C f V^2$, where $C$ is the load capacitance
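The bus power formula above can be evaluated directly. The following is a minimal sketch, assuming f is the switching frequency and V the supply voltage (the slide labels only C); the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def dynamic_power(c_load, freq, vdd):
    """Dynamic power P = 1/2 * C * f * V^2, in watts.

    c_load: load capacitance in farads
    freq:   switching frequency in hertz (assumed meaning of f)
    vdd:    supply voltage in volts (assumed meaning of V)
    """
    return 0.5 * c_load * freq * vdd ** 2


# Hypothetical example: a 10 pF bus switching at 100 MHz from a 1 V supply.
p_bus = dynamic_power(10e-12, 100e6, 1.0)
print(f"bus power: {p_bus * 1e3:.3f} mW")
```

Because the whole bus capacitance is driven on every transfer, the power grows linearly with the total load even when only one receiver needs the data.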

Bus Splitting
- Split the bus into multiple segments
- Data transfers proceed in parallel locally
- High throughput and low energy consumption
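A rough way to see why splitting helps is to compare the capacitance driven per transfer. The sketch below assumes equal-sized segments, that a local transfer drives one segment only, and that a global transfer still drives the whole bus; these assumptions, the function name, and the numbers are illustrative and not taken from the slides.

```python
def transfer_energy(c_bus, vdd, segments=1, local_fraction=0.0):
    """Average energy per transfer (joules) on a bus split into equal segments.

    Assumptions (hypothetical model):
      - a local transfer drives a single segment of capacitance c_bus / segments
      - a global transfer drives the entire bus capacitance c_bus
      - local_fraction is the share of transfers that stay inside one segment
    """
    c_local = c_bus / segments
    c_avg = local_fraction * c_local + (1.0 - local_fraction) * c_bus
    return 0.5 * c_avg * vdd ** 2


unsplit = transfer_energy(10e-12, 1.0)
split = transfer_energy(10e-12, 1.0, segments=4, local_fraction=0.8)
print(f"energy ratio split/unsplit: {split / unsplit:.2f}")
```

Under this model the saving depends entirely on how much traffic is local, which is why splitting is usually combined with placing communicating modules on the same segment.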

Clustering
- Partition the modules into clusters of tightly connected components
- Elements (cores) share physical communication channels
- More power efficient for intra-cluster communication
- Balances the communication load
- Hierarchical generalized mesh

GALS - Globally Asynchronous Locally Synchronous
- Partition the system into an optimal number and size of synchronous blocks
- Blocks communicate asynchronously
- Clock frequency
- Power consumption


Switch and Router
- Banyan switch
- Router-based architecture

Power Modeling for Switching (1)
The power consumption of switch fabrics comes from three different sources:
1. Internal node switches
2. Internal buffer queues
3. Interconnect wires

Power Modeling for Switching (2)
Bit energy $E_{bit}$: the energy consumed for each bit when the bit is transported inside the switch fabric from ingress ports to egress ports
1. Internal node switches: $E_{S_{bit}}$
2. Internal buffer queues: $E_{B_{bit}}$
3. Interconnect wires: $E_{W_{bit}}$
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.
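The three bit-energy components can be combined per path. The sketch below assumes a simple additive model in which a bit pays the node switch energy once per switch traversed, the buffer energy once per buffering event, and the wire energy once per unit of wire length; the function and parameter names and the numbers are hypothetical.

```python
def path_bit_energy(n_switches, e_sbit, n_buffers, e_bbit, wire_units, e_wbit):
    """Energy to move one bit from ingress to egress (same unit as the inputs).

    Assumed additive model:
      n_switches * e_sbit : node switches traversed
      n_buffers  * e_bbit : times the bit is written to and read from a buffer
      wire_units * e_wbit : wire length traversed, in unit lengths
    """
    return n_switches * e_sbit + n_buffers * e_bbit + wire_units * e_wbit


# Hypothetical numbers in arbitrary energy units.
no_contention = path_bit_energy(3, 2.0, 0, 10.0, 8, 0.5)
with_buffering = path_bit_energy(3, 2.0, 1, 10.0, 8, 0.5)
print(no_contention, with_buffering)
```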

Node Switch Power Consumption $E_{S_{bit}}$
- Packets travel from one stage to the next stage
- Header data path and payload data path
- $E_{S_{bit}}$ is input-state-dependent
[Figure: input vector of 0s and 1s applied to a switch]
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.

Internal Buffer Power Consumption $E_{B_{bit}}$
- Destination contention: two or more packets at the ingress ports request the same destination port at the same time
- Interconnect contention: the same interconnect link may be shared by packets with different destinations
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.
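Destination contention as defined above can be detected by grouping the pending requests by egress port. This is a minimal sketch; the request representation (a mapping from ingress port to requested egress port) and the function name are assumptions made for illustration.

```python
from collections import Counter


def destination_contention(requests):
    """Find egress ports requested by more than one ingress port.

    requests: dict mapping ingress port -> requested egress port
              (hypothetical representation of one arbitration cycle)
    Returns a dict mapping each contested egress port to the sorted
    list of ingress ports competing for it.
    """
    counts = Counter(requests.values())
    return {
        out: sorted(src for src, dst in requests.items() if dst == out)
        for out, n in counts.items()
        if n > 1
    }


# Ingress ports 0 and 1 both want egress port 2; port 2 wants egress 3.
print(destination_contention({0: 2, 1: 2, 2: 3}))
```

Every contested port implies that all but one of the competing packets must wait, either in a buffer or at the ingress under arbiter control.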

Interconnect Wires Power Consumption $E_{W_{bit}}$
- The signal on the wire toggles between logic 0 and logic 1
- $C_{wire}$: wire capacitance
- $C_{input}$: total capacitance of input gates
- $C_W = C_{wire} + C_{input}$: total load capacitance
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.
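Since wire energy is spent only when the signal toggles, it can be estimated by counting transitions in the bit stream. The sketch below assumes a rail-to-rail swing and the standard CMOS switching energy of 1/2 C V^2 per transition; that per-toggle expression, the function name, and the values are assumptions, not formulas taken from the slide.

```python
def wire_energy(bits, c_wire, c_input, vdd, initial=0):
    """Energy (joules) dissipated on one wire while sending a bit sequence.

    Assumption: each 0->1 or 1->0 transition costs 1/2 * C_W * V^2,
    with C_W = c_wire + c_input (total load capacitance).
    initial is the logic level on the wire before the first bit.
    """
    c_w = c_wire + c_input
    toggles = 0
    prev = initial
    for b in bits:
        if b != prev:
            toggles += 1
        prev = b
    return toggles * 0.5 * c_w * vdd ** 2


# Hypothetical wire: 80 fF of wire plus 20 fF of gate load at 1 V.
e = wire_energy([0, 1, 1, 0, 1], 80e-15, 20e-15, 1.0)
print(f"{e * 1e15:.1f} fJ")
```

A stream of identical bits costs nothing under this model, which is why the bit energy of a wire is usually quoted as an average over the expected toggle rate.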

Interconnect Wire Length Estimation: Thompson Model
- Based on a graph embedding process: graph $G(V_G, E_G)$ is embedded into graph $H(V_H, E_H)$, where $V$ denotes vertices and $E$ denotes edges
- Each vertex in $G$ is mapped into a $d \times d$ square of vertices in $H$, where $d$ is the degree of the vertex (for example, $d = 4$ gives a $4 \times 4$ square)
- Each edge in $G$ is mapped into one or more edges of $H$
- Find the minimum number of columns $p_{min}$ and rows $q_{min}$ in $H$

Switch Architecture
Purpose: compare the power consumption of different switch architectures
A. Crossbar Switch
B. Fully Connected Network
C. Banyan Network
D. Batcher-Banyan Network

A. Crossbar Switch
- Any of the N input ports can be connected to any of the N output ports by a node switch
- Free of interconnect contention
- Free of destination contention (resolved by an arbiter, no buffer needed)
- Bit energy of a Thompson grid wire
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.

B. Fully Connected Network
- An arbiter controls the MUXes to direct the switch path
- Free of interconnect contention and destination contention
- Power consumption and complexity scale up with the number of inputs N
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.

C. Banyan Network
- Butterfly topology: $N = 2^n$ inputs and $N = 2^n$ outputs
- Stage $i$ checks the $i$-th bit of the packet's destination address (self-routing switch fabric)
- Buffers are used to handle the interconnect contention problem
- Bit energy of a Thompson grid wire
- $q_i = 1$: contention occurs; $q_i = 0$: no contention
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.
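The self-routing rule, where each stage inspects one destination bit, can be sketched in a few lines. The example below assumes an Omega (shuffle-exchange) wiring, a banyan-class network that uses the same destination-tag routing, in place of the butterfly drawing on the slide; it also assumes bits are examined most significant first. Function names are hypothetical.

```python
def route(src, dst, n):
    """Link occupied by a packet after each of the n stages.

    Assumed wiring: Omega network with N = 2**n ports. Before each stage
    the links are perfect-shuffled (address rotated left by one bit), then
    the 2x2 switch picks its upper (0) or lower (1) output from the
    stage's destination bit, most significant bit first.
    """
    mask = (1 << n) - 1
    pos = src
    path = []
    for i in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & mask  # perfect shuffle
        bit = (dst >> (n - 1 - i)) & 1                # i-th destination bit
        pos = (pos & ~1) | bit                        # switch output select
        path.append(pos)
    return path


def contended_links(packets, n):
    """Links requested by more than one packet in the same cycle.

    packets: list of (src, dst) pairs.
    Returns {(stage, link): [packet indices]} for every shared link.
    """
    usage = {}
    for idx, (src, dst) in enumerate(packets):
        for stage, link in enumerate(route(src, dst, n)):
            usage.setdefault((stage, link), []).append(idx)
    return {key: ids for key, ids in usage.items() if len(ids) > 1}


print(route(5, 2, 3))                         # last entry equals dst
print(contended_links([(0, 0), (4, 1)], 3))   # different dsts, shared links
```

The second call shows interconnect contention: the two packets have different destinations but still need the same internal links in the first two stages, so one of them would have to be buffered.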

D. Batcher-Banyan Network
- Similar to the Banyan network
- Solves the interconnect contention problem
- A sorting network is added, so each input-output connection has its own dedicated path
- Bit energy of the sorting switches
* T. T. Ye, L. Benini, and G. De Micheli, "Analysis of power consumption on switch fabrics in network routers," Design Automation Conference, 2002.

Analysis of the Switch Network Architecture
I. The fully connected switch has the lowest power consumption, but a large implementation area and less flexibility.
II. Interconnect contention has a dramatic impact on the power consumption of the Banyan switch because of the buffering it requires.
III. Interconnect wires dominate the power consumption when the number of switch ports is large.


Voice Coding Chip - MAIA
- Programmable microprocessor
- Heterogeneous computing elements (satellites)
- Two-level hierarchical mesh structure with a reconfigurable interconnect network
[Figure: architecture of the MAIA voice coding chip]

Component Description
- Embedded microprocessor: power- and performance-optimized ARM8 core
- Programmable ASIC elements: dual-stage pipelined MAC (multiply-accumulate) and ALU
- Embedded FPGA: logic block, low-swing circuits
- Interconnect architecture
- Clock distribution

Reconfigurable Interconnect Architecture
- Reconfiguration model
- The bars (C1, C2, etc.) between two reconfiguration times (t0->t1, t1->t2) represent a set of inter-satellite connections realized simultaneously by the reconfigurable interconnect.

Communication Interface Description
- Inter-satellite communication interface: a two-phase self-timed handshaking scheme, realized in a globally asynchronous, locally synchronous (GALS) fashion
- Communication interface between microprocessor and satellites: synchronization and communication between the synchronous ARM8 core and the asynchronous reconfigurable data paths, using an interface control unit
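Two-phase handshaking means that every transition on a request line, rising or falling, announces new data, and every transition on an acknowledge line confirms receipt. The following is a behavioral sketch of that general scheme only; the signal names req and ack and the class itself are assumptions and do not describe the actual MAIA interface circuits.

```python
class TwoPhaseChannel:
    """Behavioral model of a two-phase (transition-signalling) handshake.

    Assumed signals: req toggled by the sender, ack toggled by the
    receiver. The channel is idle when req == ack and holds unread
    data when req != ack.
    """

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def sender_ready(self):
        # The previous item has been acknowledged, so a new one may be sent.
        return self.req == self.ack

    def send(self, value):
        if not self.sender_ready():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value
        self.req ^= 1  # one transition on req signals valid data

    def receive(self):
        if self.req == self.ack:
            return None  # nothing pending
        value = self.data
        self.ack ^= 1  # one transition on ack completes the transfer
        return value


ch = TwoPhaseChannel()
ch.send(7)
print(ch.receive(), ch.sender_ready())
```

Each transfer costs one transition per control wire and no wire has to return to zero, which is one reason a two-phase scheme suits self-timed links between independently clocked blocks.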

Hierarchical Generalized Mesh Interconnect Network
- Four clusters of tightly connected modules
- Each cluster has a local mesh for intra-cluster connections
- Interface ports for inter-cluster connections act as hierarchical switch boxes
[Figure: hierarchical 2-level generalized mesh architecture (layout); labels: interface for microprocessor and satellites, inter-cluster communication, switch box, intra-cluster communication]

Result
- Model-to-model energy of different architectures
- Manhattan distance: the shortest distance between two points measured along the X and Y axes
* H. Zhang, "A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications," IEEE Journal of Solid-State Circuits, Vol. 35, Nov. 2000.
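The Manhattan distance used in the result can be computed as below. This is a small sketch; the coordinate representation (x, y pairs on a placement grid) and the function name are assumptions.

```python
def manhattan_distance(a, b):
    """Distance between points a and b measured along the X and Y axes.

    a, b: (x, y) coordinates, for example module positions on a grid
    (hypothetical representation).
    """
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


print(manhattan_distance((0, 0), (3, 4)))
```

Wires on a chip are routed along horizontal and vertical tracks, so this distance is a natural lower bound on the wire length between two modules.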


Summary
1. Bus-based network: poor power efficiency and limited throughput, but simple and economical
2. Switch and route: high performance and better power efficiency, but complex and variable
3. Power dissipation in buffers increases sharply as throughput increases in switch architectures
4. Interconnect wires dominate the power consumption when the number of switch ports is large
5. MAIA, a voice coding chip implemented with a hierarchical and heterogeneous architecture, is much more energy efficient

Thank you for your presence! QUESTIONS