Pipeline Processors. MCA-304 : Advanced Computer Architecture by Dr. Sumit Mittal

Pipeline Processors

Reservation Table
Each functional evaluation of a pipeline can be represented by a diagram called a reservation table (RT). It is the space-time diagram of the pipeline for one functional evaluation.
- X axis: time units (clock cycles)
- Y axis: pipeline stages
- The number of columns is the evaluation time of the given function.
- Multiple checkmarks in a row mean that the same stage is used repeatedly in different cycles.

Reservation Table
For the first sequence, Sa, Sb, Sc, Sb, Sc, Sa, called function A, we have:

Stage  0  1  2  3  4  5
Sa     A  -  -  -  -  A
Sb     -  A  -  A  -  -
Sc     -  -  A  -  A  -

Reservation Table
For the second sequence, Sa, Sc, Sb, Sa, Sb, Sc, called function B, we have:

Stage  0  1  2  3  4  5
Sa     B  -  -  B  -  -
Sb     -  -  B  -  B  -
Sc     -  B  -  -  -  B
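The two tables can be generated directly from their stage sequences. The following is a minimal sketch, assuming each function uses exactly one stage per cycle (which holds for functions A and B, but not for reservation tables in general); the names `reservation_table` and `render` are illustrative only.

```python
def reservation_table(sequence):
    """Map each stage name to the list of cycles in which it is used."""
    table = {}
    for cycle, stage in enumerate(sequence):
        table.setdefault(stage, []).append(cycle)
    return table


def render(table, n_cycles):
    """Return the table as text: one row per stage, 'X' for a used cycle."""
    rows = ["    " + " ".join(str(c) for c in range(n_cycles))]
    for stage in sorted(table):
        marks = " ".join("X" if c in table[stage] else "-" for c in range(n_cycles))
        rows.append(f"{stage:<4}{marks}")
    return "\n".join(rows)


# Stage sequences as given on the slides.
FUNCTION_A = ["Sa", "Sb", "Sc", "Sb", "Sc", "Sa"]
FUNCTION_B = ["Sa", "Sc", "Sb", "Sa", "Sb", "Sc"]

if __name__ == "__main__":
    print(render(reservation_table(FUNCTION_A), len(FUNCTION_A)))
    print()
    print(render(reservation_table(FUNCTION_B), len(FUNCTION_B)))
```

Storing each row as a list of cycle numbers, rather than a grid of marks, keeps later calculations on the table down to simple arithmetic on those numbers.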

3-Stage Non-Linear Pipeline
[Figure: a three-stage pipeline (Sa, Sb, Sc) with one input and two outputs, Output A and Output B, shown next to its reservation table for cycles 0 to 5.]

Function A
[Figure sequence: the reservation table for function A is filled in one cycle at a time, following the stage order Sa, Sb, Sc, Sb, Sc, Sa.]

Function B
[Figure sequence: the reservation table for function B is filled in one cycle at a time, following the stage order Sa, Sc, Sb, Sa, Sb, Sc.]

Latency Analysis
Latency: the number of time units (clock cycles) between two initiations of a pipeline. A latency of k means that the two initiations are separated by k clock cycles.
Collision: an attempt by two or more initiations to use the same pipeline stage at the same time. A collision implies a resource conflict between two initiations in the pipeline.
Some latencies cause collisions and some do not. Latencies that cause a collision are called forbidden latencies; latencies that do not are called permissible latencies.
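These definitions can be checked mechanically. The sketch below assumes the reservation table of function A (Sa in cycles 0 and 5, Sb in cycles 1 and 3, Sc in cycles 2 and 4) and tests whether a second initiation started k cycles after the first would need a stage in a cycle where the first is already using it. The helper name `collides` is illustrative.

```python
# Reservation table of function A: stage -> cycles in which it is used.
TABLE_A = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}


def collides(table, k):
    """True if an initiation k cycles after the first needs a busy stage."""
    for cycles in table.values():
        first = set(cycles)                 # cycles used by the first initiation
        second = {c + k for c in cycles}    # same pattern, delayed by k
        if first & second:
            return True
    return False


if __name__ == "__main__":
    for k in range(1, 7):
        kind = "forbidden" if collides(TABLE_A, k) else "permissible"
        print(f"latency {k}: {kind}")
```

For function A this classifies latencies 2 and 5 as forbidden and latencies 1, 3, 4, and 6 as permissible.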

Latency Cycles
A latency sequence is a sequence of permissible (non-forbidden) latencies between successive task initiations. A latency cycle is a latency sequence that repeats the same subsequence indefinitely.

Latency Cycle
[Figure: a timeline of clock cycles 1 to 18 showing the slots occupied by successive initiations x1, x2, and x3, with each repetition of the pattern labelled "Cycle".]
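A latency cycle can be tried out by simulation. The sketch below assumes the reservation table of function A (Sa in cycles 0 and 5, Sb in cycles 1 and 3, Sc in cycles 2 and 4), starts tasks with a repeating list of latencies, and reports the first stage and cycle that two tasks both claim. The names `first_collision` and `average_latency`, and the limit of 50 initiations, are illustrative choices.

```python
from itertools import cycle, islice

# Reservation table of function A: stage -> cycles in which it is used.
TABLE_A = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}


def first_collision(table, latency_cycle, n_initiations=50):
    """Start tasks using the repeating latencies; return the first
    (stage, cycle) slot claimed twice, or None if no clash is seen."""
    busy = set()   # (stage, absolute cycle) slots already reserved
    start = 0      # start time of the current initiation
    for latency in islice(cycle(latency_cycle), n_initiations):
        for stage, offsets in table.items():
            for offset in offsets:
                slot = (stage, start + offset)
                if slot in busy:
                    return slot
                busy.add(slot)
        start += latency
    return None


def average_latency(latency_cycle):
    """Mean number of cycles between initiations over one repetition."""
    return sum(latency_cycle) / len(latency_cycle)


if __name__ == "__main__":
    for candidate in [(3,), (1, 6), (1,)]:
        print(candidate, first_collision(TABLE_A, candidate), average_latency(candidate))
```

Under these assumptions the cycles (3) and (1, 6) run without collision, while repeating latency 1 fails at stage Sb: the first and third tasks start two cycles apart, and a latency of 2 is forbidden for function A. A latency that is safe between two isolated initiations can therefore still clash once earlier initiations are in flight.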

Example
[Figure: a three-stage pipeline (S1, S2, S3) that evaluates two functions, X and Y.]

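Reservation Tables for X and Y

Function X:
Stage  1  2  3  4  5  6  7  8
S1     X  -  -  -  -  X  -  X
S2     -  X  -  X  -  -  -  -
S3     -  -  X  -  X  -  X  -

Function Y:
Stage  1  2  3  4  5  6
S1     Y  -  -  -  Y  -
S2     -  -  Y  -  -  -
S3     -  Y  -  Y  -  Y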

Forbidden Latencies
To detect the forbidden latencies, check the distance between every pair of checkmarks in the same row of the reservation table. Each such distance is a forbidden latency.
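The row-distance rule translates directly into code. In the sketch below, the mark positions for X and Y are an assumption: they are one layout with eight marks for X and six for Y that reproduces the forbidden latencies reported on the collision vector slides. The function name `forbidden_latencies` is illustrative.

```python
from itertools import combinations

# Assumed layouts: stage -> 1-based columns that hold a mark.
TABLE_X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
TABLE_Y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}


def forbidden_latencies(table):
    """Collect the distance between every pair of marks in the same row."""
    forbidden = set()
    for columns in table.values():
        for earlier, later in combinations(sorted(columns), 2):
            forbidden.add(later - earlier)
    return sorted(forbidden)


if __name__ == "__main__":
    print("X:", forbidden_latencies(TABLE_X))
    print("Y:", forbidden_latencies(TABLE_Y))
```

A row with a single mark, such as S2 in the assumed table for Y, contributes nothing, because one use of a stage cannot conflict with a delayed copy of itself at any non-zero latency.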

X after X
[Figure sequence: a second initiation X2 is overlaid on the first initiation X1 at latencies 2, 5, 4, and 7. In each overlay X1 and X2 claim the same stage in the same cycle, so these latencies are forbidden.]

Collision Vector
The collision vector records the combined set of permissible and forbidden latencies.
Forbidden latencies for X: 2, 4, 5, 7
Collision vector $C = (C_m, C_{m-1}, \ldots, C_2, C_1)$, where $m \le n - 1$ and $n$ is the number of columns in the reservation table.
$C_i = 1$ if latency $i$ causes a collision; $C_i = 0$ if latency $i$ is permissible.
Collision vector for X = 1 0 1 1 0 1 0
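Building the vector from the forbidden latencies is a one-line mapping. In this sketch the function name `collision_vector` is illustrative, and m defaults to the largest forbidden latency, which is an assumption about how the vector length is chosen.

```python
def collision_vector(forbidden, m=None):
    """Return the bit string C_m ... C_1 for a set of forbidden latencies."""
    if m is None:
        m = max(forbidden)   # assumed length: the largest forbidden latency
    # Walk from latency m down to 1 so that C_1 ends up as the rightmost bit.
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


if __name__ == "__main__":
    print(collision_vector({2, 4, 5, 7}))   # forbidden latencies of X
```

Reading the result from the right, bit i is 1 exactly when latency i is forbidden, so the rightmost 0 corresponds to the permissible latency 1.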

Y after Y
[Figure sequence: a second initiation of Y is overlaid on the first to find the latencies at which both claim the same stage in the same cycle.]

Collision Vector
Forbidden latencies for Y: 2, 4
Collision vector $C = (C_m, C_{m-1}, \ldots, C_2, C_1)$, where $m \le n - 1$ and $n$ is the number of columns in the reservation table.
$C_i = 1$ if latency $i$ causes a collision, and $C_i = 0$ otherwise.
Collision vector for Y = 1 0 1 0

Collision Vector
[Figure: a reservation table with six marks (mark positions not preserved), followed by the prompt C = (? ? ... ? ?).]

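Exercise
Find the collision vector for a reservation table with columns 1 to 7 and stages A, B, C, and D.
[Table: stage A has three marks, stages B and C have two marks each, and stage D has one mark; the column positions of the marks are not preserved.]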

State Diagram
A state diagram specifies the permissible transitions among successive initiations. The collision vector that corresponds to the initial state of the pipeline at time 1 is called the initial collision vector (ICV).

State Diagram
The next state of the pipeline at time t + p can be obtained using a right-shift register:
- The initial collision vector is loaded into the register.
- The register is then shifted to the right, one bit per cycle, while a logical 0 enters from the left end.
- If a 0 emerges from the right end on the p-th shift, p is a permissible latency; if a 1 emerges, p is a forbidden latency.
- The next state after p shifts is obtained by bitwise ORing the initial collision vector with the shifted register contents.

Right Shift Register
The next state can be obtained with the help of an m-bit right-shift register. Each 1-bit shift corresponds to an increase in the latency by 1. The bit that emerges indicates whether that latency can be used:
1: collision
0: safe to allow an initiation

The Next State
The next state is obtained by bitwise ORing the initial collision vector with the shifted register contents. For the first state, C.V. = 1 0 1 1 0 1 0, and a latency of 1:

  0 1 0 1 1 0 1   C.V. right-shifted by 1 bit
  1 0 1 1 0 1 0   initial C.V.
  -------------   OR
  1 1 1 1 1 1 1   next state
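The shift-and-OR step maps onto integer bit operations. This is a minimal sketch that stores a collision vector as an integer whose least significant bit is C_1; the function name `next_state` and the choice to raise an error for a forbidden latency are illustrative.

```python
ICV = 0b1011010   # initial collision vector of X, written C_7 ... C_1


def next_state(state, p, initial):
    """State reached when a new initiation is made p cycles later.

    Vectors are integers whose least significant bit is C_1.
    """
    # The bit that leaves the register on the p-th shift is C_p.
    if (state >> (p - 1)) & 1:
        raise ValueError(f"latency {p} is forbidden in state {state:07b}")
    # Zeros enter from the left; then OR in the initial collision vector.
    return (state >> p) | initial


if __name__ == "__main__":
    print(format(next_state(ICV, 1, ICV), "07b"))   # the worked example: latency 1
    print(format(next_state(ICV, 3, ICV), "07b"))   # latency 3 from the same state
```

With latency 1 this gives 1111111, and with latency 3 it gives 1011011, since 0001011 OR 1011010 = 1011011.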

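State Diagram for X
Initial state 1 0 1 1 0 1 0: latency 1 leads to 1 1 1 1 1 1 1; latencies 3 and 6 lead to 1 0 1 1 0 1 1; latency 8+ returns to 1 0 1 1 0 1 0.
State 1 0 1 1 0 1 1: latency 3 (marked 3* in the diagram) and latency 6 return to 1 0 1 1 0 1 1; latency 8+ leads to 1 0 1 1 0 1 0.
State 1 1 1 1 1 1 1: latency 8+ leads to 1 0 1 1 0 1 0.
Here 8+ stands for any latency of 8 or more.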
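The whole diagram can be generated by applying the shift-and-OR rule repeatedly until no new state appears. This sketch stores vectors as integers whose least significant bit is C_1 and uses the single latency m + 1 to stand for "m + 1 or more"; the function name `state_diagram` is illustrative.

```python
def state_diagram(icv, m):
    """Return {state: {latency: next_state}} reachable from the ICV.

    Vectors are integers whose least significant bit is C_1.
    Latency m + 1 stands for every latency of m + 1 or more.
    """
    diagram = {}
    pending = [icv]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        arcs = {}
        for p in range(1, m + 1):
            if not (state >> (p - 1)) & 1:      # C_p is 0: p is permissible
                arcs[p] = (state >> p) | icv    # shift, then OR with the ICV
        arcs[m + 1] = icv                       # register fully cleared
        diagram[state] = arcs
        pending.extend(arcs.values())
    return diagram


if __name__ == "__main__":
    for state, arcs in sorted(state_diagram(0b1011010, 7).items()):
        moves = ", ".join(f"{p} -> {nxt:07b}" for p, nxt in arcs.items())
        print(f"{state:07b}: {moves}")
```

For the collision vector 1011010 of X the search stops after three states: 1011010, 1011011, and 1111111.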

Group Activity 1
[Reservation table: stages S1, S2, and S3 over cycles 1 to 4; the column positions of the marks are not preserved.]
a. What are the forbidden latencies?
b. Draw the state transition diagram.
c. List all the simple cycles and greedy cycles.
d. Determine the minimal average latency (MAL).
e. Determine the throughput of this pipeline.
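Answers to parts (c) to (e) can be checked with a short program once the state diagram is known. The sketch below is shown on the collision vector 1011010 of function X from the earlier slides, not on the activity's own table. It assumes that a simple cycle is one in which each state appears only once, that the MAL is the smallest average latency over all simple cycles, and that throughput is 1/MAL initiations per clock cycle. All function names are illustrative.

```python
from fractions import Fraction


def state_diagram(icv, m):
    """Return {state: {latency: next_state}}; latency m + 1 means 'm + 1 or more'."""
    diagram, pending = {}, [icv]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        arcs = {p: (state >> p) | icv
                for p in range(1, m + 1) if not (state >> (p - 1)) & 1}
        arcs[m + 1] = icv
        diagram[state] = arcs
        pending.extend(arcs.values())
    return diagram


def simple_cycles(diagram):
    """List latency cycles in which each state appears only once."""
    order = {state: i for i, state in enumerate(sorted(diagram))}
    found = []

    def walk(start, state, path, seen):
        for latency, nxt in diagram[state].items():
            if nxt == start:
                found.append(tuple(path + [latency]))
            elif order[nxt] > order[start] and nxt not in seen:
                # Only visit states ranked after the start state, so each
                # cycle is reported once, from its lowest-ranked state.
                walk(start, nxt, path + [latency], seen | {nxt})

    for start in diagram:
        walk(start, start, [], {start})
    return found


def minimal_average_latency(diagram):
    """Smallest average latency over all simple cycles."""
    return min(Fraction(sum(c), len(c)) for c in simple_cycles(diagram))


if __name__ == "__main__":
    diagram = state_diagram(0b1011010, 7)
    print(sorted(simple_cycles(diagram)))
    mal = minimal_average_latency(diagram)
    print("MAL:", mal, "throughput:", 1 / mal, "initiations per cycle")
```

For function X this finds the simple cycles (3), (6), (8), (1, 8), (3, 8), and (6, 8), with the constant cycle (3) giving the smallest average latency.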