CS352H: Computer Systems Architecture

Similar documents
Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:


WAR: Write After Read

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

PROBLEMS #20,R0,R1 #$3A,R2,R4

Data Dependences. A data dependence occurs whenever one instruction needs a value produced by another.

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Computer Organization and Components

Solutions. Solution The values of the signals are as follows:

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

CPU Performance Equation

Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

Course on Advanced Computer Architectures

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Computer organization

VLIW Processors. VLIW Processors

Instruction Set Architecture

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

Instruction Set Architecture (ISA)

Instruction Set Design

Introducción. Diseño de sistemas digitales.1

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Pipelining Review and Its Limitations

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Central Processing Unit (CPU)

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

CPU Organization and Assembly Language

An Introduction to the ARM 7 Architecture

CS521 CSE IITG 11/23/2012

Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x

Intel 8086 architecture

CPU Organisation and Operation

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

(Refer Slide Time: 00:01:16 min)

Reduced Instruction Set Computer (RISC)

In the Beginning The first ISA appears on the IBM System 360 In the good old days

Review: MIPS Addressing Modes/Instruction Formats

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

IA-64 Application Developer s Architecture Guide

8085 INSTRUCTION SET

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December :59

Summary of the MARIE Assembly Language

Computer Organization and Architecture

A FPGA Implementation of a MIPS RISC Processor for Computer Architecture Education

Introduction to Cloud Computing

StrongARM** SA-110 Microprocessor Instruction Timing

Week 1 out-of-class notes, discussions and sample problems

Instruction Set Architecture (ISA) Design. Classification Categories

The Microarchitecture of Superscalar Processors

LSN 2 Computer Processors

SOC architecture and design

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Pipelined MIPS Processor. Dmitri Strukov ECE 154A

A SystemC Transaction Level Model for the MIPS R3000 Processor

CSE 141 Introduction to Computer Architecture Summer Session I, Lecture 1 Introduction. Pramod V. Argade June 27, 2005

A Lab Course on Computer Architecture

CHAPTER 7: The CPU and Memory

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

MICROPROCESSOR AND MICROCOMPUTER BASICS

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May ILP Execution

Generating MIF files

Systems I: Computer Organization and Architecture

Thread level parallelism

Using Graphics and Animation to Visualize Instruction Pipelining and its Hazards

Computer Architecture TDTS10

Chapter 5 Instructor's Manual

Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

High Performance Processor Architecture. André Seznec IRISA/INRIA ALF project-team

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Embedded System Hardware - Processing (Part II)

CHAPTER 4 MARIE: An Introduction to a Simple Computer

Instruction Set Architecture. Datapath & Control. Instruction. LC-3 Overview: Memory and Registers. CIT 595 Spring 2010

Chapter 01: Introduction. Lesson 02 Evolution of Computers Part 2 First generation Computers

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Technical Report. Complexity-effective superscalar embedded processors using instruction-level distributed processing. Ian Caulfield.

A s we saw in Chapter 4, a CPU contains three main sections: the register section,

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Design of Digital Circuits (SS16)

CSC 2405: Computer Systems II

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

Processor Architectures

Boosting Beyond Static Scheduling in a Superscalar Processor

EECS 427 RISC PROCESSOR

find model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1

CSE 141L Computer Architecture Lab Fall Lecture 2

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ

MACHINE ARCHITECTURE & LANGUAGE

EE361: Digital Computer Organization Course Syllabus

MIPS Assembler and Simulator

An Overview of Stack Architecture and the PSC 1000 Microprocessor

Transcription:

CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell

Data Hazards in ALU Instructions Consider this sequence: sub $2, $1,$3 and $12,$2,$5 or $13,$6,$2 add $14,$2,$2 sw $15,100($2) We can resolve hazards with forwarding How do we detect when to forward? University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 2

Dependencies & Forwarding University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 3

Detecting the Need to Forward Pass register numbers along pipeline e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register ALU operand register numbers in EX stage are given by ID/EX.RegisterRs, ID/EX.RegisterRt Data hazards when 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt Fwd from EX/MEM pipeline reg Fwd from MEM/WB pipeline reg University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 4

Detecting the Need to Forward But only if forwarding instruction will write to a register! EX/MEM.RegWrite, MEM/WB.RegWrite And only if Rd for that instruction is not $zero EX/MEM.RegisterRd 0, MEM/WB.RegisterRd 0 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 5

Forwarding Paths University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 6

Forwarding Conditions EX hazard if (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 7

Double Data Hazard Consider the sequence: add $1,$1,$2 add $1,$1,$3 add $1,$1,$4 Both hazards occur Want to use the most recent Revise MEM hazard condition Only fwd if EX hazard condition isn t true University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 8

Revised Forwarding Condition MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 9

Datapath with Forwarding University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 10

Load-Use Data Hazard Need to stall for one cycle University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 11

Load-Use Hazard Detection Check when using instruction is decoded in ID stage ALU operand register numbers in ID stage are given by IF/ID.RegisterRs, IF/ID.RegisterRt Load-use hazard when ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)) If detected, stall and insert bubble University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 12

How to Stall the Pipeline Force control values in ID/EX register to 0 EX, MEM and WB do nop (no-operation) Prevent update of PC and IF/ID register Using instruction is decoded again Following instruction is fetched again 1-cycle stall allows MEM to read data for lw Can subsequently forward to EX stage University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 13

Stall/Bubble in the Pipeline Stall inserted here University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 14

Stall/Bubble in the Pipeline University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 15

Datapath with Hazard Detection University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 16

Stalls and Performance Stalls reduce performance But are required to get correct results Compiler can arrange code to avoid hazards and stalls Requires knowledge of the pipeline structure University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 17

Branch Hazards If branch outcome determined in MEM Flush these instructions (Set control values to 0) PC University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 18

Reducing Branch Delay Move hardware to determine outcome to ID stage Target address adder Register comparator Example: branch taken 36: sub $10, $4, $8 40: beq $1, $3, 7 44: and $12, $2, $5 48: or $13, $2, $6 52: add $14, $4, $2 56: slt $15, $6, $7... 72: lw $4, 50($7) University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 19

Example: Branch Taken University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 20

Example: Branch Taken University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 21

Data Hazards for Branches If a comparison register is a destination of 2 nd or 3 rd preceding ALU instruction add $1, $2, $3 IF ID EX MEM WB add $4, $5, $6 IF ID EX MEM WB IF ID EX MEM WB beq $1, $4, target IF ID EX MEM WB Can resolve using forwarding University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 22

Data Hazards for Branches If a comparison register is a destination of preceding ALU instruction or 2 nd preceding load instruction Need 1 stall cycle lw $1, addr IF ID EX MEM WB add $4, $5, $6 IF ID EX MEM WB beq stalled IF ID beq $1, $4, target ID EX MEM WB University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 23

Data Hazards for Branches If a comparison register is a destination of immediately preceding load instruction Need 2 stall cycles lw $1, addr IF ID EX MEM WB beq stalled IF ID beq stalled ID beq $1, $0, target ID EX MEM WB University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 24

Dynamic Branch Prediction In deeper and superscalar pipelines, branch penalty is more significant Use dynamic prediction Branch prediction buffer (aka branch history table) Indexed by recent branch instruction addresses Stores outcome (taken/not taken) To execute a branch Check table, expect the same outcome Start fetching from fall-through or target If wrong, flush pipeline and flip prediction University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 25

1-Bit Predictor: Shortcoming Inner loop branches mispredicted twice! outer: inner: beq,, inner beq,, outer Mispredict as taken on last iteration of inner loop Then mispredict as not taken on first iteration of inner loop next time around University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 26

2-Bit Predictor Only change prediction on two successive mispredictions University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 27

Calculating the Branch Target Even with predictor, still need to calculate the target address 1-cycle penalty for a taken branch Branch target buffer Cache of target addresses Indexed by PC when instruction fetched If hit and instruction is branch predicted taken, can fetch target immediately University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 28

Exceptions and Interrupts Unexpected events requiring change in flow of control Different ISAs use the terms differently Exception Arises within the CPU e.g., undefined opcode, overflow, syscall, Interrupt From an external I/O controller Dealing with them without sacrificing performance is hard University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 29

Handling Exceptions In MIPS, exceptions managed by a System Control Coprocessor (CP0) Save PC of offending (or interrupted) instruction In MIPS: Exception Program Counter (EPC) Save indication of the problem In MIPS: Cause register We ll assume 1-bit 0 for undefined opcode, 1 for overflow Jump to handler at 8000 00180 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 30

An Alternate Mechanism Vectored Interrupts Handler address determined by the cause Example: Undefined opcode: C000 0000 Overflow: C000 0020 : C000 0040 Instructions either Deal with the interrupt, or Jump to real handler University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 31

Handler Actions Read cause, and transfer to relevant handler Determine action required If restartable Take corrective action use EPC to return to program Otherwise Terminate program Report error using EPC, cause, University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 32

Exceptions in a Pipeline Another form of control hazard Consider overflow on add in EX stage add $1, $2, $1 Prevent $1 from being clobbered Complete previous instructions Flush add and subsequent instructions Set Cause and EPC register values Transfer control to handler Similar to mispredicted branch Use much of the same hardware University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 33

Pipeline with Exceptions University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 34

Exception Properties Restartable exceptions Pipeline can flush the instruction Handler executes, then returns to the instruction Refetched and executed from scratch PC saved in EPC register Identifies causing instruction Actually PC + 4 is saved Handler must adjust University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 35

Exception Example Exception on add in 40 sub $11, $2, $4 44 and $12, $2, $5 48 or $13, $2, $6 4C add $1, $2, $1 50 slt $15, $6, $7 54 lw $16, 50($7) Handler 80000180 sw $25, 1000($0) 80000184 sw $26, 1004($0) University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 36

Exception Example University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 37

Exception Example University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 38

Multiple Exceptions Pipelining overlaps multiple instructions Could have multiple exceptions at once Simple approach: deal with exception from earliest instruction Flush subsequent instructions Precise exceptions In complex pipelines Multiple instructions issued per cycle Out-of-order completion Maintaining precise exceptions is difficult! University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 39

Imprecise Exceptions Just stop pipeline and save state Including exception cause(s) Let the handler work out Which instruction(s) had exceptions Which to complete or flush May require manual completion Simplifies hardware, but more complex handler software Not feasible for complex multiple-issue out-of-order pipelines University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 40

Fallacies Pipelining is easy (!) The basic idea is easy The devil is in the details e.g., detecting data hazards Pipelining is independent of technology So why haven t we always done pipelining? More transistors make more advanced techniques feasible Pipeline-related ISA design needs to take account of technology trends e.g., predicated instructions University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 41

Pitfalls Poor ISA design can make pipelining harder e.g., complex instruction sets (VAX, IA-32) Significant overhead to make pipelining work IA-32 micro-op approach e.g., complex addressing modes Register update side effects, memory indirection e.g., delayed branches Advanced pipelines have long delay slots University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 42