Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997



Similar documents
Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

CS352H: Computer Systems Architecture

Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Solutions. Solution The values of the signals are as follows:

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

Review: MIPS Addressing Modes/Instruction Formats

Computer organization

WAR: Write After Read

Computer Organization and Components

Course on Advanced Computer Architectures

Reduced Instruction Set Computer (RISC)

Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

CSC 2405: Computer Systems II

PROBLEMS #20,R0,R1 #$3A,R2,R4

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December :59

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Data Dependences. A data dependence occurs whenever one instruction needs a value produced by another.

Instruction Set Architecture

Instruction Set Design


CPU Performance Equation

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Pipelining Review and Its Limitations

Chapter 1 Computer System Overview

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

The Operating System and the Kernel

An Introduction to the ARM 7 Architecture

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

StrongARM** SA-110 Microprocessor Instruction Timing

MIPS Assembler and Simulator

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Introducción. Diseño de sistemas digitales.1

Central Processing Unit (CPU)

EE361: Digital Computer Organization Course Syllabus

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007

CPU Organization and Assembly Language

Using Graphics and Animation to Visualize Instruction Pipelining and its Hazards

VLIW Processors. VLIW Processors

The ARM Architecture. With a focus on v7a and Cortex-A8

MICROPROCESSOR AND MICROCOMPUTER BASICS

A FPGA Implementation of a MIPS RISC Processor for Computer Architecture Education

(Refer Slide Time: 00:01:16 min)

A SystemC Transaction Level Model for the MIPS R3000 Processor

Winter 2002 MID-SESSION TEST Friday, March 1 6:30 to 8:00pm

PROBLEMS. which was discussed in Section

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Operating Systems 4 th Class

CS521 CSE IITG 11/23/2012

Computer Organization and Architecture

CHAPTER 7: The CPU and Memory

Computer Architecture TDTS10

CHAPTER 4 MARIE: An Introduction to a Simple Computer

Intel 8086 architecture

CPU Organisation and Operation

CSE 141 Introduction to Computer Architecture Summer Session I, Lecture 1 Introduction. Pramod V. Argade June 27, 2005

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

IA-64 Application Developer s Architecture Guide

LSN 2 Computer Processors

MIPS R4400PC/SC Errata, Processor Revision 2.0 & 3.0

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Exception and Interrupt Handling in ARM

1. Computer System Structure and Components

Central Processing Unit

SMIPS Processor Specification

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Operating Systems. Lecture 03. February 11, 2013

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Generating MIF files

PCSpim Tutorial. Nathan Goulding-Hotta v0.1

EECS 427 RISC PROCESSOR

Administrative Issues

Instruction Set Architecture (ISA)

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Assembly Language Programming

In the Beginning The first ISA appears on the IBM System 360 In the good old days

PART B QUESTIONS AND ANSWERS UNIT I

1 Storage Devices Summary

Computer Organization and Components

Week 1 out-of-class notes, discussions and sample problems

CSE 141L Computer Architecture Lab Fall Lecture 2

AXI Performance Monitor v5.0

The 104 Duke_ACC Machine

Process Description and Control william stallings, maurizio pizzonia - sistemi operativi

Basic Computer Organization

BASIC COMPUTER ORGANIZATION AND DESIGN

Microprocessor and Microcontroller Architecture

How It All Works. Other M68000 Updates. Basic Control Signals. Basic Control Signals

Transcription:

Design of Pipelined MIPS Processor Sept. 24 & 26, 1997 Topics Instruction processing Principles of pipelining Inserting pipe registers Data Hazards Control Hazards Exceptions

MIPS architecture subset R-type instructions (add, sub, and, or, slt): rd <-- rs funct rt 0 rs rt rd shamt funct 31-26 25-21 20-16 15-11 10-6 5-0 RI-type instructions (addiu): rt <-- rs funct I16 0 rs rt I16 31-26 25-21 20-16 15-0 Load: rt <-- Mem[rs + I16] Store: Mem[rs + I16] <-- rt 35 or 43 rs rt I16 31-26 25-21 20-16 15-0 Branch equal: PC <-- (rs == rt)? PC + 4 + I16 <<2 : PC + 4 4 rs rt I16 31-26 25-21 20-16 15-0 2

Datapath IF instruction fetch ID instruction decode/ register fetch EX execute/ address calculation MEM memory access WB write back MUX 4 Add shift left 2 Add PC Instruction memory Read address Instruction [31-0] Instr [25-21] Instr [20-16] Instr [15-11] MUX read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data MUX ALU zero ALU result read addr read data Data memory write addr MUX Instr [15-0] 16 Sign 32 \ extend \ Instr [5-0] ALU control write data 3

R-type instructions IF: Instruction fetch IR <-- IMemory[PC] PC <-- PC + 4 ID: Instruction decode/register fetch A <-- Register[IR[25:21]] B <-- Register[IR[20:16]] Ex: Execute ALUOutput <-- A op B MEM: Memory nop WB: Write back Register[IR[15:11]] <-- ALUOutput 4

IF: Instruction fetch IR <-- IMemory[PC] PC <-- PC + 4 Load instruction ID: Instruction decode/register fetch A <-- Register[IR[25:21]] B <-- Register[IR[20:16]] Ex: Execute ALUOutput <-- A + SignExtend(IR[15:0]) MEM: Memory Mem-Data <-- DMemory[ALUOutput] WB: Write back Register[IR[20:16]] <-- Mem-Data 5

Store instruction IF: Instruction fetch IR <-- IMemory[PC] PC <-- PC + 4 ID: Instruction decode/register fetch A <-- Register[IR[25:21]] B <-- Register[IR[20:16]] Ex: Execute ALUOutput <-- A + SignExtend(IR[15:0]) MEM: Memory DMemory[ALUOutput] <-- B WB: Write back nop 6

Branch on equal IF: Instruction fetch IR <-- IMemory[PC] PC <-- PC + 4 ID: Instruction decode/register fetch A <-- Register[IR[25:21]] B <-- Register[IR[20:16]] Ex: Execute Target <-- PC + SignExtend(IR[15:0]) << 2 Z <-- (A - B == 0) MEM: Memory If (Z) PC <-- target WB: Write back nop 7 Is this a delayed branch?

Pipelining Basics Unpipelined System 30ns Comb. Logic 3ns R E G Delay = 33ns Throughput = 30MHz Clock Op1 Op2 Op3 Time One operation must complete before next can begin Operations spaced 33ns apart 8

3 Stage Pipelining 10ns 3ns 10ns 3ns 10ns 3ns Comb. Logic R E G Comb. Logic R E G Comb. Logic R E G Delay = 39ns Throughput = 77MHz Clock Op1 Op2 Op3 Space operations 13ns apart Op4 3 operations occur simultaneously Time 9

Limitation: Nonuniform Pipelining 5ns 3ns 15ns 3ns 10ns 3ns Com. Log. R E G Comb. Logic R E G Comb. Logic R E G Delay = 39ns Throughput = 55MHz Clock Throughput limited by slowest stage Must attempt to balance stages 10

Limitation: Deep Pipelines 5ns 3ns 5ns 3ns 5ns 3ns 5ns 3ns 5ns 3ns 5ns 3ns Com. Log. R E G Com. Log. R E G Com. Log. R E G Com. Log. R E G Com. Log. R E G Com. Log. R E G Clock Delay = 48ns, Throughput = 128MHz Diminishing returns as add more pipeline stages Register delays become limiting factor Increased latency Small througput gains 11

Limitation: Sequential Dependencies Comb. Logic R E G Comb. Logic R E G Comb. Logic R E G Clock Op1 Op2 Op3 Op4 Op4 gets result from Op1! Pipeline Hazard Time 12

Pipelined datapath MUX IF/ID ID/EX EX/MEM MEM/WB PC 4 Add Instruction mem Read address Instruction [31-0] Instr [25-21] Instr [20-16] read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data shift left 2 MUX Add ALU zero ALU result read addr read data Data mem write addr MUX Instr [15-0] 16 \ Sign 32 \ ext Instr [5-0] ALU ctl write data Instr [20-16] Instr [15-11] MUX 13

Pipeline Structure Branch Flag & Target PC IF/ID ID/EX EX/MEM MEM/WB IF ID EX MEM Instr. Mem. Reg. File Data Mem. Next PC Write Back Reg. & Data Notes Each stage consists of operate logic connecting pipe registers WB logic merged into ID Additional paths required for forwarding 14

Pipe Register Operation Next State Current State Current State stays constant while Next State being updated Update involves transferring Next State to Current 15

Pipeline Stage ID Reg. File Operation Current State Computes next state based on current From/to one or more pipe registers May have embedded memory elements Low level timing signals control their operation during clock cycle Writes based on current pipe register state Reads supply values for Next state Next State 16

MIPS Simulator Features Based on MIPS subset Code generated by dis Hexadecimal instruction code Executable available HOME/public/sim/solve_tk Demo Programs HOME/public/sim/solve_tk/demos Run Controls Speed Control Mode Selection Current State Pipe Register Next State Register Values 17

Reg-Reg Instructions R-type instructions (add, sub, and, or, slt): rd <-- rs funct rt 0 rs rt rd shamt funct 31-26 25-21 20-16 15-11 10-6 5-0 +4 Add IF/ID ID/EX EX/MEM MEM/WB ALU Op PC Instr. Mem. instr rd rs rt Reg. Wdata Waddr A B A B dest data data data ALU dest dest 18

Simulator ALU Example IF ID EX Fetch instruction Fetch operands Compute ALU result MEM WB Nothing Store result in destination reg. 0x0: 24020003 li r2,3 0x4: 24030004 li r3,4 0x8: 00000000 nop 0xc: 00000000 nop demo1.o 0x10:00432021 addu r4,r2,r3 # --> 7 0x14:00000000 nop 0x18:0000000d break 0 0x1c:00000000 nop.set noreorder addiu $2, $0, 3 addiu $3, $0, 4 nop nop addu $4, $2, $3 nop demo1.s break 0.set reorder 19

Simulator Store/Load Examples IF ID EX Fetch instruction Get addr reg Store: Get data Compute EA MEM Load: Read Store: Write WB Load: Update reg. demo2.o 0x0: 24020003 li r2,3 # Store/Load -->3 0x4: 24030004 li r3,4 # --> 4 0x8: 00000000 nop 0xc: 00000000 nop 0x10:ac430005 sw r3,5(r2) # 4 at 8 0x14:00000000 nop 0x18:00000000 nop 0x1c:8c640004 lw r4,4(r3) # --> 4 0x20:00000000 nop 0x24:0000000d break 0 0x28:00000000 nop 0x2c:00000000 nop 20

Simulator Branch Examples IF ID EX Fetch instruction Fetch operands Subtract operands test if 0 Compute target MEM WB Taken: Update PC to target Nothing 21 demo3.o 0x0: 24020003 li r2,3 0x4: 24030004 li r3,4 0x8: 00000000 nop 0xc: 00000000 nop 0x10:10430008 beq r2,r3,0x34 # Don't take 0x14:00000000 nop 0x18:00000000 nop 0x1c:00000000 nop 0x20:14430004 bne r2,r3,0x34 # Take 0x24:00000000 nop 0x28:00000000 nop 0x2c:00000000 nop 0x30:00432021 addu r4,r2,r3 # skip 0x34:00632021 addu r4,r3,r3 # Target...

Data Hazards in MIPS Pipeline Problem Registers read in ID, and written in WB Must resolve conflict between instructions competing for register array Generally do write back in first half of cycle, read in second But what about intervening instructions? E.g., suppose initially $2 is zero: $2 $3 $4 $5 $6 IF ID EX M WB IF ID EX M WB IF ID EX M WB IF ID EX M WB IF ID EX M WB addiu $2, $0, 63 addiu $3, $2, 0 addiu $4, $2, 0 addiu $5, $2, 0 addiu $6, $2, 0 22

Simulator Data Hazard Example Operation Read in ID Write in WB Write-before-read register file demo4.o 0x0: 2402003f li r2,63 0x4: 00401821 move r3,r2 # --> 0x3F? 0x8: 00402021 move r4,r2 # --> 0x3f? 0xc: 00402821 move r5,r2 # --> 0x3f? 0x10:00403021 move r6,r2 # --> 0x3f? 0x14:00000000 nop 0x18:0000000d break 0 0x1c:00000000 nop 23

Handling Hazards by Stalling Idea Delay instruction until hazard eliminated Put bubble into pipeline Dynamically generated NOP Pipe Register Operation Normally transfer next state to current Stall indicates that current state should not be changed Bubble indicates that current state should be set to 0 Stage logic designed so that 0 is like NOP [Other conventions possible] Next State Current State Bubble Stall 24

Stall Control Stall Control IF ID EX MEM Instr. Mem. Reg. File Data Mem. Stall Logic Determines which stages to stall or bubble on next update 25

Observations on Stalling Good Bad Relatively simple hardware Only penalizes performance when hazard exists As if placed NOP s in code Except that does not waste instruction memory Reality Some problems can only be dealt with by stalling E.g., instruction cache miss Stall PC, bubble IF/ID Data cache miss Stall PC, IF/ID, ED/EX, EX/MEM, bubble MEM/WB 26

Forwarding (Bypassing) Observation ALU data generated at end of EX Steps through pipe until WB ALU data consumed at beginning of EX Idea Expedite passing of previous instruction result to ALU By adding extra data pathways and control 27

Forwarding for ALU-ALU Hazard +4 PC Add Instr. Mem. IF/ID rs instr rt rd $1 $2 ALU Op 5 7 A B ID/EX + $1 $2 5 7 $1 Bypass Control ALU dest EX/MEM MEM/WB data data data 12 $2 dest dest Program: 0x18: add $2, $1, $2 0x1c: add $1, $1, $2 28

Some Hazards with Loads & Stores Data Generated by Load Load-ALU lw $1, 8($2) add $2, $1, $2 Data Generated by Store Store-Load Data sw $1, 8($2) lw $3, 8($2) Load-Store Data lw $1, 8($2) sw $1, 12($2) Load-Store (or Load) Addr. lw $1, 8($2) sw $1, 12($1) 29

Analysis of Data Transfers Data Sources Available after EX ALU Result Reg-Reg Result Available after MEM Read Data Load result ALU Data Reg-Reg Result passing through MEM stage Data Destinations ALU A input Need in EX Reg-Reg Operand Load/Store Base ALU B input Need in EX Reg-Reg Operand Write Data Need in MEM Store Data 30

Complete Bypassing MUX IF/ID ID/EX EX/MEM MEM/WB 4 Add shift left 2 Add PC Instruction mem Read address Instruction [31-0] Instr [25-21] Instr [20-16] Instr [15-0] read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data 16 \ Sign 32 \ ext Instr [5-0] MUX ALU ctl ALU zero ALU result read addr read data Data mem write addr write data MUX Instr [20-16] Instr [15-11] MUX EX/EX MEM/EX MEM/MEM 31

Simulator Data Hazard Examples demo5.o 32 0x0: 2402003f li r2, 63 # --> 0x3F 0x4: 00401821 move r3, r2 # --> EX-EX 0x3F 0x8: 00000000 nop 0xc: 00000000 nop 0x10: 2402000f li r2, 15 # --> 0xF 0x14: 00000000 nop 0x18: 00401821 move r3, r2 # --> MEM-EX 0xF 0x1c: 00000000 nop 0x20: 2402000c li r2, 12 # --> 0xC 0x24: 24020010 li r2, 16 # --> 0x10 0x28: ac430000 sw r3, 0(r2) # EX-EX 0xF at 0x10 0x2c: 00000000 nop 0x30: 8c440000 lw r4, 0(r2) # --> 0xF 0x34: 00822821 addu r5, r4, r2 # (Stall) --> 0x1F 0x38: 00000000 nop 0x3c: 0000000d break 0 0x40: 00000000 nop 0x44: 00000000 nop

Impact of Forwarding Single Remaining Hazard Class Load followed by ALU operation Including address calculation MIPS I Architecture Programmer may not put instruction after load that uses loaded register Not even allowed to assume it will be old value Load delay slot Can force with.set noreorder MIPS II Architecture No restriction on programs Stall following instruction one cycle Then pick up with MEM/EX bypass 33 Load-ALU lw $1, 8($2) add $2, $1, $2 Load-Store (or Load) Addr. lw $1, 8($2) lw $4, 12($1)

Methodology for characterizing and Enumerating Data Hazards OP writes reads R rd rs,rt RI rt rs LW rt rs SW rs,rt Bx rs,rt JR rs JAL $31 JALR rd rs The space of data hazards (from a programcentric point of view) can be characterized by 3 independent axes: 5 possible write regs (axis 1): R.rd, RI.rt, LW.rt, JAL.$31, JALR.rd 10 possible read regs (axis 2): R.rs, R.rt, RI.rs, RI.rs, SW.rs, SW.rt Bx.rs, Bx.rt, JR.rs, JALR.rs A dependent read can be a distance of either 1 or 2 from the corresponding write (axis 3): distance 2 hazard: R.rd/R.rs/2 addiu $2, $0, 63 addu $3, $0, $2 addu $4, $2, $0 distance 1 hazard: R.rd/R.rt/1 34

Enumerating data hazards reads distance = 1 writes R.rd RI.rt LW.rt JAL.$31 JALR.rd R.rs R.rt RI.rs LW.rs SW.rs SW.rt Bx.rs Bx.rt JR.rs JALR.rs n/a n/a n/a n/a n/a n/a n/a n/a writes R.rd RI.rt LW.rt JAL.$31 JALR.rd reads distance = 2 R.rs R.rt RI.rs LW.rs SW.rs SW.rt Bx.rs Bx.rt JR.rs JALR.rs 35

Simulator Microtest Example 0x0: 24020010 li r2,16 0x4: 00000000 nop 0x8: 00000000 nop 0xc: 00022821 addu r5,r0,r2 0x10: 00a00821 move r1,r5 0x14: 00000000 nop 0x18: 00000000 nop 0x1c: 24040010 li r4,16 0x20: 00000000 nop 0x24: 00000000 nop 0x28: 1024000b beq r1,r4,0x58 demo7.o 0x48: 00000000 nop 0x4c: 00000000 nop 0x50: 00000000 nop 0x54: 00000000 nop 0x58: 0001000d break 1 0x5c: 00000000 nop 0x60: 00000000 nop 0x64: 00000000 nop 0x68: 00000000 nop 0x6c: 00000000 nop 0x2c: 00000000 nop 0x30: 00000000 nop 0x34: 00000000 nop 0x38: 00000000 nop 0x3c: 00000000 nop 0x40: 0000000d break 0 0x44: 00000000 nop 36 Tests for single failure mode ALU Rd --> ALUI Rs, dist 1 Hits break 0 when error Jumps to break 1 when OK Grep for ERROR or break 0

Pipelined datapath IF instruction fetch ID instruction decode/ register fetch EX execute/ address calc MEM memory access WB write back MUX What happens with a branch? IF/ID ID/EX EX/MEM MEM/WB 4 Add shift left 2 Add PC Instruction mem Read address Instruction [31-0] Instr [25-21] Instr [20-16] Instr [15-0] read reg 1 read data 1 read reg 2 Registers write reg read data 2 write data 16 \ Sign 32 \ ext Instr [5-0] MUX ALU ctl ALU zero ALU result read addr read data Data mem write addr write data MUX Instr [20-16] Instr [15-11] MUX 37

Control Hazards in MIPS Pipeline Problem Instruction fetched in IF, branch condition set in MEM When does branch take effect? E.g.: assume initially that all registers = 0 IF ID EX M WB beq $0, $0, target IF ID EX M WB IF ID EX M WB addiu $2, $0, 63 addiu $3, $0, 63 $2 $3 $4 $5 $6 IF ID EX M WB IF ID EX M WB addiu $4, $0, 63 addiu $5, $0, 63 target: addiu $6, $0, 63 38

Branch Example Assume All registers initially 0 Desired Behavior Take branch at 0x00 Execute delay slot 0x04 Execute target 0x18 PC + 4 + I16<<2 PC = 0x00 I16 = 5 Branch Code (demo8.o) 0x00: BEQ $0, $0, 5 0x04: li $2, 63 # Slot 0x08: li $3, 63 # Xtra1 0x0c: li $4, 63 # Xtra2 0x10: li $5, 63 # Xtra3 0x14: nop 0x18: li $6, 63 # Target 39

Stall Until Resolve Branch Detect when branch in stages ID or EX Stop fetching until resolve Delay slot instruction just before target Stall Control Stall +4 Add MUX Bubble Branch? Target IF/ID ID/EX EX/MEM MEM/WB PC Instr. Mem. instr Reg. Data Mem. nxt cur nxt cur nxt cur nxt cur nxt cur 40

Branch Not Taken Example Assume All registers initially 0 Desired Behavior Execute entire sequence Effect of Stalling Wastes 2 cycles waiting for branch decision Branch Code (demo9.o) 0x00: BNE $0, $0, 5 0x04: li $2, 63 # Slot 0x08: li $3, 63 # Xtra1 0x0c: li $4, 63 # Xtra2 0x10: li $5, 63 # Xtra3 0x14: nop 0x18: li $6, 63 # Target 41

Fetch & Cancel When Taken Instruction does not cause any updates until MEM or WB stages Instruction can be cancelled from pipe up through EX stage Replace with bubble Strategy Continue fetching under assumption that branch not taken If decide to take branch, cancel undesired ones +4 Add MUX Bubble Bubble Branch? Target IF/ID ID/EX EX/MEM MEM/WB PC T g t X t r a 2 nxt cur Instr. Mem. instr X t r a 1 Reg. s l o t nxt cur b e q nxt cur Data Mem. nxt cur 42 nxt cur

Branch Prediction Analysis Our Scheme Implements Predict Not Taken But 67% of branches are taken Would give CPI 1.0 + 0.16 * 0.67 * 2.0 = 1.22 Alternative Schemes Predict taken Would be hard to squeeze into our pipeline» Can t compute target until ID In principle, would give CPI 1.11 Backwards taken, forwards not taken Predict based on sign of I16 Exploits fact that loops usually closed with backward branches 43

How Does Real MIPS Handle Branches? Avoids any Branch Penalty! Do not treat PC as pipe register Loaded at beginning of IF Compute branch target & branch condition in ID Can fetch at target in following cycle Target +4 Add IncrPC Add Target 44 M U X Instr. Mem. instr Reg. PC IF/ID ID/EX branch? rs rt = A B

Exceptions An exception is a transfer of control to the OS in response to some event (i.e. change in processor state) User Process Operating System event exception exception return (optional) exception processing by exception handler 45

Internal (CPU) exceptions Internal exceptions occur as a result of events generated by executing instructions. Execution of a SYSCALL instruction. allows a program to ask for OS services (e.g., timer updates) Execution of a BREAK instruction used by debuggers Errors during instruction execution arithmetic overflow, address error, parity error, undefined instruction Events that require OS intervention virtual memory page fault 46

External (I/O) exceptions External exceptions occur as a result of events generated by devices external to the processor. I/O interrupts hitting ^C at the keyboard arrival of a packet arrival of a disk sector Hard reset interrupt hitting the reset button Soft reset interrupt hitting ctl-alt-delete on a PC 47

Exception handling (hardware tasks) Recognize event(s) Associate one event with one instruction. external event: pick any instruction multiple internal events: typically choose the earliest instruction. multiple external events: prioritize multiple internal and external events: prioritize Create Clean Break in Instruction Stream Complete all instructions before excepting instruction Abort excepting and all following instructions User Process A C 48

Exception handling (hardware tasks) Set status registers EPC register: exception program counter external exception: address of instruction about to be executed. internal exception (not in the delay slot): address of instruction causing the exception internal exception (in delay slot): address of preceding branch or jump Cause register records the event that caused the exception Others 15 other registers! Which get set depends on CPU and exception type Disable interrupts and switch to kernel mode Jump to common exception handler location 49

Exception handling (software tasks) Deal with event (Optionally) Resume execution using special rfe (return from exception) instruction similar to jr instruction, but restores processor to user mode as a side effect. Where to resume execution? usually re-execute the instruction causing exception unless instruction was in branch delay slot, in which case re-execute the branch instruction immediately preceding the delay slot EPC points here anyhow Also, what about syscall or break? 50

Example: Integer overflow (EX) user code sub $11, $2, $4 and $12, $2, $5 or $13, $2, $6 add $1, $2, $1 slt $15, $6, $7 lw $16, 50($7) handler code overflow and or add slt lw IF ID IF flush these instructions Overflow detected here EX ID IF MEM EX ID IF WB ID EX ID IF the or instruction completes WB nop nop nop sw $25, 1000($0)... sw start handler code IF 51

Exception Handling In pmips Simulator Relevant Pipeline State Address of instruction in pipe stage (SPC) Exception condition (EXC) Set in stage when problem encountered» IF for fetch problems, EX for instr. problems, MEM for data probs. Triggers special action once hits WB PC IF/ID S P C E X C Branch Flag & Target ID/EX EX/MEM MEM/WB S S S P P P C C C E E E X C X C X C IF ID EX MEM Instr. Mem. Reg. File Data Mem. Next PC Write Back Reg. & Data 52

MIPS Exception Examples In directory HOME/public/sim/exc Illegal Instruction (exc1.o) 0x0: srl r3,r2,4 # Unimplemented 0x4: li r2,4 # Should cancel Illegal Instruction followed by store (exc2.o) 0x0: li r2,15 # --> 0xf 0x4: srl r3,r2,4 # Unimplemented 0x8: sw r2,4(r0) # Should cancel 53

More MIPS Exception Examples Good instruction in delay slot (exc3.o) 0x0: li r3,3 0x4: jr r3 # Bad address 0x8: li r3,5 # Should execute Which is excepting instruction? Bad instruction in delay slot (exc4.o) 0x0: li r3,3 0x4: jr r3 # Bad address 0x8: sw r3,0(r3) # Bad address 54

Final MIPS Exception Example Avoiding false alarms (exc5.o) 0x0: beq r2,r2,0x10 0x4: nop 0x8: srl r2,r4,4 # Should cancel 0xc: nop 0x10: li r2,1 # --> 1 0x14: break 0 55

Implementation Features Correct Detects excepting instruction (exc4.o) Furthest one down pipeline = Earliest one in program order Completes all preceding instructions (exc3.o) Usually aborts excepting instruction & beyond (exc1.o) Prioritizes exception conditions Earliest stage where instruction ran into problems Avoids false alarms (exc5.o) Problematic instructions that get canceled anyhow Shortcomings Store following excepting instruction (exc2.o) EPC SPC for delay slot instruction 56