Pipelined MIPS Processor. Dmitri Strukov ECE 154A




Pipelining Analogy
- Pipelined laundry: overlapping execution
- Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3
- Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
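The laundry arithmetic can be checked with a short Python sketch. The function name and its parameters are illustrative assumptions; the 0.5-hour stage time and the four stages are read off the speedup formula on the slide rather than stated there directly.

```python
def laundry_speedup(n_loads, stage_hours=0.5, n_stages=4):
    """Speedup of pipelined over sequential laundry for n_loads loads."""
    # Sequential: every load runs all stages before the next load starts.
    sequential = n_loads * n_stages * stage_hours
    # Pipelined: the first load fills the pipeline, then one load
    # finishes every stage time.
    pipelined = (n_stages + n_loads - 1) * stage_hours
    return sequential / pipelined


if __name__ == "__main__":
    print(laundry_speedup(4))       # four loads: 8 / 3.5
    print(laundry_speedup(10_000))  # approaches the number of stages
```

With these assumptions the expression reduces to 2n/(0.5n + 1.5), which tends to 4 as n grows.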

Single-Cycle vs. Multicycle vs. Pipelined
[Figure: clock timing for Inst 1 to Inst 4, showing time needed vs. time allotted for the single-cycle design and the time saved by the multicycle design (3, 5, 3, and 4 cycles); (a) task-time diagram and (b) space-time diagram of the pipeline over cycles 1 to 11, with start-up and drainage regions. Legend: f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback.]

MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
[Figure: lw passing through IFetch, Dec, Exec, Mem, WB in Cycle 1 to Cycle 5]

Pipeline Performance Example
- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps

Pipeline Performance Example
[Figure: execution timelines for the single-cycle (T_c = 800ps) and pipelined (T_c = 200ps) datapaths]

Pipeline Speedup Example
- If all stages are balanced, i.e., all take the same time:
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup is due to increased throughput
  - Latency (time for each instruction) does not decrease
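As a rough check of the 800ps vs. 200ps comparison, the following Python sketch computes total execution time for both designs from the stage times in the example. The names STAGE_PS, single_cycle_time and pipelined_time are assumptions made for this illustration.

```python
# Stage times from the performance example, in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(n_instructions):
    """Clock period must fit the slowest instruction (lw uses every stage)."""
    return n_instructions * sum(STAGE_PS.values())


def pipelined_time(n_instructions):
    """Clock period is set by the slowest stage; the pipeline must fill first."""
    clock = max(STAGE_PS.values())
    return (len(STAGE_PS) + n_instructions - 1) * clock


if __name__ == "__main__":
    for n in (3, 1_000_000):
        ratio = single_cycle_time(n) / pipelined_time(n)
        print(n, single_cycle_time(n), pipelined_time(n), round(ratio, 2))
```

Because the stages are not balanced, the speedup approaches 800/200 = 4 for long instruction streams rather than the number of stages (5), and a single instruction still takes longer in the pipelined design (5 x 200ps).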

Pipelining and ISA Design
MIPS ISA designed for pipelining
- All instructions are 32 bits
  - Easier to fetch and decode in one cycle
  - c.f. x86: 1- to 17-byte instructions
- Few and regular instruction formats
  - Can decode and read registers in one step
- Load/store addressing
  - Can calculate address in 3rd stage, access memory in 4th stage
- Alignment of memory operands
  - Memory access takes only one cycle

Graphically Representing MIPS Pipeline
Can help with answering questions like:
- How many cycles does it take to execute this code?
- What is the ALU doing during cycle 4?
- Is there a hazard, why does it occur, and how can it be fixed?

Why Pipeline? For Performance!
[Figure: Inst 0 to Inst 4 in instruction order vs. time (clock cycles), with the time to fill the pipeline marked]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

Review from Last Lecture
[Figure: single-cycle vs. multicycle clock timing (time needed, time allotted, time saved) and the task-time and space-time diagrams of the multicycle pipelined design]
Execution time = 1/Performance = Inst count x CPI x CCT

Design                       Inst count  CPI                               CCT
Single cycle (SC)            1           1                                 1
Multi cycle (MC)             1           N ≥ CPI > 1 (closer to N than 1)  > 1/N
Multi cycle pipelined (MCP)  1           > 1                               > 1/N

- N = # of stages for pipeline design or ~ maximum number of steps for MC
- CPI_ideal,MCP = N/InstCount + 1 - 1/InstCount
  - larger N and/or small InstCount result in worse CPI
- Performance to run one instruction is the same as of CP (i.e., latency for a single instruction is not reduced)
- What are the other issues affecting CCT and CPI for MC and MCP?
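The ideal pipelined CPI expression follows from counting cycles: an N-stage pipeline needs N cycles for the first instruction and one more cycle for each instruction after it. A minimal Python sketch, with a function name chosen for this note:

```python
def ideal_pipelined_cpi(n_stages, inst_count):
    """CPI of an ideal (hazard-free) pipeline, including fill time."""
    total_cycles = n_stages + (inst_count - 1)
    # Equivalent to N/InstCount + 1 - 1/InstCount.
    return total_cycles / inst_count


if __name__ == "__main__":
    for count in (1, 5, 100, 1_000_000):
        print(count, ideal_pipelined_cpi(5, count))
```

For a single instruction the CPI equals N, and it approaches 1 only when InstCount is large compared with N.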

Visualizing pipeline - I
One way to visualize the pipeline: a snapshot of what is in the pipeline in a particular cycle
[Figure: five snapshots, Cycle 1 to Cycle 5, of Inst 1 to Inst 5 in instruction order advancing through IM, Reg, ALU, DM, Reg]

Visualizing pipeline - II
[Figure: Inst 1 to Inst 5 in instruction order vs. time (in cycles, 1 to 8), each instruction passing through IM, Reg, ALU, DM, Reg]

Hazards
Situations that prevent starting the next instruction in the next cycle
- Structure hazards: a required resource is busy
- Data hazard: need to wait for previous instruction to complete its data read/write
- Control hazard: deciding on control action depends on previous instruction

Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
    - Would cause a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches

A Single Memory Would Be a Structural Hazard
[Figure: lw and Inst 1 to Inst 4 in instruction order vs. time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg; in the same cycle lw is reading data from memory while a later instruction is reading an instruction from memory]
Fix with separate inst and data memories (I$ and D$)

Note that all instructions effectively take 5 cycles, even if some stages are not used or the instruction finishes early. Why?
[Figure: Inst 0 to Inst 4 in instruction order vs. time (clock cycles)]

Data Hazards
An instruction depends on completion of data access by a previous instruction
  add $s0, $t0, $t1
  sub $t2, $s0, $t3

Data Dependencies
Instruction j is said to be data dependent on instruction i if either of the following holds:
1. Instruction i produces a result that may be used by instruction j, or
2. Instruction j is data dependent on instruction k and instruction k is data dependent on instruction i
Typically, satisfying only type 1 data dependencies is sufficient for correct execution of the program, since a type 2 dependency just implies that one instruction depends on another if there exists a chain of dependencies of the first type between the two instructions.
A dependency between two instructions will only result in a data hazard if the instructions are close enough together, for the simple datapath considered in class. In general, it may also become a hazard for advanced pipelined designs in which the processor executes multiple and/or out-of-order instructions.
There are three particular data dependencies:
1. RAW (read after write): j reads a source after i writes it
2. WAW (write after write): j writes an operand after it is written by i
3. WAR (write after read): j writes a destination after it is read by i
Note that RAW is what is called a true data dependency because there is a flow of data between the instructions. WAW and WAR are called name dependencies, since the two instructions use the same register or memory location (but there is no flow of data between the instructions).
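The three cases can be illustrated with a small Python sketch that classifies the dependencies between an earlier instruction i and a later instruction j from their destination and source registers. The Instr type and the dependencies function are hypothetical names for this illustration, and the sketch deliberately ignores details such as the hardwired $0 register and memory operands.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]     # register written, or None (e.g. sw, beq)
    srcs: Tuple[str, ...]   # registers read


def dependencies(i: Instr, j: Instr) -> set:
    """Dependencies of later instruction j on earlier instruction i."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")    # j reads what i writes (true dependency)
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")    # both write the same register (name dependency)
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")    # j overwrites what i reads (name dependency)
    return found


if __name__ == "__main__":
    add = Instr("$s0", ("$t0", "$t1"))   # add $s0, $t0, $t1
    sub = Instr("$t2", ("$s0", "$t3"))   # sub $t2, $s0, $t3
    print(dependencies(add, sub))
```

In the simple in-order five-stage pipeline only the RAW case turns into a hazard; WAW and WAR matter once instructions can complete out of order.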

Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
  add $1,
  sub $4,$1,$5
  and $6,$1,$7
  or  $8,$1,$9
  xor $4,$1,$5
Read before write data hazard

Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
  lw  $1,4($2)
  sub $4,$1,$5
  and $6,$1,$7
  or  $8,$1,$9
  xor $4,$1,$5
Load-use data hazard

How About Register File Access?
  add $1,
  Inst 1
  Inst 2
  add $2,$1,
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Figure: pipeline diagram vs. time (clock cycles), marking the clock edge that controls register writing and the clock edge that controls loading of pipeline state registers]

One Way to Fix a Data Hazard
  add $1,
  stall
  stall
  sub $4,$1,$5
  and $6,$1,$7
Can fix data hazard by waiting (stall), but this impacts CPI
How to implement stall?

Forwarding: Another Way to Fix a Data Hazard
  add $1,
  sub $4,$1,$5
  and $6,$1,$7
  or  $8,$1,$9
  xor $4,$1,$5
Fix data hazards by forwarding results as soon as they are available to where they are needed
Requires extra connections in the datapath!

Forwarding Illustration
  add $1,
  sub $4,$1,$5
  and $6,$7,$1
[Figure: pipeline diagram showing the EX forwarding and MEM forwarding paths]

Yet Another Complication!
Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction. Which should be forwarded?
  add $1,$1,$2
  add $1,$1,$3
  add $1,$1,$4

Load-Use Data Hazard
- Can't always avoid stalls by forwarding
  - If value not computed when needed
  - Can't forward backward in time!

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction
- C code for A = B + E; C = B + F;

Original order (13 cycles):
         lw  $t1, 0($t0)
         lw  $t2, 4($t0)
  stall  add $t3, $t1, $t2
         sw  $t3, 12($t0)
         lw  $t4, 8($t0)
  stall  add $t5, $t1, $t4
         sw  $t5, 16($t0)

Reordered (11 cycles):
         lw  $t1, 0($t0)
         lw  $t2, 4($t0)
         lw  $t4, 8($t0)
         add $t3, $t1, $t2
         sw  $t3, 12($t0)
         add $t5, $t1, $t4
         sw  $t5, 16($t0)
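The 13-cycle and 11-cycle totals can be reproduced with a small Python sketch that counts one stall whenever an instruction reads a register loaded by the lw immediately before it. The helper names are made up for this note, and the model assumes a five-stage pipeline with full forwarding and handles only the lw/sw/add syntax used on the slide.

```python
import re


def parse(line):
    """Return (opcode, destination register or None, source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":               # sw reads both of its registers
        return op, None, regs
    return op, regs[0], regs[1:]


def count_cycles(program, n_stages=5):
    """Cycles to run program, stalling once per load-use pair."""
    parsed = [parse(line) for line in program]
    stalls = sum(
        1
        for prev, cur in zip(parsed, parsed[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )
    return (n_stages - 1) + len(program) + stalls


ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
    "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]

if __name__ == "__main__":
    print(count_cycles(ORIGINAL), count_cycles(REORDERED))
```

Moving the third lw up separates each load from its first use, so both stalls disappear without changing the result of the program.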

MIPS Pipeline Control Path Modifications
All control signals can be determined during Decode and held in the state registers between pipeline stages
[Figure: pipelined datapath (PC, Instruction Memory, Register File, Sign Extend, Shift left 2, adders, ALU, Data Memory, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers) with the Control unit and the signals PCSrc, RegWrite, ALUSrc, ALU cntl, ALUOp, Branch, MemRead, MemtoReg, RegDst]

Pipeline Control
- IF Stage: read Inst Memory (always asserted) and write PC (on System Clock)
- ID Stage: no optional control signals to set

      EX Stage                        MEM Stage                 WB Stage
      RegDst  ALUOp1  ALUOp0  ALUSrc  Brch  MemRead  MemWrite   RegWrite  MemtoReg
R     1       1       0       0       0     0        0          1         0
lw    0       0       0       1       0     1        0          1         1
sw    X       0       0       1       0     0        1          0         X
beq   X       0       1       0       1     0        0          0         X

Datapath with Forwarding Hardware
[Figure: pipelined datapath with a Forward Unit; its inputs are ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd]

Data Forwarding Control Conditions
1. EX Forward Unit (forwards the result from the previous inst. to either input of the ALU):
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 10
   if (EX/MEM.RegWrite
       and (EX/MEM.RegisterRd != 0)
       and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 10
2. MEM Forward Unit (forwards the result from the previous or second previous inst. to either input of the ALU):
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
     ForwardA = 01
   if (MEM/WB.RegWrite
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
     ForwardB = 01
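The conditions above can be transcribed almost literally into Python for experimentation. This is only a sketch: PipeReg and forward_select are names invented here, registers are plain integers, and the function is called once per ALU input (with ID/EX.RegisterRs for ForwardA and ID/EX.RegisterRt for ForwardB).

```python
from typing import NamedTuple


class PipeReg(NamedTuple):
    reg_write: bool   # RegWrite control bit carried in the pipeline register
    rd: int           # destination register number


def forward_select(ex_mem: PipeReg, mem_wb: PipeReg, id_ex_src: int) -> str:
    """Return the mux select ('10', '01' or '00') for one ALU input."""
    # EX forwarding: result of the previous instruction.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == id_ex_src:
        return "10"
    # MEM forwarding: only when the EX/MEM destination does not match.
    if (mem_wb.reg_write and mem_wb.rd != 0
            and ex_mem.rd != id_ex_src and mem_wb.rd == id_ex_src):
        return "01"
    return "00"       # no forwarding: use the register file value


if __name__ == "__main__":
    # add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4 with the third add in EX:
    print(forward_select(PipeReg(True, 1), PipeReg(True, 1), 1))
```

For the three back-to-back adds to $1, both pipeline registers hold a result for $1 and the sketch selects the EX/MEM value, i.e. the most recent one.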

Load-use Hazard Detection Unit
Need a Hazard detection Unit in the ID stage that inserts a stall between the load and its use
1. ID Hazard detection Unit:
   if (ID/EX.MemRead
       and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
         or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
     stall the pipeline
- The first line tests to see if the instruction now in the EX stage is a lw; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)
- After this one cycle stall, the forwarding logic can handle the remaining data hazards

Hazard/Stall Hardware
Along with the Hazard Unit, we have to implement the stall
- Prevent the instructions in the IF and ID stages from progressing down the pipeline, done by preventing the PC register and the IF/ID pipeline register from changing
  - Hazard detection Unit controls the writing of the PC (PC.write) and IF/ID (IF/ID.write) registers
- Insert a "bubble" between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage) (i.e., insert a nop in the execution stream)
  - Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (nop). The Hazard Unit controls the mux that chooses between the real control values and the 0's.
- Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline
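A minimal Python sketch of the load-use check and the resulting stall actions is given below. The function name and the dictionary keys are illustrative assumptions, not signal names taken from the datapath figure.

```python
def hazard_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether to stall for a load-use hazard.

    id_ex_mem_read: True when the instruction in EX is a lw.
    id_ex_rt:       destination register of that lw.
    if_id_rs/rt:    source registers of the instruction in ID.
    """
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,      # hold the PC during a stall
        "if_id_write": not stall,   # hold the IF/ID register during a stall
        "zero_control": stall,      # send zeros (a nop) into ID/EX
    }


if __name__ == "__main__":
    # lw $1, 0($2) in EX followed by add $2, $1, $1 in ID
    print(hazard_unit(True, 1, 1, 1))
```

During the stall cycle the lw keeps moving down the pipeline, so one cycle later its result can be forwarded to the dependent instruction.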

Adding the Hazard/Stall Hardware
[Figure: pipelined datapath with a Hazard Unit (inputs ID/EX.MemRead and ID/EX.RegisterRt) that controls the PC and IF/ID writes and the mux selecting between the real control values and 0, alongside the Forward Unit]

Visualizing Load-Use Stall
[Figure: five animation frames of lw $1, add $2, $1, Inst 2, Inst 3, Inst 4 in instruction order vs. time (in cycles, 1 to 8). The load-use stall condition can be detected by looking in the pipeline registers while lw is in EX and add is in ID; a nop is then inserted between lw $1 and add $2, $1, and add and the instructions after it are delayed by one cycle.]

Control Hazards
- When the flow of instruction addresses is not sequential (i.e., not PC = PC + 4); incurred by change of flow instructions
  - Unconditional branches (j, jal, jr)
  - Conditional branches (beq, bne)
  - Exceptions
- Possible approaches
  - Stall (impacts CPI)
  - Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles
  - Delay decision (requires compiler support)
  - Predict and hope for the best!
- Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards

Datapath Branch and Jump Hardware
[Figure: pipelined datapath with Branch and Jump hardware; the jump target is formed from PC+4[31-28] and the instruction's address field shifted left by 2, and PCSrc and Jump select the next PC]

Jumps Incur One Stall
- Jumps not decoded until ID, so one flush is needed
  - To flush, set IF.Flush to zero the instruction field of the IF/ID pipeline register (turning it into a noop)
    j
    flush
    j target
  Fix jump hazard by waiting (flush)
- Fortunately, jumps are very infrequent: only 3% of the SPECint instruction mix

Two Types of Stalls
- Nop instruction (or bubble) inserted between two instructions in the pipeline (as done for load-use situations)
  - Keep the instructions earlier in the pipeline (later in the code) from progressing down the pipeline for a cycle ("bounce" them in place with write control signals)
  - Insert nop by zeroing control bits in the pipeline register at the appropriate stage
  - Let the instructions later in the pipeline (earlier in the code) progress normally down the pipeline
- Flushes (or instruction squashing), where an instruction in the pipeline is replaced with a nop instruction (as done for instructions located sequentially after j instructions)
  - Zero the control bits for the instruction to be flushed

Supporting ID Stage Jumps
[Figure: pipelined datapath with Jump hardware in the ID stage; a 0 input at the instruction memory output lets the instruction field of IF/ID be zeroed on a flush]

One Way to Fix a Branch Control Hazard
  beq
  flush
  flush
  flush
  beq target
  Inst 3
Fix branch hazard by waiting (flush), but this affects CPI

Reducing the Delay of Branches
- Move the branch decision hardware back to the EX stage
  - Reduces the number of stall (flush) cycles to two
  - Adds an and gate and a 2x1 mux to the EX timing path
- Add hardware to compute the branch target address and evaluate the branch decision to the ID stage
  - Reduces the number of stall (flush) cycles to one (like with jumps)
    - But now need to add forwarding hardware in ID stage
  - Computing branch target address can be done in parallel with RegFile read (done for all instructions, only used when needed)
  - Comparing the registers can't be done until after RegFile read, so comparing and updating the PC adds a mux, a comparator, and an and gate to the ID timing path
- For deeper pipelines, branch decision points can be even later in the pipeline, incurring more stalls

ID Branch Forwarding Issues
- MEM/WB "forwarding" is taken care of by the normal RegFile write before read operation
    WB   add3 $1,
    MEM  add2 $3,
    EX   add1 $4,
    ID   beq $1,$2,Loop
    IF   next_seq_inst
- Need to forward from the EX/MEM pipeline stage to the ID comparison hardware for cases like
    WB   add3 $3,
    MEM  add2 $1,
    EX   add1 $4,
    ID   beq $1,$2,Loop
    IF   next_seq_inst
  if (IDcontrol.Branch
      and (EX/MEM.RegisterRd != 0)
      and (EX/MEM.RegisterRd = IF/ID.RegisterRs))
    ForwardC = 1
  if (IDcontrol.Branch
      and (EX/MEM.RegisterRd != 0)
      and (EX/MEM.RegisterRd = IF/ID.RegisterRt))
    ForwardD = 1
  Forwards the result from the second previous inst. to either input of the compare

ID Branch Forwarding Issues, con't
- If the instruction immediately before the branch produces one of the branch source operands, then a stall needs to be inserted (between the beq and add1) since the EX stage ALU operation is occurring at the same time as the ID stage branch compare operation
    WB   add3 $3,
    MEM  add2 $4,
    EX   add1 $1,
    ID   beq $1,$2,Loop
    IF   next_seq_inst
  - "Bounce" the beq (in ID) and next_seq_inst (in IF) in place (ID Hazard Unit deasserts PC.Write and IF/ID.Write)
  - Insert a stall between the add in the EX stage and the beq in the ID stage by zeroing the control bits going into the ID/EX pipeline register (done by the ID Hazard Unit)
- If the branch is found to be taken, then flush the instruction currently in IF (IF.Flush)

Supporting ID Stage Branches
[Figure: pipelined datapath with the branch adder and a Compare unit moved to the ID stage, IF.Flush, Branch and PCSrc signals, a Hazard Unit, and two Forward Units (one for the ALU inputs, one for the ID compare)]

Delayed Branches
- If the branch hardware has been moved to the ID stage, then we can eliminate all branch stalls with delayed branches, which are defined as always executing the next sequential instruction after the branch instruction; the branch takes effect after that next instruction
  - MIPS compiler moves an instruction to immediately after the branch that is not affected by the branch (a safe instruction), thereby hiding the branch delay
- With deeper pipelines, the branch delay grows, requiring more than one delay slot
  - Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction
  - Growth in available transistors has made hardware branch prediction relatively cheaper

Scheduling Branch Delay Slots

A. From before branch
     add $1,$2,$3
     if $2=0 then
       delay slot
   becomes
     if $2=0 then
       add $1,$2,$3

B. From branch target
     sub $4,$5,$6
     add $1,$2,$3
     if $1=0 then
       delay slot
   becomes
     add $1,$2,$3
     if $1=0 then
       sub $4,$5,$6

C. From fall through
     add $1,$2,$3
     if $1=0 then
       delay slot
     sub $4,$5,$6
   becomes
     add $1,$2,$3
     if $1=0 then
       sub $4,$5,$6

- A is the best choice, fills delay slot and reduces IC
- In B and C, the sub instruction may need to be copied, increasing IC
- In B and C, must be okay to execute sub when branch fails

Static Branch Prediction
Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome
1. Predict not taken: always predict branches will not be taken, continue to fetch from the sequential instruction stream; only when the branch is taken does the pipeline stall
   - If taken, flush instructions after the branch (earlier in the pipeline)
     - in IF, ID, and EX stages if branch logic in MEM: three stalls
     - in IF and ID stages if branch logic in EX: two stalls
     - in IF stage if branch logic in ID: one stall
   - Ensure that those flushed instructions haven't changed the machine state: automatic in the MIPS pipeline since machine state changing operations are at the tail end of the pipeline (MemWrite (in MEM) or RegWrite (in WB))
   - Restart the pipeline at the branch destination
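The cost of predict-not-taken can be estimated with a short Python sketch. The flush counts per stage come from the list above; the function name, the base CPI of 1, and the branch and taken fractions used in the example are hypothetical values chosen only for illustration.

```python
# Instructions flushed on a taken branch, by the stage holding the branch logic.
FLUSHES_IF_TAKEN = {"MEM": 3, "EX": 2, "ID": 1}


def predict_not_taken_cpi(branch_stage, branch_fraction, taken_fraction,
                          base_cpi=1.0):
    """Average CPI when only taken branches pay the flush penalty."""
    penalty = FLUSHES_IF_TAKEN[branch_stage]
    return base_cpi + branch_fraction * taken_fraction * penalty


if __name__ == "__main__":
    # Hypothetical mix: 20% branches, half of them taken.
    for stage in ("MEM", "EX", "ID"):
        print(stage, predict_not_taken_cpi(stage, 0.20, 0.50))
```

Moving the branch decision earlier in the pipeline lowers the penalty term directly, which is the motivation for resolving branches in ID.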

Flushing with Misprediction (Not Taken)
   4  beq $1,$2,2
   8  sub $4,$1,$5   (flushed)
  16  and $6,$1,$7
  20  or  r8,$1,$9
To flush the IF stage instruction, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (transforming it into a noop)

Branching Structures
- Predict not taken works well for "top of the loop" branching structures
    Loop: beq $1,$2,Out
          1st loop inst
          ...
          last loop inst
          j Loop
    Out:  fall out inst
  - But such loops have jumps at the bottom of the loop to return to the top of the loop, and incur the jump stall overhead
- Predict not taken doesn't work well for "bottom of the loop" branching structures
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst

Static Branch Prediction, con't
Resolve branch hazards by assuming a given outcome and proceeding
2. Predict taken: predict branches will always be taken
   - Predict taken always incurs one stall cycle (if branch destination hardware has been moved to the ID stage)
   - Is there a way to "cache" the address of the branch target instruction?
- As the branch penalty increases (for deeper pipelines), a simple static prediction scheme will hurt performance. With more hardware, it is possible to try to predict branch behavior dynamically during program execution
3. Dynamic branch prediction: predict branches at run-time using run-time information

Dynamic Branch Prediction
- A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains bit(s) passed to the ID stage through the IF/ID pipeline register that tell whether the branch was taken the last time it was executed
  - Prediction bit may predict incorrectly (may be a wrong prediction for this branch this iteration, or may be from a different branch with the same low order PC bits), but that doesn't affect correctness, just performance
    - Branch decision occurs in the ID stage after determining that the fetched instruction is a branch and checking the prediction bit(s)
  - If the prediction is wrong, flush the incorrect instruction(s) in the pipeline, restart the pipeline with the right instruction, and invert the prediction bit(s)
    - A 4096 bit BHT varies from 1% misprediction (nasa7, tomcatv) to 18% (eqntott)

Branch Target Buffer
- The BHT predicts when a branch is taken, but does not tell where it's taken to!
  - A branch target buffer (BTB) in the IF stage caches the branch target address, but we also need to fetch the next sequential instruction. The prediction bit in IF/ID selects which "next" instruction will be loaded into IF/ID at the next clock edge
    - Would need a two read port instruction memory
  - Or the BTB can cache the branch taken instruction while the instruction memory is fetching the next sequential instruction
- If the prediction is correct, stalls can be avoided no matter which direction they go
[Figure: PC feeding both the BTB and the Instruction Memory read address]

1-bit Prediction Accuracy
- A 1-bit predictor will be incorrect twice when not taken
  - Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the bottom of the loop code
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst
  1. First time through the loop, the predictor mispredicts the branch since the branch is taken back to the top of the loop; invert prediction bit (predict_bit = 1)
  2. As long as branch is taken (looping), prediction is correct
  3. Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken, falling out of the loop; invert prediction bit (predict_bit = 0)
- For 10 times through the loop we have an 80% prediction accuracy for a branch that is taken 90% of the time
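The 80% figure can be reproduced by simulating the three steps above in Python. The function name and the encoding of outcomes as 1 (taken) and 0 (not taken) are choices made for this sketch.

```python
def one_bit_accuracy(outcomes, predict_bit=0):
    """Fraction of correct predictions for a single 1-bit predictor."""
    correct = 0
    for taken in outcomes:
        if predict_bit == taken:
            correct += 1
        else:
            predict_bit = taken   # wrong: invert the prediction bit
    return correct / len(outcomes)


if __name__ == "__main__":
    # bne at the bottom of a loop executed 10 times: taken 9 times, then falls out.
    loop_branch = [1] * 9 + [0]
    print(one_bit_accuracy(loop_branch))
```

The two mispredictions are the first iteration (the bit starts at not taken) and the loop exit, so 8 of the 10 predictions are correct.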

2-bit Predictors
- A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst
  - right on 1st iteration
  - right 9 times
  - wrong on loop fall out
- BHT also stores the initial FSM state
[Figure: four-state FSM with states 11 and 10 (Predict Taken) and 01 and 00 (Predict Not Taken), with Taken / Not taken transitions between them]
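A common way to realize a 2-bit scheme is a saturating counter, sketched below in Python. This is an assumption for illustration: the exact transitions of the FSM on the slide may differ, and the starting state (3, strongly taken) is chosen to match the "right on 1st iteration" note.

```python
def two_bit_accuracy(outcomes, state=3):
    """Accuracy of a 2-bit saturating counter (states 0..3).

    States 2 and 3 predict taken; states 0 and 1 predict not taken.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == bool(taken):
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    loop_branch = [1] * 9 + [0]
    print(two_bit_accuracy(loop_branch))      # one pass through the loop
    print(two_bit_accuracy(loop_branch * 2))  # loop entered twice
```

A single not-taken outcome at loop exit only weakens the prediction, so the next time the loop is entered the first iteration is still predicted correctly.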

Dealing with Exceptions
- Exceptions (aka interrupts) are just another form of control hazard. Exceptions arise from
  - R-type arithmetic overflow
  - Trying to execute an undefined instruction
  - An I/O device request
  - An OS service request (e.g., a page fault, TLB exception)
  - A hardware malfunction
- The pipeline has to stop executing the offending instruction in midstream, let all prior instructions complete, flush all following instructions, set a register to show the cause of the exception, save the address of the offending instruction, and then jump to a prearranged address (the address of the exception handler code)
- The software (OS) looks at the cause of the exception and deals with it

Two Types of Exceptions
- Interrupts: asynchronous to program execution
  - caused by external events
  - may be handled between instructions, so can let the instructions currently active in the pipeline complete before passing control to the OS interrupt handler
  - simply suspend and resume user program
- Traps (Exception): synchronous to program execution
  - caused by internal events
  - condition must be remedied by the trap handler for that instruction, so must stop the offending instruction midstream in the pipeline and pass control to the OS trap handler
  - the offending instruction may be retried (or simulated by the OS) and the program may continue, or it may be aborted

Where in the Pipeline Exceptions Occur

                        Stage(s)?  Synchronous?
Arithmetic overflow     EX         yes
Undefined instruction   ID         yes
TLB or page fault       IF, MEM    yes
I/O service request     any        no
Hardware malfunction    any        no

Beware that multiple exceptions can occur simultaneously in a single clock cycle

Multiple Simultaneous Exceptions
[Figure: Inst 0 to Inst 4 in instruction order vs. time, with a D$ page fault, an arithmetic overflow, an undefined instruction, and an I$ page fault marked on different instructions]
Hardware sorts the exceptions so that the earliest instruction is the one interrupted first
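The sorting rule can be sketched in Python for exceptions raised in the same clock cycle: in an in-order pipeline, the instruction furthest down the pipeline is the earliest in program order. The stage ranking table and the function name below are assumptions made for this illustration, not part of the hardware described on the slides.

```python
# In one clock cycle, a later stage holds an earlier instruction.
STAGE_RANK = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3}


def first_exception(pending):
    """Pick the exception belonging to the earliest instruction.

    pending: list of (stage, cause) pairs raised in the same cycle.
    """
    return max(pending, key=lambda item: STAGE_RANK[item[0]])


if __name__ == "__main__":
    same_cycle = [
        ("IF", "I$ page fault"),
        ("ID", "undefined instruction"),
        ("EX", "arithmetic overflow"),
        ("MEM", "D$ page fault"),
    ]
    print(first_exception(same_cycle))
```

The exceptions of the younger instructions are not lost: those instructions are flushed and raise their exceptions again if they are re-executed.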

Additions to MIPS to Handle Exceptions (Fig 6.42)
- Cause register (records exceptions): hardware to record the exceptions in Cause, and a signal to control writes to it (CauseWrite)
- EPC register (records the addresses of the offending instructions): hardware to record the address of the offending instruction in EPC, and a signal to control writes to it (EPCWrite)
  - Exception software must match exception to instruction
- A way to load the PC with the address of the exception handler
  - Expand the PC input mux so that the new input is hardwired to the exception handler address (e.g., 8000 0180 hex for arithmetic overflow)
- A way to flush the offending instruction and the ones that follow it
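As a rough software model of these additions, the sketch below records the cause and the offending address, then redirects the PC to the handler. The class, function, and cause-code constants are hypothetical names chosen for illustration; the numeric cause codes are assumptions. Only the handler address 8000 0180 hex and the register names come from the slide. It also assumes the exception is detected in EX, so all three flush signals are asserted.

```python
from dataclasses import dataclass

HANDLER_ADDRESS = 0x8000_0180   # exception handler address given on the slide
CAUSE_UNDEFINED = 10            # assumed cause code for an undefined instruction
CAUSE_OVERFLOW = 12             # assumed cause code for arithmetic overflow


@dataclass
class ExceptionState:
    pc: int = 0      # program counter
    epc: int = 0     # address of the offending instruction
    cause: int = 0   # reason for the exception


def take_exception(state, offending_pc, cause_code):
    """Model CauseWrite, EPCWrite, and the extra PC mux input."""
    state.cause = cause_code        # CauseWrite
    state.epc = offending_pc        # EPCWrite
    state.pc = HANDLER_ADDRESS      # PC mux selects the hardwired handler address
    # Assumption: detected in EX, so the offending instruction and the
    # two younger instructions behind it are flushed.
    return {"IF.Flush": True, "ID.Flush": True, "EX.Flush": True}


state = ExceptionState(pc=0x0040_0010)
flush = take_exception(state, offending_pc=0x0040_0008, cause_code=CAUSE_OVERFLOW)
print(hex(state.pc), hex(state.epc), state.cause, flush)
```

The handler software would then read the cause and the saved address to decide whether to retry the instruction or abort the program.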

Datapath with Controls for Exceptions

[Figure: pipelined datapath extended for exceptions. Recoverable labels: PC, PCSrc, Instruction Memory, IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, RegFile, Sign Extend (16 to 32), Shift left 2, Compare, Branch, Data Memory, Hazard Unit, Control, Forward Unit, the Cause and EPC registers, the IF.Flush, ID.Flush, and EX.Flush signals, and the handler address 8000 0180 hex as an input to the PC mux.]

Stalling vs. Flushing Example

Stall here:
  Inst1: lw $1, 0($2)
  Inst2: add $2, $1, $1
  Inst3: add $3, $2, $1
  Inst4: bne $1, $1, label
  Inst5: and $1, $2, $3
  Inst6: or $1, $1, $1

Flush here (assuming no delay slot):
  Inst1: j Inst4
  Inst2: add $2, $1, $1
  Inst3: add $3, $2, $1
  Inst4: bne $1, $1, label
  Inst5: and $1, $2, $3
  Inst6: or $1, $1, $1

[Figure: pipeline timing diagrams for both sequences over cycles 1 to 11, one row per instruction (Inst1 to Inst6, plus an inserted nop), with stages IF ID EX M W. Recoverable rows include Inst1 as IF ID EX M W, a nop bubble as EX M W, one row that repeats ID (IF ID ID EX M W), and one row that repeats IF (IF IF ID EX M W); the remaining rows are IF ID EX M W. Figure labels: forwarding, Insert nop.]
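The stall in the first sequence is a load-use hazard: Inst2 reads $1 in the cycle right after the lw that produces it. A minimal sketch of that check follows. It assumes full forwarding, so only a load followed immediately by a consumer needs a one-cycle stall. The tuple encoding and function name are hypothetical, and branch timing and register $0 are ignored.

```python
def load_use_stalls(program):
    """Return the stall cycles inserted before each instruction.

    program: list of (opcode, destination, sources) tuples.
    Assumption: with forwarding, only an instruction that immediately
    follows a load and reads the loaded register stalls (one cycle).
    """
    stalls = []
    previous = None
    for opcode, dest, sources in program:
        hazard = (
            previous is not None
            and previous[0] == "lw"
            and previous[1] in sources
        )
        stalls.append(1 if hazard else 0)
        previous = (opcode, dest, sources)
    return stalls


# The "stall" sequence from the example above.
program = [
    ("lw",  "$1", ("$2",)),
    ("add", "$2", ("$1", "$1")),
    ("add", "$3", ("$2", "$1")),
    ("bne", "",   ("$1", "$1")),
    ("and", "$1", ("$2", "$3")),
    ("or",  "$1", ("$1", "$1")),
]
stalls = load_use_stalls(program)
print(stalls)
```

A stall holds the dependent instruction in place and inserts a bubble ahead of it, whereas a flush discards instructions that were fetched after a taken jump or branch.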

The BIG Picture: Pipeline Summary
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
- Instruction set design affects complexity of pipeline implementation
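The throughput point can be made concrete with a cycle count. The sketch below assumes an ideal pipeline in which every stage takes one cycle, and compares it with running each instruction to completion before starting the next. The function names are hypothetical.

```python
def pipelined_cycles(instructions, stages=5, stall_cycles=0):
    """Cycles to finish all instructions in an ideal pipeline.

    The first instruction takes `stages` cycles; each later one
    completes one cycle after its predecessor, plus any stalls.
    """
    return stages + (instructions - 1) + stall_cycles


def unpipelined_cycles(instructions, stages=5):
    """Cycles if each instruction finishes before the next starts."""
    return instructions * stages


n = 6
print(unpipelined_cycles(n), pipelined_cycles(n), pipelined_cycles(n, stall_cycles=1))
```

Each instruction still spends five cycles in flight, so latency is unchanged; the gain comes from overlapping instructions, and every hazard-induced stall cycle eats into it.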

Other Sample Pipeline Alternatives

ARM7 (stages: IM, Reg, EX)
- IM: PC update, IM access
- Reg: decode, reg access
- EX: op, DM access, shift/rotate, commit result (write back)

XScale (stage labels as extracted from the figure: IM1, IM2, Reg, DM1, Reg, SHFT, DM2)
- PC update, BTB access, start IM access
- IM access
- decode, reg 1 access
- shift/rotate, reg 2 access
- op
- DM write, reg write
- start DM access, exception

Acknowledgments
Some of the slides contain material developed and copyrighted by M.J. Irwin (Penn State), B. Parhami (UCSB), and instructor material for the textbook.