Computer Architecture Pipelines. Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.

Similar documents
Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

WAR: Write After Read

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations

CS352H: Computer Systems Architecture

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Chapter 5 Instructor's Manual


Using Graphics and Animation to Visualize Instruction Pipelining and its Hazards

Solutions. Solution The values of the signals are as follows:

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

Instruction Set Architecture (ISA)

Computer organization

Computer Organization and Architecture

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

LSN 2 Computer Processors

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Instruction Set Design

PROBLEMS #20,R0,R1 #$3A,R2,R4

Pipelining Review and Its Limitations

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

A Lab Course on Computer Architecture

CPU Performance Equation

CS521 CSE IITG 11/23/2012

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

An Introduction to the ARM 7 Architecture

CPU Organization and Assembly Language

(Refer Slide Time: 00:01:16 min)

Introducción. Diseño de sistemas digitales.1

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

Review: MIPS Addressing Modes/Instruction Formats

MICROPROCESSOR AND MICROCOMPUTER BASICS

Data Dependences. A data dependence occurs whenever one instruction needs a value produced by another.

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December :59

In the Beginning The first ISA appears on the IBM System 360 In the good old days

Computer Organization and Components

Let s put together a Manual Processor

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

EC 362 Problem Set #2

Course on Advanced Computer Architectures

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Central Processing Unit (CPU)

Week 1 out-of-class notes, discussions and sample problems

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

Intel 8086 architecture

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

Computer Architectures

Computer Architecture TDTS10

PROBLEMS. which was discussed in Section

Instruction Set Architecture

EECS 427 RISC PROCESSOR

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

Thread level parallelism

Introduction to Cloud Computing

Reduced Instruction Set Computer (RISC)

IA-64 Application Developer s Architecture Guide

VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU

Introduction to Microprocessors

StrongARM** SA-110 Microprocessor Instruction Timing

TIMING DIAGRAM O 8085

CHAPTER 7: The CPU and Memory

Instruction Set Architecture. Datapath & Control. Instruction. LC-3 Overview: Memory and Registers. CIT 595 Spring 2010

İSTANBUL AYDIN UNIVERSITY

on an system with an infinite number of processors. Calculate the speedup of

Chapter 9 Computer Design Basics!

Chapter 2 Topics. 2.1 Classification of Computers & Instructions 2.2 Classes of Instruction Sets 2.3 Informal Description of Simple RISC Computer, SRC

EE361: Digital Computer Organization Course Syllabus

A SystemC Transaction Level Model for the MIPS R3000 Processor

Microprocessor & Assembly Language

Chapter 2 Logic Gates and Introduction to Computer Architecture

Instruction Set Architecture (ISA) Design. Classification Categories

8086 Microprocessor (cont..)

ASSEMBLY PROGRAMMING ON A VIRTUAL COMPUTER

Static Scheduling. option #1: dynamic scheduling (by the hardware) option #2: static scheduling (by the compiler) ECE 252 / CPS 220 Lecture Notes

Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x

VLIW Processors. VLIW Processors

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Microprocessor and Microcontroller Architecture

The WIMP51: A Simple Processor and Visualization Tool to Introduce Undergraduates to Computer Organization

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

MACHINE ARCHITECTURE & LANGUAGE

A s we saw in Chapter 4, a CPU contains three main sections: the register section,

Comp 255Q - 1M: Computer Organization Lab #3 - Machine Language Programs for the PDP-8

Faculty of Engineering Student Number:

An Overview of Stack Architecture and the PSC 1000 Microprocessor

The Microarchitecture of Superscalar Processors

4 RISC versus CISC Architecture

Computer Systems Design and Architecture by V. Heuring and H. Jordan

The 104 Duke_ACC Machine

Transcription:

Computer Architecture Pipelines Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.

Instruction Execution

Instruction Execution 1. Instruction Fetch (IF) - Get the instruction to execute. IR M[PC] NPC PC + 4 2. Instruction Decode/Register Fetch (ID) Figure out what the instructions is supposed to do and what it needs. A Register[R s1 ] B Register[R s2 ] Imm {(IR 16 ) 16, IR 15..0 }

Instruction Execution 3. Execution (EX) The instruction has been decoded, so execution can be split according to instruction type. Reg-Reg ALUout A op B Reg-Imm ALUout A op Imm Branch ALUout NPC + Imm Target cond (A{=,!=}0) LD/ST ALUout A op Imm Eff Address Jump??

Instruction Execution 4. Memory Access/Branch Completion (MEM) Besides the IF stage this is the only stage that access the memory to load and store data. Load: Load Memory Data = LMD = Mem[ALUout] Store: Mem[ALUout] B Branch: PC (cond)?aluout:npc Jump/JAL:?? JR: PC A ELSE: PC NPC 5. Write-Back (WB) Store all the results and loads back to registers. ALU Instruction: Rd ALUoutput Load: Rd LMD JAL: R31 Old NPC

What is Pipelining?? Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.

Pipelining Time 1 2 3 4 5 6 7 Instr 1 IF ID EX MEM WB Instr 2 IF ID EX MEM WB Instr 3 IF ID EX MEM WB Instr 4 IF ID EX MEM Instr 5 IF ID EX What are the overheads of pipelinging? What are the complexities? What types of pipelines are there?

Pipelining

Pipelining Pipeline Overhead Latches between stages Must latch data and control Expansion of stage delay Original time was IF + ID + EX + MEM + WB With phases skipped for some instructions New best time is MAX (IF,ID,EX,MEM,WB) + Latch Delay More memory ports or separate memories Requires complicated control to make as fast as possible But so do non-pipelined high performance designs

Pipelining Pipeline complexities 3 or 5 or 10 instructions are executing at the same time What if the instructions are dependent? Must have hardware or software ways to ensure proper execution. What if an interrupt occurs? Would like to be able to figure out what instruction caused it!!

Pipeline Stalls Structural What causes pipeline stalls? 1. Structural Hazard Inability to perform 2 tasks. Example: Memory needed for both instructions and data SpecInt92 has 35% load and store If this remains a hazard, each load or store will stall one IF.

Pipeline Stalls Data 2.1 Data Hazards Read after Write RAW Data is needed before it is written. Calc Done R1 Written ADD R1,R2,R3 IF ID EX MEM WB ADD R4,R1,R5 IF ID EX MEM WB R1 Read? ADD R4,R1,R5 IF ID R1 Used EX MEM WB ADD R4,R2,R3 IF ID EX MEM WB TIME = 1 2 3 4 5 6 7 8 R1 Read?

Pipeline Stalls Data Data must be forwarded to other instructions in pipeline if ready for use but not yet stored. In 5-stage MIPS pipeline, this means forwarding to next three instructions. Can reduce to 2 instructions if register bank can process a simultaneous read and write. Text indicates this as writing in first half of WB and reading in second half of ID. More likely, both occur at once, with read receiving the data as it is written.

Pipeline Stalls Data Load Delays Data available at end of MEM, not in time for the Instr + 1 s EX phase.

Pipeline Stalls Data So insert a bubble

Pipeline Stalls Data A Store requires two registers at two different times: Rs1 is required for address calculation during EX Data to be stored (Rd) is required during MEM Data can be loaded and immediately stored Data from the ALU must be forwarded to MEM for stores

Pipeline Stalls Data 2.2 Data Hazards Write After Write WAW Out-of-order completion of instructions FMUL1 FMUL2 FDIV R1, R2, R3 FMUL R1, R2, R3 IF ID/RF ISSUE WB FDIV1 FDIV2 FDIV3 FDIV4 Length of FDIV pipe could cause WAW error

Pipeline Stalls Data 2.3 Data Hazards Write After Read WAR A slow reader followed by a fast writer IF ID ISSUE Read registers for multiply FLOAD1 FMADD1 FMADD2 FMADD3 FMADD4 WB Read registers for add FMADD R1, R2, R3, R4 IF ID IS FM1 FM2 FM3 FM4 WB FLOAD R4, #300(R5) IF ID IS F1 WB

Pipeline Stalls Data 2.4 Data Hazards Read After Read RAR A read after a read is not a hazard Assuming that a read does not change any state

Pipeline Stalls Control 3. Control Hazards Branch, Jump, Interrupt Or why computers like straight line code 3.1 Control Hazards Jump Jump IF ID EX MEM WB IF ID EX MEM WB When is knowledge of jump available? When is target available? With absolute addressing, may be able to fit jump into IF phase with zero overhead.

Pipeline Stalls Control This relies on performing simple decode in IF. If IF is currently the slowest phase, this will slow the clock. Relative jumps would require an adder and its delay.

Pipeline Stalls Control 3.2 Control Hazards Branch Branch IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB Branch is resolved in MEM and information forwarded to IF, causing three stalls. 16% of the SpecInt instructions are branches, and about 67% are taken, CPI = CPI IDEAL + STALLS = 1 + 3(0.67)(0.16) = 1.32 Must Resolve branch sooner Update PC sooner Fetch correct next instruction more frequently

Pipeline Stalls Control Delayed branch and branch delay slot Ideally, fill slot with instruction from before branch Otherwise from target (since branches are more often taken than not) Or from fall-through Slots filled 50-70% of time Interrupt recovery complicated

Pipelines Summary of Pipelines Why use a pipeline? Seems so complicated. What are some of the overheads?

Other Architectures Motorola 68000 Family (circa late 70 s) Instructions can specify 8, 16, or 32 bit operands. Has 32 bit registers. Has 2 register files A and D both have 8 32-bit registers. (A is for addresses and D is for Data). Used to save bits in the opcode since implicit use of A or D is in the instruction. Uses a two-address instruction set. Instructions are usually 16-bits but can have up to 4 more 16-bit words. Supports lots of addressing modes.

Other Architectures Intel x86 Family (circa late 70 s) Probably the least elegant design available. Uses 16-bit words, so only 64k unique address can be provided. Used segment addressing to overcome this limit. Basically shift the address by 4 (multiply by 16) and then add a 16-bit offset giving and effective 20-bit address, thus 1MB of memory. Has 4 registers (segment registers) used to hold segment addresses: CS, DS, SS, ES. Has 4 general registers: AX, BX, CX, and DX. Expanded to 32-bit addressing with the 386.

Other Architectures Sun SPARC Architecture (late 1980 s) Scalable Processor ARChitecture Developed around the same time as the MIPS architecture. Very similar to MIPS except for register usage. Has a set of global registers (8) like MIPS. Has a large set of registers from which only a subset can be accessed at any time, called a register file. Is a load/store architecture with 3-addressing instructions. Has 32-bit registers and all instructions are 32-bits. Designed to support procedure call and returns efficiently. Number of register in the register file vary with implementation but at most 24 can be accessed. Those 24 plus the 8 global give a register window.