CSE 220: System Fundamentals I Unit 13: MIPS Architecture: Single-Cycle Processors


Microarchitecture

For most of the remainder of the semester we will look at different ways to piece together a MIPS CPU. We will study microarchitecture, which is how an architecture is actually implemented in hardware. A particular architecture can be implemented in a variety of ways that trade off performance, cost and complexity.

More specifically, for a CPU we need to determine:
  datapath: the functional blocks that do the work of the CPU (memory, registers, ALU, muxes)
  control: the control signals that direct the functional blocks to perform the desired operations

We will explore three implementations of the MIPS architecture:
  Single-cycle: each instruction executes in a single cycle
  Multicycle: each instruction is broken into a series of shorter steps, each of which executes in one cycle
  Pipelined: each instruction is broken into a series of steps, and multiple instructions execute simultaneously

Any microarchitecture must represent the architectural state of the CPU, which consists of (the values of) the program counter, the 32 registers and the memory. The control unit receives the current instruction from the datapath and tells the datapath how to execute the instruction.

We'll start with the design of the single-cycle processor. In the beginning we will consider only a subset of MIPS instructions:
  R-type instructions: and, or, add, sub, slt
  Memory instructions: lw, sw
  Branch instructions: beq

Later we will add other MIPS instructions to the datapath. We will even invent our own instructions, add them to the architecture, and see how the control and datapath must be modified to handle them.

State Elements

These are the units that store values and represent the architectural state. In the diagrams, black lines indicate data buses, whereas blue lines indicate control signals. Although it is not realistic, for now we will assume that instructions and data are stored in two physically separate memories.

PC gives the address of the current instruction, and PC' gives the address of the next instruction. CLK is the system clock, which oscillates between low and high voltage to synchronize the elements. State elements change their state only on the rising edge of the clock (the transition from low to high), so all elements change state simultaneously.

The instruction memory has a single read port (RD). The 32-bit instruction address input (A) determines which instruction is sent out on RD.

The register file contains thirty-two 32-bit registers, two read ports and one write port. A1, A2 and A3 take 5-bit addresses that indicate which registers should be read (A1 and A2, for RD1 and RD2) and/or written (A3, for WD3). WE3 is the write enable control signal. When WE3 is 1, the register file writes the data on WD3 into the register given by A3 on the rising edge of the clock.

Finally, the data memory has a single read/write port. If WE is asserted, the data from WD is written to address A on the rising edge. Otherwise, WE is 0 and the memory reads the data at address A and sends it out on RD.
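The register file behavior described above can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions, not the hardware: the class and method names are hypothetical, reads are treated as combinational, and register 0 is assumed to stay at 0 as in MIPS.

```python
class RegisterFile:
    """Behavioral sketch: thirty-two 32-bit registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, a1, a2):
        # Combinational read: RD1 and RD2 simply follow A1 and A2.
        return self.regs[a1], self.regs[a2]

    def rising_edge(self, a3, wd3, we3):
        # State changes only on the rising clock edge, and only when WE3 is 1.
        # Assumption: register 0 ($zero) is never overwritten.
        if we3 and a3 != 0:
            self.regs[a3] = wd3 & 0xFFFFFFFF


rf = RegisterFile()
rf.rising_edge(a3=8, wd3=42, we3=1)  # written, because WE3 is asserted
rf.rising_edge(a3=9, wd3=99, we3=0)  # ignored, because WE3 is 0
rd1, rd2 = rf.read(8, 9)
print(rd1, rd2)
```

The second write is dropped because its write enable is 0, which is the same mechanism the control unit later relies on to protect the architectural state.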

Single-Cycle Datapath

Our first iteration of a MIPS CPU will feature a single-cycle datapath, which means that the CPU executes each instruction in exactly one clock cycle. The first step is to read the instruction from instruction memory. The instruction memory reads out (fetches) the 32-bit instruction, labeled Instr.

Single-Cycle Datapath: lw

Let's consider the lw instruction. lw is an I-type instruction:

  lw rt, imm(rs)
  Reg[rt] = Mem[Reg[rs]+SignImm]

You will need to know the instruction formats for the quizzes and final exam, so start studying them now.

For lw, we need to read the source register containing the base address. This is rs, i.e., Instr[25:21]. We need to send these bits to the register file's address input A1, which reads the register value onto RD1.

lw also needs an offset, which is given in Instr[15:0]. This (signed) offset must be sign-extended to 32 bits. We call the result SignImm.
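As a small illustration of the field extraction and sign extension described for lw, the sketch below splits an instruction word using the standard MIPS I-type field positions (op in bits 31:26, rs in 25:21, rt in 20:16, immediate in 15:0). The function names are hypothetical and chosen only for this example.

```python
def fields(instr):
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "op": (instr >> 26) & 0x3F,   # bits 31:26
        "rs": (instr >> 21) & 0x1F,   # bits 25:21, base register for lw
        "rt": (instr >> 16) & 0x1F,   # bits 20:16, destination register for lw
        "imm": instr & 0xFFFF,        # bits 15:0, unsigned raw field
    }


def sign_extend16(imm):
    """Interpret a 16-bit field as a two's complement value (SignImm)."""
    return imm - 0x10000 if imm & 0x8000 else imm


# lw $t0, -4($sp) encodes as op=0x23, rs=29, rt=8, imm=0xFFFC.
instr = 0x8FA8FFFC
f = fields(instr)
sign_imm = sign_extend16(f["imm"])
print(f, sign_imm)
```

Here the raw immediate 0xFFFC becomes the offset -4 after sign extension, which is the value the ALU adds to the base address.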

Now, this offset must be added to the base address that came out of RD1, so we need an adder; we use the ALU for this. ALUControl is a 3-bit control signal that tells the ALU what operation to perform. Here, 010 indicates addition.

The output of the ALU gives the effective address of the word we want to read from memory, so this address is sent to the data memory (via A), which reads the word from memory and places it on the ReadData bus.

The ReadData bus carries the data word to the register file's write port, WD3. The rt field of the instruction (Instr[20:16]) is sent to A3, which indicates the register that will receive the data. The control signal RegWrite is asserted so that the data value is written into the register file on the rising edge of the clock at the end of the cycle.

While all of this is happening, the CPU must compute the address of the next instruction, PC'. We use an adder to add 4 to the PC and write the new value into the PC on the rising edge of the clock.

Single-Cycle Datapath: sw

sw is executed in a similar way to lw and uses much of the same datapath:

  sw rt, imm(rs)
  Mem[Reg[rs]+SignImm] = Reg[rt]

The RegWrite signal is not asserted because we are reading from the register file, not writing to it. The data memory still drives a value onto ReadData, but that value is simply ignored. The MemWrite signal is asserted because we are writing to the memory.

Single-Cycle Datapath: R-Type

Now we'll see how to extend the datapath to handle the R-type instructions add, sub, and, or and slt. All of these read two registers, perform an ALU operation, and write the result back to the register file. Example: Reg[rd] = Reg[rs] + Reg[rt] for add. So all of them use the same hardware but have different ALUControl signals.
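The idea that the five R-type instructions share one ALU and differ only in ALUControl can be sketched as follows. The 3-bit encodings used here (000 and, 001 or, 010 add, 110 subtract, 111 slt) follow a common textbook convention and should be read as an assumption, as should the function names.

```python
MASK = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(src_a, src_b, alu_control):
    """Return (ALUResult, Zero) for 32-bit operands. Encodings are assumed."""
    if alu_control == 0b000:
        result = src_a & src_b            # and
    elif alu_control == 0b001:
        result = src_a | src_b            # or
    elif alu_control == 0b010:
        result = src_a + src_b            # add
    elif alu_control == 0b110:
        result = src_a - src_b            # subtract
    elif alu_control == 0b111:
        result = int(to_signed32(src_a) < to_signed32(src_b))  # slt
    else:
        raise ValueError("ALUControl value not used in this sketch")
    result &= MASK                        # keep the result to 32 bits
    return result, int(result == 0)


print(alu(5, 7, 0b010))           # add
print(alu(5, 5, 0b110))           # subtract, result is zero
print(alu(0xFFFFFFFF, 1, 0b111))  # slt with -1 < 1
```

The Zero output is included because the branch logic uses it to decide whether two registers are equal.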

First, a mux has been added to let us control the source of the SrcB value to the ALU (control signal ALUSrc). For an R-type instruction the operand comes from the register file. For the lw and sw instructions the operand comes from the instruction itself. So ALUSrc is 0 for R-type instructions and 1 for lw and sw. We will use muxes again for selecting the source of other inputs.

When doing a lw, we needed to write to the register file. R-type instructions also write to the register file. We add a mux and a control signal, MemtoReg, to choose between ReadData and ALUResult. MemtoReg is 0 for R-type instructions to choose Result from ALUResult, and MemtoReg is 1 for lw to choose ReadData. We don't care about the value of MemtoReg for sw because sw does not write to the register file.

For lw, we write data to register rt (Instr[20:16]). For an R-type instruction we write to register rd (Instr[15:11]). We add a control signal, RegDst, to choose WriteReg from the appropriate field of the instruction. RegDst is 1 for R-type instructions (to select rd, Instr[15:11]) and 0 for lw (to select rt, Instr[20:16]). We don't care about the value of RegDst for sw.

Single-Cycle Datapath: beq

beq is an I-type instruction. beq compares two registers and adds a branch offset (with the help of Instr[15:0]) to the PC if the registers' contents are equal. To test for equality, we compute (rs - rt) and check whether the difference (ALUResult) equals zero. So ALUControl is 110 to direct the ALU to do subtraction.

Instr[15:0] is the number of instructions to skip over, so we sign-extend the offset and multiply it by 4:

$PC' = PCBranch = PC + 4 + SignImm \times 4$
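The branch target computation described above can be checked with a short sketch. The function names are hypothetical, and the `taken` flag stands in for the outcome of the equality test.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret a 16-bit field as a two's complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc, imm16, taken):
    """Return the next PC for a beq at address pc."""
    pc_plus4 = (pc + 4) & MASK
    # The offset counts instructions, so it is multiplied by 4 (shift left by 2).
    pc_branch = (pc_plus4 + (sign_extend16(imm16) << 2)) & MASK
    return pc_branch if taken else pc_plus4


print(hex(next_pc(0x00400010, 0x0003, taken=True)))   # skip 3 instructions forward
print(hex(next_pc(0x00400010, 0xFFFE, taken=True)))   # branch 2 instructions back
print(hex(next_pc(0x00400010, 0x0003, taken=False)))  # fall through to PC + 4
```

Note that the offset is relative to PC + 4, not to the address of the branch itself.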

Normally, $PC' = PCPlus4 = PC + 4$, so we need to add a mux to choose which formula to use. If the control signal Branch is asserted and the ALU produces a result of 0, then PCSrc is 1 and we choose PCBranch.

Complete Single-Cycle CPU

For beq, ALUSrc is 0 to select SrcB from the register file. RegWrite and MemWrite are both 0 (no writing takes place). We don't care about the values of RegDst and MemtoReg because we are not writing to the register file.

Single-Cycle Control Unit

The control signals are based on the opcode and funct fields of instructions, which are Instr[31:26] and Instr[5:0], respectively. R-type instructions use the funct field to determine the operation.

  ALUOp   Meaning
  00      Add
  01      Subtract
  10      Look at funct
  11      Not used

Since 11 is not used, we can use X1 to represent Subtract, and 1X for Look at funct. Moreover, for R-type instructions, we actually need only funct[3:0] because the first two bits of funct are always 10. This simplifies the ALU decoder a little.

  ALUOp[1:0]   funct          ALUControl[2:0]
  00           X              010 (Add)
  X1           X              110 (Subtract)
  1X           100000 (add)   010 (Add)
  1X           100010 (sub)   110 (Subtract)
  1X           100100 (and)   000 (And)
  1X           100101 (or)    001 (Or)
  1X           101010 (slt)   111 (SLT)

Truth table for the main decoder:

  Instr.   Op[5:0]   RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp[1:0]
  R-type
  lw
  sw
  beq

Example: Control Signals and Data Flow for or Operation

Datapaths: An Important Note

We just highlighted only the part of the datapath used by the or instruction. This doesn't mean that the parts we did not highlight are turned off or made inactive somehow. Signals are always traveling down wires, even when the CPU is doing no useful work or is not using part of its circuitry to perform a computation. This is why it's important to set control signals to prevent unintended writes and other changes to the architectural state.
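The two-level decoding described above (a main decoder driven by the opcode, and an ALU decoder driven by ALUOp and funct) can be sketched as lookup tables. The signal values below follow the rules stated in the text together with a common textbook encoding; treat them, and all names in the code, as assumptions rather than the definitive table. `None` marks a don't-care.

```python
# Columns: RegWrite, RegDst, ALUSrc, Branch, MemWrite, MemtoReg, ALUOp
MAIN_DECODER = {
    0b000000: (1, 1, 0, 0, 0, 0, 0b10),        # R-type
    0b100011: (1, 0, 1, 0, 0, 1, 0b00),        # lw
    0b101011: (0, None, 1, 0, 1, None, 0b00),  # sw
    0b000100: (0, None, 0, 1, 0, None, 0b01),  # beq
}

# funct field -> assumed ALUControl encoding
ALU_BY_FUNCT = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op, funct):
    if alu_op == 0b00:
        return 0b010              # add (lw, sw)
    if alu_op & 0b01:
        return 0b110              # X1: subtract (beq)
    return ALU_BY_FUNCT[funct]    # 1X: look at funct (R-type)


def control(instr):
    """Return (main decoder outputs without ALUOp, ALUControl)."""
    signals = MAIN_DECODER[(instr >> 26) & 0x3F]
    return signals[:-1], alu_decoder(signals[-1], instr & 0x3F)


print(control(0x012A4025))  # or $t0, $t1, $t2
print(control(0x8FA8FFFC))  # lw $t0, -4($sp)
```

Adding an instruction that needs no new hardware, such as addi, then amounts to adding one entry to the main decoder table.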

Single-Cycle Datapath: addi

addi is an I-type instruction that adds the value in a register to the immediate and writes the result to another register:

  Reg[rt] = Reg[rs] + immediate

We already have all the datapath hardware we need to implement this instruction, so all that's needed is to add a row to the main decoder truth table:

  Instr.   Op[5:0]   RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp[1:0]
  addi

Single-Cycle Datapath: j

The j instruction is a J-type instruction that writes a new value to the PC. What hardware could/should we add to implement j?

The two least significant bits of the PC are always 00 (why?). The next 26 least significant bits of the new PC (PC') are taken from the jump address field (Instr[25:0]). Finally, the four most significant bits are taken from the value of the current PC+4. Our current datapath cannot perform this computation to determine PC', so we need more hardware.

The main decoder truth table gains a Jump column and a row for j:

  Instr.   Op[5:0]   RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp[1:0]   Jump
  j

Highlight the part of the datapath used by the j instruction.
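The way the jump target is assembled from three pieces can be sketched as below. The function name is hypothetical; the bit positions are the standard MIPS J-type layout.

```python
def jump_target(pc, instr):
    """Build the new PC for a j instruction located at address pc."""
    addr26 = instr & 0x03FFFFFF           # jump address field, bits 25:0
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus4 & 0xF0000000        # four most significant bits of PC + 4
    # Shifting left by 2 supplies the two low-order zero bits.
    return upper4 | (addr26 << 2)


# A j to address 0x00400020 stores 0x00400020 >> 2 in its 26-bit field.
instr = (0b000010 << 26) | (0x00400020 >> 2)
print(hex(instr), hex(jump_target(0x00400000, instr)))
```

Because instructions are word-aligned, the low two bits never need to be stored, which is how 26 bits of the instruction cover 28 bits of address.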

Single-Cycle Performance

Every instruction in the single-cycle CPU takes 1 clock cycle, even simple ones like j. So the clock period is constant and must be long enough to accommodate the slowest instruction. A MIPS program that executes N instructions will take (N × delay_of_slowest_instruction) picoseconds to complete.

In most implementation technologies, the ALU, memory and register file accesses are much slower than other operations. The lw instruction requires that data be read from and written to the register file and also read from memory. This makes it one of the slowest instructions, so let's look at the critical path for the lw instruction.

Critical Path for lw Instruction

The critical path necessarily starts with the PC loading a new address on the rising edge of the clock. When presented with a signal, a register does not change its state instantaneously: there is a delay until the register reaches a stable state and its value can be read. This is the propagation delay for the register. The textbook calls this the time of propagation from clock to Q. Why Q? Because in circuits for storing values, Q is used as the label for the current value stored in the circuit. A register's clock-to-Q delay is denoted $t_{pcq}$, so for the PC the propagation delay will be denoted $t_{pcq\_PC}$.

The instruction memory then reads the next instruction, which means we incur the delay of retrieving data from memory.

Next we read rs from the register file and send it to SrcA. Meanwhile, the immediate field is sign-extended and selected at the ALUSrc mux to determine SrcB.

The ALU now adds the two values to compute the effective address. Next, the data memory reads from the effective address.

Then the MemtoReg mux selects ReadData. Finally, Result must set up at the register file before the next rising clock edge so that it can be properly written. Here, setup refers to the fact that it takes time for the register's input value to stabilize.

Complicating the critical path delay analysis is the fact that there are two paths through the circuit we have to consider: a path through the register file (to read rs) and a path through the sign-extending unit and ALUSrc mux. Reading the register file will take longer, so we can safely ignore the other path.

Therefore, the delay (CPU cycle time) is:

$T_c = t_{pcq\_PC} + t_{mem} + t_{RFread} + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup}$

There are really different ways we could estimate the critical path delay. For example, we have not considered the time for signals to travel down wires, but we will usually ignore this. Also, trying to rely on a formula is not a very general way to proceed. Rather, we should determine the critical path for an instruction and then compute the delay by looking at which components lie on the datapath for that instruction.

Suppose we have the following timings (in ps) for the combinational and memory units:

I-Mem Adder MUX ALU RegFile D-Mem SignExt Shifter
4 5 2 3 5 2 2
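To make the critical path reasoning concrete, here is a worked sketch that walks the lw path component by component. The delay values are illustrative placeholders chosen for this example, not the timings from the slides, and the dictionary keys and function names are hypothetical.

```python
# Illustrative delays in picoseconds (assumed values, not the slide's table).
delays = {
    "pcq_PC": 30,    # PC clock-to-Q delay
    "mem": 250,      # instruction or data memory read
    "RFread": 150,   # register file read
    "sext": 20,      # sign-extension unit
    "mux": 25,       # any 2-input mux
    "ALU": 200,
    "RFsetup": 20,   # register file setup time
}


def lw_cycle_time(t):
    """Sum the delays along the lw critical path."""
    # Two parallel paths feed the ALU; only the slower one matters.
    to_alu = max(t["RFread"], t["sext"] + t["mux"])
    return (t["pcq_PC"] + t["mem"] + to_alu + t["ALU"]
            + t["mem"] + t["mux"] + t["RFsetup"])


def program_time(n_instructions, cycle_time):
    """Every instruction takes one cycle of the same length."""
    return n_instructions * cycle_time


tc = lw_cycle_time(delays)
print(tc, program_time(100, tc))
```

With these placeholder numbers the register file read (150 ps) dominates the sign-extend-plus-mux path (45 ps), which is why the second path can be ignored.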

Computing Critical Path Delays

The previous example didn't include the entire datapath (i.e., the hardware for the j instruction is missing), but the general idea will work for other instruction types. Let's compute the critical path delay (propagation delay) for R-type instructions in the datapath.

I-Mem Adder MUX ALU RegFile D-Mem SignExt Shifter
4 5 2 3 5 2 2

Critical Path for R-type Instructions

I-Mem Read Adder MUX ALU RegFile Read RegFile Write D-Mem Read D-Mem Write SignExt Shifter
4 5 3 2 3 45 5 4 4 6

Rules for Adding Instructions

Determine which group of instructions the new instruction is most like, if any.
Determine the flow of the data from the Fetch stage through the datapath.
Determine any necessary changes to the existing datapath and any additional hardware that may be required (always add hardware as a last resort).
Determine the control signals for the existing system and added hardware. You may need to change existing controls to add more bits.
Make sure you change the control values for the existing instructions and specify any new controls. Don't break existing functionality!

Single-Cycle Datapath: swi

Suppose we invent a new R-type instruction, swi, that performs the following operation:

  Mem[Reg[rd] + Reg[rs]] = Reg[rt]

swi would read two registers and add their contents to determine the effective address of the location in memory at which to store the contents of rt. In a single cycle we can use each component of the datapath only once. For example, this means we cannot read from the register file on two separate occasions in a single cycle. Therefore, we will need to add another read port to the register file (RD3) and also a corresponding address input (A4).

The main decoder truth table must then cover the following rows and columns:

  Columns: Instr., Op[5:0], RegWrite, RegDst, ALUSrc, Branch, MemWrite, MemtoReg, ALUOp[1:0], Jump
  Rows: R-type, lw, sw, beq, addi, j, swi

Single-Cycle Datapath: swinc

Imagine an I-type instruction that could store a word and increment a register by 4 in one instruction (i.e., one clock cycle). In other words:

  Mem[Reg[rs]+immediate] = Reg[rt]
  Reg[rs] = Reg[rs] + 4

Format: swinc rt, immediate(rs)

The sw portion of the instruction is unchanged. No datapath modifications are required for this half of the instruction. For the second half of the instruction, the value in the rs register must be incremented by 4. How could we do this? Let's look over the datapath diagram.

All adders are being used by other components, so we need to insert another adder. This new adder will take the RD1 output and the constant value 4 as its inputs, and output the sum of those two values. This sum needs to be written to the register file. Now, the MemtoReg mux determines what data is sent to the write port (WD3) of the register file. We will need to add a third input to that mux so we can send the output of our new adder to the register file. Also, we need to provide the correct register number to A3, so we will need to add a third input to the RegDst mux.

Mark the datapath for the instruction and give the control signals.

  Mem[Reg[rs]+immediate] = Reg[rt]
  Reg[rs] = Reg[rs] + 4

Single-Cycle Datapath: bnmul4

Imagine a MIPS instruction that would take a branch if the value in rs were not a multiple of 4:

  bnmul4 rs, label
  if Reg[rs] is NOT a multiple of 4 then $PC' = PC + 4 + SignImm \times 4$ else $PC' = PC + 4$

How could we implement this? What hardware and logic would we need to add to the single-cycle datapath and control unit?

The binary representation of an integer (positive or negative) that is a multiple of four will end in 00. So we will branch if either or both of the two least significant bits are 1s: Reg[rs][1] or Reg[rs][0] = 1.
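The branch condition for bnmul4 reduces to an OR of the two low-order bits of Reg[rs], which the following sketch checks on a few 32-bit patterns. The function name is hypothetical, and register values are represented as unsigned 32-bit patterns.

```python
def bnmul4_taken(reg_rs):
    """Return 1 if the 32-bit value is NOT a multiple of 4, else 0."""
    bit0 = reg_rs & 1
    bit1 = (reg_rs >> 1) & 1
    return bit1 | bit0   # a single OR gate in hardware


for value in (12, 13, 0xFFFFFFF8, 0xFFFFFFFE):
    # 0xFFFFFFF8 is -8 and 0xFFFFFFFE is -2 in two's complement.
    print(hex(value), bnmul4_taken(value))
```

The test works unchanged for negative values because two's complement multiples of 4 also end in 00.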

Modify the datapath and control unit, mark the datapath for the instruction and give the control signals for the new instruction.

Single-Cycle Datapath: adde

Consider "add on equal" and "add on not equal" instructions:

  ADDE Rs, Rt, Rd
  if Reg[Rs] == 2's complement value in shamt field then Reg[Rd] = Reg[Rs] + Reg[Rt]

  ADDNE Rs, Rt, Rd
  if Reg[Rs] != 2's complement value in shamt field then Reg[Rd] = Reg[Rs] + Reg[Rt]

Let's add these two instructions simultaneously to the single-cycle MIPS datapath. We will assume that we may not add either instruction to the ALU.

Modify the datapath and control unit, mark the datapath for the instructions and give the control signals for the new instructions.
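Before modifying the datapath, it can help to pin down the intended behavior of ADDE and ADDNE in software. The sketch below is one reading of the definitions above, assuming shamt is a 5-bit two's complement field (range -16 to 15) and that the destination register is left unchanged when the condition fails. All function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def sign_extend5(shamt):
    """Interpret the 5-bit shamt field as a two's complement value."""
    return shamt - 32 if shamt & 0x10 else shamt


def add_conditional(regs, rs, rt, rd, shamt, on_equal=True):
    """ADDE when on_equal is True, ADDNE when it is False."""
    equal = to_signed32(regs[rs]) == sign_extend5(shamt & 0x1F)
    if equal == on_equal:
        regs[rd] = (regs[rs] + regs[rt]) & MASK
    return regs


regs = [0] * 32
regs[1] = 0xFFFFFFFF   # -1
regs[2] = 10
add_conditional(regs, 1, 2, 3, 0b11111, on_equal=True)    # shamt is -1: ADDE writes
add_conditional(regs, 1, 2, 4, 0b11111, on_equal=False)   # ADDNE does not write
print(regs[3], regs[4])
```

In hardware, the comparison would need its own comparator, since the ALU is busy performing the addition in the same cycle.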