We will examine a simplified MIPS implementation first, and then produce a more realistic pipelined version.

Similar documents
Solutions. Solution The values of the signals are as follows:

Review: MIPS Addressing Modes/Instruction Formats

Computer organization

Computer Organization and Components

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Design of Digital Circuits (SS16)

Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Reduced Instruction Set Computer (RISC)

Introducción. Diseño de sistemas digitales.1

Instruction Set Architecture

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

CS352H: Computer Systems Architecture

(Refer Slide Time: 00:01:16 min)

Let s put together a Manual Processor

MICROPROCESSOR AND MICROCOMPUTER BASICS

Summary of the MARIE Assembly Language

A s we saw in Chapter 4, a CPU contains three main sections: the register section,

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December :59

EC 362 Problem Set #2

Chapter 9 Computer Design Basics!

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Comp 255Q - 1M: Computer Organization Lab #3 - Machine Language Programs for the PDP-8

CSE 141L Computer Architecture Lab Fall Lecture 2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Instruction Set Design

A SystemC Transaction Level Model for the MIPS R3000 Processor

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

A FPGA Implementation of a MIPS RISC Processor for Computer Architecture Education

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

PROBLEMS #20,R0,R1 #$3A,R2,R4

Microprocessor & Assembly Language

CPU Organisation and Operation

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Assembly Language Programming

Generating MIF files

CPU Organization and Assembly Language

Notes on Assembly Language

EE361: Digital Computer Organization Course Syllabus

Instruction Set Architecture (ISA)

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Binary Adders: Half Adders and Full Adders

CSE 141 Introduction to Computer Architecture Summer Session I, Lecture 1 Introduction. Pramod V. Argade June 27, 2005

Central Processing Unit

CHAPTER 7: The CPU and Memory

OAMulator. Online One Address Machine emulator and OAMPL compiler.

Intel 8086 architecture

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

CS521 CSE IITG 11/23/2012

SIM-PL: Software for teaching computer hardware at secondary schools in the Netherlands

Central Processing Unit (CPU)

Chapter 01: Introduction. Lesson 02 Evolution of Computers Part 2 First generation Computers

Traditional IBM Mainframe Operating Principles

The AVR Microcontroller and C Compiler Co-Design Dr. Gaute Myklebust ATMEL Corporation ATMEL Development Center, Trondheim, Norway

Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)

Sequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Computer Organization and Architecture

Take-Home Exercise. z y x. Erik Jonsson School of Engineering and Computer Science. The University of Texas at Dallas

CS 61C: Great Ideas in Computer Architecture Finite State Machines. Machine Interpreta4on

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine

In the Beginning The first ISA appears on the IBM System 360 In the good old days

PART B QUESTIONS AND ANSWERS UNIT I

The WIMP51: A Simple Processor and Visualization Tool to Introduce Undergraduates to Computer Organization

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

1 Computer hardware. Peripheral Bus device "B" Peripheral device. controller. Memory. Central Processing Unit (CPU)

Chapter 5 Instructor's Manual

Systems I: Computer Organization and Architecture

Memory Elements. Combinational logic cannot remember

Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

MIPS Assembler and Simulator

Introduction to MIPS Assembly Programming

Preface. Any questions from last time? A bit more motivation, information about me. A bit more about this class. Later: Will review 1st 22 slides

1 Classical Universal Computer 3

Computer Organization. and Instruction Execution. August 22

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

5 Combinatorial Components. 5.0 Full adder. Full subtractor

EE 261 Introduction to Logic Circuits. Module #2 Number Systems

(Refer Slide Time: 2:03)

Winter 2002 MID-SESSION TEST Friday, March 1 6:30 to 8:00pm

1 Description of The Simpletron

LC-3 Assembly Language

THUMB Instruction Set

What to do when I have a load/store instruction?

PROGRAMMABLE LOGIC CONTROLLERS Unit code: A/601/1625 QCF level: 4 Credit value: 15 OUTCOME 3 PART 1

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

PROBLEMS. which was discussed in Section

An Introduction to the ARM 7 Architecture

Transcription:

Introduction Datapath Design 1 We will examine a simplified MIPS implementation first, and then produce a more realistic pipelined version. A simple, representative subset of machine instructions, shows most aspects: - Memory reference: lw, sw - Arithmetic/logical: add, sub, and, or, slt - Transfer of control: beq, j R op rs rt rd shamt funct I op rs rt 16-bit immediate J op 26-bit immediate 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Instruction Execution Datapath Design 2 I PC instruction memory, fetch instruction II Register numbers register file, read registers to get operand(s) III Depending on instruction class - May use ALU to calculate needed value - R-type: need result of specified operation - Load/store: need memory address to be read from/written to - Branch: need the branch target address - May access data memory - Load/store: access data memory to read/write value - Set address for next instruction fetch: PC branch target or PC + 4 or jump target

Executing Instruction Fetch Datapath Design 3 Here's what we need for simple fetches (no beqor j): increment by 4 for next instruction (for now) 32-bit register storing address of instruction to fetch assume there are separate storage for instructions (for now)

Executing R-Format Instructions Datapath Design 4 Read two operands from register file Use ALU to perform the arithmetic/logical operation Write the result to a register GPR[rd] = GPR[rs] funct GPR[rt] op rs rt rd shamt funct Register numbers Operands Control Operation results Register file

Executing R-Format Instructions Datapath Design 5 GPR[rd] = GPR[rs] funct GPR[rt] op rs rt rd shamt funct Control tells ALU Control to use the funct bits rs and rt specify where operands are ALU applies specified operation to operands rd specifies where result goes add, sub, and, or, slt ALU result goes to register file ALU Control sets ALU to correct operation

Executing Load Instructions Read register operand Datapath Design 6 GPR[rt] = Mem[GPR[rs] + imm] Calculate the address to be read using register operand and 16-bit offset from instruction - Use ALU, but sign-extend offset... op rs rt 16-bit immediate

Executing Load Instructions... Read memory and update register Datapath Design 7 GPR[rt] = Mem[GPR[rs] + imm] op rs rt 16-bit immediate

Executing Load Instructions Datapath Design 8 GPR[rt] = Mem[GPR[rs] + imm] op rs rt 16-bit immediate Control tells ALU Control to add operands rs specifies where operand is ALU computes address to read Value is retrieved from memory location and sent to register file rt specifies where result goes lw 16-bit immediate is extended

Executing Store Instructions Datapath Design 9 Mem[GPR[rs] + imm] = GPR[rt] op rs rt 16-bit immediate Control tells ALU Control to add operands ALU computes address to read rs specifies where operand is rt specifies where data comes from sw 16-bit immediate is extended Value is retrieved from register file and sent to memory

Unifying the Designs Datapath Design 10 In order to produce a complete datapath design, we must identify and deal with any conflicts. First, consider the specification of the register numbers supplied to the register file. They come from the current instruction, but using which bits? Read reg 1 Read reg 2 Write reg R-type 25:21 20:16 15:11 load 25:21 not used 20:16 store 25:21 20:16 not used We have a conflict regarding the write register: To resolve the conflict, we must be able to select one set of bits for R-type instructions and a different set of bits for the load instructions how do we make a selection in hardware?

Unifying the Designs Datapath Design 11 We also have a conflicts regarding the source of the write data and the source of the right (lower) operand to the ALU: To resolve these conflicts, we must be able to: Write data source Right operand source R-type ALU output register file load Data memory sign-extend store not used sign-extend - send the ALU output to the register file for R-type instructions, but send the data read from the memory unit to the register file for load instructions - send the data read from the register file to the ALU for R-type instructions, but send the output from the sign-extender to the ALU for load and store instructions

Unified R-type/Load/Store Datapath Datapath Design 12 By adding three multiplexors, we can resolve the conflicts that we have identified and produce a design that will (almost) handle the R-type, load and store instructions we are considering. Multiplexor to select right operand to ALU Multiplexor to select write data for register file Multiplexor to select write register number add, sub, and, or, slt, lw, sw

Analyzing the Control Signals Datapath Design 13 We've identified quite a few necessary control signals in our design. Which ones are two-valued? Do any require more than two values? How should they be set? The datapath cannot operate correctly unless every control signal is managed properly. Two things to remember: - Every line always carries a value. We may not know (or care) what it is in some cases. But there is always a value there. - It doesn't matter what value a line carries if that value is never stored or used to make a decision. (Hence, we will find that we have don't-care conditions.)

Register File Control Datapath Design 14 R-type and load instructions require writing a value into a register, but the store instruction does not. Writing a value modifies the state of the system, so it is not a don't-care. So, we must manage the RegWrite signal accordingly: RegWrite R-type 1 load 1 store 0 Value at Write data is written to the Write register iff RegWrite is 1.

Data Memory Control Datapath Design 15 MemRead - must be 1 for load instructions, since they copy a value from memory to a register - might be a don't-care for R-type and store instructions why? If not, should be 0. MemWrite - must be 1 for store instructions, since they copy a value from a register to memory - must be 0 for R-type and load instructions; otherwise they could modify a memory value that should not be changed MemRead MemWrite R-type? 0 load 1 0 store? 1

Multiplexor Control Datapath Design 16 RegDst - must be 0 for load instructions - must be 1 for R-type instructions - don't-care for store instructions (why?) see slide 10 ALUSrc - must be 0 for R-type instructions - must be 1 for load and store instructions see slide 15 MemtoReg - must be 1 for load instructions - must be 0 for R-type instructions - don't-care for store instructions (why?) see slide 11

ALU Control Datapath Design 17 There are a lot of control signals, even in our simple datapath. At this point, almost all of them are single-bit signals (i.e., they make a choice between two alternate actions). The ALU control needs to be different because there are more than two choices for what it will actually do: ALU R-type add sub and or slt load store add subtract and or slt add add So, the ALU will require a multi-bit control signal why? how many bits?

ALU Control Datapath Design 18 This suggests separating the control logic into two modules: - a master control module that determines the type of instruction being executed and sets the non-alu control signals - a secondary control module that manages the interface of the ALU itself The master module will send : - a specific selector pattern for the ALU if the instruction is not R-type; e.g., it sends the ADD selector pattern for lw and sw instructions - a flag telling the secondary module to analyze the functbits if the instruction is R- type We'll fill in the details of the two modules later, but for now we do know what each must do, at a high level.

Unified Datapath Design with Control Datapath Design 19 add, sub, and, or, slt, lw, sw

Executing Branch Instructions Datapath Design 20 Read two operands from the register file Use the ALU to compare the operands: subtract and check Zero signal... if GPR[rs] = GPR[rt] then PC = PC + 4 + (imm << 2) op rs rt 16-bit immediate

Executing Branch Instructions Datapath Design 21... Calculate the branch target address: - Sign-extend displacement (immediate from instruction) - Shift left 2 places (MIPS uses word displacement why?) - Add to PC + 4 (already calculated PC + 4 during the instruction fetch) Send computed branch target address to the PC (if taken) if GPR[rs] = GPR[rt] then PC = PC + 4 + (imm << 2) op rs rt 16-bit immediate

Aside: Branch Target Address Datapath Design 22 Examine the display below of a short MIPS assembly program the addresses at which the instructions will be loaded into memory when the program is executed: address assembly source code ---------------------------------------------.text 0x00400000 main: addi $s0, $zero, 10 0x00400004 addi $s1, $zero, 20 0x00400008 beq $s0, $s1, else 0x0040000C add $s3, $s1, $s0 0x00400010 j endif 0x00400014 else: sub $s3, $s1, $s0 0x00400018 endif: li $v0, 10 0x0040001C syscall First of all, note that all the addresses are multiples of 4*. Therefore, the "distance" between two instructions is always a multiple of 4. *QTP: why isn't this surprising?

Aside: Branch Target Address Datapath Design 23 The immediate we store for a beqinstruction is (almost) the "distance" from the beq instruction to the instruction that is the target of the branch. address assembly source code ---------------------------------------------... 0x00400008 beq $s0, $s1, else... 0x00400014 else: sub $s3, $s1, $s0... Here, that "distance" is 0x14 0x08 = 0x0C. In base-2, 0x0C is 1100. Now, recall that the base-2 representation of a multiple of 4 ends in 00. If the "distance" always ends in 00, there's no reason to store that 00 in the instruction

Aside: Branch Target Address Datapath Design 24 Do we gain anything if we don't store those two 0's in the instruction? address assembly source code ---------------------------------------------... 0x00400008 beq $s0, $s1, else... 0x00400014 else: sub $s3, $s1, $s0... Yes. We can store a 16-bit "distance" in the instruction, but effectively use an 18-bit "distance" at runtime. That means that a conditional branch can "branch" farther which is a gain.

Aside: Branch Target Address Datapath Design 25 But, there's more address assembly source code ---------------------------------------------... 0x00400008 beq $s0, $s1, else... 0x00400014 else: sub $s3, $s1, $s0... It takes time to evaluate the difference of the two registers; we can do other, useful things in the meantime, like calculate PC + 4: - If we don't take the branch, we need that address to fetch the next instruction. - If we do take the branch, we wasted no time computing PC + 4 along the way. But that means the "distance" we'll store is actually from PC + 4 to the branch target.

Aside: Branch Target Address Datapath Design 26 Now, consider the machine code for our example: beq target address machine code ---------------------------------------------------------... 0x00400008 000100 10000 10001 0000000000000010 0x0040000C... 0x00400014... 0x08 The immediate in the beq instruction is: 0000000000000010 That's 2. If we shift that left by 2 bits (multiply by 4), we get 8. If we add 8 to the address 0x0040000C, we get 0x00400014, which is the target address we need.

Executing Branch Instructions Datapath Design 27 op rs rt 16-bit immediate rs and rt specify where operands are Control tells ALU to subtract operands Adder computes branch target AND gate controls which address goes to the PC Immediate is sign-extended and shifted ALU subtracts and sets Zero beq

Executing Jump Instructions Datapath Design 28 Calculate the jump target address: - Shift left 2 places (just as with branch target address calculation) - Concatenate with PC + 4[31:28] (already calculated PC + 4 during the instruction fetch) Send computed jump target address to the PC PC = (PC+4)[31:28] (IR[25:0] << 2) from PC+4 p 31 p 30 p 29 p 28 i 25 i 24 i 23 i 22 >i 3 i 2 i 1 i 0 00 op 26-bit immediate

Aside: Jump Target Address Datapath Design 29 The calculation of the jump target address is similar to the calculation of the branch target address (beq). The essential difference is that the immediate is not added to the PC (or to PC + 4). The MIPS model is to view memory as a sequence of 256 MB segments. Now, 256 MB can be addressed by 28-bit addresses.* For jump (j) instructions, the 26-bit immediate is shifted 2 bits to the left, yielding a 28-bit offset, which is then added to the starting address of the current memory segment. The starting address of the current memory segment is given by the high 4 bits of PC + 4.* *QTP: why?

Executing Jump Instructions Datapath Design 30 op 26-bit immediate Concatenate to form 32-bit address for jump Calculate address of next sequential instruction High 4 bits of PC + 4 specify "segment" containing this code. Control sets Jump signal so that the jump address will be used to fetch the next instruction MUX passes jump address back to PC

Unified Datapath Datapath Design 31 add, sub, and, or, slt, lw, sw, beq, j

Summary Datapath Design 32 The unified datapath that we have designed: - illustrates many of the logical issues that must be solved in designing any datapath - can be extended to support additional instructions (easily for some, less so for others) - is fundamentally unsatisfactory in that it requires a single clock cycle be long enough for every path within the datapath to stabilize before the next instruction is fetched We may explore the second issue in exercises. The third issue can only be dealt with by transforming the design to incorporate a pipeline.