Processor Design: How to Implement MIPS. Simplicity favors regularity


ECE468 Computer Organization and Architecture
Designing a Single Cycle Datapath

Processor Design: How to Implement MIPS
Simplicity favors regularity
ECE468 Datapath1 23-3-19

The Big Picture: Where Are We Now?
The five classic components of a computer:
- Processor: Control and Datapath
- Memory
- Input
- Output
Today's topic: datapath design
- What is data?
- What is a datapath?

The Big Picture: The Performance Perspective
Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction
Processor design (datapath and control) determines:
- Clock cycle time
- Clock cycles per instruction
In the next two lectures: the single cycle processor
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time

The MIPS Instruction Formats
All MIPS instructions are 32 bits long. The three instruction formats:

R-type  op [31:26]  rs [25:21]  rt [20:16]  rd [15:11]  shamt [10:6]  funct [5:0]
        6 bits      5 bits      5 bits      5 bits      5 bits        6 bits
I-type  op [31:26]  rs [25:21]  rt [20:16]  immediate [15:0]
        6 bits      5 bits      5 bits      16 bits
J-type  op [31:26]  target address [25:0]
        6 bits      26 bits

The different fields are:
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
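The field layout above can be sketched in a few lines of Python. This is only an illustration: the function name `decode` and the dictionary keys are made up for this sketch, and the bit positions follow the standard 32-bit MIPS encoding (op in bits 31:26, funct in bits 5:0).

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every possible field.

    All three formats share op, so we extract everything and let the
    caller pick the fields that apply to the format in use.
    """
    return {
        "op": (word >> 26) & 0x3F,      # bits 31:26, all formats
        "rs": (word >> 21) & 0x1F,      # bits 25:21, R-type and I-type
        "rt": (word >> 16) & 0x1F,      # bits 20:16, R-type and I-type
        "rd": (word >> 11) & 0x1F,      # bits 15:11, R-type only
        "shamt": (word >> 6) & 0x1F,    # bits 10:6, R-type only
        "funct": word & 0x3F,           # bits 5:0, R-type only
        "imm16": word & 0xFFFF,         # bits 15:0, I-type only
        "target": word & 0x3FFFFFF,     # bits 25:0, J-type only
    }

# add $3, $1, $2 is encoded as 0x00221820 (op = 0, funct = 0x20)
fields = decode(0x00221820)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

Extracting all fields unconditionally mirrors what the hardware does: the wires for rs, rt, rd, and the immediate are always present, and control decides which ones matter.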

The MIPS Subset
ADD and subtract (R-type: op 6 bits | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6):
- add rd, rs, rt
- sub rd, rs, rt
OR Immediate (I-type: op 6 bits | rs 5 | rt 5 | immediate 16):
- ori rt, rs, imm
LOAD and STORE (I-type):
- lw rt, rs, imm
- sw rt, rs, imm
BRANCH (I-type):
- beq rs, rt, imm
JUMP (J-type: op 6 bits | target address 26):
- j target

An Abstract View of the Implementation
Two types of functional units:
- Operational elements that operate on data (combinational)
- State elements that contain data (sequential)
Generic implementation:
- use the PC to supply the instruction address
- get the instruction from memory
- read registers
- use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers. Why? Memory-reference? Arithmetic? Control flow?
[Figure: the PC supplies the Instruction Address to an Ideal Instruction Memory; the 32-bit Instruction provides Rd, Rs, Rt (5 bits each) and Imm (16 bits); a file of 32 32-bit registers (ports Rw, Ra, Rb) feeds the ALU; the ALU output is the Data Address of an Ideal Data Memory with Data In and Data Out ports]
Next step: fill in the details: more units, more connections, and a control unit.

State Elements
Unclocked vs. clocked
Clocks are used in synchronous logic: when should an element that contains state be updated?
[Figure: clock waveform with the falling edge, rising edge, and cycle time labeled]

An Unclocked State Element
The set-reset latch: output depends on present inputs and also on past inputs
[Figure: set-reset latch with inputs R and S and outputs Q and Q-bar]
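To make "depends on present inputs and also on past inputs" concrete, here is a small sketch of a set-reset latch. It assumes the common cross-coupled NOR implementation, which the slide does not spell out; the function names are illustrative only.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))

def sr_latch(s, r, q, q_bar):
    """Settle a cross-coupled NOR latch starting from state (q, q_bar).

    Assumption: Q = NOR(R, Q-bar) and Q-bar = NOR(S, Q).
    The gates are re-evaluated until their outputs stop changing.
    """
    for _ in range(4):
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

state = (0, 1)                       # start with Q = 0
state = sr_latch(1, 0, *state)       # set: Q becomes 1
after_set = state
state = sr_latch(0, 0, *state)       # inputs released: Q is remembered
after_hold = state
state = sr_latch(0, 1, *state)       # reset: Q becomes 0
after_reset = state
print(after_set, after_hold, after_reset)
```

With S = R = 0 the output is whatever the last set or reset left behind, which is exactly the "past inputs" dependence. The S = R = 1 case is left out on purpose because it is not a useful operating point.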

Latches and Flip-flops
- Output is equal to the stored value inside the element (no need to ask for permission to look at the value)
- Change of state (value) is based on the clock
- Latches: state changes whenever the inputs change and the clock is asserted
- Flip-flops: state changes only on a clock edge (edge-triggered methodology)
"Asserted" means "logically true", which could mean electrically low.
A clocking methodology defines when signals can be read and written: we would not want to read a signal at the same time it is being written.

D-latch and D flip-flop
Two inputs:
- the data value to be stored (D)
- the clock signal (C) indicating when to read and store D
D latch: output changes when C is high.
D flip-flop: output changes only on the clock edge.
[Figure: a D latch (inputs D and C, outputs Q and Q-bar) with its timing diagram, and a D flip-flop built from two D latches in series, the second one clocked by the inverted clock]
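The difference between level-sensitive and edge-triggered storage is easy to see in a small behavioral sketch. The class names are illustrative, and the flip-flop here is assumed to trigger on the rising edge; the slide only says "on the clock edge".

```python
class DLatch:
    """Level-sensitive: Q follows D for as long as C is high."""
    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:
            self.q = d
        return self.q

class DFlipFlop:
    """Edge-triggered: Q samples D only when C goes from 0 to 1 (assumed rising edge)."""
    def __init__(self):
        self.q = 0
        self._prev_c = 0

    def update(self, d, c):
        if c and not self._prev_c:
            self.q = d
        self._prev_c = c
        return self.q

# (D, C) samples over time: D drops to 0 while the clock is still high.
samples = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
latch, flop = DLatch(), DFlipFlop()
latch_trace = [latch.update(d, c) for d, c in samples]
flop_trace = [flop.update(d, c) for d, c in samples]
print(latch_trace)
print(flop_trace)
```

At the third sample the latch output follows D down to 0 because the clock is still high, while the flip-flop keeps the value it captured at the edge.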

Clocking Methodology (Appendix B7)
[Figure: clock waveform with Setup and Hold windows around each clock edge; data is "Don't Care" outside those windows]
- All storage elements are clocked by the same clock edge
- Edge-triggered: all stored values are updated on a clock edge
- Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew
- (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time

An Abstract View of the Critical Path
Register file and ideal memory:
- The CLK input is a factor ONLY during write operations
- During read operations, they behave as combinational logic: address valid => output valid after "access time"
Critical Path (Load Operation) =
  PC's prop time
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew
[Figure: the abstract datapath (PC, Ideal Instruction Memory, 32 32-bit registers, ALU, Ideal Data Memory) with the load path highlighted]
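The two timing constraints and the load critical path are plain sums, so they can be checked with a short sketch. Every delay value below is a made-up number in picoseconds, chosen only to show the arithmetic; none of them comes from the lecture.

```python
def min_cycle_time(latch_prop, longest_path, setup, skew):
    """Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew."""
    return latch_prop + longest_path + setup + skew

def hold_ok(latch_prop, shortest_path, skew, hold):
    """Hold constraint: Latch Prop + Shortest Delay Path - Clock Skew > Hold Time."""
    return latch_prop + shortest_path - skew > hold

# Hypothetical component delays for the load critical path (picoseconds).
load_path = {
    "pc_prop": 100,
    "instruction_memory_access": 400,
    "register_file_access": 200,
    "alu_32bit_add": 300,
    "data_memory_access": 400,
    "register_file_setup": 50,
    "clock_skew": 30,
}
load_cycle = sum(load_path.values())

print(load_cycle)                              # total for the load path
print(min_cycle_time(100, 1300, 50, 30))       # same total, grouped as in the formula
print(hold_ok(100, 40, 30, 60))                # short paths must not violate hold
```

Grouping the four middle terms (the two memories, the register file, and the ALU) as the "longest delay path" gives the same total from both formulas, which is why the load instruction sets the cycle time in this design.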

The Steps of Designing a Processor
- Instruction Set Architecture => Register Transfer Language
- Register Transfer Language (RTL) => datapath components and datapath interconnect
- Datapath components => control signals
- Control signals => control logic
(Element < component)

What is RTL: The ADD Instruction
add rd, rs, rt
- mem[PC]                   Fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]    The ADD operation
- PC <- PC + 4              Calculate the next instruction's address

What is RTL: The Load Instruction
lw rt, rs, imm
- mem[PC]                         Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm)    Calculate the memory address
- R[rt] <- Mem[Addr]              Load the data into the register
- PC <- PC + 4                    Calculate the next instruction's address

Combinational Logic Elements
- Adder: inputs A, B, CarryIn; outputs Sum, Carry
- MUX (pb-9,b-19): inputs A, B, Select; output Y
- ALU: inputs A, B, OP; outputs Result, Zero
- Decoder: 3-bit input In; outputs out0, out1, out2, ..., out7
In which cases do we need an adder, ALU, MUX, or decoder?
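As a rough sketch, the adder, MUX, and decoder can be written as pure functions, which matches the idea that combinational logic has no state. The 32-bit width, the select polarity (0 picks A), and the function names are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF

def adder(a, b, carry_in=0):
    """32-bit adder: returns (Sum, Carry)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def mux(a, b, select):
    """2-to-1 multiplexer; assumed polarity: select = 0 picks A, 1 picks B."""
    return b if select else a

def decoder3(value):
    """3-to-8 decoder: exactly one of out0..out7 is 1."""
    return [int(i == value) for i in range(8)]

print(adder(0xFFFFFFFF, 1))   # overflow wraps the sum and raises the carry
print(mux(5, 9, 1))
print(decoder3(2))
```

One way to answer the slide's closing question: an adder suffices where the operation never changes (PC + 4), the ALU is needed where the operation depends on the instruction, a MUX wherever one input can come from two sources, and a decoder where a register number must become a one-hot enable.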

Storage Element: Register (pb22-b25)
Register:
- Similar to the D flip-flop except:
  - N-bit input (Data In) and N-bit output (Data Out)
  - Write Enable input
- Write Enable:
  - 0: Data Out will not change
  - 1: Data Out will become Data In
- Array of logical elements (see the register file on the next 2 slides)
- The content is updated at the clock tick ONLY if the Write Enable signal is set to 1

Storage Element: Register File
Register File consists of 32 registers:
- Two 32-bit output busses: busA and busB
- One 32-bit input bus: busW
Register is selected by:
- RA selects the register to put on busA
- RB selects the register to put on busB
- RW selects the register to be written via busW when Write Enable is 1
Clock input (CLK):
- The CLK input is a factor ONLY during write operations
- During read operations, the register file behaves as a combinational logic block: RA or RB valid => busA or busB valid after "access time"
[Figure: block of 32 32-bit registers with 5-bit inputs RW, RA, RB, a Write Enable input, a Clk input, and the busW, busA, and busB busses]
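A minimal behavioral model of this register file might look like the following. The class and method names are invented for the sketch, and it models only what the slide states: combinational reads and a clocked write gated by Write Enable.

```python
class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) with no clock involved."""
        return self.regs[ra], self.regs[rb]

    def clock_tick(self, rw, bus_w, write_enable):
        """Clocked write: register RW takes busW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_tick(rw=5, bus_w=42, write_enable=1)   # written at the clock tick
rf.clock_tick(rw=6, bus_w=99, write_enable=0)   # ignored: Write Enable is 0
print(rf.read(5, 6))
```

Keeping `read` free of any clock argument reflects the point that the CLK input matters only for writes.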

Storage Element: Register File -- Detailed Diagram
[Figure: RW feeds a decoder whose outputs, gated by Write Enable, drive the clock input (C) of Register 0, Register 1, ..., Register 30, Register 31; busW is the data input (D) of every register; RA and RB control two multiplexers (MUX) that select which register drives busA and busB]

Storage Element: Idealized Memory
Memory (idealized):
- One input bus: Data In
- One output bus: Data Out
Memory word is selected by:
- Address selects the word to put on Data Out
- Write Enable = 1: Address selects the memory word to be written via the Data In bus
Clock input (CLK):
- The CLK input is a factor ONLY during write operations
- During read operations, the memory behaves as a combinational logic block: Address valid => Data Out valid after "access time"

Overview of the Instruction Fetch Unit (Fig 55)
The common RTL operations:
- Fetch the instruction: mem[PC]
- Update the program counter:
  - Sequential code: PC <- PC + 4
  - Branch and jump: PC <- "something else"
[Figure: the PC and the Next Address Logic supply the Address to the Instruction Memory, which outputs the Instruction Word]

RTL: The ADD Instruction
add rd, rs, rt (R-type: op 6 bits | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6)
- mem[PC]                   Fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]    The actual operation
- PC <- PC + 4              Calculate the next instruction's address

RTL: The Subtract Instruction
sub rd, rs, rt (R-type: op 6 bits | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6)
- mem[PC]                   Fetch the instruction from memory
- R[rd] <- R[rs] - R[rt]    The actual operation
- PC <- PC + 4              Calculate the next instruction's address

Datapath for Register-Register Operations
R[rd] <- R[rs] op R[rt]   Example: add rd, rs, rt
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
- ALUctr and RegWr: control logic after decoding the instruction
[Figure: Rd, Rs, Rt (5 bits each) drive Rw, Ra, Rb of the 32 32-bit registers; busA and busB feed the ALU, controlled by ALUctr; the ALU Result returns on busW and is written when RegWr is 1]
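The register-register datapath can be sketched as two small functions: an ALU driven by ALUctr and a step that wires rs, rt, and rd to Ra, Rb, and Rw. The string encodings of ALUctr ("add", "sub", "or") are placeholders for this example; the slides do not give the actual control encoding.

```python
MASK32 = 0xFFFFFFFF

def alu(a, b, alu_ctr):
    """Return (Result, Zero). alu_ctr values are hypothetical labels."""
    if alu_ctr == "add":
        result = (a + b) & MASK32
    elif alu_ctr == "sub":
        result = (a - b) & MASK32
    elif alu_ctr == "or":
        result = a | b
    else:
        raise ValueError("unknown ALUctr: " + alu_ctr)
    return result, int(result == 0)

def execute_rtype(regs, rs, rt, rd, alu_ctr, reg_wr=1):
    """R[rd] <- R[rs] op R[rt], written only when RegWr is 1."""
    bus_a, bus_b = regs[rs], regs[rt]          # Ra = rs, Rb = rt
    bus_w, _zero = alu(bus_a, bus_b, alu_ctr)
    if reg_wr:
        regs[rd] = bus_w                       # Rw = rd
    return bus_w

regs = [0] * 32
regs[1], regs[2] = 7, 5
execute_rtype(regs, rs=1, rt=2, rd=3, alu_ctr="add")   # add $3, $1, $2
execute_rtype(regs, rs=2, rt=1, rd=4, alu_ctr="sub")   # sub $4, $2, $1
print(regs[3], hex(regs[4]))
```

The subtraction result wraps to a 32-bit two's complement value, which is what a 32-bit ALU would put on busW.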

Register-Register Timing
[Figure: timing diagram. After the clock edge, each signal moves from its old value to its new value in turn: PC after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUctr and RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; busW after the ALU Delay. The register write occurs at the next clock edge.]

RTL: The OR Immediate Instruction
ori rt, rs, imm (I-type: op 6 bits | rs 5 | rt 5 | immediate 16)
- mem[PC]                          Fetch the instruction from memory
- R[rt] <- R[rs] or ZeroExt(imm)   The OR operation
- PC <- PC + 4                     Calculate the next instruction's address
ZeroExt(imm): bits 31:16 are 16 zeros; bits 15:0 hold the 16-bit immediate.

Datapath for Logical Operations with Immediate
R[rt] <- R[rs] op ZeroExt[imm]   Example: ori rt, rs, imm
[Figure: the register-register datapath with newly added parts (in blue on the slide): a mux controlled by RegDst that selects Rd or Rt as Rw, a ZeroExt unit that widens imm from 16 to 32 bits, and a mux controlled by ALUSrc that selects busB or the extended immediate as the second ALU input. Rb (Rt) is "Don't Care" for this instruction.]

RTL: The Load Instruction
lw rt, rs, imm (I-type: op 6 bits | rs 5 | rt 5 | immediate 16)
- mem[PC]                         Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm)    Calculate the memory address
- R[rt] <- Mem[Addr]              Load the data into the register
- PC <- PC + 4                    Calculate the next instruction's address
SignExt(imm): bits 15:0 hold the 16-bit immediate; bits 31:16 are 16 copies of the immediate's sign bit (bit 15), so they are all 0s for a non-negative immediate and all 1s for a negative one.
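Zero extension (used by ori) and sign extension (used by lw) differ only in what fills the upper 16 bits. A minimal sketch, with a hypothetical `ext_op` flag whose polarity (1 = sign extend) is an assumption:

```python
def extend(imm16, ext_op):
    """Widen a 16-bit immediate to 32 bits.

    ext_op = 0: zero extension, upper 16 bits are 0.
    ext_op = 1: sign extension, upper 16 bits copy bit 15 (assumed polarity).
    """
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000
    return imm16

print(hex(extend(0x8001, 0)))   # zero extended
print(hex(extend(0x8001, 1)))   # sign extended, negative immediate
print(hex(extend(0x7FFF, 1)))   # sign extended, non-negative immediate
```

The two modes agree whenever bit 15 is 0, so a single extender with one control input can serve both instructions.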

Datapath for Load Operations
R[rt] <- Mem[R[rs] + SignExt[imm]]   Example: lw rt, rs, imm
[Figure: the immediate datapath with an Extender controlled by ExtOp in place of ZeroExt, a Data Memory (ports Adr, Data In, WrEn driven by MemWr), and a mux controlled by MemtoReg that selects the ALU result or the memory output for busW. Rb (Rt) is "Don't Care". Control signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]

RTL: The Store Instruction
sw rt, rs, imm (I-type: op 6 bits | rs 5 | rt 5 | immediate 16)
- mem[PC]                         Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm)    Calculate the memory address
- Mem[Addr] <- R[rt]              Store the register into memory
- PC <- PC + 4                    Calculate the next instruction's address
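The load and store RTL share the address calculation and differ only in the direction of the data transfer. In the sketch below the data memory is modeled as a Python dictionary keyed by byte address, which is a simplification chosen for this example; alignment checks are omitted.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit immediate as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def lw(regs, mem, rt, rs, imm16):
    """R[rt] <- Mem[R[rs] + SignExt(imm)]"""
    addr = (regs[rs] + sign_ext(imm16)) & MASK32
    regs[rt] = mem.get(addr, 0)     # unwritten words read as 0 in this model

def sw(regs, mem, rt, rs, imm16):
    """Mem[R[rs] + SignExt(imm)] <- R[rt]"""
    addr = (regs[rs] + sign_ext(imm16)) & MASK32
    mem[addr] = regs[rt]

regs = [0] * 32
mem = {}
regs[1], regs[2] = 0x100, 0xDEADBEEF
sw(regs, mem, rt=2, rs=1, imm16=0xFFFC)   # sw $2, -4($1)
lw(regs, mem, rt=3, rs=1, imm16=0xFFFC)   # lw $3, -4($1)
print(hex(regs[3]), sorted(hex(a) for a in mem))
```

The negative offset shows why the address calculation needs sign extension rather than zero extension.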

Datapath for Store Operations
Mem[R[rs] + SignExt[imm]] <- R[rt]   Example: sw rt, rs, imm
[Figure: the load datapath with busB also connected to the Data In port of the Data Memory; MemWr drives WrEn. Control signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]

RTL: The Branch Instruction
beq rs, rt, imm (I-type: op 6 bits | rs 5 | rt 5 | immediate 16)
- mem[PC]                   Fetch the instruction from memory
- Cond <- R[rs] - R[rt]     Calculate the branch condition
- if (COND eq 0)            Calculate the next instruction's address
    PC <- PC + 4 + (SignExt(imm) x 4)
  else
    PC <- PC + 4
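The branch RTL translates almost line for line into code. The function name and argument names below are illustrative; the logic follows the RTL as written, including the subtraction used to form the condition.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit immediate as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for beq: the branch is taken when R[rs] - R[rt] is zero."""
    cond = (rs_val - rt_val) & MASK32
    if cond == 0:
        return (pc + 4 + sign_ext(imm16) * 4) & MASK32
    return (pc + 4) & MASK32

print(hex(beq_next_pc(0x1000, 5, 5, 3)))        # taken, forward by 3 instructions
print(hex(beq_next_pc(0x1000, 5, 6, 3)))        # not taken
print(hex(beq_next_pc(0x1000, 5, 5, 0xFFFF)))   # taken, offset -1: back to the beq itself
```

Note that the offset is relative to PC + 4, so an immediate of -1 branches back to the branch instruction.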

Datapath for Branch Operations
beq rs, rt, imm   We need to compare Rs and Rt!
[Figure: busA and busB feed the ALU, whose Zero output, together with the Branch control signal and the 16-bit immediate, drives the Next Address Logic; that logic updates the PC, which addresses the Instruction Memory. Control signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, Branch.]

Binary Arithmetic for the Next Address
In theory, the PC is a 32-bit byte address into the instruction memory:
- Sequential operation: PC<31:0> = PC<31:0> + 4
- Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm] * 4
The magic number "4" always comes up because:
- The 32-bit PC is a byte address
- All our instructions are 4 bytes (32 bits) long
In other words:
- The 2 LSBs of the 32-bit PC are always zeros
- There is no reason to have hardware to keep the 2 LSBs
In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm]
- In either case: Instruction Memory Address = PC<31:2> concat "00"
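A quick check that the 30-bit PC gives the same instruction addresses as the 32-bit byte-address version. The function names are made up for this sketch, and the starting address is an arbitrary word-aligned example.

```python
MASK32 = 0xFFFFFFFF
MASK30 = (1 << 30) - 1

def sign_ext(imm16):
    """Interpret a 16-bit immediate as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc32(pc, imm16, taken):
    """32-bit byte address: add 4, plus SignExt(imm) * 4 if the branch is taken."""
    offset = sign_ext(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & MASK32

def next_pc30(pc30, imm16, taken):
    """30-bit word address PC<31:2>: add 1, plus SignExt(imm) if the branch is taken."""
    offset = sign_ext(imm16) if taken else 0
    return (pc30 + 1 + offset) & MASK30

def instruction_address(pc30):
    """PC<31:2> concat "00"."""
    return pc30 << 2

pc = 0x00400000
for imm16, taken in [(0, False), (3, True), (0xFFFE, True)]:
    byte_version = next_pc32(pc, imm16, taken)
    word_version = instruction_address(next_pc30(pc >> 2, imm16, taken))
    print(hex(byte_version), hex(word_version))
```

Both columns print the same addresses, which is the justification for dropping the two low-order PC bits from the hardware.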

Next Address Logic: Expensive and Fast Solution
Using a 30-bit PC:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm]
- In either case: Instruction Memory Address = PC<31:2> concat "00"
[Figure: one adder computes PC + 1; a second adder adds SignExt(Instruction<15:0>) to that sum; a mux controlled by Branch and Zero selects which 30-bit result is written back to the PC. Addr<31:2> comes from the PC, Addr<1:0> is "00", and the Instruction Memory outputs Instruction<31:0>.]

Next Address Logic: Cheap and Slow Solution
Why is this slow?
- Cannot start the address add until Zero (output of ALU) is valid
Does it matter that this is slow in the overall scheme of things?
- Probably not here
- Critical path is the load operation
[Figure: a mux controlled by Branch and Zero selects either 0 or SignExt(Instruction<15:0>); a single adder with Carry In set to 1 adds the selected value to the 30-bit PC. Addr<31:2> comes from the PC, Addr<1:0> is "00", and the Instruction Memory outputs Instruction<31:0>.]
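The two organizations compute the same next address and differ only in where the mux sits. The sketch below assumes the structure usually drawn for these two options (two adders followed by a mux, versus a mux followed by one adder with carry-in 1); the function names are illustrative.

```python
MASK30 = (1 << 30) - 1

def sign_ext30(imm16):
    """Sign-extend a 16-bit immediate to 30 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0x3FFF0000) if imm16 & 0x8000 else imm16

def fast_next(pc30, imm16, branch, zero):
    """Two adders work in parallel with the ALU; the mux picks a result at the end."""
    sequential = (pc30 + 1) & MASK30
    target = (sequential + sign_ext30(imm16)) & MASK30
    return target if (branch and zero) else sequential

def cheap_next(pc30, imm16, branch, zero):
    """One adder: the mux must wait for Zero before the add can start."""
    offset = sign_ext30(imm16) if (branch and zero) else 0
    return (pc30 + offset + 1) & MASK30      # the +1 is the adder's carry in

for branch, zero in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(branch, zero,
          hex(fast_next(0x100000, 0xFFFE, branch, zero)),
          hex(cheap_next(0x100000, 0xFFFE, branch, zero)))
```

In software the two functions are equivalent; in hardware the cheap version saves an adder at the cost of putting the ALU's Zero output in front of the address addition.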

RTL: The Jump Instruction
j target (J-type: op 6 bits | target address 26 bits)
- mem[PC]                                       Fetch the instruction from memory
- PC<31:2> <- PC<31:28> concat target<25:0>     Calculate the next instruction's address

Instruction Fetch Unit
j target: PC<31:2> <- PC<31:28> concat target<25:0>
[Figure: the fast next-address logic (two adders, SignExt of Instruction<15:0>, mux controlled by Branch and Zero) followed by a second mux, controlled by Jump, that selects either that result or PC<31:28> (4 bits) concatenated with Instruction<25:0> (26 bits). Addr<31:2> comes from the 30-bit PC, Addr<1:0> is "00", and the Instruction Memory outputs Instruction<31:0>.]
This is the whole design of the Instruction Fetch Unit:
- 3 inputs: Jump, Branch, and Zero
- 1 output: the instruction word
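The jump address is a pure bit concatenation, which a short sketch makes explicit. It works on the 30-bit PC<31:2> used in the slides, so PC<31:28> is the top 4 bits of that 30-bit value; the function name and the example addresses are made up.

```python
def jump_next_pc30(pc30, target26):
    """PC<31:2> <- PC<31:28> concat target<25:0>, on a 30-bit PC."""
    upper4 = (pc30 >> 26) & 0xF            # PC<31:28>
    return (upper4 << 26) | (target26 & 0x3FFFFFF)

pc = 0x90000010                            # byte address of the jump instruction
new_pc30 = jump_next_pc30(pc >> 2, 0x0100000)
print(hex(new_pc30 << 2))                  # byte address after appending "00"
```

Because only 26 target bits fit in the instruction, a jump can reach any word within the current 256 MB region selected by the upper 4 PC bits, but not outside it.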

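Putting it All Together: A Single Cycle Datapath
We have everything except the control signals (underlined on the slide).
[Figure: the complete single cycle datapath. The Instruction Fetch Unit (inputs Branch, Jump, Zero) outputs Instruction<31:0>, which is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm <0:15>. A mux controlled by RegDst selects Rd or Rt as Rw of the 32 32-bit registers; the Extender (ExtOp) and a mux controlled by ALUSrc feed the second ALU input; the ALU (ALUctr) produces Zero and the address for the Data Memory (Adr, Data In, WrEn driven by MemWr); a mux controlled by MemtoReg selects the value written back on busW when RegWr is 1.]

Where to get more information?
To be continued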
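As a rough end-to-end illustration, the whole subset can be modeled as one function that executes one instruction per call, the software analogue of one clock cycle. This is a behavioral sketch, not the lecture's datapath: the opcode and funct values are the standard MIPS encodings, while the dictionary-based memories and the `step` name are assumptions of the example.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit immediate as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(pc, regs, imem, dmem):
    """Fetch, decode, and execute one instruction; return the next PC."""
    word = imem[pc]                                   # instruction fetch
    op = word >> 26
    rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    funct, imm16, target = word & 0x3F, word & 0xFFFF, word & 0x3FFFFFF
    next_pc = (pc + 4) & MASK32
    if op == 0x00 and funct == 0x20:                  # add
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0x00 and funct == 0x22:                # sub
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0x0D:                                  # ori (zero extension)
        regs[rt] = regs[rs] | imm16
    elif op == 0x23:                                  # lw
        regs[rt] = dmem.get((regs[rs] + sign_ext(imm16)) & MASK32, 0)
    elif op == 0x2B:                                  # sw
        dmem[(regs[rs] + sign_ext(imm16)) & MASK32] = regs[rt]
    elif op == 0x04:                                  # beq
        if regs[rs] == regs[rt]:
            next_pc = (pc + 4 + sign_ext(imm16) * 4) & MASK32
    elif op == 0x02:                                  # j
        next_pc = (pc & 0xF0000000) | (target << 2)
    else:
        raise ValueError("instruction outside the subset: " + hex(word))
    return next_pc

# Hand-assembled test program, one word per instruction.
imem = {
    0x00: 0x34010007,   # ori $1, $0, 7
    0x04: 0x34020005,   # ori $2, $0, 5
    0x08: 0x00221820,   # add $3, $1, $2
    0x0C: 0xAC030010,   # sw  $3, 16($0)
    0x10: 0x8C040010,   # lw  $4, 16($0)
}
regs, dmem, pc = [0] * 32, {}, 0
for _ in range(len(imem)):
    pc = step(pc, regs, imem, dmem)
print(regs[1:5], dmem, hex(pc))
```

The if/elif chain plays the role of the control logic that the slides defer to the next lecture: it decides, per instruction, which of the datapath's paths are actually used.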