Processor Design. Fall 2014 CS 465. Instructor: Huzefa Rangwala, PhD

Similar documents
Review: MIPS Addressing Modes/Instruction Formats

Solutions. Solution The values of the signals are as follows:

Computer organization

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Design of Digital Circuits (SS16)

Computer Organization and Components

Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Reduced Instruction Set Computer (RISC)

Instruction Set Design

Introducción. Diseño de sistemas digitales.1

Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Instruction Set Architecture

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

(Refer Slide Time: 00:01:16 min)

A s we saw in Chapter 4, a CPU contains three main sections: the register section,

Chapter 9 Computer Design Basics!

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Microprocessor & Assembly Language

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December :59

A SystemC Transaction Level Model for the MIPS R3000 Processor

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

The 104 Duke_ACC Machine

Central Processing Unit (CPU)

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

CS352H: Computer Systems Architecture

Generating MIF files

CPU Organisation and Operation

CPU Organization and Assembly Language

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Let s put together a Manual Processor

Chapter 2 Logic Gates and Introduction to Computer Architecture

16-bit ALU, Register File and Memory Write Interface

MICROPROCESSOR AND MICROCOMPUTER BASICS

LSN 2 Computer Processors

CSE 141 Introduction to Computer Architecture Summer Session I, Lecture 1 Introduction. Pramod V. Argade June 27, 2005

In the Beginning The first ISA appears on the IBM System 360 In the good old days

Systems I: Computer Organization and Architecture

Intel 8086 architecture

CHAPTER 4 MARIE: An Introduction to a Simple Computer

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

Instruction Set Architecture (ISA)

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

İSTANBUL AYDIN UNIVERSITY

SIM-PL: Software for teaching computer hardware at secondary schools in the Netherlands

WAR: Write After Read

A FPGA Implementation of a MIPS RISC Processor for Computer Architecture Education

Chapter 2 Topics. 2.1 Classification of Computers & Instructions 2.2 Classes of Instruction Sets 2.3 Informal Description of Simple RISC Computer, SRC

CSE 141L Computer Architecture Lab Fall Lecture 2

BASIC COMPUTER ORGANIZATION AND DESIGN

Sequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )

Preface. Any questions from last time? A bit more motivation, information about me. A bit more about this class. Later: Will review 1st 22 slides

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Chapter 01: Introduction. Lesson 02 Evolution of Computers Part 2 First generation Computers

CS 61C: Great Ideas in Computer Architecture Finite State Machines. Machine Interpreta4on

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

Instruction Set Architecture. Datapath & Control. Instruction. LC-3 Overview: Memory and Registers. CIT 595 Spring 2010

MACHINE ARCHITECTURE & LANGUAGE

Summary of the MARIE Assembly Language

1 Classical Universal Computer 3

PART B QUESTIONS AND ANSWERS UNIT I

Central Processing Unit

ECE410 Design Project Spring 2008 Design and Characterization of a CMOS 8-bit Microprocessor Data Path

Systems I: Computer Organization and Architecture

EC 362 Problem Set #2

CHAPTER 7: The CPU and Memory

PROBLEMS #20,R0,R1 #$3A,R2,R4

Computer Organization. and Instruction Execution. August 22

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 5 Instructor's Manual

Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine

CHAPTER 3 Boolean Algebra and Digital Logic

TIMING DIAGRAM O 8085

The WIMP51: A Simple Processor and Visualization Tool to Introduce Undergraduates to Computer Organization

EECS 427 RISC PROCESSOR

5 Combinatorial Components. 5.0 Full adder. Full subtractor

An Introduction to the ARM 7 Architecture

WEEK 8.1 Registers and Counters. ECE124 Digital Circuits and Systems Page 1

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

CHAPTER 11: Flip Flops

ARM Microprocessor and ARM-Based Microcontrollers

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

Computer Organization and Architecture

Techniques for Accurate Performance Evaluation in Architecture Exploration

Memory Elements. Combinational logic cannot remember

EE361: Digital Computer Organization Course Syllabus

Giving credit where credit is due

Administrative Issues

Counters and Decoders

An Overview of Stack Architecture and the PSC 1000 Microprocessor

Flip-Flops, Registers, Counters, and a Simple Processor

1 Computer hardware. Peripheral Bus device "B" Peripheral device. controller. Memory. Central Processing Unit (CPU)

Notes on Assembly Language

Levels of Programming Languages. Gerald Penn CSC 324

Transcription:

Processor Design Fall 2014 CS 465 Instructor: Huzefa Rangwala, PhD

Big Picture: Where are We Now? Five classic components of a computer Processor Input Control Memory Datapath Output Today s topic: design a single cycle processor machine design arithmetic (Ch 3) inst. set design (Ch 2) technology Single-cycle CPU CS465 2

Recall: What does performance of a CPU depend on? Three terms, please!

The Performance Perspective Performance of a machine is determined by: Instruction count Clock cycle time Clock cycles per instruction Processor design (datapath and control) will determine: Clock cycle time Clock cycles per instruction Today: single cycle processor Inst. Count Advantage: one clock cycle per instruction Disadvantage: long cycle time CPI Cycle Time Single-cycle CPU CS465 4

Focus on a subset of instructions Simple subset, shows most aspects Memory reference: lw, sw Arithmetic/logical: add, sub, and, or, slt Control transfer: beq, j

How to Design a Processor Analyze instruction set datapath requirements Meaning of each instruction given by register transfers Datapath must include storage element for ISA registers (possibly more) Datapath must support each register transfer Select set of datapath components and establish clocking methodology Assemble datapath meeting the requirements Analyze implementation of each instruction to determine setting of control points that effects the register transfer Assemble the control logic Single-cycle CPU CS465 6

Instruction Execution PC instruction memory, fetch instruction Register numbers register file, read registers Depending on instruction class Use ALU to calculate Arithmetic result Memory address for load/store Branch target address Access data memory for load/store PC target address or PC + 4 Chapter 4 The Processor 7

MIPS Instruction Formats All MIPS instructions are 32 bits long; 3 instruct. formats: 31 26 21 16 11 6 R-type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 I-type 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 0 J-type 31 op 26 6 bits 26 bits The different fields are: op: operation of the instruction target address rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of the jump instruction 0 Single-cycle CPU CS465 8

Step 1a: The MIPS-lite Subset ADD, SUB, AND, OR add rd, rs, rt sub rd, rs, rt and rd, rs,rt or rd,rs,rt 31 26 21 16 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 11 6 0 LOAD and STORE Word lw rt, rs, imm16 sw rt, rs, imm16 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 0 BRANCH: beq rs, rt, imm16 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 0 Single-cycle CPU CS465 9

Register Transfer Language RTL gives the meaning of the instructions First step is to fetch the instruction from memory op rs rt rd shamt funct = MEM[ PC ] op rs rt Imm16 = MEM[ PC ] inst Register Transfers ADD R[rd] < R[rs] + R[rt]; PC < PC + 4 SUB R[rd] < R[rs] R[rt]; PC < PC + 4 OR R[rd] < R[rs] R[rt]; PC < PC + 4 LOAD R[rt] < MEM[ R[rs] + sign_ext(imm16)]; PC < PC + 4 STORE MEM[ R[rs] + sign_ext(imm16) ] < R[rt]; PC < PC + 4 BEQ if ( R[rs] == R[rt] ) then PC < PC + sign_ext(imm16)] 00 else PC < PC + 4 Single-cycle CPU CS465 10

Step 1b: Requirements of Instr. Set Memory Instruction & data Registers (32 x 32) Read RS Read RT Write RT or RD Extender Add and Sub register or extended immediate PC Add 4 or extended immediate to PC Single-cycle CPU CS465 11

Step 2: Components of Datapath Combinational elements Storage elements Clocking methodology Single-cycle CPU CS465 12

Logic Design Basics Information encoded in binary Low voltage = 0, High voltage = 1 One wire per bit Multi-bit data encoded on multi-wire buses Combinational element Operate on data Output is a function of input State (sequential) elements Store information 4.2 Logic Design Conventions Chapter 4 The Processor 13

Combinational Elements AND-gate Y = A & B! Adder! Y = A + B A B + Y A B I0 I1 M u x S Y! Multiplexer! Y = S? I1 : I0 Y! Arithmetic/Logic Unit! Y = F(A, B) A ALU Y B F Chapter 4 The Processor 14

Sequential Elements Register: stores data in a circuit Uses a clock signal to determine when to update the stored value Edge-triggered: update when Clk changes from 0 to 1 D Clk Q Clk D Q Chapter 4 The Processor 15

Sequential Elements Register with write control Only updates on clock edge when write control input is 1 Used when stored value is required later Clk D Write Clk Q Write D Q Chapter 4 The Processor 16

Clocking Methodology Combinational logic transforms data during clock cycles Between clock edges Input from state elements, output to state element Longest delay determines clock period Chapter 4 The Processor 17

Our Implementation An edge triggered methodology Typical execution: Read contents of some state elements Send values through some combinational logic Write results to one or more state elements S t a t e e l e m e n t 1 C o m b i n a t i o n a l l o g i c S t a t e e l e m e n t 2 C l o c k c y c l e S t a t e e l e m e n t C o m b i n a t i o n a l l o g i c Single-cycle CPU CS465 18

Storage Element Register File Register file consists of 32 registers: Two 32-bit output busses: busa and busb One 32-bit input bus: busw RW RA RB Write Enable 5 5 5 busw 32 Clk 32 32-bit Registers Register is selected by: RA (number) selects the register to put on busa (data) RB (number) selects the register to put on busb (data) RW (number) selects the register to be written via busw (data) when Write Enable is 1 Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: RA or RB valid => busa or busb valid after access time busa 32 busb 32 Single-cycle CPU CS465 19

Storage Element: Idealized Memory Memory (idealized) One input bus: Data In One output bus: Data Out Write Enable Address Data In DataOut 32 Clk 32 Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory word to be written via the Data In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: Address valid => Data Out valid after access time Single-cycle CPU CS465 20

How to Design a Processor 1. Analyze instruction set datapath requirements 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 21

Building a Datapath Datapath Elements that process data and addresses in the CPU Registers, ALUs, mux s, memories, We will build a MIPS datapath incrementally Fetch Instructions Read operands and execute instructions. 4.3 Building a Datapath Chapter 4 The Processor 22

3a: Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[pc] Update the program counter: Sequential Code: PC <- PC + 4 Branch and Jump: PC <- something else We don t know if instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted the instruction from memory So all instructions initially increment the PC Single-cycle CPU CS465 23

3a: Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[pc] Update the program counter: Sequential Code: PC <- PC + 4 Branch and Jump: PC <- something else We don t know if instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted the instruction from memory So all instructions initially increment the PC Single-cycle CPU CS465 24

Components to Assemble I n s t r u c t i o n a d d r e s s I n s t r u c t i o n P C A d d S u m I n s t r u c t i o n m e m o r y a. I n s t r u c t i o n m e m o r y b. P r o g r a m c o u n t e r c. A d d e r Single-cycle CPU CS465 25

Datapath for Instruction Fetch A d d 4 P C R e a d a d d r e s s I n s t r u c t i o n I n s t r u c t i o n m e m o r y Single-cycle CPU CS465 26

Step 3b: R-format Instructions Example instructions: add, sub, and, or, slt R[rd] <- R[rs] op R[rt] Read register 1, Read register 2, and Write register come from instruction s rs, rt, and rd fields ALU control and RegWrite: control logic after decoding the instruction 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 Single-cycle CPU CS465 27

Datapath for R-format Instructions I n s t r u c t i o n R e a d r e g i s t e r 1 R e a d r e g i s t e r 2 R e g i s t e r s W r i t e r e g i s t e r W r i t e d a t a R e a d d a t a 1 R e a d d a t a 2 R e g W r i t e 4 A L U o p e r a t i o n Z e r o A L U A L U r e s u l t Single-cycle CPU CS465 28

Load/Store Instructions Read register operands Calculate address using 16-bit offset Use ALU, but sign-extend offset Load: Read memory and update register Store: Write register value to memory Chapter 4 The Processor 29

Datapath for LW & SW R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16 Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16 I n s t r u c t i o n R e a d r e g i s t e r 1 R e a d r e g i s t e r 2 R e g i s t e r s W r i t e r e g i s t e r W r i t e d a t a R e g W r i t e R e a d d a t a 1 R e a d d a t a 2 A L U o p e r a t i o n Z e r o A L U A L U r e s u l t A d d r e s s W r i t e d a t a M e m W r i t e D a t a m e m o r y R e a d d a t a 16 32 S i g n e x t e n d M e m R e a d Single-cycle CPU CS465 30

Branch Instructions Read register operands Compare operands Use ALU, subtract and check Zero output Calculate target address Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4 Already calculated by instruction fetch Chapter 4 The Processor 31

Branch Instructions Just re-routes wires Chapter 4 The Processor 32 Sign-bit wire replicated

Composing the Elements First-cut data path does an instruction in one clock cycle Each datapath element can only do one function at a time Hence, we need separate instruction and data memories Use multiplexers where alternate data sources are used for different instructions Chapter 4 The Processor 33

R-Type/Load/Store Datapath Chapter 4 The Processor 34

Full Datapath Chapter 4 The Processor 35

How to Design a Processor 1. Analyze instruction set datapath requirements 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 36

Datapath With Control Chapter 4 The Processor 37

Step 4: Datapath: RTL -> Control Instruction<31:0> Inst Memory Adr Op <21:25> Fun Rt <21:25> <0:15> <11:15> <16:20> Rs Rd Imm16 Control Branch RegWr RegDst ALUSrc ALUop MemRd MemWr MemtoReg Zero DATA PATH Single-cycle CPU CS465 38

Control Selecting the operations to perform (ALU, read/ write, etc.) Design the ALU Control Unit Controlling the flow of data (multiplexor inputs) Design the Main Control Unit Information comes from the 32 bits of the instruction Single-cycle CPU CS465 39

ALU Control ALU used for Load/Store: F = add Branch: F = subtract R-type: F depends on funct field ALU control Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set-on-less-than 1100 NOR 4.4 A Simple Implementation Scheme Chapter 4 The Processor 40

ALU Control Assume 2-bit ALUOp derived from opcode Combinational logic derives ALU control opcode ALUOp Operation funct ALU function ALU control lw 00 load word XXXXXX add 0010 sw 00 store word XXXXXX add 0010 beq 01 branch equal XXXXXX subtract 0110 R-type 10 add 100000 add 0010 subtract 100010 subtract 0110 AND 100100 AND 0000 OR 100101 OR 0001 set-on-less-than 101010 set-on-less-than 0111 Chapter 4 The Processor 41

Control Implementation Must describe hardware to compute 4-bit ALU control input Given instruction type 00 = lw, sw 01 = beq 10 = arithmetic Function code for arithmetic ALUOp computed from instruction type Describe it using a truth table (can turn into gates): ALUOp Funct field Operation ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 0 0 X X X X X X 0010 X 1 X X X X X X 0110 1 X X X 0 0 0 0 0010 1 X X X 0 0 1 0 0110 1 X X X 0 1 0 0 0000 1 X X X 0 1 0 1 0001 1 X X X 1 0 1 0 0111 Single-cycle CPU CS465 42

The Main Control Unit Control signals derived from instruction R-type Load/ Store Branch 0 rs rt rd shamt funct 31:26 25:21 20:16 15:11 10:6 5:0 35 or 43 rs rt address 31:26 25:21 20:16 15:0 4 rs rt address 31:26 25:21 20:16 15:0 opcode always read read, except for load write for R-type and load sign-extend and add Chapter 4 The Processor 43

Design the Main Control Unit Seven control signals RegDst RegWrite ALUSrc PCSrc MemRead MemWrite MemtoReg Single-cycle CPU CS465 44

Control Signals 1. RegDst = 0 => Register destination number for the Write register comes from the rt field (bits 20-16) RegDst = 1 => Register destination number for the Write register comes from the rd field (bits 15-11) 2. RegWrite = 1 => The register on the Write register input is written with the data on the Write data input (at the next clock edge) 3. ALUSrc = 0 => The second ALU operand comes from Read data 2 ALUSrc = 1 => The second ALU operand comes from the signextension unit 4. PCSrc = 0 => The PC is replaced with PC+4 PCSrc = 1 => The PC is replaced with the branch target address 5. MemtoReg = 0 => The value fed to the register write data input comes from the ALU MemtoReg = 1 => The value fed to the register write data input comes from the data memory 6. MemRead = 1 => Read data memory 7. MemWrite = 1 => Write data memory Single-cycle CPU CS465 45

R-format Instructions RegDst = 1 RegWrite = 1 ALUSrc = 0 Branch = 0 MemtoReg = 0 MemRead = 0 MemWrite = 0 ALUOp = 10 Single-cycle CPU CS465 46

Memory Access Instructions Load word RegDst = 0 RegWrite = 1 ALUSrc = 1 Branch = 0 MemtoReg = 1 MemRead = 1 MemWrite = 0 ALUOp = 00 Store Word RegDst = X RegWrite = 0 ALUSrc = 1 Branch = 0 MemtoReg = X MemRead = 0 MemWrite = 1 ALUOp = 00 Single-cycle CPU CS465 47

Branch on Equal RegDst = X RegWrite = 0 ALUSrc = 0 Branch = 1 MemtoReg = X MemRead = 0 MemWrite = 0 ALUOp = 01 Single-cycle CPU CS465 48

Step 5: Implementing Control Simple combinational logic (truth tables) I n p u t s O p 5 O p 4 O p 3 A L U O p A L U c o n t r o l b l o c k O p 2 O p 1 O p 0 A L U O p 0 A L U O p 1 O u t p u t s R - f o r m a t I w s w b e q R e g D s t F 3 O p e r a t i o n 2 A L U S r c F ( 5 0 ) F 2 O p e r a t i o n 1 O p e r a t i o n M e m t o R e g R e g W r i t e F 1 F 0 O p e r a t i o n 0 M e m R e a d M e m W r i t e B r a n c h A L U O p 1 ALU Control Unit A L U O p O Main Control Unit Single-cycle CPU CS465 49

Datapath With Control Chapter 4 The Processor 50

R-Type Instruction Chapter 4 The Processor 51

Load Instruction Chapter 4 The Processor 52

Branch-on-Equal Instruction Chapter 4 The Processor 53

Implementing Jumps Jump 2 address 31:26 25:0 Jump uses word address Update PC with concatenation of Top 4 bits of old PC 26-bit jump address 00 Need an extra control signal decoded from opcode Chapter 4 The Processor 54

Datapath With Jumps Added Chapter 4 The Processor 55

Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not feasible to vary period for different instructions Violates design principle Making the common case fast We will improve performance by pipelining Chapter 4 The Processor 56

Summary 5 steps to design a processor 1. Analyze instruction set => datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic MIPS makes it easier Instructions same size Source registers always in same place Immediates same size, location Operations always on registers/immediates Single cycle datapath => CPI=1, Clock Cycle Time => long Single-cycle CPU CS465 57