Processor Design Fall 2014 CS 465 Instructor: Huzefa Rangwala, PhD
Big Picture: Where are We Now? Five classic components of a computer Processor Input Control Memory Datapath Output Today s topic: design a single cycle processor machine design arithmetic (Ch 3) inst. set design (Ch 2) technology Single-cycle CPU CS465 2
Recall: What does performance of a CPU depend on? Three terms, please!
The Performance Perspective Performance of a machine is determined by: Instruction count Clock cycle time Clock cycles per instruction Processor design (datapath and control) will determine: Clock cycle time Clock cycles per instruction Today: single cycle processor Inst. Count Advantage: one clock cycle per instruction Disadvantage: long cycle time CPI Cycle Time Single-cycle CPU CS465 4
Focus on a subset of instructions Simple subset, shows most aspects Memory reference: lw, sw Arithmetic/logical: add, sub, and, or, slt Control transfer: beq, j
How to Design a Processor Analyze instruction set datapath requirements Meaning of each instruction given by register transfers Datapath must include storage element for ISA registers (possibly more) Datapath must support each register transfer Select set of datapath components and establish clocking methodology Assemble datapath meeting the requirements Analyze implementation of each instruction to determine setting of control points that effects the register transfer Assemble the control logic Single-cycle CPU CS465 6
Instruction Execution PC instruction memory, fetch instruction Register numbers register file, read registers Depending on instruction class Use ALU to calculate Arithmetic result Memory address for load/store Branch target address Access data memory for load/store PC target address or PC + 4 Chapter 4 The Processor 7
MIPS Instruction Formats All MIPS instructions are 32 bits long; 3 instruct. formats: 31 26 21 16 11 6 R-type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 I-type 31 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 0 J-type 31 op 26 6 bits 26 bits The different fields are: op: operation of the instruction target address rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of the jump instruction 0 Single-cycle CPU CS465 8
Step 1a: The MIPS-lite Subset ADD, SUB, AND, OR add rd, rs, rt sub rd, rs, rt and rd, rs,rt or rd,rs,rt 31 26 21 16 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 11 6 0 LOAD and STORE Word lw rt, rs, imm16 sw rt, rs, imm16 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 0 BRANCH: beq rs, rt, imm16 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 0 Single-cycle CPU CS465 9
Register Transfer Language RTL gives the meaning of the instructions First step is to fetch the instruction from memory op rs rt rd shamt funct = MEM[ PC ] op rs rt Imm16 = MEM[ PC ] inst Register Transfers ADD R[rd] < R[rs] + R[rt]; PC < PC + 4 SUB R[rd] < R[rs] R[rt]; PC < PC + 4 OR R[rd] < R[rs] R[rt]; PC < PC + 4 LOAD R[rt] < MEM[ R[rs] + sign_ext(imm16)]; PC < PC + 4 STORE MEM[ R[rs] + sign_ext(imm16) ] < R[rt]; PC < PC + 4 BEQ if ( R[rs] == R[rt] ) then PC < PC + sign_ext(imm16)] 00 else PC < PC + 4 Single-cycle CPU CS465 10
Step 1b: Requirements of Instr. Set Memory Instruction & data Registers (32 x 32) Read RS Read RT Write RT or RD Extender Add and Sub register or extended immediate PC Add 4 or extended immediate to PC Single-cycle CPU CS465 11
Step 2: Components of Datapath Combinational elements Storage elements Clocking methodology Single-cycle CPU CS465 12
Logic Design Basics Information encoded in binary Low voltage = 0, High voltage = 1 One wire per bit Multi-bit data encoded on multi-wire buses Combinational element Operate on data Output is a function of input State (sequential) elements Store information 4.2 Logic Design Conventions Chapter 4 The Processor 13
Combinational Elements AND-gate Y = A & B! Adder! Y = A + B A B + Y A B I0 I1 M u x S Y! Multiplexer! Y = S? I1 : I0 Y! Arithmetic/Logic Unit! Y = F(A, B) A ALU Y B F Chapter 4 The Processor 14
Sequential Elements Register: stores data in a circuit Uses a clock signal to determine when to update the stored value Edge-triggered: update when Clk changes from 0 to 1 D Clk Q Clk D Q Chapter 4 The Processor 15
Sequential Elements Register with write control Only updates on clock edge when write control input is 1 Used when stored value is required later Clk D Write Clk Q Write D Q Chapter 4 The Processor 16
Clocking Methodology Combinational logic transforms data during clock cycles Between clock edges Input from state elements, output to state element Longest delay determines clock period Chapter 4 The Processor 17
Our Implementation An edge triggered methodology Typical execution: Read contents of some state elements Send values through some combinational logic Write results to one or more state elements S t a t e e l e m e n t 1 C o m b i n a t i o n a l l o g i c S t a t e e l e m e n t 2 C l o c k c y c l e S t a t e e l e m e n t C o m b i n a t i o n a l l o g i c Single-cycle CPU CS465 18
Storage Element Register File Register file consists of 32 registers: Two 32-bit output busses: busa and busb One 32-bit input bus: busw RW RA RB Write Enable 5 5 5 busw 32 Clk 32 32-bit Registers Register is selected by: RA (number) selects the register to put on busa (data) RB (number) selects the register to put on busb (data) RW (number) selects the register to be written via busw (data) when Write Enable is 1 Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: RA or RB valid => busa or busb valid after access time busa 32 busb 32 Single-cycle CPU CS465 19
Storage Element: Idealized Memory Memory (idealized) One input bus: Data In One output bus: Data Out Write Enable Address Data In DataOut 32 Clk 32 Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory word to be written via the Data In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: Address valid => Data Out valid after access time Single-cycle CPU CS465 20
How to Design a Processor 1. Analyze instruction set datapath requirements 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 21
Building a Datapath Datapath Elements that process data and addresses in the CPU Registers, ALUs, mux s, memories, We will build a MIPS datapath incrementally Fetch Instructions Read operands and execute instructions. 4.3 Building a Datapath Chapter 4 The Processor 22
3a: Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[pc] Update the program counter: Sequential Code: PC <- PC + 4 Branch and Jump: PC <- something else We don t know if instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted the instruction from memory So all instructions initially increment the PC Single-cycle CPU CS465 23
3a: Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[pc] Update the program counter: Sequential Code: PC <- PC + 4 Branch and Jump: PC <- something else We don t know if instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted the instruction from memory So all instructions initially increment the PC Single-cycle CPU CS465 24
Components to Assemble I n s t r u c t i o n a d d r e s s I n s t r u c t i o n P C A d d S u m I n s t r u c t i o n m e m o r y a. I n s t r u c t i o n m e m o r y b. P r o g r a m c o u n t e r c. A d d e r Single-cycle CPU CS465 25
Datapath for Instruction Fetch A d d 4 P C R e a d a d d r e s s I n s t r u c t i o n I n s t r u c t i o n m e m o r y Single-cycle CPU CS465 26
Step 3b: R-format Instructions Example instructions: add, sub, and, or, slt R[rd] <- R[rs] op R[rt] Read register 1, Read register 2, and Write register come from instruction s rs, rt, and rd fields ALU control and RegWrite: control logic after decoding the instruction 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 Single-cycle CPU CS465 27
Datapath for R-format Instructions I n s t r u c t i o n R e a d r e g i s t e r 1 R e a d r e g i s t e r 2 R e g i s t e r s W r i t e r e g i s t e r W r i t e d a t a R e a d d a t a 1 R e a d d a t a 2 R e g W r i t e 4 A L U o p e r a t i o n Z e r o A L U A L U r e s u l t Single-cycle CPU CS465 28
Load/Store Instructions Read register operands Calculate address using 16-bit offset Use ALU, but sign-extend offset Load: Read memory and update register Store: Write register value to memory Chapter 4 The Processor 29
Datapath for LW & SW R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16 Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16 I n s t r u c t i o n R e a d r e g i s t e r 1 R e a d r e g i s t e r 2 R e g i s t e r s W r i t e r e g i s t e r W r i t e d a t a R e g W r i t e R e a d d a t a 1 R e a d d a t a 2 A L U o p e r a t i o n Z e r o A L U A L U r e s u l t A d d r e s s W r i t e d a t a M e m W r i t e D a t a m e m o r y R e a d d a t a 16 32 S i g n e x t e n d M e m R e a d Single-cycle CPU CS465 30
Branch Instructions Read register operands Compare operands Use ALU, subtract and check Zero output Calculate target address Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4 Already calculated by instruction fetch Chapter 4 The Processor 31
Branch Instructions Just re-routes wires Chapter 4 The Processor 32 Sign-bit wire replicated
Composing the Elements First-cut data path does an instruction in one clock cycle Each datapath element can only do one function at a time Hence, we need separate instruction and data memories Use multiplexers where alternate data sources are used for different instructions Chapter 4 The Processor 33
R-Type/Load/Store Datapath Chapter 4 The Processor 34
Full Datapath Chapter 4 The Processor 35
How to Design a Processor 1. Analyze instruction set datapath requirements 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 36
Datapath With Control Chapter 4 The Processor 37
Step 4: Datapath: RTL -> Control Instruction<31:0> Inst Memory Adr Op <21:25> Fun Rt <21:25> <0:15> <11:15> <16:20> Rs Rd Imm16 Control Branch RegWr RegDst ALUSrc ALUop MemRd MemWr MemtoReg Zero DATA PATH Single-cycle CPU CS465 38
Control Selecting the operations to perform (ALU, read/ write, etc.) Design the ALU Control Unit Controlling the flow of data (multiplexor inputs) Design the Main Control Unit Information comes from the 32 bits of the instruction Single-cycle CPU CS465 39
ALU Control ALU used for Load/Store: F = add Branch: F = subtract R-type: F depends on funct field ALU control Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set-on-less-than 1100 NOR 4.4 A Simple Implementation Scheme Chapter 4 The Processor 40
ALU Control Assume 2-bit ALUOp derived from opcode Combinational logic derives ALU control opcode ALUOp Operation funct ALU function ALU control lw 00 load word XXXXXX add 0010 sw 00 store word XXXXXX add 0010 beq 01 branch equal XXXXXX subtract 0110 R-type 10 add 100000 add 0010 subtract 100010 subtract 0110 AND 100100 AND 0000 OR 100101 OR 0001 set-on-less-than 101010 set-on-less-than 0111 Chapter 4 The Processor 41
Control Implementation Must describe hardware to compute 4-bit ALU control input Given instruction type 00 = lw, sw 01 = beq 10 = arithmetic Function code for arithmetic ALUOp computed from instruction type Describe it using a truth table (can turn into gates): ALUOp Funct field Operation ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 0 0 X X X X X X 0010 X 1 X X X X X X 0110 1 X X X 0 0 0 0 0010 1 X X X 0 0 1 0 0110 1 X X X 0 1 0 0 0000 1 X X X 0 1 0 1 0001 1 X X X 1 0 1 0 0111 Single-cycle CPU CS465 42
The Main Control Unit Control signals derived from instruction R-type Load/ Store Branch 0 rs rt rd shamt funct 31:26 25:21 20:16 15:11 10:6 5:0 35 or 43 rs rt address 31:26 25:21 20:16 15:0 4 rs rt address 31:26 25:21 20:16 15:0 opcode always read read, except for load write for R-type and load sign-extend and add Chapter 4 The Processor 43
Design the Main Control Unit Seven control signals RegDst RegWrite ALUSrc PCSrc MemRead MemWrite MemtoReg Single-cycle CPU CS465 44
Control Signals 1. RegDst = 0 => Register destination number for the Write register comes from the rt field (bits 20-16) RegDst = 1 => Register destination number for the Write register comes from the rd field (bits 15-11) 2. RegWrite = 1 => The register on the Write register input is written with the data on the Write data input (at the next clock edge) 3. ALUSrc = 0 => The second ALU operand comes from Read data 2 ALUSrc = 1 => The second ALU operand comes from the signextension unit 4. PCSrc = 0 => The PC is replaced with PC+4 PCSrc = 1 => The PC is replaced with the branch target address 5. MemtoReg = 0 => The value fed to the register write data input comes from the ALU MemtoReg = 1 => The value fed to the register write data input comes from the data memory 6. MemRead = 1 => Read data memory 7. MemWrite = 1 => Write data memory Single-cycle CPU CS465 45
R-format Instructions RegDst = 1 RegWrite = 1 ALUSrc = 0 Branch = 0 MemtoReg = 0 MemRead = 0 MemWrite = 0 ALUOp = 10 Single-cycle CPU CS465 46
Memory Access Instructions Load word RegDst = 0 RegWrite = 1 ALUSrc = 1 Branch = 0 MemtoReg = 1 MemRead = 1 MemWrite = 0 ALUOp = 00 Store Word RegDst = X RegWrite = 0 ALUSrc = 1 Branch = 0 MemtoReg = X MemRead = 0 MemWrite = 1 ALUOp = 00 Single-cycle CPU CS465 47
Branch on Equal RegDst = X RegWrite = 0 ALUSrc = 0 Branch = 1 MemtoReg = X MemRead = 0 MemWrite = 0 ALUOp = 01 Single-cycle CPU CS465 48
Step 5: Implementing Control Simple combinational logic (truth tables) I n p u t s O p 5 O p 4 O p 3 A L U O p A L U c o n t r o l b l o c k O p 2 O p 1 O p 0 A L U O p 0 A L U O p 1 O u t p u t s R - f o r m a t I w s w b e q R e g D s t F 3 O p e r a t i o n 2 A L U S r c F ( 5 0 ) F 2 O p e r a t i o n 1 O p e r a t i o n M e m t o R e g R e g W r i t e F 1 F 0 O p e r a t i o n 0 M e m R e a d M e m W r i t e B r a n c h A L U O p 1 ALU Control Unit A L U O p O Main Control Unit Single-cycle CPU CS465 49
Datapath With Control Chapter 4 The Processor 50
R-Type Instruction Chapter 4 The Processor 51
Load Instruction Chapter 4 The Processor 52
Branch-on-Equal Instruction Chapter 4 The Processor 53
Implementing Jumps Jump 2 address 31:26 25:0 Jump uses word address Update PC with concatenation of Top 4 bits of old PC 26-bit jump address 00 Need an extra control signal decoded from opcode Chapter 4 The Processor 54
Datapath With Jumps Added Chapter 4 The Processor 55
Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not feasible to vary period for different instructions Violates design principle Making the common case fast We will improve performance by pipelining Chapter 4 The Processor 56
Summary 5 steps to design a processor 1. Analyze instruction set => datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic MIPS makes it easier Instructions same size Source registers always in same place Immediates same size, location Operations always on registers/immediates Single cycle datapath => CPI=1, Clock Cycle Time => long Single-cycle CPU CS465 57