Designing a Single Cycle Datapath Computer Science 14 cps 14 1 Outline of Today s Lecture Homework #3 Projects Reading Ch 41-43 Where are we with respect to the BIG picture? The Steps of Designing a Processor Datapath and timing for Reg-Reg Operations Datapath for Logical Operations with Immediate Datapath for Load and Store Operations Datapath for Branch and Jump Operations cps 14 2
Projects: What to submit Project description: eg, interpretations of instructions and description of principles behind design Source code (commented) Design files (full project directory) A description of each team member s contribution Demonstration cps 14 3 What is Computer Architecture? Coordination of levels of abstraction Application CPU Compiler Operating System Digital Design Circuit Design Firmware I/O system Software Interface Between HW and SW Set Architecture,, I/O Hardware Under a set of rapidly changing technology Forces cps 14 4
The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic: Datapath Design cps 14 5 Datapath Design How do we build hardware to implement the MIPS instructions? Add, LW, SW, Beq, Jump cps 14 6
An Abstract View of the Implementation PC Address Ideal Rd 5 Rs 5 Rt 5 Imm Rw Ra Rb -bit A B Data Address Data In Ideal Data DataOut cps 14 7 The MIPS Formats All MIPS instructions are bits long The three instruction formats: 26 21 11 6 R-type op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits I-type 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits J-type 26 op target address 6 bits 26 bits The different fields are: op: operation of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of the jump instruction cps 14 8
The MIPS Subset (We can t implement them all!) ADD and subtract add rd, rs, rt sub rd, rs, rt OR Immediate: ori rt, rs, imm LOAD and STORE lw rt, rs, imm sw rt, rs, imm BRANCH: beq rs, rt, imm 26 21 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits 11 6 JUMP: j target 26 op target address 6 bits 26 bits cps 14 9 The Hardware Program How do I build the hardware to implement the MIPS instructions and their sequencing? Fetch Decode Operand Fetch Execute Result Store Next cps 14 1
Combinational Logic Elements (Basic Building Blocks) Adder A Adder CarryIn Sum B Carry MUX Select A B MUX Y OP A Result B Zero cps 14 11 Storage Element: Register (Basic Building Block) Register Similar to the D Flip Flop except - N-bit input and output - Write Enable input Write Enable: - negated (): Data Out will not change - asserted (1): Data Out will become the same as Data In Write Enable Data In N Data Out N cps 14 12
Storage Element: Register File Register File consists of registers: Two -bit output busses: busa and busb One -bit input bus: busw Register is selected by: RA selects the register to put on busa RB selects the register to put on busb RW selects the register to be written via busw when Write Enable is 1 Clock input (CLK) RW RA RB Write Enable 5 5 5 busw The CLK input is a factor ONLY during write operation -bit During read operation, behaves as a combinational logic block: - RA or RB valid => busa or busb valid after access time busa busb cps 14 13 Storage Element: Idealized Write Enable Address (idealized) One input bus: Data In One output bus: Data Out word is selected by: Write Enable = : Address selects the word to put on the Data Out bus Write Enable = 1: Address selects the memory word to be written via the Data In bus Clock input (CLK) Data In The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: - Address valid => Data Out valid after access time DataOut cps 14 14
An Abstract View of the Implementation PC Address Ideal Rd 5 Rs 5 Rt 5 Imm Rw Ra Rb -bit A B Data Address Data In Ideal Data DataOut cps 14 15 Clocking Methodology Setup Hold Don t Care Setup Hold All storage elements are clocked by the same clock edge Cycle Time >= CLK-to-Q + Longest Delay Path + Setup + Clock Skew Longest delay path = critical path cps 14
An Abstract View of the Critical Path Register file and ideal memory: The CLK input is a factor ONLY during write operation During read operation, behave as combinational logic: - Address valid => Output valid after access time PC Address Ideal Rd 5 Rs 5 Rt 5 Imm Rw Ra Rb -bit Data Address Data In Ideal Data DataOut cps 14 17 The Steps of Designing a Processor Set Architecture => Register Transfer Language Register Transfer Language => Datapath components Datapath interconnect Datapath components => Control signals Control signals => Control logic cps 14 18
Overview of the Fetch Unit The common RTL operations Fetch the : mem[pc] Update the program counter: - Sequential Code: PC <- PC + 4 - Branch and Jump: PC <- something else PC Next Address Logic Address Word cps 14 19 RTL: The ADD add rd, rs, rt mem[pc] Fetch the instruction from memory R[rd] <- R[rs] + R[rt] The ADD operation PC <- PC + 4 Calculate the next instruction s address cps 14 2
RTL: The Load lw rt, rs, imm mem[pc] Fetch the instruction from memory Address <- R[rs] + SignExt(imm) Calculate the memory address R[rt] <- Mem[Address] Load the data into the register PC <- PC + 4 Calculate the next instruction s address cps 14 21 RTL: The ADD 26 21 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits add rd, rs, rt mem[pc] Fetch the instruction from memory R[rd] <- R[rs] + R[rt] The actual operation PC <- PC + 4 Calculate the next instruction s address cps 14 22
RTL: The Subtract 26 21 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits sub rd, rs, rt mem[pc] Fetch the instruction from memory R[rd] <- R[rs] - R[rt] The actual operation PC <- PC + 4 Calculate the next instruction s address cps 14 23 Datapath for Register-Register Operations R[rd] <- R[rs] op R[rt] Example: add rd, rs, rt Ra, Rb, and Rw comes from instruction s rs, rt, and rd fields ctr and RegWr: control logic after decoding the instruction fields: op and func 26 21 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits busw RegWr Rd Rs 5 5 5 Rw Ra Rb -bit Rt busa busb ctr Result cps 14 24
Register-Register Timing Inst fetch Decode Opr fetch Execute Write Back PC Old Value Rs, Rt, Rd, Op, Func ctr RegWr busa, B busw cps 14 25 busw -to-q New Value Old Value Old Value Old Value Old Value Old Value RegWr Access Time New Value Rd Rs Rt 5 5 5 Rw Ra Rb -bit Delay through Control Logic New Value busa busb New Value Register File Access Time New Value ctr Result Delay New Value Register Write Occurs Here RTL: The OR Immediate 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits ori rt, rs, imm mem[pc] Fetch the instruction from memory R[rt] <- R[rs] or ZeroExt(imm) The OR operation PC <- PC + 4 Calculate the next instruction s address bits 15 immediate bits cps 14 26
Datapath for Logical Operations with Immediate R[rt] <- R[rs] op ZeroExt[imm]] Example: ori rt, rs, imm 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits Rd Rt RegDst Don t Care Rs RegWr (Rt) 5 5 5 busw imm Rw Ra Rb -bit busb ZeroExt busa Src ctr Result cps 14 27 RTL: The Load lw rt, rs, imm 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits mem[pc] Fetch the instruction from memory Address <- R[rs] + SignExt(imm) Calculate the memory address R[rt] <- Mem[Address] Load the data into the register PC <- PC + 4 Calculate the next instruction s address 15 bits 15 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 bits immediate bits immediate bits cps 14 28
Datapath for Load Operations R[rt] <- Mem[R[rs] + SignExt[imm]] Example: lw rt, rs, imm 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits Rd Rt RegDst Don t Care Rs RegWr (Rt) 5 5 5 busw imm Rw Ra Rb -bit busb Extender busa Src ctr Data In MemWr WrEn Adr Data MemtoReg cps 14 29 ExtOp RTL: The Store 26 6 bits 5 bits 5 bits bits sw rt, rs, imm 21 op rs rt immediate mem[pc] Fetch the instruction from memory Address <- R[rs] + SignExt(imm) Calculate the memory address Mem[Address] <- R[rt] Store the register into memory PC <- PC + 4 Calculate the next instruction s address cps 14 3
Datapath for Store Operations Mem[R[rs] + SignExt[imm] <- R[rt]] Example: sw rt, rs, imm 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits RegDst busw Rd Rt RegWr 5 5 imm Rs Rt 5 Rw Ra Rb -bit busb Extender busa Src ctr Data In MemWr WrEn Adr Data MemtoReg cps 14 ExtOp RTL: The Branch 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits beq rs, rt, imm mem[pc] Fetch the instruction from memory Cond <- R[rs] - R[rt] Calculate the branch condition if (COND eq ) Calculate the next instruction s address - PC <- PC + 4 + ( SignExt(imm) x 4 ) else - PC <- PC + 4 cps 14
Datapath for Branch Operations beq rs, rt, imm We need to compare Rs and Rt! 26 21 op rs rt immediate 6 bits 5 bits 5 bits bits Rd Rt RegDst Rs Rt RegWr 5 5 5 busw imm Rw Ra Rb -bit busb Extender busa Src ctr imm Branch Zero PC Next Address Logic To cps 14 33 ExtOp Binary Arithmetic for the Next Address In theory, the PC is a -bit byte address into the instruction memory: Sequential operation: PC<:> = PC<:> + 4 Branch operation: PC<:> = PC<:> + 4 + SignExt[Imm] * 4 The magic number 4 always comes up because: The -bit PC is a byte address And all our instructions are 4 bytes ( bits) long In other words: The 2 LSBs of the -bit PC are always zeros There is no reason to have hardware to keep the 2 LSBs In practice, we can simplify the hardware by using a 3-bit PC<:2>: Sequential operation: PC<:2> = PC<:2> + 1 Branch operation: PC<:2> = PC<:2> + 1 + SignExt[Imm] In either case: --Address = PC<:2> concat cps 14 34
Next Address Logic: Expensive and Fast Solution Using a 3-bit PC: Sequential operation: PC<:2> = PC<:2> + 1 Branch operation: PC<:2> = PC<:2> + 1 + SignExt[Imm] In either case: --Address = PC<:2> concat PC 3 1 imm <15:> SignExt Adder 3 3 3 3 Adder 3 1 Addr<:2> Addr<1:> <:> Branch Zero cps 14 35 Next Address Logic 3 PC imm <15:> 3 SignExt 3 1 3 1 Carry In Adder 3 Addr<:2> Addr<1:> <:> Branch Zero cps 14 36
RTL: The Jump 26 op target address 6 bits 26 bits j target mem[pc] Fetch the instruction from memory PC <- PC+4<:28> concat target<25:> concat <> Calculate the next instruction s address cps 14 37 Fetch Unit j target PC<:2> <- PC+4<:28> concat target<25:> 3 PC+4<:28> 4 Target <25:> 26 3 3 1 Addr<:2> Addr<1:> PC 3 1 imm <15:> Adder SignExt 3 Adder 3 3 1 Jump <:> Branch Zero cps 14 38
Putting it All Together: A Single Cycle Datapath We have everything except control signals RegDst busw 1 Rs Rt RegWr 5 5 5 Rd imm Rt Rw Ra Rb -bit busb Extender Branch Jump busa 1 Src Fetch Unit ctr Data In Zero <:> Rt <21:25> WrEn Rs <:2> MemWr Adr Data Rd <11:15> <:15> Imm 1 MemtoReg cps 14 39 ExtOp