ECE468 Computer Organization and Architecture

Designing a Single Cycle Datapath

Processor Design: How to Implement MIPS
Simplicity favors regularity

ECE468 Datapath1 23-3-19

The Big Picture: Where Are We Now?

The Five Classic Components of a Computer:
- Processor: Control and Datapath
- Memory
- Input
- Output

Today's Topic: Datapath Design
- What is data?
- What is a datapath?

The Big Picture: The Performance Perspective

Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction

Processor design (datapath and control) determines:
- Clock cycle time
- Clock cycles per instruction

In the next two lectures: single cycle processor
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time

The MIPS Instruction Formats

All MIPS instructions are 32 bits long. The three instruction formats:

R-type:
  Field:  op       rs       rt       rd       shamt    funct
  Bits:   31-26    25-21    20-16    15-11    10-6     5-0
  Width:  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

I-type:
  Field:  op       rs       rt       immediate
  Bits:   31-26    25-21    20-16    15-0
  Width:  6 bits   5 bits   5 bits   16 bits

J-type:
  Field:  op       target address
  Bits:   31-26    25-0
  Width:  6 bits   26 bits

The different fields are:
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
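
The field layout above can be sketched as a few shifts and masks. This is a minimal illustration, not part of the original slides; the function name `decode` and the dictionary keys are chosen here for convenience.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31-26, 6 bits
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21, 5 bits
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16, 5 bits
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11, 5 bits (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10-6, 5 bits (R-type)
        "funct":  instr & 0x3F,           # bits 5-0, 6 bits (R-type)
        "imm":    instr & 0xFFFF,         # bits 15-0, 16 bits (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25-0, 26 bits (J-type)
    }

# 0x00221820 encodes add $3, $1, $2 (op = 0, funct = 0x20).
fields = decode(0x00221820)
```

Every field is extracted regardless of format; which ones are meaningful depends on the op field, which is what the control logic decides.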

The MIPS Subset

- ADD and subtract (R-type format: op, rs, rt, rd, shamt, funct):
  - add rd, rs, rt
  - sub rd, rs, rt
- OR Immediate (I-type format: op, rs, rt, immediate):
  - ori rt, rs, imm
- LOAD and STORE (I-type format):
  - lw rt, rs, imm
  - sw rt, rs, imm
- BRANCH (I-type format):
  - beq rs, rt, imm
- JUMP (J-type format: op, target address):
  - j target

An Abstract View of the Implementation

Two types of functional units:
- Operational elements that operate on data (combinational)
- State elements that contain data (sequential)

Generic implementation:
- Use the PC to supply the instruction address
- Get the instruction from memory
- Read registers
- Use the instruction to decide exactly what to do

All instructions use the ALU after reading the registers. Why? Consider memory-reference, arithmetic, and control-flow instructions.

[Figure: the PC supplies the Instruction Address to an ideal instruction memory; the 32-bit instruction provides Rd, Rs, Rt (5 bits each) and Imm; a register file with ports Rw, Ra, Rb feeds a 32-bit ALU, whose output is the Data Address of an ideal data memory with Data In and Data Out.]

Next step: fill in the details with more units, more connections, and a control unit.

State Elements

- Unclocked vs. clocked
- Clocks are used in synchronous logic: when should an element that contains state be updated?

[Figure: clock waveform labeled with rising edge, falling edge, and cycle time.]

An Unclocked State Element

The set-reset latch:
- Output depends on present inputs and also on past inputs

[Figure: set-reset latch with inputs R and S and outputs Q and Q-bar.]

Latches and Flip-flops

- Output is equal to the stored value inside the element (no need to ask for permission to look at the value)
- Change of state (value) is based on the clock:
  - Latch: state changes whenever the inputs change and the clock is asserted
  - Flip-flop: state changes only on a clock edge (edge-triggered methodology)
- Asserted means "logically true", which could mean electrically low
- A clocking methodology defines when signals can be read and written: we would not want to read a signal at the same time it is being written

D-latch and D Flip-flop

- Two inputs:
  - the data value to be stored (D)
  - the clock signal (C) indicating when to read and store D
- D-latch: output changes when C is high
- D flip-flop: output changes only on the clock edge

[Figure: D-latch with inputs C and D and outputs Q and Q-bar; D flip-flop built from two D latches; timing waveforms of D, C, and Q for each.]
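
The difference between the two elements can be sketched in a small behavioral model. This is an illustration only: the class names are made up here, and the flip-flop is assumed to trigger on the rising edge, which the slides do not specify.

```python
class DLatch:
    """Level-sensitive: Q follows D for as long as C is high."""

    def __init__(self):
        self.q = 0

    def update(self, c: int, d: int) -> int:
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: Q samples D only when C goes from 0 to 1 (assumed rising edge)."""

    def __init__(self):
        self.q = 0
        self.prev_c = 0

    def update(self, c: int, d: int) -> int:
        if self.prev_c == 0 and c == 1:
            self.q = d
        self.prev_c = c
        return self.q


# Same input waveform applied to both elements, one sample per time step.
clock = [0, 1, 1, 1, 0, 1]
data = [1, 1, 0, 1, 0, 0]
latch, flop = DLatch(), DFlipFlop()
latch_q = [latch.update(c, d) for c, d in zip(clock, data)]
flop_q = [flop.update(c, d) for c, d in zip(clock, data)]
```

While the clock stays high in steps 1 to 3, the latch output tracks every change of D, whereas the flip-flop holds the value it captured at the edge.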

Clocking Methodology (Appendix B.7)

[Figure: clock waveform with Setup and Hold windows around each clock edge; data is "Don't Care" outside those windows.]

- All storage elements are clocked by the same clock edge
- Edge-triggered: all stored values are updated on a clock edge
- Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew
- (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time

An Abstract View of the Critical Path

Register file and ideal memory:
- The CLK input is a factor ONLY during a write operation
- During a read operation, they behave as combinational logic:
  - Address valid => Output valid after access time

Critical Path (Load Operation) =
  PC's prop time
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: the abstract datapath (PC, ideal instruction memory, register file, 32-bit ALU, ideal data memory) with the load path highlighted.]
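
As a worked example of the load critical path and the hold-time check, the following sketch plugs in delay values. All numbers are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the slides.

```python
# Hypothetical component delays in picoseconds (assumed values, for illustration only).
delays_ps = {
    "pc_prop": 50,
    "instr_mem_access": 500,
    "reg_file_access": 200,
    "alu_32bit_add": 300,
    "data_mem_access": 500,
    "reg_file_setup": 50,
    "clock_skew": 20,
}


def load_critical_path(delays: dict) -> int:
    """Sum every term of the load critical path; this bounds the cycle time."""
    return sum(delays.values())


def hold_constraint_met(latch_prop: int, shortest_path: int,
                        clock_skew: int, hold_time: int) -> bool:
    """Check (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time."""
    return latch_prop + shortest_path - clock_skew > hold_time


cycle_time_ps = load_critical_path(delays_ps)
```

With these assumed values the cycle time must be at least 1620 ps, and every instruction pays that cost even if its own path is shorter, which is the disadvantage of the single cycle design.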

The Steps of Designing a Processor

- Instruction Set Architecture => Register Transfer Language
- Register Transfer Language (RTL) =>
  - Datapath components
  - Datapath interconnect
- Datapath components => Control signals
- Control signals => Control logic

Element < component

What is RTL: The ADD Instruction

add rd, rs, rt

Register Transfer Language:
- mem[PC]                     Fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]      The ADD operation
- PC <- PC + 4                Calculate the next instruction's address

What is RTL: The Load Instruction

lw rt, rs, imm
- mem[PC]                          Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm)     Calculate the memory address
- R[rt] <- Mem[Addr]               Load the data into the register
- PC <- PC + 4                     Calculate the next instruction's address

Combinational Logic Elements (pp. B-9, B-19)

- Adder: inputs A, B, and CarryIn; outputs Sum and Carry
- MUX: inputs A, B, and Select; output Y
- ALU: inputs A, B, and OP; outputs Result and Zero
- Decoder: 3-bit input; outputs out0, out1, out2, ..., out7

In which cases do we need an adder, ALU, MUX, or decoder?
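
The four combinational elements can be modeled as pure functions, since their outputs depend only on their current inputs. The sketch below is an illustration under stated assumptions: the operation names passed to `alu` and the convention that Select = 0 picks input A are choices made here, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF


def adder(a: int, b: int, carry_in: int = 0):
    """32-bit adder: returns (Sum, Carry)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def mux(a: int, b: int, select: int) -> int:
    """2-to-1 multiplexer (assumed convention: Select = 0 picks A, 1 picks B)."""
    return b if select else a


def alu(a: int, b: int, op: str):
    """ALU for the subset in these slides: returns (Result, Zero)."""
    if op == "add":
        result = (a + b) & MASK32
    elif op == "sub":
        result = (a - b) & MASK32
    elif op == "or":
        result = (a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result, int(result == 0)


def decoder(value: int) -> list:
    """3-to-8 decoder: exactly one of out0..out7 is 1."""
    return [int(i == value) for i in range(8)]
```

The Zero output is what the branch datapath later uses to compare two registers: subtracting equal values yields a zero result.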

Storage Element: Register (pp. B-22 to B-25)

Register:
- Similar to the D flip-flop except:
  - N-bit input and output (Data In, Data Out)
  - Write Enable input
- Write Enable:
  - 0: Data Out will not change
  - 1: Data Out will become Data In
- Array of logical elements (see the register file on the next 2 slides)

The content is updated at the clock tick ONLY if the Write Enable signal is set to 1.

Storage Element: Register File

Register File consists of 32 registers:
- Two 32-bit output buses: busA and busB
- One 32-bit input bus: busW

Register is selected by:
- RA selects the register to put on busA
- RB selects the register to put on busB
- RW selects the register to be written via busW when Write Enable is 1

Clock input (CLK):
- The CLK input is a factor ONLY during a write operation
- During a read operation, behaves as a combinational logic block:
  - RA or RB valid => busA or busB valid after access time

[Figure: register file of 32 32-bit registers with 5-bit ports RW, RA, RB, a Write Enable input, a CLK input, and buses busW, busA, busB.]
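
A behavioral sketch of the register file makes the read/write asymmetry concrete: reads are plain lookups, while writes happen only at a clock edge with Write Enable set. The class and method names are illustrative assumptions, and the MIPS convention that register 0 always reads as zero is deliberately not modeled because these slides do not cover it.

```python
class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra: int, rb: int):
        """Combinational read: returns (busA, busB) for the selected registers."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: int) -> None:
        """At the clock edge, register RW takes busW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=0x1234, write_enable=1)   # written
rf.clock_edge(rw=3, bus_w=0x9999, write_enable=0)   # ignored
bus_a, bus_b = rf.read(ra=3, rb=0)
```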

Storage Element: Register File -- Detailed Diagram

[Figure: register file internals. A decoder driven by RW and gated by Write Enable selects which register's clock input (C) is activated so that it loads the write data; two multiplexers (MUX), controlled by RA and RB, select among Register 0 through Register 31 to drive busA and busB.]

Storage Element: Idealized Memory

Memory (idealized):
- One input bus: Data In
- One output bus: Data Out

Memory word is selected by:
- Address selects the word to put on Data Out
- Write Enable = 1: Address selects the memory word to be written via the Data In bus

Clock input (CLK):
- The CLK input is a factor ONLY during a write operation
- During a read operation, behaves as a combinational logic block:
  - Address valid => Data Out valid after access time

Overview of the Instruction Fetch Unit (Fig. 5.5)

The common RTL operations:
- Fetch the instruction: mem[PC]
- Update the program counter:
  - Sequential code: PC <- PC + 4
  - Branch and jump: PC <- "something else"

[Figure: the PC, updated by the Next Address Logic, supplies the Address to the Instruction Memory, which outputs the Instruction Word.]

RTL: The ADD Instruction

add rd, rs, rt   (R-type format: op, rs, rt, rd, shamt, funct)
- mem[PC]                     Fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]      The actual operation
- PC <- PC + 4                Calculate the next instruction's address

RTL: The Subtract Instruction

sub rd, rs, rt   (R-type format: op, rs, rt, rd, shamt, funct)
- mem[PC]                     Fetch the instruction from memory
- R[rd] <- R[rs] - R[rt]      The actual operation
- PC <- PC + 4                Calculate the next instruction's address

Datapath for Register-Register Operations

R[rd] <- R[rs] op R[rt]    Example: add rd, rs, rt
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
- ALUctr and RegWr: produced by the control logic after decoding the instruction

[Figure: register file of 32 32-bit registers with RegWr and 5-bit ports Rw = Rd, Ra = Rs, Rb = Rt; busA and busB feed the ALU, controlled by ALUctr; the ALU Result returns to the register file on busW.]
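
The register-register datapath can be traced in a few lines: read two registers, run the ALU, write the result back. This is a minimal sketch; the function name, the string values used for ALUctr, and the plain list standing in for the register file are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def execute_rtype(regs: list, rs: int, rt: int, rd: int,
                  alu_ctr: str, reg_wr: int = 1) -> int:
    """One pass through the register-register datapath; returns the busW value."""
    bus_a, bus_b = regs[rs], regs[rt]          # Ra = rs, Rb = rt
    if alu_ctr == "add":
        bus_w = (bus_a + bus_b) & MASK32
    elif alu_ctr == "sub":
        bus_w = (bus_a - bus_b) & MASK32
    else:
        raise ValueError(f"unsupported ALUctr value: {alu_ctr}")
    if reg_wr:                                 # the write happens at the clock edge
        regs[rd] = bus_w                       # Rw = rd
    return bus_w


regs = [0] * 32
regs[1], regs[2] = 7, 5
execute_rtype(regs, rs=1, rt=2, rd=3, alu_ctr="add")   # add $3, $1, $2
execute_rtype(regs, rs=2, rt=1, rd=4, alu_ctr="sub")   # sub $4, $2, $1
```

The subtraction 5 - 7 wraps around to the 32-bit two's complement pattern for -2, because the registers hold fixed-width values.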

Register-Register Timing

[Figure: timing diagram. After the clock edge, each signal changes from its old value to its new value in turn: PC after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUctr and RegWr after the Delay through Control Logic; busA, B after the Register File Access Time; busW after the ALU Delay. The register write occurs at the next clock edge.]

RTL: The OR Immediate Instruction

ori rt, rs, imm   (I-type format: op, rs, rt, immediate)
- mem[PC]                            Fetch the instruction from memory
- R[rt] <- R[rs] or ZeroExt(imm)     The OR operation
- PC <- PC + 4                       Calculate the next instruction's address

ZeroExt(imm): the 16-bit immediate occupies bits 15-0 of the 32-bit result, and bits 31-16 are filled with zeros.

Datapath for Logical Operations with Immediate

R[rt] <- R[rs] op ZeroExt[imm]    Example: ori rt, rs, imm

[Figure: the register-register datapath with newly added parts in blue. A MUX controlled by RegDst selects Rd or Rt as Rw; Rb is "Don't Care" (Rt); imm passes through ZeroExt; a MUX controlled by ALUSrc selects busB or the extended immediate as the second ALU input.]

RTL: The Load Instruction

lw rt, rs, imm   (I-type format: op, rs, rt, immediate)
- mem[PC]                          Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm)     Calculate the memory address
- R[rt] <- Mem[Addr]               Load the data into the register
- PC <- PC + 4                     Calculate the next instruction's address

SignExt(imm): the 16-bit immediate occupies bits 15-0 of the 32-bit result, and bits 31-16 are copies of bit 15: all 0s when bit 15 is 0, all 1s when bit 15 is 1.
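
The two extension rules differ only in what fills the upper 16 bits. A minimal sketch, with function names chosen here for illustration:

```python
def zero_ext(imm16: int) -> int:
    """ZeroExt: bits 31-16 are zeros, bits 15-0 hold the immediate."""
    return imm16 & 0xFFFF


def sign_ext(imm16: int) -> int:
    """SignExt: bits 31-16 are copies of bit 15 of the immediate."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


# The same 16-bit pattern extended both ways.
as_logical = zero_ext(0xFFFC)    # used by ori
as_offset = sign_ext(0xFFFC)     # used by lw: represents -4

# Load address for a base register holding 0x1000 and offset -4.
load_addr = (0x1000 + as_offset) & 0xFFFFFFFF
```

Sign extension is what lets a load reach addresses below the base register, while zero extension keeps a logical immediate from setting the upper half of the result.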

Datapath for Load Operations

R[rt] <- Mem[R[rs] + SignExt[imm]]    Example: lw rt, rs, imm

[Figure: a MUX controlled by RegDst selects Rt as Rw; Rb is "Don't Care" (Rt); imm passes through the Extender, controlled by ExtOp; a MUX controlled by ALUSrc selects the extended immediate; the ALU result drives the address input (Adr) of the Data Memory, which has Data In and a write enable (WrEn) driven by MemWr; a MUX controlled by MemtoReg selects the ALU result or the memory output for busW.]

RTL: The Store Instruction

sw rt, rs, imm   (I-type format: op, rs, rt, immediate)
- mem[PC]                          Fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm)     Calculate the memory address
- Mem[Addr] <- R[rt]               Store the register into memory
- PC <- PC + 4                     Calculate the next instruction's address
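
Load and store share the same address calculation and differ only in the direction of the data transfer. The sketch below follows the RTL for both; the function names and the dictionary used as data memory (with unwritten words reading as 0) are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    """Copy bit 15 of the immediate into bits 31-16."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def store_word(regs: list, mem: dict, rt: int, rs: int, imm: int) -> None:
    """sw rt, rs, imm: Mem[R[rs] + SignExt(imm)] <- R[rt]."""
    addr = (regs[rs] + sign_ext(imm)) & MASK32
    mem[addr] = regs[rt]


def load_word(regs: list, mem: dict, rt: int, rs: int, imm: int) -> None:
    """lw rt, rs, imm: R[rt] <- Mem[R[rs] + SignExt(imm)]."""
    addr = (regs[rs] + sign_ext(imm)) & MASK32
    regs[rt] = mem.get(addr, 0)   # assumption: unwritten memory reads as 0


regs = [0] * 32
mem = {}
regs[1] = 0x100                   # base address
regs[2] = 0xDEADBEEF              # value to store
store_word(regs, mem, rt=2, rs=1, imm=0xFFFC)   # offset -4, so address 0xFC
load_word(regs, mem, rt=3, rs=1, imm=0xFFFC)
```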

Datapath for Store Operations

Mem[R[rs] + SignExt[imm]] <- R[rt]    Example: sw rt, rs, imm

[Figure: the load datapath with Rb = Rt, so that busB carries R[rt] to the Data In input of the Data Memory; MemWr drives the memory's write enable (WrEn); the ALU result, computed from busA and the extended immediate, drives Adr. RegDst, RegWr, ExtOp, ALUSrc, ALUctr, and MemtoReg are the control signals.]

RTL: The Branch Instruction

beq rs, rt, imm   (I-type format: op, rs, rt, immediate)
- mem[PC]                    Fetch the instruction from memory
- Cond <- R[rs] - R[rt]      Calculate the branch condition
- if (Cond eq 0)             Calculate the next instruction's address
  - PC <- PC + 4 + (SignExt(imm) x 4)
- else
  - PC <- PC + 4

Datapath for Branch Operations

beq rs, rt, imm    We need to compare Rs and Rt!

[Figure: the register file drives busA (Rs) and busB (Rt) into the ALU, which produces Zero; Branch, Zero, and imm feed the Next Address Logic, which updates the PC; the PC goes to the Instruction Memory. RegDst, RegWr, ExtOp, ALUSrc, and ALUctr are the control signals.]

Binary Arithmetic for the Next Address

In theory, the PC is a 32-bit byte address into the instruction memory:
- Sequential operation: PC<31:0> = PC<31:0> + 4
- Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm] * 4

The magic number 4 always comes up because:
- The 32-bit PC is a byte address
- And all our instructions are 4 bytes (32 bits) long

In other words:
- The 2 LSBs of the 32-bit PC are always zeros
- There is no reason to have hardware to keep the 2 LSBs

In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm]
- In either case: Instruction Memory Address = PC<31:2> concat "00"
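
The claim that the 30-bit PC gives the same instruction addresses as the 32-bit byte-address PC can be checked numerically. The sketch below is an illustration; the function names are made up here, and the starting address 0x00400010 is an arbitrary example value.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF


def sign_ext(imm16: int) -> int:
    """Copy bit 15 of the immediate into bits 31-16."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def next_pc32(pc: int, imm16: int, taken: bool) -> int:
    """Byte-address form: PC + 4, plus SignExt(imm) * 4 for a taken branch."""
    offset = sign_ext(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & MASK32


def next_pc30(pc30: int, imm16: int, taken: bool) -> int:
    """Word-address form: PC<31:2> + 1, plus SignExt(imm) for a taken branch."""
    offset = sign_ext(imm16) if taken else 0
    return (pc30 + 1 + offset) & MASK30


def instruction_address(pc30: int) -> int:
    """PC<31:2> concat "00"."""
    return pc30 << 2


pc = 0x00400010                  # arbitrary example byte address
imm = 0xFFFD                     # branch offset of -3 instructions
via_32bit = next_pc32(pc, imm, taken=True)
via_30bit = instruction_address(next_pc30(pc >> 2, imm, taken=True))
```

Both forms land on the same address, so the two low-order bits never need to be stored or added.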

Next Address Logic: Expensive and Fast Solution

Using a 30-bit PC:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm]
- In either case: Instruction Memory Address = PC<31:2> concat "00"

[Figure: one adder computes PC + 1; a second adder adds SignExt(imm), taken from Instruction<15:0>, to that sum; a MUX controlled by Branch and Zero selects one of the two results as the next 30-bit PC. The PC drives Addr<31:2> of the Instruction Memory, Addr<1:0> is "00", and the memory outputs Instruction<31:0>.]

Next Address Logic: Cheap and Slow Solution

Why is this slow?
- The address add cannot start until Zero (output of the ALU) is valid

Does it matter that this is slow in the overall scheme of things?
- Probably not here: the critical path is the load operation

[Figure: a single adder with Carry In = 1 adds the PC to the output of a MUX, controlled by Branch and Zero, that selects either 0 or SignExt(imm) from Instruction<15:0>. The PC drives Addr<31:2> of the Instruction Memory, Addr<1:0> is "00", and the memory outputs Instruction<31:0>.]

RTL: The Jump Instruction

j target   (J-type format: op, target address)
- mem[PC]                                         Fetch the instruction from memory
- PC<31:2> <- PC<31:28> concat target<25:0>       Calculate the next instruction's address

Instruction Fetch Unit

j target
- PC<31:2> <- PC<31:28> concat target<25:0>

[Figure: the two-adder next-address logic extended with a MUX controlled by Jump, which selects PC<31:28> concatenated with Target = Instruction<25:0> as the next PC. The PC drives Addr<31:2> of the Instruction Memory, Addr<1:0> is "00", and the memory outputs Instruction<31:0>; Instruction<15:0> feeds SignExt.]

This is the whole design of the Instruction Fetch Unit:
- 3 inputs: Jump, Branch, and Zero
- 1 output: the instruction word
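
The next-address selection of the Instruction Fetch Unit can be sketched as one function of the 30-bit PC, the instruction word, and the three inputs Jump, Branch, and Zero. The function name and argument order are assumptions made for illustration.

```python
MASK30 = 0x3FFFFFFF


def sign_ext(imm16: int) -> int:
    """Copy bit 15 of the immediate into bits 31-16."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def next_address(pc30: int, instr: int, jump: int, branch: int, zero: int) -> int:
    """Return the next PC<31:2> given the current one and the control inputs."""
    if jump:
        upper4 = pc30 >> 26                     # PC<31:28>
        target = instr & 0x3FFFFFF              # Instruction<25:0>
        return (upper4 << 26) | target          # PC<31:28> concat target<25:0>
    offset = sign_ext(instr & 0xFFFF) if (branch and zero) else 0
    return (pc30 + 1 + offset) & MASK30


sequential = next_address(0x10, 0x00221820, jump=0, branch=0, zero=0)
taken = next_address(0x10, 0x10210003, jump=0, branch=1, zero=1)
not_taken = next_address(0x10, 0x10210003, jump=0, branch=1, zero=0)
jumped = next_address(0x04000010, 0x08000123, jump=1, branch=0, zero=0)
```

A branch changes the PC only when both Branch and Zero are set; a jump replaces the low 26 bits of the 30-bit PC and keeps the top 4.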
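
Putting it All Together: A Single Cycle Datapath

We have everything except control signals (underlined).

[Figure: complete single cycle datapath. The Instruction Fetch Unit (inputs Branch, Jump, Zero) outputs Instruction<31:0>, which is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm <0:15>. A MUX controlled by RegDst selects Rd or Rt as Rw; the register file (RegWr, Ra = Rs, Rb = Rt) drives busA and busB; the Extender (ExtOp) extends imm; a MUX controlled by ALUSrc selects busB or the extended immediate; the ALU (ALUctr) produces Zero and a result that addresses the Data Memory (MemWr drives WrEn, Data In comes from busB); a MUX controlled by MemtoReg selects the ALU result or the memory output for busW.]

Where to get more information?

To be continued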
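
To see the whole datapath work end to end, the sketch below executes one instruction per call, covering the seven instructions of the subset. It is a behavioral model under stated assumptions, not the slides' hardware design: the control signals are folded into an if/elif chain, memories are dictionaries keyed by byte address, the function name `step` is made up here, and the opcode and funct values are the standard MIPS encodings.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    """Copy bit 15 of the immediate into bits 31-16."""
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def step(pc: int, regs: list, imem: dict, dmem: dict) -> int:
    """Execute the instruction at pc in one 'cycle'; return the next PC."""
    instr = imem[pc]                                   # mem[PC]
    op, funct = instr >> 26, instr & 0x3F
    rs, rt, rd = (instr >> 21) & 31, (instr >> 16) & 31, (instr >> 11) & 31
    imm = instr & 0xFFFF
    next_pc = (pc + 4) & MASK32
    if op == 0x00 and funct == 0x20:                   # add
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0x00 and funct == 0x22:                 # sub
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0x0D:                                   # ori: zero-extended immediate
        regs[rt] = regs[rs] | imm
    elif op == 0x23:                                   # lw
        regs[rt] = dmem.get((regs[rs] + sign_ext(imm)) & MASK32, 0)
    elif op == 0x2B:                                   # sw
        dmem[(regs[rs] + sign_ext(imm)) & MASK32] = regs[rt]
    elif op == 0x04:                                   # beq
        if regs[rs] == regs[rt]:
            next_pc = (next_pc + sign_ext(imm) * 4) & MASK32
    elif op == 0x02:                                   # j: PC<31:28> concat target
        next_pc = (pc & 0xF0000000) | ((instr & 0x3FFFFFF) << 2)
    else:
        raise ValueError(f"instruction not in the subset: {instr:#010x}")
    return next_pc


# Small program: compute 5 + 7, store it to address 0, and load it back.
imem = {
    0x00: 0x34010005,   # ori $1, $0, 5
    0x04: 0x34020007,   # ori $2, $0, 7
    0x08: 0x00221820,   # add $3, $1, $2
    0x0C: 0xAC030000,   # sw  $3, 0($0)
    0x10: 0x8C040000,   # lw  $4, 0($0)
}
regs, dmem, pc = [0] * 32, {}, 0
for _ in range(len(imem)):
    pc = step(pc, regs, imem, dmem)
```

Each call does fetch, decode, execute, memory access, and write-back together, mirroring the single cycle design in which one clock period covers all of these steps.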