Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Outline EEL-473 Computer Architecture Designing a Single Cycle path Introduction The steps of designing a processor path and timing for register-register operations path for logical operations with immediates path for load and store operations path for branch and jump operations Big Picture The five classic components of a computer Processor Control path Input Output Today s topic: design of a single cycle processor The Big Picture: The Performance Perspective Performance of a machine is determined by: CPI count Clock cycle time Clock cycles per instruction Inst Count - CPI will discuss later Processor design determines: Clock cycle time Clock cycles per instruction Cycle Time Single cycle processor: - Advantage: One clock cycle per instruction - Disadvantage: long cycle time

How to Design a Processor: step-by-step Analyze instruction set => datapath requirements The meaning of each instruction is given by the register transfers The datapath must include storage element for ISA registers - And possibly more The datapath must support each register transfer 2 Select set of datapath components and establish clocking methodology 3 Assemble datapath meeting the requirements 4 Analyze implementation of each instruction to determine setting of control points that effects the register transfer Assemble the control logic MIPS ISA: instruction formats All MIPS instructions are bits long There are 3 instruction formats: 6 R-type op rs rt rd shamt funct bits bits 6 bits I-type bits J-type op target address 6 bits bits The different fields are: op: operation of the instruction rs, rt, rd: the source(s) and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of the jump instruction *Step a: The MIPS lite subset for today Logical Register Transfers ADD and SUB addu rd, rs, rt subu rd, rs, rt OR Immediate: ori rt, rs, imm LOAD and STORE Word lw rt, rs, imm sw rt, rs, imm BRANCH: beq rs, rt, imm op rs rt rd shamt funct bits bits 6 bits bits bits bits 6 RTL gives the meaning of the instructions All start by fetching the instruction op rs rt rd shamt funct = MEM[ ] op rs rt Imm = MEM[ ] inst Register Transfers ADDU R[rd] < R[rs] + R[rt]; < + 4 SUBU R[rd] < R[rs] R[rt]; < + 4 ORi R[rt] < R[rs] + zero_ext(imm); < + 4 LOAD R[rt] < MEM[ R[rs] + sign_ext(imm)]; < + 4 STORE MEM[ R[rs] + sign_ext(imm) ] < R[rt]; < + 4 BEQ if ( R[rs] == R[rt] ) then < + sign_ext(imm)] else < + 4

Step : Requirements of the Set instruction & data ( x ) read RS read RT Write RT or RD Step 2: Components of the path Combinational Elements Storage Elements Clocking methodology Extender Add and Sub register or extended immediate Add 4 or extended immediate to Combinational Logic Elements (Basic Building Blocks) MUX A B Select A B OP A B MUX CarryIn Sum Carry Y Result Storage Element: Register (Basic Building Block) Register Similar to the D Flip Flop except - N-bit input and output - Write Enable input Write Enable: - negated () (not asserted): Out will not change - asserted (): Out will become In Write Enable In N Out N

Storage Element: Register File Register File consists of registers: Two -bit output busses: Write Enable and One -bit input bus: Register is selected by: Ra (number) selects the register to put on (data) Rb (number) selects the register to put on (data) Rw (number) selects the register to be written via (data) when Write Enable is Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block (ie, reads are not clocked): -bit - RA or RB valid => or valid after access time Storage Element: Idealized (idealized) One input bus: In One output bus: Out word is selected by: selects the word to put on Out Write Enable = : address selects the memory word to be written via the In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block (ie, reads are not clocked): Write Enable In Out - valid => Out valid after access time Clocking Methodology Step 3 Setup Hold Don t Care Setup Hold Register Transfer Requirements > path Assembly Fetch Read Operands and Execute Operation All storage elements are clocked by the same clock edge Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew

3a: Overview of the Fetch Unit The common RTL operations Fetch the : mem[] Update the program counter: - Sequential Code: <- + 4 - Branch and Jump: <- something else RTL: The ADD 6 op rs rt rd shamt funct bits bits 6 bits add rd, rs, rt op rs rt rd shamt funct <- mem[] Next Logic R[rd] <- R[rs] + R[rt] The actual operation <- + 4 Calculate the next instruction s address Word RTL: The Subtract 6 op rs rt rd shamt funct bits bits 6 bits sub rd, rs, rt op rs rt rd shamt funct <- mem[] R[rd] <- R[rs] - R[rt] The actual operation <- + 4 Calculate the next instruction s address 3b: Add & Subtract R[rd] <- R[rs] op R[rt] Example: addu rd, rs, rt Ra, Rb, and Rw come from instruction s rs, rt, and rd fields ctr and RegWr: control logic after decoding the instruction 6 op rs rt rd shamt funct bits bits 6 bits ctrl RegWr -bit ctr Result

,,, Op, Func ctr Register-Register Timing -to-q New Value Old Value RegWr Access Time New Value -bit Delay through Control Logic New Value RegWr Old Value New Value, B Old Value Old Value Old Value Old Value Register File Access Time New Value ctr Delay New Value Result Register Write Occurs Here RTL: The OR Immediate ori rt, rs, imm op rs rt Imm <- mem[] R[rt] <- R[rs] OR ZeroExt(imm) bits The OR operation <- + 4 Calculate the next instruction s address *3c: Logical Operations with Immediate R[rt] <- R[rs] op ZeroExt[imm] ] RegDst RegWr imm -bit ZeroExt rd? bits Src ctr Result RTL: The Load lw rt, rs, imm op rs rt Imm <- mem[] Addr <- R[rs] + SignExt(imm) R[rt] <- Mem[Addr] Calculate the memory address Load the data into the register <- + 4 Calculate the next instruction s address bits bits bits immediate bits immediate bits

3d: Load Operations R[rt] <- Mem[R[rs] + SignExt[imm]] Example: lw rt, rs, imm 3e: Store Operations Mem[ R[rs] + SignExt[imm] <- R[rt] ] Example: sw rt, rs, imm rd bits bits RegDst RegWr imm -bit Extender Src ctr In MemWr WrEn Adr W_Src RegDst RegWr imm -bit Extender ctr In MemWr WrEn Adr W_Src ExtOp ExtOp Src 3f: The Branch beq rs, rt, imm op rs rt Imm <- mem[] Equal <- R[rs] == R[rt] Calculate the branch condition if (COND eq ) Calculate the next instruction s address - <- + 4 + ( SignExt(imm) x 4 ) else - <- + 4 bits path for Branch Operations beq rs, rt, imm path generates condition (equal) bits Branch Next Logic Word Inst RegWr -bit Equal Equal?

Putting it All Together: A Single Cycle path <:> RegDst RegWr Branch Next Logic imm -bit <:2> Extender <:2> Equal <:> <:> Imm ctr = <:> MemWr WrEn Adr In MemtoReg An Abstract View of the Critical Path Register file and ideal memory: Next The CLK input is a factor ONLY during write operation During read operation, behave as combinational logic: - valid => Output valid after access time Ideal -bit Imm A B Critical Path (Load Operation) = s -to-q + s Access Time + Register File s Access Time + to Perform a -bit Add + Access Time + Setup Time for Register File Write + Clock Skew In Ideal ExtOp Src Binary arithmetic for the next address In theory, the is a -bit byte address into the instruction memory: Sequential operation: <:> = <:> + 4 Branch operation: <:> = <:> + 4 + SignExt[Imm] * 4 Next Logic: Expensive and Fast Solution Using a 3-bit : Sequential operation: <:2> = <:2> + Branch operation: <:2> = <:2> + + SignExt[Imm] In either case: = <:2> concat The magic number 4 always comes up because: The -bit is a byte address And all our instructions are 4 bytes ( bits) long In other words: The 2 LSBs of the -bit are always zeros There is no reason to have hardware to keep the 2 LSBs In practice, we can simplify the hardware by using a 3-bit <:2>: Sequential operation: <:2> = <:2> + Branch operation: <:2> = <:2> + + SignExt[Imm] In either case: = <:2> concat 3 imm <:> SignExt 3 3 3 3 3 Branch Zero Addr<:2> Addr<:> <:>

Next Logic: Cheap and Slow Solution Why is this slow? Cannot start the address add until Zero (output of ) is valid Does it matter that this is slow in the overall scheme of things? Probably not here Critical path is the load operation imm <:> 3 SignExt 3 3 3 Carry In 3 Addr<:2> Addr<:> <:> RTL: The Jump op target address 6 bits bits j target mem[] <:2> <- <:28> concat target<2:> Calculate the next instruction s address Branch Zero Fetch Unit j target <:2> <- <:28> concat target<2:> Putting it All Together: A Single Cycle path We have everything except control signals (underline) 3 imm <:> <:28> Target <2:> SignExt 3 3 4 3 3 3 3 Branch Zero Jump Addr<:2> Addr<:> <:> RegDst RegWr imm -bit Extender Branch ExtOp Jump Src Fetch Unit ctr In Zero <:> <:2> WrEn <:2> MemWr Adr <:> <:> Imm MemtoReg

An Abstract View of the Implementation Next Ideal Logical vs Physical Structure -bit A B Control Control Signals path Conditions In Ideal Out Summary steps to design a processor Analyze instruction set => datapath requirements 2 Select set of datapath components & establish clock methodology 3 Assemble datapath meeting the requirements 4 Analyze implementation of each instruction to determine setting of control points that effects the register transfer Assemble the control logic MIPS makes it easier s same size Source registers always in same place Immediates same size, location Operations always on registers/immediates Single cycle datapath => CPI=, CCT => long Next time: implementing control (Steps 4 and )