Processor Basic steps to process an instruction Execute Write Back Instruction Fetch ory ccess Instruction Decode Operand Fetch EE524/CptS561 Jose G. Delgado-Frias 1 path +4 N Inst.. IR B L U. imm Instruction Fetch Inst. Dec. Op. Fetch Execute ory ccess Write Back IR [ N + 4 [IR 6..10 B [IR 11..15 Imm ((IR 16 ) 16 ## IR 11..15 EE524/CptS561 Jose G. Delgado-Frias 2 1
path (arith/logic inst) +4 N Inst.. IR B L U. imm Instruction Fetch Inst. Dec. Op. Fetch Execute ory ccess Write Back IR [ N + 4 [IR 6..10 B [IR 11..15 Imm ((IR 16 ) 16 ## IR 11..15 EE524/CptS561 Jose G. Delgado-Frias 3 path (load Inst) +4 N Inst.. IR B L U. imm Instruction Fetch Inst. Dec. Op. Fetch Execute ory ccess Write Back IR [ N + 4 [IR 6..10 B [IR 11..15 Imm ((IR 16 ) 16 ## IR 11..15 EE524/CptS561 Jose G. Delgado-Frias 4 2
path (store Inst) +4 N Inst.. IR B L U. imm Instruction Fetch Inst. Dec. Op. Fetch Execute ory ccess Write Back IR [ N + 4 [IR 6..10 B [IR 11..15 Imm ((IR 16 ) 16 ## IR 11..15 EE524/CptS561 Jose G. Delgado-Frias 5 path (branch Inst) +4 N Inst.. IR B L U. imm Instruction Fetch Inst. Dec. Op. Fetch Execute ory ccess Write Back IR [ N + 4 [IR 6..10 B [IR 11..15 Imm ((IR 16 ) 16 ## IR 11..15 EE524/CptS561 Jose G. Delgado-Frias 6 3
path w/ pipeline +4 Inst.. L U. EE524/CptS561 Jose G. Delgado-Frias 7 Instruction Fetch () stage 0 /ID 4 dd +4 m u x +4 +4 Instruction memory [ [ IR DD R7,R2,R5 clock EE524/CptS561 Jose G. Delgado-Frias 8 4
Instruction Decode (ID) stage 1 /ID ID/ IR IR 6..10 IR 11..15 rs1 rs2 isters [R2 [R5 IR 16..31 R7 Sign extend EE524/CptS561 Jose G. Delgado-Frias 9 Execution () stage 2 ID/ / [R2 [R5 SUM R7 EE524/CptS561 Jose G. Delgado-Frias 10 5
ory () stage 3 / / SUM ory R7 SUM EE524/CptS561 Jose G. Delgado-Frias 11 Write Back () stage 4 / SUM R7 SUM EE524/CptS561 Jose G. Delgado-Frias 12 6
Instruction Decode (ID) stage 4 /ID ID/ IR IR 6..10 IR 11..15 R7 SUM rs1 rs2 isters IR 16..31 Sign extend EE524/CptS561 Jose G. Delgado-Frias 13 Instruction Fetch () stage 100 /ID 4 dd +4 m u x +4 +4 Instruction memory [ [ IR BNZ R5,-32 clock EE524/CptS561 Jose G. Delgado-Frias 14 7
Instruction Decode (ID) stage 101 /ID +4 ID/ IR IR 6..10 IR 11..15 rs1 rs2 isters [R5 [RX IR 16..31 RY -32-32 Sign extend EE524/CptS561 Jose G. Delgado-Frias 15 Execution () stage 102 ID/ T / +4 [R5 [RX -28-32 RY EE524/CptS561 Jose G. Delgado-Frias 16 8
ory () stage 103 T / / -28 ory RY EE524/CptS561 Jose G. Delgado-Frias 17 Instruction Fetch () stage -28 /ID T 103 4 dd +16 m u x +4-28 Instruction memory [ IR EE524/CptS561 Jose G. Delgado-Frias 18 9
path w/ pipeline +4 Inst.. L U. EE524/CptS561 Jose G. Delgado-Frias 19 Pipeline 1 INSTRUCTIONS 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 CLOCK CYCLE EE524/CptS561 Jose G. Delgado-Frias 20 10
Pipeline Clock cycle 1 2 3 4 5 6 7 8 EE524/CptS561 Jose G. Delgado-Frias 21 Pipeline Hazards Structural Hazards two or more instructions use same hardware at the same time. Hazards dependencies Result from inst. j is needed by inst. k Control Hazards Branch changes flow, what happen with the following instruction(s) EE524/CptS561 Jose G. Delgado-Frias 22 11
Resources EE524/CptS561 Jose G. Delgado-Frias 23 Hazards R1 R2+R3 R5 R1+R3 R8 R1-R6 EE524/CptS561 Jose G. Delgado-Frias 24 12
Forwarding R1 R2+R3 R5 R1+R3 R8 R1-R6 EE524/CptS561 Jose G. Delgado-Frias 25 path w/ pipeline +4 Inst.. L U. Forwarding unit EE524/CptS561 Jose G. Delgado-Frias 26 13
Example DD XOR SUB R1,R2,R3 R7,R8,R1 R4,R3,R1 XOR SUB DD R7,R8,R1 R4,R3,R1 R1,R2,R3 SUB DD R4,R3,R1 R1,R2,R3 DD R1,R2,R3 +4 Inst.. L U. Forwarding unit EE524/CptS561 Jose G. Delgado-Frias 27 Example XOR R7,R8,R1 SUB R4,R3,R1 DD R1.. +4 Inst.. L U. Forwarding unit EE524/CptS561 Jose G. Delgado-Frias 28 14
Hazard Classification RW (Read fter Write) w/ forward only load presents a problem WW WR RR j: R1 k: RY R1 j: R1 k: R1 j: R1 k: R1 j: R1 k: R1 EE524/CptS561 Jose G. Delgado-Frias 29 Forwarding (load) R1 LD[ R5 R1+R3 R8 R1-R6 EE524/CptS561 Jose G. Delgado-Frias 30 15
hazard (load) R1 LW R1,0(R1) ID SUB R4,R1,R5 ID stall ND R6,R1,R7 OR R8,R1,R9 stall stall ID ID EE524/CptS561 Jose G. Delgado-Frias 31 Branch LBEL_: BR R1, LBEL_ DD R2,R3,R7 ND R5,R7,R11 : : LD R4,R2,005 EE524/CptS561 Jose G. Delgado-Frias 32 16
Branch BR R1, LBEL_ DD R2,R3,R7 ND R5,R7,R11 LD R4,R2,005 EE524/CptS561 Jose G. Delgado-Frias 33 path w/ pipeline +4 Inst.. L U. Forwarding unit EE524/CptS561 Jose G. Delgado-Frias 34 17
What to do w/ branch Reduce the number of cycles to decide on a branch. Delayed branch (Software Solutions) NO-OP move instructions from before from target from fall through EE524/CptS561 Jose G. Delgado-Frias 35 Branch BR R1, LBEL_ DD R2,R3,R7 LD R4,R2,005 EE524/CptS561 Jose G. Delgado-Frias 36 18
NO-OP Branch NO-OP EE524/CptS561 Jose G. Delgado-Frias 37 From Before Branch EE524/CptS561 Jose G. Delgado-Frias 38 19
From Target Branch EE524/CptS561 Jose G. Delgado-Frias 39 From Fall Through Branch EE524/CptS561 Jose G. Delgado-Frias 40 20
Example loop: LW R1,0(R2) R1 [R2+0 DDI SW DDI SUB R1,R1,#1 0(R2),R1 R2,R2,#4 R4,R3,R2 R1 R1+1 [R2+0 R1 R2 R2+4 R4 R3-R2 BNEZ R4,loop R4 = 0 GOTO loop Initial value: R3 = R2+396 EE524/CptS561 Jose G. Delgado-Frias 41 Dynamic Branch Prediction Hardware approach Based on past history 2-bit counter EE524/CptS561 Jose G. Delgado-Frias 42 21
2-bit prediction scheme Taken Predict taken Not Taken Taken Predict taken Taken Not Taken Predict not taken Not Taken Taken Predict not taken Not Taken EE524/CptS561 Jose G. Delgado-Frias 43 Branch Target Buffer (BTB) Current Predicted Branch Prediction Taken / Untaken EE524/CptS561 Jose G. Delgado-Frias 44 22