COSC 243. Control Unit and Microprogramming. Lecture 12. COSC 243 (Computer Architecture)


Operation of a CPU
Each instruction has a bunch of things that need to happen:
- Get the instruction from memory: fetch cycle
- Load any operands from memory: indirect cycle
- Execute the instruction: execute cycle
- Check interrupts: interrupt cycle
The control unit is responsible for coordinating this

The Fetch Cycle
The fetch cycle consists of a number of smaller steps:
1. Copy PC into the Memory Address Register (MAR)
2. MAR output enable (put address on bus)
3. Memory read enable (put data on bus)
4. Read data off bus into the Memory Buffer Register (MBR)
5. Copy the value in MBR to IR
6. Add 1 to PC

The Fetch Cycle
This can be written as a sequence of micro-operations:
t1: MAR ← (PC)
t2: MBR ← Memory ; read
t3: IR ← (MBR)
t4: PC ← (PC) + 1

Register contents after each step:
       Initial   t1        t2        t3        t4
MAR              00101100  00101100  00101100  00101100
MBR                        11011100  11011100  11011100
PC     00101100  00101100  00101100  00101100  00101101
IR                                   11011100  11011100
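
The four micro-operations can be traced in software. The following is a minimal sketch, assuming 8-bit registers and a memory modelled as a dictionary; the names fetch_cycle, show, regs and memory are illustrative, and the register values are the ones from the slide.

```python
def fetch_cycle(regs, memory):
    """Run the four fetch micro-operations; return a snapshot after each step."""
    snapshots = [("initial", dict(regs))]
    # t1: MAR <- (PC)
    regs["MAR"] = regs["PC"]
    snapshots.append(("t1", dict(regs)))
    # t2: MBR <- Memory (read the word addressed by MAR)
    regs["MBR"] = memory[regs["MAR"]]
    snapshots.append(("t2", dict(regs)))
    # t3: IR <- (MBR)
    regs["IR"] = regs["MBR"]
    snapshots.append(("t3", dict(regs)))
    # t4: PC <- (PC) + 1, wrapped to 8 bits (the width is an assumption)
    regs["PC"] = (regs["PC"] + 1) & 0xFF
    snapshots.append(("t4", dict(regs)))
    return snapshots


def show(value):
    """Format a register as 8 binary digits, or dashes if it holds nothing yet."""
    return "--------" if value is None else format(value, "08b")


regs = {"MAR": None, "MBR": None, "PC": 0b00101100, "IR": None}
memory = {0b00101100: 0b11011100}  # one instruction word at the PC address
trace = fetch_cycle(regs, memory)
for step, snap in trace:
    fields = " ".join(f"{name}={show(snap[name])}" for name in ("MAR", "MBR", "PC", "IR"))
    print(step.ljust(8), fields)
```

Each printed row corresponds to one column of the register table on the slide.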

Other cycles
Each cycle is made up of more basic elements:
- Transferring data in/out of registers
- Turning on/off control lines (for ALU, memory, etc.)
- Decisions based on status flags

General control unit structure
Inputs:
- Instruction register
- Status flags
- Clock signal
Outputs:
- Control lines within CPU
- Control lines to bus (memory, I/O)
State:
- Current micro-operation

General control unit structure
[Figure: block diagram with labels Status, Instruction, Control Logic, Control lines, State]

Control logic
We need to calculate:
- Control signals
- Next state
Based on:
- Current state
- Status flags
- Instruction
Two approaches:
- Hardwired: a combinatorial circuit
- Microprogramming

Hardwired Approach
Implemented with gates and flip-flops:
- State is the instruction register, plus a counter
- Counter determines which micro-operation we are on
- Sequential design methodology is used (state transition diagrams etc.)
Difficulties:
- Complicated state diagrams: many control lines, many states
- Long gate delays
- Hard to add instructions
- A typical CPU has between 50 and several hundred instructions
- A single mistake is difficult to fix

Microprogramming
Store the sequences of micro-operations as a kind of program
The control unit is a kind of mini-CPU:
- Executes micro-operations in sequence
- Can jump to other micro-operations based on status flags
- But the only things it does are turn control wires on and off and determine the next micro-op
The micro-program is stored in a ROM (or EEPROM etc.)
- Can be updated on boot in some CPUs

Microprogramming
[Figure: block diagram with labels Instruction Register, Status, Clock, Sequence logic, Next address, Firmware ROM, Decode, Next address control, Control lines]

Sequence logic
Each micro-operation in firmware has a unique address (just like a normal program in main memory)
Sequence logic just determines the address of the next micro-operation
Basic sequence logic, on each clock pulse:
- Check any branch conditions. If a condition is met, branch to the given address
- Check if this instruction is done. If it is, load the start address for the next instruction
- Otherwise, increment the address
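
The three checks made on each clock pulse can be sketched as a small function. This is an illustration under stated assumptions, not a real control store: the FIRMWARE contents, the field names (signals, branch_if, branch_to, done) and the control-signal names are all hypothetical, and the start address of the next instruction is passed in rather than decoded from the instruction register.

```python
# Hypothetical firmware ROM: one dictionary per micro-operation address.
FIRMWARE = [
    {"signals": ["PC_to_MAR"]},                                       # address 0
    {"signals": ["mem_read"]},                                        # address 1
    {"signals": ["MBR_to_IR"], "branch_if": "zero", "branch_to": 5},  # address 2
    {"signals": ["alu_add"]},                                         # address 3
    {"signals": ["write_back"], "done": True},                        # address 4
    {"signals": ["skip"], "done": True},                              # address 5
]


def next_address(current, micro, flags, next_instruction_start):
    """Sequence logic: choose the address of the next micro-operation."""
    # 1. Branch condition met: go to the given address.
    condition = micro.get("branch_if")
    if condition is not None and flags.get(condition, False):
        return micro["branch_to"]
    # 2. Instruction finished: load the start address of the next instruction.
    if micro.get("done", False):
        return next_instruction_start
    # 3. Otherwise step to the following micro-operation.
    return current + 1


def trace_instruction(flags, start=0):
    """List the micro-operation addresses visited for one instruction."""
    address, visited = start, []
    while True:
        micro = FIRMWARE[address]
        visited.append(address)
        if micro.get("done", False):
            return visited
        address = next_address(address, micro, flags, next_instruction_start=0)


print(trace_instruction({"zero": False}))
print(trace_instruction({"zero": True}))
```

With the zero flag clear the sequence runs straight through to address 4; with it set, the micro-operation at address 2 branches to address 5.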

Microinstructions
Microinstructions need to be decoded
Two main strategies:
- Horizontal microinstructions: each control wire's output is stored as a bit in the instruction. No decoding necessary. Very wide instructions (> 40 bits common)
- Vertical microinstructions: control wires are encoded (e.g. binary encoded number of register, rather than one bit per register). Shorter instructions, but decoding necessary
Many variations are possible. Horizontal is most common
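
The difference between the two strategies can be shown with a toy example. This sketch assumes a made-up machine with four registers and one memory-read line; the bit layouts and the names REGISTERS, HORIZONTAL_LINES, decode_horizontal and decode_vertical are assumptions for illustration only.

```python
REGISTERS = ["R0", "R1", "R2", "R3"]

# Horizontal: one bit per control line, so 4 register-load lines + mem_read = 5 bits.
HORIZONTAL_LINES = [f"{r}_load" for r in REGISTERS] + ["mem_read"]


def decode_horizontal(word):
    """Each bit drives one control line directly; nothing is decoded."""
    return {line for bit, line in enumerate(HORIZONTAL_LINES) if (word >> bit) & 1}


def decode_vertical(word):
    """Assumed layout: bits 0-1 hold the register number, bit 2 is mem_read (3 bits)."""
    active = {f"{REGISTERS[word & 0b11]}_load"}  # decode the binary register number
    if (word >> 2) & 1:
        active.add("mem_read")
    return active


horizontal_word = 0b10100  # bit 2 = R2_load, bit 4 = mem_read
vertical_word = 0b110      # register number 2, mem_read = 1

print(sorted(decode_horizontal(horizontal_word)))
print(sorted(decode_vertical(vertical_word)))
```

Both words activate the same two control lines, but the vertical form needs 3 bits instead of 5. The cost is the decoding step, and in this particular layout the vertical word cannot load two registers at once.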

Instruction Pipelining
Recall:
- Fetch Cycle
- Indirect Cycle
- Execute Cycle
- Interrupt Cycle
What is the fetch hardware doing during the execute cycle? Can we put it to good use?

Instruction Pre-fetching
In a 2-stage pipeline the fetch unit pre-fetches the next instruction while the execute unit performs the current instruction
If it's the wrong instruction (due to a branch) it discards it and loads the correct instruction
[Figure: two-stage pipeline with labels New Address, Instruction Fetch, Instruction Execute, Result, Discard]

Speedup
If the fetch and execution times were the same then the speedup would be 100% (double the throughput)
However, instruction times are usually dominated by execution time
- So in practice it doesn't double in a 2-stage pipeline
Branch instructions can slow it down
- Because it has pre-fetched from the wrong addresses
Also note, extra hardware is needed to buffer the reads
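
A rough calculation makes the argument concrete. This is a simplified model, not a measurement: it assumes a steady stream of instructions, that the pipelined time per instruction is set by the slower stage, and that each discarded pre-fetch costs one extra fetch time. The function name and parameters are illustrative.

```python
def two_stage_speedup(fetch_time, execute_time, discard_fraction=0.0):
    """Throughput ratio of a 2-stage pipeline to unpipelined execution.

    discard_fraction is the share of instructions whose pre-fetch is thrown
    away because of a branch (an assumed, simplified penalty model).
    """
    sequential = fetch_time + execute_time        # fetch, then execute
    pipelined = max(fetch_time, execute_time)     # stages overlap; slower one dominates
    pipelined += discard_fraction * fetch_time    # re-fetch after a wrong pre-fetch
    return sequential / pipelined


print(two_stage_speedup(1, 1))                        # equal stage times
print(two_stage_speedup(1, 3))                        # execution dominates
print(two_stage_speedup(1, 1, discard_fraction=0.2))  # some pre-fetches discarded
```

Equal stage times give a ratio of 2 (double the throughput). When execution takes three times as long as the fetch, the ratio drops to 4/3, and discarded pre-fetches lower it further.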

Branches
In a pipelined system the next instruction is pre-fetched
If the current instruction is an if (a conditional branch), there are two outcomes:
- Continue with the next instruction
- Go somewhere else
In the first case the pre-fetch is correct
- The pipeline continues
In the second case it is incorrect
- The result must be discarded and the pipeline restarted
How can we use this to our advantage when programming in a high level language?

6-Stage Pipeline
The more stages an instruction is broken into, the more instructions can be pipelined.
In a 6-stage pipeline we see:
1. Fetch Instruction
2. Decode Instruction
3. Calculate Operands
4. Fetch Operands
5. Execute Instruction
6. Write Operand
It's assumed that all instructions go through all stages, and that there are no bus collisions

6-Stage Pipeline
             Clock cycle
Instruction  1          2          3          4          5          6          7          8
1            Fetch      Decode     Calc Ops   Fetch Ops  Execute    Write
2                       Fetch      Decode     Calc Ops   Fetch Ops  Execute    Write
3                                  Fetch      Decode     Calc Ops   Fetch Ops  Execute    Write
4                                             Fetch      Decode     Calc Ops   Fetch Ops  Execute
Calc Ops = Calculate Operands, Fetch Ops = Fetch Operands
(Instruction 4's Write falls in cycle 9, beyond the cycles shown.)
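
The timing diagram follows a simple rule: instruction i enters stage s one cycle after instruction i-1 did. Below is a minimal sketch of that rule, assuming one new instruction per clock cycle and no stalls; the function names and the shortened stage labels are illustrative.

```python
STAGES = ["Fetch", "Decode", "Calc Ops", "Fetch Ops", "Execute", "Write"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each instruction (numbered from 1) to {clock cycle: stage name}."""
    return {
        instr: {instr + offset: name for offset, name in enumerate(stages)}
        for instr in range(1, num_instructions + 1)
    }


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish every instruction in an ideal pipeline."""
    return num_stages + num_instructions - 1


schedule = pipeline_schedule(4)
last_cycle = total_cycles(4)
for instr, cycles in schedule.items():
    row = [cycles.get(c, "").ljust(10) for c in range(1, last_cycle + 1)]
    print(str(instr).ljust(4) + "".join(row).rstrip())
print("pipelined cycles:", last_cycle, "unpipelined cycles:", 4 * len(STAGES))
```

Four instructions finish in 9 cycles rather than the 24 they would take one after another, under the same assumptions as the slide (every instruction uses every stage, no bus collisions).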

Problems with Pipelines
Data must be moved from stage to stage in the pipeline
- This takes time, so much time that the overall time of each instruction can be increased
The extra hardware is complex
- So complex that it can become more complex than the operations themselves

Conditional Branches
What happens with a conditional branch instruction?
What if we need the results of the previous instruction in the next instruction?

Pipeline Hazards
Resource hazard / structural hazard
- Two or more instructions need the same resource, such as reading from memory, the ALU, etc.
Data hazard
- Read after write (RAW): one instruction writes to a memory location, then another reads from it. But in the pipeline the read can happen before the write has completed, so the old value is read
- Write after read (WAR): one instruction reads from a memory location, then another writes to it. But if the write finishes before the read occurs, they are out of order
- Write after write (WAW): two instructions write to a memory location one after the other. But they happen out of order in the pipeline
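
The three data hazards can be spotted by comparing what two instructions read and write. The sketch below only flags the dependencies that could become hazards if the pipeline reorders or overlaps the accesses; the instruction representation (sets of location names) and the function name are assumptions for illustration.

```python
def data_hazards(first, second):
    """Classify potential data hazards between two instructions.

    first comes before second in program order. Each is a dict with
    "reads" and "writes" sets naming the locations it touches.
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")  # second must see the value first writes
    if first["reads"] & second["writes"]:
        hazards.add("WAR")  # second must not overwrite before first reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")  # the writes must land in program order
    return hazards


add = {"reads": {"B", "C"}, "writes": {"A"}}    # A = B + C
sub = {"reads": {"A", "D"}, "writes": {"E"}}    # E = A - D
store = {"reads": {"F"}, "writes": {"B"}}       # B = F
clear = {"reads": set(), "writes": {"A"}}       # A = 0

print(data_hazards(add, sub))    # sub reads A, which add writes
print(data_hazards(add, store))  # store writes B, which add reads
print(data_hazards(add, clear))  # both write A
```

Instructions that touch disjoint locations give an empty set and can overlap freely in the pipeline.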

Control Hazards
Control hazard
- Incorrect prediction of the outcome of a conditional branch
Solutions to the control hazard:
- Multiple streams: fill a second pipeline with the alternative destination. Expensive, and hard to avoid contention
- Pre-fetch branch target: preload just the branch target instruction
- Loop buffer: small cache in the fetch unit that stores the last n instructions
- Branch prediction: never / always / opcode / history
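
Three of the branch prediction strategies can be compared on a small example. This is a sketch under assumptions: the history strategy here simply predicts whatever the branch did last time (starting from "not taken"), which is one simple form of history-based prediction, and the outcome sequence is an invented loop branch. Prediction by opcode is left out because it depends on the instruction set.

```python
def correct_predictions(strategy, outcomes):
    """Count correct predictions for a sequence of branch outcomes.

    outcomes is a list of booleans, True meaning the branch was taken.
    """
    last = False  # assumed initial history: not taken
    correct = 0
    for taken in outcomes:
        if strategy == "never":
            prediction = False
        elif strategy == "always":
            prediction = True
        elif strategy == "history":
            prediction = last  # repeat the previous outcome
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if prediction == taken:
            correct += 1
        last = taken
    return correct


# A loop branch that is taken 9 times and then falls through once.
loop_branch = [True] * 9 + [False]
for strategy in ("never", "always", "history"):
    print(strategy, correct_predictions(strategy, loop_branch), "of", len(loop_branch))
```

For this loop, always predicting taken is right 9 times out of 10, the simple history scheme 8 times (it misses the first and last iterations), and never predicting taken only once.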

Summary
Each instruction has 4 cycles
- Fetch / Indirect / Execute / Interrupt
These can be broken down into smaller steps
- Micro-operations
And micro-programmed
- In microcode
They can also be run separately and joined together
- In a pipeline