Computer Organization Datapath Design No. 11-1
Motivation Computer Design as an application of digital logic design procedure Computer = Processing Unit + Memory System + I/O Units Processing Unit = Control + Datapath Control = Finite State Machine Inputs = Machine Instruction, Datapath Conditions Outputs = Register Transfer Control Signals Instruction Interpretation = Instruction Fetch, Decode, Execute Datapath = Functional Units + Registers Functional Units = LU, Multipliers, Dividers, etc. Registers = Program Counter, Shifters, Storage Registers No. 11-2
Chapter Overview Design of Datapaths and Processor Control Units Datapath interconnection strategies: Point-to-Point, Single Bus, Multiple Busses No. 11-3
Structure of a Computer Von Neumann rchitecture Basic Computer Organization Processing Unit =LU + Registers CPU = Control Unit + Processing Unit No. 11-4
Structure of a Computer Block Diagram View Central Processing Unit (CPU) Processor ddress Read/Write Data Memory System Control Control Signals Data Inputs Datapath Instruction Unit Instruction fetch and interpretation FSM Execution Unit Functional Units and Registers No. 11-5
Structure of a Computer Typical Instruction Formats No. 11-6
Structure of a Computer Core organization of a typical single-address computer No. 11-7
Structure of a Computer Example of Instruction Sequencing Instruction: dd Rx to Ry and place result in Rz Step 1: Fetch the dd instruction from Memory to Instruction Register (IR) Step 2: Decode Instruction Instruction in IR is an DD Source operands are Rx, Ry Destination operand is Rz Step 3: Execute Instruction Move Rx and Ry to LU Set up LU to perform DD function DD Rx to Ry Move LU result to Rz No. 11-8
Structure of a Computer Instruction Types Data Manipulation dd, Subtract, etc. Data Staging Load/Store data to/from memory Register-to-register move Control Conditional/unconditional branches subroutine call and return No. 11-9
Structure of a Computer Control Elements of the Control Unit (Instruction Unit): Standard FSM: State Register Next State Logic Output Logic (datapath control signaling) Plus dditional "Control" Registers: Instruction Register (IR) Program Counter (PC) No. 11-10
Structure of a Computer Control 0 Reset Control State Diagram Reset Fetch Instruction Decode Execute Instructions partitioned into three classes: Branch Load/Store Register-to-Register Branch Branch Taken XEQ Instr. Branch Not Taken Load/ Store Fetch Instr. XEQ Instr. Incr. PC Initialize Machine Registerto-Register Different Sequence for Each Instruction Type XEQ Instr. Housekeeping No. 11-11
Structure of a Computer Datapath LU Block Diagram 32 32 B LU Operation 32 Cout S No. 11-12
Structure of a Computer Block Diagram/Register Transfer View Single ccumulator Machine C := C <op> Mem "single address instructions" C implicit operand Control Flow Data Flow FSM Opcode IR C LU S Store Path B Load Path Memory ddress MR PC Memory N bits wide M words Instruction Path rrowhead Lines represent dataflow others are control flow Memory ddress Register Hold address during memory accesses No. 11-13
Structure of a Computer Block Diagram/Register Transfer View Placement of Data and Instructions in Memory: Data and instructions mixed in memory: Princeton rchitecture Data and instructions in separate memory: Harvard rchitecture Princeton architecture simpler to implement Harvard architecture has certain performance advantages: overlap instruction fetch with operand fetch We assume the more common Princeton architecture throughout No. 11-14
Structure of a Computer Core Organization of a Typical Single-ddress computer No. 11-15
Structure of a Computer Block Diagram/Register Transfer View Trace an instruction: C := C + Mem<address> 1. Instruction Fetch: Move PC to MR Initiate a memory read sequence Move data from memory to IR MR PC; MD M<MR>; IR MD; 2. Instruction Decode: Op code bits of IR are input to control FSM Rest of IR bits encode the operand address No. 11-16
Structure of a Computer Block Diagram/Register Transfer View Trace an instruction: C := C + Mem<address> 3. Operand Fetch: Move operand address from IR to MR Initiate a memory read sequence MR IR<addr>; MD M<MR>; 4. Instruction Execute: Data available on load path Move data to LU input Configure LU to perform DD operation Move S result to C C S; 5. Housekeeping: Increment PC to point at next instruction PC PC+1; No. 11-17
Structure of a Computer Block Diagram/Register Transfer View Control: Transfer data from one register to another ssert appropriate control signals Register transfer notation Register to Register moves Ifetch: MR PC; MD M<MR>; IR MD; -- move PC to MR -- assert Memory RED signal -- load IR from Memory Instruction Decode: IF IR<op code> = DD_FROM_MEMORY THEN Instruction Execution: MR IR<addr>; MD M<MR>; -- move operand addr to MR -- assert Memory RED signal ssert Control Signal Route MD to LU B; Route C to LU ; S = DD(,B); C S; -- gate MD to LU B -- gate C to LU -- instruct LU to perform DD -- gate LU result to C PC PC+1; -- increment PC No. 11-18
Structure of a Computer R R Core Organization of a Typical Single-ddress Computer No. 11-19
Structure of a Computer Memory Interface More Realistic Block Diagram: Issue memory request Is it a read or a write? Memory asks CPU to wait PC IR Request Read/Write Wait LD/ST Data Instructions M R M D R Memory Decouple memory system from internal processor operation Memory Buffer Register No. 11-20
Structure of a Computer Memory Interface No common clock between CPU and memory Follow asynchronous 4-cycle handshake request/wait (ack) protocol 1. Request sserted Request 2. Wait Unasserted Read/Write 3. Request Unasserted Data From Memory To Memory 4. Wait sserted Wait Read Cycle Write Cycle Memory cannot make request unless Wait signal is asserted Hi-to-Low transition on Wait implies that data is ready (read) or data has been latched by memory (write) No. 11-21
Structure of a Computer I/O Interface Memory-Mapped I/O I/O devices share the memory address space Control registers manipulated just like memory word Read/write signal is used to initiate I/O operation dv. No special I/O instructions I/O and memory operations use the same bus Less expensive and easy to implement Disadv.: Slow Isolated I/O (Independent I/O) ddress space for memory and I/O are separate Special I/O instructions (e.g. IN, OUT) are used to manipulate I/O dv.: Fast No. 11-22
Polling Program periodically checks whether I/O has completed its operation Interrupts I/O device signals CPU when operation is complete Software must take over to handle the data transfers from the device Check for interrupt pending before fetching next instruction Save PC & vector to special memory location for next instruction Instruction set includes a "return from interrupt" instruction No. 11-23
Busing Strategies Register-to-Register Coummunications Point-to-point Single shared bus Multiple special purpose busses Tradeoffs between datapath/control complexity and amount of parallelism supported by the hardware Case study: Four general purpose registers that must be able to exchange their contents Swap instruction must be supported: SWP(Ri, Rj) Ri Rj Rj; Ri; No. 11-24
Bussing Schemes Point-to-Point Connection Scheme S0<1:0> MUX S1<1:0> MUX S2<1:0> MUX S3<1:0> MUX LD0 R0 LD1 R1 LD2 R2 LD3 R3 Four registers interconnected via 4:1 Mux's and point-to-point connections Edge-triggered N bit registers controlled by LDi signals N x 4:1 MUXes per register, controlled by Si<1:0> signals No. 11-25
Bussing Schemes Point-to-point Connections Example: Register transfers R0 R1 and R3 R2 Register transfer operations: S0<1:0> 01; S3<1:0> 10; LD0 1; LD3 1; Enable path from R1 to R0 Enable path from R2 to R3 ssert load for R0 ssert load for R3 No. 11-26
Bussing Strategies Single Bus Interconnection S0<1:0> MUX S1<1:0> MUX S2<1:0> MUX S3<1:0> MUX LD0 R0 LD1 R1 LD2 R2 LD3 R3 S<1:0> MUX Single Bus LD0 R0 LD1 R1 LD2 R2 LD3 R3 per register MUX block replaced by single block 25% hardware cost of previous alternative shared set of pathways is called a BUS Single bus becomes a critical resource -- used by only one transfer at a time No. 11-27
Bussing Strategies Single Bus Interconnection SWP Operation special TEMP register must be introduced ("Register 4") MUX's become 5:1 rather than 4:1 State X: State Y: State Z: (R1 R4) 001 S<2:0>; 1 LD4; (R2 R1) 010 S<2:0>; 1 LD1; (R4 R2) 100 S<2:0> 1 LD2; Three states are required rather than one! plus extra register and wider mux More More control states states because this this datapath supports less less parallel activity Engineering choices made based on how frequently multiple transfers take place at the same time No. 11-28
Bussing Strategies Multiple Busses Real datapaths are a compromise between the two extremes Register Transfer Diagram Single Bus Design Memory ddress Bus M R P C I R C BUS B M B R Memory Data Bus Register transfer operations: PC BUS IR BUS C BUS MBR BUS LU Result BUS BUS PC BUS IR BUS C BUS MBR BUS LU B BUS MR C LU ("hardwired") No. 11-29
Bussing Strategies Multiple Busses Memory ddress Bus M R P C I R C BUS B M B R Memory Data Bus Example Register Transfer for Single Bus Design Instruction Interpretation for "DD Mem[X]" Fetch Operand Cycle 1: IR<operand address> BUS; BUS MR; Cycle 2: Memory Read; Databus MBR; Perform DD Cycle 3: Write Result Cycle 4: MBR BUS; BUS LU B; C LU ; DD; LU Result BUS; BUS C; Requires latch for LU Result No. 11-30
Bussing Strategies Multiple Busses Three Bus Design -- Supports more parallelism ddress Bus Result Bus Memory ddress Bus M R P C I R C B M B R Memory Data Bus Memory Bus Single bus replaced by three busses: Memory Bus (MBUS) Result Bus (RBUS) ddress Bus (BUS) No. 11-31
Bussing Strategies Multiple Busses Memory ddress Bus M R ddress Bus P C I R C Result Bus B M B R Memory Data Bus Memory Bus Instruction Interpretation for "DD Mem[X]" Fetch Operand Cycle 1: IR<operand address> BUS; BUS MR; Cycle 2: Memory Read; Databus MBR; Perform DD Cycle 3: MBR MBUS; MBUS LU B; C LU ; DD; Implemented in three cycles rather than four Write Result LU Result RBUS; RBUS C; dvantage of separate BUS: overlap PC MR with instruction execution No. 11-32
Design of CPUs 1. Develop the state diagram (SM chart) and associated register transfer operations for the CPU assuming point-to-point connections 2. Identify the register interconnections that are not used at the same time in any control state. These can be replaced by a BUS-type interconnections. 3. Revise the state diagram to reflect the the register transfer operations supported by the modified datapath 4. Determine how to implement the register transfer operations by detailed control signal sequence. Revise the state diagram to assert these signals in the desired sequences No. 11-33
Finite State Machines for Simple CPUs State Diagram and Datapath Derivation Processor Specification: Instruction Format: 15 14 13 0 ddress Op Code 00 = LD 01 = ST 10 = DD 11 = BRN Load from memory: Mem[XXX] C; Store to memory: C Mem[XXX]; dd from memory: C + Mem[XXX] C; Branch if accumulator is negative: C < 0 XXX PC; Memory Interface: M R M B R 14 Request Read/Write Wait 16 Memory 14 [0:2-1] <15:0> No. 11-34
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath First pass state diagram: Reset Instruction Fetch Operation Decode LD ST DD BRN Operation Execution No. 11-35
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath ssume Synchronous Mealy Machine: Transitions associated with arcs rather than states Reset Reset State State (State (State 0) 0) and and Instruction Fetch Fetch Sequence On Reset: zero the PC Mem Request unasserted Mem asserts Wait signal Instruction Fetch: issue read request 4 cycle handshake on Wait signal Note: No explicit mention of the busses being used to implement register transfers! Reset/ Wait/ Wait/ 1 Read/Write, 1 Request, MR Memory Wait/ RES IF0 IF1 IF2 Reset/ PC MR, PC + 1 PC Reset/0 PC Wait/ MR Memory, 1 Read/Write, 1 Request Wait/Mem MBR Wait/MBR IR No. 11-36
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath Operation Decode State OD IR<15:14>=00 01 10 11 LD0 ST0 D0 BR0 Four Way Next State Branch based on opcode bits No. 11-37
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath Execution Sequences Load Sequence like IFetch, except that operand address comes from IR and data should be loaded into C Wait/ OD LD0 IR<15:14>=00/ IR<13:0> MR Wait/ 1 Read/Write, 1 Request, MR Memory LD1 Wait/ MR Memory, 1 Read/Write, 1 Request Wait/Mem MBR Wait/ LD2 Wait/MBR C RES No. 11-38
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath Store Execution Sequence Memory write sequence OD Wait/ Wait/ 0 Read/Write, 1 Request, MR Memory, MBR Memory ST0 ST1 IR<15:14>=01/ IR<13:0> MR, C MBR Wait/ MR Memory, MBR Memory, 0 Read/Write, 1 Request Wait/ ST2 Wait/ RES Wait/ No. 11-39
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath dd Execution Sequence Similar to Load sequence dd MBR, C rather than simply transfer MBR to C OD IR<15:14>=10/ IR<13:0> MR Wait/ Wait/ 1 Read/Write, 1 Request, MR Memory D0 D1 Wait/ MR Memory, 1 Read/Write, 1 Request Wait/Mem MBR Wait/ D2 RES Wait/ MBR + C C No. 11-40
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath Branch Execution Sequence OD IR<15:14> = 11/ BR0 C<15> = 1/ IR<13:0> PC C<15> = 0/ Replace PC with Operand ddress if C < 0 RES Otherwise, do nothing No. 11-41
Simplify Wait Looping Eliminate some Wait states t this point, Wait must be asserted, so why loop on Wait? Why loop on Wait when resync will take place at state IF0? No. 11-42
Finite State Machines for Simple CPUs Deriving the State Diagram and Datapath State Machines Inputs and Outputs so far: Inputs: Reset Wait IR<15:14> C<15> Outputs: 0 PC PC + 1 PC PC MR MR Memory ddress Bus Memory Data Bus MBR MBR Memory Data Bus MBR IR MBR C C MBR C + MBR C IR<13:0> MR IR<13:0> PC 1 Read/Write 0 Read/Write 1 Request No. 11-43
Finite State Machines for Simple CPUs Processor Signal Flow Memory Reset Wait Mem ddr Bus Mem Data Bus C O N T R O L Read/Write Request 0 PC PC + 1 PC PC MR MR Memory ddress Bus Memory Data Bus MBR MBR Memory Data Bus MBR IR MBR C C MBR C + MBR C IR<13:0> MR IR<13:0> PC IR<15:14> C<15> D T P T H No. 11-44
Finite State Machines for Simple CPUs Mapping onto Datapath Control Specification so far is independent of bussing strategy Implied transfers: Operand Fetch IFetch Branch Store dd Memory ddress Bus M R P C I R C dd B M B R Memory Data Bus dd Load IFetch This is the point-to-point connection scheme No. 11-45
Finite State Machines for Simple CPUs Mapping onto Datapath Operations Observe that instruction fetch and operand fetch take place at different times This implies that IR, PC, and MR transfers can be implemented by single bus (ddress Bus) Combine MBR, IR, LU B, and C connections (Memory Bus) Combine LU, C, and MBR connections (Result Bus) Three bus architecture: C + MBR C implemented in single state No. 11-46
Finite State Machines for Simple CPUs Mapping onto Datapath Operations ddress Bus Result Bus Memory ddress Bus M R P C I R C B M B R Memory Data Bus Memory Bus C has two inputs, RBUS and MBUS (Other registers except MBR have single input and output) Dual ported configuration is more complex Better idea: reuse existing paths were possible MBR C transfer implemented by PSS B LU operation No. 11-47
Finite State Machines for Simple CPUs Mapping onto Datapath Operations Detailed implementation of register transfer operations More detailed control operations are called microoperations One register transfer operation = several microoperations Some operations directly implemented by functional units: e.g., DD, Pass B, 0 PC, PC + 1 PC Some operations require multiple control operations: e.g., PC MR implemented as PC BUS and BUS MR No. 11-48
Finite State Machines for Simple CPUs Mapping onto Datapath Operations Relationship between register transfer and microoperations: Register Transfer Microoperations 0 PC 0 PC (delayed); PC + 1 PC PC + 1 PC (delayed); PC MR PC BUS (immediate), BUS MR (delayed); MR ddress Bus MR ddress Bus (immediate); Data Bus MBR Data Bus MBR (delayed); MBR Data Bus MBR Data Bus (immediate); MBR IR MBR BUS (immediate), BUS IR (delayed); MBR C MBR MBUS (immediate), MBUS LU B (immediate), LU PSS B (immediate), LU Result RBUS (immediate), RBUS C (delayed); No. 11-49
Finite State Machines for Simple CPUs Mapping onto Datapath Operations Relationship between register transfer and microoperations: Register Transfer Microoperations C MBR C + MBR C IR<13:0> MR IR<13:0> PC C RBUS (immediate), RBUS MBR (delayed); C LU (immediate), MBR MBUS (immediate), MBUS LU B (immediate), LU DD (immediate), LU Result RBUS (immediate), RBUS C (delayed); IR BUS (immediate), BUS IR (delayed); IR BUS (immediate), BUS PC (delayed); 1 Read/Write Read (immediate); 0 Read/Write Write (immediate); 1 Request Request (immediate); Special microoperations for C LU and LU Result RBUS not strictly necessary since these connections can be hardwired No. 11-50
Finite State Machines for Simple CPUs Mapping onto Datapath Operations Revised microoperation signal flow Memory Reset Wait Mem ddr Bus Mem Data Bus Read/Write Request C O N T R O L 0 PC PC + 1 PC PC BUS IR BUS BUS MR BUS PC MR Memory ddress Bus Memory Data Bus MBR MBR Memory Data Bus MBR MBUS D T P T H 5 inputs make sure that Reset and Wait are synchronized 16 datapath control lines 2 memory control lines MBUS IR MBUS LUB RBUS C RBUS MBR LU DD LU PSS B IR<15:14> C<15> No. 11-51
Controller Implementation Chapter Summary Basic organization of the Von Neumann computer Separation of processor and memory Datapath connectivity Control Unit Organization Register transfer operation No. 11-52