Computing Systems. The Processor: Datapath and Control


Computing Systems
The Processor: Datapath and Control
claudio.talarico@mail.ewu.edu

Introduction

The performance of a machine depends on 3 key factors:
- instruction count (determined by the compiler and the ISA)
- clock cycle time (determined by the hardware implementation)
- clock cycles per instruction, CPI (determined by the hardware implementation)

We implement a basic MIPS, simplified to contain only:
- memory-reference instructions: lw, sw
- arithmetic-logical instructions: add, sub, and, or, slt
- control-flow instructions: beq, j

Overview of the implementation

Generic implementation:
- use the program counter (PC) to supply the instruction address
- get the instruction from memory
- read registers
- use the instruction to decide exactly what to do

The actions required to complete an instruction depend on the instruction class. Even across instruction classes there are some similarities, e.g., all instructions use the ALU after reading the registers. Why? What is the ALU used for by memory-reference, arithmetic, and control-flow instructions?

Functional units to implement the processor

Two types of functional units:
- elements that operate on data values (combinational)
- elements that contain state (sequential)

Combinational units: the current outputs depend only on the current inputs.

Sequential units: the current outputs depend on the current inputs but also on the past inputs. The element remembers its history, i.e., it has the capability of storing the input provided.

State elements: latches and flip-flops

- The output is equal to the stored value (state) inside the element (we don't need to ask for permission to look at the value).
- A change of state (value) is based on the control signal.
- Unclocked: latches. The state update is level triggered, i.e., the state can change whenever the control signal changes.
- Clocked: flip-flops. The state update is edge triggered, i.e., the state can change only on clock edges.

[Figure: clock waveform showing the rising edge, the falling edge, and the clock cycle]

Clocking methodology

- Computer design cannot tolerate unpredictability; the clocking methodology is designed to prevent it.
- It specifies the timing of reads and writes of the state elements.
- The easiest solution is to use a synchronous clocking scheme: the edge-triggered methodology.
- Typical execution: read the contents of some state elements, send the values through some combinational logic, and write the results to state elements.

[Figure: state element 1 -> combinational logic -> state element 2, both state elements driven by the same clock]
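The difference between level-triggered and edge-triggered state updates can be illustrated with a small behavioural sketch. This is only a software model under simplifying assumptions (no propagation delay, no setup or hold times); the class names `DLatch` and `DFlipFlop` are hypothetical and do not come from the slides.

```python
class DLatch:
    """Level triggered: the state follows d for as long as enable is high."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge triggered: the state samples d only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0  # previous clock level, used to detect a rising edge

    def update(self, d, clk):
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


# Sequence of (d, control) samples; d changes while the control signal stays high.
samples = [(1, 1), (0, 1), (1, 0), (0, 1)]

latch, ff = DLatch(), DFlipFlop()
latch_out = [latch.update(d, c) for d, c in samples]
ff_out = [ff.update(d, c) for d, c in samples]
print(latch_out, ff_out)
```

In the second sample the latch follows the change of `d` because its control signal is still high, while the flip-flop holds its value until the next rising edge.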

What functional units do we need?

- We need an ALU.
- We need memory to store instructions and data:
  - the instruction memory takes an address and supplies an instruction
  - the data memory takes an address and supplies data (lw)
  - the data memory takes an address and data and writes the data into memory (sw)
- We need to manage a PC and its update.
- We need a register file with 32 registers:
  - read two operands and write a result back
  - sometimes the operand comes from the instruction
- We need additional support for the immediate class of instructions (sign extension).
- We need additional support for the jump instruction.

Datapath building blocks

[Figure: datapath building blocks]

Register File (read circuits)

- Built using D flip-flops.
- The clock signal is not shown.
- Make sure you understand what the mux in the figure does.

[Figure: register file read ports]

Register File (write circuits)

This is just a diagram to illustrate the principle. In practice, never gate a clock signal!

[Figure: register file write port]
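As a rough behavioural sketch of the register file described above (two read ports, one write port), the following model uses a write-enable input instead of a gated clock. The class and method names are hypothetical, and the hard-wiring of register 0 to zero is the standard MIPS convention rather than something stated on these slides.

```python
class RegisterFile:
    """Behavioural model: 32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Reads are combinational: each read port acts as a 32-to-1 mux
        # selected by the register number.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write, write_reg, write_data):
        # The write happens on the clock edge only when RegWrite is asserted;
        # the enable qualifies the write, the clock itself is never gated.
        # Assumption: register 0 always reads as zero (MIPS convention).
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(reg_write=True, write_reg=8, write_data=42)
rf.clock_edge(reg_write=False, write_reg=8, write_data=7)  # ignored
print(rf.read(8, 0))
```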

Datapath implementation

Use multiplexors to stitch the various functional units together.

[Figure: single-cycle datapath]

The datapath operation

1. Fetching instructions and incrementing the PC

Arithmetic-logical instructions

2. Two registers are read from the register file.
3. The ALU operates on the data from the two registers.
4. The result from the ALU is written into the register file.

Memory-reference instructions: sw

2. Two registers are read from the register file.
3. The ALU adds the value read from one of the registers and the sign-extended lower 16 bits of the instruction (offset).
4. The value read from the second register is written into data memory at the address given by the sum computed by the ALU.

Memory-reference instructions: lw

2. A register is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (offset).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory is written into the register file at the destination register.

Control flow instructions: beq

2. Two registers are read from the register file.
3. The ALU performs a subtract on the values from the register file. In parallel, the branch target address is computed: the value of PC+4 is added to the sign-extended lower 16 bits of the instruction (offset) shifted left by 2.
4. The Zero result from the ALU is used to decide which adder result to store into the PC.
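The beq steps above can be sketched in a few lines. This is an illustrative model only; the function names `sign_extend16`, `branch_target`, and `next_pc` are hypothetical, and values are kept to 32 bits by masking.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the lower 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """PC + 4 plus the sign-extended offset shifted left by 2."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc(pc, offset16, rs_value, rt_value):
    """Select between PC + 4 and the branch target using the ALU Zero output."""
    zero = ((rs_value - rt_value) & MASK32) == 0  # ALU subtract, then test for zero
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


print(hex(branch_target(0x1000, 3)))       # forward branch by 3 instructions
print(hex(branch_target(0x1000, 0xFFFF)))  # offset -1: branch back to the beq itself
print(hex(next_pc(0x1000, 3, 5, 6)))       # operands differ: fall through to PC + 4
```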

Format of the instructions

Arithmetic-logical: add, sub, and, or, slt (R-type)

  Field:  0       rs      rt      rd      shamt   funct
  Bits:   31:26   25:21   20:16   15:11   10:6    5:0

  Example: add $t0,$t1,$t2    # $t1 in rs, $t2 in rt, $t0 in rd

Memory-reference: lw, sw (I-type)

  Field:  35 or 43   rs      rt      offset
  Bits:   31:26      25:21   20:16   15:0

  Example: lw $t0,offset($t1)    # $t1 in rs, $t0 in rt
           sw $t0,offset($t1)    # $t1 in rs, $t0 in rt

Oops! The destination register can be in two possible places. For a load it is in bits 20:16 (rt), while for an R-type instruction it is in bits 15:11 (rd). We have a small bug!

Branch: beq (I-type)

  Field:  4       rs      rt      address
  Bits:   31:26   25:21   20:16   15:0

  Example: beq $t0,$t1,label    # $t0 in rs, $t1 in rt

Jump: j (J-type). We will leave the implementation of j out until the very end.

  Field:  2       address
  Bits:   31:26   25:0

  Example: j address
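The field layouts above translate directly into shifts and masks. The sketch below is illustrative only: `encode_r`, `decode`, and `write_register` are hypothetical helper names, and the register numbers $t0 = 8, $t1 = 9, $t2 = 10 follow the standard MIPS register convention, which these slides do not spell out.

```python
def encode_r(rs, rt, rd, shamt, funct):
    """Pack an R-type instruction (opcode 0) into a 32-bit word."""
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode(word):
    """Split a 32-bit instruction word into every field any format may use."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26
        "rs": (word >> 21) & 0x1F,     # bits 25:21
        "rt": (word >> 16) & 0x1F,     # bits 20:16
        "rd": (word >> 11) & 0x1F,     # bits 15:11
        "shamt": (word >> 6) & 0x1F,   # bits 10:6
        "funct": word & 0x3F,          # bits 5:0
        "offset": word & 0xFFFF,       # bits 15:0 (I-type)
        "address": word & 0x3FFFFFF,   # bits 25:0 (J-type)
    }


def write_register(fields, reg_dst):
    """Mux that picks the destination field: rd for R-type, rt for lw."""
    return fields["rd"] if reg_dst else fields["rt"]


# add $t0,$t1,$t2 with $t0 = 8, $t1 = 9, $t2 = 10 (assumed register numbers)
add_word = encode_r(rs=9, rt=10, rd=8, shamt=0, funct=0b100000)
fields = decode(add_word)
print(hex(add_word), fields)
```

The `write_register` helper shows why the datapath needs one more multiplexor: the same hardware must pick bits 15:11 for R-type instructions and bits 20:16 for loads.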

A small bug fix!

We need to add a mux to select which field of the instruction is used to indicate the register to be written.

The control unit

- Selecting the operations to perform (ALU, read/write of data memory and register file)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction

Example: add $8, $17, $18

  Bits:   000000   10001   10010   01000   00000   100000
  Field:  op       rs      rt      rd      shamt   funct

The ALU's operation is based on the instruction type and the function code.

The control unit

ALU control input (ALU operation lines):

  0000   and
  0001   or
  0010   add
  0110   subtract
  0111   set-on-less-than
  1100   nor

Why is the code for subtract 0110 and not 0011?

The control unit must compute the 4-bit ALU control input:
- given the function code for arithmetic instructions
- given the instruction type:
  ALUop = 00 for lw, sw
  ALUop = 01 for beq
  ALUop = 10 for arithmetic

Describe it using a truth table (which can be turned into gates).

ALUop is an intermediate 2-bit code computed from the opcode field of the instruction. It simplifies the logic needed for computing the 4-bit ALU operation control input.
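The two-level decoding described above can be sketched as a small lookup. The funct value 100000 for add appears in the slides; the funct codes for sub, and, or, and slt are the standard MIPS encodings and are an addition here. The names `FUNCT_TO_ALU` and `alu_control` are hypothetical.

```python
# Standard MIPS funct codes (assumed, except add) mapped to the 4-bit ALU control input.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}


def alu_control(alu_op, funct):
    """Compute the 4-bit ALU control input from the 2-bit ALUop and the funct field."""
    if alu_op == 0b00:   # lw, sw: add base register and offset
        return 0b0010
    if alu_op == 0b01:   # beq: subtract and test the Zero output
        return 0b0110
    if alu_op == 0b10:   # arithmetic: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUop 11 is not used by this instruction subset")


print(format(alu_control(0b10, 0b101010), "04b"))
```

Note that the funct field is ignored for loads, stores, and branches, which is what makes the intermediate ALUop code a simplification.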

[Figure: ALU control block with inputs ALUop and the funct field and output ALU operation]

  Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-format     1       0       0         1         0        0         0       1       0
  lw           0       1       1         1         1        0         0       0       0
  sw           X       1       X         0         0        1         0       0       0
  beq          X       0       X         0         0        0         1       0       1

The control unit

Simple combinational logic (truth tables)
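Because the main control is purely combinational, the table above can be modelled as a lookup keyed by opcode. This is a sketch under stated assumptions: the dictionary name `CONTROL` and the helper `control_signals` are hypothetical, `None` stands for a don't-care (X), and the opcode values 0, 35, 43, and 4 are the ones given in the instruction formats.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# One row per opcode, in the same column order as the table; None means don't-care.
CONTROL = {
    0:  (1,    0, 0,    1, 0, 0, 0, 1, 0),  # R-format
    35: (0,    1, 1,    1, 1, 0, 0, 0, 0),  # lw
    43: (None, 1, None, 0, 0, 1, 0, 0, 0),  # sw
    4:  (None, 0, None, 0, 0, 0, 1, 0, 1),  # beq
}


def control_signals(opcode):
    """Return the control signals for an opcode as a name -> value mapping."""
    return dict(zip(SIGNALS, CONTROL[opcode]))


lw = control_signals(35)
sw = control_signals(43)
print(lw)
```

The don't-cares for sw and beq reflect that neither instruction writes the register file, so the two muxes that feed the write port can take any value.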

Implementing jump

[Figure: single-cycle datapath extended to implement jump]

Single-cycle control structure

- Every instruction begins execution on one clock edge and completes execution on the next clock edge.
- We use a single long clock cycle for every instruction.
- All of the control logic is combinational.
- We wait for everything to settle down and for the right thing to be done:
  - the ALU might not produce the right answer right away
  - we use write signals along with the clock to determine when to write
- The cycle time is determined by the length of the longest path.

Single-cycle implementation

Critical path for different instruction classes:

  Instruction class   Functional units used by the instruction class
  R-type              instr. fetch   reg. access   ALU   reg. access
  Load                instr. fetch   reg. access   ALU   mem. access   reg. access
  Store               instr. fetch   reg. access   ALU   mem. access
  Branch              instr. fetch   reg. access   ALU
  Jump                instr. fetch

Performance of single-cycle machines

Assume the major functional units of a machine have the following delays:
- Memory units: 200 ps
- ALU and adders: 100 ps
- Register file (read or write): 50 ps
- Muxes, control unit, PC accesses, sign extension unit: no delay

Instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15% branches, and 5% jumps.

What is the execution time for an implementation in which:
- every instruction operates in 1 clock cycle of a fixed length?
- every instruction executes in 1 clock cycle using a variable-length clock?

Timing for different instruction classes

  Instr. class   Instr. Mem.   Reg. Read   ALU   Data Mem.   Reg. Write   Total
  R-type         200           50          100   0           50           400 ps
  Load           200           50          100   200         50           600 ps
  Store          200           50          100   200                      550 ps
  Branch         200           50          100                            350 ps
  Jump           200                                                      200 ps

Single-cycle machine: performance

The clock cycle for a machine with a single clock for all instructions is determined by the longest instruction:

  CPU clock cycle (single clock) = 600 ps

The average clock cycle for a machine with a variable clock is:

  CPU clock cycle (variable clock) = 600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5% = 447.5 ps

  CPU execution time = IC x CPI x clock cycle time

Both machines have the same instruction count and CPI = 1, so the performance ratio reduces to the ratio of the clock cycle times:

  CPU performance (variable clock) / CPU performance (single clock)
    = CPU execution time (single clock) / CPU execution time (variable clock)
    = 600 / 447.5
    = 1.34
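The arithmetic above can be reproduced with a short script. This is only a sketch of the calculation; the variable names are hypothetical, and the delays and instruction mix are the ones assumed in the slides.

```python
# Functional-unit delays in picoseconds.
MEM, ALU, REG = 200, 100, 50

# Units on the critical path of each instruction class.
PATHS = {
    "R-type": [MEM, REG, ALU, REG],
    "Load":   [MEM, REG, ALU, MEM, REG],
    "Store":  [MEM, REG, ALU, MEM],
    "Branch": [MEM, REG, ALU],
    "Jump":   [MEM],
}

# Instruction mix in percent.
MIX = {"Load": 25, "Store": 10, "R-type": 45, "Branch": 15, "Jump": 5}

totals = {name: sum(path) for name, path in PATHS.items()}

single_clock = max(totals.values())                              # worst-case instruction
variable_clock = sum(totals[n] * MIX[n] for n in MIX) / 100      # weighted average
speedup = single_clock / variable_clock

print(totals)
print(single_clock, variable_clock, round(speedup, 2))
```

The weights must sum to 100%; with a 5% share for jumps the weighted average comes out to the 447.5 ps quoted above.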

Single-cycle problems

- The clock cycle is equal to the worst-case delay over all instructions, i.e., we violate the key design principle of making the common case fast.
- If we implement more complicated instructions (e.g., floating-point arithmetic), the performance penalty is unbearable.
- Some functional units must be duplicated (wasteful of area).

One possible solution is a multi-cycle datapath:
- use a smaller cycle time
- have different instructions take different numbers of cycles

Multi-cycle approach

- We will be reusing functional units:
  - the ALU is used to compute addresses and to increment the PC
  - the memory is used for instructions and data
- Break up the instructions into steps; each step takes a cycle:
  - balance the amount of work to be done
  - restrict each cycle to use only one major functional unit
- At the end of a cycle, store values for use in later cycles (the easiest thing to do): introduce additional internal registers.
- Our control signals will not be determined directly by the instruction, e.g., what should the ALU do for a subtract instruction?
- We'll use a finite state machine for control.

Multi-cycle datapath with control lines (without jump)

[Figure: multi-cycle datapath with control lines. Labels: PCLoad, IR, MDR; a single memory is used both for instructions and data]

Five execution steps

1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution, Memory Address Computation, or Branch Completion
4. Memory Access or R-type Instruction Completion
5. Write-back

Instructions take from 3 to 5 cycles!

Step 1: instruction fetch

- Use the PC to get the instruction and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- This can be described succinctly using RTL ("Register-Transfer Language"):

    IR <= Memory[PC];
    PC <= PC + 4;

- Can we figure out the values of the control signals?
- What is the advantage of updating the PC now?

Step 2: instruction decode and register fetch

- Read registers rs and rt in case we need them.
- Compute the branch address in case the instruction is a branch.
- RTL:

    A <= Reg[IR[25:21]];
    B <= Reg[IR[20:16]];
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

- We aren't setting any control lines based on the instruction type (the instruction is still being decoded in the control unit).

Step 3: instruction dependent

The ALU performs one of three functions, based on the instruction type:

- Memory reference (address computation):

    ALUOut <= A + sign-extend(IR[15:0]);

- R-type (execution of the operation):

    ALUOut <= A op B;

- Branch completion (write PC):

    if (A==B) PC <= ALUOut;

A jump also completes in this step (write PC), without using the ALU:

    PC <= {PC[31:28], IR[25:0], 2'b00};

Step 4: R-type or memory-access

- Memory reference (loads and stores access memory):

    load:                   MDR <= Memory[ALUOut];   (read access)
    store completion step:  Memory[ALUOut] <= B;     (write access)

- R-type instruction completion step (write destination register):

    Reg[IR[15:11]] <= ALUOut;

The write actually takes place at the end of the cycle, on the clock edge.
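The concatenation used for jump completion in step 3 can be sketched with masks and shifts. The function name `jump_target` is hypothetical; the PC value passed in is the already-incremented PC, since step 1 has updated it by the time step 3 executes.

```python
def jump_target(pc, instruction):
    """Model of PC <= {PC[31:28], IR[25:0], 2'b00}."""
    upper = pc & 0xF0000000                  # PC[31:28] stays in place
    lower = (instruction & 0x03FFFFFF) << 2  # IR[25:0] followed by two zero bits
    return upper | lower


# Hypothetical jump word: opcode bits 000010 in 31:26 and 0x100 in the address field.
word = (0b000010 << 26) | 0x100
print(hex(jump_target(0x40000004, word)))
```

The opcode bits are masked out, so only the 26-bit address field contributes; the two appended zeros reflect that instructions are word aligned.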

Step 5: write-back

- Memory reference, load completion step:

    Reg[IR[20:16]] <= MDR;

Summary

[Figure: summary table of the execution steps for each instruction class]

Simple questions

How many cycles will it take to execute this code?

    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label    #assume not
    add $t5, $t2, $t3
    sw  $t5, 8($t3)
    Label: ...

- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?

Complete multi-cycle machine

[Figure: complete multi-cycle datapath and control]
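One way to work through the questions above is to lay the instructions out cycle by cycle. The sketch below assumes the per-class cycle counts implied by the five execution steps (load 5, store 4, R-type 4, branch 3) and reads the truncated "#assume not" comment as "the branch is not taken"; both are assumptions, and the names `CYCLES` and `schedule` are hypothetical.

```python
# Number of steps (cycles) each instruction needs on the multi-cycle datapath.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# Dynamic instruction sequence, assuming the beq falls through.
program = ["lw", "lw", "beq", "add", "sw"]

schedule = []  # one entry per cycle: (cycle number, instruction index, mnemonic, step)
cycle = 0
for index, op in enumerate(program):
    for step in range(1, CYCLES[op] + 1):
        cycle += 1
        schedule.append((cycle, index, op, step))

total_cycles = cycle
eighth_cycle = schedule[7]
add_execute = next(c for c, i, op, step in schedule if op == "add" and step == 3)

print(total_cycles)   # total cycles for the sequence
print(eighth_cycle)   # what is active in cycle 8
print(add_execute)    # cycle in which the add performs its ALU operation
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is step 3 (address computation) of the second lw, and the addition of $t2 and $t3 happens in cycle 16.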

Review: Finite State Machine (FSM)

Finite state machines consist of:
- a set of states
- a next-state function (determined by the current state and the input)
- an output function (determined by the current state and possibly the input)

We'll use a Moore machine (output based only on the current state).

[Figure: Mealy machine and Moore machine block diagrams: inputs, next-state function, current state, output function, outputs; the Moore machine has an output register]

Implementing the control unit

[Figure: finite state machine for the multi-cycle control unit]

Note on reading the state diagram:
- a signal is don't-care if it is not mentioned
- a signal is asserted if only its name is given
- otherwise the exact value is given

How many state bits will we need?

Implementing the control unit

- The value of the control signals depends on:
  - what instruction is being executed
  - which step is being performed
- In each clock cycle, decide all the actions that need to be taken.
- The control unit is the most complex part of the design.
- It can be hard-wired, ROM based, or microprogrammed.
- Simpler instructions lead to simpler control.
- Sometimes a sequence of simple instructions is more effective than a single complex instruction.
- Complex instructions may have to be maintained for compatibility reasons.

Historical perspective

Historical context of CISC:
- Too much logic to put on a single chip
- Use a ROM (or even RAM) to hold the microcode
- It's easy to add new instructions

Microprogramming is appropriate if there are hundreds of opcodes, modes, cycles, etc. The signals are specified symbolically using microinstructions.

Microprogramming

[Figure: microprogrammed control unit; the opcode input is IR[31:26]]

Microprogramming detail

Dispatch ROM 1

  Op       Opcode name   Value
  000000   R-format      0110
  000010   jmp           1001
  000100   beq           1000
  100011   lw            0010
  101011   sw            0010

Dispatch ROM 2

  Op       Opcode name   Value
  100011   lw            0011
  101011   sw            0101

[Figure: address select logic. A PLA or ROM drives AddrCtl; a mux with inputs 3, 2, 1, 0 selects among the state incremented by the adder, dispatch ROM 2, dispatch ROM 1, and 0; the dispatch ROMs are indexed by the instruction register opcode field]

  State number   Address-control action      Value of AddrCtl
  0              Use incremented state       3
  1              Use dispatch ROM 1          1
  2              Use dispatch ROM 2          2
  3              Use incremented state       3
  4              Replace state number by 0   0
  5              Replace state number by 0   0
  6              Use incremented state       3
  7              Replace state number by 0   0
  8              Replace state number by 0   0
  9              Replace state number by 0   0
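The address select logic can be exercised with a small model built directly from the three tables above. This is a sketch, not the actual hardware description; the names `DISPATCH_ROM_1`, `DISPATCH_ROM_2`, `ADDR_CTL`, `next_state`, and `trace` are hypothetical.

```python
# Dispatch ROMs: opcode -> next state number.
DISPATCH_ROM_1 = {0b000000: 6, 0b000010: 9, 0b000100: 8, 0b100011: 2, 0b101011: 2}
DISPATCH_ROM_2 = {0b100011: 3, 0b101011: 5}

# AddrCtl value produced in each state.
ADDR_CTL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 0}


def next_state(state, opcode):
    """Mux controlled by AddrCtl: 0 -> state 0, 1 -> ROM 1, 2 -> ROM 2, 3 -> state + 1."""
    addr_ctl = ADDR_CTL[state]
    if addr_ctl == 0:
        return 0
    if addr_ctl == 1:
        return DISPATCH_ROM_1[opcode]
    if addr_ctl == 2:
        return DISPATCH_ROM_2[opcode]
    return state + 1


def trace(opcode):
    """List the states visited by one instruction, starting at state 0 (fetch)."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)


for name, opcode in [("lw", 0b100011), ("sw", 0b101011), ("R-format", 0b000000),
                     ("beq", 0b000100), ("jmp", 0b000010)]:
    print(name, trace(opcode))
```

The length of each trace matches the cycle counts of the multi-cycle machine: five states for lw, four for sw and R-format instructions, and three for beq and jmp. With ten states in total, the state register needs four bits.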