# CSE320 Final Exam Practice Questions


## Transcription

Single Cycle Datapath / Multi Cycle Datapath

Adding instructions

Modify the datapath and control signals to perform the new instructions in the corresponding datapath. Use the minimal amount of additional hardware and clock cycles/control states.

Remember:

- When adding new instructions, don't break the operation of the standard ones.
- Avoid adding ALUs, adders, register files, or memories to the datapath.
- You can add MUXes, logic gates, etc., but keep additions minimal (they cost area, cycle time, etc.).

a. Load Word Register (uses R instruction format): `lwr Rt, Rd(Rs)  # Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]`

b. Add 3 operands (new instruction format: opcode(6), rs(5), rt(5), rd(5), rx(5), 6 bits not used): `add3 Rd, Rs, Rt, Rx  # Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]`

c. Add to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11)): `addm Rd, Rt, Offset(Rs)  # Reg[Rd] = Reg[Rt] + Mem[sign-extended offset + Reg[Rs]]`

d. Branch on Less Than or Equal to Zero (uses I instruction format): `blez Rs, label  # if Reg[Rs] <= 0, PC = PC + 4 + (sign-extended offset << 2)`

e. Branch Equal to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11)): `beqm Rd, Rt, Offset(Rs)  # if Reg[Rt] = Mem[Offset + Reg[Rs]], PC = PC Reg[Rd]`

f. Branch Equal to 0 to Immediate (uses R instruction format): `beqzi (Rs), Label  # if Mem[Reg[Rs]] = 0, then PC = PC + (sign-extended offset)` (NOTE: this is not PC+4, and not shifted by 2)

g. Store Word and Increment: `swinc Rt, offset(Rs)  # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] + 4`

h. Store Word and Decrement: `swdec Rt, offset(Rs)  # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] - 4`

What if you were to add (g) and (h) simultaneously to the datapaths?

Datapath Timing

1. Calculate the delay in the modified datapaths when performing the instructions above. Assume the following delays:

- Memory: 200 ps
- Register file access (read/write): 50 ps
- ALU and adders: 100 ps
- Logic gates and multiplexors: 1 ps
- All other times are negligible

2. Calculate the minimal clock cycle time if all of the new instructions were added, in the single-cycle and multi-cycle cases.
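As a starting point for the timing questions above, the sketch below adds up component delays along a path. The delays come from the problem statement; the four paths are an assumption (the usual textbook single-cycle MIPS paths for the unmodified datapath), and the 1 ps MUX and gate delays are ignored for simplicity. The names `DELAY_PS`, `PATHS`, `path_delay`, and `single_cycle_clock` are illustrative only.

```python
# Component delays from the problem statement, in picoseconds.
DELAY_PS = {"mem": 200, "reg": 50, "alu": 100}

# Assumed critical paths through the unmodified single-cycle datapath.
# Each entry lists the major units an instruction passes through in order.
# MUX and gate delays (1 ps each) are left out of this sketch.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],         # fetch, read, ALU, write back
    "lw":     ["mem", "reg", "alu", "mem", "reg"],  # fetch, read, address, load, write back
    "sw":     ["mem", "reg", "alu", "mem"],         # fetch, read, address, store
    "beq":    ["mem", "reg", "alu"],                # fetch, read, compare
}


def path_delay(path):
    """Total delay in ps of one path through the datapath."""
    return sum(DELAY_PS[unit] for unit in path)


def single_cycle_clock(paths):
    """A single-cycle clock must be long enough for the slowest instruction."""
    return max(path_delay(p) for p in paths.values())


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(f"{name}: {path_delay(path)} ps")
    print("single-cycle clock:", single_cycle_clock(PATHS), "ps")
```

To time one of the new instructions, add its sequence of units to `PATHS` once the modified datapath has been drawn; the path for each new instruction depends on the design chosen, so none is assumed here.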

Other Datapath Questions

Given MIPS code, can you determine:

- What is happening at clock cycle X in the single-cycle datapath? Or in which cycle is operation X happening?
- What is happening at clock cycle X in the multi-cycle datapath? Or in which cycle is operation X happening?
- How many cycles will it take to execute the code?
- Can you identify the signals (control and values) in the datapath for a given clock cycle?
- And other questions of this nature.

Short Answer Misc Questions

1. What is the primary advantage of fixed-size opcodes?

Instruction decode is faster and more efficient. Control does not need to determine the length or position of the opcode in the instruction.

2. Will a speedup of 20 on 50% of a program result in an overall speedup of at least 2 times? Explain your answer.

No. The overall speedup is calculated according to Amdahl's Law. For an overall speedup of 2, the new execution time must be 50% or less of the old execution time. The new overall execution time = 50% + 50%/20 = 52.5% of the old, which is an overall speedup of about 1.9.

3. What are the 5 components of a modern computer system? (Hint: two of them can be combined and called the processor.)

Datapath, control, memory, input, and output. Datapath + control = processor.

4. What is a stored program computer?

A computer in which the instructions of the program are stored in memory; the CPU is assigned the task of fetching the instructions from memory, decoding them, and executing them.

5. True or False:

- Program execution time increases when the instruction count (IC) increases. TRUE
- In a load/store architecture, the only instructions that access memory are load and store types. TRUE
- More powerful instructions lead to higher performance since the total number of instructions executed is smaller for a given task with more powerful instructions. FALSE
- An add operation has 3 operands (2 inputs and 1 output), therefore add instructions must be 3-address instructions. FALSE

6. In a system executing jobs, when is throughput = 1/latency?

Throughput of a machine is the number of instructions executed per second. Latency is the length of time taken to execute one instruction. Throughput = 1/latency when a system is executing one task at a time, e.g. in a single-cycle or multi-cycle datapath.

Pipelining increases throughput since multiple instructions are executing simultaneously, so throughput is higher than 1/latency: on average, the time between instruction completions is shorter than the latency of a single instruction.

7. What are the advantages and disadvantages of write-through and write-back cache modifications in shared memory systems?

Write-through slows the system down, since each write takes more time. With a write-back cache, however, there may be data contention, since other references could access the data while the cached copy is dirty.

Pipelining

1. What are the main benefits and disadvantages of pipelining?
2. Name the types of pipelining hazards. Define how and when they can occur in systems (in general). Define how/when they occur in MIPS. Give a MIPS datapath or code example of each type.
3. Many processors have 5- or 6-stage pipelines. A typical value for the CPI (cycles per instruction) in such processors is in the range of 1.0 to 1.5. Does this mean that the latency of execution of most instructions is 1 or 2 clock cycles? Why, or why not?
4. Why do conditional branches impact the performance of a pipelined implementation?
5. Briefly describe 2 solutions to reduce the performance impact of conditional branch instructions in a pipelined implementation.
6. Given sequences of instructions, determine the forwarding paths and required stalls.

Performance

1. The computer spends 82% of the time computing and 18% waiting for the disk. The instruction mix and the average cycles per instruction (CPI) for each type are:

| Type | Instruction % | CPI |
|---|---|---|
| int | 40% | 1 |
| FP | 30% | 5 |
| Other | 30% | 2 |

a. Consider 3 modifications to the computer. Compute the speedup for each.

i. The processor is replaced with a new one that reduces the total computation time by 35%.

ii. The disk is replaced with a solid state device that reduces the disk waiting time by 85%.

iii. The processor is replaced with a new one that has improved floating point performance. The average floating point CPI is reduced to 3; all other aspects are unchanged.

Solution (i): Speedup = 1 / ((1 - 0.82) + 0.82 * 0.65) = 1.40

Solution (ii): Speedup = 1 / ((1 - 0.18) + 0.18 * 0.15) = 1.18
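The Amdahl's Law arithmetic used in short-answer question 2 and in modifications (i) and (ii) above can be checked with a small sketch. The function names `amdahl_speedup` and `factor_from_reduction` are illustrative, not part of the exam; a "reduction of 35%" is interpreted here as the affected time shrinking to 65% of its old value.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the original run time
    is made `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def factor_from_reduction(reduction):
    """Convert 'time reduced by r' into a speedup factor 1 / (1 - r)."""
    return 1.0 / (1.0 - reduction)


# Short-answer question 2: a 20x speedup on 50% of the program.
q2 = amdahl_speedup(0.50, 20)

# Modification (i): computation (82% of the time) reduced by 35%.
mod_i = amdahl_speedup(0.82, factor_from_reduction(0.35))

# Modification (ii): disk waiting (18% of the time) reduced by 85%.
mod_ii = amdahl_speedup(0.18, factor_from_reduction(0.85))

if __name__ == "__main__":
    print(f"Q2: {q2:.2f}  (i): {mod_i:.2f}  (ii): {mod_ii:.2f}")
```

Letting `factor` grow without bound gives the upper limit 1 / (1 - fraction), which is the "infinitely fast" case used later in part (c).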

Solution (iii):

Average CPI (old) = 0.40 * 1 + 0.30 * 5 + 0.30 * 2 = 2.5

Average CPI (enhanced) = 0.40 * 1 + 0.30 * 3 + 0.30 * 2 = 1.9

Speedup (computation) = 2.5 / 1.9 ≈ 1.315

Speedup = 1 / ((1 - 0.82) + 0.82 / 1.315) ≈ 1.24

b. Which modification gave the best speedup?

Modification (i) provides the best speedup.

c. For the two modifications in part (a) that did not result in the best speedup, is it possible for them to achieve the speedup of the best modification? Show your work and explain your answer.

i. An infinitely fast disk: Speedup = 1 / (1 - 0.18) = 1.22, which is still slower than (i).

ii. If FP takes only 1 clock cycle:

Average CPI (enhanced) = 0.40 * 1 + 0.30 * 1 + 0.30 * 2 = 1.3

Speedup (computation) = 2.5 / 1.3 = 1.92

Speedup = 1 / ((1 - 0.82) + 0.82 / 1.92) ≈ 1.65, which exceeds the 1.40 of modification (i).

2. You have two RiSC-16 processors, X and Z, with the following characteristics. They are both multi-cycle processors, in which an instruction executes in a variable number of processor cycles. X and Z execute variations on the same instruction set (RiSC) in the following way:

a. Processor X implements the base instruction set, including LUI. Processor X implements multiplication in software, meaning there is no MULT instruction.

b. Processor Z eliminates the LUI instruction in favor of a MULT instruction, getting LUI functionality from LW.

c. Processor Z's MULT instruction uses the ALU over and over again in a loop, performing shifts and conditional adds, and requires 80 processor cycles per multiply.

d. Executing one MULT instruction on Processor Z eliminates on average 30 instructions that would be executed on Processor X when the multiply is implemented in software. However, Processor Z then needs additional ALU functionality, which increases the ALU's critical path from 10 ns to 12 ns.

Also assume the following:

- Cache read/write: 10 ns
- Register file read/write: 8 ns
- ALU operation: 10 ns for processor X, 12 ns for processor Z

Assume the following distribution of instruction types (assume that LUI requires 3 cycles):

| Instruction | Processor X | Processor Z |
|---|---|---|
| MULT | 0% | 5% |
| LUI | 5% | 0% |
| LW | 20% | 25% |
| SW | 10% | 10% |
| R-Type | 45% | 40% |
| BEQ | 20% | 20% |

For example, processor Z executes 5 MULT instructions out of every 100. For each of those MULT instructions, processor X executes 30 instructions in its place.

a. Compare the execution times of the two processors.

Execution time = T * I * C = Cycle time * Instruction count * Average CPI

Exec time(X) = Tx * Ix * Cx

Exec time(Z) = Tz * Iz * Cz

Tx = 10 ns; Tz = 12 ns

If Iz = 100, Ix = 95 + (5 * 30) = 245

CPI for each instruction type: MULT = 80 cycles, LUI = 3, LW = 5, SW = 4, R-Type = 4, BEQ = 3

Therefore:

Average CPI for X = Cx = (0.05 * 3) + (0.2 * 5) + (0.1 * 4) + (0.45 * 4) + (0.2 * 3) = 3.95

Average CPI for Z = Cz = (0.05 * 80) + (0.25 * 5) + (0.1 * 4) + (0.4 * 4) + (0.2 * 3) = 7.85

Comparing execution times:

Exec time(X) = 10 ns/c * 3.95 c/i * 245 i = 9677.5 ns

Exec time(Z) = 12 ns/c * 7.85 c/i * 100 i = 9420 ns

Processor Z with the multiply instruction is about 1.03 times faster than processor X for this instruction mix.

b. At what clock speed for processor Z are the two designs equal in performance?

Equating the two execution times and solving for Tz:

10 ns * 3.95 CPI * 245 instructions = Tz * 7.85 CPI * 100 instructions

Tz = (10 ns * 3.95 * 245) / (7.85 * 100)

Tz = 12.33 ns

For smaller Tz (faster clock), processor Z has better performance; for larger Tz (slower clock), processor X has better performance.

c. (more difficult) Assuming the original ALU latency for processor Z (12 ns), how fast would your software-emulated multiply have to be (on average) for processor X to be just as fast as processor Z? In other words, how many instructions would processor X execute in place of 1 MULT?

Remember that Ix was defined to be 95 + (# of multiplications) * (cost of each).

We first need to find the instruction count of processor X necessary for equal performance:

10 ns * 3.95 * Ix = 12 ns * 7.85 * 100, which implies Ix = 238.5

Total number of multiply-emulation instructions is (238.5 - 95) = 143.5

Therefore the number of instructions per multiply = 143.5 / 5 = 28.7, i.e. about 28 instructions.
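The processor X versus Z comparison above is a direct application of execution time = cycle time * instruction count * average CPI. The sketch below reproduces the numbers; the dictionary and function names are illustrative, and the per-type cycle counts and instruction mixes are the ones given in the problem and its solution.

```python
# Cycles per instruction type, as listed in the worked solution.
CYCLES = {"MULT": 80, "LUI": 3, "LW": 5, "SW": 4, "RTYPE": 4, "BEQ": 3}

# Instruction mixes from the problem statement (fractions of all instructions).
MIX_X = {"LUI": 0.05, "LW": 0.20, "SW": 0.10, "RTYPE": 0.45, "BEQ": 0.20}
MIX_Z = {"MULT": 0.05, "LW": 0.25, "SW": 0.10, "RTYPE": 0.40, "BEQ": 0.20}


def average_cpi(mix):
    """Weighted average of cycles per instruction for a given mix."""
    return sum(share * CYCLES[kind] for kind, share in mix.items())


def exec_time_ns(cycle_ns, instr_count, cpi):
    """Execution time = cycle time * instruction count * average CPI."""
    return cycle_ns * instr_count * cpi


cpi_x = average_cpi(MIX_X)
cpi_z = average_cpi(MIX_Z)

# Part (a): Z runs 100 instructions; X runs 95 + 5 * 30 = 245.
time_x = exec_time_ns(10, 245, cpi_x)
time_z = exec_time_ns(12, 100, cpi_z)

# Part (b): cycle time of Z at which both take equally long.
break_even_tz = time_x / (100 * cpi_z)

# Part (c): instruction count of X that matches Z, then instructions per multiply.
break_even_ix = time_z / (10 * cpi_x)
instr_per_mult = (break_even_ix - 95) / 5

if __name__ == "__main__":
    print(f"X: {time_x:.1f} ns, Z: {time_z:.1f} ns, ratio {time_x / time_z:.2f}")
    print(f"break-even Tz: {break_even_tz:.2f} ns")
    print(f"instructions per software multiply: {instr_per_mult:.1f}")
```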

3. Two important parameters control the performance of a processor: cycle time and cycles per instruction. There is an enduring trade-off between these two parameters in the design process of microprocessors. While some designers prefer to increase the processor frequency at the expense of a large CPI, other designers follow a different school of thought in which reducing the CPI comes at the expense of a lower processor frequency. Consider the following machines, and compare their performance using the following instruction mix: 25% loads, 13% stores, 47% ALU instructions, and 15% branches/jumps. Assume the unmodified multi-cycle datapath and finite state machine.

- M1: The multi-cycle datapath is designed with a 1 GHz clock.
- M2: A machine like M1 except that register updates are done in the same clock cycle as a memory read or ALU operation. Thus in the finite state machine, states 6 and 7 and states 3 and 4 are combined. This machine has a 3.2 GHz clock, since the register update increases the length of the critical path.
- M3: A machine like M2 except that effective address calculations are done in the same clock cycle as a memory access. Thus states 2, 3, and 4 can be combined, as can 2 and 5, as well as 6 and 7. This machine has a 2.8 GHz clock because of the long cycle created by combining address calculation and memory access.

Find out which of the machines is fastest. Are there instruction mixes that would make another machine faster, and if so, what are they?

In the original multi-cycle datapath the CPI for each instruction class is as follows:

- Loads: 5 cycles
- Stores: 4 cycles
- ALU: 4 cycles
- Branches/Jumps: 3 cycles

In the calculations below, I is the number of instructions executed.

Performance M1:

Average CPI = 0.25 * 5 + 0.13 * 4 + 0.47 * 4 + 0.15 * 3 = 4.1

Execution time = (CPI * #instructions) / clock rate = 4.1 * I / 1 GHz = 4.1 * I * 10^-9 seconds

Performance M2: loads shorten to 4 cycles, ALU instructions shorten to 3 cycles.

Average CPI = 0.25 * 4 + 0.13 * 4 + 0.47 * 3 + 0.15 * 3 = 3.38

Execution time = (CPI * #instructions) / clock rate = 3.38 * I / 3.2 GHz = 1.06 * I * 10^-9 seconds

Performance M3: loads shorten to 3 cycles, stores shorten to 3 cycles, ALU instructions shorten to 3 cycles.

Average CPI = 0.25 * 3 + 0.13 * 3 + 0.47 * 3 + 0.15 * 3 = 3

Execution time = (CPI * #instructions) / clock rate = 3 * I / 2.8 GHz = 1.07 * I * 10^-9 seconds

M2 is fastest.
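The M1/M2/M3 comparison above can be re-run for any instruction mix with a short sketch. The CPI values and clock rates are taken from the problem and its solution; the names `CPI`, `CLOCK_GHZ`, `ns_per_instruction`, and `fastest` are illustrative. Dividing average CPI by the clock rate in GHz gives the average time per instruction in nanoseconds.

```python
# Cycles per instruction class on each machine, from the solution above.
CPI = {
    "M1": {"load": 5, "store": 4, "alu": 4, "branch": 3},
    "M2": {"load": 4, "store": 4, "alu": 3, "branch": 3},
    "M3": {"load": 3, "store": 3, "alu": 3, "branch": 3},
}

# Clock rates in GHz, from the problem statement.
CLOCK_GHZ = {"M1": 1.0, "M2": 3.2, "M3": 2.8}

# Instruction mix given in the problem.
EXAM_MIX = {"load": 0.25, "store": 0.13, "alu": 0.47, "branch": 0.15}


def ns_per_instruction(machine, mix):
    """Average CPI for the mix divided by the clock rate (GHz) gives ns/instruction."""
    avg_cpi = sum(share * CPI[machine][kind] for kind, share in mix.items())
    return avg_cpi / CLOCK_GHZ[machine]


def fastest(mix):
    """Name of the machine with the lowest average time per instruction."""
    return min(CPI, key=lambda machine: ns_per_instruction(machine, mix))


if __name__ == "__main__":
    for machine in CPI:
        print(machine, round(ns_per_instruction(machine, EXAM_MIX), 3), "ns/instruction")
    print("fastest for the exam mix:", fastest(EXAM_MIX))
    all_loads = {"load": 1.0, "store": 0.0, "alu": 0.0, "branch": 0.0}
    print("fastest when every instruction is a load:", fastest(all_loads))
```

Trying other mixes with `fastest` is a quick way to explore the second half of the question, since only loads and stores have a different CPI on M2 and M3.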

M1 can never be faster than M2: even if all the instructions are branch instructions, the CPI will be 3 in all 3 cases, and the clock rate is faster on the other 2 processors.

M3 can be faster than M2 if, for example, all instructions are loads (or all are stores):

M2: Average CPI = 1 * 4 + 0 * 4 + 0 * 3 + 0 * 3 = 4

M3: Average CPI = 1 * 3 + 0 * 3 + 0 * 3 + 0 * 3 = 3

M2 execution time = (CPI * #instructions) / clock rate = 4 * I / 3.2 GHz = 1.25 * I * 10^-9 seconds

M3 execution time = (CPI * #instructions) / clock rate = 3 * I / 2.8 GHz = 1.07 * I * 10^-9 seconds

Review: Adder and ALU Creation, and Building Larger ALUs from Units

Consider the 4-bit ALU below, which can perform the following 5 operations: add, sub, AND, OR, and negate B. Inputs are A = {A3, A2, A1, A0}, B = {B3, B2, B1, B0}, and Cin. Outputs are the result R = {R3, R2, R1, R0} and Cout. Numbers are in 2's complement form. Fill in the table below with the values the control signals should have for each operation. Indicate don't cares where appropriate.

| Operation | M1 | M0 | Cin | Binv | Az |
|---|---|---|---|---|---|
| Add | | | | | |
| Sub | | | | | |
| OR | 1 | 0 | X | 0 | 1 |
| AND | 0 | 1 | X | 0 | 1 |
| Negate B | | | | | |

Digital Logic

1. Using Boolean algebra, prove the following:

a. bd' + c'd = (b'd' + cd)'

- bd' + c'd = (b'd')'(cd)' (De Morgan's)
- bd' + c'd = (b + d)(c' + d') (De Morgan's)
- bd' + c'd = bc' + c'd + bd' + dd' (Distributive)
- bd' + c'd = bc' + c'd + bd' + 0 (Complementary)
- bd' + c'd = bc'(d + d') + c'd + bd' (Null)
- bd' + c'd = bc'd + bc'd' + c'd + bd' (Commutative)
- bd' + c'd = c'd(1 + b) + bd'(c' + 1) (Distributive)
- bd' + c'd = bd' + c'd (Null)

b. abc' + bc'd + a'bd = abc' + a'bd

- abc' + bc'd(a + a') + a'bd = abc' + a'bd (Null)
- abc' + abc'd + a'bc'd + a'bd = abc' + a'bd (Distributive)
- abc'(1 + d) + a'bd(c' + 1) = abc' + a'bd (Commutative)
- abc' + a'bd = abc' + a'bd (Null)

c. a' + a(a'b + b'c)' = a' + b + c'

- a' + a((a'b)'(b'c)') = a' + b + c' (De Morgan's)
- a' + a((a + b')(b + c')) = a' + b + c' (De Morgan's)
- a' + a(ab + bb' + ac' + b'c') = a' + b + c' (Distributive)
- a' + a(ab + 0 + ac' + b'c') = a' + b + c' (Complementary)
- a' + a(ab + ac' + b'c') = a' + b + c' (Distributive)
- a' + aab + aac' + ab'c' = a' + b + c' (Idempotence)
- a' + ab + ac' + ab'c' = a' + b + c' (Idempotence)
- a' + a(b + c' + b'c') = a' + b + c'
- a' + b + c' + b'c' = a' + b + c' (No Name)
- a' + b + c'(1 + b') = a' + b + c' (Null)
- a' + b + c' = a' + b + c'

2. Consider the following function:

z(x3, x2, x1, x0) = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0

a. How many literals does z contain?

11

b. Is z minimal? If not, find the minimal expression using Boolean algebra.

No.

- x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0
- x3'x2(1 + x0) + x3x1x0 + x3x2x0'
- x3'x2(1 + x0') + x3x1x0 + x3x2x0'
- x3'x2 + x3'x2x0' + x3x1x0 + x3x2x0'
- x3'x2 + x3x1x0 + x2x0'

c. Find the equivalent sum of minterms (SOP) for z (using m notation).

z(x3, x2, x1, x0) = Σm(4, 5, 6, 7, 11, 12, 14, 15)
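Identities like the ones in question 1, and the minterm list in question 2(c), can be checked by brute force over all input combinations. The sketch below does this under one stated assumption: the complement marks (primes) dropped by the transcription are restored as bd' + c'd = (b'd' + cd)', abc' + bc'd + a'bd = abc' + a'bd, a' + a(a'b + b'c)' = a' + b + c', and z = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0. All function names are illustrative.

```python
from itertools import product


def equivalent(f, g, nvars):
    """True if f and g agree on every 0/1 assignment of nvars inputs."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=nvars))


def inv(x):
    """Complement of a 0/1 value."""
    return 1 - x


# Question 1, with the lost complement marks restored (an assumption).
def lhs_a(b, c, d): return (b & inv(d)) | (inv(c) & d)
def rhs_a(b, c, d): return inv((inv(b) & inv(d)) | (c & d))

def lhs_b(a, b, c, d): return (a & b & inv(c)) | (b & inv(c) & d) | (inv(a) & b & d)
def rhs_b(a, b, c, d): return (a & b & inv(c)) | (inv(a) & b & d)

def lhs_c(a, b, c): return inv(a) | (a & inv((inv(a) & b) | (inv(b) & c)))
def rhs_c(a, b, c): return inv(a) | b | inv(c)


# Question 2: original and minimized forms of z (same assumption about primes).
def z(x3, x2, x1, x0):
    return (inv(x3) & x2) | (x3 & x1 & x0) | (x3 & x2 & inv(x0)) | (inv(x3) & x2 & x0)

def z_min(x3, x2, x1, x0):
    return (inv(x3) & x2) | (x3 & x1 & x0) | (x2 & inv(x0))


# product() counts upward with x3 as the most significant bit,
# so the position in the enumeration is the minterm number.
rows = list(product((0, 1), repeat=4))
minterms = [i for i, bits in enumerate(rows) if z(*bits) == 1]
maxterms = [i for i, bits in enumerate(rows) if z(*bits) == 0]

if __name__ == "__main__":
    print("1a holds:", equivalent(lhs_a, rhs_a, 3))
    print("1b holds:", equivalent(lhs_b, rhs_b, 4))
    print("1c holds:", equivalent(lhs_c, rhs_c, 3))
    print("z == z_min:", equivalent(z, z_min, 4))
    print("minterms:", minterms)
    print("maxterms:", maxterms)
```

An exhaustive check like this confirms that an identity holds, but it is not a substitute for the algebraic proof the question asks for.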

d. Find the equivalent product of maxterms (POS) for z (using M notation).

z(x3, x2, x1, x0) = ΠM(0, 1, 2, 3, 8, 9, 10, 13)

3. You were recently hired as an engineer in a company that designs alarm systems custom-made to meet the customer's specifications. You are asked to design a system that uses the inputs of three sensors A, B, and C. The alarm should go off (be activated) when any of the following criteria is met:

- A is off, or
- B is on and C is off, or
- both A and C are on.

a. Write the truth table for the function.

| A | B | C | Alarm |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

b. Write the Boolean expression in Product of Sums (POS) form.

A' + B + C

c. Draw/implement the function using 2-selector MUX gates.

d. Draw/implement the function using 2-level NAND-NAND gates.
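The alarm function in question 3 above is small enough to enumerate. The sketch below encodes the three stated criteria directly and compares them with the single-maxterm POS form A' + B + C and with a NAND form derived from it by De Morgan's law; the function names are illustrative, and the complement mark on A is an assumption about what the transcription dropped.

```python
from itertools import product


def alarm_spec(a, b, c):
    """The three criteria from the problem statement."""
    return (not a) or (b and not c) or (a and c)


def alarm_pos(a, b, c):
    """POS form with one maxterm: A' + B + C."""
    return (not a) or b or c


def nand(*inputs):
    """NAND gate with any number of inputs."""
    return not all(inputs)


def alarm_nand(a, b, c):
    """De Morgan: A' + B + C = (A B' C')', a single NAND of A, B', C'."""
    return nand(a, not b, not c)


# Every row of the truth table, with A as the most significant input.
table = [(a, b, c, int(bool(alarm_spec(a, b, c))))
         for a, b, c in product((0, 1), repeat=3)]

if __name__ == "__main__":
    print("A B C Alarm")
    for a, b, c, out in table:
        print(a, b, c, out)
```

The only input combination that leaves the alarm off is A on with B and C both off, which is why a single maxterm is enough.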

The straightforward solution uses the minterms/SOP expression.

A better solution uses De Morgan's law:

A' + B + C = ((A' + B + C)')' = (A B' C')'

4. Find the minimal 2-level implementation using NOR-NOR gates of a system with two 2-bit inputs (A = {a1, a0} and B = {b1, b0}) which outputs the following: if A+B is even, then the output is their product; if A+B is odd, then the output is their sum.

Truth table columns: a1, a0, b1, b0, A+B (decimal), Z (decimal), Z3, Z2, Z1, Z0.

Z3 = a1a0b1b0

Z2 = a1a0'b1 + a1a0b1b0' = a1a0'b1 + a1b1b0'

Z 1 = a1 b1b0 + a1 a0b0 + a1 a0b1 +a1a0b1 + a1b1 b0

Z 0 = (a0 + b0)(a1+a0 +b1

5. Implement the functionality of a 2-input decoder using minimal AND, OR, and NOT gates.

Decoders take a binary number and map this value to an output line. Two input bits mean 4 different values (4 outputs).

Truth table columns: S1, S0, F3, F2, F1, F0.
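The decoder in question 5 above can be sketched with one AND term per output. This is a minimal sketch assuming active-high outputs, where Fi is 1 exactly when the two-bit input S1 S0 equals i; the transcription does not preserve the table values, so that convention and the name `decoder_2to4` are assumptions.

```python
from itertools import product


def decoder_2to4(s1, s0):
    """2-to-4 decoder built from NOT and AND only (active-high outputs assumed).

    Returns (F3, F2, F1, F0); exactly one output is 1 for each input.
    """
    n1, n0 = 1 - s1, 1 - s0   # NOT gates
    f0 = n1 & n0              # selected when S1 S0 = 00
    f1 = n1 & s0              # selected when S1 S0 = 01
    f2 = s1 & n0              # selected when S1 S0 = 10
    f3 = s1 & s0              # selected when S1 S0 = 11
    return (f3, f2, f1, f0)


if __name__ == "__main__":
    print("S1 S0 | F3 F2 F1 F0")
    for s1, s0 in product((0, 1), repeat=2):
        print(s1, s0, "|", *decoder_2to4(s1, s0))
```

Under these assumptions the design uses two NOT gates and four 2-input AND gates, and no OR gates are needed.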

6. Implement a 7-segment controller (truth table, Boolean expressions, gate logic). Practice with any combination of logic units.

The 7-segment controller has inputs q0, q1, q2, q3 and outputs A, B, C, D, E, F, G.

7. Implement the 7-segment controller using a 4-selector DEMUX and OR gates.

### Ch 5: Designing a Single Cycle Datapath

Ch 5: Designing a Single Cycle path Computer Systems Architecture CS 365 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Memory path Input Output Today s

### Computer Architecture

Computer Architecture Having studied numbers, combinational and sequential logic, and assembly language programming, we begin the study of the overall computer system. The term computer architecture is

### Central Processing Unit (CPU)

Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

### Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Pipeline Hazards Structure hazard Data hazard Pipeline hazard: the major hurdle A hazard is a condition that prevents an instruction in the pipe from executing its next scheduled pipe stage Taxonomy of

### Week 1 out-of-class notes, discussions and sample problems

Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types

### Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

### 1. True or False? A voltage level in the range 0 to 2 volts is interpreted as a binary 1.

File: chap04, Chapter 04 1. True or False? A voltage level in the range 0 to 2 volts is interpreted as a binary 1. 2. True or False? A gate is a device that accepts a single input signal and produces one

### Solutions. Solution 4.1. 4.1.1 The values of the signals are as follows:

4 Solutions Solution 4.1 4.1.1 The values of the signals are as follows: RegWrite MemRead ALUMux MemWrite ALUOp RegMux Branch a. 1 0 0 (Reg) 0 Add 1 (ALU) 0 b. 1 1 1 (Imm) 0 Add 1 (Mem) 0 ALUMux is the

### Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

### A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

Other architectures Example. Accumulator-based machines A single register, called the accumulator, stores the operand before the operation, and stores the result after the operation. Load x # into acc

### Chapter 5 Instructor's Manual

The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction

### Computer organization

Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs

### CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009

CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009 Test Guidelines: 1. Place your name on EACH page of the test in the space provided. 2. every question in the space provided. If

### EE282 Computer Architecture and Organization Midterm Exam February 13, 2001. (Total Time = 120 minutes, Total Points = 100)

EE282 Computer Architecture and Organization Midterm Exam February 13, 2001 (Total Time = 120 minutes, Total Points = 100) Name: (please print) Wolfe - Solution In recognition of and in the spirit of the

### EC 362 Problem Set #2

EC 362 Problem Set #2 1) Using Single Precision IEEE 754, what is FF28 0000? 2) Suppose the fraction enhanced of a processor is 40% and the speedup of the enhancement was tenfold. What is the overall speedup?

### PROBLEMS. which was discussed in Section 1.6.3.

22 CHAPTER 1 BASIC STRUCTURE OF COMPUTERS (Corrisponde al cap. 1 - Introduzione al calcolatore) PROBLEMS 1.1 List the steps needed to execute the machine instruction LOCA,R0 in terms of transfers between

### United States Naval Academy Electrical and Computer Engineering Department. EC262 Exam 1

United States Naval Academy Electrical and Computer Engineering Department EC262 Exam 29 September 2. Do a page check now. You should have pages (cover & questions). 2. Read all problems in their entirety.

### Introducción. Diseño de sistemas digitales.1

Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

### COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59 Overview: In the first projects for COMP 303, you will design and implement a subset of the MIPS32 architecture

### Karnaugh Maps & Combinational Logic Design. ECE 152A Winter 2012

Karnaugh Maps & Combinational Logic Design ECE 52A Winter 22 Reading Assignment Brown and Vranesic 4 Optimized Implementation of Logic Functions 4. Karnaugh Map 4.2 Strategy for Minimization 4.2. Terminology

### Digital Logic: Boolean Algebra and Gates

Digital Logic: Boolean Algebra and Gates Textbook Chapter 3 CMPE2 Summer 28 Basic Logic Gates CMPE2 Summer 28 Slides by ADB 2 Truth Table The most basic representation of a logic function Lists the output

### Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

Digital Logic Design Basics Combinational Circuits Sequential Circuits Pu-Jen Cheng Adapted from the slides prepared by S. Dandamudi for the book, Fundamentals of Computer Organization and Design. Introduction

### Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

### EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

### A Brief Review of Processor Architecture. Why are Modern Processors so Complicated? Basic Structure

A Brief Review of Processor Architecture Why are Modern Processors so Complicated? Basic Structure CPU PC IR Regs ALU Memory Fetch PC -> Mem addr [addr] > IR PC ++ Decode Select regs Execute Perform op

### 4 BOOLEAN ALGEBRA AND LOGIC SIMPLIFICATION

4 BOOLEAN ALGEBRA AND LOGIC SIMPLIFICATION BOOLEAN OPERATIONS AND EXPRESSIONS Variable, complement, and literal are terms used in Boolean algebra. A variable is a symbol used to represent a logical quantity.

### Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

### LSN 2 Computer Processors

LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

### Instruction Set Architecture

Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)

### Course on Advanced Computer Architectures

Course on Advanced Computer Architectures Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, September 3rd, 2015 Prof. C. Silvano EX1A ( 2 points) EX1B ( 2

### Chapter 4. Arithmetic for Computers

Chapter 4 Arithmetic for Computers Arithmetic Where we've been: Performance (seconds, cycles, instructions) What's up ahead: Implementing the Architecture operation a b 32 32 ALU 32 result 2 Constructing

### (Refer Slide Time: 00:01:16 min)

Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

### Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive

### on an system with an infinite number of processors. Calculate the speedup of

1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements

### Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

### l What have discussed up until now & why: l C Programming language l More low-level then Java. l Better idea about what s really going on.

CS211 Computer Architecture l Topics Digital Logic l Transistors (Design & Types) l Logic Gates l Combinational Circuits l K-Maps Class Checkpoint l What have discussed up until now & why: l C Programming

### Understanding Logic Design

Understanding Logic Design ppendix of your Textbook does not have the needed background information. This document supplements it. When you write add DD R0, R1, R2, you imagine something like this: R1

### Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

1 Pipeline Hazards Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by and Krste Asanovic 6.823 L6-2 Technology Assumptions A small amount of very fast memory

### #5. Show and the AND function can be constructed from two NAND gates.

Study Questions for Digital Logic, Processors and Caches G22.0201 Fall 2009 ANSWERS Digital Logic Read: Sections 3.1 3.3 in the textbook. Handwritten digital lecture notes on the course web page. Study

### 150127-Microprocessor & Assembly Language

Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an

### MICROPROCESSOR AND MICROCOMPUTER BASICS

Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit

### Lecture 5: Gate Logic Logic Optimization

Lecture 5: Gate Logic Logic Optimization MAH, AEN EE271 Lecture 5 1 Overview Reading McCluskey, Logic Design Principles- or any text in boolean algebra Introduction We could design at the level of irsim

### PROBLEMS #20,R0,R1 #\$3A,R2,R4

506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #\$3A,R2,R4 R0,R2,R5 In all instructions,

### Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

Addressing The problem Objectives:- When & Where do we encounter Data? The concept of addressing data' in computations The implications for our machine design(s) Introducing the stack-machine concept Slide

### ENEE 244 (01**). Spring 2006. Homework 4. Due back in class on Friday, April 7.

ENEE 244 (**). Spring 26 Homework 4 Due back in class on Friday, April 7.. Implement the following Boolean expression with exclusive-or and AND gates only: F = AB'CD' + A'BCD' + AB'C'D + A'BC'D. F = AB

### The University of Nottingham

The University of Nottingham School of Computer Science A Level 1 Module, Autumn Semester 2007-2008 Computer Systems Architecture (G51CSA) Time Allowed: TWO Hours Candidates must NOT start writing their

### VLIW Processors. VLIW Processors

1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

### We r e going to play Final (exam) Jeopardy! "Answers:" "Questions:" - 1 -

. (0 pts) We re going to play Final (exam) Jeopardy! Associate the following answers with the appropriate question. (You are given the "answers": Pick the "question" that goes best with each "answer".)

### Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

WAR: Write After Read write-after-read (WAR) = artificial (name) dependence add R1, R2, R3 sub R2, R4, R1 or R1, R6, R3 problem: add could use wrong value for R2 can t happen in vanilla pipeline (reads

### Computer Architecture TDTS10

why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

### Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

### Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

### COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.

### Let's put together a Manual Processor

Lecture 14 Let's put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce

### CHAPTER 3 Boolean Algebra and Digital Logic

CHAPTER 3 Boolean Algebra and Digital Logic: 3.1 Introduction; 3.2 Boolean Algebra; 3.2.1 Boolean Expressions; 3.2.2 Boolean Identities; 3.2.3 Simplification of Boolean Expressions; 3.2.4

### Performance evaluation

Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography

### Instruction Set Architectures

Instruction Set Architectures Early trend was to add more and more instructions to new CPUs to do elaborate operations (CISC) VAX architecture had an instruction to multiply polynomials! RISC philosophy

### Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,

### Boolean Algebra

George Boole was an English mathematician of the XIX century. Boolean algebra can operate on logic (or Boolean) variables that can assume just 2 values: 0/1, true/false, on/off, closed/open. Usually value is associated

### CSE140: Components and Design Techniques for Digital Systems

CSE140: Components and Design Techniques for Digital Systems Tajana Simunic Rosing What we covered thus far: Number representations Logic gates Boolean algebra Introduction to CMOS HW#2 due, HW#3 assigned

### CS352H: Computer Systems Architecture

CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions

### 8-bit 4-to-1 Line Multiplexer

Project Part I 8-bit 4-to-1 Line Multiplexer Specification: This section of the project outlines the design of a 4-to-1 multiplexor which takes two 8-bit buses as inputs and produces a single 8-bit bus

### BOOLEAN ALGEBRA & LOGIC GATES

BOOLEAN ALGEBRA & LOGIC GATES Logic gates are electronic circuits that can be used to implement the most elementary logic expressions, also known as Boolean expressions. The logic gate is the most basic

### CSE140 Homework #7 - Solution

CSE140 Spring2013 CSE140 Homework #7 - Solution You must SHOW ALL STEPS for obtaining the solution. Reporting the correct answer, without showing the work performed at each step will result in getting

### Boolean Algebra Part 1

Boolean Algebra Part 1 Boolean Algebra Objectives Understand Basic Boolean Algebra Relate Boolean Algebra to Logic Networks Prove Laws using Truth Tables Understand and Use First Basic Theorems

### Transparent D Flip-Flop

Transparent D Flip-Flop The RS flip-flop forms the basis of a number of 1-bit storage devices in digital electronics. One such device is shown in the figure, where extra combinational logic converts the input

### A SystemC Transaction Level Model for the MIPS R3000 Processor

SETIT 2007 4th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 25-29, 2007 TUNISIA A SystemC Transaction Level Model for the MIPS R3000 Processor

Advanced Pipelining Techniques 1. Dynamic Scheduling 2. Loop Unrolling 3. Software Pipelining 4. Dynamic Branch Prediction Units 5. Register Renaming 6. Superscalar Processors 7. VLIW (Very Large Instruction

### Instruction Set Design

Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware. One of the most important abstractions in CS. It's narrow,

### CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer: 4.1 Introduction; 4.2 CPU Basics and Organization; 4.2.1 The Registers; 4.2.2 The ALU; 4.2.3 The Control Unit; 4.3 The Bus; 4.4 Clocks

### NEW adder cells are useful for designing larger circuits despite increase in transistor count by four per cell.

CHAPTER 4 THE ADDER The adder is one of the most critical components of a processor, as it is used in the Arithmetic Logic Unit (ALU), in the floating-point unit and for address generation in case of cache

### Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Execution Cycle Pipelining CSE 410, Spring 2005 Computer Systems http://www.cs.washington.edu/410 1. Instruction Fetch 2. Instruction Decode 3. Execute 4. Memory 5. Write Back IF and ID Stages 1. Instruction

### CPU Organization and Assembly Language

COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Homework and announcements Reading: Chapter 12 Homework:

### TIMING DIAGRAM OF 8085

5 TIMING DIAGRAM OF 8085 5.1 INTRODUCTION A timing diagram is the display of initiation of read/write and transfer of data operations under the control of 3 status signals IO/M, S1, and S0. As the heartbeat

### Model Answers HW2 - Chapter #3

Model Answers HW2 - Chapter #3 1. The hypothetical machine of figure 3.4 also has two I/O instructions: 0011 = Load AC from I/O, 0111 = Store AC to I/O. In these cases the 12-bit address identifies a particular

### Points Addressed in this Lecture. Standard form of Boolean Expressions. Lecture 5: Logic Simplification & Karnaugh Map

Points Addressed in this Lecture Lecture 5: Logic Simplification & Karnaugh Map Professor Peter Cheung Department of EEE, Imperial College London (Floyd 4.5-4.) (Tocci 4.-4.5) Standard form of Boolean Expressions

### Systems I: Computer Organization and Architecture

Systems I: Computer Organization and Architecture Lecture 9 - Register Transfer and Microoperations Microoperations Digital systems are modular in nature, with modules containing registers, decoders, arithmetic

### Intel 8086 architecture

Intel 8086 architecture Today we'll take a look at Intel's 8086, which is one of the oldest and yet most prevalent processor architectures around. We'll make many comparisons between the MIPS and 8086

### CPU Organisation and Operation

CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU is usually described in terms of the Fetch-Execute cycle. Fetch-Execute Cycle Fetch the Instruction Increment the Program

### Computer Organization and Components

Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors Associate Professor, KTH Royal Institute of Technology Assistant Research Engineer, University of California, Berkeley Slides

### what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

### The Von Neumann Model. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell

The Von Neumann Model University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell The Stored Program Computer 1943: ENIAC Presper Eckert and John Mauchly -- first general electronic

### Reduced Instruction Set Computer (RISC)

Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the ISA. RISC Goals RISC: Simplify ISA Simplify CPU Design Better CPU Performance Motivated by simplifying

### Chapter 4. Gates and Circuits. Chapter Goals. Computers and Electricity. Gates

Chapter Goals Chapter 4 Gates and Circuits Identify the basic gates and describe the behavior of each Describe how gates are implemented using transistors Combine basic gates into circuits Describe the

### Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Design of Pipelined MIPS Processor Sept. 24 & 26, 1997 Topics Instruction processing Principles of pipelining Inserting pipe registers Data Hazards Control Hazards Exceptions MIPS architecture subset R-type

### Karnaugh Maps

Karnaugh Maps Yet another way of deriving the simplest Boolean expressions from behaviour. Easier than using algebra (which can be hard if you don't know where you're going). Example:

| A | B | C | X |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

Each 1 here gives a minterm, e.g.

### CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis

### UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering EEC180B Lab 7: MISP Processor Design Spring 1995 Objective: In this lab, you will complete the design of the MISP processor,

### Two-level logic using NAND gates

CSE140: Components and Design Techniques for Digital Systems Two and Multilevel logic implementation Tajana Simunic Rosing Two-level logic using NAND gates Replace minterm AND gates with NAND gates Place

### A Lab Course on Computer Architecture

A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

### SECTION C [short essay] [Not to exceed 120 words, Answer any SIX questions. Each question carries FOUR marks] 6 x 4=24 marks

UNIVERSITY OF KERALA First Degree Programme in Computer Applications Model Question Paper Semester I Course Code- CP 1121 Introduction to Computer Science TIME : 3 hrs Maximum Mark: 80 SECTION A [Very

### A Basic Processor for Teaching Digital Circuits and Systems Design with FPGA

A Basic Processor for Teaching Digital Circuits and Systems Design with FPGA Maicon Carlos Pereira, Paulo Viniccius Viera, André Luis Alice Raabe and Cesar Albenes Zeferino Master Program on Applied Computer

### SIM-PL: Software for teaching computer hardware at secondary schools in the Netherlands

SIM-PL: Software for teaching computer hardware at secondary schools in the Netherlands Ben Bruidegom, benb@science.uva.nl AMSTEL Instituut Universiteit van Amsterdam Kruislaan 404 NL-1098 SM Amsterdam

### Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine

Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine This is a limited version of a hardware implementation to execute the JAVA programming language. Structured Computer

### Pipelining Review and Its Limitations

Pipelining Review and Its Limitations Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic