CS 104 Practice Final Exam Solutions

Similar documents
Intel 8086 architecture

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

EE361: Digital Computer Organization Course Syllabus

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

361 Computer Architecture Lecture 14: Cache Memory

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Board Notes on Virtual Memory


Chapter 5 Instructor's Manual

Computer Organization and Components

CPU Organization and Assembly Language

Instruction Set Architecture

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December :59

Reduced Instruction Set Computer (RISC)

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

A3 Computer Architecture

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

More MIPS: Recursion. Computer Science 104 Lecture 9

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Instruction Set Design

This Unit: Virtual Memory. CIS 501 Computer Architecture. Readings. A Computer System: Hardware

Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)

LSN 2 Computer Processors

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

Introducción. Diseño de sistemas digitales.1

We r e going to play Final (exam) Jeopardy! "Answers:" "Questions:" - 1 -

Chapter 2 Logic Gates and Introduction to Computer Architecture

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Solutions. Solution The values of the signals are as follows:

CHAPTER 7: The CPU and Memory

CHAPTER 4 MARIE: An Introduction to a Simple Computer

Numbering Systems. InThisAppendix...

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

VLIW Processors. VLIW Processors

HOMEWORK # 2 SOLUTIO

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

How It All Works. Other M68000 Updates. Basic Control Signals. Basic Control Signals

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Chapter 1 Computer System Overview

An Introduction to the ARM 7 Architecture

Week 1 out-of-class notes, discussions and sample problems

1. True or False? A voltage level in the range 0 to 2 volts is interpreted as a binary 1.

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

CS352H: Computer Systems Architecture

İSTANBUL AYDIN UNIVERSITY

CS 61C: Great Ideas in Computer Architecture Finite State Machines. Machine Interpreta4on

Course on Advanced Computer Architectures

Central Processing Unit (CPU)

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

WAR: Write After Read

The Big Picture. Cache Memory CSE Memory Hierarchy (1/3) Disk

CSE 141 Introduction to Computer Architecture Summer Session I, Lecture 1 Introduction. Pramod V. Argade June 27, 2005

Winter 2002 MID-SESSION TEST Friday, March 1 6:30 to 8:00pm

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

CPU Organisation and Operation

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

Let s put together a Manual Processor

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Operating Systems. Virtual Memory

EC 362 Problem Set #2

Lecture 17: Virtual Memory II. Goals of virtual memory

Assembly Language Programming

CPU Performance Equation

Introduction to MIPS Assembly Programming

Computer Systems Structure Main Memory Organization

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Read this before starting!

PROBLEMS. which was discussed in Section

Chapter 4 System Unit Components. Discovering Computers Your Interactive Guide to the Digital World

Pipeline Hazards. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic

COS 318: Operating Systems. Virtual Memory and Address Translation

Review: MIPS Addressing Modes/Instruction Formats

Faculty of Engineering Student Number:

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

MICROPROCESSOR AND MICROCOMPUTER BASICS

Traditional IBM Mainframe Operating Principles

Chapter 5, The Instruction Set Architecture Level

RAM & ROM Based Digital Design. ECE 152A Winter 2012

CSE 141L Computer Architecture Lab Fall Lecture 2

CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont.

Figure 1: Graphical example of a mergesort 1.

Chapter 7 Memory Management

CS/COE1541: Introduction to Computer Architecture. Memory hierarchy. Sangyeun Cho. Computer Science Department University of Pittsburgh

1. Memory technology & Hierarchy

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions

Transcription:

CS 104 Practice Final Exam Solutions Name: There are 13 questions, with the point values as shown below. You have 180 minutes with a total of 150 points. Pace yourself accordingly. This exam must be individual work. You may not collaborate with your fellow students. You may use 1 sheet of notes you created, but no other external resources. I certify that the work shown on this exam is my own work, and that I have neither given nor received improper assistance of any form in the completion of this work. Signature: # Question Points Earned Points Possible 1 Vocabulary 10 2 Binary Math 10 3 C Programming 10 4 MIPS Assembly 10 5 Logic Gates 10 6 Performance 20 7 Datapaths 15 8 Caches I 10 9 Caches II 10 10 Virtual Memory 15 11 Branch Prediction 10 12 Short Answer 10 13 Multiple-Multiple-Choice 10 Total 150 Percent 100 1

Question 1: Vocabulary [10 pts] Match each of the following definitions with the appropriate vocab word: A ALU 1. Multiple hard disks combined for performance and/or reliability. M. RAID 2. A memory technology which maintains its state as long as the power is on, but loses its contents when power is turned off. R. SRAM 3. An asynchronous notification of an external event, requiring the attention of the OS. H. Interrupt 4. The structure which holds all of the translations from virtual addresses to physical addresses K. Page Table 5. Discarding incorrect instructions from a pipeline F. Flush 6. A piece of logic which selects between two inputs, based on the value of a third input J. Mux 7. The idea that most of a program s data accesses are likely to be contained within a small range of nearby addresses. Q. Spatial Locality 8. A class of ISAs characterized by simple instructions which are easily implemented in high performance hardware. O. RISC 9. A type of datapath in which the CPI is always 1.0 (by definition). P. Single-cycle 10. The part of the branch predictor responsible for prediciting the taken target of branches (except for returns). B. BTB B Branch Target Buffer C CISC D DRAM E Exception F Flush G Hard Disk H Interrupt I Multi-cycle J Mux K Page Table L Pipeline M RAID N Return Address Stack O RISC P Single-cycle Q Spatial Locality R SRAM S Stall T Temporal Locality U XOR-gate 2

Question 2: Binary Math [10 pts] 1. Convert the number -42 to signed, 2 s complement 8-bit binary. 1101 0110 2. Write the binary number 0110 1111 0000 0101 in hexidecimal. 0x6F05 3. Write the hexidecimal representation of -13.75 as an IEEE single-precision floating point number. C15C0000 4. Add the binary numbers 0101 1110 + 0111 0010. 1101 0000 5. State whether the addition you did in part 4 overflows if the operands are treated as signed numbers. Yes 6. State whether the addition you did in part 4 overflows if the operands are treated as unsigned numbers. No 3

Question 3: C Programming [10 pts] Given the following linked list node definition: struct ll_node { int data; struct ll_node * next; }; Write the reverselist function which reverses a linked list, and returns the reversed list. Answer: struct ll_node * reverselist(struct ll_node * lst) { struct ll_node * ans = NULL; while (lst!= NULL) { struct ll_node * temp = lst->next; lst->next = ans; ans = lst; lst = temp; } return ans; } 4

Question 4: MIPS Assembly [10 pts] Translate the strupper function (written in C below) to MIPS assembly: void strupper(char * s) { while (*s!= \0 ) { *s = toupper(*s); s++; } } Answer: strupper: addiu $sp, $sp, 32 sw $fp, 0($sp) sw $ra, 4($sp) sw $s0, 8($sp) addiu $fp, $sp, 28 move $s0, $a0.l_lp: lbu $a0, 0($s0) beqz $a0,.l_done jal toupper sw $v0, 0($s0) addiu $s0, $s0, 1 b.l_lp.l_done lw $fp, 0($sp) lw $ra, 4($sp) lw $s0, 8($sp) addiu $sp, $sp, 32 jr $ra 5

Question 5: Logic Gates [10 pts] Given the following circuit: A AND B NOT OR C AND 1. Write the boolean formula for this circuit (A and not B) or (B and C) 2. Fill in the truth table for this circuit: A B C Out 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 1 6

Question 6: Performance [20 pts] A pipelined processor executes a 100-billion instruction program, and has the following performance related characteristics: 20% Loads, 10% Stores, 20% Branches, 50% ALU. Clock frequency: 2.0 GHz (0.5ns). Branch mis-prediction penalty: 5 cycles. Accuracy: 90%. Load-to-use penalty: 50% of loads cause a 1-cycle penalty. L1 instruction cache: Thit included in pipeline. %miss: 1% L1 data cache: Thit: included in pipeline. %miss: 5%. The L1 data cache has a writebuffer and a writeback buffer. L2 cache: Thit: 10 cycles. %miss: 5%. Memory: Thit: 200 cycles. %miss: 0%. Compute the following: 1. What is the CPI penalty from load-to-use stalls? 0.1 2. What is the CPI penalty from branch mis-predictions? 0.1 3. What is the CPI penalty from data memory (D-cache misses)? 0.2 4. What is the CPI penalty from instruction memory (I-cache misses)? 0.2 5. What is the total CPI? 1.6 6. How long (in seconds) does the program take to execute? 80 seconds 7

7. Supposed the pipeline could be re-designed to have more stages and a faster clock. Under this new design, the clock frequency is 4.0GHz (0.25ns). The branch mis-prediction penalty increases to 10 cycles. Now, 50% of loads cause a 2-cycle load-to-use penalty. (a) What is the new CPI penalty from load-to-use stalls? 0.2 (b) What is the new CPI penalty from branch mis-predictions? 0.2 (c) What is the new CPI penalty from data memory? 0.3 (d) What is the new CPI penalty from instruction memory? 0.3 (e) What is the new total CPI? 2.0 (f) How long (in seconds) does the program take to execute now? 50 seconds 8

Question 7: Datapaths [15 pts] The following (multi-cycle) datapath has support for J-type absolute jumps (jal, j, etc), but not for I-type relative branches (beq, bne, etc). Recall that these relative branches take the immediate value, shift it left by 2, add it to PC+4, and then use that as the target (if taken). Add support to this datapath for such branches. +4 <<2 PC IMEM IW RegFile A B O DMEM D SX Const 9

Question 8: Caches I [10 pts] You are desigining the memory hierarchy for a new processor. The access latency of main memory is 200 cycles. You have the following choices for the L2 design: 2MB, 16-way set-associative. Thit = 30 cycles. %miss=1%. 1MB, 8-way set-associative. Thit = 20 cycles. %miss=5%. 512KB, 4-way set-associative. Thit = 15 cycles. %miss=10%. and the following choices for L1 designs: 64KB, 8-way set-associative. Thit = 2 cycles. %miss=4%. 32KB, 4-way set-associative. Thit = 1 cycle. %miss=5%. Which cache designs would you choose and why? Answer: The L2 design is independent of the L1 design, so we can/should select it first. For the L2, we can compute Tavg for each design: 2MB: 30 + 0.01 * 200 = 32 1MB: 20 + 0.05 * 200 = 30 512K: 15 + 0.10 * 200 = 35 This means the 1MB (Tavg=30) is the best L2 design. Now we can pick the L1: 64KB: 2 + 0.04 * 30 = 2 + 1.2 = 3.2 32KB: 1 + 0.05 * 30 = 1 + 1.5 = 2.5 This means 32KB (Tavg = 2.5) is the best L1 design. 10

Question 9: Caches II [10 pts] Assume you have an empty 32B cache with 8B blocks. The cache is 2-way setassociative. Addresses are 8 bits. The left column below lists the addresses accessed. For each address, show how it is split into tag, index and offset in the next three columns, show the new state of the cache after the access in the next 4 columns, and state the outcome whether its a hit or a miss in the last column. The first row shows the initial state of the cache. The columns for the ways of each set show the tags (only) in those ways. Way 0 is MRU, Way 1 is LRU in each set at any given time. You do not need to worry about data values. All numbers are in hex, as should be all of the numbers in your answer. Set 0 Set 1 Address Tag Index Offset Way 0 Way 1 Way 0 Way 1 Outcome (start) 0 F 7 C F1 F 0 1 F 0 7 C Hit 1F 1 1 7 F 0 1 7 Miss 18 1 1 0 F 0 1 7 Hit C2 C 0 2 C F 1 7 Miss 81 8 0 1 8 C 1 7 Miss 01 0 0 1 0 8 1 7 Miss 11

Question 10: Virtual Memory [15 pts] Suppose that a system has a 32-bit (4GB) virtual address space. It has 1GB of physical memory, and uses 1MB pages. 1. How many virtual pages are there in the address space? 4096 2. How many physical pages are there in the address space? 1024 3. How many bits are there in the offset? 20 4. How many bits are there in the virtual page number? 12 5. How many bits are there in the physical page number? 10 6. Some entries of the page table are shown below (all values are in hex, and all entries shown are valid). Translate virtual address 0x410423 to a physical address, using the translations in this page table. 0xDD10423 12

Entry Number Value 0 1F 1 3C 2 55 3 9C 4 DD 5 EE 6 99...... 20 2F 21 4C 22 65 23 AC 24 ED 25 FE 26 100...... 40 11F 41 13C 42 155 43 19C 44 1DD 45 1EE 46 199...... 13

Question 11: Branch Prediction [10 pts] One particular branch (i.e., one specific PC) has the following actual outcomes. Show the predictions for both a one-bit counter and a two-bit counter (no history). For the two-bit counter, use T for strongly taken, t for weakly taken, n for weakly not-taken, and N for strongly not taken. Finally fill in the accuracy (percentage of predictions that were correct) at the bottom of the table. The first prediction is done for you. 1-bit counter 2-bit counter Outcome Prediction Correct? Prediction Correct? T N no n no T T yes t yes T T yes T yes N T no T no T N no t yes N T no T no T N no t yes T T yes T yes T T yes T yes N T no T no Accuracy 40% 60% 14

Question 12: Short Answer [10 pts] 1. Explain what a segmentation fault is. In your answer, be sure to address what type of programming errors/program actions cause it, as well as how the hardware detects the situation. Answer: A segmentation fault occurs when a program attempts to access invalid memory. The canonical example is an attempt to dereference a NULL pointer. The hardware detects this situation by finding no valid translation for the requested address, causing a page fault. The OS then determines that the requested address lies outside of the program s address space and terminates it with a segmentation fault. 2. Compare and contrast caches and virtual memory. Give at least one similarity and one difference between the two. For the difference, explain why this difference exists. Answer: (Many possible) Similarity: both split memory into fixed sized chunks (blocks/pages), and manipulate memory at this granularity. Difference: In virtual memory, software (the OS) makes a replacement decision, while in caches, the hardware makes the replacement decision. This difference exists because Tmiss is so much larger for memory (which misses to disk) than any other level of the memory hierarchy this justifies the extra time for software transfer control into a software routine to make a complex decision, in the hopes of reducing %miss. 15

Question 13: Multiple-Multiple-Choice [10 pts] For each question, circle all that apply. If none of the selections are appropriate, then choose e. None of the above 1. The advantage(s) of virtual memory is/are: a,c a. Programmers do not need to worry about the actual memory locations holding their program. b. L1 and L2 cache hit rates improve. c. Security. d. The OS can be written in a less hardware-dependent manner. e. None of the above. 2. Which of the following will decrease conflict misses in a cache: e a. Increasing block size b. Decreasing associativity c. Decreasing Tmiss of the next level cache d. Increasing the clock frequency e. None of the above 3. Which of the following are commonly features of a RISC ISA: b,c a. Memory-to-memory operations b. 3 operand arithmetic operations c. Fixed length instruction encodings d. Load-compare-branch instructions e. None of the above 4. Which of the following are problems that can arise when pipelining: a,d a. Data Hazards b. Water Hazards 16

c. Dukes of Hazards d. Control Hazards e. None of the above 5. Which of the following is an advantage of interrupts over polling? b a. Interrupts are a simpler mechanism for both the hardware and software. b. Interrupts allow for more efficient CPU utilization. c. Interrupts are compatible with virtual memory, while polling is not. d. Polling causes bad hit rates in the L1 instruction cache. e. None of the above 17