ECE 6100 Advanced Computer Architecture Sample Final Exam Solns

Problem 1 (3 parts, 20 points) Pipelining Speedup

Suppose a program running on a RISC machine performs 16,000,000 instructions during its execution. The total time it takes to execute an instruction is 200 ns, independent of the clock cycle time. The total amount of work that needs to be performed on each instruction is infinitely divisible, so there may be any number of pipeline stages.

1a) [6 points] Complete the table below by computing the stage time, total execution time, and speedup (relative to the non-pipelined case) for the different pipelining depths. Ignore all hazards (i.e., assume ideal pipelining for this part). Neglect stage time increases caused by pipeline register delays, etc., for this part.

Pipeline Depth    Stage Time    Total Execution Time    Pipeline Speedup
1                 200 ns        3.2 sec                 1
2                 100 ns        1.6 sec                 2
4                 50 ns         0.8 sec                 4
8                 25 ns         0.4 sec                 8

Execution Time = 16 M instructions * 1 cycle/instruction * stage_time

1b) [6 points] Now suppose pipeline register delays and processor control overhead add 10 ns to the latency of each pipeline stage. (So, for example, if there are four pipeline stages, each instruction will have an execution latency of 240 ns and the pipelined machine produces 1 instruction every 60 ns.) What is the maximum speedup that can be obtained through pipelining? Assume there are no hazards (ideal pipelining).

With N stages the stage time is 200 ns / N + 10 ns. Because the work is infinitely divisible, the first term approaches 0 as N grows, so:

Shortest possible stage time = 10 ns
Total execution time = 16 M instructions * 1 cycle/inst * 10 ns = 0.16 seconds
Original execution time = 3.2 seconds
Speedup = 3.2 / 0.16 = 20

OR

Shortest possible stage time = 10 ns
Maximum throughput is 1 instruction per 10 ns. Compare this to the throughput of the non-pipelined case, 1 instruction per 200 ns: (1/10) / (1/200) = 20

Maximum Speedup: 20
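The arithmetic in parts 1a and 1b can be checked with a short script. This is only a sketch: the function and constant names are illustrative, and the baseline for every speedup is assumed to be the non-pipelined machine with no added overhead, as in the solution above.

```python
# Sketch of the Problem 1a/1b arithmetic (names are illustrative, not from the exam).
INSTRUCTIONS = 16_000_000
LATENCY_NS = 200.0    # total work per instruction
OVERHEAD_NS = 10.0    # per-stage register/control overhead used in part 1b


def stage_time_ns(depth, overhead_ns=0.0):
    """Stage time when the work is split evenly over `depth` stages."""
    return LATENCY_NS / depth + overhead_ns


def exec_time_s(depth, overhead_ns=0.0):
    """Ideal pipelining: one instruction completes per stage time."""
    return INSTRUCTIONS * stage_time_ns(depth, overhead_ns) * 1e-9


def speedup(depth, overhead_ns=0.0):
    """Speedup relative to the non-pipelined (depth 1, no overhead) machine."""
    return exec_time_s(1) / exec_time_s(depth, overhead_ns)


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, stage_time_ns(depth), exec_time_s(depth), speedup(depth))
    # With overhead, the speedup approaches but never reaches 200 / 10 = 20.
    for depth in (4, 100, 10_000):
        print(depth, speedup(depth, OVERHEAD_NS))
```

Under these assumptions the speedup with overhead is 200 / (200/N + 10), which makes the limit of 20 visible directly.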

1c) [8 points] Now take into account stalls caused by hazards in the pipeline. Complete the table below using the average stall cycles per instruction listed for each pipeline depth. Ignore stage time increases caused by pipeline register delays, control overhead, etc., for this part.

Pipeline Depth    Average # Stall Cycles/Instruction    Stage Time    Total Execution Time    Pipeline Speedup
1                 0.0                                   200 ns        3.2 sec                 1
2                 0.6                                   100 ns        2.56 sec                1.25
4                 1.4                                   50 ns         1.92 sec                1.67
8                 4.1                                   25 ns         2.04 sec                1.57

Execution Time = Instruction_Count * CPI * stage_time, where CPI = 1 + average stall cycles per instruction

16 M insts * 1.6 CPI * 100 ns = 2.56 sec
16 M insts * 2.4 CPI * 50 ns = 1.92 sec
16 M insts * 5.1 CPI * 25 ns = 2.04 sec
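The table for part 1c follows from one formula, which the sketch below evaluates for each depth. The names are illustrative; the stall figures are the ones given in the problem, and CPI is assumed to be 1 plus the average stall cycles per instruction.

```python
# Sketch of the Problem 1c arithmetic (names are illustrative).
INSTRUCTIONS = 16_000_000
LATENCY_NS = 200.0

# Pipeline depth -> average stall cycles per instruction, as given in the problem.
STALLS = {1: 0.0, 2: 0.6, 4: 1.4, 8: 4.1}


def exec_time_s(depth, stalls_per_instr):
    """Execution time = instruction count * CPI * stage time."""
    cpi = 1.0 + stalls_per_instr
    return INSTRUCTIONS * cpi * (LATENCY_NS / depth) * 1e-9


results = {depth: exec_time_s(depth, s) for depth, s in STALLS.items()}
speedups = {depth: results[1] / t for depth, t in results.items()}
best_depth = min(results, key=results.get)  # depth with the shortest run time

if __name__ == "__main__":
    for depth in sorted(results):
        print(depth, round(results[depth], 2), round(speedups[depth], 2))
    print("best depth:", best_depth)
```

With these numbers the deepest pipeline is not the fastest: the 4-stage design wins because the stall penalty at depth 8 outweighs the shorter stage time.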

Problem 2 (5 parts, 20 points) Video CD-ROMs

Consider a video CD-ROM system. Suppose it takes 1 byte per pixel to represent the pixel's color and a single image frame in a movie contains 16K pixels (the size of the frame is approximately 128 pixels by 128 pixels). A CD-ROM drive (1X) has a 150 KB/second transfer rate and a total storage capacity of 600 MB per disk. A typical flicker-free movie must run at 30 frames/second.

2a) [4 points] How many frames per second can be provided with a 1X CD-ROM? (show work)

Each frame is 16K pixels * 1 byte/pixel = 16 KB.
150 KB/sec / 16 KB/frame = 9.375

9.375 frames/second.

2b) [4 points] In general, an nX CD-ROM drive spins n times as fast and provides n times the transfer rate of a 1X CD-ROM. (For example, a 2X CD-ROM drive spins the CD-ROM twice as fast and has twice the transfer rate.) How many times faster than the original (1X) CD-ROM do we need to spin our CD-ROM to get the transfer rate necessary for a flicker-free movie? (show work)

9.375 frames/sec * N = 30 frames/sec; N = 3.2, round up to 4X

OR

30 frames/sec * 16 KB/frame = 480 KB/sec; 480 / 150 = 3.2; round up to 4X

4X CD-ROM.

2c) [4 points] How many frames of a movie can be stored on a CD-ROM (1X)? (show work)

600 MB / 16 KB/frame = 37.5 K frames (taking 1 MB = 1000 KB)

37500 frames.

2d) [4 points] If we are able to run our movie at 30 frames per second, how many minutes of a movie can be stored on the CD-ROM? (show work)

37.5 K frames / 30 frames/sec = 1250 sec = 20.83 min

20.83 minutes.

2e) [4 points] How much data compression do we need to do to fit a 120 minute movie on the CD-ROM? (show work)

120 / 20.83 = 5.76X; round up to 6X

6 times reduction in the data.
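The five answers in Problem 2 chain together, so a short script makes the dependencies explicit. This is a sketch with illustrative names; it assumes 1 MB = 1000 KB, which is the convention that reproduces the 37,500-frame figure in the solution.

```python
import math

# Sketch of the Problem 2 arithmetic (names are illustrative).
FRAME_KB = 16          # 16K pixels * 1 byte per pixel
RATE_1X_KB_S = 150     # 1X transfer rate
CAPACITY_MB = 600
TARGET_FPS = 30
MOVIE_MINUTES = 120

fps_1x = RATE_1X_KB_S / FRAME_KB                    # 2a: frames/sec at 1X
speed_needed = TARGET_FPS / fps_1x                  # 2b: required multiple of 1X
drive_speed = math.ceil(speed_needed)               # 2b: next whole drive speed
frames_on_disc = CAPACITY_MB * 1000 / FRAME_KB      # 2c: assumes 1 MB = 1000 KB
minutes_on_disc = frames_on_disc / TARGET_FPS / 60  # 2d
compression = MOVIE_MINUTES / minutes_on_disc       # 2e: required reduction factor

if __name__ == "__main__":
    print(fps_1x, speed_needed, drive_speed)
    print(frames_on_disc, round(minutes_on_disc, 2))
    print(round(compression, 2), math.ceil(compression))
```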

Problem 3 (2 parts, 15 points) The Hazards of Multi-cycle Functional Units

Consider the following program fragment executing on a basic 5-stage DLX pipeline with all stages taking 1 cycle, except the Execute stage, which takes a variable number of cycles, depending on the functional unit used:

Functional unit              Number of EX cycles
Integer ALU                  1
Floating Point Add           5
Floating Point Load/Store    2
Floating Point Multiply      3

Assume registers are written in the first half of the clock cycle and read in the second half.

3a) [9 points] Suppose the instructions enter the pipeline in order, with a new instruction starting on each cycle. (That is, assume there is no hazard detection mechanism being used and no stalls are introduced to avoid hazards.) Determine which data hazards occur in executing this program fragment. Indicate the hazards as in the following example: if there is a WAR (antidependence) hazard between instructions 3 and 4, involving register F8, and 3 precedes 4, put 3(F8) in the WAR column to the right of instruction 4. An instruction may cause more than one hazard. Assume there are no instructions previous to instruction 1.

Instruction             RAW (true)      WAR (anti)    WAW (output)
1: SUBF F1, F2, F3      -               -             -
2: ADDF F1, F4, F5      -               -             -
3: MULTF F6, F3, F1     2(F1)           -             -
4: SF 100(R1), F6       3(F6)           -             -
5: LF F1, 0(R1)         -               -             2(F1)
6: ADDF F2, F1, F6      3(F6), 5(F1)    -             -

3b) [6 points] In this part, assume there is no forwarding hardware. If the instructions are executed in order, determine how many stall cycles are required for each instruction (i.e., how many bubbles must be inserted BEFORE each instruction to avoid all data hazards).

Instruction             Number of Stalls
1: SUBF F1, F2, F3      0
2: ADDF F1, F4, F5      0
3: MULTF F6, F3, F1     6
4: SF 100(R1), F6       4
5: LF F1, 0(R1)         0
6: ADDF F2, F1, F6      3
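The stall counts in part 3b can be reproduced with a small timing model. The sketch below is an assumption-laden illustration, not the grader's method: it tracks only RAW dependences, ignores structural hazards, lets an instruction read a register in ID during the same cycle the producer is in WB (write in the first half, read in the second), and assumes SUBF uses the Floating Point Add unit. All names are hypothetical.

```python
# Sketch: count the bubbles needed before each instruction on an in-order
# pipeline without forwarding. Only RAW hazards are modelled (assumption).

# (text, destination register or None, source registers, EX cycles)
PROGRAM = [
    ("SUBF F1, F2, F3",  "F1", ["F2", "F3"], 5),  # assumed to use the FP add unit
    ("ADDF F1, F4, F5",  "F1", ["F4", "F5"], 5),
    ("MULTF F6, F3, F1", "F6", ["F3", "F1"], 3),
    ("SF 100(R1), F6",   None, ["R1", "F6"], 2),
    ("LF F1, 0(R1)",     "F1", ["R1"],       2),
    ("ADDF F2, F1, F6",  "F2", ["F1", "F6"], 5),
]


def raw_stalls(program):
    """Return the number of stall cycles inserted before each instruction."""
    write_cycle = {}  # register -> cycle of the most recent producer's WB
    stalls = []
    prev_id = 1       # so the first instruction decodes in cycle 2 (IF in cycle 1)
    for _, dest, sources, ex_cycles in program:
        earliest_id = prev_id + 1  # in order: one cycle behind the previous ID
        # ID may share a cycle with the producer's WB (write first half, read second).
        id_cycle = max([earliest_id] + [write_cycle.get(r, 0) for r in sources])
        stalls.append(id_cycle - earliest_id)
        if dest is not None:
            # ID, then ex_cycles of EX, then MEM, then WB.
            write_cycle[dest] = id_cycle + ex_cycles + 2
        prev_id = id_cycle
    return stalls


if __name__ == "__main__":
    for (text, *_), n in zip(PROGRAM, raw_stalls(PROGRAM)):
        print(f"{text:<20} {n}")
```

For example, instruction 2 decodes in cycle 3 and writes F1 back in cycle 10, so instruction 3, which would otherwise decode in cycle 4, waits 6 cycles.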

Problem 4 (4 parts, 20 points) Disk Technology

Suppose we have a magnetic disk with the following parameters.

Controller overhead    3 ms
Average seek time      10 ms
Rotation rate          5400 revolutions/minute
Transfer rate          2.88 MB/second
# sectors per track    32 sectors/track
Sector size            1 KByte

4a) [5 points] What is the average time to read or write a single sector? (show work)

Average rotational delay = ½ * (60 sec / 5400) = 5.56 ms
3 ms + 10 ms + ½(60/5400) + 1 KB / 2.88 MB/sec = 3 + 10 + 5.56 + 0.35 ms = 18.9 ms

18.9 ms

4b) [5 points] What is the average time to read or write 16 KB in 16 consecutive sectors in the same cylinder? (show work)

3 ms + 10 ms + ½(60/5400) + 16 KB / 2.88 MB/sec = 3 + 10 + 5.56 + 5.56 ms = 24.12 ms

24.12 ms

4c) [5 points] What is the average time to read or write an entire track (32 consecutive Kbytes)? Assume sectors can be read or written in any order. (show work)

Because the sectors can be transferred in any order, the transfer can start at whichever sector is under the head, so the rotational delay is 0.
3 ms + 10 ms + 0 ms + 32 KB / 2.88 MB/sec = 3 + 10 + 11.11 = 24.11 ms

24.11 ms

4d) [5 points] Now suppose we have an array of 8 of these magnetic disks. The disks are synchronized so that the arms on all the disks are always over the same track and the same sector within the track. The data is striped across the disks in the array so that 8 consecutive sectors can be read in parallel. What is the average time to read or write 16 consecutive KB in the disk array system? (show work)

Each disk transfers 16 KB / 8 = 2 KB.
3 ms + 10 ms + ½(60/5400) + 2 KB / 2.88 MB/sec = 3 + 10 + 5.56 + 0.69 ms = 19.25 ms

19.25 ms
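All four parts of Problem 4 use the same access-time formula with different transfer sizes and rotational delays, which the sketch below captures in one function. The names are illustrative, and the code assumes decimal units (2.88 MB/sec treated as 2.88 KB per ms), which is what reproduces the 0.35 ms per-sector figure.

```python
# Sketch of the Problem 4 access-time formula (names are illustrative).
OVERHEAD_MS = 3.0
SEEK_MS = 10.0
RPM = 5400
TRANSFER_KB_PER_MS = 2.88        # 2.88 MB/sec, assuming 1 MB = 1000 KB

ROTATION_MS = 60_000 / RPM       # one full revolution, about 11.11 ms


def access_time_ms(kbytes, rotational_delay_ms=ROTATION_MS / 2, disks=1):
    """Controller overhead + seek + rotational delay + transfer time.

    With `disks` > 1 the data is assumed to be striped evenly, so each
    disk transfers kbytes / disks in parallel.
    """
    transfer_ms = (kbytes / disks) / TRANSFER_KB_PER_MS
    return OVERHEAD_MS + SEEK_MS + rotational_delay_ms + transfer_ms


if __name__ == "__main__":
    print(round(access_time_ms(1), 2))                           # 4a
    print(round(access_time_ms(16), 2))                          # 4b
    print(round(access_time_ms(32, rotational_delay_ms=0), 2))   # 4c
    print(round(access_time_ms(16, disks=8), 2))                 # 4d
```

Carrying full precision gives about 24.11 ms for part 4b; the 24.12 ms in the solution differs only because the intermediate terms were rounded to 5.56 ms before adding. The model also shows why striping helps little here: overhead, seek, and rotational delay dominate, and only the transfer term shrinks.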