1. Program A runs in 10 seconds on a machine with a 100 MHz clock. How many clock cycles does program A require?

Similar documents
Quiz for Chapter 1 Computer Abstractions and Technology 3.10

on an system with an infinite number of processors. Calculate the speedup of

EEM 486: Computer Architecture. Lecture 4. Performance

Chapter 2. Why is some hardware better than others for different programs?

CPU Performance. Lecture 8 CAP

EC 362 Problem Set #2

Performance evaluation

Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs.

Central Processing Unit (CPU)

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

! Metrics! Latency and throughput. ! Reporting performance! Benchmarking and averaging. ! CPU performance equation & performance trends

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Block diagram of typical laptop/desktop

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Computer Organization. and Instruction Execution. August 22

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis

Board Notes on Virtual Memory

A Lab Course on Computer Architecture

What's in a computer?

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Operation Count; Numerical Linear Algebra

SEQUENCES ARITHMETIC SEQUENCES. Examples

LSN 2 Computer Processors

Lecture 8: Binary Multiplication & Division

İSTANBUL AYDIN UNIVERSITY

2: Computer Performance

Week 1 out-of-class notes, discussions and sample problems

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Computer Architecture-I

(Refer Slide Time: 00:01:16 min)

System Models for Distributed and Cloud Computing

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

CSEE W4824 Computer Architecture Fall 2012

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Quantitative Computer Architecture

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

EE361: Digital Computer Organization Course Syllabus

Chapter 5 Instructor's Manual

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

The BBP Algorithm for Pi

Faster polynomial multiplication via multipoint Kronecker substitution

CISC, RISC, and DSP Microprocessors

Finding Rates and the Geometric Mean

Chapter 2 Logic Gates and Introduction to Computer Architecture

CS 61C: Great Ideas in Computer Architecture Finite State Machines. Machine Interpreta4on

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

Problems and Measures Regarding Waste 1 Management and 3R Era of public health improvement Situation subsequent to the Meiji Restoration

Parallel Scalable Algorithms- Performance Parameters

Pipelining Review and Its Limitations

Don t Slow Me Down with that Calculator Cliff Petrak (Teacher Emeritus) Brother Rice H.S. Chicago cpetrak1@hotmail.com

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

The Methodology of Application Development for Hybrid Architectures

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Memory management basics (1) Requirements (1) Objectives. Operating Systems Part of E1.9 - Principles of Computers and Software Engineering

Communicating with devices

LECTURE NOTES ON MACROECONOMIC PRINCIPLES

Sequences. A sequence is a list of numbers, or a pattern, which obeys a rule.

=

Lattice QCD Performance. on Multi core Linux Servers

Choosing a Computer for Running SLX, P3D, and P5

4/1/2017. PS. Sequences and Series FROM 9.2 AND 9.3 IN THE BOOK AS WELL AS FROM OTHER SOURCES. TODAY IS NATIONAL MANATEE APPRECIATION DAY

Course on Advanced Computer Architectures

Comparing Multi-Core Processors for Server Virtualization

Comparative Performance Review of SHA-3 Candidates

The Central Processing Unit:

HY345 Operating Systems

How To Understand The Design Of A Microprocessor

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Introduction to Operating Systems. Perspective of the Computer. System Software. Indiana University Chen Yu

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions

Performance metrics for parallel systems

Figure 1: Graphical example of a mergesort 1.

Enabling Technologies for Distributed Computing

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions

Conversions between percents, decimals, and fractions

Icepak High-Performance Computing at Rockwell Automation: Benefits and Benchmarks

ARM Microprocessor and ARM-Based Microcontrollers

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

$ Example If you can earn 6% interest, what lump sum must be deposited now so that its value will be $3500 after 9 months?

In mathematics, it is often important to get a handle on the error term of an approximation. For instance, people will write

64-Bit versus 32-Bit CPUs in Scientific Computing

Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving

26 Integers: Multiplication, Division, and Order

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Financial Mathematics

Algebra I Notes Relations and Functions Unit 03a

Python Programming: An Introduction to Computer Science

The Mathematics 11 Competency Test Percent Increase or Decrease

Computer Architecture TDTS10

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

Transcription:

(5 pts) Exercise 1-51 1. Program A runs in 10 seconds on a machine with a 100 MHz clock. How many clock cycles does program A require? (5 pts) Exercise 1-52 2. ) Our favorite program runs in 10 seconds on computer A, which has a 400 Mhz. clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?" 3.) Why might machine B need more clock cycles to run the program?

(10 pts) Exercise 1-53 We wish to compare the performance of two different computers: M1 and M2. The following measurements have been made on these computers: Time on M1 Time on M2 Program 1 2.0 seconds 1.5 seconds Program 2 5.0 seconds 10.0 seconds Which computer is faster for each program, and how many times as fast is it? (10 pts) Exercise 1-56 Consider the machines from the previous exercise, and assume the following additional measurements were made: Instructions executed on M1 Instructions executed on M2 Program 1 5 x 10 9 6 x 10 9 What is the instruction execution rate (instructions per second) for each computer when running program 1?

(10 pts) Exercise 1-57 Suppose that M1 from Exercise 4-5 costs $500 and M2 costs $800. If you needed to run program 1 a large number of times, which computer would you buy in large quantities? Why? (5 pts) Exercise 1-61: MIPS Two different compilers are being tested for a 100 MHz. machine that has three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. Compiler #1: code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. Compiler #2: code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. Which sequence will be faster according to execution time? Which sequence will be faster according to MIPS? MIPS = Inst. Count / (ExecutionTime * 10 6 )

(5 pts) Exercise 1-62 Program A runs in 0.34 seconds on a 500 Mhz machine. You know that this program requires 100 million instructions of which: 10% are mult. instructions that take an unknown number of cycle 60% are other arithmetic instructions taking 1 cycle 30% are memory instructions taking 2 cycles How many cycles does a multiplication take on this machine? (5 pts) Exercise 1-63 Program A runs in 2 seconds on a certain machine. You know that this program requires 500 million instructions of which: 30% are multiplication instructions that take 10 cycles 40% are other arithmetic instructions taking 1 cycle 30% are memory instructions taking 2 cycles Suppose multiplication could be improved to take just 1 cycle. How much faster would the new machine be compared to the old?

(10 pts) Exercise 1-66 Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A-E), which have the following average CPI on the two machines CPI on P1 CPI on P2 Class A 1 2 Class B 2 2 Class C 3 2 Class D 4 4 Class E 3 4 P1 has a clock rate of 4 GHz, P2 has a clock rate of 6 GHz. If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, how much faster is P2 than P1?

(10 pts) Exercise 1-67 Suppose you wish to run a program P with 7.5 x 10 9 instructions on a 5 GHz machine with a CPI of 0.8. a. What is the expected CPU time? b. When you run P, it takes 3 seconds of wall clock time to complete. What is the percentage of the CPU time P received? (5 pts) Exercise 1-71 Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions? Formula: Time after Improve. = Exec. Time Unaffected +( Exe. Time Affected / Amount of Improvement)

(5 pts) Exercise 1-72 We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floatingpoint instructions have to account for in this program in order to yield our desired speedup on this benchmark? (10 pts) Exercise 1-75 (use Amdahl s Law) You are going to enhance a computer, and there are two possible improvements: either make multiply instructions run four times faster than before, or make memory access instructions run two times faster than before. You repeatedly run a program that takes 100 seconds to execute. Of this time, 20% is used for multiplication, 50% for memory access instructions, and 30% for other tasks. What will the speedup be if you improve only multiplication? What will the speedup be if you improve only memory access? What will the speedup be if both improvements are made?