Quantitative Computer Architecture




Performance Measurement and Analysis in Computer Architecture

Quantitative Computer Architecture

[Figure: design cycle linking Measurement, Analysis, Model, Proposed Innovation, and Implementation]

How to measure, analyze, and specify computer system performance
or
"My computer is faster than your computer!"

What is Performance?

How to measure Execution Time?

  Execution time? Throughput? Of what?
  What is relative performance? How is it specified?

    % time program
    ... program results ...
    160.7u 19.9s 4:15 71%
    %

  Wall-clock time?
  User CPU time?
  User + kernel CPU time?

  Answer:

Relative Performance can be confusing

  A runs in 12 seconds
  B runs in 20 seconds

  A/B = .6, so A is 40% faster, or 1.4X faster, or B is 40% slower
  B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower

  Needs a precise definition.

Relative Performance, the Definition

$$\text{Relative Performance} = \text{Speedup (of X over Y)} = n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}$$

  We can remove all ambiguity by always constraining n to be > 1 => machine X is n times faster than Y.

Examples

  Your program runs in 5 minutes on an Intel Xeon, but 2 minutes on a Core i7 processor. How much faster is the i7 processor?

  Another program runs in 10 minutes with the standard compiler, but when recompiled with a new compiler, the program runs in 9 minutes. How much faster is the new compiled program (what is the speedup)?

How to Specify Performance, in summary

  Performance only has meaning in the context of a program or workload (MIPS, GFLOPS???).

  When talking about the performance of a single machine, we talk about response time or throughput.

  When talking about relative performance, we will say machine Y has a speedup of n over machine X based on the ratio of their execution times for a workload.

    speedup of 1.6
    1.6 times as fast
    60% speedup [correct but more often misinterpreted]
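The two speedup examples above can be worked through with a few lines of Python. This is only a sketch of the definition: the function name `speedup` and the variable names are illustrative and do not come from the slides.

```python
def speedup(time_slower: float, time_faster: float) -> float:
    """Speedup n = (execution time of the slower run) / (execution time of the faster run).

    Putting the slower run in the numerator keeps n > 1, as the definition asks.
    """
    if time_slower <= 0 or time_faster <= 0:
        raise ValueError("execution times must be positive")
    return time_slower / time_faster


# Example 1: 5 minutes on the Xeon, 2 minutes on the Core i7.
i7_over_xeon = speedup(5.0, 2.0)

# Example 2: 10 minutes with the standard compiler, 9 minutes with the new one.
new_over_old_compiler = speedup(10.0, 9.0)

print(f"Core i7 over Xeon:          {i7_over_xeon:.2f}")
print(f"New compiler over standard: {new_over_old_compiler:.2f}")
```

Under this definition the Core i7 run has a speedup of 2.5 and the recompiled program has a speedup of about 1.11.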

But What Workload?

  Synthetic workloads: whetstone, dhrystone, ...
  Toy benchmarks: puzzle, quicksort, sieve, ...
  Kernels: livermore loops, linpack
  Real programs

  To maximize their efforts, architects will attempt to mirror the decision process of the market. When the market uses poor measurement methodology, we can get poor architectures!

SPEC: System Performance Evaluation Cooperative

  First Round, 1989: 10 programs yielding a single number
  Second Round, 1992: SpecInt92 (6 integer programs) and SpecFP92 (14 floating point programs); compiler flags unlimited
  Third Round, 1995: single flag setting for all programs; new set of programs
  Fourth Round, 2000: more complex programs, larger data sets
  Fifth Round, 2006: longer running time, some larger data sets, more application areas

  SPEC combines real programs with enforced measurement standards.

SPEC First Round

  One program: 99% of time in a single line of code
  New front-end compiler could improve dramatically

  [Figure: bar chart with a vertical scale from 0 to 800 and one bar per benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv]

How to Summarize Performance

  Real workloads typically involve multiple programs, and thus, multiple results.
  Popular benchmarks (e.g., SPEC, livermore loops, ...) involve multiple programs.
  Everyone wants to summarize results with a single number.
  But the summarized result can be dramatically skewed by the method used to combine them.

How to Summarize Performance

                Computer A   Computer B   Computer C
  Program 1              1           10           20
  Program 2           1000          100           20
  Total time          1001          110           40

  Which machine is fastest?

How to Summarize Performance

  Arithmetic Mean
$$\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$

  Weighted Arithmetic Mean
$$\sum_{i=1}^{n} \text{Time}_i \times \text{Weight}_i$$
  where the sum of the weights is 1.

  Geometric Mean
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution Time Ratio}_i}, \qquad \text{Execution Time Ratio}_i = \frac{\text{Execution Time}_i}{\text{Execution Time}_{\text{base}}}$$

  Harmonic Mean
$$\frac{n}{\sum_{i=1}^{n} \frac{1}{\text{Rate}_i}}$$

Summarizing Performance

                   A        B       C     W(1)    W(2)    W(3)
  Program 1        1       10      20      .5     .909    .999
  Program 2     1000      100      20      .5     .091    .001
  AM/W(1)      500.5       55      20
  AM/W(2)      91.82    18.18      20
  AM/W(3)          2    10.09      20
  GM            31.6     31.6      20

  Which machine is fastest now?
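The summary table above can be recomputed with a short Python sketch. The helper names are illustrative. One assumption is labeled in the code: the W(2) weights are taken as exactly 10/11 and 1/11, which round to the .909 and .091 shown and reproduce the table's 91.82 and 18.18 (the rounded weights give about 91.91 for machine A).

```python
import math

# Execution times from the table: [Program 1, Program 2] for each machine.
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [20.0, 20.0]}

# Weightings from the table. W(2) is ASSUMED to be exactly 10/11 and 1/11.
weightings = {
    "W(1)": [0.5, 0.5],
    "W(2)": [10 / 11, 1 / 11],
    "W(3)": [0.999, 0.001],
}


def weighted_arithmetic_mean(values, weights):
    """Sum of Time_i * Weight_i; the weights are expected to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


def geometric_mean(values):
    """n-th root of the product of the n values."""
    return math.prod(values) ** (1.0 / len(values))


summary = {}
for machine, machine_times in times.items():
    row = {
        name: weighted_arithmetic_mean(machine_times, weights)
        for name, weights in weightings.items()
    }
    row["GM"] = geometric_mean(machine_times)
    summary[machine] = row

for machine, row in summary.items():
    print(machine, {name: round(value, 2) for name, value in row.items()})
```

With these numbers the machine with the lowest summary time is C under W(1), B under W(2), A under W(3), and C under the geometric mean, so the choice of summary method decides the winner.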

Summarizing Performance

  Even the unweighted arithmetic mean implies a weighting.
  Geometric mean does not necessarily predict execution time for any mix of the programs.
  Ratios of geometric means never change (regardless of which machine is used as the base), and always give equal weight to all benchmarks.
  To give unequal weight requires a weighted arithmetic mean.

Analyzing Performance

  That was all about measuring performance. What tools do we use to analyze (predict) performance in the absence of something to measure?
  Models, equations, queueing theory, mean value analysis, instruction-level simulation, gate-level simulation, ...

  [Figure: design cycle linking Measurement, Analysis, Model, Proposed Innovation, and Implementation]

Speedup (due to architectural change)

  Speedup is just relative performance on the same machine with something changed. From before, then:

$$\text{speedup} = \text{relative performance} = \frac{\text{ET for entire task without change}}{\text{ET for entire task with change}}$$

Amdahl's Law

  The impact of a performance improvement is limited by the percent of execution time affected by the improvement.

  Suppose the change only affects part of execution time:

$$\text{Execution Time after improvement} = \frac{\text{Execution Time Affected}}{\text{Amount of Improvement}} + \text{Execution Time Unaffected}$$

  Make the common case fast!!
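The two formulas above combine into the familiar fractional form of Amdahl's Law. The symbols here are assumptions introduced for this derivation and do not appear on the slides: $f$ is the fraction of the original execution time affected by the change, and $s$ is the amount of improvement applied to that fraction.

```latex
\begin{align*}
\text{ET}_{\text{new}}
  &= \frac{f \cdot \text{ET}_{\text{old}}}{s} + (1 - f)\cdot \text{ET}_{\text{old}} \\[4pt]
\text{speedup}
  &= \frac{\text{ET}_{\text{old}}}{\text{ET}_{\text{new}}}
   = \frac{1}{(1 - f) + \dfrac{f}{s}} \\[4pt]
\lim_{s \to \infty} \text{speedup}
  &= \frac{1}{1 - f}
\end{align*}
```

The last line is the sense in which the improvement is limited: however large $s$ becomes, the unaffected fraction $1 - f$ caps the overall speedup.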

Amdahl's Law and Massive Parallelism

  Parallel portion   Serial portion   Speedup
  .9                 .1               1.0
  .45                .1               1/.55 = 1.82
  .225               .1               1/.325 = 3.07
                                      < 10

Examples

  Program A runs for 30 seconds, but 5 seconds of that time is just waiting for memory. If we double the speed of the memory subsystem, what is the speedup?

  FP instructions account for 10% of the execution time of program B. Should we double the speed of the FP instructions, or speed up integer by 20%?

  How much do we need to speed up the memory to get a 20% improvement in program A?

What is Time?

  CPU Execution Time = CPU clock cycles * Clock cycle time
                     = CPU clock cycles / Clock rate

  Every conventional processor has a clock with an associated clock cycle time or clock rate.
  Every program runs in an integral number of clock cycles.

  GHz = billions of cycles/second
  X GHz = 1/X nanoseconds cycle time

How many clock cycles?

  Number of CPU cycles = Instructions executed * Average Clock Cycles per Instruction (CPI)

  CPI = CPU clock cycles / Instruction count
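The first two Amdahl's Law examples and the parallelism figures above can be checked with a small sketch. The function `amdahl_speedup` and its parameter names are illustrative. Three assumptions are labeled in the code: "speed up integer by 20%" is read as an improvement factor of 1.2, everything that is not FP in program B is treated as integer work, and the rows with .45 and .225 are taken to correspond to 2 and 4 processors.

```python
def amdahl_speedup(fraction_affected: float, improvement: float) -> float:
    """Overall speedup when `fraction_affected` of the original execution
    time is made `improvement` times faster and the rest is unchanged."""
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be between 0 and 1")
    if improvement <= 0:
        raise ValueError("improvement must be positive")
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / improvement)


# Program A: 5 of 30 seconds wait on memory; memory becomes 2x faster.
memory_2x = amdahl_speedup(5 / 30, 2.0)

# Program B: FP is 10% of execution time.
fp_2x = amdahl_speedup(0.10, 2.0)
# ASSUMPTION: the other 90% is integer work and "20%" means a factor of 1.2.
integer_1_2x = amdahl_speedup(0.90, 1.2)

# Massive parallelism: 90% parallel, 10% serial.
# ASSUMPTION: processor counts 2 and 4 match the .45 and .225 rows.
parallel = {p: amdahl_speedup(0.9, p) for p in (1, 2, 4, 1_000_000)}

print(f"memory 2x:    {memory_2x:.3f}")
print(f"FP 2x:        {fp_2x:.3f}")
print(f"integer 1.2x: {integer_1_2x:.3f}")
for processors, value in parallel.items():
    print(f"{processors:>9} processors: {value:.3f}")
```

Under these assumptions the modest integer improvement beats doubling FP speed, because it applies to far more of the execution time. For the third example, if a 20% improvement is read as a speedup of 1.2, program A would have to finish in 25 seconds, which is exactly the time not spent waiting on memory, so no finite memory speedup reaches it.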

All Together Now

$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
$$\text{seconds} = \text{instructions} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$

Examples

  4 GHz processor, program runs in 30 seconds, executing 40 billion instructions: CPI = ??
  If we reduce CPI to 2.4, ET = ??
  New compiler reduces IC to 32 billion, but increases CPI to 2.6: good or bad?
  2 GHz Core i7 has a CPI of .9, 4 GHz Core i7 has a CPI of 1.1 (why?): What's the speedup for that workload?

Who Affects Performance?

$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

  programmer
  compiler
  instruction-set architect
  machine architect
  hardware designer
  materials scientist/physicist/silicon engineer

What Affects Performance?

$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

  pipelining
  superpipelining
  cache
  from CISC to RISC
  superscalar
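The CPU time examples above follow directly from ET = IC * CPI * CT, as this sketch shows. The function and variable names are illustrative. Two assumptions are labeled in the code: the new-compiler case is compared against the CPI 2.4 configuration, since 2.6 is an increase only relative to 2.4, and the two Core i7 configurations execute the same number of instructions.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds: IC * CPI * cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_time(execution_time_s: float, instruction_count: float,
                  clock_rate_hz: float) -> float:
    """Average cycles per instruction: total clock cycles / instruction count."""
    return execution_time_s * clock_rate_hz / instruction_count


# 4 GHz processor, 30 seconds, 40 billion instructions.
base_cpi = cpi_from_time(30.0, 40e9, 4e9)

# Same program and clock with CPI reduced to 2.4.
et_cpi_2_4 = cpu_time(40e9, 2.4, 4e9)

# New compiler: IC = 32 billion, CPI = 2.6.
# ASSUMPTION: the baseline for "good or bad" is the CPI 2.4 case above.
et_new_compiler = cpu_time(32e9, 2.6, 4e9)

# 2 GHz at CPI 0.9 versus 4 GHz at CPI 1.1.
# ASSUMPTION: both run the same instruction count, so it cancels in the ratio.
instructions = 1e9
i7_speedup = cpu_time(instructions, 0.9, 2e9) / cpu_time(instructions, 1.1, 4e9)

print(f"CPI:                  {base_cpi:.2f}")
print(f"ET at CPI 2.4:        {et_cpi_2_4:.1f} s")
print(f"ET with new compiler: {et_new_compiler:.1f} s")
print(f"4 GHz over 2 GHz:     {i7_speedup:.2f}")
```

The new compiler comes out ahead at 20.8 seconds against 24 seconds (and against the original 30 seconds), and doubling the clock rate yields a speedup of about 1.64 rather than 2 because the CPI rises.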

Key Points

  We need to be precise about how to specify performance.
  Performance is only meaningful in the context of a workload.
  Be careful how you summarize performance.
  Amdahl's law
  ET = IC * CPI * CT