Performance evaluation
Arquitecturas Avanzadas de Computadores - 2547021
Departamento de Ingeniería Electrónica y de Telecomunicaciones
Facultad de Ingeniería
2015-1

Bibliography and evaluation
Bibliography
  Lecture slides
  Chapter 4: Computer Organization and Design: The Hardware/Software Interface, D. A. Patterson and J. L. Hennessy, Morgan Kaufmann Publishers, 3rd Edition, 2005.
  Chapter 1: Computer Architecture: A Quantitative Approach, J. L. Hennessy and D. A. Patterson, Morgan Kaufmann, 5th Edition, 2011 (previous editions may be good too).
Evaluation
  Test I (15%) covering units 1-2

How good is a computer?
We can think of many parameters:
  Processor's clock rate
  Power consumed by a program
  Execution time for a program
  Number of tasks done per second
  Reliability
  Aesthetic appearance
  Social repercussion, etc.
These are the metrics, the things we want to estimate or measure (not all of them are easy to measure, though).
How should we compare two computer systems?

Performance: Latency vs. Throughput
Latency: time to finish a fixed task
Throughput: number of tasks per unit of time
They are different: parallelism can be exploited for throughput, not for latency
Usually a trade-off: latency vs. throughput
Choose the definition of performance that matches your goals
  Scientific program: latency; web server: throughput?
Example: transport people 10 km
  Car: capacity = 5, speed = 60 km/h
  Bus: capacity = 60, speed = 20 km/h
  Latency: car = 10 min, bus = 30 min
  Throughput: car = 15 people per hour (counting the return trip), bus = 60 people per hour

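The car/bus numbers can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes each vehicle returns empty before carrying the next group, which is how the slide counts the return trip.

```python
# Sketch of the car/bus example; function and parameter names are assumptions.

def latency_minutes(distance_km, speed_kmh):
    """One-way travel time in minutes for a single trip."""
    return distance_km * 60 / speed_kmh


def throughput_pph(capacity, distance_km, speed_kmh):
    """People delivered per hour, assuming the vehicle drives back empty
    before the next trip (round trip = 2 * distance)."""
    return capacity * speed_kmh / (2 * distance_km)


DISTANCE_KM = 10
vehicles = {
    "car": {"capacity": 5, "speed_kmh": 60},
    "bus": {"capacity": 60, "speed_kmh": 20},
}

for name, v in vehicles.items():
    lat = latency_minutes(DISTANCE_KM, v["speed_kmh"])
    thr = throughput_pph(v["capacity"], DISTANCE_KM, v["speed_kmh"])
    print(f"{name}: latency = {lat:.0f} min, throughput = {thr:.0f} pph")
# car: latency = 10 min, throughput = 15 pph
# bus: latency = 30 min, throughput = 60 pph
```

The car wins on latency and the bus wins on throughput, so which one is "better" depends on the metric chosen.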
Example: latency vs. throughput
Do the following changes to a computer system increase throughput, decrease response time, or both?
  a) Replacing the processor with a faster version
  b) Adding more processors to a system that uses multiple processors for separate tasks (e.g., a web server)
Answer
  a) Both
  b) Throughput

Comparing Performance
System a is x times faster than b if
  latency(a) = latency(b) / x
  throughput(a) = throughput(b) * x
System a is x% faster than b if
  latency(a) = latency(b) / (1 + x/100)
  throughput(a) = throughput(b) * (1 + x/100)
Car/bus example
  Latency? Car is 3 times (and 200%) faster than bus
  Throughput? Bus is 4 times (and 300%) faster than car

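The "x times faster" and "x% faster" definitions differ only by an offset of one, which is easy to get wrong. The following sketch (helper names are illustrative, not from the course material) applies both definitions to the car/bus numbers.

```python
# Sketch: "x times faster" vs. "x% faster"; helper names are assumptions.

def times_faster_by_latency(latency_a, latency_b):
    """x such that latency(a) = latency(b) / x."""
    return latency_b / latency_a


def times_faster_by_throughput(throughput_a, throughput_b):
    """x such that throughput(a) = throughput(b) * x."""
    return throughput_a / throughput_b


def to_percent_faster(x):
    """Convert 'x times faster' into 'p% faster': x = 1 + p/100."""
    return (x - 1) * 100


# Latency: car = 10 min, bus = 30 min
car_vs_bus = times_faster_by_latency(10, 30)
print(car_vs_bus, to_percent_faster(car_vs_bus))   # 3.0 200.0

# Throughput: bus = 60 pph, car = 15 pph
bus_vs_car = times_faster_by_throughput(60, 15)
print(bus_vs_car, to_percent_faster(bus_vs_car))   # 4.0 300.0
```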
Performance definitions
Let's define our final goal as minimizing the execution time of some application. Then we can define performance in terms of execution time as follows:
  performance(a) = 1 / execution_time(a)

Execution time
Execution time is affected by multiple factors in a computer system:
  execution time = CPU time + disk access + memory access + I/O activities + OS overhead
We will focus on CPU time since we'll study mostly the processor. However, some applications depend heavily on, e.g., disk access performance.

CPU time
We measure CPU time in seconds, but remember that computer hardware works synchronously, driven by a clock signal with a period and a frequency.
[Figure: data -> reg -> logic -> reg, with both registers driven by the clock]
How do we relate clock cycles to CPU time?

Clock cycles and CPU time
Just use one of the two simple formulas:
  CPU time = clock cycles * cycle time
Or, using the clock rate:
  CPU time = clock cycles / clock rate
Classic designer's trade-off: attempting to reduce the number of clock cycles may lead to reducing the clock rate too, and vice versa.

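As a quick check that the two forms agree, here is a small sketch. The cycle count and clock rate are made-up values chosen only for illustration, and the function names are assumptions.

```python
import math

# Sketch: the two equivalent ways of turning clock cycles into seconds.
# All numbers below are hypothetical.

def cpu_time_from_cycle_time(clock_cycles, cycle_time_s):
    """CPU time = clock cycles * cycle time."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles, clock_rate_hz):
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


clock_cycles = 10e9            # hypothetical: 10 billion cycles
clock_rate_hz = 2e9            # hypothetical: 2 GHz
cycle_time_s = 1 / clock_rate_hz   # 0.5 ns

t_rate = cpu_time_from_clock_rate(clock_cycles, clock_rate_hz)
t_cycle = cpu_time_from_cycle_time(clock_cycles, cycle_time_s)

print(f"{t_rate:.2f} s")       # 5.00 s
# Both forms give the same answer up to floating-point rounding.
print(math.isclose(t_rate, t_cycle))   # True
```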
Book exercise
Answer

How about instructions?
Since a program executes instructions, they should also play a part in the CPU performance equations.
So far we had:
  CPU time = clock cycles * cycle time
Now we will also say that:
  clock cycles = instructions for a program * average clock cycles per instruction
IC: Instruction Count
  Static IC vs. dynamic IC
  What is needed to determine each?
CPI: Cycles Per Instruction
  Can be used to compare two implementations of the same ISA

The CPU performance equation
Finally, the classic formula that incorporates the three key factors that affect performance is:
  CPU time = Instruction Count * CPI * cycle time
Or
  CPU time = Instruction Count * CPI / clock rate

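The equation is easiest to appreciate when two implementations of the same ISA run the same program, so that the instruction count is fixed and only CPI and clock rate differ. The sketch below uses invented numbers for two hypothetical machines; none of them come from the course material.

```python
# Sketch of the CPU performance equation; machines A and B are hypothetical.

def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction Count * CPI / clock rate (seconds)."""
    return instruction_count * cpi / clock_rate_hz


instruction_count = 1e9   # same program, same ISA, same compiler

time_a = cpu_time(instruction_count, cpi=2.0, clock_rate_hz=4e9)
time_b = cpu_time(instruction_count, cpi=1.2, clock_rate_hz=3e9)

print(f"A: {time_a:.2f} s")                       # A: 0.50 s
print(f"B: {time_b:.2f} s")                       # B: 0.40 s
print(f"B is {time_a / time_b:.2f} times faster") # B is 1.25 times faster
```

Machine B has the slower clock but still finishes first, because its lower CPI more than compensates.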
CPU Performance Equation
Factors affecting CPU execution time:

  Factor              Inst. count   CPI   Clock rate
  Program                  x        (x)
  Compiler                 x        (x)
  ISA                      x         x       (x)
  Microarchitecture                  x        x
  Technology                                  x

CPU time = Instruction Count * CPI / clock rate

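Cycles per Instruction (CPI)
Depends on the instruction:
  CPI_i = execution time of instruction i * clock rate
Computing the total CPI:
  CPI = sum over all instruction types i of (CPI_i * F_i), where F_i is the fraction of executed instructions that are of type i
Example: program dependent!
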
Another CPI Example
Assume a processor with the following instruction frequencies and costs:
  Integer ALU: 50%, 1 cycle
  Load: 20%, 5 cycles
  Store: 10%, 1 cycle
  Branch: 20%, 2 cycles
Which change would improve performance more?
  a) Faster branch prediction to reduce branch cost to 1 cycle?
  b) Better data cache to reduce load cost to 3 cycles?
Compute CPI
  Base = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*2 = 2
  A = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*1 = 1.8
  B = 0.5*1 + 0.2*3 + 0.1*1 + 0.2*2 = 1.6 (winner)

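The same weighted-average computation can be scripted, which makes it easy to try other "what if" changes. This is a minimal sketch; the dictionary layout and function name are choices made here for illustration.

```python
import math

# Sketch: average CPI as a frequency-weighted sum of per-class cycle costs.
# Each entry maps an instruction class to (frequency, cycles).

def average_cpi(mix):
    """Return sum(frequency * cycles) over all instruction classes."""
    total_freq = sum(freq for freq, _ in mix.values())
    if not math.isclose(total_freq, 1.0):
        raise ValueError("instruction frequencies must add up to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


base = {
    "alu": (0.5, 1),
    "load": (0.2, 5),
    "store": (0.1, 1),
    "branch": (0.2, 2),
}
option_a = {**base, "branch": (0.2, 1)}   # faster branches
option_b = {**base, "load": (0.2, 3)}     # better data cache

for name, mix in [("base", base), ("A", option_a), ("B", option_b)]:
    print(f"{name}: CPI = {average_cpi(mix):.2f}")
# base: CPI = 2.00
# A: CPI = 1.80
# B: CPI = 1.60
```

With instruction count and clock rate unchanged, the speedup over the base design is 2 / 1.8 (about 1.11) for option A and 2 / 1.6 = 1.25 for option B.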
Book example
Answer

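IPC, MIPS and GHz
The metrics you are most likely to see in marketing are IPC (instructions per cycle), MIPS (millions of instructions per second) and GHz.
How are they incomplete? Back to the CPU time formula, CPU time = Instruction Count * CPI * cycle time:
  IPC covers only CPI (CPI = 1/IPC)
  GHz covers only cycle time (cycle time = 1/clock rate)
  MIPS covers CPI and cycle time, but not instruction count (CPI * cycle time is proportional to 1/MIPS)
Which processor would you buy?
  Processor A: CPI = 2, clock = 5 GHz
  Processor B: CPI = 1, clock = 3 GHz
  Probably A, but B is faster (assuming same ISA/compiler): A needs 2/5 = 0.4 ns per instruction, B needs 1/3 = about 0.33 ns
Meta-point: danger of partial performance metrics!
  GHz can be boosted artificially by design, at the expense of the other two terms
  e.g., 800 MHz Pentium III faster than 1 GHz Pentium 4!
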
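The processor A vs. B comparison can be checked numerically. The sketch below assumes, as the slide does, that both processors run the same instruction stream; the helper names are illustrative.

```python
# Sketch: why clock rate alone is misleading. Helper names are assumptions.

def ns_per_instruction(cpi, clock_ghz):
    """Average time per instruction in ns: CPI cycles at clock_ghz cycles/ns."""
    return cpi / clock_ghz


def mips(cpi, clock_ghz):
    """Millions of instructions per second = clock rate / (CPI * 10^6)."""
    return clock_ghz * 1000 / cpi


processors = {"A": {"cpi": 2, "clock_ghz": 5}, "B": {"cpi": 1, "clock_ghz": 3}}

for name, p in processors.items():
    t = ns_per_instruction(p["cpi"], p["clock_ghz"])
    m = mips(p["cpi"], p["clock_ghz"])
    print(f"{name}: {p['clock_ghz']} GHz, {t:.2f} ns/instruction, {m:.0f} MIPS")
# A: 5 GHz, 0.40 ns/instruction, 2500 MIPS
# B: 3 GHz, 0.33 ns/instruction, 3000 MIPS
```

Ranking by GHz picks A, while ranking by time per instruction picks B. MIPS agrees with B here only because the instruction count is the same for both; across different ISAs or compilers it would be incomplete as well.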
Gene Amdahl
American computer architect
Born in 1922
Worked for IBM until 1970
Founded Amdahl Corporation to compete in the mainframe market against IBM
Proposed what later became known as Amdahl's law at the 1967 Spring Joint Computer Conference

Amdahl's law
Suppose an enhancement speeds up a fraction f of a task by a factor of Sf. The overall speedup is then:
  speedup = 1 / ((1 - f) + f / Sf)
If f is small, Sf doesn't matter.
Concentrate effort on improving frequently occurring events or frequently used

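To see why a small f makes Sf almost irrelevant, it helps to evaluate Amdahl's law in its standard form, overall speedup = 1 / ((1 - f) + f / Sf). The following is a minimal sketch; the function names and the sample values of f and Sf are chosen here for illustration only.

```python
# Sketch of Amdahl's law; function names and sample values are assumptions.

def amdahl_speedup(f, s_f):
    """Overall speedup when a fraction f of the time is sped up by s_f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s_f <= 0:
        raise ValueError("s_f must be positive")
    return 1.0 / ((1.0 - f) + f / s_f)


def amdahl_limit(f):
    """Upper bound on the speedup as s_f grows without bound (needs f < 1)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return 1.0 / (1.0 - f)


for f in (0.1, 0.5, 0.9):
    print(
        f"f = {f}: Sf=2 -> {amdahl_speedup(f, 2):.2f}, "
        f"Sf=10 -> {amdahl_speedup(f, 10):.2f}, "
        f"limit -> {amdahl_limit(f):.2f}"
    )
# f = 0.1: Sf=2 -> 1.05, Sf=10 -> 1.10, limit -> 1.11
# f = 0.5: Sf=2 -> 1.33, Sf=10 -> 1.82, limit -> 2.00
# f = 0.9: Sf=2 -> 1.82, Sf=10 -> 5.26, limit -> 10.00
```

With f = 0.1, even an infinitely fast enhancement yields barely an 11% improvement, while with f = 0.9 the same effort pays off many times over.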
Practicing Amdahl's law
1. What is the percentage of time each instruction takes?
2. How much is the total time reduced if the time for FP instructions is reduced by 20%? How much is the total speedup?
3. How much is the total time reduced if the time for L/S instructions is reduced by 20%? How much is the total speedup?
4. Can the total time be reduced by 20% by reducing only the time for branch instructions?
5. What's the theoretical speedup limit from reducing the branch instructions' time?

Another exercise