Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 6 Fundamentals in Performance Evaluation Computer Architecture Part 6 page 1 of 22 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin Betting
Why performance evaluation?
- Comparison of computers
- Selection of a computer
- Changes in the configuration of an existing computer (tuning)
- Design of computers
- Verification or validation of design decisions

Methods for performance evaluation:
(1) analytical methods
(2) measurements
Aspects for evaluation
- modularity: Is the system composed of mostly independent parts, so-called modules?
- orthogonality: Does every module offer its own set of functions to the system? Is no particular function offered by more than one module?
- adequacy: Do the performance and cost of a module match its importance for the whole system?
- virtuality: Are the physical limits of the hardware modules hidden from the user? (Example: virtual memory)
- symmetry: Is it possible to derive the function of unknown parts from the properties of some known parts of the architecture, e.g. parts of the ISA?
- transparency: Are irrelevant parts of the architecture hidden from the user? (Example: transparent coprocessor)
Analytical methods
- Performance measures (hypothetical maximum performance!):
  MIPS (Millions of Instructions per Second)
  MFLOPS (Millions of Floating Point Operations per Second)
- Mix (also calculated, not measured):
  In a mix, the average execution time of each instruction is calculated and scaled by a characteristic weight.
- Core programs:
  Typical application programs, written for the evaluated computer.
  No measurements are taken; the overall execution time is calculated from the execution times of the individual machine instructions.
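The mix calculation can be sketched as a weighted average. This is a minimal illustration only: the instruction classes, execution times and weights below are hypothetical and are not taken from any published mix.

```python
# Hypothetical instruction classes: (execution time in ns, relative frequency).
# All numbers are illustrative assumptions; the weights must sum to 1.
MIX = {
    "load/store": (2.0, 0.30),
    "integer add": (1.0, 0.40),
    "branch": (3.0, 0.20),
    "float multiply": (5.0, 0.10),
}


def mix_average_time(mix):
    """Return the weighted average execution time per instruction."""
    total_weight = sum(weight for _, weight in mix.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(time * weight for time, weight in mix.values())


average_ns = mix_average_time(MIX)
print(f"average instruction time: {average_ns:.2f} ns")
```

The result is a calculated value, not a measured one: it depends entirely on the chosen weights.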
Performance measures

\[ \text{runtime} = \#\text{clock cycles} \cdot \text{clock period} \]

MIPS (million instructions per second):
\[ \text{MIPS} = \frac{\text{instruction count}}{\text{runtime} \cdot 10^{6}} = \frac{\text{instruction count}}{\#\text{clock cycles} \cdot \text{clock period} \cdot 10^{6}} = \frac{\text{instruction count} \cdot \text{clock frequency}}{\#\text{clock cycles} \cdot 10^{6}} \]
\[ \text{MIPS} = \frac{\text{clock frequency}}{\text{CPI} \cdot 10^{6}} = \frac{\text{clock frequency} \cdot \text{IPC}}{10^{6}} \]

CPI (cycles per instruction):
\[ \text{CPI} = \frac{\#\text{clock cycles}}{\text{instruction count}} \]

IPC (instructions per cycle):
\[ \text{IPC} = \frac{1}{\text{CPI}} \]

MFLOPS (million floating point operations per second):
\[ \text{MFLOPS} = \frac{\#\text{executed floating point instructions}}{\text{runtime} \cdot 10^{6}} \]
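The relations between runtime, CPI, IPC and MIPS can be checked numerically. The following is a minimal sketch; the function names and the sample values (instruction count, cycle count, clock frequency) are assumptions chosen for illustration.

```python
def runtime_seconds(clock_cycles, clock_frequency_hz):
    """runtime = # clock cycles * clock period, with period = 1 / frequency."""
    return clock_cycles / clock_frequency_hz


def cpi(clock_cycles, instruction_count):
    """Cycles per instruction."""
    return clock_cycles / instruction_count


def ipc(clock_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / clock_cycles


def mips(instruction_count, clock_cycles, clock_frequency_hz):
    """MIPS = instruction count / (runtime * 10^6)."""
    return instruction_count / (runtime_seconds(clock_cycles, clock_frequency_hz) * 1e6)


# Hypothetical program run: 2e9 instructions in 3e9 cycles at 1.5 GHz.
instructions, cycles, frequency = 2e9, 3e9, 1.5e9

print("runtime [s]:", runtime_seconds(cycles, frequency))
print("CPI:", cpi(cycles, instructions))
print("IPC:", ipc(cycles, instructions))
print("MIPS:", mips(instructions, cycles, frequency))
# Same MIPS value via the second form: clock frequency / (CPI * 10^6).
print("MIPS via CPI:", frequency / (cpi(cycles, instructions) * 1e6))
```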
Drawbacks of performance measures
- CPI, IPC, MIPS and MFLOPS depend on the instruction set.
- CPI, IPC, MIPS and MFLOPS depend on the program.
- CPI, IPC, MIPS and MFLOPS depend on the microarchitecture.

Conclusions:
- A higher MIPS or MFLOPS rating does not automatically mean more performance!
- It is of vital importance to choose well-suited test applications (benchmarks)!
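The first conclusion can be illustrated with a small calculation. The two machines below are hypothetical: both run the same source program at the same clock frequency, but their instruction sets need a different number of instructions with a different CPI.

```python
def runtime_seconds(instruction_count, cpi, clock_frequency_hz):
    """runtime = instruction count * CPI / clock frequency."""
    return instruction_count * cpi / clock_frequency_hz


def mips(cpi, clock_frequency_hz):
    """MIPS = clock frequency / (CPI * 10^6)."""
    return clock_frequency_hz / (cpi * 1e6)


CLOCK_HZ = 1e9  # both hypothetical machines run at 1 GHz

# Machine A: simple instructions, so many of them, each taking 1 cycle.
runtime_a = runtime_seconds(10e9, 1.0, CLOCK_HZ)
mips_a = mips(1.0, CLOCK_HZ)

# Machine B: more complex instructions, so fewer of them, each taking 2 cycles.
runtime_b = runtime_seconds(4e9, 2.0, CLOCK_HZ)
mips_b = mips(2.0, CLOCK_HZ)

print(f"A: {mips_a:.0f} MIPS, runtime {runtime_a:.1f} s")
print(f"B: {mips_b:.0f} MIPS, runtime {runtime_b:.1f} s")
```

With these assumed numbers, machine A has twice the MIPS rating of machine B but still needs longer to finish the program.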
Measurements

Benchmarks:
- Existing or synthetic programs are used to measure the performance.
- These programs are translated and executed on the evaluated computer.
- Therefore, not only the computer hardware but also the compiler influences the outcome of a benchmark.

Monitoring:
- Monitors are used to observe parts of the computer at run time.
- Therefore, quantities of interest inside the computer can be measured in addition to the overall outcome of a benchmark (e.g. cache utilization, network traffic).
- Monitoring can be done by hardware or software.
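A measurement-based benchmark can be sketched as a small timing harness. This is an illustration under stated assumptions: `run_benchmark` and the workload `sum_of_squares` are hypothetical names, and in Python the interpreter takes the role that the compiler plays in the text above.

```python
import time


def run_benchmark(workload, repeats=5):
    """Run workload() several times and return the wall-clock time of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def sum_of_squares():
    """Hypothetical synthetic workload: integer arithmetic in a loop."""
    return sum(i * i for i in range(100_000))


timings = run_benchmark(sum_of_squares, repeats=5)
print(f"best of {len(timings)} runs: {min(timings) * 1e3:.3f} ms")
```

Repeating the run and reporting the best time is one common way to reduce the influence of other activity on the machine; it is a design choice, not the only valid one.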
Benchmark terminology
- benchmark: A test program.
- benchmark suite: A set of benchmarks.
- synthetic benchmark: A test program that is only useful as a benchmark.
- kernel benchmark: A very small synthetic benchmark. Usually a time-intensive part of a real program is chosen. Kernel benchmarks are well suited for design and simulation but normally unsuitable for comparing complete systems.
- benchmark application: A complete program that is additionally used as a benchmark. The opposite of a synthetic benchmark.
SPEC benchmarks

SPEC: Standard Performance Evaluation Corporation
- since 1989, a consortium of different manufacturers
- general-purpose computer applications, mainly to measure speed and throughput

Several benchmark suites, e.g.
- SPEC95
- SPECweb96
- SPEC JVM98
- SPEC JBB2000
- SPEC CINT 2006
- SPEC CFP 2006
SPECmarks
- Goal: comparable values for different systems
- But: single values do not always reflect real relations, so they are only a first indication for selecting or judging a computer
- CPU performance plus cache, memory and compiler is measured; the operating system and I/O are less relevant
- Integer test programs (ANSI C)
- Floating-point test programs (Fortran 77)
- SPECmark: this characteristic is the geometric mean of the individual program characteristics contained in the suite
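The geometric mean used for the SPECmark can be sketched as follows. The per-program values are hypothetical; the sketch assumes that each individual characteristic is a positive speed ratio relative to a reference machine.

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    # Summing logarithms avoids overflow for long lists of large ratios.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-program ratios of one benchmark suite.
ratios = [2.0, 8.0, 4.0]
print(f"suite characteristic: {geometric_mean(ratios):.2f}")
```

Unlike the arithmetic mean, the geometric mean of ratios does not depend on which machine is chosen as the reference when two systems are compared.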
SPEC-CINT2006: 12 Integer test programs (C, C++)

name         description
perlbench    PERL interpreter
bzip2        bzip compression program
gcc          GNU C compiler version 3.2
mcf          Simplex algorithm for traffic planning
gobmk        AI implementation of the game Go
hmmer        Protein sequence analysis based on a hidden Markov model
sjeng        Chess program
libquantum   Quantum computer simulator
h264ref      H.264 codec
omnetpp      OMNET++ discrete event simulator
astar        Route planning
xalancbmk    XML translator
SPEC-CFP2006: 17 Floating-point test programs (C, C++, FORTRAN)

name         description
bwaves       Fluid dynamics algorithm
gamess       Quantum chemistry algorithm
milc         Physics algorithm
zeusmp       Fluid dynamics algorithm
gromacs      Newton's equations of motion
cactusADM    Equation solver for Einstein's evolution equations
leslie3d     Fluid dynamics algorithm
namd         Biomolecular simulation
dealII       Finite elements
soplex       Simplex algorithm
povray       Image rendering
calculix     Finite elements
GemsFDTD     Maxwell equation solver
tonto        Quantum chemistry
lbm          Lattice Boltzmann simulator
wrf          Weather modeling
sphinx3      Speech recognition
More popular benchmark suites
- Basic Linear Algebra Subprograms (BLAS): for numerical applications; core of the LINPACK software package for solving linear equation systems. LINPACK is used to rank the TOP 500 list of the fastest parallel computers.
- Whetstone benchmark: developed in the seventies, a single program with many floating-point calculations
- Dhrystone benchmark: developed in the eighties as an integer-oriented counterpart to Whetstone
- Powerstone benchmark suite: to compare the energy consumption of microprocessors and microcontrollers
Powerstone benchmark suite

name       description
auto       Vehicle control
bilv       Logical and shift operations
blit       Graphical application
compress   UNIX compression program
crc        CRC error detection
des        Data encryption
dhry       Dhrystone
engine     Engine control
fir_int    Integer FIR filter
g3fax      FAX group 3
g721       Audio compression
jpeg       JPEG 24-bit compression
pocsag     Communication protocol for pagers
servo      Hard disc control
summin     Handwriting recognition
ucbqsort   Quick sort
v42bis     Modem operation
whet       Whetstone
Monitoring
- Monitors are components that record the states of a system during its normal operation.
- Contents of registers, flags, buffers and traffic in data paths are recorded.
- Monitors are used to observe and debug systems.
Monitoring

Generally, monitors can be classified as:

a) Hardware monitors
A hardware monitor is a separate component that is physically connected to the locations of the target system where measurements take place. Hardware monitors typically consist of comparators and counters to create data, memories to store it and buses for data transport. Thus, hardware monitors use their own resources.
Monitoring

b) Software monitors
A software monitor is a program that collects measurement data through interfaces provided by the operating system, the programming language or the application program. A software monitor uses the resources of the observed system to collect, transport and store data.

c) Hybrid monitors
A hybrid monitor is a mixed hardware and software monitor. Often simple elements like counters and memories are implemented in hardware, while more complex observation functions are implemented in software.
Monitoring constraints

1. Accessing information
Ideally, monitoring is integrated into the hardware and software components of a system during design. Software monitors are cheaper than hardware monitors, but they may influence the system's run-time behavior.

2. Non-intrusive (reaction-free) monitoring
Hardware monitors and most hybrid monitors store the recorded data in their own memories. Software monitors have to use the memories of the observed system. Thus, hardware monitors are less intrusive than software monitors.
Monitoring constraints

3. Amount of recorded data and its further processing
Most purposes, especially debugging, require observations with high resolution. For an accurate analysis of program errors, the machine instruction that caused the error has to be identified. For other purposes, e.g. a global performance analysis, a coarser resolution is sufficient.
Although it often seems necessary to record observable data at the level of machine instruction execution, this would generate traces much larger than the memory usage of the observed application. Thus, the cost of storing this large amount of data and the general difficulty of processing the trace data prohibit a complete recording of traces at machine instruction level.
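The size of an instruction-level trace can be estimated with simple arithmetic. The instruction rate, record size and observation time below are hypothetical values chosen only to show the order of magnitude.

```python
def trace_bytes(instructions_per_second, bytes_per_record, seconds):
    """Trace volume if one record is stored per executed machine instruction."""
    return instructions_per_second * bytes_per_record * seconds


# Hypothetical: 10^9 instructions per second, 16 bytes per trace record
# (e.g. address plus some state), observed for 10 seconds.
volume = trace_bytes(1_000_000_000, 16, 10)
print(f"trace volume: {volume / 1e9:.0f} GB")
```

Under these assumptions, ten seconds of execution already produce a trace far larger than the memory footprint of a typical application.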
Instrumentation
- One way of software monitoring is to insert measuring commands into the program code, e.g. loop or time counters. This is called instrumentation.
- Instrumentation can be performed by the user, the compiler, the class library or the operating system.

[Figure: instrumented program, computer, measurement system, results, measurement results]
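Instrumentation by the user can be sketched as follows: a loop counter and a time counter are inserted by hand into an otherwise ordinary function. The function `find_primes` and the `counters` dictionary are hypothetical names used only for this illustration.

```python
import time

# Measurement results collected by the inserted measuring commands.
counters = {"loop_iterations": 0, "elapsed_seconds": 0.0}


def find_primes(limit):
    """Return all primes below limit using trial division."""
    start = time.perf_counter()                  # instrumentation: start time counter
    primes = []
    for candidate in range(2, limit):
        is_prime = True
        for divisor in primes:
            counters["loop_iterations"] += 1     # instrumentation: loop counter
            if divisor * divisor > candidate:
                break
            if candidate % divisor == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
    counters["elapsed_seconds"] += time.perf_counter() - start  # stop time counter
    return primes


primes = find_primes(100)
print(len(primes), "primes found")
print("inner loop iterations:", counters["loop_iterations"])
print(f"elapsed: {counters['elapsed_seconds'] * 1e6:.1f} microseconds")
```

The measuring commands run on the observed system itself, so they add a small amount of run time to the program they measure.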
Monitoring overview

method            system state            accuracy       tools
direct            hardware                very high      hardware monitor
instrumentation   hardware                high           instrumented program
trace driven      hardware and software   satisfactory   simulation program + hardware trace
simulation        software                sufficient     simulation program
Typical load-dependent parameters
- throughput: The average number of jobs completed per time unit. A job may be the execution of an instruction or a program, saving a data block or sending a message.
- utilization: The throughput (average number of jobs completed) divided by the maximum possible throughput.
- response time: The average time needed to complete a job.
- utilization ratio: The time spent working on the jobs divided by the whole operating time.
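The four parameters can be computed from a simple job log. This is a minimal sketch under stated assumptions: the job records, the observation time and the maximum throughput are hypothetical, and jobs are assumed to be processed one at a time so that their processing times can simply be added up.

```python
def load_parameters(jobs, observation_time, max_throughput):
    """Compute load-dependent parameters from a job log.

    jobs: list of (submit_time, start_time, finish_time) tuples in seconds.
    observation_time: whole operating time in seconds.
    max_throughput: maximum possible throughput in jobs per second.
    """
    completed = len(jobs)
    throughput = completed / observation_time
    busy_time = sum(finish - start for _, start, finish in jobs)
    return {
        "throughput": throughput,
        "utilization": throughput / max_throughput,
        "response_time": sum(finish - submit for submit, _, finish in jobs) / completed,
        "utilization_ratio": busy_time / observation_time,
    }


# Hypothetical log: three jobs observed over 10 seconds on a system
# that could complete at most 0.5 jobs per second.
job_log = [(0.0, 0.0, 2.0), (1.0, 2.0, 5.0), (6.0, 6.0, 8.0)]
parameters = load_parameters(job_log, observation_time=10.0, max_throughput=0.5)

for name, value in parameters.items():
    print(f"{name}: {value:.3f}")
```

The response time here counts from submission to completion, so the second job's waiting time (it is submitted at 1 s but starts at 2 s) is included.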