Performance evaluation

Advanced Computer Architectures (Arquitecturas Avanzadas de Computadores), course 2547021
Department of Electronics and Telecommunications Engineering, Faculty of Engineering, 2015-1

Bibliography and evaluation
Bibliography:
- Lecture slides
- Chapter 4: Computer Organization and Design: The Hardware/Software Interface, D. A. Patterson and J. L. Hennessy, Morgan Kaufmann Publishers, 3rd Edition, 2005.
- Chapter 1: Computer Architecture: A Quantitative Approach, J. Hennessy and D. Patterson, Morgan Kaufmann, 5th Edition, 2011 (previous editions may be good too).
Evaluation:
- Test I (15%) covering units 1-2

How good is a computer?
We can think of many parameters:
- Processor's clock rate
- Power consumed by a program
- Execution time for a program
- Number of tasks done per second
- Reliability
- Aesthetic appearance
- Social repercussion, etc.
These are the metrics, the things we want to estimate or measure (not all of them are easy to measure, though).
How should we compare two computer systems?

Performance: Latency vs. Throughput
Latency: time to finish a fixed task
Throughput: number of tasks per unit of time
They are different: parallelism can be exploited for throughput, not latency.
There is usually a trade-off between latency and throughput.
Choose the definition of performance that matches your goals. Scientific program: latency; web server: throughput?
Example: transport people 10 km
- Car: capacity = 5, speed = 60 km/h
- Bus: capacity = 60, speed = 20 km/h
- Latency: car = 10 min, bus = 30 min
- Throughput: car = 15 people per hour (counting the return trip), bus = 60 people per hour
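The car/bus numbers can be reproduced with a short calculation. This is a minimal sketch; the function names are illustrative, and it assumes the vehicle returns empty at the same speed, which is how the slide counts the return trip.

```python
def latency_min(distance_km, speed_kmh):
    """Minutes needed to carry one load of passengers one way."""
    return distance_km * 60 / speed_kmh


def throughput_pph(capacity, distance_km, speed_kmh):
    """People delivered per hour, counting the empty return trip."""
    round_trip_km = 2 * distance_km
    trips_per_hour = speed_kmh / round_trip_km
    return capacity * trips_per_hour


DISTANCE_KM = 10

car_latency = latency_min(DISTANCE_KM, 60)         # 10 minutes
bus_latency = latency_min(DISTANCE_KM, 20)         # 30 minutes
car_throughput = throughput_pph(5, DISTANCE_KM, 60)   # 15 people per hour
bus_throughput = throughput_pph(60, DISTANCE_KM, 20)  # 60 people per hour

print(f"car: {car_latency:.0f} min, {car_throughput:.0f} pph")
print(f"bus: {bus_latency:.0f} min, {bus_throughput:.0f} pph")
```

The car wins on latency and the bus wins on throughput, which is why the metric has to be chosen before the comparison is made.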

Example: latency vs. throughput
Do the following changes to a computer system increase throughput, decrease response time, or both?
a) Replacing the processor with a faster version
b) Adding more processors to a system that uses multiple processors for separate tasks (a web server)
Answer
a) Both
b) Throughput

Comparing Performance
System a is x times faster than b if:
latency(a) = latency(b) / x
throughput(a) = throughput(b) * x
System a is x% faster than b if:
latency(a) = latency(b) / (1 + x/100)
throughput(a) = throughput(b) * (1 + x/100)
Car/bus example
- Latency? The car is 3 times (and 200%) faster than the bus.
- Throughput? The bus is 4 times (and 300%) faster than the car.
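As a quick check of the two definitions, the sketch below turns a pair of latencies or throughputs into "x times faster" and "x% faster". The helper names are assumptions made for illustration; the input values are the car/bus figures from the transport example.

```python
def times_faster_by_latency(latency_a, latency_b):
    """x such that latency(a) = latency(b) / x."""
    return latency_b / latency_a


def times_faster_by_throughput(throughput_a, throughput_b):
    """x such that throughput(a) = throughput(b) * x."""
    return throughput_a / throughput_b


def percent_faster(times):
    """Convert 'x times faster' into 'x% faster'."""
    return (times - 1) * 100


# Car vs. bus on latency (minutes): car = 10, bus = 30
car_vs_bus = times_faster_by_latency(10, 30)
# Bus vs. car on throughput (people per hour): bus = 60, car = 15
bus_vs_car = times_faster_by_throughput(60, 15)

print(car_vs_bus, percent_faster(car_vs_bus))
print(bus_vs_car, percent_faster(bus_vs_car))
```

Note that "3 times faster" and "200% faster" describe the same ratio, so the two phrasings should not be mixed within one comparison.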

Performance definitions
Let's define our final goal as minimizing the execution time of some application. Then we can define performance in terms of execution time as follows:
performance(a) = 1 / execution_time(a)

Execution time
Execution time is affected by multiple factors in a computer system:
execution time = CPU time + disk access + memory access + I/O activities + OS overhead
We will focus on CPU time, since we'll study mostly the processor. However, some applications depend heavily on other factors, e.g. disk access performance.

CPU time
We measure CPU time in seconds, but remember that computer hardware works synchronously, driven by a clock signal that has a period and a frequency.
[Figure: data -> reg -> logic -> reg, driven by clock]
How do we relate clock cycles to CPU time?

Clock cycles and CPU time
Just use one of two simple formulas:
CPU time = clock cycles * cycle time
Or, using the clock rate:
CPU time = clock cycles / clock rate
Classic designer's tradeoff: attempting to reduce the number of clock cycles may also reduce the clock rate (a longer cycle time), and vice versa.
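The two formulas are equivalent because the cycle time is the reciprocal of the clock rate. A minimal sketch, where the cycle count and the 2 GHz clock are hypothetical numbers chosen only for illustration:

```python
def cpu_time_from_cycle_time(clock_cycles, cycle_time_s):
    """CPU time = clock cycles * cycle time."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles, clock_rate_hz):
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Hypothetical program: 10 billion cycles on a 2 GHz processor.
cycles = 10e9
clock_rate_hz = 2e9
cycle_time_s = 1 / clock_rate_hz  # 0.5 ns per cycle

time_a = cpu_time_from_cycle_time(cycles, cycle_time_s)
time_b = cpu_time_from_clock_rate(cycles, clock_rate_hz)

print(f"{time_a:.2f} s, {time_b:.2f} s")  # both forms give about 5 seconds
```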

Book exercise

Answer

How about instructions?
Since a program executes instructions, they should also play a part in the CPU performance equations.
So far we had:
CPU time = clock cycles * cycle time
Now we will also say that:
clock cycles = instructions for a program * average clock cycles per instruction
IC: Instruction Count
- Static IC vs. dynamic IC
- What is needed to determine each?
CPI: Cycles Per Instruction
- Can be used to compare two implementations of the same ISA

The CPU performance equation
Finally, the classic formula that incorporates the three key factors that affect performance is:
CPU time = Instruction Count * CPI * cycle time
Or:
CPU time = Instruction Count * CPI / clock rate
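A small sketch of the equation in code may help when working the exercises. The instruction count, CPI, and clock rate below are hypothetical values, not taken from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction Count * CPI / clock rate (seconds)."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion dynamic instructions, average CPI of 2,
# running on a 2 GHz processor.
baseline = cpu_time(1e9, 2.0, 2e9)

# Halving any single factor of the product halves the CPU time.
half_cpi = cpu_time(1e9, 1.0, 2e9)
double_clock = cpu_time(1e9, 2.0, 4e9)

print(baseline, half_cpi, double_clock)
```

In practice the three factors are not independent, so improving one of them often costs something in another.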

CPU Performance Equation
Factors affecting CPU execution time:

Factor              Inst. count   CPI   Clock rate
Program             x             (x)
Compiler            x             (x)
ISA                 x             x     (x)
Microarchitecture                 x     x
Technology                              x

CPU time = Instruction Count * CPI / clock rate

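Cycles per Instruction (CPI)
Depends on the instruction:
CPI_i = execution time of instruction i * clock rate
Computing the total CPI (a weighted average over the instruction types, where IC_i is the number of executed instructions of type i):
CPI = sum over i of (CPI_i * IC_i / IC)
Example: program dependent!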

Another CPI Example
Assume a processor with the following instruction frequencies and costs:
- Integer ALU: 50%, 1 cycle
- Load: 20%, 5 cycles
- Store: 10%, 1 cycle
- Branch: 20%, 2 cycles
Which change would improve performance more?
a) Faster branch prediction to reduce the branch cost to 1 cycle?
b) A better data cache to reduce the load cost to 3 cycles?
Compute CPI:
Base = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*2 = 2
A = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*1 = 1.8
B = 0.5*1 + 0.2*3 + 0.1*1 + 0.2*2 = 1.6 (winner)
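The same comparison can be scripted as a weighted average over the instruction mix. This is a sketch using the frequencies and cycle costs from the example; the dictionary layout and function name are assumptions made for illustration.

```python
def average_cpi(mix):
    """Weighted-average CPI for a mix of {type: (frequency, cycles)}."""
    return sum(freq * cycles for freq, cycles in mix.values())


base_mix = {
    "alu":    (0.5, 1),
    "load":   (0.2, 5),
    "store":  (0.1, 1),
    "branch": (0.2, 2),
}

# Option a: branch cost reduced to 1 cycle.
mix_a = dict(base_mix, branch=(0.2, 1))
# Option b: load cost reduced to 3 cycles.
mix_b = dict(base_mix, load=(0.2, 3))

cpi_base = average_cpi(base_mix)  # about 2.0
cpi_a = average_cpi(mix_a)        # about 1.8
cpi_b = average_cpi(mix_b)        # about 1.6

# With instruction count and clock rate unchanged, speedup = CPI_old / CPI_new.
print(f"a: CPI {cpi_a:.1f}, speedup {cpi_base / cpi_a:.2f}")
print(f"b: CPI {cpi_b:.1f}, speedup {cpi_base / cpi_b:.2f}")
```

Option b wins even though loads are no more frequent than branches, because it removes two cycles per load rather than one cycle per branch.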

Book example

Answer

IPC, MIPS and GHz
The metrics you are most likely to see in marketing are IPC (instructions per cycle), MIPS (millions of instructions per second) and GHz.
How are they incomplete? Go back to the CPU time formula:
CPU time = Instruction Count * CPI * cycle time
- CPI corresponds to 1/IPC
- cycle time corresponds to 1/GHz
- CPI * cycle time corresponds to 1/MIPS
Each metric covers only part of the equation.
Which processor would you buy?
- Processor A: CPI = 2, clock = 5 GHz
- Processor B: CPI = 1, clock = 3 GHz
Probably A, but B is faster (assuming the same ISA and compiler).
Meta-point: danger of partial performance metrics!
GHz can be boosted artificially by design, at the expense of the other two terms. E.g., an 800 MHz Pentium III can be faster than a 1 GHz Pentium 4!
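The processor A versus B comparison can be checked numerically. The sketch below assumes both processors run the same program with the same instruction count; the 1 billion instruction figure is a hypothetical value, and the function names are illustrative.

```python
def mips(clock_rate_hz, cpi):
    """Millions of instructions per second = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction Count * CPI / clock rate (seconds)."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1e9  # hypothetical program size, same on both processors

mips_a = mips(5e9, 2)  # Processor A: CPI = 2, 5 GHz -> 2500 MIPS
mips_b = mips(3e9, 1)  # Processor B: CPI = 1, 3 GHz -> 3000 MIPS

time_a = cpu_time(INSTRUCTIONS, 2, 5e9)  # about 0.40 s
time_b = cpu_time(INSTRUCTIONS, 1, 3e9)  # about 0.33 s

print(f"A: {mips_a:.0f} MIPS, {time_a:.2f} s")
print(f"B: {mips_b:.0f} MIPS, {time_b:.2f} s")
```

MIPS only ranks the two processors correctly here because the instruction count is assumed equal; with different ISAs or compilers that assumption no longer holds.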

Gene Amdahl
- American computer architect, born in 1922
- Worked for IBM until 1970
- Founded Amdahl Corporation to compete with IBM in the mainframe market
- Proposed what later became known as Amdahl's law at the 1967 Spring Joint Computer Conference

Amdahl's law
Suppose an enhancement speeds up a fraction f of a task by a factor of Sf.
If f is small, Sf doesn't matter.
Concentrate effort on improving frequently occurring events or frequently used
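A minimal sketch using the standard form of Amdahl's law, speedup = 1 / ((1 - f) + f / Sf). The values of f and Sf below are illustrative assumptions chosen to show why a small f makes Sf nearly irrelevant.

```python
def amdahl_speedup(f, sf):
    """Overall speedup when a fraction f of the task is sped up by a factor sf."""
    return 1 / ((1 - f) + f / sf)


def amdahl_limit(f):
    """Upper bound on the speedup as sf grows without limit."""
    return 1 / (1 - f)


# Small fraction: even a 100x enhancement barely helps.
small_f_10x = amdahl_speedup(0.1, 10)    # about 1.10
small_f_100x = amdahl_speedup(0.1, 100)  # about 1.11
small_f_max = amdahl_limit(0.1)          # about 1.11

# Large fraction: the same 10x enhancement pays off.
large_f_10x = amdahl_speedup(0.9, 10)    # about 5.26

print(f"f=0.1: {small_f_10x:.2f}, {small_f_100x:.2f}, limit {small_f_max:.2f}")
print(f"f=0.9: {large_f_10x:.2f}")
```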

Practicing Amdahl's law
1. What is the percentage of time each instruction takes?
2. How much is the total time reduced if the time for FP instructions is reduced by 20%? How much is the total speedup?
3. How much is the total time reduced if the time for L/S instructions is reduced by 20%? How much is the total speedup?
4. Can the total time be reduced by 20% by reducing only the time for branch instructions?
5. What is the theoretical speedup limit from reducing the time for branch instructions?

Another exercise