CPU Performance. Lecture 8 CAP 3103 06-11-2014

Similar documents
Chapter 2. Why is some hardware better than others for different programs?

EEM 486: Computer Architecture. Lecture 4. Performance

CSEE W4824 Computer Architecture Fall 2012

Performance evaluation

on an system with an infinite number of processors. Calculate the speedup of

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

EC 362 Problem Set #2

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

Central Processing Unit (CPU)

Computer Organization. and Instruction Execution. August 22

Processor Architectures

Enabling Technologies for Distributed Computing

Energy-aware job scheduler for highperformance

Enabling Technologies for Distributed and Cloud Computing

1. Memory technology & Hierarchy

Legal Notices and Important Information

HY345 Operating Systems

Communicating with devices

Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

ICS SYSTEM PERIPHERAL CLOCK SOURCE. Description. Features. Block Diagram DATASHEET

The Bus (PCI and PCI-Express)

Chapter 1: Introduction. What is an Operating System?

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Windows Server Performance Monitoring

Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs.

Computer Architectures

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

The Motherboard Chapter #5

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis

Networking Virtualization Using FPGAs

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Overlapping Data Transfer With Application Execution on Clusters

ICS379. Quad PLL with VCXO Quick Turn Clock. Description. Features. Block Diagram

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Real-Time Scheduling 1 / 39

A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems

Pipelining Review and Its Limitations

Week 1 out-of-class notes, discussions and sample problems

The implementation and performance/cost/power analysis of the network security accelerator on SoC applications

Readings for this topic: Silberschatz/Galvin/Gagne Chapter 5

EECS 678: Introduction to Operating Systems

Datacenter Operating Systems

All Tech Notes and KBCD documents and software are provided "as is" without warranty of any kind. See the Terms of Use for more information.

Operating Systems Lecture #6: Process Management

ICS Principles of Operating Systems

Process Scheduling CS 241. February 24, Copyright University of Illinois CS 241 Staff

Implementing a Digital Answering Machine with a High-Speed 8-Bit Microcontroller

Operating Systems. Lecture 03. February 11, 2013

Introduction to Cloud Computing

High Performance Computing. Course Notes HPC Fundamentals

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Price/performance Modern Memory Hierarchy

Multi-Profile CMOS Infrared Network Camera

Run-time Resource Management in SOA Virtualized Environments. Danilo Ardagna, Raffaela Mirandola, Marco Trubian, Li Zhang

Cisco MCS 7825-H3 Unified Communications Manager Appliance

Scaling in a Hypervisor Environment

AGIPD Interface Electronic Prototyping

Increasing Flash Throughput for Big Data Applications (Data Management Track)

Cisco MCS 7816-I3 Unified Communications Manager Appliance

Advances in Virtualization In Support of In-Memory Big Data Applications

Web Application s Performance Testing

Dynamic Resource allocation in Cloud

Cisco MCS 7825-H2 Unified CallManager Appliance

1 TO 4 CLOCK BUFFER ICS551. Description. Features. Block Diagram DATASHEET

A Low Latency Library in FPGA Hardware for High Frequency Trading (HFT)

Lattice QCD Performance. on Multi core Linux Servers

ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler

Microtronics technologies Mobile:

find model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1

Digital Design for Low Power Systems

8-Bit Flash Microcontroller for Smart Cards. AT89SCXXXXA Summary. Features. Description. Complete datasheet available under NDA

Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

Develop a Dallas 1-Wire Master Using the Z8F1680 Series of MCUs

Host Power Management in VMware vsphere 5

Problems and Measures Regarding Waste 1 Management and 3R Era of public health improvement Situation subsequent to the Meiji Restoration

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Operating System: Scheduling

Networking Remote-Controlled Moving Image Monitoring System

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

Choosing a Computer for Running SLX, P3D, and P5

International Journal of Electronics and Computer Science Engineering 1482

How To Understand The Design Of A Microprocessor

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup

Transcription:

CPU Performance Lecture 8 CAP 3103 06-11-2014

Defining Performance Which airplane has the best performance? 1.6 Performance Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 100 200 300 400 500 Passenger Capacity 0 2000 4000 6000 8000 10000 Cruising Range (miles) Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 500 1000 1500 Cruising Speed (mph) 0 100000 200000 300000 400000 Passengers x mph Chapter 1 Computer Abstractions and Technology 2

Response Time and Throughput Response time How long it takes to do a task Throughput Total work done per unit time e.g., tasks/transactions/ per hour How are response time and throughput affected by Replacing the processor with a faster version? Adding more processors? We ll focus on response time for now Chapter 1 Computer Abstractions and Technology 3

Relative Performance Define Performance = 1/Execution Time X is n time faster than Y Performance X Execution time Performance Y Y Execution time X n Example: time taken to run a program 10s on A, 15s on B Execution Time B / Execution Time A = 15s / 10s = 1.5 So A is 1.5 times faster than B Chapter 1 Computer Abstractions and Technology 4

Measuring Execution Time Elapsed time Total response time, including all aspects Processing, I/O, OS overhead, idle time Determines system performance CPU time Time spent processing a given job Discounts I/O time, other jobs shares Comprises user CPU time and system CPU time Different programs are affected differently by CPU and system performance Chapter 1 Computer Abstractions and Technology 5

CPU Clocking Operation of digital hardware governed by a constant-rate clock Clock (cycles) Data transfer and computation Update state Clock period Clock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250 10 12 s Clock frequency (rate): cycles per second e.g., 4.0GHz = 4000MHz = 4.0 10 9 Hz Chapter 1 Computer Abstractions and Technology 6

CPU Time CPU Time CPU Clock Cycles Clock Cycle Time CPU Clock Cycles Clock Rate Performance improved by Reducing number of clock cycles Increasing clock rate Hardware designer must often trade off clock rate against cycle count Chapter 1 Computer Abstractions and Technology 7

CPU Time Example Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 clock cycles How fast must Computer B clock be? Chapter 1 Computer Abstractions and Technology 8

CPU Time Example Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 clock cycles How fast must Computer B clock be? Clock Rate B Clock Cycles CPU Time B B 1.2 Clock Cycles 6s A Clock Cycles A CPU Time A Clock Rate A 10s2GHz 2010 9 Clock Rate B 1.22010 6s 9 2410 6s 9 4GHz Chapter 1 Computer Abstractions and Technology 9

Instruction Count and CPI Clock Cycles Instruction Count Cycles per Instruction CPU Time Instruction Count CPI Clock Cycle Time Instruction Count CPI Clock Rate Instruction Count for a program Determined by program, ISA and compiler Average cycles per instruction Determined by CPU hardware If different instructions have different CPI Average CPI affected by instruction mix Chapter 1 Computer Abstractions and Technology 10

CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? Chapter 1 Computer Abstractions and Technology 11

CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? CPU Time A CPU Time B CPU Time B CPU Time A Instruction Count CPI A I 2.0 250ps I500ps Instruction Count CPI B I1.2 500ps I 600ps I 600ps I500ps 1.2 Cycle Time A Cycle Time B A is faster by this much Chapter 1 Computer Abstractions and Technology 12

CPI in More Detail If different instruction classes take different numbers of cycles Clock Cycles n i1 (CPI Instructio n Count i i) Weighted average CPI CPI Clock Cycles Instructio n Count n i1 CPI i Instructio n Count Instructio n Count i Relative frequency Chapter 1 Computer Abstractions and Technology 13

CPI Example Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Which code sequence executes the most instructions? Which one will be faster? What is the CPI for each sequence? Chapter 1 Computer Abstractions and Technology 14

CPI Example Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Sequence 1: IC = 5 Clock Cycles = 2 1 + 1 2 + 2 3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4 1 + 1 2 + 1 3 = 9 Avg. CPI = 9/6 = 1.5 Chapter 1 Computer Abstractions and Technology 15

Performance Summary The BIG Picture CPU Time Instructio ns Program Clock cycles Instructio n Seconds Clock cycle Performance depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, T c Chapter 1 Computer Abstractions and Technology 16

Power Trends 1.7 The Power Wall In CMOS IC technology Power Capacitive load Voltage 2 Frequency 30 5V 1V 1000 Chapter 1 Computer Abstractions and Technology 17

Reducing Power Suppose we developed a new, simpler processor that has 85% of the capacitive load of the more complex older processor. Further, assume that it has adjustable voltage so that it can reduce voltage 15% compared to processor B, which results in a 15% shrink in frequency. What is the impact on dynamic power? Chapter 1 Computer Abstractions and Technology 18

Reducing Power Suppose a new CPU has 85% of capacitive load of old CPU 15% voltage and 15% frequency reduction 2 Pnew Cold 0.85(Vold 0.85) Fold 0.85 4 0.85 2 P C V F old The power wall old We can t reduce voltage further We can t remove more heat old old 0.52 How else can we improve performance? Chapter 1 Computer Abstractions and Technology 19