Performance evaluation


1 Performance evaluation
Arquitecturas Avanzadas de Computadores
Departamento de Ingeniería Electrónica y de Telecomunicaciones
Facultad de Ingeniería

2 Bibliography and evaluation
Bibliography
- Lecture slides
- Chapter 4: Computer Organization and Design: The Hardware/Software Interface, D. A. Patterson and J. L. Hennessy, Morgan Kaufmann Publishers, 3rd Edition.
- Chapter 1: Computer Architecture: A Quantitative Approach, J. Hennessy and D. Patterson, Morgan Kaufmann, 5th Edition, 2011 (previous editions may be good too).
Evaluation
- Test I (15%) covering units 1-2

3 How good is a computer?
We can think of many parameters:
- Processor's clock rate
- Power consumed by a program
- Execution time for a program
- Number of tasks done per second
- Reliability
- Aesthetic appearance
- Social repercussion, etc.
These are the metrics: the things we want to estimate or measure (though not all of them are easy to measure).
How should we compare two computer systems?

4 Performance: Latency vs. Throughput
- Latency: time to finish a fixed task
- Throughput: number of tasks per unit of time
- They are different: parallelism can be exploited for throughput, not latency
- There is usually a trade-off between latency and throughput
- Choose the definition of performance that matches your goals (scientific program: latency; web server: throughput?)
Example: transport people 10 km
- Car: capacity = 5, speed = 60 km/h
- Bus: capacity = 60, speed = 20 km/h
- Latency: car = 10 min, bus = 30 min
- Throughput (counting the return trip): car = 15 pph, bus = 60 pph (pph = people per hour)
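The car/bus figures can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes each vehicle drives back empty before carrying its next load, which is how the slide's "count return trip" note is read here.

```python
def latency_minutes(distance_km: float, speed_kmh: float) -> float:
    """One-way travel time in minutes."""
    return distance_km * 60 / speed_kmh


def throughput_pph(capacity: int, distance_km: float, speed_kmh: float) -> float:
    """People delivered per hour when every delivery needs a full round trip."""
    return capacity * speed_kmh / (2 * distance_km)


car_latency = latency_minutes(10, 60)
bus_latency = latency_minutes(10, 20)
car_throughput = throughput_pph(5, 10, 60)
bus_throughput = throughput_pph(60, 10, 20)

print(f"latency:    car {car_latency} min, bus {bus_latency} min")
print(f"throughput: car {car_throughput} pph, bus {bus_throughput} pph")
```

The car wins on latency and the bus wins on throughput, so the better vehicle depends on which metric matches the goal.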

5 Example: latency vs. throughput
Do the following changes to a computer system increase throughput, decrease response time, or both?
a) Replacing the processor with a faster version
b) Adding more processors to a system that uses multiple processors for separate tasks (a web server)
Answer
a) Both
b) Throughput

6 Comparing Performance
System a is x times faster than b if
  latency(a) = latency(b) / x
  throughput(a) = throughput(b) * x
System a is x% faster than b if
  latency(a) = latency(b) / (1 + x/100)
  throughput(a) = throughput(b) * (1 + x/100)
Car/bus example
- Latency? Car is 3 times (and 200%) faster than bus
- Throughput? Bus is 4 times (and 300%) faster than car
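A small sketch of the two comparison rules, applied to the car/bus numbers (latencies of 10 and 30 minutes, throughputs of 15 and 60 people per hour). The helper names are assumptions made for illustration.

```python
def times_faster_by_latency(latency_a: float, latency_b: float) -> float:
    """x such that latency(a) = latency(b) / x."""
    return latency_b / latency_a


def times_faster_by_throughput(throughput_a: float, throughput_b: float) -> float:
    """x such that throughput(a) = throughput(b) * x."""
    return throughput_a / throughput_b


def percent_faster(times: float) -> float:
    """Convert 'x times faster' into 'x% faster'."""
    return (times - 1) * 100


car_vs_bus = times_faster_by_latency(10, 30)      # car is a, bus is b
bus_vs_car = times_faster_by_throughput(60, 15)   # bus is a, car is b

print(car_vs_bus, percent_faster(car_vs_bus))
print(bus_vs_car, percent_faster(bus_vs_car))
```

Note that "3 times faster" and "200% faster" describe the same ratio; mixing the two conventions is a common source of confusion.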

7 Performance definitions
Let's define our final goal as minimizing the execution time of some application. We can then define performance in terms of execution time:
  performance(a) = 1 / execution_time(a)

8 Execution time
Execution time is affected by multiple factors in a computer system:
  execution time = CPU time + disk access + memory access + I/O activities + OS overhead
We will focus on CPU time since we'll mostly study the processor. However, some applications depend heavily on other factors, e.g. disk access performance.

9 CPU time
We measure CPU time in seconds, but remember that computer hardware works synchronously, driven by a clock signal that has a period and a frequency.
[Figure: data path through a register, logic, and a register, driven by the clock]
How do we relate clock cycles to CPU time?

10 Clock cycles and CPU time
Just use one of two simple formulas:
  CPU time = clock cycles * cycle time
or, using the clock rate:
  CPU time = clock cycles / clock rate
Classic designer's trade-off: attempting to reduce the number of clock cycles may lead to reducing the clock rate too, and vice versa.

11 Book exercise

12 Answer

13 How about instructions?
Since a program executes instructions, they should also play a part in the CPU performance equations.
So far we had:
  CPU time = clock cycles * cycle time
Now we also say that:
  clock cycles = instructions for a program * average clock cycles per instruction
- IC: Instruction Count. Static IC vs. dynamic IC: what is needed to determine each?
- CPI: Cycles Per Instruction. Can be used to compare two implementations of an ISA.


15 The CPU performance equation
Finally, the classic formula that incorporates the three key factors that affect performance is:
  CPU time = Instruction Count * CPI * cycle time
or
  CPU time = Instruction Count * CPI / clock rate
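The two forms of the equation can be written as a pair of small functions. This is a minimal sketch: the function names and the sample program (one billion instructions, CPI of 2, 2 GHz clock) are made-up values for illustration, not figures from the slides.

```python
def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = Instruction Count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_cycle_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = Instruction Count * CPI * cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1e9 dynamic instructions, average CPI of 2, 2 GHz clock.
time_a = cpu_time_from_rate(1e9, 2, 2e9)
time_b = cpu_time_from_cycle_time(1e9, 2, 1 / 2e9)

print(f"{time_a:.3f} s, {time_b:.3f} s")
```

Both calls describe the same machine, so they agree up to floating-point rounding.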

16 CPU Performance Equation
Factors affecting CPU execution time:

Factor             Inst. count   CPI   Clock rate
Program            x             (x)
Compiler           x             (x)
ISA                x             x     (x)
Microarchitecture                x     x
Technology                             x

CPU time = Instruction Count * CPI / clock rate

17 Cycles per Instruction (CPI)
Depends on the instruction:
  CPI_i = Execution Time of Instruction i * Clock Rate
Computing the total CPI:
  CPI = sum over i of (F_i * CPI_i), where F_i is the fraction of executed instructions of type i
Example: program dependent!

18 Another CPI Example
Assume a processor with the following instruction frequencies and costs:
- Integer ALU: 50%, 1 cycle
- Load: 20%, 5 cycles
- Store: 10%, 1 cycle
- Branch: 20%, 2 cycles
Which change would improve performance more?
a) Faster branch prediction to reduce branch cost to 1 cycle?
b) Better data cache to reduce load cost to 3 cycles?
Compute CPI
- Base = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*2 = 2
- A = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*1 = 1.8
- B = 0.5*1 + 0.2*3 + 0.1*1 + 0.2*2 = 1.6 (winner)
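The same comparison can be computed as a frequency-weighted sum. The sketch below uses the mix from this slide; the dictionary layout and the names `average_cpi`, `option_a` and `option_b` are assumptions chosen for illustration.

```python
def average_cpi(mix: dict) -> float:
    """Weighted CPI: sum of (instruction frequency * cycles) over all classes."""
    return sum(frequency * cycles for frequency, cycles in mix.values())


# (frequency, cycles) per instruction class, taken from the slide.
base = {"alu": (0.5, 1), "load": (0.2, 5), "store": (0.1, 1), "branch": (0.2, 2)}
option_a = {**base, "branch": (0.2, 1)}  # branch cost reduced to 1 cycle
option_b = {**base, "load": (0.2, 3)}    # load cost reduced to 3 cycles

for name, mix in (("base", base), ("A", option_a), ("B", option_b)):
    print(name, round(average_cpi(mix), 2))
```

Option B wins because loads contribute the largest share of cycles (0.2 * 5 = 1.0 of the 2.0 total), so trimming them pays off more than trimming branches.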

19 Book example

20 Answer

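21 IPC, MIPS and GHz
The metrics you are most likely to see in marketing are IPC (instructions per cycle), MIPS (millions of instructions per second) and GHz. How are they incomplete?
Back to the CPU time formula, CPU time = Instruction Count * CPI * cycle time:
- IPC is the inverse of CPI
- GHz (clock rate) is the inverse of cycle time
- MIPS is the inverse of CPI * cycle time (in microseconds per instruction)
Each of them covers only part of the formula.
Which processor would you buy?
- Processor A: CPI = 2, clock = 5 GHz
- Processor B: CPI = 1, clock = 3 GHz
Probably A, but B is faster (assuming the same ISA and compiler): A needs 2/5 = 0.4 ns per instruction, while B needs 1/3 ≈ 0.33 ns.
Meta-point: danger of partial performance metrics!
- GHz can be boosted artificially by design (at the cost of the other two terms)
- e.g., 800 MHz Pentium III faster than 1 GHz Pentium 4!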
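A quick check of the processor A versus B comparison, as a sketch. It assumes both processors run the same instruction stream (same ISA and compiler), so time per instruction decides the winner; the function names are illustrative.

```python
def ns_per_instruction(cpi: float, clock_ghz: float) -> float:
    """Average time per instruction in nanoseconds: CPI * cycle time."""
    return cpi / clock_ghz


def mips(cpi: float, clock_ghz: float) -> float:
    """Millions of instructions per second: clock rate / (CPI * 10^6)."""
    return clock_ghz * 1000 / cpi


a_ns = ns_per_instruction(cpi=2, clock_ghz=5)
b_ns = ns_per_instruction(cpi=1, clock_ghz=3)

print(f"A: {a_ns:.3f} ns/instruction, {mips(2, 5):.0f} MIPS")
print(f"B: {b_ns:.3f} ns/instruction, {mips(1, 3):.0f} MIPS")
```

Processor A has the higher clock rate, yet B finishes each instruction sooner. MIPS ranks them correctly here only because the instruction count is assumed equal; across different ISAs or compilers it would not be enough either.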

22 Gene Amdahl
- American computer architect
- Born in 1922
- Worked for IBM until 1970
- Founded Amdahl Corporation to compete in the mainframe market against IBM
- Proposed what later became known as Amdahl's Law at the 1967 Spring Joint Computer Conference

23 Amdahl's law
Suppose an enhancement speeds up a fraction f of a task by a factor of Sf. The overall speedup is then:
  speedup = 1 / ((1 - f) + f / Sf)
If f is small, Sf doesn't matter.
Concentrate effort on improving frequently occurring events or frequently used
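Amdahl's law in its usual form, speedup = 1 / ((1 - f) + f / Sf), can be sketched as follows. The sample values of f and Sf are made up to illustrate the "if f is small, Sf doesn't matter" point; they are not taken from the slides.

```python
def amdahl_speedup(f: float, s_f: float) -> float:
    """Overall speedup when a fraction f of the time is sped up by a factor s_f."""
    return 1.0 / ((1.0 - f) + f / s_f)


def amdahl_limit(f: float) -> float:
    """Upper bound on the speedup as s_f grows without limit."""
    return 1.0 / (1.0 - f)


# Hypothetical cases: the same 10x enhancement applied to a small and a large fraction.
small = amdahl_speedup(f=0.1, s_f=10)
large = amdahl_speedup(f=0.9, s_f=10)

print(f"f = 0.1: speedup {small:.3f}, limit {amdahl_limit(0.1):.3f}")
print(f"f = 0.9: speedup {large:.3f}, limit {amdahl_limit(0.9):.3f}")
```

With f = 0.1 even an infinitely fast enhancement cannot reach a 1.12x overall speedup, which is why effort belongs on the common case.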

24 Practicing Amdahl's law
1. What is the percentage of time each instruction takes?
2. How much is the total time reduced if the time for FP instructions is reduced by 20%? How much is the total speedup?
3. How much is the total time reduced if the time for L/S instructions is reduced by 20%? How much is the total speedup?
4. Can the total time be reduced by 20% by reducing only the time for branch instructions?
5. What's the theoretical speedup limit from reducing the branch instruction time?
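The table of instruction times that this exercise refers to did not survive in the transcription, so the sketch below uses a purely hypothetical time breakdown to show how such questions can be approached. All numbers and names in it are assumptions, not the exercise's data.

```python
# Hypothetical time per instruction class, in seconds (NOT the exercise's table).
times = {"fp": 40.0, "load_store": 30.0, "branch": 10.0, "int": 20.0}


def total_after_reduction(times: dict, kind: str, reduction: float) -> float:
    """Total time after cutting the time of one class by `reduction` (0.2 = 20%)."""
    return sum(times.values()) - times[kind] * reduction


def speedup_after_reduction(times: dict, kind: str, reduction: float) -> float:
    """Old total time divided by new total time."""
    return sum(times.values()) / total_after_reduction(times, kind, reduction)


total = sum(times.values())
shares = {kind: t / total for kind, t in times.items()}  # question 1

new_total_fp = total_after_reduction(times, "fp", 0.2)   # question 2
speedup_fp = speedup_after_reduction(times, "fp", 0.2)

# Questions 4 and 5: removing branch time entirely is the best possible case.
best_branch_total = total_after_reduction(times, "branch", 1.0)
branch_limit = total / best_branch_total

print(shares)
print(round(new_total_fp, 2), round(speedup_fp, 3))
print(round(best_branch_total, 2), round(branch_limit, 3))
```

With these made-up numbers, branches account for only 10% of the time, so no branch improvement alone could cut the total by 20%. Note that reducing a class's time by 20% corresponds to Sf = 1 / 0.8 = 1.25 in Amdahl's formula.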

25 Another exercise


More information

Battle against Intel CPU Monopoly Moves from Lower Costs to Faster IA-32 Processors.

Battle against Intel CPU Monopoly Moves from Lower Costs to Faster IA-32 Processors. Battle against Intel CPU Monopoly Moves from Lower Costs to Faster IA-32 Processors. Giovana Mendes Dep. Informática, Universidade do Minho 4710-057 Braga, Portugal gimendes@uol.com.br Abstract.The battle

More information

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

More information

Instruction Set Architectures

Instruction Set Architectures Instruction Set Architectures Early trend was to add more and more instructions to new CPUs to do elaborate operations (CISC) VAX architecture had an instruction to multiply polynomials! RISC philosophy

More information

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng Slide Set 8 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section

More information

CS 6290 I/O and Storage. Milos Prvulovic

CS 6290 I/O and Storage. Milos Prvulovic CS 6290 I/O and Storage Milos Prvulovic Storage Systems I/O performance (bandwidth, latency) Bandwidth improving, but not as fast as CPU Latency improving very slowly Consequently, by Amdahl s Law: fraction

More information

Solid State Storage in Massive Data Environments Erik Eyberg

Solid State Storage in Massive Data Environments Erik Eyberg Solid State Storage in Massive Data Environments Erik Eyberg Senior Analyst Texas Memory Systems, Inc. Agenda Taxonomy Performance Considerations Reliability Considerations Q&A Solid State Storage Taxonomy

More information

Client/Server and Distributed Computing

Client/Server and Distributed Computing Adapted from:operating Systems: Internals and Design Principles, 6/E William Stallings CS571 Fall 2010 Client/Server and Distributed Computing Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Traditional

More information

Influence of Technology and Software on Instruction Sets: Up to the dawn of IBM 360

Influence of Technology and Software on Instruction Sets: Up to the dawn of IBM 360 1 Influence of Technology and Software on Instruction Sets: Up to the dawn of IBM 360 Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by and Krste Asanovic

More information

What is a System on a Chip?

What is a System on a Chip? What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex

More information

VLIW Processors. VLIW Processors

VLIW Processors. VLIW Processors 1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

More information

EC 362 Problem Set #2

EC 362 Problem Set #2 EC 362 Problem Set #2 1) Using Single Precision IEEE 754, what is FF28 0000? 2) Suppose the fraction enhanced of a processor is 40% and the speedup of the enhancement was tenfold. What is the overall speedup?

More information

Computer Architecture-I

Computer Architecture-I Computer Architecture-I 1. Die Yield is given by the formula, Assignment 1 Solution Die Yield = Wafer Yield x (1 + (Defects per unit area x Die Area)/a) -a Let us assume a wafer yield of 100% and a 4 for

More information

Chapter 13 Selected Storage Systems and Interface

Chapter 13 Selected Storage Systems and Interface Chapter 13 Selected Storage Systems and Interface Chapter 13 Objectives Appreciate the role of enterprise storage as a distinct architectural entity. Expand upon basic I/O concepts to include storage protocols.

More information

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern: Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

More information

Contents. Chapter 1. Introduction

Contents. Chapter 1. Introduction Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1 Hierarchy Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Hierarchy- 1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor

More information

Datacenter Operating Systems

Datacenter Operating Systems Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015 This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major

More information

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk) Parallelism improves performance Plus extra disk(s) for redundant data storage Provides fault tolerant

More information