Chapter 2. The Role of Performance

Size: px
Start display at page:

Download "Chapter 2. The Role of Performance"

Transcription

1 Chapter 2 The Role of Performance 1 Performance Why is some hardware better than others for different programs? What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?) How does the machine's instruction set affect performance? 2

2 Which of these airplanes has the best performance? Airplane Passengers Range (mi) Speed (mph) Boeing Boeing BAC/Sud Concorde Douglas DC The plane with the highest cruising speed is the Concorde. The plane with the longest range is the DC-8. The plane with the largest capacity is the Boeing 747. How much faster is the Concorde compared to the 747? How much bigger is the 747 than the Douglas DC-8? 3 Airplanes Example (cont.) Airplane Boeing Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Passenger Throughput (Passengers x m.p.h) 228, , ,200 79,424 If we define performance in terms of speed, this leaves two possible definitions: 1. Define the fastest plane as the one with the highest speed, taking a single passenger from one point to another in the least time. The Concorde would clearly be the fastest. 2. Define the fastest plane as the one with the highest speed, taking 450 passengers from one point to another in the least time. The Boeing 747 would clearly be the fastest. Check the above table. 4

3 Computer Performance: TIME, TIME, TIME Response Time (latency) How long does it take for my job to run? How long does it take to execute a job? How long must I wait for the database query? The time between the start and completion of a task. Throughput How many jobs can the machine run at once? What is the average execution rate? How much work is getting done? The total amount of work done in a given time. 5 Example If we upgrade a machine with a new processor what do we increase? If we add a new machine to the lab what do we increase? Answer: Case 1: Both response time and throughput are improved. Case 2: No one task gets work done faster, so only throughput increases. But by increasing the number of processors would reduce the waiting time in the queue, therefore response time could improve. 6

4 Execution Time Elapsed Time counts everything (disk and memory accesses, I/O, etc.) a useful number, but often not good for comparison purposes CPU time doesn't count I/O or time spent running other programs can be broken up into system time, and user time Our focus: user CPU time time spent executing the lines of code that are "in" our program 7 Example: Unix time Command 90.7u 12.9s 2:39 65% 90.7u : User CPU Time (seconds) 12.9s : System CPU Time (seconds) 2:39 : Elapsed Time (159 seconds) 65% : Percentage of Elapsed Time that is CPU Time is ( ) / 159 = 0.65 More than a third of the elapsed time in this example was spent waiting for I/O, running other programs, or both. 8

5 Book's Definition of Performance For some program running on machine X, Performance X = 1 / Execution time X machine X is n times faster than machine Y" Performance X / Performance Y = n Problem: machine A runs a program in 10 seconds machine B runs the same program in 15 seconds, how much faster is A than B? Answer: A is 1.5 times faster than B. 9 Clock Cycles Instead of reporting execution time in seconds, we often use cycles seconds program = cycles program seconds cycle A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time: CPU execution time = CPU clock cycles X Clock cycle time for a program for a program Or, because clock rate and clock cycle time are inverses: CPU execution time = (CPU clock cycles ) / Clock rate for a program 10

6 Clock Cycles (Cont.) Clock ticks indicate when to start activities (one abstraction): cycle time = time between ticks = seconds per cycle clock rate (frequency) = cycles per second (1 Hz.= 1 cycle/sec) A 200 MHz. clock has a cycle time time = 5 nanoseconds 11 How to Improve Performance seconds program = cycles program seconds cycle So, hardware designer can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program. 12

7 Example Our favorite program runs in 10 seconds on computer A, which has a 400 MHz. clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?" 13 Answer First find the number of clock cycles required for the program on A: CPU time A = (CPU clock cycles A ) / Clock rate A 10 seconds = (CPU clock cycles A ) / 400 x 10 6 cycles/sec CPU clock cycles A = 10 sec X 400 X 10 6 cycles/sec = 4000 X 10 6 cycles CPU time for B can be found using this equation: CPU time B = (1.2 X CPU clock cycles A ) / Clock rate B 6 seconds = (1.2 X 4000 X 10 6 cycles) / Clock rate B Clock rateb = (1.2 X 4000 X 10 6 cycles) / 6 seconds = (800 X 10 6 cycles) / second = 800 MHz Machine B must therefore have twice the clock rate of A to run the program in 6 seconds. 14

8 Cycles and Instructions Since the compiler clearly generated instructions to execute, and the machine had to execute the instructions to run the program, the execution time must depend on the number of instructions in a program. 15 How many cycles are required for a program? Could assume that # of cycles = # of instructions 1st instruction 2nd instruction 3rd instruction 4th 5th 6th... This assumption is incorrect, time different instructions take different amounts of time on different machines. Why? hint: remember that these are machine instructions, not lines of C code 16

9 Different numbers of cycles for different instructions time Multiplication takes more time than addition Floating point operations take longer than integer ones Accessing memory takes more time than accessing registers Important point: changing the cycle time often changes the number of cycles required for various instructions (more later) 17 CPI (clock cycles per instruction) The number of clock cycles required for a program can be written as: CPU clock cycles = Instructions for a program X Average clock cycles per instruction The term clock cycles per instruction, which is the average number of clock cycles each instruction takes to execute, is called CPI. Since different instructions may take different amounts of time depending on what they do, CPI is an average of all instructions executed in the program. 18

10 CPI Example Suppose we have two implementations of the same instruction set architecture (ISA). For some program, Machine A has a clock cycle time of 1 ns. and a CPI of 2.0 Machine B has a clock cycle time of 2 ns. and a CPI of 1.2 What machine is faster for this program, and by how much? If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical? 19 Answer We know that each machine executes the same number of instructions for the same program; let s call this number I First find the number of processor clock cycles for each machine: CPU clock cycles A = I x 2.0 CPU clock cycles B = I x 1.2 Now we can compute the CPU time for each machine: CPU time A = CPU clock cycles A x Clock cycle time A = I x 2.0 x 1 ns = 2 x I ns CPU time B = I x 1.2 x 2 ns = 2.4 x I ns Clearly, machine A is faster. The amount faster is given by the ratio of the execution times: CPU Performance A = Execution time B = 2.4 x I ns = 1.2 CPU Performance B Execution time A 2 x I ns Machine A is 1.2 times faster than machine B for this program 20

11 Instruction Count Basic Performance Equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time: or CPU time = Instruction count x CPI x Clock cycle time CPU time = (Instruction count x CPI) / Clock rate Execution Time = Instructions x Clock cycles x Seconds Program Instruction Clock cycle How to find the above performance parameters? CPU clock cycles = Summation (CPI i x C i ) i = 1 to n Where C i is the count of the number of instructions of class i executed, CPI i is the average number of cycles per instruction for that instruction class, and n is the number of instruction classes. 21 Number of Instructions Example A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? What is the CPI for each sequence? 22

12 Answer Sequence 1 executes = 5 instructions. Sequence 2 executes = 6 instructions. So, sequence 1 executes fewer instructions. To find the total number of clock cycles for each sequence: CPU clock cycles = Summation (CPI i x C i ) i = 1 to n CPU clock cycles 1 = (2x1) + (1x2) + (2x3) = 10 cycles CPU clock cycles 2 = (4x1) + (1x2) + (1x3) = 9 cycles So code sequence 2 is faster, even though it actually executes one extra instruction. Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI. 23 Answer (Cont.) The CPI values can be computed by: CPI = CPU clock cycles Instruction count CPI1 = CPU clock cycles 1 = 10 = 2 Instruction count 1 5 CPI2 = CPU clock cycles 2 = 9 = 1.5 Instruction count 2 6 Note: When comparing two machines, you must look at all three components (instruction count, CPI, and clock cycle time), which combine to form execution time. 24

13 Now that we understand cycles A given program will require some number of instructions (machine instructions) some number of cycles some number of seconds We have a vocabulary that relates these quantities: cycle time (seconds per cycle) clock rate (cycles per second) CPI (cycles per instruction) a floating point intensive application might have a higher CPI 25 Performance Performance is determined by execution time Do any of the other variables equal performance? number of cycles to execute program? number of instructions in program? number of cycles per second? average number of cycles per instruction? average number of instructions per second? Common pitfall: thinking one of the variables is indicative of performance when it really isn t. 26

14 Benchmarks Performance best determined by running a real application Use programs typical of expected workload Or, typical of expected class of applications e.g., compilers/editors, scientific applications, graphics, etc. Small benchmarks nice for architects and designers easy to standardize can be abused SPEC (System Performance Evaluation Cooperative) companies have agreed on a set of real program and inputs can still be abused (Intel s other bug) valuable indicator of performance (and compiler technology) 27 SPEC 89 Compiler enhancements and performance SPEC performance ratio gcc espresso spice doduc nasa7 li eqntott matrix300 fp pp p to m ca tv Benchmark Compiler Enhanced compiler 28

15 SPEC 95 (8 integer followed by 10 floating-point benchmarks) Benchmark Description go Artificial intelligence; plays the game of Go m88ksim Motorola 88k chip simulator; runs test program gcc The Gnu C compiler generating SPARC code compress Compresses and decompresses file in memory li Lisp interpreter ijpeg Graphic compression and decompression perl Manipulates strings and prime numbers in the special-purpose programming language Perl vortex A database program tomcatv A mesh generation program swim Shallow water model with 513 x 513 grid su2cor quantum physics; Monte Carlo simulation hydro2d Astrophysics; Hydrodynamic Naiver Stokes equations mgrid Multigrid solver in 3-D potential field applu Parabolic/elliptic partial differential equations trub3d Simulates isotropic, homogeneous turbulence in a cube apsi Solves problems regarding temperature, wind velocity, and distribution of pollutant fpppp Quantum chemistry wave5 Plasma physics; electromagnetic particle simulation 29 SPECint 95 SPEC ratio: bigger numeric results indicate faster performance SPECint Clock rate (MHz) Pentium Pentium Pro Pentium Pro is 1.4 to 1.5 times faster than Pentium 30

16 SPECfp 95 SPEC ratio: bigger numeric results indicate faster performance SPECfp Clock rate (MHz) Pentium Pentium Pro Pentium Pro is 1.7 to 1.8 times faster than Pentium 31 Synthetic Benchmarks Synthetic benchmarks are artificial programs that are constructed to try to match the characteristics of a large set of programs. Whetstone and Dhrystone are the most popular synthetic benchmarks. Whetstone was based on measurements of Algol programs in a scientific and engineering environment. It was later converted to Fortran and became popular. Dhrystone was created as benchmark for systems programming environments. It was originally written in Ada and later converted in C, after which it became popular. 32

17 Synthetic Benchmarks Drawbacks One major drawback of synthetic benchmarks is that no user would ever run a synthetic benchmark as an application because these programs do not compute anything a user would find remotely interesting. Synthetic benchmarks are not real programs, they usually do not reflect program behavior, other than the behavior considered when they were created. 33 CPU Performance Does doubling the clock rate double the performance? For a given instruction set architecture, increases in CPU performance can come from three sources: 1. Increases in clock rate 2. Improvements in processor organization that lower the CPI 3. Compiler enhancements that lower the instruction count or generate instructions with a lower average CPI (e.g., by using simpler instructions) 34

18 Comparing and Summarizing Performance Example: Execution times of 2 programs on 2 different machines Program 1 (seconds) Program 2 (seconds) Total time (seconds) Computer A Computer B Which is best answer? 1. A is 10 times faster than B for program B is 10 times faster than A for program B is 9.1 times faster than A for programs 1 & Comparing Performance (Cont.) Answer: Best answer is Performance B = Execution time A = 1001 = 9.1 Performance A Execution time B 110 The average of the execution times that is directly proportional to total execution time is the arithmetic mean (AM): AM = 1/n x Summation Time i i = 1 to n where Time i is the execution time for the ith program of a total of n in the workload. Since it is the mean of execution times, a smaller mean indicates a smaller average execution time and thus improved performance. 36

19 Amdahl's Law The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used (Amdahl s Law). Execution Time After Improvement = Execution Time Unaffected +( Execution Time Affected / Amount of Improvement ) Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?" 37 Amdahl's Law (Cont.) Answer: Execution time after improvement = ( seconds) + (80 seconds / n) Since we want the performance to be 5 times faster, the new execution time should be 20 seconds, giving: 20 seconds = (20 seconds) + (80 seconds / n) 0 = (80 seconds / n) That is, there is no amount by which we can enhance multiply to achieve a fivefold increase in performance, if multiply accounts for only 80% of the workload. 38

20 Amdahl's Law (Cont.) Amdahl s law is given in another form that yields the speedup. Speedup is the measure of how a machine performs after some enhancement relative to how it performed previously. Speedup = Performance after improvement Performance before improvement = Execution time before improvement Execution time after improvement 39 MIPS (million instructions per second) MIPS = Instruction count / (Execution time x 10 6 ) Faster machines have a higher MIPS rating. Three problems with using MIPS: 1. MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions. We cannot compare computers with different instruction sets. 2. MIPS varies between programs on the same computer; thus a machine cannot have a single MIPS rating for all programs. 3. MIPS can vary inversely with performance. 40

21 MIPS Example Two different compilers are being tested for a 500 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (CPI) respectively. Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions. The second compiler's code uses 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions. Which sequence will be faster according to MIPS? Which sequence will be faster according to execution time? 41 MIPS Example (Answer) First we find the execution time for the 2 different compiles using the following equation: Execution time = CPU clock cycles / Clock rate where CPU clock cycles = Summation (CPI i x C i ) i = 1 to n CPU clock cycles1 = (5x1+1x2+1x3)x10 9 = 10 x 10 9 CPU clock cycles2 = (10x1+1x2+1x3)x10 9 = 15 x 10 9 Now, we find the execution time for the 2 compiles: Execution time1 = (10 x 10 9 ) / (500 x 10 6 ) = 20 seconds Execution time2 = (15 x 10 9 ) / (500 x 10 6 ) = 30 seconds So, we conclude that compiler 1 generates the faster program, according to execution time. 42

22 MIPS Example (Answer Cont.) Now let s compute the MIPS rate for each version of the program, using: MIPS = (Instruction count) / (Execution time x 10 6 ) MIPS1 = ((5+1+1) x 10 9 ) / (20 x 10 6 ) = 350 MIPS2 = ((10+1+1) x 10 9 ) / (30 x 10 6 ) = 400 So, the code from compiler 2 has a higher MIPS rating, but the code from compiler 1 runs faster! From the above example, MIPS fails to give a true picture of performance; even when comparing 2 versions of the same program from 2 different compilers on the same machine 43 MFLOPS as a Performance Metric MFLOPS (megaflops): million floating-point operations per second. MFLOPS = Number of floating-point operations in a program Execution time x 10 6 A floating-point is an addition, subtraction, multiplication, or division operation applied to a number in a single or double precision floating-point representation (key words: float, real, double ). MFLOPS rating is dependent on the program. That is, different programs require the execution of different number of floatingpoint operations. Compilers have a MFLOPS rating near to zero no matter how fast the machine is, because compilers rarely use floating-point arithmetic. 44

23 Remember Performance is specific to a particular program/s Total execution time is a consistent summary of performance For a given architecture performance increases come from: increases in clock rate improvements in processor organization that lower CPI compiler enhancements that lower CPI and/or instruction count Pitfall: expecting improvement in one aspect of a machine s performance to affect the total performance 45 Performance versus Cost All computer designers must balance performance and cost. Performance is the primary goal and cost is secondary. The cost of a machine is affected not only by the cost of the components, but by the costs of labor to assemble the machine, of research and development overhead, of sales and marketing, and of the profit margin. Computer designs will always be measured by cost and performance, and finding the best balance will always be the art of computer design. 46

Chapter 2. Why is some hardware better than others for different programs?

Chapter 2. Why is some hardware better than others for different programs? Chapter 2 1 Performance Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation Why is some hardware better than

More information

EEM 486: Computer Architecture. Lecture 4. Performance

EEM 486: Computer Architecture. Lecture 4. Performance EEM 486: Computer Architecture Lecture 4 Performance EEM 486 Performance Purchasing perspective Given a collection of machines, which has the» Best performance?» Least cost?» Best performance / cost? Design

More information

CPU Performance. Lecture 8 CAP 3103 06-11-2014

CPU Performance. Lecture 8 CAP 3103 06-11-2014 CPU Performance Lecture 8 CAP 3103 06-11-2014 Defining Performance Which airplane has the best performance? 1.6 Performance Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747

More information

CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate:

CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: Clock cycle where: Clock rate = 1 / clock cycle f = 1 /C

More information

Types of Workloads. Raj Jain. Washington University in St. Louis

Types of Workloads. Raj Jain. Washington University in St. Louis Types of Workloads Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/ 4-1 Overview!

More information

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle? Lecture 3: Evaluating Computer Architectures Announcements - Reminder: Homework 1 due Thursday 2/2 Last Time technology back ground Computer elements Circuits and timing Virtuous cycle of the past and

More information

Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs.

Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs. This Unit CIS 501: Computer Architecture Unit 4: Performance & Benchmarking Metrics Latency and throughput Speedup Averaging CPU Performance Performance Pitfalls Slides'developed'by'Milo'Mar0n'&'Amir'Roth'at'the'University'of'Pennsylvania'

More information

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Quiz for Chapter 1 Computer Abstractions and Technology 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

More information

on an system with an infinite number of processors. Calculate the speedup of

on an system with an infinite number of processors. Calculate the speedup of 1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements

More information

Performance evaluation

Performance evaluation Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography

More information

! Metrics! Latency and throughput. ! Reporting performance! Benchmarking and averaging. ! CPU performance equation & performance trends

! Metrics! Latency and throughput. ! Reporting performance! Benchmarking and averaging. ! CPU performance equation & performance trends This Unit CIS 501 Computer Architecture! Metrics! Latency and throughput! Reporting performance! Benchmarking and averaging Unit 2: Performance! CPU performance equation & performance trends CIS 501 (Martin/Roth):

More information

CSEE W4824 Computer Architecture Fall 2012

CSEE W4824 Computer Architecture Fall 2012 CSEE W4824 Computer Architecture Fall 2012 Lecture 2 Performance Metrics and Quantitative Principles of Computer Design Luca Carloni Department of Computer Science Columbia University in the City of New

More information

Computer Organization. and Instruction Execution. August 22

Computer Organization. and Instruction Execution. August 22 Computer Organization and Instruction Execution August 22 CSC201 Section 002 Fall, 2000 The Main Parts of a Computer CSC201 Section Copyright 2000, Douglas Reeves 2 I/O and Storage Devices (lots of devices,

More information

Quantitative Computer Architecture

Quantitative Computer Architecture Performace Measuremet ad Aalysis i Computer Quatitative Computer Measuremet Model Iovatio Proposed How to measure, aalyze, ad specify computer system performace or My computer is faster tha your computer!

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 6 Fundamentals in Performance Evaluation Computer Architecture Part 6 page 1 of 22 Prof. Dr. Uwe Brinkschulte,

More information

Lecture 5: Cost, Price, and Price for Performance Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 5: Cost, Price, and Price for Performance Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 5: Cost, Price, and Price for Performance Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review From Last Time Given sales a function of performance relative to competition,

More information

2: Computer Performance

2: Computer Performance 2: Computer Performance http://people.sc.fsu.edu/ jburkardt/presentations/ fdi 2008 lecture2.pdf... John Information Technology Department Virginia Tech... FDI Summer Track V: Parallel Programming 10-12

More information

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis 1 / 39 Overview Overview Overview What is a Workload? Instruction Workloads Synthetic Workloads Exercisers and

More information

TPCalc : a throughput calculator for computer architecture studies

TPCalc : a throughput calculator for computer architecture studies TPCalc : a throughput calculator for computer architecture studies Pierre Michaud Stijn Eyerman Wouter Rogiest IRISA/INRIA Ghent University Ghent University pierre.michaud@inria.fr Stijn.Eyerman@elis.UGent.be

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

More information

Computer Architecture-I

Computer Architecture-I Computer Architecture-I 1. Die Yield is given by the formula, Assignment 1 Solution Die Yield = Wafer Yield x (1 + (Defects per unit area x Die Area)/a) -a Let us assume a wafer yield of 100% and a 4 for

More information

An examination of the dual-core capability of the new HP xw4300 Workstation

An examination of the dual-core capability of the new HP xw4300 Workstation An examination of the dual-core capability of the new HP xw4300 Workstation By employing single- and dual-core Intel Pentium processor technology, users have a choice of processing power options in a compact,

More information

CISC, RISC, and DSP Microprocessors

CISC, RISC, and DSP Microprocessors CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:

More information

EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

More information

Chapter 5 Instructor's Manual

Chapter 5 Instructor's Manual The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction

More information

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1 Pipeline Hazards Structure hazard Data hazard Pipeline hazard: the major hurdle A hazard is a condition that prevents an instruction in the pipe from executing its next scheduled pipe stage Taxonomy of

More information

64-Bit versus 32-Bit CPUs in Scientific Computing

64-Bit versus 32-Bit CPUs in Scientific Computing 64-Bit versus 32-Bit CPUs in Scientific Computing Axel Kohlmeyer Lehrstuhl für Theoretische Chemie Ruhr-Universität Bochum March 2004 1/25 Outline 64-Bit and 32-Bit CPU Examples

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

Figure 1: Graphical example of a mergesort 1.

Figure 1: Graphical example of a mergesort 1. CSE 30321 Computer Architecture I Fall 2011 Lab 02: Procedure Calls in MIPS Assembly Programming and Performance Total Points: 100 points due to its complexity, this lab will weight more heavily in your

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

EC 362 Problem Set #2

EC 362 Problem Set #2 EC 362 Problem Set #2 1) Using Single Precision IEEE 754, what is FF28 0000? 2) Suppose the fraction enhanced of a processor is 40% and the speedup of the enhancement was tenfold. What is the overall speedup?

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

Instruction Set Architecture (ISA) Design. Classification Categories

Instruction Set Architecture (ISA) Design. Classification Categories Instruction Set Architecture (ISA) Design Overview» Classify Instruction set architectures» Look at how applications use ISAs» Examine a modern RISC ISA (DLX)» Measurement of ISA usage in real computers

More information

Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis

Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis Performance Metrics and Scalability Analysis 1 Performance Metrics and Scalability Analysis Lecture Outline Following Topics will be discussed Requirements in performance and cost Performance metrics Work

More information

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T) Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating

More information

On the Importance of Thread Placement on Multicore Architectures

On the Importance of Thread Placement on Multicore Architectures On the Importance of Thread Placement on Multicore Architectures HPCLatAm 2011 Keynote Cordoba, Argentina August 31, 2011 Tobias Klug Motivation: Many possibilities can lead to non-deterministic runtimes...

More information

How A V3 Appliance Employs Superior VDI Architecture to Reduce Latency and Increase Performance

How A V3 Appliance Employs Superior VDI Architecture to Reduce Latency and Increase Performance How A V3 Appliance Employs Superior VDI Architecture to Reduce Latency and Increase Performance www. ipro-com.com/i t Contents Overview...3 Introduction...3 Understanding Latency...3 Network Latency...3

More information

Session 7 Fractions and Decimals

Session 7 Fractions and Decimals Key Terms in This Session Session 7 Fractions and Decimals Previously Introduced prime number rational numbers New in This Session period repeating decimal terminating decimal Introduction In this session,

More information

Monday January 19th 2015 Title: "Transmathematics - a survey of recent results on division by zero" Facilitator: TheNumberNullity / James Anderson, UK

Monday January 19th 2015 Title: Transmathematics - a survey of recent results on division by zero Facilitator: TheNumberNullity / James Anderson, UK Monday January 19th 2015 Title: "Transmathematics - a survey of recent results on division by zero" Facilitator: TheNumberNullity / James Anderson, UK It has been my pleasure to give two presentations

More information

Obj: Sec 1.0, to describe the relationship between hardware and software HW: Read p.2 9. Do Now: Name 3 parts of the computer.

Obj: Sec 1.0, to describe the relationship between hardware and software HW: Read p.2 9. Do Now: Name 3 parts of the computer. C1 D1 Obj: Sec 1.0, to describe the relationship between hardware and software HW: Read p.2 9 Do Now: Name 3 parts of the computer. 1 Hardware and Software Hardware the physical, tangible parts of a computer

More information

Numerical Matrix Analysis

Numerical Matrix Analysis Numerical Matrix Analysis Lecture Notes #10 Conditioning and / Peter Blomgren, blomgren.peter@gmail.com Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Institute for Multimedia and Software Engineering Conduction of Exercises: Institute for Multimedia eda and Software Engineering g BB 315c, Tel: 379-1174 E-mail: marius.rosu@uni-due.de

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Welcome to Harcourt Mega Math: The Number Games

Welcome to Harcourt Mega Math: The Number Games Welcome to Harcourt Mega Math: The Number Games Harcourt Mega Math In The Number Games, students take on a math challenge in a lively insect stadium. Introduced by our host Penny and a number of sporting

More information

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions.

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions. Unit 1 Number Sense In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions. BLM Three Types of Percent Problems (p L-34) is a summary BLM for the material

More information

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition

More information

Pipelining Review and Its Limitations

Pipelining Review and Its Limitations Pipelining Review and Its Limitations Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic

More information

Optimizing matrix multiplication Amitabha Banerjee abanerjee@ucdavis.edu

Optimizing matrix multiplication Amitabha Banerjee abanerjee@ucdavis.edu Optimizing matrix multiplication Amitabha Banerjee abanerjee@ucdavis.edu Present compilers are incapable of fully harnessing the processor architecture complexity. There is a wide gap between the available

More information

System Models for Distributed and Cloud Computing

System Models for Distributed and Cloud Computing System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

Activity 1: Using base ten blocks to model operations on decimals

Activity 1: Using base ten blocks to model operations on decimals Rational Numbers 9: Decimal Form of Rational Numbers Objectives To use base ten blocks to model operations on decimal numbers To review the algorithms for addition, subtraction, multiplication and division

More information

Computer Architecture. Secure communication and encryption.

Computer Architecture. Secure communication and encryption. Computer Architecture. Secure communication and encryption. Eugeniy E. Mikhailov The College of William & Mary Lecture 28 Eugeniy Mikhailov (W&M) Practical Computing Lecture 28 1 / 13 Computer architecture

More information

Real-Time Scheduling 1 / 39

Real-Time Scheduling 1 / 39 Real-Time Scheduling 1 / 39 Multiple Real-Time Processes A runs every 30 msec; each time it needs 10 msec of CPU time B runs 25 times/sec for 15 msec C runs 20 times/sec for 5 msec For our equation, A

More information

The Performance of Scalar Replacement on the HP 715/50

The Performance of Scalar Replacement on the HP 715/50 The Performance of Scalar Replacement on the HP 715/5 1 Introduction Steve Carr Qunyan Wu Department of Computer Science Michigan Technological University Houghton MI 49931-1295 It has been shown that

More information

Lattice QCD Performance. on Multi core Linux Servers

Lattice QCD Performance. on Multi core Linux Servers Lattice QCD Performance on Multi core Linux Servers Yang Suli * Department of Physics, Peking University, Beijing, 100871 Abstract At the moment, lattice quantum chromodynamics (lattice QCD) is the most

More information

Business opportunities from IOT and Big Data. Joachim Aertebjerg Director Enterprise Solution Sales Intel EMEA

Business opportunities from IOT and Big Data. Joachim Aertebjerg Director Enterprise Solution Sales Intel EMEA Business opportunities from IOT and Big Data Joachim Aertebjerg Director Enterprise Solution Sales Intel EMEA HOW INTEL IS TRANSFORMING COMPUTING? Smarter Devices Applications of Big Data Compute for Internet

More information

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips Technology Update White Paper High Speed RAID 6 Powered by Custom ASIC Parity Chips High Speed RAID 6 Powered by Custom ASIC Parity Chips Why High Speed RAID 6? Winchester Systems has developed High Speed

More information

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to: 55 Topic 3 Computer Performance Contents 3.1 Introduction...................................... 56 3.2 Measuring performance............................... 56 3.2.1 Clock Speed.................................

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0 PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0 15 th January 2014 Al Chrosny Director, Software Engineering TreeAge Software, Inc. achrosny@treeage.com Andrew Munzer Director, Training and Customer

More information

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department

More information

A Lab Course on Computer Architecture

A Lab Course on Computer Architecture A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

More information

x64 Servers: Do you want 64 or 32 bit apps with that server?

x64 Servers: Do you want 64 or 32 bit apps with that server? TMurgent Technologies x64 Servers: Do you want 64 or 32 bit apps with that server? White Paper by Tim Mangan TMurgent Technologies February, 2006 Introduction New servers based on what is generally called

More information

Choosing a Computer for Running SLX, P3D, and P5

Choosing a Computer for Running SLX, P3D, and P5 Choosing a Computer for Running SLX, P3D, and P5 This paper is based on my experience purchasing a new laptop in January, 2010. I ll lead you through my selection criteria and point you to some on-line

More information

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do

More information

Intel Pentium 4 Processor on 90nm Technology

Intel Pentium 4 Processor on 90nm Technology Intel Pentium 4 Processor on 90nm Technology Ronak Singhal August 24, 2004 Hot Chips 16 1 1 Agenda Netburst Microarchitecture Review Microarchitecture Features Hyper-Threading Technology SSE3 Intel Extended

More information

Linux Process Scheduling Policy

Linux Process Scheduling Policy Lecture Overview Introduction to Linux process scheduling Policy versus algorithm Linux overall process scheduling objectives Timesharing Dynamic priority Favor I/O-bound process Linux scheduling algorithm

More information

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit. Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

Storing Measurement Data

Storing Measurement Data Storing Measurement Data File I/O records or reads data in a file. A typical file I/O operation involves the following process. 1. Create or open a file. Indicate where an existing file resides or where

More information

Wait-Time Analysis Method: New Best Practice for Performance Management

Wait-Time Analysis Method: New Best Practice for Performance Management WHITE PAPER Wait-Time Analysis Method: New Best Practice for Performance Management September 2006 Confio Software www.confio.com +1-303-938-8282 SUMMARY: Wait-Time analysis allows IT to ALWAYS find the

More information

Introduction to Microprocessors

Introduction to Microprocessors Introduction to Microprocessors Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?

More information

Parallel Scalable Algorithms- Performance Parameters

Parallel Scalable Algorithms- Performance Parameters www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for

More information

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.

More information

THE NAS KERNEL BENCHMARK PROGRAM

THE NAS KERNEL BENCHMARK PROGRAM THE NAS KERNEL BENCHMARK PROGRAM David H. Bailey and John T. Barton Numerical Aerodynamic Simulations Systems Division NASA Ames Research Center June 13, 1986 SUMMARY A benchmark test program that measures

More information

Working with whole numbers

Working with whole numbers 1 CHAPTER 1 Working with whole numbers In this chapter you will revise earlier work on: addition and subtraction without a calculator multiplication and division without a calculator using positive and

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

White Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor

White Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor White Paper Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor Introduction ADL Embedded Solutions newly introduced PCIe/104 ADLQM67 platform is the first ever PC/104 form factor board to

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit Decimal Division Remember 4th grade long division? 43 // quotient 12 521 // divisor dividend -480 41-36 5 // remainder Shift divisor left (multiply by 10) until MSB lines up with dividend s Repeat until

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

Case Study I: A Database Service

Case Study I: A Database Service Case Study I: A Database Service Prof. Daniel A. Menascé Department of Computer Science George Mason University www.cs.gmu.edu/faculty/menasce.html 1 Copyright Notice Most of the figures in this set of

More information

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu. Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive

More information

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards eddie.edwards@imperial.ac.uk At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components

More information

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s) Addressing The problem Objectives:- When & Where do we encounter Data? The concept of addressing data' in computations The implications for our machine design(s) Introducing the stack-machine concept Slide

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

Binary Number System. 16. Binary Numbers. Base 10 digits: 0 1 2 3 4 5 6 7 8 9. Base 2 digits: 0 1

Binary Number System. 16. Binary Numbers. Base 10 digits: 0 1 2 3 4 5 6 7 8 9. Base 2 digits: 0 1 Binary Number System 1 Base 10 digits: 0 1 2 3 4 5 6 7 8 9 Base 2 digits: 0 1 Recall that in base 10, the digits of a number are just coefficients of powers of the base (10): 417 = 4 * 10 2 + 1 * 10 1

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

Introduction to Operating Systems. Perspective of the Computer. System Software. Indiana University Chen Yu

Introduction to Operating Systems. Perspective of the Computer. System Software. Indiana University Chen Yu Introduction to Operating Systems Indiana University Chen Yu Perspective of the Computer System Software A general piece of software with common functionalities that support many applications. Example:

More information

Operation Count; Numerical Linear Algebra

Operation Count; Numerical Linear Algebra 10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Week 1 out-of-class notes, discussions and sample problems

Week 1 out-of-class notes, discussions and sample problems Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types

More information

Cloud Performance Benchmark Series

Cloud Performance Benchmark Series Cloud Performance Benchmark Series Amazon EC2 CPU Speed Benchmarks Kalpit Sarda Sumit Sanghrajka Radu Sion ver..7 C l o u d B e n c h m a r k s : C o m p u t i n g o n A m a z o n E C 2 2 1. Overview We

More information

Parallelism and Cloud Computing

Parallelism and Cloud Computing Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication

More information

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015 GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once

More information

Capacity Analysis Techniques Applied to VMware VMs (aka When is a Server not really a Server?)

Capacity Analysis Techniques Applied to VMware VMs (aka When is a Server not really a Server?) Capacity Analysis Techniques Applied to VMware VMs (aka When is a Server not really a Server?) Debbie Sheetz, BMC Software La Jolla, CA November 5 th 2013 Presentation Overview How to Approach Performance/Capacity

More information

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level System: User s View System Components: High Level View Input Output 1 System: Motherboard Level 2 Components: Interconnection I/O MEMORY 3 4 Organization Registers ALU CU 5 6 1 Input/Output I/O MEMORY

More information