Computer Architecture

1 Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 6 Fundamentals in Performance Evaluation Computer Architecture Part 6 page 1 of 22 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin Betting

2 Why performance evaluation?
- Comparison of computers
- Selection of a computer
- Changes in the configuration of an existing computer (tuning)
- Design of computers
- Verification or validation of design decisions
Methods for performance evaluation:
(1) analytical methods
(2) measurements

3 Aspects for evaluation
- modularity: Is the system composed of mostly independent parts, so-called modules?
- orthogonality: Does every module offer its own set of functions to the system? Is no particular function offered by more than one module?
- adequacy: Do the performance and cost of a module match its weight within the whole system?
- virtuality: Are the physical limits of the hardware modules lifted for the user? (Example: virtual memory)
- symmetry: Is it possible to derive the function of unknown parts from the properties of some known parts of the architecture, e.g. parts of the ISA?
- transparency: Are irrelevant parts of the architecture hidden from the user? (Example: transparent coprocessor)

4 Analytical methods
Performance measures (hypothetical maximum performance!):
- MIPS (Millions of Instructions per Second)
- MFLOPS (Millions of Floating Point Operations per Second)
Mix (also calculated, not measured):
- In a mix, the average execution time of each instruction is calculated and scaled by a characteristic weight.
Core programs:
- Typical application programs, written for the evaluated computer
- No measurements; the overall execution time is calculated from the execution times of the individual machine instructions
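The mix calculation can be sketched as a weighted average of instruction execution times. The instruction classes, times, and weights below are invented for illustration and do not come from any published mix.

```python
def mix_average_time(times_ns, weights):
    """Weighted average execution time per instruction for an instruction mix."""
    if set(times_ns) != set(weights):
        raise ValueError("times and weights must cover the same instruction classes")
    total_weight = sum(weights.values())
    # Scale each class's execution time by its weight, then normalize.
    return sum(times_ns[k] * weights[k] for k in times_ns) / total_weight


# Hypothetical execution times per instruction class, in nanoseconds
times_ns = {"alu": 1.0, "load_store": 2.0, "branch": 3.0, "float": 4.0}
# Hypothetical relative frequencies (weights) of each class in the mix
weights = {"alu": 0.5, "load_store": 0.3, "branch": 0.1, "float": 0.1}

avg_ns = mix_average_time(times_ns, weights)
print(f"average time per instruction: {avg_ns:.2f} ns")
```

Because the result is computed from per-instruction times rather than measured, it ignores effects such as cache misses or pipeline stalls that only show up at run time.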

5 Performance measures
Runtime:
$$\text{runtime} = \#\,\text{clock cycles} \times \text{clock period}$$
MIPS (million instructions per second):
$$\text{MIPS} = \frac{\text{instruction count}}{\text{runtime} \times 10^{6}}$$
$$\text{MIPS} = \frac{\text{instruction count}}{\#\,\text{clock cycles} \times \text{clock period} \times 10^{6}} = \frac{\text{instruction count} \times \text{clock frequency}}{\#\,\text{clock cycles} \times 10^{6}}$$
$$\text{MIPS} = \frac{\text{clock frequency}}{\text{CPI} \times 10^{6}} = \frac{\text{clock frequency} \times \text{IPC}}{10^{6}}$$
CPI (cycles per instruction):
$$\text{CPI} = \frac{\#\,\text{clock cycles}}{\text{instruction count}}$$
MFLOPS (million floating point operations per second):
$$\text{MFLOPS} = \frac{\#\,\text{executed floating point instructions}}{\text{runtime} \times 10^{6}}$$
IPC (instructions per cycle):
$$\text{IPC} = \frac{1}{\text{CPI}}$$
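As a minimal sketch, the measures above can be computed directly from cycle and instruction counts. All input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def runtime_s(clock_cycles, clock_frequency_hz):
    """runtime = clock cycles * clock period, with clock period = 1 / frequency."""
    return clock_cycles / clock_frequency_hz


def cpi(clock_cycles, instruction_count):
    """Cycles per instruction."""
    return clock_cycles / instruction_count


def ipc(clock_cycles, instruction_count):
    """Instructions per cycle, the reciprocal of CPI."""
    return instruction_count / clock_cycles


def mips(instruction_count, runtime):
    """Million instructions per second."""
    return instruction_count / (runtime * 1e6)


def mflops(fp_operations, runtime):
    """Million floating point operations per second."""
    return fp_operations / (runtime * 1e6)


# Hypothetical program run (assumed values, not measurements)
instruction_count = 2_000_000_000
clock_cycles = 3_000_000_000
clock_frequency_hz = 1_500_000_000
fp_operations = 400_000_000

t = runtime_s(clock_cycles, clock_frequency_hz)
cpi_value = cpi(clock_cycles, instruction_count)
ipc_value = ipc(clock_cycles, instruction_count)
mips_value = mips(instruction_count, t)
mflops_value = mflops(fp_operations, t)

print(f"runtime = {t} s, CPI = {cpi_value}, IPC = {ipc_value:.3f}")
print(f"MIPS = {mips_value}, MFLOPS = {mflops_value}")
```

The same MIPS value follows from the alternative form, clock frequency divided by CPI times 10^6, which is a quick consistency check on the definitions.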

6 Drawbacks of performance measures
- CPI, IPC, MIPS and MFLOPS depend on the instruction set.
- CPI, IPC, MIPS and MFLOPS depend on the program.
- CPI, IPC, MIPS and MFLOPS depend on the microarchitecture.
Conclusions:
- Greater MIPS or MFLOPS ratings do not necessarily mean more performance!
- It is vitally important to choose well-suited test applications (benchmarks)!
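A small worked example may help show why a higher MIPS rating does not imply a shorter runtime. The two machines below are hypothetical: they run the same task, but their instruction sets need different numbers of instructions for it.

```python
def runtime_s(instruction_count, cpi, clock_frequency_hz):
    """runtime = instruction count * CPI * clock period."""
    return instruction_count * cpi / clock_frequency_hz


def mips(instruction_count, runtime):
    """Million instructions per second."""
    return instruction_count / (runtime * 1e6)


# Hypothetical machines executing the same task (assumed values)
machines = {
    "A": {"instruction_count": 10e9, "cpi": 1.0, "clock_frequency_hz": 1e9},
    "B": {"instruction_count": 4e9, "cpi": 2.0, "clock_frequency_hz": 1e9},
}

results = {}
for name, m in machines.items():
    t = runtime_s(**m)
    results[name] = (t, mips(m["instruction_count"], t))
    print(f"machine {name}: runtime = {t:.1f} s, MIPS = {results[name][1]:.0f}")
```

Under these assumptions, machine A has twice the MIPS rating of machine B yet needs more time for the task, because it executes far more instructions to do the same work.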

7 Measurements
Benchmarks:
- Existing or synthetic programs are used to measure the performance.
- These programs are compiled and executed on the evaluated computer.
- Therefore, not only the computer hardware but also the compiler influences the outcome of a benchmark.
Monitoring:
- Monitors are used to observe parts of the computer at run time.
- Therefore, interesting quantities inside the computer can be measured in addition to the overall outcome of a benchmark (e.g. cache utilization, network traffic).
- Monitoring can be done by hardware or software.

8 Benchmark terminology
- benchmark: A test program.
- benchmark suite: A set of benchmarks.
- synthetic benchmark: A test program that is only useful as a benchmark.
- kernel benchmark: A very small synthetic benchmark. Usually a time-intensive part of a real program is chosen. Kernel benchmarks are well suited for design and simulation but normally unsuitable for comparing complete systems.
- benchmark application: A complete program that is also used as a benchmark. The opposite of a synthetic benchmark.

9 SPEC benchmarks
SPEC: Standard Performance Evaluation Corporation
- Since 1989, a consortium of different manufacturers
- General-purpose computer applications, mainly to measure speed and throughput
Several benchmark suites, e.g.:
- SPEC95
- SPECweb96
- SPEC JVM98
- SPEC JBB2000
- SPEC CINT 2006
- SPEC CFP 2006

10 SPECmarks
- Goal: comparable values for different systems
- But: single values do not always reflect real relations, so they are only a first indication for selecting or judging a computer
- CPU performance plus cache, memory and compiler is measured; the operating system and I/O are less relevant
- Integer test programs (ANSI C)
- Floating-point test programs (Fortran 77)
- SPECmark: the geometric mean of the individual program characteristics contained in the suite
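The geometric mean behind a SPECmark can be sketched as follows. The per-program values are hypothetical ratios, assumed here to be reference time divided by measured time, and are not real SPEC results.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-program ratios (assumed: reference time / measured time)
ratios = [2.0, 8.0, 4.0]

specmark = geometric_mean(ratios)
arithmetic = sum(ratios) / len(ratios)
print(f"geometric mean = {specmark:.3f}, arithmetic mean = {arithmetic:.3f}")
```

One reason the geometric mean suits ratios is that a single unusually high value pulls it up less than it would pull up the arithmetic mean.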

11 SPEC-CINT2006: 12 integer test programs (C, C++)
name         description
perlbench    PERL interpreter
bzip2        bzip compression program
gcc          GNU C compiler version 3.2
mcf          Simplex algorithm for traffic planning
gobmk        AI implementation of the game Go
hmmer        Protein sequence analysis based on a hidden Markov model
sjeng        Chess program
libquantum   Quantum computer simulator
h264ref      H.264 codec
omnetpp      OMNET++ discrete event simulator
astar        Route planning
xalancbmk    XML translator

12 SPEC-CFP2006: 17 floating-point test programs (C, C++, FORTRAN)
name         description
bwaves       Fluid dynamics algorithm
gamess       Quantum chemistry algorithm
milc         Physics algorithm
zeusmp       Fluid dynamics algorithm
gromacs      Newton's equations of motion
cactusADM    Equation solver for Einstein's evolutionary equation
leslie3d     Fluid dynamics algorithm
namd         Biomolecular simulation
dealII       Finite elements
soplex       Simplex algorithm
povray       Image rendering
calculix     Finite elements
GemsFDTD     Maxwell equation solver
tonto        Quantum chemistry
lbm          Lattice-Boltzmann simulator
wrf          Weather modeling
sphinx3      Speech recognition

13 More popular benchmark suites
Basic Linear Algebra Subprograms (BLAS):
- For numerical applications
- Core of the LINPACK software package for solving linear equation systems
- LINPACK is used for the TOP 500 list of the fastest parallel computers
Whetstone benchmark:
- Developed in the seventies, a single program with many floating-point calculations
Dhrystone benchmark:
- Improvement of Whetstone, developed in the eighties
Powerstone benchmark suite:
- To compare the energy consumption of microprocessors and microcontrollers

14 Powerstone benchmark suite
name       description
auto       Vehicle control
bilv       Logical and shift operations
bilt       Graphical application
compress   UNIX compression program
crc        CRC error detection
des        Data encryption
dhry       Dhrystone
engine     Engine control
fir_int    Integer FIR filter
g3fax      FAX group 3
g721       Audio compression
jpeg       JPEG 24-bit compression
pocsag     Communication protocol for pagers
servo      Hard disc control
summin     Handwriting recognition
ucbqsort   Quick sort
v42bits    Modem operation
whet       Whetstone

15 Monitoring
- Monitors are components that record the states of a system during its normal operation.
- Contents of registers, flags and buffers, as well as traffic on data paths, are recorded.
- Monitors are used to observe and debug systems.

16 Monitoring
Generally, monitors can be classified into:
a) Hardware monitors
- A hardware monitor is a separate component that is physically connected to the locations of the target system where measurements take place.
- Hardware monitors typically consist of comparators and counters to create data, memories to store it, and buses for data transport.
- Thus, hardware monitors use their own resources.

17 Monitoring
b) Software monitors
- A software monitor is a program that collects measurement data through interfaces provided by the operating system, the programming language or the application program.
- A software monitor uses the resources of the observed system to collect, transport and store data.
c) Hybrid monitors
- A hybrid monitor is a mixed hardware and software monitor.
- Often simple elements like counters and memories are implemented in hardware, while more complex observation functions are implemented in software.

18 Monitoring constraints
1. Accessing information
- Ideally, monitoring is integrated into the hardware and software components of a system during design.
- Software monitors are cheaper than hardware monitors, but they may influence the system's run-time behavior.
2. Reactionless monitoring
- Hardware monitors and most hybrid monitors store the recorded data in their own memories.
- Software monitors have to use the memory of the observed system.
- Thus, hardware monitors disturb the observed system less than software monitors do.

19 Monitoring constraints
3. Amount of recorded data and its further processing
- Most purposes, especially debugging, require observations with high resolution. For the accurate analysis of program errors, the machine instruction causing the error has to be identified.
- For other purposes, e.g. a global performance analysis, a coarser resolution is sufficient.
- Although it often seems necessary to record observable data at the level of machine instruction execution, this would generate traces much larger than the memory usage of the observed application.
- Thus, the cost of storing this large amount of data and the general difficulty of processing the trace data prohibit a complete recording of traces at machine instruction level.
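A rough back-of-the-envelope estimate illustrates how quickly an instruction-level trace grows. The instruction rate, record size, and duration below are assumptions chosen for illustration, not measurements of any particular system.

```python
def trace_bytes(instructions_per_second, bytes_per_record, seconds):
    """Size of a trace that stores one record per executed machine instruction."""
    return instructions_per_second * bytes_per_record * seconds


# Assumed values: 10^9 instructions per second, 16 bytes per trace record
# (for example an address plus a timestamp), and 60 seconds of observation.
total = trace_bytes(1_000_000_000, 16, 60)
print(f"trace size: {total / 1e9:.0f} GB")
```

Under these assumptions a single minute of execution already produces a trace on the order of hundreds of gigabytes, which is why coarser resolutions or selective recording are usually preferred.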

20 Instrumentation
- One way of software monitoring is to insert measuring commands into the program code, e.g. loop or time counters. This is called instrumentation.
- Instrumentation can be performed by the user, the compiler, the class library or the operating system.
[Figure: an instrumented program runs on the computer system and produces results plus measurement results]
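Manual instrumentation by the user can be sketched as below. The `Probe` class and the function name are hypothetical helpers introduced only for this example; the inserted measuring commands are a loop counter and a time counter based on Python's `time.perf_counter`.

```python
import time


class Probe:
    """Hypothetical measuring helper: a loop counter and an elapsed-time counter."""

    def __init__(self):
        self.count = 0
        self.elapsed_s = 0.0


def instrumented_sum_of_squares(n, probe):
    """Sum of squares of 0..n-1 with measuring commands inserted by hand."""
    total = 0
    start = time.perf_counter()          # inserted: start of time counter
    for i in range(n):
        probe.count += 1                 # inserted: loop counter
        total += i * i                   # original computation
    probe.elapsed_s += time.perf_counter() - start  # inserted: stop time counter
    return total


probe = Probe()
result = instrumented_sum_of_squares(1000, probe)
print(f"result = {result}")
print(f"loop iterations = {probe.count}, elapsed = {probe.elapsed_s:.6f} s")
```

The program result is unchanged, but the inserted commands themselves consume time and memory of the observed system, which is the influence on run-time behavior that software monitoring has to accept.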

21 Monitoring overview
method            system state         accuracy       tools
direct            hardware             very high      hardware monitor
instrumentation   hardware             high           instrumented program
trace driven      hard- and software   satisfactory   simulation program + hardware trace
simulation        software             sufficient     simulation program

22 Typical load-dependent parameters
- throughput: The average number of jobs completed per time unit. A job may be the execution of an instruction or a program, saving a data block, or sending a message.
- utilization: The throughput (average number of jobs completed) divided by the maximum possible throughput.
- response time: The average time needed to complete a job.
- utilization ratio: The time spent working on the jobs divided by the whole operating time.
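The four parameters can be computed from a simple job log, as in the following sketch. It assumes a single resource that serves one job at a time, and it takes response time to run from a job's arrival to its completion; the job times and the maximum throughput are invented values.

```python
def load_parameters(jobs, observation_time_s, max_throughput):
    """Compute load-dependent parameters from (arrival, start, end) job times.

    Assumes one resource serving one job at a time, so busy intervals do not overlap.
    """
    if not jobs or observation_time_s <= 0 or max_throughput <= 0:
        raise ValueError("need at least one job and positive time and throughput")
    completed = len(jobs)
    throughput = completed / observation_time_s
    busy_time = sum(end - start for _, start, end in jobs)
    return {
        "throughput": throughput,                            # jobs per second
        "utilization": throughput / max_throughput,          # fraction of maximum
        "response_time": sum(end - arrival for arrival, _, end in jobs) / completed,
        "utilization_ratio": busy_time / observation_time_s, # busy time share
    }


# Hypothetical job log: (arrival, start of service, end of service) in seconds
jobs = [(0.0, 0.0, 2.0), (1.0, 2.0, 3.0), (5.0, 5.0, 8.0)]
params = load_parameters(jobs, observation_time_s=10.0, max_throughput=1.0)
for key, value in params.items():
    print(f"{key}: {value:.3f}")
```

In this example the second job waits one second before service starts, so its response time is longer than its service time even though the resource is busy for only part of the observation period.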


More information

Operating Systems, 6 th ed. Test Bank Chapter 7

Operating Systems, 6 th ed. Test Bank Chapter 7 True / False Questions: Chapter 7 Memory Management 1. T / F In a multiprogramming system, main memory is divided into multiple sections: one for the operating system (resident monitor, kernel) and one

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

Masters in Human Computer Interaction

Masters in Human Computer Interaction Masters in Human Computer Interaction Programme Requirements Taught Element, and PG Diploma in Human Computer Interaction: 120 credits: IS5101 CS5001 CS5040 CS5041 CS5042 or CS5044 up to 30 credits from

More information

Masters in Advanced Computer Science

Masters in Advanced Computer Science Masters in Advanced Computer Science Programme Requirements Taught Element, and PG Diploma in Advanced Computer Science: 120 credits: IS5101 CS5001 up to 30 credits from CS4100 - CS4450, subject to appropriate

More information

Lattice QCD Performance. on Multi core Linux Servers

Lattice QCD Performance. on Multi core Linux Servers Lattice QCD Performance on Multi core Linux Servers Yang Suli * Department of Physics, Peking University, Beijing, 100871 Abstract At the moment, lattice quantum chromodynamics (lattice QCD) is the most

More information

Figure 1: Graphical example of a mergesort 1.

Figure 1: Graphical example of a mergesort 1. CSE 30321 Computer Architecture I Fall 2011 Lab 02: Procedure Calls in MIPS Assembly Programming and Performance Total Points: 100 points due to its complexity, this lab will weight more heavily in your

More information

Masters in Artificial Intelligence

Masters in Artificial Intelligence Masters in Artificial Intelligence Programme Requirements Taught Element, and PG Diploma in Artificial Intelligence: 120 credits: IS5101 CS5001 CS5010 CS5011 CS4402 or CS5012 in total, up to 30 credits

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

Cloud Computing. Adam Barker

Cloud Computing. Adam Barker Cloud Computing Adam Barker 1 Overview Introduction to Cloud computing Enabling technologies Different types of cloud: IaaS, PaaS and SaaS Cloud terminology Interacting with a cloud: management consoles

More information

Is there any alternative to Exadata X5? March 2015

Is there any alternative to Exadata X5? March 2015 Is there any alternative to Exadata X5? March 2015 Contents 1 About Benchware Ltd. 2 Licensing 3 Scalability 4 Exadata Specifics 5 Performance 6 Costs 7 Myths 8 Conclusion copyright 2015 by benchware.ch

More information

CSEE W4824 Computer Architecture Fall 2012

CSEE W4824 Computer Architecture Fall 2012 CSEE W4824 Computer Architecture Fall 2012 Lecture 2 Performance Metrics and Quantitative Principles of Computer Design Luca Carloni Department of Computer Science Columbia University in the City of New

More information

2: Computer Performance

2: Computer Performance 2: Computer Performance http://people.sc.fsu.edu/ jburkardt/presentations/ fdi 2008 lecture2.pdf... John Information Technology Department Virginia Tech... FDI Summer Track V: Parallel Programming 10-12

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

Testing & Assuring Mobile End User Experience Before Production. Neotys

Testing & Assuring Mobile End User Experience Before Production. Neotys Testing & Assuring Mobile End User Experience Before Production Neotys Agenda Introduction The challenges Best practices NeoLoad mobile capabilities Mobile devices are used more and more At Home In 2014,

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

OKLAHOMA SUBJECT AREA TESTS (OSAT )

OKLAHOMA SUBJECT AREA TESTS (OSAT ) CERTIFICATION EXAMINATIONS FOR OKLAHOMA EDUCATORS (CEOE ) OKLAHOMA SUBJECT AREA TESTS (OSAT ) FIELD 081: COMPUTER SCIENCE September 2008 Subarea Range of Competencies I. Computer Use in Educational Environments

More information

A Lab Course on Computer Architecture

A Lab Course on Computer Architecture A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

More information

A NOVEL RESOURCE EFFICIENT DMMS APPROACH

A NOVEL RESOURCE EFFICIENT DMMS APPROACH A NOVEL RESOURCE EFFICIENT DMMS APPROACH FOR NETWORK MONITORING AND CONTROLLING FUNCTIONS Golam R. Khan 1, Sharmistha Khan 2, Dhadesugoor R. Vaman 3, and Suxia Cui 4 Department of Electrical and Computer

More information

Online Adaptation for Application Performance and Efficiency

Online Adaptation for Application Performance and Efficiency Online Adaptation for Application Performance and Efficiency A Dissertation Proposal by Jason Mars 20 November 2009 Submitted to the graduate faculty of the Department of Computer Science at the University

More information

Applying Data Analysis to Big Data Benchmarks. Jazmine Olinger

Applying Data Analysis to Big Data Benchmarks. Jazmine Olinger Applying Data Analysis to Big Data Benchmarks Jazmine Olinger Abstract This paper describes finding accurate and fast ways to simulate Big Data benchmarks. Specifically, using the currently existing simulation

More information

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,

More information

Performance evaluation

Performance evaluation Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography

More information

Wiggins/Redstone: An On-line Program Specializer

Wiggins/Redstone: An On-line Program Specializer Wiggins/Redstone: An On-line Program Specializer Dean Deaver Rick Gorton Norm Rubin {dean.deaver,rick.gorton,norm.rubin}@compaq.com Hot Chips 11 Wiggins/Redstone 1 W/R is a Software System That: u Makes

More information

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah (DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation

More information

Precise and Accurate Processor Simulation

Precise and Accurate Processor Simulation Precise and Accurate Processor Simulation Harold Cain, Kevin Lepak, Brandon Schwartz, and Mikko H. Lipasti University of Wisconsin Madison http://www.ece.wisc.edu/~pharm Performance Modeling Analytical

More information

What is LOG Storm and what is it useful for?

What is LOG Storm and what is it useful for? What is LOG Storm and what is it useful for? LOG Storm is a high-speed digital data logger used for recording and analyzing the activity from embedded electronic systems digital bus and data lines. It

More information

Computing Performance Benchmarks among CPU, GPU, and FPGA

Computing Performance Benchmarks among CPU, GPU, and FPGA Computing Performance Benchmarks among CPU, GPU, and FPGA MathWorks Authors: Christopher Cullinan Christopher Wyant Timothy Frattesi Advisor: Xinming Huang Abstract In recent years, the world of high performance

More information

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is

More information

Benchmarking Large Scale Cloud Computing in Asia Pacific

Benchmarking Large Scale Cloud Computing in Asia Pacific 2013 19th IEEE International Conference on Parallel and Distributed Systems ing Large Scale Cloud Computing in Asia Pacific Amalina Mohamad Sabri 1, Suresh Reuben Balakrishnan 1, Sun Veer Moolye 1, Chung

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

Understanding applications using the BSC performance tools

Understanding applications using the BSC performance tools Understanding applications using the BSC performance tools Judit Gimenez (judit@bsc.es) German Llort(german.llort@bsc.es) Humans are visual creatures Films or books? Two hours vs. days (months) Memorizing

More information