
Types of Workloads Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/ 4-1

Overview! Terminology! Test Workloads for Computer Systems " Addition Instruction " Instruction Mixes " Kernels " Synthetic Programs " Application Benchmarks: Sieve, Ackermann's Function, Debit-Credit, SPEC 4-2

Part II: Measurement Techniques and Tools Measurements are not to provide numbers but insight - Ingrid Bucher 1. What are the different types of workloads? 2. Which workloads are commonly used by other analysts? 3. How are the appropriate workload types selected? 4. How is the measured workload data summarized? 5. How is the system performance monitored? 6. How can the desired workload be placed on the system in a controlled manner? 7. How are the results of the evaluation presented? 4-3

Terminology! Test workload: Any workload used in performance studies. Test workload can be real or synthetic.! Real workload: Observed on a system being used for normal operations.! Synthetic workload: " Similar to real workload " Can be applied repeatedly in a controlled manner " No large real-world data files " No sensitive data " Easily modified without affecting operation " Easily ported to different systems due to its small size " May have built-in measurement capabilities. 4-4

Test Workloads for Computer Systems 1. Addition Instruction 2. Instruction Mixes 3. Kernels 4. Synthetic Programs 5. Application Benchmarks 4-5

Addition Instruction! Processors were the most expensive and most used components of the system! Addition was the most frequent instruction 4-6

Instruction Mixes! Instruction mix = instructions + usage frequency! Gibson mix: Developed by Jack C. Gibson in 1959 for IBM 704 systems. 4-7
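Given a mix, the average instruction time is the frequency-weighted mean of the per-class times, and its reciprocal gives a MIPS-style rating. A minimal Python sketch; the class names, frequencies, and times below are illustrative, not the actual Gibson weights:

```python
# Average instruction time from an instruction mix: t_avg = sum(f_i * t_i).
# Classes, frequencies, and times are made-up illustrative values.
mix = {
    # class: (usage frequency, time in microseconds)
    "load/store":    (0.31, 0.5),
    "fixed add/sub": (0.18, 0.4),
    "branch":        (0.17, 0.3),
    "float ops":     (0.12, 1.2),
    "other":         (0.22, 0.6),
}
assert abs(sum(f for f, _ in mix.values()) - 1.0) < 1e-9  # frequencies sum to 1

t_avg = sum(f * t for f, t in mix.values())  # weighted mean instruction time
mips = 1.0 / t_avg  # times are in microseconds, so 1/t_avg = millions of instr/sec
print(f"average instruction time: {t_avg:.3f} us -> {mips:.2f} MIPS")
```

This is exactly the averaging a mix implies: the rating depends entirely on the assumed frequencies, which is the source of the disadvantages listed on the next slide.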

Instruction Mixes (Cont)! Disadvantages: " Complex classes of instructions not reflected in the mixes. " Instruction time varies with:! Addressing modes! Cache hit rates! Pipeline efficiency! Interference from other devices during processor-memory access cycles! Parameter values, for example:! Frequency of zeros as a parameter! The distribution of zero digits in a multiplier! The average number of positions of preshift in floating-point add! Number of times a conditional branch is taken 4-8

Instruction Mixes (Cont)! Performance Metrics: " MIPS = Millions of Instructions Per Second " MFLOPS = Millions of Floating Point Operations Per Second 4-9

Kernels! Kernel = nucleus! Kernel = the most frequent function! Commonly used kernels: Sieve, Puzzle, Tree Searching, Ackermann's Function, Matrix Inversion, and Sorting.! Disadvantages: Do not make use of I/O devices 4-10

Synthetic Programs! To measure I/O performance, analysts developed exerciser loops.! The first exerciser loop was by Buchholz (1969), who called it a synthetic program.! A Sample Exerciser: See program listing Figure 4.1 in the book 4-11

Synthetic Programs (Cont)! Advantages: " Quickly developed and given to different vendors. " No real data files. " Easily modified and ported to different systems. " Have built-in measurement capabilities. " Measurement process is automated. " Repeated easily on successive versions of the operating systems.! Disadvantages: " Too small. " Do not make representative memory or disk references. " Mechanisms for page faults and disk cache may not be adequately exercised. " CPU-I/O overlap may not be representative. " Loops may create synchronizations, resulting in better or worse performance. 4-12
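An exerciser loop of this kind can be sketched in a few lines. This is a toy Python version in the spirit of Buchholz's synthetic program, not the book's Figure 4.1; the record counts, record size, and compute burst are made up:

```python
# Toy exerciser loop: repeat a fixed mix of compute and I/O,
# with built-in measurement. All parameters are illustrative.
import os
import tempfile
import time

RECORDS, RECORD_SIZE, COMPUTE_ITERS = 200, 4096, 2000

def exerciser():
    start = time.perf_counter()
    with tempfile.TemporaryFile() as f:
        for _ in range(RECORDS):
            f.write(os.urandom(RECORD_SIZE))   # I/O: write one record
            s = sum(range(COMPUTE_ITERS))      # CPU: fixed compute burst
        f.seek(0)
        while f.read(RECORD_SIZE):             # I/O: read the records back
            pass
    return time.perf_counter() - start         # built-in measurement

elapsed = exerciser()
print(f"exerciser elapsed time: {elapsed:.4f} s")
```

Because the mix of compute and I/O is fixed by the parameters, the same loop can be rerun unchanged across systems or OS versions, which is the repeatability advantage listed above.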

Application Benchmarks! For a particular industry: Debit-Credit for banks! Benchmark = workload (except instruction mixes)! Some authors: Benchmark = a set of programs taken from real workloads! Popular Benchmarks 4-13

Sieve! Based on Eratosthenes' sieve algorithm: find all prime numbers below a given number n.! Algorithm: " Write down all integers from 1 to n. " Strike out all multiples of k, for k = 2, 3, ..., up to √n.! Example: " Write down all numbers from 1 to 20. Mark all as prime: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 " Remove all multiples of 2 from the list of primes: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19 4-14

Sieve (Cont)! The next prime in the sequence is 3. Remove all multiples of 3: 1, 2, 3, 5, 7, 11, 13, 17, 19! The next candidate is 5, but 5 > √20, so stop.! Pascal Program to Implement the Sieve Kernel: See program listing Figure 4.2 in the book 4-15
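The book's listing is in Pascal; the same kernel can be sketched in Python (using the modern convention that 1 is not counted as prime):

```python
def sieve(n):
    """Primes <= n by Eratosthenes' sieve: mark all numbers, then strike
    out multiples of each k = 2, 3, ... while k*k <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]        # 0 and 1 are not prime
    k = 2
    while k * k <= n:                     # stop once k exceeds sqrt(n)
        if is_prime[k]:
            for m in range(k * k, n + 1, k):
                is_prime[m] = False       # strike out multiples of k
        k += 1
    return [i for i, p in enumerate(is_prime) if p]

print(sieve(20))  # -> [2, 3, 5, 7, 11, 13, 17, 19]
```

Run with n = 20, it reproduces the worked example above (minus the 1).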

Ackermann's Function! Used to assess the efficiency of the procedure-calling mechanism. The function has two parameters and is defined recursively.! Ackermann(3, n) is evaluated for values of n from one to six.! Metrics: " Average execution time per call " Number of instructions executed per call, and " Stack space per call! Verification: Ackermann(3, n) = 2^(n+3) - 3! Number of recursive calls in evaluating Ackermann(3, n): (512·4^(n-1) - 15·2^(n+3) + 9n + 37)/3. Dividing the measured time by this count gives the execution time per call.! Depth of the procedure calls = 2^(n+3) - 4, so the stack space required doubles when n increases to n+1. 4-16

Ackermann Program in Simula! See program listing Figure 4.3 in the book 4-17

Other Benchmarks! Whetstone! U.S. Steel! LINPACK! Dhrystone! Doduc! TOPS! Lawrence Livermore Loops! Digital Review Labs! Abingdon Cross Image-Processing Benchmark 4-18

Debit-Credit Benchmark! A de facto standard for transaction processing systems.! First recorded in Anon et al. (1985).! In 1973, a retail bank wanted to put its 1000 branches, 10,000 tellers, and 10,000,000 accounts online with a peak load of 100 Transactions Per Second (TPS).! Each TPS of capacity requires 10 branches, 100 tellers, and 100,000 accounts. 4-19

Debit-Credit (Cont) 4-20

Debit-Credit Benchmark (Continued)! Metric: price/performance ratio.! Performance: Throughput in terms of TPS such that 95% of all transactions provide one second or less response time.! Response time: Measured as the time interval between the arrival of the last bit from the communications line and the sending of the first bit to the communications line.! Cost = Total expenses for a five-year period on purchase, installation, and maintenance of the hardware and software in the machine room.! Cost does not include expenditures for terminals, communications, application development, or operations. 4-21
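The qualification rule above (throughput counts only if 95% of transactions respond within one second) can be sketched directly; the measurement data below is hypothetical:

```python
# Debit-Credit-style qualification check (illustrative): an offered load
# qualifies only if at least 95% of its transaction response times
# are <= 1 second.
def qualifies(response_times, quantile=0.95, limit=1.0):
    within = sum(1 for t in response_times if t <= limit)
    return within / len(response_times) >= quantile

# Hypothetical response-time samples at two offered loads:
at_80_tps = [0.3] * 96 + [1.4] * 4     # 96% within 1 s -> qualifies
at_120_tps = [0.5] * 90 + [2.0] * 10   # only 90% within 1 s -> does not
print(qualifies(at_80_tps), qualifies(at_120_tps))   # True False
```

The reported TPS is then the highest offered load that still qualifies; the price/performance metric divides the five-year cost by that figure.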

Pseudo-code Definition of Debit-Credit! See Figure 4.5 in the book! Four record types: account, teller, branch, and history.! Fifteen percent of the transactions require remote access.! The Transaction Processing Performance Council (TPC) was formed in August 1988.! TPC Benchmark™ A is a variant of the debit-credit benchmark.! Metric: TPS such that 90% of all transactions provide two seconds or less response time. 4-22

SPEC Benchmark Suite! Systems Performance Evaluation Cooperative (SPEC): Nonprofit corporation formed by leading computer vendors to develop a standardized set of benchmarks.! Release 1.0 consists of the following 10 benchmarks: GCC, Espresso, Spice 2g6, Doduc, NASA7, LI, Eqntott, Matrix300, Fpppp, Tomcatv.! Primarily stress the CPU, Floating Point Unit (FPU), and to some extent the memory subsystem, to compare CPU speeds.! Benchmarks to compare I/O and other subsystems may be included in future releases. 4-23

SPEC (Cont)! The elapsed time to run two copies of a benchmark on each of the N processors of a system (a total of 2N copies) is measured and compared with the time to run two copies of the benchmark on a reference system (which is the VAX-11/780 for Release 1.0).! For each benchmark, the ratio of the time on the reference system to that on the system under test is reported as SPECthruput, using the notation #CPU@Ratio. For example, a system with three CPUs taking 1/15 as long as the reference system on the GCC benchmark has a SPECthruput of 3@15.! A measure of the per-processor throughput relative to the reference system. 4-24

SPEC (Cont)! The aggregate throughput for all processors of a multiprocessor system can be obtained by multiplying the ratio by the number of processors. For example, the aggregate throughput for the above system is 45.! The geometric mean of the SPECthruputs for the 10 benchmarks is used to indicate the overall performance for the suite and is called SPECmark. 4-25
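The SPECthruput arithmetic on these two slides can be sketched as follows; the ten ratios are hypothetical, not published results:

```python
import math

# Hypothetical SPECthruput ratios for the 10 Release 1.0 benchmarks on a
# 3-CPU system. SPECmark is their geometric mean; aggregate throughput
# multiplies each per-processor ratio by the number of CPUs.
ratios = [15.0, 12.5, 18.2, 9.8, 14.1, 11.3, 22.0, 13.7, 16.4, 10.9]
n_cpus = 3

specmark = math.prod(ratios) ** (1 / len(ratios))   # geometric mean
aggregate = [n_cpus * r for r in ratios]            # e.g. 3@15 -> 45

print(f"SPECmark = {specmark:.2f}")
print(f"aggregate throughput on first benchmark: {aggregate[0]:.0f}")  # 45
```

The geometric mean is used (rather than the arithmetic mean) so that the summary is independent of which system is chosen as the reference.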

Summary! Synthetic workloads are representative, repeatable, and avoid sensitive information.! The addition instruction was initially the most frequent instruction and served as the workload.! Instruction mixes, kernels, synthetic programs.! Application benchmarks: Sieve, Ackermann, ...! Benchmark standards: Debit-Credit, SPEC 4-26

Exercise 4.1 Select an area of computer systems (for example, processor design, networks, operating systems, or databases), review articles on performance evaluation in that area and make a list of benchmarks used in those articles. 4-27

Exercise 4.2 Implement the Sieve workload in a language of your choice, run it on systems available to you, and report the results. 4-28

Homework 4! Make a list of the latest workloads from www.spec.org 4-29