Leistungsanalyse von Rechnersystemen (Performance Analysis of Computer Systems)


Center for Information Services and High Performance Computing (ZIH)
Leistungsanalyse von Rechnersystemen (Performance Analysis of Computer Systems)
29 October 2008
Nöthnitzer Straße 46, Room 1026
Holger Brunst (holger.brunst@tu-dresden.de)
Matthias S. Mueller (matthias.mueller@tu-dresden.de)

Summary of Previous Lecture (1)
- Remark by Doherty (1970): Performance is the degree to which a computing system meets the expectations of the persons involved in it.
- Main objective: get the highest performance for a given cost.
- System: an arbitrary collection of hardware, software, and firmware, e.g. a CPU, a database, a network of computers.
- Metric: a criterion used to evaluate the performance of a system, e.g. response time, throughput, FLOPS.
- Workload: the overall sum of user requests to a system, e.g. CPU workload: the collection of instructions to execute.

Summary of Previous Lecture (2)
- Discussion of performance analysis examples and questions:
  - Selection of technique, metric, and workload
  - Correctness of performance measurements
  - Measurement and simulation design
- The art of performance analysis:
  - A successful evaluation cannot be produced mechanically
  - Evaluation requires detailed knowledge of the system to be modeled

Summary of Previous Lecture (3)
- Knowledge of common mistakes and games is important for
  - choosing the right methodology as an analyst
  - questioning offers, recommendations, and advertisements as a consumer, buying agent, or decision maker
- Classes of common mistakes: goals, methodology, completeness, analysis, presentation
- Checklist for avoiding problems
- Systematic approach to performance evaluation

Summary of Previous Lecture: Questions
- What does performance mean?
- What are the main reasons to do a performance analysis?
- What are the main tasks?
- What is a system in performance analysis terminology?
- What do the terms metric and workload stand for?
- What is a performance parameter?
- What is a performance factor?

Parallel Metrics

Excursion on Speedup and Efficiency Metrics
- Comparison of sequential and parallel algorithms
- Speedup: $S_n = \frac{T_1}{T_n}$
  - $n$ is the number of processors
  - $T_1$ is the execution time of the sequential algorithm
  - $T_n$ is the execution time of the parallel algorithm with $n$ processors
- Efficiency: $E_p = \frac{S_p}{p}$
  - Its value estimates how well utilized $p$ processors are when solving a given problem
  - Usually between zero and one. Exception: superlinear speedup (later)

Amdahl's Law
- Find the maximum expected improvement to an overall system when only part of the system is improved
- In the following, $s$ is the serial part and $p$ the parallelizable part of the execution time
- Serial execution time = $s + p$
- Parallel execution time = $s + p/n$
- $S_n = \frac{s + p}{s + p/n}$
- Normalizing with respect to the serial time ($s + p = 1$) results in $S_n = \frac{1}{s + p/n}$
- Drops off rapidly as the serial fraction increases
- Maximum speedup possible = $1/s$, independent of $n$, the number of processors!
- Bad news: if an application has only 1% serial work ($s = 0.01$), you will never see a speedup greater than 100. So why do we build systems with more than 100 processors? What is wrong with this argument?
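As a worked illustration of the speedup, efficiency, and Amdahl formulas above, here is a minimal Python sketch. The function names are illustrative and not part of the lecture; `s` is the serial fraction under the normalization s + p = 1.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup S_n = T_1 / T_n from measured execution times."""
    return t1 / tn


def efficiency(s_n: float, n: int) -> float:
    """Efficiency E_n = S_n / n for a speedup obtained on n processors."""
    return s_n / n


def amdahl_speedup(s: float, n: int) -> float:
    """Normalized Amdahl speedup 1 / (s + p/n) with p = 1 - s."""
    p = 1.0 - s
    return 1.0 / (s + p / n)


if __name__ == "__main__":
    # 1% serial work: the speedup approaches, but never reaches, 1/s = 100.
    for n in (10, 100, 1000, 10000):
        s_n = amdahl_speedup(0.01, n)
        print(f"n={n:6d}  S_n={s_n:7.2f}  E_n={efficiency(s_n, n):.3f}")
```

With 1% serial work, 100 processors yield a speedup of only about 50, and the efficiency keeps falling as more processors are added.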

Scaled Speedup (Gustafson-Barsis Law)
- Amdahl's speedup equation assumes that $p$ is independent of $n$; in other words, the problem size remains the same
- The Gustafson-Barsis law states that any sufficiently large problem can be efficiently parallelized
- It is more realistic to assume that the runtime remains the same, NOT the problem size
- If the problem size scales up, does the serial part also increase?
- Parallel execution time = $s + p$
- Serial execution time = $s + np$
- $S_{sn} = \frac{s + pn}{s + p}$
- Normalizing with respect to the parallel execution time ($s + p = 1$) results in $S_{sn} = n + (1 - n)s = p(n - 1) + 1$

Efficiency and Serial Fraction
- Strong scalability vs. weak scalability
- $E_n = S_n / n$ does not tell the whole story
  - Is it necessarily bad if efficiency drops as you increase $n$ for a given problem size?
- $s$ is supposed to be a constant
  - This assumes that the work is load balanced
  - and that there is no overhead for synchronizing the processors
- Experimentally measure the serial fraction
  - If $s$ does not remain constant, what can we discern?
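The scaled speedup and the idea of measuring the serial fraction can be sketched in Python as follows. The slides do not say how the serial fraction is to be measured; the second function uses the Karp-Flatt metric, which simply solves Amdahl's formula for s, and that choice is an assumption made here. All function names are illustrative.

```python
def scaled_speedup(s: float, n: int) -> float:
    """Gustafson-Barsis scaled speedup n + (1 - n) * s, assuming s + p = 1."""
    return n + (1 - n) * s


def measured_serial_fraction(speedup: float, n: int) -> float:
    """Karp-Flatt estimate of the serial fraction (assumed method).

    Obtained by solving Amdahl's S = 1 / (s + (1 - s) / n) for s.
    """
    if n < 2:
        raise ValueError("need at least two processors")
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # With 1% serial time in the parallel run, 100 processors scale almost linearly.
    print(scaled_speedup(0.01, 100))
    # A hypothetical measured speedup of 50.25 on 100 processors
    # corresponds to a serial fraction of roughly 1%.
    print(measured_serial_fraction(50.25, 100))
```

If the estimated serial fraction grows with n, the loss of efficiency is more likely due to parallel overhead or load imbalance than to inherently serial work.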

Superlinear/Superunitary Speedup
- Work in algorithm = $W_{real} + W_{ovhd}$
- What is $W_{ovhd}$?
- Super-unitary speedup is possible if the total work done by $n$ processors is strictly less than that done by a single processor
- Reasons for super-unitary speedup:
  - Memory and cache effects
  - Dividing up resource management overheads
  - Hiding latency for remote operations
  - Randomized algorithms
- In the literature, superlinear speedup is sometimes also referred to as superunitary speedup, which might be mathematically more correct

Workload types, selection and characterization

Types of Workloads
- Test workload: any workload used in performance studies
  - Real or synthetic
- Real workload: observed on a system being used for normal operation
  - Cannot be repeated
  - May contain sensitive data
- Synthetic workload:
  - Should be representative of a real workload
  - Often smaller in size
- Historical examples of test workloads:
  - Addition instruction
  - Instruction mixes
  - Kernels
  - Synthetic programs
  - Application benchmarks

Popular benchmarks: Eratosthenes sieve algorithm
- Algorithm to find prime numbers
- Kernel
- Simple
- An algorithm is always independent of a computer language or specific implementation
- Not very representative of today's use of computers

Popular benchmarks: Ackermann's Function
- Ackermann(m,n) :=
  - n+1 if m=0
  - Ackermann(m-1,1) if n=0
  - Ackermann(m-1, Ackermann(m,n-1)) otherwise
- Used to assess the efficiency of procedure calls
- Ackermann(3,n) requires (512*4**(n-1)-15*2**(n+3)+9*n+37)/3 calls and a stack size of 2**(n+3)-4
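Both kernels above are small enough to write down directly. The following Python sketch is one possible rendering, not the historical benchmark code; the helper names are illustrative. The Ackermann variant counts its own calls so that the call-count formula on the slide can be checked for small n.

```python
def sieve(limit: int) -> list[int]:
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= limit:
        if is_prime[i]:
            # Cross out every multiple of i, starting at i*i.
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]


def ackermann_with_count(m: int, n: int) -> tuple[int, int]:
    """Return (Ackermann(m, n), number of procedure calls made)."""
    calls = 0

    def ack(m: int, n: int) -> int:
        nonlocal calls
        calls += 1
        if m == 0:
            return n + 1
        if n == 0:
            return ack(m - 1, 1)
        return ack(m - 1, ack(m, n - 1))

    return ack(m, n), calls


def predicted_calls(n: int) -> int:
    """Call count for Ackermann(3, n) from the slide formula.

    512 * 4**(n-1) is written as 128 * 4**n to stay in integer arithmetic.
    """
    return (128 * 4 ** n - 15 * 2 ** (n + 3) + 9 * n + 37) // 3


if __name__ == "__main__":
    print(sieve(30))
    for n in range(4):
        value, calls = ackermann_with_count(3, n)
        print(n, value, calls, predicted_calls(n))
```

Only small arguments are practical here: the recursion depth and the call count grow exponentially in n, which is exactly why the function stresses the procedure-call mechanism.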

Popular benchmarks: Whetstone
- Used at the British Central Computer Agency
- 11 modules
- Representative of 949 ALGOL programs
- Available in ALGOL, FORTRAN, PL/I and other languages
- See Curnow and Wichmann (1975)
- Results in KWIPS (Kilo Whetstone Instructions Per Second)
- Workload characteristics:
  - Floating point intensive
  - Cache friendly
  - No I/O

Popular benchmarks: LINPACK
- Developed by Jack Dongarra (1983) at ANL (now ICL, UTK)
- Solves a dense system of linear equations
- Algorithmic definition of the benchmark
- Reference implementation available (HPL)
- Makes heavy use of BLAS
- One fixed dataset: 100x100
- Used as the benchmark for the TOP500 list
- Many vendors have their own hand-tuned implementation

Popular benchmarks: Dhrystone
- Developed in 1984 by Reinhold Weicker at Siemens
- Represents systems programming environments
- Available in C, Pascal and Ada
- Results are in Dhrystone Instructions Per Second (DIPS)
- Includes ground rules for building and executing Dhrystone (run rules)

Popular benchmarks: Lawrence Livermore Loops
- 24 separate tests
- Largely vectorizable
- Assembled at LLNL (see McMahon 1986)

Popular benchmarks: Transaction Processing (TPC-C)
- Successor of the Debit-Credit benchmark
- TPC-C is an on-line transaction processing benchmark
- Results report performance (tpmC) and price/performance ($/tpmC)
- The system reported has to be available to the customer (at that price)
- Running the benchmark requires a costly setup

SPEC groups and benchmarks
- Open Systems Group (desktop systems, high-end workstations and servers)
  - CPU (CPU benchmarks)
  - JAVA (Java client and server side benchmarks)
  - MAIL (mail server benchmarks)
  - SFS (file server benchmarks)
  - WEB (web server benchmarks)
- High Performance Group (HPC systems)
  - OMP (OpenMP benchmark)
  - HPC (HPC application benchmark)
  - MPI (MPI application benchmark)
- Graphics Performance Groups (graphics)
  - Apc (graphics application benchmarks)
  - Opc (OpenGL performance benchmarks)

Workload Selection

System under Study
- Seems to be an easy thing to define
- Be aware of different abstraction layers
- Example: ISO/OSI reference model for computer networks:
  1. Application (mail, FTP)
  2. Presentation (data compression, ...)
  3. Session (dialogs)
  4. Transport (messages)
  5. Network (packets)
  6. Datalink (frames)
  7. Physical (bits)

Level of Detail of the Workload Description
Examples:
- Most frequent request (e.g. addition)
- Frequency of request types (instruction mix)
- Time-stamped sequence of requests
- Average resource demand (e.g. 20 I/O requests per second)
- Distribution of resource demands (not only the average, but also the probability distribution)

Representativeness
- After all, benchmarks are not an end in themselves; they should represent real workloads
- Different characteristics to consider:
  - Arrival rate of requests
  - Resource demands
  - Resource usage profile (sequence and amounts of resources used by an application)
- To be representative, a test workload has to follow the user behavior in a timely fashion!
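The levels of detail listed above can all be derived from the most detailed one, a time-stamped sequence of requests. The Python sketch below illustrates this on a tiny trace; the trace contents, field layout, and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical trace: (timestamp in seconds, request type, I/O operations needed)
TRACE = [
    (0.0, "read", 4),
    (0.4, "add", 0),
    (0.9, "read", 6),
    (1.3, "write", 10),
    (2.0, "add", 0),
    (2.5, "read", 5),
]


def most_frequent_request(trace):
    """Coarsest description: the single most frequent request type."""
    return Counter(kind for _, kind, _ in trace).most_common(1)[0][0]


def request_mix(trace):
    """Relative frequency of each request type (the 'instruction mix')."""
    counts = Counter(kind for _, kind, _ in trace)
    return {kind: count / len(trace) for kind, count in counts.items()}


def average_demand(trace):
    """Average resource demand per request (here: I/O operations)."""
    return mean(io for _, _, io in trace)


def demand_distribution(trace):
    """Empirical distribution of the resource demand, not just its mean."""
    counts = Counter(io for _, _, io in trace)
    return {io: count / len(trace) for io, count in sorted(counts.items())}


def arrival_rate(trace):
    """Requests per second, estimated from the first to the last arrival."""
    duration = trace[-1][0] - trace[0][0]
    return (len(trace) - 1) / duration


if __name__ == "__main__":
    print(most_frequent_request(TRACE))
    print(request_mix(TRACE))
    print(average_demand(TRACE), demand_distribution(TRACE))
    print(arrival_rate(TRACE))
```

Each summary discards information that the full trace still contains, which is the trade-off behind choosing a level of detail.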

SPEC Benchmarks
Lecture: Performance Analysis (Vorlesung Leistungsanalyse)

Outline
- What is SPEC?
- Who is SPEC?
- Some SPEC benchmarks:
  - SPEC CPU
  - SPEC HPC
  - SPEC OMP
  - SPEC MPI
- Summary

What and who is SPEC?

What is SPEC?
The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC develops suites of benchmarks and also reviews and publishes submitted results from our member organizations and other benchmark licensees.
For more details see

SPEC Members

SPEC Members: 3DLabs * Acer Inc. * Advanced Micro Devices * Apple Computer, Inc. * ATI Research * Azul Systems, Inc. * BEA Systems * Borland * Bull S.A. * CommuniGate Systems * Dell * EMC * Exanet * Fabric7 Systems, Inc. * Freescale Semiconductor, Inc. * Fujitsu Limited * Fujitsu Siemens * Hewlett-Packard * Hitachi Data Systems * Hitachi Ltd. * IBM * Intel * ION Computer Systems * JBoss * Microsoft * Mirapoint * NEC - Japan * Network Appliance * Novell * NVIDIA * Openwave Systems * Oracle * P.A. Semi * Panasas * PathScale * The Portland Group * S3 Graphics Co., Ltd. * SAP AG * SGI * Sun Microsystems * Super Micro Computer, Inc. * Sybase * Symantec Corporation * Unisys * Verisign * Zeus Technology

SPEC Associates: California Institute of Technology * Center for Scientific Computing (CSC) * Defence Science and Technology Organisation - Stirling * Dresden University of Technology * Duke University * JAIST * Kyushu University * Leibniz Rechenzentrum - Germany * National University of Singapore * New South Wales Department of Education and Training * Purdue University * Queen's University * Rightmark * Stanford University * Technical University of Darmstadt * Texas A&M University * Tsinghua University * University of Aizu - Japan * University of California - Berkeley * University of Central Florida * University of Illinois - NCSA * University of Maryland * University of Modena * University of Nebraska, Lincoln * University of New Mexico * University of Pavia * University of Stuttgart * University of Texas at Austin * University of Texas at El Paso * University of Tsukuba * University of Waterloo * VA Austin Automation Center

SPEC members in Dresden: Workshop June

SPEC groups
- Open Systems Group (desktop systems, high-end workstations and servers)
  - CPU (CPU benchmarks)
  - JAVA (Java client and server side benchmarks)
  - MAIL (mail server benchmarks)
  - SFS (file server benchmarks)
  - WEB (web server benchmarks)
- High Performance Group (HPC systems)
  - OMP (OpenMP benchmark)
  - HPC (HPC application benchmark)
  - MPI (MPI application benchmark)
- Graphics Performance Groups (graphics)
  - Apc (graphics application benchmarks)
  - Opc (OpenGL performance benchmarks)

SPEC HPG = SPEC High-Performance Group
- Founded in 1994
- Mission: to establish, maintain, and endorse a suite of benchmarks that are representative of real-world high-performance computing applications.
- SPEC/HPG includes members from both industry and academia.
- Benchmark products:
  - SPEC OMP (OMPM2001, OMPL2001)
  - SPEC HPC2002, released at SC 2002
  - SPEC MPI (under development)

Currently active SPEC HPG Members
- Fujitsu
- HP
- IBM
- Intel
- SGI
- SUN
- UNISYS
- University of Purdue
- Technische Universität Dresden

HPG (High Performance Group) Benchmark Suites
[Figure: timeline starting at the founding of SPEC HPG and covering the benchmark suites HPC96, OMP2001, OMPL2001, HPC2002, and MPI2007; surviving date labels: Jan, June 2001, June 2002, Jan]

Overview and Positioning

Where is SPEC Relative to Other Benchmarks?
There are many metrics; each one has its purpose. The spectrum runs from computer hardware to applications:
- Raw machine performance: Tflops
- Microbenchmarks: Stream
- Algorithmic benchmarks: Linpack
- Compact apps/kernels: NAS benchmarks
- Application suites: SPEC
- User-specific applications: custom benchmarks

Why do we need benchmarks?
- Identify problems: measure machine properties
- Time evolution: verify that we make progress
- Coverage: help the vendors to have representative codes
  - Increase competition by transparency
  - Drive future development (see SPEC CPU2000)
- Relevance: help the customers to choose the right computer

Comparison of different benchmark classes
[Table: benchmark classes Micro, Algorithmic, Kernels, SPEC, and Apps compared by coverage, relevance, identify problems, and time evolution; the cell ratings are not preserved]

SPEC CPU 2006
From John Henning's talk at the SPEC Workshop, June 2007, Dresden

SPEC CPU2006 History
- Released August 2006
- Replaces CPU2000 (retired February 2007)
- 5th CPU benchmark suite:
  - SPECmark (later called CPU89)
  - SPEC92 (later called CPU92)
  - CPU95
  - CPU2000
  - CPU2006
- Note: these updates are required to stay representative
- Question to the audience: What kind of application would you add?

CINT 2006
Benchmark | Language | Application Area | Brief Description
400.perlbench | C | Programming Language | Derived from Perl V. The workload includes SpamAssassin, MHonArc (an indexer), and specdiff (SPEC's tool that checks benchmark outputs).
401.bzip2 | C | Compression | Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O.
403.gcc | C | C Compiler | Based on gcc version 3.2, generates code for Opteron.
429.mcf | C | Combinatorial Optimization | Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport.
445.gobmk | C | Artificial Intelligence: Go | Plays the game of Go, a simply described but deeply complex game.
456.hmmer | C | Search Gene Sequence | Protein sequence analysis using profile hidden Markov models (profile HMMs).
458.sjeng | C | Artificial Intelligence: Chess | A highly-ranked chess program that also plays several chess variants.
462.libquantum | C | Physics / Quantum Computing | Simulates a quantum computer, running Shor's polynomial-time factorization algorithm.
464.h264ref | C | Video Compression | A reference implementation of H.264/AVC, encodes a videostream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2.
471.omnetpp | C++ | Discrete Event Simulation | Uses the OMNet++ discrete event simulator to model a large Ethernet campus network.
473.astar | C++ | Path-finding Algorithms | Pathfinding library for 2D maps, including the well known A* algorithm.
483.xalancbmk | C++ | XML Processing | A modified version of Xalan-C++, which transforms XML documents to other document types.

CFP 2006 (part I)
Benchmark | Language | Application Area | Brief Description
410.bwaves | Fortran | Fluid Dynamics | Computes 3D transonic transient laminar viscous flow.
416.gamess | Fortran | Quantum Chemistry | Implements a wide range of quantum chemical computations. The SPEC workload does self-consistent field calculations using the Restricted Hartree-Fock method, Restricted open-shell Hartree-Fock, and Multi-Configuration Self-Consistent Field.
433.milc | C | Physics / QCD | A gauge field generating program for lattice gauge theory with dynamical quarks.
434.zeusmp | Fortran | Physics / CFD | ZEUS-MP is a computational fluid dynamics code developed at the Laboratory for Computational Astrophysics (NCSA, University of Illinois at Urbana-Champaign) for the simulation of astrophysical phenomena.
435.gromacs | C, Fortran | Biochemistry / Molecular Dynamics | Simulates Newtonian equations of motion for hundreds to millions of particles. The test case simulates protein Lysozyme in a solution.
436.cactusADM | C, Fortran | Physics / General Relativity | Solves the Einstein evolution equations using a staggered-leapfrog numerical method.
437.leslie3d | Fortran | Fluid Dynamics | Computational Fluid Dynamics (CFD) using Large-Eddy Simulations with Linear-Eddy Model in 3D. Uses MacCormack Predictor-Corrector time integration.
444.namd | C++ | Biology / Molecular Dynamics | Simulates biomolecular systems. The test case has 92,224 atoms of apolipoprotein A-I.
447.dealII | C++ | Finite Element Analysis | deal.II is a C++ library targeted at adaptive finite elements and error estimation. The test case solves a Helmholtz-type equation with non-constant coefficients.

CFP 2006 (part II)
Benchmark | Language | Application Area | Brief Description
450.soplex | C++ | Linear Programming, Optimization | Solves a linear program using a simplex algorithm and sparse linear algebra. Test cases include railroad planning and military airlift models.
453.povray | C++ | Image Ray-tracing | Image rendering. The test case is a 1280x1024 antialiased image of a landscape with some abstract objects with textures using a Perlin noise function.
454.calculix | C, Fortran | Structural Mechanics | Finite element code for 3D structural applications. Uses the SPOOLES solver library.
459.GemsFDTD | Fortran | Electromagnetics | Solves Maxwell equations in 3D using the finite-difference time-domain (FDTD) method.
465.tonto | Fortran | Quantum Chemistry | An open source quantum chemistry package, using an object-oriented design in Fortran 95. The test case places a constraint on a molecular Hartree-Fock wavefunction calculation to better match experimental X-ray diffraction data.
470.lbm | C | Fluid Dynamics | Implements the "Lattice-Boltzmann Method" to simulate incompressible fluids in 3D.
481.wrf | C, Fortran | Weather | Weather modeling from scales of meters to thousands of kilometers. The test case is from a 30km area over 2 days.
482.sphinx3 | C | Speech Recognition | A widely-known speech recognition system from Carnegie Mellon University.

Code growth
[Figure: code growth chart; contents not preserved]

Metrics
- Speed
  - SPECint_base2006 (required base result)
  - SPECint2006 (optional peak result)
  - SPECfp_base2006 (required base result)
  - SPECfp2006 (optional peak result)
- Throughput
  - SPECint_rate_base2006 (required base result)
  - SPECint_rate2006 (optional peak result)
  - SPECfp_rate_base2006 (required base result)
  - SPECfp_rate2006 (optional peak result)

Speed Metric for Single Benchmark
- For each benchmark in the suite, compute the ratio vs. the time on a reference system
  - A 1997 Sun system with a 296 MHz UltraSPARC II
  - Similar but not identical to the CPU2000 reference machine
- Example:
  - 400.perlbench on a year 2006 iMac took 948 seconds
  - On the reference system, it took 9770 seconds
  - SPECratio = 9770/948 = 10.3
- If your workload looks like perl, you might find that this modern iMac runs around 10x faster than a state-of-the-1997-art workstation.
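The SPECratio arithmetic from the example above, written out as a minimal Python sketch; the function name is illustrative and not part of any SPEC tool.

```python
def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """SPECratio of one benchmark: reference time divided by measured time."""
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    return reference_seconds / measured_seconds


if __name__ == "__main__":
    # 400.perlbench: 9770 s on the reference system, 948 s on the 2006 iMac.
    print(round(spec_ratio(9770, 948), 1))
```

A ratio above 1 means the measured system is faster than the reference machine for that benchmark.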

Overall Speed Metric
- To obtain the overall speed metrics: geometric mean of the individual SPECratios
- Why geometric mean? Because this is the best answer to the question: "Without knowing how much time I will spend in text processing vs. network mapping vs. compiling vs. video compression, please tell me about how much faster this machine will be than the reference system."

Motivation for Throughput Metric
- Differs from speed
- Stove analogy:
  - One big flame cooks one big pot holding one hogshead in one hour
  - 6 little flames cook 6 little pots, each holding one firkin, in 15 minutes
- Which is better?
  - The big flame does ~250 liters/hour; each little flame does only ~40 * 4 = 160 liters/hour, but all six little flames together do ~6 * 160 = 960 liters/hour
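The overall speed metric described above is a geometric mean of SPECratios. The Python sketch below shows the computation and one property that makes the geometric mean attractive for ratios. The sample ratios are made up for illustration and are not SPEC results.

```python
import math


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean, computed via logarithms to avoid overflow."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need a non-empty list of positive ratios")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical SPECratios of two machines on the same three benchmarks.
    machine_a = [10.3, 8.0, 12.5]
    machine_b = [5.0, 16.0, 6.0]

    score_a = geometric_mean(machine_a)
    score_b = geometric_mean(machine_b)
    print(score_a, score_b)

    # The ratio of the two scores equals the geometric mean of the
    # per-benchmark ratios, so the comparison of A and B does not depend
    # on which reference machine was used to form the SPECratios.
    per_benchmark = [a / b for a, b in zip(machine_a, machine_b)]
    print(score_a / score_b, geometric_mean(per_benchmark))
```

An arithmetic mean of ratios lacks this property, which is one common argument for the geometric mean when no weighting of the individual benchmarks is known.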

Throughput vs. Speed
- The big flame does ~250 liters/hour; each little flame does only ~40 * 4 = 160 liters/hour
- Alternatives:
  - If I only need to heat up an UNOPENED container holding 1 gallon of soup, supper can be served most quickly if I put it on the big flame
  - If I need to heat up one butt of soup (= 2 hogsheads), and if I can open the container, I'd be better off using many small flames
- In IT business: processing one image in Photoshop or Gimp vs. rendering the next movie with thousands of pictures

CPU2006 Throughput Metric
- Formula: number of copies run * reference time for the benchmark / elapsed time in seconds
- Example: a Sun Fire E25K runs 144 copies of 400.perlbench in 1066 seconds: 144 * 9770 / 1066 ≈ 1320
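The throughput formula above can be checked with a few lines of Python. The function name is illustrative; the numbers are those of the Sun Fire E25K example.

```python
def spec_rate(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Throughput ratio of one benchmark: copies * reference time / elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return copies * reference_seconds / elapsed_seconds


if __name__ == "__main__":
    # 144 concurrent copies of 400.perlbench finish in 1066 seconds.
    print(round(spec_rate(144, 9770, 1066)))
```

With a single copy the expression reduces to the speed ratio, so the rate metric can be read as the speed ratio scaled by the number of copies completed in the same elapsed time.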

Summary of Metrics
- Two different kinds of metrics:
  - speed (single application turnaround)
  - rate (throughput)
- Run rules make the difference between base and peak
  - Base: conservative optimization, less freedom
  - Peak: more aggressive optimization, more freedom
- Two benchmark sets: SPECint and SPECfp
- $2^3 = 8$ different metrics
- If you look at the single application results you get 2*2*(12+17) = 116 different metrics

Example for Run Rules
- Base does not allow feedback-directed optimization (still legal in peak)
- An unlimited number of flags may be set in base. Why? Because flag counting is not worth arguing about. For example, is -fast:np27 one flag, two, or three? Prove it. What if it's -fast_np27? What if it's -fast np27 or -fast -np27?
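The count of 2^3 = 8 metrics follows from three independent binary choices (speed or rate, int or fp, base or peak). The Python sketch below enumerates them and reproduces the names listed on the Metrics slide; the function name is illustrative.

```python
from itertools import product


def cpu2006_metric_names() -> list[str]:
    """Enumerate the eight CPU2006 metric names from three binary choices."""
    names = []
    for mode, suite, tuning in product(("", "_rate"), ("int", "fp"), ("_base", "")):
        # mode: speed ("") or throughput ("_rate")
        # suite: integer or floating point benchmark set
        # tuning: base ("_base") or peak ("")
        names.append(f"SPEC{suite}{mode}{tuning}2006")
    return names


if __name__ == "__main__":
    for name in cpu2006_metric_names():
        print(name)
    # Per-benchmark results: 2 modes * 2 tunings * (12 CINT + 17 CFP benchmarks)
    print(2 * 2 * (12 + 17))
```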

SPEC CPU2000 Result

Thank You!


Performance metrics for parallel systems

Performance metrics for parallel systems Performance metrics for parallel systems S.S. Kadam C-DAC, Pune sskadam@cdac.in C-DAC/SECG/2006 1 Purpose To determine best parallel algorithm Evaluate hardware platforms Examine the benefits from parallelism

More information

Scalability evaluation of barrier algorithms for OpenMP

Scalability evaluation of barrier algorithms for OpenMP Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science

More information

The Value of High-Performance Computing for Simulation

The Value of High-Performance Computing for Simulation White Paper The Value of High-Performance Computing for Simulation High-performance computing (HPC) is an enormous part of the present and future of engineering simulation. HPC allows best-in-class companies

More information

Schedulability Analysis for Memory Bandwidth Regulated Multicore Real-Time Systems

Schedulability Analysis for Memory Bandwidth Regulated Multicore Real-Time Systems Schedulability for Memory Bandwidth Regulated Multicore Real-Time Systems Gang Yao, Heechul Yun, Zheng Pei Wu, Rodolfo Pellizzoni, Marco Caccamo, Lui Sha University of Illinois at Urbana-Champaign, USA.

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1 Introduction to High Performance Cluster Computing Cluster Training for UCL Part 1 What is HPC HPC = High Performance Computing Includes Supercomputing HPCC = High Performance Cluster Computing Note: these

More information
