CONSISTENT PERFORMANCE ASSESSMENT OF MULTICORE COMPUTER SYSTEMS


GH. ADAM 1,2, S. ADAM 1,2, A. AYRIYAN 2, V. KORENKOV 2, V. MITSYN 2, M. DULEA 1, I. VASILE 1

1 Horia Hulubei National Institute for Physics and Nuclear Engineering (IFIN-HH), 407 Atomistilor, Magurele-Bucharest, Romania, adamg@ifin.nipne.ro
2 Joint Institute for Nuclear Research, Dubna, Moscow region, Russia

Received August 30, 2008
Rom. Journ. Phys., Vol. 53, Nos. 9-10, Bucharest, 2008

The performance assessment, through the High-Performance Linpack (HPL) benchmark, of the quad-core cluster with InfiniBand interconnect recently acquired at LIT-JINR Dubna is reported. Corroboration with previous results [Gh. Adam et al., Rom. J. Phys., 53, 665 (2008)] shows that the HPL benchmark scales for single-core, two-core, and quad-core chips and yields results that fit its intrinsic complexity under statistically relevant criteria. Free software implementations (OS and MPI) on the multicore clusters at LIT-JINR and IFIN-HH resulted in relative performances comparable to those reported in the September 2007 issue of TOP500, the list of the five hundred most productive parallel computers in the world.

1. INTRODUCTION

The multiprocessor computer architectures built by the computing system vendors are intended to solve complex computational problems. At one extreme there is the case of very large single problems (like, e.g., those arising in lattice chromodynamics), which ultimately reduce to very large linear algebraic systems, as a consequence of the specific discretization procedures yielding the numerical algorithms. At the other extreme there is the case of very large sets of independent small to medium size problems of similar nature, which arise within very large scale projects (like, e.g., the four LHC experiments at CERN, whose data taking is planned to begin in September 2008). These two kinds of problems correspond to the two extremes of the existing multiprocessor architectures: parallel clusters (which do high performance computing under small latencies of the interprocessor communication) and distributed systems (Grids) (which are reservoirs of computing power, accessible from anywhere in the world within a virtual organization).

Most of the offers made during the last few years by the computer manufacturers for Grid infrastructure development use the multicore computer architecture, which involves several independent processors (cores) on a chip that communicate through shared memory. Conceived mainly as a solution to the power consumption problem which impedes further increases of the processor clock frequency, the multicore computer architecture marks the start of a historic transition from sequential to parallel computation inside each multicore chip installed on the system. Under parallel computations scalable with the number of cores on a chip, this would afford an alternative way towards further exponential performance improvement under Moore's law: an exponential increase in chip resources via core number increase. This is, however, a formidable task, quoted at the recent Gartner Conference [1] as one of the seven grand challenges facing IT for the next 25 years. Research on both the computer architecture issues arising under the new circumstances [2] and the development of new higher-level abstractions for writing parallel programs [3] is actively pursued.

Data accumulated both at LIT-JINR and abroad [4, 5] show that understanding the hardware transfer processes between the cores and the RAM for specific problems, together with the appropriate identification of the algorithm modules that may be executed in parallel and of the best MPI standard instructions for their handling, allows parallel code improvement.

The present paper discusses the performance assessment of a module of 20 quad-core processors, with InfiniBand interconnect, acquired at the beginning of 2008 at LIT-JINR Dubna. It continues a similar study [6] of the performance of the CICC JINR supercomputer, consisting of 120 two-core processors with Gigabit Ethernet (GbEthernet) interconnect, and of the parallel 16-processor cluster SIMFAP with Myrinet interconnect, at IFIN-HH.

2. PERFORMANCE ASSESSMENT

The main characteristics of the three systems mentioned above are given in Table 1. Performance is measured by means of the High-Performance Linpack (HPL) benchmark [7], used in TOP500, the list of the five hundred most productive parallel computing systems in the world [8], and in TOP50, the list of the fifty most productive parallel computing systems in the CIS (the Commonwealth of Independent States) [9]. The HPL benchmark essentials and a discussion of its computational complexity can be found in [6].

The system performance gets maximized provided the order N of the solved algebraic system satisfies N < N_max, where N_max denotes the maximum system order for which the coefficient matrix can be accommodated within the available overall RAM. At N > N_max, performance deterioration occurs due to the need of using the HDD swap storage. The quantity P_peak denotes the peak theoretical performance which would be obtained under instantaneous information exchange along any of the paths involving cores, cache, RAM, and HDD.
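For orientation, both hardware-side quantities can be estimated from the cluster characteristics alone. The Python sketch below (ours, not part of [6] or [7]) assumes the dense coefficient matrix is stored in double precision, i.e. 8 bytes per entry, so that N_max is approximately sqrt(RAM/8), and evaluates P_peak = k n ν with the figures of the CICC parallel cluster of Table 1; an actual HPL run is kept somewhat below this N_max bound, to leave room for the OS and the MPI buffers.

```python
import math

def hpl_n_max(ram_bytes, bytes_per_entry=8):
    """Largest order N whose dense N x N coefficient matrix fits in the
    overall RAM (double precision, 8 bytes per entry, assumed here)."""
    return int(math.sqrt(ram_bytes / bytes_per_entry))

def p_peak(k, n_cores, nu_ghz):
    """Peak theoretical performance P_peak = k * n * nu, in GFlops."""
    return k * n_cores * nu_ghz

# CICC parallel cluster (Table 1): 80 cores at 3 GHz, k = 4 flops per tact
print(p_peak(4, 80, 3.0))        # 960.0 GFlops
print(hpl_n_max(80 * 2**30))     # about 103600 for 80 GB of overall RAM
```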

Table 1
Main characteristics of the three computing systems of interest

Feature                      | SIMFAP (IFIN-HH) | CICC supercomputer | CICC parallel cluster
Intel processors             | Xeon Irwindale   | 2 x Xeon 5150      | Xeon 5315
Clock frequency, ν           | 3 GHz            | 2.66 GHz           | 3 GHz
Cores per CPU                | 1                | 2                  | 4
CPUs per node                | 2                | 2                  | 2
Total nodes                  | 8                | 60                 | 10
Total CPUs                   | 16               | 120                | 20
Total cores, n               | 16               | 240                | 80
L2 cache per CPU             | 2 MB             | 4 MB               | 8 MB
RAM per node                 | 4 GB             | 8 GB               | 8 GB
Overall RAM                  | 32 GB            | 480 GB             | 80 GB
Operating system             | CentOS 5         | SL 4.5             | SL 4.5
Interconnect                 | Myrinet          | GbEthernet         | InfiniBand
MPI                          | OpenMPI          | ...                | ...
Flops per tact, k            | 2                | 4                  | 4

System performance under the HPL benchmark:
N_max                        | ...              | ...                | ...
P_peak = k n ν               | 96 GFlops        | ≈ 2554 GFlops      | 960 GFlops
P_max                        | ≈ 64 GFlops      | 1124 GFlops        | ≈ 684 GFlops
ρ_eff = P_max / P_peak       | 0.67             | 0.44               | 0.713

The quantity P_max = N_op / T denotes the maximum measured system performance, where N_op = (2/3) N^3 + 2 N^2 is the number of floating point operations needed for solving the algebraic system of order N ≤ N_max and T is the measured computing time in seconds. Finally, the ratio ρ_eff denotes the effectiveness of the system under scrutiny. For values N ≪ N_max, the system performance is expected to be much smaller than P_max.
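As a worked illustration of these definitions (a hypothetical run, not one of the measurements reported here), the following sketch converts a single (N, T) pair into P_max and ρ_eff:

```python
def hpl_flops(n):
    """HPL floating point operation count: N_op = (2/3) N^3 + 2 N^2."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def p_max_gflops(n, t_seconds):
    """Maximum measured performance P_max = N_op / T, in GFlops."""
    return hpl_flops(n) / t_seconds / 1e9

# Hypothetical run: a system of order N = 100000 solved in T = 1000 s
n, t = 100_000, 1000.0
print(p_max_gflops(n, t))           # about 667 GFlops
print(p_max_gflops(n, t) / 960.0)   # rho_eff about 0.69 against P_peak = 960 GFlops
```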

Fig. 1 summarizes the results obtained for the three clusters: the bottom row shows the measured computing times (in minutes) versus N, while the upper row shows the resulting performances versus N, for each of the clusters.

Fig. 1 Performance (upper row) and time of calculation (bottom row) vs. the order N of the solved linear system of equations, in 10^3 units, for the three clusters.

An interesting feature, shown both by SIMFAP (Myrinet interconnect) and by the CICC parallel cluster (InfiniBand interconnect), is the performance saturation near the upper end of the range of orders of the solved algebraic systems. This points to the advantage of having a wide-band dedicated data transfer bus among the processors. For the GbEthernet CICC supercomputer, saturation does not occur, due to the absence of such a dedicated bus. As compared to the previous performance estimates [6], the present statistics is larger and has been derived at magic N values.

3. DISCUSSION AND CONCLUSIONS

The least squares fit of the computing times measured at various N values provides insight into the consistency of the performance assessment procedure [6]. On one side, the intrinsic degree of complexity of the HPL benchmark is d = 3. On the other side, the optimal degree m of the least squares fitting polynomial can be determined under a particular assumption on the distribution law of the uncertainties {σ_i} and a statistically significant termination criterion of the least squares procedure. Since the time measurements have been done independently of each other, we have to assume a Poisson distribution law. In [6], optimal values m = d have been obtained for both the CICC supercomputer and the SIMFAP data by asking for the Hamming termination criterion (criterion 1 in the Appendix of [6]).

For the CICC parallel cluster data, this criterion proved to be ineffective. However, instead of the pure noise requirement involved in the Hamming criterion, we may impose the criterion

    |z_{i,m}| < τ max(1, T_i),   τ = 0.01,   (1)

where z_{i,m} denotes the residual associated with the time measurement T_i within the fitting polynomial of degree m. Criterion (1) has indeed resulted in m = d = 3, hence the present data are consistent with the third order complexity of the HPL benchmark as well (Fig. 2).

Fig. 2 Fitting the CICC parallel cluster performance data resulted in an optimal fitting polynomial of degree m = 3, with a sup-norm misfit magnitude below one percent.
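A minimal sketch of this degree selection loop (our reconstruction under the stated assumptions, not the code behind [6] or Fig. 2): polynomials of increasing degree m are fitted by least squares to the measured (N_i, T_i) pairs, with weights reflecting the Poisson-law uncertainties σ_i proportional to sqrt(T_i), and the first degree whose residuals all satisfy criterion (1) is accepted. The timing data below are synthetic and purely illustrative.

```python
import numpy as np

def optimal_degree(N, T, tau=0.01, m_top=6):
    """Smallest degree m whose least squares fit of T(N) satisfies
    criterion (1): |z_{i,m}| < tau * max(1, T_i) for every i."""
    for m in range(1, m_top + 1):
        # Poisson-law uncertainties sigma_i ~ sqrt(T_i) -> weights 1/sqrt(T_i)
        coeffs = np.polyfit(N, T, m, w=1.0 / np.sqrt(T))
        z = T - np.polyval(coeffs, N)          # residuals z_{i,m}
        if np.all(np.abs(z) < tau * np.maximum(1.0, T)):
            return m
    return None   # no degree up to m_top meets criterion (1)

# Synthetic timings following the cubic HPL complexity, plus Poisson-like noise
rng = np.random.default_rng(1)
N = np.linspace(40e3, 100e3, 12)
T0 = (2.0 / 3.0) * N**3 / 667e9              # ideal times at about 667 GFlops
T = T0 + 0.02 * np.sqrt(T0) * rng.standard_normal(N.size)
print(optimal_degree(N, T))                  # expected: 3, i.e. m = d
```

Degrees m = 1, 2 leave systematic residuals far above the τ max(1, T_i) threshold, while at m = 3 only the noise remains, so the loop terminates at the intrinsic HPL complexity.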

Corroborating this result with the evidence reported in [6], we conclude that the HPL benchmark scales consistently for the multicore clusters. This lets us infer that, for scientific computing involving compact coefficient matrices, the derivation of scalable parallel codes is a feasible task within the MPI standard.

The last line of Table 1 shows a large difference between the 44 percent effectiveness of the GbEthernet CICC supercomputer, the 67 percent effectiveness of the Myrinet SIMFAP cluster, and the 71.3 percent effectiveness of the InfiniBand CICC parallel cluster. We assume that these figures stem from the specific interconnects of the three computer clusters. An independent confirmation of this hypothesis comes from the comparison of these figures with the histogram representations of the efficiencies reported in the September 2007 issue of the TOP500 list for the GbEthernet, Myrinet, and InfiniBand parallel clusters (Fig. 3).

Fig. 3 Histograms summarizing the September 2007 TOP500 data for each of the existing interconnect networks. Arrows point to the present results.

The occurrence of the relative performances at the level of the best computers in the world points to the fact that the home-made open software implementations of the operating systems (OS) and of the MPI standard have been done at a high quality level.

Acknowledgments. The Romanian authors acknowledge partial support from contract CEX05-D. A. Ayriyan acknowledges partial support from RFBR grant # a.

REFERENCES

1. Gartner Symposium/ITxpo 2008, Emerging Trends, 6-10 April 2008, Mandalay Bay, Las Vegas, NV, USA; Comm. ACM, 51, no. 7, 10 (2008); news/2008/gartner-it-challenges.html
2. M. Osin, Comm. ACM, 51, no. 7 (July 2008).
3. J. Larus, C. Kozyrakis, Comm. ACM, 51, no. 7 (July 2008).
4. V. Lindenstruth, Status and plans for building an energy efficient supercomputer in Frankfurt, GRID 2008, 3rd Intl. Conf. on Distributed Computing and Grid Technologies in Science and Education, JINR Dubna, 30 June - 4 July 2008.
5. S. Gorbunov, U. Kebschull, I. Kisel, V. Lindenstruth, W. F. J. Mueller, Comput. Phys. Commun. 178 (2008).
6. Gh. Adam, S. Adam, A. Ayriyan, E. Dushanov, E. Hayryan, V. Korenkov, A. Lutsenko, V. Mitsyn, T. Sapozhnikova, A. Sapozhnikov, O. Streltsova, F. Buzatu, M. Dulea, I. Vasile, A. Sima, C. Vişan, J. Buša, I. Pokorny, Romanian J. Phys. 53, Nos. 5-6, 665 (2008).
7. A. Petitet, R. C. Whaley, J. Dongarra, A. Cleary, HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers.
