Performance Comparison of ISV Simulation Codes on Microsoft Windows HPC Server 2008 and SUSE Linux Enterprise Server 10.2

Fraunhofer Institute for Algorithms and Scientific Computing SCAI. Performance Comparison of ISV Simulation Codes on Microsoft Windows HPC Server 2008 and SUSE Linux Enterprise Server 10.2. Karsten Reineck and Horst Schwichtenberg, 31.3.2009

Fraunhofer Institute for Algorithms and Scientific Computing SCAI. The Fraunhofer* Society: founded in 1949, non-profit organization. Focus on application-oriented basic and industrial research. 57 research institutes throughout Germany. Staff of approx. 12,500 people, the majority of them qualified scientists and engineers. Annual research volume of around 1 billion euro. *Joseph von Fraunhofer (1787-1826), researcher, inventor and entrepreneur

Benchmark Introduction: The same test cases (problems) are solved on identical hardware on Windows and on Linux, using ISV-defined test cases. CFX: internal flow through a flow channel with 5 to 20 million elements. FLUENT: external flow over a truck body with around 14 million cells. LS-DYNA: Neon-Refined crash test simulation (frontal crash with an initial speed of 31.5 mph). PAMCRASH: front crash of a Neon car with 1 million cells.
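For a comparison like this, each case has to be timed the same way on both operating systems. The following is a minimal sketch of such a timing wrapper, not the harness used in the study; the solver command line is a placeholder.

```python
# Minimal sketch (not the harness used in the study): time one solver run so that
# identical test cases can be compared across operating systems.
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a solver command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Placeholder command; replace with the real solver invocation and input deck.
    elapsed = timed_run([sys.executable, "-c", "pass"])
    print(f"wall clock: {elapsed:.2f} s")
```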

Benchmark ISV Simulation Software: SIMULIA Abaqus/Standard offers implicit solutions and a range of contact and nonlinear material options for static, dynamic, thermal, and multiphysics analyses; Abaqus/Explicit uses the explicit method for high-speed, nonlinear, transient response and multiphysics applications. ANSYS CFX is a powerful and flexible general-purpose computational fluid dynamics (CFD) package used for engineering simulations of all levels of complexity. ANSYS FLUENT is likewise a powerful and flexible general-purpose CFD package for engineering simulations of all levels of complexity. DYNAmore LS-DYNA is a multi-purpose, explicit and implicit finite element program used to analyze the linear and nonlinear static and dynamic behavior of physical structures. ESI Group PAM-CRASH is the most widely used crash simulation software.

Benchmark Hardware: Twin servers with Supermicro X7DWT main boards and quad-core CPUs. Attention when only one local node is involved: how does the scheduler place 4 processes on the 8 cores? (Diagram: one node with two quad-core CPUs, CPU 1 with cores 1-4 and CPU 2 with cores 1-4.)
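One way to check where processes actually land is to query their CPU affinity. The sketch below is an illustration, not part of the benchmark setup; os.sched_getaffinity is Linux-only, and on Windows HPC Server 2008 the job scheduler's affinity settings play the corresponding role.

```python
# Minimal sketch, Linux-only: report which of the node's cores this process is
# allowed to run on, to see whether 4 processes get spread across both
# quad-core CPUs or packed onto one socket.
import os

def report_affinity():
    cores = sorted(os.sched_getaffinity(0))  # cores available to this process
    print(f"PID {os.getpid()} may run on cores: {cores}")

if __name__ == "__main__":
    report_affinity()
```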

Benchmark Hardware: Network (diagram of the cluster network)

CFX Benchmark Results (charts: run times for the Local, Ethernet and Infiniband configurations; x-axis: number of processes - number of nodes; lower numbers (a lower run time) are better)

FLUENT Benchmark Results (charts: run times for the Local, Ethernet and Infiniband configurations; x-axis: number of processes - number of nodes; lower numbers (a lower run time) are better)

LS-DYNA Benchmark Results, single precision (charts: run times for the Local, Ethernet and Infiniband configurations; x-axis: number of processes - number of nodes; lower numbers (a lower run time) are better)

PAM-CRASH Benchmark Results (charts: run times for the Local and Infiniband configurations; x-axis: number of processes - number of nodes; lower numbers (a lower run time) are better)

Abaqus/Explicit Benchmark Results: The scheduler pauses the jobs in Windows after about 12 minutes because there are not enough available cores. (Charts: run times for the Local, Ethernet and Infiniband configurations; x-axis: number of processes - number of nodes; lower numbers (a lower run time) are better.)

Abaqus/Standard Benchmark Results: The scheduler pauses the jobs in Windows after about 12 minutes because there are not enough available cores. (Charts: run times for the Local, Ethernet and Infiniband configurations; x-axis: number of processes - number of nodes; lower numbers (a lower run time) are better.)

Abaqus/Standard, New Version 6.8-4: During the benchmark, the Abaqus beta version 6.8-4 was released for Windows; some issues for Windows were resolved. Abaqus 6.8-4 shows a performance improvement of about 30% in our scenarios. (Chart: run times of 6.8-2 vs. 6.8-4 over Ethernet; lower numbers (a lower run time) are better.)

Conclusion: Deviation between the Windows and Linux run times (0% = identical run time). Average deviation by configuration: Local 18%, Ethernet 13%, Infiniband 7%. Average deviation by application: CFX 4%, FLUENT 13%, LS-DYNA 11%, PAM-CRASH 11%, Abaqus Standard 17%, Abaqus Explicit 22%.
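As a reading aid, these deviation figures can be understood as the relative difference between the run times on the two operating systems; the slide does not state which OS is the reference, so the sketch below only illustrates the arithmetic with made-up numbers.

```python
# Minimal sketch of the deviation arithmetic; the sample run times are
# illustrative, not measurements from the benchmark.
def deviation_percent(t_measured, t_reference):
    """Relative deviation of one run time from a reference run time, in percent."""
    return (t_measured - t_reference) / t_reference * 100.0

if __name__ == "__main__":
    # Hypothetical run times in seconds for one test case.
    print(f"deviation: {deviation_percent(1070.0, 1000.0):.0f}%")  # 7%
```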

Open MS HPC Portal: Porting open source HPC software to Microsoft platforms. Portal for open source software developed and ported by Fraunhofer SCAI (Elmer, OpenFOAM). In the future: uploads and downloads of YOUR open source software, plus best practices. URL: http://www.scai.fraunhofer.de/openmshpc.html

Fraunhofer Institute for Algorithms and Scientific Computing SCAI Thanks for your attention! www.scai.fraunhofer.de

Appendix: Cluster Configuration, Hardware and Network
Head Node: MICRO-STAR MS-9172-1S, 2x Intel Xeon E5330 @ 2.13 GHz (ES), 4 GB FB-DDR2 RAM, 2x 1000 Mbps LAN, Mellanox ConnectX (MT25418) Infiniband DDR Channel Adapter
Compute Nodes: Supermicro X7DWT, 2x Intel Xeon E5472 @ 3.0 GHz (Quad Core), 16 GB FB-DDR2 RAM, 2x 1000 Mbps LAN, Mellanox ConnectX (MT26418) Infiniband, 20 Gbps, PCI-E 2.0 (onboard)
Network Hardware: Switch 1: HP ProCurve switch 2724 J4897A, 1000 Mbps Ethernet, 24 ports; Switch 2: Extreme Networks Summit X450-24t, 1000 Mbps Ethernet, 24 ports; Switch 3: Voltaire ISR 9024D-M, 4x DDR Infiniband, 24 ports
Network Configuration: Network 1: 1 GBit/s Ethernet (management); Network 2: 1 GBit/s Ethernet (MPI); Network 3: Infiniband (MPI)

Appendix: Operating Systems and ISV Software
Windows: Windows Server 2008 HPC Edition, Build 6001, 64 bit; Server Manager Version 6.0.6001.1878; HPC Cluster Manager Version 2.0.1551.0; Infiniband driver: Mellanox Version 1.4.1.3223
Linux: SUSE Linux Enterprise Server 10.2, Kernel 2.6.16.60-0.21, libc 2.9; Infiniband: OFED Version 1.3.1
ISV software: Abaqus 6.8-2 and 6.8-4 (6.8-4 for Windows only); Fluent 12.0.7 beta; CFX 11 SP1 with arch detect fix for quad-core CPUs; Pamcrash v2008.0 with modified pamworld on Windows; LS-Dyna 971_R3.2.1, double precision
Partitioning setup:
Device Boot Start End Blocks Id File System
/dev/sdb1 * 1 26 28813+ 83 Ext3 (/boot)
/dev/sdb2 27 26135 2972542+ 83 XFS (/scratch)
/dev/sdb3 26136 3313 33559785 82 swap
/dev/sdb4 3314 681 24489486 83 Ext3 (/)

3rd meeting of the German-speaking HPC user group, 8-9 March 2010, at the Institutszentrum Schloss Birlinghoven of the Fraunhofer-Gesellschaft in Sankt Augustin near Bonn. www.izb.fraunhofer.de