Fraunhofer Institute for Algorithms and Scientific Computing SCAI
Performance Comparison of ISV Simulation Codes on Microsoft HPC Server 2008 and SUSE Linux Enterprise Server 10.2
Karsten Reineck and Horst Schwichtenberg, 31.3.2009
Fraunhofer Institute for Algorithms and Scientific Computing SCAI
The Fraunhofer* Society
- Founded in 1949, non-profit organization
- Focus on application-oriented basic and industrial research
- 57 research institutes throughout Germany
- Staff of approx. 12,500 people, the majority of them qualified scientists and engineers
- Annual research volume of around 1 billion euros
*Joseph von Fraunhofer (1787-1826): researcher, inventor and entrepreneur
Benchmark Introduction
The same test cases (problems) are solved on identical hardware on Windows and on Linux.
ISV-defined test cases:
- CFX: internal flow through a flow channel with 0.5 to 2 million elements
- FLUENT: external flow over a truck body with around 14 million cells
- LS-DYNA: Neon-Refined crash test simulation (frontal crash with an initial speed of 31.5 mph)
- PAM-CRASH: frontal crash of a Neon car with 1 million cells
Benchmark ISV Simulation Software
- SIMULIA Abaqus/Standard: implicit solutions and a range of contact and nonlinear material options for static, dynamic, thermal, and multiphysics analyses. Abaqus/Explicit: the explicit method for high-speed, nonlinear, transient response and multiphysics applications.
- ANSYS CFX: a powerful and flexible general-purpose computational fluid dynamics (CFD) package used for engineering simulations of all levels of complexity.
- ANSYS FLUENT: a powerful and flexible general-purpose CFD package used for engineering simulations of all levels of complexity.
- DYNAmore LS-DYNA: a multi-purpose, explicit and implicit finite element program used to analyze the linear and nonlinear, static and dynamic behavior of physical processes.
- ESI Group PAM-CRASH: the most widely used crash simulation software.
Benchmark Hardware
Twin servers with Supermicro X7DWT main boards and quad-core CPUs.
Attention when only one local node is involved: how does the scheduler attach 4 processes to 8 cores?
[Diagram: one node with two quad-core CPUs, CPU 1 (cores 1-4) and CPU 2 (cores 1-4)]
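One way to answer that question is to let each process report its own CPU affinity. The following is a minimal sketch for the Linux side, assuming GNU/Linux with glibc (on Windows, GetProcessAffinityMask from the Win32 API provides the equivalent information):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t mask;

    /* Ask the kernel which cores this process may run on (pid 0 = self). */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }

    printf("pid %ld may run on cores:", (long)getpid());
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf(" %d", cpu);
    printf("\n");
    return 0;
}

Launching one instance per MPI rank shows whether 4 processes end up packed onto one quad-core CPU or spread across both sockets, which affects the memory bandwidth available to each process.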
Benchmark Hardware: Network
[Diagram: network setup of the benchmark cluster]
CFX Benchmark Results
[Chart: CFX run times for Local, Ethernet, and Infiniband runs; x-axis: number of processes - number of nodes (4-1, 8-1, ...); y-axis: run time]
Lower numbers (a lower run time) are better.
FLUENT Benchmark Results
[Chart: FLUENT run times for Local, Ethernet, and Infiniband runs; x-axis: number of processes - number of nodes (4-1, 8-1, ...); y-axis: run time]
Lower numbers (a lower run time) are better.
LS-DYNA Benchmark Results (single precision)
[Chart: LS-DYNA run times for Local, Ethernet, and Infiniband runs; x-axis: number of processes - number of nodes (4-1, 8-1, ...); y-axis: run time]
Lower numbers (a lower run time) are better.
PAM-CRASH Benchmark Results
[Chart: PAM-CRASH run times for Local and Infiniband runs; x-axis: number of processes - number of nodes (4-1, 8-1, ...); y-axis: run time]
Lower numbers (a lower run time) are better.
Abaqus/Explicit Benchmark Results
The scheduler in Windows pauses the jobs after about 12 minutes because there are not enough available cores.
[Chart: Abaqus/Explicit run times for Local, Ethernet, and Infiniband runs; x-axis: number of processes - number of nodes (4-1, 8-1, ...); y-axis: run time]
Lower numbers (a lower run time) are better.
Abaqus/Standard Benchmark Results
The scheduler in Windows pauses the jobs after about 12 minutes because there are not enough available cores.
[Chart: Abaqus/Standard run times for Local, Ethernet, and Infiniband runs; x-axis: number of processes - number of nodes (4-1, 8-1, ...); y-axis: run time]
Lower numbers (a lower run time) are better.
Abaqus/Standard New Version 6.8-4
During the benchmark, the Abaqus beta version 6.8-4 was released for Windows. Some issues for Windows were resolved.
Abaqus 6.8-4 brings a performance improvement of about 30% in our scenarios.
[Chart: Abaqus/Standard run times over Ethernet, version 6.8-2 vs. 6.8-4; per-configuration improvements of 28%, 38%, 26%, 30%, and 29%]
Lower numbers (a lower run time) are better.
Conclusion
Deviation from Windows to Linux (= 0% baseline)
[Chart: deviation per code (CFX, FLUENT, LS-DYNA, PAM-CRASH, Abaqus Standard, Abaqus Explicit) for Local, Ethernet, and Infiniband runs]
Average deviation per interconnect: Local 18%, Ethernet 13%, Infiniband 7%
Average deviation per code: CFX 4%, FLUENT 13%, LS-DYNA 11%, PAM-CRASH 11%, Abaqus Standard 17%, Abaqus Explicit 22%
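A plausible reading of these figures, assuming the deviation is the relative run-time difference with the Linux run time as the 0% baseline (the slides do not spell the formula out):

\[
\text{deviation} = \frac{t_{\mathrm{Windows}} - t_{\mathrm{Linux}}}{t_{\mathrm{Linux}}} \times 100\,\%
\]

Under this reading, the 7% Infiniband average would mean that the Windows runs took on average 7% longer than the corresponding Linux runs, and negative values mark cases where Windows was faster.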
Open MS HPC Portal
Porting open source HPC software to Microsoft platforms:
- Portal for open source software developed and ported by Fraunhofer SCAI (Elmer, OpenFOAM)
- In the future: uploads and downloads of YOUR open source software
- Best practices
URL: http://www.scai.fraunhofer.de/openmshpc.html
Fraunhofer Institute for Algorithms and Scientific Computing SCAI
Thanks for your attention!
www.scai.fraunhofer.de
Appendix: Cluster Configuration (Hardware and Network)

Head node:
- MICRO-STAR MS-9172-1S
- 2x Intel Xeon E5330 @ 2.13 GHz (ES)
- 4 GB FB-DDR2 RAM
- 2x 1000 Mbps LAN
- Mellanox ConnectX (MT25418) Infiniband DDR channel adapter

Compute nodes:
- Supermicro X7DWT
- 2x Intel Xeon E5472 @ 3.0 GHz (quad-core)
- 16 GB FB-DDR2 RAM
- 2x 1000 Mbps LAN
- Mellanox ConnectX (MT26418) Infiniband, 20 Gbps, PCI-E 2.0 (onboard)

Network hardware:
- Switch 1: HP ProCurve Switch 2724 (J4897A), 1000 Mbps Ethernet, 24 ports
- Switch 2: Extreme Networks Summit X450-24t, 1000 Mbps Ethernet, 24 ports
- Switch 3: Voltaire ISR 9024D-M, 4x DDR Infiniband, 24 ports

Network configuration:
- Network 1: 1 GBit/s Ethernet (management)
- Network 2: 1 GBit/s Ethernet (MPI)
- Network 3: Infiniband (MPI)
Appendix: Operating Systems and ISV Software

Windows Server 2008 HPC Edition, Build 6001, 64 bit
- Server Manager Version 6.0.6001.1878
- HPC Cluster Manager Version 2.0.1551.0
- Infiniband driver: Mellanox Version 1.4.1.3223

SUSE Linux Enterprise Server 10.2
- Kernel 2.6.16.60-0.21, libc 2.9
- Infiniband: OFED Version 1.3.1

ISV software:
- Abaqus: 6.8-2 and 6.8-4 (6.8-4 for Windows only)
- Fluent: 12.0.7 beta
- CFX: 11 SP1 with arch-detect fix for quad-core CPUs
- Pamcrash: v2008.0 with modified pamworld on Windows
- LS-Dyna: 971_R3.2.1, double precision

Partitioning setup:
Device     Boot  Start  End    Blocks    Id  File system
/dev/sdb1  *     1      26     28813+    83  Ext3 (/boot)
/dev/sdb2        27     26135  2972542+  83  XFS (/scratch)
/dev/sdb3        26136  3313   33559785  82  swap
/dev/sdb4        3314   681    24489486  83  Ext3 (/)
3rd Meeting of the German-speaking HPC User Group
8-9 March 2010 at the Institutszentrum Schloss Birlinghoven of the Fraunhofer-Gesellschaft in St. Augustin near Bonn
www.izb.fraunhofer.de