OpenFOAM Performance Benchmark and Profiling. Jan 2010


Note
- The following research was performed under the HPC Advisory Council activities
- Participating vendors: AMD, Dell, Mellanox
- Compute resource: HPC Advisory Council Cluster Center
- For more information please refer to www.mellanox.com, www.dell.com/hpc, www.amd.com, http://www.opencfd.co.uk/openfoam

OpenFOAM
- OpenFOAM (Open Field Operation and Manipulation) is a CFD toolbox that can simulate:
  - Complex fluid flows involving chemical reactions, turbulence and heat transfer
  - Solid dynamics
  - Electromagnetics
  - The pricing of financial options
- OpenFOAM is open source, produced by OpenCFD Ltd

Objectives
- The presented research was done to provide best practices for:
  - OpenFOAM performance benchmarking
  - Interconnect performance comparisons
  - Understanding OpenFOAM communication patterns
  - Power-efficient simulations
  - Compilation tips
- The presented results demonstrate that a balanced compute system enables:
  - Good application scalability
  - Power saving

Test Cluster Configuration
- Dell PowerEdge SC 1435 24-node cluster
- Quad-Core AMD Opteron 2382 ("Shanghai") CPUs
- Mellanox InfiniBand ConnectX 20Gb/s (DDR) HCAs
- Mellanox InfiniBand DDR switch
- Memory: 16GB DDR2 800MHz per node
- OS: RHEL 5U3, OFED 1.4.1 InfiniBand SW stack
- MPI: Open MPI 1.3.3
- Application: OpenFOAM 1.6
- Benchmark workload: lid-driven cavity flow
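As a small added aside (not from the original slides), a stack like this can be sanity-checked on each node with standard utilities that ship with RHEL, OFED and Open MPI; exact output formats vary by release:

    cat /etc/redhat-release    # expect a Red Hat Enterprise Linux 5.3 string
    ofed_info | head -1        # prints the installed OFED release (1.4.1 here)
    ibstat                     # shows the ConnectX HCA state and the 20 Gb/s (4X DDR) link rate
    mpirun --version           # Open MPI reports its version (1.3.3 here)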

Mellanox InfiniBand Solutions
- Industry standard: hardware, software, cabling, management
- Designed for clustering and storage interconnect
- Performance: 40Gb/s node-to-node, 120Gb/s switch-to-switch, 1us application latency, most aggressive roadmap in the industry
- Reliable with congestion management
- Efficient: RDMA and transport offload, kernel bypass, CPU focuses on application processing
- Scalable for Petascale computing and beyond
- End-to-end quality of service
- Virtualization acceleration
- I/O consolidation, including storage
[Chart: "The InfiniBand Performance Gap is Increasing" - InfiniBand bandwidth roadmap (4X: 20/40/80 Gb/s, 12X: 60/120/240 Gb/s) versus Ethernet and Fibre Channel; InfiniBand delivers the lowest latency]

Quad-Core AMD Opteron Processor
- Performance
  - Quad-core, enhanced CPU IPC
  - 4x 512KB L2 cache, 6MB L3 cache
- Direct Connect Architecture
  - HyperTransport Technology, up to 24 GB/s peak per processor
- Floating point
  - 128-bit FPU per core, 4 FLOPS/clk peak per core
- Integrated memory controller
  - Up to 12.8 GB/s, DDR2-800 MHz or DDR2-667 MHz
- Scalability: 48-bit physical addressing
- Compatibility: same power/thermal envelopes as 2nd/3rd generation AMD Opteron processors
[Diagram: Direct Connect Architecture - processors linked by 8 GB/s HyperTransport to the PCI-E bridges and I/O hub (USB, PCI), with dual-channel registered DDR2 attached to each processor]
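To put the floating-point figures above in context (an added back-of-envelope note, assuming the 2.6 GHz clock of the Opteron 2382 used in this cluster): 4 FLOPS/clk x 2.6 GHz gives about 10.4 GFLOPS peak per core, or 41.6 GFLOPS per quad-core socket, and roughly 2 TFLOPS of peak floating-point across the dual-socket (8 cores per node), 24-node cluster.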

Dell PowerEdge Servers helping Simplify IT
- System structure and sizing guidelines
  - 24-node cluster built with Dell PowerEdge SC 1435 servers
  - Servers optimized for High Performance Computing environments
  - Building block foundations for best price/performance and performance/watt
- Dell HPC solutions
  - Scalable architectures for high performance and productivity
  - Dell's comprehensive HPC services help manage the lifecycle requirements
  - Integrated, tested and validated architectures
- Workload modeling
  - Optimized system size, configuration and workloads
  - Test-bed benchmarks
  - ISV applications characterization
  - Best practices and usage analysis

Dell PowerEdge Server Advantage
- Dell PowerEdge servers incorporate AMD Opteron and Mellanox ConnectX InfiniBand to provide leading-edge performance and reliability
- Building block foundations for best price/performance and performance/watt
- Investment protection and energy efficiency
  - Longer-term server investment value
  - Faster DDR2-800 memory
  - Enhanced AMD PowerNow!
  - Independent Dynamic Core Technology
  - AMD CoolCore and Smart Fetch Technology
- Mellanox InfiniBand end-to-end for highest networking performance

OpenFOAM Benchmark Results
- Input dataset: lid-driven cavity flow
  - Mesh of 1000x1000 cells, icoFoam solver (laminar, 2D), 1000 steps
- InfiniBand provides higher utilization, performance and scalability
  - Up to 219% higher performance versus GigE and 109% higher than 10GigE
[Chart: benchmark performance by interconnect and node count - higher is better, 8 cores per node, up to 219% InfiniBand advantage]
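For orientation, a run of this kind could be reproduced roughly as follows. This is a hedged sketch rather than the exact scripts used for these results: it assumes the icoFoam cavity tutorial case has been copied into the user's run directory, that blockMeshDict has been edited for a 1000x1000 mesh and controlDict for 1000 time steps, and that the host file and core count (64 here) are illustrative.

    # Copy of the cavity tutorial case, with the mesh and controlDict edited as above
    cd $FOAM_RUN/cavity
    blockMesh          # generate the 1000x1000x1 mesh
    decomposePar       # split the domain per system/decomposeParDict
    # Run the laminar solver in parallel over the InfiniBand fabric with Open MPI
    mpirun -np 64 -hostfile ./hosts icoFoam -parallel > log.icoFoam 2>&1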

OpenFOAM Performance Enhancement
- The default OpenFOAM binary is not optimized for InfiniBand
  - The precompiled Open MPI does not solve the issue
  - The way to compile OpenFOAM properly is provided on the next slide
- With proper optimization, InfiniBand-based performance improves by 172%
[Chart: default versus recompiled OpenFOAM over InfiniBand - higher is better, 172% improvement]

OpenFOAM Compilation Best Practices
- Two ways to compile OpenFOAM
- Option 1: Modify OpenFOAM-1.6/etc/bashrc to use the MPI entry rather than OPENMPI
  - WM_MPLIB := MPI
  - Change the MPI entry within settings.sh to the system Open MPI:
      export MPI_HOME=/usr/mpi/gcc/openmpi-1.3.3
  - Add the following to wmake/rules/linux64gcc/mplib:
      PFLAGS = -DOMPI_SKIP_MPICXX
      PINC   = -I$(MPI_ARCH_PATH)/include
      PLIBS  = -L$(MPI_ARCH_PATH)/lib64 -lmpi
- Option 2: Keep the default OPENMPI entry in bashrc
  - Modify the default Open MPI compiler options in ThirdParty-1.6/Allwmake
  - Refer to the Open MPI website for the full compile options
  - Compiling with this option takes much longer (> 4 hours)
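Collected into one place, Option 1 might look like the following on this cluster. This is a sketch under the slide's assumptions (OpenFOAM-1.6 unpacked under ~/OpenFOAM, system Open MPI 1.3.3 under /usr/mpi/gcc); the exact variable names in settings.sh can differ slightly between releases, so treat the comments as intent rather than verbatim patches.

    cd ~/OpenFOAM/OpenFOAM-1.6
    # 1. etc/bashrc: select the generic MPI entry instead of the bundled OPENMPI
    #      WM_MPLIB := MPI
    # 2. etc/settings.sh, in the MPI case: point at the system Open MPI
    #      export MPI_HOME=/usr/mpi/gcc/openmpi-1.3.3
    #      (MPI_ARCH_PATH should resolve to the same tree, since the wmake rules use it)
    # 3. wmake/rules/linux64gcc/mplib: add the flags listed on the slide
    #      PFLAGS = -DOMPI_SKIP_MPICXX
    #      PINC   = -I$(MPI_ARCH_PATH)/include
    #      PLIBS  = -L$(MPI_ARCH_PATH)/lib64 -lmpi
    # 4. Re-source the environment and rebuild the tree
    source etc/bashrc
    ./Allwmake > log.Allwmake 2>&1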

Power Cost Savings with Different Interconnects
- Dell's economical integration of AMD CPUs and Mellanox InfiniBand
  - Saves up to $8400 in power to achieve the same number of application jobs versus GigE
  - Up to $6400 to achieve the same number of application jobs versus 10GigE
  - Yearly basis for a 24-node cluster
- As cluster size increases, more power can be saved
[Chart: yearly power cost savings by interconnect - $8400 versus GigE, $6400 versus 10GigE; power cost = kWh consumed x $0.20/kWh]
- For more information: http://enterprise.amd.com/downloads/svrpwrusecompletefinal.pdf
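As a rough added sanity check (not from the original slides): at $0.20/kWh, an $8400/year saving corresponds to $8400 / $0.20 = 42,000 kWh/year, which is about 4.8 kW of average power, or roughly 200 W per node on the 24-node cluster, attributable to the longer runtimes needed over GigE to complete the same number of jobs.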

OpenFOAM Benchmark Results Summary
- Interconnect comparison shows InfiniBand delivers superior performance at every cluster size
  - The performance advantage extends as cluster size increases
- InfiniBand enables power savings
  - Up to $8400/year power savings versus GigE
  - Up to $6400/year power savings versus 10GigE
- Dell PowerEdge servers provide linear scalability (maximum scalability) and a balanced system
  - By integrating the InfiniBand interconnect and AMD processors
  - Maximum return on investment through efficiency and utilization

OpenFOAM MPI Profiling - MPI Functions
- Most used MPI functions: MPI_Allreduce, MPI_Waitall, MPI_Isend, and MPI_Recv
- The number of MPI function calls increases with cluster size

OpenFOAM MPI Profiling - MPI Timing
- MPI_Allreduce, MPI_Recv, and MPI_Waitall show the highest communication overhead

OpenFOAM MPI Profiling - Message Size
- The large communication overhead is caused by small messages handled by MPI_Allreduce
[Charts: message size distribution at 8 nodes (64 cores) and 24 nodes (192 cores)]

OpenFOAM Profiling Summary
- OpenFOAM was profiled to identify its communication patterns
  - MPI collective functions create the biggest communication overhead
  - The number of messages increases with cluster size
- Interconnect effect on OpenFOAM performance
  - Interconnect latency is critical to OpenFOAM performance
- A balanced system, with CPU, memory and interconnect that match each other's capabilities, is essential for application efficiency
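One way to probe the latency sensitivity noted above (an added suggestion, assuming the OSU Micro-Benchmarks with their collective tests, including osu_allreduce, are installed alongside Open MPI) is to measure small-message MPI_Allreduce latency directly over each interconnect and compare it with the application-level results; the host files below are hypothetical and simply map the same nodes onto the InfiniBand and Ethernet fabrics.

    # Compare allreduce latency over the two fabrics at the 8-node (64-core) scale
    mpirun -np 64 -hostfile hosts.ib   osu_allreduce   # reports average latency per message size
    mpirun -np 64 -hostfile hosts.gige osu_allreduce
    # The small-message rows (a few bytes to a few KB) are the relevant ones here,
    # since the profile shows MPI_Allreduce traffic is dominated by small messages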

Thank You
HPC Advisory Council

All trademarks are property of their respective owners. All information is provided "as is" without warranty of any kind. The HPC Advisory Council makes no representation as to the accuracy or completeness of the information contained herein. The HPC Advisory Council and Mellanox undertake no duty and assume no obligation to update or correct any information presented herein.