Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer




Stan Posey, MSc and Bill Loewe, PhD, Panasas Inc., Fremont, CA, USA
Paul Calleja, PhD, University of Cambridge, Cambridge, UK

SYNOPSIS

This work examines the parallel scalability of the commercial CFD software ANSYS FLUENT 12 on up to 256 processing cores, for transient CFD simulations that are heavy in I/O relative to numerical operations. In studies conducted with engineering contributions from the University of Cambridge and ANSYS, the Linux HPC environment named Darwin combined an Intel Xeon cluster with a Panasas parallel file system and shared storage. The motivation for these studies was to quantify the performance and scalability benefits of parallel I/O in CFD software on a parallel file system, versus a conventional serial NFS file system, for transient CFD cases.

1. INTRODUCTION

CFD parallel efficiency and simulation turn-around times continue to be an important factor behind engineering and scientific decisions to develop models at higher fidelity. Most parallel CFD simulations use scalable Linux clusters for their demanding HPC requirements, but for certain classes of CFD models, data I/O can severely degrade overall job scalability and limit CFD effectiveness. As CFD model sizes grow and the number of processing cores applied to a single simulation increases, it becomes critical for the thread on each core to perform its I/O operations in parallel, rather than rely on the master compute thread to collect the data and perform each I/O operation in serial.

Examples of I/O-demanding simulations include ~100-million-cell steady-state models on a large number of cores (more than 64), and moderately sized, moderately parallel transient CFD models that must write solution files frequently in order to collect time-history results for subsequent post-processing. When frequent time-history files are written through a serial process, any parallel benefit from a CFD solver is soon overwhelmed as more processing cores (and therefore more I/O streams funnelled through one serial writer) are added to a simulation. For this reason, commercial CFD software developers such as ANSYS offer parallel I/O schemes that scale with their CFD solvers. But just as parallel solvers require scalable clusters, parallel I/O operates most efficiently with a parallel file system, sometimes referred to as parallel network-attached storage (NAS). Parallel NAS capabilities enable the application to overcome I/O bottlenecks, allowing I/O-bound CFD simulations to scale to their full potential.
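
To make the serial bottleneck concrete, the following is a minimal MPI sketch in C of the pattern described above, in which every partition's data is funnelled through a single master rank before being written. The file name, array sizes and data layout are illustrative assumptions, not ANSYS FLUENT internals.

```c
/* serial_io_sketch.c -- illustrative only; not ANSYS FLUENT source.
 * Each rank holds one mesh partition's solution data; rank 0 gathers
 * everything and writes a single file by itself, so write time grows
 * with the number of ranks while all other ranks sit idle.
 *
 * Build/run (example): mpicc serial_io_sketch.c -o serial_io && mpirun -np 4 ./serial_io
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int ncells = 1000;                       /* cells per partition (toy size) */
    double *local = malloc(ncells * sizeof(double));
    for (int i = 0; i < ncells; i++)
        local[i] = rank + 0.001 * i;               /* fake solution values */

    /* Funnel all partitions to rank 0 ... */
    double *global = NULL;
    if (rank == 0)
        global = malloc((size_t)nranks * ncells * sizeof(double));
    MPI_Gather(local, ncells, MPI_DOUBLE,
               global, ncells, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... which then writes the whole solution file serially. */
    if (rank == 0) {
        FILE *fp = fopen("solution_serial.dat", "wb");
        fwrite(global, sizeof(double), (size_t)nranks * ncells, fp);
        fclose(fp);
        free(global);
    }
    free(local);
    MPI_Finalize();
    return 0;
}
```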

2. PARALLEL FILE SYSTEMS AND SHARED STORAGE

A new class of parallel file system and shared storage technology has developed that scales I/O in order to extend the overall scalability of CFD simulations on clusters. For most implementations, entirely new storage architectures were introduced that combine the key advantages of legacy shared storage systems yet eliminate the drawbacks that made those systems unsuitable for large distributed cluster deployments. Parallel NAS can achieve both the high-performance benefits of direct access to disk and the data-sharing benefits of files and metadata that HPC clusters require for CFD scalability. One implementation, from Panasas, offers the technology with an object-based storage architecture that can eliminate serial I/O bottlenecks.

Object-based storage enables two primary technological breakthroughs versus conventional block-based storage. First, since an object contains a combination of user data and metadata attributes, the object architecture is able to offload I/O directly to the storage device instead of routing it through a central file server, thereby delivering parallel I/O capability. That is, just as a cluster spreads the work evenly across compute nodes, the object-based storage architecture allows data to be spread across objects for parallel access directly from disk. Second, since each object carries metadata attributes in addition to user data, objects can be managed intelligently within large shared volumes under a single namespace. This eliminates the need for administrators to focus on LUN management, as those operations are handled automatically by intelligent storage blades.

Object-based storage architectures provide virtually unlimited growth in capacity and bandwidth, making them well suited for handling CFD run-time I/O operations and the large files used for post-processing and data management. With object-based storage, the cluster has parallel and direct access to all data spread across the shared storage system. This means a large volume of data can be accessed in one simple step by the cluster for computation and visualization, improving the speed of data movement between storage and the other tasks in the CFD workflow. The Panasas parallel file system combines software tuning and advanced caching algorithms with extensive cache memory and intelligent hardware components to optimize the parallel file system architecture, and provides an easy-to-manage, appliance-like implementation.

3. HPC ENVIRONMENT: DARWIN CLUSTER

The University of Cambridge established its High Performance Computing Service (HPCS) in 2006, the core of which is a cluster named Darwin, after the famous Cambridge alumnus who developed the theory of evolution. The HPCS supports research in many disciplines, including CFD, and provides ANSYS FLUENT among other CFD software. Darwin is a cluster of 2340 cores, provided by 585 dual-socket Dell PowerEdge 1950 1U rack-mount server nodes. Each node consists of two 3.0 GHz dual-core Intel Xeon (Woodcrest) processors, giving a single node with 8 GB of RAM (2 GB per core for four cores), running Scientific Linux CERN and interconnected by a QLogic InfiniBand network. The cluster is organized into nine computational units (CUs), with each CU consisting of 64 nodes in two racks. The nodes within each CU are connected to a full-bisectional-bandwidth InfiniBand network, which provides 900 MB/s of bandwidth with an MPI latency of 1.9 microseconds. The CUs are connected by a half-bisectional-bandwidth InfiniBand network to ensure that large parallel jobs can run over the entire cluster with good performance.
In addition to the InfiniBand network, each computational unit has a full-bisectional-bandwidth gigabit Ethernet network for data and a 100 megabit Ethernet network for administration; each node is also connected to a power-management network through which the power to each box can be controlled remotely. Details of the Darwin cluster components are provided in Figure 1 below.

Figure 1: Details of the Darwin Linux supercomputer at the University of Cambridge

The cluster's initial installation included 46 TB of local disk capacity and 60 TB of file system storage provided by Dell PowerVault MD1000 disk arrays with 15,000 rpm, 73 GB SAS drives, connected to the cluster network over 10 gigabit Ethernet links. The storage pool was managed by the TerraGrid parallel file system. The cluster has a peak performance of 28.08 teraflops and a sustained LINPACK value of 18.27 teraflops.

4. TRANSIENT CFD WITH PARALLEL I/O

The combination of scalable CFD application software, Linux HPC clusters, high-speed network interconnects, and a parallel file system and storage can provide engineers and scientists with new and significant performance advantages for transient CFD simulations. In recent studies, ANSYS FLUENT 12 with its parallel I/O capability demonstrated efficient scalability, total job turn-around time improvements greater than 2 times over the previous release 6.3, and the ability to conduct high-fidelity, more cost-effective transient solutions that were previously impractical for industrial CFD consideration. The ANSYS FLUENT models for these studies comprised cases that were relevant in size and flow features to current commercial and research CFD practice:

- ANSYS FLUENT capability study: a large external aerodynamics case of a 111 million cell truck geometry, modelled as transient with DES, for performance evaluation from 64 to 256 Intel Xeon cores, writing 20 GB of output data once every 5 time steps, over 100 iterations.

- ANSYS FLUENT capacity study: a moderate external aerodynamics case of a 14 million cell truck geometry, modelled as transient with DES, for performance evaluation of 8 concurrent jobs, each using 16 Intel Xeon cores, writing a total of 23 GB of output data once every 5 time steps, over 100 iterations.

A collaboration between ANSYS and Panasas application engineers that began in 2005 has developed a parallel I/O scheme based on MPI-I/O under the MPI-2 standard. ANSYS FLUENT parallel I/O will be introduced in release 12, scheduled for the first half of 2009, and will leverage several parallel file systems including Panasas PanFS. This development will permit ANSYS FLUENT scalability even for heavy-I/O applications such as large steady-state models with frequent checkpoints or, even more challenging, transient CFD cases that may require 100 or more solution writes in a single simulation.

The benefits of parallel I/O for transient CFD were demonstrated with a production case: an ANSYS FLUENT aerodynamics model of 111 million cells, provided by an industrial truck vehicle manufacturer. Figure 2 below illustrates the I/O scheme of the performance tests that were conducted, which comprised a case-file read, a compute solve of 5 time steps with 100 iterations, and a write of the data file. In a full transient simulation, the solve and write tasks would be repeated for a much larger number of time steps and iterations, with roughly the same amount of computational work for each repetition.

Figure 2: Schematic of the truck aerodynamics transient CFD simulation and I/O scheme

Details of the CFD model, the cluster and storage configuration, and the resulting solve + write times are provided in Figure 3. (Note: the case read is a one-time task in a full simulation and was therefore omitted from the results.) ANSYS FLUENT 12 with parallel I/O performed concurrent writes of the local (partition) solutions directly into the global solution data file on PanFS, unlike release 6.3, which writes the local solutions one by one to a global solution data file. The results show that ANSYS FLUENT 12 on the Panasas parallel file system achieved a more than 2 times reduction in total time versus 6.3 on NFS with a conventional NAS system. The total time improvement was due to the write-time advantage of release 12 over 6.3, which delivered up to 39 times higher data transfer rates measured in MB/s. At 64 cores, the solver advantage of ANSYS FLUENT 12 over 6.3 was only 4%, so the 1.8-fold total time benefit shown in Figure 3 is attributed to the parallel I/O speedup. The ANSYS FLUENT 12 solver advantage grows to 9% at 128 cores and 24% at 256 cores, which contributes to the growing total time improvements of 2.0-fold on 128 cores and 2.3-fold on 256 cores for ANSYS FLUENT 12 on PanFS versus 6.3 on NFS.

Figure 3: Comparison of capability results for ANSYS FLUENT R12 + PanFS vs. 6.3 + NFS
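
As an illustration of the general MPI-I/O (MPI-2) technique referred to above, the minimal sketch below has every rank write its own partition directly into one shared file at a rank-specific offset using a collective write. The file name, partition size and contiguous layout are assumptions for illustration and do not represent the actual ANSYS FLUENT 12 implementation or file format.

```c
/* parallel_io_sketch.c -- illustrative MPI-I/O (MPI-2) pattern only;
 * not ANSYS FLUENT source. Every rank writes its own partition of the
 * solution directly into one shared file at a rank-specific offset,
 * so the writes can proceed concurrently instead of being serialized
 * through a single master rank.
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int ncells = 1000;                       /* cells per partition (toy size) */
    double *local = malloc(ncells * sizeof(double));
    for (int i = 0; i < ncells; i++)
        local[i] = rank + 0.001 * i;               /* fake solution values */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "solution_parallel.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank targets its own, non-overlapping region of the shared file. */
    MPI_Offset offset = (MPI_Offset)rank * ncells * sizeof(double);
    MPI_File_write_at_all(fh, offset, local, ncells, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(local);
    MPI_Finalize();
    return 0;
}
```

On a parallel file system such as PanFS, per-rank writes of this kind can be serviced concurrently by multiple storage nodes, whereas on a serial NFS export they still converge on a single server.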

It is important to note that the performance of CFD solvers and numerical operations is not affected by the choice of file system, which improves only the I/O operations. That is, a CFD solver will perform the same on a given cluster regardless of whether a parallel file system or a serial NFS file system is used. The advantage of parallel I/O is best illustrated by comparing the computational profiles of each scheme. ANSYS FLUENT 12 on PanFS keeps the I/O share of the total job time in the range of 3% at 64 cores to 8% at 256 cores, whereas 6.3 on NFS spends as much as 50% of the total job time in I/O. An ANSYS FLUENT license is too valuable to spend such a high percentage of its operations on I/O relative to numerical operations. The high efficiency of a parallel I/O solution equates to roughly 2 times more ANSYS FLUENT 12 utilization within the same license cost structure as the I/O-heavy 6.3 serial I/O solution under similar conditions (this particular test case and HPC cluster configuration).
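
The roughly two-fold factor quoted above can be checked with simple arithmetic, under the assumption (stated above) that solver time is unchanged by the file system. The short calculation below uses round-number I/O fractions of 50% for the serial NFS runs and 5% for the PanFS runs, purely as a back-of-the-envelope estimate.

```c
/* io_fraction_check.c -- back-of-the-envelope check of the ~2x total-time
 * factor. Assumes identical solver (compute) time under both file systems
 * and round-number I/O fractions of the *total* job time:
 * ~50% for the serial-I/O NFS runs, ~5% for the parallel-I/O PanFS runs.
 */
#include <stdio.h>

int main(void)
{
    const double compute = 1.0;     /* normalized solver time (same in both cases) */
    const double f_nfs   = 0.50;    /* I/O as a fraction of total time, serial NFS */
    const double f_panfs = 0.05;    /* I/O as a fraction of total time, PanFS      */

    /* If I/O is a fraction f of the total, then total = compute / (1 - f). */
    double total_nfs   = compute / (1.0 - f_nfs);     /* = 2.00 */
    double total_panfs = compute / (1.0 - f_panfs);   /* ~ 1.05 */

    printf("Estimated total-time ratio (NFS / PanFS): %.2f\n",
           total_nfs / total_panfs);                  /* ~ 1.9x */
    return 0;
}
```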

Despite the advantages shown in this large-scale case using a high number of processing cores, capability computing is not as common in industry as capacity computing. That is, the situation in which a single high-fidelity job is given the resources of an entire HPC cluster (capability computing) is far less common than the same cluster being shared by multiple engineers to process a set of lower-fidelity jobs, each using a moderate number of cores (capacity computing).

The capacity study comprised 8 jobs, each run concurrently on 16 processing cores, for a total utilization of 128 processing cores. The ANSYS FLUENT model was the same for each of the 8 jobs, and was the same truck case as in the capability study except that the mesh had been coarsened to 14 million cells. This meant that each job wrote a smaller data file of about 3 GB, but 8 of these files were written concurrently and independently, whereas the capability case wrote 20 GB to a single file. Also in contrast, the capacity jobs had far fewer processing cores working on each job, in this case 16, whereas the capability case used from 64 to 256 cores.

At 256 cores, the capability case on NFS had 256 I/O threads, each of which had to write through the single head-node thread. This scenario can lead to a severe I/O bottleneck compared with the capacity configuration, in which each of the 8 concurrent jobs had just 16 I/O threads. The results illustrated in Figure 4 below show that ANSYS FLUENT 12 on PanFS has a 23% total-time advantage over 6.3 on NFS. Similar to the capability study, the computational profiles show an efficient utilization of ANSYS FLUENT numerical operations at 97%, versus 3% spent in I/O operations, on PanFS. In comparison, release 6.3 on NFS spent 19% of the computation time in I/O operations.

Figure 4: Comparison of capacity results for ANSYS FLUENT R12 + PanFS vs. 6.3 + NFS

5. CONCLUSIONS

The joint study conducted between research and industry organizations demonstrated that ANSYS FLUENT CFD software with parallel I/O capabilities, used with a parallel file system, can greatly improve job completion times for transient simulations that are heavy in I/O operations relative to numerical operations. The favourable results were conclusive for the range of capability and capacity uses examined on the Darwin cluster with a Panasas parallel file system. These benefits will enable expanded and more routine industrial use of transient CFD in applications such as aerodynamics, aeroacoustics, reacting flows and combustion, multi-phase and free-surface flows, and DES/LES turbulence, among other advanced CFD modeling practices.

6. REFERENCES

1. S. Kodiyalam, M. Kremenetsky and S. Posey, "Balanced HPC Infrastructure for CFD and Associated Multidiscipline Simulations of Engineering Systems", Proceedings, 7th Asia CFD Conference 2007, Bangalore, India, November 26-30, 2007.

2. G. A. Gibson and R. Van Meter, "Network Attached Storage Architecture", Communications of the ACM, Vol. 43, No. 11, November 2000.

3. ANSYS FLUENT Features: http://www.fluent.com/software/fluent/index.htm