Accelerating CFD using OpenFOAM with GPUs



Authors: Saeed Iqbal and Kevin Tubbs

The OpenFOAM CFD Toolbox is a free, open-source CFD software package produced by OpenCFD Ltd. Its user base spans a wide range of engineering and science disciplines in both commercial and academic organizations. OpenFOAM offers an extensive set of features for solving a wide range of fluid flows and physical phenomena, and it provides tools for all three stages of CFD: preprocessing, solving, and postprocessing. Almost all of these tools can run in parallel as standard, making OpenFOAM an important resource for scientists and engineers using HPC for CFD.

General-purpose computing on graphics processing units (GPGPU) is increasingly used to accelerate compute-intensive HPC applications across many disciplines in the HPC community. OpenFOAM CFD simulations are computationally intensive and can take a significant amount of time, so comparing alternatives for enabling faster research and discovery is of key importance. The SpeedIT libraries from Vratis provide GPU-accelerated iterative solvers that replace the iterative solvers in OpenFOAM.

To investigate GPU acceleration of OpenFOAM, we simulate the three-dimensional lid-driven cavity problem based on the tutorial provided with OpenFOAM. The 3D lid-driven cavity is an incompressible flow problem solved with the OpenFOAM icoFoam solver, and the pressure equation accounts for the majority of the solver's computational cost. In the accelerated case, only the pressure calculation is offloaded to the GPUs. On the CPUs, the PCG solver with a DIC preconditioner is used. In the GPU-accelerated case, the SpeedIT 2.1 algebraic multigrid preconditioner with smoothed aggregation (AMG) is used in combination with the SpeedIT Plugin to OpenFOAM.
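For reference, the CPU baseline described above corresponds to a standard fvSolution dictionary for the cavity case like the sketch below. The tolerances shown are the cavity-tutorial defaults and are illustrative only; the SpeedIT Plugin replaces the pressure solver entry with its own GPU solver, whose exact dictionary keywords are plugin-specific and not shown here.

```
solvers
{
    p
    {
        solver          PCG;    // preconditioned conjugate gradient (CPU baseline)
        preconditioner  DIC;    // diagonal incomplete Cholesky
        tolerance       1e-06;
        relTol          0;
    }

    U
    {
        solver          PBiCG;  // velocity equation stays on the CPU in both cases
        preconditioner  DILU;
        tolerance       1e-05;
        relTol          0;
    }
}

PISO
{
    nCorrectors     2;
    nNonOrthogonalCorrectors 0;
}
```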

Figure 1: OpenFOAM performance of the 3D cavity case using 4 million cells on a single node.

Figure 1 shows the performance of OpenFOAM's 3D lid-driven cavity case using approximately 4 million cells on a single R720 node. Results are presented for CPU only, CPU + 1 M2090 GPU, and CPU + 2 M2090 GPUs. The R720 CPU-only results use the maximum number of cores available in this configuration (16 cores). The software limits the number of CPU cores used for GPU acceleration, mapping one CPU core to one GPU, so the R720 + 1 M2090 and R720 + 2 M2090 results reflect 1 core + 1 GPU and 2 cores + 2 GPUs, respectively. Compared to the CPU-only configuration, no acceleration is obtained with one GPU, while an acceleration of 1.5X is obtained with two GPUs.

Figure 2 shows the measured power consumption for the 4 million cell simulation. Power efficiency is defined as performance (simulations/day) per measured power consumption (Watt), i.e. the useful work delivered for every watt consumed. As shown, power efficiency improves as GPUs are added: with one M2090 it is approximately 1.3X, and with two M2090 GPUs almost 1.4X, compared to the CPU-only configuration.

Figure 2: Total power and power efficiency of the 3D cavity case using 4 million cells on a single node.

Figure 3 shows the performance of OpenFOAM's 3D lid-driven cavity case using approximately 8 million cells on a single R720 node. The size of this problem required the use of both GPUs. Compared to the CPU-only configuration, an acceleration of 1.5X was achieved with two GPUs. Figure 4 shows the power consumption results for the 8 million cell simulation. Power efficiency also improves for the larger simulation: with two M2090 GPUs it is almost 1.3X compared to the CPU-only configuration.

Figure 3: OpenFOAM performance of the 3D cavity case using 8 million cells on a single node.

Figure 4: Total power and power efficiency of the 3D cavity case using 8 million cells on a single node.

In conclusion, first, GPUs can accelerate the OpenFOAM icoFoam solver for incompressible fluid flow. As shown in Figure 1, using CPUs only, a single node delivers about 24 simulations/day of sustained performance for a problem size of 4 million cells. Adding one GPU delivers about the same sustained performance but increases the performance/watt ratio, while adding two GPUs improves the sustained performance to about 36 simulations/day.

Second, GPUs improve the performance/watt ratio as well: power consumption increases with GPUs, but not as much as the corresponding performance. As shown in Figure 2, the CPU-only simulation consumes about 400 Watts and operates at 0.061 (simulations/day)/Watt. Adding one GPU, and using only one core of the CPU, power consumption decreases to about 300 Watts and the node operates at 0.078 (simulations/day)/Watt, an increase of about 28% in performance/watt. Adding two GPUs, and using only two cores of the CPU, power consumption increases to about 445 Watts and the node operates at 0.083 (simulations/day)/Watt, an increase of about 36% in performance/watt.

Similar trends are shown in Figures 3 and 4 for the 8 million cell problem. On the larger problem, performance increased from about 15 simulations/day for CPU only to about 24 simulations/day with two GPUs, while power consumption increased from about 391 Watts operating at 0.039 (simulations/day)/Watt to about 462 Watts operating at 0.051 (simulations/day)/Watt. This represents an increase of about 32% in performance/watt.
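The performance/watt improvements above follow directly from the reported efficiency values. The short Python sketch below recomputes the ratios; because the quoted (simulations/day)/Watt figures are rounded, the percentages come out near, not exactly at, the figures cited in the text.

```python
# Recompute the power-efficiency improvements from the reported
# (simulations/day)/Watt values quoted in the article.

def pct_gain(eff_gpu, eff_cpu):
    """Percentage improvement of a GPU configuration over CPU only."""
    return 100.0 * (eff_gpu / eff_cpu - 1.0)

# 4 million cell case (Figure 2)
eff_cpu_4m  = 0.061   # CPU only: ~400 W, ~24 simulations/day
eff_1gpu_4m = 0.078   # 1 core + 1 M2090: ~300 W
eff_2gpu_4m = 0.083   # 2 cores + 2 M2090: ~445 W, ~36 simulations/day

# 8 million cell case (Figure 4)
eff_cpu_8m  = 0.039   # CPU only: ~391 W, ~15 simulations/day
eff_2gpu_8m = 0.051   # 2 cores + 2 M2090: ~462 W, ~24 simulations/day

print(f"4M cells, 1 GPU:  {pct_gain(eff_1gpu_4m, eff_cpu_4m):.0f}% better perf/Watt")
print(f"4M cells, 2 GPUs: {pct_gain(eff_2gpu_4m, eff_cpu_4m):.0f}% better perf/Watt")
print(f"8M cells, 2 GPUs: {pct_gain(eff_2gpu_8m, eff_cpu_8m):.0f}% better perf/Watt")
```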

Configuration and Installation

Each PowerEdge R720 has dual Intel Xeon E5-2600 series processors. Please note that installing two NVIDIA Tesla M2090 GPUs requires the use of a GPU enablement kit, the x16 option on the 3rd riser, and dual redundant 1100W power supplies, as shown in Figure 5. The details of the hardware and software components are given below.

Figure 5: Two M2090 GPUs can be attached inside the R720 using a riser and associated power cables.

Compute node
  Model: PowerEdge R720
  Processors: Two Intel Xeon E5-2660 @ 2.2 GHz, 95W
  Memory: 64 GB, 1333 MHz
  GPUs: Two NVIDIA Tesla M2090

M2090 GPU
  Number of cores: 512
  Memory: 6 GB
  Memory bandwidth: 177 GB/s
  Peak performance, single precision: 1,331 GFLOPS
  Peak performance, double precision: 665 GFLOPS
  Power capping: 250W

Software
  OpenFOAM: Version 1.7.1
  SpeedIT from Vratis: Version 2.1
  CUDA: 4.0 (driver 285.05.23)
  OS: RHEL 6.2