Wind-Tunnel Simulation using TAU on a PC-Cluster: Resources and Performance
Stefan Melber-Wilkending / DLR Braunschweig




Outline
- New Linux PC-Cluster at Braunschweig (DLR-AS)
- Performance measurements of TAU on PC-clusters: platforms, results
- Example of an application on a PC-cluster: wind-tunnel simulation
  - Wind-tunnel boundary condition
  - Example: simulation of the DLR ALVAST high-lift configuration in the low-speed wind-tunnel DNW-NWB

New Linux PC-Cluster at DLR-AS: Technical Data - General
- New Linux PC-cluster at DLR-AS / Braunschweig for medium-sized CFD problems
- Production usage for research and contract work
- Size: 276 Opteron 2.6 GHz CPUs
- Hardware installation and testing: 09/2005
- Open for user access: 10/2005

New Linux PC-Cluster at DLR-AS: Technical Data - Nodes
- 138 dual-Opteron (AMD) nodes (SUN V20z)
- CPU clock speed: 2.6 GHz
- 4 GByte DDR1/400 memory
- 2 x 73 GB Ultra320 SCSI hard disks
- Management processor (remote power reset, monitoring, error analysis, ...)
- Infiniband HPC interconnect
- 100 MBit Ethernet interconnect
- 1 HU size
- SuSE Linux 9.3 Professional

New Linux PC-Cluster at DLR-AS: Technical Data - Frontends
- 2 frontends (SUN V40z): 4x Opteron 2.2 GHz (AMD), 8 GByte DDR1/333 memory, 2 x 73 GB Ultra320 SCSI hard disks, 100 MBit Ethernet interconnect, 3 HU size, SuSE Linux 9.3 Professional
- RAID system, 10 TByte
- Infiniband switch with 144 ports (Voltaire)
- PBS Pro queuing system / MAUI scheduler

New Linux PC-Cluster at DLR-AS: Technical Data - Setup

New Linux PC-Cluster at DLR-AS: Performance - Compared Systems
- NEC cluster (DLR-AS): 32 nodes / 64 CPUs, Intel Xeon 3.06 GHz, 2 GByte RAM per node, Myrinet 2000 interconnect
- Cray cluster (HWW): 128 nodes / 256 CPUs, AMD Opteron 2.0 GHz, 4 GByte RAM per node, Myrinet 2000 interconnect
- SUN cluster (DLR-AT): 192 nodes / 384 CPUs, AMD Opteron 2.4 GHz, 4 GByte RAM per node, Infiniband (Voltaire) interconnect
- Cray XD1 cluster (Cray): 36 nodes / 72 CPUs, AMD Opteron 2.2 GHz, 4 GByte RAM per node, RapidArray interconnect (direct connection between the network and the HyperTransport channel on the CPU)
- Cray XD1 cluster (Cray): 72 nodes / 144 CPUs, AMD Opteron 2.4 GHz, 8 GByte RAM per node, RapidArray interconnect

New Linux PC-Cluster at DLR-AS: Performance - Setup
- All clusters running under the Linux operating system
- Compiler: GNU GCC 3.2.3
- TAU code, version 2004.1.2, with typical settings for complex configurations:
  - Central discretization
  - Implicit time integration (LU-SGS)
  - CFL number: 5
  - Multigrid: 3v cycle
  - Turbulence model: Menter k-ω SST
  - Low-Mach-number preconditioning
  - Cache optimization
- Case: glider with laminar-turbulent transition
- Free-stream conditions: Ma = 0.078, Re = 1.1e6
- Grid: 10 million points, 30 layers
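
For orientation, this benchmark configuration can be collected in a single structure. The following is only an illustrative Python sketch; the key names are hypothetical placeholders and do not reproduce TAU's actual parameter-file syntax:

    # Illustrative summary of the benchmark configuration described above.
    # Key names are hypothetical placeholders, not TAU parameter-file syntax.
    benchmark_setup = {
        "code": "TAU 2004.1.2",
        "compiler": "GNU GCC 3.2.3",
        "discretization": "central",
        "time_integration": "implicit LU-SGS",
        "cfl_number": 5,
        "multigrid_cycle": "3v",
        "turbulence_model": "Menter k-omega SST",
        "low_mach_preconditioning": True,
        "cache_optimization": True,
        "case": "glider with laminar-turbulent transition",
        "mach": 0.078,              # free-stream Mach number
        "reynolds": 1.1e6,          # free-stream Reynolds number
        "grid_points": 10_000_000,  # hybrid grid, 30 layers
    }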

New Linux PC-Cluster at DLR-AS: Performance Test Results
CPU time for 50 cycles [s] at different CPU counts ("-" = not measured):

CPUs | NEC Xeon      | Cray Opteron | Cray Opteron | Cray Opteron | SUN Opteron  | SUN Opteron
     | 3.06 GHz (AS) | 2.0 GHz      | 2.2 GHz      | 2.4 GHz      | 2.4 GHz (AT) | 2.6 GHz (AS)
   6 |       -       |     2303     |     1947     |     1702     |       -      |       -
   8 |       -       |     1667     |     1307     |     1222     |      1266    |      1126
  12 |      1564     |     1108     |      881     |      811     |       987    |       743
  16 |      1203     |      760     |      661     |      621     |       669    |       572
  32 |       643     |      436     |      347     |      326     |       339    |       306
  48 |       -       |       -      |       -      |      241     |       236    |       -
  60 |       -       |       -      |       -      |      176     |       183    |       165

New Linux PC-Cluster at DLR-AS: Performance Test Results
Relative speedup compared to the Cray Opteron cluster at HWW (Cray Opteron 2.0 GHz = 100):

CPUs | NEC Xeon      | Cray Opteron | Cray Opteron | Cray Opteron | SUN Opteron  | SUN Opteron
     | 3.06 GHz (AS) | 2.0 GHz      | 2.2 GHz      | 2.4 GHz      | 2.4 GHz (AT) | 2.6 GHz (AS)
   6 |       -       |      100     |      118     |      135     |       -      |       -
   8 |       -       |      100     |      128     |      136     |      132     |      148
  12 |       71      |      100     |      126     |      137     |      121     |      149
  16 |       63      |      100     |      115     |      114     |      114     |      133
  32 |       68      |      100     |      126     |      134     |      129     |      143
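
The speedup table follows directly from the CPU-time table by normalizing each row to the Cray Opteron 2.0 GHz time. A minimal Python sketch of that derivation, using the 32-CPU row from the table above:

    # Relative speedup (Cray Opteron 2.0 GHz = 100) derived from the
    # 32-CPU row of the CPU-time table above.
    times_32cpu = {
        "NEC Xeon 3.06 GHz (AS)":   643,
        "Cray Opteron 2.0 GHz":     436,
        "Cray Opteron 2.2 GHz":     347,
        "Cray Opteron 2.4 GHz":     326,
        "SUN Opteron 2.4 GHz (AT)": 339,
        "SUN Opteron 2.6 GHz (AS)": 306,
    }

    t_ref = times_32cpu["Cray Opteron 2.0 GHz"]
    for system, t in times_32cpu.items():
        # A value of 143 means "1.43 times faster than the reference cluster".
        print(f"{system:26s} {100 * t_ref / t:4.0f}")

At 32 CPUs this reproduces the 68 / 100 / 126 / 134 / 129 / 143 row of the speedup table.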

New Linux PC-Cluster at DLR-AS: Performance Test Results
- The speed of TAU on Opteron CPUs is a linear function of the CPU clock speed
- Compared to the Cray Opteron 2.0 GHz cluster, the new cluster is about 1.5 times faster
- Compared to the NEC Xeon 3.06 GHz cluster (the standard cluster at AS-BS), the new cluster is about 2.1 times faster

New Linux PC-Cluster at DLR-AS: Performance Test Results
- Speedup compared to 8 CPUs (memory restrictions of the test case)
- Nearly linear scalability of the TAU code up to 60 CPUs (see the sketch below)
- The tested interconnects (Myrinet, Infiniband, RapidArray) have enough reserve for the TAU parallelisation
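
The scalability claim can be checked directly from the timing table. This is a minimal sketch for the new SUN Opteron 2.6 GHz cluster, comparing the measured speedup over the 8-CPU run with ideal linear scaling:

    # Parallel speedup of the new SUN Opteron 2.6 GHz (AS) cluster relative
    # to the 8-CPU run (the smallest run that fits the test case in memory),
    # using the CPU times [s] for 50 cycles from the table above.
    times = {8: 1126, 12: 743, 16: 572, 32: 306, 60: 165}

    t_base, n_base = times[8], 8
    for n, t in sorted(times.items()):
        speedup = t_base / t                 # measured speedup over 8 CPUs
        ideal = n / n_base                   # linear scaling
        print(f"{n:3d} CPUs: speedup {speedup:.2f}, ideal {ideal:.2f}, "
              f"efficiency {speedup / ideal:.0%}")

For this column the parallel efficiency stays above 90% all the way to 60 CPUs, which is the "nearly linear" scalability stated above.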

Wind-Tunnel Simulation using the TAU Code: General
- Simulation of a wind tunnel including test section and nozzle
- Background: avoid the uncertainties of wind-tunnel corrections
  - Uncorrected measurements become directly comparable to CFD
  - Validation of wind-tunnel corrections
  - Extrapolation of wind-tunnel results to free flight using CFD
- DLR project ForMEx (Fortschrittliche Methoden zur Extrapolation von Windkanalergebnissen auf den Freiflug: advanced methods for the extrapolation of wind-tunnel results to free flight)
- Problem: the numerical simulation of the wind tunnel including the model requires big grids (about 20 million points), so HPC resources are needed: the new PC-cluster at AS-BS

Wind-Tunnel Simulation using the TAU Code: Wind-Tunnel Boundary Condition
- Idea: usage and extension of the engine boundary condition
- Wind-tunnel inlet: total pressure and total temperature are given
- Regulation of the flow speed in the wind tunnel:
  - Imaginary probe in the numerical test section (same position as in the experiment)
  - Comparison of the probe value with the given Mach number
  - Result used as input for the static-pressure regulation at the tunnel outlet
- Applicable for 0 < Ma < 1
(Figure: schematic of the numerical wind tunnel showing the pressure on the outlet, the imaginary probe, the boundary condition, and the TAU code)
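
The regulation described above can be sketched as a simple feedback loop between the probe Mach number and the outlet static pressure. The following is only an illustration of the idea; the function name, the proportional update rule, and the relaxation factor are assumptions for this sketch, not the actual TAU implementation:

    # Sketch of the wind-tunnel speed regulation: an imaginary probe in the
    # numerical test section reports the local Mach number, and the static
    # pressure at the tunnel outlet is adjusted until the probe matches the
    # given Mach number. Subsonic tunnels only (0 < Ma < 1).

    def update_outlet_pressure(ma_probe: float, ma_target: float,
                               p_outlet: float, relaxation: float = 0.1) -> float:
        """Return the adjusted static pressure at the tunnel outlet.

        If the probe measures too high a Mach number, the flow is too fast
        and the outlet pressure is raised to slow it down; if the Mach
        number is too low, the pressure is lowered. The proportional rule
        and the relaxation factor are illustrative assumptions.
        """
        error = ma_probe - ma_target
        return p_outlet * (1.0 + relaxation * error)

    # Hypothetical use inside the solver's outer iterations:
    # p_out = update_outlet_pressure(probe_mach(), ma_target, p_out)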

Wind-Tunnel Simulation using the TAU Code: Validation
- Measurements in the empty low-speed wind tunnel DNW-NWB
- Database for the validation of the numerical results
- Measurements: boundary-layer profiles, static pressure at the tunnel outlet

Wind-Tunnel Simulation using the TAU Code: Preliminary Results DNW-NWB / ALVAST
- DLR ALVAST half model in high-lift configuration in the DNW-NWB
- DLR ALVAST: analogous to the AIRBUS A320
- Half model mounted on a peniche
- Grids: hybrid unstructured, Centaur grid generator, 20 million points, full Navier-Stokes
- Chimera technique: rotation of the model without new grid generation

Wind-Tunnel Simulation using the TAU Code: Preliminary Results DNW-NWB / ALVAST
- Simulation of complete lift polars including maximum lift
- Geometry variations: wing-root geometry (e.g. slat horn, 16 configurations)
- Comparison of the wind-tunnel simulation against free flight: wind-tunnel corrections
- Influence of the peniche height

Wind-Tunnel Simulation using the TAU Code: Preliminary Results DNW-NWB / ALVAST
(Figure: horse-shoe vortex around the peniche; panels: ALVAST TAU, F11 wind tunnel)

Wind-Tunnel Simulation using the TAU Code: Preliminary Results DNW-NWB / ALVAST

Conclusions
- TAU tested on Linux PC-clusters: good scalability and performance
- New cluster at AS-BS available for production: 10/2005
- Implementation of a wind-tunnel boundary condition in TAU; validation with measurements in the empty wind tunnel
- First results of the simulation of the ALVAST high-lift configuration in the DNW-NWB compared with the experiment
- Further work: investigation of the half-model influence, variation of the geometry, ...

Special thanks for testing support and debugging of the TAU parallelisation:
- W. Hafemann, C. Simmendinger (T-Systems)
- N. Gal, Y. Shahar (Voltaire)
- J. Redmer, T. Warschko (Linux NetWorx)
- Axel Köhler (SUN)
- Institute of Propulsion Technology (DLR-AT)
- R. Dwight, T. Alrutz (DLR-AS)
- M. Wierse (Cray)