Speedup von Analysen und Optimierungen mit OptiStruct (Speeding up Analyses and Optimizations with OptiStruct)
Kristian Holm, 12.07.2013
HyperWorks Best Practice — www.altairhyperworks.de/bestpractice
Agenda
The computing time is influenced by:
- Model size
- Hardware
- Operating system
- Memory allocation
- Solver
- Parallelization
Of these, this presentation covers memory allocation, solver selection, and parallelization.
Memory allocation
A check run can be very helpful in estimating the memory and disk-space usage. Based on the memory allocated, the solver automatically chooses an in-core, out-of-core, or minimum-core solution. A particular solution type can be forced by setting the core option in the run script; the memory necessary for that solution type is then allocated.
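As a sketch, a check run and a forced in-core run could look like the following; the option names shown (`-check`, `-core`, `-len`) are the usual OptiStruct run options, but verify them (and the `-len` unit, assumed here to be MB) against the Run Options documentation of your version:

```shell
# Check run: estimates memory and disk usage without solving
optistruct model.fem -check

# Force an in-core solution
optistruct model.fem -core in

# Force in-core and explicitly allocate memory (assumed here: ~8 GB in MB)
optistruct model.fem -core in -len 8000
```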
Memory allocation
When more memory is requested than is actually available as RAM, OptiStruct will run much slower due to swapping; in that case there will be a significant difference between the elapsed time and the CPU time. Memory that is not used by OptiStruct is still available for I/O caching, so the amount of free memory can dramatically affect the wall-clock time of the run: the more free memory, the less I/O wait time and the faster the job will run. Even if an analysis is too large to run in-core, having extra memory available will still speed it up, because the operating system uses unused RAM to buffer disk requests.
Solver
- BCS direct solver
- PCG iterative solver
- MUMPS direct unsymmetric solver
- Lanczos eigenvalue solver
- AMSES (Automatic Multi-level Substructuring Eigen Solver)
Solver — linear static
- BCS direct solver: default
- PCG iterative solver: optional
- MUMPS direct unsymmetric solver: optional
Solver — linear static
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
System + software info: Linux 2.6.18-308.8.1.el5; 16 CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, CPU speed 1200 MHz; 128929 MB RAM, 8191 MB swap; OptiStruct 12.0
[Chart: elapsed time [s] per solver (BCS, MUMPS, PCG); run with 1 core and the in-core option]
Solver — linear static
Model info: 1 static subcase, 2nd-order hexa elements; Total # of Grids (Structural): 160781; Total # of Elements: 39520
[Charts: elapsed time [s] per solver (BCS, PCG), and RAM used for the in-core solution [MB] per solver; run with 1 core and the in-core option]
Solver — nonlinear static (NLSTAT)
- BCS direct solver: default
- PCG iterative solver: n/a
- MUMPS direct unsymmetric solver: optional; used when friction is present
Solver — modal solutions
- Lanczos eigenvalue solver: default
- AMSES (Automatic Multi-level Substructuring Eigen Solver): optional
Solver — modal solutions
Model info: BIW, free-free eigenmodes, 200 modes; Total # of Grids (Structural): 706534; Total # of Elements: 690242
[Chart: elapsed time [s] per solver (Lanczos, AMSES); run with 1 core and the in-core option; system/software as above]
Parallelization
- SMP: Shared Memory Parallelization
- SPMD: Single Program Multiple Data
- Hybrid: SPMD + SMP
- SMP with usage of a GPU
Parallelization — SMP
SMP (Shared Memory Parallelism) is based on the shared-memory architecture of computers: all processors can access a common memory space, and each process can access all memory allocated by the program. How to run an SMP job?
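A minimal sketch of an SMP run; `-nt` (number of threads) is the usual OptiStruct run option for SMP, but verify the option name against your version's documentation:

```shell
# Run OptiStruct with 8 SMP threads on one machine
optistruct model.fem -nt 8
```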
Parallelization — SMP
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: elapsed time vs. number of cores (1, 2, 4, 8, 16) in the SMP run; BCS direct solver; system/software as above]
Parallelization — SMP
Model info: BIW, free-free eigenmodes, 200 modes; Total # of Grids (Structural): 706534; Total # of Elements: 690242
[Chart: elapsed time vs. number of cores (1, 2, 4, 8, 16) in the SMP run; AMSES solver; system/software as above]
Parallelization — SMP
Model info: 1 static subcase, 2nd-order hexa elements; Total # of Grids (Structural): 160781; Total # of Elements: 39520
[Chart: elapsed time vs. number of cores (1, 4, 8) in the SMP run; PCG iterative solver]
Parallelization — SMP
Model info: engine block, NLSTAT contact with friction, 2 load cases (pretension step + loading step); Total # of Grids (Structural): 1017210; Total # of Elements: 640379
[Chart: elapsed time vs. number of cores (1, 2, 4, 8, 16) in the SMP run; MUMPS direct unsymmetric solver]
Parallelization — SMP: when should it be used?
- Shows a speedup on all examples, for every solver
- Use of up to 4 cores (or cores + GPUs) adds no additional license cost
- On some models, SPMD shows more speedup
Parallelization — SPMD
SPMD (Single Program Multiple Data): OptiStruct divides the analysis into several domains (if possible). How to run an SPMD job?
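A minimal sketch of an SPMD run, assuming the usual OptiStruct MPI run options (`-mpi` to select the MPI mode, `-np` for the number of processes); the exact launcher flags depend on the installed MPI and the OptiStruct version:

```shell
# Run OptiStruct as an MPI (SPMD) job with 4 processes
optistruct model.fem -mpi -np 4
```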
Parallelization — SPMD
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: elapsed time vs. number of MPI processes (2, 4, 8) compared to 1 CPU; BCS direct solver; system/software as above]
Parallelization — SPMD
Model info: BIW, free-free eigenmodes, 200 modes; Total # of Grids (Structural): 706534; Total # of Elements: 690242
[Chart: elapsed time vs. number of MPI processes (2, 4, 8) compared to 1 CPU; AMSES solver; system/software as above]
Parallelization — SPMD
Model info: 1 static subcase, 2nd-order hexa elements; Total # of Grids (Structural): 160781; Total # of Elements: 39520
Only 1 static subcase -> SPMD not useful
Parallelization — SPMD
Model info: engine block, NLSTAT contact with friction, 2 subcases (pretension step + loading step); Total # of Grids (Structural): 1017210; Total # of Elements: 640379
The loading step is a nonlinear solution sequence continuing from the preceding (pretension) nonlinear subcase, so the subcases cannot be run in parallel -> SPMD not useful
Parallelization — SPMD: when should it be used?
- Multiple linear static load cases with different constraints
- Multiple nonlinear static load cases, if the load cases are independent
- Multiple buckling load cases
- Direct frequency response with multiple loading frequencies
- Multiple modal load cases with different constraints
- Mixed load cases, e.g. static + normal modes
Note: unlike SMP, when load cases are parallelized the memory requirement increases as well.
Parallelization — Hybrid
Hybrid is the combination of SPMD + SMP: OptiStruct divides the analysis into several domains (if possible) and uses SMP within each subdomain. How to run a hybrid job?
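A sketch of a hybrid run, combining the MPI options with the SMP thread count (`-np` and `-nt` are the usual OptiStruct run options; verify against your version's documentation):

```shell
# 4 MPI processes, each running 4 SMP threads (16 cores in total)
optistruct model.fem -mpi -np 4 -nt 4
```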
Parallelization — Hybrid
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: SMP speedup inside the MPI run — 4 MPI, 4 MPI x 2 SMP, 4 MPI x 4 SMP, compared to 1 CPU; BCS direct solver; system/software as above]
Parallelization — Hybrid: when should it be used?
- When the maximum useful SPMD parallelization is reached and more cores are still available. E.g. with 16 cores and 3 static load cases (with different SPCs): 4 SPMD processes (3 load cases + 1 managing process), each with 4 SMP threads.
- When the maximum SPMD parallelization cannot be used due to insufficient memory.
Parallelization — GPU
SMP with usage of a GPU: currently 1 GPU plus one or more CPUs. How to run a GPU job? See the list of recommended GPUs.
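A sketch of a GPU-accelerated SMP run; the `-gpu` option is an assumption based on OptiStruct's usual run options — check your version's documentation and the list of supported/recommended GPUs before relying on it:

```shell
# SMP run with 4 threads, offloading the direct solver to 1 GPU (assumed -gpu option)
optistruct model.fem -nt 4 -gpu
```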
Parallelization — SMP + GPU
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: influence of 1 additional GPU for 1, 2, 4, 8, 16 cores; BCS direct solver; system/software as above]
Parallelization — GPU: when could it be used?
- Static analysis/optimization with the BCS direct solver
- Available on the 64-bit Linux platform
- Available for the SMP module
- When SPMD is possible, it might give more speedup
Summary
General recommendations:
- Run an in-core solution if possible
- If that is not possible, more memory still helps to reduce I/O time
- There is usually no reason to use fewer than 4 cores (SMP)
- Run SPMD when you have appropriate load cases
- Use the AMSES solver for large modal analyses (and combine it with SMP)
Optional recommendation:
- Try the PCG solver on bulky solid models under static load, especially when memory is not sufficient to run the direct solver in-core.