Ansys & optiSLang on an HPC Cluster
[Figure: Optimization result after 50 iterations (Evolutionary Algorithm): measured curve vs. best fit, PSD acceleration [(m/s²)²/Hz] on a log scale from 1E-05 to 1E+05 over frequency [Hz] from 0 to 1000]
Dipl.-Ing. (FH) Holger Mai
Engineering GmbH, Holunderweg 8, 89182 Bernstadt
www.microconsult-engineering.de
Overview
- Our company & optiSLang
- The simple way: optimization on a workstation
- One step further: optimization on an HPC cluster using RSM
- Pushing the limits: optimization on an HPC cluster in a Linux environment
Engineering Holger Mai, Page 2
About
- Founded in 2000
- Business focused on FE simulation and metrology engineering
- 5 employees, 3 of them working on simulation
- Customers: automotive, automotive electronics
- Typical problems: fatigue, thermo-mechanics, fluid dynamics
- Since 2009: ANSYS Enhanced Solutions Partner
Our company & optiSLang
- Successfully built up an HPC cluster to deal with extreme customer problems
- Looking for new applications to make use of the enormous computing power
- Being able to solve one huge problem on 100+ cores, we could also solve several problems simultaneously: optimization is the way to go
- Introduced optiSLang in June 2009; since then, trying to push the limits
HPC cluster
- 8 Intel Harpertown systems: 64 cores in total, 488 GB RAM
- 16 Intel Nehalem systems: 128 cores in total, 1140 GB RAM
- OS: SUSE Linux Enterprise Server
- Max. power consumption: 18 kW
Typical test cases
1. Adaptation of material parameters to fit a PSD analysis to experimental data
   - 2,700,000 DOFs
   - 150 designs calculated with ARSM (5 optimization parameters)
2. Variation of the CTEs of a fibre-reinforced plastic structure to fit a thermomechanical simulation to measured values
   - 2,600,000 DOFs
   - 720 EA designs calculated (15 optimization parameters)
   - 4 different ambient temperatures to optimize => 2880 designs!
- Huge number of designs to calculate for each optimization task
- Efficient optimization requires optimized computing performance
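The design counts for the test cases follow from simple bookkeeping: each ambient temperature in test case 2 is treated as its own optimization with its own set of designs. A small Python sketch of that arithmetic; the helper name and the `load_cases` parameter are illustrative, only the figures come from the slide.

```python
def total_runs(designs_per_optimization: int, load_cases: int = 1) -> int:
    """Solver runs for one optimization task.

    Illustrative helper: every load case (here an ambient temperature)
    is assumed to need its own full set of designs.
    """
    return designs_per_optimization * load_cases


# Figures from the slide
test_case_1 = total_runs(150)                # ARSM, 5 parameters
test_case_2 = total_runs(720, load_cases=4)  # EA, 15 parameters, 4 temperatures

print(test_case_1, test_case_2)  # 150 2880
```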
Optimization on a workstation
Workstation: HP Z800, Win XP Pro x64, 2x Intel Nehalem quad-core, 48 GB RAM
- Workbench (WB): the WB problem is ported to optiSLang via optiPlug; WB gets its input from optiSLang and works in the background. Maximum of 1 job in parallel, 8 cores/job
- WB with RSM: the WB problem is ported to optiSLang via optiPlug and queued via RSM to solve multiple problems at the same time. Maximum of 4 jobs in parallel, 2 cores/job
- Ansys Classic: a .db file & APDL script are generated; optiSLang changes the APDL script to generate new designs. Maximum of 4 jobs in parallel, 2 cores/job
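In the Ansys Classic approach, each new design is just the APDL script with different parameter values. The Python sketch below illustrates that idea, assuming a hypothetical input file in which every design parameter is defined on its own line as an APDL scalar assignment (`NAME = value`). The function and the parameter names `E_MOD` and `DAMP` are made up for illustration; this is not optiSLang's internal mechanism.

```python
import re


def write_design(template: str, params: dict[str, float]) -> str:
    """Return a copy of an APDL script with new values for scalar parameters.

    Hypothetical convention: each design parameter is defined on its own
    line as 'NAME = value' (APDL scalar assignment).
    """
    script = template
    for name, value in params.items():
        # Match 'NAME = old' at the start of a line; keep everything up to the value.
        pattern = rf"^([ \t]*{re.escape(name)}[ \t]*=[ \t]*)\S+"
        script, count = re.subn(
            pattern, lambda m: f"{m.group(1)}{value:g}", script, flags=re.MULTILINE
        )
        if count == 0:
            raise KeyError(f"parameter {name!r} not found in template")
    return script


template = """/PREP7
E_MOD = 210000
DAMP = 0.01
MP,EX,1,E_MOD
"""

design = write_design(template, {"E_MOD": 195000.0, "DAMP": 0.02})
print(design)
```

Only the assignment lines change; commands that merely use a parameter (such as `MP,EX,1,E_MOD`) are left alone because the pattern is anchored at the start of the line.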
Optimization on a workstation
[Chart: test case 1, PSD analysis]
Optimization on a multicore Opteron
Multicore Opteron: Tyan S8812 quad-socket board, 4x AMD Opteron Magny-Cours 12-core processor, 192 GB RAM
- Main advantage: configuration as simple as a workstation, compute power almost like a cluster
- Ansys Classic: a .db file & APDL script are generated; optiSLang changes the APDL script to generate new designs
- Maximum of 24 jobs in parallel, 2 cores/job
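The 24-job limit is the 48 cores of the four 12-core Opterons divided by 2 cores per job; the same rule gives the 4 parallel jobs on the 8-core workstation. A minimal Python sketch of that core budget, assuming jobs never share cores and RAM is split evenly between jobs; it ignores licensing, which is discussed later.

```python
def max_parallel_jobs(total_cores: int, cores_per_job: int) -> int:
    """Design evaluations that fit on a machine at once without oversubscribing cores."""
    if cores_per_job <= 0:
        raise ValueError("cores_per_job must be positive")
    return total_cores // cores_per_job


def ram_per_job_gb(total_ram_gb: float, jobs: int) -> float:
    """Memory available to each job if RAM is split evenly (assumption)."""
    return total_ram_gb / jobs


opteron_jobs = max_parallel_jobs(4 * 12, 2)     # 4 sockets x 12 cores
workstation_jobs = max_parallel_jobs(2 * 4, 2)  # 2 x quad-core

print(opteron_jobs, workstation_jobs)        # 24 4
print(ram_per_job_gb(192, opteron_jobs))     # 8.0
print(ram_per_job_gb(48, workstation_jobs))  # 12.0
```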
Optimization on a multicore Opteron
[Chart: test case 1, compared with the workstation (reference)]
- The size of the ARSM generations (in this case, 9 designs) dominates the speedup effects
Optimization on a multicore Opteron
[Chart: test case 2, compared with the workstation (reference)]
- 24 designs per EA generation match the 24 parallel job slots, so the generation size does not limit performance
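Why generation size matters can be seen from a simple "waves" model: the optimizer only creates the next generation once the current one is fully evaluated, so a 9-design ARSM generation can keep at most 9 job slots busy. The Python sketch below is a simplified model under stated assumptions (equal run time per design, no I/O or license limits, per-core speed held constant); it illustrates the trend and does not reproduce the measured speedups.

```python
import math


def waves(generation_size: int, parallel_jobs: int) -> int:
    """Rounds of simultaneous solver runs needed to finish one generation.

    Simplifying assumptions: all designs take the same time, and the next
    generation cannot start before the current one is complete.
    """
    return math.ceil(generation_size / parallel_jobs)


def ideal_speedup(generation_size: int, jobs_ref: int, jobs_new: int) -> float:
    """Speedup from extra job slots alone (per-job speed held constant)."""
    return waves(generation_size, jobs_ref) / waves(generation_size, jobs_new)


# Test case 1: ARSM generation of 9 designs, 4 slots (workstation) vs. 24 (Opteron)
print(ideal_speedup(9, 4, 24))    # 3.0, with 15 of the 24 slots idle
# Test case 2: EA generation of 24 designs fills all 24 slots
print(ideal_speedup(24, 4, 24))   # 6.0
# More slots than designs per generation buy nothing
print(ideal_speedup(9, 24, 192))  # 1.0
```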
Optimization on a multicore Opteron
- Simple setup, just like a workstation
- Power consumption like a standard 8-core workstation
- Hardware costs about twice those of a workstation
- Number of cores like a cluster
- 6 times faster than an up-to-date 8-core workstation
- Main expense: licensing (as you will see later)
- Extremely efficient way to speed up your optimizations
Optimization on an HPC cluster using RSM
[Diagram: RSM handles the generation of optimization designs and pre-/post-processing; data is transferred (I/O) to four compute nodes, each of which performs only the solution]
Optimization on an HPC cluster using RSM
[Chart: test case 1, compared with the workstation (reference)]
- The size of the ARSM generations (in this case, 9 designs) dominates the speedup effects
Optimization on an HPC cluster using RSM: problems and disadvantages
- Performance is killed by the huge amount of time spent on I/O (in a PSD analysis, the results of the modal analysis are transferred to the headnode and from there back to the compute host to perform the PSD analysis)
- Each process needs its own licenses: to calculate 5 jobs in parallel, you need 5 prep/post licenses, 5 batch licenses and 5 HPC Packs
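To see why the round trip of the modal results hurts, here is a back-of-the-envelope Python model of the wall time per PSD design. All numbers are placeholder values chosen for illustration, not measurements from the test cases, and the function names are hypothetical.

```python
def design_time_rsm(t_solve_s: float, modal_result_gb: float, link_gb_per_s: float) -> float:
    """Wall time per PSD design when the modal results travel from the compute
    host to the headnode and back again (two transfers)."""
    transfer_s = modal_result_gb / link_gb_per_s
    return t_solve_s + 2 * transfer_s


def design_time_remote(t_solve_s: float, small_files_gb: float, link_gb_per_s: float) -> float:
    """Wall time per design when the compute host runs the whole chain and
    only small result files (text, pictures) go back to the headnode."""
    return t_solve_s + small_files_gb / link_gb_per_s


# Placeholder values for illustration only, not measured data
t_solve = 600.0  # s of pure solver time per design
modal = 30.0     # GB of modal results
link = 0.125     # GB/s, i.e. a fully used 1 Gbit/s link
small = 0.0625   # GB of text files and pictures

print(design_time_rsm(t_solve, modal, link))     # 1080.0
print(design_time_remote(t_solve, small, link))  # 600.5
```

With these placeholders the transfers alone add 80 % to each design; since every parallel job shares the same link to the headnode, the real penalty grows with the number of simultaneous jobs, which this model does not capture.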
Optimization on an HPC cluster in a Linux environment
[Diagram: optiSLang running in a Linux environment; the headnode only generates optimization designs; remote machines do the entire calculation (solution and pre-/post-processing)]
- Small amount of I/O: e.g. only text files or pictures of evaluated results must be transferred
Optimization on an HPC cluster in a Linux environment
[Diagram: optiSLang running in a Linux environment; the headnode only generates optimization designs; the calculations run on several remote machines]
- An additional optiSLang variable allows remote calculation on more than one remote machine
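The slides do not name the optiSLang variable used for multi-machine remote calculation, so the following Python sketch does not reproduce it. It only illustrates the general idea of spreading designs over several remote hosts and starting a complete batch run on each. The host names, the executable name `ansys130` and the `/scratch` paths are hypothetical placeholders; `-b`, `-np`, `-i` and `-o` are standard Mechanical APDL command-line options. The assignment is static round-robin over job slots, not a dynamic queue.

```python
from itertools import cycle


def assign_designs(designs: list[str], hosts: dict[str, int]) -> dict[str, list[str]]:
    """Spread designs over remote hosts, interleaving their job slots.

    `hosts` maps a (hypothetical) host name to its number of job slots.
    """
    if not hosts or min(hosts.values()) < 1:
        raise ValueError("every host needs at least one job slot")
    # Slot order: node01, node02, node01, node02, ... (interleaved by slot index)
    slots = [h for i in range(max(hosts.values())) for h, n in hosts.items() if i < n]
    plan: dict[str, list[str]] = {h: [] for h in hosts}
    for design, host in zip(designs, cycle(slots)):
        plan[host].append(design)
    return plan


def remote_command(host: str, design: str, cores: int, solver: str = "ansys130") -> list[str]:
    """Build (but do not run) an ssh call that starts one MAPDL batch job.

    The executable name and the /scratch path are placeholders.
    """
    workdir = f"/scratch/opt/{design}"
    return ["ssh", host, f"cd {workdir} && {solver} -b -np {cores} -i {design}.inp -o {design}.out"]


designs = [f"design{i}" for i in range(1, 7)]
plan = assign_designs(designs, {"node01": 2, "node02": 2})
for host, jobs in plan.items():
    for d in jobs:
        print(" ".join(remote_command(host, d, cores=4)))
```

Each list returned by `remote_command` could be passed to `subprocess.Popen` to launch the job; the sketch stops at building the commands.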
Optimization on an HPC cluster in a Linux environment
[Chart: test case 1, compared with the workstation (reference)]
- The size of the ARSM generations (in this case, 9 designs) dominates the speedup effects
Optimization on an HPC cluster in a Linux environment
[Chart: test case 2]
- 24 designs per EA generation, so the generation size does not limit performance
- 42 times faster than the workstation
Optimization on an HPC cluster in a Linux environment
[Chart: parallel computing on previous-generation hardware vs. current hardware]
Optimization on an HPC cluster in a Linux environment
[Chart: standard licensing vs. RDO-Pack licensing for 1×8 cores (workstation), 12×4 cores (Opteron), 8×8 cores, 8×16 cores and 12×8 cores]
- RDO-Pack multiplies the number of available licenses by 8 (for optimization purposes only)
- 42 times faster at 8 times the cost
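The licensing argument can be sketched in a few lines of Python. This models the RDO-Pack simply as "one license set covers 8 parallel jobs", which is a simplification of the slide statement; actual license terms may differ. It also assumes that the first factor in labels such as 12×8 cores is the number of parallel jobs. The cost-efficiency figure just divides the two factors stated on the slide.

```python
import math


def license_sets(parallel_jobs: int, rdo_pack: bool = False, rdo_factor: int = 8) -> int:
    """License sets (prep/post + batch + HPC Pack) needed for N parallel jobs.

    Standard licensing: one set per job.
    RDO-Pack (simplified model): each set covers `rdo_factor` jobs,
    for optimization purposes only.
    """
    if rdo_pack:
        return math.ceil(parallel_jobs / rdo_factor)
    return parallel_jobs


print(license_sets(12), license_sets(12, rdo_pack=True))  # 12 2
print(license_sets(5), license_sets(5, rdo_pack=True))    # 5 1

# Figures from the slide: 42 times faster at 8 times the cost
speedup, cost_factor = 42, 8
print(speedup / cost_factor)  # 5.25
```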
Conclusions
- Ansys Classic was always faster than Workbench
- A multicore machine is the most convenient way to go for parallel optimization
- RSM generates a huge amount of I/O, which makes it inefficient for accelerating optimization
- Important factor for the achievable speedup: generation sizes
- Using up-to-date hardware doubles performance at very little extra cost compared to licensing
- Due to the new licensing model (RDO-Pack), optimization on an HPC cluster can be very cost-efficient (42x faster at 8x the cost)
Questions?
Comparison of Windows RSM & Linux RSM
- Data transfer works three times faster with Linux RSM!