Speedup von Analysen und Optimierungen mit OptiStruct (Speeding up Analyses and Optimizations with OptiStruct)
Kristian Holm, 12.07.2013
HyperWorks Best Practice — www.altairhyperworks.de/bestpractice
Agenda
The computing time is influenced by:
- Model size
- Hardware
- Operating system
- Memory allocation
- Solver
- Parallelization
Of these, this presentation covers memory allocation, solver selection, and parallelization.
Memory allocation
A check run can be very helpful in estimating the memory and disk-space usage. Based on the memory allocated, the solver automatically chooses an in-core, out-of-core, or minimum-core solution. A particular solution type can be forced by setting the core option in the run script; the memory necessary for that solution type is then allocated.
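As a sketch, a check run and a forced in-core run could look like the following; the option names shown (`-check`, `-core`, `-len`) are the usual OptiStruct run options, but verify them (and the `-len` unit, assumed here to be MB) against the Run Options documentation of your version:

```shell
# Check run: estimates memory and disk usage without solving
optistruct model.fem -check

# Force an in-core solution
optistruct model.fem -core in

# Force in-core and explicitly allocate memory (assumed here: ~8 GB in MB)
optistruct model.fem -core in -len 8000
```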
Memory allocation
When more memory is requested than is actually available as RAM, OptiStruct will run much slower due to swapping; in that case there will be a significant difference between the elapsed time and the CPU time. Memory that is not used by OptiStruct is still available for I/O caching, so the amount of free memory can dramatically affect the wall-clock time of the run: the more free memory, the less I/O wait time and the faster the job will run. Even if an analysis is too large to run in-core, having extra memory available will still speed it up, because the operating system uses unused RAM to buffer disk requests.
Solver
- BCS direct solver
- PCG iterative solver
- MUMPS direct unsymmetric solver
- Lanczos eigenvalue solver
- AMSES (Automatic Multi-level Substructuring Eigen Solver)
Solver — linear static
- BCS direct solver: default
- PCG iterative solver: optional
- MUMPS direct unsymmetric solver: optional
Solver — linear static
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
System + software info: Linux 2.6.18-308.8.1.el5; 16 CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, CPU speed 1200 MHz; 128929 MB RAM, 8191 MB swap; OptiStruct 12.0
[Chart: elapsed time [s] per solver (BCS, MUMPS, PCG); run with 1 core and the in-core option]
Solver — linear static
Model info: 1 static subcase, 2nd-order hexa elements; Total # of Grids (Structural): 160781; Total # of Elements: 39520
[Charts: elapsed time [s] per solver (BCS, PCG), and RAM used for the in-core solution [MB] per solver; run with 1 core and the in-core option]
Solver — nonlinear static (NLSTAT)
- BCS direct solver: default
- PCG iterative solver: n/a
- MUMPS direct unsymmetric solver: optional; used when friction is present
Solver — modal solutions
- Lanczos eigenvalue solver: default
- AMSES (Automatic Multi-level Substructuring Eigen Solver): optional
Solver — modal solutions
Model info: BIW, free-free eigenmodes, 200 modes; Total # of Grids (Structural): 706534; Total # of Elements: 690242
[Chart: elapsed time [s] per solver (Lanczos, AMSES); run with 1 core and the in-core option; system/software as above]
Parallelization
- SMP: Shared Memory Parallelization
- SPMD: Single Program Multiple Data
- Hybrid: SPMD + SMP
- SMP with usage of a GPU
Parallelization — SMP
SMP (Shared Memory Parallelism) is based on the shared-memory architecture of computers: all processors can access a common memory space, and each process can access all memory allocated by the program. How to run an SMP job?
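A minimal sketch of an SMP run; `-nt` (number of threads) is the usual OptiStruct run option for SMP, but verify the option name against your version's documentation:

```shell
# Run OptiStruct with 8 SMP threads on one machine
optistruct model.fem -nt 8
```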
Parallelization — SMP
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: elapsed time vs. number of cores (1, 2, 4, 8, 16) in the SMP run; BCS direct solver; system/software as above]
Parallelization — SMP
Model info: BIW, free-free eigenmodes, 200 modes; Total # of Grids (Structural): 706534; Total # of Elements: 690242
[Chart: elapsed time vs. number of cores (1, 2, 4, 8, 16) in the SMP run; AMSES solver; system/software as above]
Parallelization — SMP
Model info: 1 static subcase, 2nd-order hexa elements; Total # of Grids (Structural): 160781; Total # of Elements: 39520
[Chart: elapsed time vs. number of cores (1, 4, 8) in the SMP run; PCG iterative solver]
Parallelization — SMP
Model info: engine block, NLSTAT contact with friction, 2 load cases (pretension step + loading step); Total # of Grids (Structural): 1017210; Total # of Elements: 640379
[Chart: elapsed time vs. number of cores (1, 2, 4, 8, 16) in the SMP run; MUMPS direct unsymmetric solver]
Parallelization — SMP: when should it be used?
- Shows a speedup on all examples, for every solver
- Use of up to 4 cores (or cores + GPUs) adds no additional license cost
- On some models, SPMD shows more speedup
Parallelization — SPMD
SPMD (Single Program Multiple Data): OptiStruct divides the analysis into several domains (if possible). How to run an SPMD job?
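A minimal sketch of an SPMD run, assuming the usual OptiStruct MPI run options (`-mpi` to select the MPI mode, `-np` for the number of processes); the exact launcher flags depend on the installed MPI and the OptiStruct version:

```shell
# Run OptiStruct as an MPI (SPMD) job with 4 processes
optistruct model.fem -mpi -np 4
```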
Parallelization — SPMD
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: elapsed time vs. number of MPI processes (2, 4, 8) compared to 1 CPU; BCS direct solver; system/software as above]
Parallelization — SPMD
Model info: BIW, free-free eigenmodes, 200 modes; Total # of Grids (Structural): 706534; Total # of Elements: 690242
[Chart: elapsed time vs. number of MPI processes (2, 4, 8) compared to 1 CPU; AMSES solver; system/software as above]
Parallelization — SPMD
Model info: 1 static subcase, 2nd-order hexa elements; Total # of Grids (Structural): 160781; Total # of Elements: 39520
Only 1 static subcase -> SPMD not useful
Parallelization — SPMD
Model info: engine block, NLSTAT contact with friction, 2 subcases (pretension step + loading step); Total # of Grids (Structural): 1017210; Total # of Elements: 640379
The loading step is a nonlinear solution sequence continuing from the preceding (pretension) nonlinear subcase, so the subcases cannot be run in parallel -> SPMD not useful
Parallelization — SPMD: when should it be used?
- Multiple linear static load cases with different constraints
- Multiple nonlinear static load cases, if the load cases are independent
- Multiple buckling load cases
- Direct frequency response with multiple loading frequencies
- Multiple modal load cases with different constraints
- Mixed load cases, e.g. static + normal modes
Note: unlike SMP, when load cases are parallelized the memory requirement increases as well.
Parallelization — Hybrid
Hybrid is the combination of SPMD + SMP: OptiStruct divides the analysis into several domains (if possible) and uses SMP within each subdomain. How to run a hybrid job?
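A sketch of a hybrid run, combining the MPI options with the SMP thread count (`-np` and `-nt` are the usual OptiStruct run options; verify against your version's documentation):

```shell
# 4 MPI processes, each running 4 SMP threads (16 cores in total)
optistruct model.fem -mpi -np 4 -nt 4
```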
Parallelization — Hybrid
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: SMP speedup inside the MPI run — 4 MPI, 4 MPI x 2 SMP, 4 MPI x 4 SMP, compared to 1 CPU; BCS direct solver; system/software as above]
Parallelization — Hybrid: when should it be used?
- When the maximum useful SPMD parallelization is reached and more cores are still available. E.g. with 16 cores and 3 static load cases (with different SPCs): 4 SPMD processes (3 load cases + 1 managing process), each with 4 SMP threads.
- When the maximum SPMD parallelization cannot be used due to insufficient memory.
Parallelization — GPU
SMP with usage of a GPU: currently 1 GPU plus one or more CPUs. How to run a GPU job? See the list of recommended GPUs.
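A sketch of a GPU-accelerated SMP run; the `-gpu` option is an assumption based on OptiStruct's usual run options — check your version's documentation and the list of supported/recommended GPUs before relying on it:

```shell
# SMP run with 4 threads, offloading the direct solver to 1 GPU (assumed -gpu option)
optistruct model.fem -nt 4 -gpu
```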
Parallelization — SMP + GPU
Model info: solid gear-case model, 3 static subcases with different SPCs; Total # of Grids (Structural): 901776; Total # of Elements: 516392
[Chart: influence of 1 additional GPU for 1, 2, 4, 8, 16 cores; BCS direct solver; system/software as above]
Parallelization — GPU: when could it be used?
- Static analysis/optimization with the BCS direct solver
- Available on the 64-bit Linux platform
- Available for the SMP module
- When SPMD is possible, it might give more speedup
Summary
General recommendations:
- Run an in-core solution if possible
- If that is not possible, more memory still helps to reduce I/O time
- There is usually no reason to use fewer than 4 cores (SMP)
- Run SPMD when you have appropriate load cases
- Use the AMSES solver for large modal analyses (and combine it with SMP)
Optional recommendation:
- Try the PCG solver on bulky solid models under static load, especially when memory is not sufficient to run the direct solver in-core.