ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH-PERFORMANCE COMPUTING
Vladimir Belsky, Director of Solver Development*
Luis Crivelli, Director of Solver Development*
Matt Dunbar, Chief Architect*
Mikhail Belyi, Development Group Manager*
Michael Wood, Developer*
Cristian Ianculescu, Developer*
Mintae Kim, Developer*
Andrzej Bajer, Developer*
*Dassault Systèmes Simulia Corp.
Geraud Krawezik, Developer, Acceleware, Canada

ABSTRACT

In the last decade, significant R&D resources have been invested to deliver commercially available technologies that meet current and future mechanical engineering industry requirements, both in terms of mechanics and performance. While significant focus has been given to developing robust nonlinear finite element analysis technology, there has also been continued investment in advancements for linear dynamic analyses. These research and development efforts have focused on combining advanced linear and nonlinear technology to provide accurate yet fast modelling of noise and vibration engineering problems. This effort has enabled high-fidelity models to
run in a reasonable time, which is vital for virtual prototyping within shortened product design cycles. While model sizes (degrees of freedom) have grown significantly during this period, the complexity of the models has also increased, leading to a larger number of total iterations within nonlinear implicit analyses and to a large number of eigenmodes within linear dynamic simulations. An innovative approach has been developed to leverage high-performance computing (HPC) resources to yield reasonable turn-around times for such analyses by taking advantage of massive parallelism without sacrificing any mechanical formulation quality. The accessibility and affordability of HPC hardware in the past few years has changed the landscape of commercial finite element analysis software usage and applications. This change has come in response to an expressed desire from engineers and designers to run their existing simulations faster or, in many cases, to run more realistic jobs. Due to their computational cost and the lack of high-performance commercial software, such "high-end" simulations were until recently thought to be available only to academic institutions or government research laboratories, which typically developed their own HPC applications. Today, with the advent of affordable multi-core SMP workstations and compute clusters built from multi-core nodes with high-speed interconnects and GPGPU accelerators, HPC is sought after by many engineers for routine FEA. This presents a challenge for commercial FEA software vendors, which must adapt their decades-old legacy code to take advantage of state-of-the-art HPC platforms. Given this background, this paper focuses on how recent developments in HPC have affected the performance of linear dynamic and implicit nonlinear analyses. Two main HPC developments are studied.
First, we look into the performance and scalability of the commercially available Abaqus AMS eigenvalue solver, and of the entire frequency response simulation, running on multi-core SMP workstations. Advances in the AMS eigenvalue solution procedure and linear dynamic capabilities make realistic simulation suitable for a wide range of vehicle-level noise and vibration analyses. Next, we discuss progress in the relatively new but very active area of high-performance commercial FE software development based on GPGPU accelerators. Efficient adoption of GPGPUs in such products is a challenging task that requires significant re-architecture of the existing code. We describe our experience integrating GPGPU acceleration into complex commercial engineering software; in particular, we discuss the trade-offs we had to make and the benefits we obtained from this technology.

KEYWORDS

HPC, Parallel Computing, Cluster Computing, Equation Solver, Non-linear Implicit FEA, GPGPU, Modal Linear Dynamics, AMS, Automated Multilevel Substructuring, Abaqus
1: AMS (Automatic Multilevel Substructuring) Eigensolver

As model meshes become more refined and accurate, the complexity and size of finite element models grow, all while the demand for faster job turn-around time continues to be strong. The role of a mode-based approach in linear dynamic analyses becomes crucial, given that the direct approach, based on the solution of a system of equations on the physical domain for each excitation frequency, becomes much more expensive as the size of finite element models grows. The most time-consuming task in mode-based linear dynamic analyses is the solution of a large eigenvalue extraction problem to create the modal basis. The most advanced eigenvalue extraction technology suitable for today's needs in automotive noise and vibration (N&V) simulation is AMLS. Beginning in 2006, SIMULIA began to offer a version of AMLS, marketed as Abaqus/AMS. The performance of the AMS eigensolver, therefore, is crucially important for reducing overall analysis runtime in large-scale N&V simulations. Over the past three years, the Abaqus AMS eigensolver has evolved from an original serial implementation, designed for computers with a single processor and limited memory and able to solve problems with a couple of million equations, to a modern implementation designed for multi-core computers with a large amount of memory, capable of solving problems with tens of millions of equations. Beginning with the Abaqus 6.10 Enhanced Functionality release, the AMS eigensolver can run in parallel on shared-memory computers with multiple processors. Following that release, the parallel performance of AMS has been improved substantially. To demonstrate the AMS eigensolver performance on HPC hardware, two automotive industrial models were chosen to run on a machine with four six-core Intel Xeon Nehalem processors and 128 GB of physical memory.
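The core mechanism of multilevel substructuring is recursive static condensation: interior degrees of freedom of each substructure are eliminated onto its interface via a Schur complement. The sketch below is our own minimal pure-Python illustration of one such condensation step on a toy 3-DOF stiffness matrix; the function names are ours, not Abaqus APIs, and the real AMS eigensolver condenses on the projected eigenproblem recursively over many levels.

```python
# Toy illustration (not the Abaqus/AMS implementation) of the static
# condensation step at the heart of multilevel substructuring: interior
# DOFs are eliminated onto the interface via the Schur complement
#   S = K_bb - K_bi * inv(K_ii) * K_ib.

def solve_multi(A, B):
    """Gauss-Jordan elimination with partial pivoting; each row of B holds
    the right-hand-side entries for several columns at once."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def schur_complement(K, interior, boundary):
    """Condense the `interior` DOFs of the symmetric matrix K onto `boundary`."""
    K_ii = [[K[i][j] for j in interior] for i in interior]
    K_ib = [[K[i][j] for j in boundary] for i in interior]
    X = solve_multi(K_ii, K_ib)  # X = inv(K_ii) @ K_ib
    return [[K[a][b] - sum(K[a][interior[k]] * X[k][jb]
                           for k in range(len(interior)))
             for jb, b in enumerate(boundary)]
            for a in boundary]
```

Applied recursively over a nested partitioning of the mesh, this condensation is what lets AMS replace one huge eigenproblem with many small substructure eigenproblems plus interface coupling.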
The first model, referred to as Model 1, is an automotive vehicle body model with 14.1 million degrees of freedom. This model has an acoustic cavity for coupled structural-acoustic frequency response analysis; the modal basis consists of 5190 structural modes and 266 acoustic modes below the maximum frequency of 600 Hz. The selective recovery capability for the structural domain, which recovers user-requested output variables at a user-defined node set, and the full recovery capability for the acoustic domain, which recovers user-requested output variables at all nodes of the model, are used in this simulation. The second model, Model 2, is a powertrain model with 11.2 million degrees of freedom. The modal basis includes 377 modes below 2500 Hz, and the selective recovery capability is used.
The pre-release version of Abaqus 6.11 was used to obtain the performance data for both models. Table 1 demonstrates the parallel performance of the AMS eigensolver for Model 1. In the table, FREQ indicates the whole frequency extraction procedure, which includes the AMS eigensolver and the non-scalable non-solver parts of the code, while AMS indicates the AMS eigensolver itself. The AMS eigensolver takes only 25 minutes to solve the eigenproblem on 16 cores, while it takes about 4 hours on a single core. Non-scalable parts become dominant as the number of cores increases. Figure 1 shows the scalability of the AMS eigensolver based on the data in Table 1. Due to the good parallel speedup of AMS, the frequency extraction procedure FREQ shows a speedup of about 5 overall.

Table 1. Performance of the AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 1

Number of Cores | FREQ (6.11) Wall Clock Time (h:mm) | AMS (6.11) Wall Clock Time (h:mm)
1  | 4:32 | 4:01
4  | 1:38 | 1:07
8  | 1:09 | 0:
16 | 0:56 | 0:25
Figure 1. Scalability of the Abaqus 6.11 AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 1 (speedup factor vs. number of cores)

Table 2 and Figure 2 show the parallel performance and scalability of the frequency extraction procedure (FREQ) and the AMS eigensolver (AMS) for Model 2. Due to the good scalability of the AMS eigensolver, the frequency extraction procedure takes only 36 minutes for this large model, which significantly reduces the overall job turn-around time.

Table 2. Performance of the AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 2

Number of Cores | FREQ (6.11) Wall Clock Time (h:mm) | AMS (6.11) Wall Clock Time (h:mm)
1  | 2:57 | 2:33
4  | 1:03 | 0:39
8  | 0:45 | 0:
16 | 0:36 | 0:13
Figure 2. Scalability of the Abaqus 6.11 AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 2 (speedup factor vs. number of cores)

2: Mode-based Frequency Response Analysis

Mode-based frequency response analysis is the commonly accepted method among N&V engineers for simulating noise and vibration in vehicles and other structures. To reduce the cost of the analysis, the system of equations is solved in a modal subspace. The projection of the finite element system onto the modal subspace requires an eigenvalue extraction analysis, which in Abaqus is typically performed using the AMS eigensolver described in the previous section. The projected system of equations in the modal subspace takes the following form:

$$
\begin{bmatrix}
K - \omega^2 M & -(\omega C + D) \\
\omega C + D & K - \omega^2 M
\end{bmatrix}
\begin{Bmatrix}
\operatorname{Re}(Q(\omega)) \\
\operatorname{Im}(Q(\omega))
\end{Bmatrix}
=
\begin{Bmatrix}
\operatorname{Re}(F(\omega)) \\
\operatorname{Im}(F(\omega))
\end{Bmatrix}
\qquad (1)
$$

Here: K is the system stiffness matrix; M the mass matrix; C the viscous damping matrix; D the structural damping matrix; ω the excitation frequency; Q the generalized displacement; F the force vector; Re() the real part of a complex quantity; Im() the imaginary part of a complex quantity.
The size of the modal system (1) is twice the number of modes. If the frequency response is performed in the mid-frequency range, there are often more than 10,000 modes in a complex structure. If only diagonal damping is applied, the mode-based analysis is inexpensive because the system of equations (1) becomes decoupled and every equation is solved separately. However, in the mid-frequency range modal damping is not sufficient, and material damping (e.g., dashpot elements and material structural damping) must be applied to obtain accurate results. The material damping causes the projected damping operators C and/or D in equation (1) to be fully populated. Thus, a system of linear equations of size 2N, where N is the number of modes, must be solved in the modal subspace at every frequency point. With a few hundred to a thousand frequency points and more than 10,000 modes, this becomes a rather expensive analysis.

Figure 3. The structure of the left-hand side operator for the mode-based frequency response analysis

In the typical case, when the stiffness matrix is symmetric and constant with respect to the excitation frequency, the stiffness and mass operators reduce to diagonal matrices in the modal subspace. The structure of the system of modal equations (1) in this case is presented in Figure 3. The diagonal blocks are diagonal matrices (corresponding to a linear combination of the projected mass and stiffness operators), while the off-diagonal blocks are fully populated (corresponding to the projected structural and viscous damping operators). Traditionally, this system of equations of size 2N is solved at every frequency. Instead, we first take advantage of the diagonal structure of part of the operator and reduce the size of the system by half. With this reduction we end up with a fully populated system of equations of size N. For details and derivation of the reduction algorithm we refer to [1].
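One standard way to see this halving (our own illustrative sketch; the production algorithm is derived in [1]) is that when A = K − ω²M is diagonal and B = ωC + D is dense, the real 2N×2N block system is equivalent to a dense N×N complex system (A + iB)q = F:

```python
# Illustrative sketch (not the production algorithm of [1]): the real
# 2N x 2N block system
#   [ A  -B ] [Re q]   [Re F]
#   [ B   A ] [Im q] = [Im F]
# is equivalent to the dense N x N complex system (A + iB) q = F,
# halving the size of the solve performed at each frequency point.

def gauss_solve(A, b):
    """Dense Gauss-Jordan solve with partial pivoting (real or complex)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n] for row in M]

def solve_reduced(A_diag, B, F):
    """Solve (diag(A_diag) + 1j*B) q = F as one N x N complex system."""
    n = len(A_diag)
    C = [[(A_diag[i] if i == j else 0.0) + 1j * B[i][j] for j in range(n)]
         for i in range(n)]
    return gauss_solve(C, F)
```

The half-size complex factorization costs roughly a quarter of the real 2N factorization per frequency, which is where the savings come from.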
The reduction phase is dominated by the matrix-matrix multiplication operations, and takes more time than the subsequent solution of the reduced system. Thus, to obtain an efficient parallel algorithm, we need to parallelize both algorithms: the matrix-matrix multiplication and the factorization of the dense system of equations.
The parallel algorithm for mode-based frequency response analysis is implemented on shared-memory machines. The computationally expensive ingredients of this algorithm, matrix-matrix products and dense linear solves, have been parallelized using a task-based approach. This implementation ensures that the memory consumption remains constant regardless of the number of processors used, while achieving almost linear parallel scaling up to the number of general-purpose computational cores of modern hardware. To demonstrate the effectiveness of this algorithm we present an example of a typical N&V analysis of structural vibration of a car body. The stiffness matrix is symmetric and the model includes some structural damping, so the projected system looks like the one illustrated in Figure 3. Over 10,000 modes were extracted using the Abaqus/AMS eigensolver, and the analysis is performed at 500 frequency points. The presented results were obtained on a machine with four six-core Intel Xeon Nehalem processors and 128 GB of physical memory. Table 3 and Figure 4 show the performance and scalability of the modal frequency response solver. Excellent parallel speed-up on 24 cores reduces the wall-clock analysis time from almost 22 hours to about 1 hour. This drastically reduces turn-around time and enables N&V engineers to analyse several design changes during one business day.

Table 3. Analysis time and scalability of the mode-based frequency response solver (Number of Cores; Wall Clock Time, h:mm; Parallel Speed-Up)
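The task-based approach described above can be sketched as follows. This is a structural illustration with hypothetical names, not SIMULIA's code: the output columns of a matrix product are partitioned into independent tasks over a fixed worker pool, while the inputs are shared read-only, so memory use stays constant as workers are added. (Pure-Python threads serialize on the interpreter lock; a real kernel would run native code, but the decomposition is the point here.)

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of task-based parallelism for the dense kernel that dominates the
# modal reduction: the columns of C = A @ B are split into one task per
# worker; A and B are shared read-only between tasks.

def matmul_block(A, B, cols):
    """Rows of C = A @ B restricted to the given output columns."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in cols]
            for i in range(len(A))]

def parallel_matmul(A, B, n_workers=4):
    n_cols = len(B[0])
    # Contiguous column ranges, one task per worker.
    bounds = [n_cols * w // n_workers for w in range(n_workers + 1)]
    tasks = [list(range(bounds[w], bounds[w + 1])) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blocks = list(pool.map(lambda cols: matmul_block(A, B, cols), tasks))
    # Stitch the column blocks back together, preserving order.
    return [[v for blk in blocks for v in blk[i]] for i in range(len(A))]
```

Because each task writes only its own output block, no synchronization is needed beyond joining the pool, which is what makes the near-linear scaling reported below achievable.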
Figure 4. Analysis time of the mode-based frequency response solver on a shared-memory machine with 24 cores (parallel speed-up vs. number of cores)

Figure 5 demonstrates the parallel efficiency of the modal frequency response solver. The efficiency is defined as the parallel speed-up divided by the number of cores, times 100%; thus, a parallel efficiency of 100% would indicate optimal speed-up. The presented results demonstrate very good efficiency of the modal frequency response solver, about 95% on 2, 4, and 8 cores. On 24 cores, the efficiency is just below 90%.

Figure 5. Parallel efficiency of the mode-based frequency response solver on a shared-memory machine with 24 cores (efficiency [%] vs. number of cores)
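The efficiency metric used here is simple to state in code; the timings in the usage note are made-up round numbers, not the paper's measurements:

```python
# Parallel efficiency as defined in the text: speed-up divided by the
# number of cores, times 100%, so 100% means ideal (linear) speed-up.

def parallel_efficiency(t_serial, t_parallel, n_cores):
    speedup = t_serial / t_parallel
    return 100.0 * speedup / n_cores
```

For example, a hypothetical 24-hour serial run finishing in 1.25 hours on 24 cores gives a speed-up of 19.2 and an efficiency of 80%.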
3: Acceleration of the direct sparse solver using GPGPUs

GPGPUs offer exceptional floating point operation speed: recent hardware can execute double-precision floating point operations at a theoretical peak rate of 500 GFlops. Of course, in order to realize this peak an algorithm must be embarrassingly parallel, since the tremendous processing speed is largely due to the massive parallelism of the GPGPU hardware. One of the challenges in exploiting the power of GPGPUs in general-purpose FEA codes is that it requires rewriting the code in a new language and adapting the algorithm to maximally utilize the GPGPU hardware. Currently there are two GPGPU hardware vendors, and each has its own preferred coding language. In order to maximize the benefit of GPGPU performance while minimizing development effort, we chose to apply this technology to the most floating point intensive portion of any implicit FEA program: the linear equation solver. With minimal changes to our existing solver, we created an interface for the factorization of individual supernodes in our direct sparse solver. We turned to Acceleware Corporation for the implementation of the GPGPU portion of the project; their experience with GPGPU acceleration of scientific algorithms was helpful in getting our first implementation up and running quickly. In our current implementation, the GPGPU-accelerated direct solver can greatly reduce the time spent in the solver phase of an FEA analysis for a variety of large models. We have learned that a number of factors must be considered when trying to determine the level of benefit to expect from adding GPGPU compute capability. Abaqus provides an out-of-core solver; however, when enough memory is available, the factorization and subsequent backward pass remain in-core and deliver optimal performance.
Once the problem size exceeds the system memory, I/O costs become significant and reduce the overall benefit of GPGPU acceleration. Another factor is the size of the FEA model. The most important measure of size in this case is not the number of degrees of freedom (DoF) in the model, but the number of floating point operations required for factorization. Thus, a 5 million DoF solid element model may be more computationally intensive than a 10 million DoF shell element model. The target we set for performance gain was an overall 2x speedup of the analysis wall clock time for our benchmark automotive powertrain model, compared to the performance of a 4-core parallel run. The actual results are shown in Figure 6, where this model is identified by the number of floating point operations in its solver phase (1.0E+13). The chart is arranged to show how the amount of work in the solver correlates with the performance improvement when
using a GPGPU for compute acceleration. The effectiveness of GPGPU acceleration increases with problem size up to the point where the factorization can no longer fit in core, or an individual supernode does not fit in the GPGPU memory.

Figure 6. Effect of GPGPU acceleration on the performance of 4-core parallel runs (GPGPU speedup, 4 core / (4 core + GPU), plotted against solver floating point operations)

Today it is common for high-performance workstations or compute cluster nodes to have 8 cores. For comparison, the chart in Figure 7 shows 8 core + GPGPU vs. 8 core runs for some of the larger test cases. Here the addition of GPGPU acceleration is again beneficial, but not to the same degree. Increasing the number of cores increases the number of branches in the supernode tree that are solved concurrently. When more than one branch has a supernode eligible for processing on the GPGPU, there is contention for the GPGPU resource. This results in either a delay (waiting for the GPGPU to become available) or processing the supernode on the slower CPU resources.
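The contention just described can be modeled with a toy dispatcher (ours, not the Abaqus scheduler; all names are hypothetical): each concurrently solved branch offers its eligible supernode to the single GPU, and if the GPU is busy the supernode falls back to the slower CPU path rather than waiting.

```python
import threading

# Toy model of GPU contention among concurrent branches of the supernode
# tree: one shared GPU lock; a branch that cannot claim it immediately
# processes its supernode on the CPU instead of blocking.

gpu_lock = threading.Lock()

def factorize_supernode(sid, placement, placement_lock):
    """Factor one supernode, preferring the GPU but never blocking on it."""
    if gpu_lock.acquire(blocking=False):
        try:
            where = "gpu"  # stand-in for the dense GPU factorization kernel
        finally:
            gpu_lock.release()
    else:
        where = "cpu"      # contention: use the slower CPU code path
    with placement_lock:
        placement[sid] = where

def run_branches(n_supernodes):
    """Run one branch worker per supernode and record where each one ran."""
    placement, plock = {}, threading.Lock()
    workers = [threading.Thread(target=factorize_supernode,
                                args=(s, placement, plock))
               for s in range(n_supernodes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return placement
```

The more branches run concurrently, the more supernodes land on the CPU path, which is consistent with the smaller gain observed for the 8-core runs.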
Figure 7. Effect of GPGPU acceleration on the performance of 8 core parallel runs (speedup, 8 CPU / (8 CPU + GPU), plotted against solver floating point operations)

Future developments to further leverage GPGPU acceleration of our direct sparse solver will target deployment on multiple nodes of a compute cluster. Going forward, we hope to find applications for GPGPU compute acceleration outside of our direct sparse solver.

REFERENCES

1. Bajer, A., "Performance Improvement Algorithm for Mode-Based Frequency Response Analysis," SAE Paper No. , 2009.
WHITE PAPER Intel Solid-State Drives Increase Productivity of Product Design and Simulation Intel Solid-State Drives Increase Productivity of Product Design and Simulation A study of how Intel Solid-State
More informationThe Application of Process Automation and Optimisation in the Rapid Development of New Passenger Vehicles at SAIC Motor
The Application of Process Automation and Optimisation in the Rapid Development of New Passenger Vehicles at SAIC Motor Dave Husson Vehicle CAE Manager, SAIC Motor UK Technical Centre Lowhill Lane, Longbridge,
More informationOptimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
More informationCalculation of Eigenmodes in Superconducting Cavities
Calculation of Eigenmodes in Superconducting Cavities W. Ackermann, C. Liu, W.F.O. Müller, T. Weiland Institut für Theorie Elektromagnetischer Felder, Technische Universität Darmstadt Status Meeting December
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
More informationBack to Elements - Tetrahedra vs. Hexahedra
Back to Elements - Tetrahedra vs. Hexahedra Erke Wang, Thomas Nelson, Rainer Rauch CAD-FEM GmbH, Munich, Germany Abstract This paper presents some analytical results and some test results for different
More informationGPGPU acceleration in OpenFOAM
Carl-Friedrich Gauß Faculty GPGPU acceleration in OpenFOAM Northern germany OpenFoam User meeting Braunschweig Institute of Technology Thorsten Grahs Institute of Scientific Computing/move-csc 2nd October
More information1 Bull, 2011 Bull Extreme Computing
1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance
More informationOverlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer
More informationThe Methodology of Application Development for Hybrid Architectures
Computer Technology and Application 4 (2013) 543-547 D DAVID PUBLISHING The Methodology of Application Development for Hybrid Architectures Vladimir Orekhov, Alexander Bogdanov and Vladimir Gaiduchok Department
More informationAutomotive Brake Squeal Analysis Using a Complex Modes Approach
Abaqus Technology Brief TB-05-BRAKE-1 Revised: April 2007. Automotive Brake Squeal Analysis Using a Complex Modes Approach Summary A methodology to study friction-induced squeal in a complete automotive
More informationSimulation Platform Overview
Simulation Platform Overview Build, compute, and analyze simulations on demand www.rescale.com CASE STUDIES Companies in the aerospace and automotive industries use Rescale to run faster simulations Aerospace
More informationInformation management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse
Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load
More informationTableau Server 7.0 scalability
Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different
More informationPerformance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
More informationParallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
More informationBalancing Manufacturability and Optimal Structural Performance for Laminate Composites through a Genetic Algorithm
Balancing Manufacturability and Optimal Structural Performance for Laminate Composites through a Genetic Algorithm Mike Stephens Senior Composites Stress Engineer, Airbus UK Composite Research, Golf Course
More informationInfrastructure Matters: POWER8 vs. Xeon x86
Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report
More informationLecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
More informationMEng, BSc Applied Computer Science
School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions
More informationBuilding a Top500-class Supercomputing Cluster at LNS-BUAP
Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad
More informationOn-Demand Supercomputing Multiplies the Possibilities
Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Image courtesy of Wolfram Research, Inc. On-Demand Supercomputing Multiplies the Possibilities Microsoft Windows Compute Cluster Server
More informationLS DYNA Performance Benchmarks and Profiling. January 2009
LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The
More informationScaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
More informationHeat Transfer and Thermal-Stress Analysis with Abaqus
Heat Transfer and Thermal-Stress Analysis with Abaqus 2016 About this Course Course objectives Upon completion of this course you will be able to: Perform steady-state and transient heat transfer simulations
More information2: Computer Performance
2: Computer Performance http://people.sc.fsu.edu/ jburkardt/presentations/ fdi 2008 lecture2.pdf... John Information Technology Department Virginia Tech... FDI Summer Track V: Parallel Programming 10-12
More informationThe Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.
White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
More informationQuiz for Chapter 1 Computer Abstractions and Technology 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,
More informationTime Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication
Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper
More informationThe Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationPyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction Many-Core
More informationHIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
More informationABAQUS High Performance Computing Environment at Nokia
ABAQUS High Performance Computing Environment at Nokia Juha M. Korpela Nokia Corporation Abstract: The new commodity high performance computing (HPC) hardware together with the recent ABAQUS performance
More informationSUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS - 2013 UPDATE
SUBJECT: SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE KEYWORDS:, CORE, PROCESSOR, GRAPHICS, DRIVER, RAM, STORAGE SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE Below is a summary of key components of an ideal SolidWorks
More informationIn-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating
More informationFast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
More informationFluid-Structure Acoustic Analysis with Bidirectional Coupling and Sound Transmission
VPE Swiss Workshop Acoustic Simulation 12. Sept. 2013 Fluid-Structure Acoustic Analysis with Bidirectional Coupling and Sound Transmission Reinhard Helfrich INTES GmbH, Stuttgart info@intes.de www.intes.de
More informationAN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS
AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS by M. Storti, L. Dalcín, R. Paz Centro Internacional de Métodos Numéricos en Ingeniería - CIMEC INTEC, (CONICET-UNL), Santa Fe, Argentina
More informationIntroduction to the Siemens PLM End to End Solution for Composites
Introduction to the Siemens PLM End to End Solution for Composites Restricted Siemens AG 2014 2013 All rights reserved. Page 1 Siemens PLM is Dedicated to the Future of Lightweight Engineering Bringing
More informationHPC enabling of OpenFOAM R for CFD applications
HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,
More informationDispersion diagrams of a water-loaded cylindrical shell obtained from the structural and acoustic responses of the sensor array along the shell
Dispersion diagrams of a water-loaded cylindrical shell obtained from the structural and acoustic responses of the sensor array along the shell B.K. Jung ; J. Ryue ; C.S. Hong 3 ; W.B. Jeong ; K.K. Shin
More informationCluster Computing at HRI
Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing
More informationWorkshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),
More informationIntroduction. 1.1 Motivation. Chapter 1
Chapter 1 Introduction The automotive, aerospace and building sectors have traditionally used simulation programs to improve their products or services, focusing their computations in a few major physical
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More informationHyperQ Storage Tiering White Paper
HyperQ Storage Tiering White Paper An Easy Way to Deal with Data Growth Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com
More informationData Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
More information