ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH PERFORMANCE COMPUTING


Vladimir Belsky, Director of Solver Development*; Luis Crivelli, Director of Solver Development*; Matt Dunbar, Chief Architect*; Mikhail Belyi, Development Group Manager*; Michael Wood, Developer*; Cristian Ianculescu, Developer*; Mintae Kim, Developer*; Andrzej Bajer, Developer*
*Dassault Systèmes Simulia Corp.
Geraud Krawezik, Developer, Acceleware, Canada

ABSTRACT

In the last decade, significant R&D resources have been invested to deliver commercially available technologies that meet current and future mechanical engineering industry requirements, both in terms of mechanics and performance. While significant focus has been given to developing robust nonlinear finite element analysis technology, there has also been continued investment in advancing linear dynamic analyses. The research and development efforts have focused on combining advanced linear and nonlinear technology to provide accurate yet fast modelling of noise and vibration engineering problems. This effort has enabled high-fidelity models to
run in a reasonable time, which is vital for virtual prototyping within shortened product design cycles. While it is true that model sizes (degrees of freedom) have grown significantly during this period, the complexity of the models has also increased, which has led to a larger number of total iterations within nonlinear implicit analyses and to a large number of eigenmodes within linear dynamic simulations. An innovative approach has been developed to leverage high-performance computing (HPC) resources to yield reasonable turnaround times for such analyses by taking advantage of massive parallelism without sacrificing any mechanical formulation quality. The accessibility and affordability of HPC hardware in the past few years has changed the landscape of commercial finite element analysis software usage and applications. This change has come in response to an expressed desire from engineers and designers to run their existing simulations faster or, in many cases, to run more realistic jobs. Due to their computational cost and the lack of high-performance commercial software, such "high-end" simulations were until recently thought to be available only to academic institutions or government research laboratories, which typically developed their own HPC applications. Today, with the advent of affordable multicore SMP workstations and compute clusters with multicore nodes and high-speed interconnects equipped with GPGPU accelerators, HPC is sought after by many engineers for routine FEA. This presents a challenge for commercial FEA software vendors, which have to adapt their decades-old legacy code to take advantage of state-of-the-art HPC platforms. Given this background, this paper focuses on how recent developments in HPC have affected the performance of linear dynamic and implicit nonlinear analyses. Two main HPC developments are studied.
First, we look into the performance and scalability of the commercially available Abaqus AMS eigenvalue solver, and of the entire frequency response simulation, running on multicore SMP workstations. Advances in the AMS eigenvalue solution procedure and linear dynamic capabilities make realistic simulation suitable for a wide range of vehicle-level noise and vibration analyses. Next, we discuss the progress made in the relatively new but very active area of high-performance commercial FE software development based on GPGPU accelerators. Efficient adoption of GPGPUs in such products is a very challenging task which requires significant re-architecture of the existing code. We describe our experience in integrating GPGPU acceleration into complex commercial engineering software; in particular, we discuss the trade-offs we had to make and the benefits we obtained from this technology. KEYWORDS: HPC, Parallel Computing, Cluster Computing, Equation Solver, Nonlinear Implicit FEA, GPGPU, Modal Linear Dynamics, AMS, Automated Multilevel Substructuring, Abaqus
1: AMS (Automatic Multilevel Substructuring) Eigensolver

As model meshes become more refined and accurate, the complexity and size of finite element models grow, all while the demand for faster job turnaround time continues to be strong. The role of a mode-based approach in linear dynamic analyses becomes crucial, given that the direct approach, based on the solution of a system of equations on the physical domain for each excitation frequency, becomes much more expensive as the size of finite element models grows. The most time-consuming task in mode-based linear dynamic analyses is the solution of a large eigenvalue extraction problem to create the modal basis. The most advanced eigenvalue extraction technology suitable for today's needs in automotive noise and vibration (N&V) simulation is AMLS. Beginning in 2006, SIMULIA began to offer a version of AMLS, marketed as Abaqus/AMS. The performance of the AMS eigensolver, therefore, becomes crucially important for reducing overall analysis runtime in large-scale N&V simulations. Over the past three years, the Abaqus AMS eigensolver has evolved from an original serial implementation, designed for computers with a single processor and limited memory and able to solve problems with a couple of million equations, to a modern implementation designed for computers with multicore processors and large amounts of memory, capable of solving problems with tens of millions of equations. Beginning with the Abaqus 6.10 Enhanced Functionality release, the AMS eigensolver can run in parallel on shared-memory computers with multiple processors. Following that release, the parallel performance of AMS has been improved substantially. To demonstrate the AMS eigensolver performance on HPC hardware, two automotive industrial models were chosen to run on a machine with four six-core Intel Xeon Nehalem processors and 128 GB of physical memory.
The first model, referred to as Model 1, is an automotive vehicle body model with 14.1 million degrees of freedom. This model has an acoustic cavity for coupled structural-acoustic frequency response analysis; the modal basis consists of 5190 structural modes and 266 acoustic modes below the maximum frequency of 600 Hz. The selective recovery capability for the structural domain, which recovers user-requested output variables at a user-defined node set, and the full recovery capability for the acoustic domain, which recovers user-defined output variables at all nodes of the model, are used in this simulation. The second model, Model 2, is a powertrain model with 11.2 million degrees of freedom. The modal basis includes 377 modes below 2500 Hz, and the selective recovery capability is used.
The pre-release version of Abaqus 6.11 was used to obtain the performance data for both models. Table 1 demonstrates the parallel performance of the AMS eigensolver for Model 1. In the table, FREQ indicates the whole frequency extraction procedure, which includes the AMS eigensolver and the non-scalable, non-solver parts of the code, while AMS indicates the AMS eigensolver itself. The AMS eigensolver takes only 25 minutes to solve the eigenproblem on 16 cores, while it takes about 4 hours on a single core. Non-scalable parts become dominant as the number of cores increases. Figure 1 shows the scalability of the AMS eigensolver based on the data in Table 1. Due to the good parallel speedup of AMS, the frequency extraction procedure FREQ shows a speedup of about 5 overall.

Table 1. Performance of the AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 1

Number of Cores | FREQ (6.11) Wall Clock Time (h:mm) | AMS (6.11) Wall Clock Time (h:mm)
1  | 4:32 | 4:01
4  | 1:38 | 1:07
8  | 1:09 | 0:…
16 | 0:56 | 0:25
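The speedup figures quoted above follow directly from the wall-clock times in Table 1. A small helper (purely illustrative, not part of any Abaqus tooling) makes the arithmetic explicit:

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'h:mm' wall-clock string to total minutes."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def speedup(serial: str, parallel: str) -> float:
    """Parallel speedup = serial wall-clock time / parallel wall-clock time."""
    return to_minutes(serial) / to_minutes(parallel)

# Model 1, 16 cores vs. 1 core, times taken from Table 1:
ams_speedup = speedup("4:01", "0:25")    # AMS eigensolver alone: ~9.6x
freq_speedup = speedup("4:32", "0:56")   # whole FREQ procedure: ~4.9x
```

The ~4.9x FREQ figure is consistent with the "speedup of about 5 overall" quoted in the text; the gap between it and the AMS-only speedup is the cost of the non-scalable parts.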
Figure 1. Scalability of the Abaqus 6.11 AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 1 [plot: speedup factor vs. number of cores]

Table 2 and Figure 2 show the parallel performance and scalability of the frequency extraction procedure (FREQ) and the AMS eigensolver (AMS) for Model 2. Due to the good scalability of the AMS eigensolver, the frequency extraction procedure takes only 36 minutes for this large model, which significantly reduces the overall job turnaround time.

Table 2. Performance of the AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 2

Number of Cores | FREQ (6.11) Wall Clock Time (h:mm) | AMS (6.11) Wall Clock Time (h:mm)
1  | 2:57 | 2:33
4  | 1:03 | 0:39
8  | 0:45 | 0:…
16 | 0:36 | 0:13
Figure 2. Scalability of the Abaqus 6.11 AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 2 [plot: speedup factor vs. number of cores]

2: Mode-based Frequency Response Analysis

Mode-based frequency response analysis is the method commonly accepted by N&V engineers for the simulation of noise and vibration in vehicles and other structures. To reduce the cost of the analysis, the system of equations is solved in a modal subspace. The projection of the finite element system onto the modal subspace requires an eigenvalue extraction analysis, which in Abaqus is typically performed using the AMS eigensolver described in the previous section. The projected system of equations in the modal subspace takes the following form:

\[
\begin{bmatrix}
K - \omega^2 M & -(\omega C + D) \\
\omega C + D & K - \omega^2 M
\end{bmatrix}
\begin{Bmatrix}
\operatorname{Re}(Q(\omega)) \\
\operatorname{Im}(Q(\omega))
\end{Bmatrix}
=
\begin{Bmatrix}
\operatorname{Re}(F(\omega)) \\
\operatorname{Im}(F(\omega))
\end{Bmatrix}
\tag{1}
\]

Here: K is the system stiffness matrix; M the mass matrix; C the viscous damping matrix; D the structural damping matrix; ω the excitation frequency; Q the generalized displacement; F the force vector; Re() the real part of a complex quantity; Im() the imaginary part of a complex quantity.
The size of the modal system (1) is twice the number of modes. If the frequency response analysis is performed in the mid-frequency range, there are often more than 10,000 modes in a complex structure. If only diagonal damping is applied, the mode-based analysis is quite inexpensive, because the system of equations (1) becomes decoupled and every equation is solved separately. However, in the mid-frequency range modal damping is not sufficient, and material damping (e.g., dashpot elements and material structural damping) must be applied to obtain accurate results. The material damping causes the projected damping operators C and/or D in equation (1) to be fully populated. Thus, a system of linear equations whose size is two times the number of modes (2N) must be solved in the modal subspace at every frequency point. With a few hundred to a thousand frequency points, and the number of modes over 10,000, it becomes a rather expensive analysis.

Figure 3. The structure of the left-hand side operator for the mode-based frequency response analysis

In a typical case, when the stiffness matrix is symmetric and constant with respect to the excitation frequency, the stiffness and mass operators reduce to diagonal matrices in the modal subspace. The structure of the system of modal equations (1) in this case is presented in Figure 3. The diagonal blocks are diagonal matrices (corresponding to a linear combination of the projected mass and stiffness operators), while the off-diagonal blocks are fully populated (corresponding to the projected structural and viscous damping operators). Traditionally, this system of equations of size 2N is solved at every frequency. First, we take advantage of the diagonal structure of part of the operator and reduce the size of the system by half. Using this reduction, we end up with a fully populated system of equations of size N. For details and a derivation of the reduction algorithm we refer to [1].
The reduction phase is dominated by matrix-matrix multiplication operations and takes more time than the subsequent solution of the reduced system. Thus, to obtain an efficient parallel algorithm, we need to parallelize both operations: the matrix-matrix multiplication and the factorization of the dense system of equations.
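The block structure of Figure 3 and the halving step can be illustrated with a small NumPy sketch (illustrative only; the actual reduction algorithm of [1] is more elaborate). With diagonal blocks A = K − ω²M and a dense damping block B = ωC + D, the 2N real system (1) is equivalent to one dense complex solve of size N:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                                   # number of modes (tiny, for illustration)
k = rng.uniform(1.0, 10.0, N)             # diagonal projected stiffness
m = np.ones(N)                            # diagonal projected mass (mass-normalized modes)
C = rng.uniform(0.0, 0.01, (N, N))        # dense projected viscous damping
D = rng.uniform(0.0, 0.01, (N, N))        # dense projected structural damping
F = rng.standard_normal(N) + 1j * rng.standard_normal(N)
w = 2.0 * np.pi * 100.0                   # one excitation frequency

A = np.diag(k - w**2 * m)                 # diagonal block of Figure 3
B = w * C + D                             # fully populated off-diagonal block

# Full 2N real system with the block structure of equation (1)
lhs = np.block([[A, -B], [B, A]])
x = np.linalg.solve(lhs, np.concatenate([F.real, F.imag]))
Q_full = x[:N] + 1j * x[N:]

# Equivalent dense complex system of size N: (A + iB) Q = F
Q_reduced = np.linalg.solve(A + 1j * B, F)

assert np.allclose(Q_full, Q_reduced)
```

As the text notes, in production the work is dominated by the dense factorization and by the matrix-matrix products that build the projected operators, which is why both must be parallelized.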
The parallel algorithm for mode-based frequency response analysis is implemented on shared-memory machines. The computationally expensive ingredients of this algorithm, matrix-matrix products and dense linear solves, have been parallelized using a task-based approach. This implementation ensures that memory consumption remains constant regardless of the number of processors used, while achieving almost linear parallel scaling up to the number of general-purpose computational cores of modern hardware. To demonstrate the effectiveness of this algorithm, we present an example of a typical N&V analysis of the structural vibration of a car body. The stiffness matrix is symmetric and the model includes some structural damping, so the projected system looks like the one illustrated in Figure 3. Over 10,000 modes were extracted using the Abaqus/AMS eigensolver, and the analysis is performed at 500 frequency points. The presented results were obtained on a machine with four six-core Intel Xeon Nehalem processors and 128 GB of physical memory. Table 3 and Figure 4 show the performance and scalability of the modal frequency response solver. Excellent parallel speedup on 24 cores allows the wall-clock analysis time to be reduced from almost 22 hours to about 1 hour. This drastically reduces turnaround time and enables N&V engineers to analyse several design changes during one business day.

Table 3. Analysis time and scalability of the mode-based frequency response solver

Number of Cores | Wall Clock Time (h:mm) | Parallel Speed-Up
1  | 21:… | 1.0
2  | …    | …
4  | …    | …
8  | …    | …
16 | …    | …
24 | …    | …
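The task-based idea can be sketched in a few lines (a minimal illustration, not the Abaqus implementation): each task computes one block-row of a dense product, and all workers share the same input and output buffers, so memory use does not grow with the worker count.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def tiled_matmul(A, B, tile=128, workers=4):
    """Task-based dense matrix product: one task per block-row of the result.

    Workers write disjoint row slices of a shared output array, so memory
    consumption stays constant as the number of workers grows. NumPy's
    matmul releases the GIL, so the tasks can genuinely run in parallel.
    """
    C = np.empty((A.shape[0], B.shape[1]), dtype=np.result_type(A, B))

    def row_block(i):
        C[i:i + tile] = A[i:i + tile] @ B

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(row_block, range(0, A.shape[0], tile)))
    return C
```

The same decomposition pattern applies to the dense factorization: independent tasks over blocks of a shared matrix, with no per-worker copies.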
Figure 4. Analysis time of the mode-based frequency response solver on a shared-memory machine with 24 cores [plot: parallel speedup vs. number of cores]

Figure 5 demonstrates the parallel efficiency of the modal frequency response solver. The efficiency is defined as the parallel speedup divided by the number of cores, times 100%. Thus, a parallel efficiency of 100% would indicate optimal speedup. The presented results demonstrate very good efficiency of the modal frequency response solver, about 95% on 2, 4, and 8 cores. On 24 cores, the efficiency is just below 90%.

Figure 5. Parallel efficiency of the mode-based frequency response solver on a shared-memory machine with 24 cores [plot: efficiency (%) vs. number of cores]
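The efficiency definition above is simple enough to check against the quoted numbers; the 21.5 h serial time below is an assumed reading of "almost 22 hours" from the text, not a value from Table 3:

```python
def parallel_efficiency(t_serial_h, t_parallel_h, cores):
    """Parallel efficiency in percent: speedup divided by core count, times 100."""
    speedup = t_serial_h / t_parallel_h
    return 100.0 * speedup / cores

# ~21.5 h on 1 core vs. ~1 h on 24 cores, as quoted in the text:
eff_24 = parallel_efficiency(21.5, 1.0, 24)   # just below 90%, matching Figure 5
```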
3: Acceleration of the Direct Sparse Solver Using GPGPUs

GPGPUs offer exceptional floating-point operation speed. With the advent of recent hardware, theoretical double-precision floating-point operations can be executed at a rate of 500 GFlops. Of course, in order to realize this peak, an algorithm must be embarrassingly parallel, since the tremendous processing speed is largely due to the massive parallelism of the GPGPU hardware. One of the challenges to exploiting the power of GPGPUs in general-purpose FEA codes is that it requires rewriting the code in a new language and adapting the algorithm to maximally utilize the GPGPU hardware. Currently, there are two GPGPU hardware vendors, and each has its own preferred coding language. In order to maximize the benefit of GPGPU performance while minimizing development effort, we chose to apply this technology to the most floating-point-intensive portion of any implicit FEA program: the linear equation solver. With minimal changes to our existing solver, we created an interface for the factorization of individual supernodes in our direct sparse solver. We turned to Acceleware Corporation for the implementation of the GPGPU portion of the project. Their experience with GPGPU acceleration of scientific algorithms was helpful in getting our first implementation up and running quickly. In our current implementation, our GPGPU-accelerated direct solver can greatly reduce the time spent in the solver phase of an FEA analysis for a variety of large models. We have learned that a number of factors must be considered when trying to determine the level of benefit to expect when adding GPGPU compute capability to reduce analysis time. Abaqus provides an out-of-core solver; however, when enough memory is available, the factorization and subsequent backward pass remain in-core and deliver optimal performance.
Once the problem size exceeds the system memory, I/O costs become significant and reduce the overall benefit of GPGPU acceleration. Another factor is the size of the FEA model. The most important measure of size in this case is not the number of degrees of freedom (DoF) in the model, but the number of floating-point operations required for the factorization. Thus, a 5 million DoF solid element model may be more computationally intensive than a 10 million DoF shell element model. The target we set for performance gain was an overall speedup of 2x in analysis wall-clock time for our benchmark automotive powertrain model, compared to the performance of a 4-core parallel run. The actual results are shown in Figure 6, identified by the number of floating-point operations in the solver for this model (1.0E+13). The chart is arranged to show how the amount of work in the solver correlates with the performance improvement when
using a GPGPU for compute acceleration. The effectiveness of GPGPU acceleration increases with problem size, up to the point where the factorization can no longer fit in core or an individual supernode does not fit in the GPGPU memory.

Figure 6. Effect of GPGPU acceleration on the performance of 4-core parallel runs [bar chart: GPGPU speedup (4 core / (4 core + GPU)) vs. solver floating-point operation count, up to ~1E+14]

Today, it is common for high-performance workstations or compute cluster nodes to have 8 cores. For comparison, see the results in the chart of Figure 7 for 8 core + GPGPU vs. 8 core runs for some of the larger test cases. Here, the addition of GPGPU acceleration is again beneficial, but not to the same degree. Increasing the number of cores increases the number of branches of the supernode tree that are solved concurrently. When more than one branch has a supernode eligible for processing on the GPGPU, there is contention for the GPGPU resource. This results in either a delay (waiting for the GPGPU to become available) or processing the supernode on the slower CPU resources.
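The memory-fit and contention constraints described above suggest a simple dispatch rule for each supernode. The sketch below is purely illustrative: the names `gpu_factor`/`cpu_factor` and the single-device lock are assumptions for the example, not the actual Abaqus/Acceleware interface.

```python
import threading
import numpy as np

gpu_lock = threading.Lock()          # one GPGPU: at most one supernode at a time

def factor_supernode(frontal, gpu_free_bytes, gpu_factor, cpu_factor):
    """Factor one dense supernode, preferring the GPGPU when it fits.

    If the frontal block exceeds GPGPU memory, or the device is busy
    serving another branch of the supernode tree, fall back to the CPU
    rather than wait for the device to become available.
    """
    if frontal.nbytes <= gpu_free_bytes and gpu_lock.acquire(blocking=False):
        try:
            return gpu_factor(frontal)
        finally:
            gpu_lock.release()
    return cpu_factor(frontal)       # slower, but keeps the branch moving

# Stand-in kernels for illustration: both just do a dense Cholesky factorization.
cpu_chol = np.linalg.cholesky
A = np.eye(500) * 4.0                # small symmetric positive-definite frontal block
L = factor_supernode(A, gpu_free_bytes=1 << 30,
                     gpu_factor=cpu_chol, cpu_factor=cpu_chol)
```

In a real solver the CPU fallback is what produces the observed behavior: added cores raise contention, so some supernodes that could run on the GPGPU are processed on the slower CPU instead.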
Figure 7. Effect of GPGPU acceleration on the performance of 8-core parallel runs [bar chart: speedup (8 CPU / (8 CPU + GPU)) vs. solver floating-point operation count]

Future developments to further leverage GPGPU acceleration of our direct sparse solver will target deployment on multiple nodes of a compute cluster. Going forward, we hope to find applications for GPGPU compute acceleration outside of our direct sparse solver.

REFERENCES

1. Bajer, A., "Performance Improvement Algorithm for Mode-Based Frequency Response Analysis," SAE Paper No. …, 2009.
The Application of Process Automation and Optimisation in the Rapid Development of New Passenger Vehicles at SAIC Motor Dave Husson Vehicle CAE Manager, SAIC Motor UK Technical Centre Lowhill Lane, Longbridge,
More informationHigh Performance Matrix Inversion with Several GPUs
High Performance Matrix Inversion on a Multicore Platform with Several GPUs Pablo Ezzatti 1, Enrique S. QuintanaOrtí 2 and Alfredo Remón 2 1 Centro de CálculoInstituto de Computación, Univ. de la República
More informationDesign and Optimization of OpenFOAMbased CFD Applications for Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAMbased CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,
More informationOverlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer
More informationIntel SolidState Drives Increase Productivity of Product Design and Simulation
WHITE PAPER Intel SolidState Drives Increase Productivity of Product Design and Simulation Intel SolidState Drives Increase Productivity of Product Design and Simulation A study of how Intel SolidState
More informationANSYS Solvers: Usage and Performance. Ansys equation solvers: usage and guidelines. Gene Poole Ansys Solvers Team, April, 2002
ANSYS Solvers: Usage and Performance Ansys equation solvers: usage and guidelines Gene Poole Ansys Solvers Team, April, 2002 Outline Basic solver descriptions Direct and iterative methods Why so many choices?
More informationReconfigurable Architecture Requirements for CoDesigned Virtual Machines
Reconfigurable Architecture Requirements for CoDesigned Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra
More informationSpeeding up MATLAB Applications
Speeding up MATLAB Applications Mannheim, 19. Februar 2014 Michael Glaßer Dipl.Ing. Application Engineer 2014 The MathWorks, Inc. 1 Ihr MathWorks Team heute: Andreas Himmeldorf Senior Team Leader Educational
More informationThe Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.
White Paper 0213133 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,
More informationAbaqus Technology Brief. Automobile Roof Crush Analysis with Abaqus
Abaqus Technology Brief Automobile Roof Crush Analysis with Abaqus TB06RCA1 Revised: April 2007. Summary The National Highway Traffic Safety Administration (NHTSA) mandates the use of certain test procedures
More informationOptimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
More informationIcepak HighPerformance Computing at Rockwell Automation: Benefits and Benchmarks
Icepak HighPerformance Computing at Rockwell Automation: Benefits and Benchmarks Garron K. Morris Senior Project Thermal Engineer gkmorris@ra.rockwell.com Standard Drives Division Bruce W. Weiss Principal
More informationYALES2 porting on the Xeon Phi Early results
YALES2 porting on the Xeon Phi Early results Othman Bouizi Ghislain Lartigue Innovation and Pathfinding Architecture Group in Europe, Exascale Lab. Paris CRIHAN  Demijournée calcul intensif, 16 juin
More informationAccelerating Automotive Design with InfiniBand
February 2009 Accelerating Automotive Design with InfiniBand 1.0 Abstract 1.1 Introduction 1.0 Abstract... 1 1.1 Introduction... 1 1.2 Automotive Crash Simulations... 1 1.3 Multicore Cluster Environments...
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction  Hardware
More informationHeat Transfer and ThermalStress Analysis with Abaqus
Heat Transfer and ThermalStress Analysis with Abaqus 2016 About this Course Course objectives Upon completion of this course you will be able to: Perform steadystate and transient heat transfer simulations
More informationCalculation of Eigenmodes in Superconducting Cavities
Calculation of Eigenmodes in Superconducting Cavities W. Ackermann, C. Liu, W.F.O. Müller, T. Weiland Institut für Theorie Elektromagnetischer Felder, Technische Universität Darmstadt Status Meeting December
More informationLecture 11: MultiCore and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: MultiCore and GPU Multicore computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 MultiCore System Integration of multiple processor cores on a single chip. To provide
More informationMEng, BSc Applied Computer Science
School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions
More informationBuilding a Top500class Supercomputing Cluster at LNSBUAP
Building a Top500class Supercomputing Cluster at LNSBUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad
More informationGPGPU acceleration in OpenFOAM
CarlFriedrich Gauß Faculty GPGPU acceleration in OpenFOAM Northern germany OpenFoam User meeting Braunschweig Institute of Technology Thorsten Grahs Institute of Scientific Computing/movecsc 2nd October
More informationLS DYNA Performance Benchmarks and Profiling. January 2009
LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The
More informationThe Methodology of Application Development for Hybrid Architectures
Computer Technology and Application 4 (2013) 543547 D DAVID PUBLISHING The Methodology of Application Development for Hybrid Architectures Vladimir Orekhov, Alexander Bogdanov and Vladimir Gaiduchok Department
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationThe Lattice Project: A MultiModel Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A MultiModel Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationHyperQ Storage Tiering White Paper
HyperQ Storage Tiering White Paper An Easy Way to Deal with Data Growth Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 17632198811 www.parseclabs.com info@parseclabs.com
More informationAutomotive Brake Squeal Analysis Using a Complex Modes Approach
Abaqus Technology Brief TB05BRAKE1 Revised: April 2007. Automotive Brake Squeal Analysis Using a Complex Modes Approach Summary A methodology to study frictioninduced squeal in a complete automotive
More informationIterate More, Innovate Faster Working Differently with QuadCore Intel Xeon ProcessorBased HP Workstations and Dassault Systèmes Solutions
White Paper QuadCore Intel Xeon Processor Iterate More, Innovate Faster Working Differently with QuadCore Intel Xeon ProcessorBased HP Workstations and Dassault Systèmes Solutions Innovative Intel QuadCore
More informationSimulation Platform Overview
Simulation Platform Overview Build, compute, and analyze simulations on demand www.rescale.com CASE STUDIES Companies in the aerospace and automotive industries use Rescale to run faster simulations Aerospace
More information1 Bull, 2011 Bull Extreme Computing
1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance
More informationPerformance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for HighEnd Computing October 1, 2013 Lyon, France
More informationParallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture SharedMemory
More informationInformation management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse
Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load
More informationBalancing Manufacturability and Optimal Structural Performance for Laminate Composites through a Genetic Algorithm
Balancing Manufacturability and Optimal Structural Performance for Laminate Composites through a Genetic Algorithm Mike Stephens Senior Composites Stress Engineer, Airbus UK Composite Research, Golf Course
More informationInfrastructure Matters: POWER8 vs. Xeon x86
Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8based scaleout Power System to Intel E5 v2 x86 based scaleout systems. A followon report
More informationAN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS
AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS by M. Storti, L. Dalcín, R. Paz Centro Internacional de Métodos Numéricos en Ingeniería  CIMEC INTEC, (CONICETUNL), Santa Fe, Argentina
More informationTableau Server 7.0 scalability
Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different
More informationCluster Computing at HRI
Cluster Computing at HRI J.S.Bagla HarishChandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. Email: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing
More informationIntroduction. 1.1 Motivation. Chapter 1
Chapter 1 Introduction The automotive, aerospace and building sectors have traditionally used simulation programs to improve their products or services, focusing their computations in a few major physical
More informationWorkshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),
More informationIntroduction to the Siemens PLM End to End Solution for Composites
Introduction to the Siemens PLM End to End Solution for Composites Restricted Siemens AG 2014 2013 All rights reserved. Page 1 Siemens PLM is Dedicated to the Future of Lightweight Engineering Bringing
More informationHIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
More informationPyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction ManyCore
More informationData Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
More informationScaling Objectivity Database Performance with Panasas ScaleOut NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas ScaleOut NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
More informationPREDICTION OF MACHINE TOOL SPINDLE S DYNAMICS BASED ON A THERMOMECHANICAL MODEL
PREDICTION OF MACHINE TOOL SPINDLE S DYNAMICS BASED ON A THERMOMECHANICAL MODEL P. Kolar, T. Holkup Research Center for Manufacturing Technology, Faculty of Mechanical Engineering, CTU in Prague, Czech
More information