ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH-PERFORMANCE COMPUTING

Vladimir Belsky, Director of Solver Development*
Luis Crivelli, Director of Solver Development*
Matt Dunbar, Chief Architect*
Mikhail Belyi, Development Group Manager*
Michael Wood, Developer*
Cristian Ianculescu, Developer*
Mintae Kim, Developer*
Andrzej Bajer, Developer*
*Dassault Systèmes Simulia Corp.

Geraud Krawezik, Developer, Acceleware, Canada

ABSTRACT

In the last decade, significant R&D resources have been invested to deliver commercially available technologies that meet current and future mechanical engineering industry requirements, both in terms of mechanics and performance. While significant focus has been given to developing robust nonlinear finite element analysis technology, there has also been continued investment in advancing linear dynamic analyses. The research and development efforts have focused on combining advanced linear and nonlinear technology to provide accurate yet fast modelling of noise and vibration engineering problems. This effort has enabled high-fidelity models to

run in a reasonable time, which is vital for virtual prototyping within shortened product design cycles. While model sizes (degrees of freedom) have grown significantly during this period, the complexity of the models has also increased, leading to a larger number of total iterations within nonlinear implicit analyses and to a large number of eigenmodes within linear dynamic simulations. An innovative approach has been developed to leverage high-performance computing (HPC) resources to yield reasonable turn-around times for such analyses by taking advantage of massive parallelism without sacrificing any mechanical formulation quality. The accessibility and affordability of HPC hardware in the past few years has changed the landscape of commercial finite element analysis software usage and applications. This change has come in response to an expressed desire from engineers and designers to run their existing simulations faster or, in many cases, to run more realistic jobs. Due to their computational cost and the lack of high-performance commercial software, such "high-end" simulations were until recently thought to be available only to academic institutions or government research laboratories, which typically developed their own HPC applications. Today, with the advent of affordable multi-core SMP workstations and compute clusters with multi-core nodes, high-speed interconnects, and GPGPU accelerators, HPC is sought after by many engineers for routine FEA. This presents a challenge for commercial FEA software vendors, which have to adapt their decades-old legacy code to take advantage of state-of-the-art HPC platforms. Given this background, this paper focuses on how recent developments in HPC have affected the performance of linear dynamic and implicit nonlinear analyses. Two main HPC developments are studied.
First, we look into the performance and scalability of the commercially available Abaqus AMS eigenvalue solver, and of the entire frequency response simulation, running on multi-core SMP workstations. Advances in the AMS eigenvalue solution procedure and linear dynamic capabilities make realistic simulation suitable for a wide range of vehicle-level noise and vibration analyses. Next, we discuss the progress made in a relatively new but very active area of high-performance commercial FE software development: taking advantage of high-performance GPGPU accelerators. Efficient adoption of GPGPUs in such products is a very challenging task that requires significant re-architecture of the existing code. We describe our experience in integrating GPGPU acceleration into complex commercial engineering software. In particular, we discuss the trade-offs we had to make and the benefits we obtained from this technology.

KEYWORDS

HPC, Parallel Computing, Cluster Computing, Equation Solver, Non-linear Implicit FEA, GPGPU, Modal Linear Dynamics, AMS, Automated Multilevel Substructuring, Abaqus

1: AMS (Automatic Multilevel Substructuring) Eigensolver

As model meshes become more refined and accurate, the complexity of the models increases and the size of finite element models grows, all while the demand for faster job turn-around time remains strong. The role of a mode-based approach in linear dynamic analyses becomes crucial, given that the direct approach, based on solving a system of equations on the physical domain for each excitation frequency, grows much more expensive as the size of finite element models increases. The most time-consuming task in mode-based linear dynamic analyses is the solution of a large eigenvalue extraction problem to create the modal basis. The most advanced eigenvalue extraction technology suitable for today's needs in automotive noise and vibration (N&V) simulation is AMLS. Beginning in 2006, SIMULIA began to offer a version of AMLS, marketed as Abaqus/AMS. The performance of the AMS eigensolver, therefore, becomes crucially important for reducing overall analysis runtime in large-scale N&V simulations. Over the past three years, the Abaqus AMS eigensolver has evolved from an original serial implementation, designed for computers with a single processor and limited memory and able to solve problems with a couple of million equations, to a modern implementation designed for computers with multi-core processors and large amounts of memory, capable of solving problems with tens of millions of equations. Beginning with the Abaqus 6.10 Enhanced Functionality release, the AMS eigensolver can run in parallel on shared-memory computers with multiple processors. Following that release, the parallel performance of AMS has been improved substantially. To demonstrate the AMS eigensolver performance on HPC hardware, two automotive industrial models were chosen to run on a machine with four six-core Intel Xeon Nehalem processors and 128 GB of physical memory.
The first model, referred to as Model 1, is an automotive vehicle body model with 14.1 million degrees of freedom. This model has an acoustic cavity for coupled structural-acoustic frequency response analysis; the modal basis consists of 5190 structural modes and 266 acoustic modes below the maximum frequency of 600 Hz. The selective recovery capability for the structural domain, which recovers user-requested output variables at a user-defined node set, and the full recovery capability for the acoustic domain, which recovers user-requested output variables at all nodes of the model, are used in this simulation. The second model, Model 2, is a powertrain model with 11.2 million degrees of freedom. The modal basis includes 377 modes below 2500 Hz, and the selective recovery capability is used.

The pre-release version of Abaqus 6.11 was used to obtain the performance data for both models. Table 1 shows the parallel performance of the AMS eigensolver for Model 1. In the table, FREQ indicates the whole frequency extraction procedure, which includes the AMS eigensolver and the non-scalable, non-solver parts of the code, while AMS indicates the AMS eigensolver itself. The AMS eigensolver takes only 25 minutes to solve the eigenproblem on 16 cores, while it takes about 4 hours on a single core. Non-scalable parts become dominant as the number of cores increases. Figure 1 shows the scalability of the AMS eigensolver based on the data in Table 1. Due to the good parallel speedup of AMS, the frequency extraction procedure FREQ shows a speedup of about 5 overall.

Table 1. Performance of the AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 1

  Number of Cores    FREQ (6.11) Wall Clock Time (h:mm)    AMS (6.11) Wall Clock Time (h:mm)
  1                  4:32                                  4:01
  4                  1:38                                  1:07
  8                  1:09                                  0:
  16                 0:56                                  0:25
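The speedup factors quoted above follow directly from the wall-clock times in Table 1. A minimal script (the helper functions are ours, not part of Abaqus; the times are the table's values) reproduces them:

```python
# Parallel speedup from the wall-clock times in Table 1 (Model 1).

def to_minutes(hmm: str) -> int:
    """Convert an 'h:mm' wall-clock string to total minutes."""
    h, m = hmm.split(":")
    return int(h) * 60 + int(m)

def speedup(serial: str, parallel: str) -> float:
    """Speedup = serial wall-clock time / parallel wall-clock time."""
    return to_minutes(serial) / to_minutes(parallel)

# AMS eigensolver: 4:01 on 1 core vs 0:25 on 16 cores
print(round(speedup("4:01", "0:25"), 1))   # ~9.6x for AMS itself
# Whole frequency extraction (FREQ): 4:32 vs 0:56
print(round(speedup("4:32", "0:56"), 1))   # ~4.9x, the "about 5" overall
```

The gap between the two numbers reflects the non-scalable, non-solver portion of FREQ that dominates at higher core counts.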

Figure 1. Scalability of the Abaqus 6.11 AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 1

Table 2 and Figure 2 show the parallel performance and scalability of the frequency extraction procedure (FREQ) and the AMS eigensolver (AMS) for Model 2. Due to the good scalability of the AMS eigensolver, the frequency extraction procedure takes only 36 minutes for this large model, which significantly reduces the overall job turn-around time.

Table 2. Performance of the AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 2

  Number of Cores    FREQ (6.11) Wall Clock Time (h:mm)    AMS (6.11) Wall Clock Time (h:mm)
  1                  2:57                                  2:33
  4                  1:03                                  0:39
  8                  0:45                                  0:
  16                 0:36                                  0:13

Figure 2. Scalability of the Abaqus 6.11 AMS eigensolver (AMS) and frequency extraction procedure (FREQ) for Model 2

2: Mode-based Frequency Response Analysis

Mode-based frequency response analysis is the method commonly accepted by N&V engineers for simulating noise and vibration in vehicles and other structures. To reduce the cost of the analysis, the system of equations is solved in a modal subspace. The projection of the finite element system onto the modal subspace requires an eigenvalue extraction analysis, which in Abaqus is typically performed using the AMS eigensolver described in the previous section. The projected system of equations in the modal subspace takes the following form:

\begin{pmatrix} K - \omega^2 M & -(\omega C + D) \\ \omega C + D & K - \omega^2 M \end{pmatrix} \begin{pmatrix} \operatorname{Re}(Q(\omega)) \\ \operatorname{Im}(Q(\omega)) \end{pmatrix} = \begin{pmatrix} \operatorname{Re}(F(\omega)) \\ \operatorname{Im}(F(\omega)) \end{pmatrix} \qquad (1)

Here: K is the system stiffness matrix; M the mass matrix; C the viscous damping matrix; D the structural damping matrix; ω the excitation frequency; Q the generalized displacement; F the force vector; Re() the real part of a complex quantity; Im() the imaginary part of a complex quantity.
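The real-valued block system (1) is equivalent to the complex system (K − ω²M + i(ωC + D)) Q = F. A toy NumPy sketch illustrates assembling and solving it; all matrices, sizes, and the load below are made-up modal-subspace quantities for three modes, not data from the paper:

```python
import numpy as np

N = 3                                   # toy modal subspace of 3 modes
rng = np.random.default_rng(0)
K = np.diag([4.0, 9.0, 25.0])           # projected stiffness (diagonal)
M = np.eye(N)                           # projected mass (diagonal)
C = 0.01 * rng.random((N, N)); C = C + C.T  # viscous damping (full, symmetric)
D = 0.02 * rng.random((N, N)); D = D + D.T  # structural damping (full, symmetric)
w = 1.5                                 # excitation frequency omega
F = rng.random(N) + 1j * rng.random(N)  # complex load vector

A = K - w**2 * M                        # diagonal block of (1)
B = w * C + D                           # off-diagonal damping block of (1)

# Assemble the real-valued 2N x 2N block system of equation (1) and solve it
lhs = np.block([[A, -B], [B, A]])
rhs = np.concatenate([F.real, F.imag])
sol = np.linalg.solve(lhs, rhs)
Q = sol[:N] + 1j * sol[N:]

# Sanity check against the equivalent complex form (A + iB) Q = F
assert np.allclose((A + 1j * B) @ Q, F)
```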

The size of the modal system (1) is twice the number of modes. If the frequency response is performed in the mid-frequency range, there are often more than 10,000 modes in a complex structure. If only diagonal damping is applied, the mode-based analysis is quite inexpensive because the system of equations (1) becomes decoupled and every equation is solved separately. However, in the mid-frequency range modal damping alone is not sufficient, and material damping (e.g., dashpot elements and material structural damping) must be applied to obtain accurate results. Material damping causes the projected damping operators C and/or D in equation (1) to be fully populated. Thus, a system of linear equations whose size is two times the number of modes (2N) must be solved in the modal subspace at every frequency point. With a few hundred to a thousand frequency points and the number of modes over 10,000, this becomes a rather expensive analysis.

Figure 3. The structure of the left-hand side operator for the mode-based frequency response analysis

In the typical case, when the stiffness matrix is symmetric and constant with respect to the excitation frequency, the stiffness and mass operators reduce to diagonal matrices in the modal subspace. The structure of the system of modal equations (1) in this case is shown in Figure 3. The diagonal blocks are diagonal matrices (corresponding to a linear combination of the projected mass and stiffness operators), while the off-diagonal blocks are fully populated (corresponding to the projected structural and viscous damping operators). Traditionally, this system of equations of size 2N is solved at every frequency. Instead, we first take advantage of the diagonal structure of part of the operator and reduce the size of the system by half, ending up with a fully populated system of equations of size N. For details and the derivation of the reduction algorithm we refer to [1].
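A sketch of that half-size reduction, under our reading of the scheme in [1]: writing A = K − ω²M (diagonal) and B = ωC + D (dense), eliminating Re Q = A⁻¹(Re F + B Im Q) from the first block row of (1) leaves one dense N×N solve, (A + B A⁻¹ B) Im Q = Im F − B A⁻¹ Re F, in which the matrix-matrix product B·(A⁻¹B) dominates. The data below are hypothetical (N = 4, A assumed nonsingular), not values from the paper:

```python
import numpy as np

N = 4
rng = np.random.default_rng(1)
a = np.array([3.0, -1.0, 2.0, 5.0])     # diagonal of A = K - w^2 M (nonsingular)
B = rng.random((N, N)); B = B + B.T     # dense damping block B = w C + D
Fr, Fi = rng.random(N), rng.random(N)   # Re F, Im F

# (B / a[:, None]) scales row i of B by 1/a[i], i.e. forms A^{-1} B cheaply;
# the dense product B @ (A^{-1} B) is the dominant cost of the reduction.
S = np.diag(a) + B @ (B / a[:, None])   # reduced operator A + B A^{-1} B
Qi = np.linalg.solve(S, Fi - B @ (Fr / a))   # Im Q from the N x N dense solve
Qr = (Fr + B @ Qi) / a                       # back-substitute for Re Q

# Verify against the unreduced 2N x 2N block system of (1)
full = np.block([[np.diag(a), -B], [B, np.diag(a)]])
ref = np.linalg.solve(full, np.concatenate([Fr, Fi]))
assert np.allclose(np.concatenate([Qr, Qi]), ref)
```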
The reduction phase is dominated by the matrix-matrix multiplication operations, and takes more time than the subsequent solution of the reduced system. Thus, to obtain an efficient parallel algorithm, we need to parallelize both algorithms: the matrix-matrix multiplication and the factorization of the dense system of equations.

The parallel algorithm for mode-based frequency response analysis is implemented on shared-memory machines. The computationally expensive ingredients of this algorithm, the matrix-matrix products and the dense linear solves, have been parallelized using a task-based approach. This implementation ensures that memory consumption remains constant regardless of the number of processors used, while achieving almost linear parallel scaling up to the number of general-purpose computational cores of modern hardware. To demonstrate the effectiveness of this algorithm we present an example of a typical N&V analysis of structural vibration of a car body. The stiffness matrix is symmetric and the model includes some structural damping, so the projected system looks like the one illustrated in Figure 3. Over 10,000 modes were extracted using the Abaqus/AMS eigensolver, and the analysis is performed at 500 frequency points. The presented results were obtained on a machine with four six-core Intel Xeon Nehalem processors and 128 GB of physical memory. Table 3 and Figure 4 show the performance and scalability of the modal frequency response solver. Excellent parallel speed-up on 24 cores allows the wall-clock analysis time to be reduced from almost 22 hours to about 1 hour. This drastically reduces turn-around time and enables N&V engineers to analyse several design changes during one business day.

Table 3. Analysis time and scalability of the mode-based frequency response solver

  Number of Cores    Wall Clock Time (h:mm)    Parallel Speed-Up
  1                  21:                       1.0
  2
  4
  8
  16
  24
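The task-based approach described above can be sketched as a pool of workers sharing one read-only copy of the modal operators, with each frequency point an independent task, so memory use does not grow with the worker count. Everything below is a toy illustration (sizes, data, and the simplified complex solve are ours, not the Abaqus implementation):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N = 50                                        # toy modal subspace size
rng = np.random.default_rng(2)
Kd = np.linspace(1.0, 100.0, N)               # diagonal projected stiffness
Md = np.ones(N)                               # diagonal projected mass
D = 0.05 * rng.random((N, N)); D = D + D.T    # dense structural damping
F = rng.random(N) + 1j * rng.random(N)        # load vector

def solve_point(w: float) -> np.ndarray:
    """One independent task: solve (K - w^2 M + i D) Q = F at frequency w.
    Reads the shared operators; allocates only its own local system."""
    lhs = np.diag(Kd - w**2 * Md) + 1j * D
    return np.linalg.solve(lhs, F)

freqs = np.linspace(0.5, 9.5, 16)             # 16 frequency points
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(solve_point, freqs))

assert len(results) == len(freqs)
```

Because the workers only read Kd, Md, D, and F, the footprint of the shared data is the same whether 1 or 24 workers run, matching the constant-memory property claimed above.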

Figure 4. Analysis time of the mode-based frequency response solver on a shared-memory machine with 24 cores

Figure 5 demonstrates the parallel efficiency of the modal frequency response solver. The efficiency is defined as the parallel speed-up divided by the number of cores, times 100%. Thus, a parallel efficiency of 100% would indicate optimal speed-up. The presented results demonstrate very good efficiency of the modal frequency response solver, about 95% on 2, 4, and 8 cores. On 24 cores, the efficiency is just below 90%.

Figure 5. Parallel efficiency of the mode-based frequency response solver on a shared-memory machine with 24 cores
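The efficiency definition above is a one-line formula; the speed-up inputs below are illustrative values consistent with the quoted ~95% and just-below-90% figures, not numbers read from Figure 5:

```python
# Parallel efficiency = speed-up / number of cores * 100%.
def efficiency(speedup: float, cores: int) -> float:
    return speedup / cores * 100.0

print(round(efficiency(1.9, 2), 1))    # 95.0 -> ~95% on 2 cores
print(round(efficiency(7.6, 8), 1))    # 95.0 -> ~95% on 8 cores
print(round(efficiency(21.4, 24), 1))  # 89.2 -> "just below 90%" on 24 cores
```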

3: Acceleration of the direct sparse solver using GPGPUs

GPGPUs offer exceptional floating point operation speed. With the advent of recent hardware, theoretical double precision floating point operations can be executed at a rate of 500 GFlops. Of course, in order to realize this peak, an algorithm must be embarrassingly parallel, since the tremendous processing speed is largely due to the massive parallelism of the GPGPU hardware. One of the challenges of exploiting the power of GPGPUs in general-purpose FEA codes is that it requires re-writing the code in a new language and adapting the algorithm to maximally utilize the GPGPU hardware. Currently, there are two GPGPU hardware vendors, and each has its own preferred coding language. In order to maximize the benefit of GPGPU performance while minimizing development effort, we chose to apply this technology to the most floating-point-intensive portion of any implicit FEA program: the linear equation solver. With minimal changes to our existing solver, we created an interface for the factorization of individual supernodes in our direct sparse solver. We turned to Acceleware Corporation for the implementation of the GPGPU portion of the project. Their experience with GPGPU acceleration of scientific algorithms was helpful in getting our first implementation up and running quickly. In our current implementation, our GPGPU-accelerated direct solver can greatly reduce the time spent in the solver phase of an FEA analysis for a variety of large models. We have learned that there are a number of factors which must be considered when trying to determine the level of benefit to expect when adding GPGPU compute capability to reduce analysis time. Abaqus provides an out-of-core solver; however, when enough memory is available, the factorization and subsequent backward pass remain in-core and deliver optimal performance.
Once the problem size exceeds the system memory, I/O costs become significant and reduce the overall benefit of GPGPU acceleration. Another factor is the size of the FEA model. The most important measure of size in this case is not the number of degrees of freedom (DoF) in the model, but the number of floating point operations required for factorization. Thus, a 5 million DoF solid element model may be more computationally intensive than a 10 million DoF shell element model. The target we set for performance gain was an overall speedup of 2x in analysis wall-clock time for our benchmark automotive powertrain model, compared to the performance of a 4-core parallel run. The actual results are shown in Figure 6, where this model is identified by its number of solver floating point operations (1.0E+13). The chart is arranged to show how the amount of work in the solver correlates with the performance improvement when

using a GPGPU for compute acceleration. The effectiveness of GPGPU acceleration increases with problem size, up to the point where the factorization can no longer fit in core or an individual supernode does not fit in GPGPU memory.

Figure 6. Effect of GPGPU acceleration on the performance of 4-core parallel runs (GPGPU speedup = 4-core time / 4-core + GPU time, plotted against solver floating point operation count)

Today, it is common for high-performance workstations or compute cluster nodes to have 8 cores. For comparison, the chart in Figure 7 shows results for 8 core + GPGPU vs. 8 core runs for some of the larger test cases. Here, the addition of GPGPU acceleration is again beneficial, but not to the same degree. Increasing the number of cores increases the number of branches of the supernode tree that are solved concurrently. When more than one branch has a supernode eligible for processing on the GPGPU, there is contention for the GPGPU resource. This results in a delay (waiting for the GPGPU to become available) or in processing the supernode on the slower CPU resources.
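The contention just described can be sketched as a non-blocking claim on a single GPU resource with a CPU fallback. This is a hypothetical illustration of the scheduling policy, not Abaqus code; the flop threshold and function names are made up:

```python
import threading

gpu_lock = threading.Lock()        # one GPU shared by all supernode-tree branches
GPU_THRESHOLD = 10_000_000         # made-up flop cutoff for GPU eligibility

def factorize_supernode(flops: int) -> str:
    """Return which device would factorize this supernode: a large supernode
    goes to the GPU only if the GPU is currently free; otherwise (or if the
    supernode is small) it is processed on the slower CPU path."""
    if flops >= GPU_THRESHOLD and gpu_lock.acquire(blocking=False):
        try:
            return "gpu"           # large supernode, GPU was available
        finally:
            gpu_lock.release()
    return "cpu"                   # small supernode, or GPU busy

print(factorize_supernode(50_000_000))  # gpu (nothing else holds the lock)
print(factorize_supernode(1_000))       # cpu
```

With more concurrent branches, the non-blocking `acquire` fails more often, which is exactly why 8-core + GPU gains are smaller than 4-core + GPU gains above.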

Figure 7. Effect of GPGPU acceleration on the performance of 8-core parallel runs (speedup = 8-core time / 8-core + GPU time)

Future developments to further leverage GPGPU acceleration of our direct sparse solver will target deployment on multiple nodes of a compute cluster. Going forward, we hope to find applications for GPGPU compute acceleration outside of our direct sparse solver.

REFERENCES

1. Bajer, A., "Performance Improvement Algorithm for Mode-Based Frequency Response Analysis," SAE Paper No. , 2009.

NOISE, VIBRATION, AND HARSHNESS (NVH) ANALYSIS OF A FULL VEHICLE MODEL

NOISE, VIBRATION, AND HARSHNESS (NVH) ANALYSIS OF A FULL VEHICLE MODEL NOISE, VIBRATION, AND HARSHNESS (NVH) ANALYSIS OF A FULL VEHICLE MODEL SUMMARY This technology brief illustrates typical mode-based noise, vibration, and harshness (NVH) analyses of a full automobile model

More information

High Performance Computing: A Review of Parallel Computing with ANSYS solutions. Efficient and Smart Solutions for Large Models

High Performance Computing: A Review of Parallel Computing with ANSYS solutions. Efficient and Smart Solutions for Large Models High Performance Computing: A Review of Parallel Computing with ANSYS solutions Efficient and Smart Solutions for Large Models 1 Use ANSYS HPC solutions to perform efficient design variations of large

More information

Recommended hardware system configurations for ANSYS users

Recommended hardware system configurations for ANSYS users Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range

More information

Leveraging Windows HPC Server for Cluster Computing with Abaqus FEA

Leveraging Windows HPC Server for Cluster Computing with Abaqus FEA Leveraging Windows HPC Server for Cluster Computing with Abaqus FEA This white paper outlines the benefits of using Windows HPC Server as part of a cluster computing solution for performing realistic simulation.

More information

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing

Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing Microsoft Windows Compute Cluster Server Runs

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Linear Dynamics with Abaqus

Linear Dynamics with Abaqus Linear Dynamics with Abaqus 2016 About this Course Course objectives Upon completion of this course you will be able to: Extract eigenmodes about a certain frequency Determine whether the number of extracted

More information

Accelerating CST MWS Performance with GPU and MPI Computing. CST workshop series

Accelerating CST MWS Performance with GPU and MPI Computing.  CST workshop series Accelerating CST MWS Performance with GPU and MPI Computing www.cst.com CST workshop series 2010 1 Hardware Based Acceleration Techniques - Overview - Multithreading GPU Computing Distributed Computing

More information

The Value of High-Performance Computing for Simulation

The Value of High-Performance Computing for Simulation White Paper The Value of High-Performance Computing for Simulation High-performance computing (HPC) is an enormous part of the present and future of engineering simulation. HPC allows best-in-class companies

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Best practices for efficient HPC performance with large models

Best practices for efficient HPC performance with large models Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,

More information

Clusters: Mainstream Technology for CAE

Clusters: Mainstream Technology for CAE Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux

More information

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

More information

Scaling Study of LS-DYNA MPP on High Performance Servers

Scaling Study of LS-DYNA MPP on High Performance Servers Scaling Study of LS-DYNA MPP on High Performance Servers Youn-Seo Roh Sun Microsystems, Inc. 901 San Antonio Rd, MS MPK24-201 Palo Alto, CA 94303 USA youn-seo.roh@sun.com 17-25 ABSTRACT With LS-DYNA MPP,

More information

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

More information

Parallel Computing with MATLAB

Parallel Computing with MATLAB Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best

More information

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

More information

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability

More information

Interactive comment on A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc by W. He et al.

Interactive comment on A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc by W. He et al. Geosci. Model Dev. Discuss., 8, C1166 C1176, 2015 www.geosci-model-dev-discuss.net/8/c1166/2015/ Author(s) 2015. This work is distributed under the Creative Commons Attribute 3.0 License. Geoscientific

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

HPC Deployment of OpenFOAM in an Industrial Setting

HPC Deployment of OpenFOAM in an Industrial Setting HPC Deployment of OpenFOAM in an Industrial Setting Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd, United Kingdom PRACE Seminar: Industrial Usage of HPC Stockholm, Sweden, 28-29 March 2011 HPC Deployment

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Abaqus Performance Benchmark and Profiling. March 2015

Abaqus Performance Benchmark and Profiling. March 2015 Abaqus 6.14-2 Performance Benchmark and Profiling March 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information

More information

Advanced Core Operating System (ACOS): Experience the Performance

Advanced Core Operating System (ACOS): Experience the Performance WHITE PAPER Advanced Core Operating System (ACOS): Experience the Performance Table of Contents Trends Affecting Application Networking...3 The Era of Multicore...3 Multicore System Design Challenges...3

More information

GPGPU accelerated Computational Fluid Dynamics
Technische Universität Braunschweig, Carl-Friedrich Gauß Faculty. 5th GACM Colloquium on Computational Mechanics, Hamburg…

Why Computers Are Getting Slower (and what we can do about it)
Rik van Riel, Sr. Software Engineer, Red Hat. The traditional approach: better performance…

IBM Platform Computing Cloud Service
Ready-to-use Platform LSF & Symphony clusters in the SoftLayer cloud. February 25, 2014. Agenda: mapping client needs to cloud technologies; addressing your pain…

Overview of HPC Resources at Vanderbilt
Will French, Senior Application Developer and Research Computing Liaison, Advanced Computing Center for Research and Education, June 10, 2015…

GPU System Architecture
Alan Gray, EPCC, The University of Edinburgh. Outline: why do we want/need accelerators such as GPUs? GPU-CPU comparison; architectural reasons for GPU performance advantages; GPU accelerated systems…

SGI HPC Systems Help Fuel Manufacturing Rebirth
Contents: 1.0 Introduction; 2.0 Ongoing Challenges; 3.0 Meeting the Challenge; 4.0 SGI Solution Environment and CAE Applications…

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
G. Molnárka, N. Varjasi, Széchenyi István University, Győr, Hungary. Acta Technica Jaurinensis, Vol. 3, No. 1, 2010…

Benchmark Tests on ANSYS Parallel Processing Technology
Kentaro Suzuki, ANSYS JAPAN LTD. Abstract: it is extremely important for manufacturing industries to reduce their design process period in order to…

GPUs for Scientific Computing
Mike Giles (mike.giles@maths.ox.ac.uk), Oxford-Man Institute of Quantitative Finance, Oxford University Mathematical Institute…

HSL and its out-of-core solver
Jennifer A. Scott (j.a.scott@rl.ac.uk), Prague, November 2006. Sparse systems: we wish to solve Ax = b, where A is LARGE. Informal definition: A is sparse if many…

Accelerating CFD using OpenFOAM with GPUs
Saeed Iqbal and Kevin Tubbs. The OpenFOAM CFD Toolbox is a free, open-source CFD software package produced by OpenCFD Ltd. Its user base represents a wide…

Experiments in Unstructured Mesh Finite Element CFD Using CUDA
Graham Markall, Software Performance, Imperial College London (grm08@doc.ic.ac.uk). Joint work with David Ham and…

Shattering the 1U Server Performance Record
Supermicro and NVIDIA recently announced a new class of servers that combines massively parallel GPUs with multi-core CPUs in a single server system. This unique…

Large-Scale Reservoir Simulation and Big Data Visualization
Dr. Zhangxing John Chen, NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair, Alberta Innovates Technology Future (icore)…

High Performance Computing, Course Notes 2007-2008: HPC Fundamentals
Introduction: what is High Performance Computing (HPC)? Difficult to define; it is a moving target. In the late 1980s, a supercomputer performs…

NAPA/MAESTRO Interface: Reducing the Level of Effort for Ship Structural Design
12/3/2010. Contents: Introduction; Why Create a NAPA/MAESTRO Interface; Level of Effort Comparison for Two…

Recent Advances in HPC for Structural Mechanics Simulations
Trends in engineering driving demand for HPC: increase product performance and integrity in less time; consider more design variants; find the…

Turbomachinery CFD on many-core platforms: experiences and strategies
Graham Pullan, Whittle Laboratory, Department of Engineering, University of Cambridge. MUSAF Colloquium, CERFACS, Toulouse, September 27-29…

Performance Guide
ANSYS, Inc. Release 12.1, Southpointe, November 2009. 275 Technology Drive, Canonsburg, PA 15317. ANSYS, Inc. is certified to ISO 9001:2008. ansysinfo@ansys.com, http://www.ansys.com, (T) 724-746-3304, (F) 724-514-9494…

Introduction to High Performance Cluster Computing: Cluster Training for UCL, Part 1
What is HPC? HPC = High Performance Computing, which includes supercomputing. HPCC = High Performance Cluster Computing. Note: these…

Speedup of Analyses and Optimizations with OptiStruct
Start: 11:00. Innovation Intelligence. Kristian Holm (12.07.2013). HyperWorks Best Practice, www.altairhyperworks.de/bestpractice. Agenda: the computing…

Back to Elements - Tetrahedra vs. Hexahedra
Erke Wang, Thomas Nelson, Rainer Rauch, CAD-FEM GmbH, Munich, Germany. Abstract: this paper presents some analytical results and some test results for different…

The Application of Process Automation and Optimisation in the Rapid Development of New Passenger Vehicles at SAIC Motor
Dave Husson, Vehicle CAE Manager, SAIC Motor UK Technical Centre, Lowhill Lane, Longbridge…

High Performance Matrix Inversion with Several GPUs
Pablo Ezzatti, Enrique S. Quintana-Ortí and Alfredo Remón. Centro de Cálculo, Instituto de Computación, Univ. de la República…

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov. Extreme Computing Research Center…

Overlapping Data Transfer With Application Execution on Clusters
Karen L. Reid and Michael Stumm (reid@cs.toronto.edu, stumm@eecg.toronto.edu). Department of Computer Science, Department of Electrical and Computer…

Intel Solid-State Drives Increase Productivity of Product Design and Simulation
White paper. A study of how Intel Solid-State…

ANSYS Solvers: Usage and Performance
Gene Poole, ANSYS Solvers Team, April 2002. ANSYS equation solvers: usage and guidelines. Outline: basic solver descriptions; direct and iterative methods; why so many choices?…

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Kenneth B. Kent, University of New Brunswick, Faculty of Computer Science, Fredericton, New Brunswick, Canada (ken@unb.ca); Micaela Serra…

Speeding up MATLAB Applications
Mannheim, February 19, 2014. Michael Glaßer, Dipl.-Ing., Application Engineer, The MathWorks, Inc. Your MathWorks team today: Andreas Himmeldorf, Senior Team Leader Educational…

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud
White Paper 021313-3: A Software Framework for Parallel Programming. Abstract: programming for multicore,…

Abaqus Technology Brief: Automobile Roof Crush Analysis with Abaqus
TB-06-RCA-1, revised April 2007. Summary: the National Highway Traffic Safety Administration (NHTSA) mandates the use of certain test procedures…

Optimizing Shared Resource Contention in HPC Clusters
Sergey Blagodurov and Alexandra Fedorova, Simon Fraser University. Abstract: contention for shared resources in HPC clusters occurs…

Icepak High-Performance Computing at Rockwell Automation: Benefits and Benchmarks
Garron K. Morris, Senior Project Thermal Engineer (gkmorris@ra.rockwell.com), Standard Drives Division; Bruce W. Weiss, Principal…

YALES2 porting on the Xeon Phi: early results
Othman Bouizi, Ghislain Lartigue. Innovation and Pathfinding Architecture Group in Europe, Exascale Lab, Paris. CRIHAN half-day on intensive computing, June 16…

Accelerating Automotive Design with InfiniBand
February 2009. Contents: 1.0 Abstract; 1.1 Introduction; 1.2 Automotive Crash Simulations; 1.3 Multi-core Cluster Environments…

HPC with Multicore and GPUs
Stan Tomov, Electrical Engineering and Computer Science Department, University of Tennessee, Knoxville. CS 594 lecture notes, March 4, 2015. Outline: introduction - hardware…

Heat Transfer and Thermal-Stress Analysis with Abaqus
2016. Course objectives: upon completion of this course you will be able to perform steady-state and transient heat transfer simulations…

Calculation of Eigenmodes in Superconducting Cavities
W. Ackermann, C. Liu, W.F.O. Müller, T. Weiland. Institut für Theorie Elektromagnetischer Felder, Technische Universität Darmstadt. Status meeting, December…

Lecture 11: Multi-Core and GPU
Zebo Peng, IDA, LiTH. Multi-core computers, multithreading, GPUs, general-purpose GPUs. Multi-core system: integration of multiple processor cores on a single chip to provide…

MEng, BSc Applied Computer Science
School of Computing, Faculty of Engineering. Year 1, COMP1212 Computer Processor: effective programming depends on understanding not only how to give a machine instructions…

Building a Top500-class Supercomputing Cluster at LNS-BUAP
Dr. José Luis Ricardo Chávez, Dr. Humberto Salazar Ibargüen, Dr. Enrique Varela Carlos. Laboratorio Nacional de Supercómputo, Benemérita Universidad…

GPGPU acceleration in OpenFOAM
Thorsten Grahs, Institute of Scientific Computing/move-csc, Braunschweig Institute of Technology, Carl-Friedrich Gauß Faculty. Northern Germany OpenFOAM user meeting, 2nd October…

LS-DYNA Performance Benchmarks and Profiling
January 2009. Note: the following research was performed under the HPC Advisory Council activities. AMD, Dell, Mellanox, HPC Advisory Council Cluster Center…

The Methodology of Application Development for Hybrid Architectures
Vladimir Orekhov, Alexander Bogdanov and Vladimir Gaiduchok. Computer Technology and Application 4 (2013) 543-547, David Publishing…

Identify a problem; review approaches to the problem; propose a novel approach to the problem; define, design, and prototype an implementation to evaluate your approach; write a technical report; present your results; write a workshop/conference paper (optional). Could be a real system, simulation and/or theoretical…

The Lattice Project: A Multi-Model Grid Computing System
Center for Bioinformatics and Computational Biology, University of Maryland. Parallel computing: a form of computation in which…

HyperQ Storage Tiering White Paper
An Easy Way to Deal with Data Growth. Parsec Labs, LLC, 7101 Northland Circle North, Suite 105, Brooklyn Park, MN 55428, USA. 1-763-219-8811, www.parseclabs.com, info@parseclabs.com…

Automotive Brake Squeal Analysis Using a Complex Modes Approach
Abaqus Technology Brief TB-05-BRAKE-1, revised April 2007. Summary: a methodology to study friction-induced squeal in a complete automotive…

Iterate More, Innovate Faster: Working Differently with Quad-Core Intel Xeon Processor-Based HP Workstations and Dassault Systèmes Solutions
White paper, Quad-Core Intel Xeon Processor. Innovative Intel Quad-Core…

Simulation Platform Overview
Build, compute, and analyze simulations on demand (www.rescale.com). Case studies: companies in the aerospace and automotive industries use Rescale to run faster simulations…

Bull Extreme Computing
Bull, 2011. Contents: HPC overview; cluster overview; FLOPS. HPC overview: Ares, Gerardo, HPC Team. HPC concepts: HPC = High Performance…

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
ICPP 6th International Workshop on Parallel Programming Models and Systems Software for High-End Computing, October 1, 2013, Lyon, France…

Parallel Programming Survey
Christian Terboven, 02.09.2014, Aachen, Germany. Version 2.3, IT Center der RWTH Aachen University. Agenda overview: processor microarchitecture; shared memory…

Powerful Data Warehousing Performance with IBM Red Brick Warehouse
Information management software solutions white paper, April 2004. Contents: data warehousing for the masses; single-step load…

Balancing Manufacturability and Optimal Structural Performance for Laminate Composites through a Genetic Algorithm
Mike Stephens, Senior Composites Stress Engineer, Airbus UK Composite Research, Golf Course…

Infrastructure Matters: POWER8 vs. Xeon x86
Executive summary: this report compares IBM's new POWER8-based scale-out Power System to Intel E5 v2 x86-based scale-out systems. A follow-on report…

AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS
M. Storti, L. Dalcín, R. Paz. Centro Internacional de Métodos Numéricos en Ingeniería (CIMEC), INTEC (CONICET-UNL), Santa Fe, Argentina…

Tableau Server 7.0 scalability
February 2012. Executive summary: in January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different…

Cluster Computing at HRI
J.S. Bagla, Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019 (jasjeet@mri.ernet.in). Introduction and some local history: high performance computing…

Chapter 1: Introduction, 1.1 Motivation
The automotive, aerospace and building sectors have traditionally used simulation programs to improve their products or services, focusing their computations on a few major physical…

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite. Peter Strazdins (Research School of Computer Science)…

Introduction to the Siemens PLM End to End Solution for Composites
Siemens PLM is dedicated to the future of lightweight engineering. Bringing…

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Learn to take advantage of powerful GPU-based accelerator technology today. 2006-2013 Nvidia GPUs, Intel CPUs. Contents: acronyms and terminology…

CHAPTER 1 INTRODUCTION
1.1 Motivation of research: multicore processors have two or more execution cores (processors) implemented on a single chip, having their own set of execution and architectural resources…

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
P. E. Vincent, Department of Aeronautics, Imperial College London, 25th March 2014. Overview: motivation; flux reconstruction; many-core…

Data Centric Systems (DCS)
Architecture and solutions for high performance computing, big data and high performance analytics…

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White paper: a benchmark report, August 2011. Background: Objectivity/DB uses a powerful distributed processing architecture to manage…

PREDICTION OF MACHINE TOOL SPINDLE'S DYNAMICS BASED ON A THERMO-MECHANICAL MODEL
P. Kolar, T. Holkup. Research Center for Manufacturing Technology, Faculty of Mechanical Engineering, CTU in Prague, Czech…