Hierarchically Parallel FE Software for Assembly Structures : FrontISTR - Parallel Performance Evaluation and Its Industrial Applications
1 CO-DESIGN 2012, October 23-25, 2012, Peking University, Beijing. Hierarchically Parallel FE Software for Assembly Structures: FrontISTR - Parallel Performance Evaluation and Its Industrial Applications. Hiroshi Okuda, The University of Tokyo, ouda@.u-toyo.ac.jp
2 Outline. Background: Towards peta/exascale computing; Necessary advances in programming models and software. FrontISTR: as an HPC tool for industry. Large-grain Parallelism: Assembly structures under hierarchical gridding. Small-grain Parallelism: Blocking with padding for multicore CPU (and GPUs). Summary.
3 Necessary advances in programming models and software. Fast SpMV for unstructured grids. A programming model with at most two nested levels, i.e. message passing and loop decomposition. Keep consistency and stability. The B/F required by the program should be consistent with the H/W. Automatic generation of compiler directives that take dependencies into account, after trial runs. Middleware (a mid-level interface) bridging the application and the BLAS-based numerical libraries. Uncertainty or risk in the physical model. Hierarchical consistency between the H/W configuration and the physical modeling, particularly in engineering fields.
4 FrontISTR built on HEC-MW. FrontISTR: Nonlinear analysis functions: Hyper-elasticity/Thermal-elastic-plastic/Visco-elastic/Creep, Combined hardening rule; Total/Updated Lagrangian; Finite slip contact, Friction. HEC-MW: Advanced features of parallel FEM: Hierarchical mesh refinement; Assembly structure; Up to O(10^5) nodes; Portability: from MPP to PC, CAE cloud. Nonlinear structural analysis functions have been deployed on a parallel FEM basis: HEC-MW.
5 Structural Analysis Functions Supported in FrontISTR
Function: Supported contents
Static, Material: Elastic/Hyper-elasticity/Thermal-Elastic-Plastic/Visco-Elastic/Creep, Combined hardening rule
Static, Geometry: Total Lagrangian/Updated Lagrangian
Static, Boundary: Augmented Lagrangian/Lagrangian multiplier method, Finite slip contact, Friction
Dynamic: Linear/Nonlinear, Explicit/Implicit
Eigenvalue: Lanczos method (considering differential stiffness)
Heat: Steady/Non-steady (implicit), Nonlinear
6 Front ISTR
7 Thermal-Elastic-Plastic Analysis of Welding Residual Stress Joint research with IHI Heat source transfer along a welding line Residual stress induced by plastic deformation Temperature
8 Cupping press simulation / Elasto-plasticity and friction on contact faces. A punch is pressed into a blank, which is held between a die and a blank holder. The blank is formed into a cylindrical shape as the punch advances.
9
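10 Friction of power transmission belt. Joint research with Mitsuboshi Belt. Only half of the width is modeled, owing to symmetry. Figure labels: V belt; pulley surface (rigid body); load torque; active (driving) pulley; rotation; rubber; passive (driven) pulley; cord; canvas; axial load. Need for large-scale analysis.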
11 Structural integrity analysis of electrical devices. Courtesy of Advanced Simulation Technology of Mechanics R&D Co., Ltd. (先端力学シミュレーション研究所).
12 Contact force analysis of brake disk
13 [Figure with panels A, B, C]
14 Large-grain Parallelism: Parallelization based on domain decomposition. Mesh partitioning (domain decomposition). [Figure: each of the four domains has its own Local Data, FEM Code and Solver Subsystem; the Solver Subsystems communicate via MPI.]
15 Data structure for assembly structures with parallel and hierarchical gridding. A) Partitioning (MPI ranks); B) Hierarchical level; C) Assembly model. [Figure: Assembly_1 and Assembly_2 at Level_1, coupled by MPC, are partitioned (A) and refined (B) into Assembly_1 and Assembly_2 at Level_2, again coupled by MPC.]
16 Iterative solvers with MPC* preconditioning
Solver       CG itrs.   CPU (sec)   CPU/CG itr. (sec)
MPCCG        14,203     3,
Penalty+CG   171,354    40,
[Figure: Mises stress]
Algorithm:
$r = T^T f - T^T K T u$, $p = r$
For $k = 0, 1, \dots$
  $\alpha = (r, r) / (p, T^T K T p)$
  $u = u + \alpha p$
  $r' = r - \alpha T^T K T p$
  Check convergence
  $\beta = (r', r') / (r, r)$
  $p = r' + \beta p$, $r = r'$
end
$u = T u$
*) MPC: multi-point constraint. T: sparse MPC matrix.
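The MPC-CG iteration on this slide can be sketched in a few lines of Python. This is a minimal illustration, not FrontISTR's implementation: it assumes dense list-of-lists storage, starts from a zero reduced vector, and the helper names (`matvec`, `transpose`, `dot`, `mpc_cg`) are hypothetical. Plain CG is applied to the reduced operator T^T K T, and the full displacement is recovered as u = T u at the end.

```python
# Sketch: CG on the MPC-reduced system (T^T K T) u_r = T^T f, then u = T u_r.
# Dense lists keep the example short; a production code would use sparse storage.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mpc_cg(K, T, f, eps=1e-10, max_iter=1000):
    """Solve K u = f under the constraint u = T u_r; returns the full vector u."""
    Tt = transpose(T)

    def reduced_op(p):                      # p -> T^T K T p
        return matvec(Tt, matvec(K, matvec(T, p)))

    u = [0.0] * len(T[0])                   # reduced (independent) unknowns
    r = matvec(Tt, f)                       # r = T^T f - T^T K T u with u = 0
    p = r[:]
    rr = dot(r, r)
    tol2 = (eps ** 2) * rr                  # relative residual tolerance, squared
    for _ in range(max_iter):
        if rr <= tol2:                      # check convergence
            break
        q = reduced_op(p)
        alpha = rr / dot(p, q)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return matvec(T, u)                     # recover constrained DOFs: u = T u_r

# Toy example: 3 DOFs, with DOF 2 tied to DOF 1 by the constraint matrix T.
K = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
T = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
f = [1.0, 2.0, 3.0]
u = mpc_cg(K, T, f)
```

Because the constraint is built into the operator, the tied DOFs are equal to working precision in the returned vector, whereas a penalty formulation satisfies the constraint only approximately.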
17 FrontISTR Assembled Structure: Piping composed of many parts. 5 pipes & 32 bolts; 10 mm 2nd-order tet mesh; 3,093,453 elements; 5,433,029 nodes; Num. of MPC: 70,166; fixed. A piping system composed of many parts is easily handled. [Figure: Mises stress]
18 Strong scaling with Refiner - Static linear analysis of machine part - 2nd-order tetra element - PCG (eps=10^-6). FX10@UT: SPARC64 IXfx (1.848 GHz), 1 CPU (16 cores)/node
19 Performance of 1 node is a crucial factor. Access of innermost loop. Simulator 2. Intra-node parallel: - Remove dependency by ordering - OpenMP and/or vectorization
20 Blocking with padding for multicore CPU and GPUs. SpMV in iterative solvers is crucial for unstructured grids. For improving the B/F ratio: Blocking, Padding, ELL+CSR. Breakdown of CPU time in CG operations: SpMV, AXPY & AYPX, DOT. Blocked CSR. HYB format = ELL + CSR4
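As an illustration of blocking with padding, the following Python sketch converts a CSR matrix into r x c blocks, filling the unused block entries with zeros, and then performs SpMV block by block. It is a plain reference version under assumed conventions (row-major flat blocks, hypothetical function names `csr_to_bcsr` and `bcsr_spmv`), not the tuned kernel discussed in the slides.

```python
# Sketch: blocked CSR (BCSR) with zero padding, plus a reference SpMV.

def csr_to_bcsr(n_rows, rowptr, colidx, values, r, c):
    """Group CSR entries into dense r x c blocks; missing entries become 0.0."""
    n_brows = (n_rows + r - 1) // r
    b_rowptr, b_colidx, b_values = [0], [], []
    for bi in range(n_brows):
        blocks = {}                                  # block column -> flat block
        for i in range(bi * r, min((bi + 1) * r, n_rows)):
            for k in range(rowptr[i], rowptr[i + 1]):
                bj, jj = divmod(colidx[k], c)
                blk = blocks.setdefault(bj, [0.0] * (r * c))
                blk[(i - bi * r) * c + jj] = values[k]
        for bj in sorted(blocks):
            b_colidx.append(bj)
            b_values.append(blocks[bj])
        b_rowptr.append(len(b_colidx))
    return b_rowptr, b_colidx, b_values

def bcsr_spmv(n_rows, n_cols, b_rowptr, b_colidx, b_values, r, c, x):
    """y = A x for A stored as BCSR; one column index is read per block."""
    y = [0.0] * n_rows
    for bi in range(len(b_rowptr) - 1):
        for k in range(b_rowptr[bi], b_rowptr[bi + 1]):
            blk, j0 = b_values[k], b_colidx[k] * c
            for ii in range(r):
                i = bi * r + ii
                if i >= n_rows:                      # padded rows past the matrix edge
                    break
                for jj in range(c):
                    j = j0 + jj
                    if j < n_cols:                   # padded columns past the edge
                        y[i] += blk[ii * c + jj] * x[j]
    return y

# 4 x 4 example with 7 non-zeros, blocked 2 x 2.
rowptr = [0, 2, 3, 5, 7]
colidx = [0, 1, 1, 2, 3, 0, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
b_rowptr, b_colidx, b_values = csr_to_bcsr(4, rowptr, colidx, values, 2, 2)
fill = sum(len(b) for b in b_values) - len(values)   # zeros added by padding
y = bcsr_spmv(4, 4, b_rowptr, b_colidx, b_values, 2, 2, [1.0, 2.0, 3.0, 4.0])
```

The trade-off is visible in `fill`: blocking stores one column index per block instead of one per non-zero, at the cost of storing and multiplying the padded zeros.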
21 Flop/Byte. SpMV with CSR: Flop/Byte = 1/{6*(1+m/nnz)} = 0.08~. SpMV with BCSR: Flop/Byte = 1/{4*(1+fill/nnz) + 2/c + 2m/nnz*(2+1/r)} = 0.18~. nnz: number of non-zero components; m: number of columns; r, c: block size; fill: number of zeros padded for blocking.
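The two Flop/Byte expressions can be evaluated directly. The sketch below simply transcribes the slide's formulas into Python; the function names and the choice to pass the ratios m/nnz and fill/nnz as arguments are assumptions made for illustration.

```python
# Sketch: Flop/Byte of SpMV for CSR and BCSR, transcribed from the slide's formulas.

def flop_per_byte_csr(m_over_nnz):
    """Flop/Byte = 1 / {6 * (1 + m/nnz)}."""
    return 1.0 / (6.0 * (1.0 + m_over_nnz))

def flop_per_byte_bcsr(m_over_nnz, fill_over_nnz, r, c):
    """Flop/Byte = 1 / {4*(1 + fill/nnz) + 2/c + (2m/nnz)*(2 + 1/r)}."""
    return 1.0 / (4.0 * (1.0 + fill_over_nnz)
                  + 2.0 / c
                  + 2.0 * m_over_nnz * (2.0 + 1.0 / r))

# One non-zero per column (m/nnz = 1) versus many non-zeros per column (m/nnz -> 0).
csr_low = flop_per_byte_csr(1.0)
csr_high = flop_per_byte_csr(0.0)
# 3 x 3 blocks with no padding, in the limit of many non-zeros per column.
bcsr_high = flop_per_byte_bcsr(0.0, 0.0, 3, 3)
```

With m/nnz between 1 and 0 the CSR value runs from about 0.08 to about 0.17, and 3 x 3 blocks without padding give about 0.21 in the dense-row limit. Padding (fill/nnz > 0) lowers the BCSR value, so blocking only pays off when the blocks are reasonably full.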
22 Acceleration of SpMV Computation. Parallelization: rows are distributed among threads. Load balancing: rows are reallocated to balance the loads. Blocking: the matrix format is crucial. CSR (Compressed Sparse Row): Flops/Byte = 0.08~. BCSR (Blocked CSR): Flops/Byte = 0.18~. [Figure: the value, colindx and rowptr arrays divided among Thread 1, Thread 2 and Thread 3, unbalanced vs. balanced.]
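A simple way to picture the load-balancing step is to cut the rows into contiguous ranges holding roughly equal numbers of non-zeros instead of equal numbers of rows. The Python sketch below does this with the CSR `rowptr` array; the nearest-boundary heuristic and the function names are assumptions for illustration, and Python threads are used only to show the decomposition (they do not give a real speed-up for this loop).

```python
# Sketch: distribute CSR rows among threads so each gets a similar number of non-zeros.
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def balanced_row_ranges(rowptr, n_threads):
    """Return contiguous (start, stop) row ranges with roughly equal nnz."""
    n_rows, nnz = len(rowptr) - 1, rowptr[-1]
    bounds = [0]
    for t in range(1, n_threads):
        target = nnz * t / n_threads
        i = bisect_left(rowptr, target)
        # choose whichever neighbouring row boundary is closer to the target
        if i > 0 and target - rowptr[i - 1] < rowptr[i] - target:
            i -= 1
        bounds.append(min(max(i, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return list(zip(bounds[:-1], bounds[1:]))

def csr_spmv_rows(rowptr, colidx, values, x, y, start, stop):
    """Compute y[start:stop] of y = A x; ranges are disjoint, so writes do not clash."""
    for i in range(start, stop):
        s = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            s += values[k] * x[colidx[k]]
        y[i] = s

def parallel_spmv(rowptr, colidx, values, x, n_threads):
    y = [0.0] * (len(rowptr) - 1)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(csr_spmv_rows, rowptr, colidx, values, x, y, a, b)
                   for a, b in balanced_row_ranges(rowptr, n_threads)]
        for fut in futures:
            fut.result()                      # propagate any worker exception
    return y

# One heavy row (8 non-zeros) followed by four light rows (1 non-zero each).
ranges = balanced_row_ranges([0, 8, 9, 10, 11, 12], 2)
```

For the unbalanced example, the heavy row is assigned to one thread and the four light rows to the other, whereas an equal-row split would give the first thread at least 9 of the 12 non-zeros.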
23 Performance Test (1/3): Load Balancing on Nehalem (Core i7 975). SpMV repeated 10,000 times on unbalanced matrices from the library [2]. Left: without load balancing. Right: with load balancing. [2] T. A. Davis. University of Florida sparse matrix collection, 1997.
24 Performance Test (2/3): Parallelization and Matrix Format. Performance of SpMV on Nehalem (Core i7 975), CSR / BCSR format. [Plot: performance [MFLOPS] vs. matrix #]
25 Performance Test (3/3): Overall CG Solver. CG solver's performance relative to single-threaded CSR on Nehalem (Core i7 975). [Plot: performance over CSR single thread vs. matrix #]
26 Performance Model (1/2). The K computer's roofline model, based on Williams's model [1]. Sustained performance can be predicted from the application's Flop/Byte ratio. Estimated sustained to-peak ratio: 8/128 = 6.25%. [1] S. Williams. Auto-tuning Performance on Multicore Computers. Univ. of California, 2008.
27 Performance Model (2/2)
SpMV with CSR: Flop/Byte = 0.08~0.16, Byte/Flop = 6.25~12.5
SpMV with BCSR: Flop/Byte = 0.18~0.21, Byte/Flop = 4.76~5.56
Machine   Node performance   BW (catalog)   BW (STREAM)   B/F
K         128 Gflops         64 GB/s        46.6 GB/s     0.36
FX        Gflops             85 GB/s        64 GB/s       0.27
B/F of FISTR and to-peak ratio:
K:  SpMV with CSR 2.9~5.8 %, SpMV with BCSR 4.9~7.6 %
FX: SpMV with CSR 2.2~4.3 %, SpMV with BCSR 3.7~5.7 %
Measured performance by profiler on FX: %
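The to-peak percentages for CSR follow from a one-line roofline estimate: in the bandwidth-bound regime, the sustained fraction of peak is the machine's B/F (STREAM bandwidth divided by peak flops) multiplied by the kernel's Flop/Byte. The Python sketch below is an illustration under that assumption; the function name is hypothetical and the B/F values are the rounded ones listed for K and FX.

```python
# Sketch: roofline-style estimate of the sustained-to-peak ratio of a kernel.

def to_peak_fraction(machine_bf, flop_per_byte):
    """min(1, machine Byte/Flop * kernel Flop/Byte); 1.0 means compute-bound."""
    return min(1.0, machine_bf * flop_per_byte)

machines = {"K": 0.36, "FX": 0.27}            # B/F from the STREAM bandwidth
kernels = {"CSR": (0.08, 0.16), "BCSR upper": (0.21, 0.21)}

for name, bf in machines.items():
    for kernel, (lo, hi) in kernels.items():
        print(f"{name} {kernel}: {100 * to_peak_fraction(bf, lo):.1f}"
              f" ~ {100 * to_peak_fraction(bf, hi):.1f} %")
```

With these inputs the estimate gives the CSR ranges and the upper BCSR bounds quoted for both machines, which shows how directly the achievable fraction of peak is tied to the memory bandwidth rather than to the floating-point rate.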
28 Scalability up to O(10^5) cores under the memory wall. Hierarchical approaches: - Memory: register, cache, main memory - Granularity: threads among cores, message passing among nodes - Algorithm (information transfer): near field, far field; e.g. iterative solvers with multi-grid preconditioning, FETI with balancing preconditioning, fast multipole method, homogenization method, zooming methods, Saint-Venant's principle. If algorithms are made multi-leveled, an iterative process is generally introduced. A magic number, which controls the convergence and/or the accuracy, then appears. With non-linearity, different solutions may be found at each hierarchical level.
29 Summary. A parallel structural analysis system for tackling an assembled structure as a whole is being developed. A hierarchical gridding strategy is used to increase the problem size and enhance the accuracy. For intra-node performance, multithreading that takes blocking/padding into account has been explored on multicore CPUs and GPUs (currently on a small number of nodes). Future work: Intra-node ILU preconditioning; Hiding data translation time; Performance model.
30 FrontISTR. Mesh size = 0.1 mm
More informationDistributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
More informationANALYTICAL AND EXPERIMENTAL EVALUATION OF SPRING BACK EFFECTS IN A TYPICAL COLD ROLLED SHEET
International Journal of Mechanical Engineering and Technology (IJMET) Volume 7, Issue 1, Jan-Feb 2016, pp. 119-130, Article ID: IJMET_07_01_013 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=7&itype=1
More informationProgramming Languages for Large Scale Parallel Computing. Marc Snir
Programming Languages for Large Scale Parallel Computing Marc Snir Focus Very large scale computing (>> 1K nodes) Performance is key issue Parallelism, load balancing, locality and communication are algorithmic
More information~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
More informationLBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
More informationPerformance Evaluation of Amazon EC2 for NASA HPC Applications!
National Aeronautics and Space Administration Performance Evaluation of Amazon EC2 for NASA HPC Applications! Piyush Mehrotra!! J. Djomehri, S. Heistand, R. Hood, H. Jin, A. Lazanoff,! S. Saini, R. Biswas!
More informationParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008
ParFUM: A Parallel Framework for Unstructured Meshes Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 What is ParFUM? A framework for writing parallel finite element
More informationGPU Acceleration of the SENSEI CFD Code Suite
GPU Acceleration of the SENSEI CFD Code Suite Chris Roy, Brent Pickering, Chip Jackson, Joe Derlaga, Xiao Xu Aerospace and Ocean Engineering Primary Collaborators: Tom Scogland, Wu Feng (Computer Science)
More information