Parallel implementation of the deflated GMRES in the PETSc package
|
|
|
- Sophie Porter
- 9 years ago
- Views:
Transcription
1 Work done while visiting the (10/26-11/24) 1/17 Parallel implementation of the deflated in the package in WAKAM 1, Joint work with Jocelyne ERHEL 1, William D. GROPP 2 1 INRIA Rennes, 2 NCSA-University of Illinois Third Workshop of the INRIA-Illinois Joint Laboratory for Petascale Computation, NCSA, November 22-24, 2010
2 Outline in Practical 4 Early results 2/17
3 General purpose Linear system The core computation of many scientific simulations is to solve Ax = b we restrict to A R n n nonsymmetric, x,b R n The method [Saad and Schultz, 86] in From x 0, iteratively build a sequence of approximate solutions x 1,... x k... at step k, x k x 0 + K k s.t. r k = b Ax k AK k K k is a Krylov subspace From the Arnoldi process : AV k = V k+1 H k and x k = x 0 + V k y k From the minimal residual condition : y k solves min βe 1 H k y k The solution is found in at most n iterations. 3/17
4 (m) Computational and memory requirements grow with k. Restarted (m) At step m, set x 0 x m, and restart V m and H m are discarded difficult to predict the convergence behaviour. The iterative process can stall as well!! in An example Matrix: CASE 04 Field : Fluid dynamics Origin: 2D linear cascade turbine Source : FLUOREM Matrix Collection Rel. Res. Norm CASE_04 n=7,980 nz=430,909 (32) iterations 4/17
5 and Eigenvalues Rate of convergence depends on the spectral distribution of A Removing or deflating eigenvalues (usually the smallest ones) improve the convergence rate Deflation occurs autommatically in the full version of when the subspace is large enough. In the restarted version, deflate an eigenvalue add the corresponding eigenvector [Morgan, J. Sci. Stat. Comput. 02] in In this work Deflation occurs by using a preconditioner corresponding to the invariant subspace associated to the selected eigenvalues [Erhel et al, JCAM, 1996; Burrage et al, NLAA, 1998] U = [u 1...u k ], T = U T AU M 1 I n + U( λ n T 1 I k )U T, 5/17
6 with right preconditioning in Solve AM 1 (Mx) = b for x, Algorithm: Input (m, k)[erhel et al, JCAM, 1996] Set x 0 ; M = I; U = [] 1 Arnoldi process on AM 1 to get V m and H m. 2 x m x 0 + V m y m, y m solution of min βe 1 H m y m 3 If no convergence 1 Find k Schur vectors S k of H m 2 Orthogonalize X = V ms r against U 3 Increase U by X and Compute T = U T AU 4 Set M 1 I n + U( λ n T 1 I rj )U T 5 Set x 0 x m and Restart at step 1 6/17
7 Main observation induces extra cost to compute and apply the preconditioner Should be applied only if necessary ; for instance, to prevent stagnation or insufficient reduction in the residual norm : Detect stagnation at the end of a cycle and switch to deflation in How to detect stagnation in (m)?? Strictly : Rate of convergence r m / r 0 > τ, 0 < τ 1 Pratically : should have a more realistic experimental test (m) is declared to have stagnated if at the averate rate of progress over the last restart cycle of m steps, the residual norm tolerance cannot be met in some large multiple of the remaining number of steps allowed (the number of steps permitted is bounded by itmax) [Sosonkina et al., NLAA98]: 7/17
8 Pratical implementation Outline of (D(m, l, rmax)) choose x 0,tol, m, itmax,k Set B A M 1 where M 1 is any external preconditioner r 0 = b Bx 0 ; M = I; k = 0; it = 0; l = 0; while ( r 0 > tol) Run a cycle of m max. iterations with a minimization step to get x m and r m it+ = m; in If ( r m > tol and it < itmax) then test = m log(tol/ r m )/log( r m / r 0 ); If (test > smv * (itmax-it) ) then Estimate k smallest eigenvalues of BM 1 and compute data for the preconditioner M 1 associated to the deflation End If End If x 0 = x m, r 0 = r m end while 8/17
9 in New KSP type : D LEVEL OF ABSTRACTION PC (Preconditioners) MATRICES APPLICATIONS CODES EXTERNAL PACKAGES VECTORS KSP (Krylov Subspace Methods) INDEX SETS... BLAS LAPACK SCALAPACK BLACS MPI Other KSP... F L D MPI Calls are encapsulated in routines in Usage in Petsc In your executable, register dynamically the new KSP KSPRegisterDynamic(KSPD, Path-to-libdgmres.a, KSPCreate D, KSPCreate D); Use D just as any other KSP with the following options KSPSetType(ksp, KSPD); or ksp type dgmres ksp dgmres eigen <k>: Number of eigenvalues to deflate ksp dgmres max eigen <kmax>: Maximum Number of eigenvalues to deflate ksp dgmres smv <smv> : relaxation parameter in the adaptive ksp gmres restart<m>, ksp max it<itmax>, ksp rtol<rtol>... Any option from holds; Any preconditioner available for pc type<pc> 9/17
10 Environment for tests Platform of tests Test Examples Cluster GRID 5000 SMP Nodes HP Proliant DL165 Dual CPU/Node; 12 AMD 1.7GHZ Infiniband DDR network in Petsc Example (Ex3) Basic poisson problem 2D regular mesh on the unit square 2000x2000 mesh in these tests N:=4,000,000 NZ:=35,936,032 easy to solve; given here just for large scale experimentation FLUOREM Example (CASE 017) Parametrized Navier-Stokes equation Finite volume discretization of the steady equation Nonsymmetric jacobian matrix N:=381,689 NZ:=37,464,962 strongly non-elliptic operator; need robust solver 10/17
11 Policy of tests and options Policy of tests Run the iterative method until b Ax / b <rtol> or the maximum number of iterations <max it> (not matrix-vectors) is reached. Domain decomposition preconditioner is used (Additive Schwarz, block jacobi); data assigned to processes in chunk of contiguous rows. For each test case and each number of subdomains #D (m) : restarted ; cycle of m iterations; right preconditioning; D(m, k, max) : ; k eigenvalues extracted at each restart; max eigenvalues extracted!! D A(m,k,max): with adaptive. in Ex3 ksp <rtol> 1e-12 gmres restart <m> 12 gmres <max it> 500 right preconditioning PC block jacobi Subdomains <D> [ ] sub solver ILU(0) (Hypre-Euclid) dgmres eigen <k> 2 CASE 017 ksp <rtol> 1e-08 gmres restart <m> 64 gmres <max it> 1500 right preconditioning PC ASM <overlap> 1 Subdomains <D> [ ] sub solver LU (MUMPS) dgmres eigen <k> 5 11/17
12 Ex3: Poisson problem; 2000x2000 mesh; 36M entries Numerical convergence, subdomains in Rel. Res. Norm Rel. Res. Norm 10 8 EX3_N2000 D= (12) D_A(12;2;8) D(12;2;23) iterations EX3_2000x2000 D= (12) D_A(12;2;78) D(12;2;91) Rel. Res. Norm EX3_2000x2000 D= (12) D_A(12;2;17) D(12;2;31) iterations (12) D(12, 2) D_A(12, 2) Number of Matrix vectors iterations /17
13 EX3: Overhead due to the deflation MPI Messages on MPI Messages and Reductions x MPI Messages and Reductions (12) 7 D(12, 2) 6 D_A(12, 2) Processors MPI Messages and Reductions x Messages during deflation Total in D 7 During Deflation Processors in Insignificant overhead of messages in the deflation phase More investigation on real test cases for the Flops and CPU time 13/17
14 CASE 17; 3D CFD case; 37.5M entries Numerical convergence; [ ] subdomains, Rel. Res. Norm Rel. Res. Norm 10 0 CASE_17 D= (64) D_A(64;5;5) D(64;5;61) iterations 10 0 CASE_17 D= (64) D_A(64;5;10) D(64;5;43) Rel. Res. Norm Matrix Vectors 10 0 CASE_17 D= (64) D_A(64;5;54) D(64;5;86) iterations Matrix vectors (64) D(64, 5) D_A(64, 5) in iterations Processors 14/17
15 CASE 17; 3D CFD case; 37.5M entries Flops, MPI messages and CPU Time; Linux 5000 Time (sec) Flops x Floating point operations (64) D(64, 5) 4 D_A(64, 5) Processors Total elapsed time D D_A Time (Sec) MPI Messages and Reductions x MPI Messages and Reductions (64) D(64, 5) 3 D_A(64, 5) Processors Overhead time in deflation Total In deflation in Processors Processors 15/17
16 Real contribution of the adaptive The fact is: Rel. Res. Norm Sometimes, an increase of the restart parameter in may prevent stagnation Difficult to know how much it should be increase; more difficult to know if the method will converge either way after the increase. The deflated with adaptive will provide more robustness with any restart parameter CASE_17 D= (48) (64) (96) Rel. Res. Norm 10 0 CASE_17 D= D_A(48;5;49) D_A(64;5;10) D_A(96;5;5) D_A(128;5;0) in iterations iterations 16/17
17 Conclusion in The should be tested on more real applications and platforms Nevertheless the code exists as a KSP module... for real artihmetics. 17/17
18 Conclusion in The should be tested on more real applications and platforms Nevertheless the code exists as a KSP module... for real artihmetics. THANKS... QUESTIONS??? 17/17
Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008
A tutorial on: Iterative methods for Sparse Matrix Problems Yousef Saad University of Minnesota Computer Science and Engineering CRM Montreal - April 30, 2008 Outline Part 1 Sparse matrices and sparsity
AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS
AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS by M. Storti, L. Dalcín, R. Paz Centro Internacional de Métodos Numéricos en Ingeniería - CIMEC INTEC, (CONICET-UNL), Santa Fe, Argentina
A Load Balancing Tool for Structured Multi-Block Grid CFD Applications
A Load Balancing Tool for Structured Multi-Block Grid CFD Applications K. P. Apponsah and D. W. Zingg University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, M3H 5T6, Canada Email:
Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems
Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 [email protected] 1 Course G63.2010.001 / G22.2420-001,
P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE
1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. Bois-Préau. 92852 Rueil Malmaison Cedex. France
Mathematical Libraries on JUQUEEN. JSC Training Course
Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries on JUQUEEN JSC Training Course May 10, 2012 Outline General Informations Sequential Libraries, planned Parallel Libraries and Application Systems:
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions
1 Bull, 2011 Bull Extreme Computing
1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance
Numerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 [email protected] 1 Course G63.2010.001 / G22.2420-001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)
Mathematical Libraries and Application Software on JUROPA and JUQUEEN
Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries and Application Software on JUROPA and JUQUEEN JSC Training Course May 2014 I.Gutheil Outline General Informations Sequential Libraries Parallel
Numerical Calculation of Laminar Flame Propagation with Parallelism Assignment ZERO, CS 267, UC Berkeley, Spring 2015
Numerical Calculation of Laminar Flame Propagation with Parallelism Assignment ZERO, CS 267, UC Berkeley, Spring 2015 Xian Shi 1 bio I am a second-year Ph.D. student from Combustion Analysis/Modeling Lab,
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,
Scalable Distributed Schur Complement Solvers for Internal and External Flow Computations on Many-Core Architectures
Scalable Distributed Schur Complement Solvers for Internal and External Flow Computations on Many-Core Architectures Dr.-Ing. Achim Basermann, Dr. Hans-Peter Kersken, Melven Zöllner** German Aerospace
Module 6 Case Studies
Module 6 Case Studies 1 Lecture 6.1 A CFD Code for Turbomachinery Flows 2 Development of a CFD Code The lecture material in the previous Modules help the student to understand the domain knowledge required
Domain Decomposition Methods. Partial Differential Equations
Domain Decomposition Methods for Partial Differential Equations ALFIO QUARTERONI Professor ofnumericalanalysis, Politecnico di Milano, Italy, and Ecole Polytechnique Federale de Lausanne, Switzerland ALBERTO
High Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation
The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation Mark Baker, Garry Smith and Ahmad Hasaan SSE, University of Reading Paravirtualization A full assessment of paravirtualization
Fast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
A matrix-free preconditioner for sparse symmetric positive definite systems and least square problems
A matrix-free preconditioner for sparse symmetric positive definite systems and least square problems Stefania Bellavia Dipartimento di Ingegneria Industriale Università degli Studi di Firenze Joint work
Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms
Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State
Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui
Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching
Clusters: Mainstream Technology for CAE
Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux
A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906
AMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 19: SVD revisited; Software for Linear Algebra Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 9 Outline 1 Computing
Hands-On Exercises for SLEPc
Scalable Library for Eigenvalue Problem Computations http://slepc.upv.es Hands-On Exercises for SLEPc June 2015 Intended for use with version 3.6 of SLEPc These exercises introduce application development
Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing
Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools
Large-Scale Reservoir Simulation and Big Data Visualization
Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Fast Iterative Solvers for Integral Equation Based Techniques in Electromagnetics
Fast Iterative Solvers for Integral Equation Based Techniques in Electromagnetics Mario Echeverri, PhD. Student (2 nd year, presently doing a research period abroad) ID:30360 Tutor: Prof. Francesca Vipiana,
Turbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures
11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the
General Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
HSL and its out-of-core solver
HSL and its out-of-core solver Jennifer A. Scott [email protected] Prague November 2006 p. 1/37 Sparse systems Problem: we wish to solve where A is Ax = b LARGE Informal definition: A is sparse if many
ANSYS Solvers: Usage and Performance. Ansys equation solvers: usage and guidelines. Gene Poole Ansys Solvers Team, April, 2002
ANSYS Solvers: Usage and Performance Ansys equation solvers: usage and guidelines Gene Poole Ansys Solvers Team, April, 2002 Outline Basic solver descriptions Direct and iterative methods Why so many choices?
TESLA Report 2003-03
TESLA Report 23-3 A multigrid based 3D space-charge routine in the tracking code GPT Gisela Pöplau, Ursula van Rienen, Marieke de Loos and Bas van der Geer Institute of General Electrical Engineering,
1 2 3 1 1 2 x = + x 2 + x 4 1 0 1
(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which
Course Outline for the Masters Programme in Computational Engineering
Course Outline for the Masters Programme in Computational Engineering Compulsory Courses CP-501 Mathematical Methods for Computational 3 Engineering-I CP-502 Mathematical Methods for Computational 3 Engineering-II
DYNAMIC LOAD BALANCING APPLICATIONS ON A HETEROGENEOUS UNIX/NT CLUSTER
European Congress on Computational Methods in Applied Sciences and Engineering ECCOMAS 2000 Barcelona, 11-14 September 2000 ECCOMAS DYNAMIC LOAD BALANCING APPLICATIONS ON A HETEROGENEOUS UNIX/NT CLUSTER
Iterative Solvers for Linear Systems
9th SimLab Course on Parallel Numerical Simulation, 4.10 8.10.2010 Iterative Solvers for Linear Systems Bernhard Gatzhammer Chair of Scientific Computing in Computer Science Technische Universität München
Multicore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
Best practices for efficient HPC performance with large models
Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,
Cluster Computing in a College of Criminal Justice
Cluster Computing in a College of Criminal Justice Boris Bondarenko and Douglas E. Salane Mathematics & Computer Science Dept. John Jay College of Criminal Justice The City University of New York 2004
A Massively Parallel Finite Element Framework with Application to Incompressible Flows
A Massively Parallel Finite Element Framework with Application to Incompressible Flows Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultäten der Georg-August-Universität
AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications
AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications Dr. Bjoern Landmann Dr. Kerstin Wieczorek Stefan Bachschuster 18.03.2015 FluiDyna GmbH, Lichtenbergstr. 8, 85748 Garching
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition
P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic
Multigrid preconditioning for nonlinear (degenerate) parabolic equations with application to monument degradation
Multigrid preconditioning for nonlinear (degenerate) parabolic equations with application to monument degradation M. Donatelli 1 M. Semplice S. Serra-Capizzano 1 1 Department of Science and High Technology
Hierarchically Parallel FE Software for Assembly Structures : FrontISTR - Parallel Performance Evaluation and Its Industrial Applications
CO-DESIGN 2012, October 23-25, 2012 Peing University, Beijing Hierarchically Parallel FE Software for Assembly Structures : FrontISTR - Parallel Performance Evaluation and Its Industrial Applications Hiroshi
It s Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU
It s Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU A. Windisch PhD Seminar: High Performance Computing II G. Haase March 29 th, 2012, Graz Outline 1 MUMPS 2 PaStiX 3 SuperLU 4 Summary
Mesh Generation and Load Balancing
Mesh Generation and Load Balancing Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee April 04, 2012 CS 594 04/04/2012 Slide 1 / 19 Outline Motivation Reliable
benchmarking Amazon EC2 for high-performance scientific computing
Edward Walker benchmarking Amazon EC2 for high-performance scientific computing Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received
GPU Acceleration of the SENSEI CFD Code Suite
GPU Acceleration of the SENSEI CFD Code Suite Chris Roy, Brent Pickering, Chip Jackson, Joe Derlaga, Xiao Xu Aerospace and Ocean Engineering Primary Collaborators: Tom Scogland, Wu Feng (Computer Science)
Continuity of the Perron Root
Linear and Multilinear Algebra http://dx.doi.org/10.1080/03081087.2014.934233 ArXiv: 1407.7564 (http://arxiv.org/abs/1407.7564) Continuity of the Perron Root Carl D. Meyer Department of Mathematics, North
Recommended hardware system configurations for ANSYS users
Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range
ALGEBRAIC EIGENVALUE PROBLEM
ALGEBRAIC EIGENVALUE PROBLEM BY J. H. WILKINSON, M.A. (Cantab.), Sc.D. Technische Universes! Dsrmstedt FACHBEREICH (NFORMATiK BIBL1OTHEK Sachgebieto:. Standort: CLARENDON PRESS OXFORD 1965 Contents 1.
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
High-fidelity electromagnetic modeling of large multi-scale naval structures
High-fidelity electromagnetic modeling of large multi-scale naval structures F. Vipiana, M. A. Francavilla, S. Arianos, and G. Vecchi (LACE), and Politecnico di Torino 1 Outline ISMB and Antenna/EMC Lab
Preconditioning Sparse Matrices for Computing Eigenvalues and Solving Linear Systems of Equations. Tzu-Yi Chen
Preconditioning Sparse Matrices for Computing Eigenvalues and Solving Linear Systems of Equations by Tzu-Yi Chen B.S. (Massachusetts Institute of Technology) 1995 B.S. (Massachusetts Institute of Technology)
Matrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
LINEAR REDUCED ORDER MODELLING FOR GUST RESPONSE ANALYSIS USING THE DLR TAU CODE
IFASD-213-36A LINEAR REDUCED ORDER MODELLING FOR GUST RESPONSE ANALYSIS USING THE DLR TAU CODE S. Timme 1, K. J. Badcock 1, and A. Da Ronch 2 1 School of Engineering, University of Liverpool Liverpool
ME6130 An introduction to CFD 1-1
ME6130 An introduction to CFD 1-1 What is CFD? Computational fluid dynamics (CFD) is the science of predicting fluid flow, heat and mass transfer, chemical reactions, and related phenomena by solving numerically
Improving Scalability for Citrix Presentation Server
VMWARE PERFORMANCE STUDY VMware ESX Server 3. Improving Scalability for Citrix Presentation Server Citrix Presentation Server administrators have often opted for many small servers (with one or two CPUs)
ABSTRACT FOR THE 1ST INTERNATIONAL WORKSHOP ON HIGH ORDER CFD METHODS
1 ABSTRACT FOR THE 1ST INTERNATIONAL WORKSHOP ON HIGH ORDER CFD METHODS Sreenivas Varadan a, Kentaro Hara b, Eric Johnsen a, Bram Van Leer b a. Department of Mechanical Engineering, University of Michigan,
64-Bit versus 32-Bit CPUs in Scientific Computing
64-Bit versus 32-Bit CPUs in Scientific Computing Axel Kohlmeyer Lehrstuhl für Theoretische Chemie Ruhr-Universität Bochum March 2004 1/25 Outline 64-Bit and 32-Bit CPU Examples
by the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that
Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms. Amani AlOnazi.
Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms Amani AlOnazi For the Degree of Master of Science Uniersity College Dublin, Dublin, Ireland
Poisson Equation Solver Parallelisation for Particle-in-Cell Model
WDS'14 Proceedings of Contributed Papers Physics, 233 237, 214. ISBN 978-8-7378-276-4 MATFYZPRESS Poisson Equation Solver Parallelisation for Particle-in-Cell Model A. Podolník, 1,2 M. Komm, 1 R. Dejarnac,
Lecture 5: Singular Value Decomposition SVD (1)
EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system
AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS
AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS Revised Edition James Epperson Mathematical Reviews BICENTENNIAL 0, 1 8 0 7 z ewiley wu 2007 r71 BICENTENNIAL WILEY-INTERSCIENCE A John Wiley & Sons, Inc.,
Computational Modeling of Wind Turbines in OpenFOAM
Computational Modeling of Wind Turbines in OpenFOAM Hamid Rahimi [email protected] ForWind - Center for Wind Energy Research Institute of Physics, University of Oldenburg, Germany Outline Computational
A numerically adaptive implementation of the simplex method
A numerically adaptive implementation of the simplex method József Smidla, Péter Tar, István Maros Department of Computer Science and Systems Technology University of Pannonia 17th of December 2014. 1
The PageRank Citation Ranking: Bring Order to the Web
The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized
GPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
Benchmark Tests on ANSYS Parallel Processing Technology
Benchmark Tests on ANSYS Parallel Processing Technology Kentaro Suzuki ANSYS JAPAN LTD. Abstract It is extremely important for manufacturing industries to reduce their design process period in order to
Partitioning and Load Balancing for Emerging Parallel Applications and Architectures
Chapter Partitioning and Load Balancing for Emerging Parallel Applications and Architectures Karen D. Devine, Erik G. Boman, and George Karypis. Introduction An important component of parallel scientific
Building a Top500-class Supercomputing Cluster at LNS-BUAP
Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad
Performance of the JMA NWP models on the PC cluster TSUBAME.
Performance of the JMA NWP models on the PC cluster TSUBAME. K.Takenouchi 1), S.Yokoi 1), T.Hara 1) *, T.Aoki 2), C.Muroi 1), K.Aranami 1), K.Iwamura 1), Y.Aikawa 1) 1) Japan Meteorological Agency (JMA)
ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING
ACCELERATING COMMERCIAL LINEAR DYNAMIC AND Vladimir Belsky Director of Solver Development* Luis Crivelli Director of Solver Development* Matt Dunbar Chief Architect* Mikhail Belyi Development Group Manager*
Adaptive Stable Additive Methods for Linear Algebraic Calculations
Adaptive Stable Additive Methods for Linear Algebraic Calculations József Smidla, Péter Tar, István Maros University of Pannonia Veszprém, Hungary 4 th of July 204. / 2 József Smidla, Péter Tar, István
The Application of a Black-Box Solver with Error Estimate to Different Systems of PDEs
The Application of a Black-Bo Solver with Error Estimate to Different Systems of PDEs Torsten Adolph and Willi Schönauer Forschungszentrum Karlsruhe Institute for Scientific Computing Karlsruhe, Germany
Subspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China [email protected] Stan Z. Li Microsoft
Overlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm [email protected] [email protected] Department of Computer Science Department of Electrical and Computer
