P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE


JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS
IFP, 1 & 4, av. Bois-Préau, Rueil-Malmaison Cedex, France
9th European Conference on the Mathematics of Oil Recovery, Cannes, France, 30 August - 2 September 2004

Abstract

As full field reservoir simulations require large amounts of computing resources, the trend is to use parallel computing to overcome hardware limitations. With a price/performance ratio often better than that of traditional machines and with services in constant progress, Linux clusters have become a very attractive alternative to traditional high performance facilities. Oil & gas companies have therefore moved very quickly to these new architectures. This paper presents the new reservoir simulation software developed at IFP, which has been specially designed for Linux clusters. The parallel model, implemented with the MPI protocol, is described. The linear solvers and the preconditioning techniques applied to solve very large scale problems are presented. We discuss some technical choices well suited to our specific numerical schemes and to Linux cluster architectures. This new reservoir simulation software has been tested on different platforms on several already published large scale benchmark problems and on real study cases. This paper reports the scalability of the parallel simulator and the numerical stability of the underlying algorithms observed during this test campaign.

Introduction

Like vector computers and traditional parallel machines before them, Linux PC clusters represent a major breakthrough in High Performance Computing (HPC). In order to take advantage of the various architectures, simulation codes must be adapted or sometimes completely rewritten. In 1983, IFP developed a vector version of its reservoir simulator [1]. In 2001, the first parallel reservoir simulator [2], dedicated to shared memory platforms, was introduced. In 2003, IFP began the development of a new parallel multipurpose reservoir simulator based on a parallel scheme optimized for distributed architectures and especially for Linux clusters. This paper describes the MPI (Message Passing Interface [3]) parallel model based on a domain decomposition using an expert grid partitioning algorithm. The reservoir simulator is based on a system of equations discretized with a first order finite volume scheme in space and linearized with the Newton iterative method. Simulations of reservoirs discretized with large numbers of cells and hydrocarbon components generate large sparse linear systems to be solved at each time step and each Newton iteration. Preconditioned iterative methods based on Bi-CGSTAB [4] or GMRES [4] are known to be well suited to this kind of problem. Nevertheless, the complexity and the size of reservoir models keep increasing, which requires designing new algorithms for solving such large linear systems. The advanced methods that have been implemented in our code are described, and ongoing research on linear solvers is also presented. We discuss the results obtained on the Tenth SPE Comparative Solution Project, Model 2 [5], and some bottlenecks due to Linux clusters.

A multipurpose reservoir simulator

The parallel reservoir simulator presented in this paper is a multipurpose simulator providing most of the physical options required by reservoir engineers, such as black-oil, multi-component, thermal, dual permeability, dual porosity, polymers and steam injection. The classical fully implicit numerical formulation was used for the tests, but the AIM, IMPEX, AIMPEX [6] and IMPES methods, mixing implicit and explicit time discretizations, are also available. The simulation grid may use either 3D Cartesian or Corner Point geometries integrating multiple refinement levels. This geometry is transformed into an unstructured internal geometry scheme for the resolution of the mass conservation equations.

Linux clusters

Until recently the HPC (High Performance Computing) market was completely divided in two parts, vector computers and traditional proprietary parallel machines. During the last ten years, however, Linux clusters have quickly gained in maturity and are now widely used, in particular for geo-modeling and seismic interpretation algorithms. The main reason for this success is their price/performance ratio, which is much better than for other classical platforms. Linux clusters are built on basic bricks which contain components similar to those found in personal PCs, and the volume of the personal PC market pushes prices down drastically. In a Linux cluster, the nodes are linked to each other by dedicated networks for services, input/output and communications. A global parallel file system is often used to give all nodes simultaneous access to a unique storage area. The networks and the parallel file system are the critical parts of the architecture and can represent half the price of the whole machine. The Open Source community has made great efforts on the Linux operating system, which now offers the same level of quality and robustness as proprietary systems. Most of the largest clusters [7] in the world are now Linux clusters.

Parallelism approach

In our simulator, the finite volume space discretization can use either structured or unstructured meshes. Unstructured meshes are better adapted to simulating very complex geometries including faults, layer pinch-outs, production areas and complex horizontal wells. They also avoid storing inactive cells and therefore require less memory. Our parallel simulator is based on a general unstructured mesh with cells and links between any two cells, one upstream cell and one downstream cell. The parallelization consists in:
1. partitioning the mesh into sub-domains;
2. distributing grid data on the different domains;
3. distributing computations on the different processors managing each sub-domain.

Data distribution

Data distribution is based on an overlapping decomposition of the grid. Each grid cell is assigned to a unique sub-domain as an interior cell. The overlapping decomposition is constructed by adding to each sub-domain the cells of other domains (the ghost cells) that are connected to at least one interior cell of the sub-domain. Finally the mesh is divided into several sub-domains composed of:
- interior cells;
- ghost cells, connected to at least one interior cell.
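As an illustration, a minimal sketch of this overlapping decomposition is given below. The data structures and names are assumptions for the example, not the simulator's actual API.

```cpp
// Sketch (not the simulator's code): building the overlapping decomposition
// from a cell partition. Assumed inputs: owner[c] gives the sub-domain of
// cell c, and links lists the (upstream, downstream) cell pairs of the mesh.
#include <set>
#include <utility>
#include <vector>

struct Decomposition {
  std::vector<std::set<int>> interior;  // interior cells of each sub-domain
  std::vector<std::set<int>> ghost;     // ghost cells of each sub-domain
};

Decomposition buildOverlap(int nDomains,
                           const std::vector<int>& owner,
                           const std::vector<std::pair<int, int>>& links) {
  Decomposition d;
  d.interior.resize(nDomains);
  d.ghost.resize(nDomains);

  // Each cell is assigned to exactly one sub-domain as an interior cell.
  for (int c = 0; c < static_cast<int>(owner.size()); ++c)
    d.interior[owner[c]].insert(c);

  // A cell of another domain becomes a ghost cell of a sub-domain as soon as
  // it is linked to at least one interior cell of that sub-domain.
  for (const auto& [i, j] : links) {
    if (owner[i] != owner[j]) {
      d.ghost[owner[i]].insert(j);
      d.ghost[owner[j]].insert(i);
    }
  }
  return d;
}
```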

Renumbering

To improve the performance of solving algorithms which are not naturally parallel, we use a special cell numbering for the domain decomposition. Cells are separated into:
- cells only connected to other interior cells, composing the interior of the sub-domain;
- cells having at least one connection with a ghost cell, which we call the interface cells of the sub-domain.

They are then sorted in the following way:
1. the cells of the interior of the sub-domain;
2. the interface cells;
3. the ghost cells.

Such a renumbering helps to parallelize algorithms which are not naturally parallel and which can otherwise be very expensive in communication costs. This is the case for the linear solvers and for many of the efficient preconditioning techniques described further below.

Figure 1: Domain decomposition example (interior cells, interface cells and ghost cells).
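A minimal sketch of this renumbering for one sub-domain is shown below; the local arrays (ghost flags and neighbour lists) are assumptions for the example.

```cpp
// Sketch (assumed local data layout): renumber the cells of one sub-domain as
// 1) cells of the interior of the sub-domain, 2) interface cells, 3) ghost cells.
#include <vector>

std::vector<int> renumber(const std::vector<bool>& isGhost,
                          const std::vector<std::vector<int>>& neighbours) {
  const int n = static_cast<int>(isGhost.size());
  std::vector<int> inner, interface, ghost;

  for (int c = 0; c < n; ++c) {
    if (isGhost[c]) { ghost.push_back(c); continue; }
    // An interior cell touching at least one ghost cell is an interface cell.
    bool touchesGhost = false;
    for (int nb : neighbours[c])
      if (isGhost[nb]) { touchesGhost = true; break; }
    (touchesGhost ? interface : inner).push_back(c);
  }

  // New ordering: interior of the sub-domain first, then interface, then ghosts.
  std::vector<int> newOrder;
  newOrder.reserve(n);
  newOrder.insert(newOrder.end(), inner.begin(), inner.end());
  newOrder.insert(newOrder.end(), interface.begin(), interface.end());
  newOrder.insert(newOrder.end(), ghost.begin(), ghost.end());
  return newOrder;
}
```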

Quality of the distribution and limit of the number of sub-domains

The communication cost depends on the number of interface cells. To achieve good performance, it is important to keep the interface rate at a minimum value, at worst no more than 10% to 20% depending on the network. On a test case of fixed size, when the number of domains increases, the interface rate grows because the number of interface cells increases while the global size stays constant. It is therefore clear that, for a fixed problem size, the parallel scalability reaches a maximum when the interface rate grows beyond a critical value, after which performance drops.

Distribution of cells, links, composite objects, global objects and their data

All data corresponding to cell properties, such as porosity, permeability, pressure or average velocity, are distributed according to the cell distribution: every property of a cell is distributed to all the processors managing the domains where the cell is either an interior or a ghost cell. Composite objects, such as wells composed of perforated cells, borders or limit regions, are assigned to the domain that contains the major part of their cells, and their data are distributed accordingly. Links between two cells i and j are distributed to all domains containing i or j as interior cells, but not to domains where i and j are both interface cells. Link data such as permeability are then distributed according to the link distribution. General global objects, such as PVT laws, KrPc laws, curves, recurrent well data or numerical options, are replicated on every processor.

Distribution of computation

All local computations using cell or link data are distributed according to the cell and link distributions. Global values, such as global balances, average or maximum values, or any kind of global step control (time steps, Newton steps and solver steps), result from computations over all cells or all links. They are built by global MPI reductions of the results obtained locally on the interior cells of each sub-domain. The most expensive phase of a simulation is the resolution of the non-linear problem using Newton iterations. At each Newton iteration, a large sparse linear system is solved with a parallel implementation of the Bi-Conjugate Gradient Stabilized method (Bi-CGSTAB) using a parallel preconditioner. This implementation is based on the parallel distribution of the matrix rows and of the vector components. As each matrix row corresponds to the equation of one cell unknown, each vector component to one cell unknown, and each non-zero matrix entry to one mesh link, the matrix and vector distribution follows the way cells and links are distributed. Some equations link well properties to perforated cell unknowns. These well properties may also be solved implicitly, like the other implicit cell unknowns. The corresponding rows and columns are distributed according to the well and cell distributions.

New solutions implemented in our new parallel reservoir simulator

In our new simulator, in order to overcome the difficulties of our parallel approach, we have:
- worked on a numerical scheme to ensure enhanced numerical stability and high parallel performance;
- implemented a communication strategy that uses overlapping communication as often as possible, limits the synchronization phases and reduces the communication cost.

Numerical stability and convergence criteria

During the simulation we use the Newton method to linearize the non-linearities. The linearized problem is solved using the Bi-CGSTAB algorithm. The cost of these iterative algorithms depends on the number of steps required to reach their stopping criteria. Even though we only parallelize independent operations, so that results should be independent of the number of domains, global results can nevertheless be sensitive to how the reduction of local operations is done. In a parallel paradigm, we cannot avoid small differences due to the global parallel reductions. During a simulation, if the Newton stopping criterion is close to the Newton precision, small differences due to parallel synchronization can affect the number of Newton iterations and the value of the time steps. For example, a small difference in the resolution of the linear system can make the Newton loop fail; the time step is then reduced and the Newton loop has to be restarted. In the end, a small difference in the solution of the linear system can have very important consequences on the number of Newton steps and, even worse, on the values of the time step, which gradually decreases. In that case the simulation becomes very unstable, and the cost of a single run depends strongly on the number of domains and on the hardware environment. In such cases, the accuracy of the solver is critical to ensure parallel performance, as the cumulative number of steps (solver iterations, Newton loops, time steps) can depend strongly on the number of domains. This shows the importance of choosing a stable numerical scheme, good solver options and a robust parallel preconditioner to obtain a numerically stable simulation.
This difficulty emphasises the particular importance of using very stable numerical schemes in parallel simulations. Solver options can have an important influence on the convergence criteria, and the choice of a good parallel preconditioner helps to obtain stable convergence iteration counts.
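To make the role of the global reductions concrete, the following minimal sketch computes the global residual norm used in a Newton stopping test. It assumes the renumbering above (interior-cell entries stored first in each local residual vector); names are hypothetical. The MPI reduction makes the test identical on all processors, but the floating-point summation order may still differ from a sequential run, which is the source of the small differences discussed above.

```cpp
// Sketch: global residual norm over interior cells only (ghost entries are
// owned, and therefore summed, by another processor).
#include <mpi.h>
#include <cmath>
#include <vector>

double globalResidualNorm(const std::vector<double>& residual,
                          int nInteriorCells,  // interior entries come first
                          MPI_Comm comm) {
  double local = 0.0;
  for (int i = 0; i < nInteriorCells; ++i)
    local += residual[i] * residual[i];

  double global = 0.0;
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
  return std::sqrt(global);
}

// Typical use in the Newton loop (epsilon being the Newton stopping criterion):
//   if (globalResidualNorm(res, nInterior, comm) < epsilon) break;
```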

In the standard sequential simulator, linear systems are efficiently solved by the Bi-Conjugate Gradient Stabilized method with an incomplete ILU0 preconditioner. This preconditioner, well known to be efficient for standard cases, is not naturally parallel as its algorithm is recursive. Tests on several cases show that the simple block ILU0 preconditioner is not numerically stable: the number of solver iterations is very sensitive to the number of domains. We have therefore parallelized it taking into account all the links between adjacent domains. The parallel renumbering makes the ILU0 algorithm parallel on interior cells, as they are independent from one domain to another. To optimize the treatment of the interface cells, we have separated and sorted them into three kinds of cells:
1. cells only connected to domains with a higher rank than the current domain;
2. cells connected both to domains with a higher rank and to domains with a lower rank than the current domain;
3. cells only connected to domains with a lower rank than the current domain.

A sketch of this classification is given at the end of this section. Even though the algorithm is recursive, the rows corresponding to the first and third kinds of interface cells can be computed independently in each domain, and therefore in parallel. The second kind of cells introduces recursion between processors, but fortunately, with most types of partitioners, there are no such cells. With all these renumbering techniques, the domain decomposition renumbering and the sorting of interface cells, the ILU0 preconditioner has been parallelized without neglecting the interface interactions between domains and while avoiding recursion between processors. In that way, the overhead of the parallelism is only due to communication costs. Our new parallel ILU0 preconditioner turns out to be a scalable preconditioner. However, it can encounter difficulties on industrial cases with very complex geometries and physical models. Recent research has shown that multigrid methods, which are more expensive than traditional ILU0 methods, are more robust for elliptic problems and very stable in terms of the number of iterations, independently of the problem size. In reservoir modelling, linear systems are composed of unknowns such as the pressure, coming from an elliptic equation, and other unknowns coming from transport equations. In some industrial cases, we have noticed that the saturation variations are very sensitive to the pressure gradient, so that having a robust preconditioner for the pressure is very important. We have developed a new kind of preconditioner, the two-level AMG, combining an algebraic multigrid method on the pressure unknowns with a more traditional parallel one on the others (ILU0, block ILU0, polynomial). In that way, we have developed a method which is robust, parallel and efficient. We have tested these methods (parallel ILU0, two-level AMG) and studied their influence on the performance and on the numerical stability of the reservoir simulator.
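The following minimal sketch shows the interface-cell classification used for the parallel ILU0, deduced from the ranks of the neighbouring domains that own the ghost neighbours of each interface cell. It is a hypothetical helper written for this paper's description, not the simulator's code.

```cpp
// Sketch: sort an interface cell into one of the three kinds described above.
#include <vector>

enum class InterfaceKind { HigherOnly, Mixed, LowerOnly };

InterfaceKind classify(int myRank, const std::vector<int>& neighbourDomainRanks) {
  bool higher = false, lower = false;
  for (int r : neighbourDomainRanks) {
    if (r > myRank) higher = true;
    if (r < myRank) lower = true;
  }
  if (higher && lower) return InterfaceKind::Mixed;  // introduces recursion between processors
  return higher ? InterfaceKind::HigherOnly          // computed in parallel
                : InterfaceKind::LowerOnly;          // computed in parallel
}
```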
Communication cost

The overhead of the computations is mainly due to redundant computations on ghost cells and to communication costs, and great attention has to be paid so that the communication cost does not reduce the scalability. With an efficient partitioner ensuring a low interface rate, the cost of the computations on ghost cells is small compared to the other computations, so the overhead of parallelism is mainly due to communication costs. The most communicating task is the solver.

During that phase, we take advantage of the domain decomposition renumbering of the cells. By sorting the interior cells, we can use overlapping communication during the linear resolution: with asynchronous communications, we overlap the communications through the network with the computations on the interior cells. For the synchronization steps, when overlapping is not possible, we use blocking communications, as they turn out to be more efficient than asynchronous ones.
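A minimal sketch of this overlap pattern with non-blocking MPI calls follows; the buffer layout and the compute routines are assumptions for the example, not the simulator's API.

```cpp
// Sketch: post non-blocking exchanges of interface/ghost values, compute on
// interior cells while the messages travel, then finish the interface part.
#include <mpi.h>
#include <vector>

void exchangeAndCompute(const std::vector<int>& neighbourRanks,
                        std::vector<std::vector<double>>& sendBuf,
                        std::vector<std::vector<double>>& recvBuf,
                        MPI_Comm comm) {
  std::vector<MPI_Request> requests;
  for (std::size_t k = 0; k < neighbourRanks.size(); ++k) {
    MPI_Request rq;
    MPI_Irecv(recvBuf[k].data(), static_cast<int>(recvBuf[k].size()), MPI_DOUBLE,
              neighbourRanks[k], 0, comm, &rq);
    requests.push_back(rq);
    MPI_Isend(sendBuf[k].data(), static_cast<int>(sendBuf[k].size()), MPI_DOUBLE,
              neighbourRanks[k], 0, comm, &rq);
    requests.push_back(rq);
  }

  // computeInteriorCells();   // hypothetical: work that needs no ghost data

  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);

  // computeInterfaceCells();  // hypothetical: work that needs the ghost data
}
```

When overlapping is not possible, the same exchange can be performed with blocking calls, as noted above for the synchronization steps.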

We only need a few synchronization phases to ensure that the ghost cell data are correct. Most of the computations are done on every processor that manages a cell, either as an interior or as a ghost cell. Synchronization is therefore only needed when the computation on a ghost cell may be wrong because of missing neighbour data. We group these synchronizations into a few phases: after the linear resolution and before updating the cell data with their variations, and when we compute the various balances used to determine time steps or Newton steps.

IO strategy, parallel HDF5 format

As the amount of data transferred during the IO phases can be very high, a parallel IO strategy has been adopted. This strategy is a very difficult issue in parallel industrial software because it depends on the performance of the parallel file system, the way users want to post-process the result files, and the number of processors currently used. The HDF5 [8] (Hierarchical Data Format version 5) API is used for the management of the parallel IO in our simulator. HDF5 is a library and a file format designed for scientific data storage. HDF5 provides a complete, portable, high level API which allows the developer to define efficient data structures that optimize the read/write operations. All the low level calls to MPI-IO [3] are implicitly done by HDF5; this allows good portability and important gains in development and maintenance time. The parallel version of the HDF5 API enables all processors to read and write in one binary file, managing all kinds of contention between processors. The use of HDF5 gives a unique file which is independent of the number of processors, which allows restart jobs and post-processing to be run with various numbers of processors. The performance of the HDF5 library is completely dependent on the performance of the parallel file system. We noticed that HDF5 is not very scalable for a large number of processors on the GPFS file system. To improve the scalability of the IO in our simulator, we developed a strategy which consists in reducing the number of processes doing the IO on the disk.
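As an illustration of the parallel HDF5 usage described above, the following sketch opens one shared file through the MPI-IO driver. It assumes a parallel HDF5 build; dataset names and layout are not taken from the paper.

```cpp
// Sketch: create one HDF5 file shared by all processors via MPI-IO.
#include <hdf5.h>
#include <mpi.h>

hid_t openSharedFile(const char* path, MPI_Comm comm) {
  // File-access property list that routes HDF5 I/O through MPI-IO
  // (available only when HDF5 is built with parallel support).
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);

  // Every processor opens the same file collectively; the result is a single
  // binary file independent of the number of processors.
  hid_t file = H5Fcreate(path, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
  H5Pclose(fapl);
  return file;
}
```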

Results

The parallel reservoir simulator has been tested on a 32-node IBM Linux cluster of dual Intel Xeon 3.06 GHz processors with 2 GB of memory per node, a high bandwidth Myrinet 2000 network and an IBM GPFS parallel file system. The results presented in this paper were obtained on the Tenth SPE Comparative Solution Project, Model 2 [5]. This model is built on a regular Cartesian geometry and simulates a 3D waterflood of a geostatistical model with more than one million cells. Only the fine grid, with dimensions 60x220x85 and 1,094,721 active cells, has been simulated on the cluster. The simulated problem is incompressible and a water-oil thermodynamic system is used to model the fluid behaviour. The simulations were carried out on 2, 4, 8, 16 and 32 processors, filling the cluster in the standard way with 2 processes per node. We have observed that performance would have been better with one process per node, because of a serious memory-access bottleneck on the dual Xeon nodes. Nevertheless, we decided to present the results in the usual running conditions of normal users. The physical results (fig. 2) are very similar to the results presented in [5] and are independent of the number of processors. This confirms the good stability of our algorithms.

Figure 2: Well P3 water cut and standard surface oil rate.

Interface rate

In the following table, we compare the interface rate of the SPE10 test case versus the number of domains, with a classical band partitioner.

Table 1: Domain decomposition statistics (interface size, interior size and interface rate in % versus the number of CPUs).

This evolution shows that the SPE10 test case can be efficiently simulated on up to 32 processors; with more than 32 processors, very poor performance is expected.

Stability

In the following table, obtained with the Combinative AMG preconditioner, we compare, versus the number of processors: the number of time steps needed for the whole simulation, the number of solver iterations, and the time step evolution.

Table 2: Numerical results (elapsed time, speedup, number of solver iterations and number of time steps versus the number of CPUs).

The results show that the simulator is numerically stable and that the number of steps is almost independent of the number of processors.

Scalability

The following figure (fig. 3) shows the elapsed time and the speed-up relative to 2 processors of the different runs, versus the number of processors.

Figure 3: Elapsed times and speedups relative to 2 processors, versus the number of processors.

The graph shows that up to 32 processors we obtain very good speedups. One run has been performed on 64 processors and the performance was poor, as the SPE10 case is not large enough to allow a low interface rate on 64 sub-domains.
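As an aid to reading Figure 3, the relative speedup is assumed here to follow the usual definition against the 2-processor baseline (the smallest run reported), S_p = T_2 / T_p, where T_p is the elapsed time measured with p processors; the ideal value on 32 processors is then 16.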

Conclusions

The parallel version of our simulator, designed for distributed architectures, achieves good performance on a Linux cluster. The results presented in this paper and the other test cases we have run show that:
- the domain decomposition and the parallel algorithms used ensure good stability of our code;
- the Combinative method is well suited to the resolution of complex reservoir models and is very insensitive to the number of processors;
- the choice of a parallel IO strategy has a very important impact on performance.

The scalability of the code has been demonstrated up to 32 processors; larger models will be simulated to assess the scalability on larger clusters.

References

1. P. Quandalle, Comparaison de Quelques Algorithmes d'Inversion Matricielle sur le Calculateur CRAY1, Revue de l'Institut Français du Pétrole, vol. 38, no. 2, March-April 1983.
2. J-F. Magras, P. Quandalle, High Performance Reservoir Simulation With Parallel ATHOS, SPE 66342, Feb.
3. Message Passing Interface.
4. Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, Second Edition, January.
5. M. A. Christie, Tenth SPE Comparative Solution Project: A Comparison of Upscaling Techniques, SPE 66599, Feb.
6. Y. Caillabet, J-F. Magras, Large compositional reservoir simulations with parallelized adaptive implicit methods, SPE 81501, June.
7. Top 500 Supercomputer Sites.
8. HDF5 Home Page.
