Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA


1 Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA

2 Agenda: Introduction and Problem Statement; Governing Equations; ADI Numerical Method; GPU Implementation and Optimizations; Results and Future Work

3 Introduction. Turbulence simulation: Direct Numerical Simulation (all scales of turbulence; expensive), Large-Eddy Simulation, Reynolds-Averaged Navier-Stokes. Research at the Computer Science department of Moscow State University: Paskonov V.M., Berezin S.B.

4 Problem Statement. Viscous incompressible fluid in a 3D domain; initial and boundary conditions; Euler coordinates: velocity and temperature.

5 Definitions. Density: $\rho = \text{const}$; velocity: $\mathbf{u} = (u, v, w)$; temperature: $T$; pressure: $p$. State equation: $p = \rho R T = R T$, where $R$ is the gas constant for air.

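6 Governing Equations. Continuity equation: $\operatorname{div}\mathbf{u} = 0$. Navier-Stokes equations (dimensionless form): $\partial\mathbf{u}/\partial t + (\mathbf{u}\cdot\nabla)\mathbf{u} = -\nabla T + \frac{1}{Re}\,\Delta\mathbf{u}$. $Re$: Reynolds number.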

7 Reynolds number. Similarity parameter: the ratio of inertia forces to viscosity forces. 3D channel: $Re = \rho V' L' / \mu'$, where $V'$ is the mean velocity, $L'$ the length of the pipe and $\mu'$ the dynamic viscosity. High Re: turbulent flow. Low Re: laminar flow.
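
As a small worked illustration of this definition, the helper below evaluates Re from the four quantities named on the slide. It assumes the usual form Re = ρV'L'/μ'; the function name `reynolds_number` and its parameter names are hypothetical and do not come from the talk.

```cpp
#include <cassert>

// Hypothetical helper: Re = density * mean_velocity * length / dynamic_viscosity.
// All arguments must be given in one consistent unit system.
double reynolds_number(double density, double mean_velocity,
                       double length, double dynamic_viscosity) {
    assert(dynamic_viscosity > 0.0);  // guard against division by zero
    return density * mean_velocity * length / dynamic_viscosity;
}
```

Doubling the velocity or the length doubles Re, while doubling the viscosity halves it, which is why the same geometry can be laminar at low speed and turbulent at high speed.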

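8 Governing Equations. Energy equation (dimensionless form): [equation garbled in the transcript; the surviving fragments show the time derivative of $T$, further terms in $T$, a term scaled by $1/(Pr\,Re)$ and a term scaled by $1/Re$]. $Pr$: Prandtl number. Also named on the slide: the heat capacity ratio and the dissipative function.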

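9 Numerical Method. Alternating Direction Implicit (ADI): the time-dependent problem in (x, y, z) is split into three sub-steps X, Y and Z, one per spatial direction, each involving time and a single coordinate.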

10 ADI: Heat Conduction. 3 fractional steps: X, Y, Z. Implicit finite-difference scheme. [Equations garbled in the transcript: the fragments show values at grid points (i, j, k) on time levels that advance in thirds of a step, and a term q.]
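
Since the scheme itself did not survive, the following is a generic sketch of what a three-step implicit splitting for heat conduction can look like. It is the textbook locally one-dimensional form for the model problem $\partial T/\partial t = \Delta T$ on a uniform grid, and it is not necessarily the exact scheme shown on the slide; the symbols $\tau$ (time step), $h$ (grid spacing) and $r$ are assumptions introduced here.

```latex
% Assumed model problem: \partial T / \partial t = \Delta T, time step \tau, grid spacing h.
\begin{aligned}
\frac{T^{n+1/3}_{i,j,k} - T^{n}_{i,j,k}}{\tau}
  &= \frac{T^{n+1/3}_{i+1,j,k} - 2\,T^{n+1/3}_{i,j,k} + T^{n+1/3}_{i-1,j,k}}{h^{2}}
  && \text{(X step)} \\
\frac{T^{n+2/3}_{i,j,k} - T^{n+1/3}_{i,j,k}}{\tau}
  &= \frac{T^{n+2/3}_{i,j+1,k} - 2\,T^{n+2/3}_{i,j,k} + T^{n+2/3}_{i,j-1,k}}{h^{2}}
  && \text{(Y step)} \\
\frac{T^{n+1}_{i,j,k} - T^{n+2/3}_{i,j,k}}{\tau}
  &= \frac{T^{n+1}_{i,j,k+1} - 2\,T^{n+1}_{i,j,k} + T^{n+1}_{i,j,k-1}}{h^{2}}
  && \text{(Z step)}
\end{aligned}

% With r = \tau / h^2, the X step along one grid line (fixed j, k) reads
-r\,T^{n+1/3}_{i-1} + (1 + 2r)\,T^{n+1/3}_{i} - r\,T^{n+1/3}_{i+1} = T^{n}_{i}
```

Each step couples only neighbours along one direction, so every grid line yields its own tridiagonal system with diagonal $1 + 2r$ and off-diagonals $-r$.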

11 ADI: Navier-Stokes. Equation for the X velocity; iterations are needed for non-linear PDEs. [Split equations for the X, Y and Z steps garbled in the transcript.]

12 ADI: Time Step. Sequence of time steps: (n-1), (n), (n+1). Within one time step: splitting by X, splitting by Y, splitting by Z, then updating non-linear parameters; these stages are repeated as global iterations.

13 ADI Fractional Time Step: Linear PDEs. Previous layer (time layer N): u (x-velocity), v (y-velocity), w (z-velocity), T (temperature). A sweep produces the next layer (time layer N+1). Solves many tridiagonal systems independently.

14 ADI Fractional Time Step: Non-Linear PDEs. Previous layer (time layer N): u (x-velocity), v (y-velocity), w (z-velocity), T (temperature). The sweep is repeated in local iterations that update an intermediate N + ½ time layer before producing the next layer (time layer N+1). Solves many tridiagonal systems independently.

15 Main Stages of the Algorithm. Solve a lot of independent tridiagonal systems: computationally intensive, easy to parallelize. Subtasks: evaluate the dissipation term; update non-linear parameters.

16 Tridiagonal Solvers Overview. Simplified Gauss elimination: also known as the Thomas algorithm or Sweep; the fastest serial approach. Cyclic Reduction methods: attend Yao Zhang's talk "Fast Tridiagonal Solvers" afterwards!

17 Sweep algorithm. Memory requirements: one additional array of size N. Forward elimination step. Backward substitution step. Complexity: O(N).
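
The forward and backward passes listed on this slide can be sketched in C++ as follows. This is a minimal serial illustration rather than the talk's code: the name `sweep_solve` and the three-array coefficient layout are assumptions, and no pivoting is done, so the sketch presumes a diagonally dominant system.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i], i = 0..n-1.
// lower[0] and upper[n-1] are ignored.
std::vector<double> sweep_solve(const std::vector<double>& lower,
                                const std::vector<double>& diag,
                                const std::vector<double>& upper,
                                const std::vector<double>& rhs) {
    const std::size_t n = diag.size();
    assert(n > 0 && lower.size() == n && upper.size() == n && rhs.size() == n);

    std::vector<double> alpha(n);  // the one additional array of size N
    std::vector<double> x(n);      // modified right-hand side, then the solution

    // Forward elimination: remove the sub-diagonal, one row at a time.
    alpha[0] = upper[0] / diag[0];
    x[0] = rhs[0] / diag[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double t = 1.0 / (diag[i] - lower[i] * alpha[i - 1]);
        alpha[i] = upper[i] * t;
        x[i] = (rhs[i] - lower[i] * x[i - 1]) * t;
    }

    // Backward substitution: x[n-1] is already final.
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] -= alpha[i] * x[i + 1];
    }
    return x;
}
```

Both loops touch each unknown once, which gives the O(N) cost, and each iteration depends on the previous one, which is why a single system is solved serially.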

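18 GPU Implementation. All data arrays are stored on the GPU: several 3D time layers, with a total on the GB scale for a 192×192×192 grid in double precision (the exact figure is missing from the transcript). Main kernels: Sweep; dissipative function evaluation; non-linear update.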

19 Sweep on the GPU. One thread solves one system; $N^2$ systems on each fractional step (splitting by X, by Y, by Z). Each thread operates on a 1D slice in the corresponding direction.
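
The thread-to-system mapping can be sketched on the host side as below. The flattened layout `index = x + n * (y + n * z)` and the names `Dir`, `Line` and `line_for_thread` are assumptions made for illustration; the talk does not show its indexing code.

```cpp
#include <cassert>
#include <cstddef>

enum class Dir { X, Y, Z };

struct Line {
    std::size_t base;    // index of the first element of the 1D slice
    std::size_t stride;  // distance between consecutive elements of the slice
};

// Assumed layout: index = x + n * (y + n * z), with x varying fastest.
// `thread` numbers the n*n independent systems of one fractional step.
Line line_for_thread(Dir dir, std::size_t thread, std::size_t n) {
    assert(thread < n * n);
    const std::size_t a = thread % n;  // fast-varying coordinate of the thread
    const std::size_t b = thread / n;  // slow-varying coordinate of the thread
    switch (dir) {
    case Dir::X: return { n * (a + n * b), 1 };  // thread fixes (y = a, z = b)
    case Dir::Y: return { a + n * n * b, n };    // thread fixes (x = a, z = b)
    case Dir::Z: return { a + n * b, n * n };    // thread fixes (x = a, y = b)
    }
    return { 0, 0 };  // not reached for valid Dir values
}
```

Under this layout, neighbouring threads in the Y and Z splittings start at adjacent addresses, while in the X splitting their starting addresses are n elements apart.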

20 Sweep performance. [Chart: time steps/sec on NVIDIA Tesla C1060, float vs. double, for Sweep X, Sweep Y and Sweep Z.] X splitting is much slower than the Y/Z ones.

21 Sweep: going into details. Memory bound: need to optimize access to memory. Sweep X: uncoalesced. Sweep Y: coalesced. Sweep Z: coalesced.

22 Sweep optimization. Solution for X-splitting: reorder the data arrays and run Y-splitting; this needs a few additional 3D matrix transposes. [Chart: time steps/sec, original vs. optimized, float and double; the printed values are garbled in the transcript.]
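
One way to picture the reordering is a transpose that exchanges the x and y axes, so that lines along x in the original array become lines along y in the copy. The sketch below is a plain CPU version under the assumed layout `index = x + n * (y + n * z)`; the name `swap_xy` is hypothetical, and the talk's GPU transpose kernel is not shown in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a copy of `src` with the x and y axes exchanged.
// Assumed layout: index = x + n * (y + n * z).
std::vector<double> swap_xy(const std::vector<double>& src, std::size_t n) {
    assert(src.size() == n * n * n);
    std::vector<double> dst(src.size());
    for (std::size_t z = 0; z < n; ++z) {
        for (std::size_t y = 0; y < n; ++y) {
            for (std::size_t x = 0; x < n; ++x) {
                // Element (x, y, z) of src lands at (y, x, z) of dst.
                dst[y + n * (x + n * z)] = src[x + n * (y + n * z)];
            }
        }
    }
    return dst;
}
```

Applying `swap_xy` twice returns the original array, so a solve along x can be expressed as transpose, Y-splitting sweep, transpose back.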

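23 Code analysis. The GPU version is based on the CPU code:

    // boundary conditions
    switch (dir) {
    case X: case X_as_Y: bc_x0( ); break;
    case Y: bc_y0( ); break;
    case Z: bc_z0( ); break;
    }
    // Note: the digits in c1/c2, a[1] and k = 1 are missing in the transcript;
    // they are restored here from the sweep recurrence and are an inference.
    a[1] = - c1 / c2;
    u_next[base_id] = f_i / c2;

    // forward trace of sweep
    int id = base_id;
    int id_prev;
    for (int k = 1; k < n; k++) {
        id_prev = id;
        id += p.stride;                  // step to the next element of this 1D slice
        double c = v_temp[id];
        c1 = p.m_c3 * c - p.h;           // upper-diagonal coefficient
        c2 = p.m_c;                      // main-diagonal coefficient
        c3 = - p.m_c3 * c - p.h;         // lower-diagonal coefficient
        double q = (c3 * a[k] + c2);
        double t = 1 / q;
        a[k+1] = - c1 * t;
        u_next[id] = (f[id] - c3 * u_next[id_prev]) * t;
    }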

24 Performance Comparison. Test data: grid size of 128/192; 8 non-linear iterations (2 inner × 4 outer). Hardware: NVIDIA Tesla C1060; Intel Core 2 Quad (4 threads); Intel Core i7 Nehalem (8 threads).

25 Performance, 128 grid, float. [Chart: time steps/sec for Dissipation, Sweep, NonLinear and Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93 GHz (4 cores) and Intel Core 2 Quad 2.4 GHz (4 cores).]

26 Performance, 128 grid, double. [Chart: same stages and hardware as slide 25.]

27 Performance, 192 grid, float. [Chart: same stages and hardware as slide 25.]

28 Performance, 192 grid, double. [Chart: same stages and hardware as slide 25.]

29 GPU performance: SP/DP. [Chart: time steps/sec, float vs. double, for dissipation, sweep, nonlinear and total.] In double precision the GPU is only slower than in single precision by a factor that is missing from the transcript.

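30 Visual results. Boundary conditions (no-slip and free). Constant flow at start: u constant, v = w = 0. No-slip on sides: u = v = w = 0. Free at far end: [condition on u, v, w garbled in the transcript].

31 Visual Results. [Figure: X-slice of u, v, w and T. Parameters as printed, possibly with digits missing: T = 0,9; t = 6; Re = 000.]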

32 Future Work. Effective multi-GPU usage; distributed memory systems; performance improvements; high resolution grids, high Reynolds numbers.

33 Conclusion. High performance and efficiency of GPUs in complex 3D fluid simulation. CUDA is an easy-to-use tool for GPU compute programming. GPUs enable new possibilities for research.

34 Questions? Thank you! Keywords: ADI, Tridiagonal Solvers, DNS, Turbulence

35 References. Paskonov V.M., Berezin S.B., Korukhova E.S. (2007), "A dynamic visualization system for multiprocessor computers with common memory and its application for numerical modeling of the turbulent flows of viscous fluids", Moscow University Computational Mathematics and Cybernetics. ADI method: Douglas Jr., Jim (1962), "Alternating direction methods for three space variables", Numerische Mathematik 4: 41-63.

36 Dissipative Function. [Formula for the dissipative function in terms of the spatial derivatives of u, v and w; garbled in the transcript.]
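
For orientation, the textbook viscous dissipation function for an incompressible flow with velocity $(u, v, w)$ is given below. This is the standard form with the viscosity factored out, offered as an assumption; the slide may have shown an expanded or differently scaled expression.

```latex
% Standard incompressible form; \Phi is the symbol assumed here for the dissipative function.
\Phi = 2\left[
         \left(\frac{\partial u}{\partial x}\right)^{2}
       + \left(\frac{\partial v}{\partial y}\right)^{2}
       + \left(\frac{\partial w}{\partial z}\right)^{2}
       \right]
     + \left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right)^{2}
     + \left(\frac{\partial w}{\partial y} + \frac{\partial v}{\partial z}\right)^{2}
     + \left(\frac{\partial u}{\partial z} + \frac{\partial w}{\partial x}\right)^{2}
```

Every term is a square, so $\Phi \ge 0$: dissipation can only convert kinetic energy into heat. Evaluating it needs only local finite differences of the velocity field, which is consistent with it being a separate, easily parallel kernel.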

Using GPU to Compute Options and Derivatives

Using GPU to Compute Options and Derivatives Introdction Algorithmic Trading has created an increasing demand for high performance compting soltions within financial organizations. The actors of portfolio management and ris assessment have the obligation

More information

Effect of Angular Velocity of Inner Cylinder on Laminar Flow through Eccentric Annular Cross Section Pipe

Effect of Angular Velocity of Inner Cylinder on Laminar Flow through Eccentric Annular Cross Section Pipe Asian Transactions on Engineering (ATE ISSN: -467) Volme 3 Isse Effect of Anglar Velocity of Inner Cylinder on Laminar Flow throgh Eccentric Annlar Cross Section Pipe Ressan Faris Hamd * Department of

More information

Effect of flow field on open channel flow properties using numerical investigation and experimental comparison

Effect of flow field on open channel flow properties using numerical investigation and experimental comparison INTERNATIONAL JOURNAL OF ENERGY AND ENVIRONMENT Volme 3, Isse 4, 2012 pp.617-628 Jornal homepage: www.ijee.ieefondation.org Effect of flow field on open channel flow properties sing nmerical investigation

More information

CFD Platform for Turbo-machinery Simulation

CFD Platform for Turbo-machinery Simulation CFD Platform for Trbo-machinery Simlation Lakhdar Remaki BCAM- Basqe Centre for Applied Mathematics Otline p BCAM-BALTOGAR project p CFD platform design strategy p Some new developments p Some reslts p

More information

Modeling Roughness Effects in Open Channel Flows D.T. Souders and C.W. Hirt Flow Science, Inc.

Modeling Roughness Effects in Open Channel Flows D.T. Souders and C.W. Hirt Flow Science, Inc. FSI-2-TN6 Modeling Roghness Effects in Open Channel Flows D.T. Soders and C.W. Hirt Flow Science, Inc. Overview Flows along rivers, throgh pipes and irrigation channels enconter resistance that is proportional

More information

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

8. Forced Convection Heat Transfer

8. Forced Convection Heat Transfer 8. Forced Convection Heat Transfer 8.1 Introdction The general definition for convection ma be smmarized to this definition "energ transfer between the srface and flid de to temperatre difference" and

More information

Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1

Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1 Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

Pricing of cross-currency interest rate derivatives on Graphics Processing Units

Pricing of cross-currency interest rate derivatives on Graphics Processing Units Pricing of cross-currency interest rate derivatives on Graphics Processing Units Duy Minh Dang Department of Computer Science University of Toronto Toronto, Canada [email protected] Joint work with

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

Accelerating CFD using OpenFOAM with GPUs

Accelerating CFD using OpenFOAM with GPUs Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide

More information

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy Perspectives of GPU Computing in Physics

More information

Parallel Computing with MATLAB

Parallel Computing with MATLAB Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best

More information

Acceleration of a CFD Code with a GPU

Acceleration of a CFD Code with a GPU Acceleration of a CFD Code with a GPU Dennis C. Jespersen ABSTRACT The Computational Fluid Dynamics code Overflow includes as one of its solver options an algorithm which is a fairly small piece of code

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

Chapter 10 LOW PRANDTL NUMBER THERMAL-HYDRAULICS*

Chapter 10 LOW PRANDTL NUMBER THERMAL-HYDRAULICS* Chapter LOW PRANDTL NUMBER THERMAL-HYDRAULICS*. Introdction This chapter is an introdction into the field of momentm and heat transfer in lo Prandtl nmber flids. In order to read this chapter a basic knoledge

More information

DIFFERENTIAL FORMULATION OF THE BASIC LAWS

DIFFERENTIAL FORMULATION OF THE BASIC LAWS CHAPER DIFFERENIAL FORMULAION OF HE BASIC LAWS. Introdction Differential fmlation of basic las: Conseration of mass Conseration of momentm Conseration of energ. Flo Generation (i) Fced conection. Motion

More information

CUDA programming on NVIDIA GPUs

CUDA programming on NVIDIA GPUs p. 1/21 on NVIDIA GPUs Mike Giles [email protected] Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

More information

High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem

High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem Kamil Rocki, PhD Department of Computer Science Graduate School of Information Science and Technology The University of

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles [email protected] Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

Optimizing Application Performance with CUDA Profiling Tools

Optimizing Application Performance with CUDA Profiling Tools Optimizing Application Performance with CUDA Profiling Tools Why Profile? Application Code GPU Compute-Intensive Functions Rest of Sequential CPU Code CPU 100 s of cores 10,000 s of threads Great memory

More information

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

More information

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 [email protected] THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833

More information

GPU Acceleration of the SENSEI CFD Code Suite

GPU Acceleration of the SENSEI CFD Code Suite GPU Acceleration of the SENSEI CFD Code Suite Chris Roy, Brent Pickering, Chip Jackson, Joe Derlaga, Xiao Xu Aerospace and Ocean Engineering Primary Collaborators: Tom Scogland, Wu Feng (Computer Science)

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

A Fast Double Precision CFD Code using CUDA

A Fast Double Precision CFD Code using CUDA A Fast Double Precision CFD Code using CUDA Jonathan M. Cohen *, M. Jeroen Molemaker** *NVIDIA Corporation, Santa Clara, CA 95050, USA (e-mail: [email protected]) **IGPP UCLA, Los Angeles, CA 90095, USA

More information

Monte-Carlo Option Pricing. Victor Podlozhnyuk [email protected]

Monte-Carlo Option Pricing. Victor Podlozhnyuk vpodlozhnyuk@nvidia.com Monte-Carlo Option Pricing Victor Podlozhnyuk [email protected] Document Change History Version Date Responsible Reason for Change 1. 3//7 vpodlozhnyuk Initial release Abstract The pricing of options

More information

Spectrum Balancing for DSL with Restrictions on Maximum Transmit PSD

Spectrum Balancing for DSL with Restrictions on Maximum Transmit PSD Spectrm Balancing for DSL with Restrictions on Maximm Transmit PSD Driton Statovci, Tomas Nordström, and Rickard Nilsson Telecommnications Research Center Vienna (ftw.), Dona-City-Straße 1, A-1220 Vienna,

More information

GPU Hardware Performance. Fall 2015

GPU Hardware Performance. Fall 2015 Fall 2015 Atomic operations performs read-modify-write operations on shared or global memory no interference with other threads for 32-bit and 64-bit integers (c. c. 1.2), float addition (c. c. 2.0) using

More information

Planning a Managed Environment

Planning a Managed Environment C H A P T E R 1 Planning a Managed Environment Many organizations are moving towards a highly managed compting environment based on a configration management infrastrctre that is designed to redce the

More information

CIVE2400 Fluid Mechanics. Section 1: Fluid Flow in Pipes

CIVE2400 Fluid Mechanics. Section 1: Fluid Flow in Pipes CIVE00 Flid Mechanics Section : Flid Flow in Pipes CIVE00 FLUID MECHNICS... SECTION : FLUID FLOW IN PIPES.... FLUID FLOW IN PIPES.... Pressre loss de to riction in a pipeline..... Pressre loss dring laminar

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

Heavy Parallelization of Alternating Direction Schemes in Multi-Factor Option Valuation Models. Cris Doloc, Ph.D.

Heavy Parallelization of Alternating Direction Schemes in Multi-Factor Option Valuation Models. Cris Doloc, Ph.D. Heavy Parallelization of Alternating Direction Schemes in Multi-Factor Option Valuation Models Cris Doloc, Ph.D. WHO INTRO Ex-physicist Ph.D. in Computational Physics - Applied TN Plasma (10 yrs) Working

More information

Resource Pricing and Provisioning Strategies in Cloud Systems: A Stackelberg Game Approach

Resource Pricing and Provisioning Strategies in Cloud Systems: A Stackelberg Game Approach Resorce Pricing and Provisioning Strategies in Clod Systems: A Stackelberg Game Approach Valeria Cardellini, Valerio di Valerio and Francesco Lo Presti Talk Otline Backgrond and Motivation Provisioning

More information

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

More information

GPGPU accelerated Computational Fluid Dynamics

GPGPU accelerated Computational Fluid Dynamics t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute

More information

Real-time Visual Tracker by Stream Processing

Real-time Visual Tracker by Stream Processing Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol

More information

TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings

TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings TrstSVD: Collaborative Filtering with Both the Explicit and Implicit Inflence of User Trst and of Item Ratings Gibing Go Jie Zhang Neil Yorke-Smith School of Compter Engineering Nanyang Technological University

More information

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy

More information

Steady Flow: Laminar and Turbulent in an S-Bend

Steady Flow: Laminar and Turbulent in an S-Bend STAR-CCM+ User Guide 6663 Steady Flow: Laminar and Turbulent in an S-Bend This tutorial demonstrates the flow of an incompressible gas through an s-bend of constant diameter (2 cm), for both laminar and

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Modern GPU

More information

10.4 Solving Equations in Quadratic Form, Equations Reducible to Quadratics

10.4 Solving Equations in Quadratic Form, Equations Reducible to Quadratics . Solving Eqations in Qadratic Form, Eqations Redcible to Qadratics Now that we can solve all qadratic eqations we want to solve eqations that are not eactly qadratic bt can either be made to look qadratic

More information

Regular Specifications of Resource Requirements for Embedded Control Software

Regular Specifications of Resource Requirements for Embedded Control Software Reglar Specifications of Resorce Reqirements for Embedded Control Software Rajeev Alr and Gera Weiss University of Pennsylvania Abstract For embedded control systems a schedle for the allocation of resorces

More information

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu [email protected] High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University

More information

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA Part 1: Hardware design and programming model Amin Safi Faculty of Mathematics, TU dortmund January 22, 2016 Table of Contents Set

More information

Icepak High-Performance Computing at Rockwell Automation: Benefits and Benchmarks

Icepak High-Performance Computing at Rockwell Automation: Benefits and Benchmarks Icepak High-Performance Computing at Rockwell Automation: Benefits and Benchmarks Garron K. Morris Senior Project Thermal Engineer [email protected] Standard Drives Division Bruce W. Weiss Principal

More information

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA CUDA Optimization with NVIDIA Tools Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nvidia Tools 2 What Does the Application

More information

CRM Customer Relationship Management. Customer Relationship Management

CRM Customer Relationship Management. Customer Relationship Management CRM Cstomer Relationship Management Kenneth W. Thorson Tax Commissioner Virginia Department of Taxation Discssion Areas TAX/AMS Partnership Project Backgrond Cstomer Relationship Management Secre Messaging

More information

Research on Pricing Policy of E-business Supply Chain Based on Bertrand and Stackelberg Game

Research on Pricing Policy of E-business Supply Chain Based on Bertrand and Stackelberg Game International Jornal of Grid and Distribted Compting Vol. 9, No. 5 (06), pp.-0 http://dx.doi.org/0.457/ijgdc.06.9.5.8 Research on Pricing Policy of E-bsiness Spply Chain Based on Bertrand and Stackelberg

More information

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS Julien Demouth, NVIDIA STAC-A2 BENCHMARK STAC-A2 Benchmark Developed by banks Macro and micro, performance and accuracy Pricing and Greeks for American

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

4.What is the appropriate dimensionless parameter to use in comparing flow types? YOUR ANSWER: The Reynolds Number, Re.

4.What is the appropriate dimensionless parameter to use in comparing flow types? YOUR ANSWER: The Reynolds Number, Re. CHAPTER 08 1. What is most likely to be the main driving force in pipe flow? A. Gravity B. A pressure gradient C. Vacuum 2.What is a general description of the flow rate in laminar flow? A. Small B. Large

More information

THE CFD SIMULATION OF THE FLOW AROUND THE AIRCRAFT USING OPENFOAM AND ANSA

THE CFD SIMULATION OF THE FLOW AROUND THE AIRCRAFT USING OPENFOAM AND ANSA THE CFD SIMULATION OF THE FLOW AROUND THE AIRCRAFT USING OPENFOAM AND ANSA Adam Kosík Evektor s.r.o., Czech Republic KEYWORDS CFD simulation, mesh generation, OpenFOAM, ANSA ABSTRACT In this paper we describe

More information

ANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING

ANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING ANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING Sonam Mahajan 1 and Maninder Singh 2 1 Department of Computer Science Engineering, Thapar University, Patiala, India 2 Department of Computer Science Engineering,

More information

Effect of Aspect Ratio on Laminar Natural Convection in Partially Heated Enclosure

Effect of Aspect Ratio on Laminar Natural Convection in Partially Heated Enclosure Universal Journal of Mechanical Engineering (1): 8-33, 014 DOI: 10.13189/ujme.014.00104 http://www.hrpub.org Effect of Aspect Ratio on Laminar Natural Convection in Partially Heated Enclosure Alireza Falahat

More information

On the urbanization of poverty

On the urbanization of poverty On the rbanization of poverty Martin Ravallion 1 Development Research Grop, World Bank 1818 H Street NW, Washington DC, USA Febrary 001; revised Jly 001 Abstract: Conditions are identified nder which the

More information

Poisson Equation Solver Parallelisation for Particle-in-Cell Model

Poisson Equation Solver Parallelisation for Particle-in-Cell Model WDS'14 Proceedings of Contributed Papers Physics, 233 237, 214. ISBN 978-8-7378-276-4 MATFYZPRESS Poisson Equation Solver Parallelisation for Particle-in-Cell Model A. Podolník, 1,2 M. Komm, 1 R. Dejarnac,

More information

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Experiences on using GPU accelerators for data analysis in ROOT/RooFit Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,

More information

Graphic Processing Units: a possible answer to High Performance Computing?

Graphic Processing Units: a possible answer to High Performance Computing? 4th ABINIT Developer Workshop RESIDENCE L ESCANDILLE AUTRANS HPC & Graphic Processing Units: a possible answer to High Performance Computing? Luigi Genovese ESRF - Grenoble 26 March 2009 http://inac.cea.fr/l_sim/

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Sample Pages. Edgar Dietrich, Alfred Schulze. Measurement Process Qualification

Sample Pages. Edgar Dietrich, Alfred Schulze. Measurement Process Qualification Sample Pages Edgar Dietrich, Alfred Schlze Measrement Process Qalification Gage Acceptance and Measrement Uncertainty According to Crrent Standards ISBN: 978-3-446-4407-4 For frther information and order

More information

Evolutionary Path Planning for Robot Assisted Part Handling in Sheet Metal Bending

Evolutionary Path Planning for Robot Assisted Part Handling in Sheet Metal Bending Evoltionary Path Planning for Robot Assisted Part Handling in Sheet Metal Bending Abstract Xiaoyn Liao G. Gary Wang * Dept. of Mechanical & Indstrial Engineering, The University of Manitoba Winnipeg, MB,

More information

High Performance Matrix Inversion with Several GPUs

High Performance Matrix Inversion with Several GPUs High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República

More information

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE Tuyou Peng 1, Jun Peng 2 1 Electronics and information Technology Department Jiangmen Polytechnic, Jiangmen, Guangdong, China, [email protected]

More information

WHITE PAPER. Filter Bandwidth Definition of the WaveShaper S-series Programmable Optical Processor

WHITE PAPER. Filter Bandwidth Definition of the WaveShaper S-series Programmable Optical Processor WHITE PAPER Filter andwidth Definition of the WaveShaper S-series 1 Introdction The WaveShaper family of s allow creation of ser-cstomized filter profiles over the C- or L- band, providing a flexible tool

Optimal Trust Network Analysis with Subjective Logic

The Second International Conference on Emerging Security Information, Systems and Technologies Optimal Trust Network Analysis with Subjective Logic Audun Jøsang UNIK Graduate Center, University of Oslo Norway

CRM Customer Relationship Management. Customer Relationship Management

CRM Customer Relationship Management Farley Beaton Virginia Department of Taxation Discussion Areas TAX/AMS Partnership Project Background Customer Relationship Management Secure Messaging Lessons Learned 2

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Pervasive Parallelism Laboratory Stanford University Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun Graph and its Applications Graph Fundamental

HPC with Multicore and GPUs

HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 Outline Introduction - Hardware

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

CUDA for Real Time Multigrid Finite Element Simulation of

CUDA for Real Time Multigrid Finite Element Simulation of Soft Tissue Deformations Christian Dick Computer Graphics and Visualization Group Technische Universität München, Germany Motivation Real time physics

Chapter 3. 2. Consider an economy described by the following equations: Y = 5,000 G = 1,000

Chapter C Level Questions. Imagine that the production of fishing lures is governed by the production function: y.7 where y represents the number of lures created per hour and represents the number of workers employed

Black-Scholes option pricing. Victor Podlozhnyuk [email protected]

Black-Scholes option pricing Victor Podlozhnyuk [email protected] June 2007 Document Change History Version Date Responsible Reason for Change 0.9 2007/03/19 vpodlozhnyuk Initial release 1.0 2007/04/06

Case Study on Productivity and Performance of GPGPUs

Case Study on Productivity and Performance of GPGPUs Sandra Wienke [email protected] ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia

Parallel 3D Image Segmentation of Large Data Sets on a GPU Cluster

Parallel 3D Image Segmentation of Large Data Sets on a GPU Cluster Aaron Hagan and Ye Zhao Kent State University Abstract. In this paper, we propose an inherent parallel scheme for 3D image segmentation

(Toward) Radiative transfer on AMR with GPUs. Dominique Aubert Université de Strasbourg Austin, TX, 14.12.12

(Toward) Radiative transfer on AMR with GPUs Dominique Aubert Université de Strasbourg Austin, TX, 14.12.12 A few words about GPUs Cache and control replaced by calculation units Large number of Multiprocessors

Basic Equations, Boundary Conditions and Dimensionless Parameters

Chapter 2 Basic Equations, Boundary Conditions and Dimensionless Parameters In the foregoing chapter, many basic concepts related to the present investigation and the associated literature survey were

Model of a flow in intersecting microchannels. Denis Semyonov

Model of a flow in intersecting microchannels Denis Semyonov LUT 2012 Content Objectives Motivation Model implementation Simulation Results Conclusion Objectives A flow and a reaction model is required

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

~ Greetings from WSU CAPPLab ~

~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model

5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model C99, C++, F2003 Compilers Optimizing Vectorizing Parallelizing Graphical parallel tools PGDBG debugger PGPROF profiler Intel, AMD, NVIDIA

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent Department of Aeronautics Imperial College London 25th March 2014 Overview Motivation Flux Reconstruction Many-Core

OpenACC 2.0 and the PGI Accelerator Compilers

OpenACC 2.0 and the PGI Accelerator Compilers Michael Wolfe The Portland Group [email protected] This presentation discusses the additions made to the OpenACC API in Version 2.0. I will also present

TWO-DIMENSIONAL FINITE ELEMENT ANALYSIS OF FORCED CONVECTION FLOW AND HEAT TRANSFER IN A LAMINAR CHANNEL FLOW

TWO-DIMENSIONAL FINITE ELEMENT ANALYSIS OF FORCED CONVECTION FLOW AND HEAT TRANSFER IN A LAMINAR CHANNEL FLOW Rajesh Khatri 1, 1 M.Tech Scholar, Department of Mechanical Engineering, S.A.T.I., Vidisha

Several tips on how to choose a suitable computer

Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec

A priori error analysis of stabilized mixed finite element method for reaction-diffusion optimal control problems

Fu et al. Boundary Value Problems (2016) 2016:23 DOI 10.1186/s13661-016-0531-9 RESEARCH Open Access A priori error analysis of stabilized mixed finite element method for reaction-diffusion optimal control

Iterative Solvers for Linear Systems

9th SimLab Course on Parallel Numerical Simulation, 4.10-8.10.2010 Iterative Solvers for Linear Systems Bernhard Gatzhammer Chair of Scientific Computing in Computer Science Technische Universität München

GPGPU Parallel Merge Sort Algorithm

GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today's Graphics Processing Units (GPUs) has led

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1. ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
