Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA
1 Tridiagonal Solvers on the GPU and Applications to Fluid Simulation. Nikolai Sakharnykh, NVIDIA
2 Agenda: Introduction and Problem Statement; Governing Equations; ADI Numerical Method; GPU Implementation and Optimizations; Results and Future Work
3 Introduction. Turbulence simulation: Direct Numerical Simulation (all scales of turbulence, expensive); Large-Eddy Simulation; Reynolds-Averaged Navier-Stokes. Research at the Computer Science department of Moscow State University: Paskonov V.M., Berezin S.B.
4 Problem Statement. Viscid incompressible fluid in 3D domain; initial and boundary conditions; Euler coordinates: velocity and temperature
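5 Definitions. Density $\rho = \text{const}$; Velocity $\mathbf{u} = (u, v, w)$; Temperature $T$; Pressure $p$; State equation $p = \rho R T = RT$, where $R$ is the gas constant for air
6 Governing Equations. Continuity equation: $\operatorname{div} \mathbf{u} = 0$. Navier-Stokes equations (dimensionless form): $\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla T + \frac{1}{Re} \Delta \mathbf{u}$, where $Re$ is the Reynolds number
7 Reynolds number. Similarity parameter: the ratio of inertia forces to viscosity forces. 3D channel: $Re = \frac{\rho V' L'}{\mu'}$, where $V'$ is the mean velocity, $L'$ the length of pipe, and $\mu'$ the dynamic viscosity. High Re: turbulent flow; low Re: laminar flow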
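8 Governing Equations. Energy equation (dimensionless form): $\frac{\partial T}{\partial t} + (\mathbf{u} \cdot \nabla) T = \frac{\gamma}{Pr\,Re} \Delta T + \frac{\gamma - 1}{Re} \Phi$, where $Pr$ is the Prandtl number, $\gamma$ the heat capacity ratio, and $\Phi$ the dissipative function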
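9 Numerical Method. Alternating Direction Implicit (ADI): $\frac{\partial q}{\partial t} = \frac{\partial^2 q}{\partial x^2} + \frac{\partial^2 q}{\partial y^2} + \frac{\partial^2 q}{\partial z^2}$ is split into three one-dimensional problems. X: $\frac{\partial q}{\partial t} = \frac{\partial^2 q}{\partial x^2}$; Y: $\frac{\partial q}{\partial t} = \frac{\partial^2 q}{\partial y^2}$; Z: $\frac{\partial q}{\partial t} = \frac{\partial^2 q}{\partial z^2}$
10 ADI: Heat Conduction. 3 fractional steps: X, Y, Z. Implicit finite-difference scheme, X step: $\frac{q^{n+1/3}_{i,j,k} - q^{n}_{i,j,k}}{\Delta t} = \frac{q^{n+1/3}_{i+1,j,k} - 2 q^{n+1/3}_{i,j,k} + q^{n+1/3}_{i-1,j,k}}{\Delta x^2}$; the Y and Z steps have the same form in their own direction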
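Rearranging the implicit X step shows why each fractional step reduces to tridiagonal systems. The derivation below is a sketch that assumes unit diffusivity and uniform grid spacing $\Delta x$; the shorthand $r$ is introduced here for illustration and is not taken from the slides.

```latex
\frac{q^{n+1/3}_{i,j,k} - q^{n}_{i,j,k}}{\Delta t}
  = \frac{q^{n+1/3}_{i+1,j,k} - 2\,q^{n+1/3}_{i,j,k} + q^{n+1/3}_{i-1,j,k}}{\Delta x^{2}}
\;\Longrightarrow\;
-r\,q^{n+1/3}_{i-1,j,k} + (1 + 2r)\,q^{n+1/3}_{i,j,k} - r\,q^{n+1/3}_{i+1,j,k} = q^{n}_{i,j,k},
\qquad r = \frac{\Delta t}{\Delta x^{2}}
```

For every fixed pair $(j, k)$ this is one tridiagonal system in $i$, and systems for different $(j, k)$ do not couple, which is what lets them be solved independently.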
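11 ADI: Navier-Stokes. Equation for X velocity: $\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} + v \frac{\partial u}{\partial y} + w \frac{\partial u}{\partial z} = -\frac{\partial T}{\partial x} + \frac{1}{Re} \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right)$. Splitting: X: $\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = -\frac{\partial T}{\partial x} + \frac{1}{Re} \frac{\partial^2 u}{\partial x^2}$; Y: $\frac{\partial u}{\partial t} + v \frac{\partial u}{\partial y} = \frac{1}{Re} \frac{\partial^2 u}{\partial y^2}$; Z: $\frac{\partial u}{\partial t} + w \frac{\partial u}{\partial z} = \frac{1}{Re} \frac{\partial^2 u}{\partial z^2}$. Need iterations for non-linear PDEs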
12 ADI: Time Step. Time steps (n-1), (n), (n+1). Each time step: Splitting by X; Splitting by Y; Splitting by Z; Updating non-linear parameters; Global iterations
13 ADI: Fractional Time Step, Linear PDEs. Previous layer (N time layer): u: x-velocity, v: y-velocity, w: z-velocity, T: temperature. Sweep. Next layer (N + 1 time layer). Solves many tridiagonal systems independently
14 ADI: Fractional Time Step, Non-Linear PDEs. Previous layer (N time layer): u: x-velocity, v: y-velocity, w: z-velocity, T: temperature. N + ½ time layer: Update, Local iterations. Sweep. Next layer (N + 1 time layer). Solves many tridiagonal systems independently
15 Main Stages of the Algorithm. Solve a lot of independent tridiagonal systems: computationally intensive, easy to parallelize. Subtasks: Evaluate dissipation term; Update non-linear parameters
16 Tridiagonal Solvers Overview. Simplified Gauss elimination: also known as Thomas algorithm or Sweep; the fastest serial approach. Cyclic Reduction methods: attend Yao Zhang's talk "Fast Tridiagonal Solvers" afterwards!
17 Sweep algorithm. Memory requirements: one additional array of size N. Forward elimination step; Backward substitution step. Complexity: O(N)
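The two steps named on this slide can be sketched in plain C as follows. This is a minimal serial illustration, not the author's code: the function name `sweep_solve` and the array names `a`, `b`, `c`, `d`, `alpha` are assumptions made for the example, and no pivoting is performed, so it is only suitable for well-conditioned (for instance diagonally dominant) systems.

```c
#include <stddef.h>

/* Solves a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.
 * a[0] is never read; c[n-1] is read but does not affect the result.
 * alpha is the single additional work array of size n.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int sweep_solve(size_t n, const double *a, const double *b, const double *c,
                const double *d, double *alpha, double *x)
{
    if (n == 0) return 0;
    if (b[0] == 0.0) return -1;

    /* Forward elimination: express x[i] as alpha[i]*x[i+1] + x[i],
     * where x[i] temporarily holds the constant part. */
    alpha[0] = -c[0] / b[0];
    x[0] = d[0] / b[0];
    for (size_t i = 1; i < n; i++) {
        double q = a[i] * alpha[i - 1] + b[i];
        if (q == 0.0) return -1;
        double t = 1.0 / q;
        alpha[i] = -c[i] * t;
        x[i] = (d[i] - a[i] * x[i - 1]) * t;
    }

    /* Backward substitution: x[n-1] is already final. */
    for (size_t i = n - 1; i-- > 0; ) {
        x[i] += alpha[i] * x[i + 1];
    }
    return 0;
}
```

Both loops touch each unknown once, which gives the O(N) cost, and `alpha` is the only storage needed beyond the inputs and the solution.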
18 GPU Implementation. All data arrays are stored on GPU. Several 3D time-layers: overall GB for 192x192x192 grid in DP. Main kernels: Sweep; Dissipative function evaluation; Non-linear update
19 Sweep on the GPU. One thread solves one system; N^2 systems on each fractional step (Splitting by X, Splitting by Y, Splitting by Z). Each thread operates with a 1D slice in the corresponding direction
20 Sweep performance. [Chart: time steps/sec for Sweep X, Sweep Y, Sweep Z in float and double on NVIDIA Tesla C1060] X splitting is much slower than Y/Z ones
21 Sweep: going into details. Memory bound: need to optimize access to the memory. Sweep X: uncoalesced; Sweep Y: coalesced; Sweep Z: coalesced
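The difference between the three directions can be seen from how a system number maps to a 1D slice of the 3D array. The sketch below assumes a layout in which x is the fastest-varying index, `idx = x + nx * (y + ny * z)`; the slides do not state the layout, and the names `Dir`, `Slice`, and `slice_for_system` are hypothetical.

```c
#include <stddef.h>

typedef enum { DIR_X, DIR_Y, DIR_Z } Dir;

typedef struct {
    size_t base;    /* index of the first element of the 1D slice */
    size_t stride;  /* distance between consecutive elements of the slice */
    size_t length;  /* number of unknowns in the system */
} Slice;

/* Assumed layout: idx = x + nx * (y + ny * z), x is the fastest index.
 * sys is the system (thread) number within one fractional step. */
Slice slice_for_system(Dir dir, size_t sys, size_t nx, size_t ny, size_t nz)
{
    Slice s = {0, 0, 0};
    switch (dir) {
    case DIR_X: /* sys enumerates (y, z): sys = y + ny * z */
        s.base = nx * sys;
        s.stride = 1;
        s.length = nx;
        break;
    case DIR_Y: /* sys enumerates (x, z): sys = x + nx * z */
        s.base = (sys % nx) + nx * ny * (sys / nx);
        s.stride = nx;
        s.length = ny;
        break;
    case DIR_Z: /* sys enumerates (x, y): sys = x + nx * y */
        s.base = sys;
        s.stride = nx * ny;
        s.length = nz;
        break;
    }
    return s;
}
```

Under this assumption, threads `sys` and `sys + 1` start `nx` elements apart in the X sweep, so at every step of the loop neighbouring threads read addresses that are far apart. In the Y and Z sweeps their bases differ by 1, so neighbouring threads read neighbouring addresses, which is the access pattern the hardware can coalesce.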
22 Sweep optimization. Solution for X-splitting: reorder data arrays and run Y-splitting. Needs a few additional 3D matrix transposes. [Chart: time steps/sec, original vs. optimized, float and double]
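The reordering can be illustrated with a plain CPU reference transpose that swaps the x and y axes. This is only a sketch of the idea: the function name `transpose_xy` and the layout `idx = x + nx * (y + ny * z)` are assumptions, and a GPU version would normally be written as a tiled kernel rather than a triple loop.

```c
#include <stddef.h>

/* Swaps the x and y axes of an nx * ny * nz array.
 * in  layout: in[x + nx * (y + ny * z)]
 * out layout: out[y + ny * (x + nx * z)] */
void transpose_xy(const double *in, double *out,
                  size_t nx, size_t ny, size_t nz)
{
    for (size_t z = 0; z < nz; z++)
        for (size_t y = 0; y < ny; y++)
            for (size_t x = 0; x < nx; x++)
                out[y + ny * (x + nx * z)] = in[x + nx * (y + ny * z)];
}
```

After the transpose, lines that ran along x have stride `ny` in the new array, so the X sweep can reuse the Y-sweep access pattern. Calling the same function with `nx` and `ny` exchanged restores the original ordering.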
23 Code analysis. GPU version is based on the CPU code:
    // boundary conditions
    switch (dir) {
    case X:
    case X_as_Y: bc_x0( ); break;
    case Y: bc_y0( ); break;
    case Z: bc_z0( ); break;
    }
    a[1] = - c1 / c2;
    u_next[base_idx] = f_i / c2;
    // forward trace of sweep
    // c3, c2, c1: lower, main and upper diagonal coefficients of row k
    int idx = base_idx;
    int idx_prev;
    for (int k = 1; k < n; k++) {
        idx_prev = idx;
        idx += p.stride;
        double c = v_temp[idx];
        c1 = p.m_c13 * c - p.h;
        c2 = p.m_c2;
        c3 = - p.m_c13 * c - p.h;
        double q = (c3 * a[k] + c2);
        double t = 1 / q;
        a[k+1] = - c1 * t;
        u_next[idx] = (f[idx] - c3 * u_next[idx_prev]) * t;
    }
24 Performance Comparison. Test data: grid size of 128^3 / 192^3; 8 non-linear iterations (2 inner x 4 outer). Hardware: NVIDIA Tesla C1060; Intel Core 2 Quad (4 threads); Intel Core i7 Nehalem (8 threads)
25 Performance, 128^3 grid, float. [Chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93GHz (4 cores), Intel Core 2 Quad 2.4GHz (4 cores)]
26 Performance, 128^3 grid, double. [Chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93GHz (4 cores), Intel Core 2 Quad 2.4GHz (4 cores)]
27 Performance, 192^3 grid, float. [Chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93GHz (4 cores), Intel Core 2 Quad 2.4GHz (4 cores)]
28 Performance, 192^3 grid, double. [Chart: time steps/sec for Dissipation, Sweep, NonLinear, Total on NVIDIA Tesla C1060, Intel Core i7 Nehalem 2.93GHz (4 cores), Intel Core 2 Quad 2.4GHz (4 cores)]
29 GPU performance: SP/DP. [Chart: time steps/sec in float and double for dissipation, sweep, nonlinear, total] In double precision the GPU is only 2x slower than in single precision
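30 Visual results. Boundary conditions [diagram labels: no-slip, free, $u = 1$]. Constant flow at start: $u = 1$, $v = w = 0$. No-slip on sides: $u = v = w = 0$. Free at far end: $\frac{\partial u}{\partial n} = \frac{\partial v}{\partial n} = \frac{\partial w}{\partial n} = 0$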
31 Visual Results. [Figure: X-slice of u, v, w, T] x = 0,9 t = 6 Re = 000
32 Future Work. Effective multi-GPU usage; Distributed memory systems; Performance improvements; High resolution grids, high Reynolds numbers
33 Conclusion. High performance and efficiency of GPUs in complex 3D fluid simulation. CUDA is an easy-to-use tool for GPU compute programming. GPU enables new possibilities for researching
34 Questions? Thank you! Keywords: ADI, Tridiagonal Solvers, DNS, Turbulence
35 References. Paskonov V.M., Berezin S.B., Korukhova E.S. (2007), "A dynamic visualization system for multiprocessor computers with common memory and its application for numerical modeling of the turbulent flows of viscous fluids", Moscow University Computational Mathematics and Cybernetics. ADI method: Douglas Jr., Jim (1962), "Alternating direction methods for three space variables", Numerische Mathematik 4: 41-63
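36 Dissipative Function: $\Phi = 2 \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial y} \right)^2 + \left( \frac{\partial w}{\partial z} \right)^2 \right] + \left( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right)^2 + \left( \frac{\partial w}{\partial y} + \frac{\partial v}{\partial z} \right)^2 + \left( \frac{\partial u}{\partial z} + \frac{\partial w}{\partial x} \right)^2$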
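A pointwise evaluation of the dissipative function can be sketched as below. It assumes the standard form for an incompressible Newtonian fluid and takes the nine velocity derivatives as input; how those derivatives are discretized on the grid is not shown on the slides and is left out here. The function name `dissipation` and the flat gradient layout are choices made for this example.

```c
/* g holds the velocity gradient in row-major order:
 * g[3*i + j] = d(u_i)/d(x_j), with (u_0, u_1, u_2) = (u, v, w)
 * and (x_0, x_1, x_2) = (x, y, z). */
double dissipation(const double g[9])
{
    /* squares of the normal derivatives du/dx, dv/dy, dw/dz */
    double normal = g[0] * g[0] + g[4] * g[4] + g[8] * g[8];
    /* shear terms */
    double sxy = g[3] + g[1];   /* dv/dx + du/dy */
    double syz = g[7] + g[5];   /* dw/dy + dv/dz */
    double szx = g[2] + g[6];   /* du/dz + dw/dx */
    return 2.0 * normal + sxy * sxy + syz * syz + szx * szx;
}
```

For a pure shear flow with du/dy = 1 and all other derivatives zero, the function returns 1.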
