Attaining High Performance in General-Purpose Computations on Current Graphics Processors


Francisco Igual, Rafael Mayo, Enrique S. Quintana-Ortí
Departamento de Ingeniería y Ciencia de los Computadores, University Jaume I, Castellón (Spain)

Attaining High Performance in General-Purpose Computations on GPUs. Igual et al.

Motivation (I)
- GPUs have become the first widely available HPC platform
- Successfully used in linear algebra, image processing, data mining, simulations...
- Recently introduced advances:
  - Hardware level: Unified Architecture
  - Software level: CUDA
- Peak performance up to 10x (Core2Duo vs Nvidia 8800)
- Is using GPUs always beneficial?


Motivation (II)
- GPUs are special-purpose processors
- The GPU architecture is focused on graphics performance
- Not every general-purpose algorithm maps well onto the GPU
- The newest GPU models introduce interesting features from the GPGPU point of view
- Which algorithms fit better on the graphics processor?
- Is using the GPU always better?
- What is the impact of the improvements in the latest GPU generations?


Outline
1. Introduction
2. Classical GPU architecture
3. Unified GPU architecture
4. Experimental setup
5. Results on classical GPUs
6. Results on unified GPUs
7. Conclusions

Introduction and goals

Methodology
- Microbenchmarks from BLAS and image processing
- Implement the benchmarks on CPU and both generations of GPUs
- Use only optimized benchmarks on both architectures

Goals
- Identify the features of GPU-like algorithms through the evaluation of different routines
- Compare both generations of GPU architectures
- Measure the impact of data transfers on both generations
- Decide which algorithms are CPU-like and which are GPU-like


Classical GPU architecture: Classical GPU pipeline
- Each stage of the pipeline works with a different datatype
- Each stage is computed on a different type of processor (vertex processors and fragment processors)
- But the processors are programmable...

Classical GPU architecture: Classical GPU overview

Programmability
- Both vertex processors and fragment processors are programmable
- Usually the fragment processors (FP) are programmed
- The replication of FPs fits well with data-parallel routines
- SIMD architecture

Disadvantages
- No memory hierarchy at all
- Only one memory address can be written per fragment
- Hard to program (OpenGL + Cg)


Unified GPU architecture: Unified GPU shader
- One unified shader with multiple functions
- Works with any type of graphical data
- GPGPU: all programmable shaders are now available

Unified GPU architecture: G80

NVIDIA G80
- Main unified implementation
- Up to 128 cores in the unified shader
- Introduction of a memory hierarchy
- Higher clock frequency in the shader
- More flexibility in reading/writing to memory
- CUDA library

Experimental setup: Selection of benchmarks

Features to be evaluated:
1. Data parallelism
2. Input data reutilization
3. Computational intensity per stream element

Routines to be evaluated:
- SGEMM: C = αAB + βC
- SGEMV: y = αAx + βy
- SAXPY: y = αx + y
- SSCAL: y = αy
- 2D Convolution
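The four BLAS routines listed above can be written down as plain reference code to make their semantics concrete. The following is a minimal sketch, not the optimized GotoBLAS or CUBLAS kernels used in the talk: it uses pure Python lists (so arithmetic is double precision rather than the single precision implied by the `S` prefix), and the function signatures are simplified assumptions that return a new result instead of updating the operand in place.

```python
"""Unoptimized reference semantics of the four BLAS routines (illustrative only)."""


def sscal(alpha, y):
    # SSCAL: y = alpha * y (one multiply per element)
    return [alpha * yi for yi in y]


def saxpy(alpha, x, y):
    # SAXPY: y = alpha * x + y (one multiply and one add per element)
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def sgemv(alpha, A, x, beta, y):
    # SGEMV: y = alpha * A x + beta * y (A is a list of rows)
    return [alpha * sum(a * xj for a, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]


def sgemm(alpha, A, B, beta, C):
    # SGEMM: C = alpha * A B + beta * C (A, B, C are lists of rows)
    cols_B = list(zip(*B))  # columns of B, so each entry is a dot product
    return [[alpha * sum(a * b for a, b in zip(row, col)) + beta * c
             for col, c in zip(cols_B, c_row)]
            for row, c_row in zip(A, C)]
```

Every output element in each routine is computed independently of the others, which is the data parallelism the benchmarks are meant to exercise.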

Experimental setup: Features of the benchmarks

Selected routines:

| Routine | Type | Data parallelism | Input reutilization | Arith. intensity |
|---|---|---|---|---|
| SGEMM | BLAS 3 | High | High | High |
| SGEMV | BLAS 2 | High | Medium | Medium |
| SAXPY | BLAS 1 | High | None | Low |
| SSCAL | BLAS 1 | High | None | Lowest |
| Convolution | Image | High | Low | Medium |

Hardware setup:
- Classical architecture: AMD Athlon + Nvidia GeForce 6200
- Unified architecture: Intel Core2Duo 1.86 GHz + Nvidia GeForce 8800 Ultra
- GotoBLAS / CUBLAS 1.0
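The qualitative ratings for the BLAS routines can be related to simple operation counts. The sketch below uses the usual leading-order flop counts for square problems of size n (ignoring the extra scaling by α and β). How the counts are normalized, per input element read and per output element written, is an assumption of this sketch and not necessarily the measure used in the talk.

```python
def intensity(routine, n):
    """Return (flops, inputs_read, outputs_written) for problem size n."""
    counts = {
        "SGEMM": (2 * n**3, 3 * n**2, n**2),      # A, B and C are n x n
        "SGEMV": (2 * n**2, n**2 + 2 * n, n),     # A is n x n, x and y have length n
        "SAXPY": (2 * n, 2 * n, n),               # x and y have length n
        "SSCAL": (n, n, n),                       # y has length n
    }
    return counts[routine]


for name in ("SGEMM", "SGEMV", "SAXPY", "SSCAL"):
    flops, reads, writes = intensity(name, 1024)
    print(f"{name}: {flops / reads:.1f} flops per input, "
          f"{flops / writes:.1f} flops per output")
```

Under these counts, flops per input element grow with n only for SGEMM, which is consistent with its "High" reutilization rating, while SAXPY and SSCAL stay at a small constant regardless of problem size.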


Results on classical GPUs: SGEMM

[Figure: Test SGEMM. MFLOPs vs. matrix dimension; series: CPU, GPU, GPU w. TX (with data transfers)]

Data reutilization impact
- CPU outperforms GPU
- Speedup up to x4
- Impact of cache hierarchy on CPU
- Data transfers not significant

Input data reutilization seems a key factor on GPU
SGEMV exhibits less input data reutilization...

Results on classical GPUs: SGEMV

[Figure: Test SGEMV. MFLOPs vs. vector dimension; series: CPU, GPU, GPU w. TX]

Data reutilization impact
- CPU outperforms GPU, but only by up to x2
- Data transfers more significant

Input data reutilization is a key factor on GPU

Results on classical GPUs: SAXPY and SSCAL

Arithmetic intensity

[Figure: Test SAXPY. MFLOPs vs. vector dimension; series: CPU, GPU, GPU w. TX]
[Figure: Test SSCAL. MFLOPs vs. vector dimension; series: CPU, GPU, GPU w. TX]

- SAXPY and SSCAL are stream-oriented operations
- They should attain high performance on GPUs
- However, CPU outperforms GPU for these operations

Arithmetic intensity per stream element is another key factor

Results on classical GPUs: Convolution

[Figure: Test Convolution, image 512x512. Time (ms) vs. filter width; series: CPU, GPU, GPU4, GPU4 w. TX]

Convolution performance
- Attained performance similar to that on CPU
- Optimized implementation for vectorial capabilities (GPU4) attains 4x speedup

Required features on GPU
- High data parallelism
- Low input data reutilization
- High arithmetic intensity per stream element
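For reference, the 2D convolution benchmark can be sketched as follows. This is a plain, unoptimized version written for illustration, not the Cg or CUDA implementation evaluated in the talk. The function name and the choice to compute only the "valid" region (no border padding) are assumptions of the sketch.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a list-of-rows image with a list-of-rows kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Each output pixel depends only on a small input window,
            # so all (i, j) iterations are independent of each other.
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # The kernel is flipped, as in a true convolution.
                    acc += image[i + u][j + v] * kernel[kh - 1 - u][kw - 1 - v]
            out[i][j] = acc
    return out
```

Each output pixel costs one multiply and one add per filter tap, so the work per stream element grows with the square of the filter width. That is why filter width is the natural x-axis for this benchmark.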

Results on unified GPUs: SGEMM

[Figure: Test SGEMM CUDA. GFLOPs vs. matrix dimension; series: CPU, GPU, GPU w. TX]

Data reutilization impact
- GPU outperforms CPU
- Speedup up to x10
- However, only attaining 20% of the peak performance of the GPU
- Data transfers more significant than on previous generations

Input data reutilization is still important on GPU, but less than on previous generations
Reason: more sophisticated memory hierarchy

Results on unified GPUs: SGEMV

[Figure: Test SGEMV CUDA. GFLOPs vs. vector dimension; series: CPU, GPU, GPU w. TX]

Data reutilization impact
- GPU outperforms CPU for big matrices
- Impact of cache hierarchy on CPU: faster for small streams of data
- Data transfers very significant

Data transfer stage is a very important factor now
- G6 → G80 (x20), but AGP → PCI Express (x2)
- Modern GPUs work better with big streams of data
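The effect of the mismatch between the two improvement factors (x20 for the GPU, x2 for the bus) can be illustrated with a simple time model. The sketch below assumes that total time is compute time plus transfer time with no overlap. The baseline times are hypothetical values chosen only for illustration, not measurements from the talk.

```python
def transfer_fraction(t_compute, t_transfer):
    """Fraction of total run time spent moving data between CPU and GPU."""
    return t_transfer / (t_compute + t_transfer)


def overall_speedup(old_compute, old_transfer, new_compute, new_transfer):
    """End-to-end speedup once transfers are included in both totals."""
    return (old_compute + old_transfer) / (new_compute + new_transfer)


# Hypothetical baseline on the older generation (arbitrary time units).
old_compute, old_transfer = 20.0, 2.0

# Improvement factors quoted on the slide: x20 for compute, x2 for the bus.
new_compute = old_compute / 20.0
new_transfer = old_transfer / 2.0

print(f"old: {transfer_fraction(old_compute, old_transfer):.0%} of time in transfers")
print(f"new: {transfer_fraction(new_compute, new_transfer):.0%} of time in transfers")
print(f"end-to-end speedup: "
      f"{overall_speedup(old_compute, old_transfer, new_compute, new_transfer):.0f}x")
```

With this made-up baseline, the share of time spent in transfers grows from roughly 9% to 50%, and the end-to-end speedup is 11x rather than 20x. This is the sense in which the transfer stage becomes a more important factor on the newer hardware.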

Results on unified GPUs: SAXPY and SSCAL

Arithmetic intensity

[Figure: Test SAXPY CUDA. GFLOPs vs. vector dimension; series: CPU, GPU, GPU w. TX]
[Figure: Test SSCAL CUDA. GFLOPs vs. vector dimension; series: CPU, GPU, GPU w. TX]

- However, arithmetic intensity per element is still a key factor
- CPU outperforms GPU for these operations

Results on unified GPUs: Convolution

[Figure: Test Convolution, image 512x512. GFLOPS vs. filter width; series: CPU, GPU, GPU w. TX]

Convolution performance
- Attained the highest speedup of the benchmarks
- Highest data transfer impact

Required features on unified GPUs
- High data parallelism, low reutilization, high intensity
- Low transfer-to-computation ratio
- Big data streams


Conclusions
- The GPU is an interesting platform for HPC
- However, not every algorithm can exploit its full performance
- Desirable features:
  - High data parallelism
  - Low input data reutilization
  - High arithmetic intensity
  - Operating with big streams of data
- Algorithms must be designed to minimize data transfers

Thank you...
For more information... figual


More information

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators Ahmad Abdelfattah 1, Jack Dongarra 2, David Keyes 1, and Hatem Ltaief 3 1 KAUST Division of Mathematical and Computer Sciences and Engineering,

More information

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008 Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer

More information

CUDA programming on NVIDIA GPUs

CUDA programming on NVIDIA GPUs p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

More information

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2 GPGPU for Real-Time Data Analytics: Introduction Bingsheng He 1, Huynh Phung Huynh 2, Rick Siow Mong Goh 2 1 Nanyang Technological University, Singapore 2 A*STAR Institute of High Performance Computing,

More information

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems 202 IEEE 202 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum The Green Index: A Metric

More information

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI Di Zhao The Ohio State University GPU Technology Conference 2014, March 24-27 2014, San Jose California 1 TEGRA-WIFI-GEFORCE

More information

GPU Architecture. Michael Doggett ATI

GPU Architecture. Michael Doggett ATI GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super

More information

Advanced Rendering for Engineering & Styling

Advanced Rendering for Engineering & Styling Advanced Rendering for Engineering & Styling Prof. B.Brüderlin Brüderlin,, M Heyer 3Dinteractive GmbH & TU-Ilmenau, Germany SGI VizDays 2005, Rüsselsheim Demands in Engineering & Styling Engineering: :

More information

Clustering Billions of Data Points Using GPUs

Clustering Billions of Data Points Using GPUs Clustering Billions of Data Points Using GPUs Ren Wu ren.wu@hp.com Bin Zhang bin.zhang2@hp.com Meichun Hsu meichun.hsu@hp.com ABSTRACT In this paper, we report our research on using GPUs to accelerate

More information

Latency and Bandwidth Impact on GPU-systems

Latency and Bandwidth Impact on GPU-systems NTNU Norwegian University of Science and Technology Faculty of Information Technology, Mathematics and Electrical Engineering Department of Computer and Information Science TDT4590 Complex Computer Systems,

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

Radeon HD 2900 and Geometry Generation. Michael Doggett

Radeon HD 2900 and Geometry Generation. Michael Doggett Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle? Lecture 3: Evaluating Computer Architectures Announcements - Reminder: Homework 1 due Thursday 2/2 Last Time technology back ground Computer elements Circuits and timing Virtuous cycle of the past and

More information

A Very Brief History of High-Performance Computing

A Very Brief History of High-Performance Computing A Very Brief History of High-Performance Computing CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) A Very Brief History of High-Performance Computing Spring 2016 1

More information

Accelerating Wavelet-Based Video Coding on Graphics Hardware

Accelerating Wavelet-Based Video Coding on Graphics Hardware Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing

More information

Evaluation of CUDA Fortran for the CFD code Strukti

Evaluation of CUDA Fortran for the CFD code Strukti Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center

More information

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark Data-parallel Acceleration of PARSEC Black-Scholes Benchmark AUGUST ANDRÉN and PATRIK HAGERNÄS KTH Information and Communication Technology Bachelor of Science Thesis Stockholm, Sweden 2013 TRITA-ICT-EX-2013:158

More information

Interactive Level-Set Deformation On the GPU

Interactive Level-Set Deformation On the GPU Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University Data Parallel Computing on Graphics Hardware Ian Buck Stanford University Brook General purpose Streaming language DARPA Polymorphous Computing Architectures Stanford - Smart Memories UT Austin - TRIPS

More information

GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

More information

64-Bit versus 32-Bit CPUs in Scientific Computing

64-Bit versus 32-Bit CPUs in Scientific Computing 64-Bit versus 32-Bit CPUs in Scientific Computing Axel Kohlmeyer Lehrstuhl für Theoretische Chemie Ruhr-Universität Bochum March 2004 1/25 Outline 64-Bit and 32-Bit CPU Examples

More information

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Quiz for Chapter 1 Computer Abstractions and Technology 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

More information

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute

More information

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

More information

System requirements for Autodesk Building Design Suite 2017

System requirements for Autodesk Building Design Suite 2017 System requirements for Autodesk Building Design Suite 2017 For specific recommendations for a product within the Building Design Suite, please refer to that products system requirements for additional

More information

Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit

Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit Stochastic Analysis of a ueue Length Model Using a Graphics Processing Unit J. Přiryl* Faculty of Transportation Sciences, Czech University of Technology, Prague, Czech Republic Institute of Information

More information

Optimization for DirectX9 Graphics. Ashu Rege

Optimization for DirectX9 Graphics. Ashu Rege Optimization for DirectX9 Graphics Ashu Rege Last Year: Batch, Batch, Batch Moral of the story: Small batches BAD What is a batch Every DrawIndexedPrimitive call is a batch All render, texture, shader,...

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Dense Linear Algebra Solvers for Multicore with GPU Accelerators

Dense Linear Algebra Solvers for Multicore with GPU Accelerators Dense Linear Algebra Solvers for Multicore with GPU Accelerators Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra Department of Electrical Engineering and Computer Science, University of Tennessee,

More information

Brook for GPUs: Stream Computing on Graphics Hardware

Brook for GPUs: Stream Computing on Graphics Hardware Brook for GPUs: Stream Computing on Graphics Hardware Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan Computer Science Department Stanford University

More information

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Intro, GPU Computing February 9, 2012 Dan Negrut, 2012 ME964 UW-Madison "The Internet is a great way to get on the net. US Senator Bob Dole

More information

ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING

ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING ACCELERATING COMMERCIAL LINEAR DYNAMIC AND Vladimir Belsky Director of Solver Development* Luis Crivelli Director of Solver Development* Matt Dunbar Chief Architect* Mikhail Belyi Development Group Manager*

More information

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

CUDAMat: a CUDA-based matrix class for Python

CUDAMat: a CUDA-based matrix class for Python Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based

More information

Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver and José Luis Sánchez

Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver and José Luis Sánchez Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis artínez, Gerardo Fernández-Escribano, José. Claver and José Luis Sánchez 1. Introduction 2. Technical Background 3. Proposed DVC to H.264/AVC

More information

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University

More information