APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE
Tuyou Peng 1, Jun Peng 2
1 Electronics and Information Technology Department, Jiangmen Polytechnic, Jiangmen, Guangdong, China, [email protected]
2 TSMC Limited, Shanghai, China, [email protected]

Abstract. Joint programming of QT and CUDA on Linux is an urgent problem, and a Linux-based QT-CUDA parallel architecture has therefore been built. As an example, a fast parallel rendering algorithm for seismic and GPR imaging is proposed and implemented on this Linux QT-CUDA parallel architecture. The parallel rendering algorithm is shown to be about ten times faster than the conventional algorithm and can be widely applied to fast visualization of many kinds of 2D data.

Keywords: QT, CMAKE, CUDA, parallel computing, imaging.

Introduction

CUDA (Compute Unified Device Architecture) [1], the parallel computing technology introduced by NVIDIA in 2007, is a recent and powerful high-performance processing technology. So far, however, CUDA source code for the GPU can only be compiled into an independent, stand-alone program on Windows. In order to achieve joint programming of QT [2,3] and CUDA and take advantage of both at the same time, building a Linux-based QT-CUDA parallel architecture with a high degree of QT and CUDA integration is of practical significance. After in-depth study and experimentation, such a QT-CUDA parallel architecture has been built, which solves this urgent programming problem effectively. As an example, a fast GPU rendering algorithm for seismic and GPR (Ground Penetrating Radar) data visualization is proposed and implemented on the QT-CUDA parallel architecture. The GPU rendering algorithm is about ten times faster than the conventional rendering algorithm on the same PC and is, to the authors' knowledge, the fastest seismic data visualization solution currently available on a PC.

A. QT Cross-Platform Features

QT, introduced by the Norwegian company Trolltech, is a cross-platform C++ GUI (Graphical User Interface) application development platform. Its main features are:

- Cross-platform support for Unix, Linux and Windows.
- Object orientation and encapsulation, which give QT a very high degree of modularity.
- A rich set of APIs (Application Programming Interfaces), including more than 250 classes.
- A signal and slot mechanism for communication between objects (a minimal sketch follows this list).
- Support for 2D and 3D graphics rendering.
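As a minimal illustration of the signal and slot mechanism, the following Counter class is a generic Qt4-style sketch; it is not code from the paper's project.

#include <QObject>

class Counter : public QObject {
    Q_OBJECT                        // enables signals and slots via moc
public:
    Counter() : m_value(0) {}
    int value() const { return m_value; }
public slots:
    void setValue(int v) {          // slot: may be connected to any compatible signal
        if (v != m_value) {
            m_value = v;
            emit valueChanged(v);   // firing a signal notifies all connected slots
        }
    }
signals:
    void valueChanged(int newValue);
private:
    int m_value;
};

// Usage:
//   Counter a, b;
//   QObject::connect(&a, SIGNAL(valueChanged(int)), &b, SLOT(setValue(int)));
//   a.setValue(12);   // b receives the value 12 through its setValue() slot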
QT implements communication between objects by means of the signal and slot mechanism. For example, the occurrence of an event can fire one or more signals, and any object can define slot functions to respond in different ways. The signal and slot mechanism overcomes the two major shortcomings of the C++ callback approach: the relation between signals and slots is loose, and slot invocations can be delivered asynchronously across threads through queued connections. These features make QT a good choice for client-server and embedded software development. QT itself, however, has no parallel processing capability and must be integrated with CUDA to obtain it.

B. GPU Parallel Principle [4]

In 1999, NVIDIA released its first GPU graphics processor, and GPU hardware has developed rapidly ever since: GPU performance roughly doubles every year, much faster than that of the CPU (Central Processing Unit), which follows Moore's law of doubling every 18 to 24 months. General-purpose GPU parallel computing, however, could not be carried out until the official release of CUDA by NVIDIA in 2007. GPU-based parallel processing can be described simply as follows: the CPU prepares the data and executes the serial initialization code, while the GPU carries out the parallel computation. The GPU connects directly to the CPU through the AGP or PCI-E bus and has its own independent high-speed memory and lightweight threads, so it can achieve fast thread switching with essentially zero overhead. At present an NVIDIA GPU contains from 1 to more than 30 SMs (Streaming Multiprocessors), which is roughly equivalent to 1 to 30 CPUs. Each SM has eight SIMD (Single Instruction Multiple Data) stream processors, roughly equivalent to eight CPU cores, and executes thread blocks of up to 512 threads. Therefore, GPU performance can be 10 times that of a CPU of the same period. Moreover, a host computer can install multiple GPU graphics cards, so GPU parallel computing on a PC can approach the performance of cluster computing.

The CUDA parallel programming model is shown in Figure 1 [1], which comes from the NVIDIA CUDA Programming Guide. In Figure 1, the Host (CPU) executes the serial code; the Device (GPU) executes the parallel kernel functions; a Grid represents an abstract grid of thread blocks, and a Block represents an abstract array of threads.

Fig. 1 CUDA parallel programming model.

Implementation of QT-CUDA Parallel Architecture

At present CUDA can only use the C programming language, with a small subset of C++ syntax, so it cannot be combined directly with QT programming, and solving the problem of integrating QT and CUDA programming is very urgent. By in-depth study of the existing open source software CMAKE and of joint programming techniques, an innovative QT-CUDA parallel architecture has been designed and implemented on a Linux box after repeated tests. The steps are as follows:
A. Set Up the Latest Open Source Packages (QT, CUDA, and CMAKE) on the Linux System

Among them, CUDA carries out the parallel GPU programming; QT carries out the serial code programming and calls the CUDA parallel processing functions; CMAKE coordinates the link between QT and CUDA and generates the Makefile with which the GCC compiler builds the code automatically. Together the three parts constitute the QT-CUDA parallel architecture.

B. Create Subdirectories for the QT-CUDA Parallel Architecture

According to the directory structure shown in Figure 2, create a QT-CUDA parallel development subdirectory named qtcuda and write the main CMAKE configuration file CMakeLists.txt. The contents of CMakeLists.txt [5] are as follows:

# specify the target project name
PROJECT (target)
# specify CMAKE version 2.6.0 or higher
CMAKE_MINIMUM_REQUIRED (VERSION 2.6.0)
# specify QT 4.0 or a higher version
FIND_PACKAGE (Qt4 REQUIRED)
# specify the CUDA parallel software
FIND_PACKAGE (CUDA REQUIRED)
# specify the source subdirectory holding all codes
SUBDIRS (codes)

Fig. 2 Directory structure of QT-CUDA parallel architecture.
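The directory layout described in steps B, C and F can be sketched as follows; the individual file names under qt/ and cuda/ are placeholders, not file names taken from the paper.

qtcuda/
    CMakeLists.txt        (main configuration file, step B)
    linux_build.sh        (build script, step F)
    codes/
        CMakeLists.txt    (second configuration file, step C)
        qt/               (*.cpp, *.h and *.ui QT sources)
        cuda/             (*.cu CUDA sources)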
C. Design Another Configuration File CMakeLists.txt in the codes Directory, Put All QT Source Code in the qt Subdirectory, and Put All CUDA Source Code in the cuda Subdirectory

The contents of this second CMakeLists.txt are as follows:

# set up the qt modules
SET (QT_USE_QTOPENGL 1)
SET (QT_USE_QTXML 1)
INCLUDE (${QT_USE_FILE})
# list all qt program files here
SET (qt_cuda_srcs_cxx qt/*.cpp)
# list all qt header files here
SET (qt_cuda_moc_hdrs qt/*.h)
# list all qt GUI ui files here
SET (qt_cuda_uis qt/*.ui)
# list all cuda parallel program files here
SET (qt_cuda_cuda_srcs cuda/*.cu)
# specify the method for compiling qt ui files
QT4_WRAP_UI (qt_cuda_srcs_cxx ${qt_cuda_uis})
INCLUDE_DIRECTORIES (${CMAKE_CURRENT_BINARY_DIR})
# specify the method for compiling qt moc files
QT4_WRAP_CPP (qt_cuda_srcs_cxx ${qt_cuda_moc_hdrs})
# specify the joint compiling method for qt and cuda codes
CUDA_ADD_EXECUTABLE (${CMAKE_PROJECT_NAME} ${qt_cuda_cuda_srcs} ${qt_cuda_srcs_cxx})
TARGET_LINK_LIBRARIES (${CMAKE_PROJECT_NAME} ${QT_LIBRARIES})
INSTALL (TARGETS ${CMAKE_PROJECT_NAME} DESTINATION bin)
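One practical note on the listing above: CMake's SET command stores a wildcard pattern such as qt/*.cpp as a literal string rather than expanding it, so in practice the source lists are either written out file by file or gathered with FILE (GLOB ...). A minimal sketch in the same style:

# gather the source lists by expanding the wildcard patterns
FILE (GLOB qt_cuda_srcs_cxx qt/*.cpp)
FILE (GLOB qt_cuda_moc_hdrs qt/*.h)
FILE (GLOB qt_cuda_uis qt/*.ui)
FILE (GLOB qt_cuda_cuda_srcs cuda/*.cu)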
D. Write the CUDA Source Code

In CUDA code no main function may be defined; only normal functions and parallel processing kernel functions can be defined.

E. Define and Execute CUDA Functions in the QT Files

Declare the external CUDA function in QT using extern "C" void cuda_function_name (parameter1, parameter2), and execute the CUDA parallel processing function by passing real values to all parameters. Here cuda_function_name is the name of a CUDA function, and parameter1, parameter2 are its parameters (a sketch is given at the end of this section).

F. Create a Linux Bash Shell Script File Named linux_build.sh to Finish the Compilation Automatically

The file linux_build.sh is placed in the qtcuda directory. When linux_build.sh is executed, all the QT and CUDA files are automatically compiled into a single executable file with no errors.

After the QT-CUDA parallel architecture is established, it is very simple to add or remove QT and CUDA files by manually modifying the configuration file CMakeLists.txt in the codes subdirectory.
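The following is a sketch of step E as it might look on the QT side; the function name, parameter list and the surrounding call are hypothetical placeholders rather than code from the paper.

#include <vector_types.h>   // CUDA's float3 type; the include path is supplied by FindCUDA

// declaration of a wrapper function defined in a .cu file under codes/cuda/
extern "C" void cuda_render_image(float3 *positions, float *data, float *colorMap,
                                  int width, int height,
                                  float minVal, float maxVal, int tableLength);

// Inside a QT class (for example in a QGLWidget's paintGL() method) the wrapper
// is then simply called with real argument values:
//   cuda_render_image(d_positions, d_seismic, d_colorMap,
//                     screenWidth, screenHigh, minVal, maxVal, colorTableLength);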
Application of QT-CUDA Parallel Architecture

The color image of seismic data in oil and gas exploration is a very important tool for demonstrating the results of seismic data processing and interpretation. A regular rendering algorithm responds slowly on a PC and can cause the GUI to freeze, among other problems [2,6]. In this paper an innovative, fast parallel rendering algorithm is proposed and implemented on the QT-CUDA parallel architecture described above; it applies the latest GPU parallel technology to optimize rendering performance through GPU multi-threaded parallel processing followed by writing directly into an OpenGL-mapped buffer. It is shown that the parallel rendering algorithm is about ten times faster than the conventional algorithm. The flow of the GPU parallel rendering algorithm is given in Figure 3.

Fig. 3 Flow chart of the seismic imaging parallel rendering algorithm:
- Start
- Select the GPU device and perform initialization
- Allocate GPU device memory for the seismic data and one color map
- Copy the seismic data and the color map data from host to device
- Build a parallel thread grid of the same size as the image
- Call the multi-threaded parallel kernel function to compute each pixel
- Finish the multi-thread synchronization operation
- Produce the image by writing the pixel data directly into the OpenGL buffer
- Stop

The GPU parallel processing kernel function is given below:

__global__ void prepareMap(float3 *positions, float *gpr, float *tmp_color,
                           int screenWidth, int screenHigh,
                           float min, float max, int colorTableLength)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < screenWidth && j < screenHigh) {
        // map the data sample to a color table index
        int index = (int)((colorTableLength - 1) * (*(gpr + i * screenHigh + j)) / (max - min)
                          - (colorTableLength - 1) * min / (max - min));
        if (index < 0) index = 0;
        if (index > colorTableLength - 1) index = colorTableLength - 1;
        // generate the pixel data of the map
        positions[j * screenWidth + i] = make_float3(*(tmp_color + index * 3 + 0),
                                                     *(tmp_color + index * 3 + 1),
                                                     *(tmp_color + index * 3 + 2));
    }
    __syncthreads();   // multi-thread synchronization
}
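Below is a possible host-side wrapper for the kernel above, following the steps of the flow chart in Figure 3. The function and buffer names, the block size, the omitted error handling and the assumption that positions already points to device-accessible (for example OpenGL-mapped) memory are illustrative and not taken from the paper.

extern "C" void cuda_render_image(float3 *positions, float *data, float *colorMap,
                                  int width, int height,
                                  float minVal, float maxVal, int tableLength)
{
    float *d_data = 0, *d_colorMap = 0;

    cudaSetDevice(0);                                               // select the GPU device

    cudaMalloc((void **)&d_data, width * height * sizeof(float));   // allocate device memory
    cudaMalloc((void **)&d_colorMap, tableLength * 3 * sizeof(float));

    cudaMemcpy(d_data, data, width * height * sizeof(float),
               cudaMemcpyHostToDevice);                             // copy seismic data to the device
    cudaMemcpy(d_colorMap, colorMap, tableLength * 3 * sizeof(float),
               cudaMemcpyHostToDevice);                             // copy the color map to the device

    dim3 block(16, 16);                                             // thread block size
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);                    // grid of the same size as the image

    prepareMap<<<grid, block>>>(positions, d_data, d_colorMap,
                                width, height, minVal, maxVal, tableLength);
    cudaThreadSynchronize();      // multi-thread synchronization (cudaDeviceSynchronize in newer CUDA)

    cudaFree(d_data);
    cudaFree(d_colorMap);
}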
Due to space limitations, only the main code for OpenGL initialization, regulating the pixel data and writing the OpenGL buffer is given:

glMatrixMode(GL_PROJECTION);    // OpenGL initialization
glLoadIdentity();
glRasterPos2f(-1.0, -0.85);
glScalef(1.0, 1.0, 1.0);
glViewport(0, 0, screenWidth, screenHigh);
for (int j = 0; j < screenHigh / 2; j++)        // regulating the pixel data (vertical flip)
    for (int i = 0; i < screenWidth; i++) {
        float3 firstData = *(positions + j * screenWidth + i);
        *(positions + j * screenWidth + i) = *(positions + (screenHigh - 2 - j) * screenWidth + i);
        *(positions + (screenHigh - 2 - j) * screenWidth + i) = firstData;
    }
glDrawPixels(screenWidth, screenHigh, GL_RGB, GL_FLOAT, positions);   // writing the OpenGL buffer

The seismic image [7] produced by GPU parallel rendering is illustrated in Figure 4. With this parallel seismic data rendering algorithm, fast imaging can be carried out at any processing and interpretation stage, and what-you-see-is-what-you-get operation [8] is obtained without the GUI freezing. The oil reservoir in the blue polygon area is clearly shown at the centre of Figure 4. The same rendering algorithm has been applied to GPR imaging with good results [9,10]; Figure 5 is a fast image of underground-pipe GPR field data based on the GPU parallel rendering algorithm. Both images were produced on an Acer ASPIRE 4736G laptop containing a GeForce G105M CUDA graphics card with 512 MB of memory. At this image resolution, the GPU parallel rendering algorithm achieves a speedup of about 10.

Fig. 4 Seismic fast imaging of an oil field (the oil reservoir is marked).

Fig. 5 GPR data fast imaging of an underground pipe.
Conclusions

GPU parallel computing is a recent and powerful high-performance processing technology. However, current CUDA source code can only be compiled into an independent executable program. A QT-CUDA parallel architecture has therefore been built on Linux, which solves this urgent programming problem effectively. Finally, seismic and GPR imaging has been carried out with a fast GPU rendering algorithm based on the QT-CUDA parallel architecture. The same solution can also be applied to fast visualization of other 2D data.

Acknowledgement

I would like to thank Mr. Guo Tao for his color map coding and all the members of my team. Thanks also go to my classmate Mr. Xuecai Liu for reviewing my paper in the US, and to other contributors, not mentioned in the references, who gave me a lot of help.

References
[1] CUDA, Feb.
[2] QT, Jan.
[3] Z. M. Cai and Z. F. Lu, Master QT Programming, Electronic Industry Press, Beijing, China.
[4] S. Zhang and Y. L. Zhu, CUDA of GPU High Performance, Chinese Water Resources and Hydropower Press, Beijing, China.
[5] CMAKE, Jan.
[6] T. Y. Peng and P. Shi, Arbitrary profile data extraction algorithm and its implementation for 3D seismic data volumes, Computing Techniques for Geophysical and Geochemical Exploration, vol. 31, Sep.
[7] Mike B. and Steve F., 3D Seismic Discontinuity for Faults and Stratigraphic Features: The Coherence Cube, vol. 10, Oct.
[8] T. Y. Peng and J. X. Cao, Time slicing and arbitrary horizon extraction algorithm and implementation for 3D SEGY seismic data volumes, IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (IEEE Catalog Number: CFP1001F-PRT), no. 16823, Oct.
[9] J. L. Davis and A. P. Annan, Ground-penetrating radar for high-resolution mapping of soil and rock stratigraphy, Geophysical Prospecting, vol. 37, June.
[10] Z. F. Zeng and Steve F., Principle and Application of Ground Penetrating Radar Method, Science Press, Beijing, China.