OctaVis: A Simple and Efficient Multi-View Rendering System
|
|
|
- Scot May
- 9 years ago
- Views:
Transcription
1 OctaVis: A Simple and Efficient Multi-View Rendering System Eugen Dyck, Holger Schmidt, Mario Botsch Computer Graphics & Geometry Processing Bielefeld University Abstract: We present a simple, low-cost, and high performance system for multi-view rendering in a virtual reality (VR) context. In contrast to complex CAVE installations, which are typically driven by one render client per view, we arrange eight displays in an octagon to provide a full 360 horizontal view, and drive these eight displays by a single PC equipped with multiple graphics units (GPUs). We describe the hardware and software setup, as well as the necessary optimizations to fully exploit the parallelism of this multi-gpu VR system. Keywords: Multi-View Rendering, Multi-GPU Rendering, Virtual Reality 1 Introduction Thanks to the steady increase in computational resources and rendering performance over the last decade, virtual reality (VR) techniques have developed into valuable tools for a large variety of applications, such as automotive design, architectural previews, game development, or medical applications, to name just a few. In all these applications a high level of immersion is both desired and required. Our approach is motivated by a medical research project that aims at diagnosing and training stroke patients based on neuro-psychological tests performed in a virtual environment. In order to enable the transfer of the patient s training successes to real-life situations, the VR setup has to be sufficiently realistic and immersive. CAVE installations are known to provide a very high level of immersion, but disqualify for our project because of their high cost and maintenance effort, which is mainly due to the fact that CAVEs are usually driven by one rendering node per view. We propose a cost-efficient visualization system that consists of eight standard (non-stereo) touch screen displays arranged in an octagon around the patient, thereby providing a full 360 horizontal view and a simple interaction with the scene. To minimize the stress on the patient we abandon stereo rendering. In order to reduce hardware costs and maintenance effort, our so-called OctaVis solution is driven by a single PC, which consequently is equipped with several graphics processing units (GPUs) (see Figure 1). This paper describes our hardware and software architecture, parallelization efforts, and performance optimizations that eventually enable the real-time rendering of complex VR environments on eight views using a single PC.
2 View 2 1 Vi e w w 3 e Vi View 4 View 8 GPU 1 PC GPU 2 GPU 3 Vi e w Vi e 7 w 5 View 6 Figure 1: In our OctaVis setup a single PC is equipped with three GPUs that drive eight displays, which are arranged in an octagon to provide a full 360 visualization. 2 Related Work The standard approach to drive an eight-display system is to use a render cluster consisting of one application server and eight render clients. This solution, however, is complex in terms of both hardware and software. First, it requires nine individual PCs that are properly synchronized and connected through a sufficiently fast network (Gigabit Ethernet, Infiniband). Second, the rendering has to be distributed to the render clients, by either employing a distributed scene graph (e.g., OpenSG [Ope10]) or by distributing OpenGL commands (e.g., Chromium [HHN+ 02]). In contrast, we decided for a single-pc multi-gpu solution for driving our OctaVis system. Our workstation is a standard, off-the-shelf PC with four PCIe slots (Intel Core i7, Windows 7). We experimented with either four NVIDIA GeForce 9800 GX2 cards (two graphics outputs each) or three ATI Radeon HD 5770 cards (three outputs each). The question is how to design the rendering architecture, such that the parallel performance of this multi-gpu system is exploited as much as possible. The major graphics vendors already provide solutions for combining several GPUs in order to increase rendering performance (NVIDIA s SLI, ATI s CrossFire). Note, however, that these techniques only support a single graphics output, and hence are not applicable in our multi-view setup. NVIDIA s QuadroPlex is a multi-gpu solution that combines up to four Quadro GPUs in an external case. Two QuadroPlex boxes can therefore be used to drive eight displays, but such a system comes at a price of more than $20,000. ATI s EyeFinity is a technology for driving several displays by one graphics board, and hence is rather a multi-view approach, but not a multi-gpu solution. In our system we do not make use of any of these techniques, but rather handle each view and each GPU individually. This requires to distribute the rendering to the different GPUs in an as efficient as possible manner.
3 We experimented with higher-level APIs, such as the distributed scene graph OpenSG [Ope10] and the multi-gpu-aware scene graph OpenSceneGraph [OSG10]. However, these frameworks turned out not to be flexible enough to give (easy) control over the crucial details affecting multi-gpu performance. A distribution of OpenGL commands based on Chromium [HHN + 02] was done by Rabe et al. [RFL07], who built a system similar to ours based on three NVIDIA cards. However, their performance results are rather disappointing, mainly due to the overhead induced by the Chromium layer. An elegant abstraction-layer for parallel rendering is provided by the Equalizer framework [EMP], which allows for distributed OpenGL rendering using render clusters, multi-gpu setups, or any combination thereof. Our design goal was to keep the OctaVis system as simple as possible both in terms of hardware and software since this allows for easy maintenance and future extension. To this end, we do not employ any higher-level API for distributed or parallel rendering, but instead custom-tailor a (simple) low-level OpenGL solution for our target system. The resulting rendering architecture and required optimizations are described in the next section. 3 Rendering Architecture Since we do not build our rendering architecture on top of a high-level framework, we have to (and are able to) take care of (1) the distribution of render commands to the different GPUs, (2) the data management, and (3) low-level OpenGL optimizations. In the following we discuss these three steps in detail. 3.1 Distributing OpenGL Commands Our multi-gpu multi-view architecture consist of three GPUs with three outputs each, which drive the eight displays of our OctaVis system. We therefore have to be able to address OpenGL commands to a particular display attached to a particular GPU. At application start-up, we generate a single full-screen window for each of the eight views, with each window having its own OpenGL context. For Unix operating systems distributing render commands is easy, since an individual X-server can be explicitly assigned to each view. In our medial research project, however, external constraints require Windows as an operating system, which does not provide this kind of explicit control. The Windows Display Driver Model 1.1 used in Windows 7 provides improved support for multi-gpu applications, but it turned out that the actual performance strongly depends on the GPU driver. Our experiments revealed that the NVIDIA driver dispatches all OpenGL render commands to all available GPUs. This obviously prevents efficient parallelization, which is clearly reflected in the performance statistics in Section 4. In order to address a specific NVIDIA GPU the OpenGL extension WGL NV gpu affinity has to be used, but this extension is only available for (expensive) Quadro-GPUs.
4 Geometry Data Texture Data Scene vertices image Texture Node Shader Node ID on GPU 0 ID on GPU 0 ID on GPU 1 ID on GPU 1 Transform Node Transform Node Transform Node ID on GPU 2 ID on GPU 2 Mesh Node Mesh Node Mesh Node Figure 2: In our scene graph nodes share data in order to reduce memory consumption. In contrast, the ATI driver dispatches render commands to just the one GPU responsible for the current window, which is the GPU attached to the display the window was created on. It therefore turned out to be crucial to create the eight windows at the correct initial position on the respective view. Creating all windows on the first view and moving them to the proper position afterward does not work. When taking this subtle information into account, the ATI driver allows for efficient parallelization between different GPUs. 3.2 Data Management Thanks to our simple single-pc architecture we do not have to distribute scene data or render states over the network to individual render clients. However, in order to render the scene in our multi-view application each OpenGL context (i.e., each view) needs access to the scene data, such as, e.g., textures and shaders. In order to minimize memory consumption, we do not duplicate the scene data for each view, but instead store a copy for each GPU only, which is then shared by all views attached to this GPU. This behavior can easily be implemented by shared OpenGL contexts. We incorporated this functionality into a very simplistic scene graph with standard nodes for transformations, textures, materials, and triangle meshes. Texture nodes, for instance, then store a reference to a texture object only. The texture object in turn stores the texture data on each available GPU (but not for each individual view). See Figure 2 for an illustration. In our case, where each GPU drives up to three views, memory consumption is reduced by a factor of three. 3.3 Performance Optimizations In order to optimize rendering performance one has to identify and eliminate the typical performance bottlenecks: CPU load, data transfer from CPU to GPU, and GPU load. In a multi-view multi-gpu environment, even more attention has to be paid to these issues. Rendering geometry in immediate mode quickly makes the application CPU-bound due to the massive amount of glvertex() function calls. We therefore store vertex positions and
5 GPU 1 GPU 2 GPU 3 Shared Context 1 View 1 Shared Context 2 View 4 Shared Context 3 View 7 View 2 View 5 View 8 View 3 View 6 Figure 3: Rendering process: All views attached to the same GPU share a render context and are drawn consecutively, while the rendering is parallelized over the available GPUs. triangle indices in vertex arrays (VA), which allows to render meshes with a single function call and thereby eliminates the CPU bottleneck. However, in our multi-view setup the data has to be transferred from main memory to the GPU eight times (for each individual view) in each frame, such that data transfer immediately becomes the bottleneck. Data transfer can be eliminated by storing the vertex arrays in vertex buffer objects (VBO) on the GPU. This turned out to be absolutely crucial in our multi-view setting. The respective data storage follows the same shared context paradigm as described in the previous section. With VBOs the bottleneck is no longer data transfer, but the per-vertex computations of the GPU. This GPU load can be reduced by caching computations performed for individual vertices. If a triangle is rendered and one of its vertices has been processed before and is still in the cache, these computations can be re-used. To maximize cache-hits we re-order the individual triangles of the mesh using the method described in [YL05], which (depending on the model) yields a significant performance gain. Finally, in order to optimally exploit all available GPUs, the render traversal is parallelized: Each GPU is served by a dedicated render thread that processes all views attached to that GPU in a serial manner. Parallelizing over the (two or three) views on the same GPU did not increase performance. Figure 3 gives an overview of the rendering process. 4 Results We analyze our rendering architecture on a standard PC with an Intel Core i7 CPU and running Windows 7. We experimented with two kinds of GPUs: four NVIDIA GeForce 9800 GX2 (two outputs each) and three ATI Radeon HD 5770 (three outputs each). For each card we used the latest driver (NVIDIA: , ATI: Catalyst 10.6).
6 # GPUs # Views/GPU # Views FPS NVIDIA FPS ATI Table 1: Comparing frame rates for a 870k triangle model using NVIDIA GeForce 9800 GX2 and ATI Radeon HD 5770 cards, respectively, in several multi-gpu and multi-view setups. # GPUs # Views/GPU # Views FPS (VA) FPS (VBO) FPS (reorder) Table 2: Frame rates for different optimizations (vertex arrays, vertex buffer objects, cachefriendly reordering), using ATI Radeon HD 5770 GPUs. We first compare the performance scaling of NVIDIA and ATI cards/drivers with varying numbers of GPUs and varying numbers of views per GPU (see Table 1). For this experiment, a model of 870k triangles was rendered. We employed VBOs, such that the GPU performance (and not the data transfer) is measured. The performance difference for one GPU and one view is simply due to the fact that the two cards represent different generations of GPUs. It is clearly visible that the NVIDIA system scales inversely proportional to the number of views: No matter whether two views are driven by one GPU or by two GPUs, the performance drops by a factor of about two. This is an immediate consequence of the NVIDIA driver sending OpenGL commands to all available GPUs, as also described in [EMP]. In contrast, the ATI system scales almost perfectly. Rendering one view per GPU gives the same performance on one, two, or three cards. When keeping the number of GPUs fixed, the performance decreases almost linearly with the number of attached views per GPU. Hence, the ATI system can fully exploit the parallelization between multiple GPUs.
7 Figure 4: The eight views of the VR supermarket used for the timings in Table 2. Using the ATI system, we now compare the different optimization techniques discussed in Section 3.3, as listed in Table 2. For these experiments we visualize the VR scene to be used in the medical project, which is a virtual supermarket of about 1.7M triangles shown in Figure 4. When the geometry is stored in vertex arrays (VA), but not in VBOs, the geometry data is transferred to the GPU for each active view. Consequently, the performance drops with each additional view, even when they are attached to different GPUs. Storing the geometry in VBOs eliminates the transfer costs, which then yields the convincing scaling behavior discussed above. The last column shows the performance after reordering the triangles in order to maximize cache usage [YL05]. With this last optimization, our rendering architecture is able to visualize a 1.7M triangle scene on nine display at a performance of more than 100 frames per second. As mentioned above, we initially experimented with the two popular Open Source scene graph libraries OpenSceneGraph [OSG10] and OpenSG [Ope10], but did not achieve a comparable performance in terms of rendering speed and memory consumption: Since OpenSG is specialized to distributed rendering, our implementation uses eight local render server processes to drive the eight displays. Since each render server requires a full scene copy, this consumes significantly more memory than our shared contexts and therefore does not allow for highly complex scenes. Moreover, due to suboptimal window creation and setup (see Section 3.1) OpenGL commands are dispatched to all GPUs, which slows down the rendering to just about 10 frames per second (for a less complex supermarket model). OpenSceneGraph correctly creates and initializes windows, supports for shared OpenGL contexts, and allows for precise control of VBOs. However, its rendering performance is still only about 70% of ours, which we assume is due to the higher overall complexity of this scene graph system. Due to our use of touch screen displays we could not employ seamless displays, resulting in clearly visible seams between the eight screens. In an empirical study [PK10] conducted by the Department of Psychology at Bielefeld University our OctaVis setup was evaluated by 27 subjects. Particularly asked about the effect of these seams, 21 subjects stated them to be not disturbing at all, 3 were disturbed slightly, and 3 were confused by them.
8 5 Conclusions These experiments clearly demonstrate the potential of a multi-view rendering system based on a single PC equipped with multiple GPUs. However, the results also indicate that the multi-gpu performance can crucially depend on a few seemingly minor implementation details, on optimization techniques, and on GPU drivers and operating systems. This paper tries to provide a recipe to circumvent pitfalls and achieve a high performance single-pc multi-gpu system. In terms of hardware, our proposed OctaVis system is cheap and easy to maintain. In terms of software, the rendering architecture scales almost perfectly thanks to few but carefully done low-level optimizations. So far we did not implement higher-level scene graph optimizations, such as different types of culling, spatial data structures, sorting with respect to state changes, or other wellknown techniques. These are clearly the next steps and are expected to yield a similar performance gain compared to single-gpu solutions. Acknowledgments: The authors are grateful to Christian Fröhlich and Bernhard Brüning for their support with the initial OpenSG-based version of the rendering system. Mario Botsch is supported by the Deutsche Forschungsgemeinschaft (Center of Excellence in Cognitive Interaction Technology, CITEC). Eugen Dyck and Holger Schmidt are supported by the EFRE project CITmed: Cognitive Interaction Technologies for Medical Applications. References [EMP] S. Eilemann, M. Makhinya, and R. Pajarola. Equalizer: A scalable parallel rendering framework. IEEE Trans. on Visualization and Computer Graphics, 15(3): [HHN + 02] G. Humphreys, M. Houston, R. Ng, R. Frank, S. Ahern, P. D. Kirchner, and J. T. Klosowski. Chromium: a stream-processing framework for interactive rendering on clusters. ACM Transactions on Graphics (SIGGRAPH), 21(3): , [Ope10] OpenSG [OSG10] OpenSceneGraph [PK10] [RFL07] [YL05] Martina Piefke and Sina Kühnel. Empirie- und Beobachtungspraktikum: Physiologische Psychologie, Departement of Psychology, Bielefeld University. Winter Term 2009/2010. F. Rabe, C. Fröhlich, and M. E. Latoschik. Low-cost image generation for immersive multi-screen environments. In Workshop of the GI VR & AR special interest group, pages 65 76, S. E. Yoon and P. Lindstrom. Cache-oblivious mesh layouts. ACM Transactions on Graphics (SIGGRAPH), 24(3): , 2005.
Stream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
GPU Usage. Requirements
GPU Usage Use the GPU Usage tool in the Performance and Diagnostics Hub to better understand the high-level hardware utilization of your Direct3D app. You can use it to determine whether the performance
How To Share Rendering Load In A Computer Graphics System
Bottlenecks in Distributed Real-Time Visualization of Huge Data on Heterogeneous Systems Gökçe Yıldırım Kalkan Simsoft Bilg. Tekn. Ltd. Şti. Ankara, Turkey Email: [email protected] Veysi İşler Dept.
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
Performance Optimization and Debug Tools for mobile games with PlayCanvas
Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1 Introduction Jonathan Kirkham, ARM Worked with
REMOTE RENDERING OF COMPUTER GAMES
REMOTE RENDERING OF COMPUTER GAMES Peter Eisert, Philipp Fechteler Fraunhofer Institute for Telecommunications, Einsteinufer 37, D-10587 Berlin, Germany [email protected], [email protected]
L20: GPU Architecture and Models
L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.
Advanced Rendering for Engineering & Styling
Advanced Rendering for Engineering & Styling Prof. B.Brüderlin Brüderlin,, M Heyer 3Dinteractive GmbH & TU-Ilmenau, Germany SGI VizDays 2005, Rüsselsheim Demands in Engineering & Styling Engineering: :
GPU Architecture. Michael Doggett ATI
GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super
DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER
DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER RAMA HOETZLEIN, DEVELOPER TECHNOLOGY, NVIDIA Data Visualizations assist humans with data analysis by representing information
Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions
Module 4: Beyond Static Scalar Fields Dynamic Volume Computation and Visualization on the GPU Visualization and Computer Graphics Group University of California, Davis Overview Motivation and applications
Computer Graphics Hardware An Overview
Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and
NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect
SIGGRAPH 2013 Shaping the Future of Visual Computing NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect NVIDIA
Introduction to GPGPU. Tiziano Diamanti [email protected]
[email protected] Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate
How PCI Express Works (by Tracy V. Wilson)
1 How PCI Express Works (by Tracy V. Wilson) http://computer.howstuffworks.com/pci-express.htm Peripheral Component Interconnect (PCI) slots are such an integral part of a computer's architecture that
Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software
GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas
Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With
SAPPHIRE R9 270X 4GB GDDR5 WITH BOOST & OC Specification Display Support Output GPU Video Memory Dimension Software Accessory 3 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort 1.2
How to choose a suitable computer
How to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and post-processing your data with Artec Studio. While
Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005
Recent Advances and Future Trends in Graphics Hardware Michael Doggett Architect November 23, 2005 Overview XBOX360 GPU : Xenos Rendering performance GPU architecture Unified shader Memory Export Texture/Vertex
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2
GPGPU for Real-Time Data Analytics: Introduction Bingsheng He 1, Huynh Phung Huynh 2, Rick Siow Mong Goh 2 1 Nanyang Technological University, Singapore 2 A*STAR Institute of High Performance Computing,
Remote Graphical Visualization of Large Interactive Spatial Data
Remote Graphical Visualization of Large Interactive Spatial Data ComplexHPC Spring School 2011 International ComplexHPC Challenge Cristinel Mihai Mocan Computer Science Department Technical University
Multiresolution 3D Rendering on Mobile Devices
Multiresolution 3D Rendering on Mobile Devices Javier Lluch, Rafa Gaitán, Miguel Escrivá, and Emilio Camahort Computer Graphics Section Departament of Computer Science Polytechnic University of Valencia
Several tips on how to choose a suitable computer
Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec
How To Teach Computer Graphics
Computer Graphics Thilo Kielmann Lecture 1: 1 Introduction (basic administrative information) Course Overview + Examples (a.o. Pixar, Blender, ) Graphics Systems Hands-on Session General Introduction http://www.cs.vu.nl/~graphics/
Faculty of Computer Science Computer Graphics Group. Final Diploma Examination
Faculty of Computer Science Computer Graphics Group Final Diploma Examination Communication Mechanisms for Parallel, Adaptive Level-of-Detail in VR Simulations Author: Tino Schwarze Advisors: Prof. Dr.
Data Visualization in Parallel Environment Based on the OpenGL Standard
NO HEADER, NO FOOTER 5 th Slovakian-Hungarian Joint Symposium on Applied Machine Intelligence and Informatics January 25-26, 2007 Poprad, Slovakia Data Visualization in Parallel Environment Based on the
Several tips on how to choose a suitable computer
Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec
An Interactive Dynamic Tiled Display System
An Interactive Dynamic Tiled Display System Juliano Franz, Gelson Reinaldo, Anderson Maciel and Luciana Nedel Instituto de Informática Universidade Federal do Rio Grande do Sul Porto Alegre, Brazil {[email protected],
Enhancing Cloud-based Servers by GPU/CPU Virtualization Management
Enhancing Cloud-based Servers by GPU/CPU Virtualiz Management Tin-Yu Wu 1, Wei-Tsong Lee 2, Chien-Yu Duan 2 Department of Computer Science and Inform Engineering, Nal Ilan University, Taiwan, ROC 1 Department
CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014
CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014 Introduction Cloud ification < 2013 2014+ Music, Movies, Books Games GPU Flops GPUs vs. Consoles 10,000
Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group
Shader Model 3.0 Ashu Rege NVIDIA Developer Technology Group Talk Outline Quick Intro GeForce 6 Series (NV4X family) New Vertex Shader Features Vertex Texture Fetch Longer Programs and Dynamic Flow Control
Petascale Visualization: Approaches and Initial Results
Petascale Visualization: Approaches and Initial Results James Ahrens Li-Ta Lo, Boonthanome Nouanesengsy, John Patchett, Allen McPherson Los Alamos National Laboratory LA-UR- 08-07337 Operated by Los Alamos
GPGPU Computing. Yong Cao
GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power
Intel Data Direct I/O Technology (Intel DDIO): A Primer >
Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
HP Workstations graphics card options
Family data sheet HP Workstations graphics card options Quick reference guide Leading-edge professional graphics February 2013 A full range of graphics cards to meet your performance needs compare features
GPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
GPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014
HPC Cluster Decisions and ANSYS Configuration Best Practices Diana Collier Lead Systems Support Specialist Houston UGM May 2014 1 Agenda Introduction Lead Systems Support Specialist Cluster Decisions Job
NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist
NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get
Intel DPDK Boosts Server Appliance Performance White Paper
Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks
ScaleArc idb Solution for SQL Server Deployments
ScaleArc idb Solution for SQL Server Deployments Objective This technology white paper describes the ScaleArc idb solution and outlines the benefits of scaling, load balancing, caching, SQL instrumentation
Practical Data Visualization and Virtual Reality. Virtual Reality VR Software and Programming. Karljohan Lundin Palmerius
Practical Data Visualization and Virtual Reality Virtual Reality VR Software and Programming Karljohan Lundin Palmerius Synopsis Scene graphs Event systems Multi screen output and synchronization VR software
Radeon HD 2900 and Geometry Generation. Michael Doggett
Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command
Writing Applications for the GPU Using the RapidMind Development Platform
Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...
Introduction to Computer Graphics
Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 [email protected] www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics
OpenGL Performance Tuning
OpenGL Performance Tuning Evan Hart ATI Pipeline slides courtesy John Spitzer - NVIDIA Overview What to look for in tuning How it relates to the graphics pipeline Modern areas of interest Vertex Buffer
Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August Agenda Introduction The author and ARM Preliminary knowledge Unity Pro, OpenGL ES 3.0 Identify
Parallel Simplification of Large Meshes on PC Clusters
Parallel Simplification of Large Meshes on PC Clusters Hua Xiong, Xiaohong Jiang, Yaping Zhang, Jiaoying Shi State Key Lab of CAD&CG, College of Computer Science Zhejiang University Hangzhou, China April
NVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST
SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software Accessory supports up to 4 display monitor(s) without DisplayPort 4 x Maximum Display
Optimizing AAA Games for Mobile Platforms
Optimizing AAA Games for Mobile Platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Epic Games, Unreal Engine 15 years in the industry 30 years of programming C64 demo
ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU
Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents
Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan
Vulkan on NVIDIA GPUs Piers Daniell, Driver Software Engineer, OpenGL and Vulkan Who am I? Piers Daniell @piers_daniell Driver Software Engineer - OpenGL, OpenGL ES, Vulkan NVIDIA Khronos representative
NVIDIA Quadro M4000 Sync PNY Part Number: VCQM4000SYNC-PB. User Guide
NVIDIA Quadro M4000 Sync PNY Part Number: VCQM4000SYNC-PB User Guide PNY 100 Jefferson Road Parsippany NJ 07054-0218 973-515-9700 www.pny.com/quadro Features and specifications are subject to change without
SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST
SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software Accessory 4 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort
The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA
The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques
Hardware design for ray tracing
Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex
ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer
ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop Emily Apsey Performance Engineer Presentation Overview What it takes to successfully virtualize ArcGIS Pro in Citrix XenApp and XenDesktop - Shareable
2020 Design Update 11.3. Release Notes November 10, 2015
2020 Design Update 11.3 Release Notes November 10, 2015 Contents Introduction... 1 System Requirements... 2 Actively Supported Operating Systems... 2 Hardware Requirements (Minimum)... 2 Hardware Requirements
DB2 Connect for NT and the Microsoft Windows NT Load Balancing Service
DB2 Connect for NT and the Microsoft Windows NT Load Balancing Service Achieving Scalability and High Availability Abstract DB2 Connect Enterprise Edition for Windows NT provides fast and robust connectivity
Next Generation Operating Systems
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the
Low power GPUs a view from the industry. Edvard Sørgård
Low power GPUs a view from the industry Edvard Sørgård 1 ARM in Trondheim Graphics technology design centre From 2006 acquisition of Falanx Microsystems AS Origin of the ARM Mali GPUs Main activities today
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH [email protected] SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH [email protected] SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA GFLOPS 3500 3000 NVPRO-PIPELINE Peak Double Precision FLOPS GPU perf improved
GRAPHICS PROCESSING REQUIREMENTS FOR ENABLING IMMERSIVE VR
GRAPHICS PROCESSING REQUIREMENTS FOR ENABLING IMMERSIVE VR By David Kanter Sponsored by AMD Over the last three decades, graphics technology has evolved and revolutionized the way that people interact
Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university
Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011 Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
Why You Need the EVGA e-geforce 6800 GS
Why You Need the EVGA e-geforce 6800 GS GeForce 6800 GS Profile NVIDIA s announcement of a new GPU product hailing from the now legendary GeForce 6 series adds new fire to the lineup in the form of the
How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen 4.2.2 Powergen 3
Profiling and Debugging Tools for High-performance Android Applications Stephen Jones, Product Line Manager, NVIDIA ([email protected]) Android By The Numbers 1.3M Android activations per day Android activations
RWTH GPU Cluster. Sandra Wienke [email protected] November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky
RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke [email protected] November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative
NVIDIA Tools For Profiling And Monitoring. David Goodwin
NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale
CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University
GPU Generations CSE 564: Visualization GPU Programming (First Steps) Klaus Mueller Computer Science Department Stony Brook University For the labs, 4th generation is desirable Graphics Hardware Pipeline
Software-defined Storage Architecture for Analytics Computing
Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture
IBM Deep Computing Visualization Offering
P - 271 IBM Deep Computing Visualization Offering Parijat Sharma, Infrastructure Solution Architect, IBM India Pvt Ltd. email: [email protected] Summary Deep Computing Visualization in Oil & Gas
An Efficient Application Virtualization Mechanism using Separated Software Execution System
An Efficient Application Virtualization Mechanism using Separated Software Execution System Su-Min Jang, Won-Hyuk Choi and Won-Young Kim Cloud Computing Research Department, Electronics and Telecommunications
GPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008
Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer
High Performance GPGPU Computer for Embedded Systems
High Performance GPGPU Computer for Embedded Systems Author: Dan Mor, Aitech Product Manager September 2015 Contents 1. Introduction... 3 2. Existing Challenges in Modern Embedded Systems... 3 2.1. Not
Course Overview. CSCI 480 Computer Graphics Lecture 1. Administrative Issues Modeling Animation Rendering OpenGL Programming [Angel Ch.
CSCI 480 Computer Graphics Lecture 1 Course Overview January 14, 2013 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s13/ Administrative Issues Modeling Animation
Introduction to Computer Graphics. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012
CSE 167: Introduction to Computer Graphics Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012 Today Course organization Course overview 2 Course Staff Instructor Jürgen Schulze,
HP Z Workstations graphics card options
Sales guide HP Z Workstations graphics card options Quick reference guide Table of contents Desktop Workstations compatibility... 3 Mobile and All-in-One Workstations compatibility... 4 Compare features:
EBERSPÄCHER ELECTRONICS automotive bus systems. solutions for network analysis
EBERSPÄCHER ELECTRONICS automotive bus systems solutions for network analysis DRIVING THE MOBILITY OF TOMORROW 2 AUTOmotive bus systems System Overview Analyzing Networks in all Development Phases Control
Dynamic Resolution Rendering
Dynamic Resolution Rendering Doug Binks Introduction The resolution selection screen has been one of the defining aspects of PC gaming since the birth of games. In this whitepaper and the accompanying
GPU Profiling with AMD CodeXL
GPU Profiling with AMD CodeXL Software Profiling Course Hannes Würfel OUTLINE 1. Motivation 2. GPU Recap 3. OpenCL 4. CodeXL Overview 5. CodeXL Internals 6. CodeXL Profiling 7. CodeXL Debugging 8. Sources
Web Based 3D Visualization for COMSOL Multiphysics
Web Based 3D Visualization for COMSOL Multiphysics M. Jüttner* 1, S. Grabmaier 1, W. M. Rucker 1 1 University of Stuttgart Institute for Theory of Electrical Engineering *Corresponding author: Pfaffenwaldring
Data Visualization Using Hardware Accelerated Spline Interpolation
Data Visualization Using Hardware Accelerated Spline Interpolation Petr Kadlec [email protected] Marek Gayer [email protected] Czech Technical University Department of Computer Science and Engineering
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Brains Eden, 28 th June 2013 Agenda Introduction The author ARM Ltd. What do you need to have What do you need to know Identify
Technical Specifications: tog Live
s: tog Live TEXT AND TICKERS SPORTS INSERTION VIRTUAL STUDIOS AUGMENTED REALITY tog Live: the playout engine for all RT Software tog products tog 3d Live is RT Software s core render technology responsible
Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.
SAPPHIRE FleX HD 6870 1GB GDDE5 SAPPHIRE HD 6870 FleX can support three DVI monitors in Eyefinity mode and deliver a true SLS (Single Large Surface) work area without the need for costly active adapters.
Parallel Large-Scale Visualization
Parallel Large-Scale Visualization Aaron Birkland Cornell Center for Advanced Computing Data Analysis on Ranger January 2012 Parallel Visualization Why? Performance Processing may be too slow on one CPU
3D Computer Games History and Technology
3D Computer Games History and Technology VRVis Research Center http://www.vrvis.at Lecture Outline Overview of the last 10-15 15 years A look at seminal 3D computer games Most important techniques employed
How To Create A Flood Simulator For A Web Browser (For Free)
Interactive Web-based Flood Simulation System for Realistic Experiments of Flooding and Flood Damage Ibrahim Demir Big Data We are generating data on a petabyte scale through observations and modeling
