Technical Brief. Fast Texture Downloads and Readbacks using Pixel Buffer Objects in OpenGL. August 2005 TB-02011-001_v01

Similar documents
Technical Brief. DualNet with Teaming Advanced Networking. October 2006 TB _v02

Application Note. nforce 220/420 Platform Installing nforce Core Drivers Under Windows 98

NVIDIA GeForce Experience

Technical Brief. Introducing Hybrid SLI Technology. March 11, 2008 TB _v01

Technical Brief. NVIDIA nview Display Management Software. May 2009 TB _v02

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA Tesla Compute Cluster Driver for Windows

Technical Brief. MediaShield Storage Technology: Confidently Storing Your Digital Assets. March 2007 TB _v03

Black-Scholes option pricing. Victor Podlozhnyuk

Intel 810 and 815 Chipset Family Dynamic Video Memory Technology

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

GRID VGPU FOR VMWARE VSPHERE

QUADRO POWER GUIDELINES

How To Write A Cg Toolkit Program

Tegra Android Accelerometer Whitepaper

NVIDIA VIDEO ENCODER 5.0

NVIDIA NVIEW DESKTOP MANAGEMENT SOFTWARE

Monte-Carlo Option Pricing. Victor Podlozhnyuk

TESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE

GeForce Drivers NVIDIA Control Panel Quick Start Guide. Driver Release 174/175 for Windows NVIDIA Corporation

NVIDIA Graphics Card and Driver Installation Guide (Windows NT 4.0 and Windows 2000)

Binomial option pricing model. Victor Podlozhnyuk

XID ERRORS. vr352 May XID Errors

Intel X38 Express Chipset Memory Technology and Configuration Guide

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

================================================================== CONTENTS ==================================================================

OpenCL Programming for the CUDA Architecture. Version 2.3

Atlas Creation Tool Using the Command-line Tool

Intel 845G/GL Chipset Dynamic Video Memory Technology

GPU Christmas Tree Rendering. Evan Hart

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

QUADRO AND NVS DISPLAY RESOLUTION SUPPORT

DeviceAnywhere Automation for Smartphones Setup Guide Windows Mobile

Several tips on how to choose a suitable computer

How to choose a suitable computer

Whitepaper. NVIDIA Miracast Wireless Display Architecture

Several tips on how to choose a suitable computer

NVIDIA GRID DASSAULT CATIA V5/V6 SCALABILITY GUIDE. NVIDIA Performance Engineering Labs PerfEngDoc-SG-DSC01v1 March 2016

QCD as a Video Game?

DUAL MONITOR DRIVER AND VBIOS UPDATE

TESLA K20X GPU ACCELERATOR

NVIDIA GRID 2.0 ENTERPRISE SOFTWARE

NVIDIA GRID K1 GRAPHICS BOARD

Autodesk Revit 2016 Product Line System Requirements and Recommendations

Installation Guide. (Version ) Midland Valley Exploration Ltd 144 West George Street Glasgow G2 2HG United Kingdom

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

GRID SDK FAQ. PG January Frequently Asked Questions

MetaMorph Microscopy Automation & Image Analysis Software Super-Resolution Module

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

EPI SUITE 6 INSTALLATION INSTRUCTIONS

Computer Graphics Hardware An Overview

Maximizing Backup and Restore Performance of Large Databases

GRID LICENSING. DU April User Guide

Release 295 Graphics Drivers for Windows - Version

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Windows Embedded Compact 7: RemoteFX and Remote Experience Thin Client Integration

Quest vworkspace Virtual Desktop Extensions for Linux

VIRTU Universal MVP Installation Guide

Parallel Prefix Sum (Scan) with CUDA. Mark Harris

NVIDIA Mosaic Technology

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Steps to Migrating to a Private Cloud

Migrating Cirrus. Revised 7/19/2007

White Paper A Case Study Comparing Data Access Methods Using Remote Graphics Software and Siemens PLM Software, Teamcenter 2007 and NX 5

NVIDIA WMI. WP _v02 August White Paper

BIOS Update Release Notes

Intel WiDi Remote 1.0 Release Notes

Choosing a Computer for Running SLX, P3D, and P5

BIOS Update Release Notes

Deploying Microsoft RemoteFX on a Single Remote Desktop Virtualization Host Server Step-by-Step Guide

Hyper-V Server 2008 Getting Started Guide

Initial Hardware Estimation Guidelines. AgilePoint BPMS v5.0 SP1

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

Stream Processing on GPUs Using Distributed Multimedia Middleware

Quadro Professional Drivers Quadro FX 3800/4800/5800 and Quadro CX SDI User s Guide. Version 2.0

HOW MANY USERS CAN I GET ON A SERVER? This is a typical conversation we have with customers considering NVIDIA GRID vgpu:

TESLA K20 GPU ACCELERATOR

SURROUNDVIEW Installation and Setup User s Guide

Adobe Creative Cloud for teams

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

System requirements for Autodesk Building Design Suite 2017

Intel Desktop Board DG31GL

Tested product: Auslogics BoostSpeed

Keynote DeviceAnywhere/HP Application Lifecycle Management (HP ALM/QC) Integration Guide. TCE Automation 5.2

TEGRA LINUX DRIVER PACKAGE R21.1

Dell Statistica. Statistica Document Management System (SDMS) Requirements

================================================================== CONTENTS ==================================================================

Avoiding Catastrophic Performance Loss

PCI vs. PCI Express vs. AGP

Release 302 Graphics Drivers for Windows - Version

NetIQ Privileged User Manager

Basic ShadowProtect Troubleshooting

Intel 865G Chipset Dynamic Video Memory Technology

BrightStor ARCserve Backup for Windows

Data Sheet Graphic Cards for Fujitsu ESPRIMO PCs

Bandwidth Calculations for SA-1100 Processor LCD Displays

Symantec Backup Exec 2010 R2. Quick Installation Guide

Adobe Creative Cloud for teams

Transcription:

Technical Brief Fast Texture Downloads and Readbacks using Pixel Buffer Objects in OpenGL August 2005 TB-02011-001_v01

Document Change History Version Date Responsible Reason for Change 01 00/00/00 Initial release 02 08/17/05 Ikrima Elhassan 03 08/17/05 Ikrima Elhassan Added Chris, Mark, and Eric s suggestions Added perf #s and section about motherboards & chipsets DA-00xxx-001_v01 9/28/2005 NVIDIA CONFIDENTIAL i

Preface Bandwidth bottleneck is a common occurrence in many applications. This technical brief discusses how to achieve efficient bandwidth rates in your application, including transfers from and to the graphics processing unit (GPU). Ikrima Elhassan sdkfeedback@nvidia.com NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 August 16, 2005 TB-02011-001_v01 1 08/17/05

Achieving Efficient Bandwidth Rates Graphics applications are often bandwidth bound. To make matters worse, non PCI-e video cards have asymmetric bandwidth rates, with slower readback rates than download rates. Unfortunately, applications often require reading the frame buffer. For example, some applications write intermediary results to the frame buffer and use it as an input to additional rendering or computation passes. This paper aims to discuss ways to achieve fast transfers by: Using an nforce3 or comparable AGP chipset Using pixel sizes that are multiples of 32 bits to avoid data padding. Storing 8-bit textures in a BGRA layout in system memory and use the GL_BGRA as the external format for textures to avoid swizzling. Using a PBO to transfer data to and from the GPU. If possible, using multiple PBOs to implement an asynchronous readback scheme. Motherboard System Configuration The motherboard system configuration plays an important role in achieving fast texture downloads and readbacks. For both AGP 8x and PCIe machines, use an nforce3 or comparable AGP chipset to achieve higher transfer rates. Using a Geforce 6-series card or higher with these chipsets allows for higher speed readbacks than when using an nforce2 or comparable chipset. Set the Correct Image Format As a developer, you need to pay attention to the texture formats to ensure efficient transfer rates. Different texture formats affect performance differently. Pixel Size To begin with, make sure the pixel sizes are an integer multiple of 32-bits; if you fail to do this, the driver will perform data padding, which causes the transfer rate to slow down. TB-02011-001_v01 1 08/17/05

Achieving Optimal Bandwidth Rates Pixel Format For 8-bit textures, NVIDIA graphics cards are built to match the Microsoft GDI pixel layout, so make sure the pixel format in system memory is BGRA. Why are these formats important? Because if the texture in system memory is laid out in RGBA, the driver has to swizzle the incoming pixels to BGRA, which slows down the transfer rate. For example, in the case of glteximage2d(), the format argument specifies how to interpret the data that is laid out in memory (such as GL_BGRA, GL_RGBA, or GL_RED); the internalformat argument specifies how the graphics card internally stores the pixel data in terms of bits (GL_RGB16, GL_RGBA8, and GL_R3_G3_B2, to name a few). To make matters more confusing, OpenGL allows you to specify GL_RGBA as an internal format, but this is taken to mean GL_RGBA8. It is always best to explicitly specify the number of bits in the internal format. Refer to Table 1 to see the performance impact of using non-optimal texture formats. Note, this is not the case with 16-bit and 32-bit floating point formats. Here are code snippets showing what to avoid and how to achieve fast transfer rates: //These calls will cause a slow down because of driver swizzling glteximage2d(gl_texture_rectangle_nv, 0, GL_RGBA8, img_width, img_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, img_data); glteximage2d(gl_texture_rectangle_nv, 0, GL_RGBA, img_width, img_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, img_data); glteximage2d(gl_texture_rectangle_nv, 0, GL_FLOAT_RGBA16_NV, img_width, img_height, 0, GL_BGRA, GL_HALF_FLOAT_NV, img_data); glteximage2d(gl_texture_rectangle_nv, 0, GL_FLOAT_RGBA_NV, img_width, img_height, 0, GL_BGRA, GL_FLOAT_NV, img_data); //These calls would not require unnecessary swizzling glteximage2d(gl_texture_rectangle_nv, 0, GL_RGBA8, img_width, img_height, 0, GL_BGRA, GL_UNSIGNED_BYTE, img_data); glteximage2d(gl_texture_rectangle_nv, 0, GL_RGBA, img_width, img_height, 0, GL_BGRA, GL_UNSIGNED_BYTE, img_data); glteximage2d(gl_texture_rectangle_nv, 0, GL_FLOAT_RGBA16_NV, img_width, img_height, 0, GL_RGBA, GL_HALF_FLOAT_NV, img_data); glteximage2d(gl_texture_rectangle_nv, 0, GL_FLOAT_RGBA_NV, img_width, img_height, 0, GL_RGBA, GL_FLOAT_NV, img_data); TB-02011-001_v01 2 08/17/05 NVIDIA CONFIDENTIAL

Achieving Optimal Bandwidth Rates Table 1. Performance Impact of Using Non-Optimal Texture Formats using an AMD64 Athlon 3500+, 1 GB of RAM, using an nforce4 chipset. A 1024x1024 texture is used for measuring transfer Readback FX8RGBA FX8BGRA FP16RGBA FP16BGRA FP32RGBA FP32BGRA GeForce 6800 Ultra 768 MB/s 790 MB/s 602 MB/s 650 MB/s 568 MB/s 581 MB/s GeForce 7800 GT Quadro FX 4400 Quadro FX 4500 Download GeForce 6800 Ultra GeForce 7800 GT Quadro FX 4400 Quadro FX 4500 770 MB/s 718 MB/s 630 MB/s 653 MB/s 640 MB/s 677 MB/s 789 MB/s 824 MB/s 787 MB/s 827 MB/s 710 MB/s 738 MB/s 745 MB/s 805 MB/s 760 MB/s 806 MB/s 770 MB/s 810 MB/s 480 MB/s 1596 MB/s 1583 MB/s 490 MB/s 1734 MB/s 480 MB/s 483 MB/s 2238 MB/s 1987 MB/s 527 MB/s 2367 MB/s 525 MB/s 471 MB/s 2167 MB/s 2157 MB/s 503 MB/s 2237 MB/s 463 MB/s 490 MB/s 2605 MB/s 2613 MB/s 493 MB/s 2689 MB/s 527 MB/s Use the Pixel Buffer Object Extension To achieve fast transfers to and from the graphics card, applications should use the Pixel Buffer Object (PBO) extension. Conceptually, a PBOs is simply an array of bytes in memory. To achieve a fast readback, bind the buffer object to GL_PIXEL_PACK_ BUFFER using glbindbufferarb() After a buffer is bound, glreadpixels() will pack (write) data into the Pixel Buffer Object. To download data to the GPU, bind the buffer to the GL_PIXEL_UNPACK_ BUFFER target using glbindbufferarb(). The glteximage2d() command will then unpack (read) their data from the buffer object. To modify the data in the PBO, use glmapbuffer() to retrieve a pointer to the PBO s data. Once data modification is completed, the application must issue a glunmapbuffer(). All updates to the buffer object must be done between these two calls. These PBOs can improve performance because they allow the driver to streamline reading and writing to and from video memory. For example, when streaming TB-02011-001_v01 3 08/17/05 NVIDIA CONFIDENTIAL

Achieving Optimal Bandwidth Rates textures, using PBOs and glmapbuffer() and glunmapbuffer() usually eliminates an expensive data copy that is usually required for downloading a texture to the GPU. Implement an Asynchronous Readback Problem OpenGL makes it difficult to pipeline readback of multiple images. For example, if the application requests readback data using glreadpixels, the driver often has to send the hardware a readback command and wait for all the data to return before it can let the application proceed. This stall prevents the application from processing the readback data while kicking off another glreadpixels(). Moreover, stalls can occur because any impending commands to the frame buffer must be completed before readback can begin. This is essentially the same as a glfinish() before every glreadpixels(). Solutions If, however, you use PBOs, the application can work around these stalls and perform asynchronous readback of the frame buffer. When you use PBOs, glreadpixels()returns asynchronously and doesn t wait for the data to return from the GPU. Instead, the driver ensures that all the data to be read back is ready when the application issues the glmapbuffer() command. By using multiple PBOs, such as the two explained below, the application can asynchronously read back data. Split the Frame Buffer into Multiple Portions One way of asynchronously reading back data is to split the frame buffer into multiple portions and map those into different PBOs. For example, if we split the frame buffer in half, the application could issue glreadpixels() from the top half of the frame buffer into PBO1, and issue glreadpixels() from the bottom half of the frame buffer into PBO2. Because glreadpixels() returns asynchronously when using PBOs, both transfers are kicked off simultaneously. Then, the application can map the top portion of the frame into PBO1, causing any outstanding Direct Memory Access (DMA) transfers to finish. The benefit is that the application is not being stalled by DMA transfers into the bottom half of the frame. This means the application can perform calculations on the readback data from the first half of the frame buffer while data is still being readback from the second portion of the frame. This allows the application to pipeline CPU processing of readback data and continue to read back data from the frame buffer. Be aware, however, this method still requires rendering of the image TB-02011-001_v01 4 08/17/05 NVIDIA CONFIDENTIAL

Achieving Optimal Bandwidth Rates to complete before it is read back. In other words, there is always an implicit glfinish() before each glreadpixels(). Map Different Frames to Different PBOs Another way of asynchronously reading back data requires mapping different frames to different PBOs. For example, using two PBOs, the application calls glreadpixels() into PBO1 at frame n. At frame n+1, the application calls glreadpixels() into PBO2 and then processes the data in PBO1. Ideally, enough time should pass between the readback calls to allow the first to complete to before the CPU begins processing it. By alternating between PBO1 and PBO2 on every frame, asynchronous readback can be achieved. Moreover, applications can wait two or more frames. Conclusion To achieve efficient bandwidth rates, applications should implement fast download and readback paths. Using the techniques outlined in this technical brief should improve performance of data transfers. TB-02011-001_v01 5 08/17/05 NVIDIA CONFIDENTIAL

Notice ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, MATERIALS ) ARE BEING PROVIDED AS IS. NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation. Trademarks NVIDIA, the NVIDIA logo, GeForce, and Quadro are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. Copyright 2005 by NVIDIA Corporation. All rights reserved. NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 www.nvidia.com