AMD GPU Association
Targeting GPUs for Load Balancing in OpenGL

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS" AND AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS PUBLICATION AND RESERVES THE RIGHT TO MAKE CHANGES TO SPECIFICATIONS AND PRODUCT DESCRIPTIONS AT ANY TIME WITHOUT NOTICE. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. EXCEPT AS SET FORTH IN AMD'S STANDARD TERMS AND CONDITIONS OF SALE, AMD ASSUMES NO LIABILITY WHATSOEVER, AND DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR INFRINGEMENT OF ANY INTELLECTUAL PROPERTY RIGHT. AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant in the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's products could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, CrossFire, AMD FirePro, ATI Radeon, ATI FireGL, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Windows is a registered trademark of Microsoft Corporation. Other names used in this publication may be trademarks of their respective companies.
Introduction

In modern workstation and gaming systems it is increasingly common to see multiple GPUs. Performance gains can be seen on these systems by enabling ATI CrossFire mode. Multiple GPUs also enable a system to support more display devices. There are many benefits to having multiple graphics cards in a workstation.

The computing industry has seen similar trends with CPUs. Originally, multiple-CPU configurations were reserved for servers and provided little benefit on end-user PCs. In the last few years, systems with at least two CPUs have become very common. More importantly, applications and operating systems can now take better advantage of the multiple processing units. The industry is again on the verge of a shift as graphics processors have become smaller and more powerful and have found their way into commonly used devices. What is currently lacking is a way to make efficient use of multiple GPUs, especially when there are more than two or when they are not configured for ATI CrossFire.

GPU Association
Picking a GPU

AMD has provided a method for addressing the problem of not having access to all GPUs, specifically for OpenGL. A new extension, WGL_AMD_GPU_association, gives applications access to specific AMD GPUs. This allows applications to make intelligent decisions about how to allocate and execute rendering tasks most efficiently. The extension is available in Catalyst 9.3 and newer driver kits.

The extension is written against OpenGL 1.5, but it is really a WGL extension that has limited direct interaction with GL. It functions equally well on OpenGL 2.0, 2.1, 3.0, 3.1, and beyond. Currently only OpenGL 2.1 and later are available on AMD drivers; OpenGL 2.1 is backward compatible with OpenGL 1.5.

WGL_AMD_GPU_association provides several key pieces of functionality that make efficient distribution of rendering possible. The first is a method of determining what graphics resources are available in a system.
Next is a way to allocate OpenGL contexts on specific GPUs and to query where non-associated contexts have been allocated. Last is a new path for fast and efficient data transfer between contexts.

Determining Topology
GPU Count

To make efficient use of a system's graphics resources, an application must first know what GPUs are available. WGL_AMD_GPU_association provides a set of interfaces to do exactly that. wglGetGPUIDsAMD allows an application to get an enumerated list of all graphics processors in the system. Each is identified by a unique ID. IDs are valid for the current instance of the operating system and the current configuration. Rebooting a system, disabling resources, or otherwise fundamentally
changing the system configuration could invalidate the list of IDs. Caching the IDs between reboots and expecting all resources to be the same is invalid.

    UINT wglGetGPUIDsAMD(UINT maxcount, UINT *ids);

To use this function, allocate an array of unsigned integers and pass a pointer to it into wglGetGPUIDsAMD along with the size of the array. An array of 16 values will be sufficient for the near future. The value returned by this function is the actual number of GPUs available in the system. In most cases, this value will be the number of IDs written to the ids array. However, if the array was too small to hold the complete list, the returned number of available GPUs will be larger than the maxcount passed into the function. In this case only maxcount values will be written to the array. The ID 0 will never be returned. If the ids pointer is null, no values will be written, but the total number of GPUs in the system will still be returned. An application can therefore first call wglGetGPUIDsAMD with a null pointer to determine an appropriate size for the array.

GPU Properties

Once the list of GPUs is known, an application needs to determine what each GPU is capable of in order to make an educated decision on how to distribute rendering. The function wglGetGPUInfoAMD can be used for this purpose.

    INT wglGetGPUInfoAMD(UINT id, INT property, GLenum datatype, UINT size, void *data);

To determine the capabilities of a GPU enumerated by wglGetGPUIDsAMD, pass its ID into the id field of wglGetGPUInfoAMD. The properties that can be queried are listed in Table 1 below. These enums are passed into the property field. The datatype field specifies the return type of the data requested by the application. Valid values are GL_UNSIGNED_INT, GL_INT, GL_FLOAT, and GL_UNSIGNED_SHORT. If the datatype value is not an appropriate match for the property requested, the function will simply return -1.
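The enumeration contract described above (return the total count, write at most maxcount IDs, accept a null pointer) suggests a two-call pattern: ask for the count, allocate, then fetch. The sketch below illustrates that pattern under stated assumptions. `fake_get_gpu_ids` is a hypothetical stand-in that only mimics the documented behavior of wglGetGPUIDsAMD so the logic can run without a driver, the IDs it reports are invented, and `enumerate_gpus` is an illustrative helper name, not part of the extension.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int UINT;

/* Hypothetical stand-in for wglGetGPUIDsAMD. It mimics only the contract
   described in the text: it returns the total number of GPUs, writes at
   most maxcount IDs, and tolerates a null ids pointer. IDs are invented. */
static UINT fake_get_gpu_ids(UINT maxcount, UINT *ids)
{
    static const UINT system_ids[] = { 1, 2, 3 };
    const UINT total = sizeof(system_ids) / sizeof(system_ids[0]);
    if (ids != NULL) {
        for (UINT i = 0; i < total && i < maxcount; i++)
            ids[i] = system_ids[i];
    }
    return total;
}

typedef UINT (*get_ids_fn)(UINT maxcount, UINT *ids);

/* Two-call pattern: query the count first, then fetch into an exactly
   sized array. Returns a malloc'ed array (caller frees) or NULL, and
   stores the number of valid entries in *count. */
static UINT *enumerate_gpus(get_ids_fn get_ids, UINT *count)
{
    assert(get_ids != NULL && count != NULL);
    *count = 0;

    UINT total = get_ids(0, NULL);
    if (total == 0)
        return NULL;

    UINT *ids = malloc(total * sizeof(*ids));
    if (ids == NULL)
        return NULL;

    /* The configuration could change between the two calls, so only the
       first min(reported, total) entries are treated as valid. */
    UINT reported = get_ids(total, ids);
    *count = reported < total ? reported : total;
    return ids;
}
```

In a real application the function pointer would be the wglGetGPUIDsAMD entry point obtained from the driver; passing it as a parameter here is only a convenience so the pattern can be exercised in isolation.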
Applications should not, for example, use type GL_FLOAT to query string values.

Table 1: Property values accepted by wglGetGPUInfoAMD

    WGL_GPU_OPENGL_VERSION_STRING_AMD
        Returns the OpenGL version for this GPU. This corresponds to a call to glGetString(GL_VERSION), but can be done before creating an OpenGL context. Data type GL_UNSIGNED_SHORT should be used, and the values will be returned in an array with the array size returned by the function.

    WGL_GPU_RENDERER_STRING_AMD
        Returns the renderer string for this GPU. This corresponds to a call to glGetString(GL_RENDERER), but can be done before creating an OpenGL context. This will be the proprietary GPU name. Data type GL_UNSIGNED_SHORT should be used, and the values will be returned in an array with the array size returned by the function.

    WGL_GPU_FASTEST_TARGET_GPUS_AMD
        Returns an array of GPU IDs ordered from the fastest at index 0 to the slowest at index size-1. The method used to determine GPU ordering is proprietary, but includes GPU family as well as clock speeds. This is not simply a list sorted by clock speed.

    WGL_GPU_RAM_AMD
        Returns the amount of GPU RAM in megabytes. This is a single value.

    WGL_GPU_CLOCK_AMD
        Returns the GPU clock frequency in megahertz. This is a single value.

    WGL_GPU_NUM_PIPES_AMD
        Returns the number of 3D pipes on the GPU. This is a single value.

    WGL_GPU_NUM_SIMD_AMD
        Returns the number of SIMD units in each shader pipe. This is a single value.

    WGL_GPU_NUM_RB_AMD
        Returns the number of render backends. This is a single value.

    WGL_GPU_NUM_SPI_AMD
        Returns the number of shader parameter interpolators. This is a single value.

These GPU properties can be queried and used to find the best GPU match for a specific rendering task. For instance, one context may be heavily texture or renderbuffer dependent, and the application should use WGL_GPU_RAM_AMD as the first sorting parameter. A second context may have extensive shader computations, for which the application could use WGL_GPU_NUM_SIMD_AMD to determine the best GPU.

Creating Contexts

On AMD hardware, an OpenGL context is automatically associated with the card attached to the display the window was created on. For example, if an application creates a window and an OpenGL context on display 2, which is attached to a secondary card, the OpenGL context will run natively on this secondary card.

After an application has determined which contexts to use based on the resources and capabilities of each GPU, it can create contexts using the appropriate IDs.
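As a sketch of the selection step described above, the function below picks the GPU with the most memory and breaks ties by SIMD count, while skipping the GPU that drives the window. The `gpu_props` struct and `pick_gpu_for_texture_work` are hypothetical names introduced for illustration. In a real application the fields would be filled from wglGetGPUInfoAMD queries, and the tie-break rule is only one plausible policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record holding values an application might have queried
   with WGL_GPU_RAM_AMD and WGL_GPU_NUM_SIMD_AMD. */
typedef struct {
    unsigned int id;        /* GPU ID; 0 is never a valid ID */
    unsigned int ram_mb;    /* GPU RAM in megabytes */
    unsigned int num_simd;  /* SIMD units per shader pipe */
} gpu_props;

/* Returns the ID of the GPU with the most RAM, excluding exclude_id
   (for example, the GPU driving the window). Ties go to the GPU with the
   higher SIMD count. Returns 0 if no candidate exists. */
static unsigned int pick_gpu_for_texture_work(const gpu_props *gpus,
                                              size_t count,
                                              unsigned int exclude_id)
{
    assert(gpus != NULL || count == 0);
    const gpu_props *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const gpu_props *g = &gpus[i];
        if (g->id == 0 || g->id == exclude_id)
            continue;
        if (best == NULL ||
            g->ram_mb > best->ram_mb ||
            (g->ram_mb == best->ram_mb && g->num_simd > best->num_simd))
            best = g;
    }
    return best != NULL ? best->id : 0;
}
```

A shader-heavy task could use the same loop with the two comparison keys swapped, which keeps the policy in one place while the queried data stays unchanged.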
Applications may always want to create a local, native context on the primary card. Once this is done, information about the card this unassociated context was created on can still be queried by getting the ID using wglGetContextGPUIDAMD and then performing the queries described above. This provides the information necessary to make sure that off-screen contexts do not collide with the native context GPU.

    UINT wglGetContextGPUIDAMD(HGLRC hglrc);

To find the ID value for a previously created context, use the wglGetContextGPUIDAMD function. The HGLRC passed into this function can be from an associated context, or from a generic context created by calling wglCreateContext or wglCreateContextAttribsARB. The value returned is the ID of the GPU the context is tied to. If the HGLRC passed in is invalid or an error has occurred, the function fails and returns 0.

An associated context is created by calling wglCreateAssociatedContextAMD. This function behaves similarly to wglCreateContext but takes a GPU ID instead of an HDC.

    HGLRC wglCreateAssociatedContextAMD(UINT id);

Pass the ID of the GPU this context is intended to run on. A device handle is not necessary because this context is not attached to a display device. Instead it is attached to the specified GPU. It is important to note that this context is not associated with or attached to a window.

Additionally, a specific type of associated GL context can be created with wglCreateAssociatedContextAttribsAMD. The attributes specified here are the same as those specified for wglCreateContextAttribsARB. This function allows applications to specify the version and type of associated context to be created.

    HGLRC wglCreateAssociatedContextAttribsAMD(UINT id, HGLRC hsharecontext, const int *attriblist);

To delete an associated context, call wglDeleteAssociatedContextAMD. This function only accepts HGLRCs that were created by calling wglCreateAssociatedContextAMD or wglCreateAssociatedContextAttribsAMD. For any other HGLRC, the function fails and returns false. Note that associated contexts should not be deleted by calling wglDeleteContext. That call will also fail and may result in undefined behavior.

    BOOL wglDeleteAssociatedContextAMD(HGLRC hglrc);

Rendering with an Associated Context

Once associated contexts are created, they can be bound for use by calling wglMakeAssociatedContextCurrentAMD. The same rules apply to this function as to wglDeleteAssociatedContextAMD: HGLRCs must have been created by calling wglCreateAssociatedContextAMD or wglCreateAssociatedContextAttribsAMD, and associated contexts must not be made current by calling wglMakeCurrent.

    BOOL wglMakeAssociatedContextCurrentAMD(HGLRC hglrc);
Note that only one context can be current to a thread at a time, regardless of which creation function was used to generate it.

A method to query the currently bound associated context is also provided through wglGetCurrentAssociatedContextAMD. Call this function to get the current associated context. If none is current, the function returns NULL. If a non-associated context is current, the function also returns NULL.

    HGLRC wglGetCurrentAssociatedContextAMD(void);

After making an associated context current, an application has to do several things before the context can be used. Because there are no windows attached to the associated context, there are also no drawable (or readable) surfaces. Essentially, the default framebuffer object is invalid for rendering, and attempting to render to it generates an error. The reasoning behind this is that the GL does not know what kind of rendering the associated context will be used for. Applications have complete flexibility to create whatever surfaces they need. Additionally, the associated context does not consume GPU resources by allocating a drawable surface that most applications would not use.

To start, create renderbuffers with the desired size and format. Then attach them to a framebuffer object and bind that object. Once the new framebuffer is FRAMEBUFFER_COMPLETE, the context can be used for rendering. Use the context as any GL context with a framebuffer object attached.

Calling wglSwapBuffers or wglSwapLayerBuffers has no effect on an associated context because there is no window attached to the context.

Sharing Pixel Data Between Contexts

One way to move data between contexts is to copy pixels or data out of GPU memory into system memory, and then copy the data back up to the other context. This is functional, but not very efficient. Another option is to share data between contexts and access the data directly.
However, associated contexts that are not associated with the same GPU cannot share data because they do not reside on the same physical hardware.

A new data transfer function has been created to allow an application to transfer data between contexts quickly and efficiently. This function is called wglBlitContextFramebufferAMD. It can be used to push data from the attached framebuffer in the current context to another context. This interface follows all of the rules defined in EXT_framebuffer_blit for the glBlitFramebufferEXT interface.

    VOID wglBlitContextFramebufferAMD(HGLRC dstctx,
                                      GLint srcx0, GLint srcy0, GLint srcx1, GLint srcy1,
                                      GLint dstx0, GLint dsty0, GLint dstx1, GLint dsty1,
                                      GLbitfield mask, GLenum filter);
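Because the blit follows the EXT_framebuffer_blit rules, some argument mistakes can be caught before the call is issued. One such rule is that a mask containing the depth or stencil bit requires the GL_NEAREST filter. The sketch below is a hypothetical pre-flight check, not part of the extension. `check_blit_args` and `blit_stretches` are illustrative names, contexts are treated as opaque pointers, and the numeric constants are copied from the standard OpenGL headers under local names so the example compiles without them. It also assumes that the destination context must be valid and must differ from the current (source) context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Values copied from the OpenGL headers so the sketch compiles standalone. */
#define SK_DEPTH_BUFFER_BIT   0x00000100u
#define SK_STENCIL_BUFFER_BIT 0x00000400u
#define SK_COLOR_BUFFER_BIT   0x00004000u
#define SK_NEAREST            0x2600u
#define SK_LINEAR             0x2601u

typedef enum { BLIT_OK, BLIT_BAD_CONTEXT, BLIT_BAD_FILTER } blit_check;

/* Hypothetical pre-flight check for a cross-context blit. */
static blit_check check_blit_args(const void *current_ctx, const void *dst_ctx,
                                  unsigned int mask, unsigned int filter)
{
    /* A context must be current, and it cannot also be the destination. */
    if (current_ctx == NULL || dst_ctx == NULL || current_ctx == dst_ctx)
        return BLIT_BAD_CONTEXT;
    /* Depth and stencil data can only be copied with nearest filtering. */
    if ((mask & (SK_DEPTH_BUFFER_BIT | SK_STENCIL_BUFFER_BIT)) != 0 &&
        filter != SK_NEAREST)
        return BLIT_BAD_FILTER;
    return BLIT_OK;
}

/* A blit stretches when the source and destination rectangles differ in
   size; only then does the filter affect the result. */
static int blit_stretches(int sx0, int sy0, int sx1, int sy1,
                          int dx0, int dy0, int dx1, int dy1)
{
    return abs(sx1 - sx0) != abs(dx1 - dx0) ||
           abs(sy1 - sy0) != abs(dy1 - dy0);
}
```

A check like this does not replace querying the GL error state after the call. It only makes the most common argument mistakes visible earlier.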
Using wglBlitContextFramebufferAMD lets the OpenGL driver perform the data copy instead of the application. The driver is aware of the fastest mechanism for transferring data between contexts, which may take one of several paths that are considerably faster than copying to system memory and over to the second GPU.

Specify the destination context using the dstctx parameter. The source and destination regions are specified through the src and dst parameters. The mask parameter selects the renderbuffer types to copy: color, depth and/or stencil. The filter parameter specifies how the source image is interpolated when stretching is necessary.

All error behavior specified for EXT_framebuffer_blit also applies to WGL_AMD_GPU_association. Additionally, the source context (the current context) cannot be used as the destination context, the destination context must be a valid context, and a context must be current at the time of the call. Violating any of these conditions generates a GL_INVALID_OPERATION error in the GL error stream.

Make sure the proper framebuffers are bound to the GL_DRAW_FRAMEBUFFER_EXT and GL_READ_FRAMEBUFFER_EXT attachments in the destination and source contexts. wglBlitContextFramebufferAMD does not perform any synchronization on attached surfaces. The application must ensure that all pending rendering operations are complete on both the source and destination surfaces before executing the blit call.

Alternatives

There are several existing methods for distributing rendering, although each has limitations. First, an application can manually create contexts and windows on multiple GPUs. Data copying between contexts would have to be done through the CPU. The window drawables for the additional contexts will likely be wasted, as the application would in most cases not want to display off-screen rendering. One of the bigger limitations of this approach is that the application does not know the capabilities of the GPUs it is executing on.
The application also cannot be certain that these contexts are actually executing on separate GPUs.

Another alternative is to use WGL_NV_GPU_affinity. This extension also allows for selecting a target GPU for a context. It uses a DC tied to a GPU to accomplish this. This method can be error prone because of the requirement to match affinity DCs with affinity contexts. It also requires setting a pixel format for a specific DC, which is then inherited by contexts, even though rendering will generally be off-screen. The WGL_NV_GPU_affinity extension also does not provide a method for efficiently copying data between contexts.

Using WGL_AMD_GPU_association

Before using WGL_AMD_GPU_association, an application should test for its existence.

    const char *extensions = wglGetExtensionsStringARB(g_hdc);
    if (strstr(extensions, "WGL_AMD_gpu_association") != NULL)
    {
        // Get WGL_AMD_GPU_association entry points
        // Note that a GL context should be current at this time
    }

The main rendering context should be set up as always. This context will be used for displaying the output directly to the application window.

    g_hrcmain = wglCreateContext(g_hdc);
    wglMakeCurrent(g_hdc, g_hrcmain);
    // Also set up main context for rendering

Also determine the GPU ID for the main context.

    UINT nmaingpuid = wglGetContextGPUIDAMD(g_hrcmain);

Next, the list of GPUs available for associated rendering can be queried. Note that one of the GPUs will be nmaingpuid, which was returned by the call above and is also used for displaying to the window.

    UINT uigpuids[16] = {0};
    UINT maxgpus = wglGetGPUIDsAMD(16, uigpuids);
    if (maxgpus > 16)
    {
        // uigpuids was not large enough to hold all available GPUs;
        // call again with a larger array
    }

Once the GPU IDs are known, the individual attributes can be determined. For simplicity, we will only compare GPU clock speed to prioritize GPU usage, but applications should take all relevant information into account when choosing which GPU to use for a particular rendering task.

    UINT intdata[16] = {0};
    for (UINT i = 0; i < maxgpus; i++)
    {
        int nreturneddatacount = wglGetGPUInfoAMD(uigpuids[i], WGL_GPU_CLOCK_AMD,
                                                  GL_UNSIGNED_INT, sizeof(UINT), &intdata[i]);
        if (nreturneddatacount != 1)
        {
            // wglGetGPUInfoAMD failed. Possibly an invalid GPU ID was used.
        }
    }

Now that we have the clock speeds for the available GPUs, find the best candidate for off-screen processing.

    // Find the fastest GPU that is not used for displaying to the window
    UINT nfastestgpu = 0;        // 0 is never a valid GPU ID
    UINT nfastestgpuspeed = 0;
    for (UINT j = 0; j < maxgpus; j++)
    {
        if ((uigpuids[j] != nmaingpuid) && (intdata[j] > nfastestgpuspeed))
        {
            nfastestgpu = uigpuids[j];
            nfastestgpuspeed = intdata[j];
        }
    }

Check the highest supported OpenGL version for the fastest GPU.

    char chardata[64] = {0};
    int nreturneddatacount = wglGetGPUInfoAMD(nfastestgpu, WGL_GPU_OPENGL_VERSION_STRING_AMD,
                                              GL_UNSIGNED_BYTE, 64, chardata);
    if (nreturneddatacount < 1)
    {
        // An error occurred
    }
    else if ((chardata[0] == '3' && chardata[2] >= '1') || (chardata[0] > '3'))
    {
        // Can support an OpenGL 3.1 or greater context
    }

Create a new associated context for off-screen rendering on the candidate we just selected, requesting a specific context version.

    int attriblist[] = { WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
                         WGL_CONTEXT_MINOR_VERSION_ARB, 2,
                         0 };
    HGLRC hoffscrctx = wglCreateAssociatedContextAttribsAMD(nfastestgpu, NULL, attriblist);
    if (hoffscrctx == NULL)
    {
        // An error occurred
    }

Now that we have an associated context for off-screen rendering, make it current and render to it.

    wglMakeAssociatedContextCurrentAMD(hoffscrctx);
    // Set up render target
    UINT nshadowpassfboname = 0;
    glGenFramebuffers(1, &nshadowpassfboname);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, nshadowpassfboname);
    UINT nshadowpassrboname = 0;
    glGenRenderbuffers(1, &nshadowpassrboname);
    glBindRenderbuffer(GL_RENDERBUFFER, nshadowpassrboname);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, 1024, 768);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, nshadowpassrboname);
    // Begin off-screen rendering
    ...
    wglMakeCurrent(g_hdc, g_hrcmain);
    // Set up main context
    UINT nremotedatafboname = 0;
    glGenFramebuffers(1, &nremotedatafboname);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, nremotedatafboname);
    UINT nremotedatarboname = 0;
    glGenRenderbuffers(1, &nremotedatarboname);
    glBindRenderbuffer(GL_RENDERBUFFER, nremotedatarboname);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, 1024, 768);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, nremotedatarboname);

Once off-screen rendering is complete, the resulting image data can be brought back to the primary GPU for use with the final scene. The blit is issued from the off-screen context, because the current context is the source and cannot also be the destination. Depth data must be copied with the GL_NEAREST filter.

    // Copy data to RBO on main context
    wglMakeAssociatedContextCurrentAMD(hoffscrctx);
    wglBlitContextFramebufferAMD(g_hrcmain, 0, 0, 1024, 768, 0, 0, 1024, 768,
                                 GL_DEPTH_BUFFER_BIT, GL_NEAREST);

Now that the data from the associated (off-screen) context has been brought to the local GPU, it can be used locally. Once finished, clean up all contexts.

    // Clean up contexts
    wglMakeCurrent(g_hdc, NULL);
    wglDeleteAssociatedContextAMD(hoffscrctx);
    wglDeleteContext(g_hrcmain);

Synchronizing Data Transfers Between Contexts

OpenGL 3.2 added sync objects and fences to help multiple-context applications synchronize work without having to stall the graphics pipeline, among other reasons. This mechanism works well with WGL_AMD_GPU_association, allowing applications to synchronize the rendering and transfer of remote data. It can also be used on earlier versions of OpenGL if the extension GL_ARB_sync is supported.

To use sync objects in the example above, insert a sync object after remote rendering.

    wglMakeAssociatedContextCurrentAMD(hoffscrctx);
    // Render to FBO
    ...
    // Copy result to main context
    wglBlitContextFramebufferAMD(g_hrcmain, 0, 0, 1024, 768, 0, 0, 1024, 768,
                                 GL_DEPTH_BUFFER_BIT, GL_NEAREST);
    // Insert fence
    GLsync remotefence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

Then in the main thread, or in a thread created for reading the result of the sync object, test the status of the fence to find out whether the data is ready.

    // Main rendering loop
    ...
    // Test fence
    GLenum syncresult = glClientWaitSync(remotefence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
    if (syncresult == GL_CONDITION_SATISFIED || syncresult == GL_ALREADY_SIGNALED)
    {
        // Rendering complete and result ready
    }
    else if (syncresult == GL_TIMEOUT_EXPIRED || syncresult == GL_WAIT_FAILED)
    {
        // Result not ready yet (timeout expired) or the wait failed
    }

Once finished with the sync object, delete it.

    // Clean up fence
    glDeleteSync(remotefence);

The time it takes to copy data from one GPU to another is not insignificant. Because of this, it is important to plan what rendering should be done on remote GPUs, leaving time for copies to the main GPU.

Techniques to Efficiently Distribute Work

There are several well-known techniques for distributing rendering across multiple GPUs and combining the results into a final image. Examples are:

    o 2D decomposition (sort first)
    o Database decomposition (sort last)
    o Time-based decomposition
    o Eye decomposition for stereo rendering

All of them can be implemented using the WGL_AMD_GPU_association extension. To implement these techniques efficiently on multiple GPUs in a system, the application needs to create one rendering thread per GPU. Each GPU renders its portion of the final image, and when all are finished the sub-images are combined in a compositing step. Typically the rendering threads look like the ones shown below. Depending on the technique, the data that is blitted and the compositing step will differ. The following example shows how to implement database decomposition using WGL_AMD_GPU_association.
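For database decomposition, each rendering thread needs a disjoint range of the geometry. The sketch below shows one way such ranges might be computed, under the assumption that the geometry is an indexable list of elements. `draw_range` and `split_elements` are hypothetical names, and weighting each GPU by a single number such as its clock speed is a simplification. With equal weights it reduces to splitting the elements evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the slice of geometry one GPU renders. */
typedef struct {
    size_t first;  /* index of the first element */
    size_t count;  /* number of elements */
} draw_range;

/* Splits num_elements into one contiguous range per GPU, proportional to
   weights[i]. The remainder left by integer division goes to the last GPU
   so that the ranges cover every element exactly once. */
static void split_elements(size_t num_elements, const unsigned int *weights,
                           size_t num_gpus, draw_range *out)
{
    assert(weights != NULL && out != NULL && num_gpus > 0);

    unsigned long long total = 0;
    for (size_t i = 0; i < num_gpus; i++)
        total += weights[i];
    assert(total > 0);

    size_t next = 0;
    for (size_t i = 0; i < num_gpus; i++) {
        size_t share = (i + 1 == num_gpus)
            ? num_elements - next
            : (size_t)((unsigned long long)num_elements * weights[i] / total);
        out[i].first = next;
        out[i].count = share;
        next += share;
    }
}
```

Each thread would then draw only its own range. Because copies between GPUs take time, a real application would likely tune the weights experimentally instead of relying on clock speed alone.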
Example: Database Decomposition

The idea of database decomposition is to distribute the geometry across multiple GPUs and to compose the final image by taking the depth values into account. Each GPU renders only a subset of the total geometry and provides a color and a depth texture to the composing shader. Depending on the depth value, the shader chooses which texel to display.

The Master Thread:

First create two FBOs, each with a color and a depth attachment. One is used for local rendering, the second as the destination for the blit from the remote GPU. Render one half of the geometry into fbo:

    // Draw to local FBO
    glBindFramebufferEXT(GL_FRAMEBUFFER, fbo);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glPushMatrix();
    glRotatef(angle, 0.0f, 1.0f, 0.0f);
    // Draw only half of the elements
    ra->drawrange(0, ra->getnumelements()/2);
    glPopMatrix();
    // Bind fbo as destination for the blit
    glBindFramebufferEXT(GL_DRAW_FRAMEBUFFER, fbo);
    // Update rotation angle for the next frame
    angle += 0.05;
    // Indicate that master is ready -> slave will start blit
    ReleaseSemaphore(gMasterReady, 1, 0);
    // Wait for slave to finish blit
    WaitForSingleObject(gSlaveReady, INFINITE);
    // Compose results
    glBindFramebufferEXT(GL_FRAMEBUFFER, 0);
    if (greadytocompose)
        compose(pwin->getwidth(), pwin->getheight(), ct, dt);
    pwin->swapbuffer();
    greadytocompose = false;

The Slave Thread:

First create an FBO as render target.
Render the second half of the geometry into the FBO. As soon as the master has also finished rendering, trigger the blit and release the semaphore when ready.

    // Draw to local FBO
    glBindFramebufferEXT(GL_FRAMEBUFFER, fbo);
    // Draw scene into fbo that is bound to ac
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glPushMatrix();
    glRotatef(angle, 0.0f, 1.0f, 0.0f);
    // Draw the second half of the elements
    ra->drawrange(ra->getnumelements()/2, ra->getnumelements()/2);
    glPopMatrix();
    // Wait for master to be ready
    WaitForSingleObject(gMasterReady, INFINITE);
    // Blit into FBO on master GPU
    wglMakeAssociatedContextCurrentAMD(associatedglrc);
    wglBlitContextFramebufferAMD(glrc, 0, 0, w, h, 0, 0, w, h,
                                 GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT, GL_NEAREST);
    // Insert fence in GL stream to check when the blit is done
    BlitReadyFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    // Wait for blit to finish
    GLenum BlitStatus = glClientWaitSync(BlitReadyFence, GL_SYNC_FLUSH_COMMANDS_BIT, maxtimeout);
    if (BlitStatus == GL_CONDITION_SATISFIED || BlitStatus == GL_ALREADY_SIGNALED)
        greadytocompose = true;
    else
        greadytocompose = false;
    // Indicate that blit is finished
    ReleaseSemaphore(gSlaveReady, 1, 0);
    glDeleteSync(BlitReadyFence);

Composing:

Draw a screen-aligned quad with the following fragment shader bound.

    varying vec2 Texcoord;
    // Textures rendered by master
    uniform sampler2D color0;
    uniform sampler2D depth0;
    // Textures rendered by slave
    uniform sampler2D color1;
    uniform sampler2D depth1;

    void main()
    {
        float d0 = texture2D(depth0, Texcoord).r;
        float d1 = texture2D(depth1, Texcoord).r;
        vec4 c0 = texture2D(color0, Texcoord);
        vec4 c1 = texture2D(color1, Texcoord);
        if (d0 < d1)
            gl_FragColor = c0;
        else
            gl_FragColor = c1;
    }

The pictures below show the different color and depth textures that are used to produce the final image. The blue geometry was rendered on GPU 1, the red on GPU 2. This approach can easily be modified to compute shadow maps, environment maps or reflections on the second GPU.
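To check a composited frame offline, the selection rule of the composing shader can be mirrored on the CPU. The sketch below is only a reference for that rule, under the assumption that the color and depth images have been read back into plain arrays. `compose_by_depth` is a hypothetical helper, and packing colors as 32-bit integers is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU reference of the composing rule used by the fragment shader:
   for every pixel, keep the color whose depth value is smaller (closer).
   On equal depth the second image wins, matching the shader's else branch. */
static void compose_by_depth(const uint32_t *color0, const float *depth0,
                             const uint32_t *color1, const float *depth1,
                             uint32_t *out, size_t num_pixels)
{
    assert(color0 != NULL && depth0 != NULL);
    assert(color1 != NULL && depth1 != NULL);
    assert(out != NULL);
    for (size_t i = 0; i < num_pixels; i++)
        out[i] = (depth0[i] < depth1[i]) ? color0[i] : color1[i];
}
```

Comparing this output with a readback of the GPU result can help separate compositing mistakes from problems in the blit or the synchronization.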
Using Dissimilar GPUs

The ideal situation is to have multiple high-performance GPUs. But since the environment that applications run in cannot always be controlled, this extension tries to provide applications with as much environmental information as possible. An application can get both the local memory size and the clock speed of all available GPUs in a system. While an application can't necessarily change the GPU driving the display it is running on, it can control how much of the rendering process, and which portions of it, are done on which GPU. For cases where the local GPU is very low power, such as an integrated GPU, all rendering can be done on a remote GPU, with the result sent back to the GPU the window resides on.

The best option for using dissimilar GPUs may be to throttle the use of lower-power GPUs so that they can assist in rendering scenes while still completing off-screen rendering in time for the result to be useful. Because every system and GPU configuration may be different, testing experimentally with realistic loads is the best way to determine how much work to assign to them.

Interactions with Other Features

ATI CrossFire

ATI CrossFire and GPU association are two separate methods for accelerating performance on systems with multiple adapters. ATI CrossFire mode is enabled by a user, not by an application. When ATI CrossFire is enabled, the GPUs tied together through ATI CrossFire can only be addressed as a single GPU. Paired GPUs cannot individually be context targets. The GPU information returned when calling wglGetGPUInfoAMD with the ID of the ATI CrossFire pair is that of the most capable GPU.

GPU Load Balancing

Users may also enable GPU Load Balancing mode. When running in this configuration, the GPU a context resides on is selected dynamically based on usage parameters. However, creating a GPU-associated context overrides the GPU Load Balancing target.
MultiView

Normally, MultiView causes a context tied to a specific window to be allocated on the GPU that drives the monitor on which the window resides. When a context is created with the GPU association extension, the context is not tied directly to a specific window. The GPU association extension overrides a MultiView bias and allocates the context on the requested GPU.
System Configuration Compatibility

For these extensions to be effective, multiple ATI graphics cards must be present in a system at one time. Additionally, some operating system restrictions limit when this mode can be used.

Windows XP, Windows Vista and Windows 7

On Windows operating systems, all GPUs intended for use with WGL_AMD_GPU_association must be visible to the OS. This effectively means the Windows desktop must be extended to include at least one display head of each GPU intended for use in remote rendering. No applications or windows need be present on the additional GPU displays.

Availability

The WGL_AMD_GPU_association extension has shipped in ATI Radeon and ATI FirePro hardware drivers since Catalyst 9.3. It is supported on the following graphics cards:

Professional Graphics
    o ATI FirePro™ V3700, V3750, V5700, V8700 Series and newer
    o ATI FireGL™ V8600, V7600, V5600, V3600, V7700 Series and newer

Consumer Graphics
    o ATI Radeon™ HD4800, HD4600, HD4500, HD4300 Series Graphics and newer
    o ATI Radeon™ HD3800, HD3600, HD3400 Series Graphics and newer
    o ATI Radeon™ HD2900, HD2600, HD2400 Series Graphics and newer

Catalyst drivers can be downloaded from
Linux support will be released in the near future.

Conclusions

As graphics technologies advance and become more affordable and available, systems with more than one GPU are becoming commonplace. Harnessing all of this power, however, has remained a challenge. With extensions such as WGL_AMD_GPU_association, applications can take advantage of many different graphics resources in one system. WGL_AMD_GPU_association also allows applications to decide how to divide and distribute rendering tasks based on rendering load and internal application metrics.

Further Reading

    o OpenGL 3.2
    o WGL_AMD_GPU_association
Removed sections

Copying Objects Other Than Render Buffers

OpenGL 3.0 added
The cost in time to copy

GPU Association for Linux

GPU association is also supported on Linux if the extension GLX_AMD_GPU_association is supported. This extension is very similar to WGL_AMD_GPU_association and functions in the same way.

Use glXGetGPUIDsAMD to get the number of supported GPUs.

    UINT glXGetGPUIDsAMD(UINT maxcount, UINT *ids);

Use glXGetGPUInfoAMD to find out more information about a given GPU.

    INT glXGetGPUInfoAMD(UINT id, INT property, GLenum datatype, UINT size, void *data);

glXGetContextGPUIDAMD returns the ID of the GPU a context is executing on.

    UINT glXGetContextGPUIDAMD(GLXContext ctx);

To create a context that is associated with a specific GPU, call glXCreateAssociatedContextAMD.

    GLXContext glXCreateAssociatedContextAMD(UINT id, GLXContext share_context);

Use glXCreateAssociatedContextAttribsAMD to create a context with specific attributes that is also associated with a specific GPU.

    GLXContext glXCreateAssociatedContextAttribsAMD(UINT id, GLXContext share_context, const int *attrib_list);

Once finished with a context, call glXDeleteAssociatedContextAMD.

    BOOL glXDeleteAssociatedContextAMD(GLXContext ctx);
To use an associated context, call glXMakeAssociatedContextCurrentAMD.

BOOL glXMakeAssociatedContextCurrentAMD(GLXContext ctx);

To get the handle of the current associated context, call glXGetCurrentAssociatedContextAMD.

GLXContext glXGetCurrentAssociatedContextAMD();

To copy a rectangle of pixels from the current associated context's framebuffer into the framebuffer of another context, call glXBlitContextFramebufferAMD.

VOID glXBlitContextFramebufferAMD(GLXContext dstCtx, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);

These functions follow the same usage pattern as the WGL versions described in the previous sections. More details on the GLX interfaces can be found in the GLX_AMD_GPU_association extension specification.
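The per-frame flow implied by these calls (make the associated context current, draw, then blit the result to the context that owns the display) can be sketched as follows. This is an illustration under stated assumptions, not the extension's prescribed usage: it assumes, as with the WGL version, that the blit source is the current context. Contexts are represented by a void * stand-in for GLXContext, the entry points are passed in through a hypothetical AssocApi struct, and the two constants are assumed to equal GL_COLOR_BUFFER_BIT and GL_NEAREST. All names beginning with sketch_ or fake_ are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED to equal GL_COLOR_BUFFER_BIT and GL_NEAREST. */
#define SKETCH_COLOR_BUFFER_BIT 0x00004000u
#define SKETCH_NEAREST          0x2600u

typedef void *SketchContext;            /* stand-in for GLXContext */
typedef void (*DrawFn)(void);

/* Hypothetical table mirroring glXMakeAssociatedContextCurrentAMD and
   glXBlitContextFramebufferAMD. */
typedef struct {
    int  (*makeCurrent)(SketchContext ctx);
    void (*blit)(SketchContext dstCtx,
                 int srcX0, int srcY0, int srcX1, int srcY1,
                 int dstX0, int dstY0, int dstX1, int dstY1,
                 unsigned int mask, unsigned int filter);
} AssocApi;

/* Renders one frame on renderCtx and copies its color buffer, unscaled,
   into displayCtx. Returns 1 on success, 0 if makeCurrent fails. */
int render_offscreen_and_blit(const AssocApi *api,
                              SketchContext renderCtx,
                              SketchContext displayCtx,
                              int width, int height, DrawFn draw)
{
    assert(api != NULL && api->makeCurrent != NULL && api->blit != NULL);
    assert(draw != NULL);

    if (!api->makeCurrent(renderCtx))
        return 0;
    draw();                              /* application rendering */
    api->blit(displayCtx,
              0, 0, width, height,       /* source rectangle */
              0, 0, width, height,       /* destination rectangle */
              SKETCH_COLOR_BUFFER_BIT, SKETCH_NEAREST);
    return 1;
}

/* Stand-in driver that only records the order of calls. */
char sketch_log[8];
size_t sketch_log_len = 0;
SketchContext sketch_last_dst = NULL;

static void sketch_record(char c)
{
    if (sketch_log_len + 1 < sizeof sketch_log)
        sketch_log[sketch_log_len++] = c;
}

int fake_make_current(SketchContext ctx)
{
    (void)ctx;
    sketch_record('M');
    return 1;
}

void fake_draw(void)
{
    sketch_record('D');
}

void fake_blit(SketchContext dstCtx,
               int srcX0, int srcY0, int srcX1, int srcY1,
               int dstX0, int dstY0, int dstX1, int dstY1,
               unsigned int mask, unsigned int filter)
{
    (void)srcX0; (void)srcY0; (void)srcX1; (void)srcY1;
    (void)dstX0; (void)dstY0; (void)dstX1; (void)dstY1;
    (void)mask;  (void)filter;
    sketch_last_dst = dstCtx;
    sketch_record('B');
}

const AssocApi fake_api = { fake_make_current, fake_blit };
```

Real code would use the GLXContext type and the function pointer types from the GLX headers rather than these stand-ins. Keeping the entry points behind a small table is one way to let the same frame loop run against either the WGL or the GLX variant.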