Total Recall: A Debugging Framework for GPUs
|
|
|
- Beatrix Richards
- 9 years ago
- Views:
Transcription
1 Graphics Hardware (2008) David Luebke and John D. Owens (Editors) Total Recall: A Debugging Framework for GPUs Ahmad Sharif and Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA Abstract GPUs have transformed from simple fixed-function processors to powerful, programmable stream processors and are continuing to evolve. Programming these massively parallel GPUs, however, is very different from programming a sequential CPU. Lack of native support for debugging coupled with the parallelism in the GPU makes program development for the GPU a non-trivial task. As GPU programs grow in complexity because of scaling in maximum allowed program size and increased demand in terms of realism, debugging GPU code is becoming a major timesink for content developers. In addition to more complex shaders, applications are using multi-pass effects in order to create more convincing reality. In this paper, we present a debugging framework that can be employed to debug complex code running on the GPU in an efficient manner. By observing the API calls of the application that are made to the 3D runtime, the framework can keep track of the program s state in memory. Upon the programmer s request, it is able to capture and deterministically replay the stream of instructions that caused the final write to a pixel of interest. This execution stream includes writes to intermediate render targets and spans across shader boundaries. The stream of instructions can then be replayed on the CPU via emulation and the programmer can debug the straight-line code with ease. We also present a hardware-friendly scheme that can be used to accelerate the debugging process for long-chain multi-pass effects. Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: GPUs, Debugging 1. Introduction In less than two decades, graphics hardware has evolved from 2D accelerators, fixed-function 3D accelerators, to today s massively parallel programmable GPUs. Current GPUs on the market have nearly billion transistor budgets and are used for general purpose computation in addition to rendering 3D graphics. As GPUs are becoming more and more programmable, their complexity and cost to program them in terms of time and effort are also increasing. This heavy cost to program GPUs is one of the biggest man-hour sink associated with creating shaders as well as porting sequential CPU code to the GPUs. The cost of programming GPUs can be mainly attributed to the lack of native debugging support and the parallel nature of the primitive processing on the GPUs. To tackle the challenge of debugging code on the GPU, several techniques have been proposed. These techniques can be divided into two broad categories: (a) instrumenting shader code followed by outputting an intermediate value to the bound render target for the programmer to inspect, or (b) emulating all primitives sent to the GPU on the CPU. Iterative deepening is an example of the first technique. Only part of the shader is executed on the GPU and the intermediate values are dumped to the render target. These intermediate values are then displayed as colors on the screen, or read back by the CPU and printed for the programmer to view. However, current manifestations of GPU debuggers using iterative deepening cannot debug across render target switches. The latter mentioned technique, emulation of the GPU execution on the CPU, does vertex, geometry and pixel shading of the entire scene on the CPU. By emulating all the work on the CPU, the programmer can easily put in breakpoints and execute the shader step-by-step. However, emulation can be slow, especially when the number of primitives or the resolution is large, or the shader is complex. This paper introduces Total Recall, a debugging framework that content-publishers can use to rapidly identify bugs in the code running on the GPU. Our debugging framework
2 Render Chain Length Ahmad Sharif & Hsien-Hsin S. Lee / Total Recall: A Debugging Framework for GPUs allows programmers to insert conditional breakpoints inside shaders to enable error checking during development. When a breakpoint is hit, a complete execution history of the pixel that hit the breakpoint can be obtained from the debugger runtime for emulation on the CPU. The programmer can then execute the relevant code step-by-step and can observe all intermediate results. Note that this technique does not suffer the performance loss of full emulation, as only the programmer-specified primitives are selected for emulation on the CPU. By letting the CPU handle the complex task of step-by-step execution with watches on variables, we can simplify and speed up the debugging process. Our main contributions are two-fold: 1. We present a debugging framework that allows the capture and deterministic replay of the entire execution stream of a pixel of interest. Note that this stream may contain writes to dynamic textures that were the sources of the pixel of interest. This execution stream, once obtained, can be replayed step-by-step on the CPU, as desired by the programmer via emulation. 2. By tracking finer buffer dependencies, as explained in Section 4, we can accelerate the debugging process for multi-pass shaders. Hardware modification is required for this acceleration technique. The rest of this paper is organized as follows. First, our motivation for our technique is presented in Section 2. The next section, Section 3, describes the GPU programming model and describes the challenges associated with debugging GPU code. Section 4 then explains our proposed technique and delves into its implementation. Section 5 describes our experimental framework for implementing our debugger framework for Direct3D 9. Finally Section 8 concludes. 2. Motivation Most programmers are used to debugging sequential code using breakpoints, watchpoints, or printf statements. Breakpoints and watchpoints are the primary mechanism employed by debuggers such as gdb, windbg, etc., to pause or intercept the program state so the programmer can inspect variables at certain interesting events. Printf statements, a less elegant approach, are used to dump out intermediate values of program variables for the programmer to figure out how the program computed its final output. Unfortunately, neither breakpoints nor printf statements are natively available on GPUs. Emulation on the CPU can offer both of these, but emulated code runs at a much slower speed than native execution on the GPU. Another issue with debugging GPU code is that GPUs are inherently parallel machines and human programmers are not used to thinking in parallel. For debugging code that is not executing correctly, programmers are usually interested in watching or dumping intermediate values of a certain computation which they suspect is not working as expected. For graphics applications, DMark05 - Return to Proxycon 3DMark05 - Firefly Forest 3DMark05 - Canyon Flight 3DMark06 - Return to Proxycon 3DMark06 - Firefly Forest 3DMark06-3DMark06 - Canyon Deep Freeze Flight Figure 1: Comparison of different benchmarks maximum render chain length. This corresponds to the maximum number of dynamic textures data has to pass through before it shows up as a pixel on the final screen. this can be done on the GPU by compiling and running a shader whose final output is an intermediate value of the programmer s interest. Various programs, e.g., GLSLDevil [SKE07] and Shadesmith [PS03] for OpenGL, were introduced to intercept, modify and recompile shaders automatically. However, current GPU debuggers are unable to debug the entire history of a pixel that went through several passes including render target switches. The programmer might be able to go through one shader step-by-step, but no framework exists, to the best of our knowledge, that allows the programmer to step through the entire execution stream that led to the final write of the pixel of interest. A piece of data can pass through an entire render chain before appearing as a pixel on the final render target. Each shader transforms the input texture and geometry data into render target pixel values. Figure 1 shows the render chain length for several 3D benchmarks. As can be seen, a pixel value on the final screen can come from a chain of up to 17 textures in length for a benchmark (data passed through a chain of 17 dynamic textures before it appeared on the screen). Our debugger framework allows the programmer to step through all 17 shaders in sequence which makes programming multi-pass shaders much more intuitive and easier to debug. 3. GPU Programming Model and Architecture To the graphics programmer, the GPU exposes itself as a z-buffer based primitive processing engine. Primitives like triangles, lines, etc., are processed by the GPU and end up showing on the render target as pixel values. Modern graphics runtimes like Direct3D and OpenGL provide the ability
3 to perform three programmable operations on the primitives in the form of vertex, geometry and pixel shaders. Vertex shaders handle transformation and lighting of input vertices. Geometry shaders provide a way to amplify or destroy geometry in a programmable fashion while pixel shaders compute the final color of the pixels of the current render target. The GPU does not define any ordering in which primitives will pass through this pipeline. To ensure correct results, the programmer enables z-culling with which geometry hidden behind other geometry will not be displayed on the render target. Because of this ordering constraint relaxation, the GPU can be architected to achieve a large amount of parallelism for graphics workloads: primitives can be processed in parallel with respect to each other. However, this parallelism comes at a cost: since most programmers are used to writing straight-line sequential CPU code, development and debugging on the GPU becomes a non-trivial task to perform. 4. Total Recall Approach Step-by-step shader execution can be done by either: (a) stepping through all threads one instruction at a time, as iterative deepening performs, or (b) selecting one thread and stepping through it. Although stepping through all the threads has its value, debugging a single thread at a time offers several advantages over debugging all pixels at once: 1. Programmers, especially those having background in sequential CPU programming, are better at debugging sequential code than parallel code. Thus it would be easier for a programmer to follow just one execution stream rather than many, especially if different threads follow different paths (thread divergence). 2. Unlike multiple threads, a single thread will only have one set of variables that it works on. For a programmer with a background in CPU debugging, a single set of variables is easier to keep track of than multiple sets. 3. It might be the case that the programmer is interested in only one or a few bad pixels, in which case this debugging model fits perfectly in this situation. Current methods like Microsoft s PIX [Wal07] and GLSLDevil [SKE07] allow the programmer to view all writes to the pixel of interest. The programmer can step then through the shader that caused each write. Applications, however, can use multiple passes or render-to-texture in which a write is made to an off-screen buffer, which is then read back in a subsequent draw call to write to the final render target. Popular techniques like Shadow Mapping [Wil78] and deferred shading [DWS 88] use multiple pass rendering to create convincing graphics effects realtime. A complete history of the pixel should incorporate all writes that were made to off-screen buffers that led to the final write of the current pixel. To the best of our knowledge, no current GPU debugger offers this. We propose a debugging framework that allows the programmer to recreate the entire execution history of the pixel that they might be interested in. This execution history can then be replayed on the CPU and the programmer can view all intermediate values via emulation. Notice that this is different from emulating the shader invocations on all the pixels on the CPU like Visual Studio.NET does for Direct3D shaders. Only the shader invocations that the programmer is interested in are captured and emulated on the CPU. This is also different from instrumenting the shader and running it on the GPU using iterative deepening as we selectively emulate code entirely on the CPU after obtaining the input values of the relevant shaders. Our proposed debugging framework allows the following: 1. Conditional breakpoint statements inside shaders with break in program execution when the condition is satisfied. 2. Capture of the entire execution history of a particular pixel of interest. This history encompasses writes done to intermediate textures that were subsequently read by the shader that did the write to the final render target. While this can be done entirely in software, as a library above the 3D runtime, we propose hardware support to do this more efficiently and quickly. Some of the elements of our debugging framework are provided in the sub-sections below. These sub-sections will identify the "book-keeping" and data capture that is required by the debugging environment to debug shaders using our proposed technique Conditional Breakpoints Conditional breakpoints are breakpoints that get hit upon the true evaluation of a certain condition. Once the breakpoint is hit, the state of the program needs to be presented to the programmer. Current GPU hardware does not support breakpoints natively. In order to capture the render state at the breakpoint, the debugger runtime needs to first identify the exact draw call which was made when the breakpoint was hit. A simple way of doing the aforementioned is to bind an additional render target to the GPU state and write to it upon meeting a certain condition specified by the programmer as shown in Figure 2. This can be done fairly easily because conditional writes are supported by the modern GPU hardware. GPUs also support hardware counters to count the number of pixels drawn in a set of draw calls. After every draw call, the debugger runtime can read back this counter via a query to see whether this counter is non-zero. For example, in the Direct3D runtime, an object of the Occlusion- Query class can be used to perform the query on the hardware counter [Mic]. If the hardware counter is non-zero, the debug render target can be read back and searched to see which pixels hit the breakpoint. If multiple pixels are found, they can be presented to the user to select one to view its
4 Figure 2: A simple way of finding whether a breakpoint condition was met or not. A debug render target is bound to the render state and the shader instrumented with a conditional write to this render target. After the draw call, this debug render target is queried for number of pixels drawn. If the value is non-zero, the breakpoint condition was met in the draw call. execution stream. Presently, occlusion queries cannot track pixels written to individual render targets separately. Therefore, the shader has to be modified to remove the writes to the regular render targets when the debug render target is bound. Once it is known that the breakpoint was not hit, the original shader can be restored and the execution can continue like normal. If the condition intended by the programmer is to break on a certain pixel (by selecting, or otherwise specifying the coordinates of it), the debugger can isolate the draw call that writes to that pixel by doing the following. The debugger runtime creates a depth buffer (of the same resolution as the render target) that has all pixel values set to the lowest depth except for the pixel of interest. This depth buffer can then be bound to the render state and the draw call made. If the draw call wrote a value to the pixel of interest, a query will return non-zero number of pixels drawn. In this way, the precise draw call that modifies the pixel of interest can be obtained. Once we know the draw call that resulted in the breakpoint being hit, we need to extract the render state and all shader inputs to deterministically replay the shader on the CPU, where it can be executed step-by-step for analysis. The render state can be obtained by simply querying the 3D runtime on top of which the debugger runtime is running. For example, the Direct3D 9 device can be queried for the current bound render targets, textures, etc., by using function calls like ID3DDevice9::GetRenderTarget() in the ID3DDevice9 interface Obtaining the Shader Inputs Shaders on the GPU have input values coming from the previous stages of the pipeline that are used to look up textures or perform other computations. These live-in values are needed to faithfully and deterministically replay the GPU execution stream that led to the breakpoint. Current GPU hardware supports gather, but no scatter. This means that for obtaining all input variables for a shader, we can use the following approach: create a render target whose bytes per pixel is equal to or bigger than the size of the live-in values of the shader times the number of pixels. The debugger runtime can then bind this render target to the render state and create a shader that packs all shader input values and outputs them to this render target (pass-through shader). The storage requirement of this debug render target can be reduced by a multi-pass approach: at every pass, a certain number of inputs of the shader are dumped to a smaller debug render target and read back Buffer Dependencies As mentioned earlier, the goal of this debugging methodology is to provide the programmer with the entire execution history of a particular pixel. This consists of a stream of dynamic instructions that were executed on the GPU that finally led to the conditional breakpoint. The debugger runtime therefore has to keep track of all bound input textures to see whether they were a render target sometime before. A buffer dependency graph can be built by the debugger runtime by intercepting all requests to set the render target and checking whether the render target is a texture level or not. Once the render target is detected as a texture, the current shaders can be parsed and inspected to see if they are doing any look-ups from the currently bound textures. If so, those textures are added as parents to this render target. Figure 3 shows a buffer dependency graph for an application. In total 5 draw calls were made, each to a separate render target. The value of a pixel in the final render target depends on the entire chain of rendered textures as well as the input textures (and the geometry/state at each draw call). Once a breakpoint is hit, the debugger runtime can walk through the buffer dependency graph and figure out exactly how to construct the execution stream that led to the breakpoint being hit Full Execution History Execution history can be obtained by the following iterative scheme once the breakpoint has been detected as hit: 1. First, obtain the live-in values of the pixel that hit the breakpoint. This can be done by using the methods described in the sub-sections above. These live-in values of the pixel contain the texture coordinates of all its parents in the buffer dependency graph. If the parent textures are all static, meaning they have no parents in the buffer dependency graph, then can go to step 4. If the input bound textures are dynamic, go through each of the dynamic textures and program a breakpoint at the
5 Figure 3: Example buffer dependency graph. At each draw call the currently bound shader takes input from the input textures as well as the geometry and produces an output render target. Note that any pixel value in the final render target is dependent on pixel values of the previous passes render targets. coordinates obtained from the live-in values. If the dynamic texture is looked-up at various coordinates in the pixel shader, breakpoints will have to be set at all these locations. If the input dynamic texture supports texture filtering upon texture access, breakpoints will have to be set at multiple locations per set of texture input coordinates. This is done so that the emulation environment can do texture filtering to produce the same result as the GPU hardware. 2. Start from the beginning of the frame and keep making draw calls until we hit the breakpoint that was programmed in step 1. The debugger runtime has to store the initial state and all state and draw call information (as well as vertex/index buffer writes) in the current frame to replay these draw calls. This information can be stored in memory and typically does not take more than 100 MB per frame (for the 3DMark06 game tests that we tested). 3. The breakpoint was hit. Go to step 1 to read the shader inputs. 4. We now have all the relevant execution history that can be passed on to the emulator module. This history includes all writes to dynamic textures that led to the breakpoint that was programmed in by the developer. Figure 4: In debug pass 0, the render target under debug hits a breakpoint and its input values (u, v) are found. Next, a breakpoint is inserted at those coordinates in the dynamic texture. The same draw calls are made and we obtain the input of the dynamic texture (u, v ) in the second pass. Finally the full execution stream is reconstructed and emulated for debug. An example debug session is shown in Figure 4. In this case, 2 passes are required in order to obtain the shader inputs that can be used for deterministic emulation on the CPU. Once this execution history is obtained, we can emulate the entire stream on the CPU and allow the programmer to go forward and backward in time. Through emulation, the programmer should be able to view all intermediate results, set watchpoints, etc.. The emulator can compute the final expected output of the shader and match it against the pixel values obtained from the series of draw calls in the replay buffer for sanity check Challenges and Limitations The emulated output could be different from the GPU output because of floating point format and multi-sampling issues. Typical GPUs implement a non-standard floating point forc The Eurographics Association 2008.
6 mat for speed and efficiency reasons. In order to emulate the GPU execution more faithfully, a hardware-vendor supplied functional emulator can be used to emulate the non-standard floating point format implemented in hardware. This can enable the emulation result to be accurately paired with the real result from execution on the GPU. Hardware support for multi-sampling/super-sampling can also cause differences between the emulation and actual GPU execution. Since the atom of pixel execution exposed to software is a single pixel, our framework cannot insert "breakpoints" at sub-pixel locations in the dynamic textures. Thus our framework is currently limited to accurately debugging render targets that do not have hardware support for super-sampling. Alpha blending can present another challenge to isolating the shader inputs that will be used for deterministic emulation. When alpha-blending is enabled and a draw call is made, a binary search is done to isolate primitives that cause a write to the pixel location of interest. The debugger runtime has to intercept the draw call and make modified draw calls to figure out which primitives cause a write to the pixel location. The debugger runtime can use the depth-buffer clear technique mentioned before to see if half the primitives of the original draw call end up writing to the pixel of interest. Both halves have to be checked recursively to obtain all primitives that write to the pixel of interest. In the worst case, all primitives of the draw call write to the same pixel location with transparency enabled. In this case, these primitives will have to be drawn one-by-one, and all shader inputs obtained this way Accelerating multi-pass debugging The main loop explained in Section 4.4 can take some time to execute, especially if there is a long chain of buffer dependencies. In this case, hardware can be employed to accelerate this process. Each source and destination buffer can be divided into a uniform grid of blocks. The hardware can maintain a bitvector for each render target block that maintains information about which source blocks were used in writing to this destination block. From reading this bitvector, one can determine the source buffer blocks that contributed to the write of a render target block. If there is a lot of spatial coherence in the pixel shader, this bitvector will be sparse. In this case, in step 2) from the previous sub-section, we can make a less expensive draw call by only drawing a small portion of the dynamic buffer. This can be done by clearing only a few blocks of the z-buffer, so that all other blocks will not be shaded because of early-z reject. This acceleration technique requires hardware modifications. PIX and others already have functionality to determine the primitive id from the pixel value. If the exact primitive covering the pixel of interest is known, the debugging environment needs to only process that primitive in order to perform deterministic replay. This can be done in a few ways: 1. Assigning each primitive an id value and tagging a separate buffer with the primitive id of the primitive. In this way, by looking up the value of the primitive id in the corresponding pixel of the primitive-buffer, the exact primitive that covers that pixel can be determined. 2. By only clearing the z-buffer of the pixel of interest and reading back the number of pixels rendered after every draw call, the exact draw call that writes to the pixel can be determined. Then, a binary search can be performed on the primitives inside the draw call using the same mechanism to figure out exactly which primitive caused a write to the pixel. A subset of the primitives in the draw call are sent to the GPU and the hardware counter is queried to see whether they got drawn to the pixel of interest. 5. Experimental Results In order to provide a proof-of-concept debugger, we implemented a debugger runtime library for Direct3D 9. Direct3D 9 is a 3D runtime that exposes a common interface to applications and games using COM objects that want to utilize the graphics hardware. On the back-end, the runtime interacts with the driver code provided by hardware manufacturers. The Direct3D runtime handles the creation, binding and destruction of various resources like vertex and index buffers, shaders, devices, etc. The application interacts with this runtime via different interfaces like IDirect3DDevice9, etc. For our debugger, we encapsulated the Direct3D 9 interfaces in our own spy dll to capture the behavior of the application to debug. All the application needs to do is to put our spy dll inside the working directory and it will get loaded when the Windows loader inspects the import table of the application. The architecture of our debugger is shown in Figure 5. Our debugging library saves relevant per-frame information like writes to vertex buffers, creation and setting of pixel and vertex shaders, etc., to segments of memory called replay buffers. It does that so the programmer can revert back to a previous frame by the press of a button. Once the programmer is in the desired frame, a pixel to debug can be selected by clicking on it. Once we receive input from the programmer, we replay the current frame (by playing commands from the replay buffer), but instead of running the pixel shader of the application, we modify the shader on-the-fly to dump the input variables of the shader to render targets as described before. Additional render targets are created and bound if the inputs to the shader are more than the current render targets allowed. If multi-pass shading was done in the current frame, our debugger runtime detects that and iteratively executes the captured draw calls to obtain the input variables of the very first shader. Once this "execution map" is created, it is dumped to a file for the programmer to view.
7 Figure 5: Architecture of our debugger framework. The debugger exports an interface identical to the Direct3D runtime and intercepts all Direct3D function calls made by the application. Replay buffers store per-frame information. 6. Related Work Single-threaded CPU debugging is a long-studied problem and effective solutions exist. Since the programmer is mostly interested in breaking on a special condition and inspecting intermediate values in registers and memory, a special breakpoint instruction (int 3 on x86 CPUs) is placed at the point of interest. When this breakpoint instruction is executed, execution jumps to an entry point in the debugger application and it displays the intermediate values in registers and memory. From here on, the programmer can execute instructions step-by-step to gain insight into the execution of the program. Conditional breakpoints are achieved by simply evaluating the desired condition and breaking execution if it is met. A number of techniques have been proposed and implemented for GPU shader debugging. These techniques fall into two broad categories: the debugging environment intercepting shaders and modifying them on-the-fly, or emulating all shader invocations on the CPU. Shadesmith [PS03], Imdebug, Relational Debugging Engine [DNB 05] and GLSLDevil [SKE07] all use shader code instrumentation and readback after the draw call in order to debug shaders..net Shader Debugger uses the REF_RAST device and emulates shader code on the CPU. Starting from the June 2006 DirectX SDK version, PIX has a Pixel History feature which can be used to debug a single pixel by stepping through the vertex and pixel shaders that led to the write of that pixel. Shadesmith [PS03] enables the user to step through shader code forward and backward. It does so by wrapping the actual OpenGL function calls in its own library and intercepting those function calls. When a draw call is made using some shader and the user wants to debug it, Shadesmith creates a temporary shader and binds that to the pipeline and makes the same draw call. This temporary shader executes a part of the original shader and outputs the intermediate result on to the render target. After the draw call, Shadesmith reads back the pixel values which contain the intermediate results. The user can step forwards and backwards and Shadesmith handles the creation and binding of the temporary shaders. Microsoft s Visual Studio.NET can have conditional breakpoints inside shaders. It uses CPU emulation on a REF_RAST device to handle breakpoints. Every vertex and pixel is shaded on the CPU without discrimination. This emulation is slow, and becomes time-consuming especially at higher screen resolutions, high primitive count or longer shaders. Our technique does selective emulation of pixels that contribute to the pixel under debug. gdebugger [TS05] is an OpenGL debugger that allows the programmer to visualize the OpenGL state before the draw call as well as after it. It does not allow stepping through shader code, but does offer a wide variety of performance visualizations. The Relational Debugging engine [DNB 05], proposed by Duca et al, is implemented as a library for OpenGL which intercepts OpenGL function calls and stores the graphics state as a set of virtual tables. These virtual tables can then be accessed by an SQL-like query based language interface which is used to debug the application of interest. Like other debugging tools, they implement their debugger as a library over OpenGL s runtime, intercepting OpenGL state and shaders. The captured shaders are then instrumented and bound to the pipeline, yielding the desired results. GLSLDevil [SKE07] is an OpenGL debugger that allows the programmer to select intermediate values in the shader for display. An intercepting library understands the semantics put in by the programmer as debugging statements, rewrites the shaders accordingly and binds them to the pipeline. The library can then read back the debug channel of the pixel values to display intermediate variables. GLSLDevil can handle regular code, conditionals, loops and function calls in GLSL. Additionally, pixels can be selected by the user for individual debug. PIX [Wal07], in 2006, introduced a utility for debugging individual pixels called Pixel History. This feature allows the programmer to go through the shader instruction-byinstruction for all the invocations that wrote to a particular pixel. PIX can also display the state of the Direct3D runtime as well as bound and unbound resources at each point in the program s lifetime. However, unlike our approach, PIX does not allow the programmer to view the full execution history of a pixel beyond a single shader invocation. Our proposed technique is similar to other techniques proposed earlier as far as shader instrumentation is concerned. However, we only dump the input variables (live-ins) of the shader in order to deterministically replay the execution stream on the CPU. Once all the live-ins are determined, the CPU can do what it does best, i.e., go step-by-step and execute the captured stream. We also propose a hardware mechanism to speed up the capture of the execution stream.
8 7. Future Work This work could be expanded in several directions in the future. Debug history could be enhanced by including vertex shaders, geometry shaders and stream out in the emulated stream of instructions. Similarly, one could think of a way to provide a complete history of a particle for a particle system simulated on the GPU, and emulate it on the CPU for debugging purposes. With the introduction of Nvidia s CUDA [NVI08] and ATI s CTM [ATI08] technology, GPGPU programming is on the rise. However, other than emulation, there is currently no way to debug GPGPU code. This work could be extended to provide a framework that could help in debugging faulty output. The framework could allow for capture of the entire execution stream of instructions that led to a particular final output value. This stream could then be faithfully and deterministically replayed on the CPU in order to analyze program behavior and find bugs. 8. Conclusion We provided the design and implementation of a framework that allows a shader programmer to sequentialize instructions from different shaders that caused the final write of a pixel of interest. This pixel of interest can be specified as input coordinates or a conditional breakpoint can be put inside the shader. The application is transparent to the debugger runtime and does not have to be recompiled to use it. After obtaining the sequentialized instructions, they can be deterministically emulated on the CPU step-by-step with ease. This allows the programmer to inspect all intermediate values as the execution progresses. Our framework is especially well-suited to multi-pass shading algorithms, as the programmer can view the entire execution stream at once regardless of render state changes (render target switches, shader changes, etc.). We think that by looking at the entire execution stream which causes the final write to a pixel, the programmer will gain valuable insight and will be able to correct unintentional program bugs. for the graphics pipeline. ACM Transactions on Graphics 24, 3 (Aug. 2005), [DWS 88] DEERING M. F., WINNER S., SCHEDIWY B., DUFFY C., HUNT N.: The triangle processor and normal vector shader: A vlsi system for high performance graphics. In Computer Graphics (Proceedings of SIGGRAPH 88) (Aug. 1988), pp [Mic] MICROSOFT: Queries (direct3d 9). [NVI08] NVIDIA: CUDA zone - resource for C developers of applications that solve computing problems. [PS03] PURCELL T. J., SEN P.: Shadesmith fragment program debugger [SKE07] STRENGERT M., KLEIN T., ERTL T.: A hardware-aware debugger for the opengl shading language. In Graphics Hardware 2007 (Aug. 2007), pp [TS05] TEBEKA Y., SHAPIRA A.: Advanced opengl debugging and profiling with gdebugger. Game Developer s Conference (2005). [Wal07] WALBOURN C.: Debugging direct3d 10 applications. In SIGGRAPH 07: ACM SIGGRAPH 2007 courses (New York, NY, USA, 2007), ACM, pp [Wil78] WILLIAMS L.: Casting curved shadows on curved surfaces. In Computer Graphics (Proceedings of SIGGRAPH 78) (Aug. 1978), pp Acknowledgment This research was supported in part by NSF Grant CNS , a Department of Energy Early CAREER Award, and an NSF CAREER Award CNS References [ATI08] ATI: ATI researcher relations - CTM document library. [DNB 05] DUCA N., NISKI K., BILODEAU J., BOLITHO M., CHEN Y., COHEN J.: A relational debugging engine
Developer Tools. Tim Purcell NVIDIA
Developer Tools Tim Purcell NVIDIA Programming Soap Box Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging
Computer Graphics Hardware An Overview
Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and
L20: GPU Architecture and Models
L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.
The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA
The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques
Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005
Recent Advances and Future Trends in Graphics Hardware Michael Doggett Architect November 23, 2005 Overview XBOX360 GPU : Xenos Rendering performance GPU architecture Unified shader Memory Export Texture/Vertex
GPU Architecture. Michael Doggett ATI
GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super
CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University
GPU Generations CSE 564: Visualization GPU Programming (First Steps) Klaus Mueller Computer Science Department Stony Brook University For the labs, 4th generation is desirable Graphics Hardware Pipeline
Hardware design for ray tracing
Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex
Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions
Module 4: Beyond Static Scalar Fields Dynamic Volume Computation and Visualization on the GPU Visualization and Computer Graphics Group University of California, Davis Overview Motivation and applications
ANDROID DEVELOPER TOOLS TRAINING GTC 2014. Sébastien Dominé, NVIDIA
ANDROID DEVELOPER TOOLS TRAINING GTC 2014 Sébastien Dominé, NVIDIA AGENDA NVIDIA Developer Tools Introduction Multi-core CPU tools Graphics Developer Tools Compute Developer Tools NVIDIA Developer Tools
Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group
Shader Model 3.0 Ashu Rege NVIDIA Developer Technology Group Talk Outline Quick Intro GeForce 6 Series (NV4X family) New Vertex Shader Features Vertex Texture Fetch Longer Programs and Dynamic Flow Control
GPGPU Computing. Yong Cao
GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power
Introduction to GPGPU. Tiziano Diamanti [email protected]
[email protected] Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate
Radeon HD 2900 and Geometry Generation. Michael Doggett
Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command
Writing Applications for the GPU Using the RapidMind Development Platform
Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...
OpenGL Performance Tuning
OpenGL Performance Tuning Evan Hart ATI Pipeline slides courtesy John Spitzer - NVIDIA Overview What to look for in tuning How it relates to the graphics pipeline Modern areas of interest Vertex Buffer
Performance Optimization and Debug Tools for mobile games with PlayCanvas
Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1 Introduction Jonathan Kirkham, ARM Worked with
Silverlight for Windows Embedded Graphics and Rendering Pipeline 1
Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline Windows Embedded Compact 7 Technical Article Writers: David Franklin,
Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg
Image Processing and Computer Graphics Rendering Pipeline Matthias Teschner Computer Science Department University of Freiburg Outline introduction rendering pipeline vertex processing primitive processing
Optimizing AAA Games for Mobile Platforms
Optimizing AAA Games for Mobile Platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Epic Games, Unreal Engine 15 years in the industry 30 years of programming C64 demo
Lecture Notes, CEng 477
Computer Graphics Hardware and Software Lecture Notes, CEng 477 What is Computer Graphics? Different things in different contexts: pictures, scenes that are generated by a computer. tools used to make
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university
Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011 Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook
NVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex
GPU Profiling with AMD CodeXL
GPU Profiling with AMD CodeXL Software Profiling Course Hannes Würfel OUTLINE 1. Motivation 2. GPU Recap 3. OpenCL 4. CodeXL Overview 5. CodeXL Internals 6. CodeXL Profiling 7. CodeXL Debugging 8. Sources
Introduction to Computer Graphics
Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 [email protected] www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics
Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software
GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas
Choosing a Computer for Running SLX, P3D, and P5
Choosing a Computer for Running SLX, P3D, and P5 This paper is based on my experience purchasing a new laptop in January, 2010. I ll lead you through my selection criteria and point you to some on-line
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH [email protected] SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH [email protected] SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA GFLOPS 3500 3000 NVPRO-PIPELINE Peak Double Precision FLOPS GPU perf improved
INTRODUCTION TO RENDERING TECHNIQUES
INTRODUCTION TO RENDERING TECHNIQUES 22 Mar. 212 Yanir Kleiman What is 3D Graphics? Why 3D? Draw one frame at a time Model only once X 24 frames per second Color / texture only once 15, frames for a feature
Test Specification. Introduction
Test Specification Introduction Goals and Objectives GameForge is a graphical tool used to aid in the design and creation of video games. A user with little or no experience with Microsoft DirectX and/or
Image Synthesis. Transparency. computer graphics & visualization
Image Synthesis Transparency Inter-Object realism Covers different kinds of interactions between objects Increasing realism in the scene Relationships between objects easier to understand Shadows, Reflections,
CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS
ICCVG 2002 Zakopane, 25-29 Sept. 2002 Rafal Mantiuk (1,2), Sumanta Pattanaik (1), Karol Myszkowski (3) (1) University of Central Florida, USA, (2) Technical University of Szczecin, Poland, (3) Max- Planck-Institut
Web Based 3D Visualization for COMSOL Multiphysics
Web Based 3D Visualization for COMSOL Multiphysics M. Jüttner* 1, S. Grabmaier 1, W. M. Rucker 1 1 University of Stuttgart Institute for Theory of Electrical Engineering *Corresponding author: Pfaffenwaldring
Texture Cache Approximation on GPUs
Texture Cache Approximation on GPUs Mark Sutherland Joshua San Miguel Natalie Enright Jerger {suther68,enright}@ece.utoronto.ca, [email protected] 1 Our Contribution GPU Core Cache Cache
NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect
SIGGRAPH 2013 Shaping the Future of Visual Computing NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect NVIDIA
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005... 1
Programming 3D Applications with HTML5 and WebGL
Programming 3D Applications with HTML5 and WebGL Tony Parisi Beijing Cambridge Farnham Köln Sebastopol Tokyo Table of Contents Preface ix Part I. Foundations 1. Introduction 3 HTML5: A New Visual Medium
GPU Tools Sandra Wienke
Sandra Wienke Center for Computing and Communication, RWTH Aachen University MATSE HPC Battle 2012/13 Rechen- und Kommunikationszentrum (RZ) Agenda IDE Eclipse Debugging (CUDA) TotalView Profiling (CUDA
AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009
AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU
How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac)
Improved Alpha-Tested Magnification for Vector Textures and Special Effects Chris Green Valve (a) 64x64 texture, alpha-blended (b) 64x64 texture, alpha tested (c) 64x64 texture using our technique Figure
Advanced Visual Effects with Direct3D
Advanced Visual Effects with Direct3D Presenters: Mike Burrows, Sim Dietrich, David Gosselin, Kev Gee, Jeff Grills, Shawn Hargreaves, Richard Huddy, Gary McTaggart, Jason Mitchell, Ashutosh Rege and Matthias
2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT
COMP27112 Computer Graphics and Image Processing 2: Introducing image synthesis [email protected] 1 Introduction In these notes we ll cover: Some orientation how did we get here? Graphics system
Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August Agenda Introduction The author and ARM Preliminary knowledge Unity Pro, OpenGL ES 3.0 Identify
Visualizing Data: Scalable Interactivity
Visualizing Data: Scalable Interactivity The best data visualizations illustrate hidden information and structure contained in a data set. As access to large data sets has grown, so has the need for interactive
How To Program With Adaptive Vision Studio
Studio 4 intuitive powerful adaptable software for machine vision engineers Introduction Adaptive Vision Studio Adaptive Vision Studio software is the most powerful graphical environment for machine vision
How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen 4.2.2 Powergen 3
Profiling and Debugging Tools for High-performance Android Applications Stephen Jones, Product Line Manager, NVIDIA ([email protected]) Android By The Numbers 1.3M Android activations per day Android activations
3D Computer Games History and Technology
3D Computer Games History and Technology VRVis Research Center http://www.vrvis.at Lecture Outline Overview of the last 10-15 15 years A look at seminal 3D computer games Most important techniques employed
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
Building Applications Using Micro Focus COBOL
Building Applications Using Micro Focus COBOL Abstract If you look through the Micro Focus COBOL documentation, you will see many different executable file types referenced: int, gnt, exe, dll and others.
How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Brains Eden, 28 th June 2013 Agenda Introduction The author ARM Ltd. What do you need to have What do you need to know Identify
TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING
TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING NVIDIA DEVELOPER TOOLS BUILD. DEBUG. PROFILE. C/C++ IDE INTEGRATION STANDALONE TOOLS HARDWARE SUPPORT CPU AND GPU DEBUGGING & PROFILING
Interactive Level-Set Deformation On the GPU
Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation
JOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at http://www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2006 Vol. 5, No. 6, July - August 2006 On Assuring Software Quality and Curbing Software
PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)
PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon
QCD as a Video Game?
QCD as a Video Game? Sándor D. Katz Eötvös University Budapest in collaboration with Győző Egri, Zoltán Fodor, Christian Hoelbling Dániel Nógrádi, Kálmán Szabó Outline 1. Introduction 2. GPU architecture
DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER
DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER RAMA HOETZLEIN, DEVELOPER TECHNOLOGY, NVIDIA Data Visualizations assist humans with data analysis by representing information
Figure 1: Graphical example of a mergesort 1.
CSE 30321 Computer Architecture I Fall 2011 Lab 02: Procedure Calls in MIPS Assembly Programming and Performance Total Points: 100 points due to its complexity, this lab will weight more heavily in your
Monitoring, Tracing, Debugging (Under Construction)
Monitoring, Tracing, Debugging (Under Construction) I was already tempted to drop this topic from my lecture on operating systems when I found Stephan Siemen's article "Top Speed" in Linux World 10/2003.
Stream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
Introduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
zen Platform technical white paper
zen Platform technical white paper The zen Platform as Strategic Business Platform The increasing use of application servers as standard paradigm for the development of business critical applications meant
Color correction in 3D environments Nicholas Blackhawk
Color correction in 3D environments Nicholas Blackhawk Abstract In 3D display technologies, as reviewers will say, color quality is often a factor. Depending on the type of display, either professional
A Survey of General-Purpose Computation on Graphics Hardware
EUROGRAPHICS 2005 STAR State of The Art Report A Survey of General-Purpose Computation on Graphics Hardware Lefohn, and Timothy J. Purcell Abstract The rapid increase in the performance of graphics hardware,
Dynamic Resolution Rendering
Dynamic Resolution Rendering Doug Binks Introduction The resolution selection screen has been one of the defining aspects of PC gaming since the birth of games. In this whitepaper and the accompanying
ultra fast SOM using CUDA
ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
How To Teach Computer Graphics
Computer Graphics Thilo Kielmann Lecture 1: 1 Introduction (basic administrative information) Course Overview + Examples (a.o. Pixar, Blender, ) Graphics Systems Hands-on Session General Introduction http://www.cs.vu.nl/~graphics/
Interactive Visualization of Magnetic Fields
JOURNAL OF APPLIED COMPUTER SCIENCE Vol. 21 No. 1 (2013), pp. 107-117 Interactive Visualization of Magnetic Fields Piotr Napieralski 1, Krzysztof Guzek 1 1 Institute of Information Technology, Lodz University
Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008
Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer
What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game
1 2 3 4 What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game developers are smart and inquisitive Game devs extract
This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo
How To Write A Cg Toolkit Program
Cg Toolkit Cg 1.4.1 March 2006 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware platforms and graphics APIs. Originally
The Future Of Animation Is Games
The Future Of Animation Is Games 王 銓 彰 Next Media Animation, Media Lab, Director [email protected] The Graphics Hardware Revolution ( 繪 圖 硬 體 革 命 ) : GPU-based Graphics Hardware Multi-core (20 Cores
Interactive Level-Set Segmentation on the GPU
Interactive Level-Set Segmentation on the GPU Problem Statement Goal Interactive system for deformable surface manipulation Level-sets Challenges Deformation is slow Deformation is hard to control Solution
Low power GPUs a view from the industry. Edvard Sørgård
Low power GPUs a view from the industry Edvard Sørgård 1 ARM in Trondheim Graphics technology design centre From 2006 acquisition of Falanx Microsystems AS Origin of the ARM Mali GPUs Main activities today
Optimization for DirectX9 Graphics. Ashu Rege
Optimization for DirectX9 Graphics Ashu Rege Last Year: Batch, Batch, Batch Moral of the story: Small batches BAD What is a batch Every DrawIndexedPrimitive call is a batch All render, texture, shader,...
Windows Presentation Foundation: What, Why and When
Windows Presentation Foundation: What, Why and When A. WHY WPF: WPF is framework to build application for windows. It is designed for.net influenced by modern display technologies like HTML and Flash and
GPU Usage. Requirements
GPU Usage Use the GPU Usage tool in the Performance and Diagnostics Hub to better understand the high-level hardware utilization of your Direct3D app. You can use it to determine whether the performance
Programming with the Dev C++ IDE
Programming with the Dev C++ IDE 1 Introduction to the IDE Dev-C++ is a full-featured Integrated Development Environment (IDE) for the C/C++ programming language. As similar IDEs, it offers to the programmer
GPU Shading and Rendering: Introduction & Graphics Hardware
GPU Shading and Rendering: Introduction & Graphics Hardware Marc Olano Computer Science and Electrical Engineering University of Maryland, Baltimore County SIGGRAPH 2005 Schedule Shading Technolgy 8:30
Manage Software Development in LabVIEW with Professional Tools
Manage Software Development in LabVIEW with Professional Tools Introduction For many years, National Instruments LabVIEW software has been known as an easy-to-use development tool for building data acquisition
Troubleshooting.NET Applications - Knowing Which Tools to Use and When
Troubleshooting.NET Applications - Knowing Which Tools to Use and When Document Version 1.0 Abstract There are three fundamental classifications of problems encountered with distributed applications deployed
Advanced Graphics and Animations for ios Apps
Tools #WWDC14 Advanced Graphics and Animations for ios Apps Session 419 Axel Wefers ios Software Engineer Michael Ingrassia ios Software Engineer 2014 Apple Inc. All rights reserved. Redistribution or
Debugging Multi-threaded Applications in Windows
Debugging Multi-threaded Applications in Windows Abstract One of the most complex aspects of software development is the process of debugging. This process becomes especially challenging with the increased
Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
Real-time Visual Tracker by Stream Processing
Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol
ANDROID APPS DEVELOPMENT FOR MOBILE AND TABLET DEVICE (LEVEL I)
ANDROID APPS DEVELOPMENT FOR MOBILE AND TABLET DEVICE (LEVEL I) Who am I? Lo Chi Wing, Peter Lecture 1: Introduction to Android Development Email: [email protected] Facebook: http://www.facebook.com/peterlo111
OpenEXR Image Viewing Software
OpenEXR Image Viewing Software Florian Kainz, Industrial Light & Magic updated 07/26/2007 This document describes two OpenEXR image viewing software programs, exrdisplay and playexr. It briefly explains
NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013
NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013 nvfx : Plan What is an Effect New Approach and new ideas of nvfx Examples Walkthrough
Computer Applications in Textile Engineering. Computer Applications in Textile Engineering
3. Computer Graphics Sungmin Kim http://latam.jnu.ac.kr Computer Graphics Definition Introduction Research field related to the activities that includes graphics as input and output Importance Interactive
