Real Time Ray Tracing



Master thesis
IMM, DTU
Pawel Bak
Project number 38
June 2010


1 Abstract

The primary goal of this project was to develop a method for using the ray tracing rendering algorithm in a real-time rendering application. Although the algorithm itself is more than 30 years old, no one has yet produced a solution suitable for arbitrary applications displaying 3D objects in real time, such as visualization software or 3D games. This report describes an attempt to create such a solution with the help of contemporary Graphics Processing Units and the new features exposed by the universal 3D API DirectX 11. Many similar attempts have been made before, yet the possibilities offered by this API have not been explored. The report also surveys the attempts undertaken over the past few years to make real-time ray tracing possible; unfortunately, although very fast in some cases, these do not lend themselves to real-life scenarios.

In this work I also analyze different possible methods of achieving the goal. Exploring these options, I arrived at a hybrid solution that uses both the GPU and the CPU: the GPU is employed where its enormous computational power and parallel architecture can be exploited best, while the CPU, with its much higher single-threaded speed, handles the parts of the algorithm that were hard to parallelize. Finally, the solution built to test the assumptions of the thesis was evaluated on several scenes. The results are more than promising, yet they also show that a system with a single graphics card cannot render complex animated scenes at high resolutions in real time, especially when additional ray tracing effects are taken into account. The tests indicate, however, that a more advanced multi-card system should allow real-time rendering of such scenes. This is because the solution is clearly bounded by the rendering resolution: since ray tracing is almost independent for each pixel, doubling, tripling or quadrupling the computational power by adding cards should increase the rendering speed accordingly.

2 Table of Contents

Abstract
Table of Contents
Introduction [62]
What is Ray Tracing
    Generating Primary rays
    Math Formulas vs. Primitives
    Finding the intersection points
    Shading
        Global illumination vs. local illumination
        Phong shading [44]
        Blinn-Phong shading [11]
    Reflected and refracted rays
Acceleration data structures
    Grid structures
        Voxel structure/uniform grid
    Bounding Volume Hierarchies (BVH)
        Axis aligned bounding boxes (AABB)
    Tree Space Partition Structures
        BSP trees
        KD-tree
    Data structures summary
Why it is beneficial to make it Real time
State of the art real time ray tracing solutions
    RealStorm [50]
    Arauna real-time ray tracing [10]
    Ray-tracing Demos [22]
    NVidia OptiX [38]
    IBM Interactive Ray-tracer (irt) [31]
    SmallptGPU, an OpenCL benchmark
    GPU
    GPU programming drawbacks
DirectX
    What is DirectX
    What is new in DirectX 11 [18]
    Compute shaders
    HLSL and ASM
    DirectX 11 hardware
        Evergreen
        Fermi [57]
    DirectX 11 benefits for real-time ray tracing summarized
Current work
Analysis of the problem
    Geometry type to use
    3D scene storing and reading
    Choosing acceleration data structure
    Way to generate data structure
    Traversing data structure
    Generating image
My Idea
    Components of the system
    Data structures
        Structured buffers
        My data structures
        KD tree GPU mapping
    Data flow between components
Algorithm in detail
    Reading scene
    Reading and applying animation
    Building acceleration data structure
        Pre compute shader determining amount of triangles
        The compute shader splitting the triangles to the grid
        Building KD-Tree
        Filling of tree leafs
    Passing Data onto GPU
    Determining primary hits based on rasterization
    Traversing Data Structure
    Rendering
Optimization used to achieve better performance
    Optimization tools
        AMD CodeAnalyst [3]
        AMD GPU PerfStudio [4]
        AMD GPU ShaderAnalyzer [5]
        Microsoft PIX
    Data structure optimizations
    CPU code optimizations
    Primary hits rasterizer acceleration
    GPU optimizations
    Attempted optimizations that didn't increase the speed
    Current weak spots of the solution
    User interface
Tests
    Grid size vs. FPS
    Resolution vs. FPS
    Reflections vs. FPS
    Animation vs. FPS
    GPU overclocking vs. FPS
    Fermi vs. Evergreen FPS
    Results
    Discussion
Conclusion
Future work
Bibliography
Appendix
    Program Listings
        Main RayTracing Pixel Shader
        Divide Shader
    Program Screenshots

1 Introduction [62]

For many years, real-time rendering has been achieved using techniques based on rasterization. This came about because such techniques are easy to implement in hardware. For the past two decades hardware manufacturers have been building ever faster and better equipped specialized processors, called Graphics Processing Units (GPUs), to handle real-time rendering with these methods. Over time, apart from standard rasterization, GPUs started to support more and more additional techniques intended to improve the visual quality of the real-time computer graphics they present. Eventually GPU makers realized that the best they could do was to allow the creation of customizable programs that perform visual enhancements on the rendered scenes. These programs, called shaders, proved very successful because they allow almost arbitrary processing of both the objects in the scene and the final image. Since visual effects have always required a lot of computational power, the shader units in modern GPUs have to be very powerful. Realizing this, many people started to develop shader programs, not all of which were meant to generate graphics through the GPU pipeline. To ease such development, GPU makers and software developers created dedicated programming models. The first of these were vendor specific, like CUDA from NVidia or FireStream from ATI. More recently, in an effort to unify the development of general-purpose GPU programs, libraries like MS DirectX 11 and OpenCL were introduced; both promise cross-vendor execution of software written against their APIs.

The paragraph above summarizes the current state of real-time rendering and GPUs. This work, however, focuses on building an alternative rendering method for real-time computer graphics. The method tries to emulate the behavior of natural light rays and is therefore called ray tracing. Invented by Appel in 1968 [6] and perfected by Whitted in 1980 [63], it is one of the best rendering techniques with respect to image quality. Unfortunately, due to the high computational power required, it is currently reserved primarily for so-called offline rendering, which includes static images as well as the recently popular computer-animated movies. Because of its clear advantage over rasterization-based techniques in whole-scene rendering, however, a lot of research is being done to make the technique suitable for real-time use. The main advantage over rasterization is that the color of each pixel is calculated based on the whole scene, whereas rasterization computes the value of each pixel based solely on the object nearest to the camera for that pixel. This work is also an attempt to take advantage of one of the modern APIs and GPUs to make real-time ray-traced rendering possible. In order to move forward in a topic that has already been investigated for over two decades, one first has to look into the current research. For ray tracing of animated scenes a lot has already been tried, yet no one has produced a solution usable in real-life scenarios such as 3D multimedia or CAD software. The main focus of researchers in this field is to create the best acceleration data structure to hold the scene

objects in it. This is because ray tracing of scenes containing more than a few elements is impractical without good organization of the geometry: without an acceleration data structure every ray has to be tested against every object, so the total number of tests grows proportionally to the number of rays times the number of objects, which cannot be computed fast enough for large scenes to be considered real time. A very comprehensive comparison of these structures and the rendering speed-ups they bring can be found in Havran's PhD thesis [23]. In general, acceleration data structures can be divided into three groups: tree-based structures, grid-based structures, and Bounding Volume Hierarchies (BVH). A closer analysis reveals that build quality depends on build time. Considering kd-trees, which are held to be the fastest in rendering, building them from bounding boxes instead of actual objects is faster but yields a structure about 25% slower in rendering [59]. Yet for real-time ray tracing of dynamic scenes not only the rendering time matters: the data structure also has to be adjusted for any changes in the scene. The same paper shows that although one suffers this speed drop in pure rendering, in total one gains a speedup. Other attempts to accelerate ray tracing include ray aggregation techniques, which combine several rays into so-called packets or frustums; this was initially proposed by Wald in 2001 [61] and later improved by Reshetov [52] for kd-trees, and by others for other data structures. A better mechanism for determining primary ray hits was used by Horn in 2007 [25], who uses the GPU's rasterization pipeline to determine the primary hits; this technique was further improved by Dachsbacher in 2009 [15]. This is particularly useful in GPU-based ray tracing, since it allows the use of the rasterizer, which is very fast but not directly programmable. Building on this previous work, one can start investigating new ways to improve the process so that it can finally run in real time on modern hardware and software.

2 What is Ray Tracing

Ray tracing was initially invented as a hidden surface removal algorithm. The assumption is that virtual rays generated from the eye point (camera) travel through points on the image and hit objects in the scene. The ray-geometry intersections closest to the eye point along each ray's trajectory belong to the objects that are visible; the rest can be ignored. This initial version, proposed by Appel in 1968, is called ray casting. Additionally, local illumination techniques can be used to calculate the color of each pixel. Shadows can be computed using so-called shadow rays: rays spawned from the ray-geometry intersection points toward each light source. If a shadow ray intersects any object on its way to the light source, the point on the object is not lit by that light source. This technique was improved over the years and

extended with specular reflection and refraction by Whitted; this became the definition of ray tracing as it is used today.

Figure 1 - Determining whether a point on an object is in shadow; based on Arthur Appel, IBM T.J. Watson Research Center.

In more general terms, ray tracing is a technique for generating an image by tracing the paths of light rays through the pixels of an image plane. It is capable of producing very high quality images; compared to images generated by other techniques such as rasterization, they are more photorealistic, because ray tracing tries to emulate the behavior of real light. On the other hand, it is computationally much heavier than rasterization techniques based on scanline rendering. Those additionally have the advantage of being easy to implement in hardware, since they handle each object in the scene separately, whereas ray tracing has to take all the data in the scene into account in order to render each pixel. This makes rasterization more suitable for real-time rendering, while ray tracing is mostly used where higher quality is desired and frames can be pre-generated: still images, special effects in movies, or even whole computer-animated movies. The strength of ray tracing compared with other techniques lies in its ability to simulate a variety of optical effects such as reflection, refraction, scattering and chromatic aberration [49]. As follows from the description above, ray tracing is an alternative to the currently widespread rasterization techniques used in real-time rendering, which dominate because rasterization is much easier to implement in hardware than other types of rendering techniques. Although ray tracing is considered to produce superior quality images, up to now nobody has created a ray-tracing-based solution able to replace rasterization in real-time graphics, at least when we consider technology available to a common user. Of course there exist so-called render farms [51], consisting of thousands of computers, capable of fast rendering using ray tracing and other computationally demanding

techniques (e.g. photon mapping [46]). For obvious reasons, however, this technology is out of reach for common users; it is used in experimental laboratories and, most notably, for rendering movies.

Figure 2 - Ray tracing schematics, from Wikipedia.

In more technical terms, ray tracing can be considered a global illumination rendering method that attempts to emulate the physical model of light rays. In reality a light source emits an enormous number of photons which reflect off objects; only a small fraction of them falls into the camera or eye and produces the image. Tracing all possible photons from the light source would be computationally intractable, so ray tracing tries to emulate only those photons which actually reach the eye. This approach dramatically reduces the amount of computation needed to render the scene while still producing fairly realistic images [43]. In practice all the eye rays (the ones falling into the eye/camera) are tested against all the objects in the scene. In standard ray tracing each eye ray is generated so that it represents one pixel of the final image. The technique is therefore a point sampling algorithm: since the scene is continuous, the rays sample it and produce a discrete representation of it. Each ray samples the color of the object it hits; if there is no object on the ray's path, the pixel is given the background color. Pixels whose rays do hit something take the color of the object as calculated by local illumination. Additionally, depending on the material properties, rays can be reflected or refracted, which in turn generates new rays that contribute to the pixel color. Moreover, shadows can be calculated by simply sending a ray toward each light source and testing whether it hits something before reaching the light. If it does, the shadow ray's contribution is set to zero, meaning it does not contribute to the final color.

Figure 3 - Rays recursively spawn other rays

Unfortunately, as with every approximation, there are also quality losses. To name a few examples: color bleeding (the phenomenon in which objects or surfaces are colored by the reflection of colored light from nearby surfaces) is missing, and so are soft shadows (ray tracing in its basic form produces only sharp shadows). Additionally, since it is a point sampling algorithm, one can also observe aliasing: image deformation due to low sampling resolution, visible especially at the edges of objects. There are, however, techniques focused specifically on adding these effects (such as soft shadows and radiosity), and ray tracing enhanced in this way is able to produce lifelike renderings hardly distinguishable from real photos; this is widely used by the movie industry in the recently very popular computer-animated films. The problem of aliasing can be solved as well, using so-called multisampling, which in ray tracing means sending multiple rays through a single image pixel and then combining their results into a single color for that pixel. This can be done in different ways, but since it radically increases the computational demand, it is generally not used in real-time ray tracing. In the next few sections I will focus on the theoretical building blocks of the ray tracing technique.

2.1 Generating Primary rays

The ray tracing process starts with generating so-called primary or eye rays. These are the initial rays used to determine the color of each pixel of the screen. Mathematically they are half-lines starting at the so-called eye point, or camera position, and going through the pixels on the projection plane. The projection plane, also known as the view plane, is a plane whose part, defined by the field of view (FOV), becomes the final image. This is illustrated on an

image below. In order to generate the primary rays we still have to define the resolution of the image, by choosing its width and height in pixels. Each ray is commonly described by a pair of vectors, one denoting its start point and the other its direction. Since the start point is known (it is common to all primary rays), only the direction has to be calculated: we simply subtract the camera location from the point on the projection plane for the given pixel.

Figure 4 - Primary rays

In other, more mathematical words, the direction for pixel (x, y) is calculated with the following formula:

    direction(x, y) = poi + right * tan(fov/2) * (2x/width - 1) + up * tan(fov/2) * (height/width) * (1 - 2y/height)

poi    - point of interest vector
right  - right vector
up     - up vector
fov    - field of view
height - image height
width  - image width

After we have generated all the rays, we test them against the geometry in the scene. Where they intersect the scene's geometry, the color is calculated; afterwards, depending on the material, the rays can be refracted and reflected, as described in later sections of this work.
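The primary-ray generation described above can be sketched as a small routine. In this C++ sketch the function name and the camera-basis parameters are illustrative, not taken from the thesis code; the pixel is mapped to [-1, 1] coordinates on the view plane, scaled by the field of view, and the basis vectors are combined into an (unnormalized) direction.

```cpp
#include <cmath>

// Minimal vector type for the sketch.
struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
};

// Direction of the primary ray through pixel (x, y).
// 'poi' is the point-of-interest (view) vector, 'right' and 'up' span the
// view plane, 'fov' is the horizontal field of view in radians.
Vec3 primaryRayDirection(const Vec3& poi, const Vec3& right, const Vec3& up,
                         double fov, int width, int height, int x, int y) {
    double halfW = std::tan(fov / 2.0);        // half-extent of the view plane
    double halfH = halfW * height / width;     // keep pixels square
    // Map the pixel center to [-1, 1] on each axis of the view plane.
    double sx = (2.0 * (x + 0.5) / width - 1.0) * halfW;
    double sy = (1.0 - 2.0 * (y + 0.5) / height) * halfH;
    return poi + right * sx + up * sy;         // not normalized here
}
```

The returned direction is typically normalized before intersection testing; normalization is left out here since the t-values can also be rescaled afterwards.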

2.2 Math Formulas vs. Primitives

Fundamentally there are two approaches to representing objects in the scene: representing them with mathematical formulas, or building them from smaller primitives, primarily triangles or quads. The first approach is good for objects that can be represented easily by a relatively simple mathematical equation; the best example is the sphere. Spheres generated by mathematical means are simply perfect compared to ones created from primitives (which are edgy and require a lot of primitives, such as triangles, to construct). Additionally, with this approach the amount of geometry in the scene is rather low, which means that tracing the rays for a single frame can be done very fast. At this point one might ask why we actually need any other form of scene representation, since the image is clear and can be generated quickly. The answer is that while it is rather easy to state a mathematical formula for a sphere and similar shapes, this is not the case for more complicated objects like teapots, bunnies and other complex geometry. So in order to represent all the complex objects that cannot be described by a sufficiently simple math formula, objects are split into smaller ones; in practice, mostly triangles or quads. Triangles provide finer granularity than quads and are therefore used in most modern implementations of 3D graphics software. The use of triangles, however, introduces a much higher number of objects in the scene: the more detailed and complex an object, the more so-called faces (the smaller objects that together make up a larger object) are required to construct it. An object constructed this way may appear quite edgy, depending on its complexity, but thanks to clever use of shading techniques this effect can practically be reduced to a minimum. The major drawback of splitting objects into face meshes is the greatly increased computational complexity of the scene compared to the mathematical approach; additionally, much more memory is required to store and process the data. Fortunately, using so-called acceleration data structures one can reduce the computational cost of rendering a single frame in exchange for a slightly increased use of memory, as described in later chapters.

2.3 Finding the intersection points

Depending on the geometry chosen to represent a scene, different methods are used to calculate the intersection points between it and the rays. In the general case such a method should return either that no intersection exists, or the point in space where it took place. Most algorithms return the so-called t-value along the ray path, where t is the distance from the ray origin to the intersection. For the most popular case, a scene consisting of triangles, many algorithms exist. In my research I found the algorithms proposed by

Badouel [8], Möller & Trumbore [32], O'Rourke [39], Möller & Haines [32] and Segura & Feito [54]. Looking through these algorithms I decided to use the one presented by Möller & Trumbore, since it returns all the data necessary for ray tracing and is fairly simple to implement on the GPU. Other algorithms, like the one described by Segura & Feito, promise more speed in the ray-triangle intersection itself, but return nothing more than true or false. The idea behind the Fast, Minimum Storage Ray/Triangle Intersection is to solve the equation

    R(t) = T(u, v)

where R(t) = O + t*D is the position along the ray and T(u, v) = (1 - u - v)*V0 + u*V1 + v*V2 is the position on the triangle. Here O is the ray origin, D is the ray direction and V0, V1, V2 are the triangle vertices. If the solution of this equation satisfies t > 0, u >= 0, v >= 0 and u + v <= 1, then the intersection point lies inside the triangle, which means it exists. Additionally one gets the u and v coordinates giving the position on the triangle, which can then be used for texturing; having t one can also easily calculate the position of the intersection point in space. In the actual implementation I additionally used the optimization proposed by Dan Sunday [58], who replaced one of the cross products with dot products, which are much better suited for GPU implementation.

Figure 5 - ray triangle intersection

2.4 Shading

Shading, apart from perspective, is the most important cue that makes viewers perceive the 2D rendered image on the computer screen (which is itself 2D) as a representation of a 3D scene. This is achieved by depicting varying levels of darkness. Without it any 2D image appears just flat, and therefore shading is used in any rendering (a 2D image of a 3D scene) that is not supposed to look flat. At this point it is also worth mentioning that there are shading techniques that make a scene look flat on purpose, but these are out of the scope of this work.
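Before moving on to shading, the ray/triangle test from section 2.3 can be made concrete. The following C++ sketch implements the basic Möller-Trumbore algorithm (without the Sunday optimization used in the actual implementation); it returns t together with the barycentric coordinates u and v.

```cpp
#include <cmath>

// Minimal vector type for the sketch.
struct Vec3 {
    double x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
};
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore ray/triangle intersection.
// On a hit, writes t (distance along the ray) and the barycentric (u, v).
bool mollerTrumbore(const Vec3& O, const Vec3& D,
                    const Vec3& V0, const Vec3& V1, const Vec3& V2,
                    double& t, double& u, double& v) {
    const double eps = 1e-9;
    Vec3 e1 = V1 - V0, e2 = V2 - V0;
    Vec3 p = cross(D, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray parallel to the triangle plane
    double invDet = 1.0 / det;
    Vec3 s = O - V0;
    u = dot(s, p) * invDet;
    if (u < 0.0 || u > 1.0) return false;     // outside along the first edge
    Vec3 q = cross(s, e1);
    v = dot(D, q) * invDet;
    if (v < 0.0 || u + v > 1.0) return false; // outside the triangle
    t = dot(e2, q) * invDet;
    return t > eps;                           // reject hits behind the origin
}
```

The (u, v) pair can be reused directly for texture lookup, and the hit position is recovered as O + t*D.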

Below is a simple example of shading. On the left one can see the so-called wireframe of the object: a rendering in which only the edges of the faces are drawn. It is very useful for depicting the object's shape, since it also shows the object's back side. Next in the row is the version with no shading: the egg is rendered, but all pixels have the same color; note that one really cannot make out its three-dimensional shape. Next is diffuse shading, which reveals the object's shape by taking into account only the diffuse part of the lighting. Finally, the shading can be enhanced by adding the specular part of the light, shown last in the row.

Figure 6 - Shading

2.4.1 Global illumination vs. local illumination

In computer graphics we recognize two illumination types: global and local illumination. The difference is that local illumination is based on the direct contribution of the light sources to an object's color, while global illumination is supposed to also simulate the contribution of the rest of the scene to that color. Shading of objects can thus be based on local illumination alone, which in effect ignores the rest of the scene: only the properties of the object and of the light sources matter. The appearance of locally illuminated objects can then be enhanced by adding global illumination contributions. A good example of local illumination is the Phong shading described below. The ray tracing algorithm can be considered a global illumination technique, given the reflections, refractions and other lighting effects it can simulate. Besides the Phong shading mentioned above, which is probably the single most important shading model in computer graphics, many others exist; among them its variation, the Blinn-Phong shading. In my work I decided to use Blinn-Phong, which is a faster though somewhat less accurate version of Phong shading, since one of the assumptions of the solution is speed.

Apart from the two mentioned above, other shading algorithms are used in CG; the most common are flat shading and Gouraud shading. These are much faster than the ones based on the Phong model, but they also produce much lower image quality. Below I describe the most important shading models, which were used in the solution and are also the most widely used in real-time CG nowadays.

2.4.2 Phong shading [44]

Phong shading is the visualization of the Phong equation, presented in one of the next sections. It is probably the most photorealistic shading model used in modern computer graphics. It is based on the assumption that each material has certain properties for reflecting light, called specular reflection, diffuse reflection, ambient reflection and the shininess constant. Each of these components represents the way a particular part of the light is reflected.

Ambient reflection

This component is responsible for the light that comes from the environment. It is present all over the object, so it shades it with a uniform color, and can be treated as responsible for the base color of the object; without this parameter objects would simply be black except for the parts highlighted by light sources. The ambient light can be seen as the sum of all light sources in the world that somehow contribute to the color of the object but are either too numerous or too far away to produce other lighting effects, and so are not represented in the scene itself. It can also be seen as a very simple approximation of global illumination.

Diffuse reflection [41]

The diffuse parameter represents light reflected from a rough surface, meaning that the reflected rays scatter in numerous directions. This type of reflection has no directional dependency for the viewer. One can distinguish two different types of such reflection. The first is light coming directly from the light source, scattered on the rough surface in all directions and thus also toward the viewer. The second is light that is reflected from one surface onto another and only then toward the viewer. Most reflection models, like Phong's, assume only the diffuse part coming directly from light sources; more advanced techniques such as radiosity calculate both types, rendering more natural scenes.

Figure 7 - Diffuse reflection (reflected rays scattered from rays arriving from the light source)

Specular reflection [42]

The specular reflection represents the part of the light that is reflected at the same angle with respect to the normal. This means that light from a single incoming direction is reflected into a single outgoing direction, in opposition to the diffuse reflection. This produces shiny highlights on the surface of the material.

Figure 8 - Specular reflection

The combination of the three components can be seen in the picture below; the final image is of good enough quality to be considered photorealistic.

Figure 9 - Phong reflection components, from Wikipedia

Phong Equation

The final object color can of course be affected by multiple lights; for each of them the diffuse and specular components have to be calculated and summed. The final pixel color in Phong shading is given by the following equation [45]:

    I = ka * ia + sum over lights m of ( kd * (Lm . N) * im,d + ks * (Rm . V)^a * im,s )

Where:
kd - the object's material diffuse reflection coefficient
ka - the object's material ambient reflection coefficient, the ratio of reflection of the ambient term present at all points in the rendered scene
ks - the object's material specular reflection coefficient, the ratio of reflection of the specular term of incoming light
a  - the shininess constant of this material, which is larger for smoother surfaces like mirrors; when this constant is large, the specular highlight is small
Lm - the direction vector from the point on the surface toward light source m
N  - the normal at this point on the surface
Rm - the direction that a perfectly reflected ray of light would take from this point on the surface
V  - the direction vector pointing toward the viewer (such as a virtual camera)

2.4.3 Blinn-Phong shading [11]

This is a modified Phong shading model that achieves greater efficiency at the cost of some accuracy. In standard Phong shading the direction of a perfectly reflected ray has to be recalculated for each ray. In Blinn's modification it is enough to calculate the so-called halfway vector H = (L + V) / |L + V|, where L points toward the light source and V toward the viewer. The dot product of V and R in the Phong equation can then be replaced by the product of H and N, where N is the normal. The resulting effect is very similar to the one produced by Phong shading; one can additionally adjust a (the shininess of the material) to achieve a better visual match. This can be seen in the image below.

Figure 10 - Comparison of Blinn-Phong and Phong, from Wikipedia
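For one light and a single color channel, the Blinn-Phong variant above can be sketched as follows. The function and parameter names are illustrative; ia, id and is are the ambient, diffuse and specular light intensities corresponding to the terms of the Phong equation.

```cpp
#include <algorithm>
#include <cmath>

// Minimal vector type for the sketch.
struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
};
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(const Vec3& a) { return a * (1.0 / std::sqrt(dot(a, a))); }

// Blinn-Phong intensity for one light and one channel. N, L and V are unit
// vectors: surface normal, direction to the light, direction to the viewer.
double blinnPhong(const Vec3& N, const Vec3& L, const Vec3& V,
                  double ka, double kd, double ks, double alpha,
                  double ia, double id, double is) {
    Vec3 H = normalize(L + V);                                 // halfway vector
    double diff = std::max(0.0, dot(N, L));                    // diffuse term
    double spec = std::pow(std::max(0.0, dot(N, H)), alpha);   // (N.H)^a replaces (R.V)^a
    return ka * ia + kd * diff * id + ks * spec * is;
}
```

For multiple lights, the diffuse and specular terms are summed over all light sources while the ambient term is added once, as in the Phong equation above.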

2.5 Reflected and refracted rays

Depending on the so-called material properties, a given ray can be reflected and/or refracted at a surface. These effects are not seen in standard rasterization rendering using only local illumination techniques, yet they are very natural for real objects. The mirror is a perfect example of an object from which rays are reflected; a perfect one would have a reflectivity index of 1.0, or 100% reflectivity, meaning that all the ray's energy is reflected. In such a case the total color is taken from objects farther along the path of the given ray. If the index is less than 1.0, the appropriate part of the object's color is taken from the local illumination of the object. The direction of the reflected ray is calculated using the so-called law of reflection, which states that the angle between the incoming ray and the surface normal equals the angle between the reflected ray and the surface normal. This is depicted in the figure below.

Figure 11 - Reflected ray

The second property of light supported by ray tracing is refraction, achieved by calculating refracted rays. Refraction is the change of ray direction due to the change of the speed of light in different media; in real life it is observed in transparent objects such as glass or water. As mentioned, refraction comes from the difference in the speed of light between media, and those speeds are related to the angles between the ray directions and the surface normal by the following formula:

    sin(t1) / sin(t2) = v1 / v2 = n2 / n1

In this formula v is the velocity of light in a medium, n is its refraction index, and t is the angle between the ray and the surface normal. In computer graphics the refraction index is more commonly used to describe the difference between media and is used as a material parameter. The image below represents the refraction schema.
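Both ray types reduce to a few vector operations. The C++ sketch below assumes unit-length directions and normals; the sign conventions are one common choice, not necessarily the one used in the thesis code.

```cpp
#include <cmath>

// Minimal vector type for the sketch.
struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
};
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Law of reflection: the incident direction d is mirrored about the normal n.
Vec3 reflect(const Vec3& d, const Vec3& n) {
    return d - n * (2.0 * dot(d, n));
}

// Refraction by Snell's law with eta = n1 / n2 (ratio of refraction indices).
// Returns false on total internal reflection, in which case only a reflected
// ray would be spawned.
bool refract(const Vec3& d, const Vec3& n, double eta, Vec3& out) {
    double cosi = -dot(d, n);                        // cosine of the incidence angle
    double k = 1.0 - eta * eta * (1.0 - cosi * cosi);
    if (k < 0.0) return false;                       // total internal reflection
    out = d * eta + n * (eta * cosi - std::sqrt(k));
    return true;
}
```

In a recursive ray tracer these two functions supply the directions of the secondary rays spawned at each hit point, weighted by the material's reflectivity and transparency.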

Figure 12 - Law of Refraction

3 Acceleration data structures

At this point it is worth mentioning that the simple, naïve form of ray tracing is practically never used, since it requires testing all rays against all the geometry. This is a reasonable option only for scenes with very little geometry, and therefore this approach is used only in very simple demos with scenes built from mathematical representations of objects. However, if the scene contains more than a handful of objects (from my experience, a fairly small number on modern architectures) the number of tests that has to be performed becomes impractically high. Taking a standard resolution of 800 by 600 pixels and 100 objects in the scene, calculating even just the primary rays requires 48 million ray-geometry tests. In light of this fact one uses acceleration data structures to divide the geometry of the scene into subsets. These are supposed to hold a small enough number of objects so that the number of tests required for a single ray entering the subset is low. This allows significant acceleration: instead of testing all rays against all scene objects, one traverses the data structure in search of the subsets hit by the given ray, which in general, if the amount of geometry is large, is much faster than the naïve approach. Afterwards intersection tests are needed only for the objects that are potentially in question, and not for the ones that are clearly off the ray's path. In general we distinguish three types of acceleration data structures: grid structures, tree structures, and bounding volume hierarchies (BVH). These are described in detail below. A lot of research has been done on these structures since, as described, a good data structure makes or breaks a fast ray tracing solution.
However, taking scene animation into account, it is also important to know the cost of rebuilding a given structure. In the literature we can easily find that the form of tree structure called a kd-tree is supposed to be the fastest at rendering scenes. However, it is also noted that this structure is very slow to build; therefore direct usage of the best known algorithms for generating it is tricky. A quick look through the literature also reveals that the fastest to rebuild are grid structures, especially the so-called uniform grid, which however is slow in rendering. As one can see, the problem of data structures is not an easy one to solve and requires some in-depth analysis in order to create a good solution.

3.1 Grid structures

There are several acceleration data structures available for splitting the scene into subsets; the most popular ones are the various kinds of grids and tree-like structures. The idea behind the grid-like structures is simple: one divides the space either by arbitrarily setting the size of the grid or adaptively according to the objects in the scene. Afterwards the objects are put into the appropriate cells, which means that an object in such a structure can be placed in multiple cells. Yet the rebuild time is in general very fast and can be parallelized easily, because each object can be placed independently inside the grid structure. So in the ideal case the number of worker threads while building the scene would be equal to the number of objects in the scene. This makes this structure very fast to rebuild from scratch. The approach is ideal for scenes that have an approximately uniform distribution of geometry. However, for scenes where most of the objects are placed in one spot, the division ends with an unbalanced structure in which all the objects are inside one or only a few cells.
In such a case there will not be much acceleration from it.

Voxel structure/uniform grid

The idea behind this structure is very simple: one just divides the scene into equally sized sub-spaces. This forms a grid enclosing all the objects in the space. Since the division of the space is done using a predefined size, the construction of the structure is very fast. However, the quality of the acceleration may be low if the scene has a large concentration of objects at certain points.

Figure 13 - Voxel data structure (cells marked as not tested, tested with no hit, and tested with a hit)
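The cell assignment described above can be sketched as follows (a hypothetical Python stand-in for illustration only; `build_uniform_grid` and its AABB-per-object input format are assumptions, not the thesis code). Note how each object is handled independently, which is exactly what makes the build parallelizable, and how an object spanning several cells is referenced from each of them:

```python
def build_uniform_grid(objects, scene_min, scene_max, res):
    """objects: list of (aabb_min, aabb_max) tuples of 3-component points.
    Returns a dict mapping (x, y, z) cell coordinates to object indices."""
    cells = {}
    size = [(mx - mn) / res for mn, mx in zip(scene_min, scene_max)]

    def cell_index(p, axis):
        c = int((p[axis] - scene_min[axis]) / size[axis])
        return min(max(c, 0), res - 1)  # clamp to the grid

    for idx, (lo, hi) in enumerate(objects):
        lo_c = [cell_index(lo, a) for a in range(3)]
        hi_c = [cell_index(hi, a) for a in range(3)]
        # An object overlapping several cells gets referenced in each one.
        for x in range(lo_c[0], hi_c[0] + 1):
            for y in range(lo_c[1], hi_c[1] + 1):
                for z in range(lo_c[2], hi_c[2] + 1):
                    cells.setdefault((x, y, z), []).append(idx)
    return cells
```

Every loop iteration touches only one object, so on a GPU each object could be assigned to its own thread, with the cell lists built using atomic appends.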

Bounding Volume Hierarchies (BVH)

This in general refers to all kinds of tree hierarchies whose leaves encapsulate objects in the scene, meaning that each object is enclosed in some simple bounding volume. The assumption behind a bounding volume is that it is much easier to intersect it with a ray than to do the same with the enclosed object. The major advantage of BVHs is their ability to accept incremental changes to the structure, which is very important considering the need to adjust the data structure each time the scene changes. This was shown by Yoon in 2007 [64] and is hardly achievable with other data structure types. An additional advantage of the BVH is that it references each scene object exactly once, in comparison to other data structures where an object can be referenced multiple times. However, as shown by Havran in his comparative study of data structures [24], the rendering speed is slower than with other data structure types. One can therefore see BVHs as very fast to rebuild yet slow in rendering, which gives them a lot of potential for dynamic scenes, where the total frame time consists of both adjusting the data structure and rendering the scene.

Figure 14 - Bounding Volume Hierarchies Example from Wikipedia

Axis aligned bounding boxes (AABB)

In computer graphics bounding volumes are used quite a lot. A bounding volume is any kind of simple object used to totally enclose a subset of a scene, which is then used to accelerate calculations. In particular for ray tracing it can be seen as replacing complex geometry with a simple mathematical shape. The acceleration in the case of bounding volumes comes from reducing the number of initial ray-geometry tests: most of the rays that do not hit anything in the scene do not hit the bounding volumes either. Conversely, rays that do intersect a given volume have to be tested against the geometry bounded by that volume.
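The standard cheap ray-versus-box check is the "slab" test: intersect the ray with the three pairs of axis-aligned planes and check that the resulting parameter intervals overlap. A minimal Python sketch (the function name is illustrative; axis-parallel rays are handled via IEEE infinities, with the usual caveat that a ray origin lying exactly on a slab plane would need extra care):

```python
import math

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: returns True if the ray (origin + t*direction, t >= 0)
    intersects the axis-aligned box [box_min, box_max]."""
    tmin, tmax = 0.0, math.inf
    for a in range(3):
        # 1/0.0 -> inf makes rays parallel to a slab work out correctly.
        inv = math.inf if direction[a] == 0.0 else 1.0 / direction[a]
        t1 = (box_min[a] - origin[a]) * inv
        t2 = (box_max[a] - origin[a]) * inv
        if t1 > t2:
            t1, t2 = t2, t1
        tmin = max(tmin, t1)  # latest entry into any slab
        tmax = min(tmax, t2)  # earliest exit from any slab
    return tmin <= tmax
```

Because the test is a handful of multiplications and comparisons with no branches on geometry, it is also a good fit for shader code.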
The simplest and most used bounding volumes are axis aligned bounding boxes. These are boxes aligned with the axes of the coordinate system. They are much easier to test than arbitrarily oriented boxes; however, any rotation of the object requires regeneration of the bounding volume. Another commonly used geometric object for bounding volumes is the sphere. Spheres are used in ray tracing applications since the sphere-ray intersection is very simple and fast to compute. Additionally, rotations of objects bounded by a sphere do not require regeneration of the sphere.

3.3 Tree Space Partition Structures

The idea behind these structures is that one divides the space into smaller sub-spaces. This is very similar to grid structures, yet the hierarchy of those spaces is stored using a tree structure. The most notable of these are BSP and kd-trees, which are described below.

BSP trees

The Binary Space Partitioning (BSP) tree is one of the most used space-dividing acceleration data structures. It works based on a very simple yet efficient scheme: one divides the space into two sub-spaces, and this is repeated until certain criteria are met, in most cases either the depth of the tree or the number of objects enclosed in the leaves of the tree. Traversing the BSP tree one can quickly reach really small parts of the space. In most cases this is much quicker than with the voxel structure, since large empty subspaces can be quickly skipped. However, the building time is much slower in most cases, since the space division criteria have to be calculated each time the space is divided. This is in contrast to the voxel structure, in which the space division is based on a constant set of parameters (the width, height, and length of the voxel). In the figure below one can see a sample space division. Each node of the tree describes a part of the space; going deeper into the tree, children contain smaller subsets of the space described by their parents.

Figure 15 - Binary Space Partitioning Tree from Wikipedia

KD-tree

The most popular acceleration data structure currently used in ray tracing is the kd-tree, which is a special case of the BSP tree. In the name, k stands for the number of dimensions the tree spans and d stands for dimensions, so a tree over 3 dimensions should strictly be called a 3d-tree; however, most of the time one refers to it simply as a kd-tree.
It works on the principle of dividing the space into two subspaces, those into another two, and so on until a termination criterion is met. A parent node spatially contains its whole sub-tree. Therefore it is easy to rule out a large amount of traversal: if the ray does not hit a node of the tree, then it will not intersect any of its sub-nodes either. This also implies that rays not hitting the root node do not hit the scene at all.
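The pruning described above can be sketched as a recursive traversal (illustrative Python; `KdNode` and `traverse` are hypothetical names, and real-time implementations typically replace the recursion with an explicit stack). The key point is that a sub-tree is visited only when the ray segment overlaps its half-space:

```python
class KdNode:
    """Interior node: split axis and plane plus two children.
    Leaf node: a list of enclosed objects in `objs`."""
    def __init__(self, axis=0, split=0.0, left=None, right=None, objs=None):
        self.axis, self.split = axis, split
        self.left, self.right = left, right
        self.objs = objs

def traverse(node, origin, direction, tmin, tmax, on_leaf):
    """Traverse the ray segment [tmin, tmax]; on_leaf receives the
    object list of every leaf the segment passes through."""
    if node.objs is not None:
        on_leaf(node.objs)          # leaf: intersect enclosed objects here
        return
    o, d = origin[node.axis], direction[node.axis]
    near, far = (node.left, node.right) if o < node.split else (node.right, node.left)
    if d == 0.0:                    # ray parallel to the split plane
        traverse(near, origin, direction, tmin, tmax, on_leaf)
        return
    t = (node.split - o) / d        # ray parameter at the split plane
    if t >= tmax or t < 0.0:        # plane beyond the segment or behind the ray
        traverse(near, origin, direction, tmin, tmax, on_leaf)
    elif t <= tmin:                 # segment lies entirely on the far side
        traverse(far, origin, direction, tmin, tmax, on_leaf)
    else:                           # segment straddles the plane: visit both
        traverse(near, origin, direction, tmin, t, on_leaf)
        traverse(far, origin, direction, t, tmax, on_leaf)
```

Visiting the near child first also lets a renderer stop as soon as a confirmed hit is found before the split plane.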

A survey of acceleration data structures found kd-trees to be the most efficient ones. The study was done by Havran [23]; as part of this work he conducted research on the efficiency of different data structures on different scenes, and he concludes that kd-trees are the most efficient for most cases. In order to divide space one has to use some sort of division criterion. Depending on this criterion one achieves a certain tree quality, where quality means the level of scene division: if the scene is divided so that each leaf holds the same number of objects, then the tree has the highest possible quality. In practice it is very hard to achieve such quality, especially for complex scenes. The currently best known technique for achieving the highest quality trees is the use of the surface area heuristic (SAH) when calculating the space split points while building the tree. This heuristic estimates the ray tracing cost based on an assumption about the ray distribution in the scene; minimizing this cost yields the best trees. The disadvantage of this technique is that it is very hungry for computing power, since one has to calculate all possible divisions and their costs in order to achieve the best result. There are some methods that try to accelerate this process, like the one proposed by Wald and Havran [59], which is based on sorting the primitives. In general, however, algorithms based on the SAH are slow at rebuilding the tree. In the case of kd-trees one often divides space using a rotating-dimensions scheme, which divides the space into two subspaces based on a division criterion found on a specific axis; the axis is chosen consecutively for each level of the tree from the set of available axes.

Figure 16 - 2-Dimensional kd-tree example

Data structures summary

After looking at the data structures one sees that the problem of efficient ray tracing has to be regarded as a two-stage one.
The first stage one could call: how to efficiently build the chosen data structure. This is especially needed if the solution is supposed to dynamically change the positions of the geometry, which is called animation of the geometry. Each time the geometry changes, the data structure has to be regenerated, since the positions of individual objects change and they may, and will, change their position in the data structure. Therefore, if one needs real-time animation, the process of regenerating the structure must not take more than a few milliseconds. The second stage is the rendering phase, in which one has to generate all the rays and traverse the data structure. Then the ray-geometry intersection tests take place, during which one calculates all the information necessary to generate the color of the pixel.

This is done based on the object's properties, often referred to as the material properties of the object. Afterwards one still has to calculate the so-called reflected rays. This process is similar to the calculation of primary rays; however, the new ray origin is the point where the primary ray intersected the geometry, and the direction of the ray is calculated based on the incoming ray direction and the surface normal. This process can be continued any number of times; each reflection will in general yield better image quality, appearing more photorealistic. If the ray reaches a light source it no longer has to be traced, since the whole path of the light from this light source to the eye has been simulated. In addition, depending on the properties of the material, one can generate refracted rays as well.

4 Why it is beneficial to make it Real time

The simplest answer to the question of why it is beneficial to make ray tracing real-time is image quality, which is superior to the currently widespread polygon rasterization techniques. In addition, ray tracing supports the use of acceleration data structures, which allow greater speeds than traditional rasterization for very complex scenes. This is due to the fact that a correctly created data structure will require a similar number of ray-geometry intersections regardless of scene size. So, at least in theory, achieving real time on small and medium scenes should be equal to achieving it on a large scene, considering only the rendering time of a frame. Of course, rebuilding the acceleration data structure will take more time if the scene is more complex, and a larger scene will also use more memory, which especially has to be considered for large scenes. It is true that a lot of effects supported by ray tracing can be emulated on currently widespread GPUs.
Although their rendering engines are based on rasterization, they support custom image post-processing in the form of pixel shaders. Some effects like reflections still cannot be emulated on the GPU using the standard approach. This can be seen in almost all 3d programs: some basic properties of the real world are simply omitted, for example reflections, meaning that mirrors actually do not reflect. Likewise, light refraction is not seen in contemporary real-time 3d graphics. These and many more are simple effects under ray tracing, which makes the rendered image more immersive and lifelike. All of this follows from the fact that rasterization-based techniques do not consider the scene as a whole while rendering, but treat each object separately. So all effects based on interactions between scene objects are very hard to emulate in rasterization, while they are simply natural for ray tracing.

4.1 State of the art real time ray tracing solutions

This section is dedicated to a description of the currently best ray tracing solutions. I will try to summarize the fastest ones and/or those generating the best image quality. This will be used later for comparison with the solution created as part of this thesis. One has to be aware that there is a simple trade-off between quality and speed, which can best be observed in the

solutions used for rendering movies. In their case we have, on the one hand, the best possible visual quality at the moment of their creation; on the other hand, an average final frame renders for several hours on really high-powered machines. To put it into perspective, according to [53] the rendering of Shrek III took about 20 million hours on a single CPU, while the movie is only 92 minutes long. From this one can see that movie-quality renderings are still far from being real time on current PCs. I will, however, leave the movie renderers aside, since they are the extreme case of quality over speed; those renderers do not focus on the same goals as this thesis, so the comparison is not that interesting. Instead I will present solutions with a more or less similar point of view, meaning that the authors had a similar approach to ray tracing. Unfortunately, although there is a lot of talk about real-time ray tracing on the internet, not much actually running software can be found, so mostly one has to use the data provided by the researchers for comparison. In this chapter I will nevertheless present the solutions I was able to find on the internet that claim to be real-time ray tracers. For more scientific solutions presented in papers, please refer to the chapter on current work. The solutions presented below are downloadable and testable on the hardware they are intended for.

RealStorm [50]

I will start the description of state-of-the-art ray tracers with RealStorm, which is the first real-time ray tracing engine that I came across. It is based on modeling scenes from geometric entities described by mathematical formulas. It can produce modestly looking scenes in real time, although of course it has limitations on what can be created. There exists a benchmark application for this engine which can be run freely on any computer.
The test runs at between 9 and 20 frames per second, depending on how heavy the given part is in lights and reflection effects. In general it supports only single reflections.

Arauna real-time ray-tracing [10]

The Arauna real-time ray tracer is the fastest and best real-time ray tracing solution that I have been able to find on the internet while writing this project. It provides frame rates acceptable enough to be called real time: on my test machine between 15 and 30 frames per second at 800x600 resolution. The whole rendering process is done on the CPU, which according to the author can take advantage of up to 32 cores. The process is based on rendering static scenes with a moving camera and light sources. Additionally, some degree of object animation is used that does not require regeneration of the acceleration data structure. Furthermore, the author specifies that the solution uses a Bounding Volume Hierarchy (BVH) as the acceleration data structure. Thanks to all these tricks and low-polygon scenes the author was able to achieve real-time ray tracing for simple games. However, in his solution the acceleration data structure has to be generated before the program starts: he supplies a program that generates the structure for a given object, and uses the precompiled objects in his solution.

Ray-tracing Demos [22]

On the internet one can find a lot of so-called demos which use a simple form of ray tracing, mostly formula-based objects rendered in low resolution. A demo is basically a small program meant to show off the skills of its creator. These use the CPU to ray trace a scene in real time, accompanied by some music. It is hard to assess their speed; however, they provide acceptable visual effects even on computers that are quite old by today's standards. Of course the scenes are also very small, containing only a handful of objects in most cases.

NVidia OptiX [38]

OptiX is a real-time rendering engine developed by NVidia for their GPUs, based on CUDA acceleration. I was only able to look at some videos of it, so I could not assess the performance of the solution in real time myself. What I could see in the videos is that the real-time animations mostly rely on a trick: lowering the resolution and quality while the system is processing scene changes and then boosting them back up while the scene is static. This can be seen as a hybrid between real-time ray tracing and the approach used in 3D graphics software for generating higher quality images. The engine seems not to be entirely suitable for real-time rendering; it is rather for applications where the generation of a single high-quality frame is most important and the manipulation of the scene gives a good indication of how the final image will look. Based on this engine NVidia has created a demo for their new line of cards called Design Garage. It is supposed to show off the ability of their GPUs to render high quality images.
I have not been able to test this demo myself; however, from a review on the internet by Ryan Smith [57] I can see that it achieves about 3.6 frames per second using NVidia's top of the line GeForce GTX card.

IBM Interactive Ray-tracer (irt) [31]

This one is very similar to the OptiX engine by NVidia; however, it was developed to be used with Linux. There are some demos of it on the internet running on 3 PlayStation 3 consoles. It too is meant for interactive ray tracing rather than real-time ray tracing, although, as can be seen in the demos, the scenes are not as complex as the ones used in OptiX. This solution has the potential to be one of the best for real-time ray tracing. There are two factors in its favor: first, the ability to split the image rendering among multiple machines; second, the cost of the PlayStation 3, which is getting very low, so that 3 or 4 PS3s are less expensive than a good PC while offering much more computational power. According to a document found on the IBM site, 4 Cell processors should be able to render static scenes of about 1 million polygons at 720p resolution (1280x720 pixels, which is roughly 1 megapixel) at speeds reaching 30 frames per second. Unfortunately the paper does not state how many reflections were used; as in other papers, the tests are published without reflections. Yet still, 30 FPS for a 1

million polygon scene is a good achievement, considering that the technology can be acquired rather cheaply.

SmallptGPU - an OpenCL benchmark

This is a modification of Smallpt by Kevin Beason [9], which is supposed to be one of the shortest global illumination programs; the original takes only 99 lines of C++ code. The modification was done by David Bucciarelli so that it executes on OpenCL-compatible hardware. Thanks to that it is a very nice hybrid solution, of which one part is executed on the CPU and the other on the GPU: it measures the speed of both and assigns parts of the rendering accordingly. It is in a sense very similar to OptiX in that it is only partially real time, since the image is generated in several steps. On a fast machine one could get some real-time rendering; however, the picture is generated in passes, meaning that random pixels are calculated in any given pass and the whole image is complete only after several passes. Therefore this benchmark lets the user see only how many rays are calculated per second rather than how many FPS one gets.

5 GPU

The Graphics Processing Unit is hardware dedicated to handling tasks related to visual effects. It can be considered similar to a CPU, however with different processing units designed especially for processing graphical data. GPUs have come a long way from their beginnings, where they were simply used to offload some specific tasks from the CPU, to the powerhouses we have today. Modern high-end GPUs have computational power far greater than CPUs: the power of the top models is already measured in TFLOPS (10^12 floating point operations per second) [7], [36], while the best CPUs oscillate around 100 GFLOPS (10^9 floating point operations per second).
To be exact, the best currently available GPU (the Radeon HD 5970, a dual-chip card, as of December 2009) has a computational power estimated at around 6240 GFLOPS (6.2 TFLOPS), while the fastest CPU (Fujitsu's eight-core SPARC64 VIIIfx "Venus" CPU) [30] has a computational power of only 128 GFLOPS (roughly 0.13 TFLOPS). From a practical point of view, however, both of these figures have to be seen as theoretical computational power; taken at face value, they would suggest the GPU is about 50 times faster than the CPU. Of course this is not the average case, and one might say that the GPU is on average 20-plus times faster than a CPU. This follows from the parallel nature of the GPU. Current mainstream CPUs are only slowly getting to 6-core architectures, due to the fact that most currently developed algorithms are single-threaded, so adding more cores mostly just allows running multiple programs at once. In contrast to the CPU, a modern GPU consists of texturing units, fixed pipeline elements like the rasterizer, and hundreds of so-called unified shader units. The most important for general-purpose (GP) computations are the shader units, so I will focus on those; the rest is not really programmable and follows predefined algorithms to achieve predefined goals. Therefore one might ask how much faster the GPU is compared to the CPU for general-purpose computations. Of course, manufacturers stating the number of GFLOPS

a given GPU achieves always add up every single one of its functions, yet in GPGPU one is mostly only able to use the programmable part of the pipeline. In order to answer this crucial question I have used a program called GPU Caps Viewer. This program is able to perform exactly the same demos using the CPU and the GPU. I have summed up the results in the chart below. Please note that only the animation computations are done on the CPU or GPU; the rest is rendered on the GPU.

Figure 17 - GPU vs. CPU speed chart (Radeon HD 5870 vs. Phenom II 965 BE; tests: 4D Quaternion Julia Set, 1M Particles, Post FX; measured in frames per second)

To give a short description of the tests above: the 4D Quaternion Julia Set is a visualization of a solid fractal; here the GPU has the largest advantage over the CPU. Next is the rendering of 1 million dynamically changing particles, where the GPU is only twice as fast as the CPU. The last is image post-processing, where the GPU is about 11 times faster than the CPU. The speed of the GPU follows from the high number of shader units. These are meant to process each pixel or vertex independently, so in the ideal case for graphics a GPU would have as many shader units as there are pixels. The shader units can be seen as simple processors with instruction sets specific to graphical operations. Thanks to the enormous power achieved through this parallelism, people have for some time tried to harness it for general-purpose tasks. Especially algorithms that map efficiently onto a high number (10^3-10^4) [14] of threads can benefit from the parallel architecture of the GPU. One such algorithm is certainly ray tracing, where each ray can be traced individually in a separate thread. Additionally, there is practically no need for synchronization between threads, since each thread is responsible for calculating its own ray.
This is very important, since the possibilities for synchronizing those threads on the GPU are limited.

The desire to harness the GPU for GP computations is as old as shaders themselves. Before shaders were introduced there were simply no programmable parts on the GPU, so it could be used only for the operations it was designed for. From the moment of the early shaders, however, people tried to use them in some way to accelerate computations. At first there were only two shader types: vertex and pixel shaders (or fragment shaders, as they are called on OpenGL architectures). These were used to perform, as the names suggest, the vertex processing of the scene and the pixel modification of the final image. Vertex shaders could not really be used for GP tasks, since all their data went through rasterization (to this day a fixed stage in the pipeline). So one used pixel shaders; however, this was not an easy task, since they were not designed with GP purposes in mind: the only way to send and receive data was through textures, and only some basic operations were supported. Over the years more shader types were developed, like the geometry shader (which can generate new graphics primitives), and the domain and hull shaders. As a side note, domain and hull shaders are in fact a way to program the hardware tessellator on board the GPU, and thus not very interesting for this work. Additionally, GPU manufacturers realized that there is no point in making separate shader units for each shader type and decided to create universal shader units. This allowed more shader units to be used at a particular stage if other stages were not needed. Meanwhile, GPU makers realized there is a great need for the extra computational power GPUs can provide, and introduced vendor-specific GP APIs. Finally, recently, both the API makers and the GPU creators realized that there should be an easier way to use GPUs for GP computations, and they introduced so-called compute shaders. These shaders are meant to process data that does not have to be directly related to graphics.
So there are more convenient ways to pass data to and receive data from the GPU. Additionally there are some communication mechanisms that allow data exchange between instances of a shader program directly on the GPU.

5.1 GPU programming drawbacks

There are several programming drawbacks of the GPU compared to the CPU. The most important is of course the reduced ability of the GPU to perform as a GP processing unit. The GPU, as already noted, was never designed to be a CPU replacement, so one needs tricks to overcome its shortcomings when using it for GP computations. The biggest drawback is probably the limited ability to debug shaders on the GPU. Most current shader debuggers are therefore based on CPU reference implementations of the APIs, which emulate the GPU and allow debugging of that virtual GPU. Fortunately there exist some simple debuggers, as described in the following chapters about DirectX; however, these have only very limited debugging options for DirectX 11, so it is extremely difficult to check for errors in complex code. The second major drawback, as I see it, is not a lack of functionality; rather it follows from the architecture of the GPU. Namely, the GPU is only fast at solving tasks that are

highly parallel and uniform in execution time. This means that given a large number of tasks, the total execution time will always be the time needed to perform the most expensive one. This of course assumes the number of tasks is no larger than the number of processing units; otherwise free units will perform computations for the remaining tasks. To give an example: if we have a GPU capable of executing 100 threads at one time and one of the threads takes an enormous amount of time to execute, then 99 percent of the GPU's power is wasted for the time it takes to process that thread after the other threads have finished. One simply cannot execute more than a single shader program at a time, so to execute the next task the previous one has to be finished. (This has supposedly been solved in NVidia's Fermi architecture; still, one has to write clever code in order to take advantage of executing multiple shader programs at once.) Therefore only programs that run a balanced set of tasks can really benefit from GPU computational power, which will be shown in the results section of this report. So although the computational power of GPUs is substantial, in order to harness it one has to create algorithms which can take advantage of it. Furthermore, there is a general lack of pointers on the GPU; neither data nor function pointers are available. This implies the lack of advanced data structures and of the recursive methods useful for traversing them. Taking a tree structure as an example: on a CPU it consists of elements holding some data and at least pointers to all the children of a given node, and the structure is traversed using a recursive function. Both of these require pointers to work and cannot be implemented in a straightforward way on the GPU. One also has to remember that double precision operations are not as fast as single precision ones.
Most modern GPUs, at least those compatible with DirectX 10.1, have to implement double precision numbers and operations. However, since double precision might be even 3 times or more slower than single precision, it is always better to use only single precision arithmetic. This implies that in some cases the algorithm is either fast but not accurate enough, or accurate but slow. There is a limited number of operations that can be put into a single shader program. Although current implementations set this limit high, one still has to consider that the instruction cap applies to the compiled version of the shader, and some high level instructions may generate multiple low level ones. This is described further in this work. Due to the architecture of GPUs there are also some operations that can't be used within conditional blocks and/or loops. This sometimes forces strange implementation approaches to problems that could otherwise be easily solved. Fortunately a lot of these problems can be worked around using some tricks (so called hacks) or simply by doing preprocessing on the CPU before the actual processing on the GPU takes place. This is especially the case with all the data that has to be generated in the same way for each shader program.

6 DirectX

6.1 What is DirectX

One should probably start with a short introduction of DirectX: what it is and why it is important for this work. In general, DirectX is an API created by Microsoft for its Windows brand of operating systems to provide common access to multimedia hardware. It was supposed to be an answer to the problems with the previous mainstream OS called DOS (Disk Operating System). In DOS, the main problem with multimedia hardware such as the GPU (of course in DOS days there were no real GPUs, although some graphics cards supported a certain degree of acceleration) or the sound card was that there was no common standard for accessing them. This meant that one had to create separate code/routines to handle each device that was supposed to be supported. As is easy to imagine, it was not possible for everyone who created programs back then to implement support for every device, so most of the time only the most popular hardware was supported. To be more precise, there were some attempts at creating universal libraries; however, these were more parts of programs than of the operating system, so they never gained any popularity and were used only by some software vendors. To solve the problem, people at Microsoft (Craig Eisler, Alex St. John, and Eric Engstrom) decided to create a common interface for programmers to write applications against. Of course this doesn't by any means imply that each piece of hardware of a given type, like the GPU, is supposed to be the same. The hardware manufacturer simply has to create a driver mapping the internal functionalities of the device onto the DirectX interfaces. This means that on a low level the hardware can work completely differently from one device to another, yet on a high level they will seem the same to the program. This can be observed even by looking at GPU specs.
Recently, the cards of both major manufacturers (ATI and NVIDIA) have had only so called Unified Shaders (units capable of performing any shader program), although DirectX specifies Vertex, Geometry, Compute and Pixel Shaders. Some generations back there were separate shader units for Vertex and Pixel Shaders (back then there were no Geometry or Compute Shaders). Now the hardware uses unified shaders which run all the different shader types; from the point of view of the API, however, there are still distinct vertex and pixel shaders. So as one can see, although the functionality of the hardware changes, the API functionality remains the same, allowing programs written for DirectX to run correctly on all hardware compatible with it (as mentioned, compatibility is achieved via drivers). Worth noticing is that DirectX is actually a set of smaller APIs. These vary from version to version; the most important ones are Direct3D (3D graphics), DirectDraw (2D graphics), DirectInput (human input devices), and DirectSound (audio). Since this work is mainly about the Direct3D component of DirectX, I will focus on it in the rest of the chapter.

From the initial release of DirectX (September 30, 1995 [17]) more than 14 years have passed and we have reached release 11, which is used in this work. During this time DirectX underwent complex changes and became one of the most used APIs on Windows. Each version brought support for new features. Probably the most widely used version now is 9.0c, which is the final release for Windows XP (still over 60% [40] of computers) and is used on the XBOX360; newer versions of Windows support it as well. Unfortunately 9.0c has a lot of issues which couldn't easily be fixed, so Microsoft decided to introduce a lot of changes in DirectX 10. These changes required a new driver model (WDDM - Windows Display Driver Model), and since Microsoft for obvious marketing reasons decided not to extend XP with it, DirectX 10 and up run only on Windows Vista and up. One of the biggest issues with 9.0c is that it uses a lot of CPU computational power on so called API calls. This means that in some cases which require a lot of operations (important examples would be advanced GPGPU algorithms as well as complex visual effects requiring multiple rendering stages) 3D performance is actually limited by the speed of the CPU and not the GPU. The introduction of the new driver model and some changes in the API significantly reduced this bottleneck. The current version (11.0) further improves on the assumptions of DirectX 10. In addition, backward compatibility has been introduced once again, meaning that one can run the DirectX 11 runtime on hardware which is only compatible with 10. This, however, implies the lack of certain functionalities.

6.2 What is new in DirectX 11 [18]

As already noted, DirectX version 11, and most notably Direct3D, which is of the most interest for real-time ray tracing, builds on the foundations of DirectX 10.
It introduces multithreading support (more than one core can be used to coordinate DirectX, as opposed to previous versions which used only one core, essentially creating CPU bottlenecks while a lot of CPU power remained unused), Compute Shaders, Domain Shaders, Hull Shaders and Tessellation. Below one can see the Direct3D pipeline; the green boxes represent new functionalities extending Direct3D 10.

Figure 18 - DirectX 11 pipeline - Microsoft presentation

Direct3D 11 is a superset of version 10.1, which is a superset of version 10.0. This means that all the functionalities of the previous versions stay the same and only some extensions are added to them. Besides the new and improved features introduced in DirectX 11, there are a lot of improvements already in DirectX 10 which make the technology ideal for implementing ray tracing. Those include full support of integers in DirectX 10 and full support of double precision in later versions. The double precision might be needed if one considers sending multiple rays for a single pixel (for anti-aliasing purposes). Furthermore, additional parameters have been introduced allowing the pixel shader to read the id of the primitive it is calculated for. This allows the use of rasterization to determine primary hits using the standard rendering method, which by now is extremely fast on the GPU. As measured in my experiment, one can achieve about 3000 frames per second at a resolution of 800x600 on a simple test scene when calculating primary hits, compared to about 30 in a true ray tracing approach on the same hardware. This means that for simple scenes this approach is about 100 times faster for determining primary hits.

6.3 Compute shaders

The most interesting of the new features of DirectX 11 is the Compute Shader (CS). The shader is part of the new HLSL 5.0 (High Level Shader Language), which is one of the elements of the new DirectX API. This shader significantly eases the usage of GPUs for general computations. Prior to the introduction of the CS one had to either use a vendor dependent API (like NVidia's CUDA or ATI's Stream SDK) or do some hacks with Pixel Shaders. Neither of these solutions was optimal: although the vendor dependent APIs seem to work great with the cards of one manufacturer, they haven't been compatible with the cards of the other. This is a big issue, since neither company's market share [21] is big enough to say the other could be ignored, or to force the other to adopt the technology.
The more universal approach, misusing the PS to do general purpose computations, is also not the best, since it lacks means of exchanging data between threads (in this case pixels) and forces a lot of unnecessary steps. All in all this reduces the speed of the program running on the GPU. Fortunately Microsoft realized the growing need for unleashing the computational power of GPUs in GP solutions, which led to the creation of Compute Shaders. The shader itself is very similar to the pixel shader; however, instead of processing each pixel of the screen it is executed on a specified number of threads. Each thread is executed separately, yet all threads share the same code and the same input and output data. Additionally there are instructions allowing the exchange of data between threads. Therefore CSs are ideal for processing large amounts of independent data, since modern GPUs are able to execute as many as 640 threads at one time (in the case of dual processor cards).

6.4 HLSL and ASM

For a long time there have been two methods of writing a shader program. The first was to write the program in a native assembly language. This language was supposed to represent

the single instructions carried out by the GPU during the run of a shader program. In the previous shader models this was very important, since there were strict limits on the number of instructions that could make up a shader program. In the first generation of the SM (shader model) one could only use 64 of them. To make matters worse, some of the instructions in the assembly language could use more than one instruction slot, eventually shortening the effective length of a single program. Therefore it was very important to know all the instructions and their cost, both in terms of used up slots and of computational power needed. The two were linked: the more slots an instruction needed, the more time it took to execute. Fortunately the instruction limit has been increased with each version, so more complex algorithms could be put onto the GPU. Initial versions of the assembly language did not support looping instructions, so all loops had to be plain code repetitions, making the instruction limits very hard to overcome. The second option for writing shader programs is the so called HLSL, which stands for High Level Shader Language. This language is very similar to GLSL and Cg. It is based on a C-like syntax and supports a special set of functions defined for it, most of which are intended especially for graphics applications. Some of them can be mapped directly onto ASM instructions and others are made of several low level instructions. As of DirectX 11, HLSL is compiled dynamically to ASM, so different cards can produce a different number of low level instructions for a given HLSL function. The program written in HLSL is compiled to ASM by a highly optimized compiler. This compiler generates code containing only the instructions needed to produce the final outputs of the program.
Therefore it is not certain that the output ASM program will perform exactly all the operations defined in HLSL; some of them might be stripped if they do not contribute to the final result. The result, in the case of the pixel shader, would be the color of the pixel in the output. There are several compatibility standards depending on which code is being generated; in early versions, loops and conditional instructions were treated differently than in the new ones. The newest version, which is part of DirectX 11, is Shader Model 5. It supports class architecture, making it almost as comprehensive as CPU languages. This is especially useful with the very high instruction limit available on current GPUs. Unfortunately there is still a lack of pointer support, making it virtually impossible to support complex data structures or the recursive methods very useful in traversing them. It is also very important to notice that from version 10 of DirectX and up one can only use HLSL to write shader programs. This is probably due to the fact that architecture-specific compilers are supposed to generate more efficient code, not depending on one particular hardware implementation. Additionally, one is able to generate shaders in real time, compiling them while the program is running. This allows better compatibility with different cards, letting developers generate options depending on the hardware the program is supposed to run on, as especially older hardware will not run some of the modern effects. Unfortunately

however, this implies longer loading times for the application when it is necessary to compile the shaders. This might be especially problematic with complex shaders, which take a long time to compile due to the heavily optimizing compiler.

6.5 DirectX 11 hardware

At the moment of writing the thesis there is only one commercially available discrete graphics card line fully supporting DX 11. This is the R800 family from AMD's graphics division, also known as ATI. It is also referred to as Evergreen, or simply by the commercial name of the Radeon HD5xxx series. There is a range of cards on the market, from so called midrange to enthusiast solutions. Since this is the only commercially available DX 11 card line at the moment of creating the thesis, it was pre-chosen by the market to be used as the basis for the thesis, as one of the assumptions was to take advantage of DX 11 hardware in creating a real-time ray tracer.

Evergreen

Evergreen is the newest and first DirectX 11 compatible architecture from AMD. The Radeon HD5870 card is the top of the line single GPU solution. It is also, according to multiple tests, the fastest single GPU on the market (while writing this part of the thesis it has been surpassed by NVidia's Fermi in some of the benchmarks), delivering in terms of computational power almost 3 TFLOPS in single precision operations. In order to understand the capabilities of the GPU one has to go into some hardware design details. Therefore, taking the HD5870 as an example, I will describe the general build and features of this GPU [56]. It has 1600 Stream Processing Units (SPUs), also referred to as Stream Cores. The SPU is the most basic unit executing arithmetic instructions. The SPUs are bundled together into larger units called Stream Processors (SPs). Each SP contains exactly 5 SPUs, where the 5th (denoted as t on the image below) is a more complex unit capable of transcendental functions along with the base functions of an ALU (arithmetic logic unit).
Additionally, each SP has a register file and a branch unit. The SPs are the smallest execution units in which each SPU executes the same instruction. This implies that processing single, double, triple or quadruple vectors is supposed to take the same time, which in some cases is of course very positive and in others not so.

Figure 19 - Stream processor diagram from AMD presentations

Due to this architecture, a single SP is capable of:

5 32-bit floating point MAD per clock
2 64-bit floating point MUL or ADD per clock
1 64-bit floating point MAD per clock
4 24-bit integer MUL or ADD per clock
Special Function Unit: 1 32-bit FP MAD per clock

A very interesting thing from the standpoint of GPGPU is that the dot product has been implemented as a single clock instruction. However, according to the very useful instruction set reference [19], only the version for 3 dimensional vectors is implemented as a single instruction. Going further into the architecture, the SPs are also grouped into so called SIMD Cores. A SIMD Core contains 16 SPs and additionally texture units, L1 cache, shared memory, and controlling logic.

Figure 20 - Evergreen GPU diagram from AMD presentations

This architecture provides a very high theoretical performance of the GPU; however, in order to achieve it the program has to be organized in such a way that all the SPUs of a single SP have work to do. Otherwise up to 80% of a single SP can be lost if only one SPU of the SP is used. This unfortunately means that single-operation instructions, such as loop counters or ifs, run very slowly, using only 1/5 of the GPU's potential. Therefore heavy looping is not recommended, yet unfortunately necessary for ray tracing.

Fermi [57]

Fermi is the new architecture from NVidia. It is the direct competitor to the ATI/AMD cards described above. This architecture was not available during most of the work done on the project, since it was 6 months late according to the original schedule; fortunately, near the end of the work on the project it finally came out. This is important, since this architecture is supposed to be focused on general purpose computations and to offer a significant advantage in those over the Evergreen series.

Figure 21 - Fermi chip overview from NVidia materials

NVidia, as already mentioned, had a different approach to constructing their chip than AMD. In particular, they decided to divide the whole chip into 4 Graphics Processing Clusters (GPCs). Each of those clusters has been equipped with a separate rasterizer engine and a set of 4 Streaming Multiprocessors (SMs), each containing 32 stream cores. Additionally, each of the SMs has its own shared memory, texture units and a so called PolyMorph engine. This engine is responsible for geometry execution and tessellation. A nice side fact is that according to NVidia, their chips increased shader speed 150x between the NV30 (GeForce FX 5800) and GT200 (GeForce GTX 280) series, while geometry speed increased only 3 times. Fermi, thanks to the change in the architecture, is supposed to increase it 8 times over the previous GT200 series. Additionally, the new architecture is supposed to be fully compatible with the IEEE specs, allowing full 32-bit integer operations and 64-bit floating point operations at 1/2 of the speed of 32-bit ones. (Figure 22 - Stream Multiprocessor Fermi) This is a

significant increase over the previous series, which offered only 1/8 of the single precision performance in so called double operations. Yet another interesting feature of this chip is the ability to execute multiple shader programs at once. Until now all shader programs had to be executed one after another, and if one of them did not use the GPU fully, part of the potential computational power was lost. Overall this GPU promises to be a great chip for general purpose computations. From some early tests one can see that the difference between Evergreen and Fermi in that field is significant. Whether it is also better for this solution we will see in the test section.

6.6 DirectX 11 benefits for real-time ray-tracing summarized

The most important benefit of using DirectX 11 over manufacturer specific frameworks is the compatibility with several architectures. This allows programs written using it to run on all modern generations of cards. At the time of finishing this work all major GPU manufacturers are supposed to have at least one card family (a range of cards having the same architecture but potentially different speeds or a lower number of functional units) running DirectX 11. Additionally, DirectX is a very established standard, meaning that there is a very high chance of it actually staying on the market for a longer period of time, making a solution created for it run faster over time as new GPUs come out. This means that a solution created today rendering 4 frames per second will in two years run at least 8 and in 3 years 16 FPS, just because of the progress of hardware developed to run more efficiently using the DirectX API. This is due to the fact that, as mentioned, it is now not the card manufacturers who decide on the features of the cards but the API that does it for them. So there is not much room for competition in adding new features, which in most cases would end up unused just because they are not supported in the API.
Therefore what they can really compete on is the speed of their solutions, and that can already be seen on the market today and will be even more visible in the future. Additionally, the current version of DirectX finally supports all the features that are needed in order to create efficient ray tracing solutions. The most important of those is of course the Compute Shader, which is designed especially for general purpose computations, allowing the best usage of GPU computational power for parallel tasks. Additionally, the dynamically compiled shaders, with an efficient compiler for a given architecture, promise to deliver the best performance for a given program without the need to modify it for each GPU separately. Furthermore, more advanced visual effects can be achieved thanks to full support of both integers and double precision numbers, although one has to observe that in the case of the latter the speed is reduced due to the cards' architecture. Finally, one of the most important features for ray tracing acceleration is the ability of Direct3D to render the scene using the standard pipeline and save the triangle index for each pixel. This is done very fast by current GPUs and can accelerate primary ray geometry tests by a factor of even 100 compared to a plain ray tracing solution.

7 Current work

As already noted, the work on the problem can be split into two phases: the first phase is to create the acceleration data structure and the second is to render the image. There is a lot of interesting work done on both of these subjects. In this chapter I will focus on describing the work currently done in the field; for a more detailed discussion of what has been chosen for the thesis please see the next chapters. First it is very important to know what actually the best data structure for ray tracing is. As always with simple questions there is no simple answer: the best data structure will vary depending on the scene, as was shown in Havran (2000). However, the same paper also notes that in most cases the ideal data structure is the kd-tree. Following this course, one sees that most of the current work on speeding up ray tracing is based on this data structure. This includes work on implementing both the kd-tree building and the traversal. Both are of almost equal importance, since during a normal render of a scene with dynamic objects the structure has to be both regenerated and traversed. Some of the work being carried out aims to improve CPU implementations of the ray tracing algorithm; another part aims to put the algorithm on GPUs. One of the notable solutions for the CPU is Shevtsov's [55]. It is a state of the art CPU ray tracer which is used as a benchmark in many other papers. The focus of that work is to create a parallel algorithm for creating kd-trees. However, the goal there is to map it onto the CPU, so the ideal number of threads is not too high, since thread management could eat up all the advantage of the parallel work. As noticed in the works on GPU acceleration, it remains a benchmark only, and nobody is trying to port the solution to the GPU, since the algorithm was not thought for highly threaded solutions.
The GPU implementations vary from ones based only on DirectX, like the one proposed in Horn's [25] solution, to CUDA based ones. Horn's solution is built on the DirectX 9 architecture, which had a lot of drawbacks and wasn't really thought to be used for GP tasks, so they had to face a lot of problems connected to the API itself. Those were mostly the lack of support for integer and double precision numbers and the high CPU usage while making API calls. These, however, have been solved with the introduction of newer versions of DirectX. The other approach is based on CUDA, like the one proposed in a research paper by Microsoft Research Asia [65]. The paper furthermore goes into kd-tree construction for accelerating both ray tracing and photon mapping. They propose a solution which, as they claim, is more powerful than other ones, both CPU and GPU based. The solution is based on processing the kd-tree nodes in two modes: the first are large nodes and the second are small nodes. Large nodes are the nodes that contain more geometry than the prescribed limit, and small ones are the ones that already meet the requirement. They admit that the quality of the tree is not the highest possible, yet the total rendering process is still the best one can find. Their algorithm, however, unfortunately requires a lot of switching between phases that are done in parallel

and the ones that are done sequentially. Therefore I'm afraid that they may be losing a lot of time on GPU-CPU synchronization. Recently a study by NVidia [1] (presented at High Performance Graphics 2009) showed that most of the ray traversal algorithms for kd-trees on GPUs are inefficient; the same paper tries to answer the question of how to make them more efficient. Additionally there are a lot of ideas on how to increase the ray tracing speed. One that has very good results has been used in the already mentioned Interactive k-d Tree GPU Raytracing: the implementers used the standard rasterization technique to generate primary hits. This is a good solution for less complex scenes, since on standard rasterization there is no actual culling other than back or front face culling. Therefore large and complex scenes may work slower than with an all the way ray tracing solution. However, I decided to test this one as well to see how well it accelerates. In terms of ray casting, which according to the article is used for comparison of speeds, it is a great performance booster for scenes that can be handled by the GPU rasterizer.

8 Analysis of the problem

Before going into the details of the approach used in the system to generate ray traced images in real time, let's first focus on the problem at hand and the possible solutions. As already shown in previous chapters, ray tracing consists of several steps. Some of those are simple yet have to be repeated a very large number of times; others are complex yet are used in a more unified way. I will use the so called divide and conquer (DNC) [16] approach to analyze the problem and try to find the best solutions to the smaller problems. DNC is a very powerful approach to solving complex problems, since it allows splitting them recursively into smaller ones up to the point where one ends with a list of simple tasks. Those tasks are then the building blocks of the whole solution.
Therefore optimizing those small parts should be much easier than thinking about the optimization of the whole solution, and should lead to the optimization of the whole. As shown in the previous chapters, the complex task of making a ray tracer is first split into two tasks, which can be described as building the data structure and traversing the data structure. Afterwards those can be split as shown in the list below.

Building data structure
o Loading scene
  - read geometry
  - read lights
  - read materials
  - read camera
  - read animation
o Constructing data structure
  - Choose acceleration data structure

  - Organize geometry according to some set of rules defined for the data structure
  - Pass structure to the renderer

Traversing data structure
o Generate eye rays
o Find ray geometry intersections
  - Traverse data structure
  - Test each subspace for intersections
o Find intersection color
  - Find nearest intersection of ray and geometry
  - Set ambient color
  - Calculate diffuse color
  - Calculate specular color
o Find shadow rays
  - Traverse data structure toward each light point
  - Test for intersecting geometry; if any, shade
o Find/process additional rays (reflected/refracted)
  - Reflect and/or refract ray depending on the material properties
  - Perform find ray geometry intersections
  - Merge ray colors
o Generate final image

Figure 23 - ray tracer diagram

8.1 Geometry type to use

Starting from the top of the pseudo solution for ray tracing, one first has to answer the question of what type of geometry to use. This basic question will determine the vast majority of answers in the other parts of the algorithm. In practice, however, the choice here is determined by whether one wants to present complex scenes containing arbitrary objects or just geometrical primitives. If the answer to the question is, as in our case, arbitrary objects, then one has to decide on the subdividing geometrical elements, like triangles or quads. The choice between those two is largely a matter of taste, since with both one can achieve similar scene quality. In our case I decided on triangles, simply because it is the more popular choice and there is a lot of software generating scenes using that kind of subdividing geometry. In particular 3D MAX, which I am using to generate the scenes for the ray tracer, by default uses triangles as its base.

8.2 Scene storing and reading

As mentioned above, I have decided to use 3D MAX as the editor for creating scenes. The choice of this 3d scene editor was made for several reasons, the most important ones being that it is an established standard and that I know the software. Additionally, the scene format produced by the software contains all the information about the scene required for correct rendering of dynamic scenes. In order to speed up development in this part of the project, not directly connected to its goals, I have found an open library to load the scene from the file format. Another option to solve the scene loading process would be to use different standards or to create one's own standard for holding scenes.

However, since this is not a direct goal of real time ray tracing, there is no point in investing time to implement custom scene files and editors. It is much more convenient to convert the data created externally and read by an external library to the eventual needs of the solution. This saves a lot of time, which in fact is very limited for a solution to such a complex problem as real time ray tracing.

8.3 Choosing acceleration data structure

Choosing a good acceleration data structure is the key to achieving the best possible speed with the given geometry. Especially since I have decided to use more complex scenes containing a lot of triangles, this is the to be or not to be of real time. Apart from the amount of objects/geometry in the scene, one has to take into account the architecture used for the implementation. Since this project started with the simple assumption of using the advantages of DirectX 11, the data structure also has to be chosen with that in mind. As already presented in one of the previous chapters, there is a really large selection of structures from which one can pick. Unfortunately, with the new architecture which comes with DirectX 11 it is really hard to find the solution that could use it best. Therefore in this case I have decided to outsource the choice a bit: since it has been mentioned already in previous chapters that the kd-tree is supposed to produce the best acceleration, I have decided to use it in the thesis. As alternatives here one could choose any other data structure. Actually, to choose the best one, one would have to do a series of experiments on building and traversing each of the structures. It is true that there exist some papers which try to find the best data structure; however, according to [60] most of the research has been conducted on traversal speed and not on both traversal and regeneration. This in fact could probably be a topic for a different thesis.
In some of my previous attempts I tried to use a voxel structure; however, this was not really best for achieving real time ray tracing.

8.4 Way to generate the data structure

After deciding on the data structure used for accelerating the rendering process, one has to decide on how to implement it. As explained in the previous section, I have decided to use the kd-tree. However, there are many different ways to implement the construction algorithm. What they all have in common is that they have to go through all the geometry in the scene and produce the tree structure. The quality of the space partition can be measured: the higher the quality, the more uniformly the geometry is distributed over the leaves of the tree. This allows the structure to increase the rendering speed more uniformly as well. Unfortunately, however, the more quality the tree has, the more complicated and time consuming an algorithm is required. Additionally, the architecture plays an important role in the construction of the tree. There are many algorithms used to create the tree; however, most of those are single threaded, or at best a limited number of threads is used. Such algorithms unfortunately can't fully benefit from the highly parallel GPU, which is the most powerful component of the system, as explained in previous chapters. Even Horn's GPU approach [25], mentioned in the chapter

about current work, introduces a lot of steps that have to be done using single threads. Therefore the creation of the tree does not seem to be best fitted for a pure GPU implementation. Having figured that out, the best option was to compact the highly parallel stages of the algorithm and put them on the GPU, and for the rest to use the CPU, since it is much faster for low threaded algorithms. The other options, i.e. putting the construction entirely on either of these devices, were not as good from the theoretical standpoint. Additionally, they have been tested many times already, so I have decided to try something other than the approaches I have found in the literature study. During this step I also had to think about the next one. Creating the structure is one thing; however, a ready structure has to be traversed in order to produce the image. A valid observation at this point is that the cost of creating and traversing the structure has to be lower than the simple naïve approach (testing every ray against all geometry); otherwise one achieves no acceleration, and possibly even deceleration. Having that in mind, and given that the actual functionality of the GPU is not suitable for the recursive algorithms that would be the best option for tree traversal, I have put some hints into the structure on how it has to be traversed. This is described in detail in the next paragraph.

8.5 Traversing data structure

Since the intermediate goal of the thesis is to use the astonishing power of the GPU for acceleration, this step is an ideal candidate to be put on it. The traversal of the data structure has to be performed for each ray. Since the amount of primary rays even for an 800x600 resolution is about half a million, this ideally fits the GPU architecture. The traversal algorithm for the structure has to be, however, a bit more tricky than the one usually used to traverse a tree structure. Normally one would most likely choose either a recursive algorithm or a stack based solution.
However, neither of these maps well onto the GPU. Therefore I had to come up with a different idea. I have decided to enhance the tree structure with a path that can be followed using a simple while loop. The path following algorithm is almost straightforward. One starts with the root; each node has a path direction for a successful crossing and one for an unsuccessful crossing. A crossing is successful when the ray intersects the part of space described by the node, otherwise it is unsuccessful. The algorithm loops until there are no more paths to choose, meaning that the tree has been traversed. For more details refer to the next chapters.

8.6 Generating image

This is the last part of ray tracing a scene. In this part all the reflected, refracted and shadow rays are taken into account. Into this part I have also classified all the small choices that accelerate the process as a whole. The most important is the decision to use rasterization as an acceleration technique for primary hits. Since the GPU rasterizer is

very fast, one can use its power to determine primary hits, accelerating the process by skipping the traversal of the data structure, which is still heavy. Shading objects also requires a decision on how to do it. As mentioned in one of the previous chapters, there are a lot of techniques that could be used to map object colors onto screen pixels. They range from simple shading, like flat or Gouraud shading, towards Phong shading. Since this solution is supposed to be fast yet show some advantages of ray tracing, I decided not to use any of the primitive shading techniques. On the other hand, the solution is supposed to be fast, so I have decided to use the Blinn-Phong shading model, which is supposed to be faster than the standard Phong model while still achieving very nice visual effects. The last important thing to decide is where the final image should be displayed. There are at least two options here: store it to the hard drive or put it onto the screen. Since this is a real-time solution, I have decided to display the results on the screen.

9 My Idea

I would like to start this chapter with a short description of my approach to this project. I have thought long about what is actually to be achieved as an optimal solution to real time ray tracing that would show the potential for both industrial and academic application. I have also taken into account the very limited time for creating a solution, and the fact that researchers have been looking for optimal solutions for the past two decades or so. Therefore, since the time for writing a master thesis is very limited, I have decided to limit the features of the software to the absolute minimum required to show the potential of real time rendering. However, to make the solution as informative as possible from an academic point of view, I have still decided not to use any simplifications or special case scenarios.
These could in fact increase the number of frames per second in some special cases; however, in terms of a general real time ray tracing algorithm they would not be a good solution. After going through the current work and the documentation of DirectX 11 I had several ideas. At the point of writing this thesis DirectX 11 is brand new and there is still very little documentation available, so the idea is based on my experience with previous versions of the technology and an adaptation of current solutions to the new ground. My initial idea was to build the kd-tree on the GPU. For that I planned to use one of the existing algorithms, modified in a way that would benefit from the new hardware capabilities. Regrettably, I could not find an algorithm which would efficiently use the highly parallel architecture of the GPU. The algorithm proposed by Zhou [65] unfortunately requires many steps that cannot be done in parallel, so although it is quite fast, it would require the usage of multiple compute shaders. Thinking about how to compact the stages of the algorithm, I decided to try a slightly different approach. I was focusing on creating an algorithm that could be executed for most of

the time per triangle. Since I am using triangles as the base elements of the scene, such an algorithm would highly benefit from GPU parallelism. Additionally, my primary goal was to make the implementation as scalable as possible, and making it depend on triangles would achieve this goal. Then, taking into account the research by Kalojanov and Slusallek [28], I realized that the fastest acceleration data structure to construct on the GPU is the uniform grid. I thought of a solution which could be both fast to construct and fast to traverse. I could not find a satisfactory answer in the literature that I had been studying so far, therefore I decided to think of something myself and experiment a bit. Whether the approach that I have created is good or bad can be seen in the result section of this work. I figured that maybe the best option is space discretization by creating the uniform grid first. On top of such a space I could build a tree, which would require just a fraction of the time needed for building a tree from scratch. I thought of additionally cutting the triangles that are contained in multiple cells. However, I realized that this is unnecessary, since the kd-tree traversal algorithm will first check whether the ray hits the subspace described by the tree node and only then try to hit the triangles within it. Therefore it is actually enough to put the indexes of the triangles in the cell. This reduces the amount of computation and the memory used for storing the data. The tree is built in a simple yet effective way: the space is always split at the approximate median with respect to the amount of triangles on each side of the split plane. Additionally, the splitting planes are placed parallel to one of the axes in a rotating manner.
To give an example: first the space is divided using a plane parallel to the x-axis, then the subspaces are split using planes parallel to the y-axis, those are split using planes parallel to the z-axis, and so on, until the algorithm reaches a predefined goal of space splitting. One may ask whether this approach is not performing a lot of possibly unnecessary computational steps. The answer in most cases is probably yes. However, since the majority of the tasks are put on the GPU with its enormous computational power, it should not have much impact on the process as a whole. The synchronization of data between GPU and CPU required by more advanced approaches would probably eat up much more computational power than doing some extra triangle splits. The step of building the tree afterwards is very simple: it only counts the amount of triangles in each row of the 3-dimensional grid. To be exact, if one wants to do a split along the x-axis, one counts all the triangles above the potential split plane; if it is less than half, the splitting plane is moved one grid cell farther and the additionally included triangles are counted; if it is half or more, the amount of triangles in each subspace is remembered and the splitting is run for each subspace. In this way I hope to achieve an algorithm for generating the acceleration data structure that is both very fast and scalable. It is obvious that the tree will not be of the highest possible quality; however, the time used for the generation should be minimal and compensate for the lack of

quality. The process can furthermore be improved by adding an additional step, determining the optimal grid size, which is to be generated on the GPU. Moreover, I am employing rasterization [12] in order to achieve the best possible performance for primary hits. I have also used some algorithmic optimizations to try to get the most out of the GPU-CPU hybrid. All the optimizations will be described in a chapter at the end of the work.

10 Components of the system

The system consists of several components. The most important one is the ray tracer itself, which is a shader program on the graphics card. In addition to the ray tracer component, which renders the final image, there are other components responsible for preparing the scene. Those can be seen in the diagram below.

Figure 24 - System Component diagram

Going from the beginning of the rendering process, we have to start with the scene component. It is represented by a Scene class in the system. This class is responsible for loading the scene from a 3ds file, which is a common scene file format in 3d graphics software. It also does some initial processing of the scene and applies some degree of animation for primary ray acceleration. After the scene is loaded and converted from the file structure to the program structure, it gets loaded into the acceleration data structure building component. This one is responsible for the creation of the kd-tree structure and for converting it to the format readable by the GPU ray tracer. The scene building component is divided into an initial GPU part represented by three classes called AnimateShader, PreComputeShader and DivideSceneComputeShader. These are responsible for preparing the scene to be put into the kd-tree acceleration data structure. The first one animates the triangles based on the animation instructions saved in the scene file.

The second two do the initial space subdivision on the animated triangles. The tree building task is done by the CPU and is represented by the KDTreeBuilder class. After the tree is built, the tree leafs are filled with triangles by the compute shader program represented by FillShader, and then all the data is fed to the GPU structured buffers. Afterwards the ray tracer shader comes into action and renders the scene. Then, depending on the settings, either the scene is animated and the acceleration data structure is rebuilt, or the rendering is just repeated with the same acceleration data structure, possibly using other parameters that do not require rebuilding it.

11 Data structures

11.1 Structured buffers

HLSL 5.0, which as already mentioned is a part of the new DirectX 11.0, introduces new types of data structure. These were introduced to ease data access for general purpose data. This kind of structure is simply called a buffer. There are two types of buffers: the first of them is the structured buffer [35] and the second one is the un-structured or raw buffer. The main difference, as one can easily see from the names, is that a structured buffer stores elements of the same size, described by a structure of one or multiple data types, while the second one just stores a set of bits without any structure behind it. Both of these can be used inside a pixel or compute shader and occupy a texture register, although they are defined in a somewhat different manner. This is probably due to the established mechanisms of texture data fetching. The new structured buffer is an almost obvious choice for a ray tracer implementation. It has the following advantages over the previously used way of passing the data, which was through a texture. As already noted, previous versions of Direct3D were not really designed for general purpose computation, therefore there was no need to allow passing arbitrary data to the GPU.
The shaders had specific purposes, so one did not think about the need of passing arbitrary data to them. Therefore, in the realm of DirectX, for general data types there existed only constant registers. Those, however, could pass only small variables. So in order to pass huge data structures, like a scene description, one used textures. This was in a sense an abuse of textures and texture manipulation functions. It also worked only for some specific settings, since any form of lossy compression or filtering often used on textures would destroy the encoded data. Additionally, one had to target specific pixels in order to get the desired data. This required some additional calculation, since a texture is not addressed within a shader in a simple way, like pixels on x from 0 to width and on y from 0 to height; it is accessed using a 0 to 1 scheme. These things, inconvenient for passing general purpose data, are just perfect for normal textures: if you write a normal visual pixel shader you do not really have to worry about getting exact values from a pixel; on the contrary, one wants to get the best possible color value, which can be compressed and interpolated in order to get some other benefits.

Therefore textures were not the best option for passing general purpose data, and fortunately, with the introduction of the structured and raw buffers, one does not have to use them anymore to pass that kind of data to the shader. Now the best option for data which has a defined structure, like the one describing a scene, is the structured buffer. It allows array-like access, so one always gets exactly the data needed without doing any tricks which could possibly introduce serious errors. As a real example from the application, the solution uses 2 such buffers in order to keep both the information about the acceleration data structure and the scene itself: the first holds the flattened kd-tree and the second contains all the triangle data.

11.2 My data structures

This chapter describes the data structures used in my solution. As already mentioned, the main structure is the acceleration structure. This is the kd-tree structure; it can however be implemented in multiple ways, and the one I am using is described below.

11.2.1 KD tree

In the previous chapters I have already given a thorough description of the kd-tree. At this point it is only worth reminding that the structure, as the name suggests, is built as a tree. Each leaf is supposed to hold the geometry from a given subspace. The nodes above a given one describe larger compound parts of the space, ending with the root node, which describes the whole space. In the case of the solution described in this thesis, the tree has first been generated as a tree structure and later flattened, as described in the next section of the report. Below one can see the exact structure of each of the tree nodes.

struct TreeElement
{
    float3 maxbound;
    float3 minbound;
    int texturepointer;
    int trianglecount;
    int successjump;
    int failurejump;
};

To explain the structure above: minbound and maxbound are the coordinates in space defining the subspace described by the given node.
The texturepointer and trianglecount fields describe the triangles inside the subspace. The texturepointer is the index of the first triangle in the special structure holding indexes of triangles in the scene, and trianglecount is the amount of triangles in the subspace. It is also important to note that all entries after the pointer, up to the count of triangles, belong to the given subspace. SuccessJump and failurejump are described in the next section, on the mapping onto the GPU.
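A hypothetical CPU-side illustration of how these two fields address the index array follows; the struct mirror and the `leafTriangles` helper are assumptions of this sketch, not the thesis code.

```cpp
#include <vector>

// CPU-side mirror of the HLSL TreeElement above, used only to illustrate how
// texturepointer and trianglecount index into the flat triangle-index array.
struct TreeElement {
    float maxbound[3];
    float minbound[3];
    int   texturepointer;   // index of the leaf's first entry in the index array
    int   trianglecount;    // number of consecutive entries belonging to the leaf
    int   successjump;
    int   failurejump;
};

// Collect the scene-triangle indexes referenced by a leaf: they are stored
// consecutively, starting at texturepointer, trianglecount entries long.
std::vector<int> leafTriangles(const TreeElement& leaf,
                               const std::vector<int>& indexArray) {
    std::vector<int> out;
    for (int i = 0; i < leaf.trianglecount; ++i)
        out.push_back(indexArray[leaf.texturepointer + i]);
    return out;
}
```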

11.2.2 GPU mapping

The tree structure is very easy to implement and traverse on the CPU. This is done mostly through the usage of pointers (both data pointers for defining the tree structure and function pointers for defining recursive methods working on the tree). Unfortunately, the GPU does not implement the notion of pointers at present. Therefore all that seems pretty straightforward on the CPU is not that easy on the GPU. First of course one has to consider how to store the tree in a flat structure available on the GPU, regardless of choosing textures or buffers for passing the data. Fortunately, there is a mapping method for a binary tree onto an array. It is called an Ahnentafel list [47] and it is pretty simple. It works as follows: if one takes a node of a tree with flat id i, then the children of the node are under i*2+1 and i*2+2, for the left and the right child respectively. That solves the problem of storing a tree structure and passing it to the GPU.

Figure 25 - Ahnentafel list from a tree diagram

Now there is the problem of processing the tree. There are many algorithms for traversing a tree designed especially for GPUs. I however thought of an idea to build the algorithm to traverse the flattened tree as an array instead of as a tree. While creating the tree, the algorithm pre-computes the array element number of the next element to go to, both after a successful intersection and an unsuccessful one.
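The resulting while-loop traversal over the flattened tree can be sketched as follows. This is a CPU illustration under the assumptions of this sketch: -1 marks the end of a path, and `hit[i]` stands in for the real ray/box test of node i.

```cpp
#include <vector>

// Each flattened node carries two precomputed array indices: where to
// continue after a successful ray/box test (successjump) and after a failed
// one (failurejump).  A plain while loop then replaces recursion or an
// explicit stack.
struct FlatNode {
    int successjump;
    int failurejump;
};

// Returns the list of visited node ids, in order, so the walk can be
// inspected; the real shader would instead test leaf triangles on the way.
std::vector<int> traverse(const std::vector<FlatNode>& nodes,
                          const std::vector<bool>& hit) {
    std::vector<int> visited;
    int current = 0;                         // start at the root
    while (current != -1) {
        visited.push_back(current);
        const FlatNode& n = nodes[current];
        current = hit[current] ? n.successjump : n.failurejump;
    }
    return visited;
}
```

In the Ahnentafel layout the jump targets can be precomputed during construction from the i*2+1 and i*2+2 child indices.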

Figure 26 - Successful / unsuccessful jump IDs embedded into the tree

Using those ids one can do a while loop over the flattened tree and loop through it using the jumps. This approach is fairly simple and allows easy traversal of the data structure on the GPU. One only has to do a ray-box intersection test to decide which of the traversal paths to choose. This method does not require a stack or similar data structures to be emulated on the GPU, and it achieves fairly decent speed.

Figure 27 - kd-tree correlation with data arrays

Additionally, as shown in the figure above, each node of the tree holds information about the first index in an array of triangle indexes (the index-position array) and the count of triangles in the node. Within this array, the n indexes after the first one for a given cell, where n is the number of elements in the cell, all belong to the given cell. The indexes in the index array then point to the triangle properties array, so one can easily get the triangles within a given tree node. Since information about triangles is only important for leafs, only indexes in leafs are set to nonnegative values. Thanks to that one is also able to quickly recognize that a given node is a leaf. Additionally, to reduce the number of data fetches by the texturing units, the index data structure has been extended to also hold

information about the position of the triangle. This is used in most of the operations in order not to fetch the large triangle description until one is certain that the triangle is hit.

11.3 Data flow between components

As noted, there are three main components in the system. The data flow between them can be described as follows. First we start with the 3ds file containing a description of the scene: all the materials, cameras, objects, lights and so on. This gets converted into a list of triangle positions, normals, texturing coordinates and material properties for each object. Additionally, the cameras and lights are extracted. The flow of the data in the solution is demonstrated in the diagram below.

Figure 28 - data flow diagram

The list of triangle data is then passed to the compute shader, which modifies the triangles according to a given animation frame. Next, another preparing compute shader kicks in. This shader creates 2 new lists out of the triangle list. The first list is the list of all cells in the grid

with the amount of triangles contained in each cell and a pointer to those triangles in the second list. The second is the triangle list sorted so that all the triangles belonging to the same grid cell are stored consecutively. These lists are then passed back to the CPU. This is done because at this moment it is faster to build the tree on the CPU, as described in the next chapter. On the CPU, for ease of access and to speed up the computations, both lists are converted into a 3-dimensional array of data, where each element corresponds to a grid cell in the semi-finished acceleration data structure on the GPU. Then the array is converted to the tree as described in the next chapter. All the data is then once again fed to the GPU and the tree is converted to its flattened representation. This is based on two arrays: the first is the flattened array representing the construction of the tree itself; the second contains all the triangles in the tree as indexes into the list of triangles in the scene. Additionally, as mentioned, this array is enhanced with some basic information about each triangle. The flattened tree, along with the data read from the scene file (the list of materials, triangles, lights and the camera), is sent to the GPU. On the GPU the final stage of the rendering takes place: the image is generated based on the acceleration data structure and the list of scene elements.

12 Algorithm in detail

The solution, as mentioned, has been based on the DirectX SDK August 09. Currently the only language supported natively by the SDK is C++. Fortunately, this is also one of the most advanced languages, producing very fast code. Therefore one can say that the technology for writing the solution has been pre-chosen. In simple words, the solution is a WIN32 program compiled in Visual Studio 2008 with the help of the DirectX SDK.

12.1 Reading scene

As the scene format I decided to use the 3ds file.
There are many 3d scene formats; however, two things speak in favor of the 3ds format: it contains the whole scene with lights, materials, cameras etc., and it can be exported from a vast majority of 3d creation software. Thanks to that one can edit a complex scene and quickly see the result in the software. In order to read the 3ds binary file I use a library called lib3ds [29]. The library is written in C++ and contains all the tools necessary to read the 3ds scene file. I use the library to read the data which is interesting for the program and then convert it to internal data structures more convenient for handling inside the solution. The data contains all scene objects, materials, lights and the first camera. The 3ds file has a lot of advanced features not needed in the solution, so I have introduced some simplifications. These are:

- All the lights are treated as infinite omni (all direction) lights
- There is only one camera (camera0 in case of the 3ds file)
- I use only simplified properties of the materials in the scene

All the scene information useful for rendering is then stored in a class called Scene. This class simply contains a list of all the faces with their attributes, a list of materials, a list of lights and the camera. The simplified information is first passed to the component responsible for building the acceleration data structure. Afterwards the full information is passed to the ray tracer on the GPU. One extra thing to note is that while reading the data, the bounding box of the scene is calculated. This is done simply by finding the smallest and the largest coordinates of the vertexes in the scene. It is later used in the algorithms to help determine vital variables which are based on the size of the scene. It also somewhat accelerates the rendering of the scene, since all the rays not hitting the bounding box can be dismissed automatically, reducing the amount of calculations needed to display the scene.

12.2 Reading and applying animation

Animations are read from the scene file containing frames. In the case of the 3ds file used in the system, these frames are stored in a list of so-called tracks for each element in the scene. There are multiple types of tracks stored for objects; each of them describes a different aspect of animation, like position change, rotation, color and similar. In the State of the Art in Ray Tracing Animated Scenes [60] the animations are classified into different categories. The article specifies the worst case scenario as incoherent motion, meaning that the faces can move independently of one another, even losing the objects' structural integrity. This also includes adding and removing faces during the animation. In simple words, the acceleration data structure has to be regenerated each time from scratch.
Therefore, in light of this worst case scenario, I have decided to implement only the animation of object positions. Other animation types, like changing the color of an object or light positions, do not actually require rebuilding of the data structure and can be classified as part of static scenes, as described in the mentioned article. For each frame the system generates a set of movement vectors, one for each object in the scene. Then the original set of triangles is modified by adding the movement vector to each triangle. Stationary objects have their movement vectors set to 0. In order to accelerate the whole process even further, the transformation is done on the GPU using a compute shader. Using this technique one reduces the number of data exchanges between main memory and GPU memory. This is due to the property of shaders that the result can be fed directly to the GPU as a resource for the next stages. After modifying the scene, a new vertex buffer for rasterization has to be created and the acceleration data structure has to be rebuilt. Unfortunately, creating the vertex buffer requires creating a temporary data structure on the CPU. This is somewhat redoing the task that is done on the GPU; however, one is not able to use a CS resource as vertex data for rendering.

12.3 Building acceleration data structure

The building of the acceleration data structure has been divided into 5 steps. 4 of those are parallel and carried out by the GPU, and one is single threaded and carried out on the CPU. The split of the tasks is done this way due to the already mentioned fact that the GPU is fast only for highly parallel tasks; in other cases the CPU is of course much faster. Initially I thought of only pre-splitting triangles on the GPU using a uniform grid data structure. I decided on the uniform grid since it can be calculated by a relatively simple algorithm, and those in general are very fast on the GPU. From the point of view of the splitting quality the uniform grid is not very good: the less uniformly distributed the scene, the lower the split quality we get. In extreme cases we could even get no acceleration at all, if all the geometry happens to be in a single grid cell. However, other splitting algorithms would not benefit from the parallel nature of the GPU as much as the uniform grid one does. Unfortunately, since one has to define the size of the buffers before running this algorithm, it has to be carried out in two steps: first to determine the size of the buffers, and a second time to write the actual data to the buffer whose size we now know. After splitting the scene there is the CPU step of building the tree. Since it is very heavy on memory operations, I decided to keep it single threaded. Unfortunately, building the tree over a uniform grid according to my conditions results in multiple grid cells being part of a single leaf of the tree. This implies that there can be multiple instances of the same triangle in a leaf, which significantly reduces performance while rendering. Since the GPU I was using for the development is not the fastest at loop operations, the amount of those had to be reduced.
In order to do that, while adding triangles to the tree leafs one has to check for multiple instances of the same triangle in the leaf. This is very time consuming, and according to the optimization tools I have used it was the heaviest operation of all on the CPU, especially because of heavy memory access. I decided to move a significant part of this step onto the GPU as a compute shader and parallelize it per leaf. Doing so, I could use the results of the compute shader directly as input for the final rendering stage. Additionally, since this is a highly memory intensive operation, I hoped to gain some boost out of the fast GDDR5 memory. Unfortunately my card seems to be slow in this kind of operation and I have actually suffered some degree of deceleration. However, tests with the card parameters let me believe that in total this should be the better decision, since it shows an increase of speed when processing finer grids, having a higher amount of cells, while building the tree.

12.3.1 Pre compute shader - determining amount of triangles

Let us start with the description of the steps in the order they occur in the algorithm. The first thing done after reading the data from the scene is the computation of the amount of triangles generated by the second step. Unfortunately, as already mentioned, there is no way to resize the input or output buffers inside a compute shader, so one first has to do

something which I have called the pre-compute shader. This is a compute shader doing operations similar to the work done by the next step; however, it only determines the amount of triangles generated and required to be stored. As already stated, the general idea of the algorithm is based on processing triangles. Therefore each thread on the GPU is responsible for processing a single triangle from the scene. This is the most efficient way, since a typical scene consists of thousands of triangles, and as the above mentioned research [14] indicates, the GPU is most efficient with a huge amount of threads. An array of all the scene triangles in the form of a structured buffer is passed to the GPU. There is no need at this stage for any information other than the vertex positions, so only these are passed. Additionally, some constants are passed: the grid size with respect to the x, y and z dimensions, the scene bounding box and the size of a single grid cell. The size of the cell could be calculated on the GPU, since we have the information about the scene bounding box size and the number of cells in each dimension; however, since the grid is assumed to be uniform (all the cells are of the same size) and this would have to be calculated for each thread, it is calculated on the CPU instead and passed to the GPU. On the GPU each triangle is first roughly placed into the grid. During this, a bounding box for the triangle is calculated. Based on that, all the grid cells that overlap with the bounding box are found. This is done using a very simple assumption: all the grid cells greater or equal, in one of the dimensions, to the cell holding the minimal point of the bounding box, and less or equal to the one holding the maximal point of the bounding box, overlap with it. Then, using the fast 3D triangle-box overlap test by Tomas Akenine-Möller [2], the algorithm tests all the overlapping grid cells for real triangle-cell intersection.
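The cheap bounding-box pre-test can be sketched per axis as follows; this is an illustrative CPU version, and `cellRange` is a name assumed here, not taken from the thesis code.

```cpp
#include <algorithm>

// From one axis of a triangle's bounding box, compute the inclusive range of
// uniform-grid cells that can overlap it.  Every cell whose slab touches
// [boxMin, boxMax] is a candidate; only those candidates are then handed to
// the exact triangle/box overlap test.  sceneMin, cellSize and cellCount
// describe the grid along this axis.
void cellRange(float boxMin, float boxMax,
               float sceneMin, float cellSize, int cellCount,
               int& firstCell, int& lastCell) {
    firstCell = static_cast<int>((boxMin - sceneMin) / cellSize);
    lastCell  = static_cast<int>((boxMax - sceneMin) / cellSize);
    // clamp to the grid in case the box touches or exceeds the scene bounds
    firstCell = std::max(0, std::min(firstCell, cellCount - 1));
    lastCell  = std::max(0, std::min(lastCell,  cellCount - 1));
}
```

Applying this to all three axes yields the small block of candidate cells mentioned above.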
Thanks to the initial fast testing against the bounding box, only a small subset has to be tested with the much more costly triangle-box overlap test. As a result, if a triangle overlaps with a cell, the cell count for the given triangle is increased. After going through each cell we know the total count of cells that overlap with the triangle being processed by the current thread. The result is put into a read-write structured buffer under the same index the triangle had in the input buffer. Thanks to that one does not have to do any buffer synchronization between threads, since each thread has separate input and output. After the run of the compute shader finishes, the output data is read by the main program. All that remains is to loop through the result and sum all the triangle-grid intersections. This sum is used as the size of the output buffer for the next step.

12.3.2 The compute shader splitting the triangles to the grid

This step is responsible for cutting the scene and fitting it to the grid. Since in this step the amount of resulting triangles is known, we can easily create the output buffer. Additionally, each input triangle is extended with the index of its first element in the output buffer/array. This is for simple data handling later on and, of course, for eliminating any synchronization problems: all threads' inputs and outputs are defined and do not overlap.

This means that each thread knows exactly where to read from and where to write to, and all those memory locations are separate for each thread. The first part of this step is very similar to the previous one: all triangle-grid intersections are found again. This time, however, instead of just counting the intersections, the algorithm writes information about each of them into the output buffer. Since the output buffer has the size pre-calculated in the previous step, there is no problem storing the cell and the index of every cut triangle. The information for each sub-triangle, generated by cutting the original triangle to fit the grid, is placed consecutively after the index assigned to the given triangle. One can think of it as each triangle owning an exactly defined range of slots in memory for information about its sub-triangles. Strictly speaking we do not have to cut the triangles at all, since the ray tracing algorithm will first test ray-grid intersections and only later ray-triangle ones; this is described in more detail in the following sections. Additionally, after the execution of the compute shader, the number of triangles in each grid cell is calculated. This is very simple, since it only requires going through the output and, for each cell, adding up all its cell-triangle intersections into an array.

Building KD-Tree

As the result of the previous steps we have all the data needed to create a kd-tree based on my hybrid algorithm. The tree is built on top of the pre-calculated grid. This is much faster than building it from scratch using a more advanced space-division technique; however, as mentioned, the quality of the tree is not the highest. Before the tree itself is built, the information about the triangle count in each grid cell is transformed: the output of the previous steps is a 1D array, whereas the calculations are supposed to be performed on a 3D grid structure.
To reduce the number of index transformations in the code I have decided to convert the output array into a three-dimensional one. This allows the data in the grid cells to be addressed simply by specifying the x, y and z coordinates of a given cell. This matters because the algorithm, conceptually based on finding the best divisions of the grid into subspaces, accesses this data structure very often.
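The conversion can be sketched as follows; the linear layout assumed here, index = (z * ny + y) * nx + x, is an illustration only, since the thesis does not spell out its memory layout:

```python
def to_3d(flat, grid_dims):
    """Reshape the flat per-cell array into nested lists addressed as
    grid[x][y][z], assuming a linear layout of (z * ny + y) * nx + x
    (an assumption; adapt to the actual buffer layout)."""
    nx, ny, nz = grid_dims
    return [[[flat[(z * ny + y) * nx + x] for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]
```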

Figure 29 - 2D sample KD-tree building from grid

The basic idea is that the tree is built based on the number of triangles in the grid cells. The algorithm tries to find an optimal splitting point, cycling through the splitting axes. A simplified 2D case is depicted in the figure above. First the split axis is chosen; the first axis is the x axis. Every consecutive split is then performed on the next axis (x, y, x, y, ... in the 2D case and x, y, z, x, y, z, ... in 3D). Knowing the test axis, the algorithm counts all the triangles in the subspace defined by the grid edges (in consecutive division attempts the existing splitting planes have to be taken into account) and the potential split plane. If the subspace defined by the tested plane and the grid edges contains more than half of the triangles of the space being split, the optimal split plane has been found; it can be stored in the tree in the form of bounding coordinates, minimal and maximal ones. Otherwise the test plane is moved to the next possible coordinate in the grid and the counting starts again. In the figure above one can see that the first split is in the middle, and that the left child of the root node was calculated to contain 21 triangles and the right one 16. Then the axis is changed and both children of the left and right children of the root node are calculated. This continues until a splitting goal is reached: either the space is not splittable any further (it has only one grid cell, or splitting it on any of the axes produces the same resulting subspace), or the node contains no more than a given number of triangles. Such a node is marked as a leaf. The leaves of the tree then have to be filled with data; since this process has been moved to the GPU, the leaves are put onto a list that is processed in a second step called filling.
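The plane search along one axis can be sketched as below, assuming per-slab triangle counts for the current node are available from the grid; the function name, the input shape and the handling of the "more than half" threshold are illustrative, not the thesis code:

```python
def find_split_plane(slab_counts, total):
    """Scan candidate planes along the chosen axis. slab_counts[i] is the
    number of triangle entries in grid slab i of the node being split;
    total is their sum. Returns the first plane index whose left side
    holds more than half of the triangles, or None if no plane along
    this axis qualifies (so the next axis would be tried)."""
    left = 0
    for plane in range(1, len(slab_counts)):   # plane sits after slab plane-1
        left += slab_counts[plane - 1]
        if left * 2 > total:                   # left side exceeds half
            return plane
    return None
```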
The filling is done using a compute shader and is described in the next section of this chapter.

Additionally, while creating the tree, each tree node is assigned a unique ID based on its position. Based on those IDs and the structure of the tree, the traversal algorithm for the ray tracer is created. As already mentioned in the context of Direct3D, the GPU unfortunately does not implement the notion of pointers, so one cannot directly map CPU traversal algorithms onto the GPU. The details of the algorithm will be described later. At this point it is only worth mentioning that every tree node is assigned a so-called successful ID and failure ID. These are the IDs of the tree nodes that have to be checked next, depending on whether the ray intersection test with the current node succeeds or fails. They are generated in the following way. If the node is a left child, its successful ID is the ID of its own left child and its failure ID is the ID of the right child of the same parent (its sibling). For a right child the successful ID is the ID of its own left child and the failure ID is the failure ID of its parent. In the case of leaves the successful and failure IDs are the same and equal the failure ID of the parent. This was also shown in figure 24 in the chapter on the GPU mapping of the tree.

Figure 30 - Tree converted into 2 arrays

After the tree is created it has to be flattened once again, since it is much easier to manipulate array data on the GPU than a tree. The process is not very complex and a schematic is shown above. The colors indicate the data stored for a given node in both the tree array and the triangle index-position array. The data is simply put into a flat array based on the IDs assigned to the nodes. This process creates two arrays: one holds the tree nodes and the second the triangles. Each leaf node in the tree-node array has a pointer (an index) into the second array, and the amount of data stored for the given leaf is its triangle count.
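One consistent way to realize the successful/failure links described above is sketched below. The dict-based node layout and the -1 sentinel are assumptions (figure 24 is not reproduced here), and in this sketch a leaf simply resumes at whichever node follows it in traversal order:

```python
def assign_links(node, failure_id=-1):
    """Assign hit/miss links to every node of a binary tree.

    failure_id is where traversal resumes when the ray misses this node;
    for the root, the sentinel -1 means the ray has left the tree.
    On a hit an inner node descends into its left child; a left child
    falls through to its sibling on a miss, while a right child inherits
    its parent's failure ID. Sketch only, not the thesis data format."""
    if 'children' not in node:                  # leaf: resume at next node
        node['success'] = node['failure'] = failure_id
        return
    left, right = node['children']
    node['success'] = left['id']                # hit: descend into left child
    node['failure'] = failure_id                # miss: skip the whole subtree
    assign_links(left, right['id'])             # after left subtree: sibling
    assign_links(right, failure_id)             # after right subtree: parent's failure
```

With these links a stackless GPU loop only needs the flat node array and, per step, one intersection test plus a jump to either `success` or `failure`.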
The data has to be split into two arrays since there is no option of putting variable-length structures, like the ones containing arrays, on the GPU.

Filling of tree leaves

I have introduced this step very late in the process. I decided to parallelize the task of filling the tree leaves to solve a problem I encountered with very high CPU and memory usage. I hoped that the fast GDDR5 memory on the graphics card would be

more suitable for the heavy operations required by this process, and that it would also help with passing data to the ray tracing part, since the data would already be on the GPU.

Figure 31 - Filling leaves

In the 2D simplification of the real problem above one can see that triangles can occupy multiple grid cells. This implies that, while the scene is being cut into smaller pieces in the process of creating the uniform grid, multiple instances of a single triangle come to exist in that grid. If we take triangle 1 from the case above, we can see that it lies in 11 grid cells, so while creating the KD-tree the system treats this triangle as 11 triangles. However, after the final tree has been created and the leaves are being filled, there is no point in putting all the instances of the same triangle into a leaf that describes a part of space containing several of those instances. Therefore the most important part of the filling process is to remove the multiple instances of a single triangle that would otherwise be put into the same tree leaf. The filling process requires a number of different inputs from different parts of the algorithm. These are:

Triangle data - the information about the animated triangles in the scene
Tree leaf data - information on the tree leaves that are to be filled
Triangle indexes - the indexes of the triangles in each cell of the grid of the discretized space
Consecutive indexes of triangles - the first indexes of the triangles in a given cell of the space
Grid triangle count - the count of triangles in each grid cell of the discretized space

In order to have the required data available, the buffers from the previous steps are either kept on the graphics card or copied from read-write buffers in memory. The only data that has to be passed separately is the information describing the tree leaves to be filled. The process itself is very simple.
For each leaf one loops through all the grid cells of its subspace and adds the index of each triangle found in a cell if it has not been added yet. In order to figure out if the
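The deduplicating fill described above can be sketched as follows; the function name and the cell-to-triangle mapping are assumptions for illustration:

```python
def fill_leaf(leaf_cells, grid_cell_triangles):
    """Collect triangle indexes for one leaf from all grid cells in its
    subspace, skipping indexes already added: a triangle that was cut
    into many cells must appear only once in the leaf."""
    seen = set()     # triangle indexes already placed in this leaf
    result = []
    for cell in leaf_cells:
        for tri_index in grid_cell_triangles[cell]:
            if tri_index not in seen:
                seen.add(tri_index)
                result.append(tri_index)
    return result
```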



More information

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation Modern Graphics Engine Design Sim Dietrich NVIDIA Corporation Overview Modern Engine Features Modern Engine Challenges Scene Management Culling & Batching Geometry Management Collision

More information

Real-time multi-bounce many-object ray tracing with distance-normal impostors

Real-time multi-bounce many-object ray tracing with distance-normal impostors Real-time multi-bounce many-object ray tracing with distance-normal impostors Peter Dancsik Peter Minarik Department of Control Engineering and Information Technology Budapest University of Technology

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Advanced Computer Graphics. Materials and Lights. Matthias Teschner. Computer Science Department University of Freiburg

Advanced Computer Graphics. Materials and Lights. Matthias Teschner. Computer Science Department University of Freiburg Advanced Computer Graphics Materials and Lights Matthias Teschner Computer Science Department University of Freiburg Motivation materials are characterized by surface reflection properties empirical reflectance

More information

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing Won-Jong Lee, Shi-Hwa Lee, Jae-Ho Nah *, Jin-Woo Kim *, Youngsam Shin, Jaedon Lee, Seok-Yoon Jung SAIT, SAMSUNG Electronics, Yonsei Univ. *,

More information

Computer Graphics. Lecture 1:

Computer Graphics. Lecture 1: Computer Graphics Thilo Kielmann Lecture 1: 1 Introduction (basic administrative information) Course Overview + Examples (a.o. Pixar, Blender, ) Graphics Systems Hands-on Session General Introduction

More information

Overview. 2D Texture Map Review. 2D Texture Map Hardware. Texture-Based Direct Volume Rendering

Overview. 2D Texture Map Review. 2D Texture Map Hardware. Texture-Based Direct Volume Rendering Overview Texture-Based Direct Volume Rendering Department of Computer Science University of New Hampshire Durham, NH 03824 Based on: Van Gelder and Kim, Direct volume rendering with shading via 3D textures,

More information

Illumination Models for Graphics CS 211A

Illumination Models for Graphics CS 211A Illumination Models for Graphics CS 211A Can be very complex The incoming light can come from a source, or bouncing off another object, or after multiple bounces Sources can be extended Multiple interactions

More information


2 1 2 Prior presenters have well explained the MLAA algorithm and some implementation approaches, as well as some of the motivations for its use (alternative to MSAA, lower memory, application to deferred

More information

Improved Billboard Clouds for Extreme Model Simplification

Improved Billboard Clouds for Extreme Model Simplification Improved Billboard Clouds for Extreme Model Simplification I.-T. Huang, K. L. Novins and B. C. Wünsche Graphics Group, Department of Computer Science, University of Auckland, Private Bag 92019, Auckland,

More information

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline Windows Embedded Compact 7 Technical Article Writers: David Franklin,

More information

Kinematic chains/skeleton Relative

Kinematic chains/skeleton Relative Review of Tuesday Complex (mechanical) objects consist of components Kinematic chains/skeleton represented as tree Hierarchical modeling facilitates implementation of relative behavior of components Relative

More information

8.1 Lens Equation. 8.2 Image Resolution (8.1) z' z r

8.1 Lens Equation. 8.2 Image Resolution (8.1) z' z r Chapter 8 Optics This chapter covers the essentials of geometrical optics. Radiometry is covered in Chapter 9. Machine vision relies on the pinhole camera model which models the geometry of perspective

More information

Advanced Rendering for Engineering & Styling

Advanced Rendering for Engineering & Styling Advanced Rendering for Engineering & Styling Prof. B.Brüderlin Brüderlin,, M Heyer 3Dinteractive GmbH & TU-Ilmenau, Germany SGI VizDays 2005, Rüsselsheim Demands in Engineering & Styling Engineering: :

More information



More information

Analytical Technologies in Biotechnology Dr. Ashwani K. Sharma Department of Biotechnology Indian Institute of Technology, Roorkee

Analytical Technologies in Biotechnology Dr. Ashwani K. Sharma Department of Biotechnology Indian Institute of Technology, Roorkee Analytical Technologies in Biotechnology Dr. Ashwani K. Sharma Department of Biotechnology Indian Institute of Technology, Roorkee Module 1 Microscopy Lecture - 2 Basic concepts in microscopy 2 In this

More information

Course Overview. CSCI 480 Computer Graphics Lecture 1. Administrative Issues Modeling Animation Rendering OpenGL Programming [Angel Ch.

Course Overview. CSCI 480 Computer Graphics Lecture 1. Administrative Issues Modeling Animation Rendering OpenGL Programming [Angel Ch. CSCI 480 Computer Graphics Lecture 1 Course Overview January 14, 2013 Jernej Barbic University of Southern California Administrative Issues Modeling Animation

More information

AMD GPU Tools for games development

AMD GPU Tools for games development AMD GPU Tools for games development Holger Gruen European Developer Relations AMD Graphics Products Group Material for many slides was provided by J. Zarge and S. Sowerby from the

More information

Polygon Scan Conversion & Shading

Polygon Scan Conversion & Shading 3D Rendering Pipeline (for direct illumination) Polygon Scan Conversion & Shading Greg Humphreys CS445: Intro Graphics University of Virginia, Fall 2004 3D Primitives 3D Modeling Coordinates Modeling Transformation

More information

Real-time skin rendering on graphics hardware

Real-time skin rendering on graphics hardware Real-time skin rendering on graphics hardware Pedro V. Sander David Gosselin Jason L. Mitchell ATI Research Skin shading Most lighting comes from sub-surface scattering Traditional Lambertian lighting

More information



More information


REFLECTION & REFRACTION REFLECTION & REFRACTION OBJECTIVE: To study and verify the laws of reflection and refraction using a plane mirror and a glass block. To see the virtual images that can be formed by the reflection and refraction

More information

The Study of The Application of 3DsMax and Photoshop Software in The Simulation Effects of Home Textile Products

The Study of The Application of 3DsMax and Photoshop Software in The Simulation Effects of Home Textile Products I.J. Education and Management Engineering 2012, 5, 48-53 Published Online May 2012 in MECS ( DOI: 10.5815/ijeme.2012.05.08 Available online at

More information

Polygon Scan Conversion and Z-Buffering

Polygon Scan Conversion and Z-Buffering Polygon Scan Conversion and Z-Buffering Rasterization Rasterization takes shapes like triangles and determines which pixels to fill. 2 Filling Polygons First approach:. Polygon Scan-Conversion Rasterize

More information

A Fast Voxel Traversal Algorithm for Ray Tracing

A Fast Voxel Traversal Algorithm for Ray Tracing A Fast Voxel Traversal Algorithm for Ray Tracing John Amanatides Andrew Woo Dept. of Computer Science University of Toronto Toronto, Ontario, Canada M5S 1A4 ABSTRACT A fast and simple voxel traversal algorithm

More information

Fundamentals of Computer Graphics

Fundamentals of Computer Graphics Fundamentals of Computer Graphics INTRODUCTION! Sergio Benini! Department of Information Engineering Faculty of Engineering University of Brescia Via Branze, 38 25231 Brescia - ITALY 1 Overview Here you

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

Ray Representation. A ray can be represented explicitly (in parametric form) as an origin (point) and a direction (vector):

Ray Representation. A ray can be represented explicitly (in parametric form) as an origin (point) and a direction (vector): Ray Tracing Part II Ray Representation A ray can be represented explicitly (in parametric form) as an origin (point) and a direction (vector): Origin: r Direction: o The ray consists of all points: r(t)

More information

Writing Applications for the GPU Using the RapidMind Development Platform

Writing Applications for the GPU Using the RapidMind Development Platform Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...

More information