
SCATTERED DATA VISUALIZATION USING GPU

A Thesis
Presented to
The Graduate Faculty of The University of Akron

In Partial Fulfillment
of the Requirements for the Degree
Master of Science

Bo Cai

May, 2015

SCATTERED DATA VISUALIZATION USING GPU

Bo Cai

Thesis

Approved:                                Accepted:

Advisor                                  Dean of the College
Dr. Yingcai Xiao                         Dr. Chand Midha

Committee Member                         Interim Dean of the Graduate School
Dr. Tim O'Neil                           Dr. Rex D. Ramsier

Committee Member                         Date
Dr. Zhong-Hui Duan

Department Chair
Dr. Timothy Norfolk

ABSTRACT

Scattered data visualization is commonly used in engineering applications. We usually employ a two-step approach, data modeling and rendering, to visualize scattered data. Performance and accuracy are two important issues in scattered data modeling and rendering. This project developed a GPU-accelerated scattered data visualization system. Shepard's method was used to interpolate scattered data onto a 3D uniform grid, and the Marching Cubes method was used to render the intermediate grid. Techniques such as Localized Data Modeling, Static Local Block Data Modeling and Dynamic Local Block Data Modeling were tested to measure their performance and accuracy. Experiments were conducted with real-world data on a GPU-accelerated scattered data visualization system. The speed-up observed on a GPU (NVidia GeForce GT 525M) is 12 to 27 times over a CPU (Intel Core i5-2410M 2.30 GHz). Increasing the value of α in Shepard's method can improve accuracy without a performance penalty. Localization can reduce modeling error but incurs a performance penalty. Dynamic Block Localization can increase modeling accuracy significantly, but it has a large speed penalty due to frequent data shifts among GPU memory banks. Static Block Localization, on the other hand, has a smaller performance penalty but also shows a smaller accuracy improvement. The parallel efficiency of the system is low ( to ). Future work includes studying issues related to GPU memory bank conflicts to increase the efficiency, and investigating more GPU-accelerated data interpolation methods for their accuracy and performance.

ACKNOWLEDGEMENTS

I am very thankful to my parents for encouraging and supporting me in pursuing my master's degree, and for making this thesis possible. I would like to acknowledge the professor who inspired me throughout my master's program. Dr. Yingcai Xiao, thank you very much for inspiring me with your guidance and support throughout the program. Dr. Zhong-Hui Duan and Dr. Tim O'Neil, thank you very much for serving on my thesis committee and supporting me in accomplishing this thesis. I would also like to acknowledge the faculty of the Department of Computer Science, Dr. En Cheng, Dr. Chien-Chung Chan, Dr. Kathy Liszka and Dr. Michael L. Collard, for their help during my Master's degree study. Their help has directly or indirectly contributed to the accomplishment of this thesis.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
I. INTRODUCTION
   1.1 Motivation
   1.2 Survey of Previous Work
   1.3 Outline of the Thesis
II. BACKGROUND
   2.1 Scattered Data Modeling and Visualization
   2.2 Data Interpolation Method
   2.3 Data Visualization Method
III. DESIGN
   3.1 Localized Data Modeling
   3.2 Design of GPU-based Modeling Algorithm
   3.3 Design of GPU-based Visualization Algorithm
IV. IMPLEMENTATION
   4.1 Implementation of Localization Shepard's Method on GPU
   4.2 Implementation of Marching Cubes Algorithm on GPU
V. RESULTS AND ANALYSES
   5.1 Global Method Comparisons between CPU and GPU
   5.2 Accuracy Comparisons between Localized Global Method and Non-localized Global Method
   5.3 Performance Comparison between Static Local Block Data Modeling Method and Dynamic Local Block Data Modeling Method
   5.4 Over All Speed Up and Error Report Analyses
VI. CONCLUSION AND FUTURE WORK
REFERENCES

LIST OF TABLES

Table
5.1 Comparing CPU and GPU for Global Shepard's Method for Various Grid Sizes Runtime
5.2 GPU Global Method Detailed Runtime
5.3 Data Communication Time, Size and Speed for Various Grid Sizes
5.4 Data Communication Size and Speed for Grid Size 64*64*64 to 128*128*128
5.5 Data Communication Size and Speed for Grid Size 80*80*80 to 82*82*82
5.6 Comparing Static Local Block Data Modeling and Dynamic Local Block Data Modeling Method Running Time for Various Grid Sizes

LIST OF FIGURES

Figure
5.1 Graphical Representation of Non-localized Global Method Data Modeling Result
5.2 Graphical Representation of Non-localized Global Method Data Modeling Numerical Error
5.3 Graphical Representation of Non-localized Global Method Data Modeling Relative Error
5.4 Graphical Representation of Localized Global Method Data Modeling Result
5.5 Graphical Representation of Localized Global Method Data Modeling Numerical Error
5.6 Graphical Representation of Localized Global Method Data Modeling Relative Error
5.7 Graphical Representation of Improved Shepard's Method by α = 2 Result
5.8 Graphical Representation of Improved Shepard's Method by α = 2 Numerical Error
5.9 Graphical Representation of Improved Shepard's Method by α = 2 Relative Error
5.10 Graphical Representation of Improved Shepard's Method by α = 10 Result
5.11 Graphical Representation of Improved Shepard's Method by α = 10 Numerical Error
5.12 Graphical Representation of Improved Shepard's Method by α = 10 Relative Error
5.13 Improved Shepard's Method by α = 11 Relative Error
5.14 RMS of Different Data Modeling Methods
5.15 Graph to Compare the Speed Between Static Local Block Data Modeling and Dynamic Local Block Data Modeling Method

CHAPTER I
INTRODUCTION

1.1 Motivation

High performance scattered data visualization is in great demand in many engineering applications. Examples of such applications can be found in environmental studies, oil exploration and mining. Volume visualization of scattered data is difficult due to the limited sampling rate and the scattered nature of the data [1]. Site investigations to acquire scattered data are difficult and costly, so we typically collect sampling points only from suspected areas of concentration. Hence it is difficult to form a 3D grid from scattered sampling points, and traditional grid-based visualization techniques cannot be employed to visualize such data. As a result we usually apply a two-step approach. The first step is to perform modeling on the scattered sample data to form a 3D uniform grid. Each grid node has an interpolated data value. Conventional grid-based visualization techniques are then applied to this intermediate grid in the second step, i.e., the rendering step [2].

Traditional CPU-based computing methods are dominant in the modeling field. Even though the CPU has developed very quickly in the past several decades, it still cannot catch up with modern modeling computational demand [3]. Similarly, interactive visualization has a high computation demand. In our project, we aim to speed up both steps, modeling and rendering, by parallelizing both processes using CUDA parallel processing.

1.2 Survey of Previous Work

CPU-based interpolation methods have been used in the modeling process of scattered data visualization for years. The ideas of the scattered data visualization correctness dilemma and the local constraint are presented in [4]. The advent of GPU CUDA parallel processing has led to research in many areas. Scattered data visualization is one of the most suitable research areas for parallel processing. The advantages of GPU CUDA parallel processing can benefit both the scattered data modeling process and the visualization process. For the GPU-based scattered data modeling part, a GPU-based scattered data modeling system was developed [5], where the author implemented four GPU-based scattered data modeling methods: Shepard's method, the Multiquadric method, the Thin-plate-spline method, and the Volume Spline method. In [13] this GPU-based scattered data modeling system was migrated to various platforms such as a GTX480 GPU, a Tesla C2070 GPGPU, and an Amazon Web Service cloud-based GPGPU instance. For GPU-based scattered data visualization, there is currently no existing research on visualizing scattered data with CUDA GPUs. However, the NVidia CUDA SDK provides GPU-accelerated data expansions for the Marching Cubes algorithm [9]. These tools enabled the development of GPU-based scattered data visualization.

1.3 Outline of the Thesis

This report consists of detailed work explained through various chapters. Chapter I includes information regarding the motivation of the project. Chapter II presents the background on the technology and some of the basic theories of scattered data modeling and rendering. Chapters III and IV discuss the design and implementation of this scattered data visualization system. The design and implementation of GPU-based Shepard's method modeling and GPU-based Marching Cubes algorithm rendering are explained in detail and depicted through diagrams. Chapter V discusses case studies, in which a step-by-step procedure is explained pictorially. Time consumed, memory used, and data communication speeds are presented in order to make comparisons among different cases. Chapter VI summarizes the work and how the system can be utilized. Possible modifications as well as some future work are also explained.

CHAPTER II
BACKGROUND

2.1 Scattered Data Modeling and Visualization

Scattered data is unevenly distributed or randomly spread over the volume of interest. The random distribution of the data makes it hard to visualize, since existing visualization algorithms are based on a 3D grid structure [3]. Scattered data is commonly found in engineering applications, and quick interactive visualization of scattered data is in great demand [2]. The most commonly used approach for scattered data visualization contains two steps [1]. The first step converts the scattered sample data into a 3D uniform grid. Each sample consists of three values for the position and one data value. To form the grid we need to interpolate the data values onto each grid node. After the interpolation we can use grid-based visualization techniques such as Marching Cubes to visualize the grid.

2.2 Data Interpolation Method

Following the two-step approach, the first step of the procedure is modeling the scattered data into a 3D uniform grid. To model the scattered data, we employ commonly used interpolation methods. Interpolation is a method of constructing new data points within the range of a discrete set of known original data points. In the areas of engineering and science, one often has a number of data points obtained through sampling or experimentation. The data represent a function over a limited range of the independent variables [14]. To analyze the data, scientists and engineers usually use mathematical interpolation methods to model the scattered data.

Generally speaking, there are two kinds of mathematical interpolation: global interpolation and local interpolation. In global interpolation, all the sample points are used to determine the value of each new point, while in local interpolation only the nearby points are used. Usually we use global interpolation methods when modeling scattered data to make full use of the original data. In global interpolation, given a set of n sample points {(xi, yi, zi), i = 1, 2, ..., n} with a sample value for each point {vi, i = 1, 2, ..., n}, we construct a function f(x, y, z) that is valid everywhere inside the domain of interest and satisfies the condition f(xi, yi, zi) = vi, i = 1, 2, ..., n. Once the function is found, it can be used to calculate the value at any location in the domain.

For this project we used Shepard's method. The mathematical expression of the method is:

    f(x, y, z) = (Σ wi vi) / (Σ wi),  wi = (h / di)^α,  i = 1, ..., n    (2.1)

where di is the distance between sample point i and the grid node, h is the diagonal length of a grid cell, and α is usually any real number greater than zero. The inverse-distance weighted method is a special case of Shepard's method where α = 1.
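As a concrete illustration, here is a minimal CPU-side sketch of global Shepard interpolation at a single grid node. The `Sample` struct and `shepard` function are illustrative names, not the thesis code; the grid-cell diagonal mentioned above cancels in the ratio of the two sums, so the weights reduce to 1/d^α.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Sample { double x, y, z, v; };

// Global Shepard's method at one grid node (gx, gy, gz):
// f = sum(w_i * v_i) / sum(w_i) with w_i = 1 / d_i^alpha.
// A sample coinciding with the node dominates all other weights, so its
// value is returned directly (the limiting case of the formula).
double shepard(const std::vector<Sample>& samples,
               double gx, double gy, double gz, double alpha) {
    double num = 0.0, den = 0.0;
    for (const Sample& s : samples) {
        double dx = s.x - gx, dy = s.y - gy, dz = s.z - gz;
        double d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d < 1e-12) return s.v;  // node coincides with a sample
        double w = 1.0 / std::pow(d, alpha);
        num += w * s.v;
        den += w;
    }
    return num / den;
}
```

With α = 1 this reduces to the inverse-distance weighted method; a localized variant would simply skip samples farther away than a chosen radius.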

2.3 Data Visualization Method

Marching Cubes is a computer graphics and visualization algorithm for extracting a polygonal mesh of an iso-surface from a 3D scalar field (sometimes called voxels) [9]. This is done by creating an index into a pre-calculated array of 256 possible polygon configurations (2^8 = 256) within the cube, by treating each of the 8 scalar values as a bit in an 8-bit integer. If a scalar's value is higher than the iso-value (i.e., it is inside the surface), the appropriate bit is set to one; if it is lower (outside), it is set to zero. The final value after all 8 scalars are checked is the actual index into the polygon indices array. Finally, triangles are generated for each voxel using the Marching Cubes lookup table, so that the vertices are connected correctly [15].
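The index construction described above can be sketched as follows. This is a host-side illustration with a hypothetical function name; the bit convention (inside = bit set) follows the paragraph above.

```cpp
#include <cassert>
#include <cstdint>

// Build the Marching Cubes case index for one voxel: each of the 8
// corner scalars contributes one bit of an 8-bit integer, set when the
// corner is inside the surface (value above the iso-value). The result
// indexes the 256-entry lookup tables.
std::uint8_t cubeIndex(const double corner[8], double isoValue) {
    std::uint8_t index = 0;
    for (int i = 0; i < 8; ++i)
        if (corner[i] > isoValue)
            index |= static_cast<std::uint8_t>(1u << i);
    return index;
}
```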

CHAPTER III
DESIGN

3.1 Localized Data Modeling

To model scattered data, we usually employ interpolation methods. These are methods for constructing new data points within the range of a discrete set of known original data points. The most commonly used mathematical interpolation methods support two kinds of data modeling: globalized data modeling and localized data modeling. Globalized data modeling methods use all sample points to interpolate a grid value. Localized data modeling uses only nearby sample points to interpolate a grid value. In our project, we focus on localized data modeling.

As we previously stated, localized data modeling uses only nearby sample points to interpolate a grid value. How to define "nearby" is an interesting question, so we have included two options: Range-Oriented Localized Data Modeling (ROLDM) and Block-Oriented Localized Data Modeling (BOLDM).

Range-Oriented Localized Data Modeling (ROLDM) is a distance-based localized data modeling method. Each time we interpolate a grid value, we draw a circle (in 2D) or sphere (in 3D) using this grid point as the center and a certain radius. The radius is defined by us. If the radius is large enough to contain all sample points, the result will be the same as the result of globalized data modeling. We only use the sample points inside this circle or sphere to calculate the grid value and ignore any sample points outside it.

Block-Oriented Localized Data Modeling (BOLDM) is designed for the GPU's architecture. We divide the entire data volume into small blocks or volumes which fit into shared memory. We will discuss this later in the design of the GPU-based modeling algorithm. Each grid point has its own block ID, and each grid point's value is interpolated using only the sample points with the same block ID.

3.2 Design of GPU-based Modeling Algorithm

Calculating the data values in parallel is the basic idea behind designing a GPU-based modeling algorithm. As discussed, we designed two types of localized data modeling methods: Range-Oriented Localized Data Modeling (ROLDM) and Block-Oriented Localized Data Modeling (BOLDM). The design of Range-Oriented Localized Data Modeling (ROLDM) can be explained as follows:

1) Define the grid size in each dimension.
2) Allocate one-dimensional arrays for sample points and grid points in the CPU.
3) Read sample points from a text file and write positions and data values into the arrays.
4) Scale the sample point position values by the following steps:

a) Find the maximum and minimum of the x, y and z values of the sample points.
b) Divide each x, y and z value of the sample points by the difference of the maximum and minimum, and multiply by the grid size in that dimension.
c) Write the scaled sample point positions into an array.
5) Allocate one-dimensional arrays for sample points and grid points in the GPU.
6) Calculate the block dimension and grid dimension from the grid size.
7) Copy the scaled sample point positions and values from the CPU to the GPU.
8) Call the kernel function, passing the block dimension, grid dimension, grid array pointer and sample data pointer.
9) Allocate shared memory.
10) Each kernel performs the following steps:
a) Load this kernel's corresponding sample point data from global memory to shared memory.
b) Synchronize all the threads; wait until all the sample data are loaded into shared memory.
c) Calculate this kernel's corresponding grid point index using the block index, block dimension, and thread index.
d) Calculate the distances between this kernel's corresponding grid point and all the sample data points.
e) Ignore the sample data points far away from this kernel's corresponding grid point and record the nearby sample points.

f) Interpolate this kernel's corresponding grid point value using the recorded nearby sample points.
g) Write the interpolated value into the array.
11) Copy back the interpolated grid values from the GPU to the CPU.
12) Free GPU memories.

The design of Block-Oriented Localized Data Modeling (BOLDM) can be explained as follows:

1) Define the grid size in each dimension.
2) Allocate one-dimensional arrays for sample points and grid points in the CPU.
3) Read sample points from a text file and write positions and data values into the arrays.
4) Scale the sample point position values by the following steps:
a) Find the maximum and minimum of the x, y and z values of the sample points.
b) Divide each x, y and z value of the sample points by the difference of the maximum and minimum, and multiply by the grid size in that dimension.
c) Write the scaled sample point positions into an array.
5) Allocate one-dimensional arrays for sample points and grid points in the GPU.
6) Calculate the block dimension and grid dimension from the grid size.
7) Divide the sample data points into blocks using the grid dimension.
8) Copy the scaled sample point positions and values from the CPU to the GPU.
9) Call the kernel function, passing the block dimension, grid dimension, grid array pointer and sample data pointer.
10) Allocate shared memory.

11) Each kernel performs the following steps:
a) Load this kernel's corresponding block of sample point data from global memory to shared memory.
b) Synchronize all the threads; wait until all the sample data are loaded into shared memory.
c) Calculate this kernel's corresponding grid point index using the block index, block dimension, and thread index.
d) Interpolate this kernel's corresponding grid point value using this kernel's corresponding block of sample data.
e) Write the interpolated value into the array.
12) Copy back the interpolated grid values from the GPU to the CPU.
13) Free GPU memories.

3.3 Design of GPU-based Visualization Algorithm

Marching Cubes is a surface reconstruction algorithm [8]. It extracts a geometric iso-surface from the volume of voxels. There are three situations for a vertex of a voxel:

1) If the value of the vertex is less than the iso-value, the vertex is outside of the iso-surface.
2) If the value of the vertex equals the iso-value, the vertex is on the iso-surface.
3) If the value of the vertex is larger than the iso-value, the vertex is inside the iso-surface.

A border voxel has some of its vertices inside the iso-surface and some outside. Ignoring the second situation (where the value of a vertex equals the iso-value), there are 256 possible configurations for each voxel: each voxel has 8 vertices and each vertex has 2 states, either inside or outside the iso-surface, which is why there are 2^8 = 256 configurations. We predefined a triangle mesh approximating the part of the iso-surface for each configuration [9]. We use the edgeTable[256] array to store the 256 possible configurations as a lookup table. For each of the possible vertex states listed in edgeTable[256] there is a specific triangulation; triTable[256] lists all of them in the form of edge triples, giving 256 ways to draw the triangles. In 3D space we enumerate 256 different situations for the Marching Cubes representation; all of these cases can be generalized into 15 unique topological cases [7].

The main idea of the GPU-based Marching Cubes algorithm is that each thread of the GPU computes one voxel of the entire volume. The design of the GPU-based Marching Cubes algorithm can be explained as follows:

1) Initialization.
2) Allocate one-dimensional arrays for the grid data values, the voxel cases of the entire volume, the edge lookup table and the triangle lookup table on the CPU.
3) Read the grid data values which we interpolated by ROLDM or BOLDM and write them into the CPU arrays.
4) Allocate one-dimensional arrays for the grid data values, the voxel cases of the entire volume, the edge lookup table and the triangle lookup table on the GPU.
5) Copy the grid data values array from the CPU to the GPU.

6) Calculate the block dimension and grid dimension from the volume size.
7) Call the kernel function, passing the block dimension, the grid dimension, the iso-value and the grid data values array pointer.
8) Each kernel performs the following steps:
a) Calculate this kernel's corresponding voxel index using the block index, the block dimension, and the thread index.
b) Read the vertex values of this kernel's corresponding voxel from the grid data values array.
c) Compare each vertex value with the iso-value to generate 8 scalar values which indicate the states of the 8 vertices of this voxel. We treat each of the 8 scalar values as a bit in an 8-bit integer, where inside is 1 and outside is 0.
d) Write the result of the comparison, which is an 8-bit integer, into the voxel cases array.
9) Copy back the voxel cases array from the GPU to the CPU.
10) Draw triangles using the voxel cases array, the edge lookup table and the triangle lookup table.

CHAPTER IV
IMPLEMENTATION

4.1 Implementation of Localization Shepard's Method on GPU

Shepard's method is represented as:

    f(x, y, z) = (Σ wi vi) / (Σ wi),  wi = (h / di)^α,  i = 1, ..., n    (2.1)

where di is the distance between sample point i and the grid node, h is the diagonal length of a grid cell, and α is usually any real number greater than zero. The inverse-distance weighted method is a special case of Shepard's method where α = 1.

Each grid point has its own position, represented by x, y and z. Each (x, y, z) triple is mapped to a kernel thread index in order to assign each grid point a kernel thread. Thus, the program can be parallelized so that each thread calculates the data value for one grid point. All the grid point positions and values are stored in a one-dimensional array, since the GPU and the CPU communicate through one-dimensional arrays. The index of the one-dimensional array indicates the position of a grid point:

    Index = z * xDimensionSize * yDimensionSize + y * xDimensionSize + x    (4.1)
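Equation (4.1) and its inverse can be sketched as plain functions. The names are hypothetical; the layout assumes x varies fastest, then y, then z, matching the formula above.

```cpp
#include <cassert>
#include <cstddef>

// Flatten a 3D grid coordinate to the 1D array index used for CPU-GPU
// communication (Equation 4.1): x varies fastest, then y, then z.
std::size_t flatIndex(std::size_t x, std::size_t y, std::size_t z,
                      std::size_t xDim, std::size_t yDim) {
    return z * xDim * yDim + y * xDim + x;
}

// Recover (x, y, z) from a flat index, as each thread must do once it
// knows which array element it owns.
void unflatten(std::size_t index, std::size_t xDim, std::size_t yDim,
               std::size_t& x, std::size_t& y, std::size_t& z) {
    x = index % xDim;
    y = (index / xDim) % yDim;
    z = index / (xDim * yDim);
}
```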

Each thread can calculate its corresponding grid point position using threadIdx.x, blockDim.x and blockIdx.x. The equation is defined as:

    Index = threadIdx.x + blockDim.x * blockIdx.x    (4.2)

Since every kernel thread has been assigned to a grid point, all the grid points are interpolated simultaneously using the Shepard's method equation. All sample points are loaded into shared memory when each kernel thread begins Range-Oriented Localized Data Modeling (ROLDM). The distances between the current kernel thread's grid point and all sample points are calculated to determine whether each sample point is nearby or not. The distance equation is defined as:

    Distance = sqrt((x - xi)^2 + (y - yi)^2 + (z - zi)^2)    (4.3)

Pseudo code:

Input: sample points array
Output: 3D uniform grid with a data value on each grid node

Load sample points into shared memory
Synchronize all the threads
Calculate the index using the kernel thread index equation (4.2)
Parse the x, y and z values from the index
Calculate the distance between the current grid point and all sample points
Record nearby sample points whose distance is less than a certain value

Interpolate the current grid point value using the Shepard's method equation; only recorded samples are used to interpolate
Write back the grid point value into an array

4.2 Implementation of Marching Cubes Algorithm on GPU

Three kernel functions are implemented for the GPU-based Marching Cubes algorithm: the classifyVoxel kernel, the compactVoxels kernel and the generateTriangles kernel.

Each kernel thread is assigned a voxel and executes classifyVoxel to determine whether this voxel will be displayed or not, i.e., whether there is an intersection on an edge of this voxel. We compare the iso-value with the value of each vertex to determine whether there is an intersection on an edge. If all of the vertex values of the voxel are less than the iso-value, or all are greater, this voxel will not be displayed. If some of the vertices are less than the iso-value and others are greater, there is an intersection on an edge and this voxel will be displayed. The classifyVoxel kernel function outputs the voxelOccupied array, which indicates whether each voxel is non-empty, i.e., will be displayed. The voxelVertices array records the vertex states in order to tell the generateTriangles kernel function how to display triangles.

We execute compactVoxels right after classifyVoxel to compact the voxelOccupied array and get rid of empty voxels. This allows us to run the complex generateTriangles kernel only on the occupied voxels.

The generateTriangles kernel function runs only on the occupied voxels for high performance. Both of the lookup tables, edgeTable and triTable, are loaded into the GPU texture memory. Each kernel reads its corresponding voxel case from the voxelVertices array. After the voxel cases are loaded, each kernel goes through both of the lookup tables to find how to generate the triangles for this voxel case. Thus the triangles are generated correctly.
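The effect of the compaction step can be illustrated with a sequential CPU analogue. On the GPU this is typically done with a parallel prefix scan; the function below is an illustrative sketch, not the SDK kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential analogue of compactVoxels: given the per-voxel occupancy
// flags produced by classifyVoxel, collect the indices of the occupied
// voxels so that later stages touch only those.
std::vector<std::size_t> compactVoxels(const std::vector<int>& voxelOccupied) {
    std::vector<std::size_t> occupied;
    for (std::size_t i = 0; i < voxelOccupied.size(); ++i)
        if (voxelOccupied[i])
            occupied.push_back(i);
    return occupied;
}
```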

CHAPTER V
RESULTS AND ANALYSES

5.1 Global Method Comparisons between CPU and GPU

The implementation of the presented algorithm has been tested on a Dell computer with an NVidia GeForce GT 525M. The following are its specifications:

CUDA Driver Version / Runtime Version: 5.5 / 5.5
CUDA Capability Major/Minor version number: 2.1
Number of Multiprocessors: 2
Number of CUDA cores per Multiprocessor: 48
Total Number of CUDA Cores: 96
Total amount of global memory: 1024 MBytes
Total amount of shared memory per block: bytes

Various grid sizes have been chosen in order to compare the time consumed and draw conclusions regarding how much the GPU-based program speeds up computation. The code was written using similar logic for both CPU-based sequential programming and GPU-based parallel programming. Table 5.1 shows the average running times in milliseconds (ms) of ten experiments for each grid size.

Table 5.1 Comparing CPU and GPU for Global Shepard's Method for Various Grid Sizes Runtime
[Columns: Grid Size (1*1*1 through 128*128*128), CPU Runtime, GPU Runtime, SpeedUp Factor, Efficiency; the numeric values were not preserved in this transcription.]

The speedup factor is the ratio of the CPU runtime to the GPU runtime. It captures the relative benefit of parallel processing. The speedup factor equation is defined as:

    SpeedUp Factor S(p) = Ts / Tp    (5.1)

where Ts is the sequential (CPU) runtime and Tp is the parallel (GPU) runtime.

Efficiency is the fraction of time for which a processing element is usefully employed in a computation [11]. The efficiency equation is defined as:

    Efficiency E(p) = S(p) / p    (5.2)

where p is the number of processing elements.

The GPU-based program has overhead factors such as process synchronization, memory allocation and data communication. The ratio of these overhead factors becomes smaller as the grid size increases. Thus, the GPU-based program becomes increasingly efficient as the grid size increases.
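Equations (5.1) and (5.2) translate directly into code. The runtimes used below are illustrative numbers, not measurements from the thesis.

```cpp
#include <cassert>

// Speedup (Equation 5.1): ratio of sequential (CPU) runtime Ts to
// parallel (GPU) runtime Tp.
double speedup(double cpuMs, double gpuMs) { return cpuMs / gpuMs; }

// Parallel efficiency (Equation 5.2): speedup divided by the number of
// processing elements p.
double efficiency(double cpuMs, double gpuMs, int cores) {
    return speedup(cpuMs, gpuMs) / cores;
}
```

For example, a hypothetical 2400 ms CPU run against a 100 ms GPU run on the 96-core GT 525M gives a speedup of 24 and an efficiency of 0.25, i.e., each core is usefully busy a quarter of the time.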

The running time of the CPU exceeds that of the GPU when the grid size is larger than 21*21*21. The GPU-based global method shows better results as the grid size increases. When the grid size is smaller than 21*21*21, the CPU-based global method has the advantage, because the GPU-based global method bears the overhead of memory copies and synchronization. The GPU-based global method can take great advantage of parallel processing when the grid size is larger than 21*21*21; the serial compute time is then significantly longer than the GPU communication time. The details of the overhead of GPU-based global method data communication have also been observed. Table 5.2 shows the average running times in milliseconds (ms) of ten experiments for each size.

Table 5.2 GPU Global Method Detailed Runtime
[Columns: Grid Size (1*1*1 through 128*128*128), GPU Runtime, GPU Kernel Compute Runtime, Data Copy Host to Device Time, Data Copy Device to Host Time, Malloc Memory Time, Data Communication Time; the numeric values were not preserved in this transcription.]

Note: Data Communication Time is the sum of Data Copy Host to Device Time, Data Copy Device to Host Time and Malloc Memory Time.

The NVIDIA Visual Profiler is a cross-platform performance profiling tool that provides developers with vital feedback for optimizing CUDA C/C++ applications [10].

We applied the NVIDIA Visual Profiler as a timing test tool. The Data Communication Time of the GPU-based global method is shown in detail in Table 5.2, broken into four parts: GPU Kernel Compute Runtime, Data Copy Host to Device Time, Data Copy Device to Host Time and Malloc Memory Time. The smallest time unit of the NVIDIA Visual Profiler is 0.002 ms, so any time less than or equal to 0.002 ms is shown as 0.002 ms in the table.

The GPU Kernel Compute Runtime is the time of all kernel computations from beginning to end. These measurements are meaningless when the grid size is smaller than 16*16*16, because such runs are too short to monitor with the NVIDIA Visual Profiler. The GPU Kernel Compute Runtime increases by approximately 8 times with each doubling of the grid dimensions as the grid size increases from 32*32*32 to 128*128*128. Data Copy Host to Device Time stays approximately the same, because the same sample data are used in each experiment. The Data Copy Device to Host Time is also meaningless when the grid size is less than 16*16*16, again due to the profiler's smallest time unit. Data Copy Device to Host Time increases with the output data size, which is the grid size. Malloc Memory Time also grows with the amount of memory needed. The speeds of global memory copies were also measured and are shown in Table 5.3.

Table 5.3 Data Communication Time, Size and Speed for Various Grid Sizes

Grid Size      Host-to-Device Size   Device-to-Host Size
1*1*1          2156 KB               4 bytes
2*2*2          2156 KB               32 bytes
4*4*4          2156 KB               256 bytes
8*8*8          2156 KB               2 KB
16*16*16       2156 KB               16 KB
32*32*32       2156 KB               128 KB
64*64*64       2156 KB               1 MB
128*128*128    2156 KB               8 MB

[The copy times and the speed columns (MB/s and GB/s) were not preserved in this transcription.]

The Data Copy Host to Device speeds are all approximately the same, since the hardware does not change. Among the grid sizes in Table 5.3, the largest Data Copy Device to Host speed is observed at 64*64*64.

Table 5.4 Data Communication Size and Speed for Grid Size 64*64*64 to 128*128*128

Grid Size      Device-to-Host Size   Device-to-Host Speed
64*64*64       1 MB                  6.09 GB/s
70*70*70       MB                    5.78 GB/s
80*80*80       MB                    5.65 GB/s
90*90*90       MB                    2.81 GB/s
100*100*100    MB                    2.95 GB/s
110*110*110    MB                    2.82 GB/s
120*120*120    MB                    2.91 GB/s
128*128*128    8 MB                  2.82 GB/s

Table 5.5 Data Communication Size and Speed for Grid Size 80*80*80 to 82*82*82

Grid Size    Device-to-Host Size   Device-to-Host Speed
80*80*80     MB                    5.65 GB/s
81*81*81     MB                    2.65 GB/s
82*82*82     MB                    2.68 GB/s

[Size cells shown without a number were not preserved in this transcription.]

Table 5.4 and Table 5.5 show us that the peak Data Copy Device to Host speed is observed at grid size 80*80*80.
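The speed columns in Tables 5.3 through 5.5 follow from size divided by copy time. A small helper shows the derivation, assuming binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes):

```cpp
#include <cassert>
#include <cmath>

// Transfer speed in GB/s from a payload size in MB and a measured copy
// time in milliseconds: speed = bytes / seconds, rescaled to GB.
double transferSpeedGBps(double sizeMB, double timeMs) {
    double bytes = sizeMB * 1024.0 * 1024.0;
    double seconds = timeMs / 1000.0;
    return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

For instance, an 8 MB device-to-host copy measured at about 2.77 ms would come out near the 2.82 GB/s reported for the 128*128*128 grid.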

5.2 Accuracy Comparisons between Localized Global Method and Non-localized Global Method

The two-step approach to scattered data visualization faces many issues, one of which is accuracy. We employed numerical error analysis to measure the accuracy of the scattered data modeling. In the data modeling step, every grid node value is constructed from the input sample points. These interpolated grid node values are then used to reproduce the data values at the original sample points (by linear interpolation). Analytically, the interpolated grid node values can exactly reproduce the original data values at the sample points; numerically, they cannot, due to numerical errors. Such numerical errors can be calculated by:

ε_i = f(x_i, y_i, z_i) - v_i,   i = 1, ..., n,   (5.4)

where v_i is the scattered data value at sample point (x_i, y_i, z_i) and f(x_i, y_i, z_i) is the value interpolated back from the grid at that point [12]. The relative errors are calculated from the numerical errors and the original data values. Root mean square (RMS) measures the differences between the values predicted by a model or an estimator and the values actually observed. We use the absolute value of the relative error to calculate the RMS:

Relative Error_i = |ε_i / v_i|   (5.5)

The RMS takes the form of a sample standard deviation and indicates the accuracy of the experiment:

RMS = sqrt( (1/n) Σ_{i=1}^{n} (Relative Error_i)^2 )   (5.6)

Figure 5.1 Graphical Representation of Non-Localized Global Method Data Modeling Result
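The error measures above can be sketched in a few lines (a hypothetical illustration; the function and variable names are ours, not the thesis code):

```python
import math

def rms_relative_error(true_values, reproduced_values):
    """RMS of absolute relative errors between the original sample values
    and the values reproduced from the interpolated grid (Eqs. 5.4-5.6)."""
    rel_errors = [abs((f - v) / v)                 # |eps_i / v_i|
                  for v, f in zip(true_values, reproduced_values)]
    return math.sqrt(sum(e * e for e in rel_errors) / len(rel_errors))

# A perfect reproduction gives zero RMS error:
assert rms_relative_error([2.0, 4.0], [2.0, 4.0]) == 0.0
# A uniform 10% error gives an RMS of 0.1:
assert abs(rms_relative_error([10.0, 20.0], [11.0, 22.0]) - 0.1) < 1e-12
```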

Figure 5.2 Graphical Representation of Non-localized Global Method Data Modeling Numerical Error

Figure 5.3 Graphical Representation of Non-localized Global Method Data Modeling Relative Error

We can see that the results of non-localized global method data modeling have low accuracy. The RMS is undesirably large, and we can barely see the trend of the data point values as they increase and decrease. Clearly, the result is not satisfactory. In order to improve accuracy, we employ localized global method data modeling, which uses only nearby sample points to interpolate each grid value. We use 8 by 8 by 8 as the range size. The results of localized global method data modeling are shown in Figures 5.4 through 5.6.

Figure 5.4 Graphical Representation of Localized Global Method Data Modeling Result

Figure 5.5 Graphical Representation of Localized Global Method Data Modeling Numerical Error

Figure 5.6 Graphical Representation of Localized Global Method Data Modeling Relative Error

We can see that the results of localized global method data modeling are much more accurate than those of the non-localized global method; the RMS is reduced substantially. Decreasing the contribution of faraway data points is another approach to improving accuracy. We can control the contribution of distant data points by changing α: a distant data point contributes more to the result when α is small, so we increase the α value to decrease the contribution of faraway data points. By default, α is 1, so we first increase α to 2. The results are shown in Figures 5.7 through 5.9.

Figure 5.7 Graphical Representation of Improved Shepard's Method by α = 2 Result
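In Shepard's method the weight of a sample point falls off as an inverse power of its distance, so raising the exponent α suppresses faraway points. A minimal sketch of the idea (the function name, the radius parameter and the tiny data set are ours; the thesis operates on a 3D grid in CUDA, while this one-dimensional version only illustrates the roles of α and localization):

```python
def shepard(x, samples, alpha=1.0, radius=None):
    """Inverse-distance-weighted value at x from (position, value) samples.
    alpha controls how fast distant points lose influence; radius, if set,
    localizes the interpolation to nearby samples only."""
    num = den = 0.0
    for p, v in samples:
        d = abs(x - p)
        if radius is not None and d > radius:
            continue                  # localized modeling: skip faraway samples
        if d == 0.0:
            return v                  # exactly at a sample point
        w = 1.0 / d ** alpha
        num += w * v
        den += w
    return num / den

data = [(0.0, 0.0), (1.0, 1.0), (10.0, 100.0)]   # one faraway outlier
# With alpha=1 the outlier at x=10 noticeably pulls the value at x=0.5 up;
# with alpha=10 its influence all but vanishes:
assert shepard(0.5, data, alpha=1) > 3.0
assert abs(shepard(0.5, data, alpha=10) - 0.5) < 1e-3
# Localization (radius=2) removes the outlier entirely:
assert shepard(0.5, data, alpha=1, radius=2.0) == 0.5
```

The same mechanism explains the accuracy peak reported below: a larger α keeps the interpolation local, but pushing it too far makes every weight except the nearest neighbor's negligible.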

Figure 5.8 Graphical Representation of Improved Shepard's Method by α = 2 Numerical Error

Figure 5.9 Graphical Representation of Improved Shepard's Method by α = 2 Relative Error

By changing α from 1 to 2, the accuracy improves slightly and the RMS is reduced further. We continue to increase the α value up to the accuracy peak, which is 10 in this case. The results are shown in Figures 5.10 through 5.12.

Figure 5.10 Graphical Representation of Improved Shepard's Method by α = 10 Result

Figure 5.11 Graphical Representation of Improved Shepard's Method by α = 10 Numerical Error

Figure 5.12 Graphical Representation of Improved Shepard's Method by α = 10 Relative Error

The accuracy has improved significantly and the result is now desirable; the RMS is reduced to its lowest value. When the α parameter is increased to 11, however, the accuracy drops sharply, as shown in Figure 5.13.

Figure 5.13 Graphical Representation of Improved Shepard's Method by α = 11 Result

The accuracy drops sharply when α increases to 11 because the contribution of distant points becomes too small, and the RMS increases again. The overall RMS changes are shown in Figure 5.14.

Figure 5.14 RMS of Different Data Modeling Methods

5.3 Performance Comparison between Static Local Block Data Modeling Method and Dynamic Local Block Data Modeling Method

Block-oriented localized data modeling (BOLDM) is designed for the GPU architecture. We divide the entire data volume into small blocks (or sub-volumes) that fit entirely into shared memory. The static local block data modeling method is a pre-defined method: we manually divide the entire data volume into small blocks and assign each small block to a GPU block, so each small block has its own shared memory. The dynamic local block data modeling method also divides the data volume into small blocks, but dynamically: each data point has its own block, namely the group of data points around it, and the blocks are organized per data point according to its position. The dynamic method also uses shared memory to improve performance, but, unlike the static method, it cannot place all of the required sample data points into its own shared memory. Because the shared memory layout is static while the block moves with the data point position, some of the required sample points fall outside the shared memory. Thus, the static local block data modeling method reads all of its required sample data from its own shared memory, whereas the dynamic local block data modeling method reads some of the required sample data from its own shared memory and the rest from global memory. As a result, the performance of the two methods is noticeably different, as Table 5.6 shows.
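The difference between the two methods can be sketched on the host side: with a static partition, a point in the middle of a block finds all its neighbors preloaded, while a dynamic per-point neighborhood can straddle block boundaries and forces reads from global memory. This is a simplified one-dimensional model; the block size, half-width and function names are ours, not the thesis implementation:

```python
BLOCK = 8  # samples per static block (our choice for illustration)

def static_block(i):
    """Static partition: indices preloaded into the block that holds sample i."""
    start = (i // BLOCK) * BLOCK
    return set(range(start, start + BLOCK))

def dynamic_block(i, half_width=4):
    """Dynamic partition: a neighborhood centered on sample i."""
    return set(range(i - half_width, i + half_width))

def global_memory_reads(i):
    """Neighbors of i that fall outside i's preloaded static block and
    would therefore have to be fetched from global memory."""
    return dynamic_block(i) - static_block(i)

# In the middle of a block, the whole neighborhood is in shared memory:
assert global_memory_reads(4) == set()
# Near a block edge, part of the neighborhood spills into global memory:
assert global_memory_reads(7) == {8, 9, 10}
```

The spilled reads are exactly what makes the dynamic method slower: each one pays global-memory latency instead of a shared-memory access.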

Table 5.6 Comparing Static Local Block Data Modeling and Dynamic Local Block Data Modeling Method Running Time for Various Grid Sizes

Grid Size      Static Local Block Method Runtime   Dynamic Local Block Method Runtime
20*20*20
30*30*30
40*40*40
50*50*50
60*60*60
70*70*70
80*80*80
90*90*90
100*100*100
110*110*110
120*120*120

Figure 5.15 Graph to Compare the Speed between Static Local Block Data Modeling and Dynamic Local Block Data Modeling Method

We can see that the dynamic local block data modeling method consumes more time than the static local block data modeling method, because the dynamic method reads from both shared memory and global memory. As the grid size increases, the running time of the dynamic local block data modeling method increases significantly.

5.4 Overall Speedup and Error Report Analyses

Different GPU-based data modeling methods present different error reports and speedup rates. How to balance speedup rate and accuracy is a new issue for GPU-based scattered data visualization. Table 5.7 shows the error reports and speedup factors of the different data modeling methods for grid size 128*128*128.

Table 5.7 Speed Up Rate and Error Report for Various Data Modeling Methods

                                   CPU-based   GPU-based   GPU-based    GPU-based     GPU-based      GPU-based
                                   Global      Global      Global       Global        Static Local   Dynamic Local
                                   Method      Method      Method α=2   Method α=10   Block Method   Block Method
Maximum Absolute Numerical Error
Maximum Absolute Relative Error
RMS Accuracy of Numerical Error
RMS Accuracy of Relative Error
SpeedUp Factor
Efficiency

Table 5.7 shows that we can employ the GPU-based dynamic local block method to increase accuracy significantly at the cost of running time. The key is finding a way to balance accuracy and runtime. Increasing the α coefficient of Shepard's method is a good option for improving result accuracy.
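The speedup and efficiency rows in Table 5.7 are related in the conventional way: speedup is the CPU runtime divided by the GPU runtime, and parallel efficiency divides that speedup by the number of parallel processing units (the GeForce GT 525M used here has 96 CUDA cores). The runtimes below are illustrative only, not the thesis measurements:

```python
def speedup(cpu_time, gpu_time):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_time / gpu_time

def parallel_efficiency(sp, num_cores):
    """Fraction of ideal linear scaling achieved across num_cores units."""
    return sp / num_cores

# Illustrative: a 24x speedup on a 96-core GPU is only 25% efficient,
# which is why a 12x-27x speedup still leaves the system's efficiency low.
s = speedup(48.0, 2.0)
assert s == 24.0
assert parallel_efficiency(s, 96) == 0.25
```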

CHAPTER VI

CONCLUSION AND FUTURE WORK

How to balance performance and accuracy is an important issue in GPU-based scattered data visualization. We have built a GPU-accelerated scattered data visualization system and used it to study various methods of speeding up performance while preserving accuracy. We have experimented with various techniques to improve GPU memory usage and to reduce CPU-GPU data communication. The experiments have shown the following results:

1) GPU-based scattered data modeling demonstrates a speedup of 12 to 27 times over its CPU-based counterpart (on an NVidia GeForce GT 525M GPU against an Intel Core i5-2410M 2.30 GHz CPU).

2) Increasing the value of the α parameter up to a certain value in Shepard's method can improve accuracy without causing a performance penalty (the accuracy of the chemical leakage sample data peaks when the α parameter is 10).

3) Localization can reduce modeling error but causes a performance penalty.

4) Dynamic block localization can increase accuracy significantly, but has a large performance penalty due to frequent data shifts among GPU memory banks.

5) Static block localization has a smaller performance penalty, but also shows a smaller accuracy improvement.

The parallel efficiency of the system is low. To achieve high memory bandwidth for concurrent accesses, issues related to GPU memory bank conflicts need to be addressed in future work. More GPU-accelerated data interpolation methods, such as the volume spline method, the thin-plate-spline method and the multiquadric method, should also be investigated in the future.

REFERENCES

[1] Yingcai Xiao, J. Ziebarth, Physically Based Data Modeling for Sparse Data Volume Visualization, Technical Report, Department of Mathematics and Computer Science, The University of Akron, January.

[2] Yingcai Xiao, J. Ziebarth, FEM-based Scattered Data Modeling and Visualization, Computers and Graphics, Vol. 24, No. 5, 2000.

[3] Yingcai Xiao, C. Woodbury, Constraining Global Interpolation Methods for Sparse Data Volume Visualization, International Journal of Computers and Applications, Vol. 21, No. 2, 1999.

[4] Yingcai Xiao, John P. Ziebarth, Chuck Woodbury, Eric Bayer, Bruce Rundell, Jeroen van der Zijp, The Challenges of Visualizing and Modeling Environmental Data, IEEE Visualization '96 Conference Proceedings, San Francisco, California, October 27 to November 1, 1996.

[5] Saranya S. Vinjarapu, GPU-based Scattered Data Modeling, Master's Thesis in Computer Science, University of Akron.

[6] J. Allard, C. Menier, B. Raffin, et al., Grimage: Markerless 3D Interactions, ACM SIGGRAPH '07, International Conference on Computer Graphics and Interactive Techniques, Emerging Technologies, Article No. 9.

[7] C. Leong, Y. Xing, N. D. Georganas, Tele-Immersive Systems, IEEE International Workshop on Haptic Audio Visual Environments and their Applications, Ottawa, Canada.

[8] W. E. Lorensen, H. E. Cline, Marching Cubes: A High Resolution 3D Surface Construction Algorithm, SIGGRAPH '87 Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, 1987, USA.

[9] Y. Heng, L. Gu, GPU-based Volume Rendering for Medical Image Visualization, Proceedings of the IEEE Engineering in Medicine and Biology 27th Annual Conference, 2005, Shanghai, China.

[10] NVIDIA CUDA C Programming Guide, Version 3.2, NVIDIA Corporation, 2010; H. R. Nagel, GPU Optimized Marching Cubes Algorithm for Handling Very Large, Temporal Datasets, CiteSeerX Scientific Literature Digital Library and Search Engine, 2010.

[11] David Kirk, Wen-mei Hwu, Programming Massively Parallel Processors: A Hands-on Approach.

[12] Yingcai Xiao, Jinqiang Tian, Hao Sun, Error Analysis in Sparse Data Volume Visualization, International Conference on Imaging Science, Systems, and Technology, Las Vegas, June 24-27, 2002.

[13] Lu Wang, Scattered-Data Computing on Various Platforms, Master's Thesis in Computer Science, University of Akron.

[14] Interpolation, Wikipedia, The Free Encyclopedia, last modified March 12, 2015; retrieved April 2, 2015.

[15] Marching cubes, Wikipedia, The Free Encyclopedia, last modified February 9, 2015; retrieved April 2, 2015.


More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

CUDA programming on NVIDIA GPUs

CUDA programming on NVIDIA GPUs p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

More information

How To Create A Surface From Points On A Computer With A Marching Cube

How To Create A Surface From Points On A Computer With A Marching Cube Surface Reconstruction from a Point Cloud with Normals Landon Boyd and Massih Khorvash Department of Computer Science University of British Columbia,2366 Main Mall Vancouver, BC, V6T1Z4, Canada {blandon,khorvash}@cs.ubc.ca

More information

GPU Accelerated Monte Carlo Simulations and Time Series Analysis

GPU Accelerated Monte Carlo Simulations and Time Series Analysis GPU Accelerated Monte Carlo Simulations and Time Series Analysis Institute of Physics, Johannes Gutenberg-University of Mainz Center for Polymer Studies, Department of Physics, Boston University Artemis

More information

Hands-on CUDA exercises

Hands-on CUDA exercises Hands-on CUDA exercises CUDA Exercises We have provided skeletons and solutions for 6 hands-on CUDA exercises In each exercise (except for #5), you have to implement the missing portions of the code Finished

More information

Introduction to Computer Graphics

Introduction to Computer Graphics Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 torsten@sfu.ca www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9 Glencoe correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 STANDARDS 6-8 Number and Operations (NO) Standard I. Understand numbers, ways of representing numbers, relationships among numbers,

More information

GPU-BASED TUNING OF QUANTUM-INSPIRED GENETIC ALGORITHM FOR A COMBINATORIAL OPTIMIZATION PROBLEM

GPU-BASED TUNING OF QUANTUM-INSPIRED GENETIC ALGORITHM FOR A COMBINATORIAL OPTIMIZATION PROBLEM GPU-BASED TUNING OF QUANTUM-INSPIRED GENETIC ALGORITHM FOR A COMBINATORIAL OPTIMIZATION PROBLEM Robert Nowotniak, Jacek Kucharski Computer Engineering Department The Faculty of Electrical, Electronic,

More information

GPU-Based Network Traffic Monitoring & Analysis Tools

GPU-Based Network Traffic Monitoring & Analysis Tools GPU-Based Network Traffic Monitoring & Analysis Tools Wenji Wu; Phil DeMar wenji@fnal.gov, demar@fnal.gov CHEP 2013 October 17, 2013 Coarse Detailed Background Main uses for network traffic monitoring

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

Clustering Billions of Data Points Using GPUs

Clustering Billions of Data Points Using GPUs Clustering Billions of Data Points Using GPUs Ren Wu ren.wu@hp.com Bin Zhang bin.zhang2@hp.com Meichun Hsu meichun.hsu@hp.com ABSTRACT In this paper, we report our research on using GPUs to accelerate

More information

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques

More information

Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI) Computer Graphics CS 54 Lecture 1 (Part 1) Curves Prof Emmanuel Agu Computer Science Dept. Worcester Polytechnic Institute (WPI) So Far Dealt with straight lines and flat surfaces Real world objects include

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

NVIDIA Tools For Profiling And Monitoring. David Goodwin

NVIDIA Tools For Profiling And Monitoring. David Goodwin NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale

More information

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA CUDA Optimization with NVIDIA Tools Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nvidia Tools 2 What Does the Application

More information

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...

More information

September 25, 2007. Maya Gokhale Georgia Institute of Technology

September 25, 2007. Maya Gokhale Georgia Institute of Technology NAND Flash Storage for High Performance Computing Craig Ulmer cdulmer@sandia.gov September 25, 2007 Craig Ulmer Maya Gokhale Greg Diamos Michael Rewak SNL/CA, LLNL Georgia Institute of Technology University

More information

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

More information

GPU Architecture. Michael Doggett ATI

GPU Architecture. Michael Doggett ATI GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super

More information

Constrained Tetrahedral Mesh Generation of Human Organs on Segmented Volume *

Constrained Tetrahedral Mesh Generation of Human Organs on Segmented Volume * Constrained Tetrahedral Mesh Generation of Human Organs on Segmented Volume * Xiaosong Yang 1, Pheng Ann Heng 2, Zesheng Tang 3 1 Department of Computer Science and Technology, Tsinghua University, Beijing

More information

Accelerating Wavelet-Based Video Coding on Graphics Hardware

Accelerating Wavelet-Based Video Coding on Graphics Hardware Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing

More information

Muse Server Sizing. 18 June 2012. Document Version 0.0.1.9 Muse 2.7.0.0

Muse Server Sizing. 18 June 2012. Document Version 0.0.1.9 Muse 2.7.0.0 Muse Server Sizing 18 June 2012 Document Version 0.0.1.9 Muse 2.7.0.0 Notice No part of this publication may be reproduced stored in a retrieval system, or transmitted, in any form or by any means, without

More information

Evaluating HDFS I/O Performance on Virtualized Systems

Evaluating HDFS I/O Performance on Virtualized Systems Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang xtang@cs.wisc.edu University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing

More information

Pre-Algebra 2008. Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems

Pre-Algebra 2008. Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems Academic Content Standards Grade Eight Ohio Pre-Algebra 2008 STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express large numbers and small

More information

Network Traffic Monitoring and Analysis with GPUs

Network Traffic Monitoring and Analysis with GPUs Network Traffic Monitoring and Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2013 March 18-21, 2013 SAN JOSE, CALIFORNIA Background Main uses for network

More information

Process Modelling from Insurance Event Log

Process Modelling from Insurance Event Log Process Modelling from Insurance Event Log P.V. Kumaraguru Research scholar, Dr.M.G.R Educational and Research Institute University Chennai- 600 095 India Dr. S.P. Rajagopalan Professor Emeritus, Dr. M.G.R

More information

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM 152 APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM A1.1 INTRODUCTION PPATPAN is implemented in a test bed with five Linux system arranged in a multihop topology. The system is implemented

More information

L20: GPU Architecture and Models

L20: GPU Architecture and Models L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.

More information

Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. David Ediger, Karl Jiang, Jason Riedy and David A.

Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. David Ediger, Karl Jiang, Jason Riedy and David A. Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger, Karl Jiang, Jason Riedy and David A. Bader Overview Motivation A Framework for Massive Streaming hello Data Analytics

More information

Higher Education Math Placement

Higher Education Math Placement Higher Education Math Placement Placement Assessment Problem Types 1. Whole Numbers, Fractions, and Decimals 1.1 Operations with Whole Numbers Addition with carry Subtraction with borrowing Multiplication

More information

Employing Complex GPU Data Structures for the Interactive Visualization of Adaptive Mesh Refinement Data

Employing Complex GPU Data Structures for the Interactive Visualization of Adaptive Mesh Refinement Data Volume Graphics (2006) T. Möller, R. Machiraju, T. Ertl, M. Chen (Editors) Employing Complex GPU Data Structures for the Interactive Visualization of Adaptive Mesh Refinement Data Joachim E. Vollrath Tobias

More information

A New Approach to Cutting Tetrahedral Meshes

A New Approach to Cutting Tetrahedral Meshes A New Approach to Cutting Tetrahedral Meshes Menion Croll August 9, 2007 1 Introduction Volumetric models provide a realistic representation of three dimensional objects above and beyond what traditional

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing

More information

Multi-GPU Load Balancing for In-situ Visualization

Multi-GPU Load Balancing for In-situ Visualization Multi-GPU Load Balancing for In-situ Visualization R. Hagan and Y. Cao Department of Computer Science, Virginia Tech, Blacksburg, VA, USA Abstract Real-time visualization is an important tool for immediately

More information

Biggar High School Mathematics Department. National 5 Learning Intentions & Success Criteria: Assessing My Progress

Biggar High School Mathematics Department. National 5 Learning Intentions & Success Criteria: Assessing My Progress Biggar High School Mathematics Department National 5 Learning Intentions & Success Criteria: Assessing My Progress Expressions & Formulae Topic Learning Intention Success Criteria I understand this Approximation

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information