HistoPyramid stream compaction and expansion
Christopher Dyken (1, *) and Gernot Ziegler (2)
Advanced Computer Graphics / Vision Seminar, TU Graz, 23/10-2007
(1) University of Oslo
(2) Max-Planck-Institut für Informatik, Saarbrücken
GPUs are highly parallel and well suited to data-local operations. However, re-arranging or selectively deleting data is difficult on GPUs. Stream compaction and expansion are such operations:
Stream compaction:
  For each element in the input stream, let a predicate determine whether the element should be discarded.
  Produce a compact output stream of the remaining elements.
Generalization: stream compaction and expansion:
  For each element in the input stream, let a predicate determine the element's multiplicity, i.e. how many times the element should be present in the output stream (0 = discard).
  Produce a compact output stream of all elements with a multiplicity greater than 0.
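As a point of reference, the two definitions can be written as a short sequential Python sketch. It only pins down what the output stream should contain; it says nothing about how a GPU produces it, and the function name `compact_expand` is a hypothetical choice for this illustration.

```python
def compact_expand(stream, multiplicity):
    """Sequential reference: emit each element multiplicity(element) times."""
    out = []
    for element in stream:
        # A multiplicity of 0 discards the element, 1 keeps it, >1 repeats it.
        out.extend([element] * multiplicity(element))
    return out


# Compaction: a keep/discard predicate is the special case of multiplicity 0 or 1.
evens = compact_expand([3, 8, 5, 2, 7], lambda x: 1 if x % 2 == 0 else 0)

# Expansion: here each element x is repeated x % 3 times (0 discards it).
expanded = compact_expand([3, 8, 5, 2, 7], lambda x: x % 3)
```

The order of the surviving elements follows the input order, which is what makes the output "compact" rather than merely filtered.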
Why bother with stream compaction and expansion?
  Feature extraction: get a compact list of all locations satisfying a criterion.
  Point cloud generation: generate a set of points on an implicitly defined surface.
  Compaction of intermediate results: save GPU-CPU bandwidth.
  Emulate or offload the geometry shader (GS): our HP-based Marching Cubes implementation does not need a GS and currently even outperforms GS-based approaches.
  Sparse matrix extraction, ...
Data re-organization is an active field of research (see GPU Gems II and III, etc.).
The HistoPyramid algorithm
The input stream is laid out over an N × N grid (a 2D texture).
Each input element is subjected to a predicate function:
  count = 0: discard from the output stream.
  count = 1: keep in the output stream.
  count > 1: repeat in the output stream.
The output of the predicate function forms the HP base layer.
  The HP is a mipmap pyramid of partial sums.
  The HP is built in log2(N) passes.
  The top element of the HP is the number of elements in the output.
Then, iterate over the output elements:
  Extraction of an element takes log2(N) texture lookups.
  Each input element can have multiple copies in the output.
  No data transfer from GPU to CPU.
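Overview
Pipeline (figure): Input image -> [Predicate] -> Bucket count -> [HP-builder] -> HistoPyramid -> [Extractor] -> Point list
Example values shown in the figure:
  HistoPyramid top element: 12
  HistoPyramid level 1 (2 × 2):
    2 2
    4 4
  Bucket count (4 × 4):
    0 1 1 0
    0 1 1 0
    2 0 0 2
    0 2 2 0
  Point list: (2,1) (2,2) (5,1) (5,2) (0,4) (1,5) (2,6) (3,6) (7,4) (6,5) (4,6) (5,6)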
Predicate function
For each input element, determine its output stream count.
  The count is often binary (1/0).
For different predicates, several base layers can be built:
  One HP is built per base layer, in parallel.
  Predicates may overlap if needed.
  NV40: 4 × RGBA = 16 predicates; G80: 8 × RGBA = 32 predicates.
Example: extract a list of edge pixels (Laplace + threshold):
  Figure: Input data -> Laplace + threshold (the predicate) -> HP base level
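A minimal Python sketch of a Laplace + threshold predicate follows. The slide does not specify the filter, so the 4-neighbour Laplacian, the clamped borders, the function name and the threshold value are all assumptions made for illustration.

```python
def laplace_threshold_base(image, threshold):
    """Base layer: 1 where |4-neighbour Laplacian| > threshold, else 0.

    `image` is a list of rows of numbers. Border handling (clamping) and
    the choice of Laplacian stencil are assumptions, not taken from the slides.
    """
    height, width = len(image), len(image[0])

    def pixel(x, y):
        # Clamp coordinates so that border pixels reuse their nearest neighbour.
        return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    base = []
    for y in range(height):
        row = []
        for x in range(width):
            laplacian = (pixel(x - 1, y) + pixel(x + 1, y)
                         + pixel(x, y - 1) + pixel(x, y + 1)
                         - 4 * pixel(x, y))
            row.append(1 if abs(laplacian) > threshold else 0)
        base.append(row)
    return base


# A vertical step edge between columns 1 and 2 marks both columns as edge pixels.
step_image = [[0, 0, 10, 10] for _ in range(4)]
edge_base = laplace_threshold_base(step_image, threshold=5)
```

Each output cell depends only on a small neighbourhood of the input, which is the kind of data-local work that maps well to one GPU pass.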
HistoPyramid builder
Mipmap generation, but with a sum instead of an average; in effect, each cell counts the elements in its sub-pyramid:
  Base level, 4 × 4:
    1 1 0 1
    1 0 1 0
    0 2 0 1
    1 0 0 0
  Level 1, 2 × 2:
    3 2
    3 1
  Level 2, 1 × 1:
    9
Top element: total number of elements in the base layer = output size.
Example (figure): HP of the Lena edge pixels (red = nonzero count).
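The reduction can be sketched in Python as below, using the 4 × 4 base level from this slide. The sketch assumes a square base layer with a power-of-two side length and runs sequentially on the CPU; on the GPU each level would presumably be one render pass. The function name is hypothetical.

```python
def build_histopyramid(base):
    """Return [L0, L1, ..., top]; each cell sums the 2x2 block beneath it.

    Assumes `base` is a square list of rows whose side is a power of two.
    """
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels


slide_base = [
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 2, 0, 1],
    [1, 0, 0, 0],
]
slide_levels = build_histopyramid(slide_base)
output_size = slide_levels[-1][0][0]  # top element = number of output elements
```

For an N × N base this loop runs log2(N) times, matching the pass count stated for the algorithm.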
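Point list extractor
Input: the output index is used as the key index.
  Key index   Texcoords   Clone index
  0           [0,0]       0
  1           [1,0]       0
  2           [0,1]       0
  3           [3,0]       0
  4           [2,1]       0
  5           [1,2]       0
  6           [1,2]       1
  7           [0,3]       0
  8           [3,2]       0
Notice: the multiplicity comes from the base layer:
  = 1: copy once.
  > 1: copy multiple times.
Traversal (figure): at each level, the key index is matched against the intervals spanned by the counts of the four child cells.
  L2 (top): 9
  L1: 3 2 / 3 1, with intervals [0,3) [3,5) [5,8) [8,9)
  L0, upper-right block: nonzero cells [3,0] and [2,1], with intervals [0,1) [1,2)
  L0, lower-left block: nonzero cells [1,2] and [0,3], with intervals [0,2) [2,3)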
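The traversal can be sketched in Python as follows. The pyramid is hard-coded from the builder example, and the child visiting order (upper-left, upper-right, lower-left, lower-right) is an assumption chosen because it reproduces the key-to-texcoord table on this slide. The function name is hypothetical.

```python
# HistoPyramid of the builder example: base level first, top element last.
LEVELS = [
    [[1, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 2, 0, 1],
     [1, 0, 0, 0]],
    [[3, 2],
     [3, 1]],
    [[9]],
]


def extract(levels, key):
    """Map an output index to ((x, y), clone_index) by top-down traversal."""
    if not 0 <= key < levels[-1][0][0]:
        raise IndexError("key index outside the output stream")
    x = y = 0
    for level in reversed(levels[:-1]):      # from just below the top to the base
        x, y = 2 * x, 2 * y                  # upper-left child of the current cell
        for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
            count = level[y + dy][x + dx]
            if key < count:                  # key falls into this child's interval
                x, y = x + dx, y + dy
                break
            key -= count                     # skip this child's elements
    return (x, y), key                       # the remaining key is the clone index


point_list = [extract(LEVELS, k) for k in range(9)]
```

Every key index is resolved independently of the others, so on a GPU all output elements can be extracted in parallel.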
The Marching Cubes Algorithm
8 corners, each inside or outside: 2^8 = 256 classes.
Each MC class: a combination of edges that pierce the iso-surface.
Use a table with geometry for the MC classes, with all possible triangulations of the edge intersections (figure).
The input to the algorithm is an M^3 grid of scalar values.
  Examine groups of 2 × 2 × 2 voxels (MC cells).
  Check whether each MC cell's corners are inside or outside the iso-level.
  Determine the exact edge-surface intersections and emit the corresponding triangles.
Notice: effectively a stream compaction/expansion process!
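The classification step can be sketched in Python as below: the inside/outside state of the 8 corners is packed into one byte, giving one of the 256 classes. The corner-to-bit assignment and the convention that "inside" means below the iso-level are assumptions for illustration; the slides do not fix either.

```python
def mc_class(corner_values, iso_level):
    """Pack the inside/outside state of 8 cell corners into an index 0..255.

    Assumptions: corner i maps to bit i, and a corner counts as "inside"
    when its scalar value is below the iso-level.
    """
    if len(corner_values) != 8:
        raise ValueError("an MC cell has exactly 8 corners")
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso_level:
            index |= 1 << bit
    return index


all_inside = mc_class([0.0] * 8, iso_level=0.5)
all_outside = mc_class([1.0] * 8, iso_level=0.5)
one_corner = mc_class([0.0] + [1.0] * 7, iso_level=0.5)
```

Classes 0 and 255 produce no geometry, so cells in those classes are the ones a compaction step discards.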
HistoPyramid Marching Cubes
Flow chart (figure):
  Data: iso-level, scalar field texture, vertex count texture, HistoPyramid texture, triangulation table texture, enumeration VBO.
  Steps: start new frame; update scalar field; build HP base; HP reduce (for each level); vertex count readback; render geometry.
Input: a stream of (M - 1)^3 MC cells (2 × 2 × 2 voxels grouped).
Predicate: samples the MC cell's corners, determines the MC class from their inside/outside state, then writes the number of vertices required for the MC geometry to the base layer.
HistoPyramid: the top element gives the total number of vertices in the iso-surface (3 × the number of triangles).
Extraction: use the output index to traverse the HP and determine the corresponding input element (i.e. which MC cell); the remainder tells which edge intersection this vertex corresponds to; determine the edge intersection and emit its position.
Datasets used in the performance analysis
Figure: Bunny, CThead, MRbrain, Bonsai, Aneurism, Cayley.
Performance of HistoPyramid Marching Cubes
                                                   6600GT         7800GT         8800GTX          8800GTX
Model     Size          MC cells  Density  HP-VS          HP-VS          HP-VS            NV-SDK10
Bunny     255x255x255   16581375  3.2%     -              -              538.6 (32.5)     -
          127x127x127   2048383   5.6%     5.4 (2.6)      11.8 (5.8)     309.5 (151.1)    -
          63x63x63      250047    9.1%     4.0 (16.1)     8.5 (34.1)     163.4 (653.5)    28.3 (113.2)
          31x31x31      29791     13.6%    2.5 (82.8)     5.0 (167.9)    25.5 (857.0)     21.9 (734.0)
CThead    255x255x128   8323200   3.7%     -              16.3 (2.0)     437.6 (53.0)     -
          127x127x63    1016127   6.3%     5.4 (5.3)      11.6 (11.5)    288.1 (283.6)    -
          63x63x31      123039    9.6%     3.7 (29.9)     7.7 (62.2)     97.3 (791.0)     25.3 (205.9)
          31x31x15      14415     14.5%    2.3 (161.3)    4.5 (311.5)    12.9 (896.4)     17.1 (1187.0)
MRbrain   255x255x128   8323200   5.8%     -              10.5 (1.3)     309.0 (37.4)     -
          127x127x63    1016127   7.4%     4.6 (4.5)      9.9 (9.7)      257.7 (263.6)    -
          63x63x31      123039    10.0%    3.5 (28.6)     7.4 (60.0)     96.8 (786.5)     26.4 (214.9)
          31x31x15      14415     14.9%    2.2 (155.0)    4.3 (300.9)    12.7 (879.7)     18.2 (1257.4)
Bonsai    255x255x255   16581375  3.0%     -              -              560.8 (33.8)     -
          127x127x127   2048383   5.1%     5.9 (2.9)      13.0 (6.3)     329.8 (161.0)    -
          63x63x63      250047    6.7%     5.4 (21.5)     11.4 (45.5)    186.5 (745.9)    28.9 (115.6)
          31x31x31      29791     8.2%     4.1 (136.8)    8.0 (268.8)    25.1 (843.0)     24.0 (804.6)
Aneurism  255x255x255   16581375  1.6%     -              -              892.5 (53.8)     -
          127x127x127   2048383   2.1%     12.6 (6.1)     29.1 (14.2)    557.6 (272.2)    -
          63x63x63      250047    3.7%     9.1 (36.2)     19.2 (76.7)    190.5 (761.9)    32.9 (131.5)
          31x31x31      29791     6.8%     4.5 (149.7)    8.6 (289.1)    25.0 (839.3)     25.5 (856.6)
Cayley    255x255x255   16581375  0.9%     -              -              1112.3 (67.1)    -
          127x127x127   2048383   1.9%     13.5 (6.6)     31.2 (15.2)    581.3 (283.8)    -
          63x63x63      250047    3.9%     8.5 (33.9)     17.9 (71.6)    198.0 (791.9)    32.1 (128.5)
          31x31x31      29791     8.1%     3.7 (123.8)    7.3 (245.8)    25.8 (866.2)     24.7 (827.9)
Numbers are in million voxels processed per second; parentheses give MC runs per second (frame rate). A dash marks a configuration with no reported number.
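The two numbers in each cell appear to be related by frame rate = throughput / number of MC cells. The short Python check below applies that reading to two Bunny rows on the 8800GTX; the relation is inferred from the figures rather than stated on the slide, and the helper name is hypothetical.

```python
def runs_per_second(million_voxels_per_second, mc_cells):
    """Frame rate implied by a throughput figure, assuming one MC run
    processes every MC cell of the volume exactly once."""
    return million_voxels_per_second * 1e6 / mc_cells


# Bunny on the 8800GTX (HP-VS): throughput and MC cell counts from the table.
fps_255 = runs_per_second(538.6, 16581375)   # table lists 32.5
fps_63 = runs_per_second(163.4, 250047)      # table lists 653.5
```

Read this way, throughput rises with volume size while the frame rate falls, which is the trade-off the table shows for every dataset.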
References
C. Dyken, G. Ziegler, C. Theobalt, H.-P. Seidel. GPU Marching Cubes on Shader Model 3.0 and 4.0. MPI-I-2007-4-006, Max-Planck-Institut für Informatik, 2007.
C. Dyken, J. Seland, M. Reimers. Real-Time GPU Silhouette Refinement using adaptively blended Bézier patches. To appear in Computer Graphics Forum, 2007.
I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel. Eikonal Rendering: Efficient Light Transport in Refractive Objects. To appear in ACM Trans. on Graphics (Siggraph 07), 2007.
G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel. GPU Point List Generation through Histogram Pyramids. MPI-I-2006-4-002, Max-Planck-Institut für Informatik, 2006.