Parallel Simplification of Large Meshes on PC Clusters
Hua Xiong, Xiaohong Jiang, Yaping Zhang, Jiaoying Shi
State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, Hangzhou, China
April 14, 2008
Background
- The scale of data sets is growing fast, driven by:
  - 3D scanning
  - Scientific simulation
  - CAD modeling
- Acceleration techniques for interactive rendering:
  - Visibility culling
  - Parallel rendering
  - Image-based rendering
  - Mesh compression and layout optimization
  - Mesh simplification and multiresolution techniques
Related work
- Mesh simplification and multiresolution modeling/rendering are two of the most efficient acceleration approaches.
- Massive mesh simplification methods:
  - Mesh-cutting-based approaches [Hoppe 98] [Prince 00] [Brodsky et al. 03] [Borodin et al. 03]
  - External-memory data structures [Lindstrom et al. 01] [Cignoni et al. 03] [Shaffer et al. 05]
  - Stream or batch processing [Lindstrom 00] [Wu et al. 03] [Isenburg 03]
Problems
- Simplifying massive meshes takes a long time:
  - [Wu et al. 03]: 866 MHz PIII, St. Matthew statue, 373M triangles, reduced to 0.5%, 4 hours
  - [Lindstrom et al. 01]: 250 MHz SGI Onyx2, isosurface, 468M triangles, reduced to 1.5%, 3 hours 12 minutes
  - [Isenburg et al. 03]: 250 MHz SGI Onyx2, isosurface, 468M triangles, reduced to 0.5%, 2 hours 25 minutes
- Faster simplification would:
  - Benefit downstream mesh processing applications
  - Make system debugging more convenient
Summary of our approach
- Mesh-cutting-based parallel simplification:
  - Cut the input massive mesh into sub-meshes
  - Simplify all sub-meshes in parallel
  - Stitch the simplified sub-meshes together
- Mesh-stream-based parallel simplification:
  - Generate mesh streams
  - Simplify all mesh streams in parallel
  - Compose the simplified streams
Cutting-based parallel simplification (1)
- Cutting requirements for load-balanced parallel simplification:
  - Sub-meshes of nearly equal size
  - As few boundary primitives as possible
- Our approach: mesh cutting based on graph partitioning
  - Fast cutting
  - High-quality cuts
Cutting-based parallel simplification (2)
- Cutting steps:
  - Subdivide the bounding box of the input mesh with a uniform grid
  - Construct a graph: each non-empty cell is a graph node, and graph edges link each node to its k nearest neighbors
  - Partition the graph with METIS
  - Cluster the vertices and triangles according to the partition
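The following is a minimal Python sketch of the cutting step. It makes several assumptions:
- The mesh is held in memory, with `vertices` as (x, y, z) tuples and `triangles` as vertex-index triples.
- Each graph node is weighted by the number of vertices in its cell.
- METIS is replaced by a crude greedy breadth-first region-growing partitioner, so the example runs with the standard library alone.

All function names (`grid_cells`, `knn_graph`, `grow_partition`, `cut_mesh`) and the default parameters are hypothetical. Triangles whose vertices fall into different parts are set aside as boundary triangles.

```python
import heapq
import math
from collections import deque

def grid_cells(vertices, res):
    """Assign each vertex to a cell of a res^3 uniform grid over the bounding box."""
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    size = [max(hi[i] - lo[i], 1e-12) for i in range(3)]  # avoid division by zero
    return [tuple(min(int((v[i] - lo[i]) / size[i] * res), res - 1) for i in range(3))
            for v in vertices]

def knn_graph(cells, k):
    """Link every non-empty cell to its k nearest non-empty cells (made symmetric)."""
    adj = {c: set() for c in cells}
    for c in cells:
        for o in heapq.nsmallest(k, (o for o in cells if o != c),
                                 key=lambda o: math.dist(c, o)):
            adj[c].add(o)
            adj[o].add(c)
    return adj

def grow_partition(adj, weight, parts):
    """Greedy BFS region growing into `parts` regions of similar total weight.

    A deliberately simple stand-in for METIS multilevel k-way partitioning.
    """
    target = sum(weight.values()) / parts
    label, part, acc = {}, 0, 0
    for seed in sorted(adj):
        queue = deque([seed])
        while queue:
            c = queue.popleft()
            if c in label:
                continue
            if acc >= target and part < parts - 1:
                part, acc = part + 1, 0  # current region is full: open a new one
                queue = deque()          # grow the new region from this cell only
            label[c] = part
            acc += weight[c]
            queue.extend(n for n in sorted(adj[c]) if n not in label)
    return label

def cut_mesh(vertices, triangles, res=8, k=6, parts=4):
    """Return per-vertex part labels, per-part triangle lists and boundary triangles."""
    cell_of = grid_cells(vertices, res)
    weight = {}
    for c in cell_of:
        weight[c] = weight.get(c, 0) + 1  # node weight = number of vertices in the cell
    label = grow_partition(knn_graph(list(weight), k), weight, parts)
    vpart = [label[c] for c in cell_of]
    submeshes = {p: [] for p in sorted(set(vpart))}
    boundary = []
    for t in triangles:
        owners = {vpart[v] for v in t}
        if len(owners) == 1:
            submeshes[owners.pop()].append(t)
        else:
            boundary.append(t)  # spans several sub-meshes: kept aside for stitching
    return vpart, submeshes, boundary
```

A real implementation would hand the weighted graph to METIS and read the mesh from disk. The greedy stand-in yields noticeably worse balance and longer boundaries than a multilevel partitioner.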
Cutting-based parallel simplification (3)
- [Figure: cutting examples]
Cutting-based parallel simplification (4)
- Load-balanced parallel simplification
- PC cluster resource management for a heterogeneous PC cluster:
  - Different memory capacities
  - Different CPU performance
- Simplification task management:
  - Task partitioning
  - Task distribution
  - Load balancing
Cutting-based parallel simplification (5)
- How can the performance of each PC be evaluated?
- Our approach: a benchmark-based simplification performance test, consisting of:
  - Constructing the simplification data structure
  - Performing half-edge collapses until no triangle remains
- Performance = triangle count / simplification time
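The performance parameter can be computed as the slide defines it: triangle count divided by simplification time.
- `run_benchmark` is a hypothetical callable standing in for the benchmark itself. It builds the simplification data structure for a fixed test mesh and performs half-edge collapses until no triangle remains.
- Normalizing the scores into work shares is an assumption about how the parameter might be consumed. The slides do not specify this.

```python
import time

def benchmark_score(run_benchmark, triangle_count):
    """Performance score of one PC: triangles processed per second of benchmark time.

    run_benchmark is a hypothetical callable that builds the simplification data
    structure for a fixed test mesh and collapses half-edges until no triangle remains.
    """
    start = time.perf_counter()
    run_benchmark()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return triangle_count / elapsed

def work_shares(scores):
    """Normalize per-PC scores so they sum to 1 (fraction of work each PC should get)."""
    total = sum(scores.values())
    return {pc: s / total for pc, s in scores.items()}
```

For example, `work_shares({"a": 100.0, "b": 300.0})` assigns a quarter of the work to PC `a` and three quarters to PC `b`.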
Cutting-based parallel simplification (6)
- Dynamic task management:
  - Master PC: distributes simplification tasks and collects the results
  - Slave PCs: dynamically request simplification tasks from the master
- [Figure: master PC and slave PCs]
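A small event-driven simulation shows why pulling tasks balances load on a heterogeneous cluster. Whichever slave becomes idle first asks the master for the next sub-mesh, so faster PCs automatically take more tasks.

This is a hypothetical model, not the system's actual master/slave protocol:
- Task costs are in abstract work units.
- Each slave has its own speed.
- Network delay is ignored.

```python
import heapq

def simulate_pull(task_costs, speeds):
    """Simulate slave PCs pulling tasks from the master's queue.

    task_costs: work units per sub-mesh (e.g. triangle counts), in queue order.
    speeds: work units per second for each slave PC.
    Returns (makespan, number of tasks handled by each slave).
    """
    free_at = [(0.0, s) for s in range(len(speeds))]  # (time slave becomes idle, slave id)
    heapq.heapify(free_at)
    handled = [0] * len(speeds)
    finish = 0.0
    for cost in task_costs:
        t, s = heapq.heappop(free_at)  # the first idle slave requests the next task
        done = t + cost / speeds[s]
        handled[s] += 1
        finish = max(finish, done)
        heapq.heappush(free_at, (done, s))
    return finish, handled
```

With ten equal tasks and one slave four times faster than the other, `simulate_pull([1] * 10, [1.0, 4.0])` gives the fast slave eight sub-meshes and the slow one two. Both finish at time 2.0.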
Cutting-based parallel simplification (7)
- Load balancing:
  - Dynamic task distribution
  - Input and output buffers:
    - Cache sub-meshes before and after simplification
    - Hide transmission latency
  - Buffer size: controlled by the PC's performance parameter
  - Buffer renewal: determined by the buffer occupancy ratio
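One way to read the buffer rules:
- The input-buffer capacity scales with the PC's performance parameter relative to the cluster mean.
- The slave asks for more sub-meshes whenever the occupancy ratio falls below a threshold, so transfers overlap with simplification.

`SlaveBuffer`, `base_size` and the 0.5 refill threshold are hypothetical choices. Only the input side is modeled; an output buffer could follow the same pattern.

```python
from collections import deque

class SlaveBuffer:
    """Input buffer of sub-mesh tasks on one slave PC (hypothetical policy)."""

    def __init__(self, perf, mean_perf, base_size=4, refill_ratio=0.5):
        # Faster PCs get proportionally larger buffers (at least one slot).
        self.capacity = max(1, round(base_size * perf / mean_perf))
        self.refill_ratio = refill_ratio
        self.tasks = deque()

    def occupancy(self):
        return len(self.tasks) / self.capacity

    def needs_refill(self):
        # Request more work before the buffer runs dry, so the next sub-mesh
        # is transferred while the current one is being simplified.
        return self.occupancy() < self.refill_ratio

    def refill(self, source):
        # source: the master's task queue (a deque of sub-mesh ids here).
        while len(self.tasks) < self.capacity and source:
            self.tasks.append(source.popleft())

    def next_task(self):
        return self.tasks.popleft() if self.tasks else None
```

A PC twice as fast as the cluster mean gets a buffer of 8 sub-meshes. It requests a refill once fewer than 4 remain.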
Cutting-based parallel simplification (8)
- Stitching the simplified sub-meshes
- Our approach:
  - Boundary triangles are stored in a separate file
  - The boundary primitives are simplified afterwards
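A minimal sketch of the merge part of stitching. It assumes that every sub-mesh keeps global vertex ids, and that vertices referenced by the boundary-triangle file are never removed during sub-mesh simplification. Under these assumptions the boundary triangles still refer to valid vertices.

The merge concatenates the triangle lists and compacts the vertex array. The subsequent simplification of the boundary primitives is not shown. `stitch_submeshes` is a hypothetical name.

```python
def stitch_submeshes(positions, submeshes, boundary):
    """Merge simplified sub-meshes and boundary triangles into one indexed mesh.

    positions: global vertex array; submeshes: list of triangle lists;
    boundary: triangles read back from the separate boundary file.
    Returns (compacted positions, reindexed triangles).
    """
    tris = [t for sub in submeshes for t in sub] + list(boundary)
    used = sorted({v for t in tris for v in t})  # vertices that survived simplification
    new_id = {old: new for new, old in enumerate(used)}
    new_positions = [positions[v] for v in used]
    new_tris = [tuple(new_id[v] for v in t) for t in tris]
    return new_positions, new_tris
```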
Cutting-based parallel simplification (9)
- Experimental environment:
  - Stand-alone implementation: one PC with 2 × 2.4 GHz CPUs and 1 GB RAM, 2 threads
  - Parallel implementation: PCs with 2 × 2.4 GHz CPUs and 1 GB RAM, 2 threads each; 24-node cluster; Gigabit Ethernet
Cutting-based parallel simplification (10)
Results (times in h:mm:ss)
Mesh             | Thailand Statue | Lucy       | XYZ Dragon | Malaysia Statue
#Triangles in    | 10,000,000      | 28,055,742 | 7,219,045  | 3,631,628
#Triangles out   | 200,052         | 280,102    | 72,196     | 36,320
Percentage       | 2%              | 1%         | 1%         | 1%
#Sub-meshes      | 512             | 1024       | 512        | 512
T-Cutting        | 0:00:41         | 0:03:26    | 0:00:30    | 0:00:11
T-Simplification | 0:00:32         | 0:01:31    | 0:00:24    | 0:00:20
T-Stitching      | 0:00:16         | 0:00:45    | 0:00:08    | 0:00:10
T-Total          | 0:01:29         | 0:05:42    | 0:01:02    | 0:00:41
T-Single PC      | 0:27:45         | 1:20:24    | 0:17:56    | 0:12:40
Speedup          | 19:1            | 14:1       | 17:1       | 18:1
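The Speedup row is consistent with T-Single PC divided by T-Total. For Lucy, the single-PC time of 80 min 24 s is 4824 s, against 342 s in total, which gives about 14:1. The small helper below reproduces this arithmetic from h:mm:ss strings; the function names are illustrative.

```python
def seconds(hms):
    """Convert an h:mm:ss string to seconds (minutes above 59 are accepted)."""
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

def speedup(single_pc, stages):
    """Speedup = single-PC time / sum of the parallel stage times."""
    return seconds(single_pc) / sum(seconds(t) for t in stages)
```

`speedup("1:20:24", ["0:03:26", "0:01:31", "0:00:45"])` evaluates to about 14.1 for Lucy. The same check on the Thailand Statue column (1665 s / 89 s) gives about 18.7.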
Stream-based parallel simplification (1)
- Requirements for parallel stream simplification:
  - (a) Data locality: temporal and spatial proximity of primitive accesses
    - Options: geometrical sorting, topological sorting, space-filling curves, spectral sequencing
  - (b) Generation of multiple mesh streams
    - Space subdivision vs. surface segmentation
    - Mesh streams of equal size
Stream-based parallel simplification (2)
- Data locality optimization
- Our approach: out-of-core geometrical sorting along the longest axis of the bounding box
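A sketch of geometrical sorting along the longest bounding-box axis for an indexed mesh. The vertex sort follows the external-sort pattern: sort fixed-size runs, then k-way merge them with `heapq.merge`. Two points are assumptions of this sketch:
- The runs are kept in memory here instead of being written to disk.
- Triangles are ordered by their smallest new vertex index, which is one possible way to interleave triangles with vertices in the output stream.

`longest_axis` and `sort_mesh` are hypothetical names.

```python
import heapq

def longest_axis(vertices):
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    return max(range(3), key=lambda i: hi[i] - lo[i])

def sort_mesh(vertices, triangles, chunk=1024):
    """Reorder an indexed mesh along the longest bounding-box axis."""
    axis = longest_axis(vertices)
    # External-sort pattern: sort fixed-size runs, then merge them.
    # (A real out-of-core version would spill each run to disk.)
    keyed = [(v[axis], i) for i, v in enumerate(vertices)]
    runs = [sorted(keyed[s:s + chunk]) for s in range(0, len(keyed), chunk)]
    order = [i for _, i in heapq.merge(*runs)]
    new_id = {old: new for new, old in enumerate(order)}
    new_vertices = [vertices[i] for i in order]
    # Remap indices (keeping vertex order, hence orientation) and sort
    # triangles by their smallest new vertex index.
    new_tris = sorted((tuple(new_id[v] for v in t) for t in triangles), key=min)
    return new_vertices, new_tris
```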
Stream-based parallel simplification (3)
- Generation of multiple mesh streams
- Our approach: adaptive space subdivision of the bounding box
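One plausible reading of adaptive subdivision is sketched below. The bounding box is cut into k slabs along its longest axis, with the cut positions chosen from the data (triangle centroids) so that every slab holds nearly the same number of triangles. Dense regions therefore get thinner slabs.

The single-axis slab layout and the centroid-based assignment are assumptions, and `slab_streams` is a hypothetical name.

```python
def slab_streams(vertices, triangles, k):
    """Split triangles into k streams of nearly equal size along the longest axis.

    Returns (streams, cut positions along that axis).
    """
    assert 1 <= k <= len(triangles)
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    axis = max(range(3), key=lambda i: hi[i] - lo[i])

    def centroid(t):
        return sum(vertices[v][axis] for v in t) / 3.0

    ordered = sorted(triangles, key=centroid)
    n = len(ordered)
    # Cut positions adapt to the data: equal triangle counts, not equal widths.
    bounds = [round(j * n / k) for j in range(k + 1)]
    streams = [ordered[bounds[j]:bounds[j + 1]] for j in range(k)]
    cuts = [centroid(ordered[b]) for b in bounds[1:-1]]
    return streams, cuts
```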
Stream-based parallel simplification (4)
- Parallel stream simplification:
  - The core stream simplification algorithm is similar to that of [Wu et al. 03]
  - Difference: an indexed triangle format is used instead of a triangle soup
  - Advantage: the connectivity does not need to be reconstructed
  - Operators: INPUT and DECIMATE
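A heavily simplified sketch of the INPUT/DECIMATE loop on an indexed triangle stream. It is not the [Wu et al. 03] algorithm:
- The collapse cost is plain edge length instead of a quadric error.
- There are no fold-over or manifold checks.
- A "last use" index computed up front stands in for vertex finalization information.

The sketch does show why the indexed format helps. Connectivity comes for free from shared vertex ids. A vertex may be collapsed only once no unread triangle references it, and vertices of triangles already written to the output are frozen. `stream_simplify`, `ratio` and `buffer_cap` are hypothetical.

```python
import math

def stream_simplify(positions, triangles, ratio, buffer_cap):
    """Simplify an indexed triangle stream with a bounded in-core buffer.

    positions: list of (x, y, z); triangles: (a, b, c) vertex ids in stream order.
    ratio: fraction of triangles to keep. Returns the output triangle list.
    """
    assert buffer_cap >= 1 and 0 < ratio <= 1
    last_use = {}  # stream index of the last triangle referencing each vertex
    for i, tri in enumerate(triangles):
        for v in tri:
            last_use[v] = i
    buffer, out, locked = [], [], set()
    next_tri, removed = 0, 0

    def collapsible(v):
        # v may vanish only if no unread triangle and no written triangle uses it.
        return v not in locked and last_use[v] < next_tri

    def decimate(goal):
        nonlocal removed
        while removed < goal:
            best = None
            for tri in buffer:
                for i in range(3):
                    a, b = tri[i], tri[(i + 1) % 3]
                    for u, v in ((a, b), (b, a)):
                        if not collapsible(u):
                            continue
                        d = math.dist(positions[u], positions[v])
                        if best is None or d < best[0]:
                            best = (d, u, v)
            if best is None:
                return  # nothing can be collapsed safely yet
            _, u, v = best  # half-edge collapse u -> v (v keeps its position)
            remapped = [[v if x == u else x for x in tri] for tri in buffer]
            kept = [t for t in remapped if len(set(t)) == 3]  # drop degenerate triangles
            removed += len(remapped) - len(kept)
            buffer[:] = kept

    def write_oldest(limit):
        while len(buffer) > limit:
            tri = buffer.pop(0)
            locked.update(tri)  # vertices of written triangles are frozen from now on
            out.append(tuple(tri))

    while next_tri < len(triangles):
        # INPUT: read triangles until the buffer is full.
        while next_tri < len(triangles) and len(buffer) < buffer_cap:
            buffer.append(list(triangles[next_tri]))
            next_tri += 1
        # DECIMATE: keep the removed count proportional to what has been read.
        decimate(math.floor(next_tri * (1 - ratio)))
        write_oldest(buffer_cap // 2)
    decimate(math.floor(len(triangles) * (1 - ratio)))
    write_oldest(0)
    return out
```

With a small `buffer_cap`, many vertices are frozen before they can be collapsed, so the target ratio may not be reached. Buffer size trades memory against simplification quality.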
Stream-based parallel simplification (5)
- Stream composition and post-processing
- Observation: the stream boundaries are spatially ordered
- Our approach: parallel boundary stitching
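Neighboring streams meet along spatially ordered boundaries. Each boundary can therefore be welded independently with a linear merge-join, and the k - 1 boundaries can be processed concurrently.

The sketch assumes each stream emits its boundary vertices as (position, vertex_id) pairs sorted by position, and that duplicated vertices have bit-identical coordinates. `weld_pair` and `stitch_boundaries` are hypothetical names. Threads only illustrate that the pairs are independent: under CPython's GIL, a process pool or separate cluster nodes would be needed for a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def weld_pair(left, right):
    """Merge-join two boundary vertex lists that are both sorted by position.

    left, right: lists of (position, vertex_id) from two neighboring streams.
    Returns {right_id: left_id} for every pair of coinciding boundary vertices.
    """
    i = j = 0
    mapping = {}
    while i < len(left) and j < len(right):
        if left[i][0] == right[j][0]:
            mapping[right[j][1]] = left[i][1]
            i += 1
            j += 1
        elif left[i][0] < right[j][0]:
            i += 1
        else:
            j += 1
    return mapping

def stitch_boundaries(boundary_pairs, workers=4):
    """Weld all stream boundaries concurrently; each (left, right) pair is independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: weld_pair(*pair), boundary_pairs))
```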
Stream-based parallel simplification (6)
Results (times in h:mm:ss)
Mesh             | Thailand Statue | Lucy       | XYZ Dragon | Malaysia Statue
#Triangles in    | 10,000,000      | 28,055,742 | 7,219,045  | 3,631,628
#Triangles out   | 200,013         | 280,172    | 72,256     | 36,640
Percentage       | 2%              | 1%         | 1%         | 1%
#Streams         | 24              | 24         | 24         | 24
T-Generation     | 0:00:32         | 0:02:12    | 0:00:24    | 0:00:12
T-Simplification | 0:00:12         | 0:00:48    | 0:00:10    | 0:00:09
T-Composition    | 0:00:08         | 0:00:25    | 0:00:07    | 0:00:07
T-Total          | 0:00:52         | 0:03:25    | 0:00:41    | 0:00:28
T-Single PC      | 0:08:20         | 0:25:45    | 0:06:53    | 0:05:40
Speedup          | 9:1             | 8:1        | 10:1       | 12:1
Conclusion
- Two parallel simplification schemes for massive meshes
- Task partitioning schemes:
  - Mesh cutting
  - Stream generation
- Load balancing schemes:
  - Benchmark-test-based resource management
  - Dynamic task management
Ongoing and future work
- A parallel framework for constructing multiresolution representations of massive meshes
- Storage and indexing schemes for multiresolution representations of massive meshes in distributed environments
- GPU-cluster-based parallel simplification of massive meshes
- Other methods for generating mesh streams
Acknowledgement
- National Grand Fundamental Research 973 Program of China, Grant 2002CB312105
- National Science Foundation of China, Grant 60533080
- Stanford Graphics Group and Cyberware for providing the test data sets
- The reviewers for their comments
Thank you!