Parallel Simplification of Large Meshes on PC Clusters




Parallel Simplification of Large Meshes on PC Clusters
Hua Xiong, Xiaohong Jiang, Yaping Zhang, Jiaoying Shi
State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, Hangzhou, China
April 14, 2008

Background
- The scale of data sets is growing fast:
  - 3D scanning
  - Scientific simulation
  - CAD modeling
- Acceleration techniques for interactive rendering:
  - Visibility culling
  - Parallel rendering
  - Image-based rendering
  - Mesh compression & layout optimization
  - Mesh simplification & multiresolution techniques

Related work
- Mesh simplification and multiresolution modeling and rendering techniques are two of the most efficient acceleration approaches.
- Massive mesh simplification methods:
  - Mesh cutting based approaches: [Hoppe 98] [Prince 00] [Brodsky et al. 03] [Borodin et al. 03]
  - External memory data structures: [Lindstrom et al. 01] [Cignoni et al. 03] [Shaffer et al. 05]
  - Stream or batch processing: [Lindstrom 00] [Wu et al. 03] [Isenburg 03]

Problems
- Long simplification time for massive meshes:
  - [Wu et al. 03]: 866 MHz PIII, St. Matthew statue, 373M triangles, simplified to 0.5%, 4 hours
  - [Lindstrom et al. 01]: 250 MHz SGI Onyx2, isosurface, 468M triangles, simplified to 1.5%, 3 hours 12 minutes
  - [Isenburg et al. 03]: 250 MHz SGI Onyx2, isosurface, 468M triangles, simplified to 0.5%, 2 hours 25 minutes
- Need to speed up the simplification:
  - Benefits downstream mesh processing applications
  - Makes system debugging more convenient

Summary of our approach
- Mesh cutting based parallel simplification:
  - Cutting the input massive mesh into sub-meshes
  - Simplifying all sub-meshes in parallel
  - Stitching the sub-meshes
- Mesh stream based parallel simplification:
  - Generating mesh streams
  - Simplifying all mesh streams in parallel
  - Composing the mesh streams

Cutting based parallel simplification (1)
- Cutting requirements for load-balanced parallel simplification:
  - Nearly equal-sized sub-meshes
  - As few boundary primitives as possible
- Our approach: mesh cutting based on graph partitioning
  - Fast cutting speed
  - High cutting quality

Cutting based parallel simplification (2)
- Cutting step:
  - Use a uniform grid to subdivide the bounding box of the input mesh
  - Construct a graph:
    - Graph node: a non-empty cell
    - Graph edge: links a cell to its k-nearest neighbors
  - Use METIS to partition the graph
  - Cluster the vertices and triangles according to the partition
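The cutting step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the brute-force k-nearest search, and above all `greedy_partition` are assumptions. `greedy_partition` is a simple breadth-first stand-in for METIS; it balances vertex counts but, unlike METIS, makes no attempt to minimize the number of cut edges.

```python
from collections import defaultdict, deque

def build_cell_graph(vertices, resolution, k):
    """Bin vertices into a uniform grid; link each non-empty cell to its k nearest non-empty cells."""
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    # Cell edge length per axis; the tiny floor avoids division by zero for flat meshes.
    sizes = [max((maxs[i] - mins[i]) / resolution, 1e-12) for i in range(3)]

    def cell_of(v):
        return tuple(min(int((v[i] - mins[i]) / sizes[i]), resolution - 1) for i in range(3))

    weights = defaultdict(int)  # graph node weight = number of vertices in the cell
    for v in vertices:
        weights[cell_of(v)] += 1

    cells = sorted(weights)
    adjacency = {c: set() for c in cells}
    for c in cells:  # brute-force k-nearest search, acceptable for a sketch
        others = sorted((o for o in cells if o != c),
                        key=lambda o: sum((a - b) ** 2 for a, b in zip(c, o)))
        for o in others[:k]:
            adjacency[c].add(o)
            adjacency[o].add(c)  # keep the graph undirected
    return dict(weights), adjacency, cell_of

def greedy_partition(weights, adjacency, parts):
    """Hypothetical stand-in for METIS: grow parts breadth-first until each reaches the target weight."""
    target = sum(weights.values()) / parts
    part_of, current, load = {}, 0, 0
    for seed in sorted(weights):
        if seed in part_of:
            continue
        part_of[seed] = None  # None marks "queued but not yet assigned"
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            if load >= target and current < parts - 1:
                current, load = current + 1, 0
            part_of[cell] = current
            load += weights[cell]
            for n in sorted(adjacency[cell]):
                if n not in part_of:
                    part_of[n] = None
                    queue.append(n)
    return part_of

# Cluster the vertices of a toy 4 x 4 x 4 point set into 4 sub-meshes.
vertices = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
weights, adjacency, cell_of = build_cell_graph(vertices, resolution=4, k=6)
part_of_cell = greedy_partition(weights, adjacency, parts=4)
vertex_part = [part_of_cell[cell_of(v)] for v in vertices]
```

Partitioning the coarse cell graph rather than the mesh itself keeps the graph small, which is presumably what makes the cutting fast.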

Cutting based parallel simplification (3)
- Cutting examples

Cutting based parallel simplification (4)
- Load-balanced parallel simplification
- PC cluster resource management:
  - Heterogeneous PC cluster
  - Different memory capacities
  - Different CPU performance
- Simplification task management:
  - Task partition
  - Task distribution
  - Load balancing

Cutting based parallel simplification (5)
- How to evaluate the performance of each PC?
- Our approach: a benchmark-based performance test for simplification
- The benchmark test includes:
  - Constructing the simplification data structure
  - Performing half-edge collapses until no triangle remains
- Performance = triangle count / simplification time
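The performance measure is a plain throughput figure. A minimal sketch follows, assuming the benchmark is available as a callable; `benchmark_performance`, `relative_shares`, and the node names are hypothetical, and normalizing the scores into shares is an illustrative choice rather than something the slides describe.

```python
import time

def benchmark_performance(run_benchmark, triangle_count):
    """Return triangles processed per second for one run of the benchmark callable."""
    start = time.perf_counter()
    run_benchmark()  # e.g. build the data structure, then collapse until no triangle remains
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return triangle_count / elapsed

def relative_shares(performances):
    """Normalize per-node performance so the shares sum to 1."""
    total = sum(performances.values())
    return {node: perf / total for node, perf in performances.items()}

# Two hypothetical nodes, one three times faster than the other.
shares = relative_shares({"node-a": 100.0, "node-b": 300.0})
```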

Cutting based parallel simplification (6)
- Dynamic task management:
  - Master PC: distributes and collects simplification tasks
  - Slave PCs: dynamically apply for simplification tasks
- [Diagram: master PC and slave PCs]
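The pull-based scheme can be illustrated on a single machine. In this sketch, threads stand in for slave PCs and a shared queue stands in for the master's task list. The real system communicates over the network, so this is only an assumption-laden model of the control flow, and `run_master` and `simplify` are hypothetical names.

```python
import queue
import threading

def run_master(sub_meshes, worker_count, simplify):
    """Hand out sub-meshes on request and collect the simplified results in input order."""
    tasks = queue.Queue()
    for item in enumerate(sub_meshes):
        tasks.put(item)
    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker keeps applying for the next task until none are left,
        # so faster workers naturally process more sub-meshes.
        while True:
            try:
                index, mesh = tasks.get_nowait()
            except queue.Empty:
                return
            simplified = simplify(mesh)
            with lock:
                results[index] = simplified

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(sub_meshes))]

# Toy "simplification": keep every second element of each sub-mesh.
simplified = run_master([[1, 2, 3, 4], [5, 6]], worker_count=2, simplify=lambda m: m[::2])
```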

Cutting based parallel simplification (7)
- Load balancing:
  - Dynamic task distribution
  - Input & output buffers:
    - Cache sub-meshes before and after simplification
    - Hide transmission latency
    - Buffer size: controlled by the PC performance parameter
    - Buffer renewal: determined by the occupancy ratio
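One way to read the buffer rules is sketched below. The slides give no concrete sizes or thresholds, so `slots_per_unit` and `refill_ratio` are invented parameters, and the class as a whole is an assumption about how such a buffer might behave.

```python
from collections import deque

class InputBuffer:
    """Per-node buffer of pending sub-meshes; capacity scales with the node's performance."""

    def __init__(self, performance, slots_per_unit=2, refill_ratio=0.5):
        self.capacity = max(1, round(performance * slots_per_unit))
        self.refill_ratio = refill_ratio
        self.items = deque()

    def occupancy(self):
        return len(self.items) / self.capacity

    def needs_refill(self):
        # Ask the master for more work once occupancy drops below the threshold.
        return self.occupancy() < self.refill_ratio

    def refill(self, pending):
        # Move tasks from the master's pending queue until the buffer is full.
        while pending and len(self.items) < self.capacity:
            self.items.append(pending.popleft())

    def take(self):
        return self.items.popleft()

pending = deque(range(10))            # ten hypothetical sub-mesh ids held by the master
buf = InputBuffer(performance=2.0)    # capacity of 4 slots
buf.refill(pending)
buf.take()
buf.take()
half_full = buf.needs_refill()        # occupancy 0.5, not below the threshold
buf.take()
quarter_full = buf.needs_refill()     # occupancy 0.25, refill requested
```

Refilling before the buffer runs empty is what lets transmission overlap with simplification, so the node does not sit idle waiting for its next sub-mesh.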

Cutting based parallel simplification (8)
- Stitching the simplified sub-meshes
- Our approach: boundary triangles are stored in a separate file
- Simplifying boundary primitives
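Separating boundary triangles from interior ones reduces to a membership test once every vertex carries a sub-mesh id. The sketch below assumes a triangle counts as a boundary triangle when its vertices fall in more than one sub-mesh; the function name and data layout are hypothetical.

```python
def split_boundary(triangles, part_of_vertex):
    """Group triangles by sub-mesh; triangles spanning several sub-meshes go to a boundary list."""
    interior = {}
    boundary = []
    for tri in triangles:
        parts = {part_of_vertex[v] for v in tri}
        if len(parts) == 1:
            interior.setdefault(parts.pop(), []).append(tri)
        else:
            boundary.append(tri)  # would be written to the separate boundary file
    return interior, boundary

triangles = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]
part_of_vertex = [0, 0, 0, 1, 1, 1]
interior, boundary = split_boundary(triangles, part_of_vertex)
```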

Cutting based parallel simplification (9)
- Experimental environment
- Stand-alone implementation:
  - PC: 2 × 2.4 GHz CPUs, 1 GB RAM
  - Threads: 2
- Parallel implementation:
  - PC: 2 × 2.4 GHz CPUs, 1 GB RAM
  - Threads: 2
  - Cluster: 24 nodes
  - Network: Gigabit Ethernet

Cutting based parallel simplification (10)
Results (times in h:mm:ss)

Mesh              Thailand Statue  Lucy        XYZ Dragon  Malaysia Statue
#Triangles in     10,000,000       28,055,742  7,219,045   3,631,628
#Triangles out    200,052          280,102     72,196      36,320
Percentage        2%               1%          1%          1%
#Sub-meshes       512              1024        512         512
T-Cutting         0:00:41          0:03:26     0:00:30     0:00:11
T-Simplification  0:00:32          0:01:31     0:00:24     0:00:20
T-Stitching       0:00:16          0:00:45     0:00:08     0:00:10
T-Total           0:01:29          0:05:42     0:01:02     0:00:41
T-Single PC       0:27:45          1:20:24     0:17:56     0:12:40
Speedup           19:1             14:1        17:1        18:1
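The speedup figures follow directly from the timings. As a worked check, the snippet below recomputes the Thailand Statue column from the values reported above; `to_seconds` is a helper introduced here for illustration.

```python
def to_seconds(hms):
    """Convert an h:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Thailand Statue: cutting, simplification, and stitching times on the cluster.
phases = ["0:00:41", "0:00:32", "0:00:16"]
total = sum(to_seconds(t) for t in phases)   # equals T-Total, 0:01:29
single_pc = to_seconds("0:27:45")
speedup = single_pc / total                  # about 18.7, reported as 19:1
```

Dividing by the 24 nodes of the cluster gives a parallel efficiency of roughly 0.78 for this mesh, given that the single-PC run also used two threads.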

Stream based parallel simplification (1)
- Requirements for parallel stream simplification:
  - (a) Data locality: temporal and spatial proximity of primitive access
    - Geometrical sorting
    - Topological sorting
    - Space-filling curves
    - Spectral sequencing
  - (b) Generation of multiple mesh streams
    - Space subdivision vs. surface segmentation
    - Equal-sized mesh streams

Stream based parallel simplification (2)
- Data locality optimization
- Our approach: out-of-core geometrical sorting along the longest axis of the bounding box
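Out-of-core sorting is commonly done as an external merge sort: sort memory-sized chunks, spill each to disk, then merge the sorted runs. The slides do not say which algorithm the authors use, so the sketch below is one plausible reading; the function names and the pickle-based run files are assumptions.

```python
import heapq
import pickle
import tempfile

def longest_axis(bbox_min, bbox_max):
    """Index (0, 1 or 2) of the axis with the largest bounding-box extent."""
    extents = [hi - lo for lo, hi in zip(bbox_min, bbox_max)]
    return extents.index(max(extents))

def external_sort(vertex_iter, axis, chunk_size):
    """Yield vertices sorted along `axis` while holding at most `chunk_size` of them in memory per run."""
    runs, chunk = [], []

    def flush():
        # Sort the in-memory chunk and spill it to a temporary run file.
        chunk.sort(key=lambda v: v[axis])
        run = tempfile.TemporaryFile()
        for v in chunk:
            pickle.dump(v, run)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for v in vertex_iter:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    def read(run):
        try:
            while True:
                yield pickle.load(run)
        except EOFError:
            run.close()

    # Lazily merge the sorted runs; only one vertex per run is resident at a time.
    yield from heapq.merge(*(read(r) for r in runs), key=lambda v: v[axis])

verts = [(3.0, 0, 0), (1.0, 1, 0), (2.0, 0, 1), (0.0, 1, 1), (4.0, 0, 0)]
axis = longest_axis((0, 0, 0), (4, 1, 1))
ordered = list(external_sort(iter(verts), axis, chunk_size=2))
```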

Stream based parallel simplification (3)
- Generation of multiple mesh streams
- Our approach: adaptive space subdivision of the bounding box
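The slides do not detail the subdivision rule. One simple way to obtain equal-sized streams, assuming the vertices are already sorted along one axis, is to place the split planes at count quantiles so that each slab holds about the same number of primitives. The sketch below illustrates that assumption only; `split_planes` and `stream_of` are hypothetical names.

```python
import bisect

def split_planes(sorted_coords, stream_count):
    """Pick split coordinates so that each slab holds about the same number of vertices."""
    n = len(sorted_coords)
    return [sorted_coords[(i * n) // stream_count] for i in range(1, stream_count)]

def stream_of(coord, planes):
    """Index of the slab (and hence the stream) containing the coordinate."""
    return bisect.bisect_right(planes, coord)

# Unevenly spaced coordinates still yield equally populated streams.
coords = [0, 1, 2, 3, 10, 11, 12, 100]
planes = split_planes(coords, stream_count=4)
assignment = [stream_of(c, planes) for c in coords]
```

A uniform split of the same range would put seven of the eight vertices into the first slab, which is why the planes adapt to the data density.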

Stream based parallel simplification (4)
- Parallel stream simplification
- The core stream simplification algorithm is similar to the one in [Wu et al. 03]
- Difference: uses the indexed triangle format instead of triangle soup
- Advantage: no need to reconstruct the connectivity
- Operators: INPUT and DECIMATE
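To see what the indexed format saves, consider the work a triangle-soup pipeline has to do before it can collapse edges: it must recover shared vertices by matching positions. The sketch below shows that reconstruction step in its simplest form, exact position matching through a dictionary; it is an illustration, not the procedure used in [Wu et al. 03].

```python
def soup_to_indexed(soup):
    """Rebuild shared-vertex connectivity from a triangle soup by matching identical positions."""
    index_of = {}
    vertices, triangles = [], []
    for tri in soup:
        ids = []
        for position in tri:
            if position not in index_of:
                index_of[position] = len(vertices)
                vertices.append(position)
            ids.append(index_of[position])
        triangles.append(tuple(ids))
    return vertices, triangles

# Two triangles sharing an edge: six soup corners collapse to four indexed vertices.
soup = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
]
vertices, triangles = soup_to_indexed(soup)
```

With indexed input, the triangle list already names its shared vertices, so this pass and its lookup table are unnecessary.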

Stream based parallel simplification (5)
- Stream composition and post-processing
- Observation: boundaries are spatially ordered
- Our approach: parallel boundary stitching

Stream based parallel simplification (6)
Results (times in h:mm:ss)

Mesh              Thailand Statue  Lucy        XYZ Dragon  Malaysia Statue
#Triangles in     10,000,000       28,055,742  7,219,045   3,631,628
#Triangles out    200,013          280,172     72,256      36,640
Percentage        2%               1%          1%          1%
#Streams          24               24          24          24
T-Generation      0:00:32          0:02:12     0:00:24     0:00:12
T-Simplification  0:00:12          0:00:48     0:00:10     0:00:09
T-Composition     0:00:08          0:00:25     0:00:07     0:00:07
T-Total           0:00:52          0:03:25     0:00:41     0:00:28
T-Single PC       0:08:20          0:25:45     0:06:53     0:05:40
Speedup           9:1              8:1         10:1        12:1

Conclusion
- Two parallel simplification schemes for massive meshes
- Task partition schemes:
  - Mesh cutting
  - Stream generation
- Load balancing schemes:
  - Benchmark test based resource management
  - Dynamic task management

Ongoing and future work
- A parallel framework for the construction of multiresolution representations of massive meshes
- Storage and indexing schemes for multiresolution representations of massive meshes in a distributed environment
- GPU cluster based parallel simplification for massive meshes
- Other methods of mesh stream generation

Acknowledgement
- National Grand Fundamental Research 973 Program of China, Grant 2002CB312105
- National Science Foundation of China, Grant 60533080
- Stanford Graphics Group and Cyberware for providing the test data sets
- The reviewers for their comments

Thank you!