State-of-the-art Algorithms for Complete 3D Model Reconstruction


Georgios Kordelas 1, Juan Diego Pérez-Moneo Agapito 1,2, Jesús M. Vegas Hernandez 2, and Petros Daras 1

1 Informatics & Telematics Institute, 1st km Thermi Panorama Road, 57001, Thermi, Thessaloniki, Greece
2 Computer Science Department, University of Valladolid, Valladolid, Spain
kordelas@iti.gr, perez@iti.gr, jvegas@infor.uva.es, daras@iti.gr

Abstract. The task of generating fast and accurate 3D models of a scene has applications in various computer vision fields, including robotics, virtual and augmented reality, and entertainment. So far, the computer vision community has produced innovative reconstruction algorithms that exploit various types of equipment. This paper surveys recent methods that are able to generate complete and dense 3D models. The algorithms are classified into categories according to the equipment used to acquire the processed data.

Keywords: 3D reconstruction, multi-view, scan registration.

1 Introduction

Nowadays, many advanced applications require three-dimensional (3D) information. The third dimension plays a decisive role in the analysis of dynamic or static environments. Everyday application fields that exploit the third dimension include surveillance and robotics, where depth information enables a much better analysis of the environment. In medical research, new technologies increasingly require reliable depth data. More generally, 3D image processing, digital photography, games, multimedia, 3D visualization and augmented reality make increasing use of real-time 3D information. Existing 3D modeling methods can be classified according to the input data they require, while their efficacy is reflected by the variety of scenes they can process, the fidelity of the final model and the total processing time.
Depending on the user requirements, automated, semi-automated or manual image-based approaches can be selected to produce digital models usable for inspection, visualization or documentation. Automated methods focus mainly on the full automation of the process, but generally produce results that are suited mainly to visually pleasing real-time 3D recording or simple visualization. Semi-automated methods, on the other hand, try to balance accuracy and automation and are very useful for precise documentation and restoration planning.

Lecture Notes in Computer Science

The aim of this paper is to identify the most recent and advanced methods and to present them according to the equipment they exploit. The rest of this paper is organized as follows. Section 2 presents algorithms that use range scanner technology. Section 3 covers multi-view stereo approaches. Section 4 gives an overview of the 3D reconstruction lab established within the scope of the 3DLife project [54], and conclusions are drawn in Section 5.

2 Reconstruction Using Laser Scanner Technology

In recent years, laser scanner technology has emerged as a useful and competitive approach for creating 3D reconstructions. The basic advantages of the methods that use this technology are (i) speed, (ii) accuracy and (iii) resolution of the reconstruction. Moreover, the scanners' field of view allows the reconstruction of objects whose size ranges from a few centimeters to several meters, located at short or long distances. Consequently, this technology is suitable for large-scale scenes, such as the interior and exterior of buildings, and it is therefore generally accepted by the community as a valid support for the documentation and conservation of historic buildings, monuments or archaeological sites.

In brief, the reconstruction of a scene using a range scanner requires the following steps:
1. Acquisition of an appropriate number of colour range scans to adequately cover the 3D scene.
2. Registration of the range scans in the same coordinate system (Fig. 1(I)).
3. Data processing to refine the final 3D surface model (Fig. 1(II)). The processing stage includes the following steps:
   - Elimination of redundant information (i.e. duplicate data created by the registration of overlapping regions) from the registered point cloud, and noise removal.
   - Construction of a 3D model composed of polygonal facets from the point cloud, and completion of missing data (hole filling).
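The two clean-up operations of the processing stage can be illustrated with a short sketch. The following Python/NumPy code is our own minimal illustration, not the procedure of the commercial packages or of any surveyed method: duplicates from overlapping scans are merged by averaging all points that fall into the same cell of a voxel grid, and isolated noise points are dropped with a simple statistical filter on the mean distance to the k nearest neighbours. The function names, the voxel size and the `k`/`std_ratio` parameters are hypothetical choices made for illustration.

```python
import numpy as np


def voxel_deduplicate(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Merge points sharing a voxel (e.g. overlap duplicates) into their centroid.

    points: (N, 3) registered points, all in one coordinate system.
    voxel_size: edge length of the cubic grid cells (illustrative parameter).
    """
    keys = np.floor(points / voxel_size).astype(np.int64)   # voxel index per point
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True,
                                   return_counts=True)
    inverse = inverse.reshape(-1)          # shape differs across NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)       # accumulate coordinates per voxel
    return sums / counts[:, None]          # centroid of each occupied voxel


def remove_statistical_outliers(points: np.ndarray, k: int = 8,
                                std_ratio: float = 2.0) -> np.ndarray:
    """Drop points whose mean k-NN distance is far above the cloud average.

    Brute-force O(N^2) distances: adequate for a small illustrative cloud only.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(dist, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    mean_knn = knn.mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]
```

A real pipeline would use a spatial index instead of the dense distance matrix; the voxel size trades off how aggressively nearby samples from different scans are merged against the loss of fine detail.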
Placing the equipment to acquire the range scans is a straightforward task, and many commercial software packages are able to process the data to refine the final 3D model [52, 53]. Therefore, the main problem in the 3D reconstruction procedure lies in the automatic computation of the three-dimensional transformations that align the range data sets into a complete 3D model. Registering point clouds in the same coordinate system requires a 3D-to-3D registration method. There are plenty of methods dealing with range data registration [1-9, 13-15, 18-23]. Some methods are automatic and rely on an automated feature-matching procedure [4-9, 13-15, 18, 23], while others require the use of markers [1-3]. This review presents the most efficient procedures that can perform accurate registration without special markers. The range scan registration procedure can be divided into two steps: (1) initial registration, which provides a good initial guess of the alignment transformation, and (2) fine registration, which computes the accurate alignment transformation.
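Both registration steps ultimately estimate a rigid transformation (rotation R, translation t). The following Python/NumPy sketch is illustrative, not the implementation of any surveyed system: it shows a basic point-to-point ICP loop in the spirit of [19] that refines an initial guess. Its inner least-squares step uses the SVD-based (Kabsch) closed form, which gives the same optimum as the quaternion formulation used in [19]; the same step is what turns a set of matched feature pairs into a candidate transformation during initial registration. The function names, the brute-force closest-point search and the stopping tolerance are assumptions made for brevity.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t minimising sum ||R @ src_i + t - dst_i||^2 (SVD/Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # d = -1 would give a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, R0=None, t0=None, max_iter=50, tol=1e-10):
    """Point-to-point ICP refining (R0, t0) so that R @ source_i + t lies on target."""
    R = np.eye(3) if R0 is None else R0.copy()
    t = np.zeros(3) if t0 is None else t0.copy()
    prev_err, err = np.inf, np.inf
    for _ in range(max_iter):
        moved = source @ R.T + t
        # Brute-force closest points; practical systems use a k-d tree.
        dists = np.linalg.norm(moved[:, None, :] - target[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        err = dists[np.arange(len(source)), nearest].mean()
        if prev_err - err < tol:                   # no further improvement
            break
        prev_err = err
        # Re-estimate the full transform from the original source points.
        R, t = best_rigid_transform(source, target[nearest])
    return R, t, err
```

ICP only converges to the correct alignment when the initial guess is close enough for most closest-point matches to be right, which is why it is preceded by an initial registration step. The variants compared in [20] differ mainly in how points are selected, matched, weighted and rejected, and in the error metric that is minimized.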

Fig. 1. Registration of (I) two range scans and (II) multiple range scans forming a 3D model [4].

2.1 Initial Registration

A wide variety of methods have already been proposed for the initial registration of scans. Several methods extract geometric features from the scans to allow feature matching and alignment between scans. In [4], an effective system that integrates automated 3D-to-3D registration based on geometric features is presented. During preprocessing, a set of major 3D planes, a set of geometric 3D lines and a set of reflectance 3D lines are extracted from each 3D range scan. The range scans are registered in the same coordinate system via the automated 3D-to-3D feature-based range-scan registration method of [5]. As a result, all range scans are registered with respect to one selected pivot scan. Since in some cases the extracted linear features are inadequate to register the range scans together, the authors also use nonlinear features such as circular arcs. A circle-based registration method, based on the similarity of radii, orientation and relative position between pairs of circles, is then utilized in the matching phase. From each valid pair of matching circles, a candidate transformation is computed and its correctness is evaluated. Finally, the transformation achieving the smallest average distance between overlapping range points is chosen as the best one. The mean registration error of this method is about 1 cm, and it can efficiently register scans with a minimum overlap of about 20%.

The work presented in [6] proposes an angular-invariant feature for the 3D registration procedure to perform reliable selection of point correspondences. The angular feature, which is invariant to scale and rotation transformations, improves convergence and error without any assumptions about the initial transformation.
A major criticism against this feature, however, is that it cannot discover the potential structural information hidden in nearly flat surfaces. The work presented in [4] can efficiently perform automatic range scan registration using simple geometric features (3D lines, circles). Therefore, this algorithm is appropriate when intensity data do not accompany the range data.

Another class of registration methods extracts descriptors of large scan areas as the basis for registration and 3D object recognition. An early method for free-form surface registration is proposed in [13]. This approach uses the spin image surface representation, which has low discriminating capability because it maps the 3D range image into a 2D histogram. Therefore, spin image matching produces many ambiguous correspondences that must be processed through a number of filtering stages to prune out incorrect ones, making the technique computationally inefficient even for range images of reasonable size. A novel 3D free-form surface area representation scheme based on third-order tensors is used for surface registration in [14]. More specifically, multiple tensors are used to represent each range scan. The tensors of two range scans are matched to identify correspondences between them. The correspondences are verified and then used for pairwise registration of the range images. The experimental results show that this algorithm is robust to resolution, number of tensors per view, the required amount of overlap and noise. A comparison against [13] demonstrated its registration efficiency. In [15], the scan alignment is based on the correlation of two Extended Gaussian Images (EGIs) [16] in the Fourier domain, using the spherical harmonics of the EGI and the rotational Fourier transform [17]. For pairs with low overlap, which fail to satisfy two criteria (the first based on the consistency of surface orientations in the overlapping region and the second on visibility information), the rotational alignment can be obtained by aligning constellation images generated from the EGIs. Rotationally aligned sets are matched by correlation using the Fourier transform of volumetric functions.
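The final translation step of [15] rests on the cross-correlation theorem: correlating two volumes reduces to a pointwise product in the Fourier domain. The Python/NumPy sketch below illustrates only that idea under simplifying assumptions of our own: both scans have been voxelized into occupancy grids of the same shape, the rotation is already aligned, the shift is an integer number of voxels, and the correlation is circular (wrap-around), as implied by the discrete Fourier transform. It is not the authors' implementation and omits the spherical-harmonic rotation search entirely; the function name is hypothetical.

```python
import numpy as np


def translation_by_fft_correlation(fixed, moving):
    """Integer voxel shift s such that np.roll(moving, s, axis=(0, 1, 2)) best matches fixed.

    fixed, moving: 3D occupancy grids of identical shape, rotation already aligned.
    """
    F = np.fft.fftn(fixed)
    M = np.fft.fftn(moving)
    # Cross-correlation theorem: corr[s] = sum_x fixed[x] * moving[x - s] (circular).
    corr = np.real(np.fft.ifftn(F * np.conj(M)))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices beyond half the grid size correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))
```

The cost is dominated by three FFTs, i.e. O(V log V) for V voxels, instead of testing every candidate shift explicitly; this is what makes a correlation-based search over large displacements affordable.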
The merit of the method in [15] is that it can efficiently align point clouds with arbitrarily large displacements and very little overlap. The major advantage of methods [13-15] is that they can register free-form objects, in contrast to [4-6], which use simple geometric features.

When calibrated intensity images accompany the range scans, intensity features can be combined with range information to develop efficient registration methods. In the method developed by Wyngaerd and Van Gool [18], the 3D measurements are combined with the texture. In particular, the surface is intersected with small spheres centered at feature points. The intersection line between such a sampling sphere and the surface defines an invariant region in the texture image. The surface texture inside these regions is used for matching regions between different patches. The merit of this approach is that it can be applied to parts of the surface with poor geometry and rich texture, or with rich geometry and poor texture. Bendels et al. [9] match 2D SIFT features [11], backproject them onto the range data, and then employ RANSAC [10] on these points in 3D to identify an initial 3D transformation. A similar registration system is presented in [7]. During initialization, intensity keypoints and their SIFT descriptors are extracted from the images and backprojected onto the range data. A 3D coordinate system is established for each keypoint using its backprojected intensity gradient direction and its locally computed range surface normal. Keypoints are then matched using their SIFT descriptors, and each match provides an initial rigid transformation estimate. An extension of the Dual-Bootstrap ICP algorithm [12] is used for the fine alignment of the scans. The weakest point of [7] is initialization, since an improper initial estimate causes alignment failure. Seo et al. [8] correct geometric and illumination variations of the photometric features before performing 2D local feature matching with the SIFT algorithm. The authors claim that this algorithm is faster than most methods that rely on shape information. Methods that exploit intensity information [18, 9, 7, 8] perform well in aligning textured range scans. However, a lack of texture renders these approaches useless.

2.2 Fine Registration

Initial registration provides a good estimate of the alignment transformation between sets of 3D range scans, while fine registration, which follows initial registration, aims to align these sets optimally. In the literature, the ICP (Iterative Closest Point) algorithm [19] is a very popular method for the fine registration of 3D data sets when an initial guess of the relative pose between them is known. The work presented in [20] classifies several ICP variants and evaluates their performance according to the time required to reach the correct alignment. Moreover, a combination of ICP variants optimized for high speed is proposed. The study on the convergence properties of ICP [21] shows that using normal distances is more effective than using Euclidean distances, and proposes faster convergence with higher-order approximations. An alternative approach that combines intensity and geometric attributes to filter closest-point matches is studied in [22]. A maximum-likelihood method for the registration of scenes with unmatched or missing data, which does not require ICP refinement, is presented in [23].
In this method, correspondences are formed between valid and missing points in each view. The matched points are classified into types according to their visibility properties. A generic sensor model, which can be adjusted to match a wide variety of sensors, is then used to generate likelihood measures for each point type. Finally, a multistage optimization procedure based on these likelihood measures finds a maximum-likelihood registration. The experimental results demonstrate the efficacy of this method in registering complex noisy scenes with occlusions, missing data and varying degrees of overlap.

Even though range scanners give promising results, their cost, size, power requirements and the intricate handling of their data are significant drawbacks. Therefore, methods that use range scanning technology are less widely available than multi-view stereo methods.

3 Multi-view Stereo Reconstruction

This section surveys multi-view stereo reconstruction methods that are able to provide dense and full 3D reconstructions of objects from multiple views. Consequently, binocular, trinocular and multi-baseline methods that reconstruct a single dense map, as well as structure-from-motion methods, are not considered in this survey. Several types of methods are used to reconstruct 3D models of objects from a set of images. These methods can be classified into: (1) methods that reconstruct the visual hull of the object, (2) approaches that recover the photo-hull of an object and (3) algorithms that minimize the surface integral of a certain cost function over the surface shape.

Fig. 2. (I) Intersection of three visual cones [24] and (II) Evolving shape after eroding inconsistent points [29].

3.1 Visual Hull Reconstruction

The first class includes methods that exploit silhouette information to generate intersected visual cones (Fig. 2(I)), which are then used to obtain the 3D representation of an object. Silhouette-based methods are popular in multi-camera environments, mainly due to their simplicity and computational efficiency. In [49], a parallel pipeline processing method for reconstructing a dynamic 3D object shape from multi-view video images is proposed. Real-time processing is accomplished by combining a plane-based volume intersection algorithm with a parallel pipeline implementation. The quantitative performance evaluations demonstrated that the acceleration and parallelization algorithms are very efficient in reconstructing a dynamic full 3D shape at good resolution. A novel framework for multi-view silhouette cue fusion is proposed in [25]. This framework uses a space occupancy grid as a probabilistic 3D representation of scene contents. The idea behind this work is to consider each camera pixel as a statistical occupancy sensor. All pixel observations are then used jointly to infer where, and how likely, matter is present in the scene. This approach achieves optimal scene object localization and robust volume reconstruction, with no constraint on camera placement and object visibility.
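The volume intersection idea shared by silhouette-based methods can be shown with a minimal Python/NumPy sketch. It assumes that calibrated 3x4 projection matrices and binary silhouettes are already available, and simply keeps every voxel centre that projects inside all silhouettes. It is a plain per-voxel test, not the plane-based parallel pipeline of [49] nor the probabilistic occupancy grid of [25]; the function name and the pixel convention (the floor of the projected coordinates gives the column and row index) are our assumptions.

```python
import numpy as np


def visual_hull(silhouettes, projections, voxel_centres):
    """Boolean occupancy of voxel centres that project inside every silhouette.

    silhouettes: list of 2D boolean masks indexed [row, column] (True = object).
    projections: list of 3x4 matrices mapping homogeneous world points to
        homogeneous pixel coordinates (u = column, v = row).
    voxel_centres: (N, 3) array of voxel centres in world coordinates.
    """
    n = len(voxel_centres)
    homog = np.hstack([voxel_centres, np.ones((n, 1))])
    occupied = np.ones(n, dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = homog @ P.T
        in_front = uvw[:, 2] > 0
        depth = np.where(in_front, uvw[:, 2], 1.0)   # avoid dividing by <= 0
        u = np.floor(uvw[:, 0] / depth).astype(int)
        v = np.floor(uvw[:, 1] / depth).astype(int)
        rows, cols = mask.shape
        valid = in_front & (u >= 0) & (u < cols) & (v >= 0) & (v < rows)
        inside = np.zeros(n, dtype=bool)
        inside[valid] = mask[v[valid], u[valid]]
        # Intersection of visual cones: a voxel must fall inside all silhouettes.
        occupied &= inside
    return occupied
```

Because each view can only remove voxels, the result is the tightest shape consistent with the given silhouettes, which still over-estimates concave regions; this is the fidelity limitation of shape-from-silhouette discussed in this section.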
An algorithm that eliminates the problems related to dense feature point matching and camera calibration is presented in [28]. This method is based on the projective geometry between the object space and silhouette images taken from multiple viewpoints. The object shape is reconstructed by establishing a set of hypothetical planes slicing the object volume and estimating the projective geometric relations between the images. Ishikawa et al. [26] include a genuine segmentation method to acquire the silhouettes, but segmentation errors directly affect the visual hull, since the segmentation process is completely independent of the reconstruction process. Grauman et al. [27] proposed a Bayesian approach to compensate for modeling errors caused by false segmentation. They modeled the prior density using probabilistic principal component analysis and estimated a maximum a posteriori reconstruction of multi-view contours. This approach reconstructs good error-compensated models from erroneous silhouette information, but it requires prior knowledge about the objects and ground-truth training data. In conclusion, shape-from-silhouette approaches can generate full 3D reconstructions of dynamic scenes, but they lack reconstruction fidelity and are very sensitive to errors in silhouette extraction. Therefore, more efficient techniques, in terms of reconstruction quality, are employed to generate 3D models with an increased level of detail.

3.2 Space Carving Reconstruction

The second class includes the space carving approaches, which take into account the photometric consistency of the surface across the input images and allow the recovery of the photo-hull that contains all possible photo-consistent reconstructions. Space carving methods generate an initial reconstruction that envelops the object to be reconstructed. The surface of the reconstruction is then eroded at the points that are inconsistent with the input images (Fig. 2(II)). By repeating this process, a reconstruction that is consistent with the input images emerges.
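The erosion criterion can be made concrete with a small Python/NumPy sketch, written under stated assumptions: for each surface voxel, the colours it projects to in the cameras that currently see it are already available (visibility reasoning is omitted), and consistency is judged by comparing the average per-channel standard deviation against a single global threshold. This is only an illustrative statistic, not the exact measure of [29], [30] or [31]; the function names and the dictionary layout are hypothetical.

```python
import numpy as np


def photo_consistent(samples, threshold):
    """Colour-variance test for one voxel.

    samples: (K, 3) RGB colours of the voxel's projections in the K images
        where it is visible.
    threshold: global bound on the colour standard deviation (assumed value).
    """
    samples = np.asarray(samples, dtype=float)
    if samples.shape[0] < 2:
        return True            # fewer than two observations cannot refute a voxel
    return float(samples.std(axis=0).mean()) <= threshold


def carve_pass(surface_samples, threshold):
    """One carving sweep: list the surface voxels that must be removed.

    surface_samples: dict mapping a voxel index (i, j, k) to its colour samples.
    """
    return [voxel for voxel, samples in surface_samples.items()
            if not photo_consistent(samples, threshold)]
```

Because removing a voxel exposes the voxels behind it and changes their visibility, a complete carving loop repeats such passes until no voxel is removed. Removals are never undone, which corresponds to the hard, irreversible commitments of [29] discussed next.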
The Space Carving algorithm suggested by Kutulakos and Seitz [29] repeatedly sweeps a plane through the scene volume and tests the photo-consistency of the voxels on that plane. This approach permits arbitrary camera placement. However, it has the drawback of making hard and irreversible commitments on the removal of voxels. In particular, if a voxel is removed by mistake, further voxels can be erroneously removed in a cascade effect, which may lead to an incorrect reconstruction with holes. Therefore, several reformulations of space carving have been proposed in the literature. A probabilistic space carving framework for analyzing the 3D occupancy computation problem from multiple images is introduced in [30]. This framework enables a complete analysis of the complex probabilistic dependencies inherent in occupancy calculations and provides an expression for the tightest occupancy bound recoverable from noisy images. In [31], two major extensions to the Space Carving framework are presented. The first is a progressive scheme for better reconstruction of surfaces lacking sufficient texture. The second is a novel photo-consistency measure that is valid for both specular and diffuse (Lambertian) surfaces, without the need for light calibration. This method, unlike [29, 30], can deal with surfaces lacking sufficient texture. In conclusion, the Space Carving framework suffers from several important limitations:

- The original Space Carving approach [29] makes hard decisions. This limitation is partially overcome in [30].
- The choice of the global threshold on the color variance is often problematic [29, 30]. An attempt to alleviate these photometric constraints is presented in [31].
- The voxel-based representation used in the Space Carving approaches disregards the continuity of shape, which makes it very hard to enforce any kind of spatial coherence. As a result, space carving is sensitive to noise and outliers and may yield noisy reconstructions.

3.3 Reconstruction via Surface Integral Minimization

The third class of methods optimizes the surface integral of a consistency function over the surface shape. Level-set based methods provide one way of minimizing this cost function. In [43], surface reconstruction is achieved by combining both 3D data and 2D image information. This leads to a more robust approach than existing methods that use only 2D information or only 3D stereo data. For the efficient evolution of surfaces, a bounded regularization method based on level-set methods is proposed. Additionally, if silhouette information from the 2D images is available, it can be integrated to improve the pure 3D results. The main limitation of this system stems from the chosen surface evolution approach, which assumes a closed and smooth surface. Therefore, the surface reconstruction module is not designed for outdoor scenes or polyhedral objects. The algorithm presented in [37] starts with a generic surface, such as a large sphere or a smooth cube, and evolves it to best approximate the shape of the scene. This task is performed by numerically integrating systems of partial differential equations using the level set method presented in [41].
In order to deal with non-Lambertian surfaces, this method uses a model of the radiance that accounts for deviations from Lambertian reflection through an affine subspace constraint on the radiance tensor field. This algorithm does not require strong texture and can handle sharp radiance changes. A novel method for multi-view stereovision that minimizes the prediction error using a global image-based matching score is presented in [40]. The input views are warped and registered with a user-defined image similarity measure, which can include neighborhood and global intensity information. The surface evolution is implemented in a level set framework [41]. Experiments demonstrated the superiority of this method over the method presented in [37], even for complex non-Lambertian images including specularities and translucency.

Apart from level-set based methods, a second way of minimizing the surface integral is to use graph cuts. Yu et al. [38] propose a new iterative graph-cut based algorithm that operates on the Surface Distance Grid to reduce the minimal surface bias and to transform the discretization bias into a controllable degree of surface smoothness (these biases make it difficult to recover surface extrusions and other details). This algorithm preserves edges and corners better than [37], which results in a lower volume difference. The drawback of this method is that it assumes the initial estimate is already quite close to the final result. In [39], a direct surface reconstruction approach is proposed, which starts from a continuous geometric functional that is minimized, up to a discretization, by a global graph-cut algorithm operating on a 3D embedded graph. The whole procedure is consistently incorporated into a voxel representation that handles both occlusions and discontinuities. In [32], an algorithm for reducing the minimal surface bias associated with volumetric graph cuts for 3D reconstruction from multiple calibrated images is presented. The algorithm is based on an iterative graph cut over narrow bands combined with accurate surface normal estimation. At each iteration, the normal of each surface patch is optimized in order to obtain a precise value for the photometric consistency measure. Then, a volumetric graph cut is applied on a narrow band around the current surface estimate to determine the optimal surface inside this band. The reconstruction results obtained on standard data sets are more accurate and complete than those of [45] and [33] ([47] presents the evolution of [33] and is described below). Additionally, this method requires neither exact silhouette images (unlike [45]) nor a ballooning term (unlike [33]). The octahedral graph structure used in [34] establishes a well-defined relationship between the photo-consistency of a voxel and the edge weights of an embedded octahedral subgraph. This specific graph design supports hierarchical surface extraction, which allows even high volumetric resolutions and a large number of input images to be processed efficiently. This method achieves high resolution in the region of interest, but it relies heavily on the visual hull being a good approximation of the surface.

3.4 Fusion of Reconstruction Techniques

Many recent approaches fuse different reconstruction techniques to achieve better reconstruction results.
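Graph-cut optimization recurs throughout Section 3.3 ([38], [39], [32], [34]) and in several of the fused approaches discussed here ([35], [47], [45], [46]). Those methods build large graphs over voxels, octahedral subgraphs or adaptive meshes, with edge weights derived from photo-consistency. The Python sketch below shows only the underlying primitive on a toy problem: a binary inside/outside labelling obtained as an s-t minimum cut, computed with the Edmonds-Karp max-flow algorithm. The cost layout, the function name and the toy weights are illustrative assumptions, not the graph construction of any cited paper.

```python
from collections import deque


def min_cut_labels(n, unary, pairwise):
    """Binary labelling by s-t minimum cut (Edmonds-Karp max-flow).

    n: number of nodes (e.g. voxels).
    unary: per node, (cost_if_outside, cost_if_inside), both >= 0.
    pairwise: (i, j, w) with w >= 0, paid when i and j get different labels
        (in reconstruction, a photo-consistency-weighted surface element).
    Returns a list of booleans, True = inside (source side of the cut).
    """
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]        # residual capacities

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)                 # make sure the reverse edge exists

    for i, (c_out, c_in) in enumerate(unary):
        add(s, i, c_out)                        # cut when i ends up outside
        add(i, t, c_in)                         # cut when i ends up inside
    for i, j, w in pairwise:
        add(i, j, w)
        add(j, i, w)

    while True:
        parent = {s: None}                      # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                               # no augmenting path: flow is maximal
        flow, v = float("inf"), t
        while parent[v] is not None:            # bottleneck capacity on the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:            # push flow, update residual graph
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Nodes reachable from the source in the final residual graph are "inside".
    return [i in parent for i in range(n)]
```

For example, on a chain of four voxels where the first is strongly inside, the last strongly outside and the middle two only weakly biased, the cut falls between the two middle voxels; raising the pairwise weights makes the labelling smoother, which is also the source of the minimal surface bias that [32] and [38] try to reduce.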
The flexibility of the carving approach is combined with the accuracy of graph-cut optimization in [35]. In this algorithm, a progressive refinement scheme is used to recover the topology and to reason about the visibility of the object. Within each voxel, a detailed surface patch is optimally reconstructed using a graph-cut method. The advantage of this technique is its ability to handle complex shapes, similarly to level sets, while enjoying higher precision. Compared to carving techniques, the produced surface does not suffer from aliasing. This work is extended in [36], where a new surface representation method, called patchwork, is introduced. A patchwork is the combination of several patches that are built one by one. This design potentially allows the reconstruction of an object of arbitrarily large dimensions while preserving a fine level of detail. This algorithm outperforms the level-set method presented in [42] and Space Carving [29]. The use of graph-cut optimization for the volumetric multi-view stereo problem is also introduced in [47]. First, an occlusion-robust photo-consistency metric is defined, which is then approximated by a discrete flow graph. This metric uses a robust voting scheme that treats pixels from occluded cameras as outliers. Graph-cut optimization can exactly compute the minimal surface that encloses the largest possible volume, where surface area is just a surface integral in this photo-consistency field. However, the ballooning term used in this method cannot handle thin structures or large concavities.

Fig. 3. Provided interface for (I) displaying the frames captured by the cameras in real time and (II) calibrating the camera network.

Kolev et al. [44] consider three different energy models for multi-view reconstruction, which are based on a common variational template unifying regional volumetric terms and on-surface photoconsistency. While the first two approaches are based on a classical silhouette-based volume subdivision, the third relies on stereo information to introduce the concept of propagated photoconsistency, thereby addressing some of the shortcomings of classical methodologies. Qualitative and quantitative experiments showed that precise and spatially consistent reconstructions can be computed by minimizing continuous convex functionals. In [45], a graph-cut algorithm recovers the 3D shape of an object using both silhouette and foreground color information. First, a method that is able to deal with silhouette uncertainties arising from background subtraction is used to extract the visual hull of the object. Then, the graph-cut algorithm is used for optimization on a color consistency field. The constraints added to improve its performance are effective at preserving protrusions and at pursuing concavities on the surface of the object. Sinha et al. [46] recover surfaces at high resolution by performing a graph cut on the dual of an adaptive volumetric mesh created by photo-consistency driven subdivision. This method does not require good initialization and is not restricted to a specific surface topology. The specific graph-cut formulation enforces silhouette constraints to counter the bias for minimal surfaces. Local shape refinement via surface deformation is used to recover details in the reconstructed surface.
However, the solutions that incorporate silhouette constraints [45, 46] are only viable when exact silhouettes are available.

4 The 3D Reconstruction Lab of 3DLife

The 3DLife project [54] aims to develop technologies that could make interaction between humans in virtual online environments easier, more reliable and more realistic. Achieving this goal requires integrating recent progress in 3D data acquisition and processing, autonomous avatars, real-time rendering, interaction in virtual worlds and networking. Therefore, scenarios that enable the exploitation of this progress have been developed. A key element in these scenarios is the collection of 3D data in real time from moving people. For this purpose, a 3D reconstruction lab was established.

The lab consists of six CCD cameras (Fig. 4(IV)) mounted on a cylindrical grid. Each camera is connected to a computer, and the computers are connected to a network switch, forming a star network. An additional computer is used as a server, on which the Network Time Protocol is installed. This protocol synchronizes the clocks of the computers with the server, so that all computers share the same time. The server's Graphical User Interface (GUI) encompasses the tools used for monitoring and performing the 3D reconstruction process. In particular, video frames from all cameras are displayed in real time through the interface (Fig. 3(I)). Thus, camera positions and orientations can be readily adjusted to gain better visibility of the scene. The calibration of the camera network is performed via Zhang's algorithm [48], which uses a chessboard pattern as the calibration object. The GUI (Fig. 3(II)) allows the user to define the calibration pattern settings (the number and the size of the squares along the width and height), and calibration can be performed either automatically or manually. The calibration outcome consists of the extrinsic and intrinsic parameters, which are essential to fuse video data across the cameras.

The volumetric intersection approach presented in [49] is the basis for computing the visual hull of the 3D object from multiple video inputs. The process of generating a 3D frame in this framework consists of the following steps:
1. Synchronized multiple video capturing: A set of multiple images is captured simultaneously by the camera network (Fig. 4(I)).
2. Silhouette extraction: A silhouette is extracted per frame using a background subtraction algorithm [51] (Fig. 4(II)).
3. Silhouette volume intersection: A visual cone encasing the 3D object is generated per silhouette, and the cones are intersected with each other to generate the visual hull of the object (voxel representation).
4. Surface shape representation: A marching cubes method [50] is applied to convert the voxel representation into a polygonal representation (the outcome of steps 3 and 4 is visualized in Fig. 4(III)).
5. Texture mapping: Color and texture are projected onto the generated 3D shape.

The reconstruction results for a time frame are depicted in Fig. 5. More specifically, the left image shows the 3D object in a model simulation of the reconstruction lab (the spheres simulate the positions of the cameras), while the right image provides a closer view of the reconstructed object. The main advantages of this method are the following:
- It allows real-time, dynamic, full 3D shape reconstruction.

- It provides good 3D reconstruction results even for poor-quality video, as no texture correspondences are needed.

Thus, this algorithm fits the scope of this lab, which is to collect 3D data in real time from moving people. Its main drawback is that a high level of detail cannot be obtained. Other 3D reconstruction methods (Sections 3.2 and 3.3) could provide better reconstruction fidelity, but the time required for data processing is far from real time. As a consequence, they are not appropriate for this specific application. Ongoing research on improving the 3D reconstruction algorithm includes: (a) the use of a more efficient background subtraction algorithm than [51], one that is more robust to camera sensor noise, ambiguities between object and background colors, and changes in scene lighting (including shadows of the objects of interest), and (b) the fusion of the existing method with novel approaches that will improve the reconstruction fidelity without significantly increasing the processing time.

Fig. 4. (I) Video Capture, (II) Silhouette Extraction, (III) 3D Shape Reconstruction and (IV) 3D Reconstruction Lab Overview.

5 Conclusions

This paper presents a survey of recent algorithms that are able to reconstruct dense and full 3D models. The scope of this survey is broad, since methods that exploit various types of equipment are included. Beyond the equipment-based categorization, a further classification within each category is given, based on the particular techniques the methods use to recover the 3D shape of objects.

Acknowledgments. This work was supported by the 3DLife EU Network of Excellence (NoE) project.

Fig. 5. Reconstruction results for a time frame.
References
1. Kim, T., Seo, Y., Lee, S., Yang, Z., Chang, M.: Simultaneous registration of multiple views with markers. Computer-Aided Design, Elsevier, 41(4), (2009)
2. Akca, D.: Full automatic registration of laser scanner point clouds. Optical 3-D Measurement Techniques VI, 1, (2003)
3. Bienert, A., Maas, H.: Methods for the Automatic Geometric Registration of Terrestrial Laserscanner Point Clouds in Forest Stands. In ISPRS Workshop, (2009)
4. Stamos, I., Liu, L., Chao, C., Wolberg, G., Yu, G., Zokai, S.: Integrating Automated Range Registration with Multiview Geometry for the Photorealistic Modeling of Large-Scale Scenes. IJCV, Springer, 78(2/3), (2008)
5. Chen, C., Stamos, I.: Semi-automatic range to range registration: a feature-based method. In the 5th International Conference on 3-D Digital Imaging and Modeling, (2005)
6. Jiang, J., Cheng, J., Chen, X.: Registration for 3-D point cloud using angular-invariant feature. Neurocomputing, Elsevier, 72, (2009)
7. Smith, E., King, B., Stewart, C., Radke, R.: Registration of Combined Range-Intensity Scans: Initialization through Verification. Computer Vision and Image Understanding, 110, (2008)
8. Seo, J., Sharp, G., Lee, S.: Range data registration using photometric features. In Proc. of CVPR, 2, (2005)
9. Bendels, G., Degener, P., Wahl, R., Kortgen, M., Klein, R.: Image-based registration of 3D-range data using feature surface elements. In Proc. of International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST), (2004)
10. Fischler, M., Bolles, R.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM (Graphics and Image Processing), 24(6), (1981)
11. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60, (2004)

12. Stewart, C., Tsai, C., Roysam, B.: The Dual-Bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Trans. Med. Imag., 22(11), (2003)
13. Johnson, A., Hebert, M.: Surface registration by matching oriented points. In International Conference on Recent Advances in 3-D Imaging and Modelling, (1997)
14. Mian, A., Bennamoun, M., Owens, R.: A Novel Representation and Feature Matching Algorithm for Automatic Pairwise Registration of Range Images. IJCV, Springer, 66(1), 19-40, (2006)
15. Makadia, A., Patterson, A., Daniilidis, K.: Fully automatic registration of 3D point clouds. In CVPR, (2006)
16. Kang, S., Ikeuchi, K.: The complex EGI: A new representation for 3-D pose determination. TPAMI, 15(7), (1993)
17. Kostelec, P., Rockmore, D.: FFTs on the Rotation Group. In Working Paper Series, Santa Fe Institute, (2003)
18. Vanden Wyngaerd, J., Van Gool, L.: Combining texture and shape for automatic crude patch registration. In Proc. Fourth Int. Conf. on 3DIM, (2003)
19. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell., 14(2), (1992)
20. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In Proc. Third Int. Conf. on 3DIM, (2001)
21. Pottmann, H., Huang, Q., Yang, Y., Hu, S.: Geometry and convergence analysis of algorithms for registration of 3D shapes. Int. J. Comp. Vis., 67(3), (2006)
22. Okatani, I., Sugimoto, A.: Registration of Range Images that Preserves Local Surface Structures and Color. In 3DPVT, (2004)
23. Sharp, G., Lee, S., Wehe, D.: Maximum-Likelihood Registration of Range Images with Missing Data. TPAMI, 30(1), (2008)
24. Matusik, W., Buehler, C., Raskar, R., Gortler, S., McMillan, L.: Image-based visual hulls. In SIGGRAPH Proceedings, (2000)
25. Franco, J., Boyer, E.: Fusion of multi-view silhouette cues using a space occupancy grid. In ICCV, 2, (2005)
26. Ishikawa, T., Yamazawa, K., Yokoya, N.: Real-time generation of novel views of a dynamic scene using morphing and visual hull. Proc. ICIP, (2005)
27. Grauman, K., Shakhnarovich, G., Darrell, T.: A Bayesian Approach to Image-Based Visual Hull Reconstruction. In CVPR, (2003)
28. Lai, P., Yilmaz, A.: Shape Recovery Using Rotated Slicing Planes. Int. Congress on Image and Signal Processing, (2009)
29. Kutulakos, K., Seitz, S.: A Theory of Shape by Space Carving. International Journal of Computer Vision, 38(3), (2000)
30. Bhotika, R., Fleet, D., Kutulakos, K.: A Probabilistic Theory of Occupancy and Emptiness. In Proc. ECCV, 3, (2002)
31. Yang, R., Pollefeys, M., Welch, G.: Dealing with textureless regions and specular highlights - A progressive space carving scheme using a novel photo-consistency measure. In ICCV, (2003)
32. Ladikos, A., Benhimane, S., Navab, N.: Multi-View Reconstruction using Narrow-Band Graph-Cuts and Surface Normal Optimization. In BMVC, (2008)
33. Vogiatzis, G., Torr, P., Cipolla, R.: Multi-view stereo via volumetric graph-cuts. In CVPR, (2005)
34. Hornung, A., Kobbelt, L.: Hierarchical volumetric multi-view stereo reconstruction of manifold surfaces based on dual graph embedding. In CVPR, 1, (2006)

35. Zeng, G., Paris, S., Quan, L., Sillion, F.: Progressive surface reconstruction from images using a local prior. In ICCV, (2005)
36. Zeng, G., Paris, S., Quan, L., Sillion, F.: Accurate and scalable surface representation and reconstruction from images. TPAMI, 29(1), (2007)
37. Jin, H., Soatto, S., Yezzi, A.: Multi-view stereo reconstruction of dense shape and complex appearance. International Journal of Computer Vision, 63(3), (2005)
38. Yu, T., Ahuja, N., Chen, W.: SDG Cut: 3D reconstruction of non-Lambertian objects using graph cuts on surface distance grid. In Proc. CVPR, (2006)
39. Paris, S., Sillion, F., Quan, L.: A surface reconstruction method using global graph cut optimization. IJCV, Springer, 66(2), (2006)
40. Pons, J., Keriven, R., Faugeras, O.: Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. IJCV, Springer, 72(2), (2007)
41. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi equations. J. of Comp. Physics, 79, 12-49, (1988)
42. Lhuillier, M., Quan, L.: Surface Reconstruction by Integrating 3D and 2D Data of Multiple Views. In ICCV, (2003)
43. Lhuillier, M., Quan, L.: A Quasi-Dense Approach to Surface Reconstruction from Uncalibrated Images. TPAMI, 27(3), (2005)
44. Kolev, K., Klodt, M., Brox, T., Cremers, D.: Continuous Global Optimization in Multiview 3D Reconstruction. IJCV, Springer, 84, 80-96, (2009)
45. Tran, S., Davis, L.: 3D surface reconstruction using graph cuts with surface constraints. In ECCV, 2, (2006)
46. Sinha, S., Mordohai, P., Pollefeys, M.: Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh. In ICCV, (2007)
47. Vogiatzis, G., Hernandez, C., Torr, P., Cipolla, R.: Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency. TPAMI, 29(12), (2007)
48. Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), (2000)
49. Matsuyama, T., Wu, X., Takai, T., Wada, T.: Real-Time Dynamic 3-D Object Shape Reconstruction and High-Fidelity Texture Mapping for 3-D Video. IEEE Transactions on Circuits and Systems for Video Technology, 14(3), (2004)
50. Lorensen, W., Cline, H.: Marching cubes: A high resolution 3D surface construction algorithm. Computer Graphics, 21(4)
51. Kim, K., Chalidabhongse, T., Harwood, D., Davis, L.: Real-time foreground-background segmentation using codebook model. Real-Time Imaging, 11, (2005)
52. Autodesk 3ds Max
53. VRMesh - For point cloud and triangle mesh processing
54. 3DLife - Towards a VCE for Media Internet