Semantic 3D Reconstruction: joint reconstruction, recognition and segmentation

Marc Pollefeys, joint work with Christian Haene, Nikolay Savinov, Lubor Ladicky, Christopher Zach, Andrea Cohen

Outline
  Some recent 3D reconstruction projects
    Autonomous cars & micro-aerial vehicles
    Mobile 3D modeling (Google Tango, mobile phones)
  Semantic 3D reconstruction
    Joint reconstruction, recognition and segmentation
    High-order ray potentials
    Joint classifier
    Modeling objects

3D mapping for autonomous driving (Lee et al. CVPR13; Heng et al. ICRA14; Haene et al. IROS15): dense real-time temporal fisheye stereo (GPU); omnidirectional visual simultaneous localization and mapping; cameras as main sensors: no lasers, no GPS

Fully autonomous vision-based navigation and mapping (Fraundorfer et al. IROS12, best paper finalist): full on-board processing; 2+1 cameras + IMU; indoor and outdoor operation; obstacle avoidance, mapping and exploration; no laser, no GPS, no network. Our lab is also the home of PixHawk/PX4; more info at: http://pixhawk.ethz.ch

Research partner of Google's Project Tango (Camposeco ICRA15, Schoeps et al. 3DV2015, Lynen RSS15...)

3D selfies! (Tanskanen et al. ICCV13; Kolev CVPR14)

Outline
  Highlights of some ongoing projects
    Autonomous cars & micro-aerial vehicles
    Mobile 3D modeling (Google Tango, mobile phones)
  Semantic 3D reconstruction
    Joint reconstruction, recognition and segmentation
    High-order ray potentials
    Joint classifier
    Modeling objects

Classical 3D reconstruction from images
  Compute relative pose between images (structure-from-motion)
  Estimate per-pixel depth (multi-view stereo matching)
  Compute surface reconstruction (graph energy minimization)
  $E(x) = \sum_{s \in \Omega} \left( \rho_s x_s + \phi(\nabla x_s) \right)$
  $\rho_s$: unary depth evidence term; $x_s$: optimal solution; $\phi$: isotropic shape prior (spherical)
  [Figure labels: free-space, line-of-sight model, occupied-space]
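
To make the energy above concrete, here is a minimal sketch on a one-dimensional chain of voxels, for example samples along a single viewing ray. It assumes one binary occupancy label per voxel, a unary cost `rho[s]` for labeling voxel s occupied, and a constant penalty `smooth` per free/occupied transition as a stand-in for the isotropic prior. The function name and the toy costs are illustrative only. Dynamic programming is used because the chain case can be solved exactly that way; it is not the graph-based optimization the slide refers to for full volumes.

```python
def minimize_chain_energy(rho, smooth):
    """Minimize E(x) = sum_s rho[s]*x[s] + smooth * sum_s |x[s+1] - x[s]|, x[s] in {0, 1}.

    rho[s] < 0 means the evidence favors 'occupied' (x = 1) at voxel s.
    Returns (optimal energy, optimal labeling).
    """
    n = len(rho)
    # best[l] = minimal energy of labeling voxels 0..s when voxel s takes label l
    best = [0.0, rho[0]]
    back = []  # back[s-1][l] = best label of voxel s-1 when voxel s takes label l
    for s in range(1, n):
        new_best, choice = [], []
        for label in (0, 1):
            candidates = [best[prev] + smooth * abs(label - prev) for prev in (0, 1)]
            prev_best = 0 if candidates[0] <= candidates[1] else 1
            new_best.append(candidates[prev_best] + rho[s] * label)
            choice.append(prev_best)
        best = new_best
        back.append(choice)
    # Backtrack from the cheaper final label to recover the labeling.
    label = 0 if best[0] <= best[1] else 1
    energy = best[label]
    labels = [label]
    for choice in reversed(back):
        label = choice[label]
        labels.append(label)
    labels.reverse()
    return energy, labels


# Toy evidence: free space near the camera, occupied space further away.
rho = [2.0, 1.5, 0.2, -1.0, -2.0, -1.5]
energy, labels = minimize_chain_energy(rho, smooth=0.5)
print(energy, labels)
```

With these toy costs the minimizer places a single free-to-occupied transition where the evidence changes sign, which is the behavior the smoothness term is meant to encourage.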

Semantic 3D reconstruction (legend: Building, Ground, Vegetation, Clutter): joint 3D reconstruction and class segmentation; obtain a separate surface for each class of object; corresponds to a multi-label volumetric segmentation problem

Discrete and Continuous Formulations
  Discrete domain: smoothness measured by transitions along edges; formulation: linear program [Schlesinger 1976]; arbitrary smoothness cost allowed
  Continuous domain: smoothness measured by (anisotropic) boundary length; formulation: convex program [Chambolle, Cremers, Pock 2008]; smoothness needs to form a metric

Convex, Continuous Multi-Label Formulation [Zach, Häne, Pollefeys, CVPR 2012, TPAMI 2014]: a metric smoothness cost fulfills the triangle inequality; truncated quadratic smoothness is non-metric. Our continuously inspired formulation takes the best from both worlds: non-metric and anisotropic boundary length cost
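
The metric versus non-metric distinction can be checked numerically. The sketch below searches for a violation of the triangle inequality in a label-to-label smoothness cost. The helper names, the label range and the truncation thresholds are assumptions chosen for illustration; they are not taken from the cited papers.

```python
from itertools import product


def violates_triangle_inequality(cost, labels):
    """Return the first triple (i, j, k) with cost(i, k) > cost(i, j) + cost(j, k), else None."""
    for i, j, k in product(labels, repeat=3):
        if cost(i, k) > cost(i, j) + cost(j, k) + 1e-12:
            return (i, j, k)
    return None


def truncated_linear(i, j, tau=2):
    # min(|i - j|, tau): a metric on the label set
    return min(abs(i - j), tau)


def truncated_quadratic(i, j, tau=4):
    # min((i - j)^2, tau): symmetric and zero on the diagonal, but not a metric
    return min((i - j) ** 2, tau)


labels = range(4)
linear_violation = violates_triangle_inequality(truncated_linear, labels)
quadratic_violation = violates_triangle_inequality(truncated_quadratic, labels)
print(linear_violation, quadratic_violation)
```

For the truncated quadratic, going from label 0 to label 2 directly costs 4, while going through label 1 costs 1 + 1 = 2, so the direct transition is more expensive than the detour. A formulation restricted to metric costs cannot represent this.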

Cost for boundary: Energy Formulation

Wulff shapes: "The shape of an equilibrium crystal is obtained, according to the Gibbs thermodynamic principle, by minimizing the total surface free energy associated to the crystal-medium interface." (http://www.scholarpedia.org/) Wulff's theorem: the minimum surface energy for a given volume of a polyhedron is achieved if the distances of its faces from one given point are proportional to their surface tensions. Proposed use for anisotropic regularization [Esedoglu and Osher 2004, Zach et al. 2009, Haene et al. 2013/14/15]
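
In anisotropic regularization, the cost assigned to a boundary with normal n is commonly written as the support function of a convex Wulff shape W, that is, the maximum of the inner product of p and n over all p in W. The sketch below evaluates this for a convex polygon in 2D, where the maximum is attained at a vertex. The box-shaped Wulff shape and all names are hypothetical and chosen only to show the direction dependence.

```python
import math


def support(vertices, normal):
    """Support function of a convex polygon: max over its vertices of <p, normal>."""
    return max(px * normal[0] + py * normal[1] for px, py in vertices)


# Hypothetical Wulff shape: a box that is wide in x and flat in y.
# Its faces lie at distance 2.0 (x direction) and 0.5 (y direction) from the origin.
wulff_box = [(-2.0, -0.5), (2.0, -0.5), (2.0, 0.5), (-2.0, 0.5)]

cost_x_facing = support(wulff_box, (1.0, 0.0))  # boundary whose normal points along x
cost_y_facing = support(wulff_box, (0.0, 1.0))  # boundary whose normal points along y
diagonal = (math.sqrt(0.5), math.sqrt(0.5))
cost_diagonal = support(wulff_box, diagonal)    # boundary with a diagonal normal
print(cost_x_facing, cost_y_facing, cost_diagonal)
```

The face distances of the box reappear as the costs of the corresponding normal directions, which mirrors the proportionality stated in Wulff's theorem: here boundaries with a normal along y are four times cheaper than boundaries with a normal along x.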

Dense Semantic 3D Reconstruction [Häne, Zach, Cohen, Angst, Pollefeys, CVPR 2013] Pipeline: Input Images; Semantic Classifier; Sparse Reconstruction; Dense Matching; outputs: Class Likelihoods, Depth Maps

Dense Semantic 3D Reconstruction [Häne, Zach, Cohen, Angst, Pollefeys, CVPR 2013] Pipeline (continued): Class Likelihoods and Depth Maps; Joint Fusion, Convex Optimization; Dense Semantic 3D Model

Formulation: Labeling of a Voxel Space [Häne, Zach, Cohen, Angst, Pollefeys, CVPR 2013] Data term: described as per-voxel unary potentials. Regularization term: class-specific, direction-dependent surface area penalization, learned from training data
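
A discrete toy version of such a labeling energy is sketched below: per-voxel unary costs plus transition costs that depend on the pair of classes and on the direction of the step. The 2D grid, the label names, the cost values and the function name are all assumptions for illustration. The formulation in the cited paper is continuous and convex, which this sketch does not attempt to reproduce.

```python
def labeling_energy(labels, unary, transition_cost):
    """Energy of a 2D label grid (rows ordered top to bottom, columns left to right).

    unary[z][x][label]                 : per-voxel data cost
    transition_cost[(a, b, direction)] : cost of stepping from label a to label b
                                         along 'right' or 'down'; missing entries
                                         (including a == b) cost 0.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for z in range(height):
        for x in range(width):
            a = labels[z][x]
            energy += unary[z][x][a]
            if x + 1 < width:
                energy += transition_cost.get((a, labels[z][x + 1], "right"), 0.0)
            if z + 1 < height:
                energy += transition_cost.get((a, labels[z + 1][x], "down"), 0.0)
    return energy


# Hypothetical class- and direction-specific costs: air above ground is cheap,
# ground above air is implausible and therefore expensive.
transition_cost = {
    ("air", "ground", "down"): 0.2,
    ("air", "building", "down"): 0.5,
    ("building", "ground", "down"): 0.1,
    ("building", "air", "right"): 1.0,
    ("ground", "air", "down"): 5.0,
}
zero_unary = [[{"air": 0.0, "building": 0.0, "ground": 0.0} for _ in range(2)]
              for _ in range(3)]

plausible = [["air", "air"], ["building", "air"], ["ground", "ground"]]
upside_down = [["ground", "ground"], ["building", "air"], ["air", "air"]]
energy_plausible = labeling_energy(plausible, zero_unary, transition_cost)
energy_upside_down = labeling_energy(upside_down, zero_unary, transition_cost)
print(energy_plausible, energy_upside_down)
```

With identical data costs, the labeling that stacks the classes in a physically plausible order has the lower energy, purely because the transition costs are direction dependent.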

Class-specific training/selection of [figure: grid of class pairs; labels on one axis: building, ground, vegetation, stuff, air; on the other axis: building, ground, vegetation, stuff]

Joint 3D reconstruction and class segmentation (Haene et al CVPR13)

Weakly Observed Structures I Buildings standing on the ground

Weakly Observed Structures II Building separated from vegetation

Unobserved Surfaces: labels can be separated

More Results: single reconstruction and segmentation per dataset (panels: Input Image, Dense 3D Reconstruction, Semantic 3D Reconstruction)

Higher-order ray potentials to model visibility (Savinov et al, CVPR15/CVPR16): volumetric formulation with ray potentials and a pairwise regularizer; ray cost based on the first occupied voxel along the ray [figure labels: freespace, depth, label]

Cost based on the first occupied voxel along the ray (Savinov et al, CVPR15/CVPR16): $\psi_r(x^r) = \phi_r(3, \text{red})$ with $K_r = 3$; no dependency on the voxels behind the first occupied one
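
The defining property of this ray potential, that only the first occupied voxel matters, can be sketched in a few lines. The label names, the `FREE` marker, the cost for an all-free ray and the particular `example_phi` are assumptions made for illustration. Only the structure (cost as a function of the position and label of the first occupied voxel) follows the slide.

```python
FREE = "free"


def ray_potential(ray_labels, phi, cost_all_free=0.0):
    """Cost of one ray, determined by the first occupied voxel along it.

    ray_labels: labels of the voxels the ray traverses, ordered from the camera.
    phi(index, label): cost of the ray ending at 1-based position `index` with `label`.
    """
    for index, label in enumerate(ray_labels, start=1):
        if label != FREE:
            return phi(index, label)
    return cost_all_free


def example_phi(index, label, observed_depth=3, observed_label="red"):
    # Hypothetical cost: distance to the observed depth plus a penalty for a wrong class.
    return abs(index - observed_depth) + (0.0 if label == observed_label else 1.0)


ray_a = [FREE, FREE, "red", "green", FREE]
ray_b = [FREE, FREE, "red", FREE, "blue"]   # differs from ray_a only behind voxel 3
ray_c = [FREE, "green", "red", FREE, FREE]  # first occupied voxel is now voxel 2

cost_a = ray_potential(ray_a, example_phi)
cost_b = ray_potential(ray_b, example_phi)
cost_c = ray_potential(ray_c, example_phi)
print(cost_a, cost_b, cost_c)
```

Rays `ray_a` and `ray_b` receive the same cost because they agree up to and including the first occupied voxel at position 3, while `ray_c` is charged for terminating one voxel too early with the wrong label.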

Visibility Consistency Constraint (Savinov et al, CVPR16) (non-convex)

Results (Savinov et al, CVPR15/16) [figure labels: minimize Δ, inference, generative]

2-label V-Constraint: SOTA on Middlebury. V-Constraint: nicely handles thin objects

Joint classification of depth and class?

Learning joint classifier for depth and class? (Ladicky, Shi and Pollefeys CVPR 2014) Key idea: avoid learning class appearance at different scales. How well do we estimate depth? The proposed approach is both efficient and unbiased towards specific depths

Can our approach also handle objects? Extend approach from Stuff to Things: introduce a location-dependent anisotropic smoothness prior

Learning location-dependent anisotropic smoothness prior (Haene et al CVPR 2014): download training data from Google 3D Warehouse; at every voxel in the 3D bounding box, estimate the distribution of observed shape normals; determine the convex Wulff shape which best represents the observed statistics
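
The statistics-gathering step can be sketched as a histogram of normal directions at one voxel. Everything below is an assumption for illustration: the normals are 2D rather than 3D, the directions are quantized into eight sectors, pseudo-counts smooth the histogram, and a negative log frequency is shown as one plausible way to turn frequencies into direction-dependent costs. Fitting a convex Wulff shape to these statistics, as described on the slide, is not attempted here.

```python
import math
from collections import Counter


def direction_bin(normal, bins=8):
    """Quantize a 2D normal into one of `bins` angular sectors centered on multiples of 2*pi/bins."""
    angle = math.atan2(normal[1], normal[0]) % (2.0 * math.pi)
    shifted = angle + math.pi / bins  # half-sector shift so axis directions fall mid-sector
    return int(shifted / (2.0 * math.pi) * bins) % bins


def normal_statistics(normals, bins=8, pseudo_count=1.0):
    """Smoothed relative frequency of each direction sector among the observed normals."""
    counts = Counter(direction_bin(n, bins) for n in normals)
    total = len(normals) + pseudo_count * bins
    return [(counts.get(b, 0) + pseudo_count) / total for b in range(bins)]


def direction_costs(probabilities):
    """Hypothetical conversion: frequently observed directions become cheap."""
    return [-math.log(p) for p in probabilities]


# Hypothetical normals observed at one voxel across aligned training models:
# mostly pointing up, as one might expect on a roof.
observed = [(0.0, 1.0)] * 6 + [(0.1, 1.0)] * 2 + [(1.0, 0.0)]
probabilities = normal_statistics(observed)
costs = direction_costs(probabilities)
print(probabilities, costs)
```

Sector 2 (normals pointing up) collects most of the observations and therefore gets the lowest cost, so a surface with that orientation is preferred at this location.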

Cars

Car reconstructions

Advantages of multi-class segmentation: learn separate statistics for car-air and car-ground transition likelihoods
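
Collecting separate transition statistics per class pair amounts to counting which label changes occur between neighboring voxels in labeled training data. The sketch below does this for vertical neighbors only. The column representation, the label names and the toy data are hypothetical.

```python
from collections import Counter


def transition_counts(columns):
    """Count (lower_label, upper_label) pairs between vertically adjacent voxels that differ.

    Each column lists voxel labels from bottom to top.
    """
    counts = Counter()
    for column in columns:
        for lower, upper in zip(column, column[1:]):
            if lower != upper:
                counts[(lower, upper)] += 1
    return counts


# Toy training columns, bottom to top.
training_columns = [
    ["ground", "car", "car", "air"],
    ["ground", "air", "air", "air"],
    ["ground", "car", "air", "air"],
]
counts = transition_counts(training_columns)
print(counts)
```

In this toy data a car is seen above ground and air above a car, but never ground above a car. Keeping the pairs separate lets each transition receive its own likelihood instead of sharing one generic surface prior.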

Semantic multi-class 3D head reconstruction (Maninchedda et al.): align the prior to the data by minimizing the smoothness term with respect to position [figure labels: parametric shape model, implicit semantic shape model]

Segment-based 3D object shape priors (Karimi, Haene, Pollefeys, CVPR15): represent non-convex object shape priors as a combination of convex part priors; build the prior by performing an (approximate) convex decomposition of an example (and observing which transitions occur)

Result: Tables

Result: Trees

Result: Mug

Multi-Body Depth-Map Fusion with Non-Intersection Constraints (Jacquet et al ECCV14): laptop with and without self-intersection

Detect and regularize for symmetries (Speciale et al.) A preference for symmetry can be introduced by adding non-local regularization terms
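
One simple form such a non-local term could take is a penalty on the difference between each voxel and its mirror image. The sketch below assumes a 2D occupancy grid and a known, axis-aligned symmetry axis through the grid center; detecting the symmetry, as mentioned on the slide, is outside its scope. The function name and the weight are illustrative.

```python
def symmetry_penalty(occupancy, weight=1.0):
    """Weighted sum of |x[r][c] - x[r][mirror(c)]| for a mirror about the vertical center line.

    The term is non-local: it couples voxels that may be far apart in the grid.
    """
    penalty = 0.0
    for row in occupancy:
        width = len(row)
        for c in range(width // 2):
            penalty += abs(row[c] - row[width - 1 - c])
    return weight * penalty


symmetric = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
asymmetric = [
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
penalty_symmetric = symmetry_penalty(symmetric)
penalty_asymmetric = symmetry_penalty(asymmetric)
print(penalty_symmetric, penalty_asymmetric)
```

Adding this penalty to a reconstruction energy expresses a preference rather than a hard constraint: an asymmetric solution remains possible when the data term outweighs the penalty.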

Conclusion
  Volumetric multi-view approach that performs joint reconstruction, recognition and segmentation
  Strong coupling between geometry and appearance via a class-dependent anisotropic smoothness term
  Clean energy formulation (with tight convex relaxation)
  Significant qualitative improvement with respect to the state of the art (the regularized solutions make more sense)
  Semantic reconstruction is more useful

Challenges and future research
  Scale to many classes: leverage sparsity of semantic interactions, class hierarchies
  Scale to large volumes: adaptive space discretization and basis representation
  Dynamic scenes: spatio-temporal interactions, extensions to 4D volumes and Wulff shapes
  Exploration and robotics: enable real-time navigation and exploration, predict information gain of perception actions

Thank you for your attention! Questions? Collaborators: Christian Haene, Nikolay Savinov, Lubor Ladicky, Christopher Zach, Andrea Cohen