Subspace Procrustes Analysis

Xavier Perez-Sala 1,3,4, Fernando De la Torre 2, Laura Igual 3,5, Sergio Escalera 3,5, and Cecilio Angulo 4

1 Fundació Privada Sant Antoni Abat, Vilanova i la Geltrú, 08800, Spain
2 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
3 Computer Vision Center, Universitat Autònoma de Barcelona, Bellaterra, Spain
4 Universitat Politècnica de Catalunya, Vilanova i la Geltrú, 08800, Spain
5 Universitat de Barcelona, Barcelona, 08007, Spain

Abstract. Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes, PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., with PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address these issues, this paper proposes Subspace PA (SPA). Given several instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and a continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends traditional PA and produces unbiased 2-D models by uniformly sampling different views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, and is more efficient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benefits of our approach.

1 Introduction

In computer vision, Procrustes Analysis (PA) has been used extensively to align shapes (e.g., [19, 4]) and appearance (e.g., [20, 13]) as a pre-processing step to build 2-D models of shape variation. Usually, shape models are learned from a discrete set of 2-D landmarks through a two-step process [8]. First, the rigid transformations are removed by aligning the training set w.r.t. the mean using PA; next, the remaining deformations are modeled using Principal Component Analysis (PCA) [18, 5]. PA has been widely employed despite suffering from several limitations: (1) The 2-D training samples do not necessarily cover a uniform sampling of all 3-D rigid transformations of an object, and this can result in a biased model (i.e., some poses are better represented than others). (2) It is computationally expensive to learn a shape model by sampling all possible 3-D rigid transformations of an object.

Fig. 1. Illustration of Continuous Subspace Procrustes Analysis (CSPA), which builds an unbiased 2-D model of human joint variation (b) by integrating over all possible viewpoints of 3-D motion capture data (a). This 2-D body shape model is used to reconstruct 2-D shapes from different viewpoints (c). Our CSPA model generalizes across poses and camera views because it is learned from a 3-D model.

(3) The models that are learned using only 2-D landmarks cannot model missing landmarks due to large pose changes. Moreover, PA methods can lead to local minima problems if there are missing components in the training data. (4) Finally, PA is computationally expensive: it scales linearly with the number of samples and landmarks, and quadratically with the dimension of the data.

To address these issues, this paper proposes a discrete and a continuous formulation of Subspace Procrustes Analysis (SPA). SPA is able to efficiently compute the non-rigid subspace of possible 2-D projections given several 3-D samples of a deformable object. Note that our proposed work is the inverse problem of Non-Rigid Structure From Motion (NRSFM) [22, 21, 3]. The goal of NRSFM is to recover 3-D shape models from 2-D tracked landmarks, while SPA builds unbiased 2-D models from 3-D data. The learned 2-D model has the same representational power as a 3-D model but leads to faster fitting algorithms [15]. SPA uniformly samples the space of possible 3-D rigid transformations, and it is extremely efficient in space and time. The main idea of SPA is to combine functional data analysis with subspace estimation techniques.

Fig. 1 illustrates the main idea of this work. In Fig. 1 (a), we represent many samples of 3-D motion capture data of humans performing several activities. SPA simultaneously aligns all 3-D sample projections, while computing a 2-D subspace (Fig. 1 (b)) that can represent all possible projections of the 3-D motion capture samples under different camera views. Hence, SPA provides a simple, efficient and effective method to learn a 2-D subspace that accounts for the non-rigid and 3-D geometric deformation of 3-D objects. These 2-D subspace models can be used for detection (i.e., to constrain body parts, see Fig. 1 (c)), because the subspace models all 3-D rigid projections and non-rigid deformations. As we will show in the experimental validation, the models learned by SPA generalize better than existing PA approaches across viewpoints (because they are built using 3-D models) and preserve expressive non-rigid deformations. Moreover, computing SPA is extremely efficient in space and time.

2 Procrustes Analysis Revisited

This section describes three different formulations of PA within a unified and enlightening matrix formulation.

Procrustes Analysis (PA): Given a set of m centered shapes (see footnote for notation¹) composed of l landmarks, D_i ∈ R^{d×l}, i = 1, ..., m, PA [6, 9, 8, 10, 2] computes the d-dimensional reference shape M ∈ R^{d×l} and the m transformations T_i ∈ R^{d×d} (e.g., affine, Euclidean) that minimize the reference-space model [10, 8, 2] (see Fig. 2 (a)):

    E_R(M, T) = \sum_{i=1}^{m} || T_i D_i - M ||_F^2,    (1)

where T = [T_1^T, ..., T_m^T]^T ∈ R^{dm×d}. In the case of two-dimensional shapes (d = 2),

    D_i = [ x_1  x_2  ...  x_l ;
            y_1  y_2  ...  y_l ].

Alternatively, PA can be optimized using the data-space model [2] (see Fig. 2 (b)):

    E_D(M, A) = \sum_{i=1}^{m} || D_i - A_i M ||_F^2,    (2)

where A = [A_1^T, ..., A_m^T]^T ∈ R^{dm×d}, and A_i = T_i^{-1} ∈ R^{d×d} is the inverse transformation of T_i and corresponds to the rigid transformation of the reference shape M.

¹ Bold capital letters denote a matrix X, bold lower-case letters a column vector x. x_i represents the i-th column of the matrix X, and x_ij denotes the scalar in the i-th row and j-th column of X. All non-bold letters represent scalars. I_n ∈ R^{n×n} is an identity matrix. ||x||_2 = sqrt(\sum_i x_i^2) and ||X||_F = sqrt(\sum_{ij} x_ij^2) denote the 2-norm of a vector and the Frobenius norm of a matrix, respectively. X ⊗ Y is the Kronecker product of matrices, and X^{(p)} is the vec-transpose operator, detailed in Appendix A.

The error function of the reference-space model, Eq. (1), minimizes the difference between the reference shape and the registered shape data. In the data-space model, the error function of Eq. (2) compares the observed shape points with the transformed reference shape, i.e., the shape points predicted by the model, based on the notion of average shape [23]. This difference between the two models leads to different properties. Since the reference-space cost (E_R, Eq. (1)) is a sum of squares and is convex in the optimization parameters, it can be optimized globally with Alternated Least Squares (ALS) methods. On the other hand, the data-space cost (E_D, Eq. (2)) is a bilinear problem and non-convex. If there is no missing data, the data-space model can be solved using the Singular Value Decomposition (SVD).
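The complete-data case of the data-space model in Eq. (2) can be made concrete in a few lines of numpy: stacking the m shapes vertically turns the cost into a rank-d factorization of the data matrix, which a truncated SVD solves globally. This is a minimal sketch under our own naming, not the authors' implementation.

```python
import numpy as np

def data_space_pa(shapes):
    """Data-space Procrustes model of Eq. (2), complete-data case.

    shapes : list of m centered (d, l) landmark matrices D_i.
    Returns the reference shape M (d, l) and the transformations A_i (d, d),
    defined only up to an invertible d x d gauge (see the note below).
    """
    d, l = shapes[0].shape
    D = np.vstack(shapes)                          # stacked data matrix, (d*m, l)
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    A = U[:, :d] * s[:d]                           # stacked transformations, (d*m, d)
    M = Vt[:d]                                     # reference shape, (d, l)
    return M, [A[i * d:(i + 1) * d] for i in range(len(shapes))]
```

Since ||D - A M||_F is unchanged by A -> A G^{-1}, M -> G M for any invertible G ∈ R^{d×d}, the factors are only determined up to that gauge, which is one way to see the gauge invariance of the data-space model discussed below.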

Fig. 2. (a): Reference-space model. (b): Data-space model. Note that A_i = T_i^{-1}.

A major advantage of the data-space model is that it is gauge invariant (i.e., the cost does not depend on the coordinate frame in which the reference shape and the transformations are expressed) [2]. The benefits of both models are combined in [2]. Recently, Pizarro et al. [19] have proposed a convex approach to PA based on the reference-space model. In their case, the cost function is expressed with a quaternion parametrization, which allows conversion to a Sum of Squares Program (SOSP). Finally, the equivalent semi-definite program of a SOSP relaxation is solved using convex optimization.

PA has also been applied to learn appearance models invariant to geometric transformations. When PA is applied to shapes, the geometric transformation (e.g., T_i or A_i) can be applied directly to the image coordinates. However, to align appearance features the geometric transformations have to be composed with the image coordinates, and the process is more involved. This is the main difference when applying PA to align appearance rather than shape. Frey and Jojic [7] proposed a method for learning a factor analysis model that is invariant to geometric transformations. The computational cost of this method grows polynomially with the number of possible spatial transformations, and it can be computationally intensive when working with high-dimensional motion models. To improve upon that, De la Torre and Black [20] proposed parameterized component analysis, a method that learns a subspace of appearance invariant to affine transformations. Learned-Miller proposed the congealing method [13], which uses an entropy measure to align images with respect to the distribution of the data. Kokkinos and Yuille [12] proposed a probabilistic framework and extended previous approaches to deal with articulated objects using a Markov Random Field (MRF) on top of Active Appearance Models (AAMs).

Projected Procrustes Analysis (PPA): Due to advances in 3-D capture systems, it is nowadays common to have access to 3-D shape models for a variety of objects.

Given n 3-D shapes D_i ∈ R^{3×l}, we can compute r projections P_j ∈ R^{2×3} for each of them (after removing translation) and minimize the PPA error:

    E_PPA(M, A_ij) = \sum_{i=1}^{n} \sum_{j=1}^{r} || P_j D_i - A_ij M ||_F^2,    (3)

where P_j is an orthographic projection of a 3-D rotation R(ω) in a given domain Ω, defined by the rotation angles ω = {φ, θ, ψ}. Note that, while data and reference shapes are d-dimensional in Eq. (1) and Eq. (2), the data D_i and the reference shape M ∈ R^{2×l} in Eq. (3) are fixed to be 3-D and 2-D, respectively. Hence, A_ij ∈ R^{2×2} is a 2-D transformation mapping M to the 2-D projection of the 3-D data. ALS is a common method to minimize Eqs. (2) and (3). ALS alternates between minimizing over A_ij and M with the following expressions:

    A_ij = P_j D_i M^T (M M^T)^{-1}   ∀ i, j,    (4)

    M = ( \sum_{i=1}^{n} \sum_{j=1}^{r} A_ij^T A_ij )^{-1} ( \sum_{i=1}^{n} \sum_{j=1}^{r} A_ij^T P_j D_i ).    (5)

Note that PPA and its extensions deal with missing data naturally. Since they use the whole 3-D shape of the objects, the enhanced 2-D dataset resulting from projecting the data from different viewpoints can be constructed without occluded landmarks.
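A direct numpy transcription of the ALS updates in Eqs. (4)-(5) is sketched below; the initialization and the fixed iteration count are our choices, and convergence checks are omitted for brevity.

```python
import numpy as np

def ppa_als(shapes3d, projections, n_iters=50):
    """Projected Procrustes Analysis (Eq. 3) by ALS, following Eqs. (4)-(5).

    shapes3d    : list of n centered 3-D shapes D_i, each (3, l).
    projections : list of r orthographic projections P_j, each (2, 3).
    Returns the 2-D reference shape M (2, l).
    """
    # Initialize M with the first shape seen under the first viewpoint.
    M = projections[0] @ shapes3d[0]
    l = M.shape[1]
    for _ in range(n_iters):
        MMt_inv = np.linalg.inv(M @ M.T)
        AtA = np.zeros((2, 2))
        AtPD = np.zeros((2, l))
        for D in shapes3d:
            for P in projections:
                A = P @ D @ M.T @ MMt_inv      # Eq. (4)
                AtA += A.T @ A
                AtPD += A.T @ (P @ D)
        M = np.linalg.solve(AtA, AtPD)         # Eq. (5)
    return M
```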

Continuous Procrustes Analysis (CPA): A major limitation of PPA is the difficulty of generating uniform distributions in the Special Orthogonal group SO(3) [17]. Due to the topology of SO(3), different angles should be sampled following different distributions, which becomes difficult when the rotation matrices must be confined to a specific region Ω of SO(3), restricted by the rotation angles ω = {φ, θ, ψ}. Moreover, the computational complexity of PPA increases linearly with the number of projections (r) and of 3-D objects (n). In order to deal with these drawbacks, a continuous formulation (CPA) was proposed in [10] by posing PPA within a functional analysis framework. CPA minimizes:

    E_CPA(M, A(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i - A(ω)_i M ||_F^2 dω,    (6)

where dω = (1 / 8π^2) sin(θ) dφ dθ dψ ensures uniformity in SO(3) [17]. This continuous formulation finds the optimal 2-D reference shape of a 3-D dataset, rotated and projected in a given domain Ω, by integrating over all possible rotations in that domain. The main difference between Eq. (3) and Eq. (6) is that the entries of P(ω) ∈ R^{2×3} and A(ω)_i ∈ R^{2×2} are no longer scalars, but functions of the integration angles ω = {φ, θ, ψ}. After some linear algebra and functional analysis, it is possible to find an expression equivalent to the discrete approach (Eq. (3)), where A(ω)_i and M have the following forms:

    A(ω)_i = P(ω) D_i M^T (M M^T)^{-1}   ∀ i,    (7)

    M = ( \sum_{i=1}^{n} ∫_Ω A(ω)_i^T A(ω)_i dω )^{-1} ( \sum_{i=1}^{n} ( ∫_Ω A(ω)_i^T P(ω) dω ) D_i ).    (8)

It is important to notice that the 2-D projections are not explicitly computed in the continuous formulation. The solution for M is found using fixed-point iteration in Eq. (6):

    M = ( Z M^T (M M^T)^{-1} )^{-1} Z,    (9)

where X = ∫_Ω P(ω)^T P(ω) dω ∈ R^{3×3} averages the rotation covariances and Z = (M M^T)^{-1} M ( \sum_{i=1}^{n} (D_i^T ⊗ D_i^T) vec(X) )^{(l)}. Note that the definite integral X is not data dependent, and it can be computed off-line.

Our work builds on [10] but extends it in several ways. First, CPA only computes the reference shape of the dataset. In this paper, we add a subspace that is able to model non-rigid deformations of the object, as well as rigid 3-D transformations that the affine transformation cannot model. As we will describe later, adding a subspace to the PA formulation is not a trivial task. For instance, modeling a subspace following the standard methodology based on CPA would still require generating r rotations for each 3-D sample. Hence, the efficiency of CPA is limited to rigid models, while our approach is not. Second, we provide a discrete and a continuous formulation in order to provide a better understanding of the problem, and we experimentally show that they converge to the same solution as the number of sampled rotations (r) increases. Finally, we evaluate the models on two challenging problems: pose estimation in still images and joints modeling.
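To make the off-line integral concrete, the sketch below approximates X = ∫_Ω P(ω)^T P(ω) dω by a crude Riemann sum over Euler angles with the sin(θ)/(8π²) weight of Eq. (6). The z-y-z Euler convention, the choice of P(ω) as the first two rows of R(ω), and the restriction θ ∈ [0, π] (so that sin(θ) ≥ 0) are our assumptions; the paper only states that the integral can be computed off-line.

```python
import numpy as np

def euler_rotation(phi, theta, psi):
    """R(omega) as a z-y-z Euler rotation (an assumed convention; the paper
    only names the angles omega = {phi, theta, psi})."""
    c, s = np.cos, np.sin
    Rz1 = np.array([[c(phi), -s(phi), 0.], [s(phi), c(phi), 0.], [0., 0., 1.]])
    Ry = np.array([[c(theta), 0., s(theta)], [0., 1., 0.], [-s(theta), 0., c(theta)]])
    Rz2 = np.array([[c(psi), -s(psi), 0.], [s(psi), c(psi), 0.], [0., 0., 1.]])
    return Rz1 @ Ry @ Rz2

def integral_X(phi_rng, theta_rng, psi_rng, n=40):
    """Riemann-sum approximation of X = int_Omega P(w)^T P(w) dw (Eq. 9),
    with P(w) the first two rows of R(w) and dw = sin(theta) dphi dtheta dpsi / (8 pi^2)."""
    dphi = (phi_rng[1] - phi_rng[0]) / n
    dtheta = (theta_rng[1] - theta_rng[0]) / n
    dpsi = (psi_rng[1] - psi_rng[0]) / n
    X = np.zeros((3, 3))
    for phi in np.linspace(*phi_rng, n, endpoint=False):
        for theta in np.linspace(*theta_rng, n, endpoint=False):
            for psi in np.linspace(*psi_rng, n, endpoint=False):
                P = euler_rotation(phi, theta, psi)[:2, :]
                X += np.sin(theta) * (P.T @ P) * dphi * dtheta * dpsi
    return X / (8.0 * np.pi ** 2)
```

As a sanity check, integrating over the whole group (φ, ψ ∈ [0, 2π], θ ∈ [0, π]) should give X ≈ (2/3) I_3, since the uniform measure makes R(ω) isotropic.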

3 Subspace Procrustes Analysis (SPA)

This section proposes Discrete Subspace Procrustes Analysis (DSPA) and Continuous Subspace Procrustes Analysis (CSPA) to learn unbiased 2-D models from 3-D deformable objects.

Discrete Subspace Procrustes Analysis (DSPA): Given a set of r viewpoints P_j ∈ R^{2×3} of the n 3-D shapes, and writing d_i = vec(D_i) ∈ R^{3l×1}, DSPA extends PA by considering a subspace B ∈ R^{2l×k} and weights c_ij ∈ R^{k×1}, which model the non-rigid deformations that the mean M and the transformations A_ij are not able to reconstruct. DSPA minimizes the following function:

    E_DSPA(M, A_ij, B, c_ij) = \sum_{i=1}^{n} \sum_{j=1}^{r} || P_j D_i - A_ij M - (c_ij^T ⊗ I_2) B^{(2)} ||_F^2    (10)
                             = \sum_{i=1}^{n} \sum_{j=1}^{r} || (I_l ⊗ P_j) d_i - (I_l ⊗ A_ij) μ - B c_ij ||_2^2,    (11)

where P_j is a particular 3-D rotation, R(ω), projected into 2-D using an orthographic projection, μ = vec(M) ∈ R^{2l×1} is the vectorized version of the reference shape, c_ij are the k weights of the subspace for each 2-D shape projection, and B^{(2)} ∈ R^{2k×l} is the reshaped subspace.² Observe that the only difference with Eq. (3) is that we have added a subspace. This subspace compensates for the non-rigid components of the 3-D object and for the rigid component (3-D rotation and projection onto the image plane) that the affine transformation cannot model. Recall that a 3-D rigid object under orthographic projection can be recovered with a three-dimensional subspace (if the mean is removed), but PA cannot recover it because it is only rank two. Also, observe that the coefficient c_ij depends on two indexes, i for the object and j for the geometric projection. The dependency of c_ij on the geometric projection is a key point: if the index j were not considered, the subspace would not be able to capture the variations in pose and its usefulness for our purposes would be unclear. Although Eq. (10) and the NRSFM problem follow a similar formulation [3], the assumptions are different and the variables have opposite meanings. For instance, the NRSFM assumptions about rigid transformations do not apply here, since the A_ij are affine transformations in our case.

² See Appendix A for an explanation of the vec-transpose operator.

Given an initialization of B = 0, DSPA is minimized by finding the transformations A_ij and reference shape M that minimize Eq. (3), using the same ALS framework as in PA. Then, we substitute A_ij and M in Eq. (11), which results in the expression:

    E_DSPA(B, c_ij) = \sum_{i=1}^{n} \sum_{j=1}^{r} || D̃_ij - (c_ij^T ⊗ I_2) B^{(2)} ||_F^2    (12)
                    = \sum_{i=1}^{n} \sum_{j=1}^{r} || d̃_ij - B c_ij ||_2^2 = || D̃ - B C ||_F^2,    (13)

where D̃_ij = P_j D_i - A_ij M ∈ R^{2×l}, d̃_ij = vec(D̃_ij) ∈ R^{2l×1}, D̃ = [d̃_11, ..., d̃_nr] ∈ R^{2l×nr}, and C ∈ R^{k×nr}. We can find the global optimum of Eq. (13) by Singular Value Decomposition (SVD): B = U and C = S V^T, where D̃ = U S V^T.
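In code, the DSPA subspace step of Eqs. (12)-(13) amounts to a truncated SVD of the stacked, vectorized residuals produced by the PPA/ALS alignment. A minimal numpy sketch with our own naming, assuming the residuals D̃_ij have already been computed:

```python
import numpy as np

def dspa_subspace(residuals, k):
    """DSPA subspace step (Eqs. 12-13).

    residuals : list of n*r aligned residuals D~_ij = P_j D_i - A_ij M, each (2, l).
    k         : number of basis vectors to keep.
    Returns the subspace B (2l, k) and the coefficients C (k, n*r).
    """
    Dt = np.column_stack([R.ravel(order='F') for R in residuals])  # vec() each residual, (2l, n*r)
    U, s, Vt = np.linalg.svd(Dt, full_matrices=False)
    B = U[:, :k]                       # B = U, truncated to k columns
    C = s[:k, None] * Vt[:k]           # C = S V^T, per-sample weights c_ij
    return B, C
```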

Continuous Subspace Procrustes Analysis (CSPA): As discussed in the previous section, the discrete formulation is efficient neither in space nor in time, and it may suffer from a non-uniform sampling of the original space. CSPA generalizes DSPA by re-writing it in a continuous formulation. CSPA minimizes the following functional:

    E_CSPA(M, A(ω)_i, B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i - A(ω)_i M - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω    (14)
                                 = \sum_{i=1}^{n} ∫_Ω || (I_l ⊗ P(ω)) d_i - (I_l ⊗ A(ω)_i) μ - B c(ω)_i ||_2^2 dω,    (15)

where dω = (1 / 8π^2) sin(θ) dφ dθ dψ. The main difference between Eq. (15) and Eq. (11) is that the entries of c(ω)_i ∈ R^{k×1}, P(ω) ∈ R^{2×3} and A(ω)_i ∈ R^{2×2} are no longer scalars, but functions of the integration angles ω = {φ, θ, ψ}.

Given an initialization of B = 0, and similarly to the DSPA model, CSPA is minimized by finding the optimal reference shape M that minimizes Eq. (6). We use the same fixed-point framework as CPA. Given the value of M and the expression of A(ω)_i from Eq. (7), we substitute them into Eq. (15), resulting in:

    E_CSPA(B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D̃_i - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω    (16)
                      = \sum_{i=1}^{n} ∫_Ω || (I_l ⊗ P(ω)) d̃_i - B c(ω)_i ||_2^2 dω,    (17)

where D̃_i = D_i (I_l - M^T (M M^T)^{-1} M) and d̃_i = vec(D̃_i). We can find the global optimum of Eq. (17) by solving the eigenvalue problem Σ B = B Λ, where Λ contains the eigenvalues corresponding to the columns of B. After some algebra (see Appendix B), we show that the covariance matrix is Σ = ( (I_l ⊗ Y) vec( \sum_{i=1}^{n} d̃_i d̃_i^T ) )^{(2l)}, where the definite integral Y = ∫_Ω P(ω) ⊗ (I_l ⊗ P(ω)) dω ∈ R^{4l×9l} can be computed off-line, leading to an optimization that is efficient in space and time. Though the number of elements in the matrix Y increases quadratically with the number of landmarks l, the integration time is constant, since Y has a sparse structure with only 36 distinct non-zero values (recall that P(ω) ∈ R^{2×3}).

Although A(ω)_i and c(ω)_i are not explicitly computed during training, this is not a limitation compared to DSPA. At test time, the training values of c(ω)_i are not needed; only the deformation limits in each principal direction of B are required. These limits also depend on the eigenvalues [4], which are computed with CSPA.
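The optimization can also be prototyped directly from the definition of Σ (Eq. (25) in Appendix B): accumulate the projected residual covariance over a quadrature grid of rotations and keep the leading eigenvectors. The sketch below does exactly that, trading the paper's efficient rearrangement through Y for readability; the grid of (P(ω), weight) pairs can be generated as in the earlier sketch for X, and the interface and names are ours.

```python
import numpy as np

def cspa_subspace(shapes3d, M, proj_grid, k=12):
    """Naive numerical CSPA (Eqs. 16-17): eigenvectors of the covariance Sigma.

    shapes3d  : list of n centered 3-D shapes D_i, each (3, l).
    M         : (2, l) reference shape obtained with CPA.
    proj_grid : iterable of (P, w) pairs, P a (2, 3) orthographic projection
                P(omega) and w its quadrature weight.
    k         : number of basis vectors to keep.
    Returns the subspace B (2l, k) and its eigenvalues.
    """
    l = M.shape[1]
    H = M.T @ np.linalg.solve(M @ M.T, M)                 # M^T (M M^T)^{-1} M, (l, l)
    d_tilde = [np.ravel(D @ (np.eye(l) - H), order='F')   # d~_i = vec(D_i (I_l - H)), (3l,)
               for D in shapes3d]
    S = sum(np.outer(d, d) for d in d_tilde)              # sum_i d~_i d~_i^T, (3l, 3l)

    Sigma = np.zeros((2 * l, 2 * l))
    for P, w in proj_grid:
        G = np.kron(np.eye(l), P)                         # (I_l kron P(omega)), (2l, 3l)
        Sigma += w * (G @ S @ G.T)                        # integrand of Eq. (25)

    evals, evecs = np.linalg.eigh(Sigma)                  # Sigma B = B Lambda
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order], evals[order]
```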

4 Experiments & Results

This section illustrates the benefits of DSPA and CSPA and compares them with state-of-the-art PA methods for building 2-D shape models of human skeletons. First, we compare the performance of PA+PCA and SPA for building a 2-D shape model of Motion Capture (MoCap) bodies using the Carnegie Mellon University MoCap dataset [1]. Next, we compare our discrete and continuous approaches in a large-scale experiment. Finally, we illustrate the generalization of our 2-D body model on the problem of human pose estimation using the Leeds Sport Dataset [11]. For all experiments, we report the Mean Squared Error (MSE) relative to the torso size.

4.1 Learning 2-D Joints Models

The aim of this experiment is to build a generic 2-D body model that can reconstruct non-rigid deformations under a large range of 3-D rotations. For training and testing, we used the Carnegie Mellon University MoCap dataset, which is composed of 2605 sequences performed by 109 subjects. The sequences cover a wide variety of daily human activities and sports. Skeletons with 31 joints are provided, as well as RGB video recordings for several sequences. We trained our models using a set of 14 landmarks, as is common across several databases for human pose estimation.

Fig. 3. Comparisons as a function of the number of training viewpoint projections. (a) Rigid and (b) deformable models (using a subspace of 9 bases) from Experiment 1, respectively; (c) CSPA and DSPA deformable models (using a subspace of 12 bases) from Experiment 2.

Experiment 1: Comparison with State-of-the-Art PA Methods. This section compares the DSPA and CSPA methods with the state-of-the-art Stratified Generalized Procrustes Analysis (SGPA) [2].³ For training, we randomly selected 3 sequences with 30 frames per sequence from the set of 11 running sequences of subject number 9 (this is due to the memory limitations of SGPA). For testing, we randomly selected 2 sequences with 30 frames from the same set. We rotated the 3-D models in the yaw and pitch angles, within the ranges φ, θ ∈ [-π/2, π/2]. The angles were uniformly selected, and we report results varying the number of considered angles (i.e., rotations) between 1 and 100 in training, with a fixed set of 300 angles for testing. There are several versions of SGPA; we selected Affine-factorization with the data-space model to make a fair comparison with our method. Recall that, under our assumption of non-missing data, Affine-All and Affine-factorization achieve the same solution, with Affine-factorization being faster.

³ The code was downloaded from the author's website (http://isit.u-clermont1.fr/~ab).

Fig. 3 shows the mean reconstruction error and 0.5 of the standard deviation over 100 realizations. Fig. 3 (a) reports the results comparing PA, CPA and SGPA. As expected, PA and SGPA converge to CPA as the number of training rotations increases. However, observe that CPA achieves the same performance while being much more efficient. Fig. 3 (b) compares DSPA, CSPA, and SGPA followed by PCA (we will refer to this method as SGPA+PCA).

From the figure, we can observe that the mean test error for DSPA and SGPA+PCA decreases with the number of rotations in training and converges to CSPA; CSPA provides a lower bound on the error. Observe that we used 90 3-D bodies (3 sequences with 30 frames) within the rotation angles φ, θ ∈ [-π/2, π/2], and DSPA and SGPA+PCA needed about 30 angles to achieve results similar to CSPA. So, in this case, the discrete methods need 30 times more space than the continuous one. The execution times with 30 rotations, on a 2.2 GHz computer with 8 GB of RAM, were 1.44 s (DSPA), 0.03 s (CSPA) and 3.54 s (SGPA+PCA).

Fig. 4. Experiment 2 results with 1 (top) and 30 (bottom) rotations. Examples show skeleton reconstructions from the continuous (CSPA, solid red lines) and discrete (DSPA, dashed blue lines) models over the ground truth (solid black lines).

Experiment 2: Comparison between CSPA and DSPA. This experiment compares DSPA and CSPA in a large-scale problem as a function of the number of rotations. For training, we randomly selected 200 sequences with 30 frames per sequence. For testing, we randomly selected 5 sequences with 30 frames. We rotated the 3-D models in the yaw and pitch angles, within the ranges φ, θ ∈ [-π/2, π/2]. The angles were uniformly selected, and we report results varying the number of angles (i.e., rotations) between 1 and 100 in training, with 300 angles for testing. Fig. 3 (c) shows the mean reconstruction error and 0.5 of the standard deviation over 100 realizations, comparing DSPA and CSPA. As expected, DSPA converges to CSPA as the number of training rotations increases. However, observe that CSPA achieves the same performance while being much more efficient. In this experiment, with 6000 3-D training bodies (200 sequences with 30 frames) and the domain φ, θ ∈ [-π/2, π/2], the discrete method again required around 30 2-D viewpoint projections to achieve results similar to CSPA. Thus, the discrete model DSPA needs 30 times more storage space than CSPA. The execution times with 30 rotations, on a 2.2 GHz computer with 8 GB of RAM, were 14.75 s (DSPA) and 0.04 s (CSPA).

Qualitative results from the CSPA and DSPA models trained with different numbers of rotations are shown in Fig. 4. Note that training the DSPA model with 1 rotation (top) results in poor reconstructions, while training it with 30 rotations (bottom) leads to reconstructions almost as accurate as those of CSPA.

4.2 Experiment 3. Leeds Sport Dataset

This section illustrates how the 2-D body models learned with CSPA can be used to detect body configurations in images. We used the Leeds Sport Dataset (LSP), which contains 2000 images of people performing different sports, some of them with extreme poses or viewpoints (e.g., parkour images). The first 1000 images of the dataset are considered for training and the second set of 1000 images for testing. One skeleton, manually labeled with 14 joints, is provided for each training and test image. We trained our 2-D CSPA model on the CMU MoCap dataset [1] using 1000 frames: from the 2605 sequences of the motion capture data, we randomly selected 1000, and the frame in the middle of each sequence was selected as its representative frame. Using this training data, we built the 2-D CSPA model using the following ranges for the pitch, roll and yaw angles: φ, θ, ψ ∈ [-3π/4, 3π/4]. We will refer to this model as CSPA-MoCap. For comparison, we used the 1000 2-D training skeletons provided by the LSP dataset and ran SGPA+PCA to build an alternative 2-D model. We will refer to this model as SGPA+PCA-LSP. Observe that this model was trained on data similar to the test set.

Table 1. Experiment 3 results. MSE of our continuous model (CSPA-MoCap) trained with 3-D MoCap data, the discrete model trained on the LSP dataset (SGPA+PCA-LSP), and both rigid models (CPA-MoCap, SGPA-LSP).

    Model | CPA-MoCap | SGPA-LSP | CSPA-MoCap | SGPA+PCA-LSP
    MSE   | 0.16405   | 0.1631   | 0.01046    | 0.01366

Table 1 reports the MSE of reconstructing the test skeletons with the rigid and deformable models. A subspace of 12 bases is used for both deformable models. The results show that CSPA-MoCap has a lower reconstruction error than the standard method SGPA+PCA-LSP, even though it was trained on a different dataset (CMU MoCap) than the test set. Qualitative results from the CSPA-MoCap and SGPA+PCA-LSP models are shown in Fig. 5. Note that CSPA-MoCap provides more accurate reconstructions than SGPA+PCA-LSP because it is able to generalize to all possible 3-D rotations in the given interval.

Fig. 5. Experiment 3 examples, reconstructing ground-truth skeletons of the LSP dataset with the CSPA-MoCap (solid red lines) and SGPA+PCA-LSP (dashed blue lines) models.

5 Conclusions

This paper proposes an extension of PA to learn a 2-D subspace of rigid and non-rigid deformations of 3-D objects. We propose two models: a discrete one (DSPA) that samples the 3-D rotation space, and a continuous one (CSPA) that integrates over SO(3). As the number of projections increases, DSPA converges to CSPA. SPA has two advantages over traditional PA and PPA: (1) it generates unbiased models because it uniformly covers the space of projections, and (2) CSPA is much more efficient in space and time. Experiments comparing 2-D SPA models of bodies show improvements w.r.t. state-of-the-art PA methods. In particular, CSPA models trained with motion capture data outperformed, on the LSP database, 2-D models trained on that same database under the same conditions, showing how our 2-D models built from 3-D data can generalize better to different viewpoints. In future work, we plan to explore models that decouple the rigid and non-rigid deformations by providing two independent bases in the subspace.

6 Acknowledgments

This work is partly supported by the Spanish Ministry of Science and Innovation (projects TIN2012-38416-C03-01, TIN2012-38187-C03-01, TIN2013-43478-P), project 2014 SGR 119, and the Comissionat per a Universitats i Recerca del Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya.

A Appendix. Vec-transpose

The vec-transpose A^{(p)} is a linear operator that generalizes the vectorization and transposition operators [14, 16]. It reshapes a matrix A ∈ R^{m×n} by vectorizing each i-th block of p rows and rearranging it as the i-th column of the reshaped matrix, such that A^{(p)} ∈ R^{pn×(m/p)}. For example, for a 6×3 matrix:

    [ a_11 a_12 a_13 ] (3)   [ a_11 a_41 ]
    [ a_21 a_22 a_23 ]       [ a_21 a_51 ]
    [ a_31 a_32 a_33 ]   =   [ a_31 a_61 ]
    [ a_41 a_42 a_43 ]       [ a_12 a_42 ]
    [ a_51 a_52 a_53 ]       [ a_22 a_52 ]
    [ a_61 a_62 a_63 ]       [ a_32 a_62 ]
                             [ a_13 a_43 ]
                             [ a_23 a_53 ]
                             [ a_33 a_63 ]

    [ a_11 a_12 a_13 ] (2)   [ a_11 a_31 a_51 ]
    [ a_21 a_22 a_23 ]       [ a_21 a_41 a_61 ]
    [ a_31 a_32 a_33 ]   =   [ a_12 a_32 a_52 ]
    [ a_41 a_42 a_43 ]       [ a_22 a_42 a_62 ]
    [ a_51 a_52 a_53 ]       [ a_13 a_33 a_53 ]
    [ a_61 a_62 a_63 ]       [ a_23 a_43 a_63 ]

Note that (A^{(p)})^{(p)} = A and A^{(m)} = vec(A). A useful rule for pulling a matrix out of nested Kronecker products is

    ((B A)^{(p)} C)^{(p)} = (C^T ⊗ I_p) B A = (B^{(p)} C)^{(p)} A,

which leads to (C^T ⊗ I_2) B = (B^{(2)} C)^{(2)}.
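In numpy, the operator reduces to slicing and reshaping; the helper below (our naming) follows the definition above and checks the two stated properties on a 6×3 example.

```python
import numpy as np

def vec_transpose(A, p):
    """Vec-transpose A^(p): vectorize each block of p rows of A and place it
    as a column, so an (m, n) input becomes a (p*n, m/p) output."""
    m, n = A.shape
    assert m % p == 0, "p must divide the number of rows"
    blocks = [A[i * p:(i + 1) * p, :] for i in range(m // p)]
    return np.column_stack([b.ravel(order='F') for b in blocks])  # vec() stacks columns

# Sanity checks against the properties stated above.
A = np.arange(1, 19).reshape(6, 3)                                  # a 6x3 matrix
assert np.array_equal(vec_transpose(vec_transpose(A, 2), 2), A)     # (A^(p))^(p) = A
assert np.array_equal(vec_transpose(A, 6), A.ravel(order='F')[:, None])  # A^(m) = vec(A)
```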

B Appendix. CSPA formulation

In this appendix we detail the steps from Eq. (14) to Eq. (17), as well as the definition of the covariance matrix introduced in Section 3. Given the value of M and the optimal expression of A(ω)_i from Eq. (7), we substitute them into Eq. (14), resulting in:

    E_CSPA(B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i - P(ω) D_i H - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω,    (18)

where H = M^T (M M^T)^{-1} M and D_i ∈ R^{3×l}. Then,

    E_CSPA(B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i (I_l - H) - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω    (19)

leads us to Eq. (16) and Eq. (17), where D̃_i = D_i (I_l - H) and d̃_i = vec(D̃_i). From Eq. (17), solving ∂E_CSPA / ∂c(ω)_i = 0 we find:

    c(ω)_i = (B^T B)^{-1} B^T (I_l ⊗ P(ω)) d̃_i.    (20)

Substituting c(ω)_i into Eq. (17) results in:

    E_CSPA(B) = \sum_{i=1}^{n} ∫_Ω || (I_l ⊗ P(ω)) d̃_i - B (B^T B)^{-1} B^T (I_l ⊗ P(ω)) d̃_i ||_2^2 dω    (21)
              = \sum_{i=1}^{n} ∫_Ω || (I_{2l} - B (B^T B)^{-1} B^T) (I_l ⊗ P(ω)) d̃_i ||_2^2 dω    (22)
              = ∫_Ω tr[ (I_{2l} - B (B^T B)^{-1} B^T) \sum_{i=1}^{n} (I_l ⊗ P(ω)) d̃_i d̃_i^T (I_l ⊗ P(ω))^T ] dω    (23)
              = tr[ (I_{2l} - B (B^T B)^{-1} B^T) Σ ],    (24)

where:

    Σ = ∫_Ω ( \sum_{i=1}^{n} (I_l ⊗ P(ω)) d̃_i d̃_i^T (I_l ⊗ P(ω))^T ) dω.    (25)

We can find the global optimum of Eq. (24) by solving the eigenvalue problem Σ B = B Λ, where Σ is the covariance matrix and Λ contains the eigenvalues corresponding to the columns of B. However, the definite integral in Σ is data dependent. To be able to compute the integral off-line, we need to rearrange the elements of Σ. Using vectorization and the vec-transpose operator (see Appendix A):

    Σ = ( vec[Σ] )^{(2l)}    (26)
      = ( vec[ ∫_Ω (I_l ⊗ P(ω)) ( \sum_{i=1}^{n} d̃_i d̃_i^T ) (I_l ⊗ P(ω))^T dω ] )^{(2l)}    (27)
      = ( ( ∫_Ω (I_l ⊗ P(ω)) ⊗ (I_l ⊗ P(ω)) dω ) vec[ \sum_{i=1}^{n} d̃_i d̃_i^T ] )^{(2l)},    (28)

which finally leads to:

    Σ = ( (I_l ⊗ Y) vec[ \sum_{i=1}^{n} d̃_i d̃_i^T ] )^{(2l)},    (29)

where the definite integral Y = ∫_Ω P(ω) ⊗ (I_l ⊗ P(ω)) dω ∈ R^{4l×9l} can be computed off-line.
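The rearrangement in Eqs. (26)-(29) relies on the identity vec(A X A^T) = (A ⊗ A) vec(X) together with the associativity of the Kronecker product. Because the integral is linear, the identity can be checked numerically for a single projection matrix; the short script below (our construction, with arbitrary random data) does that.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 5, 4                                      # small sizes for the check
P = rng.normal(size=(2, 3))                      # stands in for one P(omega)
d = rng.normal(size=(3 * l, n))                  # columns play the role of d~_i
S = d @ d.T                                      # sum_i d~_i d~_i^T, (3l, 3l)

# Direct form: the integrand of Eq. (25).
G = np.kron(np.eye(l), P)                        # (I_l kron P), (2l, 3l)
Sigma_direct = G @ S @ G.T

# Rearranged form: Eq. (29), with Y = P kron I_l kron P.
Y = np.kron(P, np.kron(np.eye(l), P))            # (4l, 9l)
v = np.kron(np.eye(l), Y) @ S.ravel(order='F')   # (I_l kron Y) vec(S)
Sigma_rearranged = v.reshape(2 * l, 2 * l, order='F')  # vec-transpose with p = 2l
assert np.allclose(Sigma_direct, Sigma_rearranged)
```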

References

1. Carnegie Mellon University Motion Capture Database. http://mocap.cs.cmu.edu
2. Bartoli, A., Pizarro, D., Loog, M.: Stratified generalized Procrustes analysis. IJCV 101(2), 227-253 (2013)
3. Brand, M.: Morphable 3D models from video. In: CVPR, vol. 2, pp. II-456. IEEE (2001)
4. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. PAMI 23(6), 681-685 (2001)
5. De la Torre, F.: A least-squares framework for component analysis. PAMI 34(6), 1041-1055 (2012)
6. Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis, vol. 4. John Wiley & Sons, New York (1998)
7. Frey, B.J., Jojic, N.: Transformation-invariant clustering using the EM algorithm. PAMI 25(1), 1-17 (2003)
8. Goodall, C.: Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society, Series B (Methodological), pp. 285-339 (1991)
9. Gower, J.C., Dijksterhuis, G.B.: Procrustes Problems, vol. 3. Oxford University Press, Oxford (2004)
10. Igual, L., Perez-Sala, X., Escalera, S., Angulo, C., De la Torre, F.: Continuous generalized Procrustes analysis. PR 47(2), 659-671 (2014)
11. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British Machine Vision Conference (2010), doi:10.5244/C.24.12
12. Kokkinos, I., Yuille, A.: Unsupervised learning of object deformation models. In: ICCV, pp. 1-8. IEEE (2007)
13. Learned-Miller, E.G.: Data driven image models through continuous joint alignment. PAMI 28(2), 236-250 (2006)
14. Marimont, D.H., Wandell, B.A.: Linear models of surface and illuminant spectra. JOSA A 9(11), 1905-1913 (1992)
15. Matthews, I., Xiao, J., Baker, S.: 2D vs. 3D deformable face models: Representational power, construction, and real-time fitting. IJCV 75(1), 93-113 (2007)
16. Minka, T.P.: Old and new matrix algebra useful for statistics. http://research.microsoft.com/en-us/um/people/minka/papers/matrix/ (2000)
17. Naimark, M.A.: Linear Representations of the Lorentz Group (translated from Russian). Macmillan, New York (1964)
18. Pearson, K.: On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2(11), 559-572 (1901)
19. Pizarro, D., Bartoli, A.: Global optimization for optimal generalized Procrustes analysis. In: CVPR, pp. 2409-2415. IEEE (2011)
20. De la Torre, F., Black, M.J.: Robust parameterized component analysis: theory and applications to 2D facial appearance models. CVIU 91(1-2), 53-71 (2003)
21. Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. PAMI 30(5), 878-892 (2008)
22. Xiao, J., Chai, J., Kanade, T.: A closed-form solution to non-rigid shape and motion recovery. IJCV 67(2), 233-246 (2006)
23. Yezzi, A.J., Soatto, S.: Deformotion: Deforming motion, shape average and the joint registration and approximation of structures in images. IJCV 53(2), 153-167 (2003)