Subspace Procrustes Analysis

Xavier Perez-Sala 1,3,4, Fernando De la Torre 2, Laura Igual 3,5, Sergio Escalera 3,5, and Cecilio Angulo 4

1 Fundació Privada Sant Antoni Abat, Vilanova i la Geltrú, 08800, Spain
2 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
3 Computer Vision Center, Universitat Autònoma de Barcelona, Bellaterra, Spain
4 Universitat Politècnica de Catalunya, Vilanova i la Geltrú, 08800, Spain
5 Universitat de Barcelona, Barcelona, 08007, Spain

Abstract. Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes, PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., with PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address these issues, this paper proposes Subspace PA (SPA). Given several instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and a continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends traditional PA and produces unbiased 2-D models by uniformly sampling different views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, and is more efficient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benefits of our approach.

1 Introduction

In computer vision, Procrustes Analysis (PA) has been used extensively to align shapes (e.g., [19, 4]) and appearance (e.g., [20, 13]) as a pre-processing step to build 2-D models of shape variation. Usually, shape models are learned from a discrete set of 2-D landmarks through a two-step process [8]. First, the rigid transformations are removed by aligning the training set w.r.t. the mean using PA; next, the remaining deformations are modeled using Principal Component Analysis (PCA) [18, 5]. PA has been widely employed despite suffering from several limitations: (1) The 2-D training samples do not necessarily cover a uniform sampling of all 3-D rigid transformations of an object, and this can result in a biased model (i.e., some poses are better represented than others). (2) It is computationally expensive to learn a shape model by sampling all possible 3-D rigid transformations of an object.

Fig. 1. Illustration of Continuous Subspace Procrustes Analysis (CSPA), which builds an unbiased 2-D model of human joint variation (b) by integrating over all possible viewpoints of 3-D motion capture data (a). This 2-D body shape model is used to reconstruct 2-D shapes from different viewpoints (c). Our CSPA model generalizes across poses and camera views because it is learned from a 3-D model.

(3) The models that are learned using only 2-D landmarks cannot model missing landmarks due to large pose changes. Moreover, PA methods can lead to local minima problems if there are missing components in the training data. (4) Finally, PA is computationally expensive: it scales linearly with the number of samples and landmarks, and quadratically with the dimension of the data.

To address these issues, this paper proposes a discrete and a continuous formulation of Subspace Procrustes Analysis (SPA). SPA is able to efficiently compute the non-rigid subspace of possible 2-D projections given several 3-D samples of a deformable object. Note that our proposed work is the inverse problem of Non-Rigid Structure From Motion (NRSFM) [22, 21, 3]. The goal of NRSFM is to recover 3-D shape models from 2-D tracked landmarks, while SPA builds unbiased 2-D models from 3-D data. The learned 2-D model has the same representational power as a 3-D model but leads to faster fitting algorithms [15]. SPA uniformly samples the space of possible 3-D rigid transformations, and it is extremely efficient in space and time. The main idea of SPA is to combine functional data analysis with subspace estimation techniques.

Fig. 1 illustrates the main idea of this work. In Fig. 1 (a), we represent many samples of 3-D motion capture data of humans performing several activities. SPA simultaneously aligns all 3-D sample projections, while computing a 2-D subspace (Fig. 1 (b)) that can represent all possible projections of the 3-D motion capture samples under different camera views. Hence, SPA provides a simple, efficient and effective method to learn a 2-D subspace that accounts for the non-rigid and 3-D geometric deformation of 3-D objects. These 2-D subspace models can be used for detection (i.e., to constrain body parts, see Fig. 1 (c)), because the subspace models all 3-D rigid projections and non-rigid deformations. As we will show in the experimental validation, the models learned by SPA generalize better than existing PA approaches across viewpoints (because they are built using 3-D models) and preserve expressive non-rigid deformations. Moreover, computing SPA is extremely efficient in space and time.

2 Procrustes Analysis Revisited

This section describes three different formulations of PA within a unified and enlightening matrix formulation.

Procrustes Analysis (PA): Given a set of m centered shapes (see footnote for notation¹) composed of l landmarks, D_i ∈ R^{d×l}, i = 1, ..., m, PA [6, 9, 8, 10, 2] computes the d-dimensional reference shape M ∈ R^{d×l} and the m transformations T_i ∈ R^{d×d} (e.g., affine, Euclidean) that minimize the reference-space model [10, 8, 2] (see Fig. 2 (a)):

    E_R(M, T) = \sum_{i=1}^{m} || T_i D_i - M ||_F^2,    (1)

where T = [T_1^T, ..., T_m^T]^T ∈ R^{dm×d}. In the case of two-dimensional shapes (d = 2),

    D_i = [ x_1  x_2  ...  x_l ;
            y_1  y_2  ...  y_l ].

Alternatively, PA can be optimized using the data-space model [2] (see Fig. 2 (b)):

    E_D(M, A) = \sum_{i=1}^{m} || D_i - A_i M ||_F^2,    (2)

where A = [A_1^T, ..., A_m^T]^T ∈ R^{dm×d}, and A_i = T_i^{-1} ∈ R^{d×d} is the inverse transformation of T_i and corresponds to the rigid transformation of the reference shape M.

¹ Bold capital letters denote a matrix X, bold lower-case letters a column vector x. x_i represents the i-th column of the matrix X, and x_ij denotes the scalar in the i-th row and j-th column of X. All non-bold letters represent scalars. I_n ∈ R^{n×n} is an identity matrix. ||x||_2 = sqrt(\sum_i x_i^2) and ||X||_F = sqrt(\sum_{ij} x_ij^2) denote the 2-norm of a vector and the Frobenius norm of a matrix, respectively. X ⊗ Y is the Kronecker product of matrices, and X^{(p)} is the vec-transpose operator, detailed in Appendix A.

The error function of the reference-space model, Eq. (1), minimizes the difference between the reference shape and the registered shape data. In the data-space model, the error function of Eq. (2) compares the observed shape points with the transformed reference shape, i.e., the shape points predicted by the model, based on the notion of average shape [23]. This difference between the two models leads to different properties. Since the reference-space cost (E_R, Eq. (1)) is a sum of squares and is convex in the optimization parameters, it can be optimized globally with Alternated Least Squares (ALS) methods. On the other hand, the data-space cost (E_D, Eq. (2)) is a bilinear problem and non-convex. If there is no missing data, the data-space model can be solved using the Singular Value Decomposition (SVD).
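The complete-data case of the data-space model in Eq. (2) can be made concrete in a few lines of numpy: stacking the m shapes vertically turns the cost into a rank-d factorization of the data matrix, which a truncated SVD solves globally. This is a minimal sketch under our own naming, not the authors' implementation.

```python
import numpy as np

def data_space_pa(shapes):
    """Data-space Procrustes model of Eq. (2), complete-data case.

    shapes : list of m centered (d, l) landmark matrices D_i.
    Returns the reference shape M (d, l) and the transformations A_i (d, d),
    defined only up to an invertible d x d gauge (see the note below).
    """
    d, l = shapes[0].shape
    D = np.vstack(shapes)                          # stacked data matrix, (d*m, l)
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    A = U[:, :d] * s[:d]                           # stacked transformations, (d*m, d)
    M = Vt[:d]                                     # reference shape, (d, l)
    return M, [A[i * d:(i + 1) * d] for i in range(len(shapes))]
```

Since ||D - A M||_F is unchanged by A -> A G^{-1}, M -> G M for any invertible G ∈ R^{d×d}, the factors are only determined up to that gauge, which is one way to see the gauge invariance of the data-space model discussed below.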

Fig. 2. (a): Reference-space model. (b): Data-space model. Note that A_i = T_i^{-1}.

A major advantage of the data-space model is that it is gauge invariant (i.e., the cost does not depend on the coordinate frame in which the reference shape and the transformations are expressed) [2]. The benefits of both models are combined in [2]. Recently, Pizarro et al. [19] have proposed a convex approach to PA based on the reference-space model. In their case, the cost function is expressed with a quaternion parametrization, which allows conversion to a Sum of Squares Program (SOSP). Finally, the equivalent semi-definite program of a SOSP relaxation is solved using convex optimization.

PA has also been applied to learn appearance models invariant to geometric transformations. When PA is applied to shapes, the geometric transformation (e.g., T_i or A_i) can be applied directly to the image coordinates. However, to align appearance features the geometric transformations have to be composed with the image coordinates, and the process is more involved. This is the main difference when applying PA to align appearance rather than shape. Frey and Jojic [7] proposed a method for learning a factor analysis model that is invariant to geometric transformations. The computational cost of this method grows polynomially with the number of possible spatial transformations, and it can be computationally intensive when working with high-dimensional motion models. To improve upon that, De la Torre and Black [20] proposed parameterized component analysis, a method that learns a subspace of appearance invariant to affine transformations. Learned-Miller proposed the congealing method [13], which uses an entropy measure to align images with respect to the distribution of the data. Kokkinos and Yuille [12] proposed a probabilistic framework and extended previous approaches to deal with articulated objects using a Markov Random Field (MRF) on top of Active Appearance Models (AAMs).

Projected Procrustes Analysis (PPA): Due to advances in 3-D capture systems, it is nowadays common to have access to 3-D shape models for a variety of objects.

Given n 3-D shapes D_i ∈ R^{3×l}, we can compute r projections P_j ∈ R^{2×3} for each of them (after removing translation) and minimize the PPA error:

    E_PPA(M, A_ij) = \sum_{i=1}^{n} \sum_{j=1}^{r} || P_j D_i - A_ij M ||_F^2,    (3)

where P_j is an orthographic projection of a 3-D rotation R(ω) in a given domain Ω, defined by the rotation angles ω = {φ, θ, ψ}. Note that, while data and reference shapes are d-dimensional in Eq. (1) and Eq. (2), the data D_i and the reference shape M ∈ R^{2×l} in Eq. (3) are fixed to be 3-D and 2-D, respectively. Hence, A_ij ∈ R^{2×2} is a 2-D transformation mapping M to the 2-D projection of the 3-D data. ALS is a common method to minimize Eqs. (2) and (3). ALS alternates between minimizing over A_ij and M with the following expressions:

    A_ij = P_j D_i M^T (M M^T)^{-1}   ∀ i, j,    (4)

    M = ( \sum_{i=1}^{n} \sum_{j=1}^{r} A_ij^T A_ij )^{-1} ( \sum_{i=1}^{n} \sum_{j=1}^{r} A_ij^T P_j D_i ).    (5)

Note that PPA and its extensions deal with missing data naturally. Since they use the whole 3-D shape of the objects, the enhanced 2-D dataset resulting from projecting the data from different viewpoints can be constructed without occluded landmarks.
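A direct numpy transcription of the ALS updates in Eqs. (4)-(5) is sketched below; the initialization and the fixed iteration count are our choices, and convergence checks are omitted for brevity.

```python
import numpy as np

def ppa_als(shapes3d, projections, n_iters=50):
    """Projected Procrustes Analysis (Eq. 3) by ALS, following Eqs. (4)-(5).

    shapes3d    : list of n centered 3-D shapes D_i, each (3, l).
    projections : list of r orthographic projections P_j, each (2, 3).
    Returns the 2-D reference shape M (2, l).
    """
    # Initialize M with the first shape seen under the first viewpoint.
    M = projections[0] @ shapes3d[0]
    l = M.shape[1]
    for _ in range(n_iters):
        MMt_inv = np.linalg.inv(M @ M.T)
        AtA = np.zeros((2, 2))
        AtPD = np.zeros((2, l))
        for D in shapes3d:
            for P in projections:
                A = P @ D @ M.T @ MMt_inv      # Eq. (4)
                AtA += A.T @ A
                AtPD += A.T @ (P @ D)
        M = np.linalg.solve(AtA, AtPD)         # Eq. (5)
    return M
```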

Continuous Procrustes Analysis (CPA): A major limitation of PPA is the difficulty of generating uniform distributions in the Special Orthogonal group SO(3) [17]. Due to the topology of SO(3), different angles should be sampled following different distributions, which becomes difficult when the rotation matrices must be confined to a specific region Ω of SO(3), restricted by the rotation angles ω = {φ, θ, ψ}. Moreover, the computational complexity of PPA increases linearly with the number of projections (r) and of 3-D objects (n). In order to deal with these drawbacks, a continuous formulation (CPA) was proposed in [10] by posing PPA within a functional analysis framework. CPA minimizes:

    E_CPA(M, A(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i - A(ω)_i M ||_F^2 dω,    (6)

where dω = (1 / 8π^2) sin(θ) dφ dθ dψ ensures uniformity in SO(3) [17]. This continuous formulation finds the optimal 2-D reference shape of a 3-D dataset, rotated and projected in a given domain Ω, by integrating over all possible rotations in that domain. The main difference between Eq. (3) and Eq. (6) is that the entries of P(ω) ∈ R^{2×3} and A(ω)_i ∈ R^{2×2} are no longer scalars, but functions of the integration angles ω = {φ, θ, ψ}. After some linear algebra and functional analysis, it is possible to find an expression equivalent to the discrete approach (Eq. (3)), where A(ω)_i and M have the following forms:

    A(ω)_i = P(ω) D_i M^T (M M^T)^{-1}   ∀ i,    (7)

    M = ( \sum_{i=1}^{n} ∫_Ω A(ω)_i^T A(ω)_i dω )^{-1} ( \sum_{i=1}^{n} ( ∫_Ω A(ω)_i^T P(ω) dω ) D_i ).    (8)

It is important to notice that the 2-D projections are not explicitly computed in the continuous formulation. The solution for M is found using fixed-point iteration in Eq. (6):

    M = ( Z M^T (M M^T)^{-1} )^{-1} Z,    (9)

where X = ∫_Ω P(ω)^T P(ω) dω ∈ R^{3×3} averages the rotation covariances and Z = (M M^T)^{-1} M ( \sum_{i=1}^{n} (D_i^T ⊗ D_i^T) vec(X) )^{(l)}. Note that the definite integral X is not data dependent, and it can be computed off-line.

Our work builds on [10] but extends it in several ways. First, CPA only computes the reference shape of the dataset. In this paper, we add a subspace that is able to model non-rigid deformations of the object, as well as rigid 3-D transformations that the affine transformation cannot model. As we will describe later, adding a subspace to the PA formulation is not a trivial task. For instance, modeling a subspace following the standard methodology based on CPA would still require generating r rotations for each 3-D sample. Hence, the efficiency of CPA is limited to rigid models, while our approach is not. Second, we provide a discrete and a continuous formulation in order to provide a better understanding of the problem, and we experimentally show that they converge to the same solution as the number of sampled rotations (r) increases. Finally, we evaluate the models on two challenging problems: pose estimation in still images and joints modeling.
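To make the off-line integral concrete, the sketch below approximates X = ∫_Ω P(ω)^T P(ω) dω by a crude Riemann sum over Euler angles with the sin(θ)/(8π²) weight of Eq. (6). The z-y-z Euler convention, the choice of P(ω) as the first two rows of R(ω), and the restriction θ ∈ [0, π] (so that sin(θ) ≥ 0) are our assumptions; the paper only states that the integral can be computed off-line.

```python
import numpy as np

def euler_rotation(phi, theta, psi):
    """R(omega) as a z-y-z Euler rotation (an assumed convention; the paper
    only names the angles omega = {phi, theta, psi})."""
    c, s = np.cos, np.sin
    Rz1 = np.array([[c(phi), -s(phi), 0.], [s(phi), c(phi), 0.], [0., 0., 1.]])
    Ry = np.array([[c(theta), 0., s(theta)], [0., 1., 0.], [-s(theta), 0., c(theta)]])
    Rz2 = np.array([[c(psi), -s(psi), 0.], [s(psi), c(psi), 0.], [0., 0., 1.]])
    return Rz1 @ Ry @ Rz2

def integral_X(phi_rng, theta_rng, psi_rng, n=40):
    """Riemann-sum approximation of X = int_Omega P(w)^T P(w) dw (Eq. 9),
    with P(w) the first two rows of R(w) and dw = sin(theta) dphi dtheta dpsi / (8 pi^2)."""
    dphi = (phi_rng[1] - phi_rng[0]) / n
    dtheta = (theta_rng[1] - theta_rng[0]) / n
    dpsi = (psi_rng[1] - psi_rng[0]) / n
    X = np.zeros((3, 3))
    for phi in np.linspace(*phi_rng, n, endpoint=False):
        for theta in np.linspace(*theta_rng, n, endpoint=False):
            for psi in np.linspace(*psi_rng, n, endpoint=False):
                P = euler_rotation(phi, theta, psi)[:2, :]
                X += np.sin(theta) * (P.T @ P) * dphi * dtheta * dpsi
    return X / (8.0 * np.pi ** 2)
```

As a sanity check, integrating over the whole group (φ, ψ ∈ [0, 2π], θ ∈ [0, π]) should give X ≈ (2/3) I_3, since the uniform measure makes R(ω) isotropic.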

3 Subspace Procrustes Analysis (SPA)

This section proposes Discrete Subspace Procrustes Analysis (DSPA) and Continuous Subspace Procrustes Analysis (CSPA) to learn unbiased 2-D models from 3-D deformable objects.

Discrete Subspace Procrustes Analysis (DSPA): Given a set of r viewpoints P_j ∈ R^{2×3} of the n 3-D shapes, and writing d_i = vec(D_i) ∈ R^{3l×1}, DSPA extends PA by considering a subspace B ∈ R^{2l×k} and weights c_ij ∈ R^{k×1}, which model the non-rigid deformations that the mean M and the transformations A_ij are not able to reconstruct. DSPA minimizes the following function:

    E_DSPA(M, A_ij, B, c_ij) = \sum_{i=1}^{n} \sum_{j=1}^{r} || P_j D_i - A_ij M - (c_ij^T ⊗ I_2) B^{(2)} ||_F^2    (10)
                             = \sum_{i=1}^{n} \sum_{j=1}^{r} || (I_l ⊗ P_j) d_i - (I_l ⊗ A_ij) μ - B c_ij ||_2^2,    (11)

where P_j is a particular 3-D rotation, R(ω), projected into 2-D using an orthographic projection, μ = vec(M) ∈ R^{2l×1} is the vectorized version of the reference shape, c_ij are the k weights of the subspace for each 2-D shape projection, and B^{(2)} ∈ R^{2k×l} is the reshaped subspace.² Observe that the only difference with Eq. (3) is that we have added a subspace. This subspace compensates for the non-rigid components of the 3-D object and for the rigid component (3-D rotation and projection onto the image plane) that the affine transformation cannot model. Recall that a 3-D rigid object under orthographic projection can be recovered with a three-dimensional subspace (if the mean is removed), but PA cannot recover it because it is only rank two. Also, observe that the coefficient c_ij depends on two indexes, i for the object and j for the geometric projection. The dependency of c_ij on the geometric projection is a key point: if the index j were not considered, the subspace would not be able to capture the variations in pose and its usefulness for our purposes would be unclear. Although Eq. (10) and the NRSFM problem follow a similar formulation [3], the assumptions are different and the variables have opposite meanings. For instance, the NRSFM assumptions about rigid transformations do not apply here, since the A_ij are affine transformations in our case.

² See Appendix A for an explanation of the vec-transpose operator.

Given an initialization of B = 0, DSPA is minimized by finding the transformations A_ij and reference shape M that minimize Eq. (3), using the same ALS framework as in PA. Then, we substitute A_ij and M in Eq. (11), which results in the expression:

    E_DSPA(B, c_ij) = \sum_{i=1}^{n} \sum_{j=1}^{r} || D̃_ij - (c_ij^T ⊗ I_2) B^{(2)} ||_F^2    (12)
                    = \sum_{i=1}^{n} \sum_{j=1}^{r} || d̃_ij - B c_ij ||_2^2 = || D̃ - B C ||_F^2,    (13)

where D̃_ij = P_j D_i - A_ij M ∈ R^{2×l}, d̃_ij = vec(D̃_ij) ∈ R^{2l×1}, D̃ = [d̃_11, ..., d̃_nr] ∈ R^{2l×nr}, and C ∈ R^{k×nr}. We can find the global optimum of Eq. (13) by Singular Value Decomposition (SVD): B = U and C = S V^T, where D̃ = U S V^T.
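In code, the DSPA subspace step of Eqs. (12)-(13) amounts to a truncated SVD of the stacked, vectorized residuals produced by the PPA/ALS alignment. A minimal numpy sketch with our own naming, assuming the residuals D̃_ij have already been computed:

```python
import numpy as np

def dspa_subspace(residuals, k):
    """DSPA subspace step (Eqs. 12-13).

    residuals : list of n*r aligned residuals D~_ij = P_j D_i - A_ij M, each (2, l).
    k         : number of basis vectors to keep.
    Returns the subspace B (2l, k) and the coefficients C (k, n*r).
    """
    Dt = np.column_stack([R.ravel(order='F') for R in residuals])  # vec() each residual, (2l, n*r)
    U, s, Vt = np.linalg.svd(Dt, full_matrices=False)
    B = U[:, :k]                       # B = U, truncated to k columns
    C = s[:k, None] * Vt[:k]           # C = S V^T, per-sample weights c_ij
    return B, C
```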

Continuous Subspace Procrustes Analysis (CSPA): As discussed in the previous section, the discrete formulation is efficient neither in space nor in time, and it may suffer from a non-uniform sampling of the original space. CSPA generalizes DSPA by re-writing it in a continuous formulation. CSPA minimizes the following functional:

    E_CSPA(M, A(ω)_i, B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i - A(ω)_i M - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω    (14)
                                 = \sum_{i=1}^{n} ∫_Ω || (I_l ⊗ P(ω)) d_i - (I_l ⊗ A(ω)_i) μ - B c(ω)_i ||_2^2 dω,    (15)

where dω = (1 / 8π^2) sin(θ) dφ dθ dψ. The main difference between Eq. (15) and Eq. (11) is that the entries of c(ω)_i ∈ R^{k×1}, P(ω) ∈ R^{2×3} and A(ω)_i ∈ R^{2×2} are no longer scalars, but functions of the integration angles ω = {φ, θ, ψ}.

Given an initialization of B = 0, and similarly to the DSPA model, CSPA is minimized by finding the optimal reference shape M that minimizes Eq. (6). We use the same fixed-point framework as CPA. Given the value of M and the expression of A(ω)_i from Eq. (7), we substitute them into Eq. (15), resulting in:

    E_CSPA(B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D̃_i - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω    (16)
                      = \sum_{i=1}^{n} ∫_Ω || (I_l ⊗ P(ω)) d̃_i - B c(ω)_i ||_2^2 dω,    (17)

where D̃_i = D_i (I_l - M^T (M M^T)^{-1} M) and d̃_i = vec(D̃_i). We can find the global optimum of Eq. (17) by solving the eigenvalue problem Σ B = B Λ, where Λ contains the eigenvalues corresponding to the columns of B. After some algebra (see Appendix B), we show that the covariance matrix is Σ = ( (I_l ⊗ Y) vec( \sum_{i=1}^{n} d̃_i d̃_i^T ) )^{(2l)}, where the definite integral Y = ∫_Ω P(ω) ⊗ (I_l ⊗ P(ω)) dω ∈ R^{4l×9l} can be computed off-line, leading to an optimization that is efficient in space and time. Though the number of elements in the matrix Y increases quadratically with the number of landmarks l, the integration time is constant, since Y has a sparse structure with only 36 distinct non-zero values (recall that P(ω) ∈ R^{2×3}).

Although A(ω)_i and c(ω)_i are not explicitly computed during training, this is not a limitation compared to DSPA. At test time, the training values of c(ω)_i are not needed; only the deformation limits in each principal direction of B are required. These limits also depend on the eigenvalues [4], which are computed with CSPA.
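The optimization can also be prototyped directly from the definition of Σ (Eq. (25) in Appendix B): accumulate the projected residual covariance over a quadrature grid of rotations and keep the leading eigenvectors. The sketch below does exactly that, trading the paper's efficient rearrangement through Y for readability; the grid of (P(ω), weight) pairs can be generated as in the earlier sketch for X, and the interface and names are ours.

```python
import numpy as np

def cspa_subspace(shapes3d, M, proj_grid, k=12):
    """Naive numerical CSPA (Eqs. 16-17): eigenvectors of the covariance Sigma.

    shapes3d  : list of n centered 3-D shapes D_i, each (3, l).
    M         : (2, l) reference shape obtained with CPA.
    proj_grid : iterable of (P, w) pairs, P a (2, 3) orthographic projection
                P(omega) and w its quadrature weight.
    k         : number of basis vectors to keep.
    Returns the subspace B (2l, k) and its eigenvalues.
    """
    l = M.shape[1]
    H = M.T @ np.linalg.solve(M @ M.T, M)                 # M^T (M M^T)^{-1} M, (l, l)
    d_tilde = [np.ravel(D @ (np.eye(l) - H), order='F')   # d~_i = vec(D_i (I_l - H)), (3l,)
               for D in shapes3d]
    S = sum(np.outer(d, d) for d in d_tilde)              # sum_i d~_i d~_i^T, (3l, 3l)

    Sigma = np.zeros((2 * l, 2 * l))
    for P, w in proj_grid:
        G = np.kron(np.eye(l), P)                         # (I_l kron P(omega)), (2l, 3l)
        Sigma += w * (G @ S @ G.T)                        # integrand of Eq. (25)

    evals, evecs = np.linalg.eigh(Sigma)                  # Sigma B = B Lambda
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order], evals[order]
```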

4 Experiments & Results

This section illustrates the benefits of DSPA and CSPA and compares them with state-of-the-art PA methods for building 2-D shape models of human skeletons. First, we compare the performance of PA+PCA and SPA for building a 2-D shape model of Motion Capture (MoCap) bodies using the Carnegie Mellon University MoCap dataset [1]. Next, we compare our discrete and continuous approaches in a large-scale experiment. Finally, we illustrate the generalization of our 2-D body model on the problem of human pose estimation using the Leeds Sport Dataset [11]. For all experiments, we report the Mean Squared Error (MSE) relative to the torso size.

4.1 Learning 2-D Joints Models

The aim of this experiment is to build a generic 2-D body model that can reconstruct non-rigid deformations under a large range of 3-D rotations. For training and testing, we used the Carnegie Mellon University MoCap dataset, which is composed of 2605 sequences performed by 109 subjects. The sequences cover a wide variety of daily human activities and sports. Skeletons with 31 joints are provided, as well as RGB video recordings for several sequences. We trained our models using a set of 14 landmarks, as is common across several databases for human pose estimation.

Fig. 3. Comparisons as a function of the number of training viewpoint projections. (a) Rigid and (b) deformable models (using a subspace of 9 bases) from Experiment 1, respectively; (c) CSPA and DSPA deformable models (using a subspace of 12 bases) from Experiment 2.

Experiment 1: Comparison with State-of-the-Art PA Methods. This section compares the DSPA and CSPA methods with the state-of-the-art Stratified Generalized Procrustes Analysis (SGPA) [2].³ For training, we randomly selected 3 sequences with 30 frames per sequence from the set of 11 running sequences of subject number 9 (this is due to the memory limitations of SGPA). For testing, we randomly selected 2 sequences with 30 frames from the same set. We rotated the 3-D models in the yaw and pitch angles, within the ranges φ, θ ∈ [-π/2, π/2]. The angles were uniformly selected, and we report results varying the number of considered angles (i.e., rotations) between 1 and 100 in training, with a fixed set of 300 angles for testing. There are several versions of SGPA; we selected Affine-factorization with the data-space model to make a fair comparison with our method. Recall that, under our assumption of non-missing data, Affine-All and Affine-factorization achieve the same solution, with Affine-factorization being faster.

³ The code was downloaded from the author's website (http://isit.u-clermont1.fr/~ab).

Fig. 3 shows the mean reconstruction error and 0.5 of the standard deviation over 100 realizations. Fig. 3 (a) reports the results comparing PA, CPA and SGPA. As expected, PA and SGPA converge to CPA as the number of training rotations increases. However, observe that CPA achieves the same performance while being much more efficient. Fig. 3 (b) compares DSPA, CSPA, and SGPA followed by PCA (we will refer to this method as SGPA+PCA).

From the figure, we can observe that the mean test error for DSPA and SGPA+PCA decreases with the number of rotations in training and converges to CSPA; CSPA provides a lower bound on the error. Observe that we used 90 3-D bodies (3 sequences with 30 frames) within the rotation angles φ, θ ∈ [-π/2, π/2], and DSPA and SGPA+PCA needed about 30 angles to achieve results similar to CSPA. So, in this case, the discrete methods need 30 times more space than the continuous one. The execution times with 30 rotations, on a 2.2 GHz computer with 8 GB of RAM, were 1.44 s (DSPA), 0.03 s (CSPA) and 3.54 s (SGPA+PCA).

Fig. 4. Experiment 2 results with 1 (top) and 30 (bottom) rotations. Examples show skeleton reconstructions from the continuous (CSPA, solid red lines) and discrete (DSPA, dashed blue lines) models over the ground truth (solid black lines).

Experiment 2: Comparison between CSPA and DSPA. This experiment compares DSPA and CSPA in a large-scale problem as a function of the number of rotations. For training, we randomly selected 200 sequences with 30 frames per sequence. For testing, we randomly selected 5 sequences with 30 frames. We rotated the 3-D models in the yaw and pitch angles, within the ranges φ, θ ∈ [-π/2, π/2]. The angles were uniformly selected, and we report results varying the number of angles (i.e., rotations) between 1 and 100 in training, with 300 angles for testing. Fig. 3 (c) shows the mean reconstruction error and 0.5 of the standard deviation over 100 realizations, comparing DSPA and CSPA. As expected, DSPA converges to CSPA as the number of training rotations increases. However, observe that CSPA achieves the same performance while being much more efficient. In this experiment, with 6000 3-D training bodies (200 sequences with 30 frames) and the domain φ, θ ∈ [-π/2, π/2], the discrete method again required around 30 2-D viewpoint projections to achieve results similar to CSPA. Thus, the discrete model DSPA needs 30 times more storage space than CSPA. The execution times with 30 rotations, on a 2.2 GHz computer with 8 GB of RAM, were 14.75 s (DSPA) and 0.04 s (CSPA).

Qualitative results from the CSPA and DSPA models trained with different numbers of rotations are shown in Fig. 4. Note that training the DSPA model with 1 rotation (top) results in poor reconstructions, while training it with 30 rotations (bottom) leads to reconstructions almost as accurate as those of CSPA.

4.2 Experiment 3. Leeds Sport Dataset

This section illustrates how the 2-D body models learned with CSPA can be used to detect body configurations in images. We used the Leeds Sport Dataset (LSP), which contains 2000 images of people performing different sports, some of them with extreme poses or viewpoints (e.g., parkour images). The first 1000 images of the dataset are considered for training and the second set of 1000 images for testing. One skeleton, manually labeled with 14 joints, is provided for each training and test image. We trained our 2-D CSPA model on the CMU MoCap dataset [1] using 1000 frames: from the 2605 sequences of the motion capture data, we randomly selected 1000, and the frame in the middle of each sequence was selected as its representative frame. Using this training data, we built the 2-D CSPA model using the following ranges for the pitch, roll and yaw angles: φ, θ, ψ ∈ [-3π/4, 3π/4]. We will refer to this model as CSPA-MoCap. For comparison, we used the 1000 2-D training skeletons provided by the LSP dataset and ran SGPA+PCA to build an alternative 2-D model. We will refer to this model as SGPA+PCA-LSP. Observe that this model was trained on data similar to the test set.

Table 1. Experiment 3 results. MSE of our continuous model (CSPA-MoCap) trained with 3-D MoCap data, the discrete model trained on the LSP dataset (SGPA+PCA-LSP), and both rigid models (CPA-MoCap, SGPA-LSP).

    Model | CPA-MoCap | SGPA-LSP | CSPA-MoCap | SGPA+PCA-LSP
    MSE   | 0.16405   | 0.1631   | 0.01046    | 0.01366

Table 1 reports the MSE of reconstructing the test skeletons with the rigid and deformable models. A subspace of 12 bases is used for both deformable models. The results show that CSPA-MoCap has a lower reconstruction error than the standard method SGPA+PCA-LSP, even though it was trained on a different dataset (CMU MoCap) than the test set. Qualitative results from the CSPA-MoCap and SGPA+PCA-LSP models are shown in Fig. 5. Note that CSPA-MoCap provides more accurate reconstructions than SGPA+PCA-LSP because it is able to generalize to all possible 3-D rotations in the given interval.

Fig. 5. Experiment 3 examples, reconstructing ground-truth skeletons of the LSP dataset with the CSPA-MoCap (solid red lines) and SGPA+PCA-LSP (dashed blue lines) models.

5 Conclusions

This paper proposes an extension of PA to learn a 2-D subspace of rigid and non-rigid deformations of 3-D objects. We propose two models: a discrete one (DSPA) that samples the 3-D rotation space, and a continuous one (CSPA) that integrates over SO(3). As the number of projections increases, DSPA converges to CSPA. SPA has two advantages over traditional PA and PPA: (1) it generates unbiased models because it uniformly covers the space of projections, and (2) CSPA is much more efficient in space and time. Experiments comparing 2-D SPA models of bodies show improvements w.r.t. state-of-the-art PA methods. In particular, CSPA models trained with motion capture data outperformed, on the LSP database, 2-D models trained on that same database under the same conditions, showing how our 2-D models built from 3-D data can generalize better to different viewpoints. In future work, we plan to explore models that decouple the rigid and non-rigid deformations by providing two independent bases in the subspace.

6 Acknowledgments

This work is partly supported by the Spanish Ministry of Science and Innovation (projects TIN2012-38416-C03-01, TIN2012-38187-C03-01, TIN2013-43478-P), project 2014 SGR 119, and the Comissionat per a Universitats i Recerca del Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya.

A Appendix. Vec-transpose

The vec-transpose A^{(p)} is a linear operator that generalizes the vectorization and transposition operators [14, 16]. It reshapes a matrix A ∈ R^{m×n} by vectorizing each i-th block of p rows and rearranging it as the i-th column of the reshaped matrix, such that A^{(p)} ∈ R^{pn×(m/p)}. For example, for a 6×3 matrix:

    [ a_11 a_12 a_13 ] (3)   [ a_11 a_41 ]
    [ a_21 a_22 a_23 ]       [ a_21 a_51 ]
    [ a_31 a_32 a_33 ]   =   [ a_31 a_61 ]
    [ a_41 a_42 a_43 ]       [ a_12 a_42 ]
    [ a_51 a_52 a_53 ]       [ a_22 a_52 ]
    [ a_61 a_62 a_63 ]       [ a_32 a_62 ]
                             [ a_13 a_43 ]
                             [ a_23 a_53 ]
                             [ a_33 a_63 ]

    [ a_11 a_12 a_13 ] (2)   [ a_11 a_31 a_51 ]
    [ a_21 a_22 a_23 ]       [ a_21 a_41 a_61 ]
    [ a_31 a_32 a_33 ]   =   [ a_12 a_32 a_52 ]
    [ a_41 a_42 a_43 ]       [ a_22 a_42 a_62 ]
    [ a_51 a_52 a_53 ]       [ a_13 a_33 a_53 ]
    [ a_61 a_62 a_63 ]       [ a_23 a_43 a_63 ]

Note that (A^{(p)})^{(p)} = A and A^{(m)} = vec(A). A useful rule for pulling a matrix out of nested Kronecker products is

    ((B A)^{(p)} C)^{(p)} = (C^T ⊗ I_p) B A = (B^{(p)} C)^{(p)} A,

which leads to (C^T ⊗ I_2) B = (B^{(2)} C)^{(2)}.
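In numpy, the operator reduces to slicing and reshaping; the helper below (our naming) follows the definition above and checks the two stated properties on a 6×3 example.

```python
import numpy as np

def vec_transpose(A, p):
    """Vec-transpose A^(p): vectorize each block of p rows of A and place it
    as a column, so an (m, n) input becomes a (p*n, m/p) output."""
    m, n = A.shape
    assert m % p == 0, "p must divide the number of rows"
    blocks = [A[i * p:(i + 1) * p, :] for i in range(m // p)]
    return np.column_stack([b.ravel(order='F') for b in blocks])  # vec() stacks columns

# Sanity checks against the properties stated above.
A = np.arange(1, 19).reshape(6, 3)                                  # a 6x3 matrix
assert np.array_equal(vec_transpose(vec_transpose(A, 2), 2), A)     # (A^(p))^(p) = A
assert np.array_equal(vec_transpose(A, 6), A.ravel(order='F')[:, None])  # A^(m) = vec(A)
```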

B Appendix. CSPA formulation

In this appendix we detail the steps from Eq. (14) to Eq. (17), as well as the definition of the covariance matrix introduced in Section 3. Given the value of M and the optimal expression of A(ω)_i from Eq. (7), we substitute them into Eq. (14), resulting in:

    E_CSPA(B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i - P(ω) D_i H - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω,    (18)

where H = M^T (M M^T)^{-1} M and D_i ∈ R^{3×l}. Then,

    E_CSPA(B, c(ω)_i) = \sum_{i=1}^{n} ∫_Ω || P(ω) D_i (I_l - H) - (c(ω)_i^T ⊗ I_2) B^{(2)} ||_F^2 dω    (19)

leads us to Eq. (16) and Eq. (17), where D̃_i = D_i (I_l - H) and d̃_i = vec(D̃_i). From Eq. (17), solving ∂E_CSPA / ∂c(ω)_i = 0 we find:

    c(ω)_i = (B^T B)^{-1} B^T (I_l ⊗ P(ω)) d̃_i.    (20)

Substituting c(ω)_i into Eq. (17) results in:

    E_CSPA(B) = \sum_{i=1}^{n} ∫_Ω || (I_l ⊗ P(ω)) d̃_i - B (B^T B)^{-1} B^T (I_l ⊗ P(ω)) d̃_i ||_2^2 dω    (21)
              = \sum_{i=1}^{n} ∫_Ω || (I_{2l} - B (B^T B)^{-1} B^T) (I_l ⊗ P(ω)) d̃_i ||_2^2 dω    (22)
              = ∫_Ω tr[ (I_{2l} - B (B^T B)^{-1} B^T) \sum_{i=1}^{n} (I_l ⊗ P(ω)) d̃_i d̃_i^T (I_l ⊗ P(ω))^T ] dω    (23)
              = tr[ (I_{2l} - B (B^T B)^{-1} B^T) Σ ],    (24)

where:

    Σ = ∫_Ω ( \sum_{i=1}^{n} (I_l ⊗ P(ω)) d̃_i d̃_i^T (I_l ⊗ P(ω))^T ) dω.    (25)

We can find the global optimum of Eq. (24) by solving the eigenvalue problem Σ B = B Λ, where Σ is the covariance matrix and Λ contains the eigenvalues corresponding to the columns of B. However, the definite integral in Σ is data dependent. To be able to compute the integral off-line, we need to rearrange the elements of Σ. Using vectorization and the vec-transpose operator (see Appendix A):

    Σ = ( vec[Σ] )^{(2l)}    (26)
      = ( vec[ ∫_Ω (I_l ⊗ P(ω)) ( \sum_{i=1}^{n} d̃_i d̃_i^T ) (I_l ⊗ P(ω))^T dω ] )^{(2l)}    (27)
      = ( ( ∫_Ω (I_l ⊗ P(ω)) ⊗ (I_l ⊗ P(ω)) dω ) vec[ \sum_{i=1}^{n} d̃_i d̃_i^T ] )^{(2l)},    (28)

which finally leads to:

    Σ = ( (I_l ⊗ Y) vec[ \sum_{i=1}^{n} d̃_i d̃_i^T ] )^{(2l)},    (29)

where the definite integral Y = ∫_Ω P(ω) ⊗ (I_l ⊗ P(ω)) dω ∈ R^{4l×9l} can be computed off-line.
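The rearrangement in Eqs. (26)-(29) relies on the identity vec(A X A^T) = (A ⊗ A) vec(X) together with the associativity of the Kronecker product. Because the integral is linear, the identity can be checked numerically for a single projection matrix; the short script below (our construction, with arbitrary random data) does that.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 5, 4                                      # small sizes for the check
P = rng.normal(size=(2, 3))                      # stands in for one P(omega)
d = rng.normal(size=(3 * l, n))                  # columns play the role of d~_i
S = d @ d.T                                      # sum_i d~_i d~_i^T, (3l, 3l)

# Direct form: the integrand of Eq. (25).
G = np.kron(np.eye(l), P)                        # (I_l kron P), (2l, 3l)
Sigma_direct = G @ S @ G.T

# Rearranged form: Eq. (29), with Y = P kron I_l kron P.
Y = np.kron(P, np.kron(np.eye(l), P))            # (4l, 9l)
v = np.kron(np.eye(l), Y) @ S.ravel(order='F')   # (I_l kron Y) vec(S)
Sigma_rearranged = v.reshape(2 * l, 2 * l, order='F')  # vec-transpose with p = 2l
assert np.allclose(Sigma_direct, Sigma_rearranged)
```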

References

1. Carnegie Mellon University Motion Capture Database. http://mocap.cs.cmu.edu
2. Bartoli, A., Pizarro, D., Loog, M.: Stratified generalized Procrustes analysis. IJCV 101(2), 227-253 (2013)
3. Brand, M.: Morphable 3D models from video. In: CVPR, vol. 2, pp. II-456. IEEE (2001)
4. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. PAMI 23(6), 681-685 (2001)
5. De la Torre, F.: A least-squares framework for component analysis. PAMI 34(6), 1041-1055 (2012)
6. Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis, vol. 4. John Wiley & Sons, New York (1998)
7. Frey, B.J., Jojic, N.: Transformation-invariant clustering using the EM algorithm. PAMI 25(1), 1-17 (2003)
8. Goodall, C.: Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society, Series B (Methodological), pp. 285-339 (1991)
9. Gower, J.C., Dijksterhuis, G.B.: Procrustes Problems, vol. 3. Oxford University Press, Oxford (2004)
10. Igual, L., Perez-Sala, X., Escalera, S., Angulo, C., De la Torre, F.: Continuous generalized Procrustes analysis. PR 47(2), 659-671 (2014)
11. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British Machine Vision Conference (2010), doi:10.5244/C.24.12
12. Kokkinos, I., Yuille, A.: Unsupervised learning of object deformation models. In: ICCV, pp. 1-8. IEEE (2007)
13. Learned-Miller, E.G.: Data driven image models through continuous joint alignment. PAMI 28(2), 236-250 (2006)
14. Marimont, D.H., Wandell, B.A.: Linear models of surface and illuminant spectra. JOSA A 9(11), 1905-1913 (1992)
15. Matthews, I., Xiao, J., Baker, S.: 2D vs. 3D deformable face models: Representational power, construction, and real-time fitting. IJCV 75(1), 93-113 (2007)
16. Minka, T.P.: Old and new matrix algebra useful for statistics. http://research.microsoft.com/en-us/um/people/minka/papers/matrix/ (2000)
17. Naimark, M.A.: Linear Representations of the Lorentz Group (translated from Russian). Macmillan, New York (1964)
18. Pearson, K.: On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2(11), 559-572 (1901)
19. Pizarro, D., Bartoli, A.: Global optimization for optimal generalized Procrustes analysis. In: CVPR, pp. 2409-2415. IEEE (2011)
20. De la Torre, F., Black, M.J.: Robust parameterized component analysis: theory and applications to 2D facial appearance models. CVIU 91(1-2), 53-71 (2003)
21. Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. PAMI 30(5), 878-892 (2008)
22. Xiao, J., Chai, J., Kanade, T.: A closed-form solution to non-rigid shape and motion recovery. IJCV 67(2), 233-246 (2006)
23. Yezzi, A.J., Soatto, S.: Deformotion: Deforming motion, shape average and the joint registration and approximation of structures in images. IJCV 53(2), 153-167 (2003)