Coordinating Principal Component Analyzers



J.J. Verbeek, N. Vlassis, and B. Kröse
Informatics Institute, University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

Abstract. Mixtures of Principal Component Analyzers can be used to model high-dimensional data that lie on or near a low-dimensional manifold. By linearly mapping the PCA subspaces to one global low-dimensional space, we obtain a global low-dimensional coordinate system for the data. As shown by Roweis et al., ensuring consistent global low-dimensional coordinates for the data can be expressed as a penalized likelihood optimization problem. We show that a restricted form of the Mixtures of Probabilistic PCA model allows for a more efficient algorithm. Experimental results are provided to illustrate the viability of the method.

1 Introduction

With increasing sensor capabilities, powerful feature extraction methods are becoming increasingly important. Consider a robot sensing its environment with a camera yielding a stream of 100 x 100 pixel images. The observations made by the robot often have a much lower intrinsic dimension than the 10,000 dimensions the pixels provide. If we assume a fixed environment, and a robot that can rotate around its axis and translate through a room, the intrinsic dimension is only three. Linear feature extraction techniques are able to compress the signal fairly well by mapping it to a much lower dimensional space. However, only in very few special cases is the manifold on which the signal is generated a linear subspace of the sensor space. This clearly limits the use of linear techniques and suggests using non-linear feature extraction techniques.

Mixtures of Factor Analyzers (MFA) [2] can be used to model such non-linear data manifolds. This model provides local linear mappings between local latent spaces and the data space. However, the local latent spaces are not compatible with each other, i.e. the coordinate systems of neighboring factor analyzers might be oriented completely differently.
Hence, if we move through the data space from one factor analyzer to the next, we cannot predict the latent coordinate of a data point under one factor analyzer from its latent coordinate under the other factor analyzer. Recently, a model was proposed that integrates the local linear models into a global latent space, allowing for mapping back and forth between the global latent space and the data space [5]. The idea is that there is a linear map for each factor analyzer between the data space and the global latent space. The model, which is fitted by maximizing penalized log-likelihood with an algorithm closely

related to the Expectation-Maximization (EM) algorithm [4], is discussed in the next section. In Section 3, we show how we can reduce the number of parameters to be estimated and simplify the algorithm of [5]. These simplifications remove the iterative procedure from the M-step and remove all matrix inversions from the algorithm. The price we pay is that the covariance matrices of the Gaussian densities we use are more restricted, as discussed in the same section. Experimental results are given in Section 4. A discussion and conclusions are provided in Section 5.

2 The density model

To model the data density in the high-dimensional space we use mixtures of a restricted type of Gaussian densities. The mixture is formed as a weighted sum of its component densities, which are indexed by s. The mixing weight and mean of each component are given by p_s and µ_s respectively. The covariance matrices of the Gaussian densities are constrained to be of the form:

    C = σ²(I_D + ρΛΛ^T),   Λ^T Λ = I_d,   ρ > 0,   (1)

where D and d are respectively the dimensions of the high-dimensional (data) space and the low-dimensional (latent) space. We use I_d to denote the d-dimensional identity matrix. The d columns of Λ, in factor analysis known as the loading matrix, are D-dimensional vectors spanning the local PCA subspace. Directions within the PCA subspace have variance σ²(1 + ρ); other directions have variance σ². This is the same as the Mixture of Probabilistic Principal Component Analyzers (MPPCA) model [7], with the difference that here we not only have isotropic noise outside the subspaces but also isotropic variance inside the subspaces. We use this density model to allow for convenient solutions later. In [9] we provide a Generalized EM algorithm to find maximum likelihood solutions for this model.

The same model can be rephrased using hidden variables z, which we use to denote internal coordinates of the subspaces. We scale the coordinates z such that p(z|s) = N(z; 0, I_d). The internal coordinates allow us to clearly express the link to the global latent space, for which we denote coordinates with g.
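To make the structure of the constrained covariance in (1) concrete, the following minimal sketch (illustrative values only, not part of the paper) builds C for arbitrary σ² and ρ with a random orthonormal Λ, and checks its spectrum: variance σ²(1 + ρ) inside the d-dimensional subspace and σ² in the remaining D − d directions.

```python
import numpy as np

# Build C = sigma^2 (I_D + rho * Lam Lam^T) as in Eq. (1), with a random
# D x d loading matrix Lam having orthonormal columns (Lam^T Lam = I_d).
rng = np.random.default_rng(0)
D, d = 10, 2
sigma2, rho = 0.5, 3.0          # arbitrary illustrative values

Lam, _ = np.linalg.qr(rng.standard_normal((D, d)))  # orthonormal columns

C = sigma2 * (np.eye(D) + rho * Lam @ Lam.T)
eigvals = np.sort(np.linalg.eigvalsh(C))

# Spectrum: D - d eigenvalues equal sigma^2 (off-subspace directions),
# the top d equal sigma^2 * (1 + rho) (within the PCA subspace).
print(np.allclose(eigvals[:D - d], sigma2))              # True
print(np.allclose(eigvals[D - d:], sigma2 * (1 + rho)))  # True
```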
All mixture components have their own linear mapping to the global space, given by a translation κ_s and a matrix A_s, i.e. p(g|z, s) = δ(κ_s + A_s z), where δ(·) denotes the distribution with mass 1 at the argument. The generative model reads:

    p(x) = Σ_s p_s N(x; µ_s, σ_s²(I_D + ρ_s Λ_s Λ_s^T)),   p(x|z, s) = N(x; µ_s + σ_s √ρ_s Λ_s z, σ_s² I_D),   (2)
    p(g) = Σ_s p_s N(g; κ_s, A_s A_s^T).   (3)

We put an extra constraint on the projection matrices:

    A_s = α_s σ_s √ρ_s R_s,   R_s^T R_s = I_d,   α_s > 0,   (4)

hence R_s implements only rotations plus reflections.
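Under constraint (4) each component's latent covariance A_s A_s^T in (3) becomes isotropic, since R_s R_s^T = I_d. A quick numerical check with arbitrary illustrative values (not the paper's code):

```python
import numpy as np

# Constraint (4): A = alpha * sigma * sqrt(rho) * R with R orthonormal,
# so A A^T = alpha^2 * sigma^2 * rho * I_d, i.e. isotropic.
rng = np.random.default_rng(1)
d = 2
alpha, sigma, rho = 1.3, 0.7, 2.0   # arbitrary illustrative values

# Random rotation/reflection R from a QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

A = alpha * sigma * np.sqrt(rho) * R
print(np.allclose(A @ A.T, alpha**2 * sigma**2 * rho * np.eye(d)))  # True
```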

Note that the model assumes that locally there is a linear correspondence between the data space and the latent space. It follows that the densities p(g|x, s) and p(x|g, s) are Gaussian, and hence both p(g|x) and p(x|g) are mixtures of Gaussian densities. In the next section we discuss how this density model allows for an efficient learning scheme, as compared to the expressive but expensive MFA model proposed in [5].

3 The learning algorithm

The goal is, given observable data {x_n}, to find a good density model in the data space and mappings {A_s, κ_s} that give rise to consistent estimates for the hidden {g_n}. By consistent we mean that if a point x_n in the data space is well modeled by two PCAs, then the corresponding estimates for its latent coordinate g_n should be close to each other, i.e. the subspaces should agree on the corresponding g_n.

Objective Function: To measure the level of agreement, one can consider for all data points how uni-modal the distribution p(g|x) is. This idea was also used in [10]. There the goal was to find a global linear low-dimensional projection of supervised data that preserves the manifold structure of the data. In [5] it is shown how the double objective of likelihood and uni-modality can be implemented as a penalized log-likelihood optimization problem. Let Q(g|x_n) = N(g; g_n, Σ_n) be a Gaussian approximation of the mixture p(g|x_n) and Q(s|x_n) = q_ns. We define:

    Q(g, s|x_n) = Q(s|x_n) Q(g|x_n).   (5)

As a measure of uni-modality we can use a sum of Kullback-Leibler divergences:

    Σ_s ∫ dg Q(g, s|x_n) log [Q(g, s|x_n) / p(g, s|x_n)] = D_KL({q_ns} || {p_ns}) + Σ_s q_ns D_ns,

where D_ns = D_KL(Q(g|x_n) || p(g|x_n, s)) and p_ns = p(s|x_n). The total objective function, combining log-likelihood and the penalty term, then becomes:

    Φ = Σ_n [ log p(x_n) − D_KL({q_ns} || {p_ns}) − Σ_s q_ns D_ns ]   (6)
      = Σ_n Σ_s ∫ dg Q(g, s|x_n) [ −log Q(g, s|x_n) + log p(x_n, g, s) ].   (7)

The objective corresponds to a constrained EM procedure, cf. [8], where the same idea is used to derive a probabilistic version of Kohonen's Self-Organizing Map [3].
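Since Q(g|x_n) and p(g|x_n, s) are both Gaussian, the divergences D_ns are available in closed form. The following sanity check (illustrative values, not the paper's code) compares the isotropic-covariance KL formula, as used in the E-step (cf. (11)), against the general multivariate-Gaussian KL with explicit covariance matrices:

```python
import numpy as np

# KL( N(m1, beta^{-1} I_d) || N(m2, v^{-1} I_d) ), computed two ways.
d = 3
beta, v = 4.0, 1.5                      # precisions of Q and p(g|x,s)
m1 = np.array([0.2, -0.1, 0.5])
m2 = np.array([1.0, 0.0, -0.3])

# Closed form for isotropic covariances (the -d/2 constant is often dropped).
kl_iso = 0.5 * v * (d / beta + np.sum((m1 - m2) ** 2)) \
         + 0.5 * d * (np.log(beta) - np.log(v)) - 0.5 * d

# General multivariate-Gaussian KL with full covariance matrices.
S1, S2 = np.eye(d) / beta, np.eye(d) / v
diff = m2 - m1
kl_full = 0.5 * (np.trace(np.linalg.solve(S2, S1))
                 + diff @ np.linalg.solve(S2, diff)
                 - d + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

print(np.isclose(kl_iso, kl_full))  # True
```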
Our density model differs from that of [5] in two respects: (i) we use an isotropic noise model outside the subspaces (as opposed to a diagonal covariance matrix) and (ii) we use isotropic variance inside the subspaces (as opposed to a general Gaussian). Also, using our density model it turns out that to optimize Φ with respect to Σ_n, it should be of the form¹ Σ_n = β_n⁻¹ I_d. Therefore we

¹ Once we realize that the matrices V_s in [5] are of the form c I_d with our density model, it can easily be seen by setting ∂Φ/∂Σ_n = 0 that Σ_n = β_n⁻¹ I_d.

work with β_n from now on. Using our density model and writing g_ns = g_n − κ_s and x_ns = x_n − µ_s, we can write (7) as:

    Φ = Σ_{n,s} q_ns [ −(d/2) log β_n − log q_ns − e_ns/(2σ_s²) − (v_s/2)(d β_n⁻¹ + g_ns^T g_ns/(ρ_s + 1))
            − D log σ_s + (d/2) log(v_s/(ρ_s + 1)) + log p_s ] + const.,   (8)

with

    e_ns = ||x_ns − α_s⁻¹ Λ_s R_s^T g_ns||²   and   v_s = (ρ_s + 1)/(σ_s² ρ_s α_s²),   (9)

where v_s is the inverse variance of p(g|s, x) and e_ns is the squared distance between x_n and g_n mapped into the data space by component s.

Optimization: To optimize Φ we use an EM-style algorithm, a simplified version of the algorithm provided in [5]. The simplifications are: (i) the iterative process to solve for the Λ_s, A_s is no longer needed, since an exact update is possible, and (ii) the algorithm no longer involves matrix inversions. The same manner of computation is used: in the E-step, we compute the uni-modal distributions Q(s, g|x_n), parameterized by β_n, g_n and q_ns. Let ḡ_ns = E_{p(g|x_n,s)}[g] denote the expected value of g given x_n and s. We use the following identities:

    ḡ_ns = κ_s + R_s Λ_s^T x_ns α_s ρ_s/(ρ_s + 1),   (10)
    D_ns = (v_s/2)[d β_n⁻¹ + ||g_n − ḡ_ns||²] + (d/2)[log β_n − log v_s].   (11)

The distributions Q can be found by iterating the fixed-point equations:

    β_n = Σ_s q_ns v_s,   g_n = β_n⁻¹ Σ_s q_ns v_s ḡ_ns,   q_ns = p_ns exp(−D_ns) / Σ_{s′} p_{ns′} exp(−D_{ns′}),

where we used p_ns = p(s|x_n). In the M-step, we update the parameters of the mixture model. Using the notation:

    C_s = Σ_n q_ns ||g_ns||²,   E_s = Σ_n q_ns e_ns,   G_s = d Σ_n q_ns β_n⁻¹,   (12)

the update equations are:

    κ_s = Σ_n q_ns g_n / Σ_n q_ns,   µ_s = Σ_n q_ns x_n / Σ_n q_ns,   α_s = (C_s + G_s) / Σ_n q_ns g_ns^T R_s Λ_s^T x_ns,
    ρ_s = D(C_s + G_s) / (d(α_s² E_s + G_s)),   σ_s² = (E_s + ρ_s⁻¹ α_s⁻² [C_s + (ρ_s + 1) G_s]) / ((D + d) Σ_n q_ns),
    p_s = Σ_n q_ns / Σ_{n,s′} q_{ns′}.

Note that the above equations require E_s, which in turn requires Λ_s R_s^T via equations (9) and (12). To find Λ_s R_s^T we have to minimize:

    Σ_n q_ns e_ns = Σ_n q_ns ||x_ns − α_s⁻¹ Λ_s R_s^T g_ns||² = −(2/α_s) Σ_n q_ns x_ns^T (Λ_s R_s^T) g_ns + const.
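The minimization above amounts to maximizing Σ_n q_ns x_ns^T M g_ns over D x d matrices M = Λ_s R_s^T with orthonormal columns, which is solved by a singular value decomposition. A minimal numerical sketch on synthetic data (illustrative only, not the authors' code):

```python
import numpy as np

# Weighted Procrustes step: find the D x d matrix M with orthonormal
# columns maximizing sum_n q_n * x_n^T M g_n. The g_n are zero-padded to
# D dimensions and the solution is read off an SVD.
rng = np.random.default_rng(2)
D, d, N = 5, 2, 50
X = rng.standard_normal((D, N))          # centered data vectors x_ns
G = rng.standard_normal((d, N))          # latent vectors g_ns
q = rng.uniform(0.1, 1.0, size=N)        # responsibilities q_ns

G_pad = np.vstack([G, np.zeros((D - d, N))])   # zero-pad to D dimensions
C = (X * q) @ G_pad.T                    # weighted cross-covariance, D x D
U, _, Vt = np.linalg.svd(C)
M = (U @ Vt)[:, :d]                      # first d columns of the rotation

# M has orthonormal columns ...
print(np.allclose(M.T @ M, np.eye(d)))   # True
# ... and beats a random orthonormal candidate on the objective.
obj = lambda M_: np.sum(q * np.einsum('in,ik,kn->n', X, M_, G))
Q_rand, _ = np.linalg.qr(rng.standard_normal((D, d)))
print(obj(M) >= obj(Q_rand))             # True
```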

This problem is known as the weighted Procrustes rotation [1]. Let

    C = [√q_{1s} x_{1s} ⋯ √q_{Ns} x_{Ns}] [√q_{1s} g_{1s} ⋯ √q_{Ns} g_{Ns}]^T,   with SVD   C = U L Γ^T,

where the g_ns have been padded with zeros to form D-dimensional vectors; then the optimal Λ_s R_s^T is given by the first d columns of UΓ^T.

4 Experimental Illustration

To demonstrate the method, we captured 40 x 40 pixel gray-valued images of a face with a camera. The face has two degrees of freedom, namely looking up-down and left-right. We learned a coordinated mixture model with 1000 images. We used a global PCA projection to 22 dimensions, preserving over 70% of the variance in the data set. We used a latent dimensionality of two and 20 mixture components. We initialized the coordinated mixture model by clamping the latent coordinates g_n at coordinates found by Isomap [6] and clamping the β_n at small values for the first 50 iterations. The q_ns were initialized uniformly at random and updated from the start. The obtained coordinated mixture model was used to map 1000 test images. For each test image x_n we approximated p(g|x_n) with a single Gaussian Q = arg min_Q D_KL(Q || p(g|x_n)) with a certain mean and standard deviation. In Figure 1 we show these means (location of circle) and standard deviations (radius). To illustrate the discovered parametrization further, two examples of linear traversal of the latent space are given.

5 Conclusions and Discussion

We showed how a special case of the density model used in [5] leads to a more efficient algorithm to coordinate probabilistic local linear descriptions of a data manifold. The M-step can be computed at once; the iterative procedure to find solutions for a Riccati equation is no longer needed. Furthermore, the update equations no longer involve matrix inversions. However, d singular values and vectors of a D x D matrix still have to be found. The application of this method to partially supervised data sets is an interesting possibility and a topic of future research.
Another important issue, not addressed here, is that often when we collect data from a system with limited degrees of freedom, we actually observe sequences of data. If we assume that the system can vary its state only in a continuous manner, these sequences should correspond to paths on the manifold of observable data. This fact might be exploited to find a low-dimensional embedding of the manifold. In [9] we report on promising results of experiments where we used this model to map omni-directional camera images, recorded through an office, to a 2D latent space (the location in the office), where the data was supervised in the sense that 2D work-floor coordinates were known.

Acknowledgment: This research is supported by the Technology Foundation STW (project nr. AIF4997), applied science division of NWO, and the technology program of the Dutch Ministry of Economic Affairs.

Fig. 1. Latent coordinates and linear trajectories in the latent space.

References

1. T.F. Cox and M.A.A. Cox. Multidimensional Scaling. Number 59 in Monographs on Statistics and Applied Probability. Chapman & Hall, 1994.
2. Z. Ghahramani and G.E. Hinton. The EM Algorithm for Mixtures of Factor Analyzers. Technical Report CRG-TR-96-1, University of Toronto, Canada, 1996.
3. T. Kohonen. Self-Organizing Maps. Springer Series in Information Sciences. Springer-Verlag, Heidelberg, Germany, 2001.
4. R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M.I. Jordan, editor, Learning in Graphical Models, pages 355-368. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998.
5. S.T. Roweis, L.K. Saul, and G.E. Hinton. Global coordination of local linear models. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, 2002.
6. J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000.
7. M.E. Tipping and C.M. Bishop. Mixtures of probabilistic principal component analysers. Neural Computation, 11(2):443-482, 1999.
8. J.J. Verbeek, N. Vlassis, and B. Kröse. The Generative Self-Organizing Map: A Probabilistic Generalization of Kohonen's SOM. Technical Report IAS-UVA-02-03, Informatics Institute, University of Amsterdam, The Netherlands, May 2002.
9. J.J. Verbeek, N. Vlassis, and B. Kröse. Procrustes Analysis to Coordinate Mixtures of Probabilistic Principal Component Analyzers. Technical report, Informatics Institute, University of Amsterdam, The Netherlands, February 2002.
10. N. Vlassis, Y. Motomura, and B. Kröse. Supervised dimension reduction of intrinsically low-dimensional data. Neural Computation, 14(1):191-215, January 2002.