
The EM Algorithm for Mixtures of Factor Analyzers

Zoubin Ghahramani
Geoffrey E. Hinton
Department of Computer Science
University of Toronto
6 King's College Road
Toronto, Canada M5S 1A4
Email: zoubin@cs.toronto.edu

Technical Report CRG-TR-96-1
May 21, 1996 (revised Feb 27, 1997)

Abstract

Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers.

1 Introduction

Clustering and dimensionality reduction have long been considered two of the fundamental problems in unsupervised learning (Duda & Hart, 1973; Chapter 6). In clustering, the goal is to group data points by similarity between their features. Conversely, in dimensionality reduction, the goal is to group (or compress) features that are highly correlated. In this paper we present an EM learning algorithm for a method which combines one of the basic forms of dimensionality reduction, factor analysis, with a basic method for clustering, the Gaussian mixture model. What results is a statistical method which concurrently performs clustering and, within each cluster, local dimensionality reduction.

Local dimensionality reduction presents several benefits over a scheme in which clustering and dimensionality reduction are performed separately. First, different features may be correlated within different clusters and thus the metric for dimensionality reduction may need to vary between different clusters. Conversely, the metric induced in dimensionality reduction may guide the process of cluster formation, i.e., different clusters may appear more separated depending on the local metric.

Recently, there has been a great deal of research on the topic of local dimensionality reduction, resulting in several variants on the basic concept with successful applications to character and face recognition (Bregler and Omohundro, 1994; Kambhatla and Leen, 1994; Sung and Poggio, 1994; Schwenk and Milgram, 1995; Hinton et al., 1995). The algorithm used by these authors for dimensionality reduction is principal components analysis (PCA).

PCA, unlike maximum likelihood factor analysis (FA), does not define a proper density model for the data, as the cost of coding a data point is equal anywhere along the principal component subspace (i.e., the density is un-normalized along these directions). Furthermore, PCA is not robust to independent noise in the features of the data (see Hinton et al., 1996, for a comparison of PCA and FA models). Hinton, Dayan, and Revow (1996), also exploring an application to digit recognition, were the first to extend mixtures of principal components analyzers to a mixture of factor analyzers. Their learning algorithm consisted of an outer loop of approximate EM to fit the mixture components, combined with an inner loop of gradient descent to fit each individual factor model. In this note we present an exact EM algorithm for mixtures of factor analyzers which obviates the need for an outer and inner loop. This simplifies the implementation, reduces the number of heuristic parameters (i.e., learning rates or steps of conjugate gradient descent), and can potentially result in speed-ups.

In the next section we present background material on factor analysis and the EM algorithm. This is followed by the derivation of the learning algorithm for mixtures of factor analyzers in Section 3. We close with a discussion in Section 4.

2 Factor Analysis

In maximum likelihood factor analysis (FA), a p-dimensional real-valued data vector x is modeled using a k-dimensional vector of real-valued factors, z, where k is generally much smaller than p (Everitt, 1984). The generative model is given by:

$$
x = \Lambda z + u, \tag{1}
$$

where $\Lambda$ is known as the factor loading matrix (see Figure 1). The factors z are assumed to be $N(0, I)$ distributed (zero-mean independent normals, with unit variance). The p-dimensional random variable u is distributed $N(0, \Psi)$, where $\Psi$ is a diagonal matrix. The diagonality of $\Psi$ is one of the key assumptions of factor analysis: the observed variables are independent given the factors. According to this model, x is therefore distributed with zero mean and covariance $\Lambda\Lambda' + \Psi$, and the goal of factor analysis is to find the $\Lambda$ and $\Psi$ that best model the covariance structure of x. The factor variables z model correlations between the elements of x, while the u variables account for independent noise in each element of x.

Figure 1: The factor analysis generative model (in vector form).

The k factors play the same role as the principal components in PCA: they are informative projections of the data. Given $\Lambda$ and $\Psi$, the expected value of the factors can be computed through the linear projection:

$$
E(z \mid x) = \beta x, \tag{2}
$$

where $\beta \equiv \Lambda'(\Psi + \Lambda\Lambda')^{-1}$, a fact that results from the joint normality of data and factors:

$$
P\left(\begin{bmatrix} x \\ z \end{bmatrix}\right) = N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Lambda\Lambda' + \Psi & \Lambda \\ \Lambda' & I \end{bmatrix}\right). \tag{3}
$$

Note that since $\Psi$ is diagonal, the $p \times p$ matrix $(\Psi + \Lambda\Lambda')$ can be efficiently inverted using the matrix inversion lemma:

$$
(\Psi + \Lambda\Lambda')^{-1} = \Psi^{-1} - \Psi^{-1}\Lambda \left(I + \Lambda'\Psi^{-1}\Lambda\right)^{-1} \Lambda'\Psi^{-1},
$$

where I is the $k \times k$ identity matrix. Furthermore, it is possible (and in fact necessary for EM) to compute the second moment of the factors,

$$
E(zz' \mid x) = \mathrm{Var}(z \mid x) + E(z \mid x)\, E(z \mid x)' = I - \beta\Lambda + \beta x x' \beta', \tag{4}
$$

which provides a measure of uncertainty in the factors, a quantity that has no analogue in PCA.

The expectations (2) and (4) form the basis of the EM algorithm for maximum likelihood factor analysis (see Appendix A and Rubin & Thayer, 1982):

E-step: Compute $E(z \mid x_i)$ and $E(zz' \mid x_i)$ for each data point $x_i$, given $\Lambda$ and $\Psi$.

M-step:

$$
\Lambda^{\text{new}} = \left( \sum_{i=1}^{n} x_i\, E(z \mid x_i)' \right) \left( \sum_{l=1}^{n} E(zz' \mid x_l) \right)^{-1} \tag{5}
$$

$$
\Psi^{\text{new}} = \frac{1}{n} \operatorname{diag}\left\{ \sum_{i=1}^{n} \left( x_i x_i' - \Lambda^{\text{new}} E(z \mid x_i)\, x_i' \right) \right\}, \tag{6}
$$

where the diag operator sets all the off-diagonal elements of a matrix to zero.

3 Mixture of Factor Analyzers

Assume we have a mixture of m factor analyzers indexed by $\omega_j$, $j = 1, \ldots, m$. The generative model now obeys the following mixture distribution (see Figure 2):

$$
P(x) = \sum_{j=1}^{m} \int P(x \mid z, \omega_j)\, P(z \mid \omega_j)\, P(\omega_j)\, dz. \tag{7}
$$

As in regular factor analysis, the factors are all assumed to be $N(0, I)$ distributed; therefore,

$$
P(z \mid \omega_j) = P(z) = N(0, I). \tag{8}
$$
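Returning briefly to the single factor analyzer of Section 2, whose posterior moments each mixture component reuses in equations (13) and (14), the E-step quantities of equations (2) to (4) can be computed as in the following minimal sketch. It is an independent Python/NumPy illustration (the note's own code is Matlab), it assumes the data have already been mean-centred, and the function name `fa_posterior` is hypothetical; $\Psi$ is passed as the vector of its diagonal. Only a $k \times k$ matrix is inverted, as the matrix inversion lemma allows.

```python
import numpy as np

def fa_posterior(X, Lam, psi):
    """E-step of factor analysis, following equations (2)-(4).

    X   : (n, p) data, assumed already mean-centred.
    Lam : (p, k) factor loading matrix Lambda.
    psi : (p,) diagonal entries of Psi.
    Returns Ez (n, k), whose rows are E(z|x_i), and Ezz (n, k, k) holding E(zz'|x_i).
    """
    k = Lam.shape[1]
    psi_inv = 1.0 / psi
    PiL = psi_inv[:, None] * Lam                       # Psi^{-1} Lambda, shape (p, k)
    # Matrix inversion lemma: only (I + Lambda' Psi^{-1} Lambda), k x k, is inverted.
    inner = np.linalg.inv(np.eye(k) + Lam.T @ PiL)
    cov_inv = np.diag(psi_inv) - PiL @ inner @ PiL.T   # (Psi + Lambda Lambda')^{-1}
    beta = Lam.T @ cov_inv                             # beta = Lambda'(Psi + Lambda Lambda')^{-1}
    Ez = X @ beta.T                                    # equation (2), one row per data point
    post_cov = np.eye(k) - beta @ Lam                  # Var(z|x) = I - beta Lambda, same for every x
    Ezz = post_cov[None, :, :] + Ez[:, :, None] * Ez[:, None, :]  # equation (4)
    return Ez, Ezz
```

Because $\beta$ does not depend on the data point, it is computed once and applied to all rows of `X`; the per-point cost is then a $k \times p$ product plus a $k \times k$ outer product.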

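Combining that E-step with the M-step of equations (5) and (6) gives a complete EM loop for a single factor analyzer. The sketch below is again an illustration under stated assumptions: it uses NumPy, centres the data itself, uses the hypothetical name `fa_em`, and starts from small random loadings and the per-feature variances for $\Psi$ (illustrative choices, not prescribed by the note). For brevity it inverts the $p \times p$ matrix directly instead of using the inversion lemma.

```python
import numpy as np

def fa_em(X, k, n_iter=200, seed=0):
    """Maximum likelihood factor analysis by EM, following equations (2)-(6).

    X is an (n, p) data matrix; it is centred here because the model of
    equation (1) has zero mean. Returns Lam (p, k) and psi, the diagonal of Psi.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    n, p = X.shape
    Lam = rng.normal(scale=0.1, size=(p, k))  # assumed start: small random loadings
    psi = X.var(axis=0)                        # assumed start: per-feature variances
    for _ in range(n_iter):
        # E-step, equations (2) and (4): beta = Lambda'(Psi + Lambda Lambda')^{-1}
        beta = Lam.T @ np.linalg.inv(np.diag(psi) + Lam @ Lam.T)
        Ez = X @ beta.T                                      # rows are E(z|x_i)
        sum_Ezz = n * (np.eye(k) - beta @ Lam) + Ez.T @ Ez   # sum_l E(zz'|x_l)
        # M-step, equation (5)
        Lam = (X.T @ Ez) @ np.linalg.inv(sum_Ezz)
        # M-step, equation (6): keep only the diagonal
        psi = np.diag(X.T @ X - Lam @ Ez.T @ X) / n
    return Lam, psi
```

The sums over data points in equations (5) and (6) become single matrix products here: $\sum_i x_i E(z \mid x_i)'$ is `X.T @ Ez`, and $\sum_i \Lambda^{\text{new}} E(z \mid x_i) x_i'$ is `Lam @ Ez.T @ X`.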
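Figure 2: The mixture of factor analysis generative model.

Whereas in factor analysis the data mean was irrelevant and was subtracted before fitting the model, here we have the freedom to give each factor analyzer a different mean, $\mu_j$, thereby allowing each to model the data covariance structure in a different part of input space,

$$
P(x \mid z, \omega_j) = N(\mu_j + \Lambda_j z, \Psi). \tag{9}
$$

The parameters of this model are $\{(\mu_j, \Lambda_j)_{j=1}^{m}, \pi, \Psi\}$;¹ the vector $\pi$ parametrizes the adaptable mixing proportions, $\pi_j = P(\omega_j)$. The latent variables in this model are the factors z and the mixture indicator variable $\omega$, where $w_j = 1$ when the data point was generated by $\omega_j$. For the E-step of the EM algorithm, one needs to compute expectations of all the interactions of the hidden variables that appear in the log likelihood. Fortunately, the following statements can be easily verified,

$$
E[w_j z \mid x_i] = E[w_j \mid x_i]\, E[z \mid \omega_j, x_i] \tag{10}
$$

$$
E[w_j zz' \mid x_i] = E[w_j \mid x_i]\, E[zz' \mid \omega_j, x_i]. \tag{11}
$$

Defining

$$
h_{ij} = E[w_j \mid x_i] \propto P(x_i, \omega_j) = \pi_j\, N(x_i - \mu_j,\ \Lambda_j\Lambda_j' + \Psi), \tag{12}
$$

where $N(x_i - \mu_j, \Lambda_j\Lambda_j' + \Psi)$ denotes the zero-mean Gaussian density with that covariance evaluated at $x_i - \mu_j$, and the $h_{ij}$ are normalized so that $\sum_j h_{ij} = 1$, and using equations (2) and (10) we obtain

$$
E[w_j z \mid x_i] = h_{ij}\, \beta_j (x_i - \mu_j), \tag{13}
$$

where $\beta_j \equiv \Lambda_j'(\Psi + \Lambda_j\Lambda_j')^{-1}$. Similarly, using equations (4) and (11) we obtain

$$
E[w_j zz' \mid x_i] = h_{ij} \left( I - \beta_j\Lambda_j + \beta_j (x_i - \mu_j)(x_i - \mu_j)' \beta_j' \right). \tag{14}
$$

The EM algorithm for mixtures of factor analyzers therefore becomes:

E-step: Compute $h_{ij}$, $E[z \mid x_i, \omega_j]$ and $E[zz' \mid x_i, \omega_j]$ for all data points i and mixture components j.

M-step: Solve a set of linear equations for $\pi_j$, $\Lambda_j$, $\mu_j$ and $\Psi$ (see Appendix B).

The mixture of factor analyzers is, in essence, a reduced dimensionality mixture of Gaussians. Each factor analyzer fits a Gaussian to a portion of the data, weighted by the posterior probabilities, $h_{ij}$. Since the covariance matrix for each Gaussian is specified through the lower dimensional factor loading matrices, the model has $mkp + p$, rather than $mp(p+1)/2$, parameters dedicated to modeling covariance structure.

¹ Note that each model can also be allowed to have a separate $\Psi$ matrix. This, however, changes its interpretation as sensor noise.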
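To make the parameter count concrete, the small helper below (a hypothetical name, with illustrative sizes that do not come from the note) evaluates $mkp + p$ against $mp(p+1)/2$. With m = 10 components, k = 4 factors and p = 64 input dimensions, the factor analyzers spend 2,624 parameters on covariance structure, whereas ten full-covariance Gaussians would need 20,800.

```python
def covariance_param_counts(m, k, p):
    """Number of parameters spent on covariance structure.

    Mixture of factor analyzers: m loading matrices of size p x k plus the
    p diagonal entries of the shared Psi. Mixture of full-covariance
    Gaussians: m symmetric p x p matrices with p(p+1)/2 free entries each.
    """
    mfa = m * k * p + p
    full_gaussians = m * p * (p + 1) // 2
    return mfa, full_gaussians

print(covariance_param_counts(m=10, k=4, p=64))  # (2624, 20800)
```

Dividing both counts by $mp$ shows the saving holds whenever $k + 1/m < (p+1)/2$, so it is large precisely in the regime the note targets, where k is much smaller than p.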

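The E-step of equations (12) to (14) and the M-step derived in Appendix B (equations (15), (16) and $\pi_j^{\text{new}} = \frac{1}{n}\sum_i h_{ij}$) combine into the following minimal NumPy sketch of the full algorithm with a shared diagonal $\Psi$. The function name `mfa_em`, the initialisation (means at randomly chosen data points, small random loadings, $\Psi$ at the per-feature variances, equal mixing proportions) and the log-domain normalisation of $h_{ij}$ are illustrative assumptions, not taken from the note. The sketch also records the data log likelihood at each iteration, which exact EM should never decrease.

```python
import numpy as np

def mfa_em(X, m, k, n_iter=100, seed=0):
    """EM for a mixture of m factor analyzers with k factors each and a shared
    diagonal Psi. Returns (pi, mu, Lam, psi, loglik), where loglik[t] is the
    data log likelihood under the parameters at the start of iteration t."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X[rng.choice(n, size=m, replace=False)].astype(float)  # (m, p) initial means
    Lam = rng.normal(scale=0.1, size=(m, p, k))                 # (m, p, k) initial loadings
    psi = X.var(axis=0)                                         # diagonal of Psi
    pi = np.full(m, 1.0 / m)
    loglik = []
    for _ in range(n_iter):
        # E-step: log[pi_j N(x_i - mu_j, Lambda_j Lambda_j' + Psi)] and factor moments.
        logp = np.empty((n, m))
        Ez = np.empty((m, n, k))   # E[z | x_i, omega_j] = beta_j (x_i - mu_j)
        Vz = np.empty((m, k, k))   # I - beta_j Lambda_j, shared by all i
        for j in range(m):
            C = Lam[j] @ Lam[j].T + np.diag(psi)
            Cinv = np.linalg.inv(C)
            _, logdet = np.linalg.slogdet(C)
            D = X - mu[j]
            maha = np.einsum('ip,pq,iq->i', D, Cinv, D)
            logp[:, j] = np.log(pi[j]) - 0.5 * (p * np.log(2 * np.pi) + logdet + maha)
            beta = Lam[j].T @ Cinv
            Ez[j] = D @ beta.T
            Vz[j] = np.eye(k) - beta @ Lam[j]
        top = logp.max(axis=1, keepdims=True)        # subtract the row max before exp
        h = np.exp(logp - top)
        norm = h.sum(axis=1, keepdims=True)
        loglik.append(float(np.sum(top + np.log(norm))))
        h /= norm                                    # equation (12), rows sum to 1
        # M-step: equation (15) per component, then (16) and the mixing proportions.
        psi_acc = np.zeros(p)
        for j in range(m):
            hj = h[:, j]
            Ezt = np.hstack([Ez[j], np.ones((n, 1))])   # rows are E[z~ | x_i, omega_j]'
            A = (Ezt * hj[:, None]).T @ Ezt             # sum_l h_lj E[z~ z~' | x_l, omega_j] ...
            A[:k, :k] += hj.sum() * Vz[j]               # ... including the posterior covariance
            B = (X * hj[:, None]).T @ Ezt               # sum_i h_ij x_i E[z~ | x_i, omega_j]'
            Lt = B @ np.linalg.inv(A)                   # [Lambda_j  mu_j], equation (15)
            Lam[j], mu[j] = Lt[:, :k], Lt[:, k]
            # diag of sum_i h_ij (x_i - Lt E[z~ | x_i, omega_j]) x_i', for equation (16)
            psi_acc += np.einsum('i,ip,ip->p', hj, X - Ezt @ Lt.T, X)
        psi = psi_acc / n
        pi = h.mean(axis=0)                             # pi_j = (1/n) sum_i h_ij
    return pi, mu, Lam, psi, loglik
```

The augmented factor vector of Appendix B appears as `Ezt`: appending a constant 1 to each posterior mean lets a single linear solve return both $\Lambda_j$ and $\mu_j$. Because the loop inverts $p \times p$ matrices directly, this sketch suits modest p; for large p the matrix inversion lemma from Section 2 would be the natural replacement.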
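4 Discussion

We have described an EM algorithm for fitting a mixture of factor analyzers. Matlab source code for the algorithm can be obtained from ftp://ftp.cs.toronto.edu/pub/zoubin/mfa.tar.gz. An extension of this architecture to time series data, in which both the factors z and the discrete variables $\omega$ depend on their value at a previous time step, is currently being developed.

One of the important issues not addressed in this note is model selection. In fitting a mixture of factor analyzers the modeler has two free parameters to decide: the number of factor analyzers to use (m), and the number of factors in each analyzer (k). One method by which these can be selected is cross-validation: several values of m and k are fit to the data and the log likelihood on a validation set is used to select the final values. Greedy methods based on pruning or growing the mixture may be more efficient at the cost of some performance loss. Alternatively, a full-fledged Bayesian analysis, in which these model parameters are integrated over, may also be possible.

Acknowledgements

We thank C. Bishop for comments on the manuscript. The research was funded by grants from the Canadian Natural Science and Engineering Research Council and the Ontario Information Technology Research Center. GEH is the Nesbitt-Burns fellow of the Canadian Institute for Advanced Research.

A EM for Factor Analysis

The expected log likelihood for factor analysis is

$$
\begin{aligned}
Q &= E\left[ \log \prod_i (2\pi)^{-p/2} |\Psi|^{-1/2} \exp\left\{ -\tfrac{1}{2} [x_i - \Lambda z]' \Psi^{-1} [x_i - \Lambda z] \right\} \right] \\
&= c - \frac{n}{2} \log|\Psi| - \sum_i E\left[ \tfrac{1}{2} x_i'\Psi^{-1}x_i - x_i'\Psi^{-1}\Lambda z + \tfrac{1}{2} z'\Lambda'\Psi^{-1}\Lambda z \right] \\
&= c - \frac{n}{2} \log|\Psi| - \sum_i \left( \tfrac{1}{2} x_i'\Psi^{-1}x_i - x_i'\Psi^{-1}\Lambda\, E[z \mid x_i] + \tfrac{1}{2} \operatorname{tr}\left[ \Lambda'\Psi^{-1}\Lambda\, E[zz' \mid x_i] \right] \right),
\end{aligned}
$$

where c is a constant, independent of the parameters, and tr is the trace operator.

To re-estimate the factor loading matrix we set

$$
\frac{\partial Q}{\partial \Lambda} = \sum_i \Psi^{-1} x_i\, E[z \mid x_i]' - \sum_l \Psi^{-1} \Lambda^{\text{new}} E[zz' \mid x_l] = 0,
$$

obtaining

$$
\Lambda^{\text{new}} \left( \sum_l E[zz' \mid x_l] \right) = \sum_i x_i\, E[z \mid x_i]',
$$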

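from which we get equation (5).

We re-estimate the matrix $\Psi$ through its inverse, setting

$$
\frac{\partial Q}{\partial \Psi^{-1}} = \frac{n}{2}\Psi^{\text{new}} - \sum_i \left( \tfrac{1}{2} x_i x_i' - \Lambda^{\text{new}} E[z \mid x_i]\, x_i' + \tfrac{1}{2} \Lambda^{\text{new}} E[zz' \mid x_i]\, (\Lambda^{\text{new}})' \right) = 0.
$$

Substituting equation (5),

$$
\frac{n}{2}\Psi^{\text{new}} = \sum_i \left( \tfrac{1}{2} x_i x_i' - \tfrac{1}{2} \Lambda^{\text{new}} E[z \mid x_i]\, x_i' \right)
$$

and using the diagonal constraint,

$$
\Psi^{\text{new}} = \frac{1}{n} \operatorname{diag}\left\{ \sum_i \left( x_i x_i' - \Lambda^{\text{new}} E[z \mid x_i]\, x_i' \right) \right\}.
$$

B EM for Mixture of Factor Analyzers

The expected log likelihood for a mixture of factor analyzers is

$$
Q = E\left[ \log \prod_i \prod_j \left\{ (2\pi)^{-p/2} |\Psi|^{-1/2} \exp\left\{ -\tfrac{1}{2} [x_i - \mu_j - \Lambda_j z]' \Psi^{-1} [x_i - \mu_j - \Lambda_j z] \right\} \right\}^{w_j} \right].
$$

To jointly estimate the mean $\mu_j$ and the factor loadings $\Lambda_j$ it is useful to define an augmented column vector of factors

$$
\tilde{z} = \begin{bmatrix} z \\ 1 \end{bmatrix}
$$

and an augmented factor loading matrix $\tilde{\Lambda}_j = [\Lambda_j \ \ \mu_j]$. The expected log likelihood is then

$$
\begin{aligned}
Q &= E\left[ \log \prod_i \prod_j \left\{ (2\pi)^{-p/2} |\Psi|^{-1/2} \exp\left\{ -\tfrac{1}{2} [x_i - \tilde{\Lambda}_j \tilde{z}]' \Psi^{-1} [x_i - \tilde{\Lambda}_j \tilde{z}] \right\} \right\}^{w_j} \right] \\
&= c - \frac{n}{2}\log|\Psi| - \sum_{i,j} \left( \tfrac{1}{2} h_{ij}\, x_i'\Psi^{-1}x_i - h_{ij}\, x_i'\Psi^{-1}\tilde{\Lambda}_j E[\tilde{z} \mid x_i, \omega_j] + \tfrac{1}{2} h_{ij} \operatorname{tr}\left[ \tilde{\Lambda}_j'\Psi^{-1}\tilde{\Lambda}_j E[\tilde{z}\tilde{z}' \mid x_i, \omega_j] \right] \right),
\end{aligned}
$$

where c is a constant. To estimate $\tilde{\Lambda}_j$ we set

$$
\frac{\partial Q}{\partial \tilde{\Lambda}_j} = \sum_i \left( h_{ij}\, \Psi^{-1} x_i\, E[\tilde{z} \mid x_i, \omega_j]' - h_{ij}\, \Psi^{-1} \tilde{\Lambda}_j^{\text{new}} E[\tilde{z}\tilde{z}' \mid x_i, \omega_j] \right) = 0.
$$

This results in a linear equation for re-estimating the means and factor loadings,

$$
\left[ \Lambda_j^{\text{new}} \ \ \mu_j^{\text{new}} \right] = \tilde{\Lambda}_j^{\text{new}} = \left( \sum_i h_{ij}\, x_i\, E[\tilde{z} \mid x_i, \omega_j]' \right) \left( \sum_l h_{lj}\, E[\tilde{z}\tilde{z}' \mid x_l, \omega_j] \right)^{-1} \tag{15}
$$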

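where

$$
E[\tilde{z} \mid x_i, \omega_j] = \begin{bmatrix} E[z \mid x_i, \omega_j] \\ 1 \end{bmatrix}
$$

and

$$
E[\tilde{z}\tilde{z}' \mid x_l, \omega_j] = \begin{bmatrix} E[zz' \mid x_l, \omega_j] & E[z \mid x_l, \omega_j] \\ E[z \mid x_l, \omega_j]' & 1 \end{bmatrix}.
$$

We re-estimate the matrix $\Psi$ through its inverse, setting

$$
\frac{\partial Q}{\partial \Psi^{-1}} = \frac{n}{2}\Psi^{\text{new}} - \sum_{ij} \left( \tfrac{1}{2} h_{ij}\, x_i x_i' - h_{ij}\, \tilde{\Lambda}_j^{\text{new}} E[\tilde{z} \mid x_i, \omega_j]\, x_i' + \tfrac{1}{2} h_{ij}\, \tilde{\Lambda}_j^{\text{new}} E[\tilde{z}\tilde{z}' \mid x_i, \omega_j]\, (\tilde{\Lambda}_j^{\text{new}})' \right) = 0.
$$

Substituting equation (15) for $\tilde{\Lambda}_j$ and using the diagonal constraint on $\Psi$ we obtain,

$$
\Psi^{\text{new}} = \frac{1}{n} \operatorname{diag}\left\{ \sum_{ij} h_{ij} \left( x_i - \tilde{\Lambda}_j^{\text{new}} E[\tilde{z} \mid x_i, \omega_j] \right) x_i' \right\}. \tag{16}
$$

Finally, to re-estimate the mixing proportions we use the definition,

$$
\pi_j = P(\omega_j) = \int P(\omega_j \mid x)\, P(x)\, dx.
$$

Since $h_{ij} = P(\omega_j \mid x_i)$, using the empirical distribution of the data as an estimate of P(x) we get

$$
\pi_j^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} h_{ij}.
$$

References

Bregler, C. and Omohundro, S. M. (1994). Surface learning with applications to lip-reading. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 43-50. Morgan Kaufman Publishers, San Francisco, CA.

Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New York.

Everitt, B. S. (1984). An Introduction to Latent Variable Models. Chapman and Hall, London.

Hinton, G., Revow, M., and Dayan, P. (1995). Recognizing handwritten digits using mixtures of linear models. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 7, pages 1015-1022. MIT Press, Cambridge, MA.

Hinton, G. E., Dayan, P., and Revow, M. (1996). Modeling the manifolds of images of handwritten digits. Submitted for publication.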

Kambhatla, N. and Leen, T. K. (1994). Fast non-linear dimension reduction. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 152-159. Morgan Kaufman Publishers, San Francisco, CA.

Rubin, D. and Thayer, D. (1982). EM algorithms for ML factor analysis. Psychometrika, 47(1):69-76.

Schwenk, H. and Milgram, M. (1995). Transformation invariant autoassociation with application to handwritten character recognition. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 7, pages 991-998. MIT Press, Cambridge, MA.

Sung, K.-K. and Poggio, T. (1994). Example-based learning for view-based human face detection. MIT AI Memo 1521, CBCL Paper 112.