A Hierarchical Latent Variable Model for Data Visualization



IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 3, MARCH 1998

Christopher M. Bishop and Michael E. Tipping

Abstract: Visualization has proven to be a powerful and widely-applicable tool for the analysis and interpretation of multivariate data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space, it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and subclusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach on a toy data set, and we then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multiphase flows in oil pipelines, and to data in 36 dimensions derived from satellite images. A Matlab software implementation of the algorithm is publicly available from the World Wide Web.

Index Terms: Latent variables, data visualization, EM algorithm, hierarchical mixture model, density estimation, principal component analysis, factor analysis, maximum likelihood, clustering, statistics.

1 INTRODUCTION

Many algorithms for data visualization have been proposed by both the neural computing and statistics communities, most of which are based on a projection of the data onto a two-dimensional visualization space. While such algorithms can usefully display the structure of simple data sets, they often prove inadequate in the face of data sets which are more complex. A single two-dimensional projection, even if it is nonlinear, may be insufficient to capture all of the interesting aspects of the data set. For example, the projection which best separates two clusters may not be the best for revealing internal structure within one of the clusters. This motivates the consideration of a hierarchical model involving multiple two-dimensional visualization spaces.
The goal is that the top-level projection should display the entire data set, perhaps revealing the presence of clusters, while lower-level projections display internal structure within individual clusters, such as the presence of subclusters, which might not be apparent in the higher-level projections. Once we allow the possibility of many complementary visualization projections, we can consider each projection model to be relatively simple, for example, based on a linear projection, and compensate for the lack of flexibility of individual models by the overall flexibility of the complete hierarchy. The use of a hierarchy of relatively simple models offers greater ease of interpretation as well as the benefits of analytical and computational simplification. This philosophy for modeling complexity is similar in spirit to the "mixture of experts" approach for solving regression problems [1].

C.M. Bishop is with Microsoft Research, St. George House, 1 Guildhall Street, Cambridge CB2 3NH, U.K. E-mail: cmbishop@microsoft.com. M.E. Tipping is with the Neural Computing Research Group, Aston University, Birmingham B4 7ET, U.K. E-mail: m.e.tipping@aston.ac.uk. Manuscript received 3 Apr. 1997; revised 3 Jan. 1998. Recommended for acceptance by R.W. Picard. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 0633.

The algorithm discussed in this paper is based on a form of latent variable model which is closely related to both principal component analysis (PCA) and factor analysis. At the top level of the hierarchy we have a single visualization plot corresponding to one such model. By considering a probabilistic mixture of latent variable models we obtain a soft partitioning of the data set into "clusters," corresponding to the second level of the hierarchy. Subsequent levels, obtained using nested mixture representations, provide successively refined models of the data set. The construction of the hierarchical tree proceeds top down, and can be driven interactively by the user. At each stage of the algorithm the relevant model parameters are determined using the expectation-maximization (EM) algorithm.

In the next section, we review the latent-variable model, and, in Section 3, we discuss the extension to mixtures of such models.
This is further extended to hierarchical mixtures in Section 4, and is then used to formulate an interactive visualization algorithm in Section 5. We illustrate the operation of the algorithm in Section 6 using a simple toy data set. Then we apply the algorithm to a problem involving the monitoring of multiphase flows along oil pipes in Section 7 and to the interpretation of satellite image data in Section 8. Finally, extensions to the algorithm, and the relationships to other approaches, are discussed in Section 9.

2 LATENT VARIABLES

We begin by introducing a simple form of linear latent variable model and discussing its application to data analysis. Here we give an overview of the key concepts, and leave the detailed mathematical discussion to Appendix A. The aim is to find a representation of a multidimensional data set in terms of two latent (or "hidden") variables. Suppose the data space is $d$-dimensional with coordinates $y_1, \ldots, y_d$ and that the data set consists of a set of $d$-dimensional vectors $\{\mathbf{t}_n\}$ where $n = 1, \ldots, N$. Now consider a two-dimensional latent space $\mathbf{x} = (x_1, x_2)^{\mathrm{T}}$ together with a linear function which maps the latent space into the data space

$$\mathbf{y} = \mathbf{W}\mathbf{x} + \boldsymbol{\mu} \tag{1}$$

where $\mathbf{W}$ is a $d \times 2$ matrix and $\boldsymbol{\mu}$ is a $d$-dimensional vector. The mapping (1) defines a two-dimensional planar surface in the data space. If we introduce a prior probability distribution $p(\mathbf{x})$ over the latent space given by a zero-mean Gaussian with a unit covariance matrix, then (1) defines a singular Gaussian distribution in data space with mean $\boldsymbol{\mu}$ and covariance matrix $\langle (\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})^{\mathrm{T}} \rangle = \mathbf{W}\mathbf{W}^{\mathrm{T}}$. Finally, since we do not expect the data to be confined exactly to a two-dimensional sheet, we convolve this distribution with an isotropic Gaussian distribution $p(\mathbf{t} \mid \mathbf{x}, \sigma^2)$ in data space, having a mean of zero and covariance $\sigma^2 \mathbf{I}$, where $\mathbf{I}$ is the unit matrix. Using the rules of probability, the final density model is obtained from the convolution of the noise model with the prior distribution over latent space in the form

$$p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}. \tag{2}$$

Since this represents the convolution of two Gaussians, the integral can be evaluated analytically, resulting in a distribution $p(\mathbf{t})$ which corresponds to a $d$-dimensional Gaussian with mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2 \mathbf{I}$.

If we had considered a more general model in which the conditional distribution $p(\mathbf{t} \mid \mathbf{x})$ is given by a Gaussian with a general diagonal covariance matrix (having $d$ independent parameters), then we would obtain standard linear factor analysis [2], [3]. In fact, our model is more closely related to principal component analysis, as we now discuss.

The log likelihood function for this model is given by $L = \sum_{n} \ln p(\mathbf{t}_n)$, and maximum likelihood can be used to fit the model to the data and hence determine values for the parameters $\boldsymbol{\mu}$, $\mathbf{W}$, and $\sigma^2$. The solution for $\boldsymbol{\mu}$ is just given by the sample mean. In the case of the factor analysis model, the determination of $\mathbf{W}$ and $\sigma^2$ corresponds to a nonlinear optimization which must be performed iteratively.
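As a minimal sketch of the density model above, the following NumPy function evaluates the log of $p(\mathbf{t})$ as a Gaussian with mean $\boldsymbol{\mu}$ and covariance $\mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2 \mathbf{I}$. The function name `ppca_log_density` and its argument layout (one data point per row) are assumptions made for illustration; they are not taken from the authors' Matlab implementation.

```python
import numpy as np

def ppca_log_density(T, W, mu, sigma2):
    """Log of p(t) = N(t | mu, W W^T + sigma2 * I), one value per row of T.

    Hypothetical helper: T is (N, d), W is (d, 2), mu is (d,), sigma2 > 0.
    """
    T = np.atleast_2d(T)
    d = W.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)      # marginal covariance stated in the text
    diff = T - mu
    _, logdet = np.linalg.slogdet(C)
    # Squared Mahalanobis distance of each row, without forming C^{-1} explicitly.
    maha = np.einsum("ni,ni->n", diff, np.linalg.solve(C, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
```

Summing the returned values over the data set gives the log likelihood $L$ that is maximized with respect to $\boldsymbol{\mu}$, $\mathbf{W}$, and $\sigma^2$.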
For the isotropic noise covariance matrix, however, it was shown by Tipping and Bishop [4], [5] that there is an exact closed form solution as follows. If we introduce the sample covariance matrix given by

$$\mathbf{S} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{t}_n - \boldsymbol{\mu})(\mathbf{t}_n - \boldsymbol{\mu})^{\mathrm{T}}, \tag{3}$$

then the only nonzero stationary points of the likelihood occur for:

$$\mathbf{W} = \mathbf{U}(\boldsymbol{\Lambda} - \sigma^2 \mathbf{I})^{1/2} \mathbf{R}, \tag{4}$$

where the two columns of the matrix $\mathbf{U}$ are eigenvectors of $\mathbf{S}$, with corresponding eigenvalues in the diagonal matrix $\boldsymbol{\Lambda}$, and $\mathbf{R}$ is an arbitrary $2 \times 2$ orthogonal rotation matrix. Furthermore, it was shown that the stationary point corresponding to the global maximum of the likelihood occurs when the columns of $\mathbf{U}$ comprise the two principal eigenvectors of $\mathbf{S}$ (i.e., the eigenvectors corresponding to the two largest eigenvalues) and that all other combinations of eigenvectors represent saddle-points of the likelihood surface. It was also shown that the maximum-likelihood estimator of $\sigma^2$ is given by

$$\sigma^2_{\mathrm{ML}} = \frac{1}{d - 2} \sum_{j=3}^{d} \lambda_j, \tag{5}$$

which has a clear interpretation as the variance "lost" in the projection, averaged over the lost dimensions.

Unlike conventional PCA, however, our model defines a probability density in data space, and this is important for the subsequent hierarchical development of the model. The choice of a radially symmetric rather than a more general diagonal covariance matrix for $p(\mathbf{t} \mid \mathbf{x})$ is motivated by the desire for greater ease of interpretability of the visualization results, since the projections of the data points onto the latent plane in data space correspond (for small values of $\sigma^2$) to an orthogonal projection as discussed in Appendix A.

Although we have an explicit solution for the maximum-likelihood parameter values, it was shown by Tipping and Bishop [4], [5] that significant computational savings can sometimes be achieved by using the following EM (expectation-maximization) algorithm [6], [7], [8]. Using (2), we can write the log likelihood function in the form

$$L = \sum_{n=1}^{N} \ln \int p(\mathbf{t}_n \mid \mathbf{x}_n)\, p(\mathbf{x}_n)\, d\mathbf{x}_n, \tag{6}$$

in which we can regard the quantities $\mathbf{x}_n$ as "missing" variables. The posterior distribution of the $\mathbf{x}_n$, given the observed $\mathbf{t}_n$ and the model parameters, is obtained using Bayes' theorem and again consists of a Gaussian distribution.
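The closed-form solution (3) to (5) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function name `ppca_closed_form` is hypothetical, the arbitrary rotation $\mathbf{R}$ is taken to be the identity, and the data dimensionality is assumed to exceed the latent dimensionality so that (5) is defined.

```python
import numpy as np

def ppca_closed_form(T, q=2):
    """Maximum-likelihood mu, W, sigma2 via equations (3)-(5), with R = I.

    Hypothetical helper: T is (N, d) with d > q; q is the latent dimension.
    """
    N, d = T.shape
    mu = T.mean(axis=0)                            # ML solution for mu
    S = (T - mu).T @ (T - mu) / N                  # sample covariance, eq. (3)
    eigvals, eigvecs = np.linalg.eigh(S)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort to descending
    lam, U = eigvals[order], eigvecs[:, order]
    sigma2 = lam[q:].mean()                        # eq. (5): mean discarded variance
    scale = np.sqrt(np.maximum(lam[:q] - sigma2, 0.0))
    W = U[:, :q] * scale                           # eq. (4): U (Lambda - sigma2 I)^{1/2}
    return mu, W, sigma2
```

By construction, the model covariance $\mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2 \mathbf{I}$ then shares the two leading eigenvalues of $\mathbf{S}$ and replaces the remaining ones by their average.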
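The E-step then involves the use of "old" parameter values to evaluate the sufficient statistics of this distribution in the form

$$\langle \mathbf{x}_n \rangle = \mathbf{M}^{-1} \mathbf{W}^{\mathrm{T}} (\mathbf{t}_n - \boldsymbol{\mu}) \tag{7}$$

$$\langle \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}} \rangle = \sigma^2 \mathbf{M}^{-1} + \langle \mathbf{x}_n \rangle \langle \mathbf{x}_n \rangle^{\mathrm{T}}, \tag{8}$$

where $\mathbf{M} = \mathbf{W}^{\mathrm{T}}\mathbf{W} + \sigma^2 \mathbf{I}$ is a $2 \times 2$ matrix, and $\langle \cdot \rangle$ denotes the expectation computed with respect to the posterior distribution of $\mathbf{x}$. The M-step then maximizes the expectation of the complete-data log likelihood to give

$$\widetilde{\mathbf{W}} = \left[ \sum_{n=1}^{N} (\mathbf{t}_n - \boldsymbol{\mu}) \langle \mathbf{x}_n \rangle^{\mathrm{T}} \right] \left[ \sum_{n=1}^{N} \langle \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}} \rangle \right]^{-1} \tag{9}$$

$$\widetilde{\sigma}^2 = \frac{1}{Nd} \sum_{n=1}^{N} \left\{ \| \mathbf{t}_n - \boldsymbol{\mu} \|^2 - 2 \langle \mathbf{x}_n \rangle^{\mathrm{T}} \widetilde{\mathbf{W}}^{\mathrm{T}} (\mathbf{t}_n - \boldsymbol{\mu}) + \mathrm{Tr}\left[ \langle \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}} \rangle \widetilde{\mathbf{W}}^{\mathrm{T}} \widetilde{\mathbf{W}} \right] \right\} \tag{10}$$

in which the tilde denotes "new" quantities. Note that the new value for $\widetilde{\mathbf{W}}$ obtained from (9) is used in the evaluation of $\widetilde{\sigma}^2$ in (10).

The model is trained by alternately evaluating the sufficient statistics of the latent-space posterior distribution using (7) and (8) for given $\sigma^2$ and $\mathbf{W}$ (the E-step), and reevaluating $\sigma^2$ and $\mathbf{W}$ using (9) and (10) for given $\langle \mathbf{x}_n \rangle$ and $\langle \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}} \rangle$ (the M-step). It can be shown that, at each stage of the EM algorithm, the likelihood is increased unless it is already at a local maximum, as demonstrated in Appendix E.

For $N$ data points in $d$ dimensions, evaluation of the sample covariance matrix requires $O(Nd^2)$ operations, and so any approach to finding the principal eigenvectors based on an explicit evaluation of the covariance matrix must have at least this order of computational complexity. By contrast, the EM algorithm involves steps which are only $O(Nd)$. This saving of computational cost is a consequence of having a latent space whose dimensionality (which, for the purposes of our visualization algorithm, is fixed at two) does not scale with $d$.

If we substitute the expressions for the expectations given by the E-step equations (7) and (8) into the M-step equations we obtain the following reestimation formulas

$$\widetilde{\mathbf{W}} = \mathbf{S}\mathbf{W} \left( \sigma^2 \mathbf{I} + \mathbf{M}^{-1} \mathbf{W}^{\mathrm{T}} \mathbf{S} \mathbf{W} \right)^{-1} \tag{11}$$

$$\widetilde{\sigma}^2 = \frac{1}{d} \mathrm{Tr}\left( \mathbf{S} - \mathbf{S}\mathbf{W}\mathbf{M}^{-1} \widetilde{\mathbf{W}}^{\mathrm{T}} \right) \tag{12}$$

which shows that all of the dependence on the data occurs through the sample covariance matrix $\mathbf{S}$. Thus the EM algorithm can be expressed as alternate evaluations of (11) and (12). (Note that (12) involves a combination of "old" and "new" quantities.) This form of the EM algorithm has been introduced for illustrative purposes only, and would involve $O(Nd^2)$ computational cost due to the evaluation of the covariance matrix.

We have seen that each data point $\mathbf{t}_n$ induces a Gaussian posterior distribution $p(\mathbf{x}_n \mid \mathbf{t}_n)$ in the latent space. For the purposes of visualization, however, it is convenient to summarize each such distribution by its mean, given by $\langle \mathbf{x}_n \rangle$, as illustrated in Fig. 1. Note that these quantities are obtained directly from the output of the E-step (7). Thus, a set of data points $\{\mathbf{t}_n\}$ where $n = 1, \ldots, N$ is projected onto a corresponding set of points $\{\langle \mathbf{x}_n \rangle\}$ in the two-dimensional latent space.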
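One EM cycle for the single latent variable model, together with the posterior means used for plotting, might be sketched as follows. The function name `ppca_em_step` is hypothetical, the mean is assumed to be held fixed at the sample mean, and the code works with the sums over data points rather than forming the sample covariance matrix, so each step costs $O(Nd)$ for a two-dimensional latent space.

```python
import numpy as np

def ppca_em_step(T, mu, W, sigma2):
    """One E-step (7)-(8) followed by one M-step (9)-(10).

    Hypothetical helper: T is (N, d), W is (d, q). Returns the new W, the new
    sigma2, and the posterior means <x_n> computed from the OLD parameters.
    """
    N, d = T.shape
    q = W.shape[1]
    Tc = T - mu
    M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))   # M is q x q
    X = Tc @ W @ M_inv                      # rows are <x_n>, eq. (7); M is symmetric
    sum_xx = N * sigma2 * M_inv + X.T @ X   # sum over n of <x_n x_n^T>, eq. (8)
    W_new = (Tc.T @ X) @ np.linalg.inv(sum_xx)             # eq. (9)
    sigma2_new = (
        np.sum(Tc ** 2)
        - 2.0 * np.sum(X * (Tc @ W_new))
        + np.trace(sum_xx @ W_new.T @ W_new)
    ) / (N * d)                                            # eq. (10)
    return W_new, sigma2_new, X
```

Iterating this function until the parameters stop changing, and then plotting the rows of the returned posterior means, gives the two-dimensional projection described above.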
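Fig. 1. Illustration of the projection of a data point onto the mean of the posterior distribution in latent space.

3 MIXTURES OF LATENT VARIABLE MODELS

We can perform an automatic soft clustering of the data set, and at the same time obtain multiple visualization plots corresponding to the clusters, by modeling the data with a mixture of latent variable models of the kind described in Section 2. The corresponding density model takes the form

$$p(\mathbf{t}) = \sum_{i=1}^{M_0} \pi_i\, p(\mathbf{t} \mid i) \tag{13}$$

where $M_0$ is the number of components in the mixture, and the parameters $\pi_i$ are the mixing coefficients, or prior probabilities, corresponding to the mixture components $p(\mathbf{t} \mid i)$. Each component is an independent latent variable model with parameters $\boldsymbol{\mu}_i$, $\mathbf{W}_i$, and $\sigma_i^2$. This mixture distribution will form the second level in our hierarchical model.

The EM algorithm can be extended to allow a mixture of the form (13) to be fitted to the data (see Appendix B for details). To derive the EM algorithm we note that, in addition to the $\{\mathbf{x}_n\}$, the missing data now also includes labels which specify which component is responsible for each data point. It is convenient to denote this missing data by a set of variables $z_{ni}$ where $z_{ni} = 1$ if $\mathbf{t}_n$ was generated by model $i$ (and zero otherwise). The prior expectations for these variables are given by the $\pi_i$ and the corresponding posterior probabilities, or responsibilities, are evaluated in the extended E-step using Bayes' theorem in the form

$$R_{ni} = P(i \mid \mathbf{t}_n) = \frac{\pi_i\, p(\mathbf{t}_n \mid i)}{\sum_{i'} \pi_{i'}\, p(\mathbf{t}_n \mid i')}. \tag{14}$$

Although a standard EM algorithm can be derived by treating the $\{\mathbf{x}_n\}$ and the $z_{ni}$ jointly as missing data, a more efficient algorithm can be obtained by considering a two-stage form of EM. At each complete cycle of the algorithm we commence with an "old" set of parameter values $\pi_i$, $\boldsymbol{\mu}_i$, $\mathbf{W}_i$, and $\sigma_i^2$. We first use these parameters to evaluate the posterior probabilities $R_{ni}$ using (14). These posterior probabilities are then used to obtain "new" values $\widetilde{\pi}_i$ and $\widetilde{\boldsymbol{\mu}}_i$ using the following reestimation formulas

$$\widetilde{\pi}_i = \frac{1}{N} \sum_{n} R_{ni} \tag{15}$$

$$\widetilde{\boldsymbol{\mu}}_i = \frac{\sum_{n} R_{ni}\, \mathbf{t}_n}{\sum_{n} R_{ni}}. \tag{16}$$

The new values $\widetilde{\boldsymbol{\mu}}_i$ are then used in evaluation of the sufficient statistics for the posterior distribution for $\mathbf{x}_{ni}$

$$\langle \mathbf{x}_{ni} \rangle = \mathbf{M}_i^{-1} \mathbf{W}_i^{\mathrm{T}} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_i) \tag{17}$$

$$\langle \mathbf{x}_{ni} \mathbf{x}_{ni}^{\mathrm{T}} \rangle = \sigma_i^2 \mathbf{M}_i^{-1} + \langle \mathbf{x}_{ni} \rangle \langle \mathbf{x}_{ni} \rangle^{\mathrm{T}} \tag{18}$$

where $\mathbf{M}_i = \mathbf{W}_i^{\mathrm{T}} \mathbf{W}_i + \sigma_i^2 \mathbf{I}$. Finally, these statistics are used to evaluate "new" values $\widetilde{\mathbf{W}}_i$ and $\widetilde{\sigma}_i^2$ using

$$\widetilde{\mathbf{W}}_i = \left[ \sum_{n} R_{ni} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_i) \langle \mathbf{x}_{ni} \rangle^{\mathrm{T}} \right] \left[ \sum_{n} R_{ni} \langle \mathbf{x}_{ni} \mathbf{x}_{ni}^{\mathrm{T}} \rangle \right]^{-1} \tag{19}$$

$$\widetilde{\sigma}_i^2 = \frac{1}{d \sum_{n} R_{ni}} \sum_{n} R_{ni} \left\{ \| \mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_i \|^2 - 2 \langle \mathbf{x}_{ni} \rangle^{\mathrm{T}} \widetilde{\mathbf{W}}_i^{\mathrm{T}} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_i) + \mathrm{Tr}\left[ \langle \mathbf{x}_{ni} \mathbf{x}_{ni}^{\mathrm{T}} \rangle \widetilde{\mathbf{W}}_i^{\mathrm{T}} \widetilde{\mathbf{W}}_i \right] \right\} \tag{20}$$

which are derived in Appendix B.

As for the single latent variable model, we can substitute the expressions for $\langle \mathbf{x}_{ni} \rangle$ and $\langle \mathbf{x}_{ni} \mathbf{x}_{ni}^{\mathrm{T}} \rangle$, given by (17) and (18), respectively, into (19) and (20). We then see that the reestimation formulae for $\widetilde{\mathbf{W}}_i$ and $\widetilde{\sigma}_i^2$ take the form

$$\widetilde{\mathbf{W}}_i = \mathbf{S}_i \mathbf{W}_i \left( \sigma_i^2 \mathbf{I} + \mathbf{M}_i^{-1} \mathbf{W}_i^{\mathrm{T}} \mathbf{S}_i \mathbf{W}_i \right)^{-1} \tag{21}$$

$$\widetilde{\sigma}_i^2 = \frac{1}{d} \mathrm{Tr}\left( \mathbf{S}_i - \mathbf{S}_i \mathbf{W}_i \mathbf{M}_i^{-1} \widetilde{\mathbf{W}}_i^{\mathrm{T}} \right), \tag{22}$$

where all of the data dependence has been expressed in terms of the quantities

$$\mathbf{S}_i = \frac{1}{N_i} \sum_{n} R_{ni} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_i)(\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_i)^{\mathrm{T}}, \tag{23}$$

and we have defined $N_i = \sum_{n} R_{ni}$. The matrix $\mathbf{S}_i$ can clearly be interpreted as a responsibility weighted covariance matrix. Again, for reasons of computational efficiency, the form of EM algorithm given by (17) to (20) is to be preferred if $d$ is large.

4 HIERARCHICAL MIXTURE MODELS

We now extend the mixture representation of Section 3 to give a hierarchical mixture model. Our formulation will be quite general and can be applied to mixtures of any parametric density model. So far we have considered a two-level system consisting of a single latent variable model at the top level and a mixture of $M_0$ such models at the second level. We can now extend the hierarchy to a third level by associating a group $\mathcal{G}_i$ of latent variable models with each model $i$ in the second level. The corresponding probability density can be written in the form

$$p(\mathbf{t}) = \sum_{i=1}^{M_0} \pi_i \sum_{j \in \mathcal{G}_i} \pi_{j|i}\, p(\mathbf{t} \mid i, j), \tag{24}$$

where $p(\mathbf{t} \mid i, j)$ again represent independent latent variable models, and $\pi_{j|i}$ correspond to sets of mixing coefficients, one for each $i$, which satisfy $\sum_{j} \pi_{j|i} = 1$. Thus, each level of the hierarchy corresponds to a generative model, with lower levels giving more refined and detailed representations. This model is illustrated in Fig. 2.

Fig. 2. The structure of the hierarchical model.

Determination of the parameters of the models at the third level can again be viewed as a missing data problem in which the missing information corresponds to labels specifying which model generated each data point. When no information about the labels is provided the log likelihood for the model (24) would take the form

$$L = \sum_{n=1}^{N} \ln \left\{ \sum_{i=1}^{M_0} \pi_i \sum_{j \in \mathcal{G}_i} \pi_{j|i}\, p(\mathbf{t}_n \mid i, j) \right\}. \tag{25}$$

If, however, we were given a set of indicator variables $z_{ni}$ specifying which model $i$ at the second level generated each data point $\mathbf{t}_n$ then the log likelihood would become

$$L = \sum_{n=1}^{N} \sum_{i=1}^{M_0} z_{ni} \ln \left\{ \pi_i \sum_{j \in \mathcal{G}_i} \pi_{j|i}\, p(\mathbf{t}_n \mid i, j) \right\}. \tag{26}$$

In fact, we only have partial, probabilistic, information in the form of the posterior responsibilities $R_{ni}$ for each model $i$ having generated the data points $\mathbf{t}_n$, obtained from the second level of the hierarchy. Taking the expectation of (26), we then obtain the log likelihood for the third level of the hierarchy in the form

$$L = \sum_{n=1}^{N} \sum_{i=1}^{M_0} R_{ni} \ln \left\{ \pi_i \sum_{j \in \mathcal{G}_i} \pi_{j|i}\, p(\mathbf{t}_n \mid i, j) \right\}, \tag{27}$$

in which the $R_{ni}$ are constants. In the particular case in which the $R_{ni}$ are all 0 or 1, corresponding to complete certainty about which model in the second level is responsible for each data point, the log likelihood (27) reduces to the form (26).

Maximization of (27) can again be performed using the EM algorithm, as discussed in Appendix C. This has the same form as the EM algorithm for a simple mixture, discussed in Section 3, except that in the E-step, the posterior probability that model $(i, j)$ generated data point $\mathbf{t}_n$ is given by

$$R_{ni,j} = R_{ni} R_{nj|i}, \tag{28}$$

in which

$$R_{nj|i} = \frac{\pi_{j|i}\, p(\mathbf{t}_n \mid i, j)}{\sum_{j'} \pi_{j'|i}\, p(\mathbf{t}_n \mid i, j')}. \tag{29}$$

Note that $R_{ni}$ are constants determined from the second level of the hierarchy, and $R_{nj|i}$ are functions of the "old" parameter values in the EM algorithm. The expression (29) automatically satisfies the relation

$$\sum_{j \in \mathcal{G}_i} R_{ni,j} = R_{ni} \tag{30}$$

so that the responsibility of each model at the second level for a given data point is shared by a partition of unity between the corresponding group of offspring models at the third level.

The corresponding EM algorithm can be derived by a straightforward extension of the discussion given in Section 3 and Appendix B, and is outlined in Appendix C. This shows that the M-step equations for the mixing coefficients and the means are given by

$$\widetilde{\pi}_{j|i} = \frac{\sum_{n} R_{ni,j}}{\sum_{n} R_{ni}}, \tag{31}$$

$$\widetilde{\boldsymbol{\mu}}_{i,j} = \frac{\sum_{n} R_{ni,j}\, \mathbf{t}_n}{\sum_{n} R_{ni,j}}. \tag{32}$$

The posterior expectations for the missing latent variables are then given by

$$\langle \mathbf{x}_{ni,j} \rangle = \mathbf{M}_{i,j}^{-1} \mathbf{W}_{i,j}^{\mathrm{T}} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_{i,j}) \tag{33}$$

$$\langle \mathbf{x}_{ni,j} \mathbf{x}_{ni,j}^{\mathrm{T}} \rangle = \sigma_{i,j}^2 \mathbf{M}_{i,j}^{-1} + \langle \mathbf{x}_{ni,j} \rangle \langle \mathbf{x}_{ni,j} \rangle^{\mathrm{T}}. \tag{34}$$

Finally, the $\mathbf{W}_{i,j}$ and $\sigma_{i,j}^2$ are updated using the M-step equations

$$\widetilde{\mathbf{W}}_{i,j} = \left[ \sum_{n} R_{ni,j} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_{i,j}) \langle \mathbf{x}_{ni,j} \rangle^{\mathrm{T}} \right] \left[ \sum_{n} R_{ni,j} \langle \mathbf{x}_{ni,j} \mathbf{x}_{ni,j}^{\mathrm{T}} \rangle \right]^{-1} \tag{35}$$

$$\widetilde{\sigma}_{i,j}^2 = \frac{1}{d \sum_{n} R_{ni,j}} \sum_{n} R_{ni,j} \left\{ \| \mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_{i,j} \|^2 - 2 \langle \mathbf{x}_{ni,j} \rangle^{\mathrm{T}} \widetilde{\mathbf{W}}_{i,j}^{\mathrm{T}} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_{i,j}) + \mathrm{Tr}\left[ \langle \mathbf{x}_{ni,j} \mathbf{x}_{ni,j}^{\mathrm{T}} \rangle \widetilde{\mathbf{W}}_{i,j}^{\mathrm{T}} \widetilde{\mathbf{W}}_{i,j} \right] \right\}. \tag{36}$$

Again, we can substitute the E-step equations into the M-step equations to obtain a set of update formulas of the form

$$\widetilde{\mathbf{W}}_{i,j} = \mathbf{S}_{i,j} \mathbf{W}_{i,j} \left( \sigma_{i,j}^2 \mathbf{I} + \mathbf{M}_{i,j}^{-1} \mathbf{W}_{i,j}^{\mathrm{T}} \mathbf{S}_{i,j} \mathbf{W}_{i,j} \right)^{-1} \tag{37}$$

$$\widetilde{\sigma}_{i,j}^2 = \frac{1}{d} \mathrm{Tr}\left( \mathbf{S}_{i,j} - \mathbf{S}_{i,j} \mathbf{W}_{i,j} \mathbf{M}_{i,j}^{-1} \widetilde{\mathbf{W}}_{i,j}^{\mathrm{T}} \right) \tag{38}$$

where all of the summations over $n$ have been expressed in terms of the quantities

$$\mathbf{S}_{i,j} = \frac{1}{N_{i,j}} \sum_{n} R_{ni,j} (\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_{i,j})(\mathbf{t}_n - \widetilde{\boldsymbol{\mu}}_{i,j})^{\mathrm{T}}, \tag{39}$$

in which we have defined $N_{i,j} = \sum_{n} R_{ni,j}$. The $\mathbf{S}_{i,j}$ can again be interpreted as responsibility-weighted covariance matrices.

It is straightforward to extend this hierarchical modeling technique to any desired number of levels, for any parametric family of component distributions.

5 THE VISUALIZATION ALGORITHM

So far, we have described the theory behind hierarchical mixtures of latent variable models, and have illustrated the overall form of the visualization hierarchy in Fig. 2. We now complete the description of our algorithm by considering the construction of the hierarchy, and its application to data visualization.

Although the tree structure of the hierarchy can be predefined, a more interesting possibility, with greater practical applicability, is to build the tree interactively. Our multilevel visualization algorithm begins by fitting a single latent variable model to the data set, in which the value of $\boldsymbol{\mu}$ is given by the sample mean.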
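The interactive construction relies throughout on the responsibility calculations of (14) and (28) to (30), which can be illustrated with a short sketch. Both function names are hypothetical, and the component likelihood values are assumed to have been computed already and are passed in as arrays with one row per data point.

```python
import numpy as np

def responsibilities(prior, lik):
    """Posterior over components as in (14) or (29).

    Hypothetical helper: prior is (K,), lik is (N, K) with lik[n, k] = p(t_n | k).
    """
    joint = prior * lik
    return joint / joint.sum(axis=1, keepdims=True)

def hierarchical_responsibilities(R_parent, cond_prior, lik_children):
    """Eq. (28): R_{ni,j} = R_{ni} * R_{nj|i} for the offspring of one parent i.

    R_parent is (N,), cond_prior is (J,), lik_children is (N, J).
    """
    return R_parent[:, None] * responsibilities(cond_prior, lik_children)
```

Because each row returned by `responsibilities` sums to one, the offspring responsibilities of a point sum to the responsibility of their parent, which is relation (30).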
For low values of the data space dimensionality $d$, we can find $\mathbf{W}$ and $\sigma^2$ directly by evaluating the covariance matrix and applying (4) and (5). However, for larger values of $d$, it may be computationally more efficient to apply the EM algorithm, and a scheme for initializing $\mathbf{W}$ and $\sigma^2$ is given in Appendix D. Once the EM algorithm has converged, the visualization plot is generated by plotting each data point $\mathbf{t}_n$ at the corresponding posterior mean $\langle \mathbf{x}_n \rangle$ in latent space.

On the basis of this plot, the user then decides on a suitable number of models to fit at the next level down, and selects points $\mathbf{x}^{(i)}$ on the plot corresponding, for example, to the centers of apparent clusters. The resulting points $\mathbf{y}^{(i)}$ in data space, obtained from (1), are then used to initialize the means $\boldsymbol{\mu}_i$ of the respective submodels. To initialize the remaining parameters of the mixture model, we first assign the data points to their nearest mean vector $\boldsymbol{\mu}_i$, and then either compute the corresponding sample covariance matrices and apply a direct eigenvector decomposition, or use the initialization scheme of Appendix D followed by the EM algorithm.

Having determined the parameters of the mixture model at the second level we can then obtain the corresponding set of visualization plots, in which the posterior means $\langle \mathbf{x}_{ni} \rangle$ are again used to plot the data points. For these, it is useful to plot all of the data points on every plot, but to modify the density of "ink" in proportion to the responsibility which each plot has for that particular data point. Thus, if one particular component takes most of the responsibility for a particular point, then that point will effectively be visible only on the corresponding plot. The projection of a data point onto the latent spaces for a mixture of two latent variable models is illustrated schematically in Fig. 3.
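The initialization of the submodel means from user-selected latent points can be sketched as below. The function name `init_submodels` and its return values are assumptions for illustration; the sketch covers only the mapping through (1) and the nearest-mean assignment, not the subsequent covariance decomposition or EM training.

```python
import numpy as np

def init_submodels(T, W, mu, selected_x):
    """Map selected latent points to data space via (1), then hard-assign data.

    Hypothetical helper: T is (N, d), W is (d, 2), mu is (d,), and selected_x is
    (K, 2) holding the points chosen on the parent plot.
    """
    centers = selected_x @ W.T + mu        # y^(i) = W x^(i) + mu, one row per submodel
    # Squared Euclidean distance from every data point to every center.
    d2 = ((T[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)             # index of the nearest mean vector
    return centers, labels
```

The returned labels identify, for each submodel, the subset of points from which an initial sample covariance matrix could then be computed.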

Fig. 3. Illustration of the projection of a data point onto the latent spaces of a mixture of two latent variable models.

The resulting visualization plots are then used to select further submodels, if desired, with the responsibility weighting of (28) being incorporated at this stage. If it is decided not to partition a particular model at some level, then it is easily seen from (30) that the result of training is equivalent to copying the model down unchanged to the next level. Equation (30) further ensures that the combination of such copied models with those generated through further submodeling defines a consistent probability model, such as that represented by the lower three models in Fig. 2. The initialization of the model parameters is by direct analogy with the second-level scheme, with the covariance matrices now also involving the responsibilities $R_{ni}$ as weighting coefficients, as in (23). Again, each data point is in principle plotted on every model at a given level, with a density of "ink" proportional to the corresponding posterior probability, given, for example, by (28) in the case of the third level of the hierarchy.

Deeper levels of the hierarchy involve greater numbers of parameters, and it is therefore important to avoid overfitting and to ensure that the parameter values are well-determined by the data. If we consider principal component analysis, then we see that three (noncolinear) data points are sufficient to ensure that the covariance matrix has rank two and hence that the first two principal components are defined, irrespective of the dimensionality of the data set. In the case of our latent variable model, four data points are sufficient to determine both $\mathbf{W}$ and $\sigma^2$. From this, we see that we do not need excessive numbers of data points in each leaf of the tree, and that the dimensionality of the space is largely irrelevant.

Finally, it is often also useful to be able to visualize the spatial relationship between a group of models at one level and their parent at the previous level. This can be done by considering the orthogonal projection of the latent plane in data space onto the corresponding plane of the parent model, as illustrated in Fig. 4.
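The remark that three noncolinear points give a rank-two covariance matrix whatever the data dimensionality can be checked numerically. The helper name `covariance_rank` is hypothetical and the check is only a small illustration of the claim, not part of the visualization algorithm itself.

```python
import numpy as np

def covariance_rank(points):
    """Numerical rank of the sample covariance matrix of the rows of `points`."""
    centered = points - points.mean(axis=0)
    S = centered.T @ centered / len(points)
    return int(np.linalg.matrix_rank(S))

# Three noncolinear points in a six-dimensional space (the first three unit vectors).
three_points = np.eye(6)[:3]
# Three colinear points in the same space, for contrast.
colinear = np.outer(np.array([0.0, 1.0, 2.0]), np.ones(6))
```

Here `covariance_rank(three_points)` evaluates to two, so two principal directions are defined, whereas the colinear set yields rank one.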
For each model in the hierarchy (except those at the lowest level), we can plot the projections of the associated models from the level below.

In the next section, we illustrate the operation of this algorithm when applied to a simple toy data set, before presenting results from the study of more realistic data in Sections 7 and 8.

Fig. 4. Illustration of the projection of one of the latent planes onto its parent plane.

6 ILLUSTRATION USING TOY DATA

We first consider a toy data set consisting of 450 data points generated from a mixture of three Gaussians in a three-dimensional space. Each Gaussian is relatively flat (has small variance) in one dimension, and all have the same covariance but differ in their means. Two of these pancake-like clusters are closely spaced, while the third is well separated from the first two. The structure of this data set has been chosen in order to illustrate the interactive construction of the hierarchical model.

To visualize the data, we first generate a single top-level latent variable model, and plot the posterior mean of each data point in the latent space. This plot is shown at the top of Fig. 5, and clearly suggests the presence of two distinct clusters within the data. The user then selects two initial cluster centers within the plot, which initialize the second-level. This leads to a mixture of two latent variable models, the latent spaces of which are plotted at the second level in Fig. 5. Of these two plots, that on the right shows evidence of further structure, and so a submodel is generated, again based on a mixture of two latent variable models, which illustrates that there are indeed two further distinct clusters.

At this third step of the data exploration, the hierarchical nature of the approach is evident as the latter two models only attempt to account for the data points which have already been modeled by their immediate ancestor. Indeed, a group of offspring models may be combined with the siblings of the parent and still define a consistent density model. This is illustrated in Fig. 5, in which one of the second level plots has been "copied down" (shown by the dotted line) and combined with the other third-level models.
When offspring plots are generated from a parent, the extent of each offspring latent space (i.e., the axis limits shown on the plot) is indicated by a projected rectangle within the parent space, using the approach illustrated in Fig. 4, and these rectangles are numbered sequentially such that the leftmost submodel is "1." In order to display the relative orientations of the latent planes, this number is plotted on the side of the rectangle which corresponds to the top of the corresponding offspring plot. The original three clusters have been individually colored, and it can be seen that the red, yellow, and blue data points have been almost perfectly separated in the third level.

Fig. 5. A summary of the final results from the toy data set. Each data point is plotted on every model at a given level, but with a density of ink which is proportional to the posterior probability of that model for the given data point.

7 OIL FLOW DATA

As an example of a more complex problem, we consider a data set arising from a noninvasive monitoring system used to determine the quantity of oil in a multiphase pipeline containing a mixture of oil, water, and gas [9]. The diagnostic data is collected from a set of three horizontal and three vertical beamlines along which gamma rays at two different energies are passed. By measuring the degree of attenuation of the gammas, the fractional path length through oil and water (and, hence, gas) can readily be determined, giving 12 diagnostic measurements in total. In practice, the aim is to solve the inverse problem of determining the fraction of oil in the pipe. The complexity of the problem arises from the possibility of the multiphase mixture adopting one of a number of different geometrical configurations.

Our goal is to visualize the structure of the data in the original 12-dimensional space. A data set consisting of 1,000 points is obtained synthetically by simulating the physical processes in the pipe, including the presence of noise dominated by photon statistics. Locally, the data is expected to have an intrinsic dimensionality of two corresponding to the two degrees of freedom given by the fraction of oil and the fraction of water (the fraction of gas being redundant). However, the presence of different flow configurations, as well as the geometrical interaction between phase boundaries and the beam paths, leads to numerous distinct clusters. It would appear that a hierarchical approach of the kind discussed here should be capable of discovering this structure. Results from fitting the oil flow data using a three-level hierarchical model are shown in Fig. 6.

In the case of the toy data discussed in Section 6, the optimal choice of clusters and subclusters is relatively unambiguous and a single application of the algorithm is sufficient to reveal all of the interesting structure within the data.
For more complex data sets, it is appropriate to adopt an exploratory perspective and investigate alternative hierarchies, through the selection of differing numbers of clusters and their respective locations. The example shown in Fig. 6 has clearly been highly successful. Note how one apparently single cluster in the top-level plot is revealed to be two quite distinct clusters at the second level, and how data points from the "homogeneous" configuration have been isolated and can be seen to lie on a two-dimensional triangular structure in the third level.

Fig. 6. Results of fitting the oil data. Colors denote different multiphase flow configurations corresponding to homogeneous (red), annular (blue), and laminar (yellow).

8 SATELLITE IMAGE DATA

As a final example, we consider the visualization of a data set obtained from remote-sensing satellite images. Each data point represents a 3 × 3 pixel region of a satellite land image, and, for each pixel, there are four measurements of intensity taken at different wavelengths (approximately red and green in the visible spectrum, and two in the near infrared). This gives a total of 36 variables for each data point. There is also a label indicating the type of land represented by the central pixel. This data set has previously been the subject of a classification study within the STATLOG project [10].

We applied the hierarchical visualization algorithm to 600 data points, with 100 drawn at random of each of six classes in the 4,435-point data set. The result of fitting a three-level hierarchy is shown in Fig. 7. Note that the class labels are used only to color the data points and play no role in the maximum likelihood determination of the model parameters. Fig. 7 illustrates that the data can be approximately separated into classes, and the "gray soil → damp gray soil → very damp gray soil" continuum is clearly evident in component 3 at the second level. One particularly interesting additional feature is that there appear to be two distinct and separate clusters of "cotton crop" pixels, in mixtures 1 and 2 at the second level, which are not evident in the single top-level projection. Study of the original image [10] indeed indicates that there are two separate areas of "cotton crop."

9 DISCUSSION

We have presented a novel approach to data visualization which is both statistically principled and which, as illustrated by real examples, can be very effective at revealing structure within data. The hierarchical summaries of Figs. 5, 6, and 7 are relatively simple to interpret, yet still convey considerable structural information.

It is important to emphasize that in data visualization there is no objective measure of quality, and so it is difficult to quantify the merit of a particular data visualization technique.
This is one reason, no doubt, why there is a multitude of visualization algorithms and associated software available. While the effectiveness of many of these techniques is often highly data-dependent, we would expect the hierarchical visualization model to be a very useful tool for the visualization and exploratory analysis of data in many applications.

In relation to previous work, the concept of subsetting, or isolating, data points for further investigation can be traced back to Mattson and Dammann [11], and was further developed by Friedman and Tukey [12] for exploratory data analysis in conjunction with projection pursuit. Such subsetting operations are also possible in current dynamic visualization software, such as XGobi [13]. However, in these approaches there are two limitations. First, the partitioning of the data is performed in a hard fashion, while the mixture of latent variable models approach discussed in this paper permits a soft partitioning in which data points can effectively belong to more than one cluster at any given level. Second, the mechanism for the partitioning of the data is prone to suboptimality as the clusters must be fixed

by the user based on a single two-dimensional projection. In the hierarchical approach advocated in this paper, the user selects only a first guess for the cluster centers in the mixture model. The EM algorithm is then utilized to determine the parameters which maximize the likelihood of the model, thus allowing both the centers and the widths of the clusters to adapt to the data in the full multidimensional data space.

Fig. 7. Results of fitting the satellite image data.

There is also some similarity between our method and earlier hierarchical methods in script recognition [14] and motion planning [15] which incorporate the Kohonen Self-Organizing Feature Map [16] and so offer the potential for visualization. As well as again performing a hard clustering, a key distinction in both of these approaches is that different levels in the hierarchies operate on different subsets of input variables and their operation is thus quite different from the hierarchical algorithm described in this paper.

Our model is based on a hierarchical combination of linear latent variable models. A related latent variable technique called the generative topographic mapping (GTM) [17] uses a nonlinear transformation from latent space to data space and is again optimized using an EM algorithm. It is straightforward to incorporate GTM in place of the linear latent variable models in the current hierarchical framework.

As described, our model applies to continuous data variables. We can easily extend the model to handle discrete data as well as combinations of discrete and continuous variables. In the case of a set of binary data variables $t_k \in \{0, 1\}$, we can express the conditional distribution of a binary variable, given $x$, using a binomial distribution of the form

$$p(t \mid x) = \prod_k \sigma\!\left(w_k^{T} x + \mu_k\right)^{t_k} \left\{1 - \sigma\!\left(w_k^{T} x + \mu_k\right)\right\}^{1 - t_k}$$

where $\sigma(a) = (1 + \exp(-a))^{-1}$ is the logistic sigmoid function, and $w_k$ is the $k$th column of $W$. For data having a 1-of-$D$ coding scheme we can represent the distribution of data variables using a multinomial distribution of the form

$$p(t \mid x) = \prod_{k=1}^{D} m_k^{t_k}$$

where the $m_k$ are defined by a softmax, or normalized exponential, transformation of the form

$$m_k = \frac{\exp\!\left(w_k^{T} x + \mu_k\right)}{\sum_{j} \exp\!\left(w_j^{T} x + \mu_j\right)}. \tag{40}$$

If we have a data set consisting of a combination of continuous, binary and categorical variables, we can formulate the appropriate model by writing the conditional distribution $p(t \mid x)$ as a product of Gaussian, binomial and multinomial distributions as appropriate. The E-step of the EM algorithm now becomes more complex since the marginalization over the latent variables, needed to normalize the posterior distribution in latent space, will in general be analytically intractable. One approach is to approximate the integration using a finite sample of points drawn from the prior [17]. Similarly, the M-step is more complex, although it can be tackled efficiently using the iterative reweighted least squares (IRLS) algorithm [18].

One important consideration with the present model is that the parameters are determined by maximum likelihood, and this criterion need not always lead to the most interesting visualization plots. We are currently investigating alternative models which optimize other criteria such as the separation of clusters. Other possible refinements,

include algorithms which allow a self-consistent fitting of the whole tree, so that lower levels have the opportunity to influence the parameters at higher levels. While the user-driven nature of the current algorithm is highly appropriate for the visualization context, the development of an automated procedure for generating the hierarchy would clearly also be of interest.

A software implementation of the probabilistic hierarchical visualization algorithm in MATLAB is available from: http://www.ncrg.aston.ac.uk/phivis

APPENDIX A
PROBABILISTIC PRINCIPAL COMPONENT ANALYSIS AND EM

The algorithm discussed in this paper is based on a latent variable model corresponding to a Gaussian distribution with mean $\mu$ and covariance $W W^{T} + \sigma^{2} I$, in which the parameters of the model, given by $\mu$, $W$, and $\sigma^{2}$, are determined by maximizing the likelihood function given by (6). For a single such model, the solution for the mean $\mu$ is given by the sample mean of the data set. We can express the solutions for $W$ and $\sigma^{2}$ in closed form in terms of the eigenvectors and eigenvalues of the sample covariance matrix, as discussed in Section 2. Here we derive an alternative approach based on the EM (expectation-maximization) algorithm. We first regard the variables $x_n$ appearing in (6) as "missing data." If these quantities were known, then the corresponding "complete data" log likelihood function would be given by

$$L_C = \sum_{n=1}^{N} \ln p(t_n, x_n) = \sum_{n=1}^{N} \ln\left\{ p(t_n \mid x_n)\, p(x_n) \right\}. \tag{41}$$

We do not, of course, know the values of the $x_n$, but we can find their posterior distribution using Bayes' theorem in the form

$$p(x \mid t) = \frac{p(t \mid x)\, p(x)}{p(t)}. \tag{42}$$

Since $p(t \mid x)$ is Gaussian with mean $W x + \mu$ and covariance $\sigma^{2} I$, and $p(x)$ is Gaussian with zero mean and unit variance, it follows by completing the square that $p(x \mid t)$ is also Gaussian with mean given by $M^{-1} W^{T} (t - \mu)$, and covariance given by $\sigma^{2} M^{-1}$, where we have defined $M = W^{T} W + \sigma^{2} I$. We can then compute the expectation of $L_C$ with respect to this posterior distribution to give

$$\langle L_C \rangle = -\sum_{n=1}^{N} \left\{ \frac{d}{2} \ln \sigma^{2} + \frac{1}{2} \operatorname{tr}\!\left( \langle x_n x_n^{T} \rangle \right) + \frac{1}{2\sigma^{2}} \left\| t_n - \mu \right\|^{2} - \frac{1}{\sigma^{2}} \langle x_n \rangle^{T} W^{T} (t_n - \mu) + \frac{1}{2\sigma^{2}} \operatorname{tr}\!\left[ W^{T} W \langle x_n x_n^{T} \rangle \right] \right\} \tag{43}$$

which corresponds to the E-step of the EM algorithm. The M-step corresponds to the maximization of $\langle L_C \rangle$ with respect to $W$ and $\sigma^{2}$, for fixed $\langle x_n \rangle$ and $\langle x_n x_n^{T} \rangle$. This is straightforward, and gives the results (9) and (10).
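The E-step quantities above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming a two-dimensional latent space ($q = 2$) so that the $2 \times 2$ matrix $M$ can be inverted in closed form. The function name `posterior_moments` and the list-based data layout are assumptions made here and are not taken from the authors' MATLAB software. The function returns the posterior mean $M^{-1} W^{T} (t - \mu)$ and the second moment, which is the posterior covariance $\sigma^{2} M^{-1}$ plus the outer product of the mean.

```python
def posterior_moments(W, mu, sigma2, t):
    """E-step for one data point under the linear-Gaussian model, with q = 2.

    W is a list of d rows, each holding 2 numbers; mu and t are lists of
    length d; sigma2 is the noise variance. Returns (mean, second), where
    mean is <x> as a 2-element list and second is <x x^T> as a 2x2 list.
    """
    d = len(W)
    # M = W^T W + sigma2 * I, a symmetric 2 x 2 matrix.
    m00 = sum(W[k][0] * W[k][0] for k in range(d)) + sigma2
    m01 = sum(W[k][0] * W[k][1] for k in range(d))
    m11 = sum(W[k][1] * W[k][1] for k in range(d)) + sigma2
    det = m00 * m11 - m01 * m01
    # Closed-form inverse of the 2 x 2 matrix M.
    m_inv = [[m11 / det, -m01 / det], [-m01 / det, m00 / det]]
    # b = W^T (t - mu), a 2-vector.
    b = [sum(W[k][c] * (t[k] - mu[k]) for k in range(d)) for c in range(2)]
    # Posterior mean <x> = M^{-1} W^T (t - mu).
    mean = [m_inv[r][0] * b[0] + m_inv[r][1] * b[1] for r in range(2)]
    # Second moment <x x^T> = sigma2 * M^{-1} + <x><x>^T.
    second = [[sigma2 * m_inv[r][c] + mean[r] * mean[c] for c in range(2)]
              for r in range(2)]
    return mean, second
```

As a check on the formulas, take $W$ to be the first two columns of the $3 \times 3$ identity, $\mu = 0$, $\sigma^{2} = 1$ and $t = (2, 4, 5)$. Then $M = 2I$ and the posterior mean is $(1, 2)$, which is the least-squares projection $(2, 4)$ pulled halfway toward the origin by the prior.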
A simple proof of convergence for the EM algorithm is given in Appendix E.

An important aspect of our algorithm is the choice of an isotropic covariance matrix for the noise model of the form $\sigma^{2} I$. The maximum likelihood solution for $W$ is given by the scaled principal component eigenvectors of the data set, in the form

$$W = U_q \left( \Lambda_q - \sigma^{2} I \right)^{1/2} R \tag{44}$$

where $U_q$ is a $d \times q$ matrix whose columns are the eigenvectors of the data covariance matrix corresponding to the $q$ largest eigenvalues (where $q$ is the dimensionality of the latent space, so that $q = 2$ in our model), and $\Lambda_q$ is a $q \times q$ diagonal matrix whose elements are given by the eigenvalues. The matrix $R$ is an arbitrary orthogonal matrix corresponding to a rotation of the axes in latent space. This result is derived and discussed in [5], and shows that the image of the latent plane in data space coincides with the principal components plane.

Also, for $\sigma^{2} \to 0$, the projection of data points onto the latent plane, defined by the posterior means $\langle x_n \rangle$, coincides with the principal components projection. To see this we note that when a point $x$ in latent space is projected onto a point $W x + \mu$ in data space, the squared distance between the projected point and a data point $t_n$ is given by

$$\left\| W x + \mu - t_n \right\|^{2}. \tag{45}$$

If we minimize this distance with respect to $x$ we obtain a solution for the orthogonal projection of $t_n$ onto the plane defined by $W$ and $\mu$, given by $W \tilde{x}_n + \mu$ where

$$\tilde{x}_n = \left( W^{T} W \right)^{-1} W^{T} (t_n - \mu). \tag{46}$$

We see from (7) that, in the limit $\sigma^{2} \to 0$, the posterior mean for a data point $t_n$ reduces to (46) and hence the corresponding point $W \langle x_n \rangle + \mu$ is given by the orthogonal projection of $t_n$ onto the plane defined by $W$ and $\mu$. For $\sigma^{2} \neq 0$, the posterior mean is skewed towards the origin by the prior, and hence the projection $W \langle x_n \rangle + \mu$ is shifted toward $\mu$.

The crucial difference between our latent variable model and principal component analysis is that, unlike PCA, our model defines a probability density, and hence allows us to consider mixtures, and indeed hierarchical mixtures, of models in a probabilistically principled manner.

APPENDIX B
EM FOR MIXTURES OF PRINCIPAL COMPONENT ANALYZERS

At the second level of the hierarchy we must fit a mixture of latent variable models, in which the overall model distribution takes the form

$$p(t) = \sum_{i=1}^{M_0} \pi_i\, p(t \mid i), \tag{47}$$

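where $p(t \mid i)$ is a single latent variable model of the form discussed in Appendix A and $\pi_i$ is the corresponding mixing proportion. The parameters for this mixture model can be determined by an extension of the EM algorithm. We begin by considering the standard form which the EM algorithm would take for this model and highlight a number of limitations. We then show that a two-stage form of EM leads to a much more efficient algorithm.

We first note that in addition to a set of $x_{ni}$ for each model $i$, the missing data includes variables $z_{ni}$ labeling which model is responsible for generating each data point $t_n$. At this point, we can derive a standard EM algorithm by considering the corresponding complete-data log likelihood which takes the form

$$L_C = \sum_{n=1}^{N} \sum_{i=1}^{M_0} z_{ni} \ln\left\{ \pi_i\, p(t_n, x_{ni}) \right\}. \tag{48}$$

Starting with "old" values for the parameters $\pi_i$, $\mu_i$, $W_i$, and $\sigma_i^{2}$ we first evaluate the posterior probabilities $R_{ni}$ using (14) and similarly evaluate the expectations $\langle x_{ni} \rangle$ and $\langle x_{ni} x_{ni}^{T} \rangle$ using (17) and (18), which are easily obtained by inspection of (7) and (8). Then we take the expectation of $L_C$ with respect to this posterior distribution to obtain

$$\langle L_C \rangle = \sum_{n=1}^{N} \sum_{i=1}^{M_0} R_{ni} \left\{ \ln \pi_i - \frac{d}{2} \ln \sigma_i^{2} - \frac{1}{2} \operatorname{tr}\!\left( \langle x_{ni} x_{ni}^{T} \rangle \right) - \frac{1}{2\sigma_i^{2}} \left\| t_n - \mu_i \right\|^{2} + \frac{1}{\sigma_i^{2}} \langle x_{ni} \rangle^{T} W_i^{T} (t_n - \mu_i) - \frac{1}{2\sigma_i^{2}} \operatorname{tr}\!\left[ W_i^{T} W_i \langle x_{ni} x_{ni}^{T} \rangle \right] \right\} + \text{const.} \tag{49}$$

where $\langle \cdot \rangle$ denotes the expectation with respect to the posterior distributions of both $x_{ni}$ and $z_{ni}$. The M-step then involves maximizing (49) with respect to $\pi_i$, $\mu_i$, $\sigma_i^{2}$, and $W_i$ to obtain "new" values for these parameters. The maximization with respect to $\pi_i$ must take account of the constraint that $\sum_i \pi_i = 1$. This can be achieved with the use of a Lagrange multiplier $\lambda$ [8] by maximizing

$$\langle L_C \rangle + \lambda \left( \sum_{i=1}^{M_0} \pi_i - 1 \right). \tag{50}$$

Together with the results of maximizing (49) with respect to the remaining parameters, this gives the following M-step equations

$$\tilde{\pi}_i = \frac{1}{N} \sum_{n} R_{ni} \tag{51}$$

$$\tilde{\mu}_i = \frac{\sum_{n} R_{ni} \left( t_n - \tilde{W}_i \langle x_{ni} \rangle \right)}{\sum_{n} R_{ni}} \tag{52}$$

$$\tilde{W}_i = \left[ \sum_{n} R_{ni} (t_n - \tilde{\mu}_i) \langle x_{ni} \rangle^{T} \right] \left[ \sum_{n} R_{ni} \langle x_{ni} x_{ni}^{T} \rangle \right]^{-1} \tag{53}$$

$$\tilde{\sigma}_i^{2} = \frac{1}{d \sum_{n} R_{ni}} \sum_{n} R_{ni} \left\{ \left\| t_n - \tilde{\mu}_i \right\|^{2} - 2 \langle x_{ni} \rangle^{T} \tilde{W}_i^{T} (t_n - \tilde{\mu}_i) + \operatorname{tr}\!\left[ \langle x_{ni} x_{ni}^{T} \rangle \tilde{W}_i^{T} \tilde{W}_i \right] \right\}. \tag{54}$$

Note that the M-step equations for $\mu_i$ and $W_i$, given by (52) and (53), are coupled, and so further manipulation is required to obtain explicit solutions. In fact, a simplification of the M-step equations, along with improved speed of convergence, is possible if we adopt a two-stage EM procedure as follows.

The likelihood function we wish to maximize is given by

$$L = \sum_{n=1}^{N} \ln\left\{ \sum_{i=1}^{M_0} \pi_i\, p(t_n \mid i) \right\}. \tag{55}$$

Regarding the component labels $z_{ni}$ as missing data, we can consider the corresponding expected complete-data log likelihood given by

$$\hat{L}_C = \sum_{n=1}^{N} \sum_{i=1}^{M_0} R_{ni} \ln\left\{ \pi_i\, p(t_n \mid i) \right\}, \tag{56}$$

where the $R_{ni}$ represent the posterior probabilities (corresponding to the expected values of $z_{ni}$) and are given by (14). Maximization of (56) with respect to $\pi_i$, again using a Lagrange multiplier, gives the M-step equation (51). Similarly, maximization of (56) with respect to $\mu_i$ gives (16). In order to update $W_i$ and $\sigma_i^{2}$, we seek only to increase the value of $\hat{L}_C$ and not actually to maximize it. This corresponds to the generalized EM (or GEM) algorithm. We do this by treating the labels $z_{ni}$ as missing data and performing one cycle of the EM algorithm. This involves using the new values $\tilde{\mu}_i$ to compute the sufficient statistics of the posterior distribution of $x_{ni}$ using (17) and (18). The advantage of this strategy is that we are using the new rather than old values of $\mu_i$ in computing these statistics, and overall this leads to simplifications to the algorithm as well as improved convergence speed.

By inspection of (49) we see that the expected complete-data log likelihood takes the form

$$\langle L_C \rangle = \sum_{n=1}^{N} \sum_{i=1}^{M_0} R_{ni} \left\{ \ln \tilde{\pi}_i - \frac{d}{2} \ln \sigma_i^{2} - \frac{1}{2} \operatorname{tr}\!\left( \langle x_{ni} x_{ni}^{T} \rangle \right) - \frac{1}{2\sigma_i^{2}} \left\| t_n - \tilde{\mu}_i \right\|^{2} + \frac{1}{\sigma_i^{2}} \langle x_{ni} \rangle^{T} W_i^{T} (t_n - \tilde{\mu}_i) - \frac{1}{2\sigma_i^{2}} \operatorname{tr}\!\left[ W_i^{T} W_i \langle x_{ni} x_{ni}^{T} \rangle \right] \right\}. \tag{57}$$

We then maximize (57) with respect to $W_i$ and $\sigma_i^{2}$ (keeping $\tilde{\mu}_i$ fixed). This gives the M-step equations (19) and (20).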
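The first stage of the two-stage procedure can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `first_stage_update` is hypothetical, the component log densities $\ln p(t_n \mid i)$ are assumed to be supplied by the caller, and the responsibilities are taken to be the usual Bayes posterior $R_{ni} \propto \pi_i\, p(t_n \mid i)$. The sketch evaluates $R_{ni}$, then the mixing proportions as in (51) and the responsibility-weighted means. The subsequent GEM update of $W_i$ and $\sigma_i^{2}$ is not shown.

```python
import math


def first_stage_update(log_dens, pi, data):
    """Responsibilities, then new mixing proportions and new means.

    log_dens[n][i] holds ln p(t_n | i) for data point n under component i
    (assumed to be computed elsewhere); pi is the list of current mixing
    proportions, all strictly positive; data is a list of N points, each a
    list of d numbers. Every component is assumed to receive some nonzero
    total responsibility, otherwise its mean is undefined.
    """
    n_points, n_comp, dim = len(data), len(pi), len(data[0])
    resp = []
    for n in range(n_points):
        # Unnormalized log posterior of each component for point n.
        a = [math.log(pi[i]) + log_dens[n][i] for i in range(n_comp)]
        # Subtract the maximum before exponentiating (log-sum-exp trick).
        top = max(a)
        w = [math.exp(v - top) for v in a]
        total = sum(w)
        resp.append([v / total for v in w])
    # Total responsibility taken by each component.
    col = [sum(resp[n][i] for n in range(n_points)) for i in range(n_comp)]
    # New mixing proportions: average responsibility, as in (51).
    new_pi = [col[i] / n_points for i in range(n_comp)]
    # New means: responsibility-weighted averages of the data points.
    new_mu = [[sum(resp[n][i] * data[n][k] for n in range(n_points)) / col[i]
               for k in range(dim)]
              for i in range(n_comp)]
    return resp, new_pi, new_mu
```

Because the means are updated before the latent-space statistics are recomputed, a second stage built on this sketch would use `new_mu` when evaluating the posterior moments of each $x_{ni}$, which is the ordering the text motivates.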

APPENDIX C
EM FOR HIERARCHICAL MIXTURE MODELS

In the case of the third and subsequent levels of the hierarchy we have to maximize a likelihood function of the form (27) in which the $R_{ni}$ and the $\pi_i$ are treated as constants. To obtain an EM algorithm we note that the likelihood function can be written as

$$L = \sum_{i=1}^{M_0} L_i \quad \text{where} \quad L_i = \sum_{n=1}^{N} R_{ni} \ln\left\{ \pi_i \sum_{j \in \mathcal{G}_i} \pi_{j|i}\, p(t_n \mid i, j) \right\}. \tag{58}$$

Since the parameters for different values of $i$ are independent this represents $M_0$ independent models each of which can be fitted separately, and each of which corresponds to a mixture model but with weighting coefficients $R_{ni}$. We can then derive the EM algorithm by introducing, for each $i$, the expected complete-data likelihood in the form

$$\hat{L}_{i,C} = \sum_{n=1}^{N} \sum_{j \in \mathcal{G}_i} R_{ni} R_{nj|i} \ln\left\{ \pi_{j|i}\, p(t_n \mid i, j) \right\} \tag{59}$$

where $R_{nj|i}$ is defined by (29) and we have omitted the constant term involving $\pi_i$. Thus, the responsibility of the $j$th submodel in group $\mathcal{G}_i$ for generating data point $t_n$ is effectively weighted by the responsibility of its parent model. Maximization of (59) gives rise to weighted M-step equations for the $W_{i,j}$, $\mu_{i,j}$, and $\sigma_{i,j}^{2}$ parameters with weighting factors $R_{ni} R_{nj|i}$, given by (28), as discussed in the text. For the mixing coefficients $\pi_{j|i}$, we can introduce a Lagrange multiplier $\lambda_i$, and hence maximize the function

$$\sum_{n=1}^{N} \sum_{j \in \mathcal{G}_i} R_{ni} R_{nj|i} \ln \pi_{j|i} + \lambda_i \left( \sum_{j \in \mathcal{G}_i} \pi_{j|i} - 1 \right) \tag{60}$$

to obtain the corresponding M-step result for $\pi_{j|i}$.

A final consideration is that while each offspring mixture within the hierarchy is fitted to the entire data set, the responsibilities of its parent model for many of the data points will approach zero. This implies that the weighted responsibilities for the component models of the mixture will likewise be at least as small. Thus, in a practical implementation, we need only fit offspring mixture models to a reduced data set, where data points for which the parental responsibility is less than some threshold are discarded. For reasons of numerical accuracy, this threshold should be no smaller than the machine precision (which is $2.22 \times 10^{-16}$ for double-precision arithmetic). We adopted such a threshold for the experiments within this paper, and observed a considerable computational advantage, particularly at deeper levels in the hierarchy.

APPENDIX D
INITIALIZATION OF THE EM ALGORITHM

Here we outline a simple procedure for initializing $W$ and $\sigma^{2}$ before applying the EM algorithm. Consider a covariance matrix $S$ with eigenvectors $u_j$ and eigenvalues $\lambda_j$.
An arbitrary vector $v$ will have an expansion in the eigenbasis of the form $v = \sum_j v_j u_j$, where $v_j = v^{T} u_j$. If we multiply $v$ by $S$, we obtain a vector $\sum_j \lambda_j v_j u_j$ which will tend to be dominated by the eigenvector $u_{\max}$ with the largest eigenvalue $\lambda_{\max}$. Repeated multiplication and normalization will give an increasingly improved estimate of the normalized eigenvector and of the corresponding eigenvalue. In order to find the first two eigenvectors and eigenvalues, we start with a random matrix $V$ and after each multiplication we orthonormalize the columns of $V$. We choose two data points at random and, after subtraction of $\mu$, use these as the columns of $V$ to provide a starting point for this procedure. Degenerate eigenvalues do not present a problem since any two orthogonal vectors in the principal subspace will suffice. In practice only a few matrix multiplications are required to obtain a suitable initial estimate. We now initialize $W$ using the result (4), and initialize $\sigma^{2}$ using (5). In the case of mixtures we simply apply this procedure for each weighted covariance matrix $S_i$ in turn.

As stated this procedure appears to require the evaluation of $S$, which would take $O(N d^{2})$ computational steps and would therefore defeat the purpose of using the EM algorithm. However, we only ever need to evaluate the product of $S$ with some vector, which can be performed in $O(N d)$ steps by rewriting the product as

$$S v = \frac{1}{N} \sum_{n=1}^{N} (t_n - \mu) \left\{ (t_n - \mu)^{T} v \right\} \tag{61}$$

and evaluating the inner products before performing the summation over $n$. Similarly the trace of $S$, required to initialize $\sigma^{2}$, can also be obtained in $O(N d)$ steps.

APPENDIX E
CONVERGENCE OF THE EM ALGORITHM

Here we give a very simple demonstration that the EM algorithms of the kind discussed in this paper have the desired property of guaranteeing that the likelihood will be increased at each cycle of the algorithm unless the parameters correspond to a (local) maximum of the likelihood. If we denote the set of observed data by $D$, then the log likelihood which we wish to maximize is given by

$$L = \ln p(D \mid \theta) \tag{62}$$

where $\theta$ denotes the set of parameters of the model. If we denote the missing data by $M$, then the complete-data log likelihood function, i.e., the likelihood function which would be applicable if $M$ were actually observed, is given by

$$L_C = \ln p(D, M \mid \theta). \tag{63}$$

In the E-step of the EM algorithm, we evaluate the posterior distribution of $M$ given the observed data $D$ and some current values $\theta_{\text{old}}$ for the parameters. We then use this distribution to take the expectation of $L_C$, so that

$$\langle L_C(\theta) \rangle = \int \ln\left\{ p(D, M \mid \theta) \right\} p(M \mid D, \theta_{\text{old}})\, dM. \tag{64}$$

In the M-step, the quantity $\langle L_C(\theta) \rangle$ is maximized with respect to $\theta$ to give $\theta_{\text{new}}$. From the rules of probability we have

$$p(D, M \mid \theta) = p(M \mid D, \theta)\, p(D \mid \theta) \tag{65}$$

and substituting this into (64) gives

$$\langle L_C(\theta) \rangle = \ln p(D \mid \theta) + \int \ln\left\{ p(M \mid D, \theta) \right\} p(M \mid D, \theta_{\text{old}})\, dM. \tag{66}$$

The change in the likelihood function in going from old to new parameter values is therefore given by

$$\ln p(D \mid \theta_{\text{new}}) - \ln p(D \mid \theta_{\text{old}}) = \langle L_C(\theta_{\text{new}}) \rangle - \langle L_C(\theta_{\text{old}}) \rangle - \int \ln\left\{ \frac{p(M \mid D, \theta_{\text{new}})}{p(M \mid D, \theta_{\text{old}})} \right\} p(M \mid D, \theta_{\text{old}})\, dM. \tag{67}$$

The final term on the right-hand side of (67) is the Kullback-Leibler divergence between the old and new posterior distributions. Using Jensen's inequality it is easily shown that $\mathrm{KL}(\theta \,\|\, \theta_{\text{old}}) \geq 0$ [8]. Since we have maximized $\langle L_C \rangle$ (or more generally just increased its value in the case of the GEM algorithm) in going from $\theta_{\text{old}}$ to $\theta_{\text{new}}$, we see that $p(D \mid \theta_{\text{new}}) > p(D \mid \theta_{\text{old}})$ as required.

ACKNOWLEDGMENTS

This work was supported by EPSRC grant GR/K51808: Neural Networks for Visualization of High-Dimensional Data. We are grateful to Michael Jordan for useful discussions, and we would like to thank the Isaac Newton Institute in Cambridge for their hospitality.

REFERENCES

[1] M.I. Jordan and R.A. Jacobs, "Hierarchical Mixtures of Experts and the EM Algorithm," Neural Computation, vol. 6, no. 2, pp. 181-214, 1994.
[2] B.S. Everitt, An Introduction to Latent Variable Models. London: Chapman and Hall, 1984.
[3] W.J. Krzanowski and F.H.C. Marriott, Multivariate Analysis Part 2: Classification, Covariance Structures and Repeated Measurements. London: Edward Arnold, 1994.
[4] M.E. Tipping and C.M. Bishop, "Mixtures of Principal Component Analysers," Proc. IEE Fifth Int'l Conf. Artificial Neural Networks, pp. 13-18, Cambridge, U.K., July 1997.
[5] M.E. Tipping and C.M. Bishop, "Mixtures of Probabilistic Principal Component Analysers," Tech. Rep. NCRG/97/003, Neural Computing Research Group, Aston University, Birmingham, U.K., 1997.
[6] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood From Incomplete Data via the EM Algorithm," J. Royal Statistical Soc., B, vol. 39, no. 1, pp. 1-38, 1977.
[7] D.B. Rubin and D.T. Thayer, "EM Algorithms for ML Factor Analysis," Psychometrika, vol. 47, no. 1, pp. 69-76, 1982.
[8] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1995.
[9] C.M. Bishop and G.D. James, "Analysis of Multiphase Flows Using Dual-Energy Gamma Densitometry and Neural Networks," Nuclear Instruments and Methods in Physics Research, vol. A327, pp. 580-593, 1993.
[10] D. Michie, D.J. Spiegelhalter, and C.C. Taylor, Machine Learning, Neural and Statistical Classification. New York: Ellis Horwood, 1994.
[11] R.L. Mattson and J.E. Dammann, "A Technique for Determining and Coding Subclasses in Pattern Recognition Problems," IBM J., vol. 9, pp. 294-302, 1965.
[12] J.H. Friedman and J.W. Tukey, "A Projection Pursuit Algorithm for Exploratory Data Analysis," IEEE Trans. Computers, vol. 23, pp. 881-889, 1974.
[13] A. Buja, D. Cook, and D.F. Swayne, "Interactive High-Dimensional Data Visualization," J. Computational and Graphical Statistics, vol. 5, no. 1, pp. 78-99, 1996.
[14] R. Miikkulainen, "Script Recognition With Hierarchical Feature Maps," Connection Science, vol. 2, pp. 83-101, 1990.
[15] C. Versino and L.M. Gambardella, "Learning Fine Motion by Using the Hierarchical Extended Kohonen Map," Artificial Neural Networks - ICANN 96, C. von der Malsburg, W. von Seelen, J.C. Vorbrüggen, and B. Sendhoff, eds., Lecture Notes in Computer Science, vol. 1,112, pp. 221-226. Berlin: Springer-Verlag, 1996.
[16] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[17] C.M. Bishop, M. Svensén, and C.K.I. Williams, "GTM: The Generative Topographic Mapping," Neural Computation, vol. 10, no. 1, pp. 215-234, 1998.
[18] P. McCullagh and J.A. Nelder, Generalized Linear Models, 2nd ed. Chapman and Hall, 1989.

Christopher M. Bishop graduated from the University of Oxford in 1980 with First Class Honors in Physics and obtained a PhD from the University of Edinburgh in quantum field theory in 1983. After a period at Culham Laboratory researching the theory of magnetically confined plasmas for the fusion program, he developed an interest in statistical pattern recognition, and became head of the Applied Neurocomputing Center at Harwell Laboratory. In 1993, he was appointed to a chair in the Department of Computer Science and Applied Mathematics at Aston University, and he was the principal organizer of the six-month program on Neural Networks and Machine Learning at the Isaac Newton Institute for Mathematical Sciences in Cambridge in 1997. Recently, he moved to the Microsoft Research Laboratory in Cambridge and has also been elected to a chair of computer science at the University of Edinburgh. His current research interests include probabilistic inference, graphical models, and pattern recognition.

Michael E. Tipping received the BEng degree in electronic engineering from Bristol University in 1990 and the MSc degree in artificial intelligence from the University of Edinburgh in 1992. He received the PhD degree in neural computing from Aston University in 1996. He has been a research fellow in the Neural Computing Research Group at Aston University since March 1996, and his research interests include neural networks, data visualization, probabilistic modeling, statistical pattern recognition, and topographic mapping.