Data Visualization by Pairwise Distortion Minimization


Communications in Statistics, Theory and Methods 34 (6), 2005

Data Visualization by Pairwise Distortion Minimization

By Marc Sobel and Longin Jan Latecki*
Department of Statistics and Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122.

We dedicate this paper to the memory of Milton Sobel, who provided inspiration to us and to the academic community as a whole.

ABSTRACT

Data visualization is achieved by minimizing the distortion that results from observing the relationships between data points. Typically, this is accomplished by estimating latent data points designed to accurately reflect the pairwise relationships between observed data points. The distortion masks the true pairwise relationships between data points, represented by the latent data. Distortion can be modeled as masking dissimilarity measures between data points or, alternatively, as masking their pairwise distances. The latter class of models is encompassed by metric scaling methodology (MDS); the former is introduced here as a competitor. The former class of models includes Principal Components Analysis, which minimizes the global distortion between observed and latent data. We model distortion using mixtures of pairwise-difference factor-analysis statistical models. We employ an algorithm which we call stepwise forward selection for purposes of identifying appropriate starting values and determining the appropriate dimensionality of the latent data space. We show that the pairwise factor-analysis models frequently fit the data better because they allow for direct modeling of pairwise dissimilarities between data points.

* Marc Sobel is an Associate Professor in the Department of Statistics, Fox School of Business and Management, 1810 N. 13th Street; Longin Jan Latecki is an Associate Professor in the Department of Computer and Information Sciences (CIS), 314 Wachman Hall, 1805 N. Broad Street; Temple University, Philadelphia, PA 19122.

1. INTRODUCTION

There has been considerable interest in both the machine learning and statistical modeling literature in comparing, registering, and classifying image data. From the practitioner's perspective, there are a number of advantages if such algorithms are successful. First, algorithms of this sort can provide mechanisms for visualizing the data. Second, they provide a mechanism for learning the important features of the data. Because feature vectors typically live in very high dimensional spaces, reducing their dimensionality is crucial to most data-mining tasks. Many algorithms for reducing data dimensionality depend on estimating latent (projected) variables designed to minimize certain energy (or error) functions. Other algorithms achieve the same purpose by estimating latent variables using statistical models.

Generally, dimensionality reduction methods can be divided into metric and nonmetric methods. Metric methods start with data points (in a high-dimensional space) with observed pairwise distances between them. The goal of metric methods is to estimate latent data points, living in a lower dimensional space, whose pairwise distances accurately reflect the pairwise distances between the observed data points. Methods of this sort include those proposed by J. Sammon [2]. Nonmetric methods start with data points whose pairwise relationships are given by dissimilarities which need not correspond to a distance. In contrast, principal components analysis minimizes the global distortion between observed and latent data values. Nonmetric methods incorporate additional steps designed to provide constraints within which latent dissimilarities, having appropriate properties, can be optimally estimated. Methods of this sort include those of Kruskal [3].

In this paper we take as our starting point observed data points living in a high-dimensional space. The pairwise relationships between these data points are represented by the corresponding relationships between latent data points living in a low-dimensional space. The pairwise dissimilarities between observed data points are masked by noise. This noise could arise in many different settings; examples include: (i) settings where partitioning data into groups is of paramount interest, and the lack of straightforward clusters can be modeled as the impact of noise on the pairwise relationships between data, and (ii) settings where the energy of data points is being modeled; in this case noise arises in evaluating the relationship between the energy of neighboring data points.

Our approach is different from that of probabilistic principal components (see [4]), where noise masks the relationship between each individual data point and its latent counterpart. By contrast, in our approach noise masks pairwise dissimilarities between data points and the analogous latent quantities; we will see below that this difference in approach allows us to build some extra flexibility into the interpretation and modeling of high-dimensional data. Our approach is similar in spirit to the approach employed in relational Markov models [5].

The main goal of multidimensional scaling (MDS) is to minimize the distortion between pairwise data distances and the corresponding pairwise distances between their latent projections. This ensures that the latent (or projected) data optimally reflect the internal structure of the data. MDS algorithms frequently involve constructing loss functions which prescribe (scaled) penalties for differences between observed and latent pairwise distances. See also [6] for a graphical analysis of MDS. MDS methods are used widely in the behavioral, econometric and social sciences [7]. The most commonly used nonlinear projection methods within MDS involve minimizing measures of distortion (like those of Kruskal and Sammon) (e.g., Section 1.3 in [8] and [9]). These measures of distortion frequently take the form of a loss or energy function. For purposes of comparison we focus on the loss (energy) function proposed by J. Sammon in [2]. In this paper we compare MDS methods (as represented by that of Sammon) with those using pairwise-difference factor analysis methodology. We employ stepwise forward selection algorithms (see below) to provide good estimates of the dimension of the latent data space and starting vectors appropriate for use with either of these two methodologies.

Other starting value options which have been recommended include (i) random starting values (see [2], [8], and [13]) and (ii) starting at latent variable values arising from employing principal components analysis (PCA) [1]. The former option, still commonly used, fails to provide any useful training (information). The latter option fails because PCA does not provide optimal (or near-optimal) solutions to minimizing the (noise) distortion of the data. In fact, as will be seen below in the examples, the distortion reduction for PCA-generated latent variables is very small. For multidimensional scaling models, after employing stepwise forward selection algorithms for the aforementioned purposes, we typically use gradient descent methods (see, e.g., [9] and [10]) to minimize Sammon's cost function. For factor analysis mixture models, after employing stepwise forward selection algorithms, we use the EM algorithm (see [11]) to provide estimates of the parameters.

We partition the pairwise differences between data points into two groups by determining membership using EM-supplied probabilities and an appropriate threshold. The first group consists of those pairs of data points with small pairwise differences; the second, of those pairs with large pairwise differences. The first group of pairs provides a mechanism for distinguishing data clusters; the second group provides a mechanism for distinguishing which pairs of points are different from one another.
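To make the grouping step concrete, the following minimal Python sketch (ours, not the authors'; it assumes a dictionary P mapping each pair (i, j) to its two EM-supplied component probabilities, and the threshold value is an illustrative choice) splits the pairs into the two groups just described.

    def partition_pairs(P, threshold=0.5):
        """Split pairs into a small-difference group and a large-difference group
        by thresholding the EM-supplied probability of the first component."""
        small = [pair for pair, w in P.items() if w[0] >= threshold]
        large = [pair for pair, w in P.items() if w[0] < threshold]
        return small, large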

In the next section we compare two different ways of projecting the relationships between pairs of data points into a latent k-dimensional space, denoted by R^k; typically k will be taken to be 2 or 3. Using the notation F_1, ..., F_n for the observed feature vector data, multidimensional scaling is concerned with projecting a known real-valued dissimilarity function { D(i,j) = D(F_i, F_j) } of the ordered pairs of features { F_i, F_j } onto their latent k-dimensional functional counterparts { µ_i - µ_j } (1 ≤ i < j ≤ n). Sammon's energy function provides a typical example of this. In this setting the latent µ-vectors are chosen to minimize, in µ, a loss (or energy) function of the form

    S_H(F, \mu) = \sum_{1 \le i < j \le n} \frac{ \left[ D(i,j) - \|\mu_i - \mu_j\| \right]^2 }{ D(i,j) }.    (1.1)

We have in mind the example D(i,j) = l'(F_i - F_j) for a known vector l. Many variations on this basic theme have been explored in the literature (see [3]).
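As a concrete illustration of the loss (1.1), the sketch below (our own illustrative Python, assuming numpy arrays F of observed features and mu of candidate latent vectors, and taking D(i,j) = ||F_i - F_j||) evaluates the energy of a given latent configuration.

    import numpy as np

    def sammon_energy(F, mu, eps=1e-12):
        """Energy (1.1): squared differences between observed dissimilarities and
        latent distances, each scaled by the observed dissimilarity."""
        n = F.shape[0]
        energy = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                D_ij = np.linalg.norm(F[i] - F[j])    # observed dissimilarity
                d_ij = np.linalg.norm(mu[i] - mu[j])  # latent pairwise distance
                energy += (D_ij - d_ij) ** 2 / max(D_ij, eps)
        return energy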

As a counterpoint to this approach, we introduce, in the next section, a formulation based on statistical models.

2. FORMULATING THE PROBLEM USING STATISTICAL MODELS

In this section we assume (as above) that the feature vectors associated with each data object are themselves observed. We employ a variant of the probabilistic principal components models introduced in [4]. Our variant is designed to take account of the fact that we seek to model the noise distortion between pairs of feature vectors rather than the noise distortion associated with each individual feature vector. We follow the main principle of MDS, which is to map the data to a low-dimensional space in such a way that the distortion between data points is minimal. We introduce some necessary notation first. Let 'D(i,j)' denote the dissimilarity between feature vectors F_i and F_j (1 ≤ i < j ≤ n) (which is allowed to live in more than one dimension). Explicitly, we assume that this dissimilarity measure 'D(i,j)' lives in a Euclidean space with dimensionality p (assumed to be less than or equal to the dimensionality k of the feature space). In the example given below, we take D(i,j) = F_i - F_j (1 ≤ i < j ≤ n), in which case p = k. Other examples include assuming that D(i,j) = l'(F_i - F_j) for a known vector l, in which case p = 1. The general linear statistical model assumed below takes the form

    D(i,j) = A^{(g)} ( \mu_i^{(g)} - \mu_j^{(g)} ) + \varepsilon_{i,j}, \qquad 1 \le i < j \le n,    (2.1)

where 'g' identifies the particular mixture model component (i.e., 'g(i,j) = s' means that the pair (i,j) belongs to mixture component s); 'A^(g)' are parametric p x q matrices indexed by the component g; 'µ_i^(g)' are parametric q x 1 latent vectors for feature F_i, indexed by the component g and the observation index i (1 ≤ i ≤ n); and 'ε_{i,j}' is the pairwise noise distortion for features F_i, F_j (1 ≤ i < j ≤ n). It is assumed below that the errors ε_{i,j} are normally distributed with common variance (σ^(g))^2 I. While the dimensionality p of the D(i,j)'s (defined above) may be quite high, the dimensionality q of the latent µ vectors will typically be assumed to be quite small. (In the composite movie example analyzed in Section 4, below, q is taken to be 5.) The matrices 'A^(g)' are (latent) projection matrices projecting paired differences between the parametric latent µ vectors onto their feature-vector paired-difference counterparts.

We use the EM algorithm [11] to estimate the model parameters under the assumption that the observed dissimilarities are given by D(i,j) = F_i - F_j (1 ≤ i < j ≤ n). The equations needed for purposes of doing this calculation are given in the appendix. In equation (2.2), below, we assume that the aforementioned mixture model, indexed by g, consists of exactly two components.
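The structure of model (2.1) can be seen in the following small simulation sketch (ours, not the authors'; the component assignment z, the projection matrices A, the component-specific latent vectors mu and the noise scales sigma are assumed inputs, not quantities defined in the paper).

    import numpy as np

    def simulate_pairwise_dissimilarities(mu, A, sigma, z, rng=None):
        """Draw D(i,j) = A[g](mu_i - mu_j) + eps_ij for every pair i < j, where
        g = z[(i, j)] selects the mixture component as in (2.1)."""
        rng = rng or np.random.default_rng(0)
        n = mu[0].shape[0]                     # mu[g] is an (n, q) array for component g
        D = {}
        for i in range(n):
            for j in range(i + 1, n):
                g = z[(i, j)]
                mean = A[g] @ (mu[g][i] - mu[g][j])
                D[(i, j)] = mean + sigma[g] * rng.standard_normal(mean.shape)
        return D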

The first component comprises pairs of observations with small variance; the second comprises pairs of observations with large variance. The first component model is designed to characterize those pairs of feature vectors whose difference is well-approximated by the corresponding difference between their latent variable counterparts; the second, those pairs of feature vectors whose difference is not well-approximated by this difference. Specifically, we assume that the first component variance [σ^(1)]^2 is significantly smaller than the second, [σ^(2)]^2. First component model parameters were selected to minimize quantities of the form

    SS[g=1] = \sum_{1 \le i < j \le n} P(D(i,j) \mid g=1) \, \frac{ \left\| D(i,j) - \hat{A}^{(g=1)} ( \hat{\mu}_i^{(g=1)} - \hat{\mu}_j^{(g=1)} ) \right\|^2 }{ (\hat{\sigma}^{(g=1)})^2 },    (2.2)

where the hatted quantities are the EM algorithm estimates of the corresponding parameters and P(D(i,j) | g=1) is the probability specified in the EM algorithm (see the appendix, below).

Model Fit and Assessment

We assess the fitness of data visualization models using Bayesian p-values [12]. This can be formulated as the probability that the information obtained from the model is less than expected under an a posteriori update of the data. Information quantities like those derived below are discussed in [13]. This kind of calculation is not possible for typical MDS models because they are not formulated as statistical models.

In the model introduced at the beginning of this section, the information contained in the observed dissimilarity measures, assuming an uninformative prior and ignoring marginal terms, is

    INF(M \mid D) = E[ -\log(L) \mid D ] = \sum_{1 \le i < j \le n} E[ -\log(L_{i,j}) \mid D ],    (2.3)

where L denotes the likelihood of the data and

    L_{i,j} = \frac{1}{ (2\pi (\sigma^{(g)})^2)^{p/2} } \exp\left\{ - \frac{ \| D(i,j) - A^{(g)} ( \mu_i^{(g)} - \mu_j^{(g)} ) \|^2 }{ 2 (\sigma^{(g)})^2 } \right\}.

For the model introduced in Section 2, the right-hand side of equation (2.3) can be approximated, omitting terms which do not involve the observed dissimilarity measures, by

    INF(M \mid D) \approx \widehat{INF}(M \mid D) = \sum_{1 \le i < j \le n} \hat{P}(i,j \mid g=1) \, \frac{ \| D(i,j) - \hat{A}^{(g=1)} ( \hat{\mu}_i^{(g=1)} - \hat{\mu}_j^{(g=1)} ) \|^2 }{ 2 (\hat{\sigma}^{(g=1)})^2 }
        + \sum_{1 \le i < j \le n} \hat{P}(i,j \mid g=2) \, \frac{ \| D(i,j) - \hat{A}^{(g=2)} ( \hat{\mu}_i^{(g=2)} - \hat{\mu}_j^{(g=2)} ) \|^2 }{ 2 (\hat{\sigma}^{(g=2)})^2 },    (2.4)

where the hatted quantities are all the EM algorithm estimates (see the appendix); see [13] for a more complete discussion of information quantities like that given in equation (2.4). Posterior updates of the dissimilarities were simulated via

    D^*(i,j) \sim N\left( \hat{A}^{(g=1)} ( \hat{\mu}_i^{(g=1)} - \hat{\mu}_j^{(g=1)} ), \, (\hat{\sigma}^{(g=1)})^2 \right)  with probability \hat{P}(i,j \mid g=1),
    D^*(i,j) \sim N\left( \hat{A}^{(g=2)} ( \hat{\mu}_i^{(g=2)} - \hat{\mu}_j^{(g=2)} ), \, (\hat{\sigma}^{(g=2)})^2 \right)  with probability \hat{P}(i,j \mid g=2),    (2.5)

for 1 ≤ i < j ≤ n. (N(*1, *2) refers to the normal distribution with mean *1 and variance *2.) The posterior Bayes p-value is equal to

    Bayes p-value = P\left( INF(M \mid D^*) < INF(M \mid D) \right)    (2.6)

(the probability in equation (2.6) being calculated over the distribution specified by equation (2.5)). For the models examined below, the Bayesian p-values were all between 80 and 90%, indicating good fits.
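The fit assessment (2.3)-(2.6) lends itself to a simple Monte Carlo approximation. The sketch below is our own illustration, not the authors' code; it assumes D is a dictionary of observed pairwise dissimilarity vectors, P holds normalized EM posterior weights for each pair, and A, mu, sigma are the fitted two-component parameters.

    import numpy as np

    def information(D, P, A, mu, sigma):
        """Approximate information (2.4): posterior-weighted squared residuals,
        scaled by 2*sigma^2 and summed over both components and all pairs."""
        total = 0.0
        for (i, j), d in D.items():
            for g in (0, 1):
                r = d - A[g] @ (mu[g][i] - mu[g][j])
                total += P[(i, j)][g] * float(r @ r) / (2.0 * sigma[g] ** 2)
        return total

    def bayes_p_value(D, P, A, mu, sigma, n_rep=200, rng=None):
        """Monte Carlo version of (2.5)-(2.6): simulate replicate dissimilarities
        from the fitted mixture and compare their information with the data's."""
        rng = rng or np.random.default_rng(0)
        info_obs = information(D, P, A, mu, sigma)
        hits = 0
        for _ in range(n_rep):
            D_rep = {}
            for (i, j), w in P.items():
                g = rng.choice(2, p=w)                       # component drawn with posterior weight
                mean = A[g] @ (mu[g][i] - mu[g][j])
                D_rep[(i, j)] = mean + sigma[g] * rng.standard_normal(mean.shape)
            hits += information(D_rep, P, A, mu, sigma) < info_obs
        return hits / n_rep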

3. ALGORITHMS EMPLOYED FOR MDS DATA VISUALIZATION

We use online gradient descent algorithms to estimate parameters in the MDS approach to data visualization [6]. The gradient of Sammon's energy function with respect to the latent vector µ_i, restricted to terms involving µ_j, is

    (\nabla_{\mu_i} E)_{i,j} = \frac{ 2 \left( \|\mu_i - \mu_j\| - D(i,j) \right) }{ D(i,j) \, \|\mu_i - \mu_j\| } \, (\mu_i - \mu_j).    (3.1)

The analogous quantity with i and j switched is (\nabla_{\mu_j} E)_{i,j} = -(\nabla_{\mu_i} E)_{i,j} (1 ≤ i < j ≤ n). An online gradient descent algorithm can, in theory, be based on an iterative calculation of the µ-vectors using the following iterative steps:

    \mu_i^{(new)} = \mu_i^{(old)} - \varepsilon \, (\nabla_{\mu_i} E)_{i,j},
    \mu_j^{(new)} = \mu_j^{(old)} - \varepsilon \, (\nabla_{\mu_j} E)_{i,j}.    (3.2)

We have already remarked on the problem of training a large number of µ vectors for the purpose of starting the gradient descent and EM algorithms. We show below how to select a small number l of vantage objects v_1, ..., v_l from among the observed input objects such that Sammon's energy function

    E_D(v_1, \ldots, v_l) = \sum_{1 \le i < j \le n} \frac{ \left( D(i,j) - \| (D(v_p, F_i))_{p=1}^{l} - (D(v_p, F_j))_{p=1}^{l} \| \right)^2 }{ D(i,j) }    (3.3)

is fairly small. This provides us with well-trained (i.e., well-fitted) starting µ-vectors given by

    \mu_i^{(0)} = \left( D(v_1, F_i), \ldots, D(v_l, F_i) \right), \qquad i = 1, \ldots, n.
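As an illustrative sketch of the online updates (3.1)-(3.2) (our own Python, assuming D is a precomputed n-by-n matrix of observed dissimilarities and mu an n-by-k array of latent vectors; the step size is an assumed tuning choice):

    import numpy as np

    def sammon_gradient_pass(D, mu, step=0.01, eps=1e-12):
        """One online pass over all pairs: move mu_i and mu_j along the negative
        gradient (3.1) of the pair's contribution to the energy (1.1)."""
        n = mu.shape[0]
        for i in range(n):
            for j in range(i + 1, n):
                diff = mu[i] - mu[j]
                d_ij = max(np.linalg.norm(diff), eps)
                D_ij = max(D[i, j], eps)
                grad_i = 2.0 * (d_ij - D_ij) / (D_ij * d_ij) * diff  # gradient w.r.t. mu_i
                mu[i] -= step * grad_i
                mu[j] += step * grad_i   # gradient w.r.t. mu_j is the negative of grad_i
        return mu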

Since for purposes of visualization l = 2 or l = 3 is typically sufficient to insure small values of E_D(v_1, ..., v_l) for a moderately sized data set, two or three vantage vectors usually suffice in this case [14]. For a large number n of observed data points, vantage objects can be obtained by the stepwise forward selection process described below. We note that this process improves on the ad hoc procedures used heretofore [13].

Stepwise Forward Selection

At each stage s = 1, ..., l, the stepwise forward selection algorithm selects one new vantage object v_s that is added to the set of previously chosen objects v_1, ..., v_{s-1}. The vantage object v_s is chosen to satisfy

    v_s = \arg\min_{v \in A} E_D(v_1, \ldots, v_{s-1}, v),    (3.4)

where 'arg min_{v ∈ A}' denotes the vector in A for which the minimum value of E_D(v_1, ..., v_{s-1}, v) is reached. At stage s, having chosen the vantage object v_s, we prune the objects by comparing the energies E_D(v_1, ..., v_{j-1}, v_{j+1}, ..., v_s) (j = 1, ..., s) with the energy E_D(v_1, ..., v_{s-1}, v_s). If any of them are smaller than the latter energy, we remove the corresponding vantage object v_j and return to the next step of the process.
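The selection rule (3.4), together with the pruning step, can be sketched as follows (our own illustration, assuming a full n-by-n dissimilarity matrix D and taking the candidate set A to be the objects not yet chosen; the greedy loop and the iteration cap are implementation choices, not part of the paper):

    import numpy as np

    def vantage_energy(D, vantage, eps=1e-12):
        """Energy (3.3) of the configuration in which object i is represented by
        its dissimilarities to the chosen vantage objects."""
        coords = D[:, vantage]          # mu_i^(0) = (D(v_1, F_i), ..., D(v_l, F_i))
        n = D.shape[0]
        E = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                d_ij = np.linalg.norm(coords[i] - coords[j])
                E += (D[i, j] - d_ij) ** 2 / max(D[i, j], eps)
        return E

    def stepwise_forward_selection(D, l, max_steps=50):
        """Greedy selection (3.4) with pruning: drop an earlier vantage object when
        removing it lowers the energy, then continue adding."""
        n = D.shape[0]
        vantage = []
        for _ in range(max_steps):
            if len(vantage) >= l:
                break
            candidates = [v for v in range(n) if v not in vantage]
            v_s = min(candidates, key=lambda v: vantage_energy(D, vantage + [v]))
            vantage.append(v_s)
            current = vantage_energy(D, vantage)
            for v in list(vantage[:-1]):
                if vantage_energy(D, [u for u in vantage if u != v]) < current:
                    vantage.remove(v)
                    current = vantage_energy(D, vantage)
        return vantage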

4. Experimental Results

In this section we examine the performance of the proposed algorithms on various data sets. We begin with the classical Iris data set [7]. The Iris data set is composed of 150 vectors, each having 4 components. It is known that there are 3 clusters, each having 50 points; these consist of one clear cluster, denoted by A below, and two clusters, B and C, that are hard to distinguish from one another. We first compare 3-dimensional projections obtained using Sammon's algorithm (cf. equation (1.1)) with input vantage vectors produced via stepwise forward selection [15]. Figure 1a, below, shows a 3-dimensional projection of the Iris data obtained using the classical Principal Components algorithm for data visualization; the distortion measure for this estimate (computed using Sammon's energy function) is 3,55. Figure 1b, below, shows a 3-dimensional projection of the Iris data obtained using Sammon's data visualization algorithm; the distortion measure (computed using Sammon's energy function) for this estimate is 544. As can be seen, we cannot clearly distinguish between clusters B and C using PCA. By contrast, clusters B and C can be clearly distinguished using Sammon's data visualization algorithm.

Figure 1 (a on the left and b on the right): 3-dimensional projections of the Iris data (a) obtained by classical PCA, and (b) using Sammon's algorithm for data visualization (employing vantage vectors produced by stepwise forward selection).

We now turn to evaluating two-dimensional projections for the data set referred to below as "composite movie". Composite movie is composed of 10 shots (each having 10 frames) taken from 4 different movies; these consist of:
a) 4 shots taken from the movie "Mr. Bean's Christmas": frames 1 to 40.
b) 3 shots taken from the movie "House Tour": frames 41 to 70.

c) 2 shots taken from a movie we created (referred to below as "Mov1"): frames 71 to 90.
d) 1 shot from a movie in which Kylie Minogue is interviewed: frames 91 to 100.

Using image processing techniques described in [16], we assign a vector with 7 features to each of the 100 frames. We obtain a data set consisting of one hundred 7-component feature vectors. Composite Movie has two hierarchical grouping levels; it can be grouped using shots and, separately, using movies. We expect to distinguish both between the shots and, on a higher level, between the movies.

The best data visualization for this data set (cf. Figure 2a) was obtained using the pairwise difference factor analysis mixture model outlined in Section 2; we used starting vantage vectors computed using stepwise forward selection. As can be seen in Figure 2a, below, there are 4 clear clusters that belong together in the upper left corner of the figure. They represent the 4 shots grouped to form excerpts from "Mr. Bean's Christmas". In the lower right corner, we see two clear clusters. These are the two shots from the movie referred to as "Mov1". The 3 shots from "House Tour" are represented by the 3 rightmost clusters in the middle of the figure. Figure 2b, below, employs Sammon's data visualization algorithm, using gradient descent (see Section 3) with the same vantage vectors. Sammon's data visualization gave a significantly worse picture of the data. This is demonstrated by the fact that the movies are no longer grouped correctly. For example, the four clusters from "Mr. Bean's Christmas" are mixed with clusters from the other movies in the lower right-hand quadrant.

Figure 2: Two-dimensional projections of the composite movie data obtained by (a) the pairwise difference factor analysis mixture algorithm (on the left) and (b) Sammon's algorithm computed using gradient descent (on the right).

5. Conclusions and Future Research

We have introduced stepwise forward selection algorithms and demonstrated their value in providing starting values for factor mixture models and multidimensional scaling algorithms. It has been shown that pairwise difference factor mixture models provide good data visualization for a wide variety of data when vantage vectors, constructed using stepwise forward selection, are used to generate appropriate starting values. Our examples illustrate that factor mixture models frequently provide better data visualization than multidimensional scaling algorithms designed for the same purpose. Their superiority arises as a result of their flexibility in modeling data distortion. We have shown how to assess the fitness of factor mixture models and used these results to assess fit in the examples presented above. We would like to extend our current work to include mixture factor models which incorporate intra-component correlations.

Appendix: Calculations via the EM algorithm needed for the mixture factor difference model

In this section we describe the Expectation-Maximization (EM) algorithm [11] used to estimate the latent variables and parameters introduced in Section 2 above. For purposes of clarity we repeat the formulation of our model:

    D(i,j) = A^{(\pi)} ( \mu_i^{(\pi)} - \mu_j^{(\pi)} ) + \varepsilon_{i,j}, \qquad 1 \le i < j \le n,    (A.1)

where 'π' identifies the particular mixture model component (i.e., 'π(i,j) = s' means that the pair (i,j) belongs to mixture component s); 'A^(π)' are parametric p x q matrices indexed by the component π; 'µ_i^(π)' are parametric q x 1 latent vectors for feature F_i, indexed by the component π and the observation index i (1 ≤ i ≤ n); and 'ε_{i,j}' is the pairwise noise distortion for features F_i, F_j (1 ≤ i < j ≤ n). It is assumed below that the errors ε_{i,j} are normally distributed with common variance (σ^(π))^2.

In the notation below, µ_i^{(g; old)} (respectively, µ_i^{(g; new)}) denotes the old or previous value (respectively, the new or updated value) of the latent parameter µ_i for the g-th component (g = 1, 2; i = 1, ..., n). Analogous notation is used to characterize the projection matrix A. We also employ the notation \bar{\mu}_{(i)}^{(g; old)} for the average of the old (or previous) µ-parameters of the g-th component excluding the i-th; similar notation applies to the new (or updated) parameters (g = 1, 2; i = 1, ..., n). Then the probability weight attached to the observed pair dissimilarity measure D(i,j) under component g is

    P(i,j;g) = P(D(i,j) \mid g) = \frac{ \exp\left\{ - \| D(i,j) - A^{(g; old)} ( \mu_i^{(g; old)} - \mu_j^{(g; old)} ) \|^2 / (2 [\sigma^{(g; old)}]^2) \right\} }{ \sum_{\kappa = 1,2} \exp\left\{ - \| D(i,j) - A^{(\kappa; old)} ( \mu_i^{(\kappa; old)} - \mu_j^{(\kappa; old)} ) \|^2 / (2 [\sigma^{(\kappa; old)}]^2) \right\} }.    (A.2)
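A minimal sketch of the E-step (A.2) (our own Python; D is assumed to be a dictionary of pairwise dissimilarity vectors and A, mu, sigma the current "old" parameter values of the two components):

    import numpy as np

    def e_step(D, A, mu, sigma):
        """Responsibilities (A.2): the posterior probability that pair (i, j) was
        generated by component g, given the current parameter values."""
        P = {}
        for (i, j), d in D.items():
            w = np.empty(2)
            for g in (0, 1):
                r = d - A[g] @ (mu[g][i] - mu[g][j])
                w[g] = np.exp(-float(r @ r) / (2.0 * sigma[g] ** 2))
            P[(i, j)] = w / w.sum()
        return P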

Using these weights, we update the latent mean vectors µ_i^{(g; new)} (g = 1, 2; i = 1, ..., n) via

    \hat{\mu}_i^{(g; new)} \approx \frac{ \sum_{j \ne i} P(i,j;g) \, ( A^{(g; new)\prime} A^{(g; new)} )^{-1} A^{(g; new)\prime} \left( D(i,j) + A^{(g; new)} \bar{\mu}_{(i)}^{(g; new)} \right) }{ \sum_{j \ne i} P(i,j;g) }.    (A.3)

The back projection matrix A^{(g; new)} is updated, for g = 1, 2, using the formula

    A^{(g; new)} = \left[ \sum_{1 \le i < j \le n} P(i,j;g) \, D(i,j) ( \mu_i^{(g; new)} - \mu_j^{(g; new)} )' \right] \left[ \sum_{1 \le i < j \le n} P(i,j;g) \, ( \mu_i^{(g; new)} - \mu_j^{(g; new)} ) ( \mu_i^{(g; new)} - \mu_j^{(g; new)} )' \right]^{-1}.    (A.4)

We update the variances (σ^{(g; new)})^2 via

    (\sigma^{(g; new)})^2 = \frac{ \sum_{1 \le i < j \le n} P(i,j;g) \, \| D(i,j) - A^{(g; new)} ( \mu_i^{(g; new)} - \mu_j^{(g; new)} ) \|^2 }{ \sum_{1 \le i < j \le n} P(i,j;g) }.    (A.5)
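The updates (A.4) and (A.5) amount to a posterior-weighted least-squares refit; a minimal sketch (ours, using the same assumed data structures as the earlier sketches) is:

    import numpy as np

    def m_step_projection_and_variance(D, P, mu):
        """Weighted least-squares updates in the spirit of (A.4) and (A.5):
        refit the projection matrix and the noise scale of each component."""
        A_new, sigma_new = [], []
        for g in (0, 1):
            num, den = 0.0, 0.0
            for (i, j), d in D.items():
                delta = mu[g][i] - mu[g][j]
                w = P[(i, j)][g]
                num = num + w * np.outer(d, delta)      # weighted cross-products, as in (A.4)
                den = den + w * np.outer(delta, delta)
            A_g = num @ np.linalg.pinv(den)
            ss = sum(P[(i, j)][g] * float(np.sum((d - A_g @ (mu[g][i] - mu[g][j])) ** 2))
                     for (i, j), d in D.items())
            wsum = sum(P[pair][g] for pair in D)
            A_new.append(A_g)
            sigma_new.append(np.sqrt(ss / wsum))        # standard deviation from (A.5)
        return A_new, sigma_new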

Bibliography

[1] Jolliffe, I.T., Principal Component Analysis, Springer-Verlag, 1986.
[2] Sammon, J.W., Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., 1969, 18.
[3] Cox, T.F., and Cox, M.A., Multidimensional Scaling, Chapman and Hall, 2001.
[4] Bishop, C.M., and Tipping, M.E., A Hierarchical Latent Variable Model for Data Visualization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20, 3.
[5] Koller, D., Probabilistic Relational Models, invited contribution to Inductive Logic Programming, 9th International Workshop (ILP-99), Saso Dzeroski and Peter Flach, Eds., Springer-Verlag, 1999.
[6] McFarlane, M., and Young, F.W., Graphical Sensitivity Analysis for Multidimensional Scaling, Journal of Computational and Graphical Statistics, 1994, 3, 1.
[7] Kohonen, T., Self-Organizing Maps, Springer-Verlag, New York, 2001.
[8] Faloutsos, C., and Lin, K.-I., FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, Proc. ACM SIGMOD International Conference on Management of Data, 1995.
[9] Mao, J., and Jain, A.K., Artificial Neural Networks for Feature Extraction and Multivariate Data Projection, IEEE Transactions on Neural Networks, 1995, 6.
[10] Lerner, B., Guterman, H., Aladjem, M., Dinstein, I., and Romem, Y., On pattern classification with Sammon's nonlinear mapping - an experimental study, Pattern Recognition, 1998, 31.
[11] Dempster, A.P., Laird, N.M., and Rubin, D.B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, 1977, 39, pp. 1-38.
[12] Gelman, A., Carlin, J., Stern, H., and Rubin, D., Bayesian Data Analysis, Chapman and Hall.
[13] McLachlan, G., and Peel, D., Finite Mixture Models, Wiley Series in Probability and Statistics, 2000.
[14] Fraley, C., and Raftery, A.E., How Many Clusters? Which clustering method? Answers via model-based cluster analysis, Computer Journal, 1999, 41, pp. 97-306.
[15] Jolliffe, I.T., Principal Components Analysis, Springer Series in Statistics, 2nd edition, 2002.
[16] Latecki, L.J., and Wildt, D., Automatic Recognition of Unpredictable Events in Videos, Proceedings of the 16th International Conference on Pattern Recognition (ICPR), 2002.
