Modelling high-dimensional data by mixtures of factor analyzers

Computational Statistics & Data Analysis 41 (2003)

G.J. McLachlan, D. Peel, R.W. Bean
Department of Mathematics, University of Queensland, St. Lucia, Brisbane 4072, Australia
Corresponding author. E-mail address: gjm@maths.uq.edu.au (G.J. McLachlan)

Received 1 March 2002

Abstract

We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Mixture modelling; Factor analyzers; EM algorithm

1. Introduction

Finite mixtures of distributions have provided a mathematical-based approach to the statistical modelling of a wide variety of random phenomena; see, for example, McLachlan and Peel (2000a). For multivariate data of a continuous nature, attention has focussed on the use of multivariate normal components because of their computational convenience. With the normal mixture model-based approach to density estimation and clustering, the density of the (p-dimensional) random variable Y of interest is modelled as a mixture of a number (g) of multivariate normal densities in some

unknown proportions $\pi_1, \ldots, \pi_g$. That is, each data point is taken to be a realization of the mixture probability density function (p.d.f.),

$$f(y; \Psi) = \sum_{i=1}^{g} \pi_i \, \phi(y; \mu_i, \Sigma_i), \tag{1}$$

where $\phi(y; \mu_i, \Sigma_i)$ denotes the p-variate normal density function with mean $\mu_i$ and covariance matrix $\Sigma_i$. Here the vector $\Psi$ of unknown parameters consists of the mixing proportions $\pi_i$, the elements of the component means $\mu_i$, and the distinct elements of the component-covariance matrices $\Sigma_i$.

The normal mixture model (1) can be fitted iteratively to an observed random sample $y_1, \ldots, y_n$ by maximum likelihood (ML) via the expectation-maximization (EM) algorithm of Dempster et al. (1977); see also McLachlan and Krishnan (1997). The number of components g can be taken sufficiently large to provide an arbitrarily accurate estimate of the underlying density function; see, for example, Li and Barron (2000). For clustering purposes, a probabilistic clustering of the data into g clusters can be obtained in terms of the fitted posterior probabilities of component membership for the data. An outright assignment of the data into g clusters is achieved by assigning each data point to the component to which it has the highest estimated posterior probability of belonging.

The g-component normal mixture model (1) with unrestricted component-covariance matrices is a highly parameterized model with $\tfrac{1}{2}p(p+1)$ parameters for each component-covariance matrix $\Sigma_i$ $(i = 1, \ldots, g)$. Banfield and Raftery (1993) introduced a parameterization of the component-covariance matrix $\Sigma_i$ based on a variant of the standard spectral decomposition of $\Sigma_i$ $(i = 1, \ldots, g)$. A common approach to reducing the number of dimensions is to perform a principal component analysis (PCA). But as is well known, projections of the feature data $y_j$ onto the first few principal axes are not always useful in portraying the group structure; see McLachlan and Peel (2000a, p. 239).
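The clustering rule described above can be sketched in a few lines. The following is a minimal illustration only, assuming the univariate special case (p = 1) so that it needs nothing beyond the standard library; the function names and the numerical values of the fitted parameters are hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(y, mean, var):
    """Univariate normal density phi(y; mean, var)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_probs(y, props, means, variances):
    """Posterior probabilities of component membership for one observation y."""
    weighted = [pi * normal_pdf(y, mu, v)
                for pi, mu, v in zip(props, means, variances)]
    total = sum(weighted)  # the mixture density f(y; Psi) of Eq. (1)
    return [w / total for w in weighted]

def assign(y, props, means, variances):
    """Outright assignment: index of the component with the highest posterior."""
    tau = posterior_probs(y, props, means, variances)
    return max(range(len(tau)), key=tau.__getitem__)

# Hypothetical fitted two-component mixture (illustrative values only).
props, means, variances = [0.4, 0.6], [-2.0, 3.0], [1.0, 1.5]
labels = [assign(y, props, means, variances) for y in (-2.5, 0.0, 2.8)]
```

The probabilistic clustering is the list returned by `posterior_probs`; the outright clustering keeps only its largest entry. In the p-variate setting of the paper the only change would be the density function.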
This point was also stressed by Chang (1983), who showed in the case of two groups that the principal component of the feature vector that provides the best separation between groups in terms of Mahalanobis distance is not necessarily the first component.

Another approach for reducing the number of unknown parameters in the forms for the component-covariance matrices is to adopt the mixture of factor analyzers model, as considered in McLachlan and Peel (2000a, 2000b). This model was originally proposed by Ghahramani and Hinton (1997) and Hinton et al. (1997) for the purposes of visualizing high-dimensional data in a lower dimensional space to explore for group structure; see also Tipping and Bishop (1997, 1999) and Bishop (1998), who considered the related model of mixtures of principal component analyzers for the same purpose. Further references may be found in McLachlan and Peel (2000a, Chapter 8).

In this paper, we investigate further the modelling of high-dimensional data through the use of mixtures of factor analyzers, focussing on computational issues not addressed in McLachlan and Peel (2000a, Chapter 8). We shall also demonstrate the usefulness of the methodology in its application to the clustering of microarray expression data, which is a very important but nonstandard problem in cluster analysis. Initial attempts on this problem used hierarchical clustering, but there is no reason why the clusters

should be hierarchical for this problem. Also, a mixture model-based approach enables the clustering of microarray data to be approached on a sound mathematical basis. Indeed, as remarked by Aitkin et al. (1981), when clustering samples from a population, no cluster analysis method is a priori believable without a statistical model. For microarray data, the number of tissues n is usually very small relative to the number of genes (the dimension p), and so the use of factor models to represent the component-covariance matrices allows the mixture model to be fitted by working in the lower dimensional space implied by the factors.

2. Single-factor analysis model

Factor analysis is commonly used for explaining data, in particular, correlations between variables in multivariate observations. It can also be used for dimensionality reduction. In a typical factor analysis model, each observation $Y_j$ is modelled as

$$Y_j = \mu + B U_j + e_j \quad (j = 1, \ldots, n), \tag{2}$$

where $U_j$ is a q-dimensional $(q < p)$ vector of latent or unobservable variables called factors and $B$ is a $p \times q$ matrix of factor loadings (parameters). The $U_j$ are assumed to be i.i.d. as $N(0, I_q)$, independently of the errors $e_j$, which are assumed to be i.i.d. as $N(0, D)$, where $D$ is a diagonal matrix,

$$D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2),$$

and where $I_q$ denotes the $q \times q$ identity matrix. Thus, conditional on $U_j = u_j$, the $Y_j$ are independently distributed as $N(\mu + B u_j, D)$. Unconditionally, the $Y_j$ are i.i.d. according to a normal distribution with mean $\mu$ and covariance matrix

$$\Sigma = B B^T + D. \tag{3}$$

If q is chosen sufficiently smaller than p, representation (3) imposes some constraints on the component-covariance matrix $\Sigma$ and thus reduces the number of free parameters to be estimated. Note that in the case of $q > 1$, there is an infinity of choices for $B$, since (3) is still satisfied if $B$ is replaced by $BC$, where $C$ is any orthogonal matrix of order q.
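To give a feel for the size of the reduction, the sketch below compares the number of covariance parameters under the unrestricted model, $\tfrac{1}{2}p(p+1)$, with the factor-analytic count $pq + p - \tfrac{1}{2}q(q-1)$ that is stated later in this section. The function names are hypothetical, and the values p = 446 and q = 6 are borrowed from the colon example purely for illustration.

```python
def full_cov_params(p):
    """Distinct elements of an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2

def factor_cov_params(p, q):
    """Free parameters in B B^T + D after removing the q(q-1)/2
    rotational constraints on the p x q loading matrix B."""
    return p * q + p - q * (q - 1) // 2

# Illustrative comparison for one component-covariance matrix.
p, q = 446, 6
full_count = full_cov_params(p)         # 99681
factor_count = factor_cov_params(p, q)  # 3107
```

With only n = 62 observations, the gap between these two counts is what makes the factor-analytic form workable where the unrestricted form is not.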
One (arbitrary) way of uniquely specifying $B$ is to choose the orthogonal matrix $C$ so that $B^T D^{-1} B$ is diagonal (with its diagonal elements arranged in decreasing order); see Lawley and Maxwell (1971, Chapter 1). Assuming that the eigenvalues of $B B^T$ are positive and distinct, the condition that $B^T D^{-1} B$ is diagonal as above imposes $\tfrac{1}{2}q(q-1)$ constraints on the parameters. Hence the number of free parameters is then $pq + p - \tfrac{1}{2}q(q-1)$.

The factor analysis model (2) can be fitted by the EM algorithm and its variants, as discussed in the subsequent section for the more general case of mixtures of such models. Note that with the factor analysis model, we avoid having to compute the inverses of iterates of the estimated $p \times p$ covariance matrix $\Sigma$ that may be singular for large p relative to n. This is because the inversion of the current value of the $p \times p$ matrix $(B B^T + D)$ on each iteration can be undertaken using the result that

$$(B B^T + D)^{-1} = D^{-1} - D^{-1} B (I_q + B^T D^{-1} B)^{-1} B^T D^{-1}, \tag{4}$$

where the right-hand side of (4) involves only the inverses of $q \times q$ matrices, since $D$ is a diagonal matrix. The determinant of $(B B^T + D)$ can then be calculated as

$$|B B^T + D| = |D| \, / \, |I_q - B^T (B B^T + D)^{-1} B|.$$

Unlike the PCA model, the factor analysis model (2) enjoys a powerful invariance property: changes in the scales of the feature variables in $y_j$ appear only as scale changes in the appropriate rows of the matrix $B$ of factor loadings.

3. Mixtures of factor analyzers

A global nonlinear approach can be obtained by postulating a finite mixture of linear submodels for the distribution of the full observation vector $Y_j$ given the (unobservable) factors $u_j$. That is, we can provide a local dimensionality reduction method by assuming that the distribution of the observation $Y_j$ can be modelled as

$$Y_j = \mu_i + B_i U_j + e_j \quad \text{with prob. } \pi_i \quad (i = 1, \ldots, g) \tag{5}$$

for $j = 1, \ldots, n$, where the factors $U_1, \ldots, U_n$ are distributed independently $N(0, I_q)$, independently of the $e_j$, which are distributed independently $N(0, D_i)$, where $D_i$ is a diagonal matrix $(i = 1, \ldots, g)$. Thus the mixture of factor analyzers model is given by (1), where the ith component-covariance matrix $\Sigma_i$ has the form

$$\Sigma_i = B_i B_i^T + D_i \quad (i = 1, \ldots, g), \tag{6}$$

where $B_i$ is a $p \times q$ matrix of factor loadings and $D_i$ is a diagonal matrix $(i = 1, \ldots, g)$. The parameter vector $\Psi$ now consists of the elements of the $\mu_i$, the $B_i$, and the $D_i$, along with the mixing proportions $\pi_i$ $(i = 1, \ldots, g-1)$, on putting $\pi_g = 1 - \sum_{i=1}^{g-1} \pi_i$.

4. Maximum likelihood estimation of mixture of factor analyzers models

The mixture of factor analyzers model can be fitted by using the alternating expectation-conditional maximization (AECM) algorithm (Meng and van Dyk, 1997). The expectation-conditional maximization (ECM) algorithm proposed by Meng and Rubin (1993) replaces the M-step of the EM algorithm by a number of computationally simpler conditional maximization (CM) steps. The AECM algorithm is an extension of the ECM algorithm, where the specification of the complete data is allowed to be different on each CM-step.
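Returning briefly to identity (4) and the determinant formula of Section 2, the following sketch checks both numerically on a small example. It is a minimal pure-Python illustration, not the authors' implementation: the helper names and the test matrices are hypothetical, and the determinant is computed in the equivalent form $|D|\,|I_q + B^T D^{-1} B|$, which agrees with the expression above because $I_q - B^T (BB^T + D)^{-1} B = (I_q + B^T D^{-1} B)^{-1}$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_and_det(a):
    """Gauss-Jordan inverse with partial pivoting; also returns det(a)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[piv][c] == 0.0:
            raise ValueError("matrix is singular")
        if piv != c:
            aug[c], aug[piv] = aug[piv], aug[c]
            det = -det  # a row swap flips the sign of the determinant
        pivot = aug[c][c]
        det *= pivot
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug], det

def fa_inverse_and_det(B, d):
    """Inverse and determinant of B B^T + diag(d), inverting only a q x q matrix."""
    p, q = len(B), len(B[0])
    dinv_b = [[B[r][c] / d[r] for c in range(q)] for r in range(p)]  # D^{-1} B
    m = matmul(transpose(B), dinv_b)                                  # B^T D^{-1} B
    for k in range(q):
        m[k][k] += 1.0                                                # I_q + B^T D^{-1} B
    m_inv, m_det = inverse_and_det(m)
    corr = matmul(matmul(dinv_b, m_inv), transpose(dinv_b))
    inv = [[(1.0 / d[r] if r == c else 0.0) - corr[r][c] for c in range(p)]
           for r in range(p)]
    det_d = 1.0
    for v in d:
        det_d *= v
    return inv, det_d * m_det

# Hypothetical loadings (p = 4, q = 2) and uniquenesses, for illustration only.
B = [[1.0, 0.5], [0.2, -0.3], [0.0, 0.8], [-0.6, 0.1]]
d = [0.5, 1.0, 0.7, 0.9]
sigma = matmul(B, transpose(B))
for r in range(len(d)):
    sigma[r][r] += d[r]
inv_fast, det_fast = fa_inverse_and_det(B, d)
inv_direct, det_direct = inverse_and_det(sigma)
```

Here `inv_fast` and `inv_direct` should agree to rounding error, but only the direct route requires the $p \times p$ matrix to be inverted, which is the step that becomes unstable or impossible when p is large relative to n.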
To apply the AECM algorithm to the fitting of the mixture of factor analyzers model, we partition the vector of unknown parameters $\Psi$ as $(\Psi_1^T, \Psi_2^T)^T$, where $\Psi_1$ contains the mixing proportions $\pi_i$ $(i = 1, \ldots, g-1)$ and the elements of the component means $\mu_i$ $(i = 1, \ldots, g)$. The subvector $\Psi_2$ contains the elements of the $B_i$ and the $D_i$ $(i = 1, \ldots, g)$. We let $\Psi^{(k)} = (\Psi_1^{(k)T}, \Psi_2^{(k)T})^T$ be the value of $\Psi$ after the kth iteration of the AECM algorithm. For this application of the AECM algorithm, one iteration consists of two

cycles, and there is one E-step and one CM-step for each cycle. The two CM-steps correspond to the partition of $\Psi$ into the two subvectors $\Psi_1$ and $\Psi_2$.

For the first cycle of the AECM algorithm, we specify the missing data to be just the component-indicator vectors, $z_1, \ldots, z_n$, where $z_{ij} = (z_j)_i$ is one or zero, according to whether $y_j$ arose or did not arise from the ith component $(i = 1, \ldots, g;\ j = 1, \ldots, n)$. The first conditional CM-step leads to $\pi_i^{(k)}$ and $\mu_i^{(k)}$ being updated to

$$\pi_i^{(k+1)} = \sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k)}) \, / \, n \tag{7}$$

and

$$\mu_i^{(k+1)} = \sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k)}) \, y_j \Big/ \sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k)}) \tag{8}$$

for $i = 1, \ldots, g$, where

$$\tau_i(y_j; \Psi) = \pi_i \, \phi(y_j; \mu_i, \Sigma_i) \Big/ \sum_{h=1}^{g} \pi_h \, \phi(y_j; \mu_h, \Sigma_h) \tag{9}$$

is the ith component posterior probability of $y_j$.

For the second cycle for the updating of $\Psi_2$, we specify the missing data to be the factors $u_1, \ldots, u_n$, as well as the component-indicator vectors, $z_1, \ldots, z_n$. On setting $\Psi^{(k+1/2)}$ equal to $(\Psi_1^{(k+1)T}, \Psi_2^{(k)T})^T$, an E-step is performed to calculate $Q(\Psi; \Psi^{(k+1/2)})$, which is the conditional expectation of the complete-data log likelihood given the observed data, using $\Psi = \Psi^{(k+1/2)}$. The CM-step on this second cycle is implemented by the maximization of $Q(\Psi; \Psi^{(k+1/2)})$ over $\Psi$ with $\Psi_1$ set equal to $\Psi_1^{(k+1)}$. This yields the updated estimates $B_i^{(k+1)}$ and $D_i^{(k+1)}$. The former is given by

$$B_i^{(k+1)} = V_i^{(k+1/2)} \gamma_i^{(k)} \left( \gamma_i^{(k)T} V_i^{(k+1/2)} \gamma_i^{(k)} + \omega_i^{(k)} \right)^{-1}, \tag{10}$$

where

$$V_i^{(k+1/2)} = \frac{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)}) (y_j - \mu_i^{(k+1)})(y_j - \mu_i^{(k+1)})^T}{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)})}, \tag{11}$$

$$\gamma_i^{(k)} = (B_i^{(k)} B_i^{(k)T} + D_i^{(k)})^{-1} B_i^{(k)}, \tag{12}$$

and

$$\omega_i^{(k)} = I_q - \gamma_i^{(k)T} B_i^{(k)} \tag{13}$$

for $i = 1, \ldots, g$. The updated estimate $D_i^{(k+1)}$ is given by

$$D_i^{(k+1)} = \mathrm{diag}\{ V_i^{(k+1/2)} - B_i^{(k+1)} H_i^{(k+1/2)} B_i^{(k+1)T} \} = \mathrm{diag}\{ V_i^{(k+1/2)} - V_i^{(k+1/2)} \gamma_i^{(k)} B_i^{(k+1)T} \}, \tag{14}$$

where

$$H_i^{(k+1/2)} = \frac{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)}) \, E_i^{(k+1/2)}(U_j U_j^T \mid y_j)}{\sum_{j=1}^{n} \tau_i(y_j; \Psi^{(k+1/2)})} = \gamma_i^{(k)T} V_i^{(k+1/2)} \gamma_i^{(k)} + \omega_i^{(k)} \tag{15}$$

and $E_i^{(k+1/2)}$ denotes conditional expectation given membership of the ith component, using $\Psi^{(k+1/2)}$ for $\Psi$.

Direct differentiation of the log-likelihood function shows that the ML estimate of the diagonal matrix $D_i$ satisfies

$$\hat{D}_i = \mathrm{diag}(\hat{V}_i - \hat{B}_i \hat{B}_i^T), \tag{16}$$

where

$$\hat{V}_i = \sum_{j=1}^{n} \tau_i(y_j; \hat{\Psi}) (y_j - \hat{\mu}_i)(y_j - \hat{\mu}_i)^T \Big/ \sum_{j=1}^{n} \tau_i(y_j; \hat{\Psi}). \tag{17}$$

As remarked by Lawley and Maxwell (1971, p. 30) in the context of direct computation of the ML estimate for a single-component factor analysis model, Eq. (16) looks temptingly simple to use to solve for $\hat{D}_i$, but was not recommended due to convergence problems. On comparing (16) with (14), it can be seen that with the calculation of the ML estimate of $D_i$ directly from the (incomplete-data) log-likelihood function, the unconditional expectation of $U_j U_j^T$, which is the identity matrix, is used in place of the conditional expectation in (15) on the E-step of the AECM algorithm. Unlike the direct approach of calculating the ML estimate, the EM algorithm and its variants such as the AECM version have good convergence properties in that they ensure the likelihood is not decreased after each iteration regardless of the choice of starting point.

It can be seen from (16) that some of the estimates of the elements of the diagonal matrix $D_i$ (the uniquenesses) will be close to zero if effectively not more than q observations are unequivocally assigned to the ith component of the mixture in terms of the fitted posterior probabilities of component membership. This will lead to spikes or near singularities in the likelihood. One way to avoid this is to impose the condition of a common value $D$ for the $D_i$,

$$D_i = D \quad (i = 1, \ldots, g). \tag{18}$$

An alternative way of proceeding is to adopt some prior distribution for the $D_i$, as in the Bayesian approaches of Fokoue and Titterington (2000), Ghahramani and Beal (2000) and Utsugi and Kumagai (2001).
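The closed form in (15), and the agreement between the two expressions in (14), can be traced in a few lines. The sketch below is a reconstruction under the stated model rather than a quotation from the paper: it uses the standard conditional moments of jointly normal vectors, and for brevity it drops the iteration superscripts on $\mu_i$, $\gamma_i$, $\omega_i$ and $V_i$.

```latex
% Given membership of component i, (Y_j, U_j) is jointly normal with
% cov(Y_j, U_j) = B_i and cov(Y_j) = B_i B_i^T + D_i, so that
\begin{align*}
E_i(U_j \mid y_j) &= \gamma_i^T (y_j - \mu_i), &
\operatorname{cov}_i(U_j \mid y_j) &= I_q - \gamma_i^T B_i = \omega_i, \\
E_i(U_j U_j^T \mid y_j) &= \gamma_i^T (y_j - \mu_i)(y_j - \mu_i)^T \gamma_i + \omega_i .
\end{align*}
% Averaging over j with weights tau_i(y_j; Psi^{(k+1/2)}) and using (11):
\begin{equation*}
H_i = \gamma_i^T V_i \gamma_i + \omega_i .
\end{equation*}
% By (10), B_i^{(k+1)} H_i = V_i \gamma_i, and hence
\begin{equation*}
B_i^{(k+1)} H_i B_i^{(k+1)T} = V_i \gamma_i B_i^{(k+1)T},
\end{equation*}
% which is why the two forms of the update in (14) coincide.
```

Replacing $E_i(U_j U_j^T \mid y_j)$ by the unconditional value $I_q$ in the first form of (14) recovers the structure of (16), which is the comparison drawn in the text.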
The mixture of probabilistic principal component analyzers (PCAs) model, as proposed by Tipping and Bishop (1997), has form (6) with each $D_i$ now having the isotropic structure

$$D_i = \sigma_i^2 I_p \quad (i = 1, \ldots, g). \tag{19}$$

Under this isotropic restriction (19), the iterative updating of $B_i$ and $D_i$ is not necessary since, given the component membership of the mixture of PCAs, $B_i^{(k+1)}$ and $\sigma_i^{(k+1)2}$ are given explicitly by an eigenvalue decomposition of the current value of $V_i$.

5. Initialization of AECM algorithm

We can make use of the link of factor analysis with the probabilistic PCA model (19) to specify an initial value $\Psi^{(0)}$ for $\Psi$ in the ML fitting of the mixture of factor analyzers via the AECM algorithm. On noting that the transformed data $D_i^{-1/2} Y_j$ satisfies the probabilistic PCA model (19) with $\sigma_i^2 = 1$, it follows that for a given $D_i^{(0)}$ and $\Sigma_i^{(0)}$, we can specify $B_i^{(0)}$ as

$$B_i^{(0)} = D_i^{(0)1/2} A_i (\Lambda_i - \tilde{\sigma}_i^2 I_q)^{1/2} \quad (i = 1, \ldots, g), \tag{20}$$

where

$$\tilde{\sigma}_i^2 = \sum_{h=q+1}^{p} \lambda_{ih} \, / \, (p - q).$$

The q columns of the matrix $A_i$ are the eigenvectors corresponding to the eigenvalues $\lambda_{i1} \geq \lambda_{i2} \geq \cdots \geq \lambda_{iq}$ of

$$D_i^{(0)-1/2} \, \Sigma_i^{(0)} \, D_i^{(0)-1/2} \tag{21}$$

and $\Lambda_i = \mathrm{diag}(\lambda_{i1}, \ldots, \lambda_{iq})$. The use of $\tilde{\sigma}_i^2$ instead of unity is proposed in (20) because it avoids the possibility of negative values for $(\Lambda_i - \tilde{\sigma}_i^2 I_q)$, which can occur since estimates are being used for the unknown values of $D_i$ and $\Sigma_i$ in (21).

To specify $\Sigma_i^{(0)}$ for use in (21), we can randomly assign the data into g groups and take $\Sigma_i^{(0)}$ to be the sample covariance matrix of the ith group $(i = 1, \ldots, g)$. Concerning the choice of $D_i^{(0)}$, we can take $D_i^{(0)}$ to be the diagonal matrix formed from the diagonal elements of $\Sigma_i^{(0)}$ $(i = 1, \ldots, g)$. In this case, the matrix (21) has the form of a correlation matrix.

The eigenvalues and eigenvectors for use in (21) can be found by a singular value decomposition of each $p \times p$ sample component-covariance matrix $\Sigma_i^{(0)}$. But if the number of dimensions p is appreciably greater than the sample size n, then it is much quicker to find them by a singular value decomposition of the $n_i \times n_i$ sample matrix formed by taking the observations to be the rows rather than the columns of the $p \times n_i$ data matrix whose $n_i$ columns are the p-dimensional observations assigned initially to the ith component $(i = 1, \ldots, g)$.
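The shortcut just described rests on a standard linear-algebra identity, sketched here for a single component. The symbols $X$, $v$ and $\ell$ are introduced only for this illustration: $X$ is assumed to be the $p \times n_i$ matrix of centred observations for the component (scaled by $D_i^{(0)-1/2}$ when the target is matrix (21)), and the two sample matrices are taken to be $XX^T$ and $X^TX$ up to their respective divisors.

```latex
% Suppose v is a unit eigenvector of the small n_i x n_i matrix X^T X
% with eigenvalue \ell > 0:
\begin{align*}
X^T X \, v &= \ell \, v \\
\Longrightarrow \quad X X^T (X v) &= X (X^T X v) = \ell \, (X v),
\end{align*}
% so X v is an eigenvector of the large p x p matrix X X^T with the same
% eigenvalue. Its squared length is v^T X^T X v = \ell, and therefore
\begin{equation*}
a = X v / \sqrt{\ell}
\end{equation*}
% is the corresponding unit-length eigenvector of X X^T.
```

Only the nonzero eigenvalues are shared, and there are at most $n_i$ of them, so the number of factors q that can be initialized this way is limited accordingly.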
The nonzero eigenvalues of this $n_i \times n_i$ matrix are equal to those of $\Sigma_i^{(0)}$ apart from a common multiplier due to the different divisors in their formation.

A formal test for the number of factors can be undertaken using the likelihood ratio $\lambda$, as regularity conditions hold for this test conducted at a given value for the number of components g. For the null hypothesis that $H_0: q = q_0$ versus the alternative

$H_1: q = q_0 + 1$, the statistic $-2 \log \lambda$ is asymptotically chi-squared with $d = g(p - q_0)$ degrees of freedom. However, in situations where n is not large relative to the number of unknown parameters, we prefer the use of the BIC criterion of Schwarz (1978). Applied in this context, it means that twice the increase in the log-likelihood $(-2 \log \lambda)$ has to be greater than $d \log n$ for the null hypothesis to be rejected.

6. Example: colon data

In this example, we consider the clustering of tissue samples on the basis of two thousand genes for the colon data of Alon et al. (1999). They used Affymetrix oligonucleotide arrays to monitor absolute measurements on expressions of over 6500 human gene expressions in 40 tumour and 22 normal colon tissue samples. These samples were taken from 40 different patients so that 22 patients supplied both a tumour and normal tissue sample. Alon et al. (1999) focussed on the 2000 genes with highest minimal intensity across the samples, and it is these 2000 genes that comprised our data set. The matrix A of microarray data for this data set thus has p = 2000 rows and n = 62 columns.

Before we considered the clustering of this set, we processed the data by taking the (natural) logarithm of each expression level in the matrix A. Then each column of this matrix was standardized to have mean zero and unit standard deviation. Finally, each row of the consequent matrix was standardized to have mean zero and unit standard deviation.

We are unable to proceed directly with the fitting of a normal mixture model to these data in this form. But even if we were able to do so, it is perhaps not the ideal way of proceeding because with such a large number p of feature variables, there will be a lot of noise introduced into the problem, and this noise is unable to be modelled adequately because of the very small number (n = 62) of observations available relative to the dimension p = 2000 of each observation.
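The three preprocessing steps described above (log transform, column standardization, then row standardization) can be sketched as follows. This is a minimal illustration on a tiny made-up matrix, not the authors' code: the function names are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which divisor was used.

```python
import math
import statistics

def standardize(values):
    """Centre to mean zero and scale to unit standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (divisor n - 1): an assumption
    return [(v - m) / s for v in values]

def preprocess(A):
    """A is a list of rows (genes), each a list of positive expression
    levels, one per column (tissue)."""
    logged = [[math.log(v) for v in row] for row in A]       # natural log
    cols = [standardize(list(col)) for col in zip(*logged)]  # each column
    rows = [list(r) for r in zip(*cols)]                     # back to genes x tissues
    return [standardize(r) for r in rows]                    # then each row

# Made-up 3 x 4 expression matrix, for illustration only.
A = [[120.0, 85.0, 300.0, 40.0],
     [15.0, 60.0, 22.0, 90.0],
     [500.0, 410.0, 75.0, 230.0]]
Z = preprocess(A)
```

Because the rows are standardized last, the columns of `Z` are in general no longer exactly standardized; only the row property is guaranteed after the final step.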
We therefore applied the screening procedure in the software EMMIX-GENE of McLachlan et al. (2001). With this screening procedure, the genes are ranked in decreasing size of $-2 \log \lambda$, where $\lambda$ is essentially the likelihood ratio statistic for the test of g = 1 versus g = 2 component t distributions fitted to the 62 tissues with each gene considered individually. If the value of $-2 \log \lambda$ were greater than some threshold (here taken to be 8) but the minimum size of the implied clusters was less than some threshold (here taken to be 8 also), this value of $\lambda$ was replaced by its value for the test of g = 2 versus 3 components. This screening of the genes here resulted in 446 genes being retained.

We first clustered the n = 62 tissues on the basis of the retained set of 446 genes. We fitted mixtures of factor analyzers for various levels of the number q of factors ranging from q = 2 to 8. Using 50 random and 50 k-means-based starts, the clustering corresponding to the largest of the local maxima obtained gave the following clustering for q = 6 factors,

$$C_1 = \{1\text{-}12, 20, 25, 41\text{-}52\} \cup \{13\text{-}19, 21\text{-}24, 26\text{-}40, 53\text{-}62\}. \tag{22}$$

Getz et al. (2000) and Getz (2001) reported that there was a change in the protocol during the conduct of the microarray experiments. The 11 tumour tissue samples

(labelled 1-11 here) and 11 normal tissue samples (41-51) were taken from the first 11 patients using a poly detector, while the 29 tumour tissue samples (12-40) and normal tissue samples (52-62) were taken from the remaining 29 patients using total extraction of RNA. It can be seen from (22) that this clustering $C_1$ almost corresponds to the dichotomy between tissues obtained under the old and new protocols. A more detailed account of mixture model-based clustering of this colon data set may be found in McLachlan et al. (2001).

References

Aitkin, M., Anderson, D., Hinde, J., 1981. Statistical modelling of data on teaching styles (with discussion). J. Roy. Statist. Soc. Ser. B 144.
Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J., 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. 96.
Banfield, J.D., Raftery, A.E., 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics 49.
Bishop, C.M., 1998. Latent variable models. In: Jordan, M.I. (Ed.), Learning in Graphical Models. Kluwer, Dordrecht.
Chang, W.C., 1983. On using principal components before separating a mixture of two multivariate normal distributions. Appl. Statist. 32.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39.
Fokoue, E., Titterington, D.M., 2000. Bayesian sampling for mixtures of factor analysers. Technical Report, Department of Statistics, University of Glasgow, Glasgow.
Getz, G., 2001. Private communication.
Getz, G., Levine, E., Domany, E., 2000. Coupled two-way clustering analysis of gene microarray data. Cell Biol. 97.
Ghahramani, Z., Beal, M.J., 2000. Variational inference for Bayesian mixtures of factor analyzers. In: Solla, S.A., Leen, T.K., Müller, K.-R. (Eds.), Neural Information Processing Systems 12. MIT Press, MA.
Ghahramani, Z., Hinton, G.E., 1997. The EM algorithm for factor analyzers. Technical Report No. CRG-TR-96-1, The University of Toronto, Toronto.
Hinton, G.E., Dayan, P., Revow, M., 1997. Modeling the manifolds of images of handwritten digits. IEEE Trans. Neural Networks 8.
Lawley, D.N., Maxwell, A.E., 1971. Factor Analysis as a Statistical Method, 2nd Edition. Butterworths, London.
Li, J.Q., Barron, A.R., 2000. Mixture density estimation. Technical Report, Department of Statistics, Yale University, New Haven, Connecticut.
McLachlan, G.J., Krishnan, T., 1997. The EM Algorithm and Extensions. Wiley, New York.
McLachlan, G.J., Peel, D., 2000a. Finite Mixture Models. Wiley, New York.
McLachlan, G.J., Peel, D., 2000b. Mixtures of factor analyzers. In: Langley, P. (Ed.), Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco.
McLachlan, G.J., Bean, R.W., Peel, D., 2001. EMMIX-GENE: a mixture model-based program for the clustering of microarray expression data. Technical Report, Centre for Statistics, University of Queensland.
Meng, X.L., Rubin, D.B., 1993. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80.
Meng, X.L., van Dyk, D., 1997. The EM algorithm - an old folk song sung to a fast new tune (with discussion). J. Roy. Statist. Soc. Ser. B 59.
Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6.

Tipping, M.E., Bishop, C.M., 1997. Mixtures of probabilistic principal component analysers. Technical Report No. NCRG/97/003, Neural Computing Research Group, Aston University, Birmingham.
Tipping, M.E., Bishop, C.M., 1999. Mixtures of probabilistic principal component analysers. Neural Comput. 11.
Utsugi, A., Kumagai, T., 2001. Bayesian analysis of mixtures of factor analyzers. Neural Comput. 13.

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract The EM Algorthm for Mxtures of Factor Analyzers Zoubn Ghahraman Georey E. Hnton Department of Computer Scence Unversty oftoronto 6 Kng's College Road Toronto, Canada M5S A4 Emal: zoubn@cs.toronto.edu Techncal

More information

Mixtures of Factor Analyzers with Common Factor Loadings for the Clustering and Visualisation of High-Dimensional Data

Mixtures of Factor Analyzers with Common Factor Loadings for the Clustering and Visualisation of High-Dimensional Data Mxtures of Factor Analyzers wth Common Factor Loadngs for the Clusterng and Vsualsaton of Hgh-Dmensonal Data Jangsun Baek 1 and Geoffrey J. McLachlan 2 1 Department of Statstcs, Chonnam Natonal Unversty,

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Data Visualization by Pairwise Distortion Minimization

Data Visualization by Pairwise Distortion Minimization Communcatons n Statstcs, Theory and Methods 34 (6), 005 Data Vsualzaton by Parwse Dstorton Mnmzaton By Marc Sobel, and Longn Jan Lateck* Department of Statstcs and Department of Computer and Informaton

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

Binomial Link Functions. Lori Murray, Phil Munz

Binomial Link Functions. Lori Murray, Phil Munz Bnomal Lnk Functons Lor Murray, Phl Munz Bnomal Lnk Functons Logt Lnk functon: ( p) p ln 1 p Probt Lnk functon: ( p) 1 ( p) Complentary Log Log functon: ( p) ln( ln(1 p)) Motvatng Example A researcher

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Bayesian Cluster Ensembles

Bayesian Cluster Ensembles Bayesan Cluster Ensembles Hongjun Wang 1, Hanhua Shan 2 and Arndam Banerjee 2 1 Informaton Research Insttute, Southwest Jaotong Unversty, Chengdu, Schuan, 610031, Chna 2 Department of Computer Scence &

More information

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Journal of Al Azhar Unversty-Gaza (Natural Scences), 2011, 13 : 109-118 Estmatng the Number of Clusters n Genetcs of Acute Lymphoblastc Leukema Data Mahmoud K. Okasha, Khaled I.A. Almghar Department of

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by
