Mixtures of Factor Analyzers with Common Factor Loadings for the Clustering and Visualisation of High-Dimensional Data


Jangsun Baek¹ and Geoffrey J. McLachlan²

¹ Department of Statistics, Chonnam National University, Gwangju 500-757, South Korea
² Department of Mathematics and Institute for Molecular Bioscience, University of Queensland, Brisbane QLD 4072, Australia

March, 2008

Abstract

Mixtures of factor analyzers enable model-based density estimation and clustering to be undertaken for high-dimensional data, where the number of observations n is not very large relative to their dimension p. In practice, there is often the need to reduce further the number of parameters in the specification of the component-covariance matrices. To this end, we propose the use of common component-factor loadings, which considerably reduces the number of parameters. Moreover, it allows the data to be displayed in low-dimensional plots.

Keywords: Normal mixture models, mixtures of factor analyzers, common factor loadings, model-based clustering.

1 Introduction

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster data sets; see, for example, [1]. Let

$$Y = (Y_1, \ldots, Y_p)^T \tag{1}$$

be a p-dimensional vector of feature variables. For continuous features $Y_j$, the density of Y can be modelled by a mixture of a sufficiently large number g of multivariate normal component distributions,

$$f(y; \Psi) = \sum_{i=1}^{g} \pi_i \, \phi(y; \mu_i, \Sigma_i), \tag{2}$$

where $\phi(y; \mu, \Sigma)$ denotes the p-variate normal density function with mean $\mu$ and covariance matrix $\Sigma$. Here the vector $\Psi$ of unknown parameters consists of the mixing proportions $\pi_i$, the elements of the component means $\mu_i$, and the distinct elements of the component-covariance matrices $\Sigma_i$ (i = 1, ..., g).

The parameter vector $\Psi$ can be estimated by maximum likelihood. For an observed random sample, $y_1, \ldots, y_n$, the log likelihood function for $\Psi$ is given by

$$\log L(\Psi) = \sum_{j=1}^{n} \log f(y_j; \Psi). \tag{3}$$

The maximum likelihood estimate (MLE) of $\Psi$, $\hat{\Psi}$, is given by an appropriate root of the likelihood equation,

$$\partial \log L(\Psi) / \partial \Psi = 0. \tag{4}$$

Solutions of (4) corresponding to local maximizers of $\log L(\Psi)$ can be obtained via the expectation-maximization (EM) algorithm [2]; see also [3].

Besides providing an estimate of the density function of Y, the normal mixture model (2) provides a probabilistic clustering of the observed data $y_1, \ldots, y_n$ into g clusters in terms of their estimated posterior probabilities of component membership of the mixture. The posterior probability $\tau_i(y_j; \Psi)$ that the jth feature vector with observed value $y_j$ belongs to the ith component of the mixture can be expressed by Bayes' theorem as

$$\tau_i(y_j; \Psi) = \frac{\pi_i \, \phi(y_j; \mu_i, \Sigma_i)}{\sum_{h=1}^{g} \pi_h \, \phi(y_j; \mu_h, \Sigma_h)} \quad (i = 1, \ldots, g; \; j = 1, \ldots, n). \tag{5}$$

An outright assignment of the data is obtained by assigning each data point $y_j$ to the component to which it has the highest estimated posterior probability of belonging.
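The posterior probabilities in (5) and the outright assignment can be illustrated with a short NumPy sketch. This is only an illustration under assumed toy values: the function names, the two-component example and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def log_normal_pdf(Y, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of Y (shape n x p)."""
    p = Y.shape[1]
    diff = Y - mean
    chol = np.linalg.cholesky(cov)               # cov = chol @ chol.T
    z = np.linalg.solve(chol, diff.T)            # whitened residuals, shape p x n
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + np.sum(z * z, axis=0))

def posterior_probabilities(Y, pis, means, covs):
    """Equation (5): tau[j, i] is the posterior probability that y_j belongs to component i."""
    log_w = np.column_stack([
        np.log(pi) + log_normal_pdf(Y, mu, S)
        for pi, mu, S in zip(pis, means, covs)
    ])
    log_w -= log_w.max(axis=1, keepdims=True)    # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

# Hypothetical toy example: two spherical components in two dimensions.
Y = np.array([[0.1, -0.2], [3.9, 4.2], [2.0, 2.0]])
pis = [0.5, 0.5]
means = [np.zeros(2), np.full(2, 4.0)]
covs = [np.eye(2), np.eye(2)]

tau = posterior_probabilities(Y, pis, means, covs)
labels = tau.argmax(axis=1)   # outright assignment to the most probable component
```

Working on the log scale and subtracting the row maximum avoids underflow when p is large, which is the setting the paper is concerned with.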
The g-component normal mixture model (2) with unrestricted component-covariance matrices is a highly parameterized model with $d = \tfrac{1}{2} p(p+1)$ parameters for each component-covariance matrix $\Sigma_i$ (i = 1, ..., g). Banfield and Raftery [4] introduced a parameterization of the component-covariance matrix $\Sigma_i$ based on a variant of the standard spectral decomposition of $\Sigma_i$ (i = 1, ..., g). But if p is large relative to the sample size n, it may not be possible to use this decomposition to infer an appropriate model for the component-covariance matrices. Even if it is possible, the results may not be reliable due to potential problems with near-singular estimates of the component-covariance matrices when p is large relative to n.

In this paper, we focus on the use of mixtures of factor analyzers to reduce the number of parameters in the specification of the component-covariance matrices, as discussed in [1, 5, 6]; see also [7]. With the factor-analytic representation of the component-covariance matrices, we have that

$$\Sigma_i = B_i B_i^T + D_i \quad (i = 1, \ldots, g), \tag{6}$$

where $B_i$ is a p × q matrix and $D_i$ is a diagonal matrix. As $\tfrac{1}{2} q(q-1)$ constraints are needed for $B_i$ to be uniquely defined, the number of free parameters in (6) is

$$pq + p - \tfrac{1}{2} q(q-1). \tag{7}$$

Thus with this representation (6), the reduction in the number of parameters for $\Sigma_i$ is

$$r = \tfrac{1}{2} p(p+1) - pq - p + \tfrac{1}{2} q(q-1) = \tfrac{1}{2} \{ (p-q)^2 - (p+q) \}, \tag{8}$$

assuming that q is chosen sufficiently smaller than p so that this difference is positive. The total number of parameters is

$$d_1 = (g-1) + 2gp + g \{ pq - \tfrac{1}{2} q(q-1) \}. \tag{9}$$

We shall refer to this approach as MFA (mixtures of factor analyzers).

Even with this MFA approach, the number of parameters still might not be manageable, particularly if the number of dimensions p is large and/or the number of components (clusters) g is not small. In this paper, we therefore consider how this factor-analytic approach can be modified to provide a greater reduction in the number of parameters. We extend the model of [8, 9] to propose the normal mixture model (2) with the restrictions

$$\mu_i = A \xi_i \quad (i = 1, \ldots, g) \tag{10}$$

and

$$\Sigma_i = A \Omega_i A^T + D \quad (i = 1, \ldots, g), \tag{11}$$

where A is a p × q matrix, $\xi_i$ is a q-dimensional vector, $\Omega_i$ is a q × q positive definite symmetric matrix, and D is a diagonal p × p matrix. As will be made more precise in the next section, A is a matrix of loadings on q unobservable factors and its q columns are taken to be orthonormal; that is,

$$A^T A = I_q, \tag{12}$$

where $I_q$ is the q × q identity matrix.
With the restrictions (10) and (11) on the component mean $\mu_i$ and covariance matrix $\Sigma_i$, respectively, the total number of parameters is reduced to

$$d_2 = (g-1) + p + q(p+g) + \tfrac{1}{2} (g-1) q(q+1). \tag{13}$$

We shall refer to this approach as MCFA (mixtures of common factor analyzers) since it is formulated via the adoption of a common matrix (A) for the component-factor loadings. We shall show for this approach how the EM algorithm can be implemented to fit this normal mixture model under the constraints (10) and (11). We shall also illustrate how it can be used to provide lower-dimensional plots of the data $y_j$ (j = 1, ..., n). It provides an alternative to canonical variates, which are calculated from the clusters under the assumption of equal component-covariance matrices.

2 Mixtures of Common Factor Analyzers (MCFA)

In this section, we examine the motivation underlying the MCFA approach with its constraints (10) and (11) on the g component means and covariance matrices $\mu_i$ and $\Sigma_i$ (i = 1, ..., g). We shall show that it can be viewed as a special case of the MFA approach. To see this we first note that the MFA approach with the factor-analytic representation (6) on $\Sigma_i$ is equivalent to assuming that the distribution of the difference $Y_j - \mu_i$ can be modelled as

$$Y_j - \mu_i = B_i U_{ij} + e_{ij} \quad \text{with prob. } \pi_i \quad (i = 1, \ldots, g) \tag{14}$$

for j = 1, ..., n, where the (unobservable) factors $U_{i1}, \ldots, U_{in}$ are distributed independently $N(0, I_q)$, independently of the $e_{ij}$, which are distributed independently $N(0, D_i)$, where $D_i$ is a diagonal matrix (i = 1, ..., g).

As noted in the introductory section, this model may not lead to a sufficiently large reduction in the number of parameters, particularly if g is not small. If this is the case, we propose the MCFA approach whereby the distribution of $Y_j$ is modelled as

$$Y_j = A U_{ij} + e_{ij} \quad \text{with prob. } \pi_i \quad (i = 1, \ldots, g) \tag{15}$$

for j = 1, ..., n, where the (unobservable) factors $U_{i1}, \ldots, U_{in}$ are distributed independently $N(\xi_i, \Omega_i)$, independently of the $e_{ij}$, which are distributed independently $N(0, D)$, where D is a diagonal matrix (i = 1, ..., g). Here A is a p × q matrix of factor loadings, which we take to satisfy the relationship (12); that is, $A^T A = I_q$.
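The generative model (15) can be sketched in a few lines of NumPy. Everything numerical here is an assumption made for illustration: the sizes p, q, g, n, the factor means and covariances, and the use of a reduced QR factorisation to obtain loadings with orthonormal columns are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, g, n = 6, 2, 2, 500          # hypothetical sizes

# Common loadings with orthonormal columns, A^T A = I_q, from a reduced QR factorisation.
A, _ = np.linalg.qr(rng.standard_normal((p, q)))

pis = np.array([0.4, 0.6])                                 # mixing proportions
xis = [np.array([-2.0, 0.0]), np.array([2.0, 1.0])]        # factor means xi_i
Omegas = [np.array([[1.0, 0.3], [0.3, 0.5]]),              # factor covariances Omega_i
          np.array([[0.6, -0.2], [-0.2, 1.2]])]
D = np.diag(rng.uniform(0.1, 0.3, size=p))                 # common diagonal error covariance

z = rng.choice(g, size=n, p=pis)                           # component labels
U = np.stack([rng.multivariate_normal(xis[i], Omegas[i]) for i in z])
E = rng.multivariate_normal(np.zeros(p), D, size=n)
Y = U @ A.T + E                                            # rows are Y_j = A U_ij + e_ij

# Component means and covariances implied by restrictions (10) and (11).
implied_means = [A @ xi for xi in xis]
implied_covs = [A @ Om @ A.T + D for Om in Omegas]
```

Because the cluster structure lives in the q-dimensional factors U, plotting the rows of U (or their estimates) gives the low-dimensional display that motivates the approach.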
To see that the MCFA model as specified by (15) is a special case of the MFA approach as specified by (14), we note that we can rewrite (15) as

$$Y_j = A U_{ij} + e_{ij} = A \xi_i + A (U_{ij} - \xi_i) + e_{ij} = \mu_i + A K_i K_i^{-1} (U_{ij} - \xi_i) + e_{ij} = \mu_i + B_i U_{ij}^{*} + e_{ij}, \tag{16}$$

where

$$\mu_i = A \xi_i, \tag{17}$$

$$B_i = A K_i, \tag{18}$$

$$U_{ij}^{*} = K_i^{-1} (U_{ij} - \xi_i), \tag{19}$$

and where the $U_{ij}^{*}$ are distributed independently $N(0, I_q)$. The covariance matrix of $U_{ij}^{*}$ is equal to $I_q$, since $K_i$ can be chosen so that

$$K_i^{-1} \Omega_i K_i^{-1T} = I_q \quad (i = 1, \ldots, g). \tag{20}$$

On comparing (16) with (14), it can be seen that the MCFA model is a special case of the MFA model with the additional restrictions that

$$\mu_i = A \xi_i \quad (i = 1, \ldots, g), \tag{21}$$

$$B_i = A K_i \quad (i = 1, \ldots, g), \tag{22}$$

and

$$D_i = D \quad (i = 1, \ldots, g). \tag{23}$$

The latter restriction of equal diagonal covariance matrices for the component-specific error terms ($D_i = D$) is sometimes imposed with applications of the MFA approach to avoid potential singularities with small clusters (see [5]). Concerning the restriction (22) that the matrix of factor loadings is equal to $A K_i$ for each component, it can be viewed as adopting common factor loadings before the use of the transformation $K_i$ to transform the factors so that they have unit variances and zero covariances. This is why we call this approach mixtures of common factor analyzers. It also differs from the MFA approach in that it applies the factor-analytic representation to the observations $Y_j$ directly, rather than to the error terms $Y_j - \mu_i$.

As the MFA approach allows a more general representation of the component-covariance matrices and places no restrictions on the component means, it is in this sense preferable to the MCFA approach if its application is feasible given the values of p and g. If the dimension p and/or the number of components g is too large, then the MCFA provides a more feasible approach at the expense of more distributional restrictions on the data. In empirical results, some of which are reported in the sequel, we have found that the performance of the MCFA approach is usually at least comparable to the MFA approach for data sets to which the latter is practically feasible. The MCFA approach also has the advantage that the latent factors in its formulation are allowed to have different means and covariance matrices and are not white noise as with the formulation of the MFA approach.
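The role of the transformation $K_i$ in (20) and (22) can be checked numerically. The sketch below assumes one particular choice, the lower-triangular Cholesky factor of $\Omega_i$; the paper only requires that some $K_i$ satisfying (20) exists, and the matrices used here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 5, 2                                        # hypothetical sizes

A, _ = np.linalg.qr(rng.standard_normal((p, q)))   # common loadings, A^T A = I_q
Omega = np.array([[2.0, 0.6], [0.6, 1.0]])         # a factor covariance Omega_i

K = np.linalg.cholesky(Omega)                      # one valid choice: Omega = K K^T
K_inv = np.linalg.inv(K)

whitened = K_inv @ Omega @ K_inv.T                 # equation (20): should equal I_q
B = A @ K                                          # equation (22): component loadings

mfa_cov = B @ B.T                                  # MFA form  B_i B_i^T
mcfa_cov = A @ Omega @ A.T                         # MCFA form A Omega_i A^T
```

The two covariance forms agree, which is the sense in which MCFA is MFA with loadings constrained to $B_i = A K_i$.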
Thus the (estimated) posterior means of the factors corresponding to the observed data can be used to portray the latter in low-dimensional spaces.

3 Some Related Approaches

The MCFA approach is similar in form to the approach proposed by Yoshida et al. [8, 9], who also imposed the additional restrictions that the common diagonal covariance matrix D of the error terms is spherical,

$$D = \sigma^2 I_p, \tag{24}$$

and that the component-covariance matrices of the factors are diagonal. We shall call this approach MCUFSA (mixtures of common uncorrelated factor spherical-error analyzers). The total number of parameters with this approach is

$$d_3 = (g-1) + pq + 1 + 2gq - \tfrac{1}{2} q(q+1). \tag{25}$$

In our experience, we have found that these restrictions of sphericity of the errors and of diagonal covariance matrices in the component distributions of the factors can have an adverse effect on the clustering of high-dimensional data sets. The relaxation of these restrictions does considerably increase the complexity of the problem of fitting the model. But we shall show how it can be effected via the EM algorithm with the E- and M-steps being able to be carried out in closed form.

In Table 1, we have listed the number of parameters to be estimated for the MFA, MCFA, and MCUFSA models for two values of p, with q = 2 and g = 2, 4. For example, when we cluster p = 1000 dimensional gene expression data into g = 2 groups using q = 2 dimensional factors, the MFA model requires 7999 parameters to be estimated, while the MCUFSA needs only 2007 parameters. Moreover, as the number of clusters grows from 2 to 4, the number of parameters for the MFA model doubles, but that for MCUFSA remains almost the same.

Table 1: The number of parameters for three models (columns: p, g, q, and the number of parameters for MFA, MCFA, and MCUFSA).

Because MCUFSA uses fewer parameters to characterize the structure of the clusters, it does not always provide a good fit. It may fail to fit the data unless the directions of the cluster-error distributions are parallel to the axes of the original feature space, owing to the condition of sphericity on the cluster errors. It also assumes that the component-covariance matrices of the factors are diagonal.

Recently, Sanguinetti [10] has considered a method of dimensionality reduction in a cluster analysis context. However, its underlying model assumes sphericity in the specification of the variances/covariances of the factors in each cluster.
Our proposed method allows for oblique factors, which provides the extra flexibility needed to cluster high-dimensional data sets more effectively in practice.
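The parameter counts for the three models can be tabulated with a short sketch. The MFA and MCFA counts follow (9) and (13). The MCUFSA count is an assumption: it is obtained by counting the mixing proportions, the common loadings with orthonormal columns, one spherical error variance, and g factor means and g diagonal factor covariances, and the function names are hypothetical.

```python
def n_params_mfa(p, g, q):
    """Equation (9): mixing proportions, means, D_i, and loadings B_i for each component."""
    return (g - 1) + 2 * g * p + g * (p * q - q * (q - 1) // 2)

def n_params_mcfa(p, g, q):
    """Equation (13): common loadings A and common diagonal D, component-specific xi_i, Omega_i."""
    return (g - 1) + p + q * (p + g) + (g - 1) * q * (q + 1) // 2

def n_params_mcufsa(p, g, q):
    """Assumed count: spherical errors (one variance) and diagonal factor covariances."""
    return (g - 1) + p * q + 1 + 2 * g * q - q * (q + 1) // 2

# The worked example in the text: p = 1000, q = 2, with g = 2 and g = 4.
counts = {
    g: (n_params_mfa(1000, g, 2), n_params_mcfa(1000, g, 2), n_params_mcufsa(1000, g, 2))
    for g in (2, 4)
}
```

With these definitions the MFA count for p = 1000, g = 2, q = 2 is 7999, in agreement with the figure quoted in the text, and it roughly doubles when g goes from 2 to 4 while the other two counts barely change.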

4 Fitting of Factor-Analytic Models

The fitting of mixtures of factor analyzers as with the MFA approach has been considered in [5], using a variant of the EM algorithm known as the alternating expectation-conditional maximization algorithm (AECM). With the MCFA approach, we have to fit the same mixture model of factor analyzers but with the additional restrictions (10) and (11) on the component means $\mu_i$ and covariance matrices $\Sigma_i$. We also have to impose the restriction (23) of common diagonal covariance matrices D. The implementation of the EM algorithm for this model is described in the Appendix.

In the EM framework, the component label $z_j$ associated with the observation $y_j$ is introduced as missing data, where $z_{ij} = (z_j)_i$ is one or zero according as $y_j$ belongs or does not belong to the ith component of the mixture (i = 1, ..., g; j = 1, ..., n). The unobservable factors $u_{ij}$ are also introduced as missing data in the EM framework.

As part of the E-step, we require the conditional expectation of the component labels $z_{ij}$ (i = 1, ..., g) given the observed data point $y_j$ (j = 1, ..., n). It follows that

$$E_{\Psi} \{ Z_{ij} \mid y_j \} = \mathrm{pr}_{\Psi} \{ Z_{ij} = 1 \mid y_j \} = \tau_i(y_j; \Psi) \quad (i = 1, \ldots, g; \; j = 1, \ldots, n), \tag{26}$$

where $\tau_i(y_j; \Psi)$ is the posterior probability that $y_j$ belongs to the ith component of the mixture. From (5), it can be expressed under the MCFA model as

$$\tau_i(y_j; \Psi) = \frac{\pi_i \, \phi(y_j; A \xi_i, A \Omega_i A^T + D)}{\sum_{h=1}^{g} \pi_h \, \phi(y_j; A \xi_h, A \Omega_h A^T + D)} \tag{27}$$

for i = 1, ..., g; j = 1, ..., n.

We also require the conditional distribution of the unobservable (latent) factors $U_{ij}$ given the observed data $y_j$ (j = 1, ..., n). The conditional distribution of $U_{ij}$ given $y_j$ and its membership of the ith component of the mixture (that is, $z_{ij} = 1$) is multivariate normal,

$$U_{ij} \mid y_j, z_{ij} = 1 \sim N(\xi_{ij}, \Omega_{iy}), \tag{28}$$

where

$$\xi_{ij} = \xi_i + \gamma_i^T (y_j - A \xi_i) \tag{29}$$

and

$$\Omega_{iy} = (I_q - \gamma_i^T A) \Omega_i, \tag{30}$$

and where

$$\gamma_i = (A \Omega_i A^T + D)^{-1} A \Omega_i. \tag{31}$$

We can portray the observed data $y_j$ in q-dimensional space by plotting the corresponding values of the $\hat{u}_{ij}$, which are estimated conditional expectations of the factors $U_{ij}$, corresponding to the observed data points $y_j$.
From (28) and (29),

$$E(U_{ij} \mid y_j, z_{ij} = 1) = \xi_{ij} = \xi_i + \gamma_i^T (y_j - A \xi_i). \tag{32}$$
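The conditional moments (29) to (32) translate directly into code. The following is a minimal sketch for a single observation and a single component; the function name and the small numerical example are hypothetical and chosen so that the result can be verified by hand.

```python
import numpy as np

def posterior_factor_moments(y, A, xi, Omega, D):
    """Conditional mean (29)/(32) and covariance (30) of the factors given y and component i."""
    Sigma = A @ Omega @ A.T + D                      # component covariance, equation (11)
    gamma = np.linalg.solve(Sigma, A @ Omega)        # equation (31), shape p x q
    mean = xi + gamma.T @ (y - A @ xi)               # equations (29) and (32)
    cov = (np.eye(len(xi)) - gamma.T @ A) @ Omega    # equation (30)
    return mean, cov

# Hypothetical example with p = 3 and q = 2: A holds the first two columns of I_3.
A = np.eye(3)[:, :2]
xi = np.zeros(2)
Omega = np.eye(2)
D = np.eye(3)
y = np.array([2.0, 4.0, 6.0])

u_mean, u_cov = posterior_factor_moments(y, A, xi, Omega, D)
```

In this example the component covariance is diag(2, 2, 1), so the first two coordinates of y are shrunk by one half and the third, which carries no factor signal, is ignored. Solving the linear system rather than forming the inverse in (31) is a numerical choice made here, not something the paper prescribes.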

We let $\hat{u}_{ij}$ denote the value of the right-hand side of (32) evaluated at the maximum likelihood estimates of $\xi_i$, $\gamma_i$, and A. We can define the estimated value $\hat{u}_j$ of the jth factor corresponding to $y_j$ as

$$\hat{u}_j = \sum_{i=1}^{g} \tau_i(y_j; \hat{\Psi}) \, \hat{u}_{ij} \quad (j = 1, \ldots, n), \tag{33}$$

where, from (27), $\tau_i(y_j; \hat{\Psi})$ is the estimated posterior probability that $y_j$ belongs to the ith component. An alternative estimate of the posterior expectation of the factor corresponding to the jth observation $y_j$ is defined by replacing $\tau_i(y_j; \hat{\Psi})$ by $\hat{z}_{ij}$ in (33), where

$$\hat{z}_{ij} = \begin{cases} 1, & \text{if } i = \arg\max_h \tau_h(y_j; \hat{\Psi}), \\ 0, & \text{otherwise.} \end{cases} \tag{34}$$

5 Accuracy of Factor-Analytic Approximations

To illustrate the accuracy of the three factor-analytic approximations as defined above, we performed a small simulation experiment. We generated 100 random vectors from each of g = 2 different three-dimensional multivariate normal distributions. The first distribution had mean vector $\mu_1$ and covariance matrix $\Sigma_1$, while the second distribution had mean vector $\mu_2 = (2, 2, 6)^T$ and covariance matrix $\Sigma_2$. We applied the MFA, MCFA, and the MCUFSA approaches with q = 2 to cluster the data into two groups. We used the ArrayCluster software (~higuchi/arraycluster.htm), which was developed by Yoshida et al. [9], to implement the MCUFSA approach. There were 2 misclassifications for MFA, 4 for MCFA, and 8 for MCUFSA.

As we obtained the parameter estimates for each model, we can also predict each observation based on the estimated factor scores and the parameter estimates. In Figures 1, 2, and 3, we have plotted the predicted observations $\hat{y}_j$ along with the actual observations $y_j$ for the MFA, MCFA, and the MCUFSA approaches. For the MFA approach, the predicted observation is obtained as

$$\hat{y}_j = \sum_{i=1}^{g} \tau_i(y_j; \hat{\Psi}) (\hat{\mu}_i + \hat{B}_i \hat{u}_{ij}), \tag{35}$$

where

$$\hat{u}_{ij} = \hat{\alpha}_i^T (y_j - \hat{\mu}_i) \tag{36}$$

and

$$\hat{\alpha}_i = (\hat{B}_i \hat{B}_i^T + \hat{D}_i)^{-1} \hat{B}_i. \tag{37}$$

Figure 1: Original observations and the predicted observations by MFA (axes x1, x2, x3): Group 1; o Group 2; * predicted for Group 1; + predicted for Group 2

Figure 2: Original observations and the predicted observations by MCFA (axes x1, x2, x3): Group 1; o Group 2; * predicted for Group 1; + predicted for Group 2

For the MCFA approach, the predicted observation is

$$\hat{y}_j = \hat{A} \hat{u}_j, \tag{38}$$

where $\hat{A}$ is the estimated projection matrix A and where $\hat{u}_j$ is the estimated factor score for the jth observation, as defined by (33); similarly for the MCUFSA approach.

The figures show that the original distribution structure of the two groups is recovered by the estimated factor scores for the MFA and MCFA approaches. Their assumed models are sufficiently flexible to fit the data where the directions of the two cluster-error distributions are not parallel to the axes of the original feature space. On the other hand, the predicted observations for the MCUFSA approach are not fitted well to the actual distribution of the two groups, as shown in Figure 3. With this approach, the predicted observations tend to be higher than the actual observations from the first group and lower for those from the second group. This lack of fit is due to the strict assumption of a spherical covariance matrix for each component-error distribution.

We measured the difference between the predicted and observed observations by the mean squared error (MSE), where

$$\mathrm{MSE} = \sum_{j=1}^{200} (y_j - \hat{y}_j)^T (y_j - \hat{y}_j) / 200.$$

The value of the MSE for the simulated data is 2.3 for MFA and 3.8 for MCFA, with a larger value again for MCUFSA. As is to be expected, the MSE increases in going from MFA to MCFA and then markedly to MCUFSA.
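The MCFA prediction (38) and the MSE criterion can be written compactly when the observations and factor scores are stored as rows of arrays. The sketch below is illustrative only: the function names and the tiny arrays are hypothetical and are not the simulated data of this section.

```python
import numpy as np

def predict_mcfa(U_hat, A_hat):
    """Equation (38) applied row-wise: each y_hat_j = A_hat @ u_hat_j."""
    return np.asarray(U_hat, dtype=float) @ np.asarray(A_hat, dtype=float).T

def mean_squared_error(Y, Y_hat):
    """Sum over observations of (y_j - y_hat_j)^T (y_j - y_hat_j), divided by n."""
    diff = np.asarray(Y, dtype=float) - np.asarray(Y_hat, dtype=float)
    return float(np.sum(diff * diff) / diff.shape[0])

# Hypothetical check values (n = 2 observations, p = 3, q = 2).
A_hat = np.eye(3)[:, :2]
U_hat = np.array([[1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 2.0]])

Y_hat = predict_mcfa(U_hat, A_hat)
mse = mean_squared_error(Y, Y_hat)
```

Note that this criterion sums squared errors over the p coordinates before averaging over observations, so it is not divided by p.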

Figure 3: Original observations and the predicted observations by MCUFSA (axes x1, x2, x3): Group 1; o Group 2; * predicted for Group 1; + predicted for Group 2

6 Applications of MCFA Approach to Clustering

We implemented the MFA, MCFA, and MCUFSA approaches to cluster tissues in two different gene expression data sets and individuals in one chemical measurement data set. We compared the implied clustering obtained with each approach with the true group membership of each data set. We adopted the clustering corresponding to the local maximizer that gave the largest value of the likelihood as obtained by implementing the EM algorithm for 50 trial starts, comprising 25 k-means starts and 25 random starts. To measure the agreement between a clustering of the data and their true group membership, we used the Jaccard Index [11] and the Adjusted Rand Index (ARI) [12]. Both indices take the value 1 when there is perfect agreement between the clustering and the true grouping. The Jaccard Index takes any value between 0 and 1, but the ARI can have negative values.

6.1 Example 1: Leukaemia Gene-Expression Data

The first data set concerns the leukaemia tissue samples of Golub et al. [13], in which there are two types of acute leukaemia: Acute Lymphoblastic Leukaemia (ALL) and Acute Myeloid Leukaemia (AML). The data contain the expression levels of 7129 genes on 72 tissues comprising 47 cases of ALL and 25 cases of AML. We preprocessed the data by the following steps: (i) thresholding with a floor of 100 and a ceiling of 16000; (ii) filtering: exclusion of genes with max/min ≤ 5 and (max − min) ≤ 500, where max and min refer to the maximum and minimum expression levels, respectively, of a particular gene across a tissue sample; (iii) taking logs of the expression levels and then standardizing them across genes to have mean zero and unit standard deviation. Finally, we standardized them across samples to have mean zero and unit standard deviation. This preprocessing reduced the number of genes retained to the order of thousands.
We reduced this set further by selecting genes according to the simple two-sample t-test as used in [14].

As observed in previous studies, the data on the ALL and AML cases are well separated, confirming the biological distinction between ALL and AML subtypes. Hence all three approaches were able to cluster the tissues into two clusters that almost corresponded perfectly with the ALL and AML subtypes. There was only one misallocation with the three approaches using either q = 1, 2, 3, or 4 factors. Hence in Table 2 the values of the Jaccard Index are the same for all three approaches, as are the ARI values.

Table 2: Agreement between the clustering result and the true membership of leukaemia data (Jaccard Index and ARI for the MFA, MCFA, and MCUFSA models at each number of factors q).

We plotted the estimated factor scores $\hat{u}_j = (\hat{u}_{1j}, \hat{u}_{2j})^T$ of the tissues in two-dimensional space according to their clustered membership determined by the MCFA with q = 2 factors (Figure 4). It can be seen from Figure 4 that the two classes ALL and AML are well separated. The one AML tissue that is misallocated as ALL is circled in Figure 4.

Figure 4: Plot of the factor scores with the MCFA approach (axes u1, u2; classes ALL and AML).

6.2 Example 2: Paediatric Leukaemia Gene-Expression Data

In the second example, we considered the clustering of the Paediatric Acute Lymphoblastic Leukaemia (ALL) data of Yeoh et al. [15], who analyzed gene expressions on 7 subtypes of 327 ALL tissues, consisting of the BCR-ABL, E2A-PBX1, Hyperdiploid (>50), MLL, Novel, T-ALL, and TEL-AML1 subtypes. They performed a hierarchical cluster analysis of the 327 diagnostic cases using genes selected by several different methods including chi-squared and the t-statistic. We chose for our analysis the genes with the highest chi-squared statistics in each subtype. (This resulted in a total of 132 unique genes, as some were chosen for more than one subtype.) Given that the number of components (subtypes) here is not small with g = 7, we imposed the constraint (23) of common diagonal matrices $D_i$ in the formulation of the MFA approach. This constraint is always imposed with the MCFA approach.

Table 3: Agreement between the clustering result and the true membership of paediatric ALL data (Jaccard Index and ARI for the MFA, MCFA, and MCUFSA models at each number of factors q; NA for MCFA at q = 1).

The values of the Jaccard Index and the ARI are listed in Table 3 for the three approaches for each of five levels of the number of factors q (q = 1, 2, 3, 4, 5). In the case of a single factor (q = 1), the MCFA approach using g = 7 components clustered the 327 tissues into fewer than seven clusters, and so the indices were unable to be calculated. This is noted by NA (not available) for q = 1 in Table 3. It can be seen that the performance of the MCFA approach for q ≥ 3 factors is comparable to the MFA approach, although their best results for the Jaccard and Adjusted Rand Indices occur for different values of q. The indices for the latter are greatest for q = 3 factors, whereas they are greatest for the MCFA approach for q = 5 factors. The ArrayCluster program for the implementation of the MCUFSA approach gave results for only q = 3 and 5 factors, which are not as high as those for the MCFA approach.

6.3 Example 3: Chemical Data with Additional Noise Added

The third example considers the so-called Vietnam data, which was considered in Smyth et al. [16]. The Vietnam data set consists of the log transformed and standardized concentrations of 17 chemical elements to which four types of synthetic noise variables were added in [16] to study methods for clustering high-dimensional data. We used these data consisting of a total of 67 variables (p = 67; 17 chemical concentration variables plus 50 normal noise variables). The concentrations were measured in hair samples from six classes (g = 6) of Vietnamese, and the total number of subjects was n = 224. The values of the indices for the clustering results for this set are presented in Table 4.
It can be seen that the highest value (.81) of the ARI for the MCFA approach was obtained for q = 4 factors, compared to a highest value of .611 for this index at q = 3 factors with the MFA approach. A similar comparison can be made on the basis of the Jaccard Index. Again, it was not possible to obtain results for the MCUFSA approach for all values of q.
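Both agreement measures used in this section are pair-counting indices and can be computed from the contingency table of the two labelings. The sketch below uses the standard definitions of the Jaccard Index and of the Adjusted Rand Index of Hubert and Arabie; the function names and the four-point labelings are hypothetical.

```python
from collections import Counter
from math import comb

def pair_counts(labels_true, labels_pred):
    """Count pairs placed together in both labelings, in the truth, and in the clustering."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    together_both = sum(comb(c, 2) for c in contingency.values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return together_both, together_true, together_pred, comb(n, 2)

def jaccard_index(labels_true, labels_pred):
    """Pairs together in both labelings, divided by pairs together in at least one."""
    a, t, s, _ = pair_counts(labels_true, labels_pred)
    return a / (t + s - a)

def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance agreement; can be negative."""
    a, t, s, total = pair_counts(labels_true, labels_pred)
    expected = t * s / total
    maximum = 0.5 * (t + s)
    return (a - expected) / (maximum - expected)

truth = [0, 0, 1, 1]
same_up_to_relabelling = [1, 1, 0, 0]   # identical partition, labels swapped
crossed = [0, 1, 0, 1]                  # every true pair is split

ari_same = adjusted_rand_index(truth, same_up_to_relabelling)
ari_crossed = adjusted_rand_index(truth, crossed)
jac_same = jaccard_index(truth, same_up_to_relabelling)
jac_crossed = jaccard_index(truth, crossed)
```

The crossed labeling gives a Jaccard Index of 0 and a negative ARI, which illustrates the difference in range noted above. Both functions divide by zero for degenerate inputs (for example, when every point is its own cluster in both labelings), so a production version would need to guard those cases.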

Table 4: Agreement between the clustering result and the true membership of Vietnamese data (Jaccard Index and ARI for the MFA, MCFA, and MCUFSA models at each number of factors q).

7 Low-Dimensional Plots via MCFA Approach

To illustrate the usefulness of the MCFA approach for portraying the results of a clustering in low-dimensional space, we have plotted in Figure 5 the estimated posterior mean values of the factors $\hat{u}_j$ as defined by (33). It can be seen that the clusters are represented in this plot with very little overlap. This is not the case in Figure 6, where the first two canonical variates are plotted. They were calculated using the implied clustering labels. It can be seen from Figure 6 that one cluster is essentially on top of another. The canonical variates are calculated on the basis of the assumption of equal cluster-covariance matrices, which does not apply here. The MCFA approach is not predicated on this assumption and so has more flexibility in representing the data in reduced dimensions.

Figure 5: Plot of the (estimated) posterior mean factor scores

Figure 6: Plot of the first two canonical variates based on the implied clustering via MCFA approach

8 Discussion and Conclusions

In practice, much attention is being given to the use of normal mixture models in density estimation and clustering. However, for high-dimensional data sets, the component-covariance matrices are highly parameterized and some form of reduction in the number of parameters is needed, particularly when the number of observations n is not large relative to the number of dimensions p. One way of proceeding is to work with mixtures of factor analyzers (MFA) as studied in [1, Chapter 8]. This approach achieves a reduction in the number of parameters through its factor-analytic representation of the component-covariance matrices. But it may not provide a sufficient reduction in the number of parameters, particularly when the number g of clusters (components) to be imposed on the data is not small.

In this paper, we show how in such instances the number of parameters can be reduced appreciably by using a factor-analytic representation of the component-covariance matrices with common factor loadings. The approach is called mixtures of common factor analyzers (MCFA). This sharing of the factor loadings enables the model to be used to cluster high-dimensional data into many clusters and to provide low-dimensional plots of the clusters so obtained. The latter plots are given in terms of the (estimated) posterior means of the factors corresponding to the observed data. These projections are not useful with the MFA approach, as in its formulation the factors are taken to be white noise with no cluster-specific discriminatory features.

The MFA approach does allow a more general representation of the component variances/covariances and places no restrictions on the component means. Thus it is more flexible in its modelling of the data. But in this paper we demonstrate that MCFA provides a comparable approach that can be applied in situations where the dimension p and the number of clusters g can be quite large. We have presented analyses of both simulated and real data sets to demonstrate the usefulness of the MCFA approach.

In practice, we can use the Bayesian Information Criterion (BIC) of Schwarz [18] to provide a guide to the choice of the number of factors q and the number of components g to be used.
On the latter choice, it is well known that regularity conditions do not hold for the usual chi-squared approximation to the asymptotic null distribution of the likelihood ratio test statistic to be valid. However, they do hold for tests on the number of factors at a given level of g, and so we can also use the likelihood ratio test statistic to choose q; see [1, Chapter 8].
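Choosing q by BIC amounts to penalising the maximised log likelihood by the number of free parameters. The sketch below assumes the common convention BIC = −2 log L + d log n, to be minimised; the paper does not state which sign convention it uses. The log likelihood values are made-up placeholders for illustration, not results from any data set, and the function names are hypothetical.

```python
from math import log

def n_params_mcfa(p, g, q):
    """Number of free parameters d_2 in the MCFA model, equation (13)."""
    return (g - 1) + p + q * (p + g) + (g - 1) * q * (q + 1) // 2

def bic(log_likelihood, n_params, n_obs):
    """BIC under the assumed convention -2 log L + d log n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * log(n_obs)

# Hypothetical setting: p = 10 variables, g = 2 components, n = 200 observations.
p, g, n = 10, 2, 200
# Placeholder maximised log likelihoods for q = 1, 2, 3 (illustrative numbers only).
fitted_log_likelihood = {1: -1500.0, 2: -1400.0, 3: -1395.0}

bic_by_q = {
    q: bic(ll, n_params_mcfa(p, g, q), n)
    for q, ll in fitted_log_likelihood.items()
}
best_q = min(bic_by_q, key=bic_by_q.get)
```

In this illustration the small gain in log likelihood from q = 2 to q = 3 does not offset the extra parameters, so the criterion selects q = 2. The same loop can be extended over g, subject to the caveat on regularity conditions noted above.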

9 Appendix

The model (15) underlying the MCFA approach can be fitted via the EM algorithm to estimate the vector $\Psi$ of unknown parameters. It consists of the mixing proportions $\pi_i$, the factor component-mean vectors $\xi_i$, the distinct elements of the factor component-covariance matrices $\Omega_i$, the projection matrix A based on sharing of factor loadings, and the common diagonal matrix D of the residuals given the factor scores within a component of the mixture. In order to apply the EM algorithm to this problem, we introduce the component-indicator labels $z_{ij}$, where $z_{ij}$ is one or zero according to whether $y_j$ belongs or does not belong to the ith component of the model. We let $z_j$ be the component-label vector, $z_j = (z_{1j}, \ldots, z_{gj})^T$. The $z_j$ are treated as missing data, along with the (unobservable) latent factors $u_{ij}$ within this EM framework. The complete-data log likelihood is then given by

$$\log L_c(\Psi) = \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij} \{ \log \pi_i + \log \phi(y_j; A u_{ij}, D) + \log \phi(u_{ij}; \xi_i, \Omega_i) \}. \tag{39}$$

E-step

On the E-step, we require the conditional expectation of the complete-data log likelihood, $\log L_c(\Psi)$, given the observed data $y = (y_1^T, \ldots, y_n^T)^T$, using the current fit for $\Psi$. Let $\Psi^{(k)}$ be the value of $\Psi$ after the kth iteration of the EM algorithm. Then more specifically, on the (k+1)th iteration the E-step requires the computation of the conditional expectation of $\log L_c(\Psi)$ given y, using $\Psi^{(k)}$ for $\Psi$, which is denoted by $Q(\Psi; \Psi^{(k)})$.

We let

$$\tau_{ij}^{(k)} = \tau_i(y_j; \Psi^{(k)}), \tag{40}$$

where $\tau_i(y_j; \Psi)$ is defined by (27). Also, we let $E_{\Psi^{(k)}}$ refer to the expectation operator, using $\Psi^{(k)}$ for $\Psi$. Then the so-called Q-function, $Q(\Psi; \Psi^{(k)})$, can be written as

$$Q(\Psi; \Psi^{(k)}) = \sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)} \{ \log \pi_i + w_{1ij}^{(k)} + w_{2ij}^{(k)} \}, \tag{41}$$

where

$$w_{1ij}^{(k)} = E_{\Psi^{(k)}} \{ \log \phi(y_j; A u_{ij}, D) \mid y_j, z_{ij} = 1 \} \tag{42}$$

and

$$w_{2ij}^{(k)} = E_{\Psi^{(k)}} \{ \log \phi(u_{ij}; \xi_i, \Omega_i) \mid y_j, z_{ij} = 1 \}. \tag{43}$$

M-step

On the (k+1)th iteration of the EM algorithm, the M-step consists of calculating the updated estimates $\pi_i^{(k+1)}$, $\xi_i^{(k+1)}$, $\Omega_i^{(k+1)}$, $A^{(k+1)}$, and $D^{(k+1)}$ by solving the equation

$$\partial Q(\Psi; \Psi^{(k)}) / \partial \Psi = 0. \tag{44}$$

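The updated estimates of the mixing proportions $\pi_i$ are given, as in the case of the normal mixture model, by

$$\pi_i^{(k+1)} = \sum_{j=1}^{n} \tau_{ij}^{(k)} / n \quad (i = 1, \ldots, g). \tag{45}$$

Concerning the other parameters, it can be shown using vector and matrix differentiation that

$$\partial Q(\Psi; \Psi^{(k)}) / \partial \xi_i = \Omega_i^{-1} \sum_{j=1}^{n} \tau_{ij}^{(k)} E_{\Psi^{(k)}} \{ (u_{ij} - \xi_i) \mid y_j \}, \tag{46}$$

$$\partial Q(\Psi; \Psi^{(k)}) / \partial \Omega_i^{-1} = \tfrac{1}{2} \sum_{j=1}^{n} \tau_{ij}^{(k)} [ \Omega_i - E_{\Psi^{(k)}} \{ (u_{ij} - \xi_i)(u_{ij} - \xi_i)^T \mid y_j \} ], \tag{47}$$

$$\partial Q(\Psi; \Psi^{(k)}) / \partial D^{-1} = \tfrac{1}{2} \sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)} [ D - E_{\Psi^{(k)}} \{ (y_j - A u_{ij})(y_j - A u_{ij})^T \mid y_j \} ], \tag{48}$$

$$\partial Q(\Psi; \Psi^{(k)}) / \partial A = \sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)} [ D^{-1} \{ y_j E_{\Psi^{(k)}}(u_{ij}^T \mid y_j) - A E_{\Psi^{(k)}}(u_{ij} u_{ij}^T \mid y_j) \} ]. \tag{49}$$

On equating (46) to the zero vector, it follows that $\xi_i^{(k+1)}$ can be expressed as

$$\xi_i^{(k+1)} = \xi_i^{(k)} + \frac{\sum_{j=1}^{n} \tau_{ij}^{(k)} \gamma_i^{(k)T} (y_j - A^{(k)} \xi_i^{(k)})}{\sum_{j=1}^{n} \tau_{ij}^{(k)}}, \tag{50}$$

where

$$\gamma_i^{(k)} = (A^{(k)} \Omega_i^{(k)} A^{(k)T} + D^{(k)})^{-1} A^{(k)} \Omega_i^{(k)}. \tag{51}$$

On equating (47) to the null matrix, it follows that

$$\Omega_i^{(k+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(k)} \gamma_i^{(k)T} (y_j - A^{(k)} \xi_i^{(k)})(y_j - A^{(k)} \xi_i^{(k)})^T \gamma_i^{(k)}}{\sum_{j=1}^{n} \tau_{ij}^{(k)}} + (I_q - \gamma_i^{(k)T} A^{(k)}) \Omega_i^{(k)}. \tag{52}$$

On equating (48) to the null matrix, we obtain

$$D^{(k+1)} = \mathrm{diag}(D_1^{(k)} + D_2^{(k)}), \tag{53}$$

where

$$D_1^{(k)} = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)} D^{(k)} (I_p - \beta_i^{(k)})}{\sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)}} \tag{54}$$

and

$$D_2^{(k)} = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)} \beta_i^{(k)T} (y_j - A^{(k)} \xi_i^{(k)})(y_j - A^{(k)} \xi_i^{(k)})^T \beta_i^{(k)}}{\sum_{i=1}^{g} \sum_{j=1}^{n} \tau_{ij}^{(k)}}, \tag{55}$$

and where

$$\beta_i^{(k)} = (A^{(k)} \Omega_i^{(k)} A^{(k)T} + D^{(k)})^{-1} D^{(k)}. \tag{56}$$

On equating (49) to the null matrix, we obtain

$$A^{(k+1)} = \Big( \sum_{i=1}^{g} A_{1i}^{(k)} \Big) \Big( \sum_{i=1}^{g} A_{2i}^{(k)} \Big)^{-1}, \tag{57}$$

where

$$A_{1i}^{(k)} = \sum_{j=1}^{n} \tau_{ij}^{(k)} y_j \{ \xi_i^{(k)T} + (y_j - A^{(k)} \xi_i^{(k)})^T \gamma_i^{(k)} \}, \tag{58}$$

$$A_{2i}^{(k)} = \sum_{j=1}^{n} \tau_{ij}^{(k)} \{ (I_q - \gamma_i^{(k)T} A^{(k)}) \Omega_i^{(k)} + r_{ij}^{(k)} r_{ij}^{(k)T} \}, \tag{59}$$

and

$$r_{ij}^{(k)} = \xi_i^{(k)} + \gamma_i^{(k)T} (y_j - A^{(k)} \xi_i^{(k)}). \tag{60}$$

The factor loading matrix $A^{(k+1)}$ must have orthonormal columns; that is,

$$A^{(k+1)T} A^{(k+1)} = I_q. \tag{61}$$

We can use the Cholesky decomposition to find the upper triangular matrix $C^{(k+1)}$ of order q so that

$$A^{(k+1)T} A^{(k+1)} = C^{(k+1)T} C^{(k+1)}. \tag{62}$$

Then it follows that if we replace $A^{(k+1)}$ by

$$A^{(k+1)} C^{(k+1)-1}, \tag{63}$$

then it will satisfy the requirement (61). With the adoption of the estimate (63) for A, we need to adjust the updated estimates $\xi_i^{(k+1)}$ and $\Omega_i^{(k+1)}$ to be

$$C^{(k+1)} \xi_i^{(k+1)} \tag{64}$$

and

$$C^{(k+1)} \Omega_i^{(k+1)} C^{(k+1)T}, \tag{65}$$

where $\xi_i^{(k+1)}$ and $\Omega_i^{(k+1)}$ are given by (50) and (52), respectively.

We have to specify an initial value for the vector $\Psi$ of unknown parameters in the application of the EM algorithm. A random start is obtained by first randomly assigning the data into g groups. Let $n_i$, $\bar{y}_i$, and $S_i$ be the number of observations, the sample mean, and the sample covariance matrix, respectively, of the ith group of the data so obtained (i = 1, ..., g). We then proceed as follows: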

18 Set π () = n /n. Generate random numbers from the standard normal dstrbuton N(, 1) to obtan values for the (j, k)th element of A (j = 1,..., p; k = 1,..., q). Implement the Cholesky decomposton so that A T A = C T C, where C s the upper trangle matrx of order q, and defne A () by A C 1. On notng that the transformed data D 1/2 Y j satsfes the probablstc PCA model of Tppng and Bshop [17] wth σ 2 = 1, t follows that for a gven D () and A (), we can specfy Σ () as Ω () = A ()T D ()1/2 H (Λ σ 2 I q)h T D()1/2 A (), where σ 2 = p h=q+1 λ h/(p q). The q columns of the matrx H are the egenvectors correspondng to the egenvalues λ 1 λ 2 λ q of D () 1/2 Ω () D () 1/2, (66) where Ω () s the covarance matrx of the y j n the th group, and Λ s the dagonal matrx wth dagonal elements equal to λ 1,..., λ q. Concernng the choce of D (), we can take D () to be the dagonal matrx formed from the dagonal elements of the (pooled) wthn-cluster sample covarance matrx of the y j. The ntal value for ξ s ξ () = A ()T ȳ. Some clusterng procedure such as k-means can be used to provde non-random parttons of the data, whch can be used to obtan another set of ntal values for the parameters. In our analyses we used both ntalzaton methods. Acknowledgement The work of J. Baek was supported by the Korea Research Foundaton Grant funded by the Korean Government (MOEHRD, Basc Research Promoton Fund, KRF C48). The work of G. McLachlan was supported by the Australan Research Councl. REFERENCES [1] G.J. McLachlan and D. Peel, Fnte Mxture Models. New York: Wley, 2. [2] A.P. Dempster, N.M. Lard, and D.B. Rubn, Maxmum lkelhood from ncomplete data va the EM algorthm (wth dscusson), Journal of the Royal Statstcal Socety: Seres B, vol. 39, pp. 1 38, [3] G.J. McLachlan and T. Krshnan, The EM Algorthm and Extensons. Second Edton. New York: Wley,

[4] J.D. Banfield and A.E. Raftery, Model-based Gaussian and non-Gaussian clustering, Biometrics, vol. 49.

[5] G.J. McLachlan, D. Peel, and R.W. Bean, Modelling high-dimensional data by mixtures of factor analyzers, Computational Statistics & Data Analysis, vol. 41, pp. 379-388, 2003.

[6] G.J. McLachlan, R.W. Bean, and L. Ben-Tovim Jones, Extension of the mixture of factor analyzers model to incorporate the multivariate t distribution, Computational Statistics & Data Analysis, vol. 51, 2007.

[7] G.E. Hinton, P. Dayan, and M. Revow, Modeling the manifolds of images of handwritten digits, IEEE Transactions on Neural Networks, vol. 8, pp. 65-73.

[8] R. Yoshida, T. Higuchi, and S. Imoto, A mixed factors model for dimension reduction and extraction of a group structure in gene expression data, Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, 2004.

[9] R. Yoshida, T. Higuchi, S. Imoto, and S. Miyano, ArrayCluster: an analytic tool for clustering, data visualization and model finder on gene expression profiles, Bioinformatics, vol. 22, 2006.

[10] G. Sanguinetti, Dimensionality reduction of clustered data sets, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30.

[11] P. Jaccard, Distribution de la flore alpine dans le Bassin des Dranses et dans quelques régions voisines, Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 37, 1901.

[12] L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol. 2, 1985.

[13] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science, vol. 286, pp. 531-537.

[14] D. Nguyen and D. Rocke, Tumor classification by partial least squares using microarray gene expression data, Bioinformatics, vol. 18, pp. 39-50, 2002.

[15] E. Yeoh, M.E. Ross, et al., Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell, vol. 1, 2002.

[16] C. Smyth, D. Coomans, and Y. Everingham, Clustering noisy data in a reduced dimension space via multivariate regression trees, Pattern Recognition, vol. 39, 2006.

[17] M.E. Tipping and C.M. Bishop, Mixtures of probabilistic principal component analysers, Neural Computation, vol. 11.

[18] G. Schwarz, Estimating the dimension of a model, Annals of Statistics, vol. 6.
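
As a supplementary illustration of the column-orthonormalization step in (61)-(65), the following NumPy sketch rescales a loading matrix by its Cholesky factor and adjusts the factor means and covariances accordingly. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `orthonormalize_loadings`, the array layout (rows of `xi` are the $\xi_i$, slices of `Omega` are the $\Omega_i$), and the random test data are all hypothetical choices made for illustration.

```python
import numpy as np


def orthonormalize_loadings(A, xi, Omega):
    """Rescale A to have orthonormal columns and adjust xi_i and Omega_i.

    Hypothetical helper; the array layout is an assumption:
    A     : (p, q) loading matrix of full column rank
    xi    : (g, q) array whose i-th row is xi_i
    Omega : (g, q, q) array whose i-th slice is Omega_i
    """
    # NumPy returns the lower factor L with A^T A = L L^T,
    # so C = L^T is the upper triangular factor in A^T A = C^T C, as in (62).
    C = np.linalg.cholesky(A.T @ A).T
    # A C^{-1} as in (63), computed by solving C^T X^T = A^T
    # instead of forming the inverse of C explicitly.
    A_new = np.linalg.solve(C.T, A.T).T
    xi_new = xi @ C.T            # row i becomes (C xi_i)^T, as in (64)
    Omega_new = C @ Omega @ C.T  # slice i becomes C Omega_i C^T, as in (65)
    return A_new, xi_new, Omega_new


# Small made-up example: p = 6 variables, q = 2 factors, g = 3 components.
rng = np.random.default_rng(0)
p, q, g = 6, 2, 3
A = rng.standard_normal((p, q))
xi = rng.standard_normal((g, q))
B = rng.standard_normal((g, q, q))
Omega = B @ B.transpose(0, 2, 1) + np.eye(q)  # symmetric positive definite

A_new, xi_new, Omega_new = orthonormalize_loadings(A, xi, Omega)
# Check requirement (61): the rescaled loadings have orthonormal columns.
print(np.allclose(A_new.T @ A_new, np.eye(q)))
```

The adjustment leaves the fitted model unchanged: the products $A\xi_i$ and $A\Omega_i A^T$, and hence the component means and covariance matrices, are the same before and after the transformation. The same function could also be applied to a randomly generated $A$ to obtain a starting value $A^{(0)}$ with orthonormal columns.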

Modelling high-dimensional data by mixtures of factor analyzers

Modelling high-dimensional data by mixtures of factor analyzers Computatonal Statstcs & Data Analyss 41 (2003) 379 388 www.elsever.com/locate/csda Modellng hgh-dmensonal data by mxtures of factor analyzers G.J. McLachlan, D. Peel, R.W. Bean Department of Mathematcs,

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract The EM Algorthm for Mxtures of Factor Analyzers Zoubn Ghahraman Georey E. Hnton Department of Computer Scence Unversty oftoronto 6 Kng's College Road Toronto, Canada M5S A4 Emal: zoubn@cs.toronto.edu Techncal

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

STATISTICAL DATA ANALYSIS IN EXCEL

STATISTICAL DATA ANALYSIS IN EXCEL Mcroarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 6 Some Advanced Topcs Dr. Petr Nazarov 14-01-013 petr.nazarov@crp-sante.lu Statstcal data analyss n Ecel. 6. Some advanced topcs Correcton for

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Journal of Al Azhar Unversty-Gaza (Natural Scences), 2011, 13 : 109-118 Estmatng the Number of Clusters n Genetcs of Acute Lymphoblastc Leukema Data Mahmoud K. Okasha, Khaled I.A. Almghar Department of

More information

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM BARRIOT Jean-Perre, SARRAILH Mchel BGI/CNES 18.av.E.Beln 31401 TOULOUSE Cedex 4 (France) Emal: jean-perre.barrot@cnes.fr 1/Introducton The

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Data Visualization by Pairwise Distortion Minimization

Data Visualization by Pairwise Distortion Minimization Communcatons n Statstcs, Theory and Methods 34 (6), 005 Data Vsualzaton by Parwse Dstorton Mnmzaton By Marc Sobel, and Longn Jan Lateck* Department of Statstcs and Department of Computer and Informaton

More information

v a 1 b 1 i, a 2 b 2 i,..., a n b n i.

v a 1 b 1 i, a 2 b 2 i,..., a n b n i. SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

Interpreting Patterns and Analysis of Acute Leukemia Gene Expression Data by Multivariate Statistical Analysis

Interpreting Patterns and Analysis of Acute Leukemia Gene Expression Data by Multivariate Statistical Analysis Interpretng Patterns and Analyss of Acute Leukema Gene Expresson Data by Multvarate Statstcal Analyss ChangKyoo Yoo * and Peter A. Vanrolleghem BIOMATH, Department of Appled Mathematcs, Bometrcs and Process

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Review of Hierarchical Models for Data Clustering and Visualization

Review of Hierarchical Models for Data Clustering and Visualization Revew of Herarchcal Models for Data Clusterng and Vsualzaton Lola Vcente & Alfredo Velldo Grup de Soft Computng Seccó d Intel lgènca Artfcal Departament de Llenguatges Sstemes Informàtcs Unverstat Poltècnca

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Binomial Link Functions. Lori Murray, Phil Munz

Binomial Link Functions. Lori Murray, Phil Munz Bnomal Lnk Functons Lor Murray, Phl Munz Bnomal Lnk Functons Logt Lnk functon: ( p) p ln 1 p Probt Lnk functon: ( p) 1 ( p) Complentary Log Log functon: ( p) ln( ln(1 p)) Motvatng Example A researcher

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Imperial College London

Imperial College London F. Fang 1, C.C. Pan 1, I.M. Navon 2, M.D. Pggott 1, G.J. Gorman 1, P.A. Allson 1 and A.J.H. Goddard 1 1 Appled Modellng and Computaton Group Department of Earth Scence and Engneerng Imperal College London,

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Regression Models for a Binary Response Using EXCEL and JMP

Regression Models for a Binary Response Using EXCEL and JMP SEMATECH 997 Statstcal Methods Symposum Austn Regresson Models for a Bnary Response Usng EXCEL and JMP Davd C. Trndade, Ph.D. STAT-TECH Consultng and Tranng n Appled Statstcs San Jose, CA Topcs Practcal

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Estimation of Dispersion Parameters in GLMs with and without Random Effects

Estimation of Dispersion Parameters in GLMs with and without Random Effects Mathematcal Statstcs Stockholm Unversty Estmaton of Dsperson Parameters n GLMs wth and wthout Random Effects Meng Ruoyan Examensarbete 2004:5 Postal address: Mathematcal Statstcs Dept. of Mathematcs Stockholm

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS

MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS Tmothy J. Glbrde Assstant Professor of Marketng 315 Mendoza College of Busness Unversty of Notre Dame Notre Dame, IN 46556

More information

Chapter XX More advanced approaches to the analysis of survey data. Gad Nathan Hebrew University Jerusalem, Israel. Abstract

Chapter XX More advanced approaches to the analysis of survey data. Gad Nathan Hebrew University Jerusalem, Israel. Abstract Household Sample Surveys n Developng and Transton Countres Chapter More advanced approaches to the analyss of survey data Gad Nathan Hebrew Unversty Jerusalem, Israel Abstract In the present chapter, we

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Statistical algorithms in Review Manager 5

Statistical algorithms in Review Manager 5 Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a meta-analyss of k studes

More information

A Fast Incremental Spectral Clustering for Large Data Sets

A Fast Incremental Spectral Clustering for Large Data Sets 2011 12th Internatonal Conference on Parallel and Dstrbuted Computng, Applcatons and Technologes A Fast Incremental Spectral Clusterng for Large Data Sets Tengteng Kong 1,YeTan 1, Hong Shen 1,2 1 School

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data

Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data Computatonal Statstcs & Data Analyss 51 (26) 1643 1655 www.elsever.com/locate/csda Multclass sparse logstc regresson for classfcaton of multple cancer types usng gene expresson data Yongda Km a,, Sunghoon

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB.

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB. PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB. INDEX 1. Load data usng the Edtor wndow and m-fle 2. Learnng to save results from the Edtor wndow. 3. Computng the Sharpe Rato 4. Obtanng the Treynor Rato

More information

Bayesian Cluster Ensembles

Bayesian Cluster Ensembles Bayesan Cluster Ensembles Hongjun Wang 1, Hanhua Shan 2 and Arndam Banerjee 2 1 Informaton Research Insttute, Southwest Jaotong Unversty, Chengdu, Schuan, 610031, Chna 2 Department of Computer Scence &

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Evaluating credit risk models: A critique and a new proposal

Evaluating credit risk models: A critique and a new proposal Evaluatng credt rsk models: A crtque and a new proposal Hergen Frerchs* Gunter Löffler Unversty of Frankfurt (Man) February 14, 2001 Abstract Evaluatng the qualty of credt portfolo rsk models s an mportant

More information

The Distribution of Eigenvalues of Covariance Matrices of Residuals in Analysis of Variance

The Distribution of Eigenvalues of Covariance Matrices of Residuals in Analysis of Variance JOURNAL OF RESEARCH of the Natonal Bureau of Standards - B. Mathem atca l Scence s Vol. 74B, No.3, July-September 1970 The Dstrbuton of Egenvalues of Covarance Matrces of Resduals n Analyss of Varance

More information

where the coordinates are related to those in the old frame as follows.

where the coordinates are related to those in the old frame as follows. Chapter 2 - Cartesan Vectors and Tensors: Ther Algebra Defnton of a vector Examples of vectors Scalar multplcaton Addton of vectors coplanar vectors Unt vectors A bass of non-coplanar vectors Scalar product

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

Fast Fuzzy Clustering of Web Page Collections

Fast Fuzzy Clustering of Web Page Collections Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,

More information

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIIOUS AFFILIATION AND PARTICIPATION Danny Cohen-Zada Department of Economcs, Ben-uron Unversty, Beer-Sheva 84105, Israel Wllam Sander Department of Economcs, DePaul

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Pricing Multi-Asset Cross Currency Options

Pricing Multi-Asset Cross Currency Options CIRJE-F-844 Prcng Mult-Asset Cross Currency Optons Kenchro Shraya Graduate School of Economcs, Unversty of Tokyo Akhko Takahash Unversty of Tokyo March 212; Revsed n September, October and November 212

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information
