Clustering Ensembles: Models of Consensus and Weak Partitions*

Alexander Topchy, Anil K. Jain, and William Punch
Department of Computer Science and Engineering, Michigan State University,
East Lansing, Michigan, 48824, USA

Abstract. Clustering ensembles have emerged as a powerful method for improving both the robustness as well as the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intra-class variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world datasets.
KEYWORDS: clustering, ensembles, multiple classifier systems, consensus function, mutual information

* This research was supported by ONR grant N. Parts of this work have been presented at the IEEE International Conference on Data Mining, ICDM'03, Melbourne, Florida, November 2003, and the SIAM International Conference on Data Mining, SDM'04, Florida, April 2004.

1 Introduction

In contrast to supervised classification, clustering is inherently an ill-posed problem, whose solution violates at least one of the common assumptions about scale-invariance, richness, and cluster consistency [33]. Different clustering solutions may seem equally plausible without a priori knowledge about the underlying data distributions. Every clustering algorithm implicitly or explicitly assumes a certain data model, and it may produce erroneous or meaningless results when these assumptions are not satisfied by the sample data. Thus the availability of prior information about the data domain is crucial for successful clustering, though such information can be hard to obtain, even from experts. Identification of relevant subspaces [2] or visualization [24] may help to establish the sample data's conformity to the underlying distributions or, at least, to the proper number of clusters.

The exploratory nature of clustering tasks demands efficient methods that would benefit from combining the strengths of many individual clustering algorithms. This is the focus of research on clustering ensembles, seeking a combination of multiple partitions that provides improved overall clustering of the given data. Clustering ensembles can go beyond what is typically achieved by a single clustering algorithm in several respects:

Robustness. Better average performance across the domains and datasets.

Novelty. Finding a combined solution unattainable by any single clustering algorithm.

Stability and confidence estimation. Clustering solutions with lower sensitivity to noise, outliers or sampling variations. Clustering uncertainty can be assessed from ensemble distributions.

Parallelization and scalability. Parallel clustering of data subsets with subsequent combination of results, and the ability to integrate solutions from multiple distributed sources of data or attributes (features).

Clustering ensembles can also be used in multiobjective clustering as a compromise between individual clusterings with conflicting objective functions. Fusion of clusterings using multiple

sources of data or features becomes increasingly important in distributed data mining, e.g., see the review in [4]. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for a detailed review of the related work, including [7, 11, 16, 19, 28, 31, 35].

The problem of clustering combination can be defined generally as follows: given multiple clusterings of the data set, find a combined clustering with better quality. While the problem of clustering combination bears some traits of a classical clustering problem, it also has three major issues which are specific to combination design:

1. Consensus function: How to combine different clusterings? How to resolve the label correspondence problem? How to ensure a symmetrical and unbiased consensus with respect to all the component partitions?

2. Diversity of clustering: How to generate different partitions? What is the source of diversity in the components?

3. Strength of constituents/components: How weak could each input partition be? What is the minimal complexity of component clusterings to ensure a successful combination?

Similar questions have already been addressed in the framework of multiple classifier systems. Combining results from many supervised classifiers is an active research area (Quinlan 1996, Breiman 1998) and it provides the main motivation for clustering combination. However, it is not possible to mechanically apply the combination algorithms from the classification (supervised) domain to the clustering (unsupervised) domain. Indeed, no labeled training data is available in clustering; therefore the ground truth feedback necessary for boosting the overall accuracy cannot be used. In addition, different clusterings may produce incompatible data labelings, resulting in intractable correspondence problems, especially when the numbers of clusters are different.
Still, the supervised classifier combination demonstrates, in principle, how multiple solutions reduce the variance component of the expected error rate and increase the robustness of the solution.

From the supervised case we also learn that the proper combination of weak classifiers [32, 25, 8, 6] may achieve arbitrarily low error rates on training data, as well as reduce the predictive error. One can expect that using many simple, but computationally inexpensive, components will be preferable to combining clusterings obtained by sophisticated, but computationally involved, algorithms. This paper further advances ensemble methods in several aspects, namely the design of new effective consensus functions, the development of new partition generation mechanisms, and the study of the resulting clustering accuracy.

1.1 Our Contribution

We offer a representation of multiple clusterings as a set of new attributes characterizing the data items. Such a view directly leads to a formulation of the combination problem as a categorical clustering problem in the space of these attributes, or, in other terms, a median partition problem. The median partition can be viewed as the best summary of the given input partitions. As an optimization problem, median partition is NP-complete [3], with a continuum of heuristics for an approximate solution.

This work focuses on the primary problem of clustering ensembles, namely the consensus function, which creates the combined clustering. We show how the median partition is related to the classical intra-class variance criterion when generalized mutual information is used as the evaluation function. A consensus function based on quadratic mutual information (QMI) is proposed and reduced to k-means clustering in the space of specially transformed cluster labels.

We also propose a new fusion method for unsupervised decisions that is based on a probability model of the consensus partition in the space of contributing clusters. The consensus partition is found as a solution to the maximum likelihood problem for a given clustering ensemble. The likelihood function of an ensemble is optimized with respect to the parameters of a finite mixture distribution. Each component in this distribution corresponds to a cluster in the target consensus

partition, and is assumed to be a multivariate multinomial distribution. The maximum likelihood problem is solved using the EM algorithm [8].

There are several advantages to the QMI and EM consensus functions. These include: (i) complete avoidance of solving the label correspondence problem, (ii) low computational complexity, and (iii) the ability to handle missing data, i.e., missing cluster labels for certain patterns in the ensemble (for example, when a bootstrap method is used to generate the ensemble).

Another goal of our work is to adopt weak clustering algorithms and combine their outputs. Vaguely defined, a weak clustering algorithm produces a partition which is only slightly better than a random partition of the data. We propose two different weak clustering algorithms as the component generation mechanisms:

1. Clustering of random 1-dimensional projections of multidimensional data. This can be generalized to clustering in any random subspace of the original data space.

2. Clustering by splitting the data using a number of random hyperplanes. For example, if only one hyperplane is used then the data is split into two groups.

Finally, this paper compares the performance of different consensus functions. We have investigated the performance of a family of consensus functions based on categorical clustering, including the co-association-based hierarchical methods [15, 16, 17], hypergraph algorithms [47, 29, 30] and our new consensus functions. Combination accuracy is analyzed as a function of the number and the resolution of the clustering components. In addition, we study clustering performance when some cluster labels are missing, which is often encountered in distributed data or re-sampling scenarios.

2 Representation of Multiple Partitions

Combination of multiple partitions can be viewed as a partitioning task itself. Typically, each partition in the combination is represented as a set of labels assigned by a clustering algorithm. The combined partition is obtained as a result of yet another clustering algorithm whose inputs are the

cluster labels of the contributing partitions. We will assume that the labels are nominal values. In general, the clusterings can be soft, i.e., described by real values indicating the degree of pattern membership in each cluster of a partition. We consider only hard partitions below, noting, however, that combination of soft partitions can be solved by numerous clustering algorithms and does not appear to be more complex.

Suppose we are given a set of N data points X = {x_1, ..., x_N} and a set of H partitions Π = {π_1, ..., π_H} of the objects in X. Different partitions of X return a set of labels for each point x_i, i = 1, ..., N:

x_i → {π_1(x_i), π_2(x_i), ..., π_H(x_i)}.   (1)

Here, H different clusterings are indicated and π_j(x_i) denotes a label assigned to x_i by the j-th algorithm. No assumption is made about the correspondence between the labels produced by different clustering algorithms. Also, no assumptions are needed at the moment about the data input: it could be represented in a non-metric space or as an N × N dissimilarity matrix. For simplicity, we use the notation y_ij = π_j(x_i) or y_i = π(x_i). The problem of clustering combination is to find a new partition π_C of the data X that summarizes the information from the gathered partitions Π. Our main goal is to construct a consensus partition without the assistance of the original patterns in X, but only from their labels Y delivered by the contributing clustering algorithms. Thus, such potentially important issues as the underlying structure of both the partitions and the data are ignored for the sake of a solution to the unsupervised consensus problem.

We emphasize that a space of new features is induced by the set Π. One can view each component partition π_i as a new feature with categorical values, i.e., cluster labels. The values assumed by the i-th new feature are simply the cluster labels from partition π_i. Therefore, membership of an object x in different partitions is treated as a new feature vector y = π(x), an H-tuple. In this case, one can consider partition π_j(x) as a feature extraction function.
Combination of clusterings becomes equivalent to the problem of clustering of H-tuples if we use only the existing clusterings {π_1, ..., π_H}, without the original features of the data X.
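For illustration, the H-tuple representation of Eq. (1) can be built with a few lines of code (a minimal sketch in Python; the helper name and the toy label alphabets are ours, not from the paper):

```python
# Build the H-tuple representation y_i = (pi_1(x_i), ..., pi_H(x_i)) of Eq. (1).
# Each partition is simply a list of arbitrary, mutually incomparable labels.

def label_matrix(partitions):
    """Stack H partitions of the same N objects into N rows of H-tuples."""
    n = len(partitions[0])
    assert all(len(p) == n for p in partitions), "all partitions must label the same N objects"
    return [tuple(p[i] for p in partitions) for i in range(n)]

# Three partitions of N = 4 objects; the label alphabets need not match.
pi_1 = [1, 1, 2, 2]
pi_2 = ['A', 'A', 'B', 'B']
pi_3 = ['X', 'Y', 'Y', 'Y']

Y = label_matrix([pi_1, pi_2, pi_3])
print(Y)  # [(1, 'A', 'X'), (1, 'A', 'Y'), (2, 'B', 'Y'), (2, 'B', 'Y')]
```

Objects 3 and 4 receive identical H-tuples, so any categorical clustering of Y will keep them together regardless of the incompatible label names used by the individual partitions.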

Hence the problem of combining partitions can be transformed to a categorical clustering problem. Such a view gives insight into the properties of the expected combination, which can be inferred through various statistical and information-theoretic techniques. In particular, one can estimate the sensitivity of the combination to the correlation of components (features) as well as analyze various sample size issues. Perhaps the main advantage of this representation is that it facilitates the use of known algorithms for categorical clustering [37, 48] and allows one to design new consensus heuristics in a transparent way. The extended representation of data X can be illustrated by a table with N rows and (d + H) columns:

          x_1   ...  x_d   |  π_1       ...  π_H
  x_1  |  x_11  ...  x_1d  |  π_1(x_1)  ...  π_H(x_1)
  x_2  |  x_21  ...  x_2d  |  π_1(x_2)  ...  π_H(x_2)
  ...
  x_N  |  x_N1  ...  x_Nd  |  π_1(x_N)  ...  π_H(x_N)
          (original features)   ("new" features)

The consensus clustering is found as a partition π_C of the set of vectors Y = {y_i} that directly translates to the partition of the underlying data points {x_i}.

3 A Mixture Model of Consensus

Our approach to the consensus problem is based on a finite mixture model for the probability of the cluster labels y = π(x) of the pattern/object x. The main assumption is that the labels y_i are modeled as random variables drawn from a probability distribution described as a mixture of multivariate component densities:

P(y_i | Θ) = ∑_{m=1}^{M} α_m P_m(y_i | θ_m),   (2)

where each component is parametrized by θ_m. The M components in the mixture are identified with the clusters of the consensus partition π_C. The mixing coefficients α_m correspond to the prior probabilities of the clusters. In this model, the data points {y_i} are presumed to be generated in two

steps: first, by drawing a component according to the probability mass function α_m, and then sampling a point from the distribution P_m(y | θ_m). All the data Y = {y_i}, i = 1, ..., N, are assumed to be independent and identically distributed. This allows one to represent the log-likelihood function for the parameters Θ = {α_1, ..., α_M, θ_1, ..., θ_M} given the data set Y as:

log L(Θ | Y) = ∑_{i=1}^{N} log P(y_i | Θ) = ∑_{i=1}^{N} log ∑_{m=1}^{M} α_m P_m(y_i | θ_m).   (3)

The objective of consensus clustering is now formulated as a maximum likelihood estimation problem. To find the best fitting mixture density for the given data Y, we must maximize the likelihood function with respect to the unknown parameters Θ:

Θ* = arg max_Θ log L(Θ | Y).   (4)

The next important step is to specify the model of component-conditional densities P_m(y | θ_m). Note that the original problem of clustering in the space of data X has been transformed, with the help of multiple clustering algorithms, to a space of new multivariate features y = π(x). To make the problem more tractable, a conditional independence assumption is made for the components of the vector y_i, namely that the conditional probability of y_i can be represented as the following product:

P_m(y_i | θ_m) = ∏_{j=1}^{H} P_m^(j)(y_ij | θ_m^(j)).   (5)

To motivate this, one can note that even if the different clustering algorithms (indexed by j) are not truly independent, the approximation by the product in Eq. (5) can be justified by the excellent performance of naive Bayes classifiers in discrete domains [34]. Our ultimate goal is to make a discrete label assignment to the data in X through an indirect route of density estimation of Y. The assignments of patterns to the clusters in π_C are much less sensitive to the conditional independence approximation than the estimated values of the probabilities P(y_i | Θ), as supported by the analysis of the naive Bayes classifier in [9].
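Under the independence assumption, the log-likelihood of Eq. (3) combined with the product of Eq. (5) is straightforward to evaluate. The sketch below is ours, not the authors' code; in particular, the nested-dictionary layout of the parameters θ is an arbitrary choice for illustration:

```python
import math

# Log-likelihood of Eq. (3) with the conditional-independence product of Eq. (5).
# theta[m][j] maps the labels of partition j to their probabilities under
# mixture component m (the multinomial parameters of Eq. (6)).

def log_likelihood(Y, alphas, theta):
    total = 0.0
    for y in Y:                                # sum over data points y_i
        mix = 0.0
        for m, alpha in enumerate(alphas):     # sum over mixture components
            p = alpha
            for j, label in enumerate(y):      # product over partitions, Eq. (5)
                p *= theta[m][j][label]
            mix += p
        total += math.log(mix)
    return total

Y = [(0, 0), (0, 0), (1, 1), (1, 1)]           # H = 2 binary partitions, N = 4
alphas = [0.5, 0.5]
theta = [
    [{0: 0.9, 1: 0.1}, {0: 0.9, 1: 0.1}],     # component 1 favors label 0
    [{0: 0.1, 1: 0.9}, {0: 0.1, 1: 0.9}],     # component 2 favors label 1
]
print(round(log_likelihood(Y, alphas, theta), 4))
```

Each point contributes log(0.5 · 0.9² + 0.5 · 0.1²) = log 0.41, so the total is 4 log 0.41; this is the quantity the EM algorithm increases at every iteration.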

The last ingredient of the mixture model is the choice of a probability density P_m^(j)(y_ij | θ_m^(j)) for the components of the vectors y_i. Since the variables y_ij take on nominal values from the set of cluster labels in the partition π_j, it is natural to view them as the outcome of a multinomial trial:

P_m^(j)(y_ij | θ_m^(j)) = ∏_{k=1}^{K(j)} ϑ_jm(k)^δ(y_ij, k).   (6)

Here, without loss of generality, the labels of the clusters in π_j are chosen to be the integers in {1, ..., K(j)}. To clarify the notation, note that the probabilities of the outcomes are defined as ϑ_jm(k) and the product is over all the possible values of y_ij, the labels of the partition π_j. Also, the probabilities sum up to one:

∑_{k=1}^{K(j)} ϑ_jm(k) = 1, for all j ∈ {1, ..., H}, m ∈ {1, ..., M}.   (7)

For example, if the j-th partition has only two clusters, and the possible labels are 0 and 1, then Eq. (6) can be simplified as:

P_m^(j)(y_ij | θ_m^(j)) = ϑ_jm^{y_ij} (1 - ϑ_jm)^{1 - y_ij}.   (8)

The maximum likelihood problem in Eq. (3) generally cannot be solved in closed form when all the parameters Θ = {α_1, ..., α_M, θ_1, ..., θ_M} are unknown. However, the likelihood function in Eq. (2) can be optimized using the EM algorithm. In order to adopt the EM algorithm, we hypothesize the existence of hidden data Z and the likelihood of the complete data (Y, Z). If the value of z_i is known, then one could immediately tell which of the M mixture components was used to generate the point y_i. The detailed derivation of the EM solution to the mixture model with multivariate, multinomial components is given in the Appendix. Here we give only the equations for the E- and M-steps, which are repeated at each iteration of the algorithm:

E[z_im] = ( α_m ∏_{j=1}^{H} ∏_{k=1}^{K(j)} ϑ_jm(k)^δ(y_ij, k) ) / ( ∑_{n=1}^{M} α_n ∏_{j=1}^{H} ∏_{k=1}^{K(j)} ϑ_jn(k)^δ(y_ij, k) ).   (9)

α_m = ∑_{i=1}^{N} E[z_im] / N.   (10)

ϑ_jm(k) = ∑_{i=1}^{N} δ(y_ij, k) E[z_im] / ∑_{i=1}^{N} ∑_{k=1}^{K(j)} δ(y_ij, k) E[z_im].   (11)

The solution to the consensus clustering problem is obtained by a simple inspection of the expected values of the variables E[z_im], due to the fact that E[z_im] represents the probability that the pattern y_i was generated by the m-th mixture component. Once convergence is achieved, a pattern y_i is assigned to the component which has the largest value for the hidden label z_i.

It is instructive to consider a simple example of an ensemble. Figure 1 shows four 2-cluster partitions of 12 two-dimensional data points. The correspondence problem is emphasized by the different label systems used by the partitions. Table 1 shows the expected values of the latent variables after 6 iterations of the EM algorithm and the resulting consensus clustering. In fact, a stable combination appears as early as the third iteration, and it corresponds to the true underlying structure of the data.

Our mixture model of consensus admits generalization for clustering ensembles with incomplete partitions. Such partitions can appear as a result of clustering of subsamples or resampling of a dataset. For example, a partition of a bootstrap sample only provides labels for the selected points. Therefore, the ensemble of such partitions is represented by a set of vectors of cluster labels with potentially missing components. Moreover, different vectors of cluster labels are likely to miss different components. Incomplete information can also arise when some clustering algorithms do not assign outliers to any of the clusters. Different clusterings in a diverse ensemble can consider the same point x an outlier or not, which results in missing components in the vector y.
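The E- and M-steps of Eqs. (9)-(11) are compact enough to sketch directly. The following Python illustration is ours, not the authors' implementation; in particular, the deterministic seeding of ϑ from the first M distinct label vectors is our own choice, since the paper leaves initialization open:

```python
import numpy as np

# EM consensus for a clustering ensemble (Eqs. 9-11).
# Y: (N, H) integer label matrix, labels 0..K(j)-1 per column; M: target clusters.

def em_consensus(Y, M, n_iter=30):
    Y = np.asarray(Y)
    N, H = Y.shape
    K = [int(Y[:, j].max()) + 1 for j in range(H)]
    alphas = np.full(M, 1.0 / M)
    # Deterministic seeding (our choice): bias component m toward the labels
    # of the m-th distinct row of Y.
    seeds = []
    for row in map(tuple, Y):
        if row not in seeds:
            seeds.append(row)
        if len(seeds) == M:
            break
    theta = [[np.ones(K[j]) for j in range(H)] for _ in range(M)]
    for m, seed in enumerate(seeds):
        for j in range(H):
            theta[m][j][seed[j]] += 1.0
    for m in range(M):
        for j in range(H):
            theta[m][j] /= theta[m][j].sum()

    for _ in range(n_iter):
        # E-step (Eq. 9): responsibilities E[z_im]
        resp = np.zeros((N, M))
        for m in range(M):
            p = np.full(N, alphas[m])
            for j in range(H):
                p = p * theta[m][j][Y[:, j]]
            resp[:, m] = p
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step (Eqs. 10-11): mixing weights and multinomial parameters
        alphas = resp.mean(axis=0)
        for m in range(M):
            for j in range(H):
                for k in range(K[j]):
                    theta[m][j][k] = resp[Y[:, j] == k, m].sum()
                theta[m][j] /= resp[:, m].sum()
    return resp.argmax(axis=1)

# Four 2-cluster partitions of 8 points; the label names are permuted across
# partitions, so no label correspondence is ever established.
Y = [[0, 1, 0, 1]] * 4 + [[1, 0, 1, 0]] * 4
labels = em_consensus(Y, M=2)
print(labels.tolist())  # one consensus label for the first four points, another for the last four
```

Note that the algorithm never matches labels across partitions: the correspondence problem is bypassed entirely, exactly as claimed for the EM consensus function.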

Figure 1: Four possible partitions of 12 data points into 2 clusters. Different partitions use different sets of labels.

Table 1: Clustering ensemble and consensus solution, listing for each point y_1, ..., y_12 its labels under π_1, π_2, π_3, π_4 (with label alphabets 1/2, A/B, X/Y and α/β respectively), the expected values E[z_i1] and E[z_i2], and the consensus label.

Yet another scenario leading to missing information can occur in clustering combination of distributed data or an ensemble of clusterings of non-identical replicas of a dataset. It is possible to apply the EM algorithm in the case of missing data [20], namely missing cluster labels for some of the data points. In these situations, each vector y_i in Y can be split into observed and missing components y_i = (y_i^obs, y_i^mis). Incorporation of the missing data leads to a slight

modification of the computation of the E and M steps. First, the expected values E[z_im | y_i^obs, Θ] are now inferred from the observed components of the vector y_i, i.e., the products in Eq. (9) are taken over the known labels only, j : y_ij ∈ y_i^obs. Additionally, one must compute the expected values E[z_im y_ij^mis | y_i^obs, Θ] and substitute them, as well as E[z_im | y_i^obs, Θ], in the M-step for re-estimation of the parameters ϑ_jm(k). More details on handling missing data can be found in [20].

Though data with missing cluster labels can be obtained in different ways, we analyze only the case when the components of y are missing completely at random [46]. This means that the probability of a component being missing does not depend on other observed or unobserved variables. Note that the outcome of clustering of data subsamples (e.g., bootstrap) is different from clustering the entire data set and then deleting a random subset of labels. However, our goal is to present a consensus function for general settings. We expect that the experimental results for ensembles with missing labels are applicable, at least qualitatively, even for a combination of bootstrap clusterings.

The proposed ensemble clustering based on the mixture model consensus algorithm is summarized below. Note that any clustering algorithm can be used to generate the ensemble instead of the k-means algorithm shown in this pseudocode:

begin
  for i = 1 to H                      // H - number of clusterings
    cluster the dataset X: π_i ← k-means(X)
    add partition π_i to the ensemble Π = {Π, π_i}
  end
  initialize model parameters Θ = {α_1, ..., α_M, θ_1, ..., θ_M}
  do until convergence criterion is satisfied
    compute expected values E[z_im], i = 1..N, m = 1..M
    compute E[z_im y_ij^mis] for missing data (if any)
    re-estimate parameters ϑ_jm(k), j = 1..H, m = 1..M, all k
  end
  π_C(x_i) = index of the component of z_i with the largest expected value, i = 1..N

  return π_C                          // consensus partition
end

The value of M, the number of components in the mixture, deserves a separate discussion that is beyond the scope of this paper. Here, we assume that the target number of clusters is predetermined. It should be noted, however, that the mixture model in unsupervised classification greatly facilitates estimation of the true number of clusters [3]. The maximum likelihood formulation of the problem specifically allows us to estimate M by using additional objective functions during the inference, such as the minimum description length of the model. In addition, the proposed consensus algorithm can be viewed as a version of Latent Class Analysis (e.g., see [4]), which has rigorous statistical means for quantifying the plausibility of a candidate mixture model.

Whereas the finite mixture model may not be valid for the patterns in the original space (the initial representation), this model more naturally explains the separation of groups of patterns in the space of extracted features (the labels generated by the partitions). It is somewhat reminiscent of classification approaches based on kernel methods, which rely on linear discriminant functions in the transformed space. For example, Support Vector Clustering [5] seeks spherical clusters after a kernel transformation, which correspond to more complex cluster shapes in the original pattern space.

4 Information-Theoretic Consensus of Clusterings

Another candidate consensus function is based on the notion of median partition. A median partition σ is the best summary of the existing partitions in Π. In contrast to the co-association approach, the median partition is derived from estimates of similarities between attributes (i.e., partitions in Π), rather than from similarities between objects. A well-known example of this approach is implemented in the COBWEB algorithm in the context of conceptual clustering [48]. The COBWEB clustering criterion estimates the partition utility, which is the sum of the category utility functions introduced by Gluck and Corter [21].
(Here attributes (features) refer to the partitions of an ensemble, while the objects refer to the original data points.) In our terms, the category utility function U(π_C, π_i) evaluates the quality of a

candidate median partition π_C = {C_1, ..., C_K} against some other partition π_i = {L_1^i, ..., L_{K(i)}^i}, with labels L_j^i for the j-th cluster:

U(π_C, π_i) = ∑_{r=1}^{K} p(C_r) ∑_{j=1}^{K(i)} p(L_j^i | C_r)^2 - ∑_{j=1}^{K(i)} p(L_j^i)^2,   (12)

with the following notation: p(C_r) = |C_r| / N, p(L_j^i) = |L_j^i| / N, and p(L_j^i | C_r) = |L_j^i ∩ C_r| / |C_r|.

The function U(π_C, π_i) assesses the agreement between two partitions as the difference between the expected number of labels of partition π_i that can be correctly predicted with the knowledge of clustering π_C and without it. The category utility function can also be written as the Goodman-Kruskal index for the contingency table between the two partitions [22, 39]. The overall utility of the partition π_C with respect to all the partitions in Π can be measured as the sum of the pairwise agreements:

U(π_C, Π) = ∑_{i=1}^{H} U(π_C, π_i).   (13)

Therefore, the best median partition should maximize the value of the overall utility:

π_C^best = arg max_{π_C} U(π_C, Π).   (14)

Importantly, Mirkin [39] has proved that maximization of the partition utility in Eq. (13) is equivalent to minimization of the square-error clustering criterion if the number of clusters K in the target partition π_C is fixed. This is somewhat surprising in that the partition utility function in Eq. (14) uses only the between-attribute similarity measure of Eq. (12), while the square-error criterion makes use of distances between objects and prototypes. A simple standardization of the categorical labels in {π_1, ..., π_H} effectively transforms them to quantitative features [39]. This allows us to compute real-valued distances and cluster centers. The transformation replaces the i-th partition π_i, which assumes K(i) values, by K(i) binary features, and standardizes each binary feature to zero mean. In other words, for each object x we can compute the values of the new features ỹ_ij(x) as follows:

ỹ_ij(x) = δ(L_j^i, π_i(x)) - p(L_j^i), for j = 1, ..., K(i), i = 1, ..., H.   (15)
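The transformation of Eq. (15) is simple to implement (our sketch; the function and variable names are illustrative). Every partition becomes K(i) zero-mean binary columns, after which ordinary Euclidean distances, and hence k-means, can be applied:

```python
# Eq. (15): replace a partition by K(i) binary indicator features,
# each shifted to zero mean over the data set.

def standardize_labels(partition):
    n = len(partition)
    clusters = sorted(set(partition))
    cols = []
    for label in clusters:
        p = partition.count(label) / n                        # p(L_j)
        cols.append([(1.0 if x == label else 0.0) - p for x in partition])
    # transpose: one row of K(i) real-valued features per object
    return [list(row) for row in zip(*cols)]

pi = ['a', 'a', 'b', 'b', 'b', 'c']
features = standardize_labels(pi)
for col in zip(*features):                                    # every column has zero mean
    assert abs(sum(col)) < 1e-12
print(features[0])  # object 1: delta('a', pi(x)) - p(L_j) for clusters a, b, c
```

Concatenating these blocks over all H partitions yields the quantitative feature space in which the k-means heuristic for the median partition operates.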

Hence, the solution of the median partition problem in Eq. (14) can be approached by the k-means clustering algorithm operating in the space of the features ỹ_ij if the number of target clusters is predetermined. We use this heuristic as a part of the empirical study of consensus functions.

Let us consider the information-theoretic approach to the median partition problem. In this framework, the quality of the consensus partition π_C is determined by the amount of information I(π_C, Π) it shares with the given partitions in Π. Strehl and Ghosh [47] suggest an objective function that is based on the classical Shannon definition of mutual information:

π_C^best = arg max_{π_C} I(π_C, Π), where I(π_C, Π) = ∑_{i=1}^{H} I(π_C, π_i),   (16)

I(π_C, π_i) = ∑_{r=1}^{K} ∑_{j=1}^{K(i)} p(C_r, L_j^i) log ( p(C_r, L_j^i) / ( p(C_r) p(L_j^i) ) ).   (17)

Again, an optimal median partition can be found by solving this optimization problem. However, it is not clear how to use these equations directly in a search for consensus. We show that another information-theoretic definition of entropy will reduce the mutual information criterion to the category utility function discussed before. We proceed from the generalized entropy of degree s for a discrete probability distribution P = (p_1, ..., p_n) [23]:

H^s(P) = (2^{1-s} - 1)^{-1} ( ∑_{i=1}^{n} p_i^s - 1 ), s > 0, s ≠ 1.   (18)

Shannon's entropy is the limit form of Eq. (18):

lim_{s→1} H^s(P) = - ∑_{i=1}^{n} p_i log p_i.   (19)

Generalized mutual information between π_C and π_i can be defined as:

I^s(π_C, π_i) = H^s(π_i) - H^s(π_i | π_C).   (20)

Quadratic entropy (s = 2) is of particular interest, since it is known to be closely related to classification error when used in the probabilistic measure of inter-class distance. When s = 2, the generalized mutual information I^s(π_C, π_i) becomes:

I^2(π_C, π_i) = -2 ∑_{j=1}^{K(i)} p(L_j^i)^2 + 2 ∑_{r=1}^{K} p(C_r) ∑_{j=1}^{K(i)} p(L_j^i | C_r)^2 = 2 U(π_C, π_i).   (21)

Therefore, generalized mutual information gives the same consensus clustering criterion as the category utility function in Eq. (13). Moreover, the traditional Gini-index measure for attribute selection also follows from Eqs. (12) and (21). In light of Mirkin's result, all these criteria are equivalent to within-cluster variance minimization after the simple label transformation. Quadratic mutual information, the mixture model and other interesting consensus functions have been used in our comparative empirical study.

5 Combination of Weak Clusterings

The previous sections addressed the problem of clusterings combination, namely how to formulate the consensus function regardless of the nature of the individual partitions in the combination. We now turn to the issue of generating different clusterings for the combination. There are several principal questions. Do we use the partitions produced by the numerous clustering algorithms available in the literature? Can we relax the requirements for the clustering components? There are several existing methods to provide diverse partitions:

1. Use different clustering algorithms, e.g., k-means, mixture of Gaussians, spectral, single-link, etc. [47].

2. Exploit the built-in randomness or different parameters of some algorithms, e.g., initializations and various values of k in the k-means algorithm [35, 15, 16].

3. Use many subsamples of the data set, such as bootstrap samples [10, 38].

These methods rely on clustering algorithms which are powerful on their own, and as such are computationally involved. We argue that it is possible to generate the partitions using weak, but less expensive, clustering algorithms and still achieve comparable or better performance. Certainly, the

key motivation is that the synergy of many such components will compensate for their weaknesses. We consider two simple clustering algorithms:

1. Clustering of the data projected to a random subspace. In the simplest case, the data is projected on a 1-dimensional subspace, a random line. The k-means algorithm clusters the projected data and gives a partition for the combination.

2. Random splitting of the data by hyperplanes. For example, a single random hyperplane would create a rather trivial clustering of d-dimensional data by cutting the hypervolume into two regions.

We will show that both approaches are capable of producing high-quality consensus clusterings in conjunction with a proper consensus function.

5.1 Splitting by Random Hyperplanes

Direct clustering by use of a random hyperplane illustrates how a reliable consensus emerges from low-informative components. The random splits approach pushes the notion of weak clustering almost to an extreme. The data set is cut by random hyperplanes dissecting the original volume of the d-dimensional space containing the points. Points separated by the hyperplanes are declared to be in different clusters. Hence, the output clusters are convex. In this situation, a co-association consensus function is appropriate, since the only information needed is whether the patterns are in the same cluster or not. Thus the contribution of a hyperplane partition to the co-association value for any pair of objects can be either 0 or 1. Finer resolutions of distance are possible by counting the number of hyperplanes separating the objects, but for simplicity we do not use them here.

Consider a random line dissecting the classic 2-spiral data shown in Fig. 2(a). While any one such partition does little to reveal the true underlying clusters, analysis of the hyperplane generating mechanism shows how multiple such partitions can discover the true clusters.

Figure 2. Clustering by a random hyperplane: (a) an example of splitting the 2-spiral data set by a random line; points on the same side of the line are in the same cluster. (b) Probability of splitting two one-dimensional objects for different numbers of random thresholds (1-4 planes), as a function of the distance between the objects.

Consider first the case of one-dimensional data. Splitting of objects in 1-dimensional space is done by a random threshold in R. In general, if r thresholds are randomly selected, then (r + 1) clusters are formed. It is easy to derive that, in 1-dimensional space, the probability of separating two objects whose inter-point distance is x is exactly:

P(split) = 1 - (1 - x/L)^r,   (22)

where L is the length of the interval containing the objects, and the r threshold points are drawn at random from a uniform distribution on this interval. Fig. 2(b) illustrates the dependence for L = 1 and r = 1, 2, 3, 4. If a co-association matrix is used to combine H different partitions, then the expected value of the co-association between two objects is H(1 - P(split)), which follows from the binomial distribution of the number of splits in H attempts. Therefore, the co-association values found after combining many random split partitions are generally expected to be a non-linear and monotonic function of the respective distances.

The situation is similar for multidimensional data; however, the generation of random hyperplanes is a bit more complex. To generate a random hyperplane in d dimensions, we should first draw a random point in the multidimensional region that will serve as a point of origin. Then we randomly choose a unit normal vector u that defines the hyperplane. Two objects characterized by vectors p and q will be in the same cluster if (u·p)(u·q) > 0 and will be separated otherwise (here a·b denotes the scalar product of a and b). If r hyperplanes are generated, then the total probability that two objects remain in the same cluster is just the product of the

probabilities that each of the hyperplanes does not split the objects. Thus we can expect that the law governing the co-association values is close to what is obtained in 1-dimensional space in Eq. (22).

Let us compare the actual dependence of co-association values with the function in Eq. (22). Fig. 3 shows the results of experiments with 1000 different partitions by random splits of the Iris data set. The Iris data is 4-dimensional and contains 150 points; there are 11,175 pair-wise distances between the data items. For all possible pairs of points, each plot in Fig. 3 shows the number of times a pair was split. The observed dependence of the inter-point distances derived from the co-association values vs. the true Euclidean distance can indeed be described by the function in Eq. (22). Clearly, the inter-point distances dictate the behavior of the respective co-association values. The probability of a cut between any two given objects does not depend on the other objects in the data set. Therefore, we can conclude that any clustering algorithm that works well with the original inter-point distances is also expected to work well with co-association values obtained from a combination of multiple partitions by random splits. However, this result is mostly of theoretical value when true distances are available, since they can then be used directly instead of co-association values. It illustrates the main idea of the approach, namely that the synergy of multiple weak clusterings can be very effective. We present an empirical study of the clustering quality of this algorithm in the experimental section.

5.2 Combination of Clusterings in Random Subspaces

Random subspaces are an excellent source of clustering diversity that provides different views of the data. Projective clustering is an active topic in data mining. For example, algorithms such as CLIQUE [2] and DOC [42] can discover both useful projections as well as data clusters. Here, however, we are only concerned with the use of random projections for the purpose of clustering combination.
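Returning briefly to the 1-dimensional splitting law: Eq. (22) is easy to check numerically. A minimal Monte Carlo sketch, where the interval, object positions, and seed are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
L, r, trials = 1.0, 3, 200_000
p, q = 0.3, 0.7                  # two 1-d objects at distance x = 0.4
x = q - p
# r random thresholds per trial, uniform on [0, L]; the pair is split
# when at least one threshold falls between the two objects
t = rng.uniform(0.0, L, size=(trials, r))
split = ((t > p) & (t < q)).any(axis=1)
empirical = split.mean()
theoretical = 1.0 - (1.0 - x / L) ** r   # Eq. (22)
```

With this many trials the empirical split frequency agrees closely with 1 - (1 - x/L)^r.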

Figure 3. Dependence of distances derived from the co-association values vs. the actual Euclidean distance x for each possible pair of objects in the Iris data. Co-association matrices were computed for different numbers of hyperplanes r = 1,2,3,4.

Each random subspace can be of very low dimension and is by itself somewhat uninformative. On the other hand, clustering in 1-dimensional space is computationally cheap and can be effectively performed by the k-means algorithm. The main subroutine of the k-means algorithm, distance computation, becomes d times faster in 1-dimensional space. The cost of projection is linear with respect to the sample size and number of dimensions, O(Nd), and is less than the cost of one k-means iteration. The main idea of our approach is to generate multiple partitions by projecting the data on a random line. A fast and simple algorithm such as k-means clusters the projected data, and the resulting partition becomes a component in the combination. Afterwards, a chosen consensus function is applied to the components. We discuss and compare several consensus functions in the experimental section.

It is instructive to consider a simple 2-dimensional data set and one of its projections, as illustrated in Fig. 4(a). There are two natural clusters in the data. This data looks the same in any 1-dimensional projection, but the actual distribution of points is different in different clusters in the projected subspace.

Figure 4. Projecting data on a random line: (a) A sample data set with two identifiable natural clusters and a line randomly selected for projection. (b) Histogram of the distribution of points resulting from projection of the data onto a random line.

For example, Fig. 4(b) shows one possible histogram distribution of points in a 1-dimensional projection of this data. There are three identifiable modes, each having a clear majority of points from one of the two classes. One can expect that clustering by the k-means algorithm will reliably separate at least a portion of the points from the outer ring cluster. It is easy to imagine that projection of the data in Fig. 4(a) onto another random line would result in a different distribution of points and different label assignments, but for this particular data set it will always appear as a mixture of three bell-shaped components. Most probably, these modes will be identified as clusters by the k-means algorithm. Thus each new 1-dimensional view correctly helps to group some data points, and accumulation of multiple views eventually should result in a correct combined clustering. The major steps for combining the clusterings using random 1-d projections are described by the following procedure:

begin
  for i = 1 to H    // H is the number of clusterings in the combination
    generate a random vector u, s.t. |u| = 1
    project all data points {x_j}: {y_j} <- {u x_j}, j = 1..N
    cluster projections {y_j}: pi(i) <- k-means({y_j})
  end
  combine clusterings via a consensus function: sigma <- consensus({pi(i)}, i = 1..H)

  return sigma    // consensus partition
end

The important parameter is the number of clusters in the component partitions pi returned by the k-means algorithm at each iteration, i.e. the value of k. If the value of k is too large, then the partitions {pi} will overfit the data set, which in turn may cause unreliability of the co-association values. Too small a number of clusters in {pi} may not be enough to capture the true structure of the data set. In addition, if the number of clusterings in the combination is too small, then the effective sample size for the estimates of distances from co-association values is also insufficient, resulting in a larger variance of the estimates. That is why consensus functions based on the co-association values are more sensitive to the number of partitions in the combination (the value of H) than consensus functions based on hypergraph algorithms.

6 Empirical Study

The experiments were conducted with artificial and real-world datasets, where the true natural clusters are known, to validate both the accuracy and the robustness of consensus via the mixture model. We explored the datasets using five different consensus functions.

6.1 Datasets. Table 2 summarizes the details of the datasets. Five datasets of different nature have been used in the experiments. The Biochemical and Galaxy data sets are described in [] and [40], respectively.

Table 2: Characteristics of the datasets.
Dataset      No. of features   No. of classes   No. of points/class   Total no. of points   Av. k-means error (%)
Biochem
Galaxy
2 spirals
Half-rings
Iris
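The procedure of Section 5.2 — random 1-d projections, k-means on each projection, and an average-link consensus over the co-association matrix — can be sketched end-to-end. The toy blob data, the tiny 1-d k-means, and all names below are our own assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def kmeans_1d(y, k, rng, iters=20):
    """Minimal Lloyd's k-means for 1-d data (illustration only)."""
    centers = rng.choice(y, size=k, replace=False)
    labels = np.zeros(len(y), dtype=int)
    for _ in range(iters):
        labels = np.abs(y[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = y[labels == j].mean()
    return labels

rng = np.random.default_rng(2)
# toy data: two well-separated Gaussian blobs in 5 dimensions
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(8, 1, (50, 5))])
truth = np.repeat([0, 1], 50)

H, k = 50, 3                       # ensemble size, component resolution
co = np.zeros((len(X), len(X)))
for _ in range(H):
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)         # random projection line
    labels = kmeans_1d(X @ u, k, rng)
    co += labels[:, None] == labels[None, :]   # co-association counts
dist = 1.0 - co / H                # co-association -> distance
Z = linkage(squareform(dist), method='average')  # average-link consensus
consensus = fcluster(Z, t=2, criterion='maxclust') - 1
# agreement with the true blobs, up to a 2-cluster label permutation
agree = max((consensus == truth).mean(), (consensus != truth).mean())
```

On well-separated toy blobs the consensus typically recovers the two groups even though each 1-d component partition is weak; on real data the choice of k and H matters, as discussed above.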

We evaluated the performance of the evidence accumulation clustering algorithms by matching the detected and the known partitions of the datasets. The best possible matching of clusters provides a measure of performance expressed as the misassignment rate. To determine the clustering error, one needs to solve the correspondence problem between the labels of the known and derived clusters. The optimal correspondence can be obtained using the Hungarian method for the minimal-weight bipartite matching problem, with O(k^3) complexity for k clusters.

6.2 Selection of Parameters and Algorithms. The accuracy of the QMI and EM consensus algorithms has been compared to six other consensus functions:

1. CSPA for partitioning of hypergraphs induced from the co-association values. Its complexity is O(N^2), which leads to severe computational limitations. We did not apply this algorithm to the Galaxy [40] and Biochemical [] data. For the same reason, we did not use other co-association methods, such as single-link clustering. The performance of these methods was already analyzed in [4,5].
2. HGPA for hypergraph partitioning.
3. MCLA, which modifies HGPA via an extended set of hyperedge operations and additional heuristics.
4. Consensus functions operating on the co-association matrix, but with three different hierarchical clustering algorithms for obtaining the final partition, namely single-linkage, average-linkage, and complete-linkage.

The first three methods (CSPA, HGPA and MCLA) were introduced in [47] and their code is available online. The k-means algorithm was used as the method of generating the partitions for the combination. Diversity of the partitions is ensured by the solutions obtained after random initialization of the algorithm. The following parameters of the clustering ensemble are especially important:

1. H, the number of combined clusterings. We varied this value in the range [5..50].

2. k, the number of clusters in the component clusterings {pi_1,...,pi_H} produced by the k-means algorithm, was taken in the range [2..10].
3. r, the number of hyperplanes used for obtaining clusterings {pi_1,...,pi_H} by the random splitting algorithm.

Figure 5: The 2 spirals and Half-rings datasets are difficult for any centroid-based clustering algorithm.

Both the EM and QMI algorithms are susceptible to the presence of local minima of their objective functions. To reduce the risk of convergence to a lower quality solution, we used a simple heuristic afforded by the low computational complexities of these algorithms: the final partition was picked from the results of three runs (with random initializations) according to the value of the objective function. The highest value of the likelihood function served as the criterion for the EM algorithm, and within-cluster variance as the criterion for the QMI algorithm.

6.3 Experiments with Complete Partitions. Only the main results for each of the datasets are presented in Tables 3-7, due to space limitations. The tables report the mean error rate (%) of clustering combination from 10 independent runs for the relatively large biochemical and astronomical data sets, and from 20 runs for the other, smaller datasets. The first observation is that none of the consensus functions is the absolute winner. Good performance was achieved by different combination algorithms across the values of the parameters k and H. The EM algorithm slightly outperforms the other algorithms for ensembles of smaller size, while MCLA is superior when the number of clusterings H > 20. However, ensembles of very large size are less important in practice. All co-association methods are usually unreliable with the number of

clusterings H < 50, and this is where we position the proposed EM algorithm. Both the EM and QMI consensus functions need to estimate at least kHM parameters. Therefore, accuracy degradation will inevitably occur with an increase in the number of partitions when the sample size is fixed. However, there was no noticeable decrease in the accuracy of the EM algorithm in the current experiments. The EM algorithm also should benefit from datasets of large size, due to the improved reliability of model parameter estimation. A valuable property of the EM consensus algorithm is its fast convergence rate: mixture model parameter estimates nearly always converged in fewer than 10 iterations for all the datasets, and pattern assignments were typically settled in 4-6 iterations.

Clustering combination accuracy also depends on the number of clusters M in the ensemble partitions, or more precisely, on its ratio to the target number of clusters, i.e. k/M. For example, the EM algorithm worked best with k=3 for the Iris dataset, k=3,4 for the Galaxy dataset and k=2 for the Half-rings data. These values of k are equal to or slightly greater than the number of clusters in the combined partition. In contrast, the accuracy of MCLA slightly improves with an increase in the number of clusters in the ensemble. Figure 7 shows the error as a function of k for different consensus functions on the Galaxy data. It is also interesting to note that, as expected, the average error of consensus clustering was lower than the average error of the k-means clusterings in the ensemble (Table 2) when k is chosen to be equal to the true number of clusters. Moreover, the clustering error obtained by the EM and MCLA algorithms with k=4 for the Biochemistry data [] was the same as that found by supervised classifiers applied to this dataset [45].

6.4 Experiments with Incomplete Partitions. This set of experiments focused on the dependence of clustering accuracy on the number of patterns with missing cluster labels. As before, an ensemble of partitions was generated using the k-means algorithm.
Then, we randomly deleted cluster labels for a fixed number of patterns in each of the partitions. The EM consensus algorithm was used on

such an ensemble. The number of missing labels in each partition was varied between 10% and 50% of the total number of patterns. The main results, averaged over 10 independent runs, are reported in Table 8 for the Galaxy and Biochemistry datasets for various values of H and k. Also, a typical dependence of the error on the number of patterns with missing data is shown for the Iris data in Figure 6 (H=5, k=3). One can note that the combination accuracy decreases only insignificantly for the Biochemistry data when up to 50% of the labels are missing. This can be explained by the low inherent accuracy for this data, leaving little room for further degradation. For the Galaxy data, the accuracy drops by almost 10% when k=3,4. However, when just 10-20% of the cluster labels are missing, there is only a small change in accuracy. Also, with different values of k, we see different sensitivity of the results to the missing labels. For example, with k=2, the accuracy drops by only slightly more than 1%. Ensembles of larger size H=10 suffered less from missing data than ensembles of size H=5.

Table 3: Mean error rate (%) for the Galaxy dataset, by type of consensus function (EM, QMI, HGPA, MCLA) and values of H and k.

Figure 6: Consensus clustering error rate as a function of the number of missing labels in the ensemble for the Iris dataset, H=5, k=3.

6.5 Results of the Random Subspaces Algorithm

Let us start by demonstrating how the combination of clusterings in projected 1-dimensional subspaces outperforms the combination of clusterings in the original multidimensional space. Fig. 8(a) shows the learning dynamics for the Iris data and k=4, using the average-link consensus function based

on co-association values. Note that the number of clusters in each of the components {pi_1,...,pi_H} is set to k=4, and is different from the true number of clusters (3). Clearly, each individual clustering in the full multidimensional space is much stronger than any 1-dim partition, and therefore with only a small number of partitions (H<50) the combination of weaker partitions is not yet effective. However, for larger numbers of combined partitions (H>50), the 1-dim projections together better reveal the true structure of the data. This is quite unexpected, since the k-means algorithm with k=3 makes, on average, 9 mistakes in the original 4-dim space and 25 mistakes in a 1-dim random subspace. Moreover, clustering in the projected subspace is d times faster than in the multidimensional space, although the cost of computing a consensus partition sigma is the same in both cases.

The results regarding the impact of the value of k are reported in Fig. 8(b), which shows that there is a critical value of k for the Iris data set when the average-linkage of co-association distances is used as the consensus function: the value k=2 is not adequate to separate the true clusters. The role of the consensus function is illustrated in Fig. 9, where three consensus functions are compared on the Iris data set. They all use similarities from the co-association matrix but cluster the objects using three different criterion functions, namely single link, average link and complete link. It is clear that the combination using single-link performs significantly worse than the other two consensus functions. This is expected, since the three classes in the Iris data have hyperellipsoidal shape. More results were obtained on the half-rings and 2 spirals data sets shown in Fig. 5, which are traditionally difficult for any partitional centroid-based algorithm. Table 9 reports the error rates for the 2 spirals data using seven different consensus functions, different numbers of component partitions H = [5..500] and different numbers of clusters in each component, k = 2,4,10.
We omit similar results for the half-rings data set, obtained under the same experimental conditions and some intermediate values of k, due to space limitations. As we see, the single-link consensus function performed the best and was able to identify both the half-rings clusters as well as the spirals. In contrast to the results for the Iris data, average-link and complete-link consensus were not suitable for these data sets.
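All the error rates reported above rely on the optimal label matching described at the beginning of Section 6. A minimal sketch of that misassignment-rate computation using SciPy's Hungarian solver — the function name and toy labels are ours; the paper does not specify an implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def misassignment_rate(true_labels, found_labels):
    """Error after optimally matching found clusters to true clusters
    (Hungarian method on the negated contingency table)."""
    t, f = np.unique(true_labels), np.unique(found_labels)
    cont = np.array([[np.sum((true_labels == a) & (found_labels == b))
                      for b in f] for a in t])
    rows, cols = linear_sum_assignment(-cont)  # maximize matched mass
    return 1.0 - cont[rows, cols].sum() / len(true_labels)

truth = np.array([0, 0, 0, 1, 1, 1, 2, 2])
found = np.array([2, 2, 2, 0, 0, 1, 1, 1])
err = misassignment_rate(truth, found)   # one point of eight mismatched
```

The O(k^3) cost quoted in the text refers to this assignment step; linear_sum_assignment solves the same minimal-weight bipartite matching problem.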

Table 4: Mean error rate (%) for the Biochemistry dataset, by type of consensus function (EM, QMI, MCLA) and values of H and k.

Table 5: Mean error rate (%) for the Half-rings dataset, by type of consensus function (EM, QMI, CSPA, HGPA, MCLA) and values of H and k.

Table 6: Mean error rate (%) for the 2-spirals dataset, by type of consensus function (EM, QMI, CSPA, HGPA, MCLA) and values of H and k.

Figure 7: Consensus error as a function of the number of clusters in the contributing partitions for the Galaxy data and ensemble size H=20.

Table 7: Mean error rate (%) for the Iris dataset, by type of consensus function (EM, QMI, CSPA, HGPA, MCLA) and values of H and k.

Table 8: Clustering error rate (%) of the EM algorithm as a function of the percentage of missing labels for the large datasets ("Galaxy" and "Biochem."), for various values of H and k.


Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

Enabling P2P One-view Multi-party Video Conferencing

Enabling P2P One-view Multi-party Video Conferencing Enablng P2P One-vew Mult-party Vdeo Conferencng Yongxang Zhao, Yong Lu, Changja Chen, and JanYn Zhang Abstract Mult-Party Vdeo Conferencng (MPVC) facltates realtme group nteracton between users. Whle P2P

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures Mnmal Codng Network Wth Combnatoral Structure For Instantaneous Recovery From Edge Falures Ashly Joseph 1, Mr.M.Sadsh Sendl 2, Dr.S.Karthk 3 1 Fnal Year ME CSE Student Department of Computer Scence Engneerng

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Learning from Multiple Outlooks

Learning from Multiple Outlooks Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel maayanga@tx.technon.ac.l she@ee.technon.ac.l

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Evaluating credit risk models: A critique and a new proposal

Evaluating credit risk models: A critique and a new proposal Evaluatng credt rsk models: A crtque and a new proposal Hergen Frerchs* Gunter Löffler Unversty of Frankfurt (Man) February 14, 2001 Abstract Evaluatng the qualty of credt portfolo rsk models s an mportant

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

Solving Factored MDPs with Continuous and Discrete Variables

Solving Factored MDPs with Continuous and Discrete Variables Solvng Factored MPs wth Contnuous and screte Varables Carlos Guestrn Berkeley Research Center Intel Corporaton Mlos Hauskrecht epartment of Computer Scence Unversty of Pttsburgh Branslav Kveton Intellgent

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007.

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007. Inter-Ing 2007 INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007. UNCERTAINTY REGION SIMULATION FOR A SERIAL ROBOT STRUCTURE MARIUS SEBASTIAN

More information

Combinatorial Agency of Threshold Functions

Combinatorial Agency of Threshold Functions Combnatoral Agency of Threshold Functons Shal Jan Computer Scence Department Yale Unversty New Haven, CT 06520 shal.jan@yale.edu Davd C. Parkes School of Engneerng and Appled Scences Harvard Unversty Cambrdge,

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

HowHow to Find the Best Online Stock Broker

HowHow to Find the Best Online Stock Broker A GENERAL APPROACH FOR SECURITY MONITORING AND PREVENTIVE CONTROL OF NETWORKS WITH LARGE WIND POWER PRODUCTION Helena Vasconcelos INESC Porto hvasconcelos@nescportopt J N Fdalgo INESC Porto and FEUP jfdalgo@nescportopt

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Adaptive Fractal Image Coding in the Frequency Domain

Adaptive Fractal Image Coding in the Frequency Domain PROCEEDINGS OF INTERNATIONAL WORKSHOP ON IMAGE PROCESSING: THEORY, METHODOLOGY, SYSTEMS AND APPLICATIONS 2-22 JUNE,1994 BUDAPEST,HUNGARY Adaptve Fractal Image Codng n the Frequency Doman K AI UWE BARTHEL

More information

An MILP model for planning of batch plants operating in a campaign-mode

An MILP model for planning of batch plants operating in a campaign-mode An MILP model for plannng of batch plants operatng n a campagn-mode Yanna Fumero Insttuto de Desarrollo y Dseño CONICET UTN yfumero@santafe-concet.gov.ar Gabrela Corsano Insttuto de Desarrollo y Dseño

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Lecture 5,6 Linear Methods for Classification. Summary

Lecture 5,6 Linear Methods for Classification. Summary Lecture 5,6 Lnear Methods for Classfcaton Rce ELEC 697 Farnaz Koushanfar Fall 2006 Summary Bayes Classfers Lnear Classfers Lnear regresson of an ndcator matrx Lnear dscrmnant analyss (LDA) Logstc regresson

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and POLYSA: A Polynomal Algorthm for Non-bnary Constrant Satsfacton Problems wth and Mguel A. Saldo, Federco Barber Dpto. Sstemas Informátcos y Computacón Unversdad Poltécnca de Valenca, Camno de Vera s/n

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

Damage detection in composite laminates using coin-tap method

Damage detection in composite laminates using coin-tap method Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea yaeln@kar.re.kr 45 The con-tap test has the

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information