Fast Fuzzy Clustering of Web Page Collections

Size: px
Start display at page:

Download "Fast Fuzzy Clustering of Web Page Collections"

Transcription

1 Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg, Germany Abstract. We study an extenson of learnng vector quantzaton that draws on deas from fuzzy clusterng, enablng us to fnd fuzzy clusters of ellpsodal shape wth a compettve learnng scheme. Ths approach may be seen as a knd of onlne fuzzy clusterng, whch can have advantages w.r.t. the executon tme of the clusterng algorthm. We demonstrate the usefulness of our approach by applyng t to web page collectons, whch are, n general, dffcult to cluster due to the hgh number of dmensons and the specal dstrbuton characterstcs of the data. Introducton It s not dffcult to see that classcal c-means clusterng [6, 4] and standard learnng vector quantzaton appled to clusterng [4, 5] are very smlar: a pont that one method converges to s a stable pont of the other, n partcular, f learnng vector quantzaton s appled n batch mode. Snce classcal c-means clusterng has been generalzed to fuzzy clusterng [,, ], the dea suggests tself to transfer some deas that have been developed n fuzzy clusterng to compettve learnng, wth the am of achevng a hgher flexblty. In ths paper we consder how shape and sze parameters can be ntroduced nto a fuzzfed compettve learnng scheme, so that we arrve at compettve learnng clusterng algorthms that may be seen as onlne versons of the more sophstcated fuzzy clusterng approaches, lke the Gustafson-Kessel algorthm [] or the fuzzy maxmum lkelhood estmaton (FMLE) algorthm [8]. The basc dea of ths transfer s that the update of a reference vector n compettve learnng can be seen as an exponental decay of nformaton ganed from data ponts processed n earler steps a scheme that may just as well be appled to a covarance matrx descrbng the sze and shape of a cluster. Such onlne clusterng has at least two advantages for the applcaton doman we are concerned wth here, that s, for clusterng collectons of documents. The frst s that, due to the fact that the cluster parameters are updated more often, whle the greater part of the overhead comes from the computatons of the dstances between the data ponts and the cluster centers, t can be faster than standard fuzzy clusterng. Secondly, ths approach to clusterng makes t easer to handle documents that become avalable n a true onlne fashon, because updates need only few documents, not the whole collecton.

2 Ths paper s organzed as follows: n Secton we brefly revew some bascs of fuzzy clusterng. In Secton 3 we transfer fuzzy clusterng deas to learnng vector quantzaton and develop onlne update rules for the sze and shape parameters of a reference vector, captured n a covarance matrx. In Secton 4 we revew pre-processng methods for documents and n partcular the vector space model. In Secton 5 we present expermental results of clusterng web page collectons and fnally, n Secton 6, we draw conclusons from our dscusson. Fuzzy Clusterng Whle most classcal clusterng algorthms assgn each datum to exactly one cluster, thus formng a crsp partton of the gven data, fuzzy clusterng allows for degrees of membershp, to whch a datum belongs to dfferent clusters [,, ]. Most fuzzy clusterng algorthms are objectve functon based: they determne an optmal (fuzzy) partton of a gven data set X = {x j j =,..., n} nto c clusters by mnmzng an objectve functon subject to the constrants J(X, U, C) = c n = j= u w jd j n u j >, for all {,..., c}, and () j= c u j =, for all j {,..., n}, () = where u j [, ] s the membershp degree of datum x j to cluster and d j s the dstance between datum x j and cluster. The c n matrx U = (u j ) s called the fuzzy partton matrx and C descrbes the set of clusters by statng locaton parameters (.e. the cluster center) and maybe sze and shape parameters for each cluster. The parameter w, w >, s called the fuzzfer or weghtng exponent. It determnes the fuzzness of the classfcaton: wth hgher values for w the boundares between the clusters become softer, wth lower values they get harder. Usually w = s chosen. Hard clusterng results n the lmt for w. However, a hard assgnment may also be determned from a fuzzy result by assgnng each data pont to the cluster to whch t has the hghest degree of membershp. Constrant () guarantees that no cluster s empty and constrant () ensures that each datum has the same total nfluence by requrng that the membershp degrees of a datum must add up to. Because of the second constrant ths approach s usually called probablstc fuzzy clusterng, snce wth t the membershp degrees for a datum formally resemble the probabltes of ts beng a member of the correspondng clusters. The parttonng property of a probablstc clusterng algorthm, whch dstrbutes the weght of a datum to the dfferent clusters, s due to ths constrant.

3 Unfortunately, the objectve functon J cannot be mnmzed drectly. Therefore an teratve algorthm s used, whch alternately optmzes the membershp degrees and the cluster parameters [,, ]. That s, frst the membershp degrees are optmzed for fxed cluster parameters, then the cluster parameters are optmzed for fxed membershp degrees. The man advantage of ths scheme s that n each of the two steps the optmum can be computed drectly. By teratng the two steps the jont optmum s approached (although, of course, t cannot be guaranteed that the global optmum wll be reached the algorthm may get stuck n a local mnmum of the objectve functon J). The update formulae are derved by smply settng the dervatve of the objectve functon J w.r.t. the parameters to optmze equal to zero (necessary condton for a mnmum). Independent of the chosen dstance measure we thus obtan the followng update formula for the membershp degrees []: u j = d w j, (3) c k= d w kj that s, the membershp degrees represent the relatve nverse squared dstances of a data pont to the dfferent cluster centers, whch s a very ntutve result. The update formulae for the cluster parameters, however, depend on what parameters are used to descrbe a cluster (locaton, shape, sze) and on the chosen dstance measure. Therefore a general update formula cannot be gven. Here we brefly revew the three most common cases: The best-known fuzzy clusterng algorthm s the fuzzy c-means algorthm, whch s a straghtforward generalzaton of the classcal crsp c-means algorthm. It uses only cluster centers for the cluster prototypes and reles on the Eucldean dstance,.e., d j = d (x j, µ ) = (x j µ ) (x j µ ), where µ s the center of the -th cluster. Consequently t s restrcted to fndng sphercal clusters of equal sze. The resultng update rule s n j= µ = uw j x j, (4) n j= uw j that s, the new cluster center s the weghted mean of the data ponts assgned to t, whch s agan a very ntutve result. The Gustafson-Kessel algorthm [] uses the Mahalanobs dstance,.e., d j = d (x j, µ ) = (x j µ ) Σ (x j µ ), where µ s the cluster center and Σ s a cluster-specfc covarance matrx wth determnant that descrbes the shape of the cluster, thus allowng for ellpsodal clusters of equal sze. Ths dstance functon leads to same update rule (4) for the clusters centers. The covarance matrces are updated accordng to Σ = Σ m Σ where Σ = n j= uw j (x j µ )(x j µ ) n j= uw j (5)

4 and m s the number of dmensons of the data space. Σ s called the fuzzy covarance matrx, whch s smply normalzed to determnant to meet the abovementoned constrant. Compared to standard statstcal estmaton procedures, ths s also a very ntutve result. It should be noted that the restrcton to cluster of equal sze may be relaxed by smply allowng general covarance matrces. However, dependng on the characterstcs of the data, ths addtonal degree of freedom can deterorate the robustness of the algorthm. Fnally, the fuzzy maxmum lkelhood estmaton (FMLE) algorthm [8] s based on the assumpton that the data was sampled from a mxture of c multvarate normal dstrbutons as n the statstcal approach of mxture models [7, 3]. It uses a (squared) dstance that s nversely proportonal to the probablty that a datum was generated by the normal dstrbuton assocated wth a cluster,.e., ( ( d θ j = exp ) ) (π)m Σ (x j µ ) Σ (x j µ ), where θ s the pror probablty of the cluster, µ s the cluster center, Σ a cluster-specfc covarance matrx, whch n ths case s not requred to be normalzed to determnant, and m the number of dmensons of the data space. For the FMLE algorthm the update rules are not derved from the objectve functon due to techncal obstacles, but by comparng t to the well-known expectaton maxmzaton (EM) algorthm [5] for a mxture of normal dstrbutons [7, 3], whch, by analogy, leads to the same update rules for the cluster center and the cluster-specfc covarance matrx []. The pror probablty s, n drect analogy to statstcal estmaton, computed as θ = n n u w j. (6) Snce the hgh number of free parameters of the FMLE algorthm renders t unstable on certan data sets, t s usually recommended [] to ntalze t wth a few steps of the very robust fuzzy c-means algorthm. The same holds, though to a somewhat lesser degree, for the Gustafson-Kessel algorthm. It s worth notng that of both the Gustafson-Kessel as well as the FMLE algorthm there exst so-called axes-parallel versons, whch restrct the covarance matrces Σ to dagonal matrces and thus allow only axes-parallel ellpsods []. These varants have certan advantages w.r.t. robustness and executon tme. j= 3 Learnng Vector Quantzaton Learnng vector quantzaton [4, 5], n ts classcal form, s a compettve learnng algorthm that has been developed n the area of artfcal neural networks and that can be appled to classfed as well as unclassfed data. Here we confne ourselves to unclassfed data, where the algorthm conssts n teratvely updatng a set of c so-called reference vectors µ, =,..., c, each of whch

5 s represented by a neuron. For each data pont x j, j =,..., n, the closest reference vector (the so-called wnner neuron ) s determned and then ths reference vector (and only ths vector) s updated accordng to µ (new) = µ (old) + η ( x j µ (old) ), (7) where η s a learnng rate. Ths learnng rate usually decreases wth tme n order to avod oscllatons and to enforce the convergence of the algorthm. Membershp degrees can be ntroduced nto ths basc algorthm n two dfferent ways. In the frst place, one may employ an actvaton functon for the neurons, for whch a radal functon lke the Cauchy functon f(r) = + r or the Gaussan functon f(r) = e r may be chosen, where r s the (radal) dstance from the reference vector. In ths case all reference vectors are updated for each data pont, wth the update beng weghted wth the value of the actvaton functon. However, ths scheme, whch s closely related to possblstc fuzzy clusterng [6], usually leads to unsatsfactory results, snce there s no dependence between the clusters, so that they tend to end up at the center of gravty of all data ponts. Ths corresponds to the fact that n possblstc fuzzy clusterng the objectve functon s truly mnmzed only f all cluster centers are dentcal [3]. Useful results are obtaned only f the method gets stuck n a local mnmum, whch s an undesrable stuaton. An alternatve s to rely on a normalzaton scheme as n probablstc fuzzy clusterng, that s, to compute the weght for the update of a reference vector as the relatve nverse (squared) dstance from ths vector (cf. the computaton of the membershp degrees n fuzzy clusterng, see formula (3)), or as the relatve actvaton of a neuron. Ths s the approach we employ here, that s, we use µ (new) ( = µ (old) + η u j x j µ (old) wth u j defned as n equaton (3). Furthermore we assocate wth each neuron not only a reference vector µ, but also a covarance matrx Σ, whch descrbes the shape and (f we do not requre t to be normalzed to determnant ) the sze of the represented cluster. In order to fnd an update rule for ths covarance matrx, we observe that the above equaton (7) may also be wrtten as µ (new) = ( η ) µ (old) + η x j, whch shows that the update can be seen as an exponental decay of nformaton ganed from data ponts processed earler. Transferrng ths dea to the covarance matrces Σ and drawng on equaton (5) leads drectly to Σ (new) ) (8) = ( η ) Σ (old) + η (x j µ )(x j µ ), (9)

6 where η s a learnng rate, whch, n general, dffers from the learnng rate η for the reference vectors. In the fuzzy case ths update may be weghted, as the update of the reference vectors, by the relatve nverse (squared) dstance of the data pont from the reference vector or by the relatve neuron actvaton. It should be noted that versons of ths algorthm that requre the covarance matrx to be normalzed to determnant or restrct the covarance matrx to a dagonal matrx may be consdered, too. Such constrants can mprove the robustness or the executon tme of the algorthm. Furthermore t should be noted that the updates may be executed n batch mode, aggregatng the changes resultng from the data ponts and actually updatng the reference vectors and covarance matrces only at the end of an epoch. Fnally, t should be noted that the addtonal elements of the FMLE algorthm may also be transferred to a compettve learnng scheme, by usng the recprocal of the specal dstance functon as the actvaton functon of the neurons. The addtonal parameter θ, that s, the weght or pror probablty of a cluster, s then updated accordng to θ (new) = ( η 3 ) θ (old) + η 3 u j, where η 3 s another learnng rate (dfferent from η and η ) and u j s defned as n equaton (3). However, n the applcaton descrbed below, we confne ourselves to the smpler approach dscussed above, whch n addton to membershp degrees ntroduces only covarance matrces. 4 Clusterng Document Collectons To be able to cluster text document collectons wth the methods dscussed above, we have to map the text fles to numercal feature vectors. Therefore, we frst appled standard preprocessng methods,.e. stopword flterng and stemmng (usng the Porter Stemmer [8]), encoded each document usng the vector space model [9] and fnally selected a subset of terms as features for the clusterng process as brefly descrbed n the followng. 4. The Vector Space Model The vector space model represents text documents as vectors n an m-dmensonal space,.e., each document j s descrbed by a numercal feature vector x j = (x j,..., x jm ). Each element of the vector represents a word of the document collecton,.e., the sze of the vector s defned by the number of words of the complete document collecton. For a gven document j the so-called weght x jk defnes the mportance of the word k n ths document wth respect to the gven document collecton C. Large weghts are assgned to terms that are frequent n relevant documents but rare n the whole document collecton []. Thus a weght x jk for a term k n document j s computed as the term frequency tf jk tmes the nverse document frequency df k, whch descrbes the term specfcty wthn the document collecton.

7 In [] a weghtng scheme was proposed that has meanwhle proven ts usablty n practce. Besdes term frequency and nverse document frequency (defned as df k = log(n/n k )), a length normalzaton factor s used to ensure that all documents have equal chances of beng retreved ndependent of ther lengths: x jk = m tf jk log n n k ( ), () tfjl log n n l l= where n s the sze of the document collecton C, n k the number of documents n C that contan term k, and m the number of terms that are consdered. Based on a weghtng scheme a document j s descrbed by an m-dmensonal vector x j = (x j,..., x jm ) of term weghts and the smlarty S of two documents (or the smlarty of a document and a query vector) can be computed based on the nner product of the vectors (by whch f we assume normalzed vectors the cosne between the two document vectors s computed),.e. S(x j, x k ) = m x jl x kl. () For a more detaled dscusson of the vector space model and weghtng schemes see, for nstance, [9,, 9]. Note that for normalzed vectors the scalar product s not much dfferent n behavor from the Eucldean dstance, snce for two vectors x and y t s cos ϕ = l= xy x y = d ( x x, y y Although the scalar product s faster to compute, t enforces sphercal clusters. Therefore we rely on the Mahalanobs dstance n our approach. ). 4. Index Term Selecton To reduce the number of words n the vector descrpton we appled a smple method for keyword selecton by extractng keywords based on ther entropy. In the approach dscussed n [3], for each word k n the vocabulary the entropy as defned by [7] was computed: W k = + log n n tf jk p jk log p jk wth p jk = n l= tf, () lk j= where tf jk s the frequency of word k n document j, and n s the number of documents n the collecton. Here the entropy gves a measure how well a word s suted to separate documents by keyword search. For nstance, words that occur n many documents wll have low entropy. The entropy can be seen as a measure of the mportance of a word n the gven doman context. As ndex words a number of words that have a hgh entropy relatve to ther overall frequency

8 Dataset Character Dataset Category Assocated Theme A Commercal Banks Bankng & Fnance B Buldng Socetes Bankng & Fnance C Insurance Agences Bankng & Fnance D Java Programmng Languages E C / C++ Programmng Languages F Vsual Basc Programmng Languages G Astronomy Scence H Bology Scence I Soccer Sport J Motor Racng Sport K Sport Sport Table. Categores and Themes of the used benchmark data set. have been chosen,.e. from words occurrng equally often those wth the hgher entropy can be preferred. Emprcally ths procedure has been found to yeld a set of relevant words that are suted to serve as ndex terms [3]. In order to obtan a fxed number of ndex terms that approprately cover the documents, a greedy strategy was appled: From the frst document n the set of documents select the term wth the hghest relatve entropy as an ndex term. Then mark ths document and all other documents contanng ths term. From the frst of the remanng unmarked documents select agan the term wth the hghest relatve entropy as an ndex term. Then mark agan ths document and all other documents contanng ths term. Repeat ths process untl all documents are marked, then unmark all of them and start agan. The process can be termnated when the desred number of ndex terms have been selected. 5 Experments For our expermental studes we chose the collecton of web page documents used n []. The data set conssts of, web pages classfed nto equallyszed categores each contanng, web documents. To each category one of four dstnct themes, namely Bankng and Fnance, Programmng Languages, Scence, and Sport as shown n Table was assgned. The authors of [] reported baselne classfcaton rates usng c-means clusterng and dfferent preprocessng strateges (alternatvely stopword flterng and/or stemmng, dfferent number of ndex terms). In the followng we present results we obtaned wth our algorthm usng the preprocessng strateges descrbed above. After stemmng and stop word flterng we obtaned 63,8 words. Ths set was further reduced by removng terms that are shorter than 4 characters and that occur less then 5 or more than, / 97 tmes n the whole collecton. Thus we made sure that no Ths collecton s avalable for download from

9 words that perfectly separate one class from another are used n the descrbng vectors. From the remanng 66 words we selected 4 words by applyng the greedy ndex-term selecton approach descrbed n Secton 4.. For our clusterng experments we selected fnally subsets of the, 5,, 5,..., 35, 4 most frequent words n the subset to be clustered. Based on these words we determned vector space descrptons for each document (see Secton 4., Equaton ()) that we used n our clusterng experments. All vectors were normalzed to unt length. To assess the clusterng performance we computed the performance on the same data sets used n [],.e., we clustered the unon of the dssmlar data sets A and I, and the semantcally more smlar data sets B and C. The mean classfcaton performance reported n [] usng hard c-means are 67.5% for data sets A and I and 75.44% for data sets B and C, n both cases stemmng and stopword flterng methods were appled. In a thrd experment we used all classes and tred to fnd clusters descrbng the four man themes,.e. bankng, programmng languages, scence, and sport. For our experments we used c-means, fuzzy clusterng and learnng vector quantzaton methods wth and wthout cluster centers normalzed to unt length, wth and wthout varances (.e., sphercal clusters and axes-parallel ellpsods dagonal covarance matrces of equal sze), and wth the nverse squared dstance or the Gaussan functon for the actvaton. The learnng vector quantzaton algorthm updated the cluster parameters once for every documents. The results for some parameterzatons of the algorthms are shown n Fgures to 3. All results are the mean values of ten runs, whch dffered n the ntal cluster postons and the order n whch documents were processed. The dotted lnes show the default accuracy (obtaned f all documents are assgned to the majorty class). The damonds show the average classfcaton accuracy n percent (left axs) wth bars ndcatng the standard devaton. The grey dots and lnes show the average executon tmes n seconds (rght axs). The top row of each fgure shows the results for sphercal clusters wth cluster centers normalzed to unt length (.e., the centers resultng from the update formulae gven above were dvded by ther lengths). In ths case we used a Gaussan membershp/actvaton functon and a fxed cluster radus of 3. (Note that wth an nverse squared dstance membershp/actvaton, due to the normalzaton to sum, the radus has no effect, leadng to consderably worse results for fuzzy clusterng and learnng vector quantzaton. Therefore we omt these results here.) As can be seen, all algorthms work reasonably well, wth fuzzy clusterng and learnng vector quantzaton havng a slght edge n accuracy over hard c-means clusterng, whch s partcularly clear for the four cluster case (see Fgure 3). It may be a lttle surprsng that learnng vector quantzaton not only outperforms fuzzy clusterng w.r.t. the executon tme, but can even compete wth hard c-means clusterng n ths respect. Note that the executon tmes All experments were carred out wth a program wrtten n C and compled wth gcc on a Pentum 4C.6GHz system wth GB of man memory runnng S.u.S.E. Lnux 9.. The program and ts sources can be downloaded free of charge at

10 c-means fuzzy c-means vector quantzaton Fg.. Accuracy on commercal banks versus soccer (top row: normalzed centers, fxed unform varances; bottom row: free centers, adaptable varances). c-means fuzzy c-means vector quantzaton Fg.. Accuracy on buldng companes versus nsurance agences (top row: normalzed centers, fxed unform varances; bottom row: free centers, adaptable varances). c-means fuzzy c-means vector quantzaton Fg. 3. Accuracy on major themes (four clusters; top row: normalzed centers, fxed unform varances; bottom row: free centers, adaptable varances).

11 level off for 35 and 4 words n Fgures and, because only 36 and 3 words appear n these document subsets, respectvely. The bottom row of each fgure shows the results for free cluster centers (.e., they are not normalzed to unt length), axs-parallel ellpsods (dagonal covarance matrces), and wth the nverse squared dstance for the membershp/actvaton. Obvously, the performance of hard c-means s slghtly worse n ths case, whle fuzzy c-means and learnng vector quantzaton yeld good result. In partcular, the result acheved wth learnng vector quantzaton n the four class case (see Fgure 3), whch s very good and hghly stable, s remarkable. Ths, however, comes at some cost w.r.t. the executon tme of the algorthm. It s nterestng to note that n other experments (whch we cannot present here due to reasons of space) t turned out that usng ellpsodal clusters usually slghtly deterorates performance f the cluster centers are normalzed, but mproves performance for free cluster centers (.e. no normalzaton to unt length). 6 Conclusons In ths paper we transferred some deas from fuzzy clusterng, n partcular the use of a covarance matrx to descrbe the shape and the sze of a cluster, to learnng vector quantzaton. We developed a fuzzy compettve learnng scheme for these new reference vector parameters and appled our algorthm to the dffcult task of clusterng document collectons. Our experments showed that ths approach can be used successfully for clusterng collectons of documents, wth learnng vector quantzaton leadng to shorter executon tmes. In addton, learnng vector quantzaton appears to be slghtly more robust than fuzzy clusterng, whch s known to be consderably more robust than crsp clusterng. Fnally, t enables a truly onlne clusterng, snce only a fracton of the documents s needed for each update. References. J.C. Bezdek. Pattern Recognton wth Fuzzy Objectve Functon Algorthms. Plenum Press, New York, NY, USA 98. J.C. Bezdek, J. Keller, R. Krshnapuram, and N. Pal. Fuzzy Models and Algorthms for Pattern Recognton and Image Processng. Kluwer, Dordrecht, Netherlands J. Blmes. A Gentle Tutoral on the EM Algorthm and Its Applcaton to Parameter Estmaton for Gaussan Mxture and Hdden Markov Models. Unversty of Berkeley, Tech. Rep. ICSI-TR-97-, H.H. Bock. Automatsche Klassfkaton. Vandenhoeck & Ruprecht, Göttngen, Germany A.P. Dempster, N. Lard, and D. Rubn. Maxmum Lkelhood from Incomplete Data va the EM Algorthm. Journal of the Royal Statstcal Socety (Seres B) 39: 38. Blackwell, Oxford, Unted Kngdom R.O. Duda and P.E. Hart. Pattern Classfcaton and Scene Analyss. J. Wley & Sons, New York, NY, USA 973

12 7. B.S. Evertt and D.J. Hand. Fnte Mxture Dstrbutons. Chapman & Hall, London, UK I. Gath and A.B. Geva. Unsupervsed Optmal Fuzzy Clusterng. IEEE Trans. Pattern Analyss & Machne Intellgence : IEEE Press, Pscataway, NJ, USA, W.R. Greff. A Theory of Term Weghtng Based on Exploratory Data Analyss. Proc. st Ann. Int. ACM SIGIR Conf. on Research and Development n Informaton Retreval (Sydney, Australa), 7 9. ACM Press, New York, NY, USA 998. E.E. Gustafson and W.C. Kessel. Fuzzy Clusterng wth a Fuzzy Covarance Matrx. Proc. 8th IEEE Conference on Decson and Control (IEEE CDC, San Dego, CA), , IEEE Press, Pscataway, NJ, USA 979. F. Höppner, F. Klawonn, R. Kruse, and T. Runkler. Fuzzy Cluster Analyss. J. Wley & Sons, Chchester, England 999. F. Klawonn and R. Kruse. Constructng a Fuzzy Controller from Data. Fuzzy Sets and Systems 85: North-Holland, Amsterdam, Netherlands A. Klose, A. Nürnberger, R. Kruse, G.K. Hartmann, and M. Rchards. Interactve Text Retreval Based on Document Smlartes. Physcs and Chemstry of the Earth, Part A: Sold Earth and Geodesy 5: Elsever, Amsterdam, Netherlands 4. T. Kohonen. Learnng Vector Quantzaton for Pattern Recognton. Techncal Report TKK-F-A. Helsnk Unversty of Technology, Fnland T. Kohonen. Self-Organzng Maps. Sprnger-Verlag, Hedelberg, Germany 995 (3rd ext. edton ) 6. R. Krshnapuram and J. Keller. A Possblstc Approach to Clusterng, IEEE Transactons on Fuzzy Systems, :98-. IEEE Press, Pscataway, NJ, USA K.E. Lochbaum and L.A. Streeter. Combnng and Comparng the Effectveness of Latent Semantc Indexng and the Ordnary Vector Space Model for Informaton Retreval. Informaton Processng and Management 5: Elsever, Amsterdam, Netherlands M. Porter. An Algorthm for Suffx Strppng. Program: Electronc Lbrary & Informaton Systems 4(3):3 37. Emerald, Bradford, Unted Kngdom G. Salton, A. Wong, and C.S. Yang. A Vector Space Model for Automatc Indexng. Communcatons of the ACM 8:63 6 ACM Press, New York, NY, USA 975. G. Salton and C. Buckley. Term Weghtng Approaches n Automatc Text Retreval. Informaton Processng & Management 4: Elsever, Amsterdam, Netherlands 988. G. Salton, J. Allan, and C. Buckley. Automatc Structurng and Retreval of Large Text Fles. Communcatons of the ACM 37:97 8. ACM Press, New York, NY, USA 994. M.P. Snka, and D.W. Corne. A large benchmark dataset for web document clusterng. A. Abraham, J. Ruz-del-Solar, and M. Köppen (eds.), Soft Computng Systems: Desgn, Management and Applcatons, IOS Press, Amsterdam, The Netherlands 3. H. Tmm, C. Borgelt, and R. Kruse. A Modfcaton to Improve Possblstc Cluster Analyss. Proc. IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE, Honolulu, Hawa). IEEE Press, Pscataway, NJ, USA

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

v a 1 b 1 i, a 2 b 2 i,..., a n b n i.

v a 1 b 1 i, a 2 b 2 i,..., a n b n i. SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract The EM Algorthm for Mxtures of Factor Analyzers Zoubn Ghahraman Georey E. Hnton Department of Computer Scence Unversty oftoronto 6 Kng's College Road Toronto, Canada M5S A4 Emal: zoubn@cs.toronto.edu Techncal

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

Support vector domain description

Support vector domain description Pattern Recognton Letters 20 (1999) 1191±1199 www.elsever.nl/locate/patrec Support vector doman descrpton Davd M.J. Tax *,1, Robert P.W. Dun Pattern Recognton Group, Faculty of Appled Scence, Delft Unversty

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

A Fast Incremental Spectral Clustering for Large Data Sets

A Fast Incremental Spectral Clustering for Large Data Sets 2011 12th Internatonal Conference on Parallel and Dstrbuted Computng, Applcatons and Technologes A Fast Incremental Spectral Clusterng for Large Data Sets Tengteng Kong 1,YeTan 1, Hong Shen 1,2 1 School

More information

Ants Can Schedule Software Projects

Ants Can Schedule Software Projects Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle FrstName.Name@ucv.cl 2 Unversdad Fns Terrae,

More information

A Comparative Study of Data Clustering Techniques

A Comparative Study of Data Clustering Techniques A COMPARATIVE STUDY OF DATA CLUSTERING TECHNIQUES A Comparatve Study of Data Clusterng Technques Khaled Hammouda Prof. Fakhreddne Karray Unversty of Waterloo, Ontaro, Canada Abstract Data clusterng s a

More information

Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms

Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms Internatonal Journal of Appled Informaton Systems (IJAIS) ISSN : 2249-0868 Foundaton of Computer Scence FCS, New York, USA Volume 7 No.7, August 2014 www.jas.org Cluster Analyss of Data Ponts usng Parttonng

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

Modelling high-dimensional data by mixtures of factor analyzers

Modelling high-dimensional data by mixtures of factor analyzers Computatonal Statstcs & Data Analyss 41 (2003) 379 388 www.elsever.com/locate/csda Modellng hgh-dmensonal data by mxtures of factor analyzers G.J. McLachlan, D. Peel, R.W. Bean Department of Mathematcs,

More information

Review of Hierarchical Models for Data Clustering and Visualization

Review of Hierarchical Models for Data Clustering and Visualization Revew of Herarchcal Models for Data Clusterng and Vsualzaton Lola Vcente & Alfredo Velldo Grup de Soft Computng Seccó d Intel lgènca Artfcal Departament de Llenguatges Sstemes Informàtcs Unverstat Poltècnca

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information

Abstract. Clustering ensembles have emerged as a powerful method for improving both the

Abstract. Clustering ensembles have emerged as a powerful method for improving both the Clusterng Ensembles: {topchyal, Models jan, of punch}@cse.msu.edu Consensus and Weak Parttons * Alexander Topchy, Anl K. Jan, and Wllam Punch Department of Computer Scence and Engneerng, Mchgan State Unversty

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification IDC IDC A Herarchcal Anomaly Network Intruson Detecton System usng Neural Network Classfcaton ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES ECE Department, New Jersey Inst. of Tech.,

More information

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers Journal of Computatonal Informaton Systems 7: 13 (2011) 4740-4747 Avalable at http://www.jofcs.com A Load-Balancng Algorthm for Cluster-based Mult-core Web Servers Guohua YOU, Yng ZHAO College of Informaton

More information

Bayesian Cluster Ensembles

Bayesian Cluster Ensembles Bayesan Cluster Ensembles Hongjun Wang 1, Hanhua Shan 2 and Arndam Banerjee 2 1 Informaton Research Insttute, Southwest Jaotong Unversty, Chengdu, Schuan, 610031, Chna 2 Department of Computer Scence &

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

Web Object Indexing Using Domain Knowledge *

Web Object Indexing Using Domain Knowledge * Web Object Indexng Usng Doman Knowledge * Muyuan Wang Department of Automaton Tsnghua Unversty Bejng 100084, Chna (86-10)51774518 Zhwe L, Le Lu, We-Yng Ma Mcrosoft Research Asa Sgma Center, Hadan Dstrct

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

Software project management with GAs

Software project management with GAs Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

RELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT

RELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT Kolowrock Krzysztof Joanna oszynska MODELLING ENVIRONMENT AND INFRATRUCTURE INFLUENCE ON RELIABILITY AND OPERATION RT&A # () (Vol.) March RELIABILITY RIK AND AVAILABILITY ANLYI OF A CONTAINER GANTRY CRANE

More information

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Sequential Optimizing Investing Strategy with Neural Networks

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Sequential Optimizing Investing Strategy with Neural Networks MATHEMATICAL ENGINEERING TECHNICAL REPORTS Sequental Optmzng Investng Strategy wth Neural Networks Ryo ADACHI and Akmch TAKEMURA METR 2010 03 February 2010 DEPARTMENT OF MATHEMATICAL INFORMATICS GRADUATE

More information

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007.

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007. Inter-Ing 2007 INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007. UNCERTAINTY REGION SIMULATION FOR A SERIAL ROBOT STRUCTURE MARIUS SEBASTIAN

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

A Simple Approach to Clustering in Excel

A Simple Approach to Clustering in Excel A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System Mnng Feature Importance: Applyng Evolutonary Algorthms wthn a Web-based Educatonal System Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and Wllam F. PUNCH 1 1 Genetc Algorthms Research and Applcatons

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information

Data Visualization by Pairwise Distortion Minimization

Data Visualization by Pairwise Distortion Minimization Communcatons n Statstcs, Theory and Methods 34 (6), 005 Data Vsualzaton by Parwse Dstorton Mnmzaton By Marc Sobel, and Longn Jan Lateck* Department of Statstcs and Department of Computer and Informaton

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process Dsadvantages of cyclc TDDB47 Real Tme Systems Manual scheduler constructon Cannot deal wth any runtme changes What happens f we add a task to the set? Real-Tme Systems Laboratory Department of Computer

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

where the coordinates are related to those in the old frame as follows.

where the coordinates are related to those in the old frame as follows. Chapter 2 - Cartesan Vectors and Tensors: Ther Algebra Defnton of a vector Examples of vectors Scalar multplcaton Addton of vectors coplanar vectors Unt vectors A bass of non-coplanar vectors Scalar product

More information

Analysis of Energy-Conserving Access Protocols for Wireless Identification Networks

Analysis of Energy-Conserving Access Protocols for Wireless Identification Networks From the Proceedngs of Internatonal Conference on Telecommuncaton Systems (ITC-97), March 2-23, 1997. 1 Analyss of Energy-Conservng Access Protocols for Wreless Identfcaton etworks Imrch Chlamtac a, Chara

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1 Send Orders for Reprnts to reprnts@benthamscence.ae The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,

More information

Dynamic Pricing for Smart Grid with Reinforcement Learning

Dynamic Pricing for Smart Grid with Reinforcement Learning Dynamc Prcng for Smart Grd wth Renforcement Learnng Byung-Gook Km, Yu Zhang, Mhaela van der Schaar, and Jang-Won Lee Samsung Electroncs, Suwon, Korea Department of Electrcal Engneerng, UCLA, Los Angeles,

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Research on Evaluation of Customer Experience of B2C Ecommerce Logistics Enterprises

Research on Evaluation of Customer Experience of B2C Ecommerce Logistics Enterprises 3rd Internatonal Conference on Educaton, Management, Arts, Economcs and Socal Scence (ICEMAESS 2015) Research on Evaluaton of Customer Experence of B2C Ecommerce Logstcs Enterprses Yle Pe1, a, Wanxn Xue1,

More information

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告 行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告 畫 類 別 : 個 別 型 計 畫 半 導 體 產 業 大 型 廠 房 之 設 施 規 劃 計 畫 編 號 :NSC 96-2628-E-009-026-MY3 執 行 期 間 : 2007 年 8 月 1 日 至 2010 年 7 月 31 日 計 畫 主 持 人 : 巫 木 誠 共 同

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger Enrchng the Knowledge Sources Used n a Maxmum Entropy Part-of-Speech Tagger Krstna Toutanova Dept of Computer Scence Gates Bldg 4A, 353 Serra Mall Stanford, CA 94305 9040, USA krstna@cs.stanford.edu Chrstopher

More information

Learning from Multiple Outlooks

Learning from Multiple Outlooks Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel maayanga@tx.technon.ac.l she@ee.technon.ac.l

More information

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising* Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of

More information

Improved SVM in Cloud Computing Information Mining

Improved SVM in Cloud Computing Information Mining Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

An interactive system for structure-based ASCII art creation

An interactive system for structure-based ASCII art creation An nteractve system for structure-based ASCII art creaton Katsunor Myake Henry Johan Tomoyuk Nshta The Unversty of Tokyo Nanyang Technologcal Unversty Abstract Non-Photorealstc Renderng (NPR), whose am

More information