Dimensionality Reduction for Data Visualization
Samuel Kaski and Jaakko Peltonen

Dimensionality reduction is one of the basic operations in the toolbox of data analysts and designers of machine learning and pattern recognition systems. Given a large set of measured variables but few observations, an obvious idea is to reduce the degrees of freedom in the measurements by representing them with a smaller set of more condensed variables. Another reason for reducing the dimensionality is to reduce computational load in further processing. A third reason is visualization. Looking at the data is a central ingredient of exploratory data analysis, the first stage of data analysis where the goal is to make sense of the data before proceeding with more goal-directed modeling and analyses. It has turned out that although these different tasks seem alike, their solution needs different tools. In this article we show that dimensionality reduction for data visualization can be represented as an information retrieval task, where the quality of visualization can be measured by precision and recall measures and their smoothed extensions, and that visualization can be optimized to directly maximize the quality for any desired tradeoff between precision and recall, yielding very well-performing visualization methods.

HISTORY

Each multivariate observation $x_i = [x_{i1}, \ldots, x_{in}]^T$ is a point in an $n$-dimensional space. A key idea in dimensionality reduction is that if the data lies in a $d$-dimensional ($d < n$) subspace of the $n$-dimensional space, and if we can identify the subspace, then there exists a transformation which loses no information and allows the data to be represented in a $d$-dimensional space. If the data lies in a (linear) subspace then the transformation is linear, and more generally the data may lie in a $d$-dimensional (curved) manifold and the transformation is non-linear. Among the earliest methods are so-called Multidimensional Scaling (MDS) methods [1] which try to position data points into a $d$-dimensional space such that their pairwise distances are preserved as well as possible.
If all pairwise distances are preserved, it can be argued that the data manifold has been identified (up to some transformations). In practice, data of course are noisy and the solution is found by minimizing a cost function such as the squared loss between the pairwise distances, $E_{MDS} = \sum_{i,j} (d(x_i, x_j) - d'(x'_i, x'_j))^2$, where the $d(x_i, x_j)$ are the original distances between the points $x_i$ and $x_j$, and the $d'(x'_i, x'_j)$ are the distances between their representations $x'_i$ and $x'_j$ in the $d$-dimensional space. MDS comes in several flavors that differ in their specific form of cost function and additional constraints on the mapping, and some of the choices give familiar methods such as Principal Components Analysis or Sammon's mapping as special cases.

Neural computing methods are other widely used families of manifold embedding methods. So-called Autoencoder Networks (see, e.g., [2]) pass the data vector through a lower-dimensional bottleneck layer in a neural network which aims to reproduce the original vector. The activities of the neurons in the bottleneck layer give the coordinates on the data manifold. Self-Organizing Maps (see [3]), on the other hand, directly learn a discrete representation of a low-dimensional manifold by positioning weight vectors of neurons along the manifold; the result is a discrete approximation to principal curves or manifolds, a non-linear generalization of principal components [4].

In 2000 a new manifold learning boom began after the publication of two papers in Science showing how to learn nonlinear data manifolds. Locally Linear Embedding [5] made, as the name reveals, locally linear approximations to the nonlinear manifold. The other, called Isomap [6], is essentially MDS tuned to work along the data manifold. After the manifold has been learned, distances will be computed along the manifold. But plain MDS tries to approximate distances of the data space which do not follow the manifold, and hence plain MDS will not work in general. That is why Isomap starts by computing distances along the data manifold, approximated by a graph connecting neighbor points. Since only neighbors are connected, the connections are likely to be on the same part of the manifold instead of jumping across gaps to different branches; distances along the neighborhood graph are thus decent approximations of distances along the data manifold, known as geodesic distances. A large number of other approaches have been introduced for learning of manifolds during the past ten years, including methods based on spectral graph theory and methods based on simultaneous variance maximization and distance preservation.

CONTROVERSY

Manifold learning research has been criticized for lack of clear goals. Many papers introduce a new method and only show its performance by nice images of how it learns a toy manifold. A famous example is the Swiss roll, a two-dimensional data sheet curved in three dimensions into a Swiss roll shape. Many methods have been shown capable of unrolling the Swiss roll but few have been shown to have real applications, success stories, or even to quantitatively outperform alternative methods.

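The Swiss roll also shows why Isomap measures distances along a neighbor graph rather than straight through the data space. The following is a minimal sketch in plain Python, not the Isomap implementation of [6]: the sampling of the roll, the neighbor count `k=6`, and all function names are illustrative assumptions. It compares the straight-line distance between the two ends of the roll with the shortest-path distance along a k-nearest-neighbor graph.

```python
import heapq
import math

def swiss_roll(n_steps=60, heights=(0.0, 1.0, 2.0)):
    """Points on a Swiss roll: a 2-D sheet (t, h) rolled up in 3-D. Sampling is an assumption."""
    points = []
    for k in range(n_steps):
        t = 1.5 * math.pi * (1 + 2 * k / (n_steps - 1))  # angle along the roll
        for h in heights:
            points.append((t * math.cos(t), h, t * math.sin(t)))
    return points

def knn_graph(points, k=6):
    """Symmetric k-nearest-neighbor graph with Euclidean edge lengths."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def graph_distances(graph, source):
    """Dijkstra shortest-path lengths from `source` along the neighbor graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

points = swiss_roll()
graph = knn_graph(points)
start, end = 0, len(points) - 3  # innermost and outermost point at height 0
euclidean = math.dist(points[start], points[end])
geodesic = graph_distances(graph, start)[end]
print(f"straight line: {euclidean:.2f}, along the graph: {geodesic:.2f}")
```

The graph distance follows the sheet around its turns and is therefore much longer than the straight line, which cuts across the gaps between layers. Choosing `k` too large would add edges that jump between layers and bring the shortcut back.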
One reason why quantitative comparisons are rare is that the goal of manifold embedding has not always been clearly defined. In fact, manifold learning may have several alternative goals depending on how the learned manifold will be used. We focus on one specific goal, data visualization, intended for helping analysts to look at the data and find related observations during exploratory data analysis. Data visualization is traditionally not a well-defined task either. But it is easy to observe empirically [7] that many of the manifold learning methods are not good for data visualization. The reason is that they have been designed to find a $d$-dimensional manifold if the inherent dimensionality of data is $d$. For visualization, the display needs to have $d = 2$ or $d = 3$; that is, the dimensionality may need to be reduced beyond the inherent dimensionality of data.

NEW PRINCIPLE

It is well-known that a high-dimensional data set cannot in general be faithfully represented in a lower-dimensional space, such as the plane with $d = 2$. Hence a visualization method needs to choose what kinds of errors to make. The choice naturally should depend on the visualization goal; it turns out that under a specific but general goal the choice can be expressed as an interesting tradeoff, as we will describe below.

When the task is to visualize which data points are similar, the visualization can have two kinds of errors (Figure 1): it can miss some similarities (i.e. it can place similar points far apart as false negatives) or it can bring dissimilar data points close together as false positives. If we know the cost of each type of error, the visualization can be optimized to minimize the total cost. Hence, once the user gives the relative cost of misses and false positives, it fixes visualization to be a well-defined optimization task. It turns out [8, 9] that under simplifying assumptions the two costs turn into precision and recall, standard measures between which a user-defined tradeoff is made in information retrieval. Hence, the task of visualizing which points are similar can be formalized as a task of visual information retrieval, that is, retrieval of similar points based on the visualization. The visualization can be optimized to maximize information retrieval performance, involving as an unavoidable element a tradeoff between precision and recall. In summary, visualization can be made into a rigorous modeling task, under the assumption that the goal is to visualize which data points are similar.

Figure 1: A visualization can have two kinds of errors (from [9]). When a neighborhood P in the high-dimensional input space is compared to a neighborhood Q in the output space (the visualization), false positives are points that appear to be neighbors in the visualization but are not in the original space; misses (which could also be called false negatives) are points that are neighbors in the original space but not in the visualization.

When the simplifying assumptions are removed the neighborhoods are allowed to be continuous-valued probability distributions $p_{ij}$ of point $j$ being a neighbor of point $i$. Then it can be shown that suitable analogues of precision and recall are distances between the neighborhood distributions $p_i$ in the input space and $q_i$ on the display. More specifically, the Kullback-Leibler divergence $D(p_i, q_i)$ reduces under simplifying assumptions to recall and $D(q_i, p_i)$ to precision. The total cost is then

$$E = \lambda \sum_i D(p_i, q_i) + (1 - \lambda) \sum_i D(q_i, p_i), \tag{1}$$

where $\lambda$ is the relative cost of misses and false positives.
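Written out, the two divergences in the total cost take the usual Kullback-Leibler form. The block below assumes that $p_{ij}$ and $q_{ij}$ are normalized over $j \neq i$ for each point $i$. The Gaussian form of $q_{ij}$, with display coordinates $y_i$ and a width $\sigma_i$, is one common choice and is an assumption here rather than something fixed by the text.

```latex
D(p_i, q_i) = \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \qquad
D(q_i, p_i) = \sum_{j \neq i} q_{ij} \log \frac{q_{ij}}{p_{ij}}, \qquad
q_{ij} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2 / \sigma_i^2\right)}
              {\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2 / \sigma_i^2\right)}
```

A miss (large $p_{ij}$ but small $q_{ij}$) makes the first divergence large, and a false positive (large $q_{ij}$ but small $p_{ij}$) makes the second one large, which is how the two terms play the roles of recall and precision.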
The display coordinates of all data points are then optimized to minimize this total cost; several nonlinear optimization approaches could be used, and we have simply used conjugate gradient descent. This method has been called NeRV for Neighbor Retrieval Visualizer [8, 9]. When $\lambda = 1$ the method reduces to Stochastic Neighbor Embedding [10], an earlier method which we now see maximizes recall.
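To make the cost concrete, here is a minimal sketch in plain Python that only evaluates the total cost for a given display; it does not perform the conjugate gradient optimization and is not the NeRV software. The Gaussian neighborhoods with one shared width `sigma`, the small constant `eps`, and all function names are assumptions made for illustration.

```python
import math

def neighbor_probs(points, sigma=2.0):
    """Row i holds the probabilities of each j being picked as a neighbor of i.
    Gaussian falloff with a shared width sigma (a simplifying assumption).
    Points much farther apart than sigma could underflow the row sum to zero."""
    n = len(points)
    rows = []
    for i in range(n):
        weights = [
            0.0 if j == i else math.exp(-math.dist(points[i], points[j]) ** 2 / sigma ** 2)
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def kl_divergence(p_row, q_row, eps=1e-12):
    """Kullback-Leibler divergence D(p, q); eps guards against log of zero."""
    return sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_row, q_row) if p > 0)

def nerv_cost(data, display, lam, sigma=2.0):
    """lam * sum_i D(p_i, q_i) + (1 - lam) * sum_i D(q_i, p_i)."""
    p = neighbor_probs(data, sigma)     # neighborhoods in the input space
    q = neighbor_probs(display, sigma)  # neighborhoods on the display
    recall_term = sum(kl_divergence(pi, qi) for pi, qi in zip(p, q))     # penalizes misses
    precision_term = sum(kl_divergence(qi, pi) for pi, qi in zip(p, q))  # penalizes false positives
    return lam * recall_term + (1 - lam) * precision_term

# Toy data that happens to lie in a plane, so a perfect 2-D display exists.
data = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 0)]
faithful = [(0, 0), (1, 0), (0, 1), (5, 5)]
distorted = [(0, 0), (1, 0), (5, 5), (0, 1)]  # last two points swapped on the display

cost_faithful = nerv_cost(data, faithful, lam=0.5)
cost_distorted = nerv_cost(data, distorted, lam=0.5)
print(cost_faithful, cost_distorted)
```

Setting `lam=1.0` keeps only the first term, which corresponds to the recall-maximizing special case mentioned above. A full method would move the display coordinates to reduce this value.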
Visualization of a simple data distribution makes the meaning of the tradeoff between precision and recall more concrete. When visualizing the surface of a three-dimensional sphere in two dimensions, maximizing recall squashes the sphere flat (Figure 2) whereas maximizing precision peels the surface open. Both solutions are good, but have different kinds of errors.

Figure 2: Tradeoff between precision and recall in visualizing a sphere (from [9]). Left: the three-dimensional location of points on the three-dimensional sphere is encoded into colors and glyph shapes. Center: two-dimensional visualization that maximizes recall by squashing the sphere flat. All original neighbors remain close-by but false positives (false neighbors) from opposite sides of the sphere also become close-by. Right: visualization that maximizes precision by peeling the sphere surface open. No false positives are introduced but some original neighbors are missed across the edges of the tear.

Both nonlinear and linear visualizations can be optimized by minimizing (1). The remaining problem is how to define the neighborhoods $p$; in the absence of more knowledge, symmetric Gaussians or more heavy-tailed distributions are justifiable choices. An even better alternative is to derive the neighborhood distributions from probabilistic models that encode our knowledge of the data, both prior knowledge and what was learned from data.

Deriving input similarities from a probabilistic model has recently been done in Fisher Information Nonparametric Embedding [11], where the similarities (distances) approximate Fisher information distances (geodesic distances where the local metric is defined by a Fisher information matrix) derived from nonparametric probabilistic models. In related earlier work [12, 13], approximated geodesic distances were computed in a learning metric derived using Fisher information matrices for a conditional class probability model. In all these works, though, the distances were given to standard visualization methods, which have not been designed for a clear task of visual information retrieval.
In contrast, we will combine the model-based input similarities with the rigorous precision-recall approach to visualization. Then the whole procedure corresponds to a well-defined modeling task where the goal is to visualize which data points are similar. We will next discuss this in more detail in two concrete applications.

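Under the simplifying assumptions, where neighborhoods are plain sets of points, the two error types of Figure 1 can be counted directly. The sketch below is a hedged illustration in plain Python, not the evaluation code of [8, 9]: taking the `k_input` nearest points as the relevant set P and the `k_display` nearest points on the display as the retrieved set Q is an assumption, as are the function names and the toy data.

```python
import math

def nearest(points, i, k):
    """Indices of the k points closest to point i (ties broken by index)."""
    order = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    return {j for _, j in order[:k]}

def precision_recall(data, display, k_input, k_display):
    """Mean precision and recall of display neighborhoods Q against input neighborhoods P."""
    precisions, recalls = [], []
    for i in range(len(data)):
        relevant = nearest(data, i, k_input)        # P: neighbors in the input space
        retrieved = nearest(display, i, k_display)  # Q: neighbors seen on the display
        hits = len(relevant & retrieved)
        precisions.append(hits / k_display)  # lowered by false positives in Q
        recalls.append(hits / k_input)       # lowered by misses from P
    n = len(data)
    return sum(precisions) / n, sum(recalls) / n

# Toy data: 12 points on a circle, "squashed flat" by keeping only the x coordinate.
circle = [(math.cos(2 * math.pi * t / 12), math.sin(2 * math.pi * t / 12)) for t in range(12)]
squashed = [(x,) for x, _ in circle]

precision, recall = precision_recall(circle, squashed, k_input=2, k_display=4)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Squashing the circle onto a line is a lower-dimensional analogue of flattening the sphere in Figure 2: points from opposite sides land close together and show up in Q as false positives.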
APPLICATION 1: VISUALIZATION OF GENE EXPRESSION COMPENDIA FOR RETRIEVING RELEVANT EXPERIMENTS

In the study of molecular biological systems, behavior of the system can seldom be inferred from first principles, either because such principles are not known yet or because each system is different. The study needs to be data-driven. Moreover, in order to make research cumulative, new experiments need to be placed in the context of earlier knowledge. In the case of data-driven research, a key part of that is retrieval of relevant experiments. An earlier experiment, a set of measurements, is relevant if some of the same biological processes are active in it, either intentionally or as side effects.

In molecular biology it has become standard practice to store experimental data in repositories such as ArrayExpress of the European Bioinformatics Institute EBI. Traditionally, experiments are sought from the repository based on metadata annotations only, which works well when searching for experiments that involve well-annotated and well-known biological phenomena. In the interesting case of studying and modeling new findings, more data-driven approaches are needed, and information retrieval and visualization based on latent variable models are promising tools [14].

Let's assume that in experiment $i$ data $g_i$ have been measured; in the concrete case below $g_i$ will be a differential gene expression vector, where $g_{ij}$ is the expression level of gene or gene set $j$ compared to a control measurement. Now if we fit to the compendium a model that generates a probability distribution over the experiments, $p(g_i, z \mid \theta)$, where the $\theta$ are parameters of the model which we will omit below and $z$ are latent variables, this model can be used for retrieval and visualization as explained below. This modeling approach makes sense in particular if the model is constructed such that the latent variables have an interpretation as activities of latent or underlying biological processes which are manifested indirectly as the differential gene expression.
Given the model, relevance can be defined in a natural way as follows: the likelihood of experiment $i$ being relevant for an earlier experiment $j$ is $p(g_i \mid g_j) = \int p(g_i \mid z)\, p(z \mid g_j)\, dz$. That is, the experiment is relevant if it is likely that the measurements have arisen as products of the same unknown biological processes $z$. This definition of relevance can now be used for retrieving the most relevant experiments and, moreover, the definition can be used as the natural probability distribution $p$ in (1) to construct a visual information retrieval interface (Figure 3); in this case the data are 105 microarray experiments from the ArrayExpress database, comparing pathological samples such as cancer tissues to healthy samples.

Above the visual information retrieval idea was explained in abstract concepts, applicable to many data sources. In the gene expression retrieval case of Figure 3, the data were expressions of a priori defined gene sets, quantized into counts, and the probabilistic model was the Discrete Principal Component Analysis model, also called Latent Dirichlet Allocation, and in the context of texts called a topic model. The resulting relevances can directly be given as inputs to the Neighbor Retrieval Visualizer (NeRV); in Figure 3 a slightly modified variant of the relevances was used, details in [14].

In summary, fitting a probabilistic latent variable model to the data produces a natural relevance measure which can then be plugged as a similarity measure into the visualization framework. Everything from start to finish is then based on rigorous choices.

Figure 3: A visual information retrieval interface to a collection of microarray experiments visualized as glyphs on a plane (from [14]). A: Glyph locations have been optimized by the Neighbor Retrieval Visualizer so that relevant experiments are close-by. For this experiment data, relevance is defined by the same data-driven biological processes being active, as modeled by a latent variable model (component model). B: Enlarged view with annotations; each color bar corresponds to a biological component or process, and the width tells the activity of the component. These experiments are retrieved as relevant for the melanoma experiment shown in the center. C: The biological components (nodes in the middle) link the experiments (left) to sets of genes (right) activated in them. Labels appearing in the figure: prostate cancer, oxidative phosphorylation, purine metabolism, ATP synthesis, glycolysis, Crohn's disease, bladder carcinoma.

APPLICATION 2: VISUALIZATION OF GRAPHS

Graphs are a natural representation of data in several fields where visualizations are helpful: social network analysis, interaction networks in molecular biology, citation networks, etc. In a sense, graphs are high-dimensional structured data where nodes are points and all other nodes are dimensions; the value of the dimension is the type or strength of the link. There exist lots of graph drawing algorithms, including string analogy-based methods such as Walshaw's algorithm [15] and spectral methods [16]. Most of them focus explicitly or implicitly on local properties of graphs, drawing nodes linked by an edge close together but avoiding overlap. That works well for simple graphs but for large and complicated ones additional principles are needed to avoid the famous hairball visualizations.

Figure 4: Visualizations of graphs. A: US college football teams (nodes) and who they played against (edges). The visual groups of teams match the 12 conferences arranged for yearly play (shown with different colors). B-C: word adjacencies in the works of Jane Austen. The nodes are words, and edges mean the words appeared next to each other in the text. The NeRV visualization in B shows visual groups which reveal syntactic word categories: adjectives, nouns and verbs shown in blue, red, and green. The edge bundles reveal disassortative structure which matches intuition, for example, verbs are adjacent in text to nouns or adjectives and not to other verbs. Earlier graph layout methods (Walshaw's algorithm shown in C) fail to reveal the structure. Figure from [17], © ACM.

A promising direction forward is to learn a probabilistic latent variable model of the graph, in the hope of capturing its central properties, and then focus on visualizing those properties. In the case of graphs, the data to be modeled is which other nodes a node links to. But as the observed links in a network may be stochastic (noisy) measurements such as gene interaction measurements, it makes sense to assume that the links are a sample from an underlying link distribution, and learn a probabilistic latent variable model to model the distributions. The similarity of two nodes is then naturally evaluated as similarity of their link distributions.
The rest of the visualization can proceed as in the previous section, with experiments replaced by graph nodes. Figure 4 shows sample graphs visualized based on a variant of Discrete Principal Components Analysis or Latent Dirichlet Allocation suitable for graphs. With this link distribution-based approach, the Neighbor Retrieval Visualizer places nodes close-by on the display if they link to similar other nodes, with similarity defined as similarity of link distributions. This has the nice side-result that links form bundles where all start nodes are similar and all end nodes are similar.

In summary, the idea is to use any prior knowledge in choosing a suitable model for the graph, and after that all steps of the visualization follow naturally and rigorously from start to finish. In the absence of prior knowledge, flexible machine learning models such as the Discrete Principal Components Analysis above can be learned from data.

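The idea of comparing nodes by their link distributions can be illustrated with a small sketch in plain Python. It is not the latent variable model of [17]: a smoothed empirical link distribution stands in for the learned model, and the Jensen-Shannon divergence, the smoothing constant `alpha`, the toy word graph, and all function names are assumptions chosen for illustration.

```python
import math

def link_distribution(adjacency, node, nodes, alpha=0.1):
    """Smoothed empirical distribution over which nodes `node` links to.
    A stand-in for a learned link distribution; alpha is a smoothing assumption."""
    counts = [alpha + (1.0 if other in adjacency[node] else 0.0) for other in nodes]
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence: 0 for identical distributions, larger when they differ."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy word-adjacency graph: verbs link to nouns, never to other verbs.
nodes = ["see", "hear", "dog", "cat"]
adjacency = {
    "see": {"dog", "cat"},
    "hear": {"dog", "cat"},
    "dog": {"see", "hear"},
    "cat": {"see", "hear"},
}
dist = {n: link_distribution(adjacency, n, nodes) for n in nodes}

verb_vs_verb = js_divergence(dist["see"], dist["hear"])
verb_vs_noun = js_divergence(dist["see"], dist["dog"])
print(f"see vs hear: {verb_vs_verb:.3f}, see vs dog: {verb_vs_noun:.3f}")
```

The two verbs are not linked to each other, yet their link distributions coincide, so a layout driven by this similarity would place them together. This matches the disassortative word groups described for Figure 4, which a layout that only pulls linked nodes together would not produce.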
CONCLUSIONS

We have discussed dimensionality reduction for a specific goal, data visualization, which has so far been defined mostly only heuristically. Recently it has been suggested that a specific kind of data visualization task, that is, visualization of similarities of data points, could be formulated as a visual information retrieval task, with a well-defined cost function to be optimized. The information retrieval connection further reveals that a tradeoff between misses and false positives needs to be made in visualization as in all other information retrieval. Moreover, the visualization task can be turned into a well-defined modeling problem by inferring the similarities using probabilistic models that are learned to fit the data.

A free software package that solves nonlinear dimensionality reduction as visual information retrieval, with a method called NeRV for Neighbor Retrieval Visualizer, is available.

AUTHORS

Samuel Kaski is a Professor of Computer Science in Aalto University and Director of Helsinki Institute for Information Technology HIIT, a joint research institute of Aalto University and University of Helsinki. He studies machine learning, in particular multi-source machine learning, with applications in bioinformatics, neuroinformatics and proactive interfaces.

Jaakko Peltonen is a postdoctoral researcher and docent at Aalto University, Department of Information and Computer Science. He received the D.Sc. degree from Helsinki University of Technology. He is an associate editor of Neural Processing Letters and has served in program committees of eleven conferences. He studies generative and information theoretic machine learning especially for exploratory data analysis, visualization, and multi-source learning.

References

[1] I. Borg and P. Groenen, Modern Multidimensional Scaling. New York: Springer.
[2] G. Hinton, "Connectionist learning procedures," Artificial Intelligence, vol. 40.
[3] T. Kohonen, Self-Organizing Maps. Berlin: Springer, 3rd ed.
[4] F. Mulier and V. Cherkassky, "Self-organization as an iterative kernel smoothing process," Neural Computation, vol. 7.
[5] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290.
[6] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290.
[7] J. Venna and S. Kaski, "Comparison of visualization methods for an atlas of gene expression data sets," Information Visualization, vol. 6.
[8] J. Venna and S. Kaski, "Nonlinear dimensionality reduction as information retrieval," in Proceedings of AISTATS*07, the 11th International Conference on Artificial Intelligence and Statistics (JMLR Workshop and Conference Proceedings Volume 2) (M. Meila and X. Shen, eds.).
[9] J. Venna, J. Peltonen, K. Nybo, H. Aidos, and S. Kaski, "Information retrieval perspective to nonlinear dimensionality reduction for data visualization," Journal of Machine Learning Research, vol. 11.
[10] G. Hinton and S. T. Roweis, "Stochastic neighbor embedding," in Advances in Neural Information Processing Systems 14 (T. Dietterich, S. Becker, and Z. Ghahramani, eds.), Cambridge, MA: MIT Press.
[11] K. M. Carter, R. Raich, W. G. Finn, and A. O. Hero III, "FINE: Fisher information nonparametric embedding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11.
[12] S. Kaski, J. Sinkkonen, and J. Peltonen, "Bankruptcy analysis with self-organizing maps in learning metrics," IEEE Transactions on Neural Networks, vol. 12.
[13] J. Peltonen, A. Klami, and S. Kaski, "Improved learning of Riemannian metrics for exploratory analysis," Neural Networks, vol. 17.
[14] J. Caldas, N. Gehlenborg, A. Faisal, A. Brazma, and S. Kaski, "Probabilistic retrieval and visualization of biologically relevant microarray experiments," Bioinformatics, vol. 25, no. 12.
[15] C. Walshaw, "A multilevel algorithm for force-directed graph drawing," in GD '00: Proceedings of the 8th International Symposium on Graph Drawing, (London, UK), Springer-Verlag.
[16] K. M. Hall, "An r-dimensional quadratic placement algorithm," Management Science, vol. 17, no. 3.
[17] J. Parkkinen, K. Nybo, J. Peltonen, and S. Kaski, "Graph visualization with latent variable models," in Proceedings of MLG-2010, the Eighth Workshop on Mining and Learning with Graphs, (New York, NY, USA), ACM.
L10: Linear discriminants analysis
L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss
The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis
The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna [email protected] Abstract.
Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)
Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton
Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION
Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Out-of-Sample Extensons for LLE, Isomap, MDS, Egenmaps, and Spectral Clusterng Yoshua Bengo, Jean-Franços Paement, Pascal Vncent Olver Delalleau, Ncolas Le Roux and Mare Oumet Département d Informatque
Active Learning for Interactive Visualization
Actve Learnng for Interactve Vsualzaton Tomoharu Iwata Nel Houlsby Zoubn Ghahraman Unversty of Cambrdge Unversty of Cambrdge Unversty of Cambrdge Abstract Many automatc vsualzaton methods have been. However,
CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements
Lecture 3 Densty estmaton Mlos Hauskrecht [email protected] 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there
Data Visualization by Pairwise Distortion Minimization
Communcatons n Statstcs, Theory and Methods 34 (6), 005 Data Vsualzaton by Parwse Dstorton Mnmzaton By Marc Sobel, and Longn Jan Lateck* Department of Statstcs and Department of Computer and Informaton
Loop Parallelization
- - Loop Parallelzaton C-52 Complaton steps: nested loops operatng on arrays, sequentell executon of teraton space DECLARE B[..,..+] FOR I :=.. FOR J :=.. I B[I,J] := B[I-,J]+B[I-,J-] ED FOR ED FOR analyze
Luby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
What is Candidate Sampling
What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble
Project Networks With Mixed-Time Constraints
Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa
Georey E. Hinton. University oftoronto. Email: [email protected]. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract
The EM Algorthm for Mxtures of Factor Analyzers Zoubn Ghahraman Georey E. Hnton Department of Computer Scence Unversty oftoronto 6 Kng's College Road Toronto, Canada M5S A4 Emal: [email protected] Techncal
An Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne
Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications
CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary
Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
Single and multiple stage classifiers implementing logistic discrimination
Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,
Support Vector Machines
Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada [email protected] Abstract Ths s a note to explan support vector machnes.
Forecasting the Direction and Strength of Stock Market Movement
Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems
Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting
Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of
8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by
6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng
BERNSTEIN POLYNOMIALS
On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful
A Fast Incremental Spectral Clustering for Large Data Sets
2011 12th Internatonal Conference on Parallel and Dstrbuted Computng, Applcatons and Technologes A Fast Incremental Spectral Clusterng for Large Data Sets Tengteng Kong 1,YeTan 1, Hong Shen 1,2 1 School
v a 1 b 1 i, a 2 b 2 i,..., a n b n i.
SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are
Learning from Multiple Outlooks
Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel [email protected] [email protected]
PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12
14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed
Conversion between the vector and raster data structures using Fuzzy Geographical Entities
Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,
Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..
How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S
S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta
A Simple Approach to Clustering in Excel
A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa
Fisher Markets and Convex Programs
Fsher Markets and Convex Programs Nkhl R. Devanur 1 Introducton Convex programmng dualty s usually stated n ts most general form, wth convex objectve functons and convex constrants. (The book by Boyd and
Bypassing Synthesis: PLS for Face Recognition with Pose, Low-Resolution and Sketch
Bypassng Synthess: PLS for Face Recognton wth Pose, Low-Resoluton and Setch Abhshe Sharma Insttute of Advanced Computer Scence Unversty of Maryland, USA [email protected] Davd W Jacobs Insttute of Advanced
An Algorithm for Data-Driven Bandwidth Selection
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 2, FEBRUARY 2003 An Algorthm for Data-Drven Bandwdth Selecton Dorn Comancu, Member, IEEE Abstract The analyss of a feature space
Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification
Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson
Support vector domain description
Pattern Recognton Letters 20 (1999) 1191±1199 www.elsever.nl/locate/patrec Support vector doman descrpton Davd M.J. Tax *,1, Robert P.W. Dun Pattern Recognton Group, Faculty of Appled Scence, Delft Unversty
+ + + - - This circuit than can be reduced to a planar circuit
MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to
Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College
Feature selection for intrusion detection Slobodan Petrović NISlab, Gjøvik University College Contents The feature selection problem Intrusion detection Traffic features relevant for IDS The CFS measure The mRMR measure
Logistic Regression. Steve Kroon
Logistic Regression Steve Kroon Course notes sections: 24.3-24.4 Disclaimer: these notes do not explicitly indicate whether values are vectors or scalars, but expect the reader to discern this from the context. Scenario
DEFINING %COMPLETE IN MICROSOFT PROJECT
CelerisSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Additional Information about Earned Value Management Systems and reporting, please contact: CelerisSystems,
A DATA MINING APPLICATION IN A STUDENT DATABASE
JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe University Faculty of Engineering Büyükbakkalköy-Istanbul
A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña
Proceedings of the 2008 Winter Simulation Conference S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION
Recurrence. 1 Definitions and main statements
Recurrence 1 Definitions and main statements Let $X_n$, $n = 0, 1, 2, \ldots$ be a MC with the state space $S = (1, 2, \ldots)$, transition probabilities $p_{ij} = P\{X_{n+1} = j \mid X_n = i\}$, and the transition matrix $P = (p_{ij})_{i,j \in S}$ def.
ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING
ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Liberatore, Department of Management and Operations, Villanova University, Villanova, PA 19085, 610-519-4390,
Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*
Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising* Xiaohui Wu 1,2, Jun Yan 2, Ning Liu 2, Shuicheng Yan 3, Ying Chen 1, Zheng Chen 2 1 Department of Computer Science Beijing Institute of
Machine Learning and Data Mining Lecture Notes
Machine Learning and Data Mining Lecture Notes CSC 411/D11 Computer Science Department University of Toronto Version: February 6, 2012 Copyright © 2010 Aaron Hertzmann and David Fleet CONTENTS Contents Conventions
Review of Hierarchical Models for Data Clustering and Visualization
Review of Hierarchical Models for Data Clustering and Visualization Lola Vicente & Alfredo Vellido Grup de Soft Computing Secció d'Intel·ligència Artificial Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica
An interactive system for structure-based ASCII art creation
An interactive system for structure-based ASCII art creation Katsunori Miyake Henry Johan Tomoyuki Nishita The University of Tokyo Nanyang Technological University Abstract Non-Photorealistic Rendering (NPR), whose aim
A Multi-Camera System on PC-Cluster for Real-time 3-D Tracking
The 23rd Conference of the Mechanical Engineering Network of Thailand November 4–7, 2009, Chiang Mai A Multi-Camera System on PC-Cluster for Real-time 3-D Tracking Viboon Sangveraphunsiri*, Kritsana Uttamang, and
Enterprise Master Patient Index
Enterprise Master Patient Index Healthcare data are captured in many different settings such as hospitals, clinics, labs, and physician offices. According to a report by the CDC, patients in the United States made an
CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES
CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In this chapter, we will learn how to describe the relationship between two quantitative variables. Remember (from Chapter 2) that the terms quantitative variable
Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending
Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Bayesian Network Based Causal Relationship Identification and Funding Success
A novel Method for Data Mining and Classification based on
A novel Method for Data Mining and Classification based on Ensemble Learning 1 1, First Author Neijiang Normal University; Sichuan Neijiang 641112, China, E-mail: [email protected] Abstract Data mining has been attached great
MONITORING OF DISTILLATION COLUMN OPERATION THROUGH SELF -ORGANIZING MAPS. Y.S. Ng and R. Srinivasan*
MONITORING OF DISTILLATION COLUMN OPERATION THROUGH SELF-ORGANIZING MAPS Y.S. Ng and R. Srinivasan* Laboratory for Intelligent Applications in Chemical Engineering, Department of Chemical and Biomolecular Engineering,
Lecture 5,6 Linear Methods for Classification. Summary
Lecture 5,6 Linear Methods for Classification Rice ELEC 697 Farinaz Koushanfar Fall 2006 Summary Bayes Classifiers Linear Classifiers Linear regression of an indicator matrix Linear discriminant analysis (LDA) Logistic regression
Traffic State Estimation in the Traffic Management Center of Berlin
Traffic State Estimation in the Traffic Management Center of Berlin Authors: Peter Vortisch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, email [email protected] Peter Möhl, PTV AG,
How To Understand The Results Of The German Meris Cloud And Water Vapour Product
Title: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revision: 0 Date: 9.12.1998 Function Name Organisation Signature Date Author: Bennartz FUB Preusker FUB Schüller
How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence
1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June – 2 July 1999 How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence Mark J. Schervish
benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).
REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and insurance: When someone is subject to the risk of incurring a financial loss, the loss is generally modeled using a random variable or
Laddered Multilevel DC/AC Inverters used in Solar Panel Energy Systems
Proceedings of the nd International Conference on Computer Science and Electronics Engineering (ICCSEE 03) Laddered Multilevel DC/AC Inverters used in Solar Panel Energy Systems Fang Lin Luo, Senior Member IEEE
Exploiting Recommendation on Social Media Networks
International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 Impact Factor (2013): 4.438 Exploiting Recommendation on Social Media Networks Swati A. Adhav 1, Sheetal
Network Security Situation Evaluation Method for Distributed Denial of Service
Network Security Situation Evaluation Method for Distributed Denial of Service Jin Qi,2, Cui YiMin,2, Huang MinHuan,2, Kuang XiaoHui,2, TangHong,2 ) Science and Technology on Information System Security Laboratory, Beijing,
Distributed Multi-Target Tracking In A Self-Configuring Camera Network
Distributed Multi-Target Tracking In A Self-Configuring Camera Network Cristian Soto, Bi Song, Amit K. Roy-Chowdhury Department of Electrical Engineering University of California, Riverside {cwlder,bsong,amtrc}@ee.ucr.edu
Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data
Journal of Al Azhar University-Gaza (Natural Sciences), 2011, 13 : 109-118 Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Mahmoud K. Okasha, Khaled I.A. Almghari Department of
A Probabilistic Theory of Coherence
A Probabilistic Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propositions E_1, ..., E_n. We seek a probabilistic measure C(E) of the degree of coherence of E. Intuitively, we want
Ring structure of splines on triangulations
www.oeaw.ac.at Ring structure of splines on triangulations N. Villamizar RICAM-Report 2014-48 www.ricam.oeaw.ac.at RING STRUCTURE OF SPLINES ON TRIANGULATIONS NELLY VILLAMIZAR Introduction For a triangulated region
Mining Multiple Large Data Sources
The International Arab Journal of Information Technology, Vol. 7, No. 3, July 2 24 Mining Multiple Large Data Sources Animesh Adhikari, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhimli Adhikari 4 Department of
Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University
Characterization of Assembly Variation Analysis Methods A Thesis Presented to the Department of Mechanical Engineering Brigham Young University In Partial Fulfillment of the Requirements for the Degree Master of Science
Implementation of Deutsch's Algorithm Using Mathcad
Implementation of Deutsch's Algorithm Using Mathcad Frank Rioux The following is a Mathcad implementation of David Deutsch's quantum computer prototype as presented on pages - in "Machines, Logic and Quantum Physics"
Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems
Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Fangfei Chen, Murali Kodialam, T. V. Lakshman Department of Computer Science and Engineering, The Penn State University Bell Laboratories, Alcatel-Lucent
320 The International Arab Journal of Information Technology, Vol. 5, No. 3, July 2008 Comparisons Between Data Clustering Algorithms Osama Abu Abbas Computer Science Department, Yarmouk University, Jordan Abstract:
An Analysis of Dynamic Severity and Population Size
An Analysis of Dynamic Severity and Population Size Karsten Weicker University of Stuttgart, Institute of Computer Science, Breitwiesenstr. 2 22, 7565 Stuttgart, Germany, email: [email protected]
An Empirical Study of Search Engine Advertising Effectiveness
An Empirical Study of Search Engine Advertising Effectiveness Sanjog Misra, Simon School of Business University of Rochester Edieal Pinker, Simon School of Business University of Rochester Alan Rimm-Kaufman, Rimm-Kaufman
An Alternative Way to Measure Private Equity Performance
An Alternative Way to Measure Private Equity Performance Peter Todd Parilux Investment Technology LLC Summary Internal Rate of Return (IRR) is probably the most common way to measure the performance of private
Testing and Debugging Resource Allocation for Fault Detection and Removal Process
International Journal of New Computer Architectures and their Applications (IJNCAA) 4(4): 93-00 The Society of Digital Information and Wireless Communications, 04 (ISSN: 0-9085) Testing and Debugging Resource Allocation
IMPACT ANALYSIS OF A CELLULAR PHONE
4th ANSA & μETA International Conference IMPACT ANALYSIS OF A CELLULAR PHONE Wei Liu, 2 Hongyi Li Beijing FEAonline Engineering Co., Ltd. Beijing, China ABSTRACT Drop test simulation plays an important role in investigating
ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble
1 ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble Sai Zhang, [email protected] Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA Abstract In
Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * Department of Computer Science National Chiao Tung University Hsinchu,
Performance Analysis and Coding Strategy of ECOC SVMs
International Journal of Grid and Distributed Computing Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analysis and Coding Strategy of ECOC SVMs Zhigang Yan, and Yuanxuan Yang, School
A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression
A Novel Methodology of Working Capital Management for Large Public Constructions by Using Fuzzy S-curve Regression Cheng-Wu Chen, Morris H. L. Wang and Ting-Ya Hsieh Department of Civil Engineering, National Central University,
Institute of Informatics, Faculty of Business and Management, Brno University of Technology, Czech Republic
Lagrange Multipliers as Quantitative Indicators in Economics Ivan Mezník Institute of Informatics, Faculty of Business and Management, Brno University of Technology, Czech Republic Abstract The quantitative role of Lagrange
Fixed income risk attribution
5 Fixed income risk attribution Chithra Krishnamurthi RiskMetrics Group [email protected] We compare the risk of the active portfolio with that of the benchmark and segment the difference between the two
Research on Engineering Software Data Formats Conversion Network
2606 JOURNAL OF SOFTWARE, VOL. 7, NO. 11, NOVEMBER 2012 Research on Engineering Software Data Formats Conversion Network Wenbin Zhao School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu,
"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *
Iranian Journal of Science & Technology, Transaction B, Engineering, Vol. 30, No. B6, 789-794 Printed in The Islamic Republic of Iran, 006 Shiraz University "Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC
) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance
Calibration Method Instances of the Cell class (one instance for each FMS cell) contain ADC raw data and methods associated with each particular FMS cell. The calibration method includes event selection (Class Cell
Optimal resource capacity management for stochastic networks
Submitted for publication. Optimal resource capacity management for stochastic networks A.B. Dieker H. Milton Stewart School of ISyE, Georgia Institute of Technology, Atlanta, GA 30332, [email protected]
Discriminative Improvements to Distributional Sentence Similarity
Discriminative Improvements to Distributional Sentence Similarity Yangfeng Ji School of Interactive Computing Georgia Institute of Technology [email protected] Jacob Eisenstein School of Interactive Computing Georgia Institute
NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION
NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION Huan Liu, School of Software, Harbin University of Science and Technology, Harbin, China Faculty of Applied Mathematics and Computer Science, Belarusian State
