Visualization, Clustering and Classification of Multidimensional Astronomical Data
Antonino Staiano, Angelo Ciaramella, Lara De Vinco, Ciro Donalek, Giuseppe Longo, Giancarlo Raiconi, Roberto Tagliaferri, Roberto Amato, Carmine Del Mondo, Giuseppe Mangano, Gennaro Miele

Dipartimento di Matematica ed Informatica, Università di Salerno, Fisciano (Sa), Italy
Dipartimento di Scienze Fisiche, Università Federico II di Napoli, Italy
INFOTEL S.r.l., Via Strauss, Battipaglia (Sa), Italy

Abstract. Due to recent technological advances, data mining in massive data sets has evolved into a crucial research field for many, if not all, areas of research: from astronomy to high energy physics to genetics. In this paper we discuss an implementation of Probabilistic Principal Surfaces (PPS) developed within the framework of the AstroNeural collaboration. PPS are a nonlinear latent variable model which may be regarded as a complete mathematical framework for accomplishing some fundamental data mining activities: visualization, clustering and classification of high dimensional data. The effectiveness of the proposed model is exemplified on a complex astronomical data set.

I. INTRODUCTION
The explosive growth in the quantity, quality and accessibility of data currently experienced in all fields of science and human endeavor has triggered the search for a new generation of computational theories and tools capable of assisting humans in extracting useful information (knowledge) from the available and planned massive data sets.
This revolution has two main aspects. On the one hand, in astronomy (as well as in high energy physics, genetics, the social sciences, and many other fields) traditional interactive data analysis and data visualization methods have proved largely inadequate to cope with data sets characterized by huge volumes and/or complexity (tens or hundreds of parameters or features per record, cf. [1] and references therein). On the other hand, the simultaneous analysis of hundreds of parameters can unveil previously unknown patterns which may lead to a deeper understanding of the underlying phenomena and trends. The field of data mining is therefore becoming of paramount importance, not only in its traditional arena but also as an auxiliary tool for almost all fields of research. In this paper we discuss how three common tasks in data analysis (data visualization, clustering and data classification) may be performed using spherical Probabilistic Principal Surfaces (PPS) as a common framework.

Visualization: a crucial step in the process of data analysis, enabling an understanding of the relations that exist within the data by displaying them in such a way that the discovered patterns are emphasized.

Clustering: perhaps the most important and widely used method of unsupervised learning. It may be summarized as the problem of identifying groupings of similar points that are relatively isolated from each other, or in other words of partitioning the data into dissimilar groups of similar items.

Classification: the assignment of a given pattern to one of a number of possible classes, which depend on the problem at hand. Such classes may be the result of a labeling of the groupings produced by a clustering procedure.

PPS [6], [7] (discussed in Section II) are a nonlinear extension of principal components, in that each node on the PPS is the average of all data points that project onto or near it.
From a theoretical standpoint, the PPS is a generalization of the Generative Topographic Mapping (GTM) [2], [3], which can in turn be seen as a parametric alternative to Self Organizing Maps (SOM) [10]. The advantages of PPS include a parametric and flexible formulation for any geometry/topology in any dimension, and guaranteed convergence (the PPS training is accomplished through the Expectation-Maximization algorithm). A PPS is governed by its latent topology and, owing to this flexibility, a variety of PPS topologies can be created, one of which is the 3D sphere. The sphere is finite and unbounded, with all nodes distributed at the edge; it is therefore ideal for emulating the sparseness and peripheral distribution of high-d data. Furthermore, the sphere topology is easily comprehended by humans and can thereby be of great help in visualizing high-d data (Section III-A). Since a PPS generates a probability density function of the input data, in the form of a mixture of Gaussians, it can be used both for clustering (Section III-B) and for classification (Section III-C). To illustrate the power and effectiveness of the model, we discuss a case study in the field of astronomy using a real and complex data set (Section IV). All results discussed here were obtained in the framework of the AstroNeural collaboration: a joint project between the Department of Mathematics and Informatics of the University of Salerno and the Department of Physical Sciences of the University Federico II in Napoli.
The main goal of the collaboration is to implement a user friendly data mining tool capable of dealing with heterogeneous, high dimensionality data sets.

II. PPS: THEORETICAL DESCRIPTION
PPS defines a non-linear, parametric mapping y(x; W) from a Q-dimensional latent space (x ∈ R^Q) to a D-dimensional data space (t ∈ R^D), where normally Q < D. The mapping y(x; W) (continuous and differentiable) maps every point in the latent space to a point in the data space. Since the latent space is Q-dimensional, these points are confined to a Q-dimensional manifold non-linearly embedded in the D-dimensional data space. PPS builds a constrained mixture of Gaussians (where the priors are all fixed to 1/M)

$$p(\mathbf{t} \mid \mathbf{W}, \boldsymbol{\Sigma}) = \frac{1}{M} \sum_{m=1}^{M} p(\mathbf{t} \mid \mathbf{x}_m, \mathbf{W}, \boldsymbol{\Sigma}_m), \qquad (1)$$

in which each component has the form

$$p(\mathbf{t} \mid \mathbf{x}_m, \mathbf{W}, \boldsymbol{\Sigma}_m) = |\boldsymbol{\Sigma}_m|^{-\frac{1}{2}} (2\pi)^{-\frac{D}{2}} \exp\left\{-\tfrac{1}{2}\,(\mathbf{y}(\mathbf{x}_m; \mathbf{W}) - \mathbf{t})\,\boldsymbol{\Sigma}_m^{-1}\,(\mathbf{y}(\mathbf{x}_m; \mathbf{W}) - \mathbf{t})^{T}\right\}, \qquad (2)$$

where t is a point in the data space and Σ_m denotes the noise covariance, defined as

$$\boldsymbol{\Sigma}_m = \frac{\alpha}{\beta} \sum_{q=1}^{Q} \mathbf{e}_q(\mathbf{x})\mathbf{e}_q^{T}(\mathbf{x}) + \frac{D - \alpha Q}{\beta (D - Q)} \sum_{d=Q+1}^{D} \mathbf{e}_d(\mathbf{x})\mathbf{e}_d^{T}(\mathbf{x}), \qquad 0 < \alpha < \frac{D}{Q}, \qquad (3)$$

where α is a clamping factor which determines the orientation of the covariance, {e_q(x)}_{q=1}^{Q} is the set of orthonormal vectors tangential to the manifold at y(x; W), and {e_d(x)}_{d=Q+1}^{D} is the set of orthonormal vectors orthogonal to the manifold at y(x; W). The complete set of orthonormal vectors {e_d(x)}_{d=1}^{D} spans R^D. The EM algorithm [8] can be used to estimate the PPS parameters W and β, while the clamping factor α is fixed by the user and is assumed constant during the EM iterations. The mapping y(x; W) takes the form of a generalized linear regression model

$$\mathbf{y}(\mathbf{x}; \mathbf{W}) = \mathbf{W}\boldsymbol{\phi}(\mathbf{x}), \qquad (4)$$

where the elements of φ(x) consist of L fixed basis functions {φ_l(x)}_{l=1}^{L}, and W is a D × L matrix.
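As a concrete illustration of Eqs. (1), (2) and (4), the following minimal sketch evaluates the PPS mixture density at a single point under the isotropic simplification α = 1 (so that Σ_m reduces to I/β); the array names and toy sizes are our own illustrative choices, not the AstroNeural implementation.

```python
import numpy as np

def pps_mixture_density(t, W, Phi, beta):
    # Centers y(x_m; W) = W phi(x_m), one per latent node (Eq. (4)): shape (M, D).
    Y = Phi @ W.T
    D = t.shape[0]
    # Isotropic components (clamping factor alpha = 1 => Sigma_m = I / beta), Eq. (2).
    sq = np.sum((Y - t) ** 2, axis=1)
    comp = (beta / (2.0 * np.pi)) ** (D / 2.0) * np.exp(-0.5 * beta * sq)
    # Equal mixture priors 1/M, Eq. (1): the density is the plain average.
    return comp.mean()

# toy check: a single node mapped to the origin of a 1-D data space
Phi = np.array([[1.0]])          # phi(x_1), L = 1 basis function
W = np.array([[0.0]])            # D x L = 1 x 1 weight matrix
p = pps_mixture_density(np.array([0.0]), W, Phi, beta=1.0)
```

With a single standard Gaussian component, the density at the center is (2π)^(-1/2) ≈ 0.399, which the toy check reproduces.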
A. Spherical PPS
If Q = 3 is chosen, a spherical manifold [6] can be constructed using a PPS with nodes {x_m}_{m=1}^{M} arranged regularly on the surface of a sphere in the R^3 latent space, with the latent basis functions evenly distributed on the sphere at a lower density.

Fig. 1. (a) The spherical manifold in R^3 latent space. (b) The spherical manifold in R^D data space. (c) Projection of data points t onto the latent spherical manifold.

III. APPLICATION OF PPS TO DATA MINING

A. Visualization
After a PPS model is fitted to the data, several visualization possibilities are available for analyzing the data points.

1) Data point projections onto the latent sphere: The data are projected into the latent space as points on a sphere (Figure 1). The latent manifold coordinates x̂_n of each data point t_n are computed as

$$\hat{\mathbf{x}}_n \equiv \langle \mathbf{x} \mid \mathbf{t}_n \rangle = \int \mathbf{x}\, p(\mathbf{x} \mid \mathbf{t}_n)\, d\mathbf{x} = \sum_{m=1}^{M} r_{mn}\, \mathbf{x}_m,$$

where the r_mn are the latent variable responsibilities, defined as

$$r_{mn} = p(\mathbf{x}_m \mid \mathbf{t}_n) = \frac{p(\mathbf{t}_n \mid \mathbf{x}_m) P(\mathbf{x}_m)}{\sum_{m'=1}^{M} p(\mathbf{t}_n \mid \mathbf{x}_{m'}) P(\mathbf{x}_{m'})} = \frac{p(\mathbf{t}_n \mid \mathbf{x}_m)}{\sum_{m'=1}^{M} p(\mathbf{t}_n \mid \mathbf{x}_{m'})}. \qquad (5)$$

Since ||x_m|| = 1 and Σ_m r_mn = 1 for n = 1, ..., N, these coordinates lie within the unit sphere, i.e. ||x̂_n|| ≤ 1.

2) Interactively selecting points on the sphere: Having projected the data onto the latent sphere, a typical task performed by most data analysts is the localization of the most interesting data points, for instance those lying far away from denser areas (outliers), or those lying in the overlapping regions between clusters, and the investigation of their characteristics by linking the data points on the sphere to their position in the original data set. For instance, in the astronomical application described in Section IV, if the images corresponding to the data were available, the user might want to visualize the object corresponding to the data point selected on the sphere. The user is also allowed to select a latent variable and color all the points for which that specific latent variable is responsible (Figure 2).
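Given the component likelihoods p(t_n | x_m), Eq. (5) and the projection formula above reduce to a few lines of linear algebra. The sketch below assumes the likelihood matrix has already been evaluated; the names are illustrative, not from the authors' code.

```python
import numpy as np

def project_onto_sphere(P, nodes):
    # P[n, m] = p(t_n | x_m); nodes is (M, 3): unit vectors on the latent sphere.
    # With equal priors P(x_m) = 1/M, Eq. (5) is a row-wise normalisation.
    R = P / P.sum(axis=1, keepdims=True)   # responsibilities r_mn
    # Posterior mean <x | t_n> = sum_m r_mn x_m: a convex combination of
    # unit vectors, hence always inside the unit sphere.
    return R @ nodes

# toy check: one point equally likely under two orthogonal nodes
nodes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
P = np.array([[0.5, 0.5]])
xhat = project_onto_sphere(P, nodes)
```

Points responsible to a single node land on the sphere's surface; ambiguous points sink toward the interior, which is what makes the projection radius itself informative.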
3) Visualizing the latent variable responsibilities on the sphere: Some insight into the number of agglomerates localized on the spherical latent manifold is provided by the mean responsibility of each latent variable. Furthermore, if we build a spherical manifold composed of a set of faces, each delimited by four vertices, then we can color each face with colors varying in intensity according to the value of the responsibility associated with each vertex (and hence with each latent variable). The overall result is that the sphere will contain regions denser than others, and this information is easily visible and understandable (see Figure 3). Obviously, denser areas of the spherical manifold
might contain more than one cluster, and this calls for further investigation.

Fig. 2. Data points selection phase. The bold black circles represent the latent variables; the blue points represent the projected input data points. When a latent variable is selected, each projected point for which the variable is responsible is colored. By selecting a data point the user is provided with information about it: coordinates and index corresponding to the position in the original catalog.

Fig. 3. Probability density function on the latent sphere.

B. Clustering
Once the user has an overall idea of the number of clusters on the sphere, he can exploit this information through the use of agglomerative hierarchical clustering techniques [9] to identify the clusters. This task is accomplished by running the clustering algorithm on the Gaussian centers in the data space. Once the centers have been agglomerated, the points for which the centers falling in the same agglomerate are responsible are assigned to the same cluster. The projections of the points into the latent space are then used to visualize the clusters on the latent sphere [11] (see Fig. 4).

C. Classification
Classification can be accomplished in a twofold way: i) by constructing a reference manifold for each class defined in the classification problem, and then assigning any test point to the class of its nearest manifold (PPSRM); ii) by assigning a test point to the class with the maximum posterior class probability for the given new input (PPSPR).

Fig. 4. Clusters computed in data space by hierarchical clustering.

In [11] it was shown that this second form of classification leads to better performance.
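The PPSPR scheme is, in effect, Bayes' rule applied to per-class PPS densities. A minimal sketch follows, assuming one fitted density per class is available as a callable and class priors are estimated from training frequencies; the Gaussian stand-ins below are hypothetical placeholders, not PPS models.

```python
import numpy as np

def ppspr_classify(t, class_densities, priors):
    # Posterior P(c | t) is proportional to p(t | c) P(c); return the argmax class.
    post = np.array([p(t) for p in class_densities]) * np.asarray(priors)
    return int(np.argmax(post))

# toy stand-ins: two 1-D unit-variance Gaussian densities acting as class models
def gaussian(mu):
    return lambda t: np.exp(-0.5 * (t - mu) ** 2) / np.sqrt(2.0 * np.pi)

label = ppspr_classify(0.1, [gaussian(0.0), gaussian(5.0)], priors=[0.5, 0.5])
```

The same routine works unchanged whether the per-class densities come from a single PPS or from the committee schemes introduced next, since both expose a density value per test point.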
However, since PPS builds a probability density function as a mixture of Gaussian distributions trained through the EM algorithm, its performance may degrade with increasing data dimensionality, due to singularities and local maxima in the log-likelihood function. We therefore propose two schemes for designing a committee of spherical PPS so as to obtain improved probability density functions and hence classification rates. The area of ensembles of learning machines is now a well defined field and has been successfully applied to neural networks, especially in the case of supervised learning algorithms. Fewer applications can be found for unsupervised learning methodologies and for density estimation: among these, the works introduced in [13] and [14] both exploit techniques consolidated in supervised contexts, such as stacking [15] and bagging [5], and represent the basis of our implementations.

1) Stacked PPS: StPPS: The combining scheme described here may be seen as an instantiation of the method proposed in [14]. Let us suppose we are given S probabilistic principal surface models (i.e., S density estimators) {PPS_s(t)}_{s=1}^{S}, where PPS_s(t) is the s-th PPS model. Note that in the original formulation given in [14], the S density estimators could also be of different kinds, for example finite mixtures with a fixed number of component densities, or kernel density estimators with a fixed kernel and a single fixed global bandwidth in each dimension. Each of the S PPS models can be chosen to be sufficiently diverse, e.g. by considering different numbers of latent variables and latent bases. To stack the S PPS models, we follow the procedure described below: i) Let D be the training data set, with size |D| = N. Partition D v times, as in v-fold cross-validation. The v-th fold contains exactly (v-1)N/v training data points and N/v test data points, both drawn from the training set D. For each fold: a) fit each of the S PPS models to the training subset of D.
b) evaluate the likelihood of each data point in the test partition of D, for each of the S fitted models. ii) At the end of these preliminary steps, we obtain S density estimates for each of the N data points, which
are organized in a matrix A of size N × S, where each entry a_is is PPS_s(t_i). iii) Use the matrix A to estimate the combination coefficients {π_s}_{s=1}^{S} that maximize the log-likelihood at the points t_i of a stacked density model of the form

$$\mathrm{StPPS}(\mathbf{t}) = \sum_{s=1}^{S} \pi_s\, PPS_s(\mathbf{t}),$$

which corresponds to maximizing

$$\sum_{i=1}^{N} \ln\left(\sum_{s=1}^{S} \pi_s\, PPS_s(\mathbf{t}_i)\right)$$

as a function of the weight vector (π_1, ..., π_S). Direct maximization of this function is a non-linear optimization problem. We can, however, apply the EM algorithm directly, by observing that the stacked mixture is a finite mixture density with weights (π_1, ..., π_S). Thus, we can use the standard EM algorithm for mixtures, except that the parameters of the component densities PPS_s(t) are fixed and the only parameters allowed to vary are the mixture weights. iv) The concluding phase consists in re-estimating the parameters of each of the S component PPS models using all of the training data D. The stacked density model is then the linear combination of the component PPS models so obtained, with combining coefficients {π_s}_{s=1}^{S}.

2) Bagged PPS: BgPPS: This combining scheme employs bagging as a means of averaging a single PPS, in a way similar to the model proposed in [13]. All we have to do is train a number S of PPS with S bootstrap replicates of the original learning data set. At the end of this training process, we obtain S different density estimates, which are then averaged to form the overall density estimate. Formally, let D be the original training set of size N and {PPS_s}_{s=1}^{S} a set of PPS models: i) create S bootstrap replicates (sampled with replacement) of D, {D_Boot(s)}_{s=1}^{S}, each of size N; ii) train each of the S PPS models with a bootstrap replicate D_Boot(s); iii) at the end of the training we obtain S density estimates {PPS_s}_{s=1}^{S}; iv) average the S density estimates as

$$\mathrm{BgPPS}(\mathbf{t}) = \frac{1}{S} \sum_{s=1}^{S} PPS_s(\mathbf{t}).$$
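Step iii) of the stacking procedure is the standard mixture EM with frozen components, and the bagged estimate is a plain average. The sketch below assumes the held-out likelihood matrix A has already been computed; the iteration count and toy numbers are arbitrary choices of ours.

```python
import numpy as np

def stack_weights(A, iters=200):
    # A[i, s] = PPS_s(t_i): held-out likelihood of point i under model s.
    N, S = A.shape
    pi = np.full(S, 1.0 / S)                    # start from uniform weights
    for _ in range(iters):
        resp = pi * A                           # E-step: unnormalised posteriors
        resp /= resp.sum(axis=1, keepdims=True)
        pi = resp.mean(axis=0)                  # M-step: only the weights move
    return pi

def bagged_density(likelihoods):
    # BgPPS(t) = (1/S) * sum_s PPS_s(t), given the S per-model densities at t.
    return float(np.mean(likelihoods))

# toy check: model 0 explains the held-out points consistently better,
# so EM should shift nearly all the stacking weight onto it
A = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
pi = stack_weights(A)
```

Because the component densities are fixed, each EM iteration is just a responsibility-weighted average over points, so convergence is cheap even for large A.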
IV. CASE STUDY
The GOODS (Great Observatories Origins Deep Survey) catalog is composed of 2845 objects (both galaxies and stars). The survey was conducted in 7 optical bands, namely the U, B, V, R, I, J, K bands, and for the experiments described here we considered 3 different parameters (i.e., Kron radius, flux and magnitude) for each band, for a total of 21 parameters. The experiment's catalog therefore contains about 27 galaxies and 4 stars.

Fig. 7. GOODS Catalog: PPSRM, PPSPR, StPPS and BgPPS best model statistics (mean classification error and standard deviation).

From a computational point of view, the main peculiarity of this data set is that the majority of the objects are drop outs, i.e. they are not detected in at least one of the bands (not detected in one band only, in two bands, in three bands and so on). The data set therefore contains four classes of objects, namely stars (S), galaxies (G), stars which are drop outs (SD) and galaxies which are drop outs (GD) (we do not care about the number of bands for which an object is a drop out).

A. GOODS Catalog Visualizations
As can be seen from Figure 5(a), the PCA visualization of the GOODS catalogue provides no interesting information at all and displays only a single condensed group of data. In the PCA projection, the class of galaxies which are drop outs (whose objects are yellow colored), which contains the majority of objects (about 24), is almost totally hidden. The PPS projections (Figure 5(b)), instead, show a large group consisting of the drop out galaxies together with overlapping objects from the remaining classes, and a well bounded group of galaxies. Figures 6(a) and 6(b) also depict the latent variable probability densities for galaxy and star objects, respectively. Note, especially, how different these densities appear for each group of objects.
B. GOODS Catalog Classification
The GOODS catalog classification task is very complex. As was to be expected on the grounds of astronomical expertise, the four classes are heavily overlapping, and even in the best cases there are classes (i.e., S and SD) whose objects are classified with an error rate of about 6%. This is evident from the results obtained by the different PPS classifiers we compared, namely PPSRM, PPSPR, StPPS and BgPPS. In any case, ensembles of PPS perform better than a single PPS, as can be seen in Figure 7. BgPPS, in particular, obtains the best performance, with a best case classification error of 2.5%, as shown in Table I. The BgPPS parameter settings are shown in Table II.
Fig. 5. (a) GOODS 3D PCA projections, (b) PPS projections on the sphere.

Fig. 6. (a) Galaxy density on the sphere, (b) star density on the sphere.

TABLE I
GOODS CATALOG: CONFUSION MATRIX COMPUTED BY THE BgPPS(2.5) BEST MODEL (CLASSES S, G, SD, GD)

TABLE II
GOODS CATALOG: BgPPS PARAMETER SETTINGS

Parameter   Value   Description
M           266     number of latent variables
L           83      number of basis functions
L_fac               basis functions width
iter                maximum number of iterations
ε                   early stopping threshold

V. CONCLUSIONS AND FUTURE WORK
We have described how spherical PPS works as a framework addressing data mining activities such as visualization, clustering and classification, and we have seen its power and effectiveness when dealing with high-d data such as astronomical data. Above all, the spherical PPS, which consists of a spherical latent manifold lying in a three dimensional latent space, is well suited to high-d data, since the sphere is able to capture the sparsity and periphery of data in large input spaces, which are due to the curse of dimensionality. Currently we are pursuing two directions to further enhance our system: i) developing a clustering algorithm able to directly exploit the PPS Gaussian mixture density to compute the clusters. The algorithm is based on the Kullback-Leibler distance to decide whether two Gaussian components of the PPS mixture model must be aggregated. In this way the clustering is able to follow the input data density and to
compute by itself the number of clusters; ii) building a hierarchical PPS for constructing localized nonlinear projection manifolds, as already done for GTM [12] and previously for a linear latent variable model [4]. Following [12], a hierarchy of PPS could be organized in a tree whose root corresponds to the PPS model trained on the entire data set at hand, and whose nodes, built interactively in a top-down fashion, represent PPS models trained on localized regions of the input data chosen interactively by the user in the plot of the ancestor PPS. In all the sub-models one might exploit all the visualization and clustering options discussed in this paper.

ACKNOWLEDGMENT
The authors would like to thank all past and present members of the AstroNeural collaboration. AstroNeural is sponsored by the MIUR (Italian Ministry for University and Research) and by Regione Campania. The authors also wish to thank P. Benvenuti for many discussions and for supporting this work since its beginning.

REFERENCES
[1] J. Abello, P.M. Pardalos, M.G.C. Resende (Eds.), Handbook of Massive Data Sets, Kluwer Academic Publishers, 2002.
[2] C.M. Bishop, M. Svensén, C.K.I. Williams, GTM: The Generative Topographic Mapping, Neural Computation, 10(1), 1998.
[3] C.M. Bishop, M. Svensén, C.K.I. Williams, Developments of the Generative Topographic Mapping, Neurocomputing, 21, 1998.
[4] C.M. Bishop, M.E. Tipping, A hierarchical latent variable model for data visualization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 281-293, 1998.
[5] L. Breiman, Bagging Predictors, Machine Learning, 24, 1996.
[6] K. Chang, Nonlinear Dimensionality Reduction Using Probabilistic Principal Surfaces, PhD Thesis, Department of Electrical and Computer Engineering, The University of Texas at Austin, USA, 2000.
[7] K. Chang, J. Ghosh, A Unified Model for Probabilistic Principal Surfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(1), 2001.
[8] A.P. Dempster, N.M. Laird, D.B.
Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society B, 39(1), 1977.
[9] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley and Sons, 2001.
[10] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, 1995.
[11] A. Staiano, Unsupervised Neural Networks for the Extraction of Scientific Information from Astronomical Data, PhD Thesis, University of Salerno, Italy, 2003.
[12] P. Tino, I. Nabney, Hierarchical GTM: constructing localized non-linear projection manifolds in a principled way, IEEE Transactions on Pattern Analysis and Machine Intelligence, in print.
[13] D. Ormoneit, V. Tresp, Averaging, Maximum Likelihood and Bayesian Estimation for Improving Gaussian Mixture Probability Density Estimates, IEEE Transactions on Neural Networks, 9(4), 1998.
[14] P. Smyth, D.H. Wolpert, An evaluation of linearly combining density estimators via stacking, Machine Learning, 36, 1999.
[15] D.H. Wolpert, Stacked Generalization, Neural Networks, 5, 241-259, 1992.
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More informationSpecific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
More informationMixtures of Robust Probabilistic Principal Component Analyzers
Mixtures of Robust Probabilistic Principal Component Analyzers Cédric Archambeau, Nicolas Delannay 2 and Michel Verleysen 2 - University College London, Dept. of Computer Science Gower Street, London WCE
More information270107 - MD - Data Mining
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationNeural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationPrinciples of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationData Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationAdaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement
Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationLearning Vector Quantization: generalization ability and dynamics of competing prototypes
Learning Vector Quantization: generalization ability and dynamics of competing prototypes Aree Witoelar 1, Michael Biehl 1, and Barbara Hammer 2 1 University of Groningen, Mathematics and Computing Science
More informationAutomated Stellar Classification for Large Surveys with EKF and RBF Neural Networks
Chin. J. Astron. Astrophys. Vol. 5 (2005), No. 2, 203 210 (http:/www.chjaa.org) Chinese Journal of Astronomy and Astrophysics Automated Stellar Classification for Large Surveys with EKF and RBF Neural
More informationCustomer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationExample application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
More informationFiltered Gaussian Processes for Learning with Large Data-Sets
Filtered Gaussian Processes for Learning with Large Data-Sets Jian Qing Shi, Roderick Murray-Smith 2,3, D. Mike Titterington 4,and Barak A. Pearlmutter 3 School of Mathematics and Statistics, University
More informationVisualization of textual data: unfolding the Kohonen maps.
Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr) Ludovic Lebart Abstract. The Kohonen self organizing
More informationL25: Ensemble learning
L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna
More information. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationDAME Astrophysical DAta Mining Mining & & Exploration Exploration GRID
DAME Astrophysical DAta Mining & Exploration on GRID M. Brescia S. G. Djorgovski G. Longo & DAME Working Group Istituto Nazionale di Astrofisica Astronomical Observatory of Capodimonte, Napoli Department
More informationA comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationProbabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationUSING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).
More informationSelf-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data
Self-Organizing g Maps (SOM) Ke Chen Outline Introduction ti Biological Motivation Kohonen SOM Learning Algorithm Visualization Method Examples Relevant Issues Conclusions 2 Introduction Self-organizing
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationSupervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationMachine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationData mining and statistical models in marketing campaigns of BT Retail
Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationA Computational Framework for Exploratory Data Analysis
A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.
More informationUW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision
UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision D.B. Grimes A.P. Shon R.P.N. Rao Dept. of Computer Science and Engineering University of Washington Seattle, WA
More informationTree based ensemble models regularization by convex optimization
Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationGalaxy Morphological Classification
Galaxy Morphological Classification Jordan Duprey and James Kolano Abstract To solve the issue of galaxy morphological classification according to a classification scheme modelled off of the Hubble Sequence,
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationModel-Based Cluster Analysis for Web Users Sessions
Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece gpallis@ccf.auth.gr
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationMethodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationClassification Techniques for Remote Sensing
Classification Techniques for Remote Sensing Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara saksoy@cs.bilkent.edu.tr http://www.cs.bilkent.edu.tr/ saksoy/courses/cs551
More informationVisualizing pay-per-view television customers churn using cartograms and flow maps
Visualizing pay-per-view television customers churn using cartograms and flow maps David L. García 1 and Àngela Nebot1 and Alfredo Vellido 1 1- Dept. de Llenguatges i Sistemes Informàtics Universitat Politècnica
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More information