Component Ordering in Independent Component Analysis Based on Data Power


 Coleen McCormick
 1 years ago
 Views:
Transcription
1 Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals and Systems Group Hogekamp Building, 7522 NB Enschede Hogekamp Building, 7522 NB Enschede The Netherlands The Netherlands Luuk Spreeuwers University of Twente Fac. EEMCS, Signals and Systems Group Hogekamp Building, 7522 NB Enschede The Netherlands Abstract With Independent Component Analysis (ICA) the objective is to separate multidimensional data into independent components. A well known problem in ICA is that since both the independent components and the separation matrix have to be estimated, neither the ordering nor the amplitudes of the components can be determined. One suggested method for solving these ambiguities in ICA is to measure the data power of a component, which indicates the amount of input data variance explained by an independent component. This method resembles the eigenvalue ordering of principle components. We will demonstrate theoretically and with experiments that strong sources can be estimated with higher accuracy than weak components. Based on the selection by data power, a method is developed for estimating independent components in high dimensional spaces. A test with synthetic data shows that the new algorithm can provide higher accuracy than the usual PCA dimension reduction. 1 Introduction Independent component analysis (ICA) is a method to estimate the independent components or sources from which the data is generated. ICA assumes the data is generated by making linear combinations of a number of independent sources, denoted by s. This can be described by: x = A s (1) where x denotes the data or the vector of mixtures and A denotes a mixing matrix. The objective of an ICA algorithm is to reverse the linear mixture by estimating a separation matrix, W, which separates estimates of the independent sources, denoted y, from the mixture. y = W x (2)
2 Comon [5] shows that the ICA estimate has some ambiguities: any multiplication of the separation matrix with a permutation matrix and a scaling matrix results in another valid ICA estimation. Therefore the source estimates are usually supposed to have unit variance. The estimation of the separation matrix can be split in two stages. In the first stage the data is whitened. This is often done by Principle Component Analysis (PCA). Since the covariance matrix of the whitened input data and the source estimates are identity matrices, the remaining mixing matrix after whitening has to be an orthogonal matrix [6]. Because of the ambiguities of ICA only a rotation matrix has to be estimated. The estimation of this rotation matrix forms the second stage. Several algorithms have been proposed to estimate this rotation matrix. The general approach is to use a contrast function which achieves an extremum when the whitened data is rotated such that the marginals are independent [4]. In order to solve the ambiguity of the order of the estimated sources, it has been suggested to consider the contrast function, used in ICA estimation [8], for example kurtosis. The argument is that sources which have low contrast, are Gaussian and are thus hard to separated. However, depending on the contrast function used, source distributions get different contrast levels assigned. For example, there are distributions which have a zero kurtosis while being far from Gaussian. Comon [5] suggested to remove the ambiguity of ICA by ordering the eigenvectors matrix columns, the eigenvalue diagonal and the ICA rotation matrix rows such that the eigenvalues in the eigenvalue matrix are in descending order. The only reason for doing so is to make the components appear in the same order every estimation. In Bayesian ICA, it has also been suggested to use the values of the elements of the estimated mixing matrix to determine whether an estimated source is a real source [2]. In the remainder of the article we will focus on solving the source ordering problem by data power. We will demonstrate that strong sources can be estimated with higher accuracy. Based on this ordering an algorithm will be developed which can provide a solution to overtraining behaviour in high dimensional ICA problems. 1.1 Data power definition In ICA estimation without Bayesian inference, one possibility is to consider the data power of the components. The variance of the input data can be described by individual contributions of the independent components: I E [ ] I x 2 i = E [ { (a i s) 2] J = E [ } ] I s 2 j a 2 i,j i=1 i=1 j=1 i=1 where x is the input data, s represents the independent sources and A is the mixing matrix. In equation 3 it can be seen that each component makes an independent contribution to the total variation of the input data. Sources which provide large contributions are considered strong sources and those which provide small contributions are considered weak sources. There are a few reasons for selecting the components with the highest data power. The procedure is related to the selection based on eigenvalues in PCA, and is actually the same in certain mixtures. It therefore shares some of the arguments for using eigenvalue selection. For example, data power selection provides the best description of the data in independent components in the least squared sense. (3)
3 2 Analysis on relation estimation accuracy and data power Another reason to select only the strong sources is that they are more robust to errors in the estimation. The weak sources are very sensitive to errors in the eigenvectors estimation. Recall that a common approach to ICA estimation is to first whiten the data after which an ICA rotation matrix is estimated. The separation matrix can thus be decomposed into: W = A T D 1 2 E T (4) where E and D are the eigenvector matrix and the eigenvalue matrix respectively, which can be found by performing PCA on the input data. A denotes the ICA rotation matrix. Data power differences between independent components can only occur if the ICA rotation projects different sources on different diagonal elements of the scaling matrix. This causes the sources to be scaled with different factors. Nadal et al.[9] considered the situation in which the mixing matrix was nearly singular and also noise was present. They use the mutual information between the inputs and the outputs of a neural network to demonstrate that weak sources disturb the estimation of strong sources. We will focus more on the influence of the whitening stage in this error. We will also assume the errors are introduced by limited sample behaviour, instead of noise. While in [9] only large differences in strength are considered, it still makes sense to select sources with large data power when the differences are small as will be shown next. When the number of data samples is sufficiently large, the matrices in equation 4 can be estimated accurately, but when only a limited number of samples is available, errors are introduced. Small errors in E have a different effect on strong sources than on weak sources. Consider the 2D mixing problem where a strong and a weak source are mixed with a scaling matrix: [ ] 1 x = 1 s (5) α where α > 1. An error in the eigenvector matrix can be represented as an additional rotation of the input data by rotation matrix R before the separation is performed. Therefore the whitened data, denoted by z, is no longer white: [ ] [ ] [ ] 1 cos ϕ sin ϕ 1 z = 1 s (6) α sin ϕ cos ϕ α [ cos ϕ s = 1 + sin ϕ s ] α 2 (7) cos ϕ s 2 α sin ϕ s 1 Note that crosstalk of source 2 on the estimate of source 1 differs a factor α 2 from the crosstalk of source 1 on the estimate of source 2. The difference between the power caused by the crosstalk is a factor α 4. After whitening an ICA rotation matrix still has to be estimated. Since the data is no longer white, it depends on the specific ICA algorithm chosen what happens with the ICA rotation matrix. Cardoso [3] reported the effects of non white data on some objective functions. He reported that some objective functions acquire an additional correlation term. These objective functions thus attempt to find a rotation which partly minimizes the correlation between the two estimates even though the data is not white. Besides the limited sample behaviour, another possibility for an incorrect estimate of the eigenvalue rotation matrix is when the estimation is done on a subset of the data.
4 This is for instance the case when ICA is used to determine features for recognition purposes. ICA is then performed only on training data, which may not accurately represent the entire data set. The ICA rotation can not react on the error in the eigenvector estimate in such situations. The estimation of sources can also be corrupted by the presence of noise [9]. LearnedMiller [7] already indicated that Gaussian noise filters the probability density functions of the independent components, so they become more Gaussian. Weak sources are more affected than strong sources when equal powered noise is added to every dimension of the observation data, so the estimation of weak sources gets more difficult in the presence of noise and leads to more errors. 3 Experiments with data power In this section several experiments with synthetic data are described to verify the theory of section 2. To verify the crosstalk between strong and weak sources, an experiment has been performed in which two sources are mixed with a diagonal mixing matrix with a diagonal [1,.1]. This mixing matrix causes source 1 and 2 to be a strong and a weak source respectively. Three source configurations are used. In experiment 1 both sources are sub Gaussian. In experiment 2 source 1 is super Gaussian and source 2 is sub Gaussian. In experiment 3 both sources are super Gaussian. Sources are sub (super) Gaussian when the kurtosis of the source is negative (positive) [6]. For all tests all sources consist of 1 samples. The separation of sources is performed by only performing the whitening step, which should be sufficient to separate the sources. According to equation 7 the power of the crosstalk in the two estimates should differ by a factor 1 8. The experiments are repeated a few hundred times. Each time the crosstalk of the sources in the estimates is measured. Histograms of these measurements are plotted in figure 1(a). The horizontal axis in each plot displays the measured crosstalk power. The vertical axis indicates the number of experiments which had an amount of crosstalk as indicated on the horizontal axis. The two plots in each column belong to the same source configuration. The three plots in each row belong to the same source crosstalk in estimate measurement (for example row 1 indicates the crosstalk of source 2 on the estimate of source 1 for the different source configurations). The shapes are the same for every source configuration, but the crosstalk differs indeed a factor 1 8 in the two estimates. After whitening the strong components have less distortion than the weak components. Next ICA estimates the ICA rotation matrix. In the next experiment the same setup is used as in the previous experiment, but now the ICA rotation is also estimated. This is done using the fastica algorithm with a tanh nonlinearity function. In figure 1(b) the histograms of the crosstalk power are given. The ICA rotation matrix in the generating process is an identity matrix, so an accurate estimate of the rotation matrix would result in similar results as in experiment 1. However, the crosstalk between both sources is equal in strength. Apparently ICA reduces the accuracy of the strong source estimate in order to increase the accuracy of the weak source estimate. Nadal et al.[9] also found that the presence of small sources disturbed the estimation of the strong sources when using the infomax criterion. However they assume a clear distinction between weak and strong sources. Since the small components are largely present in the small eigenvalues, this result suggests to remove the smallest principle components before performing ICA. The second possibility of an error in the estimation of the eigenvectors suggested was in a trainingtest situation. In figure 1(c) the results are shown of the same test as in the previous experiment, but now an addition rotation of.3 degrees is introduced to the mixing matrix after separation estimation. Clearly a bias is introduced in the estimate of the components. The strong component causes on average 3 percent of the power of the weak source estimate, while the weak source is hardly present in the
5 strong source estimate, although the influence difference is not a factor 1 8 as in the first situation. Source 1 in estimate 2 Experiment Experiment Experiment Source 1 in estimate 2 Experiment Experiment Experiment Source 2 in estimate x x x 1 9 Source 2 in estimate (a) Crosstalk of sources in estimates after whitening. (b) Crosstalk of the sources in estimates after estimation of the ICA rotation matrix. Source 1 in estimate 2 Experiment Experiment Experiment Source 2 in estimate (c) Crosstalk of the sources after a rotation error is introduced to the model matrix of.3 degrees. Figure 1: Histograms of the crosstalk power between two source estimates. Source 1 and 2 have mixing factors 1 and.1. Three source configurations are used: both sources sub Gaussian, source 1 super Gaussian and source 2 sub Gaussian and both sources super Gaussian.
6 4 Application in blocked ICA (blica) ICA is known for its overtraining behaviour in high dimensional data with limited sample size. Särelä and Vigário [1] provided some analysis on the subject, especially when kurtosis is used as contrast function. A solution to the problem is to reduce the dimension of the data before ICA is performed. Several authors suggested the use of PCA for this reason [1], [6] chapter 13. Source selection is performed after ICA, so it cannot prevent overtraining in the proposed implementation of section 2. When the number of sources present in a group of dimensions of the data is limited, another possibility exists. Instead of considering all data dimensions at once, ICA can be performed on separate dimension clusters of the data. Consider the problem of performing ICA on image data, in which each image is an observation. Every pixel is considered one dimension, so the number of dimensions is high, while the number of images is in general low. Clusters of dimensions can be formed by cutting the images into blocks of equal size. On each block ICA can be performed. Using the data power criterion, the strongest sources of each block are selected and the rest is discarded. The remaining sources of neighbouring blocks are placed into new blocks on which the same operations are performed. This process is repeated until only one block remains. Attias [1] described such an approach for Bayesian ICA. The separation process defines one separation matrix. However, an estimation of the mixing matrix is not defined. This also leaves the data power undefined. The mixing matrix elements can be estimated by the cross correlation between the estimated sources and the data components. This definition allows the use of overlapping blocks, for example block two partly contains dimensions already used in block one. This may prevent border effects: when a strong source is present on the border of block one and two it might be rejected in both blocks since its power is only half in both blocks, while it might be retained if it is in the center of one of the blocks. 5 Experiments with blica To demonstrate the usefulness of the blica algorithm, the following experiment has been performed. Synthetic images are created of size 16x16 pixels from 256 white super Gaussian sources, each consisting of 1, samples. A mixing matrix is constructed with its elements Gaussian distributed. Next masks are applied to the columns of the mixing matrix. The masks have a 2D gaussian shape with uniform random means. 3 strong masks are generated which have both a larger amplitude and a larger spread value. To get an overlap of strong sources in principle components, each strong mask gets another strong mask added. An example of the resulting strong masks is given in figure 2(a). The resulting mixing matrix is used to mix the 256 sources into the synthetic images. From the mixture data 16 sources are estimated using both the PCA ICA method and the blica method. Of the estimated components the amount of crosstalk power is measured. The result is given in figure 2(b). blica estimated all the components PCA also estimated with higher accuracy, and all sources have about the same amount of crosstalk. The PCA method on the other hand has a few sources which consist mostly of crosstalk. The result gets even worse for PCA when the number of dimensions is reduced to 4. It is very difficult to identify a single source which is estimated, most estimates are best described as a linear combination of sources. The result for blica is the crosstalk of the first 4 elements in figure 2(b). Thus if the estimated number of dimensions is incorrect, blica performs better in such mixture configurations.
7 blica pca Crosstalk Sources (a) Example of the mask for strong sources. A strong mask is composed of two Gaussian shapes. Each Gaussian is shared with another source mask. (b) Comparison between the crosstalk in estimated components of both the ICA with PCA dimension reduction and the blica dimension reduction. Estimates of the same components are placed on top of each other Figure 2: blica versus ICA with PCA results. 6 Conclusion We proposed to solve the ambiguity of the ordering of ICA components based on the data power of a component. Although the method has been suggested before, we provided some new arguments in favour of the method: weaker sources are more sensitive to variations in the estimate of the eigenvectors in the whitening stage. In regular ICA, this is partly compensated by ICA rotation, which introduces errors on the estimate of strong sources, but in trainingtest situations weak sources get considerable more crosstalk from strong sources than visa versa. In Nadal et al. [9] also the situation with strong and weak sources is considered, but only with large differences in strength and presence of noise. The trainingtest situation is not considered at all. In Nadal et al. [9] it was suggested to remove the small principle components, since they would contain the small sources. However, as they noted, when two strong sources could only be separated using a smaller principle component, the dimension reduction could lead to separation problems. Making a selection of the strongest sources after ICA prevents this problem. A part of the problem of ICA estimation is that after the whitening, ICA rotation transfers part of the error in the weak estimate to the strong estimate. It might be a good idea to modify the symmetric update, like in FastICA [6], with a strength term, so the strong estimates are less effected by the errors in the weak estimates. Using the component selection criterion, the blica algorithm has been developed, which reduces the overtraining behaviour of ICA in the case that the sources are only locally present. In such a situation the experiment showed that blica can estimate the sources with higher accuracy than the general PCA dimension reduction method.
8 References [1] H. Attias. Learning in high dimensions: modular mixture models. In Proceedings of the 8th International Conference on Artificial Intelligence and Statistics, pages , 21. [2] C. M. Bishop and N. D. Lawrence. Variational bayesian independent component analysis. Technical report, Computer Laboratory, University of Cambridge, 2. [3] J. F. Cardoso. Dependence, correlation and gaussianity in independent component analysis. The Journal of Machine Learning Research, 4: , 23. [4] J.F. Cardoso. Blind signal separation: Statistical principles. Proceedings of the IEEE, 86(1):29 225, [5] Pierre Comon. Independent component analysis, a new concept? Signal Process., 36(3): , [6] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley and Sons, Inc, 21. [7] E. G. LearnedMiller and J. W. Fisher III. Ica using spacings estimates of entropy. The Journal of Machine Learning Research, 4(78): , 24. [8] Wei Lu and Jagath C. Rajapakse. Eliminating indeterminacy in ica. Neurocomputing, 5:271 29, 23. [9] J.P. Nadal, E. Korutcheva, and F. Aires. Blind source separation in the presence of weak sources. Neural Networks, 13(6): , 2. [1] J. Särelä and R. Vigário. Overlearning in marginal distributionbased ica: analysis and solutions. The Journal of Machine Learning Research, 4(78): , 24.
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationAn Introduction to Independent Component Analysis: InfoMax and FastICA algorithms
Tutorials in Quantitative Methods for Psychology 2010, Vol. 6(1), p. 3138. An Introduction to Independent Component Analysis: InfoMax and FastICA algorithms Dominic Langlois, Sylvain Chartier, and Dominique
More informationIntroduction. Independent Component Analysis
Independent Component Analysis 1 Introduction Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multidimensional) statistical data. What distinguishes
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 03 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationPCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example
PCA to Eigenfaces CS 510 Lecture #16 March 23 th 2015 A 9 dimensional PCA example is dark around the edges and bright in the middle. is light with dark vertical bars. is light with dark horizontal bars.
More informationFactor Rotations in Factor Analyses.
Factor Rotations in Factor Analyses. Hervé Abdi 1 The University of Texas at Dallas Introduction The different methods of factor analysis first extract a set a factors from a data set. These factors are
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationNonlinear Blind Source Separation and Independent Component Analysis
Nonlinear Blind Source Separation and Independent Component Analysis Prof. Juha Karhunen Helsinki University of Technology Neural Networks Research Centre Espoo, Finland Helsinki University of Technology,
More informationEM Clustering Approach for MultiDimensional Analysis of Big Data Set
EM Clustering Approach for MultiDimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationSoft Clustering with Projections: PCA, ICA, and Laplacian
1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationFace Recognition using Principle Component Analysis
Face Recognition using Principle Component Analysis Kyungnam Kim Department of Computer Science University of Maryland, College Park MD 20742, USA Summary This is the summary of the basic idea about PCA
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., RichardPlouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More informationT61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
More informationEigenvalues, Eigenvectors, Matrix Factoring, and Principal Components
Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationBLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be
More informationAccurate and robust image superresolution by neural processing of local image representations
Accurate and robust image superresolution by neural processing of local image representations Carlos Miravet 1,2 and Francisco B. Rodríguez 1 1 Grupo de Neurocomputación Biológica (GNB), Escuela Politécnica
More informationUnderstanding and Applying Kalman Filtering
Understanding and Applying Kalman Filtering Lindsay Kleeman Department of Electrical and Computer Systems Engineering Monash University, Clayton 1 Introduction Objectives: 1. Provide a basic understanding
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationManifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
More informationADVANCES IN INDEPENDENT COMPONENT ANALYSIS WITH APPLICATIONS TO DATA MINING
Helsinki University of Technology Dissertations in Computer and Information Science Espoo 2003 Report D4 ADVANCES IN INDEPENDENT COMPONENT ANALYSIS WITH APPLICATIONS TO DATA MINING Ella Bingham Dissertation
More informationGaussian Process Latent Variable Models for Visualisation of High Dimensional Data
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,
More informationSPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
More informationNonnegative Matrix Factorization (NMF) in Semisupervised Learning Reducing Dimension and Maintaining Meaning
Nonnegative Matrix Factorization (NMF) in Semisupervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationIntroduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
More informationDirect Methods for Solving Linear Systems. Matrix Factorization
Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011
More informationLinear Algebra Review. Vectors
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length
More informationChapter 6. Orthogonality
6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be
More informationIndependent Component Analysis: Algorithms and Applications
Independent Component Analysis: Algorithms and Applications Aapo Hyvärinen and Erkki Oja Neural Networks Research Centre Helsinki University of Technology P.O. Box 5400, FIN02015 HUT, Finland Neural Networks,
More informationFace Recognition using SIFT Features
Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More information0.1 Phase Estimation Technique
Phase Estimation In this lecture we will describe Kitaev s phase estimation algorithm, and use it to obtain an alternate derivation of a quantum factoring algorithm We will also use this technique to design
More informationOverview of Factor Analysis
Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone: (205) 3484431 Fax: (205) 3488648 August 1,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More information[1] Diagonal factorization
8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:
More informationFactor Analysis. Chapter 420. Introduction
Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.
More information1 2 3 1 1 2 x = + x 2 + x 4 1 0 1
(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationRandom Projectionbased Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining
Random Projectionbased Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining Kun Liu Hillol Kargupta and Jessica Ryan Abstract This paper explores the possibility of using multiplicative
More informationPrincipal Components Analysis (PCA)
Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association
More informationACEAT493 Data Visualization by PCA, LDA, and ICA
ACEAT493 Data Visualization by PCA, LDA, and ICA TsunYu Yang a, ChaurChin Chen a,b a Institute of Information Systems and Applications, National Tsing Hua U., Taiwan b Department of Computer Science,
More informationCommon factor analysis
Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor
More informationMath 215 HW #6 Solutions
Math 5 HW #6 Solutions Problem 34 Show that x y is orthogonal to x + y if and only if x = y Proof First, suppose x y is orthogonal to x + y Then since x, y = y, x In other words, = x y, x + y = (x y) T
More informationLeastSquares Intersection of Lines
LeastSquares Intersection of Lines Johannes Traa  UIUC 2013 This writeup derives the leastsquares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
More informationAnalysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
More informationPrincipal components analysis
CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some kdimension subspace, where k
More informationPrincipal Component Analysis
Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded
More information1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
More informationLINEAR ALGEBRA. September 23, 2010
LINEAR ALGEBRA September 3, 00 Contents 0. LUdecomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationA RegimeSwitching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com
A RegimeSwitching Model for Electricity Spot Prices Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com May 31, 25 A RegimeSwitching Model for Electricity Spot Prices Abstract Electricity markets
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationClassification by Pairwise Coupling
Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating
More informationChapter 17. Orthogonal Matrices and Symmetries of Space
Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length
More informationElementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination
Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used
More informationPartial Least Squares (PLS) Regression.
Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component
More informationThe PearsonICA Package
The PearsonICA Package June 29, 2006 Type Package Title Independent component analysis using score functions from the Pearson system Version 1.22 Date 20060629 Author Juha Karvanen Maintainer Juha Karvanen
More informationUnsupervised and supervised dimension reduction: Algorithms and connections
Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona
More informationA Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks
A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks H. T. Kung Dario Vlah {htk, dario}@eecs.harvard.edu Harvard School of Engineering and Applied Sciences
More informationBackground: State Estimation
State Estimation Cyber Security of the Smart Grid Dr. Deepa Kundur Background: State Estimation University of Toronto Dr. Deepa Kundur (University of Toronto) Cyber Security of the Smart Grid 1 / 81 Dr.
More informationUsing the Singular Value Decomposition
Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the socalled high dimensional, low sample size (HDLSS) settings, LDA
More informationRachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA
PROC FACTOR: How to Interpret the Output of a RealWorld Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a realworld example of a factor
More informationFace detection is a process of localizing and extracting the face region from the
Chapter 4 FACE NORMALIZATION 4.1 INTRODUCTION Face detection is a process of localizing and extracting the face region from the background. The detected face varies in rotation, brightness, size, etc.
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationGeneralized Neutral Portfolio
Generalized Neutral Portfolio Sandeep Bhutani bhutanister@gmail.com December 10, 2009 Abstract An asset allocation framework decomposes the universe of asset returns into factors either by Fundamental
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in highdimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to highdimensional data.
More informationJoint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis
Joint Use of Factor Analysis () and Data Envelopment Analysis () for Ranking of Data Envelopment Analysis Reza Nadimi, Fariborz Jolai Abstract This article combines two techniques: data envelopment analysis
More informationConcepts in Machine Learning, Unsupervised Learning & Astronomy Applications
Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications ChingWa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More information3.2 Sources, Sinks, Saddles, and Spirals
3.2. Sources, Sinks, Saddles, and Spirals 6 3.2 Sources, Sinks, Saddles, and Spirals The pictures in this section show solutions to Ay 00 C By 0 C Cy D 0. These are linear equations with constant coefficients
More informationAppendix 4 Simulation software for neuronal network models
Appendix 4 Simulation software for neuronal network models D.1 Introduction This Appendix describes the Matlab software that has been made available with Cerebral Cortex: Principles of Operation (Rolls
More informationFactor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models
Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationL10: Probability, statistics, and estimation theory
L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian
More informationStefanos D. Georgiadis Perttu O. Rantaaho Mika P. Tarvainen Pasi A. Karjalainen. University of Kuopio Department of Applied Physics Kuopio, FINLAND
5 Finnish Signal Processing Symposium (Finsig 5) Kuopio, Finland Stefanos D. Georgiadis Perttu O. Rantaaho Mika P. Tarvainen Pasi A. Karjalainen University of Kuopio Department of Applied Physics Kuopio,
More informationMedical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu
Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
More informationCS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
More informationCheng Soon Ong & Christfried Webers. Canberra February June 2016
c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I
More informationMATH 240 Fall, Chapter 1: Linear Equations and Matrices
MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS
More informationNeural Networks Lesson 5  Cluster Analysis
Neural Networks Lesson 5  Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt.  Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationRecall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.
ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the ndimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?
More information1 Example of Time Series Analysis by SSA 1
1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationSolutions to Exam in Speech Signal Processing EN2300
Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.
More informationPalmprint Recognition with PCA and ICA
Abstract Palmprint Recognition with PCA and ICA Tee Connie, Andrew Teoh, Michael Goh, David Ngo Faculty of Information Sciences and Technology, Multimedia University, Melaka, Malaysia tee.connie@mmu.edu.my
More informationBayesian Image SuperResolution
Bayesian Image SuperResolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian
More informationMultivariate Analysis of Variance (MANOVA): I. Theory
Gregory Carey, 1998 MANOVA: I  1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the
More informationCompensation Basics  Bagwell. Compensation Basics. C. Bruce Bagwell MD, Ph.D. Verity Software House, Inc.
Compensation Basics C. Bruce Bagwell MD, Ph.D. Verity Software House, Inc. 2003 1 Intrinsic or Autofluorescence p2 ac 1,2 c 1 ac 1,1 p1 In order to describe how the general process of signal crossover
More information