# Component Ordering in Independent Component Analysis Based on Data Power

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals and Systems Group Hogekamp Building, 7522 NB Enschede Hogekamp Building, 7522 NB Enschede The Netherlands The Netherlands Luuk Spreeuwers University of Twente Fac. EEMCS, Signals and Systems Group Hogekamp Building, 7522 NB Enschede The Netherlands Abstract With Independent Component Analysis (ICA) the objective is to separate multidimensional data into independent components. A well known problem in ICA is that since both the independent components and the separation matrix have to be estimated, neither the ordering nor the amplitudes of the components can be determined. One suggested method for solving these ambiguities in ICA is to measure the data power of a component, which indicates the amount of input data variance explained by an independent component. This method resembles the eigenvalue ordering of principle components. We will demonstrate theoretically and with experiments that strong sources can be estimated with higher accuracy than weak components. Based on the selection by data power, a method is developed for estimating independent components in high dimensional spaces. A test with synthetic data shows that the new algorithm can provide higher accuracy than the usual PCA dimension reduction. 1 Introduction Independent component analysis (ICA) is a method to estimate the independent components or sources from which the data is generated. ICA assumes the data is generated by making linear combinations of a number of independent sources, denoted by s. This can be described by: x = A s (1) where x denotes the data or the vector of mixtures and A denotes a mixing matrix. The objective of an ICA algorithm is to reverse the linear mixture by estimating a separation matrix, W, which separates estimates of the independent sources, denoted y, from the mixture. y = W x (2)

2 Comon [5] shows that the ICA estimate has some ambiguities: any multiplication of the separation matrix with a permutation matrix and a scaling matrix results in another valid ICA estimation. Therefore the source estimates are usually supposed to have unit variance. The estimation of the separation matrix can be split in two stages. In the first stage the data is whitened. This is often done by Principle Component Analysis (PCA). Since the covariance matrix of the whitened input data and the source estimates are identity matrices, the remaining mixing matrix after whitening has to be an orthogonal matrix [6]. Because of the ambiguities of ICA only a rotation matrix has to be estimated. The estimation of this rotation matrix forms the second stage. Several algorithms have been proposed to estimate this rotation matrix. The general approach is to use a contrast function which achieves an extremum when the whitened data is rotated such that the marginals are independent [4]. In order to solve the ambiguity of the order of the estimated sources, it has been suggested to consider the contrast function, used in ICA estimation [8], for example kurtosis. The argument is that sources which have low contrast, are Gaussian and are thus hard to separated. However, depending on the contrast function used, source distributions get different contrast levels assigned. For example, there are distributions which have a zero kurtosis while being far from Gaussian. Comon [5] suggested to remove the ambiguity of ICA by ordering the eigenvectors matrix columns, the eigenvalue diagonal and the ICA rotation matrix rows such that the eigenvalues in the eigenvalue matrix are in descending order. The only reason for doing so is to make the components appear in the same order every estimation. In Bayesian ICA, it has also been suggested to use the values of the elements of the estimated mixing matrix to determine whether an estimated source is a real source [2]. In the remainder of the article we will focus on solving the source ordering problem by data power. We will demonstrate that strong sources can be estimated with higher accuracy. Based on this ordering an algorithm will be developed which can provide a solution to overtraining behaviour in high dimensional ICA problems. 1.1 Data power definition In ICA estimation without Bayesian inference, one possibility is to consider the data power of the components. The variance of the input data can be described by individual contributions of the independent components: I E [ ] I x 2 i = E [ { (a i s) 2] J = E [ } ] I s 2 j a 2 i,j i=1 i=1 j=1 i=1 where x is the input data, s represents the independent sources and A is the mixing matrix. In equation 3 it can be seen that each component makes an independent contribution to the total variation of the input data. Sources which provide large contributions are considered strong sources and those which provide small contributions are considered weak sources. There are a few reasons for selecting the components with the highest data power. The procedure is related to the selection based on eigenvalues in PCA, and is actually the same in certain mixtures. It therefore shares some of the arguments for using eigenvalue selection. For example, data power selection provides the best description of the data in independent components in the least squared sense. (3)

3 2 Analysis on relation estimation accuracy and data power Another reason to select only the strong sources is that they are more robust to errors in the estimation. The weak sources are very sensitive to errors in the eigenvectors estimation. Recall that a common approach to ICA estimation is to first whiten the data after which an ICA rotation matrix is estimated. The separation matrix can thus be decomposed into: W = A T D 1 2 E T (4) where E and D are the eigenvector matrix and the eigenvalue matrix respectively, which can be found by performing PCA on the input data. A denotes the ICA rotation matrix. Data power differences between independent components can only occur if the ICA rotation projects different sources on different diagonal elements of the scaling matrix. This causes the sources to be scaled with different factors. Nadal et al.[9] considered the situation in which the mixing matrix was nearly singular and also noise was present. They use the mutual information between the inputs and the outputs of a neural network to demonstrate that weak sources disturb the estimation of strong sources. We will focus more on the influence of the whitening stage in this error. We will also assume the errors are introduced by limited sample behaviour, instead of noise. While in [9] only large differences in strength are considered, it still makes sense to select sources with large data power when the differences are small as will be shown next. When the number of data samples is sufficiently large, the matrices in equation 4 can be estimated accurately, but when only a limited number of samples is available, errors are introduced. Small errors in E have a different effect on strong sources than on weak sources. Consider the 2D mixing problem where a strong and a weak source are mixed with a scaling matrix: [ ] 1 x = 1 s (5) α where α > 1. An error in the eigenvector matrix can be represented as an additional rotation of the input data by rotation matrix R before the separation is performed. Therefore the whitened data, denoted by z, is no longer white: [ ] [ ] [ ] 1 cos ϕ sin ϕ 1 z = 1 s (6) α sin ϕ cos ϕ α [ cos ϕ s = 1 + sin ϕ s ] α 2 (7) cos ϕ s 2 α sin ϕ s 1 Note that crosstalk of source 2 on the estimate of source 1 differs a factor α 2 from the crosstalk of source 1 on the estimate of source 2. The difference between the power caused by the crosstalk is a factor α 4. After whitening an ICA rotation matrix still has to be estimated. Since the data is no longer white, it depends on the specific ICA algorithm chosen what happens with the ICA rotation matrix. Cardoso [3] reported the effects of non white data on some objective functions. He reported that some objective functions acquire an additional correlation term. These objective functions thus attempt to find a rotation which partly minimizes the correlation between the two estimates even though the data is not white. Besides the limited sample behaviour, another possibility for an incorrect estimate of the eigenvalue rotation matrix is when the estimation is done on a subset of the data.

4 This is for instance the case when ICA is used to determine features for recognition purposes. ICA is then performed only on training data, which may not accurately represent the entire data set. The ICA rotation can not react on the error in the eigenvector estimate in such situations. The estimation of sources can also be corrupted by the presence of noise [9]. Learned-Miller [7] already indicated that Gaussian noise filters the probability density functions of the independent components, so they become more Gaussian. Weak sources are more affected than strong sources when equal powered noise is added to every dimension of the observation data, so the estimation of weak sources gets more difficult in the presence of noise and leads to more errors. 3 Experiments with data power In this section several experiments with synthetic data are described to verify the theory of section 2. To verify the crosstalk between strong and weak sources, an experiment has been performed in which two sources are mixed with a diagonal mixing matrix with a diagonal [1,.1]. This mixing matrix causes source 1 and 2 to be a strong and a weak source respectively. Three source configurations are used. In experiment 1 both sources are sub Gaussian. In experiment 2 source 1 is super Gaussian and source 2 is sub Gaussian. In experiment 3 both sources are super Gaussian. Sources are sub (super) Gaussian when the kurtosis of the source is negative (positive) [6]. For all tests all sources consist of 1 samples. The separation of sources is performed by only performing the whitening step, which should be sufficient to separate the sources. According to equation 7 the power of the crosstalk in the two estimates should differ by a factor 1 8. The experiments are repeated a few hundred times. Each time the crosstalk of the sources in the estimates is measured. Histograms of these measurements are plotted in figure 1(a). The horizontal axis in each plot displays the measured crosstalk power. The vertical axis indicates the number of experiments which had an amount of crosstalk as indicated on the horizontal axis. The two plots in each column belong to the same source configuration. The three plots in each row belong to the same source crosstalk in estimate measurement (for example row 1 indicates the crosstalk of source 2 on the estimate of source 1 for the different source configurations). The shapes are the same for every source configuration, but the crosstalk differs indeed a factor 1 8 in the two estimates. After whitening the strong components have less distortion than the weak components. Next ICA estimates the ICA rotation matrix. In the next experiment the same setup is used as in the previous experiment, but now the ICA rotation is also estimated. This is done using the fastica algorithm with a tanh nonlinearity function. In figure 1(b) the histograms of the crosstalk power are given. The ICA rotation matrix in the generating process is an identity matrix, so an accurate estimate of the rotation matrix would result in similar results as in experiment 1. However, the crosstalk between both sources is equal in strength. Apparently ICA reduces the accuracy of the strong source estimate in order to increase the accuracy of the weak source estimate. Nadal et al.[9] also found that the presence of small sources disturbed the estimation of the strong sources when using the infomax criterion. However they assume a clear distinction between weak and strong sources. Since the small components are largely present in the small eigenvalues, this result suggests to remove the smallest principle components before performing ICA. The second possibility of an error in the estimation of the eigenvectors suggested was in a training-test situation. In figure 1(c) the results are shown of the same test as in the previous experiment, but now an addition rotation of.3 degrees is introduced to the mixing matrix after separation estimation. Clearly a bias is introduced in the estimate of the components. The strong component causes on average 3 percent of the power of the weak source estimate, while the weak source is hardly present in the

5 strong source estimate, although the influence difference is not a factor 1 8 as in the first situation. Source 1 in estimate 2 Experiment Experiment Experiment Source 1 in estimate 2 Experiment Experiment Experiment Source 2 in estimate x x x 1 9 Source 2 in estimate (a) Crosstalk of sources in estimates after whitening. (b) Crosstalk of the sources in estimates after estimation of the ICA rotation matrix. Source 1 in estimate 2 Experiment Experiment Experiment Source 2 in estimate (c) Crosstalk of the sources after a rotation error is introduced to the model matrix of.3 degrees. Figure 1: Histograms of the crosstalk power between two source estimates. Source 1 and 2 have mixing factors 1 and.1. Three source configurations are used: both sources sub Gaussian, source 1 super Gaussian and source 2 sub Gaussian and both sources super Gaussian.

7 blica pca Crosstalk Sources (a) Example of the mask for strong sources. A strong mask is composed of two Gaussian shapes. Each Gaussian is shared with another source mask. (b) Comparison between the crosstalk in estimated components of both the ICA with PCA dimension reduction and the blica dimension reduction. Estimates of the same components are placed on top of each other Figure 2: blica versus ICA with PCA results. 6 Conclusion We proposed to solve the ambiguity of the ordering of ICA components based on the data power of a component. Although the method has been suggested before, we provided some new arguments in favour of the method: weaker sources are more sensitive to variations in the estimate of the eigenvectors in the whitening stage. In regular ICA, this is partly compensated by ICA rotation, which introduces errors on the estimate of strong sources, but in training-test situations weak sources get considerable more crosstalk from strong sources than visa versa. In Nadal et al. [9] also the situation with strong and weak sources is considered, but only with large differences in strength and presence of noise. The training-test situation is not considered at all. In Nadal et al. [9] it was suggested to remove the small principle components, since they would contain the small sources. However, as they noted, when two strong sources could only be separated using a smaller principle component, the dimension reduction could lead to separation problems. Making a selection of the strongest sources after ICA prevents this problem. A part of the problem of ICA estimation is that after the whitening, ICA rotation transfers part of the error in the weak estimate to the strong estimate. It might be a good idea to modify the symmetric update, like in FastICA [6], with a strength term, so the strong estimates are less effected by the errors in the weak estimates. Using the component selection criterion, the blica algorithm has been developed, which reduces the overtraining behaviour of ICA in the case that the sources are only locally present. In such a situation the experiment showed that blica can estimate the sources with higher accuracy than the general PCA dimension reduction method.

8 References [1] H. Attias. Learning in high dimensions: modular mixture models. In Proceedings of the 8th International Conference on Artificial Intelligence and Statistics, pages , 21. [2] C. M. Bishop and N. D. Lawrence. Variational bayesian independent component analysis. Technical report, Computer Laboratory, University of Cambridge, 2. [3] J. F. Cardoso. Dependence, correlation and gaussianity in independent component analysis. The Journal of Machine Learning Research, 4: , 23. [4] J.F. Cardoso. Blind signal separation: Statistical principles. Proceedings of the IEEE, 86(1):29 225, [5] Pierre Comon. Independent component analysis, a new concept? Signal Process., 36(3): , [6] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley and Sons, Inc, 21. [7] E. G. Learned-Miller and J. W. Fisher III. Ica using spacings estimates of entropy. The Journal of Machine Learning Research, 4(7-8): , 24. [8] Wei Lu and Jagath C. Rajapakse. Eliminating indeterminacy in ica. Neurocomputing, 5:271 29, 23. [9] J.-P. Nadal, E. Korutcheva, and F. Aires. Blind source separation in the presence of weak sources. Neural Networks, 13(6): , 2. [1] J. Särelä and R. Vigário. Overlearning in marginal distribution-based ica: analysis and solutions. The Journal of Machine Learning Research, 4(7-8): , 24.

### Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

### Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

### An Introduction to Independent Component Analysis: InfoMax and FastICA algorithms

Tutorials in Quantitative Methods for Psychology 2010, Vol. 6(1), p. 31-38. An Introduction to Independent Component Analysis: InfoMax and FastICA algorithms Dominic Langlois, Sylvain Chartier, and Dominique

### Introduction. Independent Component Analysis

Independent Component Analysis 1 Introduction Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multidimensional) statistical data. What distinguishes

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### PCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example

PCA to Eigenfaces CS 510 Lecture #16 March 23 th 2015 A 9 dimensional PCA example is dark around the edges and bright in the middle. is light with dark vertical bars. is light with dark horizontal bars.

### Factor Rotations in Factor Analyses.

Factor Rotations in Factor Analyses. Hervé Abdi 1 The University of Texas at Dallas Introduction The different methods of factor analysis first extract a set a factors from a data set. These factors are

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### Nonlinear Blind Source Separation and Independent Component Analysis

Nonlinear Blind Source Separation and Independent Component Analysis Prof. Juha Karhunen Helsinki University of Technology Neural Networks Research Centre Espoo, Finland Helsinki University of Technology,

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Soft Clustering with Projections: PCA, ICA, and Laplacian

1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Face Recognition using Principle Component Analysis

Face Recognition using Principle Component Analysis Kyungnam Kim Department of Computer Science University of Maryland, College Park MD 20742, USA Summary This is the summary of the basic idea about PCA

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

### Nonlinear Iterative Partial Least Squares Method

Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

### T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

### Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

### Accurate and robust image superresolution by neural processing of local image representations

Accurate and robust image superresolution by neural processing of local image representations Carlos Miravet 1,2 and Francisco B. Rodríguez 1 1 Grupo de Neurocomputación Biológica (GNB), Escuela Politécnica

### Understanding and Applying Kalman Filtering

Understanding and Applying Kalman Filtering Lindsay Kleeman Department of Electrical and Computer Systems Engineering Monash University, Clayton 1 Introduction Objectives: 1. Provide a basic understanding

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

### Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

### ADVANCES IN INDEPENDENT COMPONENT ANALYSIS WITH APPLICATIONS TO DATA MINING

Helsinki University of Technology Dissertations in Computer and Information Science Espoo 2003 Report D4 ADVANCES IN INDEPENDENT COMPONENT ANALYSIS WITH APPLICATIONS TO DATA MINING Ella Bingham Dissertation

### Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,

### SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

### Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

### Factor analysis. Angela Montanari

Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

### CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

### Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

### Direct Methods for Solving Linear Systems. Matrix Factorization

Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

### Linear Algebra Review. Vectors

Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length

### Chapter 6. Orthogonality

6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

### Independent Component Analysis: Algorithms and Applications

Independent Component Analysis: Algorithms and Applications Aapo Hyvärinen and Erkki Oja Neural Networks Research Centre Helsinki University of Technology P.O. Box 5400, FIN-02015 HUT, Finland Neural Networks,

### Face Recognition using SIFT Features

Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### 0.1 Phase Estimation Technique

Phase Estimation In this lecture we will describe Kitaev s phase estimation algorithm, and use it to obtain an alternate derivation of a quantum factoring algorithm We will also use this technique to design

### Overview of Factor Analysis

Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### [1] Diagonal factorization

8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:

### Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

### 1 2 3 1 1 2 x = + x 2 + x 4 1 0 1

(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining

Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining Kun Liu Hillol Kargupta and Jessica Ryan Abstract This paper explores the possibility of using multiplicative

### Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

### ACEAT-493 Data Visualization by PCA, LDA, and ICA

ACEAT-493 Data Visualization by PCA, LDA, and ICA Tsun-Yu Yang a, Chaur-Chin Chen a,b a Institute of Information Systems and Applications, National Tsing Hua U., Taiwan b Department of Computer Science,

### Common factor analysis

Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor

### Math 215 HW #6 Solutions

Math 5 HW #6 Solutions Problem 34 Show that x y is orthogonal to x + y if and only if x = y Proof First, suppose x y is orthogonal to x + y Then since x, y = y, x In other words, = x y, x + y = (x y) T

### Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

### Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

### Principal components analysis

CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

### Principal Component Analysis

Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

### 1 VECTOR SPACES AND SUBSPACES

1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such

### LINEAR ALGEBRA. September 23, 2010

LINEAR ALGEBRA September 3, 00 Contents 0. LU-decomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### A Regime-Switching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com

A Regime-Switching Model for Electricity Spot Prices Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com May 31, 25 A Regime-Switching Model for Electricity Spot Prices Abstract Electricity markets

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

### Classification by Pairwise Coupling

Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating

### Chapter 17. Orthogonal Matrices and Symmetries of Space

Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

### The PearsonICA Package

The PearsonICA Package June 29, 2006 Type Package Title Independent component analysis using score functions from the Pearson system Version 1.2-2 Date 2006-06-29 Author Juha Karvanen Maintainer Juha Karvanen

### Unsupervised and supervised dimension reduction: Algorithms and connections

Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona

### A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks H. T. Kung Dario Vlah {htk, dario}@eecs.harvard.edu Harvard School of Engineering and Applied Sciences

### Background: State Estimation

State Estimation Cyber Security of the Smart Grid Dr. Deepa Kundur Background: State Estimation University of Toronto Dr. Deepa Kundur (University of Toronto) Cyber Security of the Smart Grid 1 / 81 Dr.

### Using the Singular Value Decomposition

Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces

### Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

### Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor

### Face detection is a process of localizing and extracting the face region from the

Chapter 4 FACE NORMALIZATION 4.1 INTRODUCTION Face detection is a process of localizing and extracting the face region from the background. The detected face varies in rotation, brightness, size, etc.

### AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

### Generalized Neutral Portfolio

Generalized Neutral Portfolio Sandeep Bhutani bhutanister@gmail.com December 10, 2009 Abstract An asset allocation framework decomposes the universe of asset returns into factors either by Fundamental

### Clustering & Visualization

Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

### Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis

Joint Use of Factor Analysis () and Data Envelopment Analysis () for Ranking of Data Envelopment Analysis Reza Nadimi, Fariborz Jolai Abstract This article combines two techniques: data envelopment analysis

### Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications

Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### 3.2 Sources, Sinks, Saddles, and Spirals

3.2. Sources, Sinks, Saddles, and Spirals 6 3.2 Sources, Sinks, Saddles, and Spirals The pictures in this section show solutions to Ay 00 C By 0 C Cy D 0. These are linear equations with constant coefficients

### Appendix 4 Simulation software for neuronal network models

Appendix 4 Simulation software for neuronal network models D.1 Introduction This Appendix describes the Matlab software that has been made available with Cerebral Cortex: Principles of Operation (Rolls

### Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Stefanos D. Georgiadis Perttu O. Ranta-aho Mika P. Tarvainen Pasi A. Karjalainen. University of Kuopio Department of Applied Physics Kuopio, FINLAND

5 Finnish Signal Processing Symposium (Finsig 5) Kuopio, Finland Stefanos D. Georgiadis Perttu O. Ranta-aho Mika P. Tarvainen Pasi A. Karjalainen University of Kuopio Department of Applied Physics Kuopio,

### Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

### CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major

### Cheng Soon Ong & Christfried Webers. Canberra February June 2016

c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### Recall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.

ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the n-dimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?

### 1 Example of Time Series Analysis by SSA 1

1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'-SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Solutions to Exam in Speech Signal Processing EN2300

Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.

### Palmprint Recognition with PCA and ICA

Abstract Palmprint Recognition with PCA and ICA Tee Connie, Andrew Teoh, Michael Goh, David Ngo Faculty of Information Sciences and Technology, Multimedia University, Melaka, Malaysia tee.connie@mmu.edu.my

### Bayesian Image Super-Resolution

Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian