BLOCK JACOBI-TYPE METHODS FOR LOG-LIKELIHOOD BASED LINEAR INDEPENDENT SUBSPACE ANALYSIS
Hao Shen, Knut Hüper
National ICT Australia, Australia, and The Australian National University, Australia

Martin Kleinsteuber
Department of Mathematics, University of Würzburg, Germany

National ICT Australia is funded by the Australian Government's Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Research Centre of Excellence programs. E-mails: [email protected], [email protected], [email protected].

ABSTRACT

Independent Subspace Analysis (ISA) is a natural generalisation of Independent Component Analysis (ICA) incorporating invariant feature subspaces: mutual statistical independence is required between subspaces, while statistical dependence is still allowed between components within the same subspace. In this paper, we develop a general scheme of block Jacobi-type ISA methods which optimise a popular family of log-likelihood based ISA contrast functions. It turns out that the block Jacobi-type ISA method is an efficient tool for both parametric and nonparametric approaches. A rigorous analysis of the local convergence properties is provided in a general setting. A concrete realisation of the block Jacobi-type ISA method, employing a Newton step strategy, is proposed and shown to converge locally quadratically fast to a correct subspace separation. The performance of the proposed algorithms is investigated by numerical experiments.

1. INTRODUCTION

As a generalisation of standard blind source separation (BSS), the so-called multidimensional blind source separation (MBSS) studies the problem of extracting sources in terms of groups rather than individual signals. Following the success of Independent Component Analysis (ICA) in solving BSS, an analogous statistical tool has been proposed for solving MBSS, namely Multidimensional Independent Component Analysis (MICA) [1], which assumes that components from different groups are mutually statistically independent, while statistical dependence is still allowed between components in the same subspace. Incorporated with invariant feature subspaces [2], MICA is also referred to as Independent Subspace Analysis (ISA).

In this work, we study the problem of linear ISA from an optimisation point of view. According to the pioneering work [1], any standard ICA algorithm can, in general, be adapted to solve MICA in two steps: (i) use a standard ICA method to estimate all individual signals; (ii) construct mutually statistically independent subspaces by grouping dependent signals together. The Jacobi-type method is an important tool for solving the standard linear ICA problem. It jointly diagonalises a given set of commuting symmetric matrices, which are constructed in accordance with certain ICA models, such as JADE or MaxKurt [3]. Apart from a full joint diagonalisation of a set of symmetric matrices, it has been shown that the problem of MICA can be solved by a joint block diagonalisation with respect to a fixed block structure [4]. Recently, a class of MICA methods based on joint block diagonalisation has been developed in [4, 5] by performing a standard Jacobi-type method as in [6], followed by certain permutations of the columns of the demixing matrix, to obtain a block diagonalisation of a set of symmetric matrices.
Although the efficiency of this approach has been verified by numerical evidence, to the best of our knowledge there is so far no theory guaranteeing that the efficiency and convergence properties of standard ICA algorithms carry over to their MICA/ISA counterparts. It is well known that the Jacobi-type method is essentially an optimisation procedure. Instead of optimising over a single parameter at a time, the standard Jacobi-type method has been generalised to the so-called block Jacobi-type method [7], which optimises over several parameters simultaneously. It has also been shown that a convenient setting for linear ISA is a flag manifold [8]. In this paper, we develop a general scheme of block Jacobi-type ISA methods on flag manifolds in order to optimise a popular family of log-likelihood based ISA contrast functions.

The paper is organised as follows. Section 2 briefly introduces the linear ISA model with log-likelihood based ISA contrast functions and a block Jacobi-type method on a flag manifold.
In Section 3, we give a critical point analysis of the log-likelihood based ISA contrast function, followed by a study of the Hessian, and propose a general scheme of block Jacobi-type ISA methods. Local convergence results of the proposed methods are presented without proof. By using a Newton step strategy, a concrete block Jacobi-type ISA method is formulated; it shares the local convergence properties of the general scheme. Analogous results for a similar nonparametric ISA approach, based on kernel density estimation, are also discussed. Finally, numerical experiments in Section 4 investigate the performance of the proposed algorithms.

2. PRELIMINARIES: LINEAR ISA MODEL AND BLOCK JACOBI-TYPE METHODS

2.1. Linear ISA Model and Linear ISA Contrast

In this work, we study the standard noiseless linear instantaneous ISA model (refer to [1] for more details)

    Z = A S,    (1)

where S = [s_1, ..., s_n] ∈ R^{m×n} represents n samples of m sources with m ≤ n, consisting of p mutually statistically independent groups with subspace dimensions d_i, for i = 1, ..., p, and Σ_{i=1}^p d_i = m. The matrix A ∈ R^{m×m} is the full rank mixing matrix, and Z = [z_1, ..., z_n] ∈ R^{m×n} represents the observed mixtures. It is important to notice that mutual statistical independence is ensured only if the sample size n tends to infinity. Nevertheless, for our theoretical analysis in Section 3, we assume that the independence holds even for a finite sample size.

The task of the linear ISA model is to recover the source signals S in mutually statistically independent groups, based only on the observations Z, via a linear transformation

    Q = B^T Z,    (2)

where B ∈ R^{m×m} is the full rank demixing matrix and Q ∈ R^{m×n} represents p independent groups of dependent signals. Let B = [b_1, ..., b_p] ∈ R^{m×m} with b_i ∈ R^{m×d_i} and rk(b_i) = d_i for i = 1, ..., p. If B* = [b*_1, ..., b*_p] ∈ R^{m×m} is a correct demixing matrix and every b*_i extracts a statistically independent group of d_i dependent signals, then any B with span(b_i) = span(b*_i), for all i = 1, ..., p, provides a correct separation of independent groups as well.

Let us define r_i := Σ_{j=1}^i d_j for all i = 1, ..., p. Clearly, 0 < r_1 < ... < r_p = m is an increasing sequence of integers. The solution set of the linear ISA problem can then be identified with the collection of ordered sets of p nested vector subspaces V_i of R^m with dim V_i = r_i for i = 1, ..., p and V_1 ⊂ ... ⊂ V_p = R^m, i.e., the flag manifold Fl(r_1, ..., r_p). In this work, we only study the situation where all independent subspaces have the same dimension d. For the sake of simplicity, in the following we use Fl(p, d) to denote the flag manifold Fl(r_1, ..., r_p) with r_i = i·d for all i = 1, ..., p.

As in linear ICA, a whitening of the mixtures can be applied to simplify the demixing ISA model (2) to

    Y = X^T W,    (3)

where W = [w_1, ..., w_n] = V Z ∈ R^{m×n} is the whitened observation (V ∈ R^{m×m} is the whitening matrix), X ∈ R^{m×m} is an orthogonal matrix acting as the demixing matrix, and Y = [y_1, ..., y_n] ∈ R^{m×n} contains the reconstructed p independent groups of signals. Let us denote the special orthogonal group of order m by SO(m) := {X ∈ R^{m×m} | X^T X = I, det(X) = 1}. Let X = [x_1, ..., x_p] ∈ SO(m) with x_i ∈ R^{m×d}, i.e., x_i^T x_i = I_d. We define an equivalence relation on SO(m) as follows: for any X, X' ∈ SO(m), X ∼ X' if and only if span(x_i) = span(x'_i) for all i = 1, ..., p. We denote the equivalence class containing X ∈ SO(m) under ∼ by [X]. Obviously, every equivalence class [X], for X ∈ SO(m), identifies exactly one point in Fl(p, d), and X is a representative of [X] ∈ Fl(p, d).
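To fix ideas, the following minimal NumPy sketch (not part of the original paper) illustrates the whitening step (3), the partitioning of a representative X ∈ SO(m) into its p column blocks, and a test of the equivalence relation ∼ via the block projectors x_k x_k^T; all function and variable names are illustrative.

```python
import numpy as np

def whiten(Z):
    """Whiten observations Z (m x n): centre and rescale so the sample covariance is I_m."""
    Zc = Z - Z.mean(axis=1, keepdims=True)
    C = Zc @ Zc.T / Zc.shape[1]
    eigval, eigvec = np.linalg.eigh(C)
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # whitening matrix V in (3)
    return V @ Zc, V

def blocks(X, p, d):
    """Column blocks x_1, ..., x_p of a demixing matrix X, with m = p * d."""
    return [X[:, k * d:(k + 1) * d] for k in range(p)]

def equivalent(X1, X2, p, d, tol=1e-8):
    """Check X1 ~ X2, i.e. span(x_k) = span(x'_k) for every block, via the block projectors."""
    return all(np.linalg.norm(a @ a.T - b @ b.T) < tol
               for a, b in zip(blocks(X1, p, d), blocks(X2, p, d)))
```

Since x_k x_k^T depends only on span(x_k), the comparison of projectors is insensitive to the choice of representative, exactly as required by the definition of ∼.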
The key idea of linear ISA is to maximise the mutual statistical independence between the norms of the projections of the observations onto a set of linear subspaces. Minimisation of the negative log-likelihood of the recovered signals is a widely used independence criterion in standard ICA. We adapt the same criterion to the linear ISA case as follows:

    F: Fl(p, d) → R,   F([X]) := E_i [ Σ_{k=1}^{p} log ψ( w_i^T x_k x_k^T w_i ) ],    (4)

where ψ(·) is the probability density function (PDF) of the norm of the projection of the observations onto a certain linear subspace and E_i denotes the empirical mean over the index i. It is easily seen that the ISA contrast function (4) is independent of the concrete representative of an equivalence class [X]. The PDF ψ is usually chosen hypothetically, based on the application. For the sake of simplicity, we write G(a) := log ψ(a). This can be considered a special parametric approach.
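As an illustration of how the contrast (4) is evaluated in practice, the following sketch assumes NumPy and, purely for concreteness, the choice ψ(a) = cosh(a), i.e. G(a) = log cosh(a), which is also the choice used later in Experiment 1; W holds the whitened observations column-wise, and the function is not the authors' implementation.

```python
import numpy as np

def contrast(X, W, p, d, G=lambda a: np.log(np.cosh(a))):
    """Empirical contrast (4): F([X]) = E_i [ sum_k G(w_i^T x_k x_k^T w_i) ]."""
    value = 0.0
    for k in range(p):
        xk = X[:, k * d:(k + 1) * d]                 # k-th column block of X
        proj = xk.T @ W                              # d x n matrix of projections x_k^T w_i
        value += G((proj * proj).sum(axis=0)).mean() # w_i^T x_k x_k^T w_i = ||x_k^T w_i||^2
    return value
```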
2.2. Block Jacobi-type Methods on Flag Manifolds

Block Jacobi-type procedures were developed as a generalisation of the standard Jacobi method, in terms of grouped variables, for solving symmetric eigenvalue or singular value problems [9]. Recent work in [7] formulates the so-called block Jacobi-type method as an optimisation approach on manifolds. We now adapt the general formulation of [7] to the present setting, the flag manifold Fl(p, d).

Denote the vector space of all m×m skew-symmetric matrices by so(m) := {Ω ∈ R^{m×m} | Ω^T = −Ω}. Let m = p·d. We fix a subspace B(p, d) ⊂ so(m) with the following block structure. Any Ω ∈ B(p, d) consists of p × p blocks of dimension d × d, and the (d×d)-diagonal blocks ω_ll of Ω = (ω_kl)_{k,l=1}^p ∈ B(p, d) are all equal to zero. For example, for p = 3, an Ω ∈ B(3, d) looks like

    Ω = [     0        ω_12     ω_13
           −ω_12^T      0       ω_23
           −ω_13^T   −ω_23^T      0  ],    (5)

where ω_kl = −ω_lk^T ∈ R^{d×d}. By means of the matrix exponential, a local parameterisation μ_[X] of Fl(p, d) around [X] is defined as

    μ_[X]: B(p, d) → Fl(p, d),   μ_[X](Ω) := [X e^Ω].    (6)

The tangent space of Fl(p, d) at [X] is then

    T_[X] Fl(p, d) = d/dt μ_[X](t B(p, d)) |_{t=0}.    (7)

Now let us decompose B(p, d) as

    B(p, d) = ⊕_{1≤k<l≤p} B_kl(p, d),    (8)

where all blocks of Ω ∈ B_kl(p, d) ≅ R^{d×d} are equal to zero except for the kl-th and lk-th blocks. We then define

    V^kl_[X] := d/dt μ_[X](t B_kl(p, d)) |_{t=0}.    (9)

It is clear that (V^kl_[X])_{1≤k<l≤p} gives a direct sum decomposition of the tangent space T_[X] Fl(p, d) as well, i.e.,

    T_[X] Fl(p, d) = ⊕_{1≤k<l≤p} V^kl_[X].    (10)

The smooth maps

    τ_kl: B_kl(p, d) × Fl(p, d) → Fl(p, d),   τ_kl(Ω, [X]) := μ_[X](Ω),    (11)

for all 1 ≤ k < l ≤ p, are referred to as the basic transformations. Let f: Fl(p, d) → R be a smooth cost function. A block Jacobi-type method for minimising f can be summarised as follows.

Algorithm 1: Block Jacobi-type method on Fl(p, d)
Step 1: Given an initial guess [X] ∈ Fl(p, d) and a set of basic transformations τ_kl, for all 1 ≤ k < l ≤ p, as defined in (11).
Step 2 (Sweep): Let [X_old] = [X]. For 1 ≤ k < l ≤ p:
  (i) Compute Ω* := argmin_{Ω ∈ B_kl(p,d)} (f ∘ τ_kl)(Ω, [X]);
  (ii) Update [X] ← τ_kl(Ω*, [X]).
Step 3: If δ([X_old], [X]) is small enough, stop. Otherwise, go to Step 2.

Here δ([X_old], [X]) represents a certain distance measure between two points on Fl(p, d). Following the corresponding convergence result in [7], we state the following theorem without proof.

Theorem 1: Let f: Fl(p, d) → R be a smooth cost function and [X*] ∈ Fl(p, d) be a local minimum of f. If the Hessian H_f([X*]) is nondegenerate and the vector subspaces V^kl_[X*], for all 1 ≤ k < l ≤ p, as in (9) are mutually orthogonal with respect to H_f([X*]), then the block Jacobi-type method converges locally quadratically fast.
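A schematic NumPy/SciPy rendering of one sweep of Algorithm 1 is given below; the subproblem solver of step (i) is deliberately left as a user-supplied placeholder (for instance, the Newton step derived in Section 3.2), and the code is a sketch rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm

def embed(omega, k, l, p, d):
    """Embed a d x d block omega into Omega in B_kl(p, d): zero everywhere except
    the kl-th block (omega) and the lk-th block (-omega^T)."""
    Omega = np.zeros((p * d, p * d))
    Omega[k * d:(k + 1) * d, l * d:(l + 1) * d] = omega
    Omega[l * d:(l + 1) * d, k * d:(k + 1) * d] = -omega.T
    return Omega

def sweep(X, p, d, solve_subproblem):
    """One sweep of Algorithm 1: for every pair k < l, (approximately) minimise the
    restricted cost (step (i)) and apply the basic transformation X <- X exp(Omega) (step (ii))."""
    for k in range(p):
        for l in range(k + 1, p):
            omega = solve_subproblem(X, k, l)
            X = X @ expm(embed(omega, k, l, p, d))
    return X
```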
3. BLOCK JACOBI-TYPE ISA METHODS

3.1. Analysis of Linear ISA Contrasts

In this section, we first show that the log-likelihood based linear ISA contrast function (4) fulfills the conditions stated in Theorem 1, i.e., one can develop a scheme of block Jacobi-type methods minimising the log-likelihood based ISA contrasts with local quadratic convergence properties. By the chain rule, the first derivative of the contrast F is

    d/dt (F ∘ μ_[X])(tΩ) |_{t=0} = Σ_{1≤k<l≤p} tr[ ω_kl^T ( u_kl(X) − u_lk(X) ) ],    (12)

where u_kl(X), u_lk(X) ∈ R^{d×d} with

    u_kl(X) = E_i[ G'( w_i^T x_l x_l^T w_i ) x_k^T w_i w_i^T x_l ],   and
    u_lk(X) = E_i[ G'( w_i^T x_k x_k^T w_i ) x_k^T w_i w_i^T x_l ].    (13)

It can be shown that if [X*] ∈ Fl(p, d) is a correct demixing point, then, by the whitening properties of the sources, the terms u_kl(X*) and u_lk(X*) are equal to 0 for all k < l. Thus the first derivative of F vanishes at [X*], and a correct demixing point [X*] is indeed a critical point of F. Note that there exist more critical points than the correct separation points.

By a straightforward computation, the second derivative of F at a correct separation point [X*] is

    d²/dt² (F ∘ μ_[X*])(tΩ) |_{t=0} = Σ_{1≤k<l≤p} tr[ ω_kl^T ( v_kk(X*) + v_ll(X*) ) ω_kl ],    (14)

where

    v_kk(X) = E_i[ G''( w_i^T x_k x_k^T w_i ) x_k^T w_i w_i^T x_k ]
            − E_i[ G'( w_i^T x_k x_k^T w_i ) x_k^T w_i w_i^T x_k ]
            + E_i[ G'( w_i^T x_k x_k^T w_i ) ] I_d.    (15)

It is clear that the Hessian of F evaluated at [X*] is block diagonal with respect to the direct sum decomposition (10), each block being determined by the d×d matrix v_kk(X*) + v_ll(X*). Note that the properties in (15) hold true only if statistical independence of the sources can be ensured.

3.2. A Block Jacobi-type ISA Method

According to the results of Section 3.1, we now develop a scheme of block Jacobi-type linear ISA methods. For any 1 ≤ k < l ≤ p, we denote

    μ^kl_[X] := μ_[X] |_{B_kl(p,d)}.    (16)

Each partial step in a Jacobi-type sweep (Step 2 in Algorithm 1) requires solving an unconstrained optimisation problem

    F ∘ μ^kl_[X]: B_kl(p, d) ≅ R^{d×d} → R.    (17)

As stated in Algorithm 1, one needs to solve the above subproblem for a global optimum. Unfortunately, this seems not to be feasible in the present case (17). Nevertheless, we can still draw the following theoretical conclusion.

Corollary 2: Let [X*] ∈ Fl(p, d) be a correct separation point of a linear ISA problem. Then the block Jacobi-type linear ISA method in the fashion of Algorithm 1 is locally quadratically convergent to [X*].

It is well known that the performance of block Jacobi-type methods critically depends on the method used to solve the subproblems. In the rest of this section, we formulate a Newton step based realisation of the block Jacobi-type linear ISA method, i.e., rather than seeking a local or global minimum of the restricted subproblem (17), we apply a single Newton optimisation step on each basic transformation. Similar techniques have already been used in [10, 11]. The resulting algorithm preserves the local quadratic convergence properties of Algorithm 1. The first and second derivatives of F ∘ μ^kl_[X] are

    d/dt (F ∘ μ^kl_[X])(tΩ) |_{t=0} = tr[ ω_kl^T ( u_kl(X) − u_lk(X) ) ],
    d²/dt² (F ∘ μ^kl_[X])(tΩ) |_{t=0} = tr[ ω_kl^T ( h^(X)_kl(Ω) + h^(X)_lk(Ω) ) ],    (18)

where Ω ∈ B_kl(p, d) and

    h^(X)_kl(Ω) = E_i[ G''( w_i^T x_k x_k^T w_i ) ( x_k^T w_i w_i^T x_l ) ( w_i^T x_k ω_kl x_l^T w_i ) ]
                − E_i[ G'( w_i^T x_k x_k^T w_i ) x_k^T w_i w_i^T x_k ω_kl ]
                + E_i[ G'( w_i^T x_k x_k^T w_i ) ω_kl x_l^T w_i w_i^T x_l ].    (19)

Thus, a single Newton step is computed by solving the following linear system for Ω ∈ B_kl(p, d):

    h^(X)_kl(Ω) + h^(X)_lk(Ω) = −( u_kl(X) − u_lk(X) ).    (20)

By recursively iterating the above Newton step on each basic transformation, one obtains the corresponding Newton step based block Jacobi-type ISA method. Its local convergence properties are stated as follows; due to page limits, we omit the proof.

Proposition 3: Let [X*] ∈ Fl(p, d) be a correct separation point of a linear ISA problem. Then the block Jacobi-type linear ISA method employing a single Newton step (20) on each basic transformation τ_kl, for all 1 ≤ k < l ≤ p, is locally quadratically convergent to [X*].
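The following sketch realises one Newton step for the restricted subproblem (17). Instead of the closed-form expressions (18) and (19), it approximates the gradient and the Hessian at Ω = 0 by central finite differences over the d·d free entries of ω_kl; this is only an illustration of the idea behind (20), not the paper's implementation, and F is assumed to be a callable evaluating the contrast at a representative matrix.

```python
import numpy as np
from scipy.linalg import expm

def newton_step(X, k, l, p, d, F, eps=1e-5):
    """One Newton step for the restricted subproblem (17), with gradient and Hessian
    approximated by central finite differences over the entries of omega_kl."""
    def f(vec):
        omega = vec.reshape(d, d)
        Omega = np.zeros((p * d, p * d))
        Omega[k * d:(k + 1) * d, l * d:(l + 1) * d] = omega
        Omega[l * d:(l + 1) * d, k * d:(k + 1) * d] = -omega.T
        return F(X @ expm(Omega))

    n = d * d
    E = np.eye(n) * eps
    grad = np.array([(f(E[i]) - f(-E[i])) / (2 * eps) for i in range(n)])
    hess = np.array([[(f(E[i] + E[j]) - f(E[i] - E[j]) - f(-E[i] + E[j]) + f(-E[i] - E[j]))
                      / (4 * eps ** 2) for j in range(n)] for i in range(n)])
    return np.linalg.solve(hess, -grad).reshape(d, d)   # solve the Newton system, cf. (20)
```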
3.3. ISA Contrast Using Kernel Density Estimation

As in ICA, the true distribution of the norm of the projected observations is generally unknown. By employing the kernel density estimation technique, a popular nonparametric approach, an empirical log-likelihood based contrast for the norms of the projected components can be formulated as follows (see [12] for more details):

    F̂: Fl(p, d) → R,   F̂([X]) := E_i [ Σ_{k=1}^{p} log( (1/h) E_j [ φ( ( w_ij^T x_k x_k^T w_ij ) / h ) ] ) ],    (21)

where w_ij := w_i − w_j ∈ R^m denotes the difference between the i-th and j-th samples, φ: R → R is an appropriate kernel function, e.g., the Gaussian kernel φ(a) = exp(−a), and h ∈ R^+ is the kernel bandwidth.

Following more tedious but analogous computations as for the general contrast function (4), one can show that (i) a correct separation point [X*] is a critical point of F̂, and (ii) the Hessian of F̂ at [X*] is also block diagonal with respect to the fixed d×d block structure. It then follows directly that the block Jacobi-type method is an efficient tool for minimising the empirical ISA contrast function (21). A block Jacobi-type ISA method for optimising the contrast function (21) can be formulated in the same fashion as in Section 3.2, and the convergence properties stated in Corollary 2 and Proposition 3 still apply to this empirical setting. Due to the page limits, all details of the algorithm and the proofs of the corresponding convergence results are omitted here; for further details, we refer to our forthcoming journal paper.
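A corresponding sketch of the empirical contrast (21) is given below, assuming NumPy, the kernel φ(a) = exp(−a) and a user-chosen bandwidth h; kernel and normalisation details beyond those stated above are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def kde_contrast(X, W, p, d, h=0.5):
    """Empirical contrast (21): for every block k, the density of the projected samples
    is estimated by kernel smoothing over all pairwise differences w_i - w_j."""
    value = 0.0
    for k in range(p):
        P = X[:, k * d:(k + 1) * d].T @ W                        # d x n projections x_k^T w_i
        sq = ((P[:, :, None] - P[:, None, :]) ** 2).sum(axis=0)  # n x n, ||x_k^T (w_i - w_j)||^2
        dens = np.exp(-sq / h).mean(axis=1) / h                  # kernel density estimate at sample i
        value += np.log(dens).mean()
    return value
```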
4. NUMERICAL EXPERIMENTS

In this section, we present two experiments to illustrate the properties of the proposed ISA methods. Section 4.1 demonstrates the local quadratic convergence of the Newton step based block Jacobi-type ISA method on an idealised example. In Section 4.2, the Newton step based empirical block Jacobi-type ISA method proposed in Section 3.3 is compared with an ICA based ISA approach in terms of separation quality.

4.1. Experiment 1

As pointed out before, statistical independence in general holds only if the sample size tends to infinity. This indicates that the theoretical results of Corollary 2 and Proposition 3 cannot generally be observed or verified exactly in a real environment. Nevertheless, in this experiment we construct an ideal dataset for which statistical independence can be ensured, and illustrate the theoretical convergence result of the Newton step based block Jacobi-type ISA method, i.e., Proposition 3.

The ideal dataset consists of three statistically independent signal groups with two dependent signals per group, as shown in Fig. 1, and we approximate the PDF by ψ(a) = cosh(a). The convergence is measured by the distance of the accumulation point [X*] to the current iterate [X_k], defined for [X], [X'] ∈ Fl(p, d) as

    δ([X], [X']) := Σ_{i=1}^{p} ‖ x_i x_i^T − x'_i x'_i^T ‖_F,    (22)

where ‖·‖_F is the Frobenius norm. The numerical results in Fig. 2 evidently verify the local quadratic convergence of the Newton step based block Jacobi-type linear ISA method stated in Proposition 3.

Fig. 1. A toy ideal dataset (signals S1-S6).
Fig. 2. Convergence properties of the Newton step based block Jacobi-type linear ISA method; δ([X_k], [X*]) versus sweep index k.
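For completeness, the distance (22) used to produce Fig. 2 can be computed as in the following NumPy sketch (function and variable names are illustrative).

```python
import numpy as np

def flag_distance(X1, X2, p, d):
    """Distance (22) on Fl(p, d): sum of Frobenius distances between block projectors."""
    dist = 0.0
    for k in range(p):
        a = X1[:, k * d:(k + 1) * d]
        b = X2[:, k * d:(k + 1) * d]
        dist += np.linalg.norm(a @ a.T - b @ b.T, 'fro')
    return dist
```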
4.2. Experiment 2

In this experiment, we investigate the separation performance of the Newton step based empirical block Jacobi-type ISA method proposed in Section 3.3. It is compared with the popular approach of applying an ICA method followed by a regrouping process (referred to here as ICA-Group ISA). By fixing the dimension of each subspace to d = 1, the block Jacobi-type ISA methods proposed in Section 3 can easily be adapted to solve the standard linear ICA problem, i.e., they yield a standard Jacobi-type ICA method; we refer to [13] for implementation details. For each test, both methods are initialised with the same separation point, chosen close to an optimal solution.

Our test data is generated as follows. First, we take three statistically independent speech signals at random, with a fixed sample size, from the benchmark speech dataset provided by the Brain Science Institute, RIKEN. By generating a distortion of each signal, we end up with test data consisting of three statistically independent signal groups with two dependent signals per group.

To measure the separation quality in the ISA scenario, a so-called multidimensional performance index (MPI) has been proposed in [4] as a generalisation of the Amari error [14]:

    d(C) := Σ_{i=1}^{p} ( Σ_{j=1}^{p} ‖c_ij‖ / max_j ‖c_ij‖ − 1 ) + Σ_{j=1}^{p} ( Σ_{i=1}^{p} ‖c_ij‖ / max_i ‖c_ij‖ − 1 ),    (23)

where C = (c_ij)_{i,j=1}^{p} ∈ R^{m×m} with c_ij ∈ R^{d×d} and m = p·d. For a given separation point [X] ∈ Fl(p, d), we define C := X^T V A. Here ‖·‖ denotes a certain matrix norm; as suggested in [4], for a given c ∈ R^{d×d}, ‖c‖ is taken to be the absolute value of the largest eigenvalue of c. Generally, the smaller the index, the better the separation.

After repeating the test a number of times, we present quartile-based boxplots of the MPI scores in Fig. 3. They show that the Newton step based empirical block Jacobi-type ISA method consistently outperforms the ICA-Group approach in terms of separation quality.

Fig. 3. Separation performance of the proposed method (MPI for ICA-Group ISA and block Jacobi-type ISA).

5. REFERENCES

[1] J.-F. Cardoso, Multidimensional independent component analysis, in Proceedings of the 23rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1998), Seattle, WA, USA, 1998.
[2] A. Hyvärinen and P. O. Hoyer, Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces, Neural Computation, vol. 12, no. 7, pp. 1705-1720, 2000.
[3] J.-F. Cardoso, High-order contrasts for independent component analysis, Neural Computation, vol. 11, no. 1, pp. 157-192, 1999.
[4] F. J. Theis, Blind signal separation into groups of dependent signals using joint block diagonalization, in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2005), Kobe, Japan, 2005.
[5] J.-F. Cardoso and A. Souloumiac, Jacobi angles for simultaneous diagonalization, SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 1, pp. 161-164, 1996.
[7] K. Hüper, A Calculus Approach to Matrix Eigenvalue Algorithms, Habilitation Dissertation, Department of Mathematics, University of Würzburg, Germany, July 2002.
[8] Y. Nishimori, S. Akaho, and M. Plumbley, Riemannian optimization method on the flag manifold for independent subspace analysis, in Lecture Notes in Computer Science, Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA 2006), vol. 3889, Springer-Verlag, Berlin/Heidelberg, 2006.
[9] G. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[10] J. J. Modi and J. D. Pryce, Efficient implementation of Jacobi's diagonalization method on the DAP, Numerische Mathematik, vol. 46, no. 3, 1985.
[11] J. Götze, S. Paul, and M. Sauer, An efficient Jacobi-like algorithm for parallel eigenvalue computation, IEEE Transactions on Computers, vol. 42, no. 9, 1993.
[12] R. Boscolo, H. Pan, and V. P. Roychowdhury, Independent component analysis based on nonparametric density estimation, IEEE Transactions on Neural Networks, vol. 15, 2004.
[13] H. Shen, M. Kleinsteuber, and K. Hüper, Efficient geometric methods for kernel density estimation based independent component analysis, to appear at EUSIPCO 2007, Poznań, Poland, September 3-7, 2007.
[14] S. Amari, A. Cichocki, and H. H. Yang, A new learning algorithm for blind signal separation, in Advances in Neural Information Processing Systems, David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, Eds., vol. 8, The MIT Press, 1996.
[15] F. J. Theis, Multidimensional independent component analysis using characteristic functions, in Proceedings of the 13th European Signal Processing Conference (EUSIPCO 2005), Antalya, Turkey, 2005.