Linear Algebraic Equations, SVD, and the Pseudo-Inverse




Philip N. Sabes
October, 2001

"Linear Algebraic Equations, SVD, and the Pseudo-Inverse" by Philip N. Sabes is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. Requests for permissions beyond the scope of this license may be sent to sabes@phy.ucsf.edu.

1 A Little Background

1.1 Singular values and matrix inversion

For non-symmetric matrices, the eigenvalues and singular values are not equivalent. However, they share one important property:

Fact 1 A matrix A, NxN, is invertible iff all of its singular values are non-zero.

Proof outline:

if: We have A = UΣV^T. Suppose that Σ has no zeros on the diagonal. Then B = VΣ^{-1}U^T exists. And AB = UΣV^T VΣ^{-1}U^T = I, and similarly BA = I. Thus A is invertible.

only if: Given that A is invertible, we will construct Σ^{-1}, which is sufficient to show that the SVs are all non-zero:

    A = UΣV^T
    Σ = U^T A V
    Σ^{-1} = V^T A^{-1} U

1.2 Some definitions from vector calculus

In the following, we will want to find the value of a vector x which minimizes a scalar function f(x). In order to do this, we first need a few basic definitions from vector calculus.

Definition 1 The (partial) derivative of a scalar a with respect to a vector x, Nx1, is the 1xN vector

    ∂a/∂x = [ ∂a/∂x_1  ...  ∂a/∂x_N ]

In practice, when only derivatives of scalars are used, people often write ∂a/∂x as an Nx1 column vector (i.e. the transpose of the definition above). However the row-vector definition is preferable, as it is consistent with the following:

Definition 2 The (partial) derivative of a vector b, Mx1, with respect to a vector x, Nx1, is the MxN matrix

    ∂b/∂x = [ ∂b_1/∂x_1  ...  ∂b_1/∂x_N ]
            [    ...     ...     ...    ]
            [ ∂b_M/∂x_1  ...  ∂b_M/∂x_N ]
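
As a quick numerical illustration of Definition 2 (a sketch added here, not part of the original notes), the snippet below uses central finite differences to approximate ∂b/∂x for the linear map b(x) = Ax, for which the MxN derivative should simply be A; the matrix and evaluation point are arbitrary examples.

    import numpy as np

    # Arbitrary example: b(x) = A x, so by Definition 2 the derivative db/dx
    # should be the MxN matrix A itself (here M=2, N=3).
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 1.0]])

    def b(x):
        return A @ x

    x0 = np.array([0.5, -1.0, 2.0])    # arbitrary evaluation point
    eps = 1e-6

    # Central finite-difference Jacobian: entry (i, j) approximates db_i/dx_j.
    J = np.zeros((2, 3))
    for j in range(3):
        dx = np.zeros(3)
        dx[j] = eps
        J[:, j] = (b(x0 + dx) - b(x0 - dx)) / (2 * eps)

    print(np.allclose(J, A))   # True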

Exercise 1 Let x, y be Nx1 vectors and A be an NxN matrix. Show that the following are correct:

    ∂(y^T x)/∂x = y^T
    ∂(y^T A x)/∂x = y^T A
    ∂(x^T A x)/∂x = 2 x^T A,   if A is symmetric

In ordinary calculus, to find a local maximum or minimum of a function, f(x), we solve the equation ∂f(x)/∂x = 0. A similar principle holds for a multivariate function (i.e. a function of a vector):

Fact 2 Let f(x) be a scalar function of an Nx1 vector x. Then ∂f(x)/∂x = 0 at x = x_0 iff x_0 is a local minimum, a local maximum or a saddle point of f.

Analogous to the scalar case, second partial derivatives must be checked to determine the kind of extremum. In practice, one often knows that the function f is either convex or concave.

2 Solving Linear Algebraic Equations

From high school algebra, everyone should know how to solve N coupled linear equations with N unknowns. For example, consider the N=2 case below:

    2x + y = 4
    2x - y = 8

First you'd probably add the two equations to eliminate y and solve for x: 4x = 12 yields x = 3. Then you'd substitute x into one of the equations to solve for y: y = 4 - 6 = -2.

This is easy enough, but it gets somewhat hairy for large N. We can simplify the notation, at least, by rewriting the problem as a matrix equation:

    [ 2   1 ] [ x ]   [ 4 ]
    [ 2  -1 ] [ y ] = [ 8 ]
    \_______/
        A

Now, assuming that A is invertible, we can write the solution as

    [ x ]          [ 4 ]         [ 1   1 ] [ 4 ]   [  3 ]
    [ y ] = A^{-1} [ 8 ] = (1/4) [ 2  -2 ] [ 8 ] = [ -2 ]

While this looks like a cleaner approach, in reality it just pushes the work into the computation of A^{-1}. And in fact, the basic methods of matrix inversion use back-substitution algorithms which are similar to the eliminate-and-substitute method we used above.

Still, this notation shows us something. Every time we compute the inverse of a full-rank matrix A, we have essentially solved the whole class of linear equations, Ax = y, for any y. The SVD of A makes the geometry of the situation clear:

    Ax = y
    UΣV^T x = y
    Σ (V^T x) = U^T y
    Σ x̃ = ỹ,        where x̃ = V^T x and ỹ = U^T y
    x̃_i = ỹ_i / σ_i                                                    (1)
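
Equation 1 is easy to try numerically. The following numpy sketch (an illustration added here, not part of the original notes) solves the 2x2 example above by rotating y into the SVD basis, dividing by the singular values, and rotating back.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [2.0, -1.0]])
    y = np.array([4.0, 8.0])

    U, s, Vt = np.linalg.svd(A)

    y_tilde = U.T @ y        # rotate y into the SVD basis:  y~ = U^T y
    x_tilde = y_tilde / s    # scale:                        x~_i = y~_i / sigma_i
    x = Vt.T @ x_tilde       # rotate back:                  x = V x~

    print(x)                        # approximately [ 3. -2.]
    print(np.allclose(A @ x, y))    # True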

Of course not all sets of N equations with N unknowns have a solution. An example will illustrate the problem that arises.

Example 1

    [ 1  1 ] [ x_1 ]   [ y_1 ]
    [ 2  2 ] [ x_2 ] = [ y_2 ]

Express the matrix in terms of its SVD, UΣV^T:

    (1/√5) [ 1  -2 ] [ √10  0 ] (1/√2) [  1  1 ]     [ y_1 ]
           [ 2   1 ] [  0   0 ]        [ -1  1 ] x = [ y_2 ]

Invert U and combine Σ and V^T:

    √5 [ x_1 + x_2 ]          [  y_1 + 2y_2 ]
       [     0     ] = (1/√5) [ -2y_1 + y_2 ]

Clearly, a solution only exists when y_2 = 2y_1, i.e. when y lives in the range of A. The problem here is that A is deficient, i.e. not full-rank.

Consider the general case where A, NxN, has rank M. This means that A has N-M zero-valued SVs:

    Σ = [ Σ_M  0 ]       where  Σ_M = diag(σ_1, ..., σ_M)
        [  0   0 ]

If we try to solve the equations as in Equation 1, we hit a snag:

    Ax = y
    UΣV^T x = y
    [ Σ_M  0 ] x̃ = ỹ
    [  0   0 ]
    x̃_i = ỹ_i / σ_i    for i ≤ M only

In the SVD-rotated space, we can only access the first M elements of the solution. When a square matrix A is deficient, the columns are linearly dependent, meaning that we really only have M < N (independent) unknowns. But we still have N equations/constraints. In the next two sections we'll further explore the general problem of N equations, M unknowns. We'll start with the case of under-constrained equations, M > N, and then return to over-constrained equations in Section 4.
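
Before moving on, here is a small numpy check (illustrative only; the helper solvable is not from the notes) of the range condition from Example 1: Ax = y is solvable exactly when appending y to A does not increase the rank, i.e. when y lies in the range of A.

    import numpy as np

    # The rank-deficient matrix from Example 1.
    A = np.array([[1.0, 1.0],
                  [2.0, 2.0]])

    def solvable(A, y, tol=1e-10):
        """Ax = y has an exact solution iff y lies in the range of A,
        i.e. appending y as a column does not increase the rank."""
        return np.linalg.matrix_rank(np.column_stack([A, y]), tol) == \
               np.linalg.matrix_rank(A, tol)

    print(solvable(A, np.array([1.0, 2.0])))   # True:  y_2 = 2 y_1
    print(solvable(A, np.array([1.0, 0.0])))   # False: y is not in the range of A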

3 Under-constrained Linear Algebraic Equations

If there are more unknowns M than equations N, the problem is under-constrained or ill-posed:

    A x = y,      A: NxM,  x: Mx1,  y: Nx1      (N < M)                 (2)

Definition 3 The row-rank of an NxM matrix is the dimension of the subspace of R^M spanned by its N rows. A matrix is said to have full row-rank if its row-rank is N, i.e. if the rows are a linearly independent set. Clearly this can only be the case if N ≤ M.

The row-rank of a matrix is equal to its rank, i.e. the number of non-zero SVs. In this section, we will address problems of the form of Equation 2 where A has full row-rank, i.e. the first N singular values are non-zero.

Example Consider a simple example with 1 equation and 2 unknowns: x_1 + x_2 = 4. There are an infinite number of solutions to this equation, and they lie along a line in R^2.

3.1 A subspace of solutions

We can determine the solution space for the general case of Equation 2 using the SVD of A:

    A x = U [ Σ_N  0 ] V^T x = y                                        (3)

where U is NxN, [ Σ_N  0 ] is NxM, V^T is MxM, x is Mx1, and y is Nx1. Using our previous results on SVD, we can rewrite Equation 3 as

    Σ_{i=1}^{N} σ_i u_i (v_i^T x) = y                                   (4)

In other words, the only part of x that matters is the component that lies in the N-dimensional subspace of R^M spanned by the first N columns of V. Thus, the addition of any component that lies in the null-space of A will make no difference: if x is any solution to Equation 3, so is x + Σ_{i=N+1}^{M} α_i v_i, for any {α_i}.
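
A short numpy sketch makes this concrete (an illustration added here; the 2x3 matrix is an arbitrary full row-rank example): the columns of V beyond the first N span the null-space of A, and adding any multiple of them to a solution leaves Ax unchanged.

    import numpy as np

    # Arbitrary under-constrained example: N=2 equations, M=3 unknowns, full row-rank.
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 2.0]])
    y = np.array([2.0, 3.0])

    U, s, Vt = np.linalg.svd(A)         # full SVD: Vt is 3x3
    x0 = np.linalg.pinv(A) @ y          # one particular solution of Ax = y
    v_null = Vt[2]                      # v_3, the column of V beyond the rank

    print(np.allclose(A @ v_null, 0))   # True: v_3 spans the null-space of A
    print(A @ (x0 + 5.0 * v_null))      # still [2. 3.]: the null-space part is invisible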

Example Consider again the equation x_1 + x_2 = 4:

    [ 1  1 ] x = 4

Replacing the matrix with its SVD, UΣV^T:

    [ 1 ] [ √2  0 ] (1/√2) [  1  1 ] x = 4
                           [ -1  1 ]

The null-space of A = [ 1  1 ] is given by

    α [  1 ]
      [ -1 ]

where α is a free scalar variable. Since x_1 = x_2 = 2 is one solution, the class of solutions is given by:

    [ 2 ]     [  1 ]
    [ 2 ] + α [ -1 ]

3.2 Regularization

In general, under-constrained problems can be made well-posed by the addition of a regularizer, i.e. a cost-function that we'd like the solution to minimize. In the case of under-constrained linear equations, we know that the solution space lies in an (M-N)-dimensional subspace of R^M. One obvious regularizer would be to pick the solution that has the minimum square norm, i.e. the solution that is closest to the origin. The new, well-posed version of the problem can now be stated as follows:

    Find the vector x which minimizes x^T x, subject to the constraint Ax = y.      (5)

In order to solve Equation 5, we will need to make use of the following:

Fact 3 If A, NxM, N ≤ M, has full row-rank, then AA^T is invertible.

Proof:

    A = U [ Σ_N  0 ] V^T,

where Σ_N is invertible. Therefore,

    AA^T = U [ Σ_N  0 ] V^T V [ Σ_N ] U^T = U Σ_N Σ_N U^T = U Σ_N^2 U^T,
                              [  0  ]

where Σ_N^2 is the NxN diagonal matrix with σ_i^2 on the ith diagonal. Since Σ_N has no zero elements on the diagonal, neither does Σ_N^2. Therefore AA^T is invertible (and symmetric, positive-definite).

It is also worth noting here that since each matrix in the last equation above is invertible, we can write down the SVD (and eigenvector decomposition) of (AA^T)^{-1} by inspection: (AA^T)^{-1} = U Σ_N^{-2} U^T.

We will now solve the problem stated in Equation 5 using Lagrange multipliers. We assume that A has full row-rank. Let

    H = x^T x + λ^T (Ax - y).

The solution is found by solving the equation ∂H/∂x = 0 (see Section 1.2) and then ensuring that the constraint (Ax = y) holds. First solve for x:

    ∂H/∂x = 2x^T + λ^T A = 0
    x = -(1/2) A^T λ                                                    (6)

Now, using the fact that AA^T is invertible, choose λ to ensure that the original equation holds:

    Ax = y
    A ( -(1/2) A^T λ ) = y
    λ = -2 (AA^T)^{-1} y                                                (7)

Finally, substitute Equation 7 into Equation 6 to get an expression for x:

    x = A^T (AA^T)^{-1} y                                               (8)
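
As a sanity check on Equations 6-8, the numpy sketch below (an illustration added here; it reuses the arbitrary 2x3 example from the earlier sketch) computes λ and then x, verifies that the constraint Ax = y holds, and confirms that adding a null-space component only makes the solution longer.

    import numpy as np

    # Same arbitrary under-constrained example as before: 2 equations, 3 unknowns.
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 2.0]])
    y = np.array([2.0, 3.0])

    # Equation 7: lambda = -2 (A A^T)^{-1} y;  Equation 6: x = -(1/2) A^T lambda.
    lam = -2.0 * np.linalg.solve(A @ A.T, y)
    x = -0.5 * A.T @ lam                           # equals A^T (A A^T)^{-1} y, Equation 8

    print(np.allclose(A @ x, y))                   # True: the constraint holds
    print(np.allclose(x, np.linalg.pinv(A) @ y))   # True: matches numpy's pseudo-inverse

    # Any other solution differs by a null-space component and is longer.
    v_null = np.linalg.svd(A)[2][2]                # v_3, spans the null-space of A
    x_other = x + 2.0 * v_null
    print(np.linalg.norm(x) < np.linalg.norm(x_other))   # True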

3.3 The right pseudo-inverse

The MxN matrix which pre-multiplies y in Equation 8 is called the right pseudo-inverse of A:

    A_R^+ = A^T (AA^T)^{-1}.

Why the strange name? Because

    A A_R^+ = AA^T (AA^T)^{-1} = I,

but A_R^+ A is generally not equal to I. (A_R^+ A = I iff A is square and invertible, in which case A_R^+ = A^{-1}.)

Fact 4 Let A be an NxM matrix, N < M, with full row-rank. Then the pseudo-inverse of A projects a vector y from the range of A (= R^N) into the N-dimensional sub-space of R^M spanned by the rows of A (equivalently, by the first N columns of V):

    A_R^+ = A^T (AA^T)^{-1}
          = V [ Σ_N ] U^T ( U Σ_N^{-2} U^T )
              [  0  ]
          = V [ Σ_N Σ_N^{-2} ] U^T
              [      0       ]
          = V [ Σ_N^{-1} ] U^T
              [    0     ]
          = Σ_{i=1}^{N} σ_i^{-1} v_i u_i^T                              (9)

One should compare the sum of outer products in Equation 9 with the one describing A in Equation 4. This comparison drives home the geometric interpretation of the action of A_R^+. Fact 4 means that the solution to the regularized problem of Equation 5, x = A_R^+ y, is the unique solution that is completely orthogonal to the null-space of A. This should make good sense: in Section 3.1 we found that any component of the solution that lies in the null-space is irrelevant, and the problem defined in Equation 5 was to find the smallest solution vector.

Finally, then, we can write an explicit expression for the complete space of solutions to Ax = y, for A, NxM, with full row-rank:

    x = A_R^+ y + Σ_{i=N+1}^{M} α_i v_i,    for any {α_i}.

Example Consider one last time the equation [ 1  1 ] x = 4. In this case, the cost-function of Section 3.2 is x_1^2 + x_2^2. Inspection shows that the optimal solution is x_1 = x_2 = 2. We can confirm this using Equation 8. The right pseudo-inverse of A is

    A_R^+ = [ 1 ] ( [ 1  1 ] [ 1 ] )^{-1} = [ 0.5 ]
            [ 1 ]            [ 1 ]          [ 0.5 ]

so the solution is

    x = [ 0.5 ] 4 = [ 2 ]
        [ 0.5 ]     [ 2 ]
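
Equation 9 gives a direct SVD construction of the right pseudo-inverse. The numpy sketch below (illustrative; the 2x4 matrix is an arbitrary full row-rank example) builds A_R^+ both from Equation 8 and as the sum of outer products in Equation 9, and checks that A A_R^+ = I.

    import numpy as np

    # Arbitrary full row-rank example: 2 equations, 4 unknowns.
    A = np.array([[1.0, 0.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0, 3.0]])

    # Right pseudo-inverse, Equation 8 form: A^T (A A^T)^{-1}.
    A_plus = A.T @ np.linalg.inv(A @ A.T)

    # Same thing built from the SVD as in Equation 9: sum_i v_i u_i^T / sigma_i.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # economy SVD: first N columns of V
    A_plus_svd = sum(np.outer(Vt[i], U[:, i]) / s[i] for i in range(len(s)))

    print(np.allclose(A_plus, A_plus_svd))      # True
    print(np.allclose(A @ A_plus, np.eye(2)))   # True: A A_R^+ = I, but A_R^+ A != I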

4 Linear Regression as Over-constrained Linear Algebraic Equations

We now consider the over-constrained case, where there are more equations than unknowns (N > M). Although the results presented are generally applicable, for discussion purposes, we focus on a particular common application: linear regression.

Consider an experimental situation in which you measure the value of some quantity, y, which you believe depends on a set of M predictor variables, z. For example, y could be the number of spikes recorded from a sensory neuron in response to a stimulus with parameters z. Suppose that you have a hypothesis that y depends linearly on the elements of z,

    y = b_1 z_1 + ... + b_M z_M = z^T b,

and you want to find the best set of model coefficients, b. The first step is to repeat the experiment N > M times, yielding a data set {y_i, z_i}, i = 1, ..., N. In an ideal, linear, noise-free world, the data would satisfy the set of N linear equations

    z_i^T b = y_i,    i = 1, ..., N.

These equations can be rewritten in matrix form:

    [ z_1^T ]       [ y_1 ]
    [  ...  ] b  =  [ ... ]
    [ z_N^T ]       [ y_N ]

    Z b = y,      Z: NxM,  b: Mx1,  y: Nx1      (N > M)                 (10)

Of course for any real data, no exact solution exists. Rather, we will look for the model parameters b that give the smallest summed squared model-error:

    Find the vector b which minimizes E = Σ_{i=1}^{N} (z_i^T b - y_i)^2 = (Zb - y)^T (Zb - y).      (11)

Definition 4 The column-rank of an NxM matrix is the dimension of the subspace of R^N spanned by its M columns. A matrix is said to have full column-rank if its column-rank is M, i.e. if the columns are a linearly independent set. Clearly this can only be the case if N ≥ M. The column-rank of a matrix is equal to its row-rank and its rank, and all three equal the number of non-zero SVs.

Fact 5 If Z, NxM, N ≥ M, has full column-rank, then Z^T Z is invertible.

This fact follows directly from Fact 3, with Z^T replacing A. From the previous result, we have that if

    Z = U [ Σ_M ] V^T,
          [  0  ]

then

    Z^T Z = V Σ_M^2 V^T,    and    (Z^T Z)^{-1} = V Σ_M^{-2} V^T.

We will now solve the problem stated in Equation 11. Note that the route to the solution will make use of Fact 5, and so the solution will only exist if Z has full column-rank.
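
To make Equations 10 and 11 concrete, here is a small simulated regression problem in numpy (entirely illustrative; the coefficients b_true, the sample size, and the noise level are made up), which stacks the predictors into Z and evaluates the squared error E for candidate coefficient vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 50, 3                                  # many more observations than coefficients

    b_true = np.array([1.0, -2.0, 0.5])           # hypothetical true coefficients
    Z = rng.normal(size=(N, M))                   # predictor matrix, one z_i^T per row
    y = Z @ b_true + 0.1 * rng.normal(size=N)     # noisy measurements y_i = z_i^T b + noise

    def E(b):
        r = Z @ b - y
        return r @ r                              # (Zb - y)^T (Zb - y), Equation 11

    print(E(b_true), E(np.zeros(M)))              # the true b gives a much smaller error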

In the case of regression, this shouldn't be a problem as long as the data are independently collected and N is sufficiently larger than M.

We begin by rewriting the error,

    E = (Zb - y)^T (Zb - y) = b^T Z^T Z b - 2 y^T Z b + y^T y.

Following Section 1.2, we find the minimum-error b by solving the equation ∂E/∂b = 0:

    ∂E/∂b = 2 b^T Z^T Z - 2 y^T Z = 0
    Z^T Z b = Z^T y
    b = (Z^T Z)^{-1} Z^T y                                              (12)

Equation 12 is the familiar formula for linear regression, which will be derived in a more standard way later in this course.

The MxN matrix which pre-multiplies y in Equation 12 is called the left pseudo-inverse of Z:

    Z_L^+ = (Z^T Z)^{-1} Z^T

Fact 6 Let Z be a matrix, NxM, N > M, with full column-rank. Then the pseudo-inverse of Z projects a vector y in R^N into R^M by discarding any component of y which lies outside (i.e. orthogonal to) the M-dimensional subspace of R^N spanned by the columns of Z:

    Z_L^+ = (Z^T Z)^{-1} Z^T
          = ( V Σ_M^{-2} V^T ) ( V [ Σ_M  0 ] U^T )
          = V [ Σ_M^{-2} Σ_M   0 ] U^T
          = V [ Σ_M^{-1}  0 ] U^T
          = Σ_{i=1}^{M} σ_i^{-1} v_i u_i^T
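
Closing the loop on Equation 12 and Fact 6, the numpy sketch below (illustrative; it regenerates the simulated data from the previous sketch) recovers the coefficients with the left pseudo-inverse, with the economy SVD, and with numpy's least-squares solver, and confirms that all three agree.

    import numpy as np

    # Simulated over-constrained data, as in the previous sketch (hypothetical values).
    rng = np.random.default_rng(0)
    b_true = np.array([1.0, -2.0, 0.5])
    Z = rng.normal(size=(50, 3))
    y = Z @ b_true + 0.1 * rng.normal(size=50)

    # Equation 12: b = (Z^T Z)^{-1} Z^T y  (the left pseudo-inverse applied to y).
    b_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

    # Fact 6: the same estimate via the economy SVD, b = V Sigma_M^{-1} U^T y.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    b_svd = Vt.T @ ((U.T @ y) / s)

    # numpy's least-squares solver solves the same problem.
    b_lstsq = np.linalg.lstsq(Z, y, rcond=None)[0]

    print(np.allclose(b_hat, b_svd), np.allclose(b_hat, b_lstsq))   # True True
    print(b_hat)                                                    # close to b_true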