Maria Cameron

1. The least squares fitting using non-orthogonal basis

We have learned how to find the least squares approximation of a function f using an orthogonal basis. For example, f can be approximated by a truncated trigonometric Fourier series or by a truncated series based on orthogonal polynomials. Such series converge fast if f(x) is smooth. Now we switch to the case where f(x) is not necessarily smooth or its values are available only at a given set of points x_0, ..., x_{m-1}.

Let a function f(x) be given by the set of values {f_j, x_j}_{j=0}^{m-1}. The values of f contain noise, which makes interpolation inappropriate for finding f at intermediate points. We want to (i) smooth f and (ii) be able to find its values at intermediate points. Let {φ_i}_{i=0}^{n-1} be a set of basis functions. In order to represent f as their linear combination,

f(x) ≈ Σ_{i=0}^{n-1} c_i φ_i(x),

we need to solve the following m × n system of linear equations:

(1)   \begin{pmatrix}
        \phi_0(x_0) & \phi_1(x_0) & \cdots & \phi_{n-1}(x_0) \\
        \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{n-1}(x_1) \\
        \vdots & \vdots & & \vdots \\
        \phi_0(x_{m-1}) & \phi_1(x_{m-1}) & \cdots & \phi_{n-1}(x_{m-1})
      \end{pmatrix}
      \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{pmatrix}
      =
      \begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_{m-1} \end{pmatrix}.

In the typical case there are many data points and a few basis functions, i.e., m ≫ n.
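As a concrete illustration, here is a minimal Matlab sketch of this setup; the data, the noise level, and the particular basis functions below are illustrative assumptions, not part of the notes.

% Build the m-by-n matrix of Eq. (1) and fit noisy data (illustrative sketch)
m = 200; n = 4;                          % many data points, few basis functions
x = linspace(0,1,m)';                    % sample points x_0,...,x_{m-1}
f = exp(x) + 0.05*randn(m,1);            % noisy values f_0,...,f_{m-1} (made-up data)
phi = {@(t) ones(size(t)), @(t) t, @(t) cos(pi*t), @(t) sin(pi*t)};   % assumed basis
A = zeros(m,n);
for i = 1:n
    A(:,i) = phi{i}(x);                  % column i holds phi_{i-1}(x_j)
end
c = A\f;                                 % least squares solution of the system (1)
fsmooth = A*c;                           % smoothed values of f at the sample points

The backslash operator returns the least squares solution of the overdetermined system; the rest of these notes discuss what that means and how it can be computed reliably.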

The first question is: what is a good set of basis functions? In order to answer this question we need to recall some basic linear algebra concepts.

Definition 1. The norm of a matrix A associated with a vector norm ‖·‖ (here and below, ‖·‖ is the Euclidean 2-norm) is defined as

(2)   ‖A‖ = max_{x ≠ 0} ‖Ax‖ / ‖x‖.

Exercise. Show that if A is a symmetric square n × n matrix then

‖A‖ = ρ(A) ≡ max_i |λ_i|,

where the λ_i, i = 1, ..., n, are its eigenvalues.

Exercise. Show that if A is an m × n matrix, m ≥ n, then ‖A‖ = √(ρ(A^T A)).

The condition number of a square matrix A is defined as κ(A) ≡ ‖A‖ ‖A^{-1}‖.

The condition number allows us to bound the relative error of the solution x in terms of the relative error in the data b. Let A be a square matrix and let b be the right-hand side. Let x and x + δx be the solutions of Ax = b and A(x + δx) = b + δb respectively, where δb is a perturbation of the right-hand side. Then A δx = δb, i.e., δx = A^{-1} δb, hence

‖δx‖ ≤ ‖A^{-1}‖ ‖δb‖.

On the other hand, since Ax = b, we have ‖A‖ ‖x‖ ≥ ‖Ax‖ = ‖b‖, hence ‖x‖ ≥ ‖b‖ / ‖A‖. Therefore,

‖δx‖ / ‖x‖ ≤ ‖A‖ ‖A^{-1}‖ ‖δb‖ / ‖b‖ = κ(A) ‖δb‖ / ‖b‖.

Hence, for a good set of basis functions the condition number of the matrix in Eq. (1) should not be large. For example, the monomials φ_i(x) = x^i are a bad set of basis functions. Do not use them unless n is really small!

Exercise. Calculate the condition numbers of the matrix defined in Eq. (1) for φ_i(x) = x^i and x_j = j/m. Set m = n and plot the graph of κ(A) versus n. The Matlab function that computes the condition number is cond(A).
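A possible sketch for this exercise (the range of n and the use of a log scale are choices made here for illustration):

% Condition number of the matrix (1) for the monomial basis phi_i(x) = x^i,
% sampled at x_j = j/m with m = n.
nvals = 2:20;
kappa = zeros(size(nvals));
for k = 1:length(nvals)
    n = nvals(k); m = n;
    x = (0:m-1)'/m;                  % x_j = j/m
    A = x.^(0:n-1);                  % A(j+1,i+1) = x_j^i (uses implicit expansion)
    kappa(k) = cond(A);              % 2-norm condition number
end
semilogy(nvals, kappa, 'o-');
xlabel('n'); ylabel('\kappa(A)')

The plot shows κ(A) growing extremely fast with n, which is the reason for the warning above.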

Since we are going to deal with overdetermined systems of linear equations whose matrices are m × n with m > n, we need to define the condition number for a rectangular matrix. A very useful concept in matrix theory is the singular value decomposition (SVD). We are going to consider it in detail.

1.1. Singular Value Decomposition.

Theorem 1. Let A be an arbitrary m × n matrix with m ≥ n. Then we can write

A = U Σ V^T,

where U is m × n and U^T U = I_{n×n}, Σ = diag{σ_1, ..., σ_n} with σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0, and V is n × n and V^T V = I_{n×n}. The columns of U, u_1, ..., u_n, are called left singular vectors. The columns of V, v_1, ..., v_n, are called right singular vectors. The numbers σ_1, ..., σ_n are called singular values. If m < n, the SVD is defined for A^T.

The geometric sense of this theorem is the following. Let us view the matrix A as a map from R^n into R^m:

A : R^n → R^m,   x ↦ Ax.

Then one can find orthogonal bases v_1, ..., v_n in R^n and u_1, ..., u_m in R^m, and numbers σ_1, ..., σ_n, such that

A v_j = σ_j u_j,   j = 1, ..., n.

Then for any x ∈ R^n we have: if

x = Σ_{j=1}^n x_j v_j,   then   A x = Σ_{j=1}^n x_j σ_j u_j.

For a rectangular matrix A the condition number is

κ(A) = σ_max / σ_min,

where σ_max and σ_min are the largest and the smallest singular values of A.

Proof. We use induction on m and n: we assume that the SVD exists for (m−1) × (n−1) matrices and prove it for m × n. We assume A ≠ 0; otherwise we can take Σ = 0 and let U and V be arbitrary orthogonal matrices.

The base case occurs when n = 1 (since m ≥ n). We write A = UΣV^T with

U = A / ‖A‖,   Σ = ‖A‖,   V = 1,

where ‖·‖ is the 2-norm.

For the induction step, choose v so that ‖v‖ = 1 and ‖A‖ = ‖Av‖ > 0. Let

u = Av / ‖Av‖,

which is a unit vector. Choose Ũ and Ṽ so that U = [u, Ũ] and V = [v, Ṽ] are m × m and n × n orthogonal matrices, respectively. Now write

U^T A V = \begin{pmatrix} u^T \\ \tilde U^T \end{pmatrix} A \begin{pmatrix} v & \tilde V \end{pmatrix}
        = \begin{pmatrix} u^T A v & u^T A \tilde V \\ \tilde U^T A v & \tilde U^T A \tilde V \end{pmatrix}.

Then

u^T A v = (Av)^T (Av) / ‖Av‖ = ‖Av‖ =: σ

and

Ũ^T A v = Ũ^T u ‖Av‖ = 0.

We claim that u^T A Ṽ = 0 too, because otherwise

σ = ‖A‖ = ‖U^T A V‖ ≥ ‖ [1, 0, ..., 0] U^T A V ‖ = ‖ [σ, u^T A Ṽ] ‖ > σ,

a contradiction. Therefore,

U^T A V = \begin{pmatrix} \sigma & 0 \\ 0 & \tilde U^T A \tilde V \end{pmatrix}
        = \begin{pmatrix} \sigma & 0 \\ 0 & \tilde A \end{pmatrix}.

Now we apply the induction hypothesis to Ã:

Ã = U_1 Σ_1 V_1^T.

Hence

U^T A V = \begin{pmatrix} \sigma & 0 \\ 0 & U_1 \Sigma_1 V_1^T \end{pmatrix}
        = \begin{pmatrix} 1 & 0 \\ 0 & U_1 \end{pmatrix}
          \begin{pmatrix} \sigma & 0 \\ 0 & \Sigma_1 \end{pmatrix}
          \begin{pmatrix} 1 & 0 \\ 0 & V_1 \end{pmatrix}^T,

or

A = \left( U \begin{pmatrix} 1 & 0 \\ 0 & U_1 \end{pmatrix} \right)
    \begin{pmatrix} \sigma & 0 \\ 0 & \Sigma_1 \end{pmatrix}
    \left( V \begin{pmatrix} 1 & 0 \\ 0 & V_1 \end{pmatrix} \right)^T,

which is our desired decomposition.

The SVD has a large number of important algebraic and geometric properties, the most important of which are summarized in the following theorem.

Theorem 2. Let A = UΣV^T be the SVD of the m × n matrix A, m ≥ n.

(1) Suppose A is symmetric and A = UΛU^T is an eigendecomposition of A. Then the SVD of A is UΣV^T, where σ_i = |λ_i| and v_i = u_i sign(λ_i), with sign(0) = 1.

(2) The eigenvalues of the symmetric matrix A^T A are σ_i². The right singular vectors v_i are the corresponding orthonormal eigenvectors.

(3) The eigenvalues of the symmetric matrix A A^T are σ_i² and m − n zeros. The left singular vectors u_i are the corresponding orthonormal eigenvectors for the eigenvalues σ_i². One can take any m − n orthonormal vectors orthogonal to u_1, ..., u_n as eigenvectors for the eigenvalue 0.

(4) If A has full rank, the solution of min_x ‖Ax − b‖ is x = V Σ^{-1} U^T b.

(5) If A is square and nonsingular, then ‖A‖ = σ_1 and ‖A^{-1}‖ = 1/σ_n, so that

κ(A) = ‖A‖ ‖A^{-1}‖ = σ_1 / σ_n.

(6) Suppose σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0. Then

rank(A) = r,
null(A) = {x ∈ R^n : Ax = 0 ∈ R^m} = span(v_{r+1}, ..., v_n),
range(A) = span(u_1, ..., u_r).

(7) A = UΣV^T = Σ_{i=1}^n σ_i u_i v_i^T, i.e., A is a sum of rank 1 matrices. Then a matrix of rank k < n closest to A is

A_k = Σ_{i=1}^k σ_i u_i v_i^T,   and   ‖A − A_k‖ = σ_{k+1}.
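The following short Matlab sketch checks a few of these statements numerically; the random test matrix and the printed quantities are illustrative choices, not part of the notes.

% Numerical check of several statements of Theorem 2 on a random test matrix
m = 8; n = 5;
A = randn(m,n);
[U,S,V] = svd(A,'econ');            % thin SVD: U is m-by-n, S and V are n-by-n
s = diag(S);
% (2): the eigenvalues of A'*A are the squared singular values
disp(norm(sort(eig(A'*A)) - sort(s.^2)))
% (4): the least squares solution via the SVD agrees with backslash
b = randn(m,1);
x_svd = V*(S\(U'*b));
disp(norm(x_svd - A\b))
% condition number of a rectangular matrix: sigma_max/sigma_min (see above)
disp(cond(A) - s(1)/s(end))
% (7): the best rank-k approximation error equals sigma_{k+1}
k = 2;
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';
disp(norm(A - Ak) - s(k+1))

All four printed numbers should be zero up to roundoff.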

Example. This example illustrates the low rank approximation of a large matrix. The original image is shown in Fig. 1(a). The rank 3, 10, and 20 approximations are shown in Figs. 1(b), (c), and (d), respectively.

[Figure 1. Low rank approximations of the image. (a): original; (b): rank 3; (c): rank 10; (d): rank 20.]

The sequence of Matlab commands to create an approximation of rank m for a given image is the following.

>> clear all
>> im=imread('IMG_1413.jpg');
>> [m n k]=size(im)
m = 1600
n = 1200
k = 3
>> mimi=zeros(m,n);
>> mimi=sum(im,3);                       % collapse the three color channels
>> fig=figure;
>> imagesc(mimi)
>> colormap gray
>> set(gca,'DataAspectRatio',[1 1 1])
>> [U S V]=svd(mimi);
>> size(U)
ans = 1600 1600
>> size(V)
ans = 1200 1200
>> size(S)
ans = 1600 1200
>> fig=figure;
>> m=10;                                 % m is reused here as the approximation rank
>> rm=U(:,1:m)*S(1:m,1:m)*V(:,1:m)';     % rank-m approximation, cf. item (7) of Theorem 2
>> imagesc(rm)                           % display the approximation
>> colormap gray
>> set(gca,'DataAspectRatio',[1 1 1])

1.2. The QR decomposition.

1.2.1. Normal equations. Consider the overdetermined system of linear equations

Ax = b,   A is m × n,   m ≥ n.

Definition 2. We say that x* is the least squares solution of Ax = b, where A is m × n, m ≥ n, if

x* = arg min_{x ∈ R^n} ‖Ax − b‖.

Now we will show that x* is given by the formula

(3)   x* = (A^T A)^{-1} A^T b.

Note that x* is the solution of the so-called normal equation A^T A x = A^T b, which is obtained from Ax = b by multiplication by A^T from the left. If the matrix A has full rank, i.e., rank(A) = n, the matrix A^T A is symmetric positive definite.

Write x = x* + e and consider ‖Ax − b‖². We want to show that it is minimal if and only if e = 0, i.e., x = x*:

‖Ax − b‖² = (Ax − b)^T (Ax − b) = (Ax* + Ae − b)^T (Ax* + Ae − b)
          = ‖Ae‖² + ‖Ax* − b‖² + 2 (Ae)^T (Ax* − b)
          = ‖Ae‖² + ‖Ax* − b‖² + 2 e^T (A^T A x* − A^T b)
          = ‖Ae‖² + ‖Ax* − b‖²
          ≥ ‖Ax* − b‖²,

where the cross term vanishes because A^T A x* = A^T b by Eq. (3). Since A has full rank, equality occurs if and only if e = 0, i.e., the norm ‖Ax − b‖ is minimal if and only if x = x* given by Eq. (3).

The geometric sense of the least squares solution is the following: the residual r := Ax* − b is orthogonal to the space spanned by the columns of the matrix A, i.e., r dotted with any column of A is zero, or A^T r = 0.
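As a quick numerical illustration of formula (3) and of this orthogonality property (the test data below are arbitrary and chosen here only for illustration):

% Least squares via the normal equation (3) on random test data
m = 100; n = 4;
A = randn(m,n);                 % full-rank m-by-n matrix with m >> n
b = randn(m,1);
x_ne = (A'*A)\(A'*b);           % solve the normal equation A'*A*x = A'*b
x_bs = A\b;                     % Matlab's least squares solution
disp(norm(x_ne - x_bs))         % the two solutions agree up to roundoff
r = A*x_ne - b;
disp(norm(A'*r))                % A'*r = 0: the residual is orthogonal to range(A)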

1.2.2. Gram-Schmidt Algorithm.

Theorem 3. Let A be m × n, m ≥ n. Suppose that A has full column rank. Then there exist a unique m × n orthogonal matrix Q, i.e., Q^T Q = I_{n×n}, and a unique n × n upper-triangular matrix R with positive diagonal entries r_ii > 0 such that A = QR.

Proof. The proof of this theorem is given by the Gram-Schmidt orthogonalization process.

Algorithm 1: Gram-Schmidt orthogonalization
Input:  matrix A, m × n, rank(A) = n.
Output: orthogonal matrix Q, m × n, with Q^T Q = I_{n×n}, and upper-triangular n × n matrix R with r_ii > 0.
for i = 1, ..., n do
    q_i = a_i;
    for j = 1, ..., i − 1 do
        r_ji = q_j^T a_i   (CGS)   or   r_ji = q_j^T q_i   (MGS);
        q_i = q_i − r_ji q_j;
    end
    r_ii = ‖q_i‖;
    q_i = q_i / r_ii;
end

Here CGS and MGS stand for the Classic Gram-Schmidt and the Modified Gram-Schmidt, respectively. Unfortunately, the Classic Gram-Schmidt algorithm is numerically unstable when the columns of A are nearly linearly dependent. The Modified Gram-Schmidt is better but still can result in a Q that is far from orthogonal (i.e., ‖Q^T Q − I‖ is much larger than the machine ε) when A is ill-conditioned.

Exercise. Show that the least squares solution of Ax = b is given by x = R^{-1} Q^T b, where A = QR is the QR decomposition of A.
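A direct Matlab transcription of Algorithm 1 in its MGS form might look as follows; the function name mgs_qr is a choice made here (save it in its own file mgs_qr.m).

function [Q,R] = mgs_qr(A)
% Modified Gram-Schmidt orthogonalization (Algorithm 1, MGS variant).
% A is m-by-n with full column rank; Q is m-by-n with orthonormal columns,
% R is n-by-n upper triangular with positive diagonal, and A = Q*R.
[m,n] = size(A);
Q = zeros(m,n); R = zeros(n,n);
for i = 1:n
    qi = A(:,i);
    for j = 1:i-1
        R(j,i) = Q(:,j)'*qi;        % MGS: project against the partially orthogonalized column
        qi = qi - R(j,i)*Q(:,j);
    end
    R(i,i) = norm(qi);
    Q(:,i) = qi/R(i,i);
end
end

Replacing Q(:,j)'*qi by Q(:,j)'*A(:,i) in the inner loop gives the CGS variant. A small experiment (the test matrix and sizes are again illustrative assumptions) compares the orthogonality of the computed Q with that of Matlab's built-in qr and uses the factors to solve a least squares problem as in the exercise:

% Loss of orthogonality on an ill-conditioned matrix, and QR-based least squares
m = 50; n = 12;
t = linspace(0,1,m)';
A = t.^(0:n-1);                     % monomial basis phi_i(t) = t^i (cf. the earlier exercise)
b = randn(m,1);
[Q1,R1] = mgs_qr(A);
[Q2,R2] = qr(A,0);                  % economy-size QR, computed via Householder reflections
disp([cond(A), norm(Q1'*Q1 - eye(n)), norm(Q2'*Q2 - eye(n))])
x_qr = R2\(Q2'*b);                  % x = R^{-1} Q^T b, as in the exercise
disp(norm(x_qr - A\b))

Typically the orthogonality error of the Gram-Schmidt Q is several orders of magnitude larger than that of the built-in qr, in line with the remark above.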

1.2.3. Householder reflections. A numerically stable way to perform the QR decomposition is via Householder transformations (reflections).

Definition 3. A Householder transformation (or reflection) is a matrix of the form

P = I − 2 u u^T,   where ‖u‖ = 1.

Exercise. Show that P is both symmetric and orthogonal, i.e., P = P^T and P P^T = I.

The Householder transformation is called a reflection because P x is the reflection of x with respect to the plane normal to u.

Given a vector x, we can find u such that P = I − 2uu^T zeros out all entries of x except the first one, i.e.,

P x = [c, 0, ..., 0]^T = c e_1.

We do it as follows. Write

P x = x − 2u(u^T x) = c e_1.

Hence

u = (x − c e_1) / (2 u^T x),

i.e., u is a linear combination of x and e_1. Since ‖x‖ = ‖P x‖ = |c|, u is parallel to

ũ = x ± ‖x‖ e_1,   and   u = ũ / ‖ũ‖.

One can verify that either choice of sign yields a u satisfying P x = c e_1 as long as ũ ≠ 0. We will stick with ũ = x + sign(x_1) ‖x‖ e_1. In summary, we have

(4)   ũ = \begin{pmatrix} x_1 + \mathrm{sign}(x_1)\,\|x\| \\ x_2 \\ \vdots \\ x_n \end{pmatrix}   with   u = ũ / ‖ũ‖.

Example. We will show how to compute the QR decomposition of a 5 × 4 matrix A using Householder reflections; x denotes a generic nonzero entry.

(1) Choose P_1 so that

A_1 := P_1 A = \begin{pmatrix} x & x & x & x \\ 0 & x & x & x \\ 0 & x & x & x \\ 0 & x & x & x \\ 0 & x & x & x \end{pmatrix}.

(2) Choose

P_2 = \begin{pmatrix} 1 & 0_{1 \times 4} \\ 0_{4 \times 1} & \tilde P_2 \end{pmatrix}   so that   A_2 := P_2 A_1 = \begin{pmatrix} x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \\ 0 & 0 & x & x \\ 0 & 0 & x & x \end{pmatrix}.

(3) Choose

P_3 = \begin{pmatrix} I_{2 \times 2} & 0_{2 \times 3} \\ 0_{3 \times 2} & \tilde P_3 \end{pmatrix}   so that   A_3 := P_3 A_2 = \begin{pmatrix} x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \\ 0 & 0 & 0 & x \\ 0 & 0 & 0 & x \end{pmatrix}.

(4) Choose

P_4 = \begin{pmatrix} I_{3 \times 3} & 0_{3 \times 2} \\ 0_{2 \times 3} & \tilde P_4 \end{pmatrix}   so that   A_4 := P_4 A_3 = \begin{pmatrix} x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \\ 0 & 0 & 0 & x \\ 0 & 0 & 0 & 0 \end{pmatrix}.

Here P̃_2, P̃_3, and P̃_4 are 4 × 4, 3 × 3, and 2 × 2 Householder reflections, respectively.

Now we need to understand how to extract the matrices Q and R. We have A_4 = P_4 P_3 P_2 P_1 A. Therefore,

A = (P_4 P_3 P_2 P_1)^{-1} A_4 = P_1^{-1} P_2^{-1} P_3^{-1} P_4^{-1} A_4.

Since the P_j, j = 1, 2, 3, 4, are symmetric and orthogonal,

A = P_1^{-1} P_2^{-1} P_3^{-1} P_4^{-1} A_4 = P_1^T P_2^T P_3^T P_4^T A_4 = P_1 P_2 P_3 P_4 A_4.

On the other hand, A = QR. Hence R is the first 4 rows of A_4 and Q is the first 4 columns of P_1^T P_2^T P_3^T P_4^T = P_1 P_2 P_3 P_4.

In Matlab, the least squares solution of Ax = b is found by A\b. The QR decomposition per se can be obtained by [Q,R]=qr(A).
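To make the construction above concrete, here is a small Matlab sketch of Householder QR based on Eq. (4); the function name is a choice made here, and the reflections P_j are formed explicitly only for clarity (library routines such as qr apply them without forming them).

function [Q,R] = householder_qr(A)
% QR decomposition by Householder reflections, following Eq. (4).
% Assumes A (m-by-n, m >= n) has full column rank.
[m,n] = size(A);
Q = eye(m);
R = A;
for j = 1:n
    x = R(j:m,j);
    s = sign(x(1)); if s == 0, s = 1; end          % the notes use sign(0) = 1
    utilde = x; utilde(1) = x(1) + s*norm(x);      % Eq. (4), applied to rows j:m
    u = utilde/norm(utilde);
    Pj = eye(m);
    Pj(j:m,j:m) = eye(m-j+1) - 2*(u*u');           % P_j = diag(I_{j-1}, I - 2*u*u')
    R = Pj*R;                                      % A_j := P_j * A_{j-1}
    Q = Q*Pj;                                      % accumulate P_1*P_2*...*P_n
end
Q = Q(:,1:n);                                      % Q: first n columns of P_1*...*P_n
R = triu(R(1:n,:));                                % R: first n rows of A_n
end

For a random 5 × 4 test matrix, norm(A - Q*R) and norm(Q'*Q - eye(4)) should both be at the level of machine precision, and R\(Q'*b) reproduces A\b.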