Notes on the Singular Value Decomposition

Transcription

1 Notes on the Singular Value Decomposition Robert A van de Geijn The University of Texas at Austin Austin, TX 7871 September 11, 14 NOTE: I have not thoroughly proof-read these notes!!! We recommend the reader review Weeks 9-11 of Linear Algebra: Foundations to Frontiers - Notes to LAFF With 1 Orthogonality and Unitary Matrices Definition 1 Let u, v C m These vectors are orthogonal perpendicular if u v Definition Let q, q 1,, q n 1 C m These vectors are said to be mutually orthonormal if for all i, j < n { qi 1 if i j q j otherwise Notice that for n vectors of length m to be mutually orthonormal, n must be less than or equal to m This is because n mutually orthonormal vectors are linearly independent and there can be at most m linearly independent vectors of length m Definition 3 Let Q C m n with n m Then Q is said to be an orthonormal matrix if Q Q I the identity Exercise 4 Let Q C m n with n m Partition Q q q 1 q n 1 Show that Q is an orthonormal matrix if and only if q, q 1,, q n 1 are mutually orthonormal Definition 5 Let Q C m m Then Q is said to be a unitary matrix if Q Q I the identity Notice that unitary matrices are always square and only square matrices can be unitary Sometimes the term orthogonal matrix is used instead of unitary matrix, especially if the matrix is real valued Exercise 6 Let Q C m m Show that if Q is unitary then Q 1 Q and QQ I Exercise 7 Let Q, Q 1 C m m both be unitary Show that their product, Q Q 1, is unitary Exercise 8 Let Q, Q 1,, Q k 1 C m m all be unitary Show that their product, Q Q 1 Q k 1, is unitary The following is a very important observation: Let Q be a unitary matrix with Q q q 1 q m 1 1

2 Let x C m Then x QQ x q q 1 q m 1 q q 1 q m 1 x q q 1 q m 1 q q 1 q m 1 q q 1 q m 1 q x q 1 x q m 1x x q xq + q 1 xq q m 1xq m 1 What does this mean? The vector x χ χ 1 gives the coefficients when the vector x is written as a linear combination χ m 1 of the unit basis vectors: x χ e + χ 1 e χ m 1 e m 1 The vector Q x q x q 1 x q m 1x gives the coefficients when the vector x is written as a linear combination of the orthonormal vectors q, q 1,, q m 1 : x q xq + q 1 xq q m 1xq m 1 The vector q i xq i equals the component of x in the direction of vector q i Another way of looking at this is that if q, q 1,, q m 1 is an orthonormal basis for C m, then any x C m can be written as a linear combination of these vectors: Now, x α q + α 1 q α m 1 q m 1 q i x q i α q + α 1 q α i 1 q i 1 + α i q i + α i+1 q i α m 1 q m 1 α qi q α i + α 1 q i q α i 1 q i q i 1 + α i q i q i 1 + α i+1 q i q i α m 1 q i q m 1 Thus q i x α i, the coefficient that multiplies q i

3 Exercise 9 Let U C m m be unitary and x C m, then Ux x Exercise 1 Let U C m m and V C n n be unitary matrices and A C m n Then UA AV A Exercise 11 Let U C m m and V C n n be unitary matrices and A C m n Then UA F AV F A F Toward the SVD Lemma 1 Given A C m n there exists unitary U C m m, unitary V C n n, and diagonal D R m n DT such that A UDV L where D with D T L diagδ,, δ r 1 and δ i > for i < r Proof: First, let us observe that if A the zero matrix then the theorem trivially holds: A UDV where U I m m, V I n n, and D, so that D T L is Thus, wlog assume that A We will prove this for m n, leaving the case where m n as an exercise, employing a proof by induction on n Base case: n 1 In this case A a where a R m is its only column By assumption, a Then A a u a 1 where u a / a Choose U 1 C m m 1 so that U u U 1 is unitary Then A a u a 1 where D T L δ a and V 1 u U 1 a 1 UDV Inductive step: Assume the result is true for all matrices with 1 k < n columns Show that it is true for matrices with n columns Let A C m n with n Wlog, A so that A Let δ and v C n have the property that v 1 and δ Av A In other words, v is the vector that maximizes max x 1 Ax Let u Av /δ Note that u 1 Choose U 1 C m m 1 and V 1 C n n 1 so that Ũ u U 1 and Ṽ v V 1 are unitary Then Ũ AṼ u U 1 A v V 1 u Av u AV 1 δ u u u AV 1 U1 Av U1 AV 1 δu1 u U1 AV 1 δ w B where w V1 A u and B U1 AV 1 Now, we will argue that w, the zero vector of appropriate size: δ w δ A U AV max x 3 U AV x x max x B x x,

4 δ w B δ w δ w δ + w w δ + w w δ + w w Thus δ δ + w h which means that w and Ũ AṼ δ + w w Bw δ w δ B By the induction hypothesis, there exists unitary Ǔ Cm 1 m 1, unitary ˇV C n 1 n 1, and Ď R m 1 n 1 such that B ǓĎ ˇV where Ď ĎT L with ĎT L diagδ 1,, δ r 1 Now, let U Ũ 1, V Ǔ Ṽ 1 δ, and D ˇV Ď There are some really tough to see checks in the definition of U, V, and D!! Then A UDV where U, V, and D have the desired properties By the Principle of Mathematical Induction the result holds for all matrices A C m n with m n Exercise 13 Let D diagδ,, δ n 1 Show that D max n 1 i δ i Exercise 14 Let A AT Use the SVD of A to show that A A T Exercise 15 Assume that U C m m and V C n n be unitary matrices Let A, B C m n with B UAV Show that the singular values of A equal the singular values of B σ Exercise 16 Let A C m n with A and assume that A σ Show that B A B int: Use the SVD of B Exercise 17 Prove Lemma 1 for m n You can use the following as an outline for your proof: Proof: First, let us observe that if A the zero matrix then the theorem trivially holds: A UDV where U I m m, V I n n, and D, so that D T L is Thus, wlog assume that A We will employ a proof by induction on m Base case: m 1 In this case A where â T R1 n is its only row By assumption, â T Then A â T â T â T 1 v 4

5 where v â T / â T Choose V 1 C n n 1 so that V v V 1 is unitary Then A â T 1 â T v V 1 UDV where D T L δ â T and U 1 Inductive step: Similarly modify the inductive step of the proof of the theorem By the Principle of Mathematical Induction the result holds for all matrices A C m n with m n 3 The Theorem Theorem 18 Singular Value Decomposition Given A C m n there exists unitary U C m m, unitary ΣT V C n n, and Σ R m n such that A UΣV L where Σ with Σ T L diagσ,, σ r 1 and σ σ 1 σ r 1 > The σ,, σ r 1 are known as the singular values of A Proof: Notice that the proof of the above theorem is identical to that of Lemma 1 owever, thanks to the above exercises, we can conclude that B σ in the proof, which then can be used to show that the singular values are found in order Proof:Alternative An alternative proof uses Lemma 1 to conclude that A UDV If the entries on the diagonal of D are not ordered from largest to smallest, then this can be fixed by permuting the rows and columns of D, and correspondingly permuting the columns of U and V 4 Geometric Interpretation Again We will now quickly illustrate what the SVD Theorem tells us about matrix-vector multiplication linear transformations by examining the case where A R Let A UΣV T be its SVD decomposition Notice that all matrices are now real valued, and hence V V T Partition A u u 1 σ σ 1 v v 1 T Since U and V are unitary matrices, {u, u 1 } and {v, v 1 } form orthonormal bases for the range and domain of A, respectively: 5

6 R : Domain of A R : Range codomain of A Let us manipulate the decomposition a little: [ σ T σ A u u 1 v v 1 u u 1 σ 1 σ 1 T σ u σ 1 u 1 v v 1 ] v v 1 T Now let us look at how A transforms v and v 1 : T Av σ u σ 1 u 1 v v 1 v σ u σ 1 u 1 1 σ u and similarly Av 1 σ 1 u 1 This motivates the pictures R : Domain of A R : Range codomain of A 6

7 Now let us look at how A transforms any vector with Euclidean unit length Notice that x means that x χ e + χ 1 e 1, where e and e 1 are the unit basis vectors Thus, χ and χ 1 are the coefficients when x is expressed using e and e 1 as basis owever, we can also express x in the basis given by v and v 1 : T x V }{{ V T v } x v v 1 v v 1 x T x v v 1 v1 T x I v T T α x v + v1 x v 1 α v + α v 1 v v 1 α 1 α α 1 χ χ 1 Thus, in the basis formed by v and v 1, its coefficients are α and α 1 Now, Ax σ u σ 1 u 1 σ u σ 1 u 1 α α 1 v v 1 T x σ u σ 1 u 1 α σ u + α 1 σ 1 u 1 v v 1 T v v 1 α α 1 This is illustrated by the following picture, which also captures the fact that the unit ball is mapped to an ellipse 1 with major axis equal to σ A and minor axis equal to σ 1 : R : Domain of A R : Range codomain of A Finally, we show the same insights for general vector x not necessarily of unit length 1 It is not clear that it is actually an ellipse and this is not important to our observations 7

8 R : Domain of A R : Range codomain of A Another observation is that if one picks the right basis for the domain and codomain, then the computation Ax simplifies to a matrix multiplication with a diagonal matrix Let us again illustrate this for nonsingular A R with A u u 1 } {{ } U σ } σ 1 {{ } Σ v v 1 } {{ } V Now, if we chose to express y using u and u 1 as the basis and express x using v and v 1 as the basis, then ŷ ψ UU }{{ T } y u T yu + u T 1 yu 1 ψ 1 I x V }{{ V T } x v T xv + v1 T xv 1 I χ χ 1 T If y Ax then U U T y ŷ UΣV T x Ax UΣ x so that ŷ Σ x and ψ ψ 1 These observation generalize to A C m m σ χ σ 1 χ 1 5 Consequences of the SVD Theorem Throughout this section we will assume that 8

9 A UΣV is the SVD of A C m n, with U and V unitary and Σ diagonal ΣT L Σ where Σ T L diagσ,, σ r 1 with σ σ 1 σ r 1 > U U L U R with U L C m r V V L V R with V L C n r We first generalize the observations we made for A R Let us track what the effect of Ax UΣV x is on vector x We assume that m n Let U u u m 1 and V v v n 1 Let x V V x v v n 1 v v n 1 x v v n 1 v x v n 1x v xv + + v n 1xv n 1 This can be interpreted as follows: vector x can be written in terms of the usual basis of C n as χ e + +χ 1 e n 1 or in the orthonormal basis formed by the columns of V as v xv + +v n 1xv n 1 Notice that Ax Av xv + + v n 1xv n 1 v xav + + v n 1xAv n 1 so that we next look at how A transforms each v i : Av i UΣV v i UΣe i σ i Ue i σ i u i Thus, another way of looking at Ax is Ax v xav + + v n 1xAv n 1 v xσ u + + v n 1xσ n 1 u n 1 σ u v x + + σ n 1 u n 1 v n 1x σ u v + + σ n 1 u n 1 v n 1 x Corollary 19 A U L Σ T L V L This is called the reduced SVD of A Proof: A UΣV U L U R Σ T L V L V R UL Σ T LV L Corollary Let A U L Σ T L VL be the reduced SVD with U L u u r 1, Σ T L diagσ,, σ r 1, and V L v v r 1 Then A σ u v + σ 1 u 1 v1 + + σ r 1 u r 1 vr 1 each term nonzero and an outer product, and hence a rank-1 matrix Proof: We leave the proof as an exercise Corollary 1 CA CU L Proof: 9

10 Let y CA Then there exists x C n such that y Ax by the definition of y CA But then y Ax U L Σ T L V L x z U L z, ie, there exists z C r such that y U L z This means y CU L Let y CU L Then there exists z C r such that y U L z But then y U L z U L Σ T L Σ 1 T L I z U L Σ T L V L V L I Σ 1 T L z A V LΣ 1 T L }{{ z } x Ax so that there exists x C n such that y Ax, ie, y CA Corollary The rank of A is r Proof: The rank of A equals the dimension of CA CU L But the dimension of CU L is clearly r Corollary 3 N A CV R Proof: Let x N A Then x V }{{ V } x V L V R V L V R x I VL V L V x R VR x V L VL x + V R VR x V L V R V L V R x If we can show that VL x then x V Rz where z VR x Assume that V L x Then Σ T L VL x since Σ T L is nonsingular and U L Σ T L VL x since U L has linearly independent columns But that contradicts the fact that Ax U L Σ T L VL x Let x CV R Then x V R z for some z C r and Ax U L Σ T L Corollary 4 For all x C n there exists z CV L such that Ax Az V L V R z Proof: Ax A } V {{ V } I x A V L V R V L V R x A V L V L x + V R V R x AV L V L x + AV R V R x AV L VL x + U L Σ T L VL V R V R x A V L V L x z 1

11 C n C m dim r z CA CU L CA CV L x z + x n y Az dim r y Ax U L Σ T L V L x y x n dim n r N A CV R N A CU R dim m r Figure 1: A pictorial description of how x z + x n is transformed by A C m n into y Ax Az + x n We see that CV L and CV R are orthogonal complements of each other within C n Similarly, CU L and CU R are orthogonal complements of each other within C m Any vector x can be written as the sum of a vector z CV R and x n CV C N A Alternative proof which uses the last corollary: Ax A V L V L x + V R V R x AV L V L x + A V R V R x N A A V L V L x z The proof of the last corollary also shows that Corollary 5 Any vector x C n can be written as x z + x n where z CV L and x n N A CV R Corollary 6 A V L Σ T L UL so that CA CV L and N A CU R The above corollaries are summarized in Figure 1 Theorem 7 Let A C n n be nonsingular Let A UΣV be its SVD Then 1 The SVD is the reduced SVD σ n 1 3 If U u u n 1, Σ diagσ,, σ n 1, and V v v n 1, then A 1 V P T P Σ 1 P T UP T 1 v n 1 v diag,, 1 σ n 1 σ u n 1 u, 11

12 1 1 where P is the permutation matrix such that P x reverses the order of the entries 1 in x Note: for this permutation matrix, P T P In general, this is not the case What is the case for all permutation matrices P is that P T P P P T I 4 A 1 1/σ n 1 Proof: The only item that is less than totally obvious is 3 Clearly A 1 V Σ 1 U The problem is that in Σ 1 the diagonal entries are not ordered from largest to smallest The permutation fixes this Corollary 8 If A C m n has linearly independent columns then A A is invertible nonsingular and A A 1 V L Σ 1 T L V L Proof: Since A has linearly independent columns, A U L Σ T L VL columns and V L is unitary ence is the reduced SVD where U L has n A A U L Σ T L VL U L Σ T L VL V L Σ T LUL U L Σ T L VL V L Σ T L Σ T L VL V L Σ T LVL Since V L is unitary and Σ T L is diagonal with nonzero diagonal entries, tey are both nonsingular Thus VL Σ T LVL V L Σ 1 T L V L I This means A T A is invertible and A T A 1 is as given 6 Projection onto the Column Space Definition 9 Let U L C m k have orthonormal columns The projection of a vector y C m onto CU L is the vector U L x that minimizes y U L x, where x C k We will also call this vector y the component of x in CU L Theorem 3 Let U L C m k have orthonormal columns The projection of y onto CU L is given by U L U L y Proof: The vector U L x that we want must satisfy U L x y min w C k U L w y Now, the -norm is invariant under multiplication by the unitary matrix U U L U R U L x y min w C k U L w y min U U L w y since the two norm is preserved w C k min U L U R UL w y w C k U min L U L w y w C k U R 1

13 U min L U U w C k UR L w L y UR U min L U L w U w C k UR U L y Lw UR y w U min L y w C k UR y w U min L y w C k UR y min w U w C k L y + U R y since min w U L y + w C k U R y u v u + v This is minimized when w UL y Thus, the vector that is closest to y in the space spanned by U L is given by x U L UL y Corollary 31 Let A C m n and A U L Σ T L VL CA is given by U L UL y be its reduced SVD Then the projection of y Cm onto Proof: This follows immediately from the fact that CA CU L Corollary 3 Let A C m n have linearly independent columns Then the projection of y C m onto CA is given by AA A 1 A y Proof: From Corrolary 8, we know that A A is nonsingular and that A A 1 V L Σ T L 1 V L Now, AA A 1 A y U L Σ T L V L V L Σ T L 1 V L U L Σ T L V L y U L Σ T L VL V L I Σ 1 T L Σ 1 T L V L V L I Σ T L U L y U L U L y ence the projection of y onto CA is given by AA A 1 A y Definition 33 Let A have linearly independent columns Then A A 1 A Moore-Penrose generalized inverse of matrix A is called the pseudo-inverse or 7 Low-rank Approximation of a Matrix Theorem 34 Let A C m n have SVD A UΣV and assume A has rank r Partition ΣT L U U L U R, V V L V R, and Σ Σ BR, 13

14 where U L C m k, V L C n k, and Σ T L R k k with k r Then B U L Σ T L VL closest to A in the following sense: is the matrix in Cm n A B min C C m n rankc k A C σ k Proof: First, if B is as defined, then clearly A B σ k : A B U A BV U AV U BV Σ U L U R B V L V ΣT L R Σ BR Σ BR Σ BR σ k ΣT L Next, assume that C has rank t k and A C < A B We will show that this leads to a contradiction The null space of C has dimension at least n k since dimn C + rankc n If x N C then Ax A Cx A C x < σ k x Partition U u u m 1 and V v v n 1 Then Av j σ j u j σ j σ s for j,, k Now, let x be any linear combination of v,, v k : x α v + + α k v k Notice that Then x α v + + α k v k α + α k Ax Aα v + + α k v k α Av + + α k Av k α σ u + + α k σ k u k α σ u + + α k σ k u k α σ + + α k σ k α + + α k σ k so that Ax σ k x In other words, vectors in the subspace of all linear combinations of {v,, v k } satisfy Ax σ k x The dimension of this subspace is k + 1 since {v,, v k } form an orthonormal basis Both these subspaces are subspaces of C n Since their dimensions add up to more than n there must be at least one nonzero vector z that satisfies both Az < σ k z and Az σ k z, which is a contradiction The above theorem tells us how to pick the best approximation with given rank to a given matrix 8 An Application Let Y R m n be a matrix that, for example, stores a picture In this case, the i, j entry in Y is, for example, a number that represents the grayscale value of pixel i, j The following instructions, executed in octave or matlab, generate the picture of Mexican artist Frida Kahlo in Figure top-left The file FridaPNGpng can be found at 14

15 Original picture k 1 k k 5 k 1 k 5 Figure : Multiple pictures as generated by the code 15

16 octave> IMG imread FridaPNGpng ; % this reads the image octave> Y IMG :,:,1 ; octave> imshow Y % this dispays the image Although the picture is black and white, it was read as if it is a color image, which means a m n 3 array of pixel information is stored Setting Y IMG :,:,1 extracts a single matrix of pixel information If you start with a color picture, you will want to approximate IMG :,:,1, IMG :,:,, and IMG :,:,3 separately Now, let Y UΣV T be the SVD of matrix Y Partition, conformally, U U L U R, V V L V R, and Σ ΣT L where U L and V L have k columns and Σ T L is k k so that Σ T L T Y U L U R V L V R Σ BR Σ T L V T L U L U R Σ BR VR T Σ T L VL T U L U R Σ BR VR T U L Σ T L V T L + U R Σ BR V T R Recall that then U L Σ T L VL T is the best rank-k approximation to Y Let us approximate the matrix that stores the picture with U L Σ T L VL T : >> IMG imread FridaPNGpng ; % read the picture >> Y IMG :,:,1 ; >> imshow Y ; % this dispays the image >> k 1; >> [ U, Sigma, V ] svd Y ; >> UL U :, 1:k ; % first k columns >> VL V :, 1:k ; % first k columns >> SigmaTL Sigma 1:k, 1:k ; % TL submatrix of Sigma >> Yapprox uint8 UL * SigmaTL * VL ; >> imshow Yapprox ; Σ BR As one increases k, the approximation gets better, as illustrated in Figure The graph in Figure 3 helps explain The original matrix Y is , with 181, 53 entries When k 1, matrices U, V, and Σ are 387 1, and 1 1, respectively, requiring only 8, 66 entries to be stores 9 SVD and the Condition Number of a Matrix In Notes on Norms we saw that if Ax b and Ax + δx b + δb, then δx x κ A δb b, where κ A A A 1 is the condition number of A, using the -norm Exercise 35 Show that if A C m m is nonsingular, then, 16

17 Figure 3: Distribution of singular values for the picture A σ, the largest singular value; A 1 1/σ m 1, the inverse of the smallest singular value; and κ A σ /σ m 1 If we go back to the example of A R, recall the following pictures that shows how A transforms the unit circle: 17

18 R : Domain of A R : Range codomain of A In this case, the ratio σ /σ n 1 represents the ratio between the major and minor axes of the ellipse on the right 1 An Algorithm for Computing the SVD? It would seem that the proof of the existence of the SVD is constructive in the sense that it provides an algorithm for computing the SVD of a given matrix A C m m Not so fast! Observe that Computing A is nontrivial Computing the vector that maximizes max x 1 Ax is nontrivial Given a vector q computing vectors q,, q m 1 is expensive as we will see when we discuss the QR factorization Towards the end of the course we will discuss algorithms for computing the eigenvalues and eigenvectors of a matrix, and related algorithms for computing the SVD 18