The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression


 Hillary Cobb
 1 years ago
 Views:
Transcription
1 The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonaldiagonalorthogonal type matrix decompositions Every matrix, even nonsquare, has an SVD The SVD contains a great deal of information and is very useful as a theoretical and practical tool ******************************************************************* 1 Preliminaries Unless otherwise indicated, all vectors are column vectors u 1 u R n u 2 = u =. Rn 1 u n 1
2 Definition 1.1 Let u R n, so that u =(u 1,u 2,...u n ) T. The (Euclidean) norm of u is defined as ( n ) 1/2 u 2 = u u u2 n = u 2 j j=1 Definition 1.2 A vector u R n is a unit vector or normalized if u 2 = 1 Definition 1.3 Let A =(a ij ) R m n. The transpose A T (a ji ) R n m. of A is the matrix Example 1.4 ( ) T =
3 Definition 1.5 (Matrix Multiplication) Let A R m n,b R n p. Then the product AB is defined elementwise as (AB) ij = n a ik b kj k=1 and the matrix AB R m p Definition 1.6 Let u, v R n. Then the inner product of u and v, written u, v is defined as n u, v = u j v j = u T v j=1 Note that this notation permits us to write matrix multiplication as entrywise inner products of the rows and columns of the matrices If we denote the i th row of A by i A and the j th column of B by B j we have (AB) ij = ( i A) T,B j = i AB j 3
4 Example 1.7 ( ) = 1 (2) + 1 (0) + 0 (6) 1 (3) + 1 ( 2)+0 ( 3) 3 (2) + 2 (0) + 1 (6) 3 (3) + 2 ( 2)+1 ( 3) = ( ) Definition 1.8 Two vectors u, v R n are orthogonal if u, v = u T v = ( ) v 2 u 1 u 2 u n. v 1 v n = u 1 v 1 + u 2 v u n v n = 0 If u, v are orthogonal and both u 2 =1and v 2 =1, then we say u and v are orthonormal 4
5 Recall that the ndimensional identity matrix is I n = We ll write I for the identity matrix when the size is clear from the context. Definition 1.9 A square matrix Q R n n is orthogonal if Q T Q = I. This definition means that the columns of an orthogonal matrix A are mutually orthogonal unit vectors in R n Alternatively, the columns of A are an orthonormal basis for R n Now Definition 1.9 shows that Q T is the leftinverse of Q 5
6 But since matrix multiplication is associative, Q T is the rightinverse (and hence the inverse) of Q  indeed, let P be a rightinverse of Q (so that QP = I); then (Q T Q)P = Q T (QP ) IP = Q T I P = Q T The SVD is applicable to even nonsquare matrices with complex entries, but for clarity we will restrict our initial treatment to real square matrices ******************************************************************* 2 Structure of the SVD Definition 2.1 Let A R n n. Then the (full) singular value decomposition of A is σ σ 2 0 (V 1 ) T A = UΣV T (V 2 ) T = U 1 U 2 U m 0 σ n (V n ) T 0 0 where U, V are orthogonal matrices and Σ is diagonal The σ i s are the singular values of A, by convention arranged in nonincreasing order σ 1 σ 2 σ n 0; the columns of U are termed left singular vectors of A; the columns of V are called right singular vectors of A 6
7 Since U and V are orthogonal matrices, the columns of each form orthonormal (mutually orthogonal, all of length 1) bases for R n We can use these bases to illuminate the fundamental property of the SVD: For the equation Ax = b, the SVD makes every matrix diagonal by selecting the right bases for the range and domain Let b, x R n such that Ax = b, and expand b in the columns of U and x in the columns of V to get b = U T b, x = V T x. 7
8 Then we have b = Ax U T b = U T Ax = U T (UΣV T )x = (U T U)Σ(V T x) = IΣx = Σx or b = Ax b =Σx Let y R n, then the action of left multiplication of y by A (computing z = Ay) is decomposed by the SVD into three steps z = Ay = (UΣV T ) y = UΣ(V T y) = UΣ c (c := V T y) = Uw (w := Σ c) 8
9 c = V T y is the analysis step, in which the components of y, in the basis of R n given by the columns of V, are computed w =Σc is the scaling step in which the components c i, i {1, 2,...,n} are dilated z = Uw is the synthesis step, in which z is assembled by scaling each of the R n basis vectors u i by w i and summing 9
10 So how do we find the matrices U, Σ, and V in the SVD of some A R n n? Since V T V = I = U T U, A = UΣV T yields AV = U Σ and (1) U T A = ΣV T or, taking transposes A T U = V Σ (2) Or, for each j {1, 2,...,n}, Av j = σ j u j from Equation 1 (3) A T u j = σ j v j from Equation 2 (4) Now we multiply Equation 3 by A T to get 10
11 A T Av j = A T σ j u j = σ j A T u j = σ 2 j v j So the v j s are the eigenvectors of A T A with corresponding eigenvalues σ 2 j Note that (A T A) ij = i AA j or A T A = 1AA 1 1 AA 2 1AA n 2AA 1 2 AA naa 1 naa n (5) A T A is a matrix of inner products of columns of A  often called the Gram matrix of A We ll see the Gram matrix again when considering applications 11
12 Let s do an example: A = = A T = = A T A = To find the eigenvectors v and the corresponding eigenvalues λ for B := A T A, we solve Bx = λx (B λi)x =0 for λ and x 12
13 The standard technique for finding such λ and v is to first note that we are looking for the λ that make the matrix B λi = λ λ λ = 3 λ λ λ singular This is most easily done by solving det(b λi) =0: 3 λ λ λ = (3 λ)(1 λ)(2 λ) 2+λ = λ 3 +6λ 2 10λ +4 = 0 σ 2 1 = λ 1 = 2+ 2 σ 2 2 = λ 2 = 2 σ 2 3 = λ 3 =
14 Now (for a gentle first step) we ll find a vector v 2 so that A T Av 2 =2v 2 We do this by finding a basis for the nullspace of A T A 2I = = Certainly any vector of the form 0 0, t t R, is mapped to zero by A T A 2I 0 So we can set v 2 =
15 To find v 1 we find a basis for the nullspace of A T A (2 + 2)I = which rowreduces ( R2 (1 + 2)R1+R2, then R3 R2 )to So any vector of the form s ( 1+ 2)s 0 is mapped to zero by A T A (2 + 2)I so v 1 = spans the nullspace of AT A λ 1 I, but v 1 =1 15
16 So we set v 1 = v 1 v 1 = We could find v 3 in a similar manner, but in this particular case there s a quicker way... v 3 = (v 1 ) 2 (v 1 ) 1 0 = Certainly v 3 v 2 and by construction v 3 v 1  recall the theorem from linear algebra symmetric matrices must have orthogonal eigenvectors 16
17 We ve found V = v 1 v 2 v 3 = And of course Σ= Now, how do we find U? 17
18 If σ n > 0, Σ is invertible and U = AV Σ 1 So we have U = = ( 2 1) =
19 Figure 1: The columns of A in the unit sphere 19
20 Figure 2: The columns of U in the unit sphere 20
21 Figure 3: The columns of V in the unit sphere 21
22 Figure 4: The columns of Σ in the ellipse formed by Σ acting on the unit sphere by leftmultiplication Figure 5: The columns of AV = UΣ in the ellipse formed by A acting on the unit sphere by leftmultiplication Note that the columns of U and V are orthogonal (as are, of course, the columns of Σ) 22
23 Note that in practice, the SVD is computed more efficiently than by the direct method we used here; usually by (OK, get ready for the gratuitous mathspeak) reducing A to bidiagonal form U 1 BV T 1 by elementary reflectors or Givens rotations and directly computing the SVD of B (= U 2 ΣV T 2 ) then the SVD of A is ( U 1 U 2 )Σ(V T 2 V T 1 ) If σ n =0, then A is singular and the entire process above must be modified slightly but carefully. If r is the rank of A (the number of nonzero rows of the rowechelon form of A) then n r singular values of A are zero (equivalently if there are n r zero rows in the rowechelon form of A), so Σ 1 is not defined, and we define the pseudoinverse Σ + of Σ as Σ + = diag(σ 1 1,σ 1 2,..., σ 1 r, 0,..., 0) 23
24 Thus we can define the first r columns of U via AV Σ + and to complete U we choose any n r orthonormal vectors which are also orthogonal to span{u 1,u 2,...,u r }, via, for example, GramSchmidt Recall that the SVD is defined for even nonsquare matrices In this case, the above process is modified to permit U and V to have different sizes If A R m n, then U R m m Σ R m n V R n n 24
25 In the case m>n: a 11 a 12 a 1n a 21 a 22 a 2n A =..... a m1 a mn = u 11 u 12 u 1n u 1m u 21 u 22 u 2n u 2m..... u m1 u mn u mm σ σ v 11 v 12 v 1n v 0 σ 21 v 22. n v n1 v nn 0 0 or, in another incarnation of the SVD (the reduced SVD) u 11 u 12 u 1n u 21 u 22. A =.... u m1 u mn σ σ σ n v 11 v 12 v 1n v 21 v v n1 v nn where the matrix U is no longer square (so it can t be orthogonal) but still has orthonormal columns 25
26 If m<n: A = a 11 a 12 a 1m a 1n a 21 a 22 a 2n..... a m1 a mn = u 11 u 12 u 1n u 21 u u n1 u nn σ σ σ n 0 0 v 11 v 12 v 1n v 1m v 21 v 22 v 2n v 2m..... v n1 v n2 v nn v nm..... v m1 v mn v mm 26
27 In which case the reduced SVD is A = u 11 u 12 u 1n u 21 u σ σ v 11 v 12 v 1n v 1m v 21 v 22 v 2n v 2m..... u n1 u nn 0 σ n v n1 v n2 v nn v nm ******************************************************************* 3 Properties of the SVD Recall r is the rank of A; the number of nonzero singular values of A range (A) = span {u 1,u 2,...,u r } range (A T ) = span {v 1,v 2,...,v r } null (A) = span {v r+1,v r+2,..., v n } null (A T ) = span {u r+1,u r+2,..., u m } 27
28 For A R n n, det A = n i=1 σ i The SVD of an m n matrix A leads to an easy proof that the image of the unit sphere S n 1 under leftmultiplication by A is a hyperellipse with semimajor axes of length σ 1,σ 2,...,σ n The condition number of an m n matrix A, with m n, is κ(a) = σ 1 σ n Used in numerics, κ(a) is a measure of how close A is to being singular with respect to floatingpoint computation The 2norm of A is A 2 := sup { } Ax 2 x 2 =1 The Frobenius norm of A is A F := ( m i=1 n j=1 ) 1/2 a 2 ij 28
29 We have A 2 = σ 1 and A F = σ σ σ2 n since both matrix norms are invariant under orthogonal transformations (multiplication by orthogonal matrices) Note that although the singular values of A are uniquely determined, the left and right singular vectors are only determined up to a sequence of sign choices for the columns of either U or V So the SVD is not generally unique, there are 2 (max m,n) possible SVD s for a given matrix A If we fix signs for, say, column 1 of V, then the sign for column 1 of U is determined  recall AV = UΣ 29
30 4 Symmetric Orthogonalization For nonsingular A, the matrix L := UV T is called the symmetric orthogonalization of the matrix A L is unique since any sequence of sign choices for the columns of V determines a sequence of signs for the columns of U L ij = U i1 (V T ) 1j + U i2 (V T ) 2j + U i3 (V T ) 3j + + U in (V T ) ni = U i1 V j1 + U i2 V j2 + U i3 V j3 + + U in V jn Like GramSchmidt orthogonalization, it takes as input a linearly independent set (the columns of A) and outputs an orthonormal set 30
31 (Classical) GramSchmidt is unstable due to repeated subtractions; Modifed GramSchmidt remedies this But occasionally we want to disturb the original set of vectors as little as possible Theorem 4.1 Over all orthogonal matrices Q,, A Q F Q = L. is minimized when 31
32 Figure 6: The columns of L := UV T and the columns of A 32
33 5 Applications of the SVD Symmetric Orthogonalization was invented by a Swedish chemist, PerOlov Löwdin, for the purpose of orthogonalizing hybrid electron orbitals Also has application in 4G wireless communication standard, Orthogonal FrequencyDivision Multiplexing (OFDM) Nonorthogonal carrier waves with ideal properties, good timefrequency localization, orthogonalized in this manner have maximal TFlocalization among all orthogonal carriers Carrier waves are continuous (complexvalued) functions and not matrices, but there is an inner product defined for pairs of carrier waves via integration With that inner product, the Gram matrix of the set of carrier waves can be computed 33
34 The symmetrically orthogonalized Gram matrix is then used to provide coefficients for linear combinations of the carrier waves These linear combinations are orthogonal (hence suitable for OFDM) and optimally TFlocalized The SVD also has a natural application to finding the least squares solution to Ax = b (i.e., a vector x with minimal Ax b 2 ) where Ax = b is inconsistent (e.g., A R m n,m>n,r= n) But perhaps the most visually striking property of the SVD comes from an application in image compression 34
35 We can rewrite Σ as σ σ = 0 σ n σ } 0 {{ 0 } Σ σ } {{ 0 } Σ } {{ σ n } Σ n = Σ 1 + Σ Σ n Now consider the SVD A = U 1 U 2 U n ( Σ 1 + Σ Σ n ) (V 1 ) T (V 2 ) T. (V n ) T 35
36 and focus on, say, the first term U 1 U 2 U n ( Σ 1 ) (V 1 ) T (V 2 ) T. (V n ) T = σ 1U (V 1 ) T (V 2 ) T. (V n ) T = σ 1U (V 1 ) T 0. 0 = σ 1 U 1 (V 1 ) T In general UΣ k V T = σ k U k (V k ) T So A = n σ j U j (V j ) T j=1 which is an expression of A as a sum of rankone matrices 36
37 In this representation of A, we can consider partial sums For any k with 1 k n, define A (k) = k σ j U j (V j ) T j=1 This amounts to discarding the smallest n k singular values and their corresponding singular vectors, and storing only the V j s and the s j U j s Theorem 5.1 Among all rankk matrices P, A P F is minimized for P = A (k) Theorem 5.1 says that the k th partial sum of A (n) captures as much of the energy of A as possible 37
38 Example Consider the 320by200pixel image below This is stored as a matrix of grayscale values, between 0 (black) and 1 (white), denoted by A clown We can take the SVD of A clown 38
39 By Theorem 5.1, A (k) clown is the best rankk approximation to A clown, measured by the Frobenius norm Storage required for A (k) clown is a total of ( ) k bytes for storing σ 1u 1 through σ k u k and v 1 through v k = 64, 000 bytes required to store A clown explicitly Now consider the rank20 approximation to the original image, and the difference between the images 39
40 Figure 7: Rank20 approximation A (20) clown and A A(20) clown 40
41 The original image took 64 kb, while the lowrank approximation required ( ) 20 = 10.4 kb, a compression ratio of.1625 The SVD can also make you rich  but that s a topic for another time... For further investigation, see Numerical Linear Algebra by Trefethen Applied Numerical Linear Algebra by Demmel Matrix Analysis by Horn and Johnson Matrix Computations by Golub and van Loan 41
Orthogonal Bases and the QR Algorithm
Orthogonal Bases and the QR Algorithm Orthogonal Bases by Peter J Olver University of Minnesota Throughout, we work in the Euclidean vector space V = R n, the space of column vectors with n real entries
More informationFrom Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
SIAM REVIEW Vol. 51,No. 1,pp. 34 81 c 2009 Society for Industrial and Applied Mathematics From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images Alfred M. Bruckstein David
More informationSketching as a Tool for Numerical Linear Algebra
Foundations and Trends R in Theoretical Computer Science Vol. 10, No. 12 (2014) 1 157 c 2014 D. P. Woodruff DOI: 10.1561/0400000060 Sketching as a Tool for Numerical Linear Algebra David P. Woodruff IBM
More informationA Modern Course on Curves and Surfaces. Richard S. Palais
A Modern Course on Curves and Surfaces Richard S. Palais Contents Lecture 1. Introduction 1 Lecture 2. What is Geometry 4 Lecture 3. Geometry of InnerProduct Spaces 7 Lecture 4. Linear Maps and the Euclidean
More informationA Tutorial on Spectral Clustering
A Tutorial on Spectral Clustering Ulrike von Luxburg Max Planck Institute for Biological Cybernetics Spemannstr. 38, 7276 Tübingen, Germany ulrike.luxburg@tuebingen.mpg.de This article appears in Statistics
More informationCOSAMP: ITERATIVE SIGNAL RECOVERY FROM INCOMPLETE AND INACCURATE SAMPLES
COSAMP: ITERATIVE SIGNAL RECOVERY FROM INCOMPLETE AND INACCURATE SAMPLES D NEEDELL AND J A TROPP Abstract Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect
More informationSubspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at UrbanaChampaign
More informationSemiSimple Lie Algebras and Their Representations
i SemiSimple Lie Algebras and Their Representations Robert N. Cahn Lawrence Berkeley Laboratory University of California Berkeley, California 1984 THE BENJAMIN/CUMMINGS PUBLISHING COMPANY Advanced Book
More informationFoundations of Data Science 1
Foundations of Data Science John Hopcroft Ravindran Kannan Version /4/204 These notes are a first draft of a book being written by Hopcroft and Kannan and in many places are incomplete. However, the notes
More informationHow Bad is Forming Your Own Opinion?
How Bad is Forming Your Own Opinion? David Bindel Jon Kleinberg Sigal Oren August, 0 Abstract A longstanding line of work in economic theory has studied models by which a group of people in a social network,
More informationTHESE. Christof VÖMEL CERFACS
THESE pour obtenir LE TITRE DE DOCTEUR DE L INSTITUT NATIONAL POLYTECHNIQUE DE TOULOUSE Spécialité: Informatique et Télécommunications par Christof VÖMEL CERFACS Contributions à la recherche en calcul
More informationHighRate Codes That Are Linear in Space and Time
1804 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 7, JULY 2002 HighRate Codes That Are Linear in Space and Time Babak Hassibi and Bertrand M Hochwald Abstract Multipleantenna systems that operate
More informationIndexing by Latent Semantic Analysis. Scott Deerwester Graduate Library School University of Chicago Chicago, IL 60637
Indexing by Latent Semantic Analysis Scott Deerwester Graduate Library School University of Chicago Chicago, IL 60637 Susan T. Dumais George W. Furnas Thomas K. Landauer Bell Communications Research 435
More informationAn Efficient and Parallel Gaussian Sampler for Lattices
An Efficient and Parallel Gaussian Sampler for Lattices Chris Peikert April 13, 2011 Abstract At the heart of many recent latticebased cryptographic schemes is a polynomialtime algorithm that, given
More informationDecoding by Linear Programming
Decoding by Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles, CA 90095
More informationFourier Theoretic Probabilistic Inference over Permutations
Journal of Machine Learning Research 10 (2009) 9971070 Submitted 5/08; Revised 3/09; Published 5/09 Fourier Theoretic Probabilistic Inference over Permutations Jonathan Huang Robotics Institute Carnegie
More informationTHE PROBLEM OF finding localized energy solutions
600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Reweighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,
More informationFrom Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose. Abstract
To Appear in the IEEE Trans. on Pattern Analysis and Machine Intelligence From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose Athinodoros S. Georghiades Peter
More informationCompressed Sensing with Coherent and Redundant Dictionaries
Compressed Sensing with Coherent and Redundant Dictionaries Emmanuel J. Candès 1, Yonina C. Eldar 2, Deanna Needell 1, and Paige Randall 3 1 Departments of Mathematics and Statistics, Stanford University,
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289. Compressed Sensing. David L. Donoho, Member, IEEE
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289 Compressed Sensing David L. Donoho, Member, IEEE Abstract Suppose is an unknown vector in (a digital image or signal); we plan to
More informationNew Results on Hermitian Matrix RankOne Decomposition
New Results on Hermitian Matrix RankOne Decomposition Wenbao Ai Yongwei Huang Shuzhong Zhang June 22, 2009 Abstract In this paper, we present several new rankone decomposition theorems for Hermitian
More informationSome Applications of Laplace Eigenvalues of Graphs
Some Applications of Laplace Eigenvalues of Graphs Bojan MOHAR Department of Mathematics University of Ljubljana Jadranska 19 1111 Ljubljana, Slovenia Notes taken by Martin Juvan Abstract In the last decade
More informationGeneralized compact knapsacks, cyclic lattices, and efficient oneway functions
Generalized compact knapsacks, cyclic lattices, and efficient oneway functions Daniele Micciancio University of California, San Diego 9500 Gilman Drive La Jolla, CA 920930404, USA daniele@cs.ucsd.edu
More informationSome notes on group theory.
Some notes on group theory. Eef van Beveren Departamento de Física Faculdade de Ciências e Tecnologia P COIMBRA (Portugal) February Contents Some intuitive notions of groups.. What is a group?..............................
More informationThe Backpropagation Algorithm
7 The Backpropagation Algorithm 7. Learning as gradient descent We saw in the last chapter that multilayered networks are capable of computing a wider range of Boolean functions than networks with a single
More informationA Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching
SIAM REVIEW Vol 46, No 4, pp 647 666 c 2004 Society for Industrial and Applied Mathematics A Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching Vincent D
More informationMODULES OVER A PID KEITH CONRAD
MODULES OVER A PID KEITH CONRAD Every vector space over a field K that has a finite spanning set has a finite basis: it is isomorphic to K n for some n 0. When we replace the scalar field K with a commutative
More informationObject Detection with Discriminatively Trained Part Based Models
1 Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan Abstract We describe an object detection system based on mixtures
More informationControllability and Observability of Partial Differential Equations: Some results and open problems
Controllability and Observability of Partial Differential Equations: Some results and open problems Enrique ZUAZUA Departamento de Matemáticas Universidad Autónoma 2849 Madrid. Spain. enrique.zuazua@uam.es
More informationCommunication Theory of Secrecy Systems
Communication Theory of Secrecy Systems By C. E. SHANNON 1 INTRODUCTION AND SUMMARY The problems of cryptography and secrecy systems furnish an interesting application of communication theory 1. In this
More information