Diagonalization and Jordan's Theorem

A system of differential equations $x' = Ax$ can be transformed to an uncoupled system $y' = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, y$ by a change of variables $x = Py$, provided $P$ is invertible and $A$ satisfies the relation

(1)    $AP = P\, \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

A matrix $A$ is said to be diagonalizable provided (1) holds. This equation is equivalent to the system of equations $Av_k = \lambda_k v_k$, $k = 1, \ldots, n$, where $v_1, \ldots, v_n$ are the columns of matrix $P$. Since $P$ is assumed invertible, each of its columns is nonzero, and therefore $(\lambda_k, v_k)$ is an eigenpair of $A$, $1 \le k \le n$. The values $\lambda_k$ need not be distinct (e.g., all $\lambda_k = 1$ if $A$ is the identity). This proves:

Theorem 10 (Diagonalization) An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ eigenpairs $(\lambda_k, v_k)$, $1 \le k \le n$, with $v_1, \ldots, v_n$ independent. In this case, $A = PDP^{-1}$, where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and the matrix $P$ has columns $v_1, \ldots, v_n$.

Theorem 11 (Jordan's theorem) Any $n \times n$ matrix $A$ can be represented in the form $A = PTP^{-1}$, where $P$ is invertible and $T$ is upper triangular. The diagonal entries of $T$ are eigenvalues of $A$.

Proof: We proceed by induction on the dimension $n$ of $A$. For $n = 1$ there is nothing to prove. Assume the result for dimension $n$, and let's prove it when $A$ is $(n+1) \times (n+1)$. Choose an eigenpair $(\lambda_1, v_1)$ of $A$ with $v_1 \ne 0$. Complete a basis $v_1, \ldots, v_{n+1}$ for $\mathbb{R}^{n+1}$ and define $V = \mathrm{aug}(v_1, \ldots, v_{n+1})$. Then

$$V^{-1} A V = \begin{pmatrix} \lambda_1 & B \\ 0 & A_1 \end{pmatrix}$$

for some matrices $B$ and $A_1$. The induction hypothesis implies there is an invertible $n \times n$ matrix $P_1$ and an upper triangular matrix $T_1$ such that $A_1 = P_1 T_1 P_1^{-1}$. Let

$$R = \begin{pmatrix} 1 & 0 \\ 0 & P_1 \end{pmatrix}, \qquad T = \begin{pmatrix} \lambda_1 & B P_1 \\ 0 & T_1 \end{pmatrix}.$$

Then $T$ is upper triangular and $(V^{-1} A V) R = R T$, which implies $A = P T P^{-1}$ for $P = V R$. The induction is complete.
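Theorem 10 can be exercised numerically. The following sketch is a minimal illustration assuming NumPy is available; the matrix is a made-up example. It obtains the eigenpair packages from numpy.linalg.eig and checks $A = PDP^{-1}$.

```python
import numpy as np

# A made-up diagonalizable matrix for illustration.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

lam, P = np.linalg.eig(A)      # columns of P are eigenvectors v_1, ..., v_n
D = np.diag(lam)               # D = diag(lambda_1, ..., lambda_n)

# A is diagonalizable when P is invertible; then A = P D P^{-1}.
assert abs(np.linalg.det(P)) > 1e-12
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True
```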
Cayley-Hamilton Identity

A celebrated and deep result for powers of matrices was discovered by Cayley and Hamilton, which says that an $n \times n$ matrix $A$ satisfies its own characteristic equation. More precisely:

Theorem 12 (Cayley-Hamilton) Let $\det(A - \lambda I)$ be expanded as the $n$th degree polynomial

$$p(\lambda) = \sum_{j=0}^{n} c_j \lambda^j,$$

for some coefficients $c_0, \ldots, c_{n-1}$ and $c_n = (-1)^n$. Then $A$ satisfies the equation $p(\lambda) = 0$, that is,

$$p(A) \equiv \sum_{j=0}^{n} c_j A^j = 0.$$

In factored form in terms of the eigenvalues $\{\lambda_j\}_{j=1}^{n}$ (duplicates possible), the matrix equation $p(A) = 0$ can be written as

$$(-1)^n (A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I) = 0.$$

Proof: If $A$ is diagonalizable, $AP = P\, \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, then the proof is obtained from the simple expansion $A^j = P\, \mathrm{diag}(\lambda_1^j, \ldots, \lambda_n^j)\, P^{-1}$, because summing across this identity leads to

$$p(A) = \sum_{j=0}^{n} c_j A^j = P \left( \sum_{j=0}^{n} c_j\, \mathrm{diag}(\lambda_1^j, \ldots, \lambda_n^j) \right) P^{-1} = P\, \mathrm{diag}(p(\lambda_1), \ldots, p(\lambda_n))\, P^{-1} = P\, \mathrm{diag}(0, \ldots, 0)\, P^{-1} = 0.$$

If $A$ is not diagonalizable, then this proof fails. To handle the general case, we apply Jordan's theorem, which says that $A = PTP^{-1}$ where $T$ is upper triangular (instead of diagonal) and the not necessarily distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ appear on the diagonal of $T$. Using Jordan's theorem, define

$$A_\epsilon = P\left(T + \epsilon\, \mathrm{diag}(1, 2, \ldots, n)\right)P^{-1}.$$

For small $\epsilon > 0$, the matrix $A_\epsilon$ has distinct eigenvalues $\lambda_j + \epsilon j$, $1 \le j \le n$. Then the diagonalizable case implies that $A_\epsilon$ satisfies its characteristic equation. Let $p_\epsilon(\lambda) = \det(A_\epsilon - \lambda I)$. Use $0 = \lim_{\epsilon \to 0} p_\epsilon(A_\epsilon) = p(A)$ to complete the proof.
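The Cayley-Hamilton identity is easy to confirm numerically. A minimal sketch, assuming NumPy; note that numpy.poly returns the coefficients of $\det(\lambda I - A)$, which differs from $\det(A - \lambda I)$ only by the harmless factor $(-1)^n$, so the check $p(A) = 0$ is unaffected.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 3.0]])   # a made-up non-diagonalizable example

c = np.poly(A)   # coefficients of det(lambda*I - A), highest power first

# Evaluate p(A) = c[0] A^n + c[1] A^(n-1) + ... + c[n] I by Horner's rule.
pA = np.zeros_like(A)
for coeff in c:
    pA = pA @ A + coeff * np.eye(3)

print(np.allclose(pA, 0))   # True: A satisfies its characteristic equation
```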
An Extension of Jordan's Theorem

Theorem 13 (Jordan's Extension) Any $n \times n$ matrix $A$ can be represented in the block triangular form

$$A = P T P^{-1}, \qquad T = \mathrm{diag}(T_1, \ldots, T_k),$$

where $P$ is invertible and each matrix $T_i$ is upper triangular with diagonal entries equal to a single eigenvalue of $A$.

The proof of the theorem is based upon Jordan's theorem, and proceeds by induction. The reader is invited to try to find a proof, or read further in the text, where this theorem is presented as a special case of the Jordan decomposition.

Solving Block Triangular Differential Systems

A matrix differential system $y'(t) = T y(t)$ with $T$ block upper triangular splits into scalar equations which can be solved by elementary methods for first order scalar differential equations. To illustrate, consider the system

$$
\begin{aligned}
y_1' &= 3y_1 + y_2 + y_3, \\
y_2' &= 3y_2 + y_3, \\
y_3' &= 2y_3.
\end{aligned}
$$

The techniques that apply are the growth-decay formula for $u' = ku$ and the integrating factor method for $u' = ku + p(t)$. Working backwards from the last equation with back-substitution gives

$$
\begin{aligned}
y_3 &= c_3 e^{2t}, \\
y_2 &= c_2 e^{3t} - c_3 e^{2t}, \\
y_1 &= (c_1 + c_2 t) e^{3t}.
\end{aligned}
$$

What has been said here applies to any triangular system $y'(t) = T y(t)$, in order to write an exact formula for the solution $y(t)$.

If $A$ is an $n \times n$ matrix, then Jordan's theorem gives $A = P T P^{-1}$ with $T$ block upper triangular and $P$ invertible. The change of variable $x(t) = P y(t)$ changes $x'(t) = A x(t)$ into the block triangular system $y'(t) = T y(t)$. There is no special condition on $A$ to effect the change of variable $x(t) = P y(t)$. The solution $x(t)$ of $x'(t) = A x(t)$ is a product of the invertible matrix $P$ and a column vector $y(t)$; the latter is the solution of the block triangular system $y'(t) = T y(t)$, obtained by growth-decay and integrating factor methods.

The importance of this idea is to provide a theoretical method for solving any system $x'(t) = A x(t)$. We show in the Jordan Form section, infra, how to find matrices $P$ and $T$ in Jordan's extension $A = P T P^{-1}$, using computer algebra systems.
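The same triangular system can be handed to a computer algebra system as a check on the back-substitution. A minimal sketch, assuming SymPy is installed (SymPy's dsolve accepts a list of equations for linear systems):

```python
import sympy as sp

t = sp.symbols('t')
y1, y2, y3 = (sp.Function(f'y{k}') for k in (1, 2, 3))

eqs = [
    sp.Eq(y1(t).diff(t), 3*y1(t) + y2(t) + y3(t)),
    sp.Eq(y2(t).diff(t), 3*y2(t) + y3(t)),
    sp.Eq(y3(t).diff(t), 2*y3(t)),
]

# The result matches the back-substitution formulas
# y3 = c3 e^{2t}, y2 = c2 e^{3t} - c3 e^{2t}, y1 = (c1 + c2 t) e^{3t},
# up to renaming of the arbitrary constants.
for sol in sp.dsolve(eqs):
    print(sol)
```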
Symmetric Matrices and Orthogonality

Described here is a process due to Gram and Schmidt for replacing a given set of independent eigenvectors by another set of eigenvectors which are of unit length and orthogonal (dot product zero, i.e., 90 degrees apart). The process, which applies to any independent set of vectors, is especially useful in the case of eigenanalysis of a symmetric matrix: $A^T = A$.

Unit eigenvectors. An eigenpair $(\lambda, x)$ of $A$ can always be selected so that $\|x\| = 1$. If $\|x\| \ne 1$, then replace eigenvector $x$ by the scalar multiple $cx$, where $c = 1/\|x\|$. By this small change, it can be assumed that the eigenvector has unit length. If in addition the eigenvectors are orthogonal, then the eigenvectors are said to be orthonormal.

Theorem 14 (Orthogonality of Eigenvectors) Assume that the $n \times n$ matrix $A$ is symmetric, $A^T = A$. If $(\alpha, x)$ and $(\beta, y)$ are eigenpairs of $A$ with $\alpha \ne \beta$, then $x$ and $y$ are orthogonal: $x \cdot y = 0$.

Proof: To prove this result, compute $\alpha\, x \cdot y = (Ax)^T y = x^T A^T y = x^T A y$. Also, $\beta\, x \cdot y = x^T A y$. Subtracting the relations implies $(\alpha - \beta)\, x \cdot y = 0$, giving $x \cdot y = 0$ due to $\alpha \ne \beta$. The proof is complete.

Theorem 15 (Real Eigenvalues) If $A^T = A$, then all eigenvalues of $A$ are real. Consequently, matrix $A$ has $n$ real eigenvalues counted according to multiplicity.

Proof: The second statement is due to the fundamental theorem of algebra. To prove the eigenvalues are real, it suffices to prove $\lambda = \overline{\lambda}$ when $Av = \lambda v$ with $v \ne 0$. We admit that $v$ may have complex entries. We will use $\overline{A} = A$ ($A$ is real). Take the complex conjugate across $Av = \lambda v$ to obtain $A\overline{v} = \overline{\lambda}\, \overline{v}$. Transpose $A\overline{v} = \overline{\lambda}\, \overline{v}$ to obtain $\overline{v}^T A^T = \overline{\lambda}\, \overline{v}^T$ and then conclude $\overline{v}^T A = \overline{\lambda}\, \overline{v}^T$ from $A^T = A$. Multiply this equation by $v$ on the right to obtain $\overline{v}^T A v = \overline{\lambda}\, \overline{v}^T v$. Then multiply $Av = \lambda v$ by $\overline{v}^T$ on the left to obtain $\overline{v}^T A v = \lambda\, \overline{v}^T v$. Then we have $\overline{\lambda}\, \overline{v}^T v = \lambda\, \overline{v}^T v$. Because $\overline{v}^T v = \sum_{j=1}^{n} |v_j|^2 > 0$, then $\lambda = \overline{\lambda}$ and $\lambda$ is real. The proof is complete.

Theorem 16 (Independence of Orthogonal Sets) Let $v_1, \ldots, v_k$ be a set of nonzero orthogonal vectors. Then this set is independent.

Proof: Form the equation $c_1 v_1 + \cdots + c_k v_k = 0$, the plan being to solve for $c_1, \ldots, c_k$. Take the dot product of the equation with $v_1$. Then

$$c_1\, v_1 \cdot v_1 + \cdots + c_k\, v_1 \cdot v_k = v_1 \cdot 0.$$

All terms on the left side except one are zero, and the right side is zero also, leaving the relation $c_1\, v_1 \cdot v_1 = 0$. Because $v_1$ is not zero, then $c_1 = 0$. The process can be applied to the remaining coefficients, resulting in $c_1 = c_2 = \cdots = c_k = 0$, which proves independence of the vectors.
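Theorems 14 and 15 can be observed numerically with numpy.linalg.eigh, which is designed for symmetric matrices: it returns real eigenvalues and orthonormal eigenvectors. A minimal sketch with a made-up symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])       # symmetric: A.T == A

lam, Q = np.linalg.eigh(A)            # eigh assumes symmetry

print(np.all(np.isreal(lam)))            # True: real eigenvalues (Theorem 15)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: orthonormal eigenvectors (Theorem 14)
```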
The Gram-Schmidt process

The eigenvectors of a symmetric matrix $A$ may be constructed to be orthogonal. First of all, observe that eigenvectors corresponding to distinct eigenvalues are orthogonal by Theorem 14. It remains to construct from $k$ independent eigenvectors $x_1, \ldots, x_k$, corresponding to a single eigenvalue $\lambda$, another set of independent eigenvectors $y_1, \ldots, y_k$ for $\lambda$ which are pairwise orthogonal. The idea, due to Gram and Schmidt, applies to any set of $k$ independent vectors.

Application of the Gram-Schmidt process can be illustrated by example: let $(1, v_1)$, $(2, v_2)$, $(2, v_3)$, $(2, v_4)$ be eigenpairs of a $4 \times 4$ symmetric matrix $A$. Then $v_1$ is orthogonal to $v_2$, $v_3$, $v_4$. The vectors $v_2$, $v_3$, $v_4$ belong to eigenvalue $\lambda = 2$, but they are not necessarily orthogonal. The Gram-Schmidt process replaces these vectors by $y_2$, $y_3$, $y_4$ which are pairwise orthogonal. The result is that eigenvectors $v_1$, $y_2$, $y_3$, $y_4$ are pairwise orthogonal.

Theorem 17 (Gram-Schmidt) Let $x_1, \ldots, x_k$ be independent $n$-vectors. The set of vectors $y_1, \ldots, y_k$ constructed below as linear combinations of $x_1, \ldots, x_k$ are pairwise orthogonal and independent.

$$
\begin{aligned}
y_1 &= x_1, \\
y_2 &= x_2 - \frac{x_2 \cdot y_1}{y_1 \cdot y_1}\, y_1, \\
y_3 &= x_3 - \frac{x_3 \cdot y_1}{y_1 \cdot y_1}\, y_1 - \frac{x_3 \cdot y_2}{y_2 \cdot y_2}\, y_2, \\
&\;\;\vdots \\
y_k &= x_k - \frac{x_k \cdot y_1}{y_1 \cdot y_1}\, y_1 - \cdots - \frac{x_k \cdot y_{k-1}}{y_{k-1} \cdot y_{k-1}}\, y_{k-1}.
\end{aligned}
$$

Proof: Induction will be applied on $k$ to show that $y_1, \ldots, y_k$ are nonzero and orthogonal. If $k = 1$, then there is just one nonzero vector constructed, $y_1 = x_1$. Orthogonality for $k = 1$ is not discussed because there are no pairs to test.

Assume the result holds for $k - 1$ vectors. Let's verify that it holds for $k$ vectors, $k > 1$. Assume orthogonality $y_i \cdot y_j = 0$ for $i \ne j$ and $y_i \ne 0$ for $1 \le i, j \le k - 1$. It remains to test $y_i \cdot y_k = 0$ for $1 \le i \le k - 1$ and $y_k \ne 0$. The test depends upon the identity

$$y_i \cdot y_k = y_i \cdot x_k - \sum_{j=1}^{k-1} \frac{x_k \cdot y_j}{y_j \cdot y_j}\, y_i \cdot y_j,$$

which is obtained from the formula for $y_k$ by taking the dot product with $y_i$. In the identity, $y_i \cdot y_j = 0$ by the induction hypothesis for $1 \le j \le k - 1$ and $j \ne i$. Therefore, the summation in the identity contains just the term for index $j = i$, and the contribution is $y_i \cdot x_k$. This contribution cancels the leading term on the right in the identity, resulting in the orthogonality relation $y_i \cdot y_k = 0$. If $y_k = 0$, then $x_k$ is a linear combination of $y_1, \ldots, y_{k-1}$. But each $y_j$ is a linear combination of $\{x_i\}_{i=1}^{j}$, therefore $y_k = 0$ implies $x_k$ is a linear combination of $x_1, \ldots, x_{k-1}$, a contradiction to the independence of $\{x_i\}_{i=1}^{k}$. The proof is complete.
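The Gram-Schmidt relations of Theorem 17 translate directly into code. A minimal sketch assuming NumPy; the helper name gram_schmidt is ours:

```python
import numpy as np

def gram_schmidt(X):
    """Given independent columns x_1, ..., x_k of X, return a matrix whose
    columns are the pairwise orthogonal vectors y_1, ..., y_k of Theorem 17."""
    Y = []
    for x in X.T:                  # iterate over the columns of X
        y = x.copy()
        for yj in Y:
            y -= (x @ yj) / (yj @ yj) * yj   # subtract the projection on y_j
        Y.append(y)
    return np.column_stack(Y)

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Y = gram_schmidt(X)
G = Y.T @ Y
print(np.allclose(G, np.diag(np.diag(G))))   # True: off-diagonal dot products vanish
```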
Orthogonal Projection

Reproduced here is the basic material on shadow projection, for the convenience of the reader. The ideas are then extended to obtain the orthogonal projection onto a subspace $V$ of $\mathbb{R}^n$. Finally, the orthogonal projection formula is related to the Gram-Schmidt equations.

The shadow projection of vector $X$ onto the direction of vector $Y$ is the number $d$ defined by

$$d = \frac{X \cdot Y}{\|Y\|}.$$

The triangle determined by $X$ and $d\, Y/\|Y\|$ is a right triangle.

Figure 3. Shadow projection $d$ of vector $X$ onto the direction of vector $Y$.

The vector shadow projection of $X$ onto the line $L$ through the origin in the direction of $Y$ is defined by

$$\mathrm{proj}_Y(X) = d\, \frac{Y}{\|Y\|} = \frac{X \cdot Y}{Y \cdot Y}\, Y.$$

Orthogonal projection for dimension 1. The extension of the shadow projection formula to a subspace $V$ of $\mathbb{R}^n$ begins with unitizing $Y$ to isolate the vector direction $u = Y/\|Y\|$ of line $L$. Define the subspace $V = \mathrm{span}\{u\}$. Then $V$ is identical to $L$. We define the orthogonal projection by the formula

$$\mathrm{Proj}_V(x) = (u \cdot x)\, u, \qquad V = \mathrm{span}\{u\}.$$

The reader is asked to verify that $\mathrm{proj}_Y(x) = d\, u = \mathrm{Proj}_V(x)$. These equalities mean that the orthogonal projection is the vector shadow projection when $V$ is one dimensional.
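A quick numerical check of the claimed equality $\mathrm{proj}_Y(x) = d\, u = \mathrm{Proj}_V(x)$ in dimension 1 (NumPy assumed; the vectors are made up):

```python
import numpy as np

x = np.array([3.0, 4.0])
Y = np.array([2.0, 0.0])

d = (x @ Y) / np.linalg.norm(Y)     # shadow projection (a number)
proj_Y = (x @ Y) / (Y @ Y) * Y      # vector shadow projection

u = Y / np.linalg.norm(Y)           # unitize Y
Proj_V = (u @ x) * u                # orthogonal projection onto span{u}

print(proj_Y, Proj_V, np.allclose(proj_Y, Proj_V))   # the two formulas agree
```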
Orthogonal projection for dimension k. Consider a subspace $V$ of $\mathbb{R}^n$ given as the span of orthonormal vectors $u_1, \ldots, u_k$. Define the orthogonal projection by the formula

$$\mathrm{Proj}_V(x) = \sum_{j=1}^{k} (u_j \cdot x)\, u_j, \qquad V = \mathrm{span}\{u_1, \ldots, u_k\}.$$

Justification of the Formula. The definition of $\mathrm{Proj}_V(x)$ seems to depend on the choice of the orthonormal vectors. Suppose that $\{w_j\}_{j=1}^{k}$ is another orthonormal basis of $V$. Define $u = \sum_{i=1}^{k} (u_i \cdot x)\, u_i$ and $w = \sum_{j=1}^{k} (w_j \cdot x)\, w_j$. It will be established that $u = w$, which justifies that the projection formula is independent of basis.

Lemma 1 (Orthonormal Basis Expansion) Let $\{v_j\}_{j=1}^{k}$ be an orthonormal basis of a subspace $V$ in $\mathbb{R}^n$. Then any vector $v$ in $V$ is represented as

$$v = \sum_{j=1}^{k} (v_j \cdot v)\, v_j.$$

Proof: First, $v$ has a basis expansion $v = \sum_{j=1}^{k} c_j v_j$ for some constants $c_1, \ldots, c_k$. Take the inner product of this equation with vector $v_i$ to prove that $c_i = v_i \cdot v$, hence the claimed expansion is proved.

Lemma 2 (Orthogonality) Let $\{u_i\}_{i=1}^{k}$ be an orthonormal basis of a subspace $V$ in $\mathbb{R}^n$. Let $x$ be any vector in $\mathbb{R}^n$ and define $u = \sum_{i=1}^{k} (u_i \cdot x)\, u_i$. Then $y \cdot (x - u) = 0$ for all vectors $y$ in $V$.

Proof: The first lemma implies $u$ can be written a second way as a linear combination of $u_1, \ldots, u_k$. Independence implies equal basis coefficients, which gives $u_j \cdot u = u_j \cdot x$. Then $u_j \cdot (x - u) = 0$. Because $y$ is in $V$, then $y = \sum_{j=1}^{k} c_j u_j$, which implies $y \cdot (x - u) = \sum_{j=1}^{k} c_j\, u_j \cdot (x - u) = 0$.

The proof that $w = u$ has these details:

$$
\begin{aligned}
w &= \sum_{j=1}^{k} (w_j \cdot x)\, w_j \\
  &= \sum_{j=1}^{k} (w_j \cdot u)\, w_j && \text{Because } w_j \cdot (x - u) = 0 \text{ by the second lemma} \\
  &= \sum_{j=1}^{k} \left( w_j \cdot \sum_{i=1}^{k} (u_i \cdot x)\, u_i \right) w_j && \text{Definition of } u \\
  &= \sum_{j=1}^{k} \sum_{i=1}^{k} (w_j \cdot u_i)(u_i \cdot x)\, w_j && \text{Dot product properties} \\
  &= \sum_{i=1}^{k} \left( \sum_{j=1}^{k} (w_j \cdot u_i)\, w_j \right) (u_i \cdot x) && \text{Switch summations} \\
  &= \sum_{i=1}^{k} u_i\, (u_i \cdot x) && \text{First lemma with } v = u_i \\
  &= u && \text{Definition of } u.
\end{aligned}
$$
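The basis-independence just established can be tested numerically: project the same vector using two different orthonormal bases of the same subspace. A minimal sketch, assuming NumPy (numpy.linalg.qr is used only as a convenient source of orthonormal bases):

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthonormal basis u_1, u_2 of a 2-dimensional subspace V of R^4.
U, _ = np.linalg.qr(rng.standard_normal((4, 2)))

def proj(basis, x):
    """Proj_V(x) = sum_j (u_j . x) u_j over the orthonormal columns of `basis`."""
    return basis @ (basis.T @ x)

# A second orthonormal basis of the SAME subspace: mix the columns of U
# by an invertible 2x2 matrix, then re-orthonormalize.
W, _ = np.linalg.qr(U @ np.array([[1.0, 1.0], [0.0, 1.0]]))

x = rng.standard_normal(4)
print(np.allclose(proj(U, x), proj(W, x)))   # True: the formula is basis-independent
```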
Orthogonal projection and Gram-Schmidt. Define $y_1, \ldots, y_k$ by the Gram-Schmidt relations on page 552. Let $u_j = y_j/\|y_j\|$ for $j = 1, \ldots, k$. Then $V_{j-1} = \mathrm{span}\{u_1, \ldots, u_{j-1}\}$ is a subspace of $\mathbb{R}^n$ of dimension $j - 1$ with orthonormal basis $u_1, \ldots, u_{j-1}$, and

$$y_j = x_j - \frac{x_j \cdot y_1}{y_1 \cdot y_1}\, y_1 - \cdots - \frac{x_j \cdot y_{j-1}}{y_{j-1} \cdot y_{j-1}}\, y_{j-1} = x_j - \mathrm{Proj}_{V_{j-1}}(x_j).$$

The Near Point Theorem

Developed here is the characterization of the orthogonal projection of a vector $x$ onto a subspace $V$ as the unique point $v$ in $V$ which minimizes $\|x - v\|$, that is, the point in $V$ closest to $x$. In remembering the Gram-Schmidt formulas, and in the use of the orthogonal projection in proofs and constructions, the following key theorem is useful.

Theorem 18 (Orthogonal Projection Properties) Let $V$ be the span of orthonormal vectors $u_1, \ldots, u_k$.

(a) Every vector in $V$ has an orthogonal expansion $v = \sum_{j=1}^{k} (u_j \cdot v)\, u_j$.
(b) The vector $\mathrm{Proj}_V(x)$ is a vector in the subspace $V$.
(c) The vector $w = x - \mathrm{Proj}_V(x)$ is orthogonal to every vector in $V$.
(d) Among all vectors $v$ in $V$, the minimum value of $\|x - v\|$ is uniquely obtained by the orthogonal projection $v = \mathrm{Proj}_V(x)$.

Proof: Properties (a), (b) and (c) are proved in the preceding lemmas. Details are repeated here, in case the lemmas were skipped.

(a): Write a basis expansion $v = \sum_{j=1}^{k} c_j v_j$ for some constants $c_1, \ldots, c_k$. Take the inner product of this equation with vector $v_i$ to prove that $c_i = v_i \cdot v$.

(b): Vector $\mathrm{Proj}_V(x)$ is a linear combination of basis elements of $V$.

(c): Let's compute the dot product of $w$ and $v$. We will use the orthogonal expansion from (a).
$$
\begin{aligned}
w \cdot v &= \left( x - \mathrm{Proj}_V(x) \right) \cdot v \\
&= x \cdot v - \sum_{j=1}^{k} (x \cdot u_j)(u_j \cdot v) \\
&= \sum_{j=1}^{k} (v \cdot u_j)(u_j \cdot x) - \sum_{j=1}^{k} (x \cdot u_j)(u_j \cdot v) \\
&= 0.
\end{aligned}
$$

(d): Begin with the Pythagorean identity

$$\|a\|^2 + \|b\|^2 = \|a + b\|^2,$$

valid exactly when $a \cdot b = 0$ (a right triangle, $\theta = 90°$). Using an arbitrary $v$ in $V$, define $a = \mathrm{Proj}_V(x) - v$ and $b = x - \mathrm{Proj}_V(x)$. By (b), vector $a$ is in $V$. Because of (c), then $a \cdot b = 0$. This gives the identity

$$\|\mathrm{Proj}_V(x) - v\|^2 + \|x - \mathrm{Proj}_V(x)\|^2 = \|x - v\|^2,$$

which establishes $\|x - \mathrm{Proj}_V(x)\| < \|x - v\|$ except for the unique $v$ such that $\mathrm{Proj}_V(x) - v = 0$. The proof is complete.

Theorem 19 (Near Point to a Subspace) Let $V$ be a subspace of $\mathbb{R}^n$ and $x$ a vector not in $V$. The near point to $x$ in $V$ is the orthogonal projection of $x$ onto $V$. This point is characterized as the minimum of $\|x - v\|$ over all vectors $v$ in the subspace $V$.

Proof: Apply (d) of the preceding theorem.

Theorem 20 (Cross Product and Projections) The cross product direction $a \times b$ can be computed as $c - \mathrm{Proj}_V(c)$, by selecting a direction $c$ not in $V = \mathrm{span}\{a, b\}$.

Proof: The cross product makes sense only in $\mathbb{R}^3$. Subspace $V$ is two dimensional when $a$, $b$ are independent, and Gram-Schmidt applies to find an orthonormal basis $u_1$, $u_2$. By (c) of Theorem 18, the vector $c - \mathrm{Proj}_V(c)$ has the same or opposite direction to the cross product.
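Theorem 20 can be demonstrated numerically by comparing $c - \mathrm{Proj}_V(c)$ with numpy.cross. A minimal sketch with made-up vectors; the orthonormal basis of $V$ comes from numpy.linalg.qr rather than hand-rolled Gram-Schmidt:

```python
import numpy as np

a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])
c = np.array([1.0, 0.0, 0.0])      # any direction not in span{a, b}

Q, _ = np.linalg.qr(np.column_stack([a, b]))   # orthonormal basis u1, u2 of V
w = c - Q @ (Q.T @ c)                          # c - Proj_V(c)

print(np.cross(a, b))               # [-1. -1.  1.]
print(w / np.linalg.norm(w))        # the same or opposite unit direction
```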
The QR Decomposition

The Gram-Schmidt formulas can be organized as matrix multiplication $A = QR$, where $x_1, \ldots, x_n$ are the independent columns of $A$, and $Q$ has columns equal to the Gram-Schmidt orthonormal vectors $u_1, \ldots, u_n$, which are the unitized Gram-Schmidt vectors.

Theorem 21 (The QR-Decomposition) Let the $m \times n$ matrix $A$ have independent columns $x_1, \ldots, x_n$. Then there is an upper triangular matrix $R$ with positive diagonal entries and an orthonormal matrix $Q$ such that $A = QR$.

Proof: Let $y_1, \ldots, y_n$ be the Gram-Schmidt orthogonal vectors given by the relations on page 552. Define $u_k = y_k/\|y_k\|$ and $r_{kk} = \|y_k\|$ for $k = 1, \ldots, n$, and otherwise $r_{ij} = u_i \cdot x_j$. Let $Q = \mathrm{aug}(u_1, \ldots, u_n)$. Then

(2)
$$
\begin{aligned}
x_1 &= r_{11} u_1, \\
x_2 &= r_{22} u_2 + r_{12} u_1, \\
x_3 &= r_{33} u_3 + r_{13} u_1 + r_{23} u_2, \\
&\;\;\vdots \\
x_n &= r_{nn} u_n + r_{1n} u_1 + \cdots + r_{n-1,n} u_{n-1}.
\end{aligned}
$$

It follows from (2) and matrix multiplication that $A = QR$. The proof is complete.

Theorem 22 (Matrices Q and R in A = QR) Let the $m \times n$ matrix $A$ have independent columns $x_1, \ldots, x_n$. Let $y_1, \ldots, y_n$ be the Gram-Schmidt orthogonal vectors given by the relations on page 552, and define $u_k = y_k/\|y_k\|$. Then $A = QR$ is satisfied by $Q = \mathrm{aug}(u_1, \ldots, u_n)$ and

$$
R = \begin{pmatrix}
\|y_1\| & u_1 \cdot x_2 & u_1 \cdot x_3 & \cdots & u_1 \cdot x_n \\
0 & \|y_2\| & u_2 \cdot x_3 & \cdots & u_2 \cdot x_n \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & \|y_n\|
\end{pmatrix}.
$$

Proof: The result is contained in the proof of the previous theorem. Some references cite the diagonal entries as $\|x_1^\perp\|$, $\|x_2^\perp\|$, \ldots, $\|x_n^\perp\|$, where $x_j^\perp = x_j - \mathrm{Proj}_{V_{j-1}}(x_j)$, $V_{j-1} = \mathrm{span}\{x_1, \ldots, x_{j-1}\}$. Because $y_1 = x_1$ and $y_j = x_j - \mathrm{Proj}_{V_{j-1}}(x_j)$, the formulas for the entries of $R$ are identical.

Theorem 23 (Uniqueness of Q and R) Let the $m \times n$ matrix $A$ have independent columns and satisfy the decomposition $A = QR$. If $Q$ is $m \times n$ orthogonal and $R$ is $n \times n$ upper triangular with positive diagonal elements, then $Q$ and $R$ are uniquely determined.

Proof: The problem is to show that $A = Q_1 R_1 = Q_2 R_2$ implies $R_2 R_1^{-1} = I$ and $Q_1 = Q_2$. We start with $Q_1 = Q_2 R_2 R_1^{-1}$. Define $P = R_2 R_1^{-1}$. Then $Q_1 = Q_2 P$. Because $I = Q_1^T Q_1 = P^T Q_2^T Q_2 P = P^T P$, then $P$ is orthogonal. Matrix $P$ is the product of square upper triangular matrices with positive diagonal elements, which implies $P$ itself is square upper triangular with positive diagonal elements. The only matrix with these properties is the identity matrix $I$. Then $R_2 R_1^{-1} = P = I$, which implies $R_1 = R_2$ and $Q_1 = Q_2$. The proof is complete.
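In practice the decomposition is computed by library routines rather than by the Gram-Schmidt formulas. A minimal sketch, assuming NumPy; note that numpy.linalg.qr does not promise positive diagonal entries in $R$, so a sign adjustment is applied to match the normalization of Theorem 23:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])    # independent columns, m = 3, n = 2

Q, R = np.linalg.qr(A)        # reduced QR: Q is 3x2, R is 2x2

# Flip signs column-by-column so the diagonal of R is positive,
# recovering the unique Q, R of Theorem 23.
s = np.sign(np.diag(R))
Q, R = Q * s, (R.T * s).T

print(np.allclose(A, Q @ R))              # True
print(np.allclose(Q.T @ Q, np.eye(2)))    # columns of Q are orthonormal
```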
Theorem 24 (The QR Decomposition and Least Squares) Let the $m \times n$ matrix $A$ have independent columns and satisfy the decomposition $A = QR$. Then the normal equation

$$A^T A x = A^T b$$

in the theory of least squares can be represented as $Rx = Q^T b$.

Proof: The theory of orthogonal matrices implies $Q^T Q = I$. Then the identity $(CD)^T = D^T C^T$, the equation $A = QR$, and the invertibility of $R^T$ imply

$$
\begin{array}{ll}
A^T A x = A^T b & \text{Normal equation} \\
R^T Q^T Q R x = R^T Q^T b & \text{Substitute } A = QR \\
R x = Q^T b & \text{Multiply by the inverse of } R^T.
\end{array}
$$

The proof is complete.

The formula $Rx = Q^T b$ can be solved by back-substitution, which accounts for its popularity in the numerical solution of least squares problems.
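A least squares sketch using $Rx = Q^T b$, assuming NumPy and SciPy (scipy.linalg.solve_triangular performs the back-substitution); the data are made up:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Fit a line c0 + c1*t to made-up data by least squares.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.9])
A = np.column_stack([np.ones_like(t), t])   # design matrix, independent columns

Q, R = np.linalg.qr(A)
x = solve_triangular(R, Q.T @ b)            # back-substitution on R x = Q^T b

print(x)                                    # fitted coefficients c0, c1
print(np.allclose(A.T @ A @ x, A.T @ b))    # True: satisfies the normal equation
```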
Theorem 25 (Spectral Theorem) Let $A$ be a given $n \times n$ real matrix. Then $A = QDQ^T$ with $Q$ orthogonal and $D$ diagonal if and only if $A^T = A$.

Proof: The reader is reminded that $Q$ orthogonal means that the columns of $Q$ are orthonormal. The equation $A = A^T$ means $A$ is symmetric.

Assume first that $A = QDQ^T$ with $Q$ orthogonal ($Q^T Q = I$) and $D$ diagonal. Then $Q^{-1} = Q^T$. This implies $A^T = (QDQ^T)^T = (Q^T)^T D^T Q^T = QDQ^T = A$.

Conversely, assume $A^T = A$. Then the eigenvalues of $A$ are real and eigenvectors corresponding to distinct eigenvalues are orthogonal. The proof proceeds by induction on the dimension $n$ of the $n \times n$ matrix $A$. For $n = 1$, let $Q$ be the identity matrix. Then $Q$ is orthogonal and $AQ = QD$ where $D$ is diagonal.

Assume the decomposition $AQ = QD$ for dimension $n$. Let's prove it for $A$ of dimension $n + 1$. Choose a real eigenvalue $\lambda$ of $A$ and eigenvector $v_1$ with $\|v_1\| = 1$. Complete a basis $v_1, \ldots, v_{n+1}$ of $\mathbb{R}^{n+1}$. By Gram-Schmidt, we can assume as well that this basis is orthonormal. Define $P = \mathrm{aug}(v_1, \ldots, v_{n+1})$. Then $P$ is orthogonal and satisfies $P^T = P^{-1}$. Define $B = P^{-1} A P$. Then $B$ is symmetric ($B^T = B$) and $\mathrm{col}(B, 1) = \lambda\, \mathrm{col}(I, 1)$. These facts imply that $B$ is a block matrix

$$B = \begin{pmatrix} \lambda & 0 \\ 0 & C \end{pmatrix},$$

where $C$ is symmetric ($C^T = C$). The induction hypothesis applies to $C$ to obtain the existence of an orthogonal matrix $Q_1$ such that $C Q_1 = Q_1 D_1$ for some diagonal matrix $D_1$. Define a diagonal matrix $D$ and matrices $W$ and $Q$ as follows:

$$D = \begin{pmatrix} \lambda & 0 \\ 0 & D_1 \end{pmatrix}, \qquad W = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}, \qquad Q = P W.$$

Then $Q$ is the product of two orthogonal matrices, which makes $Q$ orthogonal. Compute

$$W^{-1} B W = \begin{pmatrix} 1 & 0 \\ 0 & Q_1^{-1} \end{pmatrix} \begin{pmatrix} \lambda & 0 \\ 0 & C \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix} = \begin{pmatrix} \lambda & 0 \\ 0 & D_1 \end{pmatrix} = D.$$

Then $Q^{-1} A Q = W^{-1} P^{-1} A P W = W^{-1} B W = D$. This completes the induction, ending the proof of the theorem.

Theorem 26 (Schur's Theorem) Given any real $n \times n$ matrix $A$, possibly non-symmetric, there is an upper triangular matrix $T$, whose diagonal entries are the eigenvalues of $A$, and a complex matrix $Q$ satisfying $\overline{Q}^T = Q^{-1}$ ($Q$ is unitary), such that $AQ = QT$. If $A = A^T$, then $Q$ is real orthogonal ($Q^T = Q^{-1}$).

Schur's theorem can be proved by induction, following the induction proof of Jordan's theorem, or the induction proof of the Spectral Theorem. The result can be used to prove the Spectral Theorem in two steps. Indeed, when $A = A^T$, Schur's Theorem implies $Q$ is real, $T = Q^T A Q$ equals its transpose, and $T$ is triangular. Then $T$ must equal a diagonal matrix $D$.

Theorem 27 (Eigenpairs of a Symmetric A) Let $A$ be a symmetric $n \times n$ real matrix. Then $A$ has $n$ eigenpairs $(\lambda_1, v_1), \ldots, (\lambda_n, v_n)$, with independent eigenvectors $v_1, \ldots, v_n$.

Proof: The preceding theorem applies to prove the existence of an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $AQ = QD$. The diagonal entries of $D$ are the eigenvalues of $A$, in some order. For a diagonal entry $\lambda$ of $D$ appearing in row $j$, the relation $A\, \mathrm{col}(Q, j) = \lambda\, \mathrm{col}(Q, j)$ holds, which implies that $A$ has $n$ eigenpairs. The eigenvectors are the columns of $Q$, which are orthogonal and hence independent. The proof is complete.
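Schur's theorem is available numerically as scipy.linalg.schur. A minimal sketch with a made-up non-symmetric matrix; output='complex' requests the upper triangular $T$ of Theorem 26 (the default real Schur form is only quasi-triangular):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[3.0, 1.0, 2.0],
              [0.0, 2.0, 5.0],
              [1.0, 4.0, 1.0]])      # a made-up non-symmetric matrix

T, Q = schur(A, output='complex')    # A Q = Q T with Q unitary, T upper triangular

print(np.allclose(A @ Q, Q @ T))     # True
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))   # diagonal of T = eigenvalues
```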
Theorem 28 (Diagonalization of Symmetric A) Let $A$ be a symmetric $n \times n$ real matrix. Then $A$ has $n$ eigenpairs. For each distinct eigenvalue $\lambda$, replace the eigenvectors by orthonormal eigenvectors, using the Gram-Schmidt process. Let $u_1, \ldots, u_n$ be the orthonormal vectors so obtained and define

$$Q = \mathrm{aug}(u_1, \ldots, u_n), \qquad D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n).$$

Then $Q$ is orthogonal and $AQ = QD$.

Proof: The preceding theorem justifies the eigenanalysis result. Already, eigenpairs corresponding to distinct eigenvalues are orthogonal. Within the set of eigenpairs with the same eigenvalue $\lambda$, the Gram-Schmidt process produces a replacement basis of orthonormal eigenvectors. Then the union of all the eigenvectors is orthonormal. The process described here does not disturb the ordering of eigenpairs, because it only replaces an eigenvector. The proof is complete.

The Singular Value Decomposition

The decomposition has been used as a data compression algorithm. A geometric interpretation will be given in the next subsection.

Theorem 29 (Positive Eigenvalues of A^T A) Given an $m \times n$ real matrix $A$, then $A^T A$ is a real symmetric matrix whose eigenpairs $(\lambda, v)$ satisfy

(3)    $\lambda = \dfrac{\|Av\|^2}{\|v\|^2} \ge 0.$

Proof: Symmetry follows from $(A^T A)^T = A^T (A^T)^T = A^T A$. An eigenpair $(\lambda, v)$ satisfies $\lambda\, v^T v = v^T A^T A v = (Av)^T (Av) = \|Av\|^2$, hence (3).

Definition 4 (Singular Values of A) Let the real symmetric matrix $A^T A$ have real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_n$. The numbers

$$\sigma_k = \sqrt{\lambda_k} \quad (1 \le k \le n)$$

are called the singular values of the matrix $A$. The ordering of the singular values is always with decreasing magnitude.

Theorem 30 (Orthonormal Set u_1, ..., u_m) Let the real symmetric matrix $A^T A$ have real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_n$ and corresponding orthonormal eigenvectors $v_1, \ldots, v_n$, obtained by the Gram-Schmidt process. Define the vectors

$$u_1 = \frac{1}{\sigma_1} A v_1, \quad \ldots, \quad u_r = \frac{1}{\sigma_r} A v_r.$$

Because $\|A v_k\| = \sigma_k$, then $\{u_1, \ldots, u_r\}$ is orthonormal. Gram-Schmidt can extend this set to an orthonormal basis $\{u_1, \ldots, u_m\}$ of $\mathbb{R}^m$.

Proof of Theorem 30: Because $A^T A v_k = \lambda_k v_k \ne 0$ for $1 \le k \le r$, the vectors $u_k$ are nonzero. Given $i \ne j$, then $\sigma_i \sigma_j\, u_i \cdot u_j = (A v_i)^T (A v_j) = \lambda_j\, v_i^T v_j = 0$, showing that the vectors $u_k$ are orthogonal. Further, $\|u_k\|^2 = v_k \cdot (A^T A v_k)/\lambda_k = \|v_k\|^2 = 1$, because $\{v_k\}_{k=1}^{n}$ is an orthonormal set. The extension of the $u_k$ to an orthonormal basis of $\mathbb{R}^m$ is not unique, because it depends upon a choice of independent spanning vectors $y_{r+1}, \ldots, y_m$ for the set $\{x : x \cdot u_k = 0,\ 1 \le k \le r\}$. Once selected, Gram-Schmidt is applied to $u_1, \ldots, u_r, y_{r+1}, \ldots, y_m$ to obtain the desired orthonormal basis.
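The construction of Theorem 30 can be carried out numerically from the eigenpairs of $A^T A$. A minimal sketch, assuming NumPy; the matrix is the one used in the Singular Values and Geometry subsection below:

```python
import numpy as np

A = np.array([[-2.0, 6.0],
              [ 6.0, 7.0]])

lam, V = np.linalg.eigh(A.T @ A)     # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]       # reorder so lambda_1 >= lambda_2

sigma = np.sqrt(lam)                 # singular values (Definition 4)
U = A @ V / sigma                    # u_k = (1/sigma_k) A v_k (Theorem 30)

print(sigma)                              # [10.  5.]
print(np.allclose(U.T @ U, np.eye(2)))    # True: the u_k are orthonormal
```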
Theorem 31 (The Singular Value Decomposition (SVD)) Let $A$ be a given real $m \times n$ matrix. Let $(\lambda_1, v_1), \ldots, (\lambda_n, v_n)$ be a set of orthonormal eigenpairs for $A^T A$ such that $\sigma_k = \sqrt{\lambda_k}$ ($1 \le k \le r$) defines the positive singular values of $A$ and $\lambda_k = 0$ for $r < k \le n$. Complete $u_1 = (1/\sigma_1) A v_1, \ldots, u_r = (1/\sigma_r) A v_r$ to an orthonormal basis $\{u_k\}_{k=1}^{m}$ for $\mathbb{R}^m$. Define

$$U = \mathrm{aug}(u_1, \ldots, u_m), \qquad \Sigma = \begin{pmatrix} \mathrm{diag}(\sigma_1, \ldots, \sigma_r) & 0 \\ 0 & 0 \end{pmatrix}, \qquad V = \mathrm{aug}(v_1, \ldots, v_n).$$

Then the columns of $U$ and $V$ are orthonormal and

$$A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T = A(v_1)\, v_1^T + \cdots + A(v_r)\, v_r^T.$$

Proof of Theorem 31: The product of $U$ and $\Sigma$ is the $m \times n$ matrix

$$U \Sigma = \mathrm{aug}(\sigma_1 u_1, \ldots, \sigma_r u_r, 0, \ldots, 0) = \mathrm{aug}(A(v_1), \ldots, A(v_r), 0, \ldots, 0).$$

Let $v$ be any vector in $\mathbb{R}^n$. It will be shown that $U \Sigma V^T v$, $\sum_{k=1}^{r} A(v_k)(v_k^T v)$ and $Av$ are the same column vector. We have the equalities

$$U \Sigma V^T v = U \Sigma \begin{pmatrix} v_1^T v \\ \vdots \\ v_n^T v \end{pmatrix} = \mathrm{aug}(A(v_1), \ldots, A(v_r), 0, \ldots, 0) \begin{pmatrix} v_1^T v \\ \vdots \\ v_n^T v \end{pmatrix} = \sum_{k=1}^{r} (v_k^T v)\, A(v_k).$$

Because $v_1, \ldots, v_n$ is an orthonormal basis of $\mathbb{R}^n$, then $v = \sum_{k=1}^{n} (v_k^T v)\, v_k$. Additionally, $A(v_k) = 0$ for $r < k \le n$ implies

$$Av = A\left( \sum_{k=1}^{n} (v_k^T v)\, v_k \right) = \sum_{k=1}^{r} (v_k^T v)\, A(v_k).$$

Then $Av = U \Sigma V^T v = \sum_{k=1}^{r} A(v_k)(v_k^T v)$, which proves the theorem.

Singular Values and Geometry

Discussed here is how to interpret singular values geometrically, especially in low dimensions 2 and 3. First, we review conics, adopting the viewpoint of eigenanalysis.
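Library SVD routines return exactly the factors of Theorem 31. A minimal sketch, assuming NumPy, that checks both $A = U\Sigma V^T$ and the rank-one expansion:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))          # a made-up 4x3 matrix

U, s, Vt = np.linalg.svd(A)              # full SVD: U is 4x4, Vt is 3x3

Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T

# The rank-one expansion A = sigma_1 u_1 v_1^T + ... + sigma_r u_r v_r^T.
B = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(len(s)))
print(np.allclose(A, B))                 # True
```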
Standard equation of an ellipse

Calculus courses consider ellipse equations like

$$85x^2 - 60xy + 40y^2 = 2500$$

and discuss removal of the cross term $-60xy$. The objective is to obtain a standard ellipse equation

$$\frac{X^2}{a^2} + \frac{Y^2}{b^2} = 1.$$

We re-visit this old problem from a different point of view, and in the derivation establish a connection between the ellipse equation, the symmetric matrix $A^T A$, and the singular values of $A$.

9 Example (Image of the Unit Circle) Let $A = \begin{pmatrix} -2 & 6 \\ 6 & 7 \end{pmatrix}$. Verify that the invertible matrix $A$ maps the unit circle into the ellipse $85x^2 - 60xy + 40y^2 = 2500$.

Solution: The unit circle has parameterization $\theta \mapsto (\cos\theta, \sin\theta)$, $0 \le \theta \le 2\pi$. Mapping of the unit circle by the matrix $A$ is formally the set of dual relations

$$\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = A^{-1} \begin{pmatrix} x \\ y \end{pmatrix}.$$

The Pythagorean identity $\cos^2\theta + \sin^2\theta = 1$ used on the second relation implies $85x^2 - 60xy + 40y^2 = 2500$.

10 Example (Removing the xy-term in an Ellipse Equation) Verify that after a rotation $(x, y) \mapsto (X, Y)$ to remove the $xy$-term in $85x^2 - 60xy + 40y^2 = 2500$, the ellipse equation in the new $XY$-coordinates is

$$\frac{X^2}{100} + \frac{Y^2}{25} = 1.$$

Solution: The $xy$-term removal is accomplished by a change of variables $(x, y) \mapsto (X, Y)$ which transforms the ellipse equation $85x^2 - 60xy + 40y^2 = 2500$ into the ellipse equation $25X^2 + 100Y^2 = 2500$, details below. Its standard form is obtained by dividing by 2500, to give

$$\frac{X^2}{100} + \frac{Y^2}{25} = 1.$$

Analytic geometry says that the semi-axis lengths are $\sqrt{100} = 10$ and $\sqrt{25} = 5$.

In previous discussions of the ellipse, the equation $85x^2 - 60xy + 40y^2 = 2500$ was represented by the vector-matrix identity

$$\begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 85 & -30 \\ -30 & 40 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 2500.$$
The program used earlier to remove the $xy$-term was to diagonalize the coefficient matrix $B = \begin{pmatrix} 85 & -30 \\ -30 & 40 \end{pmatrix}$ by calculating the eigenpairs of $B$:

$$\left( 25, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right), \qquad \left( 100, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right).$$

Because $B$ is symmetric, the eigenvectors are orthogonal. The eigenpairs above are replaced by unitized pairs:

$$\left( 25, \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right), \qquad \left( 100, \frac{1}{\sqrt{5}} \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right).$$

Then the diagonalization theory for $B$ can be written as

$$BQ = QD, \qquad Q = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 25 & 0 \\ 0 & 100 \end{pmatrix}.$$

The single change of variables

$$\begin{pmatrix} x \\ y \end{pmatrix} = Q \begin{pmatrix} X \\ Y \end{pmatrix}$$

then transforms the ellipse equation $85x^2 - 60xy + 40y^2 = 2500$ into $25X^2 + 100Y^2 = 2500$ as follows:

$$
\begin{array}{ll}
85x^2 - 60xy + 40y^2 = 2500 & \text{Ellipse equation} \\
u^T B u = 2500 & \text{Where } B = \begin{pmatrix} 85 & -30 \\ -30 & 40 \end{pmatrix} \text{ and } u = \begin{pmatrix} x \\ y \end{pmatrix} \\
(Qw)^T B (Qw) = 2500 & \text{Change } u = Qw \text{, where } w = \begin{pmatrix} X \\ Y \end{pmatrix} \\
w^T (Q^T B Q) w = 2500 & \text{Expand, ready to use } BQ = QD \\
w^T (Dw) = 2500 & \text{Because } D = Q^{-1} B Q \text{ and } Q^{-1} = Q^T \\
25X^2 + 100Y^2 = 2500 & \text{Expand } w^T D w
\end{array}
$$

Rotations and Scaling

The $2 \times 2$ singular value decomposition $A = U \Sigma V^T$ can be used to decompose the change of variables $(x, y) \mapsto (X, Y)$ into three distinct changes of variables, each with a geometrical meaning:

$$(x, y) \mapsto (x_1, y_1) \mapsto (x_2, y_2) \mapsto (X, Y).$$

Table 1. Three Changes of Variable

Domain       Equation                                                        Image      Meaning
Circle 1     $(x_1, y_1)^T = V^T (\cos\theta, \sin\theta)^T$                 Circle 2   Rotation
Circle 2     $(x_2, y_2)^T = \Sigma\, (x_1, y_1)^T$                          Ellipse 1  Scale axes
Ellipse 1    $(X, Y)^T = U (x_2, y_2)^T$                                     Ellipse 2  Rotation
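The diagonalization of $B$ above can be reproduced numerically: numpy.linalg.eigh returns the eigenvalues 25, 100 with unitized eigenvectors, and the rotation $u = Qw$ removes the $xy$-term. A minimal sketch, NumPy assumed:

```python
import numpy as np

B = np.array([[85.0, -30.0],
              [-30.0, 40.0]])

lam, Q = np.linalg.eigh(B)     # ascending eigenvalues with unitized eigenvectors
print(lam)                     # [ 25. 100.]

# The rotation x = Q w removes the xy-term: Q^T B Q = D = diag(25, 100).
print(np.allclose(Q.T @ B @ Q, np.diag(lam)))   # True: 25 X^2 + 100 Y^2 remains
```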
Geometry

We give in Figure 4 a geometrical interpretation for the singular value decomposition $A = U \Sigma V^T$. For illustration, the matrix $A$ is assumed $2 \times 2$ and invertible.

Figure 4. Mapping the unit circle: $V^T$ rotates Circle 1 to Circle 2, $\Sigma$ scales Circle 2 to Ellipse 1, and $U$ rotates Ellipse 1 to Ellipse 2, with $(x, y) \mapsto (x_1, y_1) \mapsto (x_2, y_2) \mapsto (X, Y)$.

Invertible matrix $A$ maps Circle 1 into Ellipse 2. Orthonormal vectors $v_1$, $v_2$ are mapped by matrix $A = U \Sigma V^T$ into orthogonal vectors $A v_1 = \sigma_1 u_1$, $A v_2 = \sigma_2 u_2$, which are exactly the semi-axis vectors of Ellipse 2. The semi-axis lengths of Ellipse 2 equal the singular values $\sigma_1$, $\sigma_2$ of matrix $A$. The semi-axis directions of Ellipse 2 equal the basis vectors $u_1$, $u_2$. The process is a rotation $(x, y) \mapsto (x_1, y_1)$, followed by an axis-scaling $(x_1, y_1) \mapsto (x_2, y_2)$, followed by a rotation $(x_2, y_2) \mapsto (X, Y)$.

Summary for the Example

The singular value decomposition for $A = \begin{pmatrix} -2 & 6 \\ 6 & 7 \end{pmatrix}$ is $A = U \Sigma V^T$, where

$$U = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix}, \qquad V = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}.$$

Invertible matrix $A$ maps the unit circle into an ellipse. The columns of $V$ are orthonormal vectors $v_1$, $v_2$, computed as eigenpairs $(\lambda_1, v_1)$, $(\lambda_2, v_2)$ of $A^T A$, ordered by $\lambda_1 \ge \lambda_2$:

$$\left( 100, \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right), \qquad \left( 25, \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ -1 \end{pmatrix} \right).$$

The singular values are $\sigma_1 = \sqrt{\lambda_1} = 10$, $\sigma_2 = \sqrt{\lambda_2} = 5$. The image of $v_1$ is

$$A v_1 = U \Sigma V^T v_1 = U \begin{pmatrix} \sigma_1 \\ 0 \end{pmatrix} = \sigma_1 u_1,$$

and the image of $v_2$ is

$$A v_2 = U \Sigma V^T v_2 = U \begin{pmatrix} 0 \\ \sigma_2 \end{pmatrix} = \sigma_2 u_2.$$
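The summary can be confirmed with numpy.linalg.svd. The computed columns of $U$ and $V$ may differ from those above by sign, which does not affect the relations $A v_k = \sigma_k u_k$. A minimal sketch:

```python
import numpy as np

A = np.array([[-2.0, 6.0],
              [ 6.0, 7.0]])

U, s, Vt = np.linalg.svd(A)
print(s)      # [10.  5.] -- the semi-axis lengths of Ellipse 2

# Semi-axis vectors of the ellipse: A v_k = sigma_k u_k.
for k in range(2):
    print(np.allclose(A @ Vt[k], s[k] * U[:, k]))   # True, True
```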
Figure 5. Mapping the unit circle into an ellipse: the images $A v_1 = \sigma_1 u_1$ and $A v_2 = \sigma_2 u_2$ are the semi-axis vectors.

The Four Fundamental Subspaces

The subspaces appearing in the Fundamental Theorem of Linear Algebra are called the Four Fundamental Subspaces. They are:

Subspace               Notation
Row Space of $A$       $\mathrm{Image}(A^T)$
Nullspace of $A$       $\mathrm{kernel}(A)$
Column Space of $A$    $\mathrm{Image}(A)$
Nullspace of $A^T$     $\mathrm{kernel}(A^T)$

The singular value decomposition $A = U \Sigma V^T$ computes orthonormal bases for the row and column spaces of $A$. In the table below, symbol $r = \mathrm{rank}(A)$. Matrix $A$ is assumed $m \times n$, which implies $A$ maps $\mathbb{R}^n$ into $\mathbb{R}^m$.

Table 2. Four Fundamental Subspaces and the SVD

Orthonormal Basis              Span                       Name
First $r$ columns of $V$       $\mathrm{Image}(A^T)$      Row Space of $A$
Last $n - r$ columns of $V$    $\mathrm{kernel}(A)$       Nullspace of $A$
First $r$ columns of $U$       $\mathrm{Image}(A)$        Column Space of $A$
Last $m - r$ columns of $U$    $\mathrm{kernel}(A^T)$     Nullspace of $A^T$
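Table 2 translates directly into code: slicing the SVD factors at the numerical rank yields orthonormal bases for all four subspaces. A minimal sketch with a made-up rank-one matrix, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # rank 1, m = 2, n = 3

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))           # numerical rank

row_space  = Vt[:r].T                # first r columns of V
null_space = Vt[r:].T                # last n - r columns of V
col_space  = U[:, :r]                # first r columns of U
left_null  = U[:, r:]                # last m - r columns of U

print(r)                                   # 1
print(np.allclose(A @ null_space, 0))      # True: A v = 0 on kernel(A)
print(np.allclose(A.T @ left_null, 0))     # True: A^T u = 0 on kernel(A^T)
```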
A Change of Basis Interpretation of the SVD

The singular value decomposition can be described as follows: For every $m \times n$ matrix $A$ of rank $r$, orthonormal bases $\{v_i\}_{i=1}^{n}$ and $\{u_j\}_{j=1}^{m}$ can be constructed such that matrix $A$ maps basis vectors $v_1, \ldots, v_r$ to non-negative multiples of basis vectors $u_1, \ldots, u_r$, respectively. The $n - r$ left-over basis vectors $v_{r+1}, \ldots, v_n$ map by $A$ into the zero vector. With respect to these bases, matrix $A$ is represented by a real diagonal matrix $\Sigma$ with non-negative entries.

Exercises 9.3

Diagonalization. Find the matrices $P$ and $D$ in the relation $AP = PD$, that is, find the eigenpair packages.
1. 2. 3. 4.

Jordan's Theorem. Find matrices $P$ and $T$ in the relation $AP = PT$, where $T$ is upper triangular.
5. 6. 7. 8.

Cayley-Hamilton Theorem.
9. 10. 11. 12.

Gram-Schmidt Process.
13. 14. 15. 16. 17. 18. 19. 20.

Shadow Projection.
21. 22. 23. 24.

Orthogonal Projection.
25. 26. 27. 28.

29. Shadow Projection. Prove that $\mathrm{proj}_Y(x) = d\, u = \mathrm{Proj}_V(x)$. The equality means that the orthogonal projection is the vector shadow projection when $V$ is one dimensional.
30. Gram-Schmidt Construction. Prove these properties, where we define

$$x_j^\perp = x_j - \mathrm{Proj}_{W_{j-1}}(x_j), \qquad W_{j-1} = \mathrm{span}(x_1, \ldots, x_{j-1}).$$

(a) Subspace $W_{j-1}$ is equal to the Gram-Schmidt $V_{j-1} = \mathrm{span}(u_1, \ldots, u_{j-1})$.
(b) Vector $x_j^\perp$ is orthogonal to all vectors in $W_{j-1}$.
(c) The vector $x_j^\perp$ is not zero.
(d) The Gram-Schmidt vector is $u_j = \dfrac{x_j^\perp}{\|x_j^\perp\|}$.

Near Point Theorem. Find the near point to subspace $V$ of $\mathbb{R}^n$.
31. 32. 33. 34.

QR-Decomposition.
35. 36. 37. 38. 39. 40. 41. 42.

Least Squares.
43. 44. 45. 46.

Orthonormal Diagonal Form.
47. 48. 49. 50.

Eigenpairs of Symmetric Matrices.
51. 52. 53. 54.

Singular Value Decomposition.
55. 56. 57. 58. 59. 60. 61. 62.

Ellipse and the SVD.
63. 64. 65. 66.