Chapter 7

Factorization Theorems

This chapter highlights a few of the many factorization theorems for matrices. While some factorization results are relatively direct, others are iterative. While some factorization results serve to simplify the solution to linear systems, others are concerned with revealing the matrix eigenvalues. We consider both types of results here.

7.1 The PLU Decomposition

To achieve the PLU decomposition (or factorization) we require a modified notion of the row reduced echelon form.

Definition 7.1.1 The modified row echelon form of a matrix is that form which satisfies all the conditions of the row reduced echelon form except that we do not require zeros to be above leading ones, and moreover we do not require leading ones, just nonzero entries.

For example the matrices below are in modified row echelon form

2 3 2 3 0 A = 0 0 B = 0 4 7 6 0 0 0 0 0 0

Most of the factorizations of $A \in M_n(C)$ studied so far require one essential ingredient, namely the eigenvectors of $A$. While it was not emphasized when we studied Gaussian elimination, there is an LU-type factorization there. Assume for the moment that the only operations needed to carry $A$ to its
modified row echelon form are those that add a multiple of one row to another. Naturally it is easy to make the leading nonzero entries into leading ones by multiplication by an appropriate diagonal matrix. That is not the point here. What we want to observe is that in this case the reduction is accomplished by the left multiplication of $A$ by a sequence of lower triangular matrices of the form

\[
L = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & c & & \ddots & \\ & & & & 1 \end{pmatrix},
\]

that is, ones on the diagonal, a single entry $c$ below the diagonal, and zeros elsewhere. Since we pivot at the $(1,1)$-entry first, we eliminate all the entries in the first column below the first row. The product of all the matrices $L$ to accomplish this has the form

\[
L_1 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ c_{21} & 1 & 0 & \cdots & 0 \\ c_{31} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ c_{n1} & 0 & 0 & \cdots & 1 \end{pmatrix}
\]

where $c_{k1} = -a_{k1}/a_{11}$. Thus, with the notation that $A = A_1$ has entries $a^{(1)}_{ij}$, this first phase of the reduction renders the matrix $A_2$ with entries $a^{(2)}_{ij}$:

\[
A_2 = L_1 A_1 = \begin{pmatrix} a^{(2)}_{11} & a^{(2)}_{12} & \cdots & a^{(2)}_{1n} \\ 0 & a^{(2)}_{22} & \cdots & a^{(2)}_{2n} \\ 0 & a^{(2)}_{32} & \cdots & a^{(2)}_{3n} \\ \vdots & \vdots & & \vdots \\ 0 & a^{(2)}_{n2} & \cdots & a^{(2)}_{nn} \end{pmatrix}.
\]

Since we have assumed that no row interchanges are necessary to carry out the reduction we know that $a^{(2)}_{22} \neq 0$. The next part of the reduction process is the elimination of the elements in the second column below the second
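row, i.e. $a^{(2)}_{32} \to 0, \dots, a^{(2)}_{n2} \to 0$. Correspondingly, this can be achieved by a matrix of the form

\[
L_2 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & c_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & c_{n2} & 0 & \cdots & 1 \end{pmatrix}.
\]

(What are the values $c_{k2}$?) The result is the matrix $A_3$ given by

\[
A_3 = L_2 A_2 = L_2 L_1 A_1 = \begin{pmatrix} a^{(3)}_{11} & a^{(3)}_{12} & a^{(3)}_{13} & \cdots & a^{(3)}_{1n} \\ 0 & a^{(3)}_{22} & a^{(3)}_{23} & \cdots & a^{(3)}_{2n} \\ 0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & a^{(3)}_{n3} & \cdots & a^{(3)}_{nn} \end{pmatrix}.
\]

Proceeding in this way through all the rows (columns) there results

\[
A_n = L_{n-1} A_{n-1} = L_{n-1} \cdots L_2 L_1 A_1 = \begin{pmatrix} a^{(n)}_{11} & a^{(n)}_{12} & \cdots & a^{(n)}_{1n} \\ 0 & a^{(n)}_{22} & \cdots & a^{(n)}_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a^{(n)}_{nn} \end{pmatrix}.
\]

The right side of the equation above is an upper triangular matrix. Denote it by $U$. Since each of the matrices $L_i$, $i = 1, \dots, n-1$, is invertible we can write

\[
A = L_1^{-1} \cdots L_{n-1}^{-1} U.
\]

The lemma below is useful in this.

Lemma 7.1.1 Suppose the lower triangular matrix $L \in M_n(C)$ has the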
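form

\[
L = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & c_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & c_{nk} & & & 1 \end{pmatrix},
\]

where the entries $c_{k+1,k}, \dots, c_{nk}$ lie in the $k$th column, below the $k$th row. Then $L$ is invertible with inverse given by

\[
L^{-1} = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -c_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & -c_{nk} & & & 1 \end{pmatrix}.
\]

Proof. Trivial.

Lemma 7.1.2 Suppose $L_1, L_2, \dots, L_{n-1}$ are the matrices given above. Then the matrix $L = L_1^{-1} \cdots L_{n-1}^{-1}$ has the form

\[
L = \begin{pmatrix} 1 & & & & \\ -c_{21} & 1 & & & \\ -c_{31} & -c_{32} & \ddots & & \\ \vdots & \vdots & & 1 & \\ -c_{n1} & -c_{n2} & \cdots & -c_{n,n-1} & 1 \end{pmatrix},
\]

that is, $L$ has ones on its diagonal and the entry $-c_{ik}$ in position $(i,k)$ for $i > k$.

Proof. Trivial.

Applying these lemmas to the present situation we can say that when no row interchanges are needed we can factor any matrix $A \in M_n(C)$ as $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. When row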
interchanges are needed and we let $P$ be the permutation matrix that creates these row interchanges then the LU-factorization above can be carried out for the matrix $PA$. Thus $PA = LU$, where $L$ is lower triangular and $U$ is upper triangular. We call this the PLU factorization. Let us summarize this in the following theorem.

Theorem 7.1.1 Let $A \in M_n(C)$. Then there is a permutation matrix $P \in M_n(C)$ and lower $L$ and upper $U$ triangular matrices ($\in M_n(C)$), such that $PA = LU$. Moreover, $L$ can be taken to have ones on its diagonal. That is, $\ell_{ii} = 1$, $i = 1, \dots, n$.

By applying the result above to $A^T$ it is easy to see that the matrix $U$ can be taken to have the ones on its diagonal. The result is stated as a corollary.

Corollary 7.1.1 Let $A \in M_n(C)$. Then there is a permutation matrix $P \in M_n(C)$ and lower and upper triangular matrices $L$ and $U$ ($\in M_n(C)$) respectively, such that $PA = LU$. Moreover, $U$ can be taken to have ones on its diagonal ($u_{ii} = 1$, $i = 1, \dots, n$).

The PLU decomposition can be put in service to solving the system $Ax = b$ as follows. Assume that $A \in M_n(C)$ is invertible. Determine the permutation matrix $P$ in order that $PA = LU$, where $L$ is lower triangular and $U$ is upper triangular. Thus, we have

\[
Ax = b, \qquad PAx = Pb, \qquad LUx = Pb.
\]

Solve the systems

\[
Ly = Pb, \qquad Ux = y.
\]

Then $LUx = Ly = Pb$. Hence $x$ is a solution to the system. The advantage of this formulation over direct Gaussian elimination is that the systems $Ly = Pb$ and $Ux = y$ are triangular and hence are easy to solve. For example, for the first of the systems, $Ly = Pb$, let the vector $Pb = [\hat b_1, \dots, \hat b_n]^T$. Then it is easy to see that back substitution (aka forward substitution)
can be used to determine $y$. That is, we have the recursive relations

\[
y_1 = \frac{\hat b_1}{l_{11}}, \qquad
y_2 = \frac{\hat b_2 - l_{21} y_1}{l_{22}}, \qquad \dots, \qquad
y_n = \frac{\hat b_n - \sum_{m=1}^{n-1} l_{nm} y_m}{l_{nn}}.
\]

A similar formula applies to solve $Ux = y$. In this case we solve first for $x_n = y_n / u_{nn}$. The general formula is recursive with $x_k$ being determined after $x_{k+1}, \dots, x_n$ are determined using the formula

\[
x_k = \frac{y_k - \sum_{m=k+1}^{n} u_{km} x_m}{u_{kk}}.
\]

In practice the step of determining and then multiplying by the permutation matrix is not actually carried out. Rather, an index array is generated while the elimination step is carried out, and it records the row interchanges, in effect acting as a set of pointers to the rows. This saves considerable time in solving potentially very large systems.

More general and instructive methods are available for accomplishing this LU factorization. Also, conditions are available for when no (nontrivial) permutation is required. We need the following lemma.

Lemma 7.1.3 Let $A \in M_n(C)$ have the LU factorization $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. For any partition of the matrix of the form

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\]

there are corresponding decompositions of the matrices $L$ and $U$

\[
L = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\quad \text{and} \quad
U = \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
\]
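where the $L_{ii}$ and the $U_{ii}$ are lower and upper triangular respectively. Moreover, we have

\[
A_{11} = L_{11} U_{11}, \qquad
A_{12} = L_{11} U_{12}, \qquad
A_{21} = L_{21} U_{11}, \qquad
A_{22} = L_{21} U_{12} + L_{22} U_{22}.
\]

Thus $L_{11} U_{11}$ is an LU factorization of $A_{11}$.

With this lemma we can establish that almost every matrix can have an LU factorization.

Definition 7.1.2 Let $A \in M_n(C)$ and suppose that $1 \le j \le n$. The expression $\det(A\{1, \dots, j\})$ means the determinant of the upper left $j \times j$ submatrix of $A$. These quantities for $j = 1, \dots, n$ are called the principal determinants of $A$.

Theorem 7.1.2 Let $A \in M_n(C)$ and suppose that $A$ has rank $k$. If

\[
\det(A\{1, \dots, j\}) \neq 0 \quad \text{for } j = 1, \dots, k
\]

then $A$ has an LU factorization $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. Moreover, the factorization may be taken so that either $L$ or $U$ is nonsingular. In the case $k = n$ both $L$ and $U$ will be nonsingular.

Proof. We carry out this LU factorization as a direct calculation in comparison to the Gaussian elimination method above. Let us propose to solve the equation $LU = A$ expressed as

\[
\begin{pmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & & l_{nn} \end{pmatrix}
\begin{pmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & u_{nn} \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & & a_{nn} \end{pmatrix}.
\]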
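It is easy to see that $l_{11} u_{11} = a_{11}$. We can take, for example, $l_{11} = 1$ and solve for $u_{11}$. The determinant condition assures us that $u_{11} \neq 0$. Next solve for the $(2,1)$-entry. We have $l_{21} u_{11} = a_{21}$. Since $u_{11} \neq 0$, solve for $l_{21}$. For the $(1,2)$-entry we have $l_{11} u_{12} = a_{12}$, which can be solved for $u_{12}$ since $l_{11} \neq 0$. Finally, for the $(2,2)$-entry, $l_{21} u_{12} + l_{22} u_{22} = a_{22}$ is an equation with two unknowns. Assign $l_{22} = 1$ and solve for $u_{22}$. What is important to note is that the process carried out this way gives the factorization of the upper left $2 \times 2$ submatrix of $A$. Thus

\[
\begin{pmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{pmatrix}
\begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.
\]

Since $\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \neq 0$, and since $\begin{pmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{pmatrix}$ is nonsingular as its diagonal elements are ones, it follows that $\det \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix} \neq 0$. Continue the factorization process through the $k \times k$ upper left submatrix of $A$.

Now consider the blocked matrix form

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\]

where $A_{11}$ is $k \times k$ and has rank $k$. Thus we know that the rows of the lower $(n-k) \times n$ matrix above, that is $\begin{pmatrix} A_{21} & A_{22} \end{pmatrix}$, can be written as a unique linear combination of the rows of the upper $k \times n$ matrix $\begin{pmatrix} A_{11} & A_{12} \end{pmatrix}$. Thus

\[
\begin{pmatrix} A_{21} & A_{22} \end{pmatrix} = C \begin{pmatrix} A_{11} & A_{12} \end{pmatrix}
\]

for some $(n-k) \times k$ matrix $C$. Of course this means $A_{21} = C A_{11}$ and $A_{22} = C A_{12}$. We consider the factorization

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
\]

where the blocks $L_{11}$ and $U_{11}$ have just been determined. From the equations in the lemma above we solve to get $U_{12} = L_{11}^{-1} A_{12}$ and $L_{21} =$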
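$A_{21} U_{11}^{-1}$. Then

\[
\begin{aligned}
A_{22} &= L_{21} U_{12} + L_{22} U_{22} \\
&= A_{21} U_{11}^{-1} L_{11}^{-1} A_{12} + L_{22} U_{22} \\
&= A_{21} A_{11}^{-1} A_{12} + L_{22} U_{22} \\
&= C A_{11} A_{11}^{-1} A_{12} + L_{22} U_{22} \\
&= C A_{12} + L_{22} U_{22} \\
&= A_{22} + L_{22} U_{22}.
\end{aligned}
\]

Thus we solve $L_{22} U_{22} = 0$. Obviously, we can take for $L_{22}$ any nonsingular matrix we wish and solve for $U_{22}$, or conversely.

7.2 LR factorization

While the PLU factorization is useful for solving systems, the LR factorization can be used to determine eigenvalues. Let $A \in M_n$ be given. Then

\[
A = A_1 = L_1 R_1, \qquad L_1^{-1} A_1 L_1 = R_1 L_1 \equiv A_2,
\]

with $L_1$ lower triangular and $R_1$ upper triangular. Then

\[
A_2 = L_2 R_2, \qquad L_2^{-1} A_2 L_2 = R_2 L_2 \equiv A_3.
\]

Continue in this fashion to obtain

\[
L_k^{-1} A_k L_k = R_k L_k \equiv A_{k+1}. \qquad (?)
\]

We define

\[
P_k = L_1 L_2 \cdots L_k, \qquad Q_k = R_k \cdots R_2 R_1.
\]

Then $P_k A_{k+1} = A_1 P_k$,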
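for

\[
A_{k+1} = L_k^{-1} A_k L_k = L_k^{-1} L_{k-1}^{-1} A_{k-1} L_{k-1} L_k = \cdots = P_k^{-1} A_1 P_k
\]

or

\[
P_k A_{k+1} = A_1 P_k.
\]

Hence

\[
\begin{aligned}
P_k Q_k &= P_{k-1} A_k Q_{k-1} = A_1 P_{k-1} Q_{k-1} \\
&= A_1 P_{k-2} A_{k-1} Q_{k-2} = A_1^2 P_{k-2} Q_{k-2} \\
&= \cdots = A_1^k.
\end{aligned}
\]

Theorem 7.2.1 (Rutishauser) Let $A \in M_n$ be given. Assume the eigenvalues of $A$ satisfy

\[
|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0.
\]

Then $A \sim \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Assume $A = S \Lambda S^{-1}$, and

\[
Y \equiv S^{-1} = L_y R_y, \qquad X = S = L_x R_x
\]

where $L_y$ and $L_x$ are lower unit triangular matrices and $R_y$ and $R_x$ are upper triangular. Then the $A_k$ defined by (?) satisfy the result: $\lim_{k \to \infty} A_k$ is upper triangular.

Proof (Wilkinson). We have

\[
A^k = X \Lambda^k Y = X \Lambda^k L_y R_y = X \Lambda^k L_y \Lambda^{-k} \Lambda^k R_y.
\]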
By the strict inequalities between the eigenvalues we have

\[
(\Lambda^k L_y \Lambda^{-k})_{ij} =
\begin{cases}
1 & i = j \\
\ell_{ij} \left( \dfrac{\lambda_i}{\lambda_j} \right)^{k} & i > j \\
0 & i < j.
\end{cases}
\]

Hence $\Lambda^k L_y \Lambda^{-k} \to I$ (because $|\lambda_i / \lambda_j| < 1$ if $i > j$). Hence with

\[
A^k = L_x R_x (\Lambda^k L_y \Lambda^{-k}) \Lambda^k R_y
\quad \text{and} \quad
A^k = P_k Q_k
\]

we conclude that $\lim_{k \to \infty} P_k = L_x$. Therefore

\[
L_k = P_{k-1}^{-1} P_k \to I.
\]

Finally we have that the limit of $A_k$ must be upper triangular because

\[
L_k^{-1} A_k = R_k
\]

is upper triangular. This exposes all the eigenvalues of $A$. Therefore the eigenvectors can be determined.

7.3 The QR algorithm

Certain numerical problems with the LR algorithm have led to the QR algorithm, which is based on the decomposition of the matrix $A$ as

\[
A = QR
\]

where $Q$ is unitary and $R$ is upper triangular.

Theorem 7.3.1 (QR-factorization) (i) Suppose $A$ is in $M_{n,m}$ and $n \ge m$. Then there is a matrix $Q \in M_{n,m}$ with orthonormal columns and an upper triangular matrix $R \in M_m$ such that $A = QR$.
(ii) If $n = m$, then $Q$ is unitary. If $A$ is nonsingular the diagonal entries of $R$ can be chosen to be positive.

(iii) If $A$ is real, then $Q$ and $R$ may be chosen to be real.

Proof. (i) We proceed inductively. Let $a_1, \dots, a_m$ denote the columns of $A$ and $q_1, q_2, \dots, q_m$ denote the columns of $Q$. The basic idea of the QR-factorization is to orthogonalize the columns of $A$ from left to right. Then the columns can be expressed by the formulas $a_k = \sum_{i=1}^{k} c_{ik} q_i$, $k = 1, \dots, m$. The coefficients of the expansion become, respectively, the entries of the $k$th column of $R$, completed by $m - k$ zeros. (Of course, if the rank of $A$ is less than $m$, we fill in arbitrary orthogonal vectors which we know exist as $m \le n$.) For the details, first define $q_1 = a_1 / \|a_1\|$. To compute $q_2$ we use the Gram-Schmidt procedure:

\[
\hat q_2 = a_2 - \langle q_1, a_2 \rangle q_1, \qquad q_2 = \hat q_2 / \|\hat q_2\|.
\]

Tracing backwards note that

\[
a_2 = \hat q_2 + \langle q_1, a_2 \rangle q_1 = \|\hat q_2\| q_2 + \langle q_1, a_2 \rangle q_1.
\]

So we have

\[
\begin{pmatrix} a_1 & a_2 & a_3 & \cdots \end{pmatrix}
=
\begin{pmatrix} q_1 & q_2 & q_3 & \cdots \end{pmatrix}
\begin{pmatrix}
\|a_1\| & \langle q_1, a_2 \rangle & \cdots \\
0 & \|\hat q_2\| & \cdots \\
0 & 0 & \cdots \\
\vdots & \vdots &
\end{pmatrix}.
\]

Instead of the full inductive step we compute $q_3$ and finish at that point.

\[
\hat q_3 = a_3 - \langle q_1, a_3 \rangle q_1 - \langle q_2, a_3 \rangle q_2, \qquad q_3 = \hat q_3 / \|\hat q_3\|.
\]

Hence

\[
a_3 = \|\hat q_3\| q_3 + \langle q_1, a_3 \rangle q_1 + \langle q_2, a_3 \rangle q_2.
\]
The third column of $R$ is thus given by

\[
r_3 = [\langle q_1, a_3 \rangle, \langle q_2, a_3 \rangle, \|\hat q_3\|, 0, 0, \dots, 0]^T.
\]

In this way we see that the columns of $Q$ are orthonormal and the matrix $R$ is upper triangular, with an exception. That is the possibility that $\hat q_k = 0$ for some $k$. In this degenerate case we take $q_k$ to be any unit vector orthogonal to the span of $a_1, a_2, \dots, a_m$, and we take $r_{kj} = 0$, $j = k, k+1, \dots, m$. Also we note that if $\hat q_k = 0$, then $a_k$ is linearly dependent on $a_1, a_2, \dots, a_{k-1}$, and hence on $q_1, q_2, \dots, q_{k-1}$. Select the coefficients $r_{1k}, \dots, r_{k-1,k}$ to reflect this dependence.

(ii) If $m = n$, the process above yields a unitary matrix. If $A$ is nonsingular, the process above yields a matrix $R$ with a positive diagonal.

(iii) If $A$ is real, all operations above can be carried out in real arithmetic.

Now what about the uniqueness of the decomposition? Essentially the uniqueness is true up to a multiplication by a diagonal matrix, except in the case when the matrix has rank less than $m$, when there is no form of uniqueness. Suppose that the rank of $A$ is $m$. Then application of the Gram-Schmidt procedure yields a matrix $R$ with positive diagonal. Suppose that $A$ has two QR factorizations, $QR$ and $PS$, with upper triangular factors having positive diagonals. Then

\[
P^* Q = S R^{-1}.
\]

We have that $S R^{-1}$ is upper triangular and moreover has a positive diagonal. Also, $P^* Q$ is unitary. We know that the only upper triangular unitary matrices are diagonal matrices, and finally the only unitary diagonal matrix with a positive diagonal is the identity matrix. Therefore $P^* Q = I$, which is to say that $P = Q$. We summarize as

Corollary 7.3.1 Suppose $A$ is in $M_{n,m}$ and $n \ge m$. If $\operatorname{rank}(A) = m$ then the QR factorization $A = QR$ with upper triangular matrix $R$ having a positive diagonal is unique.
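The Gram-Schmidt construction used in the proof of the QR-factorization can be sketched in a few lines of code. The following is a minimal illustration only, assuming a real matrix with linearly independent columns stored as a list of rows; the function name `gram_schmidt_qr` and the sample matrix are hypothetical choices made for this sketch, and the degenerate case $\hat q_k = 0$ is simply rejected rather than filled in.

```python
import math

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR of a real n x m matrix (n >= m).

    A is a list of rows with linearly independent columns (an assumption
    of this sketch). Returns (Q, R) as lists of rows, with Q of size
    n x m and R of size m x m, upper triangular with positive diagonal.
    """
    n, m = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(m)]  # columns a_1..a_m
    q = []                                  # orthonormal columns q_1..q_m
    R = [[0.0] * m for _ in range(m)]
    for k in range(m):
        v = cols[k][:]                      # start from a_k
        for i in range(k):
            # r_ik = <q_i, a_k>; subtract the component along q_i
            R[i][k] = sum(q[i][t] * cols[k][t] for t in range(n))
            v = [v[t] - R[i][k] * q[i][t] for t in range(n)]
        R[k][k] = math.sqrt(sum(x * x for x in v))   # r_kk = ||q_hat_k||
        if R[k][k] == 0.0:
            raise ValueError("columns are linearly dependent")
        q.append([x / R[k][k] for x in v])  # q_k = q_hat_k / ||q_hat_k||
    Q = [[q[j][i] for j in range(m)] for i in range(n)]
    return Q, R

# Hypothetical 3 x 2 example with independent columns.
Q, R = gram_schmidt_qr([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
```

Classical Gram-Schmidt is used here because it mirrors the proof step by step; in floating point arithmetic it can lose orthogonality for ill-conditioned matrices, so library routines typically rely on other methods such as Householder reflections.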
The QR algorithm

The QR algorithm parallels the LR algorithm almost identically. Suppose $A$ is in $M_n$. Define

\[
A = A_1 = Q_1 R_1, \qquad A_2 \equiv R_1 Q_1.
\]

Also

\[
Q_1^* A_1 Q_1 = R_1 Q_1 = A_2.
\]

Then decompose $A_2$ into a QR decomposition $A_2 = Q_2 R_2$ and

\[
Q_2^* A_2 Q_2 = R_2 Q_2 \equiv A_3.
\]

Also

\[
Q_2^* Q_1^* A_1 Q_1 Q_2 = R_2 Q_2 = A_3.
\]

Proceed sequentially. Let

\[
A_k = Q_k R_k, \qquad A_{k+1} = R_k Q_k, \qquad Q_k^* A_k Q_k = A_{k+1}.
\]

Then with

\[
P_k = Q_1 Q_2 \cdots Q_k, \qquad T_k = R_k R_{k-1} \cdots R_1
\]

we have $P_k^* A_1 P_k = A_{k+1}$, whence

\[
P_k A_{k+1} = A_1 P_k.
\]
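The iteration $A_k = Q_k R_k$, $A_{k+1} = R_k Q_k$ just defined can be tried out numerically. The sketch below is an illustration under stated assumptions, not a production eigenvalue routine: it assumes a real nonsingular matrix, uses a plain Gram-Schmidt factorization, and applies no shifts. The helper names `qr_square`, `matmul` and `qr_iterate`, and the sample matrix (a hypothetical symmetric matrix with eigenvalues 3 and 1), are choices made for this example.

```python
import math

def qr_square(A):
    """Gram-Schmidt QR of a real nonsingular n x n matrix (list of rows)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q, R = [], [[0.0] * n for _ in range(n)]
    for k in range(n):
        v = cols[k][:]
        for i in range(k):
            R[i][k] = sum(q[i][t] * cols[k][t] for t in range(n))
            v = [v[t] - R[i][k] * q[i][t] for t in range(n)]
        R[k][k] = math.sqrt(sum(x * x for x in v))
        q.append([x / R[k][k] for x in v])
    Q = [[q[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def qr_iterate(A, steps):
    """Apply A_{k+1} = R_k Q_k, where A_k = Q_k R_k, for `steps` steps."""
    Ak = [row[:] for row in A]
    for _ in range(steps):
        Q, R = qr_square(Ak)
        Ak = matmul(R, Q)
    return Ak

# Hypothetical example: eigenvalues of this matrix are 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
A50 = qr_iterate(A, 50)
# The subdiagonal entry of A50 is negligible and its diagonal is close
# to the eigenvalues, ordered by decreasing modulus.
```

Each step is a similarity transformation, so the trace and determinant of the iterates stay fixed while the entries below the diagonal decay at a rate governed by the ratios $|\lambda_i / \lambda_j|$, $i > j$.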
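Also we have

\[
P_k T_k = P_{k-1} Q_k R_k T_{k-1} = P_{k-1} A_k T_{k-1} = A_1 P_{k-1} T_{k-1} = \cdots = A^k.
\]

Theorem 7.3.2 Let $A \in M_n$ be given, and assume the eigenvalues of $A$ satisfy

\[
|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0.
\]

Then the iterations $A_k$ converge to a triangular matrix.

Proof. Our hypothesis gives that $A$ is diagonalizable, and we write $A \sim \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. That is,

\[
A = S \Lambda S^{-1}
\]

where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Let

\[
X = S = Q_x R_x \quad \text{(here QR)}, \qquad Y = S^{-1} = L_y U_y \quad \text{(here LU)}.
\]

Then

\[
\begin{aligned}
A^k &= Q_x R_x \Lambda^k L_y U_y \\
&= Q_x R_x \Lambda^k L_y \Lambda^{-k} \Lambda^k U_y \\
&= Q_x (I + R_x E_k R_x^{-1}) R_x \Lambda^k U_y
\end{aligned}
\]

where

\[
E_k = \Lambda^k L_y \Lambda^{-k} - I, \qquad
(E_k)_{ij} =
\begin{cases}
0 & i = j \\
(\lambda_i / \lambda_j)^k \ell_{ij} & i > j \\
0 & i < j.
\end{cases}
\]

It follows that $I + R_x E_k R_x^{-1} \to I$, and $R_x \Lambda^k U_y$ is upper triangular. Thus

\[
Q_x (I + R_x E_k R_x^{-1}) R_x \Lambda^k U_y = P_k T_k.
\]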
The matrix $I + R_x E_k R_x^{-1}$ can be QR factored as $\tilde U_k \tilde R_k$, and since $I + R_x E_k R_x^{-1} \to I$, it follows that we can assume both $\tilde U_k \to I$ and $\tilde R_k \to I$. Hence

\[
A^k = Q_x \tilde U_k \, [\tilde R_k R_x \Lambda^k U_y] = P_k T_k
\]

with the first factor unitary and the second factor upper triangular. Since we have assumed (by the eigenvalue condition) that $A$ is nonsingular, this factorization is essentially unique, where possibly a multiplication by a diagonal matrix must be applied to give the upper triangular factor on the right a positive diagonal. Just what is the form of the diagonal matrix can be seen from the following. Let $\Lambda = |\Lambda| \Lambda_1$, where $|\Lambda|$ is the diagonal matrix of moduli of the elements of $\Lambda$ and where $\Lambda_1$ is the unitary matrix of the signs of each eigenvalue respectively. We also take $U_y = \Lambda_2 (\Lambda_2^{-1} U_y)$ where $\Lambda_2$ is a diagonal unitary matrix chosen so that $\Lambda_2^{-1} U_y$ has a positive diagonal. Then

\[
A^k = Q_x \tilde U_k \Lambda_2 \Lambda_1^k \, \big[ (\Lambda_2 \Lambda_1^k)^{-1} \tilde R_k R_x (\Lambda_2 \Lambda_1^k) \, |\Lambda|^k (\Lambda_2^{-1} U_y) \big] = P_k T_k.
\]

From this we obtain that $P_k$ is essentially asymptotic to $Q_x \tilde U_k \Lambda_2 \Lambda_1^k$, and from this we obtain that

\[
Q_k = P_{k-1}^{-1} P_k \to \Lambda_1
\]

which is diagonal. Finally, it follows that the limit of $A_k$ is upper triangular since $Q_k^* A_k = R_k$. In the limit therefore $A$ is similar to an upper triangular matrix.

Example 7.3.1 Apply the QR method to the matrix

A := 23 2 2 2 2 3 2 0

The matrix A has eigenvalues 545, 0723, 87. The successive iterations are
A 2 = A 4 = A 6 = A 8 = 50 05 23 063 0662 036 42 00202 44 546 4 0482 00372 0495 0672 069 085 62 546 3 0687 0084 52 083 000826 0983 038 A 3 = 543 0684 09 0000822 87 0229 0000025 00659 0729 A 5 = A 7 = 55 02 036 0046 0666 0482 053 0240 84 547 0366 26 00404 0462 39 00430 2 0677 545 0529 8 000682 78 0585 0005 044 0638

Note the gradual appearance of the eigenvalues on the diagonal.

Remark. These iterations were carried out in precision 3 arithmetic, which affects the rate of convergence to triangular form.

7.4 Least Squares

As we know, if $A \in M_{n,m}$ with $m < n$ it is generally not possible to solve the overdetermined system $Ax = b$. For example, suppose we have the data $\{(x_i, y_i)\}_{i=1}^{n}$, with the $x$-coordinates distinct. We may wish to fit a straight line to this data. This means we want to find coefficients $m$ and $b$ so that

\[
b + m x_i = y_i, \qquad i = 1, \dots, n. \qquad (?)
\]

Taking the matrix and data vector

\[
A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\]

and $z = [b, m]^T$, the system (?) becomes $Az = \mathbf{b}$. Usually $n \gg 2$. Hence there is virtually no hope to determine a unique solution to the system. However, there are numerous ways to determine constants $m$ and $b$ so that the resulting line represents the data. For example, owing to the distinctness of the $x$-coordinates, it is possible to solve any $2 \times 2$ subsystem of
$Az = \mathbf{b}$. Other variations exist. A new $2 \times 2$ system could be created by creating two averages of the data, say left and right, and solving. Assume the sequence $\{x_j\}$ is ordered from least to greatest. Define $x_\ell = \frac{1}{k} \sum_{j=1}^{k} x_j$ and $x_r = \frac{1}{n-k} \sum_{j=k+1}^{n} x_j$. Let $y_\ell$ and $y_r$ denote the corresponding averages for the ordinates. Then define the intercept $b$ and slope $m$ by solving the system

\[
\begin{pmatrix} 1 & x_\ell \\ 1 & x_r \end{pmatrix}
\begin{pmatrix} b \\ m \end{pmatrix}
=
\begin{pmatrix} y_\ell \\ y_r \end{pmatrix}.
\]

While this will normally give a reasonable approximating line, its value has little utility beyond its naive simplicity and visual appearance. What is desired is to establish a criterion for choosing the line.

Define the residual of the approximation $r = \mathbf{b} - Az$. It makes perfect sense to consider finding $z = [b, m]^T$ for which the residual is minimized in some norm. Any norm can be selected here, but on practical grounds the best norm to use is the Euclidean norm $\| \cdot \|_2$. The vector $Az$ that yields the minimal norm residual is the one for which $(\mathbf{b} - Az) \perp Aw$ for all $w$, for we are seeking the nearest vector of the form $Aw$ to the vector $\mathbf{b}$. Stated for a general system $Ax = b$, the solution is found by selecting the $x$ for which

\[
b - Ax \perp Aw \quad \text{for all } w.
\]

This means

\[
\langle b - Ax, Aw \rangle = 0 \quad \text{for all } w,
\]

or

\[
\langle A^T (b - Ax), w \rangle = 0 \quad \text{for all } w,
\]

or

\[
A^T (b - Ax) = 0, \qquad \text{that is,} \qquad A^T A x = A^T b \qquad \text{(Normal Equations)}.
\]

The least squares solution to $Ax = b$ is given by the solution to the normal equations $A^T A x = A^T b$.
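For the straight-line fit, the normal equations form a $2 \times 2$ system that can be solved in closed form. The sketch below is one way this might look, assuming real data with at least two distinct $x$-values; the function name `fit_line` and the sample data are hypothetical, chosen only to illustrate the computation of $A^T A$ and $A^T b$ for the matrix $A$ with rows $[1, x_i]$.

```python
def fit_line(xs, ys):
    """Least squares line b + m*x through the points (xs[i], ys[i]).

    Solves the 2 x 2 normal equations
        [ n      sum x   ] [b]   [ sum y   ]
        [ sum x  sum x^2 ] [m] = [ sum x*y ]
    by Cramer's rule. Returns (b, m).
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of A^T A
    if det == 0:
        raise ValueError("need at least two distinct x-values")
    b = (sxx * sy - sx * sxy) / det  # intercept
    m = (n * sxy - sx * sy) / det    # slope
    return b, m

# Hypothetical data lying exactly on y = 1 + 2x, so the fit recovers it.
b, m = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Forming $A^T A$ explicitly is fine for a small, well-scaled problem like this one; for larger or ill-conditioned problems a factorization of $A$ itself is usually preferred, since $A^T A$ squares the condition number.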
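Suppose we have the QR decomposition for $A$. Then if $A$ is real

\[
A^T A = R^T Q^T Q R = R^T R, \qquad A^T b = R^T Q^T b.
\]

Hence the normal equations become

\[
R^T R x = R^T Q^T b.
\]

Assuming that the rank of $A$ is $m$, we must have that $R$ and hence $R^T$ is invertible. Therefore the least squares solution is given by the triangular system

\[
R x = Q^T b.
\]

7.5 Exercises

1. If $A \in M_n(C)$ has rank $k$, show that there is a permutation matrix $P$ such that $PA$ has its first $k$ principal determinants nonzero.

2. For the least squares fit of a straight line determine $R$ and $Q$.

3. In the case of data

\[
A^T A = \begin{pmatrix} n & \Sigma x_i \\ \Sigma x_i & \Sigma x_i^2 \end{pmatrix}, \qquad
A^T b = \begin{pmatrix} \Sigma y_i \\ \Sigma x_i y_i \end{pmatrix}.
\]

4. In attempting to solve a quadratic fit we have the model

\[
c + b x_i + a x_i^2 = y_i, \qquad i = 1, \dots, n.
\]

The system is

\[
A = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad
b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.
\]

The normal equations have the matrix and data given by

\[
A^T A = \begin{pmatrix} n & \Sigma x_i & \Sigma x_i^2 \\ \Sigma x_i & \Sigma x_i^2 & \Sigma x_i^3 \\ \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 \end{pmatrix}, \qquad
A^T b = \begin{pmatrix} \Sigma y_i \\ \Sigma x_i y_i \\ \Sigma x_i^2 y_i \end{pmatrix}.
\]

5. Find the normal equations for the least squares fit of data to a polynomial of degree $k$.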