# NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS DO Q LEE Abstract. Computing the solution to Least Squares Problems is of great importance in a wide range of fields ranging from numerical linear algebra to econometrics and optimization. This paper aims to present numerically stable and computationally efficient algorithms for computing the solution to Least Squares Problems. In order to evaluate and compare the stability and efficiency of our proposed algorithms, the theoretical complexities and numerical results have been analyzed. Contents 1. Introduction: The Least Squares Problem 2 2. Existence and Uniqueness Least Squares Solution from Normal Equations 3 3. Norms and Conditioning Norms: Quantifying Error and Distance Sensitivity and Conditioning: Perturbations to b Sensitivity to perturbations in A 6 4. Normal Equations Method Cholesky Factorization Flop: Complexity of Numerical Algorithms Algorithm: Computing the Cholesky Factorization Shortcomings of Normal Equations 8 5. Orthogonal Methods - The QR Factorization Orthogonal Matrices Triangular Least Squares Problems The QR Factorization in Least Squares Problems Calculating the QR-factorization - Householder Transformations Rank Deficiency: Numerical Loss of Orthogonality Singular Value Decomposition (SVD) The Minimum Norm Solution using SVD Computing the SVD of Matrix A Comparison of Methods Acknowledgements 15 References 15 Date: August 24,

2 2 DO Q LEE 1. Introduction: The Least Squares Problem Suppose we are given a set of observed values β and α 1,..., α n. Suppose the variable β is believed to have a linear dependence on the variables α 1,..., α n. Then we may postulate a linear model β = x 1 α x n α n. Our goal is to determine the unknown coefficients x 1,..., x n so that the linear model is a best fit to our observed data. Now consider a system of m linear equations with n variables: a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2... a m1 x 1 + a m2 x a mn x n = b m Then we obtain an overdetermined linear system Ax = b, with m n matrix A, where m > n. Since equality is usually not exactly satisfiable when m > n, the Least Squares Solution x minimizes the squared Euclidean norm of the residual vector r(x) = b Ax so that (1.1) min r(x) 2 2 = min b Ax 2 2 In this paper, numerically stable and computationally efficient algorithms for solving Least Squares Problems will be considered. Sections 2 and 3 will introduce the tools of orthogonality, norms, and conditioning which are necessary for understanding the numerical algorithms introduced in the following sections. The Normal Equations Method using Cholesky Factorization will be discussed in detail in section 4. In section 5, we will see that the QR Factorization generates a more accurate solution than the Normal Equations Method, making it one of the most important tools in computing Least Squares Solutions. Section 6 will discuss the Singular Value Decomposition (SVD) and its robustness in solving rank-deficient problems. Finally, we will see that under certain circumstances the Normal Equations Method and the SVD may be more applicable than the QR approach. 2. Existence and Uniqueness In this section, we will see that the linear Least Squares Problem Ax = b always has a solution, and this solution is unique if and only if the columns of A are linearly independent, i.e., rank(a) = n, where A is an m n matrix. If rank(a) < n, then A is rank-deficient, and the solution is not unique. As such, the numerical algorithms in the later sections of this paper will be based on the assumption that A has full column rank n. Definition 2.1. Let S R n. The orthogonal complement of S, denoted as S, is the set of all vectors x R n that are orthogonal to S. One important property of orthogonal complements is the following: R n = V V, where is the direct sum, which means that any vector x R n can be uniquely represented as x = p + o,

3 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 3 where p V and o V. Then p is called the orthogonal projection of the vector x onto the subspace V. As a result, we have the following lemma: Lemma 2.2. Let V R n. Let p be the orthogonal projection of a vector x R n onto V. Then, x v > x p for any v p V. Proof. Let o = x p, o = x v, and v = p v. Then o = o + v, v V, and v 0. Since o V, it follows that o v = 0. Thus, o 2 = o o = (o + v ) (o + v ) = o o + v o + o v + v v = o o + v v = o 2 + v 2 > o 2. Thus x p = min x v is the shortest distance from the vector x to V. v V 2.1. Least Squares Solution from Normal Equations. Recall from (1.1) that the Least Squares Solution x minimizes r(x) 2, where r(x) = b Ax for x R n. The dimension of span(a) is at most n, but if m > n, b generally does not lie in span(a), so there is no exact solution to the Least Squares Problem. In our next theorem, we will use Lemma 2.2 to see that Ax span(a) is closest to b when r = b Ax is orthogonal to span(a), giving rise to the system of Normal Equations A T Ax = A T b. Theorem 2.3. Let A be an m n matrix and b R m. Then ˆx is a Least Squares Solution of the system Ax = b if and only if it is a solution of the associated normal system A T Ax = A T b. Proof. Let x R n. Then Ax is an arbitrary vector in the column space of A, which we write as R(A). As a consequence of Lemma 2.2, r(x) = b Ax is minimum if Ax is the orthogonal projection of b onto R(A). Since R(A) = Null(A T ), ˆx is a Least Squares Solution if and only if A T r(ˆx) = A T (b Aˆx) = 0, which is equivalent to the system of Normal Equations A T Aˆx = A T b. For this solution to be unique, the matrix A needs to have full column rank: Theorem 2.4. Consider a system of linear equations Ax = b and the associated normal system A T Ax = A T b. Then the following conditions are equivalent: (1) The Least Squares Problem has a unique solution (2) The system Ax = 0 only has the zero solution (3) The columns of A are linearly independent. Proof. Let ˆx be the unique Least Squares Solution and x R n is such that A T Ax = 0. Then A T A(ˆx + x) = A T Aˆx = A T b. Thus ˆx + x = ˆx i.e., x = 0 since the normal system has a unique solution. Also, this means that A T Ax = 0 only has the trivial solution, so A T A is nonsingular (A T A is nonsingular if and only if A T Av 0 for all non-zero v R n ). Now it is enough to prove that (A T A) is invertible if and only if rank(a) = n. For v R n, we have v T (A T A)v = (v T A T )(Av) = (Av) T (Av) = Av 2 2, so A T Av 0

4 4 DO Q LEE if and only if Av 0. For 1 i n, denote the i-th component of v by v i and the i-th column of A by a i. Then Av is a linear combination of the columns of A, the coefficients of which are the components of v. It follows that Av 0 for all non-zero v if and only if the n columns of A are linearly independent, i.e., A is full-rank. 3. Norms and Conditioning This section will first justify the choice of the Euclidean 2-norm from our introduction in (1.1). Then we will see how this choice of norm is closely linked to analyzing the sensitivity and conditioning of the Least Squares Problem Norms: Quantifying Error and Distance. A norm is a function that assigns a positive length to all the vectors in a vector space. The most common norms on R n are (1) The Euclidean norm: x 2 = ( n x 2 ) 1/2, i=1 (2) The p-norm: x p = ( n x p ) 1/p for p 1, and i=1 (3) The Infinite norm: x = max 1 i n x i. Given so many choices of norm, it is natural to ask whether they are in any way comparable. It turns out that in finite dimensions, all norms are in fact equivalent to each other. The following theorem allows us to generalize Euclidean spaces to vector spaces of any finite dimension. Theorem 3.1. If S is a vector space of finite dimension, then all norms are equivalent on S i.e., for all pairs of norms n, m, there exist two constants c and C such that 0 < c C, and x S, we have c x n x m C x n Proof. It is enough to show that any two norms are equivalent on the unit sphere of a chosen norm, because for a general vector x R n, we can write x = γx 0, where γ = x n and x 0 is a vector on the unit sphere. We first prove that any norm is a continuous function: Suppose that x 0 R n. Then for every x R n, the triangle inequality gives x x 0 x x 0. Thus x x 0 < ɛ whenever x x 0 < ɛ. For simplicity, we work in the case n = 2. For the second inequality, let {e i } be the canonical basis in R n. Then write x = x i e i so that x m x i e i m xi 2 ei m 2 = C x 2 For the first inequality, by using continuity on the unit sphere, it can be shown that the function x x m has a minimum value c on the unit sphere. Now write x = x 2ˆx so that x m = x 2 ˆx m c x 2. With the equivalence of norms, we are now able to choose the 2-norm as our tool for the Least Squares Problem: solutions in the 2-norm are equivalent to solutions in any other norm.

5 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 5 The 2-norm is the most convenient one for our purposes because it is associated with an inner product. Once we have an inner product defined on a vector space, we can define both a norm and distance for the inner product space: Definition 3.2. Suppose that V is an inner product space. The norm or length of a vector u V is defined as u = u, u 1 2. The equivalence of norms also implies that any properties of accuracy and stability are independent of the norm chosen to evaluate them. Now that we have a means of quantifying distance and length, we have the tools for quantifying the error for our Least Squares Solutions Sensitivity and Conditioning: Perturbations to b. In general, a nonsquare m n matrix A has no inverse in the usual sense. But if A has full rank, a pseudoinverse can be defined as (3.3) A + = (A T A) 1 A T, and condition number κ(a) by (3.4) κ(a) = A A +. Combined with Theorem 2.3, the Least Squares Solution of Ax = b can be given by x = A + b. We will soon see that if κ(a) 1, small perturbations in A can lead to large errors in the solution. Systems of linear equations are sensitive if a small perturbation in the matrix A or in the right-hand side b causes a significant change in the solution x. Such systems are called ill-conditioned. The following is an example of an ill-conditioned matrix with respect to perturbations on b. Example 3.5. The linear system Ax = b, with 1 1 A =, b = ɛ [ 2 2], where 0 < ɛ 1 has the solution x = [ 2 0 ] T. We claim this system is illconditioned because small perturbations in b alter the solution significantly. If we compute the system Aˆx = ˆb, where 2 ˆb = 2 + ɛ we see this has the solution ˆx = [ 1 1 ] T, which is completely different from x. One way to determine sensitivity to perturbations of b is to examine the relationship between error and the residual. Consider the solution of the system Ax = b, with expected answer x and computed answer ˆx. We will write the error e and the residual r as e = x ˆx, r = b Aˆx = b ˆb. Since x may not be obtained immediately, the accuracy of the solution is often evaluated by looking at the residual r = b Aˆx = Ax Aˆx = Ae We take the norm of e to get a bound for the absolute error e 2 = x ˆx 2 = A + (b ˆb) 2 A + 2 b ˆb 2 = A + 2 r 2

6 6 DO Q LEE so e 2 A + 2 r 2. Using this we can derive a bound for the relative error e 2 / x 2 and r 2 / b 2 : Thus (3.6) e 2 A + 2 r 2 Ax 2 b 2 A + 2 A 2 x 2 r 2 b 2. e 2 x 2 κ(a) r 2 b 2, If κ(a) is large (i.e., the matrix is ill-conditioned), then relatively small perturbations of the right-hand side b (and therefore the residual) may lead to even larger errors. For well-conditioned problems (κ(a) 1) we can derive another useful bound: so that (3.7) r 2 x 2 = Ae 2 x 2 = Ae 2 A + b 2 A 2 e 2 A + 2 b 2 1 r 2 e 2. κ(a) b 2 x 2 Finally, combine (3.6) and (3.7) to obtain 1 r 2 x ˆx 2 κ(a) r 2. κ(a) b 2 x 2 b 2 These bounds are true for any A, but show that the residual is a good indicator of the error only if A is well-conditioned. The closer κ(a) is to 1, the smaller the bound can become. On the other hand, an ill-conditioned A would allow for large variations in the relative error Sensitivity to perturbations in A. Small perturbations on ill-conditioned matrices can lead to large changes in the solution. Example 3.8. For the linear system Ax = b, let 1 + ɛ 1 ɛ ɛ ɛ A =, A = 1 ɛ 1 + ɛ ɛ ɛ with 0 < ɛ 1. Then consider the perturbed matrix 1 1 Â = A + A =. 1 1 Â is singular, so Âˆx = b has no solution. Consider the linear system Ax = b, but now A is perturbed to Â = A + A. Denote by x the exact solution to Ax = b, and by ˆx the solution to the perturbed Least Squares Problem Âˆx = b. Now we can write ˆx = x + x so that so that Âˆx = (A + A)(x + x) = b Ax b + ( A)x + A( x) + ( A)( x) = 0 The term ( A)( x) is negligibly small compared to the other terms, so we get ( x) = A + ( A)x.

7 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 7 Taking norms leads to the following result: or x 2 A + 2 A 2 x 2 = A + 2 A 2 A 2 A 2 x 2 x ˆx 2 x 2 κ(a) A Â 2 A Normal Equations Method From Theorem 2.3, we have seen that finding the Least Squares Solution can be reduced to solving a system of Normal Equations A T Ax = A T b. The Normal Equations Method computes the solution to the Least Squares Problem by transforming the rectangular matrix A into triangular form Cholesky Factorization. If A has full column rank, then the following holds for A T A: (1) A T A is symmetric ((A T A) T = A T (A T ) T = A T A) (2) A T A is positive definite (x T A T Ax = (Ax) T Ax = Ax 2 2 > 0 if x 0). Thus, if an m n matrix A has rank n, then A T A is n n, symmetric, and positive definite. In this case it is favorable to use the Cholesky Factorization, which decomposes any positive definite symmetric matrix into two triangular matrices so that A T A can be expressed as A T A = LL T where L is an n n lower triangular matrix. See [2] for a detailed analysis Flop: Complexity of Numerical Algorithms. Triangular matrices are used extensively in numerical algorithms such as the Cholesky or the QR Factorization because triangular systems are one of the simplest systems to solve. By deriving the flop count for triangular system solving, this section introduces flop counting as a method of evaluating the performance of an algorithm. A flop is a floating point operation (+,,, /). In an n n unit lower triangular system Ly = b, each y k in y is obtained by writing k 1 y k = b k l kj y j, which requires k 1 multiplications and k 1 additions. Thus y requires n 2 n flops to compute. Since n is usually sufficiently large to ignore lower order terms, we say that an n-by-n forward substitution costs n 2 flops. When a linear system is solved by this algorithm, the arithmetic associated with solving triangular system is often dominated by the arithmetic required for the factorization. We only worry about the leading-order behaviors when counting flops; i.e., we assume m, n are large. Keeping this in mind, we will examine the Cholesky Factorization used for solving systems involving symmetric, positive definite matrices. j=1

8 8 DO Q LEE 4.3. Algorithm: Computing the Cholesky Factorization. For a matrix A, define A i:i,j:j to be the (i i + 1) (j j + 1) submatrix of A with upper left corner a ij and lower right corner a i j. Algorithm 4.1. R = A for k = 1 to m for j = k + 1 to m R j,j:m = R j,j:m R k,j:m R kj /R kk R k,k:m = R k,k:m / R kk The 4-th line dominates the operation count for this algorithm, so its flop count can be obtained by considering m m m k m 2(m j) 2 j k 2 m 3 /3 k=1 j=k+1 k=1 j=1 Thus we have the following algorithm for the Normal Equations Method: (1) Calculate C = A T A (C is symmetric, so mn 2 flops) (2) Cholesky Factorization C = LL T (n 3 /3 flops) (3) Calculate d = A T b (2mn flops) (4) Solve Lz = d by forward substitution (n 2 flops) (5) Solve L T x = z by back substitution (n 2 flops) This gives us the cost for large m, n : mn 2 + (1/3)n 3 flops 4.4. Shortcomings of Normal Equations. The Normal Equations Method is much quicker than other algorithms but is in general more unstable. Example 4.2. Consider the matrix 1 1 A = ɛ 0, 0 ɛ where 0 < ɛ 1. Then for very small ɛ, A T 1 + ɛ 2 1 A = ɛ 2 = which is singular. k=1 1 1, 1 1 Conditioning of the system is also worsened: κ(a T A) = [κ(a)] 2, so the Normal Equations have a relative sensitivity that is squared compared to the original Least Squares Problem Ax = b. Thus we can conclude that any numerical method using the Normal Equations will be unstable, since the rounding errors will correspond to (κ(a)) 2 instead of κ(a). 5. Orthogonal Methods - The QR Factorization The QR Factorization is an alternative approach that avoids the shortcomings of Normal Equations. For a matrix A R m n with full rank n, let A = [ a1 a 2 a n ]. We wish to produce a sequence of orthonormal vectors q1, q 2,... spanning the same space as the columns of A: (5.1) q 1,..., q j = a 1,..., a j

9 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 9 for j = 1, 2,..., n. This way, for k = 1,..., n, each a k can be expressed as a linear combination of q 1,..., q k. This is equivalent to the matrix form A = QR where Q R m n has orthonormal columns, and R R n n is upper triangular. To understand the factorization we will first introduce some special classes of matrices and their properties Orthogonal Matrices. A matrix Q R m n with m n, has orthonormal columns if all columns in Q are orthogonal to every other column and are normalized. When m = n so that Q is a square matrix, we can define an orthogonal matrix: Definition 5.2. A square matrix with orthonormal columns is referred to as an orthogonal matrix. For an orthogonal matrix Q, it holds that Q T Q = QQ T = I n. From this it is clear that Q 1 = Q T and if Q is an orthogonal matrix, then Q T is also orthogonal. Using these properties, we will prove the following lemma which shows that multiplying a vector by an orthogonal matrix preserves its Euclidean norm: Lemma 5.3. Multiplying a vector with an orthogonal matrix does not change its 2-norm. Thus if Q is orthogonal it holds that Qx 2 = x 2. Proof. Qx 2 2 = (Qx) T Qx = x T Q T Qx = x T x = x 2 2 Norm preservation implies no amplification of numerical error. The greatest advantage of using orthogonal transformations is in its numerical stability: if Q is orthogonal then κ(q) = 1. It is also clear that multiplying both sides of Least Squares Problem (1.1) by an orthogonal matrix does not change its solution Triangular Least Squares Problems. The upper triangular overdetermined Least Squares Problem can be rewritten as R b1 x =, O b 2 where R is an n n upper triangular partition, the entries of O are all zero, and b = T b 1 b 2 is partitioned similarly. Then the residual becomes r 2 2 = b 1 Rx b Although b does not depend on x, the first term b 1 Rx 2 2 can be minimized when x satisfies the n n triangular system Rx = b 1, which can be easily solved by back substitution. In this case, x is the Least Squares Solution with the minimum residual r 2 2 = b Recall that solving Least Squares Problems by Normal Equations squares the condition number, i.e., κ(a T A) = [κ(a)] 2. Combined with Lemma 5.3, we can conclude that the QR approach enhances numerical stability by avoiding this squaring effect.

10 10 DO Q LEE 5.3. The QR Factorization in Least Squares Problems. Theorem 5.4. Given the matrix A R m n and the right hand side b R m, the solution set of the Least Squares Problem is identical to the solution set of min x R n b Ax 2 Rx = Q T b where the m n matrix Q with orthonormal columns and the upper triangular n n matrix R, is a QR-factorization of A. Proof. Given m n matrix A with m > n, we need to find an m m orthogonal matrix Q such that R A = Q, O so that when applied to the Least Squares Problem, Ax = b becomes equivalent to (5.5) Q T R [ˆb1 Ax = x = = Q O] T b. ˆb2 where R is n n and upper triangular. If we partition Q as an m m orthogonal matrix Q = Q 1 Q 2, where Q1 is m n, then [ R A = Q = O] R Q 1 Q 2 = Q O 1 R, which is called the reduced QR Factorization of A. Then the solution to the Least Squares Problem Ax = b is given by solution to square system Q 1 T Ax = Rx = ˆb 1 = Q 1 T b So the minimum value of b Ax 2 2 is realized when Q T 1 b Rx 2 2 = 0, i.e. when Q T b Rx = 0 Although we have shown that QR is an efficient, accurate way to solve Least Squares, exhibiting the QR Factorization is quite a different matter. A standard method of obtaining a QR Factorization is via Gram-Schmidt orthogonalization. Although this can be implemented numerically, it turns out not to be as stable as QR via Householder reflectors, and thus we choose to focus on the latter method. Finally, the Least Squares Problem can be solved by the following algorithm: (1) QR Factorization of A (Householder or Gram-Schmidt 2mn 2 flops) (2) Form d = Q T b (2mn flops) (3) Solve Rx = d by back substitution (n 2 flops) This gives us the cost for large m, n : 2mn 2 flops 5.4. Calculating the QR-factorization - Householder Transformations. This section gives a rough sketch on how to calculate the QR-factorization in a way that is numerically stable. The main idea is to multiply the matrix A by a sequence of simple orthogonal matrices Q k in order to shape A into an upper triangular matrix Q n Q 2 Q 1. In the k-th step, the matrix Q k introduces zeroes below the diagonal in the k-th column while keeping the zeroes in previous rows. By the end of the n-th step, all the entries below the diagonal become zero, making Q T = Q n Q 2 Q 1 A = R upper triangular.

11 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 11 In order to construct the matrix Q T, choose each orthogonal matrix Q k such that I 0 Q k =, 0 F where I is a k 1 k 1 identity matrix and F is an m k + 1 m k + 1 matrix.the identity matrix in Q k allows us to preserve the zeroes already introduced in the first k 1 columns, while F introduces zeroes below the diagonal in the k-th column. Now let x R m k+1. We define F as the Household Reflector, which is chosen as F = I 2 vvt v T v, where v = x e 1 x. This achieves x 0 F x =. 0 = x e 1. After A has been reduced to upper triangular form Q T A = R, the orthogonal matrix Q is constructed by directly computing the formula Q T = Q n Q 1 Q = Q 1 Q n. As this process is repeated n times, each process works on smaller blocks of the matrix while deleting the entries below the diagonal in the next column. Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved. In order to achieve the operation count, we take advantage of the matrix-vector multiplications rather than the full use of matrix multiplication. In other words, we perform (I 2 vvt v T v )A = A A) 2v(vT v T v which consists of a matrix-vector multiplication and an outer product. Compared to the Cholesky Factorization, applying the Householder transformation this way is much cheaper because it requires only vector v, rather than the full matrix F. Repeating this process n times gives us the Householder algorithm: Algorithm 5.6. for k = 1 to n v = A k:m,k u k = v v e 1 u k = u k / u k A k:m,k:n = A k:m,k:n 2u k (u k T A k:m,k:n ) Most work is done by the last line in the loop: A k:m,k:n = A k:m,k:n 2u k (u k T A k:m,k:n ), which costs 4(m k + 1)(n k + 1) flops. Thus we have T otal flops n 4(m k + 1)(n k + 1) = 2mn 2 2n 3 /3 k=1

12 12 DO Q LEE 5.5. Rank Deficiency: Numerical Loss of Orthogonality. We have assumed so far that the matrix A has full rank. But if A is rank-deficient, the columns A would be linearly dependent, so there would be some column a j in A such that a j span{q 1,..., q j 1 } = span{a 1,..., a j 1 }. In this case, from 5.1 we can see that the QR Factorization will fail. Not only that, when a matrix is close to rankdeficient, it could lose its orthogonality due to numerical loss. Consider a 2 2 matrix A = It is easy to check that A has full rank. But with a 5-digit accuracy of the problem, after computing the QR Factorization we see that Q = is clearly not orthogonal. If such a Q is used to solve the Least Squares Problem, the system would be highly sensitive to perturbations. Nevertheless, a minimum norm solution can still be computed with the help of Singular Value Decomposition (SVD), which will be covered in the next section. 6. Singular Value Decomposition (SVD) If the QR Factorization used orthogonal transformations to reduce the Least Squares Problem to a triangular system, the Singular Value Decomposition uses orthogonal transformations to reduce the problem into a diagonal system. We first introduce the Singular Value Decomposition. Theorem 6.1. Let A be an arbitrary m n matrix with m n. Then A can be factorized as A = UΣV T, where U R m m and V R n n are orthogonal matrices, and Σ = diag(σ 1,..., σ n ), where σ 1... σ n 0. Proof. Let σ 1 = A 2. Choose v 1 R n and u 1 R m such that v 1 2 = 1, u 1 2 = σ 1 and u 1 = Av 1. Then normalize u 1 as u 1 = u 1 / u 1 2. To form the orthogonal matrices U 1 and V 1, extend v 1 and u 1 to some orthonormal bases {v i } and {u i } as the orthogonal columns of each matrix. Then we have (6.2) U T σ1 w 1 AV 1 = S = T. 0 B Then it suffices to prove that w = 0. Consider the inequality σ1 w T σ1 σ 2 0 B w 1 + w 2 = (σ w 2 ) 1/2 σ1 w 2 2 which gives us S 2 (σ w 2 ) 1/2. But since U 1 and V 1 are orthogonal, from (6.2) we have S 2 = σ 1, which makes w = 0. Now proceed by induction. If n = 1 or m = 1 the case is trivial. In other cases, the submatrix B has an SVD B = U 2 Σ 2 V 2 T by the induction hypothesis. Thus we obtain the SVD of A in the following form: T 1 0 σ A = U 1 V 0 U 2 0 Σ 2 0 V 1 T 2

13 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 13 The main idea behind the SVD is that any matrix can be transformed into a diagonal matrix if we choose the right orthogonal coordinate systems for its domain and range. So using the orthogonality of V we can write the decomposition in the form AV = UΣ The Minimum Norm Solution using SVD. If rank(a) < n, Theorem 2.4 tells us that the solution to the Least Squares Problem may not be unique, i.e., multiple vectors x give the minimum residual norm. Of course, rank deficiency should not happen in a well-formulated Least Squares Problem: the set of variables used to fit the data should be independent in the first place. In the case when A is rank-deficient, we can still compute the Least Squares Solution by selecting the solution ˆx that has the smallest norm among the solutions. We begin with the pseudoinverse A + from (3.3) redefined in terms of the SVD: A + = V Σ + U T where Σ + = diag(σ 1 1,..., σ 1 n ). Note that this pseudoinverse exists regardless of whether the matrix is square or has full rank. To see this, suppose A = UΣV T is the SVD of A R n. For rank-deficient matrices with rank(a) = r < n, we have Σ1 0 Σ =, Σ = diag(σ 1,..., σ r ), σ 1 σ r > 0, with U and V partitioned accordingly as U = [ U 1 ] [ U 2 and V = V1 ] V 2. The pseudoinverse A + is given by A + = V Σ + U T = V 1 Σ 1 1 U T 1, where Σ + 1 Σ1 0 = 0 0 Since Σ 1 is nonsingular, A + always exists, and it can be easily confirmed that AA + = A + A = I. Now write c and y as [ c = U T T ] [ U1 b c1 b = U T =, y = V T T ] V1 x y1 x = 2 b c 2 V T = 2 x y 2 Apply the orthogonality of U and the SVD of A to the Least Squares Problem: b Ax 2 2 = UU T b UΣV T x 2 2 = c1 Σ 1 y 1 2 = c c 1 Σ 1 y c so that we have b Ax 2 c 2 2 x R n. Equality holds when x = V y = [ Σ 1 V 1 V 1 c ] 1 2 = A + b + V y 2 y 2 2 for any y 2 R n r, making x the general solution of the Least Squares Problem. Note that the solution is unique when r = rank(a) = n, in which case V 1 = V. On the other hand, if A is rank-deficient (r < n), we write z as z = V y = [ V 1 V 2 ] [ Σ 1 1 c 1 y 2 2 ] = A + b + V 2 y 2 for some nonzero y 2. Let ˆx = A + b, i.e., the solution when y 2 = 0. Then z 2 2 = A + b y > x 2 2.

14 14 DO Q LEE so that the vector x = A + b = V 1 Σ 1 1 U T 1 b is the minimum norm solution to the Least Squares Problem. Thus the minimum norm solution to the Least Squares Problem can be obtained through the following algorithm: (1) Compute A = UΣV T = U 1 Σ 1 V T 1, the SVD of A (2) Compute U T b (3) Solve Σw = U T b for w (Diagonal System) (4) Let x = V w The power of the SVD lies in the fact that it always exists and can be computed stably. The computed SVD will be well-conditioned because orthogonal matrices preserve the 2-norm. Any perturbation in A will not be amplified by the SVD since δa 2 = δσ Computing the SVD of Matrix A. There are several ways to compute the SVD of the matrix A. First, the nonzero Singular Values of A can be computed by an eigenvalue computation for the normal matrix A T A. However, as we have seen in the conditioning of Normal Equations, this approach is numerically unstable. Alternatively, we can transform A to bidiagonal form using Householder reflections, and then transform this matrix into diagonal form using two sequences of orthogonal matrices. See [3] for an in-depth analysis. While there are other versions of this method, the work required for computing the SVD is typically evaluated to be 2mn n 3. [3] The computation of the SVD is an interesting subject in itself. Various alternative approaches are used in practice. Some methods emphasize speed (such as the divide-and-conquer methods), while others focus on the accuracy of small Singular Values (such as some QR implementations) [3]. But in any case, once it has been computed, the SVD can be a powerful tool in itself. 7. Comparison of Methods So far, we have seen three algorithms for solving the Least Squares Problem: (1) Normal equations by Cholesky Factorization costs mn 2 + n 3 /3 flops. It is the fastest method but at the same time numerically unstable. (2) QR Factorization costs 2mn 2 2n 3 /3 flops. It is more accurate and broadly applicable, but may fail when A is nearly rank-deficient (3) SVD costs 2mn n 3 flops. It is expensive to compute, but is numerically stable and can handle rank deficiency If m n, (1) and (2) require about same amount of work. If m n, the QR method requires about twice as much work as Normal Equations. But the error for the Normal Equations Method produces solutions whose relative error is proportional to [κ(a)] 2. The Householder QR method is more accurate and more widely used than the Normal Equations, but these advantages may not be worth the additional cost when the problem is well conditioned. The SVD is more stable and robust than the QR approach, but requires more computational work. If m n, then the work for QR and SVD are both dominated by the first term, 2mn 2, so the two methods cost about the same amount of work.

15 NUMERICALLY EFFICIENT METHODS FOR SOLVING LEAST SQUARES PROBLEMS 15 However, when m n the cost of the SVD is roughly 10 times that of the QRfactorization. 8. Acknowledgements It is a pleasure to thank my mentor Jessica Lin for her guidance and expertise on numerical analysis. Her time and efforts in guiding me throughout my research has been most helpful. Finally, I owe great thanks to the University of Chicago REU program for funding my research. References 1. Charles L. Lawson and Richard J. Hanson, Solving least squares problems, Prentice-Hall Inc., Englewood Cliffs, N.J., 1974, Prentice-Hall Series in Automatic Computation. MR (51 #2270) 2. Endre Süli and David F. Mayers, An introduction to numerical analysis, Cambridge University Press, Cambridge, MR Lloyd N. Trefethen and David Bau, III, Numerical linear algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, MR (98k:65002)

### Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Chapter 3 Linear Least Squares Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

### Linear Least Squares

Linear Least Squares Suppose we are given a set of data points {(x i,f i )}, i = 1,...,n. These could be measurements from an experiment or obtained simply by evaluating a function at some points. One

### Matrix Norms. Tom Lyche. September 28, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Matrix Norms Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 28, 2009 Matrix Norms We consider matrix norms on (C m,n, C). All results holds for

### CS3220 Lecture Notes: QR factorization and orthogonal transformations

CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss

### Using the Singular Value Decomposition

Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces

### Lecture 5 - Triangular Factorizations & Operation Counts

LU Factorization Lecture 5 - Triangular Factorizations & Operation Counts We have seen that the process of GE essentially factors a matrix A into LU Now we want to see how this factorization allows us

### Linear Systems. Singular and Nonsingular Matrices. Find x 1, x 2, x 3 such that the following three equations hold:

Linear Systems Example: Find x, x, x such that the following three equations hold: x + x + x = 4x + x + x = x + x + x = 6 We can write this using matrix-vector notation as 4 {{ A x x x {{ x = 6 {{ b General

### Numerical Linear Algebra Chap. 4: Perturbation and Regularisation

Numerical Linear Algebra Chap. 4: Perturbation and Regularisation Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Numerical Simulation TUHH Heinrich Voss Numerical Linear

### Factorization Theorems

Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### 7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

### 1 Orthogonal projections and the approximation

Math 1512 Fall 2010 Notes on least squares approximation Given n data points (x 1, y 1 ),..., (x n, y n ), we would like to find the line L, with an equation of the form y = mx + b, which is the best fit

### MATH36001 Background Material 2015

MATH3600 Background Material 205 Matrix Algebra Matrices and Vectors An ordered array of mn elements a ij (i =,, m; j =,, n) written in the form a a 2 a n A = a 2 a 22 a 2n a m a m2 a mn is said to be

### 9.3 Advanced Topics in Linear Algebra

548 93 Advanced Topics in Linear Algebra Diagonalization and Jordan s Theorem A system of differential equations x = Ax can be transformed to an uncoupled system y = diag(λ,, λ n y by a change of variables

### Solving Sets of Equations. 150 B.C.E., 九章算術 Carl Friedrich Gauss,

Solving Sets of Equations 5 B.C.E., 九章算術 Carl Friedrich Gauss, 777-855 Gaussian-Jordan Elimination In Gauss-Jordan elimination, matrix is reduced to diagonal rather than triangular form Row combinations

### 9. Numerical linear algebra background

Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

### Inner Product Spaces and Orthogonality

Inner Product Spaces and Orthogonality week 3-4 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,

### Section 6.1 - Inner Products and Norms

Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,

### Iterative Methods for Computing Eigenvalues and Eigenvectors

The Waterloo Mathematics Review 9 Iterative Methods for Computing Eigenvalues and Eigenvectors Maysum Panju University of Waterloo mhpanju@math.uwaterloo.ca Abstract: We examine some numerical iterative

### The Moore-Penrose Inverse and Least Squares

University of Puget Sound MATH 420: Advanced Topics in Linear Algebra The Moore-Penrose Inverse and Least Squares April 16, 2014 Creative Commons License c 2014 Permission is granted to others to copy,

### Summary of week 8 (Lectures 22, 23 and 24)

WEEK 8 Summary of week 8 (Lectures 22, 23 and 24) This week we completed our discussion of Chapter 5 of [VST] Recall that if V and W are inner product spaces then a linear map T : V W is called an isometry

### NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### 5. Orthogonal matrices

L Vandenberghe EE133A (Spring 2016) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

### Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001,

### 3 Orthogonal Vectors and Matrices

3 Orthogonal Vectors and Matrices The linear algebra portion of this course focuses on three matrix factorizations: QR factorization, singular valued decomposition (SVD), and LU factorization The first

### LINEAR ALGEBRA. September 23, 2010

LINEAR ALGEBRA September 3, 00 Contents 0. LU-decomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................

### MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix.

MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix. Inverse matrix Definition. Let A be an n n matrix. The inverse of A is an n n matrix, denoted

### Diagonal, Symmetric and Triangular Matrices

Contents 1 Diagonal, Symmetric Triangular Matrices 2 Diagonal Matrices 2.1 Products, Powers Inverses of Diagonal Matrices 2.1.1 Theorem (Powers of Matrices) 2.2 Multiplying Matrices on the Left Right by

### Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

### 6. Cholesky factorization

6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix

### α = u v. In other words, Orthogonal Projection

Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v

### Linear Systems COS 323

Linear Systems COS 323 Last time: Constrained Optimization Linear constrained optimization Linear programming (LP) Simplex method for LP General optimization With equality constraints: Lagrange multipliers

### MATH 3795 Lecture 7. Linear Least Squares.

MATH 3795 Lecture 7. Linear Least Squares. Dmitriy Leykekhman Fall 2008 Goals Basic properties of linear least squares problems. Normal equation. D. Leykekhman - MATH 3795 Introduction to Computational

### Orthogonal Projections

Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X,..., X k be a family of linearly independent (column) vectors

### THEORY OF SIMPLEX METHOD

Chapter THEORY OF SIMPLEX METHOD Mathematical Programming Problems A mathematical programming problem is an optimization problem of finding the values of the unknown variables x, x,, x n that maximize

### UNIT 2 MATRICES - I 2.0 INTRODUCTION. Structure

UNIT 2 MATRICES - I Matrices - I Structure 2.0 Introduction 2.1 Objectives 2.2 Matrices 2.3 Operation on Matrices 2.4 Invertible Matrices 2.5 Systems of Linear Equations 2.6 Answers to Check Your Progress

### Inverses. Stephen Boyd. EE103 Stanford University. October 27, 2015

Inverses Stephen Boyd EE103 Stanford University October 27, 2015 Outline Left and right inverses Inverse Solving linear equations Examples Pseudo-inverse Left and right inverses 2 Left inverses a number

### Lecture 5: Singular Value Decomposition SVD (1)

EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system

### DETERMINANTS. b 2. x 2

DETERMINANTS 1 Systems of two equations in two unknowns A system of two equations in two unknowns has the form a 11 x 1 + a 12 x 2 = b 1 a 21 x 1 + a 22 x 2 = b 2 This can be written more concisely in

### Matrix Inverse and Determinants

DM554 Linear and Integer Programming Lecture 5 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1 2 3 4 and Cramer s rule 2 Outline 1 2 3 4 and

### Inner product. Definition of inner product

Math 20F Linear Algebra Lecture 25 1 Inner product Review: Definition of inner product. Slide 1 Norm and distance. Orthogonal vectors. Orthogonal complement. Orthogonal basis. Definition of inner product

### 4. Matrix inverses. left and right inverse. linear independence. nonsingular matrices. matrices with linearly independent columns

L. Vandenberghe EE133A (Spring 2016) 4. Matrix inverses left and right inverse linear independence nonsingular matrices matrices with linearly independent columns matrices with linearly independent rows

### Determinants. Dr. Doreen De Leon Math 152, Fall 2015

Determinants Dr. Doreen De Leon Math 52, Fall 205 Determinant of a Matrix Elementary Matrices We will first discuss matrices that can be used to produce an elementary row operation on a given matrix A.

### Vector Spaces II: Finite Dimensional Linear Algebra 1

John Nachbar September 2, 2014 Vector Spaces II: Finite Dimensional Linear Algebra 1 1 Definitions and Basic Theorems. For basic properties and notation for R N, see the notes Vector Spaces I. Definition

### NOTES on LINEAR ALGEBRA 1

School of Economics, Management and Statistics University of Bologna Academic Year 205/6 NOTES on LINEAR ALGEBRA for the students of Stats and Maths This is a modified version of the notes by Prof Laura

### Systems of Linear Equations

Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

### Vector and Matrix Norms

Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty

### (57) (58) (59) (60) (61)

Module 4 : Solving Linear Algebraic Equations Section 5 : Iterative Solution Techniques 5 Iterative Solution Techniques By this approach, we start with some initial guess solution, say for solution and

### Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20

Matrix algebra January 20 Introduction Basics The mathematics of multiple regression revolves around ordering and keeping track of large arrays of numbers and solving systems of equations The mathematical

### Lecture 6. Inverse of Matrix

Lecture 6 Inverse of Matrix Recall that any linear system can be written as a matrix equation In one dimension case, ie, A is 1 1, then can be easily solved as A x b Ax b x b A 1 A b A 1 b provided that

### Cofactor Expansion: Cramer s Rule

Cofactor Expansion: Cramer s Rule MATH 322, Linear Algebra I J. Robert Buchanan Department of Mathematics Spring 2015 Introduction Today we will focus on developing: an efficient method for calculating

### Topic 1: Matrices and Systems of Linear Equations.

Topic 1: Matrices and Systems of Linear Equations Let us start with a review of some linear algebra concepts we have already learned, such as matrices, determinants, etc Also, we shall review the method

### 1 Introduction to Matrices

1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)

### Practical Numerical Training UKNum

Practical Numerical Training UKNum 7: Systems of linear equations C. Mordasini Max Planck Institute for Astronomy, Heidelberg Program: 1) Introduction 2) Gauss Elimination 3) Gauss with Pivoting 4) Determinants

### 7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization EE103 (Fall 2011-12) factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization algorithm effect of rounding error sparse

### Linear Dependence Tests

Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks

Facts About Eigenvalues By Dr David Butler Definitions Suppose A is an n n matrix An eigenvalue of A is a number λ such that Av = λv for some nonzero vector v An eigenvector of A is a nonzero vector v

### Orthogonal Diagonalization of Symmetric Matrices

MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding

### Direct Methods for Solving Linear Systems. Linear Systems of Equations

Direct Methods for Solving Linear Systems Linear Systems of Equations Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University

### Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,

### 2.5 Gaussian Elimination

page 150 150 CHAPTER 2 Matrices and Systems of Linear Equations 37 10 the linear algebra package of Maple, the three elementary 20 23 1 row operations are 12 1 swaprow(a,i,j): permute rows i and j 3 3

### CONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation

Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in

### (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular.

Theorem.7.: (Properties of Triangular Matrices) (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular. (b) The product

### Using determinants, it is possible to express the solution to a system of equations whose coefficient matrix is invertible:

Cramer s Rule and the Adjugate Using determinants, it is possible to express the solution to a system of equations whose coefficient matrix is invertible: Theorem [Cramer s Rule] If A is an invertible

### Regularized Matrix Computations

Regularized Matrix Computations Andrew E. Yagle Department of EECS, The University of Michigan, Ann Arbor, MI 489- Abstract We review the basic results on: () the singular value decomposition (SVD); ()

### Error Analysis of Elimination Methods for Equality Constrained Quadratic Programming Problems

Draft of the paper in: Numerical Methods and Error Bounds (G. Alefeld, J. Herzberger eds.), Akademie Verlag, Berlin, 996, 07-2. ISBN: 3-05-50696-3 Series: Mathematical Research, Vol. 89. (Proceedings of

### Solution based on matrix technique Rewrite. ) = 8x 2 1 4x 1x 2 + 5x x1 2x 2 2x 1 + 5x 2

8.2 Quadratic Forms Example 1 Consider the function q(x 1, x 2 ) = 8x 2 1 4x 1x 2 + 5x 2 2 Determine whether q(0, 0) is the global minimum. Solution based on matrix technique Rewrite q( x1 x 2 = x1 ) =

### 1.5 Elementary Matrices and a Method for Finding the Inverse

.5 Elementary Matrices and a Method for Finding the Inverse Definition A n n matrix is called an elementary matrix if it can be obtained from I n by performing a single elementary row operation Reminder:

### Definition A square matrix M is invertible (or nonsingular) if there exists a matrix M 1 such that

0. Inverse Matrix Definition A square matrix M is invertible (or nonsingular) if there exists a matrix M such that M M = I = M M. Inverse of a 2 2 Matrix Let M and N be the matrices: a b d b M =, N = c

### Direct Methods for Solving Linear Systems. Matrix Factorization

Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

### Linear Algebra Review. Vectors

Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length

### Linear Algebra: Determinants, Inverses, Rank

D Linear Algebra: Determinants, Inverses, Rank D 1 Appendix D: LINEAR ALGEBRA: DETERMINANTS, INVERSES, RANK TABLE OF CONTENTS Page D.1. Introduction D 3 D.2. Determinants D 3 D.2.1. Some Properties of

### Additional Topics in Linear Algebra Supplementary Material for Math 540. Joseph H. Silverman

Additional Topics in Linear Algebra Supplementary Material for Math 540 Joseph H Silverman E-mail address: jhs@mathbrownedu Mathematics Department, Box 1917 Brown University, Providence, RI 02912 USA Contents

### Solution of Linear Systems

Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

### 3. INNER PRODUCT SPACES

. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

### Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Peter J. Olver 4. Gaussian Elimination In this part, our focus will be on the most basic method for solving linear algebraic systems, known as Gaussian Elimination in honor

### 1 VECTOR SPACES AND SUBSPACES

1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such

### Applied Linear Algebra I Review page 1

Applied Linear Algebra Review 1 I. Determinants A. Definition of a determinant 1. Using sum a. Permutations i. Sign of a permutation ii. Cycle 2. Uniqueness of the determinant function in terms of properties

### 13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.

3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in three-space, we write a vector in terms

### Solving Linear Systems of Equations. Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.

Solving Linear Systems of Equations Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.edu These slides are a supplement to the book Numerical Methods with Matlab:

### Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

### Chapter 8. Matrices II: inverses. 8.1 What is an inverse?

Chapter 8 Matrices II: inverses We have learnt how to add subtract and multiply matrices but we have not defined division. The reason is that in general it cannot always be defined. In this chapter, we

### 2.1: MATRIX OPERATIONS

.: MATRIX OPERATIONS What are diagonal entries and the main diagonal of a matrix? What is a diagonal matrix? When are matrices equal? Scalar Multiplication 45 Matrix Addition Theorem (pg 0) Let A, B, and

### Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,

### 4. MATRICES Matrices

4. MATRICES 170 4. Matrices 4.1. Definitions. Definition 4.1.1. A matrix is a rectangular array of numbers. A matrix with m rows and n columns is said to have dimension m n and may be represented as follows:

### Introduction to Matrix Algebra I

Appendix A Introduction to Matrix Algebra I Today we will begin the course with a discussion of matrix algebra. Why are we studying this? We will use matrix algebra to derive the linear regression model

### Matrix Algebra 2.3 CHARACTERIZATIONS OF INVERTIBLE MATRICES Pearson Education, Inc.

2 Matrix Algebra 2.3 CHARACTERIZATIONS OF INVERTIBLE MATRICES Theorem 8: Let A be a square matrix. Then the following statements are equivalent. That is, for a given A, the statements are either all true

### MATH 511 ADVANCED LINEAR ALGEBRA SPRING 2006

MATH 511 ADVANCED LINEAR ALGEBRA SPRING 26 Sherod Eubanks HOMEWORK 1 1.1 : 3, 5 1.2 : 4 1.3 : 4, 6, 12, 13, 16 1.4 : 1, 5, 8 Section 1.1: The Eigenvalue-Eigenvector Equation Problem 3 Let A M n (R). If

### CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

### Chapter 6. Orthogonality

6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

### Linear Algebraic Equations, SVD, and the Pseudo-Inverse

Linear Algebraic Equations, SVD, and the Pseudo-Inverse Philip N. Sabes October, 21 1 A Little Background 1.1 Singular values and matrix inversion For non-smmetric matrices, the eigenvalues and singular