SOLVING A SYSTEM OF LINEAR EQUATIONS


1 Introduction

In the previous chapter, we determined the value x that satisfies a single equation, f(x) = 0. Now we deal with the case of determining the values x_1, x_2, ..., x_n that simultaneously satisfy a set of linear equations of the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= b_n
\end{aligned}$$

For small numbers of equations (n ≤ 3), linear equations can be solved readily by simple techniques. For four or more equations, however, hand solutions become tedious and computers must be utilized.

The system of equations introduced above can be represented in the compact form

$$[A]X = B$$

where [A] is the n × n matrix of coefficients,

$$[A] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}$$

B is the n × 1 column vector of constants, and X is the n × 1 column vector of unknowns:

$$B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

1.1 Examples

An example of an engineering problem that requires the solution of a system of linear equations:

1. Referring to Fig.1 and using Kirchhoff's law, the currents i_1, i_2, i_3, and i_4 can be determined by solving the following system of four equations:

$$\begin{aligned}
9i_1 - 4i_2 - 2i_3 &= 24 \\
-4i_1 + 17i_2 - 6i_3 - 3i_4 &= -16 \\
-2i_1 - 6i_2 + 14i_3 - 6i_4 &= 0 \\
-3i_2 - 6i_3 + 11i_4 &= 18
\end{aligned}$$

Fig.1

1.2 Overview of numerical methods for solving a system of linear algebraic equations

Two types of numerical methods, direct and iterative, are used for solving systems of linear algebraic equations. In direct methods, the solution is calculated by performing arithmetic operations with the equations. In iterative methods, an initial approximate solution is assumed and then used in an iterative process to obtain successively more accurate solutions.

1.2.1 Direct methods

In direct methods, the system of equations that is initially given in the general form is manipulated into an equivalent system of equations that can be easily solved. Three forms of a system that can be easily solved are the upper triangular, lower triangular, and diagonal forms.

The upper triangular form is shown in Fig.2. A system in this form has all zero coefficients below the diagonal and is solved by a procedure called back substitution. It starts with the last equation, which is solved for x_n. The value of x_n is then substituted into the next-to-last equation, which is solved for x_{n-1}. The process continues in the same manner all the way up to the first equation:

Fig.2

$$x_n = \frac{b_n}{a_{nn}}, \qquad x_i = \frac{b_i - \sum_{j=i+1}^{n} a_{ij}x_j}{a_{ii}}, \quad i = n-1, n-2, \dots, 1$$

The lower triangular form is shown in Fig.3. A system in this form has all zero coefficients above the diagonal. It is solved in the same way as the upper triangular form but in the opposite order, by a procedure called forward substitution. It starts with the first equation, which is solved for x_1. The value of x_1 is then substituted into the second equation, which is solved for x_2. The process continues in the same manner all the way down to the last equation:

Fig.3

$$x_1 = \frac{b_1}{a_{11}}, \qquad x_i = \frac{b_i - \sum_{j=1}^{i-1} a_{ij}x_j}{a_{ii}}, \quad i = 2, 3, \dots, n$$
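Both substitution procedures translate almost directly into code. The following MATLAB sketch (the function names are our own) implements the two formulas; it assumes the diagonal elements a_{ii} are nonzero.

    function x = BackSub(A, b)
    % Back substitution for an upper triangular system A*x = b.
    n = length(b);
    x = zeros(n, 1);
    x(n) = b(n) / A(n, n);
    for i = n-1:-1:1
        % Subtract the contributions of the already-known unknowns,
        % then divide by the diagonal element.
        x(i) = (b(i) - A(i, i+1:n) * x(i+1:n)) / A(i, i);
    end
    end

    function x = ForwardSub(A, b)
    % Forward substitution for a lower triangular system A*x = b.
    n = length(b);
    x = zeros(n, 1);
    x(1) = b(1) / A(1, 1);
    for i = 2:n
        x(i) = (b(i) - A(i, 1:i-1) * x(1:i-1)) / A(i, i);
    end
    end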

The diagonal form of a system of linear equations is shown in Fig.4. A system in diagonal form has nonzero coefficients along the diagonal and zeros everywhere else. Obviously, a system in this form is solved directly:

Fig.4

$$x_i = \frac{b_i}{a_{ii}}$$

Three direct methods for solving systems of equations are described in this chapter: Gauss elimination, Gauss-Jordan elimination, and LU decomposition.

1.2.2 Indirect methods

Two indirect (iterative) methods, the Jacobi and Gauss-Seidel methods, are described in this chapter.

2 Direct Methods

2.1 Naive Gauss elimination method

This section presents the systematic techniques of forward elimination and back substitution that make up Gauss elimination. Although these techniques are ideally suited for implementation on computers, some modifications are required to obtain a reliable algorithm. In particular, the computer program must avoid division by zero. The following method is called naive Gauss elimination because it does not avoid this problem. Subsequent sections deal with the additional features required for an effective computer program.

The approach is designed to solve a general set of n equations:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= b_n
\end{aligned}$$

The technique consists of two phases: elimination of unknowns and solution through back substitution.

2.1.1 Forward elimination of unknowns

The first phase is designed to reduce the set of equations to an upper triangular system. The initial step is to eliminate the first unknown, x_1, from the second through the n-th equations. To do this, multiply the first row of the system by a_{21}/a_{11} to give

$$a_{21}x_1 + \frac{a_{21}}{a_{11}}a_{12}x_2 + \dots + \frac{a_{21}}{a_{11}}a_{1n}x_n = \frac{a_{21}}{a_{11}}b_1$$

Now this equation can be subtracted from the second row to give

$$\left(a_{22} - \frac{a_{21}}{a_{11}}a_{12}\right)x_2 + \dots + \left(a_{2n} - \frac{a_{21}}{a_{11}}a_{1n}\right)x_n = b_2 - \frac{a_{21}}{a_{11}}b_1$$

or

$$a'_{22}x_2 + \dots + a'_{2n}x_n = b'_2$$

where the prime indicates that the elements have been changed from their original values. The procedure is then repeated for the remaining equations. For instance, the first row can be multiplied by a_{31}/a_{11} and the result subtracted from the third equation. Repeating the procedure for the remaining equations results in the following modified system:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
a'_{22}x_2 + \dots + a'_{2n}x_n &= b'_2 \\
&\;\;\vdots \\
a'_{n2}x_2 + \dots + a'_{nn}x_n &= b'_n
\end{aligned}$$

For the foregoing steps, row one is called the pivot equation and a_{11} is called the pivot coefficient or pivot element. Note that the process of multiplying the first row by a_{21}/a_{11} is equivalent to dividing it by a_{11} and multiplying it by a_{21}. Sometimes the division operation is referred to as normalization. We make this distinction because a zero pivot element can interfere with normalization by causing a division by zero. We will return to this important issue after we complete our description of naive Gauss elimination.

Now repeat the above to eliminate the second unknown from rows 3 through n of the last set of equations. To do this, multiply the second row by a'_{32}/a'_{22} and subtract the result from the third equation. Perform a similar elimination for the remaining equations to yield

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1 \\
a'_{22}x_2 + a'_{23}x_3 + \dots + a'_{2n}x_n &= b'_2 \\
a''_{33}x_3 + \dots + a''_{3n}x_n &= b''_3 \\
&\;\;\vdots \\
a''_{n3}x_3 + \dots + a''_{nn}x_n &= b''_n
\end{aligned}$$

where the double prime indicates that the elements have been modified twice.

The procedure can be continued using the remaining pivot equations. The final manipulation in the sequence is to use the (n-1)-th equation to eliminate the x_{n-1} term from the n-th equation. At this point, the system will have been transformed into an upper triangular system:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n &= b_1 \\
a'_{22}x_2 + a'_{23}x_3 + \dots + a'_{2n}x_n &= b'_2 \\
a''_{33}x_3 + \dots + a''_{3n}x_n &= b''_3 \\
&\;\;\vdots \\
a^{(n-1)}_{nn}x_n &= b^{(n-1)}_n
\end{aligned}$$

2.1.2 Back substitution

$$x_n = \frac{b^{(n-1)}_n}{a^{(n-1)}_{nn}}, \qquad x_i = \frac{b^{(i-1)}_i - \sum_{j=i+1}^{n} a^{(i-1)}_{ij}x_j}{a^{(i-1)}_{ii}}, \quad i = n-1, n-2, \dots, 1$$

Example: Consider solving the following system using naive Gauss elimination:

$$\begin{aligned}
3x_1 - 2x_2 + 5x_3 &= 14 \\
x_1 - x_2 &= -1 \\
2x_1 + 4x_3 &= 14
\end{aligned}$$
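The complete method fits in a short function. The sketch below (our own code) implements both phases with no pivoting, so it assumes every pivot element it encounters is nonzero; applied to the example system above it returns x_1 = 1, x_2 = 2, x_3 = 3.

    function x = NaiveGauss(A, b)
    % Naive Gauss elimination: forward elimination followed by
    % back substitution. No pivoting, so every A(k,k) must be nonzero.
    n = length(b);
    for k = 1:n-1                      % pivot row k
        for i = k+1:n                  % eliminate x_k from the rows below
            m = A(i, k) / A(k, k);     % multiplier m_ik
            A(i, k:n) = A(i, k:n) - m * A(k, k:n);
            b(i) = b(i) - m * b(k);
        end
    end
    % Back substitution on the resulting upper triangular system.
    x = zeros(n, 1);
    x(n) = b(n) / A(n, n);
    for i = n-1:-1:1
        x(i) = (b(i) - A(i, i+1:n) * x(i+1:n)) / A(i, i);
    end
    end

For the example: x = NaiveGauss([3 -2 5; 1 -1 0; 2 0 4], [14; -1; 14]).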

2.1.3 Potential difficulties when applying the Gauss elimination method

Although many systems of equations can be solved with naive Gauss elimination, there are some pitfalls that must be explored before writing a general computer program to implement the method.

Division by zero. The primary reason the foregoing technique is called naive is that during both the elimination and the back-substitution phases it is possible for a division by zero to occur. For example, if we use naive Gauss elimination to solve

$$\begin{aligned}
2x_2 + 3x_3 &= 8 \\
4x_1 + 6x_2 + 7x_3 &= -3 \\
2x_1 + x_2 + 6x_3 &= 5
\end{aligned}$$

the normalization of the first row would involve division by a_{11} = 0. Problems can also arise when a coefficient is very close to zero, due to round-off errors. The technique of pivoting has been developed to partially avoid these problems. It will be described in the next section.

For example, consider using Gauss elimination to solve

$$\begin{aligned}
0.0003x_1 + 3.0000x_2 &= 2.0001 \\
1.0000x_1 + 1.0000x_2 &= 1.0000
\end{aligned}$$

Note that in this form the first pivot element, a_{11} = 0.0003, is very close to zero. The exact solution is x_1 = 1/3 and x_2 = 2/3. Multiplying the first equation by 1/0.0003 yields

$$x_1 + 10000x_2 = 6667$$

which can be used to eliminate x_1 from the second equation:

$$-9999x_2 = -6666$$

which can be solved for x_2 = 2/3. This result can be substituted back into the first equation to evaluate x_1:

$$x_1 = \frac{2.0001 - 3(2/3)}{0.0003}$$

However, due to subtractive cancellation, the result is very sensitive to the number of significant figures carried in the computation:

    Significant figures    x_2          x_1
    3                      0.667       -3.33
    4                      0.6667       0.0000
    5                      0.66667      0.30000
    6                      0.666667     0.330000
    7                      0.6666667    0.3330000

Note how the solution for x_1 is highly dependent on the number of significant figures. On the other hand, if the equations are solved in reverse order, the row with the larger pivot element is normalized. The equations are

$$\begin{aligned}
1.0000x_1 + 1.0000x_2 &= 1.0000 \\
0.0003x_1 + 3.0000x_2 &= 2.0001
\end{aligned}$$

Elimination and substitution again yield x_2 = 2/3. For different numbers of significant figures, x_1 can be computed from the first equation, as in

$$x_1 = \frac{1 - (2/3)}{1}$$

This case is much less sensitive to the number of significant figures in the computation:

    Significant figures    x_2          x_1
    3                      0.667        0.333
    4                      0.6667       0.3333
    5                      0.66667      0.33333
    6                      0.666667     0.333333
    7                      0.6666667    0.3333333

Gauss elimination with pivoting

As mentioned earlier, obvious problems occur when a pivot element is zero, because the normalization step then leads to division by zero. Problems may also arise when the pivot element is close to, rather than exactly equal to, zero: if the magnitude of the pivot element is small compared to the other elements, round-off errors can be introduced. Therefore, before each row is normalized, it is advantageous to determine the largest available coefficient (in absolute value) in the column below the pivot element. The rows can then be switched so that the largest element becomes the pivot element.

2.2 Gauss-Jordan elimination method

The Gauss-Jordan method is a variation of Gauss elimination. The major difference is that when an unknown is eliminated in the Gauss-Jordan method, it is eliminated from all other equations rather than just the subsequent ones. In addition, all rows are normalized by dividing them by their pivot elements. Thus, the elimination step results in an identity matrix rather than a triangular matrix, as shown in Fig.5. Consequently, it is not necessary to employ back substitution to obtain the solution.

Fig.5

Gauss-Jordan elimination with pivoting

It is possible that the equations are written in such an order that during the elimination procedure a pivot equation has a pivot element equal to zero. Obviously, in this case it is impossible to normalize the pivot row (divide by the pivot element). As with the Gauss elimination method, the problem can be corrected by pivoting.

Although the Gauss-Jordan technique and Gauss elimination might appear almost identical, the former requires approximately 50 percent more operations than Gauss elimination. Therefore, Gauss elimination is the simple elimination method of preference for obtaining solutions of linear algebraic equations. One of the primary reasons for introducing the Gauss-Jordan method, however, is that it is still used in engineering as well as in some numerical algorithms.

Example: Consider solving the following system using Gauss-Jordan elimination:

$$\begin{aligned}
3x_1 - 2x_2 + 5x_3 &= 14 \\
x_1 - x_2 &= -1 \\
2x_1 + 4x_3 &= 14
\end{aligned}$$
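In code, partial pivoting adds only a row swap to the elimination loop. A sketch (our own code, same structure as NaiveGauss above): before eliminating in column k, the row with the largest magnitude coefficient on or below the diagonal is swapped into the pivot position. On the near-zero-pivot example of Section 2.1.3 this automatically normalizes the row with the larger pivot element.

    function x = GaussPivot(A, b)
    % Gauss elimination with partial pivoting.
    n = length(b);
    for k = 1:n-1
        % Find the largest magnitude coefficient in column k, rows k..n,
        % and swap that row into the pivot position.
        [~, p] = max(abs(A(k:n, k)));
        p = p + k - 1;                 % convert to a row index of A
        if p ~= k
            A([k p], :) = A([p k], :);
            b([k p]) = b([p k]);
        end
        for i = k+1:n
            m = A(i, k) / A(k, k);
            A(i, k:n) = A(i, k:n) - m * A(k, k:n);
            b(i) = b(i) - m * b(k);
        end
    end
    x = zeros(n, 1);
    x(n) = b(n) / A(n, n);
    for i = n-1:-1:1
        x(i) = (b(i) - A(i, i+1:n) * x(i+1:n)) / A(i, i);
    end
    end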

2.3 LU decomposition method

As described in the previous sections, Gauss elimination is designed to solve systems of linear algebraic equations,

$$[A]X = B$$

Although it certainly represents a sound way to solve such systems, it becomes inefficient when solving equations with the same coefficient matrix [A] but different right-hand-side constants (the b's). Recall that Gauss elimination involves two steps: forward elimination and back substitution. Of these, the forward-elimination step accounts for the bulk of the computational effort, particularly for large systems of equations. LU decomposition methods separate the time-consuming elimination of the matrix [A] from the manipulations of the right-hand side B. Thus, once [A] has been decomposed, multiple right-hand-side vectors can be evaluated in an efficient manner. Before showing how this can be done, let us first provide a mathematical overview of the decomposition strategy.

2.3.1 Overview of the LU decomposition

A two-step strategy (see Fig.6) for obtaining solutions can be explained as follows:

LU decomposition step. [A] is factored, or decomposed, into lower [L] and upper [U] triangular matrices:

$$[A] = [L][U]$$

Substitution step. [L] and [U] are used to determine a solution X for a right-hand side B. This step itself consists of two parts. First, an intermediate vector D is generated by forward substitution from

$$[L]D = B$$

Then the result is substituted into

$$[U]X = D$$

which is solved for X by back substitution. These two triangular systems are together equivalent to the original one, since $[L][U]X = [L]D = B$.

The forward-substitution step can be represented concisely as

$$d_1 = \frac{b_1}{l_{11}}, \qquad d_i = \frac{b_i - \sum_{j=1}^{i-1} l_{ij}d_j}{l_{ii}}, \quad i = 2, 3, \dots, n$$

and the back-substitution step as

$$x_n = \frac{d_n}{u_{nn}}, \qquad x_i = \frac{d_i - \sum_{j=i+1}^{n} u_{ij}x_j}{u_{ii}}, \quad i = n-1, n-2, \dots, 1$$
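In code, the substitution step is just the two triangular solves sketched in Section 1.2.1. Once [L] and [U] are available, each new right-hand side b costs only these two lines (using our own ForwardSub and BackSub functions from above):

    d = ForwardSub(L, b);   % solve [L]D = B by forward substitution
    x = BackSub(U, d);      % solve [U]X = D by back substitution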

Fig.6

For a given matrix, several methods can be used to determine the corresponding [L] and [U]. Two of the methods, one related to the Gauss elimination method and another called Crout's method, are described next.

2.3.2 LU decomposition using the Gauss elimination procedure

When the Gauss elimination procedure is applied to a matrix, the elements of the matrices [L] and [U] are actually calculated. The upper triangular matrix [U] is the matrix of coefficients [A] that is obtained at the end of the Gauss elimination procedure. For the lower triangular matrix [L], the elements on the diagonal are all 1, and the elements below the diagonal are the multipliers m_{ij} that multiply the pivot equation when it is used to eliminate the elements below the pivot coefficient. For the case of a system of three equations, the decomposition has the form

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix}$$

where m_{21} = a_{21}/a_{11}, m_{31} = a_{31}/a_{11}, and m_{32} = a'_{32}/a'_{22}.

2.3.3 LU decomposition using Crout's method

In this method the matrix is decomposed into the product [L][U], where the diagonal elements of the matrix [U] are all 1s. It turns out that in this case the elements of both matrices can be determined using formulas that can be easily programmed. For example, in the case of a system of four equations,

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 & 0 \\ L_{21} & L_{22} & 0 & 0 \\ L_{31} & L_{32} & L_{33} & 0 \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix} \begin{bmatrix} 1 & U_{12} & U_{13} & U_{14} \\ 0 & 1 & U_{23} & U_{24} \\ 0 & 0 & 1 & U_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Executing the matrix multiplication on the right-hand side of the equation gives

$$[A] = \begin{bmatrix} L_{11} & L_{11}U_{12} & L_{11}U_{13} & L_{11}U_{14} \\ L_{21} & L_{21}U_{12}+L_{22} & L_{21}U_{13}+L_{22}U_{23} & L_{21}U_{14}+L_{22}U_{24} \\ L_{31} & L_{31}U_{12}+L_{32} & L_{31}U_{13}+L_{32}U_{23}+L_{33} & L_{31}U_{14}+L_{32}U_{24}+L_{33}U_{34} \\ L_{41} & L_{41}U_{12}+L_{42} & L_{41}U_{13}+L_{42}U_{23}+L_{43} & L_{41}U_{14}+L_{42}U_{24}+L_{43}U_{34}+L_{44} \end{bmatrix}$$

The elements of the matrices [L] and [U] can be determined by solving this equation. The solution is obtained by equating the corresponding elements of the matrices on both sides of the equation.

One can observe that the elements of the matrices [L] and [U] can be easily determined row after row from the known elements of [A] and the elements of [L] and [U] that have already been calculated. Starting with the first row, the value of L_{11} is calculated from L_{11} = a_{11}. Once L_{11} is known, the values of U_{12}, U_{13}, and U_{14} are calculated by

$$U_{12} = a_{12}/L_{11}, \qquad U_{13} = a_{13}/L_{11}, \qquad U_{14} = a_{14}/L_{11}$$

Moving on to the next row, the next elements can be calculated in a similar manner. A procedure for determining the elements of the matrices [L] and [U] can be written as follows. If [A] is an n × n matrix, the elements of [L] and [U] are given by:

Step 1: Calculate the first column of [L]: for i = 1, 2, ..., n,

$$L_{i1} = a_{i1}$$

Step 2: Substitute 1s on the diagonal of [U]: for i = 1, 2, ..., n,

$$U_{ii} = 1$$

Step 3: Calculate the elements in the first row of [U] (except U_{11}, which was already set in Step 2): for j = 2, 3, ..., n,

$$U_{1j} = \frac{a_{1j}}{L_{11}}$$

Step 4: Calculate the rest of the elements row after row (i is the row number and j is the column number). The elements of [L] are calculated first, because they are used for calculating the elements of [U]: for i = 2, 3, ..., n,

$$L_{ij} = a_{ij} - \sum_{k=1}^{j-1} L_{ik}U_{kj}, \quad j = 2, 3, \dots, i$$

$$U_{ij} = \frac{a_{ij} - \sum_{k=1}^{i-1} L_{ik}U_{kj}}{L_{ii}}, \quad j = i+1, i+2, \dots, n$$

(A code sketch of this procedure is given after the next subsection.)

2.3.4 LU decomposition with pivoting

Decomposition of a matrix [A] into the matrices [L] and [U] means that [A] = [L][U]. In the presentation of the Gauss and Crout decomposition methods in the previous two subsections, it was assumed that the calculations could be carried out without pivoting. In reality, as was discussed before, pivoting may be required for a successful execution of the Gauss elimination procedure. Pivoting might also be needed with Crout's method. If pivoting is used, then the matrices [L] and [U] that are obtained are not the decomposition of the original matrix [A]. The product [L][U] gives a matrix whose rows have the same elements as [A], but due to the pivoting the rows are in a different order. When pivoting is used in the decomposition procedure, the changes that are made have to be recorded and stored. This is done by creating a permutation matrix [P] such that

$$[P][A] = [L][U]$$

The order of the rows of B has to be changed so that it is consistent with the pivoting. This is done by multiplying B by the permutation matrix [P].
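Putting the four steps together, Crout's procedure (Section 2.3.3, without pivoting) can be sketched as follows; this is our own code and it assumes the computed diagonal elements L_{ii} are nonzero:

    function [L, U] = CroutLU(A)
    % Crout's method: A = L*U with 1s on the diagonal of U.
    n = size(A, 1);
    L = zeros(n);
    U = eye(n);                          % Step 2: 1s on the diagonal of U
    L(:, 1) = A(:, 1);                   % Step 1: first column of L
    U(1, 2:n) = A(1, 2:n) / L(1, 1);     % Step 3: first row of U
    for i = 2:n                          % Step 4: row after row
        for j = 2:i                      % elements of L in row i
            L(i, j) = A(i, j) - L(i, 1:j-1) * U(1:j-1, j);
        end
        for j = i+1:n                    % elements of U in row i
            U(i, j) = (A(i, j) - L(i, 1:i-1) * U(1:i-1, j)) / L(i, i);
        end
    end
    end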

Example: Consider solving the following system using LU decomposition:

$$\begin{aligned}
x_1 - x_2 &= -1 \\
3x_1 - 2x_2 + 5x_3 &= 14 \\
2x_1 + 4x_3 &= 14
\end{aligned}$$

3 Iterative methods

Iterative or approximate methods provide an alternative to the elimination methods described previously. Such approaches are similar to the techniques we developed to obtain the root of a single equation (fixed-point iteration). Those approaches consisted of guessing a value and then using a systematic method to obtain a refined estimate of the root. Because the present part of the book deals with a similar problem, obtaining the values that simultaneously satisfy a set of equations, we might suspect that such approximate methods could be useful in this context.

For a system with n equations, the explicit equation for the unknown x_i is obtained by solving the i-th equation for x_i:

$$x_i = \frac{1}{a_{ii}}\left(b_i - \sum_{j=1,\, j \ne i}^{n} a_{ij}x_j\right)$$

For a system of n = 4 equations, the previous equation reduces to the four explicit formulas shown in Fig.7.

Fig.7

For a system of n equations [A]X = B, a sufficient condition for convergence is that in each row of the matrix the absolute value of the diagonal element is greater than the sum of the absolute values of the off-diagonal elements:

$$|a_{ii}| > \sum_{j=1,\, j \ne i}^{n} |a_{ij}|$$

This condition is sufficient but not necessary for convergence when an iteration method is used. When this condition is satisfied, the matrix is classified as diagonally dominant, and the iteration process converges toward the solution. The solution, however, might converge even when the condition is not satisfied. Two iterative methods are presented next; a small code check for diagonal dominance is sketched below.
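Diagonal dominance is easy to test before starting an iteration. A minimal check (our own code):

    function dd = IsDiagDominant(A)
    % True if, in every row, the magnitude of the diagonal element
    % exceeds the sum of the magnitudes of the off-diagonal elements.
    d = abs(diag(A));
    offdiag = sum(abs(A), 2) - d;   % row sums without the diagonal
    dd = all(d > offdiag);
    end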

3.1 Jacobi iterative method

In the Jacobi method, an initial (first) value is assumed for each of the unknowns, x_1^{(1)}, x_2^{(1)}, ..., x_n^{(1)}. If no information is available regarding the approximate values of the unknowns, the initial value of all the unknowns can be assumed to be zero. The second estimate of the solution is calculated by substituting the first estimate into the right-hand side of the explicit equations. In general, the (k+1)-th estimate of the solution is calculated from the k-th estimate by

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j=1,\, j \ne i}^{n} a_{ij}x_j^{(k)}\right)$$

The iterations continue until the differences between the values obtained in successive iterations are small. The iterations can be stopped when the absolute value of the estimated relative error of all the unknowns is smaller than some predetermined value:

$$\left|\frac{x_i^{(k+1)} - x_i^{(k)}}{x_i^{(k+1)}}\right| < \epsilon, \quad i = 1, 2, \dots, n$$

Example: Consider solving the following system using the Jacobi iterative method (three iterations):

$$\begin{aligned}
3x_1 - 2x_2 + 5x_3 &= 14 \\
x_1 - x_2 &= -1 \\
2x_1 + 4x_3 &= 14
\end{aligned}$$

3.2 Gauss-Seidel iterative method

In the Gauss-Seidel method, initial (first) values are assumed for the unknowns x_2, x_3, ..., x_n (all of the unknowns except x_1). If no information is available regarding the approximate values of the unknowns, the initial value of all the unknowns can be assumed to be zero. The first assumed values of the unknowns are substituted into the explicit equation with i = 1 to calculate the value of x_1. Next, the same equation with i = 2 is used for calculating a new value for x_2. This is followed by i = 3 for calculating a new value for x_3. The process continues until i = n, which completes the first iteration. Then the second iteration starts with i = 1, where a new value for x_1 is calculated, and so on.

In the Gauss-Seidel method, the current values of the unknowns are used for calculating the new value of the next unknown. In other words, as soon as a new value of an unknown is calculated, it is immediately used in the next application. Applying this to the explicit equations gives the iteration formulas:

$$x_1^{(k+1)} = \frac{1}{a_{11}}\left(b_1 - \sum_{j=2}^{n} a_{1j}x_j^{(k)}\right)$$

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\right), \quad i = 2, 3, \dots, n-1$$

$$x_n^{(k+1)} = \frac{1}{a_{nn}}\left(b_n - \sum_{j=1}^{n-1} a_{nj}x_j^{(k+1)}\right)$$

Convergence can be checked using the same criterion as in the Jacobi method:

$$\epsilon_{a,i} = \left|\frac{x_i^{(k)} - x_i^{(k-1)}}{x_i^{(k)}}\right| < \epsilon_s$$

for all i, where k and k-1 denote the present and previous iterations.
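Both iterations fit in a few lines each. The sketch below (our own code) runs a fixed number of sweeps from a zero initial guess and assumes nonzero diagonal elements; the relative-error stopping test above could replace the fixed iteration count.

    function x = Jacobi(A, b, niter)
    % Jacobi iteration: each sweep uses only values from the previous sweep.
    n = length(b);
    x = zeros(n, 1);
    for k = 1:niter
        xold = x;
        for i = 1:n
            s = A(i, :) * xold - A(i, i) * xold(i);   % sum over j ~= i
            x(i) = (b(i) - s) / A(i, i);
        end
    end
    end

    function x = GaussSeidel(A, b, niter)
    % Gauss-Seidel iteration: new values are used as soon as available.
    n = length(b);
    x = zeros(n, 1);
    for k = 1:niter
        for i = 1:n
            % x already holds the updated entries for j < i.
            s = A(i, :) * x - A(i, i) * x(i);
            x(i) = (b(i) - s) / A(i, i);
        end
    end
    end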

Example: Consider solving the following system using the Gauss-Seidel iterative method (three iterations):

$$\begin{aligned}
3x_1 - 2x_2 + 5x_3 &= 14 \\
x_1 - x_2 &= -1 \\
2x_1 + 4x_3 &= 14
\end{aligned}$$

4 Use of MATLAB built-in functions for solving systems of linear equations

4.1 Left division

Given a system of linear equations in the form [A]X = b, one can use left division in MATLAB to solve for X. The syntax is

X = A\b

4.2 Inverse operation

Given a system of linear equations in the form [A]X = b, one can use the inverse of the matrix to solve for X, since X = [A]^{-1} b. The syntax is

X = inv(A)*b

4.3 LU decomposition

MATLAB has a built-in function for LU decomposition that can be used to solve for X. MATLAB uses partial pivoting: the lu function returns [L], [U], and the permutation matrix [P] such that [L][U] = [P][A]. The syntax is

[L, U, P] = lu(A)
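For instance, the three approaches can be compared on the example system used throughout this chapter (with the right-hand side 14, -1, 14 as reconstructed above):

    A = [3 -2 5; 1 -1 0; 2 0 4];
    b = [14; -1; 14];

    x1 = A\b;              % left division (the recommended approach)
    x2 = inv(A)*b;         % explicit inverse (less efficient)

    [L, U, P] = lu(A);     % L*U = P*A, with partial pivoting
    x3 = U\(L\(P*b));      % forward substitution, then back substitution
    % x1, x2, and x3 all give [1; 2; 3]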

5 Application: Inverse of a matrix

As an application of the preceding methods, we now consider finding the inverse of a matrix. The procedure is demonstrated for a 3 × 3 matrix but is easily extended to a matrix of dimension n × n. Consider matrices [A] and [B] such that [A][B] = [I], where [I] is the identity matrix; then [B] = [A]^{-1}:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Equating columns, finding [B] amounts to solving three systems of equations, all with the same coefficient matrix [A]:

$$[A]\begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad [A]\begin{bmatrix} b_{12} \\ b_{22} \\ b_{32} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad [A]\begin{bmatrix} b_{13} \\ b_{23} \\ b_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

These systems can be solved by any of the methods discussed in this chapter.

6 Ill-conditioned systems

A numerical solution of a system of equations is seldom an exact solution. Even though direct methods (Gauss, Gauss-Jordan, LU decomposition) can be exact in principle, they are still susceptible to round-off errors when implemented on a computer. This is especially true for large systems and for ill-conditioned systems.

An ill-conditioned system of equations is one in which small variations in the coefficients of the matrix [A] cause large changes in the solution. When an ill-conditioned system of equations is solved numerically, there is a high probability that the solution obtained will have a large error, or that a solution will not be obtained at all. To illustrate this, and also to be able to identify whether a system of linear equations is ill conditioned, we first introduce the concept of a norm. By definition, a norm is a real-valued function that provides a measure of the size or length of multi-component mathematical entities such as vectors and matrices. Examples of norms:

Euclidean norm: for an n-dimensional vector X = [x_1, x_2, ..., x_n],

$$\|X\|_e = \sqrt{\sum_{i=1}^{n} x_i^2}$$

Uniform vector norm:

$$\|X\|_\infty = \max_{1 \le i \le n} |x_i|$$

Frobenius norm: for an n × n matrix [A] with elements a_{ij},

$$\|A\|_e = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2}$$

Uniform (row) matrix norm:

$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$$
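These norms correspond directly to MATLAB's built-in norm function. A small check (the example values are our own):

    x = [3; -4; 12];
    norm(x)            % Euclidean norm: 13
    norm(x, inf)       % uniform vector norm: 12

    A = [6 -2; 11.5 -3.85];
    norm(A, 'fro')     % Frobenius norm
    norm(A, inf)       % uniform (row) matrix norm: 15.35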

Although there are theoretical benefits to using certain of the norms, the choice is sometimes influenced by practical considerations. For example, the uniform row norm is widely used because of the ease with which it can be calculated and the fact that it usually provides an adequate measure of matrix size.

Now that we have introduced the concept of a norm, we can use it to define another quantity, the matrix condition number. This number is always greater than or equal to 1:

$$\mathrm{Cond}[A] = \|A\| \cdot \|A^{-1}\| \ge 1$$

It can be shown that the true relative error of the solution of [A]X = b, $\|\Delta X\|/\|X_t\|$, is less than or equal to Cond[A] times the relative size of the residual, $\|\Delta(AX)\|/\|b\|$:

$$\frac{\|\Delta X\|}{\|X_t\|} \le \mathrm{Cond}[A]\,\frac{\|\Delta(AX)\|}{\|b\|}$$

where

$$\Delta(AX) = AX_t - AX_{NS} = b - AX_{NS} \qquad \text{and} \qquad \Delta X = X_t - X_{NS}$$

with X_t denoting the true solution and X_{NS} the numerical solution. If the condition number of the matrix [A] is large, there is a high probability that the true relative error of the solution is large as well.

Example: Consider solving the following system of linear equations:

$$\begin{aligned}
6x_1 - 2x_2 &= 10 \\
11.5x_1 - 3.85x_2 &= 17
\end{aligned}$$
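A quick way to see how ill conditioned this system is (a sketch; the signs of the coefficients are as reconstructed above):

    A = [6 -2; 11.5 -3.85];
    b = [10; 17];

    det(A)           % -0.1: the matrix is nearly singular
    cond(A, inf)     % condition number in the uniform norm, about 2.7e3
    x = A\b;         % small changes in A or b change x substantially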