Appears in SIGSAM Bulletin 22(2):41-49 (April 1988).

Analysis of the Binary Complexity of Asymptotically Fast Algorithms for Linear System Solving*
Shortened Version

Brent Gregory and Erich Kaltofen
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, New York 12180-3590
Inter-Net: kaltofen@cs.rpi.edu

Introduction

Two significant developments can be distinguished in the theory of algebraic algorithm design. One is that of fast algorithms in terms of counting the arithmetic operations, such as the asymptotically fast matrix multiplication procedures and related linear algebra algorithms, or the asymptotically fast polynomial multiplication procedures and related polynomial manipulation algorithms. The other is to observe the actual bit complexity when such algorithms are performed for concrete fields, in particular the rational numbers.

It was discovered in the mid-1960s that for rational polynomial GCD operations the classical algorithm can lead to exponential coefficient size growth. The beautiful theory of subresultants by Collins [7] and Brown & Traub [5] explains, however, that the size growth is not inherent but a consequence of over-simplified rational arithmetic. Similar phenomena were observed for the Gaussian elimination procedure [2], [9]. In this article we investigate what size growth occurs in the asymptotically faster methods. We selected the matrix triangularization algorithm by Bunch & Hopcroft [6], [1], §6.4, as our paradigm.

During Gaussian elimination of an integral matrix all occurring reduced rational numerators and denominators divide minors of the n × (n + 1) augmented input matrix, so any intermediate number is at most roughly n times longer than the original entries. The asymptotically fast methods require asymptotically fast matrix multiplication. The fine structure of the best methods [8] is fairly complex and subject to change as improvements are made.
We shall assume that the matrix multiplication algorithms are called with integral matrices, that is, all rational entries are first brought to a common denominator. Since Strassen's [15] original method computes linear combinations of the entries, which has essentially the same effect, this assumption seems realistic. We then show that the Bunch & Hopcroft n × n matrix triangularization algorithm is worse than Gaussian elimination in that the intermediate numbers can be

* This material is based upon work supported by the National Science Foundation under Grant No. DCR-85-04391 (second author), and by an IBM Faculty Development Award (second author). This paper was presented at the poster session of the 1987 European Conference on Computer Algebra in Leipzig, East Germany.

roughly n² times longer than the original entries. In light of this negative result, a modular version of the fast algorithm following McClellan [12] becomes computationally superior to the rational one. We strongly suspect that the fast Knuth-Schönhage polynomial GCD procedure [13], [3] behaves similarly. Again the modular version following Brown [4] would fix this problem. We like to acknowledge that results similar to the ones discussed in this article were also obtained by Leighton [11].

Notation: By K we denote an arbitrary field, by K^(m×n) the algebra of m by n matrices over K. For a matrix A, A(j_1, j_2, ...; k_1, k_2, ...) denotes the submatrix obtained from A by selecting the rows j_1, j_2, ... and the columns k_1, k_2, .... We also write ranges of rows/columns such as j_1, ..., j_2, meaning all j from j_1 to j_2 inclusively. A full range is denoted by *. The elements of A are denoted by a_{j,k}. By I_n we denote the n × n identity matrix. Finally, by M(n) we denote a function dominating the complexity of n × n matrix multiplication, currently at best n^(ω+o(1)), ω = 2.376 [8].

Gaussian Elimination

We first state the result on the intermediate numbers during Gaussian elimination. In this and the following algorithms we assume that all principal minors are non-zero. Normally, this condition is enforced by permuting the columns during the algorithms. Since this has no influence on the size of the computed elements, we lose no generality by our assumption.

Algorithm Gaussian Elimination
Input: The algorithm triangularizes U ∈ K^(n×n) in place. We superscribe the iteration counter to distinguish different versions of U. Initially, U^(0) ← A ∈ K^(n×n), which is the matrix to be triangularized.
Output: U^(n−1) in upper triangular form.

For i ← 1, ..., n − 1 Do
    For j ← i + 1, ..., n Do
        For k ← i + 1, ..., n Do
            u^(i)_{j,k} ← u^(i−1)_{j,k} − (u^(i−1)_{j,i} / u^(i−1)_{i,i}) · u^(i−1)_{i,k}.

Theorem: After the innermost loop completes, we have for all 1 ≤ i < n, i + 1 ≤ j ≤ n, i + 1 ≤ k ≤ n,

    u^(i)_{j,k} = det(A(1, ..., i, j; 1, ..., i, k)) / det(A(1, ..., i; 1, ..., i)).    (1)

(See Gantmacher [10], Chapter II, for a proof.)
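Formula (1) underlies a simple executable check: since every u^(i)_{j,k} is a quotient of minors, the cross-multiplication step can be divided exactly by the previous pivot. Below is a minimal Python sketch of this fraction-free (Bareiss-style) elimination; the function name and the test matrix are ours, and all leading principal minors are assumed nonzero:

```python
def fraction_free_triangularize(A):
    """Fraction-free Gaussian elimination.  After step i, entry (j, k)
    holds det(A(1..i, j; 1..i, k)), the numerator of formula (1); the
    division by the previous pivot is exact over the integers.
    Assumes all leading principal minors are nonzero."""
    U = [row[:] for row in A]
    n = len(U)
    prev_pivot = 1  # denominator carried over from the previous step
    for i in range(n - 1):
        for j in range(i + 1, n):
            for k in range(i + 1, n):
                num = U[i][i] * U[j][k] - U[j][i] * U[i][k]
                assert num % prev_pivot == 0  # the exact division rule
                U[j][k] = num // prev_pivot
        for j in range(i + 1, n):
            U[j][i] = 0
        prev_pivot = U[i][i]
    return U

U = fraction_free_triangularize([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
```

On this input the result is [[2, 1, 1], [0, 2, 2], [0, 0, 4]]; the last diagonal entry, 4, is the full determinant det(A(1, ..., n; 1, ..., n)).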

This leads not only to size bounds on the intermediate rationals by Hadamard's determinant inequality, but gives the following simple exact division rule: the unreduced numerators and denominators of the new rational entries are divisible by their previous denominators.* Applying this rule, the algorithm then computes exactly the numerators and denominators in (1). A similar rule applies to the backward substitution process if one uses the triangularization to solve a linear system.

* The discovery of this fact is difficult to trace in the literature. In recent times it has been treated by E. Bareiss, J. Edmonds, and D. K. Faddeev. Last-century mathematicians sometimes associated with it include C. L. Dodgson, C. F. Gauss, C. Jordan, and J. J. Sylvester. Unfortunately, we have not found a convincing first source.

Fast Triangular Matrix Inversion

We now describe a division-free version of the asymptotically fast algorithm for computing the inverse of a triangular matrix [1], §6.3. This algorithm will be called by the asymptotically fast LU decomposition routine.

Algorithm Triangular Inverse
Input: E ∈ D^(n×n) upper triangular, n = 2^r, r ≥ 0, D an integral domain.
Output: E* ∈ D^(n×n) upper triangular such that E E* = (e_{1,1} ⋯ e_{n,n}) I_n.

If n = 1 Then Return E* = I_1. Let

    E_{1,1} := E(1, ..., n/2; 1, ..., n/2),
    E_{1,2} := E(1, ..., n/2; n/2 + 1, ..., n),
    E_{2,2} := E(n/2 + 1, ..., n; n/2 + 1, ..., n).

Step R (Recursive calls): Call the algorithm recursively with E_{1,1}, obtaining E*_{1,1}, and with E_{2,2}, obtaining E*_{2,2}.
Step M (Matrix multiplications): η_1 ← e_{1,1} ⋯ e_{n/2,n/2}, η_2 ← e_{n/2+1,n/2+1} ⋯ e_{n,n}. Return

    E* = [ η_2 E*_{1,1}    −E*_{1,1} E_{1,2} E*_{2,2} ]
         [ 0               η_1 E*_{2,2}               ].

In terms of arithmetic operations in D, the running time I(n) of this algorithm is estimated by

    I(n) ≤ 2 I(n/2) + 2 M(n/2) + O(n²),

which gives I(n) = O(M(n)).
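Because the recursion is division-free, it runs unchanged over the plain integers. The sketch below is our rendering (helper names ours; a plain cubic product stands in for the fast M(n) multiplication), returning E* with E·E* = (e_{1,1} ⋯ e_{n,n})·I_n as specified:

```python
def matmul(A, B):
    """Plain cubic matrix product; stands in for the fast M(n) routine."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def triangular_inverse(E):
    """Division-free inverse of an upper triangular matrix E, n = 2^r:
    returns E* with E @ E* == (e_11 * ... * e_nn) * I_n."""
    n = len(E)
    if n == 1:
        return [[1]]
    h = n // 2
    E11 = [row[:h] for row in E[:h]]
    E12 = [row[h:] for row in E[:h]]
    E22 = [row[h:] for row in E[h:]]
    # step R: recursive calls on the diagonal blocks
    S11 = triangular_inverse(E11)
    S22 = triangular_inverse(E22)
    # step M: diagonal products eta_1, eta_2 and the off-diagonal correction
    eta1, eta2 = 1, 1
    for i in range(h):
        eta1 *= E[i][i]
        eta2 *= E[h + i][h + i]
    T = matmul(matmul(S11, E12), S22)  # E*_{1,1} E_{1,2} E*_{2,2}
    top = [[eta2 * S11[i][j] for j in range(h)] +
           [-T[i][j] for j in range(h)] for i in range(h)]
    bot = [[0] * h + [eta1 * S22[i][j] for j in range(h)] for i in range(h)]
    return top + bot
```

For example, for E = [[2, 1], [0, 3]] the sketch returns [[3, -1], [0, 2]], so that E·E* = 6·I_2.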

Fast LU Decomposition

We now present a detailed description of the Bunch-Hopcroft algorithm. For later analysis purposes our algorithm changes the input matrix in place and is therefore more involved than the purely recursive version in [1], §6.4, which we recommend to the reader for an initial understanding of the algorithm.

Algorithm Fast LU Decomposition
Global Objects: L, U ∈ K^(n×n), K a field, n = 2^r, r ≥ 0. Furthermore, we will use an iteration counter i. L and U get updated in place, at which point i is incremented. We denote their versions as L^(i), U^(i). Initially, L^(0) ← I_n and U^(0) ← A ∈ K^(n×n), which is the matrix to be decomposed. We assume that all principal minors of A are non-zero. The counter i merely helps to analyze the algorithm and does not participate in the computation.
Input: A row index j, 1 ≤ j ≤ n, and a bandwidth m = 2^s, 0 ≤ s ≤ r. Let the iteration counter on call be i_0. Figure 1 depicts a typical state at this point.
Output: Let the iteration counter on return be i_3. This call will have factored the sub-matrix

    U^(i_0)(j, ..., j + m − 1; j, ..., n) = L_3 U_3,

where the lower triangular factor is embedded in L^(i_3) as

    L_3 := L^(i_3)(j, ..., j + m − 1; j, ..., j + m − 1) ∈ K^(m×m),

and the upper triangular factor in U^(i_3) as

    U_3 := U^(i_3)(j, ..., j + m − 1; j, ..., n) ∈ K^(m×(n−j+1)).

No other blocks in L^(i_0) and U^(i_0) will be changed by this call, so we always have A = L^(i) U^(i).

If m = 1 Then Return. No update occurs. It is here that a column transposition of U^(i_0) can make u^(i_0)_{j,j} ≠ 0 in case a principal minor of A is 0.
Step U (Factor U^(i_0)(j, ..., j + m/2 − 1; j, ..., n)): Call the algorithm recursively with j and m/2. Let i_1 be the value of i after this call. The updated blocks in L and U are

    L_1 := L^(i_1)(j, ..., j + m/2 − 1; j, ..., j + m/2 − 1) ∈ K^((m/2)×(m/2))

and

    U_1 := U^(i_1)(j, ..., j + m/2 − 1; j, ..., n) ∈ K^((m/2)×(n−j+1)).

Step Z (Zero out U^(i_1)(j + m/2, ..., j + m − 1; j, ..., j + m/2 − 1)): Let

    E := U_1(*; 1, ..., m/2) ∈ K^((m/2)×(m/2)),
    Ē := U_1(*; m/2 + 1, ..., n − j + 1) ∈ K^((m/2)×(n−j+1−m/2)),

and let

    F := U^(i_1)(j + m/2, ..., j + m − 1; j, ..., j + m/2 − 1) ∈ K^((m/2)×(m/2)),
    F̄ := U^(i_1)(j + m/2, ..., j + m − 1; j + m/2, ..., n) ∈ K^((m/2)×(n−j+1−m/2)).

Notice that E is upper triangular, and that the blocks of U^(i_1) corresponding to F and F̄ have not changed since version i_0. At this point

    U^(i_0)(j, ..., j + m − 1; j, ..., n) = [ L_1   0       ] [ E   Ē ]
                                           [ 0     I_{m/2} ] [ F   F̄ ].

Call algorithm Triangular Inverse on E to obtain E^(−1). Compute F E^(−1). At this point L and U will be updated, so we increment i ← i + 1; the new value of i is i_2 = i_1 + 1. Set

    L^(i_2)(j + m/2, ..., j + m − 1; j, ..., j + m/2 − 1) ← F E^(−1),                (2)
    U^(i_2)(j + m/2, ..., j + m − 1; j, ..., j + m/2 − 1) ← 0 ∈ K^((m/2)×(m/2)),
    U^(i_2)(j + m/2, ..., j + m − 1; j + m/2, ..., n) ← F̄ − (F E^(−1)) Ē.           (3)

At this point we have

    U^(i_0)(j, ..., j + m − 1; j, ..., n) = [ L_1       0       ] [ E   Ē                ]
                                           [ F E^(−1)  I_{m/2} ] [ 0   F̄ − (F E^(−1)) Ē ].

Step L (Factor U^(i_2)(j + m/2, ..., j + m − 1; j + m/2, ..., n)): Call the algorithm recursively with j + m/2 and m/2.

The algorithm initially gets called with j = 1 and m = n. We then always have i_0 = j − 1, i_2 = j + m/2 − 1, and i_3 = j + m − 2, as can be seen from Figure 1. In terms of arithmetic operations on an n × n matrix for input width m, we have

    T_n(m) ≤ 2 T_n(m/2) + I(m/2) + O((n/m) M(m/2)),

Figure 1: L^(0) and U^(0) upon a call of algorithm Fast LU Decomposition with n = 16 and m = 2. [The matrix diagram is not reproduced in this transcription.] The sub-diagonal blocks of L^(0), numbered in italics, are filled in in the order given; the submatrix of U^(0) to be factored by this call is hatched. The values of µ^(i)_j are determined by the incomplete fractal boundary of U^(i).

which, using an O(m^ω), ω > 2, m × m matrix multiplication procedure, gives T_n(m) = O(n m^(ω−1)).

Analysis

As in the exact division algorithm by Gaussian elimination, we can give a formula for the entries in U^(i). We define µ^(i)_j as the column index of that element in row j of U^(i) which is furthest to the right and which has been explicitly zeroed by step Z. If no such element exists, µ^(i)_j = 0. Observe that µ^(i)_j = j − 1 for i ≥ j − 1. Also refer to Figure 1 for an example.

Theorem: After step Z in the Fast LU Decomposition algorithm, we have

    u^(i)_{j,k} = det(A(1, ..., µ^(i)_j, j; 1, ..., µ^(i)_j, k)) / det(A(1, ..., µ^(i)_j; 1, ..., µ^(i)_j))    (4)

for all 1 ≤ i ≤ n − 1, 1 ≤ j ≤ n, µ^(i)_j < k ≤ n.

Although (4) is the expected generalization of (1), the intermediate numerators and denominators become much larger now. The reason is that the matrix multiplication for (2) and (3) requires us, by the assumption made in the introduction, to bring the rational entries of E, Ē, and F to a common denominator. In the worst case, for j = n/2 + 1 and m = n/2, we have

    Ē = U^(n/2−1)(1, ..., n/2; n/2 + 1, ..., n).

The common denominator of its entries is

    ∏_{j=1}^{n/2} det(A(1, ..., j; 1, ..., j)),    (5)

which, if the original matrix A has integer entries of length l, can be an integer of length O(n² l). Similar growth already occurs in the inversion E^(−1), but the entries of F have a small denominator.

Over an abstract integral domain, (5) is the best common denominator, but for specific domains such as the integers one could, of course, choose the GCD of the factors in (5). However, for random matrices this does not help much. The reason is that, as polynomials in Z[a_{1,1}, ..., a_{n,n}], the factors are relatively prime, because determinants are irreducible polynomials [16], §23, Exercise 3. Now a random integral evaluation is likely to preserve relative primeness. For an explanation of this phenomenon we refer to [14].

As we have shown, exact rational arithmetic during the fast algorithms for linear system solving can lead to unexpected size growth.
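The recursion of steps U, Z, and L can be exercised end to end. The sketch below is only a model of algorithm Fast LU Decomposition: it uses exact Fraction arithmetic and an ordinary back-substitution inverse in place of the division-free Triangular Inverse, with 0-based indices; all names are ours:

```python
from fractions import Fraction

def upper_tri_inverse(E):
    """Exact inverse of an upper triangular Fraction matrix by back
    substitution (stands in for algorithm Triangular Inverse)."""
    n = len(E)
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n - 1, -1, -1):
        X[j][j] = 1 / E[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(E[i][t] * X[t][j] for t in range(i + 1, j + 1))
            X[i][j] = -s / E[i][i]
    return X

def fast_lu(A):
    """Recursive LU in the shape of algorithm Fast LU Decomposition:
    n = 2^r, all principal minors nonzero; returns (L, U) with A = L U."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(x) for x in row] for row in A]

    def factor(j, m):  # factor rows j..j+m-1 against columns j..n-1
        if m == 1:
            return
        h = m // 2
        factor(j, h)                                  # step U
        E = [row[j:j + h] for row in U[j:j + h]]      # upper triangular block
        Einv = upper_tri_inverse(E)
        F = [row[j:j + h] for row in U[j + h:j + m]]
        G = [[sum(F[a][b] * Einv[b][c] for b in range(h))
              for c in range(h)] for a in range(h)]   # G = F E^{-1}
        for a in range(h):                            # step Z, formulas (2)-(3)
            for k in range(j + h, n):                 # Fbar <- Fbar - G Ebar
                U[j + h + a][k] -= sum(G[a][b] * U[j + b][k] for b in range(h))
            for b in range(h):                        # F <- 0, L gets G
                U[j + h + a][j + b] = Fraction(0)
                L[j + h + a][j + b] = G[a][b]
        factor(j + h, h)                              # step L

    factor(0, n)
    return L, U
```

For the 2 × 2 matrix [[2, 1], [4, 3]] this yields L = [[1, 0], [2, 1]] and U = [[2, 1], [0, 1]].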
A remedy for this problem can be a modular approach, that is, solving the system modulo prime numbers and then computing the rational solution by Chinese remaindering [12]. One gets the following theorem.

Theorem: Let Ax = b be an n × n non-singular linear system in which A and b have integral entries of length at most l. Then one can find the rational solution vector x in (nl)^(1+o(1)) n^(ω+o(1)) binary steps.
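The modular approach behind the theorem can be sketched end to end: solve the system modulo several word-size primes, then Chinese-remainder the determinant d and the Cramer numerators d·x_i, whose sizes are controlled by a Hadamard-style bound. Everything below (the names, the trial-division prime source, the particular bound) is our illustration, not the paper's algorithm:

```python
from fractions import Fraction
from math import isqrt

def next_prime(p):
    """Smallest prime greater than p (trial division; fine for a sketch)."""
    q = p + 1
    while any(q % t == 0 for t in range(2, isqrt(q) + 1)):
        q += 1
    return q

def solve_mod(A, b, p):
    """Gauss-Jordan mod p: returns (det A mod p, x with A x = b mod p),
    or None if A is singular modulo p."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    det = 1
    for i in range(n):
        piv = next((r for r in range(i, n) if M[r][i]), None)
        if piv is None:
            return None
        if piv != i:
            M[i], M[piv] = M[piv], M[i]
            det = -det % p
        det = det * M[i][i] % p
        inv = pow(M[i][i], -1, p)
        M[i] = [v * inv % p for v in M[i]]
        for r in range(n):
            if r != i and M[r][i]:
                f = M[r][i]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[i])]
    return det, [M[i][n] for i in range(n)]

def solve_rational(A, b):
    """Rational solution of a nonsingular integral system by Chinese
    remaindering: recover d = det A and the integers d*x_i, then divide."""
    n = len(A)
    # Hadamard-style bound dominating |det A| and every Cramer numerator
    B = 1
    for i in range(n):
        B *= isqrt(sum(v * v for v in A[i]) + b[i] * b[i]) + 1
    p, modulus, primes, dets, sols = 1 << 15, 1, [], [], []
    while modulus <= 2 * B:
        p = next_prime(p)
        res = solve_mod(A, b, p)
        if res is None:          # p divides det A; skip this prime
            continue
        primes.append(p); dets.append(res[0]); sols.append(res[1])
        modulus *= p

    def crt(residues):           # combine, then lift symmetrically
        x, m = 0, 1
        for r, q in zip(residues, primes):
            x += m * ((r - x) * pow(m, -1, q) % q)
            m *= q
        return x if x <= m // 2 else x - m

    d = crt(dets)
    return [Fraction(crt([dp * xp[i] % q for dp, xp, q in
                          zip(dets, sols, primes)]), d) for i in range(n)]
```

For instance, solve_rational([[2, 1], [4, 3]], [1, 1]) returns [Fraction(1, 1), Fraction(-1, 1)].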

This theorem can also be generalized to singular systems, in both the over- and underdetermined case. Notice that the Fast LU Decomposition algorithm allows one to compute the rank of an m × n matrix, m ≤ n, in O(m^(ω−1) n) arithmetic operations.

Conclusion

The main result in this paper is negative, in that we show that one cannot unify exact division methods in computational algebra, such as the exact division Gaussian elimination or the subresultant PRS algorithm, with asymptotically fast methods, such as the Bunch-Hopcroft LU decomposition algorithm or the Knuth-Schönhage polynomial GCD algorithm. In gaining an asymptotic speed-up in terms of arithmetic operation count one significantly increases the size of intermediately computed fractions. For specific domains, such as the rational numbers, this growth can be controlled by modular techniques.

Note added on September 8, 2006: In the Theorem on page 2, the ranges of j and k were corrected.

References

1. Aho, A., Hopcroft, J., and Ullman, J., The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA (1974).
2. Bareiss, E. H., Sylvester's identity and multistep integer-preserving Gaussian elimination, Math. Comp., 22, pp. 565-578 (1968).
3. Brent, R. P., Gustavson, F. G., and Yun, D. Y. Y., Fast solution of Toeplitz systems of equations and computation of Padé approximants, J. Algorithms, 1, pp. 259-295 (1980).
4. Brown, W. S., On Euclid's algorithm and the computation of polynomial greatest common divisors, J. ACM, 18, pp. 478-504 (1971).
5. Brown, W. S. and Traub, J. F., On Euclid's algorithm and the theory of subresultants, J. ACM, 18, pp. 505-514 (1971).
6. Bunch, J. R. and Hopcroft, J. E., Triangular factorization and inversion by fast matrix multiplication, Math. Comp., 28, pp. 231-236 (1974).
7. Collins, G. E., Subresultants and reduced polynomial remainder sequences, J. ACM, 14, pp. 128-142 (1967).
8. Coppersmith, D. and Winograd, S., Matrix multiplication via arithmetic progressions, Proc. 19th Annual ACM Symp. Theory Comp., pp. 1-6 (1987).
9. Edmonds, J., Systems of distinct representatives and linear algebra, J. Research National Bureau Standards B, 71B, pp. 241-245 (1967).

10. Gantmacher, F. R., The Theory of Matrices, Vol. 1, Chelsea Publ. Co., New York, N.Y. (1960).
11. Leighton, F. T., Gaussian elimination over the rationals, unpublished manuscript, Applied Math. Dept., MIT (1980).
12. McClellan, M. T., The exact solution of systems of linear equations with polynomial coefficients, J. ACM, 20, pp. 563-588 (1973).
13. Moenck, R. T., Fast computation of GCDs, Proc. 5th ACM Symp. Theory Comp., pp. 142-151 (1973).
14. Schönhage, A., Probabilistic computation of integer GCDs, J. Algorithms (to appear).
15. Strassen, V., Gaussian elimination is not optimal, Numerische Mathematik, 13, pp. 354-356 (1969).
16. Waerden, B. L. van der, Modern Algebra, F. Ungar Publ. Co., New York (1953).