Faster Inversion and Other Black Box Matrix Computations Using Efficient Block Projections




Wayne Eberly (1), Mark Giesbrecht (2), Pascal Giorgi (2,4), Arne Storjohann (2), Gilles Villard (3)

(1) Department of Computer Science, University of Calgary, Calgary, Alberta, Canada. eberly@cpsc.ucalgary.ca
(2) David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada. mwg@cs.uwaterloo.ca, pgiorgi@cs.uwaterloo.ca, astorjoh@cs.uwaterloo.ca
(3) CNRS, LIP, École Normale Supérieure de Lyon, Lyon, France. Gilles.Villard@ens-lyon.fr
(4) IUT, Université de Perpignan, Perpignan, France. pascal.giorgi@univ-perp.fr

ABSTRACT

Efficient block projections of non-singular matrices have recently been used by the authors in [10] to obtain an efficient algorithm to find rational solutions for sparse systems of linear equations. In particular, a bound of O~(n^2.5) machine operations is presented for this computation, assuming that the input matrix can be multiplied by a vector with constant-sized entries using O~(n) machine operations. Somewhat more general bounds for black-box matrix computations are also derived. Unfortunately, the correctness of this algorithm depends on the existence of efficient block projections of non-singular matrices, and this was only conjectured.

In this paper we establish the correctness of the algorithm from [10] by proving the existence of efficient block projections for arbitrary non-singular matrices over sufficiently large fields. We further demonstrate the usefulness of these projections by incorporating them into existing black-box matrix algorithms to derive improved bounds for the cost of several matrix problems. We consider, in particular, matrices that can be multiplied by a vector using O~(n) field operations: we show how to compute the inverse of any such non-singular matrix over any field using an expected number of O~(n^2.27) operations in that field.
A basis for the null space of such a matrix, and a certification of its rank, are obtained at the same cost. An application of this technique to Kaltofen and Villard's baby-steps/giant-steps algorithms for the determinant and Smith form of an integer matrix is also sketched, yielding algorithms requiring O~(n^2.66) machine operations. More general bounds involving the number of black-box matrix operations to be used are also obtained.

The derived algorithms are all probabilistic of the Las Vegas type. They are assumed to be able to generate random bits or field elements at unit cost, and always output the correct answer in the expected time given.

This material is based on work supported in part by the French National Research Agency (ANR Gecko, Villard), and by the Natural Sciences and Engineering Research Council (NSERC) of Canada (Eberly, Giesbrecht, Storjohann).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSAC'07, July 29-August 1, 2007, Waterloo, Ontario, Canada. Copyright 2007 ACM 978-1-59593-743-8/07/0007...$5.00.

Categories and Subject Descriptors: I.1.2 [Symbolic and Algebraic Manipulation]: Algorithms - Algebraic algorithms, analysis of algorithms

General Terms: Algorithms

Keywords: Sparse integer matrix, structured integer matrix, linear system solving, black box linear algebra

1. INTRODUCTION

In our paper [10] we presented an algorithm which purportedly solved a sparse system of rational equations considerably more efficiently than standard linear equations solving. Unfortunately, its effectiveness in all cases was conjectural, even as its complexity and actual performance were very appealing.
This effectiveness relied on a conjecture regarding the existence of so-called efficient block projections. Given a matrix A ∈ F^{n×n} over any field F, these projections should be block vectors u ∈ F^{n×s} (where s is a blocking factor dividing n, so n = ms) such that we can compute uv or v^T u quickly for suitably dimensioned v, and such that the sequence of vectors u, Au, ..., A^{m-1}u has rank n. In this paper, we prove the existence of a class of such efficient block projections for non-singular n × n matrices over sufficiently large fields; we require that the size of the field F exceed n(n + 1).

This can be used to establish a variety of results concerning matrices A ∈ Z^{n×n} with efficient matrix-vector products: in particular, such that a matrix-vector product Ax mod p can be computed for a given integer vector x and a small (word-sized) prime p using O~(n) bit operations. Such matrices include all sparse matrices having O(n) non-zero entries, assuming these are appropriately represented. They also include a variety of structured matrices, having constant displacement rank (for one definition of displacement rank or another), studied in the recent literature.

In particular, our existence result implies that if A ∈ Z^{n×n} is non-singular and has an efficient matrix-vector product, then the Las Vegas algorithm for system solving given in [10] can be used to solve a system Ax = b for a given integer vector b using an expected number of matrix-vector products modulo a word-sized prime that is O~(n^1.5 log(‖A‖ + ‖b‖)), together with an expected number of additional bit operations that is O~(n^2.5 log(‖A‖ + ‖b‖)). If A has an efficient matrix-vector product, then the total expected number of bit operations used by this algorithm is less than that used by any previously known algorithm, at least when standard (i.e., cubic) matrix arithmetic is used.

Consider, for example, the case when the cost of a matrix-vector product by A modulo a word-sized prime is O~(n) operations, and the entries in A are constant size. The cost of our algorithm will be O~(n^2.5) bit operations. This improves upon the p-adic lifting method of Dixon [], which requires O~(n^3) bit operations for sparse or dense matrices. This theoretical efficiency was reflected in practice in [10], at least for large matrices.

We present several other rather surprising applications of this technique.
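The black-box model above is easy to make concrete: a sparse matrix with O(n) non-zero entries, stored in coordinate form, supports the product x -> Ax mod p in one pass over its non-zeros. A minimal sketch (the matrix, prime, and vector below are arbitrary illustrative choices, not data from the paper):

```python
# Minimal sketch of an "efficient matrix-vector product": a sparse matrix with
# O(n) non-zero entries, stored as (row, col, value) triples, applied mod p.
# The matrix, prime, and vector are arbitrary illustrative choices.
p = 10007  # a word-sized prime

# A 5x5 matrix with 6 non-zero entries.
entries = [(0, 0, 2), (1, 3, 5), (2, 2, 1), (3, 1, 7), (4, 4, 3), (4, 0, 1)]

def sparse_matvec(entries, x, p):
    """Compute Ax mod p in time proportional to the number of non-zeros."""
    y = [0] * len(x)
    for i, j, v in entries:
        y[i] = (y[i] + v * x[j]) % p
    return y

x = [1, 2, 3, 4, 5]
print(sparse_matvec(entries, x, p))  # -> [2, 20, 3, 14, 16]
```

The same one-pass cost model covers structured matrices with fast multiplication routines; only the constant in O~(n) changes.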
Each incorporates the technique into an existing algorithm in order to reduce the asymptotic complexity for the matrix problem to be solved. In particular, given a matrix A ∈ F^{n×n} over an arbitrary field F, we are able to compute the complete inverse of A with O~(n^{3-1/(ω-1)}) operations in F plus O~(n^{2-1/(ω-1)}) matrix-vector products by A. Here ω is such that we can multiply two n × n matrices with O(n^ω) operations in F. Standard matrix multiplication gives ω = 3, while the best known matrix multiplication of Coppersmith and Winograd [5] has ω = 2.376. If again we can compute v -> Av with O~(n) operations in F, this implies an algorithm to compute the inverse with O~(n^{3-1/(ω-1)}) operations in F. This is always in O~(n^ω), and in particular equals O~(n^2.27) operations in F for the best known ω of [5]. Other relatively straightforward applications of these techniques yield algorithms for the full nullspace and (certified) rank with this same cost. Finally, we sketch how these methods can be employed in the algorithms of Kaltofen and Villard [18] and Giesbrecht [1] to compute the determinant and Smith form of sparse matrices more efficiently.

There has certainly been much important work done on finding exact solutions to sparse rational systems prior to [10]. Dixon's p-adic lifting algorithm [] performs extremely well in practice for dense and sparse linear systems, and is implemented efficiently in LinBox [7] and Magma (see [10] for a comparison). Kaltofen and Saunders [17] were the first to propose to use Krylov-type algorithms for these problems. Krylov-type methods are used to find Smith forms of sparse matrices and to solve Diophantine systems in parallel in [1, 1], and this is further developed in [8, 18]. See the references in these papers for a more complete history. For sparse systems over a field, the seminal work is that of Wiedemann [], who shows how to solve sparse n × n systems over a field with O(n) matrix-vector products and O(n^2) other operations.
This research is further developed in [, 1, 17] and many other works. The bit complexity of similar operations for various families of structured matrices is examined by Emiris and Pan [11].

2. EFFICIENT BLOCK PROJECTIONS

For now we will consider an arbitrary invertible matrix A ∈ F^{n×n} over a field F, and s an integer, the blocking factor, that divides n exactly. Let m = n/s. For a so-called block projection u ∈ F^{n×s} and 1 ≤ k ≤ m, we denote by K_k(A, u) the block Krylov matrix [u, Au, ..., A^{k-1}u] ∈ F^{n×ks}. We wish to show that K_m(A, u) ∈ F^{n×n} is non-singular for a particularly simple and sparse u, assuming some properties of A. Our factorization uses the special projection (which we will refer to as an efficient block projection)

    u = [ I_s ]
        [ ... ]  ∈ F^{n×s},        (2.1)
        [ I_s ]

which is comprised of m copies of I_s and thus has exactly n non-zero entries. We suggested a similar projection in [10] without proof of its reliability (i.e., that the corresponding block Krylov matrix is non-singular). We establish here that it does yield a block Krylov matrix of full rank, and hence can be used for an efficient inverse of a sparse A.

Let D = diag(δ_1, ..., δ_1, δ_2, ..., δ_2, ..., δ_m, ..., δ_m) be an n × n diagonal matrix whose entries consist of m distinct indeterminates δ_i, each δ_i occurring s times.

Theorem 2.1. If the leading ks × ks minor of A is non-zero for 1 ≤ k ≤ m, then K_m(DAD, u) ∈ F[δ_1, ..., δ_m]^{n×n} is non-singular.

Proof. Let B = DAD. For 1 ≤ k ≤ m, define B_k as the specialization of B obtained by setting δ_{k+1}, δ_{k+2}, ..., δ_m to zero. Thus B_k is the matrix constructed by setting to zero the last n - ks rows and columns of B. Similarly, for 1 ≤ k ≤ m we define u_k ∈ F^{n×s} to be the matrix constructed from u by setting to zero the last n - ks rows. In particular we have B_m = B and u_m = u. This specialization will allow us to argue incrementally about how the rank is increased as k increases. We proceed by induction on k and show that

    rank K_k(B_k, u_k) = ks,        (2.2)

for 1 ≤ k ≤ m. For the base case k = 1 we have K_1(B_1, u_1) = u_1 and thus rank K_1(B_1, u_1) = rank u_1 = s.
Now, assume that (2.2) holds for some k with 1 ≤ k < m. By the definition of B_k and u_k, only the first ks rows of B_k and u_k will be involved in the left-hand side of (2.2). Similarly, only the first ks columns of B_k will be involved. Since by assumption on B the leading ks × ks minor is non-zero, we have rank B_k K_k(B_k, u_k) = ks, which is equivalent to rank K_k(B_k, B_k u_k) = ks. By the fact that the first ks rows of u_{k+1} - u_k are zero, we have B_k(u_{k+1} - u_k) = 0, or equivalently B_k u_{k+1} = B_k u_k, and hence

    rank K_k(B_k, B_k u_{k+1}) = ks.        (2.3)

The matrix in (2.3) can be written as

    K_k(B_k, B_k u_{k+1}) = [ M_k ]  ∈ F^{n×ks},
                            [  0  ]

where M_k ∈ F^{ks×ks} is non-singular. Introducing the block u_{k+1} we obtain the matrix

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = [  *    M_k ]
                                       [ I_s    0  ],        (2.4)
                                       [  0     0  ]

whose rank is (k + 1)s. Noticing that [u_{k+1}, K_k(B_k, B_k u_{k+1})] = K_{k+1}(B_k, u_{k+1}), we are led to rank K_{k+1}(B_k, u_{k+1}) = (k + 1)s. Finally, using the fact that B_k is the specialization of B_{k+1} obtained by setting δ_{k+1} to zero, we obtain rank K_{k+1}(B_{k+1}, u_{k+1}) = (k + 1)s, which is (2.2) for k + 1 and thus establishes the theorem by induction.

If the leading ks × ks minor of A is non-zero, then the leading ks × ks minor of A^T is non-zero as well, for any integer k. This gives us the following corollary.

Corollary 2.2. If the leading ks × ks minor of A is non-zero for 1 ≤ k ≤ m, and B = DAD, then K_m(B^T, u) is non-singular.

Suppose now that A ∈ F^{n×n} is an arbitrary non-singular matrix and the size of F exceeds n(n + 1). It follows by Theorem 2 of Kaltofen and Saunders [17] that there exists a lower triangular Toeplitz matrix L ∈ F^{n×n} and an upper triangular Toeplitz matrix U ∈ F^{n×n} such that each of the leading minors of Â = UAL is non-zero. Let B̂ = DÂD; the product of the determinants of the matrices K_m(B̂, u) and K_m(B̂^T, u) (mentioned in the above theorem and corollary) is a polynomial with total degree less than 2n(m - 1) < n(n + 1) (if m > 1). In this case it follows that there is also a non-singular diagonal matrix D ∈ F^{n×n} such that K_m(B, u) and K_m(B^T, u) are non-singular, for B = DÂD = DUALD.

Now let R = LD^2 U ∈ F^{n×n}, û ∈ F^{s×n} and v̂ ∈ F^{n×s} such that

    û^T = (L^T)^{-1} D^{-1} u  and  v̂ = LDu.

Then

    K_m(RA, v̂) = LD K_m(B, u)  and  L^T D K_m((RA)^T, û^T) = K_m(B^T, u),

so that K_m(RA, v̂) and K_m((RA)^T, û^T) are each non-singular as well. Because D is diagonal and U and L are triangular Toeplitz matrices, it is now easily established that (R, û, v̂) is an efficient block projection for the given matrix A, where such projections are as defined in [10].
This proves Conjectre.1 of [10] for the case that the size of F exceeds n(n + 1): Corollary.. For any non-singlar A F n n and s n (over a field of size greater than n(n + 1)) there exists an efficient block projection (R,, v) F n n F s n F n s.. FACTORIZATION OF THE MATRIX INVERSE The existence of the efficient block projection established in the previos section allows s to define a sefl factorization of the inverse of a matrix. This was sed to obtain faster heristics for solving integer systems in [10]. The basis is the following factorization of the matrix inverse. Let B = DAD, where D is an n n diagonal matrix whose diagonal entries consist of m distinct indeterminates, each occrring s times contigosly, as previosly defined. Define K (r) = K m(b, ) with as in (.1) and K (l) = K m(b T, ) T, where (r) and (l) refer to projection on the right and left respectively. For any 0 k m 1 and any two indices l and r sch than l + r = k we have T B l B r = T B k. Hence the matrix H = K (l) blocks of dimension s s: B K (r) is block-hankel with T B T B... T B m H = T B T B......... 7 T B m 5 T B m... T B m T B m 1 Notice that H = K (l) B K (r) = K (l) DAD K (r). Theorem.1 and Corollary. imply that if all leading ks ks minors of A are non-singlar then K (l) and K (r) are each non-singlar as well. This establishes the following. Theorem.1. If A F n n is sch that all leading ks ks minors are non-singlar, D is a diagonal matrix of indeterminates, and B = DAD, then B 1 and A 1 may be factored as B 1 =K (r) A 1 =DK (r) H 1 K (l), H 1 K (l) D, (.1) where K (l) and K (r) are as defined above, and H F n n is block-hankel (and invertible) with s s blocks, as above. Note that for any specialization of the indeterminates in D to field elements in F sch that det H 0 we obtain a similar formla to (.1) completely over F. A similar factorization in the non-blocked case is sed in [9, (.5)] for fast parallel matrix inversion.. 
4. BLACK-BOX MATRIX INVERSION OVER A FIELD

Suppose again that A ∈ F^{n×n} is invertible, and that for any v ∈ F^{n×1} the products Av and A^T v can be computed in φ(n) operations in F (where φ(n) ≥ n). Following Kaltofen, we call such matrix-vector and vector-matrix products black-box evaluations of A. In this section we will show how to compute A^{-1} with O~(n^{2-1/(ω-1)}) black box evaluations and O~(n^{3-1/(ω-1)}) additional operations in F. Note that when φ(n) = O~(n) the exponent in n of this cost is smaller than ω, and is O~(n^2.27) with the currently best-known matrix multiplication.

Again assume that n = ms, where s is a blocking factor and m the number of blocks. Assume for the moment that all principal ks × ks minors of A are non-zero, 1 ≤ k ≤ m. Let δ_1, δ_2, ..., δ_m be the indeterminates that form the diagonal entries of D and let B = DAD. By Theorem 2.1 and Corollary 2.2, the matrices K_m(B, u) and K_m(B^T, u) are each invertible. If m ≥ 2 then the product Δ of the determinants of these matrices is a non-zero polynomial in F[δ_1, ..., δ_m] with total degree at most 2n(m - 1). Suppose that F has at least 2n(m - 1) elements. Then Δ cannot be zero at all points in (F \ {0})^m. Let d_1, d_2, ..., d_m be non-zero elements of F such that Δ(d_1, d_2, ..., d_m) ≠ 0, let D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), and let B = DAD. Then K^(r) = K_m(B, u) ∈ F^{n×n} and K^(l) = K_m(B^T, u)^T ∈ F^{n×n} are each invertible because Δ(d_1, d_2, ..., d_m) ≠ 0, B is invertible because A is and d_1, d_2, ..., d_m are all non-zero, and thus H = K^(l) B K^(r) ∈ F^{n×n} is invertible as well. Correspondingly, (3.1) suggests

    B^{-1} = K^(r) H^{-1} K^(l)  and  A^{-1} = D K^(r) H^{-1} K^(l) D

for computing the matrix inverse.

1. Computation of u^T, u^T B, ..., u^T B^{2m-1} and K^(l). We can compute this sequence, hence K^(l), with 2m - 1 applications of B to vectors using O(nφ(n)) operations in F.

2. Computation of H. Due to the special form (2.1) of u, one may then compute wu for any w ∈ F^{s×n} with O(sn) operations. Hence we can now compute u^T B^i u for 0 ≤ i ≤ 2m - 1 with O(n^2) operations in F.

3. Computation of H^{-1}. The off-diagonal inverse representation of H^{-1} as in (A.) in the Appendix can be found with O~(s^ω m) operations by Proposition A.1.

4. Computation of H^{-1} K^(l). From Corollary A. in the Appendix, we can compute the product H^{-1} M for any matrix M ∈ F^{n×n} with O~(s^ω m^2) operations.

5. Computation of K^(r)(H^{-1} K^(l)). We can compute K^(r) M = [u, Bu, ..., B^{m-1}u] M for any M ∈ F^{n×n} by splitting M into m blocks of s consecutive rows M_i, for 0 ≤ i ≤ m - 1:

    K^(r) M = Σ_{i=0}^{m-1} B^i (u M_i)
            = u M_0 + B(u M_1 + B(u M_2 + ... + B(u M_{m-2} + B u M_{m-1}) ... )).
    (4.1)

Because of the special form (2.1) of u, each product u M_i ∈ F^{n×n} requires O(n^2) operations, and hence all such products involved in (4.1) can be computed in O(mn^2) operations. Because applying B to an n × n matrix costs nφ(n) operations, K^(r) M is computed in O(mnφ(n) + mn^2) operations using the iterative form of (4.1).

In total, the above process requires O(mn) applications of A to a vector (the same as for B), and O~(s^ω m^2 + mn^2) additional operations. If φ(n) = O~(n), the overall number of field operations is minimized with the blocking factor s = n^{1/(ω-1)}.

Theorem 4.1. Let A ∈ F^{n×n}, where n = ms and s = n^{1/(ω-1)}, be such that all leading ks × ks minors are non-singular for 1 ≤ k ≤ m. Let B = DAD, for D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), such that d_1, ..., d_m are non-zero and each of the matrices K_m(DAD, u) and K_m((DAD)^T, u) is invertible. Then the inverse matrix A^{-1} can be computed using O~(n^{2-1/(ω-1)}) black box operations and an additional O~(n^{3-1/(ω-1)}) operations in F.

The above discussion makes a number of assumptions. First, it assumes that the blocking factor s exactly divides n. This is easily accommodated by simply extending n to the nearest multiple of s, placing A in the top left corner of the augmented matrix, and adding diagonal ones in the bottom right corner.

Theorem 4.1 also makes the assumptions that all the leading ks × ks minors of A are non-singular and that the determinants of K_m(DAD, u) and K_m((DAD)^T, u) are each non-zero. Although we know of no way to ensure this deterministically in the times given, standard techniques can be used to obtain these properties probabilistically if F is sufficiently large. Suppose, in particular, that n > 1 and that #F > (m + 1)n^2 log n. Fix a set S of at least (m + 1)n^2 log n non-zero elements of F. We can ensure that the leading ks × ks minors of A are non-zero by pre- and post-multiplying by butterfly network preconditioners X and Y respectively, with parameters chosen uniformly and randomly from S.
If X and Y are constructed using the generic exchange matrix of [, .], then they will use at most n log(n)/2 random elements from S, and from [, Theorem .] it follows that all leading ks × ks minors of Ã = XAY will be non-zero simultaneously with probability at least 3/4. This probability of success can be made arbitrarily close to 1 with a choice from a larger S. We note that A^{-1} = Y Ã^{-1} X. Thus, once we have computed Ã^{-1} we can compute A^{-1} with an additional O~(n^2) operations in F, using the fact that multiplication of an arbitrary n × n matrix by an n × n butterfly preconditioner can be done with O~(n^2) operations.

Once again let Δ be the product of the determinants of the matrices K_m(DAD, u) and K_m((DAD)^T, u), so that Δ is non-zero with total degree at most 2n(m - 1). If we choose randomly selected values from S for δ_1, ..., δ_m, because #S ≥ (m + 1)n^2 log n > 4 deg Δ, the probability that Δ is zero at this point is at most 1/4 by the Schwartz-Zippel Lemma [1, ].

In summary, for randomly selected butterfly preconditioners X, Y as above, and independently and randomly chosen values d_1, d_2, ..., d_m, the probability that Ã = XAY has non-singular leading ks × ks minors for 1 ≤ k ≤ m and Δ(d_1, d_2, ..., d_m) is non-zero is at least 9/16 > 1/2, when random choices are made uniformly and independently from a finite subset S of F \ {0} with size at least (m + 1)n^2 log n.

When #F ≤ (m + 1)n^2 log n, we can easily construct a field extension E of F that has size greater than (m + 1)n^2 log n and perform the computation in that extension. Because this extension will have degree O(log_{#F} n) over F, it will add only a logarithmic factor to the final cost. While we certainly do not claim that this is not of practical concern, it does not affect the asymptotic complexity.
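The cost exponents in Theorem 4.1 follow mechanically from the blocking choice s = n^{1/(ω-1)}: n^{2-1/(ω-1)} black-box applications and n^{3-1/(ω-1)} additional field operations. A quick computation for the standard values of ω (an illustrative script; the ω values are the usual published estimates):

```python
# Evaluating the exponents of Theorem 4.1, with blocking factor s = n^{1/(omega-1)}:
# black-box applications n^{2-1/(omega-1)}, inversion cost n^{3-1/(omega-1)}.
for omega, name in [(3.0, "Standard"), (2.807, "Strassen"), (2.3755, "Cop/Win")]:
    blocking = 1.0 / (omega - 1.0)  # exponent of the blocking factor s
    apps = 2.0 - blocking           # exponent of black-box applications
    cost = 3.0 - blocking           # exponent of the total inversion cost
    print(f"omega={omega} ({name}): blocking n^{blocking:.3f}, "
          f"applications n^{apps:.3f}, inversion cost n^{cost:.3f}")
```

For ω = 3 this gives n^0.500, n^1.500 and n^2.500; for ω = 2.3755 it gives roughly n^0.727, n^1.273 and n^2.273, i.e. the O~(n^2.27) inversion cost quoted above.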

This algorithm is Las Vegas (or trivially modified to be so): for if either K_m(DAD, u) or K_m((DAD)^T, u) is singular then so is H, and this is detected at step 3. On the other hand, if K_m(DAD, u) and K_m((DAD)^T, u) are both non-singular then the algorithm's output is correct.

Theorem 4.2. Let A ∈ F^{n×n} be non-singular. Then the inverse matrix A^{-1} can be computed by a Las Vegas algorithm whose expected cost is O~(n^{2-1/(ω-1)}) black box operations and O~(n^{3-1/(ω-1)}) additional operations in F.

Table 4.1 (below) states the expected costs to compute the inverse using various values of ω when φ(n) = O~(n).

    ω                    Black-box applications   Blocking factor   Inversion cost
    3      (Standard)    n^1.5                    n^{1/2}           O~(n^2.5)
    2.807  (Strassen)    n^1.44                   n^0.553           O~(n^2.44)
    2.3755 (Cop/Win)     n^1.27                   n^0.728           O~(n^2.27)

Table 4.1: Exponents of matrix inversion with a matrix-vector cost φ(n) = O~(n).

Remark 4.3. The structure (2.1) of the projection u plays a central role in computing the product of the block Krylov matrix by an n × n matrix. For a general projection u ∈ F^{n×s}, how to do better than a general matrix multiplication, i.e., how to take advantage of the Krylov structure for computing K^(r) M, appears to be unknown.

Multiplying a Black-Box Matrix Inverse by Any Matrix

The above method can also be used to compute A^{-1} M for any matrix M ∈ F^{n×n} with the same cost as in Theorem 4.2. Consider the new step 1.5:

1.5. Computation of K^(l) M. Split M into m blocks of s columns, so that M = [M_0, ..., M_{m-1}], where M_k ∈ F^{n×s}. Now consider computing K^(l) M_k for some k ∈ {0, ..., m - 1}. This can be accomplished by computing B^i M_k for 0 ≤ i ≤ m - 1 in sequence, and then multiplying on the left by u^T to compute u^T B^i M_k for each iterate. The cost of computing K^(l) M_k for a single k by the above process is n - s multiplications of A by vectors and O(ns) additional operations in F. The cost of doing this for all k such that 0 ≤ k ≤ m - 1 is thus m(n - s) < nm multiplications of A by vectors and O(n^2) additional operations.
Since applying A (and hence B) to an n × n matrix is assumed to cost nφ(n) operations in F, K^(l) M is computed in O(mnφ(n) + mn^2) operations in F by the process described here. Note that this is the same as the cost of Step 5, so the overall cost estimate is not affected. Because Step 4 does not rely on any special form for K^(l), we can replace it with a computation of H^{-1}(K^(l) M) with the same cost. The output is again easily certified with n additional black-box evaluations. We obtain the following corollary.

Corollary 4.4. Let A ∈ F^{n×n} be non-singular and let M ∈ F^{n×n}. We can compute A^{-1} M with a Las Vegas algorithm whose expected cost is O~(n^{2-1/(ω-1)}) black box operations and O~(n^{3-1/(ω-1)}) additional operations in F.

The estimates in Table 4.1 apply to this computation as well.

5. APPLICATIONS TO BLACK-BOX MATRICES OVER A FIELD

The algorithms of the previous section have applications in some important computations with black-box matrices over an arbitrary field F. In particular, we consider the problems of computing the nullspace and rank of a black-box matrix. Each of these algorithms is probabilistic of the Las Vegas type; the output is certified to be correct.

Kaltofen and Saunders [17] present algorithms for computing the rank of a matrix and for randomly sampling the nullspace, building upon the work of Wiedemann []. In particular, they show for random upper and lower triangular Toeplitz matrices U, L ∈ F^{n×n}, and random diagonal D, that all leading k × k minors of Ã = UALD are non-singular for 1 ≤ k ≤ r = rank A, and that if f_Ã ∈ F[x] is the minimal polynomial of Ã, then it has degree r + 1 if A is singular (and degree n if A is non-singular). This is proved to be true for any input A ∈ F^{n×n}, and for random choice of U, L and D, with high probability. The cost of computing f_Ã (and hence rank A) is shown to be O(n) applications of the black-box for A and O(n^2) additional operations in F. However, no certificate is provided that the rank is correct within this cost (and we do not know of one or provide one here). Kaltofen and Saunders [17] also show how to generate a vector uniformly and randomly from the nullspace of A with this cost (and, of course, this is certifiable with a single evaluation of the black box for A). We also note that the algorithms of Wiedemann and of Kaltofen and Saunders require only a linear amount of extra space, which will not be the case for our algorithms.

We first employ the random preconditioning of [17] and let Ã = UALD as above. We will thus assume in what follows that A has all leading i × i minors non-singular for 1 ≤ i ≤ r. Although an unlucky choice may make this statement false, this case will be identified in our method. Also assume that we have computed the rank r of A with high probability. Again, this will be certified in what follows.
However, no certificate is provided that the rank is correct within this cost (and we do not know of one or provide one here). Kaltofen and Sanders [17] also show how to generate a vector niformly and randomly from the nllspace of A with this cost (and, of corse, this is certifiable with a single evalation of the black box for A). We also note that the algorithms of Wiedemann and Kaltofen and Sanders reqire only a linear amont of extra space, which will not be the case for or algorithms. We first employ the random preconditioning of [17] and let ea = UALD as above. We will ths assme in what follows that A has all leading i i minors non-singlar for 1 i r. Althogh an nlcky choice may make this statement false, this case will be identified in or method. Also assme that we have compted the rank r of A with high probability. Again, this will be certified in what follows. 1. Inverting the leading minor. Let A 0 be the leading r r minor of A and partition A as «A0 A A = 1. A A Using the algorithm of the previos section, compte A 1 0. If this fails, and the leading r r minor is singlar, then either the randomized conditioning or the rank estimate has failed and we either report this failre or try again with a different randomized preconditioning. If we can compte A 1 0, then the rank of A is at least the estimated r.. Applying the inverted leading minor. Compte A 1 0 A1 Fr (n r) sing the algorithm of the previos section (this cold in fact be merged into the first step). 17

3. Confirming the nullspace. Note that

    ( A_0  A_1 ) ( -A_0^{-1}A_1 )   ( 0                     )
    ( A_2  A_3 ) (      I       ) = ( A_3 - A_2 A_0^{-1}A_1 ) = 0,

where we write N for the second factor, and the Schur complement A_3 - A_2 A_0^{-1}A_1 must be zero if the rank r is correct. This can be checked with n - r evaluations of the black box for A. We note that, because of its structure, N = ( -A_0^{-1}A_1 ; I ) has rank n - r.

4. Output rank and nullspace basis. If the Schur complement is zero, then output the rank r and N, whose columns give a basis for the nullspace of A. Otherwise, output "fail" (and possibly retry with a different randomized pre-conditioning).

Theorem 5.1. Let A ∈ F^{n×n} have rank r. Then a basis for the nullspace of A, and the rank r of A, can be computed with an expected number of O~(n^{2-1/(ω-1)}) applications of A to a vector, plus an additional expected number of O~(n^{3-1/(ω-1)}) operations in F. The algorithm is probabilistic of the Las Vegas type.

6. APPLICATIONS TO SPARSE RATIONAL LINEAR SYSTEMS

Given a non-singular A ∈ Z^{n×n} and b ∈ Z^{n×1}, in [10] we presented an algorithm and implementation to compute A^{-1}b with O~(n^{1.5}(log(‖A‖ + ‖b‖))) matrix-vector products v ↦ Av mod p, for a machine-word-sized prime p and any v ∈ Z_p^{n×1}, plus O~(n^{2.5}(log(‖A‖ + ‖b‖))) additional bit operations. Assuming that A and b have constant-sized entries, and that a matrix-vector product by A mod p can be performed with O~(n) operations modulo p, the algorithm presented could solve a system with O~(n^{2.5}) bit operations. Unfortunately, this result was conditional upon an unproven conjecture of [10]: the existence of an efficient block projection. That conjecture has been established in the current paper, and we can now unconditionally state the following theorem.

Theorem 6.1. Given any invertible A ∈ Z^{n×n} and b ∈ Z^{n×1}, we can compute A^{-1}b using a Las Vegas algorithm. The expected number of matrix-vector products v ↦ Av mod p is in O~(n^{1.5}(log(‖A‖ + ‖b‖))), and the expected number of additional bit operations used by this algorithm is in O~(n^{2.5}(log(‖A‖ + ‖b‖))).
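Theorem 6.1 counts matrix-vector products modulo a single prime. As a minimal, self-contained illustration of how such per-prime products alone can drive an exact solve, here is a sketch of Dixon's p-adic lifting [6]; this is not the block-projection algorithm of [10] behind the theorem, and the helper names, the prime 10007, and the simplifying assumption that the solution is an integer vector are our own.

```python
# Sketch of Dixon p-adic lifting [6]: solve A x = b over the integers using
# only arithmetic modulo a prime p plus p-adic updates.  NOT the algorithm of
# [10]; for simplicity we assume the exact solution is integral and recover it
# from symmetric residues (the general rational case needs rational
# reconstruction on top of the same lifting loop).

def inv_mod_p(A, p):
    """Inverse of A modulo the prime p, by Gauss-Jordan elimination."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i][c] % p)  # assumes A non-singular mod p
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], p - 2, p)
        M[c] = [x * inv % p for x in M[c]]
        for i in range(n):
            if i != c and M[i][c]:
                f = M[i][c]
                M[i] = [(M[i][j] - f * M[c][j]) % p for j in range(2 * n)]
    return [row[n:] for row in M]

def matvec(A, v, mod=None):
    res = [sum(a * x for a, x in zip(row, v)) for row in A]
    return [r % mod for r in res] if mod is not None else res

def dixon_solve_integral(A, b, p=10007, steps=30):
    Ainv = inv_mod_p(A, p)          # the only inversion, done modulo p once
    x, r, pk = [0] * len(b), b[:], 1
    for _ in range(steps):
        y = matvec(Ainv, r, p)      # next p-adic "digit" of the solution
        x = [xi + pk * yi for xi, yi in zip(x, y)]
        r = [(ri - si) // p for ri, si in zip(r, matvec(A, y))]  # exact division
        pk *= p
    # map symmetric residues modulo p^steps back to small signed integers
    return [xi - pk if xi > pk // 2 else xi for xi in x]

A = [[4, 1, 0], [1, 3, 1], [0, 1, 5]]
b = [9, 9, 21]                      # chosen so that the solution is (2, 1, 4)
x = dixon_solve_integral(A, b)
print(x, matvec(A, x) == b)
```

Each lifting step uses one application of A^{-1} mod p and one application of A, which is why such methods are naturally measured in matrix-vector products modulo p, as in the statement above.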
Sparse Integer Determinant and Smith Form

The efficient block projection established above can also be employed relatively directly in the block baby-steps/giant-steps methods of [18] for computing the determinant of an integer matrix. This yields improved algorithms for the determinant and Smith form of a sparse integer matrix. Unfortunately, the new techniques do not obviously improve the asymptotic cost of those algorithms in the case for which they were designed, namely, for computing the determinants of dense integer matrices.

We only sketch the method for computing the determinant here, following the algorithm of [18], and estimate its complexity. Throughout we assume that A ∈ Z^{n×n} is non-singular and that we can compute v ↦ Av with φ(n) integer operations, where the bit-lengths of these integers are bounded by O~(log(n + ‖v‖ + ‖A‖)).

1. Preconditioning and setup. Precondition A ← B = D_1 U A D_2, where D_1, D_2 are random diagonal matrices and U is a unimodular preconditioner from [4]. While we will not provide the detailed analysis here, selecting the coefficients for these randomly from a set S_1 of size n is sufficient to ensure a high probability of success. This preconditioning will ensure that all leading minors are non-singular and that the characteristic polynomial is squarefree with high probability (see [4] for a proof of the latter condition). From the block projection theorem above, we also see that K_m(B, u) has full rank with high probability. Let p be a prime that is larger than the a priori bound on the coefficients of the characteristic polynomial of A; such a bound is easily determined, and log p ∈ O~(n log ‖A‖). Fix a blocking factor s, to be optimized later, and assume n = ms.

2. Choosing projections. Let u ∈ Z^{n×s} be an efficient block projection as constructed earlier, and let v ∈ Z^{n×s} be a random (dense) block projection with coefficients chosen from a set S_2 of size at least n.

3. Forming the sequence α_i = u^t A^i v ∈ Z^{s×s}. Compute this sequence for i = 0, ..., 2m.
Computing all of the A^i v takes O~(nφ(n) m log ‖A‖) bit operations. Computing all of the u^t (A^i v) takes O~(n^2 m log ‖A‖) bit operations.

4. Computing the minimal matrix generator. The minimal matrix generator F^A(λ) modulo p can be computed from the initial sequence segment α_0, ..., α_{2m-1}; see [18]. This can be accomplished with O~(m s^ω n log ‖A‖) bit operations.

5. Extracting the determinant. Following the algorithm of [18], we first check whether the degree of F^A is less than n and, if so, return "failure". Otherwise, we know that det F^A(λ) = det(λI - A). Return det A = det F^A(0) mod p.

The correctness of the algorithm, and specifically of the block projections, follows from the fact that [u, Au, ..., A^{m-1}u] is of full rank with high probability, by the block projection theorem above. Because the projection v is dense, the analysis of [18] is applicable, and the minimal generating polynomial will have full degree m with high probability; hence its determinant at λ = 0 will be the determinant of A.

The total cost of this algorithm is O~((nφ(n)m + n^2 m + n m s^ω) log ‖A‖) bit operations, which is minimized when s = n^{1/ω}. This yields an algorithm for the determinant which requires O~((n^{2-1/ω} φ(n) + n^{3-1/ω}) log ‖A‖) bit operations. This is probably most interesting when ω = 3, where it yields an algorithm for the determinant that requires O~(n^{2.66} log ‖A‖) bit operations on a matrix with pseudo-linear-cost matrix-vector product.

We also note that a similar approach allows us to use the Monte Carlo Smith form algorithm of [13], which works by computing the characteristic polynomial of random preconditionings of a matrix. This reduction is

explored in [18] in the dense matrix setting. The upshot is that we obtain the Smith form with the same order of complexity, to within a poly-logarithmic factor, as we have obtained for the determinant using the above techniques. See [18, §7.1] and [13] for details. We make no claim that this is practical in its present form.

Note: A referee has indicated that a lifting algorithm of Pan et al. [20] can also be used to solve integer systems when efficient matrix-vector products (modulo small primes) are supported for both the coefficient matrix and its inverse. This would provide an alternate application of our central results to solving integer systems. We wish to thank the referee for this information.

APPENDIX

A. APPLYING THE INVERSE OF A BLOCK-HANKEL MATRIX

In this appendix we address asymptotically fast techniques for computing a representation of the inverse of a block-Hankel matrix, and for applying this inverse to an arbitrary matrix. The fundamental technique we employ is the off-diagonal inversion formula of Beckermann and Labahn [1] and its fast variants [14]. An alternative to using the inversion formula would be to use the generalization of the Levinson-Durbin algorithm in [16].

Again assume n = ms for integers m and s, and let

    H = [ α_0      α_1   ...  α_{m-1}  ]
        [ α_1      α_2   ...  α_m      ]
        [ ...      ...   ...  ...      ]
        [ α_{m-1}  α_m   ...  α_{2m-2} ]  ∈ F^{n×n}                      (A.1)

be a non-singular block-Hankel matrix whose blocks α_i are s × s matrices over F, and let α_{2m-1} be arbitrary in F^{s×s}. We follow the approach of [19] for computing the inverse matrix H^{-1}. Since H is invertible, the following four linear systems (see [19]):

    H [q_{m-1}, ..., q_0]^t = [0, ..., 0, I]^t ∈ F^{n×s},
    H [v_m, ..., v_1]^t = [α_m, ..., α_{2m-1}]^t ∈ F^{n×s},
    [q*_{m-1} ... q*_0] H = [0 ... 0 I] ∈ F^{s×n},                       (A.2)
    [v*_m ... v*_1] H = [α_m ... α_{2m-1}] ∈ F^{s×n},                    (A.3)

have unique solutions, given by the q_k, q*_k ∈ F^{s×s} (for 0 ≤ k ≤ m-1) and the v_k, v*_k ∈ F^{s×s} (for 1 ≤ k ≤ m). We then obtain the following equation (see [19, Theorem 3.1]):

    H^{-1} = [ v_{m-1} ... v_1  I ] [ q*_{m-1}  ...  q*_0     ]
             [ ...      .    .    ] [           ...  ...      ]
             [ v_1      .         ] [                q*_{m-1} ]
             [ I                  ]
           - [ q_{m-1} ... q_1 q_0 ] [ v*_m  ...  v*_1 ]
             [ ...     .   .       ] [       ...  ...  ]
             [ q_1     .           ] [            v*_m ]
             [ q_0                 ]                                     (A.4)

The linear systems (A.2) and (A.3) may also be formulated in terms of matrix Padé approximation problems. We associate to H the matrix polynomial A = Σ_{i=0}^{2m-1} α_i x^i ∈ F^{s×s}[x]. The s × s matrix polynomials Q, P, Q*, P* in F^{s×s}[x] that satisfy

    A(x)Q(x) ≡ P(x) + x^{2m-1} mod x^{2m},   where deg Q ≤ m-1 and deg P ≤ m,
    Q*(x)A(x) ≡ P*(x) + x^{2m-1} mod x^{2m}, where deg Q* ≤ m-1 and deg P* ≤ m    (A.5)

are unique and provide the coefficients of Q = Σ_{i=0}^{m-1} q_i x^i and Q* = Σ_{i=0}^{m-1} q*_i x^i for constructing H^{-1} using (A.4) (see [19, Theorem 3.1]). The notation "mod x^i" for i ≥ 0 indicates that the terms of degree i or higher are ignored. The s × s matrix polynomials V, U, V*, U* in F^{s×s}[x] that satisfy

    A(x)V(x) ≡ U(x) mod x^{2m+1},   V(0) = I,  where deg V ≤ m and deg U ≤ m-1,
    V*(x)A(x) ≡ U*(x) mod x^{2m+1}, V*(0) = I, where deg V* ≤ m and deg U* ≤ m-1   (A.6)

are unique and provide the coefficients of V = I + Σ_{i=1}^{m} v_i x^i and V* = I + Σ_{i=1}^{m} v*_i x^i for (A.4).

Using the matrix Padé formulation, the matrices Q, Q*, V, and V* may be computed using the σ-basis algorithm in [1], or its fast counterpart in [14] that uses fast matrix multiplication. For solving (A.5), the σ-basis algorithm with σ = s(2m-1) solves

    [A  -I] [Q ; P] ≡ R x^{2m-1} mod x^{2m},
    [Q*  P*] [A ; -I] ≡ R* x^{2m-1} mod x^{2m},

with Q, P, Q*, P* ∈ F^{s×s}[x] that satisfy the degree constraints deg Q ≤ m-1, deg Q* ≤ m-1, and deg P ≤ m, deg P* ≤ m. The residue matrices R and R* in F^{s×s} are non-singular; hence QR^{-1} and (R*)^{-1}Q* are the solutions Q and Q* for applying the inversion formula (A.4). For (A.6), the σ-basis algorithm with σ = s(2m+1) leads to

    [A  -I] [V ; U] ≡ 0 mod x^{2m+1},
    [V*  U*] [A ; -I] ≡ 0 mod x^{2m+1},

with deg V ≤ m, deg V* ≤ m, and deg U ≤ m-1, deg U* ≤ m-1. The constant terms V(0) and V*(0) in F^{s×s} are non-singular; hence V(V(0))^{-1} and (V*(0))^{-1}V* are the solutions for applying (A.4).

Using the cost estimates of [14] together with the above material, we obtain the following.

Proposition A.1. Computing the expression (A.4)
of the inverse of the block-hankel matrix (A.1) redces to mltiplying matrix polynomials of degree O(m) in F s s, and can be done with O (s ω m) operations in F. Mltiplying a block trianglar Toeplitz or Hankel matrix in F n n with blocks of size s s by a matrix in F n n redces 19

to the product of two matrix polynomials of degree O(m), of dimensions s × s and s × n. Using the fast algorithms of [2] or [3], such an s × s product can be done in O~(s^ω m) operations. By splitting the s × n matrix into s × s blocks, the s × s by s × n product can thus be done in O~(m · s^ω m) = O~(s^ω m^2) operations.

For n = s^ν, let ω(1, 1, ν) be the exponent of the problem of s × s by s × n matrix multiplication over F. The splitting considered just above of the s × n matrix into s × s blocks corresponds to taking ω(1, 1, ν) = ω + ν - 1 < ν + 1.376 (ω < 2.376 due to [5]), with total cost O~(s^{ω(1,1,ν)} m) = O~(s^ω m^2). Depending on ν, a slightly smaller bound than ν + 1.376 for ω(1, 1, ν) may be obtained from the matrix multiplication techniques specifically designed for rectangular matrices in [15]. This is true as soon as ν ≥ 1.171, and gives, for example, ω(1, 1, ν) < ν + 1.334 for ν = 2, i.e., for s = n^{1/2}.

Corollary A.2. Let H be the block-Hankel matrix of (A.1). If the representation (A.4) of H^{-1} is given, then computing H^{-1}M for an arbitrary M ∈ F^{n×n} reduces to four s × s by s × n products of polynomial matrices of degree O(m). This can be done with O~(s^{ω(1,1,ν)} m) or O~(s^ω m^2) operations in F (n = s^ν = ms).

B. REFERENCES

[1] B. Beckermann and G. Labahn. A uniform approach for the fast computation of matrix-type Padé approximants. SIAM J. Matrix Anal. Appl., 15(3):804-823, July 1994.
[2] A. Bostan and E. Schost. Polynomial evaluation and interpolation on special sets of points. J. Complexity, 21(4):420-446, 2005.
[3] D. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693-701, 1991.
[4] L. Chen, W. Eberly, E. Kaltofen, B. D. Saunders, W. J. Turner, and G. Villard. Efficient matrix preconditioners for black box linear algebra. Linear Algebra and its Applications, 343-344:119-146, 2002.
[5] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comp., 9:251-280, 1990.
[6] John D. Dixon. Exact solution of linear equations using p-adic expansions.
Numerische Mathematik, 40:137-141, 1982.
[7] J.-G. Dumas, T. Gautier, M. Giesbrecht, P. Giorgi, B. Hovinen, E. Kaltofen, B. D. Saunders, W. J. Turner, and G. Villard. LinBox: A generic library for exact linear algebra. In Arjeh M. Cohen, Xiao-Shan Gao, and Nobuki Takayama, editors, Proceedings of the 2002 International Congress of Mathematical Software, Beijing, China, pages 40-50. World Scientific, August 2002.
[8] J.-G. Dumas, B. D. Saunders, and G. Villard. Integer Smith form via the valence: experience with large sparse matrices from homology. In ISSAC '00: Proceedings of the 2000 International Symposium on Symbolic and Algebraic Computation, pages 95-105, New York, NY, USA, 2000. ACM Press.
[9] W. Eberly. Processor-efficient parallel matrix inversion over abstract fields: two extensions. In Proceedings, PASCO '97, pages 38-45, New York, NY, USA, 1997. ACM Press.
[10] W. Eberly, M. Giesbrecht, P. Giorgi, A. Storjohann, and G. Villard. Solving sparse rational linear systems. In ISSAC '06: Proceedings of the 2006 International Symposium on Symbolic and Algebraic Computation, pages 63-70, New York, NY, USA, 2006. ACM Press.
[11] I. Z. Emiris and V. Y. Pan. Improved algorithms for computing determinants and resultants. J. Complexity, 21(1):43-71, 2005.
[12] M. Giesbrecht. Efficient parallel solution of sparse systems of linear diophantine equations. In Proceedings, PASCO '97, pages 1-10, 1997.
[13] M. Giesbrecht. Fast computation of the Smith form of a sparse integer matrix. Computational Complexity, 10(1):41-69, 2001.
[14] P. Giorgi, C.-P. Jeannerod, and G. Villard. On the complexity of polynomial matrix computations. In Rafael Sendra, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, Philadelphia, Pennsylvania, USA, pages 135-142. ACM Press, New York, August 2003.
[15] X. Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257-299, 1998.
[16] E. Kaltofen.
Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Mathematics of Computation, 64(210):777-806, April 1995.
[17] E. Kaltofen and B. D. Saunders. On Wiedemann's method of solving sparse linear systems. In Proc. AAECC-9, volume 539 of Springer Lecture Notes in Comp. Sci., pages 29-38, 1991.
[18] E. Kaltofen and G. Villard. On the complexity of computing determinants. Computational Complexity, 13(3-4):91-130, 2004.
[19] G. Labahn, D. K. Choi, and S. Cabay. The inverses of block Hankel and block Toeplitz matrices. SIAM J. Comput., 19(1):98-123, 1990.
[20] V. Y. Pan, B. Murphy, R. E. Rosholt, and X. Wang. Toeplitz and Hankel meet Hensel and Newton: Nearly optimal algorithms and their practical acceleration with saturated initialization. Technical report, The Graduate Center, CUNY, New York, 2004.
[21] J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. J. Assoc. Computing Machinery, 27:701-717, 1980.
[22] D. H. Wiedemann. Solving sparse linear equations over finite fields. IEEE Transactions on Information Theory, 32(1):54-62, January 1986.
[23] R. Zippel. Probabilistic algorithms for sparse polynomials. In Proc. EUROSAM '79, pages 216-226, Marseille, 1979.