Extended gcd algorithms




Extended gcd algorithms

George Havas, B.S. Majewski and K.R. Matthews

Abstract. Extended gcd calculation has a long history and plays an important role in computational number theory and linear algebra. Recent results have shown that finding optimal multipliers in extended gcd calculations is difficult. We study algorithms for finding good multipliers and present new algorithms with improved performance. We present a well-performing algorithm which is based on lattice basis reduction methods and may be formally analyzed. We also give a relatively fast algorithm with moderate performance.

1 Introduction

Euclid's algorithm for computing the greatest common divisor of two numbers is considered to be the oldest proper algorithm known ([12]). This algorithm can be amplified naturally in various ways. Thus, we can use it recursively to compute the gcd of more than two numbers. Also, we can do a constructive computation, the so-called extended gcd, which expresses the gcd as a linear combination of the input numbers. Extended gcd computation is of particular interest in number theory (see [4, chapters 2 and 3]) and in computational linear algebra ([8, 9, 11]), in

(Authors' addresses: Key Centre for Software Technology, Department of Computer Science, University of Queensland, Queensland 4072, Australia, email: havas@cs.uq.oz.au and bohdan@cs.uq.oz.au; Queensland Centre for Number Theory and Exact Matrix Computation, Department of Mathematics, University of Queensland, Queensland 4072, Australia, email: krm@maths.uq.oz.au.)

both of which it takes a basic role in fundamental algorithms. An overview of some of the earlier history of the extended gcd is given in [4], showing that it dates back to at least Euler, while additional references and more recent results appear in [15].

In view of many efforts to find good algorithms for extended gcd computation, Havas and Majewski ([15]) showed that this is genuinely difficult. They prove that there are a number of problems for which efficient solutions are not readily available, including: "Given a multiset of n numbers, how many of them do we need to take to obtain the gcd of all of them?"; and "How can we efficiently find good multipliers in an extended gcd computation?" They show that associated problems are NP-complete, and it follows that algorithms which guarantee optimal solutions are likely to require exponential time. We present new polynomial-time algorithms for solving the extended gcd problem. These outperform previous algorithms. We include some practical results and an indication of the theoretical analyses of the algorithms.

2 The Hermite normal form

As indicated in the introduction, extended gcd calculation plays an important role in computing canonical forms of matrices. On the other hand, we can use the Hermite normal form of a matrix as a tool to help us develop extended gcd algorithms. This relationship can be viewed along the following lines. An $m \times n$ integer matrix $B$ is said to be in Hermite normal form if:

(i) the first $r$ rows of $B$ are nonzero;

(ii) for $1 \le i \le r$, if $b_{ij_i}$ is the first nonzero entry in row $i$ of $B$, then $j_1 < j_2 < \cdots < j_r$;

(iii) $b_{ij_i} > 0$ for $1 \le i \le r$;

(iv) if $1 \le k < i \le r$, then $0 \le b_{kj_i} < b_{ij_i}$.

Let $A$ be an $m \times n$ integer matrix. Then there are various algorithms for finding a unimodular matrix $P$ such that $PA = B$ is in row Hermite normal form. These include those of Kannan and Bachem [19, pages 349-357]

and Havas and Majewski [9], which attempt to reduce coefficient explosion during their execution. Throughout this paper we use $\lceil x \rfloor$ to mean the nearest integer to $x$.

Let $C$ denote the submatrix of $B$ formed by the $r$ nonzero rows and write $P = \begin{bmatrix} Q \\ R \end{bmatrix}$, where $Q$ and $R$ have $r$ and $m - r$ rows, respectively. Then $QA = C$ and $RA = 0$, and the rows of $R$ form a $\mathbb{Z}$-basis for the sublattice $N(A)$ of $\mathbb{Z}^m$ formed by the vectors $X$ satisfying $XA = 0$.

Sims points out in his book [19, page 381] that the LLL lattice basis reduction algorithm ([13]) can be applied first to the rows of $R$ to find a short basis for $N(A)$; then, by applying step 1 of the LLL algorithm with no interchanges of rows, suitable multiples of the rows of $R$ can be added to those of $Q$, thereby reducing the entries in $Q$ to manageable size, without changing the matrix $B$. (Also see [6, page 144].)

Many variants and applications of the LLL algorithm exist. Here we simply observe that a further improvement to $P$ is usually obtained by the following version of Gaussian reduction (see [20] for the classical algorithm): for each row $Q_i$ of $Q$ and for each row $R_j$ of $R$, compute $t = |Q_i - \lceil r \rfloor R_j|^2$, where $r = (Q_i, R_j)/(R_j, R_j)$. If $t < |Q_i|^2$, replace $Q_i$ by $Q_i - \lceil r \rfloor R_j$. The process is repeated until no further shortening of row lengths occurs. Before this reduction is done, a similar improvement is made to the rows of $R$, with the extra condition that the rows are presorted according to increasing length before each iteration.

We have implemented algorithms along these lines and report that the results are excellent for Hermite normal form calculation in general. Details will appear elsewhere. Here we apply it to extended gcd calculation in the following way. Take $A$ to be a column vector of positive integers $d_1, \ldots, d_m$. Then $B$ is a column vector with one nonzero entry $d = \gcd(d_1, \ldots, d_m)$, $r = 1$, $N(A)$ is an $(m-1)$-dimensional lattice and $Q$ is a row vector of integers $x_1, \ldots, x_m$ which satisfy $d = x_1 d_1 + \cdots + x_m d_m$.
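Such a relation $d = x_1 d_1 + \cdots + x_m d_m$ can of course also be produced by the natural recursive use of Euclid's algorithm mentioned in the introduction. A minimal sketch (the helper names are ours, not the paper's), which illustrates the kind of needlessly large multipliers the lattice methods below improve on:

```python
def ext_gcd(a, b):
    """Classical extended Euclid: return (g, x, y) with g = gcd(a, b) = x*a + y*b."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = ext_gcd(b, a % b)
    # g = x*b + y*(a mod b) = y*a + (x - (a//b)*y)*b
    return (g, y, x - (a // b) * y)

def ext_gcd_many(ds):
    """Recursive extended gcd of several numbers: fold ext_gcd over the list,
    rescaling the multipliers accumulated so far at each step."""
    g, coeffs = ds[0], [1]
    for d in ds[1:]:
        g, x, y = ext_gcd(g, d)
        coeffs = [x * c for c in coeffs] + [y]
    return g, coeffs
```

Even for inputs as small as 6, 10, 15 this returns multipliers like (-14, 7, 1), although (1, 1, -1) also works; the gap grows rapidly with the size and number of inputs.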

Thus, any Hermite normal form algorithm which computes the transforming matrix incorporates an extended gcd algorithm. The first row of the transforming matrix is the multiplier vector. This immediately carries across the NP-completeness results for extended gcd computation to Hermite normal form computation in a natural way.

We expect to obtain a short multiplier vector for the following reasons. Using the notation of [6, page 142], we have
$$Q = Q^* + \sum_{j=1}^{m-1} \mu_{mj} R_j^*,$$
where $R_1^*, \ldots, R_{m-1}^*, Q^*$ form the Gram-Schmidt basis for $R_1, \ldots, R_{m-1}, Q$. The vector $A^t$ is orthogonal to $R_1, \ldots, R_{m-1}$ and we see
$$Q^* = \frac{dA^t}{|A|^2}.$$
After the reduction of $Q$, we have
$$Q = \frac{dA^t}{|A|^2} + \sum_{i=1}^{m-1} \lambda_i R_i^*, \qquad |\lambda_i| \le \frac{1}{2} \quad (i = 1, \ldots, m-1).$$
But $|R_i^*| \le |R_i|$ and in practice the vectors $R_1, \ldots, R_{m-1}$ are very short. The triangle inequality now implies
$$|Q| \le 1 + \frac{1}{2}\sum_{i=1}^{m-1} |R_i|.$$

While our method gives short multiplier vectors $Q$, it does not always produce the shortest vectors. These can be determined for small $m$ by using the Fincke-Pohst algorithm (see [16, page 191]) to find all solutions in integers $x_1, \ldots, x_{m-1}$ of the inequality
$$\left| Q - \sum_{i=1}^{m-1} x_i R_i \right|^2 \le |Q|^2.$$

Another approach to the extended gcd problem is to apply the LLL algorithm to the lattice $L$ spanned by the rows of the matrix

$$W = \begin{bmatrix} 1 & & & \gamma a_1 \\ & \ddots & & \vdots \\ & & 1 & \gamma a_m \end{bmatrix}.$$
If $\gamma$ is sufficiently large, the reduced basis will have the form
$$\begin{bmatrix} P_1 & 0 \\ P_2 & \pm\gamma d \end{bmatrix},$$
where $d = \gcd(a_1, \ldots, a_m)$, $P = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}$ is a unimodular matrix and $P_2$ is a multiplier vector which is small in practice. This is true for the following reason: $L$ consists of the vectors
$$(X, a) = (x_1, \ldots, x_m, \gamma(a_1 x_1 + \cdots + a_m x_m)),$$
where $x_1, \ldots, x_m \in \mathbb{Z}$. Hence $X \in N(A)$ if and only if $(X, 0) \in L$. Also, if $(X, a) \in L$ and $X$ does not belong to $N(A)$, then $a \ne 0$ and
$$|(X, a)|^2 \ge \gamma^2. \tag{1}$$
Now Proposition 1.12 of [13] implies that if $b_1, \ldots, b_m$ form a reduced basis for $L$, then
$$|b_j|^2 \le 2^{m-1} \max\{|X_1|^2, \ldots, |X_{m-1}|^2\} = M, \tag{2}$$
if $X_1, \ldots, X_{m-1}$ are linearly independent vectors in $N(A)$. Hence if $\gamma^2 > M$, it follows from inequalities (1) and (2) that the first $m - 1$ rows of a reduced basis for $L$ have the form $(b_{j1}, \ldots, b_{jm}, 0)$. The last vector of the reduced basis has the form $(b_{m1}, \ldots, b_{mm}, \pm\gamma g)$, and the equations
$$PA = \begin{bmatrix} 0 \\ \vdots \\ \pm g \end{bmatrix}, \qquad A = P^{-1}\begin{bmatrix} 0 \\ \vdots \\ \pm g \end{bmatrix}$$
imply $d \mid g$ and $g \mid d$, respectively.

We have implemented and tested all the methods described in this section, based on the calc platform developed by the third author.
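To illustrate this second approach, the following is a deliberately naive exact-arithmetic LLL (the Gram-Schmidt data is recomputed from scratch after every basis change) applied to the rows of a matrix of the shape of $W$. The function names and the concrete choice of $\gamma$ in the usage note are ours; this is a sketch, not the authors' implementation:

```python
from fractions import Fraction

def lll(rows, alpha=Fraction(3, 4)):
    """Plain LLL reduction over exact rationals (integer input rows)."""
    b = [[Fraction(x) for x in row] for row in rows]
    n = len(b)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def gram_schmidt():
        bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            v = list(b[i])
            for j in range(i):
                mu[i][j] = dot(b[i], bs[j]) / dot(bs[j], bs[j])
                v = [x - mu[i][j] * y for x, y in zip(v, bs[j])]
            bs.append(v)
        return bs, mu

    bs, mu = gram_schmidt()
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):       # size-reduce b_k against b_j
            if abs(mu[k][j]) * 2 > 1:
                q = round(mu[k][j])
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bs, mu = gram_schmidt()
        if dot(bs[k], bs[k]) >= (alpha - mu[k][k - 1] ** 2) * dot(bs[k - 1], bs[k - 1]):
            k += 1                           # Lovasz condition holds
        else:
            b[k], b[k - 1] = b[k - 1], b[k]  # swap and step back
            bs, mu = gram_schmidt()
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in b]

def gcd_multipliers(a, gamma):
    """Extended gcd via LLL on W: identity columns plus a gamma-scaled
    column holding the input numbers (the multiplier row ends up last)."""
    m = len(a)
    W = [[1 if i == j else 0 for j in range(m)] + [gamma * a[i]] for i in range(m)]
    last = lll(W)[-1]                        # (x_1, ..., x_m, +-gamma*d)
    d, x = last[-1] // gamma, last[:-1]
    if d < 0:
        d, x = -d, [-v for v in x]
    return d, x
```

For example, `gcd_multipliers([4, 6, 9], 10**6)` returns gcd 1 together with a short multiplier vector; $\gamma = 10^6$ comfortably satisfies $\gamma^2 > M$ for inputs of this size.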

3 A new LLL-based algorithm

The theoretical results of the previous section do indicate two specific LLL-based methods for computing small multipliers, but they have some drawbacks. In the first instance, we need to execute some algorithm that computes the matrix $P$, apply the LLL algorithm to the null space and, finally, carry out the size reduction step on $Q$. The second method requires executing the LLL algorithm on a matrix with numbers $\Omega(m)$ bits long, which incurs severe time penalties. In this section we show how to modify the algorithm of Lenstra, Lenstra and Lovász in such a way that it computes short multipliers without any preprocessing or rescaling. The modification is done along similar lines to those presented in [14], although the resulting algorithm is far simpler, due to the particular nature of our problem. What follows is a modification of the method as described in [5, pages 83-87] and we follow the notation used there.

Let $d_1, d_2, \ldots, d_m$ be the nonnegative (with zeros allowed) integers for which we want to solve the extended gcd problem. It is convenient to think of the matrix $b$ as initially set to be
$$b = \begin{bmatrix} 1 & & & d_1 \\ & \ddots & & \vdots \\ & & 1 & d_m \end{bmatrix}.$$
However, later we see that we do not actually include the last column in the matrix, but handle it separately. Further, for any two (rational) vectors $u = [u_i]_{i=1}^{l}$ and $v = [v_i]_{i=1}^{l}$, we define their weighted inner product to be
$$u \cdot v = \sum_{i=1}^{l-1} u_i v_i + \gamma^2 u_l v_l,$$
with $\gamma$ tending to infinity. This definition ensures that if either $u_l$ or $v_l$ is $0$, the inner product becomes the simple dot product of two vectors. However, if both $u_l$ and $v_l$ are nonzero, the preceding elements are insignificant and may be simply neglected. In the following text we refer to the norm defined by this inner product as the weighted norm.

DEFINITION 3.1 [5, Definition 2.6.1] With the weighted norm, the basis

$b_1, b_2, \ldots, b_m$ is called LLL-reduced if
$$|\mu_{i,j}| = \frac{|b_i \cdot b_j^*|}{b_j^* \cdot b_j^*} \le \frac{1}{2}, \quad \text{for } 1 \le j < i \le m, \tag{C1}$$
and
$$b_i^* \cdot b_i^* \ge (\alpha - \mu_{i,i-1}^2)\, b_{i-1}^* \cdot b_{i-1}^*, \tag{C2}$$
where $\frac{1}{4} < \alpha \le 1$. (Note the slight change in the lower bound for $\alpha$ from the standard LLL. To ensure proper gcd computation, $\alpha = 1/4$ is not allowed.)

Assume that vectors $b_1, b_2, \ldots, b_{k-1}$ are already LLL-reduced. We state, for future use, the following lemma.

LEMMA 3.1 If vectors $b_1, b_2, \ldots, b_{k-1}$ are LLL-reduced with respect to the weighted norm then the following are true: $b_{j,m+1} = 0$, for $1 \le j < k-1$; $b_{k-1,m+1} = \gcd(d_1, d_2, \ldots, d_{k-1})$; $b^*_{k-1,m+1} = b_{k-1,m+1}$.

Proof. The proof is a simple induction on conditions (C1) and (C2). The lemma is also implied by the argument from the end of Section 2.

The vector $b_k$ needs to be reduced so that $|\mu_{k,k-1}| \le \frac{1}{2}$. (Here we use a partial size reduction, sufficient to test condition (C2).) This is done by replacing $b_k$ by $b_k - \lceil \mu_{k,k-1} \rfloor b_{k-1}$. By the definition of $\mu_{k,k-1}$ we have
$$\mu_{k,k-1} = \begin{cases} b_{k,m+1}/b_{k-1,m+1} & \text{if } b_{k,m+1} \ne 0 \text{ and } b_{k-1,m+1} \ne 0; \\ 0 & \text{if } b_{k,m+1} = 0 \text{ and } b_{k-1,m+1} \ne 0; \\ \tilde\mu_{k,k-1} & \text{otherwise.} \end{cases}$$
Here $\tilde\mu_{i,j}$ is defined as $\tilde\mu_{i,j} = \left(\sum_{q=1}^{m} b_{i,q} b^*_{j,q}\right) / \left(\sum_{q=1}^{m} (b^*_{j,q})^2\right)$, i.e., $\tilde\mu_{i,j}$ is a Gram-Schmidt coefficient for $b$ with the last column ignored. In other words, the algorithm first tries to reduce $b_{k,m+1}$. If $|b_{k,m+1}| \le |b_{k-1,m+1}|/2$, no reduction occurs, unless both tail entries are zero. (In this case we have a reduction in the traditional LLL sense.) Observe that the first and second case in the definition of $\mu_{k,k-1}$ can be merged together. Thus we need only to test if

$b_{k-1,m+1} \ne 0$, or in general if $b_{j,m+1} \ne 0$, if size reduction is carried out on the $(j+1)$th row.

After the size reduction is done, we need to satisfy for that vector the so-called Lovász condition (C2). Consider (C2) when $\gamma$ tends to infinity. As in the case of partial size reduction we need to distinguish three possibilities, depending on whether the tail entries are or are not equal to zero. (One option, where both tail entries in the orthogonal projections of $b_k$ and $b_{k-1}$ are nonzero, as proved below, is impossible.) If $b^*_{k-1,m+1}$ and $b^*_{k,m+1}$ are zero, condition (C2) becomes the standard Lovász condition, and the rows remain unchanged if the $k$th row is $(\alpha - \mu^2_{k,k-1})$ times longer than the $(k-1)$th row. In the second instance, if $b^*_{k-1,m+1} = 0$ while $b^*_{k,m+1} \ne 0$, the $k$th vector, with respect to the weighted norm, is infinitely longer than the preceding one, hence no exchange will occur. Assume now that $b^*_{k-1,m+1} \ne 0$ and $b_{k,m+1} \ne 0$. The orthogonal projection of $b_k$ is computed as
$$b_k^* = b_k - \sum_{i=1}^{k-2} \mu_{k,i} b_i^* - \mu_{k,k-1} b_{k-1}^*.$$
By Lemma 3.1, we know that if $b^*_{k-1,m+1} \ne 0$ then it is the only such vector, i.e., $b^*_{i,m+1} = 0$, for $i = 1, \ldots, k-2$. By the above presented argument $\mu_{k,k-1} = b_{k,m+1}/b_{k-1,m+1}$ and hence $b^*_{k,m+1} = 0$. Consequently, the length of $b_k^*$ is infinitely shorter than the length of $b_{k-1}^*$, condition (C2) is not satisfied, and rows $k$ and $k-1$ are exchanged.

Once condition (C2) is satisfied for some $k > 1$, we need to complete the size reduction phase for the $k$th vector, so that $|\mu_{k,j}| \le \frac{1}{2}$ for all $j < k$. This is done along analogous lines as described for the partial size reduction. To speed up the algorithm, we may use the observation that it always starts by computing the greatest common divisor of $d_1$ and $d_2$ and a set of minimal multipliers. Thus we may initially compute $g_2 = \gcd(d_1, d_2)$ and $x_1 d_1 + x_2 d_2 = g_2$, with $x_1$ and $x_2$ being a definitely least solution (cf. [15]). Also, because of the above discussed nature of the weighted norm, we need to compute Gram-Schmidt coefficients only for the $m \times m$ submatrix of $b$, with the last column ignored.

The complete algorithm is specified in Fig. 1, with some auxiliary procedures given in Fig. 2. We use a natural pseudocode which is consistent with the GAP language ([18]), which provided our first development environment. Analogous algorithms have also been implemented in Magma ([2]).

    if m = 1 then return [1]; fi;
    (g2, x1, x2) := ExtendedGcd(d1, d2);
    if m = 2 then return [x1, x2]; fi;
    b := [ d2/g2  -d1/g2  0 ... 0 ]
         [ x1      x2     0 ... 0 ]
         [                1       ]
         [                  ...   ]
         [                      1 ]
    (the last column, handled separately, becomes d2 := g2; d1 := 0);
    kmax := 2;  k := kmax + 1;
    mu[2,1] := (b1 . b1*)/(b1* . b1*);  b1* := b1;  b2* := b2 - mu[2,1] b1*;
    repeat
        if k > kmax then
            kmax := k;
            mu[k,j] := (bk . bj*)/(bj* . bj*), for j = 1, 2, ..., k-1;
            bk* := bk - sum_{j=1}^{k-1} mu[k,j] bj*;
        fi;
        do
            Reduce(k, k-1);
            exit if |d[k-1]| < |d[k]| or
                ( |d[k-1]| = |d[k]| and bk* . bk* >= (alpha - mu[k,k-1]^2) b(k-1)* . b(k-1)* );
            Swap(k);  k := max(2, k-1);
        od;
        for i in {k-2, k-3, ..., 1} do Reduce(k, i); od;
        k := k + 1;
    until k > m;
    if d[m] > 0 then return b[m]; else return -b[m]; fi;

Figure 1: The LLL based gcd algorithm

    procedure Reduce(k, i)                    procedure Swap(k)
        if d[i] <> 0 then                         exchange d[k] <-> d[k-1]; b[k] <-> b[k-1];
            q := round(d[k]/d[i]);                for j in {1 .. k-2} do
        else                                          exchange mu[k,j] <-> mu[k-1,j];
            q := round(mu[k,i]);                  od;
        fi;                                       mu := mu[k,k-1];
        if q <> 0 then                            B := bk* . bk* + mu^2 (b(k-1)* . b(k-1)*);
            b[k] := b[k] - q b[i];                b := bk*;
            d[k] := d[k] - q d[i];                bk* := ((bk* . bk*)/B) b(k-1)*
            mu[k,i] := mu[k,i] - q;                       - mu ((b(k-1)* . b(k-1)*)/B) b;
            for j in {1 .. i-1} do                mu[k,k-1] := mu (b(k-1)* . b(k-1)*)/B;
                mu[k,j] := mu[k,j] - q mu[i,j];   b(k-1)* := b + mu b(k-1)*;
            od;                                   for i in {k+1 .. kmax} do
        fi;                                           nu := mu[i,k];
    end;                                              mu[i,k] := mu[i,k-1] - mu nu;
                                                      mu[i,k-1] := nu + mu[k,k-1] mu[i,k];
                                                  od;
                                              end;

Figure 2: Algorithms Reduce and Swap

Experimental evidence suggests that sorting the sequence $(d_i)_{i=1}^{m}$ in increasing order results in the fastest algorithm and tends to give better multipliers than for unsorted or reverse sorted orderings. The theoretical analysis of the LLL algorithm with $\alpha = 3/4$ leads to an $O(m^4 \log \max_j d_j)$ complexity bound for the time taken by this algorithm to perform the extended gcd calculation. We can also obtain bounds on the quality of the multipliers, but these bounds seem unduly pessimistic in comparison with practical results, as is observed in other LLL applications (cf. [17]).
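The limit behaviour of the weighted inner product used in this section is easy to prototype by carrying the coefficient of $\gamma^2$ separately, instead of picking one concrete large $\gamma$; a minimal sketch (the pair representation is ours):

```python
def w_inner(u, v):
    """Weighted inner product u.v = sum_{i<l} u_i v_i + gamma^2 u_l v_l,
    with gamma symbolic: return (finite part, coefficient of gamma^2)."""
    return (sum(a * b for a, b in zip(u[:-1], v[:-1])), u[-1] * v[-1])

def w_shorter(u, v):
    """Is |u| < |v| in the weighted norm as gamma -> infinity?
    The gamma^2 coefficient dominates; ties fall back to the finite part."""
    pu, pv = w_inner(u, u), w_inner(v, v)
    return (pu[1], pu[0]) < (pv[1], pv[0])
```

Any vector whose tail entry is zero is shorter than any vector with a nonzero tail, regardless of the finite parts; this is exactly the property the modified exchange condition exploits.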

4 A sorting gcd approach

The presented LLL variation is very successful in obtaining small multipliers. However, the quartic complexity (in terms of the number of input numbers) of such an approach may be unjustified for those applications that can accept somewhat worse solutions. This is one of the reasons why we have developed a faster heuristic procedure, which we call a sorting gcd algorithm. (In fact it is quite similar to a method due to Brun [4, section 3H].) The algorithm starts by setting up an array $b$, similar to the array $b$ created in the previous section:
$$b = \begin{bmatrix} d_1 & 1 & & \\ \vdots & & \ddots & \\ d_m & & & 1 \end{bmatrix}.$$
At each step the algorithm selects two rows, $i$ and $j$, and subtracts row $i$ from row $j$ some $q$ times. Naturally, in the pessimistic case
$$\max_k |b_{j,k} - q\, b_{i,k}| = \max_k |b_{j,k}| + q \max_k |b_{i,k}|.$$
Once rows $i$ and $j$ are selected, we cannot change any individual element of either row $i$ or $j$. Thus, in order to minimize the damage done by a single row operation, we need to minimize $q$. In the standard Euclidean algorithm, $q$ is always selected as large as possible. Artificially minimizing $q$ by (say) selecting $q$ to be some predefined constant may cause unacceptably long running times for the algorithm. However, by a clever choice of $i$ and $j$, which minimizes $q = \lfloor b_{j,1}/b_{i,1} \rfloor$, we obtain a polynomial time algorithm which is quite successful in yielding good solutions to the extended gcd problem. Clearly, to minimize $\lfloor b_{j,1}/b_{i,1} \rfloor$, we need to select $i$ so that $b_{i,1}$ is the largest number not exceeding $b_{j,1}$. This is the central idea of the sorting gcd algorithm.

In order to make the algorithm precise, we decide that the row with the largest leading element is always selected as the $j$th row. Throughout the algorithm we operate on positive numbers only, i.e., we assume $b_{i,1} \ge 0$, for $i = 1, 2, \ldots, m$, and for two rows $i$ and $j$ we compute the largest $q$ such that $0 \le b_{j,1} - q\, b_{i,1} \le b_{i,1} - 1$. Furthermore, we observe that we can obtain additional benefits if we spread the multiplier $q$ over several rows. This may be done whenever there is more than one row with the same leading entry as row $i$. Also in such cases, the decision as to which of the identical rows to use first should be based on some norm of these rows. The distribution of the multiplier $q$ should then be biased towards rows with smaller norm. (A discussion of relevant norms for this and related contexts appears in [7].)

An efficient implementation of the method may use a binary heap, with each row represented by a key comprising the leading element and the norm of that row with the leading element deleted, i.e., $\left(b_{i,1}, \|(b_{i,2}, b_{i,3}, \ldots, b_{i,m+1})\|\right)$, plus an index to it. Retrieving rows $i$ and $j$ then takes at most $O(\log m)$ time, as does reinserting them back into the heap (which occurs only if the leading element is not zero). Row $j$ cannot be operated upon more than $\log d_j$ times and hence the complexity of the retrieval and insertion operations for all $m$ rows is $O(m \log m \log \max_j d_j)$. On top of that we have the cost of subtracting two rows, repeated at most $\log d_j$ times for each of the $m$ rows. Thus the overall complexity of the algorithm is $O(m^2 \log \max_j d_j)$. Subtle changes, like spreading the multiplier $q$ across several rows with identical leading entry, do not incur any additional time penalties in the asymptotic sense.

The following lemma gives us a rough estimate of the values that $q$ may take. Most of the time $q = O(1)$ if $m \ge 3$.

LEMMA 4.1 Assume we are given a positive integer $d_1$. The average value of $q = \lfloor d_1 / \max_{i \ge 2} d_i \rfloor$, where $d_2, \ldots, d_m$ is a sequence of $m - 1$ randomly generated positive integers less than $d_1$, is
$$\bar q = \begin{cases} \Psi(d_1 + 2) + \gamma - 1 & \text{for } m = 2, \\[4pt] \zeta(m-1) + O\!\left(\dfrac{1}{(m-2)\, d_1^{m-2}}\right) & \text{for } m \ge 3, \end{cases}$$
where $\Psi(x) = \ln(x) - \frac{1}{2x} + O(x^{-2})$ (here $\gamma$ is Euler's constant) and $\zeta(k) = \sum_{i=1}^{\infty} i^{-k}$.

Outline of Proof. For the expression $\lfloor d_1 / \max_i d_i \rfloor$ to be equal to some $q \in \{1, \ldots, d_1\}$, we must have at least one $d_i$ falling into the interval $(d_1/(q+1), d_1/q]$ and no $d_i$ can be larger than $d_1/q$. Hence the probability of obtaining such a quotient $q$ is equal to $(1/q)^{m-1} - (1/(q+1))^{m-1}$. The average value of $q$ is obtained by calculating $\sum_{q=1}^{d_1} q\left[(1/q)^{m-1} - (1/(q+1))^{m-1}\right]$.
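A minimal heap-based sketch of the sorting gcd follows (without the refinement of spreading $q$ over rows with equal leading entries, and keying only on the leading element; all names are ours):

```python
import heapq

def sorting_gcd(d):
    """Sorting gcd sketch for positive integers d: repeatedly subtract
    q = floor(lead_j / lead_i) times the row with the next-largest leading
    entry from the row with the largest one.  Returns (gcd, multipliers)."""
    m = len(d)
    # Each heap entry is (-leading entry, [leading entry, e_1, ..., e_m]),
    # negated so the largest leading entry is popped first.
    heap = [(-d[i], [d[i]] + [1 if j == i else 0 for j in range(m)])
            for i in range(m)]
    heapq.heapify(heap)
    while len(heap) > 1:
        _, rj = heapq.heappop(heap)        # row with largest leading entry
        _, ri = heapq.heappop(heap)        # row with next largest
        q = rj[0] // ri[0]
        rj = [a - q * b for a, b in zip(rj, ri)]
        heapq.heappush(heap, (-ri[0], ri))
        if rj[0] > 0:                      # rows with leading entry 0 are dropped
            heapq.heappush(heap, (-rj[0], rj))
    row = heap[0][1]
    return row[0], row[1:]
```

For example, `sorting_gcd([6, 10, 15])` returns gcd 1 with the short multiplier vector (1, 1, -1).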

5 Some examples

We have applied the methods described here to numerous examples, all with excellent performance. Note that there are many papers over the years which study explicit input sets, and quite a number of these are listed in the references of [4] and [15]. We illustrate algorithm performance with a small selection of examples. Note also that there are many parameters which can affect the performance of LLL lattice basis reduction algorithms (also observed by many others, including [17]). First and foremost is the value of $\alpha$. Smaller values of $\alpha$ tend to give faster execution times but worse multipliers; however, this is by no means uniform. Also, the order of input may have an effect, as mentioned before.

(a) Take $d_1, d_2, d_3, d_4$ to be 11685838, 18181878, 314252913, 134684. Following the first method described in some detail, we see that the Kannan-Bachem algorithm gives a unimodular matrix $P$ satisfying $PA = [1, 0, 0, 0]^t$:
$$P = \begin{bmatrix} 225128444972656 & 144322692479928 & 1 & \cdot \\ 452568913779963 & 288645385878369 & 2 & \cdot \\ 74123999242 & 475185352446 & 1 & \cdot \\ 954939 & 5842919 & \cdot & \cdot \end{bmatrix}$$
Applying LLL with $\alpha = 3/4$ to the last three rows of $P$ gives
$$\begin{bmatrix} 225128444972656 & 144322692479928 & 1 & \cdot \\ 13 & 146 & 58 & 362 \\ 63 & 13 & 22 & 144 \\ 15 & 128 & 678 & 381 \end{bmatrix}$$
Continuing the reduction with no row interchanges shortens row 1:
$$\begin{bmatrix} 88 & 352 & 167 & 11 \\ 13 & 146 & 58 & 362 \\ 63 & 13 & 22 & 144 \\ 15 & 128 & 678 & 381 \end{bmatrix}$$
The multiplier vector 88, 352, 167, 11 is the unique multiplier vector of least length, as is shown by the Fincke-Pohst method. In fact LLL-based methods give this optimal multiplier vector for all $\alpha \in [1/4, 1]$.
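The "reduction with no row interchanges" step is the Gaussian reduction of section 2: subtract rounded projections onto the null-space rows whenever this shortens the multiplier row. A sketch in exact integer arithmetic (function names ours):

```python
def nearest(p, q):
    """Nearest integer to p/q for q > 0, halves rounded up."""
    return (2 * p + q) // (2 * q)

def size_reduce(Q, R):
    """Gaussian reduction of the multiplier rows Q against null-space rows R:
    replace Q_i by Q_i - round(r) R_j, with r = (Q_i . R_j)/(R_j . R_j),
    whenever that shortens Q_i; repeat until nothing shortens."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    changed = True
    while changed:
        changed = False
        for i, Qi in enumerate(Q):
            for Rj in R:
                q = nearest(dot(Qi, Rj), dot(Rj, Rj))
                cand = [a - q * b for a, b in zip(Qi, Rj)]
                if dot(cand, cand) < dot(Qi, Qi):
                    Q[i] = Qi = cand
                    changed = True
    return Q
```

For instance, reducing the single multiplier row (10, 10) against the null-space row (3, 0) yields (1, 10).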

Earlier algorithms which aim to improve on the multipliers do not fare particularly well. Blankinship's algorithm ([1]) gives the multiplier vector 0, 35543971456, 1, 62136727713712. The algorithm due to Bradley ([3]) gives 27237259, 1746943, 1, 0. This shows that Bradley's definition of minimal is not useful. The gcd tree algorithm of [15] meets its theoretical guarantees and gives 0, 0, 6177777, 1876366. (This algorithm was not designed for practical use.) The sorting gcd algorithm using the Euclidean norm gives 5412, 3428, 881, 2632, while it gives 3381, 49813, 27569, 3469 with the max norm. Another sorting variant gives 485, 272, 279, 1728. All of these are reasonable results in view of the greater speed.

There are a number of interesting comparisons and conclusions to be drawn here. First, the multipliers revealed by the first row of $P$ are the same as those which are obtained if we use the natural recursive approach to extended gcd calculation. Viewed as an extended gcd algorithm, the Kannan-Bachem algorithm is a variant of the recursive gcd, and the multipliers it produces are quite poor compared to the optimal multipliers. This indicates that there is substantial scope for improving on the Kannan-Bachem method, not only for extended gcd calculation but also for Hermite normal form calculation in general. Better polynomial time algorithms for both of these computations are a consequence of these observations. Thus, the gcd-driven Hermite normal form algorithms described in [9] give similar multipliers to those produced by the sorting gcd algorithm on which they are based.

(b) Take $d_1, \ldots, d_{10}$ to be 763836, 166557, 113192, 178512, 1476, 377752, 114793, 3126753, 1997137, 26318. The LLL-based method of section 3 gives the following multiplier vectors for various values of $\alpha$. We also give the length squared for each vector.

    alpha   multiplier vector x              |x|^2
    1/4     7, 1, 5, 1, 1, 0, 4, 0, 0, 0      93
    1/3     1, 0, 6, 1, 1, 1, 0, 2, 3, 0      53
    1/2     3, 0, 3, 0, 1, 1, 0, 1, 4, 2      41
    2/3     1, 3, 2, 1, 5, 0, 1, 1, 2, 1      47
    3/4     1, 3, 2, 1, 5, 0, 1, 1, 2, 1      47
    1       1, 0, 1, 3, 1, 3, 3, 2, 2, 2      42

The Fincke-Pohst algorithm reveals that the unique shortest solution is 3, 1, 1, 2, 1, 2, 2, 2, 2, 2, which has length squared 36. The sorting gcd algorithm using the Euclidean norm gives 2, 3, 6, 2, 9, 0, 2, 0, 2, 6 with length squared 178, while with the max norm it gives 9, 5, 2, 7, 2, 3, 0, 3, 1, 5, which has length squared 207. The recursive gcd gives multipliers 193673223, 138729291, 1, 0, 0, 0, 0, 0, 0, 0. The Kannan-Bachem algorithm gives 445376559, 31896527153, 0, 0, 0, 0, 0, 0, 0, 1. Blankinship's algorithm gives the multiplier vector 3485238369, 1, 23518892995, 0, 0, 0, 0, 0, 0, 0, while Bradley's algorithm gives 135282, 96885, 1, 0, 0, 0, 0, 0, 0, 0.

(c) The following example has theoretical significance. Take $d_1, \ldots, d_m$ to be the Fibonacci numbers

(i) $F_n, F_{n+1}, \ldots, F_{2n}$, $n$ odd, $n \ge 5$;

(ii) $F_n, F_{n+1}, \ldots, F_{2n-1}$, $n$ even, $n \ge 4$.

Using the identity $F_m L_n = F_{m+n} + (-1)^n F_{m-n}$, it can be shown that the following are multipliers:

(i) $-L_{n-3}, L_{n-4}, \ldots, -L_2, L_1, -1, 1, 0, \ldots, 0$, $n$ odd;

(ii) $L_{n-3}, -L_{n-4}, \ldots, -L_2, L_1 + 1, -1, 0, \ldots, 0$, $n$ even,

where $L_1, L_2, \ldots$ denote the Lucas numbers $1, 3, 4, 7, \ldots$. In fact we have the identities:

(Identities (3), for $n$ odd and $n \ge 5$, and (4), for $n$ even and $n \ge 4$, display square integer matrices whose first row is the multiplier vector above and whose remaining rows are built from entries $1$ and the Lucas numbers $L_{n-1}, L_{n-2}, \ldots, L_2, L_1$; these matrices map the column vector $(F_n, F_{n+1}, \ldots, F_{2n})^t$, respectively $(F_n, F_{n+1}, \ldots, F_{2n-1})^t$, to $(1, 0, \ldots, 0)^t$. The square matrices are unimodular, as their inverses are integer matrices whose entries are given explicitly by Fibonacci numbers.)

It is not difficult to prove that the multipliers given here are the unique vectors of least length (by completing the appropriate squares and using various Fibonacci and Lucas number identities; see [10]). The length squared of the multipliers is $L_{2n-5} + 1$ in both cases. (In practice the LLL-based algorithms compute these minimal multipliers.)
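The identity underlying these multipliers, and the fact that the inputs of example (c) have gcd 1, are easy to check numerically; a quick sketch (the conventions $F_0 = 0$, $F_1 = 1$, $L_0 = 2$, $L_1 = 1$ are assumed):

```python
from math import gcd
from functools import reduce

def fib_lucas(n_max):
    """Fibonacci and Lucas numbers up to index n_max."""
    F, L = [0, 1], [2, 1]
    for _ in range(n_max - 1):
        F.append(F[-1] + F[-2])
        L.append(L[-1] + L[-2])
    return F, L

F, L = fib_lucas(40)

# The identity used above: F_m L_n = F_{m+n} + (-1)^n F_{m-n}, for m >= n >= 1.
for m in range(1, 20):
    for n in range(1, m + 1):
        assert F[m] * L[n] == F[m + n] + (-1) ** n * F[m - n]

# Consecutive Fibonacci numbers are coprime, so the inputs of example (c)
# have gcd 1; e.g. for n = 7 (inputs F_7, ..., F_14):
assert reduce(gcd, F[7:15]) == 1
```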

These results give lower bounds for extended gcd multipliers in terms of Euclidean norms. Since, with $\phi = \frac{1+\sqrt 5}{2}$,
$$L_{2n-5} + 1 > \phi^{2n-5} = \phi^{-5}\,\phi^{2n} > \phi^{-5}\sqrt{5}\, F_{2n},$$
it follows that a general lower bound for the Euclidean norm of the multiplier vector in terms of the initial numbers $d_i$ must be at least $O\!\left(\sqrt{\max\{d_i\}}\right)$. Also, the length of the vector $(F_n, F_{n+1}, \ldots, F_{2n})$ is of the same order of magnitude as $F_{2n}$, so a general lower bound for the length of the multipliers in terms of the Euclidean length of the input, $l$ say, is $O(\sqrt{l})$.

6 Conclusions

We have described new algorithms for extended gcd calculation which provide good multipliers. We have given examples of their performance and indicated how the algorithms may be fully analyzed. Related algorithms which compute canonical forms of matrices will be the subject of another paper.

Acknowledgments

The first two authors were supported by the Australian Research Council.

References

[1] W.A. Blankinship, A new version of the Euclidean algorithm, Amer. Math. Monthly, 70:742-745, 1963.

[2] W. Bosma and J. Cannon, Handbook of Magma functions, Department of Pure Mathematics, Sydney University, 1993.

[3] G.H. Bradley, Algorithm and bound for the greatest common divisor of n integers, Communications of the ACM, 13:433-436, 1970.

[4] A.J. Brentjes, Multi-dimensional continued fraction algorithms, Mathematisch Centrum, Amsterdam, 1981.

[5] H. Cohen, A course in computational algebraic number theory, Springer-Verlag, 1993.

[6] M. Grötschel, L. Lovász and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer-Verlag, Berlin, 1988.

[7] G. Havas, D.F. Holt and S. Rees, Recognizing badly presented Z-modules, Linear Algebra and its Applications, 192:137-163, 1993.

[8] G. Havas and B.S. Majewski, Integer matrix diagonalization, Technical Report TR277, The University of Queensland, Brisbane, 1993.

[9] G. Havas and B.S. Majewski, Hermite normal form computation for integer matrices, Congressus Numerantium, 1994.

[10] V.E. Hoggatt Jr, Fibonacci and Lucas Numbers, Houghton Mifflin Company, Boston, 1969.

[11] C.S. Iliopoulos, Worst-case complexity bounds on algorithms for computing the canonical structure of finite abelian groups and the Hermite and Smith normal forms of an integer matrix, SIAM J. Computing, 18:658-669, 1989.

[12] D.E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, Addison-Wesley, Reading, Mass., 2nd edition, 1973.

[13] A.K. Lenstra, H.W. Lenstra Jr. and L. Lovász, Factoring polynomials with rational coefficients, Math. Ann., 261:515-534, 1982.

[14] L. Lovász and H.E. Scarf, The generalized basis reduction algorithm, Mathematics of Operations Research, 17:751-764, 1992.

[15] B.S. Majewski and G. Havas, The complexity of greatest common divisor computations, Proceedings First Algorithmic Number Theory Symposium, Lecture Notes in Computer Science 877, 184-193, 1994.

[16] M. Pohst and H. Zassenhaus, Algorithmic Algebraic Number Theory, Cambridge University Press, 1989.

[17] C.P. Schnorr and M. Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems, Lecture Notes in Computer Science 529, 68-85, 1991.

[18] M. Schönert et al., GAP: Groups, Algorithms and Programming, Lehrstuhl D für Mathematik, RWTH Aachen, 1994.

[19] C.C. Sims, Computing with finitely presented groups, Cambridge University Press, 1994.

[20] B. Vallée, A central problem in the algorithmic geometry of numbers: lattice reduction, CWI Quarterly, 3:95-120, 1990.