Faster polynomial multiplication via multipoint Kronecker substitution

Similar documents

Faster deterministic integer factorisation

A simple and fast algorithm for computing exponentials of power series

Factoring Polynomials

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Application. Outline. 3-1 Polynomial Functions 3-2 Finding Rational Zeros of. Polynomial. 3-3 Approximating Real Zeros of.

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, Notes on Algebra

Prime Numbers and Irreducible Polynomials

3.2 The Factor Theorem and The Remainder Theorem

FACTORING SPARSE POLYNOMIALS

HYPERELLIPTIC CURVE METHOD FOR FACTORING INTEGERS. 1. Thoery and Algorithm

it is easy to see that α = a

The finite field with 2 elements The simplest finite field is

The Piranha computer algebra system. introduction and implementation details

Mathematics of Internet Security. Keeping Eve The Eavesdropper Away From Your Credit Card Information

Integer Factorization using the Quadratic Sieve

Integrals of Rational Functions

Computer and Network Security

Some facts about polynomials modulo m (Full proof of the Fingerprinting Theorem)

CHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY

H/wk 13, Solutions to selected problems

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any.

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the school year.

POLYNOMIAL FUNCTIONS

2.3. Finding polynomial functions. An Introduction:

EMBEDDING DEGREE OF HYPERELLIPTIC CURVES WITH COMPLEX MULTIPLICATION

Outline. Computer Science 418. Digital Signatures: Observations. Digital Signatures: Definition. Definition 1 (Digital signature) Digital Signatures

Copyrighted Material. Chapter 1 DEGREE OF A CURVE

Lecture 6: Finite Fields (PART 3) PART 3: Polynomial Arithmetic. Theoretical Underpinnings of Modern Cryptography

Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.

Modern Algebra Lecture Notes: Rings and fields set 4 (Revision 2)

Zeros of Polynomial Functions

Basic Algorithms In Computer Algebra

The Mathematics of the RSA Public-Key Cryptosystem

Unique Factorization

JUST THE MATHS UNIT NUMBER 1.8. ALGEBRA 8 (Polynomials) A.J.Hobson

Arithmetic algorithms for cryptology 5 October 2015, Paris. Sieves. Razvan Barbulescu CNRS and IMJ-PRG. R. Barbulescu Sieves 0 / 28

minimal polyonomial Example

Algebraic Computation Models. Algebraic Computation Models

SOLVING POLYNOMIAL EQUATIONS

a 1 x + a 0 =0. (3) ax 2 + bx + c =0. (4)

Polynomials and Factoring

Integer roots of quadratic and cubic polynomials with integer coefficients

Determining the Optimal Combination of Trial Division and Fermat s Factorization Method

Discrete Mathematics: Homework 7 solution. Due:

Zeros of a Polynomial Function

Chapter 10: Network Flow Programming

9. POLYNOMIALS. Example 1: The expression a(x) = x 3 4x 2 + 7x 11 is a polynomial in x. The coefficients of a(x) are the numbers 1, 4, 7, 11.

I. Introduction. MPRI Cours Lecture IV: Integer factorization. What is the factorization of a random number? II. Smoothness testing. F.

3 1. Note that all cubes solve it; therefore, there are no more

Table of Contents. Bibliografische Informationen digitalisiert durch

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and

High-Performance Modular Multiplication on the Cell Processor

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS

Lecture 13 - Basic Number Theory.

The mathematics of RAID-6

Module MA3411: Abstract Algebra Galois Theory Appendix Michaelmas Term 2013

Basics of Polynomial Theory

Zeros of Polynomial Functions

Algebraic and Transcendental Numbers

Quotient Rings and Field Extensions

FFT Algorithms. Chapter 6. Contents 6.1

Runtime and Implementation of Factoring Algorithms: A Comparison

Improved Single and Multiple Approximate String Matching

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include

Library (versus Language) Based Parallelism in Factoring: Experiments in MPI. Dr. Michael Alexander Dr. Sonja Sewera.

Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks

Introduction to Finite Fields (cont.)

Partial Fractions. Combining fractions over a common denominator is a familiar operation from algebra:

Determinants can be used to solve a linear system of equations using Cramer s Rule.

2.5 Zeros of a Polynomial Functions

11 Multivariate Polynomials

Math 4310 Handout - Quotient Vector Spaces

CRYPTOG NETWORK SECURITY

3.3. Solving Polynomial Equations. Introduction. Prerequisites. Learning Outcomes

Trigonometric Functions and Equations

The Factor Theorem and a corollary of the Fundamental Theorem of Algebra

OSTROWSKI FOR NUMBER FIELDS

Zeros of Polynomial Functions

MOP 2007 Black Group Integer Polynomials Yufei Zhao. Integer Polynomials. June 29, 2007 Yufei Zhao

Chapter 4. Polynomial and Rational Functions. 4.1 Polynomial Functions and Their Graphs

Modélisation et résolutions numérique et symbolique

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

Elliptic Curve Cryptography

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA

2.4 Real Zeros of Polynomial Functions

Polynomials and Factoring

Monogenic Fields and Power Bases Michael Decker 12/07/07

Lecture 5 Rational functions and partial fraction expansion

Cyclotomic Extensions

Transcription:

Faster polynomial multiplication via multipoint Kronecker substitution 5th February 2009

Kronecker substitution KS = an algorithm for multiplying polynomials in Z[x]. Example: f = 41x 3 + 49x 2 + 38x + 29, g = 19x 3 + 23x 2 + 46x + 21. To find h = fg, evaluate f (10 4 ) = 41004900380029, g(10 4 ) = 19002300460021 by packing coefficients together. Then h(10 4 ) = f (10 4 )g(10 4 ) = 779187437354540344421320609. Coefficients of f and g are < 50, so coefficients of h are < 4 50 2 = 10 4. Can unpack h(10 4 ) to obtain h = 779x 6 + 1874x 5 + 3735x 4 + 4540x 3 + 3444x 2 + 2132x + 609.

Kronecker substitution Notes: Same idea reduces multiplication in R[x, y] to multiplication in R[x] for any ring R, via y x n for large enough n (Kronecker 1882). Application to arithmetic in Z[x] suggested by Schönhage (1982). On real hardware, use a power of 2, not 10. We assume packing and unpacking is linear time.

Kronecker substitution Advantages of KS over direct multiplication algorithms: If coefficients are small relative to machine word size, makes more efficient use of hardware multiply instruction. Places burden of optimisation on existing libraries like GMP (GNU Multiple Precision Arithmetic Library) already obscenely optimised for huge variety of platforms. Examples of real implementations: The Magma computer algebra package (popular in number theory and arithmetic geometry) uses KS for arithmetic in Z[x] and (Z/nZ)[x] when coefficients are small. NTL uses KS to reduce arithmetic in GF(p n )[x] to arithmetic in GF(p)[x].

Kronecker substitution What is the running time? Suppose f, g have (non-negative) coefficients with c bits. Suppose len f = len g = n (i.e. they have degree n 1). Coefficients of h = fg are bounded by 2 2c n, so suffices to evaluate at 2 b where b = 2c + log 2 n. Running time is therefore M(nb) + O(nb), where M(k) = time to multiply k-bit integers, O(nb) is the linear-time packing/unpacking cost.

KS2 algorithm Idea: evaluate at several (carefully selected!) points, thereby reducing to several smaller integer multiplications. Example (in base 10): f = 41x 3 + 49x 2 + 38x + 29, g = 19x 3 + 23x 2 + 46x + 21. Then f (10 2 ) = 41493829, g(10 2 ) = 19234621, f ( 10 2 ) = 40513771, g( 10 2 ) = 18774579. Packed with alternating signs still linear time. Two half-sized integer multiplications: h(10 2 ) = f (10 2 )g(10 2 ) = 798118074653809, h( 10 2 ) = f ( 10 2 )g( 10 2 ) = 760628994227409.

KS2 algorithm Problem: coefficients of h overlap, in both h(10 2 ) and h( 10 2 ): 779373534440609 187445402132 + 798118074653809 = h(10 2 ) 779373534440609 187445402132 760628994227409 = h( 10 2 ) Solution: if h(x) = h 0 (x 2 ) + xh 1 (x 2 ), then h 0 (10 4 ) = 1 2 (h(102 ) + h( 10 2 )) = 779373534440609 10 2 h 1 (10 4 ) = 1 2 (h(102 ) h( 10 2 )) = 18744540213200 Unpacking is still linear time.

KS2 algorithm What s the point? Running time is about 2M(nb/2) + O(nb), compared to M(nb) + O(nb) for standard KS. Assume that M(k) = O(k α ). Then M(nb) 2M(nb/2) = (nb)α 2(nb/2) α = 2α 1. For classical multiplication, α = 2. Expect 2x speedup. For Karatsuba multiplication, α 1.58. Expect 1.5x speedup. In practice, the linear terms get in the way!! For FFT multiplication, M(k) k log k. No constant speedup expected; but perhaps some savings from better memory locality.

KS2 vs KS1 (64-bit, Core 2 Duo) 2 running time of KS2 as proportion of KS1 1.5 1 0.5 5-bit coefficients 10-bit coefficients 20-bit coefficients 0 1 10 100 1000 10000 100000 Polynomial length

KS3 algorithm Another set of points to try... f = 41x 3 + 49x 2 + 38x + 29, g = 19x 3 + 23x 2 + 46x + 21. Then f (10 2 ) = 41493829, g(10 2 ) = 19234621, 10 6 f (10 2 ) = 29384941, 10 6 g(10 2 ) = 21462319. Packed in reversed order still linear time. Two half-sized integer multiplications: h(10 2 ) = f (10 2 )g(10 2 ) = 798118074653809, 10 12 h(10 2 ) = 10 6 f (10 2 )10 6 g(10 2 ) = 630668977538179.

KS3 algorithm 0779 1874 3735 4540 3444 2132 0609 + 798118074653809 = h(10 2 ) 0609 2132 3444 4540 3735 1874 0779 + 630668977538179 = 10 12 h(10 2 ) Problem: coefficients of h overlap. How to reconstruct h? Let h = h 6 x 6 + h 5 x 5 + h 4 x 4 + h 3 x 3 + h 2 x 2 + h 1 x + h 0.

KS3 algorithm 0779 1874 3735 4540 3444 2132 0609 + 798118074653809 0609 2132 3444 4540 3735 1874 0779 + 630668977538179 We can recover the bottom half of h 6 from the lowest base-100 digit of 10 12 h(10 2 ), since there is no overlap there. Where is the top half of h 6?

KS3 algorithm 0779 1874 3735 4540 3444 2132 0609 + 798118074653809 0609 2132 3444 4540 3735 1874 0779 + 630668977538179 The top half of h 6 is located in the highest base-100 digit of h(10 2 ). But hang on... couldn t there be a carry from 79 + 18?

KS3 algorithm 0779 1874 3735 4540 3444 2132 0609 + 798118074653809 0609 2132 3444 4540 3735 1874 0779 + 630668977538179 NO: because 79 < 98. (Strictly speaking, we also need to know that 18 < 99, otherwise we could get burned by a carry propagating from further down, e.g. from 74 + 37. I ll return to this later.)

KS3 algorithm 1874 3735 4540 3444 2132 0609 + 19118074653809 0609 2132 3444 4540 3735 1874 6306689775374 + Therefore we completely recover h 6 = 779, and we can subtract it from the appropriate location in both sums.

KS3 algorithm 1874 3735 4540 3444 2132 0609 + 19118074653809 Do the same thing again: Bottom half of h 5 is 74. 0609 2132 3444 4540 3735 1874 6306689775374 + This time there was a carry, because 11 < 74. Therefore top half of h 5 is 19 1 = 18. Subtract h 5 and repeat!

KS3 algorithm An example where the bogus carry propagation problem occurs: both h(x) = 9901x 2 + 9901x and h(x) = 100x 3 + 100 satisfy h(10 2 ) = 10 12 h(10 2 ) = 100000100. We can protect against this by insisting that the coefficients of h are bounded by 9899 instead of 9999. This doesn t materially affect the applicability of the algorithm.

KS3 algorithm This unpacking algorithm runs in linear time, so we obtain the same estimate 2M(nb/2) + O(nb) for the running time of KS3.

KS3 vs KS1 2 running time of KS3 as proportion of KS1 1.5 1 0.5 5-bit coefficients 10-bit coefficients 20-bit coefficients 0 1 10 100 1000 10000 100000 Polynomial length

KS4 algorithm Key ideas of KS2 and KS3 are orthogonal. Let s try four points: f = 41x 3 + 49x 2 + 38x + 29, g = 19x 3 + 23x 2 + 46x + 21. f (10) = 46309, g(10) = 21781, f ( 10) = 36451, g( 10) = 17139, 10 3 f (10 1 ) = 33331, 10 3 g(10 1 ) = 25849, 10 3 f ( 10 1 ) = 25649, 10 3 g( 10 1 ) = 16611. Notice there is now even overlap in the evaluation phase, e.g. for f ( 10) we have 4929 4138 36451

KS4 algorithm Leads to four multiplications of one fourth the size: h(10) = f (10)g(10) = 1008656329 h( 10) = f ( 10)g( 10) = 624733689 10 6 h(10 1 ) = 10 3 f (10 1 )10 3 g(10 1 ) = 861573019 10 6 h( 10 1 ) = 10 3 f ( 10 1 )10 3 g( 10 1 ) = 426055539 Then: } h(10) h( 10) h(10 1 } ) h( 10 1 ) { KS2 h 0 (10 2 ) h 1 (10 2 ) KS2 { h 0 (10 2 ) h 1 (10 2 ) and h 0 (10 2 } ) h 0 (10 2 ) h 1 (10 2 } ) h 1 (10 2 ) KS3 h 0 (x) KS3 h 1 (x)

KS4 algorithm Running time is now 4M(nb/4) + O(nb). If M(k) = O(k α ), then M(nb) 4M(nb/4) = 4α 1. Classical multiplication = 4x speedup over ordinary KS. Karatsuba multiplication = 2.25x speedup. FFT multiplication = no constant speedup.

KS4 vs KS1 2 running time of KS4 as proportion of KS1 1.5 1 0.5 5-bit coefficients 10-bit coefficients 20-bit coefficients 0 1 10 100 1000 10000 100000 Polynomial length

KS2, KS3, KS4 vs KS1 for 20-bit coefficients 2 running time as proportion of KS1 1.5 1 0.5 KS2 KS3 KS4 0 1 10 100 1000 10000 100000 Polynomial length

KS 5? Could we evaluate at other points? One possibility: f (10i) = 4871 + 40620i, g(10i) = 2279 18540i h(10i) = f (10i)g(10i) = 741993791 + 182881320i. This yields h 0 ( 10 2 ), h 1 ( 10 2 ). Then use h(±10) = f (±10)g(±10) to recover h 0 (10 2 ), h 1 (10 2 ), and then h(x). Problem: complex multiplication requires three real multiplications. So this strategy reduces to five multiplications of 1/4 the size. Other roots of unity lead to similar problems.

Applications These algorithms are implemented in zn poly, a new library for polynomial arithmetic in (Z/mZ)[x], where n fits into a machine word ( long in C). Currently zn poly is pretty good at multiplication and middle product, not very good at division yet.

Applications Algorithms for multiplication in zn poly: Direct classical/karatsuba for small degree KS1/KS2/KS3/KS4 for medium degree Schönhage Nussbaumer FFT for large degree (odd modulus only) Note that the FFT reduces a length-n multiplication to about n length- n multiplications, so the improved KS affects huge multiplications (even though it isn t used directly).

Comparison of packages (48-bit coeffs, 2.6GHz Opteron) Normalised running time (log base 2) KS4 FFT with all KS FFT with only KS1 NTL Magma 0 5 10 15 20 25 30 Polynomial length (log base 2)

Applications Real-life uses of zn poly: My Ph.D. thesis: a new algorithm for computing zeta functions of hyperelliptic curves over finite fields (important problem in cryptography). Record example: genus 3 over F p for p = 2 55 55, Jacobian has 2 165 points. Used 30 hours and 90 GB RAM on single Opteron core. Joint work with Joe Buhler: verification of Vandiver s conjecture and computation of cyclotomic invariants for p < 163,000,000. Used about 21 CPU years on a supercomputer at TACC (Texas Advanced Computing Center).

Applications Other folks who have used zn poly: Computing L-series of hyperelliptic curves, Andrew Sutherland and Kiran Kedlaya (MIT), ANTS-VIII, 2008. Computing Hilbert class polynomials with the Chinese Remainder Theorem (in preparation), Andrew Sutherland. The FLINT library ( Fast Library for Number Theory, William Hart, Warwick) uses zn poly for arithmetic in (Z/mZ)[x].

Thank you!