Primality - Factorization

Christophe Ritzenthaler

November 9, 2009

1 Prime and factorization

Definition 1.1. An integer $p > 1$ is called a prime number (nombre premier) if its only positive divisors are $1$ and $p$.

Example 1. There are infinitely many prime numbers. The biggest generic one, that is, one with no special form that makes primality easy to prove, is
\[ (((((((((2^3+3)^3+30)^3+6)^3+80)^3+12)^3+450)^3+894)^3+3636)^3+70756)^3+97220. \]
Interested readers may read http://www.cs.uwaterloo.ca/journals/JIS/VOL8/Caldwell/caldwell78.html for the origin of this number. It has 20,562 decimal digits and the proof was built using fastECPP on several networks of workstations.

We denote by $\mathcal{P}$ the set of prime numbers. To estimate the efficiency of some algorithms, we need results on the density of primes.

Theorem 1.1. Let $\pi(x) = \#\{p \le x \text{ prime}\}$. One has
\[ \pi(x) \sim \frac{x}{\log x}. \]
Let $n \ge 2$ be an integer and $c$ an integer prime to $n$. Let $\pi_{n,c}(x) = \#\{p \le x \text{ prime},\ p = kn + c\}$. One has
\[ \pi_{n,c}(x) \sim \frac{1}{\varphi(n)} \frac{x}{\log x}. \]

To find a prime number less than $x$ by random trials, the number of attempts is then of the size of $x$, that is, of the order of $\log x$. Indeed, the probability to fail in $k$ attempts is $(1 - 1/\log x)^k$, so the probability to succeed is
\[ 1 - (1 - 1/\log x)^k \approx 1 - e^{-k/\log x}, \]
which is close to $1$ for any $k = (\log x)^{1+\epsilon}$.

Remark 1. For $x \ge 17$, one has $\pi(x) > x/\log x$, and for $x > 1$ one has $\pi(x) < 1.25506\,(x/\log x)$.

Let us finish with the fundamental result.

Theorem 1.2. Every integer $a > 1$ can be written as a product of prime numbers
\[ a = \prod_{p \in \mathcal{P}} p^{e(p)} \]
with $e(p) \ge 0$ and $e(p) = 0$ except for finitely many primes $p$. Up to permutation, the factors in this product are uniquely determined.

2 Prime numbers

Producing big prime numbers is very important for cryptographic applications. For a given $n$, there is no generic algorithm which directly computes a random prime number less than $n$. However, a result by Hadamard and de la Vallée Poussin shows that $\#\{p \le n \text{ prime}\} \sim n/\log n$ (see Section 1). So the usual method is to pick random numbers and to test whether they are prime or not. This requires fast algorithms to test primality.

2.1 Primality test

2.1.1 Trial division

The simplest algorithm is based on the following result.

Proposition 2.1. An integer $n > 1$ is composite if and only if it has a prime divisor $p$ such that $p \le \sqrt{n}$.

Proof. If $n$ is composite, then $n = ab$ with $a, b > 1$, and either $a$ or $b$ is at most $\sqrt{n}$; any prime divisor of that factor is suitable. Conversely, a prime divisor $p \le \sqrt{n} < n$ is a proper divisor of $n$.

This proposition suggests that one can try all prime numbers less than or equal to $\sqrt{n}$, using the sieve of Eratosthenes. Following the density estimate for primes (see Section 1), this means up to about $\sqrt{n}/\log\sqrt{n}$ divisions, leading to an algorithm that is exponential in the size $\log n$ of the input, with a cost of the order of $e^{(1/2-\epsilon)\log n}$.

2.1.2 Fermat test and Carmichael numbers

By Fermat's little theorem, one knows that if $n$ is a prime number then $a^{n-1} \equiv 1 \pmod n$ for all $a \in \mathbb{Z}$ coprime with $n$. If the theorem were an equivalence, we would have an easy polynomial algorithm to test whether a number is prime. Unfortunately this is not the case.

Example 2. Consider $n = 341 = 11 \cdot 31$. One has $2^{340} \equiv 1 \pmod{341}$. Such a number is called a pseudo-prime (pseudo-premier) in base 2. We can prove that there are infinitely many pseudo-primes in base 2 by showing that if $n$ is such a number then so is $2^n - 1$. Indeed, because $n$ is a pseudo-prime in base 2 one has $n \mid 2^{n-1} - 1$, i.e. there is $c$ such that $nc = 2^{n-1} - 1$. Now
\[ 2^{(2^n-1)-1} - 1 = 2^{2(2^{n-1}-1)} - 1 = 2^{2nc} - 1. \]
The last expression is divisible by $2^n - 1$, so
\[ 2^{(2^n-1)-1} \equiv 1 \pmod{2^n-1}. \]

To finish the proof, one has to show that $2^n - 1$ is not a prime. Since $n = ab$ with $a, b > 1$, $2^n - 1$ is divisible by $2^a - 1$, which is a proper divisor.

An idea is then to change the base $a$: for instance $3^{340} \equiv 56 \pmod{341}$, which shows that 341 is composite. Unfortunately, there are numbers that are pseudo-prime in any base. Such numbers are called Carmichael numbers (for instance $561 = 3 \cdot 11 \cdot 17$). It was shown by Alford, Granville and Pomerance in 1994 that there are infinitely many Carmichael numbers, so the Fermat test cannot be completely reliable. Let us show some properties of these numbers.

Proposition 2.2. An (odd) composite number $n \ge 3$ is a Carmichael number if and only if it is square-free and, for each prime divisor $p$ of $n$, $p - 1$ divides $n - 1$.

Proof. First, it is easy to see that a Carmichael number is odd: indeed $(-1)^{n-1} \equiv 1 \pmod n$ if and only if $n$ is odd.

Let $n$ be a Carmichael number, so that for any $a$ prime to $n$ one has $a^{n-1} \equiv 1 \pmod n$. Let $p$ be a prime divisor of $n$. There exists a primitive element modulo $p$ that is prime to $n$. Indeed, let $a$ be a primitive element modulo $p$ and write $n = p^r m$ with $m$ coprime to $p$. There exists an element (still denoted $a$) in $\mathbb{Z}/p^r\mathbb{Z}$ lifting the initial $a$ (because the morphism $\mathbb{Z}/p^r\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$ is surjective). We choose $s \in \mathbb{Z}/m\mathbb{Z}$ coprime to $m$, and since $\mathbb{Z}/n\mathbb{Z} \simeq \mathbb{Z}/p^r\mathbb{Z} \times \mathbb{Z}/m\mathbb{Z}$ we construct the element $a \in \mathbb{Z}/n\mathbb{Z}$ corresponding to $(a, s)$. Such an element is prime to $n$ and primitive modulo $p$. Now, one has of course $a^{n-1} \equiv 1 \pmod p$, and as $a$ is primitive modulo $p$, $p - 1$ divides $n - 1$.

Now suppose that $n$ is not square-free, say $n = p^2 m$, and set $a = 1 + pm$. One has
\[ a^p \equiv 1 + p^2 m + \dots \equiv 1 \pmod n, \]
so the order of $a$ is $p$. Since $a^{n-1} \equiv 1 \pmod n$, $p$ should divide $n - 1$. But $p$ does not divide $n - 1$ (because $p \mid n$), so we get a contradiction.

Conversely, let $n$ be a square-free integer such that $p - 1$ divides $n - 1$ for all prime divisors $p$ of $n$. Let $a$ be prime to $n$. One has
\[ a^{p-1} \equiv 1 \pmod p \]
and, because $n - 1$ is a multiple of $p - 1$,
\[ a^{n-1} \equiv 1 \pmod p. \]
Using the Chinese Remainder theorem for all the factors $p$, one gets $a^{n-1} \equiv 1 \pmod n$.

Corollary 2.1. Any Carmichael number is the product of at least 3 distinct odd primes.

Proof. Because a Carmichael number is square-free and is not prime, it has at least two prime factors. Let us assume that $n = pq$ with $p < q$. Then $q - 1$ divides $pq - 1 = p(q - 1) + p - 1$, so $q - 1$ divides $p - 1$. This is absurd since $0 < p - 1 < q - 1$.

Example 3. Show that if $6m + 1$, $12m + 1$ and $18m + 1$ are primes then $n = (6m + 1)(12m + 1)(18m + 1)$ is a Carmichael number. First, by the Chinese Remainder theorem, one can see that if $n = ab$ with $a, b$ coprime then for any $x$ prime to $n$ one has $x^{\mathrm{lcm}(\varphi(a), \varphi(b))} \equiv 1 \pmod n$. Now $\mathrm{lcm}(\varphi(6m + 1), \varphi(12m + 1), \varphi(18m + 1)) = 36m$ and also $36m \mid n - 1$. One can check that $1729 = 7 \cdot 13 \cdot 19$ is such a number.

2.1.3 Lucas test

Let $n > 1$ be an integer. We will show that if there exists an $a$ such that
\[ a^{n-1} \equiv 1 \pmod n \quad \text{and} \quad a^q \not\equiv 1 \pmod n \ \text{ for all } q \mid n - 1,\ q \ne n - 1, \]
then $n$ is prime. This is a very good test for Fermat numbers $F_m$, i.e. numbers of the form $n = 2^{2^m} + 1$ (for $m = 0, \dots, 32$ only the first five are prime; $F_{33}$ is so big that it may be many years before we can decide its nature). But obviously this test is not good for a generic prime, since we must know the factorization of $n - 1$.

Let us assume that such an $a$ exists and let $d$ be the order of $a$ in $(\mathbb{Z}/n\mathbb{Z})^*$. Since $a^{n-1} \equiv 1 \pmod n$, $d \mid (n - 1)$. More precisely, as no proper divisor of $n - 1$ is the order of $a$, one has $d = n - 1$. Now $n - 1 = d \mid \varphi(n)$. Since $\varphi(n) \le n - 1$ with equality only for primes, this is possible only if $n$ is prime.

2.1.4 Rabin-Miller test

Contrary to the Fermat test, the Rabin-Miller test can prove the compositeness of any composite number (i.e. there is no analog of Carmichael numbers for this test). But the Rabin-Miller test is a Monte-Carlo algorithm: it always stops; if it answers yes, the number is composite, and if it answers no then the answer is correct with a probability greater than 3/4.

Let $n$ be an odd positive integer and $s = \max\{r \in \mathbb{N},\ 2^r \mid n - 1\}$. Let $d = (n - 1)/2^s$.

Lemma 2.1 (Miller). If $n$ is a prime and if $a$ is an integer prime to $n$, then we have either
\[ a^d \equiv 1 \pmod n \]
or there exists $r \in \{0, \dots, s - 1\}$ such that
\[ a^{2^r d} \equiv -1 \pmod n. \]

Proof. The order of $a$ is a divisor of $n - 1 = 2^s d$. Hence the order of $a^d$ divides $2^s$, so it is equal to $2^r$ for some $r \in \{0, \dots, s\}$. If $r = 0$ then $a^d \equiv 1 \pmod n$. If it is not, then $a^{2^r d} \equiv 1 \pmod n$ and $a^{2^{r-1} d}$ is a non-trivial square root of $1$. As $n$ is prime, the only such square root is $-1$, so $a^{2^{r-1} d} \equiv -1 \pmod n$ with $r - 1 \in \{0, \dots, s - 1\}$.

If we find an $a$ which is prime to $n$ and satisfies neither of the two conditions, then $n$ is composite. Such an integer $a$ is called a witness (témoin) for the compositeness of $n$.

Example 4. Let $n = 561$. Then $a = 2$ is a witness for $n$. Indeed, here $s = 4$, $d = 35$ and
\[ 2^{35} \equiv 263, \quad 2^{2 \cdot 35} \equiv 166, \quad 2^{4 \cdot 35} \equiv 67, \quad 2^{8 \cdot 35} \equiv 1 \pmod{561}. \]
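Example 4 can be checked numerically. The Python sketch below is only an illustration: the function names `decompose`, `miller_sequence` and `is_witness` are ours and do not come from the notes, and the code assumes that $n \ge 3$ is odd and that $\gcd(a, n) = 1$.

```python
def decompose(n):
    """Write n - 1 = 2**s * d with d odd and return (s, d)."""
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2
    return s, d


def miller_sequence(a, n):
    """Return [a^d, a^(2d), ..., a^(2^(s-1) d)] modulo n."""
    s, d = decompose(n)
    x = pow(a, d, n)          # modular exponentiation, never builds a^d itself
    sequence = [x]
    for _ in range(s - 1):
        x = x * x % n         # next term is the square of the previous one
        sequence.append(x)
    return sequence


def is_witness(a, n):
    """True if a violates both conditions of Lemma 2.1, so n is composite.

    Assumes n odd, n >= 3 and gcd(a, n) == 1 (not checked here).
    """
    sequence = miller_sequence(a, n)
    return sequence[0] != 1 and (n - 1) not in sequence


print(decompose(561))           # (4, 35)
print(miller_sequence(2, 561))  # [263, 166, 67, 1]
print(is_witness(2, 561))       # True
print(pow(2, 560, 561))         # 1: the Fermat test in base 2 does not detect 561
```

The sequence never meets $-1 \equiv 560$ and does not start with $1$, which is exactly the situation excluded by Lemma 2.1 for a prime modulus.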

For the efficiency of the Rabin-Miller test, it is important that there are sufficiently many witnesses for the compositeness of a composite number.

Theorem 2.1 (Rabin). If $n \ge 3$ is an odd composite number, then the set $\{1, \dots, n - 1\}$ contains at most $(n - 1)/4$ numbers that are prime to $n$ and not witnesses for the compositeness of $n$.

Proof. Let $k$ be the maximum value of $r$ for which there is an integer $a$ prime to $n$ that satisfies the second identity of Lemma 2.1, $a^{2^r d} \equiv -1 \pmod n$ (such an $r$ exists: $a = -1$ works for $r = 0$ because $d$ is odd). We set $m = 2^k d$. Let $n = \prod_{p \mid n} p^{e(p)}$ be the prime factorization of $n$. Let
\[ J = \{a : \gcd(a, n) = 1,\ a^{n-1} \equiv 1 \pmod n\}, \]
\[ K = \{a : \gcd(a, n) = 1,\ a^m \equiv \pm 1 \pmod{p^{e(p)}} \text{ for all } p \mid n\}, \]
\[ L = \{a : \gcd(a, n) = 1,\ a^m \equiv \pm 1 \pmod n\}, \]
\[ M = \{a : \gcd(a, n) = 1,\ a^m \equiv 1 \pmod n\}. \]
We have $M \subset L \subset K \subset J \subset (\mathbb{Z}/n\mathbb{Z})^*$. For each $a$ which is not a witness for the compositeness of $n$, the residue class of $a$ belongs to $L$. We will prove that the index of $L$ in $(\mathbb{Z}/n\mathbb{Z})^*$ is at least four.

The index of $M$ in $K$ is a power of 2. Indeed, squaring defines a morphism $\sigma : K \to M$, $x \mapsto x^2$. Let $I$ denote its image. The kernel of $\sigma$ is a group of order $2^i$ for some $i$, so $\#I = \#K/2^i$. Now since $\#I$ divides $\#M$ we can write $\#M = t \cdot \#I$, and $[K : M] = 2^i/t$ is a power of 2. Hence the index of $L$ in $K$, which divides $[K : M]$, is also a power of 2, let us say $2^j$ (by the Chinese Remainder theorem and the choice of $k$, one checks that $j + 1$ is the number of prime divisors of $n$).

If $j \ge 2$ then we are finished. If $j = 1$ (i.e. $[K : L] = 2$) then $n$ has two prime divisors. It follows from Cor. 2.1 that $n$ is not a Carmichael number. This implies that $J$ is a proper subgroup of $(\mathbb{Z}/n\mathbb{Z})^*$, and the index of $J$ in $(\mathbb{Z}/n\mathbb{Z})^*$ is at least 2. Therefore the index of $L$ in $(\mathbb{Z}/n\mathbb{Z})^*$ is at least 4.

Finally, let $j = 0$. Then $n$ is a prime power, say $n = p^e$ with $e > 1$. But there is an isomorphism
\[ \phi : (\mathbb{Z}/n\mathbb{Z})^* \to \mathbb{Z}/(p - 1)\mathbb{Z} \times \mathbb{Z}/p^{e-1}\mathbb{Z}. \]
As $n - 1$ is prime to $p$ and divisible by $p - 1$, one has $a^{n-1} \equiv 1 \pmod n$ if and only if $\phi(a) = (\mu, 0)$ for some $\mu$. So $[(\mathbb{Z}/n\mathbb{Z})^* : J] = \#\mathbb{Z}/p^{e-1}\mathbb{Z} = p^{e-1}$. This is at least 4 except for $n = 9$, which can be checked by hand.

To apply the Rabin-Miller test, we choose a random number $a \in \{2, \dots, n - 1\}$. If $\gcd(a, n) > 1$ then $n$ is composite. Otherwise we compute $a^d, a^{2d}, \dots, a^{2^{s-1} d}$.
If we find a witness for the compositeness of $n$, then we have proved that $n$ is composite. By Th. 2.1, if $n$ is composite, the probability that $a$ is not a witness is at most $1/4$. So if we repeat the test $t$ times with independent choices of $a$, we can make this probability less than $(1/4)^t$. For $t = 10$ this probability is less than $10^{-6}$.

Remark 2. Under the Generalized Riemann hypothesis (which is conjectural but believed true), it can be proved that there is always a witness for the compositeness of $n$ with $a = O((\log n)^2)$. If we want a test that proves primality with certainty, Adleman, Pomerance, Rumely, Cohen and Lenstra have given an algorithm (APRCL) which is slower but still feasible on numbers of 1000 digits (it runs in $O((\log n)^{C \log\log\log n})$ for some constant $C$). In 2002, M. Agrawal, N. Kayal and N. Saxena found a deterministic polynomial algorithm to solve the problem of primality.

3 Factorization

Now, given an $n$ that is known to be composite, how can we find its decomposition into prime factors? We are going to present algorithms to obtain a non-trivial factor. By applying the algorithm recursively to the factors found, we can then factorize the number.

3.1 Trial division

To find small prime factors of $n$, one uses a precomputed table of all prime numbers below a fixed bound $B$. This table can be computed using the sieve of Eratosthenes. A typical bound is $B = 10^6$.

Example 5. We want to factor $n = 3^{21} + 1$. Trial division with primes less than 50 yields the factors $2^2$, $7^2$, $43$. If we divide $n$ by those factors, we obtain $m = 1241143$. Since $2^{m-1} \equiv 793958 \pmod m$, this number is still composite.

3.2 Pollard p − 1 method

This algorithm is efficient when $n$ has a prime factor $p$ such that $p - 1$ has only small prime divisors. Indeed, by Fermat's little theorem, one has $a^k \equiv 1 \pmod p$ for every $a$ prime to $p$ and every multiple $k$ of $p - 1$. If $p - 1$ has only small prime divisors, one can try
\[ k = \prod_{q \in \mathcal{P},\ q^e \le B} q^e, \]
where $B$ is a given bound and, for each prime $q$, $q^e$ is the largest power of $q$ not exceeding $B$. If $p - 1$ divides $k$, then $p$ divides $\gcd(a^k - 1, n)$. Now if moreover $a^k - 1$ is not divisible by $n$, then $\gcd(a^k - 1, n)$ is a non-trivial factor of $n$.

Example 6. Let $n = 1241143$ be the number of the previous example. We set $B = 13$. Then $k = 8 \cdot 9 \cdot 5 \cdot 7 \cdot 11 \cdot 13$ and $\gcd(2^k - 1, n) = 547$. So $n = 547 \cdot 2269$, and both factors are prime numbers.
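Examples 5 and 6 can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not a tuned implementation: the helper names (`primes_up_to`, `trial_division`, `smooth_exponent`, `pollard_p_minus_1`) are ours, the bound is assumed to be at least 2, and the base $a = 2$ is simply the one used in Example 6.

```python
from math import gcd


def primes_up_to(bound):
    """Sieve of Eratosthenes: list of all primes <= bound (bound >= 2)."""
    is_prime = [True] * (bound + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(bound ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, bound + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]


def trial_division(n, bound):
    """Strip all prime factors <= bound; return ({prime: exponent}, cofactor)."""
    factors = {}
    for p in primes_up_to(bound):
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
    return factors, n


def smooth_exponent(bound):
    """k = product over primes q <= bound of the largest power q^e <= bound."""
    k = 1
    for q in primes_up_to(bound):
        power = q
        while power * q <= bound:
            power *= q
        k *= power
    return k


def pollard_p_minus_1(n, bound, a=2):
    """Return gcd(a^k - 1, n) if it is a non-trivial factor of n, else None."""
    k = smooth_exponent(bound)
    g = gcd(pow(a, k, n) - 1, n)   # a^k is reduced modulo n before the gcd
    return g if 1 < g < n else None


small_factors, m = trial_division(3 ** 21 + 1, 50)
print(small_factors, m)            # {2: 2, 7: 2, 43: 1} 1241143
print(smooth_exponent(13))         # 360360 = 8 * 9 * 5 * 7 * 11 * 13
print(pollard_p_minus_1(m, 13))    # 547
```

The method succeeds here because $546 = 2 \cdot 3 \cdot 7 \cdot 13$ divides $k$, whereas $2268 = 2^2 \cdot 3^4 \cdot 7$ does not. When the function returns `None`, one would typically retry with another bound or another base.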

3.3 Quadratic sieve

3.3.1 Idea

The quadratic sieve finds integers $x, y$ such that
\[ x^2 \equiv y^2 \pmod n \quad \text{and} \quad x \not\equiv \pm y \pmod n. \]
Then $n$ is a divisor of $x^2 - y^2 = (x - y)(x + y)$ but of neither $x - y$ nor $x + y$. Hence $g = \gcd(x - y, n)$ is a proper divisor of $n$.

Example 7. Let $n = 7429$, $x = 227$, $y = 210$. Then $x^2 - y^2 = n$ and $x - y = 17$, so $17 \mid n$.

3.3.2 Determination of x and y

The idea from the previous section is also used in other factoring algorithms, such as the number field sieve (NFS), but those algorithms have different ways of finding $x, y$. We describe how $x, y$ are found in the quadratic sieve. Let $m = \lfloor \sqrt{n} \rfloor$ and $f(X) = (X + m)^2 - n$. We first explain the procedure on an example.

Example 8. Let $n = 7429$. Then $m = 86$. One has
\[ f(-3) = 83^2 - 7429 = -540 = -1 \cdot 2^2 \cdot 3^3 \cdot 5, \]
\[ f(1) = 87^2 - 7429 = 140 = 2^2 \cdot 5 \cdot 7, \]
\[ f(2) = 88^2 - 7429 = 315 = 3^2 \cdot 5 \cdot 7. \]
This implies
\[ 83^2 \equiv -1 \cdot 2^2 \cdot 3^3 \cdot 5 \pmod{7429}, \]
\[ 87^2 \equiv 2^2 \cdot 5 \cdot 7 \pmod{7429}, \]
\[ 88^2 \equiv 3^2 \cdot 5 \cdot 7 \pmod{7429}. \]
If the last two congruences are multiplied then we obtain
\[ (87 \cdot 88)^2 \equiv (2 \cdot 3 \cdot 5 \cdot 7)^2 \pmod n. \]
Therefore we can set $x = 87 \cdot 88 \bmod n = 227$ and $y = 2 \cdot 3 \cdot 5 \cdot 7 \bmod n = 210$.

In the example we have presented numbers $s$ for which the value $f(s)$ has only small prime factors. Then we use the congruence
\[ (s + m)^2 \equiv f(s) \pmod n. \]
From those congruences, we select a subset whose product yields squares on the left- and the right-hand sides. The left-hand side of each congruence is a square anyway. Also, we know the prime factorization of each right-hand side. The product of a number of right-hand sides is a square if the exponents of $-1$ and of all prime factors are even. In the next section, we explain how an appropriate subset of congruences is chosen.
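The computation of Examples 7 and 8 can be replayed as follows. This Python sketch only illustrates the last step (from a pair of congruent squares to a factor); the names `f` and `split_with_squares` are ours, and the two congruences that are multiplied are chosen by hand, as in Example 8.

```python
from math import gcd, isqrt


def f(s, n):
    """f(s) = (s + m)^2 - n with m = floor(sqrt(n))."""
    m = isqrt(n)
    return (s + m) ** 2 - n


def split_with_squares(x, y, n):
    """Given x^2 = y^2 (mod n), return gcd(x - y, n) if it is a proper divisor.

    Returns None when x = +-y (mod n), in which case the gcd is 1 or n.
    """
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 are not congruent modulo n")
    g = gcd(x - y, n)
    return g if 1 < g < n else None


n = 7429
m = isqrt(n)                             # 86
print([f(s, n) for s in (-3, 1, 2)])     # [-540, 140, 315]

# Multiply the congruences for s = 1 and s = 2: both sides become squares.
x = ((1 + m) * (2 + m)) % n              # 87 * 88 mod n
y = isqrt(f(1, n) * f(2, n)) % n         # sqrt(140 * 315) = 210
print(x, y)                              # 227 210
print(split_with_squares(x, y, n))       # 17
```

Here `isqrt` returns the exact root because $140 \cdot 315 = 44100 = 210^2$ is a perfect square; for an arbitrary product this would have to be checked first.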

Table 1: Factor base and sieving

# decimal digits of n           |  50 |  60 |  70 |  80 |  90 | 100 | 110 | 120
# factor base (in thousands)    |   3 |   4 |   7 |  15 |  30 |  51 | 120 | 245
# sieving interval (in millions)| 0.2 |   2 |   5 |   6 |   8 |  14 |  16 |  26

3.3.3 Choosing appropriate congruences

The selection process is controlled by coefficients $\lambda_i \in \{0, 1\}$. If $\lambda_i = 1$ the congruence $i$ is chosen; otherwise it is not. In Example 8, the product of the right-hand sides of the chosen congruences is
\[ (-1 \cdot 2^2 \cdot 3^3 \cdot 5)^{\lambda_1} (2^2 \cdot 5 \cdot 7)^{\lambda_2} (3^2 \cdot 5 \cdot 7)^{\lambda_3} = (-1)^{\lambda_1}\, 2^{2\lambda_1 + 2\lambda_2}\, 3^{3\lambda_1 + 2\lambda_3}\, 5^{\lambda_1 + \lambda_2 + \lambda_3}\, 7^{\lambda_2 + \lambda_3}. \]
We want this number to be a square, so we have to solve the following linear system:
\[ \lambda_1 \equiv 0 \pmod 2, \]
\[ 2\lambda_1 + 2\lambda_2 \equiv 0 \pmod 2, \]
\[ 3\lambda_1 + 2\lambda_3 \equiv 0 \pmod 2, \]
\[ \lambda_1 + \lambda_2 + \lambda_3 \equiv 0 \pmod 2, \]
\[ \lambda_2 + \lambda_3 \equiv 0 \pmod 2. \]
A solution is $\lambda_1 = 0$, $\lambda_2 = \lambda_3 = 1$.

In general we choose a positive integer $B$. Then we look for integers $s$ such that $f(s)$ has only prime factors that belong to the factor base
\[ F(B) = \{p \in \mathcal{P},\ p \le B\} \cup \{-1\}. \]
Such values $f(s)$ are called $B$-smooth. Once we have found as many values for $s$ as the factor base has elements (one more guarantees a non-trivial solution), we try to solve the corresponding linear system over $\mathbb{Z}/2\mathbb{Z}$. Faster algorithms than Gaussian elimination exist in this case.

3.3.4 Sieving

It remains to be shown how to find the values of $s$ for which $f(s)$ is $B$-smooth. One possibility is to compute the value $f(s)$ for $s = 0, \pm 1, \pm 2, \dots$ and to test by trial division whether $f(s)$ is $B$-smooth. Unfortunately, those values typically are not $B$-smooth, so most of the divisions are wasted. This is very inefficient, as the factor base is large for large $n$ (see Table 1). A more efficient method is to use sieving techniques, which are described as follows. We explain a simplified version that shows the main idea.

We fix a sieving interval
\[ S = \{-C, -C + 1, \dots, 0, 1, \dots, C\}. \]
We want to find all $s \in S$ such that $f(s)$ is $B$-smooth.

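To find out which of the values $f(s)$ are divisible by a prime number $p$ in the factor base, we proceed the other way round: instead of testing each value $f(s)$ against every prime, we fix a prime $p$. The equation $f(s) \equiv 0 \pmod p$ has at most two solutions $s_{i,p}$ modulo $p$ (in general two), which can be computed quickly. Then we run through the values $s_{i,p} + kp \in S$. At each such position, we divide the corresponding $f(s)$ by $p$. Prime powers can be treated similarly.

Example 9. Let $n = 7429$, $m = 86$. The factor base is the set $\{2, 3, 5, 7\} \cup \{-1\}$. As sieving interval, we use the set $S = \{-3, \dots, 3\}$.

s              |   -3 |   -2 |   -1 |    0 |    1 |    2 |    3
(s + m)^2 - n  | -540 | -373 | -204 |  -33 |  140 |  315 |  492
Sieve with 2   | -135 |      |  -51 |      |   35 |      |  123
Sieve with 3   |   -5 |      |  -17 |  -11 |      |   35 |   41
Sieve with 5   |   -1 |      |      |      |    7 |    7 |
Sieve with 7   |      |      |      |      |    1 |    1 |

The values of $s$ whose column ends with $\pm 1$, namely $s = -3, 1, 2$, are those for which $f(s)$ is $B$-smooth.

Remark 3. The optimum size of the factor base is roughly
\[ B = \left(e^{\sqrt{\log n \log\log n}}\right)^{\sqrt{2}/4} \]
and the sieving interval is of size $C = B^3$. The heuristic running time is $L_n(1/2, 1)$, where $L_n(\alpha, c) = \exp\left((c + o(1)) (\log n)^{\alpha} (\log\log n)^{1-\alpha}\right)$. The fastest current algorithm is NFS, which runs in $L_n(1/3, (64/9)^{1/3})$.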
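The simplified sieve of Example 9 can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not part of the notes: the function names are ours, the roots of $f$ modulo $p$ are found by brute force rather than by a modular square root algorithm, each sieved position is divided by $p$ as often as possible (which also takes care of prime powers), and the final subset is found by exhaustive search, as a stand-in for the linear algebra over $\mathbb{Z}/2\mathbb{Z}$ of Section 3.3.3. The code assumes that $n$ is not a perfect square, so that $f(s) \neq 0$.

```python
from itertools import combinations
from math import isqrt, prod


def sieve_smooth(n, primes, C):
    """Return the s in [-C, C] such that f(s) = (s + m)^2 - n is smooth."""
    m = isqrt(n)
    interval = range(-C, C + 1)
    residual = {s: (s + m) ** 2 - n for s in interval}
    for p in primes:
        # Roots of f modulo p, by brute force (fine for tiny p only).
        roots = [r for r in range(p) if ((r + m) ** 2 - n) % p == 0]
        for r in roots:
            start = -C + ((r + C) % p)       # smallest s >= -C with s = r (mod p)
            for s in range(start, C + 1, p):
                while residual[s] % p == 0:  # strip every power of p
                    residual[s] //= p
    # Smooth values are those reduced to +1 or -1 (the sign is the factor -1).
    return [s for s in interval if abs(residual[s]) == 1]


def square_subset(n, candidates):
    """Smallest subset of candidates whose product of f(s) is a square.

    Exhaustive search, exponential in len(candidates): illustration only.
    """
    m = isqrt(n)
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            value = prod((s + m) ** 2 - n for s in subset)
            if value > 0 and isqrt(value) ** 2 == value:
                return subset
    return None


smooth = sieve_smooth(7429, [2, 3, 5, 7], 3)
print(smooth)                        # [-3, 1, 2]
print(square_subset(7429, smooth))   # (1, 2)
```

The subset $(1, 2)$ corresponds to the solution $\lambda_1 = 0$, $\lambda_2 = \lambda_3 = 1$ of the linear system for this example.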