Math 5330 Spring 2013
Elementary factoring algorithms

The RSA cryptosystem is founded on the idea that, in general, factoring is hard. With Fermat's Little Theorem and some related ideas, one can usually tell very quickly that a composite number is, in fact, composite, but actually producing a factorization of a composite number is a very different thing. Currently, the only method at our disposal is trial division. For small numbers, trial division is the method of choice. If you wish to factor a number $n < 10^{10}$, you should probably use trial division.

But what if you want to factor a large number? Trial division still has a part to play. If you have a number of size roughly $10^{30}$, then you would need to be very lucky to factor it with trial division. If the number were the product of two nearly equal primes (or if the number itself were prime), then you would have to perform trial division up to about $10^{15}$ to see this. To put this in perspective, there are roughly 29,000,000,000,000 primes up to $10^{15}$, and even if we could perform $10^6$ multi-precision divisions a second, it would take 29,000,000 seconds to try them all. That is, trial division could take about a year.

So what to do with a 30-digit or larger number? First, one usually uses trial division for a while. After all, we know how to factor any even number. At some point, it is useful to know that the number actually is composite, so after some trial division, if $m$ is the current unfactored part, calculate $2^{m-1} \pmod{m}$. If it is not 1, then $m$ is composite. Usually one does some more trial division (try, say, all primes $p < 10^6$). But after that, switch to some other factoring method.

What other factoring methods are there? Here I will present several other fairly simple factoring methods. The first dates back to Fermat, the rest are less than 50 years old.

Fermat's Factoring Method

Our first method is based on the idea that if $n = x^2 - y^2$, then $n = (x - y)(x + y)$.
That is, we will try to represent $n$ as the difference of two squares, and use that representation to factor $n$. To do this, we start with the number $x_0 = \lceil \sqrt{n} \rceil$, and calculate $(x_0 + k)^2 - n$ for $k = 0, 1, 2, \ldots$, stopping when a square is returned. There is a trick to speed up the calculation of $(x_0 + k)^2 - n$, and that is that two successive values are related. That is,

$$(x_0 + k + 1)^2 - n = [(x_0 + k)^2 - n] + 2(x_0 + k) + 1,$$

so we only have to calculate one square. For example, if $n = 3977$, then $x_0 = 64$, and we need to calculate $x_0^2 - n = 64^2 - 3977 = 119$. To calculate $65^2 - 3977$ we don't even have to square 65, we just add $2 \times 64 + 1$ to 119 to get 248. Moreover, these numbers $2(x_0 + k) + 1$ grow by 2 each time, so we don't even need to recalculate them, we just add 2 to the previous value. Here is a table for these calculations.

| $k$ | $x_0 + k$ | $2(x_0 + k) + 1$ | $(x_0 + k)^2 - n$ |
|-----|-----------|------------------|-------------------|
| 0 | 64 | 129 | 119 |
| 1 | 65 | 131 | 248 |
| 2 | 66 | 133 | 379 |
| 3 | 67 | 135 | 512 |
| 4 | 68 | 137 | 647 |
| 5 | 69 | 139 | 784 = $28^2$ |

What this tells us is that $69^2 - 3977 = 28^2$, which we rearrange as $3977 = 69^2 - 28^2 = (69 - 28)(69 + 28) = 41 \times 97$. Each iteration in the table goes very fast on a computer; the most difficult step is to determine whether $(x_0 + k)^2 - n$ is a square.

Fermat's factoring method works reasonably well for small numbers $n$ and for numbers $n = pq$ where $p$ and $q$ are nearly equal. An example I've come across is in trying to factor $n = 10^{22} + 1$. If you use trial division for a while, you find factors 89 and 101, leaving a 19-digit number, 1,112,470,797,641,561,909. If you try Fermat's method on this number, you fairly quickly find $1{,}112{,}470{,}797{,}641{,}561{,}909 = 1056689261 \times 1052788969$.

How good is Fermat's method? For small numbers, it is a reasonable thing to try. But in fact, it is worse than trial division in general! The worst case of Fermat's method is where $n$ is prime. In this case, $n$ factors only as $n \times 1$, so we need $x + y = n$, $x - y = 1$. This means $x = \frac{n+1}{2}$ and $y = \frac{n-1}{2}$. Now the $x$ here is $x_0 + k$, where $x_0$ is roughly $\sqrt{n}$. That is, we need $\frac{n+1}{2} = \sqrt{n} + k$, so it takes $k \approx \frac{n+1}{2} - \sqrt{n}$ steps before concluding that $n$ is prime.

To see what this means, suppose we have an $n$ around $10^{10}$. This is a very small number, as factoring goes. If $n$ is prime, it will take about $\sqrt{n}$ steps, or $10^5$ steps, to show this by trial division. With Fermat's method, it will take about $\frac{1}{2} \cdot 10^{10} - 10^5$ steps. Thus, trial division takes about 100,000 steps, while Fermat's method takes 4,999,900,000 steps.

On average, one expects a composite number $n$ to have a prime divisor of size about $n^{.63}$, and coprime part of size about $n^{.37}$. If the coprime part is actually prime, then trial division will find the factorization of $n$ in about $n^{.37}$ steps. Fermat's method will take something like $\frac{1}{2} n^{.63}$ steps, so again trial division wins. Thus, in general, one should never use Fermat's method to completion.
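The table-driven procedure above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a tuned implementation: the name `fermat_factor` and the `max_steps` cutoff are my own choices, and $n$ is assumed to be odd (trial division has already removed any factors of 2). Inside the loop it never squares anything new; it updates $(x_0 + k)^2 - n$ by adding $2(x_0 + k) + 1$, exactly as in the table.

```python
from math import isqrt

def fermat_factor(n, max_steps=1_000_000):
    """Look for n = x^2 - y^2 with x >= ceil(sqrt(n)).

    Returns (x - y, x + y) on success, or None if no square shows up
    within max_steps iterations. Assumes n is odd and n > 1.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch assumes an odd n > 1")
    x = isqrt(n)
    if x * x < n:
        x += 1                   # x_0 = ceil(sqrt(n))
    r = x * x - n                # (x_0 + k)^2 - n, starting at k = 0
    step = 2 * x + 1             # 2(x_0 + k) + 1, grows by 2 each time
    for _ in range(max_steps):
        y = isqrt(r)
        if y * y == r:           # r is a perfect square: n = x^2 - y^2
            return x - y, x + y
        r += step                # move from k to k + 1 with one addition
        step += 2
        x += 1
    return None                  # gave up; switch to another method

print(fermat_factor(3977))       # the worked example: (41, 97)
```

For instance, `fermat_factor(3977)` walks through the six rows of the table and returns `(41, 97)`. On a prime the loop only stops at the trivial pair `(1, n)`, after roughly $\frac{n+1}{2} - \sqrt{n}$ iterations, which is why a step limit is worth having.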
You can try several million steps, maybe, hoping to get lucky, but then switch to something else.

Before moving on to the next method, I should mention that many approaches can be improved, or are more advantageous in some situations than in others. We already know, for example, that if $n = 2^p - 1$ with $p$ prime, then the only possible divisors of $n$ are primes $q \equiv 1 \pmod{p}$, so we can skip most numbers when using trial division on such numbers. With Fermat's method, there is another way to speed things up. Paradoxically, it is to try to factor a number larger than $n$ rather than factoring $n$. Pick some appropriate number $m$, and try to factor $mn$ rather than $n$. The idea is that $mn$ might factor into two nearly equal parts.

Here is a simple example. If we wish to factor 1207 with Fermat's method, then $x_0 = 35$ and after 10 steps, we get $x_0 + 9 = 44$, with $1207 = 44^2 - 27^2 = (44 + 27)(44 - 27) = 71 \times 17$. If, on the other hand, we first multiply $n$ by 3, and use Fermat's method on 3621, then $x_0 = 61$ and already we have $61^2 - 3621 = 100 = 10^2$. Here, we have $3621 = 61^2 - 10^2 = 71 \times 51$, and looking for the factor divisible by 3, we recover $1207 = 71 \times 17$.

In general, one multiplies $n$ by some number with lots of factors, like $315 = 3^2 \times 5 \times 7$, in the hope that some of its factors multiplying $p$, and the others multiplying $q$, produce nearly equal numbers. For example, suppose we wish to use Fermat's method to factor 7421. This would require 25 steps with Fermat's method: $x_0 = 87$, $87^2 - 7421 = 148$, $88^2 - 7421 = 323$, ..., $(87 + 24)^2 - 7421 = 4900 = 70^2$. If, instead, we multiply $n$ by 315 and try to factor 2337615, then four steps are required: $x_0 = 1529$, $1529^2 - 2337615 = 226$, $1530 \to 3285$, $1531 \to 6346$, $1532 \to 9409 = 97^2$. The reason: $7421 = 41 \times 181$, and these primes are far apart. However, multiplying by 315 gave the factorization $315 \times 7421 = 1532^2 - 97^2 = 1629 \times 1435 = (9 \times 181)(35 \times 41)$.

Multiplying by a number $m$ CAN make Fermat's method worse. I believe there is an algorithm for picking a sequence of numbers $m$ to multiply by $n$. One tries Fermat's method on each $mn$ for some prescribed period of time, and in the end, you can factor $n$ in something under $\sqrt[3]{n}$ steps rather than the $\sqrt{n}$ steps required by trial division. I do not know the details.

The next two methods were both devised by a mathematician by the name of John Pollard. They are both considerably better than trial division. However, before using them, one should check that $2^n \not\equiv 2 \pmod{n}$, so one knows $n$ is composite.

Pollard's rho method (1975)

This method uses an iterated functions approach. Let $f(x) = x^2 + 1$ (lots of other functions could be used instead of this one), and consider the sequence $f(1), f(f(1)), f(f(f(1))), \ldots \pmod{p}$. This sequence will be eventually periodic.
This means that after a while, a periodic pattern will present itself. For example, if $p = 23$, the sequence is 1, 2, 5, 3, 10, 9, 13, 9, 13, 9, .... We call 1, 2, 5, 3, 10 the tail of this eventually periodic pattern.

If we let $f^m(x)$ represent the $m$-fold composition $f(f(\cdots f(x) \cdots))$, then for any prime $p$ there are integers $k \ne m$ for which $f^k(1) \equiv f^m(1) \pmod{p}$. This is because there are only $p$ possible remainders when a number is divided by $p$, but there are infinitely many $m$. Once we have an $m$ and a $k$, then $f^{k+1}(1) \equiv f^{m+1}(1)$, $f^{k+2}(1) \equiv f^{m+2}(1)$, and so on. This means that if $p$ is some unknown divisor of $n$, and if we could find the right $m$ and $k$, then we might be able to find $p$, because $p$ would be a divisor of $\gcd(f^m(1) - f^k(1), n)$.

How do we find $m$ and $k$ when we don't even know $p$? We use a method called Floyd's Cycle Finding Algorithm. The algorithm works like this: suppose we have a sequence $a_0, a_1, a_2, \ldots$ which is eventually periodic. Then $a_m = a_{2m}$ for some integer $m$. We can use this to form a factoring algorithm: to factor $n$, for $k = 1, 2, 3, \ldots$, calculate $\gcd(f^{2k}(1) - f^k(1), n)$. In fact, what we do is calculate the sequence $f^k(1) \pmod{n}$, to keep the numbers from getting too large, and for even values of $k$, we calculate $\gcd(f^k(1) - f^{k/2}(1), n)$. As an example, let $n = 1357$. We have

| $k$ | $f^k(1)$ | $f^{k/2}(1)$ | difference | gcd |
|-----|----------|--------------|------------|-----|
| 1 | 2 | - | - | - |
| 2 | 5 | 2 | 3 | 1 |
| 3 | 26 | - | - | - |
| 4 | 677 | 5 | 672 | 1 |
| 5 | 1021 | - | - | - |
| 6 | 266 | 26 | 240 | 1 |
| 7 | 193 | - | - | - |
| 8 | 611 | 677 | -66 | 1 |
| 9 | 147 | - | - | - |
| 10 | 1255 | 1021 | 234 | 1 |
| 11 | 906 | - | - | - |
| 12 | 1209 | 266 | 943 | 23 |

and so, 23 is a divisor of 1357. The reason this works should be made clear if we just do things modulo 23:

| $k$ | $f^k(1)$ | $f^{k/2}(1)$ | difference |
|-----|----------|--------------|------------|
| 1 | 2 | - | - |
| 2 | 5 | 2 | 3 |
| 3 | 3 | - | - |
| 4 | 10 | 5 | 5 |
| 5 | 9 | - | - |
| 6 | 13 | 3 | 10 |
| 7 | 9 | - | - |
| 8 | 13 | 10 | 3 |
| 9 | 9 | - | - |
| 10 | 13 | 9 | 4 |
| 11 | 9 | - | - |
| 12 | 13 | 13 | 0 |

That is, $f^{12}(1) - f^6(1)$ is divisible by 23, so it is at the stage $k = 12$ that the prime 23 is discovered by Pollard's rho algorithm.

How fast is the rho method? Certainly it has to find a prime $p$ in at most $p$ steps. This does not sound very good: trial division will find $p$ in exactly $p$ steps. However, there is reason to believe the rho method finds $p$ much faster than $p$ steps. Suppose, instead of the numbers $f^m(1)$, we just produced random numbers. How long would it take before two of our random numbers agreed modulo $p$? This is a variation of the birthday problem in probability: if you pick $k$ things (with replacement) from $n$ types of things, what is the probability of getting two of the same thing? The probability that they are all different is

$$\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k} = 1 \times \left(1 - \frac{1}{n}\right) \times \left(1 - \frac{2}{n}\right) \times \cdots \times \left(1 - \frac{k-1}{n}\right).$$
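The product on the right is easy to evaluate numerically. As a small sketch (in Python; the function name `prob_all_different` is just my own label), the following multiplies out the factors $1 - j/n$ for $j = 1, \ldots, k - 1$:

```python
def prob_all_different(k, n):
    """Probability that k picks (with replacement) from n types are all distinct.

    Computes the product (1 - 1/n)(1 - 2/n)...(1 - (k-1)/n) in floating point.
    """
    p = 1.0
    for j in range(1, k):
        p *= 1 - j / n
    return p

# Chance of at least one shared birthday among k people, taking n = 365.
for k in (10, 20, 23, 30):
    print(k, round(1 - prob_all_different(k, 365), 4))
```

With $n = 365$, the chance of at least one match, `1 - prob_all_different(k, 365)`, first exceeds one half at $k = 23$, where it is about 0.507.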
Let's ask a different question: when is the probability of finding a match $\frac{1}{2}$? To approximate the probability, take the logarithm. We want

$$\ln\frac{1}{2} = \sum_{j=1}^{k-1} \ln\left(1 - \frac{j}{n}\right).$$

Using the approximation $\ln(1 - x) \approx -x$, we want

$$\ln\frac{1}{2} \approx -\left(\frac{1}{n} + \frac{2}{n} + \cdots + \frac{k-1}{n}\right) = -\frac{k(k-1)}{2n} \approx -\frac{k^2}{2n}.$$

This means we want $k \approx \sqrt{2n \ln 2} \approx 1.177\sqrt{n}$. For example, with the birthday problem (how many people do you need in a room to have a 50-50 chance that two have a birthday in common?), this says you would need about $1.177\sqrt{365} \approx 22.5$ people.

What this means for the rho method: if the numbers $f^m(1)$ act random enough, then we expect to find a prime $p$ not in $p$ steps, but more like $1.177\sqrt{p}$ steps. Numerical evidence supports this, so for simplicity, we say the rho method probably finds a factor $p$ in $\sqrt{p} < n^{1/4}$ steps. More is known. If we used a simpler function for $f(x)$, say $f(x) = ax + b$, a linear function rather than a quadratic, then the iterates do not seem random enough, and we get something more like $p$ steps again. But using most quadratic or higher degree polynomials, the iterates do appear to act like random numbers.

Pollard's p − 1 method (1974)

Recall Fermat's Little Theorem yet again: for any prime $p$, and any number $a$ with $p \nmid a$, we have $a^{p-1} \equiv 1 \pmod{p}$. In particular, if $p > 2$, then $2^{p-1} \equiv 1 \pmod{p}$. If $m$ is a multiple of $p - 1$, say $m = k(p - 1)$, then $2^m = (2^{p-1})^k \equiv 1^k = 1 \pmod{p}$. This means that $p \mid 2^m - 1$ for any $m$ where $(p - 1) \mid m$. For example, if $p = 7$, then $p - 1 = 6$, so $7 \mid 2^m - 1$ for any $m$ divisible by 6. For example, $2^{12} - 1 = 4095 = 7 \times 585$.

We can turn this into a factoring algorithm as follows: take a sequence of $m$'s with lots of small factors (we will use the sequence $m_k = k!$, but other sequences would work as well). For each term in the sequence, we calculate $\gcd(n, 2^{m_k} - 1)$, and stop when the gcd returns a number larger than 1. This method will find a prime divisor $p$ of $n$ if $p - 1 \mid m_k$. This method works very well if $p - 1$ has all small prime divisors.
The Maple command ifactor(n, easy) does the following: it uses trial division up to some limit, and then uses some fixed number of iterations of the $p - 1$ method. For example, ifactor(10^37 - 1, easy) returns (3)^2 c28 (247629013). What this means is that it found 9 and 247,629,013 as factors of $10^{37} - 1$, leaving a 28-digit number that it knew to be composite (the meaning of the c). The factor 247629013 was found by the $p - 1$ method. It was successful because $p - 1 = 2^2 \times 3 \times 37 \times 41 \times 61 \times 223$ has all small divisors. In particular, it did NOT find the smaller prime divisor $q = 2028119$, because $q - 1 = 2 \times 37 \times 27407$, and it did not do enough iterations so that $27407 \mid m$.

As a simple example of the $p - 1$ method, let's factor $n = 3811$. As with the rho method, we form a table:

| $k$ | $2^{k!} \pmod{3811}$ | $\gcd(2^{k!} - 1, 3811)$ |
|-----|----------------------|--------------------------|
| 2 | 4 | 1 |
| 3 | 64 | 1 |
| 4 | 1194 | 1 |
| 5 | 2172 | 1 |
| 6 | 3257 | 37 |

and $3811 = 37 \times 103$. We found 37 at step $k = 6$ because $37 - 1 = 36$ is a divisor of $6!$.

Some notes on this table: we did not calculate $2^{k!}$, but $2^{k!} \pmod{n}$. Also, one can calculate $2^{(k+1)!}$ by using the formula $2^{(k+1)!} = (2^{k!})^{k+1}$, using the binary squaring algorithm. That is, once we know $2^{5!} \equiv 2172 \pmod{3811}$, we calculate $2^{6!} \pmod{3811}$ by calculating, instead, $2172^6 \pmod{3811}$.

In real life, back in the late 70's, the $p - 1$ method was used to show that $10^{53} - 1$ is divisible by $p = 1325815267337711173$. In fact, this prime was found fairly quickly because $p - 1 = 2^2 \times 3^2 \times 11 \times 53 \times 1279 \times 1553 \times 3557 \times 8941$, which has all of its prime divisors less than 10,000.
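The $p - 1$ method as described above fits in a few lines of Python. This is a hedged sketch rather than a definitive implementation: the name `pollard_p_minus_1` and the cutoff `max_k` are hypothetical choices of mine, it uses base 2 and $m_k = k!$ as in the table for $n = 3811$, and it relies on Python's built-in three-argument `pow` for the modular exponentiation.

```python
from math import gcd

def pollard_p_minus_1(n, max_k=10_000):
    """Pollard's p - 1 method with base 2 and exponents m_k = k!.

    Returns (k, g) for the first k at which g = gcd(2^(k!) - 1, n)
    is a proper divisor of n, or None if no such k <= max_k is found.
    Assumes n is odd and composite.
    """
    a = 2                        # a = 2^(1!) mod n
    for k in range(2, max_k + 1):
        a = pow(a, k, n)         # 2^(k!) = (2^((k-1)!))^k, reduced mod n
        g = gcd(a - 1, n)
        if g == n:
            return None          # every prime factor appeared at once; no split
        if g > 1:
            return k, g
    return None

print(pollard_p_minus_1(3811))   # (6, 37), matching the table
```

Running it on $n = 3811$ passes through the values 4, 64, 1194, 2172, 3257 and returns `(6, 37)`. If the gcd comes back equal to $n$ itself, this simple version just gives up; a different base or a slower-growing sequence of exponents would be the natural thing to try.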