Determining the Optimal Combination of Trial Division and Fermat's Factorization Method

Joseph C. Woodson
Home School
P. O. Box 55005
Tulsa, OK 74155

Abstract

The process of finding the prime factorization of large integers has many important applications in computer science, most notably in cryptography. Currently, the best-known algorithms for factoring very large numbers are subexponential. However, the subexponential algorithms invoke the factorization of smaller numbers, and the best algorithms for factoring smaller numbers are still exponential algorithms such as trial division, Fermat's factorization method, and Lehman's algorithm. In this investigation, a highly optimized combination of trial division with Fermat's factorization method was implemented to attain much faster performance than trial division alone. Since all of the optimizations involved the modulus function, they were implemented by creating thousands of threads, one per residue class, and discarding those that could not contribute to a factorization before running the rest. Trial division alone was the best method for numbers smaller than 48 bits in size, the combination of trial division and Fermat's method implemented in the investigation was the best for numbers between 48 and 70 bits in size, and Lehman's method was the best for numbers larger than 70 bits in size. The algorithm used in this investigation resulted in faster performance than any other known algorithm for numbers between 48 and 70 bits in size.

Introduction

The Fundamental Theorem of Arithmetic states that every positive integer except 1 can be written uniquely, up to the order of the factors, as the product of one or more prime numbers. While the theorem itself is simple, the process of finding the prime factors of a number is decidedly more difficult than it seems at first glance. In fact, that process is so infeasible for large enough numbers that the most widely used cryptography algorithm is designed so that it can only be broken by factoring a large number into two prime factors (Kranakis 1986, p. 142-143).

Many factorization algorithms have been devised, with varying degrees of success. The simplest of these is trial division, which is nothing more than testing the number for divisibility by every integer from 2 up to its square root. Pierre de Fermat improved on this with his own algorithm, which works by converting the number into a difference of squares and factoring it as such (Bressoud and Wagon 2000, p. 169). The modular characteristics of perfect squares can be used to provide a notable performance increase to this algorithm (Crandall and Pomerance 2001, p. 192). Lehman (1974) improved performance further by multiplying the number by a relatively small integer before testing. All of the preceding algorithms are exponential; that is, their runtime is a constant raised to a power equal to a polynomial function of the size of the number being factored.

Lehmer and Powers (1931) proposed that continued fractions can be used to factor a large integer. However, it was not until Morrison and Brillhart (1975) programmed the method on a computer that it was found to be much faster than previous algorithms for large numbers. It turns out that the continued fraction algorithm is subexponential; that is, its runtime grows faster than any polynomial but more slowly than any exponential function of the size of the number being factored. Later subexponential algorithms, including the quadratic sieve and the general number field sieve, are much more efficient than the continued fraction algorithm, the latter currently being the fastest algorithm available for factoring very large integers (Crandall and Pomerance 2001, p. 225-242).

Since the general number field sieve is by far the best algorithm for factoring large integers, one may wonder whether the other algorithms have a purpose. The answer is that while the subexponential algorithms may be very fast at factoring very large integers, the exponential factorization algorithms are still the fastest at factoring moderate-sized integers. In fact, most subexponential factorization algorithms require the complete factorization of a smaller number in order to function properly (Crandall and Pomerance 2001). A previous investigation showed a possible optimization to the trial division algorithm. The remainder of this paper discusses the combination of a further improved trial division algorithm with an optimized implementation of Fermat's factorization method.

Methods

The previous experiment investigated the effects of various degrees of real-time number pruning using the Sieve of Eratosthenes on the performance of the trial division algorithm. The basic structure of the C# 4.0 program from that investigation was used as a framework for the new program. The previous investigation used incrementing counters and Boolean values to avoid unnecessary division operations. Despite the 15% improvement in performance obtained by using a single counter, the overhead of the additions within the counters made further improvement impossible. This concern was alleviated in the current investigation by subdividing the candidate divisors into 15,015 threads (15,015 = 3 × 5 × 7 × 11 × 13), each programmed to test divisibility by all odd numbers with a given value modulo 15,015, and discarding the 9,255 threads that represented candidates divisible by 3, 5, 7, 11, or 13.
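The residue-class pruning described above can be sketched as follows. This is a minimal single-threaded illustration, not the program used in the investigation: the original was written in C# and ran one thread per residue class, whereas this sketch uses Python for brevity and walks the classes one after another. The function names (`surviving_residues`, `find_divisor_in_class`, `smallest_factor`) are hypothetical.

```python
from math import gcd, isqrt

MODULUS = 3 * 5 * 7 * 11 * 13  # 15,015
SMALL_PRIMES = (2, 3, 5, 7, 11, 13)


def surviving_residues(modulus=MODULUS):
    """Residues r for which no number congruent to r is divisible by 3, 5, 7, 11, or 13."""
    return [r for r in range(modulus) if gcd(r, modulus) == 1]


def find_divisor_in_class(n, r, limit):
    """Test the odd numbers congruent to r modulo MODULUS, up to limit."""
    # MODULUS is odd, so adding it once flips the parity of r.
    d = r if r % 2 == 1 else r + MODULUS
    if d == 1:
        d += 2 * MODULUS  # skip the trivial divisor 1
    while d <= limit:
        if n % d == 0:
            return d
        d += 2 * MODULUS  # next odd number in the same class
    return None


def smallest_factor(n):
    """Smallest prime factor of n >= 2 (n itself when n is prime)."""
    for p in SMALL_PRIMES:
        if n % p == 0:
            return p
    limit = isqrt(n)
    hits = []
    for r in surviving_residues():
        d = find_divisor_in_class(n, r, limit)
        if d is not None:
            hits.append(d)
    return min(hits) if hits else n
```

Of the 15,015 residue classes, `surviving_residues()` keeps 5,760, so 9,255 classes are never visited, which matches the thread counts given above. The per-class loop pays only for a fixed setup per class rather than for a counter update on every candidate.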

This improvement reduces the runtime by nearly 62% while incurring only a constant overhead (as opposed to the linear overhead of using counters).

The main purpose of the current investigation was to implement Fermat's factorization method in combination with trial division and to find the optimal point at which to change from trial division to Fermat's method. The implementation of Fermat's algorithm created for this investigation improves on the commonly used design by separating the task of performing the test into 55,440 threads based on the value of the test numbers modulo 55,440 and discarding all but the 480 to 864 threads that can possibly lead to perfect squares being found. This optimization improves performance by a factor of roughly 115.

The program also included a basic unoptimized implementation of Lehman's factorization algorithm. This was created for the purpose of comparing the optimized Fermat's algorithm to Lehman's, which is supposedly more efficient. A multithreaded, but otherwise unoptimized, version was eventually implemented; however, the single-threaded version was used for the test runs because the multithreaded version was not finished at the time.

The program first used trial division up to […], where […] is the integer being factored. This number was arbitrarily chosen based on preliminary test results, which showed that a lower number would be unnecessary. Second, the program performed a prescribed number of Fermat's test iterations, between 0 and the maximum number of useful iterations, […] + 1, where […] is the integer being factored and […] is the highest divisor already tested using trial division. In this configuration […] was equal to […]. If the number of Fermat's test iterations was equal to 0, then the program would finish the trial division through […]. Otherwise, the program would use trial division from […] + 2 through […] + 1 […] + 1, where […] is the integer being factored and […] is the number of Fermat's test iterations.

The program had a variety of user-configurable settings. The first parameter after the program's filename was the number to factor. Optional command-line switches specified the maximum number of processor cores to use, the number of integers to run the tests on, the number of Fermat's test iterations to perform per test, whether to use the Miller-Rabin probabilistic primality test and only attempt to factor prime numbers (used for performance testing, such as in this investigation), and whether to use Lehman's test in lieu of the combined trial division / Fermat's method test.

Every time the program ran out of time for all trials in a run, it would output relevant data to a CSV file. Before every run, the program checked this file to see whether any past runs known to take less time than the one being attempted had run out of time. If so, the program would automatically report that it ran out of time and terminate.

A batch file was used to automatically initiate the runs over 37 nights. Every run had 8 trials, each limited to 30 minutes. Miller-Rabin testing was enabled since only a worst-case runtime was desired. A total of 1,136 runs (9,088 trials) were performed, running for 442 total hours on an Intel Core 2 Quad Q6600 processor. One number was used for every even number of bits from 8 to 128. Each of these numbers began with eight trailing ones in binary to reduce the uncertainty introduced by randomly generating numbers. For every chosen number, one run was done with no Fermat's test iterations, one run was done with the maximum number of Fermat's test iterations possible, one run was done for each number of Fermat's test iterations equal to 10^[…] ([…] being an integer between 1 and log […] inclusive), and one run was done using Lehman's algorithm. Additional runs were done afterwards to determine, to the nearest 1/32, the base-10 logarithm of the exact number of Fermat's test iterations that resulted in the minimum runtime for each number between 66 and 78 bits in size.

Results and Discussion

After all runs were complete, the runtimes of the last seven trials of each run were averaged to obtain the average runtime for the run. The first trial of each run was disregarded because the initial startup times were highly variable, resulting in very large error values for smaller numbers when all trials were averaged.

For numbers between 8 and 18 bits in size, the runtimes were negligible (several microseconds) and similar for all numbers of Fermat's test iterations (Figure 1). For numbers between 20 and 46 bits in size, any nonzero number of Fermat's test iterations made performance significantly worse (Figures 1 and 2). This is likely because of the extra overhead involved in initializing the threads for Fermat's test. For numbers between 48 and 64 bits in size, it is optimal to use a small number of Fermat's test iterations (Figure 2). There was no significant difference in performance for these numbers as long as the number of iterations was between 10^[…] and 10^[…]. For numbers between 66 and 70 bits in size, relatively large numbers of Fermat's test iterations (around 10^[…] to 10^[…]) provided optimal performance (Figures 3 and 4). For numbers between 72 and 84 bits in size, the lowest runtimes were achieved by using Lehman's test (Figure 3). All runs with numbers larger than 84 bits ran out of time.

The runtimes using optimal settings were less than 20 milliseconds for all numbers 64 bits in size or smaller. A large increase in runtime was seen for numbers larger than 64 bits (Figure 5). This was likely due to the performance penalty incurred from using the GNU Multi-Precision library (C# has no primitive type for such integers).

Conclusions

A fit of the runtimes for numbers 66 bits and larger to an exponential curve showed that combining Fermat's factorization method with trial division resulted in a roughly constant-factor performance improvement over trial division alone: the ratio of the two runtimes remains relatively constant because the exponent is essentially the same in both equations. Lehman's method, while not nearly as fast for small numbers due to its relatively high constant, is definitely the best of the three for large numbers since its runtime has a lower exponent than the other two. In general, it is most efficient to use trial division alone for numbers smaller than 48 bits, trial division combined with Fermat's factorization method for numbers between 48 and 70 bits in size, and Lehman's method for numbers larger than 70 bits.
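The 70-bit changeover to Lehman's method can be checked against the exponential trendlines reported with Figure 3. The sketch below solves for the number size at which two fitted curves of the form y = a·e^(b·x) predict the same runtime. It rests on an assumption: the extracted figure does not tie each equation to a series, so the pairing used here follows the legend order (without Fermat, with Fermat, Lehman). The 2E-05 coefficient is also rounded to one significant figure in the source, so the result is only approximate. The names `FITS`, `predicted_runtime`, and `crossover_bits` are hypothetical.

```python
from math import exp, log

# (a, b) for y = a * exp(b * bits), with y in microseconds, as printed with Figure 3.
# ASSUMPTION: the pairing of equations to series follows the legend order.
FITS = {
    "without_fermat": (0.0003, 0.3947),
    "with_fermat": (2e-05, 0.4074),
    "lehman": (3.4268, 0.2353),
}


def predicted_runtime(series, bits):
    """Runtime in microseconds predicted by the fitted curve for a given size."""
    a, b = FITS[series]
    return a * exp(b * bits)


def crossover_bits(series_1, series_2):
    """Size in bits at which the two fitted curves predict equal runtimes."""
    a1, b1 = FITS[series_1]
    a2, b2 = FITS[series_2]
    # a1 * exp(b1 * x) = a2 * exp(b2 * x)  =>  x = ln(a1 / a2) / (b2 - b1)
    return log(a1 / a2) / (b2 - b1)


if __name__ == "__main__":
    x = crossover_bits("lehman", "with_fermat")
    print(f"Lehman and the combined method cross near {x:.1f} bits")
```

Under that pairing, the Lehman curve and the combined-method curve cross close to 70 bits, in line with the recommendation above. The fits cover only sizes from 66 to 84 bits, so they should not be extrapolated to smaller numbers.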

Acknowledgements

The libraries at Oklahoma State University-Tulsa, the University of Tulsa, and Oral Roberts University provided the books referenced in the paper.

References

Bressoud D, Wagon S. 2000. A course in computational number theory. New York (NY): Key College Publishing. 367 p.
Coutinho SC. 1999. The mathematics of ciphers: number theory and RSA cryptography. Natick (MA): A K Peters. 196 p.
Crandall R, Pomerance C. 2001. Prime numbers: a computational perspective. New York (NY): Springer. 545 p.
Du Sautoy M. 2003. The music of the primes. New York (NY): HarperCollins Publishers. 335 p.
Estermann T. 1952. Introduction to modern prime number theory. London (United Kingdom): Cambridge University Press. 74 p.
Kranakis E. 1986. Primality and cryptography. Chichester (United Kingdom): John Wiley & Sons. 235 p.
Lehman RS. 1974. Factoring large integers. Mathematics of Computation 28:637-646.
Lehmer DH, Powers RE. 1931. On factoring large numbers. Bulletin of the American Mathematical Society 37:770-776.
Loweke GP. 1982. The lore of prime numbers. New York (NY): Vantage Press. 259 p.
Morrison MA, Brillhart J. 1975. A method of factoring and the factorization of F7. Mathematics of Computation 29:183-205.
Trappe W, Washington LC. 2006. Introduction to cryptography with coding theory. Upper Saddle River (NJ): Pearson Prentice Hall. 577 p.

Figures

Figure 1: Comparison of runtimes for numbers 8 to 32 bits in size. [Plot: runtime in microseconds (logarithmic scale) against number size in bits; series: Without Fermat, With Fermat, Lehman.]

Figure 2: Comparison of runtimes for numbers 34 to 64 bits in size. [Plot: runtime in microseconds (logarithmic scale) against number size in bits; series: Without Fermat, With Fermat, Lehman.]

Figure 3: Comparison of runtimes for numbers 66 to 84 bits in size with exponential trendlines. [Plot: runtime in microseconds (logarithmic scale) against number size in bits; series: Without Fermat, With Fermat, Lehman. Trendlines as printed: y = 0.0003e^(0.3947x), R² = 0.9971; y = 2E-05e^(0.4074x), R² = 0.9954; y = 3.4268e^(0.2353x), R² = 0.9997.]

Figure 4: Optimal numbers of Fermat's test iterations for numbers 66 to 78 bits in size. [Plot: base-10 logarithm of the optimal number of Fermat's test iterations against number size in bits.]

Figure 5: Comparison of runtimes for numbers 8 to 84 bits in size. [Plot: runtime in microseconds (logarithmic scale) against number size in bits; series: Without Fermat, With Fermat, Lehman.]