Integer Factorisation





Integer Factorisation
Vassilis Kostakos
Department of Mathematical Sciences, University of Bath
vkostakos@yahoo.com
http://www.geocities.com/vkostakos
May 7, 2001

MATH0082 Double Unit Project
Comparison of Integer Factorisation Algorithms
Candidate: Kostakos, V
Supervisor: Russell Bradford
Checker:
Review date: December 2000
Final submission date: 10 May 2001
Brief: Implement and compare several integer factorisation algorithms.

Mark scheme:
  Algorithms descriptions   15    α
  Implementation            15    α
  Comparison tests          30   2α
  Report and analysis       40   2α
  Total                    100   6α

Note: All the software files which are referred to by this report may be found on the BUCS filesystem at: ~ma9vk\public_html\project\

Abstract

The problem of integer factorisation has been around for a very long time. This report describes a number of algorithms and methods for performing factorisation. In particular, the Trial Divisions and Fermat algorithms are discussed. Furthermore, Pollard's ρ and p − 1 methods are described, and finally Lenstra's Elliptic Curves method. The theory behind each algorithm is explained, so that the reader can become familiar with the process. Then, sample pseudocode is presented, along with the expected running time for each algorithm. Finally, this report includes test data for each algorithm.

CONTENTS

1 Introduction

Part I Documentation

2 Project Plan
  2.1 Resources
  2.2 Scheduling
  2.3 Coding standards
3 Requirements
  3.1 User Definition
  3.2 Functional Requirements
  3.3 Non-functional Requirements
  3.4 Software and Hardware Requirements
4 Testing
  4.1 Correctness tests
  4.2 Performance tests

Part II Implementation

5 Tools for factorisation
  5.1 Greatest common divisor
  5.2 Fast exponentiation modulo
  5.3 Primality testing
6 Trial divisions algorithm
  6.1 Description of trial divisions algorithm
  6.2 Implementation of trial divisions algorithm
  6.3 Running time
  6.4 Remarks
7 Fermat's algorithm
  7.1 Quick description of Fermat's algorithm
  7.2 Detailed description of Fermat's algorithm
  7.3 Implementation of Fermat's algorithm
  7.4 Running time
  7.5 Remarks
8 The Pollard ρ method
  8.1 Description of the algorithm
    8.1.1 Constructing the sequence
    8.1.2 Finding the period
    8.1.3 Calculating the factor
  8.2 Implementation of Pollard ρ
  8.3 Running time
  8.4 Remarks
9 The Pollard p − 1 method
  9.1 Description of the algorithm
  9.2 A slight improvement
  9.3 Implementation of Pollard p − 1
  9.4 Running time
  9.5 Remarks
10 Elliptic Curves Method
  10.1 Introduction to elliptic curves
    10.1.1 Elliptic curves as a group
    10.1.2 Elliptic curves modulo n
    10.1.3 Computation on elliptic curves
    10.1.4 Factorisation using elliptic curves
  10.2 Implementation of elliptic curves method
  10.3 Running time
  10.4 Remarks
11 Overall Comparison
12 Epilogue

Part III Appendices

A Benchmarks
  A.1 Tests with products of two nearby primes
  A.2 Tests with products of three nearby primes
  A.3 Tests with products of three arbitrary primes
B Program output
  B.1 Tests output
  B.2 Combined factorisation output
  B.3 Biggest factorisation

Bibliography

LIST OF TABLES

2.1 My schedule
A.1 Products of two nearby primes
A.2 Products of three nearby primes
A.3 Products of three arbitrary primes

LIST OF FIGURES

5.1 Pseudocode for computing gcd(a, b) using the Euclidean algorithm
5.2 Pseudocode for fast computation of a^b mod m
6.1 Pseudocode for trial divisions algorithm
6.2 Results of tests on Trial divisions algorithm
7.1 Pseudocode for Fermat's algorithm
7.2 Results of tests on Fermat's algorithm
8.1 Pseudocode for the Pollard ρ algorithm
8.2 Results of tests on the Pollard ρ algorithm
9.1 Pseudocode for the Pollard p − 1 algorithm
9.2 Results of tests on Pollard p − 1 algorithm
10.1 Pseudocode for main loop of Elliptic curves method
10.2 Pseudocode for NEXTVALUES function of Elliptic curves method
10.3 Results of tests on Elliptic curves algorithm

CHAPTER 1 Introduction

This report, along with the software which I wrote, constitutes my final year project. The main objective of this report is to strike a balance between a theoretical explanation of certain factorisation algorithms and a description of my source code.

Background information

The problem of factorisation has been known for thousands of years. However, only recently did it become popular. This sudden interest in factorisation was due to advances in cryptography, and mainly the RSA public key cryptosystem. The problem of factorisation may be stated as follows: Given a composite integer N, find a nontrivial factor f of N.

There are many factorisation algorithms. Some of them are heavily used, while others mainly serve educational purposes. Factorisation algorithms may be distinguished in two different ways:

Deterministic or nondeterministic.
Run time depends on the size of N or on the size of f.

Deterministic algorithms are algorithms which are guaranteed to find a solution if we let them run long enough. On the contrary, nondeterministic algorithms may never terminate. The most usual distinction, however, deals with the runtime of the algorithm. The running time of recent algorithms depends on the size of the input number N, whereas older algorithms depend on the size of the factor f which they find.

About my project

In doing my project, I tried to cover a broad range of algorithms and methods. The running time of all the algorithms I have implemented depends on the size of the factor f which they find. Furthermore, only the first two algorithms which I describe are deterministic.

About this report

This report is divided into three parts. The first part deals with my preparation and scheduling for the project. Matters like requirements, resources, etc. are all covered there. The second part presents an account of all the algorithms I implemented. For each algorithm, I have tried to describe the theoretical background in order to help the reader understand what is going on. Then, I describe my implementation of the algorithm, along with some pseudocode for illustration purposes. Finally, I present my test results in the form of a graph. (In Appendices A and B I have included a set of tests on all of the algorithms.) The third part consists of the Appendices, in which I have included sample timings of the algorithms, as well as output of my program.

Part I Documentation

CHAPTER 2 Project Plan

2.1 Resources

I started planning for this project by writing down what resources I thought I was going to need in order to successfully complete the project. In terms of hardware, all I needed was a computer, which I already owned. Furthermore, I could use the computing facilities of the University as well. In terms of software, I decided that I wanted to write the program in C. There are lots of different environments for creating C programs; I used LCC-WIN32 version 3.3 for Windows, which includes an ANSI C compiler. My main concern was finding a suitable arbitrary-precision library which I could use with my program. In the end, I decided to use Mike's Arbitrary Precision Math Library (MAPM) version 3.70, written by Michael C. Ring (ringx004@tc.umn.edu).

Furthermore, I thought that I would also need some books or papers to help me. In addition to the resources listed in the bibliography section, I also made use of the following programming books:

Walter A. Burkhard, C for Programmers, Wadsworth, Inc., 1988.
Morton H. Lewin, Elements of C, Piscataway, New Jersey.
M.I. Bolsky, The C Programmer's Handbook, AT&T Bell Laboratories, Prentice Hall, Inc.
Leslie Lamport, LaTeX User's Guide and Reference Manual, Addison-Wesley Publishing Company, 1994.

2.2 Scheduling

The next step in planning my project was to devise a schedule which would roughly guide my work. In table 2.1 you can see my schedule, or to be precise, the final version of my schedule.

Week              Tasks
1  (Semester 1)
2  (Semester 1)
3  (Semester 1)   Signed up for LEGO maze-solving robot
4  (Semester 1)   Preliminary research on robot movement, etc.
5  (Semester 1)
6  (Semester 1)   Wrote first version of software for robot
7  (Semester 1)   NEW PROJECT: Integer factorisation
8  (Semester 1)   Looking for a maths library
9  (Semester 1)   Found the MAPM library, performance tests
10 (Semester 1)   Implement trial divisions algorithm
11 (Semester 1)   Wrote low-level functions for MAPM
12 (Semester 1)   Implemented Fermat's algorithm
(Christmas)
13-15 (Exams)     Exams; revise for exams
1  (Semester 2)   Research into Pollard's algorithms
2  (Semester 2)   Implement MODEXPO, GCD, PRIME functions
3  (Semester 2)   Pollard's ρ algorithm
4  (Semester 2)   Tests on all algorithms so far implemented
5  (Semester 2)   Pollard p − 1; read about elliptic curves
6  (Semester 2)   Elliptic curves algorithm and testing
7  (Semester 2)   Function interface modifications, more tests
8  (Semester 2)   Developed COMBINED function; started report
(Easter)          Test result analysis, graph generation, report writing
9  (Semester 2)   Report writing
10 (Semester 2)   Report revision, final version preparation
11 (Semester 2)   DEADLINE

Table 2.1: My schedule

I tried to follow my schedule as closely as possible. Sometimes I made changes to it, in order to accommodate new tasks I thought were required. The final version of my schedule resembles my initial schedule quite closely; however, I have made a number of changes.

2.3 Coding standards

It is always a good idea to specify some coding standards before starting a project, even if only one person is going to do any coding. First of all, I should say that all the source files were compiled using the -ansi flag. I received no warning messages when compiling the final version of my program. Here are some guidelines which I followed:

Function names beginning with m belong to the MAPM library. Specifically, the functions that begin with m_apm are functions which are defined in the library itself. Any other functions beginning with m are macros of functions in the MAPM library, which I defined in order to shorten the code.

Function names beginning with M are low-level functions which interface with the MAPM library. I wrote these functions in order to improve the performance of the program, and to shorten the code as well.

The prototypes for functions in file xxx.c are placed in the file xxx.h.

It was obvious that the software program I was creating was quite modular, and could be built in big chunks at a time. Therefore, I decided to use a common algorithm testing interface. This meant that I would place each algorithm in a separate file, and use a common file to call the factorisation routines. This would also make it possible to call all of my algorithms from another function in an effort to factorise a really hard number. By doing the above, I was planning to minimise the effort of adding a new algorithm to my program, and to make the testing of different algorithms quicker and easier.

CHAPTER 3 Requirements

This chapter describes all the requirements and specifications that I used for implementing this project. Of course, these requirements were by no means static. In fact, they changed quite often as I moved further into the project. A change in the requirements would often reflect a new idea that I came up with, or an idea that I wanted to drop. Therefore, these are the requirements as they stood at the end of the project.

3.1 User Definition

The first thing that I had to specify was my target audience. It helps a lot to know who you want to look at your work. I guess it would be too naive to assume that my audience consisted only of the two examiners who would assess my project. On the other hand, I wouldn't like to embark on a commercial software project targeting a large piece of the market. With the above in mind, I chose my audience to be the academic community. Such an audience is not really keen on software with bells and whistles, but is more interested in the theoretical background. In fact, I believe that my project could be used for educational purposes, because it demonstrates a simple implementation of some fundamental mathematical concepts. Of course, when I refer to my project, I refer to both the software and the final report. Therefore, my choice of the academic community as an audience should have an effect on both.

3.2 Functional Requirements

It was clear that my software should accept as input an integer N, and produce as output a factorisation p1 · p2 ⋯ pn of N. But there is more to it than just that. A very important requirement was that the software should be able to perform arbitrary-precision arithmetic. In other words, it should be able to deal with really long numbers, and perform calculations on them.

Also, the software should output the computational time that was required to complete the factorisation, and also verify that the results it gives are correct. This should also be done while running long tests, in which case the results should be stored in a disk file.

3.3 Non-functional Requirements

The most important element of the non-functional requirements deals with the algorithms that the program will implement. I decided to implement the following algorithms:

Trial divisions algorithm
Fermat's algorithm
Pollard's ρ method
Pollard's p − 1 method
Elliptic curves method

The reason I chose not to implement one of the big algorithms, namely MPQS or NFS, is that I did not have enough time. By applying a variety of smaller algorithms, I got the flavour of the different methods and theories on which the very advanced algorithms are based.

In terms of the user interface, I believe that a GUI was not really required. Therefore, I chose to implement a command line interface, with simple input and output. The source code of the program was divided into the following files:

MAIN.C: This is the main file of the program. Nothing special here.
MAIN.H: Main header file. Contains the definition of the output destination for parameterised compilation.
AL_TRIAL.C: This file contains the source code for the trial divisions algorithm.
AL_FERMT.C: This file contains the source code for Fermat's method.
AL_PRHO1.C: This file contains the source code for Pollard's p − 1 method.
AL_PRHO.C: This file contains the source code for Pollard's ρ method.
AL_ELLCRVS.C: This file contains the source code for the elliptic curves method.
TESTS.C: Here are defined some tests for measuring the speed of each algorithm.
TESTS.H: This file contains parameters for the testing routine.
MYLIB.C: In this file I have included some of my tool functions, as well as some low-level functions for the arbitrary-precision arithmetic library I used.

MYLIB.H: This file includes function prototypes as well as macro definitions.
COMBINED.C: This file contains a function which utilises all the factoring algorithms. It tries to factor a given number by applying the different algorithms until the number has been completely factorised, or until it gives up.

3.4 Software and Hardware Requirements

I developed the software on an MS-Windows 98 machine with an Intel Celeron 433MHz processor. However, the software is capable of running on any machine which fulfills the minimum MS-Windows 95 requirements. Also, the source code may be compiled under a different operating system (Unix, Linux, etc.) in order to produce compatible versions of the program.

CHAPTER 4 Testing

The tests I performed for my project come in two flavours. First, I had to test my algorithms to see if they ran as expected, i.e. try to find bugs in the program. Second, I ran lots of performance tests, i.e. performed many factorisations in order to get a feeling for the performance of each algorithm.

4.1 Correctness tests

Most of my testing for correctness was performed in place within the program. Essentially, I had to make sure that my algorithms did indeed perform a factorisation. This is quite easy to check within the main flow of the program, so I felt that there was no need for separate testing modules. By just adding a couple of lines of code, I was able to test the correctness of my results every time I performed a factorisation. This way, I was constantly checking for errors, even when I was running the performance tests. I should note at this point that all my checking was performed (inevitably) using the facilities of the MAPM library. If the MAPM library contained any errors, my checks, and in fact my whole program, would be erroneous.

4.2 Performance tests

I had to perform two separate kinds of performance tests. First of all, I ran tests on the MAPM library, to get a feel for its capabilities. These tests were supposed to give me an approximation of how fast this library was, and how to judge my algorithms according to the library's capabilities. The second, and most important, kind of performance test was to benchmark the algorithms I implemented. These tests I usually performed after I felt that an algorithm was fully implemented. The results of these tests are included in the last section of each algorithm's chapter. I have tried to evaluate these tests, to the best of my abilities, and perhaps draw some conclusions.

In Appendix A I have performed a mini benchmarking scheme, where all the algorithms were given the same numbers, and their performance was timed and entered into a table. Although I did not run many of these tests, I felt that the results were well within what I expected. Finally, in Appendix B I have included some sample printouts of the performance tests for each algorithm, as well as sample output of my final program, which utilises all the algorithms in order to factorise an input number.

Part II Implementation

CHAPTER 5 Tools for factorisation

Before proceeding with the actual algorithms and their description, it would be useful to describe some tool algorithms which are used throughout the factorisation algorithms.

5.1 Greatest common divisor

This algorithm is by far the most used algorithm in my program. It is used by all the factorisation methods I have implemented. A very efficient routine for finding the greatest common divisor of two numbers a and b greatly enhances the performance of the factorisation algorithms. In figure 5.1 I have included pseudocode for finding gcd(a, b) using the Euclidean method.

WHILE b ≠ 0 DO
    temp := b
    b := a MOD b
    a := temp
RETURN a

Figure 5.1: Pseudocode for computing gcd(a, b) using the Euclidean algorithm

5.2 Fast exponentiation modulo

The idea behind fast exponentiation is that if the exponent is a power of 2 then we can exponentiate by successively squaring:

x^8 = ((x^2)^2)^2

x^256 = (((((((x^2)^2)^2)^2)^2)^2)^2)^2.

If the exponent is not a power of 2, then we use its binary representation, which is just a sum of powers of 2:

x^291 = x^256 · x^32 · x^2 · x^1.

The pseudocode shown in figure 5.2 will quickly compute a^b mod m.

n := 1
WHILE b > 0 DO
    IF b is odd THEN n := n · a MOD m
    b := b DIV 2
    a := a · a MOD m

Figure 5.2: Pseudocode for fast computation of a^b mod m

The way it works is that it finds the binary representation of b, while at the same time computing successive squares of a. The variable n records the product of the powers of a, and also contains the final result at the end of the computation.

5.3 Primality testing

Fermat's little theorem implies that if n is an odd prime then 2^(n−1) ≡ 1 (mod n); an odd composite n which satisfies this congruence is called a pseudoprime. Therefore, for any number n, we can compute the value 2^(n−1) (mod n) using the algorithm of figure 5.2, and then simply check whether the result is 1 or not. Despite the fact that this test is not a 100% guarantee of primality, in practice it is very useful. The test can be made stronger by computing the same values for the bases 2, 3, 5, 7, and then checking whether all of them yield the result 1.

CHAPTER 6 Trial divisions algorithm

The most straightforward algorithm for factorising an integer uses trial divisions. This algorithm is a good place to start, and it is quite easy to understand.

6.1 Description of trial divisions algorithm

This algorithm essentially tries to factorise an integer N using brute force. Starting at p = 2, the algorithm tries to divide N by every number until it succeeds. When this happens, it sets N := N/p, and resumes its operation. The way in which we choose our p can speed up, or slow down, the algorithm. For instance, we could pick our p's sequentially, by adding 1 at every iteration. Even better, we could divide N by 2 and 3, and then keep adding 2 to p in order to generate a sequence of odd numbers. The fastest way, but with greater memory requirements, is to generate a list of all prime numbers below a specified limit, and then assign those values to p.

6.2 Implementation of trial divisions algorithm

In figure 6.1 you can see the pseudocode of my implementation. I have not made any attempt to optimise this algorithm, and so I have used the naive way of choosing my p's, i.e. by adding 1 to the trial divisor at every iteration. As far as the source code is concerned, this function accepts the following parameters:

n: The number to be factorised. Note that no changes are made to the original value of this variable.
max: This variable sets the limit of the maximum test factor to be used.
factors: An array of MAPM variables, in which the factors of n will be written.

INPUT N
test_factor := 2
WHILE N > 1 AND test_factor < max DO
    IF N MOD test_factor = 0 THEN
        N := N / test_factor
        PRINT test_factor
    ELSE
        test_factor := test_factor + 1

Figure 6.1: Pseudocode for trial divisions algorithm

Figure 6.2: Results of tests on Trial divisions algorithm

6.3 Running time

According to [2], the expected running time of this algorithm is O(f (log N)^2), where f is the size of the factor found. The efficiency of this algorithm depends on the strategy for choosing the trial divisors p, as explained earlier. In figure 6.2 you can see the results of the tests of my implementation of this algorithm. The graph shows the factor size versus the amount of time it took, from a sample of 1427 factorisations. As expected, the amount of time the algorithm takes increases exponentially with the number of digits of the factor found. Practically, after 6 or 7 digits, this algorithm becomes too expensive.

6.4 Remarks

One of the features of this algorithm is that if we let it run long enough on a prime N_p, it will prove the primality of N_p. In most cases this is not wanted, and it is regarded as a waste of effort. However, this algorithm is very fast at finding prime factors of fewer than 5-6 decimal digits. Furthermore, it may be used to break up composite factors which are found using the algorithms described in the following chapters.

CHAPTER 7 Fermat s algorithm The first of the modern algorithms that I will describe is due to Fermat. It is not usually implemented these days unless it is known that the number to be factored has two factors which are relatively close to the square root of the number. However, this algorithm contains the key idea behind two of the most powerful algorithms for factorisation, the Quadratic Sieve and the Continued Fractions algorithm. 7.1 Quick description of Fermat s algorithm Fermat s idea is the following. Let the number to be factored be N. Suppose that N can be written as the difference of two squares, such as N = x 2 y 2 Instantly, we could write N as (x y)(x + y), and thus we have successfully broken N into two factors. The two factors may not be prime. In that case, we could recursively apply this process until we deduce a prime factorisation for N. 7.2 Detailed description of Fermat s algorithm The first step in describing this algorithm is to prove that every odd number N can be written as a difference of squares. Let us suppose that N = a b. Since we assumed N to be odd, then both a and b must be odd. Now, let us define x and y as follows: x = (a + b)/2, y = (a b)/2 Then, if we try to work out x 2 y 2 for the above values, we get x 2 y 2 = (a 2 + 2ab + b 2 ) (a 2 2ab + b 2 ) = ab = n. Fermat s algorithm works in the opposite direction from trial division. When we apply trial division, we start by looking at small factors, and we work our 18

way up towards sqrt(N). In Fermat's algorithm, we start by looking for factors near sqrt(N), and work our way down.

7.3 Implementation of Fermat's algorithm

Now I will describe an implementation of Fermat's algorithm. As I mentioned earlier, we search for integers x and y such that x^2 - y^2 = N. We can start with x = ceil(sqrt(N)) and y = 0, and try increasing y until x^2 - y^2 is equal to or less than N. If it is equal to N then we are done! If not, we increase x by one, and we iterate.

In order to optimise the algorithm further, let us set r = x^2 - y^2 - N. We therefore have success when r = 0, and all we really need to do is keep track of r. The value of r changes only when we increase x or y by one. When we replace x^2 with (x + 1)^2, the variable r increases by 2x + 1; we can express this increase by setting u = 2x + 1. Similarly, when y^2 is replaced by (y + 1)^2, the variable r decreases by 2y + 1; this decrease can be expressed as v = 2y + 1. (Note that when x and y increase by one, u and v increase by two.)

Having defined r, u, and v, we can proceed with our implementation. It turns out that we do not actually need the values x and y themselves. Since we start by setting x = ceil(sqrt(N)) and y = 0, it follows that initially u = 2*ceil(sqrt(N)) + 1, v = 1, and r = ceil(sqrt(N))^2 - N. All we now have to do is define an increase in x and an increase in y. According to the definitions of u and v, an increase of x by 1 increases r by u and u by 2. Similarly, an increase of y by 1 decreases r by v and increases v by 2. The algorithm is now completely defined: we keep increasing x and y (in practice u, v, and r) until r = 0. When r is zero, we can compute x + y and x - y as follows:

    x + y = (u + v - 2)/2,    x - y = (u - v)/2.

At this point, some pseudocode is the best way to fully understand my implementation; it is given in figure 7.1.
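The bookkeeping above can be sketched in C as follows. This is a toy version with 64-bit integers rather than the MAPM arbitrary-precision type of the actual implementation, and the function name is mine. It assumes n is odd; if n is prime, it still terminates, but only after roughly n/2 steps, with the trivial factorisation n * 1:

```c
/* Fermat's method with the u, v, r bookkeeping described in the text:
   the main loop uses only additions and subtractions. On return,
   *f1 = x + y and *f2 = x - y, so n == (*f1) * (*f2). Assumes n is odd. */
void fermat_factor(unsigned long long n,
                   unsigned long long *f1, unsigned long long *f2)
{
    unsigned long long s = 0;
    while (s * s < n) s++;            /* s = ceil(sqrt(n)) */
    unsigned long long u = 2 * s + 1; /* increasing x adds u to r */
    unsigned long long v = 1;         /* increasing y subtracts v from r */
    long long r = (long long)(s * s - n);
    while (r != 0) {
        if (r > 0) {
            /* keep increasing y while r is still positive */
            while (r > 0) { r -= (long long)v; v += 2; }
        } else {
            /* increase x */
            r += (long long)u; u += 2;
        }
    }
    *f1 = (u + v - 2) / 2;            /* x + y */
    *f2 = (u - v) / 2;                /* x - y */
}
```

For instance, 2581 = 29 * 89 corresponds to x = 59, y = 30, since 59^2 - 30^2 = 3481 - 900 = 2581.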
7.4 Running time

How much work is actually needed to find the factors of N? Let us suppose that N = a * b, with a < b. The factorisation is achieved when x = (a + b)/2. Since the starting value of x is sqrt(N), and b = N/a, the factorisation will take approximately

    (1/2)(a + N/a) - sqrt(N) = (sqrt(N) - a)^2 / (2a)

cycles. If the two factors of N are really close, i.e. if a = k*sqrt(N) with 0 < k < 1, then the number of cycles required in order to obtain the factorisation is ((1 - k)^2 / (2k)) * sqrt(N).

    INPUT N
    sqrt := ceil(sqrt(N))
    u := 2 * sqrt + 1
    v := 1
    r := sqrt * sqrt - N
    WHILE r <> 0
        IF r > 0 THEN
            /* Keep increasing y */
            WHILE r > 0
                r := r - v
                v := v + 2
        IF r < 0 THEN
            /* Increase x */
            r := r + u
            u := u + 2
    PRINT (u + v - 2) / 2
    PRINT (u - v) / 2

Figure 7.1: Pseudocode for Fermat's algorithm

This complexity is of the order O(c * N^(1/2)). However, the value of k can be very small, making this algorithm impractical. For instance, let us consider an ordinary case where a ≈ N^(1/3) and b ≈ N^(2/3). In such a case, the number of cycles necessary will be

    (sqrt(N) - N^(1/3))^2 / (2 N^(1/3)) = (N^(1/3))^2 (N^(1/6) - 1)^2 / (2 N^(1/3)) ≈ (1/2) N^(2/3),

which is considerably higher than O(N^(1/2)). Therefore this algorithm is only practical when the factors a and b are almost equal to each other.

In figure 7.2 you can see the test results of my implementation of Fermat's algorithm, from a sample of 2075 factorisations. Again, the graph shows the size of the factor found versus the amount of time it took. As we expected, this algorithm becomes too slow for factors with 7 or more digits. The graph follows the same trend as the trial divisions algorithm; in practice, however, we will prefer the trial divisions algorithm.

7.5 Remarks

This algorithm has a very nice feature: it does not involve multiplication. We have defined the variables r, u, v in such a way that we only need to perform addition and subtraction. This is why this algorithm is sometimes called factorising by addition and subtraction. However, the number of additions and subtractions that we have to perform is quite large. For example, in order to factorise 1783647329 = 84449 * 21121 we need to increase x 10551 times, and y 31664 times.

[Figure 7.2: Results of tests on Fermat's algorithm]

Additionally, this algorithm suffers from the same problem as trial division: in the worst case it proves primality. If it is given a prime number p, the result will eventually be the trivial factorisation 1 and p. In fact, this is even worse than proving primality with trial division: taking a = 1 in the cycle-count formula above, the total number of cycles required is (sqrt(N) - 1)^2 / 2, roughly N/2, which is much worse than trial division needs.

CHAPTER 8

The Pollard ρ method

This method is also called Pollard's second factoring method, or the Monte Carlo method because of its pseudo-random nature. It is based on a statistical idea [7] and has been refined by Richard Brent [1]. The ideas involved in finding the factors of a number N are described below.

8.1 Description of the algorithm

In short, the algorithm consists of the following steps:

1. Construct a sequence of integers {x_i} which is periodically recurrent (mod p), where p is a prime factor of N.

2. Search for the period of repetition, i.e. find i and j such that x_i ≡ x_j (mod p).

3. Calculate the factor p of N.

8.1.1 Constructing the sequence

The first step in finding a factor is to construct a sequence of periodically recurrent values. Let us consider a recursively defined sequence of numbers, according to the formula

    x_i ≡ f(x_{i-1}, x_{i-2}, ..., x_{i-k}) (mod m)

where m is an arbitrary integer, given the initial values x_1, ..., x_k. This means that the values x_{k+1}, x_{k+2}, ... can be computed by using the k previously computed values. However, all the values are computed mod m, and therefore there are only m possible values that each x_i can take. This means that there are at most m^s distinct sequences of s consecutive values. Therefore, after at most m^s + 1 such sequences, we will have two identical sequences of s consecutive numbers. Let these sequences of s values be x_i, x_{i+1}, ..., x_{i+s-1} and x_j, x_{j+1}, ..., x_{j+s-1}. Since these sequences are identical, it follows that their next elements, namely x_{i+s} and x_{j+s} respectively, will be the same. In fact, every pair of elements x_{i+s+n} and x_{j+s+n} will be identical thereafter.

This means that the sequence {x_i} is periodically repeated, except perhaps for a part at the beginning, which is called the aperiodic part. This part can be thought of as the tail of the Greek letter ρ: once we get off the tail, we keep cycling around the same sequence of values. That is why this algorithm is known as the Pollard ρ algorithm.

Back to our problem: instead of truly random integers {x_i}, it is sufficient to recursively compute a sequence of pseudo-random integers. The simplest way to do this would be to define a linear formula such as

    x_{i+1} ≡ a * x_i (mod N)

for a fixed a and x_0. Unfortunately, it turns out that this does not produce sufficiently random values to give a short period of recurring values; we would have to compute a lot of values before we could identify the period of recurrence. The next simplest choice is to use a quadratic formula such as

    x_{i+1} ≡ x_i^2 + a (mod N)

for a fixed a and x_0. It has been empirically observed that the above expression does produce sufficiently random values (1). Pollard found that in such a sequence {x_i} of integers mod N an element usually recurs after only about C * sqrt(N) steps.

8.1.2 Finding the period

The second step of the algorithm is to search for the period within the sequence {x_i}. To determine it in the most general case would require finding where a whole sequence of consecutive elements is repeated, which is quite tedious and is ruled out by the amount of labour involved. In the simplest case, however, where x_i is defined in terms of x_{i-1} only, the sequence will start to repeat as soon as any single x_k is the same as one of the previous values. Therefore, in order to find the period, we only need to compare each new x_j with the previous values. The original version of Pollard's method used Floyd's cycle-finding algorithm (2) for finding the period.
Suppose the sequence {x_i} (mod m) has an aperiodic part of length a and a period of length l. The period will then ultimately be revealed by the test: is x_{2i} ≡ x_i (mod m)?

The ρ method of Pollard has been made about 25% faster by a modification to the cycle-finding algorithm due to Brent [1]. As we saw above, Pollard searched for the period of the sequence x_i (mod m) by considering whether x_{2i} ≡ x_i (mod m). Instead, Brent keeps x_i fixed when i = 2^n - 1 and subsequently considers x_{2^n - 1} ≡ x_j (mod m) for 2^(n+1) - 2^(n-1) <= j <= 2^(n+1) + 1. In this way the period is discovered after fewer arithmetic operations than demanded by the original algorithm of Pollard. The saving in Brent's modification is due to the fact that the earlier x_i's are not computed twice, as they are in Floyd's algorithm.

(1) Note that this is not true if a is either 0 or -2.
(2) The proof of Floyd's cycle-finding algorithm is omitted.
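To make the cycle-finding step concrete, here is a small C sketch of Brent-style period detection for the one-term recurrence x_{i+1} = x_i^2 + a (mod m). It returns only the period length (not a factor of anything), and it uses the textbook power-of-two windowing form of Brent's idea rather than the exact index ranges quoted above; the function name is mine:

```c
/* Returns the length of the cycle (period) of the sequence
   x_{i+1} = (x_i^2 + a) mod m, starting from x0.
   Brent-style detection: save an "anchor" value and compare it against
   the next 1, 2, 4, 8, ... iterates, moving the anchor forward each time
   the window is exhausted. Assumes m < 2^32 so products fit in 64 bits. */
unsigned long long brent_period(unsigned long long x0, unsigned long long a,
                                unsigned long long m)
{
    unsigned long long power = 1, lam = 1;
    unsigned long long anchor = x0;
    unsigned long long hare = (x0 * x0 + a) % m;   /* f(x0) */
    while (anchor != hare) {
        if (power == lam) {        /* start a new power-of-two window */
            anchor = hare;
            power *= 2;
            lam = 0;
        }
        hare = (hare * hare + a) % m;
        lam++;
    }
    return lam;                    /* period length */
}
```

For example, with f(x) = x^2 + 1 and x_0 = 2, the sequence mod 13 is 2, 5, 0, 1, 2, 5, ... (period 4), while mod 97 it is 2, 5, 26, 95, 5, 26, 95, ... (tail 2, then period 3).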

8.1.3 Calculating the factor

Finally, consider the third and last step of Pollard's ρ method. If we have a sequence {x_i} that is periodic (mod N), how can we find p, the unknown factor of N? In section 8.1.1 we saw that the formula x_i ≡ x_{i-1}^2 + a (mod N) is sufficient to give us a desired sequence of pseudo-random numbers. Now, let us introduce the sequence

    y_i ≡ x_i (mod p)

where p is the unknown factor of N. This sequence has a few nice properties. The sequence {y_i} is periodic, so eventually we will have y_i ≡ y_j (mod p). But when this happens, x_i ≡ x_j (mod p), which means that p divides x_i - x_j. Therefore, by taking the GCD of x_i - x_j and N, we have a very good chance of finding a non-trivial divisor of N.

All this is nice, except that we do not know p, which means that we cannot compute {y_i}, and therefore we do not know when y_i will equal y_j. This is where the algorithms for finding the period in a periodic sequence are used. What we do is use Floyd's or Brent's algorithm to generate lots of pairs x_i and x_j, and each time we compute the GCD of x_i - x_j and N. Usually, the GCD will be one. But as soon as x_i ≡ x_j (mod p), the difference x_i - x_j will be divisible by p, which means that the GCD will be a non-trivial divisor of N.

A further improvement can be made to both versions of Pollard's ρ algorithm. Instead of computing the GCD at every cycle of the algorithm, we can accumulate the product of the differences of all the pairs we have considered. After, say, 20 cycles, we can compute the GCD of this product and N, without risking missing any factors of N. This way, the burden of computing a GCD at each cycle is reduced to one subtraction and one multiplication.

8.2 Implementation of Pollard ρ

As with the previous algorithms, I implemented Pollard's ρ algorithm in a single function. The parameters that the function expects are:

n: The number to be factorised.
   Note that no changes are made to the original value of this variable.

max: This variable sets the limit of the maximum test factor to be used.

factors: An array of MAPM variables, in which the factors of n will be written.

In figure 8.1 you can see the pseudocode of my algorithm. Note that the constant a, which is used to generate the pseudo-random sequence, is hard-coded in the function. It is quite an easy task to change its value, in order to get a different sequence of numbers.
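The whole method can be sketched in C as follows. This is a toy version: it uses Floyd's tortoise-and-hare stepping with the batched-GCD trick of section 8.1.3 (the batch length is a parameter here, where the project's version uses 20 and Brent's improvement), and it assumes n < 2^32 so that products fit in 64 bits, unlike the MAPM-based implementation; the function names are mine:

```c
static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    while (b != 0) { unsigned long long t = a % b; a = b; b = t; }
    return a;
}

/* Pollard rho with f(x) = x^2 + c, starting from x = 2.
   Differences |x_i - x_{2i}| are multiplied together mod n, and a single
   GCD is taken every `batch` steps. Returns a divisor of n; this may be
   n itself if the run fails, in which case the caller should retry with
   a different constant c. */
unsigned long long pollard_rho(unsigned long long n, unsigned long long c,
                               unsigned long long batch,
                               unsigned long long max_steps)
{
    unsigned long long x = 2, y = 2, product = 1;
    for (unsigned long long i = 1; i <= max_steps; i++) {
        x = (x * x + c) % n;              /* tortoise: one step */
        y = (y * y + c) % n;              /* hare: two steps    */
        y = (y * y + c) % n;
        unsigned long long d = x > y ? x - y : y - x;
        product = (product * d) % n;
        if (i % batch == 0) {
            unsigned long long g = gcd_ull(product, n);
            if (g > 1)
                return g;                 /* non-trivial divisor (or n) */
            product = 1;                  /* reset the accumulator */
        }
    }
    return n;
}
```

With batch = 1 a GCD is computed at every step; larger batches trade a small risk of picking up all the factors at once (g == n) for far fewer GCD computations.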

    INPUT N, c, max
    x1 := 2
    x2 := x1^2 + c                      /* Our chosen function */
    range := 1
    product := 1
    terms := 0
    WHILE terms < max DO
        FOR j := 1 to range DO
            x2 := (x2^2 + c) MOD N      /* Our chosen function */
            product := product * (x1 - x2) MOD N
            terms := terms + 1
            IF (terms MOD 20 == 0) THEN
                g := gcd(product, N)
                IF g > 1 THEN
                    PRINT g
                    N := N / g
                product := 1
        next_values(x1, x2, range)      /* Brent's improvement */

Figure 8.1: Pseudocode for the Pollard ρ algorithm

8.3 Running time

Under plausible assumptions, the expected running time of Pollard's ρ algorithm is O(f^(1/2) (log N)^2), where f is the size of the factor found. Figure 8.2 shows the test results of my implementation, from a sample of 4997 factorisations.

There are a number of conclusions and comments to be made about this funny-looking graph. First of all, we have to remember that this algorithm is not deterministic but probabilistic; therefore, the results might seem to contradict themselves at some points. For instance, at first glance one might think that this algorithm takes more time to find small factors than larger ones. However, this is not entirely true. You should keep in mind that this graph only contains timings of successful factorisations. So, although the times for 20-digit factors are quite small, the success rate of the algorithm is quite low for such factors.

The best way to explain this graph is to observe its patterns. There is an obvious pattern for each group of factors: the timings seem to build up slowly, and then explode very high. If this pattern holds for 20-digit numbers as well, we can see that the graph only contains the first part of the pattern, where the timings are quite small. If we had enough space to fit the entire graph, then by the time the pattern for 20-digit factors completed itself, its height could be as much as the Eiffel Tower's!

8.4 Remarks

[Figure 8.2: Results of tests on the Pollard ρ algorithm]

The method that I have just described for finding prime factors of composite integers is probabilistic. This means that we have to be prepared to be unlucky on occasion, and not get any results. If we run the Pollard ρ algorithm and do not find any prime divisors, that might be because there are no prime divisors in the appropriate interval, or it might be plain bad luck. What we need to do in such situations is change our luck. For this algorithm, this means changing certain constants, such as the constant a in the recursive function described in section 8.1.1. Then, of course, we have to know when it is time to give up, and perhaps try another algorithm. In practice, after running trial division up to 10^6 or 10^7, one would run the Pollard ρ algorithm for a while. Keep in mind, however, that if all the prime factors are substantially larger than 10^12, this algorithm will not usually work.

CHAPTER 9

The Pollard p-1 method

The next algorithm that I will consider is known as the Pollard p-1 algorithm [6]. It formalises several rules which have been known for some time. The principle here is to use information concerning the order of some element a of the group M_N of primitive residue classes mod N to deduce properties of the factors of N.

9.1 Description of the algorithm

This algorithm is pretty much based on Fermat's little theorem: if p is prime and a ≢ 0 (mod p), then

    a^(p-1) ≡ 1 (mod p).

Now, let us suppose that the number to be factored is N, and that one of its prime factors is p. Also, assume that p-1 divides Q. Using Fermat's theorem, under the assumption that (p-1) | Q, we arrive at a^Q ≡ 1 (mod p), and therefore p divides a^Q - 1. Now, we can apply the GCD to N and a^Q - 1 to get p or some other non-trivial divisor of N.

Our problem now is to find a Q such that (p-1) | Q, keeping in mind that we do not know p. This can be done in two ways. The easiest way is to set Q = max! for some limit max. The value a^Q (mod N) can be computed quickly, since

    a^(max!) = ((((a^1)^2)^3)^4 ...)^max,

and because, as we saw in section 5.2, exponentiation modulo N is very fast. Note that a can be any number, as long as it is relatively prime to N. Another, more efficient way to choose Q is to set Q = p_1 p_2 ... p_k, where each p_i is a prime number less than a specified limit. In such a case we should also append to Q some additional multiples of the small primes, so as not to miss out any factors of N. This will cut the number of exponentiations required by about a factor of eight.

    INPUT N, c, max
    m := c
    FOR i := 1 to max DO
        m := modexpo(m, i, N)
        IF (i MOD 10 == 0) THEN
            g := gcd(m - 1, N)
            IF g > 1 THEN PRINT g

Figure 9.1: Pseudocode for the Pollard p-1 algorithm

No matter how we choose Q, we have to keep in mind that the size of Q is essentially what limits our search space. For instance, by choosing Q = 10000! we are assuming that p-1 has prime factors less than 10000.

9.2 A slight improvement

In practice, we do not know how close we have to get to max before we have picked up the first prime divisor of N, and we do not want to go so far that we pick them all up. For that reason, we periodically check the value of gcd(a^Q - 1, N). If it is still 1, we continue. If it is N, then we have picked up all the divisors of N at once; in such a case we need either to backtrack a bit, or to try a different a.

9.3 Implementation of Pollard p-1

In this section I will describe how I implemented the algorithm, as well as discuss certain issues that came up while implementing it. The function accepts the following parameters:

num: the number to be factorised

c, max: the base and the factorial limit, so that the algorithm computes c^(max!) (mod num)

factors: an array where the factors of num are stored

The algorithm is essentially a loop which runs until we have reached the specified limit of iterations, which is max. In most of the literature this limit is set to 10000, so I decided to follow this guideline. My implementation uses the simple way of choosing Q, i.e. setting Q = 10000! and subsequently calculating 2^(10000!) (mod N). This is done using the procedure modexpo, which was described in section 5.2. Every 10 cycles, the program calculates the GCD of the current 2^(k!) - 1 (mod N) and N, using the algorithm described in section 5.1. If the GCD is greater than one, then it is written to the factors array. Subsequently, the program sets N := N/gcd. If the remaining N is composite, the procedure is applied recursively to the new N; otherwise the function terminates.
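Here is a compact C sketch of Figure 9.1. The GCD-checking interval is a parameter (the project checks every 10 cycles), and 64-bit integers replace MAPM, so it assumes n < 2^32; the function names other than modexpo are mine:

```c
/* Right-to-left binary modular exponentiation ("modexpo"). */
static unsigned long long modexpo(unsigned long long b, unsigned long long e,
                                  unsigned long long n)
{
    unsigned long long r = 1 % n;
    b %= n;
    while (e > 0) {
        if (e & 1) r = (r * b) % n;
        b = (b * b) % n;
        e >>= 1;
    }
    return r;
}

static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    while (b != 0) { unsigned long long t = a % b; a = b; b = t; }
    return a;
}

/* Pollard p-1: after the i-th pass, m = c^(i!) mod n. Every `batch`
   passes we check gcd(m - 1, n). Returns a non-trivial divisor of n,
   or 1 on failure; g == n would mean we overshot and picked up all the
   divisors at once, in which case a real implementation backtracks. */
unsigned long long pollard_p1(unsigned long long n, unsigned long long c,
                              unsigned long long max, unsigned long long batch)
{
    unsigned long long m = c % n;
    for (unsigned long long i = 1; i <= max; i++) {
        m = modexpo(m, i, n);
        if (i % batch == 0) {
            unsigned long long g = gcd_ull((m + n - 1) % n, n);
            if (g > 1 && g < n)
                return g;
        }
    }
    return 1;
}
```

A hand-checkable case: 3131 = 31 * 101, where 31 - 1 = 30 divides 5! = 120 but 101 - 1 = 100 does not, so the factor 31 falls out at i = 5.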

[Figure 9.2: Results of tests on the Pollard p-1 algorithm]

The pseudocode of my implementation is shown in figure 9.1. I should note that my implementation makes no effort to backtrack or change a in case the GCD is equal to N. It is up to the caller of the function to choose an appropriate a (c) and limit of iterations (max).

9.4 Running time

In the worst case, Pollard's p-1 algorithm takes as long as the trial divisions algorithm. However, it usually does better, provided that we are lucky enough to find a factor. In figure 9.2 I have plotted the results of 14217 factorisations using this algorithm. As previously, the graph contains timings derived only from the successful tests, not the ones that failed.

The patterns in the graph closely resemble the graph of Pollard's ρ algorithm. However, there is another point to be made about this algorithm. Apparently, Pollard's p-1 algorithm is much faster than Pollard's ρ algorithm, but succeeds less often. It turns out that the algorithm seldom gives back results, but when it does, it is very fast. This is why I had to perform so many tests on this algorithm: more than 70% of the tests failed.

9.5 Remarks

This algorithm has the same problems as the previous one. As described earlier, at some point we might find the GCD to be equal to N. In such cases we will want to change the base a to a different integer. Also, the algorithm might not terminate if p-1 has only large prime factors.

It has been found statistically that the largest prime factor of an arbitrary integer N usually falls around N^0.63. Therefore, with a limit of 10000, Pollard p-1 will find prime factors that are less than about two million. We should keep in mind, however, that there is a fairly wide distribution of the largest prime factor of N, and therefore factors much larger than two million may be found. According to [2], the largest factor found by this algorithm during the Cunningham project is the 32-digit factor 49858990580788843054012690078841 of 2^977 - 1.

I should also note that because of Pollard p-1, the RSA public-key cryptosystem has restrictions on the primes a and b that are chosen. Essentially, if a-1 or b-1 has only small prime factors, then Pollard p-1 will break the encryption very quickly.

CHAPTER 10

Elliptic Curves Method

Factorisation based on elliptic curves is a relatively new method. As its name implies, this method is based on the theory of elliptic curves. First, I will briefly describe what elliptic curves are, and demonstrate the theory behind them. Then, I will go on with the description of the factorisation method using elliptic curves.

10.1 Introduction to elliptic curves

Elliptic curves are equations of the form

    y^2 = x^3 + ax + b,

where a and b are constants such that 4a^3 + 27b^2 ≠ 0. These curves have the curious property that if a line intersects the curve at two points, then it will also have a third point of intersection. (A tangent to the curve is considered to have two points of intersection at the point of tangency.) If we know the two points of intersection (x_1, y_1) and (x_2, y_2), we can compute the slope λ of the line, as well as the third point of intersection, in the following way:

    λ = (3x_1^2 + a) / (2y_1)        if x_1 = x_2,
    λ = (y_1 - y_2) / (x_1 - x_2)    otherwise

    x_3 = λ^2 - x_1 - x_2
    y_3 = λ(x_3 - x_1) + y_1

10.1.1 Elliptic curves as a group

In order to perform factorisation with elliptic curves, we need to make the set of points on an elliptic curve into a group. To do this, we must define a binary operation, the identity element, as well as the inverses.

We start by defining the binary operation ⊕ as follows:

    (x_1, y_1) ⊕ (x_2, y_2) = (x_3, -y_3)

where x_3 and y_3 are computed as shown earlier. Note that the new point is not the third point of intersection, but its reflection across the x-axis; it is still, however, on the same elliptic curve. Now we proceed by defining the point at infinity, written ∞, to be the identity element of our group, together with the inverses:

    (x, y) ⊕ ∞ = ∞ ⊕ (x, y) = (x, y)
    (x, y) ⊕ (x, -y) = ∞

With the above definitions, we have managed to define both the identity element and the inverses. The identity element ∞ can be thought of as a point far north, such that every vertical line passes through it.

In terms of notation, E(a, b) denotes the group of rational points on the curve y^2 = x^3 + ax + b, where 4a^3 + 27b^2 ≠ 0, together with the point ∞. Also, by (x_i, y_i) we denote (x_1, y_1)#i, where

    (x_1, y_1)#i = (x_1, y_1) ⊕ (x_1, y_1) ⊕ ... ⊕ (x_1, y_1)    (i times).

10.1.2 Elliptic curves modulo n

All our reasoning from the previous sections still applies to elliptic curves modulo n. If x_1 ≡ x_2 (mod n) and y_1 ≡ -y_2 (mod n), then (x_1, y_1) ⊕ (x_2, y_2) = ∞. Otherwise, let s be the inverse (mod n) of 2y_1 when x_1 ≡ x_2, or of x_1 - x_2 when they differ. As before, we define:

    λ ≡ (3x_1^2 + a) s    if x_1 ≡ x_2 (mod n),
    λ ≡ (y_1 - y_2) s     otherwise

    x_3 ≡ λ^2 - x_1 - x_2 (mod n)
    y_3 ≡ λ(x_3 - x_1) + y_1 (mod n)

Furthermore, we will define the binary operation as

    (x_1, y_1) ⊕ (x_2, y_2) ≡ (x_3, -y_3) (mod n),

and we will define (x_i, y_i) mod n as

    (x_i, y_i) ≡ (x_1, y_1)#i (mod n).

Finally, E(a, b)/n will denote the elliptic group modulo n whose elements are pairs (x, y) of non-negative integers less than n satisfying y^2 ≡ x^3 + ax + b (mod n), together with the point ∞.

10.1.3 Computation on elliptic curves

In order to implement factorisation, we need a fast way of computing (x, y)#i. Given the first coordinate x_1 of (x_1, y_1), we can compute the first coordinate of (x_2, y_2) = (x_1, y_1)#2 as follows:

    x_2 = ((x_1^2 - a)^2 - 8b x_1) / (4(x_1^3 + a x_1 + b)).

Therefore, given the first coordinate of (x, y)#i, we can compute the first coordinate of (x, y)#2i using the above formula. We can extend this to 2i + 1 with the following formula:

    x_{2i+1} = ((a - x_i x_{i+1})^2 - 4b(x_i + x_{i+1})) / (x_1 (x_i - x_{i+1})^2).

As you can see, such computations involve lots of fractions. We can avoid using rational numbers if we introduce the notion of a triplet (X, Y, Z), where x = X/Z, y = Y/Z, and X, Y, and Z are integers. Another nice feature of this notation is that the identity element now has the explicit representation (0, Y, 0), where Y can be any integer. If we define (X_i, Y_i, Z_i) = (X, Y, Z)#i, we can adjust our previous formulas to our new notation:

    X_{2i} = (X_i^2 - a Z_i^2)^2 - 8b X_i Z_i^3
    Z_{2i} = 4 Z_i (X_i^3 + a X_i Z_i^2 + b Z_i^3)

    X_{2i+1} = Z_1 ((X_i X_{i+1} - a Z_i Z_{i+1})^2 - 4b Z_i Z_{i+1} (X_i Z_{i+1} + X_{i+1} Z_i))
    Z_{2i+1} = X_1 (X_{i+1} Z_i - X_i Z_{i+1})^2

I should note that for our purposes, we do not need to calculate the second coordinate Y of the triplets. Still, Y_i can always be recovered from X_i and Z_i. Also, we can use our triplets modulo n, as long as we do all our computations modulo n.

10.1.4 Factorisation using elliptic curves

The method I will be describing is essentially due to A. K. Lenstra and H. W. Lenstra, Jr. Let N be a composite number relatively prime to 6 (in practice, this means that N has no small factors). We randomly choose a for our elliptic curve, and a random point (x, y) on the curve. We can now compute b as follows:

    b ≡ y^2 - x^3 - ax (mod N).

We convert to triplets (X, Y, Z), with our initial triplet being (x, y, 1). If p is a prime number which divides N, and the order of E(a, b)/p divides k!, then

    (X, Y, Z)#k! = ((((X, Y, Z)#1)#2)#3 ...)#k

will be the identity element in E(a, b)/p (but not in E(a, b)). This simply means that there is at least one coordinate of (X, Y, Z)#k! which is not divisible by N, but all the coordinates are divisible by p. Since Z_{k!}
is divisible by p, there is a good chance that the greatest common divisor of Z_{k!} and N is a non-trivial divisor of N.
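The X, Z doubling step of section 10.1.3 can be sketched in C as below. This is only the single doubling formula with 64-bit arithmetic (so n < 2^32 is assumed), not the full #k! ladder of the implementation, and the helper names are mine. The final GCD illustrates how a factor would be extracted from a Z coordinate:

```c
typedef unsigned long long u64;

static u64 mmul(u64 x, u64 y, u64 n) { return (x * y) % n; }
static u64 msub(u64 x, u64 y, u64 n) { return (x % n + n - y % n) % n; }

static u64 gcd_u64(u64 a, u64 b)
{
    while (b != 0) { u64 t = a % b; a = b; b = t; }
    return a;
}

/* One doubling step in X, Z coordinates on y^2 = x^3 + ax + b (mod n):
     X2 = (X^2 - a Z^2)^2 - 8 b X Z^3
     Z2 = 4 Z (X^3 + a X Z^2 + b Z^3)
   with every intermediate result reduced mod n. */
void xz_double(u64 X, u64 Z, u64 a, u64 b, u64 n, u64 *X2, u64 *Z2)
{
    u64 XX = mmul(X, X, n), ZZ = mmul(Z, Z, n);
    u64 Z3 = mmul(ZZ, Z, n);
    u64 t  = msub(XX, mmul(a, ZZ, n), n);              /* X^2 - a Z^2 */
    *X2 = msub(mmul(t, t, n),
               mmul(8, mmul(b, mmul(X, Z3, n), n), n), n);
    u64 s = mmul(X, XX, n);                            /* X^3         */
    s = (s + mmul(a, mmul(X, ZZ, n), n)) % n;          /* + a X Z^2   */
    s = (s + mmul(b, Z3, n)) % n;                      /* + b Z^3     */
    *Z2 = mmul(4, mmul(Z, s, n), n);
}
```

As a spot check: on the curve y^2 = x^3 + x - 1 (a = 1, b ≡ 90 mod 91) with starting point (1, 1), i.e. triplet (1, 1, 1), doubling gives (X_2, Z_2) = (8, 4), and X_2/Z_2 ≡ 2 (mod 91) is indeed the x-coordinate of the doubled point. A full run would keep doubling and adding up to (X, Y, Z)#k! and then take gcd(Z_{k!}, n).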

    INPUT N, X, Y, a, max
    b := (Y^2 - X^3 - a*X) MOD N
    g := gcd(4a^3 + 27b^2, N)
    IF g > 1 THEN PRINT g
    Z := 1
    k := 2
    WHILE k <= max DO
        FOR i := 1 to 10 DO
            NEXTVALUES(X, Z, k, N, a, b)
            k := k + 1
        g := gcd(Z, N)
        IF g > 1 THEN PRINT g

Figure 10.1: Pseudocode for the main loop of the Elliptic Curves Method

10.2 Implementation of the elliptic curves method

My implementation of the Elliptic Curves Method consists of two big functions and four smaller functions. The first two are shown in figures 10.1 and 10.2. The main loop of the algorithm uses the same structure as some of our previous algorithms: essentially, we loop many times, and at each iteration we take the GCD of N and Z. This function accepts the following parameters:

n: The number to be factorised. Must be relatively prime to 6.

X, Y: Arbitrary integers between 1 and n, the coordinates of the starting point.

a: An arbitrary integer, the first parameter of our curve.

max: This variable sets the limit of the maximum iterations.

factors: An array of MAPM variables, in which the factors of n will be written.

Most of the work, however, is done in the NEXTVALUES function. This function is responsible for calculating the first and third coordinates of our triplets. It uses the binary expansion of k in order to find the results; by doing this, it manages to compute X_k and Z_k by successively computing X_{2i} or X_{2i+1} in a minimum number of steps. The four small functions that I mentioned use the formulas from section 10.1.3 to compute the values X_{2i}, X_{2i+1}, Z_{2i}, Z_{2i+1}. Their implementation is quite straightforward.

10.3 Running time

According to [2], under plausible assumptions, the expected running time of this algorithm is O(exp(sqrt(c ln f ln ln f)) (log N)^2), where f is the factor found and c ≈ 2 is a constant.

    /* Calculates the first and third coordinates of (X, Y, Z)#k (mod N). */
    INPUT X, Z, k, n, a, b
    i := 0
    C[] := BINARY(k)
    X1 := X
    Z1 := Z
    X2 := X_2i(X, Z)
    Z2 := Z_2i(X, Z)
    FOR i := length(C[]) - 1 TO 1 DO
        U1 := X_2i+1(X1, Z1, X2, Z2)
        U2 := Z_2i+1(X1, Z1, X2, Z2)
        IF C[i] == 0 THEN
            temp := X_2i(X1, Z1)
            Z1 := Z_2i(X1, Z1)
            X1 := temp
            X2 := U1
            Z2 := U2
        ELSE
            temp := X_2i(X2, Z2)
            Z2 := Z_2i(X2, Z2)
            X2 := temp
            X1 := U1
            Z1 := U2
    PRINT X1, Z1

Figure 10.2: Pseudocode for the NEXTVALUES function of the Elliptic Curves Method

[Figure 10.3: Results of tests on the Elliptic Curves algorithm]

In figure 10.3 you can see the results of 6398 factorisations which I performed using this algorithm. As in the Pollard p-1 algorithm, we can speed up this algorithm by restricting k to a set of powers of primes less than max, rather than running over all integers less than max. Also, we can expect better results if we regularly interrupt the run and restart with a new set of parameters, rather than persisting with our initial choice of parameters.

10.4 Remarks

The Elliptic Curves Method has the characteristic of being practical from the point where trial division becomes impossible until well into the range where MPQS and NFS can be implemented. The largest factor that has been found by ECM is a 53-digit factor of 2^677 - 1, according to [2]. Note that if the RSA system were implemented with 512-bit keys and the three-factor variation, the smallest prime would be less than 53 digits, so elliptic curves could be used to break the system.

CHAPTER 11

Overall Comparison

The various factorisation methods I have described are all useful in different situations. When factoring a large number, the method to be chosen must depend on what is known about the factors of the number. To begin with, you must make sure that the number is composite, so that you do not make a long computer run which will result in nothing. It would be really frustrating to discover, after a very long run, that N has the prime factorisation N = 97 * p, which could have been obtained almost immediately by using trial division. Further, you could use Fermat's method in case N is the product of two almost equal factors, or Pollard's p-1 method in the event of N having one factor p with p-1 being a product of only small primes. There are lots of methods for finding middle-sized factors, all of which are good for specific situations. But the question remains: how long do you keep looking for these middling-sized factors before pulling out something like the Quadratic Sieve or NFS? A well-balanced strategy, developed by Naur [5], may be summarised as follows:

1. Make sure N is composite. Since very small divisors are quite common and are found very quickly by trial division, it is worthwhile attempting trial division up to 100 or 1000 even before applying a strong pseudoprime test.

2. Perform trial division up to 10^5 or 10^6. If Pollard's ρ method is available, then trial division need only be performed to a much lower search limit, e.g. 10^4, since the small divisors will also fall out rapidly with Pollard's method. One reason why trial division with the small primes is useful, despite the fact that Pollard's ρ method is quicker, is that the small factors tend to appear multiplied together when found with Pollard's method, and thus have to be separated by trial division anyhow. Apply a compositeness test to what is left of N every time a factor has been found and removed.

3.
At this point you need to take a long shot; with a little luck, this can shorten the running time enormously, and it could even be decisive for quick success or complete failure in the case when N is very large. The strategy to be employed is: take the methods you have implemented on your computer

CHAPTER 11. OVERALL COMPARISON 38 covering various situations, which will mean one or more of the following: Pollard s p 1 and p+1 methods, Fermat s, Shanks, or even the Williams methods. The methods should be capable of being suspended and resumed from where they stopped. Since you cannot possibly know in advance which of these methods will achieve a factorisation (If a factorisation will be found at all), it is a good technique at this stage to run the program of each method in sequence for a predetermined number of steps, say 1000 or 10000, and breaking the runs off at re-start points in order to be able to proceed, if necessary. If N does not factorise during such a run you have to repeat the whole process from the re-start point of the previous run. Also, you might want to consider the possibility of changing your choice of constants. 4. If the number N has still not been factored, you will need to rely upon the big algorithms. Depending on the size of the number and on the capacity of your computer, this can be the Multiple Polynomial Quadriatic Sieve (MPQS), the Number Field Sieve (NFS), or even the Elliptic Curves Method. Now you have to sit down and wait; fairly good estimates of the maximal running times are available for all these methods, so that you will know approximately how long the computer run could take. Choosing which methods to use, and when, is still more an art than a science. You should keep in mind that the big algorithms are much more cumbersome and it is worth spending at least a few minutes trying to vary your luck first. Theoretically and experimentally, it has been shown that you have a better chance of finding your mid-sized factors if you run several algorithms with several choices of parameters rather than spending the same amount of time on a single algorithms with a single set of parameters.

CHAPTER 12

Epilogue

When I started my work on this project, I had very little knowledge of this field of study. Factorisation was something that I had never encountered before, at least not in any great detail. I believe that this was to my advantage, since I was able to write my report from an introductory point of view, paying attention to the points which were hard for me to understand. During the course of my project, I had to make many choices regarding the material I would study. For instance, I chose not to implement one of the big guns of factorisation, such as MPQS or NFS. I believe that my choices allowed me to focus on the quality of what I did, instead of doing many things without enough care. This way, I was able to firmly grasp the concepts of these elementary algorithms, and thus obtain a good background in the subject. Of course, my project included a great deal of programming. Although I was already familiar with the programming language I used (ANSI C), I was able to further develop my programming skills. My final program consisted of roughly 1500 lines of code, so it was not a small program. Furthermore, I developed a sense of responsibility as far as organisational procedures are concerned, such as keeping a logbook and testing systematically. I am really happy to have managed to keep the balance between the theoretical and the practical sides of this project. Although my report tends more towards theory, I nevertheless did quite a lot of work on the actual software. Thus I have been able to produce a complete tutorial on factorisation, including the theoretical description and background, the pseudocode description, and the actual implementation. I hope that this project will be helpful to those who get their hands on it.

Part III

Appendices

APPENDIX A

Benchmarks

In this appendix I have included some sample test results for all the algorithms I implemented. These tests used certain numbers which I chose specifically, not arbitrary numbers.

A.1 Tests with products of two nearby primes

For the first set of tests, I used numbers which were products of two nearby primes. Table A.1 shows the results of this set of tests.

Algorithm         number 1   number 2   number 3   number 4
Trial Divisions       0.05       1.71       8.58      15.81
Fermat Method         0.01       0.02       0.02       0.02
Pollard's p-1         0.12       0.72       1.9        0.39
Pollard's ρ           0.02       0.33       0.59       0.61
Lenstra's ECM         0.08       0.11       0.56       0.93

number 1 = 3980021       = 1993 x 1997
number 2 = 16831170221   = 129733 x 129737
number 3 = 431589872009  = 656951 x 656959
number 4 = 1469322167111 = 1212121 x 1212191

Table A.1: Products of two nearby primes

As expected, Fermat's algorithm was by far the fastest. Another point worth making is the quick time of Pollard's p-1 on number 4. If we look at the decomposition of the factors of number 4 (minus one), we find that 1212121 - 1 = 1212120 = 2^3 x 3^2 x 5 x 7 x 13 x 37. This means that p - 1 had only small factors, which is why Pollard's p-1 algorithm found the factors so quickly. Finally, I should note that Pollard's ρ algorithm was faster than Trial Divisions, even though the numbers were relatively small.

A.2 Tests with products of three nearby primes

For this set of tests, I used numbers which were products of three nearby primes. Table A.2 shows the results of this set of tests.

Algorithm         number 1   number 2   number 3   number 4
Trial Divisions       0.07       0.14       0.26       2.72
Fermat Method         -          -          -          -
Pollard's p-1         0.26       0.68       0.28       1.74
Pollard's ρ           0.05       0.05       0.07       0.29
Lenstra's ECM         0.12       0.67       0.82       0.70

number 1 = 7956061979       = 1993 x 1997 x 1999
number 2 = 110154695923     = 4789 x 4793 x 4799
number 3 = 1019829472003    = 10061 x 10067 x 10069
number 4 = 1005660644975291 = 100183 x 100189 x 100193

Table A.2: Products of three nearby primes

The results showed that Pollard's ρ algorithm was the fastest. The Elliptic Curve Method was fairly quick for large factors; for smaller factors, Trial Divisions was quicker. Note that it made no sense to run this test on Fermat's method, since that method is intended for numbers with two factors.

A.3 Tests with products of three arbitrary primes

For the last set of tests, I used numbers which were products of three arbitrary primes. The size of the factors grows gradually from one number to the next. Table A.3 shows the results of this set of tests.

Algorithm         number 1   number 2   number 3   number 4
Trial Divisions       0.11      0.221      3.122      8.26
Fermat Method         -          -          -          -
Pollard's p-1         0.23       0.90       5.46       -
Pollard's ρ           0.07       0.06       0.14       1.31
Lenstra's ECM         0.53       0.46       6.781      0.94

number 1 = 14960418503           = 179 x 8467 x 9871
number 2 = 8355211084777         = 1163 x 12347 x 581857
number 3 = 416531649825896503    = 12983 x 987533 x 32487877
number 4 = 153674304751986405509 = 762479 x 1276237 x 157921783

Table A.3: Products of three arbitrary primes

In this set, Pollard's ρ algorithm was again the fastest overall. We see, however, that for very large factors ECM showed its capabilities by being the fastest. Also, I should note that Pollard's p-1 algorithm simply gave up on number 4, so I do not have a timing for it.
We also see the Trial Divisions timings growing quite rapidly.

APPENDIX B

Program output

In this appendix, I have included some sample output of my program. This output was created either by running the performance tests or by running the combined version of my program.

B.1 Tests output

Trial divisions algorithm

8 14700064 17.140 2,2,2,2,2,459377, OK!
8 14920780 0.000 2,2,5,7,197,541, OK!
8 16279397 1.420 401,40597, OK!
8 17470133 26.090 23,759571, OK!
8 18292242 1.820 2,3,59,51673, OK!
8 19120508 0.110 2,2,11,103,4219, OK!
8 20635253 0.550 1231,16763, OK!
8 21426599 0.110 43,181,2753, OK!
8 22389487 4.340 173,129419, OK!
8 24145844 1.100 2,2,193,31277, OK!
8 24772976 52.950 2,2,2,2,1548311, OK!
8 27083076 78.700 2,2,3,2256923, OK!
8 28960820 0.220 2,2,5,7,31,6673, OK!
8 29620253 79.590 13,2278481, OK!
8 30567968 0.050 2,2,2,2,2,421,2269, OK!
8 33072882 10.000 2,3,19,290113, OK!
8 34633266 0.050 2,3,47,191,643, OK!
8 37023334 651.580 2,18511667, OK!
8 38003210 0.000 2,5,7,31,83,211, OK!
8 38324435 261.010 5,7664887, OK!

Fermat's algorithm

12 103447054117 192.950 29666491,3487, OK!
12 107658803491 25.540 4206901,25591, OK!

12 118366881563 0.990 499373,237031, OK!
12 121823707817 1.040 518251,235067, OK!
12 122404743091 766.150 118494427,1033, OK!
12 131816524371 0.050 377727,348973, OK!
12 132132493075 283.470 43680163,3025, OK!
12 137348837443 404.580 63616877,2159, OK!
12 149761336043 0.820 515471,290533, OK!
12 158116709061 3.740 973703,162387, OK!
12 161430148107 0.930 548707,294201, OK!
12 167785725663 22.080 3733799,44937, OK!
12 178428901625 706.010 109802401,1625, OK!
12 187189598215 0.880 573865,326191, OK!
12 204614067397 3.730 1019497,200701, OK!
12 241531775015 12.740 2338951,103265, OK!
12 256718621685 0.820 631545,406493, OK!
12 264985242949 66.460 10837399,24451, OK!
12 279740763407 6021.260 929371307,301, OK!
12 286185151757 19107.040 2778496619,103, OK!
12 290275123679 36.200 6102191,47569, OK!
12 299489220963 0.440 613593,488091, OK!
12 324305418991 289.230 45123893,7187, OK!
12 349234244871 4.510 1255069,278259, OK!

Pollard rho algorithm

15 119353790409531 78.050 69, ERROR!!!
15 125351514673454 4.060 2,1381,67049,676883, OK!
15 149331533171784 5.990 24,28961,214845731, OK!
15 152348612751036 4.560 19188,349,22750103, OK!
15 172004878340591 89.800 NO FACTORS FOUND
15 194957421274879 73.440 1231, ERROR!!!
15 205920181957521 75.910 34383, ERROR!!!
15 222008721116816 6.260 59888,47777,77591, OK!
15 249557297942163 108.310 123, ERROR!!!
15 297738900701700 113.040 300, ERROR!!!
15 321283014530048 5.600 1024,431,1223,595229, OK!
15 351838344957934 149.070 2, ERROR!!!
15 377118177660380 5.440 118330146740,3187, OK!
15 407893389920692 28.230 52,6367,1231993663, OK!
15 453023303965458 7.960 6,1321483,57135721, OK!
15 501651885394403 91.950 3767, ERROR!!!
15 581362260122354 104.240 2,1597, ERROR!!!
15 631626096053032 7.800 8,805813,97979633, OK!

Pollard p-1 algorithm

20 17983096255782676173 71.900 87, ERROR!!!
20 18732302463106516915 42.290 5,4716563,794320036141, OK!
20 19633168496947424017 70.140 53,83, ERROR!!!
20 20054808357175364639 0.600 89,461,227,2153286050833, OK!
20 21240599822239584893 73.880 101, ERROR!!!

20 24333145381449307839 0.440 3459,853,8247050571457, OK!
20 25281400673763563085 17.140 765,6281321,36013,146093, OK!
20 26159111191007383487 0.110 1429,397,46110544251599, OK!
20 28423862054615105617 0.110 77,3141601,117500938421, OK!
20 29474651485752579413 79.310 NO FACTORS FOUND
20 30334734795998729991 79.040 3, ERROR!!!
20 33204165949884833525 0.990 58387475,2797,203320147, OK!
20 35955144223483810055 0.050 5,7191028844696762011, OK!
20 38847376691209313199 0.050 8127,4780038967787537, OK!
20 40772255186800963589 2.910 761,334423,160207903963, OK!
20 44695231498724965775 72.220 2725,23, ERROR!!!
20 45751739079576358967 0.060 37,1236533488637198891, OK!

Elliptic curves algorithm

16 2033370515132833 1.370 67,30348813658699,43, ERROR!!!
16 2144333894601389 0.500 19, ERROR!!!
16 2351222877625469 0.330 59,39851235213991, OK!
16 2480636328028919 3.080 197, ERROR!!!
16 2588922079845487 0.280 37259, ERROR!!!
16 3532991485651031 23.510 5233,675136916807, OK!
16 3766348308874693 0.940 349,10791828965257, OK!
16 3861434629423001 0.330 19,203233401548579, OK!
16 4209004773580541 0.110 2057, ERROR!!!
16 8394595909750987 0.330 NO FACTORS FOUND
16 9220175949479131 0.880 637,1109, ERROR!!!
16 9701531376381211 0.000 107, ERROR!!!
17 10644389056710187 1.210 NO FACTORS FOUND
17 11166864797389741 0.330 11,1015169527035431, OK!
17 11489043630188369 0.160 2191,5243744240159, OK!
17 12078054826846641 0.000 NO FACTORS FOUND
17 12380759253388243 0.160 37, ERROR!!!
17 14432713913707519 6.260 1277, ERROR!!!
17 25621001377456313 0.330 NO FACTORS FOUND
17 28111503784053067 0.170 161,1577, ERROR!!!
17 30979002302099723 476.530 45247,684664227509, OK!
17 32178036153737471 0.330 1501,21437732280971, OK!

B.2 Combined factorisation output

This section includes output produced when my program used the combined function for factorising. This simply means that all my algorithms were called one after the other, each trying to factorise part of the input number. The output of the program is quite self-explanatory.

C:\>factor 298347004781928719247912
Trying to factorise 298347004781928719247912
Trying trial divisions...
Remainding portion is 53923156875619

Trying Pollard rho-1...
Remainding portion is 53923156875619
Trying Pollard rho...
2,2,2,3,83,409,6791,411469,131050351,1, OK!

C:\>factor 45346346353453643534522543411
Trying to factorise 45346346353453643534522543411
Trying trial divisions...
Remainding portion is 121571974137945425025529607
Trying Pollard rho-1...
373,127819,52721,18040742769701893,1, OK!

C:\>factor 765674960895860548647659458604856094859061115
Trying to factorise 765674960895860548647659458604856094859061115
Trying trial divisions...
Remainding portion is 1168969405947878700225434287946345182990933
Trying Pollard rho-1...
Remainding portion is 1168969405947878700225434287946345182990933
Trying Pollard rho...
5,131,35574947,32859343569728401850477367905744039,1, OK!

C:\>factor 849357309574398572983749827349822289473
Trying to factorise 849357309574398572983749827349822289473
Trying trial divisions...
Remainding portion is 28502879612550708848744918532495127
Trying Pollard rho-1...
Remainding portion is 28502879612550708848744918532495127
Trying Pollard rho...
Remainding portion is 1540239701527958435278069669
Trying Elliptic curves...
3,3,7,11,43,18505483, INCOMPLETE!!!

C:\>factor 32499823472313423412312414243511
Trying to factorise 32499823472313423412312414243511
Trying trial divisions...
Remainding portion is 119925547868315215543588244441
Trying Pollard rho-1...
Remainding portion is 119925547868315215543588244441
Trying Pollard rho...
Remainding portion is 119925547868315215543588244441
Trying Elliptic curves...
271, INCOMPLETE!!!

C:\>factor 23423423523253423423423423524199392991
Trying to factorise 23423423523253423423423423524199392991
Trying trial divisions...
Remainding portion is 23423423523253423423423423524199392991
Trying Pollard rho-1...

Remainding portion is 1480901784361979099919290859467623
Trying Pollard rho...
Remainding portion is 1480901784361979099919290859467623
Trying Elliptic curves...
15817, INCOMPLETE!!!

B.3 Biggest factorisation

During the course of this project, I performed many factorisations. The largest one I achieved was done using Pollard's ρ algorithm, and reads as follows:

1041979940506209714136430511217320000000000000000000000000
0000000000000000000000000000000000000000000000000000000000
00000000000001
= 247 x 1667 x 49891 x 2200717 x
2304837960023166188830490506524283552739227886323524897398
4462895208582198662338027412731196022283540032318521067

where 1041...0001 has 130 digits, and 2304...1067 is a 113-digit probable prime. This factorisation took 35 seconds on a Pentium Celeron at 433 MHz.

BIBLIOGRAPHY

[1] Richard P. Brent. An improved Monte Carlo factorization algorithm. Nordisk Tidskrift for Informationsbehandling (BIT), 20:176-184, 1980.
[2] Richard P. Brent. Some parallel algorithms for integer factorisation. Technical report, 1999.
[3] Donald E. Knuth. The Art of Computer Programming, Volume 2. Addison-Wesley, second edition, 1981.
[4] Evangelos Kranakis. Primality and Cryptography. B.G. Teubner, 1986.
[5] Thorkil Naur. Integer factorisation. DAIMI technical report, 1982.
[6] J. M. Pollard. Theorems on factorisation and primality testing. Proc. Cambr. Philos. Soc., 76:521-528, 1974.
[7] J. M. Pollard. A Monte Carlo method for factorisation. Nordisk Tidskrift for Informationsbehandling (BIT), 15:331-334, 1975.
[8] Hans Riesel. Prime Numbers and Computer Methods for Factorization. Birkhäuser, 1985.