A Comparison Of Integer Factoring Algorithms

Keyur Anilkumar Kanabar

Bachelor of Science in Computer Science with Honours
The University of Bath
May 2007

This dissertation may be made available for consultation within the University Library and may be photocopied or lent to other libraries for the purposes of consultation. Signed:

A Comparison Of Integer Factoring Algorithms

Submitted by: Keyur Anilkumar Kanabar

COPYRIGHT

Attention is drawn to the fact that copyright of this dissertation rests with its author. The Intellectual Property Rights of the products produced as part of the project belong to the University of Bath (see http://www.bath.ac.uk/ordinances/#intelprop). This copy of the dissertation has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the dissertation and no information derived from it may be published without the prior written consent of the author.

Declaration

This dissertation is submitted to the University of Bath in accordance with the requirements of the degree of Bachelor of Science in the Department of Computer Science. No portion of the work in this dissertation has been submitted in support of an application for any other degree or qualification of this or any other university or institution of learning. Except where specifically acknowledged, it is the work of the author. Signed:

Abstract

The problem of integer factorisation has been investigated for a number of centuries by mathematicians. In recent years the use of public key cryptosystems has increased the amount of research taking place in this field. In this report we will focus closely on five factoring algorithms: Trial Division, Fermat's Algorithm, Pollard ρ, Pollard p − 1 and Lenstra's Elliptic Curve Method. We will investigate these algorithms, their running times, and compare their performance. The report also discusses how integer factorisation is paving the way for advances in Computer Science.

Contents

1 Introduction 1
1.1 Background 1
1.2 The Problem 2
1.3 Planning 2

2 Mathematical Introduction 3
2.1 Greatest Common Divisor 3
2.2 Modular Arithmetic 4
2.3 Primes and Mersenne Primes 4
2.3.1 Prime Numbers 4
2.3.2 Mersenne Primes 5
2.4 Pseudoprimes and Primality Testing 5
2.5 Big O Notation 6
2.6 Smoothness 6
2.7 Integer Factorisation and RSA 6
2.7.1 Introduction to RSA Algorithm 7
2.8 Applying Factorisation to RSA Keys 8

3 Literature Review 9
3.1 Introduction 9
3.2 Trial Division Algorithm 10
3.3 Fermat's Algorithm 10
3.4 Pollard ρ Algorithm 11
3.5 Pollard p − 1 Algorithm 13
3.6 Elliptic Curve Method 14
3.7 Conclusion 18

4 Requirements 19
4.1 Functional Requirements 19
4.2 Non-functional Requirements 19
4.3 System Requirements 20
4.3.1 Hardware Requirements 20
4.3.2 Software Requirements 20

5 Design 22
5.1 Trial Division 23
5.2 Fermat's Algorithm 24
5.3 Pollard ρ Method 25
5.4 Pollard p − 1 Method 25
5.5 Elliptic Curve Method 26

6 Testing 27
6.1 Timing Algorithms 27
6.2 Algorithms using Smoothness Bounds 29

7 Results & Analysis 30
7.1 Trial Division Results 31
7.1.1 Trial Division Results Using 2 Nearby Factors 31
7.1.2 Trial Division Results Using 3 Nearby Factors 32
7.1.3 Trial Division Results Using 3 Arbitrary Factors 33
7.1.4 All Trial Division Results 34
7.2 Fermat's Algorithm Results 35
7.3 Pollard ρ Results 36
7.3.1 Pollard ρ Results Using 2 Nearby Factors 36
7.3.2 Pollard ρ Results Using 3 Nearby Factors 37
7.3.3 Pollard ρ Results Using 3 Arbitrary Primes 38
7.3.4 All Pollard ρ Results 39
7.4 Pollard p − 1 Results 40
7.4.1 Pollard p − 1 Results Using 2 Nearby Factors 40
7.4.2 Pollard p − 1 Results Using 3 Nearby Factors 41
7.4.3 Pollard p − 1 Results Using 3 Arbitrary Factors 42
7.4.4 All Pollard p − 1 Results 43
7.5 Elliptic Curve Method (ECM) Results 44
7.5.1 ECM Results Using 2 Nearby Factors 44
7.5.2 ECM Results Using 3 Nearby Factors 45
7.5.3 ECM Results Using 3 Arbitrary Factors 46
7.5.4 All ECM Results 47
7.5.5 Comparing All Factoring Algorithms 48
7.6 Summary 52

8 Conclusions 53
8.1 Introduction 53
8.2 Critique 53
8.3 Achievements 55
8.4 Future Advances 55
8.5 Summary 56

A Test Data & Results Tables 57

B Code 64

C User Documentation 65
C.1 Trial Division 65
C.2 Fermat's Algorithm 65
C.3 Pollard ρ Algorithm 65
C.4 Pollard p − 1 Algorithm 66
C.5 Elliptic Curve Method Algorithm 66

D Gantt Chart 67

Bibliography 68

List of Figures

3.1 An Elliptic Curve 15
6.1 B_1 values used for the Elliptic Curve Method 29
7.1 Trial Division with data from Table A.1 31
7.2 Trial Division with data from Table A.2 32
7.3 Trial Division with data from Table A.3 33
7.4 Trial Division with data from Tables A.1, A.2 & A.3 34
7.5 Fermat's Algorithm with data from Table A.1 35
7.6 Pollard ρ using data from Table A.1 36
7.7 Pollard ρ using data from Table A.2 37
7.8 Pollard ρ using data from Table A.3 38
7.9 Pollard ρ with data from Tables A.1, A.2 & A.3 39
7.10 Pollard p − 1 with data from Table A.1 40
7.11 Pollard p − 1 with data from Table A.2 41
7.12 Pollard p − 1 with data from Table A.3 42
7.13 Pollard p − 1 with data from Tables A.1, A.2 & A.3 43
7.14 Elliptic Curve Method using data from Table A.1 44
7.15 Elliptic Curve Method using data from Table A.2 45
7.16 Elliptic Curve Method using data from Table A.3 46
7.17 Elliptic Curve Method with data from Tables A.1, A.2 & A.3 47
7.18 Comparing all algorithms using data from Table A.1 48
7.19 Comparing all algorithms using data from Table A.2 50
7.20 Comparing all algorithms using data from Table A.3 51

List of Tables

A.1 2 Nearby Factors. 1% difference between factors 58
A.2 3 Nearby Primes. 1% difference between factors 59
A.3 3 Arbitrary Factors 60
A.4 Results generated using Table A.1 61
A.5 Results generated using Table A.2 62
A.6 Results generated using Table A.3 63

Acknowledgements

I would first like to thank Dr Russell Bradford, whose guidance and support over the duration of this dissertation has been invaluable. I would also like to thank my family, for their unrivalled support throughout all that I do.

Chapter 1

Introduction

1.1 Background

Number theory is probably one of the oldest branches of pure mathematics. One of the most important results in number theory is the Fundamental Theorem of Arithmetic. This theorem states that every natural number greater than 1 can be written as a unique product of prime numbers. Prime numbers can thus be considered the building blocks of the natural numbers. Integer factorisation is an area of number theory which takes composite numbers and breaks them down into their constituent prime factors. Gauss identified primality testing and integer factorisation as the two most fundamental problems in his Disquisitiones Arithmeticae (Gauss 1801), and they have been the prime subject of many mathematicians' work ever since.

In the past decades the rapid growth and use of WANs¹, especially the Internet, has had far-reaching effects. The breadth of human knowledge is now available to anyone with basic computer literacy. With this newly found ability to access information, a need for mechanisms to store and transmit data securely began to emerge. Cryptosystems were developed to meet this demand, the most notable being RSA (Rivest, Shamir & Adelman 1977). These cryptosystems were based on number theory. The strength of the RSA cryptosystem was based on the notion that multiplying two large prime numbers (to form a so-called semiprime) is much easier than factoring the resulting product. When the numbers are very large, no efficient integer factorisation algorithm is publicly known (Brent 2000a); a recent effort which factored a 193-digit number (RSA-640) (Labs 2005) took approximately 30 2.2GHz-Opteron-CPU years according to the submitters, and over five months of calendar time.

¹Wide Area Networks, i.e. intranets and the Internet

The presumed difficulty of this problem is at the heart of certain cryptography algorithms. Many areas of Mathematics and Computer Science have been brought to bear on the problem, including elliptic curves, algebraic number theory, and quantum computing.

1.2 The Problem

Deterministic algorithms are algorithms which are guaranteed to find a solution if we let them run long enough. Nondeterministic algorithms, on the contrary, may never terminate. The most usual distinction, however, deals with the run-time of the algorithm: the running time of recent algorithms depends on the size of the input number N, whereas older algorithms depended on the size of the factor f which they find. The aim of this report is to study and implement the following deterministic and nondeterministic factoring algorithms:

Trial Division Algorithm
Fermat's Algorithm
Pollard ρ Method
Pollard p − 1 Method
Elliptic Curve Method (ECM)

and to study their running times based on the length and number of factors of the integers input. The investigation in this report will aim to compare and comment on how these algorithms performed against one another with a given set of data, rather than merely hypothesising about their performance and expected behaviour.

1.3 Planning

The success of any research or software project depends upon the amount of time and consideration taken when planning how the allotted time should be spent. Appendix D contains a Gantt chart that was developed at the outset of this project. Every effort was taken to ensure that the time spent on each task was allocated correctly. I will endeavour to follow the plan stated in Appendix D as closely as possible.

Chapter 2

Mathematical Introduction

Though the intended audience for this report is academics and/or students who have some prior knowledge in this particular field, it would be presumptuous to assume that all readers are familiar with the range of mathematical tools and techniques used in integer factorisation. This chapter will briefly cover these tools and techniques.

2.1 Greatest Common Divisor

The greatest common divisor (GCD) is sometimes referred to as the greatest common factor or the highest common factor. In this report we will use the term greatest common divisor. If u and v are both non-zero integers, we say that their greatest common divisor, gcd(u, v), is the largest integer that divides both numbers without a remainder. Two numbers are called coprime or relatively prime if their greatest common divisor equals 1.

The most efficient method for calculating the greatest common divisor is the Euclidean Algorithm. Recall that the division algorithm states that for two positive integers n and m we may write m = nq + r, where q is called the quotient and r is called the remainder, satisfying 0 ≤ r < n. The Euclidean algorithm states: to find the greatest common divisor of n and m, divide m by n. If the remainder is zero, then m is a multiple of n and we are done, the greatest common divisor being n. If not, divide the divisor (n) by the remainder. Continue this process, dividing the last divisor by the last remainder, until the remainder is zero. The last non-zero remainder is then the greatest common divisor of the integers m and n.
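As a minimal illustration (not part of the project's implementation), the Euclidean algorithm can be written in a few lines of C; GMP provides the same operation for arbitrary-precision integers as mpz_gcd.

#include <stdio.h>

/* Iterative Euclidean algorithm: repeatedly replace (m, n) by
   (n, m mod n); the last non-zero remainder is gcd(m, n). */
unsigned long gcd(unsigned long m, unsigned long n)
{
    while (n != 0) {
        unsigned long r = m % n;
        m = n;
        n = r;
    }
    return m;
}

int main(void)
{
    printf("gcd(252, 105) = %lu\n", gcd(252, 105)); /* prints 21 */
    return 0;
}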

2.2 Modular Arithmetic

Modular arithmetic is the arithmetic of congruences, sometimes referred to as clock arithmetic. Addition, subtraction and multiplication work as in ordinary arithmetic, but division is not generally defined. The notation used for expressions involving modular arithmetic is:

x ≡ y (mod m)

which reads as "x is congruent to y, modulo m". When 0 ≤ x < m, x is the integer remainder of dividing y by m. For example, 7 ≡ 23 (mod 8) and 4 ≡ 13 (mod 9). The following statement is a basic principle of modular arithmetic:

a + kp ≡ a (mod p)

2.3 Primes and Mersenne Primes

2.3.1 Prime Numbers

A prime number (or a prime) is a natural number that has exactly two (distinct) natural number divisors, which are 1 and the prime number itself. The first question that arises is whether or not the list of primes is finite. If it were, then it would be possible to list all the primes, and anyone wishing to determine whether an integer were prime could simply look it up. Euclid determined that there are an infinite number of primes (Ribenboim 2004). The proof can easily be obtained from a number of mathematical texts or the Internet.

The almost random distribution of prime numbers within N makes locating and verifying the primality of an integer increasingly difficult as the size of the number increases. Euler once commented: "Mathematicians have tried in vain to this day to discover some order in the sequence of prime numbers, and we have reason to believe that it is a mystery into which the mind will never penetrate" (Havil 2003).

The Sieve of Eratosthenes is a straightforward way to compute the list of all prime numbers up to a given integer limit n (Rev. Samuel Horsley 1772). To find all primes less than or equal to n, we list all the integers from 2 to n. We then proceed to work down the list. The first integer, 2, must be prime. We cross off all multiples of 2 which are larger than 2. The first integer after 2 which has not been crossed off is 3 - this is obviously prime. We cross off all multiples of 3 which are larger than 3. We continue down the list in this manner. When we have found a new prime, we cross off all multiples of that new prime which are larger than the prime itself and then move to the next integer which has not been crossed off, which must again be prime. One of the work savers that Eratosthenes discovered was that we do not need to continue this all the way up to n. Once we have found a prime larger than the square root of n, all of the remaining integers which have not been crossed off must be prime. If any of them were composite then they would have to have a factor less than or equal to their square root: if n = a·b, then a ≤ √n or b ≤ √n.
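A short C sketch of the sieve (illustrative only; the limit of 100 is an arbitrary example value), including the square-root cut-off described above:

#include <stdio.h>
#include <stdlib.h>

/* List all primes up to limit using the Sieve of Eratosthenes.
   Crossing off can start at p*p and stop once p*p > limit. */
void sieve(int limit)
{
    char *composite = calloc(limit + 1, 1);
    int p, m;
    for (p = 2; (long)p * p <= limit; p++)
        if (!composite[p])
            for (m = p * p; m <= limit; m += p)
                composite[m] = 1;
    for (p = 2; p <= limit; p++)
        if (!composite[p])
            printf("%d ", p);
    printf("\n");
    free(composite);
}

int main(void)
{
    sieve(100);
    return 0;
}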

The Sieve of Eratosthenes was superseded by the Sieve of Atkin, a much faster and more modern algorithm developed by A. O. L. Atkin and Daniel J. Bernstein (Atkin & Bernstein 1999).

2.3.2 Mersenne Primes

A Mersenne prime is a number of the form

M_n = 2^n − 1

which is prime. In order for M_n to be prime, n must itself be prime. This is true since for composite n with factors r and s, n = rs. Therefore, 2^n − 1 can be written as 2^(rs) − 1, which is a binomial number that always has a factor (2^r − 1). The first few examples of Mersenne primes are 3, 7, 31, 127, 8191, 131071, corresponding to n = 2, 3, 5, 7, 13, 17. Mersenne primes were first studied because of the remarkable property that every Mersenne prime corresponds to exactly one perfect number¹.

2.4 Pseudoprimes and Primality Testing

A pseudoprime is an integer which shares a property common to all prime numbers but is not actually prime. Pseudoprimes are classified according to which property they satisfy (Rivest 1991). The most significant class of pseudoprimes is derived from Fermat's little theorem and appropriately referred to as Fermat pseudoprimes. This theorem simply states that if p is prime and a is coprime to p, then a^(p−1) − 1 is divisible by p. If a number x is not prime, a is coprime to x and x divides a^(x−1) − 1, then x is called a pseudoprime to base a. A number x that is a pseudoprime for all values of a that are coprime to x is called a Carmichael number². The smallest Fermat pseudoprime for the base 2 is 341. It is not a prime, since it equals 11 · 31, but it satisfies Fermat's little theorem: 2^341 ≡ 2 (mod 341). The rarity of such pseudoprimes has important practical implications.

¹A perfect number is a positive integer that is the sum of its proper divisors (those positive divisors strictly less than itself)
²A Carmichael number is an odd composite number n which satisfies Fermat's little theorem
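The claim above about 341 is easy to check with GMP's modular exponentiation. A quick illustrative snippet (compiled with -lgmp; not part of the project code):

#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t base, mod, r;
    mpz_init_set_ui(base, 2);
    mpz_init_set_ui(mod, 341);
    mpz_init(r);

    /* 341 = 11 * 31 is composite, yet 2^340 mod 341 = 1,
       so 341 passes the Fermat test to base 2. */
    mpz_powm_ui(r, base, 340, mod);
    gmp_printf("2^340 mod 341 = %Zd\n", r); /* prints 1 */

    mpz_clear(base);
    mpz_clear(mod);
    mpz_clear(r);
    return 0;
}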

For example, public-key cryptography algorithms such as RSA require the ability to quickly find large primes. The usual algorithm to generate prime numbers is to generate random odd numbers and test them for primality. However, deterministic primality tests are slow. If the user is willing to tolerate a very small chance that the number found is not a prime number but a pseudoprime, it is possible to use the much faster and simpler Fermat primality test. Another approach is to use more refined notions of pseudoprimality, e.g. strong pseudoprimes or Euler-Jacobi pseudoprimes (Guy 1994), for which there are no analogues of Carmichael numbers. This leads to probabilistic algorithms such as the Solovay-Strassen primality test and the Miller-Rabin primality test, which produce what are known as industrial-grade primes (Loudon 1999). Industrial-grade primes are integers for which primality has not been certified (i.e. rigorously proven), but which have undergone a test such as the Miller-Rabin test, which has a positive but vanishingly small probability of failure (Hurd 2003).

2.5 Big O Notation

The function f(x) is O(g(x)) as x → ∞ if and only if there are positive real constants c, k such that 0 ≤ f(x) ≤ c·g(x) for every x > k. When Big O notation is applied to the running time or storage requirements of an algorithm, one may write simply O(g(x)), and it is assumed that x → ∞. If multiple variables are present, the variable which goes to infinity is indicated. As part of the definition of O(g(x)), all possible executions of the algorithm must be considered as x → ∞.

2.6 Smoothness

We say that a positive integer n is B-smooth if all its prime factors are at most B. An integer is said to be smooth with respect to S, where S is some set of integers, if it can be completely factored using the elements of S. We often simply use the term smooth, in which case the bound B or the set S is clear from the context.

2.7 Integer Factorisation and RSA

As discussed earlier, the vast increase in research into integer factorisation has come about due to the widespread use of computer-based cryptosystems. Due to modern implementations of these cryptosystems (RSA in particular) we rarely get to see the underlying mechanics of how data is encrypted and decrypted. Hence, we sometimes fail to see the obvious relationship between cryptosystems, the mathematics involved and where integer factorisation comes into all this. In this chapter we will briefly look at the most commonly used cryptosystem, RSA. We will also look at how integer factorisation can (usually with small RSA key lengths) crack an encrypted RSA message.

2.7.1 Introduction to RSA Algorithm

In 1978, the first general-purpose public-key algorithm, RSA, was created by Ron Rivest, Adi Shamir, and Leonard Adleman (Rivest et al. 1977). RSA is an incredibly simple algorithm that can be easily understood, implemented, and analysed by someone with basic knowledge of number theory. It is well suited for encryption, digital signatures, and authentication.

Before one can use RSA, a key-pair must be generated³. To generate the keys, first find two large prime numbers p and q of approximately equal size. The modulus for the keys is defined as

N = pq

The user then must select an encryption exponent e that is relatively prime to φ(N) = (p − 1)(q − 1). The decryption exponent d is defined as

d ≡ e^(−1) (mod φ(N))

The public key now consists of (e, N) and the private key is (d, N). If we wish to encrypt a plaintext message block m (with m < N) we use the following method:

c ≡ m^e (mod N)

c is now referred to as the ciphertext. Decrypting the ciphertext back to plaintext is done using the following technique:

m ≡ c^d (mod N)

We can verify this result:

c^d ≡ (m^e)^d (mod N)
    ≡ m^(ed) (mod N)
    ≡ m^(1 + kφ(N)) (mod N)
    ≡ m·(m^φ(N))^k (mod N)
    ≡ m·1 (mod N)

We have assumed that m and N are relatively prime. If they are not, then we need to use Carmichael's λ function (Erdos, Pomerance & Schmutz 1991) in place of Euler's φ function.

³A public key can be distributed freely; a private key is to be kept secret by the key-pair owner.
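The whole scheme can be walked through numerically with GMP. The sketch below uses deliberately tiny textbook parameters (p = 61, q = 53, e = 17; real keys use primes of hundreds of digits) and is for illustration only:

#include <stdio.h>
#include <gmp.h>

int main(void)
{
    /* Toy parameters: N = 61 * 53 = 3233, phi(N) = 60 * 52 = 3120,
       and e = 17 is relatively prime to phi(N). */
    mpz_t n, phi, e, d, m, c, out;
    mpz_init_set_ui(n, 3233);
    mpz_init_set_ui(phi, 3120);
    mpz_init_set_ui(e, 17);
    mpz_init(d);
    mpz_invert(d, e, phi);      /* d = e^(-1) mod phi(N) = 2753 */

    mpz_init_set_ui(m, 65);     /* plaintext block, m < N */
    mpz_init(c);
    mpz_init(out);
    mpz_powm(c, m, e, n);       /* encrypt: c = m^e mod N */
    mpz_powm(out, c, d, n);     /* decrypt: m = c^d mod N */
    gmp_printf("c = %Zd, recovered m = %Zd\n", c, out);

    mpz_clear(n); mpz_clear(phi); mpz_clear(e); mpz_clear(d);
    mpz_clear(m); mpz_clear(c); mpz_clear(out);
    return 0;
}

Note that recovering d in this way from p, q and e is exactly what an attacker can do once N has been factored, which is the subject of the next section.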

2.8 Applying Factorisation to RSA Keys

The simplicity of the RSA algorithm is one of its major strengths. Let us assume we wish to intercept an encrypted message and return it to plaintext. We have access to the public key⁴ as well as the encrypted message. Using the value of N that is part of the public key, we can use a factoring algorithm to derive the values of p and q. Once we have these two values we can use Euler's φ function to compute the decryption exponent d. We have now essentially created another private key, which we can use to recover the encrypted message m.

⁴It is assumed that the reader understands the basics of asymmetric cryptography. If not, please read Sun's introduction to public key cryptography.

Chapter 3

Literature Review

3.1 Introduction

Number theorists have been studying the problem of integer factorisation for hundreds of years. Some of the most brilliant mathematicians have spent their careers investigating this area. One such mathematician was C.F. Gauss, who once wrote (Gauss 1801):

"The problem of distinguishing prime numbers from composite numbers and of resolving the latter into their prime factors is known to be one of the most important and useful in arithmetic. It has engaged the industry and wisdom of ancient and modern geometers to such an extent that it would be superfluous to discuss the problem at length. Further, the dignity of the science itself seems to require solution of a problem so elegant and so celebrated."

This problem mostly remained in the domain of mathematics until recently. As information has become more readily available in an age where data can be transferred across vast public networks, security has become of paramount importance. Cryptosystems have been developed to address this problem. The very fact that large integers are difficult to factorise is the cornerstone of many modern cryptosystems. One such example is the RSA public key cryptosystem.

As cryptosystems have become more widely adopted, both mathematicians and computer scientists have become more interested in their actual strength. Commercial organisations such as RSA Laboratories have issued public competitions with cash prizes (Labs 2005). One purpose of this contest is to track the state of the art in factoring. Advances and studies in large integer factorisation will ultimately lead to more secure and robust cryptosystems.

The following sections in this literature review will look at how a broad range of factoring algorithms will be integrated into the proposed integer factorisation system.

3.2 Trial Division Algorithm

The trial division algorithm is the simplest of the factoring algorithms, and a suitable algorithm with which to begin the investigation of integer factoring algorithms. This algorithm attempts to factorise a composite integer N using a brute force approach. The trial division algorithm attempts to divide N by every prime number up to and including √N. If a number p is found which evenly divides N, then p is a prime factor of N. The trial division method is guaranteed to find the factors of N since it checks all the possible prime factors. If the algorithm is unable to find a factor of N then it proves that N is itself prime. An alternative (and slower) variant of the trial division method attempts to divide N by every integer up to and including √N, rather than by prime numbers only.

The trial division algorithm is reasonably efficient (hence computationally feasible) for small numbers. One might use trial division to factorise numbers with factors of 7 or 8 digits. After that it gradually becomes inefficient, and cannot compete with the stronger algorithms (Flannery, Flannery & Flannery 2000). According to (Brent 1999) the complexity of the trial division algorithm is

O(f (log N)^2)

where f is the size of the factor to be found. The time taken by the algorithm to find factors thus increases exponentially with the number of digits of f.

3.3 Fermat's Algorithm

Fermat's algorithm can be viewed as a modern factoring algorithm (Uhl n.d.). Though the algorithm is not widely used today, it is implemented when the number to be factored has two factors which are relatively close to the square root of the number. The Continued Fractions and Quadratic Sieve algorithms are based on the key principles of Fermat's Algorithm (Morrison & Brillhart 1975).

Fermat's algorithm works in the following way. Let us begin with a number N which we wish to factorise. Suppose that N can be written as the difference of two squares:

N = x^2 − y^2

then

N = (x − y)(x + y)

and we have succeeded in breaking N into two smaller factors. Fermat's algorithm works in the opposite direction to the Trial Division algorithm. With the Trial Division algorithm we begin by looking for a small factor and work our way up towards √N. With Fermat's algorithm we begin by looking at factors close to √N and work downwards.

Let us begin with a positive integer N which is to be factored. We search for integers x and y such that x^2 − y^2 = N. We begin with x equal to the smallest integer greater than or equal to √N and try increasing y until x^2 − y^2 either equals or is less than N. In the first case we are done; in the second we increase x by one and iterate. We continue this until we have success. We could also set t to x^2 − y^2 − N; then we have success when t = 0.

The next aspect of Fermat's Algorithm to consider is its complexity. Let us suppose that N = a·b, with a < b. Factorisation will be achieved when x = (a + b)/2. Since the starting value of x is √N, and b = N/a, the factorisation will take approximately

(1/2)(a + N/a) − √N = (√N − a)^2 / (2a)

cycles. If the two factors of N are very close, i.e. if a = k√N with 0 < k < 1, then the number of cycles required in order to obtain the factorisation is

((1 − k)^2 / (2k)) √N

This complexity is of the order O(c·N^(1/2)). However, the value of k can be very small, making this algorithm impractical. For instance, let us consider an ordinary case where a ≈ N^(1/3) and b ≈ N^(2/3). In such a case, the number of cycles necessary will be

(√N − N^(1/3))^2 / (2N^(1/3)) = (N^(1/3))^2 (N^(1/6) − 1)^2 / (2N^(1/3)) ≈ (1/2) N^(2/3)

which is considerably higher than O(N^(1/2)). Therefore this algorithm is only practical when the factors a and b are almost equal to each other.

3.4 Pollard ρ Algorithm

The Pollard ρ algorithm was developed in 1975 by the British mathematician John Pollard (Pollard 1975). This factoring algorithm is sometimes referred to as the Pollard Monte Carlo factorisation method due to its pseudo-random nature.

This algorithm becomes inefficient if all the prime factors are large (larger than 10^12) (Bressoud 1989). It is usually used to find moderately sized factors, around 10^5 to 10^10. Richard P. Brent later refined the algorithm in a paper published in 1980 (Brent 1980). This refinement was claimed to improve the algorithm developed by Pollard by some 24%. It did this by using a method of cycle detection that was considerably quicker than Floyd's original algorithm (Floyd 1967).

Let N be the composite number we wish to factorise and d be an unknown non-trivial divisor of N. Let f(x) be a simple irreducible polynomial in x; generally we would use x^2 + 1 or something similar. Starting with an integer x_0, we create a sequence from the recursive definition:

x_i = f(x_{i−1}) mod N

Let:

y_i = x_i mod d

Since x_i ≡ f(x_{i−1}) (mod N), y_i is congruent to f(y_{i−1}) modulo d. There are only a finite number of congruence classes modulo d (namely d of them) and so eventually we will have y_i = y_j for some pair (i, j). But once that happens, we will keep cycling, and for all positive t:

y_{i+t} = y_{j+t}

The sequence begins to look like a circle with a tail - like the Greek letter ρ - and it is from this that the algorithm takes its name. If y_i equals y_j, then x_i ≡ x_j (mod d) and so d divides x_i − x_j. There is a very good chance that x_i and x_j are not equal, and if this is the case then gcd(N, x_i − x_j) is a non-trivial divisor of N.

The problem we face now is that since we do not know d, we do not know the values of the y_i's and so we do not know when y_i equals y_j. There are in fact infinitely many pairs (i, j) for which y_i and y_j are equal. If the length of the cycle is c, then once we are off the tail, any pair (i, j) for which c divides j − i will work.

We find some systematic way of choosing a lot of pairs (i, j) and for each pair compute gcd(N, x_i − x_j). In order to avoid storing many values of the x_i's we look at the differences, expressed in general as:

x_{2^n − 1} − x_j,  for 2^{n+1} − 2^{n−1} ≤ j ≤ 2^{n+1} − 1

The important factor here is the difference between the coordinates. We keep moving the smaller coordinate up to guarantee we get off the tail.

3.5 Pollard p − 1 Algorithm

The Pollard p − 1 algorithm was developed by J.M. Pollard in 1974 (Pollard 1974). It is a special-purpose algorithm; this means that it is only suitable for integers with specific types of factors. The algorithm (Bressoud 1989) is loosely based on Fermat's little theorem:

If p is prime, and a ≢ 0 (mod p), then a^(p−1) ≡ 1 (mod p)

Let us suppose that the integer we wish to find the factors of is N, and that one of its prime factors is p. We also assume that p − 1 divides Q. Using Fermat's little theorem, and assuming that (p − 1) | Q, we can derive the following:

a^Q ≡ 1 (mod p)

thus p divides a^Q − 1. We apply the GCD (greatest common divisor) operator to N and a^Q − 1 to derive p or some other non-trivial divisor of N.

We now wish to find a Q such that (p − 1) | Q, remembering that we do not have a value for p. This problem can be approached using two different techniques. The first, and probably easier of the two, is to set Q = max! (mod N). This value can be computed quickly:

a^(max!) = (... (((a^1)^2)^3)^4 ...)^max

This method is quick because exponentiation modulo N is a fast operation. a can be any number, so long as it is relatively prime to N. The second, and faster, technique for finding Q is to set Q = p_1 p_2 p_3 ... p_k, where the p_i are the prime numbers less than the specified limit. In some cases we may wish to add multiples of small primes to Q so we can ensure that we do not miss any factors of N.

Doing this will reduce the number of exponentiations required by a factor of eight.

This algorithm does have some problems as well. It is possible that we may find the GCD to be equal to N, in which case we change the base to something else. There may also be scenarios where the algorithm does not terminate, if p − 1 has only large prime factors. It has been proven that the largest prime factor of an arbitrary integer N usually falls around the 0.63 power of that integer. For example, if we set the max value in the algorithm to 10,000 then it will usually find any prime factors that are less than two million. This algorithm is similar to the other Pollard algorithm and hence is most efficient at finding factors similar to the ones found by the Pollard ρ algorithm.

3.6 Elliptic Curve Method

The Elliptic Curve Method (ECM) is sometimes referred to as Lenstra's Elliptic Curve Method (Lenstra 1987). This method uses elliptic curves to find factors of an integer, so to understand it, it is necessary to study elliptic curves. Elliptic curves are equations of the form:

y^2 = x^3 + ax + b

where a and b are constants chosen so that

4a^3 + 27b^2 ≠ 0

These curves have the unusual property that if a non-vertical line intersects the curve at two points, then it will have a third point of intersection. A tangent to the curve is considered to have two points of intersection at the point of tangency (Atkin & Morain 1993). A graph showing an elliptic curve can be seen in Figure 3.1.

If we know two points of intersection (x_1, y_1) and (x_2, y_2), we can compute the slope (λ) of the line, as well as the third point of intersection, in the following way:

λ = (3x_1^2 + a) / (2y_1)   if x_1 = x_2
λ = (y_1 − y_2) / (x_1 − x_2)   otherwise

x_3 = λ^2 − x_1 − x_2 (mod n)
y_3 = λ(x_3 − x_1) + y_1 (mod n)

Figure 3.1: An Elliptic Curve

If we wish to use elliptic curves to perform factorisation we need to make the set of points on an elliptic curve into a group. We begin this operation by defining the following binary operation. We assume that we have two rational points (x_1, y_1) and (x_2, y_2) on the curve, with (x_2, y_2) ≠ (x_1, −y_1). The binary operation is:

(x_1, y_1) + (x_2, y_2) = (x_3, y_3)

where x_3 and y_3 were defined earlier. We must note that the new point is not the third point of the intersection, but is defined as its reflection in the x-axis. It is still, however, on the same elliptic curve. We now have a set and a binary operation. We proceed by defining the identity element ∞ of our group as follows:

(x, y) + ∞ = ∞ + (x, y) = (x, y)

The point ∞ can be thought of as a point infinitely far north, so that every vertical line passes through it. One of the interesting facts about this definition is that now every straight line which intersects the curve at two points also intersects it at a third.

The notation E(a, b) denotes the group of rational points on the curve y^2 = x^3 + ax + b, where 4a^3 + 27b^2 ≠ 0, together with the point ∞. Also, by (x_i, y_i) we denote (x_1, y_1)#i, where:

(x_1, y_1)#i = (x_1, y_1) + (x_1, y_1) + ... + (x_1, y_1)    (i times)

All the points previously discussed still apply to elliptic curves modulo n. If x_1 ≡ x_2 (mod n) and y_1 ≡ −y_2 (mod n) then:

(x_1, y_1) + (x_2, y_2) = ∞

Otherwise, let s be the inverse modulo n of 2y_1 (when x_1 ≡ x_2 (mod n)) or of x_1 − x_2 (otherwise). We define the following:

λ = (3x_1^2 + a)·s   if x_1 ≡ x_2 (mod n)
λ = (y_1 − y_2)·s   otherwise

x_3 = λ^2 − x_1 − x_2 (mod n)
y_3 = λ(x_3 − x_1) + y_1 (mod n)

We can then define the binary operation as:

(x_1, y_1) + (x_2, y_2) ≡ (x_3, y_3) (mod n)

and we also define (x_i, y_i) mod n as:

(x_i, y_i) ≡ (x_1, y_1)#i (mod n)

The expression E(a, b)/n will indicate the elliptic group modulo n whose elements are the pairs (x, y) of non-negative integers less than n satisfying y^2 ≡ x^3 + ax + b (mod n), together with the point ∞.

If we are going to be able to implement factorisation techniques and primality tests using the arithmetic of elliptic curves, we need a fast way of computing (x, y)#i. Given the first coordinate x of a point (x, y), we can compute the first coordinate x_2 of (x, y)#2 as follows:

x_2 = ((x^2 − a)^2 − 8bx) / (4(x^3 + ax + b))

Thus, given the first coordinate of (x, y)#i, we can compute the first coordinate of (x, y)#2i using the above expression. We can extend this to (x, y)#(2i + 1) with the following expression:

x_{2i+1} = ((a − x_i x_{i+1})^2 − 4b(x_i + x_{i+1})) / (x_1 (x_i − x_{i+1})^2)

We can avoid rational numbers and restrict our attention to integers if we introduce a third coordinate and write:

x = X/Z,  y = Y/Z

where X, Y and Z are integers. An additional feature of using such notation is that the identity element now has the explicit representation (0, Y, 0), where Y can be any integer. If we have the relationship (X_i, Y_i, Z_i) = (X, Y, Z)#i, we can then adjust our previous expressions to obtain the following computational rules:

X_{2i} = (X_i^2 − aZ_i^2)^2 − 8bX_i Z_i^3
Z_{2i} = 4Z_i (X_i^3 + aX_i Z_i^2 + bZ_i^3)
X_{2i+1} = Z_1 [(X_i X_{i+1} − aZ_i Z_{i+1})^2 − 4bZ_i Z_{i+1} (X_i Z_{i+1} + X_{i+1} Z_i)]
Z_{2i+1} = X_1 (X_{i+1} Z_i − X_i Z_{i+1})^2

While we shall not need the value of Y_i, it is worth noting that, up to sign, Y_i can be recovered from the values of X_i and Z_i.

The following procedure for factoring integers by means of elliptic curves is essentially due to A.K. Lenstra and H.W. Lenstra Jr. Let N be a composite number relatively prime to 6; in practice, N is known to have no small prime factors. We randomly choose a parameter a for our elliptic curve and a point (x, y) on the curve, 0 ≤ x, y < N. We must note that the values of a, x and y uniquely determine b:

b ≡ y^2 − x^3 − ax (mod N)

We verify that:

gcd(4a^3 + 27b^2, N) = 1

If not, then we have probably found a factor of N. Converting to (X, Y, Z) coordinates, our initial triple is (x, y, 1). If p is a prime dividing N and |E(a, b)/p| divides k!, then

(X, Y, Z)#k! = ( (((X, Y, Z)#1)#2)#3 ... )#k

will be the identity in E(a, b)/p, which means that p will divide Z_{k!}. If k is not too large, there is a very good chance that gcd(Z_{k!}, N) is a non-trivial divisor of N.

The Elliptic Curve Method is used when the use of trial division becomes impossible.

The largest factor that has been found using the Elliptic Curve Method is a 66-digit factor of 3^466 + 1, found by B. Dodson on Apr. 6, 2005 (Zimmermann n.d.). The Elliptic Curve Method is inefficient for factoring smaller composite numbers. If an RSA system were implemented with 512-bit keys and the three-factor variation, the smallest prime would be less than 66 digits, so the Elliptic Curve Method could be used to break the system.

3.7 Conclusion

Having concluded a large amount of research into my chosen topic, I feel that I now have a solid grounding in this field of research. It has also clarified the direction in which I wish to take my project. These algorithms have been well researched and implemented in a number of computer languages. My supervisor and I have decided that there should be another application, which has the ability to analyse the incoming number to be factorised. This application would then select the factoring algorithm that would be most appropriate to factorise it. As discussed earlier, these algorithms have been designed to operate on numbers of a certain size - the proposed application will remove this uncertainty over which algorithm to select. After searching through a wide range of material I have been unable to find any documentation or applications which discuss or implement the idea I have proposed. The proposed application could probably be used in the academic environment as a teaching tool for undergraduates, to demonstrate the way different integer factoring algorithms operate.

Chapter 4

Requirements

In this chapter we will outline the requirements and specifications for implementing this project. Every attempt will be made to ensure that the requirements stated here are accurate and thorough, but there is a likelihood that they could change as the project passes through the various stages of development.

4.1 Functional Requirements

In this section we will look at the basic functionality that will be required of the software.

1.1 Should be able to accept input as an integer N.
1.2 Input integers may be larger than the programming language is able to accept. We must use an arbitrary precision arithmetic library so that integer size is limited only by the system memory.
1.3 The software should be able to perform mathematical functions with the chosen arbitrary precision library.
1.4 Display the computational time taken (in seconds) to complete a factorisation & verify the results given are correct.
1.5 Users should be able to interact with the application via a command prompt interface.
1.6 Users should have the ability to terminate the application at any point.

4.2 Non-functional Requirements

In this section we will look at the criteria that can be used to judge the operation of a system, rather than specific behaviours.

2.1 Implement the following algorithms:

Trial Division Algorithm
Fermat's Algorithm
Pollard's ρ Method
Pollard's p − 1 Method
Elliptic Curve Method

2.2 The study should corroborate current literature and studies and offer further insight, keeping the results relevant to the problem.
2.3 Implementation should not deviate from the findings in the Literature Review.

4.3 System Requirements

This section will briefly look at the hardware and software requirements of the project.

4.3.1 Hardware Requirements

The software is being developed on a machine with a Windows XP environment and an Intel Pentium M 1.73GHz processor. Hence the software should be able to run on any machine that is able to run the Windows XP operating system.

4.3.2 Software Requirements

The two main considerations to be focused upon are the programming language that is going to be used and the arbitrary precision library used to allow us to deal with large integers.

Programming Language Selection

The nature of this project specifically focuses on how efficiently an algorithm runs on a given input from the user. Though there is a vast number of programming languages available to us, we have to focus on our key requirements of the programming language:

1. A language that is reasonably close to the machine, with functionality available to give direct access to system resources such as main memory.
2. A number of arbitrary precision libraries which interface with the language should be freely available.

3. Available on a number of different operating platforms, i.e. Unix, Windows, Mac OS.
4. Relatively easy to learn, with a number of books and online resources available to aid development.
5. Programming tools and resources freely available.
6. Object-oriented & other high-level functionality is not required.

Another important factor (apart from the ones described above) is that the chosen programming language should be familiar to myself. Due to time constraints it would be difficult to implement quite complex algorithms in a language with which I am unfamiliar. Based on all these factors, ANSI C is the most obvious choice of programming language. In this project I will be using the GNU C compiler¹. The motivation for using this language was that it met the requirements described above.

Arbitrary Precision Library

Almost as important as the choice of programming language is the choice of which arbitrary precision library we are going to use. Due to the large size of the integers we will be dealing with, the native types found in the C language such as int, float and long would not be large enough for our purposes. It is for this reason we use arbitrary precision libraries to allow us to use and perform arithmetic functions on large integers. The requirements for the library are as follows:

1. The library should be designed to perform as fast as possible.
2. Easy to learn, and should integrate without much difficulty with the C programming language.
3. Should be able to represent integers, rational numbers and floating point numbers.
4. Have a rich set of functions.
5. Well documented and freely available.

Though a number of such libraries are available for the C language, I have decided to use the GNU GMP 4.2.1 library. Not only does it fulfil the requirements made above, but it is one of the larger big number libraries, hence there is a vast number of tutorials and guides available. Some of the large factorisation attempts also utilise this library, e.g. GMP-ECM, which can be found at http://www.komite.net/laurent/soft/ecm/ecm-6.0.1.html

¹Documentation can be found at the GNU C homepage

Chapter 5

Design

Having reviewed the mathematics and the underlying concepts behind each of the algorithms, we must now look at the task of implementing them. The success of any software project is firmly grounded in how well the application is designed. Though every attempt will be made to follow the design laid out at this stage, it may become apparent that certain design decisions hinder the efficiency of an algorithm implementation.

This section will not deal with the actual implementation of the algorithms in the C programming language, nor will it explain all the functions and operations used within the GNU GMP library. If you are interested in this then the source code provided (see Appendix B) is fully commented and provides sufficient explanation.

One function within the GMP library that is of particular importance is int mpz_probab_prime_p (mpz_t n, int reps). This function determines whether or not n is prime. Like any primality test it cannot guarantee that the value passed in is prime. The function begins by performing some trial divisions, then some Miller-Rabin probabilistic primality tests. Miller-Rabin and similar tests can more properly be called compositeness tests: numbers which fail are known to be composite, but those which pass might be prime or might be composite. Only a few composites pass, hence those which pass are considered probably prime. The primary use of this function within the implemented factoring algorithms is to determine whether or not the value passed in by the user is prime.
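As an illustration (not the project's own code), mpz_probab_prime_p is used as follows; it returns 2 if n is definitely prime, 1 if it is probably prime, and 0 if it is definitely composite:

#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t n;
    /* 2^61 - 1 = 2305843009213693951, a Mersenne prime */
    mpz_init_set_str(n, "2305843009213693951", 10);
    int r = mpz_probab_prime_p(n, 25);  /* 25 Miller-Rabin rounds */
    printf("mpz_probab_prime_p: %d\n", r);
    mpz_clear(n);
    return 0;
}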

There is a deep philosophical difference between the approach of the Trial Division and Fermat's algorithms on the one hand, and the Pollard ρ and Pollard p − 1 algorithms on the other. The first two factorisation algorithms systematically search for factors (they are referred to as deterministic algorithms). Due to their analytical nature, they are quite straightforward to analyse. Pollard ρ and Pollard p − 1 are referred to as probabilistic algorithms: they introduce randomness into the process. Due to this we can no longer be certain of finding a factor of a given size within a fixed amount of time. The trade-off for this probabilistic nature is that we gain the chance to determine the factors in much less time than a deterministic algorithm would take. Without probabilistic algorithms we could not feasibly factor 20 to 30 digit numbers, much less 80 to 100 digit numbers, with deterministic methods.

There may be occasions when we are unfortunate when determining the factors of a composite number. Pollard ρ and Pollard p − 1 may not turn up any prime divisors; that might be because there are no prime divisors in the appropriate interval, or it may be down to bad luck! To overcome this we must find techniques that will increase the likelihood of finding factors. With Pollard ρ we could, for example, replace the function x^2 + 1 used to generate the sequence.

Rather than develop the pseudocode from the material covered in the literature review (chapter 3), it was decided to use pseudocode from published sources. The sources used were Handbook of Applied Cryptography (Menezes, van Oorschot & Vanstone 1996) and Factorization & Primality Testing (Bressoud 1989). The reason for taking this step was to ensure that the algorithms being implemented are as close as possible to their mathematical versions. Using published pseudocode ensures this.

5.1 Trial Division

Due to the relative simplicity of the trial division algorithm there is very little to comment upon in the implementation. Below is the pseudocode of the implementation.

Algorithm 1 Trial Division Algorithm
Require: An integer N > 0.
  bound ← √N
  iter ← 2
  while N > 1 and iter ≤ bound do
    if N mod iter = 0 then
      N ← N / iter
      print iter
    else
      iter ← iter + 1
    end if
  end while
  if N > 1 then
    print N    {the remaining cofactor is prime}
  end if

The only attempt that has been made to optimise the algorithm is to ensure that the maximum value of the iter variable is √N. (The final test prints the remaining cofactor, which must be prime once iter exceeds √N.) The source code is fully commented and can be studied for further detail on how the algorithm operates.
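As an illustrative sketch only (the project's actual source is in Appendix B), Algorithm 1 translates to GMP-based C along these lines:

#include <stdio.h>
#include <gmp.h>

/* Print the prime factorisation of n by trial division, testing
   candidate divisors 2, 3, 4, ... up to sqrt(remaining n). */
void trial_division(mpz_t n)
{
    mpz_t iter, bound;
    mpz_init_set_ui(iter, 2);
    mpz_init(bound);
    mpz_sqrt(bound, n);

    while (mpz_cmp_ui(n, 1) > 0 && mpz_cmp(iter, bound) <= 0) {
        if (mpz_divisible_p(n, iter)) {
            gmp_printf("%Zd\n", iter);
            mpz_divexact(n, n, iter);
            mpz_sqrt(bound, n);      /* bound shrinks as n shrinks */
        } else {
            mpz_add_ui(iter, iter, 1);
        }
    }
    if (mpz_cmp_ui(n, 1) > 0)        /* what remains is prime */
        gmp_printf("%Zd\n", n);

    mpz_clear(iter);
    mpz_clear(bound);
}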

5.2 Fermat's Algorithm

This section briefly looks at the pseudocode for Fermat's Algorithm.

Algorithm 2 Fermat's Algorithm
Require: An integer N > 0.
  for x ← ⌈√N⌉ to N do
    ysqr ← x·x − N
    if issquare(ysqr) then
      y ← √ysqr
      s ← x − y
      t ← x + y
      if s ≠ 1 and s ≠ N then
        return s, t
      end if
    end if
  end for

Here the issquare(z) function is true if z is a square number and false otherwise. This algorithm does have a small flaw, which is shared with the trial division algorithm: in the worst case it merely proves primality.
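A hedged C sketch of Algorithm 2 (illustrative only; GMP's mpz_perfect_square_p plays the role of issquare):

#include <stdio.h>
#include <gmp.h>

/* Fermat's method: search x >= ceil(sqrt(N)) for x^2 - N a perfect
   square y^2; then N = (x - y)(x + y). Assumes N is odd and
   composite, otherwise the loop may run for a very long time. */
void fermat(const mpz_t n)
{
    mpz_t x, ysqr, y, s, t;
    mpz_init(x); mpz_init(ysqr); mpz_init(y); mpz_init(s); mpz_init(t);

    mpz_sqrt(x, n);                      /* floor(sqrt(N)) */
    if (!mpz_perfect_square_p(n))
        mpz_add_ui(x, x, 1);             /* start at ceil(sqrt(N)) */

    for (;;) {
        mpz_mul(ysqr, x, x);
        mpz_sub(ysqr, ysqr, n);          /* ysqr = x^2 - N */
        if (mpz_perfect_square_p(ysqr)) {
            mpz_sqrt(y, ysqr);
            mpz_sub(s, x, y);
            mpz_add(t, x, y);
            gmp_printf("%Zd = %Zd * %Zd\n", n, s, t);
            break;
        }
        mpz_add_ui(x, x, 1);
    }

    mpz_clear(x); mpz_clear(ysqr); mpz_clear(y);
    mpz_clear(s); mpz_clear(t);
}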

5.3 Pollard ρ Method

Algorithm 3 Pollard ρ Method
Require: An integer N > 0.
  a ← 2; b ← 2
  for i = 1, 2, ... do
    a ← a^2 + 1 mod N
    b ← b^2 + 1 mod N; b ← b^2 + 1 mod N
    Compute d = gcd(a − b, N)
    if 1 < d < N then
      return d and terminate
    end if
    if d = N then
      terminate the algorithm with failure
    end if
  end for

This algorithm is probabilistic in its nature. This means that we have to be prepared for occasions where no results are produced.

5.4 Pollard p − 1 Method

Algorithm 4 Pollard p − 1 Method
Require: An integer N > 0.
  Select a smoothness bound B.
  Select a random integer a, 2 ≤ a ≤ N − 1, and compute d = gcd(a, N). If d ≥ 2, return d.
  for each prime q ≤ B do
    l ← ⌊ln N / ln q⌋
    a ← a^(q^l) mod N
  end for
  Compute d = gcd(a − 1, N)
  if d = 1 or d = N then
    terminate the algorithm with failure
  end if
  Otherwise return d

This algorithm requires a smoothness bound to be input by the user. Smoothness is discussed in chapter 2. The smoothness variable could have an effect on the time taken to find factors using this algorithm. This will be discussed further in chapter 7 (Results & Analysis).
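As illustrative sketches only (again, the project's implementations live in Appendix B), Algorithms 3 and 4 might be rendered in GMP-based C as follows. For p − 1, the exponent is built up as in the a^(max!) technique from chapter 3 rather than over primes q ≤ B; for a bound B this has the same effect:

#include <gmp.h>

/* Pollard rho with Floyd cycle detection and f(x) = x^2 + 1 mod N.
   Returns 1 and stores a non-trivial divisor in d on success,
   0 on failure (retry with another polynomial or starting value). */
int pollard_rho(mpz_t d, const mpz_t n)
{
    mpz_t a, b, diff;
    mpz_init_set_ui(a, 2);
    mpz_init_set_ui(b, 2);
    mpz_init(diff);

    do {
        mpz_mul(a, a, a); mpz_add_ui(a, a, 1); mpz_mod(a, a, n);  /* a = f(a) */
        mpz_mul(b, b, b); mpz_add_ui(b, b, 1); mpz_mod(b, b, n);
        mpz_mul(b, b, b); mpz_add_ui(b, b, 1); mpz_mod(b, b, n);  /* b = f(f(b)) */
        mpz_sub(diff, a, b);
        mpz_gcd(d, diff, n);             /* gcd is fine with diff < 0 */
    } while (mpz_cmp_ui(d, 1) == 0);

    int ok = (mpz_cmp(d, n) != 0);       /* d = N means failure */
    mpz_clear(a); mpz_clear(b); mpz_clear(diff);
    return ok;
}

/* Pollard p-1, stage 1: raise a = 2 to the exponent B! by
   exponentiating by 2, 3, ..., B in turn (B! is divisible by every
   prime power q^l <= B), then test gcd(a - 1, N). */
int pollard_pm1(mpz_t d, const mpz_t n, unsigned long B)
{
    mpz_t a, am1;
    unsigned long j;
    mpz_init_set_ui(a, 2);
    mpz_init(am1);

    for (j = 2; j <= B; j++)
        mpz_powm_ui(a, a, j, n);         /* a = a^j mod N */

    mpz_sub_ui(am1, a, 1);
    mpz_gcd(d, am1, n);
    int ok = (mpz_cmp_ui(d, 1) > 0 && mpz_cmp(d, n) < 0);
    mpz_clear(a); mpz_clear(am1);
    return ok;
}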

5.5 Elliptic Curve Method

Algorithm 5 Elliptic Curve Method
Require: An integer N > 0, an integer bound B_1.
  Select a random elliptic curve E_{a,b} mod N and a point P_0 = (x_0 : y_0 : z_0) on it.
  Q ← P_0
  for each prime π ≤ B_1 do
    Q ← π^⌊log B_1 / log π⌋ · Q, giving Q = (x_π : y_π : z_π)
    g ← gcd(N, z_π)
    if g ≠ 1 then
      return g
    end if
  end for
  return FAIL

Like the previous algorithm, this factorisation method also features a smoothness bound. The bounds (B_1) used for this algorithm were not selected at random; published sources list B_1 values for given lengths of composite numbers. Again, this will be discussed in chapter 7 (Results & Analysis).
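The heart of any ECM implementation is elliptic-curve point addition performed modulo N, where a failed modular inversion is exactly the event that exposes a factor. The sketch below is an illustrative fragment only, written in affine coordinates rather than the projective formulas used in chapter 3, and is not the project's code; stage 1 applies this step repeatedly, via double-and-add, to compute k·P for k a product of small prime powers up to B_1. The output coordinates must not alias the inputs.

#include <gmp.h>

/* Add P1 = (x1, y1) and P2 = (x2, y2) on y^2 = x^3 + ax + b (mod n),
   using the group-law convention y3 = lambda*(x1 - x3) - y1.
   Returns 0 on success with the sum in (x3, y3); returns 1 when the
   required inverse modulo n does not exist, storing gcd(den, n) in
   g - the "lucky failure" that yields a non-trivial factor of n. */
int ec_add(mpz_t x3, mpz_t y3, mpz_t g,
           const mpz_t x1, const mpz_t y1,
           const mpz_t x2, const mpz_t y2,
           const mpz_t a, const mpz_t n)
{
    mpz_t lam, den, num, t;
    int found = 0;
    mpz_init(lam); mpz_init(den); mpz_init(num); mpz_init(t);

    if (mpz_cmp(x1, x2) == 0 && mpz_cmp(y1, y2) == 0) {
        /* doubling: lambda = (3*x1^2 + a) / (2*y1) */
        mpz_mul(num, x1, x1);
        mpz_mul_ui(num, num, 3);
        mpz_add(num, num, a);
        mpz_mul_ui(den, y1, 2);
    } else {
        /* addition: lambda = (y1 - y2) / (x1 - x2) */
        mpz_sub(num, y1, y2);
        mpz_sub(den, x1, x2);
    }
    mpz_mod(den, den, n);

    if (!mpz_invert(t, den, n)) {   /* no inverse: gcd(den, n) > 1 */
        mpz_gcd(g, den, n);
        found = 1;
    } else {
        mpz_mul(lam, num, t);
        mpz_mod(lam, lam, n);
        mpz_mul(t, lam, lam);       /* x3 = lam^2 - x1 - x2 */
        mpz_sub(t, t, x1);
        mpz_sub(t, t, x2);
        mpz_mod(x3, t, n);
        mpz_sub(t, x1, x3);         /* y3 = lam*(x1 - x3) - y1 */
        mpz_mul(t, t, lam);
        mpz_sub(t, t, y1);
        mpz_mod(y3, t, n);
    }

    mpz_clear(lam); mpz_clear(den); mpz_clear(num); mpz_clear(t);
    return found;
}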

Chapter 6 Testing

This chapter will focus on the steps and methods taken to test the algorithms described in the previous chapters. Our main focus is to study how much CPU time is taken by a given algorithm to factorise an integer.

6.1 Timing Algorithms

Finding the amount of CPU time taken by an algorithm is a relatively straightforward process in the C programming language. Below is a snippet of the code required to calculate the time taken.

Listing 6.1: CPU Time Code

#include <time.h>

clock_t start, end;
double cpu_time_used;

start = clock();
... /* Do the work. */
end = clock();

cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;

Obviously we then need to print the cpu_time_used variable to the console. The only limitation of using this code is that it can only measure execution time to a 1000th of a second. Therefore any times that are output as 0 seconds could mean that the execution completed quicker than the timing resolution. Times quicker than a thousandth of a second can be considered negligible for the purposes of our investigation.

When performing tests on these algorithms we will be looking for 2 main results. Firstly we will want to ensure that the output produced by the algorithm was actually the prime factors of the integer entered. We then

want to obtain the time taken for the algorithm to generate the results. Of course the code used for timing the algorithm takes time to execute as well, but for our investigation this execution time can be ignored.

Testing was carried out using a black box¹ method. Black box testing allows scrutiny over the functionality important to the study, which is the input and the output. To check whether or not the algorithms were performing factorisation we simply checked within the main flow of the program. It was for this reason that external test modules were not required. Whilst developing these algorithms, numerous tests were carried out to ensure that the results being generated were the actual prime factors of the integer entered into the algorithm. One of the main tools that assisted in this was a web-based application, Factorization using ECM². This web application was only used to check the validity of the output and not the processing time taken. Documentation available on the website demonstrates that the results produced by the web application are reliable. The tests took the form of simple programs consisting of specific inputs allowing monitoring of the output.

One of the most important considerations when testing is the selection of the data that will be used to generate our results. During the development and implementation of these algorithms numerous tests were carried out to test the functionality of each implementation. It became clear that the type of composite integer used can greatly affect the amount of time taken for an algorithm to find its prime factors. As well as the actual length of the composite number to be factored, other influences are the number of prime factors that the composite number consists of and the distance of the prime factors from one another. Based on this it seems logical to ensure the test data accounts for the differences described, so that we have a consistent set of data from which we can draw valid conclusions. Therefore it was decided to use composite numbers with only two prime factors. The two prime factors also had to be relatively close to one another - in our case there was only a 1% difference between the two. The second round of testing will contain integers with three prime factors. Again, these factors will have a difference of 1%. The final set of test data will have integers with 3 factors, but these factors will be selected at random. These tests should allow us to study the general behaviour of these algorithm implementations and should give us sufficient insight to draw relevant conclusions.

An acknowledgment must be made of a website that assisted in the generation of the prime factors used in Tables A.1, A.2 & A.3. The website What are the Next Ten Prime Numbers can be found at

¹ A software testing technique whereby the internal workings of the item being tested are not known by the tester.
² Further information can be found on the Factorization using the Elliptic Curve Method page.

CHAPTER 6. TESTING 29 //www.rsok.com/ jrm/next t en p rimes.html). 6.2 Algorithms using Smoothness Bounds As mentioned in chapter 5 the Elliptic Curve Method uses a smoothness bound. Figure 6.1: B 1 values used for the Elliptic Curve Method The data used to generate the graph (Fig 6.1) can be found on the following website (http : //www.alpertron.com.ar/ecm.ht M). The smoothness value used in the Pollard p 1 were a constant value of 100. A constant value was used as I was unable to find a source with suggested value for the smoothness bound. A considerable amount of time would have been required to find the optimal values to use with the Pollard p 1 method.

Chapter 7 Results & Analysis

The results were obtained from a machine with the following specification:

Intel Pentium M 740 1.73 GHz
1 GB RAM
Operating System: Microsoft Windows XP (Service Pack 2)
Development Environment: Bloodshed Dev-C++ ver 4.9.9.2
C Compiler: MinGW (bundled with Dev-C++ ver 4.9.9.2)

Every effort was taken to ensure that the machine the tests were being performed upon was running a minimum number of processes. Also, none of the algorithms were altered in any way during the collation of the results.

The result tables found in Appendix A are slightly different from one another. Table A.1 shows that we tested the algorithms with composite numbers ranging from 8 digits to 34 digits. 8 digits was used as a starting point, as smaller integer lengths did not produce any useful results. The other tests were also commenced from 8 digits for consistency. Unlike the other two tables, Table A.1 ends at 34 digits. It ends at this value because at least one of the algorithms still produced results for this length. Larger integers were used but did not produce results within a reasonable amount of time. Tables A.4 to A.6 list the time (in seconds) taken to factorise the given integer. The entries in these tables that contain a - symbol signify that either the algorithm failed to find the factors or it took far too long to produce a result (usually greater than 30 minutes).

The graphs on the following pages were created using Microsoft Excel 2003. The graphs of results are plotted with the number of digits in the composite number on the x-axis against CPU time in seconds on the y-axis. Data used to generate these graphs can be found in Appendix A.

7.1 Trial Division Results

The graphs in the following sections were generated using the data that can be found in Appendix A.

7.1.1 Trial Division Results Using 2 Nearby Factors

Figure 7.1: Trial Division with data from Table A.1

The results in the graph above were generated using the data that can be found in Table A.1 in Appendix A. The two prime factors used to generate each composite number had a difference of 1% or less. The graph clearly shows that as the number of digits in the composite increases, the time taken to find the factors increases at an almost exponential rate. The results also confirm that trial division is cost effective for relatively small composite numbers (5-10 digits). It also demonstrates the deterministic nature of the trial division algorithm - factors can be obtained as long as we are prepared to wait.

There is one result on the graph that appears to be abnormal. This is for the composite number containing 17 digits. The reason for this would be that its two prime factors were quite close to the prime factors that made up the 16

digit composite number. Despite this result it is still quite clear to see the overall trend.

7.1.2 Trial Division Results Using 3 Nearby Factors

Figure 7.2: Trial Division with data from Table A.2

The results in the graph above were generated using the data that can be found in Table A.2 in Appendix A. The three prime factors used to generate each composite number had a difference of 1% or less. The graph in Fig 7.2 displays similar properties to the graph in Fig 7.1. There is a point on both graphs where the time to factorise begins to rise at an almost exponential rate. With Fig 7.2, though, this rapid increase in time happens when there are more digits in the composite number. A probable reason for this behaviour is the greater number of factors that make up the composite number (3 as opposed to 2). Hence, the composite number itself will be larger, because three similarly sized factors are multiplied together rather than two. This behaviour also agrees with the running time discussed in the Literature Review (chapter 3).

7.1.3 Trial Division Results Using 3 Arbitrary Factors

Figure 7.3: Trial Division with data from Table A.3

The three prime factors used to generate the composite numbers were arbitrarily chosen. The results in Fig 7.3 are very similar to the results shown in Fig 7.2. Although the factors are not within a certain distance of one another, the sizes of the factors themselves are similar, since they produce composite numbers of the same size as those used to generate the results in Fig 7.2. Again we see that after a certain point the time required to factorise a composite begins to increase greatly. This feature is common with trial division - after a certain point (dependent on the make-up of the composite) it becomes computationally expensive to use this algorithm.

7.1.4 All Trial Division Results

Figure 7.4: Trial Division with data from Tables A.1, A.2 & A.3

The graph in Fig 7.4 combines all the results obtained from the tests carried out on the trial division algorithm. It is very clear to see that the algorithm performed similarly up to 15 digits in the composite number. After this point the results begin to diverge quite quickly. Comparing the three sets of results, it is clear that using a composite number consisting of 3 nearby factors is more computationally expensive. No plausible explanation can be given as to why this behaviour has emerged.

7.2 Fermat's Algorithm Results

Figure 7.5: Fermat's Algorithm with data from Table A.1

Fermat's algorithm is only suited to composite numbers that consist of two factors. It is for this reason that it was only tested with data from Table A.1 in Appendix A. The results produced by Fermat's algorithm are similar to the results obtained from the trial division algorithm. There is a slight anomaly in the results at 27 and 28 digits. Having investigated this, I am unable to explain it. Otherwise the results agree with the running time described in the Literature Review.

7.3 Pollard ρ Results

The graphs in the following sections were generated using the data that can be found in Appendix A.

7.3.1 Pollard ρ Results Using 2 Nearby Factors

Figure 7.6: Pollard ρ using data from Table A.1

The results in the graph above were generated using the data that can be found in Table A.1 in Appendix A. The two prime factors used to generate each composite number had a difference of 1% or less. One observation that can be made is that the graph appears similar to the trial division graphs, the only major difference being that the trend line sits further along the x-axis. Hence the Pollard ρ method factorises a given composite quicker than trial division and is able to find the factors of larger composite numbers. With the test data used, the Pollard ρ method would start to become expensive after the composite number exceeded 40 digits.

7.3.2 Pollard ρ Results Using 3 Nearby Factors

Figure 7.7: Pollard ρ using data from Table A.2

The results in the graph above were generated using the data that can be found in Table A.2 in Appendix A. The graph in Fig 7.7 is of a similar shape to the one found in Fig 7.6. There are a few results which do not follow the general trend. These could be due to the pseudo-random nature of the algorithm. Alterations could have been made to the parameters passed in, to try to alter the time taken to factorise those composite numbers.

7.3.3 Pollard ρ Results Using 3 Arbitrary Primes

Figure 7.8: Pollard ρ using data from Table A.3

Again the graph in Fig 7.8 is similar to the graphs found in Fig 7.6 & Fig 7.7. The only major difference is that the time taken to factorise some of the larger composite numbers (≥ 25 digits) grew very sharply. Overall this algorithm took a short period of time to arrive at the prime factors; the maximum time is less than 0.1 seconds.

7.3.4 All Pollard ρ Results

Figure 7.9: Pollard ρ with data from Tables A.1, A.2 & A.3

The graph in Fig 7.9 compares all the results generated by the Pollard ρ using the three sets of test data. One observation that is very clear is that the Pollard ρ algorithm is very efficient at factorising numbers with 3 factors. In comparison to the times taken for the 2 factor results, the 3 factor results could almost be considered negligible. Currently the Pollard ρ algorithm seems to be the most efficient at factorising composite numbers smaller than 35 digits. One thing we do have to keep in mind is that this algorithm is probabilistic. Therefore the results show only that this algorithm is efficient for the data it was provided; we should not assume that this is the case for all types of composite numbers (of varying length and number of factors).

7.4 Pollard p − 1 Results

The graphs in the following sections were generated using the data that can be found in Appendix A.

7.4.1 Pollard p − 1 Results Using 2 Nearby Factors

Figure 7.10: Pollard p − 1 with data from Table A.1

The graph found in Fig 7.10 is similar to the ones seen earlier. The algorithm is able to perform factorisations relatively quickly up to a certain point. After this point the time taken to factorise begins to grow in an exponential manner. In Fig 7.10 we can see the possible characteristics of a pseudo-random algorithm: the time taken to factorise an 18 digit composite number is longer than for all the following composite numbers up to 21 digits.

7.4.2 Pollard p − 1 Results Using 3 Nearby Factors

Figure 7.11: Pollard p − 1 with data from Table A.2

The graph found in Fig 7.11 displays a very unusual set of results. The best way to understand this graph is to observe its overall pattern. There is an obvious pattern for each group of factors. The timings seem to oscillate, with varying amplitude, as the size of the composite number grows. Again, this type of behaviour can be attributed to the pseudo-random nature of the Pollard p − 1 algorithm. It does seem strange that this behaviour was observed with composite numbers consisting of three factors which are positioned within 1% of one another.

7.4.3 Pollard p − 1 Results Using 3 Arbitrary Factors

Figure 7.12: Pollard p − 1 with data from Table A.3

The graph in Fig 7.12 returns to a pattern that we are more familiar with. There are a couple of points along the curve that do not follow the general trend. These can most likely be attributed to the pseudo-random nature of the algorithm.

7.4.4 All Pollard p − 1 Results

Figure 7.13: Pollard p − 1 with data from Tables A.1, A.2 & A.3

The graph in Fig 7.13 puts the results generated by the Pollard p − 1 onto a single graph. The algorithm performed best when given composite numbers composed of three factors located near one another. The curves for 2 Nearby Factors & 3 Arbitrary Factors appear to share some characteristics - on some occasions small composite numbers took longer to factorise than some of the larger ones.

7.5 Elliptic Curve Method (ECM) Results

The graphs in the following sections were generated using the data that can be found in Appendix A.

7.5.1 ECM Results Using 2 Nearby Factors

Figure 7.14: Elliptic Curve Method using data from Table A.1

The graph in Fig 7.14 is again of a different character to those seen earlier. The curve appears to peak at 18 digits and then begins to fall back. This kind of behaviour is expected from the ECM (as mentioned in the Literature Review). It has been documented that the ECM struggles to factorise smaller integers when compared to the other algorithms discussed in this report.

7.5.2 ECM Results Using 3 Nearby Factors

Figure 7.15: Elliptic Curve Method using data from Table A.2

The graph in Fig 7.15 returns to a pattern we are more familiar with. There is a slight anomaly in the results produced between 20 and 25 digits.

7.5.3 ECM Results Using 3 Arbitrary Factors

Figure 7.16: Elliptic Curve Method using data from Table A.3

The graph in Fig 7.16 is similar in shape to Fig 7.15. The results in these last two graphs seem to contradict the finding in Fig 7.14 that the ECM is able to factorise integers more quickly as the size of the composite number grows.

7.5.4 All ECM Results

Figure 7.17: Elliptic Curve Method with data from Tables A.1, A.2 & A.3

The ECM algorithm seems to perform well when given composite numbers composed of three arbitrary factors. The results for the composite numbers made up of 2 nearby factors seem quite unusual, considering that the ECM is the most powerful algorithm that we'll be investigating. We would have expected the 2 Nearby Factors results to follow the same pattern as the other two curves.

7.5.5 Comparing All Factoring Algorithms

This section looks at how the algorithms investigated perform against one another given the same set of data. Obtaining and analysing these results was one of the central motivations of this report. The data used to generate the graphs can be found in Appendix A.

Results Using 2 Nearby Factors

Figure 7.18: Comparing all algorithms using data from Table A.1

The graph in Fig 7.18 looks at how all the algorithms performed against one another. With the given data set the Pollard p − 1 seems to perform relatively well compared to the other factoring algorithms. It was expected that the Elliptic Curve Method would perform well. After peaking at 18 digits it was expected that the times to factor large integers would continue to decrease. Instead the time increased to an extent where results could not be obtained within a reasonable period of time. The likely reason for this is some problem in the implementation. The Pollard ρ algorithm may not have beaten the other algorithms in terms of time, but it has shown it is capable of factoring up to 34 digits within a reasonable period

of time. The findings in the literature review did suggest that the Pollard ρ algorithm may struggle after the factors exceed 10-12 digits. The other algorithms performed as expected.

Results Using 3 Nearby Factors

Figure 7.19: Comparing all algorithms using data from Table A.2

It is clear to see that the Pollard ρ algorithm performed the best with the given set of data. One could have assumed that the ECM would have performed better, considering it is one of the more powerful algorithms. Again the Pollard p − 1 performed well. With integers containing 3 factors we can see the trial division algorithm beginning to struggle; the same comment could be made about the Elliptic Curve Method results. For integers composed of 3 nearby factors, the Pollard ρ algorithm took the least amount of time to obtain the correct factors.

Results Using 3 Arbitrary Factors

Figure 7.20: Comparing all algorithms using data from Table A.3

The graph in Fig 7.20 seems to show the behaviour expected of the type of factoring algorithms we are using. Unlike in previous results, the Elliptic Curve Method performed well against the other algorithms. The Pollard p − 1 performed poorly considering the results we've seen previously. Clearly the Pollard ρ algorithm performed excellently considering the type of data we're using. For integers composed of 3 arbitrary factors, the Pollard ρ algorithm is the least expensive algorithm to run.

7.6 Summary

Reviewing the results we have seen some clear patterns emerge. Irrespective of which data set we used, the trial division method was the most computationally expensive. Realistically this algorithm could only be used in scenarios which require the factoring of small integers. Despite its poor performance, the advantage of the trial division algorithm is its ease of implementation.

Fermat's algorithm performed reasonably well when compared to the other algorithms. It was expected to perform better than trial division - which it did. The Pollard ρ algorithm performed very well across the range of test data sets. It was the least computationally expensive algorithm and was also able to factorise larger integers in the test data set than expected. The Pollard p − 1 algorithm performed well, especially when using the 3 nearby factor data set. The other two sets of results showed the Pollard p − 1 performing relatively well. Unfortunately it could not handle the larger integers as well as the Pollard ρ algorithm.

The results of the Elliptic Curve Method (ECM) are unusual for an algorithm which was expected to perform well. The ECM algorithm did perform well when tested with integers with 3 factors, but it did struggle as the integer sizes started to grow. The results in Appendix A show that the ECM did not report a time for the largest integers in the test data set. The Literature Review did comment that the ECM may struggle with smaller factors, but it was expected to do well with the larger composite numbers, especially since ECM has successfully cracked small RSA keys. The probable reason for this could be errors during implementation. Further investigations should have been performed to find improvements to the ECM when it began posting poor results.

Chapter 8 Conclusions

8.1 Introduction

In the Introduction (chapter 1) of the project it was stated that this project aimed to study and implement the following algorithms:

Trial Division Algorithm
Fermat's Algorithm
Pollard ρ Method
Pollard p − 1 Method
Elliptic Curve Method (ECM)

This chapter aims to assess how well this project has met this aim and covers any suggested future developments. The chapter starts with a critique of the processes undertaken. This discusses the good and bad choices made and what has been learnt from undertaking this project. The next section covers the achievements of the project. It details where the project fits within the field and what the project brings to this field. Finally, the future of the project is discussed: how any unmet requirements can be fulfilled and what development could be undertaken based on this project in the future.

8.2 Critique

The project was started by conducting a review of the literature in the field. After studying a few papers it became clear that the level of mathematics involved was unfamiliar and would require some additional study. A couple of books that were particularly useful were Guide to Elliptic Curve Cryptography (Hankerson, Menezes & Vanstone n.d.) and Primality Testing and Integer Factorization in Public-Key Cryptography (Yan 2003). The Internet was also a useful tool, especially the Wolfram MathWorld site.

It was clear from the outset that this report would not be contributing to the current research taking place in this field. This was not due to a lack of trying, but down to the sheer size and depth of the research being conducted in integer factorisation. The algorithms currently known are loosely grouped into two camps. The first would most likely contain the more powerful algorithms such as the Number Field Sieve (NFS), and the second would contain the algorithms discussed in this report. One of the motivations for integer factorisation was to attack cryptosystems such as RSA. RSA could only realistically be attacked with the larger algorithms (Robinson 2003). Hence, any report that would possibly extend the current research would need to discuss and build upon the larger algorithms. Due to time constraints it was not possible to embark upon studying and possibly implementing the larger factoring algorithms (e.g. the Number Field Sieve) (Pomerance 1996), let alone extending upon them. Despite this, this report does provide a good starting point for students wishing to conduct research in this field. The algorithms discussed in this report may not be considered cutting edge, but they did contribute to the development of the large factoring algorithms (Montgomery 1994).

After deciding the algorithms to be investigated, the first source that provided an excellent starting point was a book by Donald Knuth (Knuth 1997). This gave a clear sense of the direction the report would take. Research material was found, studied and compiled into the Literature Review (chapter 3). Unlike traditional software development projects, extensive design and analysis was not required. Pseudocode for these algorithms is available from a number of sources (discussed in chapter 5). An investigation was conducted to study the tools required to implement these algorithms. The findings for this can be found in the Requirements chapter (chapter 4).

Having implemented the algorithms, the next stage was to begin testing. As discussed in the testing chapter (chapter 6), the testing comprised three main tests: testing with 2 nearby factors, testing with 3 nearby factors and testing with 3 arbitrary factors. These tests could have been extended to include ones which look at integers of a similar size but composed of a different number of factors. Also, with the probabilistic algorithms (Pollard ρ, Pollard p − 1 and Elliptic Curve Method), tests could have been conducted to investigate which bounds (especially for the Pollard p − 1 algorithm) would be optimal for a given integer type.

At the beginning of the project it was discussed that an additional application would be developed. This application would take an input from the user (a large composite integer) and select the appropriate factoring algorithm to generate a result. The development of this application was abandoned shortly after the beginning of the implementation of the five main algorithms. The reason for the abandonment of this application was mainly time constraints. An application of this nature would require a

considerable amount of time to implement. Another reason was that the application would not provide any data (that was not already available to us) to contribute to the results. An emphasis was placed on understanding and obtaining a good set of results for the algorithms already selected. With regards to the plan set out in Appendix D, I managed to stay on track for most of the tasks. Changes occurred due to other commitments such as examinations or social events.

8.3 Achievements

The project has successfully implemented the five factoring algorithms set out in chapter 1. During research undertaken for the literature review (chapter 3) it was determined that this is an area in which a great deal of research has taken place. Despite the large volume of research, I was unable to find any comprehensive studies aimed at people who are new to the field. Hopefully this report meets that need. Every effort was taken to ensure that the code generated for this project could be reused. This would allow students to build upon the research carried out in this report and possibly take it further. Reusability was promoted by ensuring that the code was well documented. Steps were also taken to ensure the code complied with a coding standard. More information can be found in Appendix B.

8.4 Future Advances

If the opportunity was given to continue and build upon this project, there are a number of directions that could be taken. The most obvious progression would be to study and implement the larger algorithms such as:

Quadratic Sieve
Multiple Polynomial Quadratic Sieve (MPQS)
Number Field Sieve (NFS)

This would add to the breadth of this study and allow for a top-level view of the integer factorisation problem. Most contemporary research into integer factorisation primarily focuses on the algorithms listed and builds upon them. Other advances could be to investigate how the algorithms discussed in this report would perform when implemented using different computer languages and on different hardware platforms. The project could also have been extended into other exciting research fields such as

Grid and Cluster Computing. Research in this area studies how these computationally expensive algorithms can be performed in a relatively cheap distributed environment (Yan, James & Wu 2006).

8.5 Summary

Integer factorisation has become an important research field within Computer Science. It has acted as a catalyst for research areas such as Quantum Computing, High Performance Processing (Wunderlich 1983) and Parallel Computing (Brent 2000b). This project has provided a valuable insight into a well researched topic within Computer Science. This project serves as a solid and comprehensive foundation for students (and possibly academics) wishing to embark upon research in this exciting area. The current set of factoring algorithms poses no serious threat to modern cryptosystems, e.g. RSA (Odlyzko 1995). But it will be interesting to see how this field continues to grow and challenge the computer security industry.

Appendix A Test Data & Results Tables

This chapter contains all the data that was used to generate the results in Chapter 7. A couple of tools were used to generate this data. First was a website¹ that is able to generate the next ten prime numbers after the number input by myself. The second tool was a small application developed by myself using C and GNU GMP, to multiply together the large factors to produce the composite numbers. The source code and documentation for this application have been excluded from this report, as the application was insignificant and irrelevant to the main objective of the report.

¹ John Moyer's What are the Next Ten Prime Numbers?
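The helper application itself is not reproduced here, but a sketch of the kind of GNU GMP program described above might look as follows; the exact code used to build the tables is not shown, and this version simply multiplies the base-10 factors given on the command line and reports the composite together with its digit count (link with -lgmp).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <gmp.h>

int main(int argc, char *argv[])
{
    mpz_t n, f;
    mpz_init_set_ui(n, 1);
    mpz_init(f);
    for (int i = 1; i < argc; i++) {
        if (mpz_set_str(f, argv[i], 10) != 0) {  /* parse base-10 factor */
            fprintf(stderr, "bad integer: %s\n", argv[i]);
            return 1;
        }
        mpz_mul(n, n, f);                        /* n = n * f */
    }
    char *s = mpz_get_str(NULL, 10, n);          /* exact decimal string */
    printf("%s (%zu digits)\n", s, strlen(s));
    free(s);
    mpz_clear(n);
    mpz_clear(f);
    return 0;
}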

Factor 1            Factor 2            Composite N                           Composite Length
3641                3701                12809161                              8 digits
10939               11783               128894237                             9 digits
58451               58771               3435223721                            10 digits
180623              183823              33202661729                           11 digits
515951              516883              266686300733                          12 digits
1935781             1936189             3748037878609                         13 digits
9973021             9974399             99474890689379                        14 digits
14658001            14660879            214899179042879                       15 digits
87609593            87697201            7683116086849193                      16 digits
100639727           100640149           10128397120599323                     17 digits
590338313           590338913           348502809153085921                    18 digits
2098394041          2098396141          4403261957931795781                   19 digits
6093099203          6093105301          37125895053318175103                  20 digits
20983788373         20983809313         440319813883378517749                 21 digits
84832524173         84832523947        7196557138390478870831                22 digits
234245412659        234245412451        54870913303062151617209               23 digits
352342396093        352342395827        124145163990833424303911              24 digits
2359235236123       2359235235937       5565990898925529810152251             25 digits
9897756149717       9897746251961       97965478833685375120845037            26 digits
9897756149717       10897746251981      107863234983600330243839377           27 digits
80546818263553      85155916957753      6859038167262341961920676409          28 digits
98977561497181      110897746252019     10976388499557984324984058439         29 digits
694926970487413     694926970487377     482923494310788761323853885701        30 digits
1506998576885783    1508505575462707    2273315755446568584255314994581       31 digits
9897561128394917    9907458689523319    98059678006604446847234312569523      32 digits
10293849302938493   10293859596787813   105963439434940945466724710985809     33 digits
33696009303094673   33696009303095179   1135421042954259800567202166881467    34 digits

Table A.1: 2 Nearby Factors. 1% difference between factors

Factor 1      Factor 2      Factor 3      Composite N                       Composite Length
1993          1997          1999          7956061979                        10 digits
1993          1997          3001          11944043021                       11 digits
4789          4793          4799          110154695923                      12 digits
10061         10067         10069         1019829472003                     13 digits
40237         40241         40277         65215596741409                    14 digits
82339         82351         82373         558546517820897                   15 digits
100183        100189        100193        1005660644975291                  16 digits
346349        346361        346373        41551523698367897                 17 digits
875363        875377        875389        670786637300360039                18 digits
1218221       1218247       1218251       1807999095332691337               19 digits
2568361       2568373       2568393       16942401273824052017              20 digits
9568369       9568379       9568439       876026768754707058589             21 digits
12182153      12182179      12182189      1807899810651633746143            22 digits
35731261      35731271      35731279      45618958937958560962949           23 digits
52182167      52182181      52182233      142091139418789481644891          24 digits
167432219     167432257     167432429     4693726093583716603529407         25 digits
367432391     367432193     367432223     49605734447784768510068249        26 digits
8322873159    832873207     832873213     577745607234649319160393469       27 digits
1205939299    1205939309    1205939351    1753785062337747774411690241      28 digits
3684391247    3684391273    3684391303    50014650152798002070898912593     29 digits
5705393911    5705939311    5705939413    185772512294787467989233923177    30 digits

Table A.2: 3 Nearby Primes. 1% difference between factors

Factor 1    Factor 2     Factor 3      Composite N                       Composite Length
179         233          1231          51341317                          8 digits
797         233          1231          228597931                         9 digits
797         1237         1231          1213629359                        10 digits
179         8467         9871          14960418503                       11 digits
263         8581         102233        230719741099                      12 digits
1163        12347        581857        8355211084777                     13 digits
1009        26417        812341        21652748706773                    14 digits
2081        60223        851419        106703288395397                   15 digits
5091        81619        3023749       1256435226791421                  16 digits
20333       81619        71232297      11821532550681719                 17 digits
12983       987533       32487877      416531649825896503                18 digits
20333       4021471      81232979      6642304516916452297               19 digits
20333       7021489      81232979      11597464733720368423              20 digits
762479      1276237      157921783     153674304751986405509             21 digits
203339      90214909     212329811     3895022510844218792461            22 digits
203339      1021417      12123298223   25181193724634786927449           23 digits
2033327     19021531     12123298223   468892715217724656517051          24 digits
2033327     79021541     12123298223   1947930738076695975296261         25 digits
21033329    79021541     12123298223   20149965098176522360339547        26 digits
21033329    1990215443   12123298223   507491643503914026956049581       27 digits
321033347   1990215443   12123298223   7745884680860187546819089783      28 digits
321033347   4990215449   12123298223   19421833719838612721072144069     29 digits
321033347   4990215449   66123298231   105931214396208018625820766493    30 digits

Table A.3: 3 Arbitrary Factors.

Digits     Trial Division  Pollard ρ  Pollard p-1  Fermat's  ECM
8          0.031           0          0            0         0
9          0.109           0          0            0         0
10         0.25            0          0            0         0
11         0.546           0          0            0         0
12         1.109           0          0.078        0         0
13         4.75            0          0.015        0         0.156
14         37.937          0          0.39         0         5.203
15         52.079          0          2.234        3.321     13.468
16         452.671         0          6.156        4.32      115.23
17         476.109         0.031      1.234        9.341     451.203
18         722.022         0.062      140.761      7.32      586.593
19         -               0.046      41.093       11.544    539.281
20         -               0.25       52           15.25     317.797
21         -               0.203      76.452       19.326    -
22         -               1.171      41.734       40.252    -
23         -               1.64       247.238      123.444   -
24         -               0.687      680.015      189.321   -
25         -               5.937      -            245.321   -
26         -               5.671      -            344.13    -
27         -               13.232     -            312       -
28         -               32.766     -            420.531   -
29         -               87.234     -            -         -
30         -               116.64     -            -         -
31         -               161.265    -            -         -
32         -               589.375    -            -         -
33         -               929.32     -            -         -
34         -               1395.453   -            -         -

Table A.4: Results (CPU time in seconds) generated using Table A.1

Digits     Trial Division  Pollard ρ  Pollard p-1  ECM
8          0               0          0            0
9          0               0          0            0
10         0.015           0          0.109        0
11         0.046           0          0.316        0
12         0.91            0.015      0.46         0
13         0.125           0          0.915        0
14         0.875           0          1.578        0
15         2.609           0          2.187        0
16         3.128           0          0            0
17         18.734          0          1.671        0
18         73.297          0          15.765       0.21
19         7.171           0          0.484        1.328
20         15.75           0          2.671        7.281
21         89.75           0          0.109        14.281
22         109.703         0          21.872       26.984
23         351.562         0          3.609        24.25
24         520.015         0          39.203       148.781
25         1470            0.046      45.859       265.718
26         3954.984        0.047      1.89         769
27         -               0.046      45.25        542.172
28         -               0.109      1.187        1465.968
29         -               0.125      -            -
30         -               0.218      -            -

Table A.5: Results (CPU time in seconds) generated using Table A.2

Digits     Trial Division  Pollard ρ  Pollard p-1  ECM
8          0               0          0            0
9          0.015           0          0            0
10         0               0          0            0
11         0.031           0          0            0
12         0.25            0          0            0
13         2.577           0          0            0
14         4.155           0          0            0.215
15         4.781           0          0            0.711
16         7.822           0          0            1.33
17         0.328           0          0            1.94
18         3.99            0          0.312        3.171
19         17.374          0          1.312        4.218
20         31.765          0          0.078        8.828
21         48.531          0          16.218       2.234
22         50              0          1.25         10.11
23         51.89           0          139.531      13.743
24         107.296         0          9.578        17.817
25         419.278         0          101.328      27.781
26         513.731         0          11.156       32.641
27         821.421         0          249.14       53.662
28         1120.226        0          588.21       82.573
29         -               0.015      721.55       101.44
30         -               0.09       1083.46      231.212

Table A.6: Results (CPU time in seconds) generated using Table A.3

Appendix B Code

Coding standards are important for any piece of software that could potentially be used by others. They promote easily readable code, which enables more people to identify extensions and bugs rapidly. This project follows the coding standards set out by the GNU Coding Standards, found at http://www.gnu.org/prep/standards/

Appendix C User Documentation

This chapter briefly looks at how to use the implemented algorithms. These algorithms were compiled using a pre-compiled GNU GMP library for the Windows platform. To download it and for further information please visit (http://cs.nyu.edu/exact/core/gmp/). The source code and executables for this project are available from my homepage (http://people.bath.ac.uk/wb1kak). For further information on the GNU GMP library please visit (http://gmplib.org/). The following instructions assume that the user is using a Windows environment.

C.1 Trial Division

To run, type the following at the command prompt: (C:\>td composite) where composite is the integer to be factored.

C.2 Fermat's Algorithm

To run, type the following at the command prompt: (C:\>fermat composite) where composite is the integer to be factored.

C.3 Pollard ρ Algorithm

To run, type the following at the command prompt: (C:\>rho composite) where composite is the integer to be factored.

C.4 Pollard p − 1 Algorithm

To run, type the following at the command prompt: (C:\>pollard composite b) where composite is the integer to be factored and b is the smoothness bound.

C.5 Elliptic Curve Method Algorithm

To run, type the following at the command prompt: (C:\>ecm composite bound) where composite is the integer to be factored and bound is the smoothness bound B1.

For any queries please e-mail wb1kak@bath.ac.uk

Appendix D Gantt Chart

This chapter contains the Gantt chart used to plan my time for the duration of this project. The Gantt chart was created using the GanttProject 2.0.4 application, which can be found at http://ganttproject.biz/

Bibliography

Atkin, A. & Bernstein, D. (1999), Prime sieves using binary quadratic forms.

Atkin, A. & Morain, F. (1993), Finding suitable curves for the elliptic curve method of factorization, Mathematics of Computation (60), 399–405.

Brent, R. P. (1980), An improved Monte Carlo factorization algorithm, Nordisk Tidskrift for Informationsbehandling (BIT) 20, 176–184.

Brent, R. P. (1999), Some parallel algorithms for integer factorisation, in European Conference on Parallel Processing, pp. 1–22.

Brent, R. P. (2000a), Recent progress and prospects for integer factorisation algorithms, Lecture Notes in Computer Science 1858, 3+.

Brent, R. P. (2000b), Recent progress and prospects for integer factorisation algorithms, Lecture Notes in Computer Science 1858, 3+.

Bressoud, D. M. (1989), Factorization and Primality Testing, Springer-Verlag.

Erdős, P., Pomerance, C. & Schmutz, E. (1991), Carmichael's lambda function, Acta Arithmetica 58, 363–385.

Flannery, S. & Flannery, D. (2000), In Code: A Mathematical Journey, Profile Books Ltd.

Floyd, R. W. (1967), Nondeterministic algorithms, Journal of the ACM 14(4).

Gauss, C. F. (1801), Disquisitiones Arithmeticae (English translation by Arthur A. Clarke), Yale University Press.

Guy, R. (1994), Unsolved Problems in Number Theory, Vol. 2, New York: Springer-Verlag. Chapter 12: Pseudoprimes. Euler Pseudoprimes. Strong Pseudoprimes.

Hankerson, D., Menezes, A. & Vanstone, S. (n.d.), Guide to Elliptic Curve Cryptography, Springer Professional Computing.

Havil, J. (2003), Gamma: Exploring Euler's Constant, Princeton University Press.

Hurd, J. (2003), Verification of the Miller-Rabin probabilistic primality test, Journal of Logic and Algebraic Programming 50(1-2), 3–21.

Knuth, D. (1997), The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edn, Addison-Wesley Professional.

Labs, R. (2005), RSA Factoring Challenge, http://www.rsa.com/rsalabs/node.asp?id=2093.

Lenstra, H. W. (1987), Factoring integers with elliptic curves, Annals of Mathematics 2(126).

Loudon, K. (1999), Mastering Algorithms with C, O'Reilly.

Menezes, A., van Oorschot, P. & Vanstone, S. A. (1996), Handbook of Applied Cryptography, CRC Press Inc.

Montgomery, P. (1994), A survey of modern integer factorization algorithms.

Morrison, M. A. & Brillhart, J. (1975), A method of factoring and the factorization of F7, Mathematics of Computation 29(129), 183–206.

Odlyzko, A. M. (1995), The future of integer factorization, Technical report, AT&T Bell Labs.

Pollard, J. M. (1974), Theorems on factorization and primality testing, Proc. Cambridge Phil. Soc. 76, 521–528.

Pollard, J. M. (1975), A Monte Carlo method for factorization, Nordisk Tidskrift for Informationsbehandling (BIT) 15, 331–334.

Pomerance, C. (1996), A tale of two sieves, Notices of the AMS.

Rev. Samuel Horsley, F. R. S. (1772), The sieve of Eratosthenes, being an account of his method of finding all the prime numbers, Philosophical Transactions (1683-1775) 62, 327–347.

Ribenboim, P. (2004), The New Book of Prime Number Records, 3rd edn, Springer.

Rivest, R. L. (1991), Finding four million large random primes, Lecture Notes in Computer Science 537, 625–637.

Rivest, R. L., Shamir, A. & Adleman, L. M. (1977), A method for obtaining digital signatures and public-key cryptosystems, Technical Report MIT/LCS/TM-82.

Robinson, S. (2003), Still guarding secrets after years of attacks, RSA earns accolades for its founders, SIAM News 36(5).

Uhl, A. (n.d.), Parallel computing in cryptoanalysis: Experiences in a graduate students project - workpackage WP5.1.

Wunderlich, M. (1983), Recent advances in the design and implementation of large integer factorization algorithms, Proceedings of the 1983 IEEE Symposium on Security and Privacy, p. 67.

Yan, S. (2003), Primality Testing and Integer Factorization in Public-Key Cryptography, Kluwer Academic Publishers.

Yan, S. Y., James, G. & Wu, G. (2006), Cryptographic and computational challenges in grid computing, in FCS, pp. 200–204.

Zimmermann, P. (n.d.), ECM top 100 table, http://www.rsa.com/rsalabs/node.asp?id=2093.