FPGA and ASIC Implementation of Rho and P-1 Methods of Factoring. Master's Thesis Presentation. Ramakrishna Bachimanchi. Director: Dr.




FPGA and ASIC Implementation of Rho and P-1 Methods of Factoring
Master's Thesis Presentation
Ramakrishna Bachimanchi
Director: Dr. Kris Gaj

Contents
- Introduction
- Background
- Hardware Architecture
- FPGA and ASIC Design Flow
- Results
- Conclusions

RSA
In 1977, Ron Rivest, Adi Shamir, and Leonard Adleman developed one of the first public-key cryptosystems, now known as RSA.

RSA
Public key: {e, N}. Private key: {d, P, Q}.
Alice encrypts with {e, N} and sends the ciphertext over the network; Bob decrypts with {d, P, Q}.
$N = P \cdot Q$, where P and Q are large prime factors
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
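The key relation on this slide can be tried out on toy numbers. The sketch below is only an illustration: the primes 61 and 53, the exponent 17, and the function names are assumptions chosen for readability, not values from the presentation, and keys this small offer no security.

```python
from math import gcd

def make_keys(P, Q, e):
    """Return ((e, N), (d, P, Q)) for primes P, Q and public exponent e."""
    N = P * Q
    phi = (P - 1) * (Q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (P-1)(Q-1)")
    d = pow(e, -1, phi)          # modular inverse, needs Python 3.8+
    return (e, N), (d, P, Q)

def encrypt(message, public_key):
    e, N = public_key
    return pow(message, e, N)

def decrypt(ciphertext, private_key):
    d, P, Q = private_key
    return pow(ciphertext, d, P * Q)

# Toy parameters (assumed for illustration only).
public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
```

Anyone who can factor N into P and Q can recompute d the same way `make_keys` does, which is why the rest of the presentation focuses on factoring.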

Common Applications of RSA
- Secure WWW, SSL: browser to web server over the network
- S/MIME, PGP: Alice to Bob

Recommended key sizes for RSA
Size of the RSA key = size of $N = P \cdot Q$
- Old standard, individual users: 512 bits (155 decimal digits)
- New standard, short-term use (up to 2010): 1024 bits
- New standard, long-term use: 2048 bits

Factoring RSA
RSA-200 (663 bits) was factored by Bahr, Boehm, Franke and Kleinjung.
When? December 2003 to May 2005.
Effort?
- First stage: about 1 year on various machines, equivalent to 55 years on an Opteron 2.2 GHz CPU
- Second stage: 3 months on a cluster of 80 2.2 GHz Opterons connected via a gigabit network

Number Field Sieve
Best algorithm to factor large numbers. Complexity: sub-exponential time and memory.
N = number to factor, k = number of bits of N
- Exponential function: $e^{k}$
- Sub-exponential function: $e^{k^{1/3} (\ln k)^{2/3}}$
- Polynomial function: $a \cdot k^{m}$
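To get a feel for how far apart these growth rates are, the following sketch evaluates the exponential and sub-exponential expressions exactly as written on the slide, with all constant factors omitted. The function names are made up for this example, and the results are orders of magnitude for comparison only, not cost estimates for a real NFS run.

```python
from math import log

def log2_exponential(k):
    """log2 of e^k."""
    return k / log(2)

def log2_subexponential(k):
    """log2 of e^(k^(1/3) * (ln k)^(2/3)), constants omitted as on the slide."""
    return k ** (1 / 3) * log(k) ** (2 / 3) / log(2)

for bits in (512, 1024, 2048):
    print(bits, round(log2_exponential(bits)), round(log2_subexponential(bits)))
```

Doubling the key size doubles the exponent of the exponential function but increases the sub-exponential exponent far more slowly, which is the sense in which NFS beats exhaustive approaches.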

Steps of Number Field Sieve (NFS)
- Polynomial Selection
- Relation Collection
  - Sieving
  - Mini-factoring of 200-bit and 350-bit numbers: Pollard rho, p-1 method, ECM
- Linear Algebra
- Square Root

Rho Algorithm

Pollard's Rho Method
Birthday paradox: if 23 or more random people are in a room, there is a more than 50% probability that the birthdays of two of them fall on the same day of the year.
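The 50% threshold can be checked directly. This is a minimal sketch assuming 365 equally likely birthdays; the function name is chosen here for illustration.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

# Smallest group size whose collision probability exceeds 50%.
smallest_group = next(
    n for n in range(1, 366) if shared_birthday_probability(n) > 0.5
)
```

The same effect drives the rho method: among values taken modulo an unknown prime factor q, a repeat is expected after a number of steps on the order of the square root of q, far fewer than q itself.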

Pollard's rho method - Example
$N = 97 \cdot 1889 = 183\,233$, $x_{i+1} = x_i^2 + 1 \bmod N$

i            0  1  2   3    4      5      6       7      8     9
x_i          2  5  26  677  91864  15449  102236  39678  5749  69062
x_i mod 97   2  5  26  95   5      26     95      5      26    95

Cycle modulo q = 97: $x_1 \equiv x_4 \equiv x_7 \equiv 5$, $x_2 \equiv x_5 \equiv x_8 \equiv 26$, $x_3 \equiv x_6 \equiv x_9 \equiv 95 \pmod q$; $x_0 = 2$ is the tail.
$x_1 \equiv x_4 \pmod q \Rightarrow q \mid (x_1 - x_4)$, and $q \mid N$, so $q \mid \gcd(x_1 - x_4, N)$
$q = \gcd(-91\,859,\ 183\,233) = 97$
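The numbers in this example can be reproduced in a few lines. The sketch below uses the slide's N, starting value, and iteration function; the variable names are chosen here for illustration.

```python
from math import gcd

N = 97 * 1889                      # 183233, the composite from the example

def f(x):
    return (x * x + 1) % N

# Generate x_0 .. x_9 starting from x_0 = 2.
xs = [2]
for _ in range(9):
    xs.append(f(xs[-1]))

residues = [x % 97 for x in xs]    # the hidden cycle modulo the factor 97
q = gcd(xs[1] - xs[4], N)          # x_1 and x_4 collide modulo 97
```

In a real run the factor 97 is unknown, so the residues cannot be inspected; the collision is only detected through the gcd.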

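Pollard's Rho Method
[Figure: the rho-shaped path of the sequence modulo q. The tail runs through $x_0, x_1, x_2, \ldots$ up to $x_s \bmod q$; the cycle runs through $x_{s+1}, x_{s+2}, \ldots, x_{e-1} \bmod q$ and closes at $x_e \equiv x_s \pmod q$. Period = e - s.]
$x_s \equiv x_e \pmod q$
$x_{s+1} \equiv x_{e+1} \pmod q$
...
$x_{s+k} \equiv x_{e+k} \pmod q$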

Rho Algorithm - Floyd's Version
1. Choose the polynomial as $f(x) = x^2 + a$ and initialize $b \leftarrow c \leftarrow x_0$
2. Calculate $b \leftarrow f(b) \bmod n$ and $c \leftarrow f(f(c)) \bmod n$
3. Compute $d \leftarrow \gcd(b - c, n)$
4. If $1 < d < n$, a non-trivial factor of n is found
5. If $d = 1$, go to step 2; if $d = n$, change a and go to step 1
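The steps above can be sketched as follows. This is a plain software illustration, not the hardware design from the thesis; the function name, the default values of `x0` and `a`, and the `max_steps` cap are assumptions added for the example.

```python
from math import gcd

def rho_floyd(n, x0=2, a=1, max_steps=100_000):
    """Return a non-trivial factor of n, or None if this (x0, a) fails."""
    b = c = x0
    for _ in range(max_steps):
        b = (b * b + a) % n        # b <- f(b)
        c = (c * c + a) % n        # c <- f(f(c)), first application
        c = (c * c + a) % n        # second application
        d = gcd(b - c, n)
        if 1 < d < n:
            return d               # non-trivial factor found
        if d == n:
            return None            # caller should change a and restart
    return None

factor = rho_floyd(183233)
```

With the example number 183 233 the third iteration compares $x_3$ with $x_6$, which agree modulo 97, so the factor appears after three gcd computations.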

Rho Method - Floyd's Version
[Figure: triangular table of candidate differences. Row k lists $x_k - x_{k+1},\ x_k - x_{k+2},\ x_k - x_{k+3},\ \ldots,\ x_k - x_{2k},\ \ldots,\ x_k - x_i$ for k = 1, 2, 3, ...; Floyd's version evaluates only the difference $x_k - x_{2k}$ in each row.]

Pollard's Rho Algorithm - Floyd's Version
$f(x) = x^2 + a$ with $a \notin \{-2, 0\}$
Number of iterations: t < 100 q_max (q_max is the maximum factor we expect to find using the rho method)
We choose a random $x_0$ in the range (0, n-1) and $x_1 = f(x_0)$

V2 (advanced by f(f()))   V1 (advanced by f())   d
$x_0$                                            d = 1
$x_2$                     $x_1$                  $d = d \cdot (x_2 - x_1)$
$x_4$                     $x_2$                  $d = d \cdot (x_4 - x_2)$
$x_6$                     $x_3$                  $d = d \cdot (x_6 - x_3)$
...
$x_t$                     $x_{t/2}$              $d = d \cdot (x_t - x_{t/2})$
$x_{t+2}$                 $x_{(t+2)/2}$          $d = d \cdot (x_{t+2} - x_{(t+2)/2})$
...
$x_{2i}$                  $x_i$                  $d = d \cdot (x_{2i} - x_i)$
$x_{2(i+1)}$              $x_{i+1}$              $d = d \cdot (x_{2i+2} - x_{i+1})$
...
$x_{2t}$                  $x_t$                  $d = d \cdot (x_{2t} - x_t)$

where $x_{2i+2} = f(f(x_{2i}))$ and $x_{i+1} = f(x_i)$
$q = \gcd(d, n)$
Minimization for area and/or memory

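Rho Algorithm - Floyd's Version Contd.
Inputs: $x_0$, $a$, $f(x) = x^2 + a$, $N$, $t$ (even, $\ge 2$)
Outputs: q (such that $q \mid N$)

$v_1 \leftarrow x_1 = f(x_0)$, $v_2 \leftarrow x_2 = f(x_1)$, $temp \leftarrow v_2 - v_1 = x_2 - x_1$, $d \leftarrow 1$
for (i = 2; i <= t; i++) {
    $v_2 \leftarrow v_2^2$
    $v_2 \leftarrow v_2 + a$
    $v_2 \leftarrow v_2^2$
    $v_2 \leftarrow v_2 + a$
    $v_1 \leftarrow v_1^2$
    $v_1 \leftarrow v_1 + a$
    $temp \leftarrow v_2 - v_1$
    $d \leftarrow d \cdot temp$
}
$q \leftarrow \gcd(d, N)$
* All operations are done modulo N.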
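A software sketch of this batched variant, in which the differences are multiplied together and a single gcd is taken at the end, might look as follows. The function name is an assumption, and the first difference $x_2 - x_1$ is folded into d before the loop, following the table of operations rather than a literal reading of the initialization line.

```python
from math import gcd

def rho_floyd_batched(n, x0, a, t):
    """Floyd's rho with differences accumulated in d and one final gcd."""
    v1 = (x0 * x0 + a) % n             # v1 = x_1
    v2 = (v1 * v1 + a) % n             # v2 = x_2
    d = (v2 - v1) % n                  # first difference, x_2 - x_1
    for _ in range(2, t + 1):
        v2 = (v2 * v2 + a) % n         # v2 advances two steps: x_{2i}
        v2 = (v2 * v2 + a) % n
        v1 = (v1 * v1 + a) % n         # v1 advances one step: x_i
        d = (d * (v2 - v1)) % n        # accumulate the difference
    return gcd(d, n)

q = rho_floyd_batched(183233, x0=2, a=1, t=3)
```

Deferring the gcd replaces one gcd per iteration with one modular multiplication, which suits a datapath built around a multiplier and an adder/subtractor. The price is that the result can be n itself if every prime factor is hit within the same batch.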

Rho Method - Brent's Version
[Figure: the same triangular table of candidate differences $x_k - x_i$ as for Floyd's version. Brent's version evaluates, for each k, only the differences $x_{2^k} - x_{2^k + 2^{k-1} + 1},\ \ldots,\ x_{2^k} - x_{2^{k+1}}$.]

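Rho Method - Brent's Version
Sequence of operations ($v_2$ advances by one application of f per step; d starts at 1):
- $v_1 = x_2$: $v_2 = x_3$ (no product); $v_2 = x_4$: $d = d \cdot (x_4 - x_2)$
- $v_1 = x_4$: $v_2 = x_5, x_6$ (no product); $x_7$: $d \cdot (x_7 - x_4)$; $x_8$: $d \cdot (x_8 - x_4)$
- $v_1 = x_8$: $v_2 = x_9, \ldots, x_{12}$ (no product); $x_{13}$: $d \cdot (x_{13} - x_8)$; $x_{14}$: $d \cdot (x_{14} - x_8)$; $x_{15}$: $d \cdot (x_{15} - x_8)$; $x_{16}$: $d \cdot (x_{16} - x_8)$
- $v_1 = x_{16}$: ...
Minimization for execution time (slide annotation: 24%)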

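Rho Algorithm - Brent's Version
Inputs: $x_0$, $a$, $f(x) = x^2 + a$, $N$, $t$ (even, $\ge 2$)
Outputs: q (such that $q \mid N$)

$x_1 \leftarrow f(x_0)$, $v_2 \leftarrow v_1 \leftarrow x_2 = f(x_1)$, $k \leftarrow 1$, $d \leftarrow 1$
for (i = 3; i <= 2t; i++) {
    $v_2 \leftarrow f(v_2)$
    if ($2^k + 2^{k-1} + 1 \le i \le 2^{k+1}$) {
        $temp \leftarrow v_2 - v_1$
        $d \leftarrow d \cdot temp$
    }
    if ($i = 2^{k+1}$) {
        $v_1 \leftarrow v_2$
        $k \leftarrow k + 1$
    }
}
$q \leftarrow \gcd(d, N)$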
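The Brent variant can be sketched in the same style. The function name is an assumption, d is initialized to 1 as in the sequence-of-operations slide, and the loop bound of 2t is one reading of the slide's loop header.

```python
from math import gcd

def rho_brent_batched(n, x0, a, t):
    """Brent's rho with differences accumulated in d and one final gcd."""
    x1 = (x0 * x0 + a) % n
    v1 = v2 = (x1 * x1 + a) % n            # v1 = v2 = x_2
    k = 1
    d = 1
    for i in range(3, 2 * t + 1):          # assumed bound: i runs up to 2t
        v2 = (v2 * v2 + a) % n             # v2 = x_i, one step per iteration
        if 2**k + 2**(k - 1) + 1 <= i <= 2**(k + 1):
            d = (d * (v2 - v1)) % n        # compare against the saved x_{2^k}
        if i == 2**(k + 1):
            v1 = v2                        # save x_{2^(k+1)} as the new reference
            k += 1
    return gcd(d, n)

q = rho_brent_batched(183233, x0=2, a=1, t=4)
```

Each iteration applies f once instead of three times as in Floyd's version, which is where the saving in execution time comes from. In the example, the difference $x_7 - x_4$ is the one divisible by 97.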

p-1 Algorithm

p-1 Algorithm
Based on Fermat's Little Theorem:
$a^{p-1} \equiv 1 \pmod p$
$a^{m(p-1)} \equiv 1 \pmod p$
$a^{m(p-1)} - 1 \equiv 0 \pmod p$
N: number to be factored; a: any small integer; p: non-trivial factor of N
- Choose a small number a, such that 1 < a < N
- Choose a special number k
- Compute $(a^k \bmod N) - 1$
- Compute $\gcd((a^k \bmod N) - 1,\ N)$

p-1 algorithm
Inputs:
- N: number to be factored
- a: arbitrary integer such that gcd(a, N) = 1
- $B_1$: smoothness bound for Phase 1
- $B_2$: smoothness bound for Phase 2
Outputs: q, a factor of N with $1 < q \le N$, or FAIL

p-1 algorithm, Phase 1
1: $k \leftarrow \prod_{p_i \le B_1} p_i^{e_i}$, where the $p_i$ are consecutive primes and $e_i$ is the largest exponent such that $p_i^{e_i} \le B_1$ (precomputations)
2: $q_0 \leftarrow a^k \bmod N$ (main computations)
3: $q \leftarrow \gcd(q_0 - 1, N)$ (postcomputations)
4: if $q > 1$
5:     return q (factor of N)
6: else
7:     go to Phase 2
8: end if

p-1 algorithm, Phase 2
09: $d \leftarrow 1$
10: for each prime $p = B_1$ to $B_2$ do (main computations)
11:     $d \leftarrow d \cdot (q_0^p - 1) \pmod N$
12: end for
13: $q \leftarrow \gcd(d, N)$ (postcomputations)
14: if $q > 1$ then
15:     return q
16: else
17:     return FAIL
18: end if
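Both phases can be sketched in software as below. This is a straightforward illustration under stated assumptions: the function names and the sieve helper are chosen for this example, Phase 2 is read as running over primes p with $B_1 < p \le B_2$, and the memory-saving tables used by the hardware (GCD_table, prime_table) are not modeled.

```python
from math import gcd

def primes_up_to(limit):
    """Sieve of Eratosthenes; returns all primes <= limit (limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def p_minus_1(N, a, B1, B2):
    """Return a factor q > 1 of N, or None on failure (FAIL on the slides)."""
    primes = primes_up_to(B2)
    # Phase 1: k is the product of the largest prime powers not exceeding B1.
    k = 1
    for p in primes:
        if p > B1:
            break
        power = p
        while power * p <= B1:
            power *= p
        k *= power
    q0 = pow(a, k, N)
    q = gcd(q0 - 1, N)
    if q > 1:
        return q
    # Phase 2: allow one extra prime factor of p-1 between B1 and B2.
    d = 1
    for p in primes:
        if B1 < p <= B2:
            d = d * (pow(q0, p, N) - 1) % N
    q = gcd(d, N)
    return q if q > 1 else None

found_in_phase_1 = p_minus_1(1740719, 2, 20, 100)
found_in_phase_2 = p_minus_1(1740719, 2, 10, 20)
```

With $B_1 = 20$ the factor 1361 of 1 740 719 is found in Phase 1. Lowering $B_1$ to 10 makes Phase 1 fail, because $1360 = 2^4 \cdot 5 \cdot 17$ is no longer covered by k, and the factor is then picked up in Phase 2 through the prime 17.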

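p-1 Phase 1, numerical example
$N = 1\,740\,719 = 1279 \cdot 1361$, $a = 2$, $B_1 = 20$
$k = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19 = 232\,792\,560$
$q_0 = a^k \bmod N = 2^{232\,792\,560} \bmod 1\,740\,719 = 1\,003\,058$
$q = \gcd(1\,003\,058 - 1,\ 1\,740\,719) = 1361$
Why did the method work?
$q - 1 = 1360 = 2^4 \cdot 5 \cdot 17$ divides k, so $a^k \bmod q = a^{(q-1) m} \bmod q = 1$, and therefore $q \mid a^k - 1$.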

Modular Exponentiation - Sliding Window Method
Input: $g$, $e = (e_t e_{t-1} \ldots e_1 e_0)_2$ with $e_t = 1$, and an integer $w \ge 1$
Output: $g^e$
1. Precomputation:
   $g_1 \leftarrow g$, $g_2 \leftarrow g^2$
   For i from 1 to $(2^{w-1} - 1)$ do: $g_{2i+1} \leftarrow g_{2i-1} \cdot g_2$
2. $A \leftarrow 1$, $i \leftarrow t$
3. While $i \ge 0$ do the following:
   If $e_i = 0$ then do: $A \leftarrow A^2$, $i \leftarrow i - 1$
   Otherwise ($e_i \ne 0$), find the longest bitstring $e_i e_{i-1} \ldots e_l$ such that $i - l + 1 \le w$ and $e_l = 1$, and do the following:
   $A \leftarrow A^{2^{i-l+1}} \cdot g_{(e_i e_{i-1} \ldots e_l)_2}$, $i \leftarrow l - 1$
4. Return(A)

Sliding Window Method - Example
Calculating $g^{50}$, $e = (110010)_2$, window size 2
Precomputations: $g^3$
Main computations, $A \leftarrow 1$:
- [11]0010: window size = 2 and the value = $(11)_2$ = 3; $A \leftarrow A^4 \cdot g^3 = g^3$
- 11[0]010: $A \leftarrow A^2 = g^6$
- 110[0]10: $A \leftarrow A^2 = g^{12}$
- 1100[1]0: window size = 1 and the value = $(1)_2$ = 1; $A \leftarrow A^2 \cdot g^1 = g^{25}$
- 11001[0]: $A \leftarrow A^2 = g^{50}$
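A compact software version of the sliding window method is sketched below. It works on the binary string of the exponent rather than on bit indices, and the function name, the modulus argument `n`, and the test values are assumptions made for this illustration.

```python
def sliding_window_pow(g, e, n, w):
    """Compute g**e mod n with the left-to-right sliding window method."""
    if e == 0:
        return 1 % n
    # Precompute the odd powers g^1, g^3, ..., g^(2^w - 1).
    g_squared = g * g % n
    odd_powers = {1: g % n}
    for j in range(3, 2**w, 2):
        odd_powers[j] = odd_powers[j - 2] * g_squared % n
    bits = bin(e)[2:]                      # most significant bit first
    A = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            A = A * A % n                  # zero bit: square only
            i += 1
        else:
            # Longest window of at most w bits that starts here and ends in 1.
            j = min(i + w, len(bits))
            while bits[j - 1] == "0":
                j -= 1
            for _ in range(j - i):
                A = A * A % n              # one squaring per window bit
            A = A * odd_powers[int(bits[i:j], 2)] % n
            i = j
    return A

result = sliding_window_pow(3, 50, 1_000_003, 2)
```

For e = 50 and w = 2 the loop processes the windows 11, 0, 0, 1, 0, matching the worked example: five squarings after the first window and two multiplications by precomputed powers.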

Hardware Architecture

Top-level View
[Block diagram: a host computer connected through I/O to the FPGA / ASIC, which contains a control unit, global memory (RAM), and multiple rho, p-1, or unified units.]

Low Level Arithmetic Units

Montgomery Multiplication
[Block diagram of the multiplier: 32-bit inputs A_M and B, operand registers for A (shift register), B, and M, carry-save registers S1 and S2, 4-to-2 carry-save reduction (CSR42) built from CSAs, and a word-wide output C with a done_mul flag.]
Based on McIvor, McLoone, et al., Asilomar 2003: full-length CSAs, word-length CPAs.

Addition / Subtraction
[Block diagram of the adder/subtractor: 32-bit inputs A_M and B, 32-bit registers A and B, a 32 x 32 LUT memory, a 32-bit adder with carry in and carry out, an add_sub control input, and outputs C, sign, and Z.]
Original design

Global Memory - Rho
[Memory map, 32-bit words (bits 31..0):]
- n for unit 1
- n for unit 2
- ...
- n for unit m
- Same for all units: $x_0$, a, t (number of iterations)

Local Memory - Rho
[Block diagram: local memory with 32-bit data_in and data_out, 6-bit address inputs Aaddr and Baddr, and write enable WEA; stored operands include M, temp, V1, V2, a, and d.]

Computation Flow
For i = 1 to 2t-1:
- MUL: $v_2 \leftarrow v_2^2$; under cond1: $d \leftarrow d \cdot temp$
- ADD/SUB: $v_2 \leftarrow v_2 + a$; under cond1: $temp \leftarrow (v_2 - v_1)$
cond1: $2^k + 2^{k-1} + 1 \le i - 1 \le 2^{k+1}$

Control Unit - Rho
Memory Initialization, then Main Computations, then Reading Out Results

Global Memory p-1
[Memory maps, 32-bit words (bits 31..0), addresses 0 to 511:]
Phase 1:
- N for unit 1, N for unit 2, ..., N for unit m
- Initial values for all units: $g_1$, $g_2$, and the words of k
Phase 2:
- GCD_table[1] ... GCD_table[GMAXD]: determines j such that $1 \le j \le D$ and gcd(j, D) = 1
- $M_{min}$, $M_{max}$
- prime_table[1], prime_table[2], ...: determines m, j such that $p = m \cdot D - j$ is a prime

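Local Memory p-1
[Memory maps, 32-bit words (bits 31..0), addresses 0 to 511:]
a) Phase 1: N, $g_1$, $g_2$, $g_3$, ..., $g_s$, where $s = 2^k - 1$
b) Phase 2: N, $d$, $d^2$, $d^{11}$, $d^{13}$, ..., $d^{209}$, $d^{D}$, $d^{m \cdot D}$, and $d^{mD} - d^{j}$, where $d = g^e$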

Control Unit
- Phase 1: Memory Initialization, Pre-Computations, Modular Exponentiation, Reading Out Results
- Phase 2: Memory Initialization, Main-Computations, Reading Out Results

Unified Architecture
[Block diagram: MUL and ADD/SUB units, local memory for Rho, local memory for p-1, control unit, and global memory.]

Control Unit
Memory Initialization, then Rho Computations and P-1 Computations, then Reading Out Results

Control Unit
- Total: 17 state machines with 140 states
  - 5 state machines with 45 states in Rho
  - 12 state machines with 103 states in P-1
- 5 shift registers
- 9 registers
- 13 counters
- 22 comparators
Original design

Design Flow

FPGA vs ASIC
FPGA (Field Programmable Gate Array)
- Array of logic blocks
- Switchable interconnect resources
- Final user can set switches
- Immediate use (zero fab time)
- Not good for high-volume applications
ASIC (Application Specific Integrated Circuit)
- Standard cells and macros
- Requires full manufacturing sequence
- Good for high-volume applications

FPGA Design Flow
Design entry:
- Specification
- RTL Description (VHDL / Verilog HDL)
- Synthesis
- Implementation
- Configuration
Design verification:
- Functional Simulation
- Post-Synthesis Simulation
- Timing Simulation
- On Chip Testing

ASIC Design Flow
Front-End Design:
- Synthesis (Design Analyzer)
- Timing Analysis (PrimeTime)
Back-End Design (Astro):
- Floorplanning
- Placement
- Clock Tree Synthesis
- Routing
- Design for Manufacturing

Results

Families of Xilinx FPGA Devices
Low-cost:
- Spartan 3 (< $130*)
- Spartan 3E (< $35*)
High-performance:
- Virtex II (< $2,700*)
- Virtex 4 (< $3,000*)
*Approximate cost of the largest device per unit for a batch of 10,000 units

FPGA Implementation of Single Units

Results                 Rho          P-1          Unified
Resources
- CLB slices            1,680 (4%)   1,749 (5%)   2,042 (6%)
- LUTs                  2,714 (4%)   2,875 (4%)   3,451 (5%)
- FFs                   1,518 (2%)   1,645 (2%)   1,740 (2%)
- BRAMs                 0/144        2/144        2/144
Max. clock frequency    130 MHz      131 MHz      115 MHz

Target device is Virtex II XC2V6000-6

Number of unified units per FPGA
[Bar chart:]
- Spartan 3 XC3S5000-5 (low-cost): 19
- Virtex II XC2V6000-6 (high-performance): 21
- Spartan 3E XC3S1600-5 (low-cost): 8
- Virtex 4 XC4VLX200-11 (high-performance): 42

Performance: Unified Operations per Second
[Bar chart:]
- Spartan 3 XC3S5000-5 (low-cost): 581
- Virtex II XC2V6000-6 (high-performance): 819 (x 1.41 relative to Spartan 3)
- Spartan 3E XC3S1600-5 (low-cost): 290
- Virtex 4 XC4VLX200-11 (high-performance): 2,262 (x 7.8 relative to Spartan 3E)

Performance to cost ratio: Unified Operations per second per $100
[Bar chart:]
- Spartan 3 XC3S5000-5 (low-cost): 447 (x 14.9 relative to Virtex II)
- Virtex II XC2V6000-6 (high-performance): 30
- Spartan 3E XC3S1600-5 (low-cost): 828 (x 11 relative to Virtex 4)
- Virtex 4 XC4VLX200-11 (high-performance): 75
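The performance-to-cost figures follow from the throughput and price slides by simple division. The sketch below recomputes them; the assignment of chart values to devices is inferred from the ratios quoted on the slides and in the conclusions, and the prices are the approximate upper bounds from the device-family slide, so small rounding differences are expected.

```python
# Approximate unit prices (USD) and unified operations per second.
price_usd = {
    "Spartan 3": 130, "Virtex II": 2700, "Spartan 3E": 35, "Virtex 4": 3000,
}
ops_per_second = {
    "Spartan 3": 581, "Virtex II": 819, "Spartan 3E": 290, "Virtex 4": 2262,
}

ops_per_100_usd = {
    device: 100 * ops_per_second[device] / price_usd[device]
    for device in price_usd
}

for device, value in ops_per_100_usd.items():
    print(f"{device}: {value:.1f} operations per second per $100")
```

The low-cost devices win on this metric even though each of them delivers fewer operations per second than its high-performance counterpart.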

ASIC - Layout of p-1 - floorplanning

Layout of p-1 - placement

Layout of p-1 - clock tree synthesis

Layout of p-1 - global routing

Layout of p-1 - detailed routing

Results - ASIC Implementation

Operation                                                  rho        p-1        Unified architecture
Area                                                       1.15 mm²   1.21 mm²   1.8 mm²
Max. clock frequency                                       200 MHz    200 MHz    200 MHz
Time for execution                                         3.52 ms    9.56 ms    13.1 ms
# of operations per second (using maximum no. of units)    96,022     34,100     16,615
Core utilization ratio                                     70%        70%        65%

Area of Virtex II FPGA is 19.68 x 19.8 mm² (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)

FPGA vs ASIC - Area
[Bar chart, number of units in the area of a Virtex II:]

Architecture   ASIC   FPGA   Ratio
Rho            338    20     x 17
P-1            322    23     x 14
Unified        216    21     x 10

Area of Virtex II FPGA is 19.68 x 19.8 mm² (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
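The ASIC unit counts appear to be consistent with dividing the estimated Virtex II die area by the area of one ASIC unit. The following sketch checks that reading; it is an inference from the numbers on the results slides, not a statement of how the thesis derived them.

```python
fpga_area_mm2 = 19.68 * 19.8           # estimated Virtex II die area
unit_area_mm2 = {"rho": 1.15, "p-1": 1.21, "unified": 1.8}
fpga_units = {"rho": 20, "p-1": 23, "unified": 21}

# Whole ASIC units that fit into the same silicon area.
asic_units = {
    name: int(fpga_area_mm2 // area) for name, area in unit_area_mm2.items()
}
# How many times more units the ASIC holds than the FPGA.
gain = {name: asic_units[name] / fpga_units[name] for name in asic_units}
```

Rounded, the gains come out as 17, 14, and 10 for rho, p-1, and the unified architecture.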

Rho in an ASIC (130 nm)
[Layout figure with the global memory and local memory regions labeled.]

ASIC 130 nm vs. Virtex II 6000, rho (20 units)
- Area of Virtex II 6000: 19.68 mm x 19.80 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
- Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm
- Ratio: 51x
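The 51x figure is the ratio of the two die areas, as a quick calculation confirms (variable names are chosen here for illustration).

```python
fpga_area_mm2 = 19.68 * 19.80    # Virtex II 6000 die estimate
asic_area_mm2 = 2.7 * 2.82       # ASIC holding the same 20 rho units
area_ratio = fpga_area_mm2 / asic_area_mm2
print(f"FPGA area / ASIC area = {area_ratio:.1f}")
```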

ASICs vs. FPGAs
Source: I. Kuon, J. Rose, University of Toronto, "Measuring the Gap Between FPGAs and ASICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007.

Contributions
- Verified the VHDL code through functional and timing simulation, by comparison with a test software implementation written in C.
- Ported the VHDL code to 4 different families of FPGA devices and to a standard-cell ASIC based on a 130 nm TSMC library.

Conclusions
- Low-cost FPGA devices, such as Spartan 3, outperformed high-performance devices, such as Virtex II, in terms of performance to cost ratio by a factor of 14.9.
- The ASIC implementation outperforms the FPGA by a factor of 50* in terms of area and by a factor of 1.5 in terms of frequency.
*For rho the factor is 50; for other architectures it may be less.

Conclusions
- Low-cost FPGA devices Spartan 3 and Spartan 3E are suitable for code-breaking.
- An ASIC implementation is suitable when a large number of chips (>1,000,000) is considered.

Future Work
- Implementation of Trial Division in hardware
- Implementation of ECM in hardware using one multiplier and one adder/subtractor
- Integrating Trial Division, Rho, P-1 and ECM to build a co-factoring machine
- Experiments on COPACOBANA

Thank you! Questions???