Medical Algorithms of an Elliptical Portfolio




Computers and Electrical Engineering 35 (2009) 54–58
Journal homepage: www.elsevier.com/locate/compeleceng

An area/performance trade-off analysis of a GF(2^m) multiplier architecture for elliptic curve cryptography

Miguel Morales-Sandoval, Claudia Feregrino-Uribe, René Cumplido *, Ignacio Algredo-Badillo

Computer Science Department, National Institute for Astrophysics, Optics and Electronics, Luis Enrique Erro No. 1, Tonantzintla, Pue. 72840, Mexico

* Corresponding author. Tel.: +52 222 2663100; fax: +52 222 2663152. E-mail address: rcumplido@inaoep.mx (R. Cumplido).

Article history: Received 24 January 2007. Received in revised form 26 November 2007. Accepted 27 May 2008. Available online 31 August 2008.
doi:10.1016/j.compeleceng.2008.05.008

Abstract

A hardware architecture for GF(2^m) multiplication and its evaluation in a hardware architecture for elliptic curve scalar multiplication are presented. The architecture is a parameterizable digit-serial implementation for any field order m. Area/performance trade-off results of the hardware implementation of the multiplier in an FPGA are presented and discussed.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Finite fields like the binary GF(2^m) and the prime GF(p) have been used successfully in error correction codes and cryptographic algorithms. In elliptic curve cryptography (ECC), the overall performance of cryptographic schemes is largely determined by arithmetic in GF(2^m), with inversion and multiplication being the most time-consuming operations. According to the literature, arithmetic in GF(2^m) binary fields using polynomial basis leads to efficient hardware implementations of ECC.

Some works related to hardware implementation of ECC have reported parameterizable GF(2^m) arithmetic units to compute the most time-consuming operation in elliptic curve cryptography, the scalar multiplication. Those architectures are based on a diversity of multiplication algorithms, for example: Massey-Omura multipliers [1], linear feedback shift register multipliers [2], Karatsuba [3,4], and digit-serial multipliers [5]. Other works have studied and implemented GF(2^m) multipliers using polynomial basis, like [8,9]. Others have used different algorithms, like the Montgomery multiplication [10,11].

From the architectural point of view, it is well known that the arithmetic unit has a big impact on the timing and area of hardware for scalar multiplication. However, it is not clear whether the architecture performance is due to the parallelism in the multipliers, the number of multipliers, or the kind of multipliers used. This technical communication presents the hardware architecture of a GF(2^m) digit-serial multiplier and evaluates the area/performance trade-off, considering various digit sizes and finite field orders.

2. GF(2^m) multiplication architecture

Multiplication in GF(2^m) in polynomial basis is the operation $A(x)B(x) \bmod F(x)$, which can be computed using a variety of algorithms proposed in the literature. On the one hand, serial or bit-serial algorithms consider each individual bit of the operand $B(x)$, which implies a multiplication latency of $m$ clock cycles. On the other hand, digit-serial multipliers consider a group of $d$ bits of operand $B(x)$ at a time and perform the multiplication in $m/d$ cycles. However, it is not clear which is the best size of $d$ for this kind of multiplier to achieve a performance that meets the constraints of a specific application. Varying the size of the digit makes it possible to explore the cost in area and the performance improvements from a serial implementation up to a parallel multiplication architecture.

At each iteration, the operand $A(x)$ is multiplied by a group of bits of operand $B(x)$ and the result is reduced modulo $F(x)$. The result is accumulated with the result of the next iteration, which considers the following bits of $B(x)$, until all bits of $B(x)$ are processed. The reduction in the operation latency comes with an increase in the complexity of each step of the multiplication. For our implementation, we consider the digit-serial Algorithm 1 [6], the same algorithm used for the work reported in [5], and show the different area/time results when the digit size is varied. This will help designers select suitable parameters when implementing architectures for high-level applications like cryptographic algorithms or error correction code algorithms.

Algorithm 1. Digit-serial multiplication: multiplication in GF(2^m)
Require: $A(x), B(x) \in GF(2^m)$; $F(x)$, the irreducible polynomial of degree $m$ ($m + 1$ coefficients)
Ensure: $C(x) = A(x)B(x) \bmod F(x)$
1: $C(x) \leftarrow B_{s-1}(x)A(x) \bmod F(x)$
2: for $k$ from $s-2$ down to $0$ do
3:     $C(x) \leftarrow x^d C(x)$
4:     $C(x) \leftarrow C(x) + B_k(x)A(x) \bmod F(x)$
5: end for

Since $B(x)$ is an element of GF(2^m) in polynomial basis, it is viewed as the polynomial $b_{m-1}x^{m-1} + b_{m-2}x^{m-2} + \cdots + b_1 x + b_0$.
For a positive digit size $d < m$, the polynomial $B(x)$ can be grouped so that it is expressed as

$$B(x) = x^{(s-1)d} B_{s-1}(x) + x^{(s-2)d} B_{s-2}(x) + \cdots + x^{d} B_1(x) + B_0(x),$$

where $s = \lceil m/d \rceil$ and each word $B_i(x)$ is defined as follows:

$$B_i(x) = \begin{cases} \sum_{j=0}^{d-1} b_{id+j}\, x^j & \text{if } 0 \le i < s-1, \\ \sum_{j=0}^{(m \bmod d)-1} b_{id+j}\, x^j & \text{if } i = s-1. \end{cases}$$

(When $d$ divides $m$, the last word $B_{s-1}(x)$ also holds $d$ bits.)

If $x^d$ is factored from the grouped representation of $B(x)$, the resulting expression is

$$B(x) = x^d\left(x^d\left(\cdots\left(x^d B_{s-1}(x) + B_{s-2}(x)\right) + \cdots\right) + B_1(x)\right) + B_0(x).$$

This last representation of operand $B(x)$ is used in Algorithm 1 to compute the field multiplication. That is,

$$A(x)B(x) \bmod F(x) = \left[x^d\left(x^d\left(\cdots\left(x^d B_{s-1}(x)A(x) + B_{s-2}(x)A(x)\right) + \cdots\right) + B_1(x)A(x)\right) + B_0(x)A(x)\right] \bmod F(x).$$

At each iteration, the accumulator $C(x)$ is multiplied by $x^d$ and the result is added to the product of $A(x)$ and the next word $B_i(x)$ of $B(x)$. The partial result $C(x)$ is reduced modulo $F(x)$:

$$\begin{aligned}
\text{Initialization:}\quad & C(x) = B_{s-1}(x)A(x) \bmod F(x) \\
\text{Iteration } s-2\text{:}\quad & C(x) = x^d C(x) \bmod F(x) = x^d B_{s-1}(x)A(x) \bmod F(x) \\
& C(x) = x^d B_{s-1}(x)A(x) + B_{s-2}(x)A(x) \bmod F(x) \\
\text{Iteration } s-3\text{:}\quad & C(x) = x^d C(x) \bmod F(x) = x^d\left(x^d B_{s-1}(x)A(x) + B_{s-2}(x)A(x)\right) \bmod F(x) \\
& C(x) = x^d\left(x^d B_{s-1}(x)A(x) + B_{s-2}(x)A(x)\right) + B_{s-3}(x)A(x) \bmod F(x)
\end{aligned}$$

The proposed architecture for Algorithm 1 is shown on the left side of Fig. 1. A finite state machine controls the data flow, executing the loop in Algorithm 1. At each iteration, a new digit of $d$ bits from $B(x)$ is processed, so the operation is performed in $\lceil m/d \rceil$ cycles. The operations $x^d C(x)$ and $B_i(x)A(x)$ are computed using parallel combinatorial multipliers, each of which multiplies a polynomial of degree $m-1$ by a polynomial of degree $d-1$. With $U(x)$ a polynomial of degree $d-1$, $u_{d-1}x^{d-1} + u_{d-2}x^{d-2} + \cdots + u_1 x + u_0$, and $A(x)$ a polynomial of degree $m-1$, the parallel multiplication is

$$U(x)A(x) \bmod F(x) = u_{d-1}x^{d-1}A(x) \bmod F(x) + u_{d-2}x^{d-2}A(x) \bmod F(x) + \cdots + u_1 xA(x) \bmod F(x) + u_0 A(x) \bmod F(x).$$

The operation $xA(x) \bmod F(x)$ is a one-position left shift of $A(x)$ together with a reduction by $F(x)$. Thus, the value $x^i A(x) \bmod F(x)$ is the shifted and reduced version of $x^{i-1}A(x) \bmod F(x)$, so each value $x^i A(x) \bmod F(x)$ can be generated sequentially, starting from $A(x)$. Finally, each value $x^i A(x) \bmod F(x)$ is added depending on the value of bit $u_i$. These operations are executed by the parallel multiplier shown on the right side of Fig. 1.
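The data flow of Algorithm 1 and of the parallel multiplier can be sketched in software. The following Python model is only an illustration under stated assumptions, not the authors' VHDL design: polynomials are stored as Python integers (bit i holds the coefficient of x^i), and the function names `poly_mod`, `mul_by_digit` and `digit_serial_mul` are hypothetical. The reduction of x^d C(x) is done here by generic polynomial division rather than by the two-step hardware scheme.

```python
def poly_mod(p: int, f: int, m: int) -> int:
    """Reduce polynomial p modulo f, where f has degree m (bit m is set)."""
    while p.bit_length() > m:              # degree of p is still >= m
        shift = p.bit_length() - 1 - m
        p ^= f << shift                    # cancel the leading term of p
    return p


def mul_by_digit(u: int, a: int, f: int, m: int) -> int:
    """U(x)A(x) mod F(x): sum the terms x^i A(x) mod F(x) for which u_i = 1."""
    acc = 0
    term = a                               # x^0 A(x), assumed already reduced
    while u:
        if u & 1:
            acc ^= term                    # addition in GF(2^m) is XOR
        term <<= 1                         # shift: multiply by x
        if (term >> m) & 1:                # reduce: degree reached m
            term ^= f
        u >>= 1
    return acc


def digit_serial_mul(a: int, b: int, f: int, m: int, d: int) -> int:
    """A(x)B(x) mod F(x), processing d bits of B(x) per iteration."""
    s = -(-m // d)                         # s = ceil(m / d) words
    mask = (1 << d) - 1

    def word(k: int) -> int:               # B_k(x): bits k*d .. k*d + d - 1 of B
        return (b >> (k * d)) & mask

    c = mul_by_digit(word(s - 1), a, f, m)           # step 1 of Algorithm 1
    for k in range(s - 2, -1, -1):                   # k = s-2 down to 0
        c = poly_mod(c << d, f, m)                   # C(x) <- x^d C(x) mod F(x)
        c ^= mul_by_digit(word(k), a, f, m)          # C(x) <- C(x) + B_k(x)A(x)
    return c


if __name__ == "__main__":
    # GF(2^4) with F(x) = x^4 + x + 1: x * x^3 = x^4 = x + 1
    print(digit_serial_mul(0b0010, 0b1000, 0b10011, m=4, d=2))  # prints 3
```

The result does not depend on d; only the number of loop iterations changes, which is the software counterpart of the latency/area trade-off studied here.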

[Fig. 1. Hardware architecture for digit serial finite field multiplication. Left panel: GF(2^m) digit multiplier architecture. Right panel: parallel combinatorial multiplier. Legend: S&R = A(x)x mod F(x).]

The operation $x^d C(x) \bmod F(x)$ is computed in two steps. Using the polynomial representation of $C(x)$,

$$\begin{aligned}
x^d C(x) \bmod F(x) &= x^d\left(c_{m-1}x^{m-1} + c_{m-2}x^{m-2} + \cdots + c_{m-d}x^{m-d} + c_{m-d-1}x^{m-d-1} + \cdots + c_1 x + c_0\right) \bmod F(x) \\
&= x^d\left(c_{m-1}x^{m-1} + c_{m-2}x^{m-2} + \cdots + c_{m-d}x^{m-d}\right) \bmod F(x) + x^d\left(c_{m-d-1}x^{m-d-1} + \cdots + c_1 x + c_0\right) \bmod F(x) \\
&= \left(c_{m-1}x^{m+d-1} + c_{m-2}x^{m+d-2} + \cdots + c_{m-d}x^{m}\right) \bmod F(x) + \left(c_{m-d-1}x^{m-1} + \cdots + c_1 x^{d+1} + c_0 x^{d}\right) \bmod F(x) \\
&= Q_1(x) \bmod F(x) + Q_2(x) \bmod F(x).
\end{aligned}$$

$Q_2(x)$ is a polynomial of degree $m-1$, corresponding to the $m-d$ least significant bits of $C(x)$ shifted $d$ positions to the left, so $Q_2(x)$ does not need to be reduced. Factoring $x^m$ from $Q_1(x)$ gives $Q_1(x) = x^m\left(c_{m-1}x^{d-1} + c_{m-2}x^{d-2} + \cdots + c_{m-d}\right)$. Since $F(x)$ is a trinomial or pentanomial of degree $m$ of the form $F(x) = x^m + g(x)$, where $g(x)$ is a polynomial of degree $g$, the equivalence $x^m \equiv g(x)$ can be used. Here $g(x)$ corresponds to all bits of $F(x)$ except bit $m$. Thus,

$$Q_1(x) \bmod F(x) = g(x)\left(c_{m-1}x^{d-1} + c_{m-2}x^{d-2} + \cdots + c_{m-d}\right).$$

That is, the operation is a multiplication of $g(x)$, of degree $g$, by a polynomial of degree $d-1$ corresponding to the $d$ most significant bits of $C(x)$. The resulting polynomial has degree $g+d-1$. For all the polynomials $F(x)$ used in the tests, with finite fields $m \in \{163, 233, 283, 409, 571\}$ and digits $d \in \{1, 4, 8, 16, 32\}$, the value $g + d < m$, so no reduction is necessary. The polynomial $g(x)$ is expanded to a polynomial of degree $m-1$ so that $Q_1(x) \bmod F(x)$ can be computed using the parallel combinatorial multiplier. All these computations are performed by the modules in the multiplier architecture, which includes the parallel multipliers, a $d$-bit left-shift module, two registers and a 3-input XOR gate.

3.
Implementation and results

The architecture was designed in VHDL, then simulated and validated using Active-HDL and a test program in C. The architecture is parameterizable in the field order for any value of m. The average system throughput of the architecture was obtained by synthesizing it for several finite field orders on the reconfigurable device xc2v2000 FPGA, using Xilinx's tools. The multiplier was implemented for the field orders m = 163, 233, 283, 409 and 571 recommended by NIST [7] for elliptic curve cryptography, and for the field m = 277 recommended by IPSec. Due to the large number of I/O pins in the architecture, the GF(2^m) multiplier was implemented together with an I/O interface. This is a finite state machine that receives the input parameters A(x) and B(x) as 32-bit words and, once the operation is computed, delivers the result in several 32-bit words. The results presented in the figures include the I/O interface.

We also investigated the performance of the multiplier in terms of processing time. Fig. 2 shows the processing time for specific finite fields and digit sizes, and Fig. 3 shows the area resources required for each of these finite fields and digit sizes. From these figures, it can be observed that the bigger the digit, the better the performance, but the higher the area requirements. The latency of the multiplier is mainly reduced by the size of the digit. From Fig. 2, it is seen that the difference in timing between digit sizes of 16 and 32 bits is not significant; thus the extra cost in terms of area for digit sizes greater than 32 bits is not justified.
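The diminishing return of larger digits can be seen from the iteration count of Algorithm 1 alone. The short Python sketch below tabulates ceil(m/d) for the field orders and digit sizes named in this section. It is a simplification under stated assumptions: it counts loop iterations only, ignores the cycles spent in the I/O interface, and ignores the fact that the clock period also changes with the digit size. The names `iterations` and `iteration_table` are hypothetical.

```python
FIELD_ORDERS = (163, 233, 277, 283, 409, 571)   # m values used in this section
DIGIT_SIZES = (1, 4, 8, 16, 32)                 # d values used in this section


def iterations(m: int, d: int) -> int:
    """Number of digit steps of Algorithm 1: ceil(m / d), in integer arithmetic."""
    return -(-m // d)


def iteration_table(orders=FIELD_ORDERS, digits=DIGIT_SIZES):
    """Map each field order m to its iteration counts, one per digit size."""
    return {m: [iterations(m, d) for d in digits] for m in orders}


if __name__ == "__main__":
    for m, row in iteration_table().items():
        print(m, row)
```

For m = 233, for instance, going from d = 1 to d = 4 removes 174 iterations (233 to 59), while going from d = 16 to d = 32 removes only 7 (15 to 8).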

[Fig. 2. Time (µs) to compute GF(2^m) multiplication using different parallelism grades and finite field orders. Axes: time (µs) versus m (field order); one curve per digit size, digit = 1, 4, 8, 16 and 32.]

[Fig. 3. Area (slices) resources for different parallelism grades and finite field orders. Axes: area (slices) versus m (field order); one curve per digit size, digit = 1, 4, 8, 16 and 32.]

The application constraints will guide the selection of the best implementation parameters. As an example of an application, consider a reconfigurable architecture for scalar multiplication in elliptic curve cryptography that manages several finite field orders but only assigns a fixed space for the field multiplier. For each specific finite field order, there is a digit size that maximizes the performance of the multiplier for a fixed area. For example, with 4 K gates the best performer is a 4-digit field multiplier for the field m = 571. In this area, we could also implement an 8-digit or a 16-digit multiplier for the fields m = 409 and m = 277, respectively, and so on. It is worth mentioning that the results presented in this technical communication were obtained from place and route optimized for speed and without keeping the hierarchical structure of the design.

Finally, Table 1 shows a comparison of the area results and performance achieved in this work against the results presented in [8] for several kinds of parallel multipliers, using the field m = 233 and the same technology, a Virtex-2 FPGA xc2v6000-4. In this comparison, the I/O interface was not used. The results show that the digit-serial solution requires less area, 10 times lower for d = 32 compared to the parallel implementation of the classical multiplier, at the cost of six more clock cycles. In all cases, the digit-serial multiplier has a greater frequency, which implies this module can be integrated into other designs working at high frequencies.

The multiplier using d = 1 achieves better timing (269 MHz, 0.6 µs) compared with the bit-serial implementation in [9] (42 MHz, 7.4 µs) for the finite field m = 163. Other works have implemented finite field multipliers and used them in elliptic curve coprocessors or processors [1–5], but the results for the standalone multiplier are not available. Others have implemented the multiplier for the GF(p) finite field, so a direct comparison is not possible [10,11].
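As a rough consistency check of the d = 1 timing quoted above, assume one iteration of Algorithm 1 per clock cycle and no I/O overhead (an assumption made here for illustration, not a statement from the measurements). The latency is then the iteration count divided by the clock frequency:

```latex
t \approx \frac{\lceil m/d \rceil}{f_{\mathrm{clk}}}
  = \frac{163}{269\ \mathrm{MHz}}
  \approx 0.606\ \mu\mathrm{s}
```

Under that assumption the estimate agrees with the reported 0.6 µs for m = 163.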

Table 1
Area and time comparison results

Ref.        Multiplier         LUT/FF          Slices   Gate count   Clock period (ns)   Frequency (MHz)
[8]         Classical (est.)   37,296/37,552   -        528,427      13.0                77
[8]         Hybrid Karatsuba   11,746/13,941   -        182,007      11.07               90.3
[8]         Massey-Omura       36,857/8,543    -        289,489      15.91               62.8
[8]         Sunar-Koc          45,435/41,942   -        608,149      10.73               93.2
This work   d = 1              484/477         246      6731         4.26                234.8
This work   d = 4              1188/634        766      12,95        4.82                207.2
This work   d = 8              2115/71         1384     18,394       5.32                187.7
This work   d = 16             4004/889        2436     31,139       6.19                161.5
This work   d = 32             711/1349        4457     53,647       6.59                151.6

4. Conclusions

An area/performance trade-off analysis for a digit-serial GF(2^m) finite field multiplier was presented. The size of the digit to use in an application of the proposed multiplier architecture will be guided by the area assigned to the multiplier. The required processing time should also be taken into account, as well as which larger digits can be used in the same area to maximize the performance for other field orders.

Acknowledgments

The first author thanks the National Council for Science and Technology from Mexico (CONACyT) for financial support through the scholarship number 171577.

References

[1] Ernest M, Klupsch S, Hauck O, Huss SA. Rapid prototyping for hardware accelerated elliptic curve public key cryptosystems. In: Proceedings of 12th IEEE workshop on rapid system prototyping, RSP 2001, Monterey, CA; June 2001. p. 24–31.
[2] Bednara M, Daldrup M, Gathen J, Shokrollahi J, Teich J. Reconfigurable implementation of elliptic curve crypto algorithms. In: IPDPS '02: Proceedings of the 16th international parallel and distributed processing symposium. Washington, DC, USA: IEEE Computer Society; 2002. p. 284–91.
[3] Ernest M, Jung M, Madlener F, Huss S, Blümel R. A reconfigurable system on chip implementation for elliptic curve cryptography over GF(2^m). In: Proceedings of the 4th international workshop on cryptographic hardware and embedded systems CHES 2002. Lecture notes in computer science, vol. 2523. Redwood Shores, CA: Springer; 2002. p. 381–99.
[4] Saquib N, Rodriguez F, Diaz A. A parallel architecture for fast computation of elliptic curve scalar multiplication over GF(2^m). In: Proceedings of 11th reconfigurable architectures workshop, RAW '04, Sta. Fe, USA; April 2004. p. 26–7.
[5] Lutz J, Hasan A. High performance FPGA based elliptic curve cryptographic co-processor. ITCC '04: international conference on information technology: coding and computing, vol. 2. IEEE Society Press; 2004. p. 486–92.
[6] Lutz Jonathan. High performance elliptic curve cryptographic co-processor. Master's thesis, University of Waterloo; 2003.
[7] NIST. Recommended elliptic curves for federal government use. <http://csrc.nist.gov/csrc/fedstandards.html>; 1999.
[8] Grabbe C, Bednara M, Teich J, von zur Gathen J, Shokrollahi J. FPGA designs of parallel high performance GF(2^233) multipliers. In: Proceedings of IEEE ISCAS '03, vol. II; 2003. p. 268–71.
[9] Kitsos P, Theodoridis G, Koufopavlou O. An efficient reconfigurable multiplier architecture for Galois field GF(2^m). Microelectron J 2003;34(10).
[10] Savaş E, Tenca AF, Koç ÇK. A scalable and unified multiplier architecture for finite fields GF(p) and GF(2^m). Cryptographic hardware and embedded systems, LNCS no. 1965; August 2000. p. 281–96.
[11] Tenca AF, Koc Cetin K. A scalable architecture for modular multiplication based on Montgomery's algorithm. IEEE Trans Comput 2003;52(9):1215–21.