CoProcessor Design for Crypto-Applications using Hyperelliptic Curve Cryptography
28 February 2008
Alexander Klimm, Oliver Sander, Jürgen Becker, Institut für Technik der Informationsverarbeitung
Sylvain Subileau, Daimler AG

Overview
- Motivation
- Public Key Cryptography
- Hyperelliptic Curve Cryptography (HECC)
- Hardware/Software Codesign for HECC on a Xilinx FPGA
- Measurements and Evaluation
- Outlook and Future Work
ITIV 2008, Alexander Klimm

Motivation: Increased Need for Security of Embedded Systems
- Increasing number of embedded devices: cell phones, PDAs, ECUs, etc.
- Networks of embedded devices: ECUs, ubiquitous computing, etc.
- Applications need secure systems: chip tuning, C2X, toll systems, etc.
- Communication and data transfers need to be secured.
- The industry is very cost-driven:
  - Small platforms
  - Low computation power
  - Small memory space
  - Short time-to-market

Motivation: Public Key Cryptography (PKC) in Embedded Systems
Advantages:
- Less storage memory is needed for keys.
- Secrets stay inside an entity.
- If one entity is compromised, the others remain secure.
- Easier logistics.
RSA, the standard for PKC:
- Long keys (1024 bit)
- Probably not secure enough in the future
ECC and HECC:
- Smaller keys (163 bit) at the same level of security as RSA
- Computationally intensive algorithms
Goal:
- HECC-based protocol (few patents)
- Acceleration of the algorithms used
- Small hardware platform (FPGA)

Public-Key Cryptography
Basic crypto:
- Alice encrypts a message m with a key: E(m) = c.
- Bob decrypts the received ciphertext c with his key, which matches the encryption key: D(c) = m.
- Disadvantage: Alice needs a key for every possible communication partner.
Public-key crypto:
- All encryption keys are public: E_e(m) = c.
- Only Bob can decrypt a message that is meant for him, using his secret private key: D_d(c) = m.
(Figure: Alice sends the message m to Bob as ciphertext c while Eve listens on the channel.)
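
The relation E_e(m) = c, D_d(c) = m can be tried out with a toy example. The sketch below uses textbook RSA, which the deck names as the PKC standard, not the HECC scheme the deck targets. The primes, the exponent, and the function names `encrypt` and `decrypt` are illustrative assumptions, and the numbers are far too small for any real use.

```python
# Toy textbook RSA, for illustration only (no padding, tiny parameters).
p, q = 61, 53                  # assumed small primes
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # Bob's public encryption exponent (key e)
d = pow(e, -1, phi)            # Bob's private exponent (key d); needs Python 3.8+

def encrypt(m: int) -> int:
    """E_e(m) = c: anyone can do this with Bob's public key (e, n)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """D_d(c) = m: only the holder of the private exponent d can do this."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)
print(ciphertext, decrypt(ciphertext))
```

Alice needs only Bob's public pair (e, n), so she does not have to store a separate shared secret per communication partner.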

Elliptic Curves: PointAdd
- P + Q = R
- P + (-P) = O
- O is defined as the point at infinity.
Source: Certicom Corp.

HECC: Hyperelliptic Curve Cryptography, PointAdd
$C_2: v^2 = u^5 - 5u^3 + 4u + 3$

HW/SW Codesign: Design Approach
HECC implementations so far exist only in software.
Layers, from software down to hardware:
- Protocol
- Scalar multiplication
- HECC arithmetic: PointAdd, PointDouble
- GF(2^n) arithmetic: addition, inversion, multiplication, MAC
- Hardware: multiplication, MAC
Software implementations are too slow. Duration of one scalar multiplication:
- Freescale Star12 (16 bit, 16 MHz): > 5000 ms
- PowerPC (32 bit, 80 MHz): > 500 ms
- Optimized code: ca. 100 ms
- Goal: 50 ms
Approach:
- Implementation on a MicroBlaze (32 bit, 33 MHz), with time-consuming calculations outsourced to hardware
- Evaluation of the performance and adaptation of the HW/SW partitioning
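
The layering above (scalar multiplication built from PointAdd and PointDouble) can be sketched with the generic left-to-right double-and-add method. The slides do not say which scalar-multiplication variant was used, so this is one common choice, not necessarily the authors'. The function name `scalar_mult` and the stand-in group (integers modulo 101 under addition, in place of real HECC divisor arithmetic) are assumptions for illustration.

```python
from typing import Callable, TypeVar

G = TypeVar("G")

def scalar_mult(k: int, P: G, add: Callable[[G, G], G],
                double: Callable[[G], G], identity: G) -> G:
    """Left-to-right double-and-add: computes k*P using only add and double."""
    if k < 0:
        raise ValueError("k must be non-negative")
    result = identity
    for bit in bin(k)[2:]:            # scan the scalar from its most significant bit
        result = double(result)       # one PointDouble per scalar bit
        if bit == "1":
            result = add(result, P)   # one PointAdd per set bit
    return result

# Stand-in group, for illustration only: integers modulo 101 under addition.
MOD = 101

def add(a: int, b: int) -> int:
    return (a + b) % MOD

def double(a: int) -> int:
    return (2 * a) % MOD

print(scalar_mult(77, 5, add, double, 0))   # same value as (77 * 5) % 101
```

Every scalar bit costs one PointDouble and every set bit one additional PointAdd. Each of those operations in turn consists of many GF(2^n) multiplications, which is why the multiplier is the natural candidate for hardware.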

System Components
- FPGA gateway platform: Spartan3-5000 (33 MHz)
- Microcontroller: MicroBlaze (32 bit)
- OPB bus
- opb_uartlite: output of data to a PC over RS232 (e.g. test logs)
- opb_gpio: LED debug outputs
- Counter_verylight: a precise counter that counts the system's clock cycles
- Co-Processor

CoProcessor Design: Hardware Modules
Hardware units used:
- GF(2^n) multiplier (MALUd1)
- GF-Add (XOR logic of two 83-bit input signals)
- MAC (multiply-accumulate)
MAC unit data path (all signals are 83 bits wide): Operand A and Operand B feed MULTIPLY (MALUd1). The product and Operand C feed ADD, which delivers the Result.
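
The MAC data path (Result = A * B + C over GF(2^n), with addition as XOR) can be modelled in a few lines. This is a behavioural sketch, not the VHDL design. The slides do not give the reduction polynomial of the 83-bit field, so the example runs in the toy field GF(2^5) with x^5 + x^2 + 1, and the names `clmul`, `gf_reduce`, `gf_add`, and `gf_mac` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) multiplication over GF(2), without reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf_reduce(v: int, poly: int, n: int) -> int:
    """Reduce v modulo poly, where poly has degree n (its bit n is set)."""
    for i in range(v.bit_length() - 1, n - 1, -1):
        if (v >> i) & 1:
            v ^= poly << (i - n)
    return v

def gf_add(a: int, b: int) -> int:
    """GF(2^n) addition is a bitwise XOR of the two operands."""
    return a ^ b

def gf_mac(a: int, b: int, c: int, poly: int, n: int) -> int:
    """Multiply-accumulate: (a * b) + c in GF(2^n)."""
    return gf_add(gf_reduce(clmul(a, b), poly, n), c)

POLY = 0b100101   # x^5 + x^2 + 1, assumed toy polynomial (not the 83-bit one)
N = 5
print(bin(gf_mac(0b10000, 0b00010, 0b00001, POLY, N)))
```

Fusing the addition into the multiplier saves one operand transfer and one result transfer per multiply-add pair, which matters when communication with the processor dominates the run time.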

MALUd1: Setup
- GF(2^n) multiplication: shift-and-add algorithm with simultaneous reduction of the result
- Reduction by adding a reduction polynomial (hardcoded, XOR)
(Figure: setup of a module cell, drawn for 5 bits. Inputs: a_i, b(4)..b(0), t(5)..t(0), p(4)..p(0). Outputs: t_next(4)..t_next(0).)
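
A software model of shift-and-add multiplication with simultaneous reduction may help in reading the cell diagram. It is a sketch under assumptions: the processing order (most significant bit of a first) and the toy polynomial x^5 + x^2 + 1 are chosen for illustration, and `gf2n_mul` is a hypothetical name. The names `a_i`, `b`, `t`, and `p` follow the figure labels.

```python
def gf2n_mul(a: int, b: int, poly: int, n: int) -> int:
    """Multiply a*b in GF(2^n), one bit of a per step, reducing as we go.

    a and b must be below 2**n; poly is the reduction polynomial
    including its x^n term.
    """
    t = 0
    for i in range(n - 1, -1, -1):
        t <<= 1                  # shift the running result; bit n may now be set
        if (a >> i) & 1:         # a_i decides whether b is added (XORed) in
            t ^= b
        if (t >> n) & 1:         # overflow bit t(n) set: add the polynomial p
            t ^= poly
    return t

POLY = 0b100101   # x^5 + x^2 + 1, assumed for this 5-bit toy example
N = 5
# x^4 * x = x^5, which reduces to x^2 + 1
print(bin(gf2n_mul(0b10000, 0b00010, POLY, N)))   # 0b101
```

Because the reduction happens in every step, the intermediate value never grows beyond n + 1 bits, so no separate reduction stage is needed after the last step.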

Connection of Peripherals to the MicroBlaze
Comparison of the variants with respect to implementation (FIFO), time, and power.
Interface options:
a) On-Chip Peripheral Bus (OPB)
b) Fast Simplex Link (FSL)
(Figure: MicroBlaze with ALU, instructions, 32x32 register file, and data-side bus interface. CoProcessor option A is attached via the OPB, CoProcessor option B via FSL1 and FSL2. Label: decrypted data.)

Tradeoff: Performance vs. Secure System
Comparison of the variants with respect to implementation (FIFO), time, and power.
(Figure: MicroBlaze with ALU, instructions, 32x32 register file, and data-side bus interface. The CoProcessor is attached via FSL1/FSL2 and via the OPB. Label: decrypted data.)
Parts of the system to consider: processor, software, algorithms, peripherals, system busses, memory, implementation, CoProcessor.
SIDE-CHANNEL AWARENESS!

Interface MAC/MicroBlaze via OPB: Overview (Option A)
- The MicroBlaze (OPB master) is connected over the OPB to OPB_Mult2 (OPB slave).
- OPB_Mult2 consists of the IPIF interface and USER_LOGIC_I (user_logic.vhd).
- USER_LOGIC_I contains an FSM for data control, the MAC (MALUd1.vhd + XOR logic), and registers for Operand A, Operand B, Operand C, and the Result.
- The registers are accessible (read/write) by software.
- The bus protocol is implemented by the IPIF.
- The function of the slave is implemented in user_logic.vhd.

Interface Multiplier/MicroBlaze via FSL: Overview (Option B)
- The MicroBlaze is connected over FSL0 and FSL1 to fsl_hwa.
- fsl_hwa contains fsl_interface (user_logic.vhd) with an FSM for data control, MULTIPLY, and registers for Operand A, Operand B, and the Result.
- Data is transferred over two FIFOs that are embedded into the MicroBlaze.
- Data flow control is done in the user logic.

Comparison of Basic GF Operations
- Very high benefit for GF multiplication
- Almost no gain for GF addition
- MAC unit beneficial for PointAdd/PointDouble operations
Clock cycles (# clk) per operation:
Operation         Software   OPB   FSL               FSL-MAC
gf-add (u1+u2)    93         85    not implemented   not implemented
gf-mult (u1*u2)   12391      168   131               131
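
The two qualitative statements (very high benefit for multiplication, almost no gain for addition) can be put into numbers from the table. The short calculation below uses only the clock counts shown on the slide. The dictionary layout and the helper name `speedups` are illustrative assumptions.

```python
# Clock-cycle counts copied from the slide; None marks "not implemented".
cycles = {
    "gf-add":  {"Software": 93,    "OPB": 85,  "FSL": None, "FSL-MAC": None},
    "gf-mult": {"Software": 12391, "OPB": 168, "FSL": 131,  "FSL-MAC": 131},
}

def speedups(row: dict) -> dict:
    """Speedup of each hardware variant relative to the pure software version."""
    base = row["Software"]
    return {name: base / count
            for name, count in row.items()
            if name != "Software" and count is not None}

for op, row in cycles.items():
    for name, factor in speedups(row).items():
        print(f"{op:8s} {name:8s} {factor:6.1f}x")
```

With these figures the multiplication runs roughly 74 times faster over OPB and roughly 95 times faster over FSL, while the addition gains less than 10 percent.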

Communication Overhead: GF Multiplication
(Figure: stacked bars for the OPB interface and the FSL interface, axis from 0 to 180. Segments: communication MicroBlaze-HW, internal data transfer, multiplication.)
- OPB: 50% of the processing time is needed for communication between the MicroBlaze and the HW.
- FSL: over 30% of the processing time is needed for communication between the MicroBlaze and the HW.

Speed Measurement: Comparison of PointAdd/PointDouble and Scalar Multiplication
(Figure: bar chart, axis in ms from 0 to 100, with a "max. time" marker.)
Operation          Software   OPB-Mul   OPB-MAC   FSL-MUL   FSL-MAC
hecc-pointadd      30.276     14.014    13.981    13.959    13.925
hecc-pointdouble   25.343     13.591    13.571    13.55     13.529
hecc-scalarmult    2789       62.39     57.479    53.981    48.858
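
The scalar multiplication row can be checked against the 50 ms goal stated on the design-approach slide. The sketch below uses only the values from the table, read as milliseconds as labelled on the chart axis. The variable names are illustrative assumptions.

```python
GOAL_MS = 50.0   # target for one scalar multiplication, from the design-approach slide

# hecc-scalarmult row of the table, in ms
scalarmult_ms = {
    "Software": 2789.0,
    "OPB-Mul": 62.39,
    "OPB-MAC": 57.479,
    "FSL-MUL": 53.981,
    "FSL-MAC": 48.858,
}

software = scalarmult_ms["Software"]
for variant, t in scalarmult_ms.items():
    verdict = "meets goal" if t <= GOAL_MS else "misses goal"
    print(f"{variant:8s} {t:9.3f} ms  speedup {software / t:6.1f}x  {verdict}")

# Variants that stay within the time budget
meets_goal = [v for v, t in scalarmult_ms.items() if t <= GOAL_MS]
print(meets_goal)
```

On these numbers only the FSL variant with the MAC unit stays below 50 ms, at a speedup of about 57 over pure software.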

Resources Used on FPGA Platform: Spartan 500E
(Figure: platforms Spartan5000, Spartan1000, and Spartan500E. Blocks: MicroBlaze, UART-RS232, other peripherals, CoProcessor, Counter_verylight, GF_MUL (OPB).)
Available resources, Spartan3E S500 ft 256-4: 4656 slices, 9312 FF, 9312 LUTs
Used resources:
Component            Slices   FF    4-input LUTs
MicroBlaze           1020     811   1597
CoProcessor          517      509   920
UART-RS232 (OPB)     258      277   438
Counter_verylight    188      211   298
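
Adding up the listed components gives a feel for how much of the device is occupied. The calculation below covers only the four components with numbers on the slide, so it is a lower bound: other peripherals are not quantified there. The data layout and names are illustrative assumptions.

```python
# Available resources on the Spartan3E S500, as stated on the slide
available = {"slices": 4656, "ff": 9312, "luts": 9312}

# Used resources per component, as stated on the slide
used = {
    "MicroBlaze":        {"slices": 1020, "ff": 811, "luts": 1597},
    "CoProcessor":       {"slices": 517,  "ff": 509, "luts": 920},
    "UART-RS232 (OPB)":  {"slices": 258,  "ff": 277, "luts": 438},
    "Counter_verylight": {"slices": 188,  "ff": 211, "luts": 298},
}

# Sum each resource type over the listed components
totals = {res: sum(comp[res] for comp in used.values()) for res in available}

for res, total in totals.items():
    share = 100.0 * total / available[res]
    print(f"{res:6s} {total:5d} of {available[res]:5d}  ({share:4.1f}%)")
```

The listed components occupy well under half of the slices, and the CoProcessor itself needs about half as many slices as the MicroBlaze.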

Summary
Goals reached:
- Time constraints are met if FSL interfacing is employed.
- Minimal resources are used.
- The system can be implemented on a fairly small FPGA (Spartan 500E).
Evaluation:
- FSL interface: fast, but the FIFOs tend to consume a lot of power.
- OPB interface: too slow to meet the timing requirements.
Next steps:
- Evaluation of the system's security (side channels).
- Optimization of the software and the CoProcessor.

Future Work
Architectures:
- CoProcessor
- PicoBlaze
- NiosII
FPGA security:
- Storage of secrets, secure memory on FPGAs
- Avoidance of unauthorized access to the FPGA and/or its bitstream
- How are known side-channel attacks a danger to FPGA implementations?
- Countermeasures against security threats

Thank you for your attention. Any questions?
Alexander Klimm
Universität Karlsruhe (TH)
ITIV (Institut für Technik der Informationsverarbeitung)
email: klimm@itiv.uni-karlsruhe.de