Performance Investigations. Hannes Tschofenig, Manuel Pégourié-Gonnard 25 th March 2015

Similar documents
Forward Secrecy: How to Secure SSL from Attacks by Government Agencies

NXP & Security Innovation Encryption for ARM MCUs

A framework towards adaptable and delegated end-to-end transport-layer security for Internet-integrated Wireless Sensor Networks

Security Technical. Overview. BlackBerry Enterprise Service 10. BlackBerry Device Service Solution Version: 10.2

Thanks, But No Thanks

Pulse Secure, LLC. January 9, 2015

Secure Network Communications FIPS Non Proprietary Security Policy

NIST Cryptographic Algorithm Validation Program (CAVP) Certifications for Freescale Cryptographic Accelerators

I N F O R M A T I O N S E C U R I T Y

SMPTE Standards Transition Issues for NIST/FIPS Requirements v1.1

FIPS Non- Proprietary Security Policy. McAfee SIEM Cryptographic Module, Version 1.0

National Security Agency Perspective on Key Management

I N F O R M A T I O N S E C U R I T Y

Information Security

RSA BSAFE. Crypto-C Micro Edition for MFP SW Platform (psos) Security Policy. Version , October 22, 2012

CRYPTOGRAPHY AS A SERVICE

FUJITSU Software BS2000 internet Services. Version 3.4A November Readme

UM0586 User manual. STM32 Cryptographic Library. Introduction

Cryptographic Algorithms and Key Size Issues. Çetin Kaya Koç Oregon State University, Professor

Safeguarding Data Using Encryption. Matthew Scholl & Andrew Regenscheid Computer Security Division, ITL, NIST

White Paper. Enhancing Website Security with Algorithm Agility

Computer Security: Principles and Practice

Authentication requirement Authentication function MAC Hash function Security of

SPC5-CRYP-LIB. SPC5 Software Cryptography Library. Description. Features. SHA-512 Random engine based on DRBG-AES-128

Secure Socket Layer (SSL) and Transport Layer Security (TLS)

WIRELESS LAN SECURITY FUNDAMENTALS

OpenADR 2.0 Security. Jim Zuber, CTO QualityLogic, Inc.

Secure Sockets Layer (SSL ) / Transport Layer Security (TLS) Network Security Products S31213

Haswell Cryptographic Performance

Lecture 9: Application of Cryptography

NIST Test Personal Identity Verification (PIV) Cards

2. Cryptography 2.4 Digital Signatures

Table of Contents. Bibliografische Informationen digitalisiert durch

2014 IBM Corporation

Certicom Security for Government Suppliers developing client-side products to meet the US Government FIPS security requirement

Outline. Transport Layer Security (TLS) Security Protocols (bmevihim132)

SPINS: Security Protocols for Sensor Networks

NANOSSH Mocana s comprehensive SSH and RADIUS developers suite, purpose-built for resource-constrained or high-performance device environments.

An Introduction to Cryptography as Applied to the Smart Grid

Security Technical. Overview. BlackBerry Enterprise Server for Microsoft Exchange. Version: 5.0 Service Pack: 4

Accellion Secure File Transfer Cryptographic Module Security Policy Document Version 1.0. Accellion, Inc.

Final Exam. IT 4823 Information Security Administration. Rescheduling Final Exams. Kerberos. Idea. Ticket

, ) I Transport Layer Security

Symantec Mobility: Suite Server Cryptographic Module

Security Engineering Part III Network Security. Security Protocols (I): SSL/TLS

CPA SECURITY CHARACTERISTIC IPSEC VPN GATEWAY

BlackBerry Enterprise Server 5.0 SP3 and BlackBerry 7.1

IoT Security Platform

Usable Crypto: Introducing minilock. Nadim Kobeissi HOPE X, NYC, 2014

Figure 1: Application scheme of public key mechanisms. (a) pure RSA approach; (b) pure EC approach; (c) RSA on the infrastructure

SE 4472a / ECE 9064a: Information Security

RSA BSAFE. Security tools for C/C++ developers. Solution Brief

Overview of Cryptographic Tools for Data Security. Murat Kantarcioglu

FIPS Non-Proprietary Security Policy. IBM Internet Security Systems SiteProtector Cryptographic Module (Version 1.0)

SSL BEST PRACTICES OVERVIEW

Using etoken for SSL Web Authentication. SSL V3.0 Overview

Symantec Corporation Symantec Enterprise Vault Cryptographic Module Software Version:

OFFICIAL SECURITY CHARACTERISTIC MOBILE DEVICE MANAGEMENT

Network Security. Computer Networking Lecture 08. March 19, HKU SPACE Community College. HKU SPACE CC CN Lecture 08 1/23

Cryptography and Key Management Basics

FIPS Security Policy LogRhythm Log Manager

BlackBerry Enterprise Solution

Overview of CSS SSL. SSL Cryptography Overview CHAPTER

Network Security Services (NSS) Cryptographic Module Version

Randomized Hashing for Digital Signatures

GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte. Yehuda Lindell Bar-Ilan University

Chapter 11 Security+ Guide to Network Security Fundamentals, Third Edition Basic Cryptography

How To Encrypt Data With Encryption

The Impact of Cryptography on Platform Security

Triathlon of Lightweight Block Ciphers for the Internet of Things

TLS and SRTP for Skype Connect. Technical Datasheet

GNUTLS. a Transport Layer Security Library This is a Draft document Applies to GnuTLS by Nikos Mavroyanopoulos

Using BroadSAFE TM Technology 07/18/05

Connected from everywhere. Cryptelo completely protects your data. Data transmitted to the server. Data sharing (both files and directory structure)

Transitions: Recommendation for Transitioning the Use of Cryptographic Algorithms and Key Lengths

BroadSAFE Enhanced IP Phone Networks

Northrop Grumman M5 Network Security SCS Linux Kernel Cryptographic Services. FIPS Security Policy Version

Security Protocols HTTPS/ DNSSEC TLS. Internet (IPSEC) Network (802.1x) Application (HTTP,DNS) Transport (TCP/UDP) Transport (TCP/UDP) Internet (IP)

FIPS Level 1 Security Policy for Cisco Secure ACS FIPS Module

Real-Time Communication Security: SSL/TLS. Guevara Noubir CSU610

Managed Portable Security Devices

Communication Security for Applications

TLS all the tubes! TLS Fast Yet? IsWebRTC. It can be. Making TLS fast(er)... the nuts and bolts. +Ilya

SEC 2: Recommended Elliptic Curve Domain Parameters

C O M P U T E R S E C U R I T Y

The Secure Sockets Layer (SSL)

Ciphire Mail. Abstract

Announcement. Final exam: Wed, June 9, 9:30-11:18 Scope: materials after RSA (but you need to know RSA) Open books, open notes. Calculators allowed.

High-speed high-security cryptography on ARMs

Network Security Part II: Standards

SECURE USB FLASH DRIVE. Non-Proprietary Security Policy

Security Policy for Oracle Advanced Security Option Cryptographic Module

Chapter 17. Transport-Level Security

Digital Signature Standard (DSS)

ARCHIVED PUBLICATION

APPLICATION NOTE. Atmel AT02333: Safe and Secure Bootloader Implementation for SAM3/4. Atmel 32-bit Microcontroller. Features.

Secure Socket Layer. Carlo U. Nicola, SGI FHNW With extracts from publications of : William Stallings.

Cryptography and Network Security Sicurezza delle reti e dei sistemi informatici SSL/TSL

BlackBerry Enterprise Solution Security Release Technical Overview

The new 32-bit MSP432 MCU platform from Texas

Transcription:

Performance Investigations Hannes Tschofenig, Manuel Pégourié-Gonnard 25 th March 2015 1

Motivation In <draft-ietf-lwig-tls-minimal> we tried to provide guidance for the use of DTLS (TLS) when used in IoT deployments and included performance data to help understand the design tradeoffs. Later, work in the IETF DICE was started with the profile draft, which offers detailed guidance concerning credential types, communication patterns. It also indicates which extensions to use or not to use. Goal of <draft-ietf-lwig-tls-minimal> is to offer performance data based on the recommendations in the profile draft. This presentation is about the current status of gathering performance data for later inclusion into the <draft-ietf-lwig-tls-minimal> document. 2

Performance Data This is the data we want: Flash code size Message size / Communication Overhead CPU performance Energy consumption RAM usage Also allows us to judge the improvements of various extensions and gives engineers a rough idea what to expect when planning to use DTLS/TLS in an IoT product. <draft-ietf-lwig-tls-minimal-01> offers preliminary data about Code size of various basic building blocks (data from one stack only) Memory (RAM/flash) (pre-shared secret credential only) Communication overhead (high level only) 3

Overview Goal of the authors: Determine performance of asymmetric cryptography on ARM-based processors. Next slides explains Assumptions for the measurements, ARM processors used for the measurements, Development boards used, Actual performance data, and Comparison with other algorithms. 4

Assumptions Main focus of the measurements so far was on raw crypto (and not on protocol exchanges) ECC rather than RSA Different ECC curves Run-time performance (not energy consumption, RAM usage, code size) No hardware acceleration was used. Used open source software; code based on PolarSSL/mbed TLS stack. No hardware-based random number generator in the development platform was used à Not fit for real deployment. 5

ARM Cortex-M Processors Processors used in the performance tests Lowest cost Low power Lowest power Outstanding energy efficiency Performance efficiency Feature rich connectivity Digital Signal Control (DSC) Processor with DSP Accelerated SIMD Floating point (FP) Recently released; Best performance 6 Processors use the 32-bit RISC architecture http://www.arm.com/products/processors/cortex-m/index.php

Prototyping Boards used in Performance Tests ST Nucleo F401RE (STM32F401RET6) ARM Cortex-M4 CPU with FPU at 84MHz 512KB Flash, 96KB SRAM ST Nucleo F103 (STM32F103RBT6) ARM Cortex-M4 CPU with FPU at 72MHz 128KB Flash, 20KB SRAM ST Nucleo L152RE (STM32L152RET6) ARM Cortex-M3 CPU at 32MHz 512 KBytes Flash, 80KB RAM ST Nucleo F091 (STM32F091RCT6) ARM Cortex-M0 CPU at 48MHz 256 KBytes Flash, 32KB RAM NXP LPC1768 ARM Cortex-M3 CPU at 96MHz 512KB Flash, 32KB RAM Freescale FRDM-KL25Z ARM Cortex-M0+ CPU at 48MHz 128KB Flash, 16KB RAM 7 LPC1768 ST Nucleo FRDM-KL25Z

ECC Curves NIST curves: secp521r1, secp384r1, secp256r1, secp224r1, secp192r1 Koblitz curves : secp256k1, secp224k1, secp192k1 Brainpool curves: brainpoolp512r1, brainpoolp384r1, brainpoolp256r1 Curve25519 (only preliminary results). Note that FIPS186-4 refers to secp192r1 as P-192, secp224r1 as P-224, secp256r1 as P-256, secp384r1 as P-384, and secp521r1 as P-521. 8

Optimizations NIST Optimization Utilizes special structure of NIST chosen curves. Appendix 1 of http://csrc.nist.gov/groups/st/toolkit/documents/dss/nistrecur.pdf Longer version in FIPS PUB 186-4: http://nvlpubs.nist.gov/nistpubs/fips/nist.fips.186-4.pdf Relevant configuration parameter: POLARSSL_ECP_NIST_OPTIM Fixed Point Optimization: Pre-computes points Described in https://eprint.iacr.org/2004/342.pdf Relevant configuration parameter: POLARSSL_ECP_FIXED_POINT_OPTIM Window: Technique for more efficient exponentation Sliding window technique described in https://en.wikipedia.org/wiki/exponentiation_by_squaring Relevant configuration parameter: POLARSSL_ECP_WINDOW_SIZE (min=2, max=7). 9

ECDSA, ECDHE, and ECDH Elliptic Curve Digital Signature Algorithm (ECDSA) is the elliptic curve variant of the Digital Signature Algorithm (DSA) or, as it is sometimes called, the Digital Signature Standard (DSS). It is used in TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 ciphersuite recommended in CoAP (and consequently also in the DTLS profile draft). ECDSA, like DSA, has the property that poor randomness used during signature generation can compromise the long-term signing key. For this reason the deterministic variant of (EC)DSA (RFC 6979) is implemented, which uses the private key as a source or entropy to seed a PRNG. Note: None of the prototyping boards listed in the slide deck provide true random number generation. CoAP recommends this ciphersuite TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 that makes use of the Ephemeral Elliptic Curve Diffie-Hellman (ECDHE). The Elliptic Curve Diffie-Hellman (ECDH) is only used for comparison purposes in this slide deck but not used in the recommended ciphersuites. 10

Key Length Tradeoff between security and performance. Values based on recommendations from RFC 4492. [I-D.ietf-uta-tls-bcp] recommends at least 112 bits symmetric keys. A 2013 ENISA report states that an 80bit symmetric key is sufficient for legacy applications but recommends 128 bits for new systems. Symmetric ECC DH/DSA/RSA 80 163 1024 112 233 2048 128 283 3072 192 409 7680 256 571 15360 11

Observations: Performance Figures ECDSA signature operation is faster than ECDSA verify operation. Brainpool curves are slower than NIST curves because Brainpool curves use random primes. ECC key sizes above 256 bits are substantially slower than ECC curves with key size 192, 224, and 256. ECDH is only slightly faster than ECDHE (when fixed point optimization is enabled). CPU speed has a significant impact on the performance. The performance of symmetric key cryptography (keyed hash functions, encryption functions) is neglectable. 12

Observations: Optimizations NIST curve optimization provides substantial benefit for NIST secp*r1 curves. Fixed point optimization has a significant influence on the performance. There is a performance RAM usage tradeoff: increased performance comes at the expense of additional RAM usage. ECC library increases code size but also requires a fair amount of RAM for optimizations (for most curves). 13

14 ECC Performance of the Cortex M3/M4

Performance of various NIST/Koblitz ECC Curves NIST curves: secp521r1, secp384r1, secp256r1, secp224r1, secp192r1 Koblitz curves: secp256k1, secp224k1, secp192k1 15

Performance difference between signature vs. verify For comparison: secp192r1 (signature) needs 66msec. For comparison: secp256r1 (signature) needs 122msec. 16

Performance of Brainpool Curves For comparison: Secp256r1 (signature) needs 122msec. 17

Performance of Brainpool Curves For comparison: Secp256r1 (verify) needs 458msec. 18

Performance impact of the window parameter For comparison: secp521r1 (signature, W=7) needs 351msec. For comparison: secp192r1 (signature, W=7) needs 66msec. 19

The Performance Impact of the NIST Optimization secp192r1 (ECDHE): 5986 msec (F401RE, optimization disabled) vs. 638 msec (optimization enabled) 20

21 ECC Performance of the Cortex M0/M0+

ECDHE Performance of the KL25Z 22

ECDSA Performance of the KL25Z 23

24 + FP optimization enabled

25 + FP optimization enabled

26 + FP optimization enabled

27 CPU Speed Impact

Performance of ECDHE: L152RE vs. LPC1768 L152RE: Cortex-M3 with 32MHz LPC1768: Cortex-M3 with 96MHz 28 secp192r1 (ECDHE): 1155 msec (L152RE) vs. 229 msec (LPC1768) NIST optimization enabled. Fixed-point speed-up enabled.

Performance Comparison: Prototyping Boards ECDSA Performance (Signature Operation, w=7, NIST Optimization Enabled) Time (msec) 2000.00 1800.00 1600.00 1400.00 1200.00 1000.00 800.00 600.00 400.00 200.00 secp192r1 secp224r1 secp256r1 secp384r1 secp521r1 0.00 LPC1768, 96 MHz, Cortex M3 L152RE, 32 MHz, Cortex M3 F103RB, 72 MHz, Cortex M4 F401RE, 84 MHz, Cortex M4 Prototyping Boards 29

Curve25519 (Warning: Preliminary Results) 30

1600 1400 FRDM-KL46Z (Cortex-M0+, 48 MHz) 700 600 LPC1768 (Cortex-M3, 96 MHz) msec 1200 1000 800 600 400 200 msec 500 400 300 200 100 0 Curve25519-mbedtls Curve25519-donna P256-mbed ECDHE 1458 552 1145 Notes: The Curve25519-mbedtls implementation uses a generic libary. Hence, the special properties of Curve25519 are not utilized. Curve25519 has very low RAM requirements (~1 Kbyte only). Curve25519-donna is based on the Google implementation. Improvements for M0/M0+ are likely since the code has not been tailored to the architecture. Question: 31 Is Curve25519 a way to get ECC on M0/M0+? msec 0 Curve25519-mbedtls Curve25519-donna P256-mbed ECDHE 598 94 432 600 500 400 300 FRDM-K64F (Cortex-M4, 120 MHz) 200 100 0 Curve25519-mbedtls Curve25519-donna P256-mbed ECDHE 506 58 391

The Power of Assembly Optimizations Example: micro-ecc library https://github.com/kmackay/micro-ecc/tree/old Written in C, with optional inline assembly for ARM and Thumb platforms. LPC1114 at 48MHz (ARM Cortex-M0) ECDH time (ms) secp192r1 secp256r1 LPC1114 175.7 465.1 STM32F091 604,55 1260.9 ECDSA verify time (ms) secp192r1 Secp256r1 LPC1114 217.1 555.2 STM32F091 845.5 1758.8 Performance improvement between 200 and 300 % 32

33 RAM Usage

What was measured? Heap using a custom memory allocation handler (instead of malloc). Memory allocated on the stack was not measured (but it is negligible). Measurement was done on a Linux PC (rather than on the embedded device itself for convenience reasons). Two aspects investigated: Memory impact caused by different window parameter changes. Memory impact caused by FP performance optimization. 34

Heap Usage with Disabled FP Optimization w6 w=2 10000 9000 8000 7000 6000 Bytes 5000 4000 3000 2000 1000 0 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 Sign Sign Sign Sign Sign Verify Verify Verify Verify Cryptographic Computations Verify ECDHE ECDHE ECDHE ECDHE ECDHE 35

Heap Usage with FP Optimization Enabled w=6 w=2 18000 16000 14000 12000 Bytes 10000 8000 6000 4000 2000 0 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 Sign Sign Sign Sign Sign Verify Verify Verify Verify Verify ECDHE ECDHE ECDHE ECDHE ECDHE 36 Cryptographic Operation

Heap Usage (Window Size 6) Enabled Optimization 18000 Disabled Optimization 16000 14000 12000 Bytes 10000 8000 6000 4000 2000 0 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 secp521r1 secp384r1 secp256r1 secp224r1 secp192r1 Sign Sign Sign Sign Sign Verify Verify Verify Verify Verify ECDHE ECDHE ECDHE ECDHE ECDHE Cryptographic Computations 37 Note: NIST optimization enabled in both cases since it does not have an impact on the heap usage.

Summary To enable certain optimizations sufficient RAM is needed. A tradeoff decision between RAM and speed. Heap Usage (secp256r1) Optimizations pays off. 6000 This slide shows heap usage 5000 (NIST optimization 4000 enabled). Bytes 3000 2000 1000 38 0 Sign Verify ECDHE W=6, FP 4568 5380 5012 W=2, No FP 2972 3072 2692

4000 3500 LPC1768 (secp256r1) 3000 2500 2000 msec 1500 1000 500 0 Sign Verify ECDHE w=6, FP, NIST 122 458 431 w=6, no FP, NIST 340 677 644 w=2, no FP, NIST 378 759 734 w=2, no FP, no NIST 1893 3788 3781 Using ~50 % more RAM increases the performance by a factor 8 or more. 39

40 Applying Results to TLS/DTLS

Raw Public Keys with TLS_ECDHE_ECDSA_* TLS / DTLS client needs to perform the following computations: 1. Client verifies the signature covering the Server Key Exchange message that contains the server's ephemeral ECDH public key (and the corresponding elliptic curve domain parameters). 2. Client computes ECDHE. 3. Client creates signature over the Client Key Exchange message containing the client's ephemeral ECDH public key (and the corresponding elliptic curve domain parameters). Summary: 1 x ECDSA verification for step (1) 1 x ECDHE computation for step (2) 1 x ECDSA signature for step (3) Example (LPC1768, secp224r1, W=7, FP and NIST optimization enabled) 329msec (ECDSA verification) 303 msec (ECDHE computation) 85 msec (ECDSA signature) Total: 717 msec 41

Applying Results to TLS/DTLS Certificates with TLS_ECDHE_ECDSA_* Same as with raw public key plus (assuming no OCSP and certs are signed with ECC certificates) CA Certificate CA Certificate 1 st Intermediate CA Certificate 1 x ECDSA verification for 1 st Intermediate CA certificate CA Certificate Intermediate CA Certificate 1 x ECDSA verification for Intermediate CA certificate 2 nd Intermediate CA Certificate 1 x ECDSA verification for 2 nd Intermediate CA certificate Server Certificate 1 x ECDSA verification for server certificate Server Certificate 1 x ECDSA verification for server certificate Server Certificate 1 x ECDSA verification for server certificate 42

43 Symmetric Key Cryptography

Symmetric Key Cryptography Secure Hash Algorithm (SHA) creates a fixed length fingerprint based on an arbitrarily long input. The output length of the fingerprint is determined by the hash function itself. For example, SHA256 produces an output of 256 bits. Advanced Encryption Standard (AES) is an encryption algorithm, which has a fixed block size of 128 bits, and a key size of 128, 192, or 256 bits. A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block. Examples of modes of operation: CCM, GCM, CBC. Test relevant information: SHA computes a hash over a buffer with a length of 1024 bytes. AES-CBC: 1024 input bytes are encrypted. No integrity protection is used. IV size is 16 bytes. AES-CCM and AES-GCM: 1024 input bytes are encrypted and integrity protected. No additional data is used. In this version of the test a 12 bytes nonce value is used together with the input data. In addition to the encrypted data a 16 byte tag value is produced. 44

Symmetric Key Crypto: Performance of the KL25Z 8 7.6 7 6 6 6.5 6.9 5.8 6.7 Time (msec) 5 4 3 2.2 4.2 2.8 3.2 3.6 2 1 0 SHA-256 SHA-512 AES- CBC-128 AES- CBC-192 AES- CBC-256 AES- GCM-128 AES- GCM-192 AES- GCM-256 AES- CCM-128 AES- CCM-192 AES- CCM-256 Cryptographic Operation 45

Symmetric Key Crypto: Performance of the LPC1768 2.5 2 1.8 1.9 2 1.7 1.9 2.1 Time (msec) 1.5 1 0.6 1.4 0.7 0.8 0.9 0.5 0 SHA-256 SHA-512 AES- CBC-128 AES- CBC-192 AES- CBC-256 AES- GCM-128 AES- GCM-192 AES- GCM-256 AES- CCM-128 AES- CCM-192 AES- CCM-256 Cryptographic Operation 46

Conclusion ECC requires performance-demanding computations. Those take time. What an acceptable delay is depends on the application. Many applications only need to run public key cryptographic operations during the initial (session) setup phase and infrequently afterwards. With session resumption DTLS/TLS uses symmetric key cryptography most of the time (which is lightning fast). Detailed performance figures depend on the enabled performance optimizations (and indirectly the available RAM size), the key size, the type of curve, and CPU speed. Choosing the microprocessor based on the expected usage environment is important. 47

Next Steps Collecting performance data on IoT devices is time-consuming. We would appreciate help. In particular, we need Verification of the gathered data Data from other crypto libraries Further tests (energy efficiency, complete DTLS/TLS handshake data, data about various extensions, more data for Curve25519, etc.). We plan to update <draft-ietf-lwig-tls-minimal> accordingly. 48