AES-GCM software performance on the current high end CPUs as a performance baseline for CAESAR competition



Similar documents
AES-GCM for Efficient Authenticated Encryption Ending the Reign of HMAC-SHA-1?

GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte. Yehuda Lindell Bar-Ilan University

Encrypting the Internet

Haswell Cryptographic Performance

The Impact of Cryptography on Platform Security

Improving OpenSSL* Performance

Security Protocols HTTPS/ DNSSEC TLS. Internet (IPSEC) Network (802.1x) Application (HTTP,DNS) Transport (TCP/UDP) Transport (TCP/UDP) Internet (IP)

Technologies to Improve. Ernie Brickell

Enhancing McAfee Endpoint Encryption * Software With Intel AES-NI Hardware- Based Acceleration

Chapter 7 Transport-Level Security

AES-Based Authenticated Encryption Modes in Parallel High-Performance Software

Breakthrough AES Performance with. Intel AES New Instructions

Chapter 17. Transport-Level Security

Secure Socket Layer (SSL) and Transport Layer Security (TLS)

Summary of Results. NGINX SSL Performance

Is Your SSL Website and Mobile App Really Secure?

Network Security Essentials Chapter 5

SENSE Security overview 2014

Other VPNs TLS/SSL, PPTP, L2TP. Advanced Computer Networks SS2005 Jürgen Häuselhofer

Security. Contents. S Wireless Personal, Local, Metropolitan, and Wide Area Networks 1

Analyzing the Security Schemes of Various Cloud Storage Services

ZEN NETWORKS 3300 PERFORMANCE BENCHMARK SOFINTEL IT ENGINEERING, S.L.

Table of Contents. Bibliografische Informationen digitalisiert durch

Secure Socket Layer/ Transport Layer Security (SSL/TLS)

Intel Xeon Processor E5-2600

Version Highlights. CertainT 100 SSL Accelerator. Version International. New hardware and software version. North America

Network Security. Lecture 3

Overview. SSL Cryptography Overview CHAPTER 1

INF3510 Information Security University of Oslo Spring Lecture 9 Communication Security. Audun Jøsang

Forward Secrecy: How to Secure SSL from Attacks by Government Agencies

Security Protocols/Standards

Security Policy Revision Date: 23 April 2009

TLS/SSL in distributed systems. Eugen Babinciuc

40G MACsec Encryption in an FPGA

As enterprises conduct more and more

SMPTE Standards Transition Issues for NIST/FIPS Requirements v1.1

ZVA64EE PERFORMANCE BENCHMARK SOFINTEL IT ENGINEERING, S.L.

[SMO-SFO-ICO-PE-046-GU-

Savitribai Phule Pune University

KeyStone Architecture Security Accelerator (SA) User Guide

3.2: Transport Layer: SSL/TLS Secure Socket Layer (SSL) Transport Layer Security (TLS) Protocol

NEFSIS DEDICATED SERVER

A PERFORMANCE EVALUATION OF COMMON ENCRYPTION TECHNIQUES WITH SECURE WATERMARK SYSTEM (SWS)

Einführung in SSL mit Wireshark

Cryptography and Network Security Chapter 12

Secure Sockets Layer (SSL ) / Transport Layer Security (TLS) Network Security Products S31213

13 Virtual Private Networks 13.1 Point-to-Point Protocol (PPP) 13.2 Layer 2/3/4 VPNs 13.3 Multi-Protocol Label Switching 13.4 IPsec Transport Mode

Software implementation of Post-Quantum Cryptography

Overview SSL/TLS HTTPS SSH. TLS Protocol Architecture TLS Handshake Protocol TLS Record Protocol. SSH Protocol Architecture SSH Transport Protocol

CS155. Cryptography Overview

Dashlane Security Whitepaper

Performance Investigations. Hannes Tschofenig, Manuel Pégourié-Gonnard 25 th March 2015

CRYPTOGRAPHY AS A SERVICE

Cleaning Encrypted Traffic

EXAM questions for the course TTM Information Security May Part 1

SPC5-CRYP-LIB. SPC5 Software Cryptography Library. Description. Features. SHA-512 Random engine based on DRBG-AES-128

OpenADR 2.0 Security. Jim Zuber, CTO QualityLogic, Inc.

National Security Agency Perspective on Key Management

Thanks, But No Thanks

Message Authentication Codes

CRYPTOGRAPHY IN NETWORK SECURITY

An Overview of Communication Manager Transport and Storage Encryption Algorithms

Web Security Considerations

A Study of What Really Breaks SSL HITB Amsterdam 2011

Overview of CSS SSL. SSL Cryptography Overview CHAPTER

2014 IBM Corporation

Lecture 9 - Network Security TDTS (ht1)

CSE/EE 461 Lecture 23

An Introduction to Cryptography as Applied to the Smart Grid

AES-COPA v.2. Designers/Submitters: Elena Andreeva 1,2, Andrey Bogdanov 3, Atul Luykx 1,2, Bart Mennink 1,2, Elmar Tischhauser 3, and Kan Yasuda 1,4

The Misuse of RC4 in Microsoft Word and Excel

Implementation and Performance of AES-NI in CyaSSL. Embedded SSL

Transport Level Security

SSL/TLS: The Ugly Truth

Upsurge in Encrypted Traffic Drives Demand for Cost-Efficient SSL Application Delivery

Cryptosystems. Bob wants to send a message M to Alice. Symmetric ciphers: Bob and Alice both share a secret key, K.

Common Pitfalls in Cryptography for Software Developers. OWASP AppSec Israel July The OWASP Foundation

HTTPS: Transport-Layer Security (TLS), aka Secure Sockets Layer (SSL)

Chapter 10. Network Security

Lecture 7: Transport Level Security SSL/TLS. Course Admin

Accellion Secure File Transfer Cryptographic Module Security Policy Document Version 1.0. Accellion, Inc.

Evaluation of Digital Signature Process

ERserver. iseries. Securing applications with SSL

Security Engineering Part III Network Security. Security Protocols (I): SSL/TLS

Using BroadSAFE TM Technology 07/18/05

SE 4472a / ECE 9064a: Information Security

SSL Performance Problems

Introduction to Cryptography

Alliance Key Manager A Solution Brief for Technical Implementers

Network Security. Modes of Operation. Steven M. Bellovin February 3,

Secure Socket Layer. Introduction Overview of SSL What SSL is Useful For

Lab Exercise SSL/TLS. Objective. Step 1: Open a Trace. Step 2: Inspect the Trace

Transcription:

Directions in Authenticated Ciphers DIAC 2013, 11 13 August 2013, Chicago, USA AES-GCM software performance on the current high end CPUs as a performance baseline for CAESAR competition Shay Gueron University of Haifa Department of Mathematics, Faculty of Natural Sciences, University of Haifa, Israel Intel Corporation Israel Development Center, Haifa, Israel shay@math.haifa.ac.il shay.gueron@intel.com S. Gueron. DIAC 2013 1

Agenda We will explore the impact of an efficient Authenticated Encryption (AES- GCM) at the system level Set a performance bar for the CAESAR competition S. Gueron. DIAC 2013 2

Background and motivation S. Gueron. DIAC 2013 3

Background We live in an https:// world TLS/SSL protected clientserver communications Why not https everywhere? Due to overheads costs Cryptographic algorithms for secure communications computational overhead SSL/TLS secure webservers are widely used, e.g., for Banking Ecommerce (ebay, Amazon, PayPal ) Google searches Social networks (e.g., Facebook) File hosting (Dropbox, Skydrive, Google Drive) S. Gueron. DIAC 2013 4

client server handshake public key based; a fixed (large) computational payload Client Traffic Application Data: Authenticated Encryption Server Popular cipher choices used today RC4 + HMAC-MD5 RC4 + HMAC-SHA-1 AES + HMAC-SHA-1 AES-GCM is a more efficient Authenticated Encryption scheme

The effect of the choice of AE on the servers efficiency S. Gueron. DIAC 2013 6

Different Types of Workloads Main two steps of the SSL connection Establish a shared secret through an SSL handshake Use the shared secret to derive symmetric keys to authenticate/encrypt all subsequent traffic What dominates the workload? Handshake dominates when the files are small Authenticated Encryption dominates when the files are large File sizes depends on the type of the application; Here are some example Webmail login to check the last mail (textual): 470 KB Login to a bank account and check balance: 320 KB Facebook login and scroll through 5 pictures a friend posted: 850KB File hosting services: typically use large files (up to a few GB) Client-server communications are affected differently by AE performance

Sessions/Second 160 140 120 100 95 SSL Server Performance small data sessions (500KB) Apache Server Performance on Intel Microarchitecture Codename Sandy Bridge, 1 Core, 1GHz, HT On 107 136 89 500KB 80 60 40 20 28 - The choice of AE algorithm affects directly servers *Tested Apache server uses OpenSSL development version efficiency S. Gueron. DIAC 2013 8

Sessions/Second SSL Server Performance large-data sessions (10MB) 20 18 Apache Server Performance on Intel Microarchitecture Codename Sandy Bridge, 1 Core, 1GHz, HT On 19 17 10MB 16 14 12 10 8 8 10 12 6 4 2 - The choice of AE algorithm affects directly servers *Tested Apache server uses OpenSSL development version efficiency S. Gueron. DIAC 2013 9

Sessions/Second 160 140 120 100 95 SSL Server Performance small and large Apache Server Performance on Intel Microarchitecture Codename Sandy Bridge, 1 Core, 1GHz, HT On 107 136 89 10MB 500KB 80 60 40 20-8 10 28 19 17 12 The choice of AE algorithm affects directly servers *Tested Apache server uses OpenSSL development version efficiency S. Gueron. DIAC 2013 10

AES software optimization and the latest open source patches S. Gueron. DIAC 2013 11

AES-GCM and Intel s AES-NI / PCLMULQDQ Intel introduced a new set of instructions (2010) AES-NI: Facilitate high performance AES encryption and decryption PCLMULQDQ 64 x 64 128 (carry-less) Binary polynomial multiplication; speeds up computations in binary fields Has several usages --- AES-GCM is one To use it for the GHASH computations: GF(2 128 ) multiplication: 1. Compute 128 x 128 256 via carry-less multiplication (of 64-bit operands) 2. Reduction: 256 128 modulo x 128 + x 7 + x 2 + x + 1 (done efficiently via software) AES-NI and PCLMULQDQ can be used for speeding up AES-GCM Authenticated Encryption S. Gueron. DIAC 2013 12

Recent open source contributions The new AES-GCM related patches Sept./Oct. 2012: We published two patches for two popular open source distributions: OpenSSL and NSS Authors: S. Gueron and V. Krasnov Inherently side channel protected constant time in the strict definition Fast on the current x86-64 processors (Intel Microarchitectures Codename Sandy Bridge, Ivy Bridge and Haswell) The (we can do ) best balance between slower CLMUL implementation of Sandy Bridge and Ivy Bridge Microarchitectures, and the faster CLMUL of the Haswell Microarchitecture Status: integrated into NSS, the ideas adopted and integrated into OpenSSL March 2013: We published a patch for OpenSSL accelerating AES-CTR performance AES-CTR is the encryption mode, underlying the AES-GCM Status: mostly adopted by OpenSSL May 2013: We published a patch for NSS, performing side-channel protected GHASH, for generic processors Let s review how this was done S. Gueron. DIAC 2013 13

AES-GCM software optimization highlights Carry-less Karatsuba multiplication Best on Sandy Bridge / Ivy Bridge Microarchitectures (slower PCLMULQDQ) Schoolbook method for Haswell Microarchitecture Haswell has improved PCLMULQDQ New reduction algorithm Carry-less Montgomery for the GHASH operations [Gueron 2012] Encrypt 8 counter blocks Deferred reduction (using 8 block aggregation) Fixed elements outside the brackets Interleave CTR and GHASH Gains 3% over encrypting and MAC-ing serially Inherently side channel protected constant time in the strict definition The challenge: delicate balance on the Sandy Bridge/Ivy Bridge/Haswell Microarchitectures S. Gueron. DIAC 2013 14

CipherText Hkey X 3 X 2 X 1 X 0 The optimized reduction [Gueron ] 0xc200000000000000 X 0 0xc200000000000000 B 0 A 1 A 0 X 0 X 1 That s all C 1 C 0 B 0 B 1 B 1 B 0 vmovdqa T3, [W] vpclmulqdq T2, T3, T7, 0x01 vpshufd T4, T7, 78 vpxor T4, T4, T2 vpclmulqdq T2, T3, T4, 0x01 vpshufd T4, T4, 78 vpxor T4, T4, T2 vpxor T1, T1, T4 ; result in T1 X 3 X 2 H 1 H 0 S. Gueron. DIAC 2013 15 D 1 D 0

Performance numbers S. Gueron. DIAC 2013 16

Cycles/Byte - lower is better AES-GCM performance on OpenSSL (development version) OpenSSL AES-GCM performance, 8KB 3.50 3.00 2.50 2.55 2.55 2.72 2.72 2.87 2.87 2.00 1.50 1.00 1.03 1.17 1.31 0.50 - AES128-GCM AES192-GCM AES256-GCM Sandy Bridge Ivy Bridge Haswell A significant (more than 2x) performance improvement on the new Intel Microarchitecture Codename Haswell S. Gueron. DIAC 2013 17

Cycles/Byte - lower is better AES128-GCM performance on NSS 3.14.2 60.00 50.00 55.42 NSS AES128-GCM performance, 8KB 53.67 47.88 40.00 30.00 20.00 10.00-2.70 2.66 1.12 NSS 3.14 (pre patch) NSS 3.14.2 (AESNI/CLMULNI patch) Sandy Bridge Ivy Bridge Haswell A significant performance improvement on the new version of NSS More than 2x speedup on the new Intel Microarchitecture Codename Haswell S. Gueron. DIAC 2013 18

A note on code balancing NSS uses a single code path for Intel Microarchitectures Codenames Sandy Bridge, Ivy Bridge and Haswell A good performance balance & code that runs on the three microarchitectures Simple maintenance OpenSSL uses separate code paths Achieving top performance on both architectures The Haswell code path interleaves the encryption and the authentication Gains 3% over the serial CTR + GHASH implementation that is used for Sandy Bridge and Ivy Bridge Microarchitectures The Haswell code path uses the (new) MOVBE instruction Generates code that runs only on Haswell and onward However: using MOVBE has no performance benefit over MOV+BSWAP or BSWAP+MOV But artificially create Haswell only code It is difficult to write code that is 100% optimal on Sandy Bridge, Ivy Bridge and Haswell Microarchitectures S. Gueron. DIAC 2013 19

Encryption and Authentication breakdown Processor Microarchitecture Sandy Bridge Ivy Bridge GHASH consumes ~ 70% of AES-GCM on Intel Microarchitectures Codename Sandy Bridge and Ivy Bridge GHASH consumes ~40% on the latest Intel Microarchitecture Codename Haswell Notes: the MAC computations are still significant Limited by the (current) performance of PCLMULQDQ Haswell Cycles/byte AES-GCM 2.53 2.53 1.03 GHASH 1.79 1.79 0.40 AES-CTR 0.73 0.73 0.63 Ultimate performance goal: AES-GCM at the same performance as CTR S. Gueron. DIAC 2013 20

Cycles/Byte - lower is better Comparison to other AE schemes 7.00 6.00 6.14 5.70 5.00 4.71 4.00 3.00 2.00 1.00 2.55 2.55 0.98 0.98 1.03 0.73 0.73 0.69 0.63 - Sandy Bridge Ivy Bridge Haswell AES-CBC with HMAC SHA-1 AES-GCM OCB CTR S. Gueron. DIAC 2013 21

Summary AES-GCM is the best performing Authenticated Encryption combination among the NIST standard options (esp. compared to using HMAC SHA-1) Already enables in the open source libraries OpenSSL and NSS Performance keeps improving across CPU generations The ultimate performance goal: AES-GCM at the performance of CTR+ ε With some luck, we might see significant deployment already in 2013, with leading browsers (Chrome/Firefox) offering AES-GCM as an option What about CAESAR competition candidates? Performance on high end CPU s should be no less than the performance of AES-GCM (on the latest architecture) Minimum - below 1 C/B. Target way below 1 C/B And now, you have a baseline to meet Proposing a minimum for the CAESAR competition candidates: at least as fast as AES-GCM on the high end CPU s S. Gueron. DIAC 2013 22

Thank you for listening Questions? Feedback?

References S. Gueron. DIAC 2013 24

Software patches References [PATCH] Efficient implementation of AES-GCM, using Intel's AES-NI, PCLMULQDQ instruction, and the Advanced Vector Extension (AVX), http://rt.openssl.org/ticket/display.html?id=2900&user=guest&pass=guest Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX) architecture, https://bugzilla.mozilla.org/show_bug.cgi?id=805604 [PATCH] Fast implementation of AES-CTR mode for AVX capable x86-64 processors, http://rt.openssl.org/ticket/display.html?id=3021&user=guest&pass=guest A patch for NSS: a constant-time implementation of the GHASH function of AES-GCM, for processors that do not have the AES-NI/PCLMULQDQ, https://bugzilla.mozilla.org/show_bug.cgi?id=868948 Papers: AES-GCM 1. S. Gueron, Michael E. Kounavis: Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode (Rev. 2.01) http://software.intel.com/sites/default/files/article/165685/clmul-wp-rev-2.01-2012-09-21.pdf 2. S. Gueron, M. E. Kounavis: Efficient Implementation of the Galois Counter Mode Using a Carry-less Multiplier and a Fast Reduction Algorithm. Information Processing Letters 110: 549û553 (2010). 3. S. Gueron: Fast GHASH computations for speeding up AES-GCM (to be published). AES-NI 5. S. Gueron. Intel Advanced Encryption Standard (AES) Instructions Set, Rev 3.01. Intel Software Network. http://software.intel.com/sites/default/files/article/165683/aes-wp-2012-09-22-v01.pdf 6. S. Gueron. Intel's New AES Instructions for Enhanced Performance and Security. Fast Software Encryption, 16th International Workshop (FSE 2009), Lecture Notes in Computer Science: 5665, p. 51-66 (2009). S. Gueron. DIAC 2013 25