Efficient Software Implementation of AES on 32-bit Platforms

Similar documents

Efficient Software Implementation of AES on 32-Bit Platforms

SeChat: An AES Encrypted Chat

AES Power Attack Based on Induced Cache Miss and Countermeasure

Implementation of Full -Parallelism AES Encryption and Decryption

The Advanced Encryption Standard: Four Years On

Enhancing Advanced Encryption Standard S-Box Generation Based on Round Key

IJESRT. [Padama, 2(5): May, 2013] ISSN:

AES-CBC Software Execution Optimization

Design and Verification of Area-Optimized AES Based on FPGA Using Verilog HDL

Rijndael Encryption implementation on different platforms, with emphasis on performance

An Instruction Set Extension for Fast and Memory-Efficient AES Implementation

Secret File Sharing Techniques using AES algorithm. C. Navya Latha Garima Agarwal Anila Kumar GVN

The Advanced Encryption Standard (AES)

Parallel AES Encryption with Modified Mix-columns For Many Core Processor Arrays M.S.Arun, V.Saminathan

The Advanced Encryption Standard (AES)

Design and Implementation of Asymmetric Cryptography Using AES Algorithm

FPGA IMPLEMENTATION OF AN AES PROCESSOR

Hardware Implementation of AES Encryption and Decryption System Based on FPGA

Survey on Enhancing Cloud Data Security using EAP with Rijndael Encryption Algorithm

CSCE 465 Computer & Network Security

Modern Block Cipher Standards (AES) Debdeep Mukhopadhyay

Combining Mifare Card and agsxmpp to Construct a Secure Instant Messaging Software

Network Security. Omer Rana

Fast Implementations of AES on Various Platforms

ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions

Area Optimized and Pipelined FPGA Implementation of AES Encryption and Decryption

Separable & Secure Data Hiding & Image Encryption Using Hybrid Cryptography

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

Improving Performance of Secure Data Transmission in Communication Networks Using Physical Implementation of AES

An Energy Efficient ATM System Using AES Processor

Cryptography and Network Security. Prof. D. Mukhopadhyay. Department of Computer Science and Engineering. Indian Institute of Technology, Kharagpur

Fast Software AES Encryption

synthesizer called C Compatible Architecture Prototyper(CCAP).

Note on naming. Note on naming

CS 758: Cryptography / Network Security

A VHDL Implemetation of the Advanced Encryption Standard-Rijndael Algorithm. Rajender Manteena

Enhancing the Mobile Cloud Server security by MAC Address

White Paper. Shay Gueron Intel Architecture Group, Israel Development Center Intel Corporation

The implementation and performance/cost/power analysis of the network security accelerator on SoC applications

Automata Designs for Data Encryption with AES using the Micron Automata Processor

Implementation and Design of AES S-Box on FPGA

Polymorphic AES Encryption Implementation

Cyber Security Workshop Encryption Reference Manual

Lecture 8: AES: The Advanced Encryption Standard. Lecture Notes on Computer and Network Security. by Avi Kak

Side-Channel Analysis Resistant Implementation of AES on Automotive Processors

Intel Advanced Encryption Standard (AES) New Instructions Set

Provisioning of Compression and Secure Management Services for Healthcare Data in Cloud Computing

A Survey on Performance Analysis of DES, AES and RSA Algorithm along with LSB Substitution Technique

Performance Evaluation of AES using Hardware and Software Codesign

Area optimized in storage area network using Novel Mix column Transformation in Masked AES

FPGA IMPLEMENTATION OF AES ALGORITHM

Advanced Encryption Standard by Example. 1.0 Preface. 2.0 Terminology. Written By: Adam Berent V.1.7

Advanced Encryption Standard by Example. 1.0 Preface. 2.0 Terminology. Written By: Adam Berent V.1.5

AES implementation on Smart Card

AES Cipher Modes with EFM32

AN IMPLEMENTATION OF HYBRID ENCRYPTION-DECRYPTION (RSA WITH AES AND SHA256) FOR USE IN DATA EXCHANGE BETWEEN CLIENT APPLICATIONS AND WEB SERVICES

Implementing Enhanced AES for Cloud based Biometric SaaS on Raspberry Pi as a Remote Authentication Node

Lecture 4 Data Encryption Standard (DES)

Design and Analysis of Parallel AES Encryption and Decryption Algorithm for Multi Processor Arrays

Switching between the AES-128 and AES-256 Using Ks * & Two Keys

How To Understand And Understand The History Of Cryptography

A NOVEL STRATEGY TO PROVIDE SECURE CHANNEL OVER WIRELESS TO WIRE COMMUNICATION

Network Security Technology Network Management

Added Advanced Encryption Standard (A-Aes): With 512 Bits Data Block And 512, 768 And 1024 Bits Encryption Key

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Network Security - ISA 656 Introduction to Cryptography

Network Security. Security. Security Services. Crytographic algorithms. privacy authenticity Message integrity. Public key (RSA) Message digest (MD5)

Cryptography and Network Security: Summary

New AES software speed records

AES 128-Bit Implementation with Z8 Encore! XP Microcontrollers

AStudyofEncryptionAlgorithmsAESDESandRSAforSecurity

Online Mobile Cloud Based Compiler

Block encryption. CS-4920: Lecture 7 Secret key cryptography. Determining the plaintext ciphertext mapping. CS4920-Lecture 7 4/1/2015

Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE)

Network Security. Chapter 3 Symmetric Cryptography. Symmetric Encryption. Modes of Encryption. Symmetric Block Ciphers - Modes of Encryption ECB (1)

Multi-Layered Cryptographic Processor for Network Security

THIRD PARTY AUDITING SYSTEM FOR CLOUD STORAGE

A New Digital Encryption Scheme: Binary Matrix Rotations Encryption Algorithm

Implementation of Advance Encryption Standard (AES) in Biometric Electronic Voting Software

Overview of Symmetric Encryption

Cryptography and Network Security

How To Encrypt With A 64 Bit Block Cipher

AVR1318: Using the XMEGA built-in AES accelerator. 8-bit Microcontrollers. Application Note. Features. 1 Introduction

AES (Rijndael) IP-Cores

A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

Disk Encryption. Adnan Vaseem Alam. Master of Science in Communication Technology. Scrutinizing IEEE Standard 1619\XTS-AES

AES Flow Interception : Key Snooping Method on Virtual Machine. - Exception Handling Attack for AES-NI -

Improved Method for Parallel AES-GCM Cores Using FPGAs

A Secure Data Transmission By Integrating Cryptography And Video Steganography

Transcription:

Efficient Software Implementation of AES on 32-bit Platforms Guido Bertoni, Luca Breveglieri Politecnico di Milano, Milano - Italy Pasqualina Lilli Lilli Fragneto AST-LAB of ST Microelectronics, Agrate B. - Italy Marco Macchetti,, Stefano Marchesin ALARI - Università della Svizzera Italiana, Lugano - Switzerland

Table of Contents Introduction Short description of AES Optimisation of the algorithm Simulation results Conclusions 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 1 / 23

Introduction A work for the efficient software implementation of AES. Optimised software implementation (in C) oriented to 32-bit platforms with low memory (e.g. embedded systems). Evaluation of the time performances on various platforms: ARM, ST and Pentium. Comparison with the time performances of Gladman s C code. The usage of look-up tables is limited: only the S-BOX S and the inverse S-BOX S transformations are tabularised (2 256 bytes). 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 2 / 23

Algorithm Description - General Rijndael is the selected (NIST competition) algorithm for AES (Advanced Encryption Standard). It is a block cipher algorithm, operating on blocks of data. It needs a secret key, which is another block of data. Performs encryption and the inverse operation, decryption (using the same secret key). It reads an entire block of data, processes it in rounds and then outputs the encrypted (or decrypted) data. Each round is a sequence of four inner transformations. The AES standard specifies 128-bit data blocks and 128- bit, 192-bit or 256-bit secret keys. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 3 / 23

Algorithm Description Encrypt. PLAINTEXT ROUND 0 ROUND 0 ROUND 1 ROUND 1 ROUND 9 ROUND 9 ROUND 10 ROUND 10 encryption algorithm ROUND KEY 0 ROUND KEY 1 ROUND KEY 9 ROUND KEY 10 SECRET KEY KEY SCHEDULE KEY SCHEDULE ROUND KEY structure of a generic round INPUT DATA SUBBYTES SUBBYTES SHIFTROWS SHIFTROWS MIXCOLUMNS MIXCOLUMNS ADDROUNDKEY ADDROUNDKEY ENCRYPTED DATA OUTPUT DATA 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 4 / 23

Algorithm Description Encrypt. SubBytes 6%2; s 0 s 1 s 2 s 3 V s 6 s 7 s 9 s 10 s 11 s 12 s 13 s 14 s 15 state array ShiftRows state array s 4 s 5 s 8 one byte s 0 s 1 s 2 s 3 s 4 s 7 rotation of s s 5 5 s 6 s 8 s 9 s 10 s 11 s 12 s 13 s 14 s 15 s 0 s 4 s 8 s 12 1 byte s 0 s 4 s 8 s 12 s 1 s 2 s 3 s 5 s 6 s 7 s 9 s 10 s 11 s 13 s 14 s 15 2 bytes 3 bytes s 5 s 10 s 15 s 9 s 14 s 3 s 13 s 2 s 7 s 1 s 6 s 11 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 5 / 23

Algorithm Description Encrypt. MixColumns coeff.s matrix state array s 0 s 1 s 2 s 3 s 4 s 5 s 6 s 7 s 8 s 9 s 10 s 11 s 12 s 13 s 14 s 15 field GF(2 8 ) AddRoundKey 02 03 03 02 03 02 03 02 bit-wise XOR state array s 0 s 1 s 2 s 3 s 4 s 5 s 6 s 7 s 8 s 9 s 10 s 11 polynomial multiplications round key s 12 s 13 s 14 s 15 s 0 s 4 s 8 s 12 s 0 s 4 s 8 s 12 k 0 k 4 k 8 k 12 s 1 s 5 s 9 s 13 s 1 s 5 s 9 s 13 k 1 k 5 k 9 k 13 s 2 s 6 s 10 s 14 s 2 s 6 s 10 s 14 k 2 k 6 k 10 k 14 s 3 s 7 s 11 s 15 s 3 s 7 s 11 s 15 k 3 k 7 k 11 k 15 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 6 / 23

Optimisation The Idea To improve the time performances of AES, a transposed state array has been used. state transposed state s 0 s 1 s 4 s 5 s 8 s 9 s 12 s 13 s 0 s 4 s 1 s 5 s 2 s 6 s 3 s 7 s 2 s 6 s 10 s 14 s 8 s 9 s 10 s 11 s 3 s 7 s 11 s 15 s 12 s 13 s 14 s 15 Very simple idea, but yields interesting consequences! 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 7 / 23

Optimisation - Consequences The following round transformations are essentially invariant with respect to transposition (and their speed is unchanged): SubBytes ShiftRows AddRoundKey (but the round keys must be transposed) Instead, the MixColumns transformation must be completely restructured. The new MixColumns is considerably sped-up by the transposition of the state. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 8 / 23

Old MixColumns It is a matricial product (in GF(2 8 )): Mix Column number c (0 c 3) s s s s 0, c 1, c 2, c 3, c = 02 03 03 02 03 02 03 02 s s s s 0, c 1, c 2, c 3, c In C language a macro is used: fwd_mcol(x) state column (f2 = FFmulX(x), f2ûpr(x^f2,3)ûpr(x,2)ûpr(x,1)) 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 9 / 23

Old MixColumns - Cost s s s s 0, c 1, c 2, c 3, c = 02 s s s s 0, c 1, c 2, c 3, c s s 03 s s 3, c 0, c 1, c 2, c s s s s 2, c 3, c 0, c 1, c s s s s 1, c 2, c 3, c 0, c The cost per column is: a single doubling, 4 additions (XOR) and a 3 rotations (all operations work on 32 bits). For a complete MixColumns transformation 4 doublings, 16 additions ions (XOR) and 12 rotations are required. doubling means 4 multiplications in GF(2 8 ) of each byte of the 32-bit word. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 10 / 23

New MixColumns s 0 s 1 s 2 s 3 s 4 s 5 s 6 s 7 s 8 s 9 s 10 s 11 s 12 s 13 s 14 s 15 02 03 03 02 03 02 03 02 s 0 s 1 s 2 s 3 s 4 s 5 s 6 s 7 s 8 s 9 s 10 s 11 s 12 s 13 s 14 s 15 s 0 s 4 s 8 s 12 [ 02 03 ] s 0 s 1 s 2 s 4 s 5 s 6 s 8 s 9 s 10 s 12 s 13 s 14 state row s 3 s 7 s 11 s 15 Transposition is equivalent to processing the state array by rows, instead of processing it by columns! 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 11 / 23

New MixColumns The New MixColumns transformation is: y 0 = ({02} x 0 ) + ({03} x 1 ) + x 2 + x 3 y 1 = x 0 + ({02} x 1 ) + ({03} x 2 ) + x 3 y 2 = x 0 + x 1 + ({02} x 2 ) + ({03} x 3 ) y 3 = ({03} x 0 ) + x 1 + x 2 + ({02} x 3 ) The symbols x i and y i (0 i 3) indicate the 32-bit rows of the state array before and after New MixColumns, respectively. The 32-bit word x i accommodates 4 bytes coming from 4 different columns (and similarly for y i ). The operation {02} x i or doublings consists of 4 multiplications in GF(2 8 ) of each byte of the 32bits word. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 12 / 23

New MixColumns The transformation can be executed in three steps. It can be conceived as a sort of double and add algorithm. 5HPDLQGHU 02 03 03 02 03 02 03 02 y 0 = x 1 + x 2 + x 3 y 1 = x 0 + x 2 + x 3 y 2 = x 0 + x 1 + x 3 y 3 = x 0 + x 1 + x 2 x 0 = {02} x 0 x 1 = {02} x 1 x 2 = {02} x 2 x 3 = {02} x 3 y 0 += x 0 + x 1 y 1 += x 1 + x 2 y 2 += x 2 + x 3 y 3 += x 3 + x 0 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 13 / 23

MixColumns Cost Comparison The standard implementation of MixColumns requires: 4 doublings, 16 XOR s and 12 rotations, and one intermediate variable The transposed version of MixColumns requires: 4 doublings, 16 XOR s and NO rotation, and NO intermediate variable. Software time performances should improve! 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 14 / 23

Decryption Decryption uses the InvMixColumns transformation inverse of MixColumns. Also InvMixColumns can be sped-up by the transposition of the state array. Transposition yields a higher speed-up for InvMixColumns than for MixColumns. This is due to the complex structure of the coefficient matrix of InvMixColumns. Mixcolumns coeff.s:, 02 and 03 (hex). InvMixColumns coeff.s: 09, 0b, 0d and 0e (hex). 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 15 / 23

Old InvMixColumns s s s s 0, c 1, c 2, c 3, c = 0e 09 0d 0b 0b 0e 09 0d 0d 0b The entries of the coefficient matrix of InvMixColumns contain a larger number of 1 s than those of MixColumns. Transposition exposes more parallelism and hence yields a significant speed-up. 0e 09 09 0d 0b 0e s s s s 0, c 1, c 2, c 3, c 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 16 / 23

New InvMixColumns s 0 s 4 s 8 s 12 s 0 s 4 s 8 s 12 [ 0e 0b 0d 09] s 1 s 2 s 5 s 6 s 9 s 10 s 13 s 14 Reminder: y 0 = x 1 + x 2 + x 3 s 3 s 7 s 11 s 15 0e hex = 1 1 1 0 b 0b hex = 1 0 1 1 b 0d hex = 1 1 0 1 b 09 hex = 1 0 0 1 b x 0 = {02} x 0 x 1 = {02} x 1 x 2 = {02} x 2 x 3 = {02} x 3 y 0 += x 0 + x 1 x 0 = {02} (x 0 + x 2 ) x 1 = {02} (x 1 + x 3 ) y 0 += x 0 x 0 = {02} (x 0 + x 1 ) y 0 += x 0 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 17 / 23

InvMixColumns Cost Comparison The standard algorithm requires: 12 doublings, 32 XOR s and 12 rotations, and 4 intermediate variables. The transposed algorithm requires only: 7 doublings, 27 XOR s and NO rotation, and NO intermediate variable. Software time performances should improve! But time performances should improve in hardware as well! 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 18 / 23

Time Performances The time performances of the proposed algorithm have been tested on some 32-bit CPU s: ARM 7 TDMI and ARM 9 TDMI, typical microcontrollers ST 22, a CPU designed for smart card (by STM) and PENTIUM III, a general purpose CPU The time performances are computed in CPU cycles, and are compared with those of Gladman s C code. Where Gladman is better, it is due to the time overhead required to transpose input and output data, to remain compliant with the standard. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 19 / 23

Results (ARM) CPU Version Key Schedule Encryption Decryption ARM 7 TDMI Transposed Gladman 634 449 1675 1641 2074 2763 ARM 9 TDMI Transposed Gladman 499 333 1384 1374 1764 2439 Simulations have been executed by means of the ARM Development Suite ADS 1.1. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 20 / 23

Results (ST 22 and P III) CPU Version Key Schedule Encryption Decryption ST 22 Transposed Gladman 0.22 0.13 0.51 0.61 0.60 1 Transposed 370 1119 1395 P III Gladman 396 1404 2152 Gladman (look-up tab.) 202 / 306 (enc.) / (dec.) 362 381 ST 22 figures are normalized with respect to Gladman decryption. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 21 / 23

Comparisons with Gladman CPU Key Schedule Encryption Decryption ARM 7 41.20 % 2.07 % -24.94 % ARM 9 49.85 % 0.73 % -27.68 % ST 22 69.23 % -16.39 % -40.00 % P III -6.57 % -20.30 % -35.18 % The comparison is performed setting to 100 % the time performances of Gladman s implementation for the corresponding function. In red the cases where the transposed version has higher performances. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 22 / 23

Conclusions and Further Developments Conclusions: Study and optimization of AES. Some interesting time performance improvements in software. Part of this work is under patenting process. Further Developments: Hardware implementations. 08/09/2002 CHES 2002 Workshop Redwood City (SF Bay), CA, USA pp. 23 / 23

???? Any Question????