Comparative Performance Review of SHA-3 Candidates

Similar documents

Comparison of seven SHA-3 candidates software implementations on smart cards.

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉志尉 National Chiao Tung University

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Industry First X86-based Single Board Computer JaguarBoard Released

GenICam 3.0 Faster, Smaller, 3D

Instruction Set Design

Week 1 out-of-class notes, discussions and sample problems

CISC, RISC, and DSP Microprocessors

Benchmarking Large Scale Cloud Computing in Asia Pacific

on an system with an infinite number of processors. Calculate the speedup of

12. Introduction to Virtual Machines

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Computer Architectures

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

SOC architecture and design

How Java Software Solutions Outperform Hardware Accelerators

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

A client side persistent block cache for the data center. Vault Boston Luis Pabón - Red Hat

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Instruction Set Architecture (ISA)

IA-64 Application Developer s Architecture Guide

How To Compare Two Servers For A Test On A Poweredge R710 And Poweredge G5P (Poweredge) (Power Edge) (Dell) Poweredge Poweredge And Powerpowerpoweredge (Powerpower) G5I (

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan

Running Windows on a Mac. Why?

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Lua as a business logic language in high load application. Ilya Martynov ilya@iponweb.net CTO at IPONWEB

How To Write Portable Programs In C

Introducción. Diseño de sistemas digitales.1

Embedded Linux RADAR device

Instruction Set Architecture (ISA) Design. Classification Categories

Driving force. What future software needs. Potential research topics

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

picojava TM : A Hardware Implementation of the Java Virtual Machine

1 Performance Comparison of SHA-3 Finalists

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Control period can have high frequency (short control frame) ex: 0.1kHz - 1kHz in robot controllers

Generations of the computer. processors.

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

System Requirements Table of contents

LSN 2 Computer Processors

AC : A PROCESSOR DESIGN PROJECT FOR A FIRST COURSE IN COMPUTER ORGANIZATION

Muse Server Sizing. 18 June Document Version Muse

The Central Processing Unit:

Code Sharing using C++ between Desktop Applications and Real-time Embedded Platforms

Embedded Operating Systems in a Point of Sale Environment. White Paper

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

System Considerations

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

All Tech Notes and KBCD documents and software are provided "as is" without warranty of any kind. See the Terms of Use for more information.

Optimizing Code for Accelerators: The Long Road to High Performance

Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x

Board Notes on Virtual Memory

Computer Architecture. Secure communication and encryption.

The Motherboard Chapter #5

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert

Intel 8086 architecture

İSTANBUL AYDIN UNIVERSITY

The Stream Cipher HC-128

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

Maritime HMI - S Line.

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Virtuoso and Database Scalability

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Mobile Processors: Future Trends

Central Processing Unit (CPU)

Summer Internship 2013 Group No.4-Enhancement of JMeter Week 1-Report-1 27/5/2013 Naman Choudhary

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

1/20/2016 INTRODUCTION

Solido Spam Filter Technology

Chapter 5 Instructor's Manual

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

11.1 inspectit inspectit

Rethinking SIMD Vectorization for In-Memory Databases

Chapter 2 - Computer Organization

Hardware Configuration Guide

Chapter 1 Computer System Overview

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

MAGENTO HOSTING Progressive Server Performance Improvements

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?

Chapter 2 Logic Gates and Introduction to Computer Architecture

Apache Derby Performance. Olav Sandstå, Dyre Tjeldvoll, Knut Anders Hatlen Database Technology Group Sun Microsystems

Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs.

Lattice QCD Performance. on Multi core Linux Servers

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

Transcription:

Comparative Performance Review of the SHA-3 Second-Round Candidates Cryptolog International Second SHA-3 Candidate Conference

Outline sphlib

sphlib sphlib is an open-source implementation of many hash functions: includes and the 14 SHA-3 second-round candidates (and many older hash functions) optimized, portable C code for large systems optimized, portable C code for small embedded systems optimized Java code all functions were implemented by the same developer with similar optimization techniques and efforts

sphlib: why? sphlib was initiated in 2007 with the following goals: to benchmark hash functions on many platforms, with an emphasis on embedded systems; to make a fair comparison between hash functions; to explore performance of of hash functions. On PC systems, sphlib implementations are not the fastest available, but measures on PC can be used to validate optimization strategies, and to explore some architecture effects.

Comparison with ebash ebash relies on externally provided implementations: 1. ebash publishes results on some platforms; 2. implementers try to beat current records and submit new code; 3. loop to 1. This development process appears to be slow on non-pc platforms, and has not yet created enough momentum on embedded systems. In the meantime, sphlib provides measures. Also, sphlib has Java code.

which we consider here have the following characteristics: 32-bit registers raw RISC with no extra unit (no SSE2, often no FPU either): optimal for portable C programming small L1 cache RAM for instruction (no more than 16 kb) low operating frequency (less than 200 MHz) are specialized and CPU-starved.

constraints cannot fully unroll all loops (small L1 cache): implies extra indirections 64-bit operations are slow no superscalar execution: parallelism is expensive some do not have 32-bit rotations However, embedded systems often have plenty of registers. Endianness appears to be mostly irrelevant to performance.

Java constraints Virtual Machines (in particular Java) offer portability, ease of development and distribution, and safety against buggy or hostile code. no special operations, only generic arithmetic code code cannot be adjusted for register size the JIT compiler produces fat code severe L1 cache issues table accesses are more expensive the JIT compiler must work fast

x86-64: Intel Q6600, 64-bit, 2.4 GHz (MBytes/s) 0 50 100 150 200 250 300 350 256-bit output 0 50 100 150 200 250 300 350 400 450 500 512-bit output

i386: Intel Q6600, 32-bit, 2.4 GHz (MBytes/s) 0 50 100 150 200 250 300 256-bit output 0 50 100 150 200 250 300 512-bit output

G3: PowerPC 750, 32-bit, 300 MHz (MBytes/s) 0 5 10 15 20 25 30 256-bit output 0 5 10 15 20 25 30 512-bit output

MIPS: BCM3302, 32-bit, 200 MHz (MBytes/s) 0 1 2 3 4 5 6 7 8 256-bit output 0 1 2 3 4 5 6 7 8 512-bit output

ARMv4: ARM920T, 32-bit, 75 MHz (MBytes/s) 0 0.5 1 1.5 2 2.5 3 3.5 256-bit output 0 0.5 1 1.5 2 2.5 3 3.5 512-bit output

Java: Intel Q6600, 64-bit, 2.4 GHz (MBytes/s) 0 20 40 60 80 100 120 140 160 180 256-bit output 0 20 40 60 80 100 120 140 160 180 200 512-bit output

Java: Intel Q6600, 32-bit, 2.4 GHz (MBytes/s) 0 20 40 60 80 100 120 140 160 256-bit output 0 20 40 60 80 100 120 140 160 512-bit output

Hash function performance is important on embedded systems, much less so on large systems. Most SHA-3 second round candidates (and ) appear to be optimized for large systems and/or 64-bit platforms., and are consistent (performance does not depend on output size). is mostly consistent. was designed to run well on embedded systems.

Unexplored questions Code footprint: often crucial in embedded systems. sphlib limits itself to the L1 cache size (8 kb on MIPS) but does not reduce code further. Short messages: sphlib optimizes code for long messages, with buffers. Meaningful benchmarks are harder for short messages. Script languages (e.g. Javascript). http://www.saphir2.com/sphlib/