A Predictive Model for Cache-Based Side Channels in Multicore and Multithreaded Microprocessors



Similar documents
OC By Arsene Fansi T. POLIMI

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Enhancing McAfee Endpoint Encryption * Software With Intel AES-NI Hardware- Based Acceleration

Implementation of Full -Parallelism AES Encryption and Decryption

CSCE 465 Computer & Network Security

Breakthrough AES Performance with. Intel AES New Instructions

Instruction Set Architecture (ISA)

AES Power Attack Based on Induced Cache Miss and Countermeasure

Speeding up GPU-based password cracking

N-Variant Systems. Slides extracted from talk by David Evans. (provenance in footer)

Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications

Are Cache Attacks on Public Clouds Practical?

Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Energy-Efficient, High-Performance Heterogeneous Core Design

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Introduction to Cloud Computing

Operating Systems, 6 th ed. Test Bank Chapter 7

Using Power to Improve C Programming Education

OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect

Multithreading Lin Gao cs9244 report, 2006

How To Encrypt With A 64 Bit Block Cipher

Chapter 2 Parallel Computer Architecture

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

CPSC 467b: Cryptography and Computer Security

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Outline. Cache Parameters. Lecture 5 Cache Operation

TPCalc : a throughput calculator for computer architecture studies

Tackling The Challenges of Big Data. Tackling The Challenges of Big Data Big Data Systems. Security is a Negative Goal. Nickolai Zeldovich

ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions

GPU Hardware Performance. Fall 2015

Side Channel Analysis and Embedded Systems Impact and Countermeasures

Software implementation of Post-Quantum Cryptography

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

Multi-Threading Performance on Commodity Multi-Core Processors

The Impact of Cryptography on Platform Security

18-548/ Associativity 9/16/98. 7 Associativity / Memory System Architecture Philip Koopman September 16, 1998

Performance Impacts of Non-blocking Caches in Out-of-order Processors

USB Portable Storage Device: Security Problem Definition Summary

ADVANCED COMPUTER ARCHITECTURE

Chair for Network Architectures and Services Department of Informatics TU München Prof. Carle. Network Security. Chapter 13

IJESRT. [Padama, 2(5): May, 2013] ISSN:

Putting it all together: Intel Nehalem.

Horst Görtz Institute for IT-Security

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Chapter 11 Security+ Guide to Network Security Fundamentals, Third Edition Basic Cryptography

Parallel Computing 37 (2011) Contents lists available at ScienceDirect. Parallel Computing. journal homepage:

AVR1318: Using the XMEGA built-in AES accelerator. 8-bit Microcontrollers. Application Note. Features. 1 Introduction

AES1. Ultra-Compact Advanced Encryption Standard Core. General Description. Base Core Features. Symbol. Applications

Lecture 6 - Cryptography

Microsemi Security Center of Excellence

The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa

CHAPTER 1 INTRODUCTION

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Virtualization. Clothing the Wolf in Wool. Wednesday, April 17, 13

Compter Networks Chapter 9: Network Security

Configuring CoreNet Platform Cache (CPC) as SRAM For Use by Linux Applications

Cryptography & Network Security. Introduction. Chester Rebeiro IIT Madras

AN IMPLEMENTATION OF HYBRID ENCRYPTION-DECRYPTION (RSA WITH AES AND SHA256) FOR USE IN DATA EXCHANGE BETWEEN CLIENT APPLICATIONS AND WEB SERVICES

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

COS 318: Operating Systems. Virtual Machine Monitors

IT Networks & Security CERT Luncheon Series: Cryptography

LSN 2 Computer Processors

1 Construction of CCA-secure encryption

Michael Seltzer COMP 116: Security Final Paper. Client Side Encryption in the Web Browser Mentor: Ming Chow

Secure Sockets Layer (SSL ) / Transport Layer Security (TLS) Network Security Products S31213

Embedded Parallel Computing

Precise and Accurate Processor Simulation

361 Computer Architecture Lecture 14: Cache Memory

FPGA-based Multithreading for In-Memory Hash Joins

Hardware Implementation of AES Encryption and Decryption System Based on FPGA

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Performance Guideline for syslog-ng Premium Edition 5 LTS

Evaluating The Performance of Symmetric Encryption Algorithms

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Intel Pentium 4 Processor on 90nm Technology

Understanding Hardware Transactional Memory

Elliptic Curve Cryptography Methods Debbie Roser Math\CS 4890

IoT Security Concerns and Renesas Synergy Solutions

Chapter 1 Computer System Overview

Identity-based Encryption with Post-Challenge Auxiliary Inputs for Secure Cloud Applications and Sensor Networks

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Dr. Jinyuan (Stella) Sun Dept. of Electrical Engineering and Computer Science University of Tennessee Fall 2010

Network Security Technology Network Management

Performance measurements of syslog-ng Premium Edition 4 F1

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

SeChat: An AES Encrypted Chat

Transcription:

A Predictive Model for Cache-Based Side Channels in Multicore and Multithreaded Microprocessors Leonid Domnitser, Nael Abu-Ghazaleh and Dmitry Ponomarev Department of Computer Science SUNY-Binghamton {lenny, nael, dima}@cs.binghamton.edu

Multi-cores-->Many-cores Moore's law coming to an end Power wall; ILP wall; memory wall End of lazy-boy programming era Multi-cores offer a way out New Moore's law: 2x number of cores every 1.5 years New security vulnerabilities arise due to resource sharing Side-Channel Attacks and Denial of Service Attacks

Last-level Cache Sharing on Multicores (Intel Xeon) 2 quad-core Intel Xeon e5345 (Clovertown)

L1 Cache Sharing in SMT Processor Instruction Cache Issue Queue Register File PC PC PC Execution Units Fetch Unit Decode PC PC PC PC Register Rename Load/Store Queues LDST PC Units Data Cache Re-order Buffers Arch State Private Resources Shared Resources

Advanced Encryption Standard (AES) One of the most popular algorithms in symmetric key cryptography 16-byte input (plaintext) 16-byte output (ciphertext) 16-byte secret key (for standard 128-bit encryption) several rounds of 16 XOR operations and 16 table lookups secret key byte Input byte index Lookup Table

Set-Associative Caches Address 31 30 12 11 10 9 8 3 2 1 0 22 8 Index V Tag Data V Tag Data V Tag Data V Tag Data 0 1 2 253 254 255 22 32 4-to-1 multiplexor Hit Data

Attack Example Cache a b c d Main Memory Attacker s data AES data b>(a c d) Can exploit knowledge of the cache replacement policy to optimize attack 8

Cache-Based Side Channel Attacks An attacker and a victim process (e.g. AES) run together using a shared cache Access-Driven Attack: Attacker occupies the cache, evicting victim s data When victim accesses cache, attacker s data is evicted By timing its accesses, attacker can detect intervening accesses by the victim Time-Driven Attack Attacker fills the cache Times victim s execution for various inputs Performs correlation analysis

Simple Attack Code Example #define ASSOC 8 #define NSETS 128 #define LINESIZE 32 #define ARRAYSIZE (ASSOC*NSETS*LINESIZE/sizeof(int)) static int the_array[arraysize] int fine_grain_timer(); //implemented as inline assembler void time_cache() { register int i, time, x; for(i = 0; i < ARRAYSIZE; i++) { time = fine_grain_timer(); x = the_array[i]; time = fine_grain_timer() - time; the_array[i] = time; } }

Existing Solutions Avoid using pre-computed tables too slow Close the channel: Lock critical data in the cache (Lee, ISCA 07) Impacts performance Randomize the victim selection (Lee, MICRO 08) Both are highly complex, impact performance and require OS/ISA support Constrain the channel: NoMo: dynamic partitioning of the cache that sets some ways of the cache to be exclusive to each thread Still leaks a small amount of accesses

Exposure rate Nael Abu-Ghazaleh SAB2003 Aggregate Exposure of Critical Data 1.0 Aggregate Critical Exposure 0.8 0.6 0.4 AES encrypt AES decrypt BF encrypt BF decrypt 0.2 0.0 NoMo-0 NoMo-1 NoMo-2 NoMo-3 NoMo-4

Contribution and Motivation Predictive mathematical model for access leakage for access based side channel attacks Why? Vulnerability is a function of information leaked through the side-channel Estimate how vulnerable current algorithms and caches to attack Estimate how effective imperfect solutions that constrain rather than close the side-channel are

Model Parameters

Leakage Prediction Model P(D C) is the probability that the cache access is detected by the attacker given that this access is critical P(D) is the probability that the access is detected P(C) is the probability that the access is critical Our model computes P(C D) as follows: P(D C) = [P(C D)*P(D)] / P(C)

Estimating P(C) Leakage Prediction Model Average fraction of critical accesses constant for a given program. Can be estimated through static analysis or profiling Estimating P(D) Number of detected accesses out of the total number of accesses 100% for the perfect attacker. Less in reality as some accesses are hidden by cache hits. Estimating P(C D) Need to filter out the noise due to non-critical accesses

Estimating P(D)

Estimating P(C D)

Estimating P(D C)

Aggregate Leakage Predicted by Model

Predicted Leakge Per Set, Blowfish 8-way set associative cache

Predicted leakage Per Set, AES 8-way set associative cache

Predicted leakage Per Set, Blowfish, AES, Direct Mapped Cache

Model Validation Methodology We used M-Sim-3.0 cycle accurate simulator (multithreaded and Multicores derivative of Simplescalar) developed at SUNY Binghamton http://www.cs.binghamton.edu/~msim Evaluated leakage for AES and Blowfish encryption/decryption Ran security benchmarks for blocks of randomly generated input Implemented the attacker as a separate thread and ran it alongside the crypto processes Assumed that the attacker is able to synchronize at the block encryption boundaries (i.e. It fills the cache after each block encryption and checks the cache after the encryption).

Model Validation: Per Block Leakage in AES

Model Validation: Blowfish

Model Validation: Direct-Mapped Cache

Conclusions and Future Work We developed a model for information leakage for access based cache side channel attacks An important attack with the emergence of multi-core and multithreaded architectures Model can capture overall accesses in an application or any subset of them (e.g., specific sets) The model was validated against a cycle accurate simulator Future work: model the impact of defenses that reduce the leakage Future work: translate leakage pattern into a measure of difficulty of breaking the key

Thank you! Questions?

Спасибо большое какие-нибудь вопросы? 30

Example of Partial Side Channel Closure: NoMo Caches Key idea: An application sharing cache cannot use all lines in a set NoMo invariant: For an N-way cache, a thread can use at most N Y lines Y NoMo degree Essentially, we reserve Y cache ways for each co-executing application and dynamically share the rest If Y=N/2, we have static non-overlapping cache partitioning Implementation is very simple just need to check the reservation bits at the time of replacement

NoMo Replacement Logic

NoMo example for an 8-way cache NoMo Entry Reserved Shared Initial More Thread (Yellow cache way 2 enters = usage T1, Blue = T2) F:1 H:1 R:2 C:1 Q:2 K:1 A:1 A:1 P:1 G:1 B:1 B:1 N:1 J:1 N:1 D:1 M:1 L:1 T:2 S:2 I:1 U:2 O:1 E:1 Showing 4 lines of an 8-way cache with NoMo-2 X:N means data X from thread N

Why Does NoMo Work? Victim s accesses become visible to attacker only if the victim has accesses outside of its allocated partition between two cache fills by the attacker. In this example: NoMo-1

Evaluation Methodology We used M-Sim-3.0 cycle accurate simulator (multithreaded and Multicores derivative of Simplescalar) developed at SUNY Binghamton http://www.cs.binghamton.edu/~msim Evaluated security for AES and Blowfish encryption/decryption Ran security benchmarks for 3M blocks of randomly generated input Implemented the attacker as a separate thread and ran it alongside the crypto processes Assumed that the attacker is able to synchronize at the block encryption boundaries (i.e. It fills the cache after each block encryption and checks the cache after the encryption) Evaluated performance on a set of SPEC 2006 Benchmarks. Used Pin-based trace-driven simulator with Pintools.

Sets with Critical Exposure AES enc. AES dec. BF enc. BF dec. NoMo-0 128 128 128 128 NoMo-1 128 128 128 128 NoMo-2 10 14 22 22 NoMo-3 0 0 1 1 NoMo-4 0 0 0 0

Impact on IPC Throughput (105 2-threaded SPEC 2006 workloads simulated)

Normalized Fairness Nael Abu-Ghazaleh SAB2003 Impact on Fair Throughput (105 2-threaded SPEC 2006 workloads simulated) 1.01 1.00 0.99 0.98 NoMo-1 NoMo-2 NoMo-3 NoMo-4 0.97 Benchmark Mixes

NoMo Design Summary Practical and low-overhead hardware-only design for defeating access-driven cache-based side channel attacks Can easily adjust security-performance trade-offs by manipulating degree of NoMo Can support unrestricted cache usage in singlethreaded mode Performance impact is very low in all cases No OS or ISA support required

NoMo Results Summary (for an 8-way L1 cache) NoMo-4 (static partitioning): complete application isolation with 1.2% average (5% max) performance and fairness impact on SPEC 2006 benchmarks NoMo-3: No side channel for AES, and 0.07% critical leakage for Blowfish. 0.8% average(4% max) performance impact on SPEC 2006 benchmarks NoMo-2: Leaks 0.6% of critical accesses for AES and 1.6% for Blowfish. 0.5% average (3% max) performance impact on SPEC 2006 benchmarks NoMo-1: Leaks 15% of critical accesses for AES and 18% for Blowfish. 0.3% average (2% max) performance impact on SPEC 2006 benchmarks