Speeding up GPU-based password cracking

Similar documents
GPU-based Password Cracking

Fast Implementations of AES on Various Platforms

Speeding Up RSA Encryption Using GPU Parallelization

Fast Software AES Encryption

ANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

GPUs for Scientific Computing

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Protecting against modern password cracking

Introduction to GPU hardware and to CUDA

Password Manager with 3-Step Authentication System

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Introduction to GPU Programming Languages

ultra fast SOM using CUDA

Password Cracking in the Cloud

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Next Generation GPU Architecture Code-named Fermi

NVIDIA GeForce GTX 580 GPU Datasheet

Project: Simulated Encrypted File System (SEFS)

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

FIPS Non- Proprietary Security Policy. McAfee SIEM Cryptographic Module, Version 1.0

The State of Modern Password Cracking

GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Guided Performance Analysis with the NVIDIA Visual Profiler

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

High-Performance Modular Multiplication on the Cell Processor

Password Cracking Beyond Brute-Force

Common Pitfalls in Cryptography for Software Developers. OWASP AppSec Israel July The OWASP Foundation

Secure Network Communications FIPS Non Proprietary Security Policy

A Predictive Model for Cache-Based Side Channels in Multicore and Multithreaded Microprocessors

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

GPGPU Computing. Yong Cao

Key & Data Storage on Mobile Devices

CUDA programming on NVIDIA GPUs

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Side Channel Analysis and Embedded Systems Impact and Countermeasures

Optimizing Parallel Reduction in CUDA. Mark Harris NVIDIA Developer Technology

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Karsten Nohl, Breaking GSM phone privacy

Lecture 4 Data Encryption Standard (DES)

Lecture Objectives. Lecture 8 Mobile Networks: Security in Wireless LANs and Mobile Networks. Agenda. References

HPC with Multicore and GPUs

Encrypting Network Traffic

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Chapter 11 Security+ Guide to Network Security Fundamentals, Third Edition Basic Cryptography

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

IoT Security Concerns and Renesas Synergy Solutions

Lecture Note 8 ATTACKS ON CRYPTOSYSTEMS I. Sourav Mukhopadhyay

IJESRT. [Padama, 2(5): May, 2013] ISSN:

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

OpenCL Programming for the CUDA Architecture. Version 2.3

12/3/08. Security in Wireless LANs and Mobile Networks. Wireless Magnifies Exposure Vulnerability. Mobility Makes it Difficult to Establish Trust

CFD Implementation with In-Socket FPGA Accelerators

L20: GPU Architecture and Models

Windows passwords security

INTRODUCTION ADVANTAGES OF RUNNING ORACLE 11G ON WINDOWS. Edward Whalen, Performance Tuning Corporation

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

CHAPTER 1 INTRODUCTION

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Rainbow Cracking: Do you need to fear the Rainbow? Philippe Oechslin, Objectif Sécurité. OS Objectif Sécurité SA, Gland,

GPU for Scientific Computing. -Ali Saleh

Network Security. Security Attacks. Normal flow: Interruption: 孫 宏 民 Phone: 國 立 清 華 大 學 資 訊 工 程 系 資 訊 安 全 實 驗 室

Introduction to GPU Computing

Introduction to GPU Architecture

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

ST810 Advanced Computing

QF01/ الخطة الدراسية كلية العلوم وتكنولوجيا المعلومات- برنامج الماجستير/ الوصف المختصر

gpus1 Ubuntu Available via ssh

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Network Security. Computer Networking Lecture 08. March 19, HKU SPACE Community College. HKU SPACE CC CN Lecture 08 1/23

Usable Crypto: Introducing minilock. Nadim Kobeissi HOPE X, NYC, 2014

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

Planning Your Installation or Upgrade

UM0586 User manual. STM32 Cryptographic Library. Introduction

Cut Network Security Cost in Half Using the Intel EP80579 Integrated Processor for entry-to mid-level VPN

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Key Hopping A Security Enhancement Scheme for IEEE WEP Standards

Turbomachinery CFD on many-core platforms experiences and strategies

on an system with an infinite number of processors. Calculate the speedup of

Security (WEP, WPA\WPA2) 19/05/2009. Giulio Rossetti Unipi

Cryptography and Key Management Basics

Textbooks: Matt Bishop, Introduction to Computer Security, Addison-Wesley, November 5, 2004, ISBN

Cryptography Lecture 8. Digital signatures, hash functions

sndio OpenBSD audio & MIDI framework for music and desktop applications

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

Design and Verification of Area-Optimized AES Based on FPGA Using Verilog HDL

Transcription:

Speeding up GPU-based password cracking SHARCS 2012 Martijn Sprengers 1,2 Lejla Batina 2,3 Sprengers.Martijn@kpmg.nl KPMG IT Advisory 1 Radboud University Nijmegen 2 K.U. Leuven 3 March 17-18, 2012

Who am I? Professional life Ethical hacker KPMG IT Advisory Education Master Computer Security at the Kerckhoffs Institute Expertise and experience Computer and network security Password cracking Social Engineering Spare time Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 2 / 28

Cracking password hashes with GPU s Goals Show how password hashing schemes can be efficiently implemented on GPU s Impact on current authentication mechanisms Pose relevant questions immediately but save discussions for the end Outline Background information on MD5-crypt and GPU Optimizations and speed-ups Results and improvements Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 3 / 28

Cracking password hashes with GPU s Goals Show how password hashing schemes can be efficiently implemented on GPU s Impact on current authentication mechanisms Pose relevant questions immediately but save discussions for the end Outline Background information on MD5-crypt and GPU Optimizations and speed-ups Results and improvements Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 3 / 28

Motivation Why password hashing schemes? Database leakage Disgruntled employee SQL injections Accessible storage SAM file (Windows) passwd file (Unix) Why exhaustive search? Humans and randomness Humans and memorability Limited keyspace enables exhaustive search Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 4 / 28

Motivation Why password hashing schemes? Database leakage Disgruntled employee SQL injections Accessible storage SAM file (Windows) passwd file (Unix) Why exhaustive search? Humans and randomness Humans and memorability Limited keyspace enables exhaustive search Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 4 / 28

Why exhaustive search? Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 5 / 28

Motivation Why MD5-crypt? Commonly used Default Unix scheme, Cisco routers, RIPE authentication Basis for other hashing schemes and frameworks SHA-crypt, bcrypt, PBKDF2 Why GPU? New API s support native arithmetic operations Designed for highly parallelized algorithms Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 6 / 28

Motivation Why MD5-crypt? Commonly used Default Unix scheme, Cisco routers, RIPE authentication Basis for other hashing schemes and frameworks SHA-crypt, bcrypt, PBKDF2 Why GPU? New API s support native arithmetic operations Designed for highly parallelized algorithms Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 6 / 28

Password hashing schemes Definition PHS : Z m 2 Z s 2 Z n 2 Properties Correct use of salts Prevents from time-memory trade-off attacks Slow calculation Key-stretching Avoid pipelined implementations Hashing k passwords with the same salt should cost k times more computation time than hashing a single password Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 7 / 28

Avoid pipelined implementations Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 8 / 28

MD5-crypt MD5-compression round MD5-crypt MD5-crypt( somesalt, password ) = $1$somesalt$W.KCTbPSiFDGffAGOjcBc. Key-stretching 1002 calls to MD5-compression function Concatenates password, salt and intermediate result pseudo randomly Source: Wikipedia Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 9 / 28

CUDA and memory model Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 10 / 28

Attacker model Assumptions Attacker model Plaintext password recovery Exhaustive search (ciphertext only) No time-memory trade-off Hardware One CUDA enabled GPU: NVIDIA GTX 295 480 thread processors 60 streaming multiprocessors Password generation Password length < 16 Performance measured in unique password checks per second Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 11 / 28

Our optimizations Our optimizations Memory Fast shared memory Algorithm wise Precompute intermediate results Execution configuration Block- and gridsizes Maximizing parallelization Password hashing is embarrassingly parallel Instructions Modulo arithmetic is expensive Control flow Branching is expensive Algorithm optimizations Password length < 16 One call to MD5compress() Password length << 16 Precompute intermediate results Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 12 / 28

Our optimizations Our optimizations Memory Fast shared memory Algorithm wise Precompute intermediate results Execution configuration Block- and gridsizes Maximizing parallelization Password hashing is embarrassingly parallel Instructions Modulo arithmetic is expensive Control flow Branching is expensive Algorithm optimizations Password length < 16 One call to MD5compress() Password length << 16 Precompute intermediate results Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 12 / 28

Our optimizations Our optimizations Memory Fast shared memory Algorithm wise Precompute intermediate results Execution configuration Block- and gridsizes Maximizing parallelization Password hashing is embarrassingly parallel Instructions Modulo arithmetic is expensive Control flow Branching is expensive Algorithm optimizations Password length < 16 One call to MD5compress() Password length << 16 Precompute intermediate results Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 12 / 28

Memory optimizations Constant memory Default: variables stored in local memory Physically resides in global memory (500 clock cycles latency) Cached on chip As fast as register access (1 clock cycle latency per warp) Shared memory User managed cache On chip (2 clock cycles latency per warp) Shared by all threads in a block Small (16384 Bytes per multiprocessor) Accessed via 16 banks Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 13 / 28

Memory and algorithm optimizations Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 14 / 28

Bank conflicts Problem int shared[threads_per_block][16]; int *buffer = shared[threadid]; Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 15 / 28

Bank conflicts Solution int shared[threads_per_block][16+1]; int *buffer = shared[threadid]+1; Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 16 / 28

Execution configuration optimizations Influence on our implementation Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 17 / 28

Comparison with CPU implementations Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 18 / 28

Comparison with other implementations Other implementations Work Cryptographic type Algorithm Speed up GPU over CPU Bernstein et al. [2, 1] Asymmetric ECC 4-5 Manavski et al. [5] Symmetric AES 5-20 Harrison et al. [3] Symmetric AES 4-10 Harrisonet al. [4] Asymmetric RSA 4 This work Hashing MD5-crypt 25-30 Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 19 / 28

Consequences for password safety Influence on password classes Length 26 characters 36 characters 62 characters 94 characters 4 0,5 Seconds 2 Seconds 16 Seconds 2 Minutes 5 13 Seconds 1 Minute 17 Minutes 2 Hours 6 5 Minutes 41 Minutes 18 Hours 10 Days 7 2 Hours 1 Days 46 Days 3 Years 8 2 Days 37 Days 8 Years 264 Years 9 71 Days 4 Years 488 Years 20647 Years 10 5 Years 132 Years 30243 Years 2480775 Years Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 20 / 28

Conclusions Should we worry? Yes, if your password length is < 9 characters Increase entropy in passwords password policy Advantage: old schemes still usable Disadvantage: humans and randomness Disadvantage: humans and memorability What is a good policy? Increase complexity by at least 4 orders of magnitude Advantage: MD5-crypt still usable Disadvantage: passwords not backwards compatible Disadvantage: Moore s law Switch to SHA-crypt or PBKDF2 Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 21 / 28

Conclusions Should we worry? Yes, if your password length is < 9 characters Increase entropy in passwords password policy Advantage: old schemes still usable Disadvantage: humans and randomness Disadvantage: humans and memorability What is a good policy? Increase complexity by at least 4 orders of magnitude Advantage: MD5-crypt still usable Disadvantage: passwords not backwards compatible Disadvantage: Moore s law Switch to SHA-crypt or PBKDF2 Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 21 / 28

Future work Future work Optimizations Additional algorithm optimizations Newer hardware Time-Memory Trade-Off Heterogenous crack clusters Consisting of a mix of GPU s, CPU s, mobile devices, etc. Large distributed environments Jungle computing or Amazon s EC2 OpenCL Other schemes and applications SHA-crypt, bcrypt, etc. Frameworks as PBKDF2 Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 22 / 28

Questions and discussion Thank you for your attention! Any questions? Contact: Sprengers.Martijn@kpmg.nl Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 23 / 28

References D. Bernstein, H. C. Chen, C. M. Cheng, T. Lange, R. Niederhagen, P. Schwabe, and B. Y. Yang. ECC2K-130 on NVIDIA GPUs. Progress in Cryptology-INDOCRYPT 2010, pages 328 346, 2010. D. J. Bernstein, H. C. Chen, M. S. Chen, C. M. Cheng, C. H. Hsiao, T. Lange, Z. C. Lin, and B. Y. Yang. The billion-mulmod-per-second PC. SHARCS Workshop, 2009. O. Harrison and J. Waldron. Practical symmetric key cryptography on modern graphics hardware. In Proceedings of the 17th conference on Security symposium, pages 195 209. USENIX Association, 2008. Owen Harrison and John Waldron. Efficient acceleration of asymmetric cryptography on graphics hardware. In Bart Preneel, editor, AFRICACRYPT, volume 5580 of Lecture Notes in Computer Science, pages 350 367. Springer, 2009. S. A. Manavski. CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In Signal Processing and Communications, 2007. ICSPC 2007. IEEE International Conference on, pages 65 68. IEEE, 2008. Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 24 / 28

Execution configuration optimizations Occupancy Occupancy = Active warps per multiprocessor W α Maximum active warps per multiprocessor W max W α restricted by register and shared memory usage W max restricted by hardware (32 in our case) Programmer can influence W α by setting the number of threads per block T b correctly Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 25 / 28

Execution configuration optimizations Theoretical calculation Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 26 / 28

Execution configuration optimizations Influence on our implementation Martijn Sprengers, Lejla Batina March 17-18, 2012 Speeding up GPU-based password cracking 27 / 28

Password Salt MD5Compress(password salt password) result MD5Compress(password $1$ salt result) result Init(buffer) n=0 buffer = password result If(n%3) result buffer = buffer salt else MD5-crypt If(n%7) else buffer = buffer password If(n<1000) MD5Compress(buffer) n++ If(n==1000) result result