Jeremi M Gosney Founder & CEO, Stricture Consulting Group

Similar documents
MOSIX: High performance Linux farm

Building Clusters for Gromacs and other HPC applications

Password Cracking Beyond Brute-Force

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

Using Linux Clusters as VoD Servers

Using Linux Clusters as VoD Servers

Lecture 1: the anatomy of a supercomputer

ECLIPSE Performance Benchmarks and Profiling. January 2009

Analysis of FileVault 2: Apple's full disk encryption. Omar Choudary Felix Grobert Joachim Metz

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

- An Essential Building Block for Stable and Reliable Compute Clusters

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

365 Cloud Storage. Security Brief

Simplest Scalable Architecture

Virtualization and Performance NSRC

LS DYNA Performance Benchmarks and Profiling. January 2009

Enterprise-Class Virtualization with Open Source Technologies

Embedded Display Module EDM6070

Petascale Software Challenges. Piyush Chaudhary High Performance Computing

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003

SALSA Flash-Optimized Software-Defined Storage

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Ground up Introduction to In-Memory Data (Grids)

Parallel Programming Survey

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

Cluster Implementation and Management; Scheduling

Scaling out a SharePoint Farm and Configuring Network Load Balancing on the Web Servers. Steve Smith Combined Knowledge MVP SharePoint Server

Optimizing computation of Hash-Algorithms as an attacker. Version

farmerswife Contents Hourline Display Lists 1.1 Server Application 1.2 Client Application farmerswife.com

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

Solution Brief July All-Flash Server-Side Storage for Oracle Real Application Clusters (RAC) on Oracle Linux

CSE-E5430 Scalable Cloud Computing Lecture 2

Microsoft Exchange Server 2003 Deployment Considerations

AMD CodeXL 1.7 GA Release Notes

Mellanox Academy Online Training (E-learning)

WHITE PAPER. ClusterWorX 2.1 from Linux NetworX. Cluster Management Solution C ONTENTS INTRODUCTION

High Performance Computing in CST STUDIO SUITE

AMD PhenomII. Architecture for Multimedia System Prof. Cristina Silvano. Group Member: Nazanin Vahabi Kosar Tayebani

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

PCI Technology Overview

PCI Express and Storage. Ron Emerick, Sun Microsystems

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Business white paper. HP Process Automation. Version 7.0. Server performance

Accelerating From Cluster to Cloud: Overview of RDMA on Windows HPC. Wenhao Wu Program Manager Windows HPC team

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Distributed Operating Systems. Cluster Systems

Intel DPDK Boosts Server Appliance Performance White Paper

DB2 Connect for NT and the Microsoft Windows NT Load Balancing Service

Cloud-Based dwaf A Real World Deployment Case Study. OWASP 5. April The OWASP Foundation

Enabling Technologies for Distributed Computing

Scalable Cluster Computing with MOSIX for LINUX

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

Mark Bennett. Search and the Virtual Machine

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Boundless Security Systems, Inc.

InfiniBand in the Enterprise Data Center

Hyper-V over SMB Remote File Storage support in Windows Server 8 Hyper-V. Jose Barreto Principal Program Manager Microsoft Corporation

BHyVe. BSD Hypervisor. Neel Natu Peter Grehan

The future is in the management tools. Profoss 22/01/2008

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Android Virtualization from Sierraware. Simply Secure

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

The Monitis Monitoring Agent ver. 1.2

Cloud Computing through Virtualization and HPC technologies

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Several tips on how to choose a suitable computer

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Parallel Firewalls on General-Purpose Graphics Processing Units

TEGRA LINUX DRIVER PACKAGE R21.1

Lecture 2 Cloud Computing & Virtualization. Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

SAMBA AND SMB3: ARE WE THERE YET? Ira Cooper Principal Software Engineer Red Hat Samba Team

Load Balancing in Beowulf Clusters

Example of Standard API

Xen and the Art of. Virtualization. Ian Pratt

SCSI support on Xen. MATSUMOTO Hitoshi Fujitsu Ltd.

System Management Interrupt Free Hardware

Lua as a business logic language in high load application. Ilya Martynov ilya@iponweb.net CTO at IPONWEB

Transcription:

Jeremi M Gosney Founder & CEO, Stricture Consulting Group Passwords^12 Security Conference December 3, 2012

The Problem Password crackers need more power! Yes, really. A GPU is great! More GPUs are better. How many GPUs are enough?

Potential Solutions Build up $$ Motherboard, BIOS, Driver limitations Build out Distributing load can be tricky

Enter Virtual OpenCL (VCL) Makes remote GPUs appear as if they were local Implements entire OpenCL 1.1 Standard Created by Amnon Barak and Amnon Shiloh, Hebrew University Distributed by MOSIX (www.mosix.org)

VCL Pros Free (gratis) Supports any and all OpenCL devices Works with any unmodified* OpenCL app Eliminates the complexity of distributing load Makes clustering ridiculously easy

VCL Cons Closed source Closed license 64-bit Linux only (Not sure this is really a con) Need highspeed LAN No Internet clustering

How VCL Works Provides a pool of shared devices Completely transparent to app & user Two-Layer model Broker node Executes kernels on compute nodes. Compute nodes Compute.

The Broker Node This is the front-end Has your software stack installed Only needs the VCL library and broker daemon Does not need any OpenCL devices Does not need any drivers

Compute Nodes This is the back-end No software to install outside of VCL back-end daemon Needs OpenCL devices Needs proprietary drivers & OpenCL runtime

VCL Architecture CPU Process Kernels VCL Library Broker Back-end daemon Back-end daemon OpenCL Devices Broker node Compute node Diagrams Courtesy of Amnon Barak

Alternate Architecture CPU Process Kernels VCL Library Broker Back-end daemon Back-end daemon OpenCL devices OpenCL Devices Compute node Compute node Diagrams Courtesy of Amnon Barak

VCL Workflow Application Kernel 1 running Kernel 2 running File System Kernel N running Broker node Compute node Diagrams Courtesy of Amnon Barak

SuperCL Network optimization library for VCL Single call for multiple executions Direct file I/O to/from OpenCL memory object Async data transfer with broker

SuperCL Workflow Application Input data from file-system Kernel 1 running Kernel 2 running Kernel N running Output data to file-system File System Broker node Compute node Diagrams Courtesy of Amnon Barak

VCL + Hashcat We can use VCL for password cracking! You knew this part was coming. Jens Steube added VCL support for up to 128 AMD GPUs in oclhashcat-plus v0.09 MOSIX were more than happy to help debug and resolve issues

VCL + Hashcat Considerations Bandwidth Brute force / mask attacks need very little bandwidth Wordlist attacks need a lot Latency Slow hashes can tolerate some latency Fast hashes cannot No SuperCL support in Hashcat Memory Broker node needs a lot of it Higher the n value, the more you need

Our Cluster Five 4U servers 25 AMD Radeon GPUs 10x HD 7970 4x HD 5970 (dual GPU) 3x HD 6990 (dual GPU) 1x HD 5870 4x SDR Infiniband interconnect 7kW of electricity Broker daemon runs on a cluster node

epixoip@token:~/oclhashcat-lite-0.11$ LD_LIBRARY_PATH=/usr/lib/vcl vclrun \./vclhashcat-lite64.bin -b --benchmark-mode 1 force oclhashcat-lite v0.11 by atom starting... Password lengths: 1-54 Watchdog: Temperature abort trigger disabled Watchdog: Temperature retain trigger disabled Device #1: Tahiti, 2048MB, 1100Mhz, 32MCU Device #2: Tahiti, 2048MB, 1100Mhz, 32MCU Device #3: Cypress, 512MB, 830Mhz, 20MCU Device #4: Tahiti, 2048MB, 1100Mhz, 32MCU Device #5: Cypress, 512MB, 830Mhz, 20MCU Device #6: Tahiti, 2048MB, 1100Mhz, 32MCU Device #7: Cypress, 512MB, 830Mhz, 20MCU Device #8: Tahiti, 2048MB, 1100Mhz, 32MCU Device #9: Cypress, 512MB, 830Mhz, 20MCU Device #10: Cayman, 1024MB, 880Mhz, 24MCU Device #11: Tahiti, 2048MB, 1100Mhz, 32MCU Device #12: Cypress, 512MB, 830Mhz, 20MCU Device #13: Cayman, 1024MB, 880Mhz, 24MCU Device #14: Tahiti, 2048MB, 1100Mhz, 32MCU Device #15: Cypress, 512MB, 830Mhz, 20MCU Device #16: Cayman, 1024MB, 880Mhz, 24MCU Device #17: Tahiti, 2048MB, 1100Mhz, 32MCU Device #18: Cypress, 512MB, 830Mhz, 20MCU Device #19: Cayman, 1024MB, 880Mhz, 24MCU Device #20: Tahiti, 2048MB, 1100Mhz, 32MCU Device #21: Cypress, 512MB, 830Mhz, 20MCU Device #22: Cayman, 1024MB, 880Mhz, 24MCU Device #23: Tahiti, 2048MB, 1100Mhz, 32MCU Device #24: Cypress, 512MB, 830Mhz, 20MCU Device #25: Cayman, 1024MB, 880Mhz, 24MCU [s]tatus [p]ause [r]esume [q]uit =>

Bandwidth Measurements Brute force consistently uses < 8 Mbps Wordlist attacks on fast hashes use no more than 800 Mbps Average peak of 88 Mbit per physical card Ethernet latencies are still an issue Infiniband helps tremendously

Benchmarks Fast Hashes LM 20 G/s SHA1 63 G/s MD5 180 G/s NTLM 348 G/s 0 50,000,000,000 100,000,000,000 150,000,000,000 200,000,000,000 250,000,000,000 300,000,000,000 350,000,000,000

Benchmarks Slow Hashes md5crypt 77 M/s bcrypt (05) 71 k/s sha512crypt 364 k/s 0 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000

Keeping in touch IRC: epixoip on EFnet, Freenode Twitter: @jmgosney