High-Speed Computing & Co-Processing with FPGAs



Similar documents
7a. System-on-chip design and prototyping platforms

Open Flow Controller and Switch Datasheet

Reconfigurable System-on-Chip Design

Hardware and Software

Chapter 7 Memory and Programmable Logic

High-Level Synthesis for FPGA Designs

Simplifying Embedded Hardware and Software Development with Targeted Reference Designs

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Rapid System Prototyping with FPGAs

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Pre-tested System-on-Chip Design. Accelerates PLD Development

Model-based system-on-chip design on Altera and Xilinx platforms

System on Chip Platform Based on OpenCores for Telecommunication Applications

Fondamenti su strumenti di sviluppo per microcontrollori PIC

Cryptography & Network-Security: Implementations in Hardware

Horst Görtz Institute for IT-Security

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Hardware Implementations of RSA Using Fast Montgomery Multiplications. ECE 645 Prof. Gaj Mike Koontz and Ryon Sumner

Interfacing Credit Card-sized PCs to Board Level Electronics

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

Introduction to Programmable Logic Devices. John Coughlan RAL Technology Department Detector & Electronics Division

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Chapter 2 Features of Embedded System

What is a System on a Chip?

2. TEACHING ENVIRONMENT AND MOTIVATION

9/14/ :38

FPGA Music Project. Matthew R. Guthaus. Department of Computer Engineering, University of California Santa Cruz

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Getting Started with Embedded System Development using MicroBlaze processor & Spartan-3A FPGAs. MicroBlaze

Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1

Linux. Reverse Debugging. Target Communication Framework. Nexus. Intel Trace Hub GDB. PIL Simulation CONTENTS

The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Chapter 5 :: Memory and Logic Arrays

AMC13 T1 Rev 2 Preliminary Design Review. E. Hazen Boston University E. Hazen - AMC13 T1 V2 1

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER NETWORKS. Received May 2010; accepted July 2010

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

A Comparative Study Of Two Symmetric Encryption Algorithms Across Different Platforms.

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March ISSN

7 Series FPGA Overview

A Compact FPGA Implementation of Triple-DES Encryption System with IP Core Generation and On-Chip Verification

SNC-VL10P Video Network Camera

AES1. Ultra-Compact Advanced Encryption Standard Core. General Description. Base Core Features. Symbol. Applications

System Considerations

Cryptographic Rights Management of FPGA Intellectual Property Cores

Cryptanalysis with a cost-optimized FPGA cluster

Wireless Local Area. Network Security

LatticeECP2/M S-Series Configuration Encryption Usage Guide

DS1104 R&D Controller Board

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

International Journal of Scientific & Engineering Research, Volume 4, Issue 6, June ISSN

Kirchhoff Institute for Physics Heidelberg

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

A PERFORMANCE EVALUATION OF COMMON ENCRYPTION TECHNIQUES WITH SECURE WATERMARK SYSTEM (SWS)

Technical Aspects of Creating and Assessing a Learning Environment in Digital Electronics for High School Students

RAM & ROM Based Digital Design. ECE 152A Winter 2012

Chapter 4 System Unit Components. Discovering Computers Your Interactive Guide to the Digital World

Introduction to Xilinx System Generator Part II. Evan Everett and Michael Wu ELEC Spring 2013

Lecture N -1- PHYS Microcontrollers

PNOZmulti Configurator V8.1.1

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

The following is a summary of the key features of the ARM Injector:

Chapter 2 Logic Gates and Introduction to Computer Architecture

Microtronics technologies Mobile:

FPGA-based Multithreading for In-Memory Hash Joins

Low-resolution Image Processing based on FPGA

System Generator for DSP

Polymorphic AES Encryption Implementation

Chapter 5 Busses, Ports and Connecting Peripherals

Qsys and IP Core Integration

Ahsay Online Backup. Whitepaper Data Security

White Paper FPGA Performance Benchmarking Methodology

Specification and Design of a Video Phone System

Discovering Computers Living in a Digital World

How To Design An Image Processing System On A Chip

Product Development Flow Including Model- Based Design and System-Level Functional Verification

SDLC Controller. Documentation. Design File Formats. Verification

Cyber Security Practical considerations for implementing IEC 62351

ChipScope Pro 11.4 Software and Cores

Today. Important From Last Time. Old Joke. Computer Security. Embedded Security. Trusted Computing Base

GETTING STARTED WITH PROGRAMMABLE LOGIC DEVICES, THE 16V8 AND 20V8

VLSI Based Vehicle Security and Accident Information System

Data Superhero Online Backup Whitepaper Data Security

Implementation of Full -Parallelism AES Encryption and Decryption

USB - FPGA MODULE (PRELIMINARY)

Security & Chip Card ICs SLE 44R35S / Mifare

Cryptography and Network Security

REC FPGA Seminar IAP Seminar Format

Chapter 11 Security+ Guide to Network Security Fundamentals, Third Edition Basic Cryptography

CoProcessor Design for Crypto- Applications using Hyperelliptic Curve Cryptography

Transcription:

High-Speed Computing & Co-Processing with FPGAs FPGAs (Field Programmable Gate Arrays) are slowly becoming more and more advanced and practical as high-speed computing platforms. In this talk, David will provide an in-depth introduction into the guts and capabilities of modern day FPGAs and show how you can take your current algorithms and efficiently convert them to gate logic and run them on hardware. This presentation will also introduce a set of open source cores (jawn v1.0) that will implement the basic functionality of john the ripper on FPGAs and allow you to crack password hashes as fast as 100+ PCs using FPGA PCMCIA cards on your laptop. David Hulton <dhulton@picocomputing.com> Founder, Dachb0den Labs Chairman, ToorCon Information Security Conference Embedded Systems Engineer, Pico Computing, Inc.

Disclaimer Educational purposes only Full disclosure I'm not a hardware guy

Goals This talk will cover: Introduction to FPGAs Verilog Optimization Concepts Cryptography History Password File Cracker (jawn v0.1) Artificial Intelligence Neural Networks

Introduction to FPGAs Field Programmable Gate Array Lets you prototype IC's Code translates directly into circuit logic

Introduction to FPGAs Configurable Logic Blocks (CLBs) Registers (flip flops) for fast data storage Logic Routing Input/Output Blocks (IOBs) Basic pin logic (flip flops, muxs, etc) Block Ram Internal memory for data storage Digial Clock Managers (DCMs) Clock distribution Programmable Routing Matrix Intelligently connects all components together PPC

FPGA Pros / Cons Pros Common Hardware Benefits Massively parallel Pipelineable Reprogrammable Cons Self-reconfiguration Size constraints / limitations More difficult to code & debug

Introduction to FPGAs Common Applications Encryption / decryption AI / Neural networks Digital signal processing (DSP) Software radio Image processing Communications protocol decoding Matlab / Simulink code acceleration Etc.

Introduction to FPGAs Common Applications Encryption / decryption AI / Neural networks Digital signal processing (DSP) Software radio Image processing Communications protocol decoding Matlab / Simulink code acceleration Etc.

Types of FPGAs Antifuse Programmable only once Flash Programmable many times SRAM Programmable dynamically Most common technology Requires a loader (doesn't keep state after poweroff)

Development Platform ROAG PCMCIA Form Factor Virtex II-Pro (XC2VP4-5) Embedded PowerPC 405 128MB RAM 32MB Flash 10/100 Ethernet Synchronous Serial Port 2 RS232 Ports CANBus Satellite Radio Controller

Development Platform Virtex II-Pro (XC2VP4-5) 6,768 Logic Cells 12KB of Registers (Distributed RAM) ~ 180,000 Gates 64KB of Block RAM PowerPC 405 300mhz Max Clock Speed

Development Platform FPGA Programming PCMCIA JTAG Embedded System Xilinx's Microkernel Linux OpenBSD / NetBSD / etc?

Creating Your Project Tools ISE 6.3i Chipscope 6.3i Modelsim 5.8c EDK 6.3i Installation date + 60-day trials available on xilinx.com

Verilog Hardware Description Language Simple C-like Syntax Like Go - Easy to learn, difficult to master

Demonstration Interfacing with the PCMCIA bus Creating your design Building Running

PCMCIA Bus Lines Address Data In Data Out Read Write Example 0x10C8000 0xBEEF 0x4110 Read in input from PCMCIA bus Invert bits and return it

Massively Parallel Example PC (32 * ~ 7 clock cycles?) @ 3.0Ghz for(i = 0; i < 32; i++) c[i] = a[i] * b[i]; Hardware (1 clock cycle) @ 300Mhz a b c x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Massively Parallel Example PC Speed scales with # of instructions & clock speed Hardware Speed scales with FPGA's: Size Clock Speed

Pipeline Example PC (x * ~ 10 clock cycles?) @ 3.0Ghz for(i = 0; i < x; i++) f[i] = a[i] + b[i] * c[i] d[i] ^ e[i] Hardware (x + 3 clock cycles) @ 300Mhz + x - ^ Stage 1 Stage 2 Stage 3 Stage 4 In 1ns 2ns 3ns 4ns Out

Pipeline Example PC (x * ~ 10 clock cycles?) @ 3.0Ghz for(i = 0; i < x; i++) f[i] = a[i] + b[i] * c[i] d[i] ^ e[i] Hardware (x + 3 clock cycles) @ 300Mhz + x - ^ Stage 1 Stage 2 Stage 3 Stage 4 In 1ns 2ns 3ns 4ns Out

Pipeline Example PC (x * ~ 10 clock cycles?) @ 3.0Ghz for(i = 0; i < x; i++) f[i] = a[i] + b[i] * c[i] d[i] ^ e[i] Hardware (x + 3 clock cycles) @ 300Mhz + x - ^ Stage 1 Stage 2 Stage 3 Stage 4 In 1ns 2ns 3ns 4ns Out

Pipeline Example PC (x * ~ 10 clock cycles?) @ 3.0Ghz for(i = 0; i < x; i++) f[i] = a[i] + b[i] * c[i] d[i] ^ e[i] Hardware (x + 3 clock cycles) @ 300Mhz + x - ^ Stage 1 Stage 2 Stage 3 Stage 4 In 1ns 2ns 3ns 4ns Out

Pipeline Example PC (x * ~ 10 clock cycles?) @ 3.0Ghz for(i = 0; i < x; i++) f[i] = a[i] + b[i] * c[i] d[i] ^ e[i] Hardware (x + 3 clock cycles) @ 300Mhz + x - ^ Stage 1 Stage 2 Stage 3 Stage 4 In 1ns 2ns 3ns 4ns Out

Pipeline Example PC Speed scales with # of instructions & clock speed Hardware Speed scales with FPGA's: Size Clock speed Slowest operation in the pipeline

Self-Reconfiguration Example PC data = MultiplyArrays(a, b); RC4(key, data, len); m = MD5(data, len); Hardware Control Logic MultiplyArrays.bit RC4.bit MD5.bit

Self-Reconfiguration Example PC data = MultiplyArrays(a, b); RC4(key, data, len); m = MD5(data, len); Hardware Control Logic MultiplyArrays.bit RC4.bit MD5.bit

Self-Reconfiguration Example PC data = MultiplyArrays(a, b); RC4(key, data, len); m = MD5(data, len); Hardware Control Logic MultiplyArrays.bit RC4.bit MD5.bit

History of FPGAs and Cryptography Minimal Key Lengths for Symmetric Ciphers Ronald L. Rivest (R in RSA) Bruce Schneier (Blowfish, Twofish, etc) Tsutomu Shimomura (Mitnick) A bunch of other ad hoc cypherpunks

History of FPGAs and Cryptography Budget Tool 40-bits 56-bits Recom Pedestrian Hacker Tiny Computers 1 week infeasible 45 $400 FPGA 5 hours 38 years 50 Small Company $10K FPGA 12 min 556 days 55 Corporate Department $300K FPGA 24 sec 19 days 60 ASIC 0.18 sec 3 hrs Big Company $10M FPGA 0.7 sec 13 hrs 70 ASIC 0.005 sec 6 min Intelligence Agency $300M ASIC 0.0002 sec 12 sec 75

History of FPGAs and Cryptography 40-bit SSL is crackable by almost anyone 56-bit DES is crackable by companies Scared yet? This paper was published in 1996

History of FPGAs and Cryptography 1998 The Electronic Frontier Foundation (EFF) Cracked DES in < 3 days Searched ~9,000,000,000 keys/second Cost < $250,000 2001 Richard Clayton & Mike Bond (University of Cambridge) Cracked DES on IBM ATMs Able to export all the DES and 3DES keys in ~ 20 minutes Cost < $1,000 using an FPGA evaluation board

History of FPGAs and Cryptography 2004 Philip Leong, Chinese University of Hong Kong IDEA RC4 50Mb/sec on a P4 vs. 5,247Mb/sec on Pilchard Cracked RC4 keys 58x faster than a P4 Parallelized 96 times on a FPGA Cracks 40-bit keys in 50 hours Cost < $1,000 using a RAM FPGA (Pilchard)

Password File Cracker Design Pipeline design Internal cracking engine password = des_crack(hash, options); Interface over PCMCIA Can specify cracking options Bits to search e.g. Search 55-bits (instead of 56) Offset to start search e.g. First card gets offset 0, second card gets offset 2**55 Typeable/printable characters Alpha-numeric Allows for basic distributed cracking & resume functionality

Password File Cracker Hash/Options Password Cracker() Y Generate Key N Hash Match? Crypt()

Password File Cracker PC (3.0Ghz P4 \w john) ~ 300,000 c/s Hardware (Low end FPGA \w jawn) 100Mhz/25 = ~4,000,000 c/s When timing issues are resolved it should run at 200Mhz Type P4 ROAG 8 ROAGs 56-bits 3808 Y 292 Y 36 Y Typeable / printable 381 Y 28 Y 3.5 Y Alpha-numeric 14 Y 1.1 Y 50 D

Up & Coming Pico (PCMCIA) 20k CLBs (~ 600k gates) @ ~ 350Mhz (3x250Mhz)/25 = ~30m c/s Picomon (Compact Flash) 30k CLBs (~ 1m gates) @ ~ 400Mhz (5x300Mhz)/25 = ~60m c/s Nest (PCI) 16 Picomons 480k CLBs (~ 16m gates) @ ~ 400Mhz (16x5x300Mhz)/25 = ~960m c/s NOTE: Straight DES cracking is ~ 24b c/s (> 2.5x faster than the EFF DES cracker)

Up & Coming Real Performance Type Pico Picomon Nest 56-bits 36Y 19Y 1.2Y Typeable / printable 3.8Y 1.9Y 43D Alphanumeric 54D 27D 41H Straight DES 1.5Y 277D 17.4D

Artificial Intelligence Back Propegation Neural Network Applications Handwriting Recognition Character Recognition Voice Recognition FFTs Automatic Protocol Emulation Pattern Matching Etc.

BP Neural Networks Running for(i=0; i<neurons; i++) { } Training for(j=0, x=0; j<layerdimms[i]; j++) x += y[j]*w[j][i]; y[i] = x - t[i]; do { e += Train(y, x); } while (e > ERRMIN);

BP Neural Networks Running (XOR) Input Output Training Error

Feedback? What do you think? Possible Applications? Questions?

Conclusions / Shameful Plugs ToorCon 7 End of September, 2005 San Diego, CA USA http://www.toorcon.org ShmooCon Super Bowl Weekend, 2005 Washington DC, USA http://www.shmoocon.com LayerOne June, 2005 Los Angeles, USA http://www.layerone.info

Questions? Suggestions? David Hulton 0x31337@gmail.com h1kari@dachb0den.com Will be back up soon! OpenCores Xilinx http://www.opencores.org ISE Foundation (Free 60-day trial) Pico Computing, Inc. http://www.picocomputing.com Products will be available around March, 2005