ECE 545 Digital System Design with VHDL
Course web page: ECE web page > Courses > Course web pages > ECE 545
http://ece.gmu.edu/coursewebpages/ece/ece545/f11/
Kris Gaj
Research and teaching interests: reconfigurable computing, computer arithmetic, cryptography, network security
Contact: The Engineering Building, room 3225, kgaj@gmu.edu
Office hours: Thursday, 7:30-8:30 PM, Tuesday, 7:30-8:30 PM, and by appointment
ECE 545
Part of:
MS in Computer Engineering
- One of five core courses (must be passed with B or better)
- Fundamental course for the specialization area: Digital Systems Design
- Elective course in the remaining specialization areas
MS in Electrical Engineering
- Elective
ECE 545
Part of:
PhD in Electrical and Computer Engineering
- Knowledge tested at the Technical Qualifying Exam (TQE), Topic 2: Digital Design and Computer Organization
I am interested in:
- VLSI
- Digital Systems Design
- ASICs & FPGAs
- VHDL/Verilog
- CAD Tools
- Reconfigurable Computing
- Microelectronics
- VLSI Fabrication
- Nanoelectronics
I want to specialize primarily in:
- CAD tools & Design Automation
- Hardware Description Languages
- FPGAs & Reconfigurable computing
- Computer Arithmetic
- Front-end ASIC Design (algorithmic down to gate level)
- Back-end ASIC Design (circuit and mask layout levels)
- Analog & Digital Circuit Design
- VLSI Fabrication
- Microelectronics
- Nanoelectronics
- Semiconductor Devices
Recommended program & specialization:
- MS CpE, Digital Systems Design
- MS EE, Microelectronics/Nanoelectronics
Design levels: algorithmic, register-transfer, gate, transistor, layout, devices
Courses:
- ECE 545 Digital System Design with VHDL
- ECE 645 Computer Arithmetic
- ECE 681 VLSI Design for ASICs
- ECE 682 VLSI Test Concepts
- ECE 586 Digital Integrated Circuits
- ECE 680 Physical VLSI Design
- ECE 584 Semiconductor Device Fundamentals
- ECE 684 MOS Device Electronics
CpE Digital Systems Design
Pre-approved electives:
- ECE 545 Digital System Design with VHDL
- ECE 586 Digital Integrated Circuits
- ECE 645 Computer Arithmetic
- ECE 681 VLSI Design for ASICs
- ECE 682 VLSI Test Concepts
Suggested electives:
- ECE 584, 684 (technology)
- ECE 511, 611 (microprocessors)
- ECE 646, 746 (applications)
Professors: K. Gaj, K. Hintz, H. Homayoun, J. Kaps, T. Storey
CpE Microprocessors and Embedded Systems
Pre-approved electives:
- ECE 510 Real-Time Concepts
- ECE 511 Microprocessors
- ECE 611 Advanced Microprocessors
- ECE 612 Real-Time Embedded Systems
- ECE 641 Computer System Architecture
Suggested electives:
- CS 540, 583 (languages, algorithms)
- CS 635 (parallel machines)
- ECE 542, 642, 742 (networks)
- ECE 645, 681 (digital design)
- ECE 548 (sequential mach. theory)
Professors: H. Homayoun, J. Kaps, P. Pachowicz, C. Sabzevari
DIGITAL SYSTEMS DESIGN
Concentration advisors: Kris Gaj, Jens-Peter Kaps, Ken Hintz
1. ECE 545 Digital System Design with VHDL: K. Gaj, project, FPGA design with VHDL, Aldec/Mentor Graphics, Xilinx/Altera
2. ECE 645 Computer Arithmetic: K. Gaj, project, FPGA design with VHDL, Aldec/Mentor Graphics, Xilinx/Altera
3. ECE 681 VLSI Design for ASICs: H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools
4. ECE 586 Digital Integrated Circuits: D. Ioannou, R. Mulpuri
5. ECE 682 VLSI Test Concepts: T. Storey
Grading Scheme Homework - 10% Project - 40% Midterm Exam - 20% Final Exam - 30%
Midterm exam 1
- 2 hours 30 minutes
- in class
- design-oriented
- open-books, open-notes
- practice exams available on the web
Tentative date: Last week of October
Final exam
- 2 hours 45 minutes
- in class
- design-oriented
- open-books, open-notes
- practice exams available on the web
Date: Thursday, December 13, 4:30-7:15pm
Textbooks
Required Textbook Pong P. Chu, RTL Hardware Design Using VHDL, Wiley-Interscience, 2006.
Supplementary Textbook: Basics Refresher
Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, McGraw-Hill, 3rd or 2nd Edition
Supplementary Textbook Advanced Hubert Kaeslin, Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication, Cambridge University Press; 1st Edition, 2008. Used in ECE 681 VLSI Design for ASICs
Technology & Tools
What is an FPGA?
[Figure: FPGA structure showing Configurable Logic Blocks, Block RAMs, and I/O Blocks]
FPGA Design process (1)
Specification / Pseudocode
    Example: Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on the 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.
On-paper hardware design (Block diagram & ASM chart)
VHDL description (Your Source Files)
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
        port(
            clock, reset, encr_decr: in  std_logic;
            data_input:  in  std_logic_vector(31 downto 0);
            data_output: out std_logic_vector(31 downto 0);
            out_full:    in  std_logic;
            key_input:   in  std_logic_vector(31 downto 0);
            key_read:    out std_logic  -- last port: no trailing semicolon
        );
    end RC5_core;  -- name must match the entity declaration
Functional simulation
Synthesis
Post-synthesis simulation
FPGA Design process (2)
Implementation
Timing simulation
Configuration
On-chip testing
Simulation Tools
FPGA Synthesis Tools
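Logic Synthesis
VHDL description to Circuit netlist
    architecture MLU_DATAFLOW of MLU is
        signal A1: STD_LOGIC;
        signal B1: STD_LOGIC;
        signal Y1: STD_LOGIC;
        signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
        -- explicit selector signal: the type of (L1 & L0) used directly
        -- in a "with ... select" is ambiguous for many tools
        signal L: STD_LOGIC_VECTOR(1 downto 0);
    begin
        -- optional inversion of inputs and output
        A1 <= A when (NEG_A='0') else not A;
        B1 <= B when (NEG_B='0') else not B;
        Y  <= Y1 when (NEG_Y='0') else not Y1;
        -- candidate logic functions
        MUX_0 <= A1 and B1;
        MUX_1 <= A1 or B1;
        MUX_2 <= A1 xor B1;
        MUX_3 <= A1 xnor B1;
        -- 4-to-1 multiplexer selecting the function
        L <= L1 & L0;
        with L select
            Y1 <= MUX_0 when "00",
                  MUX_1 when "01",
                  MUX_2 when "10",
                  MUX_3 when others;
    end MLU_DATAFLOW;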
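A software reference model is one way to obtain expected outputs when checking the functional and post-synthesis simulations of a small unit such as the MLU. The following is a minimal Python sketch, assuming all inputs are plain 0/1 bits; the function names `mlu` and `truth_table` are hypothetical, while the argument names mirror the signal names used in the VHDL description.

```python
from itertools import product


def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of the MLU dataflow description (all arguments are 0 or 1)."""
    a1 = a ^ neg_a  # A1 <= A when NEG_A='0' else not A
    b1 = b ^ neg_b  # B1 <= B when NEG_B='0' else not B
    mux = {
        (0, 0): a1 & b1,        # MUX_0: and
        (0, 1): a1 | b1,        # MUX_1: or
        (1, 0): a1 ^ b1,        # MUX_2: xor
        (1, 1): 1 - (a1 ^ b1),  # MUX_3: xnor
    }
    y1 = mux[(l1, l0)]
    return y1 ^ neg_y  # Y <= Y1 when NEG_Y='0' else not Y1


def truth_table():
    """All 2**7 input combinations paired with the expected output Y."""
    return [(bits, mlu(*bits)) for bits in product((0, 1), repeat=7)]
```

The list returned by `truth_table()` could be written to a file and compared against simulator output. The model covers only the values '0' and '1', not the other std_logic values.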
FPGA Implementation After synthesis the entire implementation process is performed by FPGA vendor tools
Design Process control from Active-HDL
Xilinx FPGA Tools: ECE Labs
Aldec Active-HDL Design Flow
- simulation: Aldec Active-HDL (IDE)
- synthesis: Xilinx XST & Synopsys Synplify Premier
- implementation: Xilinx ISE Design Suite
Xilinx ISE Design Flow
- simulation: Mentor Graphics ModelSim SE
- synthesis: Xilinx XST & Synopsys Synplify Premier
- implementation: Xilinx ISE Design Suite (IDE)
Xilinx FPGA Tools: Home
Aldec Active-HDL Design Flow
- simulation: Aldec Active-HDL Student Edition (IDE)
- synthesis: Xilinx XST (restricted)
- implementation: Xilinx ISE WebPACK (restricted)
Xilinx ISE Design Flow
- simulation: Mentor Graphics ModelSim PE Student Edition
- synthesis: Xilinx XST (restricted)
- implementation: Xilinx ISE WebPACK (IDE) (restricted)
Altera FPGA Tools: ECE Labs
Altera Design Flow
- simulation: Mentor Graphics ModelSim-Altera
- synthesis & implementation: Altera Quartus II Subscription Edition
Altera FPGA Tools: Home
Altera Design Flow
- simulation: Mentor Graphics ModelSim-Altera Starter (restricted)
- synthesis & implementation: Altera Quartus II Web Edition (restricted)
Project
Project
- semester-long
- related to the research project conducted by Cryptographic Engineering Research Group (CERG) at GMU
- supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard
Background
Crypto 101
Cryptography is Everywhere
- Buying a book on-line
- Withdrawing cash from ATM
- Teleconferencing over Intranets
- Backing up files on remote server
Cryptographic Standards Before 1997
[Figure: timeline, 1970 to 2010. Secret-Key Block Ciphers (IBM & NSA): DES (Data Encryption Standard) and Triple DES, with marks at 1977, 1999, and 2005. Hash Functions (NSA): SHA, SHA-1, and SHA-2 (Secure Hash Algorithm), with marks at 1993, 1995, and 2003.]
Why a Contest for a Cryptographic Standard?
- Avoid back-door theories
- Speed up the acceptance of the standard
- Stimulate non-classified research on methods of designing a specific cryptographic transformation
- Focus the effort of a relatively small cryptographic community
Cryptographic Standard Contests
- AES: IX.1997 to X.2000, 15 block ciphers, 1 winner
- NESSIE: I.2000 to XII.2002
- CRYPTREC
- eSTREAM: XI.2004 to V.2008, 34 stream ciphers, 4 HW winners + 4 SW winners
- SHA-3: X.2007 to XII.2012, 51 hash functions, 1 winner
Cryptographic Contests - Evaluation Criteria
- Security
- Software Efficiency: µprocessors, µcontrollers
- Hardware Efficiency: FPGAs, ASICs
- Flexibility
- Simplicity
- Licensing
Specific Challenges of Evaluations in Cryptographic Contests
- Very wide range of possible applications, and as a result performance and cost targets
  throughput: single Mbits/s to hundreds of Gbits/s
  cost: single cents to thousands of dollars
- Winner in use for the next 20-30 years, implemented using technologies not in existence today
- Large number of candidates
- Limited time for evaluation
- Only one winner and the results are final
Mitigating Circumstances
- Security is a primary criterion
- Performance of competing algorithms tends to vary significantly (sometimes as much as 500 times)
- Only relatively large differences in performance matter (typically at least 20%)
- Multiple groups independently implement the same algorithms (catching mistakes, comparing best results, etc.)
- Second best may be good enough
AES Contest 1997-2000
Rules of the Contest
Each team submits:
- Detailed cipher specification
- Justification of design decisions
- Tentative results of cryptanalysis
- Source code in C
- Source code in Java
- Test vectors
AES: Candidate Algorithms
- USA: Mars, RC6, Twofish, Safer+, HPC
- Canada: CAST-256, Deal
- Costa Rica: Frog
- Germany: Magenta
- Belgium: Rijndael
- France: DFC
- Israel, UK, Norway: Serpent
- Korea: Crypton
- Japan: E2
- Australia: LOKI97
AES Contest Timeline
- June 1998: 15 candidates (CAST-256, Crypton, Deal, DFC, E2, Frog, HPC, LOKI97, Magenta, Mars, RC6, Rijndael, Safer+, Serpent, Twofish)
- Round 1: Security, Software efficiency
- August 1999: 5 final candidates: Mars, RC6, Twofish (USA); Rijndael, Serpent (Europe)
- Round 2: Security, Software efficiency, Hardware efficiency
- October 2000: 1 winner: Rijndael (Belgium)
NIST Report: Security & Simplicity
[Figure: candidates plotted by Security (Adequate to High) against Simplicity (Complex to Simple). High security: MARS, Twofish, Serpent. Adequate security: Rijndael, RC6.]
Efficiency in software: NIST-specified platform
200 MHz Pentium Pro, Borland C++
[Figure: bar chart of throughput [Mbits/s] for Rijndael, RC6, Twofish, Mars, and Serpent with 128-bit, 192-bit, and 256-bit keys]
NIST Report: Software Efficiency
Encryption and Decryption Speed
          32-bit processors          64-bit processors    DSPs
high      RC6                        Rijndael, Twofish    Rijndael, Twofish
medium    Rijndael, Mars, Twofish    Mars, RC6            Mars, RC6
low       Serpent                    Serpent              Serpent
Efficiency in FPGAs: Speed
Xilinx Virtex XCV-1000
[Figure: bar chart of throughput [Mbit/s] for Serpent x8, Rijndael, Twofish, Serpent x1, RC6, and Mars, as reported by George Mason University, University of Southern California, and Worcester Polytechnic Institute]
Efficiency in ASICs: Speed
MOSIS 0.5 µm, NSA Group
Throughput [Mbit/s]
              128-bit key scheduling    3-in-1 (128, 192, 256 bit) key scheduling
Rijndael      606                       443
Serpent x1    202                       202
Twofish       105                       105
RC6           103                       104
Mars          57                        57
Lessons Learned
ASIC results matched FPGA results very well, and both were very different from the software results.
[Figure: FPGA results (GMU+USC, Xilinx Virtex XCV-1000; x8 and x1) next to ASIC results (NSA Team, ASIC, 0.5 µm MOSIS; x1)]
Serpent was the fastest in hardware and the slowest in software.
Lessons Learned
Hardware results matter!
Final round of the AES Contest, 2000
[Figure: Speed in FPGAs (GMU results) compared with Votes at the AES3 conference]
Limitations of the AES Evaluation
- Optimization for maximum throughput
- Single high-speed architecture per candidate
- No use of embedded resources of FPGAs (Block RAMs, dedicated multipliers)
- Single FPGA family from a single vendor: Xilinx Virtex
FPGA Evaluations
                                 AES           eSTREAM                  SHA-3
Multiple FPGA families           No            No                       Yes
Multiple architectures           No            Yes                      Yes
Use of embedded resources        No            No                       Yes
Primary optimization target      Throughput    Area, Throughput/Area    Throughput/Area
Experimental results             No            No                       Yes
Availability of source codes     No            No                       Yes
Specialized tools                No            No                       Yes
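The choice of primary optimization target can change the ranking of designs. The following minimal Python sketch illustrates this, assuming the common block-based formula throughput = block size x clock frequency / clock cycles per block (the formula is not stated on the slide). The two designs, their names, and all of their numbers are hypothetical and are not results from any of the contests.

```python
def throughput_mbps(block_bits, clock_mhz, cycles_per_block):
    """Throughput in Mbit/s, assuming one block is processed every cycles_per_block cycles."""
    return block_bits * clock_mhz / cycles_per_block


def tp(design):
    return throughput_mbps(design["block_bits"], design["clock_mhz"], design["cycles"])


def tp_per_area(design):
    # Area is counted in logic blocks (e.g., slices) for this illustration
    return tp(design) / design["slices"]


def rank(designs, metric):
    """Design names ordered from best to worst under the given metric."""
    return [d["name"] for d in sorted(designs, key=metric, reverse=True)]


# Hypothetical designs; all numbers are made up for illustration only
DESIGNS = [
    {"name": "folded", "block_bits": 512, "clock_mhz": 200, "cycles": 64, "slices": 1000},
    {"name": "unrolled", "block_bits": 512, "clock_mhz": 150, "cycles": 20, "slices": 3000},
]

by_throughput = rank(DESIGNS, tp)
by_ratio = rank(DESIGNS, tp_per_area)
```

With these made-up numbers the "unrolled" design ranks first by throughput, while the "folded" design ranks first by throughput/area.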
ASIC Evaluations
                                 AES           eSTREAM                SHA-3
Multiple processes/libraries     No            No                     Yes
Multiple architectures           No            Yes                    Yes
Primary optimization target      Throughput    Power x Area x Time    Throughput/Area
Post-layout results              No            Yes                    Yes
Experimental results             No            Yes                    Yes
Availability of source codes     No            No                     Yes
Specialized tools                No            No                     No
Benchmarking Tools
Tools for Benchmarking Implementations of Cryptography
- Software: eBACS, D. Bernstein (UIC), T. Lange (TUE), 2006-present
- FPGAs: ATHENa, K. Gaj, J. Kaps, et al. (GMU), 2009-present
- ASICs: ?
Benchmarking in Software: eBACS
eBACS: ECRYPT Benchmarking of Cryptographic Systems
http://bench.cr.yp.to/
SUPERCOP: toolkit developed by D. Bernstein and T. Lange for measuring performance of cryptographic software
- measurements on multiple machines (currently over 90)
- each implementation is recompiled multiple times (currently over 1600 times) with various compiler options
- time measured in clock cycles/byte for multiple input/output sizes
- median, lower quartile (25th percentile), and upper quartile (75th percentile) reported
- standardized function arguments (common API)
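The reported statistics can be illustrated with a few lines of Python. This is a minimal sketch, not SUPERCOP's own code: the function name `cycles_per_byte_summary` is hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the slide does not say which convention SUPERCOP uses.

```python
import statistics


def cycles_per_byte_summary(cycle_counts, message_bytes):
    """Summarize repeated timings of one message size in clock cycles per byte.

    cycle_counts  -- clock-cycle counts from repeated runs (at least two)
    message_bytes -- size of the processed message in bytes
    """
    per_byte = [cycles / message_bytes for cycles in cycle_counts]
    # n=4 yields the three cut points: 25th, 50th, and 75th percentiles
    q1, median, q3 = statistics.quantiles(per_byte, n=4, method="inclusive")
    return {"lower_quartile": q1, "median": median, "upper_quartile": q3}
```

For example, five hypothetical runs of 1000, 1100, 1200, 1300, and 1400 cycles on a 100-byte message give a median of 12.0 cycles/byte with quartiles of 11.0 and 13.0. Reporting the median and quartiles instead of the mean keeps a single slow run (for instance, one disturbed by an interrupt) from distorting the result.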
SUPERCOP Extension for Microcontrollers
XBX: 2009-present
Developers:
- Christian Wenzel-Benner, ITK Engineering AG, Germany
- Jens Gräf, LiNetCo GmbH, Heiger, Germany
Allows on-board timing measurements
Supports at least the following microcontrollers:
- 8-bit: Atmel ATmega1284P (AVR)
- 32-bit: TI AR7 (MIPS), Atmel AT91RM9200 (ARM 920T), Intel XScale IXP420 (ARM v5te), Cortex-M3 (ARM)
Benchmarking in FPGAs: ATHENa
ATHENa: Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena
Open-source benchmarking environment, written in Perl, aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms.
The most recent version, 0.6.2, was released in June 2011. Full features are to be released in ATHENa 1.0 in 2012.
Why Athena?
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest."
from "Athena, Greek Goddess of Wisdom and Craftsmanship"
Basic Dataflow of ATHENa
[Figure: dataflow among the Designer, the User, and the ATHENa Server, with steps numbered 0 to 6. Labels: Interfaces + Testbenches; Download scripts and configuration files; HDL + scripts + configuration files; HDL + FPGA Tools; FPGA Synthesis and Implementation; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs]
Three Components of the ATHENa Environment
- ATHENa Tool
- ATHENa Database of Results
- ATHENa Website
ATHENa Database of Results
ATHENa Database
http://cryptography.gmu.edu/athenadb
ATHENa Database Result View
- Algorithm parameters
- Design parameters: Optimization target, Architecture type, Datapath width, I/O bus widths, Availability of source code
- Platform: Vendor, Family, Device
- Timing: Maximum clock frequency, Maximum throughput
- Resource utilization: Logic blocks (Slices/LEs/ALUTs), Multipliers/DSP units
- Tools: Names & versions, Detailed options
- Credits: Designers & contact information
ATHENa Database Compare Feature
- Matching fields in grey
- Non-matching fields in red and blue
ATHENa - Website
ATHENa Website
http://cryptography.gmu.edu/athena/
- Download of ATHENa Tool
- Links to related tools
- SHA-3 Competition in FPGAs & ASICs
- Specifications of candidates
- Interface proposals
- RTL source codes
- Testbenches
- ATHENa database of results
- Related papers & presentations
ATHENa Result Replication Files
- Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)
- Automatically created by ATHENa for all results generated using ATHENa
- Stored in the ATHENa Database
In the same spirit of Reproducible Research as:
- J. Claerbout (Stanford University), "Electronic documents give reproducible research a new meaning," in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992, http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92.....
- Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL), "Reproducible research in signal processing - what, why, and how," IEEE Signal Processing Magazine, May 2009, http://rr.epfl.ch/17/
Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations of the same cryptographic algorithm
3. hardware platforms from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)
4. tools and languages in terms of quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 12.3)
Your Project: Implementation and Benchmarking of Authenticated Ciphers
Features of Authenticated Ciphers
1. Confidentiality
2. Message integrity
3. Message authentication
[Figure: each feature illustrated with the parties Alice, Bob, and Charlie]
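The three features can be made concrete with a toy software model. The sketch below is an illustration only: it is not one of the authenticated ciphers studied in the project and must not be used to protect real data. It assumes an encrypt-then-MAC construction built from Python's standard `hashlib` and `hmac` modules, with a SHA-256-based keystream; the function names `seal` and `open_sealed` are hypothetical.

```python
import hashlib
import hmac


def _keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce, and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(enc_key, mac_key, nonce, plaintext):
    """Encrypt (confidentiality), then tag nonce and ciphertext (integrity, authentication)."""
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def open_sealed(enc_key, mac_key, nonce, ciphertext, tag):
    """Return the plaintext, or None if the tag does not verify."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # message was modified or did not come from a key holder
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

In this model, a message that Alice seals for Bob opens correctly only with the shared keys. If Charlie flips a bit of the ciphertext, or forges a message without knowing the MAC key, `open_sealed` returns None instead of a plaintext.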
All Projects - Organization
- Projects divided into phases
- Deliverables for each phase submitted through Blackboard at selected checkpoints and evaluated by the instructor and/or TA
- Feedback provided to students on a best effort basis
- Final report and codes submitted using Blackboard at the end of the semester
Honor Code Rules
All students are expected to write and debug their codes individually.
Students are encouraged to help and support each other in all problems related to the
- operation of the CAD tools
- understanding of an investigated algorithm and existing implementations
- understanding of the project tasks
Additional Skills Learned in the Project
- Reading & understanding specification of a complex algorithm
- Design of new hardware architectures based on existing architectures (datapath & controller)
- Reading, understanding, and modifying existing VHDL code
- Using embedded resources of modern FPGAs
- Characterizing performance of your codes for multiple FPGA families