ECE 4514 Digital Design II. Spring Lecture 14: The Spartan 3E FPGA

FPGA Internals: The Spartan 3ES500 FPGA

This lecture covers the internals of an FPGA, focusing on the Spartan 3ES500 FPGA.

Objectives:
- Understand how an FPGA can implement arbitrary circuits written in Verilog
- Identify important metrics of FPGA capacity and features
- Understand the FPGA 'speak' used by design tools: LUT, LC, BRAM, DCM, IOB, ..

Two important documents for the Spartan 3E FPGA:
- 'Spartan 3E Family Complete Datasheet'
- 'Spartan 3 User Guide'

Spartan 3: 5 programmable elements in a regular network

1. CLB: implements gates and flip-flops
2. Input/Output Blocks: provide off-chip connectivity
3. Block RAM: on-chip 18-Kbit static RAM storage
4. Multiplier: hardwired 18x18 multiplier cell
5. DCM: Digital Clock Manager for clock generation and distribution

Interconnections: the programmable interconnection network is a vital part of the FPGA.

Spartan 3

In this lecture we discuss two elements:
- CLB
- Interconnection network

Other elements will be discussed later:
- BRAM
- Multiplier
- IOB
- DCM

CLB = Configurable Logic Block
- 4 'slices' per CLB
- Each slice can work as logic (gates + flip-flop), distributed RAM, or shift register
- A switch matrix connects the CLB to the FPGA network

Left-hand and right-hand slice
- SRL16 = 16-bit shift register (~ 16x1 bit memory)
- RAM16 = 16-bit RAM (~ 16x1 bit memory)
- LUT4 = 4-input lookup table (~ 16x1 bit memory)
- SLICEM = slice that can work as memory or logic (left-hand slice)
- SLICEL = slice that only works as logic (right-hand slice)

LUT4: basic building block
- A small RAM with a 4-bit input (the address) and a 1-bit output
- Can implement any logic function of up to 4 inputs

[Figure: inputs A0, A1, A2, A3 address a 16-bit memory with output Q]

LUT4: basic building block

Example: implement a 3-input AND. Assume A0 - A2 are used; A3 is a don't-care (x).

A0  A1  A2  A3 | Q
 0   0   0   x | 0
 0   0   1   x | 0
 0   1   0   x | 0
 0   1   1   x | 0
 1   0   0   x | 0
 1   0   1   x | 0
 1   1   0   x | 0
 1   1   1   x | 1
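The truth table above can be sketched as a small software model of a LUT4. This is only an illustration, not the tool flow: the names `lut4` and `make_init` are hypothetical, and the address ordering (A0 as the least significant address bit) is an assumption made for this sketch.

```python
def lut4(init, a0, a1, a2, a3):
    """Read one bit from a 16-bit configuration word, addressed by the four inputs.

    Assumption for this sketch: A0 is the least significant address bit.
    """
    address = a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)
    return (init >> address) & 1


def make_init(func):
    """Build the 16-bit memory contents for any Boolean function of four inputs."""
    init = 0
    for address in range(16):
        a0, a1, a2, a3 = [(address >> i) & 1 for i in range(4)]
        if func(a0, a1, a2, a3):
            init |= 1 << address
    return init


# 3-input AND on A0..A2; A3 is ignored, so the pattern repeats for A3 = 0 and A3 = 1.
AND3_INIT = make_init(lambda a0, a1, a2, a3: a0 & a1 & a2)

if __name__ == "__main__":
    print(hex(AND3_INIT))
    print(lut4(AND3_INIT, 1, 1, 1, 0), lut4(AND3_INIT, 1, 1, 0, 1))
```

Because A3 is a don't-care, the output is 1 at exactly two of the 16 addresses (A0 = A1 = A2 = 1 with A3 = 0 or 1). Changing the function only changes the 16 stored bits, not the hardware.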

Slice muxes extend the 4-input LUT

[Figure: two 16-bit memories share inputs A0 - A3 and produce Q1 and Q2; a mux controlled by A4 selects Q from Q1 and Q2]

The upper and lower LUT can be combined further..

One CLB can implement up to a 7-input LUT

[Figure: LUTs combined through local wires]
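The mux extension can be sketched in software as a Shannon expansion: each extra input selects between two smaller lookup structures. This is a hedged model, not the actual slice wiring; the function names `build`, `evaluate`, and `count_luts` are hypothetical, and A0 is again assumed to be the least significant address bit.

```python
def lut4(init, inputs):
    """Look up one bit in a 16-bit word; inputs[0] is the least significant address bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> address) & 1


def build(func, n):
    """Return a 16-bit INIT word for n == 4, else a (low, high) pair of subtrees.

    The last input of func acts as the mux select; low covers select = 0.
    """
    if n == 4:
        init = 0
        for address in range(16):
            bits = [(address >> i) & 1 for i in range(4)]
            if func(*bits):
                init |= 1 << address
        return init
    low = build(lambda *a, f=func: f(*a, 0), n - 1)
    high = build(lambda *a, f=func: f(*a, 1), n - 1)
    return (low, high)


def evaluate(tree, inputs):
    """Evaluate the mux tree; the last element of inputs drives the outermost mux."""
    if isinstance(tree, int):
        return lut4(tree, inputs)
    low, high = tree
    return evaluate(high if inputs[-1] else low, inputs[:-1])


def count_luts(tree):
    """Count the LUT4 leaves in the tree."""
    return 1 if isinstance(tree, int) else count_luts(tree[0]) + count_luts(tree[1])


def majority7(*bits):
    """Example 7-input function: 1 when at least four inputs are 1."""
    return int(sum(bits) >= 4)


if __name__ == "__main__":
    tree = build(majority7, 7)
    print(count_luts(tree))
    print(evaluate(tree, [1, 1, 0, 1, 0, 1, 0]))
```

An arbitrary 5-input function needs 2 LUTs (one slice), and an arbitrary 7-input function needs 8 LUTs, which matches the 4 slices x 2 LUTs of one CLB (8 x 16 = 128 = 2^7 stored bits).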

Top half of a (left-hand) slice

[Figure: SRL16/RAM16/LUT4, followed by a MUX, followed by a REGISTER]

The slice half works only as logic, or as logic + register, or only as register.

Each LUT comes with a FF, so: in an FPGA, the left circuit uses as many slices as the right circuit.

[Figure: two circuits, annotated with '1 slice' labels]

What about this case? Do you need the same number of slices for each circuit?

[Figure: two circuits built from 2-input AND gates, annotated with '1 slice' labels]

No! You can cluster the right circuit in a single slice.

Fast Carry Path (for arithmetic)

Typical use of carry path
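The slide figures are not reproduced in this transcript, so the following is only a sketch of a common use of the carry path: a ripple-carry adder. It assumes the usual mapping in which the LUT computes the per-bit propagate signal (a XOR b), a dedicated carry mux forwards or regenerates the carry, and a dedicated XOR gate forms the sum bit. The function name and the default width are hypothetical.

```python
def add_with_carry_chain(a, b, width=8, carry_in=0):
    """Add two unsigned integers bit by bit along a modelled carry chain.

    Returns (sum modulo 2**width, carry_out).
    """
    carry = carry_in
    total = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        propagate = a_bit ^ b_bit           # computed by the LUT
        sum_bit = propagate ^ carry         # dedicated XOR gate
        # Dedicated carry mux: pass the incoming carry when propagating,
        # otherwise the carry-out equals a_bit (both operand bits are equal).
        carry = carry if propagate else a_bit
        total |= sum_bit << i
    return total, carry


if __name__ == "__main__":
    print(add_with_carry_chain(200, 100))
```

Only the propagate signal passes through a LUT; the carry itself stays on the dedicated path from bit to bit, which is why the carry path is fast for arithmetic.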

Summary CLB
- 1 configurable logic block contains 4 slices
- Each slice contains a combination of:
  - 2 LUTs
  - 2 flip-flops
  - MUXes
  - Carry logic
- A LUT can operate as:
  - Random logic function of 4 inputs
  - 16-bit RAM
  - 16-bit shift register
- Tools report area results in slices
- Slices can be many different things, depending on configuration
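The shift-register mode of a LUT can be sketched as follows. This is a behavioural model under stated assumptions, not a description of the silicon: the class name and method names are hypothetical, and the model assumes that address 0 taps the most recently shifted-in bit.

```python
class SRL16:
    """Behavioural sketch of a 16-bit shift register with an addressable output tap."""

    def __init__(self, init=0):
        self.bits = init & 0xFFFF

    def clock(self, d, ce=1):
        """On a clock edge with clock-enable set, shift the new bit d into position 0."""
        if ce:
            self.bits = ((self.bits << 1) | (d & 1)) & 0xFFFF

    def q(self, address):
        """Read the bit that was shifted in (address + 1) enabled clocks ago."""
        return (self.bits >> (address & 0xF)) & 1


if __name__ == "__main__":
    srl = SRL16()
    for bit in (1, 0, 1, 1):
        srl.clock(bit)
    print([srl.q(a) for a in range(4)])
```

The same 16 stored bits that hold a truth table in logic mode hold the delayed data here; the four LUT inputs select the tap, so one LUT can implement a delay of 1 to 16 clock cycles.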

Interconnect Network
- Interconnections are THE critical component in an FPGA
  - Largest physical area on the die
  - Largest power consumption
  - In most cases, wire delay becomes the performance bottleneck
- Ironically, interconnections are hardly mentioned as a feature in FPGA product literature
  - Engineers like to hear about the number of CLBs in an FPGA, not about the number of wires between CLBs...
- The FPGA interconnect network is a hierarchical architecture
  - Many fast, short wires with small drive capacity
  - Few longer wires with high drive capacity
- Like the slice architecture, the design of the interconnect network is a 'secret sauce' of the FPGA..

Interconnection Network in Spartan 3
- CLBs connect to a 'switch matrix', which connects to the on-chip network
- Each switch matrix interconnects many different wires
- 4 wire types are connected to the switch matrix

Interconnection Network
- Clock signals are distributed with a separate, dedicated network
  - Another reason to avoid gated clocks..
- A recursive 'fish-bone' network ensures that the clock arrives everywhere at the same time
- Clock signals are generated externally or else inside a 'DCM'
- A DCM can multiply or divide an external clock to create a different on-chip clock
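The multiply/divide relation of a clock manager can be sketched as f_out = f_in x M / D. The example below is illustrative only: the 50 MHz input frequency and the allowed ranges for M (2 to 32) and D (1 to 32) are assumptions of this sketch and should be checked against the datasheet, and the sketch ignores any limits on the resulting output frequency. The function names are hypothetical.

```python
from fractions import Fraction


def synthesize(f_in_hz, multiply, divide):
    """Output frequency of an idealised multiply/divide clock synthesiser."""
    return Fraction(f_in_hz) * multiply / divide


def find_ratio(f_in_hz, f_target_hz, m_range=range(2, 33), d_range=range(1, 33)):
    """Return the first (M, D) pair with the smallest frequency error.

    The default ranges are assumptions for this sketch, not datasheet values.
    """
    best = None
    for m in m_range:
        for d in d_range:
            error = abs(synthesize(f_in_hz, m, d) - f_target_hz)
            if best is None or error < best[0]:
                best = (error, m, d)
    return best[1], best[2]


if __name__ == "__main__":
    m, d = find_ratio(50_000_000, 75_000_000)
    print(m, d, synthesize(50_000_000, m, d))
```

Using exact fractions avoids rounding when comparing candidate ratios; a real design would also have to respect the input and output frequency limits of the device.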

Now, we understand databook figures

The Spartan 3E starter kit has a 3ES500 FPGA.

Summary on 3ES500

XC3S500E = 500K 'system gates'
- 10,476 Logic Cells
  - A 'Logic Cell' (LC) is the unit of configuration in an FPGA. It consists of a lookup table and a flip-flop.
  - In the case of Xilinx, 1 slice counts as 2.25 LCs.
- 46 rows, 34 columns
  - 46 x 34 = 1,564 positions
  - 1,164 CLBs, so 400 'positions' are used for other things (RAM, ...)
- 4,656 Slices
  - Each CLB contains 4 'Slices': 4 x 1,164 = 4,656
  - A Slice contains 2.25 LCs: 2.25 x 4,656 = 10,476
- 73K 'Distributed RAM' bits
  - Each slice contributes 16 bits of 'RAM' on average: only the SLICEM half of the slices can act as RAM, and each SLICEM holds 2 LUTs x 16 bits = 32 bits
  - 4,656 x 16 = 74,496 = 72.75K
- 360K of Block RAM bits
  - 18 Kbit per Block RAM, thus 360/18 = 20 Block RAMs
- 20 Multipliers
  - Next to each Block RAM there is a multiplier cell
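The databook arithmetic above can be checked with a few lines of code. The input figures are the ones quoted on the slides; the variable names are chosen for this sketch only.

```python
# Figures quoted on the slides for the XC3S500E.
ROWS, COLUMNS = 46, 34
CLBS = 1164
SLICES_PER_CLB = 4
RAM_BITS_PER_SLICE = 16          # average per slice, as used on the slide
BLOCK_RAM_KBITS_TOTAL = 360
BLOCK_RAM_KBITS_EACH = 18

positions = ROWS * COLUMNS                      # grid positions on the die
other_positions = positions - CLBS              # positions not occupied by CLBs
slices = CLBS * SLICES_PER_CLB
logic_cells = slices * 9 // 4                   # 2.25 logic cells per slice
distributed_ram_bits = slices * RAM_BITS_PER_SLICE
distributed_ram_kbits = distributed_ram_bits / 1024
block_rams = BLOCK_RAM_KBITS_TOTAL // BLOCK_RAM_KBITS_EACH

if __name__ == "__main__":
    print(positions, other_positions, slices, logic_cells)
    print(distributed_ram_bits, distributed_ram_kbits, block_rams)
```

Writing 2.25 as the integer ratio 9/4 keeps the logic-cell count exact instead of relying on floating-point rounding.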

Now, we also understand tool output

The SHA-1 Design, revisited:
- Used 995 Slices (2 x 995 = 1,990 LUTs at most)
- Used 1,836 LUTs of 1,990 possible LUTs
- Used 1,676 of 1,836 occupied LUTs for actual logic
- Used 503 Flip-Flops (of 2 x 995 = 1,990 Flip-Flops at most)
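The tool-output numbers can be turned into utilisation ratios, which are often easier to compare than raw counts. This is a sketch using only the figures quoted above plus the 4,656-slice device total from the datasheet summary; the variable names are hypothetical.

```python
DEVICE_SLICES = 4656      # XC3S500E total, from the datasheet summary
slices_used = 995
luts_occupied = 1836
luts_as_logic = 1676
flip_flops_used = 503

luts_in_used_slices = 2 * slices_used          # 2 LUTs per slice
flip_flops_in_used_slices = 2 * slices_used    # 2 flip-flops per slice


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


device_fill = percent(slices_used, DEVICE_SLICES)
lut_fill = percent(luts_occupied, luts_in_used_slices)
logic_share = percent(luts_as_logic, luts_occupied)
flip_flop_fill = percent(flip_flops_used, flip_flops_in_used_slices)
luts_not_logic = luts_occupied - luts_as_logic

if __name__ == "__main__":
    for name, value in [("device slices", device_fill), ("LUTs in used slices", lut_fill),
                        ("LUTs used as logic", logic_share), ("flip-flops", flip_flop_fill)]:
        print(f"{name}: {value:.1f}%")
    print("occupied LUTs not used as logic:", luts_not_logic)
```

The ratios suggest a design dominated by combinational logic: most LUTs in the occupied slices are used, while only about a quarter of the available flip-flops are.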

Summary FPGA
- An FPGA contains 5 elements + an interconnection network
  - CLB
  - RAM
  - Multiplier
  - DCM
  - IOB
- A 'gate-level netlist' is mapped onto an FPGA by configuring these elements
  - Choose the CLB configuration (LUT, flip-flop, carry logic, ..)
  - Choose the interconnections in the network