How Does FPGA Work. Outline: FPGA Basics, Virtex 5, Power Consumption in FPGAs, Low Power Approaches. Arnaud Taffanel, Peyman Pouyan

Advanced Digital IC Design
How Does FPGA Work
Arnaud Taffanel, Peyman Pouyan
2008-2-19

Outline
  FPGA Basics
  Virtex 5
  Power Consumption in FPGAs
  Low Power Approaches

FPGA Basics

CMOS Design Styles
  ASIC
    Full custom
    Semi-custom
      Standard cell
      Gate array, sea of gates
    Programmable
      FPGA
      CPLD
  Standard IC

Programmable Logic: Main Idea
  Basic idea: a two-dimensional array of logic blocks and flip-flops, with a means for the user to configure
    the interconnection between the logic blocks
    the function of each block

Different Programmable Devices
  Types of programmable logic
    Programmable Logic Array (PLA)
    Programmable Array Logic (PAL)
    Complex Programmable Logic Device (CPLD)
    Field Programmable Gate Array (FPGA)

Programmable Logic Array (PLA)
  Two programmable planes
  Any combination of ANDs / ORs
  Sharing of AND terms across multiple ORs
  Programmable switches between horizontal and vertical lines
  [Figure: inputs A, B, C, D; programmable AND array; programmable OR array; outputs Q0 to Q3]

Programmable Array Logic (PAL)
  One programmable plane: programmable AND / fixed OR
  Finite combination of ANDs / ORs
  Lower switch count
  Faster than PLAs
  [Figure: inputs A, B, C, D; programmable AND array; fixed OR; outputs Q0 to Q3]
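The two-plane structure of a PLA can be sketched in a few lines of Python. This is only an illustrative model: the function name `pla`, the dictionary encoding of product terms, and the particular programming in `AND_PLANE` and `OR_PLANE` are assumptions made for the example and do not come from the slides.

```python
from itertools import product


def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA for one input vector.

    and_plane: list of product terms; each term maps an input index to the
               bit value that input must have (inputs not listed are ignored).
    or_plane:  list of outputs; each output lists the product-term indices
               that are ORed together.
    """
    terms = [all(inputs[i] == bit for i, bit in term.items()) for term in and_plane]
    return [int(any(terms[t] for t in out)) for out in or_plane]


# Hypothetical programming with inputs ordered (A, B, C, D).
AND_PLANE = [
    {0: 1, 1: 1},  # P0 = A AND B
    {2: 1, 3: 0},  # P1 = C AND NOT D
    {0: 0, 3: 1},  # P2 = NOT A AND D
]
OR_PLANE = [
    [0, 1],  # Q0 = P0 OR P1
    [1, 2],  # Q1 = P1 OR P2 (product term P1 is shared with Q0)
]

if __name__ == "__main__":
    for a, b, c, d in product((0, 1), repeat=4):
        print(a, b, c, d, pla((a, b, c, d), AND_PLANE, OR_PLANE))
```

In this model a PAL corresponds to keeping `OR_PLANE` fixed and letting the user change only `AND_PLANE`, which is why it needs fewer programmable switches.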

Complex Programmable Logic Devices (CPLD)
  Logic block contains
    PAL / PLA
    Registers
  Interconnect includes
    Full crossbar
    Partial interconnect
  [Figure: logic blocks on both sides of a programmable interconnect, with I/O blocks]

Why FPGAs?
  Custom ICs are sometimes designed to replace a large amount of glue logic
    Reduced system complexity and manufacturing cost, improved performance
  Custom ICs are very expensive to develop
  Custom ICs take a long time to fabricate (time to market)
  Two kinds of cost to worry about
    Development cost, sometimes called non-recurring engineering (NRE)
    Manufacturing cost

Why FPGAs? (Contd.)
  The custom IC approach is suitable only for products
    with very high volume (which spreads the NRE over many units)
    that are not time-to-market sensitive
  FPGAs were introduced as an alternative to custom ICs
    Improved density relative to discrete SSI/MSI components
    With the aid of computer-aided design (CAD) tools, circuits can be implemented in a short time relative to ASICs
    No physical layout process, no mask making, no IC manufacturing
      Lowers NRE
      Shortens TTM
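The volume argument can be made concrete with a small break-even calculation. The sketch below assumes a simple linear cost model (NRE plus unit cost times volume); the function names and all dollar figures are hypothetical and chosen only to illustrate the trade-off.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of a product run: one-time NRE plus per-unit manufacturing cost."""
    return nre + unit_cost * volume


def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume at which option A (higher NRE, lower unit cost) costs the same as option B."""
    return (nre_a - nre_b) / (unit_b - unit_a)


# Hypothetical figures, not taken from the slides.
ASIC_NRE, ASIC_UNIT = 1_000_000, 5.0
FPGA_NRE, FPGA_UNIT = 50_000, 55.0

if __name__ == "__main__":
    v = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
    print("break-even volume:", v)
    for volume in (1_000, 19_000, 100_000):
        print(volume,
              total_cost(ASIC_NRE, ASIC_UNIT, volume),
              total_cost(FPGA_NRE, FPGA_UNIT, volume))
```

Below the break-even volume the low-NRE option is cheaper overall; above it the high NRE is spread over enough units for the custom IC to win.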

FPGAs
  Compete with custom ICs
  Compete with microprocessors in dedicated and embedded applications

Why FPGAs? (Contd.)
  Summary
  [Table: ASIC, FPGA and MICRO ranked under each of performance, NREs, unit cost and TTM]

Programmable Elements Overview

Programmable Elements Overview (Contd.)
  Antifuse

Programmable Elements Overview (Contd.)
  SRAM configuration memory cell
  [Figure: configuration memory cell with read or write, data, and routing connections]

[Table fragment, row and column labels missing: Yes (in-system), Yes, Medium / No, No, Low]

Field Programmable Gate Arrays
  Configurable Logic Block (CLB)
    Look-up table (LUT)
    Register
    Or any kind of logic: adder, multiplier, memory, microprocessor
  Input/Output Block (IOB)
    Special logic blocks at the periphery of the device for external connections
  Programmable interconnect
    Wires to connect inputs and outputs to logic blocks

Field Programmable Gate Arrays (Contd.)
  Other FPGA building blocks
    Clock distribution
    Embedded memory blocks
    Special-purpose blocks
      DSP blocks: hardware multipliers, adders and registers
      Embedded microprocessors/microcontrollers
      High-speed serial transceivers

LUT-based Logic Block

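A transmission gate-based LUT
  [Figure]

2-Input MUX as a Programmable Logic Block
  [Figure: 2:1 multiplexer; input A is selected when S = 0, input B when S = 1; output F]
  Configuration (X' denotes the complement of X):
    A   B   S   F
    0   0   0   0
    0   X   1   X
    0   Y   1   Y
    0   Y   X   XY
    X   0   Y   XY'
    Y   0   X   X'Y
    Y   1   X   X + Y
    1   0   X   X'
    1   0   Y   Y'
    1   1   1   1

MUX-based Logic Block
  [Figure: inputs a&b, c]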
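The configuration table for the 2-input MUX can be checked mechanically. The sketch below assumes the usual multiplexer equation F = A when S = 0 and F = B when S = 1; under that equation, rows such as A = X, B = 0, S = Y evaluate to X AND NOT Y, so the expected functions are written with explicit complements. The helper names `mux2`, `resolve` and `check_all` are illustrative.

```python
from itertools import product


def mux2(a, b, s):
    """2:1 multiplexer: output A when S = 0, output B when S = 1."""
    return b if s else a


def resolve(wire, x, y):
    """Map a wiring label (constant or signal name) to its current value."""
    return {"0": 0, "1": 1, "X": x, "Y": y}[wire]


# (A, B, S, expected function of X and Y) for each row of the table.
CONFIGS = [
    ("0", "0", "0", lambda x, y: 0),
    ("0", "X", "1", lambda x, y: x),
    ("0", "Y", "1", lambda x, y: y),
    ("0", "Y", "X", lambda x, y: x & y),
    ("X", "0", "Y", lambda x, y: x & (1 - y)),
    ("Y", "0", "X", lambda x, y: (1 - x) & y),
    ("Y", "1", "X", lambda x, y: x | y),
    ("1", "0", "X", lambda x, y: 1 - x),
    ("1", "0", "Y", lambda x, y: 1 - y),
    ("1", "1", "1", lambda x, y: 1),
]


def check_all():
    """Return True if every configuration realizes its expected function."""
    for a, b, s, expected in CONFIGS:
        for x, y in product((0, 1), repeat=2):
            f = mux2(resolve(a, x, y), resolve(b, x, y), resolve(s, x, y))
            if f != expected(x, y):
                return False
    return True


if __name__ == "__main__":
    print("all configurations consistent:", check_all())
```

The point of the table is that a single multiplexer, wired to constants and signals in different ways, already yields AND, OR, NOT and pass-through, which is what makes it usable as a small programmable logic block.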

Programmable Interconnect
  Fast local interconnect
  Horizontal and vertical lines of various lengths
  Switch matrices
  [Figure: switch matrix interconnect]

Switch Matrix Structure
  [Figure: switch matrix programming illustration]

Switch Matrix Structure (Contd.)
  6 pass transistors per switch matrix interconnect point
  Pass transistors act as programmable switches
  Pass transistor gates are driven by configuration memory cells

FPGA Variations
  Families of FPGAs differ in
    the physical means of implementing user programmability
    the arrangement of interconnection wires
    the basic functionality of the logic blocks
  The most significant difference is in the method for providing flexible blocks and connections
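One way to see where the figure of 6 pass transistors comes from is to model an interconnect point as the meeting of four wire segments, one per side, with one switch for every pair of sides. That four-sided arrangement is an assumption of this sketch (the slides only give the count), and the names `SIDES`, `SWITCHES` and `connected_groups` are illustrative.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")

# One pass transistor per unordered pair of sides: C(4, 2) = 6 switches.
SWITCHES = list(combinations(SIDES, 2))


def connected_groups(config_bits):
    """Return the groups of sides that are electrically joined.

    config_bits maps a (side, side) pair from SWITCHES to 0 or 1, standing in
    for the configuration memory cell that drives that pass transistor's gate.
    """
    parent = {s: s for s in SIDES}

    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s

    for a, b in SWITCHES:
        if config_bits.get((a, b), 0):
            parent[find(a)] = find(b)

    groups = {}
    for s in SIDES:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(len(SWITCHES), "switches:", SWITCHES)
    print(connected_groups({("N", "S"): 1}))                 # straight-through vertical route
    print(connected_groups({("N", "E"): 1, ("S", "W"): 1}))  # two independent corner turns
```

Each entry in `config_bits` corresponds to one configuration memory cell, so routing a signal through the switch matrix amounts to writing the right bits.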

Sea-Of-Module Architecture
  [Figure: array of modules and wiring tracks, bordered on all four sides by I/O buffers, programming and test logic]

Routing Channel Architecture

Actel FPGA
  8-input, single-output combinational logic blocks
  Rows of programmable logic building blocks and rows of interconnect
  Anti-fuse technology

Actel module
  Basic module is a modified 4:1 multiplexer
  [Figure: 2:1 MUXes with data inputs D0 to D3, select inputs SOA, SOB, S0 and S1, output Y]

Actel module (Contd.)
  Implementation of an S-R latch using the Actel FPGA
  [Figure: 2:1 MUXes with inputs R and S, constants "0" and "1", output Q]

Actel Interconnect
  [Figure: module, horizontal track, vertical track, anti-fuse]

XC4000 FPGA Architecture
  SRAM cells throughout the FPGA determine the functionality of the device

XC4000E CLB
  2 four-input function generators (LUTs)
  1 three-input function generator
  2 registers
  Possible functions
    Any function of 5 variables
    Two functions of 4 variables + one function of 3 variables

Example
  Implement the following functions on a single CLB of the XC4000 FPGA:
    X = A B (C + D)
    Y = AK + BK + C D K + AEJL
  Use look-up table F to implement X
  Use look-up table G for AEJL
  Use F, G and H for Y:
    Y = K(A + B + C D) + AEJL = KX + AEJL = KF + G

Virtex 5
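The F, G, H decomposition in the example can be checked exhaustively, with one caveat. As transcribed, the equations only factor as shown if some literals carry complement bars that did not survive extraction. The sketch below therefore assumes one consistent reading, X = A'B'(C + D) and Y = AK + BK + C'D'K + AEJL, so that Y = K·X' + AEJL. That reading is an assumption, as are the helper names `make_lut`, `clb_y` and `reference_y`.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute a truth table, playing the role of a LUT's configuration bits."""
    return {bits: int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)}


# Assumed reading of the example (complement bars restored by assumption).
F = make_lut(lambda a, b, c, d: (not a) and (not b) and (c or d), 4)  # F = X = A'B'(C + D)
G = make_lut(lambda a, e, j, l: a and e and j and l, 4)               # G = AEJL
H = make_lut(lambda f, g, k: (k and not f) or g, 3)                   # H = K·F' + G


def clb_y(a, b, c, d, e, j, k, l):
    """Y as produced by the three function generators of one CLB."""
    return H[(F[(a, b, c, d)], G[(a, e, j, l)], k)]


def reference_y(a, b, c, d, e, j, k, l):
    """Y evaluated directly from the assumed sum-of-products form."""
    return int(bool((a and k) or (b and k) or ((not c) and (not d) and k)
                    or (a and e and j and l)))


def matches():
    """Compare both forms over all 2**8 input combinations."""
    return all(clb_y(*v) == reference_y(*v) for v in product((0, 1), repeat=8))


if __name__ == "__main__":
    print("CLB mapping matches reference:", matches())
```

Whatever the exact placement of the bars, the structure is the same: F and G each absorb a four-input subfunction, and the three-input generator H combines them with the remaining input K.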

Virtex 5
  High end of the Xilinx FPGA family
  High performance (550 MHz)
  Low-power design
  65 nm CMOS process

CLBs
  2 slices per CLB

Slices
  4 registers
  4 LUTs
  Carry logic
  6-input LUT: more efficient logic (e.g. a 4:1 mux fits in one LUT)
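The 4:1 mux remark follows from counting inputs: four data lines plus two select lines make six, which is exactly what one 6-input LUT accepts. A minimal Python sketch, with the input ordering (d0..d3, s0, s1) and the helper names chosen arbitrarily for the example:

```python
def build_lut6(func):
    """Return the 64 configuration bits of a 6-input LUT implementing func.

    Address bit i of the table corresponds to the i-th argument of func.
    """
    return [int(func(*[(addr >> i) & 1 for i in range(6)])) for addr in range(64)]


def lut6_read(table, inputs):
    """Look up the output for six input bits (a plain memory read)."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


# 4:1 multiplexer: inputs are d0, d1, d2, d3, s0, s1 (s1 is the high select bit).
MUX4 = build_lut6(lambda d0, d1, d2, d3, s0, s1: (d0, d1, d2, d3)[s0 + 2 * s1])

if __name__ == "__main__":
    print(lut6_read(MUX4, (0, 1, 0, 0, 1, 0)))  # selects d1
    print(lut6_read(MUX4, (0, 0, 0, 1, 1, 1)))  # selects d3
```

Because the LUT is just a 64-entry table, the same structure read by address also behaves as a 64x1 ROM.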

Slices (contd)
  SLICEM: many configurations possible
    64x1 single-port RAM
    32x1 dual-port RAM
    32-stage shift register
    64x1 ROM
    LUT
  SLICEL: only ROM or LUT

Slices (contd)
  6-input LUT
  Optimizes common logic implementations

DSP Slices
  High-performance DSP slices
  Up to 250 GMACs
  Optimized for single-precision floating point
  40 operating modes, dynamically adaptable

RAM
  2 types of RAM usable
  Distributed RAM
    Uses the LUTs as RAM
    Close to the logic
    Less RAM
    Uses CLBs
  Global RAM
    More RAM
    555 MHz
    Far from the logic
      Routing problem
      Speed problem

Power Consumption in FPGAs
  1- Static power (5% to 20%)
    Leakage current: reverse-biased diode leakage current
    Sub-threshold conduction of transistors
  2- Dynamic power (80% to 95%)
    P_D = 1/2 * C * Vdd^2 * α * f_clk
    Consumed when the output of a CMOS circuit switches
  Dynamic power consumption in an FPGA design depends on
    Clock frequency
    Supply voltage
    Switching activity
    Resource utilization
  [Figure: power dissipation distribution in a Xilinx Virtex-II FPGA]
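The dynamic power formula is easy to explore numerically. The sketch below evaluates P_D = 1/2 * C * Vdd^2 * α * f_clk directly; the function name and the operating point (1 nF of switched capacitance, 1.0 V, activity 0.2, 100 MHz) are hypothetical values picked for illustration, not measurements from the slides.

```python
def dynamic_power(c_farads, vdd_volts, alpha, f_clk_hz):
    """Dynamic power in watts: P_D = 1/2 * C * Vdd^2 * alpha * f_clk."""
    return 0.5 * c_farads * vdd_volts ** 2 * alpha * f_clk_hz


# Hypothetical operating point.
C = 1e-9      # total switched capacitance, farads
VDD = 1.0     # supply voltage, volts
ALPHA = 0.2   # switching activity factor
F_CLK = 100e6  # clock frequency, hertz

if __name__ == "__main__":
    base = dynamic_power(C, VDD, ALPHA, F_CLK)
    print("baseline:       ", base, "W")
    print("half frequency: ", dynamic_power(C, VDD, ALPHA, F_CLK / 2), "W")
    print("half voltage:   ", dynamic_power(C, VDD / 2, ALPHA, F_CLK), "W")
    print("half activity:  ", dynamic_power(C, VDD, ALPHA / 2, F_CLK), "W")
```

Halving the frequency, capacitance or activity halves the power, while halving the supply voltage cuts it to a quarter because Vdd enters squared.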

Low Power Approaches in FPGAs

Ways to Reduce Dynamic Power
  Frequency reduction
  Voltage scaling
  Capacitance reduction
    Input capacitance of the fan-out gates
    Capacitance associated with programmable interconnects
    Parasitic capacitance of the gate
  Switching activity reduction
  Switched capacitance reduction
  Resource utilization reduction

Using Built-in Macro Functions for Low Power
  Idea: alternative techniques that use fewer routing resources than the traditional techniques
  1- Low-power implementation of register files

  2- Low-power implementation of shift registers
     [Figure: Xilinx SRLUT input and output ports]
  3- Low-power implementation of multipliers and accumulators
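The shift-register macro in item 2 can be pictured as a LUT whose storage cells are chained, so that a whole shift register fits inside one logic element instead of a string of flip-flops joined by routing. The behavioural model below is a rough sketch: the class name, the default depth of 16, and the method names (data in `d`, clock enable `ce`, tap address `addr`) are assumptions for illustration rather than the ports shown on the slide.

```python
class ShiftRegisterLUT:
    """Behavioural model of a LUT used as an addressable shift register."""

    def __init__(self, depth=16):
        # bits[0] is the most recently shifted-in value.
        self.bits = [0] * depth

    def clock(self, d, ce=1):
        """One clock edge: shift in d when the clock enable is active."""
        if ce:
            self.bits = [d] + self.bits[:-1]

    def q(self, addr):
        """Read the tap selected by addr (0 = newest bit)."""
        return self.bits[addr]


if __name__ == "__main__":
    srl = ShiftRegisterLUT(depth=4)
    for bit in (1, 0, 1):
        srl.clock(bit)
    print([srl.q(a) for a in range(4)])
```

Selecting the tap with an address gives a variable-length delay line without extra logic, which is where the routing and power saving comes from.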

Virtex 5 Low Power
  65 nm CMOS process
    Potentially more leakage current
    Solved by triple-oxide process technology
    Gain in dynamic power
  Many hard IP blocks
    DSP slices

Virtex 5 Triple Oxide
  65 nm process -> more static power
  Tens of millions of static configuration transistors

Virtex 5 Triple Oxide (contd)
  3 oxide thicknesses
    Thin oxide for the high-speed parts
    Thick oxide for the 3.3 V I/O
    Mid-thickness oxide (midox) for the static configuration parts

Virtex 5 hard IP blocks
  RocketIO: low-power serial I/O hard IP block
    SATA (Serial ATA)
    Gigabit Ethernet
    PCI Express
    Consumes 100 mW at 3.2 Gbps
  Tri-mode Ethernet MAC (10/100/1000)

References
  The Design Warrior's Guide to FPGAs
  Low Power FPGA Design Techniques for Embedded Systems (PhD thesis by Anurag Tiwari)
  WWW.XILINX.COM
  Architecture of FPGAs and CPLDs: A Tutorial, by Stephen Brown and Jonathan Rose, Department of Electrical and Computer Engineering, University of Toronto
  Peter Nilsson: slides of Advanced Digital IC Design

Arnaud Taffanel - Peyman Pouyan