FPGAs for High-Performance DSP Applications




White Paper
Altera Corporation
May 2005, ver. 1.1 (WP-041905-1.1)

This white paper compares the performance of DSP applications in Altera FPGAs with popular DSP processors as well as competitive FPGA offerings. With higher performance, you can easily time-division-multiplex your DSP design to increase the number of processing channels, reducing the overall cost of your system. Table 1 shows the performance advantages Altera offers over other silicon solutions for DSP systems.

Table 1. Altera DSP Performance Advantage

Comparison Category                                                    Altera Performance Advantage
Altera FPGAs vs. DSP processors                                        10× DSP processing power per dollar
High-performance FPGAs: Altera's Stratix II vs. Xilinx's Virtex-4      Up to 1.8× and on average 1.2× higher performance
Low-cost FPGAs: Altera's Cyclone II vs. Xilinx's Spartan-3             Up to 2× and on average 1.5× higher performance

Figure 1 compares design performance in Altera Stratix II and Cyclone II devices to Xilinx Virtex-4 and Spartan-3 devices, respectively.

Figure 1. DSP Proprietary IP & Open Core Results Comparison

The Stratix II devices achieved an fMAX of over 350 MHz in 9 of the 17 designs, and two FIR designs exceeded 400 MHz. In comparison, only 2 of the 17 designs in Virtex-4 devices operated above 350 MHz.

The Cyclone II devices achieved an fMAX of over 200 MHz in 9 of the 17 designs, and one FIR design exceeded 300 MHz. None of the 17 designs in Spartan-3 devices operated above 200 MHz.

Performance Comparison Metrics

There are many ways to compare the performance of different DSP solutions, and each provides a different level of accuracy. The following are three ways to compare DSP performance.

Embedded Multipliers Performance: This is a simplistic method for comparing relative DSP performance. It does not take into account the supporting architecture surrounding the embedded multipliers, or the complexity and performance of the overall DSP design. This method is the least accurate of the three.

DSP IP Benchmarks: This method is a more accurate performance comparison between different silicon solutions because it measures the performance of popular functional operations that are integral to many DSP designs. Finite Impulse Response (FIR) filtering and Fast Fourier Transforms (FFT) are two of the most common DSP IP benchmarks.

Application Level Benchmarks: This method precisely measures the performance of a particular silicon solution when implementing a specific application. An example is the benchmarking results from Berkeley Design Technology, Inc. (BDTI).

The performance comparisons in this white paper use DSP IP benchmarks and application level benchmarks. The DSP IP performance data is based on both open and proprietary IP cores, comparing Altera's Stratix II and Cyclone II FPGAs with Xilinx's Virtex-4 and Spartan-3 devices, respectively. The application level benchmark data is based on real DSP systems and compares Altera's first generation Stratix FPGAs against popular DSP processors.

BDTI Benchmarks - FPGA vs. DSP Processor

Berkeley Design Technology, Inc. (BDTI) is the leading provider of independent DSP benchmarks and publishes a periodic analysis, FPGAs for DSP, comparing FPGA performance with common DSP processors. The latest benchmark, based on an orthogonal frequency division multiplexing (OFDM) system, shows that Altera's first generation Stratix FPGAs provide over 95% cost reduction per channel compared to other DSP processors (see Table 2).

Table 2. BDTI Benchmark Results on OFDM System Comparing Stratix FPGAs & Other DSP Processors

                   DSP A    DSP B    Altera Stratix EP1S20-6    Altera Stratix EP1S80-6
Channels           <0.2     ~0.7     ~20                        ~60
Cost (1 ku) (1)    ~$15     ~$210    $120                       $600
Cost/channel       ~$100    ~$300    ~$6                        ~$10

Note to Table 2:
(1) As of the second quarter of 2005. Results from FPGAs for DSP and unpublished benchmarks. Results © 2005 BDTI.
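The cost-per-channel row of Table 2 is simple arithmetic on the two rows above it. The following is a minimal sketch of that calculation, not BDTI's method: the dictionary layout and function names are illustrative assumptions, and the "~" and "<" qualifiers from the table are dropped, so the results are only approximate.

```python
# Values copied from Table 2. The "~" and "<" qualifiers are dropped,
# so every result below is approximate. Names are illustrative only.
TABLE2 = {
    "DSP A": {"channels": 0.2, "cost_usd": 15.0},    # printed as "<0.2" channels
    "DSP B": {"channels": 0.7, "cost_usd": 210.0},
    "Stratix EP1S20-6": {"channels": 20.0, "cost_usd": 120.0},
    "Stratix EP1S80-6": {"channels": 60.0, "cost_usd": 600.0},
}


def cost_per_channel(device: str) -> float:
    """Device cost divided by the number of OFDM channels it supports."""
    row = TABLE2[device]
    return row["cost_usd"] / row["channels"]


def reduction_percent(fpga_cost: float, dsp_cost: float) -> float:
    """Percentage cost-per-channel reduction of the FPGA relative to the DSP."""
    return 100.0 * (1.0 - fpga_cost / dsp_cost)


if __name__ == "__main__":
    for name in TABLE2:
        print(f"{name}: about ${cost_per_channel(name):.0f} per channel")
    # Reduction computed from the printed cost/channel figures (~$6 vs. ~$300).
    print(f"EP1S20-6 vs. DSP B: about {reduction_percent(6.0, 300.0):.0f}% lower")
```

Because the DSP A channel count is printed as an upper bound ("<0.2"), the value computed for DSP A is a lower bound on its cost per channel, which is consistent with the larger figure printed in the table.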

OFDM Receiver System Information

The benchmarked OFDM receiver system uses algorithms ranging from table look-ups to MAC-intensive transforms. Data sizes range from 4 to 16 bits, and data rates range from 40 to 320 Mbps. Data includes real and complex values. See Figure 2.

Figure 2. OFDM System Block Diagram

Input and output precision is 8 bits. The FIR filter in this design is a 127-tap complex FIR with real coefficients, and the FFT is a 256-point complex FFT with input and output in natural order. The slicer is a QAM-256 demapper. A soft-decision Viterbi decoder is used in this design.

For even higher performance, based on benchmark results using real customer designs, Altera's Stratix II FPGAs offer an average of 50% higher performance than Stratix FPGAs. See the Stratix II Performance & Logic Efficiency Analysis White Paper for more details.

FPGA vs. FPGA

DSP IP performance benchmarks compare both high-performance, high-density FPGAs and low-cost FPGAs. The high-performance, high-density FPGA analysis compares Altera Stratix II FPGAs and Xilinx Virtex-4 FPGAs. The low-cost FPGA analysis compares Altera Cyclone II FPGAs and Xilinx Spartan-3 FPGAs. The DSP IP performance benchmark uses Altera and Xilinx proprietary IP cores and open cores from www.opencores.org.

Benchmarking Methodology & Setup

Benchmarking FPGA performance is a very complex task. A poor benchmarking process can produce inconclusive and incorrect results. Altera has invested significantly to develop a rigorous and scientific benchmarking methodology that is endorsed by industry experts as a meaningful and accurate way to measure FPGA performance. For the detailed benchmarking methodology, refer to the FPGA Performance Benchmarking Methodology White Paper. Table 3 shows the benchmark setup.

Table 3. Benchmark Setup

FPGA Category             FPGA Family          Speed Grade     Synthesis Tool,         Synthesis Tool,     Place-&-Route Tool
                                                               Proprietary IP Cores    Open Cores
High-Performance FPGAs    Altera Stratix II    Fastest (-3)    QIS (1), (2)            Synplify Pro 8.0    Quartus II version 5.0
                          Xilinx Virtex-4      Fastest (-12)   XST (1), (2)            Synplify Pro 8.0    ISE 7.1i Service Pack 1
Low-Cost FPGAs            Altera Cyclone II    Fastest (-6)    QIS (1), (2)            Synplify Pro 8.0    Quartus II version 5.0
                          Xilinx Spartan-3     Fastest (-5)    XST (1), (2)            Synplify Pro 8.0    ISE 7.1i Service Pack 1

Notes to Table 3:
(1) QIS: Quartus Integrated Synthesis; XST: Xilinx Synthesis Technology.
(2) The FPGA vendor's synthesis tools are used to compile proprietary cores because these cores are generated netlists and the tool is only responsible for synthesizing the core wrapper.

Proprietary IP & Open Cores

Proprietary IP cores are cores generated from Altera's MegaWizard and Xilinx's CORE Generator tools. For the proprietary IP core comparison, Altera used three types of common DSP IP cores with a total of nine designs:

FIR filters
FFT
Forward Error Correction (FEC)

These IP cores are generated from each FPGA vendor's tool and benchmarked without further manual optimization.

For the open core comparison, Altera selected and benchmarked six different DSP-related open IP cores from www.opencores.org. A core was chosen if its popularity statistic on that web site was greater than 10%. In addition, the complex FFT core was chosen because it is commonly found in DSP designs.

The selected open cores are written in generic HDL code, except for the use of FPGA-specific primitives in the original designs, such as instantiations of embedded memory blocks and multipliers. To allow the compilation of such designs for different FPGAs and to provide a fair comparison, FPGA-specific primitives in each design are converted to use the embedded features of a specific FPGA to achieve the best performance. After the FPGA-specific primitives are converted, the open cores are benchmarked without further manual optimization to keep them as close as possible to their original state. More information on both the proprietary IP and open cores is available in the appendix.

High-Performance FPGA Proprietary IP & Open Core Comparison

For high-performance and high-density FPGAs, Altera's Stratix II family offers up to 1.8× higher performance, and an average of 1.2× higher performance, than Xilinx Virtex-4 FPGAs. See Figure 3 for a relative performance comparison and Table 4 for detailed performance data for the Stratix II and Virtex-4 families.

Modern FPGAs embed dedicated multipliers to increase the speed of multiply-accumulate operations that are essential for many DSP designs. However, the best system performance relies on more than raw multiplier speed. It is critical to couple these multipliers with a complementary logic structure and routing fabric of matching performance. The Stratix II family seamlessly integrates DSP blocks that operate at up to 450 MHz with high-performance adaptive logic modules (ALMs) and routing fabric to offer the highest system performance for your DSP designs.

As shown in Figure 1, the Stratix II device family operated at over 350 MHz in 9 of the 17 designs, and two FIR designs exceeded 400 MHz. In comparison, only 2 of the 17 designs in Virtex-4 devices exceeded 350 MHz, well under the performance claimed in the Virtex-4 data sheet. This shows that high system performance can only be achieved by an intelligent combination of embedded features and fabric.

Figure 3. Stratix II vs. Virtex-4 Proprietary IP & Open Core Relative Performance Comparison

Table 4. Detailed Stratix II vs. Virtex-4 DSP Proprietary IP & Open Core Benchmark Data

DSP IP Category                             Design                        Stratix II    Virtex-4    Stratix II /    Category
                                                                          (MHz)         (MHz)       Virtex-4        Average
FPGA Embedded DSP Block Based FIR Filter    FIR1                          368           306         1.20            1.20
                                            FIR2                          376           333         1.13
                                            FIR3                          450           341         1.32
                                            FIR4                          406           322         1.26
                                            FIR5                          368           334         1.10
FFT                                         FFT1                          389           293         1.33            1.19
                                            FFT2                          393           370         1.06
Forward Error Correction (FEC)              Reed Solomon                  284           196         1.45            1.20
                                            Viterbi                       229           231         0.99
Open Cores                                  AES (Rijndael)                231           222         1.04            1.04
                                            CORDIC                        374           366         1.02            1.02
                                            Radix 4 Complex FFT (CFFT)    340           270         1.26            1.26
                                            Simple FM Receiver (FM)       177           99          1.78            1.78
                                            VCS-DCT                       231           237         0.97            1.10
                                            VCS Huffman Decoder           276           232         1.19
                                            VCS Huffman Encoder           392           344         1.14
                                            VGA/LCD Controller            269           246         1.09            1.09
Average                                                                                                             1.19

Low-Cost FPGA Proprietary IP & Open Core Comparison

Altera's low-cost Cyclone II FPGAs offer up to 2× higher performance, and an average of 1.5× higher performance, than the Xilinx Spartan-3 family. Based on the benchmarked data, the Cyclone II device family operated at over 200 MHz in 9 of the 17 designs, and one FIR design exceeded 300 MHz. None of the 17 designs in Spartan-3 devices operated above 200 MHz. In addition, Cyclone II FPGAs outperform Spartan-3 devices in all designs benchmarked. This performance advantage can directly translate to a higher channel count or lower cost for typical designs. Figure 4 shows the relative performance comparison between Cyclone II and Spartan-3 FPGAs. Table 5 shows detailed performance data for Cyclone II and Spartan-3 FPGAs.
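As a rough illustration of how a higher fMAX can translate to a higher channel count through time-division multiplexing, the sketch below assumes an idealized datapath that processes one sample per clock cycle and is shared round-robin between channels. The function name and the 10 MHz per-channel sample rate are hypothetical and do not come from the white paper; the two clock rates are the FIR2 results from Table 5.

```python
def tdm_channels(fmax_mhz: float, sample_rate_mhz: float) -> int:
    """Channels one time-shared datapath can serve.

    Idealized assumption: the datapath handles one sample per clock
    cycle, so it can be shared by floor(fmax / sample_rate) channels.
    """
    if fmax_mhz <= 0 or sample_rate_mhz <= 0:
        raise ValueError("clock and sample rates must be positive")
    return int(fmax_mhz // sample_rate_mhz)


# Hypothetical per-channel sample rate, chosen only for illustration.
SAMPLE_RATE_MHZ = 10.0

if __name__ == "__main__":
    # FIR2 clock rates from Table 5: Cyclone II 314 MHz, Spartan-3 186 MHz.
    print("Cyclone II:", tdm_channels(314, SAMPLE_RATE_MHZ), "channels")
    print("Spartan-3: ", tdm_channels(186, SAMPLE_RATE_MHZ), "channels")
```

Under these assumptions the faster device serves 31 channels and the slower one 18 from a single filter instance. A real design would also have to account for per-channel state storage and multiplexing overhead, which this sketch ignores.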

Figure 4. Cyclone II vs. Spartan-3 Proprietary DSP IP Core Relative Performance Comparison

Table 5. Detailed Cyclone II vs. Spartan-3 DSP Proprietary IP & Open Core Benchmark Data

DSP IP Category                             Design                     Cyclone II    Spartan-3    Cyclone II /    Category
                                                                       (MHz)         (MHz)        Spartan-3       Average
FPGA Embedded DSP Block Based FIR Filter    FIR1                       258           172          1.50            1.40
                                            FIR2                       314           186          1.68
                                            FIR3                       208           186          1.12
                                            FIR4                       209           154          1.36
                                            FIR5                       136           (1)          (1)
FFT                                         FFT1                       211           144          1.46            1.32
                                            FFT2                       206           174          1.19
Forward Error Correction (FEC)              Reed Solomon               197           100          1.97            1.76
                                            Viterbi                    172           109          1.57
Open Cores                                  AES (Rijndael)             147           125          1.18            1.18
                                            CORDIC                     246           175          1.40            1.40
                                            Radix 4 Complex FFT        206           155          1.33            1.33
                                            Simple FM Receiver (FM)    108           50           2.15            2.15
                                            VCS-DCT                    1.66          96           1.72            1.55
                                            VCS-Huffman Decoder        183           128          1.43
                                            VCS-Huffman Encoder        266           178          1.50
                                            VGA/LCD Controller         173           118          1.16            1.46
Average                                                                                                           1.48

Note to Table 5:
(1) The Spartan-3 family cannot support the required number of dedicated multipliers for this design.
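The white paper does not state how the averages in Table 5 are computed. As an assumption, the sketch below uses a geometric mean, a common choice for averaging performance ratios. Applied to the printed per-design ratios, it reproduces the printed category averages for the three proprietary IP categories, and applied to the nine printed category averages it reproduces the overall average. The names are illustrative only.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Per-design Cyclone II / Spartan-3 ratios as printed in Table 5.
# FIR5 is omitted because the table gives no Spartan-3 result for it.
RATIOS = {
    "FIR": [1.50, 1.68, 1.12, 1.36],
    "FFT": [1.46, 1.19],
    "FEC": [1.97, 1.57],
}

# The nine category averages as printed in Table 5, top to bottom.
PRINTED_CATEGORY_AVERAGES = [1.40, 1.32, 1.76, 1.18, 1.40, 1.33, 2.15, 1.55, 1.46]

if __name__ == "__main__":
    for category, ratios in RATIOS.items():
        print(f"{category}: {geometric_mean(ratios):.2f}")
    print(f"Overall: {geometric_mean(PRINTED_CATEGORY_AVERAGES):.2f}")
```

Averaging the category averages, rather than all individual designs, keeps a category with many designs (such as the FIR filters) from dominating the overall figure.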

Conclusion

Based on the benchmarking results from BDTI as well as Altera's rigorous benchmarking methodology, Stratix II and Cyclone II FPGAs provide a performance advantage over both popular DSP processors and the competing FPGAs. High system performance for DSP applications cannot be achieved by simply embedding dedicated multipliers; it is the aggregate result of high-performance multipliers and a performance-matched logic structure and routing architecture, as implemented in Stratix II FPGAs. In addition, Altera's Quartus II development software and DSP Builder provide a simple way to access the DSP performance in Stratix II and Cyclone II FPGAs without time-consuming manual optimization.

Altera devices provide, on average, 10× more DSP processing power per dollar than the industry's most widely used DSP processor solutions.
Altera's high-density Stratix II FPGAs offer up to 1.8× and an average of 1.2× higher performance than Xilinx's Virtex-4 family.
Altera's low-cost Cyclone II FPGAs offer up to 2× and an average of 1.5× higher performance than Xilinx's Spartan-3 family.

Higher DSP performance directly translates to cost savings in typical designs by increasing time-division multiplexing and, therefore, increasing the total number of processing channels available in your system.

Altera offers a comprehensive DSP solution consisting of a complete integrated software environment, performance-optimized devices, DSP intellectual property (IP) cores, development kits, reference designs, and customer training. For more information, visit www.altera.com/dsp.

Appendix

Proprietary DSP IP Core Information

FPGA Embedded DSP Block Based FIR Filter (Altera v.3.2.1, Xilinx v.5.1)
Description & Altera MegaCore IP Parameters

Design    Taps    Clock/Output    Coefficient Width    Data Width    Channel    Coefficient Symmetry
FIR1      128     64              16                   16            1          Yes
FIR2      128     64              8                    8             1          Yes
FIR3      128     16              8                    8             1          Yes
FIR4      128     4               8                    8             1          Yes
FIR5      128     1               8                    8             1          Yes

FFT (Altera v.2.1.2, Xilinx v.3.1)

Design    Arch.        Points    Data Precision    Twiddle    Engine    Throughput Engine #    Complex Multiplier
FFT1      Burst        1024      16-bit            16-bit     Quad      1                      Standard
FFT2      Streaming    1024      16-bit            16-bit     Quad      1                      Standard

Reed Solomon Decoder (Altera v.3.6.0, Xilinx v.5.1)

Design          Pre Setting     Decoding      Key Size    Bit/Symbol    Symbol/Codeword    Check Symbol/Codeword
Reed Solomon    DVB Standard    Continuous    Half        8             204                16

Viterbi Decoder (Altera v.4.2.0, Xilinx v.5.0)

Design     Architecture    Soft Width    Constraint Length    Trace Back
Viterbi    Parallel        3             7                    66

DSP Open Core Information

Core ID    Core                        Original URL
AES        AES (Rijndael)              www.opencores.org/projects.cgi/web/aes_core
CORDIC     CORDIC                      www.opencores.org/projects.cgi/web/cordic/overview
FM         Simple FM Receiver          www.opencores.org/projects.cgi/web/simple_fm_receiver
VGA        VGA/LCD Controller          www.opencores.org/projects.cgi/web/vga_lcd
VCS        Video Compression System    www.opencores.org/projects.cgi/web/video_systems
CFFT       Radix 4 Complex FFT         www.opencores.org/projects.cgi/web/cfft

References

Stratix II Performance & Logic Efficiency White Paper
FPGA Performance Benchmarking Methodology White Paper

For more information on Stratix II FPGA performance, see the Altera web site (www.altera.com/alterazone).

101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
www.altera.com

Copyright © 2005 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.