8. Hardware Acceleration and Coprocessing
July 2011

This chapter discusses how you can use hardware accelerators and coprocessing to create more efficient, higher throughput designs in SOPC Builder. This chapter discusses the following topics:

- Accelerating Cyclic Redundancy Checking (CRC)
- Creating Nios II Custom Instructions
- Using the C2H Compiler
- Creating Multicore Designs
- Pre- and Post-Processing
- Replacing State Machines

Hardware Acceleration

Hardware accelerators implemented in FPGAs offer a scalable solution for performance-limited systems. Other alternatives for increasing system performance include choosing higher performance components or increasing the system clock frequency. Although these other solutions are effective, in many cases they lead to additional cost, power, or design time.

Accelerating Cyclic Redundancy Checking (CRC)

CRC is significantly more efficient in hardware than software; consequently, you can improve the throughput of your system by implementing a hardware accelerator for CRC. In addition, by eliminating CRC from the tasks that the processor must run, the processor has more bandwidth to accomplish other tasks.

Figure 8-1 illustrates a system in which a Nios II processor offloads CRC processing to a hardware accelerator. In this system, the Nios II processor reads and writes registers to control the CRC through the accelerator's Avalon Memory-Mapped (Avalon-MM) slave port. The CRC component's Avalon-MM master port reads data for the CRC check from memory.

This design example and the HDL files to implement it are fully explained in the SOPC Builder Component Development Walkthrough chapter of the SOPC Builder User Guide.

Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off. and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

Embedded Design Handbook, July 2011
Figure 8-1. A Hardware Accelerator for CRC
[Block diagram: the Nios II processor and the CRC hardware accelerator connect through an arbiter to memory.]

An alternative approach includes a dedicated DMA engine in addition to the Nios II processor. Figure 8-2 illustrates this design. In this system, the Nios II processor programs the DMA engine, which transfers data from memory to the CRC.

Figure 8-2. DMA and Hardware Accelerator for CRC
[Block diagram: Nios II processor, DMA engine, two arbiters, memory, and the CRC hardware accelerator.]

Although Figure 8-2 shows the DMA and CRC as separate blocks, you can combine them as a custom component which includes both an Avalon-MM master and slave port. You can import this component into your SOPC Builder system using the component editor.

To learn more about using the component editor, refer to the Component Editor chapter of the SOPC Builder User Guide. You can find additional examples of hardware acceleration on Altera's Embedded Processor Design Examples web page.
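To make the software cost of CRC concrete, the following is a minimal sketch of a bit-at-a-time CRC in C. The chapter does not say which CRC the design example computes, so the choice of the common CRC-32 (reflected polynomial 0xEDB88320) and the function name crc32_sw are assumptions made only for illustration. The inner loop runs eight shift-and-XOR steps per byte, which is the kind of work dedicated logic can do far more cheaply.

```c
#include <stddef.h>
#include <stdint.h>

/* Software CRC-32 (assumed variant: reflected polynomial 0xEDB88320,
 * initial value and final XOR of 0xFFFFFFFF). Hypothetical helper, not
 * taken from the Altera design example. */
uint32_t crc32_sw(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        /* Eight shift/XOR steps per input byte. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}
```

Calling crc32_sw on the nine ASCII bytes "123456789" returns 0xCBF43926, the standard check value for this CRC variant, which makes the routine a convenient reference when verifying an accelerator's result.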
Matching I/O Bandwidths

I/O bandwidth can have a large impact on overall performance. Low I/O bandwidth can cause a high-performance hardware accelerator to perform poorly when the dedicated hardware requires higher throughput than the I/O can support. You can increase the overall system performance by matching the I/O bandwidth to the computational needs of your system.

Typically, memory interfaces cause the most problems in systems that contain multiple processors and hardware accelerators. The following recommendations for interface design can maximize the throughput of your hardware accelerator:

- Match high performance memory and interfaces to the highest priority tasks your system must perform.
- Give high priority tasks a greater share of the I/O bandwidth if any memory or interface is shared.
- If you have multiple processors in your system, but only one of the processors provides real-time functionality, assign it a higher arbitration share.

Pipelining Algorithms

A common problem in systems with multiple Avalon-MM master ports is competition for shared resources. You can improve performance by pipelining the algorithm and buffering the intermediate results in separate on-chip memories. Figure 8-3 illustrates this approach. Two hardware accelerators write their intermediate results to on-chip memory. The third module writes the final result to an off-chip memory. Storing intermediate results in on-chip memories reduces the I/O throughput required of the off-chip memory. By using on-chip memories as temporary storage you also reduce read latency because on-chip memory has a fixed, low-latency access time.

Figure 8-3. Using On-Chip Memory to Achieve High Performance
[Block diagram: inside the FPGA, three processor/hardware accelerator stages are chained from the I/O through two dual-port on-chip memory buffers; the last stage writes to external memory.]

To learn more about the topics discussed in this section, refer to the System Interconnect Fabric for Memory-Mapped Interfaces and SOPC Builder Memory Subsystem Development Walkthrough chapters of the SOPC Builder User Guide. To learn more about optimizing memory design, refer to the Memory System Design chapter of the Embedded Design Handbook.
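The arbitration share recommendation under Matching I/O Bandwidths can be put into numbers with a small calculation. The sketch below assumes a simplified model in which every master requests the shared slave continuously and each share buys one transfer per arbitration round; the function name bandwidth_fraction is hypothetical and real systems will deviate from this idealized figure.

```c
#include <stddef.h>

/* Idealized fraction of a shared slave's bandwidth that the master at
 * `index` receives, assuming all masters request continuously and each
 * arbitration share corresponds to one transfer per round. */
double bandwidth_fraction(const unsigned *shares, size_t count, size_t index)
{
    unsigned total = 0;

    for (size_t i = 0; i < count; i++)
        total += shares[i];

    if (total == 0 || index >= count)
        return 0.0;

    return (double)shares[index] / (double)total;
}
```

Under these assumptions, giving the real-time processor 3 shares against 1 share for a second master yields 0.75 of the slave's bandwidth for the real-time processor and 0.25 for the other.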
Creating Nios II Custom Instructions

The Nios II processor employs a RISC architecture which can be expanded with custom instructions. The Nios II processor includes a standard interface that you can use to implement your own custom instruction hardware in parallel with the arithmetic logic unit (ALU).

All custom instructions have a similar structure. They include up to two data inputs and one data output, and optional clock, reset, mode, address, and status signals for more complex multicycle operations. If you need to add hardware acceleration that requires many inputs and outputs, a custom hardware accelerator with an Avalon-MM slave port is a more appropriate solution.

Custom instructions are blocking operations that prevent the processor from executing additional instructions until the custom instruction has completed. To avoid stalling the processor while your custom instruction is running, you can convert your custom instruction into a hardware accelerator with an Avalon-MM slave port. If you do so, the processor and custom peripheral can operate in parallel. Figure 8-4 illustrates the differences in implementation between a custom instruction and a hardware accelerator.

Figure 8-4. Implementation Differences between a Custom Instruction and Hardware Accelerator
[Block diagram: as a custom instruction, the operator takes Operand A and Operand B from the general purpose registers of the Nios II processor core and sits in parallel with the ALU. As a hardware accelerator, the same operator attaches to the system interconnect fabric through an Avalon-MM slave port: writedata[31:0] loads the Operand A and Operand B registers, which a write decoder enables according to address[0], and the result returns on readdata[31:0].]
Because custom instructions extend the Nios II processor's ALU, the logic must meet timing or the fMAX of the processor will suffer. As you add custom instructions to the processor, the ALU multiplexer grows in width as Figure 8-5 illustrates. This multiplexer selects the output from the ALU hardware (c[31:0] in Figure 8-5). Although you can pipeline custom instructions, you have no control over the automatically inserted ALU multiplexer. As a result, you cannot pipeline the multiplexer for higher performance.

Figure 8-5. Individual Custom Instructions
[Block diagram: inputs a[31:0] and b[31:0] feed the ALU operations (+, -, >>, <<, &) and four separate custom instructions (Custom 0 to Custom 3); the ALU multiplexer selects the output c[31:0].]
Instead of adding several custom instructions, you can combine the functionality into a single logic block as shown in Figure 8-6. When you combine custom instructions you use selector bits to select the required functionality. If you create a combined custom instruction, you must insert the multiplexer in your logic manually. This approach gives you full control over the multiplexer logic that generates the output. You can pipeline the multiplexer to prevent your combined custom instruction from becoming part of a critical timing path.

Figure 8-6. Combined Custom Instruction
[Block diagram: Custom 0 to Custom 3 sit inside one combined custom instruction block. A multiplexer driven by select[1:0] (1), followed by a register (D/Q), produces the block's output, which joins the ALU operations (+, -, >>, <<, &) at the ALU multiplexer driving c[31..0]. Inputs are a[31:0] and b[31:0].]

Note to Figure 8-6:
(1) The Nios II compiler calls the select wires to the multiplexer <n>.

With multiple custom instructions built into a logic block, you can pipeline the output if it fails timing. To combine custom instructions, each must have identical latency characteristics.
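The selector-bit idea can be modeled in software before any HDL is written. The following C sketch is a behavioral model only: the function name combined_ci and the four operations it multiplexes (byte swap, population count, saturating add, minimum) are hypothetical stand-ins for Custom 0 to Custom 3, chosen for illustration. The switch statement plays the role of the manually inserted multiplexer, and the low two bits of n play the role of select[1:0].

```c
#include <stdint.h>

/* Behavioral model of a combined custom instruction. The four operations
 * are illustrative placeholders, not functions from the handbook. */
uint32_t combined_ci(unsigned n, uint32_t a, uint32_t b)
{
    switch (n & 3u) {                 /* models select[1:0] */
    case 0: {                         /* Custom 0: byte swap of a */
        return (a >> 24) | ((a >> 8) & 0x0000FF00u) |
               ((a << 8) & 0x00FF0000u) | (a << 24);
    }
    case 1: {                         /* Custom 1: count set bits in a */
        uint32_t count = 0;
        while (a) {
            count += a & 1u;
            a >>= 1;
        }
        return count;
    }
    case 2: {                         /* Custom 2: saturating unsigned add */
        uint32_t sum = a + b;
        return (sum < a) ? 0xFFFFFFFFu : sum;
    }
    default:                          /* Custom 3: minimum of a and b */
        return (a < b) ? a : b;
    }
}
```

A model like this gives software a single entry point with a selector argument, which mirrors how one combined hardware block with its own multiplexer replaces four separately multiplexed custom instructions.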
Custom instructions are either fixed latency or variable latency. You can convert fixed latency custom instructions to variable latency by adding timing logic. Figure 8-7 shows the simplest method to implement this conversion by shifting the start bit by <n> clock cycles and logically ORing all the done bits.

Figure 8-7. Sequencing Logic for Mixed Latency Combined Custom Instruction
[Block diagram: Custom Instruction 0, with variable latency, drives done_0. Custom Instruction 1, with a fixed latency of 3 clock cycles, derives done_1 by passing start through three registers (D/Q). done_0 and done_1 are ORed to produce ci_done.]

Each custom instruction contains at least one custom instruction slave port, through which it connects to the ALU. A custom instruction slave is that slave port: the slave interface that receives the data input signals and returns the result. The custom instruction master is the master port of the processor that connects to the custom instruction.

For more information about creating and using custom instructions, refer to the Nios II Custom Instruction User Guide.

Using the C2H Compiler

You can use the Nios II C2H Compiler to compile your C source code into HDL synthesizable source code. SOPC Builder automatically places your hardware accelerator in your system. SOPC Builder automatically connects all the master ports to the necessary memories and connects the Nios II processor data master port to the accelerator slave port which is used to transfer data.
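As an illustration of the kind of C function such a flow starts from, the sketch below is a plain C kernel that reads an input buffer and a cosine lookup table and writes an output buffer through three separate pointers. It is an assumption-laden example: it implements an unnormalized one-dimensional 8-point DCT-II in Q14 fixed point, the names dct_fill_table and dct_1d are hypothetical, and it has not been checked against the C2H Compiler's supported language subset.

```c
#include <stdint.h>

#define DCT_N 8

/* cos(m*pi/16) for m = 0..8, scaled by 2^14 and rounded. */
static const int16_t COS_Q14[9] = {
    16384, 16069, 15137, 13623, 11585, 9102, 6270, 3196, 0
};

/* cos(m*pi/16) in Q14 for any m, using the symmetries of the cosine. */
static int16_t cos_q14(unsigned m)
{
    m %= 32u;                  /* 32 steps of pi/16 make one full period */
    if (m > 16u)
        m = 32u - m;           /* cos(2*pi - x) = cos(x) */
    if (m > 8u)
        return (int16_t)-COS_Q14[16u - m];   /* cos(pi - x) = -cos(x) */
    return COS_Q14[m];
}

/* Software fills the lookup table once: table[k*8 + n] = cos((2n+1)k*pi/16). */
void dct_fill_table(int16_t *table)
{
    for (unsigned k = 0; k < DCT_N; k++)
        for (unsigned n = 0; n < DCT_N; n++)
            table[k * DCT_N + n] = cos_q14((2u * n + 1u) * k);
}

/* Candidate kernel: each pointer could live in its own memory, so the
 * three access streams do not have to compete for one port. */
void dct_1d(const int16_t *in, const int16_t *table, int32_t *out)
{
    for (unsigned k = 0; k < DCT_N; k++) {
        int32_t acc = 0;
        for (unsigned n = 0; n < DCT_N; n++)
            acc += (int32_t)in[n] * table[k * DCT_N + n];
        out[k] = acc / 16384;  /* remove the Q14 scale factor */
    }
}
```

For a constant input of 10 in all eight samples, the kernel produces 80 in out[0] and 0 in the remaining coefficients, as expected for a signal with no variation.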
Choose the Nios II C2H Compiler instead of custom instructions when your algorithm requires access to memory. The C2H Compiler creates Avalon-MM masters that access memory. If your algorithm accesses several memories, the C2H Compiler creates a master per memory access, allowing you to benefit from the concurrent access feature of the system interconnect fabric. You can also use the C2H Compiler to create hardware accelerators that are non-blocking so that you can use the accelerator in parallel with other software functionality.

Figure 8-8. C2H Discrete Cosine Transform (DCT) Block Diagram
[Block diagram: the Nios II processor and the DCT hardware accelerator (generated by C2H) connect through four arbiters to the code and data memory, the input buffer memory, the cosine table memory, and the output buffer memory.]

In Figure 8-8 the two-dimensional DCT algorithm is accelerated to offload a Nios II processor. The DCT algorithm requires access to input and output buffers as well as a cosine lookup table. Assuming that each resides in a separate memory, the hardware accelerator can access all three memories concurrently.

For more information, refer to the Nios II C2H Compiler User Guide and the Optimizing C2H Compiler Results chapter in the Embedded Design Handbook. There are also C2H examples available on the Altera website.

Coprocessing

Partitioning system functionality between a Nios II processor and hardware accelerators or between multiple Nios II processors in your FPGA can help you control costs. The following sections demonstrate how you can use coprocessing to create high performance systems.
Creating Multicore Designs

Multicore designs combine multiple processor cores in a single FPGA to create a higher performance computing system. Typically, the processors in a multicore design can communicate with each other. Designs including the Nios II processor can implement inter-processor communication, or the processors can operate autonomously.

When a design includes more than one processor you must partition the algorithm carefully to make efficient use of all of the processors. The following example includes a Nios II-based system that performs video over IP, using a network interface to supply data to a discrete DSP processor. The original design overutilizes the Nios II processor. The system performs the following steps to transfer data from the network to the DSP processor:

1. The network interface signals when a full data packet has been received.
2. The Nios II processor uses a DMA engine to transfer the packet to a dual-port on-chip memory buffer.
3. The Nios II processor processes the packet in the on-chip memory buffer.
4. The Nios II processor uses the DMA engine to transfer the video data to the DSP processor.

In the original design, the Nios II processor is also responsible for communications with the following peripherals that include Avalon-MM slave ports:

- Hardware debug interface
- User interface
- Power management
- Remote control receiver

Figure 8-9 illustrates this design.

Figure 8-9. Over-Utilized Video System
[Block diagram: a single Nios II/f core serves both planes. Data plane: network interface, DMA, dual-port on-chip memory buffer, external DSP interface, and SDRAM. Control plane: hardware debug interface, user interface, power management, and remote control receiver.]
Adding a second Nios II processor to the system allows the workload to be divided so that one processor handles the network functionality and the other handles the control interfaces. Figure 8-10 illustrates the revised design.

Because the revised design has two processors, you must create two software projects; however, each of these software projects handles fewer tasks and is simpler to create and maintain. You must also create a mechanism for inter-processor communication. The inter-processor communication in this system is relatively simple and is justified by the system performance increase.

For more information about designing hardware and software for inter-processor communication, refer to the Creating Multiprocessor Nios II Systems Tutorial and the Peripherals section of the Embedded Peripherals IP User Guide. Refer to the Nios II Processor Reference Handbook for complete information about this soft core processor. A Nios II Multiprocessor Design Example is available on the Altera website.

Figure 8-10. High Performance Video System
[Block diagram: a Nios II/f core serves the data plane (network interface, DMA, dual-port on-chip memory buffer, external DSP interface, and SDRAM) and a Nios II/e core serves the control plane (hardware debug interface, user interface, power management, and remote control receiver).]

In Figure 8-10, the second Nios II processor added to the system performs primarily low-level maintenance tasks; consequently, the Nios II/e core is used. The Nios II/e core implements only the most basic processor functionality in an effort to trade off performance for a small hardware footprint. This core is approximately one-third the size of the Nios II/f core.

To learn more about the three Nios II processor cores, refer to the Nios II Core Implementation Details chapter in the Nios II Processor Reference Handbook.
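The inter-processor communication mechanism mentioned above can be as simple as a message queue placed in memory that both processors can reach. The sketch below is a generic single-producer, single-consumer ring buffer in C; it is not the mechanism of the Altera design example, and the names mailbox, mbox_post, and mbox_fetch are hypothetical. It assumes exactly one processor posts and one processor fetches, and that the shared structure sits in memory whose accesses are not held back in a data cache, an assumption the code itself cannot enforce.

```c
#include <stdint.h>

#define MBOX_SLOTS 8u   /* number of message slots, a power of two */

/* Shared between two processors: head is written only by the producer,
 * tail only by the consumer, so no lock is needed for this one-to-one case. */
typedef struct {
    volatile uint32_t head;
    volatile uint32_t tail;
    volatile uint32_t slot[MBOX_SLOTS];
} mailbox;

/* Producer side: returns 1 if the message was queued, 0 if the queue is full. */
int mbox_post(mailbox *m, uint32_t msg)
{
    uint32_t head = m->head;

    if (head - m->tail == MBOX_SLOTS)
        return 0;
    m->slot[head % MBOX_SLOTS] = msg;   /* write the data first */
    m->head = head + 1u;                /* then publish it */
    return 1;
}

/* Consumer side: returns 1 and stores a message, or 0 if the queue is empty. */
int mbox_fetch(mailbox *m, uint32_t *msg)
{
    uint32_t tail = m->tail;

    if (tail == m->head)
        return 0;
    *msg = m->slot[tail % MBOX_SLOTS];
    m->tail = tail + 1u;
    return 1;
}
```

In the two-processor video system, the control-plane processor could post commands such as a channel change and the data-plane processor could fetch them between packets; systems with more than one producer or consumer need a hardware mutex or similar arbitration instead.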
Pre- and Post-Processing

The high performance video system illustrated in Figure 8-10 distributes the workload by separating the control and data planes in the hardware. Figure 8-11 illustrates a different approach. All three stages of a DSP workload are implemented in software running on a discrete processor. This workload includes the following stages:

- Input processing: typically removing packet headers and error correction information
- Algorithmic processing and error correction: processing the data
- Output processing: typically adding error correction, converting the data stream to packets, and driving data to I/O devices

By offloading the processing required for the inputs or outputs to an FPGA, the discrete processor has more computation bandwidth available for the algorithmic processing.

Figure 8-11. Discrete Processing Stages
[Block diagram: inside the discrete processor, inputs pass through input processing, algorithmic processing, and output processing to the outputs.]
If the discrete processor requires more computational bandwidth for the algorithm, dedicated coprocessing can be added. Figure 8-12 shows examples of dedicated coprocessing at each stage.

Figure 8-12. Pre-, Dedicated, and Post-Processing
[Block diagram, three examples. Example 1: inputs pass through FPGA pre-processing to the discrete processor and then to the outputs. Example 2: inputs pass through the discrete processor to the outputs, with FPGA dedicated processing attached to the discrete processor. Example 3: inputs pass through the discrete processor and then FPGA post-processing to the outputs.]

Replacing State Machines

You can use the Nios II processor to implement scalable and efficient state machines. When you use dedicated hardware to implement state machines, each additional state or state transition increases the hardware utilization. In contrast, adding the same functionality to a state machine that runs on the Nios II processor only increases the memory utilization of the Nios II processor.

A key benefit of using Nios II for state machine implementation is the reduction of complexity that results from using software instead of hardware. A processor, by definition, is a state machine that contains many states. These states can be stored in either the processor register set or the memory available to the processor; consequently, state machines that would not fit in the footprint of an FPGA can be created using memory connected to the Nios II processor.

When designing state machines to run on the Nios II processor, you must understand the throughput requirements of your system. Typically, a state machine consists of decisions (transitions) and actions (outputs) based on stimuli (inputs). The processor you have chosen determines the speed at which these operations take place. The state machine speed also depends on the complexity of the algorithm being implemented. You can subdivide the algorithm into separate state machines using software modularity or even multiple Nios II processor cores that interact with each other.
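The point that a software state machine grows in memory rather than in logic can be seen in a table-driven implementation. The following C sketch is illustrative only: the three states, three events, and the names state_t, event_t, and fsm_step are hypothetical. Decisions (transitions) are rows of a constant table, so adding a state or a transition adds table entries instead of hardware.

```c
/* Hypothetical states and stimuli for a small controller. */
typedef enum { ST_IDLE, ST_RUN, ST_DONE, ST_COUNT } state_t;
typedef enum { EV_START, EV_FINISH, EV_RESET, EV_COUNT } event_t;

/* Transition table: next_state[current state][event]. */
static const state_t next_state[ST_COUNT][EV_COUNT] = {
    /*             EV_START  EV_FINISH  EV_RESET */
    [ST_IDLE] = {  ST_RUN,   ST_IDLE,   ST_IDLE },
    [ST_RUN]  = {  ST_RUN,   ST_DONE,   ST_IDLE },
    [ST_DONE] = {  ST_DONE,  ST_DONE,   ST_IDLE },
};

/* One decision: look up the next state for the given stimulus. */
state_t fsm_step(state_t current, event_t event)
{
    return next_state[current][event];
}
```

Each call to fsm_step costs one table lookup, so the achievable transition rate depends on the processor core and on the memory that holds the table, which is the throughput consideration described above.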
Low-Speed State Machines

Low-speed state machines are typically used to control peripherals. The Nios II/e processor pictured in Figure 8-10 could implement a low-speed state machine to control the peripherals.

Even though the Nios II/e core does not include a data cache, Altera recommends that the software accessing the peripherals use data cache bypassing. Doing so avoids potential cache coherency issues if the software is ever run on a Nios II/f core that includes a data cache.

For information about data cache bypass methods, refer to the Processor Architecture chapter of the Nios II Processor Reference Handbook.

State machines implemented in SOPC Builder require the following components:

- A Nios II processor
- Program and data memory
- Stimuli interfaces
- Output interfaces

The building blocks you use to construct a state machine in SOPC Builder are no different from those you would use if you were creating a state machine manually. One noticeable difference in the SOPC Builder environment is accessing the interfaces from the Nios II processor. The Nios II processor uses an Avalon-MM master port to access peripherals. Instead of accessing the interfaces using signals, you communicate via memory-mapped interfaces. Memory-mapped interfaces simplify the design of large state machines because managing memory is much simpler than creating numerous directly connected interfaces.

For more information about the Avalon-MM interface, refer to the Avalon Interface Specifications.

High-Speed State Machines

You should implement high-throughput state machines using a Nios II/f core. To maximize performance, focus on the I/O interfaces and memory types. The following recommendations on memory usage can maximize the throughput of your state machine:

- Use on-chip memory to store logic for high-speed decision making.
- Use tightly-coupled memory if the state machine must operate with deterministic latency. Tightly-coupled memory has the same access time as cache memory; consequently, you can avoid using cache memory and the cache coherency problems that might result.

For more information about tightly-coupled memory, refer to the Cache and Tightly-Coupled Memory chapter of the Nios II Software Developer's Handbook.
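Returning to the memory-mapped access used by low-speed state machines, the sketch below shows what reading a stimulus interface and writing an output interface can look like from C. The register layout (REG_STIMULUS, REG_OUTPUT) and the function names are hypothetical, and the base pointer stands in for a peripheral address that a real system would take from its generated system description. The volatile qualifier only stops the compiler from caching the value in a register; it should not be assumed to bypass the processor's data cache, which is why the cache bypass methods referenced above still apply.

```c
#include <stdint.h>

/* Hypothetical register offsets, in 32-bit words, of a simple peripheral. */
enum { REG_STIMULUS = 0, REG_OUTPUT = 1 };

/* Read one 32-bit register of a memory-mapped peripheral. */
static inline uint32_t reg_read(volatile uint32_t *base, unsigned reg)
{
    return base[reg];
}

/* Write one 32-bit register of a memory-mapped peripheral. */
static inline void reg_write(volatile uint32_t *base, unsigned reg, uint32_t value)
{
    base[reg] = value;
}

/* One pass of a low-speed control loop: copy bit 0 of the stimulus
 * register to bit 0 of the output register. */
void control_poll(volatile uint32_t *base)
{
    uint32_t stimulus = reg_read(base, REG_STIMULUS);

    reg_write(base, REG_OUTPUT, stimulus & 1u);
}
```

Because the peripheral is reached through a pointer, the same routine can be exercised on a workstation by passing the address of an ordinary two-element array in place of the hardware base address.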
Subdivided State Machines

Subdividing a hardware-based state machine into smaller, more manageable units can be difficult. If you choose to keep some of the state machine functionality in a hardware implementation, you can use the Nios II processor to assist it. For example, you may wish to use a hardware state machine for the high data throughput functionality and Nios II for the slower operations. If you have partitioned a complicated state machine into smaller, hardware-based state machines, you can use the Nios II processor for scheduling.

Document Revision History

Table 8-1 shows the revision history for this document.

Table 8-1. Document Revision History

Date     Version   Changes
July               Updated references and clarified meaning of custom instruction slave and custom instruction master.
June               Corrected Table of Contents.
March              Initial release.
1. Overview of Nios II Embedded Development
May 2011 NII52001-11.0.0 1. Overview o Nios II Embedded Development NII52001-11.0.0 The Nios II Sotware Developer s Handbook provides the basic inormation needed to develop embedded sotware or the Altera
1. Overview of Nios II Embedded Development
January 2014 NII52001-13.1.0 1. Overview o Nios II Embedded Development NII52001-13.1.0 The Nios II Sotware Developer s Handbook provides the basic inormation needed to develop embedded sotware or the
Applying the Benefits of Network on a Chip Architecture to FPGA System Design
Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.
2. Developing Nios II Software
2. Developing Nios II Sotware July 2011 ED51002-1.4 ED51002-1.4 Introduction This chapter provides in-depth inormation about sotware development or the Altera Nios II processor. It complements the Nios
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] NIOS II 1 1 What is Nios II? Altera s Second Generation
8. Exception Handling
8. Exception Handling February 2011 NII52006-10.1.0 NII52006-10.1.0 Introduction This chapter discusses how to write programs to handle exceptions in the Nios II processor architecture. Emphasis is placed
13. Publishing Component Information to Embedded Software
February 2011 NII52018-10.1.0 13. Publishing Component Information to Embedded Software NII52018-10.1.0 This document describes how to publish SOPC Builder component information for embedded software tools.
Qsys and IP Core Integration
Qsys and IP Core Integration Prof. David Lariviere Columbia University Spring 2014 Overview What are IP Cores? Altera Design Tools for using and integrating IP Cores Overview of various IP Core Interconnect
2. Getting Started with the Graphical User Interface
May 2011 NII52017-11.0.0 2. Getting Started with the Graphical User Interace NII52017-11.0.0 The Nios II Sotware Build Tools (SBT) or Eclipse is a set o plugins based on the Eclipse ramework and the Eclipse
Using the On-Chip Signal Quality Monitoring Circuitry (EyeQ) Feature in Stratix IV Transceivers
Using the On-Chip Signal Quality Monitoring Circuitry (EyeQ) Feature in Stratix IV Transceivers AN-605-1.2 Application Note This application note describes how to use the on-chip signal quality monitoring
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren Alexander Hahn Senior Field Application Engineer Lattice Semiconductor AGENDA Overview Mico32 Embedded Processor Development Tool Chain HW/SW
Using Altera MAX Series as Microcontroller I/O Expanders
2014.09.22 Using Altera MAX Series as Microcontroller I/O Expanders AN-265 Subscribe Many microcontroller and microprocessor chips limit the available I/O ports and pins to conserve pin counts and reduce
Nios II Software Developer s Handbook
Nios II Software Developer s Handbook Nios II Software Developer s Handbook 101 Innovation Drive San Jose, CA 95134 www.altera.com NII5V2-13.1 2014 Altera Corporation. All rights reserved. ALTERA, ARRIA,
Avalon Interface Specifications
Avalon Interface Specifications Subscribe MNL-AVABUSREF 101 Innovation Drive San Jose, CA 95134 www.altera.com TOC-2 Contents 1. Introduction to the Avalon Interface Specifications... 1-1 1.1 Avalon Properties
White Paper Increase Flexibility in Layer 2 Switches by Integrating Ethernet ASSP Functions Into FPGAs
White Paper Increase Flexibility in Layer 2 es by Integrating Ethernet ASSP Functions Into FPGAs Introduction A Layer 2 Ethernet switch connects multiple Ethernet LAN segments. Because each port on the
What is a System on a Chip?
What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex
White Paper Streaming Multichannel Uncompressed Video in the Broadcast Environment
White Paper Multichannel Uncompressed in the Broadcast Environment Designing video equipment for streaming multiple uncompressed video signals is a new challenge, especially with the demand for high-definition
Section II. Hardware Abstraction Layer
Section II. Hardware Abstraction Layer This section describes the Nios II hardware abstraction layer (HAL). It includes the ollowing chapters: Chapter 5, Overview o the Hardware Abstraction Layer Chapter
Qsys System Design Tutorial
2015.05.04 TU-01006 Subscribe This tutorial introduces you to the Qsys system integration tool available with the Quartus II software. This tutorial shows you how to design a system that uses various test
Architectures and Platforms
Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation
A performance analysis of EtherCAT and PROFINET IRT
A perormance analysis o EtherCAT and PROFINET IRT Conerence paper by Gunnar Prytz ABB AS Corporate Research Center Bergerveien 12 NO-1396 Billingstad, Norway Copyright 2008 IEEE. Reprinted rom the proceedings
Networking Remote-Controlled Moving Image Monitoring System
Networking Remote-Controlled Moving Image Monitoring System First Prize Networking Remote-Controlled Moving Image Monitoring System Institution: Participants: Instructor: National Chung Hsing University
7. Mentor Graphics PCB Design Tools Support
June 2012 QII52015-12.0.0 7. Mentor Graphics PCB Design Tools Support QII52015-12.0.0 This chapter discusses how the Quartus II sotware interacts with the Mentor Graphics I/O Designer sotware and the DxDesigner
1. Cyclone III Device Family Overview
1. Cyclone III Device Family Overview July 2012 CIII51001-2.4 CIII51001-2.4 Cyclone III device amily oers a unique combination o high unctionality, low power and low cost. Based on Taiwan Semiconductor
Video and Image Processing Design Example
Video and Image Processing Design Example AN-427-10.2 Application Note The Altera Video and Image Processing Design Example demonstrates the following items: A framework for rapid development of video
FPGAs for High-Performance DSP Applications
White Paper FPGAs for High-Performance DSP Applications This white paper compares the performance of DSP applications in Altera FPGAs with popular DSP processors as well as competitive FPGA offerings.
!!! Technical Notes : The One-click Installation & The AXIS Internet Dynamic DNS Service. Table of contents
Technical Notes: One-click Installation & The AXIS Internet Dynamic DNS Service Rev: 1.1. Updated 2004-06-01 1 Table o contents The main objective o the One-click Installation...3 Technical description
A Safety Methodology for ADAS Designs in FPGAs
A Safety Methodology for ADAS Designs in FPGAs WP-01204-1.0 White Paper This white paper discusses the use of Altera FPGAs in safety-critical Advanced Driver Assistance Systems (ADAS). It looks at the
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
Enhancing High-Speed Telecommunications Networks with FEC
White Paper Enhancing High-Speed Telecommunications Networks with FEC As the demand for high-bandwidth telecommunications channels increases, service providers and equipment manufacturers must deliver
USB-Blaster Download Cable User Guide
USB-Blaster Download Cable User Guide Subscribe UG-USB81204 101 Innovation Drive San Jose, CA 95134 www.altera.com TOC-2 Contents Introduction to USB-Blaster Download Cable...1-1 USB-Blaster Revision...1-1
Ping Pong Game with Touch-screen. March 2012
Ping Pong Game with Touch-screen March 2012 xz2266 Xiang Zhou hz2256 Hao Zheng rz2228 Ran Zheng yc2704 Younggyun Cho Abstract: This project is conducted using the Altera DE2 development board. We are aiming
Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
1. Cyclone IV FPGA Device Family Overview
March 2016 CYIV-51001-2.0 1. Cyclone IV FPGA Device Family Overview CYIV-51001-2.0 Altera s new Cyclone IV FPGA device amily extends the Cyclone FPGA series leadership in providing the market s lowest-cost,
Using the Agilent 3070 Tester for In-System Programming in Altera CPLDs
Using the Agilent 3070 Tester for In-System Programming in Altera CPLDs AN-628-1.0 Application Note This application note describes how to use the Agilent 3070 test system to achieve faster programming
Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip
Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana
Nios II-Based Intellectual Property Camera Design
Nios II-Based Intellectual Property Camera Design Third Prize Nios II-Based Intellectual Property Camera Design Institution: Participants: Instructor: Xidian University Jinbao Yuan, Mingsong Chen, Yingzhao
ModelSim-Altera Software Simulation User Guide
ModelSim-Altera Software Simulation User Guide 101 Innovation Drive San Jose, CA 95134 www.altera.com UG-01102-2.0 Document last updated for Altera Complete
Combinational-Circuit Building Blocks
Chapter 6 Combinational-Circuit Building Blocks Chapter Objectives In this chapter you will learn about: Commonly used combinational subcircuits
December 2002, ver. 1.0 Application Note 285. This document describes the Excalibur web server demonstration design and includes the following topics:
Excalibur Web Server Demonstration December 2002, ver. 1.0 Application Note 285 Introduction This document describes the Excalibur web server demonstration design and includes the following topics: Design
Digital Systems Design Based on DSP algorithms in FPGA for Fault Identification in Rotary Machines
Digital Systems Design Based on DSP algorithms in FPGA for Fault Identification in Rotary Machines Masamori Kashiwagi 1, Cesar da Costa 1 *, Mauro Hugo Mathias 1 1 Department of Mechanical Engineering, UNESP
MAX 10 Analog to Digital Converter User Guide
MAX 10 Analog to Digital Converter User Guide Subscribe Last updated for Quartus Prime Design Suite: 16.0 UG-M10ADC 101 Innovation Drive San Jose, CA 95134 www.altera.com TOC-2 Contents MAX 10 Analog to
Using Nios II Floating-Point Custom Instructions Tutorial
Using Nios II Floating-Point Custom Instructions Tutorial 101 Innovation Drive San Jose, CA 95134 www.altera.com TU-N2FLTNGPNT-2.0 Copyright 2010 Altera Corporation. All rights reserved. Altera, The Programmable
ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions
Assignment ELECTENG702 Advanced Embedded Systems Improving AES128 software for Altera Nios II processor using custom instructions October 1. 2005 Professor Zoran Salcic by Kilian Foerster 10-8 Claybrook
White Paper Video Surveillance Implementation Using FPGAs
White Paper Video Surveillance Implementation Using FPGAs Introduction Currently, the video surveillance industry uses analog CCTV cameras and interfaces as the basis of surveillance systems. These system components
Introduction to the Altera Qsys System Integration Tool. 1 Introduction. For Quartus II 12.0
Introduction to the Altera Qsys System Integration Tool For Quartus II 12.0 1 Introduction This tutorial presents an introduction to Altera's Qsys system integration tool, which is used to design digital
- Nishad Nerurkar. - Aniket Mhatre
- Nishad Nerurkar - Aniket Mhatre Single Chip Cloud Computer is a project developed by Intel. It was developed by Intel Lab Bangalore, Intel Lab America and Intel Lab Germany. It is part of a larger project,
Pre-tested System-on-Chip Design. Accelerates PLD Development
Pre-tested System-on-Chip Design Accelerates PLD Development March 2010 Lattice Semiconductor 5555 Northeast Moore Ct. Hillsboro, Oregon 97124 USA Telephone: (503) 268-8000 www.latticesemi.com 1 Pre-tested
COMPUTER HARDWARE. Input- Output and Communication Memory Systems
COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)
Low-Overhead Hard Real-time Aware Interconnect Network Router
Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy Department of Computer and Information Science University of Oregon Srinivas Devadas Department of Electrical Engineering
Altera Error Message Register Unloader IP Core User Guide
2015.06.12 Altera Error Message Register Unloader IP Core User Guide UG-01162 Subscribe The Error Message Register (EMR) Unloader IP core (altera unloader) reads and stores data from the hardened error
Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik
Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Contents Überblick: Aufbau moderner FPGA Einblick: Eigenschaften
Using the Altera Serial Flash Loader Megafunction with the Quartus II Software
Using the Altera Serial Flash Loader Megafunction with the Quartus II Software AN-370 The Altera Serial Flash Loader megafunction IP core is an in-system programming (ISP) solution for Altera serial configuration
Multicore Programming with LabVIEW Technical Resource Guide
Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN
Altera Advanced SEU Detection IP Core User Guide
2015.05.04 ALTADVSEU Subscribe The Altera Advanced SEU Detection IP core contains the following features: Hierarchy tagging Enables tagging of logical hierarchies and specifying their criticality relative
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
From Bus and Crossbar to Network-On-Chip. Arteris S.A.
From Bus and Crossbar to Network-On-Chip Arteris S.A. Copyright 2009 Arteris S.A. All rights reserved. Contact information Corporate Headquarters Arteris, Inc. 1741 Technology Drive, Suite 250 San Jose,
Engineering Change Order (ECO) Support in Programmable Logic Design
White Paper Engineering Change Order (ECO) Support in Programmable Logic Design A major benefit of programmable logic is that it accommodates changes to the system specification late in the design cycle.
Providing Battery-Free, FPGA-Based RAID Cache Solutions
Providing Battery-Free, FPGA-Based RAID Cache Solutions WP-01141-1.0 White Paper RAID adapter cards are critical data-center subsystem components that ensure data storage and recovery during power outages.
User Guide. Introduction. HCS12PLLCALUG/D Rev. 0, 12/2002. HCS12 PLL Component Calculator
User Guide HCS12PLLCALUG/D Rev. 0, 12/2002 HCS12 PLL Component Calculator by Stuart Robb Applications Engineering Motorola, East Kilbride Introduction The MC9S12D family of MCUs includes a Phase-Locked Loop
Production Flash Programming Best Practices for Kinetis K- and L-series MCUs
Freescale Semiconductor Document Number:AN4835 Application Note Rev 1, 05/2014 Production Flash Programming Best Practices for Kinetis K- and L-series MCUs by: Melissa Hunter 1 Introduction This application
Model-based system-on-chip design on Altera and Xilinx platforms
CO-DEVELOPMENT MANUFACTURING INNOVATION & SUPPORT Model-based system-on-chip design on Altera and Xilinx platforms Ronald Grootelaar, System Architect [email protected] Agenda 3T Company profile Technology
Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study
Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study Fabien Gaud INRIA [email protected] Gilles Muller INRIA [email protected] Renaud Lachaize Grenoble University
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL
White Paper 40-nm FPGAs and the Defense Electronic Design Organization
White Paper 40-nm FPGAs and the Defense Electronic Design Organization Introduction With Altera's introduction of 40-nm FPGAs, the design domains of military electronics that can be addressed with programmable
Quartus II Software and Device Support Release Notes Version 15.0
2015.05.04 Quartus II Software and Device Support Release Notes Version 15.0 RN-01080-15.0.0 Subscribe This document provides late-breaking information about the Altera Quartus II software release version
How To Design A Single Chip System Bus (Amba) For A Single Threaded Microprocessor (Mma) (I386) (Mmb) (Microprocessor) (Ai) (Bower) (Dmi) (Dual
Architetture di bus per System-On-Chip Massimo Bocchi Corso di Architettura dei Sistemi Integrati A.A. 2002/2003 System-on-chip motivations
PROFINET IRT: Getting Started with The Siemens CPU 315 PLC
PROFINET IRT: Getting Started with The Siemens CPU 315 PLC AN-674 Application Note This document shows how to demonstrate a working design using the PROFINET isochronous real-time (IRT) device firmware.
Switch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
Building an IP Surveillance Camera System with a Low-Cost FPGA
Building an IP Surveillance Camera System with a Low-Cost FPGA WP-01133-1.1 White Paper Current market trends in video surveillance present a number of challenges to be addressed, including the move from
QorIQ T4 Family of Processors. Our highest performance processor family. freescale.com
QorIQ T4 Family of Processors Our highest performance processor family freescale.com Application Brochure QorIQ Communications Platform: Scalable Processing Performance Overview The QorIQ communications processors portfolio
Let s put together a Manual Processor
Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce
White Paper Utilizing Leveling Techniques in DDR3 SDRAM Memory Interfaces
White Paper Introduction The DDR3 SDRAM memory architectures support higher bandwidths with bus rates of 600 Mbps to 1.6 Gbps (300 to 800 MHz), 1.5V operation for lower power, and higher densities of 2
Five Ways to Build Flexibility into Industrial Applications with FPGAs
Five Ways to Build Flexibility into Industrial Applications with FPGAs by Jason Chiang and Stefano Zammattio, Altera Corporation WP-01154-2.0 White Paper This document describes using an Altera industrial-grade
A FRAMEWORK FOR AUTOMATIC FUNCTION POINT COUNTING
A FRAMEWORK FOR AUTOMATIC FUNCTION POINT COUNTING FROM SOURCE CODE Vinh T. Ho and Alain Abran Software Engineering Management Research Laboratory Université du Québec à Montréal (Canada) [email protected]
White Paper FPGA Performance Benchmarking Methodology
White Paper Introduction This paper presents a rigorous methodology for benchmarking the capabilities of an FPGA family. The goal of benchmarking is to compare the results for one FPGA family versus another
Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com
Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and
Video and Image Processing Suite
Video and Image Processing Suite January 2006, Version 6.1 Errata Sheet This document addresses known errata and documentation issues for the MegaCore functions in the Video and Image Processing Suite,
Computer Organization & Architecture Lecture #19
Computer Organization & Architecture Lecture #19 Input/Output The computer system's I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of
Accelerating High-Speed Networking with Intel I/O Acceleration Technology
White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing
UMBC. ISA is the oldest of all these and today's computers still have an ISA bus interface in the form of an ISA slot (connection) on the main board.
Bus Interfaces Different types of buses: ISA (Industry Standard Architecture) EISA (Extended ISA) VESA (Video Electronics Standards Association, VL Bus) PCI (Peripheral Component Interconnect) USB (Universal
All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule
All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:
SAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
Chapter 2 Heterogeneous Multicore Architecture
Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater flexibility, it is
LogiCORE IP AXI Performance Monitor v2.00.a
LogiCORE IP AXI Performance Monitor v2.00.a Product Guide Table of Contents IP Facts Chapter 1: Overview Target Technology Applications
A Generic Network Interface Architecture for a Networked Processor Array (NePA)
A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction
RAPID PROTOTYPING OF DIGITAL SYSTEMS Second Edition
RAPID PROTOTYPING OF DIGITAL SYSTEMS Second Edition A Tutorial Approach James O. Hamblen Georgia Institute of Technology Michael D. Furman Georgia Institute of Technology KLUWER ACADEMIC PUBLISHERS Boston
OTU2 I.7 FEC IP Core (IP-OTU2EFECI7Z) Data Sheet
OTU2 I.7 FEC IP Core (IP-OTU2EFECI7Z) Data Sheet Revision 0.02 Release Date 2015-02-24 Document number TD0382 . All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX
Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit
1 Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT OF FOR THE DEGREE IN Bachelor of Technology In Electronics and Communication
Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1
(DSF) Quartus II Stand: Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] Quartus II 1 Quartus II Software Design Series : Foundation 2007 Altera
OpenSoC Fabric: On-Chip Network Generator
OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation
AN FPGA FRAMEWORK SUPPORTING SOFTWARE PROGRAMMABLE RECONFIGURATION AND RAPID DEVELOPMENT OF SDR APPLICATIONS
AN FPGA FRAMEWORK SUPPORTING SOFTWARE PROGRAMMABLE RECONFIGURATION AND RAPID DEVELOPMENT OF SDR APPLICATIONS David Rupe (BittWare, Concord, NH, USA; [email protected]) ABSTRACT The role of FPGAs in Software
Real-Time Challenges and Opportunities in SoCs
Real-Time Challenges and Opportunities in SoCs WP-01190-1.1 White Paper Advanced process technology and system-integration provide the driving forces behind silicon convergence. FPGAs speed along this
Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:
Topic 3 Computer Performance Contents 3.1 Introduction 3.2 Measuring performance 3.2.1 Clock Speed
The Design and Implementation of a Solar Tracking Generating Power System
The Design and Implementation of a Solar Tracking Generating Power System Y. J. Huang, Member, IAENG, T. C. Kuo, Member, IAENG, C. Y. Chen, C. H. Chang, P. C. Wu, and T. H. Wu Abstract A solar tracking
Xeon+FPGA Platform for the Data Center
Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system
Video Conference System
CSEE 4840: Embedded Systems Spring 2009 Video Conference System Manish Sinha Srikanth Vemula Project Overview Top frame of screen will contain the local video Bottom frame will contain the network video
