FYS3240 PC-based instrumentation and microcontrollers. FPGA and GPU. Spring 2012 Lecture #11

Similar documents
Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Lab View with crio Tutorial. Control System Design Feb. 14, 2006

Medical Device Design: Shorten Prototype and Deployment Time with NI Tools. NI Technical Symposium 2008

9/14/ :38

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

How To Develop An Iterio Data Acquisition System For A Frustreo (Farc) (Iterio) (Fcfc) (For Aterio (Fpc) (Orterio).Org) (Ater

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

FYS3240 PC-based instrumentation and microcontrollers. Introduction. Spring 2012 Lecture #1

HPC with Multicore and GPUs

Introduction to the NI Real-Time Hypervisor

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

NI Platform for automotive measurement and test applications

HPC Wales Skills Academy Course Catalogue 2015

ni.com/academic NI Academic Products

GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

National Instruments MIMO Technology Demonstration

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

What s New in Mike Bailey LabVIEW Technical Evangelist. uk.ni.com

Chapter 1 Computer System Overview

The BSN Hardware and Software Platform: Enabling Easy Development of Body Sensor Network Applications

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

Introduction to GPU hardware and to CUDA

OpenACC Basics Directive-based GPGPU Programming

Digital Systems Design! Lecture 1 - Introduction!!

Data storage and high-speed streaming

Best Practices for Deploying, Replicating, and Managing Real-Time and FPGA Applications. ni.com

BUILD VERSUS BUY. Understanding the Total Cost of Embedded Design.

7a. System-on-chip design and prototyping platforms

Open Flow Controller and Switch Datasheet

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

ni.com/sts NI Semiconductor Test Systems

LogiCORE IP AXI Performance Monitor v2.00.a

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

GPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer

Multicore Programming with LabVIEW Technical Resource Guide

Data Center and Cloud Computing Market Landscape and Challenges

Embedded Systems: map to FPGA, GPU, CPU?

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Programming GPUs with CUDA

NI NI C Series Overview DATASHEET. 8-Channel Sinking Digital Input Module

Basics of Simulation Technology (SPICE), Virtual Instrumentation and Implications on Circuit and System Design

Router Architectures

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Lab Experiment 1: The LPC 2148 Education Board

Latency in High Performance Trading Systems Feb 2010

Bioreactor Process Plant Powered by NI LabVIEW and NI CompactRIO

FPGA Implementation of an Advanced Traffic Light Controller using Verilog HDL

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Architectures and Platforms

The Challenge of Handling Large Data Sets within your Measurement System

Introduction to Data Acquisition

Data Acquisition in LabVIEW

NVIDIA GeForce GTX 580 GPU Datasheet

Professional Training Program

Introduction to GPGPU. Tiziano Diamanti

high-performance computing so you can move your enterprise forward

Modular Instrumentation Technology Overview

AC : PRACTICAL DESIGN PROJECTS UTILIZING COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD)

Codesign: The World Of Practice

CUDA Debugging. GPGPU Workshop, August Sandra Wienke Center for Computing and Communication, RWTH Aachen University

How To Use First Robot With Labview

Developing reliable Multi-Core Embedded-Systems with NI Linux Real-Time

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware

FPGA-based MapReduce Framework for Machine Learning

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

VSys Modular Asset Monitoring System

PRAGMA ENGINEERING Srl. Next-Generation ATS (Sistemi ATE di Nuova Generazione)

High-Level Synthesis for FPGA Designs

Architecting High-Speed Data Streaming Systems. Sujit Basu

DS1104 R&D Controller Board

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Computer Graphics Hardware An Overview

RAPID PROTOTYPING OF DIGITAL SYSTEMS Second Edition

Serial Communications

What is a System on a Chip?

High Performance GPGPU Computer for Embedded Systems

Nutaq. PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET. nutaq.com MONTREAL QUEBEC

Siemens and National Instruments Deliver Integrated Automation and Measurement Solutions

GPU Accelerated Monte Carlo Simulations and Time Series Analysis

Modeling a GPS Receiver Using SystemC

ARM Cortex -A8 SBC with MIPI CSI Camera and Spartan -6 FPGA SBC1654

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

AXIe: AdvancedTCA Extensions for Instrumentation and Test

Accurate Measurement of the Mains Electricity Frequency

Technical Training Module ( 30 Days)

Avoiding pitfalls in PROFINET RT and IRT Node Implementation

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Seven Challenges of Embedded Software Development

A Computer Vision System on a Chip: a case study from the automotive domain

Network Enabled Battery Health Monitoring System

Xeon+FPGA Platform for the Data Center

Communicating with devices

EMX-2500 DATA SHEET FEATURES GIGABIT ETHERNET REMOTE CONTROLLER FOR PXI EXPRESS MAINFRAMES SYSTEM LEVEL FUNCTIONALITY

Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

HARDWARE ACCELERATION IN FINANCIAL MARKETS. A step change in speed

Transcription:

FYS3240 PC-based instrumentation and microcontrollers FPGA and GPU Spring 2012 Lecture #11 Bekkeng, 7.1.2012

Hardware accelaration In computing, Hardware acceleration is the use of computer hardware to perform some function faster than is possible in software running on the general-purpose CPU. Examples of hardware acceleration includes using graphics processing units (GPUs) and instructions for complex operations in CPUs. Normally, processors are sequential, and instructions are executed one by one. Various techniques are used to improve performance; hardware acceleration is one of them. The main difference between hardware and software is concurrency, allowing hardware to be much faster than software. Hardware accelerators are designed for computationally intensive software code The hardware that performs the acceleration, when in a separate unit from the CPU, is referred to as a hardware accelerator, or often more specifically as graphics accelerator or floating-point accelerator, etc. Those terms, however, are older and have been replaced with less descriptive terms like video card or graphics card. Many hardware accelerators are built on top of field-programmable gate array (FPGA) chips. From wikipedia, edited

FPGA = Field Programmable Gate Array

FPGA advantages High reliability High determinism High performance True parallelism Reconfigurable

Common Applications for FPGAs High-speed control Intelligent DAQ Digital communication protocols e.g. SPI and Wireless Sensor simulation Onboard processing and data reduction e.g. video processing Co-processing offload the CPU

To configure FPGAs on NI hardware

From LabVIEW to Hardware

LabVIEW FPGA Hardware Targets NI CompactRIO NI FlexRIO Adapter Modules FlexRIO FPGA Modules NI Single-Board RIO NI FlexRIO NI R Series Multifunction RIO NI Compact Vision System RIO = Reconfigurable I/O

FPGAs in DAQ-systems ( intelligent DAQ ) DAQ-cards with a programmable FPGA Multi-rate sampling Allows different sampling frequencies on the I/O channels For comparison, when using an ordinary DAQ-card (without a user reconfigurable FPGA) all channels must have the same sampling frequency User defined processing in the FPGA FPGA-based hardware timing/synchronization Both for PXI and PCI

LabVIEW FPGA VIs

LabVIEW FPGA VIs

LabVIEW FPGA VIs

http://zone.ni.com/devzone/cda/tut/p/id/3358 With the LabVIEW FPGA Module you develop FPGA applications on a host computer running Windows, and then LabVIEW compiles and implements the code in hardware Including GUI open Com. with FPGA close

Host Application for Live Communication with the FPGA The FPGA interface pallet makes it easy to perform real-time communication between the FPGA and the real-time or Windows host application The Open FPGA Reference function is first used to open and run a specified FPGA application The FPGA Read/Write Control can be used to read data from the FPGA indicators (outputs) or write data to the controls (inputs)

LabVIEW FPGA Event counter Digital I/O nodes can take advantage of the specialized structure called the single-cycle timed loop in LabVIEW FPGA, with which you can execute code at specified rates ranging from 2.5 to 200 MHz. Using a 40 MHz clock, for example, you can use a single-cycle timed loop to create a 40 MHz counter on any digital line

LabVIEW FPGA - Parallel Operations Two parallel loops with different sampling rates Run in parallel No shared resources between the two loops

Single Cycle Timed Loop (SCTL) All operations within the SCTL loop must fit into one clock cycle Some functions are not supported in SCTL, among others: Long sequences of serial code Loop timer, wait functions Analog input an analog output I/O nodes Loop structures (While Loop, For Loop) LabVIEW Help informs which functions are supported in SCTL

Optimization Clock frequency = 40 MHz Loop rates limited by longest path A0 takes about 35 ticks DI takes 1 tick (HW Specific) DI (digital input) limited by AO (analog output) when in same loop NOTE: A while loop takes 3 ticks

Optimization - Parallel loops Separate (parallel) loops allow DI to run independent of AO This allows DI to be sampled 10 times faster by using a separate loop

Include HDL-code in LabVIEW The IP Integration Node (replaces the HDL Interface from LabVIEW 2010) can bring in third-party Xilinx IPs. a wizard to import files and configure the interface step by step You can also use the IP Integration Node to include your own VHDL code Once you have configured the node, you can use the IP just like any other LabVIEW node with inputs and outputs. IP = Intellectual Property

Cycle-Accurate Simulation with ModelSim New in LabVIEW FPGA 2010

LabVIEW FPGA IPNet www.ni.com/ipnet

How to Learn More... http://www.ni.com/fpga/

GPUs GPU = Graphics Processing Unit GPUs can be used as hardware accelerators for numerical/computational tasks Can be used in Real-Time High-Performance Computing systems 240 cores NVIDIA TeslaTM C1060

Need for Speed... GPUs have more transistors dedicated for processing than a CPU The performance gain when using GPUs can be significant A single workstation with two x16 PCIe slots can have two GPUs Two TeslaTM C1060 GPUs gives 480 cores. The next generation CUDA architecture, code named Fermi, have up to 512 CUDA cores.

CUDA CUDA = Compute Unified Device Architecture CUDA is developed by Nvidia and is a GPU interface for C

CUDA example void saxpy_serial(int n, float a, float* x, float* y) { for (int i=0; i<n; ++i) y[i] = a*x[i] + y[i]; } Standard C // Kjør seriell saxpy saxpy_serial(n, 2.0, x, y); global void saxpy_parallel(int n, float a, float* x, float* y) { int i = blockidx.x * blockdim.x + threadidx.x; if (i < n) y[i] = a*x[i] + y[i]; } Parallel CUDA C // Kjør parallell saxpy med 256 tråder/blokk int nblocks = (n + 255) / 256; saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);

Application examples

Application example II http://www.top500.org/

LabVIEW and GPUs GPUs can not be directly programmed with LabVIEW However, a framework is designed to integrate GPU execution into LabVIEW's parallel execution, to execute the CUDA code Some applications are tailor made for deployment to GPUs, such as those related to matrix operations

LabVIEW and GPUs II The GPU interface is made up of two LabVIEW libraries lvcuda.lvlib and lvcublas.lvlib that contain LabVIEW data types and VIs http://zone.ni.com/devzo ne/cda/tut/p/id/11972

Example Hybrid architecture using COTS components Numerical Computing two NI 8353 multicore computers one nvidia Tesl GPU computing system (with four GPUs) Real-Time Measurement and Control two PXIe chassis embedded controllers with a multicore CPU FPGA-based data acquisition/control boards