Float to Fix conversion



Similar documents
Software Development with Real- Time Workshop Embedded Coder Nigel Holliday Thales Missile Electronics. Missile Electronics

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Model-based system-on-chip design on Altera and Xilinx platforms

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

FPGAs in Next Generation Wireless Networks

Architectures and Platforms

7a. System-on-chip design and prototyping platforms

Extending the Power of FPGAs. Salil Raje, Xilinx

WiSER: Dynamic Spectrum Access Platform and Infrastructure

Product Development Flow Including Model- Based Design and System-Level Functional Verification

DEVELOPMENT OF DEVICES AND METHODS FOR PHASE AND AC LINEARITY MEASUREMENTS IN DIGITIZERS

DS1104 R&D Controller Board

LMS is a simple but powerful algorithm and can be implemented to take advantage of the Lattice FPGA architecture.

Non-Data Aided Carrier Offset Compensation for SDR Implementation

How To Design An Image Processing System On A Chip

LogiCORE IP AXI Performance Monitor v2.00.a

Radar Signal Processing:

9/14/ :38

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

CHAPTER 1 ENGINEERING PROBLEM SOLVING. Copyright 2013 Pearson Education, Inc.

High-Resolution Doppler-Polarimetric FMCW Radar with Dual-Orthogonal Signals

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and

Embedded System Hardware - Processing (Part II)

Department of Electrical and Computer Engineering Ben-Gurion University of the Negev. LAB 1 - Introduction to USRP

High-Level Synthesis for FPGA Designs

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Go Faster - Preprocessing Using FPGA, CPU, GPU. Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Converting Models from Floating Point to Fixed Point for Production Code Generation

Operating System Support for Multiprocessor Systems-on-Chip

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Digital Systems Design! Lecture 1 - Introduction!!

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

MIMO detector algorithms and their implementations for LTE/LTE-A

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25

Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Radar Processing: FPGAs or GPUs?

What is a System on a Chip?

BUILD VERSUS BUY. Understanding the Total Cost of Embedded Design.

Introduction to Xilinx System Generator Part II. Evan Everett and Michael Wu ELEC Spring 2013

AC : PRACTICAL DESIGN PROJECTS UTILIZING COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD)

Open Flow Controller and Switch Datasheet

Networking Remote-Controlled Moving Image Monitoring System

Nutaq. PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET. nutaq.com MONTREAL QUEBEC

Multiprocessor System-on-Chip

DDS. 16-bit Direct Digital Synthesizer / Periodic waveform generator Rev Key Design Features. Block Diagram. Generic Parameters.

Rapid System Prototyping with FPGAs

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

Latency in High Performance Trading Systems Feb 2010

CFD Implementation with In-Socket FPGA Accelerators

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Extended Boundary Scan Test breaching the analog ban. Marcel Swinnen, teamleader test engineering

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

Wireless Communication and RF System Design Using MATLAB and Simulink Giorgia Zucchelli Technical Marketing RF & Mixed-Signal

Intel Labs at ISSCC Copyright Intel Corporation 2012

RAPID PROTOTYPING OF DIGITAL SYSTEMS Second Edition

FPGA-based MapReduce Framework for Machine Learning

Systolic Computing. Fundamentals

Reconfig'09 Cancun, Mexico

ReCoSoC'11 Montpellier, France. Implementation Scenario for Teaching Partial Reconfiguration of FPGA

MsC in Advanced Electronics Systems Engineering

Integrating MATLAB into your C/C++ Product Development Workflow Andy Thé Product Marketing Image Processing Applications

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

A DA Serial Multiplier Technique based on 32- Tap FIR Filter for Audio Application

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Integer Computation of Image Orthorectification for High Speed Throughput

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

Acoustic Processor of the MCM Sonar

FPGA area allocation for parallel C applications

SDR Architecture. Introduction. Figure 1.1 SDR Forum High Level Functional Model. Contributed by Lee Pucker, Spectrum Signal Processing

VPX Implementation Serves Shipboard Search and Track Needs

A Computer Vision System on a Chip: a case study from the automotive domain

Digital Hardware Design Decisions and Trade-offs for Software Radio Systems

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Reconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio

A case study of mobile SoC architecture design based on transaction-level modeling

NIOS II Based Embedded Web Server Development for Networking Applications

STUDY ON HARDWARE REALIZATION OF GPS SIGNAL FAST ACQUISITION

FREQUENCY RESPONSE ANALYZERS

EE361: Digital Computer Organization Course Syllabus

White Paper FPGA Performance Benchmarking Methodology

Data Analysis with MATLAB The MathWorks, Inc. 1

Laboratoryof Electronics, Antennas and Telecommunications (UMR 7248)

Chapter 2 Logic Gates and Introduction to Computer Architecture

ELEC 5260/6260/6266 Embedded Computing Systems

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Transcription:

www.thalesgroup.com Float to Fix conversion Fabrice Lemonnier Research & Technology

2 / Thales Research & Technology : Research center of Thales Objective: to propose technological breakthrough for the future products of Thales TRT Direction (E. Lansard) Research Groups Laboratories Sciences et Techniques de L Information Reasoning and analysis for complex system Decision & Optimisation Software system Engineering High Performance Computing Lab Safety for embedded system Lab Physics Waves and Signal Processing Micro-nano physics UMR CNRS- TRT Technology & Characterisatio n Submicron techno & process Nanocomposite & multifunction materials Organic materials chemistry Physics analysis Industrial process & technology analysis Technological demonstrators III-V Lab

3 / Applications Key issues Algorithms becoming dynamic and irregular, solve the issue of reconfigurable computing. Emerging algorithms in sensors raise technical challenges to architectures : beyond Von Neuman, beyond Moore. Applications become a mixture of computing levels (data flow, control) Drastic increase of data bandwidth out of the sensors Cognitive radio Design methodology Smart camera Drone Improve the link between algorithms and architecture Modularity and reuse Reliability Sub-micronic technologies are less and less reliable

4 / Rational The best trade off to raise the computing power for a low power consumption is obtained through: Parallelisation Customisation Australian Desert Animal: the Thorny Devil In the same time, we have to keep in mind the necessity of flexibility and programming efficiency.

5 / FPGA Why FPGA technology? High throughput Low power consumption (no compliant with GPGPU) Problem: floating-point computation is not efficient on FPGA (ratio of 5) The architecture design has to be in fixed-point BUT: the applications are generally coded in floating-point double precision The application has to be converted from floating-point to fixed-point: important impact on development flow (TTM, NRC)

6 / Fixed-point Arithmetic: Viewpoint from Industry Reduce development cost on FPGAs and MPSoC without floating-point unit Today this task is done by hand and can cost up to 6 man-months Avoid reject of designing efficient hardware accelerators on FPGAs Fixed-point arithmetic brings clear advantages in Area, speed, power, communication bandwith

7 / Architectures on FPGA Hardcoded IPs Necessary when volume of data is too important or latency is too short Conversion from floating-point to VHDL representation Dedicated processors When possible, dedicated processors are better due to programming efficiency Conversion from floating-point to assembly code

8 / Ter@pix core: an accelerator for Image processing SIMD : The sequencer execute the microcode All PEs (based on MULACC operator) execute the same instruction sequencer PE Local RAM PE Local RAM PE (Processing Element) : ALU based on a MULACC R A M PE PE PE Local RAM Local RAM Local RAM Local RAM (512 registers) 2 lines of the RAM PE Local RAM PE Local RAM Accelerator : SIMD Computing power : 50 Gops on Xilinx Virtex-5 SX240 Consumption : ~15W

9 / Ter@pix core: an accelerator for Image processing Common functions necessary to use an accelerator : NoC IF control data DMA Local Memory INoC sequencer PE PE CTR Local RAM Local RAM DMA to transfer data be computed CTR (controler) to execute the correct scheduling between data transfers and works Local Memory R A M PE PE PE Local RAM Local RAM Local RAM PE PE Local RAM Local RAM Accelerator : SIMD

10 / TeraTS: Application domains for Thales Airborne System Airborne radar Future requested power computing: Hundreds of Gops STAP algorithm (Space Time Adaptive Processing) high volume of data need of external memory Issue on bandwidth Electronic Warfare Future required computing power: Hundreds of Gops per channel computing directly in the data flow with a short latency : few µs High frequency Small array of data

11 / TeraTS: Solution Application high level representation Performance and flexibility through signal processing dedicated programmable accelerator FPGA RAM system level mapping GPP Hosting Structure Application executed on a GPP calling intensive computing operators on the accelerator Productivity through DDR I/Os D M A DSP Engine SEQUENCER PE0 PE1 Signal PE2 Processing PE3 IMEM Parallel Memory PE4 Programmable PE5 Accelerator PE6 PE7 Tool for mapping and parallelisation of the application and code generation Compiler toolset to generate the library of operators from C representation library of signal processing operators Compiler / assembler tools C code Xilinx Virtex-6 SX315 Consumption : ~50W

12 / PE of TeraTS The PE is a 32 bits processor Register file DSP ALU DSP ALU Link for complex operations

13 / TeraTS Programming tools Application High level representation Signal Processing operators in C language SPEAR DE (Parallelisation, mapping and code generation) compiler VLIWiser assembler C with Accelerator calls µblaze µcode Accelerator

14 / Application development impacts

15 / Space Time Adaptive Processing Objectives: remove clutter (ground reflexion) and detect moving targets.

16 / STAP principles V carrier pulses (recurrences) 5 antenna sub arrays v ground Surveillance of the ground by air: Detection of Moving Targets STAP computes dynamically the best filter to suppress the clutter (ground reflexion) and detect the moving targets ant range gates beam width rec rg An aircraft illuminates the ground, with a beam orthogonal to its velocity, by sending repeatedly sequences of periodic pulses (denoted rec) The echoed signal is received on 5 sensors (ant) The received signal is sampling at a given frequency, each sample corresponding to a distance called a range gates (rg) (typically 15 meters for a 10 MHz sampling)

17 / STAP algorithm Filters calculation Stimuli generation Filters application thresholding normalization ambiguity removal

18 / Programming flow Study Model Output PATTERN Matlab Ideal reference Model Output PATTERN Constrainted reference Model Intègre les caractéristiques de la cible (Tuile) Output PATTERN Environnement de Développement SPEAR Code generation Executable code

19 / Constrainted reference model Fixed-point arithmetic compliant with the constraints of the target: Multiplication, addition : 32 bits Accumulation: 70 bits Barrel shifter Signal noise ratio No overflow This conversion can be very long. It requests communication between engineers who don t speak the same language: Algorithm -> software -> hardware

20 / Fixed-Point Conversion Loss of precision incurs loss of performance Essentially, an optimization process Find trade-off between accuracy and cost Determine the number of bits for each data Manual conversion is tedious Strong need of tools Area Power Speed Performance Degradation

21 / Current Limit of Commercial tools Accuracy evaluation is performed using bit-true simulations Fixed-point simulation is very long Word-length optimisation time is prohibitive Used in all existing tools HDL coder Matlab (Mathworks), Vivado (Xilinx), Catalytic (Mentor Graphics) Strongly user-guided iterative process with long simulations in the loop It is the reason why we are involved in DEFIS project (ANR)

22 / DEFIS Aim of the DEFIS project is threefold To provide new methods for fixed-point conversion Analytical and efficient simulation-based methods To develop a complete software infrastructure for automatic fixedpoint conversion To demonstrate the quality of DEFIS flow on two industrial applications DEFIS at a glance Nov. 2011 to Feb. 2015 (40 months), today T0+19 Po le Images & seaux, Po le System@tic 2 PhD grants, 1 Engineer/PostDoc (36 months) 994.500, 281 person.months

XML This document is not to be reproduced, modified, adapted, published, translated in any material form in whole or in part nor disclosed to any third party without the prior written permission of 23 / System-Level Optimizations Spear DE (Thales) Accuracy constraint Software Integration FIPOGEN (LIP6) CGPE (LIRMM) Word-Length Optim. ID.Fix_FixConv (IRISA) Future modules ID.Fix_AccEval (IRISA) Fluctuat Error (CEA) Accuracy Evaluation GECOS (IRISA) Sardana (LIRMM) Parse Model (IRISA/CEA) Algorithm-Level Optimizations ID.Fix_DynEval (IRISA) Fluctuat Range (CEA) Stat_TVE (LIP6) Dynamic Range Evaluation Code generation (IRISA/CEA) DEFIS Software Infrastructure

24 / Perspectives Insertion in the development flow Reduction of the development cost Avoid risks when using dedicated accelerators based on fixed-point ALU

25 / Thank you for your attention! Questions