Software-Programmable FPGA IoT Platform. Kam Chuen Mak (Lattice Semiconductor) Andrew Canis (LegUp Computing) July 13, 2016



Similar documents
Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Architectures and Platforms

LMS is a simple but powerful algorithm and can be implemented to take advantage of the Lattice FPGA architecture.

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Codesign: The World Of Practice

Quartus II Software Design Series : Foundation. Digitale Signalverarbeitung mit FPGA. Digitale Signalverarbeitung mit FPGA (DSF) Quartus II 1

synthesizer called C Compatible Architecture Prototyper(CCAP).

Extending the Power of FPGAs. Salil Raje, Xilinx

Performance evaluation

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Introducción. Diseño de sistemas digitales.1

ELEC 5260/6260/6266 Embedded Computing Systems

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

High-Level Synthesis for FPGA Designs

Product Development Flow Including Model- Based Design and System-Level Functional Verification

ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler

LogiCORE IP AXI Performance Monitor v2.00.a

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

DDS. 16-bit Direct Digital Synthesizer / Periodic waveform generator Rev Key Design Features. Block Diagram. Generic Parameters.

Model-based system-on-chip design on Altera and Xilinx platforms

Modeling a GPS Receiver Using SystemC

ZigBee Technology Overview

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

Data Center and Cloud Computing Market Landscape and Challenges

Nutaq. PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET. nutaq.com MONTREAL QUEBEC

7a. System-on-chip design and prototyping platforms

Embedded Linux RADAR device

Embedded Systems. introduction. Jan Madsen

The Internet of Things: Opportunities & Challenges

FSMD and Gezel. Jan Madsen

CFD Implementation with In-Socket FPGA Accelerators

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and

Systems on Chip Design

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March ISSN

STMicroelectronics is pleased to present the. SENSational. Attend a FREE One-Day Technical Seminar Near YOU!

EE361: Digital Computer Organization Course Syllabus

Pre-tested System-on-Chip Design. Accelerates PLD Development

SEMICONDUCTOR WIRELESS SENSOR NETWORK MARKET EXECUTIVE SUMMARY. Wireless Sensor Network Energy Harvesting And Storage Applications

MICROPROCESSOR AND MICROCOMPUTER BASICS

Rapid System Prototyping with FPGAs

International Journal of Scientific & Engineering Research, Volume 4, Issue 6, June ISSN

ARM Microprocessor and ARM-Based Microcontrollers

DesignWare IP for IoT SoC Designs

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

FPGAs for Trusted Cloud Computing

ARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM

Introduction to Cloud Computing

Questions from The New SensorTag - IoT Made Easy Webinar

Module: Software Instruction Scheduling Part I

Building Blocks for PRU Development

MATLAB/Simulink Based Hardware/Software Co-Simulation for Designing Using FPGA Configured Soft Processors

A Verilog HDL Test Bench Primer Application Note

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

FPGAs for High-Performance DSP Applications

9/14/ :38

Smart Systems: the key enabling technology for future IoT

LSN 2 Computer Processors

Arbitration and Switching Between Bus Masters

Networking Remote-Controlled Moving Image Monitoring System

Introduction to Digital System Design

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

FPGA area allocation for parallel C applications

BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions

VON BRAUN LABS. Issue #1 WE PROVIDE COMPLETE SOLUTIONS ULTRA LOW POWER STATE MACHINE SOLUTIONS VON BRAUN LABS. State Machine Technology

Vivado Design Suite Tutorial

How To Design An Image Processing System On A Chip

Design Cycle for Microprocessors

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

PULPino: A small single-core RISC-V SoC

High Performance or Cycle Accuracy?

RAPID PROTOTYPING OF DIGITAL SYSTEMS Second Edition

Central Processing Unit (CPU)

ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions

Hardware Implementations of RSA Using Fast Montgomery Multiplications. ECE 645 Prof. Gaj Mike Koontz and Ryon Sumner

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Testing of Digital System-on- Chip (SoC)

Computer Organization and Components

USB 3.0 Connectivity using the Cypress EZ-USB FX3 Controller

Low Cost System on Chip Design for Audio Processing

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

Chapter 1 Computer System Overview

40G MACsec Encryption in an FPGA

System on Chip Platform Based on OpenCores for Telecommunication Applications

Introduction to the Latest Tensilica Baseband Solutions

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

A Computer Vision System on a Chip: a case study from the automotive domain

Eingebettete Systeme. 4: Entwurfsmethodik, HW/SW Co-Design. Technische Informatik T T T

Qsys and IP Core Integration

A DA Serial Multiplier Technique based on 32- Tap FIR Filter for Audio Application

Internet of things (IOT) applications covering industrial domain. Dev Bhattacharya

White Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor

Transcription:

Software-Programmable FPGA IoT Platform Kam Chuen Mak (Lattice Semiconductor) Andrew Canis (LegUp Computing) July 13, 2016

Agenda Introduction Who we are IoT Platform in FPGA Lattice s IoT Vision IoT Platform Overview Architecture Overview RISC-V Architecture RISC-V Processor Overview LegUp High-Level Synthesis Design Flow Benchmark Results Page: 2

Agenda Introduction Who we are IoT Platform in FPGA Lattice s IoT Vision IoT Platform Overview Architecture Overview RISC-V Architecture RISC-V Processor Overview LegUp High-Level Synthesis Design Flow Benchmark Results Page: 3

INTRODUCTION Who we are? Lattice Semiconductor (NASDAQ: LSCC) We provide smart connectivity solutions powered by our low power FPGA, video ASSP, 60 GHz millimeter wave, and IP products to the consumer, communications, industrial, computing, and automotive markets worldwide. Our unwavering commitment to our customers enables them to accelerate their innovation, creating an ever better and more connected world. Page: 4

INTRODUCTION Who we are? Lattice Semiconductor (NASDAQ: LSCC) We provide smart connectivity solutions powered by our low power FPGA, video ASSP, 60 GHz millimeter wave, and IP products to the consumer, communications, industrial, computing, and automotive markets worldwide. Our unwavering commitment to our customers enables them to accelerate their innovation, creating an ever better and more connected world. LegUp Computing (LegUpComputing.com) We are a startup spun out of 7 years of research at the University of Toronto. We provide high-level synthesis tools that compile software into hardware running on an FPGA. Our product enables software engineers to easily target FPGA hardware to achieve huge gains in computational throughput and energy efficiency. Page: 5

Agenda Introduction Who we are IoT Platform in FPGA Lattice s IoT Vision IoT Platform Overview Architecture Overview RISC-V Architecture RISC-V Processor Overview LegUp High-Level Synthesis Design Flow Benchmark Results Page: 6

IOT PLATFORM IN FPGA Lattice s Vision Lattice s vision for an FPGA IoT platform: Page: 7

IOT PLATFORM IN FPGA Lattice s Vision Lattice s vision for an FPGA IoT platform: 1) Ease of use Use C/C++ as our design entry for customers. Users can even create hardware accelerators using C No More Verilog or VHDL Page: 8

IOT PLATFORM IN FPGA Lattice s Vision Lattice s vision for an FPGA IoT platform: 1) Ease of use Use C/C++ as our design entry for customers. Users can even create hardware accelerators using C No More Verilog or VHDL We use a RISC-V with LegUp accelerators to fulfil this requirement Page: 9

IOT PLATFORM IN FPGA Lattice s Vision Lattice s vision for an FPGA IoT platform: 1) Ease of use Use C/C++ as our design entry for customers. Users can even create hardware accelerators using C No More Verilog or VHDL We use a RISC-V with LegUp accelerators to fulfil this requirement 2) Flexibility Support a wide range of sensors, actuators, and communication devices APIs Capability to generate custom instructions or acceleration libraries if they are required for the user application Page: 10

IOT PLATFORM IN FPGA Lattice s Vision Lattice s vision for an FPGA IoT platform: 1) Ease of use Use C/C++ as our design entry for customers. Users can even create hardware accelerators using C No More Verilog or VHDL We use a RISC-V with LegUp accelerators to fulfil this requirement 2) Flexibility Support a wide range of sensors, actuators, and communication devices APIs Capability to generate custom instructions or acceleration libraries if they are required for the user application We can use a FPGA to fulfil this requirement Page: 11

IOT PLATFORM IN FPGA Lattice s Vision Lattice s vision for an FPGA IoT platform: 1) Ease of use Use C/C++ as our design entry for customers. Users can even create hardware accelerators using C No More Verilog or VHDL We use a RISC-V with LegUp accelerators to fulfil this requirement 2) Flexibility Support a wide range of sensors, actuators, and communication devices APIs Capability to generate custom instructions or acceleration libraries if they are required for the user application We can use a FPGA to fulfil this requirement A hybrid computing solution: Combine the RISC-V Processor with FPGA hardware and utilize our low power and small footprint FPGA advantages for user customization. Page: 12

IOT PLATFORM IN FPGA Lattice s Vision A Hybrid (Processor + FPGA) platform Cost Performance Page: 13

IOT PLATFORM IN FPGA Lattice s Vision A Hybrid (Processor + FPGA) platform Cost Data center / Cloud Computing Page: 14 Performance

IOT PLATFORM IN FPGA Lattice s Vision A Hybrid (Processor + FPGA) platform Cost Embedded, mobile, Internet-of-Things Data center / Cloud Computing Performance Page: 15

IOT PLATFORM IN FPGA IoT Platform Overview Sensing Processing Communication Page: 16

IOT PLATFORM IN FPGA IoT Platform Overview Sensing Processing Communication The Internet of Things (IoT) refers to the ever-growing network of physical objects that feature an IP address for internet connectivity, and the communication that occurs between these objects and other Internetenabled devices and systems. (From Webopedia) Page: 17

IOT PLATFORM IN FPGA IoT Platform Overview Sensing Accelerometer Gyroscope Pressure Temperature Processing Encryption Digital signal processing Security Data filtering Communication Wi-Fi Bluetooth I2C SPI Page: 18

IOT PLATFORM IN FPGA IoT Platform Overview Sensing Accelerometer Gyroscope Pressure Temperature Software description of the sensor component Lattice IoT Platform provides APIs for sensors/actuators Processing Encryption Digital signal processing Security Data filtering FPGA RISCV Processor Hardware Accelerator Communication Wi-Fi Bluetooth I2C SPI Software description of the communication device Lattice IoT platform provides APIs for communication devices Page: 19

IOT PLATFORM IN FPGA Architecture Overview Processing Encryption Digital Signal Processing Security Data Filtering FPGA RISCV Processor Hardware Accelerator Page: 20

IOT PLATFORM IN FPGA Architecture Overview Processing Encryption Digital Signal Processing Security Data Filtering FPGA RISCV Processor Hardware Accelerator RISC-V processor plus LegUp-generated hardware accelerators to handle the processing part of the IoT Platform: Low-power and small footprint solution Identify the critical hotspots in C program, use LegUp to synthesize them into the FPGA and speed up the overall performance RISC-V processor executes the rest of the C program Maintain a low power and gain high throughput Page: 21

Agenda Introduction Who we are IoT Platform in FPGA Lattice s IoT Vision IoT Platform Overview Architecture Overview RISC-V Architecture RISC-V Processor Overview LegUp High-Level Synthesis Design Flow Benchmark Results Page: 22

IOT PLATFORM IN FPGA Architecture Overview RISC-V + LegUp-generated accelerators RISCV LegUp-generated Accelerator LegUp-generated Accelerator master master BUS INTERFACE...master slave slave slave slave Instruction Memory Processor Data Memory Share Data Memory Accelerator Memory Page: 23

RISC-V ARCHITECTURE RISC-V Processor Overview PC GEN INST MEM (R) Decoder REG FILE (R) ALU CSR FILE (R) DATA MEM CSR FILE (W) REG FILE (W) Page: 24

RISC-V ARCHITECTURE RISC-V Processor Overview PC GEN INST MEM (R) Decoder REG FILE (R) ALU CSR FILE (R) DATA MEM CSR FILE (W) REG FILE (W) Lattice RISC-V Processor: 4 stages pipeline CPU Core Page: 25

RISC-V ARCHITECTURE RISC-V Processor Overview PC GEN INST MEM (R) Decoder REG FILE (R) ALU CSR FILE (R) DATA MEM CSR FILE (W) REG FILE (W) Lattice RISC-V Processor: 4 stages pipeline CPU Core RISC-V RV32IMC Instruction Set RV32I v2.0 RV32M v2.0 with Multiplier Only RV32C v1.9 Page: 26

RISC-V ARCHITECTURE RISC-V Processor Overview PC GEN INST MEM (R) Decoder REG FILE (R) ALU CSR FILE (R) DATA MEM CSR FILE (W) REG FILE (W) Lattice RISC-V Processor: 4 stages pipeline CPU Core RISC-V RV32IMC Instruction Set RV32I v2.0 RV32M v2.0 with Multiplier Only RV32C v1.9 Other optional features which can be configured through the Verilog parameters, i.e. Enable external interrupt, allow external stalls from accelerators, and use custom instructions Page: 27

RISC-V ARCHITECTURE RISC-V Processor Overview RV32I RV32IM RV32IC LM32 LM32(M) NIOS 2e* NIOS 2f* EM4* 8051* DMIPS 1.35 1.64 1.35 0.8 0.9 0.15 1.16 1.5 0.1 Area 1.3K LUTs 1.5K LUTs +DSP ~1.6K LUTs 2K LUTs 2.5K LUTs 1.4K LEs 3K LEs N/A N/A Code Density** 1.45 1.4 1 1.8 1.8 1.17 1.17 1 1.25 *Data from the third party vendor datasheet **Benchmark using Lattice internal designs Page: 28

Agenda Introduction Who we are IoT Platform in FPGA Lattice s IoT Vision IoT Platform Overview Architecture Overview RISC-V Architecture RISC-V Processor Overview LegUp High-Level Synthesis Design Flow Benchmark Results Page: 29

LEGUP HIGH-LEVEL SYNTHESIS Software (C code) Hardware Description (Verilog) www.legup.org www.legupcomputing.com Page: 30

BENEFITS OF LEGUP Harness parallelism of hardware Design in C (#1 embedded language) Many engineers don t know Verilog Finish in weeks instead of months Faster time to market C testbench 100X faster than simulation Allows design space exploration Easier debugging in C IEEE Spectrum 2015 Page: 31

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... Program code C Compiler RISC-V Page: 32

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... Program code C Compiler Profiling RISC-V Page: 33

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... Program code C Compiler Profiling RISC-V Profiling Data: Execution Cycles Power Page: 34

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... Program code C Compiler Profiling RISC-V Profiling Data: Suggested program segments to target to HW Execution Cycles Power Page: 35

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... Program code C Compiler Profiling RISC-V Profiling Data: High-level synthesis Suggested program segments to target to HW Execution Cycles Power Page: 36

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... C Compiler Profiling RISC-V Program code Altered SW binary (calls HW accelerators) Profiling Data: µp Hardened program segments FPGA fabric High-level synthesis Suggested program segments to target to HW Execution Cycles Power Page: 37

LEGUP DESIGN FLOW int FIR(int ntaps, int sum) { int i; for (i=0; i < ntaps; i++) sum += h[i] * z[i]; return (sum);... C Compiler Profiling RISC-V Program code Altered SW binary (calls HW accelerators) Profiling Data: µp Hardened program segments FPGA fabric High-level synthesis Suggested program segments to target to HW Execution Cycles Power Page: 38

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum += A[i] * B[i]; return sum; Page: 39

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... We want to accelerate this C function in hardware int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum += A[i] * B[i]; return sum; Page: 40

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... Specify We want LegUp to accelerate Tcl command: this C set_accelerator_function function in hardware dotproduct int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum += A[i] * B[i]; return sum; Page: 41

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... Specify We want LegUp to accelerate Tcl command: this C set_accelerator_function function in hardware dotproduct Now run LegUp! int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum += A[i] * B[i]; return sum; Page: 42

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum += A[i] * B[i]; return sum; Page: 43

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum LegUp += A[i] * B[i]; return sum; Page: 44

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... int dotproduct(int *A, int *B, int N) { for (i=0; i<n; i++) { sum LegUp += A[i] * B[i]; return sum; Verilog HW Accelerator Page: 45

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014 int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Automatically created C wrapper to interface with hardware Page: 46

LEGUP DESIGN FLOW int main () { sum = dotproduct(a, b, n);... #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014 int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Page: 47

LEGUP DESIGN FLOW int main () {... sum = sum legup_dotproduct(a,b,n); = dotproduct(n);... Automatically replace function calls #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014 int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Page: 48

LEGUP DESIGN FLOW int main () {... sum = sum legup_dotproduct(a,b,n); = dotproduct(n);... Automatically replace function calls #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014 int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Page: 49

LEGUP DESIGN FLOW int main () { sum = legup_dotproduct(a,b,n); #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014... int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Page: 50

LEGUP DESIGN FLOW int main () { sum = legup_dotproduct(a,b,n);... Software Compiler #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014 int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Page: 51

LEGUP DESIGN FLOW int main () { sum = legup_dotproduct(a,b,n);... Software Compiler RISC-V Processor #define dotproduct_ret (volatile int *) 0xf0000000 #define dotproduct_status (volatile int *) 0xf0000008 #define dotproduct_arg1 (volatile int *) 0xf000000C #define dotproduct_arg2 (volatile int *) 0xf0000010 #define dotproduct_arg3 (volatile int *) 0xf0000014 int legup_dotproduct(int *A, int *B, int N) { *dotproduct_arg1 = (volatile int) A; *dotproduct_arg2 = (volatile int) B; *dotproduct_arg3 = (volatile int) N; *dotproduct_status = 1; return *dotproduct_ret; Page: 52

IOT PLATFORM IN FPGA Architecture Overview RISC-V + LegUp-generated accelerators RISCV master LegUp-generated dotproduct() master BUS INTERFACE slave Processor Data Memory slave Share Data Memory Page: 53

USER DESIGN FLOW C Input Program Profiling + Hardware/Software Partitioning User Tcl Configuration Iterative Design Process RISC-V Binary RISC-V Toolchain/LegUp RISC-V Processor Complete System Verilog Hardware Hardware Accelerator Hardware Accelerator Accelerator Lattice RTL Implementation Page: 54

CASE STUDY Energy is calculated as a sum of squares of speech samples: for (i = 0; i < 256; i++) energy += samples[i]*samples[i]; Running on RISC-V32IM (DMIPS 1.64): Each loop iteration takes about 6 cycles: load, increment address, multiply, add, branch Clock cycles to complete: 1550 LegUp synthesized accelerator: Generates a pipelined hardware circuit exploiting parallelism One iteration completes every clock cycle Clock cycles to complete: 288 Speedup: 5.4X Page: 55

Agenda Introduction Who we are IoT Platform in FPGA Lattice s IoT Vision IoT Platform Overview Architecture Overview RISC-V Architecture RISC-V Processor Overview LegUp High-Level Synthesis Design Flow Benchmark Results Page: 56

STREAMING BENCHMARKS Edge Detection: 4 image filters Bayer/deBayer Filter Beamforming FIR Filter Page: 57

STREAMING BENCHMARKS Aptina Image Sensor ECP3 FPGA Sensor In Controller Edge Detector Video Out Controller HDMI Input: 1 pixel/cycle Output: 1 pixel/cycle Verilog C Clock Frequency: 74 MHz 60 fps Page: 58

STREAMING BENCHMARKS Average across 9 benchmarks LegUp vs RTL: FMax within 30% and slices < 10% larger RTL LegUp/RTL Time (us) 243 1.32 Cycles 38,544 1.00 FMax (MHz) 159 0.76 Slices 612 1.08 Registers 424 1.15 LUT4s 972 1.18 DSPs 16 1.00 Metric: Time = cycles * clock period Time taken to complete the entire benchmark Page: 59

RISC-V+LEGUP BENCHMARKS Accelerate most-compute intensive function with LegUp Benchmark RISC-V clock cycles RISC-V + LegUp clock cycles Energy sum 1,550 288 5.4 OPUS function 48,947 16,523 3.0 FIR filter 44,610 13,049 3.4 Matrix multiply 67,823 28,105 2.4 ADPCM 88,288 20,400 4.3 AES 29,071 12,278 2.4 Blowfish 752,774 460,274 1.6 GSM 17,092 6,243 2.7 Motion 13,193 4,912 2.7 SHA 726,321 190,451 3.8 Geomean 48,542 16,107 3.0 Speedup Page: 60

IOT PLATFORM IN FPGA Architecture Overview RISC-V + LegUp-generated coprocessor connected to I/Os RISCV master LegUp-generated Accelerator master I/Os BUS INTERFACE slave Processor Data Memory slave Share Data Memory slave Accelerator Memory Page: 61

CUSTOM INSTRUCTIONS The current architecture assumes we want to accelerate a large chunk of code Instead the user may want to add a custom instruction i.e. multiply-accumulate for DSP applications LegUp can synthesize the hardware needed for the custom instruction Hardware accelerator is tightly coupled with RISC-V processor: can read/write registers Benefit: area efficient reuse of hardware many times Work in progress Page: 62

FPGA IOT PLATFORM THANK YOU ANY QUESTION? Page: 63