credits Programming with actors Dave B. Parlour Xilinx Research Labs Thomas A. Lenart Lund University Robert Esser



Similar documents
Seeking Opportunities for Hardware Acceleration in Big Data Analytics

CFD Implementation with In-Socket FPGA Accelerators

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors

7a. System-on-chip design and prototyping platforms

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

The WIMP51: A Simple Processor and Visualization Tool to Introduce Undergraduates to Computer Organization

Central Processing Unit (CPU)

DDS. 16-bit Direct Digital Synthesizer / Periodic waveform generator Rev Key Design Features. Block Diagram. Generic Parameters.

FPGA-based Multithreading for In-Memory Hash Joins

Operating System Support for Multiprocessor Systems-on-Chip

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

1. Memory technology & Hierarchy

Software-Programmable FPGA IoT Platform. Kam Chuen Mak (Lattice Semiconductor) Andrew Canis (LegUp Computing) July 13, 2016

BUILD VERSUS BUY. Understanding the Total Cost of Embedded Design.

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Systolic Computing. Fundamentals

High-Level Synthesis for FPGA Designs

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Processor Architectures

FPGAs for Trusted Cloud Computing

Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2

Open Flow Controller and Switch Datasheet

A Lab Course on Computer Architecture

Next Generation GPU Architecture Code-named Fermi

Introduction to Xilinx System Generator Part II. Evan Everett and Michael Wu ELEC Spring 2013

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

Motivation: Smartphone Market

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Systems on Chip Design

ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

PROBLEMS #20,R0,R1 #$3A,R2,R4

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May ILP Execution

Which ARM Cortex Core Is Right for Your Application: A, R or M?

FLIX: Fast Relief for Performance-Hungry Embedded Applications

Instruction scheduling

EEM870 Embedded System and Experiment Lecture 1: SoC Design Overview

Software Stacks for Mixed-critical Applications: Consolidating IEEE AVB and Time-triggered Ethernet in Next-generation Automotive Electronics

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

FPGA. AT6000 FPGAs. Application Note AT6000 FPGAs. 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 FPGAs.

Software Synthesis from Dataflow Models for G and LabVIEW

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Networking Virtualization Using FPGAs

Introduction to Digital System Design

Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

An Efficient Architecture for Image Compression and Lightweight Encryption using Parameterized DWT

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

RAPID PROTOTYPING OF DIGITAL SYSTEMS Second Edition

A case study of mobile SoC architecture design based on transaction-level modeling

Architectures and Platforms

Implementation of emulated digital CNN-UM architecture on programmable logic devices and its applications

LogiCORE IP AXI Performance Monitor v2.00.a

Embedded System Hardware - Processing (Part II)

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Holger Eichelberger, Cui Qin, Klaus Schmid, Claudia Niederée

Computer Engineering: Incoming MS Student Orientation Requirements & Course Overview

Multicore Programming with LabVIEW Technical Resource Guide

2. TEACHING ENVIRONMENT AND MOTIVATION

University of St. Thomas ENGR Digital Design 4 Credit Course Monday, Wednesday, Friday from 1:35 p.m. to 2:40 p.m. Lecture: Room OWS LL54

FPGA IMPLEMENTATION OF AES ALGORITHM

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March ISSN

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

SYSTEM-ON-PROGRAMMABLE-CHIP DESIGN USING A UNIFIED DEVELOPMENT ENVIRONMENT. Nicholas Wieder

Parallel AES Encryption with Modified Mix-columns For Many Core Processor Arrays M.S.Arun, V.Saminathan

Rapid System Prototyping with FPGAs

Extending the Power of FPGAs. Salil Raje, Xilinx

FlexPath Network Processor

8. Hardware Acceleration and Coprocessing

NVIDIA GeForce GTX 580 GPU Datasheet

Data Management for Portable Media Players

Video Conference System

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Let s put together a Manual Processor

APPROACHABLE ANALYTICS MAKING SENSE OF DATA

Sistemas Digitais I LESI - 2º ano

İSTANBUL AYDIN UNIVERSITY

What is a System on a Chip?

Project Server hardware and software requirements

XtratuM hypervisor redesign for LEON4 multicore processor

10 Gbps Line Speed Programmable Hardware for Open Source Network Applications*

Power Reduction Techniques in the SoC Clock Network. Clock Power

Hardware and Software

Implementation of Full -Parallelism AES Encryption and Decryption

A Mixed-Signal System-on-Chip Audio Decoder Design for Education

Capacity Plan. Template. Version X.x October 11, 2012

9/14/ :38

Going Linux on Massive Multicore

The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History

Process Description and Control william stallings, maurizio pizzonia - sistemi operativi

Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations

Transcription:

Programming with actors Jörn W. Janneck credits Dave B. Parlour Thomas A. Lenart Lund University Robert Esser University of Adelaide Ptolemy Miniconference VI, 2005-05-12-2

The FPGA Platform: Huge amounts of fine-grained concurrency.... along with specialized blocks (multipliers, RAM, ALUs, processors...) Ptolemy Miniconference VI, 2005-05-12-3 The Problem: Using FPGAs to implement DSP applications requires circuit design expertise. Ptolemy Miniconference VI, 2005-05-12-4

The Research Goal: Design and build models and tools that make it possible for application domain experts to program FPGAs with highquality implementations. Ptolemy Miniconference VI, 2005-05-12-5 models and tools What does it take to program with actors? actors and dataflow as a concurrent model Cal as the language for writing actors driver application tools code generation circuits and software and combinations thereof animation & visualization Ptolemy Miniconference VI, 2005-05-12-6

models and tools We are focusing on...... hardware that can be programmed.... programming concepts that can be implemented. Ptolemy Miniconference VI, 2005-05-12-7 driver application MPEG-4 decoder Ptolemy Miniconference VI, 2005-05-12-8

driver application MPEG-4 decoder metrics 60 atomic actors 22 atomic actor classes 3307 LOC (Cal) LOC per actor class between 7 and 2054 actor constructs variable token rates static/cyclostatic rates data-dependent choice test for absence of tokens non-prefix-monotonic actors Ptolemy Miniconference VI, 2005-05-12-9 driver application MPEG-4 decoder development time approx. 2 months. approximate sizes of various models Cal: 3,300 LOC architectural C code: 4,200 LOC synthesizable VHDL: 15,000 LOC Cal VHDL Ptolemy Miniconference VI, 2005-05-12-10

code generation 2D-IDCT implementation first step to complete decoder implementation naive code generator has sufficient language coverage for IDCT compute. 10 classes 200 LOC covers most language features, except... - addressable memory - multicycle actions Ptolemy Miniconference VI, 2005-05-12-11 code generation 2D-IDCT, version 1 Starting architecture is very inefficient: 22 multipliers with 12.5% utilization. 1-D A T 1-D Ptolemy Miniconference VI, 2005-05-12-12

code generation 2D-IDCT, version 2 interleave row and column streams pipelined 1D-IDCT result: 6 multipliers with 46% utilization more operator re-use costly in terms of operand routing >100 Mhz clock Pipelined 1-D IDCT Ptolemy Miniconference VI, 2005-05-12-13 code generation summary good QoR for naive code generator redesigned model compares favorably with existing VHDL implementations smaller, faster, simpler to use HDTV rate demonstrates strength of programming model, rather than quality of code generator lots of room for improvement pipelining, folding Ptolemy Miniconference VI, 2005-05-12-14

animation & visualization actor animation (1/4) input queues Ptolemy Miniconference VI, 2005-05-12-15 animation & visualization actor animation (2/4) input queue history Ptolemy Miniconference VI, 2005-05-12-16

animation & visualization actor animation (3/4) actor state variables Ptolemy Miniconference VI, 2005-05-12-17 animation & visualization actor animation (4/4) action selection status Ptolemy Miniconference VI, 2005-05-12-18

outlook improved hardware code generation language coverage optimizations software code generation analysis and optimization tools debugging/visualization tools alternative entry mechanisms VisualCal Ptolemy Miniconference VI, 2005-05-12-19