Dataflow Programming with MaxCompiler



Similar documents
Going to the wire: The next generation financial risk management platform

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

High-Level Synthesis for FPGA Designs

The Click2NetFPGA Toolchain. Teemu Rinta-aho, Mika Karlstedt, Madhav P. Desai USENIX ATC 12, Boston, MA, 13 th of June, 2012

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

CFD Implementation with In-Socket FPGA Accelerators

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

Xeon+FPGA Platform for the Data Center

High Performance Computing in CST STUDIO SUITE

Heterogeneous Cloud Computing:

FPGA-based MapReduce Framework for Machine Learning

Datacenter Operating Systems

Accelerate Cloud Computing with the Xilinx Zynq SoC

RIPL. An Image Processing DSL. Rob Stewart & Deepayan Bhowmik. 1st May, Heriot Watt University

Cloud Computing through Virtualization and HPC technologies

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

CReST Developers Guide

Extending the Power of FPGAs. Salil Raje, Xilinx

Example-driven Interconnect Synthesis for Heterogeneous Coarse-Grain Reconfigurable Logic

Adonis Technical Requirements

High-Density Network Flow Monitoring

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group

Dualog Connection Suite Hardware and Software Requirements

Effective Java Programming. efficient software development

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Full and Para Virtualization

FSMD and Gezel. Jan Madsen

Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software

Supercomputing: DataFlow vs. ControlFlow (Paradigm Shift in Both, Benchmarking Methodology and Programming Model)

Entwicklung und Testen von Robotischen Anwendungen mit MATLAB und Simulink Maximilian Apfelbeck, MathWorks

I/O virtualization. Jussi Hanhirova Aalto University, Helsinki, Finland Hanhirova CS/Aalto

Main Memory Data Warehouses

CHAPTER FIVE RESULT ANALYSIS

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

<Insert Picture Here> Oracle Database Support for Server Virtualization Updated December 7, 2009

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

COMPUTING. SharpStreamer Platform. 1U Video Transcode Acceleration Appliance

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Basics in Energy Information (& Communication) Systems Virtualization / Virtual Machines

PRIMERGY server-based High Performance Computing solutions

Exar. Optimizing Hadoop Is Bigger Better?? March Exar Corporation Kato Road Fremont, CA

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Performance tuning Xen

Introduction to Programmable Logic Devices. John Coughlan RAL Technology Department Detector & Electronics Division

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

SEAIP 2009 Presentation

Using ModelSim, Matlab/Simulink and NS for Simulation of Distributed Systems

Fixed Price Website Load Testing

Qsys and IP Core Integration

HIGH PERFORMANCE BIG DATA ANALYTICS

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

HARNESS project: Managing Heterogeneous Compute Resources for a Cloud Platform

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and

Modeling a GPS Receiver Using SystemC

The Art of Virtualization with Free Software

wu.cloud: Insights Gained from Operating a Private Cloud System

Schnell und effizient durch Automatische Codegenerierung

Open Flow Controller and Switch Datasheet

Introduction to the NI Real-Time Hypervisor

How To Design An Image Processing System On A Chip

Big Data Functionality for Oracle 11 / 12 Using High Density Computing and Memory Centric DataBase (MCDB) Frequently Asked Questions

9/14/ :38

Data Analytics at NERSC. Joaquin Correa NERSC Data and Analytics Services

21. Software Development Team

Module I-7410 Advanced Linux FS-11 Part1: Virtualization with KVM

Digital Systems Design! Lecture 1 - Introduction!!

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

QoS & Traffic Management

Cloud Data Center Acceleration 2015

HPC performance applications on Virtual Clusters

Architectures and Platforms

Bringing Big Data Modelling into the Hands of Domain Experts

Advanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011

SGI High Performance Computing

Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit

Scaling Graphite Installations

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Cover. WinCC/Server Virtualization. WinCC. Technical Information April Applications & Tools. Answers for Industry.

Paolo Costa

Blueprints for Scalable IBM Spectrum Protect (TSM) Disk-based Backup Solutions

Multicore Parallel Computing with OpenMP

Embedded Systems: map to FPGA, GPU, CPU?

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

An Introduction to Parallel Computing/ Programming

IT Business Management System Requirements Guide

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

Networking Virtualization Using FPGAs

Vocera Voice 4.3 and 4.4 Server Sizing Matrix

Express5800 Scalable Enterprise Server Reference Architecture. For NEC PCIe SSD Appliance for Microsoft SQL Server

Hardware-accelerated Text Analytics

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

White Paper. Recording Server Virtualization

Tableau Server 7.0 scalability

CSC230 Getting Starting in C. Tyler Bletsch

#OpenPOWERSummit. Join the conversation at #OpenPOWERSummit 1

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

PyMTL and Pydgin Tutorial. Python Frameworks for Highly Productive Computer Architecture Research

Transcription:

Dataflow Programming with MaCompiler

Lecture Overview Programming DFEs MaCompiler Streaming Kernels Compile and build Java meta-programming 2

Reconfigurable Computing with DFEs Logic Cell (10 5 elements) Xilin Virte-6 FPGA DSP Block IO Block Block RAM DSP Block Block RAM (20TB/s) 3

DFE Acceleration Hardware Solutions MaRack 10U, 20U or 40U MaCard MaNode MaRack PCI-Epress Gen 2 Typical 50W-80W 24-48GB RAM 1U server 4 MAX3 Cards Intel Xeon CPUs 10U, 20U or 40U Rack 4

How could we program it? Schematic entry of circuits Traditional Hardware Description Languages VHDL, Verilog, SystemC.org Object-oriented languages C/C, Python, Java, and related languages Functional languages: e.g. Haskell High level interface: e.g. Mathematica, MatLab Schematic block diagram e.g. Simulink Domain specific languages (DSLs) 5

Level of Abstraction Accelerator Programming Models DSL DSL DSL DSL Higher Level Libraries Higher Level Libraries Fleible Compiler System: MaCompiler Possible applications 6

Start Acceleration Development Flow Original Application Identify code for acceleration and analyze bottlenecks Transform app, architect and model performance Write MaCompiler code Integrate with Host code Simulate NO NO Accelerated Application YES Meets performance goals? Build for Hardware YES Functions correctly? 7

MaCompiler Complete development environment for Maeler DFE accelerator platforms Write MaJ code to describe the dataflow accelerator MaJ is an etension of Java for MaCompiler Eecute the Java generate the accelerator C software on CPUs uses the accelerator class MyAccelerator etends Kernel { public MyAccelerator( ) { DFEVar = io.input("", dfefloat(8, 24)); DFEVar y = io.input( y", dfefloat(8, 24)); DFEVar 2 = * ; DFEVar y2 = y * y; DFEVar result = 2 y2 30; } } io.output( z", result, dfefloat(8, 24)); 8

Application Components CPU Host application SLiC MaelerOS Kernels DFE PCI Epress * Memory Memory Manager 9

Programming with MaCompiler Host Application *.c, *.f90... Rewrite only code to be accelerated MyKernel.maj MyManager.maj User Input MaIDE User Input Compiler maeleros Library MaCompiler SLiC Library Linker MAX File Sim or H/W (.ma) Output Output Eecutable 10

Simple Application Eample CPU Host Code Main Memory Host Code (.c) int*, *y; for (int i =0; i < DATA_SIZE; i) y[i]= [i] * [i] 30; y i i i 30 11

Development Process CPU Host Code MaCompilerRT MaelerOS Main Memory y PCI Epress Memory Manager DFE 30 y 30 Host Code (.c) int*, *y; MyKernel(DATA_SIZE, for (int i =0; i < DATA_SIZE; i) y[i]= [i], * [i] DATA_SIZE*4, 30; y, DATA_SIZE*4); MyManager (.maj) Manager m = new Manager(); Kernel k = new MyKernel(); m.setkernel(k); m.setio( link( ", CPU), link( y", CPU)); m.build(); MyKernel (.maj) DFEVar = io.input("", dfeint(32)); DFEVar result = * 30; io.output("y", result, dfeint(32)); 12

Development Process CPU Host Code MaCompilerRT MaelerOS Main Memory PCI Epress Memory y Manager DFE 30 y 30 Host Code (.c) device = ma_open_device(mafile, "/dev/maeler0"); int*, *y; MyKernel( DATA_SIZE,, DATA_SIZE*4); MyManager (.maj) Manager m = new Manager(); Kernel k = new MyKernel(); m.setkernel(k); m.setio( link( ", CPU), link( y", DRAM_LINEAR1D)); m.build(); MyKernel (.maj) DFEVar = io.input("", dfeint(32)); DFEVar result = * 30; io.output("y", result, dfeint(32)); 13

The Full Kernel public class MyKernel etends Kernel { public MyKernel (KernelParameters parameters) { super(parameters); DFEVar = io.input("", dfeint(32)); } } DFEVar result = * 30; io.output("y", result, dfeint(32)); 30 y 14

Streaming Data through the Kernel 5 4 3 2 1 0 0 0 30 30 y 30 15

Streaming Data through the Kernel 5 4 3 2 1 0 1 1 30 31 y 30 31 16

Streaming Data through the Kernel 5 4 3 2 1 0 2 4 30 34 y 30 31 34 17

Streaming Data through the Kernel 5 4 3 2 1 0 3 9 30 39 y 30 31 34 39 18

Streaming Data through the Kernel 5 4 3 2 1 0 4 16 30 46 y 30 31 34 39 46 19

Streaming Data through the Kernel 5 4 3 2 1 0 5 25 30 55 y 30 31 34 39 46 55 20

Compile, Build and Run Java program generates a MaFile when it runs 1. Compile the Java into.class files 2. Eecute the.class file Builds the dataflow graph in memory Generates the hardware.ma file 3. Link the generated.ma file with your host program 4. Run the host program Host code automatically configures DFE(s) and interacts with them at run-time 21

Java meta-programming You can use the full power of Java to write a program that generates the dataflow graph Java variables can be used as constants in hardware int y; DFEVar ; = y; Hardware variables can not be read in Java! Cannot do: int y; DFEVar ; y = ; Java conditionals and loops choose how to generate hardware not make run-time decisions 22

Dataflow Graph Generation: Simple What dataflow graph is generated? DFEVar = io.input(, type); DFEVar y; y = 1; io.output( y, y, type); 1 y 23

Dataflow Graph Generation: Simple What dataflow graph is generated? DFEVar = io.input(, type); DFEVar y; y = ; io.output( y, y, type); y 24

Dataflow Graph Generation: Variables What s the value of h if we stream in 1? DFEVar h = io.input( h, type); int s = 2; 1 10 s = s 5 h = h 10 7 h = h s; 18 What s the value of s if we stream in 1? DFEVar h = io.input( h, type); int s = 2; s = s 5 h = h 10 s = h s; Compile error. You can t assign a hardware value to a Java int 25

Dataflow Graph Generation: Conditionals What dataflow graph is generated? DFEVar = io.input(, type); int s = 10; DFEVar y; if (s < 100) { y = 1; } else { y = 1; } io.output( y, y, type); What dataflow graph is generated? DFEVar = io.input(, type); DFEVar y; y 1 if ( < 10) { y = 1; } else { y = 1; } io.output( y, y, type); Compile error. You can t use the value of in a Java conditional 26

Conditional Choice in Kernels Compute both values and use a multipleer. = control.mu(select, option0, option1,, optionn) = select? option1 : option0 Ternary-if operator is overloaded DFEVar = io.input(, type); DFEVar y; y = ( > 10)? 1 : 1 > 10 1-1 io.output( y, y, type); y 27

Dataflow Graph Generation: Java Loops What dataflow graph is generated? DFEVar = io.input(, type); DFEVar y = ; for (int i = 1; i <= 3; i) { y = y i; } io.output( y, y, type); 1 2 Can make the loop any size until you run out of space on the chip! Larger loops can be partially unrolled in space and used multiple times in time see Lecture on loops and cyclic graphs y 3 28

29 Real data flow graph as generated by MaCompiler 4866 nodes; 10,000s of stages/cycles

Eercises 1. Write a MaCompiler kernel program that takes three input streams, y and z which are dfeint(32) and computes an output stream p, where: p i ( i y ( i z ( i yi i i ) 2 ) 2 z i ) if if z z i i otherwise i i 2. Draw the dataflow graph generated by the following program: for (int i = 0; i < 6; i) { DFEVar = io.input( i, dfeint(32)); DFEVar y = ; if (i % 3!= 0) { for (int j = 0; j < 3; j) { y = y *j; } } else { y = y * y; } io.output( y i, y, dfeint(32)); } 30