Towards Software-defined Silicon: Applying LLVM to Simplifying Software



Similar documents
The Click2NetFPGA Toolchain. Teemu Rinta-aho, Mika Karlstedt, Madhav P. Desai USENIX ATC 12, Boston, MA, 13 th of June, 2012

Open Flow Controller and Switch Datasheet

DE4 NetFPGA Packet Generator Design User Guide

A low-cost, connection aware, load-balancing solution for distributing Gigabit Ethernet traffic between two intrusion detection systems

ECE 3401 Lecture 7. Concurrent Statements & Sequential Statements (Process)

Coding Rules. Encoding the type of a function into the name (so-called Hungarian notation) is forbidden - it only confuses the programmer.

Using Xilinx ISE for VHDL Based Design

if-then else : 2-1 mux mux: process (A, B, Select) begin if (select= 1 ) then Z <= A; else Z <= B; end if; end process;

VHDL GUIDELINES FOR SYNTHESIS

Digital Design with VHDL

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Example-driven Interconnect Synthesis for Heterogeneous Coarse-Grain Reconfigurable Logic

Open Source Network: Software-Defined Networking (SDN) and OpenFlow

Lecture 22: C Programming 4 Embedded Systems

Router Architectures

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

Implementation of Web-Server Using Altera DE2-70 FPGA Development Kit

AES (Rijndael) IP-Cores

RCL: Design and Open Specification

Virtual Private Systems for FreeBSD

Dynamically Translating x86 to LLVM using QEMU

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Modeling a GPS Receiver Using SystemC

An Example VHDL Application for the TM-4

Performance of Software Switching

NetFlow probe on NetFPGA

LAB #3 VHDL RECOGNITION AND GAL IC PROGRAMMING USING ALL-11 UNIVERSAL PROGRAMMER

Enabling Open-Source High Speed Network Monitoring on NetFPGA

Xenomai: integration and qualification of a real time operating system ARMadeus Systems

9/14/ :38

IO Visor Project Overview

Quartus II Introduction for VHDL Users

The irnetbox Manager User Guide

Challenges in high speed packet processing

Introduction to OpenFlow:

Mouse Programming Mouse Interrupts Useful Mouse functions Mouselib.h Mouselib.c

Python, C++ and SWIG

Applying Clang Static Analyzer to Linux Kernel

Introduction to the Altera Qsys System Integration Tool. 1 Introduction. For Quartus II 12.0

Network packet capture in Linux kernelspace

BSc (Hons) Business Information Systems, BSc (Hons) Computer Science with Network Security. & BSc. (Hons.) Software Engineering

The POSIX Socket API

Tue Apr 19 11:03:19 PDT 2005 by Andrew Gristina thanks to Luca Deri and the ntop team

Extending the Power of FPGAs. Salil Raje, Xilinx

7a. System-on-chip design and prototyping platforms

OpenFlow with Intel Voravit Tanyingyong, Markus Hidell, Peter Sjödin

Linux. Reverse Debugging. Target Communication Framework. Nexus. Intel Trace Hub GDB. PIL Simulation CONTENTS

Network Functions Virtualization on top of Xen

LLVMLinux: Embracing the Dragon

A Wire-speed Packet Classification and Capture Module for NetFPGA

A Programming Language for Processor Based Embedded Systems

Hybrid Platform Application in Software Debug

HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring

Xilinx ISE. <Release Version: 10.1i> Tutorial. Department of Electrical and Computer Engineering State University of New York New Paltz

1 Abstract Data Types Information Hiding

Learn CUDA in an Afternoon: Hands-on Practical Exercises

A way towards Lower Latency and Jitter

HDL Simulation Framework

Creating a Java application using Perfect Developer and the Java Develo...

System Structures. Services Interface Structure

Networking Virtualization Using FPGAs

Low Level Virtual Machine for Glasgow Haskell Compiler

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag

How To Write Portable Programs In C

Oracle SDN Performance Acceleration with Software-Defined Networking

Cisco Configuring Commonly Used IP ACLs

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Software Defined Networking and the design of OpenFlow switches

Table of Contents. Cisco How Does Load Balancing Work?

Control and forwarding plane separation on an open source router. Linux Kongress in Nürnberg

SystemC Tutorial. John Moondanos. Strategic CAD Labs, INTEL Corp. & GSRC Visiting Fellow, UC Berkeley

Example of Standard API

DDS. 16-bit Direct Digital Synthesizer / Periodic waveform generator Rev Key Design Features. Block Diagram. Generic Parameters.

Device Management Functions

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

EMSCRIPTEN - COMPILING LLVM BITCODE TO JAVASCRIPT (?!)

COMPSCI 314: SDN: Software Defined Networking

VHDL Test Bench Tutorial

Product Development Flow Including Model- Based Design and System-Level Functional Verification

3.5. cmsg Developer s Guide. Data Acquisition Group JEFFERSON LAB. Version

OpenAMP Framework for Zynq Devices

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

H0/H2/H4 -ECOM100 DHCP & HTML Configuration. H0/H2/H4--ECOM100 DHCP Disabling DHCP and Assigning a Static IP Address Using HTML Configuration

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

Virtuozzo Virtualization SDK

The BSN Hardware and Software Platform: Enabling Easy Development of Body Sensor Network Applications

MeshBee Open Source ZigBee RF Module CookBook

A REST API for Arduino & the CC3000 WiFi Chip

Firewall Implementation

Transcription:

Towards Software-defined Silicon: Applying LLVM to Simplifying Software Teemu Rinta-aho (teemu.rinta-aho@ericsson.com), Pekka Nikander, Adnan Ghani, Sameer D. Sahasrabuddhe NomadicLab, Ericsson Research Finland Wish-3, Chamonix, France April 2, 2011

Presentation outline Introduction Background Click Modular Router Stanford NetFPGA prototyping board High Level Synthesis (HLS) AHIR Overall Approach Use Case Conclusions

Introduction The starting point: study the applicability of HLS to packet networking The practical exercise: implement a Click-to-NetFPGA toolchain The motivation: study the applicability of modular compiler optimisations for atypical/generated hardware CPUs and ASIC/FPGA seen as two ends of a design space What could be done in the middle ground? This work started only a year ago; this presentation is to publish intermediate results on a work-in-progress project and hopefully get others interested We are happy to share the source code

Click Modular Router Software platform (C++) for building different kinds of packet-processing nodes / functions, or routers Defines a software router from packet processing elements Open source, by Eddie Kohler (MIT) Runs in Linux/BSD/Darwin userspace & Linux kernel (BTW, we are also working on a FreeBSD kernel version) 100+ existing elements, e.g. ARPResponder, Classifier Easy to add new elements Easy to build a custom routers

Stanford NetFPGA A PCI network interface card with an FPGA 4 x 1G Ethernet interface Line-rate, flexible, and open platform For research and classrooms More than 1,000 NetFPGA systems deployed A few open-source, Verilog-based reference designs A newer, NetFPGA 10G card is also available 4 x 10G Ethernet interface, bigger FPGA, faster PCI interface

High Level Synthesis (HLS) A high level program hardware (RTL) Typically: C/C++/System C Verilog/VHDL Currently available tools limitations and features Designed for hardware professionals; for writing hardware in C Support only a subset of C/C++, excluding e.g.: dynamic memory function pointers and virtual methods recursion Cannot handle existing software-oriented code e.g. Click elements written using full C++ Closed-source commercial products

AHIR A Hardware Intermediate Representation LLVM backend for generating VHDL To-be open source, by IIT Bombay (India) Factorises the system into control, data, and storage Supports scalable optimisations and analyses Current limitations: no recursion or function pointers, otherwise full C Generates a VHDL module out of each LLVM IR function Design = Set of modules with I/O channels I/O through a simple VHDL library, resembling Unix pipes

AN AHIR example Converting an LLVM instruction to VHDL Simple Addition Example C code: int d = m + n; Equivalent SSA: %d = add i32 %m, %n CDFG Start Control path Data path m add i32 n Init Fin req ack m add i32 n Stop %add Control Edge Data Edge

Presentation outline Introduction Background Click Modular Router Stanford NetFPGA prototyping board High Level Synthesis (HLS) AHIR Overall Approach Use Case Conclusions

Overall Approach Click C++ hardware friendly LLVM IR VHDL New Click target, similar to BSD/Linux specific differences E.g. use table-based memory allocation instead of skb/heap-based Click2LLVM: Click configuration LLVM IR Use Click library functions to set up a Click router in memory Dump the memory as a set of constants, expressed in LLVM IR LLVM IR hardware friendly LLVM IR Fixing this arguments to constants Aggressive constant propagation, aggressive inlining Together resulting in e.g. removing virtual function calls AHIR: hardware friendly LLVM IR VHDL

Hardware Hardware Click Click runtime Click runtime element source Click NetFPGA target library llvm-g++ Click Click element Click element Click element element IR IR IR IR llvm-g++ Click Click element Click element Click object element object element object object llvmlink Linked IR Click host target library Click2- LLVM Click config IR Click config source

optimising optimising optimising pass optimising pass optimising pass pass pass NetFPGA/ Click wrapper NetFPGA reference NIC Linked IR opt Optimised IR AHIR VHDL Xilinx ISE

Click/NetFPGA Wrapper Free Queue Reference NIC design Input Module src_in0 src min dst dst_out0 Output Module Reference NIC design Memory Subsystem

NetFPGA wrapper for the Click/AHIR system NetFPGA processes packets on the fly, word by word Click only transfers pointers to Packets Packets stay in the same memory location The wrapper receives and sends NetFPGA words Stores them locally as a Packet in a memory subsystem Memory is managed as a queue of free locations InputModule reads a free location from the "free queue", then receives data from the NetFPGA interface & stores it OutputModule writes the data on the NetFPGA interface, then returns the pointer to the free queue HLS includes elements from Click configuration Click elements may create or destroy packets using direct access to the free queue

Presentation outline Introduction Background Click Modular Router Stanford NetFPGA prototyping board High Level Synthesis (HLS) AHIR Overall Approach Use Case Conclusions

Use Case: Click.conf require(package "minimal-package"); src :: FromFPGA; min :: Minimal; dst :: ToFPGA; src -> min -> dst;

Use Case: Click.conf require(package "minimal-package"); Dynamically link external Click elements src :: FromFPGA; min :: Minimal; dst :: ToFPGA; src -> min -> dst;

Use Case: Click.conf require(package "minimal-package"); src :: FromFPGA; min :: Minimal; dst :: ToFPGA; Dynamically link external Click elements Declare and name the used elements src -> min -> dst;

Use Case: Click.conf require(package "minimal-package"); src :: FromFPGA; min :: Minimal; dst :: ToFPGA; src -> min -> dst; Dynamically link external Click elements Declare and name the used elements Define packet processing order

Use Case: simple_action() Packet * Minimal::simple_action(Packet *p) { unsigned char *data = p->uniqueify()->data(); data[0] ^= 0x0F; return p; } The packet processing part of a valid, yet an extremely simple, Click element named Minimal. Written in C++ without any special target-specific requirements or restrictions.

Use Case: ahir_glue_min() @1 = constant [8 x i8] c"min_in0\00" @GV_3 = constant [1 x %"struct.element::port"] [%"struct.element::port" { [16 x i8] c"dst_in0\00\00\00\00\00\00\00\00\00", %struct.element* undef, i32 undef }] @min = constant %struct.minimal {, @GV_3, } ; Struct contents not fully shown define void @ahir_glue_min() { %0 = call i32 @read_uintptr(i8* getelementptr inbounds ([8 x i8]* @1, i32 0, i32 0)) %1 = inttoptr i32 %0 to %struct.packet* call void @element_push(%struct.element* getelementptr inbounds (%struct.minimal* @min, i32 0, i32 0), i32 0, %struct.packet* %1) ret void } The new, generated, wrapper function together with generated global constants, in LLVM IR form.

Use Case: ahir_glue_min() optimised @1 = internal constant [8 x i8] c"min_in0\00" @GV_3 = constant [1 x %"struct.element::port"] [%"struct.element::port" { [16 x i8] c"dst_in0\00\00\00\00\00\00\00\00\00", %struct.element* undef, i32 undef }] define void @ahir_glue_min() ssp { %tmp = tail call i32 @read_uintptr(i8* getelementptr inbounds ([8 x i8]* @1, i32 0, i32 0)) %tmp3 = inttoptr i32 %tmp to %struct.packet* %tmp4 = getelementptr inbounds %struct.packet* %tmp3, i32 0, i32 3 %tmp5 = load i8** %tmp4, align 4 %tmp6 = load i8* %tmp5, align 1 %tmp7 = xor i8 %tmp6, 15 store i8 %tmp7, i8* %tmp5, align 1 %tmp8 = icmp eq i32 %tmp, 0 br i1 %tmp8, label %_ZN7Element4pushEiP6Packet.exit, label %bb.i bb.i: tail call void @write_uintptr(i8* getelementptr inbounds ( [1 x %"struct.element::port"]* @GV_3, i32 0, i32 0, i32 0, i32 0), i32 %tmp) ret void _ZN7Element4pushEiP6Packet.exit: ret void } The optimised wrapper function. Constant input and output port names, inlined packet processing code, only AHIR I/O function calls.

Use Case: Excerpts of Generated VHDL entity click_ahir_ll_ahir_glue_min_dp is port( io_dst_in0_ack : in std_logic; io_dst_in0_data : out std_logic_vector(31 downto 0); io_dst_in0_req : out std_logic; io_min_in0_ack : in std_logic; io_min_in0_data : in std_logic_vector(15 downto 0); io_min_in0_req : out std_logic; ); end click_ahir_ll_ahir_glue_min_dp; dpe_16 : APInt_S_2 -- tmp7 -- configuration: APIntXor generic map( colouring => (0 => 0)) port map( ackc => SigmaOut(20 downto 20), ackr => SigmaOut(19 downto 19), clk => clk, reqc => SigmaIn(19 downto 19), reqr => SigmaIn(18 downto 18), reset => reset, x => wire_11, y => wire_13, z => wire_12);

Use Case: Excerpts of Generated VHDL entity click_ahir_ll_ahir_glue_min_dp is port( io_dst_in0_ack : in std_logic; io_dst_in0_data : out std_logic_vector(31 downto 0); io_dst_in0_req : out std_logic; io_min_in0_ack : in std_logic; io_min_in0_data : in std_logic_vector(15 downto 0); io_min_in0_req : out std_logic; ); end click_ahir_ll_ahir_glue_min_dp; dpe_16 : APInt_S_2 -- tmp7 -- configuration: APIntXor generic map( colouring => (0 => 0)) port map( ackc => SigmaOut(20 downto 20), ackr => SigmaOut(19 downto 19), clk => clk, reqc => SigmaIn(19 downto 19), reqr => SigmaIn(18 downto 18), reset => reset, x => wire_11, y => wire_13, z => wire_12); Wires representing the read_uintptr() and write_uintptr() functions; leading to previous and next Click elements

Use Case: Excerpts of Generated VHDL entity click_ahir_ll_ahir_glue_min_dp is port( io_dst_in0_ack : in std_logic; io_dst_in0_data : out std_logic_vector(31 downto 0); io_dst_in0_req : out std_logic; io_min_in0_ack : in std_logic; io_min_in0_data : in std_logic_vector(15 downto 0); io_min_in0_req : out std_logic; ); end click_ahir_ll_ahir_glue_min_dp; dpe_16 : APInt_S_2 -- tmp7 -- configuration: APIntXor generic map( colouring => (0 => 0)) port map( ackc => SigmaOut(20 downto 20), ackr => SigmaOut(19 downto 19), clk => clk, reqc => SigmaIn(19 downto 19), reqr => SigmaIn(18 downto 18), reset => reset, x => wire_11, y => wire_13, z => wire_12); Wires representing the read_uintptr() and write_uintptr() functions; leading to previous and next Click elements Block describing the actual XOR operation on data (the actual operation is defined in a library and uses standard VHDL xor on two integers)

Conclusions High-level goal: Study the applicability of modular compiler optimisations for atypical/generated hardware Practical goal: Implement a Click-to-NetFPGA tool chain Main contributions: Click2LLVM: Dump a Click process memory as LLVM IR constants AHIR: Convert LLVM IR to VHDL The Click/NetFPGA wrapper for the NetFPGA reference NIC design Future work: evaluate the performance of the hardware accelerated Click router evaluate different software- and hardware-level optimisations gain more understanding on the possibilities of current compilation and high-level synthesis techniques