Compiling Recursion to Reconfigurable Hardware using CLaSH

Similar documents
Let s put together a Manual Processor

VHDL Test Bench Tutorial

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Recovering Business Rules from Legacy Source Code for System Modernization

Figure 1: Graphical example of a mergesort 1.

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Thomas Jefferson High School for Science and Technology Program of Studies Foundations of Computer Science. Unit of Study / Textbook Correlation

FPGA area allocation for parallel C applications

PCB Project (*.PrjPcb)

What Is Recursion? Recursion. Binary search example postponed to end of lecture

The Clean programming language. Group 25, Jingui Li, Daren Tuzi

CAs and Turing Machines. The Basis for Universal Computation

7a. System-on-chip design and prototyping platforms

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History

COMPUTER SCIENCE TRIPOS

Control 2004, University of Bath, UK, September 2004

CFD Implementation with In-Socket FPGA Accelerators

EE360: Digital Design I Course Syllabus

ML for the Working Programmer

MEng, BSc Applied Computer Science

Regular Expressions and Automata using Haskell

A New Paradigm for Synchronous State Machine Design in Verilog

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

50 Computer Science MI-SG-FLD050-02

MEng, BSc Computer Science with Artificial Intelligence

ELECTENG702 Advanced Embedded Systems. Improving AES128 software for Altera Nios II processor using custom instructions

Lab 1: Introduction to Xilinx ISE Tutorial

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

COMPUTER SCIENCE, BACHELOR OF SCIENCE (B.S.)

9/14/ :38

University of Dayton Department of Computer Science Undergraduate Programs Assessment Plan DRAFT September 14, 2011

Digital Systems Design! Lecture 1 - Introduction!!

Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students

CSCI 3136 Principles of Programming Languages

Integrating Formal Models into the Programming Languages Course

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

Compiler I: Syntax Analysis Human Thought

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism

Using reverse circuit execution for efficient parallel simulation of logic circuits

DRAFT Gigabit network intrusion detection systems

ASSEMBLY PROGRAMMING ON A VIRTUAL COMPUTER

Application Note: AN00141 xcore-xa - Application Development

Parallel Computing. Benson Muite. benson.

Building Applications Using Micro Focus COBOL

Chapter 15 Functional Programming Languages

Assessment for Master s Degree Program Fall Spring 2011 Computer Science Dept. Texas A&M University - Commerce

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

Recursion. Slides. Programming in C++ Computer Science Dept Va Tech Aug., Barnette ND, McQuain WD

Levels of Programming Languages. Gerald Penn CSC 324

Multicore Programming with LabVIEW Technical Resource Guide

Spring 2011 Prof. Hyesoon Kim

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Finite State Machine Design and VHDL Coding Techniques

Sources: On the Web: Slides will be available on:

C Compiler Targeting the Java Virtual Machine

Algorithm & Flowchart & Pseudo code. Staff Incharge: S.Sasirekha

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

CA Compiler Construction

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Hardware and Software

CUDA programming on NVIDIA GPUs

INTRODUCTION TO DIGITAL SYSTEMS. IMPLEMENTATION: MODULES (ICs) AND NETWORKS IMPLEMENTATION OF ALGORITHMS IN HARDWARE

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Three Effective Top-Down Clustering Algorithms for Location Database Systems

Software Synthesis from Dataflow Models for G and LabVIEW

Systolic Computing. Fundamentals

OPC COMMUNICATION IN REAL TIME

Boole-WebLab-Deusto: Integration of a Remote Lab in a Tool for Digital Circuits Design

FPGA Implementation of an Advanced Traffic Light Controller using Verilog HDL

Example-driven Interconnect Synthesis for Heterogeneous Coarse-Grain Reconfigurable Logic

Java Coding Practices for Improved Application Performance

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

WESTMORELAND COUNTY PUBLIC SCHOOLS Integrated Instructional Pacing Guide and Checklist Computer Math

fakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel Informatik 12 TU Dortmund Germany

10CS35: Data Structures Using C

International Workshop on Field Programmable Logic and Applications, FPL '99

Parallelization: Binary Tree Traversal

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

COURSE TITLE COURSE DESCRIPTION

FPGA-based Multithreading for In-Memory Hash Joins

Programming Language Concepts for Software Developers

The 104 Duke_ACC Machine

TPCalc : a throughput calculator for computer architecture studies

EE361: Digital Computer Organization Course Syllabus

Towards a Benchmark Suite for Modelica Compilers: Large Models

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*

Designing with Exceptions. CSE219, Computer Science III Stony Brook University

Integer Computation of Image Orthorectification for High Speed Throughput

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

SIM-PL: Software for teaching computer hardware at secondary schools in the Netherlands

Transcription:

Compiling Recursion to Reconfigurable Hardware using CLaSH Ruud Harmsen University of Twente P.O. Box 217, 7500AE Enschede The Netherlands r.harmsen@student.utwente.nl ABSTRACT Recursion is an important problem solving technique and implementing it on hardware is not trivial. In this research several methods on how to do this are discussed and one is implemented. The goal is to find out if it is possible to automate the process of making recursion possible on hardware and add this automation to the CλaSH compiler. The CλaSH compiler is a compiler from Haskell to VHDL, Haskell has a much higher level of abstraction than VHDL does, so should give an easier way to implement hardware designs. This compiler is still a work in progress, one of the missing things is support for recursion. This paper shows how to compile simple recursion to a stack-based framework which can be used in CλaSH. Keywords CλaSH, hardware recursion, FPGA 1. INTRODUCTION Recursion is an important and powerful problem solving technique, which is used in important algorithms e.g. search and routing algorithms. Recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. Implementing recursion on reconfigurable hardware like an Field programmable gate array (FPGA) is not trivial. Some examples for implementing recursion on hardware (FPGA) is done by Mihhailov et al.[6], the study implemented a parallel sorting algorithm on an FPGA. Marauyama et al.[5] show 2 algorithms which shows that recursion can be faster on reconfigurable hardware, the speed of some simple combinatorial algorithms are 4.1 to 6.7 times faster on a relatively slow FPGA (compared to a processor). The first attempt to do this was made by Tsutomu Maruyama et al.[5], where recursion was implemented using stacks. Sklyarov et al.[10] show the difference between a recursive and iterative solution of some algorithms and the difference in performance. Research on this topic is discussed in the Related Work section. Programming an FPGA is done in a Hardware Descrip- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 17 th Twente Student Conference on IT June 25 st, 2012, Enschede, The Netherlands. Copyright 2012, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science. tion Language (HDL) like VHDL or Verilog. A relatively new HDL is CAES Language for Synchronous Hardware (CλaSH), this is a functional HDL created by the CAESgroup [1]. CλaSH does not support recursion yet, this paper will give more insight in how to add recursion to this compiler. There are some HDL compilers available that can cope with recursion like the C to HDL compiler by Maruyama et al.[4], but C is not a functional language. 2. RELATED WORK Maruyama et al.[5] implement recursion using stacks. The authors show as an example that an algorithm to solve the knapsack problem can be split in 2 parts, a recursive part and a loop instruction (tail recursion optimization). The study shows that this loop part of the algorithm can be put on a stack and executed parallel to the recursive part using multiple threads. In that way a greater performance was achieved because multiple calculations are executed at the same time. In the example of the knapsack problem this shows a speedup of 6.7 times using an FPGA in stead of normal sequential execution, even when the clock speed is much lower than the processor (35Mhz compared to 200Mhz). Sklyarov suggests an implementation of recursive algorithms using Hierarchical Graph Scheme (HGS) [9]. His idea is to divide the algorithm to a discrete number of modules where each module is subdivided with a number of states. (See Figure 1). This is a Hierarchical Finite State Machine (HFSM). This idea uses 3 stacks, a Modulestack where the different models (Z n in Figure 1) are stored, a Statestack (a n in Figure 1) and the Datastack where the results are stored. Figure 1. Modules and states of HGS

Every time a recursive module is called it is put on the stack. Therefore the stack can grow out of bounds if a function has to many recursive calls. Pipelining like [5] could decrease the stacks by multi-threaded execution. Ferizis et al.[2] tried a different approach, no stacks in this case. This PhD study divides the recursive function in to a base part and a recursive part. The recursive part is then divided in a pre- and post recursive part. These are defined as the part before and after the recursive call is made. Due to dividing the algorithm in these parts, they are able to place recursive parts in a pipeline. One downside of this implementation is that this approach needs runtime configuration of the hardware, which is not possible on all types and takes precious time. The pipeline takes up a lot of space on the FPGA so the algorithm can t be larger than the FPGA itself. The results they achieved where good, a quick-sort algorithm of 8000 items took only 17% of the clock cycles it would take on a stack-based solution. Ninos et al.[7] used the previous proposals and suggests a data-oriented approach. This basically is an algorithm that converts recursion to iteration with a stack (the same as it is executed using software), called recursion simplification operation. This doesn t include parallel computing. The methods above are compared by Skliarova et al.[8]. The conclusion of this study was that there is not optimal solution to go from recursion to reconfigurable hardware. All solutions require that the designer produces the hardware compatible code, the solutions only provide the theory (with exception to the C to HDL compiler of Maruyama et al.[4]) Kegel [3] implemented the method of Sklyarov[9] using Haskell, as a start of adding recursion to the CλaSHcompiler. This still requires the designer to implement the method of Sklyarov[9] if recursion is used. Kegel suggests that the framework code should be generated by the compiler so the designer should not have to write the errorprone code. This research uses a simplified framework which could be extended to Kegel s framework as Future Work. 2.1 CLaSH Compiler The CλaSH compiler takes a subset of the Haskell language and converts this to basic VHDL. This VHDL code can be run on for example an FPGA. As the circuit descriptions, simulation code, and test input are also valid Haskell, complete simulations can be done by a Haskell compiler or interpreter, allowing high-speed simulation and analysis. The specifics are described in [1]. This is still a work in progress. One of the missing elements is the support for recursion. This research gives insight on how this could be added and a simple example is given. 3. THEORY Mapping a recursive function on hardware is not a simple task. There are several methods on how to implement this, the methods are discussed in the Related Work section. The most straight-forward method is the method of Sklyarov, this research uses a simplified version of his method. The method uses a stack to store the results in between of recursive calls. This framework can handle only simple recursion functions with 1 recursive call in the recursive case and 1 base case. It should give insight in how recursion can be handled and the possibility of more complex functions will be future work. This mimics the way recursion is solved in software. 3.1 Compiler The aim for this paper is to provide a method to implement recursion into the CλaSH compiler. The CλaSH compiler takes a CλaSH source file, runs it trough the Haskell compiler and compiles this to VHDL. Inside the compiler several steps are needed to make this possible. One of these steps should be to convert recursive functions to hardware compatible code. In this research this step of converting recursive functions to hardware compatible code is done in a pre-compiler. This pre-compiler gets an Abstract Syntax Tree (AST) of the recursive function and converts it to a non-recursive Haskell code using the framework which is explained below. 3.2 Standard Form Before a recursive function can be translated using the CλaSH compiler it has to be recognized. A simple recursive function can be described as below. Listing 1. The Standard Form of a simple recursive function 1 f n p n = a 2 otherwise = g n ( f ( h n ) ) 3 g = main f u n c t i o n 4 a = base case e x p r e s s i o n 5 p = base case c o n d i t i o n o f n 6 h = f u n c t i o n a p p l i e d on n The g,a,p,h parameters have to be determined in order to convert it to the framework explained below. The simple format is limited to only 1 recursive call and 1 initial parameter. This limits the standard form to a few functions but it explains the basics. An example function is the factorial function which is used below. In order to use this on hardware, the different parameters have to be extracted from the function and have to be used in the framework explained below. 3.3 The Simple Framework When the g,a,p,h parameters are determined they can be processed by the simple framework which will be explained here. Listing 2 shows the simple framework. It consists of a function ff which has 3 parameters. The parsed parameters (g,h,p,a) The state (n,v,s) n is the input argument v current value (during evaluation) s is the stack The input, can be anything, this is only important as clock-tick The result after 1 clock cycle output y new state (n1,v1,s1) The workings of the framework is explained below. An example using the factorial function is given later on. The function ff is called every clock cycle. When initiated, the first parameter is the tuple of parsed parameters, the second parameter is the state which has to be initiated, the third input is the clock input. The function ff will return a new state (s1) and the current output value(y), which

Listing 2. The stack based framework 1 f f ( g, h, p, a ) (n, v, s ) = ( ( n1, v1, s1 ), y ) 2 where 3 ( x, s0 ) = pop s0 4 ( n1, v1, s1 ) not ( p n ) = ( h n, a, push (n, s ) ) 5 not ( emptys s ) = ( n, g ( x, v ), s0 ) 6 otherwise = ( n, v, s ) 7 y = v1 Figure 2. Framework graphical representation after 4 clock-cycles is just a copy of the current value of the state (v1). The next call, which most likely will happen the next clockcycle, calls the same function ff but the state(s1) from the last call will be used as input for this call. The output state(s1) will change depending on the base case condition (p) or when there is nothing left to do (the stack is empty). The output state(s1) will be as follows: not (p n) The base condition(p) is not met, the input argument(n1) will be changed by function h, the current value (v1) is the base case expression (a) and the input value (n) is pushed onto the stack. not (emptys s) When the stack is not empty, the input argument(n1) does not change, the current value(v1) is the function g applied on parameters v and x where x is a value popped from the stack. otherwise The stack is empty and the base condition(p) is met so there is nothing to work with and the output state is copied from the input state. A graphical representation of the framework is given in Figure 2. 3.4 Simple Recursion Function (Factorial) Listing 3 shows the recursive definition of the Factorial function in the format of the standard form. This is required for the grammar to recognize it as a recursive function. For easier processing, the code will be converted using λ-abstractions. This is a simple 1 on 1 process. Listing 3. Factorial function 1 f a c n n==0 = 1 2 otherwise = n f a c ( n 1) 3 g : n f a c (n 1) 4 a : 1 5 p : (n==0) 6 h : (n 1) For the factorial function this is done and the result is in Listing 4. Listing 4. Factorial using λ-abstraction 1 f a c n 2 (\ x >x==0) n = 1 3 otherwise = ( \ ( x, y) >(x y ) ) ( a, b ) 4 where a = n 5 b = f a c ( ( \ x >x 1) n ) 6 g = \( x, y) >(x y ) 7 a = 1 8 p = \ x >(x==0) 9 h = \ x >(x 1)

The conversions are as follows: n == 0 The base condition, this is (λx x == 0)n. n fac(n 1) This is a multiplication applied on n and fac. This corresponds with the lambda abstraction λ(x, y) x y where x = n, y = fac(n 1). n 1 The next iteration of the input value, in lambda calculus this corresponds with λx x 1 The g,h,p and a parameters will be as stated on lines[6..9] in Listing 4. When used in a simulation of the framework this will look something like Figure 2. This shows the factorial(5!) function with the first 4 cycles. During these 4 cycles the base condition is not met, n > 0 so n is pushed onto the stack. The output value(y) is the base case value. The output value of 10 clock cycles is [1, 1, 1, 1, 1, 1, 2, 6, 24, 120], more clock cycles will just return 120 because the otherwise condition from the function ff is met. 4. IMPLEMENTATION In this research a pre-compiler is used on a handmade AST. This is not the best solution but demonstrates how the grammar and framework works. Ideally the programmer does not have to do anything special, like special annotations in the code which has to be done in the C to HDL Compiler[4], to use recursion in CλaSH. The precompiler will parse the parameters out of the AST and paste s them into the framework. The framework displayed in Listing 2 is in Haskell syntax. The framework uses lists which are not compatible with CλaSH. These lists have to be converted to a vector. This conversion to CλaSH has been tried but not with success this is because of lack of knowledge of CλaSH. The output of this precompiler will consist of the framework code and the parsed functions (g,p,h,a). This output file is a valid Haskell file which can be compiled by the Haskell compiler. As mentioned above, this is not CλaSH compatible. This is called a pre-compiler because it is run before the actual compilation. 4.1 Grammar The grammar for this simple recursive standard form is shown in Listing 5. The functions that can be parsed by this grammar consist of a base-case and a recursivecase. This base-case consists of a condition and a value if this condition holds. The recursive-case consist of an expression on the input argument and the return value of the recursive call. As mentioned before, the grammar only accepts expressions in the λ-format. 5. CONCLUSION A best method of implementing recursion has still to be determined, but the a straight forward way to make CλaSH compatible with recursion is the method provided by Sklyarov [9] because it mimics the method of how recursion works in software. This research shows that a simplified version of this method is possible to implement in an automated way. There are still steps to be taken to add it completely to the CλaSH compiler, these steps are mentioned in the future work section. 6. FUTURE WORK The process of adding recursion to the CλaSH compiler is not finished. What is missing will be mentioned here as future work. These missing features are: Extending the grammar to add support for all recursive functions, it currently only can handle simple recursive functions like factorial. Add it to the CλaSH compiler, the current process of adding recursion to CλaSH is to pre-compile a recursive function which then will be given to the CλaSH compiler, this should be incorporated in the said compiler. In this research a handwritten AST is used, the input should come from either the Haskell compiler or the CλaSH compiler. A simple framework is used in this research to explain the process, a framework made in Haskell is available[3] and should be ported to CλaSH this then can be used with the extended grammar (which is also future work). 7. ACKNOWLEDGMENTS Without the help of J. Kuper and C.P.R. Baaij this research wouldn t be possible. J. Kuper for guiding me and C.P.R. Baaij for all my CλaSH related questions. 8. REFERENCES [1] C. Baaij. Cλash: From haskell to hardware. PhD thesis, Universty of Twente, 2009. [2] G. Ferizis and H. El Gindy. Mapping recursive functions to reconfigurable hardware. In Field Programmable Logic and Applications, 2006. FPL 06. International Conference on, pages 1 6. IEEE, 2006. [3] R. Kegel. Implementing Recursive Algorithms within Hardware using Finite State Machines. In The 15th Twente Student Conference on IT, 2011. [4] T. Maruyama and T. Hoshino. A C to HDL compiler for pipeline processing on FPGAs. Engineering, pages 101 110, 2000. [5] T. Maruyama, M. Takagi, and T. Hoshino. Hardware Implementation Techniques for Recursive Calls and Loops. In Techniques, pages 450 455, 1999. [6] D. Mihhailov, V. Sklyarov, I. Skliarova, and A. Sudnitson. Parallel FPGA-Based Implementation of Recursive Sorting Algorithms. 2010 International Conference on Reconfigurable Computing and FPGAs, pages 121 126, Dec. 2010. [7] S. Ninos and A. Dollas. Modeling recursion data structures for FPGA-based implementation. Logic and Applications, 2008. FPL 2008., pages 11 16, 2008. [8] I. Skliarova. Recursion in reconfigurable computing: A survey of implementation approaches. Field Programmable Logic and, pages 224 229, 2009. [9] V. Sklyarov. FPGA-based implementation of recursive algorithms. Microprocessors and Microsystems, 28(5-6):197 211, Aug. 2004. [10] V. Sklyarov, I. Skliarova, and B. Pimentel. FPGA-based implementation and comparison of recursive and iterative algorithms. In Field Programmable Logic and Applications, 2005. International Conference on, pages 235 240. IEEE, 2005.

Listing 5. Appendix 1 data FuncDef = FuncDef I d e n t i f i e r Var BaseCase RecCase 2 data BaseCase = BaseCase App Condition Var Expr 3 data RecCase = RecCase Expr 4 5 data Expr = EVar Var 6 EInt Int 7 OpExpr Operator Expr Expr 8 AppExpr App FuncExpr Expr 9 EFunc FuncExpr Expr 10 App2Expr App FuncExpr ( Expr, Expr ) 11 12 13 type Operator = String 14 type App = String 15 type Lam = Char 16 17 data FuncExpr = F I d e n t i f i e r String 18 LamExpr Lam Var Expr 19 Lam2Expr Lam ( I d e n t i f i e r, I d e n t i f i e r ) Expr 20 21 type Var = I d e n t i f i e r 22 type Condition = FuncExpr 23 type I d e n t i f i e r = String