Tail call elimination. Michel Schinz

Similar documents
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Lecture Data Types and Types of a Language

Lecture 11: Tail Recursion; Continuations

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

COMP 356 Programming Language Structures Notes for Chapter 10 of Concepts of Programming Languages Implementing Subprograms.

Introduction. dnotify

CSC230 Getting Starting in C. Tyler Bletsch

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Efficient Compilation of Tail Calls and Continuations to JavaScript

Object Oriented Software Design II

How To Understand How A Process Works In Unix (Shell) (Shell Shell) (Program) (Unix) (For A Non-Program) And (Shell).Orgode) (Powerpoint) (Permanent) (Processes

Mutual Exclusion using Monitors

Channel Access Client Programming. Andrew Johnson Computer Scientist, AES-SSG

Surface and Volumetric Data Rendering and Visualisation

Chapter 5 Names, Bindings, Type Checking, and Scopes

Moving from CS 61A Scheme to CS 61B Java

Specific Simple Network Management Tools

Crash Course in Java

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Introduction. What is an Operating System?

Recursion vs. Iteration Eliminating Recursion

CORBA Programming with TAOX11. The C++11 CORBA Implementation

Output: struct treenode{ int data; struct treenode *left, *right; } struct treenode *tree_ptr;

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

Leak Check Version 2.1 for Linux TM

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

Software Vulnerabilities

Stack Overflows. Mitchell Adair

Operating Systems. Privileged Instructions

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

CS 33. Multithreaded Programming V. CS33 Intro to Computer Systems XXXVII 1 Copyright 2015 Thomas W. Doeppner. All rights reserved.

Tutorial on C Language Programming

C Compiler Targeting the Java Virtual Machine

Memory management. Announcements. Safe user input. Function pointers. Uses of function pointers. Function pointer example

CSI 402 Lecture 13 (Unix Process Related System Calls) 13 1 / 17

Data Structures and Algorithms C++ Implementation

How To Program In Scheme (Prolog)

Java's garbage-collected heap

Parallelization: Binary Tree Traversal

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Operating System Engineering: Fall 2005

Linux Process Scheduling Policy

The programming language C. sws1 1

The Power of Ten Rules for Developing Safety Critical Code 1. Gerard J. Holzmann NASA/JPL Laboratory for Reliable Software Pasadena, CA 91109

Habanero Extreme Scale Software Research Project

Jorix kernel: real-time scheduling

The C Programming Language course syllabus associate level

Informatica e Sistemi in Tempo Reale

/* File: blkcopy.c. size_t n

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

How I Learned to Stop Fuzzing and Find More Bugs

Brent A. Perdue. July 15, 2009

P1 P2 P3. Home (p) 1. Diff (p) 2. Invalidation (p) 3. Page Request (p) 4. Page Response (p)

Return-oriented programming without returns

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

Chapter 15 Functional Programming Languages

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

University of Twente. A simulation of the Java Virtual Machine using graph grammars

Process definition Concurrency Process status Process attributes PROCESES 1.3

Threads Scheduling on Linux Operating Systems

Storage Classes CS 110B - Rule Storage Classes Page 18-1 \handouts\storclas

Coding Rules. Encoding the type of a function into the name (so-called Hungarian notation) is forbidden - it only confuses the programmer.

Computer Systems II. Unix system calls. fork( ) wait( ) exit( ) How To Create New Processes? Creating and Executing Processes

Keil C51 Cross Compiler

Chapter 7D The Java Virtual Machine

Object Oriented Software Design II

Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)

Programming Language Features (cont.) CMSC 330: Organization of Programming Languages. Parameter Passing in OCaml. Call-by-Value

Introduction. Reading. Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF. CMSC 818Z - S99 (lect 5)

Organization of Programming Languages CS320/520N. Lecture 05. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Project 2: Bejeweled

C Interview Questions

Parallel Computing. Shared memory parallel programming with OpenMP

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Monitors, Java, Threads and Processes

Revealing the performance aspects in your code. Intel VTune Amplifier XE Generics. Rev.: Sep 1, 2013

Jonathan Worthington Scarborough Linux User Group

Debugging and Bug Tracking. Slides provided by Prof. Andreas Zeller, Universität des Saarlands

Chapter 5. Recursion. Data Structures and Algorithms in Java

Virtual Servers. Virtual machines. Virtualization. Design of IBM s VM. Virtual machine systems can give everyone the OS (and hardware) that they want.

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

To Java SE 8, and Beyond (Plan B)

7th Marathon of Parallel Programming WSCAD-SSC/SBAC-PAD-2012

Libmonitor: A Tool for First-Party Monitoring

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

Real Time Programming: Concepts

MPLAB TM C30 Managed PSV Pointers. Beta support included with MPLAB C30 V3.00

A Typing System for an Optimizing Multiple-Backend Tcl Compiler

Virtuozzo Virtualization SDK

Translating C code to MIPS

Data Structure with C

MeshBee Open Source ZigBee RF Module CookBook

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Transcription:

Tail call elimination Michel Schinz

Tail calls and their elimination

Loops in functional languages Several functional programming languages do not have an explicit looping statement. Instead, programmers resort to recursion to loop. For example, the central loop of a Web server written in Scheme might look like this: (define web-server-loop (lambda () (wait-for-connection) (fork handle-connection) (web-server-loop))) 3

The problem Unfortunately, recursion is not equivalent to the looping statements usually found in imperative languages: recursive function calls, like all calls, consume stack space while loops do not... In our example, this means that the Web sever will eventually crash because of a stack overflow this is clearly unacceptable! A solution to this problem must be found... 4

The solution In our example, it is obvious that the recursive call to webserver-loop could be replaced by a jump to the beginning of the function. If the compiler could detect this case and replace the call by a jump, our problem would be solved! This is the idea behind tail call elimination. 5

Tail calls The reason why the recursive call of web-server-loop could be replaced by a jump is that it is the last action taken by the function : (define web-server-loop (lambda () (wait-for-connection) (fork handle-connection) (web-server-loop))) Calls in terminal position like this one are called tail calls. This particular tail call also happens to target the function in which it is defined. It is therefore said to be a recursive tail call. 6

Tail calls examples In the functions below, all calls are underlined. Which ones are tail calls? (define map (lambda (f l) (if (null? l) l (cons (f (car l)) tail call (map f (cdr l)))))) (define fold (lambda (f z l) (if (null? l) z (fold f (f z (car l)) (cdr l))))) recursive tail call 7

Tail call elimination When a function performs a tail call, its own activation frame is dead, as by definition nothing follows the tail call. Therefore, it is possible to first free the activation frame of a function about to perform such a call, then load the parameters for the call, and finally jump to the function s code. This technique is called tail call elimination (or optimisation), abbreviated TCE. 8

TCE example Consider the following function definition and call: (define sum (lambda (z l) (if (null? l) z (sum (+ z (car l)) (cdr l))))) (sum 0 (list3 1 2 3)) How does the stack evolve, with and without tail call elimination? 9

TCE example Without tail call elimination, each recursive call to sum makes the stack grow, to accommodate activation frames. 0 0 0 0 (1 2 3) (1 2 3) (1 2 3) (1 2 3) 1 1 1 (2 3) (2 3) (2 3) 3 3 (3) (3) time 6 () 10

TCE example With tail call elimination, the dead activation frames are freed before the tail call, resulting in a stack of constant size. 0 1 3 6 (1 2 3) (2 3) (3) () time 11

Tail call optimisation? Tail call elimination is more than just an optimisation! Without it, writing a program that loops endlessly using recursion and does not produce a stack overflow is simply impossible. For that reason, full tail call elimination is actually required in some languages, e.g. Scheme. In other languages, like C, it is simply an optimisation performed by some compilers in some cases. 12

Tail call elimination for minischeme

TCE example Without TCE succ: ;... LINT R27 add LINT R1 1 LOAD R2 R30 8 LINT R29 ret JMPZ R27 R0 ret: LOAD R29 R30 4 LOAD R30 R30 0 JMPZ R29 R0 (define succ (lambda (x) (add 1 x))) (define add...) unlink activation frame With TCE succ: ;... LINT R27 add LINT R1 1 LOAD R2 R30 8 LOAD R29 R30 4 LOAD R30 R30 0 JMPZ R27 R0 14

Implementing TCE Tail call elimination is implemented by: 1. identifying tail calls in the program, 2. compiling those tail calls specially, by deallocating the activation frame of the caller before jumping to the called function. We already know how to compile tail calls, but we did not explain yet how to identify them. 15

Identifying tail calls To identify tail calls, we first assume that all calls are marked with a unique number. We then define a function T that returns the marks corresponding to the tail calls. For example, given the following expression: (lambda (x) (if 1 (even? x) 2 (g 3 (h x)) 4 (h 5 (g x)))) T produces the set { 2,4 }. 16

Identifying tail calls T[(lambda (args) body 1 body n )] = T [body n ] where the auxiliary function T is defined as follows: T [(let (defs) body 1 body n )] = T [body n ] T [(if e 1 e 2 e 3 )] = T [e 2 ] T [e 3 ] T [ m (e 1 e 2 e n )] when e 1 is not a primitive = { m } T [anything else] = 17

Tail call elimination in uncooperative environments

TCE in various environments When generating assembly language, it is easy to perform TCE, as the target language is sufficiently low-level to express the deallocation of the activation frame and the following jump. When targeting higher-level languages, like C or the JVM, this becomes difficult although recent VMs like.net s support tail calls. We explore several techniques that have been developed to perform TCE in such contexts. 19

Benchmark program To illustrate how the various techniques work, we will use a benchmark program in C that tests whether a number is even, using two mutually tail-recursive functions. When no technique is used to manually eliminate tail calls, it looks as follows. And unless the C compiler performs tail call elimination like gcc does with full optimisation it crashes with a stack overflow at run time. int even(int x) { return x == 0? 1 : odd(x - 1); } int odd(int x) { return x == 0? 0 : even(x - 1); } int main(int argc, char* argv[]) { printf("%d\n", even(300000000)); } 20

Single function approach The single function approach consists in compiling the whole program to a single function of the target language. This makes it possible to compile tail calls to simple jumps within that function, and other calls to recursive calls to it. This technique is rarely applicable in practice, due to limitations in the size of functions of the target language. 21

Single function approach in C typedef enum { fun_even, fun_odd } fun_id; int wholeprog(fun_id fun, int x) { start: switch (fun) { case fun_even: if (x == 0) return 1; fun = fun_odd; x = x - 1; goto start; case fun_odd: if (x == 0) return 0; fun = fun_even; x = x - 1; goto start; } } int main(int argc, char* argv[]) { printf("%d\n", wholeprog(fun_even, 300000000)); } 22

Trampolines With trampolines, functions never perform tail calls directly. Rather, they return a special value to their caller, informing it that a tail call should be performed. The caller performs the call itself. For this scheme to work, it is necessary to check the return value of all functions, to see whether a tail call must be performed. The code which performs this check is called a trampoline. 23

Trampolines in C typedef void* (*fun_ptr)(int); struct { fun_ptr fun; int arg; } resume; void* even(int x) { if (x == 0) return (void*)1; resume.fun = odd; resume.arg = x - 1; return &resume; } void* odd(int x) { if (x == 0) return (void*)0; resume.fun = even; resume.arg = x - 1; return &resume; } int main(int argc, char* argv[]) { void* res = even(300000000); while (res == &resume) res = (resume.fun)(resume.arg); printf("%d\n",(int)res); } 24

Extended trampolines Extended trampolines trade some of the space savings of standard trampolines for speed. Instead of returning to the trampoline on every tail call, the number of successive tail calls is counted at run time, using a tail call counter (tcc) passed to every function. When that number reaches a predefined limit l, a non-local return is performed to transfer control to a trampoline waiting at the bottom of the chain, thereby reclaiming l activation frames in one go. 25

C s setjmp / longjmp Extended trampolines are more efficient when a non-local return is used to free dead stack frames. In C, non-local returns can be performed using the standard functions setjmp and longjmp, which can be seen as a form of goto that works across functions: setjmp(b) saves its calling environment in b, and returns 0, longjmp(b,v) restores the environment stored in b, and proceeds like if the call to setjmp had returned v instead of 0. In the following slides, we use _setjmp and _longjmp, which do not save and restore the signal mask and are therefore much more efficient. 26

Extended trampolines in C typedef int (*fun_ptr)(int, int); struct { fun_ptr fun; int arg; } resume; jmp_buf jmp_env; int even(int tcc, int x) { if (tcc > TC_LIMIT) { resume.fun = even; resume.arg = x; _longjmp(jmp_env, -1); } return (x == 0)? 1 : odd(tcc + 1, x - 1); } int odd(int tcc, int x) { /* similar to even */ } int main(int argc, char* argv[]) { int res = (_setjmp(jmp_env) == 0)? even(0, 300000000) : (resume.fun)(0, resume.arg); printf("%d\n",res); } 27

Baker s technique Baker s technique consists in first transforming the whole program to continuation-passing style. One important property of CPS is that all calls are tail calls. Consequently, it is possible to periodically shrink the whole stack using a non-local return. 28

Baker s technique in C typedef void (*cont)(int); typedef void (*fun_ptr)(int, cont); int tcc = 0; struct { fun_ptr fun; int arg; cont k; } resume; jmp_buf jmp_env; void even_cps(int x, cont k) { if (++tcc > TC_LIMIT) { tcc = 0; resume.fun = even_cps; resume.arg = x; resume.k = k; _longjmp(jmp_env, -1); } if (x == 0) (*k)(1); else odd_cps(x - 1, k); } void odd_cps(int x, cont k) { /* similar to even_cps */ } int main(int argc, char* argv[]) { if (_setjmp(jmp_env) == 0) even_cps(300000000, main_1); else (resume.fun)(resume.arg, resume.k); } void main_1(int val) { printf("%d\n", val); exit(0); } 29

Techniques summary None Single function Trampolines Extended trampolines Baker's non-tail call tail call normal return trampoline return non-local return 30

Benchmark 1 To get an idea of the relative performance of the techniques just presented, we first compiled all benchmarks without optimisation, using gcc 4.01 on a PowerPC G4. The version of the program that doesn t perform tail call elimination results in a stack overflow, and is therefore not included below. The normalised running times of the other versions are presented below. single function 1.0 trampolines 3.0 extended trampolines Baker s technique 1.4 1.6 0 1.75 3.50 31

Benchmark 2 As a second experiment, we compiled all programs with full optimisation. This enables the first version of the program to complete successfully, as gcc 4.01 performs at least some amount of tail call elimination. The normalised running times are presented below. initial version single function 1.0 1.6 trampolines 23.4 extended trampolines Baker s technique 3.6 4.0 0 12.5 25.0 32

Summary Tail call elimination consists in compiling tail calls specially, so that the activation frame of the caller is freed before the called function is invoked. This technique reduces memory usage and makes it possible to write loops using recursion without overflowing the stack. Tail call elimination can be hard to implement efficiently when the target platform is uncooperative. 33