Compilation 2012
Domain-Specific Languages and Syntax Extensions

Jan Midtgaard
Michael I. Schwartzbach
Aarhus University

GPL Problem Solving

The General Purpose Language (GPL) approach:
  - analyze the problem domain
  - express the conceptual model as an OO/FP/... design
  - program a framework/library
  - express the concrete application as a framework/library client

Pros:
  - predictable and familiar result
  - (relatively) low cost of implementation

Cons:
  - difficult to fully exploit domain-specific knowledge
  - only available to general programmers

DSL Problem Solving

The DSL approach:
  - analyze the problem domain
  - express the conceptual model as a language design
  - implement a compiler or interpreter

Pros:
  - possible to exploit all domain-specific knowledge
  - also available to domain experts

Cons:
  - (relatively) high cost of implementation
  - risk of Babylonian confusion
  - lack of tool support (IDE, ...)
  - hard to combine DSLs, or a DSL and a GPL, this way

Variations of DSLs

A stand-alone DSL:
  - a novel language with unique syntax and features
  - example: LaTeX

An embedded DSL:
  - an existing GPL extended with DSL features
  - example: JSP

An external DSL:
  - a stand-alone DSL invoked from a GPL
  - example: SQL invoked from Java (JDBC)

From DSL to GPL

A stand-alone DSL may evolve into a GPL:
  - Fortran: Formula Translation
  - Algol: Algorithmic Language
  - Cobol: Common Business Oriented Language
  - Lisp: List Processing Language
  - Simula: Simulation Language
  - ML: Meta Language

A (successful) DSL design should plan for growth

Using Domain-Specific Knowledge

Domain-specific syntax:
  - domain-specific syntax clarifies the behavior
  - directly denote high-level concepts

Domain-specific analysis:
  - consider global properties of the application

Domain-specific optimization:
  - exploit domain-specific analysis results

GPL frameworks cannot provide these benefits

The Ocamlyacc/Menhir Languages

A stand-alone (or external) DSL:
  - no general-purpose computing is required

Domain concepts:
  - context-free grammars
  - tokens / terminals
  - non-terminals and productions

Implemented using:
  - a lexer + parser (hand-written or ocamllex/ocamlyacc)
  - a symbol checker + analysis
  - a parse table builder + emitter (menhir contains different table/code/coq backends)

DSL Syntax for Grammars

    start  : start PLUS term     { ... }
           | start MINUS term    { ... }
           | term                { ... }
           ;

    term   : term STAR factor    { ... }
           | term SLASH factor   { ... }
           | factor              { ... }
           ;

    factor : ID                  { ... }
           | LPAR start RPAR     { ... }
           ;

The BNF syntax closely matches the domain at hand

GPL Alternatives

Parsing can be done in a number of ways:
  - hand-written (lexer and) parser (more next week)
  - hand-written parser table
  - parser combinators

Drawbacks:
  - harder to write correctly
  - fixed implementation strategy

In contrast, (OCaml)yacc and menhir decouple the language description from the workings of the language parser

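As a rough illustration of the hand-written alternative, the following C++ sketch recognizes the expression grammar from the "DSL Syntax for Grammars" slide by recursive descent. It is not part of the original slides: the class name `Recognizer`, the helper `parses`, the single-character tokens (`+ - * / ( )`), and the treatment of ID as a letter followed by letters or digits are all assumptions made for this example. The left-recursive productions had to be rewritten as loops by hand, which is the kind of implementation detail that the grammar DSL hides.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written recognizer (hypothetical, for illustration only).
// Left recursion is rewritten as iteration:
//   start  : term   ((PLUS | MINUS) term)*
//   term   : factor ((STAR | SLASH) factor)*
//   factor : ID | LPAR start RPAR
class Recognizer {
public:
    explicit Recognizer(const std::string& input) : text(input), pos(0) {}

    // True if the whole input is a sentence of the grammar.
    bool accepts() {
        if (!start()) return false;
        skipSpaces();
        return pos == text.size();
    }

private:
    std::string text;
    std::size_t pos;

    bool atEnd() const { return pos >= text.size(); }
    unsigned char peek() const { return static_cast<unsigned char>(text[pos]); }

    void skipSpaces() {
        while (!atEnd() && std::isspace(peek())) ++pos;
    }

    // Consume the single-character token c if it comes next.
    bool eat(char c) {
        skipSpaces();
        if (!atEnd() && text[pos] == c) { ++pos; return true; }
        return false;
    }

    bool start() {
        if (!term()) return false;
        while (eat('+') || eat('-')) {
            if (!term()) return false;
        }
        return true;
    }

    bool term() {
        if (!factor()) return false;
        while (eat('*') || eat('/')) {
            if (!factor()) return false;
        }
        return true;
    }

    bool factor() {
        skipSpaces();
        if (!atEnd() && std::isalpha(peek())) {          // ID
            while (!atEnd() && std::isalnum(peek())) ++pos;
            return true;
        }
        if (eat('(')) return start() && eat(')');        // LPAR start RPAR
        return false;
    }
};

bool parses(const std::string& s) { return Recognizer(s).accepts(); }

void check_recognizer() {
    assert(parses("a + b * (c - d)"));
    assert(!parses("a + * b"));
}
```

Every property that the grammar tools check automatically (no undefined non-terminals, no unreachable productions, no conflicts) is here the programmer's own responsibility.
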
DSL Analysis for Grammars

Symbol checking:
  - checks non-terminal and terminal names
  - checks indexes ($1) for validity (bounds + data)

Menhir also type checks the productions (by type checking the action code)

Analyses the grammar for useless productions (reachability) and removes them

Checks the grammar for LALR/LR(1) conformance

These are checked by phases in the ocamlyacc/menhir compiler

GPL Analysis Alternative

Lots of yellow Post-it notes:

These cannot (all) be checked by a GPL compiler, e.g., OCaml or Java.

The JWIG Language

An embedded DSL (in Java):
  - lots of general-purpose computing is required

Domain concepts:
  - XML templates
  - Web services
  - sessions

Implemented using:
  - a syntax extension
  - a static analysis
  - a framework

DSL Syntax for JWIG

    public class test extends Service {
      String userid;

      public class Login extends Session {
        XML wrap = [[<html>
                       <body bgcolor="yellow">
                         <[contents]>
                       </body>
                     </html>]];

        public void main() {
          XML login = [[<form>
                          Userid: <input type="text" name="userid">
                          <input type="submit"/>
                        </form>]];
          show wrap<[contents = login];
          userid = receive userid;
          show wrap<[contents = "Welcome "+userid];
        }
      }
    }

GPL Syntax Alternative

    XML login = XML.make("<form>\nUserid: <input type=\"text\" name=\"userid\">\n<input type=\"submit\"/>\n</form>");
    show(wrap.plug("contents",login));
    userid = receive("userid");

  - The DSL syntax maps directly to method calls in an underlying Java framework
  - Avoiding escapes makes the syntax more legible
  - But this is just a thin layer of syntactic sugar

DSL Analysis for JWIG

A static analysis that at compile time guarantees:
  - only well-formed and valid XML is ever generated
  - only existing form fields are ever received
  - only existing gaps are ever plugged

This is a DSL analysis that is performed on the resulting compiled class files

JWIG Implementation Model

    JWIG syntax -> jwigc -> Java syntax -> javac -> .class files -> jwiga -> analysis results

(The diagram also includes the JWIG framework.)

Syntax Extensions

Programmers may want to extend the syntax of their programming language:
  - introduce domain-specific syntax
  - abbreviate common idioms
  - define language extensions
  - ensure consistency

Such extensions are introduced through macros

Macros

  - Macros are as old as programming
  - They are used as an orthogonal abstraction mechanism
  - Two different flavors:
      - lexical macros
      - syntactic macros

    Main Entry: 2 macro
    Pronunciation: 'ma-(")kro
    Function: noun
    Inflected Form(s): plural macros
    Etymology: short for macroinstruction
    Date: 1959
    a single computer instruction that stands for a sequence of operations

Lexical Macros

  - Operate on sequences of tokens
  - Are handled by a preprocessor
  - Are independent of the host language syntax

Examples:
  - CPP
  - TeX

CPP - The C Preprocessor

  - Integrated into C compilers
  - Also works as a stand-alone expander
  - Intercepts directives such as:
      #define
      #undef
      #ifdef
      #if
      #include

Lexical Macro Example

CPP macro to square a number:

    #define square(x) x * x

    square(z + 1)   =>   z + 1 * z + 1

which the C parser reads as z + (1 * z) + 1.

Adding parentheses as a hack:

    #define square(x) (x) * (x)

    square(z + 1)   =>   (z + 1) * (z + 1)

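The precedence problem can be reproduced with a small C++ sketch. The macro and function names below are made up for this example, and `SQUARE_FULL` is an additional variant that is not on the slide: it also parenthesizes the whole body, since parenthesizing only the arguments still misbehaves when the invocation sits next to an operator such as `/`.

```cpp
#include <cassert>

// Unparenthesized body, as first shown on the slide.
#define SQUARE_NAIVE(x) x * x
// Parenthesized arguments, the slide's hack.
#define SQUARE_ARGS(x) (x) * (x)
// Assumed extra step (not on the slide): parenthesize the whole body as well.
#define SQUARE_FULL(x) ((x) * (x))

int naive(int z)      { return SQUARE_NAIVE(z + 1); }   // z + 1 * z + 1
int args(int z)       { return SQUARE_ARGS(z + 1); }    // (z + 1) * (z + 1)
int ratio_args(int z) { return 100 / SQUARE_ARGS(z); }  // 100 / (z) * (z)
int ratio_full(int z) { return 100 / SQUARE_FULL(z); }  // 100 / ((z) * (z))

void check_square() {
    assert(naive(3) == 7);         // 3 + (1 * 3) + 1
    assert(args(3) == 16);         // 4 * 4
    assert(ratio_args(5) == 100);  // (100 / 5) * 5
    assert(ratio_full(5) == 4);    // 100 / 25
}
```
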
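Parsing Problem Hack

    #define swap(x,y) { int t=x; x=y; y=t; }

    if (a > b) swap(a,b); else b=0;

    *** test.c:3: parse error before 'else'

The block followed by ';' ends the if statement, so the else has nothing to attach to. Wrapping the body in do ... while (0) makes the invocation a single statement:

    #define swap(x,y) do { int t=x; x=y; y=t; } while (0)

    if (a > b) swap(a,b); else b=0;
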
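A minimal C++ sketch of the `do { ... } while (0)` idiom follows. The function name `order_or_reset` is invented for the example, and the macro is written in upper case with parenthesized arguments, which is a common convention rather than something the slide prescribes.

```cpp
#include <cassert>

// Statement-like macro: together with the trailing ';' at the call site,
// the do { ... } while (0) wrapper forms exactly one statement.
#define SWAP(x, y) do { int t = (x); (x) = (y); (y) = t; } while (0)

// Mirrors the slide: swap when a > b, otherwise set b to 0.
void order_or_reset(int& a, int& b) {
    if (a > b)
        SWAP(a, b);   // with a bare { ... } body, this ';' would orphan the else
    else
        b = 0;
}

void check_swap() {
    int a = 5, b = 3;
    order_or_reset(a, b);
    assert(a == 3 && b == 5);   // swapped
}
```
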
Expansion Time

    #define A 87
    #define B A
    #undef A
    #define A 42

    B   =>   ???

Eager expansion (definition time):   B => 87
Lazy expansion (invocation time):    B => A => 42

CPP is lazy

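The lazy behavior can be checked directly with a compile-time assertion. This is a small sketch; the constant `b_value` and the final `#undef` lines are additions for the example.

```cpp
#include <cassert>

#define A 87
#define B A        // the body of B is stored as the token A, not as 87
#undef A
#define A 42

// B is expanded where it is used, so it sees the current definition of A.
static_assert(B == 42, "CPP expands B lazily, at invocation time");

const int b_value = B;

void check_lazy() {
    assert(b_value == 42);
}

// Remove the short macro names again so that later code is unaffected.
#undef A
#undef B
```
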
Expansion Order

    #define id(x) x
    #define one(x) id(x)
    #define two a,b

    one(two)   =>   ???

Inner ("call-by-value"):   one(two) => one(a,b) => *** arity error 'one'
Outer ("call-by-name"):    one(two) => id(two) => two => a,b

Expansion Order in CPP

CPP uses a pragmatic "argument prescan":

    one(two)   =>   id(a,b)   =>   *** arity error 'id'

Useful for composing macros:

    #define succ(x) ((x)+1)
    #define call7(x) x(7)

    call7(succ)   =>   succ(7)   =>   ((7)+1)

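The composition example compiles as written. The sketch below adds only a compile-time check and a constant named `eight`, both of which are assumptions for the example.

```cpp
#include <cassert>

#define succ(x) ((x) + 1)
#define call7(x) x(7)

// The argument succ is not followed by '(' during the prescan, so it is
// passed through unexpanded; the rescan of succ(7) then expands it.
static_assert(call7(succ) == 8, "call7(succ) => succ(7) => ((7) + 1)");

const int eight = call7(succ);

void check_compose() {
    assert(eight == 8);
}
```
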
Recursive Expansion

    #define x 1+x

    x   =>   ???

Definition time:   *** recursive definition
Invocation time:   x => 1+x => 1+1+x => 1+1+1+x => ...

Recursive Expansion in CPP

CPP uses a pragmatic "intercept-and-ignore":

    int x = 2;
    #define x 1+x

    x   =>   1+x

  - Maintain a stack of macro invocations
  - Ignore invocations of macros already on the stack

At runtime the value of x is 3

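A short C++ sketch of the self-referential macro follows. The function name `self_reference` and the `#undef` are additions for the example; the `#undef` merely confines the macro to this function.

```cpp
#include <cassert>

int self_reference() {
    int x = 2;
#define x 1 + x
    int result = x;   // expands once to: 1 + x (the inner x is left alone)
#undef x
    return result;
}

void check_self_reference() {
    assert(self_reference() == 3);
}
```
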
TeX Macros

    \def \vector #1[#2..#3]{
      $({#1}_{#2},\ldots,{#1}_{#3})$
    }

    \vector \phi[0..n-1]   =>   $({\phi}_{0},\ldots,{\phi}_{n-1})$

  - Flexible invocation syntax
  - Parsing ambiguities (chooses shortest invocation)
  - Expansion is lazy and outer
  - Recursion is permitted (conditions allowed)

Syntactic Macros

  - Operate on sequences of ASTs
  - Are handled by the parser
  - Are integrated with the host language syntax

Examples:
  - C++ templates
  - Jakarta Tool Suite

C++ Templates

  - Integrated into C++ compilers
  - Intended as a genericity mechanism
  - But often used as a macro language

Macros accept ASTs for:
  - identifiers
  - constants
  - types

The result is always an AST for a declaration

Syntactic Macro Example

    template <class T>
    T GetMax(T x, T y) {
      return (x>y?x:y);
    }

    int i, j, max;
    max = GetMax<int>(i,j);

  - Template bodies are parsed at definition time (unlike CPP macros)
  - Templates are syntactically expanded
  - Heavy use of templates yields bloated code (unlike Java generics, which are not macros)

Metaprogramming

C++ templates:
  - perform compile time constant folding of arguments
  - allow multiple template definitions and pattern matching

This combination enables metaprogramming:
  - Turing-complete computations during compilation

Template libraries exist for:
  - booleans
  - control structures
  - functions
  - variables
  - data structures

Metaprogramming Example

    template <int X, int Y>
    struct pow {
      static const int n = X*pow<X,Y-1>::n;
    };

    template <int X>
    struct pow<X,0> {
      static const int n = 1;
    };

    const int z = pow<5,3>::n;

The value 125 is assigned to z at compile time

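The compile-time evaluation can be made explicit with `static_assert`, which would reject the program if the compiler had not already computed the value. In this sketch the template is renamed to `Power`, an assumption made only to avoid clashing with the standard library function `pow` when headers are included.

```cpp
#include <cassert>

// Recursive case: X^Y = X * X^(Y-1).
template <int X, int Y>
struct Power {
    static const int n = X * Power<X, Y - 1>::n;
};

// Base case, selected by pattern matching on Y == 0.
template <int X>
struct Power<X, 0> {
    static const int n = 1;
};

// Evaluated entirely by the compiler.
static_assert(Power<5, 3>::n == 125, "5^3 is folded at compile time");

const int z = Power<5, 3>::n;

void check_power() {
    assert(z == 125);
}
```
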
Metaprogramming for Specialization

    template <int I>
    inline float dot(float *a, float *b) {
      return dot<I-1>(a,b) + a[I]*b[I];
    }

    template <>
    inline float dot<0>(float *a, float *b) {
      return a[0]*b[0];
    }

    float x[3], y[3];
    float z = dot<2>(x,y);

which expands to:

    float z = x[0]*y[0] + x[1]*y[1] + x[2]*y[2];

The overhead of control structures is removed

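A runnable version of the unrolled dot product is sketched below. The `const` qualifiers and the function `dot3` are additions for the example. Whether the calls are actually inlined into straight-line code is up to the compiler; the template recursion only guarantees that no loop appears in the source.

```cpp
#include <cassert>

// Recursive case: dot over indices 0..I.
template <int I>
inline float dot(const float* a, const float* b) {
    return dot<I - 1>(a, b) + a[I] * b[I];
}

// Base case: index 0 only.
template <>
inline float dot<0>(const float* a, const float* b) {
    return a[0] * b[0];
}

// Dot product of two 3-element vectors, with the recursion resolved
// during template instantiation.
float dot3(const float* a, const float* b) {
    return dot<2>(a, b);
}

void check_dot() {
    const float x[3] = {1.0f, 2.0f, 3.0f};
    const float y[3] = {4.0f, 5.0f, 6.0f};
    assert(dot3(x, y) == 32.0f);   // 4 + 10 + 18
}
```
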
Jakarta Tool Suite

JTS extends Java with simple syntactic macros

Macros accept ASTs for:
  - AST_QualifiedName
  - AST_Exp
  - AST_Stm
  - AST_FieldDecl
  - AST_Class
  - AST_TypeName

The result is an AST specified as:
  - exp{ ... }exp
  - stm{ ... }stm
  - mth{ ... }mth
  - cls{ ... }cls

Hygienic Macros

    macro swap(AST_QualifiedName x, AST_QualifiedName y)
      local temp
      stm{ int temp = x; x = y; y = temp; }stm

    int temp = 42;
    int tump = 87;
    #swap(temp,tump);

Potential name clash problem:

    int temp = temp; temp = tump; tump = temp;

But local names are renamed uniquely:

    int temp143 = temp; temp = tump; tump = temp143;

Hygienic macros are available in Scheme and in various macro extensions of Java such as JSE, ...

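CPP offers no hygiene, but the renaming idea can be imitated by hand. The sketch below is an emulation in C++, not JTS: the helper macros `CONCAT_INNER`, `CONCAT`, and `SWAP_IMPL` and the prefix `swap_tmp_` are invented for this example. It derives the temporary's name from `__LINE__`, so a caller variable called `temp` is no longer captured. Unlike true hygiene, this only makes clashes unlikely: a caller variable that happens to be named `swap_tmp_` followed by the line number would still collide.

```cpp
#include <cassert>

// Two-level concatenation so that __LINE__ is expanded before pasting.
#define CONCAT_INNER(a, b) a##b
#define CONCAT(a, b) CONCAT_INNER(a, b)

// The temporary's name is passed in as a parameter instead of being fixed.
#define SWAP_IMPL(x, y, tmp) do { int tmp = (x); (x) = (y); (y) = tmp; } while (0)

// Each invocation gets a temporary such as swap_tmp_17 (line-dependent).
#define SWAP(x, y) SWAP_IMPL(x, y, CONCAT(swap_tmp_, __LINE__))

void check_hygiene() {
    int temp = 42;   // same name as the slide's macro-local variable
    int tump = 87;
    SWAP(temp, tump);
    assert(temp == 87);
    assert(tump == 42);
}
```
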