ANTLR - Introduction. Overview of Lecture 4. ANTLR Introduction (1) ANTLR Introduction (2) Introduction



Similar documents
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Antlr ANother TutoRiaL

Compiler Construction

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Programming Languages CIS 443

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Textual Modeling Languages

Compiler Construction

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Moving from CS 61A Scheme to CS 61B Java

CSCI 3136 Principles of Programming Languages

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

Syntaktická analýza. Ján Šturc. Zima 208

Compiler I: Syntax Analysis Human Thought

Programming Project 1: Lexical Analyzer (Scanner)

Design Patterns in Parsing

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

03 - Lexical Analysis

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

Context free grammars and predictive parsing

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Compilers Lexical Analysis

Semantic Analysis: Types and Type Checking

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

7. Building Compilers with Coco/R. 7.1 Overview 7.2 Scanner Specification 7.3 Parser Specification 7.4 Error Handling 7.5 LL(1) Conflicts 7.

Yacc: Yet Another Compiler-Compiler

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Static vs. Dynamic. Lecture 10: Static Semantics Overview 1. Typical Semantic Errors: Java, C++ Typical Tasks of the Semantic Analyzer

Introduction to Python

Theory of Compilation

J a v a Quiz (Unit 3, Test 0 Practice)

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

A Grammar for the C- Programming Language (Version S16) March 12, 2016

If-Then-Else Problem (a motivating example for LR grammars)

Lecture 9. Semantic Analysis Scoping and Symbol Table

Pemrograman Dasar. Basic Elements Of Java

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Evaluation of JFlex Scanner Generator Using Form Fields Validity Checking

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Chapter 2: Elements of Java

Compiler Compiler Tutorial

JavaScript: Control Statements I

The previous chapter provided a definition of the semantics of a programming

Masters programmes in Computer Science and Information Systems. Object-Oriented Design and Programming. Sample module entry test xxth December 2013

C Compiler Targeting the Java Virtual Machine

Introduction to Java

Java Interview Questions and Answers

Translating to Java. Translation. Input. Many Level Translations. read, get, input, ask, request. Requirements Design Algorithm Java Machine Language

C++ INTERVIEW QUESTIONS

SML/NJ Language Processing Tools: User Guide. Aaron Turon John Reppy

Comp151. Definitions & Declarations

Install Java Development Kit (JDK) 1.8

CA Compiler Construction

Chapter 2 Basics of Scanning and Conventional Programming in Java

csce4313 Programming Languages Scanner (pass/fail)

Chapter One Introduction to Programming

CS 106 Introduction to Computer Science I

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

PL / SQL Basics. Chapter 3

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

A Programming Language Where the Syntax and Semantics Are Mutable at Runtime

JDK 1.5 Updates for Introduction to Java Programming with SUN ONE Studio 4

Introduction to Lex. General Description Input file Output file How matching is done Regular expressions Local names Using Lex

Chapter 3. Input and output. 3.1 The System class

Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking

PROBLEM SOLVING SEVENTH EDITION WALTER SAVITCH UNIVERSITY OF CALIFORNIA, SAN DIEGO CONTRIBUTOR KENRICK MOCK UNIVERSITY OF ALASKA, ANCHORAGE PEARSON

A Lex Tutorial. Victor Eijkhout. July Introduction. 2 Structure of a lex file

Compiler Construction

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Java Basics: Data Types, Variables, and Loops

AP Computer Science Java Subset

Intel Assembler. Project administration. Non-standard project. Project administration: Repository

JavaScript: Introduction to Scripting Pearson Education, Inc. All rights reserved.

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters

Computer Programming C++ Classes and Objects 15 th Lecture

Selection Statements

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

JAVA - QUICK GUIDE. Java SE is freely available from the link Download Java. So you download a version based on your operating system.

File class in Java. Scanner reminder. Files 10/19/2012. File Input and Output (Savitch, Chapter 10)

I. INTRODUCTION. International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 2, Mar-Apr 2015

6.170 Tutorial 3 - Ruby Basics

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Lecture 5: Java Fundamentals III

Sample CSE8A midterm Multiple Choice (circle one)

Statements and Control Flow

The Cool Reference Manual

Transcription:

Overview of Lecture 4 www.antlr.org (ANother Tool for Language Recognition) Introduction VB HC4 Vertalerbouw HC4 http://fmt.cs.utwente.nl/courses/vertalerbouw/! 3.x by Example! Calc a simple calculator language Some grammar patterns Theo Ruys University of Twente Department of Computer Science Formal Methods & Tools Michael Weber! kamer: ZI 5037" telefoon: 3716" email: michaelw@cs.utwente.nl! 2 Introduction (1) www.antlr.org Introduction (2) www.antlr.org! input: language descriptions using EBNF grammar! output: recognizer for the language can build recognizers for three kinds of input:! character streams (i.e. by generating a scanner)! token streams (i.e. by generating a parser)! node streams (i.e. by generating a tree walker) uses the same syntax for all its recognizer descriptions. 3.x! LL(*) compiler generator Generated code is well-structured and readable. Parse-structure follows W&B s recursive descent approach.! generates recognizers in Java, C++, C#, Python, etc. generates (predictive) LL(k) or LL(*) recognizers! computes first, follow and lookahead sets! verifies syntax conditions (e.g. LL(k) test)! An generated scanner/lexer is a predictive recursive-descent recognizer and not a finite automaton. Other well-known compiler generators! scanner: lex/flex, JFlex! parser: yacc/bison, JCup, javacc, sablecc, SLADE Terminology! lexer = scanner, lexical analyser, tokenizer! parser = syntactical analyser! tree parser = tree walker 3 4

Introduction (3) www.antlr.org 3 - changes wrt 2.x Material on :! See s website. http://www.antlr.org/wiki/display/3/faq+-+getting+started! There is a wealth of information on. Unfortunately, the documentation is not very well structured and might be overwhelming for beginners. Spend some time browsing the documentation to get an overview of what is available.! Yahoo group: antlr-interest (also as mailing-list) Active community: quite some traffic! Book: Terence Parr. The Definitive Reference. Pragmatic Bookshelf, 2007. Hard copy: $36.95 PDF: $24.00 Works: integrated grammar and compiler environment. A new very powerful extension to LL(k) called LL(*). Tree building simplified by supporting rewrite rules. Truly retargetable code generator that makes it easy to build backends (e.g. Java, C#, C, Python, etc.). Improved error reporting and recovery. Integration of the StringTemplate template engine for generating structured text (useful for Code Generation). New syntax for grammars:! 3 is not upward compatible with version 2.x. 5 6 overview Source program input file structure *.g EBNF Grammars with Java actions/code lexer grammar My lexer and parser specification can be elegantly combined in a single specification parser grammar My MyLexer.g! MyParser.g! char stream MyLexer.java token stream MyParser.java scanner parser [gtype] grammar FooBar gtype may be empty or lexer, parser or tree. options { options for entire grammar file A single.g file can contain a Lexer and/or tokens { Parser, or a TreeParser. token definitions @header { will be copied to the generated Java file(s) e.g. imports MyWalker.g! tree grammar MyWalker Usually more than one tree walker. tree node stream MyWalker.java Target code tree walker, e.g. - context analyzer - code generator All Java files are generated automatically. @rulecatch { error handling: how to deal with exceptions? @members { optional class definitions: instance variables, methods rulename : all rules for FooBar 7 8

rule structure generates a recursive descent recognizer: for each rule, will generate a Java method. Running rulename [args] returns [T val] options { local options : alternative 1 alternative 2 alternative n A B EBNF operators A or B A* zero or more A s A+ one or more A s A? an optional A When using EBNF operators in : use parentheses to enclose more than one symbol.. An alternative is an EBNF regular expression containing:! rulename! TOKEN! EBNF operator! Java code in braces Example optional, used for passing information around + optional code sections to insert at start of end of method @init { @after { expr : operand (PLUS operand)* operand : LPAREN expr RPAREN NUMBER is a Java program: java org.antlr.tool file.g The 2 jar-file should be in the CLASSPATH of course. may contain specifications for a lexer and/or a parser, or a treewalker By default Java generates.java files which have to be compiled to an Java application. There also exist several GUI Development Environments: Works may be helpful http://www.antlr.org/works/ during development DT (for Eclipse) http://www.certiv.net/projects/plugins/antlrdt.html v3 E (for Eclipse) http://antlrv3ide.sourceforge.net/ Within Vertalerbouw we rely on the command-line version. It is allowed to use an E, though. 9 10 Calc Language (1) Calc: simple calculator language! declarations only integer variables must all come before statements! statements assignment to variables printing of expressions var n: integer var x: integer n := 2+4-1 x := n+3+7 print(x) Will be extended upon in the laboratory of week 3 and 4 Calc Language (2) EBNF for Calc var n: integer var x: integer n := 2+4-1 x := n+3+7 print(x) program ::= declarations statements EOF declarations ::= (declaration SEMICOLON)* declaration ::= ENTIFIER COLON type statements ::= (statement SEMICOLON)+ statement ::= assignment printstatement assignment ::= lvalue BECOMES expr printstatement ::= PRINT LPAREN expr RPAREN lvalue ::= ENTIFIER expr ::= operand ((PLUS MINUS) operand)* operand ::= ENTIFIER NUMBER LPAREN expr RPAREN type ::= INTEGER All terminals are written as UPPERCASE symbols. 11 12

Calc compiler overview - Parser & Lexer We let generate four recognizers:! CalcLexer (extends Lexer) translates a stream of characters to stream of tokens! CalcParser (extends Parser) translates a stream of tokens to an stream of tree nodes! CalcChecker (extends TreeParser) reads the stream of tree nodes (i.e. the AST) and checks whether the context constraints are obeyed! CalcInterpreter (extends TreeParser) reads the stream of tree nodes (i.e. the AST) and executes the program A lexer and parser are closely related. A lexer generates tokens which are consumed by a parser. In 3.x, the lexer and parser can be combined elegantly into a single grammar specification.! takes care of splitting the two specifications. Literals (i.e. character strings) are enclosed in single quotes (e.g. 'bar'). Lexer non-terminals (i.e. token names) always start with an UPPERCASE letter. Parser non-terminals always start with an lowercase letter. Foo.tokens Foo.g FooParser.java grammar Foo tokens { foo : BAR BAR : 'bar' Foo.g FooLexer.java 13 Will be imported by tree parsers. 14 Calc Parser & Lexer (1) Calc Parser & Lexer (2) First only recognising, no AST construction yet. grammar Calc options { k = 1 language = Java output = AST tokens { PLUS = '+' MINUS = '-' BECOMES = ':=' COLON = ':' SEMICOLON = '' LPAREN = '(' RPAREN = ')' // keywords PROGRAM = 'program' = 'var' PRINT = 'print' INTEGER = 'integer' This is a combined specification (not prefixed by lexer, parser or tree). amount of lookahead, disables LL(*) build an AST Target language is Java. token definitions (literals) tokens always start with an uppercase letter and specify the text for a token program : declarations statements EOF declarations : (declaration SEMICOLON)* statements : (statement SEMICOLON)+ declaration : ENTIFIER COLON type statement : assignment print assignment : lvalue BECOMES expr print : PRINT LPAREN expr RPAREN lvalue : ENTIFIER expr : operand ((PLUS MINUS) operand)* operand : ENTIFIER NUMBER LPAREN expr RPAREN type : INTEGER parser specific rules special end-of-file token parser rules start with a lowercase letter var n: integer var x: integer n := 2+4-1 x := n+3+7 print(x) In this example, all tokens are explicitly named (as UPPERCASE tokens). It is also possible to use literals in the parser specification. For example: print : 'print' '(' expr ')' expr : operand (('+' '-') operand)* 15 16

Calc Parser & Lexer (3) Calc Parser & Lexer (4) The Parser does not only recognize the language, it should also build the AST. ENTIFIER : NUMBER : DIGIT+ LETTER (LETTER DIGIT)* COMMENT : '//'.* '\n' { $channel=hden WS : (' ' '\t' '\f' '\r' '\n')+ { $channel=hden fragment DIGIT : ('0'..'9') fragment LOWER : ('a'..'z') fragment UPPER : ('A'..'Z') fragment LETTER : LOWER UPPER fragment lexer rules can be used by other lexer rules, but do not return tokens by themselves.* matches everything except the character that follows it (i.e. '\n'). lexer specific rules There are multiple token channels. The parser reads from the DEFAULT channel. By setting a token s channel to HDEN it will be ignored by the parser. shorthand for (the complete) 'a' 'b' 'c' 'y' 'z' No need to worry about counting the newlines the lexer takes care of this automatically. 17 program : declarations statements EOF -> ^(PROGRAM declarations statements) declarations : (declaration SEMICOLON!)* Imaginary token that is used as the root AST node (not really needed). statements : (statement SEMICOLON!)+ declaration : ^ ENTIFIER COLON! type statement : assignment print assignment : lvalue BECOMES^ expr print : PRINT^ LPAREN! expr RPAREN! lvalue : ENTIFIER expr : operand ((PLUS^ MINUS^) operand)* operand : ENTIFIER NUMBER LPAREN! expr RPAREN! type : INTEGER (due to rule of operand) this builds: ^(PLUS expr expr) T^ parser building the AST Annotations for building AST nodes make T the root of this (sub)rule T! discard T -> ^( ) tree construction for a rule For example: ^ ENTIFIER COLON! type = ^( ENTIFIER type) = type = type first child, next sibling notation 18 AST Tree (Example) var n: integer var x: integer n := 2+4-1 x := n+3+7 print(x) PROGRAM program : decls stats EOF -> ^(PROGRAM decls stats) decls : (decl SEMICOLON!)* stats : (stat SEMICOLON!)+ decl : ^ COLON! type stat : assign print assign : lvalue BECOMES^ expr print : PRINT^ LPAREN! expr RPAREN! lvalue : expr : oper ((PLUS^ MINUS^) oper)* oper : NUM LPAREN! expr RPAREN! type : INT Calc Tree walker tree grammar CalcTreeWalker options { tokenvocab = Calc ASTLabelType = CommonTree program : decls stats EOF -> ^(PROGRAM decls stats) decls : (decl SEMICOLON!)* stats : (stat SEMICOLON!)+ decl : ^ ENTIFIER COLON! type stat : assign print assign : lvalue BECOMES^ expr print : PRINT^ LPAREN! expr RPAREN! lvalue : expr : operand ((PLUS^ MINUS^) operand)* operand : NUM LPAREN! expr RPAREN! type : INT This is a specification of a tree walker. Import tokens from Calc.tokens. The AST nodes are of type CommonTree. n INT x = ENTIFIER INT = INTEGER INT BECOMES n PLUS NUMBER 2 MINUS NUMBER 1 NUMBER 4 PRINT BECOMES x PLUS x PLUS NUMBER 7 NUMBER n 3 program : ^(PROGRAM (declaration statement)+) The AST has a root node PROGRAM with declaration : ^( ENTIFIER type) many (declaration or statement) children. statement : ^(BECOMES ENTIFIER expr) ^(PRINT expr) expr : operand ^(PLUS expr expr) ^(MINUS expr expr) operand : ENTIFIER NUMBER type : INTEGER Match a tree whose root is a PLUS token with two children that match the expr rule. This tree walker does not do anything (yet). Note the conciseness of the grammar and the correspondence with the abstract syntax of the language Calc. 19 20

Calc Checker (1) tree grammar CalcChecker options { tokenvocab = Calc ASTLabelType = CommonTree @header { import java.util.set import java.util.hashset @rulecatch { catch (RecognitionException e) { throw e @members { private Set<String> idset = new HashSet<String>() public boolean isdeclared(string s) { return idset.contains(s) public void declare(string s) { idset.add(s) The CalcChecker checks the context rules of the language: each identifier can be declared only once identifiers that are used must have been declared. @header: code block which is copied verbatim to the beginning of CalcChecker.java. @rulecatch: specify your own error handler. Here: no error handler exceptions are propagated to the method calling this checker. @members: code block which is copied verbatim to the class definition of CalcChecker.java. The Calc language uses a monolithic block structure. For checking the scope rules we can use a Set. The methods isdeclared and declare become methods of the class CalcChecker. Calc Checker (2) program : ^(PROGRAM (declaration statement)+) With name=node we can refer to the AST node using name declaration : ^( id=entifier type) { if (isdeclared($id.gettext())) and get its String representation. throw new CalcException($id.getText() + " is already declared") else declare($id.gettext()) statement : ^(BECOMES id=entifier expr) or use the attribute text. { if (!isdeclared($id.text)) throw new CalcException($id.text + " is used but not declared") ^(PRINT expr) Java code block which is copied verbatim to the parse method of declaration in CalcChecker.java. Within Java code, the variables are (usually) prefixed with $. CalcException is an user-defined Exception (subclass of org.antlr.runtime RecognitionException) to express some problem in the input. 21 22 Calc Interpreter (1) n initialise n to 0 INT x Idea of Interpreter: depth-first, left-to-right traversal of AST. use Map for storing values of variables compute expressions bottom up initialise x to 0 INT PROGRAM BECOMES set n to 5 n NUMBER 2 MINUS 5 PLUS 6 NUMBER 1 NUMBER 4 BECOMES set x to 15 x retrieve n 5 PLUS 8 var n: integer var x: integer n := 2+4-1 x := n+3+7 print(x) PLUS 15 NUMBER 7 NUMBER 3 PRINT x Calc Interpreter (2) tree grammar CalcInterpreter options { tokenvocab = Calc ASTLabelType = CommonTree @header { import java.util.map import java.util.hashmap To store the values of the variables. @members { Map<String,Integer> store = new HashMap<String,Integer>() program : ^(PROGRAM (declaration statement)+) declaration : ^( id=entifier type) { store.put($id.text, 0) The structure of the tree grammar CalcInterpreter is the same as the one for the CalcChecker, of course. Initialized on 0. Idea of Interpreter: depth-first, left-to-right traversal of AST. use Map for storing values of variables compute expressions bottom up 23 24

Calc Interpreter (2) statement : ^(BECOMES id=entifier v=expr) { store.put($id.text, $v) ^(PRINT v=expr) { System.out.println("" + $v) expr returns [int val = 0] : z=operand { val = z ^(PLUS x=expr y=expr) { val = x + y ^(MINUS x=expr y=expr) { val = x - y operand returns [int val = 0] : id=entifier { val = store.get($id.text) n=number { val = Integer.parseInt($n.text) A rule can return a value: rulename returns [T x] The type of the return value is T and the value returned is the value of x at the end of the rule. Get the value of ENTIFIER out of the store. The rule expr returns a value. The value returned by expr is put into the store for id. deduces from the context the types of the variables: id is a CommonTree, v is an int. Note that it is also possible to pass arguments to a rule. Parse the string representation of the NUMBER. 25 lexer parser Calc driver public static void main(string[] args) { A lexer gets an stream as input. CalcLexer lex = new CalcLexer( new InputStream(System.in)) CommonTokenStream tokens = new CommonTokenStream(lex) CalcParser parser = new CalcParser(tokens) checker interpreter CalcParser.program_result result = parser.program() CommonTree tree = (CommonTree) result.gettree() CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree) CalcChecker checker = new CalcChecker(nodes) checker.program() CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree) CalcInterpreter interpreter = new CalcInterpreter(nodes) interpreter.program() Call the start symbol to start parsing. The parser gets the lexer s output tokens. The recognition methods may all throw Exceptions (e.g. RecognitionException, TokenStreamException) These have to be caught in main-method. See Calc.java. 26 -ast Calc visualizing the AST (1) public static void main(string[] args) { CalcLexer lexer = new CalcLexer( new InputStream(System.in)) CommonTokenStream tokens = new CommonTokenStream(lexer) CalcParser parser = new CalcParser(tokens) CalcParser.program_return result = parser.program() CommonTree tree = (CommonTree) result.gettree() // show S-Expression respresentation of the AST String s = tree.tostringtree() System.out.println(s) // print the AST as DOT specification DOTTreeGenerator gen = new DOTTreeGenerator() StringTemplate st = gen.todot(tree) System.out.println(st) DOTTreeGenerator is defined in package org.antlr.stringtemplate -dot.dot files can be visualized using the GraphViz program: http://www.graphviz.org/ Calc visualizing the AST (2) via tree.tostringtree() via DOTTreeGenerator and GraphViz var n: integer var x: integer n := 2+4-1 x := n+3+7 print(x) 27 28

Calc - generated Java files DEMO Calc Parser Java code (1) Calc.g grammar Calc Calc.g lexer grammar Calc CalcChecker.g tree grammar CalcChecker CalcInterpreter.g tree grammar CalcInterpreter public class CalcParser extends Parser { public final program_return program() throws RecognitionException { program_return retval = new program_return() try { // Calc.g:44:9: declarations statements EOF { pushfollow(follow_declarations_in_program412) declarations1=declarations() _fsp-- CalcLexer.java CalcParser.java Calc.tokens Contains for each rule r, a class r_return, which results in many CalcParser$ class files. CalcChecker.java CalcInterpreter.java stream_declarations.add(declarations1.gettree()) pushfollow(follow_statements_in_program414) statements2=statements() _fsp-- stream_statements.add(statements2.gettree()) EOF3=(Token)input.LT(1) match(input,eof,follow_eof_in_program416) stream_eof.add(eof3) catch (RecognitionException re) { reporterror(re) recover(input,re) return retval Most code that builds the AST is omitted! program : declarations statements EOF! 29 30 Calc Parser Java code (2) public final declarations_return declarations() throws RecognitionException { declarations_return retval = new declarations_return() try { loop1: do { int alt1=2 LA(1) - current lookahead Token. int LA1_0 = input.la(1) if ( (LA1_0==) ) alt1=1 switch (alt1) { case 1 : { pushfollow(follow_declaration_in_declarations463) declaration4=declaration() match(input,semicolon,follow_semicolon_in_declarations465) break default : break loop1 while (true) catch (RecognitionException re) { return retval declarations : (declaration SEMICOLON!)* Advantages With you can specify your compiler and let do the hard work of generating the compiler. But the generated Java code is similar to what you would write manually: it is possible (and easy!) to read and debug Java files generated by. The syntax for specifying scanners, parsers and tree walkers is the same. can generate recognizers for many programming languages (e.g. Java, C#, Python, (Objective) C, etc.) is well supported and has an active user community. 31 32

Some Tips Left associative! E left associative right associative operator precedence Left associative operator! : a! b! c " (a! b)! c Production rule: E ::= E! T T! E c T E b T T a parse tree dangling-else Second lecture on (lecture #9) will discuss some more advanced Tips and Techniques. which can be written (by eliminating left recursion) as E ::= X Y X Y ::= T ::=! T Y empty or using EBNF: E ::= T (! T)* 33 34 Right associative Right associative operator! : a! b! c " a! (b! c) Production rule: E ::= T! E T which can be written (using left factorisation) as E ::= T X X ::=! E empty or using EBNF: E ::= T (! E)? a T! E T b parse tree! E E T c Operator Precedence (1) Consider the following example a + b * c - d which should be parsed as (a + (b * c)) - d + - a * b expr2 This means that the operator * has precedence over the operators + and -. This can be reflected in the grammar by making sure that * is closer to the operands than + and -. expr1 : expr2 ((PLUS^ MINUS^) expr2)* expr2 : operand (TIMES^ operand)* operand : ENTIFIER c d 35 36

Operator Precedence (2) a + b * c - d Greedy (1) parse tree: expr1 expr1 : expr2 ((PLUS^ MINUS^) expr2)* expr2 : operand (TIMES^ operand)* operand : ENTIFIER constructed AST: Consider the classic if-then-else ambiguity (i.e., dangling else) e.g. stat : 'if' expr 'then' stat ('else' stat)? if b1 then if b2 then s1 else s2 MINUS Two possible parse trees: expr2 PLUS^ expr2 MINUS^ expr2 PLUS d stat stat operand operand TIMES^ operand operand a TIMES if b1 then stat if b1 then stat else s2 a b c d b c if b2 then s1 else s2 if b2 then s1 37 38 Greedy (2) So this ambiguity (which statement should the else be attached to) results in a parser nondeterminism. 3 warns you: warning(200): Foo.g:12:33: Decision can match input such as "'else'" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input If you make it clear to that you want the subrule to match greedily (i.e. the default behavior), will not generate the warning. stat : 'if' expr 'then' stat (options {greedy=true : 'else' stat)? Note: this is the way it should work according to the documentation. However, 3 still shows the warning. (Note that the generated compiler will work correctly though) 39