Context free grammars and predictive parsing




Programming Language Concepts and Implementation
Fall 2012, Lecture 6

Overview
- Context free grammars
  - Describing programming language syntax
  - Ambiguities and eliminating these
- The parser generator coco/r
- Predictive parsing: Under the hood of coco/r
- Next week: LR parsing

An example and a derivation

Expr = Expr + Expr | Expr * Expr | (Expr) | num

Context free grammars: think of it as regular expressions + recursion

Terminology:
- 1 non-terminal (Expr)
- 5 terminals (tokens): +, *, (, ), num
- 4 productions (right hand sides)
- Terminals and nonterminals collectively are symbols

Derivation:
Expr => Expr + Expr => Expr + Expr * Expr => 2 + 3*4

Straight line programs (from book)

S = S;S | id := E | print(L)
E = id | num | E + E | (S,E)
L = E | L,E

Grammar 3.1 is an example of a grammar for straight-line programs. The start symbol is S (when the start symbol is not written explicitly it is conventional to assume that the left-hand nonterminal in the first production is the start symbol). The terminal symbols are

id print num , + ( ) := ;

and the nonterminals are S, E, and L.

GRAMMAR 3.1: A syntax for straight-line programs.
1. S -> S ; S
2. S -> id := E
3. S -> print (L)
4. E -> id
5. E -> num
6. E -> E + E
7. E -> (S, E)
8. L -> E
9. L -> L, E

One sentence in the language of this grammar is

id := num; id := id + (id := num + num, id)

where the source text (before lexical analysis) might have been

a := 7; b := c + (d := 5 + 6, d)

The token-types (terminal symbols) are id, num, :=, and so on; the names (a, b, c, d) and numbers (7, 5, 6) are semantic values associated with some of the tokens.

Another example

DERIVATIONS
To show that this sentence is in the language of the grammar, we can perform a derivation: start with the start symbol, then repeatedly replace any nonterminal by one of its right-hand sides, as shown in Derivation 3.2.

DERIVATION 3.2
S
-> S ; S
-> S ; id := E
-> id := E ; id := E
-> id := num ; id := E
-> id := num ; id := E + E
-> id := num ; id := E + (S, E)
-> id := num ; id := id + (S, E)
-> id := num ; id := id + (id := E, E)
-> id := num ; id := id + (id := E + E, E)
-> id := num ; id := id + (id := E + E, id)
-> id := num ; id := id + (id := num + E, id)
-> id := num ; id := id + (id := num + num, id)

Official definition

A context free grammar consists of
- A finite set of nonterminals
- A finite set of terminals
- A finite set of productions
- A choice of start symbol (a non-terminal)

A production consists of
- A nonterminal (called the left hand side)
- A string of symbols (terminals or nonterminals)

This is called Backus-Naur Form (BNF)

Example: Mini Java
From MCIJ (note mixed notation)
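The definition above maps directly onto a small data structure. The following is a minimal sketch, not part of the lecture: the class names `Cfg` and `Production` and the method `exprGrammar` are hypothetical, and the sketch assumes that any symbol never used as a left hand side is a terminal. It encodes the expression grammar from the first example (1 nonterminal, 5 terminals, 4 productions).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the four components of a context free grammar.
class Cfg {
    // A production: one nonterminal on the left, a string of symbols on the right.
    static class Production {
        final String lhs;
        final List<String> rhs;
        Production(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = Arrays.asList(rhs);
        }
    }

    final Set<String> nonterminals = new LinkedHashSet<>();
    final Set<String> terminals = new LinkedHashSet<>();
    final List<Production> productions = new ArrayList<>();
    final String start;

    Cfg(String start, Production... ps) {
        this.start = start;
        for (Production p : ps) {
            productions.add(p);
            nonterminals.add(p.lhs);
        }
        // Assumption: a symbol that never occurs as a left hand side is a terminal.
        for (Production p : ps)
            for (String s : p.rhs)
                if (!nonterminals.contains(s)) terminals.add(s);
    }

    // Expr = Expr + Expr | Expr * Expr | (Expr) | num
    static Cfg exprGrammar() {
        return new Cfg("Expr",
            new Production("Expr", "Expr", "+", "Expr"),
            new Production("Expr", "Expr", "*", "Expr"),
            new Production("Expr", "(", "Expr", ")"),
            new Production("Expr", "num"));
    }

    public static void main(String[] args) {
        Cfg g = exprGrammar();
        System.out.println("nonterminals: " + g.nonterminals);
        System.out.println("terminals:    " + g.terminals);
        System.out.println("productions:  " + g.productions.size());
    }
}
```

Deriving the terminal set from the productions keeps the two sets disjoint by construction, which the definition requires implicitly.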

SQL specification (in extended BNF)

...
<query specification> ::=
    SELECT [ <set quantifier> ] <select list> <table expression>
<select list> ::=
    <asterisk>
  | <select sublist> [ { <comma> <select sublist> }... ]
<select sublist> ::= <derived column> | <qualifier> <period> <asterisk>
<derived column> ::= <value expression> [ <as clause> ]
<as clause> ::= [ AS ] <column name>
<table expression> ::=
    <from clause>
    [ <where clause> ]
    [ <group by clause> ]
    [ <having clause> ]
<from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ]
...

http://savage.net.au/SQL/sql-92.bnf

Ambiguity

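Ambiguity

Expr = Expr + Expr | Expr * Expr | (Expr) | num

Two derivations of 2 + 3*4, giving two different parse trees:
- Expr => Expr + Expr => Expr + Expr * Expr => 2 + 3*4
- Expr => Expr * Expr => Expr + Expr * Expr => 2 + 3*4

[Figure: the two parse trees for 2 + 3*4]

Encoding operator precedence
- Multiplication has higher precedence (binds stronger) than addition
- One nonterminal per precedence level

Expr = Expr + Expr | Term
Term = Term * Term | (Expr) | num

Exercise:
- How many ways can you parse 2+3*4?
- How about 2 + 3 + 4?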
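Ambiguity can be checked mechanically by counting parse trees. The sketch below is an illustration under stated assumptions, not the lecture's method: the class `ParseCounter` and its helpers are hypothetical, the grammar must have no empty right hand sides and no cycles of unit productions, and the two grammars are my reading of the slides (the flat expression grammar and the version with a separate Term level). Tokens are given by kind, so 2+3*4 becomes num + num * num.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts the parse trees of a token string under a grammar.
class ParseCounter {
    private final Map<String, String[][]> rules = new HashMap<>();
    private final Map<String, Long> memo = new HashMap<>();
    private String[] toks;

    static String[] alt(String... symbols) { return symbols; }

    void rule(String lhs, String[]... alternatives) { rules.put(lhs, alternatives); }

    long count(String start, String... tokens) {
        toks = tokens;
        memo.clear();
        return derive(start, 0, tokens.length);
    }

    // Number of parse trees in which sym derives tokens[i..j).
    private long derive(String sym, int i, int j) {
        if (!rules.containsKey(sym))              // terminal: matches exactly one token
            return (j - i == 1 && toks[i].equals(sym)) ? 1 : 0;
        String key = sym + ":" + i + ":" + j;
        Long cached = memo.get(key);
        if (cached != null) return cached;
        long total = 0;
        for (String[] a : rules.get(sym)) total += seq(a, 0, i, j);
        memo.put(key, total);
        return total;
    }

    // Ways for a[k..] to derive tokens[i..j); each symbol covers at least one token.
    private long seq(String[] a, int k, int i, int j) {
        if (k == a.length - 1) return derive(a[k], i, j);
        int rest = a.length - 1 - k;              // symbols still to place after a[k]
        long total = 0;
        for (int mid = i + 1; mid <= j - rest; mid++) {
            long left = derive(a[k], i, mid);
            if (left != 0) total += left * seq(a, k + 1, mid, j);
        }
        return total;
    }

    // Expr = Expr + Expr | Expr * Expr | (Expr) | num
    static ParseCounter flat() {
        ParseCounter g = new ParseCounter();
        g.rule("Expr", alt("Expr", "+", "Expr"), alt("Expr", "*", "Expr"),
                       alt("(", "Expr", ")"), alt("num"));
        return g;
    }

    // Expr = Expr + Expr | Term;  Term = Term * Term | (Expr) | num
    static ParseCounter withTerm() {
        ParseCounter g = new ParseCounter();
        g.rule("Expr", alt("Expr", "+", "Expr"), alt("Term"));
        g.rule("Term", alt("Term", "*", "Term"), alt("(", "Expr", ")"), alt("num"));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(flat().count("Expr", "num", "+", "num", "*", "num"));
        System.out.println(withTerm().count("Expr", "num", "+", "num", "*", "num"));
        System.out.println(withTerm().count("Expr", "num", "+", "num", "+", "num"));
    }
}
```

Running `main` gives one way to check answers to the exercise: a count above 1 means the grammar is ambiguous for that input.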

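Ambiguity and associativity

Expr = Expr - Expr | num

[Figure: the two parse trees for 5 - 3 - 2]

Forcing left associativity

Expr = Expr - num | num

Exercise
What ambiguities exist in the following grammar, and how do we get rid of them?

Expr = Expr + Expr | Expr * Expr | Expr - Expr | Expr / Expr | (Expr) | num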

Exercise (continued)
- * and / have higher precedence than -, +
- All operators associate to the left, e.g.,
  - 6-3-2 = (6-3)-2 ≠ 6-(3-2)
  - 6/3*2 = (6/3)*2 ≠ 6/(3*2)
  - 6-3+2 = (6-3)+2 ≠ 6-(3+2)

Encoding operator precedence

Expr = Expr + Expr | Expr * Expr | Expr - Expr | Expr / Expr | (Expr) | num

Use one non-terminal per precedence level

Expr = Expr + Expr | Expr - Expr | Term
Term = Term * Term | Term / Term | (Expr) | num

Encoding associativity

Expr = Expr + Term | Expr - Term | Term
Term = Term * num | Term * (Expr) | Term / num | Term / (Expr) | (Expr) | num

or (better)

Expr = Expr + Term | Expr - Term | Term
Term = Term * Prim | Term / Prim | Prim
Prim = (Expr) | num
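The effect of one nonterminal per precedence level, with left associativity, can be seen in a small evaluator. This is a minimal sketch, assuming the three-level Expr / Term / Prim layering and integer arithmetic; the class `PrecedenceDemo` and its method names are hypothetical. The left recursive rules are written as loops, which is the usual way to realise them in a hand-written top-down parser.

```java
// Hypothetical evaluator: one method per precedence level.
class PrecedenceDemo {
    private final String src;
    private int pos = 0;

    PrecedenceDemo(String src) { this.src = src.replace(" ", ""); }

    static int eval(String s) {
        PrecedenceDemo p = new PrecedenceDemo(s);
        int v = p.expr();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        return v;
    }

    // Returns the next character, or '$' at end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '$'; }

    // Expr = Expr + Term | Expr - Term | Term, written as Term { (+|-) Term }
    private int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;   // fold to the left: left associative
        }
        return v;
    }

    // Term = Term * Prim | Term / Prim | Prim, written as Prim { (*|/) Prim }
    private int term() {
        int v = prim();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int r = prim();
            v = (op == '*') ? v * r : v / r;   // integer division
        }
        return v;
    }

    // Prim = (Expr) | num
    private int prim() {
        if (peek() == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return v;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("6-3-2"));
        System.out.println(eval("6/3*2"));
        System.out.println(eval("2+3*4"));
    }
}
```

Because `term()` is called from inside `expr()`, a multiplication is always grouped before the surrounding addition, and the accumulating loop groups equal-precedence operators from the left.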

Associativity of operators
- Most binary operators are left associative, e.g., +, -, *, /
- Few are right associative, e.g. = in C: x = y = 2 is parsed as x = (y = 2)
- Some are non-associative, e.g., 1<2<3 is not legal

Forcing right associativity

Expr = ident '=' Expr | ident

Non-associative operators

LogExpr = Expr < Expr | Expr = Expr ...

Ambiguity: Dangling else

Consider the grammar

Stmt = if Expr then Stmt else Stmt
     | if Expr then Stmt
     | id = Expr

How to parse?

if Expr then if Expr then id = Expr else id = Expr
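Right associativity falls out of recursing on the right operand. The sketch below is illustrative only: the class `RightAssoc` is hypothetical, tokens are assumed to be separated by whitespace, and any token other than `=` is accepted as an operand so that the slide's example x = y = 2 can be used directly.

```java
// Hypothetical parser for: Expr = ident '=' Expr | ident
class RightAssoc {
    // Returns the input with explicit parentheses showing the grouping.
    static String parse(String src) {
        String[] toks = src.trim().split("\\s+");
        int[] pos = {0};
        String tree = expr(toks, pos);
        if (pos[0] != toks.length) throw new IllegalArgumentException("trailing input");
        return tree;
    }

    private static String expr(String[] toks, int[] pos) {
        if (pos[0] >= toks.length || toks[pos[0]].equals("="))
            throw new IllegalArgumentException("expected operand");
        String left = toks[pos[0]++];
        if (pos[0] < toks.length && toks[pos[0]].equals("=")) {
            pos[0]++;
            String right = expr(toks, pos);   // recursion on the right operand
            return "(" + left + " = " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        System.out.println(parse("x = y = 2"));
    }
}
```

The left operand is a single token and everything after `=` is parsed first, so the innermost assignment ends up furthest to the right.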

Resolving the ambiguity

Stmt = Matched_Stmt | Unmatched_Stmt
Matched_Stmt = if Expr then Matched_Stmt else Matched_Stmt
             | id = Expr
Unmatched_Stmt = if Expr then Stmt
               | if Expr then Matched_Stmt else Unmatched_Stmt

Better to handle this using parser tricks. See later.

The parser generator Coco/R
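One common parser-level way to settle the dangling else is to let the parser take an `else` greedily, so that it attaches to the nearest `if`. The sketch below shows that idea only; it is an assumption about what such a trick could look like, not the mechanism the lecture goes on to use. The class `DanglingElse` is hypothetical, tokens are whitespace separated, and conditions and simple statements are single placeholder tokens such as c1 or a.

```java
// Hypothetical parser for: Stmt = if Expr then Stmt [ else Stmt ] | simple
class DanglingElse {
    private final String[] toks;
    private int pos = 0;

    DanglingElse(String src) { toks = src.trim().split("\\s+"); }

    // Returns the statement with parentheses showing which if owns each else.
    static String parse(String src) {
        DanglingElse p = new DanglingElse(src);
        String tree = p.stmt();
        if (p.pos != p.toks.length) throw new IllegalArgumentException("trailing input");
        return tree;
    }

    private String peek() { return pos < toks.length ? toks[pos] : "$"; }

    private void eat(String t) {
        if (!peek().equals(t))
            throw new IllegalArgumentException("expected " + t + " but saw " + peek());
        pos++;
    }

    // A single non-keyword token stands for a condition or a simple statement.
    private String simple() {
        String t = peek();
        if (t.equals("$") || t.equals("if") || t.equals("then") || t.equals("else"))
            throw new IllegalArgumentException("unexpected " + t);
        pos++;
        return t;
    }

    private String stmt() {
        if (!peek().equals("if")) return simple();
        eat("if");
        String cond = simple();
        eat("then");
        String thenPart = stmt();
        if (peek().equals("else")) {          // greedy: the innermost open if takes the else
            eat("else");
            String elsePart = stmt();
            return "(if " + cond + " then " + thenPart + " else " + elsePart + ")";
        }
        return "(if " + cond + " then " + thenPart + ")";
    }

    public static void main(String[] args) {
        System.out.println(parse("if c1 then if c2 then a else b"));
    }
}
```

The grammar stays ambiguous on paper; the parser simply always picks the same one of the two trees, which matches the Matched_Stmt / Unmatched_Stmt reading.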

Extended BNF

Example

Expr = Term { + Term | - Term }
Term = num { * num }

Extra symbols
- {α} means zero, one or many α
- [α] means zero or one α
- (α) is used for grouping

EBNF is no more expressive than BNF, only more convenient

Using coco/r

    COMPILER Expressions
    ...
    PRODUCTIONS
    /*-------------------------------------------------------------------*/
    Expr = Term { '+' Term | '-' Term }.
    Term = number { '*' number }.
    Expressions = Expr.        /* Specification of start symbol */
    END Expressions.

Semantic actions in coco/r

    COMPILER Expressions
      public int res;
    ...
    PRODUCTIONS
    /*-------------------------------------------------------------------*/
    Expr<out int n>            (. int n1, n2; .)
    = Term<out n1>             (. n = n1; .)
      { '+' Term<out n2>       (. n = n+n2; .)
      | '-' Term<out n2>       (. n = n-n2; .)
      }.
    Term<out int n>
    = number                   (. n = Convert.ToInt32(t.val); .)
      { '*' number             (. n = n*Convert.ToInt32(t.val); .)
      }.
    Expressions                (. int n; .)
    = Expr<out n>              (. res = n; .)
    .
    END Expressions.

The generated parser

Method for parsing expressions, in the resulting Parser.cs:

    void Expr(out int n) {       // out: pass by reference, similar to ref
        int n1, n2;
        Term(out n1);
        n = n1;
        while (la.kind == 3 || la.kind == 4) {
            if (la.kind == 3) {  // if next token is +
                Get();
                Term(out n2);
                n = n+n2;
            } else {
                Get();
                Term(out n2);
                n = n-n2;
            }
        }
    }

Using coco/r with semantic actions

Predictive parsing
- Top-down parsing method aka LL-parsing
- coco/r generates LL parsers
- Produces left-most derivations
- Guess a production based on the next token
- Example parsing on board

Example grammar 3.11

S = if E then S else S | begin S L | print E
L = end | ; S L
E = num | ident

Rasmus Ejlers Møgelberg

Suppose S is the start symbol of a grammar. To indicate that $ must come after a complete S-phrase, we augment the grammar with a new start symbol S' and a new production S' -> S$. In Grammar 3.8, E is the start symbol, so an augmented grammar is Grammar 3.10.

GRAMMAR 3.10
S -> E $
E -> E + T
E -> E - T
E -> T
T -> T * F
T -> T / F
T -> F
F -> id
F -> num
F -> (E)

3.2 PREDICTIVE PARSING
Some grammars are easy to parse using a simple algorithm known as recursive descent. In essence, each grammar production turns into one clause of a recursive function. We illustrate this by writing a recursive-descent parser for Grammar 3.11.

GRAMMAR 3.11
S -> if E then S else S
S -> begin S L
S -> print E
L -> end
L -> ; S L
E -> num = num

Parser implementation

A recursive-descent parser for this language has one function for each nonterminal and one clause for each production.

    final int IF=1, THEN=2, ELSE=3, BEGIN=4, END=5, PRINT=6, SEMI=7, NUM=8, EQ=9;

    int tok = getToken();
    void advance() { tok = getToken(); }
    void eat(int t) { if (tok == t) advance(); else error(); }

    void S() {
        switch (tok) {
            case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
            case BEGIN: eat(BEGIN); S(); L(); break;
            case PRINT: eat(PRINT); E(); break;
            default:    error();
        }
    }

    void L() {
        switch (tok) {
            case END:  eat(END); break;
            case SEMI: eat(SEMI); S(); L(); break;
            default:   error();
        }
    }
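The fragment above leaves `getToken`, `error` and the function for E undefined. The following is a self-contained sketch of the same parser for Grammar 3.11, under a few assumptions that are not in the source: tokens are separated by whitespace, a number is a run of digits, an extra `EOF` token kind marks the end of input, and errors are reported by throwing an exception. The class name `Grammar311Parser` and the method `accepts` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runnable version of the recursive-descent parser for Grammar 3.11.
class Grammar311Parser {
    static final int EOF = 0, IF = 1, THEN = 2, ELSE = 3, BEGIN = 4, END = 5,
                     PRINT = 6, SEMI = 7, NUM = 8, EQ = 9;

    private final List<Integer> toks = new ArrayList<>();
    private int pos = 0;
    private int tok;

    Grammar311Parser(String src) {
        // Assumption: tokens are separated by whitespace.
        for (String w : src.trim().split("\\s+")) toks.add(kind(w));
        toks.add(EOF);
        tok = toks.get(0);
    }

    private static int kind(String w) {
        switch (w) {
            case "if":    return IF;
            case "then":  return THEN;
            case "else":  return ELSE;
            case "begin": return BEGIN;
            case "end":   return END;
            case "print": return PRINT;
            case ";":     return SEMI;
            case "=":     return EQ;
            default:
                if (w.matches("[0-9]+")) return NUM;
                throw new IllegalArgumentException("bad token: " + w);
        }
    }

    private void advance() { tok = toks.get(++pos); }
    private void eat(int t) { if (tok == t) advance(); else error(); }
    private void error() { throw new IllegalStateException("syntax error at token " + pos); }

    // One method per nonterminal, one case per production.
    void S() {
        switch (tok) {
            case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
            case BEGIN: eat(BEGIN); S(); L(); break;
            case PRINT: eat(PRINT); E(); break;
            default:    error();
        }
    }

    void L() {
        switch (tok) {
            case END:  eat(END); break;
            case SEMI: eat(SEMI); S(); L(); break;
            default:   error();
        }
    }

    // E -> num = num
    void E() { eat(NUM); eat(EQ); eat(NUM); }

    // True if the whole input is a sentence of Grammar 3.11.
    static boolean accepts(String src) {
        try {
            Grammar311Parser p = new Grammar311Parser(src);
            p.S();
            return p.tok == EOF;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("begin print 1 = 1 ; print 2 = 2 end"));
        System.out.println(accepts("begin print 1 = 1"));
    }
}
```

Each `switch` inspects only the current token, which works because every production of S and L starts with a different terminal.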

Parsing table

S = if E then S else S | begin S L | print E
L = end | ; S L
E = num | ident

          S                        L          E
--------------------------------------------------------
if        S->if E then S else S
begin     S->begin S L
print     S->print E
end                                L->end
;                                  L->;S L
num                                           E->num
ident                                         E->ident

Intended learning outcomes
- Construct grammars for programming languages
- Eliminate ambiguity by
  - Encoding operator precedence
  - Encoding operator associativity
- Use coco/r to create parsers and lexers
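A parsing table like the one above can drive a parser directly, with an explicit stack in place of recursive calls. The sketch below is illustrative: the class `TableDrivenParser` and its methods are hypothetical, the input is assumed to be already split into token kinds, and `$` is used internally for end of input. The table entries are the seven filled cells from the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical table-driven LL(1) parser for the slide's grammar.
class TableDrivenParser {
    // TABLE.get(nonterminal).get(token) is the right hand side to expand to.
    static final Map<String, Map<String, String[]>> TABLE = new HashMap<>();

    static void entry(String nt, String tok, String... rhs) {
        TABLE.computeIfAbsent(nt, k -> new HashMap<>()).put(tok, rhs);
    }

    static {
        entry("S", "if", "if", "E", "then", "S", "else", "S");
        entry("S", "begin", "begin", "S", "L");
        entry("S", "print", "print", "E");
        entry("L", "end", "end");
        entry("L", ";", ";", "S", "L");
        entry("E", "num", "num");
        entry("E", "ident", "ident");
    }

    static boolean accepts(String... input) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String la = pos < input.length ? input[pos] : "$";   // lookahead token
            if (TABLE.containsKey(top)) {
                String[] rhs = TABLE.get(top).get(la);
                if (rhs == null) return false;        // empty table cell: syntax error
                for (int i = rhs.length - 1; i >= 0; i--) stack.push(rhs[i]);
            } else if (top.equals(la)) {
                pos++;                                // terminal matches: consume it
            } else {
                return false;
            }
        }
        return pos == input.length;                   // all input must be consumed
    }

    public static void main(String[] args) {
        System.out.println(accepts("begin", "print", "num", ";", "print", "ident", "end"));
        System.out.println(accepts("print"));
    }
}
```

Pushing the right hand side in reverse order keeps its leftmost symbol on top of the stack, so the parser builds a left-most derivation, as the predictive parsing slide states.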