Approximating Context-Free Grammars for Parsing and Verification




Sylvain Schmitz
LORIA, INRIA Nancy - Grand Est
October 18, 2007

Outline: Motivation, Approximations, Shift-Resolve Parsing, Ambiguity Detection, Conclusion

Standard ML [Milner et al., 1997]: A Syntax Issue

[Diagram: the compiler takes the language specification and the program text, and produces executable code.]

Program text:

    datatype 'a option = NONE | SOME of 'a
    fun filter pred l =
      let
        fun filterp (x::r, l) =
              case (pred x) of
                SOME y => filterp(r, y::l)
              | NONE => filterp(r, l)
          | filterp ([], l) = rev l
      in
        filterp (l, [])
      end

Error messages from four compilers:

    MLton:      Error: match.sml 9.25. Syntax error: replacing EQUALOP with DARROW
    Moscow ML:  ! Toplevel input:
                ! filterp ([], l) = rev l
                ! ^
                ! Syntax error.
    Poly/ML:    Error: => expected but = was found
    SML/NJ:     stdin:7.24-7.29 Error: syntax error: deleting EQUALOP ID

Parsers: A Syntax Issue

[Diagram: inside the compiler, a parser built from the language specification reads the program text before executable code is produced.]

Parsers: A Syntax Issue

Context-free grammar, from which the parser is obtained:

    /**
     * smlfvalbind.y
     *
     * Standard ML function declarations.
     * See _The_definition_of_standard_ML_, Milner et al., 1997,
     * ISBN 0 262 63181 4.
     */
    %token CASE "case"
    %token FUN "fun"
    %token MATCH "=>"
    %token OF "of"
    %token VID
    %start dec
    %%
    dec:       "fun" fvalbind
    fvalbind:  sfvalbind | fvalbind '|' sfvalbind
    sfvalbind: VID atpats '=' exp
    exp:       VID | "case" exp "of" match
    match:     mrule | match '|' mrule
    mrule:     pat "=>" exp
    atpats:    atpat | atpats atpat
    atpat:     VID
    pat:       VID atpat
    %%

Parsers: A Syntax Issue

The parser generator turns the context-free grammar into a parser, and the parser turns input tokens into a parse tree. Input tokens:

    ... | NONE => filterp(r, l) | filterp ([], l) = rev l

[Figure: parse tree fragments over these tokens, with nodes fvalbind, sfvalbind, exp, match, mrule, pat and atpats.]

LALR(1) Parser Generator: Grammar Conflicts

GNU Bison:

    state 20

        6 exp: "case" exp "of" match .
        8 match: match . '|' mrule

        '|'  shift, and go to state 24
        '|'  [reduce using rule 6 (exp)]

Restricted grammar class: LALR(1) inside CFG.

Dealing with Conflicts: An Objective Measure

[Plot: number of LALR(1) conflicts (axis from 0 to 700) against parser versions (axis from 2 to 20) for a C# grammar [Malloy et al., 2002].]

Dealing with Conflicts: A Subjective Measure

[Comic strip, courtesy of http://www.phdcomics.com.]


State of the Art: Solutions

- LR(k) [Knuth, 1965]
- LR-Regular [Čulik and Cohen, 1973]
- Generalized LR [Tomita, 1986]
- Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963]
- Horizontal and vertical unambiguity test [Brabrand et al., 2007]

[Diagram: grammar classes LALR(1), LR(k), LR-Regular and CFG.]

Ambiguity

With the same context-free grammar, the parser returns a parse forest. Input tokens:

    case a of b => case b of c => c | d => d

[Figure: parse forest with two trees over these tokens, built from exp, match, mrule and pat nodes. The mrule d => d attaches either to the match of the inner case or to the match of the outer case.]

State of the Art: Solutions (diagram)

[Diagram: grammar classes LALR(1), LR(k), LR-Regular, UCFG, HVRU and CFG, with regions marked "Safe" and "Unsafe".]

Contributions

- Noncanonical parsing methods [Szymanski and Williams, 1976, Tai, 1979]
  - Noncanonical LALR(1)
  - Shift-Resolve
- Noncanonical unambiguity test
- Framework for grammar approximations

[Diagram: the new classes NLALR(1), ShRe and NU placed among LALR(1), LR(k), LR-Regular, HVRU, UCFG and CFG.]

Bracketed Grammars

$G = \langle N, T, P, S \rangle$, $V = N \cup T$

     1. dec → fun fvalbind
     2. fvalbind → sfvalbind
     3. fvalbind → fvalbind | sfvalbind
     4. sfvalbind → vid atpats = exp
     5. exp → case exp of match
     6. match → mrule
     7. match → match | mrule
     8. mrule → pat => exp
     9. atpats → atpat
    10. atpats → atpats atpat
    11. pat → vid atpat
    12. atpat → vid

In rules 3 and 7, | is a terminal symbol of the grammar.

Bracketed Grammars

$G_b = \langle N, T_b, P_b, S \rangle$, $V_b = N \cup T_b$

Rule i is wrapped between an opening bracket d_i and a closing bracket r_i:

     1. dec → d_1 fun fvalbind r_1
     2. fvalbind → d_2 sfvalbind r_2
     3. fvalbind → d_3 fvalbind | sfvalbind r_3
     4. sfvalbind → d_4 vid atpats = exp r_4
     5. exp → d_5 case exp of match r_5
     6. match → d_6 mrule r_6
     7. match → d_7 match | mrule r_7
     8. mrule → d_8 pat => exp r_8
     9. atpats → d_9 atpat r_9
    10. atpats → d_10 atpats atpat r_10
    11. pat → d_11 vid atpat r_11
    12. atpat → d_12 vid r_12

Positions

[Figure: derivation tree for fvalbind → fvalbind | sfvalbind, with a position marked in its bracketed form:]

    d_3 d_2 sfvalbind r_2 | d_4 vid atpats = exp r_4 r_3

Position Graph Γ: Left-to-right Walks in Trees

[Figure: successive positions of a left-to-right walk through the same derivation tree, then the resulting graph, with transitions labelled d_3, d_2, sfvalbind, r_2, d_4, vid, atpats, =, exp, r_4 and r_3.]

Position Automaton Γ/≡

Definition. $\Gamma/{\equiv}$ is the quotient of $\Gamma$ by an equivalence relation $\equiv$ between positions.

Theorem (language over-approximation). $\mathcal{L}(G_b) \subseteq \mathcal{L}(\Gamma/{\equiv}) \cap T_b^*$
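The over-approximation property of a quotient can be checked on a toy graph. The sketch below is not the position graph of the talk: the chain of states, the labels and the equivalence `cls` are made up for illustration. It merges states by class and shows that every word accepted before merging is still accepted afterwards, together with some new ones.

```python
from collections import defaultdict

def quotient(transitions, initial, finals, cls):
    """Merge the states of a labelled graph according to the class function `cls`."""
    merged = defaultdict(set)
    for (src, label), dsts in transitions.items():
        for dst in dsts:
            merged[(cls(src), label)].add(cls(dst))
    return dict(merged), {cls(q) for q in initial}, {cls(q) for q in finals}

def accepts(transitions, initial, finals, word):
    """Nondeterministic acceptance test, one symbol (character) at a time."""
    current = set(initial)
    for symbol in word:
        current = {d for q in current for d in transitions.get((q, symbol), ())}
    return bool(current & set(finals))

# Hypothetical chain graph accepting exactly the word "daar".
chain = {(0, "d"): {1}, (1, "a"): {2}, (2, "a"): {3}, (3, "r"): {4}}

# Hypothetical equivalence: states 1 and 2 fall into the same class.
def cls(q):
    return 1 if q in (1, 2) else q

q_trans, q_init, q_finals = quotient(chain, {0}, {4}, cls)
print(accepts(chain, {0}, {4}, "dar"), accepts(q_trans, q_init, q_finals, "dar"))
```

Merging states 1 and 2 creates a loop on `a`, so the quotient accepts `d a^n r` for every n ≥ 1: a superset of the original language, which is the direction stated by the theorem.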

Example: item_0 Equivalence

Positions are grouped by LR(0) item; for instance the equivalence class [sfvalbind →^4 vid atpats · = exp]. Γ/item_0 is the nondeterministic LR(0) automaton.

Fragment of Γ/item_0:

    [fvalbind →^2 · sfvalbind]              --d_4-->     [sfvalbind →^4 · vid atpats = exp]
    [fvalbind →^3 fvalbind | · sfvalbind]   --d_4-->     [sfvalbind →^4 · vid atpats = exp]
    [sfvalbind →^4 · vid atpats = exp]      --vid-->     [sfvalbind →^4 vid · atpats = exp]
    [sfvalbind →^4 vid · atpats = exp]      --atpats-->  [sfvalbind →^4 vid atpats · = exp]
    [sfvalbind →^4 vid atpats · = exp]      --=-->       [sfvalbind →^4 vid atpats = · exp]
    [sfvalbind →^4 vid atpats = · exp]      --exp-->     [sfvalbind →^4 vid atpats = exp ·]
    [sfvalbind →^4 vid atpats = exp ·]      --r_4-->     [fvalbind →^2 sfvalbind ·]
    [sfvalbind →^4 vid atpats = exp ·]      --r_4-->     [fvalbind →^3 fvalbind | sfvalbind ·]

Summary: Grammar Approximations

- general framework for approximations
- applications:
  - parser construction
  - ambiguity detection
  - XML validation [Segoufin and Vianu, 2002]?
  - symbolic supertagging [Boullier, 2003]?

Shift-Resolve Parsing: Principles

- noncanonical
- k = 1 reduced lookahead symbol
- resolve = reduce + pushback: emulates a bounded reduced lookahead without any preset bound
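The mechanics of "resolve = reduce + pushback" can be sketched on a stack and an input buffer. This is one plausible reading, not the construction of the talk: the function names are hypothetical, the decision to resolve is hard-coded rather than taken from a parsing table, and placing the reduced nonterminal at the front of the input is an assumption borrowed from noncanonical parsing, where reduced symbols serve as lookahead.

```python
def shift(stack, inp):
    """Move the next input symbol onto the stack."""
    stack.append(inp.pop(0))

def resolve(stack, inp, lhs, rhs_len, pushback):
    """Reduce + pushback (assumed semantics, see the lead-in).

    1. Return the topmost `pushback` stack symbols to the input.
    2. Pop the `rhs_len` symbols of the rule's right-hand side.
    3. Put the reduced nonterminal `lhs` in front of the input.
    """
    cut = len(stack) - pushback
    context = stack[cut:]          # right context that was shifted to look ahead
    del stack[cut:]
    cut = len(stack) - rhs_len
    handle = stack[cut:]           # the phrase being reduced
    del stack[cut:]
    inp[0:0] = [lhs] + context     # assumption: lhs becomes the next input symbol
    return handle

# Rule 5 of the talk's grammar: exp -> case exp of match (right-hand side of length 4).
stack = ["vid", "atpats", "=", "case", "exp", "of", "match"]
inp = ["|", "vid", "vid", "=", "vid"]

shift(stack, inp)                          # shift "|" to look past the match
handle = resolve(stack, inp, "exp", 4, 1)  # reduce by rule 5 with pushback length 1
print(handle, stack, inp)
```

After the resolve, the `|` that was shifted to look ahead is back in the input, preceded by the reduced `exp`, so the parser can read both again in the new context.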

Shift-Resolve Parse: Example

[Figure: step-by-step shift-resolve parse of the tokens below, building a tree with nodes fvalbind, sfvalbind, exp, match, mrule, pat and atpats.]

    ... | NONE => filterp(r, l) | filterp ([], l) = rev l

Generating the Parser

1. position automaton
2. determinization by subset construction

Subset Construction: Principle

- d_i transitions denote traditional item closures
- r_i transitions denote a phrase that should be reduced
- other transitions denote shifts

Items in the construction hold

1. a state of the position automaton
2. a parsing action
3. a pushback length
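As a reference point, here is the plain textbook subset construction, with d_i transitions treated as silent closure steps. It is a deliberately reduced sketch: it ignores r_i transitions and does not track the parsing action or the pushback length that the items of the talk carry. The state names and the three-state fragment are written by hand for illustration.

```python
from collections import deque

def closure(states, trans, is_silent):
    """Add every state reachable through silent transitions (the d_i here)."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for label, dst in trans.get(q, ()):
            if is_silent(label) and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return frozenset(seen)

def determinize(initial, trans, is_silent):
    """Textbook subset construction over the non-silent labels."""
    start = closure({initial}, trans, is_silent)
    dfa, queue = {}, deque([start])
    while queue:
        subset = queue.popleft()
        if subset in dfa:
            continue
        moves = {}
        for q in subset:
            for label, dst in trans.get(q, ()):
                if not is_silent(label):
                    moves.setdefault(label, set()).add(dst)
        dfa[subset] = {label: closure(dsts, trans, is_silent)
                       for label, dsts in moves.items()}
        queue.extend(dfa[subset].values())
    return start, dfa

# Hand-written fragment around rules 5, 6 and 7; brackets are ("d", i) tuples.
A, B, C = "exp: case exp of . match", "match: . mrule", "match: . match | mrule"
A2, B2, C2 = "exp: case exp of match .", "match: mrule .", "match: match . | mrule"
trans = {
    A: [(("d", 6), B), (("d", 7), C), ("match", A2)],
    C: [(("d", 6), B), (("d", 7), C), ("match", C2)],
    B: [("mrule", B2)],
}

start, dfa = determinize(A, trans, lambda label: isinstance(label, tuple))
print(sorted(dfa[start]["match"]))
```

On `match`, the start subset leads to the pair of items `exp: case exp of match .` and `match: match . | mrule`, which is the item set Bison reports for its conflicting state.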

Subset Construction: Example

Each item is written as: dotted rule, parsing action, pushback length.

First state:

    exp → case exp of match ·, 0, 0
    match → match · | mrule, 0, 0
  added through r_5:
    sfvalbind → vid atpats = exp ·, 5, 0
  added through r_4 and the following r_i transitions:
    fvalbind → fvalbind | sfvalbind ·, 5, 0
    fvalbind → sfvalbind ·, 5, 0
    fvalbind → fvalbind · | sfvalbind, 5, 0
    dec → fun fvalbind ·, 5, 0
    S → dec · $, 5, 0

Next state, after shifting |:

    fvalbind → fvalbind | · sfvalbind, 5, 1
    match → match | · mrule, 0, 0
  added through d_8 and the following d_i transitions:
    mrule → · pat => exp, 0, 0
    pat → · vid atpat, 0, 0
    sfvalbind → · vid atpats = exp, 0, 0

Construction Failure

Starting from the same state:

    exp → case exp of match ·, 0, 0
    match → match · | mrule, 0, 0
    sfvalbind → vid atpats = exp ·, 5, 0
    fvalbind → fvalbind | sfvalbind ·, 5, 0
    fvalbind → sfvalbind ·, 5, 0
    fvalbind → fvalbind · | sfvalbind, 5, 0
    dec → fun fvalbind ·, 5, 0
    S → dec · $, 5, 0
  r_5 also adds:
    mrule → pat => exp ·, 5, 0
  and from there:
    match → mrule ·, 5, 0
    match → match | mrule ·, 5, 0

Complexity

- $|\Gamma/{\equiv}|$: size of the position automaton; $|\Gamma/\mathrm{item}_0| = O(|G|)$
- $|A|$: size of the parser, $O(2^{|\Gamma/{\equiv}|\,|P|})$
- parsing time complexity for input $w$: $O(|w|)$

Limitations

- incomparable with classical parsing techniques + subset construction
- mendable

Summary: Parser Construction

Shift-Resolve parsers

1. Large class of grammars accepted
2. Unambiguity
3. Linear time parsing

Two-step construction

1. Simple
2. Flexible

Ambiguity Detection: Principles

- a bracketed sentence = a derivation tree
- ambiguity = more than one tree with the same yield
- construct an FSA $A$ such that $\mathcal{L}(G_b) \subseteq \mathcal{L}(A)$, and look for bracketed sentences with the same yield

[Figure: two bracketed sentences built from d_5 to d_8, d_13, d_14 and the matching r_i, both with the yield below. They differ in whether the last mrule closes the inner match (d_7 ... r_7 inside d_5 ... r_5) or the outer one.]

    vid => case vid of vid => vid | vid => vid
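The statement "ambiguity = more than one bracketed sentence with the same yield" can be tried by brute force on the match fragment of the grammar. The sketch below is only an illustration of the definition, not the detection method of the talk: it enumerates derivation trees up to a height bound instead of building a finite automaton. Rule numbers 13 (pat → vid) and 14 (exp → vid) follow the brackets d_13 and d_14 of the figure; the function names are made up.

```python
from itertools import product

# Rule number -> (left-hand side, right-hand side); "|" is a terminal.
RULES = {
    5: ("exp", ["case", "exp", "of", "match"]),
    6: ("match", ["mrule"]),
    7: ("match", ["match", "|", "mrule"]),
    8: ("mrule", ["pat", "=>", "exp"]),
    13: ("pat", ["vid"]),
    14: ("exp", ["vid"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}
BRACKETS = {f"{b}{i}" for i in RULES for b in "dr"}

def bracketed(symbol, depth):
    """Bracketed sentences derived from `symbol` by trees of height <= depth."""
    if symbol not in NONTERMINALS:
        return [(symbol,)]
    if depth == 0:
        return []
    results = []
    for i, (lhs, rhs) in RULES.items():
        if lhs != symbol:
            continue
        parts = [bracketed(x, depth - 1) for x in rhs]
        for combo in product(*parts):
            body = tuple(tok for part in combo for tok in part)
            results.append((f"d{i}",) + body + (f"r{i}",))
    return results

def h(sentence):
    """Erase the brackets d_i and r_i, keeping the yield."""
    return tuple(tok for tok in sentence if tok not in BRACKETS)

def ambiguous_yields(start, depth):
    """Yields that have more than one bracketed sentence within the bound."""
    by_yield = {}
    for sentence in bracketed(start, depth):
        by_yield.setdefault(h(sentence), set()).add(sentence)
    return {y: s for y, s in by_yield.items() if len(s) > 1}

found = ambiguous_yields("match", 7)
for y, sentences in found.items():
    print(" ".join(y))
    for sentence in sorted(sentences):
        print("   ", " ".join(sentence))
```

A bounded search like this can exhibit an ambiguity but can never prove its absence, which is why the talk turns to a finite-state over-approximation instead.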

RU(≡): Regular Unambiguity

$G$ is regular unambiguous for $\equiv$ of finite index if there do not exist $w_b \neq w'_b$ in $\mathcal{L}(\Gamma/{\equiv}) \cap T_b^*$ with $h(w_b) = h(w'_b)$, where $h$ erases the brackets.

- LR(0) compared with RU(item_0): regular approximations are too weak

Nonterminal Transitions

- $\mathrm{SF}(G_b) \subseteq \mathcal{L}(\Gamma/{\equiv})$
- look for two different bracketed sentential forms in $\mathcal{L}(\Gamma/{\equiv})$:

    d_6 d_8 pat => d_5 case exp of d_7 match | mrule r_7 r_5 r_8 r_6
    d_7 d_6 d_8 pat => d_5 case exp of match r_5 r_8 r_6 | mrule r_7

- a nonterminal transition represents exactly its derived context-free language

Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 d 14 vid r 14 => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 14 vid r 14 => d 5 case exp of match r 5 r 8 r 6 mrules r 7 epsilon: mae

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 d_14 vid r_14 => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 d_14 vid r_14 => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- shift: mas

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 d_14 vid r_14 => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 d_14 vid r_14 => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- nothing!

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 pat => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 pat => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- shift: mas

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 pat => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 pat => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- conflict: mac

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 pat => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 pat => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- shift: mas

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 pat => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 pat => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- reduce: mar

Mutual Accessibility Relations (Noncanonical Unambiguity)

  d_6 d_8 pat => d_5 case exp of d_7 match mrules r_7 r_5 r_8 r_6
  d_7 d_6 d_8 pat => d_5 case exp of match r_5 r_8 r_6 mrules r_7

- conflict: mac

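NU(≡) (Noncanonical Unambiguity)

- $\mathit{ma} = \mathit{mas} \cup \mathit{mae} \cup \mathit{mac} \cup \mathit{mar}$
- G is noncanonically unambiguous if there does not exist a relation $(q_s, q_s) \mathrel{\mathit{ma}^*} (q_f, q_f)$ that uses mac at some step
- computation in $O(|\Gamma/{\equiv}|^2)$ space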
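The final check can be read as a reachability question on a graph whose nodes are pairs of states. The sketch below assumes the individual steps (mas, mae, mar, mac) have already been computed and are handed over as labelled edges; how each relation is derived from Γ/≡ is not shown here, and the function name `uses_conflict` and the edge encoding are hypothetical.

```python
from collections import deque


def uses_conflict(steps, start, final) -> bool:
    """True iff some walk from `start` to `final` takes at least one "mac" step.

    steps maps a pair of states to a list of (kind, next_pair), where kind is
    one of "mas", "mae", "mar", "mac" (assumed to be precomputed).
    """
    seen = {(start, False)}
    queue = deque(seen)
    while queue:
        pair, conflicted = queue.popleft()
        if pair == final and conflicted:
            return True
        for kind, nxt in steps.get(pair, ()):
            node = (nxt, conflicted or kind == "mac")
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return False


# Toy relation: one branch reaches the final pair through a conflict step.
STEPS = {
    ("s", "s"): [("mas", ("1", "1")), ("mae", ("1", "2"))],
    ("1", "1"): [("mar", ("f", "f"))],
    ("1", "2"): [("mac", ("3", "3"))],
    ("3", "3"): [("mar", ("f", "f"))],
}

if __name__ == "__main__":
    # A grammar would be reported as potentially ambiguous when this is True.
    print(uses_conflict(STEPS, ("s", "s"), ("f", "f")))
```

The visited set holds at most two entries per pair of states, which matches the quadratic space bound stated on the slide.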

Comparisons (Noncanonical Unambiguity)

- Regular Unambiguity RU(≡)
- Bounded-length detection schemes
- LR(k) and LR-Regular (LR(Π))
- Horizontal and vertical ambiguity (HVRU(≡))

Bounded-length detection (Noncanonical Unambiguity)
[Gorn, 1963, Cheung and Uzgalis, 1995, Schröer, 2001, Jampana, 2005]

- generate sentences
- not conservative
- $\mathit{prefix}_m$ prevents false positives in sentences of length < m
- need to generate $a^{2^n+1}$ to find $G_4^n$ ambiguous, but $G_4^n \notin \mathrm{NU}(\mathit{item}_0)$

  $S \to A \mid B_n a,\quad A \to A a a \mid a,\quad B_1 \to a a,\quad B_2 \to B_1 B_1,\ \ldots,\ B_n \to B_{n-1} B_{n-1}$   ($G_4^n$)
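The claim about $G_4^n$ can be checked by counting derivation trees per sentence length. Since the grammar has a single terminal a, a sentence is determined by its length, so a small dynamic program suffices. This is a sketch specific to $G_4^n$ as read from the slide (S → A | B_n a, A → A a a | a, B_1 → a a, B_i → B_{i-1} B_{i-1}); it is not a general bounded-length detector, and the function names are illustrative.

```python
def count_trees(n: int, max_len: int) -> list[int]:
    """counts[k] = number of derivation trees of a^k from S in G_4^n."""
    # A -> A a a | a : one tree for every odd length, none otherwise.
    a_trees = [0] * (max_len + 1)
    for k in range(1, max_len + 1):
        a_trees[k] = (1 if k == 1 else 0) + (a_trees[k - 2] if k >= 3 else 0)

    # B_1 -> a a, B_i -> B_{i-1} B_{i-1} : map from length to number of trees.
    b_trees = {2: 1}
    for _ in range(2, n + 1):
        nxt = {}
        for len1, c1 in b_trees.items():
            for len2, c2 in b_trees.items():
                nxt[len1 + len2] = nxt.get(len1 + len2, 0) + c1 * c2
        b_trees = nxt

    # S -> A | B_n a
    counts = [0] * (max_len + 1)
    for k in range(1, max_len + 1):
        counts[k] = a_trees[k] + b_trees.get(k - 1, 0)
    return counts


def shortest_ambiguous_length(n: int, max_len: int):
    """Smallest k <= max_len such that a^k has more than one tree, else None."""
    counts = count_trees(n, max_len)
    for k in range(1, max_len + 1):
        if counts[k] > 1:
            return k
    return None


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, shortest_ambiguous_length(n, 40))
```

A generator bounded below length $2^n + 1$ finds nothing, which illustrates why sentence generation with a fixed bound is not conservative.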

LR(k) and LR-Regular (Noncanonical Unambiguity)
[Knuth, 1965, Hunt III et al., 1975, Čulik and Cohen, 1973, Heilbrunner, 1983]

- conservative tests
- define $\mathit{item}_\Pi$ s.t. $\mathrm{LR}(\Pi) \subseteq \mathrm{NU}(\mathit{item}_\Pi)$
- need an LR($2^n$) test to prove $G_3^n$ unambiguous, but $G_3^n \in \mathrm{NU}(\mathit{item}_0)$

  $S \to A \mid B_n,\quad A \to A a a \mid a,\quad B_1 \to a a,\quad B_2 \to B_1 B_1,\ \ldots,\ B_n \to B_{n-1} B_{n-1}$   ($G_3^n$)

Implementation (Experimental Results)

For the whole SML grammar:
- conflicts in the LALR(1) parser: sml.y: conflicts: 223 shift/reduce, 35 reduce/reduce
- our tool: 89 potential ambiguities detected with LR(1) precision

For the SML grammar fragment:
- 2 potential ambiguities detected with LR(0) precision:
  (match -> mrule., match -> match. mrule)
  (match -> match. mrule, match -> match mrule.)

NU(item_1) correctly identifies
- 87% of our unambiguous grammars
- 73% of the non-LALR(1) ones

Summary (Experimental Results)

- conservative ambiguity detection
- provably better than several other techniques
- also experimentally better

Conclusion (Closing Comments)

- Main issues in parser development: nondeterminism, and ambiguity in particular
- Deterministic parsers for larger classes of grammars
- Ambiguity detection algorithm

Directions for Future Work (Future Work)

- Linear time parsing for NU(≡) grammars?
- Improved implementation
- Noncanonical languages
- Regular approximations

Thanks!

Our Issue: Shift/Reduce Conflict (References)

GNU Bison:

  state 20

      6 exp: "case" exp "of" match .
      8 match: match . mrule

      shift, and go to state 24
      [reduce using rule 6 (exp)]

Our Issue: Shift/Reduce Conflict (References)

Which action to choose?

[Figure: partial parse tree with nodes fvalbind, match, match, mrule, mrule, pat, exp, pat, exp, ... over the input: of SOME y => filterp(r, y :: l) NONE => filterp(r, l)]

Our Issue: Shift/Reduce Conflict (References)

Which action to choose? Reduce?

[Figure: partial parse tree with nodes fvalbind, sfvalbind, exp, match, mrule, sfvalbind, ..., pat, exp, vid, marked "error!", over the input: of SOME y => filterp(r, y :: l) NONE => filterp(r, l)]

Our Issue: Shift/Reduce Conflict (References)

Which action to choose? Shift?

[Figure: partial parse tree with nodes fvalbind, match, match, mrule, mrule, pat, exp, pat, exp, ... over the input: of SOME y => filterp(r, y :: l) NONE => filterp(r, l)]

Our Issue: Shift/Reduce Conflict (References)

Which action to choose?

[Figure: partial parse tree with nodes fvalbind, fvalbind, sfvalbind, exp, match, mrule, sfvalbind, pat, exp, atpats, exp, ... over the input: NONE => filterp(r, l) filterp([], l) = rev l]

Our Issue: Shift/Reduce Conflict (References)

Which action to choose? Reduce?

[Figure: partial parse tree with nodes fvalbind, fvalbind, sfvalbind, exp, match, mrule, sfvalbind, pat, exp, atpats, exp, ... over the input: NONE => filterp(r, l) filterp([], l) = rev l]

Our Issue: Shift/Reduce Conflict (References)

Which action to choose? Shift?

[Figure: partial parse tree with nodes fvalbind, fvalbind, sfvalbind, exp, match, match, mrule, ..., pat, exp, pat, exp, marked "error!", over the input: NONE => filterp(r, l) filterp([], l) = rev l]

Unbounded Lookahead (References)

[Figure: parse trees with nodes sfvb, mrule, vid, atpats, pat, atpat over inputs of the form "vid atpats = ..." and "pat => ..."]

Limitations: Ambiguity Report (References)

- grambiguity [Brabrand et al., 2007]:

    *** horizontal ambiguity at E[plus]: Exp <--> + Exp
        ambiguous string: "x+x+x"

- ANTLRWorks [Parr, 2007]

Other Limitations (References)

- memory requirements: a solution could be an NLALR test
- dynamic disambiguation: inverse problem, some means of deciding equivalence needed

References

H. J. S. Basten. Ambiguity detection methods for context-free grammars. Master's thesis, Centrum voor Wiskunde en Informatica, Universiteit van Amsterdam, Aug. 2007.

P. Boullier. Supertagging: A non-statistical parsing-based approach. In IWPT'03, pages 55-65, 2003. URL ftp://ftp.inria.fr/inria/projects/atoll/ Pierre.Boullier/supertaggeur final.pdf.

C. Brabrand, R. Giegerich, and A. Møller. Analyzing ambiguity of context-free grammars. In J. Holub and J. Žďárek, editors, CIAA'07, 2007. URL http://www.brics.dk/ brabrand/grambiguity/. To appear in Lecture Notes in Computer Science.

D. G. Cantor. On the ambiguity problem of Backus