Approximating Context-Free Grammars for Parsing and Verification Sylvain Schmitz LORIA, INRIA Nancy - Grand Est October 18, 2007
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end /** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF %token VID %start dec %% dec: "of" "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp exp: VID "case" exp "of" match match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Standard ML Milner et al. [1997] Language Specification A Syntax Issue Program text Executable code Compiler
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Standard ML Milner et al. [1997] A Syntax Issue Program text datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp ([], l) = rev l in filterp (l, []) end Executable code
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Standard ML Milner et al. [1997] A Syntax Issue Program text datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp ([], l) = rev l in filterp (l, []) end Executable code MLton Moscow ML Poly/ML SML/NJ Error: match.sml 9.25. Syntax error: replacing EQUALOP with DARROW
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Standard ML Milner et al. [1997] A Syntax Issue Program text datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp ([], l) = rev l in filterp (l, []) end Executable code MLton Moscow ML Poly/ML SML/NJ! Toplevel input:! filterp ([], l) = rev l! ˆ! Syntax error.
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Standard ML Milner et al. [1997] A Syntax Issue Program text datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp ([], l) = rev l in filterp (l, []) end Executable code MLton Moscow ML Poly/ML SML/NJ Error: => expected but = was found
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Standard ML Milner et al. [1997] A Syntax Issue Program text datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp ([], l) = rev l in filterp (l, []) end Executable code MLton Moscow ML Poly/ML SML/NJ stdin:7.24-7.29 Error: syntax error: deleting EQUALOP ID
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end /** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF %token VID %start dec %% dec: "of" "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp exp: VID "case" exp "of" match match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Parsers A Syntax Issue Language Specification Program text Executable code Compiler
datatype a option = NONE SOME of a fun filter pred l = let fun filterp (x::r, l) = case (pred x) filterp ([], l) = rev l in of SOME y => filterp(r, y::l) NONE => filterp(r, l) filterp (l, []) end /** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF %token VID %start dec %% dec: "of" "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp exp: VID "case" exp "of" match match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Parsers A Syntax Issue Language Specification Program text Executable code Parser Compiler
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF "of" %token VID %start dec %% dec: "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp "case" exp "of" match exp: VID match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Parsers A Syntax Issue Context-free grammar /** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF "of" %token VID %start dec %% dec: "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp exp: VID "case" exp "of" match match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat Parser %%
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF "of" %token VID %start dec %% dec: "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp "case" exp "of" match exp: VID match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Parsers A Syntax Issue Context-free grammar... NONE => filterp(r, l) filterp ([], l ) = rev l Input tokens Parser generator Parse tree fvalbind sfvalbind exp match mrule fvalbind sfvalbind Parser... pat exp atpats exp NONE => filterp(r, l) filterp ([], l ) = rev l
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF "of" %token VID %start dec %% dec: "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp "case" exp "of" match exp: VID match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Parsers A Syntax Issue Context-free grammar... NONE => filterp(r, l) filterp ([], l ) = rev l Input tokens Parser generator Parse tree fvalbind sfvalbind exp match mrule fvalbind sfvalbind Parser... pat exp atpats exp NONE => filterp(r, l) filterp ([], l ) = rev l
LALR(1) Parser Generator Grammar Conflicts GNU Bison state 20 6 exp: "case" exp "of" match. 8 match: match. mrule shift, and go to state 24 [reduce using rule 6 (exp)] Restricted grammar class
LALR(1) Parser Generator Grammar Conflicts CFG GNU Bison state 20 6 exp: "case" exp "of" match. 8 match: match. mrule shift, and go to state 24 [reduce using rule 6 (exp)] LALR(1) Restricted grammar class
Dealing with Conflicts An Objective Measure [Malloy et al., 2002] on a C# Grammar Grammar Conflicts 700 2002_malloy.data using 1:($2+$3) 600 500 LALR(1) conflicts 400 300 200 100 0 2 4 6 8 10 12 14 16 18 20 Parser versions
Dealing with Conflicts A Subjective Measure Grammar Conflicts Courtesy of http://www.phdcomics.com.
Dealing with Conflicts A Subjective Measure Grammar Conflicts Courtesy of http://www.phdcomics.com.
Dealing with Conflicts A Subjective Measure Grammar Conflicts Courtesy of http://www.phdcomics.com.
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007] LR(k) LALR(1)
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007] LR-Regular LR(k) LALR(1)
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007]
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF "of" %token VID %start dec %% dec: "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp "case" exp "of" match exp: VID match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Ambiguity Solutions Context-free grammar exp match mrule exp case a of b=> case b of c=> c d=> d Input tokens Parser generator Parse forest match match exp pat exp mrule mrule case a of b=> case b of c=> c d=> d exp match match mrule exp match Parser exp pat exp mrule mrule case a of b=> case b of c=> c d=> d
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0 262 63181 4. */ %token CASE "case" %token FUN "fun" %token MATCH "=>" %token OF "of" %token VID %start dec %% dec: "fun" fvalbind fvalbind: sfvalbind fvalbind sfvalbind sfvalbind: VID atpats = exp "case" exp "of" match exp: VID match: mrule match mrule mrule: pat "=>" exp atpats: atpat atpats atpat atpat: VID pat: VID atpat %% Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion Ambiguity Solutions Context-free grammar exp match mrule exp case a of b=> case b of c=> c d=> d Input tokens Parser generator Parse forest match match exp pat exp mrule mrule case a of b=> case b of c=> c d=> d exp match match mrule exp match Parser exp pat exp mrule mrule case a of b=> case b of c=> c d=> d
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007] UCFG
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007] Unsafe Safe UCFG LR-Regular LR(k) LALR(1)
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007] Unsafe HVRU Safe UCFG
State of the Art LR(k) [Knuth, 1965] Solutions CFG LR-Regular [Čulik and Cohen, 1973] Generalized LR [Tomita, 1986] Unambiguous CFGs [Cantor, 1962, Chomsky and Schützenberger, 1963] Horizontal and vertical unambiguity test [Brabrand et al., 2007] Unsafe HVRU Safe UCFG LR-Regular LR(k) LALR(1)
Contributions Solutions Noncanonical parsing methods [Szymanski and Williams, 1976, Tai, 1979] Noncanonical LALR(1) Shift-Resolve Noncanonical unambiguity test Framework for grammar approximations
Contributions Noncanonical parsing methods [Szymanski and Williams, 1976, Tai, 1979] Noncanonical LALR(1) Shift-Resolve Noncanonical unambiguity test Solutions UCFG LR-Regular LR(k) NLALR(1) LALR(1) CFG Framework for grammar approximations
Contributions Noncanonical parsing methods [Szymanski and Williams, 1976, Tai, 1979] Noncanonical LALR(1) Shift-Resolve Solutions UCFG LR-Regular LR(k) ShRe CFG Noncanonical unambiguity test Framework for grammar approximations LALR(1)
Contributions Noncanonical parsing methods [Szymanski and Williams, 1976, Tai, 1979] Noncanonical LALR(1) Shift-Resolve Solutions HVRU UCFG NU LR-Regular LR(k) CFG Noncanonical unambiguity test Framework for grammar approximations LALR(1)
Contributions Noncanonical parsing methods [Szymanski and Williams, 1976, Tai, 1979] Noncanonical LALR(1) Shift-Resolve Solutions HVRU UCFG NU LR-Regular LR(k) CFG Noncanonical unambiguity test Framework for grammar approximations LALR(1)
Bracketed Grammars Grammar Approximations G = N, T, P, S, V = N T dec fvalbind fvalbind sfvalbind exp match match mrule atpats atpats pat atpat 1 2 3 4 5 6 7 8 9 10 11 12 fun fvalbind sfvalbind fvalbind sfvalbind vid atpats = exp case exp of match mrule match mrule pat => exp atpat atpats atpat vid atpat vid
Bracketed Grammars Grammar Approximations G b = N, T b, P b, S, V b = N T b dec fvalbind fvalbind sfvalbind exp match match mrule atpats atpats pat atpat 1 d 1 fun fvalbind r 1 2 d 2 sfvalbind r 2 3 d 3 fvalbind sfvalbind r 3 4 d 4 vid atpats = exp r 4 5 d 5 case exp of match r 5 6 d 6 mrule r 6 7 d 7 match mrule r 7 8 d 8 pat => exp r 8 9 d 9 atpat r 9 10 d 10 atpats atpat r 10 11 d 11 vid atpat r 11 12 d 12 vid r 12
Positions Grammar Approximations fvalbind fvalbind sfvalbind sfvalbind vid atpats = exp d 3 d 2 sfvalbind r 2 d4 vid atpats = exp r 4 r 3
Position Graph Γ Left-to-right Walks in Trees Grammar Approximations fvalbind fvalbind sfvalbind d 4 vid sfvalbind atpats = exp d 3 d 2 sfvalbind r 2 d 4 vid atpats = exp r 4 r 3
Position Graph Γ Left-to-right Walks in Trees Grammar Approximations fvalbind fvalbind sfvalbind sfvalbind sfvalbind vid atpats = exp d 3 d 2 sfvalbind r 2 d 4 vid atpats = exp r 4 r3
Position Graph Γ Left-to-right Walks in Trees Grammar Approximations fvalbind r 3 fvalbind sfvalbind sfvalbind vid atpats = exp d 3 d 2 sfvalbind r 2 d 4 vid atpats = exp r 4 r 3
Position Graph Γ Left-to-right Walks in Trees Grammar Approximations... fvalbind d 3 r 3 fvalbind sfvalbind r 4 d 2 sfvalbind r 2 d 4 vid atpats = exp.........
Position Automaton Γ/ Grammar Approximations Definition Γ/ is the quotient of Γ by an equivalence relation between positions. Theorem (Language over-approximation) L(G b ) L(Γ/ ) T b
Example: item 0 Equivalence Grammar Approximations fvalbind d 3 r 3 fvalbind sfvalbind r 4 d 2 sfvalbind r 2 d 4 vid atpats = exp r 4 d 4 vid atpats = exp equivalence class [ sfvalbind vid atpats 4 = exp ] LR(0) items Γ/item 0 : nondeterministic LR(0) automaton
Example: item 0 Equivalence [ fvalbind sfvalbind ] 2 [ fvalbind fvalbind 3 sfvalbind ] Grammar Approximations d 4 d 4 [ sfvalbind vid 4 atpats = exp ] vid [ sfvalbind vid atpats 4 = exp ] atpats [ sfvalbind 4 vid atpats = exp ] = [ sfvalbind 4 vid atpats = exp ] exp [ sfvalbind 4 vid atpats = exp ] r 4 [ fvalbind 2 sfvalbind ] r 4 [ fvalbind 3 fvalbind sfvalbind ]
Summary Grammar Approximations general framework for approximations applications: parser construction ambiguity detection XML validation [Segoufin and Vianu, 2002]? symbolic supertagging [Boullier, 2003]?
Summary Grammar Approximations general framework for approximations applications: parser construction ambiguity detection XML validation [Segoufin and Vianu, 2002]? symbolic supertagging [Boullier, 2003]?
Shift-Resolve Parsing Parsing Principles noncanonical k = 1 reduced lookahead symbol resolve = reduce + pushback: emulates a bounded reduced lookahead without any preset bound
Shift-Resolve Parsing Parsing Principles noncanonical k = 1 reduced lookahead symbol resolve = reduce + pushback: emulates a bounded reduced lookahead without any preset bound
Shift-Resolve Parse Parsing Example fvalbind fvalbind sfvalbind exp sfvalbind atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
Shift-Resolve Parse Parsing Example fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
Shift-Resolve Parse Parsing Example fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
Shift-Resolve Parse Parsing Example fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
Shift-Resolve Parse Parsing Example fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
Shift-Resolve Parse Parsing Example fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
Generating the Parser Parser Construction 1. position automaton 2. determinization by subset construction
Subset Construction Principle Parser Construction d i transitions denote traditional item closures r i transitions denote a phrase that should be reduced other transitions denote shifts items in the construction hold 1. a state of the position automaton 2. a parsing action 3. a pushback length
Subset Construction Principle Parser Construction d i transitions denote traditional item closures r i transitions denote a phrase that should be reduced other transitions denote shifts items in the construction hold 1. a state of the position automaton 2. a parsing action 3. a pushback length
Subset Construction Example Parser Construction exp case exp of match, 0, 0 match match mrule, 0, 0
Subset Construction Example Parser Construction r 5 exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0
Subset Construction Example Parser Construction r 4 exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0
Subset Construction Example Parser Construction exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0
Subset Construction Example Parser Construction exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0
Subset Construction Example Parser Construction exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0 fvalbind fvalbind sfvalbind, 5, 1 match match mrule, 0, 0
Subset Construction Example exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0 Parser Construction d 8 fvalbind fvalbind sfvalbind, 5, 1 match match mrule, 0, 0 mrule pat => exp, 0, 0
Subset Construction Example exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0 Parser Construction fvalbind fvalbind sfvalbind, 5, 1 match match mrule, 0, 0 mrule pat => exp, 0, 0 pat vid atpat, 0, 0 sfvalbind vid atpats = exp, 0, 0
Construction Failure Parser Construction exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0
Construction Failure Parser Construction r 5 exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0 mrule pat exp, 5, 0
Construction Failure Parser Construction exp case exp of match, 0, 0 match match mrule, 0, 0 sfvalbind vid atpats = exp, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 fvalbind sfvalbind, 5, 0 fvalbind fvalbind sfvalbind, 5, 0 dec fun fvalbind, 5, 0 S dec $, 5, 0 mrule pat exp, 5, 0 match mrule, 5, 0 match match mrule, 5, 0
Complexity Parser Construction Γ/ : size of the position automaton A : size of the parser: O(2 Γ/ P ) parsing time complexity for input w: O( w )
Complexity Parser Construction Γ/ : size of the position automaton Γ/item 0 = O( G ) A : size of the parser: O(2 Γ/ P ) parsing time complexity for input w: O( w )
Limitations Parser Construction incomparable with classical parsing techniques + subset construction mendable
Limitations Parser Construction incomparable with classical parsing techniques + subset construction mendable
Summary Parser Construction Shift Resolve parsers 1. Large class of grammars accepted 2. Unambiguity 3. Linear time parsing 2-steps construction 1. Simple 2. Flexible
Principles a bracketed sentence = a derivation tree ambiguity = more than one tree with the same yield d 6 d 8 d 13 vid r 13 => d 5 case d 14 vid r 14 of d 7 d 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 13 vid r 13 => d 5 case d 14 vid r 14 of d 7 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 r 5 r 8 r 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 construct a FSA A such that L(G b ) L(A), and look for bracketed sentences with the same yield
Principles a bracketed sentence = a derivation tree ambiguity = more than one tree with the same yield d 6 d 8 d 13 vid r 13 => d 5 case d 14 vid r 14 of d 7 d 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 13 vid r 13 => d 5 case d 14 vid r 14 of d 7 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 r 5 r 8 r 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 construct a FSA A such that L(G b ) L(A), and look for bracketed sentences with the same yield
Principles a bracketed sentence = a derivation tree ambiguity = more than one tree with the same yield d 6 d 8 d 13 vid r 13 => d 5 case d 14 vid r 14 of d 7 d 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 13 vid r 13 => d 5 case d 14 vid r 14 of d 7 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 r 5 r 8 r 6 d 8 d 13 vid r 13 => d 14 vid r 14 r 8 r 7 construct a FSA A such that L(G b ) L(A), and look for bracketed sentences with the same yield
RU( ) Regular Unambiguity G is regular unambiguous for of finite index, if there does not exist w b w b in L(Γ/ ) T b with h(w b ) = h(w b ) LR(0) RU(item 0 ) regular approximations are too weak
RU( ) Regular Unambiguity G is regular unambiguous for of finite index, if there does not exist w b w b in L(Γ/ ) T b with h(w b ) = h(w b ) LR(0) RU(item 0 ) regular approximations are too weak
Nonterminal Transitions Noncanonical Unambiguity SF(G b ) L(Γ/ ) look for two different bracketed sentential forms in L(Γ/ ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 a nonterminal transition represents exactly its derived context-free language
Nonterminal Transitions Noncanonical Unambiguity SF(G b ) L(Γ/ ) look for two different bracketed sentential forms in L(Γ/ ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 a nonterminal transition represents exactly its derived context-free language
Nonterminal Transitions Noncanonical Unambiguity SF(G b ) L(Γ/ ) look for two different bracketed sentential forms in L(Γ/ ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 a nonterminal transition represents exactly its derived context-free language
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 d 14 vid r 14 => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 14 vid r 14 => d 5 case exp of match r 5 r 8 r 6 mrules r 7 epsilon: mae
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 d 14 vid r 14 => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 14 vid r 14 => d 5 case exp of match r 5 r 8 r 6 mrules r 7 epsilon: mae
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 d 14 vid r 14 => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 14 vid r 14 => d 5 case exp of match r 5 r 8 r 6 mrules r 7 epsilon: mae
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 d 14 vid r 14 => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 14 vid r 14 => d 5 case exp of match r 5 r 8 r 6 mrules r 7 shift: mas
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 d 14 vid r 14 => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 d 14 vid r 14 => d 5 case exp of match r 5 r 8 r 6 mrules r 7 nothing!
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 shift: mas
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 conflict: mac
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 conflict: mac
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 conflict: mac
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 shift: mas
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 reduce: mar
Mutual Accessibility Relations between pairs of states of Γ/, (q 1, q 2 ) Noncanonical Unambiguity synchronized left-to-right walks from an initial pair (q s, q s ) d 6 d 8 pat => d 5 case exp of d 7 match mrules r 7 r 5 r 8 r 6 d 7 d 6 d 8 pat => d 5 case exp of match r 5 r 8 r 6 mrules r 7 conflict: mac
NU( ) Noncanonical Unambiguity ma=mas mae mac mar G is noncanonically unambiguous if there does not exist a relation (q s, q s ) ma (q f, q f ) that uses mac at some step Computation in O( Γ/ 2 ) in space
Comparisons Noncanonical Unambiguity Regular Unambiguity RU( ) Bounded-length detection schemes LR(k) and LR-Regular (LR(Π)) Horizontal and vertical ambiguity (HVRU( ))
Bounded-length detection [Gorn, 1963, Cheung and Uzgalis, 1995, Schröer, 2001, Jampana, 2005] Noncanonical Unambiguity generate sentences not conservative prefix m prevents from false positives in sentences of length < m need to generate a 2n +1 to find G n 4 ambiguous, but G n 4 NU(item 0) S A B n a, A Aaa a, B 1 aa, B 2 B 1 B 1,..., B n B n 1 B n 1 (G n 4 )
LR(k) and LR-Regular Noncanonical Unambiguity [Knuth, 1965, Hunt III et al., 1975, Čulik and Cohen, 1973, Heilbrunner, 1983] conservative tests define item Π s.t. LR(Π) NU(item Π ) need a LR(2 n ) test to prove G n 3 unambiguous, but G n 3 NU(item 0) S A B n, A Aaa a, B 1 aa, B 2 B 1 B 1,..., B n B n 1 B n 1 (G n 3 )
Implementation For the whole SML grammar: Experimental Results conflicts in the LALR(1) parser sml.y: conflicts: 223 shift/reduce, 35 reduce/reduce Our tool: 89 potential ambiguities with LR(1) precision detected For the SML grammar fragment: 2 potential ambiguities with LR(0) precision detected: (match -> mrule., match -> match. mrule ) (match -> match. mrule, match -> match mrule. ) NU(item 1 ) correctly identifies 87% of our unambiguous grammars 73% of the non-lalr(1) ones
Summary Experimental Results conservative ambiguity detection provably better than several other techniques also experimentally better
Conclusion Closing Comments Main issues in parser development: nondeterminism ambiguity in particular Deterministic parsers for larger classes of grammars Ambiguity detection algorithm
Directions for Future Work Future Work Linear time parsing for NU( ) grammars? Improved implementation Noncanonical languages Regular approximations
Future Work Thanks!
References Our Issue Shift/Reduce Conflict GNU Bison state 20 6 exp: "case" exp "of" match. 8 match: match. mrule shift, and go to state 24 [reduce using rule 6 (exp)]
References Our Issue Shift/Reduce Conflict Which action to choose? fvalbind match match mrule mrule pat exp pat exp... of SOME y => filterp(r, y :: l ) NONE => filterp(r, l)
References Our Issue Shift/Reduce Conflict Which action to choose? Reduce? fvalbind sfvalbind exp match mrule sfvalbind... pat exp error! vid of SOME y => filterp(r, y :: l ) NONE => filterp(r, l)
References Our Issue Shift/Reduce Conflict Which action to choose? Shift? fvalbind match match mrule mrule pat exp pat exp... of SOME y => filterp(r, y :: l ) NONE => filterp(r, l)
References Our Issue Shift/Reduce Conflict Which action to choose? fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
References Our Issue Shift/Reduce Conflict Which action to choose? Reduce? fvalbind fvalbind sfvalbind exp match mrule sfvalbind pat exp atpats exp... NONE=> filterp(r, l) filterp ([], l ) = rev l
References Our Issue Shift/Reduce Conflict Which action to choose? Shift? fvalbind fvalbind sfvalbind exp match match mrule... pat exp pat exp error! NONE=> filterp(r, l) filterp ([], l ) = rev l fvalbind
References Unbounded Lookahead sfvb mrule vid atpats =... pat =>... atpat vid atpat......
References Limitations Ambiguity Report grambiguity [Brabrand et al., 2007] *** horizontal ambiguity at E[plus]: Exp <--> + Exp ambiguous string: "x+x+x" ANTLRWorks [Parr, 2007]
References Other Limitations memory requirements: a solution could be a NLALR test dynamic disambiguation: inverse problem, some means to deciding equivalence needed
References H. J. S. Basten. Ambiguity detection methods for context-free grammars. Master s thesis, Centrum voor Wiskunde en Informatica, Universiteit van Amsterdam, Aug. 2007. P. Boullier. Supertagging: A non-statistical parsing-based approach. In IWPT 03, pages 55 65, 2003. URL ftp://ftp.inria.fr/inria/projects/atoll/ Pierre.Boullier/supertaggeur final.pdf. C. Brabrand, R. Giegerich, and A. Møller. Analyzing ambiguity of context-free grammars. In J. Holub and J. Žd árek, editors, CIAA 07, 2007. URL http://www.brics.dk/ brabrand/grambiguity/. To appear in Lecture Notes in Computer Science. D. G. Cantor. On the ambiguity problem of Backus