Bottom-Up Syntax Analysis LR - metódy

Similar documents

CS143 Handout 08 Summer 2008 July 02, 2007 Bottom-Up Parsing

Syntaktická analýza. Ján Šturc. Zima 208

If-Then-Else Problem (a motivating example for LR grammars)

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Bottom-Up Parsing. An Introductory Example

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Approximating Context-Free Grammars for Parsing and Verification

FIRST and FOLLOW sets a necessary preliminary to constructing the LL(1) parsing table

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Yacc: Yet Another Compiler-Compiler

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

Compiler Construction

Scanner. tokens scanner parser IR. source code. errors

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Context free grammars and predictive parsing

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

Compiler I: Syntax Analysis Human Thought

Automata and Formal Languages. Push Down Automata. Sipser pages Lecture 13. Tim Sheard 1

Formal Grammars and Languages

Pushdown Automata. International PhD School in Formal Languages and Applications Rovira i Virgili University Tarragona, Spain

Intel Assembler. Project administration. Non-standard project. Project administration: Repository

Automata and Computability. Solutions to Exercises

Automata on Infinite Words and Trees

Compiler Design July 2004

SOLUTION Trial Test Grammar & Parsing Deficiency Course for the Master in Software Technology Programme Utrecht University

Basic Parsing Algorithms Chart Parsing

4.1 Modules, Homomorphisms, and Exact Sequences

Regular Expressions and Automata using Haskell

Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Fast nondeterministic recognition of context-free languages using two queues

C H A P T E R Regular Expressions regular expression

Left-Handed Completeness

Web Data Extraction: 1 o Semestre 2007/2008

Design Patterns in Parsing

NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

The Ambiguity Review Process. Richard Bender Bender RBT Inc. 17 Cardinale Lane Queensbury, NY

Turing Machines: An Introduction

Matrix Representations of Linear Transformations and Changes of Coordinates

DATA STRUCTURES USING C

Data Structures and Algorithms V Otávio Braga

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Adaptive LL(*) Parsing: The Power of Dynamic Analysis

C Compiler Targeting the Java Virtual Machine

Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology

How to Improve Database Connectivity With the Data Tools Platform. John Graham (Sybase Data Tooling) Brian Payton (IBM Information Management)

Chapter 4: Statistical Hypothesis Testing

Sequentially Indexed Grammars

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Parsing by Example. Diplomarbeit der Philosophisch-naturwissenschaftlichen Fakultät der Universität Bern. vorgelegt von.

Introduction to the 1st Obligatory Exercise

Logic in general. Inference rules and theorem proving

Introduction to Automata Theory. Reading: Chapter 1

CUP User's Manual. Scott E. Hudson Graphics Visualization and Usability Center Georgia Institute of Technology. Table of Contents.

PRODUCTION PLANNING AND SCHEDULING Part 1

Omega Automata: Minimization and Learning 1

Eventia Log Parsing Editor 1.0 Administration Guide

CS / IT B.E. VII Semester Examination, December 2013 Cloud Computing Time : Three Hours Maximum Marks : 70

2) Write in detail the issues in the design of code generator.

Unified Language for Network Security Policy Implementation

Informatique Fondamentale IMA S8

Automata and Formal Languages

Lecture 9. Semantic Analysis Scoping and Symbol Table

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Lecture Notes 2: Matrices as Systems of Linear Equations

Increasing Interaction and Support in the Formal Languages and Automata Theory Course

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

3515ICT Theory of Computation Turing Machines

High Impact & Alpha Five: A Mail Merge Guide.

Joint Tactical Radio System Standard. JTRS Platform Adapter Interface Standard Version 1.3.3

A Programming Language Where the Syntax and Semantics Are Mutable at Runtime

Grammars and Parsing. 2. A finite nonterminal alphabet N. Symbols in this alphabet are variables of the grammar.

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

LINEAR INEQUALITIES. Mathematics is the art of saying many things in many different ways. MAXWELL

Analysis of a Search Algorithm

EE 435 Lecture 13. Cascaded Amplifiers. -- Two-Stage Op Amp Design

On Winning Conditions of High Borel Complexity in Pushdown Games

Regular Languages and Finite State Machines

Section Inner Products and Norms

Fast Contextual Preference Scoring of Database Tuples

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

6.080/6.089 GITCS Feb 12, Lecture 3

PSTricks. pst-ode. A PSTricks package for solving initial value problems for sets of Ordinary Differential Equations (ODE), v0.7.

Adding GADTs to OCaml the direct approach

Transcription:

Bottom-Up Syntax Analysis LR - metódy Ján Šturc

Shift-reduce parsing LR methods (Left-to-right, Righ most derivation) SLR, Canonical LR, LALR Other special cases: Operator-precedence parsing Backward deterministic parsing via string matching

Shift-Reduce Conflicts Ambiguous grammar: S if E then S if E then S else S Resolve in favor of shift, so else matches closest if.

Reduce-Reduce Conflicts Grammar: C A B A a B a Stack Input Action $ aa$ shift $a a$ reduce A a or B a? $A a$ shift $Aa $ reduce A a or B a? $AB $ reduce $C $ accept Conflicts resolving on the base of the state (history) of the parsing. In general, we can resolve conflicts on the state of the parsing, lookahead symbols, operator precedence, length of right-hand side of production.

LR(k) Parsers: Use a DFA for Shift/Reduce Decisions Položkový automat

DFA for Shift/Reduce Decisions

Model of an LR Parser

LR Parsing Configuration ( = LR parser state): s 0 X 1 s 1 X 2 s 2 X m s m, a i a i+1 a n $ stack input If action[s, a ] = shift s, then push a, push s, and advance input: s 0 X 1 s 1 X 2 s 2 X m s m a i s, a i+1 a n $ If action[s m, a i + = reduce A β and goto[s m-r, A] = s with r= β then pop 2r symbols, push A, and push s: s 0 X 1 s 1 X 2 s 2 X m-r s m-r A s, a i a i+1 a n $ If action[s m, a i ] = accept, then stop If action[s m, a i ] = error, then report the error and attempt to recovery

Example LR Parse Table Grammar: 1. E E + T 2. E T 3. T T * F 4. T F 5. F ( E ) 6. F id shift 5 reduce by the production # 1

Example LR Parsing Grammar: 1. E E + T 2. E T 3. T T * F 4. T F 5. F ( E ) 6. F id

SLR parsing SLR (Simple LR): a simple extension of LR(0) shift-reduce parsing SLR eliminates some conflicts by populating the parsing table with reductions A α on symbols in FOLLOW(A) S E E id + E E id

SLR Parsing Table Reductions do not fill entire rows Otherwise the same as LR(0)

SLR Parsing An LR(0) state is a set of LR(0) items. An LR(0) item is a production with a (dot) in the righthand side Build the LR(0) DFA by Closure operation to construct LR(0) items Goto operation to determine transitions Construct the SLR parsing table from the DFA LR parser program uses the SLR parsing table to determine shift/reduce operations

Constructing SLR Parsing Tables 1. Augment the grammar with S S$ 2. Construct the set C={I 0,I 1,,I n } of LR(0) items 3. If [A α aβ] I i and goto(i i, a)=i j then set action[i,a]=shift j 4. If [A α ] I i then set action[i, a]=reduce A α for all a FOLLOW(A) (apply only if A S ) 5. If *S S ] is in I i then set action[i, $]=accept 6. If goto(i i, A)=I j then set goto[i, A]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item *S S$]

LR(0) Items of a Grammar An LR(0) item of a grammar G is a production of G with a at some position of the right-hand side Thus, a production A X Y Z has four items: *A X Y Z+ *A X Y Z+ *A X Y Z+ *A X Y Z + Note that production A ε has one item *A + here bullet is simultaneously at the beginning and at the end

Constructing the set of LR(0) Items of a Grammar 1. The grammar is augmented with a new start symbol S and production S S$. 2. Initially, set C = closure(,*s S $]}) (this is the start state of the DFA). 3. For each set of items I C and each grammar symbol X (N T) such that goto(i, X) C and goto(i, X), add the set of items goto(i, X) to C. 4. Repeat 3 until no more sets can be added to C.

The Closure Operation for LR(0) Items 1. Start with closure(i) = I 2. If [A α Bβ] closure(i) then for each production B γ in the grammar, add the item *B γ+ to I if it is not already in I 3. Repeat 2 until no new items can be added

Grammar: E E + T T T T * F F F ( E ) F id The Closure Operation (Example)

The Goto Operation for LR(0) Items 1. For each item [A α Xβ] I, add the set of items closure({[a αx β]}) to goto(i, X) if not already there 2. Repeat step 1 until no more items can be added to goto(i, X) 3. Intuitively, goto(i, X) is the set of items that are valid for the viable prefix γx when I is the set of items that are valid for γ

The Goto Operation (Example 1) Grammar: E E + T T T T * F F F ( E ) F id Suppose I =, *E E+ *E E + T] *E T+ *T T * F] *T F+ *F ( E )] *F id] } Then goto(i, E) = closure(,*e E, E E + T+-) =, *E E +, *E E + T+ -

The Goto Operation (Example 2) Grammar: E E + T T T T * F F F ( E ) F id Suppose I =, *E E +, *E E + T+ - Then goto(i,+) = closure(,*e E + T+-) =, *E E + T+, *T T * F+, *T F+, *F ( E )+, *F id+ -

Example SLR Grammar and LR(0) Items Augmented grammar: 1. C C 2. C A B 3. A a 4. B a I 0 = closure(,*c C+-) I 1 = goto(i0,c) = closure(,*c C +-)

Example SLR Parsing Table

SLR and Ambiguity Every SLR grammar is unambiguous, but not every unambiguous grammar is SLR Consider for example the unambiguous grammar 1. S L = R 2. S R 3. L * R 4. L id 5. R L

LR(1) Grammars SLR too simple (málo využivajú lookahead, vlastne len pri redukcii FOLLOW) LR(1) parsing uses lookahead to avoid unnecessary conflicts in parsing table LR(1) item = LR(0) item + lookahead LR(0) item: *A α β+ LR(1) item: *A α β, a] Obvykle sa LR(1) itemy implementujú ako trojica čísiel: číslo pravidla, pozícia bodky a vnútorný kód lookahead tokenu.

SLR Versus LR(1) Split the SLR states by adding LR(1) lookahead Unambiguous grammar 1. S L = R 2. S R 3. L * R 4. L id 5. R L Should not reduce, because no right-sentential form begins with R=

LR(1) Items If an LR(1) item [A α β, a] contains a lookahead terminal a, means α is already on the top of the stack and to see βa is expected. For items of the form [A α, a+ the lookahead a is used to reduce A α only if the next input is a For items of the form [A α β, a] with β ε the lookahead has no effect during parsing

The Closure Operation for LR(1) Items 1. Start with closure(i) = I 2. If [A α Bβ, a] closure(i) then for each production B γ in the grammar and each terminal b FIRST(βa), add the item *B γ, b] to I if not already in I 3. Repeat 2 until no new items can be added

The Goto Operation for LR(1) Items 1. For each item [A α Xβ, a] I, add the set of items closure({[a αx β, a]}) to goto(i,x) if not already there 2. Repeat step 1 until no more items can be added to goto(i, X)

Constructing the set of LR(1) Items of a Grammar 1. Augment the grammar with a new start symbol S and production S S$ 2. Initially, set C = closure(,*s S, $+-) this is the start state of the DFA. 3. For each set of items I C and each grammar symbol X (N T) such that goto(i, X) C and goto(i, X), add the set of items goto(i,x) to C 4. Repeat 3 until no more sets can be added to C

Example Grammar and LR(1) Items Unambiguous LR(1) grammar: 2. S L = R 3. S R 4. L * R 5. L id 6. R L Augment with 1. S S$ LR(1) items (next slide)

I 0 : I 1 : *S S, $] goto(i 0,S)=I 1 *S L=R, $] goto(i 0,L)=I 2 *S R, $] goto(i 0,R)=I 3 *L *R, =] goto(i 0,*)=I 4 *L id, =] goto(i 0,id)=I 5 *R L, $] goto(i 0,L)=I 2 *S S, $] I 6 : I 7 : I 8 : *S L= R, $] goto(i 6,R)=I 4 *R L, $] goto(i 6,L)=I 10 *L *R, $] goto(i 6,*)=I 11 *L id, $] goto(i 6,id)=I 12 *L *R, = $] *R L, =] I 2 : *S L =R, $] goto(i 2,=)=I 6 *R L, $] I 9 : *S L=R, $] I 3 : *S R, $+ I 10 : *R L, $] I 4 : *L * R, =] goto(i 4,R)=I 7 *R L, =] goto(i 4,L)=I 8 *L *R, =] goto(i 4,*)=I 4 *L id, =] goto(i 4,id)=I 5 I 11 : *L * R, $] goto(i 11,R)=I 13 *R L, $] goto(i 11,L)=I 10 *L *R, $] goto(i 11,*)=I 11 *L id, $] goto(i 11,id)=I 12 I 5 : *L id, =] I 12 : *L id, $] I 13 : *L *R, $]

Canonical LR(1) Parsing Tables 1. Augment the grammar with S S$ 2. Construct the set C={I 0,I 1,,I n } of LR(1) items 3. If [A α aβ, b] I i and goto(i i, a)=i j then set action[i, a]=shift j (b is here irrelevant) 4. If [A α, a] I i then set (Surely a Follow(A) ) action[i, a]=reduce A α (apply only if A $) 5. If *S S, $] is in I i then set action[i, $]=accept 6. If goto(i i, A)=I j then set goto[i, A]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item *S S,$]

Example LR(1) Parsing Table Grammar: 1. S S $ 2. S L = R 3. S R 4. L * R 5. L id 6. R L

LALR(1) Grammars LR(1) parsing tables have too many states LALR(1) parsing (Look-Ahead LR) merges LR(1) states to reduce table size Less powerful than LR(1) Will not introduce shift-reduce conflicts, because shifts do not use lookaheads May introduce reduce-reduce conflicts, but seldom do so for grammars of programming languages

Constructing LALR(1) Parsing Tables Construct sets of LR(1) items Merge LR(1) sets with sets of items that share the same first part Kernel item (*S S$] or is inside right-hand side) I 4 : I 11 : *L * R, =] *R L, =+ *L *R, =] *L id, =] *L * R, $] *R L, $] *L *R, $] *L id, $] *L * R, = $] *R L, = $] *L *R, = $] *L id, = $] Shorthand for two items

Example LALR(1) Grammar Augmented unambiguous LR(1) grammar: 1. S S $ 2. S L = R 3. S R 4. L * R 5. L id 6. R L LALR(1) items (next slide)

Example LALR(1) Parsing Table Grammar: 1. S S $ 2. S L = R 3. S R 4. L * R 5. L id 6. R L

Improving of the LALR(1) state construction Disadvantage of the previous constuction is that during construction all the LR(1) states are created and then merged to LR(0) states. If we neglect the ε-production, the reductions take place for kernel items only. Reduction A ε is called on input a iff there is a kernel item [B β Cγ, b] such that C rm Aδ and a is in First(δγb). So the ε-reductions for the kernel items can be precomputed. In the computation of the closure of an item [B γ δ, a] there are only two possibilities for lookahead symbols for the items in the closure; they arise according to step 2 at slide 32, we say the are generated spontaneously or it is a and we say that a propagate.

Computation of the lookaheads

LALR(1) sets of items construction

Error treatment Empty fields of the action table are places for insertion the error reporting and recovery action Synch (panic mode) Phrase level recovery LR like analysis (LR, SLR, LALR) discovers errors as soon as possible, whenever the parsed string is not a viable prefix any word generated by the given grammar

LL, SLR, LR, LALR Grammars Most of the programming languages have LALR(1) or even SLR(1) grammar. All the deterministic CFlanguages can be covered by the i LR(i). The compiler generators, TWS s and compiler compilers including Bison and Yacc use mostly LALR(1) analysis