Context-Free Grammars (CFG)

Similar documents
Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Turing Machines: An Introduction

Automata and Computability. Solutions to Exercises

3515ICT Theory of Computation Turing Machines

SRM UNIVERSITY FACULTY OF ENGINEERING & TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF SOFTWARE ENGINEERING COURSE PLAN

C H A P T E R Regular Expressions regular expression

Regular Languages and Finite State Machines

6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, Class 4 Nancy Lynch

Introduction to Turing Machines

Reading 13 : Finite State Automata and Regular Expressions

Introduction to Automata Theory. Reading: Chapter 1

Model 2.4 Faculty member + student

The Halting Problem is Undecidable

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

Computability Theory

Notes on Complexity Theory Last updated: August, Lecture 1

Automata and Formal Languages

Handout #1: Mathematical Reasoning

FIRST and FOLLOW sets a necessary preliminary to constructing the LL(1) parsing table

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

OBTAINING PRACTICAL VARIANTS OF LL (k) AND LR (k) FOR k >1. A Thesis. Submitted to the Faculty. Purdue University.

Row Echelon Form and Reduced Row Echelon Form

Correspondence analysis for strong three-valued logic

Syntaktická analýza. Ján Šturc. Zima 208

CS 3719 (Theory of Computation and Algorithms) Lecture 4

CSE 135: Introduction to Theory of Computation Decidability and Recognizability

Deterministic Finite Automata

Chapter 7 Uncomputability

Turing Machines, Part I

Computer Architecture Syllabus of Qualifying Examination

Lemma 5.2. Let S be a set. (1) Let f and g be two permutations of S. Then the composition of f and g is a permutation of S.

10.2 Series and Convergence

Regular Expressions and Automata using Haskell

Regular Languages and Finite Automata

Lecture 2: Universality

Kevin James. MTHSC 412 Section 2.4 Prime Factors and Greatest Comm

ÖVNINGSUPPGIFTER I SAMMANHANGSFRIA SPRÅK. 15 april Master Edition

Informatique Fondamentale IMA S8

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions

Fractions to decimals

CS143 Handout 08 Summer 2008 July 02, 2007 Bottom-Up Parsing

Learning Analysis by Reduction from Positive Data

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

Automata on Infinite Words and Trees

24 Uses of Turing Machines

A first step towards modeling semistructured data in hybrid multimodal logic

Continued Fractions and the Euclidean Algorithm

8 Divisibility and prime numbers

Overview of E0222: Automata and Computability

6.080 / Great Ideas in Theoretical Computer Science Spring 2008

=

Increasing Interaction and Support in the Formal Languages and Automata Theory Course

MACM 101 Discrete Mathematics I

Formal Grammars and Languages

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

CS5236 Advanced Automata Theory

Ambiguity, closure properties, Overview of ambiguity, and Chomsky s standard form

Page 331, 38.4 Suppose a is a positive integer and p is a prime. Prove that p a if and only if the prime factorization of a contains p.

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

Cubes and Cube Roots

Schneps, Leila; Colmez, Coralie. Math on Trial : How Numbers Get Used and Abused in the Courtroom. New York, NY, USA: Basic Books, p i.

CMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013

Parsing Beyond Context-Free Grammars: Introduction

CAs and Turing Machines. The Basis for Universal Computation

Computer Science 281 Binary and Hexadecimal Review

How To Compare A Markov Algorithm To A Turing Machine

25 Integers: Addition and Subtraction

Pushdown Automata. International PhD School in Formal Languages and Applications Rovira i Virgili University Tarragona, Spain

Computational Models Lecture 8, Spring 2009

CS 103X: Discrete Structures Homework Assignment 3 Solutions

1 Definition of a Turing machine

Philadelphia University Faculty of Information Technology Department of Computer Science First Semester, 2007/2008.

3. Mathematical Induction

! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs

Cartesian Products and Relations

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

Increasing Interaction and Support in the Formal Languages and Automata Theory Course

This asserts two sets are equal iff they have the same elements, that is, a set is determined by its elements.

Properties of Stabilizing Computations

Grammars and Parsing. 2. A finite nonterminal alphabet N. Symbols in this alphabet are variables of the grammar.

LINEAR EQUATIONS IN TWO VARIABLES

Cardinality. The set of all finite strings over the alphabet of lowercase letters is countable. The set of real numbers R is an uncountable set.

Two-dimensional Languages

Math 319 Problem Set #3 Solution 21 February 2002

Linear Algebra. A vector space (over R) is an ordered quadruple. such that V is a set; 0 V ; and the following eight axioms hold:

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

WHEN DOES A CROSS PRODUCT ON R n EXIST?

facultad de informática universidad politécnica de madrid

Learning Automata and Grammars

A COMBINED FINE-GRAINED AND ROLE-BASED ACCESS CONTROL MECHANISM HONGI CHANDRA TJAN. Bachelor of Science. Oklahoma State University

ω-automata Automata that accept (or reject) words of infinite length. Languages of infinite words appear:


6.080/6.089 GITCS Feb 12, Lecture 3

Bottom-Up Parsing. An Introductory Example

Base Conversion written by Cathy Saxton

Stupid Divisibility Tricks

Transcription:

Context-Free Grammars (CFG) SITE : http://www.sir.blois.univ-tours.fr/ mirian/ Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 1/2

An informal example Language of palindromes: L pal A palindrome is a string that reads the same forward and backward Ex: otto, madamimadam, 0110, 11011, ǫ L pal is not a regular language (can be proved by using the pumping lemma) We consider Σ = {0, 1}. There is a natural, recursive definition of when a string of 0 and 1 is in L pal. Basis: ǫ, 0 and 1 are palindromes Induction: If w is a palindrome, so are 0w0 and 1w1. No string is palindrome of 0 and 1, unless it follows from this basis and inductive rule. A CFG is a formal notation for expressing such recursive definitions of languages Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 2/2

What is a grammar? A grammar consists of one or more variables that represent classes of strings (i.e., languages) There are rules that say how the strings in each class are constructed. The construction can use : 1. symbols of the alphabet 2. strings that are already known to be in one of the classes 3. or both Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 3/2

A grammar for palindromes In the example of palindromes, we need one variable P which represents the set of palindromes; i.e., the class of strings forming the language L pal Rules: The first three rules form the basis. P ǫ P 0 P 1 P P 0P0 1P1 They tell us that the class of palindromes includes the strings ǫ, 0 and 1 None of the right sides of theses rules contains a variable, which is why they form a basis for the definition The last two rules form the inductive part of the definition. For instance, rule 4 says that if we take any string w from the class P, then 0w0 is also in class P. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 4/2

Definition of Context-Free Grammar A GFG (or just a grammar) G is a tuple G = (V, T, P, S) where 1. V is the (finite) set of variables (or nonterminals or syntactic categories). Each variable represents a language, i.e., a set of strings 2. T is a finite set of terminals, i.e., the symbols that form the strings of the language being defined 3. P is a set of production rules that represent the recursive definition of the language. 4. S is the start symbol that represents the language being defined. Other variables represent auxiliary classes of strings that are used to help define the language of the start symbol. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 5/2

Production rules Each production rule consists of: 1. A variable that is being (partially) defined by the production. This variable is often called the head of the production. 2. The production symbol. 3. A string of zero or more terminals and variables. This string, called the body of the production, represents one way to form strings in the language of the variable of the head. In doing so, we leave terminals unchanged and substitute for each variable of the body any string that is known to be in the language of that variable Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 6/2

Compact Notation for Productions We often refers to the production whose head is A as productions for A or A-productions Moreover, the productions can be replaced by the notation A α 1, A α 2... A α n A α 1 α 2... α n Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 7/2

Examples: CFG for palindromes G pal = ({P }, {0,1}, A, P) where A represents the production rules: P ǫ P 0 P 1 P 0P0 P 1P1 We can also write: P ǫ 0 1 0P0 1P1 Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 8/2

Examples: CFG for expressions in a typical programming language Operators: + (addition) and (multiplication) Identifiers: must begin with a or b, which may be followed by any string in {a, b,0,1} We need two variables: E: represents expressions. It is the start symbol. I: represents the identifiers. Its language is regular and is the language of the regular expression: (a + b)(a + b + 0 + 1) Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 9/2

Exemples (cont.): The grammar Grammar G 1 = ({E, I}, T, P, E) where: T = {+,,(,), a, b,0, 1} and P is the set of productions: 1 E I 2 E E + E 3 E E E 4 E (E) 5 I a 6 I b 7 I Ia 8 I Ib 9 I I0 10 I I1 Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 10/2

Derivations Using a Grammar We apply the productions of a CFG to infer that certain strings are in the language of a certain variable Two inference approaches: 1. Recursive inference, using productions from body to head 2. Derivations, using productions from head to body Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 11/2

Recursive inference - an exemple We consider some inferences we can make using G 1 Recall G 1 : E I E + E E E (E) I a b Ia Ib I0 I1 String Lang Prod String(s) used (i) a I 5 - (ii) b I 6 - (iii) b0 I 9 (ii) (iv) b00 I 9 (iii) (v) a E 1 (i) (vi) b00 E 1 (iv) (vii) a + b00 E 2 (v), (vi) (viii) (a + b00) E 4 (vii) (ix) a (a + b00) E 3 (v), (viii) Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 12/2

Derivations Applying productions from head to body requires the definition of a new relational symbol: Let: G = (V, T, P, S) be a CFG A V α, β (V T) and A γ P Then we write or, if G is understood and say that αaβ derives αγβ. αaβ G αγβ αaβ αγβ Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 13/2

Zero or more derivation steps We define to be the reflexive and transitive closure of (i.e., to denote zero or more derivation steps): Basis: Let α (V T). Then α α. Induction: If α β and β γ, then α γ. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 14/2

Examples of derivation Derivation of a (a + b000) by G 1 E E E I E a E a (E) a (E + E) a (I + E) a (a + E) a (a + I) a (a + I0) a (a + I00) a (a + b00) Note 1: At each step we might have several rules to choose from, e.g. I E a E a (E), versus I E I (E) a (E). Note 2: Not all choices lead to successful derivations of a particular string, for instance E E + E (at the first step) won t lead to a derivation of a (a + b000). Important: Recursive inference and derivation are equivalent. A string of terminals w is infered to be in the language of some variable A iff A w Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 15/2

Leftmost and Rightmost derivation In other to restrict the number of choices we have in deriving a string, it is often useful to require that at each step we replace the leftmost (or rightmost) variable by one of its production rules Leftmost derivation lm : Always replace the left-most variable by one of its rule-bodies Rightmost derivation rm : Always replace the rightmost variable by one of its rule-bodies. EXAMPLES 1 Leftmost derivation: previous example 2 Rightmost derivation: E rm E E rm E (E) rm E (E + E) rm E (E + I) rm E (E + I0) rm E (E + I00) rm E (E + b00) rm E (I + b00) rm E (a + b00) rm I (a + b00) rm a (a + b00) We can conclude that E rm a (a + b00) Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 16/2

The Language of the Grammar If G(V, T, P, S) is a CFG, then the language of G is L(G) = {w in T S G w} i.e., the set of strings over T derivable from the start symbol. If G is a CFG, we call L(G) a context-free language. Example: L(G pal ) is a context-free language. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 17/2

Theorem A string w {0, 1} is in L(G pal ) iff w = w R. Proof : ( -direction.) Suppose w = w R, i.e., that w is a palindrome. We show by induction on w that w L(G pal ) Basis: Basis: w = 0, or w = 1. Then w is ǫ, 0, or 1. Since P ǫ, P 0, and P 1 are productions, we conclude that P w in all base cases. Induction: Suppose w 2. Since w = w R, we have w = 0x0, or w = 1x1, and x = x R. If w = 0x0 we know from the IH that P x. Then P 0P0 0x0 = w Thus w L(Gpal). The case for w = 1x1 is similar. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 18/2

Proof (cont.) ( -direction.) We assume w L(G pal ) and we prove that w = w R. Since w L(G pal ), we have P w. We do an induction of the length of Basis: The derivation P w is done in one step. Then w must be ǫ,0, or 1, all palindromes. Induction: Let n 1, and suppose the derivation takes n + 1 steps. Then we must have w = 0x0 0P0 P or w = 1x1 1P1 P where the second derivation is done in n steps. By the IH x is a palindrome, and the inductive proof is complete.. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 19/2

Sentential Forms Let G = (V, T, P, S) be a CFG, and α (V T). If we say α is a sentential form. S α If S lm α we say that α is a left-sentential form, and if S rm α we say that is a right-sentential form. Note: L(G) is those sentential forms that are in T. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 20/2

Example Recall G 1 : E I E + E E E (E) I a b Ia Ib I0 I1 1 Then E (I + E) is a sentential form since E E E E (E) E (E + E) E(I + E) This derivation is neither leftmost, nor right-most. 2 a E left-sentential form, since E E E I E a E 3 E (E + E) is a right-sentential form since E E E E (E) E (E + E) Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 21/2

Parse Trees If w L(G), for some CFG, then w has a parse tree, which tells us the (syntactic) struc- ture of w. w could be a program, a SQL-query, an XML- document, etc. Parse trees are an alternative representation to derivations and recursive inferences. There can be several parse trees for the same string. Ideally there should be only one parse tree (the "true" structure) for each string, i.e. the language should be unambiguous. Unfortunately, we cannot always remove the ambiguity. Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 22/2

Gosta Grahne... Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 23/2

Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 24/2

Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 25/2

Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 26/2