Fall 1998 Formal Language Theory Dr. R. Boyer


Week Five: Regular Languages; Pumping Lemma

1. There are algorithms to answer the following questions:
(1) Given a DFA $M$ and a string $w$, is $w \in L(M)$? The algorithm is linear in the length of the string.
(2) Given a DFA $M$, is $L(M) = \emptyset$?
(3) Given a DFA $M$, is $L(M) = \Sigma^*$?
(4) Given two DFAs $M_1$ and $M_2$, is $L(M_1) \subseteq L(M_2)$?
(5) Given two DFAs $M_1$ and $M_2$, is $L(M_1) = L(M_2)$? Note: this question can be solved either by using (4) or by using the uniqueness of the minimal state equivalent DFA.
(6) Given an NFA $M$ and a string $w$, is $w \in L(M)$? There is an algorithm which is polynomial in the length of the string.

2. Proposition. If $L = L(M)$, where $M$ is an NFA, then $L$ is a regular language; that is, there is a regular expression $r$ such that $L = L(r)$.

We shall study two constructions for this correspondence. The first is a graph based algorithm that builds up the regular expression as nodes are deleted from the state diagram. The second method interprets the regular language as the solution of a system of equations.

3. To present the graph oriented algorithm, we need to introduce an even more general notion of NFA. It has the expanded property that its edges may be labeled by regular expressions, not simply by $a \in \Sigma$ or $e$. We will denote this new class of automata by GNFA.

Method of Acceptance: in a usual NFA, the machine matches an input symbol with an edge label in order to make a move. A GNFA may consume more than one symbol in order to make the next move: it consumes a substring that belongs to the regular language denoted by the edge label.

A detailed description of this mode of acceptance follows. Let $M$ be a GNFA, and let $w \in \Sigma^*$. Then $w \in L(M)$ exactly when $w = w_1 w_2 \cdots w_k$ and there is a sequence of states $q_0 = s, q_1, \ldots, q_k = f$ such that $w_i \in L(R_i)$, where $\delta(q_{i-1}, q_i) = R_i$ and $R_i$ is the regular expression label of that edge.

Normalization Condition: we shall assume that the GNFA $M$ has a start state $s$ that has NO edge coming into it and that $M$ has a unique final state $f$ with NO edges leaving it. Further, assume $f \ne s$. Also, it will be convenient to write the transitions in the form $\delta(q, q') = R$, where $q$ and $q'$ are states and $R$ is a regular expression.

4. Graph Theoretical Algorithm for converting an NFA into an equivalent regular expression.

Step 1: Convert the given NFA $M$ into a GNFA $M'$ by introducing a new start state, a new final state, and the necessary transitions. Suppose $M'$ has $k$ states.

Step 2: If $k = 2$, then $M'$ has just the start state and the unique final state. It is clear that $L(M') = L(R)$, where $R = \delta'(s, f)$.

Step 3: "Node Reduction Step". If $k > 2$, we remove a node to produce an equivalent automaton $M''$. In particular, select a node $q'' \ne s, f$. Then define $M'' = (Q \setminus \{q''\}, \Sigma, \delta'', s, \{f\})$, where $\delta''(q_i, q_j) = \delta'(q_i, q_j)$ if either $\delta'(q_i, q'') = \emptyset$ or $\delta'(q'', q_j) = \emptyset$; otherwise,
\[ \delta''(q_i, q_j) = R_1 (R_2)^* R_3 \cup R_4, \]
where $R_1 = \delta'(q_i, q'')$, $R_2 = \delta'(q'', q'')$, $R_3 = \delta'(q'', q_j)$, and $R_4 = \delta'(q_i, q_j)$.

Step 4: Repeat Step 3 until $M''$ has two states: $s$ and $f$.

We need to verify that Step 3 does indeed produce an equivalent automaton. We can argue by visualizing paths through the state diagram of the GNFA. In particular, it is sufficient to observe that if a GNFA $M_3$, with transition $\delta_3$, had just three states $q_1$, $q_2$ and $q_3$, then there is an equivalent GNFA $M_2$, with transition $\delta_2$, with two states, where:
\[ \delta_2(q_1, q_2) = R_{1,3} (R_{3,3})^* R_{3,2} \cup R_{1,2}. \]
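Steps 1 through 4 can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the function names (`nfa_to_regex`, `union`, `concat`, `star`), the dictionary encoding of the NFA, and the state names `new_start` and `new_final` are all assumptions made here. Regular expressions are built as Python `re` pattern strings, with `|` standing for union and the empty pattern standing for $e$.

```python
import re

EMPTY = None  # label of a missing edge (the empty set); "" plays the role of e

def union(r1, r2):
    if r1 is EMPTY:
        return r2
    if r2 is EMPTY or r1 == r2:
        return r1
    return f"(?:{r1}|{r2})"

def concat(r1, r2):
    if r1 is EMPTY or r2 is EMPTY:
        return EMPTY
    return r1 + r2

def star(r):
    if r is EMPTY or r == "":
        return ""  # the star of the empty set (or of e) is e
    return f"(?:{r})*"

def nfa_to_regex(states, delta, start, finals):
    """delta maps (state, symbol) to a set of states; symbol "" is an e-move.
    Assumes no state is literally named "new_start" or "new_final"."""
    s, f = "new_start", "new_final"          # Step 1: fresh start and final state
    label = {}                               # label[(p, q)] = regex on edge p -> q

    def add(p, q, r):
        if r is not EMPTY:
            label[(p, q)] = union(label.get((p, q), EMPTY), r)

    add(s, start, "")
    for q in finals:
        add(q, f, "")
    for (p, a), targets in delta.items():
        for q in targets:
            add(p, q, re.escape(a))

    remaining = list(states)
    while remaining:                         # Steps 3 and 4: remove one node at a time
        k = remaining.pop()
        loop = star(label.get((k, k), EMPTY))
        nodes = remaining + [s, f]
        for p in nodes:
            for q in nodes:
                r1 = label.get((p, k), EMPTY)
                r3 = label.get((k, q), EMPTY)
                add(p, q, concat(concat(r1, loop), r3))   # R1 (R2)* R3 joined with R4
        for edge in [e for e in label if k in e]:
            del label[edge]
    return label.get((s, f), EMPTY)          # Step 2: only s and f are left

# Hypothetical NFA accepting the strings over {a, b} that end in "ab".
example = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
pattern = nfa_to_regex([0, 1, 2], example, 0, {2})
print(pattern)
```

The resulting pattern can be checked against the NFA with `re.fullmatch(pattern, w)`. The order in which nodes are removed changes the shape and length of the expression but not the language it denotes.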

In the formula for $\delta_2$, $R_{i,j}$ stands for $\delta_3(q_i, q_j)$.

We can describe this method as an algorithm as follows. We let $G$ be a GNFA with start state $s$ and unique accepting state $f$, with the remaining states given as $q_1, q_2, \ldots, q_n$. The algorithm given below successively removes the states $q_1$, $q_2$, and so on, one at a time, producing an equivalent GNFA at each stage. When the looping terminates, the resulting GNFA has only the two states $s$ and $f$. The language that it accepts is denoted by the regular expression given by the label $\delta(s, f)$.

  for $k = 1..n$ do
    for $i, j = k+1..n$ do
      new$(q_i, q_j) := \delta(q_i, q_j) \cup \delta(q_i, q_k)\, \delta(q_k, q_k)^*\, \delta(q_k, q_j)$
    od;
    for $i = k+1..n$ do
      new$(s, q_i) := \delta(s, q_i) \cup \delta(s, q_k)\, \delta(q_k, q_k)^*\, \delta(q_k, q_i)$;
      new$(q_i, f) := \delta(q_i, f) \cup \delta(q_i, q_k)\, \delta(q_k, q_k)^*\, \delta(q_k, f)$
    od;
    new$(s, f) := \delta(s, f) \cup \delta(s, q_k)\, \delta(q_k, q_k)^*\, \delta(q_k, f)$;
    $\delta :=$ new;
  od;

We next compare the above graph algorithm with the approach taken in the textbook. We should compare this algorithm with Dijkstra's algorithm for solving the "single-source shortest path" problem that you studied in algorithms.

As usual, the language $L$ is accepted by the DFA $M = (\{q_1, \ldots, q_n\}, \Sigma, \delta, q_1, F)$. We let $R^k_{ij}$ denote the set of all strings that take the automaton $M$ from state $q_i$ to state $q_j$ without going through any state numbered $k$ or larger; that is,

only strings are allowed that start at $q_i$ and end at $q_j$ and only use states $q_1, \ldots, q_{k-1}$ as intermediate states. Such sets of strings satisfy an important inductive identity:
\[ R^{k+1}_{ij} = R^k_{ij} \cup R^k_{ik} (R^k_{kk})^* R^k_{kj}. \]

To understand this identity, consider the following. Any string $w$ that is contained in $R^{k+1}_{ij}$ either uses state $q_k$ as an intermediate state or not. If $w$ does not use state $q_k$, then it must lie in $R^k_{ij}$. So, we must examine how state $q_k$ is used by the string $w$. Of course, $q_k$ must be used as some intermediate state. So, $w$ follows a path from state $q_i$ to $q_k$ and from $q_k$ to $q_j$, such that only states $q_1, \ldots, q_k$ are used as intermediate states. So, it seems that we must add the set $R^k_{ik} R^k_{kj}$. In fact, there is a further possibility. We can also use paths that cycle through state $q_k$. So, the set of strings that use state $q_k$ is:
\[ R^k_{ik} (R^k_{kk})^* R^k_{kj}. \]
One detail to consider is that $(R^k_{kk})^*$ is in general larger than $R^k_{kk}$ itself, because $q_k$ is not allowed as an intermediate state for strings from $R^k_{kk}$.

We can easily describe the sets of strings $R^1_{ij}$. For $i \ne j$, $R^1_{ij} = \{a : \delta(q_i, a) = q_j\}$, while for $i = j$, $R^1_{ii} = \{a : \delta(q_i, a) = q_i\} \cup \{e\}$.

We show by induction that $R^k_{ij}$ is denoted by a regular expression $r^k_{ij}$. For $k = 1$, we let, for $i \ne j$, $r^1_{ij} = a_1 \cup \cdots \cup a_p$, and for $i = j$, $r^1_{ii} = e \cup a_1 \cup \cdots \cup a_p$, where $\delta(q_i, a_\ell) = q_j$, $1 \le \ell \le p$. If this set of symbols is empty, then $r^1_{ij} = \emptyset$ for $i \ne j$, while for $i = j$, $r^1_{ii} = e$. We conclude that $R^1_{ij} = L(r^1_{ij})$.

We now assume that the result holds for the value $k$. That is, for any set $R^k_{\ell m}$, there exists a regular expression $r^k_{\ell m}$ such that $R^k_{\ell m} = L(r^k_{\ell m})$. We need to show that any set $R^{k+1}_{ij}$ is given by a regular expression. By the inductive property of the set $R^{k+1}_{ij}$, we know that
\[ R^{k+1}_{ij} = R^k_{ij} \cup R^k_{ik} (R^k_{kk})^* R^k_{kj}. \]

By the induction hypothesis, we have:
\[ R^k_{ij} \cup R^k_{ik} (R^k_{kk})^* R^k_{kj} = L(r^k_{ij}) \cup L(r^k_{ik})\, L((r^k_{kk})^*)\, L(r^k_{kj}) = L\big(r^k_{ij} \cup r^k_{ik} (r^k_{kk})^* r^k_{kj}\big). \]
This finishes the induction.

Finally, we observe that
\[ L(M) = \bigcup \{ R^{n+1}_{1j} : q_j \in F \}. \]

Algorithm for the computation of the regular expression $r^{k+1}_{ij}$. Note: $L(r^{k+1}_{ij}) = R^{k+1}_{ij}$. We shall write $r(i, j, k+1)$ for $r^{k+1}_{ij}$. In the base case, $\delta(q_i, q_j)$ stands for the union of the symbols $a$ with $\delta(q_i, a) = q_j$.

  function $r(i, j, k+1)$
    if $k = 0$ then
      case $i = j$: RETURN($\delta(q_i, q_j) \cup \{e\}$)
      case $i \ne j$: RETURN($\delta(q_i, q_j)$)
    else
      $r(i, j, k+1) := r(i, j, k) \cup r(i, k, k)\, r(k, k, k)^*\, r(k, j, k)$;
  end

Note: the length of the regular expression can grow exponentially in the number of states of the DFA.

Example:

  State   a   b
  1       3   2
  2       1   3
  3       2   2

The accepting states are $\{q_2, q_3\}$ and the initial state is $q_1$.
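The recursive function $r(i, j, k+1)$ can be sketched in Python as follows. This is a hedged illustration under several assumptions that are not in the notes: the name `dfa_to_regex`, the dictionary encoding `delta[(i, a)] = j` with states numbered $1..n$ and start state 1, the use of Python `re` pattern strings for regular expressions, and the toy two-state DFA (odd number of a's) used at the end. Memoization with `lru_cache` avoids recomputing the same $r^k_{ij}$, although the expressions themselves still grow quickly.

```python
import re
from functools import lru_cache

EMPTY = None  # the empty-set expression; "" plays the role of e

def union(r1, r2):
    if r1 is EMPTY:
        return r2
    if r2 is EMPTY or r1 == r2:
        return r1
    return f"(?:{r1}|{r2})"

def concat(r1, r2):
    if r1 is EMPTY or r2 is EMPTY:
        return EMPTY
    return r1 + r2

def star(r):
    if r is EMPTY or r == "":
        return ""
    return f"(?:{r})*"

def dfa_to_regex(n, alphabet, delta, finals):
    """States are 1..n, the start state is 1, and delta[(i, a)] = j."""

    @lru_cache(maxsize=None)
    def r(i, j, k):
        if k == 1:
            # Base case: single symbols leading from q_i to q_j, plus e when i = j.
            base = EMPTY
            for a in alphabet:
                if delta[(i, a)] == j:
                    base = union(base, re.escape(a))
            return union("", base) if i == j else base
        m = k - 1
        # r^{m+1}_{ij} = r^m_{ij} U r^m_{im} (r^m_{mm})* r^m_{mj}
        detour = concat(concat(r(i, m, m), star(r(m, m, m))), r(m, j, m))
        return union(r(i, j, m), detour)

    result = EMPTY
    for j in sorted(finals):
        result = union(result, r(1, j, n + 1))
    return result

# Hypothetical toy DFA: state 2 is reached after an odd number of a's.
toy = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
pattern = dfa_to_regex(2, "ab", toy, {2})
print(re.fullmatch(pattern, "bab") is not None)   # one a
print(re.fullmatch(pattern, "abba") is not None)  # two a's
```

No simplification of the expressions is attempted, so even this two-state machine yields a long pattern; that is the growth the note above warns about.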

Sample Calculation for $r^4_{13}$: We first find that
\[ r^4_{13} = r^3_{13} \cup r^3_{13} (r^3_{33})^* r^3_{33}. \]
Next, we compute
\[ r^3_{13} = r^2_{13} \cup r^2_{12} (r^2_{22})^* r^2_{23} \quad\text{and}\quad r^3_{33} = r^2_{33} \cup r^2_{32} (r^2_{22})^* r^2_{23}. \]
We must now expand the regular expressions $r^2_{13}$, $r^2_{22}$, $r^2_{23}$, $r^2_{33}$, and $r^2_{32}$. We find that:
\[ r^2_{13} = r^1_{13} \cup r^1_{11} (r^1_{11})^* r^1_{13} \quad\text{and}\quad r^2_{22} = r^1_{22} \cup r^1_{21} (r^1_{11})^* r^1_{12}, \]
\[ r^2_{23} = r^1_{23} \cup r^1_{21} (r^1_{11})^* r^1_{13} \quad\text{and}\quad r^2_{33} = r^1_{33} \cup r^1_{31} (r^1_{11})^* r^1_{13}. \]
Finally,
\[ r^2_{32} = r^1_{32} \cup r^1_{31} (r^1_{11})^* r^1_{12}. \]
The base cases, which are read off the state diagram of the DFA, are given as follows:
\[ r^1_{11} = e, \quad r^1_{12} = a, \quad r^1_{13} = b, \]
\[ r^1_{21} = a, \quad r^1_{22} = e, \quad r^1_{23} = b, \]
\[ r^1_{31} = \emptyset, \quad r^1_{32} = a \cup b, \quad r^1_{33} = e. \]

It is more efficient to arrange the calculation of $r^{k+1}_{ij}$ in the form of a table.

  Reg. Expr.    k = 1         k = 2    k = 3
  $r^k_{11}$    $e$
  $r^k_{12}$    $a$
  $r^k_{13}$    $b$
  $r^k_{21}$    $a$
  $r^k_{22}$    $e$
  $r^k_{23}$    $b$
  $r^k_{31}$    $\emptyset$
  $r^k_{32}$    $a \cup b$
  $r^k_{33}$    $e$

5. Another method of finding a regular expression equivalent to a finite automaton treats the problem as one of solving a system of equations for the language, where concatenation plays the role of multiplication and union the role of addition. Let
\[ X_q = \{ x \in \Sigma^* : \delta(q, x) \in F \}. \]
Then we find that:
\[ X_q = \sum_{a \in \Sigma} a X_{\delta(q,a)} \ \text{ if } q \notin F, \qquad X_q = \sum_{a \in \Sigma} a X_{\delta(q,a)} + e \ \text{ if } q \in F. \]
This is the linear system for the sets $X_q$ mentioned above. For regular languages, we need what is known as Arden's Lemma.

Arden's Lemma: Let $A, B \subseteq \Sigma^*$ with $e \notin A$. Then the equation
\[ X = A X \cup B \]
has the unique solution $X = A^* B$.
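As an illustration of how the lemma is used, here is a worked example on a hypothetical two-state DFA that is not taken from the notes: $q_1$ is the start state, $q_2$ is the only accepting state, and $\delta(q_1, a) = q_2$, $\delta(q_1, b) = q_1$, $\delta(q_2, a) = q_1$, $\delta(q_2, b) = q_2$, so the machine accepts the strings with an odd number of $a$'s. Writing $X_1, X_2$ for $X_{q_1}, X_{q_2}$ and $\cup$ for the sum:

```latex
\begin{align*}
X_1 &= b X_1 \cup a X_2, \\
X_2 &= a X_1 \cup b X_2 \cup e.
\end{align*}
% Arden's Lemma on the second equation, with A = b and B = a X_1 \cup e:
\begin{align*}
X_2 &= b^* (a X_1 \cup e) = b^* a X_1 \cup b^*.
\end{align*}
% Substitute into the first equation and collect the X_1 terms:
\begin{align*}
X_1 &= b X_1 \cup a b^* a X_1 \cup a b^* = (b \cup a b^* a) X_1 \cup a b^*.
\end{align*}
% Arden's Lemma again, with A = b \cup a b^* a (note e \notin A) and B = a b^*:
\begin{align*}
L(M) = X_1 &= (b \cup a b^* a)^* \, a b^*.
\end{align*}
```

Each application of the lemma eliminates one unknown, in the same way that back-substitution eliminates one variable at a time in an ordinary linear system.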

Step 1: If $X$ is a solution, then $A^* B \subseteq X$. To see this, first note that $A^* B$ is itself a solution:
\[ A^* B = (A^+ \cup e) B = A^+ B \cup B = A (A^* B) \cup B. \]
Moreover, if $X$ is any solution, then $B \subseteq X$ and $A X \subseteq X$, so by induction $A^n B \subseteq X$ for every $n \ge 0$, and hence $A^* B \subseteq X$.

Step 2: $X \subseteq A^* B$. By Step 1, $A^* B \subseteq X$, so we can write $X = A^* B \cup C$ with $C \cap A^* B = \emptyset$. We want to show that $C = \emptyset$. Now $X = A X \cup B$, so
\[ A^* B \cup C = A (A^* B \cup C) \cup B = A^+ B \cup A C \cup B = A^+ B \cup B \cup A C = (A^+ \cup e) B \cup A C = A^* B \cup A C. \]
Next, consider the relation:
\[ (A^* B \cup C) \cap C = (A^* B \cup A C) \cap C. \]
Then $C = A C \cap C$, so $C \subseteq A C$. Since $e \notin A$, the shortest string in $A C$ must be longer than the shortest string in $C$, which is impossible unless $C$ is empty. Hence, $A C = C = \emptyset$. We conclude: $A^* B$ is the unique solution.

Note: If $e \in A$, then the solution $A^* B$ is no longer unique, but it is the smallest solution.

6. We now present a useful theoretical result that states that regular languages must obey a certain type of "periodicity" property. It is used to show that certain simple languages cannot be regular.

Pumping Lemma. Let $M = (K, \Sigma, \delta, s, F)$ be a DFA, with $L = L(M)$. Suppose $m = |K|$. Let $w \in L(M)$ with $|w| \ge m$. Then there are strings $x$, $y$, and $z$ such that $w = xyz$, $|xy| \le m$, $y \ne e$, and $x y^k z \in L$ for all $k \ge 0$. We call $m$ the pumping constant.

Idea of the Proof. Any string accepted by $M$ whose length is at least the number of states of the machine must have a loop in its path through the state diagram. It is precisely this loop that can be iterated.

We use the contrapositive form of the pumping lemma to show that a language is NOT regular:

Let $L$ be a language. Suppose that for every $m \ge 1$ there exists a string $w \in L$ with $|w| \ge m$ such that, for every factorization $w = xyz$ with $|xy| \le m$ and $y \ne e$, we have $x y^k z \notin L$ for some integer $k \ge 0$. Then $L$ cannot be a regular language.

So, to show a language is NOT regular, think of playing the following sort of game: find a string $w \in L$ so that for any admissible non-empty substring $y$ of $w$, there exists some pumped form $x y^k z$ of $w$ with $x y^k z \notin L$.

Examples.

(1) $L_1 = \{a^n b^n : n \ge 1\}$ is not regular. Suppose the language were regular. Choose $n$ greater than the pumping constant $m$ given above. Then $w = a^n b^n$ can be factored as $xyz$ with $|xy| \le m$, $y \ne e$, and $x y^k z \in L_1$ for all $k \ge 0$. Since $|xy| \le m < n$, the string $y$ consists only of $a$'s. Choose $k = 0$; then $xz \in L_1$, but $xz = a^{n - |y|} b^n \notin L_1$. Contradiction. We say that we pumped "down" in this example.

(2) $L_2 = \{a^{n^2} : n \ge 1\}$ is not regular. Suppose $L_2$ were regular. Choose $n$ greater than the pumping constant $m$. Then $a^{n^2} = xyz$, where $y \ne e$ and $|y| \le m < n$. So, $x y^k z \in L_2$ for all $k \ge 0$. Choose $k = 2$. Then $x y^2 z \in L_2$ implies that $|x y^2 z|$ is a perfect square. But
\[ n^2 < |x y^2 z| = n^2 + |y| < n^2 + n < (n+1)^2. \]
Contradiction. In this example, we say that we pumped "up."

(3) $L_3 = \{w w^R : w \in \Sigma^*\}$ is not regular if $\Sigma = \{a, b\}$. Suppose $L_3$ were regular. Choose the string $w$ so that $|w| \ge m + 1$, where $m$ is the pumping constant. Further, we may choose $w$ to have the special form $w = a^m b$, so $w w^R = a^m b b a^m$. By the Pumping Lemma, $w w^R = xyz$ with $|xy| \le m$, $y \ne e$, and $x y^k z \in L_3$ for all $k \ge 0$. Since $|xy| \le m$, the string $y$ lies inside the first block of $a$'s. Take $k = 0$. Then $xz = a^{m - |y|} b b a^m \notin L_3$. Contradiction.

7. Problem: Given a DFA $M$, find an equivalent DFA with a minimum number of states.

We present two solutions to this problem. The first one is algorithmic. The second one is more conceptual and proves that the equivalent minimum state

DFA is unique, up to the labeling of its states.

8. First Method: Merging of Equivalent States

Let $M = (K, \Sigma, \delta, q_0, F)$ be a DFA. Given two states $q$ and $q'$ from $K$, we define an equivalence relation $\equiv$ on the states of $M$ by:
\[ q \equiv q' \ \text{ means } \ \delta(q, w) \in F \iff \delta(q', w) \in F, \quad \forall w \in \Sigma^*. \]
The $\equiv$-equivalence classes are computed by a sequence of other equivalence relations $\equiv_n$ by successive refinements. Let $q$ and $q'$ be two states of $M$. Then:
\[ q \equiv_0 q' \ \text{ means } \ q \in F \iff q' \in F. \]
That is, $\equiv_0$ has two equivalence classes: the set of accepting states $F$ and the set of rejecting states $K \setminus F$. For $n \ge 0$, define $\equiv_{n+1}$ by:
\[ q \equiv_{n+1} q' \ \text{ means } \ q \equiv_n q' \ \text{ and } \ \delta(q, a) \equiv_n \delta(q', a), \quad \forall a \in \Sigma. \]
That is, $q \equiv_n q'$ means $\delta(q, w) \in F \iff \delta(q', w) \in F$ for all strings $w$ whose length is less than or equal to $n$.

The equivalence classes of $\equiv_n$ stabilize for some $n$ less than or equal to the number of states of the automaton $M$. Further, $q \equiv q'$ if and only if $q \equiv_n q'$ for all $n$.

Now, if all the states of $M$ are reachable from the start state $q_0$ and if equivalent states are merged, then the resulting automaton has a minimum number of states. These observations give rise to an effective algorithm to find the minimum state automaton, by successively computing the $\equiv_n$-equivalence classes for $n = 0, 1, 2, \ldots$. The process terminates when the equivalence classes for two successive values of $n$ agree.

Algorithm for Merging Equivalent States: We first make a table of unordered pairs of distinct states. No pair is marked.

(1) First, mark all pairs of inequivalent states relative to strings of length 0; so mark the pair $\{p, q\}$ if $p \in F$, $q \in K \setminus F$ or $p \in K \setminus F$, $q \in F$.

(2) Next, we mark all pairs of inequivalent states relative to strings of length $k = 1, 2, \ldots, n$, where $n$ is the total number of states of the original DFA.

  for $k = 1..n$ do
    for each unmarked pair $\{p, q\}$ such that $\{\delta(p, \sigma), \delta(q, \sigma)\}$ is marked for some $\sigma \in \Sigma$,
      mark the pair $\{p, q\}$
  od;

(3) When the loop terminates, all inequivalent pairs are marked; so the unmarked pairs are equivalent states. Merge these pairs together.

Example.

  State   a   b
  0       1   2
  1       3   4
  2       4   3
  3       5   5
  4       5   5
  5       5   5

The accepting states are $\{1, 2, 5\}$. The result of the algorithm is seen to be that states 1 and 2 should be merged and states 3 and 4 should be merged as well.
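The marking procedure can be sketched in Python and run on the example table above. This is a minimal sketch; the function name `equivalent_pairs` and the dictionary encoding of the transition table are assumptions made here. As a small variation on step (2), the loop stops early once a full pass marks nothing new.

```python
from itertools import combinations

def equivalent_pairs(states, alphabet, delta, finals):
    """Return the unmarked (equivalent) pairs of distinct states."""
    pairs = [frozenset(p) for p in combinations(states, 2)]
    # Step (1): mark pairs that are distinguished by the empty string.
    marked = {pair for pair in pairs
              if len(pair & set(finals)) == 1}
    # Step (2): at most n passes, where n is the number of states.
    for _ in range(len(states)):
        changed = False
        for pair in pairs:
            if pair in marked:
                continue
            p, q = tuple(pair)
            for a in alphabet:
                successor = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(successor) == 2 and successor in marked:
                    marked.add(pair)
                    changed = True
                    break
        if not changed:
            break
    # Step (3): the unmarked pairs are the equivalent states.
    return sorted(tuple(sorted(pair)) for pair in pairs if pair not in marked)

# The example DFA from the table above; accepting states are 1, 2 and 5.
table = {0: (1, 2), 1: (3, 4), 2: (4, 3), 3: (5, 5), 4: (5, 5), 5: (5, 5)}
delta = {}
for state, (on_a, on_b) in table.items():
    delta[(state, "a")] = on_a
    delta[(state, "b")] = on_b

print(equivalent_pairs(range(6), "ab", delta, {1, 2, 5}))  # → [(1, 2), (3, 4)]
```

The output agrees with the result stated above: states 1 and 2 merge, and states 3 and 4 merge.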

The transitions of the pairs that are left unmarked after step (1) are:

  $\{0,3\} \xrightarrow{a} \{1,5\}$,   $\{0,3\} \xrightarrow{b} \{2,5\}$.
  $\{0,4\} \xrightarrow{a} \{1,5\}$,   $\{0,4\} \xrightarrow{b} \{2,5\}$.
  $\{1,2\} \xrightarrow{a} \{3,4\}$,   $\{1,2\} \xrightarrow{b} \{3,4\}$.
  $\{1,5\} \xrightarrow{a} \{3,5\}$,   $\{1,5\} \xrightarrow{b} \{4,5\}$.
  $\{3,4\} \xrightarrow{a} \{5,5\}$,   $\{3,4\} \xrightarrow{b} \{5,5\}$.
  $\{2,5\} \xrightarrow{a} \{4,5\}$,   $\{2,5\} \xrightarrow{b} \{3,5\}$.

9. Second Method: Construction of the Minimum State DFA directly from the Language L

Let $M = (K, \Sigma, \delta, q_0, F)$ be a finite deterministic automaton such that all its states are reachable from its start state. Let $L = L(M)$ be the language it accepts. We associate with $M$ a special equivalence relation $R_M$ on $\Sigma^*$, where
\[ x \, R_M \, y \iff \delta(q_0, x) = \delta(q_0, y), \quad x, y \in \Sigma^*; \]
that is, two strings $x$ and $y$ are equivalent if they terminate at the same state. Hence, we can identify the $R_M$-equivalence classes $[x]_M$ with the states of $M$. The language $L(M)$ is the union of the $R_M$-equivalence classes which include an element $x$ with $\delta(q_0, x) \in F$. We may call $R_M$ machine equivalence.

We call an equivalence relation $R$ on $\Sigma^*$ right-invariant if
\[ x \, R \, y \implies xz \, R \, yz, \quad \text{for all strings } z \in \Sigma^*. \]
Note: $R_M$ is right-invariant.

Let $L$ be any language over the alphabet $\Sigma$; that is, $L \subseteq \Sigma^*$. We can associate an equivalence relation $R_L$ on $\Sigma^*$ directly from $L$, without using a finite automaton.

Given any two strings $x, y \in \Sigma^*$, we say
\[ x \, R_L \, y \iff (xz \in L \text{ exactly when } yz \in L), \quad \text{for all } z \in \Sigma^*. \]
Note: the equivalence relation $R_L$ is right-invariant, and $R_M$ is a refinement of $R_L$ if $L = L(M)$ for a DFA $M$; that is, $x \, R_M \, y$ implies $x \, R_L \, y$.

10. We can construct a deterministic finite automaton $M_L$ directly from the equivalence relation $R_L$, if $R_L$ has FINITE index; that is, if the number of $R_L$-equivalence classes is finite.

We set $M_L = (K_L, \Sigma, \delta_L, s_L, F_L)$. Let $K_L$, the set of states of the machine $M_L$, be the collection of all $R_L$-equivalence classes; write them as $[x]_L$, for a string $x$. The transition function $\delta_L : K_L \times \Sigma \to K_L$ is given as:
\[ \delta_L([x]_L, a) = [xa]_L. \]
Note: $\delta_L$ is well defined, because $R_L$ is right-invariant. Set $s_L = [e]_L$ and $F_L = \{[x]_L : x \in L\}$.

The minimum state automaton accepting $L$ is given by $M_L$; further, any other minimum state automaton that accepts $L$ can be identified with $M_L$ by a re-labeling of its states.
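The relation $R_L$ quantifies over all suffixes $z$, so it cannot be computed by direct enumeration. The Python sketch below only approximates it under stated assumptions: strings $x$ up to length `max_prefix` are grouped by how $xz$ behaves for suffixes $z$ up to length `max_suffix`. The names `words` and `approx_classes`, the two bounds, and the sample language (an even number of a's) are all hypothetical choices made here; if the bounds are too small, distinct classes may be merged or missed.

```python
from itertools import product

def words(alphabet, max_len):
    """All strings over the alphabet of length at most max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def approx_classes(in_language, alphabet, max_prefix, max_suffix):
    """Group prefixes x by the membership pattern of xz over bounded suffixes z."""
    suffixes = list(words(alphabet, max_suffix))
    classes = {}
    for x in words(alphabet, max_prefix):
        signature = tuple(in_language(x + z) for z in suffixes)
        classes.setdefault(signature, []).append(x)
    return list(classes.values())

# Hypothetical sample language: strings over {a, b} with an even number of a's.
def even_as(w):
    return w.count("a") % 2 == 0

for members in approx_classes(even_as, "ab", max_prefix=3, max_suffix=2):
    print(members[0] or "e", "represents", len(members), "strings")
```

For this language the sketch finds two groups, matching the two states of the minimum state automaton: the class $[e]_L$ of strings with an even number of a's and the class $[a]_L$ of strings with an odd number.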