Theory of Computation Class Notes 1


Based on the books by Sudkamp and by Hopcroft, Motwani and Ullman.


Contents

1 Introduction
   Sets
   Functions and Relations
   Countable and uncountable sets
   Proof Techniques
2 Languages and Grammars
   Languages
   Regular Expressions
   Grammars
   Classification of Grammars and Languages
   Normal Forms of Context-Free Grammars
      Chomsky Normal Form (CNF)
      Greibach Normal Form (GNF)
3 Finite State Automata
   Deterministic Finite Automata (DFA)
   Nondeterministic Finite Automata (NFA)
   NFA with Epsilon Transitions (NFA-ε or ε-NFA)
4 Finite Automata and Regular Sets
   Removing Nondeterminism
   Expression Graphs
   Regular Languages and Sets
   Regular Grammars and Finite Automata
   Closure Properties of Regular Sets
   Pumping Lemma for Regular Languages
5 Pushdown Automata and Context-Free Languages
   Pushdown Automata
   Variations on the PDA Theme
   Pushdown Automata and Context-Free Languages
   The Pumping Lemma for Context-Free Languages
   Closure Properties of Context-Free Languages
   A Two-Stack Automaton
6 Turing Machines
   The Standard Turing Machine
      Notation for the Turing Machine
   Turing Machines as Language Acceptors
   Alternative Acceptance Criteria
   Multitrack Machines
   Two-Way Tape Machines
   Multitape Machines
   Nondeterministic Turing Machines
   Turing Machines as Language Enumerators
7 The Chomsky Hierarchy
   The Chomsky Hierarchy
8 Decidability
   Decision Problems
   The Church-Turing Thesis
   The Halting Problem for Turing Machines
   A Universal Machine
   The Post Correspondence Problem
9 Undecidability
   Problems That Computers Cannot Solve
      Programs that Print "Hello, World"
      The Hypothetical "Hello, World" Tester
      Reducing One Problem to Another
   A Language That Is Not Recursively Enumerable
      Enumerating the Binary Strings
      Codes for Turing Machines
      The Diagonalization Language
      Proof that Ld is not Recursively Enumerable
   Complements of Recursive and RE Languages
   The Universal Language
      Undecidability of the Universal Language
   Undecidable Problems About Turing Machines
      Reductions
      Turing Machine That Accepts the Empty Language
      Rice's Theorem and Properties of RE Languages
   Post's Correspondence Problem
      The Modified PCP
   Other Undecidable Problems
      Undecidability of Ambiguity for CFGs
      The Complement of a List Language
10 Intractable Problems
   The Classes P and NP
      Problems Solvable in Polynomial Time
      An Example: Kruskal's Algorithm
      An NP Example: The Travelling Salesman Problem
   NP-Complete Problems
      The Satisfiability Problem
      NP-Completeness of 3SAT

List of Figures

2.1 Derivation tree
3.1 Example DFA
L(M1) ∪ L(M2)
L(M1)L(M2)
L(M1)*
Sample Union Construction
Machines that accept the primitive regular sets
An NFA-ε
Equivalent DFA
Expression Graph
Expression Graph Transformation: (a) w, (b) w1 w2(w3 w4 (w1) w2)
Example (a), (b)
Example (a), (b)
Example (c), (d)
Example NFA accepts a (a b+)
Example L = {a^i | i ≥ 0} ∪ {a^i b^i | i ≥ 0}
PDA for L(M) = {ww^R}
Pumping Lemma for CFL
A Turing Machine
Turing Machine COPY
TM accepting (a ∪ b)*aa(a ∪ b)*
TM accepting a^i b^i c^i
A k-tape TM for L = a^i b^i c^i
Halting Machine
Turing Machine D with R(M) as input
Turing Machine D with R(D) as input
Universal Machine
Post Correspondence System
Post Correspondence Solution
Example Hello-World Program
Fermat's last theorem expressed as a hello-world program
A hypothetical program H that is a hello-world detector
9.4 H1 behaves like H, but it says "hello, world" instead of "no"
H2 behaves like H1, but uses its input P as both P and I
What does H2 do when given itself as input?
Reduction of P1 to P2
The table that represents acceptance of strings by Turing machines
Relationship between the recursive, RE, and non-RE languages
Construction of a TM accepting the complement of a recursive language
Simulation of two TMs accepting a language and its complement
Organization of a universal Turing machine
Reduction of Ld to Lu
Reductions turn positive instances into positive and negative to negative
Construction of an NTM to accept Lne
Plan of TM M constructed from (M, w)
Construction of a machine for the proof of Rice's Theorem
Turing Machine that accepts after guessing 10 strings
Turing Machine that simulates M on w
Turing Machine for L Lu
A graph

Chapter 1
Introduction

1.1 Sets

A set is a collection of elements. To indicate that x is an element of the set S, we write x ∈ S. The statement that x is not in S is written as x ∉ S. A set is specified by enclosing some description of its elements in curly braces; for example, the set of all natural numbers 0, 1, 2, ... is denoted by

N = {0, 1, 2, 3, ...}

We use ellipses (i.e., ...) when the meaning is clear; thus Jn = {1, 2, 3, ..., n} represents the set of all natural numbers from 1 to n. When the need arises, we use more explicit notation, in which we write

S = {i | i > 0, i is even}

for the set of positive even numbers. We read this as "S is the set of all i such that i is greater than zero and i is even."

Considering a universal set U, the complement S′ of S is defined as

S′ = {x | x ∈ U and x ∉ S}

The usual set operations are union (∪), intersection (∩), and difference (−), defined as

S1 ∪ S2 = {x | x ∈ S1 or x ∈ S2}
S1 ∩ S2 = {x | x ∈ S1 and x ∈ S2}
S1 − S2 = {x | x ∈ S1 and x ∉ S2}

The set with no elements, called the empty set, is denoted by ∅. It is obvious that

S ∪ ∅ = S − ∅ = S
S ∩ ∅ = ∅
∅′ = U
(S′)′ = S

A set S1 is said to be a subset of S if every element of S1 is also an element of S. We write this as S1 ⊆ S. If S1 ⊆ S, but S contains an element not in S1, we say that S1 is a proper subset of S; we write this as S1 ⊂ S.
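The set operations above map directly onto Python's built-in set type; the following sketch (with a small universal set chosen purely for illustration) checks the identities just stated.

```python
U = set(range(10))           # a small universal set, chosen for illustration
S1 = {0, 2, 4, 6, 8}
S2 = {0, 1, 2, 3}

union = S1 | S2              # S1 ∪ S2
inter = S1 & S2              # S1 ∩ S2
diff = S1 - S2               # S1 − S2
comp = U - S1                # complement S1′ relative to U

assert S1 | set() == S1 - set() == S1    # S ∪ ∅ = S − ∅ = S
assert S1 & set() == set()               # S ∩ ∅ = ∅
assert U - set() == U                    # ∅′ = U
assert U - (U - S1) == S1                # (S′)′ = S
assert {0, 2} < S1 <= U                  # proper subset, and subset
```

Python's `<` and `<=` on sets are exactly the proper-subset and subset relations defined above.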

The following identities are known as de Morgan's laws:

1. (S1 ∪ S2)′ = S1′ ∩ S2′
2. (S1 ∩ S2)′ = S1′ ∪ S2′

To prove the first law:

x ∈ (S1 ∪ S2)′
⟺ x ∈ U and x ∉ S1 ∪ S2                      (def. complement)
⟺ x ∈ U and not (x ∈ S1 or x ∈ S2)           (def. union)
⟺ x ∈ U and (x ∉ S1 and x ∉ S2)              (negation of disjunction)
⟺ (x ∈ U and x ∉ S1) and (x ∈ U and x ∉ S2)
⟺ x ∈ S1′ and x ∈ S2′                        (def. complement)
⟺ x ∈ S1′ ∩ S2′                              (def. intersection)

If S1 and S2 have no common element, that is, S1 ∩ S2 = ∅, then the sets are said to be disjoint.

A set is said to be finite if it contains a finite number of elements; otherwise it is infinite. The size of a finite set is the number of elements in it; this is denoted by |S| (or #S).

A set may have many subsets. The set of all subsets of a set S is called the power set of S and is denoted by 2^S or P(S). Observe that 2^S is a set of sets.

Example: If S is the set {1, 2, 3}, then its power set is

2^S = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

Here |S| = 3 and |2^S| = 8. This is an instance of a general result: if S is finite, then

|2^S| = 2^|S|

Proof (by induction on the number of elements in S):

Basis: |S| = 1. Then 2^S = {∅, S}, so |2^S| = 2^1 = 2.

Induction hypothesis: Assume the property holds for all sets S with k elements.

Induction step: Show that the property holds for all sets with k + 1 elements. Denote

S_{k+1} = {y1, y2, ..., y_{k+1}} = S_k ∪ {y_{k+1}}

where S_k = {y1, y2, y3, ..., y_k}. Then

2^{S_{k+1}} = 2^{S_k} ∪ { {y_{k+1}}, {y1, y_{k+1}}, {y2, y_{k+1}}, ..., {x, y, y_{k+1}} (x, y ∈ S_k), ..., S_{k+1} }

that is, every subset of S_{k+1} either does not contain y_{k+1} (and so is a subset of S_k) or is obtained by adding y_{k+1} to a subset of S_k. 2^{S_k} has 2^k elements by the induction hypothesis, and the number of sets in 2^{S_{k+1}} which contain y_{k+1} is also 2^k. Consequently |2^{S_{k+1}}| = 2 · 2^k = 2^{k+1}.

A set which has as its elements ordered sequences of elements from other sets is called the Cartesian product of the other sets. For the Cartesian product of two sets, which itself is a set of ordered pairs, we write

S = S1 × S2 = {(x, y) | x ∈ S1, y ∈ S2}

Example: Let S1 = {1, 2} and S2 = {1, 2, 3}. Then

S1 × S2 = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}

Note that the order in which the elements of a pair are written matters; the pair (3, 2) is not in S1 × S2.

Example: If A is the set of throws of a coin, i.e., A = {head, tail}, then

A × A = {(head, head), (head, tail), (tail, head), (tail, tail)}

is the set of all possible throws of two coins.

The notation is extended in an obvious fashion to the Cartesian product of more than two sets; generally

S1 × S2 × ... × Sn = {(x1, x2, ..., xn) | xi ∈ Si}

1.2 Functions and Relations

A function is a rule that assigns to elements of one set (the function's domain) a unique element of another set (the range). We write f : S1 → S2 to indicate that the domain of the function f is a subset of S1 and that the range of f is a subset of S2. If the domain of f is all of S1, we say that f is a total function on S1; otherwise f is said to be a partial function on S1.

1. Domain of f: D_f = {x ∈ S1 | (x, y) ∈ f for some y ∈ S2}
2. Range of f: R_f = {y ∈ S2 | (x, y) ∈ f for some x ∈ S1}

3. The restriction of f to A ⊆ S1: f|A = {(x, y) ∈ f | x ∈ A}
4. The inverse f⁻¹ : S2 → S1 is {(y, x) | (x, y) ∈ f}
5. f : S1 → S1 is called a function on S1
6. If x ∈ D_f then f is defined at x; otherwise f is undefined at x
7. f is a total function if D_f = S1
8. f is a partial function if D_f ⊂ S1
9. f is an onto function or surjection if R_f = S2; if R_f ⊂ S2, then f is a function from S1 (D_f) into S2
10. f is a one-to-one function or injection if f(x) = z and f(y) = z imply x = y
11. A total function f is a bijection if it is both an injection and a surjection

A function can be represented by a set of pairs {(x1, y1), (x2, y2), ...}, where each xi is an element in the domain of the function, and yi is the corresponding value in its range. For such a set to define a function, each xi can occur at most once as the first element of a pair. If this condition is not satisfied, the set is called a relation.

A specific kind of relation is an equivalence relation. A relation r on X is an equivalence relation if it satisfies three rules:

the reflexivity rule: (x, x) ∈ r for all x ∈ X;
the symmetry rule: if (x, y) ∈ r then (y, x) ∈ r, for all x, y ∈ X;
the transitivity rule: if (x, y) ∈ r and (y, z) ∈ r then (x, z) ∈ r, for all x, y, z ∈ X.

An equivalence relation on X induces a partition of X into disjoint subsets called equivalence classes Xj, with ⋃j Xj = X, such that elements from the same class belong to the relation, and any two elements taken from different classes are not in the relation.

Example: The relation congruence mod m (modulo m) on the set of the integers Z: i ≡ j (mod m) if i − j is divisible by m. Z is partitioned into m equivalence classes:

{..., −2m, −m, 0, m, 2m, ...}
{..., −2m + 1, −m + 1, 1, m + 1, 2m + 1, ...}
{..., −2m + 2, −m + 2, 2, m + 2, 2m + 2, ...}
...
{..., −m − 1, −1, m − 1, 2m − 1, 3m − 1, ...}
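As an illustrative sketch, the congruence classes can be computed over a finite window of Z and the three equivalence-relation rules checked there (the window and the choice m = 3 are arbitrary):

```python
m = 3
window = range(-9, 10)                        # a finite window into Z

classes = {}
for i in window:
    classes.setdefault(i % m, []).append(i)   # i, j related iff (i - j) % m == 0

def related(i, j):
    return (i - j) % m == 0

# Reflexivity, symmetry and transitivity hold on the window:
assert all(related(i, i) for i in window)
assert all(related(j, i) for i in window for j in window if related(i, j))
assert all(related(i, k) for i in window for j in window for k in window
           if related(i, j) and related(j, k))
assert len(classes) == m                      # exactly m disjoint classes
```

Python's `%` returns a non-negative remainder for a positive modulus, so negative integers land in the expected class.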

1.3 Countable and uncountable sets

Cardinality is a measure that compares the size of sets. The cardinality of a finite set is the number of elements in it, and can thus be obtained by counting the elements of the set. Two sets X and Y have the same cardinality if there is a total one-to-one function from X onto Y (i.e., a bijection from X to Y). The cardinality of a set X is less than or equal to the cardinality of a set Y if there is a total one-to-one function from X into Y. We denote the cardinality of X by #X or |X|.

A set that has the same cardinality as the set of natural numbers N is said to be countably infinite or denumerable. Sets that are either finite or denumerable are referred to as countable sets. The elements of a countable set can be indexed (or enumerated) using N as the index set. Sets that are not countable are said to be uncountable.

The cardinality of denumerable sets is #N = ℵ0 (aleph-zero). The cardinality of the set of the real numbers is #R = ℵ1 (aleph-one; identifying the cardinality of R with ℵ1 assumes the continuum hypothesis).

A set is infinite if it has a proper subset of the same cardinality.

Example: The set J = N − {0} is countably infinite; the function s(n) = n + 1 defines a one-to-one mapping from N onto J. The set J, obtained by removing an element from N, has the same cardinality as N. Clearly, there is no one-to-one mapping of a finite set onto a proper subset of itself. It is this property that differentiates finite and infinite sets.

Example: The set of odd natural numbers is denumerable. The function f(n) = 2n + 1 establishes the bijection between N and the set of the odd natural numbers.

The one-to-one correspondence between the natural numbers and the set of all integers exhibits the countability of the set of integers.
A correspondence is defined by the function

f(n) = (n + 1)/2   if n is odd
f(n) = −n/2        if n is even

Example: #Q+ = #J = #N, where Q+ is the set of the rational numbers p/q > 0, with p and q integers, q ≥ 1.

1.4 Proof Techniques

We will give examples of proof by induction, proof by contradiction, and proof by Cantor diagonalization.

In proof by induction, we have a sequence of statements P1, P2, ..., about which we want to make some claim. Suppose that we know that the claim holds for all statements P1, P2, ..., up to Pn. We then try to argue that this implies that the claim also holds for Pn+1. If we can carry out this inductive step for all positive n, and if we have some starting point for the induction, we can say that the claim holds for all statements in the sequence. The starting point for an induction is called the basis. The assumption that the claim holds for statements P1, P2, ..., Pn is the induction hypothesis, and the argument connecting the induction

hypothesis to Pn+1 is the induction step. Inductive arguments become clearer if we explicitly show these three parts.

Example: Let us prove by mathematical induction that

∑_{i=0}^{n} i² = n(n+1)(2n+1)/6

(a) We establish the basis by substituting 0 for n and observing that both sides are 0.

(b) For the induction hypothesis, we assume that the property holds for n = k:

∑_{i=0}^{k} i² = k(k+1)(2k+1)/6

(c) In the induction step, we show that the property holds for n = k + 1, i.e.,

∑_{i=0}^{k+1} i² = (k+1)(k+2)(2k+3)/6

Since ∑_{i=0}^{k+1} i² = ∑_{i=0}^{k} i² + (k+1)², in view of the induction hypothesis we need only show that

k(k+1)(2k+1)/6 + (k+1)² = (k+1)(k+2)(2k+3)/6

The latter equality follows from simple algebraic manipulation.

In a proof by contradiction, we assume the opposite (the contrary) of the property to be proved; then we show that the assumption is untenable.

Example: Show that √2 is not a rational number. As in all proofs by contradiction, we assume the contrary of what we want to show. Here we assume that √2 is a rational number, so that it can be written as √2 = n/m, where n and m are integers without a common factor. Rearranging, we have

2m² = n²

Therefore n² must be even. This implies that n is even, so that we can write n = 2k, hence

2m² = 4k² and m² = 2k²

Therefore m is even. But this contradicts our assumption that n and m have no common factor. Thus, such m and n cannot exist, and √2 is not a rational number.

This example exhibits the essence of a proof by contradiction: by making a certain assumption we are led to a contradiction of the assumption or of some known fact. If all steps in our argument are logically sound, we must conclude that our initial assumption was false.

To illustrate Cantor's diagonalization method, we prove that the set A = {f | f a total function, f : N → N} is uncountable. This is essentially a proof by contradiction, so we assume that A is countable, i.e., that we can give an enumeration f0, f1, f2, ... of A. To arrive at a contradiction, we construct a new function f as

f(x) = f_x(x) + 1   for all x ∈ N

The function f is constructed from the diagonal of the table of function values of the f_i ∈ A, represented below. For each x, f differs from f_x on input x; hence f does not appear in the given enumeration. However, f is total and f : N → N, so it should. Such an f can be constructed for any chosen enumeration, so A cannot be enumerated; hence A is uncountable.

f0:  f0(0)  f0(1)  f0(2)  ...
f1:  f1(0)  f1(1)  f1(2)  ...
f2:  f2(0)  f2(1)  f2(2)  ...
f3:  f3(0)  f3(1)  f3(2)  ...

Remarks: The set of all infinite sequences of 0's and 1's is uncountable. With each infinite sequence of 0's and 1's we can associate a real number in the range [0, 1). As a consequence, the set of real numbers in the range [0, 1) is uncountable, and so is the set of all real numbers.
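The diagonal construction can be sketched on a finite scale: given any list of total functions (three arbitrary stand-ins below), the function f(x) = f_x(x) + 1 disagrees with the x-th function at input x, so it cannot appear in the list.

```python
# Stand-ins for the start of an enumeration f_0, f_1, f_2 (any total functions work).
fs = [lambda x: x, lambda x: 2 * x, lambda x: x * x + 1]

def f(x):
    return fs[x](x) + 1      # the diagonal value, bumped by one

# f disagrees with f_x at input x, so f is not any of the listed functions:
assert all(f(x) != fs[x](x) for x in range(len(fs)))
```

The same disagreement holds no matter how many functions the list contains, which is the heart of the uncountability argument.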


Chapter 2
Languages and Grammars

2.1 Languages

We start with a finite, nonempty set Σ of symbols, called the alphabet. From the individual symbols we construct strings (over Σ, or on Σ), which are finite sequences of symbols from the alphabet. The empty string ε is the string with no symbols at all. Any set of strings over Σ is a language over Σ.

Example:
Σ = {c}
L1 = {cc}
L2 = {c, cc, ccc}
L3 = {w | w = c^k, k = 0, 1, 2, ...}

Example:
Σ = {a, b}
L1 = {ab, ba, aa, bb, ε}
L2 = {w | w = (ab)^k, k = 0, 1, 2, 3, ...} = {ε, ab, abab, ababab, ...}

The concatenation of two strings w and v is the string obtained by appending the symbols of v to the right end of w; that is, if

w = a1 a2 ... an   and   v = b1 b2 ... bm,

then the concatenation of w and v, denoted by wv, is

wv = a1 a2 ... an b1 b2 ... bm

If w is a string, then w^n is the string obtained by concatenating w with itself n times. As a special case, we define w^0 = ε

for all w. Note that εw = wε = w for all w.

The reverse of a string is obtained by writing the symbols in reverse order; if w is a string as shown above, then its reverse w^R is

w^R = an ... a2 a1

If w = uv, then u is said to be a prefix and v a suffix of w.

The length of a string w, denoted by |w|, is the number of symbols in the string. Note that |ε| = 0. If u and v are strings, then the length of their concatenation is the sum of the individual lengths:

|uv| = |u| + |v|

Let us show that |uv| = |u| + |v|. To prove this by induction on the length of strings, let us define the length of a string recursively, by

|a| = 1
|wa| = |w| + 1

for all a ∈ Σ and w any string on Σ. This definition is a formal statement of our intuitive understanding of the length of a string: the length of a single symbol is one, and the length of any string is incremented by one if we add another symbol to it.

Basis: |uv| = |u| + |v| holds for all u of any length and all v of length 1 (by definition).

Induction hypothesis: We assume that |uv| = |u| + |v| holds for all u of any length and all v of length 1, 2, ..., n.

Induction step: Take any v of length n + 1 and write it as v = wa. Then |v| = |w| + 1 and |uv| = |uwa| = |uw| + 1. By the induction hypothesis (which is applicable since w is of length n), |uw| = |u| + |w|, so that |uv| = |u| + |w| + 1 = |u| + |v|, which completes the induction step.

If Σ is an alphabet, then we use Σ* to denote the set of strings obtained by concatenating zero or more symbols from Σ, and Σ+ = Σ* − {ε} for the set of nonempty strings. The sets Σ* and Σ+ are always infinite, since there is no limit on the length of the strings in these sets. A language can thus be defined as a subset of Σ*. A string w in a language L is also called a word or a sentence of L.

Example:

Σ = {a, b}. Then Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}. The set {a, aa, aab} is a language on Σ; because it has a finite number of words, we call it a finite language. The set L = {a^n b^n | n ≥ 0} is also a language on Σ. The strings aabb and aaaabbbb are words in the language L, but the string abb is not in L. This language is infinite.

Since languages are sets, the union, intersection, and difference of two languages are immediately defined. The complement of a language is defined with respect to Σ*; that is, the complement of L is

L′ = Σ* − L

The concatenation of two languages L1 and L2 is the set of all strings obtained by concatenating any element of L1 with any element of L2; specifically,

L1L2 = {xy | x ∈ L1 and y ∈ L2}

We define L^n as L concatenated with itself n times, with the special case

L^0 = {ε}

for every language L.

Example: L1 = {a, aaa} and L2 = {b, bbb}; then

L1L2 = {ab, abbb, aaab, aaabbb}

Example: For L = {a^n b^n | n ≥ 0},

L² = {a^n b^n a^m b^m | n ≥ 0, m ≥ 0}

The string aabbaaabbb is in L².

The star-closure or Kleene closure of a language is defined as

L* = L^0 ∪ L^1 ∪ L^2 ∪ ... = ⋃_{i=0}^{∞} L^i

and the positive closure as

L+ = L^1 ∪ L^2 ∪ ... = ⋃_{i=1}^{∞} L^i
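For finite languages these operations are easy to prototype. The helpers below are illustrative sketches (not from the text): concatenation is computed exactly, while the Kleene closure is truncated at a length bound, since L* itself is infinite.

```python
def concat(L1, L2):
    """L1L2 = {xy | x in L1, y in L2}, for finite languages given as sets."""
    return {x + y for x in L1 for y in L2}

def kleene(L, max_len):
    """All strings of L* of length <= max_len (L* itself is infinite)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w + x for w in frontier for x in L
                    if x and len(w + x) <= max_len} - result
        result |= frontier
    return result

L1, L2 = {"a", "aaa"}, {"b", "bbb"}
assert concat(L1, L2) == {"ab", "abbb", "aaab", "aaabbb"}   # example above
assert kleene({"ab"}, 6) == {"", "ab", "abab", "ababab"}
```

The `if x` guard skips ε inside L so the loop cannot cycle; ε is supplied once as L^0 = {""}.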

2.2 Regular Expressions

Definition: Let Σ be a given alphabet. Then,

1. ∅, ε (representing {ε}), and a (representing {a}) for every a ∈ Σ are regular expressions. They are called primitive regular expressions.
2. If r1 and r2 are regular expressions, so are (r1), (r1*), (r1 + r2), and (r1·r2).
3. A string is a regular expression if it can be derived from the primitive regular expressions by applying a finite number of the operations +, * and concatenation.

A regular expression denotes a set of strings, which is therefore referred to as a regular set or language. Regarding the notation of regular expressions, texts will usually print them boldface; however, we assume that it will be understood that, in the context of regular expressions, ε is used to represent {ε} and a is used to represent {a}.

Example: b*(ab*ab*)* is a regular expression.

Example:

c + da*bb denotes {c, dbb, dabb, daabb, ...}
(c + da*bb)* = {ε, c, cc, ..., dbb, dbbdbb, ..., dabb, dabbdabb, ..., cdbb, cdabb, ...}

Beyond the usual properties of + and concatenation, important equivalences involving regular expressions concern properties of the closure (Kleene star) operation. Some are given below, where α, β, γ stand for arbitrary regular expressions:

1. (α*)* = α*
2. αα* = α*α
3. αα* + ε = α*
4. α(β + γ) = αβ + αγ
5. α(βα)* = (αβ)*α
6. (α + β)* = (α* + β*)*
7. (α + β)* = (α*β*)*
8. (α + β)* = α*(βα*)*

In general, the distributive law does not hold for the closure operation. For example, the statement α* + β* = (α + β)* is false, because the left-hand side denotes no string in which both α and β appear, while the right-hand side does.
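The closure identities can be spot-checked empirically with Python's `re` module (whose syntax writes + as `|`). Below, identity 8 is tested with α = a and β = b over all strings of length at most 6, and the failed distribution claim is witnessed by the string ab.

```python
import re
from itertools import product

# Identity 8: (α + β)* = α*(βα*)*, instantiated with α = a, β = b.
lhs = re.compile(r"(a|b)*")
rhs = re.compile(r"a*(ba*)*")

strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
assert all(bool(lhs.fullmatch(w)) == bool(rhs.fullmatch(w)) for w in strings)

# The distribution claim α* + β* = (α + β)* fails: 'ab' separates the two sides.
assert re.fullmatch(r"(a|b)*", "ab") and not re.fullmatch(r"a*|b*", "ab")
```

A check over bounded lengths is of course only evidence, not a proof, but it quickly exposes false identities.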

2.3 Grammars

Definition: A grammar G is defined as a quadruple G = (V, Σ, S, P) where

V is a finite set of symbols called variables or nonterminals,
Σ is a finite set of symbols called terminal symbols or terminals,
S ∈ V is a special symbol called the start symbol, and
P is a finite set of productions (also called rules or production rules).

We assume V and Σ are non-empty and disjoint sets.

Production rules specify the transformation of one string into another. They are of the form

x → y

where x ∈ (V ∪ Σ)+ and y ∈ (V ∪ Σ)*.

Given a string w of the form w = uxv, we say that the production x → y is applicable to this string, and we may use it to replace x with y, thereby obtaining the new string z = uyv; we write w ⇒ z and say that w derives z, or that z is derived from w. Successive strings are derived by applying the productions of the grammar in arbitrary order. A production can be used whenever it is applicable, and it can be applied as often as desired. If

w1 ⇒ w2 ⇒ w3 ⇒ ... ⇒ wn

we say that w1 derives wn, and write w1 ⇒* wn. The * indicates that an unspecified number of steps (including zero) can be taken to derive wn from w1. Thus w ⇒* w is always the case. If we want to indicate that at least one production must be applied, we can write w ⇒+ v.

Let G = (V, Σ, S, P) be a grammar. Then the set

L(G) = {w ∈ Σ* | S ⇒* w}

is the language generated by G. If w ∈ L(G), then the sequence

S ⇒ w1 ⇒ w2 ⇒ ... ⇒ w

is a derivation of the sentence (or word) w. The strings S, w1, w2, ... are called sentential forms of the derivation.
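The derivation relation ⇒ can be prototyped as a breadth-first search over sentential forms. The sketch below uses a toy grammar chosen for illustration, with productions S → aSb and S → ε (ε written as the empty string), and collects all derivable sentences up to a length bound.

```python
from collections import deque

productions = {"S": ["aSb", ""]}      # illustrative grammar: S -> aSb | ε

def derive(start, max_len):
    """Words w with start =>* w and |w| <= max_len."""
    words, queue, seen = set(), deque([start]), {start}
    while queue:
        form = queue.popleft()
        nts = [i for i, c in enumerate(form) if c in productions]
        if not nts:                   # no nonterminals left: a sentence
            words.add(form)
            continue
        i = nts[0]                    # rewrite the leftmost nonterminal
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # prune once the terminal content already exceeds the bound
            if len(new.replace("S", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

assert derive("S", 6) == {"", "ab", "aabb", "aaabbb"}
```

Since this grammar is context-free, rewriting the leftmost nonterminal loses no sentences; a general grammar would need to try every occurrence of every left-hand side.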

Example: Consider the grammar G = ({S}, {a, b}, S, P) with P given by

S → aSb
S → ε

Then S ⇒ aSb ⇒ aaSbb ⇒ aabb, so we can write S ⇒* aabb. The string aabb is a sentence in the language generated by G.

Example:

P:
<sentence> → <Noun phrase><Verb phrase>
<Noun phrase> → <Determiner><Adjective><Noun>
<Noun phrase> → <Article><Noun>
<Verb phrase> → <Verb><Noun phrase>
<Determiner> → This
<Adjective> → Old
<Noun> → Man | Bus
<Verb> → Missed
<Article> → The

Example:

<expression> → <variable> | <expression><operation><expression>
<variable> → A | B | C | ... | Z
<operation> → + | − | * | /

Leftmost derivation:

<expression> ⇒ <expression><operation><expression>
⇒ <variable><operation><expression>
⇒ A<operation><expression>
⇒ A+<expression>
⇒ A+<expression><operation><expression>
⇒ A+<variable><operation><expression>
⇒ A+B<operation><expression>
⇒ A+B*<expression>
⇒ A+B*<variable>
⇒ A+B*C

Figure 2.1: Derivation tree

This is a leftmost derivation of the string A + B * C in the grammar (corresponding to A + (B * C)). Note that another leftmost derivation can be given for the above expression. A grammar G (such as the one above) is called ambiguous if some string in L(G) has more than one leftmost derivation. An unambiguous grammar for the language is the following:

<expr> → <multi expr> | <multi expr><add op><expr>
<multi expr> → <variable> | <variable><multi op><variable>
<multi op> → * | /
<add op> → + | −
<variable> → A | B | C | ... | Z

Note that, for an inherently ambiguous language L, every grammar that generates L is ambiguous.

Example: Show that L(G) = L, where

G: S → ε | aSb | bSa | SS
L = {w | n_a(w) = n_b(w)}

1. L(G) ⊆ L. (All strings derived by G are in L.) For w ∈ L(G), every production of G adds the same number of a's as b's, so n_a(w) = n_b(w) and hence w ∈ L.

2. L ⊆ L(G). Let w ∈ L. By definition of L, n_a(w) = n_b(w). We show that w ∈ L(G) by induction (on the

length of w).

Basis: ε is in both L and L(G). For |w| = 2, the only two strings of length 2 in L are ab and ba:

S ⇒ aSb ⇒ ab
S ⇒ bSa ⇒ ba

Induction hypothesis: For w ∈ L with 2 ≤ |w| ≤ 2i, we assume that w ∈ L(G).

Induction step: Let w1 ∈ L, |w1| = 2i + 2.

(a) w1 is of the form w1 = awb (or bwa), where |w| = 2i. Then w ∈ L(G) by the induction hypothesis, and we derive w1 = awb using the rule S → aSb (respectively w1 = bwa using the rule S → bSa).

(b) w1 = awa or w1 = bwb. Let us assign a count of +1 to a and −1 to b; thus for w1 ∈ L the total count is 0. We show that the count passes through 0 at least once strictly inside w1 = awa (the case bwb is similar): after the leading a the running count is +1, and just before the trailing a it is −1, so somewhere in between it passes through 0. At that point we can split w1 = w′w″ with w′ ∈ L and w″ ∈ L. We also have |w′| ≥ 2 and |w″| ≥ 2, so that |w′| ≤ 2i and |w″| ≤ 2i; hence w′, w″ ∈ L(G) by the induction hypothesis, and w1 = w′w″ can be derived in G from w′ and w″ using the rule S → SS.

Example:

L(G) = {a^(2^n) | n ≥ 0}
G = (V, T, S, P) where V = {S, [, ], A, D}, T = {a}, and P:

S → [A]
[ → [D | ε
D] → ]
DA → AAD
] → ε
A → a

For example, let us derive a^4:

S ⇒ [A] ⇒ [DA] ⇒ [AAD] ⇒ [AA] ⇒ [DAA] ⇒ [AADA] ⇒ [AAAAD] ⇒ [AAAA] ⇒ AAAA] ⇒ AAAA ⇒* aaaa = a^4

Example:

L(G) = {w ∈ {a, b, c}* | n_a(w) = n_b(w) = n_c(w)}
V = {A, B, C, S}, T = {a, b, c}
P:

S → ε | ABCS
AB → BA, AC → CA, BC → CB
BA → AB, CA → AC, CB → BC
A → a, B → b, C → c

Derive ccbaba.

Solution:

S ⇒ ABCS ⇒ ABCABCS ⇒ ABCABC (using S → ε)
⇒* ACBACB ⇒* CABCAB ⇒* CACBBA ⇒* CCABBA ⇒* CCBABA (interchange rules)
⇒* ccbaba (A → a, B → b, C → c)

Example:

S → ε | aSb
L(G) = {ε, ab, aabb, aaabbb, ...}
L = {a^i b^i | i ≥ 0}

To prove that L = L(G), we show:
1. L(G) ⊆ L
2. L ⊆ L(G)

2. L ⊆ L(G): Let w ∈ L, w = a^k b^k. We apply S → aSb (k times) and then S → ε, thus

S ⇒* a^k S b^k ⇒ a^k b^k

1. L(G) ⊆ L: We need to show that, if w can be derived in G, then w ∈ L. ε is in the language, by definition. We first show that all sentential forms are of the form a^i S b^i, by induction on the length of the sentential form.

Basis (i = 1): aSb is a sentential form, since S → aSb.

Induction hypothesis: A sentential form of length 2i + 1 is of the form a^i S b^i.

Induction step: A sentential form of length 2(i + 1) + 1 = 2i + 3 is derived as a^i S b^i ⇒ a^i (aSb) b^i = a^(i+1) S b^(i+1).

To get a sentence, we must apply the production S → ε; i.e.,

S ⇒* a^i S b^i ⇒ a^i b^i

represents all possible derivations; hence G derives only strings of the form a^i b^i (i ≥ 0).

2.4 Classification of Grammars and Languages

A classification of grammars (and the corresponding classes of languages) is given with respect to the form of the grammar rules x → y, into the Type 1, Type 2 and Type 3 classes, respectively.

Type 1: If all the grammar rules x → y satisfy |x| ≤ |y|, then the grammar is context-sensitive or Type 1. Such a grammar G generates a language L(G) which is called a context-sensitive language. Note that x has to be of length at least 1, and thereby y too; hence it is not possible to derive the empty string in such a grammar.

Type 2: If all production rules x → y satisfy |x| = 1 (i.e., the left-hand side of each rule is a single variable), then the grammar is said to be context-free or Type 2.

Type 3: If the production rules are of the following forms:

A → xB
A → x

where x ∈ Σ* (a string of terminals, possibly empty) and A, B ∈ V (variables), then the grammar is called right-linear. Similarly, for a left-linear grammar, the production rules are of the form

A → Bx
A → x

For a regular grammar, the production rules are of the form

A → aB
A → a
A → ε

with a ∈ Σ. A language which can be generated by a regular grammar will (later) be shown to be regular. Note that a language can be generated by a regular grammar iff it can be generated by a right-linear grammar iff it can be generated by a left-linear grammar.

2.5 Normal Forms of Context-Free Grammars

2.5.1 Chomsky Normal Form (CNF)

Definition: A context-free grammar G = (V, Σ, P, S) is in Chomsky normal form if each rule is of the form

i) A → BC
ii) A → a
iii) S → ε

where B, C ∈ V − {S}.

Theorem: Let G = (V, Σ, P, S) be a context-free grammar. There is an algorithm to construct a grammar G′ = (V′, Σ, P′, S′) in Chomsky normal form that is equivalent to G (L(G′) = L(G)).

Example: Convert the given grammar G to CNF, where

G: S → aABC | a
   A → aA | a
   B → bcB | bc
   C → cC | c

Solution: Introducing variables T1, T2, T3 to break up the long right-hand sides, and B′, C′ for the terminals b and c, a CNF equivalent G′ can be given as:

G′: S → AT1 | a
    A → a | AA
    T1 → AT2
    T2 → BC
    B → B′T3 | B′C′
    T3 → C′B
    B′ → b
    C → C′C | c
    C′ → c

(Note that A → a | AA generates the same strings a+ as the original A → aA | a.)

2.5.2 Greibach Normal Form (GNF)

If a grammar is in GNF, then the length of the terminal prefix of the sentential form increases at every grammar rule application, which prevents left recursion.

Definition: A context-free grammar G = (V, Σ, P, S) is in Greibach normal form if each rule is of the form

i) A → aA1A2...An
ii) A → a
iii) S → ε

Chapter 3
Finite State Automata

3.1 Deterministic Finite Automata (DFA)

Definition: A deterministic finite automaton (DFA) is a quintuple M = (Q, Σ, δ, q0, F) where Q is a finite set of states, Σ a finite set of symbols called the alphabet, q0 ∈ Q a distinguished state called the start state, F a subset of Q consisting of the final or accepting states, and δ a total function from Q × Σ to Q called the transition function.

Example:

Figure 3.1: Example DFA

Some strings accepted by the machine are:

baab
baaab
babaabaaba
aaaa

All of the above strings are characterized by the presence of at least one aa substring. According to the definition of a DFA, the following are identified:

Q = {q0, q1, q2}
Σ = {a, b}
δ : Q × Σ → Q, δ(qi, a) = qj (where i may equal j)

and the mapping is given by the transition table below.

Transition Table:

δ    a    b
q0   q1   q0
q1   q2   q0
q2   q2   q2

A sample computation, on the string abaab, is represented as

[q0, abaab] ⊢ [q1, baab] ⊢ [q0, aab] ⊢ [q1, ab] ⊢ [q2, b] ⊢ [q2, ε]

Definition: Let M = (Q, Σ, δ, q0, F) be a DFA. The language of M, denoted L(M), is the set of strings in Σ* accepted by M.

A DFA can be considered as a language acceptor; the language recognized by the machine is the set of strings that are accepted by its computations. Two machines that accept the same language are said to be equivalent.

Definition: The extended transition function δ̂ of a DFA with transition function δ is a function from Q × Σ* to Q defined by recursion on the length of the input string.

i) Basis: |w| = 0. Then w = ε and δ̂(qi, ε) = qi. |w| = 1. Then w = a for some a ∈ Σ, and δ̂(qi, a) = δ(qi, a).
ii) Recursive step: Let w be a string of length n > 1. Then w = ua and δ̂(qi, ua) = δ(δ̂(qi, u), a).

The computation of a machine in state qi with string w halts in state δ̂(qi, w). A string w is accepted if δ̂(q0, w) ∈ F. Using this notation, the language of a DFA M is the set

L(M) = {w | δ̂(q0, w) ∈ F}

3.2 Nondeterministic Finite Automata (NFA)

Definition: A nondeterministic finite automaton is a quintuple M = (Q, Σ, δ, q0, F) where Q is a finite set of states, Σ a finite set of symbols called the alphabet, q0 ∈ Q a distinguished state known as the start state, F a subset of Q consisting of the final or accepting states, and δ a total function from Q × Σ to P(Q) known as the transition function.

Note that a deterministic finite automaton is considered a special case of a nondeterministic one. The transition function of a DFA specifies exactly one state that may be entered from a given state and on a given input symbol, while an NFA allows zero, one or more states to be entered. Hence, a string input to an NFA may generate several distinct computations.
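Returning to the example DFA, its transition table and the recursive extended transition function δ̂ can be written directly in Python as a sketch:

```python
# Transition table of the example DFA (accepting strings containing aa).
delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

def delta_hat(q, w):
    """Extended transition function, following the recursive definition."""
    if w == "":                        # basis: delta_hat(q, ε) = q
        return q
    u, a = w[:-1], w[-1]               # recursive step on w = ua
    return delta[(delta_hat(q, u), a)]

def accepts(w):
    return delta_hat("q0", w) in {"q2"}

assert accepts("abaab")                # ends the sample computation in q2
assert not accepts("abab")             # no aa substring
```

The recursion mirrors the definition exactly; an iterative loop over the symbols of w computes the same state.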
For the language over Σ = {a, b} in which each string has at least one occurrence of a double a, an NFA can be given with the following transition table:

δ    | a         | b
q0   | {q0, q1}  | {q0}
q1   | {q2}      | ∅
q2   | {q2}      | {q2}

Two computations on the string aabaa are given by

[q0, aabaa] ⊢ [q1, abaa] ⊢ [q2, baa] ⊢ [q2, ba] ⊢ [q2, a] ⊢ [q2, ε]

and

[q0, aabaa] ⊢ [q0, abaa] ⊢ [q0, baa] ⊢ [q0, ba] ⊢ [q1, a] ⊢ [q2, ε]

We will later show that a language accepted by an NFA is also accepted by a DFA. As an example, the language accepted by the above NFA is also accepted by the DFA of the example in Section 3.1.

Definition. The language of an NFA M, denoted L(M), is the set of strings accepted by M; that is, L(M) = {w | there is a computation [q0, w] ⊢* [qi, ε] with qi ∈ F}.

3.3 NFA with Epsilon Transitions (NFA-ε or ε-NFA)

So far in the discussion of finite automata, the reading head was required to move at each transition. An ε-transition allows the reading head of the automaton to remain at a cell during a transition.

Definition. A nondeterministic finite automaton with ε-transitions is a quintuple M = (Q, Σ, δ, q0, F) where Q, q0, and F are as in an NFA, and the transition function δ is a function from Q × (Σ ∪ {ε}) to P(Q).

Epsilon transitions can be used to combine existing machines into more complex composite machines. Let M1 and M2 be two finite automata, each consisting of a single start state and a single accepting state, where no arcs enter the start state and no arcs leave the accepting state. Composite machines that accept L(M1) ∪ L(M2), L(M1)L(M2), and L(M1)* are constructed from M1 and M2 as depicted in Figures 3.2 to 3.4.

Example. The NFA-ε of this example accepts the language over Σ = {a, b} in which each string has at least one occurrence of aa or bb. The states of the component machines M1 and M2 are given distinct names (the states of the bb-component are written here as p0, p1, p2). A possible computation on the string bbaaa is

[q0, bbaaa] ⊢ [p0, bbaaa] ⊢ [p1, baaa] ⊢ [p2, aaa] ⊢ [p2, aa] ⊢ [p2, a] ⊢ [p2, ε]
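The two computations on aabaa above arise because the machine may be in several states at once. Nondeterminism can be simulated directly by tracking the full set of reachable states after each symbol. This is a Python sketch of the NFA of Section 3.2; the encoding is an assumption of this sketch.

```python
# NFA transition table: a state and a symbol map to a *set* of states.
DELTA = {
    ('q0', 'a'): {'q0', 'q1'}, ('q0', 'b'): {'q0'},
    ('q1', 'a'): {'q2'},                     # delta(q1, b) is empty
    ('q2', 'a'): {'q2'},       ('q2', 'b'): {'q2'},
}
START, FINAL = 'q0', {'q2'}

def accepts(w: str) -> bool:
    """Accept if at least one computation ends in a final state."""
    current = {START}
    for symbol in w:
        # union of the successor sets of every currently reachable state
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINAL)
```

On aabaa the tracked sets are {q0}, {q0,q1}, {q0,q1,q2}, {q0,q2}, {q0,q1,q2}, {q0,q1,q2}, which contains a final state, so the string is accepted.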

Figure 3.2: L(M1) ∪ L(M2)
Figure 3.3: L(M1)L(M2)
Figure 3.4: L(M1)*

3.4 Finite Automata and Regular Sets

Theorem. The set of languages accepted by finite state automata consists precisely of the regular sets over Σ.

First we show that every regular set is accepted by some NFA-ε. This follows from the recursive definition of regular sets. The regular sets are built from the basis elements, {ε} and the singletons containing a symbol from the alphabet; machines that accept these sets are given in Figure 3.6. The composite regular sets are then constructed from the primitive regular sets using the union, concatenation, and Kleene star operations.

3.4.1 Removing Nondeterminism

Definition. The ε-closure of a state qi, denoted ε-closure(qi), is defined recursively by:
i) Basis: qi ∈ ε-closure(qi).
ii) Recursive step: let qj be an element of ε-closure(qi). If qk ∈ δ(qj, ε), then qk ∈ ε-closure(qi).

iii) Closure: qj is in ε-closure(qi) only if it can be obtained from qi by a finite number of applications of the operation in ii).

Figure 3.5: Sample Union Construction
Figure 3.6: Machines that accept the primitive regular sets

Algorithm 1: Construction of DM, a DFA equivalent to the NFA-ε M (see text).

Example. For the NFA-ε of Figure 3.7, we derive the DFA of Figure 3.8.

Figure 3.7: An NFA-ε. (Note: the diagram of the figure is missing a transition from FG to BCE on 1, and transitions on 0 and 1 at Φ.)
Figure 3.8: Equivalent DFA
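The ε-closure definition and the input-transition step of Algorithm 1 can be sketched as follows. The NFA-ε used here is a small assumed example; it is not the machine of Figure 3.7, whose full transition table is not reproduced in the text.

```python
from collections import deque

EPS = ''   # the empty string stands for an epsilon label

# Assumed example NFA-epsilon: q0 -e-> q1 -a-loop, q1 -e-> q2 -b-loop.
DELTA = {
    ('q0', EPS): {'q1'}, ('q1', 'a'): {'q1'},
    ('q1', EPS): {'q2'}, ('q2', 'b'): {'q2'},
}

def eps_closure(state):
    """All states reachable from `state` by zero or more epsilon arcs
    (the recursive definition, computed by breadth-first search)."""
    closure, queue = {state}, deque([state])
    while queue:
        q = queue.popleft()
        for r in DELTA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                queue.append(r)
    return closure

def dfa_transition(states, symbol):
    """One input transition of the equivalent DFA: move on `symbol`
    from every state in the subset, then take epsilon-closures."""
    result = set()
    for q in states:
        for r in DELTA.get((q, symbol), set()):
            result |= eps_closure(r)
    return frozenset(result)
```

Starting the subset construction from eps_closure('q0') = {q0, q1, q2} and repeatedly applying dfa_transition yields the states of the equivalent DFA.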

3.4.2 Expression Graphs

Definition. An expression graph is a labeled directed graph in which the arcs are labeled by regular expressions. An expression graph, like a state diagram, contains a distinguished start node and a set of accepting nodes.

Example. The expression graph given in Figure 3.9 accepts the regular expressions u* and u*vw.

Figure 3.9: Expression Graph
Figure 3.10: Expression Graph Transformation

The reduced graph has at most two nodes: the start node and an accepting node. If these are the same node, the reduced graph has the form of Figure 3.11(a), accepting w*. A graph with distinct start and accepting nodes reduces to the form of Figure 3.11(b) and accepts the expression w1*w2(w3 ∪ w4(w1)*w2)*. This expression may be simplified if any of the arcs in the graph are labeled Φ.

Figure 3.11: (a) w*, (b) w1*w2(w3 ∪ w4(w1)*w2)*

Algorithm 2: Construction of a Regular Expression from a Finite Automaton

Input: the state diagram G of a finite automaton, with the nodes of G numbered 1, 2, ..., n. We write w_{j,k} for the expression labeling the arc from node j to node k (w_{j,k} = Φ if there is no such arc).

1. Make m copies of G, where m is the number of accepting nodes, each copy having exactly one accepting node. Call these graphs G_1, G_2, ..., G_m; each accepting node of G is the accepting node of G_t for some t = 1, 2, ..., m.
2. For each G_t:
   2.1. Repeat:
        2.1.1. Choose a node i in G_t that is neither the start node nor the accepting node of G_t.
        2.1.2. Delete the node i from G_t according to the following procedure. For every pair of nodes j, k not equal to i (this includes j = k):
               i) if w_{j,i} ≠ Φ, w_{i,k} ≠ Φ, and w_{i,i} = Φ, then add an arc from node j to node k labeled w_{j,i} w_{i,k};
               ii) if w_{j,i} ≠ Φ, w_{i,k} ≠ Φ, and w_{i,i} ≠ Φ, then add an arc from node j to node k labeled w_{j,i}(w_{i,i})* w_{i,k};
               iii) if nodes j and k have arcs labeled w_1, w_2, ..., w_s connecting them, then replace them by a single arc labeled w_1 ∪ w_2 ∪ ... ∪ w_s;
               iv) remove the node i and all arcs incident to it in G_t;
        until the only nodes in G_t are the start node and the single accepting node.
   2.2. Determine the expression accepted by G_t.
3. The regular expression accepted by G is obtained by joining the expressions for the G_t with ∪.

The deletion of the node i is accomplished by finding all paths j, i, k of length two that have i as the intermediate node; an arc from j to k is added bypassing the node i. If there is no arc from i to itself, the new arc is labeled by the concatenation of the expressions on the two component arcs. If w_{i,i} ≠ Φ, the loop w_{i,i} can be traversed any number of times before following the arc from i to k, and the label for the new arc is w_{j,i}(w_{i,i})* w_{i,k}. These graph transformations are illustrated in Figure 3.10.
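The node-deletion step 2.1.2 can be sketched in code. Arcs are stored as a dictionary from node pairs to regular expressions written as plain strings, with an absent key playing the role of Φ; this representation, and the use of + for union as in the examples that follow, are choices of the sketch.

```python
def eliminate(edges, i):
    """Delete node i, rerouting every path j -> i -> k per cases (i)-(iii)."""
    loop = edges.get((i, i))                       # w_ii, possibly absent
    nodes = {n for pair in edges for n in pair if n != i}
    # start from the arcs that do not touch i (case (iv) removes the rest)
    out = {pair: label for pair, label in edges.items() if i not in pair}
    for j in nodes:
        for k in nodes:
            w_ji, w_ik = edges.get((j, i)), edges.get((i, k))
            if w_ji is None or w_ik is None:
                continue
            # cases (i) and (ii): concatenate, inserting (w_ii)* if present
            new = w_ji + (f"({loop})*" if loop else "") + w_ik
            # case (iii): merge parallel arcs with a union
            out[(j, k)] = f"{out[(j, k)]}+{new}" if (j, k) in out else new
    return out
```

For the graph 1 --a--> 2 --a--> 3 with a self-loop b at node 2, `eliminate({(1, 2): 'a', (2, 2): 'b', (2, 3): 'a'}, 2)` produces the single arc labeled a(b)*a from the start node to the accepting node.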

Examples.
1. Figure 3.12(a) shows the original DFA, which is reduced to the expression graph shown in Figure 3.12(b).

Figure 3.12: Example (a), (b)

2. Explanation of elimination: a sequence of steps in which one state is eliminated at each step.
   Step 1: the given graph, Figure 3.13(a).
   Step 2: eliminating the node i, Figure 3.13(b).

Figure 3.13: Example (a), (b)

   Step 3: after eliminating all but the initial and final states of G_t, Figure 3.14(c).
   Step 4: the final regular expression, Figure 3.14(d).
3. Figure 3.15 shows the different steps, where

Figure 3.14: Example (c), (d)

L = r1*r2 r3*(r4 r1*r2 r3*)* = r1*r2(r3 + r4 r1*r2)*,

or L = r1* when the start node and the accepting node coincide (Figure 3.14(d)).

Figure 3.15: Example


Chapter 4 Regular Languages and Sets

4.1 Regular Grammars and Finite Automata

This chapter corresponds to Chapter 7 of the course textbook.

Theorem. Let G = (V, Σ, P, S) be a regular grammar. Define the NFA M = (Q, Σ, δ, S, F) as follows:

i) Q = V ∪ {Z}, where Z ∉ V, if P contains a rule A → a; Q = V otherwise.
ii) δ(A, a) = {B | A → aB ∈ P} ∪ {Z | A → a ∈ P}.
iii) F = {A | A → ε ∈ P} ∪ {Z} if Z ∈ Q; F = {A | A → ε ∈ P} otherwise.

Then L(M) = L(G).

Example. The grammar G generates, and the NFA M accepts, the language a*(a ∪ b+):

G: S → aS | bB | a
   B → bB | ε

The derivation of a string such as aabb proceeds as follows.

In G: S ⇒ aS ⇒ aaS ⇒ aabB ⇒ aabbB ⇒ aabb
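The construction in the theorem above can be sketched for this grammar. A rule is encoded as a (left-hand side, right-hand side) pair; this encoding and the helper names are mine, while the added state Z comes from the construction itself.

```python
# S -> aS | bB | a,  B -> bB | epsilon ('' encodes an epsilon right side)
RULES = [('S', 'aS'), ('S', 'bB'), ('S', 'a'), ('B', 'bB'), ('B', '')]
START = 'S'

def build_nfa(rules):
    delta, final = {}, set()
    for lhs, rhs in rules:
        if rhs == '':                      # A -> epsilon: A is accepting
            final.add(lhs)
        elif len(rhs) == 2:                # A -> aB: arc from A to B on a
            delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])
        else:                              # A -> a: arc from A to Z on a
            delta.setdefault((lhs, rhs[0]), set()).add('Z')
            final.add('Z')
    return delta, final

DELTA, FINAL = build_nfa(RULES)

def accepts(w):
    current = {START}
    for c in w:
        current = set().union(*(DELTA.get((q, c), set()) for q in current))
    return bool(current & FINAL)
```

Here derivations of G correspond to computations of the NFA: `accepts('aabb')` retraces the derivation of aabb shown above.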

Figure 4.1: NFA accepting a*(a ∪ b+)

In M: [S, aabb] ⊢ [S, abb] ⊢ [S, bb] ⊢ [B, b] ⊢ [B, ε]

Similarly, a regular grammar that generates L(M) is constructed from the automaton M:

G: S → aS | bB | aZ
   B → bB | ε
   Z → ε

The transitions of M provide the S rules and the first B rule; the ε-rules are added because B and Z are accepting states.

Example. A regular grammar is constructed from the DFA given in Figure 4.2.

Figure 4.2: Example

41 4.2. CLOSURE PROPERTIES OF REGULAR SETS 35 S bb aa A as bc B ac bs ε C ab ba 4.2 Closure Properties of Regular Sets A language over an alphabet Σ is regular if it is i) a regular set (expression) over Σ ii) accepted by DFA, NFA, or NFA-ε iii) generated by a regular grammar. Theorem Let L 1 and L 2 be two regular languages. The languages L 1, L 2, L 1 L 2, and L 1 are regular languages. Theorem Let L be a regular language over Σ. The language L is regular. L = Σ L Theorem Let L 1 and L 2 be regular languages over Σ. The language L 1 L 2 is regular. Proof: By DeMorgan s law L 1 L 2 = L 1 L 2 The right-hand side of the equality is regular since it is built from L 1 and L 2 using union and complementation. Theorem Let L 1 be a regular language and L 2 be a context-free language. The language L 1 L 2 is not necessarily regular. Proof: Let L 1 = a b and L 2 = {a i b i 0}. L 2 is context-free since it is generated by the grammar S asb ε. The intersection of L 1 and L 2 is L 2, which is not regular. 4.3 Pumping Lemma for Regular Languages Pumping a string refers to constructing new strings by repeating (pumping) substrings in the original string. Theorem Let L be a regular language that is accepted by a DFA M with n states. Let w be any string in L with length(w) n. Then w can be written as xyz with length(xy) n, length(y) > 0, and xy k z L for all k 0. Example Prove that the languge L = {a i b i i 0} is not regular using the Pumping lemma for regular languages. Proof: By contradiction: Assume L is regular; then the pumping lemma holds. Let w = a n b n. By splitting a n b n into xyz, we get x = a i, y = a j, and z = a n i j b n

where i + j ≤ n and j > 0. Pumping y to y² gives

a^i a^j a^j a^(n−i−j) b^n = a^(n+j) b^n ∉ L,

a contradiction with the pumping lemma. Therefore, L is not regular.

Example. The language L = {a^i | i is prime} is not regular.
Assume L is regular, and that a DFA with n states accepts L. Let m be a prime greater than n. The pumping lemma implies that a^m can be decomposed as xyz, y ≠ ε, such that xy^k z is in L for all k ≥ 0. The length of s = xy^(m+1) z must be prime if s is in L. But

length(xy^(m+1) z) = length(xyz y^m) = length(xyz) + length(y^m) = m + m·length(y) = m(1 + length(y)).

Since its length is not prime, xy^(m+1) z is not in L, a contradiction with the pumping lemma. Hence, L is not regular.

Corollary. Let the DFA M have n states.
i. L(M) is not empty if, and only if, M accepts a string w with length(w) < n.
ii. L(M) has an infinite number of strings if, and only if, M accepts a string w with n ≤ length(w) < 2n.

Theorem. Let M be a DFA. There is a decision procedure to determine whether
i. L(M) is empty;
ii. L(M) is finite;
iii. L(M) is infinite.
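The decomposition promised by the pumping lemma can be extracted from a concrete DFA run: the substring read between the first two visits to the same state is the pumpable y. The sketch below uses the three-state "contains aa" DFA of Chapter 3; names are illustrative.

```python
DELTA = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q2', ('q1', 'b'): 'q0',
         ('q2', 'a'): 'q2', ('q2', 'b'): 'q2'}
START, FINAL = 'q0', {'q2'}

def run(w):
    """The sequence of states visited while reading w."""
    states = [START]
    for c in w:
        states.append(DELTA[(states[-1], c)])
    return states

def accepts(w):
    return run(w)[-1] in FINAL

def decompose(w):
    """Split w = x y z around the first repeated state of the run;
    by the pigeonhole principle, length(xy) <= number of states."""
    seen = {}
    for pos, q in enumerate(run(w)):
        if q in seen:
            return w[:seen[q]], w[seen[q]:pos], w[pos:]
        seen[q] = pos
    return None   # no state repeats; possible only when length(w) < #states
```

For w = baab the run is q0, q0, q1, q2, q2, so x = ε, y = b, z = aab, and every pumped string xy^k z stays in the language, as the lemma asserts for strings already in L.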

Chapter 5 Pushdown Automata and Context-Free Languages

5.1 Pushdown Automata

Definition. A pushdown automaton is a six-tuple (Q, Σ, Γ, δ, q0, F), where Q is a finite set of states, Σ a finite set called the input alphabet, Γ a finite set called the stack alphabet, q0 the start state, F ⊆ Q a set of final states, and δ a transition function from Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) to subsets of Q × (Γ ∪ {ε}).

Example. The language L = {a^i | i ≥ 0} ∪ {a^i b^i | i ≥ 0} contains the strings consisting solely of a's, together with the strings with an equal number of a's and b's. The stack of the PDA M that accepts L maintains a record of the number of a's processed until a b is encountered or the input string is completely processed.

Figure 5.1: L = {a^i | i ≥ 0} ∪ {a^i b^i | i ≥ 0}

When scanning an a in state q0, two transitions are applicable. A string of the form a^i b^i, i > 0, is accepted by a computation that remains in states q0 and q1. If a transition to state q2 follows the processing of the final a in a string a^i, the stack is emptied and the input is accepted. Reaching q2 in any other manner results in an unsuccessful computation, since no input is processed after q2 is entered.
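A sketch of this machine in Python illustrates how the nondeterministic choices are explored. The transition set below is a reconstruction consistent with the description above (the figure itself is not reproduced), and acceptance is taken to be a final state with the input consumed and an empty stack.

```python
# A key is (state, input symbol or '', stack top or ''); '' means the
# transition reads no input symbol or pops nothing.
DELTA = {
    ('q0', 'a', ''):  [('q0', 'A')],   # record an a on the stack
    ('q0', 'b', 'A'): [('q1', None)],  # first b: begin matching the a's
    ('q1', 'b', 'A'): [('q1', None)],
    ('q0', '', ''):   [('q2', None)],  # guess that the input is a^i
    ('q2', '', 'A'):  [('q2', None)],  # empty the stack in q2
}
START, FINAL = 'q0', {'q1', 'q2'}

def accepts(w, state=START, stack=()):
    """Search every computation; this machine has no epsilon-cycles,
    so the plain recursion terminates."""
    if w == '' and state in FINAL and not stack:
        return True
    moves = [(w[1:], state, w[0])] if w else []
    moves.append((w, state, ''))           # epsilon-input moves
    for rest, q, a in moves:
        for top in ({stack[-1], ''} if stack else {''}):
            for q2, push in DELTA.get((q, a, top), []):
                new = stack[:-1] if top else stack
                if push is not None:
                    new = new + (push,)
                if accepts(rest, q2, new):
                    return True
    return False
```

`accepts('aabb')` succeeds through q0 and q1, while `accepts('aaa')` succeeds only along the branch that guesses the ε-move into q2.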

The ε-transition from q0 allows the machine to enter q2 after the entire input string has been read, since no input symbol is required to process an ε-transition. This transition, which is applicable whenever the machine is in state q0, introduces nondeterminism into the computations of M.

Example. The even-length palindromes over {a, b} are accepted by the PDA of Figure 5.2; that is, L(M) = {ww^R | w ∈ {a, b}*}.

Figure 5.2: PDA with L(M) = {ww^R | w ∈ {a, b}*}

A successful computation remains in state q0 while processing the string w and enters state q1 upon reading the first symbol of w^R.

5.2 Variations on the PDA Theme

Pushdown automata are often defined in a manner that differs slightly from the definition above. In this section we examine several alterations to our definition that preserve the set of accepted languages.

Along with changing the state, a transition of a PDA is accompanied by up to three actions: popping the stack, pushing a stack element, and processing an input symbol. A PDA is called atomic if each transition causes only one of the three actions to occur. Transitions in an atomic PDA have one of the forms

[qj, ε] ∈ δ(qi, a, ε)
[qj, ε] ∈ δ(qi, ε, A)
[qj, A] ∈ δ(qi, ε, ε)

The following theorem shows that the languages accepted by atomic PDAs are the same as those accepted by PDAs; moreover, its proof gives a method for constructing an equivalent atomic PDA from an arbitrary PDA.

Theorem. Let M be a PDA. Then there is an atomic PDA M′ with L(M′) = L(M).
Proof: To construct M′, the nonatomic transitions of M are replaced by sequences of atomic transitions. Let [qj, B] ∈ δ(qi, a, A) be a transition of M. Its atomic equivalent requires two new states, p1 and p2, and the transitions

[p1, ε] ∈ δ(qi, a, ε)
δ(p1, ε, A) = {[p2, ε]}
δ(p2, ε, ε) = {[qj, B]}

In a similar manner, a transition that changes the state and performs two additional actions can be replaced by a sequence of two atomic transitions.
Removing all nonatomic transitions produces an equivalent atomic PDA.
An extended transition is an operation on a PDA that replaces the stack top with a string of symbols rather than a single symbol. The transition [qj, BCD] ∈ δ(qi, u, A) replaces the stack top A with the string BCD, with B becoming the new stack top. This apparent generalization does

not increase the set of languages accepted by pushdown automata. A PDA containing extended transitions is called an extended PDA. Each extended PDA can be converted into an equivalent PDA in the sense of the original definition. To construct a PDA from an extended PDA, the extended transitions are converted into sequences of transitions, each of which pushes a single stack element. Achieving the result of an extended transition that pushes k elements requires k − 1 additional states, which push the elements in the correct order. The sequence of transitions

[p1, D] ∈ δ(qi, u, A)
δ(p1, ε, ε) = {[p2, C]}
δ(p2, ε, ε) = {[qj, B]}

replaces the stack top A with the string BCD and leaves the machine in state qj. This produces the same result as the single extended transition [qj, BCD] ∈ δ(qi, u, A).

5.3 Pushdown Automata and Context-Free Languages

Theorem. Let L be a context-free language. Then there is a PDA that accepts L.
Proof: Let G = (V, Σ, P, S) be a grammar in Greibach normal form that generates L. An extended PDA M with start state q0 is defined by

Q_M = {q0, q1}    Σ_M = Σ    Γ_M = V − {S}    F_M = {q1}

with transitions

δ(q0, a, ε) = {[q1, w] | S → aw ∈ P}
δ(q1, a, A) = {[q1, w] | A → aw ∈ P and A ∈ V − {S}}
δ(q0, ε, ε) = {[q1, ε]} if S → ε ∈ P.

We first show that L ⊆ L(M). Let S ⇒* uw be a derivation with u ∈ Σ+ and w ∈ V*. We will prove that there is a computation [q0, u, ε] ⊢* [q1, ε, w] in M. The proof is by induction on the length of the derivation and utilizes the correspondence between derivations in G and computations of M.
The basis consists of derivations S ⇒ aw of length one. The transition generated by the rule S → aw yields the desired computation.
Assume that for every string uw generated by a derivation S ⇒^n uw there is a computation [q0, u, ε] ⊢* [q1, ε, w] in M. Now let S ⇒^(n+1) uw be a derivation with u = va ∈ Σ+ and w ∈ V*. This derivation can be written S ⇒^n vAw2 ⇒ uw,

where w = w1w2 and A → aw1 is a rule in P. The inductive hypothesis and the transition [q1, w1] ∈ δ(q1, a, A) combine to produce the computation

[q0, va, ε] ⊢* [q1, a, Aw2] ⊢ [q1, ε, w1w2]

For every string u in L of positive length, the acceptance of u is exhibited by the computation of M corresponding to a derivation S ⇒* u. If ε ∈ L, then S → ε is a rule of G and the computation [q0, ε, ε] ⊢ [q1, ε, ε] accepts the null string.
The opposite inclusion, L(M) ⊆ L, is established by showing that for every computation [q0, u, ε] ⊢* [q1, ε, w] there is a corresponding derivation S ⇒* uw in G.

Theorem. Let P = (Q, Σ, Γ, δ, q0, F) be a PDA. Then there is a context-free grammar G such that L(G) = L(P).

5.4 The Pumping Lemma for Context-Free Languages

Lemma. Let G be a context-free grammar in Chomsky normal form and A ⇒* w a derivation of w ∈ Σ* with derivation tree T. If the depth of T is n, then length(w) ≤ 2^(n−1).

Corollary. Let G = (V, Σ, P, S) be a context-free grammar in Chomsky normal form and S ⇒* w a derivation of w ∈ L(G). If length(w) ≥ 2^n, then the derivation tree has depth at least n + 1.

Theorem (Pumping Lemma for Context-Free Languages). Let L be a context-free language. There is a number k, depending on L, such that any string z ∈ L with length(z) > k can be written as z = uvwxy, where
i) length(vwx) ≤ k,
ii) length(v) + length(x) > 0, and
iii) uv^i wx^i y ∈ L for all i ≥ 0.

Proof: Let G = (V, Σ, P, S) be a Chomsky normal form grammar that generates L, and let k = 2^n, where n = #V. We show that all strings in L of length k or greater can be decomposed so as to satisfy the conditions of the pumping lemma. Let z ∈ L(G) be such a string and S ⇒* z a derivation in G. By the corollary above, there is a path of length at least n + 1 in the derivation tree of S ⇒* z. Let p be a path of maximal length from the root S to a leaf of the derivation tree.
Then p must contain at least n + 2 nodes, all of which are labeled by variables except the leaf node, which is labeled by a terminal symbol. The pigeonhole principle guarantees that some variable A must occur twice among the final n + 2 nodes of this path. The derivation tree can be divided into subtrees in which the indicated occurrences of the variable A are the final two occurrences of A along the path p. The derivation of z then consists of the subderivations

1. S ⇒* r1 A r2
2. r1 ⇒* u
3. A ⇒+ vAx
4. A ⇒* w
5. r2 ⇒* y

Figure 5.3: Pumping Lemma for CFLs

Subderivation 3 may be omitted, or repeated any number of times, before applying subderivation 4. The resulting derivations generate the strings uv^i wx^i y ∈ L(G) = L.
We now show that conditions (i) and (ii) of the pumping lemma are satisfied by this decomposition. The subderivation A ⇒+ vAx must begin with a rule of the form A → BC. The second occurrence of the variable A is derived from either B or C. If it is derived from B, the derivation can be written

A ⇒ BC ⇒* vAyC ⇒* vAyz = vAx

The string z is nonnull, since it is obtained by a derivation from a variable of a Chomsky normal form grammar that is not the start symbol of the grammar. It follows that x = yz is also nonnull. If the second occurrence of A is derived from the variable C, a similar argument shows that v must be nonnull.
The subpath of p from the first indicated occurrence of the variable A to a leaf contains at most n + 2 nodes. Since this is a longest path in the subtree with root A, the tree of the derivation A ⇒* vwx has depth at most n + 1, and so the string vwx obtained from this derivation has length 2^n = k or less.

Example. The language L = {a^i b^i c^i | i ≥ 0} is not context-free.
Proof: Assume L is context-free. By the pumping lemma, the string z = a^k b^k c^k, where k is the number specified by the lemma, can be decomposed into substrings uvwxy that satisfy the repetition properties. Consider the possibilities for the substrings v and x. If either of these contains more than one type of terminal symbol, then uv²wx²y contains a b preceding an a or a c preceding a b; in either case, the resulting string is not in L. By this observation, v and x must each be a substring of one of a^k, b^k, or c^k. Since at most one of the strings v and x is null, uv²wx²y increases the number of at least one, possibly two, but not all three types of terminal symbols. This implies that uv²wx²y ∉ L.
Thus there is no decomposition of a^k b^k c^k satisfying the conditions of the pumping lemma; consequently, L is not context-free.
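For a small value of k this argument can be checked exhaustively: no decomposition of a^k b^k c^k with length(vwx) ≤ k and length(v) + length(x) > 0 survives pumping. The brute-force sketch below uses k = 3 for the demonstration and bounds the pumped exponent at i ≤ 3, which is enough to expose every failing decomposition here.

```python
def in_L(s):
    """Membership in {a^i b^i c^i | i >= 0}."""
    i = len(s) // 3
    return s == 'a' * i + 'b' * i + 'c' * i

def pumps(u, v, w, x, y):
    """True if u v^i w x^i y stays in L for every tested exponent i."""
    return all(in_L(u + v * i + w + x * i + y) for i in range(4))

def exists_valid_decomposition(z, k):
    """Search every z = u v w x y with len(vwx) <= k and len(vx) > 0."""
    n = len(z)
    for start in range(n + 1):
        for m in range(1, min(k, n - start) + 1):
            vwx = z[start:start + m]
            for lv in range(m + 1):
                for lx in range(m - lv + 1):
                    if lv + lx == 0:
                        continue
                    u, y = z[:start], z[start + m:]
                    v, x = vwx[:lv], vwx[m - lx:]
                    w = vwx[lv:m - lx]
                    if pumps(u, v, w, x, y):
                        return True
    return False
```

Since length(vwx) ≤ 3, the substring vwx can touch at most two of the three blocks of a³b³c³, so pumping always distorts the letter counts or the letter order, and the search finds no decomposition that pumps.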

5.5 Closure Properties of Context-Free Languages

Theorem. The set of context-free languages is closed under the operations union, concatenation, and Kleene star.
Proof: Let L1 and L2 be two context-free languages generated by G1 = (V1, Σ1, P1, S1) and G2 = (V2, Σ2, P2, S2), respectively. The sets V1 and V2 of variables are assumed to be disjoint; since we may rename variables, this assumption imposes no restriction on the grammars. In each case a context-free grammar is constructed from G1 and G2 that establishes the desired closure property.

i) Union: Define G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, P1 ∪ P2 ∪ {S → S1 | S2}, S). A string w is in L(G) if, and only if, there is a derivation S ⇒ Si ⇒* w for i = 1 or 2; thus w is in L1 or L2. Conversely, any derivation Si ⇒* w can be initialized with the rule S → Si to generate w in G.

ii) Concatenation: Define G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, P1 ∪ P2 ∪ {S → S1S2}, S). The start symbol initiates derivations in both G1 and G2. A leftmost derivation of a terminal string in G has the form S ⇒ S1S2 ⇒* uS2 ⇒* uv, where u ∈ L1 and v ∈ L2; the derivation of u uses only rules from P1, and that of v only rules from P2. Hence L(G) ⊆ L1L2. The opposite inclusion is established by observing that every string w in L1L2 can be written uv with u ∈ L1 and v ∈ L2; the derivations S1 ⇒* u in G1 and S2 ⇒* v in G2, along with the S rule of G, generate w in G.

iii) Kleene star: Define G = (V1 ∪ {S}, Σ1, P1 ∪ {S → S1S | ε}, S). The S rules of G generate any number of copies of S1. Each of these, in turn, initiates the derivation of a string in L1. The concatenation of any number of strings from L1 yields L1*.

Theorem. The set of context-free languages is not closed under intersection or complementation.
Proof:
i) Intersection: Let L1 = {a^i b^i c^j | i, j ≥ 0} and L2 = {a^j b^i c^i | i, j ≥ 0}. L1 and L2 are both context-free, since they are generated by the grammars G1 and G2, respectively:

G1: S → BC          G2: S → AB
    B → aBb | ε         A → aA | ε
    C → cC | ε          B → bBc | ε

The intersection of L1 and L2 is the set {a^i b^i c^i | i ≥ 0}, which, by the example above, is not context-free.

ii) Complementation: Let L1 and L2 be any two context-free languages. If the context-free languages were closed under complementation then, by closure under union, the language L = L̄1 ∪ L̄2 would be context-free. By DeMorgan's law, L̄ = L1 ∩ L2. This would imply that the context-free languages are closed under intersection, contradicting the result of part (i).

5.6 A Two-Stack Automaton

Finite automata accept the regular languages; pushdown automata accept the context-free languages.

Definition. A two-stack PDA is a structure (Q, Σ, Γ, δ, q0, F), where Q is a finite set of states, Σ a finite set called the input alphabet, Γ a finite set called the stack alphabet, q0 the start state, F ⊆ Q a set of final states, and δ a transition function from Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × (Γ ∪ {ε}) to subsets of Q × (Γ ∪ {ε}) × (Γ ∪ {ε}).

Example. The two-stack PDA defined below accepts the language L = {a^i b^i c^i | i ≥ 0}. The first stack is used to match the a's against the b's, and the second to match the b's against the c's.

Q = {q0, q1, q2}    Σ = {a, b, c}    Γ = {A}    F = {q2}

δ(q0, ε, ε, ε) = {[q2, ε, ε]}
δ(q0, a, ε, ε) = {[q0, A, ε]}
δ(q0, b, A, ε) = {[q1, ε, A]}
δ(q1, b, A, ε) = {[q1, ε, A]}
δ(q1, c, ε, A) = {[q2, ε, ε]}
δ(q2, c, ε, A) = {[q2, ε, ε]}

The computation that accepts aabbcc illustrates the interplay between the two stacks:

[q0, aabbcc, ε, ε] ⊢ [q0, abbcc, A, ε] ⊢ [q0, bbcc, AA, ε] ⊢ [q1, bcc, A, A] ⊢ [q1, cc, ε, AA] ⊢ [q2, c, ε, A] ⊢ [q2, ε, ε, ε]
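This machine is deterministic enough to sketch directly. The six transitions above are hard-coded; folding the ε-move δ(q0, ε, ε, ε) into the final test, and taking acceptance to be the final state q2 with both stacks empty, are choices of the sketch.

```python
def accepts(w):
    state, s1, s2 = 'q0', [], []
    for c in w:
        if state == 'q0' and c == 'a':
            s1.append('A')                 # delta(q0, a, e, e): push stack 1
        elif state in ('q0', 'q1') and c == 'b' and s1:
            s1.pop()                       # delta(., b, A, e): pop stack 1,
            s2.append('A')                 #   push stack 2
            state = 'q1'
        elif state in ('q1', 'q2') and c == 'c' and s2:
            s2.pop()                       # delta(., c, e, A): pop stack 2
            state = 'q2'
        else:
            return False                   # no applicable transition
    if state == 'q0':                      # the epsilon-move into q2
        state = 'q2'
    return state == 'q2' and not s1 and not s2
```

Running `accepts('aabbcc')` retraces the configuration sequence shown above: stack 1 grows to AA, drains into stack 2 on the b's, and stack 2 pays for the c's.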


Chapter 6 Turing Machines

6.1 The Standard Turing Machine

Definition. A Turing machine is a quintuple M = (Q, Σ, Γ, δ, q0) where Q is a finite set of states, Γ a finite set called the tape alphabet containing a special symbol B that represents a blank, Σ ⊆ Γ − {B} the input alphabet, δ a partial function from Q × Γ to Q × Γ × {L, R} called the transition function, and q0 ∈ Q a distinguished state called the start state.

6.1.1 Notation for the Turing Machine

We may visualize a Turing machine as in Figure 6.1. The machine consists of a finite control, which can be in any of a finite set of states, and a tape divided into squares or cells, each of which can hold any one of a finite number of symbols. Initially the input, a finite-length string of symbols chosen from the input alphabet, is placed on the tape; all other tape cells, extending infinitely to the left and right, initially hold the special symbol called the blank. The blank is a tape symbol, and there may be other tape symbols besides the input symbols and the blank as well. There is a tape head that is always positioned at one of the tape cells; the Turing machine is said to be scanning that cell. Initially, the tape head is at the leftmost cell that holds the input.
A move of the Turing machine is a function of the state of the finite control and the tape symbol scanned. In one move, the Turing machine will:
1. Change state; the next state may be the same as the current state.
2. Write a tape symbol in the cell scanned, replacing whatever symbol was there; the symbol written may be the same as the symbol currently there.
3. Move the tape head left or right. In our formalism we require a move and do not allow the head to remain stationary.
This restriction does not constrain what a Turing machine can compute, since any sequence of moves with a stationary head could be condensed, along with the next tape-head move, into a single state change, a new tape symbol, and a move left or right.
Turing machines are designed to perform computations on strings from the input alphabet. A computation begins with the tape head scanning the leftmost tape square and the input string beginning at position one; all tape squares to the right of the input string are assumed to be blank. The Turing machine defined with the initial conditions described above is referred to as the standard Turing machine.
A language accepted by a Turing machine is called a recursively enumerable language. A language accepted by a Turing machine that halts for all input strings is said to be recursive.
Example.

Figure 6.1: A Turing Machine

The Turing machine COPY of Figure 6.2, with input alphabet {a, b}, produces a copy of the input string. That is, a computation that begins with the tape having the form BuB terminates with the tape BuBuB.

6.2 Turing Machines as Language Acceptors

Example. The Turing machine of Figure 6.3 accepts the language (a ∪ b)*aa(a ∪ b)*. The computation

q0BaabbB ⊢ Bq1aabbB ⊢ Baq2abbB ⊢ Baaq3bbB

examines only the first half of the input before accepting the string aabb. The language (a ∪ b)*aa(a ∪ b)* is recursive; the computations of M halt for every input string. A successful computation terminates when a substring aa is encountered; all other computations halt upon reading the first blank following the input.

Example. The language {a^i b^i c^i | i ≥ 0} is accepted by the Turing machine of Figure 6.4. A computation terminates successfully when all the symbols of the input string have been transformed into the appropriate tape symbols.

6.3 Alternative Acceptance Criteria

Definition. Let M = (Q, Σ, Γ, δ, q0) be a Turing machine that accepts by halting. A string u ∈ Σ* is accepted by halting if the computation of M with input u halts (normally).

Theorem. The following statements are equivalent:
i) the language L is accepted by a Turing machine that accepts by final state;

Figure 6.2: Turing Machine COPY
Figure 6.3: TM accepting (a ∪ b)*aa(a ∪ b)*

ii) the language L is accepted by a Turing machine that accepts by halting.

Proof: Let M = (Q, Σ, Γ, δ, q0) be a Turing machine that accepts L by halting. The machine M′ = (Q, Σ, Γ, δ, q0, Q), in which every state is a final state, accepts L by final state.
Conversely, let M = (Q, Σ, Γ, δ, q0, F) be a Turing machine that accepts the language L by final state. Define the machine M′ = (Q ∪ {qf}, Σ, Γ, δ′, q0) that accepts by halting as follows:
i) if δ(qi, x) is defined, then δ′(qi, x) = δ(qi, x);
ii) for each state qi ∈ Q − F, if δ(qi, x) is undefined, then δ′(qi, x) = [qf, x, R];
iii) for each x ∈ Γ, δ′(qf, x) = [qf, x, R].
Computations that accept strings in M and M′ are identical. An unsuccessful computation of M may halt in a rejecting state, terminate abnormally, or fail to terminate. When an unsuccessful computation of M halts, the computation of M′ enters the state qf; upon entering qf, the machine moves indefinitely to the right. The only computations that halt in M′ are those generated by computations of M that halt in an accepting state. Thus L(M′) = L(M).
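A standard Turing machine is straightforward to simulate with a dictionary-backed tape. The transition table below is an assumed machine for (a ∪ b)*aa(a ∪ b)* in the spirit of Figure 6.3, accepting by halting in the final state q3, which illustrates the two acceptance criteria just shown equivalent; the table is a reconstruction, not a copy of the figure.

```python
DELTA = {
    ('q0', 'B'): ('q1', 'B', 'R'),   # step off the leading blank
    ('q1', 'a'): ('q2', 'a', 'R'),   # one a seen
    ('q1', 'b'): ('q1', 'b', 'R'),
    ('q2', 'a'): ('q3', 'a', 'R'),   # aa found: q3 has no moves, so we halt
    ('q2', 'b'): ('q1', 'b', 'R'),
}
START, FINAL = 'q0', {'q3'}

def accepts(w, max_steps=10_000):
    """Simulate the standard machine on BwB...; unvisited squares are
    blank.  The step bound guards against machines that never halt."""
    tape = dict(enumerate('B' + w))          # square 0 holds the left blank
    state, head = START, 0
    for _ in range(max_steps):
        move = DELTA.get((state, tape.get(head, 'B')))
        if move is None:
            return state in FINAL            # halted: accept iff final state
        state, tape[head], direction = move
        head += 1 if direction == 'R' else -1
    return False
```

On aabb the simulation retraces the computation q0BaabbB ⊢ Bq1aabbB ⊢ Baq2abbB ⊢ Baaq3bbB and halts in q3 after examining only the first half of the input.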

Figure 6.4: TM accepting {a^i b^i c^i | i ≥ 0}

6.4 Multitrack Machines

A multitrack tape is one in which the tape is divided into tracks. Multiple tracks increase the amount of information that can be considered when determining the appropriate transition. A tape position in a two-track machine is represented by the ordered pair [x, y], where x is the symbol in track 1 and y the symbol in track 2. The states, input alphabet, tape alphabet, initial state, and final states of a two-track machine are the same as in the standard Turing machine. A two-track transition reads and rewrites the entire tape position; a transition of a two-track machine is written δ(qi, [x, y]) = [qj, [z, w], d], where d ∈ {L, R}. The input to a two-track machine is placed in the standard input position in track 1, with all positions of track 2 initially blank. Acceptance in multitrack machines is by final state. The languages accepted by two-track machines are precisely the recursively enumerable languages.

Theorem. A language L is accepted by a two-track Turing machine if, and only if, it is accepted by a standard Turing machine.
Proof: Clearly, if L is accepted by a standard Turing machine it is accepted by a two-track machine; the equivalent two-track machine simply ignores the presence of the second track.
Conversely, let M = (Q, Σ, Γ, δ, q0, F) be a two-track machine. A one-track machine will be constructed in which a single tape square contains the same information as a tape position of the two-track tape. The

representation of a two-track tape position as an ordered pair indicates how this can be accomplished. The tape alphabet of the equivalent one-track machine M′ consists of ordered pairs of tape elements of M. The input to the two-track machine consists of ordered pairs whose second component is blank; the input symbol a of M is identified with the ordered pair [a, B] of M′. The one-track machine

M′ = (Q, Σ × {B}, Γ × Γ, δ′, q0, F)

with transition function

δ′(qi, [x, y]) = δ(qi, [x, y])

accepts L(M).

6.5 Two-Way Tape Machines

A Turing machine with a two-way tape is identical to the standard model except that the tape extends indefinitely in both directions. Since a two-way tape has no left boundary, the input can be placed anywhere on the tape; all other tape positions are assumed to be blank. The tape head is initially positioned on the blank to the immediate left of the input string.

6.6 Multitape Machines

A k-tape machine consists of k tapes and k independent tape heads. The states and alphabets of a multitape machine are the same as those of a standard Turing machine. The machine reads the k tapes simultaneously but is in only a single state; this is depicted by connecting each of the independent tape heads to a single control indicating the current state. A transition is determined by the current state and the symbols scanned by each of the tape heads. A transition of a multitape machine may
i) change the state,
ii) write a symbol on each of the tapes, and
iii) independently reposition each of the tape heads; the repositioning consists of moving a tape head one square to the left, one square to the right, or leaving it at its current position.
The input to a multitape machine is placed in the standard position on tape 1, and all other tapes are assumed to be blank. The tape heads originally scan the leftmost position of each tape. Any tape head attempting to move to the left of the boundary of its tape terminates the computation abnormally.
Any language accepted by a k-tape machine is accepted by a (2k+1)-track machine.

Theorem. The time taken by a one-tape TM N to simulate n moves of a k-tape TM M is O(n²).

6.7 Nondeterministic Turing Machines

A nondeterministic Turing machine may specify any finite number of transitions for a given configuration. The components of a nondeterministic machine, with the exception of the transition function, are identical to those of the standard Turing machine. Transitions in a nondeterministic machine are defined by a partial function from Q × Γ to subsets of Q × Γ × {L, R}. Every language accepted by a nondeterministic Turing machine is recursively enumerable.

6.8 Turing Machines as Language Enumerators

Definition. A k-tape Turing machine E = (Q, Σ, Γ, δ, q_0) enumerates a language L if

i) the computation begins with all tapes blank
ii) with each transition, the tape head on tape 1 (the output tape) remains stationary or moves to the right
iii) at any point in the computation, the nonblank portion of tape 1 has the form B#u_1#u_2#...#u_k# or B#u_1#u_2#...#u_k#v, where u_i ∈ L and v ∈ Σ*
iv) a string u is written on the output tape, preceded and followed by #, if, and only if, u ∈ L.

Example. The machine E enumerates the language L = {a^i b^i c^i | i ≥ 0}.

Figure 6.5: A k-tape TM enumerating L = {a^i b^i c^i | i ≥ 0}

Lemma. If L is enumerated by a Turing machine, then L is recursively enumerable.

Proof: Assume that L is enumerated by a k-tape Turing machine E. A (k+1)-tape machine M accepting L can be constructed from E. The additional tape of M is the input tape; the remaining k tapes allow M to simulate the computation of E. The computation of M begins with a string u on its input tape. Next M simulates the computation of E. Whenever the simulation of E writes #, a string w ∈ L has been generated. M then compares u with w and accepts u if u = w. Otherwise, the simulation of E is used to generate another string from L and the comparison cycle is repeated. If u ∈ L, it will eventually be produced by E and consequently accepted by M.

Chapter 7

The Chomsky Hierarchy

7.1 The Chomsky Hierarchy

The hierarchy relates each class of grammars to the languages it generates and the machines that accept them:

Type 0 grammars (phrase-structure grammars, unrestricted grammars) generate the recursively enumerable languages, accepted by Turing machines and nondeterministic Turing machines.

Type 1 grammars (context-sensitive grammars, monotonic grammars) generate the context-sensitive languages, accepted by linear-bounded automata.

Type 2 grammars (context-free grammars) generate the context-free languages, accepted by pushdown automata.

Type 3 grammars (regular grammars, left-linear grammars, right-linear grammars) generate the regular languages, accepted by deterministic and nondeterministic finite automata.


Chapter 8

Decidability

8.1 Decision Problems

A decision problem P is a set of questions, each of which has a yes or no answer. The single question "Is 8 a perfect square?" is an example of the type of question under consideration in a decision problem. A decision problem usually consists of an infinite number of related questions. For example, the problem P_SQ of determining whether an arbitrary natural number is a perfect square consists of the following questions:

p_0: Is 0 a perfect square?
p_1: Is 1 a perfect square?
p_2: Is 2 a perfect square?
...

A solution to a decision problem P is an algorithm that determines the appropriate answer to every question p ∈ P. An algorithm that solves a decision problem should be

1. Complete
2. Mechanistic
3. Deterministic.

A procedure that satisfies the preceding properties is often called effective. A problem is decidable if it has a representation in which the set of accepted input strings forms a recursive language. Since computations of deterministic multitrack and multitape machines can be simulated on a standard Turing machine, solutions using these machines also establish the decidability of a problem.

8.2 The Church-Turing Thesis

The Church-Turing thesis asserts that every solvable decision problem can be transformed into an equivalent Turing machine problem.

The Church-Turing thesis for decision problems: There is an effective procedure to solve a decision problem if, and only if, there is a Turing machine that halts for all input strings and solves the problem.

The extended Church-Turing thesis for decision problems: A decision problem P is partially solvable if, and only if, there is a Turing machine that accepts precisely the elements of P whose answer is yes.

A proof by the Church-Turing thesis is a shortcut often taken in establishing the existence of a decision algorithm. Rather than constructing a Turing machine solution to a decision problem, we describe an intuitively effective procedure that solves the problem. The Church-Turing thesis asserts that a decision problem P has a solution if, and only if, there is a Turing machine that determines the answer for every p ∈ P. If no such Turing machine exists, the problem is said to be undecidable.

8.3 The Halting Problem for Turing Machines

Theorem. The halting problem for Turing machines is undecidable.

Proof: The proof is by contradiction. Assume that there is a Turing machine H that solves the halting problem. A string is accepted by H if

i) the input consists of the representation of a Turing machine M followed by a string w
ii) the computation of M with input w halts.

If either of these conditions is not satisfied, H rejects the input. The operation of the machine H is depicted in fig. 8.1.

Figure 8.1: Halting Machine

The machine H is modified to construct a Turing machine H′. The computations of H′ are the same as those of H, except that H′ loops indefinitely whenever H terminates in an accepting state, that is, whenever M halts on input w.
The transition function of H′ is constructed from that of H by adding transitions that cause H′ to move indefinitely to the right upon entering an accepting configuration of H. H′ is combined with a copy machine to construct another Turing machine D. The input to D is a Turing machine representation R(M). A computation of D begins by creating the string R(M)R(M) from the input R(M). The computation continues by running H′ on R(M)R(M). The input to the machine D may be the representation of any Turing machine with alphabet {0, 1, B}. In particular, D is such a machine. Consider a computation of D with input R(D). Rewriting the previous diagram with M replaced by D and R(M) by R(D), and examining the resulting computation, we see that D halts with input R(D) if, and only if, D does not halt with input R(D). This is obviously a contradiction. However, the machine D can be

constructed directly from a machine H that solves the halting problem. The assumption that the halting problem is decidable produces the preceding contradiction. Therefore, we conclude that the halting problem is undecidable.

Figure 8.2: Turing Machine D with R(M) as input
Figure 8.3: Turing Machine D with R(D) as input

Corollary. The language L_H = {R(M)w | R(M) is the representation of a Turing machine M and M halts with input w over {0, 1}} is not recursive.

8.4 A Universal Machine

The machine U is called a universal Turing machine since the outcome of the computation of any machine M with input w can be obtained by the computation of U with input R(M)w.

Figure 8.4: Universal Machine

Theorem. The language L_H is recursively enumerable.

Proof: A deterministic three-tape machine U is designed to accept L_H by halting. A computation of U begins with the input on tape 1. The encoding scheme presented in Section 8.3 is used to represent the input Turing machine. If the input string has the form R(M)w, the computation of M with input w is simulated on tape 3. The universal machine uses the information encoded in the representation R(M) to simulate the transitions of M. A computation of U consists of the following actions:

1. If the string does not have the form R(M)w for a Turing machine M and string w, U moves indefinitely to the right.

2. The string w is written on tape 3 beginning at position one. The tape head is then repositioned at the leftmost square of the tape. The configuration of tape 3 is the initial configuration of a computation of M with input w.

3. A single 1, the encoding of state q_0, is written on tape 2.

4. A transition of M is simulated on tape 3. The transition of M is determined by the symbol scanned on tape 3 and the state encoded on tape 2. Let x be the symbol from tape 3 and q_i the state encoded on tape 2.

a) Tape 1 is scanned for a transition whose first two components match en(q_i) and en(x). If there is no such transition, U halts, accepting the input.

b) Assume tape 1 contains the encoded transition en(q_i)0en(x)0en(q_j)0en(y)0en(d). Then

i) en(q_i) is replaced by en(q_j) on tape 2.
ii) The symbol y is written on tape 3.
iii) The tape head of tape 3 is moved in the direction specified by d.

5. The next transition of M is simulated by repeating steps 4 and 5.

The simulation by the universal machine U accepts the strings in L_H. The computations of U loop indefinitely for strings in {0, 1}* − L_H. Since L_H = L(U), L_H is recursively enumerable.

Corollary. The recursive languages are a proper subset of the recursively enumerable languages.
Proof: The acceptance of L_H by the universal machine demonstrates that L_H is recursively enumerable, while the corollary of Section 8.3 established that L_H is not recursive.

Note: A language L is recursive if both L and its complement L̄ are recursively enumerable.

Corollary. The language L̄_H, the complement of L_H, is not recursively enumerable.

8.5 The Post Correspondence Problem

The undecidable problems presented in the preceding sections have been concerned with the properties of Turing machines or mathematical systems that simulate Turing machines. The Post correspondence problem is a combinatorial question that can be described as a simple game of manipulating dominoes. A domino consists of two strings from a fixed alphabet, one on the top half of the domino and the other on the bottom. The game begins when one of the dominoes is placed on a table. Another domino is then placed to the immediate right of the domino on the table. This process is repeated, constructing a sequence of adjacent dominoes. A Post correspondence system can be thought of as defining a finite set of domino types. We assume that there is an unlimited number of dominoes of each type; playing a

domino does not limit the number of future moves. A string is obtained by concatenating the strings in the top halves of a sequence of dominoes. We refer to this as the top string. Similarly, a sequence of dominoes defines a bottom string. The game is successfully completed by constructing a finite sequence of dominoes in which the top and bottom strings are identical. Consider the Post correspondence system defined by the dominoes in fig. 8.5. The sequence in fig. 8.6 is a solution to this Post correspondence system.

Figure 8.5: Post Correspondence System
Figure 8.6: Post Correspondence Solution

Formally, a Post correspondence system consists of an alphabet Σ and a finite set of ordered pairs [u_i, v_i], i = 1, 2, ..., n, where u_i, v_i ∈ Σ⁺. A solution to a Post correspondence system is a sequence i_1, i_2, ..., i_k such that

u_{i_1} u_{i_2} ... u_{i_k} = v_{i_1} v_{i_2} ... v_{i_k}.

The problem of determining whether a Post correspondence system has a solution is the Post correspondence problem.

Example. The Post correspondence system with alphabet {a, b} and ordered pairs [aaa, aa], [baa, abaaa] has a solution: the sequence 1, 2, 1 produces the top string aaa·baa·aaa = aaabaaaaa and the identical bottom string aa·abaaa·aa = aaabaaaaa.

Figure 8.7: Example 8.5.1

Chapter 9

Undecidability

There are specific problems we cannot solve using a computer. These problems are called undecidable. While a Turing machine looks nothing like a PC, it has been recognized as an accurate model of what any physical computing device is capable of doing. We use the Turing machine to develop a theory of undecidable problems. We show that a number of problems that are easy to express are in fact undecidable.

9.1 Problems That Computers Cannot Solve

One particular problem that we discuss is whether the first thing a C program prints is hello, world. Although we might imagine that simulation of the program would allow us to tell what the program does, we must in reality contend with programs that take an unimaginably long time before making any output at all. This problem - not knowing when, if ever, something will occur - is the ultimate cause of our inability to tell what a program does.

9.1.1 Programs that Print Hello, World

In fig. 9.1, it is easy to discover that the program prints hello, world and terminates.

main()
{
    printf("hello, world\n");
}

Figure 9.1: Hello-World Program

However, there are other programs that also print hello, world; yet the fact that they do so is far from obvious. Figure 9.2 shows another program that might print hello, world. It takes an input n, and looks for positive integer solutions to the equation x^n + y^n = z^n. If it finds one, it prints hello, world. If it never finds integers x, y, and z to satisfy the equation, then it continues searching forever, and never prints hello, world. If the value of n that the program reads is 2, then it will eventually find combinations of integers such as total = 12, x = 3, y = 4, and z = 5, for which x^n + y^n = z^n. Thus, for input 2, the program does print hello, world. However, for any integer n > 2, the program will never find a triple of positive integers to satisfy x^n + y^n = z^n, and thus will fail to print hello, world.
Interestingly, until a few years ago, it was not known whether or not this program would print hello, world for some large integer n. The claim that

it would not, i.e., that there are no integer solutions to the equation x^n + y^n = z^n if n > 2, was made by Fermat 300 years ago, but no proof was found until quite recently. This statement is often referred to as Fermat's last theorem.

int exp(int i, int n)
/* computes i to the power n */
{
    int ans, j;
    ans = 1;
    for (j = 1; j <= n; j++)
        ans *= i;
    return ans;
}

main()
{
    int n, total, x, y, z;
    scanf("%d", &n);
    total = 3;
    while (1) {
        for (x = 1; x <= total - 2; x++)
            for (y = 1; y <= total - x - 1; y++) {
                z = total - x - y;
                if (exp(x, n) + exp(y, n) == exp(z, n))
                    printf("hello, world\n");
            }
        total++;
    }
}

Figure 9.2: Fermat's last theorem expressed as a hello-world program

Let us define the hello world problem to be: determine whether a given C program, with a given input, prints hello, world as the first 12 characters that it prints. It would be remarkable indeed if we could write a program that could examine any program P and input I for P, and tell whether P, run with I as its input, would print hello, world. We shall prove that no such program exists.

9.1.2 The Hypothetical Hello, World Tester

The proof of impossibility of making the hello-world test is a proof by contradiction. That is, we assume there is a program, call it H, that takes as input a program P and an input I, and tells whether P with input I prints hello, world. Figure 9.3 is a representation of what H does.

Figure 9.3: A hypothetical program H that is a hello-world detector

If a problem has an algorithm like H, that always tells correctly whether an instance of the problem

67 9.1. PROBLEMS THAT COMPUTERS CANNOT SOLVE 61 has answer yes or no, then the problem is said to be decidable. Our goal is to prove that H does not exist, i.e. the hello-world problem is undecidable. In order to prove that statement by contradiction, we are going to make several changes to H, eventually constructing a related program called H 2 that we show does not exist. Since the changes to H are simple transformations that can be done to any C program, the only questionable statement is the existence of H, so it is that assumption we have contradicted. To simplify our discussion, we shall make a few assumptions about C programs. 1. All output is character-based, e.g., we are not using a graphics package or any other facility to make output that is not in the form of characters. 2. All character-based output is performed using printf, rather than put-char() or another characterbased output function. We now assume that the program H exists. Our first modification is to change the output no, which is the response that H makes when its input program P does not print hello, world as its first output in reponse to input I. As soon as H prints n, we know it will eventually follow with o. Thus, we can modify any printf statement in H that prints n to instead print hello, world. Another printf statement that prints o but not the n is omitted. As a result, the new program, which we call H 1, behaves like H, except it prints hello, world exactly when H would print no. H 1 is suggested by Fig 9.4. Since we are interested in programs that take other programs as input and tell something about Figure 9.4: H 1 behaves like H, but it says hello, world instead of no them, we shall restrict H 1 so that it: a. Takes only input P, not P and I. b. Asks what P would do if its input were its own code, i.e., what would H 1 do on inputs P as program and P as input I as well? The modifications we must perform on H 1 to produce the program H 2 as shown in fig follows: 9.5 are as 1. 
H_2 first reads the entire input P and stores it in an array A, which it mallocs for the purpose.

2. H_2 then simulates H_1, but whenever H_1 would read input from P or I, H_2 reads from the stored copy in A. To keep track of how much of P and I H_1 has read, H_2 can maintain two cursors that mark positions in A.

Fig. 9.6 shows what H_2 does when given itself as input. Recall that H_2, given any program P as input, makes output yes if P prints hello, world when given itself as input. Also, H_2 prints hello, world if P, given itself as input, does not print hello, world as its first output.

Suppose that the H_2 represented by the box in fig. 9.6 makes the output yes. Then the H_2 in the box is saying about its input H_2 that H_2, given itself as input, prints hello, world as its first output. But we just supposed that the first output H_2 makes in this situation is yes rather than hello, world. Thus it appears that in fig. 9.6 the output of the box is hello, world, since it must be one or the

other. But if H_2, given itself as input, prints hello, world first, then the output of the box in fig. 9.6 must be yes. Whichever output we suppose H_2 makes, we can argue that it makes the other output. This situation is paradoxical, and we conclude that H_2 cannot exist. As a result, we have contradicted the assumption that H exists. That is, we have proved that no program H can tell whether or not a given program P with input I prints hello, world as its first output.

Figure 9.5: H_2 behaves like H_1, but uses its input P as both P and I
Figure 9.6: What does H_2 do when given itself as input?

9.1.3 Reducing One Problem to Another

Suppose we want to determine whether or not some other problem is solvable by a computer. We can try to write a program to solve it, but if we cannot figure out how to do so, then we can try to prove that no such program exists. We could prove this new problem undecidable by a technique similar to what we did for the hello-world problem: assume there is a program to solve it and develop a paradoxical program that must do two contradictory things. However, once we have a problem that we know is undecidable, we no longer have to prove the existence of a paradoxical situation. It is sufficient to show that if we could solve the new problem, then we could use that solution to solve a problem that we already know is undecidable. This technique is called the reduction of P_1 to P_2 and is illustrated in fig. 9.7.

Suppose we know that P_1 is undecidable, and P_2 is a new problem that we would like to prove is undecidable as well. We suppose that there is a program, represented in fig. 9.7 by the diamond labeled "decide"; this program prints yes or no, depending on whether its input instance of problem P_2 is or is not in the language of that problem. In order to prove that problem P_2 is undecidable, we have to invent a construction, represented by the square box in fig.
9.7, that converts instances of P_1 to instances of P_2 that have the same answer. Once we have this construction, we can solve P_1 as follows:

1. Given an instance of P_1, that is, given a string w that may or may not be in the language P_1, apply the construction algorithm to produce a string x.

2. Test whether x is in P_2, and give the same answer about w and P_1.

Figure 9.7: Reduction of P_1 to P_2

If w is in P_1, then x is in P_2, so this algorithm says yes. If w is not in P_1, then x is not in P_2, and the algorithm says no. Either way, it says the truth about w. Since we assumed that no algorithm to decide membership of a string in P_1 exists, we have a proof by contradiction that the hypothesized decision algorithm for P_2 does not exist; i.e., P_2 is undecidable.

We shall now give a formal proof of the existence of a problem about Turing machines that no Turing machine can solve. We divide problems that can be solved by a Turing machine into two classes: those that have an algorithm (i.e., a Turing machine that halts whether or not it accepts its input), and those that are only solved by Turing machines that may run forever on inputs they do not accept. We prove undecidable the following problem: does this Turing machine accept this input? Then, we exploit this undecidability result to exhibit a number of other undecidable problems.

9.2 A Language That Is Not Recursively Enumerable

Our goal is to prove undecidable the language consisting of pairs (M, w) such that:

1. M is a Turing machine (coded in binary) with input alphabet {0, 1},
2. w is a string of 0's and 1's, and
3. M accepts input w.

We must give a coding for Turing machines that uses only 0's and 1's, regardless of how many states the TM has. We can then treat any binary string as if it were a Turing machine. If the string is not a well-formed representation of some TM, we may think of it as representing a TM with no moves.

9.2.1 Enumerating the Binary Strings

We shall assign integers to all binary strings so that each integer corresponds to one string. If w is a binary string, treat 1w as a binary integer i. Then we shall call w the ith string. That is, ε is the first string, 0 is the second, 1 is the third, 00 is the fourth, 01 the fifth, and so on.
Equivalently, strings are ordered by length, and strings of equal length are ordered lexicographically. We refer to the ith string as w_i.

9.2.2 Codes for Turing Machines

To represent a TM M = (Q, {0, 1}, Γ, δ, q_1, B, F) as a binary string, we must first assign integers to the states, tape symbols, and directions L and R.

1. We shall assume the states are q_1, q_2, ..., q_r for some r. The start state will always be q_1, and q_2 will be the only accepting state.

2. We shall assume the tape symbols are X_1, X_2, ..., X_s for some s. X_1 always will be the symbol 0, X_2 will be 1, and X_3 will be B, the blank. Other tape symbols can be assigned to the remaining integers arbitrarily.

3. We shall refer to direction L as D_1 and direction R as D_2.

Once we have established an integer to represent each state, symbol, and direction, we can encode the transition function δ. Suppose one transition rule is δ(q_i, X_j) = (q_k, X_l, D_m), for some integers i, j, k, l, and m. We shall code this rule by the string 0^i 1 0^j 1 0^k 1 0^l 1 0^m. Notice that since all of i, j, k, l, and m are at least one, there are no occurrences of two or more consecutive 1's within the code for a single transition. A code for the entire TM M consists of all the codes for the transitions, in some order, separated by pairs of 1's:

C_1 11 C_2 11 ... 11 C_n

where each of the C's is the code for one transition of M.

9.2.3 The Diagonalization Language

There is now a concrete notion of M_i, the ith Turing machine: that TM M whose code is w_i, the ith binary string. Many integers do not correspond to any TM at all. If w_i is not a valid TM code, we shall take M_i to be a TM with one state and no transitions. That is, for these values of i, M_i is a Turing machine that immediately halts on any input. Thus L(M_i) is ∅ if w_i fails to be a valid TM code.

The language L_d, the diagonalization language, is the set of strings w_i such that w_i is not in L(M_i). That is, L_d consists of all strings w such that the TM M whose code is w does not accept when given w as input. The reason L_d is called a diagonalization language can be seen if we consider fig. 9.8. This table tells, for all i and j, whether TM M_i accepts input string w_j; 1 means yes and 0 means no.
We may think of the ith row as the characteristic vector for the language L(M_i); that is, the 1's in this row indicate the strings that are members of this language. The diagonal value tells whether M_i accepts w_i. To construct L_d, we complement the diagonal. For instance, if fig. 9.8 were the correct table, then the complemented diagonal would begin 1, 0, 0, 0, .... Thus, L_d would contain w_1 = ε, and would not contain w_2 through w_4, which are 0, 1, and 00, and so on. The trick of complementing the diagonal to construct the characteristic vector of a language that cannot be the language appearing in any row is called diagonalization.

9.2.4 Proof that L_d is not Recursively Enumerable

Theorem. L_d is not a recursively enumerable language. That is, there is no Turing machine that accepts L_d.

Proof: Suppose L_d were L(M) for some TM M. Since L_d is a language over alphabet {0, 1}, M would be in the list of Turing machines we have constructed, since that list includes all the TM's with input alphabet {0, 1}. Thus, there is at least one code for M, say i; that is, M = M_i. Now, ask whether w_i is in L_d. If w_i is in L_d, then M_i accepts w_i. But then, by definition of L_d, w_i is not in L_d, because L_d contains only those w_j such that M_j does not accept w_j.

Figure 9.8: The table that represents acceptance of strings by Turing machines

Similarly, if w_i is not in L_d, then M_i does not accept w_i. Thus, by definition of L_d, w_i is in L_d. Since w_i can neither be in L_d nor fail to be in L_d, we conclude that there is a contradiction of our assumption that M exists. That is, L_d is not a recursively enumerable language.

Figure 9.9: Relationship between the recursive, RE, and non-RE languages

9.2.5 Complements of Recursive and RE languages

Theorem. If L is a recursive language, so is its complement L̄.

Proof: Let L = L(M) for some TM M that always halts. We construct a TM M̄ such that L̄ = L(M̄) by the construction suggested in fig. 9.10. That is, M̄ behaves just like M. However, M is modified as follows to create M̄:

1. The accepting states of M are made nonaccepting states of M̄ with no transitions; i.e., in these states M̄ will halt without accepting.

2. M̄ has a new accepting state r; there are no transitions from r.

3. For each combination of a nonaccepting state of M and a tape symbol of M such that M has no transition (i.e., where M halts without accepting), add a transition to the accepting state r.

Figure 9.10: Construction of a TM accepting the complement of a recursive language

Since M is guaranteed to halt, we know that M̄ is also guaranteed to halt. Moreover, M̄ accepts exactly those strings that M does not accept. Thus, M̄ accepts L̄.

Theorem. If both a language L and its complement L̄ are RE, then L is recursive. Note that then, by the previous theorem, L̄ is recursive as well.

Proof: The proof is suggested by fig. 9.11. Let L = L(M_1) and L̄ = L(M_2). Both M_1 and M_2 are simulated in parallel by a TM M. We can make M a two-tape TM, and then convert it to a one-tape TM, to make the simulation easy and obvious. One tape of M simulates the tape of M_1, while the other tape of M simulates the tape of M_2. The states of M_1 and M_2 are each components of the state of M.

Figure 9.11: Simulation of two TM's accepting a language and its complement

If the input w to M is in L, then M_1 will eventually accept. If so, M accepts and halts. If w is not in L, then it is in L̄, so M_2 will eventually accept. When M_2 accepts, M halts without accepting. Thus, on all inputs, M halts, and L(M) is exactly L. Since M always halts, and L(M) = L, we conclude that L is recursive.

Summarizing the two preceding theorems: for a language L and its complement L̄, exactly one of the following holds:

1. Both L and L̄ are recursive.
2. Neither L nor L̄ is RE.
3. L is RE but not recursive, and L̄ is not RE.
4. L̄ is RE but not recursive, and L is not RE.

9.2.6 The Universal Language

Definition. L_u, the universal language, is the set of binary strings that encode a pair (M, w), where M is a TM with binary input alphabet and w is a string in (0 + 1)*, such that w is in L(M). That is, L_u is the set of strings representing a pair (M, w) such that M accepts w:

L_u = { (M, w) | w ∈ L(M) }

We shall show that there is a TM U, often called the universal Turing machine, such that L_u = L(U). Since the input to U is a binary string, U is in fact some M_j in the list of binary-input Turing machines.

It is easy to describe U as a multitape Turing machine, in the spirit of fig. 9.12. In the case of U, the transitions of M are stored initially on the first tape, along with the string w. A second tape is used to hold the simulated tape of M, using the same format as for the code of M. That is, tape symbol X_i of M is represented by 0^i, and tape symbols are separated by single 1's. The third tape of U holds the state of M, with state q_i represented by i 0's. A sketch of U is in fig. 9.12. The operation of U can be summarized as follows:

1. Examine the input to make sure that the code for M is a legitimate code for some TM. If not, U halts without accepting. Since invalid codes are assumed to represent the TM with no moves, and such a TM accepts no inputs, this action is correct.

2. Initialize the second tape to contain the input w, in its encoded form. That is, for each 0 of w, place 10 on the second tape, and for each 1 of w, place 100 there. Note that the blanks on the simulated tape of M, which are represented by 1000, will not actually appear on that tape; all cells beyond those used for w will hold the blank of U.
However, U knows that, should it look for a simulated symbol of M and find its own blank, it must replace that blank by the sequence 1000 to simulate the blank of M.

3. Place 0, the code for the start state of M, on the third tape, and move the head of U's second tape to the first simulated cell.

4. To simulate a move of M, U searches on its first tape for a transition 0^i 1 0^j 1 0^k 1 0^l 1 0^m, such that 0^i is the state on tape 3, and 0^j is the tape symbol of M that begins at the position on tape 2 scanned by U. This transition is the one M would next make. U should:

(a) Change the contents of tape 3 to 0^k; that is, simulate the state change of M. To do so, U first changes all the 0's on tape 3 to blanks, and then copies 0^k from tape 1 to tape 3.

(b) Replace 0^j on tape 2 by 0^l; that is, change the tape symbol of M. If more or less space is needed (i.e., j ≠ l), use the scratch tape and the shifting-over technique to manage the spacing.

(c) Move the head on tape 2 to the position of the next 1 to the left or right, depending on whether m = 1 (move left) or m = 2 (move right). Thus, U simulates the move of M to the left or to the right.

5. If M has no transition that matches the simulated state and tape symbol, then in (4) no transition will be found. Thus, M halts in the simulated configuration, and U must do likewise.

6. If M enters its accepting state, then U accepts.

In this manner, U simulates M on w. U accepts the coded pair (M, w) if, and only if, M accepts w.

Figure 9.12: Organization of a universal Turing machine

Undecidability of the Universal Language

We can now exhibit a problem that is RE but not recursive; it is the language L_u. Knowing that L_u is undecidable (i.e., not a recursive language) is in many ways more valuable than our previous discovery that L_d is not RE. The reason is that the reduction of L_u to another problem P can be used to show that there is no algorithm to solve P, regardless of whether or not P is RE. However, reduction of L_d to P is only possible if P is not RE, so L_d cannot be used to show undecidability for problems that are RE but not recursive. On the other hand, if we want to show a problem not to be RE, then only L_d can be used; L_u is useless for that purpose, since it is RE.

Theorem: L_u is RE but not recursive.

Proof: We just proved that L_u is RE. Suppose L_u were recursive. Then by Theorem 9.2.2 its complement would also be recursive. However, if we have a TM M accepting the complement of L_u, then we can construct a TM accepting L_d. Since we already know that L_d is not RE, we have a contradiction of our assumption that L_u is recursive.
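The construction behind this proof is just function composition, and can be sketched abstractly. In the Python sketch below, `accepts_Lu_bar` stands for the hypothetical TM M accepting the complement of L_u, and `encode_pair` for the pairing encoder; both names are assumed helpers used only for illustration.

```python
def make_Ld_acceptor(accepts_Lu_bar, encode_pair):
    """Sketch of the reduction in Fig. 9.13: from a hypothetical
    acceptor for the complement of L_u, build an acceptor for the
    diagonal language L_d.  Both parameters are assumed helpers."""
    def accepts_Ld(w):
        # w is in L_d exactly when the TM whose code is w does not
        # accept w, i.e., when the pair (w, w) lies in the complement
        # of L_u.  So copy w into a pair and ask the hypothetical TM.
        return accepts_Lu_bar(encode_pair(w, w))
    return accepts_Ld
```

Since no acceptor for L_d can exist, no acceptor for the complement of L_u can exist either; the sketch merely makes the direction of the reduction concrete.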

Figure 9.13: Reduction of L_d to the complement of L_u

9.3 Undecidable Problems About Turing Machines

Reductions

In general, if we have an algorithm to convert instances of a problem P1 to instances of a problem P2 that have the same answer, then we say that P1 reduces to P2. As in Fig. 9.14, a reduction must turn every instance of P1 with a "yes" answer into an instance of P2 with a "yes" answer, and every instance of P1 with a "no" answer into an instance of P2 with a "no" answer.

Figure 9.14: Reductions turn positive instances into positive, and negative into negative

Theorem: If there is a reduction from P1 to P2, then:
(a) If P1 is undecidable, then so is P2.
(b) If P1 is non-RE, then so is P2.

Proof: First suppose P1 is undecidable. If it were possible to decide P2, then we could combine the reduction from P1 to P2 with the algorithm that decides P2 to construct an algorithm that decides P1. Suppose we are given an instance w of P1. Apply to w the algorithm that converts it into an instance x of P2. Then apply the algorithm that decides P2 to x. If that algorithm says "yes," then x is in P2. Because we reduced P1 to P2, we know the answer to w for P1 is "yes"; i.e., w is in P1. Likewise, if x is not in P2, then w is not in P1, and whichever answer we give to the question "is x in P2?" is also the correct answer to "is w in P1?"

We have thus contradicted the assumption that P1 is undecidable. Our conclusion is that if P1 is undecidable, then P2 is undecidable.

Now consider part (b). Assume that P1 is non-RE but P2 is RE. We have an algorithm to reduce P1 to P2, but only a procedure to recognize P2; that is, there is a TM that says "yes" if its input is in P2, but may not halt if its input is not in P2. As for part (a), starting with an instance w of P1, convert it by the reduction algorithm to an instance x of P2. Then apply the TM for P2 to x. If x is accepted, then accept w. This procedure describes a TM whose language is P1. If w is in P1, then x is in P2, so this TM will accept w. If w is not in P1, then x is not in P2; the TM may or may not halt, but it will surely not accept w. Since we assumed no TM for P1 exists, we have shown by contradiction that no TM for P2 exists either; i.e., if P1 is non-RE, then P2 is non-RE.

Turing Machine That Accepts the Empty Language

Definition:
L_e = { M | L(M) = ∅ }
L_ne = { M | L(M) ≠ ∅ }

Theorem: L_ne is recursively enumerable.

Proof: We describe a nondeterministic TM M with L(M) = L_ne; recall that a nondeterministic TM can be converted to an equivalent deterministic TM. The operation of M is as follows:

1. M takes as input a TM code M_i.

Figure 9.15: Construction of a NTM to accept L_ne

2. Using its nondeterminism, M guesses an input w that M_i might accept.

3. M simulates M_i on w. If M_i accepts w, then M accepts its own input, which is M_i.

In this manner, if M_i accepts even one string, M will guess that string and accept M_i. However, if L(M_i) = ∅, then no guess w leads to acceptance by M_i, so M does not accept M_i. Thus, L(M) = L_ne.

Theorem: L_ne is not recursive.

Figure 9.16: Plan of the TM M' constructed from (M, w)

Proof: We reduce L_u to L_ne. Given a pair (M, w), we construct a TM M' that works as follows:

1. M' has some input x on its tape; it overwrites x with the string (M, w). Since M' is designed for a specific pair (M, w), which has some length n, we may construct M' to have a sequence of states q_0, q_1, ..., q_n, where q_0 is the start state.
(a) In state q_i, for i = 0, 1, ..., n-1, M' writes the (i+1)st bit of the code for (M, w), goes to state q_{i+1}, and moves right.
(b) In state q_n, M' moves right, if necessary replacing any nonblanks by blanks.

2. When M' reaches a blank in state q_n, it uses a similar collection of states to reposition its head at the left end of the tape.

3. Now, using additional states, M' simulates the universal TM U on its present tape.

4. If U accepts (M, w), then M' accepts its original input x. If U never accepts, then M' never accepts either.

The construction of M' gives us:
If M accepts w, then M' accepts every input, so L(M') ≠ ∅ and M' is in L_ne.
If M never accepts w, then M' accepts nothing, so L(M') = ∅ and M' is not in L_ne.

This reduction of L_u to L_ne, together with the non-recursiveness of L_u, is sufficient to conclude that L_ne is not recursive. However, to illustrate the impact of the reduction, suppose L_ne were recursive. Then we could develop an algorithm for L_u as follows:

Algorithm 3: Algorithm for L_u
1. Convert (M, w) to the TM M' as above.
2. Use the hypothetical algorithm for L_ne to tell whether or not L(M') = ∅. If so, say M does not accept w; if L(M') ≠ ∅, say M accepts w.

Since we know that no such algorithm for L_u exists, we have contradicted the assumption that L_ne is recursive, and we conclude that L_ne is not recursive.

Now we know the status of L_e. If L_e were RE, then by Theorem 9.2.3 both it and L_ne would be recursive. Since L_ne is not recursive by Theorem 9.3.3, we conclude that:

Theorem: L_e is not RE.

Rice's Theorem and Properties of RE Languages

A property of the RE languages is simply a set of RE languages. Thus, the property of being context-free is formally the set of all CFL's. The property of being empty is the set { ∅ } consisting of only the empty language. A property is trivial if it is either empty (i.e., satisfied by no language at all) or is all RE languages. Otherwise, it is nontrivial. Note that the empty property, ∅, is different from the property of being an empty language, { ∅ }.

If P is a property of the RE languages, the language L_P is the set of codes for Turing machines M_i such that L(M_i) is a language in P:

L_P = { M | L(M) ∈ P }

When we talk about the decidability of a property P, we mean the decidability of the language L_P.

Examples:
1. If P is the property of being nonempty, L_P is { M | L(M) ≠ ∅ } = L_ne.
2. If P is the property of being empty, L_P is { M | L(M) = ∅ } = L_e.
3. If P is the property of being regular, L_P is { M | L(M) is a regular language }.
4. If P is the property of being context-free, L_P is { M | L(M) is CF }.

Theorem (Rice's Theorem): Every nontrivial property of the RE languages is undecidable.

Proof: We reduce L_u to L_P. Let P be a nontrivial property of the RE languages.

1. Assume to begin that ∅, the empty language, is not in P. Since P is nontrivial, there must be some nonempty language L that is in P. As L is RE, there is a TM M_L accepting L. Given a pair (M, w), construct a TM M' that works as follows:
(a) Simulate M on input w. Note that w is not the input to M'; rather, M' writes M and w onto one of its tapes and simulates the universal TM U on that pair.
(b) If M does not accept w, then M' accepts nothing. In particular, M' never accepts its own input x, so L(M') = ∅. Since we assume ∅ is not in the property P, the code for M' is not in L_P.
(c) If M accepts w, then M' begins simulating M_L on its own input x. Thus, M' accepts exactly the language L. Since L is in P, the code for M' is in L_P.
To summarize: if M accepts w, then L(M') is in P; if M does not accept w, then L(M') is not in P. Since the above algorithm turns (M, w) into an M' that is in L_P if and only if (M, w) is in L_u, it is a reduction of L_u to L_P, and it proves that the property P is undecidable.

Figure 9.17: Construction of M' for the proof of Rice's Theorem

2. Now consider the case where ∅ is in P. Let P' be the complementary property, the set of RE languages not in P; then ∅ is not in P', so from (1) we know that L_P' is undecidable. However, since every TM accepts an RE language, L_P', the set of codes for Turing machines that do not accept a language in P, is exactly the complement of L_P, the set of codes for TM's that accept a language in P. Suppose L_P were decidable. Then so would L_P' be, because the complement of a recursive language is recursive (Theorem 9.2.2). This contradiction shows that L_P is undecidable in this case as well.

Theorem: Rice's theorem on recursive index sets states that if P is nontrivial, then L_P is not recursive.

Theorem: If L_P is RE, then the list of binary codes for the finite sets in P is enumerable.

Proof: Let (i, j) be a pair produced by a pair generator, and treat i as the binary code of a finite set, taking 0 as the code for a comma, 10 as the code for zero, and 11 as the code for one. We may in a straightforward manner construct a TM M(i) (essentially a finite automaton) that accepts exactly the words in the finite language represented by i. We then simulate the enumerator for L_P for j steps. If it has printed M(i), we print the code for the finite set represented by i, that is, the binary representation of i itself, followed by a delimiter symbol #. In any event, after the simulation we return control to the pair generator, which generates the pair following (i, j).

Theorem: L_P is RE if and only if:
1. If L is in P and L ⊆ L' for some RE language L', then L' is in P.
2. If L is an infinite language in P, then there is some finite subset L' of L that is in P.
3. The set of finite languages in P is enumerable.

Corollary: The following properties of RE sets are not RE:
a) L = ∅
b) L = Σ*
c) L is recursive
d) L is not recursive

e) L is a singleton
f) L is a regular set
g) L_u ⊆ L

Example: Show that the following properties of RE languages are not RE.

1. L = ∅. P = { L | L = ∅ }, so L_P = { M | L(M) = ∅ }. Let L_1 ∈ P, i.e., L_1 = ∅, and let L_2 = Σ*. L_1 is a subset of L_2 and L_2 is RE, but L_2 ∉ P. Rule 1 of the theorem above is not satisfied, so the property is not RE.

2. S_1 = { M | L(M) = 01*0 }. Here P = { 01*0 }. Let L_1 = 01*0, so L_1 ∈ P, and let L_2 = Σ*. L_1 is a subset of L_2 and Σ* is RE, but L_2 ∉ P. Rule 1 of the theorem is not satisfied; therefore S_1 is not RE.

3. S_2 = { M | L(M) ⊆ 01*0 }. Here P = { L | L ⊆ 01*0 }. Let L_1 = { 00 }; since 00 is in 01*0, L_1 ∈ P. Let L_2 = Σ*. L_2 is RE and L_1 ⊆ L_2, but L_2 ∉ P. This once again fails rule 1 of the theorem; therefore S_2 is not RE.

4. L is recursive. P = { L | L is recursive }, so L_P = { M | L(M) is recursive }. Let L_1 = ∅, which is recursive, so L_1 ∈ P. Let L_2 = L_u. L_1 is a subset of L_2 and L_2 is RE, but L_2 ∉ P, since L_u is RE but not recursive. Rule 1 of the theorem is not satisfied, so the property is not RE.

5. L is not recursive. P = { L | L is not recursive }, so L_P = { M | L(M) is not recursive }. Let L_1 = L_u, which is not recursive, so L_1 ∈ P. Let L_2 = Σ*. L_1 is a subset of L_2, but L_2 is recursive and hence L_2 ∉ P. Rule 1 of the theorem is not satisfied, so the property is not RE.

6. L is a singleton. P = { L | L is a singleton }, so L_P = { M | L(M) is a singleton }. Let L_1 ∈ P, i.e., L_1 is a singleton, and let L_2 = Σ*. L_1 is a subset of L_2, but L_2 ∉ P. Rule 1 of the theorem is not satisfied; therefore the property is not RE.

7. L is a regular set. P = { L | L is a regular set }, so L_P = { M | L(M) is a regular set }. Let L_1 = ∅, which is regular, so L_1 ∈ P. Let L_2 = { 0^n 1^n | n ≥ 0 }. L_1 is a subset of L_2 and L_2 is RE, but L_2 is not regular, so L_2 ∉ P. Rule 1 of the theorem is not satisfied; therefore the property is not RE.

Corollary: The following properties of RE sets are RE:
a) L ≠ ∅
b) L contains at least 10 members
c) w is in L, for some fixed word w
d) L ∩ L_u ≠ ∅

Example: Show that the following properties of RE sets are RE.

1. L ≠ ∅. P = { L | L ≠ ∅ }, so L_P = { M | L(M) ≠ ∅ }, which is the language L_ne. We know that L_ne is RE (Theorem 9.3.2).

2. L contains at least 10 members. P = { L | L contains at least 10 members }. There exists a TM T_10 (Fig. 9.18) that nondeterministically guesses 10 distinct strings and simulates M on each; it accepts after all 10 simulations accept. Therefore the property is RE.

Figure 9.18: Turing Machine that accepts after guessing 10 strings

3. w is in L, for some fixed word w. There exists a TM T_w (Fig. 9.19) that takes as input a TM M and simulates M on the string w. If M accepts w, then T_w accepts M. Thus T_w accepts M if and only if w ∈ L(M). Therefore the property "w is in L for some fixed word w" is RE.

4. L ∩ L_u ≠ ∅. There exists a TM T (Fig. 9.20) that nondeterministically guesses a string w and simulates M on w; if M accepts w, T then simulates a TM for L_u on w, and accepts if that simulation accepts. This Turing machine accepts M exactly when L(M) ∩ L_u ≠ ∅. Therefore this property is RE.

Figure 9.19: Turing Machine that simulates M on w

Figure 9.20: Turing Machine for L ∩ L_u

9.4 Post's Correspondence Problem

Definition: An instance of Post's Correspondence Problem (PCP) consists of two lists of strings over some alphabet Σ; the two lists must be of equal length. We generally refer to the A and B lists, and write A = w_1, w_2, ..., w_k and B = x_1, x_2, ..., x_k, for some integer k. For each i, the pair (w_i, x_i) is said to be a corresponding pair.

We say this instance of PCP has a solution if there is a sequence of one or more integers i_1, i_2, ..., i_m that, when interpreted as indexes for strings in the A and B lists, yields the same string; that is, w_{i1} w_{i2} ... w_{im} = x_{i1} x_{i2} ... x_{im}. In that case we say the sequence i_1, i_2, ..., i_m is a solution to this instance of PCP. The Post Correspondence Problem is: given an instance of PCP, tell whether this instance has a solution. We shall prove PCP undecidable by reducing L_u to PCP.

The Modified PCP

It is easier to reduce L_u to PCP if we first introduce an intermediate version of PCP, which we call the Modified Post's Correspondence Problem, or MPCP. In the modified PCP, there is an additional requirement on a solution: the first pair of the A and B lists must be the first pair in the solution. More formally, an instance of MPCP is two lists A = w_1, w_2, ..., w_k and B = x_1, x_2, ..., x_k, and a solution is a list of zero or more integers i_1, i_2, ..., i_m such that w_1 w_{i1} w_{i2} ... w_{im} = x_1 x_{i1} x_{i2} ... x_{im}.

Theorem: MPCP reduces to PCP.

Theorem: Post's Correspondence Problem is undecidable.

9.5 Other Undecidable Problems

We shall now consider a variety of problems that we can prove undecidable by reducing PCP to the problem we wish to prove undecidable.

Undecidability of Ambiguity for CFG's

We shall now see how to reduce PCP to the question of whether a given context-free grammar is ambiguous. Let the PCP instance consist of lists A = w_1, w_2, ..., w_k and B = x_1, x_2, ..., x_k. For list A we shall construct a CFG with A as the only variable.
The terminals are all the symbols of the alphabet Σ used for this PCP instance, plus a distinct set of index symbols a_1, a_2, ..., a_k that represent the choices of pairs of strings in a solution to the PCP instance. That is, the index symbol a_i represents the choice of w_i from the A list or of x_i from the B list. The productions of the CFG for list A are:

A → w_1 A a_1 | w_2 A a_2 | ... | w_k A a_k | w_1 a_1 | w_2 a_2 | ... | w_k a_k

We shall call this grammar G_A and its language L_A; L_A is the language for list A. Notice that the terminal strings derived by G_A are exactly those of the form w_{i1} w_{i2} ... w_{im} a_{im} ... a_{i2} a_{i1} for some m ≥ 1 and some list of integers i_1, i_2, ..., i_m; each integer is in the range 1 to k. The sentential forms of G_A all have a single A between the strings (the w's) and the index symbols (the a's), until we use one of the last group of k productions, none of which has an A in the body. Also, only two production bodies end with a given index symbol a_i: A → w_i A a_i and A → w_i a_i. Thus a terminal string of L_A has a unique derivation, determined by the index symbols at its end.

Now let us consider the other part of the given PCP instance, the list B = x_1, x_2, ..., x_k. For this list we develop another grammar G_B:

B → x_1 B a_1 | x_2 B a_2 | ... | x_k B a_k | x_1 a_1 | x_2 a_2 | ... | x_k a_k

The language of this grammar will be referred to as L_B. The same observations that we made for G_A apply also to G_B. In particular, a terminal string in L_B has a unique derivation, which can be determined by the index symbols in the tail of the string.

Finally, we combine the languages and grammars of the two lists to form a grammar G_AB for the entire PCP instance. G_AB consists of:
1. Variables A, B, and S; the latter is the start symbol.
2. Productions S → A | B.
3. All the productions of G_A.
4. All the productions of G_B.

We claim that G_AB is ambiguous if and only if the instance (A, B) of PCP has a solution; that argument is the core of the next theorem.

Theorem: It is undecidable whether a CFG is ambiguous.

Proof: We have already given the reduction of PCP to the question of whether a CFG is ambiguous; that reduction proves the problem of CFG ambiguity to be undecidable, since PCP is undecidable. We have only to show that the construction is correct, that is: G_AB is ambiguous if and only if the instance (A, B) of PCP has a solution.

(If) Suppose i_1, i_2, ..., i_m is a solution, so that w_{i1} w_{i2} ... w_{im} = x_{i1} x_{i2} ... x_{im}. Then the derivation that begins S ⇒ A and uses the G_A productions for indexes i_1, ..., i_m, and the derivation that begins S ⇒ B and uses the G_B productions for the same indexes, are derivations of the same terminal string. Since they are clearly two distinct leftmost derivations of the same terminal string, we conclude that G_AB is ambiguous.

(Only if) We already observed that a given terminal string cannot have more than one derivation in G_A, and not more than one in G_B. So the only way that a terminal string could have two leftmost derivations in G_AB is if one of them begins S ⇒ A and continues with a derivation in G_A, while the other begins S ⇒ B and continues with a derivation of the same string in G_B.
The string with two derivations has a tail of indexes a_{im} ... a_{i2} a_{i1}, for some m ≥ 1. This tail must be a solution to the PCP instance, because what precedes the tail in the string with two derivations is both w_{i1} w_{i2} ... w_{im} and x_{i1} x_{i2} ... x_{im}.

The Complement of a List Language

Having context-free languages like L_A for the list A lets us show a number of problems about CFL's to be undecidable. More undecidability facts for CFL's can be obtained by considering the complement language, which we write L̄_A. Notice that L̄_A consists of all strings over the alphabet Σ ∪ {a_1, a_2, ..., a_k} that are not in L_A, where Σ is the alphabet of some PCP instance, and the a_i's are distinct symbols representing the indexes of pairs in that PCP instance.

We claim that L̄_A is a CFL. Unlike L_A, however, it is not very easy to design a grammar for L̄_A, but we can design a PDA, in fact a deterministic PDA, for L̄_A. The construction is in the next theorem.

Theorem: If L_A is the language for the list A, then L̄_A is a context-free language.

Proof: Let Σ be the alphabet of the strings on a list A = w_1, w_2, ..., w_k, and let I be the set of index symbols: I = {a_1, a_2, ..., a_k}. The DPDA P that we design to accept L̄_A works as follows:

1. As long as P sees symbols in Σ, it stores them on its stack. Since all strings in Σ* are in L̄_A, P accepts as it goes.

2. As soon as P sees an index symbol in I, say a_i, it pops its stack to see if the top symbols form w_i^R, that is, the reverse of the corresponding string.
(a) If not, then the input seen so far, and any continuation of it, is in L̄_A. Thus, P goes to an accepting state in which it consumes all future inputs without changing its stack.
(b) If w_i^R was popped from the stack, but the bottom-of-stack marker is not yet exposed on the stack, then P accepts, but remembers in its state that it is looking for symbols in I only, and may yet see a string in L_A (which P will not accept). P repeats step (2) as long as the question of whether the input is in L_A is unresolved.
(c) If w_i^R was popped from the stack, and the bottom-of-stack marker is exposed, then P has seen an input in L_A. P does not accept this input. However, since no continuation of this input can be in L_A, P goes to a state where it accepts all future inputs, leaving the stack unchanged.

3. If, after seeing one or more symbols of I, P sees another symbol in Σ, then the input is not of the correct form to be in L_A. Thus, P goes to a state in which it accepts this and all future inputs, without changing its stack.

Theorem: Let G_1 and G_2 be context-free grammars, and let R be a regular expression. Then the following are undecidable:
(a) Is L(G_1) ∩ L(G_2) = ∅?
(b) Is L(G_1) = L(G_2)?
(c) Is L(G_1) = L(R)?
(d) Is L(G_1) = T* for some alphabet T?
(e) Is L(G_1) ⊆ L(G_2)?
(f) Is L(R) ⊆ L(G_1)?

Proof: Each of the proofs is a reduction from PCP. We show how to take an instance (A, B) of PCP and convert it to a question about CFG's and/or regular expressions that has answer "yes" if and only if the instance of PCP has a solution. In some cases, we reduce PCP to the question as stated in the theorem; in other cases we reduce it to the complement.
It doesn't matter which, since if we show the complement of a problem to be undecidable, the problem itself cannot be decidable, because the recursive languages are closed under complementation (Theorem 9.2.2).

Let the alphabet of the strings for this instance be Σ and the alphabet of index symbols be I. Our reductions depend on the fact that L_A, L_B, L̄_A and L̄_B all have CFG's. We construct these CFG's either directly, as above, or by the construction of a PDA for the complement languages given in the preceding theorem, coupled with the conversion from a PDA to a CFG.

(a) Let L(G_1) = L_A and L(G_2) = L_B. Then L(G_1) ∩ L(G_2) is the set of solutions to this instance of PCP. The intersection is empty if and only if there is no solution. Note that, technically, we have reduced PCP to the language of pairs of CFG's whose intersection is nonempty; i.e., we have shown the problem "is the intersection of two CFG's nonempty?" to be undecidable. However, as mentioned in the introduction to the proof, showing the complement of a problem to be undecidable is tantamount to showing the problem itself undecidable.

(b) Since CFL's are closed under union, we can construct a CFG G_1 for L̄_A ∪ L̄_B. Since (Σ ∪ I)* is a regular set, we surely may construct for it a CFG G_2. Now L̄_A ∪ L̄_B is the complement of L_A ∩ L_B. Thus, L(G_1) is missing only those strings that represent solutions to the instance of PCP, while L(G_2) is missing no string in (Σ ∪ I)*. Thus, their languages are equal if and only if the PCP instance has no solution.

(c) The argument is the same as for (b), but we let R be the regular expression (Σ + I)*.

(d) The argument of (c) suffices, since Σ ∪ I is the only alphabet of which L̄_A ∪ L̄_B could possibly be the closure.

(e) Let G_1 be a CFG for (Σ ∪ I)* and let G_2 be a CFG for L̄_A ∪ L̄_B. Then L(G_1) ⊆ L(G_2) if and only if L̄_A ∪ L̄_B = (Σ ∪ I)*, i.e., if and only if the PCP instance has no solution.

(f) The argument is the same as for (e), but we let R be the regular expression (Σ + I)* and let L(G_1) be L̄_A ∪ L̄_B.
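Although PCP is undecidable, it is semi-decidable: we can search for a solution by trying index sequences in order of increasing length, and we will find one whenever it exists. The following Python sketch performs that search; the instance used below, with solution 2, 1, 1, 3, is a standard textbook example rather than one taken from these notes.

```python
from itertools import product

def pcp_search(A, B, max_len):
    """Breadth-first search for a PCP solution of length <= max_len.

    Returns a tuple of 1-based indexes i1..im with
    w_i1...w_im == x_i1...x_im, or None if no solution of length up to
    max_len exists.  With no bound the loop could run forever, which is
    consistent with PCP being RE but undecidable.
    """
    k = len(A)
    for m in range(1, max_len + 1):
        for seq in product(range(k), repeat=m):
            if "".join(A[i] for i in seq) == "".join(B[i] for i in seq):
                return tuple(i + 1 for i in seq)
    return None

A = ["1", "10111", "10"]    # list A: w1, w2, w3
B = ["111", "10", "0"]      # list B: x1, x2, x3
print(pcp_search(A, B, 4))  # -> (2, 1, 1, 3)
```

Both sides of the solution spell out the string 101111110; the undecidability results above say only that no algorithm can bound how far such a search must go in general.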

Chapter 10: Intractable Problems

In this chapter we introduce the theory of intractability, that is, techniques for showing problems not to be solvable in polynomial time.

10.1 The Classes P and NP

Problems Solvable in Polynomial Time

A Turing machine M is said to be of time complexity T(n) [or to have running time T(n)] if, whenever M is given an input w of length n, M halts after making at most T(n) moves, regardless of whether or not M accepts. We say a language L is in the class P if there is some polynomial T(n) such that L = L(M) for some deterministic TM M of time complexity T(n).

An Example: Kruskal's Algorithm

Let us consider the problem of finding a minimum-weight spanning tree (MWST) for a graph. A spanning tree is a subset of the edges such that all nodes are connected through these edges, yet there are no cycles. An example of a spanning tree appears in Fig. 10.1. A minimum-weight spanning tree has the least possible total edge weight of all spanning trees. There is a well-known greedy algorithm, called Kruskal's algorithm, for finding a MWST:

Figure 10.1: A graph

1. Maintain for each node the connected component in which the node appears, using whatever edges of the tree have been selected so far. Initially, no edges are selected, so every node is in a connected component by itself.

2. Consider the lowest-weight edge that has not yet been considered; break ties any way you like. If this edge connects two nodes that are currently in different connected components, then:
(a) Select that edge for the spanning tree, and
(b) Merge the two connected components involved, by changing the component number of all nodes in one of the two components to be the same as the component number of the other.
If, on the other hand, the selected edge connects two nodes of the same component, then this edge does not belong in the spanning tree; it would create a cycle.

3. Continue considering edges until either all edges have been considered, or the number of edges selected for the spanning tree is one less than the number of nodes. Note that in the latter case, all nodes must be in one connected component, and we can stop considering edges.

Example: In the graph of Fig. 10.1, we first consider the edge (1,3), because it has the lowest weight, 10. Since 1 and 3 are initially in different components, we accept this edge, and make 1 and 3 have the same component number, say component 1. The next edge in order of weights is (2,3), with weight 12. Since 2 and 3 are in different components, we accept this edge and merge node 2 into component 1. The third edge is (1,2), with weight 15. However, 1 and 2 are now in the same component, so we reject this edge and proceed to the fourth edge, (3,4). Since 4 is not in component 1, we accept this edge. Now we have three edges for the spanning tree of a 4-node graph, and so may stop.

It is possible to implement this algorithm (using a computer, not a TM) on a graph with m nodes and e edges in time O(m + e log e). A simpler implementation proceeds in e rounds. A table gives the current component of each node. We pick the lowest-weight remaining edge in O(e) time, and find the components of the two nodes connected by that edge in O(m) time.
If they are in different components, merge all nodes with those component numbers in O(m) time, by scanning the table of nodes. The total time taken by this algorithm is O(e(e + m)). This running time is polynomial in the size of the input, which we might informally take to be the sum of e and m.

When we translate the above ideas to Turing machines, we face several issues:

When we study algorithms, we encounter problems that ask for outputs in a variety of forms, such as a list of the edges in a MWST. When we deal with Turing machines, we may only think of problems as languages, and the only output is "yes" or "no," i.e., accept or reject. For instance, the MWST problem could be couched as: "given this graph G and limit W, does G have a spanning tree of weight W or less?" That problem may seem easier to answer than the MWST problem with which we are familiar, since we don't even learn what the spanning tree is. However, in the theory of intractability we generally want to argue that a problem is hard, not easy, and the fact that a yes-no version of a problem is hard implies that the more standard version, where a full answer must be computed, is also hard.

While we might think informally of the size of a graph as the number of nodes or edges, the input to a TM is a string over a finite alphabet. Thus, problem elements such as nodes and edges must be encoded suitably. The effect of this requirement is that inputs to Turing machines are generally slightly longer than the intuitive size of the input. However, there are two reasons why the difference is not significant:

1. The difference between the size as a TM input string and as an informal problem input is never more than a small factor, usually the logarithm of the input size. Thus, what can be done in polynomial time using one measure can be done in polynomial time using the other.

2.
The length of a string representing the input is actually a more accurate measure of the number of bytes a real computer has to read to get its input. For instance, if a node is represented by an integer, then the number of bytes needed to represent that integer is proportional to the logarithm of the integer's size; it is not 1 byte per node, as we might imagine in an informal accounting of input size.

Example: Let us consider a possible code for the graphs and weight limits that could be input to the MWST problem. The code has five symbols: 0, 1, the left and right parentheses, and the comma.

1. Assign integers 1 through m to the nodes.
2. Begin the code with the value of m in binary and the weight limit W in binary, separated by a comma.
3. If there is an edge between nodes i and j with weight w, place (i, j, w) in the code. The integers i, j, and w are coded in binary. The order of i and j within an edge, and the order of edges within the code, are immaterial.

Thus, one of the possible codes for the graph of Fig. 10.1 with limit W = 40 is

100,101000(1,10,1111)(1,11,1010)(10,11,1100)(10,100,10100)(11,100,10010)

If we represent inputs to the MWST problem as in this example, then an input of length n can represent at most O(n / log n) edges. It is possible that m, the number of nodes, could be exponential in n, if there are very few edges. However, unless the number of edges, e, is at least m - 1, the graph cannot be connected and therefore will have no MWST, regardless of its edges. Consequently, if the number of nodes is not at least some fraction of n / log n, there is no need to run Kruskal's algorithm at all; we simply say "no, there is no spanning tree of that weight."

Thus, if we have an upper bound on the running time of Kruskal's algorithm as a function of m and e, such as the upper bound O(e(m + e)) developed above, we can conservatively replace both m and e by n and say that the running time, as a function of the input length n, is O(n(n + n)), or O(n^2). We claim that in O(n^2) steps we can implement the version of Kruskal's algorithm described above on a multitape TM.
The extra tapes are used for several jobs:

1. The input tape holds the code for the graph, beginning with the number of nodes and the limit W, followed by the edges, as described in the example above.
2. The second tape stores the list of nodes and their current components. This tape is O(n) in length.
3. A third tape is used to store the current least-weight edge. Scanning for the lowest-weight unmarked edge takes O(n) time.
4. When an edge is selected in a round, place its two nodes on a fourth tape. Search the table of nodes and components to find the components of these two nodes. This requires O(n) time.
5. A tape can be used to hold the two components, i and j, being merged. We scan the table of nodes and components, and each node found to be in component i has its component number changed to j. This scan takes O(n) time.

We should thus be able to execute one round in O(n) time on a multitape TM. Since the number of rounds, e, is at most n, we conclude that O(n^2) time suffices on a multitape TM. By an earlier theorem, whatever a multitape TM can do in s steps, a single-tape TM can do in O(s^2) steps. Thus, if the multitape TM takes O(n^2) steps, then we can construct a single-tape TM to do the same thing in O(n^4) steps. Our conclusion is that the yes-no version of the MWST problem, "does graph G have a MWST of total weight W or less?", is in P.
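The rounds described above can be written out directly. The following Python sketch implements Kruskal's algorithm on the graph recovered from the example's encoding (edges and weights as listed there); a union-find structure stands in for the component table, a standard optimization rather than anything prescribed by the notes.

```python
def kruskal(n, edges):
    """Kruskal's algorithm as described above, for nodes 1..n and
    edges given as (u, v, weight) triples."""
    parent = list(range(n + 1))           # component of each node (1-based)

    def find(u):
        while parent[u] != u:             # walk up to the component root,
            parent[u] = parent[parent[u]] # compressing the path as we go
            u = parent[u]
        return u

    tree, total = [], 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # lowest weight first
        ru, rv = find(u), find(v)
        if ru != rv:                      # different components: keep edge
            parent[ru] = rv               # merge the two components
            tree.append((u, v))
            total += w
            if len(tree) == n - 1:        # n-1 edges: spanning tree is done
                break
    return tree, total

# The graph of Fig. 10.1, as recovered from the example's encoding.
edges = [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]
print(kruskal(4, edges))   # ([(1, 3), (2, 3), (3, 4)], 40)
```

The run accepts (1,3), (2,3) and (3,4) and rejects (1,2), exactly as in the worked example, and the resulting weight 40 matches the limit W used there.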

An NP Example: The Travelling Salesman Problem

The Travelling Salesman Problem (TSP) is an example of a problem that appears to be in NP but not in P. The input to TSP is the same as to MWST: a graph with integer weights on the edges, such as that of Fig. 10.1, and a weight limit W. The question asked is whether the graph has a Hamilton circuit of total weight at most W.

Definition: A Hamilton circuit is a set of edges that connect the nodes into a single cycle, with each node appearing exactly once. Note that the number of edges on a Hamilton circuit must equal the number of nodes in the graph.

NP-complete Problems

Definition: Let L be a language (problem) in NP. L is NP-complete if the following statements are true about L:
1. L is in NP.
2. For every language L' in NP there is a polynomial-time reduction of L' to L.

Theorem: If P_1 is NP-complete, P_2 is in NP, and there is a polynomial-time reduction of P_1 to P_2, then P_2 is NP-complete.

Theorem: If some NP-complete problem is in P, then P = NP.

The Satisfiability Problem

Definition: Boolean expressions are built from:
1. Variables whose values are boolean; i.e., they have either the value 1 (true) or 0 (false).
2. Binary operators ∧ and ∨, standing for the logical AND and OR of two expressions.
3. The unary operator ¬, standing for logical negation.
4. Parentheses to group operators and operands, if necessary to alter the default precedence of operators: ¬ highest, then ∧, and finally ∨.

The satisfiability problem (SAT) is: given a boolean expression, is there an assignment of truth values to its variables that makes the expression true?

Theorem (Cook's Theorem): SAT is NP-complete.

NP-Completeness of 3SAT

Definition: The 3SAT problem is: given a boolean expression E that is the product of clauses, each of which is the sum of three distinct literals, is E satisfiable?

Theorem: 3SAT is NP-complete.
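The obvious algorithm for SAT tries all 2^n truth assignments, which is exponential in the number of variables; no polynomial-time algorithm is known, which is what the NP-completeness of SAT and 3SAT suggests. A Python sketch, with clauses in a conventional signed-integer encoding chosen here for convenience, not taken from the notes:

```python
from itertools import product

def satisfiable(clauses, n):
    """Brute-force satisfiability test for CNF formulas over x_1..x_n.

    Each clause is a tuple of nonzero integers: literal +i means x_i
    and -i means NOT x_i.  All 2^n assignments are tried, so the
    running time grows exponentially with n.
    """
    for bits in product([False, True], repeat=n):
        value = {i + 1: b for i, b in enumerate(bits)}
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# A satisfiable 3-CNF: (x1 + x2 + x3)(~x1 + ~x2 + x3)(x1 + ~x2 + ~x3)
print(satisfiable([(1, 2, 3), (-1, -2, 3), (1, -2, -3)], 3))   # True

# All eight possible clauses over x1, x2, x3: every assignment
# falsifies exactly one of them, so the formula is unsatisfiable.
every = [(s1 * 1, s2 * 2, s3 * 3)
         for s1 in (1, -1) for s2 in (1, -1) for s3 in (1, -1)]
print(satisfiable(every, 3))                                    # False
```

Checking a guessed assignment takes polynomial time, which is why SAT is in NP; it is the search over assignments that blows up.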

Bibliography

[1] Thomas A. Sudkamp, Languages and Machines: An Introduction to the Theory of Computer Science, Addison-Wesley Publishing Company, Inc., United States of America.
