Theory of Computation CSCI 3434, Spring 2010


Homework 3: Solutions

1. This was listed as a question to make it clear that you would lose points if you did not answer these questions or work with another student in the class. Some of the content in these solutions was taken from Wikipedia.

2. Construct CFGs for the following languages over Σ = {0, 1}:

a. L = odd-length strings whose first, middle, and last characters are all the same

S → 0A0 | 1B1 | 0 | 1
A → 0A0 | 0A1 | 1A0 | 1A1 | 0
B → 0B0 | 0B1 | 1B0 | 1B1 | 1

The string is built from the outside in. The first production sets the first and last characters alike. Then any pair of characters can be added, and the derivation ends by producing, as the middle character, the same character as the first and last.

b. L = {0^m 1^n : m ≠ n}

S → 0S1 | A | B
A → 0A | 0
B → 1B | 1

S builds equal numbers of 0's and 1's, then ends with at least one additional 0 or 1 so that the counts differ. To get more than one additional character, A adds unlimited extra 0's and B adds unlimited extra 1's.

c. L = {x : x ≠ x^R} (x is not a palindrome)

S → 0S0 | 1S1 | A
A → 0B1 | 1B0 | 01 | 10
B → 0B0 | 0B1 | 1B0 | 1B1 | 1 | 0 | ε

The string is built from the outside in. The variable A, which must be used in any derivation of a terminal string, ensures that the string is not a palindrome.

3. Describe the languages generated by:

a. S → ASA | A | ε, A → 00 | ε

L = {x : x is a string of 0's of even length, including length zero} = (00)*
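These small grammars can be sanity-checked mechanically. The sketch below is my own illustration, not part of the assignment: the `generate` helper and the dict encoding of rules are assumptions. It enumerates every string the grammar of 2b derives up to a length bound and compares against the definition of the language; the pruning relies on the grammar having no ε-productions, so sentential forms never shrink.

```python
def generate(grammar, start, max_len):
    """Enumerate all terminal strings of length <= max_len derivable from
    `start`. A sentential form is a tuple of symbols; a symbol is a
    nonterminal iff it is a key of `grammar`. Assumes no ε-productions,
    so any form longer than max_len can safely be pruned."""
    seen, results = set(), set()
    frontier = [(start,)]
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        try:
            i = next(i for i, s in enumerate(form) if s in grammar)
        except StopIteration:
            results.add("".join(form))   # all symbols are terminals
            continue
        for rhs in grammar[form[i]]:     # expand the leftmost nonterminal
            frontier.append(form[:i] + rhs + form[i + 1:])
    return results

# The grammar from 2b: L = {0^m 1^n : m != n}
g = {"S": [("0", "S", "1"), ("A",), ("B",)],
     "A": [("0", "A"), ("0",)],
     "B": [("1", "B"), ("1",)]}
```

Comparing `generate(g, "S", 6)` against all strings 0^m 1^n with m ≠ n and m + n ≤ 6 confirms the construction on small cases.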

b. S → 0S1 | SS | ε

L = {x : x has the same number of 0's and 1's, and every prefix of x has at least as many 0's as 1's}

This is equivalent to the language of well-formed parentheses, with 0 = { and 1 = }.

c. S → T0T, T → TT | 0T1 | 1T0 | 0 | ε

L = {x : x has more 0's than 1's}

4. Construct PDAs for the following languages over Σ = {0, 1}:

a. L = {x : x ≠ x^R} (x is not a palindrome)

[State diagram: states A, B, C, D; transitions include "0, ε : push 0", "1, ε : push 1", "a, a : pop", "ε, a : pop", "0, 0 : pop", "1, 1 : pop", "0, 1 : pop", and "a, b : pop" for a, b ∈ {0, 1}; accepting condition EBN (end of input with empty stack).]

State A reads the first half of the string, including the middle character for odd-length strings. State B reads the second half of the string while the portion read so far is still palindromic against the end of the first half. State C is entered once the string has been confirmed not to be a palindrome. If the end of the string is encountered at the same time as the stack bottom, then the guess for when to transition out of state A was correct, and if the machine is already in state C, the string can be accepted.

b. L = strings containing as many 1's as 0's

[State diagram: a looping state with transitions "0, A : push 0" for stack top A ∈ {0, BOT}, "1, B : push 1" for stack top B ∈ {1, BOT}, and "0, 1 : pop" (with the symmetric "1, 0 : pop"); accepting condition EBN.]
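The single-counter behavior of the PDA in 4b is easy to mirror in ordinary code. The sketch below is my own illustration (not part of the solution): a Python list plays the stack, pushing when the stack is empty or holds the symbol just read, popping otherwise, and accepting on an empty stack at end of input.

```python
def equal_01(s):
    """Simulate the PDA of 4b: at any point the stack holds only 0's or
    only 1's; a mismatching input symbol pops, a matching one pushes.
    Accept iff the stack is empty when the input ends."""
    stack = []
    for c in s:
        if stack and stack[-1] != c:
            stack.pop()
        else:
            stack.append(c)
    return not stack
```

For example, `equal_01("0110")` accepts while `equal_01("001")` rejects.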

At any point, the stack contains only 0's or only 1's. A 0 is popped when a 1 is read, and vice versa, ensuring the string contains the same number of 1's as 0's.

c. L = {0^m 1^(m+n) 0^n : m, n ≥ 0}

[State diagram: transitions include "0, ε : push 0", pops of 0's as 1's are read, "1, BOT : push 1", and "0, 1 : pop", with additional transitions and EBN accepting states marked for the cases m = 0, n = 0, and m = n = 0.]

Initially, 0's are pushed onto the stack until a 1 is seen. Then an equal number of 1's pop the m 0's, at which point any remaining 1's are pushed. These are balanced against the final 0's, which pop the 1's. There are additional transitions (indicated in the diagram) to account for m and/or n equal to 0.

d. L = {0^n x : |x| ≤ n}

[State diagram: a first state looping on "0, ε : push 0", a second looping on "a, 0 : pop" for a ∈ {0, 1}, and "EOS, ε : NoOp" transitions from both to the accept state.]

Since the length of x must be no more than that of the initial string of 0's, we may as well include all initial 0's in this portion of the string, forcing x to begin with a 1. We push all the initial 0's onto the stack and pop one for each character of x. We accept if we reach the end of the input string without first hitting the bottom of the stack.

5. Prove the following languages are not CFLs:

a. L = {0^m 1^n 2^p : m = n + p}

This is actually context-free. Think of a PDA that pushes 0's, then pops one 0 for each 1, then one for each 2.

b. L = {0^m 1^n 2^p : m ≠ n + p}

This is actually also context-free. Think of a PDA like the one in 5a, but that accepts only if the stack isn't empty at the end of the string (more 0's) or if the stack becomes empty before all the 1's and 2's have been read (fewer 0's).
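The counter argument for 5a can be spelled out directly. This is my own membership check (a regex replaces the explicit stack, which is an implementation shortcut, not the PDA itself): one push per 0, one pop per 1 and per 2.

```python
import re

def in_5a(s):
    """Membership in {0^m 1^n 2^p : m = n + p}, checked the way the
    PDA of 5a works: count the 0's pushed, then verify the 1's and 2's
    together pop exactly that many."""
    m = re.fullmatch(r"(0*)(1*)(2*)", s)
    if not m:
        return False           # not of the form 0* 1* 2*
    zeros, ones, twos = (len(g) for g in m.groups())
    return zeros == ones + twos
```

So `in_5a("000122")` accepts (3 = 1 + 2) while `in_5a("012")` rejects (1 ≠ 2).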

c. L = {0^m 1^n 2^p : m < n < p}

This one is not context-free. Sketch of proof (see 6.2 for the format of a complete proof): Consider the string x = 0^k 1^(k+1) 2^(k+2) = uvwyz. The two locations to pump (v and y) cannot span all three regions (0's, 1's, and 2's), so they cannot include both 0's and 2's. Wherever v and y fall, either pumping up yields too many 0's or 1's for the number of 2's (if no 2's are pumped), or pumping down yields too few 1's or 2's for the number of 0's (if no 0's are pumped).

6. Ex. 3.6, parts 1 and 2 on pp. 86-87. Prove your answers. Which of the following languages are context-free?

1. L_10 = {a^i b^j a^j b^i : i, j ≥ 0}

Context-free. It can be generated by the context-free grammar:

S → aSb | A
A → bAa | ε

2. L_11 = {a^i b^j a^i b^j : i, j ≥ 0}

This is not context-free. Use the pumping lemma to justify this.

Proof: Assume L_11 is context-free, and let k be the constant from the pumping lemma. Consider the string x = a^k b^(k+1) a^k b^(k+1). Think of x as comprising four groups of characters: a group of a's, followed by a group of b's, followed by another group of a's, followed by another group of b's. Then x = uvwyz with |vy| ≥ 1, |vwy| ≤ k, and u v^i w y^i z ∈ L_11 for all i ∈ N. Since |vwy| ≤ k, v and y are either entirely within one group or within two consecutive groups.

Consider u v^2 w y^2 z. If v or y contains both an a and a b, then we end up with more than four groups, and u v^2 w y^2 z is not in L_11. So both v and y must contain only one type of character. If u v^2 w y^2 z adds additional a's to one group of a's, it cannot also add to the other group of a's (the two groups are separated by k+1 b's, more than |vwy| can span), so u v^2 w y^2 z is not in L_11. Similarly, if u v^2 w y^2 z adds additional b's to one group of b's, it cannot also add to the other group of b's, so u v^2 w y^2 z is not in L_11.

We have considered all possibilities for v and y, and none allows u v^2 w y^2 z to be in L_11. So we have a contradiction, and L_11 is not context-free.

7. Give a regular grammar that generates the same language as each of the following CFGs:
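For intuition (not a proof, since the pumping constant is unknown), the case analysis for L_11 can be checked by brute force on one small witness. The helpers below are my own: they try every decomposition x = uvwyz with |vy| ≥ 1 and |vwy| ≤ 3 for x = a^3 b^4 a^3 b^4 and confirm that none survives pumping with i ∈ {0, 2}.

```python
def in_L11(s):
    """Membership in L11 = {a^i b^j a^i b^j}, by trying all (i, j)."""
    for i in range(len(s) + 1):
        for j in range(len(s) + 1):
            if "a" * i + "b" * j + "a" * i + "b" * j == s:
                return True
    return False

def survives_pumping(x, k):
    """True if some x = uvwyz with |vy| >= 1 and |vwy| <= k keeps
    u v^i w y^i z in L11 for i in {0, 2}."""
    n = len(x)
    for a in range(n + 1):                       # u = x[:a]
        hi = min(a + k, n)                       # enforce |vwy| <= k
        for b in range(a, hi + 1):               # v = x[a:b]
            for c in range(b, hi + 1):           # w = x[b:c]
                for d in range(c, hi + 1):       # y = x[c:d]
                    if b == a and d == c:
                        continue                 # need |vy| >= 1
                    u, v, w, y, z = x[:a], x[a:b], x[b:c], x[c:d], x[d:]
                    if all(in_L11(u + v * i + w + y * i + z) for i in (0, 2)):
                        return True
    return False
```

Running `survives_pumping("aaabbbbaaabbbb", 3)` returns False, matching the proof's claim that every decomposition breaks under pumping.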

a. S → aS | Sb | ε

Any number of a's can be placed before the growing point (S), and any number of b's can be placed after it. Thus, the language is a*b*. An equivalent regular grammar is:

S → aS | A
A → bA | ε

b. S → SSS | a | ab

An equivalent regular grammar is:

S → aA | abA | a | ab
A → aB | abB
B → aA | abA | a | ab

(Each step emits one block from {a, ab}; the alternation between A and B tracks the block count modulo 2, so exactly an odd number of blocks is generated, matching S → SSS | a | ab.)

c. S → AB
A → aAa | bAb | a | b
B → aB | bB | ε

This grammar gives an odd-length palindrome of length at least one, followed by an arbitrary string in (a|b)*. The key thing to notice is that any string of length at least one begins with a palindrome of length at least one, since the first character is itself a palindrome regardless of whether there is a longer palindromic prefix. This makes the language (a|b)^+. An equivalent regular grammar is:

S → aS | bS | a | b

8. Recall that there are CFLs that can only be generated by an ambiguous CFG and CFLs that can only be generated by a nondeterministic PDA. How does the set of inherently ambiguous CFLs relate to the set of nondeterministic CFLs (are they equal, disjoint, overlapping, or is one contained in the other)?

The set of inherently ambiguous CFLs is a proper subset of the set of nondeterministic CFLs. This can be seen because any CFG created from a DPDA using the method shown in class will be unambiguous: the leftmost character of every substring can have been generated by only one nonterminal, since the machine the grammar was built from is deterministic. As discussed in class, the language of palindromes is a nondeterministic CFL (the middle of the word must be "guessed" by a PDA), but it is not inherently ambiguous (each word can be generated by only one sequence of rules of S → 0S0 | 1S1 | 0 | 1 | ε). This shows that the inherently ambiguous languages and the nondeterministic languages are not equal, so inherently ambiguous CFLs ⊊ nondeterministic CFLs.

NOTE that a nondeterministic CFL cannot be generated by a deterministic PDA. Any CFL can be generated by a nondeterministic PDA, but here we are interested only in those that can only be generated by a nondeterministic PDA (and cannot be generated by a deterministic PDA).
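Because the grammar in 7a is regular, its nonterminals behave like states of a finite automaton. The sketch below is my own (function name assumed): it parses with S → aS | A, A → bA | ε by tracking the current nonterminal, and cross-checks against the regex a*b*.

```python
import re
from itertools import product

def derives_ab_star(s):
    """Parse s with the regular grammar S -> aS | A, A -> bA | ε,
    reading each nonterminal as the state of a finite automaton."""
    state = "S"
    for c in s:
        if state == "S" and c == "a":
            state = "S"          # S -> aS
        elif c == "b":
            state = "A"          # S -> A -> bA, or A -> bA
        else:
            return False         # an 'a' after a 'b' has no derivation
    return True                  # finish via A -> ε (or S -> A -> ε)

# cross-check against the regex a*b* on all strings up to length 5
for m in range(6):
    for t in product("ab", repeat=m):
        s = "".join(t)
        assert derives_ab_star(s) == bool(re.fullmatch(r"a*b*", s))
```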

9. Ex. 3.7 on p. 87. Even if A and B are both context-free, A ∩ B may not be. Prove this by giving an example of two context-free languages whose intersection is not context-free.

Consider the two CFLs A and B, where A = {a^m b^n c^n : m, n ≥ 0}, generated by the CFG:

S → aS | A
A → bAc | ε

and B = {a^m b^m c^n : m, n ≥ 0}, generated by the CFG:

S → Sc | A
A → aAb | ε

The intersection is A ∩ B = {a^n b^n c^n : n ≥ 0}. As shown in class, this language is not a CFL.

10. Describe these conventions for CFGs (you will need to use a book or Google):

a. Chomsky normal form

Grammars in Chomsky normal form (CNF) contain only production rules of the forms A → BC, A → a, or S → ε, where A, B, C, and S are nonterminals, S is the start symbol, a is a terminal symbol, and neither B nor C may be S. This form simplifies parsing, and there are common algorithms (such as CYK) that can efficiently parse a string to determine whether it is in the language described by a CFG in CNF. All CFGs can be written in CNF.

b. Backus-Naur form

BNF is a format for CFGs (a metasyntax) which places nonterminals, and some terminals (those generated by some rule outside the grammar, for example by regular expressions), inside angle brackets, allowing meaningful symbol names. Literals within the grammar are placed in quotes. A rule is written as a nonterminal on the left, followed by ::=, followed by the productions for the rule, separated by |. Terminals are identifiable because they never appear on the left of any rule.

c. Greibach normal form

Grammars in Greibach normal form contain only production rules of the form A → aX or S → ε, where A and S are nonterminals, S is the start symbol, a is a terminal symbol, and X is a (possibly empty) sequence of nonterminals in which S does not appear. This form prevents left recursion, which can simplify parsing. All CFGs can be written in GNF.

11. Prove that any CFL can be generated by a CFG in Chomsky normal form.
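The CNF shape described in 10a is exactly what the CYK algorithm exploits. As an illustration (my own sketch, not part of the assigned solution; the dict encoding of rules is an assumption), here is CYK run on a CNF grammar for {0^n 1^n : n ≥ 1}.

```python
def cyk(word, rules, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.
    `rules` maps each nonterminal to a list of right-hand sides: either
    a single terminal string or a (B, C) pair of nonterminals."""
    n = len(word)
    if n == 0:
        return "" in rules.get(start, [])
    # table[i][j] holds the nonterminals deriving word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        for A, rhss in rules.items():
            if c in rhss:
                table[i][0].add(A)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for A, rhss in rules.items():
                    for rhs in rhss:
                        if (isinstance(rhs, tuple)
                                and rhs[0] in table[i][split - 1]
                                and rhs[1] in table[i + split][length - split - 1]):
                            table[i][length - 1].add(A)
    return start in table[0][n - 1]

# CNF grammar for {0^n 1^n : n >= 1}; T duplicates S so the start symbol
# never appears on a right-hand side, per the definition above.
rules = {"S": [("A", "X"), ("A", "B")],
         "T": [("A", "X"), ("A", "B")],
         "X": [("T", "B")],
         "A": ["0"],
         "B": ["1"]}
```

With this grammar, `cyk("0011", rules)` returns True while `cyk("011", rules)` returns False.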

Idea: Given any CFG G, construct G' in CNF with L(G) = L(G') by replacing rules not in CNF with additional rules (and nonterminals) as follows:

- Add a new start symbol whose only rule goes to the previous start symbol (this will be fixed up by the later steps).
- Remove each rule A → ε (the new S → ε may stay) by adding new rules for each rule with A on the right-hand side (RHS). Iterate this step, since it can create new A → ε rules.
- Similarly, remove unit rules A → B by adding, for each rule with B on the LHS, a corresponding rule with A on the LHS. Iterate this step.
- Replace long rules with a chain of shorter rules, and replace each terminal on a RHS of length two or more with a new nonterminal whose only rule sends it to the terminal it replaced.

12. In class we showed that L(CFG) ⊆ L(PDA): the language generated by any CFG can be generated by a PDA. Complete the proof that L(CFG) = L(PDA).

To complete this proof, we need to show that L(PDA) ⊆ L(CFG). The proof is by construction: for an arbitrary PDA M we generate a CFG G that generates exactly the strings recognized by M.

Let M = (Σ, Γ, Q, q_0, q_F, δ). We require that M empty its stack before accepting, which we can arrange by adding a pre-accept state that does this. We require M to have a single accept state, which we can achieve with an ε-transition from every accept state to a new single accept state (or pre-accept state). M is also required to either push or pop (not both) exactly one stack symbol at every transition; this requires splitting any state with a NoOp transition into two, which push and then pop an arbitrary symbol.

We construct G as follows. We include a nonterminal A_pq for every pair of states p, q ∈ Q. A_pq generates exactly the strings that take M from state p with an empty stack to state q with an empty stack (from state p with an empty stack, reading such a substring causes a sequence of transitions that push and pop characters, ending with a pop that empties the stack upon entering q). G's start symbol is

A_{q_0 q_F}, corresponding to starting at the start state with an empty stack and ending at the final state with an empty stack.

The above constraints on M allow us to say some things about its processing between p and q when the stack is empty at both states:

1. The move out of p is a push.
2. The move into q is a pop.
3. The symbol popped entering q is either
(a) the same as the one pushed leaving p (in which case the stack was never empty between p and q), or
(b) different from the one pushed leaving p (in which case the stack was empty at some point between p and q).

G can simulate case (a) with a rule A_pq → a A_rs b, where a, b ∈ Σ ∪ {ε} are the symbols read out of p and into q, respectively, and r and s are the states just after p and just before q. We effectively define the symbol pushed at p as the new bottom of the stack, since it is never removed until q (so it is not removed anywhere between r and s). Either of a and b, or both, can be ε.

Case (b) is simply A_pq → A_pr A_rq, where r is the state at which the stack becomes empty.

Through successive application of these two types of rules, together with A_pp → ε for each state p, the grammar builds a string accepted by M from both ends (A_rs is not needed where there is only one state between p and q, so the grammar eventually finishes). Therefore L(G) = L(M), and L(PDA) ⊆ L(CFG).

13. What does this TM do? Annotate it appropriately.

[State diagram: states A through I, with transitions of the form "read : write/move", e.g. "0 : 0R", "1 : 1R", "0 : 1L", "1 : 0L", "␣ : 0L", "␣ : 1R".]

This TM multiplies an input binary number by 3.

Idea: multiply by 2 and add the original number to the result. Multiplying by 2 is done by appending a 0. The original number is then added to the doubled number: starting from the right, each position receives the sum of the digit there and the digit to its left.

State A: move to the end of the string.
A → B transition: append a 0, thereby doubling the input number.
State B: no bit is being carried.
State F: a carried bit must be remembered.
State I: multiplication completed.
B loop: adding a 0 to the character to the right, so nothing needs to change.
States B, C, D, E, F, G, H: add the original number to obtain a multiple of 3.
B, C, D loop: simple add of a 1 to the character before it, with no carry.
B → C: read a 1 to add to the character to its right.
C → D: a 1 is to be added to a 0; do it.
D → B: move past the added 1 to see the next character to be added.
C → E → F: a 1 is to be added to a 1; do it and move to the carry states.
State F: reads the next character to add to the character to its right (carrying a 1).
B, C, E, F loop: add with one carry.
F, G, H loop: allows for multiple carries.
F → B: adding a 0 to the character to the right requires no change there, but change this 0 to a 1 (absorbing the carry) and leave the carry state.
F → G: read a 1 to add to the character to its right.
G → H: the character to the right must be a 0 (see all entries to this loop); change it to a 1.
H → F: change the added 1 to a 0 to propagate the carry.
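The annotation can be sanity-checked without simulating the tape. The sketch below is my own (it reproduces the arithmetic idea, not the tape mechanics): append a 0 to double the number, then add the original copy digit by digit from the right, propagating carries just as states B through H do.

```python
def times3(bits):
    """Multiply a binary string by 3 the way the TM's annotation
    describes: double by appending a 0, then add the original number,
    carrying from the right."""
    doubled = bits + "0"                    # A -> B transition: append a 0
    a = [int(x) for x in doubled]
    b = [0] + [int(x) for x in bits]        # original, aligned underneath
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        s = x + y + carry
        out.append(str(s % 2))
        carry = s // 2                      # states F/G/H: remember a carry
    if carry:
        out.append("1")
    return "".join(reversed(out))
```

For example, `int(times3("1011"), 2)` gives 33, which is 3 × 11.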

F B: Add 0 to char to right, no need to change char to right, but change this 0 to 1 and get out of carry state. F G: Read 1 to add to character to right. G H: character to right must be 0 (see all entries to this loop), change it to 1. H F: Change added 1 to 0 to propagate carry. NOTE: Question 14 appeared only on the challenge version. 14. Prove your answers. In question 9 above you showed that context-free languages are not closed under union. Are context-free languages closed under: NOTE that there is a mistake in this question. Question 9 showed that context-free languages are not closed under intersection. Context Free languages are closed under union. a. concatenation? Yes. Consider CFGs for two CFLs, L 1 and L 2. Place a subscript on the nonterminals for both L 1 and L 2 to avoid ambiguity. Then for L = L 1 L 2, create a new rule S S 1 S 2, with S the start symbol for L. Include all rules for L1 and L2. Since all rules in L are context-free (either S S 1 S 2 is context-free, as are all rules from L 1 and L 2 ), L is context-free. b. complementation? No. This is a result of the union and intersection closure properties. Let L c indicate the complement of L. Assume that CFLs are closed under complementation. Then if L 1 and L 2 are context-free, so are L 1 c and L2 c. Then (L1 c L2 c ) must be context-free, and so must (L 1 c L2 c ) c = L1 L 2 by De Morgan's theorems. However, we know L 1 and L 2 is not necessarily context-free, a contradiction, so context-free languages are not closed under complementation. c. Kleene closure? Yes. To obtain L* create a new start symbol S with rules S SS ε, with S the start symbol for L, with S the start symbol for L, and include all the rules for L. The new grammar is context-free (The added rules for S are context-free as are the rules for L) and generates L*.