SOLUTION Trial Test Grammar & Parsing Deficiency Course for the Master in Software Technology Programme Utrecht University

Year 2004/2005

1. (a) LM is the language consisting of sentences of L followed by sentences of M. More formally, $LM = \{\, st \mid s \in L \wedge t \in M \,\}$.

(b) Yes, if both L and M are context free, then LM is also context free. The argument is as follows. If L and M are context free, then there are context-free grammars G and H such that L = L(G) and M = L(H). To show that LM is context free it is sufficient to show that there is a context-free grammar J such that LM = L(J). J is easily constructed. Take all terminals, non-terminals, and rules of G and H into J, renaming non-terminals first if necessary so that G and H share none. Suppose that $S_G$ and $S_H$ are the start symbols of G and H respectively. Add a new non-terminal $S_J$ to J and make it the start symbol of J. Add a new rule to J:

    $S_J \rightarrow S_G\ S_H$

From this rule it is clear that $S_J$ expands to a sentence derivable from $S_G$ (hence a sentence from L) followed by a sentence derivable from $S_H$ (a sentence from M). Moreover, all rules of J are context free, because they are either rules from G or H, which are context free, or the new rule above, which is also context free.

(c) Yes, if both L and M are regular, then LM is also regular. Unfortunately, the argument given above is not sufficient to imply that LM is regular, because the new rule added above is not allowed in a regular grammar. It is sufficient to exhibit either a regular grammar generating LM or an NFA accepting LM. Personally, I find the latter easier for this particular problem, so that is what I will do.

If L and M are regular languages, then by the definition of regular languages and their equivalence with NFAs (Theorems 12 and 13 from LN) there are NFAs $M_1$ and $M_2$ such that $L = L(M_1)$ and $M = L(M_2)$. To show that LM is regular it is sufficient to show that there is an NFA $M_3$ such that $LM = L(M_3)$. Let $M_1 = (X_1, Q_1, d_1, S_1, F_1)$, $M_2 = (X_2, Q_2, d_2, S_2, F_2)$, and $M_3 = (X_3, Q_3, d_3, S_3, F_3)$.

The NFA $M_3$ is constructed by taking all states and transitions of $M_1$ and $M_2$. To keep things simple, we assume the state sets of $M_1$ and $M_2$ are disjoint (else rename the states so that they are). So $Q_3 = Q_1 \cup Q_2$, and for $d_3$:

    $d_3\ r\ a = \text{if } r \in Q_1 \text{ then } d_1\ r\ a \text{ else } d_2\ r\ a$

Ideally we would also extend $d_3$ so that from a final state of $M_1$ we can jump, in a zero-input step (without consuming an input symbol), to a starting state of $M_2$ and thus continue with a sentence accepted by $M_2$. However, since by definition our NFAs cannot do a zero-input step, we have to modify this a little. For each state $f \in F_1$, $d_3\ f$ is defined as follows instead:

    $d_3\ f\ a = d_1\ f\ a \cup \mathit{deltas}\ d_2\ S_2\ a$

where deltas is as defined in LN, namely:

    $\mathit{deltas}\ d\ V\ a = \{\, b \mid r \in V \wedge b \in d\ r\ a \,\}$

The starting states of $M_3$ are just the starting states of $M_1$. The final states of $M_3$ are the final states of $M_2$. If, however, $ɛ \in M$ (so a starting state of $M_2$ is also a final state of $M_2$), then all final states of $M_1$ must also be added to the set of final states of $M_3$. By construction, $M_3$ accepts $L(M_1)L(M_2)$ and thus LM.

2. (a)

    ppred :: Parser Char Pred
    ppred = first (pchainl pterm pand)

    pterm :: Parser Char Pred
    pterm = (\e1 _ e2 -> Equal e1 e2) <$> pexpr <*> ptoken "==" <*> pexpr

    pand :: Parser Char (Pred->Pred->Pred)
    pand = (\_ -> And) <$> ptoken "&&"

    pexpr :: Parser Char Expr
    pexpr = first (pchainl pbasicexpr pplus)

    pbasicexpr :: Parser Char Expr
    pbasicexpr = pidentifier <|> plet

    pidentifier = (\c -> Ident [c]) <$> satisfy (`elem` "wxyz")

    -- pidentifier yields Ident x, whereas Let stores the bare name x
    plet = (\_ (Ident x) e1 _ e2 -> Let x e1 e2)
           <$> token "let" <*> pidentifier <*> pexpr <*> token "in" <*> pexpr

    pplus :: Parser Char (Expr->Expr->Expr)
    pplus = (\_ -> Plus) <$> ptoken "+"

(b)

    type PredAlgebra e p = (Alg1 e p, Alg2 e)
    type Alg1 e p = ((e->e->p), (p->p->p))
    type Alg2 e = ((String->e), (String->e->e->e), (e->e->e))

(c)

    foldpred :: PredAlgebra e p -> Pred -> p
    foldpred ((fequal,fand),(fident,flet,fplus)) = fold1
      where
        fold1 (Equal e1 e2) = fequal (fold2 e1) (fold2 e2)
        fold1 (And p1 p2) = fand (fold1 p1) (fold1 p2)
        fold2 (Ident x) = fident x
        fold2 (Let x e1 e2) = flet x (fold2 e1) (fold2 e2)
        fold2 (Plus e1 e2) = fplus (fold2 e1) (fold2 e2)

(d)

    containlet :: Pred -> Bool
    containlet = foldpred algebra
      where
        algebra = ((fequal,fand),(fident,flet,fplus))
        fequal = (||)
        fand = (||)
        fident x = False
        flet _ _ _ = True
        fplus = (||)

(e)

    eval :: Pred -> Env -> Bool
    eval t env0 = foldpred algebra t
      where
        algebra :: PredAlgebra (Env->Int) Bool
        algebra = ((fequal,fand),(fident,flet,fplus))
        fequal v1 v2 = v1 env0 == v2 env0
        fand = (&&)
        fident x env = env ? x
        flet x v1 v2 env = v2 (update (x, v1 env) env)
        fplus v1 v2 env = v1 env + v2 env

    ((x,val):s) ? y = if x==y then val else s ? y

    update (x,val) env = (x,val) : filter ((/= x) . fst) env

3. (a), (b), (c)

    Non-terminal   empty   first          follow
    S              no      {r,y,x,p,q}    {#}
    A              yes     {y,x}          {p,q,w}
    B              no      {p,q}          {#}
    C              no      {x,y,z}        {#}
    D              yes     {y}            {x,w,p,q}
    E              yes     {x}            {z,w,p,q}

         rule           lookahead
     1   S → A B        {y,x,p,q}
     2   S → r A w      {r}
     3   A → D E        {x,y,p,q,w}
     4   B → p          {p}
     5   B → q C        {q}
     6   C → E z        {x,z}
     7   C → y          {y}
     8   D → ɛ          {x,w,p,q}
     9   D → y          {y}
    10   E → ɛ          {z,w,p,q}
    11   E → x          {x}

(d) An LL(1) parser uses a stack which initially contains the start symbol. At each step, the parser looks at the top of the stack. If it is a terminal symbol a, then the parser tries to consume an a from the input string. If the input string does not start with a, the parser simply fails, which means that the input string cannot be recognized by the parser.

Now suppose the top symbol is a non-terminal A. The parser takes a rule that expands A. It pops A from the stack, and in its place it pushes the right-hand side of the rule onto the stack, with the leftmost symbol on top. However, there may be more than one rule that expands A. Which one to use? A naive parser simply picks one non-deterministically. However, if the grammar is LL(1), then we can decide which rule to use based on the current first symbol of the input string. The look-ahead set of each rule specifies the terminals that can occur as the first input symbol when that rule is applicable. In an LL(1) grammar the look-ahead sets of all rules expanding a given non-terminal A are disjoint, and thus they completely determine which rule to use.
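The stack machine described in (d) can be sketched in Haskell for the grammar of question 3. This is only a minimal illustration, not part of the original solution: the names Symbol, NonTerminal, table, accepts, and endMarker are assumptions introduced here, and the table is simply transcribed from the look-ahead sets of rules 1 to 11. It also assumes the input is terminated by the end marker # that appears in the follow sets.

```haskell
-- Hypothetical names throughout; a sketch of an LL(1) stack machine.
data NonTerminal = S | A | B | C | D | E deriving (Eq, Show)

-- A stack symbol is either a terminal character or a non-terminal.
data Symbol = T Char | N NonTerminal deriving (Eq, Show)

-- End-of-input marker, written # in the follow sets.
endMarker :: Char
endMarker = '#'

-- Parse table: given the non-terminal on top of the stack and the next
-- input symbol, return the right-hand side of the unique rule whose
-- look-ahead set contains that symbol, or Nothing if there is none.
table :: NonTerminal -> Char -> Maybe [Symbol]
table S c | c `elem` "yxpq"  = Just [N A, N B]           -- rule 1
          | c == 'r'         = Just [T 'r', N A, T 'w']  -- rule 2
table A c | c `elem` "xypqw" = Just [N D, N E]           -- rule 3
table B 'p'                  = Just [T 'p']              -- rule 4
table B 'q'                  = Just [T 'q', N C]         -- rule 5
table C c | c `elem` "xz"    = Just [N E, T 'z']         -- rule 6
          | c == 'y'         = Just [T 'y']              -- rule 7
table D c | c `elem` "xwpq"  = Just []                   -- rule 8
          | c == 'y'         = Just [T 'y']              -- rule 9
table E c | c `elem` "zwpq"  = Just []                   -- rule 10
          | c == 'x'         = Just [T 'x']              -- rule 11
table _ _                    = Nothing

-- Run the machine: the stack starts with the start symbol S and the
-- input is extended with the end marker.
accepts :: String -> Bool
accepts input = go [N S] (input ++ [endMarker])
  where
    -- Empty stack: succeed only if exactly the end marker remains.
    go [] rest = rest == [endMarker]
    -- Terminal on top: it must match the next input symbol.
    go (T a : stack) (c : rest) = a == c && go stack rest
    -- Non-terminal on top: replace it by the right-hand side chosen
    -- by the look-ahead symbol, leftmost symbol on top.
    go (N n : stack) rest@(c : _) =
      case table n c of
        Just rhs -> go (rhs ++ stack) rest
        Nothing  -> False
    -- Non-empty stack but no input left.
    go _ [] = False
```

For instance, accepts "ryxw" and accepts "qz" evaluate to True, while accepts "xy" evaluates to False, because after x has been consumed no rule for B has y in its look-ahead set. Because the look-ahead sets of the rules for each non-terminal are disjoint, table never has to choose between two rules.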