Introduction to Languages and Grammars
Alphabets and Languages
An alphabet is a finite non-empty set.
Let S and T be alphabets. S T = { s t | s ε S, t ε T } (We'll often write ST for S T.)
λ = empty string, the string of length zero
S^0 = {λ}
S^1 = S
S^n = S^(n-1) S, n > 1
S^+ = S^1 U S^2 U S^3 U ...
S^* = S^0 U S^+
A language L over an alphabet S is a subset of S^*.
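The definitions of S^n and S^* can be tried out on a small alphabet. The following is a minimal sketch, not part of the original slides; the function names `power` and `star_up_to` are hypothetical, and since S^* is infinite the second function only builds the finite part S^0 U ... U S^max_len.

```python
from itertools import product

def power(alphabet, n):
    """Return S^n: all strings of length n over the alphabet (S^0 = {''})."""
    return {"".join(p) for p in product(sorted(alphabet), repeat=n)}

def star_up_to(alphabet, max_len):
    """Finite approximation of S^*: the union S^0 U S^1 U ... U S^max_len."""
    result = set()
    for n in range(max_len + 1):
        result |= power(alphabet, n)
    return result

S = {"a", "b"}
print(sorted(power(S, 2)))    # ['aa', 'ab', 'ba', 'bb']
print(len(star_up_to(S, 3)))  # 1 + 2 + 4 + 8 = 15
```

Any language over S restricted to strings of length at most 3 is a subset of the 15 strings returned by `star_up_to(S, 3)`.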
How Many Languages Are There?
How many languages over a particular alphabet are there? Uncountably infinitely many!
Then, any finite method of describing languages cannot include all of them.
Formal language theory gives us techniques for defining some languages over an alphabet.
Why Should I Care?
Concepts of syntax and semantics are used widely in computer science:
Basic compiler functions
Development of computer languages
Exploring the capabilities and limitations of algorithmic problem solving
Methods for Defining Languages
Grammar: rules for defining which strings over an alphabet are in a particular language
Automaton (plural is automata): a mathematical model of a computer which can determine whether a particular string is in the language
Definition of a Grammar
A grammar G is a 4-tuple G = (N, Σ, P, S), where
N is an alphabet of nonterminal symbols
Σ is an alphabet of terminal symbols
N and Σ are disjoint
S is an element of N; S is the start symbol or initial symbol of the grammar
P is a set of productions of the form α -> β where α is in (N U Σ)* N (N U Σ)* and β is in (N U Σ)*
Definition of a Language Generated by a Grammar
We define => by γ α δ => γ β δ if α -> β is in P, and γ and δ are in (N U Σ)*
=>+ is the transitive closure of =>
=>* is the reflexive transitive closure of =>
The language L generated by the grammar G = (N, Σ, P, S) is defined by L = L(G) = { x | S =>* x and x is in Σ* }
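The relation => is plain string rewriting, so L(G) can be enumerated up to a length bound by searching over sentential forms. The sketch below is illustrative only: the helper names `step` and `language` are hypothetical, symbols are assumed to be single characters with uppercase letters as nonterminals, and the sample grammar S -> a S b, S -> a b is an assumed example. Pruning by length is only valid when no production shortens a sentential form.

```python
from collections import deque

def step(form, productions):
    """All sentential forms reachable from `form` in one => step."""
    out = set()
    for lhs, rhs in productions:
        start = form.find(lhs)
        while start != -1:
            out.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return out

def language(productions, start, max_len):
    """Terminal strings x with start =>* x and len(x) <= max_len.

    Assumes no production shrinks a sentential form, so forms longer
    than max_len can be discarded safely.
    """
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            words.add(form)          # no nonterminal left: form is in Σ*
            continue
        for nxt in step(form, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words

P = [("S", "aSb"), ("S", "ab")]
print(sorted(language(P, "S", 6)))  # ['aaabbb', 'aabb', 'ab']
```

Because `step` matches the left-hand side as a substring, the same code also handles productions whose left-hand side has more than one symbol.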
Classes of Grammars (The Chomsky Hierarchy)
Type 0, Phrase Structure (same as the basic grammar definition)
Type 1, Context Sensitive
(1) α -> β where α is in (N U Σ)* N (N U Σ)*, β is in (N U Σ)+, and length(α) ≤ length(β)
(2) γ A δ -> γ β δ where A is in N, β is in (N U Σ)+, and γ and δ are in (N U Σ)*
Type 2, Context Free: A -> β where A is in N, β is in (N U Σ)*
Linear: A -> x or A -> x B y, where A and B are in N and x and y are in Σ*
Type 3, Regular
(1) left linear: A -> B a or A -> a, where A and B are in N and a is in Σ
(2) right linear: A -> a B or A -> a, where A and B are in N and a is in Σ
Comments on the Chomsky Hierarchy (1)
Definitions (1) and (2) for context sensitive are equivalent.
Definitions (1) and (2) for regular grammars are equivalent.
If a grammar has productions of all three of the forms described in definitions (1) and (2) for regular grammars (A -> B a, A -> a B, A -> a), then it is a linear grammar.
Each definition of context sensitive is a restriction on the definition of phrase structure.
Every context free grammar can be converted to a context sensitive grammar which satisfies definition (2) and generates the same language, except that the language generated by the context sensitive grammar cannot contain the empty string λ.
The definition of linear grammar is a restriction on the definition of context free.
The definitions of left linear and right linear are restrictions on the definition of linear.
Comments on the Chomsky Hierarchy (2)
Every language generated by a left linear grammar can be generated by a right linear grammar, and every language generated by a right linear grammar can be generated by a left linear grammar.
Every language generated by a left linear or right linear grammar can be generated by a linear grammar.
Every language generated by a linear grammar can be generated by a context free grammar.
Let L be a language generated by a context free grammar. If L does not contain λ, then L can be generated by a context sensitive grammar. If L contains λ, then L - {λ} can be generated by a context sensitive grammar.
Every language generated by a context sensitive grammar can be generated by a phrase structure grammar.
Example: Left Linear Grammar for Identifiers
S -> a, S -> b, S -> S 1, S -> S 2, S -> S a, S -> S b
S => a
S => S 1 => a 1
S => S 2 => S b 2 => S 1 b 2 => a 1 b 2
Example: Right Linear Grammar for Identifiers
S -> a T, S -> b T, S -> a, S -> b
T -> a T, T -> b T, T -> 1 T, T -> 2 T, T -> a, T -> b, T -> 1, T -> 2
S => a
S => a T => a 1
S => a T => a 1 T => a 1 b T => a 1 b 2
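Example: Right Linear Grammar for { a^n b^m c^p | n, m, p > 0 }
S -> a A, A -> a A, A -> b B, B -> b B, B -> c C, B -> c, C -> c C, C -> c
S => a A => a a A => a a a A => a a a b B => a a a b c C => a a a b c c
Example: Linear Grammar for { a^n b^n | n > 0 }
S -> a S b, S -> a b
S => a S b => a a S b b => a a a S b b b => a a a a b b b b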
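Example: Linear Grammar for { a^n c^m d^m b^n | n, m > 0 } (the language the productions below actually generate; the slide title lists the letters in the order a, b, c, d)
S -> a S b, S -> a T b, T -> c T d, T -> c d
S => a S b => a a S b b => a a a T b b b => a a a c T d b b b => a a a c c d d b b b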
Another Example: Context Free Grammar for { a^n b^n c^m d^m | n, m > 0 }
S -> R T, R -> a R b, R -> a b, T -> c T d, T -> c d
S => R T => a R b T => a a R b b T => a a a b b b T => a a a b b b c T d => a a a b b b c c d d
Context Free Grammar for Expressions
S -> E
E -> E + T, E -> E - T, E -> T
T -> T * F, T -> T / F, T -> F
F -> ( E ), F -> a, F -> b, F -> c, F -> d, F -> e
S => E => E + T => E - T + T => T - T + T => F - T + T => a - T + T => a - T * F + T => a - F * F + T => a - b * F + T => a - b * c + T => a - b * c + F => a - b * c + d
Context Sensitive Grammar for { a^n b^n c^n | n > 0 }
S -> a S B C, S -> a B C, a B -> a b, b B -> b b, C B -> B C, b C -> b c, c C -> c c
S => a S B C => a a B C B C => a a b C B C => a a b B C C => a a b b C C => a a b b c C => a a b b c c
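A derivation in a context sensitive grammar can be checked mechanically: every step must replace one occurrence of a left-hand side by the corresponding right-hand side. The following sketch assumes the standard grammar for { a^n b^n c^n } with productions S -> aSBC, S -> aBC, aB -> ab, bB -> bb, CB -> BC, bC -> bc, cC -> cc; the function names are hypothetical and symbols are assumed to be single characters.

```python
PRODUCTIONS = [
    ("S", "aSBC"), ("S", "aBC"), ("aB", "ab"), ("bB", "bb"),
    ("CB", "BC"), ("bC", "bc"), ("cC", "cc"),
]

def is_step(u, v, productions):
    """True if u => v, i.e. v results from rewriting one occurrence of
    some left-hand side in u by its right-hand side."""
    for lhs, rhs in productions:
        i = u.find(lhs)
        while i != -1:
            if u[:i] + rhs + u[i + len(lhs):] == v:
                return True
            i = u.find(lhs, i + 1)
    return False

def is_derivation(forms, productions):
    """True if every consecutive pair of sentential forms is a valid step."""
    return all(is_step(u, v, productions) for u, v in zip(forms, forms[1:]))

derivation = ["S", "aSBC", "aaBCBC", "aabCBC", "aabBCC",
              "aabbCC", "aabbcC", "aabbcc"]
print(is_derivation(derivation, PRODUCTIONS))  # True
```

Skipping the CB -> BC step (going from aabCBC straight to aabbCC) makes `is_derivation` return False, which shows why the reordering production is needed.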
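The Chomsky Hierarchy and the Block Diagram of a Compiler
[Figure: block diagram of a compiler. Pipeline: source language program, Scanner, tokens, Parser, tree, Intermediate Code Generator, int. code, Optimizer, Code Generator, object language program. Supporting components: Symbol Table Manager, Symbol Table, Error Handler (error messages). Hierarchy labels on the diagram: Type 3, Type 2, Type 1.]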
Automata
Turing machine (TM)
Linear bounded automaton (lba)
2-stack pushdown automaton (2pda)
(1-stack) pushdown automaton (pda)
1 turn pushdown automaton
finite state automaton (fsa)
Recursive Definition
Primitive regular expressions: ∅, λ, α
Given regular expressions r1 and r2, the following are regular expressions: r1 + r2, r1 · r2, r1*, (r1)
Example
Regular expression: (a + b) · a*
L((a + b) · a*) = L((a + b)) L(a*)
= L(a + b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b} {λ, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
Example
Regular expression r = (aa)*(bb)*b
L(r) = { a^(2n) b^(2m) b : n, m ≥ 0 }
Example
Regular expression r = (0 + 1)* 00 (0 + 1)*
L(r) = { all strings with at least two consecutive 0 }
Example
Regular expression r = (1 + 01)* (0 + λ)
L(r) = { all strings without two consecutive 0 }
Equivalent Regular Expressions
Definition: Regular expressions r1 and r2 are equivalent if L(r1) = L(r2)
Example
L = { all strings without two consecutive 0 }
r1 = (1 + 01)* (0 + λ)
r2 = (1*011*)* (0 + λ) + 1* (0 + λ)
L(r1) = L(r2) = L, so r1 and r2 are equivalent regular expressions.
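The claimed equivalence can be spot-checked by brute force over all short binary strings. This is a sanity check under stated assumptions, not a proof: the slide notation is translated by hand into Python `re` syntax (union `+` becomes `|`, and `λ` becomes an empty alternative), and the names `R1`, `R2`, `binary_strings`, and `agree_up_to` are hypothetical.

```python
import re
from itertools import product

# r1 = (1 + 01)*(0 + λ) and r2 = (1*011*)*(0 + λ) + 1*(0 + λ) in re syntax
R1 = re.compile(r"(1|01)*(0|)")
R2 = re.compile(r"(1*011*)*(0|)|1*(0|)")

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def agree_up_to(max_len):
    """True if r1, r2 and the property 'no two consecutive 0' accept
    exactly the same strings of length <= max_len."""
    return all(
        (R1.fullmatch(s) is not None)
        == (R2.fullmatch(s) is not None)
        == ("00" not in s)
        for s in binary_strings(max_len)
    )

print(agree_up_to(10))  # True
```

`fullmatch` is used rather than `match` because a regular expression in the formal sense must describe the whole string, not just a prefix.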
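Linear Grammars
Grammars with at most one variable at the right side of a production
Examples:
S -> aSb, S -> λ
S -> Ab, A -> aAb, A -> λ
A Non-Linear Grammar
Grammar G: S -> SS, S -> λ, S -> aSb, S -> bSa
L(G) = { w : n_a(w) = n_b(w) }, where n_a(w) is the number of a in string w
Another Linear Grammar
Grammar G: S -> A, A -> aB | λ, B -> Ab
L(G) = { a^n b^n : n ≥ 0 }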
Right-Linear Grammars
All productions have the form A -> xB or A -> x, where x is a string of terminals
Example: S -> abS, S -> a
Left-Linear Grammars
All productions have the form A -> Bx or A -> x, where x is a string of terminals
Example: S -> Aab, A -> Aab | B, B -> a
Regular Grammars
A regular grammar is any right-linear or left-linear grammar
Examples:
G1: S -> abS, S -> a
G2: S -> Aab, A -> Aab | B, B -> a
Observation
Regular grammars generate regular languages
Examples:
L(G1) = (ab)*a
L(G2) = aab(ab)*
Closure under language operations
Theorem. The set of regular languages is closed under all the following operations. In other words, if L1 and L2 are regular, then so is:
Union: L1 ∪ L2
Intersection: L1 ∩ L2
Complement: L1^c = Σ* \ L1
Difference: L1 \ L2
Concatenation: L1 L2
Kleene star: L1*
Regular expressions versus regular languages
We defined a regular language to be one that is accepted by some DFA (or NFA).
We can prove that a language is regular by this definition if and only if it corresponds to some regular expression. Thus,
1) Given a DFA or NFA, there is some regular expression to describe the language it accepts
2) Given a regular expression, we can construct a DFA or NFA to accept the language it represents
Application: Lexical analyzers and lexical-analyzer generators
Lexical analyzer:
Input: character string comprising a computer program in some language
Output: a string of symbols representing tokens, elements of that language
Example in C++ or Java:
Input: if (x == 3) y = 2;
Output (sort of): if-token, expression-token, variable-name-token, assignment-token, numeric-constant-token, statement-separator-token.
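A small regular-expression tokenizer makes the input/output relationship concrete. This is a minimal sketch, not the lexer of any real compiler: the token names (`IF`, `IDENT`, `EQ`, and so on) and the function `tokenize` are assumptions chosen for illustration, and the output is finer-grained than the slide's "sort of" list, which folds `(x == 3)` into one expression-token.

```python
import re

# Each token class is described by a regular expression (hypothetical names).
TOKEN_SPEC = [
    ("IF", r"\bif\b"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("EQ", r"=="),
    ("ASSIGN", r"="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split `text` into (token_name, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token in tokenize("if (x == 3) y = 2;"):
    print(token)
```

The order of `TOKEN_SPEC` matters: `IF` is listed before `IDENT` so the keyword wins, and `EQ` before `ASSIGN` so that `==` is not read as two assignments.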
Lexical-analyzer generators
Input: a list of tokens in a programming language, described as regular expressions
Output: a lexical analyzer for that language
Technique: builds an NFA recognizing the language tokens, then converts it to a DFA.
Regular Expressions: Applications and Limitations
Regular expressions have many applications:
Specification of syntax in programming languages
Design of lexical analyzers for compilers
Representation of patterns to match in search engines
Provide an abstract way to talk about programming problems (a language corresponds to the inputs that produce output yes in a yes/no programming problem)
Limitations:
There are lots of reasonable languages that are not regular, hence lots of programming problems that can't be solved using the power of a DFA or NFA
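CFL: Context-Free Languages
[Figure: the regular languages drawn as a subset of the context-free languages; { a^n b^n } and { ww^R } are context-free languages outside the regular languages.]
Context-Free Languages
Described by context-free grammars; recognized by pushdown automata (an automaton with a stack).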
Example
A context-free grammar G: S -> aSb, S -> λ
A derivation: S => aSb => aaSbb => aabb
L(G) = { a^n b^n : n ≥ 0 }
Describes parentheses: (((( ))))
Example
A context-free grammar G: S -> aSa, S -> bSb, S -> λ
A derivation: S => aSa => abSba => abba
L(G) = { ww^R : w ∈ {a, b}* }
Example
A context-free grammar G: S -> aSb, S -> SS, S -> λ
A derivation: S => SS => aSbS => abS => ab
Another derivation: S => SS => aSbS => abS => abaSb => abab
Definition: Context-Free Languages
A language L is context-free if and only if there is a context-free grammar G with L = L(G)
Derivation Order
1. S -> AB
2. A -> aaA
3. A -> λ
4. B -> Bb
5. B -> λ
Leftmost derivation: S =>(1) AB =>(2) aaAB =>(3) aaB =>(4) aaBb =>(5) aab
Rightmost derivation: S =>(1) AB =>(4) ABb =>(5) Ab =>(2) aaAb =>(3) aab
S -> aAB, A -> bBb, B -> A | λ
Leftmost derivation: S => aAB => abBbB => abAbB => abbBbbB => abbbbB => abbbb
Rightmost derivation: S => aAB => aA => abBb => abAb => abbBbb => abbbb
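Derivation Tree
S -> AB, A -> aaA | λ, B -> Bb | λ
S => AB => aaAB => aaABb => aaBb => aab
[Figure: derivation tree. Root S with children A and B; A has children a, a, A, and the inner A has child λ; B has children B and b, and the inner B has child λ.]
Yield: a a λ λ b = aab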
Sometimes, derivation order doesn't matter
Leftmost: S => AB => aaAB => aaB => aaBb => aab
Rightmost: S => AB => ABb => Ab => aaAb => aab
Both give the same derivation tree.
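Grammar: E -> E + E | E * E | ( E ) | a
String: a + a * a
A leftmost derivation: E => E + E => a + E => a + E * E => a + a * E => a + a * a
Another leftmost derivation: E => E * E => E + E * E => a + E * E => a + a * E => a + a * a
[Figure: the two derivation trees for a + a * a, one with + at the root and one with * at the root.]
The grammar E -> E + E | E * E | ( E ) | a is ambiguous: the string a + a * a has two derivation trees.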
Definition: A context-free grammar G is ambiguous if some string w ∈ L(G) has two or more derivation trees.
Copyright 2006 Addison-Wesley. All rights reserved.
In other words: a context-free grammar G is ambiguous if some string w ∈ L(G) has two or more leftmost derivations (or rightmost derivations).
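Why ambiguity matters: take a = 2. One derivation tree evaluates 2 + 2 * 2 as 2 + (2 * 2) = 6; the other evaluates it as (2 + 2) * 2 = 8.
[Figure: the two derivation trees annotated with the values computed at each node.]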
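The two values can be reproduced by enumerating every derivation tree of the token sequence. The sketch below is illustrative and simplifies the ambiguous expression grammar to E -> E + E | E * E | a, leaving out parentheses; the function name `tree_values` is hypothetical. Each choice of a root operator corresponds to one way of applying a production at the root of the tree.

```python
def tree_values(tokens):
    """Return the value computed by every derivation tree of `tokens`
    under the ambiguous grammar E -> E + E | E * E | a.

    `tokens` alternates numbers and the operator strings '+' and '*'.
    """
    if len(tokens) == 1:
        return [tokens[0]]
    results = []
    for i in range(1, len(tokens), 2):       # operators sit at odd positions
        for left in tree_values(tokens[:i]):
            for right in tree_values(tokens[i + 1:]):
                if tokens[i] == "+":
                    results.append(left + right)
                else:
                    results.append(left * right)
    return results

print(sorted(tree_values([2, "+", 2, "*", 2])))  # [6, 8]
```

Two trees give two different answers for the same program text, which is why a programming language grammar should assign each string a single tree.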
Ambiguity is bad for programming languages
We want to remove ambiguity
We fix the ambiguous grammar E -> E + E | E * E | ( E ) | a
New non-ambiguous grammar: E -> E + T | T, T -> T * F | F, F -> ( E ) | a
Derivation of a + a * a: E => E + T => T + T => F + T => a + T => a + T * F => a + F * F => a + a * F => a + a * a
[Figure: the unique derivation tree for a + a * a.]
The grammar G: E -> E + T | T, T -> T * F | F, F -> ( E ) | a is non-ambiguous: every string w ∈ L(G) has a unique derivation tree.
Another Ambiguous Grammar
IF_STMT -> if EXPR then STMT | if EXPR then STMT else STMT
The string "if expr1 then if expr2 then stmt1 else stmt2" has two derivation trees:
(1) IF_STMT -> if expr1 then STMT, where STMT is "if expr2 then stmt1 else stmt2" (the else belongs to the inner if)
(2) IF_STMT -> if expr1 then STMT else stmt2, where STMT is "if expr2 then stmt1" (the else belongs to the outer if)
Inherent Ambiguity
Some context free languages have only ambiguous grammars
Example: L = { a^n b^n c^m } ∪ { a^n b^m c^m }
S -> S1 | S2
S1 -> S1 c | A, A -> aAb | λ
S2 -> a S2 | B, B -> bBc | λ
The string a^n b^n c^n has two derivation trees: one that starts with S -> S1 and one that starts with S -> S2.
Simplification of Context Free Grammars
Substitution Rule
Grammar: S -> aB, A -> aaA, A -> abBc, B -> aA, B -> b
Substitute B -> b. Equivalent grammar: S -> aB | ab, A -> aaA, A -> abBc | abbc, B -> aA
Substitute B -> aA. Equivalent grammar: S -> aB | ab | aaA, A -> aaA, A -> abBc | abbc | abaAc
In general: given A -> xBz and B -> y1, substituting B -> y1 gives A -> xBz | x y1 z, an equivalent grammar.
Nullable Variables
λ-production: A -> λ
Nullable variable: A => ... => λ
Removing Nullable Variables
Example grammar: S -> aMb, M -> aMb, M -> λ (M is a nullable variable)
Substitute M -> λ.
Final grammar: S -> aMb, S -> ab, M -> aMb, M -> ab
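The substitution of nullable variables can be automated. The following is a sketch of the usual textbook procedure under stated assumptions, not code from the slides: the grammar is a dict from a variable to a list of right-hand sides, each symbol is a single character, λ is written as the empty string, and the names `nullable_variables` and `remove_lambda` are hypothetical.

```python
from itertools import product

def nullable_variables(grammar):
    """Variables A with A => ... => λ, found by fixed-point iteration."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            # An empty body (λ) satisfies all() trivially.
            if var not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(var)
                changed = True
    return nullable

def remove_lambda(grammar):
    """Drop λ-productions, adding every variant of each body in which
    some nullable variables have been erased."""
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:                 # never re-create a λ body
                    new_bodies.add(candidate)
        result[var] = sorted(new_bodies)
    return result

G = {"S": ["aMb"], "M": ["aMb", ""]}
print(remove_lambda(G))  # {'S': ['aMb', 'ab'], 'M': ['aMb', 'ab']}
```

The resulting grammar generates the original language minus λ, which is why the later normal-form slides restrict attention to grammars that do not produce λ.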
Unit-Productions
Unit production: A -> B (a single variable on both sides)
Removing Unit Productions
Observation: A -> A is removed immediately
Example grammar: S -> aA, A -> a, A -> B, B -> A, B -> bb
Substitute A -> B: S -> aA | aB, A -> a, B -> A | B, B -> bb
Remove B -> B: S -> aA | aB, A -> a, B -> A, B -> bb
Substitute B -> A: S -> aA | aB | aA, A -> a, B -> bb
Remove repeated productions. Final grammar: S -> aA | aB, A -> a, B -> bb
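For comparison, here is a sketch of a different but common way to remove unit productions: compute, for each variable, the set of variables reachable through unit productions alone, then copy over their non-unit bodies. This is the unit-closure method, not the step-by-step substitution shown on the slides, so the resulting productions differ in form while generating the same language. The grammar encoding (dict of variable to list of body strings, single-character symbols) and the function name are assumptions.

```python
def remove_unit_productions(grammar):
    """Replace unit productions A -> B by the non-unit bodies of every
    variable reachable from A through unit productions."""
    variables = set(grammar)

    def is_unit(body):
        return body in variables      # body is exactly one variable

    result = {}
    for var in grammar:
        # Variables reachable from `var` using unit productions only.
        reach, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reach:
                    reach.add(body)
                    stack.append(body)
        result[var] = sorted(
            {body for r in reach for body in grammar[r] if not is_unit(body)}
        )
    return result

G = {"S": ["aA"], "A": ["a", "B"], "B": ["A", "bb"]}
print(remove_unit_productions(G))
# {'S': ['aA'], 'A': ['a', 'bb'], 'B': ['a', 'bb']}
```

Here the language is { aa, abb } both before and after; B becomes unreachable from S and would be dropped by a later useless-production pass.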
Useless Productions
S -> aSb, S -> λ, S -> A, A -> aA
A -> aA is a useless production: some derivations never terminate, S => A => aA => aaA => ... => aa...aA => ...
Another grammar: S -> A, A -> aA, A -> λ, B -> bA
B -> bA is a useless production: B is not reachable from S
In general: if S => ... => xAy => ... => w with w ∈ L(G) (w contains only terminals), then variable A is useful; otherwise, variable A is useless.
A production A -> x is useless if any of its variables is useless.
Example: in S -> aSb, S -> λ, S -> A, A -> aA, B -> C, C -> D, the variables A, B, C, and D are useless, so the productions S -> A, A -> aA, B -> C, and C -> D are useless.
Removing Useless Productions
Example grammar: S -> aS | A | C, A -> a, B -> aa, C -> aCb
First: find all variables that can produce strings with only terminals
Round 1: {A, B}
Round 2: {A, B, S}
Keep only the variables that produce terminal strings: {A, B, S} (the remaining variables are useless)
Remove useless productions: S -> aS | A, A -> a, B -> aa
Second: find all variables reachable from S
Use a dependency graph: S -> A; B is not reachable
Keep only the variables reachable from S (the remaining variables are useless)
Remove useless productions. Final grammar: S -> aS | A, A -> a
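The two phases can be sketched in code as follows. The encoding is an assumption: the grammar is a dict from a variable to a list of body strings with single-character symbols, every variable that occurs anywhere has its own dict key, and any symbol that is not a key is treated as a terminal. The function name `remove_useless` is hypothetical.

```python
def remove_useless(grammar, start):
    """Phase 1: keep variables that derive a terminal string.
    Phase 2: keep variables reachable from `start`."""
    variables = set(grammar)

    def all_usable(body, good):
        return all(s in good or s not in variables for s in body)

    # Phase 1: fixed point of "has a body made of terminals and
    # already-generating variables".
    generating, changed = set(), True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in generating and any(all_usable(b, generating) for b in bodies):
                generating.add(var)
                changed = True
    pruned = {
        var: [b for b in bodies if all_usable(b, generating)]
        for var, bodies in grammar.items() if var in generating
    }

    # Phase 2: walk the dependency graph from the start variable.
    reachable, stack = {start}, [start]
    while stack:
        for body in pruned.get(stack.pop(), []):
            for sym in body:
                if sym in pruned and sym not in reachable:
                    reachable.add(sym)
                    stack.append(sym)
    return {var: bodies for var, bodies in pruned.items() if var in reachable}

G = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(G, "S"))  # {'S': ['aS', 'A'], 'A': ['a']}
```

The order of the phases matters: removing non-generating variables first can make further variables unreachable, so reachability is checked second.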
Removing All
Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables
Chomsky Normal Form for CFG
Each production has the form A -> BC (variable variable) or A -> a (terminal)
Examples:
Chomsky Normal Form: S -> AS, S -> a, A -> SA, A -> b
Not Chomsky Normal Form: S -> AS, S -> AAS, A -> SA, A -> aa
Conversion to Chomsky Normal Form
Example: S -> ABa, A -> aab, B -> Ac (not Chomsky Normal Form)
Introduce variables for terminals Ta, Tb, Tc: S -> A B Ta, A -> Ta Ta Tb, B -> A Tc, Ta -> a, Tb -> b, Tc -> c
Introduce intermediate variable V1: S -> A V1, V1 -> B Ta, A -> Ta Ta Tb, B -> A Tc, Ta -> a, Tb -> b, Tc -> c
Introduce intermediate variable V2: S -> A V1, V1 -> B Ta, A -> Ta V2, V2 -> Ta Tb, B -> A Tc, Ta -> a, Tb -> b, Tc -> c
Final grammar in Chomsky Normal Form: S -> A V1, V1 -> B Ta, A -> Ta V2, V2 -> Ta Tb, B -> A Tc, Ta -> a, Tb -> b, Tc -> c
Initial grammar: S -> ABa, A -> aab, B -> Ac
In general: from any context-free grammar (which doesn't produce λ) not in Chomsky Normal Form, we can obtain an equivalent grammar in Chomsky Normal Form.
The Procedure
First remove: nullable variables, unit productions
Then, for every symbol a: add the production Ta -> a, where Ta is a new variable, and in productions replace a with Ta
Replace any production A -> C1 C2 ... Cn with A -> C1 V1, V1 -> C2 V2, ..., V(n-2) -> C(n-1) Cn
New intermediate variables: V1, V2, ..., V(n-2)
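The last two steps of the procedure can be sketched as follows. The sketch assumes that nullable variables and unit productions have already been removed, that a production is a pair of a head and a tuple of symbols, and that the generated names `T<terminal>` and `V<number>` do not clash with existing variables; the function name `to_cnf` is hypothetical.

```python
def to_cnf(productions, variables):
    """Convert productions (head, body-tuple) to Chomsky Normal Form.

    Assumes no λ-productions and no unit productions remain.
    """
    term_var = {}                      # terminal -> its new variable T_a
    step1 = []
    # Step 1: in bodies of length >= 2, replace each terminal a by T_a.
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for sym in body:
                if sym in variables:
                    new_body.append(sym)
                else:
                    term_var.setdefault(sym, f"T{sym}")
                    new_body.append(term_var[sym])
            body = tuple(new_body)
        step1.append((head, body))

    # Step 2: break bodies longer than 2 into a chain A -> C1 V1, V1 -> C2 V2, ...
    result, counter = [], 0
    for head, body in step1:
        while len(body) > 2:
            counter += 1
            new_var = f"V{counter}"
            result.append((head, (body[0], new_var)))
            head, body = new_var, body[1:]
        result.append((head, body))

    # Finally add the productions T_a -> a.
    result += [(var, (terminal,)) for terminal, var in term_var.items()]
    return result

P = [("S", ("A", "B", "a")), ("A", ("a", "a", "b")), ("B", ("A", "c"))]
for head, body in to_cnf(P, {"S", "A", "B"}):
    print(head, "->", " ".join(body))
```

On the example grammar S -> ABa, A -> aab, B -> Ac this prints S -> A V1, V1 -> B Ta, A -> Ta V2, V2 -> Ta Tb, B -> A Tc, Ta -> a, Tb -> b, Tc -> c.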
Observations
Chomsky normal forms are good for parsing and proving theorems
It is very easy to find the Chomsky normal form for any context-free grammar
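One reason Chomsky Normal Form helps with parsing is that it enables the CYK algorithm, which decides membership by dynamic programming over substrings. CYK is not named on the slides; it is brought in here as a standard illustration. The sketch assumes productions are pairs of a head and a tuple of one terminal or two variables, and uses as sample input a CNF grammar for the single-string language { aabaabca }, obtained from S -> ABa, A -> aab, B -> Ac. The names `CNF` and `cyk` are hypothetical.

```python
CNF = [
    ("S", ("A", "V1")), ("V1", ("B", "Ta")),
    ("A", ("Ta", "V2")), ("V2", ("Ta", "Tb")),
    ("B", ("A", "Tc")),
    ("Ta", ("a",)), ("Tb", ("b",)), ("Tc", ("c",)),
]

def cyk(word, productions, start):
    """Return True if `start` derives `word` under a CNF grammar."""
    n = len(word)
    if n == 0:
        return False                   # a CNF grammar cannot produce λ
    # table[i][l - 1] = variables deriving the substring word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {head for head, body in productions if body == (ch,)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, body in productions:
                    if len(body) == 2 and body[0] in left and body[1] in right:
                        table[i][length - 1].add(head)
    return start in table[0][n - 1]

print(cyk("aabaabca", CNF, "S"))  # True
print(cyk("aabaabc", CNF, "S"))   # False
```

The three nested loops over length, start position, and split point give the cubic running time in the length of the input that CYK is known for.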
Greibach Normal Form
All productions have the form A -> a V1 V2 ... Vk, k ≥ 0, where a is a symbol (terminal) and V1, ..., Vk are variables
Examples:
Greibach Normal Form: S -> cAB, A -> aA | bB | b, B -> b
Not Greibach Normal Form: S -> abSb, S -> aa
Observations
Greibach normal forms are very good for parsing
It is hard to find the Greibach normal form of any context-free grammar
Properties of CFL
Union
Context-free languages are closed under union: if L1 is context free and L2 is context free, then L1 ∪ L2 is context-free.
Example
Language L1 = { a^n b^n }, grammar S1 -> a S1 b | λ
Language L2 = { ww^R }, grammar S2 -> a S2 a | b S2 b | λ
Union L = { a^n b^n } ∪ { ww^R }, grammar S -> S1 | S2
In general: for context-free languages L1, L2 with context-free grammars G1, G2 and start variables S1, S2, the grammar of the union L1 ∪ L2 has a new start variable S and the additional production S -> S1 | S2.
Concatenation
Context-free languages are closed under concatenation: if L1 is context free and L2 is context free, then L1 L2 is context-free.
Example
Language L1 = { a^n b^n }, grammar S1 -> a S1 b | λ
Language L2 = { ww^R }, grammar S2 -> a S2 a | b S2 b | λ
Concatenation L = { a^n b^n } { ww^R }, grammar S -> S1 S2
In general: for context-free languages L1, L2 with context-free grammars G1, G2 and start variables S1, S2, the grammar of the concatenation L1 L2 has a new start variable S and the additional production S -> S1 S2.
Star Operation
Context-free languages are closed under the star operation: if L is context free, then L* is context-free.
Example
Language L = { a^n b^n }, grammar S -> aSb | λ
Star operation L = { a^n b^n }*, grammar S1 -> S S1 | λ
In general: for a context-free language L with context-free grammar G and start variable S, the grammar of the star operation L* has a new start variable S1 and the additional production S1 -> S S1 | λ.
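The union, concatenation, and star constructions each add one new start variable and one or two productions, so they are easy to express in code. The sketch below is illustrative: grammars are dicts from a variable to a list of bodies, each body a list of symbols, the two input grammars are assumed to use disjoint variable names, and all function names are hypothetical. The bounded generator `generate` uses leftmost derivations and prunes on the number of terminals; it terminates for the small grammars used here but is not a general-purpose tool.

```python
def union(g1, s1, g2, s2, new_start="S"):
    """Grammar for L1 ∪ L2: add S -> S1 | S2."""
    return {**g1, **g2, new_start: [[s1], [s2]]}, new_start

def concatenation(g1, s1, g2, s2, new_start="S"):
    """Grammar for L1 L2: add S -> S1 S2."""
    return {**g1, **g2, new_start: [[s1, s2]]}, new_start

def star(g, s, new_start="S0"):
    """Grammar for L*: add S0 -> S S0 | λ."""
    return {**g, new_start: [[s, new_start], []]}, new_start

def generate(grammar, start, max_len):
    """Terminal strings of length <= max_len, via leftmost derivations."""
    words, seen, stack = set(), set(), [(start,)]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                       # no variable left
            words.add("".join(form))
            continue
        for body in grammar[form[idx]]:
            new = form[:idx] + tuple(body) + form[idx + 1:]
            if sum(1 for s in new if s not in grammar) <= max_len:
                stack.append(new)
    return words

G1 = {"S1": [["a", "S1", "b"], []]}                         # { a^n b^n }
G2 = {"S2": [["a", "S2", "a"], ["b", "S2", "b"], []]}       # { ww^R }

print(sorted(generate(*union(G1, "S1", G2, "S2"), 2)))      # ['', 'aa', 'ab', 'bb']
print("abaa" in generate(*concatenation(G1, "S1", G2, "S2"), 4))  # True
print("abab" in generate(*star(G1, "S1"), 4))               # True
```

The string abaa is ab followed by aa, so it is in the concatenation but in neither language alone, and abab is two copies of ab, so it is in the star but not in { a^n b^n }.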
Intersection
Context-free languages are not closed under intersection: if L1 is context free and L2 is context free, then L1 ∩ L2 is not necessarily context-free.
Example
L1 = { a^n b^n c^m } is context-free: S -> AC, A -> aAb | λ, C -> cC | λ
L2 = { a^n b^m c^m } is context-free: S -> AB, A -> aA | λ, B -> bBc | λ
Intersection: L1 ∩ L2 = { a^n b^n c^n } is NOT context-free
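Complement
Context-free languages are not closed under complement: if L is context free, then the complement of L is not necessarily context-free.
Example
L1 = { a^n b^n c^m } is context-free: S -> AC, A -> aAb | λ, C -> cC | λ
L2 = { a^n b^m c^m } is context-free: S -> AB, A -> aA | λ, B -> bBc | λ
Complement: the complement of (complement of L1 ∪ complement of L2) equals L1 ∩ L2 = { a^n b^n c^n }, which is NOT context-free. Since context-free languages are closed under union, closure under complement would make this intersection context-free.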
The intersection of a context-free language and a regular language is a context-free language: if L1 is context free and L2 is regular, then L1 ∩ L2 is context-free.
Summary
The Chomsky Hierarchy
[Figure: nested language classes, from outermost to innermost: non-recursively enumerable, recursively enumerable, recursive, context-sensitive, context-free, regular.]
Who is Noam Chomsky anyway?
Philosopher of Languages
Professor of Linguistics at MIT
Constructed the idea that language was not a learned behavior, but that it was cognitive and innate, versus stimulus-response driven
In an effort to explain these theories, he developed the Chomsky Hierarchy
Chomsky Hierarchy
Language | Grammar | Machine | Example
Regular language | Regular grammar (right-linear grammar, left-linear grammar) | Deterministic or nondeterministic finite-state acceptor | a*
Context-free language | Context-free grammar | Nondeterministic pushdown automaton | a^n b^n
Context-sensitive language | Context-sensitive grammar | Linear-bounded automaton | a^n b^n c^n
Recursively enumerable language | Unrestricted grammar | Turing machine | Any computable function
Chomsky Hierarchy
Comprises four types of languages and their associated grammars and machines.
Type 3: Regular Languages
Type 2: Context-Free Languages
Type 1: Context-Sensitive Languages
Type 0: Recursively Enumerable Languages
These languages form a strict hierarchy