Generic Language Technology: Basic technologies
Prof.dr. Mark van den Brand

Basic technologies
  Syntactical analysis
  Parser generators
  Rewrite engines

/ Faculteit Wiskunde en Informatica 2-9-2010

Tasks and organization of a lexical analyzer
Specification of lexical tokens via regular expressions
Implementation of regular expressions
  (non-)deterministic finite automata
  translation of a regular expression to an automaton

[Figure: program text (characters) -> Lexical analyzer -> token -> Parser (syntactical analysis) -> parse tree. The parser requests tokens from the lexical analyzer ("get next token"); both use the Symbol table.]
Tasks of the lexical analyzer:
  reading the input and production of tokens
  elimination of layout and comments
  keeping track of position information

Floating point numbers (lexical syntax)
  [0] | ([1-9][0-9]*)                          -> UnsignedInt
  [\+\-]? UnsignedInt                          -> SignedInt
  UnsignedInt "." [0-9]+ ([eE] SignedInt)?     -> UnsignedReal
  UnsignedInt [eE] SignedInt                   -> UnsignedReal
  UnsignedInt | UnsignedReal                   -> Number

  Accepted:  0   1   14   0.1   3e4   3.014e-7
  Rejected:  00  01  04.1  3e04  3.14e-07

A regular expression (r.e.) r over an alphabet Σ corresponds to the language L(r)
1. ε is a r.e. and corresponds to {ε}
2. a ∈ Σ is a r.e. and corresponds to {a}
3. Suppose r and s are r.e.'s corresponding to the languages L(r) and L(s).
   a. alternative     (r) | (s) is a r.e.   L(r) ∪ L(s)
   b. concatenation   (r)(s) is a r.e.      L(r)L(s)
   c. Kleene closure  (r)* is a r.e.        (L(r))*
   d. brackets        (r) is a r.e.         L(r)
Operators are left-associative and the priorities are * > concatenation > |

A regular definition over alphabet Σ has the form:
  re_1 -> d_1
  re_2 -> d_2
  ...
  re_n -> d_n
where the d_i are different names and each re_i is a r.e. over the alphabet Σ ∪ {d_1, d_2, ..., d_(i-1)}.
Thus, in re_i only names occur which are already defined.
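The regular definition for numbers can be tried out directly. The following is a minimal sketch in Python, assuming the alternatives in the definition are separated by | and the exponent marker is [eE]; the names UNSIGNED_INT, SIGNED_INT, UNSIGNED_REAL, NUMBER and is_number are chosen for this illustration only. Each name is built only from names defined before it, as a regular definition requires.

```python
import re

# Each definition may only use names that were defined earlier.
UNSIGNED_INT = r"(?:0|[1-9][0-9]*)"
SIGNED_INT = rf"[+-]?{UNSIGNED_INT}"
UNSIGNED_REAL = (
    rf"(?:{UNSIGNED_INT}\.[0-9]+(?:[eE]{SIGNED_INT})?"
    rf"|{UNSIGNED_INT}[eE]{SIGNED_INT})"
)
NUMBER = re.compile(rf"(?:{UNSIGNED_REAL}|{UNSIGNED_INT})")


def is_number(text: str) -> bool:
    """Return True if the complete text is a Number."""
    return NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["0", "14", "3.014e-7", "00", "04.1", "3e04"]:
        print(candidate, is_number(candidate))
```

Leading zeros are rejected because UnsignedInt is either the single digit 0 or starts with a digit from 1 to 9.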
A regular expression can be compiled into a finite automaton (FA = finite automaton), which is a recognizer for the corresponding regular language.
A finite automaton is non-deterministic (NFA) if several different transitions are possible for one input symbol in a state.
Otherwise the finite automaton is deterministic (DFA).
There are two possible ways of transforming a r.e. into a deterministic finite automaton:
1. r.e. -> NFA -> DFA
2. r.e. -> DFA
The generated DFA has to be optimized.

A non-deterministic finite automaton consists of:
1. A set of states S
2. An input alphabet Σ
3. A transition function which takes a state/symbol pair and yields a set of new states
4. The start state s_0 ∈ S
5. A set F ⊆ S of accepting states

Example:
  S = {0, 1, 2, 3}
  Σ = {a, b}
  s_0 = 0
  F = {3}
This automaton accepts: (a|b)*abb
Transition function:
  State   a       b
  0       {0,1}   {0}
  1       ∅       {2}
  2       ∅       {3}
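An NFA can be run by tracking the set of states it could be in after each input symbol. The sketch below encodes the example automaton for (a|b)*abb; the names DELTA, START, ACCEPTING and nfa_accepts are illustrative, and the table entries assume that states 1 and 2 only have transitions on b.

```python
# Transition function: (state, symbol) -> set of successor states.
# A missing entry means there is no transition.
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(word: str) -> bool:
    """Simulate the NFA on word by following all possible transitions."""
    current = {START}
    for symbol in word:
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    # Accept if at least one possible path ends in an accepting state.
    return bool(current & ACCEPTING)


if __name__ == "__main__":
    for word in ["abb", "babb", "ab", "abba"]:
        print(word, nfa_accepts(word))
```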
Transitions may also be labelled with ε.
[Figure: NFA with states 0 to 4 and ε-transitions leaving the start state 0]
An NFA accepts a string x if and only if there exists a path in the transition diagram from the start state to a final state such that the concatenation of the labels on the path equals x.
For example, aaa is accepted by this NFA:
  Path:   0 1 2 2 2
  Labels: ε a a a = aaa
The regular expression is: aa* | bb*

Regular expression -> NFA
Input: regular expression r over alphabet Σ
Output: NFA N which accepts L(r)
1. r = ε  [Figure: start state with an ε-transition to the final state]
2. r = a where a ∈ Σ  [Figure: start state with an a-transition to the final state]
3. Suppose N(s) and N(t) are NFAs for the r.e.'s s and t.
   a. r = s|t  [Figure: N(s) and N(t) in parallel]
   b. r = st   [Figure: N(s) followed by N(t)]
      This operation is only possible if the final state of N(s) has no outgoing transitions and the start state of N(t) has no incoming transitions.
   c. r = s*   [Figure: N(s) with ε-transitions for repetition and for skipping]
   d. r = (s), then N(r) = N(s)

Conversion NFA -> DFA
Regular expressions can be transformed into NFAs.
DFAs can be simulated/implemented efficiently.
Transformation of an NFA into a DFA:
  construct a DFA where each state represents a subset of the states of the NFA
  after reading the input a_1 a_2 ... a_n the NFA is in a set of states T, which corresponds to one state of the DFA

Auxiliary functions:
  ε-closure(s)  yields the set of NFA states reachable from state s in the NFA via ε-transitions only
  ε-closure(T)  yields the set of NFA states reachable from some state s in T via ε-transitions only
  move(T, a)    yields the set of NFA states reachable from some state s in T via input a

Initially, ε-closure(s_0) is the only state in Dstates and it is unmarked
  while there is an unmarked state T in Dstates do
    mark T;
    for each input symbol a do
      U := ε-closure(move(T, a))
      if U is not in Dstates then
        add U as an unmarked state to Dstates;
      end
      Dtrans[T, a] := U;
    end
  end
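The worklist algorithm above can be sketched in Python as follows. The NFA used here is an assumed encoding of a small ε-NFA for aa* | bb* (states 0 to 4); the dictionary layout and the names EPS, NFA, eps_closure, move and subset_construction are choices made for this illustration, not part of the slides.

```python
from collections import deque

EPS = ""  # label used for ε-transitions in this sketch

# Assumed ε-NFA for aa* | bb*: state -> {label -> set of successor states}
NFA = {
    0: {EPS: {1, 3}},
    1: {"a": {2}},
    2: {"a": {2}},
    3: {"b": {4}},
    4: {"b": {4}},
}
NFA_START = 0
NFA_ACCEPTING = {2, 4}
ALPHABET = ("a", "b")


def eps_closure(states):
    """NFA states reachable from `states` via ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from some state in `states` via `symbol`."""
    return {t for s in states for t in NFA.get(s, {}).get(symbol, set())}


def subset_construction():
    start = eps_closure({NFA_START})
    dstates = {start}          # all DFA states found so far
    unmarked = deque([start])  # DFA states still to be processed
    dtrans = {}
    while unmarked:
        T = unmarked.popleft()  # taking T from the queue marks it
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtrans[(T, a)] = U
    accepting = {T for T in dstates if T & NFA_ACCEPTING}
    return start, dstates, dtrans, accepting
```

In this sketch the empty subset shows up as an ordinary DFA state; it acts as a dead state from which no accepting state can be reached.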
Subset construction (Rabin & Scott)
NFA = (Q, V, γ, q_0, F)
Equivalent DFA = (P(Q), V, δ, {q_0}, F')
  δ ∈ P(Q) × V → P(Q)
  δ(qq, a) = (∪ q : q ∈ qq : γ(q, a))
  F' = {qq ∈ P(Q) | qq ∩ F ≠ ∅}
δ({q_0}, w) = set of all states in which the original NFA can be after processing string w

NFA N for (a|b)*abb
[Figure: NFA with states 0 to 10]

A = {0, 1, 2, 4, 7}           (= ε-closure(0))
B = {1, 2, 3, 4, 6, 7, 8}     (= ε-closure(move(A, a)))
C = {1, 2, 4, 5, 6, 7}        (= ε-closure(move(A, b)))
D = {1, 2, 4, 5, 6, 7, 9}     (= ε-closure(move(B, b)))
E = {1, 2, 4, 5, 6, 7, 10}    (= ε-closure(move(D, b)))

Resulting DFA
[Figure: DFA with states A, B, C, D, E]

Transition table
  state   a   b
  A       B   C
  B       B   D
  C       B   C
  D       B   E
  E       B   C
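Once the transition table is available, running the DFA is a single table lookup per input symbol. The sketch below encodes the table for states A to E; it assumes that NFA state 10 is the accepting state, so that E is the only accepting DFA state. The names DFA, DFA_START, DFA_ACCEPTING and dfa_accepts are illustrative.

```python
# Transition table of the resulting DFA: state -> {symbol -> next state}
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
DFA_START = "A"
DFA_ACCEPTING = {"E"}  # assumption: E is the only subset containing NFA state 10


def dfa_accepts(word: str) -> bool:
    """Run the DFA on word; exactly one state is active at any time."""
    state = DFA_START
    for symbol in word:
        if symbol not in DFA[state]:
            return False  # symbol outside the alphabet {a, b}
        state = DFA[state][symbol]
    return state in DFA_ACCEPTING


if __name__ == "__main__":
    for word in ["abb", "aabb", "abab", ""]:
        print(repr(word), dfa_accepts(word))
```

Compared with simulating the NFA, no sets of states are needed at run time, which is why the DFA form is preferred for scanners.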
DFA -> minimal DFA (MDFA)
DFA = (Q, V, δ, q_0, F)
Equivalence relation ~ on states; for all s, t ∈ Q:
  s ~ t ≡ (∀ w : w ∈ V* : δ(s, w) ∈ F ≡ δ(t, w) ∈ F)
Definition (equivalence class of state q):
  [·] ∈ Q → P(Q)
  (∀ q : q ∈ Q : [q] = {q' | q ~ q'})
Partitioning the state set Q according to the relation ~ yields the state set Q', which is used in the minimal DFA (MDFA)
  (Q', V, δ', [q_0], F') where
  (∀ s, a : s ∈ Q ∧ a ∈ V : δ'([s], a) = [δ(s, a)])
  F' = {[f] | f ∈ F}

LEX is a scanner generator which transforms regular expressions into a finite automaton:
r.e. -> NFA
  re_0 {action_0}
  re_1 {action_1}
  ...
  re_k {action_k}
  [Figure: a new start state with ε-transitions to N(re_0), ..., N(re_k), whose final states are f_0, ..., f_k]
  F = {f_0, ..., f_k}
NFA -> DFA
  accepting states have the form {..., f_a, ..., f_b, ..., f_c, ...}
  with corresponding action: action_min(a,b,c)

Implementation examples can be found in Section 3.3.2 of http://www.win.tue.nl/~mvdbrand/courses/glt/0910/papers/notes.pdf

Resolution of ambiguities
  Longest match is preferred.
  If two alternatives recognize the same sequence of characters, the alternative occurring first in the specification is chosen.

  BEGIN                      [sym := beginsym]
  IF                         [sym := ifsym]
  letter.(letter | digit)*   [sym := idsym]
  digit.(digit)*             [sym := intrepsym]
  :=                         [sym := becomessym]
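The two disambiguation rules can be illustrated with a small scanner. This is a minimal sketch using Python's re module rather than a generated automaton; it assumes that letter means an ASCII letter, digit means 0 to 9, and that layout (whitespace) is skipped between tokens. RULES, LAYOUT and scan are names introduced for this example.

```python
import re

# Rules in specification order: (pattern, symbol).
RULES = [
    (re.compile(r"BEGIN"), "beginsym"),
    (re.compile(r"IF"), "ifsym"),
    (re.compile(r"[A-Za-z][A-Za-z0-9]*"), "idsym"),
    (re.compile(r"[0-9][0-9]*"), "intrepsym"),
    (re.compile(r":="), "becomessym"),
]
LAYOUT = re.compile(r"\s+")


def scan(text: str):
    """Split text into (symbol, lexeme) pairs using longest match, then rule order."""
    tokens = []
    pos = 0
    while pos < len(text):
        layout = LAYOUT.match(text, pos)
        if layout:
            pos = layout.end()
            continue
        best_len, best_sym = 0, None
        for pattern, sym in RULES:
            m = pattern.match(text, pos)
            # Strictly longer wins; on a tie the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_len, best_sym = m.end() - pos, sym
        if best_sym is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_sym, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    print(scan("BEGIN IFx := 42"))
```

The input IF matches both the keyword rule and the identifier rule with the same length, so the earlier keyword rule is chosen, while IFx is scanned as one identifier because the longer match is preferred.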