CS2 Language Processing note 2

Regular languages

Last lecture we reviewed finite automata (aka finite state machines) and formally defined DFAs (deterministic finite automata). In this lecture, we will use these to define the notion of regular language. We give examples of regular and non-regular languages. We also prove an important technical result, the Pumping Lemma, which can often be used to rigorously establish that a language is not regular.

The language of a DFA

Any computation on a DFA can be represented as a sequence of the states visited during the computation. The first state is always the starting (initial) state of the DFA. We write $q \xrightarrow{x} q'$ to denote that if the DFA is in state $q$ and receives the string $x$ as input then it will end up in state $q'$. If the last state of the computation is in $F$, the set of accepting states, then the computation is accepting; otherwise it is rejecting. We say that a string $x$ is accepted by $M$ if $s \xrightarrow{x} q$ is an accepting computation (where $s$ is the initial state of $M$). The language accepted (or recognised) by $M$ is defined by

$$L(M) = \{ x \in \Sigma^* \mid M \text{ accepts } x \}.$$

Example 2.1 Consider the DFA $M = (\{0, 1, 2, 3\}, \{a, b\}, \delta, 0, \{3\})$ where the transition function $\delta$ is defined by the following table:

          a   b
      0   1   0
      1   2   0
      2   3   0
      3   3   3

The DFA is displayed in Figure 1. What is the language $L(M)$ accepted by $M$? The crucial observation is that the only way to reach the accepting state 3 is by reading 3 consecutive $a$s. Once state 3 is reached, the automaton remains there. Thus $L(M)$ is the language over $\{a, b\}$ consisting of all strings that contain three consecutive $a$s, or formally, the language

$$L(M) = \{ x\,aaa\,y \mid x, y \in \Sigma^* \}.$$
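The computation described in Example 2.1 can be traced mechanically. The following Python sketch is only an illustration and not part of the original notes: it assumes the DFA has states 0 to 3, alphabet {a, b}, start state 0 and accepting state 3, and the names `DELTA`, `run` and `accepts` are made up for this example.

```python
# Assumed reading of the transition table of Example 2.1:
# states 0..3, alphabet {a, b}, start state 0, accepting state 3.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 3, (2, "b"): 0,
    (3, "a"): 3, (3, "b"): 3,
}
START = 0
ACCEPTING = {3}


def run(word):
    """Return the sequence of states visited while reading word."""
    states = [START]
    for symbol in word:
        # One step of the computation: look up the next state in the table.
        states.append(DELTA[(states[-1], symbol)])
    return states


def accepts(word):
    """A word is accepted if its computation ends in an accepting state."""
    return run(word)[-1] in ACCEPTING
```

For instance, `run("baaab")` visits the states 0, 0, 1, 2, 3, 3, so `accepts("baaab")` holds, whereas `accepts("aabaa")` does not, because the `b` sends the automaton back to state 0 before a third consecutive `a` is read.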
Figure 1: The automaton of Example 2.1

Regular languages

Definition 2.2 A language $L \subseteq \Sigma^*$ is said to be regular if there exists some DFA $M$ such that $L$ is the language recognised by $M$ (i.e. $L = L(M)$).

Suppose we are given a language and we want to show that it is regular. How do we do this? At present, we have only one technique available: we simply construct a DFA that recognises the language. (In future lectures, we shall see other methods of showing that languages are regular.)

Example 2.3 Take the alphabet $\Sigma = \{a, b\}$ and language $L = \{ x \in \Sigma^* \mid \text{the string } x \text{ ends with the substring } [\ldots] \}$. We show that this language is regular by constructing a DFA that recognises it. This is achieved by the DFA drawn in Figure 2.

Figure 2: A DFA for all strings ending with [...]

In fact, there are (infinitely) many other DFAs that recognise the above language, although the one in Figure 2 is the simplest possible. As we shall see
later, this is a general phenomenon. Every regular language has infinitely many DFAs that recognise it, and there is always a simplest minimal one.

Are all languages regular? Wouldn't that be nice? However, a little bit of thought reveals that there are severe restrictions on how a DFA operates.

- The input can only be read one-way, so a DFA cannot go back and reread any of its input.
- The only information available during a computation is the state that the computation is in. Thus the only memory that a DFA has is contained in its states. Since the number of states is finite, only a finite amount of information can be stored.

Intuitively, if a DFA is presented with a very long string as input then after some time it will not be able to remember certain parts of the input that it has already consumed.

These observations lead to some fairly simple examples of languages that cannot be recognised by any DFA. For example, the language below (over the alphabet $\{0, 1\}$) is not regular:

$$L = \{ 0^n 1^n \mid n \geq 0 \}.$$

(In case you are confused by the notation here, the above is simply mathematical notation for the language $\{\varepsilon, 01, 0011, 000111, \ldots\}$.) The intuition for why $L$ is not regular is that any machine that recognised the language would have to remember the number of 0s in order to determine whether the correct number of 1s appear, and a DFA is incapable of doing this. However, this intuition does not yet amount to a proof that $L$ is not regular. The remaining goal of this lecture is to establish a method for rigorously justifying such claims of non-regularity.

The Pumping Lemma

In this section, we shall prove a classic result from automata theory, known as the Pumping Lemma, that formalises our intuitions, discussed above, about the limitations of DFAs. We need a bit of notation first. $\mathbb{N}$ denotes the set of natural numbers $\{1, 2, 3, \ldots\}$. Note that 0 is not included. If we want to include 0, we use the notation $\mathbb{N}_0$, that is, we let $\mathbb{N}_0 = \mathbb{N} \cup \{0\} = \{0, 1, 2, 3, \ldots\}$.

Lemma 2.3 (The Pumping Lemma) Let $M$ be a DFA with $k$ states, and let $x$ be a string that it accepts. If $|x| \geq k$ then there exist three strings $u, v, w \in \Sigma^*$ such that the four properties below all hold:

(i) $x = uvw$,
(ii) $|v| \geq 1$,
(iii) $|uv| \leq k$,
(iv) for every $i \in \mathbb{N}_0$: $u v^i w \in L(M)$.

Proof: Since $x$ is accepted by $M$ and $|x| \geq k$, the computation of $M$ on $x$ takes at least $k$ steps and therefore visits at least $k + 1$ states (the initial state plus one state per letter read). However $M$ has only $k$ states, and so some state $q$ must occur at least twice among the first $k + 1$ states of the computation. We observe the run of $M$ on input $x$ and split $x$ into 3 strings $u, v, w$ as follows:

- We let $u$ be the initial part of the string $x$ that takes $M$ from the start state to state $q$ for the first time. The last letter of $u$ is the letter read by $M$ just before it reaches state $q$.
- We let $v$ be the next part of $x$, read by $M$ before it reaches state $q$ for the second time. Then we have $|v| \geq 1$, and since we assumed that $q$ must occur at least twice among the first $k + 1$ states of the computation, we also have $|uv| \leq k$.
- Let $w$ be the remaining part of $x$, so that we have $x = uvw$.

We have already observed that (i)-(iii) are satisfied by our choice of $u, v, w$; it remains to prove (iv). Our choice of $v$ guarantees that if $M$ is in state $q$ and starts to read $v$, then it will be in state $q$ again after it has read $v$. Formally,

$$q \xrightarrow{v} q.$$

But this means that we can pump the string $v$ any number of times (even zero) and we still end up in state $q$. Formally, for all $i \in \mathbb{N}_0$ we have

$$q \xrightarrow{v^i} q.$$

This implies that $u v^i w$ is accepted by $M$ for every $i \in \mathbb{N}_0$. To see this, observe that by our choice of $u$ and $w$ we have $s \xrightarrow{u} q$ and $q \xrightarrow{w} r$ for an accepting state $r$ (since $M$ accepts $x$). Putting things together, we obtain

$$s \xrightarrow{u} q \xrightarrow{v^i} q \xrightarrow{w} r.$$

Thus if $M$ starts reading $u v^i w$ in its starting state, it will finish its computation in an accepting state. Therefore, $u v^i w \in L(M)$, and the lemma is proved.

How to show that a language is not regular

The Pumping Lemma gives a powerful tool for showing that a language $L$ is not regular. The strategy is always the same.

- Begin by assuming that the language $L$ is regular. The idea is to use the Pumping Lemma to reach a contradiction from this assumption.
- Because $L$ is assumed to be regular, there must be some DFA $M$ that recognises it. Write $k$ for the number of states in $M$.
- Choose some string $x$ in $L$ with $|x| \geq k$.
- Apply the Pumping Lemma to $x$. The Pumping Lemma breaks $x$ up into suitable $u$, $v$ and $w$ satisfying properties (i)-(iv).
- Choose $i \in \mathbb{N}_0$ so that $u v^i w \notin L$ according to the original definition of $L$. This contradicts property (iv) of the Pumping Lemma, which guarantees that $u v^i w \in L(M)$, since we have assumed that $L = L(M)$.
- Having reached the sought contradiction, conclude that the initial assumption (that $L$ is regular) is flawed.

Here are some typical examples.

Theorem 2.4 Let $\Sigma = \{0, 1\}$ and $L = \{ 0^n 1^n \mid n \geq 0 \}$. Then $L$ is not regular.

Proof: Suppose that $L$ is regular, recognised by some DFA $M$ with $k$ states. Take $x = 0^k 1^k$. Note that $x \in L$ and $|x| \geq k$. By the Pumping Lemma, there are $u, v, w \in \Sigma^*$ such that $x = uvw$, $|v| \geq 1$, $|uv| \leq k$, and for all $i \in \mathbb{N}_0$, $u v^i w \in L(M) = L$. Since $|uv| \leq k$, both $u$ and $v$ must consist entirely of 0s. Let $i = 0$ and set $y = u v^0 w = uw$. In particular $v$ is nonempty, so $y$ has fewer 0s than 1s, thus $y \notin L$. This is a contradiction, so our assumption that $L$ is regular must be wrong.

Theorem 2.5 Let $\Sigma = \{(, )\}$ and $L = \{ x \in \Sigma^* \mid \text{the parentheses in } x \text{ are well balanced} \}$. Then $L$ is not regular.

Proof: Suppose that $L$ is regular, recognised by some DFA $M$ with $k$ states. Take $x = (^k\, )^k$. Note that $x \in L$ and $|x| \geq k$. By the Pumping Lemma, there are $u, v, w \in \Sigma^*$ such that $x = uvw$, $|v| \geq 1$, $|uv| \leq k$, and for all $i \in \mathbb{N}_0$, $u v^i w \in L(M) = L$. Since $|uv| \leq k$, both $u$ and $v$ must consist entirely of opening parentheses. Let $i = 0$ and set $y = u v^0 w = uw$. In particular $v$ is nonempty, so $y$ has fewer opening parentheses than closing parentheses. Thus $y \notin L$. This is a contradiction, so our assumption that $L$ is regular must be wrong.

Theorem 2.6 Let $\Sigma$ be the set of ASCII symbols and $\mathrm{JAVA} \subseteq \Sigma^*$ the language consisting of all syntactically correct JAVA programs. Then JAVA is not regular.

Hint for the Proof: Consider the following family of syntactically correct JAVA programs:

    class HelloWorld { public static void main(String[] args) { System.out.println("Hello World!"); } }
    class HelloWorld { public static void main(String[] args) {{ System.out.println("Hello World!"); }} }

    class HelloWorld { public static void main(String[] args) {{{ System.out.println("Hello World!"); }}} }

    class HelloWorld { public static void main(String[] args) {{{{ System.out.println("Hello World!"); }}}} }

Theorem 2.7 Let $\Sigma = \{a\}$ and $L = \{ a^p \mid p \text{ is a prime number} \}$. Then $L$ is not regular.

Proof: Suppose that $L$ is regular, recognised by some DFA $M$ with $k$ states. Let $x = a^p$ where $p$ is a prime number bigger than $k$. Note that $x \in L$ and $|x| \geq k$. By the Pumping Lemma, there are $u, v, w \in \Sigma^*$ such that $x = uvw$, $|v| \geq 1$, $|uv| \leq k$, and for all $i \in \mathbb{N}_0$, $u v^i w \in L(M) = L$. Then $v = a^n$ and $uw = a^m$, so that $p = m + n$ with $n \geq 1$. Set $i = p + 1$. Now $|u v^i w| = m + (p + 1)n = p + pn = p(1 + n)$, which, since $p \geq 2$ and $1 + n \geq 2$, means that the length of $u v^i w$ is not a prime number. Thus $u v^i w \notin L$, and we have a contradiction. Hence $L$ is not regular.

Hints for using the Pumping Lemma

The Pumping Lemma is one of the more difficult mathematical tools you will encounter in CS2. Here is some advice on how to use it, and on pitfalls to avoid.

- The first tricky step is to choose an appropriate string $x$ that will allow the proof to go through. (Don't forget to check that $x \in L$ and $|x| \geq k$.)
- You are not allowed to choose $u, v, w$ yourself! These are given to you by the Pumping Lemma. All that you know about them is that they satisfy properties (i)-(iv).
- The second tricky step is to choose an appropriate $i$ so that $u v^i w \notin L$. In order to prove that $u v^i w$ is not in $L$, the only information you should use is: your choice of $x$ and $i$, properties (i)-(iii), and everything you know about the language $L$.
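The hint that the decomposition is handed to you, not chosen by you, can be made concrete for the language of Theorem 2.4. The Python sketch below is an illustration only, not part of the original notes, and all function names in it are hypothetical. For a fixed number $k$ it enumerates every split of $0^k 1^k$ into three parts permitted by properties (i)-(iii) and looks for an exponent that takes the pumped string out of the language. Checking particular values of $k$ is of course not a proof; it merely mirrors the case analysis the proof has to cover.

```python
def in_language(word):
    """Membership test for L = { 0^n 1^n | n >= 0 }."""
    n = len(word) // 2
    return word == "0" * n + "1" * n


def decompositions(x, k):
    """Yield every split x = u v w with |v| >= 1 and |uv| <= k."""
    for end_u in range(0, k):
        for end_v in range(end_u + 1, min(k, len(x)) + 1):
            yield x[:end_u], x[end_u:end_v], x[end_v:]


def refuting_exponent(u, v, w, max_i=3):
    """Return the smallest i <= max_i with u v^i w not in L, else None."""
    for i in range(max_i + 1):
        if not in_language(u + v * i + w):
            return i
    return None


def every_split_is_refuted(k):
    """Check that no permitted split of 0^k 1^k survives pumping."""
    x = "0" * k + "1" * k
    return all(
        refuting_exponent(u, v, w) is not None
        for u, v, w in decompositions(x, k)
    )
```

Calling `every_split_is_refuted(k)` for small `k` exercises every split the Pumping Lemma could hand over for that `k`. In each case `refuting_exponent` already succeeds with `i = 0`, which is the choice made in the proof of Theorem 2.4.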
Exercises

1. Which of the following languages over the alphabet $\Sigma = \{a, b, c\}$ are regular:

   - $\{ x \in \Sigma^* \mid x \text{ contains the same number of } a\text{s and } b\text{s} \}$
   - $\{ [\ldots] \}$
   - $\{ [\ldots] \mid n \in \mathbb{N} \}$
   - $\{ x \in \Sigma^* \mid \text{there is no } c \text{ in } x \text{ occurring before the first } [\ldots] \}$
   - $\{ [\ldots] \}$

   Prove the correctness of your answers.

2. Let $\Sigma$ be the alphabet of all ASCII symbols, and $L$ be the set of all valid identifiers in Java that can be formed from symbols in $\Sigma$. Show that $L$ is regular (this is quite tedious, because all reserved words have to be avoided, but at least think about it and convince yourselves that you could do this in principle).

3. Perhaps surprisingly, if one language contains another, $L_1 \subseteq L_2$, then whether or not $L_1$ is regular tells us nothing about $L_2$, and vice versa. Construct examples of all four regular/non-regular combinations.

Don Sannella