flex Regular Expressions and Lexical Scanning

Using flex to Build a Scanner
flex generates lexical scanners: programs that discover tokens. Tokens are the smallest meaningful units of a program (or other string). flex is freeware available from the Free Software Foundation and GNU Project (http://www.gnu.org/). It is also available in the Cygwin (Unix emulator for Windows) download (http://cygwin.com/).

Regular Expressions and flex
flex input: a file containing regular expressions and some code. The regular expressions define tokens such as if, then, while, and classes of tokens such as identifier, float.
flex output: C (or C++) code for a lexical scanner.

(Standard) Regular Expressions on Alphabet A
Ø
ε
a (for each symbol a in A)
e1 e2 (where e1 and e2 are regular exps)
e1 | e2 (where e1 and e2 are regular exps)
e* (where e is a regular expression)
(e) (where e is a regular expression)

Language Denoted by a Regular Expression
L(Ø) = Ø
L(ε) = {ε}
L(a) = {a}
L(e1 e2) = L(e1) L(e2)
L(e1 | e2) = L(e1) ∪ L(e2)
L(e*) = L(e)*
L((e)) = L(e)
Precedence: *, then concatenation, then |

Examples on Alphabet A = {a, b}
ab is a regular expression denoting {ab}.
a|b is a regular expression denoting {a, b}.
a* is a regular expression denoting {ε, a, aa, aaa, ...}.
a(a|b)*a is a regular expression denoting {strings of length at least 2 that begin and end with a}.
Examples for You
RE for strings of even length
RE for strings with abb as a substring
RE for strings with exactly one b
RE for strings that do NOT have abb as a substring
RE for INTEGER (possibly preceded by + or -, no leading zeroes; alphabet is {0,...,9,+,-})

flex Regular Expressions
Can have operators other than concatenation, union, and closure.
For every flex expression, there exists an equivalent standard regular expression.
Advantage of standard regular expressions: easy to prove theorems about.
Advantage of flex regular expressions: easier to express many languages.

flex Regular Expressions
Regular expression   Meaning
[abc]                a or b or c

flex Regular Expressions, cont'd
Regular expression   Meaning
[a-z]*               0 or more small letters
[\t\n]               Tab or newline
[a-z]+               1 or more small letters
[a-z]                A small letter
[a-z]?               0 or 1 small letter
[a-zA-Z]             A small letter or capital letter
.                    Any character except \n

flex Regular Expressions, cont'd
Regular expression   Meaning
[^abc]               A character other than a, b, or c
\.                   A period
{exp}                The value of the regular definition exp
exp1|exp2            Anything matching exp1 or exp2

flex Example 1: Section 1
%{
char unused_var;
%}
%option noyywrap
/* regular definitions */
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
identifier {letter}({letter}|{digit}|_)*
flex Example 1: Section 2
{ws}          { /* Do nothing. */ }
select        { printf("found token SELECT\n"); }
from          { printf("found token FROM\n"); }
\,            { printf("found token COMMA\n"); }
quit          { return 0; }
{identifier}  { printf("found token IDENTIFIER\n"); }

flex Example 1: Section 3
int main()
{
    yylex();
    return 0;
}

Using flex
sql.l → flex → lex.yy.c → C compiler → lexical scanner

Using the Scanner
[figure: myquery.sql, with text such as SELECT SID FROM, is read by the lexical scanner through calls to yylex()]

flex Sections
Sections are separated by lines containing only %%.
Section 1: C (or C++) code to be copied to the scanner, enclosed in %{ %}; flex options; and regular definitions
Section 2: token definitions (patterns) and semantic actions
Section 3: additional definitions, usually functions, to be copied to the end of the scanner code

flex Example 2: Section 1 (Code Copied to Scanner)
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SELECT 1
#define FROM 2
#define COMMA 3
#define IDENTIFIER 4
#define QUIT 5
char *yylval;
%}
flex Example 2: Section 1, cont.
%option noyywrap
/* regular definitions */
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
identifier {letter}({letter}|{digit}|_)*

flex Example 2: Section 2
{ws}          { /* Do nothing. */ }
select        { return( SELECT ); }
from          { return( FROM ); }
\,            { return( COMMA ); }
{identifier}  { yylval = (char *) malloc(strlen(yytext) + 1);
                strcpy(yylval, yytext);
                return( IDENTIFIER ); }

flex Example 2: Section 3
int main()
{
    int my_token = 1;
    while (my_token > 0)   /* yylex() returns 0 at end of input */
    {
        my_token = yylex();
        if (my_token == IDENTIFIER)
            printf("returned token is %d with value %s\n", my_token, yylval);
        else
            printf("returned token is %d\n", my_token);
    }
    return 0;
}

Flex Start Conditions
<COND1>regexp           match regexp if condition COND1 holds
<COND1,INITIAL>regexp   match regexp if condition COND1 or INITIAL holds
<*>regexp               match regexp under any condition
BEGIN(COND1)            in a semantic action, switches to COND1

Declaring Flex Start Conditions
%x COND1   declares COND1 to be an exclusive start condition
%s COND2   declares COND2 to be an inclusive (shared) start condition
If COND1 is current, only regexp2 is active. If COND2 is current, regexp1 and regexp3 are active:
regexp1
<COND1>regexp2
<COND2>regexp3

Example: Flex Start Conditions
%s ATT_VALUE STR_CONST
<TAG_NAME>"="              { BEGIN(ATT_VALUE); return(eq); }
<ATT_VALUE>{string_const}  { BEGIN(TAG_NAME); return(str_const); }
Distinguishing Identifiers
%x SELCLAUSE FROMCLAUSE
identifier {letter}({letter}|{digit}|_)*
SELCLAUSE and FROMCLAUSE are exclusive start conditions; identifier is a regular definition.

Distinguishing Identifiers, cont'd
<INITIAL>select    { BEGIN(SELCLAUSE); return( SELECT ); }
<SELCLAUSE>from    { BEGIN(FROMCLAUSE); return( FROM ); }
Accept "select" in the INITIAL start condition and switch to SELCLAUSE; accept "from" in SELCLAUSE and switch to FROMCLAUSE.

Distinguishing Identifiers, cont'd
<SELCLAUSE>{identifier}   { return( ATTIDENTIFIER ); }
<FROMCLAUSE>{identifier}  { return( RELIDENTIFIER ); }
Identifiers are returned as ATTIDENTIFIER in the SELCLAUSE start condition and as RELIDENTIFIER in the FROMCLAUSE start condition.

Finite Automata
[figure: an automaton with states 0, 1, 2, 3 accepting a two-string language]

Finite Automaton for Digits
[figure: a start state with a transition on 0, 1, ..., 9 to an accepting state]
Digit = 0 | 1 | ... | 9

Finite Automaton for Identifier
[figure: a start state with a transition on a, ..., z, A, ..., Z to an accepting state, which loops on a, ..., z, A, ..., Z, 0, ..., 9, _]
Identifier := {letter}({letter}|{digit}|_)*
Nondeterministic Finite Automaton: Language = strings with abb as a substring
[figure: NFA with states 0, 1, 2, 3]

Deterministic Finite Automaton: Language = strings with abb as a substring
[figure: DFA with states 0, 1, 2, 3]

Example DFAs for You
DFA for strings in L( (a(b|c))* )
DFA for strings of even length
DFA for strings that have abb as a substring
DFA for INTEGER (possibly preceded by + or -, no leading zeroes)

Theory
For every regular expression, there is an NFA that accepts the same language (and vice versa).
For every NFA, there is a DFA that accepts the same language (and vice versa).

What flex Does
Converts regular expressions to nondeterministic automata (NFAs)
Converts nondeterministic automata to deterministic automata (DFAs)
Minimizes deterministic automata
Outputs code to simulate the minimized DFAs

Conclusions
Lexical scanning is the first phase of compilation/interpretation.
Lexical scanning is useful for many programs, not just translators.
flex and lex are among the most popular of many scanner generators.
flex is based on an elegant theory of languages and machines.