LEX/Flex Scanner Generator



Similar documents
Introduction to Lex. General Description Input file Output file How matching is done Regular expressions Local names Using Lex

Compiler Construction

A Lex Tutorial. Victor Eijkhout. July Introduction. 2 Structure of a lex file

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

LEX & YACC TUTORIAL. by Tom Niemann. epaperpress.com

Lex et Yacc, exemples introductifs

Compilers Lexical Analysis

csce4313 Programming Languages Scanner (pass/fail)

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

Programming Languages CIS 443

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Scanner. tokens scanner parser IR. source code. errors

File Handling. What is a file?

CS 241 Data Organization Coding Standards

Download at Boykma.Com

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Keywords are identifiers having predefined meanings in C programming language. The list of keywords used in standard C are : unsigned void

03 - Lexical Analysis

Chapter 13 - The Preprocessor

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

So far we have considered only numeric processing, i.e. processing of numeric data represented

Storage Classes CS 110B - Rule Storage Classes Page 18-1 \handouts\storclas

Programming Project 1: Lexical Analyzer (Scanner)

ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters

How to Write a Simple Makefile

Tutorial on C Language Programming

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

Introduction to Java

J a v a Quiz (Unit 3, Test 0 Practice)

Stacks. Linear data structures

Sources: On the Web: Slides will be available on:

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

C / C++ and Unix Programming. Materials adapted from Dan Hood and Dianna Xu

C++ INTERVIEW QUESTIONS

MS Visual C++ Introduction. Quick Introduction. A1 Visual C++

Moving from CS 61A Scheme to CS 61B Java

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Member Functions of the istream Class

University of Wales Swansea. Department of Computer Science. Compilers. Course notes for module CS 218

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

1 Abstract Data Types Information Hiding

13 Classes & Objects with Constructors/Destructors

C++ Outline. cout << "Enter two integers: "; int x, y; cin >> x >> y; cout << "The sum is: " << x + y << \n ;

5 Arrays and Pointers

Evaluation of JFlex Scanner Generator Using Form Fields Validity Checking

Simple C Programs. Goals for this Lecture. Help you learn about:

Compiler Construction

Compiler I: Syntax Analysis Human Thought

Chapter 5. Recursion. Data Structures and Algorithms in Java

Statements and Control Flow

Unix Shell Scripts. Contents. 1 Introduction. Norman Matloff. July 30, Introduction 1. 2 Invoking Shell Scripts 2

1 Description of The Simpletron

The C Programming Language course syllabus associate level

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Chapter 2: Elements of Java

/* File: blkcopy.c. size_t n

What is a Loop? Pretest Loops in C++ Types of Loop Testing. Count-controlled loops. Loops can be...

Lecture 3. Arrays. Name of array. c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] c[8] c[9] c[10] c[11] Position number of the element within array c

Informatica e Sistemi in Tempo Reale

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Semantic Analysis: Types and Type Checking

Output: struct treenode{ int data; struct treenode *left, *right; } struct treenode *tree_ptr;

A Grammar for the C- Programming Language (Version S16) March 12, 2016

Theory of Compilation

CISC 181 Project 3 Designing Classes for Bank Accounts

As previously noted, a byte can contain a numeric value in the range Computers don't understand Latin, Cyrillic, Hindi, Arabic character sets!

Fundamentals of Programming

CS170 Lab 11 Abstract Data Types & Objects

C Programming. for Embedded Microcontrollers. Warwick A. Smith. Postbus 11. Elektor International Media BV. 6114ZG Susteren The Netherlands

Basics of I/O Streams and File I/O

Example of a Java program

Lab 1: Introduction to C, ASCII ART and the Linux Command Line Environment

Object Oriented Software Design

Time Limit: X Flags: -std=gnu99 -w -O2 -fomitframe-pointer. Time Limit: X. Flags: -std=c++0x -w -O2 -fomit-frame-pointer - lm

PART-A Questions. 2. How does an enumerated statement differ from a typedef statement?

C++ Input/Output: Streams

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Memory management. Announcements. Safe user input. Function pointers. Uses of function pointers. Function pointer example

IS0020 Program Design and Software Tools Midterm, Feb 24, Instruction

Pemrograman Dasar. Basic Elements Of Java

Recognizing PL/SQL Lexical Units. Copyright 2007, Oracle. All rights reserved.

Parameter Passing. Parameter Passing. Parameter Passing Modes in Fortran. Parameter Passing Modes in C

ECE 341 Coding Standard

grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print

Help on the Embedded Software Block

Number Representation

Boolean Data Outline

Project 2: Bejeweled

System Calls and Standard I/O

Passing 1D arrays to functions.

C++FA 5.1 PRACTICE MID-TERM EXAM

C Examples! Jennifer Rexford!

Eventia Log Parsing Editor 1.0 Administration Guide

Transcription:

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1 LEX/Flex Scanner Generator

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 2 flex - Fast Lexical Analyzer Generator We can use flex a to automatically generate the lexical analyzer/scanner for the lexical atoms of a language. a The original version was known as lex.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 3 Input The input to the flex program (known as flex compiler) is a set of patterns or specification of the tokens of the source language. Actions corresponding to different matches can also be specified.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 4 Input The pattern for every token class is specified as an extended regular expression and the corresponding action is a piece of C code. The specification can be given from a file (by default it is stdin).

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 5 Output The flex software compiles the regular expression specification to a DFA and implements it as a C program with the action codes properly embedded in it.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 6 Output The output of flex is a C file lex.yy.c with the function int yylex(void). This function, when called, returns a token. Attribute information of a token is available through the global variable yylval.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 7 flex Specification The flex specification is a collection of regular expressions and the corresponding actions as C code. It has three parts: Definitions %% Rules %% User code

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 8 flex Specification The definition has two parts: The portion within the pair of special parentheses %{ and %} is copied verbatim in lex.yy.c. Similarly the user code is also copied. The other part of the definition contains regular name definitions, start conditions and other options.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 9 flex Specification The rules are specified as <pattern> <action> pairs. The patterns are extended regular expressions corresponding to different tokens. The corresponding actions are given as C code. An action is taken at the end of the match.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 10 An Example: lex.l /* A scanner for a toy language:./lex_yacc/lex.l */ %{ // Copied verbatim #include <string.h> #include <stdlib.h> #include "y.tab.h" // tokens are defined int line_count = 0; yyltype yylval; %}

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 11 %option noyywrap %x CMNT DELIM ([ \t]) WHITESPACES ({DELIM}+) POSI_DECI ([1-9]) DECI (0 {POSI_DECI}) DECIMAL (0 {POSI_DECI}{DECI}*) LOWER ([a-z]) LETTER ({LOWER} [:upper:] _) IDENTIFIER ({LETTER}({LETTER} {DECI})*) %%

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 12 "/*" {BEGIN CMNT;} <CMNT>. {;} <CMNT>\n {++line_count;} <CMNT>"*/" {BEGIN INITIAL;} \n { ++line_count; return (int) \n ; } "\"".*"\"" { yylval.string=(char *)malloc((yyleng+1)*(sizeof(char))); strncpy(yylval.string, yytext+1, yyleng-2); yylval.string[yyleng-2]= \0 ; return STRNG; }

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 13 {WHITESPACES} { ; } if {return IF;} else {return ELSE; } while { return WHILE; } for { return FOR; } int { return INT; } \( { return (int) ( ; } \) { return (int) ) ; } \{ { return (int) { ; } \} { return (int) } ; } ; { return (int) ; ; }, { return (int), ; } = { return (int) = ; } "<" { return (int) < ; }

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 14 "+" { } "-" { } "*" { } "/" { yylval.integer = (int) + ; return BIN_OP; yylval.integer = (int) - ; return BIN_OP; yylval.integer = (int) * ; return BIN_OP; yylval.integer = (int) / ;

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 15 return BIN_OP; } {IDENTIFIER} { yylval.string=(char *)malloc((yyleng+1)*(sizeof(char))); strncpy(yylval.string, yytext, yyleng + 1); return ID; } {DECIMAL} { yylval.integer = atoi(yytext); return NUM; }. { ; } %% /* The function yywrap() int yywrap(){return 1;} */

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 16 y.tab.h #ifndef _Y_TAB_H #define _Y_TAB_H #define IF 300 #define ELSE 301 #define WHILE 302 #define FOR 303 #define INT 304 #define ID 306 #define NUM 307 #define STRNG 308 #define BIN_OP 310

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 17 int yylex(void); typedef union { char *string; int integer; } yyltype; #endif

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 18 mylex.c #include <stdio.h> #include <stdlib.h> #include "y.tab.h" extern yyltype yylval; int main() // mylex.c { int s; while((s=yylex())) switch(s) { case \n : printf("\n"); break;

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 19 case ( : printf("<(> "); break; case ) : printf("<)> "); break; case { : printf("<{> "); break; case } : printf("<}> "); break; case ; : printf("<;> "); break; case, : printf("<,> "); break; case = : printf("<=> "); break;

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 20 case < : printf("<<> "); break; case BIN_OP : printf("<bin_op, %c> ", (char) yylval.integer); break; case IF : printf("<if> "); break; case ELSE : printf("<else> "); break; case WHILE : printf("<while> "); break; case FOR : printf("<for> "); break; case INT : printf("<int> ");

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 21 } break; case ID : printf("<id, %s> ", yylval.string); free (yylval.string); break; case NUM : printf("<num, %d> ",yylval.integer); break; case STRNG : printf("<strng, %s> ", yylval.string); free (yylval.string); break; default: ; } return 0;

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 22 input /* Scanner input */ int main() { int n, fact, i; scanf("%d", &n); fact=1; for(i=1; i<=n; ++i) fact = fact*i; printf("%d! = %d\n", n, fact); return 0; } /* End */

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 23 How to Compile? $ flex lex.l lex.yy.c $ cc -Wall -c lex.yy.c lex.yy.o lex.yy.c:1161: warning: yyunput defined but not used $ cc -Wall mylex.c lex.yy.o a.out $./a.out < input > out $

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 24 A Simple Makefile objfiles = mylex.o lex.yy.o a.out : $(objfiles) cc $(objfiles) mylex.o : mylex.c cc -c -Wall mylex.c lex.yy.c : lex.l y.tab.h flex lex.l lex.yy.o : clean : rm a.out lex.yy.c $(objfiles)

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 25 Output <int> <ID, main> <(> <)> <{> <int> <ID, n> <,> <ID, fact> <,> <ID, i> <;> <ID, scanf> <(> <STRNG, %d> <,> <ID, n> <)> <;> <ID, fact> <=> <NUM, 1> <;> <for> <(> <ID, i> <=> <NUM, 1> <;> <ID, i> <<> <=> <ID, n> <ID, printf> <(> <STRNG, %d! = %d\n> <,> <ID, n> <,> <ID, f <ID, return> <NUM, 0> <;> <}>

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 26 Note There is a problem with the rule for comment. It will give wrong result for the following comment (input4) /* There is a string "within */ this comment" and it will not work */ Try with the rule in lex4.l

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 27 Note When the function yylex() is called, it reads data from the input file (by default it is stdin) and returns a token. At the end-of-file (EOF) the function returns zero (0).

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 28 Input/Output File Names The name of the input file to the scanner can be specified through the global file pointer (FILE *) yyin i.e. yyin = fopen("flex.spec", "r");. Similarly, the output file of the scanner a can also be supplied through the file pointer yyout i.e. yyout = fopen("lex.out", "w");. a Note that it will affect the output of ECHO and not change the output of printf() etc.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 29 Pattern and Action A pattern must start from the first column of the line and the action must start from the same line. Action corresponding to a pattern may be empty. The special parenthesis %{ and %} of the definition part should also start from the first column.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 30 Matching Rules If more than one pattern matches, then the longest match is taken. If both matches are of same length, then the earlier rule will take precedence. In our example if, else, while, for, int are specified above the identifier. If we change the order, the keywords will not be identified separately.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 31 Matching Rules In our example <=, is treated as two symbols < and =. But if there is a rule for <=, a single token can be generated.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 32 Lexeme If there is a match with the input text, the matched lexeme is pointed by the char pointer yytext (global) and its length is available in the variable yyleng. The yytext can be defined to be an array by using %array in the definition section. The array size will be determined by YYLMAX.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 33 ECHO The command ECHO copies the text of yytext to the output stream of the scanner a. The default rule is to echo any unmatched character to the output stream. a By default it is stdout and it can be changed by the file pointer yyout.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 34 yymore(), yyless(int) yymore(): the scanner appends the current lexeme to the next lexeme in the yytext. yyless(n): the scanner puts back all but the first n characters in the input buffer. Next time they will be rescanned. Both yytext and yyleng (n now) are properly modified.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 35 input(), unput(c) input(): reads the next character from the input stream. unput(c): puts the character c back in the input stream a. a flex manual: An important potential problem when using unput() is that if you are using %pointer (the default), a call to unput() destroys the contents of yytext, starting with its rightmost character and devouring one character to the left with each call. If you need the value of yytext preserved after a call to unput() (as in the above example), you must either first copy it elsewhere, or build your scanner using %array instead.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 36 Start Condition There is a mechanism to put a conditional guard for a rule. Let the condition for a rule be <C>. It is written as <C>P {action} The scanner will try to match the input with the pattern P, only if its start condition is C. In the example we have three rules guarded by the start condition <CMNT> used to remove the comment of a C program without any action.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 37 Start Condition A start condition is activated by a BEGIN action. In our example the start condition begins (BEGIN CMNT) after the scanner sees the starting of a comment (/*).

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 38 Start Condition The state of the scanner when no other start condition is active is called INITIAL. All rules without any start condition or with the start condition <INITIAL> are active at this state. The command BEGIN(0) or BEGIN(INITIAL) brings the scanner from any other state to this state. In our example the scanner comes back to INITIAL after consuming the comment.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 39 Start Condition Start conditions can be defined in the definition part of the specification using %s or %x. The start condition specified by %x is called an exclusive start condition. When such a start is active, only the rules guarded by it are activated. On the other hand %s specifies a an inclusive start condition. In our example, only three rules are active after the scanner sees the beginning of a comment (/*).

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 40 Start Condition The scope of start conditions may be nested and the start conditions can be stacked. Three routines are of interest: void yy_push_state(int new_state), void yy_pop_state() and int yy_top_state().

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 41 yywrap() Once the scanner receives the EOF indication from the YY_INPUT macro (zero), the scanner calls the function yywrap(). If there is no other input file, the function returns true and the function yylex() returns zero (0). But if there is another input file, yywrap() opens it, sets the file pointer yyin to it and returns false. The scanner starts consuming input from the new file.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 42 %option noyywrap This option in the definition makes scanner behave as if yywrap() has returned true. No yywrap() function is required.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 43 Prototype of Scanner Function The name and the parameters of the scanner function can be changed by defining:yy_declmacro. Asanexamplewemayhave #define YY_DECL int scanner(vp) yylval *vp;

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 44 Architecture of Flex Scanner The flex compiler includes a fixed program in its scanner. This program simulates the DFA. It reads the input and simulates the state transition using the state transition table constructed from the flex specification. Other parts of the scanner are as follows:

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 45 Architecture of Flex Scanner The transition table, start state and the final states - this comes from the construction of the DFA. Declarations and functions given in the definition and user code of the specification file - these are copied verbatim.

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 46 Architecture of Flex Scanner Actions specified in the rules corresponding to different patterns are put in such a way that the simulator can initiate them when a pattern is matched (in the corresponding final state).