Introduction to Lex

Outline: General description; Input file; Output file; How matching is done; Regular expressions; Local names; Using Lex

General Description
Lex is a program that automatically generates code for scanners.
Input: a description of the tokens in the form of regular expressions, together with the actions to be taken when each expression is matched.
Output: a text file with C source code defining a procedure yylex(), a table-driven implementation of the DFA for the regular expressions.
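To make the idea of a table-driven DFA concrete, here is a minimal hand-written sketch in C for the single pattern [0-9]+. It is not the code Lex generates; the names dfa_next, dfa_accepting and match_digits are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical two-state DFA for the pattern [0-9]+ .
   State 0 is the start state, state 1 is accepting, -1 means "no transition". */
enum { CLASS_DIGIT = 0, CLASS_OTHER = 1, NUM_CLASSES = 2 };

static const int dfa_next[2][NUM_CLASSES] = {
    /* state 0 */ { 1, -1 },
    /* state 1 */ { 1, -1 },
};
static const int dfa_accepting[2] = { 0, 1 };

/* Return the length of the longest prefix of s accepted by the DFA,
   or 0 if no prefix is accepted. */
size_t match_digits(const char *s)
{
    assert(s != NULL);
    int state = 0;
    size_t i = 0, last_accept = 0;
    while (s[i] != '\0') {
        int cls = isdigit((unsigned char)s[i]) ? CLASS_DIGIT : CLASS_OTHER;
        int next = dfa_next[state][cls];
        if (next < 0)
            break;                  /* no transition: stop scanning */
        state = next;
        i++;
        if (dfa_accepting[state])
            last_accept = i;        /* remember the last accepting position */
    }
    return last_accept;
}
```

Under these assumptions, match_digits("123abc") would report a match of length 3, and match_digits("abc") would report no match. A generated scanner follows the same loop, only with much larger tables covering all the patterns at once.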

Input File
A Lex input file is divided into three parts, separated by lines containing only %%:

    /* declarations */
    %%
    /* rules */
    %%
    /* auxiliary functions */

Input File: Declarations
The declarations part assigns names to regular expressions, in the form:

    <name> <regular_exp>

It can also include C code between %{ and %}, each written in the first column of its own line. This is typically used for #include directives and for declaring variables that the actions use. Options can be specified with the syntax %option.

Input File: Rules
The rules part specifies what to do when a regular expression is matched:

    <regular_exp> <action>

An action is an ordinary C statement, or a block of C statements enclosed in { }.

Input File: Auxiliary Functions
The auxiliary functions part contains only C code. It includes the definition of every function needed in the rules part. It can also contain the main() function if the scanner is going to be used as a standalone program; in that case main() must call yylex().

Code Generated by Lex
The output of Lex is a file called lex.yy.c. It defines a C function, yylex(), that returns an integer, i.e., a code for a token. If the file also contains a definition of main(), it can be compiled and run as a standalone program. Otherwise, it is compiled and linked with other code that declares the function externally as int yylex().

How matching is done
When run, the generated scanner analyses its input looking for strings that match any of its patterns, and executes the action of the pattern that matches. If more than one pattern matches, it selects the regular expression matching the longest string. If two or more patterns match strings of the same length, the one listed first is selected. If no pattern matches, the default rule is executed: the next character in the input is copied to the output.
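The selection policy (longest match first, then earliest rule) can be sketched in plain C. This is an illustration under assumptions, not how the generated scanner is structured internally: the three matcher functions and select_rule are hypothetical, and stand in for the rules "if", [a-z]+ and [0-9]+ listed in that order.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it matches. */
typedef size_t (*matcher_fn)(const char *s);

static size_t match_if(const char *s)       /* the literal "if" */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s)    /* [a-z]+ */
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_number(const char *s)   /* [0-9]+ */
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order they would appear in the rules part. */
static const matcher_fn rules[] = { match_if, match_ident, match_number };
enum { RULE_IF = 0, RULE_IDENT = 1, RULE_NUMBER = 2, RULE_DEFAULT = -1 };

/* Pick the rule with the longest match; on a tie the earlier rule wins.
   If nothing matches, fall back to the default rule, which consumes
   exactly one character. The matched length is stored in *len. */
int select_rule(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    int best = RULE_DEFAULT;
    size_t best_len = 0;
    for (int i = 0; i < (int)(sizeof rules / sizeof rules[0]); i++) {
        size_t n = rules[i](s);
        if (n > best_len) {     /* strictly longer only, so ties keep the earlier rule */
            best_len = n;
            best = i;
        }
    }
    *len = (best == RULE_DEFAULT && *s != '\0') ? 1 : best_len;
    return best;
}
```

With these rules, the input "if" selects RULE_IF (both "if" and [a-z]+ match two characters, and "if" is listed first), while "iffy" selects RULE_IDENT because [a-z]+ matches four characters. This is why keyword rules are normally listed before the identifier rule.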

Regular expressions
    <char>   ::= a                     the character a
    <char>   ::= "s"                   the string s, even if s contains metacharacters
    <char>   ::= \a                    the character a when a is a metacharacter (e.g., *)
    <char>   ::= .                     any character except newline
    <regexp> ::= [<char>+]             any of the characters <char>
    <regexp> ::= [<char1>-<char2>]     any character from <char1> to <char2>
    <regexp> ::= [^<char>+]            any character except those in <char>
    <regexp> ::= <regexp1>*            zero or more repetitions of <regexp1>
    <regexp> ::= <regexp1>+            one or more repetitions of <regexp1>
    <regexp> ::= <regexp1>?            zero or one repetition of <regexp1>
    <regexp> ::= <regexp1>|<regexp2>   <regexp1> or <regexp2>
    <regexp> ::= <regexp1><regexp2>    <regexp1> followed by <regexp2>
    <regexp> ::= (<regexp1>)           same as <regexp1>
    <regexp> ::= {<name>}              the named regular expression from the definitions part

Internal names
Inside an action, the rules can refer to the following names:
    yytext   the string being matched (the lexeme)
    yyleng   the length of yytext
    yyin     the input file
    yyout    the output file
    ECHO     the action of the default rule (copies yytext to yyout)
    yylval   the global variable for communicating the attribute of a token to the parser

Using Lex
There are several versions of Lex; we are going to use flex. To maximize compatibility, use the -l option when running flex, and put %option noyywrap in the definitions part of the Lex input file.

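Example 1

    %{
    #include <stdio.h>
    %}
    %option noyywrap
    %%
    [0-9]+   { printf("%s\n", yytext); }   /* print each number on its own line */
    .|\n     ;                             /* discard everything else */
    %%
    int main(void) { yylex(); return 0; }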

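Example 2

    %{
    #include <stdio.h>
    int c = 0, w = 0, l = 0;   /* characters, words, lines */
    %}
    %option noyywrap
    word  [^ \t\n]+
    eol   \n
    %%
    {word}  { w++; c += yyleng; }
    {eol}   { c++; l++; }
    .       { c++; }
    %%
    int main(void) {
        yylex();
        printf("%d %d %d\n", l, w, c);
        return 0;
    }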
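For comparison, the following hand-written C function is a sketch of what the three rules of Example 2 compute, applied to a string instead of an input stream. The struct counts type and the function count_text are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the l, w and c counters of Example 2. */
struct counts { int lines, words, chars; };

/* Count lines, words and characters in s the way the three rules do. */
struct counts count_text(const char *s)
{
    assert(s != NULL);
    struct counts k = { 0, 0, 0 };
    size_t i = 0;
    while (s[i] != '\0') {
        if (s[i] == '\n') {                         /* rule {eol} */
            k.chars++;
            k.lines++;
            i++;
        } else if (s[i] == ' ' || s[i] == '\t') {   /* rule . (blank or tab) */
            k.chars++;
            i++;
        } else {                                    /* rule {word}: longest run */
            size_t start = i;
            while (s[i] != '\0' && s[i] != ' ' && s[i] != '\t' && s[i] != '\n')
                i++;
            k.words++;
            k.chars += (int)(i - start);            /* same as c += yyleng */
        }
    }
    return k;
}
```

For the input "hello world\nfoo\n" this gives 2 lines, 3 words and 16 characters. The Lex version expresses the same logic in three one-line rules, with the scanning loop supplied by the generated yylex().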

Example 3

    %{
    #include <stdio.h>
    int tokencount = 0;
    %}
    %option noyywrap
    %%
    [a-zA-Z]+      { printf("%d WORD \"%s\"\n", ++tokencount, yytext); }
    [0-9]+         { printf("%d NUMBER \"%s\"\n", ++tokencount, yytext); }
    [^a-zA-Z0-9]+  { printf("%d OTHER \"%s\"\n", ++tokencount, yytext); }
    %%
    int main(void) { yylex(); return 0; }

Example 4

    %{
    #include <stdio.h>
    int lineno = 1;
    %}
    %option noyywrap
    line  .*\n
    %%
    {line}  { printf("%5d %s", lineno++, yytext); }   /* prefix each line with its number */
    %%
    int main() { yylex(); return 0; }

Example 5

    %{
    #include <stdio.h>
    %}
    %option noyywrap
    comment_line  \/\/.*\n
    %%
    {comment_line}  { printf("%s\n", yytext); }   /* print lines that start with // */
    .*\n            ;                             /* discard every other line */
    %%
    int main() { yylex(); return 0; }

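Example 7

    %{
    #include <stdio.h>
    #include <stdlib.h>   /* atoi */
    %}
    %option noyywrap
    digit   [0-9]
    number  {digit}+
    %%
    {number}  { int n = atoi(yytext); printf("%x", n); }   /* print the number in hexadecimal */
    .         { ; }                                        /* discard other characters */
    %%
    int main() { yylex(); return 0; }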

Example 8 (1/4)

    %{
    #include "globals.h"
    #include "util.h"
    #include "scan.h"

    int lineno = 0;
    FILE *listing;   // used to output source code listing
    FILE *code;      // used to output assembly code
    FILE *source;    // used to input tiny program source
    int TraceScan = 1;
    int EchoSource = 1;

    /* lexeme of identifier or reserved word */
    char tokenstring[maxtokenlen+1];
    %}
    digit       [0-9]
    number      {digit}+
    letter      [a-zA-Z]
    identifier  {letter}+
    newline     \n
    whitespace  [ \t]+

Example 8 (2/4)

    %%
    "if"      {return IF;}
    "then"    {return THEN;}
    "else"    {return ELSE;}
    "end"     {return END;}
    "repeat"  {return REPEAT;}
    "until"   {return UNTIL;}
    "read"    {return READ;}
    "write"   {return WRITE;}
    ":="      {return ASSIGN;}
    "="       {return EQ;}
    "<"       {return LT;}
    "+"       {return PLUS;}
    "-"       {return MINUS;}
    "*"       {return TIMES;}
    "/"       {return OVER;}
    "("       {return LPAREN;}
    ")"       {return RPAREN;}
    ";"       {return SEMI;}

Example 8 (3/4)

    {number}      {return NUM;}
    {identifier}  {return ID;}
    {newline}     {lineno++;}
    {whitespace}  {/* skip whitespace */}
    "{"           { char c;
                    do {
                        c = input();
                        if (c == '\n') lineno++;
                    } while (c != '}');
                  }
    .             {return ERROR;}
    %%

The catch-all rule . is listed last. Because ties between matches of the same length go to the rule listed first, placing it before {number} or {identifier} would make it win on every one-character number or identifier.

Example 8 (4/4)

    TokenType gettoken(void)
    {
        static int firsttime = TRUE;
        TokenType currenttoken;
        if (firsttime) {
            firsttime = FALSE;
            lineno++;
            yyin = source = stdin;
            yyout = listing = stdout;
        }
        currenttoken = (TokenType)yylex();
        strncpy(tokenstring, yytext, maxtokenlen);
        if (TraceScan) {
            fprintf(listing, "\t%d: ", lineno);
            printtoken(currenttoken, tokenstring);
        }
        return currenttoken;
    }

    int main() {
        TraceScan = TRUE;
        while (gettoken() != ENDFILE)
            ;
        return 0;
    }