Flex/Bison Tutorial. Aaron Myles Landwehr aron+ta@udel.edu CAPSL 2/17/2012



Similar documents
COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Compiler I: Syntax Analysis Human Thought

Introduction to Lex. General Description Input file Output file How matching is done Regular expressions Local names Using Lex

03 - Lexical Analysis

Compiler Construction

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

CA Compiler Construction

Compilers Lexical Analysis

Programming Languages CIS 443

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Semantic Analysis: Types and Type Checking

Programming Project 1: Lexical Analyzer (Scanner)

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

Compiler Construction

Static vs. Dynamic. Lecture 10: Static Semantics Overview 1. Typical Semantic Errors: Java, C++ Typical Tasks of the Semantic Analyzer

C Compiler Targeting the Java Virtual Machine

A Lex Tutorial. Victor Eijkhout. July Introduction. 2 Structure of a lex file

Theory of Compilation

CSCI 3136 Principles of Programming Languages

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Context free grammars and predictive parsing

Antlr ANother TutoRiaL

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Syntaktická analýza. Ján Šturc. Zima 208

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

Chapter 2: Elements of Java

ANTLR - Introduction. Overview of Lecture 4. ANTLR Introduction (1) ANTLR Introduction (2) Introduction

Lecture 18 Regular Expressions

Language Processing Systems

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

LEX/Flex Scanner Generator

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Lecture 9. Semantic Analysis Scoping and Symbol Table

The programming language C. sws1 1

Acknowledgements. In the name of Allah, the Merciful, the Compassionate

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Programming Languages

Introduction to Python

LEX & YACC TUTORIAL. by Tom Niemann. epaperpress.com

Compiler Construction using Flex and Bison

A Grammar for the C- Programming Language (Version S16) March 12, 2016

Syntax Check of Embedded SQL in C++ with Proto

Chapter One Introduction to Programming

Introduction to Data Structures

FIRST and FOLLOW sets a necessary preliminary to constructing the LL(1) parsing table

The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2).

Download at Boykma.Com

Translating to Java. Translation. Input. Many Level Translations. read, get, input, ask, request. Requirements Design Algorithm Java Machine Language

Compiler Design July 2004

Regular Expressions and Automata using Haskell

Yacc: Yet Another Compiler-Compiler

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

Textual Modeling Languages

Beyond the Mouse A Short Course on Programming

Base Conversion written by Cathy Saxton

Reading 13 : Finite State Automata and Regular Expressions

Lex et Yacc, exemples introductifs

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Introduction to Automata Theory. Reading: Chapter 1

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Introduction to Python

Anatomy of Programming Languages. William R. Cook

University of Wales Swansea. Department of Computer Science. Compilers. Course notes for module CS 218

Evaluation of JFlex Scanner Generator Using Form Fields Validity Checking

arrays C Programming Language - Arrays

How to Write a Simple Makefile

Design Patterns in Parsing

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Scanner. tokens scanner parser IR. source code. errors

Computer Programming Tutorial

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Moving from CS 61A Scheme to CS 61B Java

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

The C Programming Language course syllabus associate level

6.170 Tutorial 3 - Ruby Basics

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

If-Then-Else Problem (a motivating example for LR grammars)

The previous chapter provided a definition of the semantics of a programming

Computer Science 281 Binary and Hexadecimal Review

Sources: On the Web: Slides will be available on:

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

CS143 Handout 08 Summer 2008 July 02, 2007 Bottom-Up Parsing

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Automata Theory. Şubat 2006 Tuğrul Yılmaz Ankara Üniversitesi

Computer Science 217

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

A Programming Language Where the Syntax and Semantics Are Mutable at Runtime

C++ INTERVIEW QUESTIONS

Semester Review. CSC 301, Fall 2015

FTP client Selection and Programming

Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking

Transcription:

Flex/Bison Tutorial Aaron Myles Landwehr aron+ta@udel.edu 1

GENERAL COMPILER OVERVIEW 2

Compiler Overview Frontend Middle-end Backend Lexer / Scanner Parser Semantic Analyzer Optimizers Code Generator 3

Lexer/Scanner Lexical Analysis process of converting a sequence of characters into a sequence of tokens. foo = 1-3**2 Lexeme foo Token Type Variable = Assignment Operator 1 Number - Subtraction Operator 3 Number ** Power Operator 2 Number 4

Parser Syntactic Analysis The process of analyzing a sequence of tokens to determine its grammatical structure. Syntax errors are identified during this stage. Lexeme Token Type foo Variable = Assignment Operator foo = - 1 Number - Subtraction Operator 3 Number ** Power Operator 1 ** 3 2 2 Number 5

Semantic Analyzer Semantic Analysis The process of performing semantic checks. E.g. type checking, object binding, etc. Code: float a = "example"; Semantic Check Error: error: incompatible types in initialization 6

Optimizer(s) Compiler Optimizations tune the output of a compiler to minimize or maximize some attributes of an executable computer program. Make programs faster, etc 7

Code Generator Code Generation process by which a compiler's code generator converts some intermediate representation of source code into a form (e.g., machine code) that can be readily executed by a machine. int foo() { } return 345; foo: addiu $sp, $sp, -16 addiu $2, $zero, 345 addiu $sp, $sp, 16 jr $ra 8

LEX/FLEX AND YACC/BISON OVERVIEW 9

General Lex/Flex Information lex is a tool to generator lexical analyzers. It was written by Mike Lesk and Eric Schmidt (the Google guy). It isn t used anymore. flex (fast lexical analyzer generator) Free and open source alternative. You ll be using this. 10

General Yacc/Bison Information yacc Is a tool to generate parsers (syntactic analyzers). Generated parsers require a lexical analyzer. It isn t used anymore. bison Free and open source alternative. You ll be using this. 11

Lex/Flex and Yacc/Bison relation to a compiler toolchain Frontend Middle-end Backend Lexer / Scanner Parser Semantic Analyzer Optimizers Code Generator Lex/Flex (.l spec file) Yacc/Bison (.y spec file) 12

FLEX IN DETAIL 13

How Flex Works Flex uses a.l spec file to generate a tokenizer/scanner..l spec file flex lex.yy.c The tokenizer reads an input file and chunks it into a series of tokens which are passed to the parser. 14

Flex.l specification file /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} /* This tells flex to read only one input file */ %option noyywrap %% /*** Rules section ***/ /* [0-9]+ matches a string of one or more digits */ [0-9]+ { /* yytext is a string containing the matched text. */ printf("saw an integer: %s\n", yytext); }. \n { /* Ignore all other characters. */ } %% /*** C Code section ***/ 15

Flex Rule Format Matches text input via Regular Expressions Returns the token type. Format: REGEX { } /*Code*/ return TOKEN-TYPE; 16

Flex Regex Matching Rules Flex matches the token with the longest match: Input: abc Rule: [a-z]+ Token: abc(not "a" or "ab") Flex uses the first applicable rule: Input: post Rule1: "post" { printf("hello,"); } Rule2: [a-za-z]+ { printf ("World!"); } It will print Hello, (not World! ) 17

Flex Example [0-9]+ { } [A-Za-z]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } 18

Flex Example [0-9]+ { } [A-Za-z]+ { } Match one or more characters between 0-9. /*Code*/ yylval.dval = atof(yytext); return NUMBER; /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } 19

Flex Example [0-9]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; Store the Number. [A-Za-z]+ { } /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } 20

Flex Example [0-9]+ { } [A-Za-z]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; Return the token type. Declared in the.y file. /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } 21

Flex Example [0-9]+ { } [A-Za-z]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; Match one or more alphabetical characters. /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } 22

Flex Example [0-9]+ { } [A-Za-z]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD; Store the text.. { return yytext[0]; } 23

Flex Example [0-9]+ { } [A-Za-z]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD; Return the token type. Declared in the.y file.. { return yytext[0]; } 24

Flex Example [0-9]+ { } [A-Za-z]+ { Match any single character } /*Code*/ yylval.dval = atof(yytext); return NUMBER; /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } 25

Flex Example [0-9]+ { } /*Code*/ yylval.dval = atof(yytext); return NUMBER; [A-Za-z]+ { } /*Code*/ struct symtab *sp = symlook(yytext); yylval.symp = sp; return WORD;. { return yytext[0]; } Return the character. No need to create special symbol for this case. 26

BISON IN DETAIL 27

How Bison Works Bison uses a.y spec file to generate a parser..y spec file bison somename.tab.c somename.tab.h The parser reads a series of tokens and tries to determine the grammatical structure with respect to a given grammar. 28

What is a Grammar? A grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet (tokens) that are valid according to the language's syntax. 29

Simple Example Grammar E E + E E - E E * E E / E id Above is a simple grammar that allows recursive math operations 30

Simple Example Grammar E E + E E - E E * E E / E id These are productions 31

Simple Example Grammar E E + E E - E E * E E / E id In this case expressions (E) can be made up of the statements on the right. *Note: the order of the right side doesn t matter. 32

Simple Example Grammar E E + E E - E E * E E / E id How does this work when parsing a series of tokens? 33

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id Suppose we had the following tokens: 2 + 2-1 34

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id We start by parsing from the left. We find that we have an id. Suppose we had the following tokens: 2 + 2-1 35

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id An id is an expression. Suppose we had the following tokens: 2 + 2-1 36

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id Next it will match one of the rules based on the next token because the parser know 2 is an expression. Suppose we had the following tokens: 2 + 2-1 37

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id The production with the plus is matched because it is the next token in the stream. Suppose we had the following tokens: 2 + 2-1 38

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id Next we move to the next token which is an id and thus an expression. Suppose we had the following tokens: 2 + 2-1 39

Simple Example Grammar Lexeme Token Type 2 Number + Addition Operator 2 Number - Subtraction Operator 1 Number E E + E E - E E * E E / E id We know that E + E is an expression. So we can apply the same ideas and move on until we finish parsing Suppose we had the following tokens: 2 + 2-1 40

Bison.y specification file /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} %token <symp> NAME %token <dval> NUMBER %left '-' '+' %left '*' '/' %type <dval> expression %% /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } %% /*** C Code section ***/ 41

Bison: definition Section Example /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} %token <symp> NAME %token <dval> NUMBER %left '-' '+' %left '*' '/' %type <dval> expression 42

Bison: definition Section Example /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} %token <symp> NAME %token <dval> NUMBER Declaration of Tokens: %token <TYPE> NAME %left '-' '+' %left '*' '/' %type <dval> expression 43

Bison: definition Section Example /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} Lower Higher %token <symp> NAME %token <dval> NUMBER %left '-' '+' %left '*' '/' %type <dval> expression Operator Precedence and Associativity 44

Bison: definition Section Example /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} %token <symp> NAME %token <dval> NUMBER %left '-' '+' %left '*' '/' Associativity Options: %left - a OP b OP c %right - a OP b OP c %nonassoc - a OP b OP c (ERROR) %type <dval> expression 45

Bison: definition Section Example /*** Definition section ***/ %{ /* C code to be copied verbatim */ %} %token <symp> NAME %token <dval> NUMBER %left '-' '+' %left '*' '/' %type <dval> expression Defined non-terminal name (the left side of productions) 46

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } 47

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } This is the grammar for bison. It should look similar to the simple example grammar from before. 48

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } What this says is that a statement list is made up of a statement OR a statement list followed by a statement. 49

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } The same logic applies here also. The first production is an assignment statement, the second is a simple expression. 50

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } This simply says that an expression is a number or a name. 51

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } This is an executable statement. These are found to the right of a production. When the rule is matched, it is run. In this particular case, it just says to return the value. 52

Bison: rules Section Example /*** Rules section ***/ statement_list: statement '\n' statement_list statement '\n' 1 2 3 statement: NAME '=' expression { $1->value = $3; } expression { printf("= %g\n", $1); } expression: NUMBER NAME { $$ = $1->value; } The numbers in the executable statement correspond to the tokens listed in the production. They are numbered in ascending order. 53

ABOUT YOUR ASSIGNMENT 54

What you need to do You are given a prefix calculator. + 2 4 You need to make infix and postfix versions of the calculator. 2 + 4 2 4 + You then need to add support for additional operators to all three calculators. 55

Hints Name your calculators infix and postfix. You don t need to change the c code section of the.y. You may need to define new tokens for parts of the assignment. 56

Credit Wikipedia Most of the content is from or based off of information from here. Wookieepedia Nothing was taken from here. Not even this picture of Chewie. 2008 Tutorial * *From Wikipedia: qualifies as fair use under United States Copyright law. 57