Compiler Construction



Similar documents
CSCI 3136 Principles of Programming Languages

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Programming Languages CIS 443

CA Compiler Construction

Compiler I: Syntax Analysis Human Thought

Programming Languages

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

03 - Lexical Analysis

Semantic Analysis: Types and Type Checking

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Compiler Construction

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

C Compiler Targeting the Java Virtual Machine

Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives

An Incomplete C++ Primer. University of Wyoming MA 5310

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Comp151. Definitions & Declarations

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

COMPUTER SCIENCE 1999 (Delhi Board)

PART-A Questions. 2. How does an enumerated statement differ from a typedef statement?

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

Language Processing Systems

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Basics of I/O Streams and File I/O

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

Appendix K Introduction to Microsoft Visual C++ 6.0

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

How Compilers Work. by Walter Bright. Digital Mars

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

13 File Output and Input

Programming Project 1: Lexical Analyzer (Scanner)

Moving from CS 61A Scheme to CS 61B Java

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Storage Classes CS 110B - Rule Storage Classes Page 18-1 \handouts\storclas

Embedded SQL. Unit 5.1. Dr Gordon Russell, Napier University

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Lecture 9. Semantic Analysis Scoping and Symbol Table

J a v a Quiz (Unit 3, Test 0 Practice)

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Static vs. Dynamic. Lecture 10: Static Semantics Overview 1. Typical Semantic Errors: Java, C++ Typical Tasks of the Semantic Analyzer

MPLAB TM C30 Managed PSV Pointers. Beta support included with MPLAB C30 V3.00

Introduction to Programming (in C++) Loops. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Example. Introduction to Programming (in C++) Loops. The while statement. Write the numbers 1 N. Assume the following specification:

csce4313 Programming Languages Scanner (pass/fail)

Chapter 6: Programming Languages

Chapter One Introduction to Programming

Passing 1D arrays to functions.

CpSc212 Goddard Notes Chapter 6. Yet More on Classes. We discuss the problems of comparing, copying, passing, outputting, and destructing

02 B The Java Virtual Machine

2) Write in detail the issues in the design of code generator.

The C Programming Language course syllabus associate level

C++ Outline. cout << "Enter two integers: "; int x, y; cin >> x >> y; cout << "The sum is: " << x + y << \n ;

What is a Loop? Pretest Loops in C++ Types of Loop Testing. Count-controlled loops. Loops can be...

Informatica e Sistemi in Tempo Reale

C++FA 5.1 PRACTICE MID-TERM EXAM

13 Classes & Objects with Constructors/Destructors

Lecture 3. Arrays. Name of array. c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] c[8] c[9] c[10] c[11] Position number of the element within array c

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Semester Review. CSC 301, Fall 2015

IS0020 Program Design and Software Tools Midterm, Feb 24, Instruction

Keil C51 Cross Compiler

Symbol Tables. Introduction

The C++ Language. Loops. ! Recall that a loop is another of the four basic programming language structures

Explain the relationship between a class and an object. Which is general and which is specific?

CISC 181 Project 3 Designing Classes for Bank Accounts

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

C++ INTERVIEW QUESTIONS

The programming language C. sws1 1

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Introduction to Java

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Lecture 1: Introduction

Chapter 7D The Java Virtual Machine

CE 504 Computational Hydrology Computational Environments and Tools Fritz R. Fiedler

Introduction to Lex. General Description Input file Output file How matching is done Regular expressions Local names Using Lex

Pemrograman Dasar. Basic Elements Of Java

File class in Java. Scanner reminder. Files 10/19/2012. File Input and Output (Savitch, Chapter 10)

El Dorado Union High School District Educational Services

Glossary of Object Oriented Terms

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

Java Programming Fundamentals

Software: Systems and Application Software

Sources: On the Web: Slides will be available on:

AP Computer Science Java Subset

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

The previous chapter provided a definition of the semantics of a programming

Data Structures using OOP C++ Lecture 1

Simple C Programs. Goals for this Lecture. Help you learn about:

ALLIED PAPER : DISCRETE MATHEMATICS (for B.Sc. Computer Technology & B.Sc. Multimedia and Web Technology)

Member Functions of the istream Class

Moving from C++ to VBA

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

Programming Languages

Transcription:

Compiler Construction Lecture 1 - An Overview 2003 Robert M. Siegfried All rights reserved A few basic definitions Translate - v, a.to turn into one s own language or another. b. to transform or turn from one of symbols into another Translator - n, someone or something that translates. Compilers are translators that produce object code (machine-runnable version) from source code (humanreadable version). Interpreters are translators that translate only as much as is necessary to run the next statement of the program.

Source language program Source Language - the language in which the source code is written Target Language - the language in which the object code is written Implementation Language - Language in which the compiler is written Compiler Target language program Example: C++ or Java program Compiler Pentium machine language program Choice of an Implementation Language The implementation language for compilers used to be assembly language. It is now customary to write a compiler in the source language. Why? The compiler itself can then be used as a sample program to test the compiler s ability to translate complex programs that utilize the various features of the source language.

Source Code The Compiling Process Compiler Object Module Linker Executable version Assembler version The Interpretation Process Source Code Interpreter Intermediate Interpreter Output Version Input

Source language - designed to be machine-translatable ( Context-free grammar ) e.g., FORTRAN, COBOL, Pascal, C, BASIC, LISP Portable, i.e., programs can be moved from one computer to another with minimal or no rewriting. The Level of Abstraction matches the problem and not the hardware. Does not require an intimate knowledge of the computer hardware Assembly language - machine acronyms for machine language commands. e.g., mov ax, 3 Eliminates the worst of the details, but leaves many to be dealt with. Object Module - a machine language version of the program lacking some necessary references. e.g., on the Intel 8x86 (in real mode) 1011 1 000 0000 0000 0000 0003 mov (from register 16-bit AX the immediate value to immediate) value reg. Load Module - a machine language version that is complete with addresses of all variables and routines.

Other types of Compilers There are compilers that do not necessarily follow this model: Load-and-go compilers generate executable code without the use of a linker. Cross compilers run on one type of computer and generate translations for other classes of computers. Cross-language compilers translate from one high-level language to another. (e.g., C++ to C) The organization of a compiler The various components of a compiler are organized into a front end and a back end. The front end is designed to produce some intermediate representation of a program written in the source language The back end is designed to produce a program for a target computer from the intermediate representation. Source language program Front end Intermediate Language program Back end Machine Language Program

Why Separate Front and Back Ends? BASIC PC COBOL C++ Java Ada JVM MacIntosh Linux Workstation Why Generate Intermediate Code? BASIC COBOL C++ Java Ada Quadruples IBM PC JVM MacIntosh Linux Workstation

Components of a Compiler - The Front End Source Code Lexical Analyzer (Scanner) Tokens Syntactic Analyzer (Parser) Parse tree Intermediate Code Intermediate Code Generator Annotated AST Semantic Analyzer Components of a Compiler - The Back End Intermediate Code Machine- Independent Optimizer Optimized Intermediate Code Object Code Generator Object Code Optimized Object Code Machine- Dependent Optimizer

Lexical Analysis The lexical analyzer (or scanner) breaks up the stream of text into a stream of strings called lexemes (or token strings) The scanner checks one character at a time until it determines that it has found a character which does not belong in the lexeme. The scanner looks it up in the symbol table (inserting it if necessary) and determines the token associated with that lexeme. Lexical Analysis (continued) Token - the language component that the character string read represents. Scanners usually reads the text of the program either a line or a block at a time. (File I/O is rather inefficient compared to other operations within the compiler.

Syntactic Analysis A syntactic analyzer (or parser) takes the stream of tokens determines the syntactic structure of the program. The parser creates a structure called a parse tree. The parser usually does not store the parse in memory or on disk, but it does formally recognize program s the grammatical structure Syntactic Analysis (continued) The grammar of a language is expressed formally as G = (T, N, S, P) where T is a set of terminals (the basic, atomic symbols of a language). N is a set of nonterminals (symbols which denote particular arrangements of terminals). S is the start symbol (a special nonterminal which denotes the program as a whole). P is the set of productions (rules showing how terminals and nonterminal can be arranged to form other nonterminals.

Syntactic Analysis (continued) An example of terminal would be PROGRAM, ID, and :=. An example of a nonterminal would be Program, Block and Statement. The start symbol in most cases would be Program An example of a production would be Block ::= BEGIN Statements END Semantic Analysis Semantic analysis involves ensuring that the semantics (or meaning) of the program is correct. It is quite possible for a program to be correct syntactically and to be correct semantically. Semantic analysis usually means making sure that the data types and control structures of a program are used correctly.

Semantic Analysis (continued) The various semantic analysis routines are usually incorporated into the parser and do not usually comprise a separate phase of the compiling process. The process of generating an intermediate representation (usually an abstract syntax tree) is usually directed by the parsing of the program. A More Realistic View of the Front End Lexical Analyzer (Scanner) Source Code gettoken() tokens Syntactic Analyzer (Parser) Annotated AST Call actions return Semantic Actions Intermediate Code Intermediate Code Generator

Error detection in Source Programs All the previous stages analyze the program, looking for potential errors. FOR i!= 1 TO n DO WriteLn; Lexical error IF x > N THEN Y := -3; ELSE Y := 3; Syntactic error Error Detection in Source Programs PROGRAM Average; VAR Average : Integer; Sum, Val1, Val2, Val3 : Real; BEGIN Val1 := 6.0; Val2 := 4; Val3 := 37.5; Mixed-typed assignment Sum := Val1 + Val2 + Val3; Average := (Val1 + Val2 + Val3) DIV 3 END. { Average Semantic error

Intermediate Code Generation The intermediate code generator creates a version of the program in some machineindependent language that is far closer to the target language than to the source language. The abstract syntax tree may serve as an intermediate representation. Object Code Generation The object code generator creates a version of the program in the target machine s own language. The process is significantly different from intermediate code generation. It may create an assembly language version of the program, although this is not the usual case.

An example of the compiling process int main() { float average; int x[3]; int i, sum; x[0] = 3; x[1] = 6; x[2] = 10; sum = 0; for (i = 0; i < 3; i++) sum += x[i]; average := Sum/3; An example of Lexical Analysis The tokens are: INT ID ( ) { FLOAT ID ; INT ID [ NUMLITERAL ] ; INT ID, ID ; ID [ NUMLITERAL ] = NUMLITERAL ; and so on

A sample parse tree Program functiondeclarator typespecifier functiondefinition functionbody INT declarator ID ( ) type-decllist functionbody { statement list The corresponding Abstract Syntax Tree ID INT VOID decls Statements ID ID ID

The intermediate code for the example main: x[0] = 3 x[1] = 6 x[2] = 10 sum = 0 i =0 t1: if i >= 3 goto t2: t3 : = x[i] Sum : = Sum + t3 goto t1 t2: Average := Sum / 3 Basic Block Basic Block The assembler code for the example _main PROC NEAR ; COMDAT ; File C:\MyFiles\Source\avg3\avg3.c ; Line 4 push ebp mov ebp, esp sub esp, 88 push ebx push esi push edi.. mov DWORD PTR _x$[ebp], 3 mov DWORD PTR _x$[ebp+4], 6 mov DWORD PTR _x$[ebp+8], 10 mov DWORD PTR _sum$[ebp], 0

The Symbol Table The symbol table tracks all symbols used in a given program. This includes: Key words Standard identifiers Numeric, character and other literals User-defined data types User-defined variables The Symbol Table (continued) Symbol tables must contain: Token class Lexemes Scope Types Pointers to other symbol table entries (as necessary)

Shaper - an example of a translator Shaper is a microscopic language which draws rectangles, square and right isosceles triangles on the screen. Shaper has three statements: RECTANGLE {WIDE or LONG Number BY Number SQUARE SIZE Number TRIANGLE SIZE Number Example RECTANGLE LONG 6 by 5 RECTANGLE WIDE 15 BY 30 SQUARE SIZE 9 TRIANGLE SIZE 5 The Shaper Translator #include #include #include #include #include <iostream.h> <fstream.h> <ctype.h> <stdlib.h> <string.h> enum tokentype {tokby, tokeof, tokerror, tokrectangle, toksize, toksquare, toktriangle, tokwide; char *tokenname[] = {"by","eof", "error", "long", "number", "rectangle", "size","square", "triangle", "wide";

const int filenamesize = 40, tokenstringlength = 15, numtokens = 10; int wordsearch(char *test, char *words[], int len); class scanner { public: scanner(int argcount, char *arg[]); scanner(void); ~scanner(void); tokentype scan(char tokenstring[]); private: tokentype scanword(char c, char tokenstring[]); tokentype scannum(char c, char tokenstring[]); ifstream infile; ; scanner::scanner(int argcount, char *arg[]) { char filename[filenamesize]; // If there is only one argument, it must be // the program file for Shaper. That means // that we need the source file. // If there are two arguments, we have it // already as the second argument. If there // are more, there must be a mistake. if (argcount == 1) { cout << "Enter program file name\t?"; cin >> filename; else if (argcount == 2) strcpy(filename, arg[1]);

else { cerr << "Usage: Shaper <filename>\n"; exit(1); infile.open(filename, ios::in); if (!infile) { cerr << "Cannot open " << filename << endl; exit(1); // scanner() - Default constructor for the // scanner scanner::scanner(void) { char filename[filenamesize]; cout << "Enter program file name\t?"; cin >> filename; // Open the input file infile.open(filename, ios::in); if (!infile) { cerr << "Cannot open " << filename << endl; exit(1);

scanner::~scanner(void) { infile.close(); //scan() - Scan out the words of the language tokentype scanner::scan(char tokenstring[]) { char c; // Skip the white space in the program while (!infile.eof() && isspace(c=infile.get())) ; // If this is the end of the file, send the // token that indicates this if (infile.eof()) return(tokeof);

//If it begins with a letter, it is a word. If //begins with a digit, it is a number. Otherwise, //it is an error. if (isalpha(c)) return(scanword(c, tokenstring)); else if (isdigit(c)) return(scannum(c, tokenstring)); else return(tokerror); //scanword() - Scan until you encounter // something other than a letter. // It uses a binary search to find // the appropriate token in the // table. tokentype scanner::scanword(char c, char tokenstring[]) { int i = 0; tokentype tokenclass; // Build the string one character at a time. // It keep scanning until either the end of // file or until it encounters a non-letter tokenstring[i++] = c;

while (!infile.eof() && isalpha(c = infile.get())) tokenstring[i++] = c; tokenstring[i] ='\0'; // Push back the last character infile.putback(c); // Is this one of the legal keywords for // Shaper? If not, it's an error if ((tokenclass = (tokentype)wordsearch(tokenstring, tokenname, numtokens)) == -1) return(tokerror); else return(tokenclass); //scannum() - It returns the token toknumber. // The parser will receive the // number as a string and is // responsible for converting it // into numerical form. tokentype scanner::scannum(char c, char tokenstring[]) { int i = 0; // Scan until you encounter something that // cannot be part of a number or the end of // file tokenstring[i++] = c;

while (!infile.eof() && tokenstring[i++] = c; isdigit(c = infile.get())) tokenstring[i] = '\0'; // Push back the last character infile.putback(c); return(toknumber); Managing the Symbol Table //wordsearch() - A basic binary search to find a // string in an array of strings int wordsearch(char *test, char *words[], int len) { int low = 0, mid, high = len - 1; // Keep searching as long as we haven't // searched the whole array while (low <= high) { mid = (low + high)/2; if (strcmp(test,words[mid]) < 0) // search the lower half high = mid - 1;

else if (strcmp(test,words[mid]) > 0) // search the upper half low = mid + 1; else // We found it!! return(mid); // It isn't there return(-1); Parsing A Shaper Program class parser : scanner { public: parser(int argcount, char *args[]); parser(void); void ProcProgram(void); private: void ProcRectangle(void); void ProcSquare(void); void ProcTriangle(void); tokentype tokenclass; char tokenstring[tokenstringlength]; ;

// parser() - A constructor that passes // initial values to the base // class parser::parser(int argcount, char *args[]) : scanner (argcount,args) { // Get the first token tokenclass = scan(tokenstring); // parser() - A default constructor parser::parser(void) { // Get the first token tokenclass = scan(tokenstring); void parser::procprogram(void) { // Get a token and depending on that token's // value, parse the statement. while (tokenclass!= tokeof) switch(tokenclass) { case tokrectangle: ProcRectangle(); tokenclass = scan(tokenstring); break; case toksquare: ProcSquare(); tokenclass = scan(tokenstring); break;

case toktriangle: ProcTriangle(); tokenclass = scan(tokenstring); break; default: cerr << tokenstring << " is not a legal << statement\n" << endl; exit(3); //ProcRectangle() - Parse the rectangle // command and if there // are no errors, it will / produce a rectangle // on the whose dimensions // are set by the // rectangle statement. void parser::procrectangle(void) { int shape, columns, rows; char tokenstring[tokenstringlength]; // The next word should be wide or long to // indicate whether there are more rows or // columns. This is not really necessary for // the statement to work correctly, but is a // good simple illustration of how type // checking works.

if ((tokenclass = scan(tokenstring))!= tokwide && tokenclass!= toklong) { cerr << "Expected \"wide\" or \"long\ << instead of " << tokenstring << endl; exit(4); // Get the number of columns and if it is a // number if ((tokenclass = scan(tokenstring))!= toknumber) { cerr << "Expected number instead of " exit(5); << tokenstring << endl; // The token by is simply a separator but the // grammar requires it. if ((tokenclass = scan(tokenstring))!= tokby){ cerr << "Expected \"by\" instead of " << tokenstring << endl; // Get the number of rows and if it is a // number if ((tokenclass = scan(tokenstring))!= toknumber) { cerr << "Expected number instead of " << tokenstring << endl; exit(5);

Adding the Semantic Actions to ProcRectangle void parser::procrectangle(void) { int shape, columns, rows; chartokenstring[tokenstringlength]; // The next word should be wide or long to indicate // whether there are more rows or columns. This is // not really necessary for the statement to work // correctly, but is a good simple illustration of // how type checking works. if ((tokenclass = scan(tokenstring))!= tokwide && tokenclass!= toklong) { cerr << "Expected \"wide\" or \"long\" instead << of << tokenstring << endl; exit(4); // The shape is indicated by whether this // token was wide or long shape = tokenclass; // Get the number of columns and if it is a number, // convert the character string into an integer if ((tokenclass = scan(tokenstring))!= toknumber) { cerr << "Expected number instead of " << tokenstring << endl; exit(5); columns = atoi(tokenstring); // The token by is simply a separator but the // grammar requires it. if ((tokenclass = scan(tokenstring))!= tokby){ cerr << "Expected \"by\" instead of " << tokenstring << endl;

// Get the number of rows and if it is a // number, convert the character string into // an integer. if ((tokenclass = scan(tokenstring))!= toknumber) { cerr << "Expected number instead of " << tokenstring << endl; exit(5); rows = atoi(tokenstring); // A long rectangle should have more rows than // columns and a wide rectangle will have the // opposite. This illustrates how type // checking works on a facile level. if (shape == toklong && columns < rows shape == tokwide && columns > rows) { cerr << "A " << tokenname[shape] << " rectangle cannot be " << columns << " by " << rows << endl; exit(6); DrawRectangle(columns, rows);