Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science



Similar documents
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Lecture 9. Semantic Analysis Scoping and Symbol Table

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Syntaktická analýza. Ján Šturc. Zima 208

Static vs. Dynamic. Lecture 10: Static Semantics Overview 1. Typical Semantic Errors: Java, C++ Typical Tasks of the Semantic Analyzer

Java Application Developer Certificate Program Competencies

C Compiler Targeting the Java Virtual Machine

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

CS143 Handout 18 Summer 2012 July 16 th, 2012 Semantic Analysis

Compiler I: Syntax Analysis Human Thought

Semantic Analysis: Types and Type Checking

Master of Sciences in Informatics Engineering Programming Paradigms 2005/2006. Final Examination. January 24 th, 2006

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

C++ INTERVIEW QUESTIONS

Debugging. Common Semantic Errors ESE112. Java Library. It is highly unlikely that you will write code that will work on the first go

Dynamic Programming Problem Set Partial Solution CMPSC 465

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

The Cool Reference Manual

Intermediate Code. Intermediate Code Generation

Rigorous Software Development CSCI-GA

Compiler Construction

Programming Project 1: Lexical Analyzer (Scanner)

Stack Allocation. Run-Time Data Structures. Static Structures

1.00 Lecture 1. Course information Course staff (TA, instructor names on syllabus/faq): 2 instructors, 4 TAs, 2 Lab TAs, graders

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Programming Languages

The C Programming Language course syllabus associate level

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Using Eclipse CDT/PTP for Static Analysis

The Mjølner BETA system

ProfBuilder: A Package for Rapidly Building Java Execution Profilers Brian F. Cooper, Han B. Lee, and Benjamin G. Zorn

Yacc: Yet Another Compiler-Compiler

Project 2: Bejeweled

Glossary of Object Oriented Terms

A Grammar for the C- Programming Language (Version S16) March 12, 2016

Massachusetts Institute of Technology 6.005: Elements of Software Construction Fall 2011 Quiz 1 October 14, Instructions

Database Programming with PL/SQL: Learning Objectives

Programming Languages CIS 443

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Intel Assembler. Project administration. Non-standard project. Project administration: Repository

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

Java Generation from UML Models specified with Alf Annotations

Symbol Tables. Introduction

03 - Lexical Analysis

Redesign and Enhancement of the Katja System

Lab Experience 17. Programming Language Translation

How To Write A Test Engine For A Microsoft Microsoft Web Browser (Php) For A Web Browser For A Non-Procedural Reason)

Likewise, we have contradictions: formulas that can only be false, e.g. (p p).

ANTLR - Introduction. Overview of Lecture 4. ANTLR Introduction (1) ANTLR Introduction (2) Introduction

I. INTRODUCTION. International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 2, Mar-Apr 2015

An Eclipse Plug-In for Visualizing Java Code Dependencies on Relational Databases

ARIZONA CTE CAREER PREPARATION STANDARDS & MEASUREMENT CRITERIA SOFTWARE DEVELOPMENT,

Designing with Exceptions. CSE219, Computer Science III Stony Brook University

Textual Modeling Languages

Integrating VoltDB with Hadoop

Standard for Software Component Testing

Java Basics: Data Types, Variables, and Loops

Antlr ANother TutoRiaL

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

Software Testing. Definition: Testing is a process of executing a program with data, with the sole intention of finding errors in the program.

Managing Variability in Software Architectures 1 Felix Bachmann*

Bottom-Up Parsing. An Introductory Example

Bitrix Site Manager 4.1. User Guide

Tutorial on C Language Programming

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

TypeScript for C# developers. Making JavaScript manageable

Programming Languages

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

The previous chapter provided a definition of the semantics of a programming

COMP 110 Prasun Dewan 1

Chapter 8: Bags and Sets

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

Inside the Java Virtual Machine

COSC Introduction to Computer Science I Section A, Summer Question Out of Mark A Total 16. B-1 7 B-2 4 B-3 4 B-4 4 B Total 19

Java Interfaces. Recall: A List Interface. Another Java Interface Example. Interface Notes. Why an interface construct? Interfaces & Java Types

Ontology Model-based Static Analysis on Java Programs

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

AP Computer Science Java Subset

Theory of Compilation

arrays C Programming Language - Arrays

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

Technical paper review. Program visualization and explanation for novice C programmers by Matthew Heinsen Egan and Chris McDonald.

Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl

The Java Series. Java Essentials I What is Java? Basic Language Constructs. Java Essentials I. What is Java?. Basic Language Constructs Slide 1

Transcription:

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2010 Handout Semantic Analysis Project Tuesday, Feb 16 DUE: Monday, Mar 1 Extend your compiler to find, report, and recover from semantic errors in Decaf programs. Most semantic errors can be checked by testing the rules enumerated in the section Semantic Rules of Decaf Spec. These rules are repeated in this handout as well. However, you should read Decaf Spec in its entirety to make sure that your compiler catches all semantic errors implied by the language definition. We have attempted to provide a precise statement of the semantic rules. If you feel some semantic rule in the language definition may have multiple interpretations, you should work with the interpretation that sounds most reasonable to you and clearly list your assumptions in your project documentation. This part of the project includes the following tasks: 1. Createahigh-levelintermediaterepresentation(IR)tree. Youcandothiseitherbyinstructing ANTLR to create one for you, adding actions to your grammar to build a tree, or walking a generic ANTLR tree and building a tree on the way. The problem of designing an IR will be discussed in the lectures; some hints are given in the final section of this handout. When running in debug mode, your driver should pretty-print the constructed IR tree in some easily-readable form suitable for debugging. 2. Build symbol tables for the es. (A symbol table is an environment, i.e. a mapping from identifiers to semantic objects such as variable declarations. Environments are structured hierarchically, in parallel with source-level constructs, such as -bodies, method-bodies, loop-bodies, etc.) 3. Perform all semantic checks by traversing the IR and accessing the symbol tables. Note: the run-time checks are not required for this assignment. What to Hand In Follow the directions given in project overview handout when writing up your project. Your design documentation should include a description of how your IR and symbol table structures are organized, as well as a discussion of your design for performing the semantic checks. Each group will be assigned an athena group and shared space in the course locker. Each group must place their completed submission at: Groups may use: /mit/6.035/group/group/submit/group-semantics.tar.gz /mit/6.035/group/group/ as a place to collaborate and share code. It is recommended that you use revision control and place your repositories in this shared space. Submitted tarballs should have the following structure: * Athena is MIT's UNIX-based computing environment. OCW does not provide access to it.

GROUPNAME-semantics.tar.gz -- GROUPNAME-semantics -- AUTHORS (list of students in your group, one per line) -- code... (full source code, can build by running ant ) -- doc... (write-up, described in project overview handout) -- dist -- Compiler.jar (compiled output, for automated testing) You should be able to run your compiler from the command line with: java -jar dist/compiler.jar -target inter <filename> The resulting output to the terminal should be a report of all errors encountered while compiling the file. Your compiler should give reasonable and specific error messages (with line and column numbers and identifier names) for all errors detected. It should avoid reporting multiple error messages for the same error. For example, if y has not been declared in the assignment statement x=y+1;, the compiler should report only one error message for y, rather than one error message for y, another error message for the +, and yet another error message for the assignment. After you implement the static semantic checker, your compiler should be able to detect and report all static (i.e., compile-time) errors in any input Decaf program, including lexical and syntax errors detected by previous phases of the compiler. In addition, your compiler should not report any messages for valid Decaf programs. However, we do not expect you to avoid reporting spurious error messages that get triggered by error recovery. It is possible that your compiler might mistakenly report spurious semantic errors in some cases depending on the effectiveness of your parser s syntactic error recovery. As mentioned, your compiler should have a debug mode in which the IR and symbol table data structures constructed by your compiler are printed in some form. This can be run from the command line by java -jar dist/compiler.jar -target inter -debug <filename> Test Cases The test cases provided for this project are in: /mit/6.035/provided/semantics/ Read the comments in the test cases to see what we expect your compiler to do. Points will be awarded based on how well your compiler performs on these and hidden tests cases. Complete documentation is also required for full credit. 2

Grading Script As with the previous project, we are providing you with the grading script we will use to test your code (except for the write-up). This script only shows you results for the public test cases. The script can be found on athena in the course locker: The script takes your tarball as the only arg: /mit/6.035/provided/gradingscripts/p2grader.py /mit/6.035/provided/gradingscripts/p2grader.py GROUPNAME-parser.tar.gz Or it works on the extracted directory: /mit/6.035/provided/gradingscripts/p2grader.py GROUPNAME-parser/ Please test your submission against this script. It may help you debug your code and it will help make the grading process go more smoothly. Semantic Rules These rules place additional constraints on the set of valid Decaf programs besides the constraints implied by the grammar. A program that is grammatically well-formed and does not violate any of the following rules is called a legal program. A robust compiler will explicitly check each of these rules, and will generate an error message describing each violation it is able to find. A robust compiler will generate at least one error message for each illegal program, but will generate no errors for a legal program. 1. No identifier is declared twice in the same scope. 2. No identifier is used before it is declared. 3. The program contains a definition for a method called main that has no parameters (note that since execution starts at method main, any methods defined after main will never be executed). 4. The int literal in an array declaration must be greater than 0. 5. The number and types of arguments in a method call must be the same as the number and types of the formals, i.e., the signatures must be identical. 6. If a method call is used as an expression, the method must return a result. 7. A return statement must not have a return value unless it appears in the body of a method that is declared to return a value. 8. The expression in a return statement must have the same type as the declared result type of the enclosing method definition. 9. An id used as a location must name a declared local/global variable or formal parameter. 3

10. For all locations of the form id [ expr ] (a) id must be an array variable, and (b) the type of expr must be int. 11. The expr in an if statement must have type boolean. 12. The operands of arith op s and rel op s must have type int. 13. The operands of eq op s must have the same type, either int or boolean. 14. The operands of cond op s and the operand of logical not (!) must have type boolean. 15. The location and the expr in an assignment, location = expr, must have the same type. 16. The location andthe expr in anincrementing/decrementingassignment, location += expr and location -= expr, must be of type int. 17. The initial expr and the ending expr of for must have type int. 18. All break and continue statements must be contained within the body of a for. Implementation Suggestions You will need to declare es for each of the nodes in your IR. In many places, the hierarchy ofir node es will resemblethelanguagegrammar. For example, apartofyourinheritance tree might look like this (where indentation represents inheritance): abstract Ir abstract abstract abstract abstract. IrExpression IrLiteral IrIntLiteral IrBooleanLiteral IrCallExpr IrMethodCallExpr IrCalloutExpr IrBinopExpr IrStatement IrAssignStmt IrPlusAssignStmt IrBreakStmt IrContinueStmt IrIfStmt IrForStmt IrReturnStmt IrInvokeStmt IrBlock IrClassDecl IrMemberDecl IrMethodDecl IrFieldDecl 4

.. IrVarDecl IrType Classes such as these implement the abstract syntax tree of the input program. In its simplest form, each is simply a tuple of its subtrees, for example: public IrBinopExpr extends IrExpression { private final int operator; private final IrExpression lhs; + private final IrExpression rhs; / \ } lhs rhs or: public IrAssignStmt extends IrStatement { := private final IrLocation lhs; / \ private final IrExpression rhs; lhs rhs } In addition, you ll need to define es for the semantic entities of the program, which represent abstract properties (e.g. expression types, method signatures, descriptors, etc.) and to establish the correspondences between them. Some examples: every expression has a type; every variable declaration introduces a variable; every block defines a scope. Many of these properties are derived by recursive traversals over the tree. As far as possible, you should try to make instances of the the symbol es canonical, so that they may be compared using reference equality. You are strongly advised to design these es with care, employing good software-engineering practice and documenting them, as you will be living with them for the next few months! All error messages should be accompanied by the filename, line and column number of the token most relevant to the error message (use your judgement here). This means that, when building your abstract-syntax tree (or AST), you must ensure that each IR node contains sufficient information for you to determine its line number at some later time. It is not appropriate to throw an exception when encountering an error in the input: doing so would lead to a design in which at most one error message is reported for each run of the compiler. A good front-end saves the user time by reporting multiple errors before stopping, allowing the programmer to make several corrections before having to restart compilation. Semantic checking should be done top-down. While the type-checking component of semantic checking can be done in bottom-up fashion, other kinds of checks (for example, detecting uses of undefined variables) can not. There are two ways of achieving this. The first is to make use of parser actions in the middle of productions. This approach may require less code but can be more complex, because more work needs to be done directly within the ANTLR infrastructure. A cleaner approach is to invoke your semantic checker on a complete AST after parsing has finished. The pseudocode for block in this approach would resemble this: 5

void checkblock(envstack envs, Block b) { envs.push(new Env()); foreach s in b.statements checkstatement(envs, s); envs.pop(); } In this pseudocode, a new environment is created and pushed on the environment stack, the body of the block is then checked in the context of this environment stack, and then the new environment is discarded when the end of the block is reached. The semantic check can thus be expressed as a visitor (which should be familiar from Design Patterns or 6.170) over the AST. (Since code generation, certain optimizations, and prettyprinting can also be naturally expressed as visitors, we recommend this approach for a cleaner implementation.) The treatment of negative integer literals requires some care. Recall from the previous handout that negative integer literals are in fact two separate tokens: the positive integer literal and a preceding -. Whenever your top-down semantic checker finds a unary minus operator, it should check if its operand is a positive integer literal, and if so, replace the subtree (both nodes) by a single, negative integer literal. 6

MIT OpenCourseWare http://ocw.mit.edu 6.035 Computer Language Engineering Spring 2010 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.