Villanova University CSC 2400: Computer Systems I



Similar documents
Compiler Construction

Unix Shell Scripts. Contents. 1 Introduction. Norman Matloff. July 30, Introduction 1. 2 Invoking Shell Scripts 2

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Programming Languages CIS 443

Repetition Using the End of File Condition

Some Scanner Class Methods

JavaScript: Introduction to Scripting Pearson Education, Inc. All rights reserved.

Variables, Constants, and Data Types

ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters

.NET Standard DateTime Format Strings

A Lex Tutorial. Victor Eijkhout. July Introduction. 2 Structure of a lex file

CS 1133, LAB 2: FUNCTIONS AND TESTING

C++ INTERVIEW QUESTIONS

CS 241 Data Organization Coding Standards

The programming language C. sws1 1

Compiler I: Syntax Analysis Human Thought

Scanner. tokens scanner parser IR. source code. errors

Sources: On the Web: Slides will be available on:

Regular Expressions. In This Appendix

How To Write Portable Programs In C

Chapter One Introduction to Programming

HTML Web Page That Shows Its Own Source Code

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

How to Write a Simple Makefile

CS106A, Stanford Handout #38. Strings and Chars

C Examples! Jennifer Rexford!

Member Functions of the istream Class

Regular Expressions and Automata using Haskell

CS 106 Introduction to Computer Science I

03 - Lexical Analysis

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Outline Basic concepts of Python language

Lecture 22: C Programming 4 Embedded Systems

IMPORTANT USER INFORMATION

The C++ Language. Loops. ! Recall that a loop is another of the four basic programming language structures

Compilers Lexical Analysis

Simple C Programs. Goals for this Lecture. Help you learn about:

Pemrograman Dasar. Basic Elements Of Java

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

Sequential Program Execution

C++ Language Tutorial

JavaScript: Control Statements I

About The Tutorial. Audience. Prerequisites. Copyright & Disclaimer

Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July The JSON Data Interchange Format. Reference number ECMA-123:2009

Introduction to Automata Theory. Reading: Chapter 1

Learn Perl by Example - Perl Handbook for Beginners - Basics of Perl Scripting Language

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Introduction to Visual C++.NET Programming. Using.NET Environment

Lecture 18 Regular Expressions

Moving from CS 61A Scheme to CS 61B Java

csce4313 Programming Languages Scanner (pass/fail)

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

C / C++ and Unix Programming. Materials adapted from Dan Hood and Dianna Xu

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

Real SQL Programming 1

Appendix K Introduction to Microsoft Visual C++ 6.0

Basics of I/O Streams and File I/O

Informatica e Sistemi in Tempo Reale

Sample CSE8A midterm Multiple Choice (circle one)

So far we have considered only numeric processing, i.e. processing of numeric data represented

Objective-C Tutorial

COS 217: Introduction to Programming Systems

How Strings are Stored. Searching Text. Setting. ANSI_PADDING Setting

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

New York University Computer Science Department Courant Institute of Mathematical Sciences

Decision Logic: if, if else, switch, Boolean conditions and variables

Objects for lexical analysis

Introduction to Java

Lecture 5: Java Fundamentals III

Logistics. Software Testing. Logistics. Logistics. Plan for this week. Before we begin. Project. Final exam. Questions?

A Brief Introduction to the Use of Shell Variables

Hands-On UNIX Exercise:

Programming Project 1: Lexical Analyzer (Scanner)

Object Oriented Software Design

AN INTRODUCTION TO UNIX

C++ Essentials. Sharam Hekmat PragSoft Corporation

C++ Input/Output: Streams

Controlling LifeSize Video Systems from the CLI

Perl version documentation - Digest::SHA NAME SYNOPSIS SYNOPSIS (HMAC-SHA) Page 1

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

SQL Server Database Coding Standards and Guidelines


Lecture 4. Regular Expressions grep and sed intro

As previously noted, a byte can contain a numeric value in the range Computers don't understand Latin, Cyrillic, Hindi, Arabic character sets!

PART-A Questions. 2. How does an enumerated statement differ from a typedef statement?

10CS35: Data Structures Using C

Scanner. It takes input and splits it into a sequence of tokens. A token is a group of characters which form some unit.

Introduction to Lex. General Description Input file Output file How matching is done Regular expressions Local names Using Lex

File class in Java. Scanner reminder. Files 10/19/2012. File Input and Output (Savitch, Chapter 10)

University of Hull Department of Computer Science. Wrestling with Python Week 01 Playing with Python

FTP client Selection and Programming

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Importing Xerox LAN Fax Phonebook Data from Microsoft Outlook

Regular Expression Syntax

Embedded SQL programming

Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl

Transcription:

Villanova University CSC 2400: Computer Systems I A "De-Comment" Program Purpose The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/Unix programming tools, especially bash, emacs, and gcc. Background The C preprocessor is an important part of the C programming system. Given a C source code file, the C preprocessor performs three jobs: Merge "physical" lines of source code into "logical" lines. That is, when the preprocessor detects a line that ends with the backslash character, it merges that physical line with the next physical line to form one logical line. Remove comments from ("de-comment") the source code. Handle preprocessor directives (#define, #include, etc.) that reside in the source code. The de-comment job is substantial. For example, the C preprocessor must be sensitive to: The fact that a comment is a token delimiter. After removing a comment, the C preprocessor must make sure that a whitespace character is in its place. Line numbers. After removing a comment, the C preprocessor sometimes must insert blank lines in its place to preserve the original line numbering. String and character literal boundaries. The preprocessor must not consider the character sequence (/*...*/) to be a comment if it occurs inside a string literal ("...") or character literal ('...').

Your Task Your task is to compose a C program named "decomment" that performs a subset of the de-comment job of the C preprocessor, as defined below. Functionality Your program should be a Unix "filter." That is, your program should read characters from the standard input stream, and write characters to the standard output stream and possibly to the standard error stream. Specifically, your program should (1) read text, presumably a C program, from the standard input stream, and (2) write that same text to the standard output stream with each comment replaced by a space. A typical execution of your program from the shell might look like this: decomment < somefile.c > somefilewithoutcomments.c In the following examples a space character is shown as " s " and a newline character as " n ". Your program should replace each comment with a space. Examples: abc/*def*/ghi n abc/*def*/ s ghi n abc s ghi n abc ss ghi n abc s /*def*/ghi n abc ss ghi n Your program should consider text of the form (/*... */) to be a comment. It should not consider text of the form (//... ) to be a comment. Example: abc//def n abc//def n Your program should allow a comment to span multiple lines. That is, your program should allow a comment to contain newline characters. Your program should add blank lines as necessary to preserve the original line numbering. Examples: abc/*def n ghi*/jkl n mno n abc sn jkl n mno n abc/*def n ghi n jkl*/mno n pqr n abc snn mno n pqr n

Your program should not recognize nested comments. Example: abc/*def/*ghi*/jkl*/mno n abc s jkl*/mno n Your program should detect an unterminated comment. If your program detects end-offile before a comment is terminated, it should write the message "Error: line X: unterminated comment" to the standard error stream. "X" should be the number of the line on which the unterminated comment begins. Examples: Error Message abc/*def n ghi n abc snn Error: s line s 1: s unterminated s comment n abcdef n ghi/* n abcdef n ghi sn Error: s line s 2: s unterminated s comment n abc/*def/ghi n jkl n abc snn Error: s line s 1: s unterminated s comment n abc/*def*ghi n jkl n abc snn Error: s line s 1: s unterminated s comment n abc/*def n ghi* n abc snn Error: s line s 1: s unterminated s comment n abc/*def n ghi/ n abc snn Error: s line s 1: s unterminated s comment n Your program should work for standard input lines of any length. Design Design your program as a deterministic finite state automaton (DFA, alias FSA). The FSA concept is described in wikipedia, http://en.wikipedia.org/wiki/finite-state_machine. Generally, a (large) C program should consist of multiple source code files. For this assignment, you need not split your source code into multiple files. Instead you may place all source code in a single source code file. Subsequent assignments will ask you to write programs consisting of multiple source code files. We suggest that your program use the standard C getchar() function to read characters from the standard input stream. Also use enum or define constructs for the states of your FSA in your C implementation. Logistics You should create your program on tanner using bash, emacs and gcc. Step 1: Design a DFA Express your DFA using the traditional "ovals and labeled arrows" notation. More precisely, use the same notation as is used in the examples shown in class. Capture as much of the program's logic as you can within your DFA. The more logic you express in your DFA, the better your grade on the DFA will be.

Step 2: Create Source Code Use emacs to create source code in a file named decomment.c that implements your DFA. Step 3: Preprocess, Compile, Assemble, and Link Use the gcc command to preprocess, compile, assemble, and link your program. Perform each step individually, and examine the intermediate results to the extent possible. Step 4: Execute Execute your program multiple times on various input files that test all logical paths through your code. You should also test your decomment program against its own source code using a command sequence such as this: decomment < decomment.c > output Step 5: Create a readme File Use Emacs to create a text file named "readme" (not "readme.txt", or "README", or "Readme", etc.) that contains: Your name and the assignment number. A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course Policies web page. An indication of how much time you spent doing the assignment. Your assessment of the assignment: Did it help you to learn? What did it help you to learn? Do you have any suggestions for improvement? Etc. Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs. Descriptions of your code should not be in the readme file. Instead they should be integrated into your code as comments. Your readme file should be a plain text file. Don't create your readme file using Microsoft Word or any other word processor. Step 6: Submit Hand in a printout of your decomment.c file, your readme file, and a hardcopy of your "circles and labeled arrows" DFA. A DFA drawn using drawing software (e.g. Microsoft PowerPoint) would be good, but it is sufficient to submit a neatly hand-drawn DFA.

Grading We will grade your work on two kinds of quality: quality from the user's point of view, and quality from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc generates warning messages. From the user's point of view, a program has quality if it behaves as it should. The correct behavior of the decomment program is defined by the previous sections of this assignment specification. From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, style is defined by the rules summarized in the Basic Rules of Programming Style document. These additional rules apply: Names: You should use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix "c" might indicate that the variable is of type char, "i" might indicate int, "ui" might mean unsigned int, etc. But it is fine to use another style -- a style that does not include the type of a variable in its name -- as long as the result is a clear and readable program. Comments: Each source code file should begin with a comment that includes your name, the number of the assignment, and the name of the file. Include comments in your source code to explain what blocks of code do. OPTIONAL Functionality (Bonus Points) You may expand your code to handle string literals and character literals as follows (just like a real preprocessor does). Text of the form (/*... */) that occurs within a string literal ("...") should not be considered a comment. Examples: abc"def/*ghi*/jkl"mno n abc/*def"ghi"jkl*/mno n abc/*def"ghijkl*/mno n abc"def/*ghi*/jkl"mno n Similarly, text of the form (/*... */) that occurs within a character literal ('...') should not be considered a comment. Examples: abc'def/*ghi*/jkl'mno n abc/*def'ghi'jkl*/mno n abc/*def'ghijkl*/mno n abc'def/*ghi*/jkl'mno n

Note that the C compiler would consider the first of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program should not. Your program should handle escaped characters within string literals. That is, when your program reads a backslash (\) while processing a string literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ("...\"...") to be a valid string literal which happens to contain the double quote character. Examples: abc"def\"ghi"jkl n abc"def\'ghi"jkl n abc"def\"ghi"jkl n abc"def\'ghi"jkl n Similarly, your program should handle escaped characters within character literals. When your program reads a backslash (\) while processing a character literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ('...\'...') to be a valid character literal which happens to contain the quote character. Examples: abc'def\'ghi'jkl n abc'def\"ghi'jkl n abc'def\'ghi'jkl n abc'def\"ghi'jkl n Note that the C compiler would consider both of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program should not. Your program should handle newline characters in C string literals without generating errors or warnings. Examples: abc"def n ghi"jkl n abc"def n ghi n jkl"mno/*pqr*/stu n abc"def n ghi"jkl n abc"def n ghi n jkl"mno s stu n Note that a C compiler would consider those examples to be erroneous (newline character in a string literal). But many C preprocessors would not, and your program should not. Similarly, your program should handle newline characters in C character literals without generating errors or warnings. Examples: abc'def n ghi'jkl n abc'def n ghi n jkl'mno/*pqr*/stu n abc'def n ghi'jkl n abc'def n ghi n jkl'mno s stu n

Note that a C compiler would consider those examples to be erroneous (multiple characters in a character literal, newline character in a character literal). But many C preprocessors would not, and your program should not. Your program should handle unterminated string and character literals without generating errors or warnings. Examples: abc"def/*ghi*/jkl n abc'def/*ghi*/jkl n abc"def/*ghi*/jkl n abc'def/*ghi*/jkl n Note that a C compiler would consider those examples to be erroneous (unterminated string literal, unterminated character literal, multiple characters in a character literal). But many C preprocessors would not, and your program should not. Acknowledgement This project has been designed by Prof. Jennifer Rexford from Princeton University, slightly modified by Prof. Mirela Damian.