Symbol Table
ALSU Textbook Chapter 2.7 and 6.5
Tsan-sheng Hsu
tshsu@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/~tshsu

Definition
  Symbol table: a data structure used by a compiler to keep track of the semantics of names.
    - Data type.
    - When it is used: the scope, i.e., the effective context in which a name is valid.
    - Where it is stored: the storage address.
  Operations:
    - Find: check whether a name has been used.
    - Insert: add a name.
    - Delete: remove a name when its scope is closed.
  Compiler notes #5, 20130517, Tsan-sheng Hsu

Some possible implementations
  - Unordered list: suitable for a very small set of variables; coding is easy, but performance is poor for a large number of variables.
  - Ordered linear list: lookup uses binary search; insertion and deletion are expensive; coding is relatively easy.
  - Binary search tree: O(log n) time per operation (search, insert or delete) for n variables on average, degrading to O(n) in the worst case unless the tree is kept balanced; coding is relatively difficult.
  - Hash table: most commonly used; very efficient provided the table is adequately larger than the number of variables; performance may be bad if unlucky or if the table is saturated; coding is not too difficult.

Hash table
  Hash function h(n): returns a value from 0, ..., m - 1, where n is the input name and m is the hash table size.
    - It should distribute names uniformly and randomly.
    - Many good designs are possible:
        - Add up the integer values of the characters in a name, then take the remainder of the sum divided by m.
        - Add up a linear combination of the integer values of the characters in a name, then take the remainder of the sum divided by m.
  Resolving collisions:
    - Linear resolution: try (h(n) + 1) mod m, where m is a large prime number, and then (h(n) + 2) mod m, ..., (h(n) + i) mod m.
    - Chaining: most popular. Keep a chain of the items with the same hash value.
    - Quadratic rehashing: try (h(n) + 1^2) mod m, then (h(n) + 2^2) mod m, ..., (h(n) + i^2) mod m.
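
The two hash designs and the two open-addressing probe sequences above can be sketched in C as follows. This is only an illustration: the function names and the multiplier 31 used for the linear combination are assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Additive hash: sum the character codes, then reduce modulo m. */
unsigned hash_sum(const char *name, unsigned m)
{
    assert(m > 0);
    unsigned h = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h += *p;
    return h % m;
}

/* Linear-combination hash: each character is weighted by its position
 * (Horner's rule). The multiplier 31 is an illustrative choice. */
unsigned hash_linear(const char *name, unsigned m)
{
    assert(m > 0);
    unsigned h = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = h * 31u + *p;
    return h % m;
}

/* i-th slot tried by linear resolution: (h + i) mod m. */
unsigned probe_linear(unsigned h, unsigned i, unsigned m)
{
    return (h + i) % m;
}

/* i-th slot tried by quadratic rehashing: (h + i^2) mod m. */
unsigned probe_quadratic(unsigned h, unsigned i, unsigned m)
{
    return (h + i * i) % m;
}
```

With m = 101, the additive hash maps both "ab" and "ba" to slot 94, because a plain sum ignores character order; the position-weighted version separates them.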

Performance of hash table
  Performance varies with the collision resolution scheme used.
  The hash table size must be adequately larger than the maximum number of possible entries.
  Frequently used names should hash to distinct values:
    - Keywords or reserved words.
    - Short names, e.g., i, j and k.
    - Frequently used identifiers, e.g., main.
  Hash values should be uniformly distributed.

Contents in a symbol table
  Possible entries in a symbol table:
    - Name: a string.
    - Attribute:
        - Reserved word
        - Variable name
        - Type name
        - Procedure name
        - Constant name
    - Data type.
    - Storage allocation, size, ...
    - Scope information: where and when it can be used.

How names are stored
  Fixed-length name: allocate a fixed amount of space for each name.
    - Too little: names must be short.
    - Too much: a lot of space is wasted.
        NAME         ATTRIBUTES   STORAGE ADDR
        sort
        a
        readarray
        i2
  Variable-length name: a single string space is used to store all names.
    - For each name, store its starting index and its length (in this example the length counts the terminating $).
        NAME               ATTRIBUTES   STORAGE ADDR
        index   length
        0       5
        5       2
        7       10
        17      3
        position:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
        character:  s  o  r  t  $  a  $  r  e  a  d  a  r  r  a  y  $  i  2  $
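
A minimal C sketch of the variable-length scheme, assuming a fixed-size string space and a '$' terminator as in the example. The names pool_add, pool_equals and POOL_SIZE are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 256          /* illustrative capacity */

static char pool[POOL_SIZE];   /* one string space shared by all names */
static int  pool_used = 0;

struct name_ref {
    int index;    /* starting position in the pool */
    int length;   /* characters stored, including the '$' terminator */
};

/* Append a name followed by '$' and return its (index, length) pair. */
struct name_ref pool_add(const char *name)
{
    int len = (int)strlen(name);
    assert(pool_used + len + 1 <= POOL_SIZE);
    struct name_ref r = { pool_used, len + 1 };
    memcpy(pool + pool_used, name, (size_t)len);
    pool[pool_used + len] = '$';
    pool_used += len + 1;
    return r;
}

/* Compare a stored name against a C string. */
int pool_equals(struct name_ref r, const char *name)
{
    size_t len = strlen(name);
    return (int)len + 1 == r.length &&
           memcmp(pool + r.index, name, len) == 0;
}
```

Adding sort, a, readarray and i2 in that order yields the (index, length) pairs (0, 5), (5, 2), (7, 10) and (17, 3) shown in the table.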

Handling block structures
  Nested blocks mean nested scopes.
  Two major ways for implementation:
    - Approach 1: multiple symbol tables in one stack.
    - Approach 2: one symbol table with chaining.

Sample code: block structure
    int main(void)   /* C code */
    {   /* open a new scope */
        int H,A,L;          /* parse point A */
        {   /* open another new scope */
            float x,y,H;    /* parse point B */
            /* x and y can only be used here */
            /* H used here is float */
        }   /* close an old scope */
        /* H used here is integer */
        {
            char A,C,M;     /* parse point C */
            /* A used here is char */
        }
    }

Multiple symbol tables in one stack
  An individual symbol table for each scope.
    - Use a stack to maintain the current scope.
    - Search the top of the stack first.
    - If the name is not found, search the next table in the stack.
    - Use the first match.
  Note: a popped scope can be destroyed in a one-pass compiler, but it must be saved in a multi-pass compiler.
  Stack contents for the sample block-structure code (top of stack first; the searching direction is from top to bottom):
      parse point A        parse point B        parse point C
                           S.T. for x, y, H     S.T. for A, C, M
      S.T. for H, A, L     S.T. for H, A, L     S.T. for H, A, L
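
The stack-of-tables approach might be sketched in C like this. For brevity each per-scope table is a small unordered array rather than a hash table; all names (st_open_scope, st_declare, st_lookup, ...) and size limits are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 16   /* illustrative limits */
#define MAX_SYMS   32

struct st_symbol { const char *name; const char *type; };
struct st_table  { struct st_symbol syms[MAX_SYMS]; int count; };

static struct st_table st_stack[MAX_SCOPES];
static int st_top = -1;              /* index of the innermost open scope */

/* Opening a scope pushes a fresh, empty table. */
void st_open_scope(void)
{
    assert(st_top + 1 < MAX_SCOPES);
    st_stack[++st_top].count = 0;
}

/* Closing a scope pops its table (a one-pass compiler may discard it). */
void st_close_scope(void)
{
    assert(st_top >= 0);
    st_top--;
}

/* Declare a name in the innermost scope. */
void st_declare(const char *name, const char *type)
{
    assert(st_top >= 0);
    struct st_table *t = &st_stack[st_top];
    assert(t->count < MAX_SYMS);
    t->syms[t->count].name = name;
    t->syms[t->count].type = type;
    t->count++;
}

/* Search from the top of the stack downward; the first match wins. */
const char *st_lookup(const char *name)
{
    for (int s = st_top; s >= 0; s--)
        for (int i = 0; i < st_stack[s].count; i++)
            if (strcmp(st_stack[s].syms[i].name, name) == 0)
                return st_stack[s].syms[i].type;
    return NULL;   /* undeclared */
}
```

Closing a scope is a single pop, which is the main attraction of this approach; the cost is that every open scope carries its own table.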

Pros and cons for multiple symbol tables
  Advantage:
    - Easy to close a scope.
  Disadvantage: difficulties are encountered when a new scope is opened.
    - An adequate number of entries must be allocated for each symbol table if it is a hash table.
    - This wastes a lot of space.
        - A block within a procedure does not usually have many local variables.
        - There may be many global variables, and many local variables when a procedure is entered.

One symbol table with chaining (1/2)
  A single global table marked with the scope information.
    - Each scope is given a unique scope number.
    - Incorporate the scope number into the symbol table.
  Two possible codings (among others):
    - Hash table with chaining.
        - Chain at the front when names hash into the same location.
  Hash chains for the sample block-structure code, written as name(scope number) with the newest entry first:
      parse point B:
          H(2) -> H(1)
          A(1)
          L(1)
          x(2)
          y(2)
      parse point C:
          A(3) -> A(1)
          H(1)
          L(1)
          C(3)
          M(3)
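
A minimal C sketch of a single hash table with scope numbers and chaining at the front. The names (ht_open_scope, ht_declare, ...) and the table size are assumptions. Closing a scope here scans every bucket, which is simple but slower than keeping a per-scope list of entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS  11    /* illustrative table size */
#define MAX_DEPTH 32

struct ht_entry {
    const char      *name;
    const char      *type;
    int              scope;   /* unique scope number */
    struct ht_entry *next;    /* next entry in the same bucket */
};

static struct ht_entry *ht_bucket[NBUCKETS];
static int ht_open[MAX_DEPTH];    /* scope numbers of the open scopes */
static int ht_depth = 0;
static int ht_next_scope = 1;

static unsigned ht_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Open a scope and return its newly assigned scope number. */
int ht_open_scope(void)
{
    assert(ht_depth < MAX_DEPTH);
    ht_open[ht_depth++] = ht_next_scope;
    return ht_next_scope++;
}

/* Insert at the front of the chain, tagged with the current scope. */
void ht_declare(const char *name, const char *type)
{
    assert(ht_depth > 0);
    struct ht_entry *e = malloc(sizeof *e);
    assert(e != NULL);
    unsigned b = ht_hash(name);
    e->name  = name;
    e->type  = type;
    e->scope = ht_open[ht_depth - 1];
    e->next  = ht_bucket[b];
    ht_bucket[b] = e;
}

/* The first entry with a matching name belongs to the innermost scope. */
const struct ht_entry *ht_lookup(const char *name)
{
    for (const struct ht_entry *e = ht_bucket[ht_hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Entries of the closing scope sit at the head of their chains. */
void ht_close_scope(void)
{
    assert(ht_depth > 0);
    int scope = ht_open[--ht_depth];
    for (int b = 0; b < NBUCKETS; b++)
        while (ht_bucket[b] && ht_bucket[b]->scope == scope) {
            struct ht_entry *dead = ht_bucket[b];
            ht_bucket[b] = dead->next;
            free(dead);
        }
}
```

Because scopes nest and insertion is always at the front, the entries of the scope being closed are contiguous at the head of each chain, so removal never has to look past the first entry of an outer scope.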

One symbol table with chaining (2/2)
  A second coding choice:
    - Binary search tree with chaining.
        - Use a doubly linked list to chain all entries with the same name.
  Tree entries for the sample block-structure code, written as name(scope number) with same-name entries chained, newest first:
      parse point B:  A(1);  H(2) -> H(1);  L(1);  x(2);  y(2)
      parse point C:  A(3) -> A(1);  C(3);  H(1);  L(1);  M(3)

Pros and cons for a unique symbol table
  Advantage:
    - Does not waste space.
    - Little overhead in opening a scope.
  Disadvantage: it is difficult to close a scope.
    - Need to maintain a list of the entries in the same scope.
    - Use this list to close a scope, and to reactivate it for the second pass if needed.

Records and fields
  The with construct in PASCAL can be considered an additional scope rule.
    - Field names are visible in the scope that surrounds the record declaration.
    - Field names need only be unique within the record.
  Another example is the using namespace directive in C++.
  Example (PASCAL code):
      var A, R: record
                  A: integer;
                  X: record
                       A: real;
                       C: boolean;
                     end
                end;
      R.A := 3;            { means R.A := 3 }
      with R do A := 4;    { means R.A := 4 }

Implementation of field names
  Two choices for handling field names:
    1. Allocate a symbol table for each record type used.
         main symbol table
           A   record  -> another symbol table
                            A   integer
                            X   record  -> another symbol table
                                             A   real
                                             C   boolean
           R   record  -> another symbol table
                            A   integer
                            X   record  -> another symbol table
                                             A   real
                                             C   boolean
    2. Associate a record number with each field name.
         - Assign record number #0 to names that are not in records.
         - Searching the symbol table is somewhat time consuming.
         - Similar to the scope numbering technique.

Locating field names
  Example:
      with R do
      begin
        A := 3;
        with X do
          A := 3.3
      end
  If each record (each scope) has its own symbol table, then push the symbol table for the record onto the stack.
  If the record number technique is used, then keep a stack containing the current record numbers.
    - During searching, succeed only if an entry matches both the name and the current record number.
    - If this fails, use the next record number in the stack as the current record number and continue to search.
    - If everything fails, search the normal main symbol table.
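
The record-number search can be sketched in C as below. The flat array standing in for the symbol table, the struct layout, and the names find_in_record and resolve are all hypothetical; record number 0 denotes names outside any record, as on the previous slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sym {
    const char *name;
    int         record;   /* 0 for names that are not in records */
    const char *type;
};

/* Succeed only if both the name and the record number match. */
const struct sym *find_in_record(const struct sym *tab, int n,
                                 const char *name, int record)
{
    for (int i = 0; i < n; i++)
        if (tab[i].record == record && strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Try the record numbers on the with-stack from innermost to outermost,
 * then fall back to the main symbol table (record number 0). */
const struct sym *resolve(const struct sym *tab, int n, const char *name,
                          const int *with_stack, int depth)
{
    assert(depth >= 0);
    for (int d = depth - 1; d >= 0; d--) {
        const struct sym *s = find_in_record(tab, n, name, with_stack[d]);
        if (s != NULL)
            return s;
    }
    return find_in_record(tab, n, name, 0);
}
```

For the example, suppose R is given record number 1 and its field X record number 2. Inside "with R do" the stack is {1} and A resolves to the integer field; inside the nested "with X do" the stack is {1, 2} and A resolves to the real field.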

Overloading (1/3)
  A symbol may, depending on context, have more than one meaning.
  Examples:
    - Operators:
        I := I + 3;
        X := Y + 1.2;
    - Function call return value and recursive function call:
        f := f + 1;

Overloading (2/3)
  Implementation:
    - Link together all possible definitions of an overloaded name.
    - Call this an overloading chain.
    - Whenever a name that can be overloaded is defined:
        - if the name is already in the current scope, then add the new definition to the overloading chain;
        - if it is not already there, then enter the name in the current scope, and link the new entry to any existing definitions.
    - When the name is used, search the chain for an appropriate definition, depending on the context.
    - Whenever a scope is closed, delete the overloading definitions defined in this scope from the head of the chain.

Overloading (3/3)
  Example: PASCAL function name and return variable.
    - Within the function body, the two definitions (function call and return variable) are chained.
    - When the function body is closed, the return variable definition disappears.
  [PASCAL]
      function f: integer;
      begin
        if global > 1 then
          f := f + 1;   { left f: return variable; right f: function call }
      end

Forward reference
  Definition: a name that is used before its definition is given.
    - To allow mutually referenced and linked data types, names can sometimes be used before they are declared.
  Possible implementations:
    - Multi-pass compiler.
    - Back-patching: avoid resolving a symbol until all possible places where symbols can be declared have been seen.
  In C, ADA and other languages commonly used today, the scope of a declaration extends only from the point of declaration to the end of the containing scope.
  If names must be defined before their use, then a one-pass compiler with normal symbol table techniques suffices.
  Some possible uses of forward referencing:
    - GOTO labels.
    - Recursively defined pointer types.
    - Mutually or recursively called procedures.

GOTO labels
  Some languages, like C, use labels without declarations.
    - Implicit declaration.
  Example: [C]
      L0: goto L0;
          goto L1;
      L1:

Recursively defined pointer types
  Determine the element type if possible.
  Chain together all references to unknown type names until the end of the type declaration.
  All type names can then be looked up and resolved.
    - Names that cannot be resolved are undeclared type names.
  Example: [PASCAL]
      type link = ^cell;
           cell = record
                    info: integer;
                    next: link;
                  end;

Mutually or recursively called procedures
  Need to know the specification of a procedure before its definition.
  Some languages require prototype definitions.
  Example:
      procedure A()
      {
        call B();
      }
      procedure B()
      {
        call A();
      }
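
In C, a prototype supplies the specification before the definition is seen. The pair of functions below is an illustrative example chosen for this note, not taken from the slides.

```c
#include <assert.h>

/* Prototype: gives the compiler is_odd's specification before
 * its definition appears, so is_even can call it. */
int is_odd(unsigned n);

int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```

With the prototype in place, a one-pass compiler already has a symbol table entry for is_odd when it reaches the call inside is_even.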

Type equivalence and others
  How to determine whether two types are equivalent?
    - Structural equivalence.
        - Express a type definition as a directed graph whose nodes are the elements and whose edges give the containment information.
        - Two types are equivalent if and only if their structures (labeled graphs) are the same.
        - A difficult job for compilers.
            entry = record
                      info : real;
                      coordinates : record
                                      x : integer;
                                      y : integer;
                                    end
                    end
            [entry] +-----> [info] <real>
                    +-----> [coordinates] +----> [x] <integer>
                                          +----> [y] <integer>
    - Name equivalence.
        - Two types are equivalent if and only if their names are the same.
        - An easy job for compilers, but the coding takes more time.
  The symbol table is needed during compilation, and might also be needed during debugging.
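
The two notions of equivalence can be contrasted with a small C sketch. The type representation (struct ty_type, struct ty_field) and both function names are assumptions made for illustration. The structural check compares field labels as well as field types, and it handles only acyclic type graphs; recursive types would additionally need cycle detection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ty_kind { TY_INTEGER, TY_REAL, TY_RECORD };

struct ty_field;

struct ty_type {
    enum ty_kind           kind;
    const char            *name;      /* declared type name, NULL if anonymous */
    int                    nfields;   /* used only for TY_RECORD */
    const struct ty_field *fields;
};

struct ty_field {
    const char           *label;
    const struct ty_type *type;
};

/* Name equivalence: same declaration, or same declared name. */
int name_equiv(const struct ty_type *a, const struct ty_type *b)
{
    return a == b ||
           (a->name != NULL && b->name != NULL &&
            strcmp(a->name, b->name) == 0);
}

/* Structural equivalence: compare the labeled graphs node by node. */
int struct_equiv(const struct ty_type *a, const struct ty_type *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind)
        return 0;
    if (a->kind != TY_RECORD)
        return 1;
    if (a->nfields != b->nfields)
        return 0;
    for (int i = 0; i < a->nfields; i++) {
        if (strcmp(a->fields[i].label, b->fields[i].label) != 0)
            return 0;
        if (!struct_equiv(a->fields[i].type, b->fields[i].type))
            return 0;
    }
    return 1;
}
```

Two record types declared under different names but with identical fields are structurally equivalent yet not name equivalent; the name check is a single comparison, while the structural check has to walk both graphs.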

Usage of symbol table with YACC
  Define symbol table routines:
    - lookup(name,scope): check whether a name within a particular scope is currently in the symbol table or not. Return not found or an entry in the symbol table.
    - enter(name,scope): insert a name within a particular scope into the symbol table. Return the newly created entry.
  For interpreters:
    - Use the attributes associated with the symbols to hold temporary values.
    - Use a structure, maybe with some unions, to record all attributes.
        struct YYSTYPE {
            char type;        /* data type of a variable */
            int value;
            int addr;
            char *namelist;   /* list of names */
            char *name;       /* id name */
        };
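
A minimal C sketch of the two routines, plus a hypothetical helper declare_var that performs the steps a declaration action would take: check for a duplicate, enter the name, allocate storage, and record the address. The linear array, the entry layout and the offset counter next_addr are assumptions, not the slides' implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 128   /* illustrative capacity */

struct entry {
    const char *name;   /* caller keeps the string alive */
    int  scope;
    char type;          /* e.g. 'i' for int in this sketch */
    int  addr;          /* storage address, -1 until allocated */
};

static struct entry table[MAX_ENTRIES];
static int n_entries = 0;
static int next_addr = 0;   /* next free storage offset */

/* Return the entry for name in the given scope, or NULL if not found. */
struct entry *lookup(const char *name, int scope)
{
    for (int i = n_entries - 1; i >= 0; i--)
        if (table[i].scope == scope && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Create a new entry and return it. */
struct entry *enter(const char *name, int scope)
{
    assert(n_entries < MAX_ENTRIES);
    struct entry *e = &table[n_entries++];
    e->name  = name;
    e->scope = scope;
    e->type  = 0;
    e->addr  = -1;
    return e;
}

/* Declaration processing: reject duplicates, then enter the name,
 * allocate size bytes, and record the storage address. */
struct entry *declare_var(const char *name, int scope, char type, int size)
{
    if (lookup(name, scope) != NULL)
        return NULL;            /* already declared in this scope */
    struct entry *e = enter(name, scope);
    e->type = type;
    e->addr = next_addr;
    next_addr += size;
    return e;
}
```

Declaring i and then j as 4-byte integers in scope 1 places them at offsets 0 and 4; a second declaration of i in the same scope returns NULL.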

YACC coding: declaration I
  Declaration:
    D -> L V
        {use lookup to check whether $2.name has been declared;
         use enter to insert $2.name with the type $1.type;
         allocate sizeof($1.type) bytes;
         record the storage address in the symbol table entry;
         $$.type = $1.type;}
    L -> L V ,
        {use lookup to check whether $2.name has been declared;
         use enter to insert $2.name with the type $1.type;
         allocate sizeof($1.type) bytes;
         record the storage address in the symbol table entry;
         $$.type = $1.type;}
       | T
        {$$.type = $1.type;}
    V -> id
        {save yytext into $$.name;}
    T -> int
        {$$.type = int;}

Grammar I
  Grammar I: using only simple synthesized attributes
    D -> L V
    L -> L V , | T
    V -> id
    T -> int
  Input: int i,j
  Rightmost derivation:
    D => L V => L j => L V , j => L i , j => T i , j => int i , j
  Order of the actions:
    1. Learn that the type is integer.
    2. Pass the type to the parent node.
    3. Save the name i.
    4. Symbol table processing for i.
    5. Save the name j.
    6. Symbol table processing for j.
  Parse tree (the number in parentheses is the step performed at that node):
      D (6)
        L (4)
          L (2)
            T (1)
              int
          V (3)
            i
          ,
        V (5)
          j

YACC coding: declaration II
  Declaration:
    D -> T L
        {use lookup to check each name in the list $2.namelist for possible duplicated names;
         if a name is not duplicated, then use enter to insert it with the type $1.type;
         allocate sizeof($1.type) bytes;
         record the storage address in the symbol table entry;}
    T -> int
        {$$.type = int;}
    L -> L , V
        {append the new name $3.name to the list $1.namelist;
         return $1.namelist as $$.namelist;}
       | V
        {the variable name is in $1.name;
         create a list containing the single name $1.name and store it in $$.namelist;}
    V -> id
        {save yytext into $$.name;}

Grammar II
  Grammar II: using a list of names
    D -> T L
    L -> L , V | V
    V -> id
    T -> int
  Input: int i,j
  Rightmost derivation:
    D => T L => T L , V => T L , j => T V , j => T i , j => int i , j
  Order of the actions:
    1. Learn that the type is integer.
    2. Save the name i.
    3. Create a name list.
    4. Save the name j.
    5. Append the new name.
    6. Symbol table operations.
  Parse tree (the number in parentheses is the step performed at that node):
      D (6)
        T (1)
          int
        L (5)
          L (3)
            V (2)
              i
          ,
          V (4)
            j

YACC coding: expressions and assignments
  Usage of variables:
    Assign_S -> L_var := Expression ;
        {$1.addr is the address of the variable to be stored;
         $3.value is the value of the expression;
         generate code for storing $3.value into $1.addr;}
    L_var -> id
        {use lookup to check whether yytext is already declared;
         $$.addr = storage address;}
    Expression -> Expression + Expression
        {$$.value = $1.value + $3.value;}
      | Expression - Expression
        {$$.value = $1.value - $3.value;}
      | id
        {use lookup to check whether yytext is already declared;
         if not, report an error;
         otherwise, $$.value = the value of the variable yytext;}