Chap. 8, Declaration Processing and Symbol Tables. J. H. Wang Dec. 13, 2011. Outline. Constructing a Symbol Table Block-Structured Languages and Scopes Basic Implementation Techniques Advanced Features Declaration Processing Fundamentals Variable and Type Declarations
Outline • Constructing a Symbol Table • Block-Structured Languages and Scopes • Basic Implementation Techniques • Advanced Features • Declaration Processing Fundamentals • Variable and Type Declarations • Class and Method Declarations • An Introduction to Type Checking
Constructing a Symbol Table • We walk (make a pass over) the AST for two purposes • To process symbol declarations • To connect each symbol reference with its declaration • Each AST node that names a symbol is enriched with a reference to the name’s entry in the symbol table
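A minimal Python sketch of the two purposes of the walk, assuming a toy AST built from hypothetical Block, Decl, and Ref node classes: declarations are entered into a stack of per-scope dictionaries, and each reference node receives a pointer to the declaration it resolves to.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decl:
    name: str
    type: str


@dataclass
class Ref:
    name: str
    decl: Optional[Decl] = None  # set by link(): the declaration this use refers to


@dataclass
class Block:
    body: list = field(default_factory=list)


def link(node, scopes=None):
    """One pass over the AST: enter declarations, then attach every
    reference to the innermost visible declaration of its name."""
    if scopes is None:
        scopes = []
    if isinstance(node, Block):
        scopes.append({})                 # entering a block opens a scope
        for child in node.body:
            link(child, scopes)
        scopes.pop()                      # leaving it closes the scope
    elif isinstance(node, Decl):
        scopes[-1][node.name] = node
    elif isinstance(node, Ref):
        for scope in reversed(scopes):    # innermost scope first
            if node.name in scope:
                node.decl = scope[node.name]
                return
        raise NameError(f"undeclared identifier {node.name!r}")


# { int x; { float x; use x; } use x; }
outer_x, inner_x = Decl("x", "int"), Decl("x", "float")
use1, use2 = Ref("x"), Ref("x")
link(Block([outer_x, Block([inner_x, use1]), use2]))
print(use1.decl.type, use2.decl.type)   # float int
```

Because this is a single pass, a name must be declared before it is used in its block; languages that allow later declarations need an extra pass.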
Static Scoping • Static scope of an identifier: its defining block as well as any contained blocks that do not contain a declaration for the identifier • Global scope: a name space shared by all compilation units • Scopes might be opened and closed by braces ({ } as in C and Java) or by reserved keywords (begin and end as in Ada and Algol)
A Symbol Table Interface • Methods • OpenScope() • CloseScope() • EnterSymbol(name, type) • RetrieveSymbol(name) • DeclaredLocally(name) • Ex. • (Fig. 8.2) Code to build the symbol table for the AST in Fig. 8.1
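As an illustration only, the five interface methods could be realized in Python as below. The snake_case method names and the choice of one dictionary per scope kept on a stack are assumptions of this sketch, not the chapter's code.

```python
class SymbolTable:
    """One dictionary per scope, kept on a stack (innermost scope last)."""

    def __init__(self):
        self.scopes = []

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()

    def enter_symbol(self, name, type_):
        self.scopes[-1][name] = type_        # declarations go in the current scope only

    def retrieve_symbol(self, name):
        for scope in reversed(self.scopes):  # innermost declaration wins
            if name in scope:
                return scope[name]
        return None

    def declared_locally(self, name):
        return bool(self.scopes) and name in self.scopes[-1]


st = SymbolTable()
st.open_scope()
st.enter_symbol("x", "int")
st.open_scope()
print(st.declared_locally("x"))   # False: x lives in the enclosing scope
st.enter_symbol("x", "float")
print(st.retrieve_symbol("x"))    # float
st.close_scope()
print(st.retrieve_symbol("x"))    # int
```

A declaration processor would typically call declared_locally before enter_symbol, so that a second declaration of the same name in one scope is reported instead of silently replacing the first.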
Block-Structured Languages and Scopes • Block-structured languages: languages that allow nested name scopes • Concepts introduced by Algol 60 • Handling Scopes • Current scope: the innermost context • Open scopes (or currently active scopes): the current scope and its surrounding scopes • Closed scopes: all other scopes
Some common visibility rules • Accessible names are those in the current scope and in all other open scopes • If a name is declared in more than one scope, then a reference to the name is resolved to the innermost declaration • New declarations can be made only in the current scope • Global scope • Extern: in C • Public static: in Java
Compilation-unit scope: in C and C++ • Declared outside of all functions • Package-level scope: in Java • In C and C++, every function definition is available in the global scope unless it has the static attribute • In C++ and Java, names declared within a class are available to all methods in the class • Protected members are also available to the class’s subclasses • Names declared within a statement block are available to all contained blocks, unless they are redeclared in an inner scope
One Symbol Table or Many? • Two common approaches to implementing block-structured symbol tables • A symbol table associated with each scope • Or a single, global table
An Individual Table for Each Scope • Because name scopes are opened and closed in a last-in first-out (LIFO) manner, a stack is an appropriate data structure for organizing the search • The innermost scope appears at the top of the stack • OpenScope(): pushes a new symbol table • CloseScope(): pops the top symbol table • (Fig. 8.3) • Disadvantage • A name may have to be searched for in a number of symbol tables • The cost depends on the number of nonlocal references and the depth of nesting
One Symbol Table • All names are kept in the same table • Each entry is uniquely identified by its name together with its scope name or depth • RetrieveSymbol() need not chain through scope tables to locate a name • More details in Sec. 8.3.3 • (Fig. 8.8)
Basic Implementation Techniques • Entering and Finding Names • The Name Space • An Efficient Symbol Table Implementation
Entering and Finding Names • Examine the time needed to insert symbols, retrieve symbols, and maintain scopes • In particular, we pay attention to the cost of retrieving symbols • Names can be declared no more than once in each scope, but typically referenced multiple times • Various approaches • Unordered list • Ordered list • Binary search trees • Balanced trees • Hash tables
Unordered List • Simplest organization • Implemented as an array, a linked list, or a resizable array • All symbols in a given scope appear adjacently • Insertion: fast • Retrieval: linear scan • Impractically slow for programs with many symbols
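A sketch of the unordered-list organization, assuming entries are (name, type) pairs in a single Python list and a hypothetical scope_starts list that records where each open scope begins. Insertion appends in O(1); retrieval scans backward so the innermost declaration is found first.

```python
class UnorderedListTable:
    """All entries live in one list in declaration order."""

    def __init__(self):
        self.entries = []       # (name, type) pairs
        self.scope_starts = []  # index in entries where each open scope begins

    def open_scope(self):
        self.scope_starts.append(len(self.entries))

    def close_scope(self):
        # The closing scope's symbols are adjacent at the end: drop them at once.
        del self.entries[self.scope_starts.pop():]

    def enter_symbol(self, name, type_):
        self.entries.append((name, type_))   # O(1) insertion

    def retrieve_symbol(self, name):
        # O(n) linear scan, newest first, so inner declarations hide outer ones.
        for entry_name, type_ in reversed(self.entries):
            if entry_name == name:
                return type_
        return None


t = UnorderedListTable()
t.open_scope()
t.enter_symbol("a", "int")
t.open_scope()
t.enter_symbol("a", "char")
print(t.retrieve_symbol("a"))   # char
t.close_scope()
print(t.retrieve_symbol("a"))   # int
```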
Ordered List • Binary search: O(log n) • How to organize the ordered list when a name is declared in multiple scopes? • An ordered list of stacks (Fig. 8.4) • RetrieveSymbol() locates stacks using binary search • CloseScope() examines each stack and pops those stacks whose top symbol is declared in the abandoned scope • To avoid such checking, we maintain a separate linking of symbol table entries that are declared at the same scope level (Sec. 8.3.3) • Fast retrieval, but expensive insertion • Advantageous when the set of symbols is known in advance • E.g., reserved keywords
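A hedged sketch of the ordered list of stacks, using Python's bisect module for the binary search. Each distinct name appears once in a sorted list and owns a stack of (depth, type) declarations; the class and field names are hypothetical.

```python
import bisect


class OrderedListTable:
    def __init__(self):
        self.names = []    # distinct names, kept sorted for binary search
        self.stacks = []   # stacks[i] holds (depth, type) entries for names[i]
        self.depth = 0

    def open_scope(self):
        self.depth += 1

    def close_scope(self):
        # Examine every stack; pop those whose top was declared in the
        # abandoned scope, and drop names whose stacks become empty.
        keep_names, keep_stacks = [], []
        for name, stack in zip(self.names, self.stacks):
            if stack[-1][0] == self.depth:
                stack.pop()
            if stack:
                keep_names.append(name)
                keep_stacks.append(stack)
        self.names, self.stacks = keep_names, keep_stacks
        self.depth -= 1

    def enter_symbol(self, name, type_):
        i = bisect.bisect_left(self.names, name)     # O(log n) search...
        if i < len(self.names) and self.names[i] == name:
            self.stacks[i].append((self.depth, type_))
        else:
            self.names.insert(i, name)               # ...but O(n) list insertion
            self.stacks.insert(i, [(self.depth, type_)])

    def retrieve_symbol(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.stacks[i][-1][1]             # top of stack = innermost
        return None


t = OrderedListTable()
t.open_scope()
t.enter_symbol("b", "int")
t.open_scope()
t.enter_symbol("b", "char")
print(t.retrieve_symbol("b"))   # char
t.close_scope()
print(t.retrieve_symbol("b"))   # int
```

Note that close_scope visits every stack, which is exactly the checking that a separate linking of same-scope entries avoids.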
Binary Search Trees • Insert, search: O(log n), given random inputs • Average-case performance does not necessarily hold for symbol tables • Programmers do not choose identifiers at random! • Advantage • Simple, widely known implementation
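The point about non-random identifiers can be checked with a small unbalanced binary search tree. In this sketch (hypothetical helper names), identifiers that arrive in sorted order, as generated names such as tmp000, tmp001, ... often do, produce a tree whose height equals the number of names, so search degrades to O(n).

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain (unbalanced) BST insertion, written iteratively."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right
        else:
            return root  # already present


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


root = None
for name in [f"tmp{i:03d}" for i in range(100)]:   # arrives in sorted order
    root = insert(root, name)
print(height(root))   # 100: the tree has degenerated into a linked list
```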
Balanced Trees • The worst-case scenario for binary trees can be avoided if the tree is balanced • E.g.: red-black trees, splay trees • Insert, search: O(log n)
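The slide names red-black and splay trees; the sketch below swaps in an AVL tree, another self-balancing search tree, because its insertion fits in a few lines. Rotations keep the height logarithmic even for the sorted input that degrades a plain binary search tree. All names here are hypothetical.

```python
class AVLNode:
    __slots__ = ("key", "value", "left", "right", "height")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1


def _h(n):
    return n.height if n is not None else 0


def _update(n):
    n.height = 1 + max(_h(n.left), _h(n.right))


def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _update(y)
    _update(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _update(x)
    _update(y)
    return y


def avl_insert(node, key, value):
    """Insert key and return the (possibly new) root of this subtree."""
    if node is None:
        return AVLNode(key, value)
    if key < node.key:
        node.left = avl_insert(node.left, key, value)
    elif key > node.key:
        node.right = avl_insert(node.right, key, value)
    else:
        node.value = value
        return node
    _update(node)
    balance = _h(node.left) - _h(node.right)
    if balance > 1:                      # left-heavy
        if key > node.left.key:          # left-right case
            node.left = _rotate_left(node.left)
        return _rotate_right(node)
    if balance < -1:                     # right-heavy
        if key < node.right.key:         # right-left case
            node.right = _rotate_right(node.right)
        return _rotate_left(node)
    return node


def avl_find(node, key):
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node.value
    return None


root = None
for i in range(1000):                    # sorted input, the bad case for a plain BST
    root = avl_insert(root, f"tmp{i:04d}", i)
print(root.height)                       # logarithmic in the number of names, not 1000
print(avl_find(root, "tmp0500"))         # 500
```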
Hash Tables • The most common approach, due to their excellent performance • Insert, search: O(1) on average, given • A sufficiently large table • A good hash function • Appropriate collision-handling techniques • (Sec. 8.3.3)
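A minimal separate-chaining hash table, assuming a table size of 211 (a prime chosen arbitrarily here) and a simple polynomial string hash; both are illustrative choices. New entries go to the front of their bucket, so a later declaration of a name hides an earlier one until it is removed.

```python
class ChainedHashTable:
    def __init__(self, size=211):
        self.buckets = [[] for _ in range(size)]

    def _hash(self, name):
        # Polynomial string hash reduced modulo the table size.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def insert(self, name, value):
        # Newest entry first, so it hides older entries with the same name.
        self.buckets[self._hash(name)].insert(0, (name, value))

    def lookup(self, name):
        # Only the one bucket the name hashes to is searched.
        for entry_name, value in self.buckets[self._hash(name)]:
            if entry_name == name:
                return value
        return None

    def remove_newest(self, name):
        """Drop the most recent entry for name, e.g. when its scope closes."""
        bucket = self.buckets[self._hash(name)]
        for i, (entry_name, _) in enumerate(bucket):
            if entry_name == name:
                del bucket[i]
                return


table = ChainedHashTable()
table.insert("x", "int")
table.insert("x", "float")   # inner declaration of x
print(table.lookup("x"))     # float
table.remove_newest("x")     # inner scope closes
print(table.lookup("x"))     # int
```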
The Name Space • Properties to consider • The name of a symbol does not change during compilation • Symbol names persist throughout compilation • Great variance in the length of identifier names • Unless an ordered list is maintained, comparisons of symbol names involve only equality and inequality • In favor of one logical name space (Fig. 8.5)
Names are inserted, but never deleted • Each name is represented by two fields • Origin (where its characters begin in the name space) • Length
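A sketch of one logical name space, assuming identifier characters are appended to a single shared buffer and a name is thereafter referred to by its (origin, length) pair. The spelling-to-pair dictionary, which keeps the same spelling from being stored twice, is an assumption of this sketch.

```python
class NameSpace:
    def __init__(self):
        self.chars = []   # the single shared character buffer; never shrinks
        self.index = {}   # spelling -> (origin, length), so each spelling is stored once

    def add(self, name):
        if name not in self.index:
            origin = len(self.chars)
            self.chars.extend(name)
            self.index[name] = (origin, len(name))
        return self.index[name]

    def spelling(self, ref):
        origin, length = ref
        return "".join(self.chars[origin:origin + length])


ns = NameSpace()
a = ns.add("count")
b = ns.add("i")
print(a, b, ns.add("count") == a)   # (0, 5) (5, 1) True
print(ns.spelling(b))               # i
```

Because each spelling is stored once, two names are equal exactly when their (origin, length) pairs are equal, which matches the observation that symbol names are only compared for equality.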
An Efficient Symbol Table Implementation • A symbol table entry containing • Name • Type • Hash • Var • Level • Depth
Two index structures • Hash table • Scope display • Symbols at the same level • (Fig. 8.7) & (Fig. 8.8)
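The following sketch combines the two index structures. It assumes an interpretation of the entry fields in which Hash links entries that share a hash bucket, Var points to the outer declaration of the same name that this entry hides, Level links entries declared in the same scope, and Depth is the nesting depth; the Python names are hypothetical. Only the innermost declaration of each name sits on a hash chain, and CloseScope walks the scope display's Level links to unhook exactly the symbols of the closing scope.

```python
class Entry:
    def __init__(self, name, type_, depth):
        self.name = name
        self.type = type_
        self.depth = depth   # nesting depth of the declaring scope
        self.hash = None     # next entry in the same hash bucket (a different name)
        self.var = None      # outer declaration of the same name hidden by this one
        self.level = None    # next entry declared in the same scope


class ScopedHashTable:
    def __init__(self, size=211):
        self.buckets = [None] * size   # head of each hash chain
        self.display = []              # display[d]: latest entry declared at depth d

    def _find(self, name):
        """Return (bucket, predecessor, entry) for the visible entry of name."""
        b = hash(name) % len(self.buckets)
        prev, cur = None, self.buckets[b]
        while cur is not None and cur.name != name:
            prev, cur = cur, cur.hash
        return b, prev, cur

    def _relink(self, b, prev, replacement):
        if prev is None:
            self.buckets[b] = replacement
        else:
            prev.hash = replacement

    def open_scope(self):
        self.display.append(None)

    def enter_symbol(self, name, type_):
        depth = len(self.display) - 1
        e = Entry(name, type_, depth)
        b, prev, old = self._find(name)
        if old is not None:
            # The new entry takes the old one's place on the chain;
            # the old one stays reachable through e.var.
            e.var, e.hash, old.hash = old, old.hash, None
            self._relink(b, prev, e)
        else:
            e.hash = self.buckets[b]
            self.buckets[b] = e
        e.level, self.display[depth] = self.display[depth], e
        return e

    def retrieve_symbol(self, name):
        return self._find(name)[2]

    def declared_locally(self, name):
        e = self.retrieve_symbol(name)
        return e is not None and e.depth == len(self.display) - 1

    def close_scope(self):
        e = self.display.pop()
        while e is not None:
            b, prev, _ = self._find(e.name)   # e is the visible entry for its name
            if e.var is not None:             # re-expose the hidden declaration
                e.var.hash = e.hash
                self._relink(b, prev, e.var)
            else:
                self._relink(b, prev, e.hash)
            e = e.level


t = ScopedHashTable()
t.open_scope()
t.enter_symbol("x", "int")
t.open_scope()
t.enter_symbol("x", "float")
print(t.retrieve_symbol("x").type)   # float
t.close_scope()
print(t.retrieve_symbol("x").type)   # int
```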
Advanced Features • Extensions of the simple symbol table framework to accommodate advanced features of modern programming languages • Name augmentation (overloading) • Name hiding and promotion • Modification of search rules
Records and Typenames • Aggregate data structures • struct, record • E.g. a.b.c.d • C, Ada, Pascal: the reference must completely specify the containers and the field • COBOL, PL/I: intermediate containers can be omitted if the reference can be unambiguously resolved • E.g., a.c or c.d, which are more difficult to read • Records can be nested arbitrarily deeply • Their field declarations form a tree • typedef: declares an alias for a type • The alias is entered in the symbol table
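To make the two qualification rules concrete, here is a sketch that models record declarations as nested Python dictionaries (hypothetical data). Full qualification walks the path one container at a time; the COBOL/PL/I-style rule accepts any full path that contains the given names in order and ends at the named field, and succeeds only if exactly one such path exists.

```python
# Hypothetical declarations: record a has field b, which has field c,
# which has field d: int; record x has field c, which has field d: real.
decls = {
    "a": {"b": {"c": {"d": "int"}}},
    "x": {"c": {"d": "real"}},
}


def resolve_full(decls, path):
    """C/Ada/Pascal style: every container must be named, as in a.b.c.d."""
    node = decls
    for name in path:
        if not isinstance(node, dict) or name not in node:
            raise KeyError(".".join(path))
        node = node[name]
    return node


def _is_subsequence(short, long):
    it = iter(long)
    return all(name in it for name in short)   # consumes it in order


def resolve_partial(decls, path):
    """COBOL/PL/I style: intermediate containers may be omitted."""
    matches = []

    def walk(node, prefix):
        for name, sub in node.items():
            full = prefix + [name]
            if name == path[-1] and _is_subsequence(path, full):
                matches.append((full, sub))
            if isinstance(sub, dict):
                walk(sub, full)

    walk(decls, [])
    if len(matches) != 1:
        raise LookupError(f"{'.'.join(path)}: {len(matches)} matching paths")
    return matches[0]


print(resolve_full(decls, ["a", "b", "c", "d"]))   # int
print(resolve_partial(decls, ["a", "d"]))          # (['a', 'b', 'c', 'd'], 'int')
# resolve_partial(decls, ["c", "d"]) raises LookupError: a.b.c.d and x.c.d both match
```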
Overloading and Type Hierarchies • Method overloading is allowed in object-oriented languages such as C++ and Java • If each definition has a unique type signature • Number and types of the parameters and return type • In C++ and Java, overloads must differ in their parameter types; a different return type alone is not enough • E.g.: print(int), print(String) • Approaches • View a method definition not only in terms of its name but also its type signature • Encode the type signature of a method along with its name • E.g.: M(int): void • Record a method along with a list of its overloaded definitions
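A sketch of encoding type signatures along with names, using a hypothetical mangle helper and MethodTable class. Following the Java/C++ rule, only parameter types go into the key; resolution here is exact-match only, whereas real languages also consider conversions and subtyping.

```python
def mangle(name, param_types):
    """Encode a method's parameter types with its name, e.g. 'print(int)'."""
    return f"{name}({','.join(param_types)})"


class MethodTable:
    def __init__(self):
        self.by_signature = {}  # mangled name -> return type
        self.overloads = {}     # plain name -> list of parameter-type tuples

    def declare(self, name, param_types, return_type):
        key = mangle(name, param_types)
        if key in self.by_signature:
            # Same name and parameter types: not a legal overload in Java/C++,
            # even if the return type differs.
            raise ValueError(f"duplicate definition of {key}")
        self.by_signature[key] = return_type
        self.overloads.setdefault(name, []).append(tuple(param_types))

    def resolve(self, name, arg_types):
        """Exact match: pick the definition whose parameter types equal
        the argument types."""
        key = mangle(name, arg_types)
        if key not in self.by_signature:
            raise LookupError(f"no {key}; candidates: {self.overloads.get(name, [])}")
        return key, self.by_signature[key]


methods = MethodTable()
methods.declare("print", ["int"], "void")
methods.declare("print", ["String"], "void")
print(methods.resolve("print", ["String"]))   # ('print(String)', 'void')
```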
Operator overloading: allowed in C++ and Ada • Ada allows literals to be overloaded • E.g.: diamond in two different enumeration types: as a playing card and as a gem • Pascal and Fortran allow the same symbol to represent both the invocation of a method and the value of the method’s result • Two entries in the symbol table • C: the same name can be used as a local variable, a struct name, and a label
E.g.: (in Ex. 17) • main() { struct xxx { int a, b; } c; int xxx; xxx: c.a = 1; } • Type extension through subclassing is allowed in Java and C++ • resize(Shape) vs. resize(Rectangle)
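The C example works because C keeps separate name spaces for labels, struct/union/enum tags, and ordinary identifiers, and the parser can tell from the syntax which one a use belongs to (after the keyword struct, before a colon, and so on). A simplified sketch with one table per name space follows; it ignores scopes and C's per-struct member name spaces, and all names in it are hypothetical.

```python
class CNameSpaces:
    KINDS = ("label", "tag", "ordinary")

    def __init__(self):
        self.tables = {kind: {} for kind in self.KINDS}

    def declare(self, kind, name, info):
        table = self.tables[kind]
        if name in table:   # a clash only within the same name space
            raise ValueError(f"redeclaration of {kind} {name!r}")
        table[name] = info

    def lookup(self, kind, name):
        return self.tables[kind].get(name)


# The declarations in the example: struct xxx {...} c; int xxx; xxx: c.a = 1;
ns = CNameSpaces()
ns.declare("tag", "xxx", "struct { int a, b; }")
ns.declare("ordinary", "c", "struct xxx")
ns.declare("ordinary", "xxx", "int")
ns.declare("label", "xxx", "statement c.a = 1;")
print(ns.lookup("tag", "xxx"), ns.lookup("ordinary", "xxx"))   # struct { int a, b; } int
```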
Implicit Declarations • In some languages, the appearance of a name in a certain context serves to declare the name as well • E.g.: labels in C • In Fortran: a variable's type is inferred from the first letter of its identifier • In Ada: a for-loop index is implicitly declared to be of the same type as the range specifier • A new scope is opened for the loop so that the loop index cannot clash with an existing variable • E.g. for (int i=1; i<10; i++) { … }
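Fortran's default implicit typing rule fits in a short function: an undeclared name beginning with I through N is INTEGER, and any other undeclared name is REAL. IMPLICIT statements can change this default, which the sketch ignores; the function name and the declared dictionary are hypothetical.

```python
def implicit_fortran_type(name, declared=None):
    """Type of a Fortran variable under the default implicit-typing rule.
    Fortran names are case-insensitive, so declared maps upper-case names to types."""
    declared = declared or {}
    upper = name.upper()
    if upper in declared:                 # an explicit declaration takes precedence
        return declared[upper]
    return "INTEGER" if upper[0] in "IJKLMN" else "REAL"


print(implicit_fortran_type("count"))                          # REAL
print(implicit_fortran_type("Index"))                          # INTEGER
print(implicit_fortran_type("x", {"X": "DOUBLE PRECISION"}))   # DOUBLE PRECISION
```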
Export and Import Directives • Export: some names local to a scope are to become visible outside that scope • Typically associated with modularization features such as Ada packages, C++ classes, C compilation units, and Java classes • Java: the public attribute, e.g., the String class in the java.lang package • C: all functions are known outside their compilation unit unless the static attribute is specified • In a large software system, the space of global names can become polluted and disorganized • Import mechanisms control which external names are visible • C, C++: Header files • Java: import • Ada: use
Altered Search Rules • Some constructs alter the way in which symbols are found in the symbol table • In Pascal: with R do S • Within S, an identifier is first resolved as a field of the record R • Advantageous if R is a complex name • Can usually generate efficient code • Forward references in recursive data structures or methods • A portion of the program references a definition that has not yet been processed • Such references must be announced in some languages (e.g., Pascal's forward directive)
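One way to model with R do S, assuming a stack of scope dictionaries: push R's field table as the innermost scope while S is processed, then pop it. The helper names and sample entries are hypothetical.

```python
def retrieve(scopes, name):
    for scope in reversed(scopes):   # innermost scope first
        if name in scope:
            return scope[name]
    return None


def with_statement(scopes, record_fields, body):
    """Process body with the record's fields as the innermost scope."""
    scopes.append(record_fields)
    try:
        return body(scopes)
    finally:
        scopes.pop()                 # the altered rule ends with the statement


scopes = [{"x": "real (variable)", "count": "integer (variable)"}]
point_fields = {"x": "real (field of p)", "y": "real (field of p)"}
result = with_statement(scopes, point_fields,
                        lambda s: (retrieve(s, "x"), retrieve(s, "count")))
print(result)   # ('real (field of p)', 'integer (variable)')
```

Inside the with body, x resolves to the field of p and hides the variable x, while names that are not fields, such as count, still resolve through the ordinary scopes.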
Symbol Table Summary • The symbol table organization in this chapter efficiently represents scope-declared symbols in a block-structured language • Most languages include rules for symbol promotion to a global scope • Issues such as inheritance, overloading, and aggregate data types must be considered
Declaration Processing Fundamentals • Attributes in the symbol table • Internal representations of declarations • Identifiers are used in many different ways in a modern programming language • Variables, constants, types, procedures, classes, and fields • Not every identifier has the same set of attributes • We need a data structure to store this variety of information • Either a struct that contains a tag and a union with one member for each possible value of the tag • Or an object-based approach: a base class Attributes with appropriate subclasses
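The object-based alternative might look like this in Python. Apart from Attributes, the class and field names are hypothetical. Each subclass carries only the fields that make sense for its kind of identifier, and code dispatches on the subclass instead of on a union tag.

```python
from dataclasses import dataclass


@dataclass
class TypeDescriptor:
    name: str


class Attributes:
    """Base class for whatever a symbol-table entry records about a name."""
    kind = "unknown"


@dataclass
class VariableAttributes(Attributes):
    var_type: TypeDescriptor
    kind = "variable"   # plain class attribute, not a dataclass field


@dataclass
class ConstantAttributes(Attributes):
    const_type: TypeDescriptor
    value: object
    kind = "constant"


@dataclass
class TypeAttributes(Attributes):
    this_type: TypeDescriptor
    kind = "type"


@dataclass
class MethodAttributes(Attributes):
    param_types: list
    return_type: TypeDescriptor
    kind = "method"


def describe(attr: Attributes) -> str:
    # Dispatch on the subclass rather than on a tag field.
    if isinstance(attr, VariableAttributes):
        return f"variable of type {attr.var_type.name}"
    if isinstance(attr, ConstantAttributes):
        return f"constant {attr.value!r} of type {attr.const_type.name}"
    if isinstance(attr, TypeAttributes):
        return f"type {attr.this_type.name}"
    if isinstance(attr, MethodAttributes):
        params = ", ".join(t.name for t in attr.param_types)
        return f"method ({params}) -> {attr.return_type.name}"
    return "unknown"


print(describe(VariableAttributes(TypeDescriptor("int"))))   # variable of type int
print(describe(MethodAttributes([TypeDescriptor("int")], TypeDescriptor("void"))))
```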
Type Checking Using an Abstract Syntax Tree • Using the visitor pattern (in Chap. 7) • SemanticsVisitor: a subclass of Visitor • The top-level visitor for processing declarations and doing semantic checking on the AST nodes • TopDeclVisitor • A specialized visitor invoked by SemanticsVisitor for processing declarations • TypeVisitor • A specialized visitor used to handle an identifier that represents a type or a syntactic form that defines a type (such as an array)
Variable and Type Declarations • Simple variable declarations • A type name and a list of identifiers • (Fig. 8.12) • Visitor actions: (Fig. 8.13)
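A rough sketch of the visitor actions for a simple variable declaration. It assumes a hypothetical VarDecl node (a type name plus a list of identifiers), a single flat scope stored as a dictionary, and reflection-based dispatch on the node's class name in place of the double dispatch of Chap. 7. TopDeclVisitor asks a TypeVisitor to resolve the type name, then enters each identifier and rejects redeclarations.

```python
from dataclasses import dataclass


@dataclass
class Identifier:
    name: str


@dataclass
class VarDecl:              # hypothetical node: <type name> <id>, <id>, ... ;
    type_name: Identifier
    names: list             # list of Identifier


class SemanticError(Exception):
    pass


class Visitor:
    """Dispatches visit(node) to visit_<NodeClassName>(node)."""

    def __init__(self, table):
        self.table = table  # name -> (kind, type); one flat scope for brevity

    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)


class TypeVisitor(Visitor):
    def visit_Identifier(self, node):
        entry = self.table.get(node.name)
        if entry is None or entry[0] != "type":
            raise SemanticError(f"{node.name} does not name a type")
        return entry[1]


class TopDeclVisitor(Visitor):
    def visit_VarDecl(self, node):
        var_type = TypeVisitor(self.table).visit(node.type_name)
        for ident in node.names:
            if ident.name in self.table:          # DeclaredLocally check
                raise SemanticError(f"{ident.name} is already declared")
            self.table[ident.name] = ("variable", var_type)


table = {"int": ("type", "int"), "float": ("type", "float")}
TopDeclVisitor(table).visit(VarDecl(Identifier("int"), [Identifier("a"), Identifier("b")]))
print(table["a"], table["b"])   # ('variable', 'int') ('variable', 'int')
```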
Type Declarations • A name and a description of the type to be associated with it • (Fig. 8.15) • Visit method: (Fig. 8.16)
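Along the same lines, a type declaration can be processed by letting a TypeVisitor turn the right-hand side into a type descriptor, either by looking up a type name or by building a descriptor for a type-defining form such as an array. The node classes and the tuple-based descriptors are hypothetical choices for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Identifier:
    name: str


@dataclass
class ArrayTypeSpec:        # hypothetical node: array of <element type>
    element: object


@dataclass
class TypeDecl:             # hypothetical node: type <name> = <type spec>;
    name: Identifier
    type_spec: object       # Identifier or ArrayTypeSpec


class SemanticError(Exception):
    pass


class Visitor:
    def __init__(self, table):
        self.table = table  # name -> (kind, type descriptor)

    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)


class TypeVisitor(Visitor):
    """Turns a type name or a type-defining form into a type descriptor."""

    def visit_Identifier(self, node):
        entry = self.table.get(node.name)
        if entry is None or entry[0] != "type":
            raise SemanticError(f"{node.name} does not name a type")
        return entry[1]

    def visit_ArrayTypeSpec(self, node):
        return ("array", self.visit(node.element))


class TopDeclVisitor(Visitor):
    def visit_TypeDecl(self, node):
        if node.name.name in self.table:
            raise SemanticError(f"{node.name.name} is already declared")
        described = TypeVisitor(self.table).visit(node.type_spec)
        self.table[node.name.name] = ("type", described)


table = {"float": ("type", "float")}
top = TopDeclVisitor(table)
top.visit(TypeDecl(Identifier("Vector"), ArrayTypeSpec(Identifier("float"))))
top.visit(TypeDecl(Identifier("Matrix"), ArrayTypeSpec(Identifier("Vector"))))
print(table["Matrix"])   # ('type', ('array', ('array', 'float')))
```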